RDKit in Jupyter Notebooks
The Jupyter Notebook is an open-source web application for creating and sharing documents that combine live Python code, data visualizations such as SVG (Scalable Vector Graphics) images, and narrative text.
When analyzing data that has an associated molecular structure, it would be useful to visualize the structure within the current Jupyter notebook page rather than finding the record in the dataset and copy/pasting the raw structure format into another application.
RDKit is an open-source cheminformatics toolkit that can be called from Python and includes APIs for generating SVG depictions of chemical structures.
It therefore seems a good fit for visualizing structures inline in a Jupyter notebook.
Install Anaconda
To install RDKit, I am using Anaconda, a Python distribution whose command-line package manager, conda, installs Python libraries together with their dependencies.
Visual Studio 2017 is organized into workloads for different kinds of development. The Data Science workload includes Anaconda and a local instance of Jupyter Notebook, which makes it well suited to analyzing datasets in Python.
To set up RDKit with Jupyter, we will use the version of Anaconda that ships with Visual Studio 2017.5.
From the Visual Studio 2017 installer, install the Data Science Workload, ensuring the Anaconda install option is selected.
Once the Data Science workload is installed, open the Anaconda Prompt: type “Anaconda” in the Windows 10 search bar and select the Anaconda Command Prompt. (If the Anaconda Command Prompt is not listed, check that the Visual Studio Data Science workload is installed.) Then check the installed conda version by typing:
conda --version
If Anaconda installed correctly, the conda version is displayed; I am using version 4.5.4.
Install RDKit
To install RDKit, type the following command at the Anaconda prompt:
conda install -c rdkit rdkit
If you receive an “Access Denied” error at this point, close the command prompt and re-open it as an Administrator.
Create an RDKit environment
Once the RDKit libraries have installed, create a dedicated conda environment for RDKit by typing the following command at the prompt:
conda create -c rdkit -n my-rdkit-env rdkit
Once the environment creation is complete, we can activate the environment and test the installation.
To activate the RDKit environment, enter the following command:
conda activate my-rdkit-env
To test the installation, we will enter Python commands directly at the command line. To start the interactive Python interpreter, type:
python
Once the interpreter starts (the prompt changes to >>>), each line you type is executed as a Python statement.
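Before importing RDKit, it can be worth confirming that the interpreter you just started belongs to the activated environment rather than to the base Anaconda install. The following is a minimal sketch, assuming conda's usual layout in which a named environment lives in a directory called after it (for example ...\envs\my-rdkit-env); the helper name in_conda_env is hypothetical, not part of conda or RDKit.

```python
import os
import sys


def in_conda_env(env_name: str) -> bool:
    """Hypothetical helper: True if this interpreter lives in env_name.

    Assumes conda's default layout, where a named environment is the
    directory <conda-root>/envs/<env_name> and sys.prefix points at it.
    """
    prefix = os.path.normpath(sys.prefix)
    return os.path.basename(prefix) == env_name


print(sys.executable)               # full path of the running python
print(in_conda_env("my-rdkit-env"))  # False suggests the wrong python is on PATH
```

If the second line prints False, the python command is probably resolving to the base installation, and RDKit may not be importable from it.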
Type the following commands:
>>> from rdkit import rdBase
>>> from rdkit import Chem
>>> m = Chem.MolFromSmiles('Cc1ccccc1')
>>> m
The final line displays the representation of the m object, which shows its type:
<rdkit.Chem.rdchem.Mol object at 0x000001C644197530>
Note that the memory address after “at” will be different on your system.
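The interactive check above can also be saved as a short script and run with python inside the activated environment. This sketch additionally prints the installed RDKit version and checks that parsing succeeded, because Chem.MolFromSmiles returns None (and logs an error) for an invalid SMILES string rather than raising an exception. The file name you save it under is up to you.

```python
from rdkit import Chem, rdBase

# Report the installed RDKit version string
print("RDKit version:", rdBase.rdkitVersion)

# Parse toluene; MolFromSmiles returns None when parsing fails
m = Chem.MolFromSmiles("Cc1ccccc1")
assert m is not None, "RDKit could not parse the test SMILES"

# Implicit hydrogens are not counted, so toluene has 7 atoms here
print("Atoms:", m.GetNumAtoms())
print("Canonical SMILES:", Chem.MolToSmiles(m))

# An unclosed ring is invalid SMILES: RDKit logs an error and returns None
bad = Chem.MolFromSmiles("c1ccc")
print("Invalid SMILES parsed to:", bad)
```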
Now that we have verified that the RDKit environment is installed correctly, let’s register it with Jupyter.
Register the RDKit environment with Jupyter
First, leave the Python interpreter by typing exit(), then deactivate the RDKit environment created in the previous step with the following command:
conda deactivate
Once the RDKit environment has been deactivated, the command prompt is prefixed with the text “(base)”.
To register my-rdkit-env with Jupyter, run the following command, which installs the nb_conda_kernels package into the environment:
conda install -n my-rdkit-env nb_conda_kernels
Once the environment is registered, a new Windows shortcut will be added to the Start menu.
Open the RDKit Jupyter Notebook
Open the newly created shortcut to start Jupyter Notebook. Once the notebook server has opened, select the “New” option to create a Python 3 notebook.
Once the new page has opened, type the following code:
from IPython.display import SVG, display
from rdkit import Chem
from rdkit.Chem import rdDepictor
from rdkit.Chem.Draw import rdMolDraw2D
# create a mol object from a SMILES string
mol = Chem.MolFromSmiles('c1cccnc1O')
molSize=(450,150)
# work on a copy so the original molecule is left unchanged
mc = Chem.Mol(mol.ToBinary())
if not mc.GetNumConformers():
    # compute 2D coordinates
    rdDepictor.Compute2DCoords(mc)
# init the drawer with the size
drawer = rdMolDraw2D.MolDraw2DSVG(molSize[0],molSize[1])
# draw the molecule
drawer.DrawMolecule(mc)
drawer.FinishDrawing()
# get the SVG string
svg = drawer.GetDrawingText()
# strip the "svg:" namespace prefix and display the drawing
display(SVG(svg.replace('svg:','')))
The test structure is then drawn inline on the Jupyter page.
Using the RDKit API, you can now render SMILES strings and molfiles on the notebook page; the code can also be wrapped in a function for reuse.
To render a reaction file, you can use the following code:
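One way to wrap the drawing code in a function is sketched below. The function names mol_to_svg, show_smiles and show_molfile are hypothetical, and the molfile path is whatever file you pass in. Chem.MolFromMolFile keeps the coordinates stored in the file, so 2D coordinates are computed only when a molecule has none.

```python
from IPython.display import SVG, display
from rdkit import Chem
from rdkit.Chem import rdDepictor
from rdkit.Chem.Draw import rdMolDraw2D


def mol_to_svg(mol, size=(450, 150)):
    """Hypothetical helper: return an SVG string depicting an RDKit Mol."""
    mc = Chem.Mol(mol)  # copy so the caller's molecule is not modified
    if not mc.GetNumConformers():
        rdDepictor.Compute2DCoords(mc)  # generate 2D coordinates if missing
    drawer = rdMolDraw2D.MolDraw2DSVG(size[0], size[1])
    drawer.DrawMolecule(mc)
    drawer.FinishDrawing()
    # strip the "svg:" prefix, as the notebook code above does
    return drawer.GetDrawingText().replace("svg:", "")


def show_smiles(smiles, size=(450, 150)):
    """Hypothetical helper: parse a SMILES string and display it inline."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Could not parse SMILES: {smiles!r}")
    display(SVG(mol_to_svg(mol, size)))


def show_molfile(path, size=(450, 150)):
    """Hypothetical helper: read a molfile from disk and display it inline."""
    mol = Chem.MolFromMolFile(path)
    if mol is None:
        raise ValueError(f"Could not read molfile: {path!r}")
    display(SVG(mol_to_svg(mol, size)))
```

In a notebook cell, show_smiles('c1cccnc1O') should then reproduce the drawing from the earlier example.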
from IPython.display import SVG, display
from rdkit.Chem import AllChem as Chem
from rdkit.Chem.Draw import rdMolDraw2D
# load the reaction from the rxn file
rxn = Chem.ReactionFromRxnFile('Reaction.rxn')
# ReactionFromRxnFile returns None when the file cannot be read
if rxn is None:
    raise ValueError('Could not read Reaction.rxn; check the file path')
molSize=(800,300)
# init the drawer with the size
drawer = rdMolDraw2D.MolDraw2DSVG(molSize[0],molSize[1])
# draw the reaction; the optional highlighting arguments keep their defaults
drawer.DrawReaction(rxn)
drawer.FinishDrawing()
# get the SVG string
svg = drawer.GetDrawingText()
# strip the "svg:" namespace prefix and display the drawing
display(SVG(svg.replace('svg:','')))
Unfortunately, the reaction rendering is of much lower quality than the single-structure rendering (note the missing R groups and the triple-bond spacing).
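A reaction does not have to come from a file. AllChem.ReactionFromSmarts builds a reaction from a reaction SMARTS string, and with useSmiles=True the string is parsed as reaction SMILES instead, which is convenient for quick tests when no .rxn file is at hand. The esterification below is an arbitrary illustration, not a reaction from this post.

```python
from IPython.display import SVG, display
from rdkit.Chem import AllChem
from rdkit.Chem.Draw import rdMolDraw2D

# Illustrative reaction: acetic acid + ethanol -> ethyl acetate + water,
# parsed as reaction SMILES so that no .rxn file is needed
rxn = AllChem.ReactionFromSmarts("CC(=O)O.OCC>>CC(=O)OCC.O", useSmiles=True)

# Same drawing steps as for the rxn file
drawer = rdMolDraw2D.MolDraw2DSVG(800, 300)
drawer.DrawReaction(rxn)
drawer.FinishDrawing()

# Strip the "svg:" prefix and show the result inline
svg = drawer.GetDrawingText().replace("svg:", "")
display(SVG(svg))
```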
2 thoughts on “RDKit in Jupyter Notebooks”
Is the sample reaction file named Reaction.rxn available on Github?
Hi, thanks for your page. The last example throws up:
ArgumentError Traceback (most recent call last)
in
8 drawer = rdMolDraw2D.MolDraw2DSVG(molSize[0],molSize[1])
9 #draw the reaction
----> 10 drawer.DrawReaction(rxn, False, None, None)
11 drawer.FinishDrawing()
12 # get the SVG string
ArgumentError: Python argument types in
None.DrawReaction(MolDraw2DSVG, NoneType, bool, NoneType, NoneType)
did not match C++ signature:
DrawReaction(class RDKit::MolDraw2D {lvalue} self, class RDKit::ChemicalReaction rxn, bool highlightByReactant=False, class boost::python::api::object highlightColorsReactants=None, class boost::python::api::object confIds=None)
Do you have any idea why?