{"id":224,"date":"2018-10-06T20:55:05","date_gmt":"2018-10-06T20:55:05","guid":{"rendered":"https:\/\/leedavies.dev\/?p=224"},"modified":"2018-10-24T01:58:55","modified_gmt":"2018-10-24T01:58:55","slug":"rdkit-in-jupyter-notebooks","status":"publish","type":"post","link":"https:\/\/leedavies.dev\/index.php\/2018\/10\/06\/rdkit-in-jupyter-notebooks\/","title":{"rendered":"RDKit in Jupyter Notebooks"},"content":{"rendered":"<p>The Jupyter Notebook is an open-source web application that allows the creation and sharing of documents that can contain embedded Python code, data visualizations in SVG (Scalable Vector Graphics), and text.<\/p>\n<p>When analyzing data that has an associated molecular structure, it would be useful to visualize the structure within the current Jupyter notebook page rather than finding the record in the dataset and copy\/pasting the raw structure format into another application.<br \/>\nRDKit is an open-source cheminformatics toolkit that can be called from Python and includes APIs to generate SVG representations of chemical structures.<\/p>\n<p>RDKit therefore seems a good fit for visualizing a structure inline within the Jupyter Notebook.<\/p>\n<h3>Install Anaconda<\/h3>\n<p>To install RDKit, I am using Anaconda, a Python distribution whose conda command-line tool installs Python libraries.<br \/>\nVisual Studio 2017 includes many different workloads for different functions. 
The Data Science workload includes Anaconda and a local instance of the Jupyter Notebook, which is ideal for analyzing datasets in Python.<br \/>\nTo set RDKit up with Jupyter, we are going to use the version of Anaconda that ships with Visual Studio 2017.5.<br \/>\nFrom the Visual Studio 2017 installer, install the Data Science Workload, ensuring the Anaconda install option is selected.<br \/>\nOnce the Data Science Workload is installed, open the Anaconda prompt: in the Windows 10 search bar, enter &#8220;Anaconda&#8221; and select the Anaconda Command prompt. If the Anaconda Command prompt is not listed, check that you have the Visual Studio Data Science Workload installed. To confirm that Anaconda is available, type the following command:<\/p>\n<p><code>conda --version<\/code><\/p>\n<p>If Anaconda installed correctly, the version is displayed; I am using version 4.5.4.<\/p>\n<h3>Install RDKit<\/h3>\n<p>To install RDKit, type the following command at the Anaconda command prompt:<\/p>\n<p><code>conda install -c rdkit rdkit<\/code><\/p>\n<p>If at this point you receive an &#8220;Access Denied&#8221; error message, close the command prompt and re-open it as an Administrator.<\/p>\n<h3>Create an RDKit environment<\/h3>\n<p>Once the RDKit libraries have installed, you need to create an RDKit environment. At the command prompt, type the following command:<\/p>\n<p><code>conda create -c rdkit -n my-rdkit-env rdkit<\/code><\/p>\n<p>Once the environment creation is complete, we can activate the environment and test the installation.<br \/>\nTo activate the RDKit environment, enter the following command:<\/p>\n<p><code>conda activate my-rdkit-env<\/code><\/p>\n<p>To test the installation, we are going to enter Python commands directly at the command line. 
To start the Python interpreter, type:<br \/>\n<code>python<\/code><\/p>\n<p>After the python command runs, every subsequent line is interpreted as a Python statement.<\/p>\n<p>Type the following commands:<\/p>\n<p><code>&gt;&gt;&gt; from rdkit import rdBase<br \/>\n&gt;&gt;&gt; from rdkit import Chem<br \/>\n&gt;&gt;&gt; m = Chem.MolFromSmiles('Cc1ccccc1')<br \/>\n&gt;&gt;&gt; m<\/code><\/p>\n<p>The final line displays the type and memory address of the m object:<\/p>\n<p>&lt;rdkit.Chem.rdchem.Mol object at 0x000001C644197530&gt;<\/p>\n<p>Note that the address after the &#8220;at&#8221; text will be different on your system.<br \/>\nNow that we have verified that RDKit is installed correctly in the environment, let&#8217;s register the environment with Jupyter.<\/p>\n<h3>Register the RDKit environment with Jupyter<\/h3>\n<p>First, exit the Python interpreter by typing <code>exit()<\/code>, then deactivate the rdkit environment created in the previous step with the following command:<\/p>\n<p><code>conda deactivate<\/code><\/p>\n<p>Once the rdkit environment has been deactivated, the command line is prefixed with the text &#8220;(base)&#8221;.<\/p>\n<p>To register my-rdkit-env with Jupyter, run the following command:<\/p>\n<p><code>conda install -n my-rdkit-env nb_conda_kernels<\/code><\/p>\n<p>Once the environment is registered, a new Windows shortcut is added to the Start menu.<\/p>\n<figure id=\"attachment_235\" aria-describedby=\"caption-attachment-235\" style=\"width: 300px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/leedavies.dev\/wp-content\/uploads\/2018\/10\/Updated-Shortcuts-2.png\"><img loading=\"lazy\" decoding=\"async\" class=\"size-medium wp-image-235\" src=\"https:\/\/leedavies.dev\/wp-content\/uploads\/2018\/10\/Updated-Shortcuts-2-300x88.png\" alt=\"\" width=\"300\" height=\"88\" srcset=\"https:\/\/leedavies.dev\/wp-content\/uploads\/2018\/10\/Updated-Shortcuts-2-300x88.png 300w, https:\/\/leedavies.dev\/wp-content\/uploads\/2018\/10\/Updated-Shortcuts-2.png 308w\" 
sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><figcaption id=\"caption-attachment-235\" class=\"wp-caption-text\">Jupyter Notebook shortcut.<\/figcaption><\/figure>\n<h3>Open the RDKit Jupyter Notebook<\/h3>\n<p>Open the newly created shortcut to start the Jupyter Notebook. Once the Jupyter Notebook has opened, select the &#8220;New&#8221; option to create a Python 3 Notebook.<\/p>\n<figure id=\"attachment_236\" aria-describedby=\"caption-attachment-236\" style=\"width: 294px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/leedavies.dev\/wp-content\/uploads\/2018\/10\/New-Jupyter-Notebook.png\"><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-236\" src=\"https:\/\/leedavies.dev\/wp-content\/uploads\/2018\/10\/New-Jupyter-Notebook.png\" alt=\"\" width=\"294\" height=\"242\" \/><\/a><figcaption id=\"caption-attachment-236\" class=\"wp-caption-text\">Create new Jupyter Notebook.<\/figcaption><\/figure>\n<p>Once the new page has opened, type the following code:<\/p>\n<p><code>from IPython.display import SVG<br \/>\nfrom rdkit import Chem<br \/>\nfrom rdkit.Chem import rdDepictor<br \/>\nfrom rdkit.Chem.Draw import rdMolDraw2D<br \/>\n# create a mol object from a SMILES string<br \/>\nmol = Chem.MolFromSmiles('c1cccnc1O')<br \/>\nmolSize=(450,150)<br \/>\n# work on a copy so the original mol object is left unchanged<br \/>\nmc = Chem.Mol(mol.ToBinary())<br \/>\n# compute 2D coordinates if the molecule has none<br \/>\nif not mc.GetNumConformers(): rdDepictor.Compute2DCoords(mc)<br \/>\n# init the drawer with the size<br \/>\ndrawer = rdMolDraw2D.MolDraw2DSVG(molSize[0],molSize[1])<br \/>\n# draw the molecule<br \/>\ndrawer.DrawMolecule(mc)<br \/>\ndrawer.FinishDrawing()<br \/>\n# get the SVG string<br \/>\nsvg = drawer.GetDrawingText()<br \/>\n# strip the svg: namespace prefix and display the result<br \/>\ndisplay(SVG(svg.replace('svg:','')))<\/code><\/p>\n<p>The test structure will display on the Jupyter page.<\/p>\n<figure id=\"attachment_247\" aria-describedby=\"caption-attachment-247\" style=\"width: 640px\" class=\"wp-caption 
aligncenter\"><a href=\"https:\/\/leedavies.dev\/wp-content\/uploads\/2018\/10\/Structure-Update-Code.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-247 size-large\" src=\"https:\/\/leedavies.dev\/wp-content\/uploads\/2018\/10\/Structure-Update-Code-1024x810.jpg\" alt=\"\" width=\"640\" height=\"506\" srcset=\"https:\/\/leedavies.dev\/wp-content\/uploads\/2018\/10\/Structure-Update-Code-1024x810.jpg 1024w, https:\/\/leedavies.dev\/wp-content\/uploads\/2018\/10\/Structure-Update-Code-300x237.jpg 300w, https:\/\/leedavies.dev\/wp-content\/uploads\/2018\/10\/Structure-Update-Code-768x607.jpg 768w, https:\/\/leedavies.dev\/wp-content\/uploads\/2018\/10\/Structure-Update-Code-341x270.jpg 341w, https:\/\/leedavies.dev\/wp-content\/uploads\/2018\/10\/Structure-Update-Code.jpg 1086w\" sizes=\"auto, (max-width: 640px) 100vw, 640px\" \/><\/a><figcaption id=\"caption-attachment-247\" class=\"wp-caption-text\">Display Structure script<\/figcaption><\/figure>\n<p>Using the RDKit API, you can now render SMILES strings and molfiles on the notebook page; the code can also be wrapped as a function for reuse.<\/p>\n<p>To render a reaction file, you can use the following code:<\/p>\n<p><code>from IPython.display import SVG<br \/>\nfrom rdkit.Chem import AllChem as Chem<br \/>\nfrom rdkit.Chem.Draw import rdMolDraw2D<br \/>\n# load the reaction from the rxn file<br \/>\nrxn = Chem.ReactionFromRxnFile('Reaction.rxn')<br \/>\nmolSize=(800,300)<br \/>\n# init the drawer with the size<br \/>\ndrawer = rdMolDraw2D.MolDraw2DSVG(molSize[0],molSize[1])<br \/>\n# draw the reaction<br \/>\ndrawer.DrawReaction(rxn, False, None, None)<br \/>\ndrawer.FinishDrawing()<br \/>\n# get the SVG string<br \/>\nsvg = drawer.GetDrawingText()<br \/>\n# strip the svg: namespace prefix and display the result<br \/>\ndisplay(SVG(svg.replace('svg:','')))<\/code><\/p>\n<figure id=\"attachment_246\" aria-describedby=\"caption-attachment-246\" style=\"width: 640px\" class=\"wp-caption aligncenter\"><a 
href=\"https:\/\/leedavies.dev\/wp-content\/uploads\/2018\/10\/Reaction-Update-Code.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-246 size-large\" src=\"https:\/\/leedavies.dev\/wp-content\/uploads\/2018\/10\/Reaction-Update-Code-1024x661.jpg\" alt=\"\" width=\"640\" height=\"413\" srcset=\"https:\/\/leedavies.dev\/wp-content\/uploads\/2018\/10\/Reaction-Update-Code-1024x661.jpg 1024w, https:\/\/leedavies.dev\/wp-content\/uploads\/2018\/10\/Reaction-Update-Code-300x194.jpg 300w, https:\/\/leedavies.dev\/wp-content\/uploads\/2018\/10\/Reaction-Update-Code-768x496.jpg 768w, https:\/\/leedavies.dev\/wp-content\/uploads\/2018\/10\/Reaction-Update-Code-418x270.jpg 418w, https:\/\/leedavies.dev\/wp-content\/uploads\/2018\/10\/Reaction-Update-Code.jpg 1409w\" sizes=\"auto, (max-width: 640px) 100vw, 640px\" \/><\/a><figcaption id=\"caption-attachment-246\" class=\"wp-caption-text\">Display Reaction Script<\/figcaption><\/figure>\n<p>Unfortunately, the reaction rendering is much lower quality than the single-structure render (note the missing R groups and the triple bond spacing).<\/p>\n","protected":false},"excerpt":{"rendered":"<p>The Jupyter Notebook is an open-source web application that allows the creation and sharing of documents that can contain embedded Python code, data visualizations in SVG (Scalable Vector Graphics), and text. When analyzing data that has an associated molecular structure, it would be useful to visualize the structure within the current Jupyter notebook page rather than finding the record in the dataset and copy\/pasting the raw structure format into another application. 
RDKit is an open source cheminformatics software toolkit&#8230;<\/p>\n<p class=\"read-more\"><a class=\"btn btn-default\" href=\"https:\/\/leedavies.dev\/index.php\/2018\/10\/06\/rdkit-in-jupyter-notebooks\/\"> Read More<span class=\"screen-reader-text\">  Read More<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":240,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[24,25,4],"tags":[28,26,27],"class_list":["post-224","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-python","category-rdkit","category-technology","tag-jupyter","tag-python","tag-rdkit"],"_links":{"self":[{"href":"https:\/\/leedavies.dev\/index.php\/wp-json\/wp\/v2\/posts\/224","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/leedavies.dev\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/leedavies.dev\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/leedavies.dev\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/leedavies.dev\/index.php\/wp-json\/wp\/v2\/comments?post=224"}],"version-history":[{"count":16,"href":"https:\/\/leedavies.dev\/index.php\/wp-json\/wp\/v2\/posts\/224\/revisions"}],"predecessor-version":[{"id":248,"href":"https:\/\/leedavies.dev\/index.php\/wp-json\/wp\/v2\/posts\/224\/revisions\/248"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/leedavies.dev\/index.php\/wp-json\/wp\/v2\/media\/240"}],"wp:attachment":[{"href":"https:\/\/leedavies.dev\/index.php\/wp-json\/wp\/v2\/media?parent=224"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/leedavies.dev\/index.php\/wp-json\/wp\/v2\/categories?post=224"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/leedavies.dev\/index.php\/wp-json\/wp\/v2\/tags?post=224"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]
}}