
Set up RDKit

Installing RDKit with conda:

$ conda install -c rdkit rdkit

Using RDKit in a Jupyter notebook:

import rdkit
from rdkit import Chem
from rdkit.Chem import AllChem

from rdkit.Chem.Draw import IPythonConsole

Basic usage

Create an RDKit molecule object from a SMILES string. From the molecule object we can draw structures, compute fingerprints and properties, and so on.

smiles = 'COC(=O)c1c[nH]c2cc(OC(C)C)c(OC(C)C)cc2c1=O'
mol = Chem.MolFromSmiles(smiles)

# <rdkit.Chem.rdchem.Mol object at 0x000001F84A4CEE90>
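As a minimal sketch of what can be done with the molecule object, the snippet below checks that parsing succeeded and then reads a few simple properties. The variable names `n_heavy`, `mol_wt`, and `canonical` are illustrative choices, not part of the original tutorial.

```python
from rdkit import Chem
from rdkit.Chem import Descriptors

smiles = 'COC(=O)c1c[nH]c2cc(OC(C)C)c(OC(C)C)cc2c1=O'
mol = Chem.MolFromSmiles(smiles)

# MolFromSmiles returns None (rather than raising) when parsing fails
if mol is None:
    raise ValueError(f'Could not parse SMILES: {smiles}')

n_heavy = mol.GetNumAtoms()        # counts heavy atoms; implicit hydrogens are excluded
mol_wt = Descriptors.MolWt(mol)    # average molecular weight
canonical = Chem.MolToSmiles(mol)  # canonical SMILES for the same molecule

print(n_heavy, round(mol_wt, 2), canonical)
```

Checking for `None` right after parsing keeps later calls from failing with a less obvious error.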

Reading a list of SMILES:

smiles = [

mols = [Chem.MolFromSmiles(smi) for smi in smiles]
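A minimal sketch of the same pattern with a short, hypothetical list of SMILES strings (ethanol, benzene, aspirin, and one deliberately invalid entry). The names `example_smiles`, `valid_mols`, and `failed` are assumptions made for illustration.

```python
from rdkit import Chem

# Hypothetical input; the last entry has an unclosed ring and cannot be parsed
example_smiles = ['CCO', 'c1ccccc1', 'CC(=O)Oc1ccccc1C(=O)O', 'C1CC']

# Keep each SMILES next to its parse result so failures can be reported
parsed = [(smi, Chem.MolFromSmiles(smi)) for smi in example_smiles]

valid_mols = [mol for _, mol in parsed if mol is not None]
failed = [smi for smi, mol in parsed if mol is None]

print(len(valid_mols), failed)
```

Filtering out `None` entries before drawing or computing descriptors avoids errors further down the pipeline.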

Draw the molecules in a grid:

from rdkit.Chem import Draw

Draw.MolsToGridImage(mols, molsPerRow=2, subImgSize=(200, 200))
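The grid can also carry a caption under each structure through the `legends` argument. The sketch below assumes a small hypothetical list of molecules and uses each molecule's canonical SMILES as its caption.

```python
from rdkit import Chem
from rdkit.Chem import Draw

# Hypothetical molecules used only for illustration
example_smiles = ['CCO', 'c1ccccc1', 'CC(=O)O', 'CCN']
mols = [Chem.MolFromSmiles(smi) for smi in example_smiles]

# One legend string per molecule, in the same order as mols
legends = [Chem.MolToSmiles(mol) for mol in mols]

img = Draw.MolsToGridImage(
    mols,
    molsPerRow=2,
    subImgSize=(200, 200),
    legends=legends,
)
```

In a notebook, leaving `img` as the last expression in a cell displays the grid.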

Using PandasTools to store molecule objects in dataframes:

import pandas as pd
from rdkit.Chem import PandasTools

url = 'https://raw.githubusercontent.com/XinhaoLi74/molds/master/clean_data/ESOL.csv'

df = pd.read_csv(url)

PandasTools.AddMoleculeColumnToFrame(df, smilesCol='smiles')

This adds a column to the dataframe (named ROMol by default) containing an rdchem.Mol object for each row.

To draw the structures in a grid:

PandasTools.FrameToGridImage(df.head(8), legendsCol='logSolubility', molsPerRow=4)

To add new columns of properties, use the pandas map method on the molecule column:

df['n_Atoms'] = df['ROMol'].map(lambda x: x.GetNumAtoms())
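The same map pattern works with any function that takes a molecule. The sketch below uses a small hand-written dataframe as a stand-in for the ESOL data, so the column values and the names `demo_df` and `MolWt` are assumptions for illustration only.

```python
import pandas as pd
from rdkit.Chem import Descriptors, PandasTools

# Hypothetical stand-in for the downloaded dataset
demo_df = pd.DataFrame({'smiles': ['CCO', 'c1ccccc1', 'CC(=O)Oc1ccccc1C(=O)O']})

# Adds a 'ROMol' column (the default name) holding rdchem.Mol objects
PandasTools.AddMoleculeColumnToFrame(demo_df, smilesCol='smiles')

# Each map call applies one function to every molecule in the column
demo_df['n_Atoms'] = demo_df['ROMol'].map(lambda m: m.GetNumAtoms())
demo_df['MolWt'] = demo_df['ROMol'].map(Descriptors.MolWt)

print(demo_df[['smiles', 'n_Atoms', 'MolWt']])
```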

Computing descriptors/fingerprints

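RDKit has a variety of built-in functions for generating molecular fingerprints and descriptors. The example below uses the third-party descriptastorus package, which builds on RDKit, to compute RDKit 2D descriptors for each SMILES string in the dataframe.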

from descriptastorus.descriptors.DescriptorGenerator import MakeGenerator

generator = MakeGenerator(("RDKit2D",))

# drop the first returned element (the "calculated" flag) and keep the descriptor values
rdkit2d = [generator.process(x)[1:] for x in df['smiles']]

# collect the column names reported by the generator
rdkit2d_name = []
for name, numpy_type in generator.GetColumns():
    rdkit2d_name.append(name)
rdkit2d_df = pd.DataFrame(rdkit2d, index=df.index, columns=rdkit2d_name[1:])

# (8221, 200)
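For the fingerprint side of this section, a minimal sketch using RDKit's own Morgan fingerprint generator is shown below. The three molecules, the radius of 2, and the 2048-bit size are assumptions chosen for illustration, not settings taken from the tutorial.

```python
import numpy as np
from rdkit import Chem, DataStructs
from rdkit.Chem import rdFingerprintGenerator

# Hypothetical molecules used only for illustration
mols = [Chem.MolFromSmiles(smi) for smi in ['CCO', 'CCN', 'c1ccccc1']]

# Morgan (circular) fingerprints; radius and size are illustrative choices
gen = rdFingerprintGenerator.GetMorganGenerator(radius=2, fpSize=2048)
fps = [gen.GetFingerprint(mol) for mol in mols]

# Convert each bit vector to a NumPy row so the result can feed a model
rows = []
for fp in fps:
    arr = np.zeros((fp.GetNumBits(),), dtype=np.uint8)
    DataStructs.ConvertToNumpyArray(fp, arr)
    rows.append(arr)
fp_matrix = np.vstack(rows)

# Tanimoto similarity between the first two molecules
sim = DataStructs.TanimotoSimilarity(fps[0], fps[1])
print(fp_matrix.shape, sim)
```

The resulting matrix has one row per molecule and one column per bit, so it can be wrapped in a dataframe the same way as the descriptor table.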