Supporting code for Jiménez-Luna et al., "Coloring molecules with explainable artificial intelligence for preclinical relevance assessment", available as a preprint on ChemRxiv.
The recommended way to use this code is through the Anaconda Python distribution. The repository provides a conda environment file, which should work on most *nix systems:
conda env create -f environment.yml
To use the graph neural-network models that were trained for the manuscript (plasma protein binding, Caco-2 passive permeability, hERG & CYP3A4 inhibition), you need to download them from:
wget https://polybox.ethz.ch/index.php/s/dDDMzi3rTbqkWOV/download -O models.tar.gz
tar -xf models.tar.gz
Then activate the environment and prepend the folder to your PYTHONPATH environment variable:
conda activate molgrad
export PYTHONPATH=/path_to_repo_root/:$PYTHONPATH
All the training data used in this study can be freely downloaded from:
wget https://polybox.ethz.ch/index.php/s/K0orABbeJmwOUEh/download -O data.tar.gz
tar -xf data.tar.gz
In order to generate explanations for a particular molecule, given a trained model, one only needs to call the main.py script. A CUDA-capable GPU is encouraged, but not required:
python molgrad/main.py -model_path model_weights.pt -smi SMILES -output_f RESULT_DIR
For instance, to obtain feature colorings for nicotine with the pre-trained hERG inhibition model and store them in a subfolder of the home directory named results, run:
python molgrad/main.py -model_path molgrad/models/hERG_noHs.pt -smi "CN1CCCC1C2=CN=CC=C2" -output_f $HOME/results/
This will create a comma-separated file global.csv in that folder, with feature attributions corresponding to global variables (i.e. molecular weight, log P, TPSA, and number of hydrogen-bond donors). Another subfolder svg will be created with the produced feature colorings.
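The output layout described above (a global.csv file plus an svg subfolder) can be inspected from Python. The following is a minimal sketch, not part of the repository: the helper name load_results is hypothetical, the column layout of global.csv is not documented here (so rows are returned as raw lists of strings), and the .svg file extension inside the svg subfolder is an assumption.

```python
import csv
from pathlib import Path


def load_results(result_dir):
    """Return (rows, svg_paths) for a results folder written by main.py.

    Hypothetical helper: it only relies on the file names mentioned in
    this README (global.csv and the svg subfolder).
    """
    result_dir = Path(result_dir)

    # Read global.csv as raw rows; no assumption is made about a header
    # line or about the order of the columns.
    with open(result_dir / "global.csv", newline="") as handle:
        rows = list(csv.reader(handle))

    # Assumption: the colorings in the svg subfolder end in ".svg".
    svg_paths = sorted((result_dir / "svg").glob("*.svg"))
    return rows, svg_paths
```

For the nicotine example, calling load_results on the results folder in the home directory would return the contents of global.csv and the paths of the generated colorings.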
Further options, such as passing an entire .smi file for batch prediction and coloring, are listed in the provided help:
python molgrad/main.py --help
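As an alternative to the built-in batch options, the single-molecule call documented above can be scripted. The sketch below is an illustration only: build_command and color_molecules are hypothetical helper names, only the flags shown in this README (-model_path, -smi, -output_f) are used, and writing each molecule to its own subfolder is a cautious choice in case outputs from separate runs would otherwise overwrite each other. By default it only builds the commands; set dry_run=False to execute them from the repository root.

```python
import os
import subprocess
import sys
from pathlib import Path


def build_command(model_path, smiles, output_dir):
    """Assemble the documented main.py call for a single SMILES string."""
    return [
        sys.executable, "molgrad/main.py",
        "-model_path", str(model_path),
        "-smi", smiles,
        # Keep a trailing separator, as in the documented example.
        "-output_f", os.path.join(str(output_dir), ""),
    ]


def color_molecules(model_path, named_smiles, output_root, dry_run=True):
    """Build (and optionally run) one main.py call per molecule.

    named_smiles maps a label to a SMILES string; each molecule gets its
    own subfolder under output_root.
    """
    commands = []
    for name, smiles in named_smiles.items():
        out_dir = Path(output_root) / name
        cmd = build_command(model_path, smiles, out_dir)
        commands.append(cmd)
        if not dry_run:
            out_dir.mkdir(parents=True, exist_ok=True)
            # Must be run from the repository root so the script path resolves.
            subprocess.run(cmd, check=True)
    return commands
```

For example, color_molecules("molgrad/models/hERG_noHs.pt", {"nicotine": "CN1CCCC1C2=CN=CC=C2"}, "results") returns a single command equivalent to the nicotine call shown earlier, with results/nicotine/ as the output folder.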
The current framework also provides functionality for model training using custom data with the train_ext.py script. It assumes training data comes in a comma-separated (.csv) file, with one column carrying SMILES and another the target value, whose names need to be specified. For instance:
python molgrad/train_ext.py -data CSV_FILE -smiles_col "SMILES_COL" -target_col "TARGET_COL" -output path_to_weights.pt
The trained model can then be used to color molecules via the main.py routine as described above. Additional training options can be consulted with:
python molgrad/train_ext.py --help
A comma-separated file with examples drawn from the literature to validate this and other XAI approaches can be downloaded from here.
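The input format expected by train_ext.py (a .csv file with one SMILES column and one target column) can be produced with the Python standard library. This is a minimal sketch under stated assumptions: the helper name write_training_csv and the column names "smiles" and "target" are hypothetical, and any target values passed to it in an example are placeholders rather than measured data.

```python
import csv


def write_training_csv(path, records, smiles_col="smiles", target_col="target"):
    """Write (SMILES, target value) pairs to a two-column .csv file.

    The column names are illustrative; whichever names are chosen must be
    passed to train_ext.py via -smiles_col and -target_col.
    """
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=[smiles_col, target_col])
        writer.writeheader()
        for smiles, value in records:
            writer.writerow({smiles_col: smiles, target_col: value})
```

A file written this way, for example to train.csv, could then be passed to the training script with -data train.csv -smiles_col "smiles" -target_col "target".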
If you use this code (or parts thereof), please cite it using the following BibTeX entry:
@article{jimenez2020color,
author = "Jose Jimenez-Luna and Miha Skalic and Nils Weskamp and Gisbert Schneider",
title = "{Coloring Molecules with Explainable Artificial Intelligence for Preclinical Relevance Assessment}",
year = "2020",
month = "11",
url = "https://chemrxiv.org/articles/preprint/Coloring_Molecules_with_Explainable_Artificial_Intelligence_for_Preclinical_Relevance_Assessment/13252286",
doi = "10.26434/chemrxiv.13252286.v1"
}