Molecular Docking in
Structure-based Drug Design
Ivan Tubert-Brohman
Presentation for Cooper Union, 2/22/2017
How do drugs work?
Drugs bind to a biological
target, usually a protein
or nucleic acid.
For example, binding to
an enzyme can disrupt or
modulate a metabolic
process.
Figure from Nature Reviews Microbiology 2004, 2, 704–720
Pharmacodynamics and pharmacokinetics
Pharmacodynamics: effect of the drug on the body; mechanism of action (e.g.,
binding to a biological target)
Pharmacokinetics: “how the body affects the drug”. ADME:
● Absorption
● Distribution
● Metabolism
● Excretion
Structure-based drug design principally focuses on pharmacodynamics, although
some structure-based approaches can also be applied to questions of
pharmacokinetics.
Drug discovery process
www.eupati.eu/non-clinical-studies/discovery-development-medicines/
Cost of drug discovery
Estimates vary widely, but are
on the order of a gigadollar,
taking failures into account.
Tufts Center for the Study of
Drug Development: $2.5 B.
Doctors Without Borders:
$186 M.
www.scientificamerican.com/article/cost-to-develop-new-pharmaceutical-drug-now-exceeds-2-5b/
www.doctorswithoutborders.org/article/rd-cost-estimates-msf-response-tufts-csdd-study-cost-develop-new-drug
Structure-based drug design
Given a target for which the detailed 3D structure is known, how can we design a
ligand to bind to that target?
● Structure determination
● Model preparation
● Virtual screening
● Lead optimization
An alternative is ligand-based drug design, which doesn’t require a receptor
structure:
● Pharmacophore modeling
● Quantitative structure-activity relationships (QSAR)
Protein structure determination
Most structures are determined by X-ray crystallography; also NMR (10%) and electron microscopy (1%).
www.slideshare.net/anurag_yadav/protein-chemistryiv-anu
Protein Data Bank
www.rcsb.org/pdb/statistics/contentGrowthChart.do?content=total&seqid=100
Structure preparation
Crystal structures aren’t perfect:
● Unresolved sidechains
● No hydrogens
○ Need to decide protonation states
○ Tautomers
Ligand structures also need
preparation.
● 2D → 3D
● Protonation, tautomers
J. Comput. Aided Mol. Des. 2012, 26, 787–799
Molecular docking
Receptor + ligand → complex (pose)
Approximations in “basic docking”
● Rigid receptor: with a structure already close to the binding conformation.
(Like a “lock and key” model, but with a flexible key!)
● No explicit water
● No covalent interaction with receptor
● Binding site is known
● Ligand is a “small molecule”
● Molecular mechanics model
The problem can be described as a global optimization with 3N degrees of
freedom, where N is the number of ligand atoms.
Molecular mechanics
Energy is modeled using
equations from classical
physics, such as Coulomb’s
law, Hooke’s law, and
approximations such as the
Lennard-Jones potential
and Fourier series.
These equations and the
associated set of empirical
parameters constitute a
force field. Example: OPLS.
J. Chem. Theory Comput. 2016, 12, 281–296
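The energy terms named above can be sketched in a few lines. This is a minimal illustration of the textbook functional forms, not the OPLS implementation; the function names, the factor of 1/2 in the harmonic term, and the parameter values in the usage lines are assumptions made for the example.

```python
import math

# Converts q1*q2/r (charges in e, distance in Å) to kcal/mol.
COULOMB_CONSTANT = 332.0637


def coulomb(q1, q2, r):
    """Coulomb's law: electrostatic energy of two point charges r Å apart."""
    return COULOMB_CONSTANT * q1 * q2 / r


def lennard_jones(epsilon, sigma, r):
    """12-6 Lennard-Jones potential; its minimum is -epsilon at r = 2**(1/6) * sigma."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def hooke_bond(k, r0, r):
    """Hooke's law bond stretch (textbook form; some force fields fold the 1/2 into k)."""
    return 0.5 * k * (r - r0) ** 2


def torsion(v_n, n, phi, gamma=0.0):
    """One term of a Fourier series for a torsion angle phi (radians)."""
    return 0.5 * v_n * (1.0 + math.cos(n * phi - gamma))


# Hypothetical parameters, chosen only to exercise the functions.
print(coulomb(1.0, -1.0, 1.0))                        # opposite unit charges at 1 Å
print(lennard_jones(0.1, 3.5, 2 ** (1 / 6) * 3.5))    # at the minimum, about -0.1
```

A force field is then the sum of such terms over all bonds, angles, torsions, and atom pairs, together with the table of parameters (k, r0, epsilon, sigma, charges) for each atom type.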
Glide funnel
Stages (scoring function in parentheses) and number of poses passed between them:
Confgen: conformer ensemble, ~1000 conformers
Docking:
1. Site search
2. Orientation (“greedy” score): unlimited poses (may be > 10^6)
   ↓ 5000 poses
3. Discrete refinement (“rough” score)
   ↓ 400 poses
4. Optimize torsions and orientation (Emodel / GlideScore)
   ↓ 5 poses
5. Post-min. (Emodel / GlideScore)
   ↓
Pose selection: 1 pose
J. Med. Chem. 2004, 47, 1739-1749
Conformational search (“confgen”)
1. Scan each rotatable bond and identify local minima.
2. Enumerate combinations except for the last bond of
each chain, producing ensemble of “core
conformers”.
3. Sample terminal rotatable bonds.
4. Flexible rings are sampled using a pre-generated
template library.
Example: input ligand (9 rotatable bonds) → 214 core conformers (1069 total)
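Step 2 of the search (enumerating combinations of per-bond minima) can be sketched as a Cartesian product. This is only an illustration of the combinatorics: the torsion minima below are hypothetical, and a real conformer generator would also discard combinations with steric clashes, which this sketch does not do.

```python
from itertools import product


def core_conformers(minima_per_bond):
    """Return every combination of torsion minima, one angle per rotatable bond."""
    return list(product(*minima_per_bond))


# Hypothetical local minima (degrees) for three non-terminal rotatable bonds.
minima = [(60, 180, 300), (0, 180), (60, 180, 300)]
conformers = core_conformers(minima)
print(len(conformers))  # 3 * 2 * 3 = 18 combinations
```

The count grows multiplicatively with the number of bonds, which is why leaving the terminal bonds out of the core enumeration keeps the ensemble manageable.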
Grids
Computing intermolecular
interactions is expensive.
For N ligand atoms and M
receptor atoms, we need N×M
distances (e.g., 100 × 10,000
= 1,000,000).
Trick: pre-compute the
potential due to the receptor
in a “grid” of points in space.
During docking, interpolate
between the nearest 8 grid
points. Then we only need
~N distances!
Grid generation is slow but
only done once.
[Figure: trilinear interpolation]
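A minimal sketch of the grid trick, in pure Python. The function names, the regular cubic grid layout, and the toy potential are assumptions for illustration; the sketch also assumes the query point lies strictly inside the grid.

```python
import math


def build_grid(potential, origin, spacing, shape):
    """Tabulate potential(x, y, z) on a regular grid (the slow, one-time step)."""
    ox, oy, oz = origin
    nx, ny, nz = shape
    return [[[potential(ox + i * spacing, oy + j * spacing, oz + k * spacing)
              for k in range(nz)]
             for j in range(ny)]
            for i in range(nx)]


def interpolate(grid, origin, spacing, point):
    """Trilinear interpolation from the 8 grid points surrounding `point`."""
    # Fractional grid coordinates of the point along x, y, z.
    f = [(p - o) / spacing for p, o in zip(point, origin)]
    i, j, k = (int(math.floor(v)) for v in f)
    tx, ty, tz = f[0] - i, f[1] - j, f[2] - k
    value = 0.0
    # Each corner is weighted by the product of its per-axis linear weights.
    for di, wx in ((0, 1.0 - tx), (1, tx)):
        for dj, wy in ((0, 1.0 - ty), (1, ty)):
            for dk, wz in ((0, 1.0 - tz), (1, tz)):
                value += wx * wy * wz * grid[i + di][j + dj][k + dk]
    return value


# Toy potential that is linear in x, y, z, so interpolation reproduces it exactly.
def toy_potential(x, y, z):
    return x + 2.0 * y + 3.0 * z


grid = build_grid(toy_potential, origin=(0.0, 0.0, 0.0), spacing=0.5, shape=(4, 4, 4))
print(interpolate(grid, (0.0, 0.0, 0.0), 0.5, (0.3, 0.7, 1.1)))
```

During docking, each ligand atom costs one such lookup regardless of how many receptor atoms contributed to the grid.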
Docking
Systematic, discrete search of six degrees of
freedom: x, y, z, ɸ, θ, ψ.
For each core conformer:
● Identify ligand diameter and center
● Site search: x, y, z (~2000 sites)
● Diameter search: θ, ψ (302 directions)
● Rotate around diameter: ɸ (25 angles)
Up to 15 million possible poses per core
conformer! But early pruning reduces this by
~99.9%.
Once the core conformer is docked, sample
the terminal rotatable bonds.
After rough scoring, keep 5000 poses.
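The pose count quoted above follows from multiplying the three search dimensions. A quick check of the arithmetic, using the slide's numbers (the variable names are just for this example):

```python
sites = 2000       # x, y, z positions from the site search
directions = 302   # θ, ψ orientations of the ligand diameter
rotations = 25     # ɸ angles around the diameter

poses = sites * directions * rotations
surviving = poses // 1000  # what is left after ~99.9% is pruned early

print(poses)      # 15100000
print(surviving)  # 15100
```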
Post-docking steps
● Partial minimization
○ 400 poses, but only torsions
● Resampling
○ Try non-local torsional jumps on the
docked structure
● Full minimization
○ 5 poses, full xyz minimization
● Final scoring
● Pose selection
○ Glide uses a different scoring
function just for pose selection.
Total CPU time: 15–30 s/lig (including
confgen, docking, and post-docking).
Scoring
Goal: free energy of binding.
Approximation: empirical scoring function.
- Several special reward and penalty
terms, each with an adjustable weight.
J. Med. Chem. 2006, 49, 6177–6196
Virtual screening (VS)
Goal: find actives (“hits”) in a large library.
Procedure: dock each ligand into the
receptor structure, and rank them by docking
score.
Then pick the top X% for further inspection:
- More accurate computational methods
- Expert visual inspection
- Experimental verification
(Any scoring function can be used, not just
docking. There is also ligand-based VS.)
[Figure panels: binding energy distribution (−ΔG); some error added]
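The rank-and-pick step can be sketched as follows. The ligand names and scores are hypothetical, and the sketch assumes that a more negative docking score means a better predicted binder.

```python
def top_fraction(scores_by_ligand, fraction):
    """Rank ligands by docking score (most negative first) and keep the top fraction."""
    ranked = sorted(scores_by_ligand, key=scores_by_ligand.get)
    n_keep = max(1, int(len(ranked) * fraction))
    return ranked[:n_keep]


# Hypothetical docking scores for a four-ligand library.
scores = {"lig_a": -9.1, "lig_b": -5.2, "lig_c": -7.8, "lig_d": -6.0}
print(top_fraction(scores, 0.5))  # ['lig_a', 'lig_c']
```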
Enrichment and ROC
● ROC: “receiver operating characteristic” (term from WWII radar!)
● Sensitivity: true positive rate (fraction of actives found)
● 1 − specificity: false positive rate (fraction of inactives examined)
● AUC: area under curve measures success at classification
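The quantities above can be computed directly from a ranked list. This is a small sketch with hypothetical scores and activity labels; it assumes lower scores rank first and does not treat tied scores specially. The enrichment factor shown is one common definition (hit rate in the top fraction divided by the hit rate in the whole library).

```python
def roc_points(scores, is_active):
    """Return (false positive rate, true positive rate) points down the ranking."""
    ranked = sorted(zip(scores, is_active))
    n_active = sum(is_active)
    n_inactive = len(is_active) - n_active
    tp = fp = 0
    points = [(0.0, 0.0)]
    for _, active in ranked:
        if active:
            tp += 1
        else:
            fp += 1
        points.append((fp / n_inactive, tp / n_active))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def enrichment_factor(scores, is_active, fraction):
    """Hit rate among the top-ranked fraction relative to the overall hit rate."""
    ranked = sorted(zip(scores, is_active))
    n_top = max(1, round(fraction * len(ranked)))
    hits_top = sum(active for _, active in ranked[:n_top])
    return (hits_top / n_top) / (sum(is_active) / len(is_active))


# Hypothetical screen: six ligands, three of them active.
scores = [-9.0, -8.5, -7.0, -6.5, -6.0, -5.0]
actives = [True, True, False, True, False, False]
print(auc(roc_points(scores, actives)))
print(enrichment_factor(scores, actives, 0.5))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect separation of actives from inactives.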
Parallelization
Docking for virtual screening is
embarrassingly parallel: ligands can be
split among multiple CPU cores without any
communication between them.
Example: 1 million ligands, 200 CPUs:
Wallclock time = 10^6 / 200 × 30 s × (1 h /
3600 s) = 41.7 h
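The estimate above, and the idea of splitting ligands among workers, can be sketched as follows. The function names are made up for this example, and the estimate assumes perfect load balancing with no startup overhead.

```python
def wallclock_hours(n_ligands, n_cpus, seconds_per_ligand):
    """Ideal wallclock time for an embarrassingly parallel screen."""
    return n_ligands / n_cpus * seconds_per_ligand / 3600.0


def split_round_robin(ligands, n_chunks):
    """Deal ligands out to n_chunks independent jobs; no communication is needed."""
    return [ligands[i::n_chunks] for i in range(n_chunks)]


print(round(wallclock_hours(10**6, 200, 30), 1))  # 41.7
print(split_round_robin(list(range(10)), 3))
```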
Lead optimization
The lead structure is modified to improve potency, selectivity, pharmacokinetic and
toxicological properties.
Design → Synthesize → Measure → (back to Design)
Molecular interactions
Free energy perturbation
Two sets of MD simulations perform
“alchemical transformations” (one ligand is
gradually mutated into another), connected
by a thermodynamic cycle
⇒ relative binding free energies close
to “chemical accuracy” (1.0 kcal/mol).
Thousands of times slower than
docking; best for lead optimization.
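The thermodynamic cycle can be written out as the standard relation below. The labels A and B for the two ligands, and the subscripts "complex" and "solvent" for the two sets of simulations, are notation introduced here for illustration.

```latex
\Delta\Delta G_{\mathrm{bind}}
  = \Delta G_{\mathrm{bind}}(B) - \Delta G_{\mathrm{bind}}(A)
  = \Delta G_{\mathrm{complex}}(A \to B) - \Delta G_{\mathrm{solvent}}(A \to B)
```

Because free energy is a state function, the two alchemical legs (ligand mutated in the bound complex and in solvent) give the same difference as the two physical binding events, which are much harder to simulate directly.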
Advanced docking
Constraints
Covalent docking
Induced fit docking
Ensemble docking
Water thermodynamics
Macrocycle docking
Peptide docking
Constraints
Sometimes we already have information about the binding mode (e.g., from a
close analog). We can help ensure that docking finds the right pose by applying
constraints:
● Hydrogen bond
● Positional / NOE
● Metal coordination
● Core constraints
● Torsional constraints
● Excluded volumes
[Figure panels: unconstrained docking; with core constraints]
Induced Fit Docking
J. Med. Chem. 2006, 49, 534-553
Covalent docking
Covalent ligands are tricky because
the ligand becomes, in effect, part of
the receptor.
Regular (non-covalent) docking can
be used for the initial step of a
covalent-docking workflow which also
includes protein refinement.
J. Chem. Inf. Model. 2014, 54, 1932−1940
Ensemble docking
● Dock each ligand into multiple
variants of the receptor
structure and pick the best one.
● Much faster than IFD (typically
use ~4 receptor structures),
but:
○ Ensemble selection
○ Protein reorganization energy
J. Comput. Aided Mol. Des. 2008, 22, 621–627
Water thermodynamics
WaterMap uses an MD simulation to estimate the
locations of water molecules in the binding site and
the free energy of each.
WScore uses that information to account for the
effect of water displacement on ligand binding.
A) WaterMap water kept.
B) Reward for displacing water in hydrophobic
environment.
C) Penalty for displacing water in hydrophilic
environment without compensating H-bonds.
J. Am. Chem. Soc. 2008, 130, 2817–2831
J. Med. Chem. 2016, 59, 4364–4384
Macrocycle docking
Macrocycles typically have 12 or more ring
atoms and are very flexible (hundreds of ring
conformations!).
Confgen’s ring-templating approach can’t
cover all possible macrocycles.
A workflow is needed which does a much
more intensive conformational search before
docking.
More conformers than usual need to be kept
during docking.
Peptide docking
Oligopeptides (~2-20 residues) are common
ligands, sized between small molecules and
biologics.
● Long, flexible chains.
● Not well suited for core-based confgen.
● Inaccurate orientation is amplified.
Glide peptide docking mode:
● Enhanced sampling; more conformers.
● More orientations (1002 instead of 302).
● Constraints can be used.
● MMGBSA rescoring.
Success rate went from 21% to 58%.
J. Chem. Inf. Model. 2013, 53, 1689−1699.
Summary
Docking can be useful in early
preclinical drug discovery
● Lead identification (virtual
screening)
● Lead optimization
○ Rationalize molecular
interactions
○ Rough prediction of
binding affinity of analogs
○ Precursor to FEP for more
accurate predictions
The Boston Globe, 04/04/2016
See also: Curr. Opin. Struct. Biol. 2017, 43, 38–44