QSAR
Quantitative Structure-Activity Relationships
Can one predict activity (or properties in
QSPR) simply on the basis of knowledge of
the structure of the molecule?
In other words, if one systematically changes
a component, will it have a systematic effect
on the activity?
Choice of Model
Can approach the problem from two directions:
Simple to complex model
Complex to simple model
Simplest Model
Linear relationship between x and y
$Y = mX + b$
Minimize error by least squares:
$\sum_i (Y_i - Y'_i)^2 = \sum_i [Y_i - (mX_i + b)]^2$
$Y'_i$ is the predicted value
Least Squares
Correlation coefficient
$-1 \le r \le 1$
Another test
Is the line better than the mean?
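A minimal sketch of the least-squares fit and the two tests named above, assuming plain Python lists as input. The function name `linear_fit` and both data sets are hypothetical, chosen only for illustration. R² compares the squared error of the line with the squared error of the mean.

```python
import math

def linear_fit(xs, ys):
    """Return slope m, intercept b, correlation r and R^2 for y = m*x + b."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    syy = sum((y - y_mean) ** 2 for y in ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    m = sxy / sxx
    b = y_mean - m * x_mean
    r = sxy / math.sqrt(sxx * syy)
    # Squared error of the line, compared with the squared error of the mean (syy).
    ss_line = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))
    r_squared = 1.0 - ss_line / syy
    return m, b, r, r_squared

# Hypothetical data lying exactly on y = 2x + 1.
line_x = [0.0, 1.0, 2.0, 3.0, 4.0]
line_y = [1.0, 3.0, 5.0, 7.0, 9.0]
m, b, r, r2 = linear_fit(line_x, line_y)

# Hypothetical data on a circle: clearly structured, yet no linear correlation.
angles = [2.0 * math.pi * k / 12 for k in range(12)]
circle_x = [math.cos(a) for a in angles]
circle_y = [math.sin(a) for a in angles]
_, _, r_circle, r2_circle = linear_fit(circle_x, circle_y)
```

For the circle, r and R² come out essentially zero: the line is no better than the mean, even though x and y are strongly related.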
[Figure: four scatter plots, each with its least-squares line]
A circle: y = 0.0676x - 0.3882, R² = 0.0045
2 lines: y = 2.9562x - 0.2597, R² = 0.8686
One bad point: y = 2.8515x - 31.647, R² = 0.9179
Wrong model: y = 0.0008x + 275.11, R² = 0.978
Multiple Regression
$Y = f(X_1, X_2, \ldots, X_n)$
Problems:
Choice of model – linear, polynomial, etc.
Visualization
Interpretation
Computationally demanding
Variable reduction
Principal Component Analysis
Principal Component
$PC_1 = a_{1,1}x_1 + a_{1,2}x_2 + \ldots + a_{1,n}x_n$
$PC_2 = a_{2,1}x_1 + a_{2,2}x_2 + \ldots + a_{2,n}x_n$
Keep only those components that
capture the largest variance
PCs are orthogonal to each other
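A small sketch of the idea for the two-variable case only, assuming the components are taken from the covariance matrix (general n-variable PCA needs a numerical eigensolver, which is not shown). The function name `pca_2d` and the data are hypothetical.

```python
import math

def pca_2d(xs, ys):
    """Principal components of two variables from their 2x2 covariance matrix."""
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs) / (n - 1)
    syy = sum((y - y_mean) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / (n - 1)
    # Closed-form eigenvalues of [[sxx, sxy], [sxy, syy]]: variance along each PC.
    half_trace = (sxx + syy) / 2.0
    root = math.sqrt(((sxx - syy) / 2.0) ** 2 + sxy ** 2)
    variances = (half_trace + root, half_trace - root)
    # Direction of PC1 as coefficients (a11, a12); PC2 is perpendicular to it.
    angle = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    pc1 = (math.cos(angle), math.sin(angle))
    pc2 = (-math.sin(angle), math.cos(angle))
    return variances, pc1, pc2

# Hypothetical, strongly correlated descriptors.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [1.1, 1.9, 3.2, 3.8, 5.0]
variances, pc1, pc2 = pca_2d(x1, x2)
explained = variances[0] / (variances[0] + variances[1])
```

Here `explained` is the fraction of the total variance carried by PC1; when it is close to 1, the second component can be dropped.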
Exploring QSAR
Pick up the NONLIN program
http://www.trinity.edu/sbachrac/drugdesign2007/
Unzip and install it on your computer
Read the Read.Me and Nonlin.doc
documentation
Look at the HeatForm.NLR file with any
word processor
Running NONLIN
Start an MS-DOS window
Change to the directory where the code is
cd /d d:\nonlin
Execute the program with data file
Nonlin heatForm > output
Assignment
Propose a QSAR scheme to predict the
$\Delta H_f$ of the alkanes
Early Examples
Hammett (1930s-1940s)
C6H5-COOH ⇌ C6H5-COO⁻ + H⁺    $K_0$
p-X-C6H4-COOH ⇌ p-X-C6H4-COO⁻ + H⁺    $K_p$
m-X-C6H4-COOH ⇌ m-X-C6H4-COO⁻ + H⁺    $K_m$
$\sigma_{para} = \log_{10}(K_p / K_0)$
$\sigma_{meta} = \log_{10}(K_m / K_0)$
Hammett (cont.)
Now suppose we have a related series
X-C6H4-CH2COOH ⇌ X-C6H4-CH2COO⁻ + H⁺    $K'_X$
$\log_{10}(K'_X / K'_0) = \rho\sigma$
$\sigma$ reflects sensitivity to the substituent
$\rho$ reflects sensitivity to the different system
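A hedged sketch of how the two constants are used: the substituent constant is measured once on the reference acids and then carried over to the related series through the sensitivity factor. All numerical values below are hypothetical, not measured dissociation constants, and the function names are illustrative.

```python
import math

def substituent_constant(k_substituted, k_parent):
    """Substituent constant from the reference series: log10(K_X / K_0)."""
    return math.log10(k_substituted / k_parent)

def predict_k(k_parent_new, sensitivity, constant):
    """Related series: log10(K'_X / K'_0) = sensitivity * substituent constant."""
    return k_parent_new * 10.0 ** (sensitivity * constant)

# Hypothetical values, for illustration only.
K0 = 1.0e-4       # parent acid of the reference series
KP = 4.0e-4       # para-substituted acid of the reference series
K0_NEW = 5.0e-5   # parent acid of the related series
SENSITIVITY = 0.5 # assumed sensitivity of the related series

sigma_para = substituent_constant(KP, K0)
k_new = predict_k(K0_NEW, SENSITIVITY, sigma_para)
```

With these made-up numbers the substituent raises K fourfold in the reference series but only twofold in the less sensitive related series.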
Hammett (cont.)
Linear Free Energy Relationship
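$\Delta G = -2.303RT\log_{10}K$
So
$\Delta G - \Delta G_0 = -2.303RT\,\sigma$
and
$\Delta G' - \Delta G'_0 = -2.303RT\,\rho\sigma$
Therefore
$\Delta G' - \Delta G'_0 = \rho\,(\Delta G - \Delta G_0)$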
Free-Wilson Analysis
$\log(1/C) = \sum_i a_i + \mu$
where C = predicted activity,
$a_i$ = contribution per group, and
$\mu$ = activity of the reference
Free-Wilson example
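[Figure: parent structure of the analog series (Br, amine N, HCl salt) with ring substituents X and Y; caption: activity of analogs]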
Log 1/C = -0.30 [m-F] + 0.21 [m-Cl] + 0.43 [m-Br]
+ 0.58 [m-I] + 0.45 [m-Me] + 0.34 [p-F] + 0.77 [p-Cl]
+ 1.02 [p-Br] + 1.43 [p-I] + 1.26 [p-Me] + 7.82
Problems: at least two substituent positions are
necessary, and the model can only predict new
combinations of the substituents used in the analysis.
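The fitted equation above can be evaluated directly. In this sketch the coefficients are copied from the slide; treating an unsubstituted position ("H") as contributing zero is an assumption, and the function name `log_inv_c` is hypothetical.

```python
# Group contributions copied from the fitted equation on the slide.
# Assumption: an unsubstituted position ("H") contributes 0.
META = {"H": 0.0, "F": -0.30, "Cl": 0.21, "Br": 0.43, "I": 0.58, "Me": 0.45}
PARA = {"H": 0.0, "F": 0.34, "Cl": 0.77, "Br": 1.02, "I": 1.43, "Me": 1.26}
REFERENCE = 7.82  # activity of the reference compound

def log_inv_c(meta="H", para="H"):
    """Predicted log 1/C as the sum of group contributions plus the reference."""
    return META[meta] + PARA[para] + REFERENCE

# A combination of substituents that were both part of the analysis.
predicted = log_inv_c(meta="Cl", para="Br")   # 0.21 + 1.02 + 7.82
```

A substituent that was not in the analysis (a nitro group, say) has no entry in either table, which is exactly the second limitation listed above.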
Hansch Analysis
$\log(1/C) = a\pi + b\sigma + c$
where
$\pi(X) = \log P_{RX} - \log P_{RH}$
and log P is the octanol/water partition coefficient
This is also a linear free energy relation
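A short sketch of the two steps: derive the hydrophobic substituent constant from two log P values, then insert it into the linear model. Every number here is hypothetical (the coefficients a, b, c would come from a regression on measured activities), and the function names are illustrative.

```python
def hydrophobic_constant(log_p_rx, log_p_rh):
    """Hydrophobic substituent constant: log P(RX) - log P(RH)."""
    return log_p_rx - log_p_rh

def hansch_log_inv_c(hydrophobic, electronic, a, b, c):
    """Linear Hansch model: a * hydrophobic term + b * electronic term + c."""
    return a * hydrophobic + b * electronic + c

# Hypothetical inputs, for illustration only.
LOG_P_RH = 2.0          # parent compound
LOG_P_RX = 2.7          # substituted compound
ELECTRONIC = 0.2        # assumed Hammett constant of the substituent
A, B, C = 1.0, -0.5, 3.0  # assumed regression coefficients

pi_x = hydrophobic_constant(LOG_P_RX, LOG_P_RH)
activity = hansch_log_inv_c(pi_x, ELECTRONIC, A, B, C)
```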
Molecular Descriptors
Simple rules for describing some aspect of a molecule
Structure
Property
2D descriptors only use the atoms and connection
information of the molecule
Internal 3D descriptors use 3D coordinate information
about each molecule; however, they are invariant to
rotations and translations of the conformation
External 3D descriptors also use 3D coordinate
information but also require an absolute frame of
reference (e.g., molecules docked into the same
receptor).
Descriptor examples
Physical Properties
MW
log P (octanol/water partition coefficient)
bp, mp
Dipole moment
solubility
Descriptor examples
Structural descriptors
2D
Atom/Bond counts
Number non-H atoms
Number of rotatable bonds
Number of each functional group
2C chains, 3C chains, 4C chains, 5C chains, etc.
Rings and their size
3D
Number of accessible conformations
Surface area
Topological Descriptors
Wiener Path Index
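Distance Matrix
[Figure: carbon skeleton numbered 1 to 7, with bonds 1-2, 2-3, 3-4, 4-5, 2-6 and 3-7]
     1  2  3  4  5  6  7
1    0  1  2  3  4  2  3
2    1  0  1  2  3  1  2
3    2  1  0  1  2  2  1
4    3  2  1  0  1  3  2
5    4  3  2  1  0  4  3
6    2  1  2  3  4  0  3
7    3  2  1  2  3  3  0
$W = \sum_i \sum_{j>i} d_{ij} = 46$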
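A minimal sketch of the calculation, assuming the molecule is given as a list of bonds between numbered atoms. The bond list is read off the entries equal to 1 in the distance matrix; the function names are hypothetical. Distances are counted in bonds using a breadth-first search from each atom.

```python
from collections import deque

# Bonds of the example skeleton (pairs of atoms at distance 1 in the matrix).
BONDS = [(1, 2), (2, 3), (3, 4), (4, 5), (2, 6), (3, 7)]

def adjacency(bonds):
    """Map each atom to the set of atoms bonded to it."""
    graph = {}
    for a, b in bonds:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def distances_from(graph, start):
    """Breadth-first search: number of bonds from start to every other atom."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        for neighbor in graph[atom]:
            if neighbor not in dist:
                dist[neighbor] = dist[atom] + 1
                queue.append(neighbor)
    return dist

def wiener_index(bonds):
    """Sum of d_ij over all pairs with j > i."""
    graph = adjacency(bonds)
    total = 0
    for atom in graph:
        dist = distances_from(graph, atom)
        total += sum(d for other, d in dist.items() if other > atom)
    return total

w = wiener_index(BONDS)
```

For this skeleton the function returns 46, the value given on the slide.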
Topological Descriptors
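Randić Index
[Figure: example molecule annotated with the values below]
Valence at each vertex (1, 2 or 3 here)
Bond values as product of the valences at the two ends: 3, 3, 9, 2, 6, 3
Edge term as reciprocal of square root of the bond value: .577, .577, .333, .707, .408, .577
Sum of edge terms = 3.179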
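The three steps can be sketched as follows. The bond list is an assumption: it is the skeleton from the Wiener distance-matrix example, whose bond values (3, 9, 6, 2, 3, 3) match the ones shown for this slide. The function name is hypothetical.

```python
import math

# Assumed skeleton: same molecule as in the Wiener index example.
BONDS = [(1, 2), (2, 3), (3, 4), (4, 5), (2, 6), (3, 7)]

def randic_index(bonds):
    """Sum over bonds of 1 / sqrt(valence_a * valence_b)."""
    # Step 1: valence at each vertex = number of bonds it takes part in.
    valence = {}
    for a, b in bonds:
        valence[a] = valence.get(a, 0) + 1
        valence[b] = valence.get(b, 0) + 1
    # Steps 2 and 3: bond value as product, edge term as reciprocal square root.
    return sum(1.0 / math.sqrt(valence[a] * valence[b]) for a, b in bonds)

chi = randic_index(BONDS)
```

Without intermediate rounding the sum is about 3.181; the 3.179 on the slide is the sum of the edge terms after each was rounded to three decimals.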
Predict bp of alkanes
[Figure: bp (50 to 100) plotted against Wiener index (30 to 65), with linear fit y = 1.5225x + 7.2917, R² = 0.9547]
3D Molecular Descriptors
Potential energy
Solvation energy
Water accessible surface area
Water accessible surface area of all
atoms with positive (negative) partial
charge
Pharmacophore
Specification of the spatial arrangement
of a small number of atoms or
functional groups
With the model in hand, search
databases for molecules that fit this
spatial environment
Creating a Pharmacophore
[Figure: example structures with their O and OH groups marked]
3D Pharmacophore searching
With the pharmacophore in hand,
search databases containing 3-D
structure of molecules for molecules
that fit
Can rank these “hits” using scoring
system described later
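One simple way to picture such a search is sketched below, assuming the pharmacophore is a set of feature types with target distances between them and each database entry is a list of typed 3-D points. This brute-force matcher, the data layout, the tolerance and all coordinates are hypothetical; production search tools use far more efficient methods.

```python
import itertools
import math

def fits(pharmacophore, molecule, tolerance=0.5):
    """True if some choice of molecule features has the required types and
    reproduces every pharmacophore distance to within the tolerance."""
    kinds = pharmacophore["features"]      # required feature types, in order
    wanted = pharmacophore["distances"]    # {(i, j): distance between features i and j}
    for combo in itertools.permutations(range(len(molecule)), len(kinds)):
        if any(molecule[idx][0] != kind for idx, kind in zip(combo, kinds)):
            continue
        if all(
            abs(math.dist(molecule[combo[i]][1], molecule[combo[j]][1]) - d) <= tolerance
            for (i, j), d in wanted.items()
        ):
            return True
    return False

# Hypothetical three-point pharmacophore (distances in angstroms).
PHARMACOPHORE = {
    "features": ["donor", "acceptor", "hydrophobe"],
    "distances": {(0, 1): 3.0, (0, 2): 5.0, (1, 2): 4.0},
}

# Hypothetical database entries: (feature type, (x, y, z)).
DATABASE = {
    "mol_a": [("donor", (0.0, 0.0, 0.0)), ("acceptor", (3.0, 0.0, 0.0)),
              ("hydrophobe", (3.0, 4.0, 0.0))],
    "mol_b": [("donor", (0.0, 0.0, 0.0)), ("acceptor", (3.0, 0.0, 0.0)),
              ("hydrophobe", (10.0, 0.0, 0.0))],
}

hits = [name for name, mol in DATABASE.items() if fits(PHARMACOPHORE, mol)]
```

Because only inter-feature distances are compared, the test does not depend on how each stored structure is oriented in space.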
Pharmacophore Descriptors
Number of acidic atoms
Number of basic atoms
Number of hydrogen bond donor atoms
Number of hydrophobic atoms
Sum of VDW surface areas of hydrophobic atoms
Lipinski’s Rule of 5
Potential drug candidates should
Have 5 or fewer H-bond donors (expressed as the
sum of OHs and NHs)
Have a MW < 500
Have a log P less than 5
Have 10 or fewer H-bond acceptors (expressed as
the sum of Ns and Os)
Adv. Drug Delivery Rev., 1997, 23, 3
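The four criteria translate directly into a filter. This sketch assumes the property values have already been computed elsewhere; the function name and both example compounds are hypothetical.

```python
def rule_of_five_violations(h_donors, mol_weight, log_p, h_acceptors):
    """Return the list of criteria (as stated on the slide) that a compound fails."""
    violations = []
    if h_donors > 5:
        violations.append("more than 5 H-bond donors")
    if mol_weight >= 500:
        violations.append("MW not below 500")
    if log_p >= 5:
        violations.append("log P not below 5")
    if h_acceptors > 10:
        violations.append("more than 10 H-bond acceptors")
    return violations

# Hypothetical compounds, for illustration only.
ok = rule_of_five_violations(h_donors=2, mol_weight=350.4, log_p=2.1, h_acceptors=5)
bad = rule_of_five_violations(h_donors=6, mol_weight=612.0, log_p=5.8, h_acceptors=12)
```

Returning the failed criteria rather than a single yes/no makes it easy to apply a looser policy, such as tolerating one violation.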
Docking
Interact a ligand with a receptor
Need to do the following
A) select appropriate ligands
B) select appropriate conformation of receptor
C) select appropriate conformations of ligands
D) combine the ligand and receptor (docking)
E) evaluate these combinations and rank order
them
Selection of Ligands
Want drug-like molecules
250< MW < 500
Lipinski’s rules
Search through databases
Available Chemicals Directory (ACD)
World Drug Index
NCI Drug database
In-house databases
Receptor Conformation
The receptor is usually assumed to be static
Get structure from X-ray or NMR
experiment
Protein Data Bank (http://www.rcsb.org/pdb/)
41385 Structures
Ligand Conformation
Rigid or flexible
If rigid, optimize the structure then use it
throughout the docking procedure
If flexible, can
A) create a set of low energy conformations and
then use this set as a collection of rigid structures
in docking
B) optimize structure within active site of receptor,
i.e. dock and optimize together
Docking
Place ligand in appropriate location for
interacting with the receptor
Methodological problems:
1) No best method for defining shape
2) No general solution for packing irregular
objects (the knapsack problem)
Docking Algorithmic Components
Receptor and Ligand Description (keep in mind
relative errors of structures, etc.)
Bind the Ligand to Receptor
(configuration/conformation search)
Geometric search (match ligand and receptor site
descriptions)
Search for minimum energy - molecular dynamics
(MD) or Monte Carlo (MC)
Evaluation of the dock ($\Delta G_{bind}$), also called
scoring
Descriptor Matching Method
DOCK program
1) Generate molecular surface for receptor
2) Generate spheres to fill the active site
(usually 30-50 spheres)
3) Match sphere centers to the ligand atoms
(originally just lowest E conformer, now use multiple
conformers, but still rigid) – generates 10K orientations per
ligand – Shape-driven!
4) Score the interaction
Fragment-Joining Method
FlexX, LUDI
Place base fragments into microstates
of the active site (Fragments can be small
molecules like benzene, formaldehyde,
formamide, naphthol, etc.)
Optimize position of the Base fragment
Join fragments with small connecting
chains made of CH2, CO, CONH, etc.
Scoring (evaluation of the dock)
Want to quickly evaluate the strength of
the interaction between ligand and
receptor
Full free energy computation
Expensive
Requires excellent force fields
Empirical method
Fast and cheap
Requires fitting to a broad set of ligand/receptor
complexes
Empirical Scoring
Method of Bohm (LUDI, FlexX, etc.)
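$\Delta G_{bind} = \Delta G_0 + \sum_{h\text{-}bonds} \Delta G_{hb}\, f(\Delta R, \Delta\alpha) + \sum_{ion} \Delta G_{ion}\, f(\Delta R, \Delta\alpha) + \Delta G_{lipo} A_{lipo} + \Delta G_{rot}\, NROT$
$\Delta G_0$: reduction in binding energy due to loss of rotation and translation of the ligand
$\Delta G_{hb}$: contribution from an ideal hydrogen bond
$\Delta G_{ion}$: contribution from ionic interactions
$\Delta G_{lipo}$: contribution from lipophilic interactions
$\Delta G_{rot}$: contribution from freezing rotations within the ligand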
Bohm Method (cont.)
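$f(\Delta R, \Delta\alpha)$ are penalty functions for non-ideal
interactions: distances too short/long, angles
not linear
$f(\Delta R, \Delta\alpha) = f_1(\Delta R)\, f_2(\Delta\alpha)$
$f_1(\Delta R) = 1$ for $\Delta R \le 0.2$ Å; $1 - (\Delta R - 0.2)/0.4$ for $\Delta R \le 0.6$ Å; $0$ for $\Delta R > 0.6$ Å
$f_2(\Delta\alpha) = 1$ for $\Delta\alpha \le 30°$; $1 - (\Delta\alpha - 30)/50$ for $\Delta\alpha \le 80°$; $0$ for $\Delta\alpha > 80°$
$\Delta R$ is the deviation from the ideal H...O/N distance of 1.9 Å
$\Delta\alpha$ is the deviation from the ideal N/O-H…O/N angle of 180°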
Bohm Method (cont.)
Alipo is the lipophilic contact surface,
evaluated by a coarse grid of boxes
NROT is the number of rotatable bonds
– acyclic sp3-sp3, sp3-sp2 and sp2-sp2. No
terminal groups or flexibility of rings
incorporated.
H.-J. Bohm, J. Comput.-Aided Mol. Des., 1994, 8, 243-256
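The scoring function and its penalty terms can be sketched as below. The piecewise penalty functions follow the slides; the deviations are taken as magnitudes (an assumption), and the coefficient values are hypothetical placeholders, not the fitted values from the paper. Function and dictionary key names are illustrative.

```python
def f1(delta_r):
    """Distance penalty; delta_r is the deviation (in angstroms) from 1.9."""
    if delta_r <= 0.2:
        return 1.0
    if delta_r <= 0.6:
        return 1.0 - (delta_r - 0.2) / 0.4
    return 0.0

def f2(delta_alpha):
    """Angle penalty; delta_alpha is the deviation (in degrees) from 180."""
    if delta_alpha <= 30.0:
        return 1.0
    if delta_alpha <= 80.0:
        return 1.0 - (delta_alpha - 30.0) / 50.0
    return 0.0

def penalty(delta_r, delta_alpha):
    """Combined penalty: product of the distance and angle terms."""
    return f1(delta_r) * f2(delta_alpha)

def binding_score(coeffs, h_bonds, ionic, a_lipo, n_rot):
    """Empirical binding score from per-interaction geometry deviations."""
    score = coeffs["g0"]
    score += coeffs["hb"] * sum(penalty(dr, da) for dr, da in h_bonds)
    score += coeffs["ion"] * sum(penalty(dr, da) for dr, da in ionic)
    score += coeffs["lipo"] * a_lipo
    score += coeffs["rot"] * n_rot
    return score

# Hypothetical coefficients (placeholders, NOT the published fitted values).
COEFFS = {"g0": 5.0, "hb": -5.0, "ion": -8.0, "lipo": -0.2, "rot": 1.5}

# Hypothetical complex: one ideal and one distorted hydrogen bond, no ionic
# contacts, 100 square angstroms of lipophilic contact, 4 rotatable bonds.
score = binding_score(
    COEFFS,
    h_bonds=[(0.1, 10.0), (0.4, 55.0)],
    ionic=[],
    a_lipo=100.0,
    n_rot=4,
)
```

The distorted hydrogen bond counts for only a quarter of an ideal one (0.5 from distance times 0.5 from angle), which is how the penalty functions soften the otherwise all-or-nothing count of interactions.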
Scoring alternatives
Many variations on Bohm scheme
Buried Polar term, desolvation term, different
forms for the lipophilic term, include metal
bonding, etc.
Combine scoring functions, i.e. QSAR with
scoring functions as variables
Use empirical score to select set of hits, then
refine with free energy minimization