
Molecular Modelling

Lecture 2: Geometry optimization and brief repetition of Statistical Thermodynamics


Herma Cuppen
3rd April 2013
Geometry Optimization
A good starting point for each modeling effort is to do a geometry optimization, which is a search for the configuration that corresponds to a local minimum on the potential energy surface. A comparison between the optimized geometry and the experimental structure, or a structure obtained by theoretical methods of higher accuracy (which are computationally more expensive), is a good first measure of the accuracy of the force field that you are using. Moreover, calculating properties with a structure that is not in a minimum of the potential energy surface used to calculate those properties can lead to unreliable results. One can therefore not do a geometry optimization with a high level of theory and then use a more approximate method to calculate the properties.
In this lecture we will discuss several optimization methods.
Remember stationary points: stationary points are defined by
$$\frac{\partial E}{\partial q_1} = \frac{\partial E}{\partial q_2} = \cdots = \frac{\partial E}{\partial q_n} = 0.$$
Here $q_x$ denotes the coordinate system we use. This can be a Cartesian coordinate system, internal coordinates (angles, bond distances), or some other coordinate system that is more natural for the system under investigation. In the latter two cases, each of these coordinates $q$ has its own reduced mass. The stationary points are minima, maxima and saddle points. Minima and maxima can be distinguished by their second derivatives. The easiest way to do this is to construct the Hessian (second derivative) matrix
$$H = \begin{pmatrix}
\frac{\partial^2 E}{\partial q_1^2} & \frac{\partial^2 E}{\partial q_1 \partial q_2} & \cdots & \frac{\partial^2 E}{\partial q_1 \partial q_n} \\
\frac{\partial^2 E}{\partial q_2 \partial q_1} & \frac{\partial^2 E}{\partial q_2^2} & \cdots & \frac{\partial^2 E}{\partial q_2 \partial q_n} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial^2 E}{\partial q_n \partial q_1} & \frac{\partial^2 E}{\partial q_n \partial q_2} & \cdots & \frac{\partial^2 E}{\partial q_n^2}
\end{pmatrix}$$
and solve the following eigenvalue problem:
$$H = P k P^{-1}.$$
The matrix $P$ is the eigenvector matrix whose columns are the direction vectors of the vibrations, whose force constants are given by the eigenvalue matrix $k$. For a minimum on the PES, $k_i > 0$ for all $i$. For a maximum on the PES, $k_i < 0$ for all $i$.
All other stationary points are saddle points, where the number of negative eigenvalues (i.e., imaginary frequencies) indicates the order of the saddle point. In chemistry we are generally interested in first-order saddle points. More about this in the lecture about rates and reaction paths.
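As a small illustration of this classification, the sketch below builds a numerical Hessian for a hypothetical two-variable surface (the function and the finite-difference step are our own choices, not from the lecture) and counts its negative eigenvalues with NumPy.

```python
import numpy as np

def energy(q):
    # Hypothetical two-variable energy surface, used only for illustration.
    x, y = q
    return (x**2 - 1.0)**2 + 0.5 * y**2

def numerical_hessian(f, q, h=1e-4):
    """Central-difference estimate of the Hessian at point q."""
    n = len(q)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            qpp = q.copy(); qpp[i] += h; qpp[j] += h
            qpm = q.copy(); qpm[i] += h; qpm[j] -= h
            qmp = q.copy(); qmp[i] -= h; qmp[j] += h
            qmm = q.copy(); qmm[i] -= h; qmm[j] -= h
            H[i, j] = (f(qpp) - f(qpm) - f(qmp) + f(qmm)) / (4.0 * h**2)
    return H

# Classify the stationary point at (0, 0): the eigenvalues k_i are the force
# constants, the columns of P the corresponding direction vectors.
H = numerical_hessian(energy, np.array([0.0, 0.0]))
k, P = np.linalg.eigh(H)
print("eigenvalues:", k)
print("number of negative eigenvalues (saddle-point order):", int(np.sum(k < 0)))
```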
Minimization Methods
A geometry optimization tries to find a (local) minimum on the PES. One can generally distinguish three types of methods to do this:

- methods that use no derivatives (only function values, such as the energy)
- methods that use first derivatives (slope, or force)
- methods that use second derivatives (curvature, or Hessian)
Derivative-free methods
In general, methods that use no derivatives spend the least amount of time at each point
but require the most steps to reach the minimum. Methods such as the simplex
minimization, simulated annealing, Markov-chain Monte Carlo and optimization by
genetic algorithms fall into this category. Two methods will be discussed here.
Markov-chain Monte Carlo will be introduced later in this course.
Multivariate Grid Search
1. Choose a suitable grid for the variables.
2. Choose a starting point A on the grid.
3. For each variable $q_1, q_2, \ldots, q_p$, evaluate the molecular potential energy $U$ at the two points surrounding A (as determined by the grid size).
4. Select the new point for which $U$ is a minimum, and repeat steps 3 and 4 until the local minimum is identified.
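A minimal sketch of this multivariate grid search, assuming a generic energy function U(q) and a fixed grid spacing (both invented for the example), could look as follows.

```python
import numpy as np

def grid_search(U, q0, step=0.1, max_iter=1000):
    """Move to the lowest-energy neighbouring grid point until none is lower."""
    q = np.asarray(q0, dtype=float)
    for _ in range(max_iter):
        best_q, best_U = q, U(q)
        # Step 3: evaluate U at the two surrounding points for each variable.
        for i in range(len(q)):
            for direction in (-step, +step):
                trial = q.copy()
                trial[i] += direction
                if U(trial) < best_U:
                    best_q, best_U = trial, U(trial)
        if best_q is q:          # no neighbour is lower: local minimum found
            return q
        q = best_q               # step 4: accept the point with minimal U
    return q

# Illustrative quadratic "PES" with its minimum near (0.3, -0.5) on this grid.
U = lambda q: (q[0] - 0.33)**2 + 2.0 * (q[1] + 0.47)**2
print(grid_search(U, [1.0, 1.0]))
```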
Simulated Annealing

- It mimics the process undergone by misplaced atoms in a metal when it is heated and then slowly cooled.
- It is good at finding a local minimum for a system with many free parameters.
- It is an iterative process; a trial move is accepted according to $\exp(-\Delta U/kT) > R(0,1)$, with $\Delta U$ the difference in energy between the trial move and the current state, $T$ a synthetic temperature, and $R(0,1)$ a random value in the interval [0,1].
- The synthetic temperature goes down during the optimization. The cooling scheme and cooling rate have an important influence on the outcome and should be tailored for each application. Adaptive simulated annealing algorithms exist that address this problem by connecting the cooling schedule to the search progress.
- Simulated annealing uses a Boltzmann factor and is in a sense a special case of a Monte Carlo simulation. Monte Carlo simulations will be discussed at a later stage. In the initial part it can efficiently move out of a bad geometry. When lowering the temperature, it gradually moves to the local minimum.
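A bare-bones sketch of the acceptance rule and cooling loop, with a hypothetical one-dimensional double-well energy and a simple exponential cooling scheme chosen only for illustration:

```python
import numpy as np

def simulated_annealing(U, q0, T0=10.0, cooling=0.999, steps=10000, step_size=0.1):
    """Minimize U with Metropolis-style acceptance and a decreasing synthetic T."""
    rng = np.random.default_rng(1)
    q, T = np.asarray(q0, dtype=float), T0
    for _ in range(steps):
        trial = q + rng.normal(scale=step_size, size=q.shape)     # trial move
        dU = U(trial) - U(q)
        # Accept if exp(-dU/kT) > R(0,1); k is absorbed into the synthetic T.
        if dU < 0.0 or np.exp(-dU / T) > rng.random():
            q = trial
        T *= cooling                                              # cooling schedule
    return q

# Double-well example: starting in the shallow well near x = -1, simulated
# annealing can cross the barrier and settle in the deeper well near x = +1.
U = lambda q: (q[0]**2 - 1.0)**2 - 0.3 * q[0]
print(simulated_annealing(U, [-1.0]))
```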
First-order methods
Methods that use only the first derivatives are sometimes used in computational chemistry, especially for preliminary minimization of very large systems. The first derivative gives the downhill direction and also suggests how large the steps should be when stepping down the hill (large steps on a steep slope, small steps on flatter areas that are hopefully near the minimum). Methods such as steepest descent and a variety of conjugate gradient minimization algorithms belong to this group.
Steepest descent
This first-order derivative scheme for locating minima on molecular potential energy surfaces can be summarized as follows.
1. Calculate $U$ for the initial structure.
2. Calculate $U$ for structures where each atom is moved along the x, y and z axes by a small increment. Movement of some atoms will lead to a small change in $U$, whilst movements of other atoms will lead to a large change in $U$. (The important quantity is clearly the gradient.)
3. Move the atoms to new positions such that the energy $U$ decreases by the maximum possible amount, $q^{(k)} = q^{(k-1)} - \Delta q\, g^{(k-1)}$, where $k$ indicates the iteration number and $g$ is the gradient vector.
4. Repeat the relevant steps above until a local minimum is found.

- The step length $\Delta q$ is an input parameter. It can be set to a fixed value, or at each step one can determine the optimum value by performing a line search along the step direction to find the minimum function value along this line.
- The step direction is always orthogonal to the previous step direction, which makes the stepping algorithm rather inefficient (many steps needed).
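A minimal numerical sketch, assuming a generic gradient function and a fixed step length Δq instead of a line search (both choices are ours):

```python
import numpy as np

def steepest_descent(grad, q0, dq=0.1, tol=1e-8, max_iter=10000):
    """Iterate q^(k) = q^(k-1) - dq * g^(k-1) until the gradient (nearly) vanishes."""
    q = np.asarray(q0, dtype=float)
    for _ in range(max_iter):
        g = grad(q)
        if np.linalg.norm(g) < tol:          # stationary point reached
            break
        q = q - dq * g
    return q

# Illustrative anisotropic quadratic well with its minimum at (1, -2).
grad = lambda q: np.array([2.0 * (q[0] - 1.0), 8.0 * (q[1] + 2.0)])
print(steepest_descent(grad, [0.0, 0.0]))
```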
Conjugate gradients
Starting from point $q^{(k-1)}$ (where $k$ is the iteration count), we move in the direction given by the vector $V^{(k)}$:
$$q^{(k)} = q^{(k-1)} + \Delta q\, V^{(k)}$$
where
$$V^{(k)} = -g^{(k)} + \gamma^{(k)} V^{(k-1)},$$
$g^{(k)}$ is the gradient vector at point $q^{(k)}$, and $\gamma^{(k)}$ is a scalar given by
$$\gamma^{(k)} = \frac{\left(g^{(k)}\right)^T g^{(k)}}{\left(g^{(k-1)}\right)^T g^{(k-1)}}$$
or
$$\gamma^{(k)} = \frac{\left(g^{(k)} - g^{(k-1)}\right)^T g^{(k)}}{\left(g^{(k-1)}\right)^T g^{(k-1)}}.$$
Here $T$ denotes the transpose of a matrix. Which expression for $\gamma$ is superior depends on the functional form of the surface one wants to optimize.

- Again a line search can be applied to determine $\Delta q$.
- The step direction is not always orthogonal to the previous direction, which results in faster convergence than steepest descent.
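The sketch below uses the first (Fletcher-Reeves) expression for γ and delegates the line search for Δq to scipy.optimize.minimize_scalar; the quadratic test surface is made up for the example.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def conjugate_gradient(f, grad, q0, tol=1e-8, max_iter=200):
    """Nonlinear conjugate gradients with the Fletcher-Reeves gamma."""
    q = np.asarray(q0, dtype=float)
    g = grad(q)
    V = -g                                   # first step: plain steepest descent
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        # Line search: minimize f(q + dq * V) with respect to the scalar dq.
        dq = minimize_scalar(lambda a: f(q + a * V)).x
        q_new = q + dq * V
        g_new = grad(q_new)
        gamma = (g_new @ g_new) / (g @ g)    # Fletcher-Reeves expression
        V = -g_new + gamma * V
        q, g = q_new, g_new
    return q

# Illustrative quadratic surface with its minimum at (3, 0.5).
f = lambda q: (q[0] - 3.0)**2 + 10.0 * (q[1] - 0.5)**2
grad = lambda q: np.array([2.0 * (q[0] - 3.0), 20.0 * (q[1] - 0.5)])
print(conjugate_gradient(f, grad, [0.0, 0.0]))
```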
Second-order methods
Methods that use both the first and the second derivative can reach the minimum in the least number of steps, because the curvature gives an estimate of where the minimum is. The simplest method in this category is based on the Newton-Raphson method for root finding (finding points where the function value is zero).
Newton-Raphson for root finding
1. Start at $x^{(1)}$ and determine $f^{(1)}$ (function evaluation) and $g^{(1)}$ (gradient at point $x^{(1)}$).
2. The function can be linearly approximated by $y = g^{(1)} x + f^{(1)} - g^{(1)} x^{(1)}$.
3. This has a root at $x^{(2)} = x^{(1)} - \dfrac{f^{(1)}}{g^{(1)}}$.
4. Repeat steps 1-3.

The starting point is crucial: for $y = x \exp(-x^2)$ (Fig. 5.7), $|x| < 0.5$ finds the root $x = 0$ and $|x| > 0.5$ finds the roots at infinity.
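A small sketch using the example function y = x exp(-x²) from above; the iteration loop and the divergence guard are our own minimal choices.

```python
import numpy as np

def newton_raphson_root(f, g, x0, max_iter=50):
    """Newton-Raphson root search: x^(k+1) = x^(k) - f^(k)/g^(k)."""
    x = x0
    for _ in range(max_iter):
        slope = g(x)
        if abs(slope) < 1e-300:      # derivative numerically zero: iterate ran away
            break
        x = x - f(x) / slope
    return x

f = lambda x: x * np.exp(-x**2)
g = lambda x: (1.0 - 2.0 * x**2) * np.exp(-x**2)    # analytic derivative

print(newton_raphson_root(f, g, 0.4))   # |x| < 0.5: converges to the root x = 0
print(newton_raphson_root(f, g, 0.7))   # |x| > 0.5: runs away toward the roots at infinity
```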
We can now apply this method to find stationary points by searching for roots of the gradient. We use a linear approximation of the gradient, which corresponds to a quadratic approximation of the function:
$$g^{(k)}(x) \approx H^{(k)}\left(x - x^{(k)}\right) + g^{(k)} = 0$$
$$x^{(k+1)} = x^{(k)} - \frac{g^{(k)}}{H^{(k)}}$$
with $H^{(k)}$ the Hessian (second derivative) at the current point $x^{(k)}$.
In the case of multiple variables:
$$x^{(k+1)} = x^{(k)} - \left(H^{(k)}\right)^{-1} g^{(k)}$$
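For the multivariate case, a sketch using NumPy; the test surface, gradient, and Hessian are invented for the example and are not a real PES.

```python
import numpy as np

def newton_raphson_min(grad, hess, q0, tol=1e-10, max_iter=50):
    """Iterate x^(k+1) = x^(k) - (H^(k))^-1 g^(k) to locate a stationary point."""
    q = np.asarray(q0, dtype=float)
    for _ in range(max_iter):
        g = grad(q)
        if np.linalg.norm(g) < tol:
            break
        q = q - np.linalg.solve(hess(q), g)   # solve H dq = g instead of inverting H
    return q

# Illustrative surface U = (x - 1)^4 + x*y + y^2.
grad = lambda q: np.array([4.0 * (q[0] - 1.0)**3 + q[1], q[0] + 2.0 * q[1]])
hess = lambda q: np.array([[12.0 * (q[0] - 1.0)**2, 1.0],
                           [1.0,                    2.0]])
print(newton_raphson_min(grad, hess, [2.0, 0.0]))
```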
Hessian
For geometry optimization of a single molecule, virtually all software packages use a method that needs a Hessian and/or its inverse. If the functional form of the PES is known, for instance in the case of a force field where only the force constants change, one could derive analytical expressions for the gradient and the Hessian. However, in most cases the functional form is not known or is too system-specific to use in a general modeling program, and one has to determine the gradient and Hessian numerically. Analytically, differentiation is easier than integration; numerically, it is the reverse if accurate values are required. Minimizers need accurate gradients to converge correctly, but fortunately they do not need precise Hessians, and approximate Hessians are often used.
Quasi-Newton methods start with an approximate (inverse) Hessian and update it every iteration. There are several routines for this. One is the Davidon-Fletcher-Powell formula, which updates the inverse Hessian $B_k = H_k^{-1}$ in the following way:
$$y^{(k)} = g^{(k+1)} - g^{(k)}$$
$$B_{k+1} = B_k - \frac{B_k y_k y_k^T B_k}{y_k^T B_k y_k} + \frac{\Delta x_k \left(\Delta x_k\right)^T}{y_k^T \Delta x_k}$$
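A sketch of the DFP update with NumPy, starting from the unit matrix as the initial inverse-Hessian estimate; the quadratic test function and the plain quasi-Newton stepping loop are our own illustrative choices.

```python
import numpy as np

def dfp_update(B, dx, y):
    """Davidon-Fletcher-Powell update of the inverse-Hessian estimate B,
    with dx = x_(k+1) - x_(k) and y = g_(k+1) - g_(k)."""
    By = B @ y
    return B - np.outer(By, By) / (y @ By) + np.outer(dx, dx) / (y @ dx)

# Quasi-Newton steps on an illustrative quadratic U = 0.5 x^T H x.
H_true = np.array([[4.0, 1.0], [1.0, 3.0]])
grad = lambda x: H_true @ x

B = np.eye(2)                      # initial estimate: the unit matrix
x = np.array([1.0, 1.0])
for _ in range(10):
    g = grad(x)
    if np.linalg.norm(g) < 1e-10:  # minimum of the quadratic reached
        break
    x_new = x - B @ g              # quasi-Newton step with the current estimate
    B = dfp_update(B, x_new - x, grad(x_new) - g)
    x = x_new

print(B)                           # after a few updates B approaches ...
print(np.linalg.inv(H_true))       # ... the true inverse Hessian
```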
The initial estimated Hessian can be the unit matrix, or one can use internal coordinates. The individual diagonal elements of the Hessian can then be identified as bond-stretching and bond-bending force constants, etc. The Hessian should also have the correct number of degrees of freedom. A nonlinear molecule of $N$ atoms can be described by $3N$ Cartesian coordinates. There are, however, only $p = 3N - 6$ vibrational degrees of freedom that can be varied during a geometry optimization. The six coordinates that one loses correspond to rotation and translation of the entire molecule and are therefore degenerate. These should not be used in a geometry optimization. For this reason a Z-matrix, which defines the molecule in terms of internal coordinates, is often used as input; indicating symmetry is also more straightforward in this way. However, Cartesian coordinates can be preferred and are often just easier to use, but as mentioned they can lead to too many degrees of freedom. How can this be circumvented?
$$\mathbf{q} = \mathbf{A}\mathbf{X} \qquad (\mathbf{q}\ \text{in internal and}\ \mathbf{X}\ \text{in Cartesian coordinates})$$
$$\mathbf{A}^T \nabla_q U = \left(\nabla U\right)_C \qquad (\mathbf{A}\ \text{is rectangular})$$
$$\nabla_q U = \mathbf{G}^{-1} \mathbf{A} \mathbf{u} \left(\nabla U\right)_C \qquad \text{with}\ \mathbf{G} = \mathbf{A}\mathbf{u}\mathbf{A}^T$$
Statistical thermodynamics
Geometry optimization results in the optimum structure at $T = 0$ K. Properties that are calculated on the basis of this structure are a good first starting point, but since experiments are mostly done at $T > 0$ K, they can be quite far off. At $T > 0$ K the system is not only in its most optimum structure; other states are accessible as well, which can have different properties and therefore lead to a different average macroscopic (measurable) property. Determining this average property is the domain of statistical thermodynamics. Statistical thermodynamics predicts the probability that a molecule is in a certain energy state (ground state vs. excited state).
The probability that a molecule is in a state $i$ is
$$P_i = \frac{\exp(-\epsilon_i/kT)}{\sum_j \exp(-\epsilon_j/kT)}$$
with discrete energy levels $\epsilon_j$ and the molecular partition function
$$q = \sum_j \exp(-\epsilon_j/kT).$$
If $\epsilon_0 = 0$, the partition function gives the number of accessible energy levels.
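A tiny numerical illustration of these two formulas, with a made-up set of energy levels and units in which k = 1:

```python
import numpy as np

def populations(eps, kT):
    """Boltzmann probabilities P_i = exp(-eps_i/kT) / q for discrete levels."""
    w = np.exp(-np.asarray(eps) / kT)
    q = w.sum()                              # molecular partition function
    return w / q, q

eps = [0.0, 1.0, 2.0, 5.0]                   # hypothetical levels, ground state at 0
for kT in (0.1, 1.0, 10.0):
    P, q = populations(eps, kT)
    print(f"kT = {kT:5.1f}   q = {q:5.2f}   P = {np.round(P, 3)}")
# Low kT: q -> 1 (only the ground state is accessible);
# high kT: q -> 4 (all four levels become accessible).
```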
In molecular mechanics one usually has a collection of interacting molecules, and the mutual potential energy of the system is considered instead of the internal energy of the molecule. We now move from the microcanonical to the canonical ensemble with
$$P_i = \frac{\exp(-U_i/kT)}{\sum_j \exp(-U_j/kT)}$$
and
$$Q = \sum_j \exp(-U_j/kT).$$
To calculate a property of a system:
$$\langle A \rangle = \frac{\sum_i A_i \exp(-U_i/kT)}{Q}$$
So, to get the average mutual potential energy:
$$\langle U \rangle = \frac{\sum_i U_i \exp(-U_i/kT)}{Q}$$
We can achieve this by considering a large number of cells with the same thermodynamic properties ($N$, $V$, $T$, $E$, $p$ and/or $\mu$, etc.) but different molecular arrangements. If $N$, $V$ and $T$ are kept constant between cells, we speak of the canonical ensemble:
$$P_i = \left(\frac{\exp(-U_i/kT)}{Q}\right)_{N,V,T}$$
with $Q$ the canonical partition function.
Ensembles
- canonical ensemble with constant $N$, $V$, $T$
- microcanonical ensemble with constant $N$, $E$, $V$ (no energy flow between cells)
- isothermal-isobaric ensemble with constant $N$, $p$, $T$
- grand canonical ensemble with constant $\mu$, $V$, $T$ ($\mu$ is the chemical potential)
In MM, the energy is a function of the coordinates of the atoms (parameter space) and all points in parameter space contribute to the partition function. For this reason an integration over parameter space is often used instead of a sum:
$$Q = \frac{1}{N!} \int \exp\left(-\frac{U(\mathbf{r}^N)}{kT}\right) d\mathbf{r}^N$$
The factor $1/N!$ is for indistinguishable particles; for particles that can be distinguished this term drops out of the equation. The average mutual potential energy can be determined by solving:
$$\langle U \rangle = \frac{\int U(\mathbf{r}^N) \exp\left(-\frac{U(\mathbf{r}^N)}{kT}\right) d\mathbf{r}^N}{Q}$$
Remember the angle-dependent dipole-dipole mutual potential energy:
$$(4\pi\epsilon_0)\, U_{AB} = -\frac{p_A p_B}{R^3}\left(2\cos\theta_A\cos\theta_B - \sin\theta_A\sin\theta_B\cos\phi\right)$$
We would like to determine the average potential energy. Let's assume a parallel configuration: $\theta_A = \theta_B = \theta$ and $\phi = 0$. Then
$$U_{AB} = \frac{p_A p_B}{4\pi\epsilon_0 R^3}\left(1 - 3\cos^2\theta\right)$$
Due to thermal fluctuations $\theta$ can change:
$$\langle U_{AB}\rangle_{\text{dip}\ldots\text{dip}} = \frac{\int U_{AB}\exp(-U_{AB}/kT)\, d\tau}{\int \exp(-U_{AB}/kT)\, d\tau}
= \frac{\int_{\phi} d\phi \int_{\theta=0}^{\pi} U_{AB}\exp(-U_{AB}/kT)\sin\theta\, d\theta}{\int_{\phi} d\phi \int_{\theta=0}^{\pi} \exp(-U_{AB}/kT)\sin\theta\, d\theta}$$
Let's assume $U_{AB} \ll kT$, so that $\exp(-U_{AB}/kT) \approx 1 - U_{AB}/kT$, and use $U_{AB} = C\left(1 - 3\cos^2\theta\right)$:
$$\langle U_{AB}\rangle_{\text{dip}\ldots\text{dip}} \approx \frac{\int_{\theta=0}^{\pi} C\left(1 - 3\cos^2\theta\right)\left(1 - C\left(1 - 3\cos^2\theta\right)/kT\right)\sin\theta\, d\theta}{\int_{\theta=0}^{\pi}\left(1 - C\left(1 - 3\cos^2\theta\right)/kT\right)\sin\theta\, d\theta}$$
Substitute $x = \cos\theta$, which implies $dx = -\sin\theta\, d\theta$:
$$\langle U_{AB}\rangle_{\text{dip}\ldots\text{dip}} \approx \frac{\int_{x=-1}^{1} C\left(1-3x^2\right)\left(1 - C\left(1-3x^2\right)/kT\right)dx}{\int_{x=-1}^{1}\left(1 - C\left(1-3x^2\right)/kT\right)dx} = -\frac{4C^2}{5kT} = -\frac{4 p_A^2 p_B^2}{5kT\left(4\pi\epsilon_0\right)^2}\frac{1}{R^6}$$
If all angles are taken into account:
$$\langle U_{AB}\rangle_{\text{dip}\ldots\text{dip}} = -\frac{2 p_A^2 p_B^2}{3kT\left(4\pi\epsilon_0\right)^2}\frac{1}{R^6} = -\frac{C}{R^6} \quad \text{(Keesom)}$$
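As a numerical check on the parallel-configuration result, the sketch below evaluates the full Boltzmann-weighted average over θ and compares it with -4C²/(5kT) in the limit C << kT; the values of C and kT are arbitrary.

```python
import numpy as np

def average_U(C, kT, n=200001):
    """Boltzmann-weighted orientational average of U = C (1 - 3 cos^2 theta)."""
    theta = np.linspace(0.0, np.pi, n)
    U = C * (1.0 - 3.0 * np.cos(theta)**2)
    w = np.exp(-U / kT) * np.sin(theta)      # Boltzmann factor times sin(theta)
    return np.sum(U * w) / np.sum(w)         # the grid spacing cancels in the ratio

C, kT = 0.01, 1.0                            # C << kT, arbitrary units
print(average_U(C, kT))                      # full numerical average
print(-4.0 * C**2 / (5.0 * kT))              # first-order result -4 C^2 / (5 kT)
```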
For the dipole-dipole mutual potential energy, the integrals in
$$\langle U \rangle = \frac{\int U(\mathbf{r}^N)\exp\left(-\frac{U(\mathbf{r}^N)}{kT}\right)d\mathbf{r}^N}{Q}$$
could be solved analytically using some approximations. This is, however, often not the case, and then the integrals need to be solved numerically. This problem, together with the arrival of the first computers, led to the development of the Monte Carlo technique, which will be discussed in the next lecture.
