C.K. TEDAM UNIVERSITY OF TECHNOLOGY AND APPLIED SCIENCES
SCHOOL OF CHEMICAL AND BIOCHEMICAL SCIENCES
DEPARTMENT OF APPLIED CHEMISTRY
CHM 247: COMPUTATIONAL CHEMISTRY
Edward K. Armah, Dr.-Ing.
“Anyone can do calculations nowadays. Anyone can also operate a scalpel. That doesn’t mean all our medical
problems are solved”
Karl Irikura
FUNDAMENTAL PRINCIPLES
THERMODYNAMICS VERSUS COMPUTATIONAL CHEMISTRY
Thermodynamics is one of the best-developed mathematical descriptions of chemistry. It is the field of
thermodynamics that defines many of the concepts of energy, free energy, and entropy.
Thermodynamics is no longer the subject of an extensive amount of research. The reasons for this are
twofold:
(a) The completeness of the existing work
(b) Its general inability to provide detailed insight into chemical processes
Very often, any thermodynamic treatment is left for trivial pen-and-paper work since many aspects of
chemistry are so accurately described with very simple mathematical expressions. Computational
results can nevertheless be related to thermodynamics. The result of a computation might be an internal
energy, a free energy, and so on, depending on the computation done. Likewise, it is possible to compute
various contributions to the entropy. One frustration is that computational software does not always make
it obvious which energy is being listed, because of the differences in terminology between computational
chemistry and thermodynamics.
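Because a program may report an internal energy, an enthalpy, or a free energy under similar-looking labels, it can help to keep the textbook relations between them at hand. The following is a minimal sketch, assuming one mole of an ideal gas (so that PV = RT). The function names and the numerical inputs are illustrative assumptions and are not taken from the output of any particular program.

```python
# Textbook relations between the energies a program may report.
# Assumption: one mole of ideal gas, so H = U + PV = U + RT.

R = 8.314462618  # molar gas constant in J/(mol K)


def enthalpy(internal_energy, temperature):
    """H = U + RT for one mole of ideal gas (energies in J/mol, T in K)."""
    return internal_energy + R * temperature


def gibbs_free_energy(enthalpy_value, temperature, entropy):
    """G = H - T*S, with the entropy in J/(mol K)."""
    return enthalpy_value - temperature * entropy


if __name__ == "__main__":
    # Illustrative numbers only, not results of any real calculation.
    u = 1000.0   # internal energy, J/mol
    s = 10.0     # entropy, J/(mol K)
    t = 298.15   # temperature, K
    h = enthalpy(u, t)
    g = gibbs_free_energy(h, t, s)
    print(f"U = {u:.1f}  H = {h:.1f}  G = {g:.1f}  (J/mol)")
```

The three numbers differ noticeably even for this small example, which is why it matters to know which of them a program has printed.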
Ab initio METHODS
The term ab initio is Latin for “from the beginning.” The name is given to computations that are derived
directly from theoretical principles with no inclusion of experimental data. An ab initio calculation is
still an approximate quantum mechanical calculation. The approximations made are usually mathematical
approximations, such as using a simpler functional form for a function or finding an approximate solution
to a differential equation. There are four (4) sources of error in ab initio calculations. These include the
following:
1. The Born-Oppenheimer approximation
2. The use of an incomplete basis set
3. Incomplete correlation
4. The omission of relativistic effects
The Hartree-Fock method, the most common ab initio calculation, is described below, followed by related approaches:
(a) Hartree-Fock approximation: This is the most common type of ab initio calculation. It is
abbreviated as HF, and its primary approximation is the central field approximation. This
means that the Coulombic electron-electron repulsion is taken into account by integrating the
repulsion term. This gives the average effect of the repulsion, but not the explicit repulsion
interaction. This is a variational calculation, meaning that the approximate energies calculated
are all equal to or greater than the exact energy. The energies are calculated in units called
Hartrees (1 Hartree = 27.2114 eV). Because of the central field approximation, the energies
from HF calculations are always greater than the exact energy and tend to a limiting value
called the Hartree-Fock limit as the basis set is improved. One of the advantages of this method
is that it breaks the many-electron Schrödinger equation into many simpler one-electron
equations. Each one-electron equation is solved to yield a single-electron wave function, called
an orbital, and an energy, called an orbital energy. The orbital describes the behavior of an
electron in the net field of all the other electrons. In practice, each orbital is expanded in a set
of basis functions, most often Gaussian functions of the form exp(-ar²). The Gaussian functions
are multiplied by an angular function in order to give the orbital s, p, d, and so on symmetry. A
constant angular term yields s symmetry. Angular terms of x, y, and z give p symmetry. Angular
terms of xy, xz, yz, x²-y², and 4z²-2x²-2y² yield d symmetry. This pattern can be continued for
orbitals of higher angular momentum. These orbitals are then combined into a determinant. This
is done to satisfy two (2) requirements of quantum mechanics:
1. The electrons must be indistinguishable.
2. The wave function for fermions (an electron is a fermion) must be antisymmetric with respect
to interchanging two particles.
There are two techniques for constructing HF wave functions of molecules with unpaired
electrons. One technique is to use two completely separate sets of orbitals for the α and β electrons.
This is called an unrestricted Hartree-Fock (UHF) wave function. This means that paired electrons will
not have the same spatial distribution. The other technique is the restricted open-shell Hartree-Fock
(ROHF) method, in which the paired electrons share the same spatial orbital.
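The two requirements placed on the determinant can be checked numerically. The sketch below is a toy illustration, not a Hartree-Fock program: it builds the antisymmetrized product of two unnormalized Gaussian orbitals (an s function with a constant angular term and a p function with angular term x), ignores spin, and uses an arbitrary exponent a = 1. All function names are assumptions made for this example.

```python
import math
from itertools import permutations


def s_gaussian(r, a=1.0):
    """s-type Gaussian: constant angular term times exp(-a r^2), unnormalized."""
    x, y, z = r
    return math.exp(-a * (x * x + y * y + z * z))


def px_gaussian(r, a=1.0):
    """p-type Gaussian: angular term x times exp(-a r^2), unnormalized."""
    return r[0] * s_gaussian(r, a)


def permutation_sign(perm):
    """Return +1 for an even permutation and -1 for an odd one."""
    inversions = sum(
        1
        for i in range(len(perm))
        for j in range(i + 1, len(perm))
        if perm[i] > perm[j]
    )
    return -1 if inversions % 2 else 1


def slater_determinant(orbitals, positions):
    """Signed sum over all assignments of electrons to orbitals (a determinant)."""
    n = len(orbitals)
    total = 0.0
    for perm in permutations(range(n)):
        term = float(permutation_sign(perm))
        for electron, orbital_index in enumerate(perm):
            term *= orbitals[orbital_index](positions[electron])
        total += term
    return total / math.sqrt(math.factorial(n))


if __name__ == "__main__":
    orbitals = [s_gaussian, px_gaussian]
    r1 = (0.3, -0.2, 0.5)
    r2 = (-0.7, 0.4, 0.1)
    # Swapping the two electrons changes the sign but not the magnitude.
    print(slater_determinant(orbitals, [r1, r2]))
    print(slater_determinant(orbitals, [r2, r1]))
    # Two electrons at the same point give a vanishing wave function.
    print(slater_determinant(orbitals, [r1, r1]))
```

Every electron appears in every orbital somewhere in the sum, which is how the determinant treats the electrons as indistinguishable, and the alternating signs supply the antisymmetry.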
(b) Semi-empirical methods: These calculations have the same general structure as an HF
calculation, but some pieces of information are approximated or omitted. Unlike ab initio
methods, they rely on parameters fitted to reference data. Usually, the core electrons are not
included in the calculation and only a minimal basis set is used. Also, some of the two-electron
integrals are omitted. To correct for the errors introduced by omitting part of the calculation,
the method is parameterized. The parameters are chosen to reproduce various results, most
often geometry and energy (usually the heat of formation). The advantage of semi-empirical
calculations is that they are much faster than ab initio calculations. The disadvantage is that
the results can be erratic and fewer properties can be predicted reliably. Some commonly used
semi-empirical methods are Hückel, PPP (Pariser-Parr-Pople), CNDO (Complete Neglect of
Differential Overlap), MINDO (Modified Intermediate Neglect of Differential Overlap), MNDO
(Modified Neglect of Diatomic Overlap), AM1 (Austin Model 1), and PM3 (Parametric Method 3).
MNDO is still used, but the more accurate AM1 and PM3 methods have surpassed it in popularity.
(c) Density functional theory (DFT): In DFT, the energy of a molecule is determined from the
electron density instead of a wave function. The original theorem applied only to finding the
ground-state electronic energy of a molecule. A density functional is used to obtain the energy
from the electron density.
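As a concrete instance of the semi-empirical approach, the Hückel method named above can be written in a few lines. The sketch below assumes the usual Hückel simplifications: one p orbital per carbon atom, a Coulomb parameter alpha on the diagonal, and a resonance parameter beta between bonded neighbours only. The function name is hypothetical, and the default values of alpha and beta are placeholders that express the energies in units of |beta| relative to alpha; they are not fitted parameters.

```python
import numpy as np


def huckel_orbital_energies(adjacency, alpha=0.0, beta=-1.0):
    """Return Hückel pi-orbital energies, lowest first.

    adjacency[i][j] is 1 when carbon atoms i and j are bonded, else 0.
    alpha and beta are the empirical parameters of the method.
    """
    a = np.asarray(adjacency, dtype=float)
    hamiltonian = alpha * np.eye(len(a)) + beta * a
    # eigvalsh handles symmetric matrices and returns ascending eigenvalues.
    return np.linalg.eigvalsh(hamiltonian)


if __name__ == "__main__":
    # 1,3-butadiene: four carbon atoms bonded in a chain.
    butadiene = [
        [0, 1, 0, 0],
        [1, 0, 1, 0],
        [0, 1, 0, 1],
        [0, 0, 1, 0],
    ]
    print(huckel_orbital_energies(butadiene))
```

For butadiene the four energies are alpha ± 1.618 beta and alpha ± 0.618 beta, the familiar textbook result. No integrals are evaluated at all; everything the method knows about the electrons is carried by the two parameters, which is what makes it both fast and limited.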
MOLECULAR MECHANICS
The most severe limitation of ab initio methods is the limited size of the molecule that can be modeled
on even the largest computers. Semi-empirical calculations can be used for large organic molecules, but
are also too computation-intensive for most biomolecular systems. If a molecule is so big that a
semi-empirical treatment cannot be used effectively, it is still possible to model its behavior while
avoiding quantum mechanics totally by using molecular mechanics.
BASIC THEORY
The molecular mechanics energy expression consists of a simple algebraic equation for the energy of
a compound (it does not use a wave function or total electron density). The constants in this equation
are obtained either from spectroscopic data or ab initio calculations. A set of equations with their
associated constants is called a force field. The fundamental assumption of the molecular mechanics
method is the transferability of parameters (in other words, the energy penalty associated with a
particular molecular motion, say, the stretching of a carbon-carbon single bond, will be the same from
one molecule to the next). This gives a very simple calculation that can be applied to very large
molecular systems. The performance of this technique is dependent on four factors:
1. The fundamental form of the energy expression
2. The data used to parameterize the constants
3. The technique used to optimize constants from the data
4. The ability of the user to apply the technique in a way consistent with its strengths and
weaknesses
The energy expression consists of the sum of simple classical equations. These equations describe
various aspects of the molecule, such as bond stretching, bond bending, torsions, electrostatic
interactions, van der Waals forces, and hydrogen bonding. Force fields differ in the number of terms
in the energy expression, the complexity of those terms, and the way in which the constants were
obtained. Since electrons are not explicitly included, electronic processes cannot be modeled. Force
fields may or may not include an electrostatic term. The electrostatic term most often used is the
Coulomb’s law term for the energy of attraction or repulsion between charged centers. These charges
are usually obtained from non-orbital-based algorithms designed for use with molecular mechanics.
These charges are meant to be the partial charges on the nuclei. The modeling of molecules with a net
charge is described best by using atom types parameterized for describing charged centers. A dielectric
constant is sometimes included to model solvation effects. Bond stretching is most often described by
a harmonic oscillator equation. Some force fields reduce the cost of the calculations by
omitting most of the hydrogen atoms. Molecular mechanics methods are not generally applicable to
structures very far from equilibrium, such as transition structures.
Table 1: Some common force field terms and their usage
NAME                  USE
Harmonic              Bond stretch
Harmonic              Angle bend
Cosine                Torsion
Lennard-Jones 6-12    van der Waals
Lennard-Jones 10-12   van der Waals
Coulomb               Electrostatic
Taylor                Stretch-bend
Morse                 Bond stretch
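Several of the terms in Table 1 can be written directly as short functions. This is a minimal sketch using the commonly quoted functional forms; some force fields include a factor of one half in the harmonic term or write the Lennard-Jones term with different constants, so the exact forms here are an assumption. The Coulomb prefactor assumes energies in kcal/mol, distances in Angstrom, and charges in units of the electron charge. All numerical parameters in the usage example are made up for illustration and do not come from any published force field.

```python
import math

# Coulomb's law prefactor for kcal/mol, Angstrom, and charges in units of e.
COULOMB_CONSTANT = 332.0637


def harmonic(value, equilibrium, k):
    """Harmonic bond stretch or angle bend: k * (value - equilibrium)^2."""
    return k * (value - equilibrium) ** 2


def cosine_torsion(angle, k, periodicity):
    """Cosine torsion term: k * (1 + cos(n * angle)), angle in radians."""
    return k * (1.0 + math.cos(periodicity * angle))


def lennard_jones_6_12(r, epsilon, sigma):
    """6-12 van der Waals term with well depth epsilon and zero crossing sigma."""
    ratio6 = (sigma / r) ** 6
    return 4.0 * epsilon * (ratio6 * ratio6 - ratio6)


def coulomb(r, q1, q2, dielectric=1.0):
    """Electrostatic energy between two partial charges a distance r apart."""
    return COULOMB_CONSTANT * q1 * q2 / (dielectric * r)


def morse(r, equilibrium, well_depth, width):
    """Morse bond stretch: D * (1 - exp(-a * (r - r0)))^2."""
    return well_depth * (1.0 - math.exp(-width * (r - equilibrium))) ** 2


if __name__ == "__main__":
    # Hypothetical bond parameters, chosen so that k is roughly D * a^2.
    for r in (1.40, 1.53, 1.70, 3.00):
        print(r, harmonic(r, 1.53, 300.0), morse(r, 1.53, 88.0, 1.85))
```

With these parameters the harmonic and Morse curves nearly coincide close to the equilibrium length, but at 3.00 the harmonic energy keeps climbing while the Morse energy levels off near its well depth. That difference is one reason a harmonic force field is not reliable for structures far from equilibrium.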
EXISTING FORCE FIELDS
Most researchers do not parameterize force fields because many good force fields have already been
developed. On rare occasions, a researcher will add parameters for an additional atom as described in
chapter 29. The following are some commonly used molecular mechanics force fields. Many of these have
been implemented in more than one software package. There tend to be minor differences in the
implementation, leading to small differences in results from one software package to another.
AMBER
Assisted model building with energy refinement (AMBER) is the name of both a force field and a
molecular mechanics program. It was parameterized specifically for proteins and nucleic acids. AMBER
uses only five bonding and nonbonding terms along with a sophisticated electrostatic treatment. No
cross terms are included. Results are very good for proteins and nucleic acids, but can be somewhat
erratic for other systems.
CHARMM
Chemistry at Harvard macromolecular mechanics (CHARMM) is the name of both a force field and a
program incorporating that force field. The academic version of this program is designated CHARMM
and the commercial version is called CHARMm. It was originally devised for proteins and nucleic acids.
It has now been applied to a range of biomolecules, molecular dynamics, solvation, crystal packing,
vibrational analysis, and QM/MM studies. CHARMM uses five valence terms, one of which is an
electrostatic term.
CFF
The consistent force field (CFF) was developed to yield consistent accuracy of results for conformations,
vibrational spectra, strain energy, and vibrational enthalpy of proteins. There are several variations on
this, such as the Urey-Bradley version (UBCFF), a valence version (CVFF), and Lyngby CFF. The
quantum mechanically parameterized force field (QMFF) was parameterized from ab initio results.
CFF93 is a rescaling of QMFF to reproduce experimental results. These force fields use five to six
valence terms, one of which is an electrostatic term, and four to six cross terms.
CHEAT
Carbohydrate hydroxyls represented by external atoms (CHEAT) is a force field designed specifically
for modeling carbohydrates.
DREIDING
DREIDING is an all-purpose organic or bio-organic molecule force field. It has been most widely used
for large biomolecular systems. It uses five valence terms, one of which is an electrostatic term. The
use of DREIDING has been dwindling with the introduction of improved methods.
ECEPP
Empirical conformational energy program for peptides (ECEPP) is the name of both a computer program
and the force field implemented in that program. This is one of the earlier peptide force fields and has
seen less use with the introduction of improved methods. It uses three valence terms that are fixed, a
van der Waals term, and an electrostatic term.
EFF
Empirical force field (EFF) is a force field designed just for modeling hydrocarbons. It uses three valence
terms, no electrostatic term and five cross terms.
GROMOS
Groningen molecular simulation (GROMOS) is the name of both a force field and the program
incorporating that force field. The GROMOS force field is popular for predicting the dynamical motion
of molecules and bulk liquids. It is also used for modeling biomolecules. It uses five valence terms, one
of which is an electrostatic term.
MM1, MM2, MM3, MM4
MM1, MM2, MM3, and MM4 are general-purpose organic force fields. There have been many variants
of the original methods, particularly MM2. MM1 is seldom used since the newer versions show
measurable improvements. The MM3 method is probably one of the most accurate ways of modeling
hydrocarbons. At the time of this book’s publication, the MM4 method was still too new to allow any
broad generalization about the results. However, the initial published results are encouraging. These
are some of the most widely used force fields because they represent organic molecules accurately.
MMX and MM+ are variations on MM2. These force fields use five to six valence terms, one of which
is an electrostatic term, and one to nine cross terms.
MMFF
The Merck molecular force field (MMFF) is one of the more recently published force fields in the
literature. It is a general-purpose method, particularly popular for organic molecules. MMFF94 was
originally intended for molecular dynamics simulations, but has also seen much use for geometry
optimization. It uses five valence terms, one of which is an electrostatic term, and one cross term.
MOMEC
MOMEC is a force field for describing transition metal coordination compounds. It was originally
parameterized to use four valence terms, but not an electrostatic term. The metal-ligand interactions
consist of a bond-stretch term only. The coordination sphere is maintained by nonbond interactions
between ligands. MOMEC generally works reasonably well for octahedrally coordinated compounds.
OPLS
Optimized potentials for liquid simulation (OPLS) was designed for modeling bulk liquids. It has also
seen significant use in modeling the molecular dynamics of biomolecules. OPLS uses five valence terms,
one of which is an electrostatic term, but no cross terms.
Tripos
Tripos is a force field created at Tripos Inc. for inclusion in the Alchemy and SYBYL programs. It is
sometimes called the SYBYL force field. Tripos is designed for modeling organic and bio-organic
molecules. It is also often used for CoMFA, a 3D QSAR technique. Tripos uses five valence
terms, one of which is an electrostatic term.
UFF
UFF stands for universal force field. Although there have been a number of universal force fields,
meaning that they include all elements, there has only been one actually given this name. This is the
most promising full periodic table force field available at this time. It was originally designed to use
four valence terms and no electrostatic term. The literature accompanying one piece of software
recommends using charges obtained with the Q-equilibrate method. Independent studies have found the
accuracy of results to be significantly better without charges.
YETI
YETI is a force field designed for the accurate representation of non-bonded interactions. It is most
often used for modeling interactions between biomolecules and small substrate molecules. It is not
designed for molecular geometry optimization, so researchers often optimize the molecular geometry
with some other force field, such as AMBER, then use YETI to model the docking process. Recent
additions to YETI are support for metals and solvent effects.
A generalization of the results of studies comparing force field accuracies is as follows:
1. The MM2, MM3, and Merck (MMFF) force fields perform best for a wide range of organic
molecules.
2. The AMBER and CHARMM force fields are best suited for protein and nucleic acid studies.
3. Most existing molecular mechanics studies of inorganic molecules require careful
customization of force field parameters.
4. UFF is the most reliable force field to be used without modification for inorganic systems.
5. Molecular dynamics studies are best done with a force field designed for that purpose.
6. The rings in sugars pose a particular problem to general-purpose force fields and should be
modeled using a force field designed specifically for carbohydrates.
EFFICIENT USE OF COMPUTER RESOURCES
Many computational chemistry techniques are extremely computer-intensive. Depending on the type
of calculation desired, it could take anywhere from seconds to weeks to do a single calculation. There
are many calculations, such as ab initio analysis of biomolecules, that cannot be done on even the largest
computers in existence. Likewise, calculations can take very large amounts of computer memory and
hard disk space. In order to complete work in a reasonable amount of time, it is necessary to understand
what factors contribute to the computer resource requirements. Ideally, the user should be able to
predict in advance how much computing power will be needed. There are often trade-offs between
equivalent ways of doing the same calculation. For example, many ab initio programs use hard disk
space to store numbers that are computed once and used several times during the course of the
calculation. These are the integrals that describe the overlap between various basis functions. This
approach is called conventional integral evaluation. Alternatively, it is possible to use direct integral
evaluation, in which the numbers are recomputed as needed. Direct integral evaluation algorithms use
less disk space at the expense of requiring more CPU time to do the calculation. An in-core algorithm is
one that stores all the integrals in RAM, thus saving on disk space at the expense of
requiring a computer with a very large amount of memory. Many programs use a semi-direct algorithm,
which uses some disk space and a bit more CPU time to strike a balance between the two.
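The storage side of this trade-off can be estimated before a job is run. The sketch below counts the symmetry-unique two-electron integrals for a given number of basis functions and converts the count to gigabytes. It assumes real basis functions (so that the integrals have eightfold permutational symmetry), 8 bytes per stored integral, and no screening of negligibly small integrals. Real programs discard or compress many integrals, so this is an upper-bound style estimate, and the function names are hypothetical.

```python
def unique_two_electron_integrals(n_basis):
    """Count symmetry-unique two-electron integrals for real basis functions."""
    # Number of distinct index pairs (i, j) with i >= j.
    pairs = n_basis * (n_basis + 1) // 2
    # Number of distinct pairs of such pairs.
    return pairs * (pairs + 1) // 2


def storage_gigabytes(n_basis, bytes_per_integral=8):
    """Space needed to keep every unique integral, in GB (10^9 bytes)."""
    return unique_two_electron_integrals(n_basis) * bytes_per_integral / 1e9


if __name__ == "__main__":
    for n in (50, 100, 200, 400, 800):
        print(f"{n:4d} basis functions: {storage_gigabytes(n):10.2f} GB")
```

Doubling the number of basis functions multiplies the storage by roughly sixteen, so a conventional or in-core calculation that fits comfortably at one size can become impossible at the next, which is when direct or semi-direct evaluation becomes the practical choice.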
TIME COMPLEXITY
Time complexity is a way of denoting how the use of computer resources (CPU time, memory, etc.)
changes as the size of the problem changes. For example, consider an HF calculation with N orbitals. At
the end, the orbital energies must be added. Since there are N orbitals, there will be N addition
operations. There are a certain number of operations, which we will call C, that have to be done
regardless of the size of the calculation, such as initializing variables and allocating memory. The standard
matrix inversion algorithm requires N³ operations. Computing the two-electron Coulomb and
exchange integrals for an HF calculation takes N⁴ operations. For large N the N⁴ term dominates the
total, so an HF calculation is said to have a time complexity of N⁴.
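The contributions listed above can be added up to see which one controls the cost. The sketch below is a rough illustration, assuming every contribution has a prefactor of one and picking an arbitrary value of 1000 for the constant C. Real prefactors differ between programs, so only the trend is meaningful.

```python
def hf_operation_count(n_orbitals, constant=1000):
    """Sum the operation counts named in the text: C + N + N^3 + N^4."""
    n = n_orbitals
    return constant + n + n**3 + n**4


def dominant_fraction(n_orbitals, constant=1000):
    """Fraction of the total work that comes from the N^4 term."""
    return n_orbitals**4 / hf_operation_count(n_orbitals, constant)


if __name__ == "__main__":
    for n in (10, 100, 1000):
        # Cost ratio when the problem size is doubled from n to 2n.
        ratio = hf_operation_count(2 * n) / hf_operation_count(n)
        print(n, f"{dominant_fraction(n):.3f}", f"{ratio:.2f}")
```

As N grows, the fraction of work in the N⁴ term approaches one and doubling N multiplies the cost by a factor approaching sixteen. The constant, linear, and cubic terms matter only for small problems.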