
1 Contents

1. Introduction
Who should use this documentation
What can simulation engines do?
Energy minimization
Molecular dynamics
Other forcefield-based calculations
What are forcefields and simulation engines?
Using this guide
Additional information
Typographical conventions

2. Forcefields
The potential energy surface
Empirical fit to the potential energy surface
The forcefield
The energy expression
Forcefields supported by MSI forcefield engines
Main types of forcefields
Advantages of having several forcefields
Primary uses of each MSI forcefield
Second-generation forcefields accurate for many properties
CFF91, PCFF, CFF, COMPASS: consistent forcefields
MMFF93, the Merck molecular forcefield
Rule-based forcefields broadly applicable to the periodic table
ESFF, extensible systematic forcefield
UFF, universal forcefield
VALBOND
Dreiding forcefield
Classical forcefields
AMBER forcefield
CHARMm forcefield
CVFF, consistent valence forcefield
Special-purpose forcefields

Forcefield-Based Simulations/September 1998 i



Glass forcefield
MSXX forcefield for polyvinylidene fluoride
Zeolite forcefields
Forcefields for sorption on zeolites
Forcefields for Cerius2•Morphology module
Archived and untested forcefields

3. Preparing the Energy Expression and the Model
Using forcefields
Selecting forcefields
Assigning forcefield atom types and charges
What are atom types in forcefields?
Assigning atom types to a model
Assigning charges
Parameter assignment
Determination of which parameters are used with which atom types
Automatic assignment of values for missing parameters
Manual parameter assignment
Using alternative forms of energy terms
Removing terms from the energy expression
Scaling or editing any selected type of term
Alternative bond terms
Scaled torsion terms
Inversion terms
Nonbond functional form
Hydrogen bonds and hydrogen-bond terms
Bond–angle cross terms vs. Urey–Bradley terms
Applying constraints and restraints
When to use constraints/restraints
Fixed atom constraints
Template forcing, tethering, quartic droplet restraints, and consensus conformations
General internal-coordinate restraints
Distance and NOE restraints
Distance and angle constraints in dynamics simulations
Angle restraints
Torsion restraints
Inversion, out-of-plane, and chiral restraints
Plane and other geometrical constraints and restraints
Modeling periodic systems
Minimum-image model
Explicit-image model
Crystal simulations
Bonds across boundaries
Handling nonbond interactions



Combination rules for van der Waals terms
The dielectric constant and the Coulombic term
Nonbond cutoffs
Cell multipole method
Ewald sums for periodic systems

4. Minimization
General minimization process
Specific minimization example
Line search
Minimization algorithms
Steepest descents
Conjugate gradient
Newton–Raphson methods
General methodology for minimization
Minimizations with MSI simulation engines
When to use different algorithms
Convergence criteria
Significance of minimum-energy structure
Energy and gradient calculation
Vibrational calculation
Application of minimization to vibrational theory
Vibrational frequencies
General methodology for vibrational calculations

5. Molecular Dynamics
Integration algorithms
Introduction
Criteria of good integrators in molecular dynamics
Integrators in MSI simulation engines
The choice of timestep
Integration errors
Example 1: Two colliding hydrogen atoms
Example 2: Energy conservation of a harmonic oscillator
Statistical ensembles
NVE ensemble
NVT ensemble
NPT and NST ensembles
NPH and NSH ensembles
Equilibrium thermodynamic properties
Temperature
How temperature is calculated
How temperature is controlled


Pressure and stress
Units and sign conventions for pressure and stress
How pressure and stress are calculated
How pressure and stress are controlled
Types of dynamics simulations
Quenched dynamics
Simulated annealing
Consensus dynamics
Impulse dynamics
Langevin dynamics
Stochastic boundary dynamics
Multibody order-N dynamics
Constraints during dynamics simulations
The SHAKE algorithm
The RATTLE algorithm
Dynamics trajectories
General methodology for dynamics calculations
Stages and duration of dynamics simulations
Dynamics with MSI simulation engines
Restarting a dynamics simulation

6. Free Energy
Relative free energy: theory and implementation
Finite difference thermodynamic integration (FDTI)
Relative free energy: methodology
Absolute free energy
Theory and implementation
Example: Fentanyl
Analysis of results

A. References

B. Forcefield Terms and Atom Types
Forcefield term definitions
AMBER atom types
Standard AMBER forcefield
Homan’s carbohydrate forcefield
CFF91 atom types
CHARMm atom types
COMPASS atom types
CVFF atom types
CVFF_aug atom types



ESFF atom types
PCFF: additional atom types



1 Introduction

This Forcefield-Based Simulations documentation is a general guide
to all of MSI’s simulation engines, that is, software products whose
computational work is based on a forcefield. These include
CHARMm®, Discover®, and the Open Force Field™ (OFF) modules,
which are run through the molecular modeling programs
(i.e., graphical interfaces) shown in Table 1.

Table 1. Simulation engines (a) within MSI’s molecular modeling programs (b)

Molecular modeling program     Simulation engine
and release number             CHARMm    Discover (c)    OFF
Cerius2™ 4.0                   √         √ (d)           √
Insight® II 4.0.0                        √
Insight® II 97.0               √         √
QUANTA®                        √
standalone (e)                 √         √

(a) See definitions under What are forcefields and simulation engines?.
(b) Discover and OFF each offer a choice of several forcefields (see
Table 3); the CHARMm program gives access only to the CHARMm forcefield
in QUANTA and standalone, only to MMFF in Cerius2, and is used only in
specialized modules in Insight II.
(c) Discover exists in two versions: one written in FORTRAN (series 2.8.x,
2.9.x, and earlier; referred to as FDiscover) and the other in C (series
3.0, 3.1, 3.0.0, 4.0.0, 95.0, 96.0, 97.0; referred to as CDiscover).
CDiscover and FDiscover are specified in this documentation only where
the FORTRAN and C Discover programs have different capabilities.
FDiscover and CDiscover (in Insight II) are accessed through the Discover
and Discover_3 modules, respectively.
(d) CDiscover only.
(e) CHARMm and Discover can also be run without the assistance of a
graphical molecular modeling program.

Forcefield-Based Simulations/October 1997 1



Who should use this documentation


This guide is written mainly for the typical scientist-user of MSI’s
simulation engines. Although these programs are written to run
with reasonable default values for basic simulations, you should
read this guide if you want to make efficient use of the programs,
obtain the best results possible, and understand the results.
Prerequisites
You should already be familiar with:
♦ The system and windowing software on your workstation.
♦ How to use the particular MSI molecular modeling program
that contains the desired simulation engine (Cerius2, Insight,
and/or QUANTA).
Your workstation should have:
♦ A licensed copy of Cerius2, Insight, and/or QUANTA as well
as of the appropriate simulation engine.
♦ A directory in which you have write permission.

What can simulation engines do?

Energy minimization
Typical uses of energy minimization include:
♦ Optimizing initial geometries of models constructed in a
molecular modeling program such as Cerius2™ or Insight®.
♦ Repairing poor geometries occurring at splice points during
homology building of protein structures.
♦ Mapping the energy barriers for geometric distortions and
conformational transitions using “torsion forcing” to obtain
Ramachandran-type contour plots for proteins or RIS statistical
weights for polymers.
♦ Evaluating whether a molecule can adopt a template conformation
consistent with a pharmacophoric or catalytic site model
(“template forcing”).

Molecular dynamics
Typical uses of molecular dynamics include:
♦ Searching the conformational space of alternative amino acid
sidechains in site-specific mutation studies.
♦ Identifying likely conformational states for highly flexible
polymers or for flexible regions of macromolecules such as pro-
tein loops.
♦ Producing sets of 3D structures consistent with distance and
torsion constraints deduced from NMR experiments (simu-
lated annealing).
♦ Calculating free energies of binding, including solvation and
entropy effects.
♦ Probing the locations, conformations, and motions of mole-
cules on catalyst surfaces.
♦ Running diffusion calculations.

Other forcefield-based calculations


In addition, simulation engines can be routinely used for:
♦ Calculating normal modes of vibration and vibrational fre-
quencies.
♦ Analyzing intramolecular and intermolecular interactions in
terms of residue–residue or molecule–molecule interactions,
energy per residue, or interactions within a radius.
♦ Calculating diffusion coefficients of small molecules in a poly-
mer matrix.
♦ Calculating thermal expansion coefficients of amorphous poly-
mers.
♦ Calculating the radial distribution of liquids and amorphous
polymers.

♦ Performing rigid-body rms comparisons between minimized
conformations of the same or similar structures or between
simulated and experimentally observed structures.

What are forcefields and simulation engines?


The fundamental computation at the core of a forcefield-based
simulation is the calculation of the potential energy for a given
configuration of atoms (and cells, if requested and possible). The
calculation of this energy, along with its first and second deriva-
tives with respect to the atomic coordinates (and cell coordinates),
yields the information necessary for minimization, harmonic
vibrational analysis, and dynamics simulations. This calculation is
actually performed by the simulation engine, or forcefield-based
program.
Simulation “engine” defined
Simulation engines are the computational packages that handle
the application of forcefields in minimization, dynamics, and
other molecular mechanics simulations. Currently, MSI-supplied
simulation engines include CHARMm, Discover, and OFF.
(CHARMm is the name for both a simulation engine and for the
forcefield included in that engine.)
“Forcefield” and “energy expression” defined
The functional form of the potential energy expression and the
entire set of parameters needed to fit the potential energy surface
constitute the forcefield (Ermer 1976). The energy expression is the
specific equation set up for a particular model, with or without
any optional terms.
For example, a forcefield would contain bond-stretching parameters
for all combinations of atoms for which it was parameterized,
as well as a defined, summed functional form for the bond-stretching
term. The corresponding energy expression would contain
bond-stretching parameters for only those combinations of
bonded atoms actually found in the model being studied, as well
as the specific bond-stretching terms for the number and types of
bonds in that model (see example under Example energy expression
for water).
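The distinction between a forcefield and an energy expression can be sketched in a few lines of Python. This is only an illustration: the parameter table `FORCEFIELD_BONDS`, its numerical values, the lowercase atom-type labels, and the methanol-like model are all hypothetical and are not taken from any MSI forcefield. The forcefield holds parameters for every bonded pair it was parameterized for; the energy expression keeps only the terms for bonds that occur in the model.

```python
# Hypothetical bond-stretching parameters, keyed by a sorted pair of atom types.
# Each entry is (force constant K, reference length b0); values are illustrative only.
FORCEFIELD_BONDS = {
    ("c", "c"): (320.0, 1.53),
    ("c", "h"): (340.0, 1.09),
    ("c", "n"): (360.0, 1.47),
    ("c", "o"): (380.0, 1.43),
    ("h", "o"): (540.0, 0.96),
}


def build_bond_terms(atom_types, bonds):
    """Return one (i, j, K, b0) term per bond actually present in the model."""
    terms = []
    for i, j in bonds:
        key = tuple(sorted((atom_types[i], atom_types[j])))
        k, b0 = FORCEFIELD_BONDS[key]  # a KeyError here signals a missing parameter
        terms.append((i, j, k, b0))
    return terms


def bond_energy(terms, lengths):
    """Sum the quadratic bond-stretching terms for the current bond lengths."""
    return sum(k * (lengths[(i, j)] - b0) ** 2 for i, j, k, b0 in terms)


# A methanol-like model: C(0) bonded to three H (1-3) and to O(4); O bonded to H(5).
atom_types = ["c", "h", "h", "h", "o", "h"]
bonds = [(0, 1), (0, 2), (0, 3), (0, 4), (4, 5)]
terms = build_bond_terms(atom_types, bonds)
used_pairs = {tuple(sorted((atom_types[i], atom_types[j]))) for i, j, _, _ in terms}
```

In this sketch the forcefield lists five parameter sets, but the energy expression for the model uses only three of them, spread over five bond terms.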
Importance of the forcefield in simulations
It is important to understand that the forcefield (both the functional
form and the parameters themselves) represents the single
largest approximation in molecular modeling. The quality of the
forcefield, its applicability to the model at hand, and its ability to
predict the particular properties measured in the simulation
directly determine the validity of the results.
Molecular modeling programs
Molecular modeling programs are the graphical user interfaces (UIFs
or GUIs) that can be used to prepare models, set up forcefields, and
access the simulation engines. Some simulation engines can also
be run in standalone mode, that is, outside the graphical molecular
modeling program. Molecular modeling programs currently supplied
by MSI include Cerius2, Insight, and QUANTA.

Using this guide


This guide contains background information on forcefields, the
theories involved in their use, and how they are implemented in
MSI’s simulation engines, as well as general methodology and
strategies for performing the various types of calculation most
commonly done with these programs:
♦ Forcefields presents the concept of an energy surface and com-
pares the forcefields available to MSI’s simulation engines,
including their functional forms and atom types. Different
forcefields have been developed specifically for different types
of models or computational experiments.
♦ Preparing the Energy Expression and the Model concerns concepts
such as periodic boundary conditions, nonbond interactions,
restraints, and constraints. You typically need to refine both the
model and the energy expression that you intend to set up, in
order to optimize your calculation conditions.
♦ Minimization includes information on the minimization algo-
rithms that these programs can use; how, in general, to carry
out minimization calculations; and the applicability of minimi-
zation in energy and vibration calculations.
♦ Molecular Dynamics covers the dynamics algorithms in these
programs, thermodynamic ensembles, control of temperature
and pressure, and constraints during dynamics.
♦ Free Energy presents relative and absolute free energy calcula-
tions.

References contains the scientific references cited in this guide.
Atom types and forcefield terms are listed under Forcefield Terms
and Atom Types.

Additional information
Available documentation
Guides are available for every simulation engine and modeling
program that MSI provides, including these:
♦ MSI Forcefield Engines: CDiscover.
♦ Cerius2 Forcefield Engines: OFF.
♦ MSI Forcefield Engines: FDiscover.
♦ MSI Forcefield Engines: CHARMm.
♦ Cerius2 Modeling Environment.
♦ Insight II.
♦ QUANTA documentation set.
On-screen help
In addition to the task-oriented documentation for each simulation
engine, on-screen help is available within the Cerius2, Insight,
and QUANTA environments. Please see the documentation for the
specific program for how to access the help.

Supplemental documentation
Additional information on using the Cerius2, Insight, and
QUANTA interfaces, including building models and writing
scripts for automated running, is contained in their respective
guides. Technical information that is mainly of use to programmers
and system administrators is contained in installation/
administration guides. Supplemental information that may be of
general interest (including additional information on the electronic
documentation) is contained in release notes.

MSI’s website
The URLs for the documentation and customer support areas of
MSI’s website are:
http://www.msi.com/doc/
http://www.msi.com/support/
Information relevant to forcefields, simulation engines, and
modeling programs can be found there.


Typographical conventions
Unless otherwise noted in the text, Forcefield-Based Simulations uses
these typographical conventions:
♦ Terms introduced for the first time are presented in italic type.
For example:
Instructions are given to the software via control panels.
♦ Keywords in the interface are presented in bold type. In addi-
tion, slashes (/) are used to separate a menu item from a sub-
menu item. For example:
Select the View/Colors… menu item means to click the View
menu item, drag the cursor down the pulldown menu that
appears, and release the mouse button over the Colors… item.
♦ Words you type or enter are presented in bold type. For exam-
ple:
Enter 0.001 in the Convergence entry box.
♦ UNIX command dialog and file samples are represented in a
typewriter font. For example, the following illustrates a line
in a .grf file:
CERIUS Grapher File

♦ Words in italic represent variables. For example:


discovery input_file

In this example, the name of the file from which data are read
in replaces input_file.



2 Forcefields

This chapter focuses specifically on the forcefields supported by
MSI’s simulation engines.

Who should read this chapter
You should read this chapter if you want to know:
♦ What a forcefield is.
♦ What a potential energy surface is.
♦ How to choose the best forcefield for your system.

This chapter explains
The potential energy surface
Empirical fit to the potential energy surface
Forcefields supported by MSI forcefield engines
Second-generation forcefields accurate for many properties
Rule-based forcefields broadly applicable to the periodic table
Classical forcefields
Special-purpose forcefields
Archived and untested forcefields

Related information
Preparing the Energy Expression and the Model presents information
on how the functional forms of forcefields are used for real
simulations. You need to read it to optimize how you set up your
simulation. The general procedure for forcefield-based calculations is
outlined under Using forcefields.
The atom types defined for each forcefield are listed under
Forcefield Terms and Atom Types. Illustrations of various types of cross
terms are also included.
The files that specify the forcefields are described in the separate
documentation for each simulation engine.


Table 2. Finding information in Forcefields section

If you want to know about:                Read:
The theory behind forcefields.            The potential energy surface; Empirical fit
                                          to the potential energy surface.
What a forcefield is.                     The forcefield; The energy expression.
Characteristics of forcefields.           Main types of forcefields.
What forcefields are available in which   Table 3. Primary uses of forcefields provided
MSI modeling programs.                    in MSI products.
Choosing the best forcefield for your     Table 3. Primary uses of forcefields provided
calculation.                              in MSI products, followed by one or more of
                                          the descriptive subsections starting under
                                          Second-generation forcefields accurate for
                                          many properties.

The potential energy surface


The complete mathematical description of a molecule, including
both quantum mechanical and relativistic effects, is a formidable
problem, due to the small scales and large velocities. However, for
this discussion, these intricacies are ignored and the focus is on
general concepts, because molecular mechanics and dynamics are
based on empirical data that implicitly incorporate all the relativ-
istic and quantum effects. Since no complete relativistic quantum
mechanical theory is suitable for the description of molecules, this
discussion starts with the nonrelativistic, time-independent form
of the Schrödinger description:
The Schrödinger equation

$$ H\Psi(R, r) = E\Psi(R, r) $$    Eq. 1

where H is the Hamiltonian for the system, Ψ is the wavefunction,
and E is the energy. In general, Ψ is a function of the coordinates of
the nuclei (R) and of the electrons (r).
The Born–Oppenheimer approximation
Although this equation is quite general, it is too complex for any
practical use, so approximations are made. Noting that the electrons
are several thousands of times lighter than the nuclei and
therefore move much faster, Born and Oppenheimer (1927) proposed
what is known as the Born–Oppenheimer approximation:
the motion of the electrons can be decoupled from that of the
nuclei, giving two separate equations.

Equation for electronic motion, or the potential energy surface
The first equation describes the electronic motion:

$$ H\psi(r; R) = E\psi(r; R) $$    Eq. 2

and depends only parametrically on the positions of the nuclei.
Note that this equation defines an energy E(R), which is a function
of only the coordinates of the nuclei. This energy is usually called
the potential energy surface.

Equation for nuclear motion on the potential energy surface
The second equation then describes the motion of the nuclei on
this potential energy surface E(R):

$$ H\Phi(R) = E\Phi(R) $$    Eq. 3

The direct solution of Eq. 2 is the province of ab initio quantum
chemical codes such as Gaussian, CADPAC, Hondo, GAMESS,
DMol, and Turbomole. Semiempirical codes such as ZINDO,
MNDO, MINDO, MOPAC, and AMPAC also solve Eq. 2, but they
approximate many of the integrals needed with empirically fit
functions. The common feature of these programs, though, is that
they solve for the electronic wavefunction and energy as a function
of nuclear coordinates. In contrast, simulation engines provide an
empirical fit to the potential energy surface.

Empirical fit to the potential energy surface


Solving Eq. 3 is important if you are interested in the structure or
time evolution of a model. As written, Eq. 3 is the Schrödinger
equation for the motion of the nuclei on the potential energy sur-
face. In principle, Eq. 2 could be solved for the potential energy E,
and then Eq. 3 could be solved. However, the effort required to
solve Eq. 2 is extremely large, so usually an empirical fit to the
potential energy surface, commonly called a forcefield (V), is used.
Since the nuclei are relatively heavy objects, quantum mechanical
effects are often insignificant, in which case Eq. 3 can be replaced
by Newton’s equation of motion:

$$ -\frac{dV}{dR} = m \frac{d^{2}R}{dt^{2}} $$    Eq. 4

Molecular dynamics and mechanics
The solution of Eq. 4 using an empirical fit to the potential energy
surface E(R) is called molecular dynamics. Molecular mechanics
ignores the time evolution of the system and instead focuses on
finding particular geometries and their associated energies or
other static properties. This includes finding equilibrium structures,
transition states, relative energies, and harmonic vibrational
frequencies.
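As a minimal sketch of what solving Eq. 4 numerically involves, the following Python example integrates Newton's equation of motion for a single particle in a one-dimensional harmonic potential. Several things here are assumptions made for illustration: the potential V(x) = k x²/2, the unit mass and force constant, the timestep, and the choice of the velocity Verlet scheme, which is a standard integrator but is not claimed to be the one used by any MSI simulation engine.

```python
import math


def force(x, k=1.0):
    """Force -dV/dx for the assumed harmonic potential V(x) = 0.5 * k * x**2."""
    return -k * x


def velocity_verlet(x, v, m=1.0, k=1.0, dt=0.01, steps=1000):
    """Integrate m * d2x/dt2 = -dV/dx and return the final (x, v)."""
    a = force(x, k) / m
    for _ in range(steps):
        x = x + v * dt + 0.5 * a * dt * dt   # advance the position
        a_new = force(x, k) / m              # acceleration at the new position
        v = v + 0.5 * (a + a_new) * dt       # advance the velocity with the mean acceleration
        a = a_new
    return x, v


def total_energy(x, v, m=1.0, k=1.0):
    """Kinetic plus potential energy of the particle."""
    return 0.5 * m * v * v + 0.5 * k * x * x


x0, v0 = 1.0, 0.0
x1, v1 = velocity_verlet(x0, v0, dt=0.01, steps=1000)   # integrate to t = 10
energy_drift = abs(total_energy(x1, v1) - total_energy(x0, v0))
position_error = abs(x1 - math.cos(10.0))   # with m = k = 1 the exact solution is cos(t)
```

With these settings the total energy should stay nearly constant over the run, which is one common check on an integrator and timestep.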

The forcefield
Components of a forcefield
The forcefield contains the necessary building blocks for the
calculations of energy and force:
♦ A list of atom types.
♦ A list of atomic charges (if not included in the atom-type infor-
mation).
♦ Atom-typing rules.
♦ Functional forms for the components of the energy expression.
♦ Parameters for the function terms.
♦ For some forcefields, rules for generating parameters that have
not been explicitly defined.
♦ For some forcefields, a defined way of assigning functional
forms and parameters.
This total “package” for the empirical fit to the potential energy
surface is the forcefield.
Coordinates, terms, functional forms
The forcefields commonly used for describing molecules employ a
combination of internal coordinates and terms (bond distances, bond
angles, torsions, etc.), to describe part of the potential energy surface
due to interactions between bonded atoms, and nonbond terms
to describe the van der Waals and electrostatic (etc.) interactions
between atoms. The functional forms range from simple quadratic
forms to Morse functions, Fourier expansions, Lennard–Jones
potentials, etc.

Purpose of forcefields
The goal of a forcefield is to describe entire classes of molecules
with reasonable accuracy. In a sense, the forcefield interpolates
and extrapolates from the empirical data of the small set of models
used to parameterize the forcefield to a larger set of related models.
Some forcefields aim for high accuracy for a limited set of element
types, thus enabling good prediction of many molecular
properties. Other forcefields aim for the broadest possible coverage
of the periodic table, with necessarily lower accuracy.
Physical significance
The physical significance of most of the types of interactions in a
forcefield is easily understood, since describing a model’s internal
degrees of freedom in terms of bonds, angles, and torsions seems
natural. The analogy of vibrating balls connected by springs to
describe molecular motion is equally familiar. However, it must be
remembered that such models have limitations. Consider for
example the difference between such a mechanical model and a
quantum mechanical “bond”.
Quantum and mechanical descriptions of bonds
Covalent bonds can, to a first approximation, be described by a
harmonic oscillator, both in quantum and classical mechanical theory.
Consider the classical oscillator in Figure 1. A ball poised at the
intersection of the pale horizontal line with the parabolic energy
surface (thick line) would begin to roll down, converting its potential
energy to kinetic energy and achieving a maximum velocity as
it passes the minimum. Its kinetic energy is then converted
back into potential energy until, at exactly the height from which
it started, it pauses momentarily before rolling back.
The interchange of kinetic and potential energy in such a mechanical
system is familiar and intuitive.
The probability of finding the ball at any point along its trajectory
is inversely proportional to its velocity at that point (which is
opposite to the probability for a real atom). This probability is plot-
ted above the parabolic curve (thin line, Figure 1). The probability
is greatest near the high-energy limits of its trajectory (where it is
moving slowly) and lowest at the energy minimum (where it is
moving quickly). Because the total energy cannot exceed the initial
potential energy defined by the starting point, the probability
drops to zero outside the limit defined by the intersection of the
total energy (pale horizontal line) with the parabola.
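The inverse proportionality between probability and velocity can be made explicit for the harmonic case. The following short derivation is a sketch under stated assumptions: the amplitude A, angular frequency ω, and period T = 2π/ω are symbols introduced here for illustration and do not appear elsewhere in this guide. The factor of 2 arises because the ball passes each point twice per period.

```latex
\begin{align*}
x(t) &= A\cos(\omega t), \qquad |v(x)| = \omega\sqrt{A^{2}-x^{2}},\\
P(x)\,dx &= \frac{2\,dt}{T} = \frac{2}{T}\,\frac{dx}{|v(x)|}
          = \frac{dx}{\pi\sqrt{A^{2}-x^{2}}}, \qquad |x| < A,\\
\int_{-A}^{A} P(x)\,dx &= 1 .
\end{align*}
```

The density is smallest at the energy minimum (x = 0), grows without bound toward the turning points x = ±A, and is zero beyond them.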
Describing a quantum mechanical “trajectory” is impossible,
because the uncertainty principle prevents an exact, simultaneous
specification of both position and momentum. However, the probability
that the quantum mechanical ball will be at a given point on
the parabola can be quantified. The quantum mechanical probability
function plotted in the right panel of Figure 1 is very different
from that of the mechanical system. First, the highest probability is at the
energy minimum, which is the opposite of the mechanical case.
Second, the quantum mechanical ball can actually be found
beyond the classical limits imposed by the total energy of the system
(tunneling). Both these properties can be attributed to the
uncertainty principle.

[Figure 1. Left panel: classical harmonic oscillator. Right panel: quantum harmonic oscillator.]
Figure 1. Energy and probability of a mechanical and quantum particle
in a harmonic energy well
The energy is indicated by the heavy lines and probability by the thin lines. The
total energy of the system is indicated by the pale horizontal line. The classical
(mechanical) probability is highest when the particle reaches its maximum
potential energy (zero velocity) and falls to its lowest value between these
points. The quantum mechanical probability is highest where the potential
energy is lowest, and there is a finite probability that the particle can be found
outside the classical limits (pale vertical lines).
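The contrast between the two panels of Figure 1 can be checked numerically. The sketch below assumes dimensionless units (ħ = m = ω = 1) and considers only the quantum ground state, whose energy of 1/2 puts the classical limits at x = ±1; these choices and all function names are illustrative and are not taken from the figure itself.

```python
import math


def classical_density(x, amplitude=1.0):
    """Classical density 1 / (pi * sqrt(A^2 - x^2)) inside the limits.

    The density diverges as |x| approaches A; points at or beyond the
    limit simply return 0 in this sketch.
    """
    if abs(x) >= amplitude:
        return 0.0
    return 1.0 / (math.pi * math.sqrt(amplitude ** 2 - x ** 2))


def quantum_ground_density(x):
    """Ground-state density exp(-x^2) / sqrt(pi) in units hbar = m = omega = 1."""
    return math.exp(-x * x) / math.sqrt(math.pi)


def integrate(f, a, b, n=20000):
    """Midpoint-rule integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


# Probability of finding the quantum particle outside the classical limits x = -1, +1.
# The upper bound 8.0 stands in for infinity; the tail beyond it is negligible.
outside = 2.0 * integrate(quantum_ground_density, 1.0, 8.0)
```

Under these assumptions `outside` should agree with the closed-form value erfc(1), about 16%, while the classical density is exactly zero there. The classical density is lowest at x = 0 and the quantum density is highest there, as described above.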
Utility of the forcefield approach
With such a different qualitative picture of fundamental physical
principles, is it reasonable to use a mechanical approach for obviously
quantum mechanical entities like bonds? In practice, many
experimental properties such as vibrational frequencies, sublimation
energies, and crystal structures can be reproduced with a
forcefield, not because the systems behave mechanically, but
because the forcefield is fit to reproduce relevant observables and
therefore includes most of the quantum effects empirically.
Nevertheless, it is important to appreciate the fundamental limitations of
a mechanical approach.
Limitations of the forcefield approach
Applications beyond the capability of most forcefield methods
include:
♦ Electronic transitions (photon absorption).
♦ Electron transport phenomena.
♦ Proton transfer (acid/base reactions).
The power of the forcefield approach
The true power of the atomistic description of a model embodied
in the energy expression lies in three major areas:
♦ The first is that forcefield-based simulations can handle large
systems, since these simulations are several orders of magni-
tude faster (and cheaper) than quantum-based calculations.
Forcefield-based simulations can be used for studying con-
densed-phase molecules, macromolecules, crystal morphology,
inorganic and organic interphases, etc., where the properties of
interest are not sensitive to quantum effects (e.g., phase behav-
ior, equations of state, bond energies, etc.).
♦ The second is the analysis of the energy contributions at the
level of individual or classes of interactions. For instance, you
can decompose the energy into bond energies, angle energies,
nonbond energies, etc. or even to the level of a specific hydro-
gen bond or van der Waals contact, in order to understand a
physical observable or to make a prediction.
♦ The third area, which is described under Applying constraints
and restraints, lies in the modification of the energy expression
to bias the calculation. You can impose constraints (absolute
conditions), such as fixing an atom in space and not allowing it
to move. You can also add extra terms to the energy expression
to restrain or force the system in certain ways. For instance, by
adding an extra torsion potential to a particular bond, you can
force the torsion angle toward a desired value. (You can apply
constraints also for quantum-based energy calculations.)
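The difference between a restraint (an extra energy term) and a constraint (an absolute condition) can be sketched as follows. Everything specific here is an assumption made for illustration: the periodic restraint form k(1 − cos(φ − φ_target)), the force constant, the toy threefold torsion term, and the function names. None of it reproduces the functional forms used by a particular MSI simulation engine.

```python
import math


def torsion_restraint(phi, phi_target, k=10.0):
    """Extra energy term that is zero at phi_target and grows as phi moves away.

    The periodic form k * (1 - cos(phi - phi_target)) is an assumed choice.
    """
    return k * (1.0 - math.cos(phi - phi_target))


def restrained_energy(base_energy, phi, phi_target, k=10.0):
    """Energy expression plus the restraint: the bias is added, not enforced."""
    return base_energy(phi) + torsion_restraint(phi, phi_target, k)


def apply_fixed_atoms(gradient, fixed):
    """Constraint: zero the gradient of fixed atoms so a minimizer never moves them."""
    return [(0.0, 0.0, 0.0) if i in fixed else g for i, g in enumerate(gradient)]


def toy_torsion(phi):
    """Hypothetical threefold torsion term with minima at 60, 180, and 300 degrees."""
    return 1.5 * (1.0 + math.cos(3.0 * phi))


target = math.radians(90.0)
grid = [math.radians(d) for d in range(0, 360)]
free_min = min(grid, key=toy_torsion)
biased_min = min(grid, key=lambda p: restrained_energy(toy_torsion, p, target))
```

In this sketch the restraint pulls the lowest-energy torsion angle away from an unrestrained minimum and toward the 90 degree target without forcing it to land there exactly, whereas the fixed-atom constraint removes the selected atoms from the optimization altogether.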


The energy expression


The actual coordinates of a model combined with the forcefield
data create the energy expression (or target function) for the model.
This energy expression is the equation that describes the potential
energy surface of a particular model as a function of its atomic
coordinates.
The potential energy of a system can be expressed as a sum of
valence (or bond), crossterm, and nonbond interactions:

$$ E_{total} = E_{valence} + E_{crossterm} + E_{nonbond} $$

Valence interactions
The energy of valence interactions is generally accounted for by
diagonal terms, namely, bond stretching ($E_{bond}$), valence angle
bending ($E_{angle}$), dihedral angle torsion ($E_{torsion}$), and inversion
(also called out-of-plane interactions) ($E_{inversion}$ or $E_{oop}$) terms,
which are part of nearly all forcefields for covalent systems. A
Urey–Bradley term ($E_{UB}$) may be used to account for interactions
between atom pairs involved in 1–3 configurations (i.e., atoms
bound to a common atom):

$$ E_{valence} = E_{bond} + E_{angle} + E_{torsion} + E_{oop} + E_{UB} $$    Eq. 5

Valence crossterms
Modern (second-generation) forcefields generally achieve higher
accuracy by including cross terms to account for such factors as
bond or angle distortions caused by nearby atoms. Crossterms can
include the following terms: stretch–stretch, stretch–bend–stretch,
bend–bend, torsion–stretch, torsion–bend–bend, bend–torsion–bend,
stretch–torsion–stretch. (These are illustrated under Forcefield
Terms and Atom Types.)
Nonbond interactions
The energy of interactions between nonbonded atoms is
accounted for by van der Waals (EvdW), electrostatic (ECoulomb),
and (in some older forcefields) hydrogen bond (Ehbond) terms:

Enonbond = EvdW + ECoulomb + Ehbond Eq. 6

Restraints
Restraints that can be added to an energy expression include
distance, angle, torsion, and inversion restraints. Restraints are
useful if, for example, you are interested in the structure of only
part of a model. For information on restraints and their
implementation and use, see Preparing the Energy Expression and the
Model in this documentation set and also the documentation for the
particular simulation engine.
Example energy expression for water
As a simple example of a complete energy expression, consider the
following equation, which might be used to describe the potential
energy surface of a water model:

$$
V(R) = K_{oh} (b - b^0_{oh})^2 + K_{oh} (b' - b^0_{oh})^2 + K_{hoh} (\theta - \theta^0_{hoh})^2 \qquad \text{Eq. 7}
$$

where $K_{oh}$, $b^0_{oh}$, $K_{hoh}$, and $\theta^0_{hoh}$ are parameters of the forcefield, $b$ is
the current bond length of one O–H bond, $b'$ is the length of the
other O–H bond, and $\theta$ is the H–O–H angle.
In this example, the forcefield defines:
♦ The coordinates to be used (bond lengths and angles).
♦ The functional form (a simple quadratic in both types of
coordinates).
♦ The parameters (the force constants $K_{oh}$ and $K_{hoh}$, as well as the
reference values $b^0_{oh}$ and $\theta^0_{hoh}$).
The reference O–H bond length and reference H–O–H angle are
the values for an ideal O–H bond and H–O–H angle at zero
energy, which are not necessarily the same as their equilibrium
values in a real water molecule.
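
To make Eq. 7 concrete, the following Python sketch evaluates the water energy expression for a given geometry. The numerical parameter values and the function name are hypothetical, chosen only so that the example runs; they are not taken from any MSI forcefield file.

```python
import math

# Hypothetical parameters, for illustration only (not from a real forcefield file)
K_OH = 500.0                       # O-H force constant
B0_OH = 0.96                       # reference O-H bond length
K_HOH = 50.0                       # H-O-H force constant (per radian squared)
THETA0_HOH = math.radians(104.5)   # reference H-O-H angle

def water_energy(b, b_prime, theta_deg):
    """Evaluate Eq. 7: two quadratic bond terms plus one quadratic angle term."""
    theta = math.radians(theta_deg)
    return (K_OH * (b - B0_OH) ** 2
            + K_OH * (b_prime - B0_OH) ** 2
            + K_HOH * (theta - THETA0_HOH) ** 2)
```

With these assumed values, the energy is zero when both bonds and the angle sit at their reference values and rises quadratically as any one coordinate is displaced.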
Example forcefield function
Eq. 7 is an example of an energy expression as set up for a simple
molecule. Eq. 8 is an example of the corresponding general,
summed forcefield function:


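$$
\begin{aligned}
V(R) ={}& \sum_{b} D_b \left[ 1 - \exp\left( -\alpha (b - b_0) \right) \right]^2
 + \sum_{\theta} H_\theta (\theta - \theta_0)^2
 + \sum_{\phi} H_\phi \left[ 1 + s \cos(n\phi) \right] \\
&+ \sum_{\chi} H_\chi \chi^2
 + \sum_{b} \sum_{b'} F_{bb'} (b - b_0)(b' - b'_0)
 + \sum_{\theta} \sum_{\theta'} F_{\theta\theta'} (\theta - \theta_0)(\theta' - \theta'_0) \\
&+ \sum_{b} \sum_{\theta} F_{b\theta} (b - b_0)(\theta - \theta_0)
 + \sum_{\theta} \sum_{\theta'} F_{\theta\theta'\phi} (\theta - \theta_0)(\theta' - \theta'_0) \cos\phi \\
&+ \sum_{\chi} \sum_{\chi'} F_{\chi\chi'} \chi \chi'
 + \sum_{i} \sum_{j>i} \left[ \frac{A_{ij}}{r_{ij}^{12}} - \frac{B_{ij}}{r_{ij}^{6}} + \frac{q_i q_j}{r_{ij}} \right]
\end{aligned}
\qquad \text{Eq. 8}
$$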

The first four terms in this equation are sums that reflect the
energy needed to stretch bonds (b), bend angles (θ) away from
their reference values, rotate torsion angles (φ) by twisting atoms
about the bond axis that determines the torsion angle, and distort
planar atoms out of the plane formed by the atoms they are
bonded to (χ). The next five terms are cross terms that account for
interactions between the four types of internal coordinates. The
final term represents the nonbond interactions as a sum of repul-
sive and attractive Lennard–Jones terms as well as Coulombic
terms, all of which are a function of the distance rij between atom
pairs. The forcefield defines the functional form of each term in
this equation as well as the parameters such as Db, α, and b0. The
forcefield also defines internal coordinates such as b, θ, φ, and χ as
a function of the Cartesian atomic coordinates, although this is not
explicit in Eq. 8.
We should note that the energy expression in Eq. 8 is cast in a gen-
eral form. The true energy expression for a specific model includes
information about the coordinates that are included in each sum.
For example, it is common to exclude interactions between bonded
and 1–3 atoms in the summation representing the nonbond inter-
actions. Thus, a true energy expression might actually use a list of
allowed interactions rather than the full summation implied in
Eq. 8.
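
The list of allowed nonbond interactions described above can be sketched as follows. The function name and the atom numbering are assumptions made for this illustration; a real simulation engine builds its pair list from the model's connectivity in its own way.

```python
from itertools import combinations

def nonbond_pairs(n_atoms, bonds):
    """Return atom pairs (i, j), i < j, that are neither bonded (1-2) nor 1-3."""
    neighbors = {i: set() for i in range(n_atoms)}
    for i, j in bonds:
        neighbors[i].add(j)
        neighbors[j].add(i)

    excluded = set()
    for i in range(n_atoms):
        for j in neighbors[i]:
            excluded.add(frozenset((i, j)))          # directly bonded pair
            for k in neighbors[j]:
                if k != i:
                    excluded.add(frozenset((i, k)))  # pair bound to a common atom

    return [(i, j) for i, j in combinations(range(n_atoms), 2)
            if frozenset((i, j)) not in excluded]
```

For a four-atom chain bonded 0–1–2–3, only the pair (0, 3) survives, so the nonbond sum for that model would run over a single interaction rather than over all six pairs implied by the general double summation.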


Forcefields supported by MSI forcefield engines


The results of any mechanics or dynamics calculation depend cru-
cially on the forcefield. The quality of the description of both the
system and the particular properties being analyzed is of para-
mount importance. Accurate, specific parameters generally give
better results than automatic, generic parameters. Choosing the
correct forcefield is vitally important in getting reasonable results
from energy calculations.
Contents of this section
This section gives a general comparison of the forcefields that are
available in MSI products and presents the reasoning behind making
a wide variety of forcefields available to our customers. It
should enable you to make at least a preliminary choice of which
forcefield to use.
Forcefield descriptions
Complete descriptions of each forcefield follow in subsequent
sections:

Second-generation forcefields
    CFF91, PCFF, CFF, COMPASS: consistent forcefields
    MMFF93, the Merck molecular forcefield
Broadly applicable forcefields
    ESFF, extensible systematic forcefield
    UFF, universal forcefield
    VALBOND
    Dreiding forcefield
Classical forcefields
    Standard AMBER forcefield
    Homans’ carbohydrate forcefield
    CHARMm forcefield
    CVFF, consistent valence forcefield
Special-purpose forcefields
    Glass forcefield
    MSXX forcefield for polyvinylidene fluoride
    Zeolite forcefields
    Forcefields for sorption on zeolites
    Forcefields for Cerius2•Morphology module
Other forcefields
    Archived and untested forcefields


Related information
The atom types defined by each forcefield are listed under
Forcefield Terms and Atom Types, and the types of parameters used in
the forcefields are described in the documentation for each
simulation engine.

Main types of forcefields


MSI provides four main types of forcefields:
♦ Second-generation forcefields capable of predicting many
properties.
♦ Rule-based forcefields applicable to a broad range of the peri-
odic table.
♦ Classical, first-generation forcefields applicable mainly to bio-
chemistry.
♦ Special-purpose forcefields that are narrowly applicable to par-
ticular applications or types of models.
A complete list of these forcefields, their main uses, and the simu-
lation engine that handles them is given in Table 3.
In addition, we supply (but do not support) several older or
untested forcefields.
Second-generation forcefields
♦ The CFF family of forcefields (CFF91, PCFF, CFF, COMPASS) consists
of closely related second-generation forcefields (Maple et al. 1988,
1994a, b, Dinur and Hagler 1991, Waldman and Hagler 1993,
Hill and Sauer 1994, Hwang et al. 1994, Hagler and Ewig 1994,
Sun et al. 1994, Sun 1994, 1995).
The CFF forcefields were parameterized against a wide range of
experimental observables for organic compounds containing H, C,
N, O, S, P, halogen atoms and ions, alkali metal cations, and
several biochemically important divalent metal cations.
CFF has slightly more atom types than CFF91 (see Forcefield Terms
and Atom Types).
PCFF is based on CFF91, extended to give broad coverage of
organic polymers, (inorganic) metals, and zeolites.
COMPASS is a new version of PCFF.
The CFF forcefields have been shown to reproduce experimental
results more accurately than classical forcefields such as CVFF
and AMBER.
♦ The Merck molecular forcefield (MMFF93), developed by T. A.
Halgren at the Merck Research Laboratories (Halgren 1992, Halgren &
Nachbar 1996), is designed to be used with a large variety of
chemical models.
The main application of MMFF93 is to the study of receptor–
ligand interactions involving proteins or nucleic acids as recep-
tors and a wide range of chemical structures as ligands. The
forcefield can describe ligands and receptors in isolation as well
as in the bound state.
Rule-based forcefields
♦ The ESFF forcefield (extensible systematic forcefield) is a
rule-based forcefield that was developed at MSI.
The goal of this forcefield is to provide the widest possible
coverage of the periodic table, enabling the structures of both
isolated molecules and crystals to be reproduced. Its scope does
not extend to highly accurate vibrational frequencies or other
properties such as conformational energies.
♦ The Universal forcefield (Rappé et al. 1992) is an excellent gen-
eral-purpose forcefield. All the Universal forcefield parameters
are generated from a set of rules based on element, hybridiza-
tion, and connectivity.
The Universal forcefield was parametrized for the full periodic
table and has been carefully validated for main-group com-
pounds (Casewit et al. 1992b), organic molecules (Casewit et al.
1992a), and metal complexes (Rappé et al. 1993).
♦ VALBOND is a combination of the UFF, universal forcefield, and
the VALBOND method for the angle energy.
This forcefield combines the advantages of a general forcefield
with the strengths of the VALBOND method and may give bet-
ter results for non-hypervalent structures where the geometry
of ligands around a central atom is unknown.

♦ The Dreiding forcefield (Mayo et al. 1990) is a good, robust,
all-purpose forcefield. While a specialized forcefield is more
accurate for predicting a limited number of structures, the Dreiding
forcefield allows reasonable predictions for a very much larger
number of structures, including those with novel combinations
of elements and those for which there is little or no
experimental data.
It can be used for structure prediction and dynamics calculations
on organic, biological, and main-group inorganic molecules.
Classical forcefields
♦ The AMBER forcefield (Weiner et al. 1984, 1986) was parameterized
against a limited number of organic models. It has been
widely used for proteins, DNA, and other classes of molecules
and may be considered well characterized.
The standard AMBER forcefield is mainly useful for proteins
and nucleic acids. The Homans (1990) carbohydrate forcefield
is based on AMBER, but extended to polysaccharides. It is not
generally recommended for use in materials science studies.
♦ The CHARMm forcefield (Chemistry at HARvard Macromolecu-
lar mechanics) is packaged in a highly flexible molecular
mechanics and dynamics engine originally developed in the
laboratory of Dr. Martin Karplus at Harvard University. It has
been widely used and can be considered well tested and char-
acterized (e.g., Brooks et al. 1983, Momany and Rone 1992).
A variety of systems, from isolated small molecules to solvated
complexes of large biological macromolecules, can be simu-
lated using CHARMm.
♦ The CVFF forcefield is a classic forcefield having some anhar-
monic and cross term enhancements. As the traditional default
forcefield in the Discover program, it has been used extensively
and can be considered well tested and characterized.
CVFF was parameterized to reproduce peptide and protein
properties.
Special-purpose forcefields
♦ In addition to some standard forcefields, the Cerius2•Open
Force Field module provides several smaller forcefield parameter
files for more specialized work.
These include separate forcefields for glasses, zeolites, and
polyvinylidene fluoride, as well as some forcefields that are
intended only for use in the Cerius2•Morphology module.

Advantages of having several forcefields


The ability to choose among several forcefields has several advan-
tages:
1. A broader range of systems can be treated:
Some classical forcefields were originally created for modeling
proteins and peptides, others for DNA and RNA. Some have
been extended to handle more general systems having similar
functional groups.
The rule-based forcefields have extended the range of forcefield
simulations to a broader range of elements.
The second-generation forcefields currently include parame-
ters for all functional groups appropriate for protein simula-
tions.
2. Identical calculations with two or more independent forcefields
can be compared to assess the dependence of the results on the
forcefield:
For example, amino acid parameters are defined in the
AMBER, CHARMm, CVFF, CFF, and MMFF93 forcefields, so
peptide and protein calculations with these forcefields can be
compared to assess the effect of the forcefields.
3. The different functional forms used in the various energy
expressions increase the flexibility of the Discover program and
the Open Force Field module:
You can balance the requirements of high accuracy vs. available
computational resources. (Highly accurate forcefields are gen-
erally more complex and therefore require more resources.)
Different energy terms can be compared. For example, approx-
imations such as a distance-dependent dielectric constant or
scaling of 1–4 nonbond interactions can be assessed.

Harmonic bond terms are accurate only at bond lengths close to
the reference bond length, but the Morse term can be used to
model bond breaking.
4. The development of new forcefields at MSI and elsewhere con-
tinues to provide more accurate and more broadly applicable
forcefields. As experience is gained in parameterizing force-
fields and as new experimental data become available, the
range of both properties and systems fit by these newer force-
fields will increase.
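
The difference between harmonic and Morse bond terms noted in item 3 can be illustrated with a short Python sketch. The parameter values are hypothetical, and the harmonic force constant is chosen here as K = D·α² so that the two curves have the same curvature at the reference bond length; both choices are assumptions made for the example.

```python
import math

def morse(b, d_b, alpha, b0):
    """Morse bond term, D_b [1 - exp(-alpha (b - b0))]^2, as in the first term of Eq. 8."""
    return d_b * (1.0 - math.exp(-alpha * (b - b0))) ** 2

def harmonic(b, k, b0):
    """Harmonic bond term, K (b - b0)^2, as in Eq. 7."""
    return k * (b - b0) ** 2

# Hypothetical parameters, for illustration only
D_B, ALPHA, B0 = 100.0, 2.0, 1.5
K = D_B * ALPHA ** 2   # matches the Morse curvature at b0

for b in (1.51, 2.0, 10.0):
    print(b, harmonic(b, K, B0), morse(b, D_B, ALPHA, B0))
```

Close to the reference length the two terms nearly coincide. At large separations the harmonic energy keeps growing without bound, whereas the Morse energy levels off at the well depth D_b, which is why only the Morse form can describe a bond that dissociates.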

Primary uses of each MSI forcefield


Table 3 summarizes the forcefields best suited for various types of
work and lists the simulation engines that handle each one:

Table 3. Primary uses of forcefields provided in MSI products

All forcefields listed in this table are documented in this guide.

Type and use of forcefield    Forcefield name     Simulation engine   Forcefield filename(s)
----------------------------  ------------------  ------------------  ------------------------------
Second-generation,            CFF91               Discover; OFF [a]   cff91.frc; cff91_950_1.01
general-purpose               CFF95 [b]           Discover; OFF       cff95.frc; cff95_950_1.01
                              CFF                 Discover; OFF       cff.frc; cff1.01
                              MMFF93              CHARMm [c]          mmff_setup.STR
2nd-generation, polymers      PCFF, COMPASS,      Discover; OFF       pcff.frc; pcff_300_1.01,
                              COMPASS98                               compass.frc; COMPASS1.0,
                                                                      compass98.frc; Compass98.01
Rule-based, broadly           ESFF                Discover [d]        esff.frc
applicable, general-purpose   Universal           OFF                 UNIVERSAL1.02
                              UFF-VALBOND         OFF                 UFF_VALBOND1.01
                              Dreiding            OFF                 DREIDING2.21
Classical, general-purpose    AMBER               Discover; OFF [e]   amber.frc
(biochemistry)                CHARMm              CHARMm [f]
                              CVFF                Discover; OFF       cvff.frc; cvff_950_1.01 [g]
Special-purpose:
Inorganic oxide glasses       Glass               OFF                 glassff_1.01, glassff_2.01
Morphology module of          Lifson              OFF                 morph_lifson1.11
Cerius2                       Momany              OFF                 morph_momany1.1
                              Scheraga            OFF                 morph_scheraga1.1
                              Williams            OFF                 morph_williams1.01
Polyvinylidene fluoride       MSXX                OFF                 msxx_1.01
polymers
Zeolites                      BKS                 OFF                 bks1.01
                              Burchart            OFF                 burchart1.01
                              Burchart–Dreiding   OFF                 burchart1.01-DREIDING2.21
                              Burchart–Universal  OFF                 burchart1.01-UNIVERSAL1.02
                              CVFF_aug            Discover; OFF       cvff_300_1.01
Zeolite sorption              Yashonath           OFF                 sor_yashonath1.01
                              Demontis            OFF                 sor_demontis1.01
                              Pickett             OFF                 sor_pickett1.01
                              Watanabe–Austin     OFF                 watanabe-austin1.01
Older, archived, misc.        several             Discover; OFF       gifts/*, archive/*
Untested, misc.               several             OFF                 untested/

[a] OFF = the Open Force Field module of Cerius2.
[b] Marketed as an add-on forcefield, not present in Discover or OFF by default.
[c] CHARMm as run through the Cerius2•MMFF module, not in QUANTA or standard
    CHARMm.
[d] In CDiscover, not FDiscover; in other words, in the Cerius2•Discover and the
    Insight•Discover_3 modules, not the Insight•Discover module.
[e] In the Insight•Discover_3 and the Insight•Discover modules but not the
    Cerius2•Discover module. An older version of AMBER is accessible through the
    Cerius2•OFF module.
[f] CHARMm is both the name of a forcefield and the name of a simulation engine
    that handles the CHARMm forcefield.
[g] CVFF differs slightly in versions 3.0.0 and 95.0 of Insight II; both versions
    are included in Cerius2•OFF.

Additional information
Additional information about forcefields included with Cerius2 is
printed to the text window when you load a forcefield.
Alternatively, you can click the Show information action button in
the Load Force Field control panel.
Additional information about forcefields included with Insight
4.0.0 can be obtained with the Forcefield/FF_Info parameter
block, which is accessed through the Builder and other modules.
(It is not included in Insight 97.0.)

Second-generation forcefields accurate for many properties

Availability
Second-generation forcefields provided or developed by MSI
include CFF91, CFF, PCFF, COMPASS, and MMFF93:
♦ The CFF family of forcefields (CFF91, PCFF, CFF, COMPASS—
consistent forcefields) are run via the Discover program, which is
available in the Insight•Discover_3, Insight•Discover, and
Cerius2•Discover modules. They can also be used by the
Cerius2•Open Force Field module. Discover is also used
implicitly by other modules in Insight II (such as some in the
Polymer suite of products). The CFF and COMPASS forcefields
are separately licensed (that is, not present by default within
Discover).
♦ MMFF93 (MMFF93, the Merck molecular forcefield) is run via a
version of CHARMm that supports the Cerius2•MMFF mod-
ule.
Characteristics
The topography of an energy surface is usually very complex,
especially for large or complex models, with many energy
minima and barriers and regions of greatly varying energy and
curvature. Nevertheless, the forcefield expression must be as
accurate and complete as possible, to avoid spurious or misleading
results. The newer, second-generation forcefields meet this
requirement by being more complex than the classical forcefields:
their expanded analytic energy expressions include additional terms.
Parameterization
The complexity of second-generation forcefields requires the use
of a large number of forcefield parameters. There are almost
always far more parameters than can be inferred from experiment,
such as by microwave or infrared spectroscopy. However, modern
quantum mechanical methods can generate enough quantum
observables so that all the necessary parameters can be accurately
determined by fitting the energy expression to these observables.
Quantum calculations of the energy surfaces of a series of model
compounds (equilibrium structures, models at conformational
energy barriers, and distorted structures) yield energies as well as
their derivatives with respect to atomic coordinates (i.e., the sur-
face gradients and curvatures) (Maple et al. 1994a, b). Many atomic
partial charges are also determined quantum mechanically.
Intermolecular or nonbond parameters are computed by fitting to
experimental crystal lattice constants and sublimation energies of
crystals (Hagler et al. 1979a, b).
Since quantum mechanics (Hartree–Fock approximation with the
6-31G* basis set) yields results that differ consistently from exper-
iment, the parameterized forcefield is then fit to experimental data
by parameterizing a small number of scaling factors (Hwang et al.
1994).
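
As a rough illustration of what fitting a scaling factor involves, the sketch below finds the single factor that best maps a set of computed values onto observed ones in the least-squares sense. It is a deliberately simplified stand-in, not the procedure of Hwang et al. (1994); the function name and the sample numbers are invented for the example.

```python
def fit_scale_factor(computed, observed):
    """Return s minimizing sum((s * c - o)^2) over paired computed/observed values."""
    numerator = sum(c * o for c, o in zip(computed, observed))
    denominator = sum(c * c for c in computed)
    return numerator / denominator

# Invented numbers: computed values that are consistently about 10% too large
computed_values = [1.0, 2.0, 3.0]
observed_values = [0.9, 1.8, 2.7]
scale = fit_scale_factor(computed_values, observed_values)
```

Because the quantum results are said to differ from experiment in a consistent way, a small number of such factors can correct many parameters at once, instead of each parameter being refit individually against experimental data.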
The CFF family of forcefields (within Discover) can use automatic
parameters (Automatic assignment of values for missing parameters)
when no explicit parameters are present. These are noted in the
output file from the calculation.
Advantages of deriving forcefields from quantum calculations
The use of quantum calculations in the development of
second-generation forcefields has the advantages that:
♦ Sufficient data are available for accurately determining all the
forcefield parameters.
♦ The resulting forcefield may be broad in terms of the types of
molecules and molecular environments that may be modeled,
since no recourse to experiment is required, even for unusual or
transient species. Properties can be modeled for:
Isolated small molecules (structure, thermodynamics, spectros-
copy).
Condensed phases (crystal structure, sublimation energies,
heats of vaporization).
Macromolecular systems.
♦ The resulting forcefield is consistent, since all parameters,
functional groups, and molecular species are modeled in the same
way. This is in contrast to forcefields whose parameters are
derived empirically, since the experimental data for different
molecules necessarily come from greatly differing sources and
types of measurements, and are sometimes of questionable
accuracy.
♦ Fewer atom types are necessary.

CFF91, PCFF, CFF, COMPASS—consistent forcefields

Functional form
All the CFF forcefields (CFF91, CFF, PCFF, COMPASS) have the
same functional form, differing mainly in the range of functional
groups to which they were parameterized (and therefore, having
slightly different parameter values). These differences can be
examined by using the forcefield editing capabilities of Cerius2
and Insight or in the forcefield files. Atom equivalences for assign-
ment of parameters to atom types may also differ, as may some
combination rules for nonbond terms (see Preparing the Energy
Expression and the Model for explanation of these processes, which
occur during forcefield setup).
The analytic expressions used to represent the energy surface are
shown in Eq. 9. Both anharmonic diagonal terms and many cross-
terms are necessary for a good fit to a variety of structures and rel-
ative energies, as well as to vibrational frequencies.
The CFF forcefields employ quartic polynomials for bond stretch-
ing (Term 1) and angle bending (Term 2) and a three-term Fourier
expansion for torsions (Term 3). The out-of-plane (also called
inversion) coordinate (Term 4) is defined according to Wilson et al.
(1980). All the crossterms up through third order that have been
found to be important (Terms 5–11) are also included—this gives
a forcefield equivalent to the best used in a formate anion test case
(Maple et al. 1990). Term 12 is the Coulombic interaction between
the atomic charges, and Term 13 represents the van der Waals
interactions, using an inverse 9th-power term for the repulsive part
rather than the more customary 12th-power term.
No explicit special atom types are used for carbons in strained
three- and four-membered rings. The quartic angle potential,
combined with crossterms, enables accurate description of normal
alkanes, cyclobutane, and cyclopropane with one set of parameters.

Note
Because the Wilson out-of-plane definition is used in the CFF
family of forcefields, results calculated with CDiscover,
FDiscover, and Cerius2•OFF should agree exactly.

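$$
\begin{aligned}
E_{pot} ={}& \sum_{b} \left[ K_2 (b - b_0)^2 + K_3 (b - b_0)^3 + K_4 (b - b_0)^4 \right] && (1) \\
&+ \sum_{\theta} \left[ H_2 (\theta - \theta_0)^2 + H_3 (\theta - \theta_0)^3 + H_4 (\theta - \theta_0)^4 \right] && (2) \\
&+ \sum_{\phi} \left[ V_1 \left[ 1 - \cos(\phi - \phi_1^0) \right] + V_2 \left[ 1 - \cos(2\phi - \phi_2^0) \right] + V_3 \left[ 1 - \cos(3\phi - \phi_3^0) \right] \right] && (3) \\
&+ \sum_{\chi} K_\chi \chi^2 && (4) \\
&+ \sum_{b} \sum_{b'} F_{bb'} (b - b_0)(b' - b'_0) && (5) \\
&+ \sum_{\theta} \sum_{\theta'} F_{\theta\theta'} (\theta - \theta_0)(\theta' - \theta'_0) && (6) \\
&+ \sum_{b} \sum_{\theta} F_{b\theta} (b - b_0)(\theta - \theta_0) && (7) \\
&+ \sum_{b} \sum_{\phi} (b - b_0) \left[ V_1 \cos\phi + V_2 \cos 2\phi + V_3 \cos 3\phi \right] && (8) \\
&+ \sum_{b'} \sum_{\phi} (b' - b'_0) \left[ V_1 \cos\phi + V_2 \cos 2\phi + V_3 \cos 3\phi \right] && (9) \\
&+ \sum_{\theta} \sum_{\phi} (\theta - \theta_0) \left[ V_1 \cos\phi + V_2 \cos 2\phi + V_3 \cos 3\phi \right] && (10) \\
&+ \sum_{\phi} \sum_{\theta} \sum_{\theta'} K_{\phi\theta\theta'} \cos\phi \, (\theta - \theta_0)(\theta' - \theta'_0) && (11) \\
&+ \sum_{i>j} \frac{q_i q_j}{\varepsilon r_{ij}} && (12) \\
&+ \sum_{i>j} \left[ \frac{A_{ij}}{r_{ij}^{9}} - \frac{B_{ij}}{r_{ij}^{6}} \right] && (13)
\end{aligned}
\qquad \text{Eq. 9}
$$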
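
Three of the terms in Eq. 9 (the quartic bond stretch of Term 1, the three-term Fourier torsion of Term 3, and the 9–6 van der Waals interaction of Term 13) can be sketched in Python as follows. The function names and all numerical values used with them are hypothetical; real parameter values come from the forcefield files.

```python
import math

def quartic_bond(b, b0, k2, k3, k4):
    """Term 1: quartic polynomial in the bond displacement (b - b0)."""
    d = b - b0
    return k2 * d ** 2 + k3 * d ** 3 + k4 * d ** 4

def fourier_torsion(phi, v, phi0=(0.0, 0.0, 0.0)):
    """Term 3: sum over n = 1..3 of V_n [1 - cos(n*phi - phi_n^0)], angles in radians."""
    return sum(v_n * (1.0 - math.cos(n * phi - p))
               for n, (v_n, p) in enumerate(zip(v, phi0), start=1))

def vdw_9_6(r, a_ij, b_ij):
    """Term 13: inverse 9th-power repulsion minus inverse 6th-power attraction."""
    return a_ij / r ** 9 - b_ij / r ** 6
```

Setting the derivative of the 9–6 term to zero gives a minimum at r³ = 3A/(2B); with the made-up values A = 2 and B = 3 the minimum lies at r = 1 with energy −1. The 9th-power wall is softer than the customary 12th-power one, so the energy rises less steeply at short distances.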


CFF91 forcefield
Applicability
CFF91 is useful for hydrocarbons, proteins, and protein–ligand
interactions. It can be used to predict gas-phase geometries,
vibrational frequencies, conformational energies, torsion
barriers, and crystal structures of small models; cohesive energy
densities of liquids; lattice parameters, rms atomic coordinates,
and sublimation energies of crystals; and protein crystal
structures of macromolecules.
It has been parameterized explicitly (based on quantum mechanics
calculations and molecular simulations, see Parameterization)
for acetals, acids, alcohols, alkanes, alkenes, amides, amines,
aromatics, esters, and ethers (Maple et al. 1994a, Hwang et al.
1994).
The functional form of CFF91 is exactly as shown in Eq. 9.

Atom types
CFF91 has parameters for functional groups that consist of H, Na,
Ca, C, Si, N, P, O, S, F, Cl, Br, I, and/or Ar. The atom types of the
CFF91 forcefield are listed in Table 27.

Partial charges
The bond increment section of the .frc file for CFF91 enables partial
charges to be determined whenever the Discover program or the
Cerius2•OFF module is able to assign automatic atom types.
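
The idea behind bond increments can be sketched as follows, under the usual assumption that each bond moves a fixed amount of charge from one atom to its partner, so that an atom's charge is the sum of the increments over its bonds. The atom type labels, the increment value, and the function name are hypothetical and are not read from any .frc file.

```python
def charges_from_bond_increments(atom_types, bonds, increments):
    """Sum bond increments over each atom's bonds; increments[(ti, tj)] is added
    to the atom of type ti and subtracted from the atom of type tj."""
    charges = [0.0] * len(atom_types)
    for i, j in bonds:
        ti, tj = atom_types[i], atom_types[j]
        if (ti, tj) in increments:
            delta = increments[(ti, tj)]
        elif (tj, ti) in increments:
            delta = -increments[(tj, ti)]
        else:
            raise KeyError(f"no bond increment for types {ti!r} and {tj!r}")
        charges[i] += delta
        charges[j] -= delta
    return charges

# Hypothetical water-like model: one "O" type bonded to two "H" types
water_charges = charges_from_bond_increments(
    ["O", "H", "H"], [(0, 1), (0, 2)], {("O", "H"): -0.41})
```

Because every increment is added to one atom and subtracted from the other, the charges of a neutral model always sum to zero. This also shows why charge assignment depends on atom typing: without atom types, the increment for a bond cannot be looked up.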

CFF forcefield
Applicability
CFF (formerly CFF95) was parameterized for additional functional
groups beyond CFF91 (Maple et al. 1994a, b, Hwang et al.
1994, Hagler & Ewig 1994). It is recommended for all life sciences
applications and for organic polymers such as polycarbonates and
polysaccharides.
Almost all types of computations within Insight or Cerius2 life
science modules may be performed using CFF. These include
intermolecular and intramolecular energies and forces, optimization
of model structures, and molecular dynamics simulations. CFF is not
currently implemented for relative free-energy perturbations or
for applications in the Docking module of Insight.

Atom types
The atom types of the CFF forcefield are listed in the separate
documentation for CFF (below).

Additional information
Additional information on CFF, which is sold as a separately
licensed product, is contained in the MSI Forcefields:CFF book
(published separately by MSI).


PCFF forcefield for polymers and other materials


Applicability
PCFF was developed based on CFF91 and is intended for application
to polymers and organic materials. It is useful for polycarbonates,
melamine resins, polysaccharides, other polymers, organic
and inorganic materials, and about 20 inorganic metals, as well as
for carbohydrates, lipids, and nucleic acids. It can also be used
to calculate cohesive energies, mechanical properties,
compressibilities, heat capacities, and elastic constants. It
handles electron delocalization in aromatic rings by means of a
charge library rather than bond increments.

Validation
Parameterization, testing, and validation of PCFF included the
compounds listed for CFF91 and these functional groups: carbonates,
carbamates, phosphazene, urethanes, siloxanes, silanes,
ureas (Sun et al. 1994, Sun 1994, 1995), and zeolites (Hill and Sauer
1994). Metal parameters (listed below) were derived by fitting to
crystal structures and elastic constants.

Atom types
PCFF has parameters for functional groups that consist of those
listed for CFF91 and also He, Ne, Kr, Xe. In addition, it includes
Lennard–Jones parameters for the metals Li, K, Cr, Mo, W, Fe, Ni,
Pd, Pt, Cu, Ag, Au, Al, Sn, Pb. Atom type coverage in PCFF
includes those listed for CFF91 (Table 27) and the atoms listed
here.

COMPASS forcefield for organic and inorganic materials


A high quality general forcefield
COMPASS (Condensed-phase Optimized Molecular Potentials for
Atomistic Simulation Studies) represents a technology
breakthrough in forcefield methods. It is the first ab initio
forcefield that enables accurate and simultaneous prediction of
gas-phase properties (structural, conformational, vibrational, etc.)
and condensed-phase properties (equation of state, cohesive
energies, etc.) for a broad range of molecules and polymers. It is
also the first high quality forcefield to consolidate parameters of
organic and inorganic materials.
Parameterization
COMPASS is an ab initio forcefield: most parameters were
derived based on ab initio data. Generally speaking, the
parameterization procedure can be divided into two phases: ab initio
parameterization and empirical optimization. In the first phase,
partial charges and valence parameters were derived by fitting to
ab initio potential energy surfaces. At this point, the van der Waals
parameters were fixed to a set of initial approximated parameters.
In the second phase, the emphasis was on optimizing the forcefield
to yield good agreement with experimental data. A few critical
valence parameters were adjusted based on the gas-phase
experimental data. More importantly, the van der Waals parameters
were optimized to fit the condensed-phase properties. For covalent
molecular systems, this refinement was based on molecular
dynamics simulations of liquids; for inorganic systems, it was
based on energy minimization of crystals.
Validation
The parameters for covalent molecules have been thoroughly
validated using various calculation methods, including extensive MD
simulations of liquids, crystals, and polymers (Sun 1998, Sun et
al. 1998, Rigby et al. 1998). For the inorganic materials, COMPASS
was validated using energy minimization.

Applicability
The COMPASS forcefield has broad coverage in covalent molecules,
including most common organics, small inorganic molecules,
and polymers. For these molecular systems, the COMPASS
forcefield has been parameterized to predict various properties for
molecules in isolation and in condensed phases. The properties
include molecular structures, vibrational frequencies,
conformation energies, dipole moments, liquid structures, crystal
structures, equations of state, and cohesive energy densities. The
latest development in COMPASS extended the coverage to include
inorganic materials (metals, metal oxides, and metal halides) using
various non-covalent models. Currently, some of these materials
have been parameterized. COMPASS is able to predict various
solid-state properties: unit cell structures, lattice energies, elastic
constants, and vibrational frequencies. The combination of
parameters for organics and for inorganics opens up the possibility
of future study of interfacial and mixed systems.

License
The COMPASS forcefield is licensed and related files are
encrypted. You must have a license for this forcefield in order to
use it. The parameters of encrypted forcefields may be viewed
with the forcefield editor, but it is not possible to save changes
made to the forcefield.

More information
For more information about the COMPASS forcefield, please see
the COMPASS user guide.


MMFF93, the Merck molecular forcefield


The Merck molecular forcefield is derived largely from ab initio
calculations and can be accurately applied to a variety of con-
densed-phase and aqueous systems. It uses a unique functional
form for describing the van der Waals interactions (Halgren 1992)
and employs novel combination rules that systematically correlate
van der Waals parameters with those that describe experimentally
characterized interactions involving rare-gas atoms. Electrostatic
interactions are scaled to mimic solution effects.
Applicability Conformational energies, geometries, and vibrational frequencies
of small organic molecules.
Functional form The MMFF93 energy expression is similar to that of MM2 and
MM3:

\[
E_{\mathrm{MMFF}} = \sum E_{b_{ij}} + \sum E_{a_{ijk}} + \sum E_{ba_{ijk}} + \sum E_{oop_{ijk;l}} + \sum E_{t_{ijkl}} + \sum E_{vdW_{ij}} + \sum E_{q_{ij}} \qquad \text{Eq. 10}
\]

Where:
$E_{b_{ij}}$ Quartic bond stretching term.
$E_{a_{ijk}}$ Cubic angle bending term (cosine when the reference angle is ≅180°).
$E_{ba_{ijk}}$ Stretch–bend crossterm.
$E_{oop_{ijk;l}}$ Term for out-of-plane motion at tri-coordinate centers, using the Wilson definition of the out-of-plane angle.
$E_{t_{ijkl}}$ Torsion twisting term.
$E_{vdW_{ij}}$ Buffered 14–7 van der Waals interaction term.
$E_{q_{ij}}$ Buffered Coulombic term for electrostatic interactions. Use of a distance-dependent dielectric “constant” is supported.

To allow straightforward application to condensed-phase simulations
employing implicit solvent molecules, MMFF93 includes a
dielectric constant in its electrostatic interaction terms.
Charges are implemented via bond increments (similar to CVFF or
the CFF family of forcefields) that are included as part of the
forcefield.
Missing parameters are supplied via a generic step-down and
equivalency typing scheme (see Preparing the Energy Expression and
the Model).
The terms of the energy expression are calculated in kcal mol^-1.
They are described in detail by Halgren (1992, 1996a–d) and by
Halgren & Nachbar (1996).
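As an illustration of the buffered 14–7 van der Waals term, the sketch below evaluates the functional form published by Halgren (1992). The expression and its buffering constants (0.07 and 0.12) are taken from that publication, not from this guide; the function name and the example values are hypothetical and are not MMFF93 parameters.

```python
def buffered_14_7(r, r_star, eps):
    """Buffered 14-7 van der Waals energy (form published by Halgren, 1992).

    r      : interatomic distance
    r_star : distance at the energy minimum
    eps    : well depth (the energy at r == r_star is -eps)
    """
    # The constants 0.07 and 0.12 "buffer" the two factors so that the
    # energy stays finite as r approaches zero.
    repulsive = (1.07 * r_star / (r + 0.07 * r_star)) ** 7
    attractive = 1.12 * r_star ** 7 / (r ** 7 + 0.12 * r_star ** 7) - 2.0
    return eps * repulsive * attractive


# Illustrative (made-up) parameters: minimum at 4.0, well depth 0.1
for distance in (0.0, 3.0, 4.0, 6.0):
    print(distance, buffered_14_7(distance, r_star=4.0, eps=0.1))
```

Unlike a Lennard–Jones potential, this form remains finite at zero separation, which is the practical effect of the buffering.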

Rule-based forcefields broadly applicable to the periodic table
Availability Rule-based forcefields provided by MSI with generally broad
applicability across the periodic table include ESFF, the Universal
forcefield, and the Dreiding forcefield:
♦ The ESFF forcefield (ESFF, extensible systematic forcefield) is run
via the CDiscover program, which is available in the
Insight•Discover_3 module.
♦ UFF-VALBOND (VALBOND) is available via the Cerius2•Open
Force Field module.
♦ The Universal (UFF, universal forcefield) and Dreiding (Dreiding
forcefield) forcefields are accessible through the Cerius2•Open
Force Field module.
Parameterization Although the second-generation (Second-generation forcefields accu-
rate for many properties) and classical (Classical forcefields) forcefields
derive the forcefield parameters by fitting ab initio and/or exper-
imental data sets, these rule-based forcefields rely on atomic param-
eters coupled with theoretically and empirically derived rules for
generating explicit forcefield parameters. The rules embody phys-
ical reality (electronegativity, hardness, atomic radii for UFF and
ESFF, simple hybridization for Dreiding) and therefore tend to
break redundancies and guarantee transferability. As much as pos-
sible, the atomic parameters are directly determined from experi-
ment or calculated rather than fit.

Characteristics ESFF can be used for structure prediction of organic, inorganic,
and organometallic systems in gas or condensed phases. It covers
all elements in the periodic table up to Rn. Its scope does not
extend to highly accurate vibrational frequencies or conformational
energies (Shi et al., no date).
UFF covers all elements in the periodic table and is the default
forcefield in Cerius2. It is recommended for any system that is not
covered by the more accurate special-purpose forcefields. It gives
better structures than the Dreiding forcefield but may not be as
accurate for properties that depend on intermolecular interactions.
In the VALBOND formalism, hybrid orbital strength functions are
used as the basis for a molecular mechanics expression of molecular shapes.
These functions are suitable for accurately describing the energet-
ics of distorted bond angles not only around the energy minimum,
but also for very large distortions.
The Dreiding forcefield predicts bulk material properties that
depend on intermolecular interactions better than does UFF, but it
is not as widely applicable to the periodic table. It is not as accurate
as the special-purpose forcefields for the materials for which they
are applicable.

ESFF, extensible systematic forcefield

Derivation
ESFF was derived using a mixture of DFT calculations on dressed
atoms to obtain polarizabilities, gas-phase and crystal structures,
etc. The training set included primarily organic and organometal-
lic compounds and a few inorganic compounds. The focus was on
crystal structures and sublimation energies. The training set
included models containing each element in the first 6 periods up
to lead (Z = 82) (except for the inert gases), Sr, Y, Tc, La, and the
lanthanides (except for Yb).
Parameters and charges are generated on-the-fly, based on the
model configuration, the local environment, and the derived rules.

Functional form
Valence energy The analytic energy expressions for the ESFF forcefield are pro-
vided in Eq. 11. Only diagonal terms are included.
Bond energy The bond energy is represented by a Morse functional form, where
the bond dissociation energy D, the reference bond length r0, and
the anharmonicity parameters are needed. In constructing these
parameters from atomic parameters, the forcefield utilizes not
only the atom types and bond orders, but also considers whether
the bond is endo or exo to 3-, 4-, or 5-membered rings.
The rules themselves depend on the electronegativity, hardness,
and ionization of the atoms as well as atomic anharmonicities and
the covalent radii and well depths. The latter quantities are fit
parameters, and the former three are calculated.
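The Morse form described above can be sketched as follows. This is a minimal illustration using the quantities named in the text (dissociation energy D, reference bond length r0, and an anharmonicity parameter, here called alpha); the function name and the numerical values are hypothetical and are not ESFF parameters.

```python
import math


def morse_bond_energy(r, d, r0, alpha):
    """Morse bond energy: d * (1 - exp(-alpha * (r - r0)))**2.

    r     : current bond length
    d     : bond dissociation energy
    r0    : reference bond length
    alpha : anharmonicity parameter (controls the width of the well)
    """
    return d * (1.0 - math.exp(-alpha * (r - r0))) ** 2


# Illustrative (made-up) values: the energy is zero at r0 and
# approaches d as the bond is stretched toward dissociation.
for r in (1.2, 1.5, 2.0, 10.0):
    print(r, morse_bond_energy(r, d=100.0, r0=1.5, alpha=2.0))
```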


\[
\begin{aligned}
E_{\mathrm{pot}} ={}& \sum_{b} D_b \left[ 1 - e^{-\alpha (r_b - r_b^0)} \right]^2 && (1) \\
&+ \sum_{a}
\begin{cases}
\dfrac{K_a}{\sin^2 \theta_a^0} \left( \cos\theta_a - \cos\theta_a^0 \right)^2 & \text{(normal)} \\
2K_a \left( \cos\theta_a + 1 \right) & \text{(linear)} \\
K_a \cos^2 \theta_a & \text{(perpendicular)} \\
\dfrac{2K_a}{n^2} \left( 1 - \cos(n\theta_a) \right) + 2K_a\, e^{-\beta (r_{13} - \rho_a)} & \text{(equatorial)}
\end{cases} && (2) \\
&+ \sum_{\tau} D_\tau \left[ \frac{\sin^2\theta_1 \sin^2\theta_2}{\sin^2\theta_1^0 \sin^2\theta_2^0} + \mathrm{sign}\, \frac{\sin^n\theta_1 \sin^n\theta_2}{\sin^n\theta_1^0 \sin^n\theta_2^0} \cos(n\tau) \right] && (3) \\
&+ \sum_{o} D_o \chi^2 && (4) \\
&+ \sum_{nb} \left[ \frac{A_i B_j + A_j B_i}{r^9} - 3\,\frac{B_i B_j}{r^6} \right] && (5) \\
&+ \sum_{nb} \frac{q_i q_j}{r} && (6)
\end{aligned}
\qquad \text{Eq. 11}
\]

Angle types The ESFF angle types are classified according to ring, symmetry,
and π-bonding information into five groups:
♦ The normal class includes unconstrained angles as well as those
associated with 3-, 4-, and 5-membered rings. The ring angles
are further classified based on whether one (exo) or both bonds
(endo) are in the ring. Additionally, angles with only central
atoms in a ring are also differentiated.
♦ The linear class includes angles with central atoms having sp
hybridization, as well as angles between two axial ligands in a
metal complex.

♦ The perpendicular class is restricted to metal centers and
includes angles between axial and equatorial ligands around a
metal center.
♦ The equatorial class includes angles between equatorial ligands
of square planar (sqp), trigonal bipyramidal (tbp), octahedral
(oct), pentagonal bipyramidal (pbp), and hexagonal bipyrami-
dal (hbp) systems.
♦ The π system class includes angles between pseudoatoms. This
class is further differentiated in terms of normal, linear, perpen-
dicular, and equatorial types.
The rules that determine the parameters in the functional forms
depend on the ionization potential and, for equatorial angles, the
periodicity. In addition to these calculated quantities, the parame-
ters are functions of the atomic radii and well depths of the central
and end atoms of the angle, and, for planar angles, two overlap
quantities and the 1–3 equilibrium distances.
Torsions To avoid the discontinuities that occur in the commonly used
cosine torsional potential when one of the valence angles
approaches 180°, ESFF uses a functional form that includes the
sine of the valence angles in the torsion. These terms ensure that
the function goes smoothly to zero as either valence angle
approaches 0° or 180°, as it should. The rules associated with this
expression depend on the central bond order, ring size of the
angles, hybridization of the atoms, and two atomic parameters for
the central atoms, which are fit.
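The behavior described above can be checked numerically. The sketch below assumes the torsion term has the sine-weighted form of term (3) in Eq. 11; the function and argument names are hypothetical, angles are in radians, and the numerical values are illustrative only.

```python
import math


def esff_torsion_energy(tau, theta1, theta2, theta1_0, theta2_0,
                        d_tau, n, sign):
    """Sine-weighted torsion energy, following term (3) of Eq. 11.

    tau                : torsion angle
    theta1, theta2     : the two valence angles that share the central bond
    theta1_0, theta2_0 : reference values of those valence angles
    d_tau, n, sign     : barrier parameter, periodicity, and phase sign
    """
    s1, s2 = math.sin(theta1), math.sin(theta2)
    s1_0, s2_0 = math.sin(theta1_0), math.sin(theta2_0)
    constant_part = (s1 * s2) ** 2 / (s1_0 * s2_0) ** 2
    periodic_part = (sign * (s1 * s2) ** n / (s1_0 * s2_0) ** n
                     * math.cos(n * tau))
    return d_tau * (constant_part + periodic_part)


# As one valence angle opens toward 180 degrees the torsion energy
# vanishes smoothly, whatever the value of tau.
for degrees in (110.0, 150.0, 170.0, 180.0):
    theta = math.radians(degrees)
    print(degrees, esff_torsion_energy(0.5, theta, 1.9, 1.9, 1.9,
                                       d_tau=5.0, n=3, sign=1))
```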
Out-of-plane centers The functional form of the out-of-plane energy is the same as in
CFF91, where the coordinate (φ) is an average of the three possible
angles associated with the out-of-plane center. The single parame-
ter that is associated with the central atom is a fit quantity.

Nonbond energy
Partial charges The charges are determined by minimizing the electrostatic energy
with respect to the charges under the constraint that the sum of the
charges is equal to the net charge on the molecule. This is equiva-
lent to equalization of electronegativities.
Derivation The derivation of the rule begins with the following equation for
the electrostatic energy:

\[
E = \sum_{i} \left( E_i^0 + \chi_i q_i + \frac{1}{2} \eta_i q_i^2 \right) + B \sum_{i>j} \frac{q_i q_j}{R_{ij}} \qquad \text{Eq. 12}
\]

where χ is the electronegativity and η the hardness. The first term
is just a Taylor series expansion of the energy of each atom as a
function of charge, and the second is the Coulomb interaction law
between charges. The Coulomb law term introduces a geometry
dependence that ESFF for the time being ignores, by considering
only topological neighbors at effectively idealized geometries.
Atomic charges Minimizing the energy with respect to the charges leads to the fol-
lowing expression for the charge on atom i:

\[
q_i = \frac{\lambda - \chi_i - \Delta\chi_i}{\eta_i} \qquad \text{Eq. 13}
\]

where λ is the Lagrange multiplier for the constraint on the total
charge, which physically is the equalized electronegativity of all
the atoms. The ∆χ term contains the geometry-independent remnant
of the full Coulomb summation.
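The constrained minimization can be made concrete with a short sketch. It applies Eq. 13 directly and fixes λ by requiring the charges to sum to the net charge, which gives λ = (Q + Σ(χ_i + ∆χ_i)/η_i) / Σ(1/η_i). The function name, the treatment of ∆χ as a supplied list, and the example values are assumptions made for illustration; this is not the ESFF implementation.

```python
def equalized_charges(chi, eta, net_charge=0.0, delta_chi=None):
    """Charges from electronegativity equalization (Eq. 13).

    chi        : electronegativity of each atom
    eta        : hardness of each atom
    net_charge : total charge Q that the atomic charges must sum to
    delta_chi  : optional per-atom correction (defaults to zero)
    Returns (lam, charges), where lam is the equalized electronegativity.
    """
    if delta_chi is None:
        delta_chi = [0.0] * len(chi)
    shifted = [c + d for c, d in zip(chi, delta_chi)]
    # Solve sum_i (lam - shifted_i) / eta_i = net_charge for lam.
    inverse_hardness_sum = sum(1.0 / e for e in eta)
    lam = (net_charge
           + sum(s / e for s, e in zip(shifted, eta))) / inverse_hardness_sum
    charges = [(lam - s) / e for s, e in zip(shifted, eta)]
    return lam, charges


# Made-up two-atom example: the more electronegative atom ends up negative.
lam, q = equalized_charges(chi=[2.0, 3.0], eta=[10.0, 10.0])
print(lam, q)
```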
Adjustment to chemical reality Eqns. 12 and 13 give a totally delocalized picture of the charges in
a relatively severe approximation. To obtain reasonable charges as
judged by, for example, crystal packing calculations, some modifi-
cations to the above picture have been made. Metals and their
immediate ligands are treated with the above prescription, sum-
ming their formal charges to get a net fragment charge. Delocal-
ized π systems are treated in an analogous fashion. And σ systems
are treated using a localized approach in which the charges of an
atom depend simply on its neighbors. Note that this approach,
unlike the straightforward implementations based on the equal-
ization of electronegativity, does include some resonance effects in
the π system.
Electronegativity and hardness obtained by DFT The electronegativity and hardness in the above equations must be
determined. In earlier forcefields they were often determined from
experimental ionization potentials and electron affinities; however,
these spectroscopic states do not correspond to the valence
states involved in molecules. For this reason, ESFF is based on
electronegativities and hardnesses calculated using density functional
theory as implemented in DMol. The orbitals are (fractionally)
occupied in ratios appropriate for the desired hybridization
state, and calculations are performed on the neutral atom as well
as on positive and negative ions.
van der Waals interactions ESFF uses the 6–9 potential for the van der Waals interactions.
Because the van der Waals parameters must be consistent with the
charges, the rules used to derive them take the charges into
account.
Derivation Starting with the London formula:

\[
B_i \sim \alpha_i^2 \cdot IP \qquad \text{Eq. 14}
\]

where α is the polarizability and IP the ionization potential of the
atoms, the polarizability, in a simple harmonic approximation, is
proportional to n / IP where n is the number of electrons. Across
any one row of the periodic table, the core electrons remain
unchanged, so that the following form is reasonable:

\[
\alpha = \frac{a'}{IP} + \frac{b'\, n_{\mathrm{eff}}}{IP} \qquad \text{Eq. 15}
\]

where a′ and b′ are adjustable parameters that should depend on
just the period, and n_eff is the effective number of (valence) electrons.
Further assuming that α is proportional to R^3 and that
another equivalent expression to that in Eq. 14 is:

\[
B_i \sim \varepsilon R^6 \qquad \text{Eq. 16}
\]

where ε is a well depth, the following forms are deduced for the
rules for van der Waals parameters:
Rules for van der Waals parameters

\[
R_i = \frac{a}{(IP)^{1/3}} + \frac{b \cdot n_{\mathrm{eff}}^{1/3}}{(IP)^{1/3}} \quad \text{and} \quad \varepsilon_i = c\,(IP) \qquad \text{Eq. 17}
\]
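The step from Eqs. 14 to 16 to the rules in Eq. 17 can be made explicit. The sketch below only combines the proportionalities stated in the text; the constants k and c are placeholders introduced here for illustration.

```latex
% Assumption stated in the text: the polarizability scales with the volume
\alpha = k R^{3}

% Equating the two expressions for B_i (Eq. 14 and Eq. 16):
\varepsilon R^{6} \sim \alpha^{2} \cdot IP = k^{2} R^{6} \cdot IP
\quad\Longrightarrow\quad
\varepsilon_i = c\,(IP)

% Inserting Eq. 15 into \alpha \propto R^3 gives the radius:
R^{3} \propto \alpha = \frac{a' + b'\, n_{\mathrm{eff}}}{IP}
\quad\Longrightarrow\quad
R \propto \frac{\left(a' + b'\, n_{\mathrm{eff}}\right)^{1/3}}{(IP)^{1/3}}
```

This accounts for the (IP)^(1/3) denominators in the radius rule and for the linear dependence of the well depth on IP; the rule for R_i then expresses the radius with its own adjustable constants a and b.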

The van der Waals parameters are affected by the charge of the
atom.
Modification for metal atoms In ESFF we found it sufficient to modify the ionization potential
(IP) of metal atoms according to their formal charge and hardness:

\[
IP = (IP)_0 + q\,\eta_i \qquad \text{Eq. 18}
\]

Treatment of nonmetals and for nonmetals to account for the partial charges when calculating
the effective number of electrons.

ESFF atom types


ESFF atom types (Table 32) are determined by hybridization, for-
mal charge, and symmetry rules (Atom-typing rules in ESFF). In
addition, the rules may involve bond order, ring size, and whether
bonds are endo or exo to rings. For metal ligands the cis–trans and
axial–equatorial positionings are also considered. These additional
distinctions affect only certain parameters (for example,
bond order influences only bond parameters) and are thus not as
powerful as complete atom types. In one sense they provide a further
refinement of typing beyond atom types.
Coverage of the periodic table The ESFF forcefield has been parameterized to handle all elements
in the periodic table up to radon. It is recommended for organometallic
systems and other systems for which other forcefields do not
have parameters. ESFF is designed primarily for predicting reasonable
structures (both intra- and intermolecular structures and
crystals) and should give reasonable structures for organic, biological,
organometallic and some ceramic and silicate models. It has
been used with some success for studying interactions of molecules
with metal surfaces. Predicted intermolecular binding energies
should be considered approximate.

UFF, universal forcefield


Cerius2 contains a full implementation of the Universal forcefield,
including bond order assignment. The Cerius2 implementation
has been rigorously tested and results are in agreement with pub-
lished work on this forcefield (Rappé et al. 1992, Casewit et al.
1992a, b, Rappé et al. 1993).
Parameter generation is based on physically realistic rules.
Functional form UFF is a purely diagonal, harmonic forcefield. Bond stretching is
described by a harmonic term, angle bending by a three-term Fou-
rier cosine expansion, and torsions and inversions by cosine–Fou-
rier expansion terms. The van der Waals interactions are described
by the Lennard–Jones potential. Electrostatic interactions are
described by atomic monopoles and a screened (distance-depen-
dent) Coulombic term.

Atom types The Universal forcefield’s atom types are denoted by an element
name of one or two characters followed by up to three other char-
acters:
♦ The first two characters are the element symbol (for example,
N_ for nitrogen or Ti for titanium).
♦ The third character (if present) represents the hybridization
state or geometry (for example, 1 = linear, 2 = trigonal, R = an
atom involved in resonance, 3 = tetrahedral, 4 = square planar,
5 = trigonal bipyramidal, 6 = octahedral).
♦ The fourth and fifth characters (if present) indicate characteris-
tics such as the oxidation state (for example, Rh6+3 represents
octahedral rhodium in the +3 formal oxidation state; H_ _ _b
indicates a diborane bridging hydrogen type; and O_3_z is a
framework oxygen type suitable for zeolites).
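The naming rules above can be sketched as a small parser. This is an illustration only: the function name, the returned dictionary layout, and the handling of underscore padding are assumptions, and only the geometry codes listed in the text are recognized.

```python
# Geometry / hybridization codes listed in the text for the third character.
GEOMETRY = {
    "1": "linear",
    "2": "trigonal",
    "R": "resonant",
    "3": "tetrahedral",
    "4": "square planar",
    "5": "trigonal bipyramidal",
    "6": "octahedral",
}


def parse_uff_atom_type(label):
    """Split a UFF atom-type label into its documented parts."""
    # Characters 1-2: element symbol, padded with "_" for one-letter symbols.
    element = label[:2].rstrip("_")
    # Character 3: hybridization or geometry code ("_" or absent means none).
    geometry = GEOMETRY.get(label[2:3])
    # Characters 4-5: other characteristics, such as the oxidation state.
    extra = label[3:5].strip("_") or None
    return {"element": element, "geometry": geometry, "extra": extra}


for name in ("N_3", "Ti", "Rh6+3", "O_3_z", "H___b"):
    print(name, parse_uff_atom_type(name))
```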
Coverage of the periodic table UFF has full coverage of the periodic table. UFF is moderately
accurate for predicting geometries and conformational energy differences
of organic molecules, main-group inorganics, and metal
complexes. It is recommended for organometallic systems and
other systems for which other forcefields do not have parameters.
Parameterization The Universal forcefield includes a parameter generator that calcu-
lates forcefield parameters by combining atomic parameters.
Thus, forcefield parameters for any combination of atom types can
be generated as required.
The atomic parameters are combined using a prescribed set of
equations (rules) that generate forcefield parameters for bond,
angle, torsion, inversion (i.e., out-of-plane), and van der Waals and
Coulombic energy terms. For further details, including the gener-
ator equations, see Rappé et al. (1992).
Dummy atoms are used in π-complexation and are associated with
explicit parameters.

Important
To obtain correct results when using UFF, calculate fractional
bond orders after atom typing the structure and before setting
up the energy expression. Cerius2 does this correctly by default,
and you need not worry about it unless you change the default
behavior.

Charges in the Universal forcefield The Universal forcefield was developed in conjunction with the
charge equilibration (Rappé & Goddard 1991) method. Therefore
this method of electrostatic charge calculation is highly recommended
for use with the Universal forcefield. For more on the
charge equilibration calculation, see the documentation supplied
with Cerius2•OFF.
Versions UNIVERSAL1.02 is the most up-to-date, and recommended, ver-
sion of the UFF. It includes full bond-order correction. (UFF 1.02
differs from UFF 1.01 in that some explicit torsion parameters were
corrected and one of the oxygen atom-typing rules was modified.)
UFF-VALBOND is UFF with a different function for calculating the
angle energy, so most statements that are true for UFF are also true
for UFF-VALBOND.
The burchart1.01–UNIVERSAL1.02 forcefield combines UFF with
the Burchart forcefield. See burchart1.01-UNIVERSAL1.02 for more
information.

VALBOND

Introduction
Most molecular mechanics methods attempt to describe accurate
potential energy surfaces by using a variant of the general valence
forcefield, and a large number of parameters. These simple force-
fields are not accurate outside the proximity of the energetic min-
ima and often are difficult to apply to the different shapes and
higher coordination numbers of transition metal complexes.
In the VALBOND formalism, hybrid orbital strength functions are
used as the basis for a molecular mechanics expression of molecular shapes.
These functions are suitable for accurately describing the energet-
ics of distorted bond angles not only around the energy minimum,
but also for very large distortions.
The combination of these functions with simple valence bond
ideas leads to a simple scheme for predicting molecular shapes.
Structures and vibrational frequencies calculated by the VAL-
BOND method agree well with experimental data for a variety of
molecules from the main group of the periodic table.

UFF-VALBOND is a combination of the original VALBOND
method described by Root et al. (1993), augmented with non-orthogonal
strength functions taken from Root (1997), and the
Universal Forcefield of Rappé et al. (1992).

Validation
Although the original VALBOND was developed for use with the
CHARMM forcefield (Brooks et al., 1983), the table below shows
that the quality of the new UFF-VALBOND forcefield is compara-
ble to the original, and similar to popular forcefields.

Table 4

Molecule Item Calc Ref Diff Exp Ref-Exp Calc-Exp
Ethane H-C-H 108.7 108.3 0.4 107.5 0.8 1.2
Ethane C-C-H 110.2 110.6 -0.4 111.2 -0.6 -1.0
Propane C-C-C 112.9 112.0 0.9 112.0 0.0 0.9
Propane H-C-H 107.6 107.2 0.4 107.8 -0.6 -0.2
Butane C-C-C 112.9 112.0 0.9 113.3 -1.3 -0.4
Isobutane C-C-C 111.3 110.7 0.6 110.8 -0.1 0.5
Cyclopentane C2-C1-C5 104.4 103.7 0.7 103.0 0.7 1.4
Cyclopentane C1-C2-C3 106.0 104.9 1.1 104.2 0.7 1.8
Cyclopentane C2-C3-C4 106.8 106.4 0.4 105.9 0.5 0.9
Cyclohexane C-C-C 111.9 110.7 1.2 111.4 -0.7 0.5
Cyclohexane H-C-H 107.4 107.7 -0.3 107.5 0.2 -0.1
Methyl-Cyclohexane C-C-C(exo) 111.5 110.9 0.6 112.1 -1.2 -0.6
Norbornane C1-C2-C3 102.5 102.4 0.1 102.7 -0.3 -0.2
Norbornane C2-C1-C6 110.6 109.0 1.6 109.0 0.0 1.6
Norbornane C1-C6-C4 92.7 91.8 0.9 93.4 -1.6 -0.7
Ethylene C=C-H 120.6 120.9 -0.3 121.4 -0.5 -0.8
Propene C=C-C 122.9 122.8 0.1 124.3 -1.5 -1.4
Propene C=C-H 118.6 118.7 -0.1 121.3 -2.6 -2.7
Propene =C-C-H 110.3 110.4 -0.1 116.7 -6.3 -6.4
Cis-2-butene C-C=C 125.3 124.9 0.4 125.4 -0.5 -0.1
Cyclopentene C3-C2=C1 112.0 111.9 0.1 111.0 0.9 1.0
Cyclopentene C2-C3-C4 104.9 104.9 0.0 103.0 1.9 1.9
Cyclopentene C3-C4-C5 106.2 106.3 -0.1 104.0 2.3 2.2
Cyclohexene C-C=C 123.0 123.0 0.0 124.0 -1.0 -1.0
Cyclohexadiene C-C=C 122.8 123.2 -0.4 122.7 0.5 0.1
Cyclohexadiene C-C-C 114.4 113.6 0.8 113.3 0.3 1.1
Norbornene C1-C7-C4 91.8 93.4 -1.6 95.3 -1.9 -3.5
Norbornene C2-C2=C3 106.2 106.7 -0.5 107.7 -1.0 -1.5
Methanol O-C-H(trans) 111.6 112.8 -1.2 107.2 5.6 4.4
Methanol H-C-H 106.3 105.4 0.9 108.5 -3.1 -2.2
1,4-Dioxane C-C-O 112.8 112.5 0.3 109.2 3.3 3.6
1,4-Dioxane C-O-C 114.6 111.7 2.9 112.6 -0.9 2.0
Formaldehyde O-C-H 122.3 122.5 -0.2 121.8 0.7 0.5
Formaldehyde H-C-H 115.4 114.9 0.5 116.5 -1.6 -0.8
Acetaldehyde C-C-H 115.5 115.4 0.1 113.9 1.5 1.6
Acetone C-C-C 117.1 116.0 1.1 116.0 0.0 1.1
Acetone C-C=O 121.4 122.0 -0.6 122.0 0.0 -0.6
Methyl-Formate O-C=O 123.8 125.9 -2.1 125.9 0.0 -2.1
Methyl-Formate C-O-C 115.0 111.9 3.1 114.8 -2.9 0.2
Acetic Acid O-C=O 116.8 117.6 -0.8 126.6 -9.0 -9.8
Acetic Acid O-C-O 117.3 117.5 -0.2 110.6 6.9 6.7
Acetic Acid O-C=O 126.0 124.6 1.4 123.0 1.6 3.0
Methyl Acetate C-O-C 116.1 115.1 1.0 114.8 0.3 1.3
Piperazine C-C-N 110.9 110.8 0.1 109.8 1.0 1.1
Piperazine C-N-C 115.6 113.8 1.8 112.6 1.2 3.0
Nitromethane [N-C-H] 110.2 109.8 0.4 107.2 2.6 3.0
Succinamide N-C=O 119.6 121.4 -1.8 122.0 -0.6 -2.4
Succinamide C-C-N 116.9 114.0 2.9 116.0 -2.0 0.9
Succinamide C-C=O 123.6 124.5 -0.9 122.0 2.5 1.6
Acetamide C-C-N 117.6 117.5 0.1 115.1 2.4 2.5
BHF2 F-B-F 118.4 118.1 0.3 118.3 -0.2 0.1
BHCl2 Cl-B-Cl 121.1 119.6 1.5 119.7 -0.1 1.4
BF2NH2 F-B-F 119.6 119.9 -0.3 117.9 2.0 1.7
BF2NH2 H-N-H 114.4 115.0 -0.6 116.9 -1.9 -2.5
NH3 H-N-H 106.8 106.8 0.0 106.7 0.1 0.1
NCl3 Cl-N-Cl 106.5 106.5 0.0 107.1 -0.6 -0.6
NHCl2 H-N-Cl 106.8 106.7 0.1 102.0 4.7 4.8
NHCl2 Cl-N-Cl 106.1 106.1 0.0 106.0 0.1 0.1
NH2Cl H-N-H 107.1 107.1 0.0 106.8 0.3 0.3
NH2Cl H-N-Cl 106.4 106.3 0.1 102.0 4.3 4.4
N3- N-N-N 179.9 180.0 -0.1 180.0 0.0 -0.1
NClO Cl-N-O 114.6 114.2 0.4 113.3 0.9 1.3
PH3 H-P-H 93.8 93.8 0.0 93.3 0.5 0.5
PCl3 Cl-P-Cl 100.1 100.1 0.0 100.1 0.0 0.0
CH3PH2 C-P-H 93.1 97.1 -4.0 96.5 0.6 -3.4
CH3PH2 H-P-H 91.3 91.3 0.0 93.4 -2.1 -2.1
(CH3)2PH C-P-C 99.7 98.5 1.2 99.7 -1.2 0.0
(CH3)2PH C-P-H 91.2 95.6 -4.4 97.0 -1.4 -5.8
(CH3)3P C-P-C 97.0 95.6 1.4 98.6 -3.0 -1.6
AsH3 H-As-H 91.7 91.7 0.0 92.1 -0.4 -0.4
AsF3 F-As-F 96.0 96.0 0.0 96.0 0.0 0.0
AsCl3 Cl-As-Cl 98.7 98.7 0.0 98.6 0.1 0.1
AsBr3 Br-As-Br 99.6 99.6 0.0 99.7 -0.1 -0.1
AsI3 I-As-I 100.2 100.2 0.0 100.2 0.0 0.0
O3 O-O-O 116.8 116.8 0.0 116.8 0.0 0.0
(CH3)2O C-O-C 114.7 111.6 3.1 111.7 -0.1 3.0
(CH3)2S C-S-C 100.1 99.3 0.8 98.9 0.4 1.2
(CH3)2Se C-Se-C 96.9 96.2 0.7 96.0 0.2 0.9
(SiH3)2O Si-O-Si 142.6 142.7 -0.1 144.1 -1.4 -1.5
(SiH3)2S Si-S-Si 99.4 99.6 -0.2 97.4 2.2 2.0
(SiH3)2Se Si-Se-Si 97.9 98.0 -0.1 96.6 1.4 1.3
(GeH3)2O Ge-O-Ge 126.1 126.2 -0.1 126.5 -0.3 -0.4
(GeH3)2Se Ge-Se-Ge 93.9 94.1 -0.2 94.6 -0.5 -0.7
Si4O4C8H24 Si-O-Si 143.4 143.4 0.0 142.5 0.9 0.9
Si4O4C8H24 O-Si-O 112.1 112.4 -0.3 109.0 3.4 3.1
Si4O4C8H24 C-Si-C 107.4 107.3 0.1 106.0 1.3 1.4
Ge4S6(CF3)4 [S-Ge-S] 114.6 113.8 0.8 113.8 0.0 0.8
Ge4S6(CF3)4 [Ge-S-Ge] 97.9 99.9 -2.0 99.9 0.0 -2.0
Sn6Ph12 [Sn-Sn-Sn] 112.8 112.4 0.4 112.5 -0.1 0.3
Sn6Ph12 C-Sn-C 107.0 105.5 1.5 106.7 -1.2 0.3
B3Ph3O3 [B-O-B] 121.8 121.8 0.0 121.7 0.1 0.1
B3Ph3O3 [O-B-O] 118.2 118.2 0.0 118.0 0.2 0.2
PO4P3O3 O1-P1-O2 114.1 114.1 0.0 115.0 -0.9 -0.9
PO4P3O3 O2-P1-O4 104.5 104.5 0.0 103.0 1.5 1.5
PO4P3O3 O2-P2-O3 99.4 99.3 0.1 99.0 0.3 0.4
PO4P3O3 P2-O2-P1 121.7 122.9 -1.2 124.0 -1.1 -2.3
PO4P3O3 P2-O3-P3 128.8 127.3 1.5 128.0 -0.7 0.8
Ga2Pyr2Br4 Br-Ga-Br 107.2 107.0 0.2 105.8 1.2 1.4
Ga2Pyr2Br4 Ga-Ga-Br 114.3 115.1 -0.8 116.3 -1.2 -2.0
As3(CH3)6In6(CH3)6 C-In-C 101.7 101.9 -0.2 99.0 2.9 2.7
As3(CH3)6In6(CH3)6 C-As-C 125.6 124.7 0.9 126.0 -1.3 -0.4

⟨|x|⟩ (mean absolute value of Diff, Ref-Exp, Calc-Exp) 0.68 1.28 1.52

Calc: calculated with UFF-VALBOND
Ref: taken from Root et al. (1993)
Exp: experimental values
Diff: Calc minus Ref
All values are angles in degrees.
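The three difference columns of Table 4 can be reproduced from the Calc, Ref, and Exp values. The sketch below reads them as Calc minus Ref, Ref minus Exp, and Calc minus Exp, an interpretation inferred from the tabulated numbers, and treats the summary row as a mean absolute value. The function names are hypothetical, and only the first three rows of the table are used.

```python
def deviation_columns(calc, ref, exp):
    """Return (Calc-Ref, Ref-Exp, Calc-Exp), rounded to 0.1 degree."""
    return (round(calc - ref, 1), round(ref - exp, 1), round(calc - exp, 1))


def mean_abs(values):
    """Mean absolute value of a sequence of deviations."""
    return sum(abs(v) for v in values) / len(values)


# (Calc, Ref, Exp) for the first three rows of Table 4
rows = [
    (108.7, 108.3, 107.5),  # Ethane  H-C-H
    (110.2, 110.6, 111.2),  # Ethane  C-C-H
    (112.9, 112.0, 112.0),  # Propane C-C-C
]

deviations = [deviation_columns(*row) for row in rows]
for row, dev in zip(rows, deviations):
    print(row, dev)

# Mean absolute Calc-Ref deviation over this three-row subset only;
# the summary row of the table is taken over all rows.
print(mean_abs([dev[0] for dev in deviations]))
```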

Applicability
UFF-VALBOND can be used for compounds containing elements
from across the periodic table.

Because the angular energy function is based on hybridization
considerations, improved results are expected for non-hypervalent
complexes for which the molecular shape is not a priori known.
Hypervalent molecules are molecules that contain atoms with more
occupied orbitals than there are valence orbitals. Thus, for a nor-
mal valent atom of the p-block, only the valence s and p orbitals are
occupied, and molecules are hypervalent if the electron count
around such an atom exceeds eight.
For transition elements engaged in covalent bonding, only
the valence s and the five d orbitals participate in bonding. For
these compounds the molecule is hypervalent if the electron count
around any central atom exceeds 12. Most common transition element
complexes are hypervalent.
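The two electron-count criteria above can be written as a simple check. This is a sketch only: the function name and the "p"/"d" block labels are hypothetical, and the electron count around the atom is assumed to be supplied by the caller.

```python
# Electron-count limits stated in the text: 8 for p-block atoms
# (valence s and p orbitals), 12 for transition elements (s and five d).
ELECTRON_LIMIT = {"p": 8, "d": 12}


def is_hypervalent(electron_count, block):
    """Return True if the electron count exceeds the limit for the block."""
    if block not in ELECTRON_LIMIT:
        raise ValueError("block must be 'p' or 'd'")
    return electron_count > ELECTRON_LIMIT[block]


print(is_hypervalent(8, "p"))    # an octet at a p-block atom
print(is_hypervalent(10, "p"))   # ten electrons around a p-block atom
print(is_hypervalent(18, "d"))   # an 18-electron transition element complex
```

An atom flagged in this way is a candidate for being declared a non-VALBOND center.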
For compounds containing hypervalent atoms, the forcefield analysis
yields results similar to those of the Universal Forcefield, provided
that the hypervalent atoms are declared as non-VALBOND
centers.
The current version of Cerius2 cannot differentiate between hyper-
valent and non-hypervalent species, and the onus is thus on the
user to correctly assign VALBOND centers for hypervalent mod-
els. By default the UFF-VALBOND forcefield will assign all atoms
bonded to two or more atoms to be a VALBOND center. This will
give correct results for the majority of organic compounds.
Depending on the topology Cerius2 may assign gross hybridiza-
tion (see Assigning gross hybridization) to hypervalent atoms, but
the energy calculated by OFF may be incorrect.

Assigning VALBOND centers


If the model contains hypervalent atoms, VALBOND centers
should be assigned by hand. This may be accomplished as follows:
1. Load UFF_VALBOND1.01 via the Open Force Field card
2. From the Open Force Field card select Energy Expression -
Automate Setup
3. Disable the Perform Valbond Initialization option
4. From the Open Force Field card select Typing - VALBOND
centers
5. Select atoms from the model window and set the appropriate
VALBOND centers.

Note
It is possible to assign VALBOND centers with any forcefield
loaded. However for the angle energy function to be used, the
forcefield must be specifically defined to use the VALBOND
bend energy function. Currently only the UFF_VALBOND1.01
forcefield uses this function.

Assigning gross hybridization


The gross hybridization of an atom is equal to the number of occupied
orbitals minus one. For example, in an sp^3 carbon it is three.
Cerius2 will assign values for the gross hybridization to VAL-
BOND centers based on the hybridization of the atoms as stored in
the data model.
For p-block elements the gross hybridization is of the form sp^n. For
d-block elements VALBOND assumes a hybridization of the form
sd^m.
In certain cases, e.g., for some hypervalent atoms, Cerius2 may not be
able to assign a gross hybridization to all atoms, or occasionally,
the user may wish to override the assigned values. Gross hybridizations
may be assigned manually as follows:
1. Disable automatic VALBOND initialization as described in
Assigning VALBOND centers.
2. From the Open Force Field card select Typing - VALBOND
centers.
3. Click on the Gross Hybridizations button.
4. Gross hybridizations may be assigned from the panel that pops
up. For each VALBOND center a hybridization of the form
sp^n d^m may be assigned. The values for n and m need not be
integers.

Bond Hybridization
Bond hybridization is the net hybridization of an individual bond
connected to a VALBOND center. The hybridization is a function
of the nature of the VALBOND center, its gross hybridization, and
the number and nature of all the ligands connected to the VALBOND
center.
For example, the gross hybridization of the N atom in ammonia,
NH3, equals three (sp^3).
The bond hybridization of the N-H bonds is calculated as sp^3.47.
The gross hybridization of O in water is three, and the bond
hybridization of the O-H bond is sp^3.87. Thus VALBOND calculates
these bonds to have more p character than the C-H bond in,
say, methane, where both gross and bond hybridizations are exactly
sp^3.
Bond hybridizations are calculated when VALBOND centers are
assigned.
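The statement about p character can be quantified with the standard relation for an sp^n hybrid, in which the fraction of p character is n / (1 + n). The function name is hypothetical, and the hybridization exponents are the ones quoted above for methane, ammonia, and water.

```python
def p_character(n):
    """Fraction of p character in an sp^n hybrid orbital: n / (1 + n)."""
    return n / (1.0 + n)


# Bond hybridization exponents quoted in the text
for molecule, n in (("methane C-H", 3.0),
                    ("ammonia N-H", 3.47),
                    ("water O-H", 3.87)):
    print(molecule, round(100.0 * p_character(n), 1), "% p character")
```

The non-integer exponents for ammonia and water correspond to slightly more than the 75% p character of an ideal sp^3 bond.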

Examples
Standard Minimization
♦ From OFF Setup load UFF_VALBOND1.01 via the Open Force
Field card.
♦ From OFF Methods select Minimizer and Run.
Minimization of hypervalent compound
♦ From OFF Setup load UFF_VALBOND1.01 via the Open Force
Field card.
♦ From the Open Force Field card select Energy Expression -
Automate Setup.
♦ Disable the Perform Valbond Initialization option.
♦ From the Open Force Field card select Typing - VALBOND
centers.
♦ Select the appropriate atoms from the model window and set as
VALBOND centers
OR
set all atoms as VALBOND centers and unset selected atoms.
♦ From OFF Methods select Minimizer and Run.

Dreiding forcefield
General force constants and geometry parameters for the Dreiding
forcefield are based on simple hybridization rules rather than on
specific combinations of atoms. The Dreiding forcefield does not
generate parameters automatically the way that UFF and ESFF do;
however, its explicit parameters were derived by a rule-based
approach.
Functional form The Dreiding forcefield is a purely diagonal forcefield with har-
monic valence terms and a cosine–Fourier expansion torsion term.
The umbrella functional form is used for inversions, which are
defined according to the Wilson out-of-plane definition (see
Table 24). The van der Waals interactions are described by the Len-
nard–Jones potential. Electrostatic interactions are described by
atomic monopoles and a screened (distance-dependent) Coulom-
bic term. Hydrogen bonding is described by an explicit Lennard–
Jones 12–10 potential (Mayo et al. 1990).
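For reference, the hydrogen-bond term can be sketched in the form published by Mayo et al. (1990): a 12–10 radial function of the donor–acceptor distance multiplied by the fourth power of the cosine of the donor–hydrogen–acceptor angle. The expression comes from that publication rather than from this guide; the function name and the numerical values are hypothetical and are not Dreiding parameters.

```python
import math


def dreiding_hbond_energy(r_da, theta_dha, d_hb, r_hb):
    """Hydrogen-bond energy, 12-10 form with a cos^4 angular factor.

    r_da      : donor-acceptor distance
    theta_dha : donor-hydrogen-acceptor angle in radians
    d_hb      : well depth of the hydrogen bond
    r_hb      : donor-acceptor distance at the energy minimum
    """
    ratio = r_hb / r_da
    radial = 5.0 * ratio ** 12 - 6.0 * ratio ** 10
    return d_hb * radial * math.cos(theta_dha) ** 4


# Illustrative (made-up) values: a linear hydrogen bond at the
# equilibrium distance, then the same distance with a bent geometry.
print(dreiding_hbond_energy(2.75, math.pi, d_hb=4.0, r_hb=2.75))
print(dreiding_hbond_energy(2.75, math.radians(150.0), d_hb=4.0, r_hb=2.75))
```

The angular factor weakens the interaction as the hydrogen bond bends away from linearity.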
Coverage of the periodic table The Dreiding forcefield has good coverage for organic, biological
and main-group inorganic molecules. It is only moderately accurate
for geometries, conformational energies, intermolecular binding
energies, and crystal packing.
Atom types Atom typing in the Dreiding forcefield is straightforward. An
atom type is denoted by a name of up to five characters:
♦ The first two characters are the elemental symbol (for example,
C_ for carbon, Sn for tin).
♦ The third character (if present) represents the hybridization
state (for example, 1 = linear, sp1; 2 = trigonal, sp2; 3 = tetrahe-
dral, sp3; and R = an sp2 atom involved in resonance).
♦ The fourth character (if present) indicates the number of
implicit hydrogen atoms (for example, C_R2 is a resonant car-
bon with two implicit hydrogens).
♦ The fifth character (if present) is reserved to indicate other spe-
cial characteristics (for example, H_ _ _A denotes a hydrogen
atom that is capable of forming a hydrogen bond).
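The positional naming rules above can be sketched as a small parser. This is only an illustration of the convention as described in this section: the function and class names are hypothetical, and it assumes that unused positions are filled with underscores (so the hydrogen-bonding hydrogen is written H___A).

```python
from typing import NamedTuple, Optional


class DreidingType(NamedTuple):
    element: str
    hybridization: Optional[str]   # "1", "2", "3", "R", or None
    implicit_h: Optional[int]      # number of implicit hydrogens, or None
    special: Optional[str]         # fifth-character flag, or None


def parse_dreiding_type(name: str) -> DreidingType:
    """Split a Dreiding atom-type name into its positional fields."""
    if not 1 <= len(name) <= 5:
        raise ValueError(f"expected 1-5 characters, got {name!r}")
    padded = name.ljust(5, "_")            # missing trailing fields become "_"
    element = padded[:2].rstrip("_")       # "C_" -> "C", "Sn" -> "Sn"
    hyb = padded[2] if padded[2] != "_" else None
    n_h = int(padded[3]) if padded[3].isdigit() else None
    special = padded[4] if padded[4] != "_" else None
    return DreidingType(element, hyb, n_h, special)


for name in ("C_3", "C_R2", "Sn", "H___A"):
    print(name, parse_dreiding_type(name))
```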
Versions The Dreiding II forcefield is an extension and improvement over
the Dreiding I forcefield. DREIDING2.21 is the recommended, up-
to-date version of the Dreiding II forcefield.
See the archive directory (archive directory) for older versions of the
Dreiding forcefield.

Classical forcefields
Availability Classical forcefields provided by MSI include AMBER,
CHARMm, and CVFF:
♦ The standard AMBER forcefield (Standard AMBER forcefield)
has been supplemented (Homans’ carbohydrate forcefield), so as to
cover oligosaccharides. This enhanced AMBER forcefield is the
version that is run via the Discover program, as available in the
Insight•Discover and Insight•Discover_3 modules. (Nonvali-
dated versions of AMBER are also available for the
Cerius2•Open Force Field module, in the directory named
“untested”, see untested directory.)
♦ The CHARMm forcefield (CHARMm forcefield) is run through
the CHARMm program as implemented in the QUANTA
molecular modeling interface.
♦ The CVFF forcefield (CVFF, consistent valence forcefield) is run
via the Discover program, as available in the Insight•Discover,
Insight•Discover_3, and Cerius2•Discover modules, and via
the Cerius2•Open Force Field module.
Characteristics The parameters of classical forcefields were derived by fitting
experimental data sets. They were generally designed for biologi-
cal macromolecules, although they have been used or adapted for
other classes of models. Since they are relatively old, they are well
characterized and many research studies have used them.

AMBER forcefield
The standard AMBER forcefield (Weiner et al. 1984, 1986) is
parameterized to small organic constituents of proteins and
nucleic acids. Only experimental data were used in parameteriza-
tion.
However, AMBER has been widely used not only for proteins and
DNA, but also for many other classes of models, such as polymers
and small molecules. For the latter classes of models, various
authors have added parameters and extended AMBER in other
ways to suit their calculations. The AMBER forcefield has also
been made specifically applicable to polysaccharides (Homans
1990, and see Homans’ carbohydrate forcefield).
AMBER is used mainly for modeling proteins and nucleic acids. It
is generally lower in accuracy and has a limited range of applica-
bility. The use of AMBER is recommended mainly for those cus-
tomers who are familiar with AMBER and have developed their
own AMBER-specific parameters. It generally gives reasonable
results for gas-phase model geometries, conformational energies,
vibrational frequencies, and solvation free energies.
Within Discover, AMBER cannot automatically replace missing
parameters with default or generic parameters (see Automatic
assignment of values for missing parameters).

Standard AMBER forcefield


Functional form The AMBER energy expression contains a minimal number of
terms. No cross terms are included. The functional forms of the
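energy terms used by AMBER are given in Eq. 19.

$$
\begin{aligned}
E_{pot} ={}& \underbrace{\sum_{b} K_2 (b - b_0)^2}_{(1)}
+ \underbrace{\sum_{\theta} H_\theta (\theta - \theta_0)^2}_{(2)}
+ \underbrace{\sum_{\phi} \frac{V_n}{2} \left[ 1 + \cos(n\phi - \phi_0) \right]}_{(3)} \\
&+ \underbrace{\sum \varepsilon \left[ (r^*/r)^{12} - 2 (r^*/r)^{6} \right]}_{(4)}
+ \underbrace{\sum \frac{q_i q_j}{\varepsilon_{ij} r_{ij}}}_{(5)}
+ \underbrace{\sum \left[ \frac{C_{ij}}{r_{ij}^{12}} - \frac{D_{ij}}{r_{ij}^{10}} \right]}_{(6)}
\end{aligned}
$$
Eq. 19

The first three terms in Eq. 19 handle the internal coordinates of
bonds, angles, and dihedrals. Term 3 is also used to maintain the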
correct chirality and tetrahedral nature of sp3 centers in the united-
atom representation. (In the united-atom representation, nonpolar
hydrogen atoms are not represented explicitly, but are coalesced
into the description of the heavy atoms to which they are bonded.)
Terms 4 and 5 account for the van der Waals and electrostatic inter-
actions.
The final term, 6, is an optional hydrogen-bond term that aug-
ments the electrostatic description of the hydrogen bond. This
term adds only about 0.5 kcal mol-1 to the hydrogen-bond energy
in AMBER, so the bulk of the hydrogen-bond energy still arises
from the dipole–dipole interaction of the donor and acceptor
groups.
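The three nonbond terms of Eq. 19 can be sketched for a single atom pair as follows. The function names and all numerical parameters are placeholders for illustration, and the conversion factor of 332.06 kcal Å mol-1 e-2 is an assumption added here so that charges in units of e and distances in Å give kcal mol-1.

```python
COULOMB_KCAL = 332.06  # assumed unit conversion, not taken from the text


def vdw_12_6(r, eps, r_star):
    """Term 4: eps * [(r*/r)^12 - 2 (r*/r)^6]; minimum of -eps at r == r*."""
    x6 = (r_star / r) ** 6
    return eps * (x6 * x6 - 2.0 * x6)


def coulomb(r, q_i, q_j, dielectric=1.0):
    """Term 5: q_i q_j / (dielectric * r), converted to kcal/mol."""
    return COULOMB_KCAL * q_i * q_j / (dielectric * r)


def hbond_12_10(r, c_ij, d_ij):
    """Term 6: C_ij / r^12 - D_ij / r^10."""
    return c_ij / r**12 - d_ij / r**10


# Placeholder parameters for one donor-acceptor pair.
r = 2.9
print("vdW     :", vdw_12_6(r, eps=0.15, r_star=3.2))
print("Coulomb :", coulomb(r, q_i=0.4, q_j=-0.5))
print("H-bond  :", hbond_12_10(r, c_ij=7500.0, d_ij=2300.0))
```

Written in the r* form, Term 4 has its minimum exactly at r = r* with depth ε, which makes the two parameters easy to read off a potential curve.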
Atom types The atom types in AMBER are quite specific to amino acids and
DNA bases. In the original publications, the atom types and
charges are defined by means of diagrams of the amino acids and
nucleotide bases. In the Insight environment, this information has
been placed in a residue library. Descriptions of the atom types,
from the original papers defining the AMBER forcefield, are
shown in Table 25.

Homans’ carbohydrate forcefield


Extension of AMBER to carbohydrates Homans’ forcefield for oligosaccharides (Homans 1990) has been
incorporated into the AMBER forcefield available in the Discover
program. It uses the same functional form as AMBER and extends
its applicability to polysaccharides and glycoproteins.
Parameterization Homans’ approach in developing the carbohydrate forcefield was
to combine the parameters for monosaccharides (Ha et al. 1988)
with the results of ab initio calculations on model compounds rel-
evant to the glycosidic linkage (Wiberg and Murcko 1989), to gen-
erate an AMBER-compatible forcefield. The bond, angle, and
torsion parameters for each monosaccharide residue were in gen-
eral taken directly from Ha et al. (1988). However, certain parame-
ters required adjustment and others were added, to account for the
glycosidic linkage between contiguous monosaccharide residues.
The torsion parameters were adjusted to fit the quantum mechan-
ical data (6-31G*) of Wiberg and Murcko (1989) for
dimethoxymethane.
In addition, the carbohydrate forcefield utilizes charges and van
der Waals parameters derived for monosaccharides by Ha et al.
(1988). Since the latter parameters were derived without an
explicit hydrogen-bonding term, the carbohydrate forcefield also
does not contain hydrogen-bonding parameters.
Homans-specific atom types To account for the anomeric effect associated with carbohydrates,
the linking atoms were defined as different atom types. Table 26
lists these atom types, as well as the types corresponding to the
ring atoms of sugars.

CHARMm forcefield
CHARMm, which derives from CHARMM (Chemistry at HAR-
vard Macromolecular Mechanics), is a highly flexible molecular
mechanics and dynamics program originally developed in the lab-
oratory of Dr. Martin Karplus at Harvard University. It was
parameterized on the basis of ab initio energies and geometries of
small organic models.
Applicability CHARMm performs well over a broad range of calculations and
simulations, including calculation of geometries, interaction and
conformation energies, local minima, barriers to rotation, time-
dependent dynamic behavior, free energy, and vibrational fre-
quencies (Momany & Rone, 1992). CHARMm is designed to
give good (but not necessarily “the best”) results for a wide
variety of modelled systems, from isolated small molecules to
solvated complexes of large biological macromolecules; however,
it is not applicable to organometallic complexes.
Functional form CHARMm uses a flexible and comprehensive empirical energy
function that is a summation of many individual energy terms.
The energy function is based on separable internal coordinate
terms and pairwise nonbond interaction terms (Brooks et al. 1983).
The total energy is expressed by the following equation:

$$
\begin{aligned}
E_{pot} ={}& E_{bond} + E_{angle} + E_{torsion} + E_{oop} && \text{(internal terms)} \\
&+ E_{elec} + E_{vdW} && \text{(external terms)} \\
&+ E_{constraint} + E_{user} && \text{(special)}
\end{aligned}
$$
Eq. 20

where the oop (out-of-plane) angle is defined as an improper torsion.
A more specific (Brooks et al. 1983) statement of the function
is:

$$
\begin{aligned}
E_{pot} ={}& \sum k_b (r - r_0)^2 + \sum K_\theta (\theta - \theta_0)^2
+ \sum \left[ k_\phi - k_\phi \cos(n\phi) \right] + \sum k_\chi (\chi - \chi_0)^2 \\
&+ \sum \frac{q_i q_j}{4\pi\varepsilon_0 r_{ij}}
+ \sum \left( \frac{A_{ij}}{r_{ij}^{12}} - \frac{B_{ij}}{r_{ij}^{6}} \right) \mathrm{sw}(r_{ij}^2, r_{on}^2, r_{off}^2)
+ E_{constraint} + E_{user}
\end{aligned}
$$
Eq. 21

The electrostatic term can be scaled to mimic solvent effects. The
van der Waals combination rules and functional form are derived
from rare-gas potentials. The function optionally used in
CHARMm to calculate the hydrogen bond energy is:

$$
\begin{aligned}
E ={}& \left( \frac{A}{r_{AD}^{6}} - \frac{B}{r_{AD}^{4}} \right)
\cos^m(\phi_{A-H-D}) \cos^n(\phi_{AA-A-H}) \\
&\times \mathrm{switch}(r_{AD}^2, r_{on}^2, r_{off}^2) \\
&\times \mathrm{switch}(\cos^2(\phi_{AHD}), \cos^2(\phi_{on}), \cos^2(\phi_{off}))
\end{aligned}
$$
Eq. 22

Hydrogen bond energy is not included as a default energy term.


The current CHARMm parameter set has been derived in such a
way that hydrogen bond effects are described by the combination
of electrostatic and van der Waals forces.
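The switching function that appears in Eq. 21 and Eq. 22 is not written out in this section. The sketch below assumes the polynomial form commonly quoted for CHARMM-style switching, which takes squared distances as arguments, equals 1 inside the switch-on distance, and falls smoothly to 0 at the switch-off distance. The function name and the 8 Å and 10 Å values are illustrative assumptions, not program defaults.

```python
def switch(r2, r_on2, r_off2):
    """Smooth switch on squared distances: 1 below r_on2, 0 above r_off2."""
    if r2 <= r_on2:
        return 1.0
    if r2 >= r_off2:
        return 0.0
    span = r_off2 - r_on2
    return (r_off2 - r2) ** 2 * (r_off2 + 2.0 * r2 - 3.0 * r_on2) / span**3


# Illustrative switching window of 8-10 angstroms.
R_ON, R_OFF = 8.0, 10.0
for r in (7.0, 8.0, 8.5, 9.0, 9.5, 10.0, 11.0):
    print(f"r = {r:5.2f} A   sw = {switch(r * r, R_ON**2, R_OFF**2):.4f}")
```

Multiplying a nonbond term by such a function removes the energy discontinuity that a plain cutoff would introduce at the switch-off distance.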
Constraint terms are described under Applying constraints and
restraints. The use of user terms to customize the CHARMm force-
field is described in the CHARMm documentation.

CVFF, consistent valence forcefield


The consistent-valence forcefield (CVFF), the original forcefield
provided with the Discover program (and now available also in
Cerius2•OFF), is a generalized valence forcefield (Dauber-
Osguthorpe 1988). Parameters are provided for amino acids,
water, and a variety of other functional groups.
The augmented CVFF was developed for materials science appli-
cations and is provided with Discover in the Insight 4.0.0 program.
It includes additional atom types for aluminosilicates and alumi-
nophosphates. (The default CVFF that is included with Insight
4.0.0 and Cerius2•Discover also has a few more atom types than
the CVFF included with Insight 95.0, since the former Insight
series, as well as Cerius2, is intended for use in materials science.)
CVFF also has the ability to use automatic parameters (Automatic
assignment of values for missing parameters) when no explicit parameters
are present. These are noted in the output file from the calculation.
Applicability CVFF was fit to small organic (amides, carboxylic acids, etc.) crys-
tals and gas phase structures. It handles peptides, proteins, and a
wide range of organic systems. As the default forcefield in Dis-
cover, it has been used extensively for many years. It is primarily
intended for studies of structures and binding energies, although
it predicts vibrational frequencies and conformational energies
reasonably well.
Out-of-planes and the two versions of Discover The CDiscover program has undergone extensive validation tests
comparing it with FDiscover. These tests have indicated that the
two programs provide exactly the same results for all components
of the energy expression with one exception: the out-of-plane
energy for the CVFF forcefield.
The out-of-plane energy for the CVFF forcefield is calculated as an
improper torsion. An improper torsion treats a central atom and the three
atoms connected to it as if they formed a torsion (see Table 24). There
are three possible improper torsions that can be generated for a
particular out-of-plane, based on permutations of the connected
atoms.
For CVFF, only one of these improper torsions is used. The rules
that FDiscover employs to select the particular improper torsion
are somewhat arbitrary, and it was not possible to replicate them
in CDiscover. However, the changes in energy are very small (on
the order of 0.01 kcal mol-1). (A more rigorously defined out-of-
plane, the Wilson out-of-plane, is used in the CFF forcefield. This
energy term provides exact agreement between CDiscover and
FDiscover.)

Functional form
The analytic form of the energy expression used in CVFF is shown
in Eq. 23.

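$$
\begin{aligned}
E_{pot} ={}& \underbrace{\sum_{b} D_b \left[ 1 - e^{-\alpha (b - b_0)} \right]^2}_{(1)}
+ \underbrace{\sum_{\theta} H_\theta (\theta - \theta_0)^2}_{(2)}
+ \underbrace{\sum_{\phi} H_\phi \left[ 1 + s \cos(n\phi) \right]}_{(3)} \\
&+ \underbrace{\sum_{\chi} H_\chi \chi^2}_{(4)}
+ \underbrace{\sum_{b} \sum_{b'} F_{bb'} (b - b_0)(b' - b'_0)}_{(5)}
+ \underbrace{\sum_{\theta} \sum_{\theta'} F_{\theta\theta'} (\theta - \theta_0)(\theta' - \theta'_0)}_{(6)} \\
&+ \underbrace{\sum_{b} \sum_{\theta} F_{b\theta} (b - b_0)(\theta - \theta_0)}_{(7)}
+ \underbrace{\sum_{\phi} F_{\phi\theta\theta'} \cos\phi \, (\theta - \theta_0)(\theta' - \theta'_0)}_{(8)}
+ \underbrace{\sum_{\chi} \sum_{\chi'} F_{\chi\chi'} \chi\chi'}_{(9)} \\
&+ \underbrace{\sum \varepsilon \left[ (r^*/r)^{12} - 2 (r^*/r)^{6} \right]}_{(10)}
+ \underbrace{\sum \frac{q_i q_j}{\varepsilon r_{ij}}}_{(11)}
\end{aligned}
$$
Eq. 23
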

Types of terms and computational costs Terms 1–4 are commonly referred to as the diagonal terms of the
valence forcefield and represent the energy of deformation of bond
lengths, bond angles, torsion angles, and out-of-plane interactions,
respectively.
A Morse potential (Term 1) is used for the bond-stretching term.
The Discover program also supports a simple harmonic potential
for this term. The Morse form is computationally more expensive
than the harmonic form. Since the number of bond interactions is
usually negligible relative to the number of nonbond interactions,
the additional cost of using the more accurate Morse potential is
insignificant, so this is the default option.
When not to use the Morse term When the model being simulated is high in energy (caused, for
example, by overlapping atoms or a high target temperature), a
Morse-style function might allow bonded atoms to drift unrealistically
far apart (see Figure 2). This would not be desirable unless
you were intending to study bond breakage.
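The difference between the two bond-stretching forms can be made concrete with a short numerical sketch. The parameters below are placeholders rather than CVFF values, and the harmonic force constant is chosen as D_b α², an assumption that makes both curves agree near the equilibrium length b0.

```python
import math


def morse(b, d_b, alpha, b0):
    """Morse bond energy d_b * [1 - exp(-alpha (b - b0))]^2."""
    return d_b * (1.0 - math.exp(-alpha * (b - b0))) ** 2


def morse_force(b, d_b, alpha, b0):
    """Force -dE/db for the Morse term (negative pulls the bond back)."""
    e = math.exp(-alpha * (b - b0))
    return -2.0 * d_b * alpha * e * (1.0 - e)


def harmonic(b, k, b0):
    """Harmonic bond energy k * (b - b0)^2."""
    return k * (b - b0) ** 2


def harmonic_force(b, k, b0):
    """Force -dE/db for the harmonic term."""
    return -2.0 * k * (b - b0)


# Placeholder parameters (kcal/mol, 1/angstrom, angstrom).
D_B, ALPHA, B0 = 100.0, 2.0, 1.1
K = D_B * ALPHA**2  # matches the Morse curvature at b0

for b in (1.1, 1.3, 2.0, 4.0):
    print(f"b = {b:3.1f}  Morse E = {morse(b, D_B, ALPHA, B0):8.2f}"
          f"  F = {morse_force(b, D_B, ALPHA, B0):9.3f}"
          f"  |  harmonic E = {harmonic(b, K, B0):9.2f}"
          f"  F = {harmonic_force(b, K, B0):9.2f}")
```

At large stretch the Morse energy levels off at D_b and its restoring force decays toward zero, while the harmonic force keeps growing, which is why a badly overlapped starting structure can drift apart under the Morse form.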

Figure 2. Morse vs. harmonic potentials
A: Morse potential for a C–H bond; B: harmonic potential for a C–H bond. The
Morse potential allows a bond to stretch to an unrealistic length.

Use of crossterms Terms 5–9 are off-diagonal (or cross) terms and represent couplings
between deformations of internal coordinates. For example,
Term 5 describes the coupling between stretching of adjacent
bonds. These terms are required to accurately reproduce experimental
vibrational frequencies and, therefore, the dynamic properties of
molecules. In some cases, research has also shown them to be
important in accounting for structural deformations. However,
crossterms can become unstable when the structure is far from a
minimum. Therefore, although both Cerius2•OFF and the standalone
Discover program include crossterms by default when using
CVFF, the Insight program and Cerius2•Discover explicitly omit
the crossterms by default.
Nonbond interactions Terms 10–11 describe the nonbond interactions. Term 10 represents
the van der Waals interactions with a Lennard–Jones function.
Term 11 is the Coulombic representation of electrostatic interac-
tions. The dielectric constant ε can be made distance dependent
(i.e., a function of rij).
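A minimal sketch of the effect of a distance-dependent dielectric on the Coulombic term follows. The text does not give the functional form, so a linear dependence (effective dielectric equal to ε times rij in Å) is assumed here; the function name and the 332.06 unit-conversion factor are likewise assumptions for illustration.

```python
COULOMB_KCAL = 332.06  # assumed conversion to kcal/mol for charges in e, r in angstroms


def coulomb(r, q_i, q_j, eps=1.0, distance_dependent=False):
    """q_i q_j / (eps_eff * r), with eps_eff = eps * r when distance dependent."""
    eps_eff = eps * r if distance_dependent else eps
    return COULOMB_KCAL * q_i * q_j / (eps_eff * r)


for r in (2.0, 4.0, 8.0):
    constant = coulomb(r, 0.5, -0.5)
    screened = coulomb(r, 0.5, -0.5, distance_dependent=True)
    print(f"r = {r:3.1f} A   constant eps: {constant:8.3f}   eps = r: {screened:8.3f}")
```

With this assumption the interaction falls off as 1/r² instead of 1/r, which damps long-range electrostatics in a crude imitation of solvent screening.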
In the CVFF forcefield, hydrogen bonds are a natural consequence
of the standard van der Waals and electrostatic parameters, and
special hydrogen bond functions do not improve the fit of CVFF to
experimental data (Hagler 1979a, b).
Additional information on the forcefields and how they can be
augmented is contained in the documentation for the individual
simulation engines, where the file formats are described.

CVFF atom types


The CVFF forcefield supplied by MSI defines atom types for the 20
commonly occurring amino acids, most hydrocarbons, and many
other organic models (Table 30).
It automatically supplies generic parameters when specific param-
eters are not found (Automatic assignment of values for missing
parameters).
Augmented CVFF The augmented version of CVFF (available in version 4.0.0 of the
Insight program) includes nonbond parameters (Born model) for
additional atom types (Table 31) that are useful for simulations of
silicates, aluminosilicates, clays, and aluminophosphates. These
added parameters were derived using Ewald summation for non-
bond interactions between the additional atom types.
Partial charges The bond increment section of the .frc file for CVFF has been
expanded so that partial charges can be determined whenever the
Cerius2•OFF module or the Discover program is able to assign
automatic atom types.

Special-purpose forcefields
Availability Several forcefields are provided by MSI that are specialized for
certain uses:
♦ Forcefields optimized for glasses (Glass forcefield), for polyvi-
nylidene fluoride (MSXX forcefield for polyvinylidene fluoride),
and for zeolites (Zeolite forcefields), as well as forcefields
intended for use only in the Cerius2•Morphology module
(Forcefields for Cerius2•Morphology module), are accessed through
the Cerius2•Open Force Field module.
♦ The PCFF (PCFF forcefield for polymers and other materials), COM-
PASS (COMPASS forcefield for organic and inorganic materials),
and augmented CVFF (Augmented CVFF) forcefields, which are
run via both Cerius2•OFF and the Discover program, are ver-
sions of standard forcefields that have been extended for use
with polymers and zeolites, respectively.
Characteristics The specialized forcefields described in the following sections
have been developed for the purpose of simulating certain systems
or performing limited types of calculations well. They
should not be used for other purposes, since they were not
designed to be accurate outside their limited areas of applicability.

Glass forcefield
The Glass forcefield (Glassff) exists in versions that include two-
and three-body nonbond terms (glassff_1.01 and glassff_2.01,
respectively) and is used for studying a range of inorganic oxide
glasses (and other ionic systems) under periodic boundary condi-
tions. The newer, three-body, glass forcefield is recommended and
documented here.
The glass forcefield is applicable to systems containing Si, O, Al,
Li, Na, K, Mg, Ca, B, and Ti. Predicted properties include structure,
radial distribution functions, angular distributions, and short-
range order.
The form of interaction and parameterization for this forcefield is
based mainly on the work of Soules, Garofalini, and co-workers
(Soules 1979, Soules & Varshneya 1981, Garofalini 1984, Tesar &
Varshneya 1987, Rosenthal & Garofalini 1987, 1988, Zirl & Garo-
falini 1989, Garofalini & Zirl 1990, Kohler & Garofalini 1994).
Functional form All ion pairs are subjected to an interaction containing a repulsive
van der Waals term and a screened Coulombic term.
The screening of the Coulombic potential with the complementary
error function represents an approximate Ewald summation
in real-space only (Woodcock 1975). The reciprocal-space Ewald
sum is omitted, to be able to model a bulk amorphous system with
finite computational resources. This also reduces the effects of the
imposed periodicity of the simulation model.
The potential equation is:

$$
V_{ij}(r) = A_{ij} \exp(-r_{ij}/\rho) + C \frac{q_i q_j}{\varepsilon r_{ij}} \operatorname{erfc}(r_{ij}/\beta_{ij})
$$
Eq. 24

Where:
Aij A pre-exponential constant specific to each element-pair
van der Waals interaction.
rij Interatomic distance.
ρ 0.29 Å.
C Constant that converts the units to kcal mol-1.
q Charge on each species.
erfc Complementary error function.
βij Depends on species.
The RSL2 water potential can be used as part of the off-diagonal
van der Waals term.
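The pair potential of Eq. 24 can be sketched in a few lines. The function name, the argument names, and every numerical value below are placeholders for illustration. The 0.29 Å decay length is the one listed under Eq. 24, and the 332.06 value used for the constant C is an assumed conversion to kcal mol-1.

```python
import math

COULOMB_KCAL = 332.06  # assumed value of the unit-conversion constant C


def glass_pair_energy(r, a_ij, beta_ij, q_i, q_j, rho=0.29, eps=1.0):
    """Exponential repulsion plus erfc-screened Coulomb term (Eq. 24)."""
    repulsion = a_ij * math.exp(-r / rho)
    screened = COULOMB_KCAL * q_i * q_j / (eps * r) * math.erfc(r / beta_ij)
    return repulsion + screened


# Placeholder cation-oxygen pair; a_ij is taken to be in kcal/mol.
for r in (1.4, 1.6, 2.0, 3.0, 5.0):
    e = glass_pair_energy(r, a_ij=4.0e4, beta_ij=2.3, q_i=4.0, q_j=-2.0)
    print(f"r = {r:3.1f} A   V = {e:10.2f} kcal/mol")
```

Because erfc decays rapidly, the Coulombic contribution is effectively zero beyond a few multiples of βij, which is what allows the reciprocal-space sum to be dropped.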
If you want to customize the glass forcefield, the parameter A for
each element combination ij can be computed approximately,
using the formula:

$$
A_{ij} = \left( 1 + \frac{q_i}{n_i} + \frac{q_j}{n_j} \right) b \exp\left( \frac{r_i + r_j}{\rho} \right)
$$
Eq. 25

Where:
qi Formal charge on ion i.
ni Number of valence-shell electrons in ion i.
b 3.38E-20 J.
ri Radius repulsion constant of the element ion i, related to
the ionic radius.
ρ 0.29 Å.
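As a worked illustration of Eq. 25, the sketch below computes A for one ion pair. The values of b and ρ are those given above; the function name, the charges, electron counts, and radii passed in are placeholders, and the conversion from joules per pair to kcal mol-1 is an addition made here for convenience.

```python
import math

B_JOULE = 3.38e-20                           # b from the text, in J
RHO = 0.29                                   # rho from the text, in angstroms
J_TO_KCAL_PER_MOL = 6.02214076e23 / 4184.0   # Avogadro constant / (J per kcal)


def pre_exponential_a(q_i, n_i, r_i, q_j, n_j, r_j, b=B_JOULE, rho=RHO):
    """A_ij = (1 + q_i/n_i + q_j/n_j) * b * exp((r_i + r_j) / rho), in joules."""
    return (1.0 + q_i / n_i + q_j / n_j) * b * math.exp((r_i + r_j) / rho)


# Placeholder inputs: a +1 cation and a -2 anion, 8 valence-shell electrons each.
a_joule = pre_exponential_a(q_i=1.0, n_i=8, r_i=1.17, q_j=-2.0, n_j=8, r_j=1.42)
print(f"A_ij = {a_joule:.3e} J  =  {a_joule * J_TO_KCAL_PER_MOL:.3e} kcal/mol")
```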
For the glass forcefield, A parameters were computed for the full
set of ion-pair interactions using quoted values of r. Where explicit
values of Aij existed in the literature, these were preferred. This
produced a parameter set able to tackle a range of ions:
O2-, Si4+, Al3+, Li+, Na+, K+, Mg2+, Ca2+, Zn2+, Ti4+, B3+
Automated setup For models within the scope of the glass forcefield, the default
Open Force Field settings included in the file can be used without
adjustment. Atom typing and charging also can be done automat-
ically.

MSXX forcefield for polyvinylidene fluoride


The MSXX forcefield for polyvinylidene fluoride (PVDF) and
related polymers and small organic models (Karasawa & Goddard
1992) is based on a combination of first-principles quantum
mechanical calculations. It includes cross terms (for accurate
vibrational frequencies), and charges are associated with each
atom type.
MSXX is designed for modelling and predicting a wide range of
properties of PVDF, including geometry, cell parameters, elastic
constants, dielectric constants, mechanical stability, and vibra-
tional frequencies.
Parameterization Four aspects were addressed to generate an accurate forcefield for
PVDF: charges, van der Waals interactions, torsion terms, and
valence terms.
Partial atomic charges for the various atoms were obtained using
the PS-Q program to calculate potential-derived charges from the
Hartree–Fock wavefunction for CF3–CH2–CF2–CH3.
For carbon and hydrogen, the van der Waals parameters used were
those determined by fitting lattice parameters, elastic constants,
lattice phonons, and cohesive energies of crystalline polyethylene
and crystalline graphite. Fluorine parameters were
derived for CF4 and crystalline polytetrafluoroethylene. Hartree–Fock
calculations were used to obtain the torsion potential curve
for CF3–CH2–CF2–CH3.
The valence parameters were optimized using Hessian-biased
forcefield methodology and the vibrational frequencies of the
form I crystal. Van der Waals parameters and charges were fixed
during optimization of the valence parameters.

Note
The MSXX forcefield uses the Morse functional form for bond
stretching, which means that the force goes to zero at large bond
distances (see Figure 2). Therefore this forcefield should not be
used when starting a run with bad initial geometries. Instead,
use an alternative forcefield (e.g., the Universal forcefield)
when starting from a bad geometry and then switch to the
MSXX when close to the energy minimum.

See also documentation of PCFF (PCFF forcefield for polymers and
other materials) and COMPASS (COMPASS forcefield for organic and
inorganic materials).

Zeolite forcefields
See also documentation on the augmented CVFF (Augmented
CVFF).
bks1.01 The BKS forcefield was developed by van Beest et al. (1990) to
describe the geometries, vibrational frequencies, and mechanical
properties of silicas and aluminophosphates. The parametrization
is based on both ab initio and experimental data. The forcefield
contains four atom types: Si, O, Al, and P.
Interatomic interactions are considered to be ionic (nonbond)
rather than covalent. The van der Waals interactions are described
with a Buckingham potential, and the electrostatic interactions are
described by atomic monopoles and a Coulombic term.
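The functional forms are named here but not written out. One common way to write a Buckingham van der Waals term together with a Coulombic term for an ion pair is shown below; the symbols A_ij, b_ij, C_ij, and the unit-conversion constant C_0 are notation assumed for this illustration and are not taken from the bks1.01 file.

```latex
E_{ij}(r_{ij}) = A_{ij} \exp(-b_{ij} r_{ij}) - \frac{C_{ij}}{r_{ij}^{6}} + C_0 \frac{q_i q_j}{r_{ij}}
```

In this form the exponential supplies the short-range repulsion, the r^-6 term the dispersion attraction, and the last term the ionic interaction between the atomic monopoles.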
burchart1.01 The Burchart forcefield was developed by Burchart (1992) to
describe the geometry, heats of formation, transitions under pres-
sure, crystal morphology, and vibrational frequencies of silicas
and aluminophosphates. The parametrization is based mainly on
experimental data and includes both valence and nonbond terms.
It contains four atom types: Si, O, Al, and P.
This forcefield treats the zeolite framework as largely covalent.
Bond-stretching is described by the Morse term, 1–3 interactions
by the Urey–Bradley term, van der Waals interactions by an expo-
nential-6 term, and electrostatic interactions by partial atomic
charges and a screened Coulombic term.
The Cerius2 implementation of the Burchart forcefield differs from
the published version in three respects:
♦ The Cerius2 implementation of the forcefield ignores the Cou-
lombic interaction if the atom pair interacts via a bond interac-
tion, but Burchart allows both bond and Coulombic
interactions for a pair of atoms.
♦ The Cerius2 implementation of the forcefield does not allow
Coulombic or van der Waals interactions between two atoms
when they interact via a Urey–Bradley interaction, but Burchart
does.
♦ The Cerius2 implementation applies a 0–15 Å spline to both
types of nonbond interaction, but Burchart applies a 0–15 Å
spline function to the Coulombic term and not to the van der
Waals term.
burchart1.01-DREIDING2.21 The Burchart–Dreiding forcefield combines the Burchart and Dreiding II
forcefields. It is useful for the study of properties of
(mainly) organic molecules inside silica/aluminophosphate
frameworks. There are four distinct types of interaction to consider:
♦ Framework—All interactions within the silica/aluminophos-
phate framework.
♦ Intramolecular—The interactions within each molecule.
♦ Intermolecular—The interactions between molecules.
♦ Framework–molecule—All nonbond interactions between the
framework and the molecules.
The Burchart forcefield treats the framework, and the Dreiding II
forcefield treats the intra- and intermolecular interactions. The
parameters for the framework–molecule interactions are derived
from parameters from both forcefields, combined by the arithmetic
combination rule (Combination rules for van der Waals terms).
burchart1.01-UNIVERSAL1.02 The Burchart–Universal forcefield combines the Burchart and Universal
forcefields. Similar to the Burchart–Dreiding forcefield
described above, the Burchart forcefield treats the framework; the
Universal forcefield treats the intra- and intermolecular interactions;
and the parameters for the framework–molecule interactions
are derived from parameters from both forcefields, combined
by the geometric combination rule.
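The two combination rules mentioned for the mixed forcefields can be sketched as follows. The definitions used here are the commonly quoted ones (arithmetic or geometric mean of the van der Waals radii, geometric mean of the well depths) and are an assumption; the definitions actually applied by the program are those given under Combination rules for van der Waals terms. Function names and input values are placeholders.

```python
import math


def arithmetic_rule(r_i, d_i, r_j, d_j):
    """Arithmetic mean of radii, geometric mean of well depths."""
    return 0.5 * (r_i + r_j), math.sqrt(d_i * d_j)


def geometric_rule(r_i, d_i, r_j, d_j):
    """Geometric mean of both radii and well depths."""
    return math.sqrt(r_i * r_j), math.sqrt(d_i * d_j)


# Placeholder framework-atom and molecule-atom parameters (angstrom, kcal/mol).
framework = (3.0, 0.1)
molecule = (4.0, 0.4)
print("arithmetic:", arithmetic_rule(*framework, *molecule))
print("geometric :", geometric_rule(*framework, *molecule))
```

The geometric mean of two different radii is always smaller than their arithmetic mean, so the two rules give slightly different contact distances for unlike atom pairs.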

Forcefields for sorption on zeolites


Several forcefields have been implemented especially for studies
of sorption of rigid small molecules onto zeolite structures, using
the Cerius2•Sorption module. They can be used in studies of bind-
ing sites, interaction energies, Henry’s constants, adsorption iso-
therms, and relative selectivity.

sor_yashonath1.01 The parameters for C, H, O, and Na in the Sorption Yashonath
forcefield are taken from Yashonath et al. (1988). Mainly for saturated
hydrocarbons in zeolites.
sor_demontis1.01 The parameters for C, H, O, and Na in the Sorption Demontis
forcefield are taken from Demontis et al. (1989). Mainly for ben-
zene in zeolites.
sor_pickett1.01 The parameters for O and Xe in the Sorption Pickett forcefield are
taken from Pickett et al. (1990). Mainly for xenon in zeolites.
watanabe-austin1.01 The Watanabe–Austin forcefield (Watanabe et al. 1995) was fit to
experimental adsorption isotherms (Miller et al. 1987) for argon,
oxygen, and nitrogen adsorption in zeolite types A, X, and Y (aluminosilicates
with Ca2+, Na+, and Li+ counterions). It contains
parameters for argon, oxygen, and nitrogen sorbates, Ca, Na, K,
and Li cations, and zeolite framework atoms (Si, Al, O).
Van der Waals interactions are described by a Lennard–Jones
potential. Electrostatic interactions are described by off-center
monopoles and a Coulombic term. The change in framework oxygen
polarizabilities as the aluminium content increases was taken
into account during parameterization of the forcefield.

Forcefields for Cerius2•Morphology module


These forcefields include only nonbond interaction terms and are
intended only for determining crystal morphology in the
Cerius2•Morphology module.
morph_lifson1.11 In the Morphology/Lifson forcefield (Lifson et al. 1979), you need
to assign the charges on the C, C_O, and H_N. Charges should be
assigned by assuming electroneutrality of the CH3, CH2, CH,
amide, CO, NH, NH2, and COOH groups. This forcefield is recommended
for studies of the morphology of crystals of carboxylic
acids and amides and is limited to rigid-body calculations involving
C, H, N, and O.
The van der Waals term is Lennard–Jones; electrostatic interac-
tions are described by partial atomic charges and a Coulombic
term; and hydrogen bonding is described by an explicit Lennard–
Jones 10–12 term.

morph_williams1.01 The Morphology/Williams (Williams 1966) forcefield is applicable
only to hydrocarbon crystal morphology, because it contains only
parameters for carbon and hydrogen.
morph_momany1.1, morph_scheraga1.1 The Morphology/Momany and Morphology/Scheraga forcefields
(Momany et al. 1974, Némethy et al. 1983) were developed
for polypeptides and are suited for predicting the packing configurations
and lattice energies in crystals of hydrocarbons, carboxylic
acids, amines, and amides (also amino acids and polypeptides
for Morphology/Scheraga). They are limited to rigid-body calculations.
The van der Waals term is Lennard–Jones; electrostatic interac-
tions are described by atomic monopoles and a screened Coulom-
bic term; and hydrogen bonding is described by an explicit 10–12
term.
The Morphology/Scheraga forcefield contains an updated version
of the Momany parameter set. Because these parameter sets con-
tain parameters only for van der Waals energy calculations, they
are appropriate for use within the Morphology module but not in
other applications such as energy minimization.

Archived and untested forcefields


Cerius2 Several old and nonvalidated forcefields for the Cerius2•Open
Force Field module are included in the untested and archive sub-
directories of the forcefield directory (Cerius2-Resources/FORCE-
FIELD directory). These forcefields are documented below.
Another forcefield, CHEAT95, which is an addendum to the
CHARMm forcefield, for polysaccharides, is available free at
MSI’s website (http://www.msi.com) and is not documented
here.
Insight II Several old, nonvalidated, unsupported forcefields for some
Insight modules are included in subdirectories of the $BIOSYM/
gifts directory. These forcefields are documented (minimally) by
means of README files in those directories.
In Insight 4.0.0, several additional unsupported forcefields are
present in an archive subdirectory in the $BIOSYM_LIBRARY
directory. Additional information is available through the Forcefield/FF_Info
parameter block, which is accessed through the
Builder and other modules (but which is not included in Insight
97.0).
archive directory The Cerius2 archive directory contains copies of forcefield files
that were previously released with CERIUS, as well as a copy of
the Dreiding forcefield previously released with POLYGRAF.
Newer versions of most of these forcefields are available in the
top-level forcefield directory, as discussed above. These archived
forcefields should be used only when necessary for compatibility
with work carried out using CERIUS 3.0–3.2 or POLYGRAF.
The “CFF93” forcefield that is included in Cerius2•OFF is actually
CFF89 (see CFF91, PCFF, CFF, COMPASS—consistent forcefields),
which was parameterized only for the alkyl functional group and
alkane models (Hwang et al. 1994).
Most of these are CERIUS forcefields and produce the same results
when used in Cerius2 as in CERIUS 3.2. However,
GRAFDREIDING1.00 is the Dreiding II forcefield from
POLYGRAF3.2.1.

Notes on the archive/GRAFDREIDING1.00 forcefield
The GRAFDREIDING1.00 forcefield is the direct result of a format
conversion (via the pf converter, as discussed in the
documentation for C2•OFF) of the dreidii321.par forcefield of
POLYGRAF3.2.1. Nevertheless, discrepancies may result between
the energies calculated in Cerius2 and in POLYGRAF.
This is because only the first-found inversions and torsions con-
tribute to the respective energy terms, and these are not necessar-
ily picked in the same order by Cerius2 and POLYGRAF. However,
you will obtain identical energies if the options to find all torsions
and all inversions are chosen in Cerius2 and POLYGRAF (see
Scaled torsion terms).
This version of the Dreiding II forcefield also differs from both the
published one (Mayo et al. 1990) and the other Dreiding forcefield
files in Cerius2 in that this version uses the geometric combination
rule for nonbond interactions (see Combination rules for van der
Waals terms).
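
To see why the choice of combination rule alone can change nonbond energies, the following minimal sketch compares the arithmetic and geometric means of a van der Waals distance parameter for an unlike atom pair. The function names and the two like-pair distances are illustrative assumptions, not values taken from any forcefield file.

```python
import math


def arithmetic_mean(r_i, r_j):
    """Arithmetic combination rule for the van der Waals distance."""
    return 0.5 * (r_i + r_j)


def geometric_mean(r_i, r_j):
    """Geometric combination rule for the van der Waals distance."""
    return math.sqrt(r_i * r_j)


# Hypothetical like-pair distances (in angstroms) for two atom types.
r_a, r_b = 3.0, 4.0

r_arith = arithmetic_mean(r_a, r_b)
r_geom = geometric_mean(r_a, r_b)
print(f"arithmetic: {r_arith:.4f}  geometric: {r_geom:.4f}")
```

The geometric mean is never larger than the arithmetic mean, so the two rules give different unlike-pair distances whenever the two like-pair values differ.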
We strongly recommend that you use the UNIVERSAL1.02 or
DREIDING2.21 forcefields in preference to the
GRAFDREIDING1.00 field, unless you want to match the results
of POLYGRAF calculations.


Note
When using the GRAFDREIDING1.00 forcefield, if you load in
a Biodesign (bgf) format structure file that already has the
correct atom types assigned (for example, a structure that was
atom typed using the default Dreiding forcefield rules and
saved in a POLYGRAF Biodesign file with its default atom
types), do not atom type the structure again.

untested directory
The untested directory contains several forcefields that have not
been validated by Molecular Simulations. Some are incomplete
and intended only as examples of how to incorporate parameters
from other forcefields. We supply these forcefields as a conve-
nience to our customers, but do not support them.
♦ amber1.01: This forcefield file is based on the AMBER
(Weiner et al. 1984, 1986) forcefield developed for the
simulation of proteins and nucleic acids. Its atom types are limited to
those needed for these structures: H, C, O, S, N, and P.
♦ amber2.01: This forcefield is an extension of the amber1.01
forcefield. In addition to the atom types of amber1.01, it
contains atom types for several biologically important ions: Br,
Na, Cl, and Ca.
♦ mm2_77_1.01 and mm2_85_1.01: These forcefields are based
on the MM2 and MMP2 forcefields developed by Allinger’s
group (Kao & Allinger 1977, Liljefors et al. 1987, Sprague et al.
1987). They are particularly suitable for small organic models.
The mm2_77 forcefield applies only to saturated hydrocarbons,
and the mm2_85 forcefield is a significant extension of the MM2
and MMP1 forcefields. The mm2_85 parameters are compatible
with those of mm2_77, but calculations on conjugated systems
can be done with the mm2_85 forcefield. The mm2_85
forcefield also has more atom types for a much larger range of
elements.
♦ The “CFF93” forcefield for hydrocarbons (cff93_1.01), although
not in the untested directory, is no longer supported, since it has
been superseded by the CFF91 and CFF forcefields, which
should be used instead.



3 Preparing the Energy Expression
and the Model

Many forcefields allow you a great deal of flexibility with respect
to how atom types are assigned to atoms, which terms of the
energy function are used, and how the simulation engine applies
the forcefield. You can also perform computational experiments by
using alternative functional terms and applying constraints and
restraints to your model.

Who should read this chapter
Although largely automated model preparation and energy
expression setup is possible for simple systems, you should read
this chapter if you want to perform the best simulations possible.
You need to read this chapter if you want or need to know about:
♦ The general procedures for using and selecting forcefields.
♦ Using nondefault atom typing.
♦ Customizing an energy expression’s parameters.
♦ Using nondefault functional forms of the energy expression
(e.g., to avoid difficulties in converging).
♦ Applying constraints or restraints to your model to perform
computational experiments.
♦ Working with periodic systems.
♦ Working with large models.

This chapter explains
Using forcefields
Selecting forcefields
Assigning forcefield atom types and charges
Parameter assignment
Using alternative forms of energy terms
Applying constraints and restraints
Modeling periodic systems
Handling nonbond interactions

Related information in this book
Forcefields presents the functional forms of energy expressions and
describes the forcefields that are available in Molecular
Simulations products.
The atom types defined for each forcefield are listed under
Forcefield Terms and Atom Types.
The files that specify the forcefields are described in detail in
separate documentation.

Table 5. Finding information in Preparing the Energy Expression and the Model section

If you want to know about:                       Read:

Charge assignment.                               Assigning charges.
Missing parameters.                              Automatic assignment of values for missing parameters.
Custom parameters.                               Manual parameter assignment.
Editing forcefields.                             Editing a forcefield.
Hydrogen bond terms.                             Hydrogen bonds and hydrogen-bond terms.
Criteria for defining hydrogen bonds.            Hydrogen bonds and hydrogen-bond terms.
Optimizing or simulating only part of a model.   Applying constraints and restraints.
Reducing computational costs.                    Applying constraints and restraints.
Forcing a model towards a desired conformation.  Applying constraints and restraints.
Images of models in periodic systems.            Modeling periodic systems.
Relation between Cartesian and crystal axes.     Figure 7. Relationship between Cartesian coordinate
                                                 system (xyz) and periodic system (abc) in Discover
                                                 and CHARMm.
Solvent molecules.                               Minimum-image model; Explicit-image model.
Combination rules for van der Waals terms.       Combination rules for van der Waals terms.
Dielectric constants.                            The dielectric constant and the Coulombic term.
Neighbor lists.                                  Neighbor lists and buffer widths.
Charge groups, functional groups.                Charge groups and group-based cutoffs.
Cell multipoles, nonperiodic systems.            Cell multipole method.
Ewald sums, periodic systems.                    Ewald sums for periodic systems.

Specific information
For specific information on procedures, please see the manual for
the molecular modeling program and/or simulation engine that
you are using (see Available documentation).

Using forcefields
Graphic interface mode
All of MSI’s simulation engines and forcefields can be used through
at least one graphical molecular modeling interface (Cerius2,
Insight II, QUANTA, see Table 1).

Standalone mode
The Discover and CHARMm programs can also be run in a
command-based, standalone mode with input from a text interface
and/or a script and other input files.

Mixed-mode use
For Discover and CHARMm, you can optionally perform some
tasks through the appropriate graphical interface and others in
standalone mode. For example, you might want to prepare model
structure and command input files with the graphical interface,
save both types of files, edit the command input file with a text
editor so as to perform some complex simulation, start the run in
standalone mode, and then analyze the results with the aid of one of
the graphical interfaces. In addition, facilities exist for directly
entering specific BTCL or CHARMm commands from the Insight
or QUANTA interface and for reading a BTCL or CHARMm
command input file into the Insight, Cerius2, or QUANTA interface.

Additional information
How to run Discover and CHARMm in standalone mode is
documented separately (see Available documentation).

General procedure
Regardless of whether simulation engines are run through the
graphical interface, in standalone mode, or in some combination of
both modes, the general sequence of activities for doing
forcefield-based calculations is as follows.


Important
For specific instructions, see the documentation for the
appropriate simulation engine and/or molecular modeling
program (see Available documentation).

1. Read in the forcefield: Based on the type of model and the
scientific problem that you want to simulate, decide which
forcefield to use (see Forcefields). If it is not the default forcefield for
the molecular modeling program you are using, you need to
specify the desired forcefield.
The forcefield parameter files (which contain parameters that
specify force constants, equilibrium geometries, van der Waals
radii, and other data needed for calculating energies) are read
in as part of this process.
2. Prepare the model:
a. If necessary, read in monomer/residue definitions—In most
cases, the molecular modeling program automatically does
this for you, or (depending on the type of model) it is not
necessary.
Information about residues or monomers, the basic chemical
units that comprise many models, is stored in topology (or
library, dictionary, or template) files. The atoms, atomic
properties, bonds, bond angles, torsion angles, improper
torsion angles, hydrogen bond donors, acceptors, and ante-
cedents, nonbond exclusions, and charge increments are all
specified on a per residue basis.
If this information is not included in a structure or model file
you intend to read in (for example, a Brookhaven Protein
Data Bank file), then you may have to specify the appropriate
topology file.
b. Read in or construct a model—Read in a model from an
appropriate file or construct it using the builder functional-
ity in the molecular modeling program. Make sure your
final model is correct with respect to atom connectivity,
hybridization, bond orders, valences, etc.
c. Assign forcefield atom types and charges to each atom in
your model (see also Assigning forcefield atom types and
charges). If you are using UFF, calculate fractional bond
orders. These are largely automated processes.

d. If necessary, define charge groups: The need for charge
groups depends on the type of model and the scientific
problem that you want to simulate. Some calculations are
impossible without this kind of information; others can be
sped up significantly by supplying it (see also Charge
groups and group-based cutoffs).
e. Read in or generate Cartesian coordinates—In most cases,
the molecular modeling program automatically does this for
you when you save a model that you have built or read in
from a non-MSI type of file.
3. Set up the energy expression—You may want to use alternative
terms in the energy expression, use or avoid using certain
default terms, specify how nonbond interactions are handled,
apply constraints or restraints (biases) to your model, etc. (see
also this chapter).
4. Set up the calculation—Unless you want your calculation to
run under the default conditions, you need to specify items
such as which minimization algorithm(s) and what termination
criteria to use (see also Minimization, Molecular Dynamics, and
Free Energy).
Most programs also allow you to control your calculation by
specifying various nondefault conditions. You may want to use
a more robust minimizer for a highly strained model, then
switch to a more accurate one for the final stages of computa-
tion. Some forcefield engines allow sophisticated if-tests and
conditional branching. You can determine what will happen if
certain parameters are not found (see also Parameter assign-
ment). You can also send jobs that are time consuming to some
other (faster) computer or run them in the background.
5. Specify output—If desired, specify nondefault kinds or
amounts of output.
6. Run the calculation.
7. Analyze the results: The molecular modeling programs
provide analysis functionality that allows you to view your results
in the form of graphs and tables, as well as by graphically
displaying the final (and/or intermediate) conformations of the
model.
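
Step 4 mentions starting with a robust minimizer and switching to a more accurate one for the final stages. The following is a minimal sketch of that idea on a hypothetical one-dimensional toy energy, not on a real forcefield and not using any MSI engine: steepest descent with a backtracking line search runs to a loose gradient tolerance, and Newton-Raphson then refines the result. All function names, the toy energy, and the tolerances are assumptions chosen for illustration.

```python
def energy(x):
    """Toy anharmonic energy with its minimum at x = 1 (hypothetical)."""
    d = x - 1.0
    return d * d + 0.1 * d ** 4


def gradient(x):
    d = x - 1.0
    return 2.0 * d + 0.4 * d ** 3


def hessian(x):
    d = x - 1.0
    return 2.0 + 1.2 * d * d


def steepest_descent(x, tol, max_steps=1000):
    """Robust first stage: follow -gradient, halving the step until the energy drops."""
    for _ in range(max_steps):
        g = gradient(x)
        if abs(g) < tol:
            break
        step = 1.0
        # Backtracking line search with a sufficient-decrease test.
        while energy(x - step * g) > energy(x) - 1e-4 * step * g * g:
            step *= 0.5
        x -= step * g
    return x


def newton_raphson(x, tol, max_steps=50):
    """Accurate second stage: uses curvature, converges fast near the minimum."""
    for _ in range(max_steps):
        g = gradient(x)
        if abs(g) < tol:
            break
        x -= g / hessian(x)
    return x


x_start = 5.0                                   # deliberately "strained" start
x_rough = steepest_descent(x_start, tol=1e-1)   # loose termination criterion
x_final = newton_raphson(x_rough, tol=1e-10)    # tight termination criterion
print(x_rough, x_final)
```

The loose tolerance in the first stage and the tight one in the second correspond to the termination criteria that you would specify separately for each minimization algorithm.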


Selecting forcefields
For details on how to select a forcefield in the molecular modeling
program that you are using, please see the appropriate specific
documentation (see Available documentation). A brief summary:
♦ Cerius2•Discover: Use the Run Discover control panel (which
is accessed by selecting Run on the DISCOVER menu card) or
the Select Discover Forcefield control panel (accessed by
selecting Forcefield/Select on the DISCOVER card).
♦ Cerius2•OFF: Use the Load Force Field control panel, which
is accessed by selecting Load on the OPEN FORCE FIELD
menu card.
♦ Cerius2•MMFF: Go to the MMFF card deck and select Run to
access the MMFF Energy Minimization control panel (check the
Use MMFF For Energetic Calculations check box if you want
MMFF93 to be used for calculations in other relevant modules).
♦ Insight II: Use the Forcefield/Select parameter block. This
parameter block is found in several modules, including the
Builder.
♦ QUANTA: Use the CHARMm/CHARMm Mode menu item
on the main QUANTA menu bar.

Assigning forcefield atom types and charges


Who should read this section
You should read this section if you want to understand what atom
types are, how types and charges are assigned automatically, and
how you can make your own atom type or charge assignments.
In addition, if you want to understand parameter assignment
(Determination of which parameters are used with which atom types)
and/or edit the forcefield parameters (Manual parameter assign-
ment), you need to understand something about atom type assign-
ment first.

Availability
All molecular modeling programs supplied by MSI perform
automatic and/or semi-automatic atom-type and charge assignment
(which needs to be redone if you switch to a different forcefield).
Please see the guidebook for the appropriate molecular modeling
program for details on how to assign atom types and charges for
the simulation engine you are using (see Available documentation).

What are atom types in forcefields?


The simulation engine needs the forcefield atom type of each atom
in the model in order to determine which forcefield parameters to
use. Forcefield parameters apply to particular combinations of
atom types as specified by the forcefield.

Relation between forcefield atom types and chemical atoms
The forcefield atom types are related to the chemical
microenvironment of the atoms in a way defined by the particular forcefield.
For example, a methane model has only two atom types, one for
the carbon and one for the hydrogens, even though each of the
atoms may have a distinct atom name for labeling purposes. The
hydrogen atoms are equivalent by symmetry; therefore, they
would all have the same atom type in any forcefield.
As a more complicated example, consider propane, which has four
distinct types of atoms: methyl carbon atoms, methyl hydrogen
atoms, a methylene carbon atom, and the methylene hydrogens. In
principle, a forcefield could consider these to be four distinct atom
types, but in practice, the chemical difference between the carbon
atoms or between the hydrogen atoms is very small, so in most
forcefields the carbon atoms are all assigned the same atom type,
and all the hydrogens are assigned a second atom type.

Assigning atom types to a model


Atom types and charges supplied by the structure file
Atom types are assigned by the simulation engine or the molecular
modeling program. Atom types are automatically assigned by
using a set of rules that link the type of an atom to its element type
and its chemical microenvironment (for example, the number and
nature of connected atoms). Different forcefields use different
atom types and atom-typing rules, which are contained in a
residue library or the forcefield file.
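
The idea of typing rules can be illustrated with a deliberately simplified sketch in which the atom type depends only on the element and the number of bonded atoms. The rule table and the type names (c_sp3, c_sp2, h) are hypothetical and much cruder than the rules in any real forcefield. Atoms that match no rule are marked with a question mark.

```python
# Hypothetical typing rules: (element, number of bonded atoms) -> atom type.
TYPING_RULES = {
    ("C", 4): "c_sp3",
    ("C", 3): "c_sp2",
    ("H", 1): "h",
}


def assign_atom_types(elements, bonds):
    """Return one atom type per atom, or '?' when no rule matches."""
    neighbor_count = [0] * len(elements)
    for i, j in bonds:
        neighbor_count[i] += 1
        neighbor_count[j] += 1
    return [TYPING_RULES.get((element, n), "?")
            for element, n in zip(elements, neighbor_count)]


# Propane: C0-C1-C2 carrying 3, 2, and 3 hydrogens.
elements = ["C", "C", "C"] + ["H"] * 8
bonds = [(0, 1), (1, 2),
         (0, 3), (0, 4), (0, 5),
         (1, 6), (1, 7),
         (2, 8), (2, 9), (2, 10)]

types = assign_atom_types(elements, bonds)
print(sorted(set(types)))
```

With these rules the methyl and methylene carbons of propane receive the same type, as do all of the hydrogens, which mirrors the behavior described above for most forcefields.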
The atom type information can also be supplied by a molecular
data file such as an .msi file (OFF), an .mdf file (Discover and OFF),
or an RTF (or PSF) file (CHARMm). These structure files are
typically created in the Cerius2, Insight, or QUANTA molecular
modeling program.
To make sure that atom types are assigned:
♦ Cerius2 warns you when you try to perform some action for
which atom types need to have been assigned, if they have not
been assigned.
♦ Insight checks whether types (and charges) have been assigned
when you exit the model-building module and gives you an
opportunity to assign them.
♦ QUANTA takes care of assigning the atom types when the
model is saved in a structure file.
Charge information also is saved in the structure file.
To assure that you use the most appropriate atom types in your
studies, you should always check the assigned atom types against
the appropriate table under Forcefield Terms and Atom Types. In
most cases, Cerius2 and Insight automatically assign the atom
types. However, these assignment engines of course require the
models to be correctly built. One of the most critical types of infor-
mation is the bond order, which should be set before the forcefield
is assigned.

Atom typing in different modeling programs
In Cerius2 and Insight, atoms with unassigned atom types are
labelled with question marks when you label the model according
to the atom type (FFTYPE or potential type).
♦ In Cerius2•Discover, use the Discover Atom Typing control
panel (accessed by selecting the Forcefield/Typing item on the
DISCOVER card) to assign all atom types and charges (if you
do not want to do this automatically). You can also select indi-
vidual atoms and manually assign an atom type different from
the one assigned automatically.
♦ In Cerius2•OFF, use the Force Field Atom Typing control panel
(accessed by going to the OFF SETUP deck of cards and select-
ing the Typing/Atoms card menu item) to assign all atom types.
Charges are also assigned if the forcefield being used contains
charge information. You can also select individual atoms and
manually assign an atom type different from the one assigned
automatically.

♦ In Insight II, assignment of potential types (and charges) to
each atom is done with the Forcefield/Potentials parameter
block, which appears automatically when appropriate or can be
accessed from the Biopolymer, Builder, and other modules of
the Insight program. You can re-type individual atoms by using
the Atom/Potential parameter block in the Biopolymer or
Builder module.
♦ QUANTA automatically assigns atom types when you con-
struct or modify a model. Library (or “dictionary”) files for
commonly used chemical units (amino acids, nucleic acids, etc.)
are supplied with CHARMm. You can also manually assign
atom types different from what were assigned automatically,
by using the Molecular Editor (accessed from Applications/
Builders/3-D Builder on the main QUANTA menu bar).

Important
A newly assigned atom type (including associated parameters
such as mass and charge) replaces any previously assigned or
calculated value.

Important
Currently, the automatic atom-typing engines cannot
distinguish a metal atom from a metallic ion. Hence, by default,
“metal” atom types are assigned by the automatic typing
engines. So for metal ions, you need to assign the formal charges
and atom types by hand.

Assigning charges
Charges (when available) are generally assigned at the same time
as the forcefield atom type (see Assigning atom types to a model).
An atom-type charge is simply a fixed value associated with the
atom type. Assigning forcefield atom types therefore does not
necessarily make the model neutral overall, and you may prefer to
assign charges explicitly. (The exact method depends on the
molecular modeling program being used.)

Importance of correct charge assignment
Electrostatic interactions play a critical role in determining the
structures of inorganic systems and in defining the packing of
organic molecules.

Many forcefields already include charges
Forcefields that include Coulombic terms generally already
include standard charges (or “bond increments”) associated with
the atom types. These forcefields have been parameterized with
nonzero atom-type charges or charge increments (Table 6), so
charges are usually assigned automatically during atom typing
and you do not have to assign specific charges:

Table 6. Forcefields parameterized with nonzero atom charges or bond increments

Forcefield                                             Engine
CFF91–95, CFF, PCFF, COMPASS                           OFF, Discover
CVFF                                                   OFF, Discover
bks1.01                                                OFF
burchart1.01                                           OFF
burchart1.01-DREIDING2.21 (not all atom types)         OFF
burchart1.01-UNIVERSAL1.02 (Burchart atom types only)  OFF
glassff_1.01                                           OFF
MMFF93                                                 CHARMm (a)
msxx_1.01                                              OFF
CHARMm                                                 CHARMm

(a) CHARMm as run through the Cerius2•MMFF module, not in QUANTA or
standard CHARMm.
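
A bond increment can be read as the charge shifted across one bond between two atom types. The sketch below, which uses hypothetical type names and a made-up increment value, builds partial charges by summing increments over bonds. Treating pairs without an entry as contributing nothing is an assumption of this sketch, not documented engine behavior.

```python
# Hypothetical bond increments: the value is added to the FIRST type of the
# pair for each bond to the SECOND type; the reverse pair gets the opposite sign.
BOND_INCREMENTS = {("c", "h"): -0.1}


def bond_increment(type_i, type_j):
    """Charge added to an atom of type_i for one bond to an atom of type_j."""
    if (type_i, type_j) in BOND_INCREMENTS:
        return BOND_INCREMENTS[(type_i, type_j)]
    if (type_j, type_i) in BOND_INCREMENTS:
        return -BOND_INCREMENTS[(type_j, type_i)]
    return 0.0  # assumption: like or unlisted pairs contribute nothing


def charges_from_increments(atom_types, bonds, formal_charges=None):
    """Start from formal charges (default zero) and add one increment per bond."""
    charges = list(formal_charges) if formal_charges else [0.0] * len(atom_types)
    for i, j in bonds:
        charges[i] += bond_increment(atom_types[i], atom_types[j])
        charges[j] += bond_increment(atom_types[j], atom_types[i])
    return charges


# Methane: one carbon bonded to four hydrogens.
atom_types = ["c", "h", "h", "h", "h"]
bonds = [(0, 1), (0, 2), (0, 3), (0, 4)]

charges = charges_from_increments(atom_types, bonds)
print(charges, sum(charges))
```

Because every bond adds equal and opposite amounts to its two atoms, the total charge of the model stays equal to the sum of the formal charges.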

Important
If you want to assign charges different from those in the
forcefield, you need to assign the charges after atom typing
(and automatic charge assignment) is done.

Finding charges, if needed
If you need to specifically assign charges, most relevant modules
allow you to set atomic charges directly or specify an overall net
charge for the whole structure using charge editing functions.
For small models, you can obtain values for charges by using an ab
initio or semiempirical quantum chemistry module (for example,
MOPAC).
For larger, distorted models and when charge assignment is done
by the charge equilibration method (in Cerius2), you usually need
to perform a short minimization before assigning charges. This is
because charge equilibration calculations on distorted models can
lead to assignment of unrealistic charges.

Charge assignment in different modeling programs
♦ In Cerius2•Discover, charges are automatically assigned when
atoms are typed.
♦ In Cerius2•OFF, if you are using UFF or the Dreiding forcefield,
charges should be assigned to the model by using the Charges
module, accessed from the OFF SETUP deck of cards (see
Cerius2 Forcefield Engines: OFF). The Charges module uses the
charge equilibration approach developed by Rappé & Goddard
(1991) to predict the charges from the model geometry and the
atomic electronegativities. If the model geometry changes
much during minimization, you should iterate the procedure of
reassigning charges and reminimizing until the energy reaches
a constant value. The Charges control panel also allows you to
edit or assign charges manually.
♦ In Insight II for all forcefields except CFF, assignment of charges
(and atom types) to each atom is done with the Forcefield/
Potentials parameter block, which appears automatically when
appropriate or can be accessed from the Biopolymer, Builder,
and other modules. Potential function atom types must be (and
are) assigned before charges or partial charges are assigned.
The Insight program assigns atom types and partial charges to
each atom in the structure based on information in a residue
library file or (if not found in a residue library file) on the bond
increments found in the forcefield file. You can edit the charges
on individual atoms with the Atom/Charge parameter block in
the Biopolymer or Builder module.
♦ QUANTA automatically assigns charges when you construct or
modify a model. Library (or “dictionary”) files for commonly
used chemical units (amino acids, nucleic acids, etc.) are sup-
plied with CHARMm. You can also manually assign charges
different from what were assigned automatically, by using the
Molecular Editor (accessed from Applications/Builders/3-D
Builder on the main QUANTA menu bar).
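
The advice in the Cerius2•OFF item above, to alternate charge reassignment and minimization until the energy stops changing, amounts to a simple loop. The sketch below shows only the control flow. The two callables are hypothetical stand-ins supplied by the caller, not functions of Cerius2 or any other MSI program, and the toy model underneath exists only so that the loop can be run.

```python
def iterate_charges_and_minimize(model, assign_charges, minimize,
                                 energy_tol=1e-3, max_cycles=20):
    """Alternate charge assignment and minimization until the energy settles.

    assign_charges(model) updates the charges in place; minimize(model)
    updates the geometry in place and returns the minimized energy.
    """
    previous = None
    for cycle in range(1, max_cycles + 1):
        assign_charges(model)
        energy = minimize(model)
        if previous is not None and abs(energy - previous) < energy_tol:
            return energy, cycle
        previous = energy
    raise RuntimeError("energy did not settle within max_cycles")


# Toy stand-ins (purely illustrative): the charge depends on the geometry,
# and the minimized geometry depends on the charge.
def toy_assign_charges(model):
    model["q"] = 0.5 * model["x"]


def toy_minimize(model):
    model["x"] = 1.0 + 0.2 * model["q"]
    return -model["q"] * model["x"]


model = {"x": 1.0, "q": 0.0}
final_energy, cycles = iterate_charges_and_minimize(
    model, toy_assign_charges, toy_minimize)
print(final_energy, cycles)
```

The convergence test compares successive minimized energies, which is one reasonable reading of "until the energy reaches a constant value"; a stricter criterion could also compare the charges themselves.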


Parameter assignment
Who should read this section
If you are a novice user or routinely run relatively simple
calculations on relatively simple or standard models, you do not need to
read this section. However, if, for example, an error message
informs you of missing parameters or you want to customize your
energy expression for an atypical model, then you do need to
understand how the simulation engines determine what
parameters are used for which atoms, bonds, angles, etc.
You should also understand something about atom type and
charge assignment (Assigning forcefield atom types and charges) to
make effective use of this section.

Determination of which parameters are used with which atom types

Before calculating the energy of a model, the simulation engine
must construct the complete energy expression for the model by
associating the correct forcefield parameters with the appropriate
atoms and other coordinates. For example, methane has one type
of bond (C–H) and one type of bond angle (H–C–H). The program
must create a list of the four actual bonds and then associate the C–
H bond parameters with each. Similarly, there are six H–C–H
angles, but they are characterized by the same set of parameters.
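
The bookkeeping described for methane can be sketched as follows: the bond list is taken from the connectivity, the angles are enumerated as pairs of neighbors around each apex atom, and each internal coordinate is then associated with the parameters for its atom types. The type names and the numerical parameter values are hypothetical placeholders.

```python
from itertools import combinations


def enumerate_angles(n_atoms, bonds):
    """Return (end, apex, end) index triples for every pair of bonds sharing an atom."""
    neighbors = {i: [] for i in range(n_atoms)}
    for i, j in bonds:
        neighbors[i].append(j)
        neighbors[j].append(i)
    angles = []
    for apex, bonded in neighbors.items():
        for a, b in combinations(sorted(bonded), 2):
            angles.append((a, apex, b))
    return angles


# Methane: atom 0 is the carbon, atoms 1-4 are the hydrogens.
atom_types = ["c", "h", "h", "h", "h"]
bonds = [(0, 1), (0, 2), (0, 3), (0, 4)]

# Hypothetical parameter tables keyed by atom types (values are placeholders).
BOND_PARAMS = {("c", "h"): {"k": 340.0, "r0": 1.09}}
ANGLE_PARAMS = {("h", "c", "h"): {"k": 40.0, "theta0": 109.5}}

bond_terms = [
    (i, j, BOND_PARAMS[tuple(sorted((atom_types[i], atom_types[j])))])
    for i, j in bonds
]
angle_terms = [
    (a, apex, b, ANGLE_PARAMS[(atom_types[a], atom_types[apex], atom_types[b])])
    for a, apex, b in enumerate_angles(len(atom_types), bonds)
]
print(len(bond_terms), len(angle_terms))
```

Four bond terms and six angle terms result, and every term of a given kind refers to the same single parameter set.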
It is important to understand how the parameters from the force-
field are associated with individual internal coordinates, because
the energy, derivatives, structures, and almost all other properties
calculated by the program depend on these forcefield parameters
and the way in which they are associated with the internal coordi-
nates. The following sections describe two facets of this process:
atom type equivalences and wildcards in parameter definitions.

Atom type equivalences


Chemically distinct atoms often differ in some, but not all, of their
forcefield parameters. For example, the bond parameters for the
C–C bonds in ethene and in benzene are quite different, but the
nonbond parameters for the carbon atoms are essentially the same.

In Discover, rather than duplicating the nonbond parameters in
the forcefield parameter file, atom type equivalences are used to
simplify the problem. In the example, the phenyl carbon atom type
is equivalent to the pure sp2 carbons of ethene insofar as the
nonbond parameters are concerned.
The Discover program recognizes five types of equivalences for
each atom type: nonbond, bond, angle, torsion, and out-of-plane.
Crossterms such as bond–bond terms have the same equivalences
(insofar as atom types are concerned) as the diagonal term of the
topology of all the atoms defining the internal coordinates. For the
bond–bond term, this means that the atom type equivalences for
angles would be used.
The actual format of the equivalence data in the forcefield param-
eter file is detailed in the File Formats documentation. For the
equivalences used in any particular forcefield, you should exam-
ine the actual forcefield parameter file for current information.
CHARMm PRM files handle equivalences for nonbond parame-
ters by using partial wildcards, for example, N* means that the
associated nonbond parameters apply to any nitrogen type that is
not specifically parameterized.
For forcefields in Cerius2•OFF, wildcards are usually used.
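
As a rough illustration of the equivalence idea (not of the actual file format, which is described in the File Formats documentation), the sketch below maps each atom type to an equivalent type separately for each kind of term before looking up parameters. The type labels cp and c= are used here simply as labels for a phenyl and an ethene carbon, and all numerical values are placeholders.

```python
# Hypothetical equivalence table: atom type -> {term kind: type used for lookup}.
EQUIVALENCES = {
    "cp": {"nonbond": "c=", "bond": "cp"},   # phenyl carbon
    "c=": {"nonbond": "c=", "bond": "c="},   # ethene carbon
}

# Placeholder parameter tables; nonbond data are stored once, for "c=" only.
NONBOND_PARAMS = {"c=": {"r0": 4.0, "eps": 0.1}}
BOND_PARAMS = {
    ("cp", "cp"): {"k": 480.0, "r0": 1.34},
    ("c=", "c="): {"k": 655.0, "r0": 1.33},
}


def equivalent_type(atom_type, term):
    """Return the type to use for this kind of term (the type itself if unlisted)."""
    return EQUIVALENCES.get(atom_type, {}).get(term, atom_type)


def nonbond_params(atom_type):
    return NONBOND_PARAMS[equivalent_type(atom_type, "nonbond")]


def bond_params(type_i, type_j):
    key = tuple(sorted((equivalent_type(type_i, "bond"),
                        equivalent_type(type_j, "bond"))))
    return BOND_PARAMS[key]


print(nonbond_params("cp") is nonbond_params("c="))          # shared nonbond data
print(bond_params("cp", "cp") == bond_params("c=", "c="))    # distinct bond data
```

The two carbon types share one nonbond entry while keeping separate bond entries, so the nonbond parameters do not have to be duplicated.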

Wildcard atom types in the parameter file


For some internal coordinates, the parameters do not depend
strongly on the specific atom types of one or more atoms. For
example, the parameters of torsion terms may not be strongly
affected by the end atoms. This means that the torsion parameters
are essentially defined by the central bond rather than its substitu-
ents.
The forcefield engines allow wildcard atom types to conveniently
handle this type of situation. This special atom type, indicated by
an X in CHARMm .PRM files and in relevant Cerius2 forcefield
files and by an asterisk (*) in Discover forcefield files, matches any
atom type when the forcefield engine is searching for the parame-
ters to associate with a particular internal coordinate. (In
CHARMm, this applies only to bond, angle, torsion, and
improper-torsion parameters.)
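
Wildcard matching for torsions can be sketched as below, using X as the wildcard symbol. The parameter table is hypothetical, and the choice to prefer the entry with the fewest wildcards (so that an explicit entry wins over a generic one) is an assumption of this sketch; consult the documentation of the engine you are using for its actual search order.

```python
WILDCARD = "X"

# Hypothetical torsion table: one generic entry and one explicit entry.
TORSION_PARAMS = {
    ("X", "c", "c", "X"): "generic c-c torsion",
    ("h", "c", "c", "o"): "specific h-c-c-o torsion",
}


def matches(pattern, types):
    """True if every position is either a wildcard or the same atom type."""
    return all(p == WILDCARD or p == t for p, t in zip(pattern, types))


def find_torsion(types):
    """Look up a torsion in either direction; prefer entries with fewer wildcards."""
    candidates = []
    for pattern, params in TORSION_PARAMS.items():
        if matches(pattern, types) or matches(pattern, types[::-1]):
            candidates.append((pattern.count(WILDCARD), params))
    if not candidates:
        return None
    return min(candidates, key=lambda candidate: candidate[0])[1]


print(find_torsion(("h", "c", "c", "o")))
print(find_torsion(("h", "c", "c", "h")))
print(find_torsion(("h", "n", "c", "h")))
```

The second lookup falls back to the generic entry because only the central bond is matched explicitly, and the third returns None because no entry covers an n–c central bond.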


Automatic assignment of values for missing parameters


Availability

Table 7. Automatic parameter assignment in MSI’s molecular modeling programs

Modeling program        Automatic parameters  Comments
Cerius2                 yes                   not for all forcefields
Insight II (CDiscover)  yes                   but not for AMBER forcefield
Insight II (FDiscover)  yes                   controllable in standalone mode only, not for AMBER
QUANTA                  yes                   but only if the PSF Generator is used

What happens if parameters are not found
Some classic and second-generation forcefields are not completely
parameterized for all their atom types. (For rule-based forcefields
(Rule-based forcefields broadly applicable to the periodic table), all
parameters are generated according to rules rather than read from
the forcefield file.) When parameters for classic and
second-generation forcefields are not available, one of several things
can happen, with varying consequences:
♦ The missing parameters are simply ignored (i.e., set to zero in
the energy expression). The simulation runs and yields results,
but they may be very inaccurate.
♦ Setup of the energy expression is interrupted, the simulation
run is not started, and a message is output to the textport or text
window.
♦ Missing parameters are obtained automatically from a simpler
generic set of parameters (using many wildcards, see above).
The results may be reasonable, but not as accurate as if specific
parameters existed.

Temporary patches for missing parameters; precautions
A forcefield may include automatic parameters for use when
better-quality explicit parameters are not defined for a particular
bond, angle, torsion, or out-of-plane interaction. These parameters
are intended as temporary patches, to allow you to begin
calculations immediately. While MSI has made every effort to ensure that
the automatic parameters used in CVFF, the CFF family of
forcefields, and CHARMm produce reasonable geometries for a wide
variety of models, we cannot guarantee that the automatic
parameters are appropriate in every instance. You therefore should
always carefully evaluate any results that you obtain using
automatic parameters.

How missing parameters are supplied
Discover automatically assigns values for parameters missing
from the CFF and CVFF forcefields by switching to an automatic
forcefield. This switching is accomplished with an equivalence
table that converts the original set of atom types to a smaller set of
generic atom types.
Cerius2•OFF behaves similarly.
QUANTA’s parameter chooser looks through the existing
CHARMm parameters for similar cases and averages them all to
come up with suggested values.

Discover’s automatic forcefield
In the automatic forcefield in the Discover program, the atom
types for bonds, angles, torsions, and out-of-plane deformations
have different levels of specificity. For example, while
bond-stretching parameters are determined by the atom types of both
atoms, angle-bending and torsion parameters may be determined
by the atom type of only the central atom(s). A wildcard (*),
representing any type of atom, is used for the end atoms of torsions and
angles.
In some cases, angle-bending parameters are specified by two
atoms (rather than only the central atom). This can lead to ambigu-
ity—for example, C–C–N (if not explicitly defined in the force-
field) can be associated with c_–c_–* or with n_–c_–*. The
underscore in this example is used to denote the generic (or auto-
matic) atom types. Here, a one-sided wildcard (*#, where # is an
integer indicating the precedence), is used for one of the end atoms
in an angle.
Cerius2•OFF handles precedence with an additional field (P0, P1,
… P9) rather than wildcards.
In interpreting the wildcard, the Discover program and the
Cerius2•OFF module use the parameter for which the integer is
lower. The parameters for a C–C–N angle would, for example (if
not explicitly defined in the forcefield), be taken from those for
atom types n_–c_–*6 rather than c_–c_–*7, because 6 is smaller
than 7.
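
The precedence rule can be sketched in a few lines of Python. This is only an illustration of the selection logic described above, not Discover's or Cerius2•OFF's actual code; the function name and the tuple layout are assumptions made for the example.

```python
# Hypothetical candidates for a C-C-N angle that has no explicit entry.
# Each tuple is (end atom, apex atom, precedence of the one-sided wildcard).
def pick_automatic_angle(candidates):
    """Return the candidate whose wildcard precedence integer is lowest."""
    return min(candidates, key=lambda candidate: candidate[2])

candidates = [("c_", "c_", 7), ("n_", "c_", 6)]
chosen = pick_automatic_angle(candidates)
print(chosen)
```

With the two candidates from the text, the n_–c_–*6 entry is selected because 6 is lower than 7.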


An example
As an example, the parameters for the angle oh–c"–c" in oxalic acid
(Figure 3) are not present in CFF91.

[Figure 3. Oxalic acid structure and CFF91 atom types: carbons c", hydroxyl oxygens oh, carbonyl oxygens o’, hydrogens h*]

When the automatic parameter assignment process is used in
Discover, it looks at the auto-equivalence table in the cff91.frc file to
find the generic atom types for this angle (indicated in bold type):
#auto_equivalence cff91_auto

!                 Equivalences
!                 -----------------------------------------------------------------------------------
!Ver  Ref  Type  NonB  Bond  Bond  Angle     Angle      Torsion    Torsion       OOP       OOP
!                      Inct        End Atom  Apex Atom  End Atoms  Center Atoms  End Atom  Center Atom
!---  ---  ----  ----  ----  ----  --------  ---------  ---------  ------------  --------  -----------
2.0   2    c"    c"    c"    c’_   c_        c’_        c_         c’_           c_        c’_
2.0   2    oh    o     o_    o_    o_        o_         o_         o_            o_        o_

Thus, for parameter assignment purposes only, atom type oh is
reassigned to o_, c" is reassigned to c’_ for the apex atom, and c" is
reassigned to c_ for the end atom. The parameters for the oh–c"–c"
angle could be taken from either the o_c’_*7 or the c_c’_*9 lines in
the quadratic_angle section of the cff91.frc file. The o_c’_*7 line is
chosen because 7 is lower than 9:

#quadratic_angle cff91_auto

> E = K2 * (Theta - Theta0)^2

!Ver  Ref  I     J     K     Theta0     K2
!---  ---  ----  ----  ----  ---------  ---------
2.0   2    c_    c’_   *9    120.0000    40.0000
2.0   2    n_    c’_   *8    120.0000    53.5000
2.0   2    o_    c’_   *7    110.0000   122.0000
2.0   2    o’_   c’_   *6    120.0000    68.0000
2.0   2    h_    c’_   *2    110.0000    55.0000
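
The oxalic acid lookup can be traced with a short Python sketch. It is a hedged illustration only: the dictionaries hold just the rows quoted above, ASCII quotes replace the typographic ones in the atom-type names, and the function and variable names are invented for the example rather than taken from Discover.

```python
# Angle columns of the auto-equivalence rows above: atom type -> (end atom, apex atom).
# ASCII quotes stand in for the typographic ones used in the listing.
AUTO_EQUIVALENCE_ANGLE = {
    'c"': ("c_", "c'_"),
    "oh": ("o_", "o_"),
}

# Automatic quadratic_angle entries: (I, J) -> (wildcard precedence, Theta0, K2).
AUTO_QUADRATIC_ANGLE = {
    ("c_", "c'_"): (9, 120.0, 40.0),
    ("n_", "c'_"): (8, 120.0, 53.5),
    ("o_", "c'_"): (7, 110.0, 122.0),
    ("o'_", "c'_"): (6, 120.0, 68.0),
    ("h_", "c'_"): (2, 110.0, 55.0),
}

def assign_automatic_angle(end_a, apex, end_b):
    """Pick automatic parameters for the angle end_a-apex-end_b."""
    apex_generic = AUTO_EQUIVALENCE_ANGLE[apex][1]
    candidates = []
    for end in (end_a, end_b):
        key = (AUTO_EQUIVALENCE_ANGLE[end][0], apex_generic)
        if key in AUTO_QUADRATIC_ANGLE:
            precedence, theta0, k2 = AUTO_QUADRATIC_ANGLE[key]
            candidates.append((precedence, key, theta0, k2))
    if not candidates:
        raise LookupError("no automatic angle parameters found")
    # The entry with the lowest precedence integer wins.
    return min(candidates)

result = assign_automatic_angle("oh", 'c"', 'c"')
print(result)
```

For oh–c"–c" both the o_ and c_ end-atom entries match, and the o_ entry (precedence 7, Theta0 110.0, K2 122.0) is returned.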

Manual parameter assignment

Notification of missing parameters


If parameter(s) for a potential type are not present in the forcefield
file and are not generated when the energy expression is set up, an
appropriate error message is written to the textport or text win-
dow of your molecular modeling program. (In QUANTA, this
occurs only if you use QUANTA’s Applications/Builders tools to
construct your model; there is no warning for models that are read
in from some other application or database.)
Missing and/or automatic parameters are also listed in an output
file after the completion of a simulation run. You can find out if
parameters are missing before starting your run:
♦ Cerius2 notifies you if atom types are missing and lists them in
the text window. It then quits trying to set up an energy expres-
sion.
If terms are missing in Cerius2•Discover, the energy expression
is still set up unless one or more of the Stop check boxes in the
Discover Parameters control panel (accessed by selecting
Forcefield/Parameters in the DISCOVER card) are checked.
If terms are missing in Cerius2•OFF, the energy expression is set
up only if Ignore undefined terms is checked in the Energy
Expression control panel (accessed from the Energy
Expression/Setup card menu item on the OPEN FORCE FIELD card).
♦ In Insight 97.0, you can request a list of missing parameters
with the Forcefield/Tabulate parameter block in the Builder or
Biopolymer module.
♦ In Insight 4.0.0, you can use the Forcefield/FF_Info parameter
block (in the Builder or other modules) to check the model for
unassigned potentials and charges.
♦ In QUANTA, you can request a parameter report with the
CHARMm/Parameters/Set Options menu item.


Obtaining new parameters


If the forcefield you are using is similar in functional form and
atom typing to another forcefield which does contain the desired
parameters, you may be able to use those parameters in your force-
field, at least on a trial basis. You may also be able to obtain new
parameters (or help in deriving them yourself) from the scientific
literature (see References) or from the developers of the forcefield
you are using.

Editing a forcefield
Expert users can edit MSI forcefields in different ways to custom-
ize them to their needs or to create new forcefields.
Editing through a graphical interface
MSI’s principal molecular modeling programs include forcefield
editors (see Available documentation):
♦ Cerius2 —Use the Force Field Editor (in the OFF SETUP card
deck), which allows you to edit all existing defined terms in the
current forcefield and also lets you create new forcefields. The
Cerius2•FFE module also allows you to create, edit, and delete
atom types and to change the rules on which automatic atom-
typing is based.
You can change the functional form of terms in rules-based
forcefields (see Rule-based forcefields broadly applicable to the peri-
odic table) by adding an explicit term. Any explicit term is
always used in preference to a generated term.

Important
The Cerius2•FFE module cannot be used to modify the CFF or
CVFF forcefield files that are included with Cerius2 —if you
want to use customized versions of these forcefields, you need
to modify them with Insight II 4.0.0 or as described in the
Discover documentation.

♦ Insight II 4.0.0: Use the Forcefield/Edit_FF parameter block,
which is found in several modules and allows you to edit the
parameters of the current forcefield for all terms except
crossterms. (This parameter block is not included in Insight 97.0,
since well parameterized forcefields exist for life science
applications.)
♦ QUANTA: The Edit/Atom Data/Parameters menu item
allows you to change a limited set of parameters for any atom
type. The PSF Generator allows you to edit automatically
supplied parameters when you do a CHARMm calculation in PSF
mode.
Manual editing
Expert users can edit the files that define many forcefields with a
text editor:
♦ The classical and second-generation forcefields that are avail-
able through the Discover program—How to modify the
parameter file is explained in separate documentation. The
forcefield files are located in the directory defined by the
$BIOSYM_LIBRARY environment variable.
The potential template rule file used by Insight can also be
edited. You may add new atom types by making additions to
this file (refer to the Insight II documentation for a complete
description).
♦ CHARMm—Parameter file contents and formats are explained
in the electronic documentation supplied with CHARMM. The
forcefield file is located in the directory defined by the $CHM_
DATA environment variable—PARM.BIN is the binary version
and PARM.PRM the ASCII version.
♦ ESFF—Since this forcefield is rule-based, a parameter file
needs to be generated and then edited. Please see the CDiscover
book.
♦ CFF—Since this forcefield is encrypted, it is edited indirectly,
by means of a special template file, as explained in MSI Force-
fields: CFF.

Important
For forcefields accessed through Cerius2, we strongly
recommend that you never edit these files by hand. Please use
the Force Field Editor module. This is important because some
forcefield values are linked to others and only the Force Field
Editor reliably assures that related values are modified in a
coordinated way.


Using alternative forms of energy terms


The energy expression is the heart of the forcefield. Potential
energy is described in the energy expression as the sum of various
terms that indicate the energy costs of bond stretching, angle bend-
ing, etc. Not all terms are present in all forcefields, and the func-
tional forms of the terms vary among forcefields (see Forcefields).
This section and the following main section (Applying constraints
and restraints) describe energy term preferences that you can set
and restraint terms that can be optionally included in the energy
expression.
Who should read this section
If you are a novice user, you should alter the default energy terms
and parameters as little as possible. One exception to this
recommendation is nonbond methods (see Handling nonbond
interactions), where you should choose the method according to the
model type rather than necessarily accept the default settings.
Availability

Table 8. Modifications of energy expression in MSI’s simulation engines

term                          modification                      engine (restrictions)
----------------------------  --------------------------------  ------------------------------------------
any                           removal                           FFE[a], CHARMm
all of a type                 scaling                           FDiscover, CDiscover
all of a type                 editing                           FFE, Insight 4.0.0[b]
bond stretching               Morse vs. harmonic                FDiscover, CDiscover (CVFF)
bond stretching               scaling                           OFF[c] (C–H bonds)
torsion twisting              scaled, averaged, or first found  OFF
out-of-plane movement         averaged or first found           OFF
van der Waals interactions    Lennard–Jones vs. quartic         FDiscover (standalone)
hydrogen bond interactions    if used, how set up               OFF, FDiscover (standalone, AMBER),
                                                                CDiscover (standalone, AMBER), CHARMm
crossterms                    removal                           FDiscover, CDiscover (CVFF)
1–3 or bond stretching–angle  Urey–Bradley term vs. crossterm   OFF, CHARMm (standalone only)
bending interactions

[a] The Force Field Editor module in Cerius2.
[b] The Forcefield/Edit_FF parameter block in the Builder and other modules can be used to edit parameters
in all terms except crossterms for AMBER, CVFF, and CFF. (Parameter block not present in Insight 97.0.)
[c] The Open Force Field module in Cerius2.

Most methods for changing the functional form of the energy
expression are available via the graphical UIFs:
♦ Cerius2•FFE, OFF—Controls in the Energy Terms control
panel in the Open Force Field module. (You can also use the
Energy Terms control panel in the Force Field Editor to modify
the forcefield itself.)
♦ CDiscover—Items in the Forcefield menu of the Cerius2•Dis-
cover module and commands in the Specify pulldown of the
Insight•Discover_3 module.
♦ FDiscover—Commands in the Parameters pulldown of the
Insight•Discover module.
♦ QUANTA—Controls in the CHARMm Energy Setup dialog
box (accessed via the CHARMm/Energy Terms menu item)
and the CHARMm Update Parameters dialog box.
Discover and CHARMm also offer additional functionality when
run in standalone mode.

Removing terms from the energy expression


Why remove terms
You may, for example, want to save computation time during the
early stages of minimization of a model that is far from its equilib-
rium conformation by not calculating any cross terms. Or you may
have found that certain terms are insignificant with respect to the
purposes of your study.
How to remove terms
You can effectively remove terms from the energy expression in
several ways:
♦ In Cerius2•OFF, you can use the Energy Terms Selection control
panel in the Open Force Field module to include or exclude
entire classes of terms (e.g., all bond–bond crossterms) when
setting up the energy expression.


♦ With the CVFF forcefield in Discover, you can choose to turn off
(i.e., not use) all cross terms.
♦ You can accomplish the same end for other terms in CVFF and
for any class of terms in the other forcefields supplied with the
Discover program by scaling terms with a zero scaling factor
(see next section).
♦ In CHARMm, you can omit (“skip”) any type(s) of terms or
constraints, for example, all bond terms.

Scaling or editing any selected type of term


Uses
The contributions of various terms in the potential energy expression
to the total energy can be scaled up or down and/or otherwise
edited. This can be useful, for example, in the early stages of min-
imizing very “bad” structures, where large contributions by cer-
tain terms might interfere with convergence.
How it works
The Cerius2•Force Field Editor module allows you to directly
change the parameters in any term (e.g., for all C_3–H_ bond
terms, but not for all bond terms) in the forcefield. The Energy
Terms control panels include entry boxes for all relevant parame-
ters.
In the Discover program, scaling applies to an entire class of
energy terms (e.g., all bond terms) in the energy expression. The
force constants (or some other parameters) for the chosen class of
terms are multiplied by some constant factor. For example, all
bond interactions can be scaled by one factor and all van der Waals
radii by another.
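
As a minimal sketch of class-wide scaling, assume each energy term is a small record with a class label and a force constant. The record layout and the function name below are invented for illustration and are not Discover's data structures; the point is only that one factor applies to every term of a class, and that a factor of zero effectively removes the class.

```python
def scale_terms(terms, factors):
    """Return copies of the terms with each force constant scaled by its class factor."""
    return [
        {**term, "k": term["k"] * factors.get(term["kind"], 1.0)}
        for term in terms
    ]

# Hypothetical energy expression: two bond terms and one angle term.
terms = [
    {"kind": "bond", "atoms": (0, 1), "k": 340.0},
    {"kind": "bond", "atoms": (1, 2), "k": 310.0},
    {"kind": "angle", "atoms": (0, 1, 2), "k": 50.0},
]

# Halve all bond terms and switch all angle terms off.
scaled = scale_terms(terms, {"bond": 0.5, "angle": 0.0})
print([term["k"] for term in scaled])
```

Classes that are not named in the factor table keep their original force constants.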
In the Insight 4.0.0 molecular modeling program, you can use the
Forcefield/Edit_FF parameter block in the Builder and other
modules to directly change the parameters in any term except
crossterms (e.g., for all C–H bond terms, but not for all bond terms) in
the forcefield. The *_Par parameter blocks accessed through the
editor include entry boxes for all relevant parameters.

Alternative bond terms


With the CVFF forcefield in Discover, you can choose to use
quadratic bond terms rather than Morse bond terms. The Morse term
can allow bonds to stretch to unrealistic lengths (Figure 2), so you
may get quicker convergence from a highly distorted
configuration if you replace the Morse term by a harmonic term. You do this
by specifying the “no Morse” version of CVFF.
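
The difference between the two bond terms can be seen numerically. The sketch below uses the common Morse form D[1 - exp(-alpha (r - r0))]^2 and a harmonic term with matching curvature at r0; the parameter values are hypothetical and are not taken from any CVFF parameter file.

```python
import math

def morse(r, r0, d, alpha):
    """Morse bond energy: levels off at d for large r, so the restoring force vanishes."""
    return d * (1.0 - math.exp(-alpha * (r - r0))) ** 2

def harmonic(r, r0, d, alpha):
    """Harmonic bond energy k (r - r0)^2 with k = d * alpha^2, matching the Morse curvature at r0."""
    return d * alpha ** 2 * (r - r0) ** 2

# Hypothetical parameters: r0 in Angstrom, d in kcal/mol, alpha in 1/Angstrom.
r0, d, alpha = 1.5, 88.0, 1.9
for r in (1.6, 2.5, 5.0):
    print(r, round(morse(r, r0, d, alpha), 2), round(harmonic(r, r0, d, alpha), 2))
```

At a badly stretched bond length the Morse energy is nearly flat, so a minimizer gets almost no pull back toward r0, whereas the harmonic term keeps growing.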

Scaled torsion terms


If all torsions about a common bond were simply summed, the tor-
sion energy term could be too large. Cerius2•OFF therefore allows
several methods for scaling torsion terms (Discover and
CHARMm automatically handle torsions optimally, because of
how their forcefields are parameterized):
Behavior in Cerius2•OFF
♦ The usual treatment in Cerius2•OFF is to divide the sum of all
the parameterized torsion terms around a common bond by the
number of torsions around that bond (as is done in Discover).
♦ An alternative method of scaling torsions is to use the energy of
only the first torsion found about a bond. This method is not
generally recommended, because the torsion term used (and,
therefore, the torsion energy) depends on the order in which
atoms are created.
♦ Calculated energies for torsions that are exocyclic to aromatic
rings (Figure 4) tend to be too high and may be scaled by an
additional factor, usually 0.4.

Figure 4. Torsion exocyclic to an aromatic ring
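
The three torsion treatments listed above can be summarized in a short Python sketch. The function and argument names are assumptions made for the example and do not correspond to Cerius2•OFF settings; only the arithmetic follows the text.

```python
def torsion_energy_about_bond(energies, mode="average",
                              exocyclic_aromatic=False, exocyclic_factor=0.4):
    """Combine the parameterized torsion energies found around one bond.

    mode="average": sum of all torsion energies divided by their number.
    mode="first":   energy of the first torsion found (order dependent).
    """
    if not energies:
        return 0.0
    if mode == "average":
        energy = sum(energies) / len(energies)
    elif mode == "first":
        energy = energies[0]
    else:
        raise ValueError("mode must be 'average' or 'first'")
    if exocyclic_aromatic:
        # Extra scaling for torsions exocyclic to aromatic rings.
        energy *= exocyclic_factor
    return energy

energies = [1.0, 2.0, 3.0]  # hypothetical torsion energies about one bond
print(torsion_energy_about_bond(energies))
print(torsion_energy_about_bond(energies, mode="first"))
print(torsion_energy_about_bond(energies, exocyclic_aromatic=True))
```

Reversing the order of the input list changes the "first" result but not the averaged one, which is why the first-found method is not generally recommended.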


Inversion terms
The inversion, improper, or out-of-plane torsion term represents
the energy involved in inverting a chiral center or otherwise
changing this out-of-plane angle.
In Cerius2•OFF, you may use the first inversion term found or the
average of all inversion energies, but the first approach is not rec-
ommended.

Nonbond functional form


In the Cerius2•Force Field Editor module, you can use the van der
Waals Energy Terms control panels to choose among several non-
bond functional forms. (These are listed in the online help, which
is accessed by right-mouse clicking over the Function popup.)
However, you would have to change the relevant parameters as
well, if you wanted good results.
You can use the DSL (the FDiscover command language) set com-
mand to choose between the usual Lennard–Jones 6–12 or 6–9
potential (e.g., term 10 in Eq. 23) and a quartic form:

$$ k \left[ (s\,r_{\min})^2 - r^2 \right]^2 \qquad \text{(Eq. 26)} $$

The quartic form is useful when you need to eliminate bad van der
Waals contacts, but the second derivatives are not calculated.
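
A short numerical comparison shows why the quartic form helps with bad contacts. The sketch evaluates Eq. 26 as written and a standard Lennard–Jones 12–6 term in its r_min form; this section does not define s and k, so reading s as a scale factor on r_min, and all numeric values, are assumptions.

```python
def quartic_vdw(r, r_min, k, s=1.0):
    """Quartic nonbond form of Eq. 26: k * ((s * r_min)^2 - r^2)^2."""
    return k * ((s * r_min) ** 2 - r ** 2) ** 2

def lennard_jones_12_6(r, r_min, eps):
    """Lennard-Jones 12-6 energy with its minimum of -eps at r = r_min."""
    x = (r_min / r) ** 6
    return eps * (x * x - 2.0 * x)

r_min, k, eps = 4.0, 0.01, 0.1  # hypothetical parameters
for r in (0.5, 2.0, 4.0):
    print(r, quartic_vdw(r, r_min, k), lennard_jones_12_6(r, r_min, eps))
```

The quartic energy stays finite even for overlapping atoms, while the 12–6 repulsion becomes extremely large at short distances.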

Hydrogen bonds and hydrogen-bond terms


Who should read this section
Many forcefields, especially the newer ones, fully account for
hydrogen bonds by other terms in the forcefield and so do not
have or require specific terms for handling hydrogen bond
interactions.
However, some older forcefields include specific terms for hydro-
gen bonding (e.g., older versions of AMBER). Others (e.g.,
CHARMm) allow you to use a hydrogen bond term if you want
(but MSI does not recommend this). If you are using a forcefield
with explicit hydrogen bond terms, you should read this section.


Lack of hydrogen bond terms is an asset
If specific hydrogen bonds are required, generation of a list of
hydrogen bonds is a major step in evaluating the energy of a
system. This process involves looking at all possible pairs of
hydrogen bond donors and acceptors and selecting those that meet
certain criteria (Figure 5):
♦ The hydrogen bond length is less than a defined cutoff.
♦ The deviation of D–H–A from linearity is less than a defined
cutoff. Typically, the best hydrogen bond has a D–H–A angle of
180°.

[Figure 5. Distance and angle criteria for hydrogen bonds. A = hydrogen acceptor; D = hydrogen donor.]

Since hydrogen bond interactions depend on both angle and
distance, both angle cutoffs and distance cutoffs must be specified for
a switching function (see Nonbond cutoffs). A switching, or spline,
function (Figure 15) is needed to conserve energy by smoothing
transitions over the cutoffs.
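
The distance and angle criteria can be expressed as a simple geometric test. The sketch below is illustrative only: it takes the hydrogen bond length to be the H...A distance, and the cutoff values are hypothetical defaults, not those of any MSI engine.

```python
import math

def angle_deg(a, b, c):
    """Angle a-b-c in degrees, with b at the vertex."""
    v1 = [a[i] - b[i] for i in range(3)]
    v2 = [c[i] - b[i] for i in range(3)]
    dot = sum(x * y for x, y in zip(v1, v2))
    norm = math.sqrt(sum(x * x for x in v1)) * math.sqrt(sum(x * x for x in v2))
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))

def is_hydrogen_bond(donor, hydrogen, acceptor,
                     max_distance=2.5, max_deviation_deg=60.0):
    """True if the H...A distance and the D-H-A deviation from 180 degrees pass both cutoffs."""
    distance = math.dist(hydrogen, acceptor)
    deviation = 180.0 - angle_deg(donor, hydrogen, acceptor)
    return distance < max_distance and deviation < max_deviation_deg

donor, hydrogen = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)
print(is_hydrogen_bond(donor, hydrogen, (2.9, 0.0, 0.0)))  # linear and close
print(is_hydrogen_bond(donor, hydrogen, (1.0, 1.9, 0.0)))  # D-H-A angle of 90 degrees
print(is_hydrogen_bond(donor, hydrogen, (4.0, 0.0, 0.0)))  # linear but too far
```

A full hydrogen bond list would apply this test to every donor and acceptor pair, and a switching function would then smooth the energy near both cutoffs.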
Specifying the criteria
In Cerius2•OFF, the hydrogen bond criteria can be changed by
using the Hydrogen Bond Preferences control panel (accessed by
selecting the Energy Terms/Hydrogen Bond menu item from the
OPEN FORCE FIELD card). This control panel also allows speci-
fication of switching function (“spline”) parameters.
In Discover, default hydrogen bond criteria are contained in the
forcefield file (amber.frc). CDiscover allows you to use the BTCL
forcefield scale command to scale hydrogen bond terms (if they
exist). In FDiscover, they can be changed by editing the command
input file to change the variables HBDIST and HBANGL.


In CHARMm, you can change the hydrogen bond criteria with the
CHARMm Update Parameters dialog box, which is accessed from
the CHARMm/Update Parameters menu item. This dialog box
also allows specification of switching function parameters for
hydrogen bonds. Setting the Update Frequency to 0 (the default)
effectively omits the hydrogen bond term from the potential
energy expression. You can also omit explicit hydrogen bond
terms by using the CHARMm/Energy Terms menu item.

Bond–angle cross terms vs. Urey–Bradley terms


An alternative or supplement to bond–angle interactions is the
Urey–Bradley term, which accounts for 1–3 interactions between
two atoms that are bonded to a common atom.
In Cerius2•OFF, use the Energy Terms Selection control panel to
specify whether to use the Urey–Bradley term (assuming it is
available in the current forcefield).
In CHARMm, the Urey–Bradley term, if present, can be omitted
from the energy expression (standalone only) or can be specified
in the parameter file (ANGLE statement).

Applying constraints and restraints


Why read this section
Constraints and restraints allow you to focus the calculation on a
region or conformation of interest and also to set up computational
experiments. Such experiments are one of the primary uses of
molecular modeling, allowing you control over a model at the
atomic level. Several examples are described under When to use
constraints/restraints.
Restraints vs. constraints
The essential difference between a constraint and a restraint is that
a constraint is an absolute restriction imposed on the calculation,
while a restraint is an energetic bias that tends to force the
calculation toward a certain restriction (even though many people use
these terms as if they were interchangeable).


Availability

Table 9. Constraints and restraints in MSI’s simulation engines

constraint/restraint            type                                                          engine[a]
------------------------------  ------------------------------------------------------------  ------------------------------------
atom                            fixed (constraints)                                           OFF[b], FDiscover, CDiscover, CHARMm
template forcing                harmonic (Eq. 28) restraint                                   FDiscover
tethering and template forcing  quadratic (Eq. 29) restraint                                  CDiscover[c]
tethering                       harmonic (Eq. 28) restraint                                   FDiscover
tethering                       mass-weighted harmonic (Eq. 30) restraint                     CHARMm
quartic droplet                 harmonic (Eq. 31) restraint                                   CHARMm
distance                        harmonic (Eq. 32) restraint                                   OFF
distance                        quadratic (Eq. 29), flat-bottomed (Eq. 34), or cosine         CDiscover[c]
                                (Eq. 36) restraint
distance                        harmonic (Eq. 32) or flat-bottomed (Eq. 33) restraint         FDiscover
distance                        flat-bottomed (Eq. 35) restraint                              CHARMm
dynamics                        RATTLE algorithm (constraints)                                CDiscover
dynamics                        SHAKE algorithm (constraints)                                 CHARMm
dynamics                        consensus dynamics (Eq. 28) (standalone only)                 FDiscover, CDiscover
angle                           harmonic (Eq. 37) restraint                                   OFF
angle                           quadratic (Eq. 29), flat-bottomed (Eq. 34), or cosine         CDiscover[c]
                                (Eq. 36) restraint
torsion                         harmonic (Eq. 38) restraint                                   OFF
torsion                         quadratic (Eq. 29), flat-bottomed or J3 dihedral (Eq. 34),    CDiscover[c]
                                cosine (Eq. 36), cis (Eq. 39), trans (Eq. 40), or cis/trans
                                (Eq. 41) restraint
torsion                         flat-bottomed (Eq. 33) restraint (standalone only)            FDiscover
torsion                         cosine (Eq. 42) or harmonic (Eq. 38) torque (one of these is  FDiscover
                                standalone only)
torsion                         harmonic (Eq. 38) restraint                                   CHARMm
inversion                       harmonic (Eq. 43) restraint                                   OFF
chiral                          flat-bottomed (Eq. 34) restraint                              CDiscover[c]
out-of-plane                    quadratic (Eq. 29), flat-bottomed or J3 dihedral (Eq. 34),    CDiscover[c]
                                or cosine (Eq. 36) restraint
out-of-plane                    harmonic (Eq. 43) restraint (standalone only)                 FDiscover

[a] The standalone modes of running simulation engines may give access to additional constraints and
restraints; please see the appropriate documentation.
[b] The Open Force Field module in Cerius2.
[c] Not available yet in the Cerius2•Discover module; restraints applied with CDiscover (in Insight and
standalone modes) can also be scaled.

Most restraints and constraints are available via the graphical
UIFs:
♦ Cerius2•OFF—Controls in the Restraints control panel, which
is accessed from the Energy Terms/Restraints card menu item
and in the Atom Constraints control panel, which is accessed
from the Atom Constraints card menu item. The latter also
allows you to color-code immovable atoms.
♦ Cerius2•Minimizer—Controls in the Atom Constraints control
panel, which is accessed from the Constraints/Atoms card
menu item.
♦ CDiscover—Controls in the Atom Constraints control panel of
the Cerius2•Discover module and commands in the Specify
pulldown of the Insight•Discover_3 module.
♦ FDiscover—Commands in the Constraint pulldown of the
Insight•Discover module.
♦ QUANTA—Controls accessed by the CHARMm/Constraints
Options and CHARMm/SHAKE Options menu items.
Discover and CHARMm offer additional restraint and constraint
functionality when run in standalone mode.


When to use constraints/restraints


Constraints and restraints are often used to control and direct the
minimization.
Fixed-atoms example
For example, you can fix some atoms in space, not allowing them
to move. This is useful when part of the structure of a molecule has
been well solved experimentally, but the structures of other areas
are less clear. Or you might want to keep parts of your model (e.g.,
solvent molecules) rigid to decrease computational costs.
Torsion-rotation example
You can add extra terms to the energy expression to restrain or bias
the system in certain ways. For example, if you are investigating
the adiabatic energy barrier to rotation about a bond, you would
restrain the value of that torsion and minimize the structure.
Repeating this procedure for a set of torsion values in the range 0°–
360° yields a complete energy profile for rotation about the bond.
A similar process is used to generate phi/psi maps and other mul-
tidimensional energy surfaces in studies of model conformation.
Docking example
If a substrate is being docked onto an enzyme and a specific hydrogen
bond between the enzyme and the ligand is thought to be
involved in binding, the donor and acceptor atoms can be pulled
together to provide a docking coordinate. In this way, the results
are not so dependent on the initial starting configuration, which
may have been only a crude graphic alignment. In cases like this,
the restraint is turned off at some point to make sure that the
biased minimum is close to a true minimum.
Modeling incomplete models
Another example of the use of restraints is in modeling incomplete
systems. Often, it is difficult or impossible to construct a realistic
environment around parts of a model system. For example, only a
partial structure of a large protein complex may be available, and
some atoms must be restrained to stay near their initial crystal
positions because they do not “feel” interactions with neighboring
(missing) amino acids, membrane, or solvent. If the site of interest
(for instance, a binding site for a competitive inhibitor) is well
characterized but other parts of the enzyme are unknown or
would require too much computation time if they were included,
a limited study can still be carried out with the ends tethered to
their crystal coordinates. Usually, these restraints are permanent
parts of the model. The results of such calculations must be
critically evaluated but can be valid if the ligand binding does not
depend on interactions with missing pieces of the model or on
conformational flexibility in the tethered regions.
Relaxing crystal structures
As a final example, tethering can be used to gently relax a crystal
structure. Often, crystal coordinates, even if highly refined, have
several strained interactions due to intrinsically disordered or
poorly defined atomic positions, which, upon minimization, give
rise to large initial forces. If these forces are not restrained, they can
result in artifactual movement away from the original structure.
The general approach is to progressively relax parts of the model
in stages, starting with the least well determined atoms, until the
entire system can minimize freely. The restraints are ultimately
removed so that the final minimum represents an unperturbed
conformation. It is usually not necessary to minimize to conver-
gence at each stage—the object is to relax the most-strained parts
of the system as quickly as possible without introducing artifacts.

Fixed atom constraints


Cost-saving
Fixed atoms are constrained to a given location in space; they cannot
move at all. Fixed atoms reduce the expense of a calculation in
two ways:
♦ Terms in the energy expression involving only fixed atoms can
be eliminated, because they merely add a constant to the total
energy. Since the positions of fixed atoms cannot change, nei-
ther can the contribution of the terms that depend only on these
positions. (Interactions between moving and fixed atoms are
calculated.)
♦ Fixing atoms reduces the number of degrees of freedom in the
system, so minimizers converge in fewer steps and dynamics
requires fewer steps to sweep out the available conformational
space.

Important
The energy calculated by simulation engines is correct only to
an arbitrary constant, depending on the model as well as the
fixed atoms. Thus, only differences in energy between
conformations of the same model having the same fixed atoms
are meaningful.


Uses
Use atom constraints when you want to apply minimization or
dynamics to part of a model, while keeping the remainder of the
model fixed and rigid. For example, use atom constraints to
quickly minimize a sorbate in a zeolite by fixing the atom positions
of the zeolite frame and allowing only the sorbate atoms to move.
Or fix all residues in a protein except for those in the active site.
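
A fixed-atom constraint amounts to never updating the coordinates of the fixed atoms. The toy steepest-descent sketch below shows this for a one-dimensional chain of three atoms joined by harmonic springs; the model, the function names, and the step size are all invented for illustration and do not represent any engine's minimizer.

```python
def chain_gradient(x, k=1.0, r0=1.0):
    """Gradient of E = sum_i k * (x[i+1] - x[i] - r0)^2 for a 1D chain of atoms."""
    g = [0.0] * len(x)
    for i in range(len(x) - 1):
        stretch = x[i + 1] - x[i] - r0
        g[i] -= 2.0 * k * stretch
        g[i + 1] += 2.0 * k * stretch
    return g

def minimize_with_fixed_atoms(x, fixed, grad, step=0.1, n_steps=200):
    """Steepest descent in which atoms listed in `fixed` are never moved."""
    x = list(x)
    for _ in range(n_steps):
        g = grad(x)
        for i in range(len(x)):
            if i not in fixed:
                x[i] -= step * g[i]
    return x

start = [0.0, 0.4, 2.6]
relaxed = minimize_with_fixed_atoms(start, fixed={0, 2}, grad=chain_gradient)
print(relaxed)
```

Only the middle atom moves, settling midway between its two ideal positions, while the end atoms keep their starting coordinates exactly.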

Template forcing, tethering, quartic droplet restraints, and consensus conformations
Uses
Typical uses of these related types of restraints are to bias the conformation
of one model towards that of another, to bias selected
atoms towards their experimentally known positions, to restrain
the core of a model while allowing its solvent-exposed constitu-
ents more freedom of movement, or to find an identical or close set
of conformations that a group of related models can achieve.
Template forcing
To force the conformation of one model to be similar to that of a
template model, a one-to-one correspondence between atoms in
the template and in the moving structure is set up, and (for exam-
ple) one of the following restraint terms is added to the energy
expression:

$$ E = K \left[ \frac{\sum_{i}^{N} \left( R_i - R_i^{\mathrm{template}} \right)^2}{N_{\mathrm{pairs}}} \right]^{1/2} \qquad \text{(Eq. 27)} $$

or:

$$ E = \sum_{\mathrm{pairs}}^{N} K_i \left( R_i - R_i^{\mathrm{template}} \right)^2 \qquad \text{(Eq. 28)} $$

The term in Eq. 27 is proportional to the root-mean-square (rms)
deviation of the analog atoms from the template atoms. (This form
cannot be used with the Newton–Raphson minimizer in
FDiscover.) The values obtained for the energy and the rms function
depend on the value of the forcing parameter K. Typical values for
this constant are in the neighborhood of 5 kcal Å⁻¹. It is often
instructive to look at the dependence of the energy and rms
functions on the forcing parameter by making several determinations
with different forcing parameters. If several runs (minimization or
dynamics) are made, it may also be helpful to plot the energy as a
function of the rms value. For tethered minimizations, a very large
forcing constant (e.g., 2000.0 kcal Å⁻¹) is often used to prevent
significant movement of any of the tethered atoms.
Eq. 28 represents a conceptually more straightforward restraint,
with each atom restrained by an isotropic spring to the position of
its template atom. In either form, the summation is over a list of
pairs of atoms to restrain: one from the moving model, and one
from the template model. FDiscover uses this quadratic form by
default.
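
The two template-forcing terms can be compared directly in a few lines. The sketch below evaluates Eq. 27 and Eq. 28 for a list of atom pairs; the coordinates and force constants are hypothetical, and the function names are not taken from Discover.

```python
import math

def rms_forcing_energy(coords, template, k):
    """Eq. 27: K times the rms deviation over the listed atom pairs."""
    squared = [math.dist(r, t) ** 2 for r, t in zip(coords, template)]
    return k * math.sqrt(sum(squared) / len(squared))

def harmonic_forcing_energy(coords, template, ks):
    """Eq. 28: sum over pairs of K_i times the squared deviation."""
    return sum(ki * math.dist(r, t) ** 2
               for ki, r, t in zip(ks, coords, template))

# Two atom pairs: the first coincides with its template, the second is 2 Angstrom away.
coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
template = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]
print(rms_forcing_energy(coords, template, k=5.0))
print(harmonic_forcing_energy(coords, template, ks=[5.0, 5.0]))
```

Because Eq. 28 takes one force constant per pair, selected atoms can be held more tightly than others simply by passing a larger K_i for them.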
The Ki in Eq. 28 are determined by the distance of atom i from the
atom defining the origin as:


$$ K_i = \begin{cases} 0 & r < r_{\min} \\ k_{\min} + (k_{\max} - k_{\min}) \times (r - r_{\min}) / (r_{\max} - r_{\min}) & r_{\min} \le r \le r_{\max} \\ k_{\max} & r_{\max} < r \end{cases} $$

where rmin is the distance at which the tethering turns on, kmin is
the initial force constant at that distance, rmax is the distance where
the force constant reaches its maximum allowed value, and kmax is
the maximum allowed force constant. If rmin and kmin are not
given, the default values are zero. If rmax is zero, tethering uses a
constant force constant of kmax.
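
The distance-dependent force constant can be written as a small function. This sketch reads the middle branch as a linear ramp from k_min at r_min to k_max at r_max, which is an interpretation of the definition above; the argument names and the example values are illustrative only.

```python
def tether_force_constant(r, k_max, r_max=0.0, r_min=0.0, k_min=0.0):
    """Force constant K_i for an atom at distance r from the origin atom."""
    if r_max == 0.0:
        # No ramp requested: constant force constant k_max everywhere.
        return k_max
    if r < r_min:
        return 0.0
    if r > r_max:
        return k_max
    # Linear ramp from k_min at r_min up to k_max at r_max.
    return k_min + (k_max - k_min) * (r - r_min) / (r_max - r_min)

for r in (2.0, 5.0, 7.5, 10.0, 12.0):
    print(r, tether_force_constant(r, k_max=100.0, r_max=10.0, r_min=5.0, k_min=20.0))
```

Atoms closer than r_min are left free, and the restraint stiffens with distance until it reaches k_max.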
In CDiscover, a simpler quadratic is used:

$$ E = \text{scale factor} \times k\,(V - V_0)^2 \qquad \text{(Eq. 29)} $$

where V is any appropriate internal coordinate (bond length, angle,
etc.); the same functional form is used for several types of restraints.
Advantages of each type of template-forcing restraint

The first form (Eq. 27) gives the best rms fit for the least energetic
cost, but individual atoms may remain quite far from their template
position. The second form (Eq. 28) restrains each atom individually,
so each atom is forced toward its template partner. The
resulting rms fit is not as good as that from Eq. 27, but no one atom
is allowed to deviate as much as is possible with Eq. 27. The form
in Eq. 28 also allows for a different force constant for each pair,
which means that different atoms or classes of atoms can be
treated differently.
Tethering

Tethering is the same as template forcing, except that the atoms are
restrained to their original positions rather than to positions in a
template structure. Both Eq. 27 and quadratic forms are applicable
for tethering; however, Eq. 28 is used by FDiscover and Eq. 29 by
CDiscover, because tethering is usually used to keep atoms from
moving too far from their original positions.
CHARMm allows mass-weighted tethering by calculating an
additional energy term for all atoms that are to be restrained. This
term has the form:

$$E_{cons} = \sum_i k_i m_i (r_i - r_0)^n \qquad \text{(Eq. 30)}$$

where Econs is the constraint energy, ki is the force constant, mi is
the mass of atom i (if mass weighting is used) or 1, ri is the position
of atom i, r0 is the reference position about which the atom is to be
centered, and n is an exponent.
Quartic droplet restraint

The quartic droplet restraint term in CHARMm is designed to put
the entire model into a “cage” by constructing a restraining sphere
around a model. The potential is scaled so that atom positions fur-
thest from the center of mass or the geometric center of the model
have the greatest restraining force applied.
The quartic droplet restraint term is based on the center of mass
(COM) or the center of geometry of the model. No net force or
torque is introduced by the center of mass term. The potential
function is:

$$E_{droplet} = k \sum_i m_i (r_i - r_{COM})^n \qquad \text{(Eq. 31)}$$

FDiscover (standalone only) allows similar restraints within
spherical shells.
Consensus dynamics

Consensus dynamics is used to find the consensus configuration
of a set of analogs. In essence, all models in the set are treated as
both moving molecules and templates.
Standalone FDiscover uses the harmonic template-forcing
restraint (Eq. 28).
The database capability of CDiscover is used in setting up consensus
dynamics calculations, using the restraint in Eq. 29.

General internal-coordinate restraints


In CHARMm, you can apply general internal coordinate restraints
by applying restraints to all bonds, angles, and/or dihedral angles
that have entries in an internal coordinates table. This facility is
global, that is, not applicable to specific internal coordinates.

Distance and NOE restraints


Uses

Distance restraints are used to bias the distance between two
atoms, bonded or not, toward a given value. Some uses are to
cyclize linear models by bringing the ends closer together, dock
different models, and fit distance data derived from NOE and
other experiments.
Several functional forms for distance restraints: harmonic…

Several commonly used functional forms are supported.
One is a simple harmonic function, which in FDiscover has the
form:

$$E = K (R_{ij} - R_{target})^2 \qquad \text{(Eq. 32)}$$

where K is a force constant, Rij is the current distance between the
atoms, and Rtarget is the target distance. A large force constant
tends to force the distance to be close to the target distance; a
smaller force constant results in a correspondingly smaller bias.
In CDiscover, the quadratic form is the same, except that scaling is
enabled (Eq. 29).

In Cerius2, the form is the same, except that K is multiplied by 0.5.
Rtarget can be defined explicitly or automatically extracted from
the model as the current distance between atoms.
…and flat-bottomed…

The second form is also harmonic, but it is separated into several
piecewise continuous regions, resulting in a flat-bottomed potential
(Figure 6). For FDiscover, the form is:

$$
E = \begin{cases}
E_1 + (R_1 - R_{ij}) F & R_{ij} < R_1 \\
K_2 (R_{ij} - R_2)^2 & R_1 \le R_{ij} \le R_2 \\
0 & R_2 < R_{ij} \le R_3 \\
K_3 (R_{ij} - R_3)^2 & R_3 < R_{ij} \le R_4 \\
E_4 + (R_{ij} - R_4) F & R_4 < R_{ij}
\end{cases}
\qquad \text{(Eq. 33)}
$$

For CDiscover, the flat-bottomed form is:

$$
E = \text{scale factor} \times \begin{cases}
k (V - V_0)^2 & V \le V_0 \\
0.0 & V_0 < V < V_1 \\
k (V - V_1)^2 & V \ge V_1
\end{cases}
\qquad \text{(Eq. 34)}
$$

where V is any appropriate internal coordinate (bond length, angle, etc.;
the same functional form is used for several types of restraints).
The restraining potential used in CHARMm is:

$$
E = \begin{cases}
\dfrac{k_{min}}{2} (R - R_{min})^2 & R < R_{min} \\
0 & R_{min} < R < R_{max} \\
\dfrac{k_{min}}{2} (R - R_{max})^2 & R_{max} < R < R_{lim} \\
f_{max} \left( R - \dfrac{R_{lim} + R_{max}}{2} \right) & R > R_{lim}
\end{cases}
\qquad \text{(Eq. 35)}
$$

where Rlim is the value of R where the force equals fmax.

[Figure 6: three plots of E(r) with R1, R2, R3, and R4 marked on the horizontal axis; in the third plot R2 = R3.]

Figure 6. Distance restraint function

E as a function of R, the distance between two atoms or dihedral angles,
defined as in Eq. 33.

…and cosine

CDiscover also allows a cosine form of restraint:

$$E = \text{scale factor} \times \frac{k}{2} \bigl( 1 - \cos( n (V - V_0) ) \bigr) \qquad \text{(Eq. 36)}$$

where V is any appropriate internal coordinate (bond length, angle, etc.) and
n is the periodicity.
Advantages of the flat-bottomed functional form

It is not necessary for the flat-bottomed potential (Figure 6) to be
symmetric. By appropriate definition of the points R1, R2, etc., any
of the regions may be eliminated. For Eq. 33, the important regions
are those from R1 to R2 and from R3 to R4, where a harmonic poten-
tial is applied, and the flat bottom from R2 to R3.
This form of the restraint allows a range of acceptable distances
and is particularly useful for incorporating experimental distance
information, such as that from NOE experiments, into a calculation.
The flat bottom allows for experimental error in the determined
distance. The two outer regions (Figure 6) have a constant
gradient, which is useful for avoiding unreasonably large forces if
the initial structure is far from the target value.
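
The five regions of the flat-bottomed form (Eq. 33) can be sketched as follows. This is a minimal illustration, not FDiscover code. The source does not define E1, E4, or F explicitly, so the sketch assumes that E1 and E4 are the harmonic energies at R1 and R4 (which makes the energy continuous) and that F is a user-supplied positive slope for the two linear outer regions. The function name is hypothetical.

```python
def flat_bottomed_energy(r, r1, r2, r3, r4, k2, k3, f):
    """Flat-bottomed distance restraint energy in the spirit of Eq. 33.

    Assumptions (not stated in the source): E1 and E4 are taken as the
    harmonic energies at r1 and r4, and f is the constant slope used
    in both outer regions.
    """
    if r < r1:
        e1 = k2 * (r1 - r2) ** 2      # harmonic energy at r1
        return e1 + (r1 - r) * f      # linear region, constant gradient
    if r <= r2:
        return k2 * (r - r2) ** 2     # lower harmonic wall
    if r <= r3:
        return 0.0                    # flat bottom: no penalty
    if r <= r4:
        return k3 * (r - r3) ** 2     # upper harmonic wall
    e4 = k3 * (r4 - r3) ** 2          # harmonic energy at r4
    return e4 + (r - r4) * f          # linear region, constant gradient
```

Setting r2 equal to r3 removes the flat bottom, and choosing different k2 and k3 gives an asymmetric restraint, which matches the flexibility described above.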

Distance and angle constraints in dynamics simulations


The RATTLE and SHAKE algorithms effectively remove very-
high-frequency vibrations from consideration during dynamics
simulations. Use of these algorithms can allow for a larger time
step during simulation.
In CDiscover, the BTCL rattle command is used before the dynam-
ics command to set up constraints in bonds, angles, or water mol-
ecules in a molecular dynamics simulation. It can be used to
constrain bonds or any atom pairs to user-defined distances. It can
be used to constrain angles spanned by two constrained bonds. In
addition, it can be used to fix the geometry of water molecules so
that the fixed-geometry water models SPC and TIP3P can be used
in a simulation. This functionality is available in a limited way via
the Calculate/Dynamics parameter block in the Discover_3 mod-
ule of Insight (click More to display the Rattle toggle).
In CHARMm, SHAKE is used to constrain bond lengths and
angles spanned by two constrained bonds during dynamics runs.
(However, its use is recommended only for constraining all bonds
in which one of the bonded atoms is a hydrogen.) The SHAKE
algorithm cannot be used with the Newton–Raphson or ABNR
minimizers (see Minimization).
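
To make the idea of a distance constraint concrete, the following is a minimal sketch of the textbook SHAKE iteration for a single bond. It is not the CHARMm or CDiscover implementation; the function name, argument names, and tolerance are assumptions chosen for illustration. After an unconstrained dynamics step, the two atoms are displaced along the old bond vector (weighted by inverse mass) until the bond length returns to its constrained value d.

```python
import numpy as np

def shake_pair(r1_new, r2_new, r1_old, r2_old, m1, m2, d, tol=1e-10, max_iter=100):
    """Restore |r1 - r2| = d after an unconstrained step (single-bond SHAKE).

    r*_old are positions before the step (assumed to satisfy the constraint);
    r*_new are the unconstrained positions after the step.
    """
    r1 = np.array(r1_new, dtype=float)
    r2 = np.array(r2_new, dtype=float)
    r_old = np.asarray(r1_old, dtype=float) - np.asarray(r2_old, dtype=float)
    for _ in range(max_iter):
        s = r1 - r2
        diff = s @ s - d * d                  # violation of the squared bond length
        if abs(diff) < tol * d * d:
            break
        # First-order estimate of the Lagrange multiplier for this constraint.
        g = diff / (2.0 * (1.0 / m1 + 1.0 / m2) * (s @ r_old))
        r1 -= (g / m1) * r_old                # lighter atoms move more
        r2 += (g / m2) * r_old
    return r1, r2
```

Because the two corrections are equal and opposite when weighted by mass, the center of mass of the pair is unchanged by the constraint step.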


Angle restraints
In Cerius2, an angle restraint can be applied to a group of any three
atoms. The restraint is implemented such that:

$$E = 0.5\,K_a (\theta - \theta_0)^2 \qquad \text{(Eq. 37)}$$

where Ka is the angle force constant; θ is the angle between the
selected atoms; and θ0 is the desired restrained angle of the
selected atoms. θ0 can be defined explicitly or can be automatically
extracted from the model as the current angle connecting selected
atoms.
In CDiscover the default form of angle restraints is cosine (Eq. 36).
Quadratic (Eq. 29) and flat-bottomed (Eq. 34) angle restraints can
also be used.

Torsion restraints
Uses

Some uses of torsion restraints are to enforce chiral and prochiral
centers, prevent cis–trans conversions, and fit NOE J-coupling con-
stants from NMR experiments. Conversely, other uses are to force
torsion rotation in order to perform phi/psi mapping, perform
conformational searching, and induce conformational changes.
Functional forms

Several forms of torsion restraints are used in the literature and
implemented in MSI’s simulation engines.
Harmonic restraints, or periodic restraints (Eq. 42 with n = 1), are
appropriate for forcing a torsion angle to a particular value. The
periodic form with a periodicity greater than one is useful for
restraining a torsion to one of several related angles. For instance,
a threefold potential could keep a torsion either trans or at one of
the two gauche conformations, depending on the starting confor-
mation and the strength of the potential applied.
Implementation

In Cerius2•OFF, a torsion (dihedral) restraint can be defined
among any group of four atoms. The restraint is implemented such
that:

$$E = K_t (\phi - \phi_0)^2 \qquad \text{(Eq. 38)}$$

where Kt is the torsion force constant; φ is the angle between the
i–j–k and j–k–l planes; and φ0 is the desired restrained angle of the
selected atoms, which can be defined explicitly or automatically
extracted from the model as the current angle connecting selected
atoms.
In CDiscover, you can specifically restrain dihedrals to be cis:

$$E = \text{scale factor} \times \frac{k}{2} ( 1 - \cos\phi ) \qquad \text{(Eq. 39)}$$

or trans:

$$E = \text{scale factor} \times \frac{k}{2} ( 1 + \cos\phi ) \qquad \text{(Eq. 40)}$$

or either cis or trans:

$$E = \text{scale factor} \times \frac{k}{2} ( 1 - \cos(2\phi) ) \qquad \text{(Eq. 41)}$$

You can also use the flat-bottomed function (Eq. 34) to apply J3
dihedral restraints to fit the results of NOE experiments. A plain
cosine form (Eq. 36) and a quadratic form (Eq. 29) are also available.
The torsion involving any four atoms can be restrained.
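
The three cosine restraints in Eq. 39 to Eq. 41 differ only in where their minima fall. The short sketch below evaluates them so that this can be checked numerically. It is an illustration of the formulas only, not BTCL or CDiscover code; the function name and the `mode` strings are hypothetical.

```python
import math

def cosine_torsion_restraint(phi_deg, k, mode, scale=1.0):
    """Evaluate the cis, trans, or cis-or-trans restraint (Eq. 39 to Eq. 41).

    phi_deg is the dihedral angle in degrees; scale is the scale factor.
    """
    phi = math.radians(phi_deg)
    if mode == "cis":
        return scale * 0.5 * k * (1.0 - math.cos(phi))        # minimum at 0 degrees
    if mode == "trans":
        return scale * 0.5 * k * (1.0 + math.cos(phi))        # minimum at 180 degrees
    if mode == "cis_or_trans":
        return scale * 0.5 * k * (1.0 - math.cos(2.0 * phi))  # minima at 0 and 180 degrees
    raise ValueError("mode must be 'cis', 'trans', or 'cis_or_trans'")
```

In each case the maximum penalty is the scale factor times k, reached at the angle farthest from the favored geometry (for the twofold form, at 90° and 270°).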
In FDiscover, the functional forms include a simple harmonic form
analogous to Eq. 32 and a piecewise continuous form like Eq. 33
with R interpreted as the angle, rather than the distance. Another
form is the periodic function of Eq. 42:

$$E = V [ 1 + \cos( n\phi - \phi_0 ) ] \qquad \text{(Eq. 42)}$$

where V gives the strength of the restraint, n is an integer periodicity,
and φ0 is the phase angle.
CHARMm uses a harmonic potential to restrict the motion of a
dihedral angle to a value close to a reference position or to examine
a series of different conformations when making potential energy
maps.


Inversion, out-of-plane, and chiral restraints


Uses

Typical uses include prevention of changes in chirality or
prochirality. (A molecule is chiral if no stable conformation of it can
be superimposed on its mirror image—most chiral organic mole-
cules can be described in terms of chiral centers, i.e., an atom that
has four distinct substituents. Two chemically identical substitu-
ents on an otherwise chiral tetrahedral center are prochiral; in addi-
tion, sp2 hybridised planar systems with three different
substituents are considered prochiral.)
Implementation

In Cerius2•OFF, an inversion (improper torsion or out-of-plane
angle) restraint can be defined among any four atoms i, j, k, l,
where i defines the inversion center. The restraint is implemented
such that:

$$E = K_i (\chi - \chi_0)^2 \qquad \text{(Eq. 43)}$$

where Ki is the force constant for the out-of-plane; χ is the angle
between the i–j–l and i–k–l planes; and χ0 is the desired restrained
out-of-plane angle of the selected atoms, which can be defined
explicitly or automatically extracted from the model as the current
angle connecting selected atoms. There must be a real atomic cen-
ter for the inversion.
The CDiscover program can impose a flat-bottomed chiral
restraint (Eq. 34) to invert the chirality or force it to be R or S.
CDiscover can also impose a cosine (Eq. 36), quadratic (Eq. 29), or
flat-bottomed (Eq. 34) out-of-plane restraint.
The FDiscover DSL language can be used to impose chirality and
prochirality restraints having the same functional form as Eq. 43,
where χ0 is the out-of-plane angle corresponding to R or S.

Plane and other geometrical constraints and restraints


The BTCL language of CDiscover allows sophisticated geometric
manipulation of molecular and other objects, including constraints
and restraints, by means of the geometry, molGeom, restraint and
other commands. A subset of this functionality is accessible in the
Calculate/Geometric parameter block in the Insight•Discover_3
module.

Modeling periodic systems


Why read this section

Periodic boundary conditions refers to the simulation of models
consisting of a periodic lattice of identical subunits. By applying
periodic boundaries to simulations, the influence, for example, of bulk
solvent or crystalline environments can be included, thereby
improving the rigor and realism of a model.

Availability

Table 10. Periodic boundary methods in MSI’s simulation engines

periodicity                engine
minimum image              OFF (a), FDiscover, CHARMm
explicit image             FDiscover, CDiscover, CHARMm
crystal simulations        OFF, CDiscover, CHARMm
bonds across boundaries    OFF, CDiscover, CHARMm

(a) The Open Force Field module in Cerius2.

Most methods for controlling the treatment of periodic systems are
available via the graphical UIFs:
♦ CDiscover: The program detects whether a system is periodic
(in Cerius2: fully automatic, depends only on which model is
current; in Insight: semi-automatic, you need to execute the
Setup/System parameter block) and displays the appropriate
controls or parameters in the interface of the Cerius2•Discover
or Insight•Discover_3 module.
♦ FDiscover: You can choose the minimum-image or explicit-image
convention in the Parameters/Variables parameter block
of the Insight•Discover module. You have to specify whether a
system is periodic by toggling PBC (periodic boundary conditions)
in the Run/Run parameter block.
♦ QUANTA: Use the CHARMm/Periodic Boundaries menu
item to turn periodic boundary conditions on and off and to
specify where to obtain this information.
Discover and CHARMm offer additional functionality when run
in standalone mode.
Models are specified in Cartesian space

Some simulation engines accept only Cartesian coordinates, not
crystal coordinates (others are able to convert between the two systems).
This is important when using asymmetric space groups,
since the symmetry operators assume that the input coordinates
correspond to the standard asymmetric unit as defined in the
International Tables for Crystallography (Reidl 1983).
For Discover, it is assumed that the x Cartesian axis corresponds to
the a crystal axis and that the b axis lies in the x,y plane (see
Figure 7).
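
The Discover axis convention described above (a along x, b in the x,y plane) fixes the Cartesian cell vectors once the six lattice parameters are known. The sketch below uses the standard crystallographic construction for that orientation; it is not taken from Discover itself, and the function name is hypothetical. Angles are in degrees.

```python
import math

def cell_vectors(a, b, c, alpha, beta, gamma):
    """Cartesian cell vectors with a along x and b in the x,y plane.

    alpha is the angle between b and c, beta between a and c,
    and gamma between a and b (all in degrees).
    """
    al, be, ga = (math.radians(x) for x in (alpha, beta, gamma))
    va = (a, 0.0, 0.0)
    vb = (b * math.cos(ga), b * math.sin(ga), 0.0)
    cx = c * math.cos(be)
    cy = c * (math.cos(al) - math.cos(be) * math.cos(ga)) / math.sin(ga)
    cz = math.sqrt(c * c - cx * cx - cy * cy)  # remaining component along z
    return va, vb, (cx, cy, cz)
```

For a cubic cell (all angles 90°) this reduces, up to rounding, to three vectors along x, y, and z. The Cerius2•OFF default (c along z, b in the y,z plane) would need the analogous construction with the axes permuted.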

[Figure 7: axis diagram with labels x, a, and b.]

Figure 7. Relationship between Cartesian coordinate system (xyz) and
periodic system (abc) in Discover and CHARMm

For Cerius2•OFF, by default the c lattice vector is parallel to the z
Cartesian axis and the b lattice vector lies in the y,z plane
(Figure 8).

[Figure 8: axis diagram with labels z, c, and b.]

Figure 8. Relationship between Cartesian coordinate system (xyz) and
periodic system (abc) in Cerius2•OFF

CHARMm can handle models that are defined in either crystal or
Cartesian space. In converting from crystal to Cartesian axes, the
a, b, c crystal axes are aligned with the x, y, z Cartesian axes
(Figure 7).

Minimum-image model

Tip
For periodic systems in which nonbond interactions dominate,
the Ewald sum method (Ewald sums for periodic systems) is
preferred over the minimum-image convention.

Simulation in bulk solvent

The left side of Figure 9 shows a solute molecule surrounded by
enough solvent to occupy the volume (and shape) of a cube. A sim-
ulation carried out on this isolated cubic system is a poor approx-
imation of what would happen in a true bulk solvent environment.
For example, the solute can diffuse toward a surface or solvent
molecules can evaporate. To remedy this, on the right of Figure 9
the cube is replicated in three dimensions to form a 3 × 3 × 3 lattice
of identical cubes. This is a much better representation of bulk sol-
vent for the interior cube, because molecules near the surfaces now
interact with solvent in adjacent cubes. The imaged atoms are used
to calculate energies and forces on the real atoms in the interior
cube. The energies and forces on the imaged atoms themselves are
not calculated because their motions are computed as symmetry
operations on the real atoms, for example, by translations along
the cubic axes.

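[Figure 9: a solute in a single cube of solvent (left) and the same cube replicated as a 3 × 3 × 3 lattice (right).]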

Figure 9. Solute surrounded by solvent


A solute surrounded by an isolated cube of solvent is replicated periodically in
three dimensions in order to better represent a bulk or crystalline environment.

Implications of minimum-image model for calculating nonbond interactions

Consider the implications of this model for a specific case. In
Figure 10, molecule A1 is located near an edge of the square. (For
simplicity, this discussion focuses on a two-dimensional lattice.) In
addition, eight images of A1 (A2–A9) are present in the adjacent
symmetrically related squares. Consider the interactions of mole-
cules A with molecules B. The closest image of B to A1 is actually
not B1, but rather B5. If each molecule in the interior cell is allowed to
interact only with the closest copy (real molecule or image) of each other
molecule, this is called a minimum-image model. Each molecule interacts only
with those molecules and images within a distance of half the cell
size. The advantage of this approach is its simplicity. It is straight-
forward to compute energy between a given pair of molecules
without explicitly keeping track of the images in neighboring cells.
All periodic boundary algorithms imply a cutoff criterion, but the
minimum-image convention implies a maximum distance for this
cutoff of no more than half the cell dimensions.
For a description of the minimum-image convention, see also
Allen and Tildesley (1987).
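
The minimum-image convention is compact enough to show in code. The sketch below assumes an orthorhombic (rectangular) cell described by its three edge lengths; it is a generic illustration along the lines of the approach in Allen and Tildesley, not code from any of the MSI engines, and the function name is hypothetical.

```python
import numpy as np

def minimum_image_distance(r_i, r_j, box):
    """Distance between i and the nearest periodic copy of j.

    box holds the three edge lengths of an orthorhombic cell (an assumption
    of this sketch; triclinic cells need a different treatment).
    """
    box = np.asarray(box, dtype=float)
    d = np.asarray(r_j, dtype=float) - np.asarray(r_i, dtype=float)
    # Shift each component by a whole number of cell lengths so that
    # it falls within half a cell length of zero.
    d -= box * np.round(d / box)
    return float(np.linalg.norm(d))
```

For two points at x = 1 and x = 9 in a 10 Å cubic cell, the direct separation is 8 Å, but the nearest image is only 2 Å away across the cell boundary. This is the situation of A1 and B5 in Figure 10.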

[Figure 10: 3 × 3 grid of cells. The central cell holds the real molecules A1 and B1; the eight surrounding cells hold the images A2–A9 and B2–B9.]

Figure 10. Minimum-image model

Minimum-image model showing that each real molecule interacts with at most
one image of each real molecule.

Explicit-image model
A more general approach: ghost molecules

Simulation engines (Discover and CHARMm) can also use a more
general approach by generating explicit images of the interior molecules,
also called ghost molecules, which interact with the interior
molecules. These ghost molecules are replicated to as great a dis-
tance as necessary (but no farther than necessary) to satisfy the
desired potential energy cutoff criteria.
The left side of Figure 11 shows molecule A1 interacting with sev-
eral images of B (B1, B2, B3, B5) within the specified cutoff radius
(shown as a shaded circle centered on A1). A1 interacts with sev-
eral of its own images as well (A3, A5, A6, A8).
Cutoff distances and nonbond interactions

The right side of Figure 11 shows which molecules in the adjacent
unit cells become explicit ghost molecules for a given cutoff distance.
Not every molecule in an adjacent cell becomes a ghost.
However, if a cutoff distance that is longer than the cell length is
used, ghosts from unit cells beyond the nearest neighbor cells may
be included. As molecules (effectively, see below) move in and out
of the boundaries, the molecules that are ghosts can change. There-
fore, the ghost list is regenerated periodically.

[Figure 11: two panels, each a 3 × 3 grid of cells with the real molecules A1 and B1 in the central cell and the images A2–A9 and B2–B9 around it. A cutoff is drawn in each panel; the right panel labels the real molecules and the ghost molecules.]

Figure 11. Explicit-image model

Explicit-image model showing how a cutoff distance defines which molecules
in adjacent unit cells are selected as ghost images. (Different cutoff distances
are used in the left and right figures.) Left: explicit-image model, in which a
larger cutoff including interactions with more images is possible than with the
minimum-image convention; right: the shaded region identifies which molecules
are selected as ghost images within the cutoff distance of any molecules in the
unit cell.

Nonbond interactions do not have to be calculated between ghost
atoms. This helps to significantly reduce computation time.
When group-based cutoffs (Charge groups and group-based cutoffs)
are used, the nonbond potential is cut off on the basis of charge
groups (i.e., only if two groups are within the cutoff is the interac-
tion calculated), and only those groups in molecular ghosts that
are within the cutoff distance of a real group are included in the
ghost atom list.
How images and “real” molecules move

Ghost molecules follow their symmetrically related counterparts.
However, when it comes time to move the molecules (in a dynamics
step or minimization iteration), only the real molecules (A1 and
B1) are actually moved according to the accumulated forces each
molecule has felt. The ghost molecule positions are simply regen-
erated by applying the defined symmetry relations to the new
positions of the molecules.
Perfect symmetry is maintained between the primary structure
and all its image objects. For many applications, this condition is
satisfactory. However, it is not possible to study, for example,
cooperative changes between image objects.
To maintain all molecules in the central cell, image centering is used.
Molecules that happen to migrate to an edge of the primary struc-
ture and would appear in one of its image objects instead reappear
in the primary structure from the opposite direction. Thus a con-
stant number of atoms is maintained and no molecules are lost, no
matter how far they may diffuse during the calculation.
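
Image centering amounts to wrapping a position back into the central cell by a whole number of cell lengths. The sketch below shows this for points in an orthorhombic cell that spans 0 to L along each axis; both the cell origin and the function name are assumptions made for illustration. In practice the point wrapped would typically be a molecular center, so that whole molecules move together.

```python
import numpy as np

def center_images(positions, box):
    """Wrap positions into the central cell [0, L) along each axis.

    positions is an (N, 3) array of points (for example, molecular centers);
    box holds the three edge lengths of an orthorhombic cell.
    """
    positions = np.asarray(positions, dtype=float)
    box = np.asarray(box, dtype=float)
    # Subtract the whole number of cell lengths each point has drifted.
    return positions - box * np.floor(positions / box)
```

A point that leaves the cell through one face at x = 10.5 in a 10 Å cell reappears at x = 0.5, so the number of points in the central cell never changes.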

Crystal simulations
Energies of crystals can be calculated and the lattice parameters a,
b, c, α, β, and γ can be optimized with Cerius2, CDiscover, and
CHARMm:
♦ In the Cerius2•OFF module, you can choose to optimize cell
dimensions and angles in 2D or 3D periodic systems or to con-
strain some of these coordinates. From the MINIMIZER card
(accessed from the OFF METHODS deck of cards), you can
access cell constraints options with the Constraints/Cell menu
item.
Crystal simulations are also available in several Cerius2•OFF
Instruments modules. For example, you can use the Crystal
Packer module to optimize crystals or calculate their energy
and can include minimization of periodic structures in a
Mechanical Properties run.
♦ In the Cerius2•Discover module, the Optimize Cell checkbox
in the Discover Minimize control panel is automatically
checked if the current model is periodic.
♦ In the Insight•Discover_3 module, crystal optimization is
requested by toggling Optimize_Cell in the Calculate/Mini-
mize parameter block. Crystal optimization is also available
within the Structure_Refine, Amorphous_Cell, and other
Insight II modules.
♦ In QUANTA, use the CHARMm/Periodic Boundaries menu
item to turn periodic boundary conditions on and obtain crystal
energies.

Because crystal patching is not available in CHARMm, bonds
between crystal images are not handled well. Similarly, hydrogen
bond interactions described by an explicit hydrogen bond
function cannot be used. The only forces that can be calculated
between primary and image atoms in crystals are nonbond
forces.

Bonds across boundaries


Allowing bonds (with additional energy terms including angles,
dihedrals, and improper dihedrals) between the primary atoms
and image atoms enables you to study polymers such as DNA or
industrial polymers.
Cerius2•OFF, CDiscover, and CHARMm can handle bonds across
cell boundaries. (However, CHARMm is best used only for linear
polymers, since it does not handle 3D lattices or networks well.)

Handling nonbond interactions


Electrostatic (Coulombic) and van der Waals interactions together
are referred to as nonbond interactions.
Why read this section

Nonbond terms can involve extensive calculation. To avoid a
heavy calculation burden, some approximation scheme is often
employed. Choosing the best method for your particular model
can save computational expense without sacrificing accuracy.
In addition, you have some direct control over the functional terms
for nonbond interactions:
♦ You might be able to improve your simulation by changing the
default combination rules for van der Waals interactions
between non-identical atom types (OFF, Discover, Combination
rules for van der Waals terms).
♦ You can change the dielectric constant to account for nonaque-
ous solvents and/or solvent screening or make the dielectric
“constant” a function of distance (OFF, Discover, CHARMm,
The dielectric constant and the Coulombic term).


Availability

Table 11. Nonbond methods in MSI’s simulation engines

method                         type of system               engine
atom-based (single) cutoffs    periodic, nonperiodic        OFF (a), FDiscover, CDiscover, CHARMm
group-based cutoffs            periodic, nonperiodic        FDiscover, CDiscover, CHARMm
double cutoffs                 periodic, nonperiodic        FDiscover
tail corrections               disordered periodic          CDiscover
cell-based cutoffs             periodic                     CDiscover
cell multipole method          nonperiodic, periodic (b)    CDiscover
Ewald sums                     periodic                     OFF, CDiscover

(a) The Open Force Field module in Cerius2.
(b) Standalone only for periodic systems; cannot be used for constant-pressure or constant-stress dynamics.

Most methods for specifying how to treat nonbond interactions
are available via the graphical UIFs:
♦ Cerius2•OFF—The Van Der Waals Preferences and Coulomb
preferences control panels (accessed from the Energy Terms
card menu item), and similar control panels in several OFF
Instruments modules.
♦ Cerius2•MMFF—The MMFF Nonbonded Preferences control
panel.
♦ CDiscover—Controls in the Discover Non-Bond control panel
of the Cerius2•Discover module and commands in the Specify/
Nonbonds parameter block of the Insight•Discover_3 module.
♦ FDiscover—Commands in the Parameters pulldown of the
Insight•Discover module.
♦ QUANTA—Controls accessed through the CHARMm/
Update Parameters menu item.
Discover and CHARMm also offer additional functionality when
run in standalone mode.


You may use different methods for van der Waals and electrostatic interactions

Typically, both van der Waals and Coulombic interactions are calculated
by the same method and (if by the nonbond cutoff
method) with the same nonbond list. However, different methods
and parameters may be used for van der Waals and Coulombic
terms in CDiscover and (except for some operations) in
Cerius2•OFF and CHARMm. This allows you, for instance, to use
a large cutoff for electrostatic interactions and a smaller cutoff for
van der Waals interactions.
The van der Waals interaction potential is relatively short range
and dies out as 1/r^6. By 8–10 Å, the energy and forces are quite
small. Thus, using cutoffs to bring the van der Waals potential to
zero at about 10 Å can be a reasonable approximation. The Coulombic
interactions, on the other hand, die off as 1/r, so even at
considerable distances the energy of interaction is not negligible.
But this depends on the model: except for a few formally charged
groups, most molecules are composed of neutral fragments with
dipoles and quadrupoles. Thus, in most models the major component
of the electrostatic interaction between molecules or parts of
molecules is a dipole–dipole interaction, which falls off as 1/r^3.
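
A short calculation shows how differently these three distance dependences behave. The sketch compares each interaction at 10 Å with its value at a reference distance of 3 Å; the reference distance is an arbitrary choice for illustration, and the function name is hypothetical.

```python
def relative_strength(r, r_ref, power):
    """Ratio of a 1/r**power interaction at distance r to its value at r_ref."""
    return (r_ref / r) ** power

for label, power in (
    ("van der Waals (1/r^6)", 6),
    ("dipole-dipole (1/r^3)", 3),
    ("Coulomb (1/r)", 1),
):
    # Fraction of the 3 Å interaction that survives at 10 Å.
    print(f"{label}: {relative_strength(10.0, 3.0, power):.6f}")
```

Under these assumptions, the 1/r^6 term at 10 Å retains well under a tenth of a percent of its 3 Å value, the dipole–dipole term a few percent, and the bare Coulomb term about 30 percent, which is why a single short cutoff suits van der Waals terms better than it suits charged groups.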
♦ To specify different methods for treating van der Waals and
Coulombic interactions in the Cerius2•Discover module, select
the Forcefield/Nonbond menu item in the DISCOVER card
and check the Treat VDW and Coulomb Separately check box
in the Discover Non-Bond control panel.
♦ Cerius2•OFF does not allow you to select different cutoffs for
Coulombic and van der Waals interactions if you are using a
non-Ewald method for both. However, you may independently
select an Ewald or non-Ewald method for either. If you do
select Ewald for both, you can independently set the cutoff and
convergence parameters for each.

Note
For models having 2D periodicity (e.g., built using the
Cerius2•Surface Builder) the Ewald method is available for the
Coulombic terms but not for the van der Waals terms.

♦ To specify different methods for treating van der Waals and
Coulombic interactions in the Insight•Discover_3 module, go to
the Specify/Nonbonds parameter block, click More and then
set Define Cutoffs to Separate.
♦ In QUANTA and in Cerius2•MMFF, you can use different
switching functions for the van der Waals and electrostatic
interactions.
Automatic exclusions

Van der Waals and Coulombic interactions are ordinarily calculated
between all atom pairs that are not specifically excluded.
Most forcefields exclude nonbond terms for atoms connected by
bonds (1–2 interactions) and valence angles (1–3). Some forcefields
also exclude nonbond terms between end atoms in torsion (1–4)
interactions. These interactions are illustrated in Figure 12.

[Figure 12: three diagrams with atoms numbered to show 1–2, 1–3, and 1–4 interactions.]

Figure 12. Types of interactions usually excluded from nonbond
calculations
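
The 1–2, 1–3, and 1–4 pairs can be derived from the bond list alone by counting bonds along the shortest path between two atoms. The sketch below does this with a breadth-first search. It is a generic illustration, not the exclusion logic of any MSI engine; the function name and the zero-based atom indexing are assumptions.

```python
from collections import deque

def topological_pairs(n_atoms, bonds, max_sep=3):
    """Group atom pairs by the number of bonds separating them.

    Returns a dict: key 1 holds 1-2 pairs, key 2 holds 1-3 pairs,
    key 3 holds 1-4 pairs. Separation is the shortest path through bonds.
    """
    adjacency = [[] for _ in range(n_atoms)]
    for i, j in bonds:
        adjacency[i].append(j)
        adjacency[j].append(i)
    pairs = {sep: set() for sep in range(1, max_sep + 1)}
    for start in range(n_atoms):
        dist = {start: 0}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            if dist[u] == max_sep:
                continue  # do not search beyond 1-4 neighbors
            for v in adjacency[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        for v, sep in dist.items():
            if sep >= 1 and start < v:
                pairs[sep].add((start, v))  # store each pair once
    return pairs
```

For a four-atom chain with bonds (0,1), (1,2), (2,3), the only 1–4 pair is (0,3). Because the shortest path is used, atoms in small rings that are related in more than one way are classified by their closest relationship.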

1–4 interactions and AMBER

If 1–4 nonbond interactions in torsions are included in the nonbond
list, they may be scaled. For example, with the AMBER forcefield
(as implemented in both Cerius2•OFF and Discover) these
nonbond interactions must be scaled by 0.50.
Scaling by 0.5 occurs by default with Cerius2 when AMBER is
loaded.
In the Insight•Discover module, you need to toggle the p1_4
parameter in the Parameters/Scale_Terms parameter block on and
enter 0.5 in the p14 parameter box.
The standalone version of the FDiscover program handles this
scaling with the following DSL command:

> scale 1-4 by 0.5


Important
This term is not set by default in Discover, even for the AMBER
forcefield, so you must remember to set the p1_4 parameter
either the first time that you run the Discover program from
Insight when using AMBER or in your command input file for
each standalone job that uses AMBER.

The equivalent BTCL command for the CDiscover program is
forcefield scale with the vdw_1_4 keyword.

Combination rules for van der Waals terms


van der Waals radius Any van der Waals interaction parameters that are actually
combination rules defined for heterogenous atom pairs are called off-diagonal parame-
ters. Off-diagonal parameters that are not available for such atom
pairs are calculated by averaging those for each of the two atom
types, using a geometric, arithmetic, or (in CDiscover and
Cerius•OFF) 6th-power combination rule:

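geometric:
$$A_{ij} = \sqrt{A_{ii} A_{jj}} \qquad B_{ij} = \sqrt{B_{ii} B_{jj}}$$
$$r_{ij} = \sqrt{r_{ii} r_{jj}} \qquad \varepsilon_{ij} = \sqrt{\varepsilon_{ii} \varepsilon_{jj}}$$
Eq. 44

arithmetic:
$$\varepsilon_{ij} = \sqrt{\varepsilon_{ii} \varepsilon_{jj}} \qquad r^{*}_{ij} = \frac{r^{*}_{ii} + r^{*}_{jj}}{2}$$
Eq. 45

sixth-power:
$$\varepsilon_{ij} = \frac{2 \sqrt{\varepsilon_{ii} \varepsilon_{jj}} \, (r^{*}_{ii})^{3} (r^{*}_{jj})^{3}}{(r^{*}_{ii})^{6} + (r^{*}_{jj})^{6}} \qquad r^{*}_{ij} = \left( \frac{(r^{*}_{ii})^{6} + (r^{*}_{jj})^{6}}{2} \right)^{1/6}$$
Eq. 46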

Availability
In Discover and Cerius2•OFF, a choice of combination rule is available
and is specified in the forcefield file (see the File Formats
documentation).
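The three combination rules of Eqs. 44–46 can be sketched in a few
lines of Python for the radius/well-depth form of the parameters. The
function names and the sample values are illustrative assumptions;
they are not taken from any forcefield file.

```python
import math


def geometric(r_ii, r_jj, eps_ii, eps_jj):
    """Geometric mean for both the radius and the well depth (Eq. 44)."""
    return math.sqrt(r_ii * r_jj), math.sqrt(eps_ii * eps_jj)


def arithmetic(r_ii, r_jj, eps_ii, eps_jj):
    """Arithmetic mean radius, geometric mean well depth (Eq. 45)."""
    return 0.5 * (r_ii + r_jj), math.sqrt(eps_ii * eps_jj)


def sixth_power(r_ii, r_jj, eps_ii, eps_jj):
    """Sixth-power rule for the radius and the well depth (Eq. 46)."""
    r6_sum = r_ii ** 6 + r_jj ** 6
    r_ij = (r6_sum / 2.0) ** (1.0 / 6.0)
    eps_ij = 2.0 * math.sqrt(eps_ii * eps_jj) * r_ii ** 3 * r_jj ** 3 / r6_sum
    return r_ij, eps_ij


# Hypothetical diagonal parameters (radius in Å, well depth in kcal/mol)
for rule in (geometric, arithmetic, sixth_power):
    r_ij, eps_ij = rule(3.0, 4.0, 0.10, 0.20)
    print(f"{rule.__name__:12s} r_ij = {r_ij:.4f}  eps_ij = {eps_ij:.4f}")
```

For two identical atom types, all three rules return the diagonal
values unchanged; they differ only for unlike pairs.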
Quality of results
The arithmetic mean gives marginally better equilibrium distances for
van der Waals interactions than the geometric combination rule
(Halgren 1992). The 6th-power rule (not available with all
forcefields) yields even better results (Waldman and Hagler 1993).
van der Waals combination rules and Ewald sums
With the Ewald method (Karasawa & Goddard 1989) (Ewald sums for
periodic systems), the geometric mean leads to faster convergence than
the arithmetic mean.
In addition, because the Ewald sum calculation proceeds much
faster when only diagonal parameters are used, the Cerius2•OFF
Van Der Waals Preferences control panel includes an option to
ignore off-diagonals even when they are present (they are not
present in any of the Discover forcefields).

The dielectric constant and the Coulombic term


Role of the dielectric constant in modeling
The electrostatic potential is computed from the partial atomic
charges associated with the model (Assigning charges). Approximate
solvent-screening effects can be included by specifying a nondefault
value for the dielectric constant ε if it is explicitly included in
the forcefield. (The “dielectric constant” used in modeling is not the
dielectric constant that most experimental chemists would think of; it
is instead an empirical, dimensionless scaling factor.)
The dielectric constant reflects the polarizability of the solvent
molecules. A polarizable solvent such as water has a greater
dielectric constant than less polar liquids. Electrostatic interactions
in polarizable solvents with high dielectric constants are greatly
attenuated. In closely packed molecules, however, there are fewer
solvent molecules to screen the charge interactions.
A relatively large dielectric constant can be used for simulating the
aqueous environment of small systems. However, many calculations on
models use a smaller dielectric constant. For example, a dielectric
constant between 2.0 and 10.0 has been used for simulations in the
interior of a protein. A typical value for water would be around 80.
Additional information
For a helpful review, please see Harvey (1989). A tutorial on
dielectric constants in forcefields can be found at MSI’s website:
http://www.msi.com/support/insight/insight/dielectric.html


A distance-dependent dielectric “constant”
The dielectric constant can be kept constant, or the Coulombic term
can be made a shielded function, where the dielectric “constant” is a
function of distance (r·ε). This is useful for electrostatic
interactions in closely packed molecules, where the number of solvent
molecules between two interacting charges is usually fewer than in
bulk solvent. A distance-dependent dielectric constant is also useful
for models in which explicit solvent molecules are not included.
The distance-dependent dielectric function (also called a shielded
dielectric) is generally used with the Dreiding and AMBER force-
fields and may be used with others.
A shielded Coulombic term is faster to calculate than a non-
shielded term because no square root has to be evaluated.
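The point about the square root can be seen in a minimal Python
sketch. It assumes the conventional conversion factor of about 332.06
kcal mol-1 Å per squared unit charge and hypothetical function names;
it is not code from any MSI engine.

```python
import math

# Assumed conversion factor: kcal mol^-1 * Å / e^2 (conventional value)
COULOMB_K = 332.06


def coulomb_constant_eps(qi, qj, dx, dy, dz, eps=1.0):
    """Non-shielded term, E = K*qi*qj/(eps*r): needs r, hence a sqrt."""
    r2 = dx * dx + dy * dy + dz * dz
    r = math.sqrt(r2)
    return COULOMB_K * qi * qj / (eps * r)


def coulomb_distance_dependent(qi, qj, dx, dy, dz, eps=1.0):
    """Shielded term with dielectric eps*r, E = K*qi*qj/(eps*r^2):
    only the squared distance is needed, so no sqrt is evaluated."""
    r2 = dx * dx + dy * dy + dz * dz
    return COULOMB_K * qi * qj / (eps * r2)


# Two unit charges 10 Å apart along x
e_plain = coulomb_constant_eps(1.0, 1.0, 10.0, 0.0, 0.0)
e_shielded = coulomb_distance_dependent(1.0, 1.0, 10.0, 0.0, 0.0)
```

At 10 Å the shielded energy is one tenth of the non-shielded value,
which shows how strongly the r-dependent dielectric damps long-range
electrostatics.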

Important
A distance-dependent dielectric constant cannot be used on a
periodic model with the Ewald sum method (Ewald sums for
periodic systems).
Availability
The dielectric constant can be changed and/or made distance dependent
in any of MSI’s simulation engines.
In Cerius2•FFE, the Coulombic control panel allows you to choose
the form of the Coulombic term—the term can be distance-depen-
dent, not distance-dependent, or corrected by an erfc term (see
Glass forcefield).
Special considerations for the AMBER forcefield
With the AMBER forcefield, in most applications a distance-dependent
dielectric (ε = f (r)) should be used.
In Cerius2•OFF, the Epsilon value is 1.0 unless you change it in the
Coulomb Preferences control panel (which is accessed by selecting
the Energy Terms/Coulomb card menu item).
In the Insight•Discover_3 module, in the Specify/Nonbonds
parameter block you would click More, then set Dielectric to
Dist_Dependent and enter 4.0 for the Dielectric Value.
The equivalent BTCL command for the CDiscover program is
forcefield with the distance_dep keyword set to true and dielect
set to 4.0.

In the Insight•Discover module, in the Parameters/Set parameter block
you would enter 4.0 in the Dielectric parameter box and make sure the
Dist_Dependent parameter is toggled on.
The standalone version of the FDiscover program handles this
with the following DSL command:

> set dielectric = 4.0*r

Nonbond cutoffs
Why read this section
An energy expression such as Eq. 8, which is representative of current
forcefields, is computationally tractable only for systems with
relatively small numbers of atoms. The number of internal coordinates
grows linearly with the size of a model, so the computational work
involved in the first nine terms in Eq. 8 also grows linearly.
However, inspection of the final summation, which represents the
nonbond interactions, reveals a quadratic dependence on the number of
atoms in the system: If the system of interest has 1000 atoms, the
nonbond summation has about 500,000 terms. If it has 10,000 atoms, the
summation has about 50,000,000!
Therefore, it is common to neglect or approximate the nonbond
interactions for widely separated pairs of atoms.
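The quadratic growth quoted above is simply the number of distinct
atom pairs, N(N – 1)/2, before any exclusions or cutoffs are applied.
A short Python check (the function name is hypothetical):

```python
def nonbond_pair_count(n_atoms):
    """Number of distinct atom pairs, N*(N-1)/2, with no exclusions."""
    return n_atoms * (n_atoms - 1) // 2


for n in (1000, 10000):
    print(n, nonbond_pair_count(n))
```

Increasing the number of atoms tenfold increases the number of pairs
roughly a hundredfold.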
Choosing how to treat long-range nonbond interactions is an
important factor in determining the accuracy and the calculation
time of an energy evaluation.
Several cutoff methods are discussed below, and a review was
published by Brooks et al. (1985b). More recently, two other
methods—cell multipoles (Cell multipole method) and Ewald sums
(Ewald sums for periodic systems)—have also become available. You
should read all these sections to decide which method is best for
your model and computational problem.

Note
The same nonbond method(s) and specifications should generally be used
for all energy calculations within a given project. However, the
method and/or specifications used for van der Waals interactions may
differ from those used for Coulombic interactions (see You may use
different methods for van der Waals and electrostatic interactions).

Effect of nonbond cutoff distance on calculation of nonbond
interactions
To appreciate the impact of cutoffs on computational efficiency,
consider a receptor–ligand–solvent system with 5000 total atoms. An
example would be a small protein (100–150 residues) surrounded by 1–2
layers of water.
Figure 13 shows how the number of nonbond interactions
increases with the cutoff distance. This calculation would run at
least 10 times faster with an 8.0 Å cutoff than with no cutoff
(assuming that the nonbond term is rate limiting, which it usually
is). The trade-off is, of course, that interactions beyond the cutoff
distance are not accounted for.
The significance of nonbond interactions beyond the cutoff dis-
tance depends on the system being simulated. In modeling an iso-
lated molecule or cluster, the use of cutoffs for the van der Waals
interactions is quite reasonable. The potential is relatively short
range and dies out as 1/r^6. Consequently, by 8–10 Å, both the
energy and forces are quite small.
In contrast, the situation is slightly different in modeling disor-
dered or ordered crystalline systems. For a typical disordered sys-
tem, which might consist of a cube of organic material with ~25 Å
edges, contributions of the van der Waals interactions at distances
greater than 8–10 Å to the energy and pressure can amount to ~50–
200 kcal mol-1 and ~500–1000 bar (0.05–0.10 GPa), respectively,
while contributions of electrostatic interactions are much smaller.
Also, contributions of remote nonbond interactions of all types to
the resultant force on atoms are small. Fortunately, it is possible in
such systems to apply tail corrections, which permit the use of 8–
10 Å cutoffs while simultaneously yielding accurate values of
energy and pressure.
Finally, in periodic crystalline systems, both van der Waals interac-
tions and electrostatic interactions can be significant up to 15 Å or
more. For example, in a calculation of the energy as a function of
cutoff distance in the [Ala–Pro–D-Phe]2 crystal, Kitson and Hagler
showed that the nonbond energy accounted for changes from 63%
to 97% of the asymptotic value as the cutoff distance was increased
from 8 to 15 Å (Kitson and Hagler 1988).

(Plot not reproduced. Horizontal axis: cutoff distance (Å), from 6 to
14 and “none”; vertical axis: nonbond interactions (10^6), from 0 to
15.)
Figure 13. Number of nonbond interactions as a function of cutoff
distance
The number of nonbond pairwise interactions (in millions) expected for a
5000-atom system as a function of cutoff distance. The time required to evalu-
ate the total energy of this system is approximately proportional to the number
of nonbond interactions.

Figure 14 shows how the van der Waals component of the non-
bond energy varies as a function of cutoff distance for an [Ala–
Pro–D-Phe]2 crystal. The van der Waals energy changes by 40% as
the cutoff distance is increased from 8 to 15 Å. The exact depen-
dence of the energy on the cutoff distance depends on the system
itself and should be calibrated for each new system.

Atom-based cutoffs and nonbond cutoff terms


Direct cutoff not recommended
MSI applications make several methods available for calculation of
long-range nonbond interactions. Cerius2•OFF offers (among others) the
direct method, which is straightforward and can be applied to
nonperiodic and periodic models. Nonbond interactions are simply
calculated to a cutoff distance and interactions beyond this distance
are ignored.

(Plot not reproduced. Horizontal axis: cutoff distance (Å), from 0 to
40; vertical axis: van der Waals energy (kcal mol-1), from –60 to
–30.)
Figure 14. Van der Waals energy as a function of cutoff distance
The van der Waals energy for the hexapeptide crystal, [Ala–Pro–D-Phe]2 as a
function of cutoff distance. Note that the van der Waals energy does not con-
verge until approximately 20 Å. Simulation done with FDiscover.

However, the direct method can lead to discontinuities in the energy
and its derivatives. As an atom pair distance moves in and out of the
cutoff range between calculation steps, the energy jumps, since the
nonbond energy for that atom pair is included in one step and excluded
from the next. (For small models you may, of course, calculate all
nonbond interactions by setting a large enough cutoff distance and
using the direct method.)
Minimizing discontinuities in the potential energy surface
To avoid the discontinuities caused by direct cutoffs, most simulation
engines use some kind of switching function (Figure 15) to smoothly
turn off nonbond interactions over a range of distances. (The variable
names and definitions differ among MSI simulation engines, as
illustrated in the figure).
(Plot not reproduced. Axes: energy (kcal mol-1) and switching function
S(r), from 0.0 to 1.0, against distance (r). Curves: nonbond potential
E(r), switching function S(r), and E(r) · S(r). Cutoff variable names
shown for each engine: CDiscover: cutoff, spline width, buffer width;
FDiscover: CUTDIS, SWTDIS, CUTOFF; Cerius2•OFF: spline-on distance,
spline-off distance, buffer width; CHARMm: ron, roff, rcutoff.)
Figure 15. Application of a switching function
Application of a switching function; energy = E(r) · S(r). Variable
names in MSI simulation engines that relate to cutoffs are also
illustrated. Thick dark curve = the unmodified van der Waals
potential; dashed curve = the switching function S(r); grey curve =
the resulting, switched potential.

Implementation in Cerius2•OFF
Cerius2•OFF allows you to use a cubic spline switching method, by
which the energy is multiplied by a spline function. The interaction
cutoff is defined by two parameters: the spline-on and the spline-off
distances (Figure 15). Within the spline-on/spline-off range, the
nonbond interaction energy is attenuated by the spline function.
Beyond the spline-off distance, nonbond interactions are ignored. The
current defaults in the Open Force Field module set a narrow on/off
range that results in fast calculations. Using a broader spline range
gives more accurate results, but slower calculation.

OFF switching function:
$$S(r_{ij}) = \begin{cases} 1.0 & r_{ij}^{2} < r_{on}^{2} \\ \dfrac{(r_{off}^{2} - r_{ij}^{2})^{2} \, (r_{off}^{2} + 2 r_{ij}^{2} - 3 r_{on}^{2})}{(r_{off}^{2} - r_{on}^{2})^{3}} & r_{on}^{2} < r_{ij}^{2} < r_{off}^{2} \\ 0.0 & r_{ij}^{2} > r_{off}^{2} \end{cases}$$
Eq. 47

Implementation in Discover
In the Discover program the switching function is defined by two
variables: the point where the function reaches zero and the range
over which the function decreases from one to zero (Figure 15).
The Discover program uses a fifth-order polynomial for the
switching function. It is formulated so that the first and second
derivatives are zero at both the inner and outer ends of the switch-
ing region. Thus, the interaction energy and its first and second
derivatives are continuous, although higher derivatives are not.
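The two kinds of switching function described so far can be compared
with a short Python sketch. The first function follows Eq. 47 as
printed. The second is only one possible fifth-order polynomial whose
first and second derivatives vanish at both ends of the switching
region; the exact polynomial used by Discover is not given here, so
its form and both function names are assumptions.

```python
def off_spline_switch(r, r_on, r_off):
    """Spline switching function in squared distances (Eq. 47)."""
    r2, on2, off2 = r * r, r_on * r_on, r_off * r_off
    if r2 <= on2:
        return 1.0
    if r2 >= off2:
        return 0.0
    return (off2 - r2) ** 2 * (off2 + 2.0 * r2 - 3.0 * on2) / (off2 - on2) ** 3


def quintic_switch(r, r_on, r_off):
    """Assumed fifth-order switch: S = 1 - 10x^3 + 15x^4 - 6x^5, where
    x runs from 0 at r_on to 1 at r_off. S' and S'' are zero at both
    ends, so the switched energy keeps continuous second derivatives."""
    if r <= r_on:
        return 1.0
    if r >= r_off:
        return 0.0
    x = (r - r_on) / (r_off - r_on)
    return 1.0 - x ** 3 * (10.0 - 15.0 * x + 6.0 * x * x)


# Hypothetical switching region from 8.0 to 9.5 Å
for r in (7.5, 8.0, 8.5, 9.0, 9.5, 10.0):
    print(r, off_spline_switch(r, 8.0, 9.5), quintic_switch(r, 8.0, 9.5))
```

Either value would multiply the unswitched pair energy E(r), as in
Figure 15.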
Implementation in CHARMm
CHARMm and Cerius2•MMFF offer two types of switching functions, which
their documentation refers to as a switching function (not the same as
the Discover switching function) and a shifted potential.
Two variables corresponding to the ends of the switching regions (ron
and roff, Figure 15) are required to define the switching function.
Depending on the value of rij, the following values are used to
multiply individual electrostatic or van der Waals energy terms:

CHARMm switching function:
$$S(r_{ij}) = \begin{cases} 1.0 & r_{ij} < r_{on} \\ \dfrac{(r_{off} - r_{ij})^{2} \, (r_{off} + 2 r_{ij} - 3 r_{on})}{(r_{off} - r_{on})^{3}} & r_{on} < r_{ij} < r_{off} \\ 0.0 & r_{ij} > r_{off} \end{cases}$$
Eq. 48

However, energy can be significant at the cutoff distance, which can
result in artificially large forces at long ranges. This is especially
true for relatively short (that is, less than 12 Å) cutoff distances
and small ranges (that is, when roff – ron < 3 Å).
The shifted potential modifies the radial function so that energy and
forces go to zero at some cutoff distance (Eq. 49). The individual
electrostatic and van der Waals terms in the energy function are
simply multiplied by this term:

$$\left[ 1 - \left( \frac{r_{ij}}{r_{cutoff}} \right)^{2} \right]^{2}$$
Eq. 49

One disadvantage to the CHARMm functions is a discontinuity in the
second derivatives at the cutoff distance.
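A minimal Python sketch of the two CHARMm-style multipliers, written
directly from Eqs. 48 and 49 as printed in this section. The function
names and the sample distances are hypothetical.

```python
def charmm_switch(r, r_on, r_off):
    """Switching multiplier of Eq. 48: 1 inside r_on, 0 beyond r_off."""
    if r <= r_on:
        return 1.0
    if r >= r_off:
        return 0.0
    return (r_off - r) ** 2 * (r_off + 2.0 * r - 3.0 * r_on) / (r_off - r_on) ** 3


def charmm_shift(r, r_cutoff):
    """Shift multiplier of Eq. 49: [1 - (r/r_cutoff)^2]^2 inside the
    cutoff, 0 beyond it. It acts at every distance, not only near the
    cutoff."""
    if r >= r_cutoff:
        return 0.0
    x = r / r_cutoff
    return (1.0 - x * x) ** 2


# Hypothetical settings: switch between 8 and 11 Å, shift cutoff 12 Å
for r in (4.0, 8.0, 9.5, 11.0, 12.0):
    print(r, charmm_switch(r, 8.0, 11.0), charmm_shift(r, 12.0))
```

The switch leaves the energy untouched inside ron, whereas the shift
already reduces it at short range; this is the trade-off between the
two forms.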

Neighbor lists and buffer widths


To maximize the efficiency of nonbond calculations, Cerius2•OFF,
Discover, and CHARMm create a neighbor list that contains all pair
interactions to be considered during calculation of the nonbond
interactions. Atom pairs are not included in the list if they are too
far apart or if they are excluded (Automatic exclusions).
Advantages of a neighbor list
Neighbor list generation was chosen over other approaches using
cutoffs for computational efficiency:
♦ A pairwise search through all atoms at every step of a calcula-
tion is computationally expensive.
♦ During minimization or dynamics, the distances between
atoms do not change radically between one step and the next.
The buffer region
Although a neighbor list requires time to set up, the net result is
time saving for models containing more than about 50 atoms,
because the list is not recalculated each time the energy expression
is evaluated. Since the list is not updated at every step, it includes
atoms in a buffer region (the distance between the two right-most
lines in Figure 15) that might move close enough together to con-
tribute to the energy calculation before the next update of the
neighbor list.
Updating the neighbor list
To ensure that no atoms outside the buffer region can move close
enough to interact during an energy minimization or molecular
dynamics simulation, the nonbond list is automatically updated
whenever any atom moves more than one-half the buffer width.
Thus, the width of the buffer region, coupled with the velocity
with which atoms move, determines the maximum amount of
time before the neighbor list is updated.
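The neighbor-list logic described above can be sketched in Python as
follows. This is a simplified illustration with hypothetical function
names: it uses a plain all-pairs search for a nonperiodic model,
ignores exclusions, and is not the algorithm of any particular MSI
engine.

```python
def build_neighbor_list(coords, cutoff, buffer_width):
    """List all pairs (i, j), i < j, closer than cutoff + buffer_width."""
    reach2 = (cutoff + buffer_width) ** 2
    pairs = []
    for i in range(len(coords) - 1):
        xi, yi, zi = coords[i]
        for j in range(i + 1, len(coords)):
            xj, yj, zj = coords[j]
            d2 = (xi - xj) ** 2 + (yi - yj) ** 2 + (zi - zj) ** 2
            if d2 <= reach2:
                pairs.append((i, j))
    return pairs


def needs_update(coords, coords_at_build, buffer_width):
    """True once any atom has moved more than half the buffer width
    since the list was built."""
    limit2 = (0.5 * buffer_width) ** 2
    for (x, y, z), (x0, y0, z0) in zip(coords, coords_at_build):
        if (x - x0) ** 2 + (y - y0) ** 2 + (z - z0) ** 2 > limit2:
            return True
    return False


# Hypothetical three-atom model, 8 Å cutoff, 2 Å buffer
start = [(0.0, 0.0, 0.0), (9.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
pairs = build_neighbor_list(start, cutoff=8.0, buffer_width=2.0)
moved = [(0.0, 0.0, 0.0), (10.5, 0.0, 0.0), (20.0, 0.0, 0.0)]
rebuild = needs_update(moved, start, buffer_width=2.0)
```

The half-width criterion works because two atoms approaching each
other can each cover at most half the buffer before the list is
rebuilt, so no unlisted pair can come within the cutoff in between.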


Charge groups and group-based cutoffs


Dipoles must not be split by cutoff distances
To understand the implications of the generalization that the
strongest electrostatic interactions in many molecules are due to
dipoles rather than fully charged groups (see You may use different
methods for van der Waals and electrostatic interactions), note that
the interaction energy for two monopoles, each of one e.u. of charge,
is about 33 kcal mol-1 at 10 Å, while that for two dipoles formed from
unit monopoles is no more than about 0.3 kcal mol-1. It is clear that
ignoring monopole–monopole interactions would give grossly misleading
results, whereas ignoring dipole–dipole interactions would be only a
modest approximation.
If nonbond cutoffs were applied to such a model on an atom-by-
atom basis, this could generate spurious monopoles by artificially
splitting dipoles (by having one of a dipole’s atoms inside the cut-
off and one outside). Instead of ignoring a relatively small dipole–
dipole interaction, this would artificially introduce a large mono-
pole–monopole interaction. To avoid these artifacts, the Discover
and CHARMm simulation engines can apply cutoffs over charge
groups. (In CHARMm in PSF mode, every residue is a charge
group.)
Functional groups and charge groups
A charge group is a small group of atoms close to one another which
has a net charge of zero or almost zero. Often, charge groups are
identical to common chemical functional groups. Thus, a carbonyl
group, methyl group, or carboxylic group would be an approximately
neutral charge group.
Implementation in Discover: switching atoms
The Discover program designates one atom from each charge group as the
switching atom and generates the neighbor list by considering the
distance between the switching atoms of two charge
groups. If the distance is less than the cutoff distance, then the pair-
wise interactions between all atoms in the two groups are
included. If the distance is greater than the cutoff, they are all
excluded. Similarly, when calculating the actual interaction
energy, the Discover program switches off the interactions
between all atom–atom pairs in the two charge groups based only
on the distance between the two switching atoms. This procedure
prevents artifactual splitting of dipoles.
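A group-based pair list of this kind can be sketched in Python as
follows. The data layout (a dictionary per group holding a switching
atom and its member atoms) and the function name are hypothetical and
are not the Discover data structures.

```python
import math


def group_pair_list(groups, coords, cutoff):
    """Return atom pairs between charge groups whose switching atoms
    are closer than the cutoff. All atoms of two groups are included
    or excluded together, so no dipole is split."""
    pairs = []
    for a in range(len(groups) - 1):
        for b in range(a + 1, len(groups)):
            sa = coords[groups[a]["switch"]]
            sb = coords[groups[b]["switch"]]
            if math.dist(sa, sb) < cutoff:
                pairs.extend(
                    (i, j)
                    for i in groups[a]["atoms"]
                    for j in groups[b]["atoms"]
                )
    return pairs


# Two hypothetical two-atom groups; atoms 0 and 2 are switching atoms
coords = [(0.0, 0.0, 0.0), (1.2, 0.0, 0.0), (7.5, 0.0, 0.0), (8.7, 0.0, 0.0)]
groups = [
    {"switch": 0, "atoms": [0, 1]},
    {"switch": 2, "atoms": [2, 3]},
]
pairs = group_pair_list(groups, coords, cutoff=8.0)
```

With an 8 Å cutoff, the pair (0, 3) is kept even though those two
atoms are 8.7 Å apart, because the decision is made once per group
pair from the 7.5 Å switching-atom distance.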
Implementation in CHARMm
If group-based cutoffs are used in CHARMm, the neighbor list is stored
in terms of group pairs.


Charge group size and the cutoff distance
The size of a charge group, as defined by the greatest distance from
the switching atom to another atom in the same group, must be
significantly smaller than the cutoff distances. Otherwise, an
interaction between two atoms close to each other might be ignored
because the switching atoms of the two groups are farther apart
than the cutoffs. Typical groups are no more than 1–3 Å large, so
cutoffs larger than 7–8 Å are reasonable. However, some models
contain considerably larger groups.
The Discover program checks the size of the groups against the
cutoff distances, then outputs an error message and terminates if
the cutoffs are too short relative to the group size. If this happens,
you must either increase the cutoffs or define smaller groups.
Charge group neutrality…
The Discover program also warns you about significantly non-neutral
groups. Some can be expected if the model contains formally charged
functional groups, such as protonated amines and carboxylates.
However, other non-neutral groups usually indicate an error in group
definitions.
… and defining or checking charge groups
In Cerius2•Discover, you can specify the tolerance with which
neutrality is defined when you ask Discover to perform charge
grouping.
In Insight, charge groups and switching atoms are defined, edited,
and checked with the Forcefield/Groups parameter block, which
is found in the Builder, Biopolymer, and other modules. Potentials
and charges for the atoms must be fixed or accepted before defining
charge groups.
In CHARMm, you can edit charge groups in the RTF files with any
text editor. In PSF mode, every residue is a charge group.

Double cutoffs
The FDiscover program also incorporates an improvement over a
single cutoff distance called double cutoffs, or, as it is sometimes
called in dynamics, multiple timesteps. The nonbond interaction
potential at a distance is a smooth function that does not vary rap-
idly.
With double cutoffs, two cutoff distances—an inner and outer
one—are assigned. The two distances define an inner spherical
region and an outer shell around a given atom.

Whenever the neighbor list is updated, interactions are calculated in
exactly the same way as in the single cutoff scheme, but considering
the outer cutoff as the cutoff. Then the resulting interaction
energy is partitioned into contributions from atoms within the
inner cutoff and atoms within a spherical shell from the inner to
the outer cutoff. The contribution from the shell is treated as a con-
stant in subsequent molecular dynamics steps, until the next
neighbor list update occurs. The basic idea behind double cutoffs
is to calculate the shell contribution only when the neighbor list
needs to be updated, while calculating inner cutoff contributions
at every step.
These double cutoffs make the calculation less expensive by allow-
ing smaller values for the inner cutoff than could normally be used
with a single cutoff. Accuracy is regained at minimal cost by using
a large distance for the outer cutoff.
Discontinuities in the potential energy surface
It is important to realize that the effective potential energy surface
is not quite continuous when double cutoffs are used. The magnitude of
the discontinuities depends on the cutoff distances and the system
that is being studied.
These discontinuities are only a minor problem for dynamics,
where they are manifested as small fluctuations in the total energy.
Their effect during minimization depends on the minimizer that is
used, because some minimization algorithms, such as conjugate
gradient, are quite sensitive to discontinuities in the surface. Other
algorithms, such as steepest descents, are relatively robust.

Tail corrections
Long-range van der Waals interactions in disordered periodic systems
For disordered periodic systems, contributions to the potential energy
and pressure from van der Waals interactions outside the cutoff can be
written as:

$$\Delta U_{tail} = \frac{1}{2} \sum_{\alpha=1}^{\nu} N_{\alpha} \sum_{\beta=1}^{\nu} \rho_{\beta} \, 4\pi \int_{r_c}^{\infty} r^{2} \, g_{\alpha\beta}(r) \, U_{\alpha\beta}(r) \, dr$$
Eq. 50

$$\Delta P_{tail} = \frac{1}{6} \sum_{\alpha=1}^{\nu} \sum_{\beta=1}^{\nu} \rho_{\alpha} \rho_{\beta} \, 4\pi \int_{r_c}^{\infty} r^{2} \, g_{\alpha\beta}(r) \, r \, \frac{dU_{\alpha\beta}(r)}{dr} \, dr$$
Eq. 51

where Ni and ρi denote the number and number density of atoms of type
i, Uαβ (r) denotes the van der Waals nonbond potential describing
interactions between atoms of type α and β, and gαβ (r) denotes the
pair correlation function describing the probability of finding α and
β at separation r relative to the probability of finding the pair at
an infinite distance (McQuarrie, 1976, Chapter 13).
Except in rare cases, the function gαβ (r) is short range, reaching its
limiting value of unity at distances of ~10 Å. Moreover, gαβ (r) –
1.0 is small even at shorter distances. In consequence, accurate esti-
mates of the tail corrections for all normal nonbond cutoff values
may be safely made by setting all gαβ (r) = 1.0 in Eq. 50 and Eq. 51.
Computational costs
Note also that applying Eqs. 50 and 51 at each step in a simulation
contributes negligibly to the overall simulation cost, since for con-
stant-volume simulations the full correction may be precomputed,
and in simulations where the volume fluctuates it is necessary
only to recompute the volume at each step.
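For a 12–6 van der Waals potential written as U(r) = A/r^12 – B/r^6
and with every pair correlation function set to 1.0, the integral in
Eq. 50 has the closed form A/(9 rc^9) – B/(3 rc^3). The Python sketch
below applies this; the choice of the 12–6 form, the function name,
and the sample numbers are assumptions for illustration only.

```python
import math


def lj_tail_energy(counts, volume, A, B, r_cut):
    """Energy tail correction of Eq. 50 with g(r) = 1 for a 12-6
    potential U = A/r^12 - B/r^6.

    counts : dict mapping atom type -> number of atoms of that type
    volume : cell volume (Å^3)
    A, B   : dicts mapping (type_a, type_b) -> pair coefficients
    """
    total = 0.0
    for a, n_a in counts.items():
        for b, n_b in counts.items():
            rho_b = n_b / volume
            # Integral of r^2 * U(r) from r_cut to infinity
            integral = A[(a, b)] / (9.0 * r_cut ** 9) - B[(a, b)] / (3.0 * r_cut ** 3)
            total += 0.5 * n_a * rho_b * 4.0 * math.pi * integral
    return total


# Hypothetical one-component system: 1000 atoms in a 25 Å cube,
# dispersion term only
counts = {"X": 1000}
A = {("X", "X"): 0.0}
B = {("X", "X"): 1000.0}
du = lj_tail_energy(counts, 25.0 ** 3, A, B, r_cut=10.0)
```

Because the result depends on the configuration only through the
volume, it can be precomputed for constant-volume runs and simply
rescaled when the volume changes.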

Cell-based cutoffs
CDiscover allows cell-based cutoffs for periodic systems. This is
another image-based method, in which the neighbor list is based
on a specified number of cell layers surrounding the central cell.

Cell multipole method


More rigorous, controllable, and efficient than cutoffs
The cell multipole method (CMM, available only in CDiscover) provides
a treatment of the nonbond interactions for both nonperiodic and
periodic systems that is more rigorous and efficient
than cutoffs. This method (Greengard and Rokhlin 1987, Schmidt
and Lee 1991, Ding et al. 1992) is a hierarchical approach that
allows the accuracy of the nonbond calculation to be controlled.
Short-range interactions are treated in the usual way, but long-
range group–group interactions are treated in terms of multipoles.
Computational time scales as N (the number of atoms).

The cell multipole method applies to the general energy expression of
the following form:

$$E = \sum_{i=1}^{N} \lambda_{i} \Phi_{i} = \sum_{i>j} \frac{\lambda_{i} \lambda_{j}}{R_{ij}^{p}}$$
Eq. 52

where Φi is the potential at atom i, Rij is the distance between atom
i and atom j, p is a number (p = 1 for Coulombic and 6 for London
dispersion interactions, for example), and the λ’s are general
charges. For Coulombic interactions, the λ’s are real charges.
Near- and far-field potentials
The general potential Φi may be divided into a near-field potential
due to the surrounding atoms (those within a few angstroms) and a
far-field potential due to the rest of the atoms that interact with
the ith atom.
The number of interactions in the near field is limited, so it is
relatively easy to calculate the near-field potential exactly. The
number of interactions in the far field is of order N^2, making an
exact calculation of this potential intractable for large models. The
cell multipole method calculates the far-field potential accurately
and efficiently in the following manner.
Derivation of cell multipole method
We begin by placing an arbitrarily shaped molecule in a rectangular
box. The box is then cubed into a number of basic cells of length
4–6 Å and containing 2–4 atoms on average. The basic cell level is
denoted level A in Figure 16. Starting from a corner of the box,
every eight basic cells may be considered to constitute a larger, par-
ent cell, termed level B. Every eight parent cells may constitute a
grandparent cell, termed level C. This procedure is repeated until
only a few large cells fill the box. For example, considering any
atom in cell A0 of the three-level cell system (Figure 16) the other
atoms in A0 and all atoms in An contribute to the near-field poten-
tial, and the atoms in Af, B, and C contribute to the far-field poten-
tial.
(Diagram not reproduced. It shows basic cell A0 at the center,
surrounded by one layer of near-field cells An, then far-field basic
cells Af, then larger level-B cells, and finally level-C cells.)
Figure 16. Three-level hierarchical cell system
Definition of hierarchical cells and division of near field and far
field for a basic cell A0. Larger cells are formed as cells are
farther from cell A0 (this constitutes the hierarchy). Note that the
near field is one layer thick.

Key steps used in cell multipole method
The cell multipole method involves the following two key steps:
1. Multipole expansion and calculation of general multipole
moments.
The potential associated with each basic cell can be represented as a
general potential originating at the center of the cell. This
potential may be expanded into an infinite series of multipole
moments. For example, the potential associated with cell Af in
Figure 16, centered at $r_{A_f}$, is expressed as:

$$\Phi_{A_f}(r) = \frac{Z}{R^{p}} - \frac{\sum_{\alpha} D_{\alpha} R_{\alpha}}{R^{p+2}} + \frac{\sum_{\alpha\beta} Q_{\alpha\beta} R_{\alpha} R_{\beta}}{R^{p+4}} - \ldots$$
Eq. 53

where $R = r_{A_f} - r$; r is any point outside cell Af; α, β = x, y,
z; and Z, D, and Q are monopoles, dipoles, and quadrupoles,
respectively.
The potentials associated with the higher-level cells can be
expanded in an analogous manner, with moments derived
from lower-level cell moments.
2. Generation of Taylor coefficients.
Using this expansion to represent the potentials associated with
Af-, B-, and C-level cells, the far-field potential of cell A0 may be
obtained by summing all the far-cell contributions. The result-
ing potential may now be expanded as a Taylor series about the
center of cell A0:

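$$\Phi_{A_0}(r) = T^{(0)}(r_{A_0}) + \sum_{\alpha} T^{(1)}_{\alpha}(r_{A_0}) \, \Delta r_{\alpha} + \frac{1}{2} \sum_{\alpha\beta} T^{(2)}_{\alpha\beta}(r_{A_0}) \, \Delta r_{\alpha} \Delta r_{\beta} + \ldots$$
Eq. 54

where $r_{A_0}$ is the position vector of the center of cell A0 and
$\Delta r_{\alpha} = r_{\alpha} - r_{A_0 \alpha}$. The Taylor
coefficients in Eq. 54 are due to all the far-cell contributions.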
A key point of the cell multipole method is that, once the set of
Taylor coefficients is calculated at $r_{A_0}$, the far-field
potential of any atom in cell A0 is obtained easily through Eq. 54.
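The multipole step (Eq. 53) can be illustrated for the Coulombic case
(p = 1) with a small Python sketch that compares the exact potential
of a few charges in one cell with the truncated expansion about the
cell center. The quadrupole normalization used here,
Q_αβ = ½ Σ q (3 d_α d_β – d² δ_αβ), is an assumption chosen so that
the series matches the expansion of 1/|R + d|; the function names and
the sample charges are hypothetical.

```python
import math


def exact_potential(charges, center, r):
    """Exact sum of q/|r - (center + d)| over charges given as
    (q, d), where d is the offset of the charge from the cell center."""
    total = 0.0
    for q, d in charges:
        pos = tuple(c + di for c, di in zip(center, d))
        total += q / math.dist(r, pos)
    return total


def multipole_potential(charges, center, r, order=2):
    """Truncated expansion of Eq. 53 for p = 1, with R = center - r.
    order = 0: monopole; 1: adds dipole; 2: adds quadrupole."""
    R = [c - x for c, x in zip(center, r)]
    Rn = math.sqrt(sum(x * x for x in R))
    Z = sum(q for q, _ in charges)
    phi = Z / Rn
    if order >= 1:
        D = [sum(q * d[a] for q, d in charges) for a in range(3)]
        phi -= sum(D[a] * R[a] for a in range(3)) / Rn ** 3
    if order >= 2:
        quad = 0.0
        for a in range(3):
            for b in range(3):
                Q_ab = sum(
                    0.5 * q * (3.0 * d[a] * d[b] - (a == b) * sum(x * x for x in d))
                    for q, d in charges
                )
                quad += Q_ab * R[a] * R[b]
        phi += quad / Rn ** 5
    return phi


# Hypothetical charges inside one basic cell (offsets in Å)
charges = [
    (1.0, (0.5, 0.2, -0.1)),
    (-0.6, (-0.4, 0.3, 0.2)),
    (0.3, (0.1, -0.5, 0.4)),
]
center = (0.0, 0.0, 0.0)
far_point = (12.0, 5.0, -3.0)

exact = exact_potential(charges, center, far_point)
errors = [
    abs(exact - multipole_potential(charges, center, far_point, order=k))
    for k in range(3)
]
```

For this far-field point the error falls as each higher moment is
added, which is why a low-order expansion is adequate for distant
cells while nearby cells are treated exactly.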
Since the Taylor coefficients must be generated for every basic cell,
another key point of the cell multipole method is efficient genera-
tion of these coefficients. A hierarchical procedure is used, in
which coefficients determined for higher-level cells are propa-
gated to the coefficients for lower-level cells. Thus, coefficients for
a child B cell are obtained by adding contributions directly trans-
lated from the C-level coefficients at the center of the parent C cell
to the coefficients at the center of B, generated by considering only
the B-cell contributions.


Improved computational performance and accuracy
The cell multipole method is an order-N method (Greengard and Rokhlin
1987, Schmidt and Lee 1991, Ding et al. 1992). The time savings with
respect to an exact N^2 algorithm, as well as the improved accuracy
relative to using cutoffs, can be dramatic.
Table 12 shows results from several calculations on hemoglobin.
When the conventional method with 9.5-Å cutoffs is used, the
computational and setup times are greatly reduced, but at the cost
of a disturbingly large error (over 1% of the correct energies). The
last 6 lines of the table show results for second-, third-, and fourth-
order multipole expansions at two levels of computational accu-
racy. The short-range treatment becomes progressively better
towards the bottom of the table. However, the overall CPU time
increases. It is practical to achieve essentially exact results (within
a fraction of a kcal mol-1) in reasonable times.
For systems larger than hemoglobin, the improvement in performance
can be even more dramatic. For a system ten times larger,
the cell multipole method would take 3–10 minutes for the energy
evaluation, depending on the accuracy desired. The exact N² calculation,
in contrast, would take about three days!
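These estimates can be checked with a few lines of arithmetic. The sketch below extrapolates the energy-evaluation times from Table 12, assuming pure N² scaling for the exact pairwise calculation and pure order-N scaling for the cell multipole method; the function name is hypothetical.

```python
SECONDS_PER_DAY = 86_400.0

def scaled_time(base_seconds, size_factor, exponent):
    """Extrapolate a timing, assuming the cost grows as N**exponent."""
    return base_seconds * size_factor ** exponent

# Energy-evaluation times for hemoglobin taken from Table 12.
exact_days = scaled_time(2809.0, 10, 2) / SECONDS_PER_DAY   # exact pairwise, N^2
cmm_fast_min = scaled_time(15.1, 10, 1) / 60.0              # coarse multipole, order N
cmm_fine_min = scaled_time(57.7, 10, 1) / 60.0              # fine 4th-order multipole

print(f"exact pairwise: {exact_days:.2f} days")
print(f"cell multipole: {cmm_fast_min:.1f} to {cmm_fine_min:.1f} minutes")
```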

Table 12. Computational efficiency of cell multipole method

The results shown are for a calculation with CDiscover 3.1 on hemoglobin (32250
atoms) on an SGI Crimson (50 MHz MIPS R4000).

                                   time (s)                             error (kcal mol-1) (a)
level of calculation               setup (b)   energy evaluation (c)    van der Waals   Coulomb
exact pairwise calculation         468         2809                     0.00            0.00
9.5-Å cutoff                       12.6        30.2                     1485            1359
coarse (d), 2nd-order multipole    23.9        15.1                     275             -26.0
coarse, 3rd-order multipole        58.5        15.1                     243             -25.0
coarse, 4th-order multipole        199         16.1                     243             -3.50
fine (e), 2nd-order multipole      96.7        57.2                     8.95            -10.6
fine, 3rd-order multipole          219         56.6                     6.95            -2.50
fine, 4th-order multipole          718         57.7                     0.24            -0.18

(a) Relative to the exact pairwise calculation of the energy.
(b) Time required to set up atom lists and multipole expansions (overhead needed at
    the beginning of a calculation and periodically during dynamics or minimization).
(c) Time required for recurring evaluation of energy and gradients during a calculation.
(d) That is, a reasonably accurate, fast calculation with low overhead.
(e) That is, calculation with highest accuracy and greatest overhead.

Nonbond interaction energies
Due to the nature of the cell multipole method, specific nonbond
interaction energies cannot be calculated unless you use the ESFF
forcefield. When this method is used with other forcefields, the
per-atom energy is calculated by using the cell multipole method,
and the nonbond interaction energy is calculated using the group-
based method. You can specify cutoffs for the group-based method
of nonbond analysis. A large cutoff in the group-based method
may give reasonably accurate energies compared with the cell
multipole method.

Ewald sums for periodic systems

Nonbond energies of periodic systems


Mainly for crystals
The Ewald technique (Tosi 1964, Ewald 1921) is a method for computing
the nonbond energies of periodic systems. Crystalline solids
are the most appropriate candidates for Ewald summation,
partly because the error associated with using cutoffs (see Nonbond
cutoffs) is much greater in an infinite system. The technique can
also be applied to amorphous solids and solutions.
Ewald method compared with cutoff-based methods: Coulombic energy…
Figure 17 shows the electrostatic energy for quartz as computed by
various techniques. One would expect all the techniques to
converge to the same value at high cutoff distances. However, the
direct atom-based cutoff approach yields results that fluctuate
wildly as the cutoff increases, even for rather large cutoffs. The
problem is that the sum is only conditionally convergent. As the
cutoff increases, charges of opposite sign are taken into account
and the partial sum is modified significantly. Worse, reordering
the terms of a conditionally convergent series can yield arbitrary
results. The problem then is to find physically and chemically
meaningful orderings of the series.
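The effect of reordering can be seen in a much simpler conditionally convergent sum than a lattice sum. The sketch below uses the alternating harmonic series as a stand-in (it is not the quartz calculation): the same terms, summed in two different orders, approach two different limits. The function names are hypothetical.

```python
import math

def alternating_harmonic(n_terms):
    """1 - 1/2 + 1/3 - 1/4 + ... in its natural order."""
    return sum((-1) ** (k + 1) / k for k in range(1, n_terms + 1))

def rearranged(n_blocks, pos_per_block=2, neg_per_block=1):
    """The same terms, taken as blocks of positive terms followed by negative ones."""
    total = 0.0
    odd, even = 1, 2
    for _ in range(n_blocks):
        for _ in range(pos_per_block):
            total += 1.0 / odd
            odd += 2
        for _ in range(neg_per_block):
            total -= 1.0 / even
            even += 2
    return total

natural = alternating_harmonic(100_000)   # approaches ln 2
reordered = rearranged(100_000)           # approaches 1.5 * ln 2
print(natural, math.log(2.0))
print(reordered, 1.5 * math.log(2.0))
```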
The cell-based (Cell-based cutoffs) and group-based (Charge groups
and group-based cutoffs) cutoff techniques are natural candidates.
However, they yield somewhat different values (Figure 17), due to
the different cutoff conventions employed. The group-based technique
computes the result for a sphere, but the cell-based technique
computes the result for a parallelepiped that preserves the
shape of the unit cell.

[Figure 17: plot of electrostatic energy (kcal mol-1), about -300 to -100, against cutoff (Å), 35 to 60]

Figure 17. Electrostatic energy vs. cutoff distance for quartz
The electrostatic energy of quartz was calculated with CDiscover 3.1 by several
methods. Medium line with points: using atom-based cutoffs; thin dark line: using
cell-based cutoffs; thick line: using group-based cutoffs; same thick line: by the
Ewald method with dipole correction; and medium dashed line: by the Ewald
method without dipole correction.

A standard Ewald calculation that does not take the dipole
moment of the unit cell into account yields yet another value. An
Ewald calculation that includes the effect of the dipole moment
agrees with the group-based calculation (Figure 17).
…and van der Waals energy
For van der Waals energy, the energy sum is absolutely convergent,
and no chaotic behavior arises from the direct approach.
Even so, as Figure 18 indicates, the convergence of the dispersive
energy is slower than might be expected. Even with a cutoff distance
of 30 Å, the error is a significant fraction of 0.1 kcal mol-1.
(The Ewald calculation is less costly for comparable accuracy.) The
repulsive energy, on the other hand, converges at a cutoff distance
of only 15 Å and needs no special treatment. (Atom-based calculations
for much larger systems, however, show that sometimes
even the repulsive energy can exhibit a surprisingly high error at
a cutoff of 12 Å.)

[Figure 18: plot of dispersive energy and repulsive energy (kcal mol-1) against cutoff (Å), 5 to 35]

Figure 18. van der Waals energy vs. cutoff distance for NaCl
The graph shows the (solid lines) dispersive and (dashed line) repulsive portions
of the van der Waals energy as a function of the cutoff distance, as calculated
by the (thin lines) atom-based and (thick line) Ewald methods. The Ewald calcu-
lation was performed with CDiscover 3.1 to an accuracy of 1 e-6, which requires
a cutoff distance of 9.5 Å.

Theory of Ewald technique


For full details on the Ewald summation method and parameter
optimization procedure used in MSI’s simulation engines, please
refer to Karasawa and Goddard (1989).
The Ewald approach to improving convergence is to multiply a
general lattice sum:

$$S_m = \frac{1}{2} \sum_{L,i,j} \frac{A_{ij}}{|\mathbf{r}_i - \mathbf{r}_j - \mathbf{R}_L|^m}$$   Eq. 55

by a convergence function φ(r), which decreases rapidly with r. Of
course, to preserve equality, one must then add a term equal to the
product of 1 - φ(r) with the lattice sum:

$$S_m = \frac{1}{2} \sum_{L,i,j} \frac{A_{ij}\,\phi_m(|\mathbf{r}_i - \mathbf{r}_j - \mathbf{R}_L|)}{|\mathbf{r}_i - \mathbf{r}_j - \mathbf{R}_L|^m} + \frac{1}{2} \sum_{L,i,j} \frac{A_{ij}\,\bigl(1 - \phi_m(|\mathbf{r}_i - \mathbf{r}_j - \mathbf{R}_L|)\bigr)}{|\mathbf{r}_i - \mathbf{r}_j - \mathbf{R}_L|^m}$$   Eq. 56

Here, the first term converges quickly, because φm(r) decreases rap-
idly. Ewald’s insight was that the second term can be Fourier trans-
formed to provide a rapidly converging sum over the reciprocal
lattice. The sum over L in Eq. 56 runs over all lattice vectors, but
the i = j terms must be omitted when L = 0.
The convergence functions
The convergence functions are, for the electrostatic energy:

$$\phi_1(r) = \mathrm{erfc}(\eta r) = 1 - \mathrm{erf}(\eta r) = \frac{2}{\sqrt{\pi}} \int_{\eta r}^{\infty} \exp(-s^2)\,ds$$   Eq. 57

and for the dispersive energy:

$$\phi_6(r) = \left(1 + (\eta r)^2 + \frac{1}{2}(\eta r)^4\right) \exp\left(-(\eta r)^2\right)$$   Eq. 58
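
To get a feel for how quickly these functions fall off, the two convergence functions of Eq. 57 and Eq. 58 can be tabulated directly. This is a minimal sketch; the value of η and the function names are hypothetical and chosen only for illustration.

```python
import math

def phi_1(r, eta):
    """Electrostatic convergence function of Eq. 57."""
    return math.erfc(eta * r)

def phi_6(r, eta):
    """Dispersive convergence function of Eq. 58."""
    x2 = (eta * r) ** 2
    return (1.0 + x2 + 0.5 * x2 * x2) * math.exp(-x2)

eta = 0.3  # hypothetical convergence parameter, in 1/Å
for r in (0.0, 5.0, 10.0, 15.0):
    print(f"r = {r:5.1f} Å   phi_1 = {phi_1(r, eta):.3e}   phi_6 = {phi_6(r, eta):.3e}")
```

Both functions equal 1 at r = 0 and decay faster for larger η, which is why the real-space sum can then be truncated at a shorter distance.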

Optimizing computational effort
The electrostatic convergence function φ1 was also used by Catlow
and Norgett (1976) and Karasawa and Goddard (1989). The dispersive
convergence function φ6 was recommended and used by
Karasawa and Goddard. The convergence parameter η plays a
similar role in both cases. As η increases, the real-space sum
converges more rapidly and the reciprocal-space sum converges more
slowly. (That is, a large η implies a heavy computational load for
reciprocal space, and a small η implies a heavy computational load
for real space.) Cutoffs must be adjusted accordingly, and processing
time is affected by the cutoffs. A value of η that balances
processing in the real and reciprocal spaces proves to be optimal. The
same value of η can be used for both the dispersive and electrostatic
energy, and thus they can be combined for greater efficiency.
Implementation in Discover
The Discover program automatically chooses η so as to balance the
computational loads for real and reciprocal space.
Implementation in Cerius2•OFF
Cerius2•OFF instead uses the inverse of η as input, that is, the
ratio of the time required for a real-space calculation to the time for
a reciprocal-space calculation. The value chosen for the time ratio
does not affect the accuracy of the calculation, only the time taken
to perform it. Real-space calculations typically take longer than
reciprocal ones, so the value of the time ratio is usually greater
than 1.
Electrostatic energy
The Ewald expression for the electrostatic energy is (dropping a
factor of 1/4πε0):

$$E_Q = \frac{\eta}{2} \sum_{L,i,j} q_i q_j \frac{\mathrm{erfc}(a)}{a} + \frac{2\pi}{\Omega} \sum_{\mathbf{h} \neq 0} \left[ \left( \sum_i q_i \cos(\mathbf{h} \cdot \mathbf{r}_i) \right)^2 + \left( \sum_i q_i \sin(\mathbf{h} \cdot \mathbf{r}_i) \right)^2 \right] \frac{\exp(-b^2)}{h^2} - \frac{\eta}{\sqrt{\pi}} \sum_i q_i^2 - \frac{\pi}{2\Omega\eta^2} \left( \sum_i q_i \right)^2$$   Eq. 59

where $a = \eta\,|\mathbf{r}_i - \mathbf{r}_j - \mathbf{R}_L|$; $\mathbf{r}_i = H\mathbf{s}_i$; $\mathbf{h} = 2\pi (H^T)^{-1}\mathbf{n}$ (reciprocal lattice
vectors); $\Omega = \det(H)$ = cell volume; and $b = h / 2\eta$.
In Eq. 59, the first term corresponds to the real-space sum, the
contribution (L = 0, i = j) being omitted. The second term in Eq. 59
corresponds to the reciprocal-space sum. For electrostatic
interactions, the double sum over i and j is reduced to a single sum.
This provides a substantial performance improvement (N^1.5
instead of N^2.5, where N is the number of atoms per unit cell). A
similar reduction of the reciprocal-space sum occurs for the dispersive
energy only if the geometric combination rule ($B_{ij} = \sqrt{B_{ii} B_{jj}}$)
holds. The third term in Eq. 59 arises from the self-energy of the
charge distributions produced by the introduction of the convergence
function. The final term of Eq. 59 is zero if the unit cell is
charge neutral, which is normally the case. Indeed, the electrostatic
energy in an infinite system of non-neutral cells is formally
infinite. However, some applications can involve the use of
non-neutral cells. For example, Cation Locator adds cations to the
system one by one, and neutrality is not achieved until the final cation
is added. The effect of the final term in the equation is to give
convergence to a finite value of the energy, which corresponds to a
system where the excess charge is neutralized by a compensating
uniform background charge density.
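
The four terms of Eq. 59 can be exercised on a small test case. The following is a minimal, unoptimized sketch (plain Python loops, fixed hypothetical cutoffs `n_real` and `n_recip`, and a hand-picked η), not the algorithm used in any MSI program. It evaluates the electrostatic energy of a rock-salt cell with unit charges and unit nearest-neighbor distance, so that the result can be compared with the tabulated Madelung constant of rock salt (about 1.7476).

```python
import itertools
import math
import numpy as np

def ewald_energy(H, frac, q, eta, n_real=2, n_recip=4):
    """Electrostatic energy per unit cell from the four terms of Eq. 59.

    H holds the cell vectors as columns, frac the fractional coordinates s_i,
    and q the charges; the factor 1/(4*pi*eps0) is dropped as in the text.
    """
    pos = frac @ H.T                               # r_i = H s_i
    volume = abs(np.linalg.det(H))                 # Omega
    recip = 2.0 * math.pi * np.linalg.inv(H).T     # h = 2*pi*(H^T)^-1 n
    n_atoms = len(q)

    # Term 1: real-space sum, omitting i = j when L = 0.
    e_real = 0.0
    for L in itertools.product(range(-n_real, n_real + 1), repeat=3):
        shift = H @ np.array(L, dtype=float)
        for i in range(n_atoms):
            for j in range(n_atoms):
                if i == j and L == (0, 0, 0):
                    continue
                r = np.linalg.norm(pos[i] - pos[j] - shift)
                e_real += 0.5 * q[i] * q[j] * math.erfc(eta * r) / r

    # Term 2: reciprocal-space sum, using single sums over the atoms.
    e_recip = 0.0
    for n in itertools.product(range(-n_recip, n_recip + 1), repeat=3):
        if n == (0, 0, 0):
            continue
        h = recip @ np.array(n, dtype=float)
        h2 = h @ h
        phase = pos @ h
        structure = np.sum(q * np.cos(phase)) ** 2 + np.sum(q * np.sin(phase)) ** 2
        e_recip += 2.0 * math.pi / volume * structure * math.exp(-h2 / (4.0 * eta**2)) / h2

    e_self = -eta / math.sqrt(math.pi) * np.sum(q**2)                    # Term 3
    e_background = -math.pi / (2.0 * volume * eta**2) * np.sum(q) ** 2   # Term 4
    return e_real + e_recip + e_self + e_background

# Rock-salt cell with nearest-neighbor distance 1 (cubic cell edge 2).
H = 2.0 * np.eye(3)
frac = np.array([[0, 0, 0], [.5, .5, 0], [.5, 0, .5], [0, .5, .5],
                 [.5, 0, 0], [0, .5, 0], [0, 0, .5], [.5, .5, .5]], dtype=float)
q = np.array([1, 1, 1, 1, -1, -1, -1, -1], dtype=float)

energy = ewald_energy(H, frac, q, eta=1.0)
madelung = -energy / 4.0   # four ion pairs per cell
print(f"energy per cell: {energy:.6f}   Madelung constant: {madelung:.6f}")
```

With cutoffs this generous, repeating the call with a different η should leave the total essentially unchanged; only the split between the real-space and reciprocal-space terms moves.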
The Ewald sum as it appears in Eq. 59, with no h = 0 term, strictly
represents an infinite crystal. A real macroscopic but finite crystal
also includes surface contributions, which can be substantial
(Deem et al. 1990) and which depend on the dipole moment of the
unit cell and the shape of the crystal. However, in a real environ-
ment, physical effects such as surface reconstruction and dielectric
effects in the surrounding medium serve, in general, to diminish
the surface charge. The Cerius2•OFF and CDiscover programs
therefore omit such terms, corresponding to the so-called “tin-foil”
boundary conditions.

Accuracy of Ewald calculations


You choose the accuracy before beginning calculation
The Ewald method allows you to select, before running the calculation,
a level of accuracy for the calculation. (Estimation of the
cutoff and convergence constants is difficult, so a facility to
automatically calculate these parameters to a certain accuracy
(Karasawa & Goddard 1989) is provided instead.)
Depending on the system, an Ewald calculation with accuracy =
1 e-4 can be comparable in performance to an atom-based calcula-
tion with a large cutoff (19 Å) over the range that has been tested.
In addition, the Ewald results are significantly more accurate.
Computation time vs. accuracy and model size
Ewald processing time grows as N^1.5, where N is the number of
atoms in the unit cell. Increases in accuracy do not require
unreasonable increases in the Ewald lattice cutoff. For acetic acid, for
example, increasing the accuracy by 2 orders of magnitude (from
1 e-2 to 1 e-4), with constant repulsive cutoff, increased processing
time only about 1.6-fold (from 22.08 to 35.34 seconds) and
increased the Ewald lattice cutoff less than 20% (from 11.7 to
13.4 Å).
Troubleshooting
Although the default Ewald accuracy is acceptable for most single-point
energy calculations, tighter accuracy may be required for
some minimization and dynamics runs, to assure acceptable gradient
accuracy. A value as low as 0.00025 may be preferable if your
minimization run fails to converge or your dynamics run misbehaves.
Nonbond interaction energies
Due to the nature of the Ewald sum method, specific nonbond
interaction energies cannot be calculated. When this method is
used, the per-atom energy is calculated by using the Ewald sum
method, and the nonbond interaction energy is calculated using
the group-based method. You can specify cutoffs for the group-
based method of nonbond analysis. A large cutoff in the group-
based method may give reasonably accurate energies compared
with the Ewald sum method.

Ewald sum for models with 2D periodicity


Only for Coulombic terms
If the model has 2D periodicity, an Ewald sum may be applied to
the Coulombic terms, using the method of F. Harris (communica-
tion). A non-Ewald method must be used for the van der Waals
terms in this case, and a large cutoff may be required to obtain
good accuracy.
Slow but accurate
The method is similar in principle to the 3D Ewald sum, but more
complex in that one direction must be treated nonperiodically. To
avoid loss of accuracy, no cutoff is applied in the nonperiodic
direction. However, the method can still be optimized, since fewer
terms are required in the periodic sums for those atom pairs hav-
ing a large separation in the nonperiodic direction. Although the
method is slower than a non-Ewald method (or a 3D Ewald sum),
it provides very good accuracy.
Activated automatically
The method is available in Cerius2•OFF and is activated automatically
if you select the Ewald method for Coulomb terms when the
current model has 2D periodicity.



4 Minimization

This chapter explains
This chapter concentrates on the static information that can be
extracted from the potential energy surface, as well as on the algo-
rithms used for this purpose. The main areas that are covered—
minimization and harmonic vibration calculations—are usually
lumped together as molecular mechanics, to differentiate them from
molecular dynamics calculations (Molecular Dynamics), in which the
time evolution of the system is considered.
This chapter explains how minimization is implemented and pre-
sents general procedures for performing minimization calcula-
tions. To make the most effective use of minimization, you should
read this entire chapter, which includes:
General minimization process
Minimization algorithms
General methodology for minimization
Energy and gradient calculation
Vibrational calculation

Related information
Forcefields and Preparing the Energy Expression and the Model focus
on the representation of the potential energy surface and the useful
ways that it can be biased through the addition of restraints and
constraints, as well as other information on preparing the model
for calculations.
Specific information
For specific information on setting up and running minimizations
with the various MSI simulation engines, please see the relevant
documentation (see Available documentation).


Table 13. Finding information in Minimization section

If you want to know about:                                                      Read:
Simple 2D illustrations of minimization algorithms.                             Specific minimization example.
Minimization algorithms used by MSI simulation engines.                         Table 14.
The steepest-descents method of minimization.                                   Steepest descents.
The conjugate-gradient method.                                                  Conjugate gradient.
The full, iterative Newton–Raphson method.                                      Iterative Newton–Raphson method.
Quasi (or pseudo) Newton–Raphson methods.                                       Quasi-Newton–Raphson.
Flowcharts for Newton–Raphson methods.                                          Figure 24; Figure 25.
Truncated Newton–Raphson methods.                                               Truncated Newton–Raphson.
Before you begin a minimization.                                                General methodology for minimization.
Using MSI’s simulation engines for minimization.                                Minimizations with MSI simulation engines.
Meaning of the minimized structure and its calculated energy.                   Significance of minimum-energy structure.
Using the various minimization algorithms.                                      When to use different algorithms.
Precautions with large models.                                                  When to use different algorithms.
Precautions with models having bad conformations.                               When to use different algorithms.
Convergence problems.                                                           Starting structures and choice of forcefield; Failure to converge.
Disk-space storage requirements.                                                Conjugate gradient vs. Newton–Raphson and disk space.
Using different constraints/restraints at different stages of minimization.     When to use constraints/restraints.
Ending a run.                                                                   Convergence criteria.
Using MSI’s minimization engines for vibrational calculations.                  General methodology for vibrational calculations.

Uses of minimization
An important method for exploring the potential energy surface is
to find configurations that are stable points on the surface. This
means finding a point in the configuration space where the net
force on each atom vanishes. By adjusting the atomic coordinates
and unit cell parameters (for periodic models, if requested) so as to
reduce the model potential energy, stable conformations can be
identified.
Perhaps more important, the addition of external forces to the
model in the form of restraints (Preparing the Energy Expression and
the Model) allows for the development of a wide range of modeling
strategies that use minimization as the foundation for
answering specific questions. For example, the question “How
much energy is required for one molecule to adopt the shape of
another?” can be answered by forcing specific atoms to overlap
atoms of a template structure during an energy minimization.

General minimization process


Energy evaluation
Minimization of a model is done in two steps. First, the energy
expression (an equation describing the energy of the system as a
function of its coordinates) must be defined and evaluated for a
given conformation. Energy expressions may be defined that
include external restraining terms to bias the minimization, in
addition to the energy terms.
Conformation adjustment
Next, the conformation is adjusted to lower the value of the energy
expression. A minimum may be found after one adjustment or
may require many thousands of iterations, depending on the
nature of the algorithm, the form of the energy expression, and the
size of the model.
The efficiency of the minimization is therefore judged by both the
time needed to evaluate the energy expression and the number of
structural adjustments (iterations) needed to converge to the min-
imum.

Specific minimization example


To introduce the various minimization algorithms, the application
of each algorithm to the minimization of a pure quadratic function
in two dimensions is discussed. Although the energy surface is
most certainly anharmonic in regions away from the minimum, it
may be considered to be locally harmonic at the minimum.
A simple illustration in two dimensions
Rather than a complex energy expression, the target function used
in this illustration is an elliptical surface in two dimensions, as
described by Eq. 60:

$$E(x, y) = x^2 + 5y^2$$   Eq. 60


Begin with the function and an initial guess of its minimum
This simple function illustrates the properties of the minimization
algorithms and captures the mathematical essence of the formulations.
Every minimization begins with some energy expression
analogous to Eq. 60. In addition to an energy expression defining
the energy surface, a starting set of coordinates—an initial
guess—for (x,y) must be provided.
Figure 19 is a contour plot of the energy E in the (x,y) plane. Each
ellipse is spaced two energy units apart and represents a locus of
points having the same energy. (This is analogous to a contoured
topographical map.)

[Figure 19: contour plot of E in the (x,y) plane; x from -5.0 to 5.0, y from -2.0 to 2.0; contours labeled from 2 to 20]

Figure 19. Energy contour surface of a simple function
An energy contour surface for the function x² + 5y². Each contour represents an
increase of two arbitrary energy units.

Of course, the minimum of this simple function is trivial and can
be deduced by inspection to be (0,0).
The minimizer must find the direction to the minimum and its distance from the initial guess
Given an energy expression that defines the energy surface (such
as in Figure 19) and an initial starting point, a minimizer must
determine both the direction towards a minimum and the distance
to the minimum in that direction. A good initial direction is simply
the slope or derivatives of the function at the current point. The
derivatives of Eq. 60 are a two-dimensional vector:


∇E = ( 2x, 10y ) Eq. 61

Here, the derivatives are proportional to the coordinates, so that
the further you are from the minimum, the larger the derivatives.
Improve efficiency by finding how the derivatives change
The derivatives, however, merely point downhill and not necessarily
towards the minimum (see Figure 20). Thus, as you move in
the direction of the initial derivatives, the new derivatives change
and point in yet a new direction. In order to improve the efficiency,
the more sophisticated algorithms such as conjugate gradients and
Newton–Raphson use information on how the derivatives change,
to determine the direction.
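
A quick numerical check makes the point concrete. The sketch below evaluates the derivatives of Eq. 60 and measures the angle between the downhill direction and the straight line to the minimum at (0,0). The starting points and function names are hypothetical.

```python
import numpy as np

def energy(p):
    """Target function of Eq. 60."""
    x, y = p
    return x**2 + 5.0 * y**2

def gradient(p):
    """Derivatives of Eq. 60, as in Eq. 61."""
    x, y = p
    return np.array([2.0 * x, 10.0 * y])

def angle_to_minimum_deg(p):
    """Angle between the downhill direction and the direction to (0,0)."""
    downhill = -gradient(p)
    to_minimum = -np.asarray(p, dtype=float)
    cosine = downhill @ to_minimum / (np.linalg.norm(downhill) * np.linalg.norm(to_minimum))
    return float(np.degrees(np.arccos(np.clip(cosine, -1.0, 1.0))))

print(angle_to_minimum_deg((4.0, 1.0)))   # off-axis start: downhill misses the minimum
print(angle_to_minimum_deg((3.0, 0.0)))   # on an axis of the ellipse: downhill is exact
```

Only on the symmetry axes of the ellipse does the downhill direction pass through the minimum; everywhere else a correction is needed.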

Line search
Before detailing the different algorithms, the concept of a line
search is introduced. Line searches are an implicit component of
most minimizers.
Most minimizers use line searches
Minimizers usually have two major components. The more
generic part is the so-called line search, which actually changes the
coordinates to a new lower-energy structure. As an illustration,
consider Figure 20 in which the gradient direction from an arbi-
trary starting point has been superimposed on our elliptic func-
tion. The starting point (x0, y0) is defined as point a.
What is a line search?
In simple terms, a line search amounts to a one-dimensional minimization
along a direction vector determined at each iteration. For
the path shown in Figure 20, it would be along derivative vector
(2x,10y), and the one-dimensional surface can be expressed
parametrically in terms of coordinate α (see also Figure 21):

$$(x', y') = \left( x_0 + \alpha \left.\frac{\partial E}{\partial x}\right|_{x_0, y_0},\; y_0 + \alpha \left.\frac{\partial E}{\partial y}\right|_{x_0, y_0} \right)$$   Eq. 62

where (x′, y′) are coordinates along the line away from the current
point (x0, y0) in the direction of the derivative at (x0, y0).
If the energy of these new points is calculated and plotted as a
function of α, the curve in Figure 21 is obtained.
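
For the quadratic function of Eq. 60, the one-dimensional minimum along this line can be written in closed form, which gives a compact way to experiment with the idea. The sketch below keeps the sign convention of Eq. 62 (the point moves along the derivative vector itself), so the minimizing α comes out negative. The starting point and function names are hypothetical.

```python
import numpy as np

HESSIAN = np.diag([2.0, 10.0])   # second derivatives of E = x^2 + 5y^2

def energy(p):
    return p[0] ** 2 + 5.0 * p[1] ** 2

def gradient(p):
    return HESSIAN @ p

def point_on_line(p0, alpha):
    """Eq. 62: move from p0 along the derivative vector evaluated at p0."""
    return p0 + alpha * gradient(p0)

def exact_alpha(p0):
    """Value of alpha that minimizes E along the line (valid for this quadratic only)."""
    g = gradient(p0)
    return -(g @ g) / (g @ HESSIAN @ g)

p0 = np.array([4.0, 1.0])          # hypothetical starting point
alpha = exact_alpha(p0)
p1 = point_on_line(p0, alpha)
print("alpha at the line minimum:", alpha)
print("energy before and after:", energy(p0), energy(p1))
print("new gradient . search direction:", gradient(p1) @ gradient(p0))
```

The last printed value is zero to rounding error: at the line minimum the new gradient has no component along the search direction.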


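[Figure 20: contour plot of the energy surface in the (x,y) plane; x from -5.0 to 5.0, y from -2.0 to 2.0; points a (x0,y0), b, and d marked along the search line]
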
Figure 20. Energy surface for Eq. 60
The derivative vector from the initial point a (x0, y0) defines the line search direc-
tion. Note that the derivative vector does not point directly toward the mini-
mum. Compare this representation with that in Figure 21, where the line (b–a–c–
d) is searched in one dimension for the minimum. Note that the minimum (point
c) occurs precisely at the point where the derivative vector is tangent to the
energy contours, which implies that the subsequent derivative vectors are
orthogonal to the previous derivatives.

The minimum along α (that is, c) coincides with the point at which
the line is tangential to the energy contour. Because the gradient
at this point is perpendicular to the search line, a subsequent
search along the new gradient is orthogonal to the previous one. This is an
important property of line searches, which is also included in the
discussion of the conjugate gradient algorithm under Conjugate
gradient.
[Figure 21: plot of E(x′,y′), from about 6 to 8, against α, from -2 to 6, with points b, a (x0,y0), c, and d marked]

Figure 21. Cross section of the energy surface as defined by the
intersection of the line search path in Figure 20 with the energy surface
The independent variable α is a one-dimensional parameter that is adjusted so
as to minimize the value of the function E(x′, y′), where x′ and y′ are
parameterized in terms of α in Eq. 62. Point a corresponds to the initial point (when α is
0), and point c is the local one-dimensional minimum. Points b and d, along with
a, bound the minimum and form the basis for an iterative search for the minimum.

Efficiency and cost of extensive line searches
Line searches do not depend on the algorithm that generated the
direction vector. The general strategy is to simply bracket the
one-dimensional minimum between two points higher in energy, for
example points b and d in Figure 21. Then, by a set of successive
iterations, the actual minimum is approached (e.g., starting at
point a, the first step might lead to b, then the direction might
reverse to lead to d and finally to c). Extensive line searches are
attractive because they extract all the energy from one direction
before moving on to the next. Also, the fact that the new derivatives
are always perpendicular to the previous directions produces
an efficient path to the minimum for surfaces that are approximately
quadratic. In practice, however, line searches are costly in
terms of the number of function evaluations that must be performed.
The energy must be evaluated at 3–10 points to precisely
locate the one-dimensional minimum, and thus extensive line
searches are inefficient.
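
The cost of narrowing a bracket can be made visible by counting function evaluations. The sketch below uses a golden-section search, which is swapped in here purely for illustration and is not claimed to be the scheme used by any MSI program; the starting point, bracket, tolerance, and names are hypothetical.

```python
import math

def golden_section(f, lo, hi, tol=1e-3):
    """Narrow a bracket [lo, hi] around a one-dimensional minimum.

    Returns the midpoint of the final bracket and the number of
    function evaluations that were spent.
    """
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    c = hi - inv_phi * (hi - lo)
    d = lo + inv_phi * (hi - lo)
    fc, fd = f(c), f(d)
    evaluations = 2
    while hi - lo > tol:
        if fc < fd:                       # minimum lies in [lo, d]
            hi, d, fd = d, c, fc
            c = hi - inv_phi * (hi - lo)
            fc = f(c)
        else:                             # minimum lies in [c, hi]
            lo, c, fc = c, d, fd
            d = lo + inv_phi * (hi - lo)
            fd = f(d)
        evaluations += 1
    return 0.5 * (lo + hi), evaluations

X0, Y0 = 4.0, 1.0                  # hypothetical starting point
GX, GY = 2.0 * X0, 10.0 * Y0       # derivatives of Eq. 60 at the starting point

def energy_along_line(alpha):
    """E(x', y') along the line of Eq. 62."""
    x, y = X0 + alpha * GX, Y0 + alpha * GY
    return x**2 + 5.0 * y**2

alpha_min, n_evals = golden_section(energy_along_line, -0.5, 0.0)
print(f"alpha = {alpha_min:.4f} after {n_evals} function evaluations")
```

Loosening `tol` reduces the evaluation count at the price of a less precise one-dimensional minimum, which is the trade-off discussed in this section.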

Minimization algorithms
Only minimization algorithms used in MSI’s simulation engines
(Table 14) are considered here:
Steepest descents
Conjugate gradient
Newton–Raphson methods


Table 14. Minimization algorithms used by MSI simulation engines

simulation engine
algorithm variant
CHARMm Discover OFF MMFF93
steepest descents √ √ √ √
conjugate gradient Polak–Ribiere √ √
Fletcher–Reeves √ √ √ √
Powell √ √
Newton–Raphson full, iterative √ √ √ √
BFGS (quasi) √ √
DFP (quasi) √ √
truncated √ √ √
ABNR √ √ √

“Iteration” defined
To be consistent in discussions of efficiency, a minimization iteration
must be explicitly defined. That is, an iteration is complete when
the direction vector is updated. For minimizers using a line search,
each completed line search is therefore an iteration. Iterations
should not be confused with function evaluations.
Choosing the minimization algorithm(s)
Access to controls used to specify what minimizer(s) to use is
presented under General methodology for minimization.

Steepest descents
In the steepest-descents method, the line search direction is
defined along the direction of the local downhill gradient –∇E(xi, yi).
Figure 22 shows the minimization path followed by a
steepest-descents approach for the simple quadratic function. As
expected, each line search produces a new direction that is perpen-
dicular to the previous gradient; however, the directions oscillate
along the way to the minimum. This inefficient behavior is charac-
teristic of steepest descents, especially on energy surfaces having
narrow valleys.
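
The zigzag behavior is easy to reproduce on the function of Eq. 60. The sketch below performs steepest descents with an exact line search (available in closed form only because the function is quadratic) and records the path. The starting point, tolerance, and function names are hypothetical, so iteration counts need not match those quoted for the figures.

```python
import numpy as np

HESSIAN = np.diag([2.0, 10.0])   # second derivatives of E = x^2 + 5y^2

def gradient(p):
    return HESSIAN @ p

def steepest_descents(p0, tol=1e-3, max_iter=200):
    """Steepest descents with a complete line search at every iteration."""
    p = np.asarray(p0, dtype=float)
    path = [p.copy()]
    for _ in range(max_iter):
        g = gradient(p)
        if np.linalg.norm(g) < tol:
            break
        step = (g @ g) / (g @ HESSIAN @ g)   # exact minimum along -g
        p = p - step * g
        path.append(p.copy())
    return np.array(path)

path = steepest_descents([4.0, 1.0])
steps = np.diff(path, axis=0)
print("iterations:", len(steps))
print("dot products of successive steps:",
      [float(steps[k] @ steps[k + 1]) for k in range(min(3, len(steps) - 1))])
```

The dot products of successive steps vanish to rounding error, so each search direction is perpendicular to the previous one and the path oscillates across the valley.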
[Figure 22: minimization path drawn on the contour plot of Eq. 60; x from -5.0 to 5.0, y from -2.0 to 2.0]

Figure 22. Minimization path following a steepest-descents path
When complete line searches starting from point a are used, the minimum is
reached in about 12 iterations. Here, where a rigorous line search is carried out,
approximately 8 function evaluations are needed for each line search using a
quadratic interpolation scheme. Note how steepest descents consistently
overshoots the best path to the minimum, resulting in an inefficient, oscillating
trajectory.

Increased efficiency with truncated line searches
What would happen if the line search were eliminated and the
position were simply updated any time that the trial point
along the gradient had a lower energy? The advantage would be
that the number of function evaluations performed per iteration
would be dramatically decreased. Furthermore, by constantly
changing the direction to match the current gradient, oscillations
along the minimization path might be damped.
The result of such a minimization is shown in Figure 23. The min-
imization begins from the same point as in Figure 22, but each line
search uses, at most, two function evaluations (if the trial point has
a higher energy, the step size is adjusted downward and a new
trial point is generated) (Levitt and Lifson, 1969). Note that the
steps are more erratic here, but the minimum is reached in roughly
the same number of iterations. The critical aspect, however, is that
by avoiding comprehensive line searches, the total number of
function evaluations is only 10–20% of that used by the rigorous
line search method.
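
A minimal sketch of this idea follows. A trial point along the downhill gradient is accepted whenever its energy is lower, and the step size is reduced otherwise. The growth and shrink factors, initial step, starting point, and names are hypothetical choices for illustration and are not taken from Levitt and Lifson or from any MSI program.

```python
import numpy as np

def energy(p):
    return p[0] ** 2 + 5.0 * p[1] ** 2

def gradient(p):
    return np.array([2.0 * p[0], 10.0 * p[1]])

def descend_without_line_search(p0, step=0.05, grow=1.2, shrink=0.5,
                                tol=1e-3, max_evals=10_000):
    """Steepest descents that accepts any trial point with a lower energy."""
    p = np.asarray(p0, dtype=float)
    e = energy(p)
    evaluations, iterations = 1, 0
    while evaluations < max_evals:
        g = gradient(p)
        if np.linalg.norm(g) < tol:
            break
        trial = p - step * g
        e_trial = energy(trial)
        evaluations += 1
        if e_trial < e:                 # accept: update position and direction
            p, e = trial, e_trial
            step *= grow
            iterations += 1
        else:                           # reject: adjust the step size downward
            step *= shrink
    return p, iterations, evaluations

p_final, n_iter, n_evals = descend_without_line_search([4.0, 1.0])
print("final point:", p_final)
print("iterations:", n_iter, " energy evaluations:", n_evals)
```

Here each accepted step counts as one iteration, in keeping with the definition of an iteration as an update of the direction vector.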
[Figure 23: minimization path drawn on the contour plot of Eq. 60; x from -5.0 to 5.0, y from -2.0 to 2.0]

Figure 23. Minimization path following a steepest-descents path without
line searches
The searching starts from point a and converges on the minimum in about 12
iterations. Although the number of iterations is slightly larger than in Figure 22,
the total minimization is five times faster since, on average, each iteration uses
only 1.3 function evaluations. Note that, in most applications in molecular
mechanics, the function evaluation is the most time-consuming portion of the
calculation.

A slow but robust method
The exclusive reliance of steepest descents on gradients is both its
weakness and its strength. Convergence is slow near the minimum
because the gradient approaches zero, but the method is extremely
robust, even for systems that are far from harmonic. It is the
method most likely to generate a lower-energy structure regardless
of what the function is or where the process begins. Therefore,
the steepest-descents method is often used when the gradients are
large and the configurations are far from the minimum. This is
commonly the case for initial relaxation of poorly refined
crystallographic data or for graphically built models. In fact, as explained
in the following sections, more advanced algorithms are often
designed to begin with steepest descents as the first step.


Tip
The conjugate gradient method (below) and the iterative and
quasi Newton–Raphson methods assume that the conformation
is close enough to a local minimum that the potential energy
surface is very nearly quadratic. Hence, steepest descents
should generally be used for the first 10–100 steps of
minimization (depending on the size of the model and how
distorted its starting conformation is). Please see also When to
use different algorithms.

Conjugate gradient
The reason that the steepest-descents method converges slowly
near the minimum is that each segment of the path tends to reverse
progress made in an earlier iteration. For example, in Figure 22,
each line search deviates somewhat from the ideal direction to the
minimum. Successive line searches correct for this deviation, but
they cannot efficiently correct because each direction must be
orthogonal to the previous direction. Thus, the path oscillates and
continually overcorrects for poor choices of directions in earlier
steps.
Increasing the efficiency of line searches by controlling the choice of new direction
It would be preferable to prevent the next direction vector from
undoing earlier progress. This means using an algorithm that produces
a complete basis set of mutually conjugate directions such
that each successive step continually refines the direction toward
the minimum. If these conjugate directions truly span the space of
the energy surface, then minimization along each direction in turn
must by definition end in arriving at a minimum. The conjugate
gradient algorithm constructs and follows such a set of directions.
In conjugate gradients, $h_{i+1}$, the new direction vector leading from
point i+1, is computed by adding the gradient at point i+1, $g_{i+1}$, to
the previous direction $h_i$ scaled by a constant $\gamma_i$:

$$h_{i+1} = g_{i+1} + \gamma_i h_i \qquad \text{(Eq. 63)}$$

Polak–Ribiere method
where $\gamma_i$ is a scalar that can be defined in two ways. In the Polak–Ribiere
method, $\gamma_i$ is defined as:

$$\gamma_i = \frac{(g_{i+1} - g_i) \cdot g_{i+1}}{g_i \cdot g_i} \qquad \text{(Eq. 64)}$$


Fletcher–Reeves method
And in the Fletcher–Reeves method (1964), $\gamma_i$ is defined as:

$$\gamma_i = \frac{g_{i+1} \cdot g_{i+1}}{g_i \cdot g_i} \qquad \text{(Eq. 65)}$$

(Fletcher 1980). Although the two conjugate gradient methods
have similar characteristics, one or the other might behave better
in certain cases.
This direction is then used in place of the gradient in Eq. 62, and a
new line search is conducted. This construction has the remarkable
property that the next gradient, $g_{i+1}$, is orthogonal to all previous
gradients, $g_0, g_1, g_2, \ldots, g_i$, and that the next direction, $h_{i+1}$, is
conjugate to all previous directions, $h_0, h_1, h_2, \ldots, h_i$. Thus, the term
conjugate gradient is somewhat of a misnomer: the algorithm produces
a set of mutually orthogonal gradients and a set of mutually
conjugate directions. For a quadratic energy surface, this method
converges in approximately N steps, where N is the number of degrees of freedom.
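
As an illustration of Eq. 63 through Eq. 65, the following minimal sketch applies both definitions of γ to a small quadratic test function, E(x) = ½xᵀAx − bᵀx. It is not taken from any MSI program. The function name, the matrix A, and the vector b are hypothetical, and the sketch assumes that g denotes the downhill direction (the negative of the energy gradient) and that the exact line-search step available for a quadratic function is used.

```python
import numpy as np

def conjugate_gradient(A, b, x0, variant="polak-ribiere"):
    """Minimize E(x) = 0.5 x^T A x - b^T x for a symmetric positive-definite A."""
    x = np.asarray(x0, dtype=float).copy()
    g = b - A @ x          # assumed convention: g is the downhill direction (-dE/dx)
    h = g.copy()           # the first direction is a steepest-descents step
    for _ in range(len(x)):            # at most N line searches for N variables
        if np.linalg.norm(g) < 1e-12:  # already at the minimum
            break
        step = (g @ h) / (h @ A @ h)   # exact line search along h (quadratic only)
        x = x + step * h
        g_new = b - A @ x
        if variant == "polak-ribiere":
            gamma = ((g_new - g) @ g_new) / (g @ g)    # Eq. 64
        else:
            gamma = (g_new @ g_new) / (g @ g)          # Eq. 65 (Fletcher-Reeves)
        h = g_new + gamma * h                          # Eq. 63
        g = g_new
    return x

# Hypothetical 3-variable quadratic surface
A = np.array([[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]])
b = np.array([1.0, 2.0, 3.0])
x_pr = conjugate_gradient(A, b, np.zeros(3), "polak-ribiere")
x_fr = conjugate_gradient(A, b, np.zeros(3), "fletcher-reeves")
```

On a truly quadratic surface both variants reach the same point within N line searches; differences between them appear only on anharmonic surfaces.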
Powell method
The Powell method (used in CHARMm and Cerius2•MMFF; see
Powell 1977 and Gunsteren & Karplus 1980) is essentially a strategy
for handling convergence problems.
Conjugate gradient best for large models
Conjugate gradients is the method of choice for large models
because, in contrast to Newton–Raphson methods, where storage
of a second-derivative matrix (N (N + 1) ⁄ 2) is required, only the
previous 3N gradients and directions have to be stored. However,
to ensure that the directions are mutually conjugate, more com-
plete line search minimizations must be performed along each
direction. Since these line searches consume several function eval-
uations per search, the time per iteration may be longer for conju-
gate gradients than for steepest descents. This is more than
compensated for by the more efficient convergence to the mini-
mum achieved by conjugate gradients.

Tip
The conjugate gradient method can be unstable if the
conformation is so far away from a local minimum that the
potential energy surface is not nearly quadratic. Steepest
descents (see Steepest descents) should generally be used for the first
10–100 steps of minimization. Please see also When to use
different algorithms.


Newton–Raphson methods

Iterative Newton–Raphson method

Using the second derivatives to accelerate convergence
As a rule, N² independent data points are required to numerically
solve a harmonic function with N variables. Since a gradient is a
vector N long, the best you can hope for in a gradient-based minimizer
is to converge in N steps. However, if you can exploit second-derivative
information, a minimization could ideally
converge in one step, because the second-derivative matrix is an N × N
matrix and so supplies N² data points at once. This is the principle
behind the variable metric minimization
algorithms, of which Newton–Raphson is perhaps the most
commonly used.
Another way of looking at Newton–Raphson is that, in addition to
using the gradient to identify a search direction, the curvature of
the function (the second derivative) is also used to predict where
the function passes through a minimum along that direction. Since
the complete second-derivative matrix defines the curvature in
each gradient direction, the inverse of the second-derivative
matrix can be multiplied by the gradient to obtain a vector that
translates directly to the nearest minimum. This is expressed
mathematically as:

$$r_{\mathrm{min}} = r_0 - A^{-1}(r_0) \cdot \nabla E(r_0) \qquad \text{(Eq. 66)}$$

where $r_{\mathrm{min}}$ is the predicted minimum, $r_0$ is an arbitrary starting
point, $A(r_0)$ is the matrix of second partial derivatives of the energy
with respect to the coordinates at $r_0$ (also known as the Hessian
matrix), and $\nabla E(r_0)$ is the gradient of the potential energy at $r_0$.
Iteration required because energy surface is not harmonic
The energy surface is generally not harmonic, so that the minimum-energy
structure cannot be determined with one Newton–Raphson
step. Instead, the algorithm must be applied iteratively:

$$r_i = r_{i-1} - A^{-1}(r_{i-1}) \cdot \nabla E(r_{i-1}) \qquad \text{(Eq. 67)}$$

Thus, the ith point is determined by taking a Newton–Raphson
step from the previous point (i − 1). Similar to conjugate gradients,
the efficiency of Newton–Raphson minimization increases as convergence
is approached (Ermer 1976).
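
The contrast between a single Newton–Raphson step and the iterated form can be seen in one dimension, where the inverse Hessian reduces to one over the second derivative. The sketch below is illustrative only: the function name, the harmonic and Morse parameters, and the starting bond length are all hypothetical values chosen for the demonstration.

```python
import math

def newton_raphson(gradient, curvature, r, tol=1e-10, max_steps=50):
    """Apply the Newton-Raphson update in one dimension until |dE/dr| < tol.
    Returns (final r, number of steps taken)."""
    for step in range(max_steps):
        g = gradient(r)
        if abs(g) < tol:
            return r, step
        r = r - g / curvature(r)       # A^-1 . grad(E) reduces to g / E''
    return r, max_steps

K, R0 = 300.0, 1.0                     # hypothetical harmonic bond parameters
harmonic = newton_raphson(lambda r: K * (r - R0), lambda r: K, 0.8)

D, A = 1.0, 2.0                        # hypothetical Morse parameters

def morse_gradient(r):
    e = math.exp(-A * (r - R0))
    return 2.0 * D * A * (1.0 - e) * e

def morse_curvature(r):
    e = math.exp(-A * (r - R0))
    return 2.0 * D * A * A * e * (2.0 * e - 1.0)

morse = newton_raphson(morse_gradient, morse_curvature, 0.8)
```

The harmonic case reaches the minimum in a single step, whereas the anharmonic Morse case needs several, with the error shrinking faster on each successive step.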


Drawbacks of pure Newton–Raphson method
As elegant as this algorithm appears, its application to molecular
modeling has several drawbacks. First, the terms in the Hessian
matrix are difficult to derive and are computationally costly for
molecular forcefields. Furthermore, when a structure is far from
the minimum (where the energy surface is anharmonic), the mini-
mization can become unstable.
For example, when the forces are large but the curvature is small,
such as on the steep repulsive wall of a van der Waals potential, the
algorithm computes a large step (a large gradient divided by the
small curvature) that may overshoot the minimum and lead to a
point even further from the minimum than the starting point.
Thus, the method can diverge rapidly if the initial forces are too
high (or the surface too flat).
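
A minimal sketch of the "surface too flat" failure, assuming a 12-6 potential in reduced units (ε = σ = 1, hypothetical values not tied to any forcefield): starting just beyond the inflection point, where the curvature is negative, a single Newton–Raphson step moves the separation away from the minimum instead of toward it.

```python
def lj_gradient(r, eps=1.0, sigma=1.0):
    """First derivative of 4*eps*((sigma/r)**12 - (sigma/r)**6)."""
    return 4.0 * eps * (-12.0 * sigma**12 / r**13 + 6.0 * sigma**6 / r**7)

def lj_curvature(r, eps=1.0, sigma=1.0):
    """Second derivative of the same 12-6 potential."""
    return 4.0 * eps * (156.0 * sigma**12 / r**14 - 42.0 * sigma**6 / r**8)

r_min = 2.0 ** (1.0 / 6.0)     # minimum of the 12-6 potential for sigma = 1
r_start = 1.3                  # beyond the inflection point, where curvature < 0
r_next = r_start - lj_gradient(r_start) / lj_curvature(r_start)
```

Here `r_next` lies farther from `r_min` than `r_start` does, which is why a few steepest-descents steps are normally taken first.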
Finally, calculating, inverting, and storing an N × N matrix for a
large system can become unwieldy. Even taking into account that
the Hessian is symmetric and that each of the tensor components
is also symmetric, the storage requirements scale as approximately
(3N)²/2 for N atoms. Thus, for a 200-atom system, about 180,000 words are
required. The Hessian alone for a 1,000-atom system already
approaches the limits of a Cray X-MP supercomputer, and a
10,000-atom system is currently intractable.
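
The storage figures can be checked with a short calculation. The helper below is a hypothetical illustration that counts one word per unique element of a symmetric matrix over the 3N Cartesian coordinates, which is 3N(3N + 1)/2.

```python
def hessian_words(n_atoms, symmetric=True):
    """Words needed to store the Cartesian Hessian of an n_atoms model."""
    n = 3 * n_atoms                      # Cartesian degrees of freedom
    return n * (n + 1) // 2 if symmetric else n * n

for atoms in (200, 1000, 10000):
    print(atoms, hessian_words(atoms))
```

For 200 atoms this gives 180,300 words, in line with the figure quoted above; each tenfold increase in model size multiplies the requirement by roughly one hundred.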
Pure Newton–Raphson is reserved primarily for calculations
where rapid convergence to an extremely precise minimum is
required, for example, from initial derivatives of 0.1 kcal mol⁻¹ Å⁻¹
to 10⁻⁸ kcal mol⁻¹ Å⁻¹. Such extreme convergence is necessary
when performing vibrational normal mode analysis, where even
small residual derivatives can lead to errors in the calculated
vibrational frequencies.

Tip
The iterative Newton–Raphson method can be unstable if the
conformation is so far away from a local minimum that the
potential energy surface is not nearly quadratic. Steepest
descents (see Steepest descents) should generally be used for the first
10–100 steps of minimization. Please see also When to use
different algorithms.

Variants of iterative Newton–Raphson method

In addition to the iterative Newton–Raphson method, variants of
the Newton method are available (Table 14): the quasi-Newton
(which includes the BFGS and DFP methods) and the truncated
Newton methods. These variants, as well as others, are characterized
by the use of the general algorithm shown in Figure 24.

Figure 24. General algorithm for variants of Newton–Raphson method

1. Supply an initial guess $r_0$.
2. Test for convergence.
3. Compute an approximate Hessian $A_k$ that is positive-definite.
4. Solve for the search direction $p_k$ so that:

   $$\lVert A_k p_k + g_k \rVert < \phi_k \lVert g_k \rVert$$

   where $\phi_k$ is some prescribed quantity that controls the accuracy of the
   computed $p_k$.
5. Compute an appropriate step length $\lambda_k$ so that the energy decreases by a
   sufficient amount.
6. Increment the coordinates:

   $$x_{k+1} = x_k + \lambda_k p_k$$

7. Go to Step 2.

The differences among the various Newton methods revolve around:
♦ How A is calculated and how its positive-definite character is
preserved during the minimization.
♦ How the search direction is determined and to what accuracy
(i.e., φk).
♦ How the line search is carried out and how exact it is.

Quasi-Newton–Raphson
The quasi-Newton–Raphson method
follows the basic idea of the conjugate-gradients method by using
the gradients of previous iterations to direct the minimization
along a more efficient pathway. However, the use of the gradients
is within the Newton framework. In particular, a matrix B approximating
the inverse of the Hessian (A⁻¹) is constructed from the
gradients using a variety of updating schemes. This matrix has the
property that, in the limit of convergence, it is equivalent to A⁻¹, so
that, in this limit, the method is equivalent to the Newton–Raphson
method. Another property of B is that it is always positive-definite
and symmetric by construction, so that successive steps in the
minimization always decrease the energy.
Of the several different updating schemes for determining B, the
two most common ones are the Broyden, Fletcher, Goldfarb, and
Shanno (BFGS, also known as VA09A) and the Davidon, Fletcher,
and Powell (DFP) algorithms. The BFGS method uses a Fletcher–
Powell algorithm with approximate second derivatives.
DFP updating scheme
Defining δ and γ as the changes in the coordinates and gradients
for successive iterations, the updated approximation B to the inverse
Hessian is given by the following in the DFP method:

$$B_{k+1} = B_k + \frac{\delta\delta^T}{\delta^T\gamma} - \frac{B_k\gamma\gamma^T B_k}{\gamma^T B_k \gamma} \qquad \text{(Eq. 68)}$$

BFGS updating scheme
and in the BFGS method:

$$B_{k+1} = B_k + \left(1 + \frac{\gamma^T B_k \gamma}{\delta^T\gamma}\right)\frac{\delta\delta^T}{\delta^T\gamma} - \left(\frac{\delta\gamma^T B_k + B_k\gamma\delta^T}{\delta^T\gamma}\right) \qquad \text{(Eq. 69)}$$

In practice, the BFGS method is preferred over the DFP method,
because BFGS has been shown to converge globally with inexact
line searches, while DFP has not.
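
The two updating schemes can be written compactly with outer products. The sketch below is an illustration under stated assumptions, not the code used by any simulation engine: the function names and the test vectors are hypothetical, and B is assumed symmetric so that Bγ(Bγ)ᵀ equals BγγᵀB. Both updates satisfy the secant condition B(k+1)γ = δ, which is a convenient check on any implementation.

```python
import numpy as np

def dfp_update(B, delta, gamma):
    """DFP update of the inverse-Hessian approximation B (assumed symmetric)."""
    Bg = B @ gamma
    return (B + np.outer(delta, delta) / (delta @ gamma)
              - np.outer(Bg, Bg) / (gamma @ Bg))

def bfgs_update(B, delta, gamma):
    """BFGS update of the inverse-Hessian approximation B (assumed symmetric)."""
    dg = delta @ gamma
    Bg = B @ gamma
    return (B + (1.0 + (gamma @ Bg) / dg) * np.outer(delta, delta) / dg
              - (np.outer(delta, Bg) + np.outer(Bg, delta)) / dg)

# Hypothetical changes in coordinates (delta) and gradients (gamma)
B0 = np.eye(3)
delta = np.array([0.10, -0.20, 0.05])
gamma = np.array([0.30, -0.10, 0.20])
B_dfp = dfp_update(B0, delta, gamma)
B_bfgs = bfgs_update(B0, delta, gamma)
```

Starting from the identity matrix, as here, makes the first quasi-Newton step identical to a steepest-descents step.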
Advantages and disadvantages
The quasi-Newton–Raphson method has an advantage over the
conjugate-gradient method in that it has been shown to be quadratically
convergent for inexact line searches. Like the conjugate-gradient
method, the method also avoids calculating the Hessian.
However, it still requires storage proportional to N² (N = number
of degrees of freedom), and the updated Hessian approximation
may become singular or indefinite even when the updating
scheme guarantees hereditary positive-definiteness. Finally, the
behavior may become inefficient in regions where the second
derivatives change rapidly. Thus, this minimizer is used in practice
as a bridge between the iterative Newton–Raphson and the
conjugate-gradient methods.

Tip
The quasi Newton–Raphson method can be unstable if the
conformation is so far away from a local minimum that the
potential energy surface is not nearly quadratic. Steepest
descents (see Steepest descents) should generally be used for the first
10–100 steps of minimization. Please see also When to use
different algorithms.

Truncated Newton–Raphson
The truncated Newton–Raphson
method (Figure 25) differs from the quasi-Newton–Raphson
method in two respects:
♦ First, the solution of the line search direction is done iteratively
using the conjugate-gradient method.
♦ Second, the elements in the Hessian are not constructed from
previous gradients.
Increased stability and speed
By using the second derivatives to generate the Hessian, the minimization
is more stable far away from the minimum or in regions
where the derivatives change rapidly. Since solving Eq. 67 by
inversion is not tractable for large models, the search direction is
solved iteratively using the conjugate-gradient method. Further-
more, to increase the speed in solving the search direction, the tol-
erance for conjugate-gradient convergence is dependent on the
proximity to the minimum. The tolerance is relatively large when
the minimization is far from the solution and decreases during
convergence. This is appropriate, because the dependency of con-
vergence on the line search direction becomes greater at the end of
the minimization. At the beginning, it is more efficient to take
more less-well-defined Newton steps than to take fewer well-
defined steps.
To further increase the convergence rate of the conjugate-gradient
minimization, the Newton equation that is solved for each line
search direction is preconditioned with a matrix M. The matrix M
may range in complexity from the identity matrix (M = I) to M =
H. In the first limit, the Newton equation remains unchanged.
However, in the second, there is no saving in computational effort,
because M⁻¹H = I must be solved.


Figure 25. Flowchart of truncated Newton–Raphson method

1. Initialize variables for the outer Newton loop.
2. Calculate gradient $g_k$, Hessian $H_k$, and preconditioner $M_k$.
   Test for Newton convergence ($g_k^{\max} \le \varepsilon$) and exit if true.
3. Determine line search direction $p_k$ by using the conjugate-gradient method to solve:

   $$M_k^{-1} H_k p_k = -M_k^{-1} g_k$$

   a. Initialize conjugate gradient variables for inner conjugate-gradients loop.
   b. Calculate the conjugate-gradient residual ($r_0 = -g_k$):

      $$r_{j+1} = r_j - \gamma_j H d_j, \qquad \gamma_j = \frac{r_j^T z_j}{d_j^T H d_j}$$

      and construct the Newton line search direction using the conjugate-gradient direction:

      $$p_{j+1} = p_j + \gamma_j d_j$$

   c. Test for convergence of the conjugate-gradient inner loop: if

      $$\lVert r_{j+1} \rVert \le \phi \lVert g_k \rVert$$

      then $p_k = p_{j+1}$, and go to Step 4.
   d. Begin next iteration of conjugate gradients.
      Solve $M z_{j+1} = r_{j+1}$ for $z_{j+1}$ and construct new conjugate direction:

      $$d_{j+1} = z_{j+1} + \beta_j d_j, \qquad \beta_j = \frac{r_{j+1}^T z_{j+1}}{r_j^T z_j}$$

4. Next iteration in the outer Newton loop:
   Determine length of step along line search direction and increment coordinates:

   $$x_{k+1} = x_k + \lambda_k p_k$$

5. Go to Step 2.
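
Step 3 of the flowchart, the inner preconditioned conjugate-gradient loop, can be sketched as follows. This is a minimal illustration rather than the implementation in any MSI program: the function name, the default tolerance φ, the iteration cap, and the test matrices are assumptions, and the preconditioner equation is solved densely with `numpy.linalg.solve` purely for brevity.

```python
import numpy as np

def truncated_newton_direction(H, g, M=None, phi=1e-10, max_iter=None):
    """Approximately solve H p = -g by preconditioned conjugate gradients."""
    n = len(g)
    M = np.eye(n) if M is None else M     # M = I leaves the equation unchanged
    p = np.zeros(n)
    r = -np.asarray(g, dtype=float)       # r_0 = -g_k
    z = np.linalg.solve(M, r)             # solve M z = r
    d = z.copy()
    for _ in range(max_iter or 2 * n):
        Hd = H @ d
        gamma = (r @ z) / (d @ Hd)
        p = p + gamma * d                 # build up the Newton direction
        r_new = r - gamma * Hd
        if np.linalg.norm(r_new) <= phi * np.linalg.norm(g):
            break                         # inner loop converged to tolerance phi
        z_new = np.linalg.solve(M, r_new)
        beta = (r_new @ z_new) / (r @ z)
        d = z_new + beta * d              # new conjugate direction
        r, z = r_new, z_new
    return p

# Hypothetical positive-definite Hessian, gradient, and diagonal preconditioner
H = np.array([[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]])
g = np.array([1.0, -2.0, 0.5])
p = truncated_newton_direction(H, g, M=np.diag(np.diag(H)))
```

Raising `phi` far from the minimum makes each Newton direction cheaper but less exact, which is the trade-off described in the text.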


Please see also When to use different algorithms.


ABNR similar to truncated Newton–Raphson
The adopted-basis Newton–Raphson method (ABNR) is similar to
the truncated Newton–Raphson method. The ABNR method performs
energy minimization using a Newton–Raphson algorithm
applied to a subspace of the coordinate vector spanned by the dis-
placement coordinates of the last positions. The second derivative
matrix is constructed numerically from the change in the gradient
vectors, and is inverted by an eigenvector analysis that allows the
routine to recognize and avoid saddle points in the energy surface.
At each step, the residual gradient vector is calculated and used to
add a steepest-descents step, incorporating new direction into the
basis set.
Because ABNR avoids the large storage requirements of the full
Newton–Raphson second derivative method, larger systems can
be minimized more efficiently. ABNR is the method of choice if
storage is a problem.

General methodology for minimization


Many issues are involved in designing an appropriate simulation
strategy for a given model, some of which have to do only with the
minimization algorithms themselves:
Minimizations with MSI simulation engines
When to use different algorithms
Convergence criteria
Significance of minimum-energy structure

Prerequisites
One of the most important steps in any simulation is properly preparing
the model to be simulated. Calculations on the fastest computer
running the most efficient minimization algorithm may be
worthless if the hydrogen is put on the wrong nitrogen or an
important water molecule is omitted.
Unfortunately, it is impossible to provide a single recipe for a successful
model; too much depends on the objectives and expectations
of each calculation. Are energies to be compared
quantitatively? What is the hypothesis being tested? The effects of
tethering, fixing, energy cutoffs, etc. on the results can be
assessed only by controlled preliminary experiments.
Considerations
For simulation strategies that involve minimization, several considerations
must be addressed, including:
♦ When to use constraints and restraints (When to use constraints/
restraints).
♦ Which minimization algorithm(s) to use (When to use different
algorithms).
♦ What criteria to use for judging convergence of the minimiza-
tion (Convergence criteria).
♦ The significance of the minimum-energy structure and its cal-
culated energy (Significance of minimum-energy structure).

Minimizations with MSI simulation engines


Prerequisites
To set up a minimization run, first:

1. Choose the desired forcefield if you don’t want to use the
   default forcefield (Forcefields).

2. Set up the forcefield and prepare your model (Preparing the
   Energy Expression and the Model).

Accessing minimization controls
Next:

3. Specify items such as the minimization algorithm(s)
   (Minimization algorithms and When to use different algorithms)
   and run-termination criteria (Convergence criteria) (unless you
   want your calculation to run under the default conditions).
To find the relevant controls in the different molecular model-
ing programs:
♦ For Cerius2•OFF, go to the OFF METHODS deck of cards and
choose the MINIMIZER card. Select the Run menu item to
access the Energy Minimization control panel. You can access
additional tools, such as for setting the minimization method,
by clicking the Preferences… pushbuttons in this control panel.

♦ In the Cerius2•MMFF module, select the Run menu item to
access the MMFF Energy Minimization control panel. You can
access additional tools, such as for setting the minimization
method, by clicking the Controls… pushbutton in this control
panel or selecting Controls on the MMFF card.
♦ For Cerius2•Discover, go to the DISCOVER deck of cards.
Select the Run menu item on the DISCOVER card to access the
Run Discover control panel. Set the Task popup to Minimiza-
tion. You can access additional tools, such as for setting the
minimization method, by clicking the More… pushbutton to
the right of this popup to open the Discover Minimize control
panel.
♦ In the Insight•Discover_3 module, select the Calculate/Mini-
mize command. Toggle More on to access additional controls.
Set the controls as desired and select Execute.
Alternatively, for a simple minimization run, select the Strat-
egy/Simple_Minimize command. Set the controls as desired.
♦ In the Insight•Discover module, select the Parameters/Mini-
mize command. Set the controls as desired and select Execute.
♦ In QUANTA, select the CHARMm/Minimization Options
menu item. Select the desired minimization method and other
options in the CHARMm Minimization Setup dialog box, then
click OK.
Discover and CHARMm offer additional functionality when
run in standalone mode. (How to run Discover and CHARMm
in standalone mode is documented separately—see Additional
information.)

Specifying output
4. Specify the desired output:


♦ In Cerius2•OFF, select the Output menu item on the MINI-
MIZER card.
♦ In the Cerius2•MMFF module, click the Output… button in the
MMFF Energy Minimization control panel or select Output on
the MMFF card.
♦ In Cerius2•Discover, set the Task popup in the Run Discover
control panel to Minimization. Then click the Output… push-
button to access the Discover Minimize Output control panel.

♦ In the Insight•Discover_3 module, use the Analyze/Output
command (it may not be accessible until after you have executed
the Calculate/Minimize command). Set the controls as
desired and select Execute.
♦ In the Insight•Discover module, use the Run/Report and/or
Run/Files commands. Set the controls as desired and select
Execute.
♦ In QUANTA, limited control over output is available via the
CHARMm/Initialization Options menu item.
Discover and CHARMm offer additional functionality when
run in standalone mode.

Starting a minimization run
Finally:

5. Start the minimization run:
♦ In Cerius2•OFF, click the Minimize the Energy action button in
the Energy Minimization control panel.
♦ In the Cerius2•MMFF module, click the MINIMIZE button in
the MMFF Energy Minimization control panel.
♦ In Cerius2•Discover, set the Task popup in the Run Discover
control panel to Minimization. Then click the RUN pushbut-
ton.
♦ In the Insight•Discover_3 module, execute the D_Run/Run
command.
Alternatively, for a simple minimization run, select Execute in
the Strategy/Simple_Minimize command.
♦ In the Insight•Discover module, select the Run/Run command.
Set the controls as desired, being sure that Run_Minimization
is on and Run_Dynamics is off, and select Execute.
♦ In QUANTA, go to the Modeling window and click CHARMm
Minimization.
Specific information
For specific information on setting up and running minimizations
with the various MSI simulation engines and on examining the
results, please see the relevant documentation (see Available docu-
mentation).


When to use different algorithms


The default minimizers in Discover and OFF use a cascade of
appropriate minimization algorithms in sequence. However, you
may want to exercise more control over your simulation.
Model size and distance from the minimum
The choice of which algorithm to use depends on two factors: the
size of the model and its current state of optimization. The conjugate
gradient and steepest descents methods can be used with
models of any size. Most Newton–Raphson methods cannot be
used with very large models, because they need sufficient disk
space to store a second-derivative matrix. (The ABNR method
does not store a large second-derivative matrix.)
Until the derivatives are well below 100 kcal mol⁻¹ Å⁻¹, it is likely
that the point is sufficiently distant from a minimum that the
energy surface is far from quadratic. Algorithms that assume the
energy surface to be quadratic (Newton–Raphson, quasi-Newton–
Raphson, conjugate gradient) can be unstable when the model is
far from the quadratic limit. The Newton–Raphson method is par-
ticularly sensitive because it must invert the Hessian matrix.
Therefore, as a general rule, steepest descents is often the best min-
imizer to use for the first 10–100 steps, after which the conjugate
gradient and/or a Newton–Raphson minimizer can be used to
complete the minimization to convergence. The truncated New-
ton–Raphson minimizers are often the best Newton–Raphson
methods for most applications.
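
The general rule above can be summarized as a small decision helper. This is only a sketch of the reasoning, not the cascade used by the Discover or OFF default minimizers: the function name and all three thresholds (the derivative below which the surface is treated as near-quadratic, the minimum number of steepest-descents steps, and the model size above which a second-derivative matrix is considered too large) are hypothetical values chosen for illustration.

```python
def choose_minimizer(max_derivative, n_atoms, steps_done=0,
                     switch_derivative=10.0, min_sd_steps=10,
                     newton_atom_limit=200):
    """Suggest an algorithm following the general rule in the text.

    All three thresholds are illustrative assumptions, not program defaults.
    max_derivative is in kcal/mol/angstrom.
    """
    # Far from the minimum the surface is not quadratic: stay with the robust method.
    if max_derivative >= switch_derivative or steps_done < min_sd_steps:
        return "steepest descents"
    # Large models cannot afford to store a second-derivative matrix.
    if n_atoms > newton_atom_limit:
        return "conjugate gradient"
    return "Newton-Raphson"
```

For example, `choose_minimizer(500.0, 50)` suggests steepest descents for a badly strained small model, while the same model after 100 steps with a maximum derivative of 0.05 would be passed to a Newton–Raphson minimizer.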
Starting structures and choice of forcefield
For highly distorted structures, the presence of cross terms and
Morse bond potentials in the forcefield can cause convergence
problems. These functional forms can produce either small restor-
ing forces or, more seriously, minima at nonphysical points on the
potential energy surface.
Thus, in addition to using steepest descents for such distorted
structures, you also ought to use a forcefield with a simple, qua-
dratic functional form. How to choose and set up forcefields is cov-
ered in Preparing the Energy Expression and the Model.
Conjugate gradient vs. Newton–Raphson and disk space
Several practical aspects of the conjugate gradient method are
worth mentioning. First, the conjugate gradient algorithm requires
convergence along each line search before continuing in the next
direction. The gradient at step i+1 must be perpendicular to $h_i$ or
the derivation guaranteeing a conjugate set of directions breaks
down. Second, to start conjugate gradients, an initial direction $h_0$
must be chosen that is equal to the initial gradient. Finally, addi-
tional storage is required for an extra vector of N elements to hold
the N components of the old gradient. For energy minimization in
Cartesian space this would be the 3N derivatives of the energy
with respect to the x, y, and z coordinates of each atom. This makes
conjugate gradient the method of choice for systems that are too
large for storing and manipulating a second-derivative matrix, as
is required by the Newton–Raphson minimizers.
The general memory requirements of all the minimizers are listed
in Table 15.

Table 15. General storage requirements of minimization algorithms

algorithm           variant           memory needed for                                     scales as (a)
steepest descents                     first derivatives                                     3N
conjugate gradient  Polak–Ribiere     first derivatives, gradient from previous iteration   3N
                    Fletcher–Reeves   first derivatives, gradient from previous iteration   3N
                    Powell            first derivatives, gradient from previous iteration   3N
Newton–Raphson      full, iterative   Hessian, eigenvectors                                 (3N)²
                    BFGS              first derivatives, Hessian update, scratch vectors    (3N)²
                    DFP               first derivatives, Hessian update, scratch vectors    (3N)²
                    truncated         Hessian                                               (3N)²
                    ABNR              first derivatives                                     3N

(a) N = number of atoms (number of degrees of freedom).

Failure to converge
Also, note that the derivation invokes a quadratic approximation.
For nonharmonic systems, the conjugate gradient method can
exhaustively minimize along the conjugate directions without
converging. This condition indicates that the minimizer may have
gotten stuck at a saddle point. If this occurs, you can restart the
algorithm. Several minimizations may be required. For a detailed
discussion of this algorithm, see the excellent text by Press et al.
(1986) or the somewhat more formal treatment by Fletcher (1980).


Convergence criteria
Mathematical definition
In the literature a wide variety of criteria have been used to judge
minimization convergence in molecular modeling. Mathemati-
cally, a minimum is defined as the point at which the derivatives
of the function are zero and the second-derivative matrix is posi-
tive definite. Nongradient minimizers can use only the increment
in the energy and/or coordinates as criteria. In gradient minimiz-
ers, derivatives are available analytically and should be used
directly to assess convergence.
Application to chemical models
In a molecular minimization, the atomic derivatives may be summarized
as an average, a root-mean-square (rms) value, or the
largest value. The average, of course, must be an average of the
absolute values of the derivatives, because the distribution of
derivatives is symmetric about zero. An rms derivative is a better
measure than the average, because it weights larger derivatives
more, and it is therefore less likely that a few large derivatives
would escape detection, which can occur with simple averages.
Regardless of whether you choose to report convergence in terms
of the average or rms values of the derivatives, you should always
check that the maximum derivative is not unreasonable. There can
be no ambiguity about the quality of the minimum if all deriva-
tives are less than a given value.
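
The three summaries can be computed in a few lines. The sketch below is illustrative: the function names and sample values are hypothetical, and it assumes the statistics are taken over the individual Cartesian components rather than over per-atom gradient magnitudes.

```python
import math

def derivative_summary(gradients):
    """gradients: list of (dE/dx, dE/dy, dE/dz) tuples, one per atom.
    Returns (average of absolute values, rms value, largest absolute value)."""
    components = [abs(c) for atom in gradients for c in atom]
    average = sum(components) / len(components)
    rms = math.sqrt(sum(c * c for c in components) / len(components))
    return average, rms, max(components)

def is_converged(gradients, max_allowed):
    """Unambiguous test: every derivative must be below max_allowed."""
    return derivative_summary(gradients)[2] < max_allowed

# Hypothetical model: two relaxed atoms and one atom with large residual derivatives
sample = [(0.01, -0.02, 0.0), (0.0, 0.01, -0.01), (3.0, -4.0, 0.0)]
average, rms, largest = derivative_summary(sample)
```

In this sample the single strained atom raises the rms value well above the average, and the maximum derivative exposes it directly.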
How close to absolute convergence is good enough?
The more difficult question is, What value of the average or rms
derivative constitutes convergence? The specific value depends on
the objective of the minimization. If you simply want to relax overlapping
atoms before beginning a dynamics run, minimizing to a
maximum derivative of 1.0 kcal mol⁻¹ Å⁻¹ is usually sufficient.
However, to perform a normal mode analysis, the maximum
derivative must be less than 10⁻⁵, or the frequencies may be shifted
by several wavenumbers.
Local or global minimum?
There is no guarantee that the minimum you find is necessarily a
global minimum.
Small models can be minimized to a global minimum. However,
multiple minimizations from different starting conformations
should be run to confirm that a global minimum has indeed been
found.

Larger models can often be minimized to several different conformations
that a molecule might assume at 0 K. A global minimum
may never be found for large models, because of the complexity of
the potential energy surface. However, these many minima can be
useful in understanding the molecule’s conformational “space”.
Please see also Failure to converge.
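
The value of running several minimizations from different starting points can be shown on a one-dimensional double-well function. Everything in this sketch is hypothetical (the function, the fixed step size, the tolerance, and the starting points); it uses a simple fixed-step downhill walk rather than any of the algorithms described above.

```python
def energy(x):
    return x**4 - 3.0 * x**2 + x         # hypothetical double-well "energy surface"

def gradient(x):
    return 4.0 * x**3 - 6.0 * x + 1.0

def local_minimum(x, step=0.01, tolerance=1e-8, max_iterations=10000):
    """Walk downhill with a fixed step until the derivative is below tolerance."""
    for _ in range(max_iterations):
        g = gradient(x)
        if abs(g) < tolerance:
            break
        x -= step * g
    return x

starts = [-2.0, -0.5, 0.5, 2.0]
# Round so that runs ending in the same well are counted once
minima = sorted({round(local_minimum(x0), 4) for x0 in starts})
best = min(minima, key=energy)
```

Each start ends in whichever well it began closest to; only by comparing the energies of all the minima found can the lower well be identified.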
Other termination criteria
You generally also set a maximum number of iterations, so that
runs that do not converge will nevertheless end within a reasonable
amount of time. That is, the run ends when either the convergence
criteria or the maximum number of iterations is reached, whichever
occurs first.
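
The "whichever occurs first" logic amounts to a loop with two exits. The following sketch is written in Python, not BTCL, and its function name, test function, step size, and limits are hypothetical.

```python
def minimize(gradient, x, step=0.1, tolerance=1e-3, max_iterations=1000):
    """Fixed-step downhill walk on a 1-D function.
    Returns (x, iterations used, reason the run ended)."""
    for iteration in range(max_iterations):
        g = gradient(x)
        if abs(g) < tolerance:           # convergence criterion reached first
            return x, iteration, "converged"
        x -= step * g
    return x, max_iterations, "iteration limit"   # iteration cap reached first

grad = lambda x: 2.0 * (x - 3.0)         # E(x) = (x - 3)^2, hypothetical test function
done = minimize(grad, 0.0)
cut_short = minimize(grad, 0.0, max_iterations=5)
```

Reporting the reason for termination alongside the result makes it clear whether the final structure can be treated as minimized.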
The BTCL language can be used to access the Discover Minimize
database during a minimization run, so as to implement sophisti-
cated customized stopping strategies.

Significance of minimum-energy structure


Calculated energy is relative
In dealing with macromolecular optimization calculations, it is
important to keep in mind the theoretical significance of the minimum-energy
structure and its calculated energy. For all forcefields
used in calculations of this type, the energy zero is arbitrary, and
therefore, the total potential energy of different models cannot be
compared directly. However, it is meaningful to make compari-
sons of energies calculated for different configurations of chemi-
cally identical models. In principle, the calculated energy of a fully
minimized structure is the classical enthalpy at absolute zero,
ignoring quantum effects (in particular the zero-point vibrational
motion). For a model that is sufficiently small that its normal
modes can be calculated, quantum corrections for zero-point
energy and the free energy at higher temperatures can be taken
into account (Hagler et al. 1979c).
Precautions in ligand-binding calculations
The minimized energies calculated for enzyme–substrate complexes
can be used to estimate relative binding enthalpies, but
there are two caveats:
♦ First, for a meaningful comparison of the relative binding of
two different substrates, a complete thermodynamic cycle must
be considered (Kirkwood 1935, Quirke and Jacucci 1982, Tembe
and McCammon 1984, Mezei and Beveridge 1986).

In practical terms, this means that an enthalpy calculation must
be made for the various substrates in water. Where the relative
binding of two different enzymes to the same substrate is calcu-
lated, the energy of each enzyme with solvent in the binding
site must be calculated.
Entropy is usually neglected
♦ A second consideration for using minimization results to estimate
relative binding strengths is that the entropy is neglected
in such calculations. Direct calculation of entropy differences is
a computationally intensive process, and only recently has it
been taken into account correctly by calculations of relative free
energies (Hagler et al. 1979c, Hwang and Warshel 1987,
Warshel et al. 1986, Singh et al. 1987, Straatsma et al. 1986).
The extent of the errors introduced by neglecting entropic con-
tributions in the simpler minimization calculations is difficult
to estimate, although, as with the zero-point energy, the
entropy can be estimated for a model small enough that its nor-
mal mode frequencies can be calculated (Hagler et al. 1979c).
What is the purpose of your calculation?
The relative importance of these fundamental considerations
depends on the objective of the calculation. When studying the relative
binding in an enzyme active site of two substrates, one of
which is flexible and the other rigid, entropic effects may be crucial
for obtaining even qualitative agreement with experimental bind-
ing constants. On the other hand, if a putative compound overlaps
sterically with many active-site atoms and causes hundreds of
kilocalories of strain energy even in a minimized structure, the
compound can be rejected confidently.
The bottom line is that physical-chemistry common sense cannot
be abandoned when you are setting up a calculation and interpret-
ing the results.

Energy and gradient calculation


An energy calculation is essentially just a zero-iteration minimiza-
tion. It is used to calculate the energy of the current model struc-
ture without changing any atom positions.
It is often useful to obtain this information before performing other
operations such as running an energy minimization, performing
molecular dynamics, or doing a conformational search. Then
results obtained using different methods can be compared with
the initial values.
A gradient calculation is used to calculate the forces on the current
model structure’s atoms without changing any atom positions.
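As a minimal sketch of what a single-point energy and gradient evaluation involves, the following Python fragment evaluates one harmonic bond term and its analytic gradient without moving either atom. The function name, the force constant, and the reference bond length are hypothetical values chosen for illustration; they are not taken from any MSI forcefield.

```python
import math

def bond_energy_and_gradient(r1, r2, k=300.0, r0=1.09):
    """Single-point energy and gradient of one harmonic bond, E = k (r - r0)^2.

    k (kcal mol^-1 A^-2) and r0 (A) are illustrative values only.
    The input positions are read but never modified.
    """
    d = [a - b for a, b in zip(r1, r2)]
    r = math.sqrt(sum(x * x for x in d))
    energy = k * (r - r0) ** 2
    dE_dr = 2.0 * k * (r - r0)
    grad1 = [dE_dr * x / r for x in d]   # dE/d(r1)
    grad2 = [-g for g in grad1]          # dE/d(r2), equal and opposite
    return energy, grad1, grad2

atom1 = (0.0, 0.0, 0.0)
atom2 = (1.20, 0.0, 0.0)                 # bond stretched beyond r0
energy, grad1, grad2 = bond_energy_and_gradient(atom1, atom2)
force1 = [-g for g in grad1]             # the force is minus the gradient
force2 = [-g for g in grad2]
print("energy:", energy)
print("force on atom 1:", force1)
print("force on atom 2:", force2)
```

Because the bond is stretched, the forces point toward each other, and the coordinates are unchanged after the call, which is the defining property of a zero-iteration calculation.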
Specifying single-point energy and gradient calculations
To specify single-point energy and gradient calculations rather
than a minimization (Minimizations with MSI simulation engines):
♦ In Cerius2•OFF, click the Calculate the Current Energy action
button in the Energy Minimization control panel (the gradient
is also calculated).
♦ In the Cerius2•MMFF module, click the Calculate Current
Energy button in the MMFF Energy Minimization control
panel.
♦ In Cerius2•Discover, set the Task popup in the Run Discover
control panel to Single Point Energy or to Gradient.
♦ In the Insight•Discover_3 module, select the Calculate/Mini-
mize command. Set Iterations to 0 and select Execute.
♦ In the Insight•Discover module, select the Parameters/Mini-
mize command. Set Max Steps to 0 and select Execute.
♦ In QUANTA, go to the Modeling window and click CHARMm
Energy.
Discover and CHARMm offer additional functionality when
run in standalone mode.

Vibrational calculation
This section includes:
Application of minimization to vibrational theory
Vibrational frequencies
General methodology for vibrational calculations


Harmonic vibrational frequencies obtained from an equilibrium geometry
The vibrational frequencies and modes of a molecule are strictly
dynamic properties. However, it is possible to calculate the harmonic
vibrational frequencies of a model from just information at
its equilibrium geometry by expanding the potential energy sur-
face as a Taylor series, truncating after the second term, and con-
sidering infinitesimal displacements. This harmonic
approximation usually gives a good description of the true fre-
quencies and normal modes and can be valuable for tasks ranging
from evaluating the quality of a forcefield to understanding vibra-
tional shifts induced by conformational changes or other interac-
tions. The harmonic vibrational frequencies also can be used for
zero-point vibrational corrections and for deriving vibrational free
energy contributions. These effects can be important in comparing
conformational energies and rotational barrier heights.
Uses of vibrational calculations
Beyond these considerations, the vibrational frequencies can be
used for two classes of problems:
♦ The first is for determining the shape of the potential energy
surface; that is, the characterization of stable points as minima,
transition states, or other points. For this purpose, the question
of forcefield accuracy is less important. The qualitative, rather
than quantitative, shape of the surface is all that is important.
♦ The second use is for comparison with experimental results.
The vibrational frequencies and thermodynamic corrections
depend strongly on the forcefield as well as on the fundamental
harmonic approximation invoked. By nature, low-frequency
modes are less harmonic. The torsion and nonbond interac-
tions, which dominate low-frequency modes, are fundamen-
tally anharmonic; hence the interpretation of the calculated
low-frequency modes should take this into account. Unfortu-
nately, these low-frequency modes make the largest contribu-
tions to the vibrational entropy.

Application of minimization to vibrational theory


Kinetic energy of nuclei
Following Wilson et al. (1980), the kinetic energy of the nuclei is:

$$T = \frac{1}{2} \sum_{i=1}^{N} m_i \left[ \left( \frac{d\,\Delta x_i}{dt} \right)^2 + \left( \frac{d\,\Delta y_i}{dt} \right)^2 + \left( \frac{d\,\Delta z_i}{dt} \right)^2 \right] \qquad \text{(Eq. 70)}$$

where the coordinates represent displacements from an equilibrium
structure. If the 3N Cartesian coordinates are replaced with
3N mass-weighted coordinates as follows:

$$q_i = \sqrt{m_\alpha}\, \Delta x_\alpha \qquad \text{(Eq. 71)}$$

Simplified kinetic energy of nuclei
where mα is the mass of the atom associated with the α coordinate,
and ∆xα runs over the y and z coordinates, as well as x, then the
kinetic energy has the following simple form:

$$T = \frac{1}{2} \sum_{i=1}^{3N} \left( \frac{dq_i}{dt} \right)^2 \qquad \text{(Eq. 72)}$$

The second term in a Taylor series is needed to obtain vibrational information
When the potential energy of the system is expanded as a Taylor
series in the same coordinates, it yields:

$$V = V_0 + \sum_{i=1}^{3N} \left( \frac{\partial V}{\partial q_i} \right)_0 q_i + \frac{1}{2} \sum_{i,j=1}^{3N} \left( \frac{\partial^2 V}{\partial q_i \partial q_j} \right)_0 q_i q_j + \cdots \qquad \text{(Eq. 73)}$$

V0 is simply a constant; the energy scale can be chosen so that
V0 = 0. The definition of an equilibrium structure is that the force
on each atom is zero, so the second (first-derivative) term in Eq. 73
is also zero, leaving the following second-order approximation of the
potential energy:

$$V = \frac{1}{2} \sum_{i,j=1}^{3N} \left( \frac{\partial^2 V}{\partial q_i \partial q_j} \right)_0 q_i q_j \qquad \text{(Eq. 74)}$$


Combining energy and motion and solving for the mass-weighted coordinates
Using this approximation of the potential energy in Newton’s
equations of motion (Eq. 4) yields the following simultaneous
second-order differential equations:

$$\frac{d^2 q_i}{dt^2} + \sum_{j=1}^{3N} \left( \frac{\partial^2 V}{\partial q_i \partial q_j} \right)_0 q_j = 0, \qquad i = 1, 2, \ldots, 3N \qquad \text{(Eq. 75)}$$

The solution to these equations can be of the form:

$$q_i = A_i \cos\left( \lambda^{1/2} t + \delta \right) \qquad \text{(Eq. 76)}$$

Numerically solvable form of the function
where the Ai are related to the relative amplitudes of the vibrational
motion, λ^(1/2) is proportional to the vibrational frequency, and
δ is a phase. Substituting Eq. 76 in Eq. 75 yields a set of algebraic
equations:

$$\sum_{i=1}^{3N} \left[ \left( \frac{\partial^2 V}{\partial q_i \partial q_j} \right)_0 - \delta_{ij} \lambda \right] A_i = 0, \qquad j = 1, 2, \ldots, 3N \qquad \text{(Eq. 77)}$$

where δij is a Kronecker delta, which equals one if i = j and zero
otherwise. This is an eigenvalue problem that is readily solved
numerically by standard techniques. The second derivatives of the
potential energy, often called the force constants, can be analytically
evaluated for most energy expressions used in molecular mechan-
ics, in terms of the Cartesian coordinates of the atoms. A simple
transformation to the mass-weighted coordinates then gives the
values needed in Eq. 77.
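The step from Eq. 75 to Eq. 77 can be written out explicitly. The following is a sketch of the intermediate algebra, using only the symbols already defined above: differentiate the trial solution of Eq. 76 twice, substitute into Eq. 75, and divide out the common cosine factor.

```latex
\begin{align}
\frac{d^2 q_i}{dt^2}
  &= -\lambda A_i \cos\left(\lambda^{1/2} t + \delta\right)
  && \text{(second time derivative of Eq. 76)} \\
-\lambda A_i \cos\left(\lambda^{1/2} t + \delta\right)
  + \sum_{j=1}^{3N}
    \left( \frac{\partial^2 V}{\partial q_i \partial q_j} \right)_0
    A_j \cos\left(\lambda^{1/2} t + \delta\right)
  &= 0
  && \text{(substituted into Eq. 75)} \\
\sum_{j=1}^{3N}
  \left[ \left( \frac{\partial^2 V}{\partial q_i \partial q_j} \right)_0
         - \delta_{ij} \lambda \right] A_j
  &= 0
  && \text{(cosine factor divided out)}
\end{align}
```

Because the second-derivative matrix is symmetric, the roles of i and j can be exchanged, which gives the form shown in Eq. 77.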

Vibrational frequencies
Eigenvalues converted to vibrational frequencies
The simulation engine determines vibrational frequencies by calculating
the second derivative matrix, mass weighting it, and then
diagonalizing it to obtain the eigenvalues. These eigenvalues are
then converted to vibrational frequencies in wavenumbers as follows:

$$\nu_i = F \sqrt{\lambda_i} \qquad \text{(Eq. 78)}$$

Evaluating the quality of the calculation
where the conversion factor F converts the units from kcal mol-1 to
wavenumbers. Of the 3N coordinates used to calculate the energy
and vibrational frequencies, six correspond to net translations and
rotations of the model (five for linear systems). These six modes
have no restoring force and therefore have vibrational frequencies
of zero for a minimized structure. Due to numerical inaccuracies,
the simulation engine reports these frequencies with small values,
which are typically less than 0.1 cm-1. If the structure is not per-
fectly minimized, the first-order terms in the Taylor expansion of
the potential surface in Eq. 73 do not vanish. In turn, they intro-
duce terms that couple the net rotations of the model with the
internal motions and both perturb the internal vibrational fre-
quencies and give apparent frequencies for the three rotations.
Thus, the magnitude of the six “zero” frequencies is a good indica-
tion of the quality of the calculation.
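The procedure described above (build the second-derivative matrix, mass weight it, diagonalize, convert) can be illustrated on the smallest possible case. The sketch below treats a one-dimensional diatomic with a single harmonic bond, so the mass-weighted matrix is 2 × 2 and its eigenvalues have a closed form. It assumes force constants in kcal mol^-1 Å^-2 and masses in amu, and it computes the conversion factor F from physical constants under that assumption; the force constant of 700 is an arbitrary illustrative value, not a forcefield parameter.

```python
import math

# Physical constants (SI) used to build the conversion factor F.
KCAL_TO_J = 4184.0
AVOGADRO = 6.02214076e23
AMU_TO_KG = 1.66053907e-27
C_CM_PER_S = 2.99792458e10

# Assumed eigenvalue units: kcal mol^-1 A^-2 amu^-1. Convert to s^-2 first.
_TO_PER_S2 = KCAL_TO_J / AVOGADRO / 1e-20 / AMU_TO_KG
F = math.sqrt(_TO_PER_S2) / (2.0 * math.pi * C_CM_PER_S)

def diatomic_frequencies(k, m1, m2):
    """Frequencies (cm^-1) of a 1D diatomic with V = (k/2)(x2 - x1 - r0)^2."""
    # Cartesian second-derivative matrix is [[k, -k], [-k, k]]; mass weight it.
    a = k / m1
    d = k / m2
    b = -k / math.sqrt(m1 * m2)
    # Closed-form eigenvalues of the symmetric 2x2 matrix [[a, b], [b, d]].
    mean = 0.5 * (a + d)
    spread = math.sqrt((0.5 * (a - d)) ** 2 + b * b)
    eigenvalues = [mean - spread, mean + spread]
    # Negative eigenvalues (imaginary frequencies) are reported as negative.
    return [math.copysign(F * math.sqrt(abs(lam)), lam) for lam in eigenvalues]

freqs = diatomic_frequencies(k=700.0, m1=1.008, m2=1.008)
print("conversion factor F:", F)
print("translation-like mode (cm^-1):", freqs[0])
print("stretch (cm^-1):", freqs[1])
```

The first frequency corresponds to the net translation and comes out at essentially zero, while the second is the bond stretch. In a real 3N-dimensional calculation the same pattern appears as six (or five) near-zero values followed by the internal modes.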

Transition states
The model must be optimized before doing a vibration calculation
For calculating vibrational frequencies, the model must be optimized
so that the gradients are zero. This does not mean that the
configuration of the model must be at a minimum, but rather that
it must be at a stationary point on the surface. If the structure is not at
a minimum, but rather is at a saddle point or transition state in one
or more directions, this is reflected in the eigenvalues and the
reported vibrational frequencies. For a saddle point, at least one
eigenvalue is negative, which means that the curvature of the surface
along at least one normal mode is negative. By convention, the
imaginary frequencies for such modes are reported as negative.
Therefore, the reported vibrational frequencies describe the character
of the stationary point on the surface. If all frequencies are real, it
is a minimum; if one frequency is imaginary, the structure is at a
simple transition state; if two or more frequencies are imaginary, a
double or more complicated transition state is indicated. The normal
modes corresponding to the frequencies can be analyzed to
understand the reaction path going through such a transition state.
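The classification rules above can be expressed as a short function. This is a sketch only: the function name and return strings are hypothetical, it assumes that imaginary frequencies are reported as negative numbers, and it uses the 0.1 cm-1 figure quoted earlier as a tolerance for the rigid-body modes.

```python
def classify_stationary_point(frequencies_cm1, n_rigid_body=6, zero_tol=0.1):
    """Classify a stationary point from reported frequencies (cm^-1).

    The n_rigid_body frequencies of smallest magnitude are treated as the
    net translations and rotations (use 5 for a linear model).
    """
    ordered = sorted(frequencies_cm1, key=abs)
    rigid = ordered[:n_rigid_body]
    internal = ordered[n_rigid_body:]
    if any(abs(f) > zero_tol for f in rigid):
        return "not converged: rigid-body frequencies are not near zero"
    n_imaginary = sum(1 for f in internal if f < 0.0)
    if n_imaginary == 0:
        return "minimum"
    if n_imaginary == 1:
        return "simple transition state"
    return "higher-order saddle point (%d imaginary modes)" % n_imaginary

# Illustrative frequency lists (not results from any simulation engine).
well_minimized = [0.01, -0.02, 0.0, 0.03, 0.05, -0.04, 1595.0, 3657.0, 3756.0]
one_imaginary = [0.0] * 6 + [-1200.0, 900.0, 3000.0]
print(classify_stationary_point(well_minimized))
print(classify_stationary_point(one_imaginary))
```

Checking the rigid-body modes first mirrors the advice in the text: if the "zero" frequencies are not close to zero, the structure is not minimized well enough for the remaining frequencies to be interpreted.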


Thermodynamics
Quantum mechanical solution…
The quantum mechanical solution for the vibrational energy of a
set of uncoupled harmonic oscillators, which corresponds to the
classical treatment outlined above, is:

$$E_{\text{vib}} = \sum_i \left( n_i + \frac{1}{2} \right) h \nu_i \qquad \text{(Eq. 79)}$$

…used to correct the vibrational energy determined with a forcefield
where the summation is over all the vibrational frequencies, ni is
the vibrational quantum number for each vibration, h is Planck’s
constant, and νi is the vibrational frequency. This leads to the
following correction to the classical forcefield energy:

$$E_{\text{vib}} = \sum_i \left[ \frac{1}{2} + \frac{1}{e^{h\nu_i/kT} - 1} \right] h \nu_i \qquad \text{(Eq. 80)}$$

Free energy correction
where k is the Boltzmann constant and T is the temperature. The
first term, hνi ⁄ 2, is the zero-point correction; the second term
corrects for the average thermal population of vibrational levels at the
temperature T. This leads to a vibrational free energy correction of:

$$A_{\text{vib}} = \sum_i \left[ \frac{h\nu_i}{2} + kT \ln\left( 1 - e^{-h\nu_i/kT} \right) \right] \qquad \text{(Eq. 81)}$$

Vibrational entropy
and a vibrational entropy of:

$$S_{\text{vib}} = \frac{E_{\text{vib}} - A_{\text{vib}}}{T} \qquad \text{(Eq. 82)}$$
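The corrections in Eqs. 80 to 82 are straightforward to evaluate once a list of harmonic frequencies is available. The sketch below assumes the frequencies are supplied in wavenumbers (so that hν = hc × wavenumber) and reports energies in kcal mol^-1; the function name and the two test frequencies are hypothetical.

```python
import math

H_PLANCK = 6.62607015e-34      # J s
K_BOLTZMANN = 1.380649e-23     # J K^-1
C_CM_PER_S = 2.99792458e10     # cm s^-1
AVOGADRO = 6.02214076e23
J_TO_KCAL_PER_MOL = AVOGADRO / 4184.0

def vibrational_thermo(wavenumbers_cm1, temperature):
    """Return (E_zpe, E_vib, A_vib, S_vib) following Eqs. 80-82.

    Energies are in kcal mol^-1 and the entropy in kcal mol^-1 K^-1.
    Only real, nonzero frequencies should be passed in.
    """
    kT = K_BOLTZMANN * temperature
    e_zpe = e_vib = a_vib = 0.0
    for wn in wavenumbers_cm1:
        hv = H_PLANCK * C_CM_PER_S * wn                       # h*nu in joules
        x = hv / kT
        e_zpe += 0.5 * hv                                     # zero-point term
        e_vib += (0.5 + 1.0 / math.expm1(x)) * hv             # Eq. 80
        a_vib += 0.5 * hv + kT * math.log1p(-math.exp(-x))    # Eq. 81
    s_vib = (e_vib - a_vib) / temperature                     # Eq. 82
    scale = J_TO_KCAL_PER_MOL
    return e_zpe * scale, e_vib * scale, a_vib * scale, s_vib * scale

e_zpe, e_vib, a_vib, s_vib = vibrational_thermo([1000.0], 298.15)
s_low = vibrational_thermo([50.0], 298.15)[3]
print("zero-point energy:", e_zpe)
print("E_vib:", e_vib, " A_vib:", a_vib)
print("S_vib at 1000 cm^-1:", s_vib, " at 50 cm^-1:", s_low)
```

Comparing the two entropies shows the point made earlier in this section: a low-frequency mode contributes far more vibrational entropy than a high-frequency one, which is why anharmonicity in the soft modes matters for thermodynamic corrections.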

General methodology for vibrational calculations


Specifying a vibration calculation
To specify a vibration calculation for a model that is already well
minimized (Minimizations with MSI simulation engines):
♦ In Cerius2, use the IR/RAMAN module.


♦ In the Insight•Discover_3 module, select the Calculate/Vibrational
command. Set the controls desired and select Execute.
Vibrational calculations in FDiscover and CHARMm are avail-
able only in standalone mode.
Model must be small and very well minimized
Computation of the harmonic vibrational frequencies requires the
storage and diagonalization of the second-derivative matrix,
which has dimensions of 3N × 3N, where N is the number of atoms.
The work involved in the diagonalization scales as N^3 and quickly
becomes prohibitively expensive for more than a few hundred
atoms. The frequencies are valid only for well minimized models
having maximum derivatives no greater than approximately 0.001
kcal mol-1 Å-1.
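A rough feel for these limits can be had from simple arithmetic. The sketch below assumes the full 3N × 3N matrix is stored in 8-byte double precision with no use of symmetry, and that the diagonalization cost follows pure N^3 scaling relative to a 100-atom reference; both are simplifying assumptions, not statements about how any particular simulation engine stores or diagonalizes the matrix.

```python
def hessian_storage_bytes(n_atoms, bytes_per_value=8):
    """Memory needed for the full 3N x 3N second-derivative matrix."""
    dim = 3 * n_atoms
    return dim * dim * bytes_per_value

def relative_diagonalization_cost(n_atoms, reference_atoms=100):
    """Cost relative to a reference model, assuming pure N^3 scaling."""
    return (n_atoms / reference_atoms) ** 3

for n in (100, 300, 1000):
    megabytes = hessian_storage_bytes(n) / 1e6
    cost = relative_diagonalization_cost(n)
    print(n, "atoms:", megabytes, "MB of storage,", cost, "times the reference cost")
```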
Use a forcefield that was designed for vibrational calculations
The forcefield is a major consideration. Most forcefield development
has emphasized structures and energetics rather than vibrational
frequencies. As a result, the frequencies calculated with
forcefields (see Forcefields) such as AMBER, MM2, CHARMm, and,
to a lesser extent, CVFF, may often be in error by several hundred
wavenumbers. The inclusion of cross terms such as bond–bond
and bond–angle terms is crucial for the accurate reproduction of
experimental frequencies. The CVFF forcefield includes such cross
terms and was, in part, parameterized to reproduce experimental
frequencies, which explains its moderately good performance for
vibrational calculations. The second-generation forcefields MM3
and CFF were explicitly designed to evenly weight vibrational fre-
quencies as well as structural and energetic properties. Therefore,
they provide the most reasonable and consistent results, usually
within 50–100 cm-1. This error of up to approximately 100 cm-1
appears to be the current limit of general-purpose, transferable
forcefields.



5 Molecular Dynamics

While minimization computes the forces on the atoms and
changes their positions to minimize the interaction energies,
dynamics computes forces and moves atoms in response to the
forces.
Molecular dynamics solves the classical equations of motion for a
system of N atoms interacting according to a potential energy
forcefield as described in Forcefields. Dynamics simulations are
useful in studies of the time evolution of a variety of systems at
nonzero temperatures, for example, biological molecules, poly-
mers, or catalytic materials, in a variety of states, for example, crys-
tals, aqueous solutions, or in the gas phase.
This chapter explains
To perform the most reasonable and realistic dynamics simulations,
you should read this entire chapter, which includes information
on:
Integration algorithms
The choice of timestep
Integration errors
Statistical ensembles
Temperature
Pressure and stress
Types of dynamics simulations
Constraints during dynamics simulations
Dynamics trajectories
General methodology for dynamics calculations

Related information
Forcefields and Preparing the Energy Expression and the Model focus
on the representation of the potential energy surface and the useful
ways that it can be biased through the addition of restraints and
constraints, as well as other information on preparing the model
system for calculations.
Use of simulation engines for calculating normal modes is found
under Vibrational calculation.
Specific information
For specific information on setting up and running dynamics calculations
with the various MSI simulation engines, please see the
relevant documentation (see Available documentation).

Table 16. Finding information in Chapter 5

If you want to know about: | Read:
Repeating a dynamics simulation. | Defined initial coordinates and random initial velocities.
Integrators used by MSI simulation engines. | Table 17.
Thermodynamic ensembles handled by MSI simulation engines. | Table 19.
Obtaining thermodynamic properties of a model. | Equilibrium thermodynamic properties.
Temperature-control methods used by MSI simulation engines. | Table 20.
Producing true canonical ensembles. | Nosé and Nosé–Hoover dynamics.
Pressure- and stress-control methods used by MSI simulation engines. | Table 21.
Types of dynamics simulations readily set up with MSI simulation engines. | Table 22.
Dynamics runs with periodic minimization. | Quenched dynamics.
Controlled temperature change during dynamics. | Simulated annealing.
Finding a common configuration of related models. | Consensus dynamics.
Impulse dynamics. | Impulse dynamics.
Simulated dynamics in a viscous fluid. | Langevin dynamics.
Dynamics of localized parts of a model. | Stochastic boundary dynamics.
SHAKE and RATTLE algorithms. | Constraints during dynamics simulations.
Equilibration and data-collection stages of a dynamics simulation. | Stages and duration of dynamics simulations.
How long is long enough for a dynamics simulation. | Has equilibrium been achieved?; How long should the simulation be?
Setting up a dynamics simulation. | Dynamics with MSI simulation engines.
Continuing or restarting a dynamics run. | Restarting a dynamics simulation.
Using a previous dynamics run to start a new simulation. | Restarting a dynamics simulation.

Some uses of dynamics calculations
The major applications of molecular dynamics are:
♦ Performing conformational searches.
During dynamics simulations, a system undergoes conforma-
tional and momentum changes so that different parts of the
phase space accessible to the model can be explored. The con-
formational search capability of dynamics is one of its most
important uses.
♦ Generating statistical ensembles.
By providing several mechanisms for controlling the tempera-
ture and pressure of simulated systems, molecular dynamics
allows you to generate statistical ensembles from which vari-
ous energetic, thermodynamic, structural, and dynamic prop-
erties can be calculated. For such studies, it is important that the
calculation visit various conformational states with the correct
statistical frequency.
♦ Studying the motions of molecules.
Although modern crystallography has provided a window into
the static structure of molecules both small and large, the
thought of intermolecular collisions and conformational varia-
tion is always present. After all, binding of substrates by pro-
teins, folding of proteins and peptides into unique shapes, the
dynamic behavior of polymers, and chemical reactions them-
selves would be inconceivable without the concept of molecu-
lar motion.
Studies of model motions can be used to derive properties such
as diffusion coefficients.
Non-dynamics approaches are also used
Other approaches to simulating molecular motion and performing
conformational searches exist.
For example, a dynamics trajectory can be constructed from a set
of normal modes to represent the vibrations of a model. While this
is a fast method, it is restricted to harmonic motion about a single
energy minimum.
An approach to doing conformational searches is the Monte Carlo
method. While this method can sample conformational space so as
to produce meaningful statistical ensembles, it does not provide
dynamic information about the model, since particles of the model
system are simply moved randomly according to some statistical
rules.

Integration algorithms
This section includes:
Introduction
Criteria of good integrators in molecular dynamics
Integrators in MSI simulation engines

Introduction
Newton’s equation of motion applied to atoms
At its simplest, molecular dynamics solves Newton’s familiar
equation of motion:

$$F_i(t) = m_i a_i(t) \qquad \text{(Eq. 83)}$$

where Fi is the force, mi is the mass, and ai is the acceleration of
atom i.
The force on atom i can be computed directly from the derivative
of the potential energy V with respect to the coordinates ri:

$$-\frac{\partial V}{\partial r_i} = m_i \frac{\partial^2 r_i}{\partial t^2} \qquad \text{(Eq. 84)}$$

What is a trajectory?
Notice that classical equations of motion are deterministic. That is,
once the initial coordinates and velocities are known, the coordinates
and velocities at a later time can be determined. The coordinates
and velocities for a complete dynamics run are called the
trajectory. (However, trajectories are sensitive to initial conditions,
so the same simulation run with a different simulation engine or
on a different computer does not produce an identical trajectory.)
The finite-difference method
A standard method of solving an ordinary differential equation
such as Eq. 84 numerically is the finite-difference method. The
general idea is as follows. Given the initial coordinates and veloc-
ities and other dynamic information at time t, the positions and
velocities at time t + ∆t are calculated. The timestep ∆t depends on
the integration method as well as the system itself.
Defined initial coordinates and random initial velocities
Although the initial coordinates are determined in the input file or
from a previous operation such as minimization, the initial velocities
are randomly generated at the beginning of a dynamics run,
according to the desired temperature. Therefore, dynamics runs
cannot be repeated exactly, except for forcefield engines (CHARMm
standalone, Discover) that allow you to set the random number
seed to the value that was used in a previous run.
More details on the initial velocities are provided under Tempera-
ture.
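The role of the random number seed can be illustrated with a small sketch. It assumes that each Cartesian velocity component is drawn from a Gaussian of width sqrt(kT/m), which is the usual Maxwell-Boltzmann prescription; the function name, the seed values, and the methane-like list of masses are hypothetical, and the code does not reproduce the velocity generator of any MSI simulation engine.

```python
import math
import random

K_BOLTZMANN = 1.380649e-23     # J K^-1
AMU_TO_KG = 1.66053907e-27

def initial_velocities(masses_amu, temperature, seed):
    """Draw Cartesian velocities (m s^-1) from a Maxwell-Boltzmann distribution.

    Each component is Gaussian with standard deviation sqrt(kT/m).
    Reusing the same seed reproduces exactly the same velocities.
    """
    rng = random.Random(seed)
    velocities = []
    for m in masses_amu:
        sigma = math.sqrt(K_BOLTZMANN * temperature / (m * AMU_TO_KG))
        velocities.append(tuple(rng.gauss(0.0, sigma) for _ in range(3)))
    return velocities

masses = [12.011, 1.008, 1.008, 1.008, 1.008]   # methane-like model
run_a = initial_velocities(masses, 300.0, seed=12345)
run_b = initial_velocities(masses, 300.0, seed=12345)
run_c = initial_velocities(masses, 300.0, seed=54321)
print("same seed gives identical velocities:", run_a == run_b)
print("different seed gives identical velocities:", run_a == run_c)
```

With the same seed and the same starting coordinates the run begins from an identical state; with a different seed the starting velocities, and therefore the trajectory, differ.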

Criteria of good integrators in molecular dynamics


Molecular dynamics is usually applied to a large model. Energy
evaluation is time consuming and the memory requirement is
large. To generate the correct statistical ensembles, energy conser-
vation is also important.
Thus, the basic criteria for a good integrator for molecular simula-
tions are as follows:
♦ It should be fast, ideally requiring only one energy evaluation
per timestep.
♦ It should require little computer memory.
♦ It should permit the use of a relatively long timestep.
♦ It must show good conservation of energy.


Integrators in MSI simulation engines


Integrators provided in MSI simulation engines were chosen
according to the above criteria. Only dynamics algorithms used in
MSI’s simulation engines (Table 17) are considered here:
Verlet leapfrog integrator
Verlet velocity integrator
ABM4 integrator
Runge–Kutta-4 integrator

Table 17. Dynamics integrators used by MSI simulation engines

simulation engine
integrator
CHARMm Discover OFF
Verlet leapfrog √ √a

Verlet velocity √ √b
ABM4 √b
Runge–Kutta–4 √b
a FDiscover only, not in CDiscover.
b CDiscover only, not in FDiscover.

Choosing the dynamics algorithm(s)
To specify the dynamics integrator:
♦ The Cerius2•Dynamics Simulation module always uses the
Verlet leapfrog integrator.
♦ In the Cerius2•Discover and Insight•Discover_3 modules, the
default integrator is the Verlet velocity method. If you really
want to change it, you can write out the command input file,
edit the BTCL dynamics command statement with a text editor,
and then use that file for your run.
Alternatively (in Insight•Discover_3), select the Language_
Control/Command_Comment command. Set the Comment
Type to Command. Enter integration_method = ABM4 or
integration_method = Runge_Kutta in the Command/Comment
entry box and select Execute. Be sure that you insert this
stage at the correct point in your command input file.
♦ The Insight•Discover module always uses the Verlet leapfrog
integrator.
♦ CHARMm allows you to choose between the Verlet leapfrog
and velocity methods only when run in standalone mode.

Verlet leapfrog integrator


Advantages of Verlet methods
Variants of the Verlet (1967) algorithm for integrating the equations
of motion (Eq. 84) are perhaps the most widely used methods in
molecular dynamics. The advantages of Verlet integrators are that
these methods require only one energy evaluation per step,
require only modest memory, and also allow a relatively large
timestep to be used.
The leapfrog algorithm
The Verlet leapfrog algorithm is as follows:
Given r(t), v(t – ∆t/2), and a(t), which are (respectively) the
position, velocity, and acceleration at times t, t – ∆t/2, and t,
compute:

$$v\left(t + \tfrac{1}{2}\Delta t\right) = v\left(t - \tfrac{1}{2}\Delta t\right) + \Delta t\, a(t)$$

$$r(t + \Delta t) = r(t) + \Delta t\, v\left(t + \tfrac{1}{2}\Delta t\right)$$

$$a(t + \Delta t) = \frac{f(t + \Delta t)}{m}$$

where f(t + ∆t) is evaluated from -dV/dr at r(t + ∆t).


Disadvantage of Verlet leapfrog method
The Verlet leapfrog method has one major disadvantage: the positions
and velocities calculated are half a timestep out of synchrony.
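The three leapfrog update steps can be written as a short loop. The sketch below applies them to a hypothetical one-dimensional test system (a unit-mass harmonic oscillator with unit force constant, in reduced units), for which the exact solution r(t) = cos(t) is known. The function name and the test system are illustrative assumptions, not part of any MSI simulation engine.

```python
import math

def leapfrog(r, v_half, accel, dt, n_steps):
    """Verlet leapfrog: positions at whole steps, velocities at half steps.

    r is r(t), v_half is v(t - dt/2), and accel(r) returns a = f/m.
    Returns the positions r(t + n dt) and the half-step velocities,
    where half_velocities[n] is v(t + (n - 1/2) dt).
    """
    a = accel(r)
    positions, half_velocities = [r], [v_half]
    for _ in range(n_steps):
        v_half = v_half + dt * a      # v(t + dt/2)
        r = r + dt * v_half           # r(t + dt)
        a = accel(r)                  # a(t + dt): one force evaluation per step
        positions.append(r)
        half_velocities.append(v_half)
    return positions, half_velocities

dt = 0.05
n_steps = 126                          # roughly one period (2*pi) of the oscillator
positions, half_velocities = leapfrog(
    r=1.0, v_half=math.sin(0.5 * dt), accel=lambda r: -r, dt=dt, n_steps=n_steps
)
max_error = max(abs(p - math.cos(i * dt)) for i, p in enumerate(positions))
# On-step velocity estimate: average the half-step velocities on either side.
v_on_step = [0.5 * (a + b) for a, b in zip(half_velocities[:-1], half_velocities[1:])]
print("largest position error over one period:", max_error)
print("estimated v(0):", v_on_step[0])
```

The last few lines show one common way to work around the half-step offset: a velocity at a whole timestep is estimated by averaging the two neighbouring half-step velocities, at the cost of keeping both in memory.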

Verlet velocity integrator


The Verlet velocity algorithm overcomes the out-of-synchrony
shortcoming of the Verlet leapfrog method. The Verlet velocity
algorithm is as follows:
The velocity algorithm
Given r(t), v(t), and a(t), which are (respectively) the position,
velocity, and acceleration at time t, compute:

$$r(t + \Delta t) = r(t) + \Delta t\, v(t) + \frac{\Delta t^2\, a(t)}{2}$$

$$a(t + \Delta t) = \frac{f(t + \Delta t)}{m}$$

$$v(t + \Delta t) = v(t) + \frac{1}{2} \Delta t \left[ a(t) + a(t + \Delta t) \right]$$
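A sketch of the velocity algorithm in the same style follows. Positions and velocities are now available at the same instant, so the total energy can be monitored directly at every step. The test system is again a hypothetical unit-mass harmonic oscillator in reduced units, and the function names are illustrative.

```python
def velocity_verlet(r, v, accel, dt, n_steps):
    """Verlet velocity integrator; returns a list of (r, v) pairs at whole steps."""
    a = accel(r)
    trajectory = [(r, v)]
    for _ in range(n_steps):
        r = r + dt * v + 0.5 * dt * dt * a     # r(t + dt)
        a_new = accel(r)                       # a(t + dt): one force evaluation
        v = v + 0.5 * dt * (a + a_new)         # v(t + dt)
        a = a_new
        trajectory.append((r, v))
    return trajectory

def total_energy(r, v):
    """Kinetic plus potential energy of the unit-mass, unit-force-constant oscillator."""
    return 0.5 * v * v + 0.5 * r * r

trajectory = velocity_verlet(r=1.0, v=0.0, accel=lambda r: -r, dt=0.05, n_steps=2000)
e0 = total_energy(*trajectory[0])
max_drift = max(abs(total_energy(r, v) - e0) for r, v in trajectory)
print("initial energy:", e0)
print("largest deviation of the total energy:", max_drift)
```

For this test system the total energy oscillates within a narrow band rather than drifting, which is the kind of energy conservation listed among the criteria for a good integrator.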

ABM4 integrator
ABM4, which stands for Adams–Bashforth–Moulton fourth order,
is a predictor–corrector method. It is a fourth-order method,
meaning that the truncation error of each step is of fifth order in the
timestep used.
This method requires two energy evaluations per step and has to
make use of the results of the previous three steps. It is thus not self
starting—the first three steps are generated by the Runge–Kutta
method. More memory has to be used, because previous informa-
tion has to be stored.
ABM4’s algorithm
The algorithm is as follows:
Let:

$$y = \{ r(t),\ v(t) \} \quad \text{and} \quad y' = \{ v(t),\ a(t) \}$$

where subscripts (not shown above) 0, 1, -1, -2, and -3 indicate
y at times t, t + ∆t, t - ∆t, t - 2∆t, and t - 3∆t.
Predictor step
The predictor is:

$$y_1^{\text{predicted}} = y_0 + \frac{\Delta t}{24} \left( 55 y_0' - 59 y_{-1}' + 37 y_{-2}' - 9 y_{-3}' \right) + O(\Delta t^5)$$

Now evaluate y1′, using y1predicted, which involves one energy
evaluation.
Corrector step
The corrector is:

$$y_1^{\text{corrected}} = y_0 + \frac{\Delta t}{24} \left( 9 y_1' + 19 y_0' - 5 y_{-1}' + y_{-2}' \right) + O(\Delta t^5)$$

Now evaluate y1′, using y1corrected, which involves another
energy evaluation.
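The predictor and corrector formulas above, together with a Runge–Kutta start-up for the first three steps, can be sketched as follows. The state y is held as a (position, velocity) pair, and the classical fourth-order Runge–Kutta formula is used for the start-up; the helper names and the harmonic-oscillator test system (reduced units) are assumptions made for illustration.

```python
import math

def derivative(y, accel):
    """y = (r, v)  ->  y' = (v, a)."""
    r, v = y
    return (v, accel(r))

def axpy(y, dt, *terms):
    """Return y + dt * sum(c * d) over the (c, d) pairs, componentwise."""
    return tuple(yi + dt * sum(c * d[i] for c, d in terms) for i, yi in enumerate(y))

def rk4_step(y, accel, dt):
    """One classical fourth-order Runge-Kutta step (four evaluations)."""
    k1 = derivative(y, accel)
    k2 = derivative(axpy(y, dt, (0.5, k1)), accel)
    k3 = derivative(axpy(y, dt, (0.5, k2)), accel)
    k4 = derivative(axpy(y, dt, (1.0, k3)), accel)
    return axpy(y, dt / 6.0, (1.0, k1), (2.0, k2), (2.0, k3), (1.0, k4))

def abm4(y0, accel, dt, n_steps):
    """Adams-Bashforth-Moulton fourth-order predictor-corrector integration."""
    ys = [y0]
    ds = [derivative(y0, accel)]
    for _ in range(min(3, n_steps)):                 # not self starting: RK4 start-up
        ys.append(rk4_step(ys[-1], accel, dt))
        ds.append(derivative(ys[-1], accel))
    for _ in range(n_steps - 3):
        predicted = axpy(ys[-1], dt / 24.0, (55.0, ds[-1]), (-59.0, ds[-2]),
                         (37.0, ds[-3]), (-9.0, ds[-4]))
        d_pred = derivative(predicted, accel)        # first evaluation of the step
        corrected = axpy(ys[-1], dt / 24.0, (9.0, d_pred), (19.0, ds[-1]),
                         (-5.0, ds[-2]), (1.0, ds[-3]))
        ys.append(corrected)
        ds.append(derivative(corrected, accel))      # second evaluation of the step
    return ys

states = abm4((1.0, 0.0), lambda r: -r, dt=0.05, n_steps=200)
r_end, v_end = states[-1]
print("position error at t = 10:", abs(r_end - math.cos(10.0)))
print("velocity error at t = 10:", abs(v_end + math.sin(10.0)))
```

The list of stored derivatives makes the memory cost visible: the three previous derivative evaluations must be kept alongside the current one, which is the extra storage mentioned in the description of the method.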


Runge–Kutta-4 integrator
Robust, but disadvantages
Runge–Kutta-4 stands for the fourth-order Runge–Kutta method,
which is one of the oldest numerical methods for solving ordinary
differential equations. The method is self starting but requires four
energy evaluations per step.
From testing done at MSI, the timestep has to be very small. This
is thus not a very suitable integrator for molecular simulation.
However, the method is very robust, meaning that it can deal with
almost all kinds of equations, including stiff ones. This integrator
is used to generate the trajectory for the first three steps for ABM4.
Since we do not recommend using this integrator, the algorithm is
not presented here. Details can be found in Press et al. 1986.

The choice of timestep


A key parameter in the integration algorithms is the integration
timestep ∆t. To make the best use of the computer time, a large
timestep should be used. However, too large a timestep causes
instability and inaccuracy in the integration process.
Relation of timestep to molecular vibration
The timestep used depends on the model as well as the integrators.
The main limitation imposed by the model is the highest-frequency
motion that must be considered. A vibrational period must
be split into at least 8–10 segments for models to satisfy the Verlet
assumption that the velocities and accelerations are constant over
the timestep used.
In most organic models, the highest vibrational frequency is that
of C–H bond stretching, whose period is on the order of 10^-14 s (10
fs). The integration timestep should therefore be about 0.5–1 fs. If
you use the SHAKE or RATTLE constraint algorithm (Constraints
during dynamics simulations), a longer timestep is possible.
If you are studying simple model liquids or solids and are not
interested in internal modes, much longer timesteps may be used,
e.g., up to 20 fs. A timestep of about 5 fs should be adequate for
ionic material models.
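The rule of thumb above (split the fastest vibrational period into roughly 8 to 10 segments) can be turned into a quick estimate. The sketch assumes the highest frequency is known as a wavenumber, so that the period is 1/(c × wavenumber); the value of 3000 cm-1 used for a C–H stretch is a typical figure assumed for illustration, and the function names are hypothetical.

```python
C_CM_PER_S = 2.99792458e10     # speed of light, cm s^-1

def vibrational_period_fs(wavenumber_cm1):
    """Period of a vibration, in femtoseconds, from its wavenumber."""
    return 1.0e15 / (C_CM_PER_S * wavenumber_cm1)

def suggested_timestep_fs(highest_wavenumber_cm1, segments_per_period=10):
    """Timestep that splits the fastest vibration into the given number of segments."""
    return vibrational_period_fs(highest_wavenumber_cm1) / segments_per_period

period = vibrational_period_fs(3000.0)          # assumed C-H stretch wavenumber
dt_10 = suggested_timestep_fs(3000.0, 10)
dt_8 = suggested_timestep_fs(3000.0, 8)
print("C-H stretch period (fs):", period)
print("timestep for 10 segments (fs):", dt_10)
print("timestep for 8 segments (fs):", dt_8)
```

The period comes out at about 11 fs, consistent with the order of 10 fs quoted above, and dividing it into 10 segments gives a timestep near the upper end of the recommended 0.5–1 fs range.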


Appropriate for the integrator
The timestep must also be appropriate to the integrator. For the
ABM4 method, the timestep should be about half that needed for
the Verlet algorithm. The Runge–Kutta-4 method seems to require
a much smaller timestep than the other methods.
Setting the timestep
To specify the length of the timestep:
♦ In Cerius2•OFF, go to the DYNAMICS SIMULATION card in
the OFF METHODS deck of cards. Click the Run menu item to
access the Dynamics Simulation control panel. Enter the
desired time step (in ps) in the Dynamics Time Step entry box.
♦ In Cerius2•Discover, click the Run menu item in the DIS-
COVER card to access the Run Discover control panel. Set the
Task popup to Dynamics and click the More… pushbutton to
the right of the Task popup to open the Discover Dynamics
control panel. Enter values for two of the Total time, Steps, and
Time step entry boxes (the third value is computed from the
other two) in the Equilibration and Production sections of the
latter control panel.
♦ In the Insight•Discover_3 module, select the Calculate/
Dynamics command. Toggle More on to access additional con-
trols, and change the value in the Time Step fs entry box.
♦ In the Insight•Discover module, select the Parameters/Dynam-
ics command. Change the value (in fs) in the Time Step entry
box.
♦ In QUANTA, choose the CHARMm/Dynamics Options menu
item. You can change the Time Step in any of the setup dialogs
that are accessed by clicking a radio button and clicking OK.

Integration errors
If the chosen timestep is too small, no harm is done, except for the
waste of computer time. However, if the timestep is too large for
the calculation conditions, the simulation can “blow up”.
Two examples illustrate how the timestep, temperature, and inte-
gration algorithm affect the results. The first example is a simula-
tion of the collision of two hydrogen atoms travelling towards
each other. The second is an examination of energy conservation in
a simple harmonic oscillator when different integrators and
timesteps are used.

Example 1—Two colliding hydrogen atoms


The stability of the numerical integration with respect to the time
step can be tested directly by integrating (over distance) the forces
used by dynamics and comparing the integral with the analytical
energy. The error in this integral as a function of time step is an
indication of the intrinsic limitations of molecular dynamics.
Timestep slightly too large
To illustrate this, consider the collision of two atoms. Figure 26
plots the true potential energy of the van der Waals potential
between two hydrogens along with the energy integrated numerically
from the forces. The time step used is 1 fs at a temperature of
300 K. The temperature is set by assigning an initial velocity of
1500 m s-1 (0.015 Å fs-1) to one of the hydrogens along the vector
connecting them. This velocity is the most probable velocity of a
hydrogen atom at 300 K.
As the two atoms approach each other, the integration agrees well
with the analytic curve. However, after the atoms collide, the inte-
grated energy is significantly higher than the true energy. This
behavior is due to the atom moving too quickly through a rapidly
changing energy function. When the atoms are far apart, the
energy change is smallest and therefore, the forces are smallest and
most linear. As the atoms approach, they speed up and take larger
and larger steps, until they reach their highest velocity at the
energy minimum. Unfortunately, this is precisely where the forces
start changing the fastest. Thus, when the atoms should be taking
large steps (far apart), they are taking the smallest steps, and when
they should take small steps (near the minimum), they take the
largest steps. The consequence is that the particles “step through”
the energy barrier momentarily. Of course, once a new force is cal-
culated at the extrapolated coordinate, the trajectory is rapidly cor-
rected. However, it is too late for the energy integral—some
energy has been gained.
In this example, the total energy rose by about 0.02 kcal mol^-1.
Whether this is a reasonable error depends on how closely the
exact motions need to be reproduced.
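The test described in this example can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the FDiscover calculation: the 12-6 parameters `EPS` and `RMIN`, the starting separation, the use of velocity Verlet, and the trapezoidal integration of the force over distance are all choices made here, and every function name is hypothetical. The sketch runs a head-on collision in the relative coordinate and reports the largest gap between the energy integrated from the forces and the analytic energy.

```python
KCAL = 4.184e-4  # 1 kcal/mol expressed in amu Å^2 fs^-2

# Hypothetical 12-6 parameters chosen for illustration (not from the manual).
EPS = 0.02    # well depth, kcal/mol
RMIN = 3.0    # separation at the energy minimum, Å
MASS = 1.008  # mass of one hydrogen atom, amu


def energy(r):
    """Analytic 12-6 potential energy (kcal/mol) at separation r (Å)."""
    x = (RMIN / r) ** 6
    return EPS * (x * x - 2.0 * x)


def force(r):
    """Force along the separation coordinate, -dU/dr (kcal/mol/Å)."""
    x = (RMIN / r) ** 6
    return 12.0 * EPS * (x * x - x) / r


def max_integration_error(dt, r0=5.0, v0=-0.015, max_steps=200000):
    """Collide two atoms head-on with velocity Verlet (timestep dt in fs) and
    return the largest difference between the energy integrated from the
    forces and the analytic energy along the trajectory (kcal/mol)."""
    mu = MASS / 2.0                 # reduced mass of the pair
    r, v = r0, v0                   # relative separation (Å) and velocity (Å/fs)
    f = force(r)
    integrated = energy(r0)         # start the integral on the analytic curve
    worst = 0.0
    for _ in range(max_steps):
        a = f * KCAL / mu
        r_new = r + v * dt + 0.5 * a * dt * dt
        f_new = force(r_new)
        v += 0.5 * (a + f_new * KCAL / mu) * dt
        # Trapezoidal estimate of -∫F dr over this step.
        integrated -= 0.5 * (f + f_new) * (r_new - r)
        r, f = r_new, f_new
        worst = max(worst, abs(integrated - energy(r)))
        if r > r0:                  # the atoms have separated again
            break
    return worst


if __name__ == "__main__":
    for dt in (1.0, 0.33, 0.1):
        print(dt, max_integration_error(dt))
```

Because the parameters are invented, the absolute numbers should not be compared with Figure 26; only the trend with timestep is meaningful.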

[Figure 26: energy (kcal mol^-1) versus H–H distance (Å); 300 K, 1 fs timestep]
Figure 26. Numerical integration of energy from molecular dynamics
hydrogen-collision trajectory, 1 fs timestep
The integrated energy calculated numerically from a dynamics trajectory of
two colliding hydrogen atoms (circles) is compared with the analytical energy
curve (thick line). Simulation done with FDiscover.

Shorter timesteps mean more computational cost
Figure 27 shows the same curve as Figure 26, but with timesteps of
0.33 and 0.10 fs. Both give better results than 1.0 fs, but it is not
clear whether the extra time required to calculate the smaller
timesteps is worthwhile.
A very large timestep leads to artifactual behavior
The consequences of too large a time step are much more dramatic.
Doubling the time step from 1 to 2 fs results in an unusual artifact
for the hydrogen system (Figure 28). In this case, the integration
error affects not only the potential energy, but the kinetic energy as
well. Momentum is removed, so that the hydrogens no longer
have the velocity needed to escape after the collision. The atoms
are trapped forever (barring an inverse error that could impart
momentum). The atoms now vibrate back and forth (only 2 cycles
are plotted in Figure 28), and each cycle incurs an additional
integration error.
An even higher timestep can cause the system to explode
Increasing the time step by another factor of 2, to 4 fs (Figure 29),
finally causes the system to “blow up”. The timestep is so long that
the atoms deeply interpenetrate each other’s steeply repulsive
wall between steps. The resulting force is now so large that the
atoms fly off at a speed of about 1 Å fs^-1 (or 10^5 m s^-1). An
equivalent temperature would be hundreds of thousands of degrees.

[Figure 27: two panels of energy (kcal mol^-1) versus H–H distance (Å); 300 K, 0.33 fs timestep and 300 K, 0.1 fs timestep]
Figure 27. Numerical integration of energy from molecular dynamics
hydrogen-collision trajectory, 0.33 and 0.1 fs timesteps
Energy integration errors decrease with smaller time steps. Compared to 0.016
kcal error with 1 fs time steps, the 0.33 fs time step has a 0.006 error, and the 0.1
fs time step a 0.001 error. The cost for increased accuracy is the computational
burden to compute more steps. For most simulations, a 1-fs time step is a good
compromise between numerical accuracy and computational efficiency.
Simulation done with FDiscover.
[Figure 28: energy (kcal mol^-1) versus H–H distance (Å); 300 K, 2 fs timestep]
Figure 28. Numerical integration of energy from molecular dynamics
hydrogen-collision trajectory, 2 fs timestep
A 2-fs time step causes the rebounding force to be underestimated, robbing the
colliding Hs of sufficient escape velocity and resulting in the two Hs being
“bound” by their van der Waals forces. Simulation done with FDiscover.

[Figure 29: energy (kcal mol^-1) versus H–H distance (Å); 300 K, 4 fs timestep]
Figure 29. Numerical integration of energy from molecular dynamics
hydrogen-collision trajectory, 4 fs timestep
With a time step of 4 fs, the Hs travel too far in a single step, interpenetrating
each other’s van der Waals radii before the forces are recalculated. By this time,
the forces are so large that the Hs are flung apart at a temperature equivalent
to hundreds of thousands of degrees. Simulation done with FDiscover.

Effect of temperature
To complete the analysis of integration errors, it is instructive to
compare the effects of increasing kinetic energy on the stability of
the numerical integration. Figure 30 shows that there is essentially
no difference in the error between 300 and 1200 K. Going as high
as 30,000 K still less than doubles the error, indicating that
dynamics simulations are not as sensitive to the temperature as to
the timestep.

[Figure 30: energy (kcal mol^-1) versus H–H distance (Å)]
Figure 30. Integration errors for hydrogen-collision trajectories at several
temperatures
Integration errors observed for an H–H collision for an initial velocity equal to the
mean velocity appropriate for 300 K (squares), 1200 K (circles) and 30,000 K (tri-
angles). Simulation done with FDiscover.

Example 2 — Energy conservation of a harmonic oscillator

Verlet velocity vs. ABM4
To test the conservation of energy in a simulation, 10 ps of
molecular dynamics was performed on a harmonic oscillator having an
equilibrium length of 0.75 Å and period of 7.5 fs. As can be seen in
Table 18, the Verlet velocity method can use a larger timestep than
the ABM4 method. The Verlet velocity algorithm starts to show
instability only at 1 fs, whereas ABM4 already starts to fail at 0.25 fs.

For the Verlet velocity integrator, the standard deviation in the
total energy is proportional to ∆t^2, as predicted by the theory. This
is a simple verification that the integrator has been implemented
correctly.
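The ∆t^2 scaling can be checked with a short script. The sketch below is a minimal stand-in for the test described here, not the CDiscover run: it integrates a one-dimensional harmonic oscillator with velocity Verlet in arbitrary self-consistent units, and the mass, starting displacement, and function name are assumptions made for illustration. Only the 7.5 fs period is taken from the text.

```python
import math


def energy_std(dt, period=7.5, total_time=10000.0, x0=0.1, mass=1.0):
    """Standard deviation of the total energy of a 1-D harmonic oscillator
    integrated with velocity Verlet. x is the displacement from the
    equilibrium length; units are arbitrary but self-consistent."""
    omega = 2.0 * math.pi / period
    k = mass * omega ** 2            # force constant giving the chosen period
    x, v = x0, 0.0
    a = -k * x / mass
    energies = []
    for _ in range(int(round(total_time / dt))):
        x += v * dt + 0.5 * a * dt * dt      # position update
        a_new = -k * x / mass                # force at the new position
        v += 0.5 * (a + a_new) * dt          # velocity update
        a = a_new
        energies.append(0.5 * mass * v * v + 0.5 * k * x * x)
    mean = sum(energies) / len(energies)
    variance = sum((e - mean) ** 2 for e in energies) / len(energies)
    return math.sqrt(variance)


if __name__ == "__main__":
    for dt in (1.0, 0.5, 0.25):
        print(dt, energy_std(dt))
```

Halving the timestep should reduce the reported standard deviation by a factor of about four, which is the behavior the Verlet velocity rows of Table 18 show.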

Table 18. Energy conservation for different timesteps with the Verlet
velocity and ABM4 integrators
Run conditions: 10,000 fs, constant-energy (NVE), harmonic oscillator, initial
energy 0.296 kcal mol^-1, CDiscover 94.0.

integrator        timestep   final energy    average energy   standard deviation
                  (fs)       (kcal mol^-1)   (kcal mol^-1)    (kcal mol^-1)
Verlet velocity   1.0        0.327           0.325            0.021
Verlet velocity   0.5        0.296           0.302            0.005
Verlet velocity   0.25       0.298           0.298            0.001
ABM4              0.25       0.693           0.467            0.114
ABM4              0.10       0.299           0.297            0.001

Statistical ensembles
You can control the temperature and pressure
Integrating Newton’s equations of motion allows you to explore
the constant-energy surface of a system. However, most natural
phenomena occur under conditions where a system is exposed to
external pressure and/or exchanges heat with the environment.
Under these conditions, the total energy of the system is no longer
conserved, and extended forms of molecular dynamics are
required.
Purpose of the calculation
Several methods are available for controlling temperature and
pressure. Depending on which state variables (the energy E,
enthalpy H (i.e., E + PV), number of particles N, pressure P, stress
S, temperature T, and volume V) are kept fixed, different statistical
ensembles can be generated. A variety of structural, energetic, and
dynamic properties can then be calculated from the averages or
the fluctuations of these quantities over the ensemble generated.
Available thermodynamic ensembles
Both isothermal (exchange heat with a temperature bath to maintain
a constant thermodynamic [not kinetic] temperature) and
adiabatic (do not exchange heat) ensembles are available:

Table 19. Thermodynamic ensembles handled by MSI simulation engines

ensemble (a)                                        CHARMm   Discover   OFF
Constant temperature, constant volume (NVT)         √        √          √
Constant temperature, constant pressure (NPT) (b)   √ (c)    √          √
Constant temperature, constant stress (NST) (b)     -        √ (d)      √
Constant energy, constant volume (NVE)              √        √          √
Constant pressure, constant enthalpy (NPH) (b)      -        √ (d)      √
Constant stress, constant enthalpy (NSH) (b)        -        √ (d)      √

(a) In all ensembles, the number of particles is conserved.
(b) Only for periodic systems, because volume is undefined in nonperiodic
    systems. For all space-group symmetries unless otherwise noted.
(c) Only for cubic, orthorhombic, and triclinic unit cells.
(d) CDiscover only, not in FDiscover.

Choosing the thermodynamic ensemble
To access the controls used for specifying the thermodynamic
ensemble:
♦ In Cerius2•OFF, go to the DYNAMICS SIMULATION card on
the OFF METHODS deck of cards. Click the Run card menu
item to access the Dynamics Simulation control panel. Select
the radio button next to the desired ensemble. You can access
additional controls relevant to each ensemble by clicking the
Preferences… button to the right of the ensemble.
♦ In Cerius2•Discover, click the Run menu item on the DIS-
COVER card to access the Run Discover control panel. Set the
Task popup to Dynamics and click the More… pushbutton to
the right of the Task popup. The controls available depend on
what Ensemble and Thermostat you choose and on whether
the current model is periodic or not.
♦ In the Insight•Discover_3 module, select the Calculate/
Dynamics command. Only the ensembles appropriate to your
model system (periodic or nonperiodic) are displayed in the
parameter block. Toggle More on to access additional controls
relevant to the chosen ensemble.

♦ In the Insight•Discover module, select the Parameters/Dynamics
command. Choose between NVT and NPT ensembles by
toggling Constant_Pressure off or on, respectively. The NVE
ensemble is accessible only through DSL commands.
♦ In QUANTA, set up and apply periodic boundary conditions
by choosing the CHARMm/Periodic Boundaries menu item.

NVE ensemble
Some energy drift
The constant-energy, constant-volume ensemble (NVE), also
known as the microcanonical ensemble, is obtained by solving the
standard Newton equation without any temperature and pressure
control. Energy is conserved when this (adiabatic) ensemble is
generated. However, because of rounding and truncation errors
during the integration process, there is always a slight fluctuation
or drift in energy.
When the Verlet leapfrog integrator is used, only r(t) and
v(t – ∆t/2) are known at each timestep. Thus, the potential and kinetic
energies at each timestep are also half a step out of synchrony.
Although the difference between the kinetic energies half a
timestep apart is small, this can also contribute to the fluctuation
in the total energy.
Although the temperature is not controlled during true NVE
dynamics, you might want to use NVE conditions during the
equilibration phase (Stages and duration of dynamics simulations) of
your simulation. For this purpose, Cerius2•Discover and
Cerius2•Dynamics Simulation allow you to hold the temperature
within specified tolerances by periodic scaling of the velocities.
When to use it
True constant-energy conditions (i.e., without temperature control)
are not recommended for equilibration because, without the
energy flow facilitated by temperature control, the desired
temperature cannot be achieved.
However, during the data collection phase, if you are interested in
exploring the constant-energy surface of the conformational space,
or for other reasons do not want the perturbation introduced by
temperature- and pressure-bath coupling, this is a useful ensem-
ble.

Results
The results can be used (Equilibrium thermodynamic properties) to
calculate the thermodynamic response function (Ray 1988).

NVT ensemble
The constant-temperature, constant-volume ensemble (NVT), also
referred to as the canonical ensemble, is obtained by controlling the
thermodynamic temperature. Direct temperature scaling should
be used only during the initialization stage (Stages and duration of
dynamics simulations), since it does not produce a true canonical
ensemble (it is not truly isothermal). Any of the other
temperature-control methods available (How temperature is controlled)
can be used during the data collection phase.
When to use it
This is the appropriate choice when conformational searches of
models are carried out in vacuum without periodic boundary
conditions. (Without periodic boundary conditions, volume, pressure,
and density are not defined and constant-pressure dynamics cannot
be carried out.)
Even when periodic boundary conditions are used, if pressure is
not a significant factor, the constant-temperature, constant-vol-
ume ensemble provides the advantage of less perturbation of the
trajectory, due to the absence of coupling to a pressure bath.

NPT and NST ensembles


Periodic systems
The constant-temperature, constant-pressure ensemble (NPT)
allows control over both the temperature and pressure. The unit
cell vectors are allowed to change, and the pressure is adjusted by
adjusting the volume (i.e., the size and also, in some programs, the
shape of the unit cell). This method applies only to periodic
systems.
The constant-temperature, constant-stress ensemble (NST) is an
extension of the constant-pressure ensemble. In addition to the
hydrostatic pressure which is applied isotropically, the constant-
stress ensemble allows you to control the xx, yy, zz, xy, yz, and zx
components of the stress tensor.
Control of run conditions
Pressure can be controlled by the Berendsen, Andersen, or
Parrinello–Rahman method (How pressure and stress are controlled).
However, only the size, and not the shape, of the unit cell can be
changed with the Berendsen and Andersen methods (Berendsen
method of pressure control and Andersen method of pressure control).
Stress can be controlled by the Parrinello–Rahman method
(Parrinello–Rahman method of pressure and stress control), since it allows
both the cell volume and its shape to change.
Temperature can be controlled by any method available (How tem-
perature is controlled) (except, of course, the temperature scaling
method, since it is not truly isothermal).
When to use it
NPT is the ensemble of choice when the correct pressure, volume,
and densities are important in the simulation. This ensemble can
also be used during equilibration to achieve the desired
temperature and pressure before changing to the constant-volume or
constant-energy ensemble when data collection starts.
Results
The NST ensemble is particularly useful if you want to run a
simulation at incremented tensile loads to study the stress–strain
relationship in polymeric or metallic materials.
If the forcefield being used yields a high pressure at the experi-
mental volume, it may be more realistic to simulate at the experi-
mental pressure rather than the experimental volume. High
simulated pressure is a sign that the system is unduly compressed,
which restricts atomic motions, artefactually slowing down the
dynamic relaxations.

NPH and NSH ensembles


Periodic systems
The constant-pressure, constant-enthalpy ensemble (NPH,
Andersen 1980, see also Andersen method of pressure control) is the
analogue of the constant-volume, constant-energy ensemble, where the
size of the unit cell is allowed to vary. In the constant-pressure (or
-stress), constant-enthalpy ensemble (NPH or NSH, Parrinello and
Rahman 1981, see also Parrinello–Rahman method of pressure and
stress control), both the size and shape of the unit cell are allowed
to vary (meaning that external stress can be applied). These
methods apply only to 3D periodic systems.
Enthalpy H, which is the sum of E and PV, is constant when the
pressure is kept fixed without any temperature control. Although
the temperature is not controlled during true (adiabatic) NPH or
NSH dynamics, you might want to use these conditions during the
equilibration phase (Stages and duration of dynamics simulations) of
your simulation. For this purpose, Cerius2•Discover and
Cerius2•Dynamics Simulation allow you to hold the temperature
within specified tolerances by periodic scaling of the velocities.
Results
The natural response functions (specific heat at constant pressure,
thermal expansion, adiabatic compressibility, and adiabatic
compliance tensor) are obtained (Equilibrium thermodynamic properties)
from the proper statistical fluctuation expressions of kinetic
energy, volume, and strain (Ray 1988).

Equilibrium thermodynamic properties


Precautions
Since the ensembles are artificial constructs, they produce
averages that are consistent with one another when they represent the
same state of the model. Nevertheless, the fluctuations vary in
different ensembles. Some of the fluctuations are related to
thermodynamic derivatives, such as the specific heat or the isothermal
compressibility.

Caution
In practice, obtaining accurate fluctuations to calculate physical
quantities is difficult, and this approach should be used with
caution.

The transformation and relation between different ensembles have
been discussed in greater detail by Allen and Tildesley (1987).
Obtaining equilibrium thermodynamic properties
One of the objectives of molecular dynamics is to obtain the
equilibrium thermodynamic properties of a model. If a microscopic
dynamic variable A takes on values A(t) along a trajectory, then the
following time average:

$$ \langle A \rangle = \lim_{T \to \infty} \frac{1}{T} \int_0^T A(t)\,dt $$    Eq. 85

yields the thermodynamic value for the selected variable. This
dynamic variable can be any function of the coordinates and
momenta of the particles of the model.

Time averaging for first-order properties
Through time averaging, you can calculate the first-order
properties of a system (such as the internal energy, kinetic energy,
pressure, and virial). Similarly, using microscopic expressions in the
form of fluctuations of these first-order properties, you can also
calculate thermodynamic properties of a system. These include the
specific heat, thermal expansion, and bulk modulus.
In the thermodynamic limit, the first-order properties obtained in
one ensemble are equivalent to those obtained in other ensembles
(differences are on the order of 1/N).
Ensemble-dependent second-order properties
However, second-order properties such as specific heats,
compressibilities, and elastic constants differ between ensembles. For
example, the specific heat at constant pressure differs from the
specific heat at constant volume.
Therefore, it is important to use the appropriate ensemble when
performing simulations to obtain these properties.
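As an illustration of the two kinds of property, the sketch below computes a first-order property as a plain time average and a second-order property from fluctuations. The fluctuation formula used, C_V = (⟨E^2⟩ – ⟨E⟩^2) / (k_B T^2), is the standard canonical-ensemble relation and is an assumption brought in here, since this section does not quote the expressions used by the simulation engines. It holds only for energies sampled from an NVT trajectory. The function names and the unit choice (kcal mol^-1) are also assumptions.

```python
import statistics

K_B = 0.0019872041  # Boltzmann constant in kcal mol^-1 K^-1


def time_average(samples):
    """Discrete estimate of a time average: the mean of equally spaced
    samples A(t) taken along the trajectory."""
    return statistics.mean(samples)


def heat_capacity_nvt(total_energies, temperature):
    """Constant-volume heat capacity (kcal mol^-1 K^-1) from the variance of
    the total energy sampled in a canonical (NVT) trajectory."""
    return statistics.pvariance(total_energies) / (K_B * temperature ** 2)
```

Applying the same variance formula to energies from an NPT or NVE run would not give C_V, which is the practical meaning of the ensemble dependence described above.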

Temperature
This section includes:
How temperature is calculated
How temperature is controlled

Relation of temperature and velocity
Temperature is a state variable that specifies the thermodynamic
state of the system and is also an important concept in dynamics
simulations. This macroscopic quantity is related to the
microscopic description of simulations through the kinetic energy,
which is calculated from the atomic velocities.
The temperature and the distribution of atomic velocities in a
system are related through the Maxwell–Boltzmann equation:

$$ f(v)\,dv = \left( \frac{m}{2\pi kT} \right)^{3/2} e^{-\frac{mv^2}{2kT}}\, 4\pi v^2\,dv $$    Eq. 86

This well-known formula expresses the probability f(v) that a
molecule of mass m has a velocity of v when it is at temperature T.
Figure 31 shows this distribution at various temperatures.

[Figure 31: probability (× 1000) versus speed (m s^-1); curves for 100 K, 300 K, 600 K, and 1000 K]
Figure 31. Maxwell–Boltzmann distribution of velocity of water at various
temperatures
Distribution of model velocities at equilibrium as predicted by the Maxwell–Bolt-
zmann equation. The simulation program assigns random initial velocities to a
system of atoms such that the overall distribution of velocities matches a Max-
well–Boltzmann distribution for the desired temperature.

The x, y, z components of the velocities, on the other hand, have
Gaussian distributions:

$$ g(v_x)\,dv_x = \left( \frac{m}{2\pi kT} \right)^{1/2} e^{-\frac{m v_x^2}{2kT}}\,dv_x $$    Eq. 87

How initial velocities are generated
The initial velocities are generated from the Gaussian distribution
of vx, vy, and vz. The Gaussian distribution is generated from a
random number generator and a random number seed.
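A minimal sketch of this procedure, using only the Python standard library, is given below. It draws each Cartesian component from the Gaussian of Eq. 87, whose standard deviation is sqrt(kT/m). The function name, the unit system (amu, Å, fs, kcal mol^-1), and the default seed are assumptions made for illustration; the actual generators used by the simulation engines are not described here.

```python
import math
import random

K_B = 0.0019872041   # Boltzmann constant, kcal mol^-1 K^-1
KCAL = 4.184e-4      # 1 kcal/mol expressed in amu Å^2 fs^-2


def initial_velocities(masses, temperature, seed=12345):
    """Return one (vx, vy, vz) tuple per atom, in Å/fs, with each component
    drawn from a zero-mean Gaussian of width sqrt(kT/m). masses are in amu.
    The same seed always reproduces the same velocities."""
    rng = random.Random(seed)
    velocities = []
    for m in masses:
        sigma = math.sqrt(K_B * temperature * KCAL / m)
        velocities.append(tuple(rng.gauss(0.0, sigma) for _ in range(3)))
    return velocities
```

Because the draw is random, the kinetic temperature of a small system will generally differ somewhat from the requested value; this sketch does not rescale the velocities or remove center-of-mass motion afterwards.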

How temperature is calculated

Temperature is a thermodynamic quantity, which is meaningful
only at equilibrium. It is related to the average kinetic energy of the
system through the equipartition principle. This principle states
that every degree of freedom (either in momenta or in
coordinates), which appears as a squared term in the Hamiltonian, has an
average energy of kT/2 associated with it. This is true for momenta
p_i, which appear as p_i^2/2m in the Hamiltonian.
Relation of kinetic energy, degrees of freedom, and temperature
Hence we have:

$$ \left\langle \sum_{i=1}^{N} \frac{p_i^2}{2m_i} \right\rangle = \langle K \rangle = \frac{N_f k_B T}{2} $$    Eq. 88

The left side of Eq. 88 is also called the average kinetic energy of
the system, N_f is the number of degrees of freedom, and T is the
thermodynamic temperature. In an unrestricted system with N
atoms, N_f is 3N because each atom has three velocity components
(i.e., vx, vy, and vz).
Instantaneous kinetic temperature
It is convenient to define an instantaneous kinetic temperature
function:

$$ T_{instan} = \frac{2K}{N_f k_B} $$    Eq. 89

The thermodynamic temperature
The average of the instantaneous temperature T_instan is the
thermodynamic temperature T.
Temperature is calculated from the total kinetic energy and the
total number of degrees of freedom.
Temperature in nonperiodic systems…
For a nonperiodic system:

$$ \sum_{i=1}^{N} \frac{m_i v_i^2}{2} = \frac{(3N - 6) k_B T}{2} $$    Eq. 90

Six degrees of freedom are subtracted because both the translation
and rotation of the center of mass are ignored.

…and in periodic systems
And for a periodic system:

$$ \sum_{i=1}^{N} \frac{m_i v_i^2}{2} = \frac{(3N - 3) k_B T}{2} $$    Eq. 91

Only the three degrees of freedom corresponding to translational
motion can be ignored, since rotation of a central cell imposes a
torque on its neighboring cells.
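Eqs. 89 to 91 translate directly into code. The following sketch is an illustration only: the function names and the unit system (amu, Å fs^-1, kcal mol^-1) are assumptions, and it presumes that any center-of-mass motion has already been removed from the velocities.

```python
K_B = 0.0019872041   # Boltzmann constant, kcal mol^-1 K^-1
KCAL = 4.184e-4      # 1 kcal/mol expressed in amu Å^2 fs^-2


def kinetic_energy(masses, velocities):
    """Total kinetic energy in kcal/mol (masses in amu, velocities in Å/fs)."""
    total = 0.0
    for m, (vx, vy, vz) in zip(masses, velocities):
        total += 0.5 * m * (vx * vx + vy * vy + vz * vz)
    return total / KCAL


def instantaneous_temperature(masses, velocities, periodic):
    """Instantaneous kinetic temperature (K) from Eq. 89, with 3N - 3 degrees
    of freedom for a periodic system and 3N - 6 for a nonperiodic one.
    Needs at least three atoms in the nonperiodic case."""
    n = len(masses)
    n_f = 3 * n - 3 if periodic else 3 * n - 6
    return 2.0 * kinetic_energy(masses, velocities) / (n_f * K_B)
```

For the same set of velocities, the nonperiodic count gives a slightly higher temperature than the periodic one, because the same kinetic energy is shared among fewer degrees of freedom. The difference becomes negligible for large N.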

How temperature is controlled


Although the initial velocities are generated so as to produce a
Maxwell–Boltzmann distribution at the desired temperature, the
distribution does not remain constant as the simulation continues.
This is especially true when the system does not start at a mini-
mum-energy configuration of the model. This occurs often, since
the model is commonly minimized only enough to eliminate any
hot spots.
During dynamics, kinetic energy is changed to potential energy as
the minimized structure changes to the thermal equilibrium struc-
ture, and the temperature also changes.
Need to control temperature
To maintain the correct temperature, the computed velocities have
to be adjusted appropriately. In addition to maintaining the
desired temperature, the temperature-control mechanism must
produce the correct statistical ensemble. This means that the
probability of occurrence of a certain configuration obeys the laws of
statistical mechanics.
For example, in order for constant-temperature, constant-volume
dynamics to generate the canonical ensemble, P(E) (i.e., the proba-
bility that a configuration with energy E will occur) must be pro-
portional to exp(-E/kBT), also called the Boltzmann factor.
Methods of controlling temperature
Only temperature-control methods used in MSI’s simulation
engines (Table 20) are considered here:
Direct velocity scaling
Berendsen method of temperature-bath coupling
Nosé and Nosé–Hoover dynamics
Andersen method

Table 20. Temperature-control methods used by MSI simulation engines

method                           CHARMm   Discover   OFF
Velocity scaling (a)             √        √          √
Berendsen temperature bath (b)   √        √          √
Nosé                             -        -          √
Nosé–Hoover (c)                  -        √ (d)      √
Andersen                         -        √ (d)      -

(a) Temperature scaling is not generally used to control temperature during a
    simulation, but to quickly change the simulated temperature to the desired
    value. It does not produce the correct statistical ensemble.
(b) Referred to as T_DAMPING in Cerius2•Dynamics Simulation.
(c) Referred to as Nosé in the Cerius2•Discover and Insight•Discover_3
    modules and Hoover in Cerius2•Dynamics Simulation.
(d) CDiscover only, not in FDiscover.

Choosing the temperature-control method(s)
To access the controls used for specifying the temperature-control
method:
♦ In Cerius2, go to the DYNAMICS SIMULATION card on the
OFF METHODS deck of cards. Click the Run card menu item
to access the Dynamics Simulation control panel. You can set
the target temperature for velocity scaling with the Required
Temperature entry box. The Preferences… buttons for each
statistical ensemble and type of dynamics simulation give
access to additional temperature-control methods and parame-
ters.
♦ In Cerius2•Discover, click the Run menu item on the DIS-
COVER card to access the Run Discover control panel. Set the
Task popup to Dynamics and click the More… pushbutton to
the right of the Task popup. Set the Ensemble popup in the Dis-
cover Dynamics control panel to NVT or NPT and select the
Thermostat. Other controls depend on which thermostat you
choose.

♦ In the Insight•Discover_3 module, select the Calculate/
Dynamics command. Controls for setting the temperature
control method are displayed in the parameter block when you
toggle More on and when an ensemble requiring temperature
control is chosen.
♦ In the Insight•Discover module, velocity scaling is automati-
cally used during the equilibration stage (Stages and duration of
dynamics simulations) and the Berendsen method during the
data-collection stage. You can change the relaxation time used
with the Berendsen method by selecting the Parameters/Vari-
ables command, toggling Timtmp on, and entering a value in
the TIM_TMP entry box.
♦ In QUANTA, select the CHARMm/Dynamics Option menu
item. Choose Setup Detailed Dynamics and click OK. You can
also specify how atomic velocities are assigned or scaled.

Direct velocity scaling


Direct velocity scaling is a drastic way to change the velocities of
the atoms so that the target temperature can be exactly matched
whenever the system temperature is higher or lower than the tar-
get by some user-defined amount.

Important
Direct velocity scaling cannot be used to generate realistic
thermodynamic ensembles, since it suppresses the natural
fluctuations of a system.
Implementation
In Discover, the velocities of all atoms are scaled uniformly as
follows:

$$ \left( \frac{v_{new}}{v_{old}} \right)^2 = \frac{T_{target}}{T_{system}} $$    Eq. 92

In the Cerius2•Dynamics Simulation module, the rescale factor is:

$$ \sqrt{ \frac{2}{T_{inst}} \left( T_r - T_{avg} \right) + 1 } $$    Eq. 93

where T_inst = the instantaneous temperature, T_r = the required
temperature, and T_avg = the average kinetic temperature. Rescaling
is done only if the term under the root is positive.
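The Discover form of the scaling (Eq. 92) amounts to multiplying every velocity component by the square root of the temperature ratio. A minimal sketch follows; the function name and the way the user-defined tolerance is passed in are assumptions made for illustration, not the Discover interface.

```python
import math


def scale_velocities(velocities, t_system, t_target, tolerance):
    """Uniformly rescale velocities per Eq. 92 when the system temperature
    differs from the target by more than the tolerance (all in K).
    velocities is a list of (vx, vy, vz) tuples."""
    if abs(t_system - t_target) <= tolerance:
        return list(velocities)          # inside the window: leave untouched
    factor = math.sqrt(t_target / t_system)
    return [tuple(factor * c for c in v) for v in velocities]
```

Since the kinetic energy depends on the square of the velocities, scaling by this factor brings the instantaneous temperature exactly to the target in a single step, which is why the method is described as drastic.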
CHARMm allows you to assign velocities based on values in a
comparison coordinate file or to assign either a uniform or a Gaus-
sian distribution of velocities to the atoms. The Gaussian distribu-
tion is recommended.
Achieving equilibrium
Direct temperature scaling adds energy to (or subtracts it from) the
system efficiently, but it is important to recognize that the
fundamental limitation to achieving equilibrium is how rapidly energy can
be transferred to, from, and among the various internal degrees of
freedom of the model. The speed of this process depends on the
energy expression, the parameters, and the nature of the coupling
between the vibrational, rotational, and translational modes. It
also depends directly on the size of the system, with larger systems
taking longer to equilibrate.

Berendsen method of temperature-bath coupling


After equilibration, a more gentle exchange of thermal energy
between the system and a heat bath can be introduced through the
Berendsen et al. (1984) method (referred to as temperature damp-
ing in Cerius2•Dynamics Simulation), in which each velocity is
multiplied by a factor λ given by:

$$ \lambda = \left[ 1 - \frac{\Delta t}{\tau} \left( \frac{T - T_0}{T} \right) \right]^{1/2} $$    Eq. 94

where ∆t is the timestep size, τ is a characteristic relaxation time,
T0 is the target temperature, and T the instantaneous temperature.
To a good approximation, this treatment gives a constant-temper-
ature ensemble that can be controlled, both by adjusting the target
temperature T0 and by changing the relaxation time τ (generally
between 0.1 and 0.4 ps).
This is a simple approach that does not use Hamiltonians.
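The effect of Eq. 94 is easy to see numerically. In the sketch below (function names are assumptions), scaling every velocity by λ multiplies the kinetic temperature by λ^2, so each step moves T toward T0 by the fraction ∆t/τ of the current difference. The helper `relax` applies only the thermostat and ignores the dynamics of the system itself, so it shows the idealized exponential relaxation rather than a real trajectory.

```python
import math


def berendsen_factor(t_instant, t_target, dt, tau):
    """Velocity scale factor lambda of Eq. 94 (dt and tau in the same units)."""
    return math.sqrt(1.0 - (dt / tau) * (t_instant - t_target) / t_instant)


def relax(t_start, t_target, dt, tau, n_steps):
    """Temperature after n_steps of thermostat scaling alone."""
    t = t_start
    for _ in range(n_steps):
        lam = berendsen_factor(t, t_target, dt, tau)
        t *= lam * lam               # kinetic temperature scales as lambda^2
    return t


if __name__ == "__main__":
    # 1 fs timestep, 0.1 ps relaxation time, starting 30 K above the target.
    print(relax(330.0, 300.0, dt=1.0, tau=100.0, n_steps=100))
```

A longer relaxation time τ gives a gentler coupling, and a shorter one approaches direct velocity scaling.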

Nosé and Nosé–Hoover dynamics


Produces true canonical ensembles…
Nosé dynamics (Nosé 1984a, 1984b, 1991) is a method for
performing constant-temperature dynamics that produces true canonical
ensembles both in coordinate space and in momentum space. The
Nosé–Hoover formalism (referred to as Nosé in Discover and as
Hoover in Cerius2•Dynamics Simulation) is based on a simplified
reformulation by Hoover (1985), which eliminates time scaling
and therefore yields trajectories (Dynamics trajectories) in real time
and with evenly spaced time points. The method is also called the
Nosé–Hoover thermostat.
…through use of a fictitious mass
The main idea behind Nosé–Hoover dynamics is that an
additional (fictitious) degree of freedom is added to the model, to
represent the interaction of the model with the heat bath. This
fictitious degree of freedom is given a mass Q. The equations of
motion for the extended (i.e., model plus fictitious) system are
solved. If the potential chosen for that degree of freedom is correct,
the constant-energy dynamics (or the microcanonical dynamics,
NVE) of the extended system produces the canonical ensemble
(NVT) of the real model.
Hamiltonian and equations of motion for this extended system
The Hamiltonian H* of the extended system is:

$$ H^* = \sum_i \frac{p_i^2}{2m_i} + \phi(q) + \frac{Q}{2}\zeta^2 + g k_B T \ln S $$    Eq. 95

Equations of motion for the real-atom coordinates q and momenta
p, as well as for the fictitious coordinate S and momentum ζ
(where φ is the interaction potential) are:

$$ \frac{dq_i}{dt} = \frac{p_i}{m_i} $$    Eq. 96

$$ \frac{dp_i}{dt} = -\frac{d\phi}{dq_i} - \zeta p_i $$    Eq. 97

$$ \frac{d\zeta}{dt} = \frac{\sum_i \frac{p_i^2}{m_i} - g k_B T}{Q} $$    Eq. 98

where Q = the user-defined q_ratio × a constant × g × T;
g = number of degrees of freedom;
T = temperature.
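To make the structure of Eqs. 96 to 98 concrete, the sketch below evaluates their right-hand sides for a set of one-dimensional coordinates and advances them with an explicit midpoint step. This is not the integrator used by Discover or Cerius2: the midpoint rule, the reduced units (the argument `kT` stands for the product k_B T as an energy), and all function names are assumptions made for illustration.

```python
def nose_hoover_rates(q, p, zeta, masses, force, g, kT, Q):
    """Right-hand sides of Eqs. 96-98 for 1-D coordinates.
    force(q) must return the list of forces -dphi/dq_i."""
    f = force(q)
    dq = [pi / mi for pi, mi in zip(p, masses)]                 # Eq. 96
    dp = [fi - zeta * pi for fi, pi in zip(f, p)]               # Eq. 97
    dzeta = (sum(pi * pi / mi for pi, mi in zip(p, masses))
             - g * kT) / Q                                      # Eq. 98
    return dq, dp, dzeta


def midpoint_step(q, p, zeta, dt, **params):
    """Advance (q, p, zeta) by dt with the explicit midpoint rule."""
    dq, dp, dz = nose_hoover_rates(q, p, zeta, **params)
    q_mid = [a + 0.5 * dt * b for a, b in zip(q, dq)]
    p_mid = [a + 0.5 * dt * b for a, b in zip(p, dp)]
    z_mid = zeta + 0.5 * dt * dz
    dq, dp, dz = nose_hoover_rates(q_mid, p_mid, z_mid, **params)
    return ([a + dt * b for a, b in zip(q, dq)],
            [a + dt * b for a, b in zip(p, dp)],
            zeta + dt * dz)
```

When the kinetic term exceeds g k_B T, ζ grows and acts as a friction on the momenta; when the system is too cold, ζ decreases and eventually becomes negative, feeding energy back in. The size of Q sets how quickly ζ responds.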
Magnitude of the fictitious mass and computational efficiency
The choice of the fictitious mass Q of that additional degree of
freedom is arbitrary but is critical to the success of a run. If Q is too
small, the frequency of the harmonic motion of the extended
degree of freedom is too high. This forces a smaller timestep to
be used in integration. However, if Q is too large, the
thermalization process is not efficient: as Q approaches infinity, there is no
energy exchange between the heat bath and the model.
The choice of Q should therefore be based on a balance between
the stability of the solution and the highest-frequency motions of
the model.
Suggested value for fictitious mass…

Q should be different for different models. Nosé (1991) suggests
that Q should be proportional to gkBT, where g is the number of
degrees of freedom in the model, kB is the Boltzmann constant, and
T is the temperature.
To determine the proportionality constant, studies were done with
a model consisting of a box of liquid argon containing 343 atoms
at 87 K. Setting Q to 2.5 × 10⁻⁵ kcal mol⁻¹ fs⁻² for this model
yielded good results. This proportionality constant, together with
the gkBT term, was then used to generate Q for a box of water and
an amorphous cell of polypropylene, which also yielded satisfactory
results.
Discover ordinarily chooses Q automatically. However, you may
choose to multiply this calculated Q by a user-defined Q ratio.
…or for the relaxation time

In the Cerius2•Dynamics Simulation module, the factor that you
can control directly is τ, a relaxation time for the model (τ² is
directly proportional to Q).
For simple fluid models, τ can be chosen as the second moment of
the velocity autocorrelation function. For covalently bonded mod-
els, however, it is better to choose τ based on the frequencies
involved in the model.
A rule of thumb, therefore, is to choose τ on the same order as the
smallest time scale (highest frequency) of your model.
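To make the rule of thumb concrete, the period of the fastest vibration can be estimated from its wavenumber. The sketch below is only an illustration: the function name is hypothetical, and the 3000 cm⁻¹ value used in the example is an assumed, typical figure for a C–H stretch rather than a value taken from this guide.

```python
SPEED_OF_LIGHT_CM_PER_S = 2.99792458e10  # speed of light in cm/s


def vibrational_period_fs(wavenumber_cm1):
    """Period, in femtoseconds, of a vibration with the given wavenumber (cm^-1)."""
    frequency_hz = SPEED_OF_LIGHT_CM_PER_S * wavenumber_cm1
    return 1.0e15 / frequency_hz


if __name__ == "__main__":
    # Assumed highest-frequency mode: a C-H stretch near 3000 cm^-1.
    print(round(vibrational_period_fs(3000.0), 2), "fs")
```

Under this assumption the smallest time scale is on the order of 10 fs, which suggests a τ of that order for a covalently bonded model containing hydrogens.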
Timestep to use

Tests on polypropylene indicate that Nosé–Hoover dynamics
needs a timestep of 0.5 fs with the velocity Verlet method in order
to approach within 3% of the target temperature of 298 K. In
contrast, 1.0 fs is sufficient for the direct velocity scaling
method of temperature control.
For the Nosé–Hoover method, the smaller the timestep, the closer
it approaches the target temperature. Apparently, the need for a
smaller timestep to achieve accuracy is unrelated to Q, as long as
Q is in the appropriate range (because the error in reaching the tar-
get temperature remains the same when Q is increased or
decreased by a factor of 2).
Accurate integrator For energy to be conserved, the Nosé–Hoover method also
requires an accurate integrator. Thus, the ABM4 integrator, if avail-
able, with 0.5 fs as the timestep should be used.

Andersen method
One version of the Andersen method of temperature control
involves randomizing the velocities of all atoms at a predefined
“collision frequency”. (The other version involves choosing one
atom at each timestep and changing its velocity according to the
Boltzmann distribution.)
The CDiscover program implements the first version. The
predefined frequency is proportional to N^(2/3), where N is the
number of atoms in the system. Although this frequency is
calculated by the program, you can change it.
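The velocity randomization at a "collision" can be sketched as drawing every velocity component from the Boltzmann (Gaussian) distribution for the target temperature. This is a minimal illustration, not the CDiscover implementation: the function names are hypothetical, kB must be supplied in units consistent with the masses and velocities, and the collision frequency itself (proportional to N^(2/3), with a constant this guide does not give) is left to the caller.

```python
import math
import random


def randomize_velocities(masses, temperature, kB, rng):
    """Draw new velocities for all atoms from the Boltzmann distribution."""
    velocities = []
    for m in masses:
        sigma = math.sqrt(kB * temperature / m)  # std. dev. of each component
        velocities.append([rng.gauss(0.0, sigma) for _ in range(3)])
    return velocities


def kinetic_temperature(masses, velocities, kB):
    """Instantaneous kinetic temperature, T = 2K / (3 N kB)."""
    twice_k = sum(m * sum(c * c for c in v) for m, v in zip(masses, velocities))
    return twice_k / (3.0 * len(masses) * kB)


if __name__ == "__main__":
    rng = random.Random(1)
    masses = [1.0] * 1000
    v = randomize_velocities(masses, 300.0, 1.0, rng)
    print(kinetic_temperature(masses, v, 1.0))
```

For a finite number of atoms the kinetic temperature after randomization fluctuates around the target rather than matching it exactly.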

Pressure and stress


This section includes:
Units and sign conventions for pressure and stress
How pressure and stress are calculated
How pressure and stress are controlled

What are pressure and stress?

Pressure is another basic thermodynamic variable that defines the
state of the system. It is a familiar concept, defined as the force
per unit area. Standard atmospheric pressure is 1.013 bar, where
1 bar = 10⁵ Pa. A single number for the pressure implies that
pressure is a scalar quantity, but in fact, pressure is a tensor of
the more general form (McQuarrie 1976):

$$\mathbf{P} = \begin{pmatrix} P_{xx} & P_{xy} & P_{xz} \\ P_{yx} & P_{yy} & P_{yz} \\ P_{zx} & P_{zy} & P_{zz} \end{pmatrix} \qquad \text{Eq. 99}$$

Each element of the tensor is the force that acts on the surface of an
infinitesimal cubic volume that has edges parallel to the x, y, and z
axes. The first subscript refers to the direction of the normal to the
plane on which the force acts, and the second subscript refers to
the direction of the force.
In an isotropic situation, where forces are the same in all
directions and there is no viscous force, the pressure tensor is
diagonal. Because the diagonal elements are then all equal, the
tensor can be written as:

$$\mathbf{P} = p \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \qquad \text{Eq. 100}$$

where the scalar quantity p is the equivalent hydrostatic pressure.


Sometimes (especially in materials science studies), the stress
tensor, or stress, is used in preference to the pressure tensor;
the stress tensor is the negative of the pressure tensor. The
diagonal elements are known as the tensile stress, and the
nondiagonal elements are the shear stress.
The changes in unit-cell lattice parameters and volume resulting
from the stress can be obtained from an analysis of dynamics tra-
jectory output file data (Dynamics trajectories). Multiple dynamics
runs can be performed at varying stress or pressure values, and the
strains obtained can be used to plot a stress–strain curve.

Units and sign conventions for pressure and stress


Pressure and stress can be expressed in many different units. The
most common ones are bar and GPa. In SI units, 1 bar = 10⁵ N m⁻²
and 1 GPa = 10⁹ N m⁻². Hence, 1 GPa = 10⁴ bar. Pressure is usually
expressed in bars, but in materials science, stress is often
expressed in terms of GPa.
In the CDiscover program, constant-stress dynamics has GPa as
the default unit. Since the FDiscover program is geared towards
controlling pressure, the unit used is bar. In the Cerius2•Dynamics
Simulation module, GPa are the only units used for pressure and
stress.
Although pressure and stress are defined with the same physical
quantities, they have opposite sign conventions. Positive pressure
implies a compressive force pushing the system inward, but posi-
tive stress means a force acting outward to expand the system.
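The unit relations and the opposite sign conventions can be captured in a few helper functions. These helpers are hypothetical and only illustrate the arithmetic; they are not part of any MSI program.

```python
BAR_PER_GPA = 1.0e4  # 1 GPa = 10^4 bar


def bar_to_gpa(value_bar):
    """Convert a pressure or stress from bar to GPa."""
    return value_bar / BAR_PER_GPA


def gpa_to_bar(value_gpa):
    """Convert a pressure or stress from GPa to bar."""
    return value_gpa * BAR_PER_GPA


def pressure_to_stress(pressure):
    """Stress has the opposite sign convention to pressure."""
    return -pressure


if __name__ == "__main__":
    # A compressive hydrostatic pressure of 10 000 bar, expressed as a stress in GPa.
    print(pressure_to_stress(bar_to_gpa(10000.0)))
```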

How pressure and stress are calculated


Pressure is calculated through the use of the virial theorem (Allen
and Tildesley 1987). Like temperature, pressure is a thermody-
namic quantity and is, strictly speaking, meaningful only at equi-
librium.
Relation of pressure, temperature, volume and internal virial

Thermodynamic pressure, thermodynamic temperature, volume,
and internal virial can be related in the following way:

$$PV = N k_B T + \frac{2}{3}\langle W \rangle \qquad \text{Eq. 101}$$

where W is defined as:

$$W = \frac{1}{2}\sum_{i=1}^{N} \mathbf{r}_i \cdot \mathbf{f}_i \qquad \text{Eq. 102}$$

Volume (and pressure) is known only for periodic models

Note that pressure is defined only when the system is placed in a
container having a definite volume. In a computer simulation, the
unit cell under periodic boundary conditions is viewed as the
container. Volume, pressure, and density can be calculated only
when the model is recognized as periodic.


Instantaneous pressure

Analogous to the temperature, an instantaneous pressure function
P can be defined so that thermodynamic pressure is the average of
the instantaneous values:

$$P = \frac{N k_B T}{V} + \frac{2}{3}\frac{W}{V} \qquad \text{Eq. 103}$$

where P is the instantaneous pressure and T is the instantaneous
kinetic temperature, which is related to the instantaneous kinetic
energy K of the system as:

$$T = \frac{2}{3 N k_B} K \qquad \text{Eq. 104}$$

The instantaneous pressure function can be written as:

$$P = \frac{2}{3V}(K + W) \qquad \text{Eq. 105}$$
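The following sketch evaluates Eq. 103 to Eq. 105 and shows that the two forms of the instantaneous pressure agree. The function names are hypothetical, and all quantities are assumed to be given in one consistent set of units.

```python
def instantaneous_temperature(kinetic_energy, n_atoms, kB):
    """Eq. 104: T = 2K / (3 N kB)."""
    return 2.0 * kinetic_energy / (3.0 * n_atoms * kB)


def pressure_from_temperature(n_atoms, kB, temperature, virial, volume):
    """Eq. 103: P = N kB T / V + (2/3) W / V."""
    return n_atoms * kB * temperature / volume + 2.0 * virial / (3.0 * volume)


def instantaneous_pressure(kinetic_energy, virial, volume):
    """Eq. 105: P = 2 (K + W) / (3 V)."""
    return 2.0 * (kinetic_energy + virial) / (3.0 * volume)


if __name__ == "__main__":
    K, W, V, N, kB = 30.0, -6.0, 8.0, 10, 1.0
    T = instantaneous_temperature(K, N, kB)
    print(pressure_from_temperature(N, kB, T, W, V), instantaneous_pressure(K, W, V))
```

A sufficiently negative virial W makes the computed pressure negative even though the kinetic term is always positive.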

Instantaneous pressure tensor

As mentioned above, pressure is a tensor and its components can
also be expressed in tensorial form. Eq. 101 can be recast in the
form of:

$$\mathbf{P} = \frac{1}{V}\left[\sum_{i=1}^{N} m_i \mathbf{v}_i \mathbf{v}_i^{T} + \sum_{i=1}^{N} \mathbf{r}_i \mathbf{f}_i^{T}\right] \qquad \text{Eq. 106}$$

In detail, the two terms on the right-hand side of the equation are:

$$\sum_{i=1}^{N} m_i \mathbf{v}_i \mathbf{v}_i^{T} = \begin{pmatrix} \sum_i m_i v_{ix} v_{ix} & \sum_i m_i v_{ix} v_{iy} & \sum_i m_i v_{ix} v_{iz} \\ \sum_i m_i v_{iy} v_{ix} & \sum_i m_i v_{iy} v_{iy} & \sum_i m_i v_{iy} v_{iz} \\ \sum_i m_i v_{iz} v_{ix} & \sum_i m_i v_{iz} v_{iy} & \sum_i m_i v_{iz} v_{iz} \end{pmatrix} \qquad \text{Eq. 107}$$

$$\sum_{i=1}^{N} \mathbf{r}_i \mathbf{f}_i^{T} = \begin{pmatrix} \sum_i r_{ix} f_{ix} & \sum_i r_{ix} f_{iy} & \sum_i r_{ix} f_{iz} \\ \sum_i r_{iy} f_{ix} & \sum_i r_{iy} f_{iy} & \sum_i r_{iy} f_{iz} \\ \sum_i r_{iz} f_{ix} & \sum_i r_{iz} f_{iy} & \sum_i r_{iz} f_{iz} \end{pmatrix} \qquad \text{Eq. 108}$$

where rix, vix, and fix indicate the x components of the position,
velocity, and force vectors of the ith atom, respectively.

Instantaneous hydrostatic pressure

From the definition of the instantaneous pressure tensor, the
instantaneous hydrostatic pressure is calculated as 1/3 the trace of
the pressure tensor (Pxx + Pyy + Pzz).
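As an illustration, the tensor of Eq. 106 and the hydrostatic pressure derived from it can be computed directly from per-atom masses, positions, velocities, and forces. This is a minimal sketch with hypothetical function names; it ignores periodic images and assumes consistent units throughout.

```python
def outer_sum(weights, a, b):
    """Sum over atoms of w_i * a_i b_i^T, returned as a 3x3 nested list."""
    t = [[0.0] * 3 for _ in range(3)]
    for w, ai, bi in zip(weights, a, b):
        for x in range(3):
            for y in range(3):
                t[x][y] += w * ai[x] * bi[y]
    return t


def pressure_tensor(masses, positions, velocities, forces, volume):
    """Eq. 106: P = (1/V) [sum m v v^T + sum r f^T]."""
    kinetic = outer_sum(masses, velocities, velocities)           # Eq. 107
    virial = outer_sum([1.0] * len(masses), positions, forces)    # Eq. 108
    return [[(kinetic[x][y] + virial[x][y]) / volume for y in range(3)]
            for x in range(3)]


def hydrostatic_pressure(tensor):
    """One third of the trace of the pressure tensor."""
    return (tensor[0][0] + tensor[1][1] + tensor[2][2]) / 3.0


if __name__ == "__main__":
    P = pressure_tensor(
        masses=[1.0, 2.0],
        positions=[[1.0, 0.0, 0.0], [-1.0, 0.0, 0.0]],
        velocities=[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
        forces=[[-2.0, 0.0, 0.0], [2.0, 0.0, 0.0]],
        volume=2.0,
    )
    print(P, hydrostatic_pressure(P))
```

For the two-atom example, K = 1.5 and W = −2, so Eq. 105 gives 2(K + W)/(3V) = −1/6, the same value as one third of the trace.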
Forces on the image atoms

When periodic boundary conditions are used, atoms in the unit
cell interact not only with the other atoms in the unit cell but also
with their translated images. Forces on the images in the virial W
must be included correctly. If the interaction is pairwise, using
Newton’s third law, W can be written as:

$$W = \frac{1}{2}\sum_{i}\sum_{j>i} \mathbf{r}_{ij} \cdot \mathbf{f}_{ij} \qquad \text{Eq. 109}$$

instead of:

$$W = \frac{1}{2}\sum_{i} \mathbf{r}_i \cdot \mathbf{f}_i \qquad \text{Eq. 110}$$

Berendsen et al. (1984) use the rij ⋅ fij formalism by evaluating the
virial and the kinetic energy tensor based on the centers of mass,
which is valid because the internal contribution to the virial is can-
celed (on the average) by the internal kinetic energy. Because of the
way forces are evaluated, rescaling of coordinates is also done on
the basis of the centers of mass of the models.
How coordinates are scaled in response to pressure changes

The Discover program, however, calculates pressure on an atomic
basis and performs atomic scaling. The major advantage of using
atomic scaling of the coordinates is that atom overlapping can be
avoided. Such overlap can occur if centers of mass are moved
instead of individual atoms. In addition, for large models having
internal flexibility, atomic scaling yields a smoother response to
pressure changes.
Explicit or implicit images In the Discover program, when explicit images (also called ghost
atoms) are used under periodic boundary conditions, the rij ⋅ fij for-
malism is used, with the explicit images included.
In the FDiscover program when minimum images (implicit
images) are used under periodic boundary conditions, the rij ⋅ fij
formalism is used so that forces between real and image atoms can
be properly accounted for.
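A minimal sketch of the rij ⋅ fij formalism with minimum images follows. It assumes a cubic box, a pairwise force supplied by a hypothetical callable `pair_force`, and a cutoff short enough that each pair interacts only through its nearest image; none of these choices is taken from the Discover programs.

```python
def minimum_image(delta, box_length):
    """Map a separation vector onto its nearest periodic image (cubic box)."""
    return [d - box_length * round(d / box_length) for d in delta]


def pairwise_virial(positions, pair_force, box_length):
    """W = 1/2 * sum over pairs i<j of r_ij . f_ij (Eq. 109)."""
    w = 0.0
    n = len(positions)
    for i in range(n - 1):
        for j in range(i + 1, n):
            delta = [a - b for a, b in zip(positions[i], positions[j])]
            r_ij = minimum_image(delta, box_length)
            f_ij = pair_force(r_ij)  # force on atom i due to the nearest image of j
            w += sum(r * f for r, f in zip(r_ij, f_ij))
    return 0.5 * w


if __name__ == "__main__":
    # Two atoms near opposite faces of a box of length 10; they are really 1 apart.
    spring = lambda r: [-2.0 * c for c in r]  # attractive harmonic pair force
    print(pairwise_virial([[0.5, 0.0, 0.0], [9.5, 0.0, 0.0]], spring, 10.0))
```

Using the raw in-cell separation of 9 instead of the image separation of 1 would give a very different virial, which is why the image forces must be accounted for.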

How pressure and stress are controlled


As with temperature (How temperature is controlled), the
pressure- (and stress-) control mechanism must produce the correct
statistical ensemble. This means that the probability of occurrence
of a certain configuration obeys the laws of statistical mechanics.
Methods of controlling pressure

Only pressure-control methods used in MSI’s simulation engines
(Table 21) are considered here:
Berendsen method of pressure control
Andersen method of pressure control
Parrinello–Rahman method of pressure and stress control

Table 21. Pressure- and stress-control methods used by MSI
simulation engines

                               simulation engine
method                         CHARMm    Discover    OFF
Berendsen pressure “bath”      √         √
Andersen                                 √a          √
Parrinello–Rahman                        √a          √

a CDiscover only, not in FDiscover.


Important
With the Berendsen and Andersen methods the volume can
change, but there is no change in the shape of the cell; thus only
the pressure is controlled. With the Parrinello–Rahman method,
the cell’s shape can change, and therefore both pressure and
stress can be controlled.

Choosing the pressure- and/or stress-control method(s)

To access the controls used for specifying the pressure- and
stress-control methods:
♦ The Cerius2•Dynamics Simulation module automatically uses
the Parrinello–Rahman method to control stress and the Ander-
sen method to control pressure.
♦ In the Cerius2•Discover and Insight•Discover_3 modules, the
Parrinello–Rahman method is the default for pressure (and
stress) control. To change it, you can write out the command
input file, edit the BTCL dynamics command statement with a
text editor, and then use that file for your run.
Alternatively (for Insight•Discover_3), select the
Language_Control/Command_Comment command. Set the Comment
Type to Command. Enter pressure_control_method =
Andersen_cp or pressure_control_method = Berendsen_pc in
the Command/Comment entry box and select Execute. Be sure
that you insert this stage at the correct point in your command
input file.
♦ The Insight•Discover and QUANTA•CHARMm modules
always use the Berendsen pressure-control method.

Berendsen method of pressure control


Pressure changes can be accomplished by changing the coordi-
nates of the particles and the size of the unit cell in periodic bound-
ary conditions.
How it works

The Berendsen method (Berendsen et al. 1984) couples the system
to a pressure “bath” to maintain the pressure at a certain target.
The strength of coupling is determined both by the compressibility
of the system (using a user-defined variable γ) and by a relaxation
time constant (a user-defined variable τ). At each step, the x, y,
and z coordinates of each atom are scaled by the factor:

$$\mu = \left[1 + \frac{\Delta t}{\tau}\,\gamma\,(P - P_0)\right]^{1/3} \qquad \text{Eq. 111}$$

where ∆t is the time step, P is the instantaneous pressure, and P0
is the target pressure. The Cartesian components of the unit cell
vectors are scaled by the same factor µ.
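The scaling step of Eq. 111 can be sketched as follows. The function names are hypothetical, and γ, τ, ∆t, and the pressures are assumed to be supplied in mutually consistent units; this illustrates the formula and is not the engines' code.

```python
def berendsen_scale_factor(dt, tau, gamma, pressure, target_pressure):
    """Eq. 111: mu = [1 + (dt / tau) * gamma * (P - P0)]^(1/3)."""
    return (1.0 + (dt / tau) * gamma * (pressure - target_pressure)) ** (1.0 / 3.0)


def scale_vectors(vectors, mu):
    """Scale atom coordinates or cell vectors uniformly by mu."""
    return [[mu * c for c in v] for v in vectors]


if __name__ == "__main__":
    mu = berendsen_scale_factor(dt=1.0, tau=100.0, gamma=0.5,
                                pressure=3.0, target_pressure=1.0)
    print(mu, scale_vectors([[1.0, 2.0, 3.0]], mu))
```

Because every Cartesian component is multiplied by the same µ, the cell volume changes by µ³ while its shape is unchanged; when P exceeds P0 the cell expands, and when P is below P0 it contracts.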
Change in cell size but not shape

Note that this method (as implemented) changes the cell
uniformly, so that the size of the cell is changed, but not its shape.
Therefore, for simulations such as crystal phase transitions, where
both the cell size and shape are expected to change, this method is
not appropriate.
Disadvantages of the method

The pressure fluctuation has been observed to be large during test
runs with the Discover program and in studies in the literature
(Brown and Clark 1984).
Negative pressures sometimes occur because the virial can be neg-
ative, even though this defies the usual sense that pressure is a
positive number. The calculated pressure depends on the cutoff
distances used in the simulation.
Precautions

To compensate for the missing long-range part of the potential,
the contributions to the energy and pressure from distances r > rc
can be estimated by assuming that the radial distribution function
g(r) ≈ 1 (uniform distribution) in that region. With this
assumption and the known form of the nonbond potentials, the
correction can be estimated analytically (Allen and Tildesley 1987).
However, this correction is not built into the calculation. If the cut-
off distance is too short, the calculated pressure may be wrong.
Therefore in practice, you should test the effect of cutoff on pres-
sure by gradually increasing the cutoff and choosing the appropri-
ate cutoff accordingly.
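As a worked illustration of such an analytic estimate, assume (purely for the example) a single-component fluid of number density ρ whose nonbond interaction is a Lennard-Jones 12-6 potential with well depth ε and diameter σ. These symbols are not defined elsewhere in this guide. With g(r) = 1 beyond the cutoff rc, the missing pressure contribution is:

```latex
\begin{aligned}
P_{\mathrm{tail}}
  &= -\frac{2\pi\rho^{2}}{3}\int_{r_c}^{\infty} r^{3}\,\frac{d\phi}{dr}\,dr ,
  \qquad
  \phi(r) = 4\epsilon\left[\left(\frac{\sigma}{r}\right)^{12}
            - \left(\frac{\sigma}{r}\right)^{6}\right] \\
  &= \frac{16}{3}\,\pi\rho^{2}\epsilon\sigma^{3}
     \left[\frac{2}{3}\left(\frac{\sigma}{r_c}\right)^{9}
           - \left(\frac{\sigma}{r_c}\right)^{3}\right]
\end{aligned}
```

For cutoffs of a few σ the second (attractive) term dominates, so under these assumptions the neglected tail lowers the pressure, and its magnitude falls off only as rc⁻³.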

Andersen method of pressure control


Change in cell size but not shape

With the Andersen method (1980) of pressure control, the volume
of the cell can change, but its shape is preserved by allowing the
cell to change isotropically.

Uses

The Andersen method is useful for liquid simulations since the
box could become quite elongated in the absence of restoring
forces if the shape of the cell were allowed to change. A constant
shape also makes the dynamics analysis easier.
However, this method is not very useful for studying materials
under nonisotropic stress or phase transitions, which involve
changes in both cell lengths and cell angles (for these conditions,
the Parrinello–Rahman method should be used).
How it works The basic idea of the method is to treat the volume V of the cell as
a dynamic variable in the system. The Lagrangian of the system is
modified so that it contains a term in the kinetic energy with a
user-defined mass M and a potential term which is the pV poten-
tial derived from an external pressure Pext acting on volume V of
the system.

Parrinello–Rahman method of pressure and stress control


Both size and shape of cell can change

The Parrinello–Rahman method of pressure and stress control can
allow simulation of a model under externally applied stress. This
is useful for studying the stress–strain relationship of materials.
Both the shape and the volume of the cell can change, so that the
internal stress of the system can match the externally applied
stress.
Summary of derivation

The method is presented in detail by Parrinello and Rahman (1981)
and is only summarized here.
The Lagrangian of the system is modified such that a term
representing the kinetic energy of the cell depends on a
user-defined mass-like parameter W. An elastic energy term pΩ is
related to the pressure and volume or the stress and strain of the
system. The equations of motion for the atoms and cell vectors can
be derived from this Lagrangian. The motion of the cell vectors,
which determines the cell shape and size, is driven by the
difference between the target and internal stress.
With hydrostatic pressure only:

$$L = \frac{1}{2}\sum_{i=1}^{N} m_i\,\dot{s}_i'\,G\,\dot{s}_i - \sum_{i=1}^{N}\sum_{j>i} \phi(r_{ij}) + \frac{1}{2} W\,\mathrm{Tr}(\dot{h}'\dot{h}) - p\Omega \qquad \text{Eq. 112}$$

where h = {a, b, c} is the cell vector matrix, G = h′h, ri = hsi, φ
is the interaction potential, and Ω is the volume of the cell. The
dots above some symbols indicate time derivatives and the primes
indicate matrix transposition. Tr is the trace of a matrix.
With stress, the elastic term pΩ is replaced by:

$$p(\Omega - \Omega_0) + \Omega_0\,\mathrm{Tr}\left[(S - p)\,\varepsilon\right] \qquad \text{Eq. 113}$$

where S is the applied stress, Ω0 is the initial volume, and ε is the
strain; ε is also a tensor, defined as (h0′⁻¹ G h0⁻¹ − 1)/2.
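The strain definition above can be evaluated numerically for a given pair of cell matrices. The sketch below uses plain nested lists and hypothetical helper names; it assumes that h and h0 are 3×3 matrices whose columns are the cell vectors a, b, and c, and that h0 is invertible.

```python
def transpose(m):
    return [[m[j][i] for j in range(3)] for i in range(3)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def inverse(m):
    """Inverse of a 3x3 matrix via its adjugate."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    adj = [[e * i - f * h, c * h - b * i, b * f - c * e],
           [f * g - d * i, a * i - c * g, c * d - a * f],
           [d * h - e * g, b * g - a * h, a * e - b * d]]
    return [[x / det for x in row] for row in adj]


def strain(h, h0):
    """epsilon = (h0'^-1 G h0^-1 - 1) / 2, with G = h'h."""
    g_metric = matmul(transpose(h), h)
    h0_inv = inverse(h0)
    m = matmul(matmul(transpose(h0_inv), g_metric), h0_inv)
    return [[(m[r][c] - (1.0 if r == c else 0.0)) / 2.0 for c in range(3)]
            for r in range(3)]


if __name__ == "__main__":
    h0 = [[10.0, 0.0, 0.0], [0.0, 10.0, 0.0], [0.0, 0.0, 10.0]]
    h = [[11.0, 0.0, 0.0], [0.0, 10.0, 0.0], [0.0, 0.0, 10.0]]
    print(strain(h, h0))
```

For a 10% elongation along a, this definition gives εxx = (1.1² − 1)/2 = 0.105 rather than 0.10, because it retains the term that is quadratic in the deformation.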
Choice of the mass variable

Note the user-defined variable W, which determines the rate of
change of the volume/shape matrix.
A large W means a heavy, slow cell. In the limiting case, infinite W
reverts to constant-volume dynamics. A small W means fast
motion of the cell vectors. Although that may mean that the target
stress can be reached faster, there may not be enough time for
equilibration.
In tests, too small a W also induced artificial periodic motions of
the cell. A value of 20 seems to give satisfactory results for a cell of
76mers of polyethylene.

Types of dynamics simulations


In addition to relatively simple dynamics simulations, it is also
possible to bias or control the dynamics run in several ways and/
or to combine dynamics and minimization in one simulation.
These types of dynamics runs can be used in conjunction with one
or more of the various thermodynamic statistical ensembles (as
appropriate, see Statistical ensembles).
Several such types of dynamics simulations are discussed in this
section:
Quenched dynamics
Simulated annealing
Consensus dynamics
Impulse dynamics


Langevin dynamics
Stochastic boundary dynamics
Multibody order-N dynamics

Table 22. Types of dynamics simulations easily set up with MSI
simulation engines

                                 simulation engine
method                           CHARMm    Discover    OFF
Quenched dynamics                          √a
Simulated annealing              √b        √a          √
Consensus dynamics                         √c
Impulse dynamics                           √d          √
Langevin dynamics                √
Stochastic boundary dynamics     √

a Available through Insight•Discover_3 module for CDiscover, via
DSL only for FDiscover; not yet available in Cerius2•Discover.
b Truncated (only one T-to-0 K half-cycle), and referred to as
“quenched dynamics” in the CHARMm documentation.
c Available in standalone mode only.
d Available in standalone mode of CDiscover only.

Choosing the type of dynamics

To access the controls used for specifying types of dynamics
simulations:
♦ In Cerius2, go to the DYNAMICS SIMULATION card on the
OFF METHODS deck of cards. Click the Run card menu item
to access the Dynamics Simulation control panel. Check the
check box next to the desired type of simulation. You can access
additional controls relevant to each type by clicking the Prefer-
ences… button to its right.
♦ To set up complicated dynamics runs with the
Insight•Discover_3 module, you need to set up several stages of
minimization (if desired) and dynamics, using the
Calculate/Minimize and Calculate/Dynamics commands. You probably
also need to write and read appropriate files at various stages of
the run with the Language_Control/File_Control command. In
addition, you may need to set up control loops (such as If and
Foreach statements) with the Language_Control/Looping_Control
command.
Alternatively, to produce a phi/psi map (showing energy as a
function of rotation about two torsions of a model), you can use
the Strategy/Phi_Psi_Map command.
♦ In QUANTA, select the CHARMm/Dynamics Option menu
item. Use the Setup Heating, Setup Equilibration, and Setup
Simulation radio buttons to access dialogs in which you set
temperatures as desired for simulated annealing for these
stages.
Langevin dynamics is set up by clicking the Setup Detailed
Dynamics button. You can combine Langevin dynamics with
simulated annealing by setting the Bath Temperature to 0.
Stochastic boundary conditions are set by selecting the
CHARMm/Constraints Options/Stochastic Bdry Settings
menu item to specify these constraints and then selecting the
CHARMm/Constraints Options/Stochastic Bdry On menu
item to activate them.
Exploring conformational space using dynamics

A common limitation of classical minimization algorithms is that
they usually locate a local minimum close to the starting
configuration, which is not necessarily the global minimum (Local or global
minimum?). This is because the minimizers discussed under Mini-
mization algorithms are specifically designed to ignore configura-
tions if the energy increases.
By using the available thermal energy to climb and cross confor-
mational energy barriers, dynamics provides insight into the
accessible conformational states of the molecule that would be
inaccessible to classical minimization.
The results can be assessed by plotting selected data in 3D format
(e.g., a phi/psi map for rotation about the peptide bond) or by
minimizing selected structures that are generated during the
dynamics run (see Quenched dynamics).


Quenched dynamics
How it works In quenched dynamics, periods of dynamics are followed by a
quench period in which the structure is minimized. You can spec-
ify the simulation time between quenches and the number of min-
imization steps. The quenched structure can then be written to a
trajectory file (Dynamics trajectories), and dynamics continues with
the prequenched structure.
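The control flow described above can be sketched as a simple loop. Everything here is hypothetical scaffolding: `run_dynamics` and `minimize` stand for whatever dynamics and minimization routines are available, and the list named `trajectory` stands in for a trajectory file.

```python
import copy


def quenched_dynamics(state, run_dynamics, minimize,
                      n_quenches, steps_between, min_steps):
    """Alternate dynamics with quenches; dynamics resumes from the prequenched state."""
    trajectory = []
    for _ in range(n_quenches):
        state = run_dynamics(state, steps_between)
        # Minimize a copy so that the dynamics state itself is left untouched.
        quenched = minimize(copy.deepcopy(state), min_steps)
        trajectory.append(quenched)
    return state, trajectory


if __name__ == "__main__":
    # Toy stand-ins: "dynamics" drifts x upward, "minimization" shrinks x toward 0.
    drift = lambda s, n: {"x": s["x"] + 0.5 * n}
    shrink = lambda s, n: {"x": s["x"] * 0.5 ** n}
    print(quenched_dynamics({"x": 0.0}, drift, shrink, 3, 2, 2))
```

Keeping the minimized copy separate from the running state is what distinguishes this scheme from simply restarting dynamics at each minimum.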
Uses Quenched dynamics is a way to search conformational space for
low-energy structures.

Simulated annealing
How it works In simulated annealing, the temperature is altered in time incre-
ments from an initial temperature to a final temperature and back
again. This cycle can be repeated. The temperature is changed by
adjusting the kinetic energy of the structure (by rescaling the
velocities of the atoms).
Simulated annealing can be combined with quenched dynamics.
That is, at the end of each temperature cycle, the lowest-energy
structure of that cycle can be minimized and saved in a trajectory
file (Dynamics trajectories). Annealing continues using the last
structure and velocities from the previous cycle.
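A temperature schedule of this kind, and the velocity rescaling used to move between temperatures, can be sketched as follows. The linear spacing of the increments and the function names are assumptions made for the illustration; the rescaling factor follows from the kinetic energy being proportional to both the temperature and the square of the velocities.

```python
import math


def annealing_schedule(t_initial, t_final, n_increments, n_cycles=1):
    """Temperatures visited: initial -> final -> initial, repeated n_cycles times."""
    step = (t_final - t_initial) / n_increments
    up = [t_initial + k * step for k in range(n_increments + 1)]
    down = up[-2::-1]  # walk back without repeating t_final
    cycle = up + down
    schedule = list(cycle)
    for _ in range(n_cycles - 1):
        schedule.extend(cycle[1:])  # the shared end point is not repeated
    return schedule


def velocity_scale_factor(t_current, t_target):
    """Factor applied to all velocities to change the kinetic temperature."""
    return math.sqrt(t_target / t_current)


if __name__ == "__main__":
    print(annealing_schedule(300.0, 500.0, 2, n_cycles=2))
    print(velocity_scale_factor(300.0, 400.0))
```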

Tip
Use the quartic form of the nonbond term for simulated
annealing studies in which the initial coordinates are unknown.
These simulations can involve atoms moving through other
atoms and for this it is essential that the nonbond term not go to
infinity.

Simulated annealing cannot be combined with impulse dynamics
(Impulse dynamics).
Uses Annealing allows the energy of the structure to be changed grad-
ually, without allowing the structure to become trapped in a con-
formation that has a lower energy than nearby conformations but
a higher energy than more distant conformations (that is, in a local
energy minimum).


Consensus dynamics
How it works Consensus dynamics can be thought of as an extension of the teth-
ering restraint technique (Template forcing, tethering, quartic droplet
restraints, and consensus conformations). In tethering, a model sys-
tem serves as a fixed template. The “moving” system, driven by
molecular mechanics or dynamics, is then forced to conform to the
template by applying restraints.
In contrast, the consensus technique allows the “template” to
respond to changes in the “moving” system, by treating all the
models as “moving templates”. The net result is that both models
change so that their structures become similar. Consensus dynam-
ics can be applied to more than two models simultaneously.
Uses Consensus restraints have several uses. One example is the deter-
mination of structural similarities among a set of homologous
compounds—perhaps several similar compounds that bind to a
particular receptor. If you hypothesize that an apparently homolo-
gous region of the compounds is responsible for the binding, then
you would want to find a configuration for this region that is com-
patible with relatively low-energy conformations for all the com-
pounds. By applying consensus restraints to the homologous
region of each compound, you can use dynamics followed by min-
imization to find a likely binding configuration.

Impulse dynamics
How it works Impulse dynamics allows you to assign initial directional veloci-
ties to selected atoms before carrying out dynamics.
Impulse dynamics can be used only with constant-energy, con-
stant-volume dynamics (NVE ensemble) and cannot be combined
with simulated annealing (Simulated annealing).
Uses You can use impulse dynamics to push interacting molecules over
energy barriers before allowing the structure to relax.


Langevin dynamics
How it works The Langevin dynamics method (McCammon et al. 1976, Levy et
al. 1979) approximates a full molecular dynamics simulation of a
system by eliminating unimportant or uninteresting degrees of
freedom. The effects of the eliminated degrees of freedom are sim-
ulated by mean and stochastic forces.
Friction coefficients must be assigned to selected atoms before
starting the dynamics run. Langevin dynamics includes a con-
stant-temperature bath.
Uses For example, instead of simulating hundreds of solvent molecules
surrounding the solute molecules, the solvent can be ideally repre-
sented as a viscous fluid described in terms of dissipative and fluc-
tuative equations.
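For reference, the dissipative and fluctuating terms mentioned above usually enter through the Langevin equation of motion, shown here in its common textbook form. The symbols are assumptions introduced for this illustration: ξ_i is the friction coefficient assigned to atom i, R_i(t) is the random force, and T is the bath temperature. This guide does not state the exact form used by CHARMm.

```latex
m_i \frac{d^{2}\mathbf{r}_i}{dt^{2}}
  = \mathbf{F}_i(\mathbf{r}) - \xi_i \frac{d\mathbf{r}_i}{dt} + \mathbf{R}_i(t),
\qquad
\langle \mathbf{R}_i(t) \rangle = 0,
\qquad
\langle R_{i\alpha}(t)\, R_{i\beta}(t') \rangle
  = 2\,\xi_i k_B T\, \delta_{\alpha\beta}\, \delta(t - t')
```

The friction term removes energy and the random term supplies it; the relation between their magnitudes is what makes the eliminated degrees of freedom act as a constant-temperature bath.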

Stochastic boundary dynamics


How it works The stochastic boundary molecular dynamics method (Brooks et
al. 1985a) uses a combination of both Langevin dynamics and
Newtonian dynamics. With this method, the model is partitioned
into a reaction region where Newtonian dynamics simulation is
run, a buffer region where Langevin dynamics is run, and a reser-
voir region.
In this way, atoms distant from the specific interactive sites in a
large macromolecular system can be effectively eliminated from
extensive analysis.
Uses This allows detailed studies of spatially localized portions of inter-
acting models. Enzyme–substrate interactions at the active site can
be effectively studied using this technique.

Multibody order-N dynamics


The Insight•MBOND module and the standalone MBOND/CHARMm
programs (documented separately) are used for multibody order-N
dynamics, in which models are substructured into a set of
interconnected rigid and flexible bodies, as well as atomistic
regions. The rigid bodies move as units, and the deformations of
flexible bodies are represented by sets of low-frequency component
modes. The atomistic regions are simulated by conventional
dynamics. This method is another way of eliminating unimportant
or uninteresting degrees of freedom and thus allowing a longer
timestep to be used.

Constraints during dynamics simulations


Constraints and restraints (Applying constraints and restraints) can
be selectively applied during a dynamics run to save computing
time and/or focus the simulation on more interesting parts of your
model. For example, atoms can be specified as fixed or movable,
and restraints can be applied in order to pull certain atoms
towards one another.
Improving computational efficiency

In addition, certain types of constraints are applied only during
dynamics, to increase computational efficiency. This section
covers only those dynamics-specific constraints:
The SHAKE algorithm
The RATTLE algorithm

Setting constraints during dynamics

To access the controls used for setting up SHAKE or RATTLE
dynamics:
♦ In the Cerius2•Discover module, select the Run menu item in
the DISCOVER card to open the Run Discover control panel.
In this control panel, set the Task popup to Dynamics and click
the More… pushbutton to the right of the Task popup to open
the Discover Dynamics control panel. In this panel, check the
Rattle check box and click the More… pushbutton to its right
to open the Rattle control panel. Use the latter to specify
whether bonds and/or angles should be constrained.
♦ In the Insight•Discover_3 module, select the Calculate/
Dynamics command, toggle More on, and then toggle Rattle
on.


Note
For full access to the RATTLE functionality, you need to use
BTCL statements. You can write out a command input file from
Cerius2 or Insight, edit the BTCL rattle command statement(s)
with a text editor, and then use that file for your run.
Alternatively (in Insight), select the Language_Control/
Command_Comment command. Set the Comment Type to
Command. Enter the desired rattle command statement(s) in
the Command/Comment entry box and select Execute.

♦ In QUANTA, select the CHARMm/SHAKE Options menu
item. Set the controls as desired and click OK.
Effect on timestep and energy conservation

Bond vibrations constitute the highest frequencies in the system
and thus determine the largest timestep that can be used during
dynamics. If the bonds are constrained, longer timesteps can be
used during dynamics, because of the absence of these high-fre-
quency bond vibrations.
In a test run on crambin, a timestep as long as 3 fs could be used
with the RATTLE algorithm, but dynamics without RATTLE or
SHAKE can have a timestep of no more than 1 fs. Furthermore,
energy conservation for the 3 fs timestep with RATTLE was as
good as that at 1 fs without RATTLE or SHAKE.
Computational costs Although constraining bonds allows the timestep to be increased
without incurring significant overhead for carrying out RATTLE
iteratively, constraining angles does not really help to reduce the
computational cost, because it takes a long time for the iterative
procedure to converge. In some circumstances, it actually takes
much longer than without RATTLE or SHAKE.

Tip
We do not recommend that angle constraints be used even
though this functionality is available. However, if angle
constraints have to be used, you should use a larger tolerance
than with bonds.

The SHAKE algorithm


Availability

The SHAKE method (Ryckaert et al. 1977), which is a popular
algorithm for introducing distance constraints during molecular
dynamics simulations, is used in CHARMm to constrain the
harmonic stretching of bonds.

Allowed constraints during dynamics

SHAKE may be applied to all bonds or only to bonds containing
hydrogens, and to no angles, all angles, or only angles containing
hydrogens.

Tip
To maintain a high degree of accuracy during the dynamics
simulation, as well as to provide a larger integration time step,
it is best to apply SHAKE only to bonds containing hydrogens.
Effect on timestep A twofold increase in the time step (to 0.001 ps) is enabled.

The RATTLE algorithm


What is the RATTLE algorithm

Constraints can be applied during dynamics runs via the RATTLE
algorithm, which is the velocity version of SHAKE. The RATTLE
procedure is to go through the constraints one by one, adjusting
the coordinates so as to satisfy each in turn. The procedure is iter-
ated until all the constraints are satisfied to within a given toler-
ance. Unlike SHAKE, RATTLE (Andersen 1983) makes sure that
velocities are adjusted also, to satisfy the constraints, and is suit-
able for use with the velocity version of Verlet integrators.
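
The iterative procedure described above can be sketched in a few lines. The following is a minimal illustration of the coordinate-adjustment loop only (the part that RATTLE shares with SHAKE); the velocity stage is omitted. The function name, the argument layout, and the default tolerance are assumptions made for this example and are not the interface of any MSI program.

```python
def apply_distance_constraints(old_pos, new_pos, masses, constraints,
                               tol=1e-8, max_iter=500):
    """Adjust new_pos until every (i, j, d) distance constraint is met.

    old_pos: positions at the start of the step (assumed to satisfy the constraints)
    new_pos: unconstrained positions after the step (not modified; a copy is returned)
    tol:     relative tolerance on the squared distance (hypothetical default)
    """
    pos = [list(p) for p in new_pos]
    for _ in range(max_iter):
        converged = True
        # Go through the constraints one by one, satisfying each in turn.
        for i, j, d in constraints:
            s = [a - b for a, b in zip(pos[i], pos[j])]
            diff = sum(c * c for c in s) - d * d
            if abs(diff) > tol * d * d:
                converged = False
                # Displace both atoms along the bond vector from the start of the step.
                ref = [a - b for a, b in zip(old_pos[i], old_pos[j])]
                inv_mi, inv_mj = 1.0 / masses[i], 1.0 / masses[j]
                g = diff / (2.0 * (inv_mi + inv_mj)
                            * sum(a * b for a, b in zip(s, ref)))
                for k in range(3):
                    pos[i][k] -= g * inv_mi * ref[k]
                    pos[j][k] += g * inv_mj * ref[k]
        if converged:
            return pos
    raise RuntimeError("constraints did not converge within max_iter passes")
```

Because each correction can disturb constraints that were already satisfied, the outer loop repeats until one full pass finds nothing left to fix. Coupled constraints such as angles typically need more passes, which is consistent with the cost discussion under Computational costs.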
Allowed constraints during dynamics
RATTLE can be used to constrain bonds, angles spanned by two
constrained bonds, as well as the distance between any pair of
atoms in periodic and nonperiodic systems. It can be used with the
constant-volume ensemble.

Note
For the time being, RATTLE cannot be used with constant-
pressure dynamics, since that involves calculating the pressure
on a per-molecule basis, which is not currently available.

Dynamics in fixed-geometry water
Another major use of RATTLE is to enable the use of a fixed-geometry
water model. With RATTLE, you can use the SPC, TIP3P, or
current water models. The initial configurations of the water
molecules are set to the equilibrium geometry of the SPC or TIP3P
model or to the current geometry of water in your model.

Dynamics trajectories
What are trajectories?
The results of dynamics simulations can be saved in trajectory
files, which are essentially a series of snapshots of the simulation
taken at regular intervals. Trajectories can include data such as
model structure, minimized-model structure (in quenched
dynamics, see Quenched dynamics), temperature, energies, volume,
pressure, cell parameters, and stress.
Uses
Trajectory files can be used for analysis of the results and also for
continuing an interrupted dynamics simulation.
You can use trajectory files for computing the average structure
and for analyzing fluctuations in geometric parameters, thermo-
dynamic properties, and time-dependent processes. The time
series can be evaluated for global properties such as the radius of
gyration and the number density of the model. Examples of exper-
imental quantities that can be calculated using correlation func-
tions are frictional coefficients, IR line widths, fluorescence
depolarization rates, spectral densities (the Fourier transform of
the correlation function), and NMR relaxation times.
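
As an illustration of the time-series analysis mentioned above, the sketch below computes a normalized autocorrelation function for a single property sampled at regular intervals along a trajectory. It assumes the property values have already been extracted into a plain Python list; reading MSI trajectory files is not shown, and the function name is hypothetical.

```python
def autocorrelation(series, max_lag):
    """Return [C(0)/C(0), C(1)/C(0), ..., C(max_lag)/C(0)] for a time series.

    C(k) is the average of dev(t) * dev(t + k) over all available pairs,
    where dev is the deviation from the mean of the series.
    """
    n = len(series)
    if not 0 <= max_lag < n:
        raise ValueError("max_lag must be between 0 and len(series) - 1")
    mean = sum(series) / n
    dev = [x - mean for x in series]
    c0 = sum(d * d for d in dev) / n
    if c0 == 0.0:
        raise ValueError("series has no fluctuations")
    result = []
    for lag in range(max_lag + 1):
        pairs = n - lag
        c = sum(dev[t] * dev[t + lag] for t in range(pairs)) / pairs
        result.append(c / c0)
    return result
```

The lag at which this function decays toward zero gives a rough estimate of the decay time of the property, and its Fourier transform gives the spectral density referred to above.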
Animations
You can also animate trajectories, to view how the model behaved
during a dynamics run.
Typically, the displayed model’s conformation is updated during
a run so you can monitor progress visually. The frequency with
which the display is updated can affect the time needed to perform
a dynamics simulation, especially with large models.
With the Cerius2•Dynamics Simulation module, some informa-
tion also can be displayed in the form of graphs that are updated
as the run proceeds.

General methodology for dynamics calculations

To perform dynamics simulations, you need to know something
about the general strategy of all types of dynamics simulations,
and you need to know the general procedure for setting up a
dynamics run.
This section includes information on:
Stages and duration of dynamics simulations
Dynamics with MSI simulation engines
Restarting a dynamics simulation

Prerequisites
One of the most important steps in any simulation is properly
preparing the system to be simulated. Calculations on the fastest
computer running the most efficient dynamics algorithm may be
worthless if the hydrogen is put on the wrong nitrogen or an
important water molecule is omitted.
Unfortunately, it is impossible to provide a single recipe for a
successful model: too much depends on the objectives and
expectations of each calculation. Are conformational changes that are far
removed in conformational space from the area being focused on
expected or interesting? What is the hypothesis being tested? The
effects of specific dynamics variables, as well as of tethering, fixing,
energy cutoffs, etc., on the results can be determined only by
controlled preliminary experiments.

Stages and duration of dynamics simulations


Dynamics simulations are usually carried out in two stages, equil-
ibration and data collection (or “production”). The duration of
each stage depends on the system as well as on the purpose of the
run.
Equilibration stage
For the equilibration stage, you typically assign random velocities
to atoms in the model according to the Maxwell–Boltzmann distri-
bution around the desired target temperature. Temperature con-
trol during the equilibration stage is usually by direct velocity
scaling (Direct velocity scaling). Depending on your model and the
purposes of your simulation, you may bring the simulated system
up to the target temperature relatively quickly or in gradual steps.
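
The two operations named above, drawing initial velocities from a Maxwell–Boltzmann distribution and direct velocity scaling, can be sketched as follows. This is an illustration under stated assumptions, not the implementation used by any MSI engine: SI units are used, all 3N degrees of freedom are counted (no constraints and no removal of center-of-mass motion), and the function names are hypothetical.

```python
import math
import random

K_B = 1.380649e-23  # Boltzmann constant, J/K


def assign_velocities(masses, temperature, rng):
    """Draw each Cartesian velocity component from a Gaussian with variance kB*T/m."""
    return [[rng.gauss(0.0, math.sqrt(K_B * temperature / m)) for _ in range(3)]
            for m in masses]


def instantaneous_temperature(masses, velocities):
    """Temperature from the kinetic energy, assuming 3N degrees of freedom."""
    twice_ke = sum(m * sum(c * c for c in v) for m, v in zip(masses, velocities))
    return twice_ke / (3 * len(masses) * K_B)


def scale_to_target(masses, velocities, target):
    """Direct velocity scaling: multiply every velocity by sqrt(T_target / T_now)."""
    factor = math.sqrt(target / instantaneous_temperature(masses, velocities))
    return [[c * factor for c in v] for v in velocities]
```

For a finite model the randomly drawn velocities give an instantaneous temperature that only approximates the target, which is one reason scaling is applied during equilibration.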

The purpose of equilibration is to prepare the system so that it
comes to the most probable configuration consistent with the
target temperature and pressure.
When a system is large, it may take a long time to equilibrate
because of the vast conformational space it has to search. The con-
tour of the energy surface is another factor to consider. If energy
barriers between various local minima and the global minimum
are high, barrier crossing is difficult and may take longer.
Has equilibrium been achieved?
One way to judge whether a model has equilibrated is to plot the
various thermodynamic quantities, such as energy, temperature
and pressure, versus time. When equilibration has been achieved,
these quantities fluctuate around their averages, which remain
constant over time. This is a necessary but not sufficient test,
because it is not unusual for a sudden conformational change to
occur after a long period of time.
Another way to check equilibration is to start the calculation with
different initial conformations and different initial velocities. Con-
vergence to similar conformations and properties from different
initial values is a good indicator that equilibration has occurred.
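
The first test above (averages that remain constant over time) can be automated in a rough way by comparing block averages of a monitored quantity. The sketch below is only a heuristic: the number of blocks and the threshold are arbitrary assumptions, the function names are hypothetical, and, as noted above, passing such a test is necessary but not sufficient.

```python
import statistics


def block_means(series, n_blocks):
    """Split the series into n_blocks equal consecutive blocks and average each."""
    size = len(series) // n_blocks
    return [statistics.fmean(series[b * size:(b + 1) * size])
            for b in range(n_blocks)]


def looks_stationary(series, n_blocks=4, threshold=0.5):
    """True if the block averages differ by at most threshold x the overall std. dev."""
    means = block_means(series, n_blocks)
    spread = max(means) - min(means)
    return spread <= threshold * statistics.pstdev(series)
```

A steadily drifting energy fails this check because its block averages move apart by more than the size of the fluctuations, whereas a quantity that only fluctuates about a fixed mean passes.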
Production (data-collection) stage
After equilibrating the system at the target temperature and
pressure, you can begin the production stage, during which data and
statistics are collected. Temperature control during the production
stage is generally by a more gentle, realistic method than direct
velocity scaling (How temperature is controlled). You may use differ-
ent thermodynamic ensembles (Statistical ensembles) during the
equilibration and production stages.
Depending on the purpose of your study (Types of dynamics simu-
lations), you may run several data-collection simulations under
different conditions after a single equilibration run or stage.
How long should the simulation be?
One commonly asked question is how long to collect the statistics.
The answer depends on both the properties being calculated and
the model being simulated.
Some quantities, such as internal energy, temperature, and pres-
sure, converge readily, since they are calculated from averages and
the dominant contributions are from the most probable states.
Other quantities, like specific heat or isothermal compressibilities,
are more difficult to obtain, since they depend on the fluctuations.

One empirical way to find out how long is long enough for a
dynamics simulation is to monitor the change in the desired quan-
tities over time.
The length of a trajectory needed to calculate a property depends
on the time variation of the property under consideration. If the
property is a slowly varying function, the dynamics integration
should be extended to cover several periods.
Characteristic durations for common events in real molecules are
listed in Table 23.

Table 23. Durations of some real molecular events

Event                        Approximate duration
---------------------------  --------------------------
Bond stretching              1–20 fs
Elastic domain modes         100 fs to several ps
Water reorientation          4 ps
Inter-domain bending         10 ps–100 ns
Globular protein tumbling    1–10 ns
Aromatic ring flipping       100 µs to several seconds
Allosteric shifts            2 µs to several seconds
Local denaturation           1 ms to several seconds

If a property varies randomly about a mean value with a decay
time of t and the simulation is run for length T, the statistical
uncertainty (standard error) in the estimate is proportional to
$(t/T)^{0.5}$. When multiple independent
fragments are present (for example, each water molecule in a
solvent simulation), averaging can be done over the fragments to
improve sampling.
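
A small numerical illustration of the (t/T)^0.5 scaling may help when planning run lengths. The helper below is hypothetical and makes two assumptions: the proportionality constant is taken as 1, and N statistically independent fragments are treated as equivalent to a run N times longer.

```python
import math


def relative_uncertainty(decay_time, run_length, n_fragments=1):
    """Relative statistical uncertainty ~ sqrt(t / (T * N)), up to a constant factor."""
    return math.sqrt(decay_time / (run_length * n_fragments))


# Water reorientation (decay time about 4 ps, Table 23):
short_run = relative_uncertainty(4.0, 100.0)        # 100 ps run
long_run = relative_uncertainty(4.0, 400.0)         # 400 ps run
averaged = relative_uncertainty(4.0, 100.0, 4)      # 100 ps run, 4 fragments
```

Under these assumptions, quadrupling the run length halves the uncertainty, and so does averaging over four independent fragments in the shorter run.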

Dynamics with MSI simulation engines


Prerequisites
To set up a dynamics run, first:

1. Choose the desired forcefield if you don’t want to use the
default forcefield (Forcefields).

2. Set up the forcefield and prepare your model (Preparing the
Energy Expression and the Model).

3. Run a preliminary minimization.

The model usually needs to be minimized (Minimization) to
remove strains that might cause abnormally large forces on
some atoms and therefore result in unrealistic dynamics
simulations.
Purpose of the run
Next:
4. Specify items such as the dynamics algorithm(s) (Integration
algorithms), time step (The choice of timestep), and temperature-
and pressure-control methods (How temperature is controlled
and How pressure and stress are controlled), unless you want your
calculation to run under the default conditions.
You generally run two dynamics simulations in sequence (or a
single two-stage dynamics run). The first run or stage is for
equilibrating the model under the desired conditions and the
second is for collecting data and statistics (see Stages and duration
of dynamics simulations). In Discover and CHARMm, these two
runs are generally both set up at the same time, by repeating
Steps 4 and 5 for each stage before starting the run.
Additional runs or stages may be required, depending on the
purpose of the simulation (see Types of dynamics simulations).
Depending on the type of run and the simulation engine, you
may be able to set up all stages at the same time, by repeating
Steps 4 and 5 for each stage before starting the run. If not, the
procedure for restarting runs that have ended is outlined under
Restarting a dynamics simulation.
Accessing dynamics controls
To find the relevant controls in the molecular modeling
programs:
♦ For the Cerius2•Dynamics Simulation module, go to the OFF
METHODS deck of cards and choose the DYNAMICS SIMU-
LATION card. Select the Run card item to access the Dynamics
Simulation control panel. You can access additional tools, such
as for setting variables for the dynamics methods and types, by
clicking any of several Preferences… buttons in this control
panel.

In addition, you may select Dynamics Controls on the
DYNAMICS SIMULATION card. These controls, however,
are typically left at their default values.
♦ In the Cerius2•Discover module, select the Run menu item in
the DISCOVER card to open the Run Discover control panel.
In this control panel, set the Task popup to Dynamics and click
the More… pushbutton to the right of the Task popup to open
the Discover Dynamics control panel. Use the controls in the
Equilibration and Production sections of this control panel to
set up the equilibration and data-collection stages of the simu-
lation.
Additional control over the equilibration stage is available by
clicking the More… pushbutton in the Equilibration section to
open the Equilibration Options control panel.
If you want a minimization stage to precede the equilibration
stage, check the Pre-Minimize check box. The associated
More… pushbutton opens the Discover Minimize control
panel (Minimization).
♦ In the Insight•Discover_3 module, select the Calculate/
Dynamics command. Toggle More on to access additional con-
trols. Set the controls as desired and select Execute for each
dynamics stage you want to include in your run. You may set
up quite complicated simulations by also using other com-
mands in the Calculate and Language_Control pulldowns.
Alternatively, if you want to run only a simple minimization
followed by dynamics, select the Strategy/Simple_Min_Dyn
command and change any desired parameters.
♦ In the Insight•Discover module, select the Parameters/Dynam-
ics command. Set the controls as desired and select Execute.
You may also want to set some other parameters with the
Parameters/Variables command.
♦ In QUANTA, select the CHARMm/Dynamics Option menu
item. You can use the Setup Heating, Setup Equilibration, and
Setup Simulation radio buttons to access dialogs in which you
set controls as desired for warming and equilibrating the sys-
tem and then running the data-collection stage of the simula-
tion. Alternatively, you can use the Setup Detailed Dynamics
button to access a dialog with additional controls.

Discover and CHARMm offer additional functionality when
run in standalone mode. (How to run Discover and CHARMm
in standalone mode is documented separately; see Available
documentation.)

Specifying output
5. Specify the desired output:
♦ In Cerius2•Dynamics Simulation, click the Output… button in
the Dynamics Simulation control panel or select Output on the
DYNAMICS SIMULATION card to specify what information
to display during the simulation.
Click the Trajectory… button in the Dynamics Simulation con-
trol panel or select the Trajectory/Output menu item on the
DYNAMICS SIMULATION card to write out various types of
trajectory files.
♦ In the Cerius2•Discover module, select the Run menu item in
the DISCOVER card to open the Run Discover control panel.
In this control panel, set the Task popup to Dynamics and click
the Output… pushbutton to open the Discover Dynamics Out-
put control panel. You can specify output in the form of stan-
dard output, table, and/or trajectory files and specify the
frequency of output and the type of averaging.
♦ In the Insight•Discover_3 module, use the Analyze/Output
command (it may not be accessible until after you have exe-
cuted the Calculate/Dynamics command). Set the controls as
desired and select Execute. Do this for every stage for which
you want output in the form of standard output, table, archive,
and/or history files. You can periodically write frames to an
archive file with the Language_Control/File_Control com-
mand.
♦ In the Insight•Discover module, use the Run/Report and/or
Run/Files commands. Set the controls as desired and select
Execute.
♦ In QUANTA, you can specify output files with any of the dia-
logs accessed by selecting the CHARMm/Dynamics Option
menu item.
Discover and CHARMm offer additional functionality when
run in standalone mode.

Verifying the run instructions
6. Review what you have requested for the run:
Since dynamics simulations are often quite complex and time
consuming, you should review your run specifications before
starting the simulation.
♦ In the Cerius2•Discover module, select the Run menu item in
the DISCOVER card to open the Run Discover control panel.
In this control panel, click the Input… pushbutton to open the
Discover Input File control panel. Click the Save Strategy to
.inp File action button.
♦ In the Insight•Discover_3 module, select the Setup/List com-
mand, set List Options to Input_File, and select Execute. The
command input file is listed in the textport. Then you can read
the input file by using any text editor, issuing the UNIX more
command, or clicking the Edit .inp File action button.
♦ In the Insight•Discover module, select the Parameters/Dynam-
ics or the Run/Run command. Toggle the List option on and
select Execute to view the parameters you have set so far.
To view the complete command input file, select the Run/Run
command. Set the controls as desired, being sure that
Run_Dynamics is on and Computation Mode is set to Command
File, and select Execute. (Run_Minimization can be on or off,
depending on whether you want to specify that minimization
be performed before the dynamics equilibration stage.) Read
the input file with any text editor or the UNIX more command.

Starting a dynamics run
Finally:
7. Start the dynamics run:
♦ In the Cerius2•Dynamics Simulation module, click the RUN
DYNAMICS button in the Dynamics Simulation control panel.
(If you are starting a new simulation after a previous run (see
Restarting a dynamics simulation), be sure to click the Reset button
in this control panel if you need to reinitialize the atom velocities
rather than continuing from the previous run.)
♦ In the Cerius2•Discover module, click the RUN pushbutton in
the Run Discover control panel.
♦ In the Insight•Discover_3 module, execute the D_Run/Run
command.
Alternatively, if you want to run only a simple minimization
followed by dynamics, select Execute in the
Strategy/Simple_Min_Dyn parameter block.
♦ In the Insight•Discover module, select the Run/Run command.
Set the controls as desired, being sure that Run_Dynamics is on
and Computation Mode is set to Interactive or Batch, and
select Execute. (Run_Minimization can be on or off, depend-
ing on whether you want minimization to be performed before
the dynamics equilibration stage.)
♦ In QUANTA, click CHARMm Dynamics in the Modeling
menu window.

Specific information
For specific information on setting up and running dynamics with
the various MSI simulation engines, and on analyzing the results,
please see the relevant documentation (see Available documenta-
tion).

Restarting a dynamics simulation


You can continue a dynamics run without a break from the previ-
ous stage of the simulation, restart an interrupted run from where
it ended, or start a new dynamics run from a particular point in a
previous run.
When needed
With the Insight•Discover_3 module, continuing a run from one
stage to the next (e.g., switching from equilibration to data collec-
tion or setting up complicated simulations) is straightforward: all
stages can be specified when you set up the simulation, before
starting the run. However, you may need to continue or restart a
run, for any number of reasons.
Cerius2•Dynamics Simulation and CHARMm enable stages only
for limited types of simulations, and Cerius2•Discover and
Insight•Discover allow you to specify only a simple two-stage
equilibration and data-collection run (with or without preliminary
minimization). To set up some other type of simulation, you must
wait until a run has ended before setting up subsequent runs.
Read this section if you need to continue or restart a simulation
that has already ended.

Files and precautions
The trajectory and other input file(s) that are required depend on
what simulation engine is being used, and the exact procedure
depends on whether you are continuing a run without a break from
the last conformation of an immediately preceding run, restarting
an interrupted run from the last conformation, or using an
intermediate point of an earlier run as the basis for a new simulation.
If you changed the model’s conformation (for example, by minimi-
zation) since your previous dynamics simulation ended, you need
to reassign the initial (random) velocities, since the old velocities
do not apply to the new coordinates. The Insight•Discover_3 and
Cerius2•Dynamics Simulation modules can detect this situation
and reinitialize velocities automatically.
You may need to analyze the data from a dynamics run before
doing a restart from other than its final conformation. You may, for
example, want to find and use the lowest-energy conformer as the
starting point for a new simulation or may want to restart a run
with a particular set of velocities.
Please see the relevant specific documentation (Available documen-
tation).
Restarting a dynamics simulation
Briefly, the general procedure with each simulation engine is:
♦ In the Cerius2•Dynamics Simulation module, specify the tra-
jectory file with the Dynamics Trajectory Input control panel,
which is accessed by selecting the Trajectory/Input item from
the DYNAMICS SIMULATION card. Use this control panel to
restart an interrupted run at its final conformation, to specify
the conformation number at which to begin a new run, and to
specify the type of data to use from the previous run.
To restart the simulation at the last step of an immediately pre-
ceding run, click the RUN DYNAMICS button in the Dynam-
ics Simulation control panel.
To start a new simulation from the end or some point before the
end of a previous run, specify a new dynamics method, algo-
rithm, output, etc., if desired (see Purpose of the run). Then click
the RUN DYNAMICS button.
To re-randomize the atomic velocities before starting a new run,
click the Reset button in the Dynamics Simulation control panel
before clicking the RUN DYNAMICS button.

♦ In the Insight•Discover_3 module, select the
Language_Control/File_Control command. Set Select File Type to
Dyn_Restart and File Operation to Retrieve. Select Execute to set up
a stage to read the dynamics restart file.
If you want to start the new simulation from some point before
the end of a previous run, next set Select File Type to
Archive_File (or History_File) and set the Frame No to a constant or a
variable. (A variable can be initialized with the
Language_Control/Command_Comment command and used,
for example, in a Loop statement set up with the
Language_Control/Looping_Control command.) Select Execute to set up
a stage to control reading frame(s) from the desired file.
Specify any other dynamics parameters if desired, as outlined
above in Steps 4 and 5. Velocity (in the Dynamics Calculate
parameter block) should be set to Current if you want to con-
tinue or restart the previous run. Set it to Create if you want to
re-randomize the atomic velocities before starting a new run.
Review the command input file and start the run as outlined in
Steps 6 and 7.
♦ With the Cerius2•Discover or Insight•Discover module, you
need to use a text editor to prepare an appropriate command
input file for continuing, starting, or restarting a run. Also make
sure that all required files are in the current run directory.
In Cerius2•Discover, use the prepared command input file by
selecting the Run menu item in the DISCOVER card and click-
ing the Input… pushbutton in the Run Discover control panel.
Then select the desired file using the browser controls in the
Discover Input File control panel and click the Run .inp File
action button.
In Insight•Discover, use the prepared command input file by
selecting the Run/Run command, toggling Strategy on, and
entering that file’s name in the Input_File_Name entry box.
Select Execute.
♦ In QUANTA, select the CHARMm/Dynamics Option menu
item. All the dialog boxes accessed via the radio buttons allow
you to start the simulation from the beginning or restart it from
the restart file. You can also use the CHARMm/Settings/
Restore menu item to reload a CHARMm setup file.

Discover and CHARMm offer additional functionality when
run in standalone mode. (How to run Discover and CHARMm
in standalone mode is documented separately; see Available
documentation.)


6 Free Energy

Who should read this chapter
Forcefield-based calculation of relative or absolute free energy
should be considered an “expert” application. We recommend
using the FDiscover program for relative and absolute free energy
calculations. Although it is possible to run relative free energy
calculations with CHARMm, that program is slower and less flexible.
This chapter explains Relative free energy—theory and implementation
Absolute free energy

Relative free energy — theory and


implementation
This section includes Finite difference thermodynamic integration (FDTI)
Relative free energy—methodology

To compare chemically distinct systems, the FDiscover program
can be used to calculate the relative free energy difference between
chemically unique species. This approach uses the finite difference
thermodynamic integration (FDTI) algorithm of Mezei et al.
(1987). Since FDTI does not require analytical derivatives of the
Hamiltonian with respect to the coupling parameter, it is more
suited for the complex coupling formalisms required in chemical
perturbations. It also has been shown to possess better conver-
gence properties than other methods.

Finite difference thermodynamic integration (FDTI)


FDTI combines aspects of both the perturbation method (PM) and
thermodynamic integration (TI) to improve the convergence and
accuracy of free energy calculations. A review of the properties of
the perturbation method helps to appreciate the advantages of the
FDTI approach.
Perturbation method (PM)
The free energy A is related to the partition function Q by the
equation:

$$A = -k_B T \ln Q$$   Eq. 114

If Q1 and Q2 are the partition functions for States 1 and 2, the
difference in free energy between these states is:

$$\Delta A = A_2 - A_1 = -k_B T \ln \frac{Q_2}{Q_1}$$   Eq. 115

Defining Ei(p,r) as the energy function corresponding to state i, the
ratio of partition functions is related to the expectation value of
exp[–(E2 – E1)/kBT] by:

$$\frac{Q_2}{Q_1} = \iint \exp[-(E_2 - E_1)/k_B T]\, P_1 \, dr\, dp$$   Eq. 116

where P1 is the Boltzmann probability function for State 1:

$$P_1 = \frac{\exp[-E_1/k_B T]}{Q_1}$$   Eq. 117

This can be expressed more compactly in bracket notation as:

$$\frac{Q_2}{Q_1} = \langle \exp[-\Delta E/k_B T] \rangle_1$$   Eq. 118

where the subscript 1 indicates that State 1 is considered the
reference state (that is, the energy difference ∆E is computed relative to
an ensemble of structures for State 1). Thus, the ratio of the
partition functions can be computed from an ensemble average of the
energy difference between a reference state and a perturbed state.
The free energy change is then given directly as:

$$\Delta A = A_2 - A_1 = -k_B T \ln \langle \exp[-\Delta E/k_B T] \rangle_1$$   Eq. 119

This approach is accurate only when the energy of the initial and
final states differs by a small amount, on the order of 2 kBT. Larger
energy differences than this lead to such small values for the expo-
nential term that the statistical uncertainty overwhelms the
observable. For small changes, this method is very attractive
because it calculates the complete free energy difference in one cal-
culation.
To calculate free energy differences larger than 2 kBT, several
sequential runs are performed, each computing the free energy
change over a subinterval. Errors introduced by failure to con-
verge at each step are propagated—each successive calculation
adds its contribution to the previously accumulated total.
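
Eq. 119 translates almost directly into code. The sketch below assumes that the energy differences ∆E = E2 – E1 have already been evaluated for a set of configurations sampled from State 1 and stored in a list; the function name is hypothetical, and the log-sum-exp rearrangement is a numerical convenience chosen for this example, not something prescribed by the method.

```python
import math


def fep_free_energy(delta_e_samples, kT):
    """Delta A = -kT ln < exp(-dE/kT) >_1 (Eq. 119) from sampled energy differences."""
    x = [-de / kT for de in delta_e_samples]
    x_max = max(x)
    # Factor out the largest exponent so the sum cannot overflow.
    log_mean = x_max + math.log(sum(math.exp(v - x_max) for v in x) / len(x))
    return -kT * log_mean
```

For example, `fep_free_energy([0.0, 50.0], 1.0)` is essentially kT ln 2: the configuration with the large energy difference contributes almost nothing to the average. This is the behavior described above, in which large energy differences produce exponential terms too small to be sampled reliably.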
FDTI
Thermodynamic integration of the relative free energy assumes
that the free energy change can be expressed as an integral:

$$\Delta A = \int_0^1 \frac{\delta A(\lambda)}{\delta\lambda}\, d\lambda$$

Assuming that a suitable coupling parameter λ can be found,
which adequately describes a continuous conversion between the
two states, the above equation can be integrated numerically. FDTI
employs the perturbation formalism to numerically compute the
derivatives of the free energy function with respect to the coupling
parameter. Using Eq. 119, it is possible to compute the change in
free energy (∆Ai) for a perturbation δλ away from the ith λ point (λi
± δλ):

$$\Delta A_i = A(\lambda_i) - A(\lambda_i \pm \delta\lambda) = -k_B T \ln \langle \exp[-(E(\lambda_i) - E(\lambda_i \pm \delta\lambda))/k_B T] \rangle_i$$   Eq. 120


Computing ∆Ai for many different values of λ spanning the inter-
val from 0 to 1, dividing each ∆Ai by δλ, and then numerically inte-
grating over the interval, the total free energy change ∆A can be
estimated. Mathematically, this is summarized as:

$$\Delta A = -k_B T \sum_{i=1}^{k} \frac{\ln \langle \exp[-(E(\lambda_i) - E(\lambda_i \pm \delta\lambda))/k_B T] \rangle_i}{\delta\lambda}\, \Delta\lambda_i$$   Eq. 121

where k is the number of quadrature points in the numerical
integration. Notice that ∆Ai ⁄ δλ can be computed both for a forward (+δλ)
and a backward (–δλ) perturbation. Computing both at the same
time takes no more computer time and is a measure of
convergence (both should be equal for suitable values of δλ). The
Discover program averages the two values for the instantaneous
value of δA ⁄ δλ at λi.
Method used in FDiscover
The value of ∆λi in Eq. 121 for a given i depends on the numerical
integration scheme used. For example, a simple trapezoidal rule
approach would make each ∆λi equal. More sophisticated integra-
tion methods may allow each ∆λi to be different. The FDiscover
program uses a Gaussian–Legendre quadrature method (Press
1986) that chooses the values needed, given the total number of
intervals specified. Hence, you never explicitly specify the λ inter-
vals.
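
The overall FDTI procedure can be illustrated on a toy problem with a known answer. The sketch below is not the FDiscover implementation. It assumes a one-dimensional harmonic "bond" whose force constant is mixed linearly between two hypothetical values K0 and K1, draws configurations exactly from the Boltzmann distribution at each λ (in place of a dynamics run), and applies Eq. 119 directly with λi as the reference state and λi ± δλ as the perturbed state, dividing by the signed step. All names and numerical settings are assumptions made for this example.

```python
import math
import random

KT = 1.0            # k_B * T in arbitrary energy units (assumed)
K0, K1 = 1.0, 4.0   # hypothetical force constants at lambda = 0 and lambda = 1


def gauss_legendre_01(k):
    """Nodes and weights of k-point Gauss-Legendre quadrature on [0, 1]."""
    nodes, weights = [], []
    for i in range(1, k + 1):
        x = math.cos(math.pi * (i - 0.25) / (k + 0.5))  # initial guess on [-1, 1]
        for _ in range(100):  # Newton iteration on the Legendre polynomial P_k
            p_prev, p = 1.0, x
            for n in range(2, k + 1):
                p_prev, p = p, ((2 * n - 1) * x * p - (n - 1) * p_prev) / n
            dp = k * (x * p - p_prev) / (x * x - 1.0)
            dx = p / dp
            x -= dx
            if abs(dx) < 1e-15:
                break
        nodes.append(0.5 * (x + 1.0))
        weights.append(1.0 / ((1.0 - x * x) * dp * dp))
    return nodes, weights


def harmonic_energy(x, lam):
    """Toy energy: harmonic term with a linearly mixed force constant."""
    return 0.5 * ((1.0 - lam) * K0 + lam * K1) * x * x


def harmonic_sample(lam, rng):
    """Exact Boltzmann sample for the toy energy at coupling parameter lam."""
    return rng.gauss(0.0, math.sqrt(KT / ((1.0 - lam) * K0 + lam * K1)))


def fdti(energy, sample, k, kT, d_lambda, n_samples, rng):
    """Estimate A(1) - A(0) by finite-difference thermodynamic integration."""
    nodes, weights = gauss_legendre_01(k)
    total = 0.0
    for lam, w in zip(nodes, weights):
        xs = [sample(lam, rng) for _ in range(n_samples)]

        def d_a(step):
            # Eq. 119 with State 1 = lam and State 2 = lam + step
            avg = sum(math.exp(-(energy(x, lam + step) - energy(x, lam)) / kT)
                      for x in xs) / n_samples
            return -kT * math.log(avg)

        # Average the forward and backward finite-difference slopes.
        slope = 0.5 * (d_a(d_lambda) / d_lambda + d_a(-d_lambda) / (-d_lambda))
        total += w * slope
    return total
```

For this toy model the exact result is (kT/2) ln(K1/K0), about 0.69 kT, so a call such as `fdti(harmonic_energy, harmonic_sample, k=5, kT=KT, d_lambda=0.01, n_samples=5000, rng=random.Random(1))` can be checked against it. Note that only the number of quadrature points is supplied; the λ values themselves come from the quadrature rule.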
Advantages
The advantages of FDTI are:
♦ Unlike the perturbation method (PM), large changes in free
energy can be calculated in fewer steps.
♦ Unlike thermodynamic integration (TI), analytical derivatives
of the Hamiltonian with respect to the coupling parameter are
not needed.
♦ As has been suggested by Mezei, FDTI may converge faster
than either TI or PM.
Incorporation of the coupling parameter λ
The first step in any free energy calculation is parameterizing the
energy function to provide a continuous change between the
thermodynamic states that are being compared. The relative free
energy function as implemented in the FDiscover program is
parameterized at the level of the forcefield parameters themselves.
For example, the energy E of a harmonic bond as a function of the
bond length b is:

$$E(b) = K (b - b_0)^2$$   Eq. 122

where K is the force constant and b0 is the reference bond length.
In a free energy calculation E(b) may be different for the initial A
and final B states (i.e., each state has different force constants and
reference bond lengths). The function is reformulated as a function
of the coupling parameter:

$$E(b;\lambda) = [\lambda K^A + (1 - \lambda) K^B] \{ b - [\lambda b_0^A + (1 - \lambda) b_0^B] \}^2$$   Eq. 123

As λ changes from 0 to 1, the bond description gradually changes
from a State A bond to a State B bond.

Relative free energy—methodology


This section describes the steps involved in performing an in vacuo
calculation to determine the free energy difference between methanol and
ethane.
Background of example problem
A seminal free energy calculation reported by Jorgensen and
Ravimohan (1985) showed that free energy perturbation techniques
seem to accurately predict the difference in solvation energy
between methanol and ethane. This calculation has since been
repeated in several laboratories, using independent programs and
forcefields, and has become a de facto benchmark for free energy
calculations (Singh et al. 1987; Fleischman and Brooks 1987).
Construction of the thermodynamic cycle
Consider the following thermodynamic cycle for the solvation of
ethane and methanol:

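$$\begin{array}{ccc}
\mathrm{CH_3CH_3(g)} & \xrightarrow{\;\Delta A_1\;} & \mathrm{CH_3CH_3(aq)} \\
\Delta A_3 \downarrow & & \downarrow \Delta A_4 \\
\mathrm{CH_3OH(g)} & \xrightarrow{\;\Delta A_2\;} & \mathrm{CH_3OH(aq)}
\end{array}$$

This cycle requires that:

$$\Delta A_4 - \Delta A_3 = \Delta A_2 - \Delta A_1$$   Eq. 124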

The right side of this equation is the difference in solvation free
energy that can be measured experimentally. However, the left
side is more convenient to compute (making water appear around
a solute is a much larger perturbation than changing a methyl
group into a hydroxyl group). Either side can be chosen, because
both sides are equal.
If there is no difference in intramolecular energy between ethane
and methanol, then ∆A3 is zero and the gas phase calculation can
be ignored. Although this is intuitively incorrect, a suitable force-
field could treat the intramolecular energy of ethane and methanol
identically. For example, if the nonpolar hydrogens were not rep-
resented explicitly, no atom pairs would have more than two inter-
vening bonds. Most forcefields calculate intramolecular nonbond
interactions only for atom pairs separated by more than two inter-
vening bonds. Furthermore, if the solutes are treated as rigid mod-
els, the energy would not be affected by deformations in bond
lengths or angles.
In this exercise all hydrogens are explicitly included. In addition,
variations in degrees of freedom are allowed for both the solvent
and solute molecules.
Including all degrees of freedom requires both a vacuum and a sol-
vent calculation. The computational penalty for an additional gas
phase calculation is minor; the solvent calculation is two orders of
magnitude more intensive. Moreover, because setting up the free
energy calculation is virtually identical for both calculations, the
gas phase calculation provides a tractable example that yields
experience and confidence in performing free energy calculations
without wasting an inordinate amount of computer time.
Defining the λ = 0 and λ = 1 states
To set up the calculation, you must first design the chemical perturbation for methanol going to ethane. This begins with choosing which of these models corresponds to the λ = 0 state. The λ = 0
state must always have enough atoms to accommodate both
molecular extremes, even if invisible “dummy” atoms must be
added as placeholders for atoms that appear as the perturbation
proceeds.

Ethane is the logical choice for the λ = 0 state, because methanol
can be completely subsumed within ethane. However, to illustrate
that the choice is arbitrary, the calculation in the opposite direc-
tion, for methanol going to ethane, is described.
The λ = 0 state is provided to FDiscover as would any model for
simulation, in ordinary coordinate (.car) and molecular data (.mdf)
files. In the FDiscover command input file (.inp), you can specify
which atoms are being perturbed and their limiting states with the
Warp command. The limiting states are defined completely by the
potential atom type, the partial atomic charge, and the mass of
each atom.
Analyzing the output of a methanol-to-ethane free energy calculation
The free energy summary table printed out at the end of the completed free energy loop (after the last resume) contains the complete history. Throughout the run, a partial summary table is
output after each resume and contains all the available data.
Assessing convergence
Graphing the Boltzmann factor [exp –(∆E ⁄ kT)] vs. time aids in
assessing convergence during a given calculation. Systematic drift
in exp –(∆E ⁄ kT) over time is symptomatic of nonconvergence.
To simplify analysis, various quantities are output during free
energy calculations to a file with the extension .tot. Figure 32 plots

[Figure 32: plot of exp(–∆E/kT) (vertical axis, 0.8 to 1.1) against time (horizontal axis, 10 to 20 ps), with curves for λ = 0.056, λ = 0.44, and λ = 0.94]
Figure 32. The ensemble-averaged quantity from Eq. 120 plotted vs. time for three values of λ
Each point plotted is actually an average of 10 points sampled over 100 fs.

Column 5 of the .tot file [exp –(∆E ⁄ kT)] for the first, third, and last
λ points for the methanol-to-ethane gas phase calculation. Note
that there is a slight upward drift in the curve for the first λ point,
although the others seem to have converged.
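One hedged way to quantify such drift is to fit a straight line to the Boltzmann-factor time series and to compare the averages of its first and second halves. The sketch below assumes the values have already been read from column 5 of the .tot file into Python lists; the layout of that file is not described here, so no parser is shown, and the function names are hypothetical.

```python
def linear_drift(times, values):
    """Least-squares slope of values against times (value units per time unit)."""
    n = len(times)
    if n < 2 or n != len(values):
        raise ValueError("need two or more points and equal-length sequences")
    t_mean = sum(times) / n
    v_mean = sum(values) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (v - v_mean) for t, v in zip(times, values))
    return sxy / sxx


def half_means(values):
    """Means of the first and second halves of a series."""
    if len(values) < 2:
        raise ValueError("need at least two points")
    half = len(values) // 2
    first = sum(values[:half]) / half
    second = sum(values[half:]) / (len(values) - half)
    return first, second


# Synthetic series standing in for exp(-dE/kT) sampled from 10 to 20 ps.
times = [10.0 + 0.1 * i for i in range(101)]
drifting = [0.95 + 0.002 * (t - 10.0) for t in times]
print("slope per ps:", linear_drift(times, drifting))
print("half means:", half_means(drifting))
```

A slope that is large compared with the scatter of the data, or half means that differ by more than the statistical error, would suggest that the λ point needs longer equilibration.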

Absolute free energy


This section includes
Theory and implementation
Example: Fentanyl
Analysis of results

Theory and implementation


This section describes a technique for convenient, precise, and
numerically efficient determination of absolute free energies of
stable or unstable constrained model conformations. The tech-
nique of absolute free energy is general and can be applied in a trans-
parent manner to systems in a vacuum or in solution, under any
conditions of volume and/or temperature.
This approach is a special case of the thermodynamic integration
approach to free energy calculations, which is itself a general
method for computing the change in free energy upon going from
one thermodynamic state to another. Absolute free energy simply
constrains one of these states to be a model system for which the
absolute free energy is known analytically. By integrating from a
known, albeit model, state to the final real state, the absolute free
energy becomes the sum of the numerically computed thermody-
namic integration step and the analytical absolute free energy of
the model state.
Uses of absolute free energy calculations
Evaluating absolute free energies for particular conformations is an important goal for several reasons:
♦ Absolute free energy values of different thermodynamic states
can be compared directly without having to devise a transition
pathway between them, as is necessary in relative free energy
calculations.

♦ The poor convergence properties associated with reversible
structural changes in relative free energy calculations can be
avoided.
This section includes a description of the essential characteristics
of the absolute free energy method, including a derivation of the
ideal solid model and the thermodynamic integration method.
Derivation for ideal systems
The absolute free energy algorithm depends on defining a model for which the partition function in Eq. 114 can be derived analytically. The model implemented in the FDiscover program is an
ideal solid. That is, the atoms in the system are constrained har-
monically to a lattice (analogous to a solid) and do not interact
with each other (analogous to the familiar ideal gas). The Hamilto-
nian for such a system is:

$$H_{\text{Ideal Solid}} = \frac{1}{h^{3N}} \sum_{i}^{N} \frac{1}{2m_i}\left(p_{x_i}^2 + p_{y_i}^2 + p_{z_i}^2\right) + \sum_{i}^{N} K_i\left[(x_i - x_i^0)^2 + (y_i - y_i^0)^2 + (z_i - z_i^0)^2\right] \qquad \text{Eq. 125}$$

where the first summation is simply the kinetic energy (including
a reciprocal of Planck’s constant for each degree of freedom, a
quantum effect), and the second is a harmonic function constrain-
ing each atom to a corresponding lattice point (xi0, yi0, zi0) with a
force constant of Ki. Note that there are no terms for interactions
between particles. This simplification is what makes an analytical
solution possible. Substituting this Hamiltonian into the partition
function gives:

$$Q = \prod_{i=1}^{N} \left[\frac{2m_i}{K_i}\left(\frac{\pi k_B T}{h}\right)^{2}\right]^{3/2} \qquad \text{Eq. 126}$$

Returning to Eq. 114, the free energy of the model ideal solid can
be written as:

$$A_{\text{ideal solid}} = -\frac{3}{2}\,k_B T \sum_{i=1}^{N} \ln\left[\frac{2m_i}{K_i}\left(\frac{\pi k_B T}{h}\right)^{2}\right] \qquad \text{Eq. 127}$$

A remarkable result of this formula is that it does not depend on
coordinates—neither the model coordinates nor the lattice site
coordinates. It depends only on the mass of the atoms and the
force constant used for the harmonic constraint. This property has
important practical implications for free energy calculations, mak-
ing it possible to choose whatever set of lattice site coordinates
gives the best convergence properties for the subsequent thermo-
dynamic integration step.
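Eq. 127 is simple enough to evaluate directly. The sketch below is an illustration under stated assumptions rather than the FDiscover implementation: the function name is hypothetical, masses are assumed to be given in atomic mass units, and spring constants are assumed to be in kcal mol⁻¹ Å⁻² (the units required for the argument of the logarithm to be dimensionless). The physical constants are the exact SI values.

```python
import math

K_BOLTZMANN = 1.380649e-23     # J/K
H_PLANCK = 6.62607015e-34      # J s
AVOGADRO = 6.02214076e23       # 1/mol
AMU_TO_KG = 1.66053906660e-27  # kg per atomic mass unit
KCAL_TO_J = 4184.0             # J per kcal


def ideal_solid_free_energy(masses_amu, spring_constants, temperature):
    """Free energy of the ideal solid (Eq. 127), returned in kcal/mol.

    masses_amu       -- atomic masses in amu (assumed unit)
    spring_constants -- K_i in kcal mol^-1 A^-2 (assumed unit)
    temperature      -- temperature in K
    """
    kt = K_BOLTZMANN * temperature  # J per particle
    log_sum = 0.0
    for mass, k_spring in zip(masses_amu, spring_constants):
        m_si = mass * AMU_TO_KG
        k_si = k_spring * KCAL_TO_J / AVOGADRO / 1.0e-20  # J/m^2
        log_sum += math.log((2.0 * m_si / k_si) * (math.pi * kt / H_PLANCK) ** 2)
    return -1.5 * kt * log_sum * AVOGADRO / KCAL_TO_J


# Two atoms (carbon and hydrogen masses) with the same illustrative spring constant.
print(ideal_solid_free_energy([12.011, 1.008], [10.0, 10.0], 300.0))
```

No coordinates enter the function, which mirrors the observation above that the ideal solid term depends only on masses, spring constants, and temperature.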
Thermodynamic integration—derivation for real systems
As shown above, integrating the partition function analytically may be possible for simple Hamiltonians. However, for more realistic systems that include many nonbond and bond interactions between atoms, an analytic solution is impossible. As in the relative free energy calculation (see FDTI), we use thermodynamic
integration to determine the change in free energy:

$$A_1 - A_0 = \int_0^1 \frac{\partial A(\lambda)}{\partial\lambda}\,d\lambda \qquad \text{Eq. 128}$$

Substituting the equation for the free energy (Eq. 114) into Eq. 128,
we can write the difference in free energy as a function of the par-
tition function Q(λ):

$$A_1 - A_0 = \int_0^1 \frac{\partial\left[-k_B T \ln Q(\lambda)\right]}{\partial\lambda}\,d\lambda = -k_B T \int_0^1 \frac{1}{Q(\lambda)}\frac{\partial Q(\lambda)}{\partial\lambda}\,d\lambda \qquad \text{Eq. 129}$$

Without defining explicitly how the Hamiltonian depends on λ, we can write the difference in the free energies as:

$$A_1 - A_0 = \int_0^1 \left\langle \frac{\partial H(p, r, \lambda)}{\partial\lambda} \right\rangle_{\lambda} d\lambda \qquad \text{Eq. 130}$$

The problem of computing a free energy difference is thus simplified to that of computing the expectation value of a derivative of
the Hamiltonian. Since an expectation value is, according to Gibbs’
postulate (an axiom of statistical mechanics), the ensemble average
of the quantity, it is easily computed as the average of that quantity
over a suitably equilibrated set of snapshots of the system.
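The step from Eq. 129 to Eq. 130 can be made explicit. The sketch below assumes the classical canonical form of the partition function, $Q(\lambda) = C \iint e^{-H(p,r,\lambda)/k_B T}\,dp\,dr$ with $C$ independent of λ; that form is an assumption here, since Eq. 114 is not reproduced in this section.

```latex
\frac{1}{Q(\lambda)}\frac{\partial Q(\lambda)}{\partial\lambda}
  = \frac{\iint \left(-\frac{1}{k_B T}\,\frac{\partial H}{\partial\lambda}\right)
          e^{-H/k_B T}\,dp\,dr}
         {\iint e^{-H/k_B T}\,dp\,dr}
  = -\frac{1}{k_B T}
    \left\langle \frac{\partial H(p,r,\lambda)}{\partial\lambda} \right\rangle_{\lambda}
```

Multiplying by $-k_B T$ and integrating over λ from 0 to 1 turns the integrand of Eq. 129 into the ensemble average that appears in Eq. 130.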
These snapshots are typically generated from either a molecular
dynamics or a Monte Carlo simulation. Thus, appropriately aver-
aging the results of a molecular dynamics trajectory enables
Eq. 130 to be evaluated. This is far easier than calculating the gen-
eral partition function of Eq. 129.
Eq. 130 is perfectly general for any classical system and is the fun-
damental equation of all thermodynamic integration methods.
However, we have not yet described what the Hamiltonian is or
how it can be parameterized in terms of λ. The Hamiltonian could
be parameterized in an infinite number of ways. Because a func-
tion of state (the free energy) is being integrated, the path between
the end points is arbitrary. For absolute free energy calculations, it
turns out that the simplest form is perfectly adequate.
What is the Hamiltonian?
The parameterization scheme adopted by the FDiscover program
is straightforward. The Hamiltonian is defined as a linear combi-
nation of the two potential energy functions that describe the
extreme states:

$$H(\lambda) = K(p) + (1 - \lambda)V_0 + \lambda V_H \qquad \text{Eq. 131}$$

Here, K(p) is the kinetic energy (because the two states are com-
pletely defined by their respective potential energy functions, the
kinetic energy does not have to be coupled to λ), V0 is the normal
potential energy function (including bonds, angles, torsions, etc.),
and VH is a harmonic oscillator site potential given by:

$$V_H = \sum_i K_i\,(r_i - r_i^0)^2 \qquad \text{Eq. 132}$$

where Ki is the ith atom’s spring constant, ri is the ith atom’s instantaneous coordinate, and ri0 is the reference lattice position of the noninteracting atoms (often referred to as an Einstein solid).
Parameterization of the Hamiltonian
This choice of
parameterization is motivated by the template-forcing concept in
the FDiscover program and by previous free energy evaluations
(Hoover and Ree 1967). In Eq. 131, λ is greater than 0 and less than
1, and describes an energy–space path between a system described
by an unadulterated molecular forcefield (λ = 0) and one repre-
sented by independent harmonic oscillators (λ = 1).
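The site potential of Eq. 132 and the λ-dependent potential of Eq. 131 can be sketched as follows. The function names are hypothetical, coordinates are plain (x, y, z) tuples, and V0 is passed in as a number because evaluating the forcefield itself is outside the scope of this illustration.

```python
def harmonic_site_potential(coords, reference, spring_constants):
    """V_H = sum_i K_i |r_i - r_i^0|^2, as in Eq. 132 (note: no factor of 1/2)."""
    total = 0.0
    for r, r0, k in zip(coords, reference, spring_constants):
        total += k * sum((c - c0) ** 2 for c, c0 in zip(r, r0))
    return total


def coupled_potential(lam, v0, vh):
    """Potential energy part of Eq. 131: (1 - lambda) V_0 + lambda V_H."""
    return (1.0 - lam) * v0 + lam * vh


# Two atoms displaced from their lattice sites (illustrative numbers only).
coords = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
reference = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
vh = harmonic_site_potential(coords, reference, [2.0, 3.0])
print(vh, coupled_potential(0.25, -10.0, vh))
```

At λ = 0 the coupled potential is the unmodified forcefield energy and at λ = 1 it is the Einstein solid, matching the two limits described above.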
Eq. 132 is thus the reference system for which an absolute free
energy can be directly calculated. The potential VH restricts the
exploration of phase space to a region defined by atomic mean-
squared displacements relative to the Einstein solid. Conse-
quently, the choice of both the reference state and spring constants,
in general, affects the calculated free energies. For most cases, the
energy-minimized coordinates of a mechanically stable structure
are used for the reference Einstein solid. Using Eq. 127 for the free
energy of the Einstein solid A1 and Eqs. 130 through 132, the abso-
lute free energy of the real state is:
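Absolute free energy of the real state
$$A_0 = A_1 - \int_0^1 \left\langle \frac{\partial}{\partial\lambda}\left[K(p) + (1-\lambda)V_0 + \lambda V_H\right] \right\rangle_{\lambda} d\lambda = A_1 + \int_0^1 \left\langle V_0 - V_H \right\rangle_{\lambda} d\lambda \qquad \text{Eq. 133}$$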

Computational considerations and precautions
The simple form of Eq. 133 makes evaluating its ensemble average straightforward. A comprehensive dynamics trajectory for a given
value of λ represents an ensemble of structures for that λ. By com-
puting and averaging the values of (V0 – VH) for each structure in
this ensemble, the integrand can be numerically estimated for any
given value of λ. By performing several such calculations for many

values of λ between 0 and 1, eventually the function can be numerically integrated.
Several practical simulation considerations are apparent from Eqs.
131 and 133.
For example, when λ is close to zero, the calculated ensemble of
structures includes configurations far from the reference state (the
ensemble is generated according to Eq. 131). Therefore, fluctua-
tions of VH are large when VH is evaluated for this ensemble.
Conversely, when λ is close to one, the structures generated are
minimally influenced by the real forcefield. These ensembles can
include structures with distorted bonds, angles, and even overlap-
ping atoms, leading to large fluctuations in V0 in Eq. 133. Large
fluctuations of (V0 – VH) decrease the precision with which its
integral can be calculated. The overall effect is to destabilize the
integral and, in extreme cases, cause it to diverge.
These undesirable effects can be minimized by reasonable choices
of reference structures, spring constants, and integration algorithm. For example, the FDiscover program uses a Gauss–Legendre quadrature algorithm to integrate Eq. 133. This has the
useful property that one need not evaluate the function at the
boundaries (where the divergence is worst).
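The quadrature over λ can be sketched as follows. This illustrates Gauss–Legendre integration in general, not FDiscover's implementation: `mean_dv` is a hypothetical callable standing in for the ensemble average of (V0 – VH) that a dynamics run at a given λ would supply, and the nodes are found by Newton iteration on the Legendre polynomial so that only the standard library is needed.

```python
import math


def gauss_legendre(n):
    """Nodes and weights of the n-point Gauss-Legendre rule on [-1, 1]."""
    nodes, weights = [], []
    for k in range(1, n + 1):
        x = math.cos(math.pi * (k - 0.25) / (n + 0.5))  # initial guess for root k
        for _ in range(100):
            p_prev, p = 1.0, x                           # P_0(x), P_1(x)
            for j in range(2, n + 1):                    # recurrence up to P_n(x)
                p_prev, p = p, ((2 * j - 1) * x * p - (j - 1) * p_prev) / j
            dp = n * (x * p - p_prev) / (x * x - 1.0)    # derivative P_n'(x)
            step = p / dp
            x -= step
            if abs(step) < 1e-15:
                break
        nodes.append(x)
        weights.append(2.0 / ((1.0 - x * x) * dp * dp))
    return nodes, weights


def integrate_over_lambda(mean_dv, n_points, lower=0.0, upper=1.0):
    """Approximate the integral of mean_dv(lambda) from lower to upper."""
    nodes, weights = gauss_legendre(n_points)
    half = 0.5 * (upper - lower)
    mid = 0.5 * (upper + lower)
    return sum(half * w * mean_dv(mid + half * x) for x, w in zip(nodes, weights))


# Smooth stand-in integrand; real values would come from dynamics at each lambda.
print(integrate_over_lambda(lambda lam: 3.0 * lam ** 2, 4))
```

Every node lies strictly inside the integration range, so the integrand is never evaluated at λ = 0 or λ = 1, which is the property noted above.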
Choice of the reference state
The choice of the reference state is the most critical step in an absolute free energy calculation. The reference state determines where
the sampling of configuration space is centered. Choosing refer-
ence states that are only slightly different (for example, 0.1 Å rms
in coordinates) can change the free energy significantly if the ref-
erence state has any residual strain.
Best results are obtained when the reference state represents a min-
imum-energy conformation on the normal energy surface. If the
reference structure is a minimum for both V0 and VH, there is no
impetus for the conformation to wander away, which, as described
above, is the primary cause of divergence.
For non-minimum configurations such as the saddle point c in
Figure 34, excess strain can be removed by minimizing with tor-
sional restraints (see Example: Fentanyl).
Limitations
The absolute free energy technique is primarily used to evaluate
the free energy of different conformations of the same model. This
is particularly difficult to do by perturbation methods alone, since

a path from one conformation to another may be difficult to construct and model (for example, a path converting an α-helix into a
β-sheet). Although free energy methods are path independent
(because what is being evaluated is the difference between ther-
modynamic state functions), the integration through the path
must be performed reversibly, which is problematic for large struc-
tural rearrangements.

Example: Fentanyl
This example presents a calculation of the free energies of various
conformational states of fentanyl (Figure 33). Previous studies of

[Figure 33: structure diagram of fentanyl with atoms C1, C2, N3, C4, and C5 labeled]
φ1 = C1–C2–N3–C4
φ2 = C2–N3–C4–C5
Figure 33. Definitions of φ1 and φ2 for fentanyl

fentanyl have examined the conformational behavior around the
anilido nitrogen and identified two main classes of energy minima
in the 2D energy map (Tolleaneare et al. 1986). Figure 34 shows
such a map, generated by contouring the energies obtained from a
flexible-geometry minimization study performed with the Discover program. These minima are separated by energy barriers of approximately 10 kcal mol⁻¹ and differ from each other by about 1 kcal mol⁻¹. The free energy of the states represented by the minima and the barriers between them will be calculated by the absolute free energy method.

[Figure 34: contour map of energy with φ1 on the horizontal axis and φ2 on the vertical axis, each running from -100 to 100; points a, b, and c are marked]
Figure 34. Two-dimensional energy map of fentanyl as a function of the two dihedral angles defined in Figure 33
Locations a and b correspond to the two unique minima, and c is at a transition state.

The reference state has two roles in the free energy calculation. A
trivial contribution is the free energy it contributes as an ideal
solid. This contribution is trivial because it does not depend on the
conformation of the reference structure. It depends only on the
masses of its atoms and the spring constants constraining each
atom to the lattice (see Eq. 127).
The nontrivial role is that it determines which part of configura-
tional space is sampled during dynamics. In this example, the free
energies of states a, b, and c (Figure 34) of fentanyl are compared.
The reference states for a and b correspond to completely mini-
mized structures at these points. Point c in Figure 34 represents the

free energy at a barrier. Obviously, it would be incorrect to minimize this reference structure; it would simply roll down into one
of the adjacent wells. The lowest-energy structure at the saddle
point can be constructed by forcing the dihedral angles to adopt
the values of the saddle point (using the force torsion command in
the standalone version of the FDiscover program, or the Con-
straint/TorsionForce command in the Insight version) while min-
imizing the rest of the model (this, of course, is how the energy
map in Figure 34 was generated in the first place).
It is important to choose the lowest-energy structure consistent
with a given conformational state, so that the free energy calcu-
lated does not contain excess enthalpy. Constraining the sampling
of configurational space to a rather small region localized around
the reference structure limits the distribution of energies that is
sampled. Moving the reference structure to a slightly higher-
energy conformation would sample different energies and change
the total free energy.
Consider Figure 35, where the history of φ1 and φ2 is superim-
posed on the φ1 ⁄ φ2 energy map for one of the λ ensembles. Note
how only the local region around the reference structure is sampled. While this may be unsettling (what is the meaning of a free energy that does not represent all of configurational space?),
remember that you are interested in the difference in free energy
between closely related states (a, b, and c differ mainly by rotation
of two dihedral angles). To achieve this resolution, the sampling
must be restricted accordingly.
Hints for setting up the absolute free energy calculation
In the Insight version of the FDiscover program, an absolute free energy calculation is set up using the Parameters/Absolute command. The system is first minimized, then the atoms are tethered
to the minimized locations. The Parameters/Minimize and
Parameters/Dynamics commands can be used to control the initial
minimization, as well as the dynamics used to sample the confor-
mational space.
In the standalone version, the reference state is defined in the FDis-
cover input file (.inp), with all atoms being tethered. Using the
tether command allows you to use different reference structures
easily. Once the tethered atom list is generated, the model could
undergo dynamics to randomize the starting point just prior to
starting the free energy calculation.

[Figure 35: the 2D energy map of Figure 34 (φ1 on the horizontal axis, φ2 on the vertical axis, each running from -100 to 100) with the sampled trajectories overlaid]
Figure 35. Time history of the dihedral angles φ1 and φ2 superimposed on the 2D energy map of Figure 34
Two trajectories are plotted, one for a minimum corresponding to state a in Figure 34, the other for the transition state (c). Note that only a small fraction of the conformational space of the model needs to be sampled to estimate the free energy of a conformer. Each trajectory corresponds to the first λ value, i.e., when the potential function is most like the standard potential.

There is an important exception to this rule. Once a free energy calculation is initiated, the reference structure is saved into a coordinate file with the extension .cli. Thus, a permanent record is
maintained of the reference state for each free energy calculation.
This file is used by the FDiscover program to restart free energy
calculations that may have been interrupted. The file format is
identical to the .car and .cor files so that they can be used inter-
changeably.

One important property of the .cli file is that, if the file exists when
the free energy calculation begins (for example, from an earlier or
different run), the reference state is read from the file regardless of
any prior tethered atom list generated. This does not mean that if
a .cli file is provided, a tethered atom list command is not needed.
Rather, if the .cli file is present, no matter what conformation is
used for the tethered atom list command, the actual reference
structure used is that in the .cli file. Therefore, before starting a new free energy calculation that requires a different reference state, remove any existing .cli file that would override the internally generated reference state.
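A small pre-run check can guard against an old reference file being picked up by accident. The sketch below is a convenience script, not part of FDiscover; it assumes (hypothetically) that the reference file sits in the run directory and is named after the run's root name with the extension .cli, and the function name is invented for illustration.

```python
from pathlib import Path


def existing_cli_files(run_dir, root_name):
    """Return .cli files in run_dir whose stem matches root_name (case-insensitive)."""
    matches = []
    for path in Path(run_dir).iterdir():
        same_stem = path.stem.lower() == root_name.lower()
        if path.is_file() and path.suffix.lower() == ".cli" and same_stem:
            matches.append(path)
    return sorted(matches)


if __name__ == "__main__":
    # "FENTANYL" is used here only as an example root name.
    for path in existing_cli_files(".", "FENTANYL"):
        print(f"Existing reference file would be reused: {path}")
```

The script only reports files; whether to delete or rename them is left to the user, since an existing .cli file may be wanted when restarting an interrupted run.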
The magnitude of the spring constants in Eq. 132 indirectly affects
the free energy calculated by controlling the volume of phase
space being sampled during dynamics. Large spring constants
bias the sampling to the immediate neighborhood of the reference
structure. Smaller values allow the model to sample more space.
The spring constants also affect the error associated with a calcu-
lation. Ensembles that contain structures far from the reference
state make greater contributions to the free energy, by virtue of the
greater fluctuations in (V0 – VH) for this ensemble.
The FDiscover program provides two methods for assigning
spring constants:
1. All atoms are assigned the same value. You may want to assign
large Ki values when you are studying a mechanically unstable
state, to constrain dynamics to phase space near the reference
structure. In the FDiscover standalone version this choice is
specified by including the keyword assigning in the free
energy command along with the specification of a spring con-
stant. In the Insight environment, the force constant is con-
trolled by changing the Spring Constant parameter in the
Parameters/Absolute command.
2. Spring constants can be estimated from the definition of mean-
squared displacements under a harmonic potential. In this case,
the following expression is used:

$$K_i = \frac{k_B T}{\left\langle \Delta x^2 \right\rangle_0} \qquad \text{Eq. 134}$$

and the default is for each atom to receive a unique value of Ki.
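Method 2 can be illustrated with a short script that applies Eq. 134 to a set of stored snapshots. Several points are assumptions made for this sketch rather than statements about FDiscover: the mean-squared displacement of each atom is measured from that atom's average position over the snapshots, coordinates are in Å, and the function name and data layout are hypothetical.

```python
GAS_CONSTANT = 0.0019872041  # k_B per mole, in kcal mol^-1 K^-1


def spring_constants_from_msd(frames, temperature):
    """Per-atom K_i = k_B T / <dx^2> (Eq. 134), in kcal mol^-1 A^-2.

    frames -- list of snapshots; each snapshot is a list of (x, y, z) tuples in A
    """
    n_frames = len(frames)
    n_atoms = len(frames[0])
    kt = GAS_CONSTANT * temperature
    constants = []
    for i in range(n_atoms):
        # Average position of atom i over all snapshots.
        mean = [sum(frame[i][d] for frame in frames) / n_frames for d in range(3)]
        # Mean-squared displacement from that average position.
        msd = sum(
            sum((frame[i][d] - mean[d]) ** 2 for d in range(3)) for frame in frames
        ) / n_frames
        if msd == 0.0:
            raise ValueError(f"atom {i} does not move; K_i is undefined")
        constants.append(kt / msd)
    return constants


# One atom oscillating by 1 A about the origin (illustrative data only).
frames = [[(-1.0, 0.0, 0.0)], [(1.0, 0.0, 0.0)]]
print(spring_constants_from_msd(frames, 300.0))
```

Atoms that move little receive stiff springs and mobile atoms receive soft ones, which is why this method assigns a different Ki to each atom.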

Method 1 is useful for studying mechanically unstable states, for
example, the extended state of decaglycine. Method 2 is more
appropriate for studying mechanically stable states, for example,
the energy-minimized α-helix or hairpin structures of decaglycine.
Typical values of the spring constants range from 1 to 50 kcal mol⁻¹ Å⁻².
If possible, you should let Method 2 choose the spring constants
automatically. However, if the conformational state is not in a local
minimum, you must explicitly specify a spring constant. An esti-
mated value corresponding to a nearby minimum is a good start-
ing point. The error associated with an explicit choice should be
monitored, as well as the configurational space sampled for this
calculation. Error analysis is discussed under Analysis of results.
Another choice that has to be made is how many λ intervals to use
in the integration. This of course depends ultimately on the level
of precision needed and the behavior of the integral and cannot be
anticipated for all cases.
For small flexible models such as fentanyl, using a medium spring constant (50 kcal mol⁻¹ Å⁻²), the use of 10 intervals allows a reasonably behaved integral to be estimated with a statistical error of ±0.5 kcal mol⁻¹.
For careful work, the behavior of the integral (that is, how (V0 –
VH) varies as a function of λ) must be examined. Rapid changes in
(V0 – VH) are indicative of systematic errors. Some regions of the
integral may need to be computed for longer times and with more
intervals, to achieve the desired accuracy.
The number of λ intervals typically ranges from 4 to 12. If adequate
precision cannot be achieved with 12 intervals, look for and correct systematic factors destabilizing the integral (for example: Are the spring constants too weak? Is the reference state at too high an energy? Is the system completely equilibrated?).

Analysis of results
Available output
Although a free energy calculation is done as a single FDiscover
run, the calculation actually consists of several independent cycles
of dynamics equilibration (initialize dynamics) and data collec-
tion (resume dynamics)—as many cycles as λ intervals. The

results of these dynamics calculations are stored independently for
each λ value, in three files.
The standard output file (extension .out) is the most important.
Among other things, it includes a history of average energy values
(potential and kinetic) during each dynamics cycle. Data needed
for the actual free energy calculation are collected only during the
resume dynamics phase. The actual free energy results are in the
SUMMARY OF FREE-ENERGY CALCULATION table included
at the end of each resume dynamics output. The total free energy
can be calculated only at the conclusion of all the λ cycles. How-
ever, the incomplete table is output at the end of each λ interval as
an intermediate result.
A second output file with the extension .tot contains only the
instantaneous values of the kinetic energy plus the parameterized
potential energy for the entire run (all values of λ). The frequency
of output into this file is controlled by the sampling option in the
absolute free energy command and is, by default, output every
step. The .tot file is particularly convenient as input to plotting and
statistical analysis programs (such as SAS or RS1).
The third source of analysis information is the dynamics history
file (extension .his). This file contains all the coordinates and veloc-
ities for each λ run. Because a free energy calculation consists of
several λ cycles, each with its own initialize command, there can
be several history files. To allow for several history files with the
same root name, an option has been added to the initialize com-
mand to change the extension from .his to a number (e.g., FENTA-
NYL.1, FENTANYL.2, etc.). This is specified by including the
keyword save in the initialize command. If save is not included,
the Discover program keeps only the history file corresponding to
the last λ run.
Assessing and minimizing statistical and systematic errors
Major sources of systematic error in these calculations are lack of convergence (that is, failure to equilibrate long enough to achieve thermodynamic equilibrium at each λ value) and insufficient sampling of configurational space. Other sources of systematic error include inaccuracies in the forcefield (both in the functional form and the parameters) and quantum mechanical effects.
Random errors are a natural consequence of free energy calcula-
tions. The statistical distribution of states available to a molecule at
a given temperature is precisely what defines its entropy. Measuring entropy is an inherently statistical process that can be quantified with standard random error analysis procedures. Statistical
errors are calculated for the average of (V0 – VH) at each λ value.
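How FDiscover computes these statistical errors is not specified here. As one common approach for correlated dynamics data, the sketch below uses block averaging: the series of (V0 – VH) values at one λ is cut into blocks, and the scatter of the block means gives a standard error for the overall mean. The function name and the choice of block count are hypothetical.

```python
import math


def block_standard_error(samples, n_blocks):
    """Mean and standard error of the mean from n_blocks equal-sized blocks.

    Trailing samples that do not fill a complete block are ignored.
    """
    block_size = len(samples) // n_blocks
    if n_blocks < 2 or block_size < 1:
        raise ValueError("need at least two non-empty blocks")
    block_means = [
        sum(samples[i * block_size:(i + 1) * block_size]) / block_size
        for i in range(n_blocks)
    ]
    grand_mean = sum(block_means) / n_blocks
    variance = sum((m - grand_mean) ** 2 for m in block_means) / (n_blocks - 1)
    return grand_mean, math.sqrt(variance / n_blocks)


# Illustrative values of (V0 - VH) sampled at one lambda point.
samples = [4.1, 3.9, 4.3, 4.0, 3.8, 4.2, 4.4, 3.7]
print(block_standard_error(samples, 4))
```

Blocks should be longer than the correlation time of the data; otherwise the estimated error is too small.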
Systematic error/convergence
Careful analysis of the random errors reported by the FDiscover program helps point to likely systematic errors. For example,
rapid changes in error as a function of λ could be caused by a fail-
ure to completely equilibrate.
Although repeating the entire run with longer equilibration and sampling times is an obvious solution, it is not a particularly efficient one. For example, the integrals computed with short and with long equilibration and sampling times may be indistinguishable up to λ = 0.8, in which case it is wasteful to compute all intervals at the worst-case level.
For such situations, the absolute free energy command in the
standalone version of the FDiscover program allows you to per-
form a free energy calculation over just a part of the interval. If you
are using the Insight interface to the FDiscover program, the
parameters Lower_lambda, Upper_lambda, and Quadrature
Points in the Parameters/Absolute command control the points
sampled, allowing you to break the calculations into ranges.
You could perform two independent free energy calculations. The
first calculation would use short equilibration and sampling times
to compute the free energy from λ = 0 to 0.8. The second calcula-
tion would compute the free energy from λ = 0.8 to 1.0 with longer
equilibration and sampling times.
The total free energy is then just the sum of the two. Note that,
when summing the free energies from these two runs, the ideal
solid contribution must be included only once. Mathematically,
the integral is simply being broken into pieces:

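$$\int_0^1 d\lambda\,\left\langle V_0 - V_H \right\rangle = \int_0^{0.8} d\lambda\,\left\langle V_0 - V_H \right\rangle + \int_{0.8}^{1.0} d\lambda\,\left\langle V_0 - V_H \right\rangle \qquad \text{Eq. 135}$$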
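The bookkeeping for combining partial runs can be sketched as below. The function and argument names are hypothetical; the statistical errors of the pieces are combined in quadrature, which assumes the runs are statistically independent.

```python
import math


def combine_partial_runs(a_ideal_solid, pieces):
    """Total absolute free energy from adjoining lambda ranges.

    a_ideal_solid -- ideal solid free energy, added exactly once
    pieces        -- (integral, standard_error) for each lambda range
    """
    pieces = list(pieces)
    total = a_ideal_solid + sum(value for value, _ in pieces)
    error = math.sqrt(sum(err ** 2 for _, err in pieces))
    return total, error


# Illustrative numbers: one run for lambda 0 to 0.8, one for 0.8 to 1.0.
print(combine_partial_runs(-5.0, [(2.0, 0.3), (1.0, 0.4)]))
```

Passing the ideal solid term as a separate argument makes it harder to count that contribution twice when the summaries of two runs are added together.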

Absolute free energy files and output
The .cli file contains the free energy comparison list, i.e., the template coordinates prevalent during tethering. This file can be used
as input for further simulations with the same template coordi-
nates, by specifying file filename with the calculate command in
the standalone version of the FDiscover program.

The .tot file contains a history of instantaneous parameterized
energies from the dynamics simulations.
Table of spring constants
A table of spring constants is printed before commencing the first
dynamics simulation for the integration over λ. The data tabulated
are absolute atom #, harmonic force constant, and (optionally)
mean-square displacement and mean harmonic energies as
deduced from spring-constant-generation procedures utilizing
molecular dynamics.
Dynamics running averages
Dynamics running averages contain additional categories for (V0 – VH) and <V0>.
Summary of averages per λ
Each table of dynamics averages generated by resume is followed by a table summarizing contributions to the free energy integral,
including statistical errors and variances on the value dF for each
lambda value.
The final page of the output contains a SUMMARY OF FREE
ENERGY CALCULATIONS. It also includes a total free energy
summary, giving the absolute free energy of the ideal solid (the
λ = 1 state of the template). Each λ value has an associated output
table in the .out file.


A References

Allen, M. P.; Tildesley, D. J. Computer Simulation of Liquids, Clarendon Press, Oxford Science Publications (1987).
Andersen, H. C. “Molecular dynamics simulations at constant
pressure and/or temperature”, J. Chem. Phys., 72, 2384 (1980).
Andersen, H. C. “Rattle: A “velocity” version of the Shake algo-
rithm for molecular dynamics calculations”, J. Comp. Physics,
52, 24–34 (1983).
Berendsen, H. J. C.; Postma, J. P. M.; van Gunsteren, W. F.; DiNola,
A.; Haak, J. R. “Molecular dynamics with coupling to an exter-
nal bath”, J. Chem. Phys., 81, 3684–3690 (1984).
Born, M.; Oppenheimer, J. R. Ann. Physik, 84, 457 (1927).
Brooks, B. R.; Bruccoleri, R. E.; Olafson, B. D.; States, D. J.;
Swaminathan, S.; Karplus, M. J. Comp. Chem., 4, 187–217 (1983).
Brooks, C. L., III; Brunger, A. T.; Karplus, M. Biopolymers, 24, 843
(1985a).
Brooks, C. L., III; Pettitt, B. M.; Karplus, M. “Structural
and energetic effects of truncating long range interactions in
ionic and polar fluids”, J. Chem. Phys., 83, 5897–5908 (1985b).
Brown, D.; Clarke, J. H. R. “A comparison of constant energy,
constant temperature, and constant pressure ensembles in
molecular dynamics simulation of atomic liquids”, Molecular
Physics, 51, 1243–1252 (1984).
Burchart, E. de Vos Studies on Zeolites; Molecular Mechanics, Frame-
work Stability, and Crystal Growth, Ph.D. Thesis, Technische
Universiteit Delft (1992).
Casewit, C. J.; Colwell, K. S.; Rappé, A. K., J. Am. Chem. Soc., 114,
10035 (1992a).


Casewit, C. J.; Colwell, K. S.; Rappé, A. K., J. Am. Chem. Soc., 114,
10046 (1992b).
Catlow, C. R. A.; Norgett, M. J. “Lattice structure and stability of
ionic materials”, personal communication (1976).
Dauber–Osguthorpe, P.; Roberts, V. A.; Osguthorpe, D. J.; Wolff, J.;
Genest, M.; Hagler, A. T. “Structure and energetics of ligand
binding to proteins: E. coli dihydrofolate reductase–trimetho-
prim, a drug–receptor system”, Proteins: Structure, Function
and Genetics, 4, 31–47 (1988).
Deem, M. W.; Newsam, J. M.; Sinha, S. K. “The h = 0 term in
Coulomb sums by the Ewald transformation”, J. Phys. Chem., 94,
8356–8359 (1990).
Demontis, P.; Yashonath, S.; Klein, M. L. “Localization and mobil-
ity of benzene in sodium-Y zeolite by molecular dynamics cal-
culations”, J. Phys. Chem., 93, 5016 (1989).
Ding, H. Q.; Karasawa, N.; Goddard, W. A., III “Atomic level sim-
ulations on a million particles: The cell multipole method for
Coulomb and London nonbond interactions”, J. Chem. Phys.,
97, 4309 (1992).
Dinur, U.; Hagler, A. T. “Approaches to empirical force fields”, in
Reviews of Computational Chemistry, Vol. 2; Lipkowitz, K. B.;
Boyd, D. B. Eds.; VCH: New York; Chapter 4 (1991).
Ermer, O. “Calculation of molecular properties using force fields.
Applications in organic chemistry”, Structure and Bonding, 27,
161–211 (1976).
Ewald, P. P. Ann. d. Physik, 64, 253 (1921).
Fletcher, R. Practical Methods of Optimization, Vol. 1, Unconstrained
Optimization, John Wiley & Sons: New York (1980).
Fletcher, R.; Reeves, C. M. Comput. J., 7, 149 (1964).
Garofalini, S. H. J. Amer. Ceram. Soc., 67, 133 (1984).
Garofalini, S. H.; Zirl, D. M., J. Amer. Ceram. Soc., 73, 2848 (1990).
Greengard, L.; Rokhlin, V. I. “A fast algorithm for particle simula-
tions”, J. Comp. Phys., 73, 325 (1987).
Gunsteren, W. F.; Karplus, M. J. Comp. Chem., 1, 266 (1980).



Ha, S. N.; Giammona, A.; Field, M.; Brady, J. W. “A revised poten-
tial-energy surface for molecular mechanics studies of carbo-
hydrates”, Carbohydrate Research, 180, 207–221 (1988).
Hagler, A. T.; Dauber, P.; Lifson, S. “Consistent force field studies
of intermolecular forces in hydrogen bonded crystals. III. The
C=O…H–O hydrogen bond and the analysis of the energetics
and packing of carboxylic acids”, J. Am. Chem. Soc., 101, 5131–
5141 (1979a).
Hagler, A. T.; Ewig, C. S. “On the use of quantum energy surfaces
in the derivation of molecular force fields”, Comp. Phys.
Comm., 84, 131–155 (1994).
Hagler, A. T.; Lifson, S.; Dauber, P. “Consistent force field studies
of intermolecular forces in hydrogen bonded crystals. II. A
benchmark for the objective comparison of alternative force
fields”, J. Am. Chem. Soc., 101, 5122–5130 (1979b).
Hagler, A. T.; Stern, P. S.; Sharon, R.; Becker, J. M.; Naider, F. “Com-
puter simulation of the conformational properties of oligopep-
tides. Comparison of theoretical methods and analysis of
experimental results”, J. Am. Chem. Soc., 101, 6842–6852
(1979c).
Halgren, T. A., J. Amer. Chem. Soc., 114, 7827–7843 (1992).
Halgren, T. A.; Nachbar, R. B. “The Merck molecular force field: IV.
Conformational energies and geometries for MMFF94”, J.
Comp. Chem., 17, 587–615 (1996).
Harvey, S. C., “Treatment of electrostatic effects in macromolecular
modeling”, Proteins: Structure, Function, and Genetics, 5, 78–92
(1989).
Hill, J. -R.; Sauer, J. “Molecular mechanics potential for silica and
zeolite catalysts based on ab initio calculations. 1. Dense and
microporous silica”, J. Phys. Chem., 98, 1238–1244 (1994).
Homans, S. W. “A molecular mechanical force field for the confor-
mational analysis of oligosaccharides: Comparison of theoret-
ical and crystal structures of Man α1–3 Man β1–4 GlcNAc”,
Biochemistry, 29, 9110–9118 (1990).
Hoover, W. “Canonical dynamics: Equilibrium phase–space distri-
butions”, Phys. Rev., A31, 1695–1697 (1985).

Hwang, J. K.; Warshel, A. “Semiquantitative calculations of
catalytic free energies in genetically modified enzymes”,
Biochemistry, 26, 2669–2673 (1987).
Hwang, M.-J.; Stockfisch, T. P.; Hagler, A. T. “Derivation of Class II
force fields. 2. Derivation and characterization of a Class II
force field, CFF93, for the alkyl functional group and alkane
molecules”, J. Amer. Chem. Soc., 116, 2515–2525 (1994).
Kao, J.; Allinger, N. L., J. Amer. Chem. Soc., 99, 975 (1977).
Karasawa, N.; Goddard, W. A., III “Acceleration of convergence
for lattice sums”, J. Phys. Chem., 93, 7320–7327 (1989).
Karasawa, N.; Goddard, W. A., III “Force fields, structures, and
properties of polyvinylidene fluoride crystals,” Macromole-
cules, 25, 7268 (1992).
Kirkwood, J. G. “Statistical mechanics of fluid mixtures”, J. Chem.
Phys., 3, 300–313 (1935).
Kitson, D. H.; Hagler, A. T. “Theoretical studies of the structure
and molecular dynamics of a peptide crystal”, Biochemistry, 27,
5246–5257 (1988).
Kohler, A. E.; Garofalini, S. H. Langmuir, 10, 4664 (1994).
Levitt, M.; Lifson, S. J. Molec. Biol., 46, 269 (1969).
Levy, R. M.; McCammon, J. A.; Karplus, M. Chem. Phys. Lett., 64, 4
(1979).
Lifson, S.; Hagler, A. T.; Dauber, P., J. Amer. Chem. Soc., 101, 5111
(1979).
Liljefors, T.; Tai, J. C.; Li, S.; Allinger, N. L., J. Comp. Chem., 8, 1051
(1987).
Maple, J.; Dinur, U.; Hagler, A. T. “Derivation of force fields for
molecular mechanics and dynamics from ab initio energy sur-
faces”, Proc. Nat. Acad. Sci. USA, 85, 5350–5354 (1988).
Maple, J. R.; Hwang, M.-J.; Stockfisch, T. P.; Dinur, U.; Waldman,
M.; Ewig, C. S; Hagler, A. T. “Derivation of Class II force fields.
1. Methodology and quantum force field for the alkyl func-
tional group and alkane molecules”, J. Comput. Chem., 15, 162–
182 (1994a).



Maple, J. R.; Hwang, M.-J.; Stockfisch, T. P.; Hagler, A. T.
“Derivation of Class II force fields. 3. Characterization of a quantum
force field for the alkanes”, Israel J. Chem., 34, 195–231 (1994b).
Maple, J. R.; Thacher, T. S.; Dinur, U.; Hagler, A. T. “Biosym force
field research results in new techniques for the extraction of
inter- and intramolecular forces”, Chemical Design Automation
News, 5 (9), 5–10 (1990).
Mayo, S. L.; Olafson, B. D.; Goddard, W. A. III “DREIDING: A
generic force field”, J. Phys. Chem., 94, 8897–8909 (1990).
McCammon, J. A.; Gelin, B. R.; Karplus, M.; Wolynes, P. G. Nature,
262, 325–326 (1976).
McQuarrie, D. A. Statistical Mechanics, Harper & Row, Chapter 3 (1976).
Mezei, M.; Beveridge, D. “Free energy simulations”, Ann. N.Y.
Acad. Sci., 482, 1–23 (1986).
Miller, G. W.; Knaebel, K. S.; Ikels, K. G., AIChE J., 33, 194 (1987).
Momany, F. A.; Carruthers, L. M.; McGuire, R. F.; Scheraga, H. A.,
J. Phys. Chem., 78, 1595 (1974).
Momany, F. A.; Rone, R. J. Comp. Chem., 13, 888–900 (1992).
Némethy, G.; Pottle, M. S.; Scheraga, H. A., J. Chem. Phys., 87, 1883
(1983).
Nosé, S. “A molecular dynamics method for simulations in the
canonical ensemble”, Molec. Phys., 52, 255–268 (1984a).
Nosé, S. “A unified formulation of the constant temperature
molecular dynamics methods”, J. Chem. Phys., 81, 511–519
(1984b).
Nosé, S. “Constant temperature molecular dynamics methods”,
Prog. Theoret. Phys. Supplement, 103, 1–46 (1991).
Parrinello, M.; Rahman, A. “Polymorphic transitions in single
crystals: A new molecular dynamics method”, J. Appl. Phys.,
52, 7182–7190 (1981).
Pickett, S. D.; Nowak, A. K.; Thomas, J. M.; Peterson, B. K.; Swift,
J. F. P.; Cheetham, A. K.; Denouden, C. J. J.; Smit, B.; Post, M. F.
M. “Mobility of adsorbed species in zeolites—A molecular
dynamics simulation of xenon in silicalite”, J. Phys. Chem., 94,
1233 (1990).

Powell, M. J. D. “Restart Procedure for the Conjugate Gradient
Method”, Mathematical Programming, 12, 241 (1977).
Press, W. H.; Flannery, B. P.; Teukolsky, S. A.; Vetterling, W. T.
Numerical Recipes, The Art of Scientific Computing; Cambridge
University Press, Cambridge (1986).
Quirke, N.; Jacucci, G. “Energy difference functions in Monte
Carlo simulations: Application to (1) the calculation of free
energy of liquid nitrogen. II. The calculation of fluctuation in
Monte Carlo averages”, Mol. Phys., 45, 823–838 (1982).
Rappé, A. K.; Casewit, C. J.; Colwell, K. S.; Goddard, W. A.; Skiff,
W. M. “UFF, a full periodic table force field for molecular
mechanics and molecular dynamics simulations”, J. Amer.
Chem. Soc., 114, 10024 (1992).
Rappé, A. K.; Colwell, K. S.; Casewit, C. J., Inorg. Chem., 32, 3438
(1993).
Rappé, A. K.; Goddard, W. A., J. Phys. Chem., 95, 3358 (1991).
Ray, J. R. Comp. Phys. Rep., 8, 109 (1988) and references therein.
Reidl, D., International Tables for Crystallography: A Space-Group
Symmetry, Vol. A; Dordrecht, Holland (1983).
Rigby, D.; Sun, H.; Eichinger, B. E. “Computer simulations of
poly(ethylene oxides): Force field, PVT diagram and cycliza-
tion behavior”, Polymer, 44, 311 (1998).
Root, D. M.; Landis, C. R.; Cleveland, T., J. Am. Chem. Soc., 115,
4201–4209 (1993).
Root, D. M., Ph.D. Thesis, University of Wisconsin, Madison (1997).
Rosenthal, A. B.; Garofalini, S. H., J. Amer. Ceram. Soc., 70, 821
(1987).
Rosenthal, A. B.; Garofalini, S. H., J. Non-Crys. Solids, 107, 65 (1988).
Ryckaert, J.–P.; Ciccotti, G.; Berendsen, H. J. C. “Numerical Integra-
tion of the Cartesian equations of motion of a system with con-
straints: Molecular dynamics of n-alkanes”, J. Comp. Phys., 23,
327–341 (1977).
Schmidt, K. E.; Lee, M. A. “Implementing the fast multipole
method in three dimensions”, J. Stat. Phys., 63, 1223 (1991).



Shi, S.; Yan, L.; Yang, Y.; Fisher, J. ESFF Forcefield Project Report II;
MSI, San Diego.
Singh, U. C; Brown, F. K.; Bash, P. A.; Kollman, P. A. “An approach
to the application of free energy perturbation methods using
molecular dynamics: Applications to the transformations of
methanol to ethane, oxonium to ammonium, glycine to ala-
nine, and alanine to phenylalanine in aqueous solution and to
H3O+(H2O)3 NH4+(H2O)3 in the gas phase”, J. Am. Chem. Soc.,
109, 1607–1614 (1987).
Soules, T. F., J. Chem. Phys., 71, 4570 (1979).
Soules, T. F.; Varshneya, A. K., J. Amer. Ceram. Soc., 64, 145 (1981).
Sprague, J. T.; Tai, J. C.; Yuh, Y.; Allinger, N. L., J. Comp. Chem., 8,
581 (1987).
Straatsma, T. P.; Berendsen, H. J. C.; Postma, J. P. M. “Free energy
of hydrophobic hydration: A molecular dynamics study of
noble gases in water”, J. Chem. Phys., 85, 6720–6727 (1986).
Sun, H. J. Comp. Chem., 15, 752 (1994).
Sun, H. “Ab initio calculations and force field development for
computer simulation of polysilanes”, Macromolecules, 28, 701
(1995).
Sun, H. “The parameterization and validation of a condensed-
phase optimized ab initio forcefield” (in preparation).
Sun, H.; Mumby, S. J.; Maple, J. R.; Hagler, A. T. “An ab initio
CFF93 all-atom force field for polycarbonates”, J. Amer. Chem.
Soc., 116, 2978–2987 (1994).
Sun, H.; Rigby, D. “Polydimethylsiloxane: Ab initio force field:
Structural, conformational and thermophysical properties”,
Spectrochimica Acta (in press).
Sun, H.; Ren, P.; Fried, J. R. “The COMPASS Force Field:
Parameterization and Validation for Polyphosphazenes”,
Computational and Theoretical Polymer Science, 8 (1/2), 229 (1998).
Sun, H. “COMPASS: An ab Initio Force-Field Optimized for
Condensed-Phase Applications - Overview with Details on
Alkane and Benzene Compounds”, J. Phys. Chem. (1998, in
press).

Tembe, B. L.; McCammon, J. A. “Ligand–receptor interactions”,
Comput. Chem., 8, 281–283 (1984).
Tesar, A. A.; Varshneya, A. K. J. Chem. Phys., 87, 2986 (1987).
Tosi, M. P., Solid State Physics, 16, 107 (1964).
van Beest, B. W. H.; Kramer, G. J.; van Santen, R. A., Phys. Rev. Lett.,
64, 1955 (1990).
Verlet, L. “Computer experiments on classical fluids. I. Thermody-
namical properties of Lennard–Jones molecules”, Phys. Rev.,
159, 98–103 (1967).
Waldman, M.; Hagler, A. T. “New combining rules for rare gas van
der Waals parameters”, J. Comput. Chem., 14, 1077–1084 (1993).
Warshel, A.; Sussman, F.; King, G. “Free energy of charges in sol-
vated proteins: Microscopic calculations using a reversible
charging process”, Biochemistry, 25, 8368–8372 (1986).
Watanabe, K.; Austin, N.; Stapleton, M. R. “Investigation of the air
separation properties of zeolites types A, X, and Y by Monte
Carlo simulations”, Molec. Sim., 15, 197–221 (1995).
Weiner, S. J.; Kollman, P. A.; Case, D. A.; Singh, U. C.; Ghio, C.;
Alagona, G.; Profeta, S. Jr.; Weiner, P. “A new force field for
molecular mechanical simulation of nucleic acids and pro-
teins”, J. Am. Chem. Soc., 106, 765–784 (1984).
Weiner, S. J.; Kollman, P. A.; Nguyen, D. T.; Case, D. A. “An all
atom forcefield for simulations of proteins and nucleic acid”,
J. Comp. Chem., 7, 230–252 (1986).
Wiberg, K. B.; Murcko, M. A., J. Am. Chem. Soc., 111, 4821 (1989).
Williams, D. E., J. Phys. Chem., 45, 3370 (1966).
Wilson, E. B.; Decius, J. C.; Cross, P. C. Molecular Vibrations, Dover,
New York (1980).
Woodcock, L. V., Advances in Molten Salt Chemistry; Plenum, New
York (1975).
Yashonath, S.; Thomas, J. M.; Nowak, A. K.; Cheetham, A. K. “The
siting, energetics and mobility of saturated hydrocarbons
inside zeolitic cages—Methane in zeolite-Y”, Nature, 331, 601
(1988).



Zirl, D. M.; Garofalini, S. H. Phys. Chem. Glasses, 30, 155 (1989).



B Forcefield Terms and Atom Types

This appendix includes tables of atom types for these forcefields:
♦ Original AMBER
♦ Homans’ AMBER
♦ CFF91
♦ CHARMm
♦ COMPASS
♦ CVFF
♦ CVFF_aug
♦ ESFF
♦ PCFF

This appendix also includes a table of definitions of individual
terms found in various forcefields (Forcefield term definitions).
Examining forcefield files
You can examine the forcefield files directly for atom types and
other information (for information on how to edit these files,
please see Editing a forcefield):
♦ For the Insight molecular modeling program, human-readable
forcefield files are found in $BIOSYM_LIBRARY/*.frc
♦ For Cerius2•OFF, human-readable forcefield files are found in
the C2_installation_directory/Cerius2-Resources/FORCE-
FIELD directory (the Cerius2-Resources/FORCE-FIELD direc-
tory also appears in the directory in which you run Cerius2).
♦ For CHARMm, the human-readable forcefield parameter files
are located in $CHM_DATA/*PRM. The atom types are in the
$CHM_DATA/MASSES.RTF file.
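
The Insight location given above can also be listed from a script. The
following is a minimal sketch, assuming only that the BIOSYM_LIBRARY
environment variable points at the directory holding the *.frc files, as
described above. The function name list_forcefield_files is
hypothetical and is not part of any MSI program.

```python
import glob
import os


def list_forcefield_files(env_var="BIOSYM_LIBRARY", pattern="*.frc"):
    """Return the sorted paths matching `pattern` in the directory named
    by the environment variable `env_var` (empty list if it is unset)."""
    root = os.environ.get(env_var)
    if root is None:
        return []
    return sorted(glob.glob(os.path.join(root, pattern)))


if __name__ == "__main__":
    for path in list_forcefield_files():
        print(path)
```

The same helper could be pointed at another directory by passing a
different environment variable name and file pattern.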


Forcefield term definitions

Table 24. Common potential terms in major forcefields supported by MSI

name                                          forcefield (a)                        form of the term
quadratic bond-stretching                     AMBER, CHARMm, UFF                    $k (r - r_0)^2$
quartic bond-stretching                       CFF                                   $k_2 (r - r_0)^2 + k_3 (r - r_0)^3 + k_4 (r - r_0)^4$
Morse bond-stretching                         CVFF, ESFF                            $k [1 - e^{-\alpha (r_b - r_b^0)}]^2$
quadratic angle-bending                       AMBER, CHARMm, CVFF                   $k (\theta - \theta_0)^2$
quartic angle-bending                         CFF                                   $k_2 (\theta - \theta_0)^2 + k_3 (\theta - \theta_0)^3 + k_4 (\theta - \theta_0)^4$
cosine angle-bending                          ESFF, UFF                             various
single-cosine torsion                         AMBER, CHARMm, CVFF                   $k [1 + \cos(n\phi - \phi_0)]$ or similar
three-term cosine torsion                     CFF                                   $k_1 [1 - \cos(\phi - \phi_{01})] + k_2 [1 - \cos(2\phi - \phi_{02})] + k_3 [1 - \cos(3\phi - \phi_{03})]$
cosine-Fourier torsion                        UFF                                   $k (1 \pm \cos n\phi)$
sin–cos torsion                               ESFF                                  $k_\phi \left[ \frac{\sin^2\theta_1 \sin^2\theta_2}{\sin^2\theta_1^0 \sin^2\theta_2^0} + \mathrm{sign} \frac{\sin^n\theta_1 \sin^n\theta_2}{\sin^n\theta_1^0 \sin^n\theta_2^0} \cos(n\phi) \right]$
improper cosine out-of-plane                  AMBER, CVFF, UFF                      $k [1 + \cos(n\chi - \chi_0)]$ or similar
improper quadratic out-of-plane               CHARMm                                $k (\chi - \chi_0)^2$
improper square out-of-plane, improper        CVFF                                  $k \chi^2$
Wilson (or umbrella) out-of-plane             CFF, ESFF, UFF                        $k \chi^2$ or $k (\cos\chi - \cos\chi_0)^2$
pyramid-height out-of-plane                   none                                  not used
6–9 van der Waals                             CFF, ESFF, UFF                        $A_{ij}/r_{ij}^9 - B_{ij}/r_{ij}^6$ or $\varepsilon [2 (r^*/r)^9 - 3 (r^*/r)^6]$
6–12 van der Waals                            AMBER, CHARMm, CVFF                   $A_{ij}/r_{ij}^{12} - B_{ij}/r_{ij}^6$ or $\varepsilon [(r^*/r)^{12} - 2 (r^*/r)^6]$
electrostatic                                 AMBER, CFF, CHARMm, CVFF, ESFF, UFF   $q_i q_j / (\varepsilon r_{ij})$ or similar
quadratic bond–bond                           CFF, CVFF (b)                         $k (r - r_0)(r' - r'_0)$
quadratic bond–angle                          CFF, CVFF                             $k (r - r_0)(\theta - \theta_0)$
angle–angle                                   CFF, CVFF                             $k (\theta - \theta_0)(\theta' - \theta'_0)$
end bond–torsion                              CFF                                   $(b - b_0) [k_1 \cos\phi + k_2 \cos 2\phi + k_3 \cos 3\phi]$
center bond–torsion                           CFF                                   $(b' - b'_0) [k_1 \cos\phi + k_2 \cos 2\phi + k_3 \cos 3\phi]$
angle–torsion                                 CFF                                   $(\theta - \theta_0) [k_1 \cos\phi + k_2 \cos 2\phi + k_3 \cos 3\phi]$
angle–angle–torsion                           CFF, CVFF                             $k \cos\phi \, (\theta - \theta_0)(\theta' - \theta'_0)$
improper out-of-plane–out-of-plane, improper  CVFF                                  $k [1 - \cos 2\chi]^{1/2} [1 - \cos 2\chi']^{1/2}$

(a) Major forcefields, for the purposes of this table, are AMBER, CFF, CHARMm, CVFF, ESFF, UFF.
(b) AMBER, CHARMm, ESFF, and UFF contain no cross terms.
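
A few of the functional forms in Table 24 can be evaluated directly to
see how they behave. The following is a minimal sketch, not the
implementation used by any MSI simulation engine. The function names
are hypothetical, parameters are passed in whatever consistent units
the caller chooses, and no unit-conversion factor is applied to the
electrostatic term.

```python
import math


def quadratic_bond(r, r0, k):
    """Quadratic bond stretch: k (r - r0)^2."""
    return k * (r - r0) ** 2


def morse_bond(r, r0, k, alpha):
    """Morse bond stretch: k [1 - exp(-alpha (r - r0))]^2."""
    return k * (1.0 - math.exp(-alpha * (r - r0))) ** 2


def single_cosine_torsion(phi, k, n, phi0=0.0):
    """Single-cosine torsion: k [1 + cos(n phi - phi0)], angles in radians."""
    return k * (1.0 + math.cos(n * phi - phi0))


def vdw_6_12(r, eps, rstar):
    """6-12 van der Waals: eps [(r*/r)^12 - 2 (r*/r)^6]."""
    x = rstar / r
    return eps * (x ** 12 - 2.0 * x ** 6)


def vdw_6_9(r, eps, rstar):
    """6-9 van der Waals: eps [2 (r*/r)^9 - 3 (r*/r)^6]."""
    x = rstar / r
    return eps * (2.0 * x ** 9 - 3.0 * x ** 6)


def coulomb(qi, qj, r, dielectric=1.0):
    """Electrostatic term: qi qj / (dielectric * r), no unit conversion."""
    return qi * qj / (dielectric * r)


if __name__ == "__main__":
    # Both van der Waals forms have their minimum, -eps, at r = r*.
    print(vdw_6_12(3.5, 0.1, 3.5), vdw_6_9(3.5, 0.1, 3.5))
    # The Morse term is zero at r0 and approaches k at large separation.
    print(morse_bond(1.5, 1.5, 100.0, 2.0), morse_bond(50.0, 1.5, 100.0, 2.0))
```

Written in the ε, r* form, both van der Waals expressions share the
same well depth and minimum position, which makes the softer repulsion
of the 6–9 form easy to compare against the 6–12 form at short range.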

AMBER atom types

Standard AMBER forcefield

Table 25. Atom types—AMBER

general class   atom type (a)   description
hydrogen types
H amide or imino hydrogen
HC explicit hydrogen attached to carbon
HO hydrogen on hydroxyl oxygen
HS hydrogen attached to sulfur
HW hydrogen in water
H2 amino hydrogen in NH2
H3 hydrogen of lysine or arginine (positively charged)
all-atom carbon types (b)
C sp2 carbonyl carbon and aromatic carbon with hydroxyl substituent
in tyrosine
CA sp2 aromatic carbon in 6-membered ring with 1 substituent
CB sp2 aromatic carbon at junction between 5- and 6-membered
rings
CC sp2 aromatic carbon in 5-membered ring with 1 substituent and
next to a nitrogen
CK sp2 aromatic carbon in 5-membered ring between 2 nitrogens
and bonded to 1 hydrogen (in purine)
CM sp2 same as CJ but one substituent
CN sp2 aromatic junction carbon in between 5- and 6-membered
rings
CQ sp2 carbon in 6-membered ring of purine between 2 NC nitro-
gens and bonded to 1 hydrogen
CR sp2 aromatic carbon in 5-membered ring between 2 nitrogens
and bonded to 1 H (in his)
CT sp3 carbon with 4 explicit substituents
CV sp2 aromatic carbon in 5-membered ring bonded to 1 N and
bonded to an explicit hydrogen
CW sp2 aromatic carbon in 5-membered ring bonded to 1 N–H and
bonded to an explicit hydrogen
C* sp2 aromatic carbon in 5-membered ring with 1 substituent
united carbon types (c)
CD sp2 aromatic carbon in 6-membered ring with 1 hydrogen
CE sp2 aromatic carbon in 5-membered ring between 2 nitrogens
with 1 hydrogen (in purines)
CF sp2 aromatic carbon in 5-membered ring next to a nitrogen with-
out a hydrogen
CG sp2 aromatic carbon in 5-membered ring next to an N–H
CH sp3 carbon with 1 hydrogen
CI sp2 carbon in 6-membered ring of purines between 2 NC nitro-
gens
CJ sp2 carbon in pyrimidine at positions 5 or 6 (more pure double
bond than aromatic with 1 hydrogen)
CP sp2 aromatic carbon in 5-membered ring between 2 nitrogens
with one hydrogen (in his)
C2 sp3 carbon with 2 hydrogens
C3 sp3 carbon with 3 hydrogens

nitrogen types
N sp2 nitrogen in amide group
NA sp2 nitrogen in 5-membered ring with hydrogen attached
NB sp2 nitrogen in 5-membered ring with lone pairs
NC sp2 nitrogen in 6-membered ring with lone pairs
NT sp3 nitrogen with 3 substituents
N2 sp2 nitrogen in base NH2 group or arginine NH2
N3 sp3 nitrogen with 4 substituents
N* sp2 nitrogen in purine or pyrimidine with alkyl group attached
oxygen types
O carbonyl oxygen
OH alcohol oxygen
OS ether or ester oxygen
OW water oxygen
O2 carboxyl or phosphate nonbonded oxygen
sulfur types
S sulfur in disulfide linkage or methionine
SH sulfur in cysteine
phosphorus
P phosphorus in phosphate group
ion types
CU copper ion (Cu+2)
C0 calcium ion (Ca+2)
I iodide ion (I–)
IM chloride ion (Cl–)
MG magnesium ion (Mg+2)
QC cesium ion (Cs+)
QK potassium ion (K+)
QL lithium ion (Li+)
QN sodium ion (Na+)
QR rubidium ion (Rb+)

other
LP lone pair
(a) From Weiner et al. (1984) and Weiner et al. (1986).
(b) Non-hydrogen-containing carbons.
(c) United-atom carbons with implicit inclusion of hydrogens.

Homans’ carbohydrate forcefield

Table 26. Atom types—Homans

general class atom type description


carbohydrate-hydrogen types
AH α anomeric hydrogen
BH β anomeric hydrogen
HT sp3 hydrogen
HY hydroxyl hydrogen
carbohydrate-carbon types
AC α anomeric carbon
BC β anomeric carbon
CS sp3 carbon in sugar ring
carbohydrate-oxygen types
OA α anomeric oxygen
OB β anomeric oxygen
OE ring oxygen
OT hydroxyl oxygen


CFF91 atom types

Table 27. Atom types—CFF91

general class   atom type   description
hydrogen types
dw deuterium in heavy water (equiv. to h*)
h hydrogen bonded to C or S
hc hydrogen bonded to C (equiv. to h)
hi hydrogen in charged imidazole ring
hn hydrogen bonded to N (equiv. to h*)
ho hydrogen bonded to O (equiv. to h*)
hp hydrogen bonded to P (equiv. to h)
hs hydrogen bonded to S (equiv. to h)
hw hydrogen in water (equiv. to h*)
h* polar hydrogen bonded to N or O
h+ charged hydrogen (in cation)
carbon types
c generic sp3 carbon
ca general amino acid alpha carbon (sp3) (equiv. to c)
cg sp3 alpha carbon in glycine (equiv. to c)
ci sp2 aromatic carbon in charged imidazole ring (his+)
(equiv. to cp)
co sp3 carbon in acetal (equiv. to c)
coh sp3 carbon in acetal with hydrogen (equiv. to c)
cp sp2 aromatic carbon
cr carbon in guanidinium group (HN=C(NH2)2) (arg)
cs sp2 carbon in 5-membered ring next to S (equiv. to cp)
ct sp carbon involved in triple bond
c1 sp3 carbon bonded to 1 H, 3 heavy atoms (equiv. to c)
c2 sp3 carbon bonded to 2 H’s, 2 heavy atoms (equiv. to
c)
c3 sp3 carbon in methyl (CH3) group (equiv. to c)
c5 sp2 aromatic carbon in 5-membered ring (equiv. to
cp)
c3h sp3 carbon in 3-membered ring with hydrogens (equiv.
to c)

c3m sp3 carbon in 3-membered ring (equiv. to c)
c4h sp3 carbon in 4-membered ring with hydrogens (equiv.
to c)
c4m sp3 carbon in 4-membered ring (equiv. to c)
c′ sp2 carbon in carbonyl (C=O) group in amide
c" carbon in carbonyl group, not amide (equiv. to c*)
c* carbon in carbonyl group, not amide
c– carbon in carboxylate (COO–) group
c+ carbon in guanidinium group
c= nonaromatic end doubly bonded carbon
c=1 nonaromatic, next-to-end doubly bonded carbon
c=2 nonaromatic doubly bonded carbon
nitrogen types
n sp2 amide nitrogen
na sp3 amine nitrogen
nb sp2 nitrogen in aromatic amine (equiv. to nn)
nh sp2 nitrogen in 5- or 6-membered ring, bonded to
hydrogen
nho sp2 nitrogen in 6-membered ring, next to a carbonyl
group and with a hydrogen (equiv. to nh)
nh+ protonated nitrogen in 6-membered ring
ni sp2 nitrogen in charged imidazole ring (his+) (equiv. to
nh)
nn sp2 nitrogen in aromatic amine
np sp2 nitrogen in 5- or 6-membered ring, not bonded to
hydrogen
npc sp2 nitrogen in 5- or 6-membered ring, bonded to a
heavy atom (equiv. to nh)
nr sp2 nitrogen in guanidinium group (HN=C(NH2)2)
nt sp nitrogen involved in triple bond
nz sp nitrogen in N2
n1 sp2 nitrogen in charged arginine (equiv. to nr)
n2 sp2 nitrogen in guanidinium group (HN=C(NH2)2)
(equiv. to nr)
n4 sp3 nitrogen in protonated amine (equiv. to n+)
n3m sp3 nitrogen in 3-membered ring (equiv. to na)
n3n sp2 nitrogen in 3-membered ring (equiv. to n)

n4m sp3 nitrogen in 4-membered ring (equiv. to na)
n4n sp2 nitrogen in 4-membered ring (equiv. to n)
n+ protonated amine nitrogen
n= nonaromatic end doubly bonded nitrogen
n=1 nonaromatic, next-to-end doubly bonded nitrogen
n=2 nonaromatic doubly bonded nitrogen
oxygen types
o sp3 oxygen in alcohol, ether, acid, or ester group
oc sp3 oxygen in ether or acetal (equiv. to o)
oe sp3 oxygen in ester (equiv. to o)
oh oxygen bonded to H (equiv. to o)
op oxygen in aromatic ring (e.g., furan)
o3e sp3 oxygen in 3-membered ring (equiv. to o)
o4e sp3 oxygen in 4-membered ring (equiv. to o)
o′ oxygen in carbonyl (C=O) group
o* oxygen in water
o– oxygen in carboxylate (COO–) group
sulfur types
s sp3 sulfur in sulfide, disulfide, or thiol group
sc sp3 sulfur in methionine (C–S–C) group (equiv. to s)
sh sulfur in sulfhydryl (SH) group (equiv. to s)
sp sulfur in aromatic ring (e.g., thiophene)
s1 sulfur involved in S–S disulfide bond (equiv. to s)
s3e sulfur in 3-membered ring (equiv. to s)
s4e sulfur in 4-membered ring (equiv. to s)
s′ sulfur in thioketone (>C=S) group
s– partial-double sulfur bonded to something that is
bonded to another partial-double oxygen or sulfur
phosphorus
p general phosphorus atom
halogen types
br bromine bonded to a carbon
cl chlorine bonded to a carbon
f fluorine bonded to a carbon
i covalently bound iodine
ion types

Br bromide ion
ca+ calcium ion (Ca2+)
Cl chloride ion
Na sodium ion
argon
ar argon atom
silicon
si silicon atom
other
lp lone pair
nu null atom for relative free energy
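
The "(equiv. to ...)" notes in Table 27 can be read as a lookup from a
specific atom type to the more general type whose parameters it shares.
The following is a minimal sketch of that lookup, using only a subset
of the equivalences listed in the table. It is a simplification: the
names CFF91_EQUIVALENCES and resolve_atom_type are hypothetical, and
the way the forcefield files themselves encode equivalences is not
reproduced here.

```python
# Subset of the "(equiv. to ...)" notes from Table 27 (CFF91).
CFF91_EQUIVALENCES = {
    "hc": "h", "hn": "h*", "ho": "h*", "hw": "h*",
    "c1": "c", "c2": "c", "c3": "c", "cs": "cp", "c5": "cp",
    "nb": "nn", "ni": "nh", "n1": "nr", "n2": "nr", "n4": "n+",
    "oh": "o", "oe": "o", "sh": "s", "s1": "s",
}


def resolve_atom_type(atom_type, equivalences=CFF91_EQUIVALENCES):
    """Follow equivalence links until a type with no further
    equivalence is reached; raise ValueError on a circular definition."""
    seen = set()
    while atom_type in equivalences:
        if atom_type in seen:
            raise ValueError(f"circular equivalence involving {atom_type!r}")
        seen.add(atom_type)
        atom_type = equivalences[atom_type]
    return atom_type


if __name__ == "__main__":
    for t in ("c3", "hw", "cp"):
        print(t, "->", resolve_atom_type(t))
```

Types without an equivalence note, such as cp, resolve to themselves.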

CHARMm atom types

Table 28. Atom types—CHARMm

general class   atom type   description
hydrogen types
H hydrogen bonding hydrogen (neutral group)
HA aliphatic or aromatic hydrogen
HC hydrogen bonding hydrogen (charged group)
HMU mu-bonded hydrogen for metals and boron-hydride
HO hydrogen on an alcohol oxygen
HT TIP3P water-model hydrogen
carbon types
C carbonyl or guanidinium carbon
C3 carbonyl carbon in 3-membered aliphatic ring
C4 carbonyl carbon in 4-membered aliphatic ring
C5R aromatic carbon in 5-membered ring
C5RP for aryl-aryl bond between C5R rings
C5RQ for second aryl-aryl bond between C5RP rings (ortho)
C6R aromatic carbon in a 6-membered ring
C6RP for aryl-aryl bond between C6R rings

C6RQ carbon of C6RP type ortho to C6RP pair
CF1 carbon with one fluorine
CF2 carbon with two fluorines
CF3 carbons with three fluorines
CM carbon in carbon monoxide or other triply bonded carbon
CP3 carbon on nitrogen in proline ring
CPH1 CG and CD2 carbons in histidine ring
CPH2 CE1 carbon in histidine ring
CQ66 third adjacent pair of CR66 types in fused rings
CT aliphatic carbon (tetrahedral)
CT3 carbon in 3-membered aliphatic ring, usually tetrahedral
CT4 carbon in 4-membered aliphatic ring, usually tetrahedral
CUA1 carbon in double bond, first pair
CUA2 carbon in double bond, second conjugated pair
CUA3 carbon in double bond, third conjugated pair
CUY1 carbon in triple bond, first pair
CUY2 carbon in triple bond, second conjugated pair
extended-atom carbon types
C5RE extended aromatic carbon in 5-membered ring
C6RE extended aromatic carbon in 6-membered ring
CH1E extended-atom carbon with one hydrogen
CH2E extended-atom carbon with two hydrogens
CH3E extended-atom carbon with three hydrogens
CR55 aromatic carbon-merged 5-membered rings
CR56 aromatic carbon-merged 5- or 6-membered rings
CR66 aromatic carbon-merged 6-membered rings
CS66 second adjacent pair of CR66 types in fused rings
nitrogen types
N nitrogen: planar-valence of 3, i.e., nitrile, etc.
N3 nitrogen in a 3-membered ring
N5R nitrogen in a 5-membered aromatic ring
N5RP for aryl-aryl bond between 5-membered rings
N6R nitrogen in a 6-membered aromatic ring
N6RP for aryl-aryl bond between 6-membered rings
NC charged guanidinium-type nitrogen

NC2 for neutral guanidinium group - Arg sidechain
NO2 nitrogen in nitro or related group
NP nitrogen in peptide, amide, or related group
NR1 protonated nitrogen in neutral histidine ring
NR2 unprotonated nitrogen in neutral histidine ring
NR3 nitrogens in charged histidine ring
NR55 N at fused bond between two 5-membered aromatics
NR56 N at fused bond between 5- and 6-membered aryls
NR66 N at fused bond between two 6-membered aromatics
NT nitrogen (tetrahedral), i.e., amine, etc.
NX proline nitrogen or similar
oxygen types
O carbonyl oxygen for amide or related structures
O2M oxygen in Si-O-Al or Al-O-Al bond
O5R oxygen in 5-membered aromatic ring-radicals, etc.
O6R oxygen in 6-membered aromatic ring-radicals, etc.
OA carbonyl oxygen for aldehydes or related
OAC carbonyl oxygen for acids or related
OC charged oxygen
OE ether oxygen / acetal oxygen
OH2 ST2 water-model oxygen
OK carbonyl oxygen for ketones or related
OM oxygen in carbon monoxide or other triply bonded oxygen
OS ester oxygen
OSH massless O for zeolites or related cage compounds
OSI oxygen in Si-O-Si bond
OT hydroxyl oxygen (tetrahedral) or ionizable acid
OW TIP3P water-model oxygen
sulfur types
S5R sulfur in a 5-membered aromatic ring
S6R sulfur in a 6-membered aromatic ring
SE thioether sulfur
SH1E extended-atom sulfur with one hydrogen
SK thioketone sulfur
SO1 sulfur bonded to one oxygen

SO2 sulfur bonded to two oxygens
SO3 sulfur bonded to three oxygens
SO4 sulfur bonded to four oxygens
ST sulfur, general: usually tetrahedral
phosphorus types
P6R phosphorus in aromatic 6-membered ring
PO3 phosphorus bonded to three oxygens
PO4 phosphorus bonded to four oxygens
PT phosphorus, general: usually tetrahedral
PUA1 double-bonded phosphorus
PUY1 triple-bonded phosphorus
other
LP ST2 lone pair

COMPASS atom types

Table 29. Atom types—COMPASS (Page 1 of 6)

general class    atom type    description
hydrogen types
h hydrogen, generic
h1 hydrogen, nonpolar
h1+ hydrogen, proton
h1h hydrogen in H2
h1n hydrogen bound to N, Cl
h1o hydrogen bound to O, F
carbon types
c carbon, generic
c1o carbon in carbon monoxide, CO
c2= carbon, SP, two double bonds O=C=O, S=C=S
c2t carbon, SP, triple bond

c3 carbon, SP2, generic, 3 bonds
c3" carbon, SP2, carbonyl, two polar substituents
c3# carbon, SP2, in CO3(2-) anion
c3' carbon, SP2, carbonyl, one polar substituent
c3- carbon, SP2, carboxylate
c3= carbon, SP2, double bond to C (-C=C-)
c3a carbon, SP2, aromatic
c3n carbon, SP2, double bond to N (-C=N-)
c3o carbon, SP2, carbonyl
c4 carbon, SP3, generic, 4 bonds
c43 carbon, SP3, with 3 heavy atoms
c44 carbon, SP3, with 4 heavy atoms
c4o carbon, SP3, bond to oxygen
c4x carbon, SP3, bond to chlorine
nitrogen types
n nitrogen, generic
n1n nitrogen, in N2
n1o nitrogen in NO
n1t nitrogen, SP, 1 triple bond
n2= nitrogen, SP2, 1 double bond, non-aromatic
n2a nitrogen, SP2, 2 partial double bonds, aromatic
n2t nitrogen, SP, 1 triple bond, non-aromatic
n3 nitrogen, SP3, in amines
n3* nitrogen, SP3, in NH3
n3+ nitrogen, SP3, charged
n3a nitrogen, SP2, aromatic
n3h1 nitrogen, SP3, in amines with 1 hydrogen
n3h2 nitrogen, SP3, in amines with 2 hydrogens
n3m nitrogen, SP3, in amides without hydrogen
n3mh nitrogen, SP3, in amides with hydrogen
n3o nitrogen, SP2, in nitro group
n4+ nitrogen, SP3, in protonated amine

n4o nitrogen, SP3, in amine oxide
oxygen types
o oxygen, generic
o-2 oxygen, anion in metal oxides (O-2)
o1- oxygen, SP2, in carboxylate
o12 oxygen, SP2, in nitro group (-NO2)
o1= oxygen, SP2, in carbonyl
o1=* oxygen, in CO2
o1c oxygen, in CO
o1n oxygen, in NO
o1o oxygen, in O2
o2 oxygen, SP3, generic
o2* oxygen, SP3, in water
o2a oxygen, SP2, aromatic, in 5-membered ring
o2b oxygen, SP3, bridge atom in anhydrides
o2c oxygen, SP3, in acids
o2e oxygen, SP3, in ethers
o2h oxygen, SP3, in alcohol
o2s oxygen, SP3, in esters
o2z oxygen, in siloxanes and zeolites
o3 oxygen, in H3O+
o3z oxygen, bridge in zeolites
sulfur types
s sulfur, generic
s1= sulfur, 1 double bond (=S)
s2 sulfur, 2 single bonds (-S-)
s2= sulfur, 2 double bonds (=S=)
s2a sulfur, in aromatic ring (thiophene)
s3= sulfur, 3 bonds, 1 is double
s4 sulfur, 4 single bonds
s4= sulfur, 4 bonds, 2 are double
s6 sulfur, 6 single bonds

phosphorus types
p phosphorus, generic
p4= phosphorus, in phosphazenes
halogen types
br bromine, generic
br- bromine, anion
br1 bromine, one bond
cl chlorine, generic
cl- chlorine anion
cl1 chlorine, one bond
cl12 chlorine, to a carbon that has 2 X
cl13 chlorine, to a carbon that has 3 X
cl14 chlorine, to a carbon that has 4 X
cl1p chlorine, in phosphazenes
f fluorine, generic
f- fluorine anion
f1 fluorine, one bond
f12 fluorine, to a carbon that has 2 F
f13 fluorine, to a carbon that has 3 F
f14 fluorine, to a carbon that has 4 F
f1p fluorine, in phosphazene
i iodine, generic
i- iodine anion
i1 iodine, with one bond
metal types
Ag silver, metal
Al aluminum, metal
Au gold, metal
Cr chromium, metal
Cu copper, metal
Fe iron, metal
Mo molybdenum, metal

Ni nickel, metal
Pb lead, metal
Pd palladium, metal
Pt platinum, metal
Sn tin, metal
W tungsten, metal
metal ion types
ca+ calcium ion (Ca2+)
cu+2 copper ion (Cu2+)
fe+2 iron ion (Fe2+)
mg+2 magnesium ion (Mg2+)
zn+2 zinc ion (Zn2+)
alkali metal and ion types
K potassium, metal
Li lithium, metal
Na sodium, metal
cs+ cesium ion (Cs+)
k+ potassium ion (K+)
li+ lithium ion (Li+)
na+ sodium ion (Na+)
rb+ rubidium ion (Rb+)
zeolite and silicon types
al4z aluminum, in zeolites
si silicon, generic
si4 silicon, generic with 4 bonds
si4c silicon, in siloxane with heavy atoms only
si4z silicon, in zeolites
noble gas types

ar argon
he helium
kr krypton
ne neon
xe xenon

CVFF atom types

Table 30. Atom types—CVFF (Page 1 of 4)

general class    atom type    description
hydrogen types
d general deuterium (equiv. to h)
dw deuterium in heavy water (equiv. to h*)
h generic hydrogen bonded to C, Si, or H
hc hydrogen bonded to C (equiv. to h)
hi hydrogen in charged imidazole ring (equiv. to hn)
hn hydrogen bonded to N
ho hydrogen bonded to O
hp hydrogen bonded to P (equiv. to h)
hs hydrogen bonded to S
hw hydrogen in water (equiv. to h*)
h* hydrogen in water
h+ charged hydrogen in cation (equiv. to hn)
hspc hydrogen in SPC water model
htip hydrogen in TIP3P water model
carbon types
c generic sp3 carbon
ca general amino acid alpha carbon (sp3) (equiv. to cg)
cg sp3 alpha carbon in glycine
ci sp2 aromatic carbon in charged imidazole ring (his+)
cn sp3 carbon bonded to N (equiv. to cg)

co sp3 carbon in acetal (equiv. to c)
coh sp3 carbon in acetal with hydrogen (equiv. to cg)
cp sp2 aromatic carbon (partial double bonds)
cr carbon in guanidinium group (HN=C(NH2)2) (arg)
cs sp2 carbon in 5-membered ring next to S
ct sp carbon involved in triple bond
c1 sp3 carbon bonded to 1 H, 3 heavy atoms (equiv. to cg)
c2 sp3 carbon bonded to 2 H's, 2 heavy atoms (equiv. to cg)
c3 sp3 carbon in methyl (CH3) group (equiv. to cg)
c5 sp2 aromatic carbon in 5-membered ring
c3h sp3 carbon in 3-membered ring with hydrogens (equiv. to cg)
c3m sp3 carbon in 3-membered ring (equiv. to c)
c4h sp3 carbon in 4-membered ring with hydrogens (equiv. to cg)
c4m sp3 carbon in 4-membered ring (equiv. to c)
c' sp2 carbon in carbonyl (C=O) group of amide
c" carbon in carbonyl group, not amide (equiv. to c')
c* carbon in carbonyl group, not amide (equiv. to c')
c- carbon in charged carboxylate (COO-) group (equiv. to c')
c+ carbon in guanidinium group (equiv. to cr)
c= nonaromatic end doubly bonded carbon
c=1 nonaromatic, next-to-end doubly bonded carbon
c=2 nonaromatic doubly bonded carbon
nitrogen types
n generic sp2 nitrogen in amide
na sp3 nitrogen in amine (equiv. to n3)
nb sp2 nitrogen in aromatic amine (equiv. to n3)
nh sp2 nitrogen in 5- or 6-membered ring, with hydrogen attached (equiv. to np)
nho sp2 nitrogen in 6-membered ring, next to a carbonyl group and with a hydrogen (equiv. to np)
nh+ protonated nitrogen in 6-membered ring
ni sp2 nitrogen in charged imidazole ring (his+)

nn sp2 nitrogen in aromatic amine (equiv. to n3)
np sp2 nitrogen in 5- or 6-membered ring
npc sp2 nitrogen in 5- or 6-membered ring, bonded to a heavy atom (equiv. to np)
nr sp2 nitrogen in guanidinium group (HN=C(NH2)2)
nt sp nitrogen involved in triple bond
nz sp nitrogen in N2
n1 sp2 nitrogen in charged arginine
n2 sp2 nitrogen in guanidinium group (HN=C(NH2)2)
n3 sp3 nitrogen with 3 substituents
n4 sp3 nitrogen in protonated amine
n3m sp3 nitrogen in 3-membered ring (equiv. to n3)
n3n sp2 nitrogen in 3-membered ring (equiv. to n)
n4m sp3 nitrogen in 4-membered ring (equiv. to n3)
n4n sp2 nitrogen in 4-membered ring (equiv. to n)
n+ sp3 nitrogen in protonated amine (equiv. to n4)
n= nonaromatic end doubly bonded nitrogen
n=1 nonaromatic, next-to-end doubly bonded nitrogen
n=2 nonaromatic doubly bonded nitrogen
oxygen types
o generic sp3 oxygen
oc sp3 oxygen in ether or acetal (equiv. to o)
oe sp3 oxygen in ester (equiv. to o)
oh oxygen bonded to H
op sp2 aromatic oxygen in 5-membered ring
o3e sp3 oxygen in 3-membered ring (equiv. to o)
o4e sp3 oxygen in 4-membered ring (equiv. to o)
o' oxygen in carbonyl (C=O) group
o* oxygen in water
o- oxygen in charged carboxylate (COO-) group
ospc oxygen in SPC water model
otip oxygen in TIP3P water model
sulfur types
s sp3 sulfur
sc sp3 sulfur in methionine (C–S–C) group (equiv. to s)
sh sulfur in sulfhydryl (SH) group

sp sulfur in aromatic ring, e.g., thiophene
s1 sulfur involved in S–S disulfide bond (equiv. to s)
s3e sulfur in 3-membered ring (equiv. to s)
s4e sulfur in 4-membered ring (equiv. to s)
s' sulfur in thioketone (>C=S) group
s- partial-double sulfur bonded to something that is bonded to another partial-double oxygen or sulfur
phosphorus
p general phosphorus atom
halogen types
br bromine bonded to a carbon
cl chlorine bonded to a carbon
f fluorine bonded to a carbon
i covalently bound iodine
ion types
Br bromide ion
ca+ calcium ion (Ca2+)
Cl chloride ion
Na sodium ion
argon
ar argon atom
silicon
si silicon atom
other
lp lone pair
nu null atom for relative free energy calculations
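
The "equiv. to" notes in Table 30 indicate that one atom type borrows the parameters of another. The following is a minimal Python sketch of how such equivalences might be resolved before a parameter lookup. The dictionary holds only a few entries copied from Table 30, the function name resolve_equivalent_type is hypothetical (it is not part of any MSI program), and the sketch assumes a single equivalence per atom type, whereas a real forcefield file may define separate equivalences for different energy terms.

```python
# A few "equiv. to" entries copied from Table 30 (CVFF); not the full table.
CVFF_EQUIVALENCES = {
    "hc": "h",
    "hp": "h",
    "hi": "hn",
    "h+": "hn",
    "hw": "h*",
    "ca": "cg",
    "c1": "cg",
    "co": "c",
    "c+": "cr",
    "n+": "n4",
    "na": "n3",
    "oe": "o",
    "s1": "s",
}


def resolve_equivalent_type(atom_type, equivalences=CVFF_EQUIVALENCES):
    """Follow 'equiv. to' links until a type with no further equivalence is reached."""
    seen = set()
    current = atom_type
    while current in equivalences:
        if current in seen:  # guard against a circular table
            raise ValueError(f"circular equivalence involving {current!r}")
        seen.add(current)
        current = equivalences[current]
    return current


if __name__ == "__main__":
    for name in ("hi", "c+", "s1", "cp"):
        print(name, "->", resolve_equivalent_type(name))
```

A type with no entry in the dictionary (such as cp) is returned unchanged, which mirrors the table rows that carry no "equiv. to" note.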


CVFF_aug atom types


As in CVFF (Table 30) with these additional atom types:

Table 31. Atom types—CVFF_aug (Page 1 of 3)

element    atom type    charge    description
Si sz 2.4 tetrahedral silicon in a zeolite or silicate
O oz -1.2 oxygen in a zeolite or silicate
Al az 1.4 tetrahedral aluminum atom in zeolites
P pz 3.4 phosphorus atom in zeolites
Ga ga 1.4 gallium atom in zeolites
Ge ge 2.4 germanium atom in zeolites
Ti tioc 1.6 titanium (octahedral) in zeolites
Ti titd 2.4 titanium (tetrahedral) in zeolites
Li li+ 1.0 lithium ion in zeolites
Na na+ 1.0 sodium ion in zeolites
K k+ 1.0 potassium ion in zeolites
Rb rb+ 1.0 rubidium ion in zeolites
Cs cs+ 1.0 cesium ion in zeolites
Mg mg2+ 2.0 magnesium ion in zeolites
Ca ca2+ 2.0 calcium ion in zeolites
Ba ba2+ 2.0 barium ion in zeolites
Cu cu2+ 2.0 copper(II) ion in zeolites
F f- -1.0 fluoride ion in zeolites
Cl cl- -1.0 chloride ion in zeolites
Br br- -1.0 bromide ion in zeolites
I i- -1.0 iodide ion in zeolites
S so4 2.8 sulfur in sulfate ion to be used with oz

Si sy 4.0 tetrahedral silicon atom in clays
O oy -2.0 oxygen atom in clays
Al ay 3.0 octahedral aluminum atom in clays
Al ayt 3.0 tetrahedral aluminum atom to be used with oy
Na nac+ 1.0 sodium ion in clays
Mg mg2c 2.0 octahedral magnesium ion in clays

Fe fe2c 2.0 octahedral Fe(II) ion in clays
Mn mn4c 4.0 manganese (IV) ion to be used with oy
Mn mn3c 3.0 manganese (III) ion to be used with oy
Co co2c 2.0 cobalt (II) ion to be used with oy
Ni ni2c 2.0 nickel (II) ion to be used with oy
Li lic+ 1.0 lithium ion to be used with oy
Pd pd2+ 2.0 palladium(II)
Ti ti4c 4.0 titanium (octahedral) to be used with oy
Sr sr2c 2.0 strontium ion to be used with oy
Ca ca2c 2.0 calcium ion to be used with oy
Cl cly- -1.0 chloride ion to be used with oy
H hocl 1.0 hydrogen in hydroxyl group in clays
P py 5.0 phosphorus atom to be used with oy
V vy 4.0 tetrahedral vanadium to be used with oy
N nh4+ 1.0 united-atom type for ammonium ion to be used with oy
S so4y 6.0 sulfur in sulfate ion to be used with oy
Li lioh 1.0 lithium ion in water to be used with o*
Na naoh 1.0 sodium ion in water to be used with o*
K koh 1.0 potassium ion in water to be used with o*
F foh -1.0 fluoride ion in water to be used with o*
Cl cloh -1.0 chloride ion in water to be used with o*
Be beoh 0.0 beryllium (II) in water to be used with o*
Al al 0.0 aluminum metal
Na Na 0.0 sodium metal
Pt Pt 0.0 platinum metal
Pd Pd 0.0 palladium metal
Au Au 0.0 gold metal
Ag Ag 0.0 silver metal
Sn Sn 0.0 tin metal
K K 0.0 potassium metal
Li Li 0.0 lithium metal
Mo Mo 0.0 molybdenum metal
Fe Fe 0.0 iron metal
W W 0.0 tungsten metal
Ni Ni 0.0 nickel metal

Cr Cr 0.0 chromium metal
Cu Cu 0.0 copper metal
Pb Pb 0.0 lead metal
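
Because Table 31 assigns a fixed charge to each atom type, the net charge of a model built from these types is a weighted sum over the atom-type counts. The Python sketch below illustrates this with a few charges copied from Table 31. The helper name net_charge and the example compositions are hypothetical and serve only as a quick neutrality check, not as a description of how any MSI program assigns charges.

```python
# Charges copied from Table 31 (CVFF_aug) for a handful of atom types.
CVFF_AUG_CHARGES = {
    "sz": 2.4,    # tetrahedral silicon in a zeolite or silicate
    "oz": -1.2,   # oxygen in a zeolite or silicate
    "az": 1.4,    # tetrahedral aluminum in zeolites
    "na+": 1.0,   # sodium ion in zeolites
    "sy": 4.0,    # tetrahedral silicon in clays
    "oy": -2.0,   # oxygen in clays
}


def net_charge(type_counts, charges=CVFF_AUG_CHARGES):
    """Sum charge * count over a mapping of atom type -> number of atoms."""
    return sum(charges[atom_type] * count for atom_type, count in type_counts.items())


if __name__ == "__main__":
    # One SiO2 unit of a siliceous zeolite framework.
    print(round(net_charge({"sz": 1, "oz": 2}), 6))
    # One AlO2 unit balanced by one extra-framework sodium ion.
    print(round(net_charge({"az": 1, "oz": 2, "na+": 1}), 6))
```

Both example compositions come out neutral to within floating-point rounding, which is why a tolerance is preferable to an exact comparison against zero.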

ESFF atom types


The atom types in ESFF for the first three rows of the periodic table
are listed in Table 32. For the atom types that are not listed in
Table 32, please refer to the $BIOSYM_LIBRARY/esff.frc file.

Atom-typing rules in ESFF

The names of the atom types for metals are based on the symmetry
of the metal complex and on both the oxidation state and coordination
number of the metal. For example, Ag024t indicates an Ag
that has an oxidation state of 2+, is 4-coordinated, and has tetrahedral
symmetry. The following table lists the abbreviations used for
the symmetry types:

symbol symmetry
l C2v
s D4h
t Td
o Oh
p D5h
h D2h, D3h
d D∞
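
The naming rule described above can be sketched in Python as a small parser. The pattern used here (element symbol, two-digit oxidation state, one-digit coordination number, optional symmetry letter) is an assumption inferred from the example Ag024t and from names such as Al035s in Table 32; it is not taken from the esff.frc file itself, and the function name parse_esff_metal_type is hypothetical. The symmetry labels are copied from the table above as printed.

```python
import re

# Symmetry abbreviations copied from the table above, as printed.
SYMMETRY = {
    "l": "C2v",
    "s": "D4h",
    "t": "Td",
    "o": "Oh",
    "p": "D5h",
    "h": "D2h, D3h",
    "d": "D∞",
}

# Assumed layout: element symbol, 2-digit oxidation state,
# 1-digit coordination number, optional symmetry letter.
_METAL_TYPE = re.compile(r"^([A-Z][a-z]?)(\d{2})(\d)([lstophd]?)$")


def parse_esff_metal_type(name):
    """Split an ESFF metal atom-type name into its assumed components."""
    match = _METAL_TYPE.match(name)
    if match is None:
        raise ValueError(f"{name!r} does not follow the assumed naming pattern")
    element, oxidation, coordination, symmetry = match.groups()
    return {
        "element": element,
        "oxidation_state": int(oxidation),
        "coordination_number": int(coordination),
        "symmetry": SYMMETRY.get(symmetry),  # None when no letter is present
    }


if __name__ == "__main__":
    print(parse_esff_metal_type("Ag024t"))
    print(parse_esff_metal_type("Al035s"))
```

Names without a trailing symmetry letter, such as Al036, parse with the symmetry field left as None. Generic types such as Al or Mg+2 do not match the pattern and raise an error.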

Some metals with differing oxidation numbers and symmetries
may be handled with the same parameters. In such cases, a generic
metal atom type is used.
The oxidation number of a metal is determined according to:


$$N_{ox} = Q_t - \frac{1}{N_m}\sum_i Fq_i - \sum_j \frac{Fq_j}{N_b} \qquad \text{(Eq. 136)}$$

where Q_t is the total charge on the complex, and the sums over Fq_i
and Fq_j are the sums of formal charges on atoms not bonded to
metals and bonded to metals, respectively. N_m is the number of
metal atoms in the complex, and N_b is the number of metal atoms
bonded to the jth ligand atom.
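
As an illustration of Eq. 136, the following Python sketch evaluates the oxidation number for a simple complex. It assumes the reading N_ox = Q_t - (1/N_m) * sum_i Fq_i - sum_j (Fq_j / N_b). The function name, argument names, and the [PdCl4]2- example are hypothetical and are not taken from the ESFF implementation.

```python
def metal_oxidation_number(total_charge, nonbonded_formal_charges, ligand_atoms, n_metals):
    """Evaluate Eq. 136 under the reading stated in the text above.

    total_charge             -- Q_t, total charge on the complex
    nonbonded_formal_charges -- Fq_i for atoms not bonded to any metal
    ligand_atoms             -- (Fq_j, N_b) pairs for atoms bonded to metals, where
                                N_b is the number of metals bonded to that atom
    n_metals                 -- N_m, number of metal atoms in the complex
    """
    if n_metals < 1:
        raise ValueError("the complex must contain at least one metal atom")
    free_term = sum(nonbonded_formal_charges) / n_metals
    bonded_term = sum(fq / n_b for fq, n_b in ligand_atoms)
    return total_charge - free_term - bonded_term


if __name__ == "__main__":
    # Hypothetical example: [PdCl4]2-, four chloride ligands (formal charge -1),
    # each bonded to the single metal center, no other charged atoms.
    n_ox = metal_oxidation_number(
        total_charge=-2,
        nonbonded_formal_charges=[],
        ligand_atoms=[(-1, 1)] * 4,
        n_metals=1,
    )
    print(n_ox)
```

For this example the result corresponds to Pd(II), since the total charge of -2 minus the ligand contribution of -4 gives +2.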

Table 32. Atom types for the first three rows(a) of the periodic table—ESFF (Page 1 of 4)

general class    atom type    description
hydrogen types
dw deuterium in heavy water
h generic hydrogen
hi hydrogen in charged imidazole ring (equiv. to h*)
hw hydrogen in water (equiv. to h*)
h* hydrogen bonded to nitrogen, oxygen
h+ charged hydrogen in cations
carbon types
c generic sp3 carbon
ca general amino acid alpha carbon (sp3) (equiv. to c)
cg sp3 alpha carbon in glycine (equiv. to c)
ci carbon in charged imidazole ring (equiv. to cp)
co sp3 carbon in acetals (equiv. to c)
coh sp3 carbon in acetals with hydrogen (equiv. to c)
cp sp2 aromatic carbon with partial double bond
cr c in neutral arginine (equiv. to c=)
cs sp2 aromatic carbon in 5-membered ring next to S (equiv. to cp)
ct sp carbon involved in a triple bond
ct3 sp carbon involved in CO
c1 sp3 carbon with 1 H, 3 heavies (equiv. to c)
c2 sp3 carbon with 2 H's, 2 heavies (equiv. to c)
c3 sp3 carbon with 3 H's, 1 heavy (equiv. to c)
c5 sp2 aromatic carbon in 5-membered ring
c5p sp2 aromatic carbon in 5-membered big pi ring
c' carbon in carbonyl (C=O) group

c- c in charged carboxylate
c+ c in guanidinium group (equiv. to c=)
c= generic sp2 carbon
nitrogen types
n generic sp2 nitrogen (in amides)
na sp3 nitrogen in amines
nb sp2 nitrogen in aromatic amines
nh sp2 (3 [sp2] 2 [p]) nitrogen in 5-membered ring
nho sp2 nitrogen in 6-membered ring
ni nitrogen in charged imidazole ring (equiv. to nh)
no sp2 nitrogen in oxides of nitrogen
np sp2 nitrogen in 5-membered ring
nt sp nitrogen involved in a triple bond
nt2 central nitrogen involved in azide group
nz sp nitrogen in N2
n1 sp2 nitrogen in charged arginine (equiv. to n=)
n2 sp2 nitrogen (NH2) in guanidinium group (HN=C(NH2)2) (equiv. to n=)
n4 sp3 nitrogen with 4 substituents (equiv. to n+)
n+ sp3 nitrogen in protonated amines
n= sp2 nitrogen in neutral arginine (double bond)
oxygen types
o generic sp3 oxygen in alcohol, ether, or acid group
oa sp3 oxygen in ester or acid
oc sp3 oxygen in ether or acetals (equiv. to o)
oh oxygen bonded to hydrogen (equiv. to o)
op sp2 aromatic oxygen in 5-membered ring
os oxygen bonded to two silicons
ot oxygen with hybridization sp
o1 oxygen bonded to oxygen
o' oxygen having a single double bond
o* oxygen in water
o- double bonded oxygen in charged carboxylate COO- (equiv. to o')
sulfur types
s sp3 sulfur
sp sulfur in an aromatic ring (e.g., thiophene)

s1 sp3 sulfur involved in (S-S) group of disulfides
s2d sulfur with oxidation number 4, two double sigma bond
s3d sulfur with oxidation number 4, three sigma bonds, (C3v)
s4d sulfur with oxidation number 6, four sigma bonds, (Td)
s4l sulfur with coordination number 4 (C2v)
s5l sulfur with coordination number 5 (D4h, C2v)
s5t sulfur with coordination number 5 (D3h)
s6 sulfur with coordination number 6 (D4h, D2h)
s6o sulfur with coordination number 6 (Oh)
s' S in thioketone group
s- double bonded sulfur in charged phosphate PSS- or PSO- (equiv. to s')
phosphorus types
p general phosphorus atom
p4d phosphorus atom with oxidation number 5 and 4 sigma bonds (CTd)
p4l phosphorus atom with oxidation number 5 and 3 sigma bonds (C2v)
p5l phosphorus atom with oxidation number 5 and 3 sigma bonds (D4h, C2v)
p5t phosphorus atom with oxidation number 5 and 3 sigma bonds (D3h)
p53 phosphorus atom with oxidation number 5 and 3 sigma bonds (planar)
p6 phosphorus atom with oxidation number 5 and 3 sigma bonds (D4h, D2h)
p6o phosphorus atom with oxidation number 5 and 3 sigma bonds (Oh)
p' sp2 phosphorus atom
other second-row elements
b boron sp3 atom
bt boron sp atom
b' boron sp2 atom
Be beryllium atom
Be+ beryllium cation
Be+2 beryllium cation
f fluorine atom
F fluorine anion
Li lithium atom with s orbitals involved in bonding
Li+ lithium ion
ne neon atom
other third-row elements
ar argon atom

Al aluminum atom
Al033 aluminum atom with coordination number 3
Al034 aluminum atom with coordination number 4
Al035 aluminum atom with coordination number 5
Al035s aluminum atom with coordination number 5 (D4h)
Al035t aluminum atom with coordination number 5 (D3h)
Al036 aluminum atom with coordination number 6 (D4h, D2h)
Al036o aluminum atom with coordination number 6 (Oh)
cl chlorine atom
Cl chlorine ion
cl' chlorine atom in oxo acid
he helium atom
Mg magnesium atom
Mg025 magnesium atom with 5 coordinations
Mg025s magnesium atom with 5 coordinations (D4h)
Mg025t magnesium atom with 5 coordinations (D3h)
Mg026 magnesium atom with 6 coordinations (D4h, D2h)
Mg026o magnesium atom with 6 coordinations (Oh)
Mg+ magnesium +1 cation
Mg+2 magnesium +2 cation
Na sodium atom
Na+ sodium ion
si silicon atom
si4l silicon atom (D3h, C2v)
si5l silicon atom (D4h)
si5t silicon atom (D3h)
si6 silicon atom (D4h, D2h)
si6o silicon atom (Oh)
si' sp2 silicon atom
(a) Please see $BIOSYM_LIBRARY/esff.frc for heavier elements.


PCFF—additional atom types


This table lists those atom types included in PCFF in addition to the
CFF91 atom types listed in Table 27.

Table 33. Additional atom types—PCFF(a) (Page 1 of 2)

general class    atom type    description
hydrogen types
hn2 amino hydrogen
ho2 hydroxyl hydrogen
carbonyl functional groups (C and O)
c_0 carbonyl carbon of aldehydes, ketones
c_1 carbonyl carbon of acid, ester, amide
c_2 carbonyl carbon of carbamate, urea
cz carbonyl carbon of carbonate
o= oxygen double bonded to O, C, S, N, P
o_1 oxygen in carbonyl group
o_2 ester oxygen
oo oxygen in carbonyl group, carbonate only
oz ester oxygen in carbonate
phosphorus types
p= phosphazene phosphorus atom
silicon-related types
si silicon atom
sio siloxane silicon
hsi silane hydrogen
osi siloxane oxygen
noble gas types
he helium
ar argon
kr krypton
ne neon
xe xenon
metal atoms and halogen ions
Ag silver metal
Al aluminium metal

Au gold metal
Br bromine ion
Cl chlorine ion
Cr chromium metal
Cu copper metal
Fe iron metal
K potassium metal
Li lithium metal
Mo molybdenum metal
Na sodium metal
Ni nickel metal
Pb lead metal
Pd palladium metal
Pt platinum metal
Sn tin metal
W tungsten metal
zeolite-related types
az aluminium atom in zeolites
oss oxygen atom between two silicons
osh oxygen atom in terminal hydroxyl group on silicon
oah oxygen atom in terminal hydroxyl group on aluminium
oas oxygen atom between aluminium and silicon
ob oxygen atom in bridging hydroxyl group
sz silicon atom in zeolites
hb hydrogen atom in bridging hydroxyl group
hoa hydrogen atom in terminal hydroxyl group on aluminium
hos hydrogen atom in terminal hydroxyl group on silicon
(a) PCFF also includes all the atom types listed for CFF91 (page 291).

314 Forcefield-Based Simulations/October 1997


Index

A equivalences, 85
forcefield parameter assignment, 84
ab initio, 11 wildcarding, 85, 87
ABM4 integrator, 196 wildcarding, precedence, 87
compared with Verlet velocity integrator, X, 85
204 atomic positions, poorly defined, 101
absolute free energy atomic velocities, 211
algorithm, 259 atoms
calculation, 258 fixing, 102
conformational searching, 267 forces on, 181
constraining dynamics, 268 types, 79, 283
convergence, 271 typing, 78
dynamics running averages, 272 atom-type charge, 81
errors, 269, 271 Austin, N., 67, 280
example, 264 automatic atom types, 31, 61
ideal solid, 259
λ intervals, number, 269
reference state, 263, 265 B
setting up, 267
spring constants, 268, 269, 272 Bash, P. A., 279
Adams–Bashforth–Moulton integrator, 196 Becker, J. M., 275
adiabatic compliance tensor, 210 Berendsen equation, 217
adiabatic compressibility, 210 Berendsen, H. J. C., 217, 224, 226, 273, 278, 279
adiabatic ensembles, 205 Beveridge, D. L., 180, 277
Alagona, G., 280 BKS forcefield, 65
Allen, M. P., 116, 210, 222, 227, 273 bold type, meaning, 7
Allinger, N. L., 70, 276, 279 bond hybridization, 50
AMBER, 70 bond increments, 31, 61
bonds
atom types, 55
atom types for carbohydrates, 55 between symmetrically related objects, 120
characteristics, 22 constraints, 109, 237
distance-dependent dielectric, 126 Born–Oppenheimer equation, 11
functional form, 54 Born, M., 11, 273
hydrogen-bond term, 54 Boyd, D. B., 274
1–4 nonbond interactions, 123 Brady, J. W., 275
amino acids, 23, 57, 61 Brooks, B. R., 22, 56, 273
Andersen, H. C., 209, 227, 237, 273 Brooks, C. L., 255
angles Brooks, C. L., III, 127, 234, 273
constraints, 109, 237 Brown, D., 227, 273
atom types Brown, F. K., 279
assigning, 79 Bruccoleri, R. E., 273
asterisks, 85 Brunger, A. T., 273

Forcefield-Based Simulations/October 1997 315


.

buffer region, 134 charges


building models, 76 assigning, 83
bulk modulus, 211 CHARMm
Burchart forcefield, 65 characteristics, 22, 56
implementation, 65 functional form, 56
Burchart–Dreiding forcefield, 66 Cheetham, A. K., 277, 280
Burchart–Universal forcefield, 66 chemical perturbations, 251
Burchart, E. de Vos, 65, 273 chirality, 54
Ciccotti, G., 278
Clark, J. H. R., 227, 273
C classic harmonic oscillator, 13
calculation classical forcefields
dynamics, 238 availability, 53
calculations types, 53
controlling, 15, 100, 231 Colwell, K. S., 273, 274, 278
loops, 231 common structures, finding, 233
minimization, 173 compressibility, 211
carboxylates, 136 computation
Carruthers, L. M., 277 costs, 102, 167
Cartesian and crystal axes, 114, 115 efficiency, 128
Cartesian coordinates, 18, 114 time, 118
Casewit, C. J., 21, 42, 273, 274, 278 conformation
Case, D. A., 280 searches, 232
Catlow, C. R. A., 147, 274 conformational changes
cell multipole method, 138 forcing, 103
conformational energies, 182
accuracy, 142
and nonbond interactions, 143 conformational energy barriers, 231
and system size, 142 conformational searches, 191, 232
CPU time, 142 conformations
derivation of, 139 comparing, 103
Cerius2•Morphology module forcefields, 61 searches, 191, 208
CFF forcefields stable, 154
availability, 26 consensus dynamics, 233
functional form, 28 Consistent Valence Forcefield, see CVFF and
CFF91 forcefields
atom types, 291, 313 constraints, 109
automatic parameter assignment, 87 creating, 109
characteristics, 20 definition, 15, 98
functional form, 28, 31 dynamics, 235
out-of-plane coordinate, 28 Coulombic
“CFF93”, 69 interactions, 28, 122
CFF95 terms, 18, 60
additional information, 31 cross terms, 16, 28, 60, 177
limitations, 31 definition, 18
charge groups importance of, 60
cutoffs and, 135 Cross, P. C., 280
defined, 135 crystals

316 Forcefield-Based Simulations/October 1997


.

environment, 113, 149 dynamics, 189


phase transitions, 227 ABM4, 196
surface contributions, 149 achieving equilibrium, 217
cutoff distance algorithms, 193
effect on van der Waals energy, 129, 130 Andersen method, 220, 227
switching function, 130 animation, 238
cutoffs appropriate ensemble, 211
double, 136 artifacts, 201
methods, 127 Berendsen method, 217, 226
periodic systems, 116, 117 blowing up, 202
variable names in versions 2.9.5 and 95.0, canonical ensemble, 208
132 colliding hydrogen atoms example, 199
CVFF computing time, 235
atom types, 61 consensus, 233
automatic parameter assignment, 87 constant-energy, constant-volume ensem-
characteristics, 22 ble, 207
functional form, 58 constant-pressure, constant-enthalpy en-
semble, 209
constant-temperature, constant-pressure
D ensemble, 208
constant-temperature, constant-stress en-
Dauber-Osguthorpe, P., 57, 274 semble, 208
Dauber, P., 275, 276 constant-temperature, constant-volume
Decius, J. C., 280 ensemble, 208
Deem, M. W., 149, 274 constraints, 237
Demontis, P., 67, 274 constraints and restraints, 235
Denouden, C. J. J., 277 continuing a run, 246
density, periodic boundary conditions, 222 data-collecting stage, 239
diagonal terms, 59 definition, 12
direct velocity scaling, 216
dielectric constant, 60, 126 double cutoffs and, 137
Dielectric parameter, 127 energy conservation example, 204
diffusion coefficients, 191 equilibration, 217, 239
Ding, H. Q., 139, 142, 274 files, 247
DiNola, A., 273 generating statistical ensembles, 191
Dinur, U., 20, 274, 276, 277 Hoover, 217
dipole–dipole interactions, 122 impulse, 233
disordered periodic systems, 137 initial velocities, 212
distance constraints, 109 integration errors, 199, 204
distance restraints, 106 integrators, 193, 194
Langevin, 234
potential form, 106, 108 lengths of run stages, 240
distorted structures, 177 limitations, 199
Dist_Dependent parameter, 127 liquid simulations, 227
DNA, 23 microcanonical ensemble, 207
Dreiding forcefield multiple timesteps, 136
atom type naming, 52 Nosé, 217
availability, 35 Nosé–Hoover, 217
characteristics, 22 Nosé–Hoover method and fictitious mass,
versions, 52 218, 219

Forcefield-Based Simulations/October 1997 317


.

Nosé–Hoover method and integrator, 220 temperature effects, 203


Nosé–Hoover method and relaxation time, theory, 192
219 timestep, 197
Nosé–Hoover method and timestep, 219 timestep length, 199, 236
Nosé–Hoover thermostat, 217 trajectories, 238
obtaining accurate fluctuations, 210 trajectory, and absolute free energy calcula-
Parrinello-Rahman method, 228 tion, 261
periodic systems, 224 true canonical ensemble, 217
preparing the system, 240 types, 229
pressure, 220, 226 unit cell changes, 208, 209
pressure and cell size, 227 uses, 3, 189, 191
pressure control, 205, 208 Verlet velocity integrator, 195
pressure, effect of cutoffs, 227 with minimization, 232
quenched, 232
RATTLE algorithm, 237
reinitializing velocities, 247 E
relaxation time, 217
repeating a run, 193 Eichinger, B. E., 278
reproducibility, 193 Einstein solid, 262
restarting a run, 246 elastic constant, 211
results, 238 electronic motion equation, 11
reviewing setup, 245 electrostatic interactions, 54, 60
Runge–Kutta-4, 197 energy
setting up, 239, 241 barriers, 233
setting up SHAKE or RATTLE constraints, contributions of terms, 15
235 enthalpy, 209
SHAKE algorithm, 236 at absolute zero, 180
simulated annealing, 232 binding, 180
snapshots, 238 entropy, 180, 183, 271
specifying output, 244 equilibrium thermodynamic properties, 210
specifying run conditions, 242
specifying the integrator, 194 Ermer, O., 4, 166, 274
specifying the pressure- and stress-control ESFF
methods, 226 angle rules, 39
specifying the temperature-control meth- angle types, 38
od, 215 atom types, 308
specifying the thermodynamic ensemble, availability, 35
206 bond energy, 37
specifying the timestep, 198 bond rules, 37
specifying types of simulations, 230 characteristics, 21
stability of numerical integration, 199 charges, 39
stages, 242, 246 electronegativity, 40
starting a new run, 246 energy expressions, 37
starting the run, 245 functional form, 37
statistical ensembles, 205, 229 hardness, 40
stochastic boundary, 234 ionization potential, 41
stress control, 209 out-of-plane term, 39
target temperature, 216 periodic table coverage, 42
temperature control, 205, 208, 209, 214, 217 torsion rules, 39
temperature cycles, 232 torsion term, 39

318 Forcefield-Based Simulations/October 1997


.

van der Waals interactions, 41 broadly applicable, 21, 35


ethane, 255 Cerius2•Morphology module, 67
Ewald calculations choosing, 19
accuracy, 150 classical, 22
and nonbond interactions, 150 components, 12
and system size, 150 crystal morphology, 67
computational load, 147 CVFF, 57
CPU time, 150 definition, 4, 12
dipole moment, 145
2D periodicity, 150 editing, 90
Ewald, P. P., 143, 274 ESFF, 37
Ewig, C. S., 20, 31, 275, 276 functional forms, 12, 16
explicit image model, 117, 118 general sequence of activities for using, 75
glass, 62
graphical and standalone-mode use, 75
F graphical molecular modeling interfaces,
75
fentanyl, 264 Homans AMBER, 22
phi,psi map, 265, 267 importance, 4
Field, M., 275
internal coordinates, 18
files
limitations, 15
.car, 257, 268
.cli, 268, 272 mechanical approach, limitations of, 15
.cor, 268 mechanical vs. quantum mechanical ap-
dynamics, 247 proach, 13
forcefield, 85 MSXX, 64
.frc, 31, 61 old, 68, 69
.his, 270 parameters and atom types, 84
history, 270 purpose, 13
.inp, 257 quantum calculations and, 27
.mdf, 79, 257 quantum mechanical parameterization, 26
molecular data, 79 reading parameters, 76
.out, 270, 272 rule-based, 21, 35
output, 270 second-generation, 20, 26
.tot, 257, 270, 272
Fisher, J., 279 selecting, 78
Flannery, B. P., 278 silicas, 65
sorption, 67
Fleischman, S. H., 255
sorption onto zeolite structures, 67
Fletcher, R., 165, 178, 274
special-purpose, 22
fluids, nonviscous, 221
standalone, 75
fluorescence depolarization rates, 238
summary table, 24
forcefields
terms, 12
aluminophosphates, 65
AMBER, 53 types, 20
and modeling programs, 1 types of molecules, 23
atom type equivalences, 85 unsupported, 68, 69
atom types, 79 using, 75
automatic, 86 zeolites, 65
automatic parameter assignment, 86 frictional coefficients, 234, 238

Forcefield-Based Simulations/October 1997 319


.

G Hoover, W. G., 217, 262, 275


Hwang, J. K., 180, 276
Garofalini, S. H., 62, 274, 276, 278, 281 Hwang, M.-J., 20, 27, 31, 69, 276, 277
Gaussian–Legendre quadrature method, 254
hydrocarbons, 61
Gelin, B. R., 277
hydrogen atoms, colliding, 200
general multipole moments, 140
generating Cartesian coordinates, 77 hydrogen bonds, 60
Genest, M., 274 hydrostatic pressure, 208, 221, 224, 228
Ghio, C., 280 hypervalent molecules, 49
ghost
images, 118 I
list regeneration, 118
molecules, 117, 118 Ikels, K. G., 277
Giammona, A., 275 image centering, 119
glass forcefield, 61, 62 inexact line searches, 169
automated setup, 63
instantaneous kinetic temperature, 213
functional form, 62
global minimum, 179, 231 instantaneous pressure function, 222
glycoproteins, 55 interatomic distances, controlling, 106
Goddard, W. A., 44, 64, 83, 125, 146, 147, 150, 274, 276, 277, 278 internal energy, 210
International Tables for Crystallography, 114
Greengard, L., 138, 142, 274 IR line widths, 238
Gunsteren, W. F., 165, 274 isothermal compressibility, 210, 240
isothermal ensembles, 205
H
Haak, J. R., 273 J
Hagler, A. T., 20, 27, 31, 60, 125, 128, 180, 181, 274, 275, 276, 277, 279, 280 Jacucci, G., 180, 278
Jorgensen, W. L., 255
Halgren, T. A., 21, 34, 35, 125, 275
Hamiltonian, 259
Harris, F., 150 K
Harvey, S. C., 125, 275
Kao, J., 70, 276
Ha, S. N., 55, 275
help, on-line, 6 Karasawa, N., 64, 125, 146, 147, 150, 274, 276
hemoglobin, 142 Karplus, M., 22, 56, 165, 273, 274, 276, 277
Hessian matrix, 169, 177 kinetic energy, 210
computational cost, 167, 169 kinetic energy of nuclei, 183
definition, 166 King, G., 280
from second derivatives, 172 Kirkwood, J. G., 180, 276
mathematical form, 169 Kitson, D. H., 128, 276
preconditioning, 172
Klein, M. L., 274
properties, 169
size, 167 Knaebel, K. S., 277
heterogeneous atom pairs, 124
Hill, J.-R., 20, 32, 275 Kollman, P. A., 279, 280
Homans, S. W., 22, 54, 55, 275 Kramer, G. J., 280


L advantages, 169, 170


Broyden–Fletcher–Goldfarb–Shanno, 169
lattices, periodic, 113 comparative efficiency, 172
Lee, M. A., 139, 142, 278 conjugate gradients, 164, 177
Lennard–Jones terms, 18, 60 Davidon–Fletcher–Powell, 169
Levitt, M., 162, 276 differences among Newton methods, 168
Levy, R. M., 234, 276 discontinuities in potential energy surface, sensitivities to, 137
Lifson, S., 67, 162, 275, 276
limitations, 167
Liljefors, T., 70, 276 Newton–Raphson, 166, 167, 177
Lipkowitz, K. B., 274 Newton–Raphson, algorithm, 168
Li, S., 276 quasi-Newton-Raphson, 168
local minimum, 179 steepest descents, 161, 163
truncated Newton-Raphson, 170
minimum image model, 117, 225
M MMFF
Maple, J. R., 20, 27, 28, 31, 276, 277, 279 availability, 26
Maxwell–Boltzmann energy expression, 34
distribution, 212 modeling incomplete systems, 101
equation, 211 molecular dynamics, see dynamics
factor, 257 molecular mechanics, 12
Mayo, S. L., 22, 52, 69, 277 molecular modeling program, definition, 5
McCammon, J. A., 180, 234, 276, 277, 280 Momany, F. A., 22, 56, 68, 277
McGuire, R. F., 277 monomer definitions, 76
McQuarrie, 220, 277 Montgomery, 273
methanol, 255 Morphology/Lifson forcefield, 67
Mezei, M., 180, 251, 277 Morphology/Momany forcefield, 68
Mezei’s algorithm, 251 Morphology/Scheraga forcefields, 68
Miller, G. W., 67, 277 Morphology/Williams forcefield, 68
minimization, 153 Morse potential, 59, 177
algorithms, 155, 176 harmonic potential, compared with, 60
constraints, 100 MSI’s website, 6, 125
convergence, 177, 178, 179 MSXX, 64
derivatives, 156, 178 Morse functional form, 65
efficiency, 155 parameterization, 64
energy expression, 155 Mumby, S. J., 279
energy zero, arbitrary, 180 Murcko, M. A., 55, 280
initial relaxation, 163
iteration, defined, 160
line search, 157, 158 N
progressive, 101 Nachbar, R. B., 21, 35, 275
restraints, 100
setting up, 173 Naider, F., 275
significance, 180 natural response functions, 210
strategies, 155, 173 neighbor list, 134
system size, 165, 167 Némethy, G., 68, 277
uses, 2 Newsam, J. M., 274
minimizers Newton’s equation of motion, 11, 192


Nguyen, D. T., 280 Postma, J. P. M., 273, 279


NMR relaxation times, 238 Post, M. F. M., 277
nonbond interactions, 60 potential energy surface, 10, 11, 16
cell multipole method, 138 discontinuities, 137
cutoff distance and, 128, 129 empirical fit, 11, 12
definition, 18 non-quadratic, 177
Ewald calculations, 143 saddle points, 186
system size and, 127 shape of, 183
nonbond list, 134 transition states, 186
nonbond terms, 16 water molecule, 17
non-hypervalent, 49 Potentials Forcefield, 81, 83
nonpolar hydrogens, 54 Pottle, M. S., 277
Norgett, M. J., 147, 274 Powell, M. J. D., 165, 278
normal mode analysis, 179 pressure, 210
Nosé, S., 217, 219, 277 and correct statistical ensemble, 225
Nowak, A. K., 277, 280 calculation, 222
changing, 226
nuclear motion equation, 11 control, 225
nucleic acids, 53 functional form, 220
number density, 238 periodic boundary conditions, 222
sign conventions, 222
thermodynamic, 222
O units, 221
off-diagonal terms, see cross terms virial theorem, 222
Olafson, B. D., 273, 277 Press, W. H., 178, 197, 254, 278
oligosaccharides, 55 Profeta, S. Jr., 280
Oppenheimer, J. R., 11, 273 proteins, 23, 53
Osguthorpe, D. J., 274 protonated amines, 136
p1_4 parameter, 123

P
Q
Parrinello, M., 209, 228, 277
partition function, 259 quantum effects, 180
peptides, 23 quantum mechanical probability function, 14
periodic boundary conditions, 224, 225 quartz, 144
definition, 113 Quirke, N., 180, 278
periodic systems
disordered, 137 R
dynamics, 208, 209, 222, 224
Ewald calculations, 143 radius of gyration, 238
Peterson, B. K., 277 Rahman, A., 228, 277
Pettitt, B., 273 Rappé, A. K., 21, 42, 43, 44, 83, 273, 274, 278
Pickett, S. D., 67, 277 rattle command, 109
polymer forcefield, 62 Ravimohan, C., 255
polymers, 53 Ray, J. R., 207, 210, 278
polysaccharides, 54, 55 reading models, 76
polyvinylidene fluoride forcefield, 61, 64 Reeves, C. M., 165, 274


Ree, F. H., 262 Shi, S., 279


Reidel, D., 114, 278 simulated annealing, 232
relative free energy simulation engine, definition, 4
benchmark calculation, 255 Singh, U. C., 180, 255, 279, 280
convergence, 254, 257 Sinha, S. K., 274
degrees of freedom, 256
drift, 258 Skiff, W. M., 278
example, 255 small molecules, 53
FDTI, 251, 253, 254 Smit, B., 277
finite difference thermodynamic integration, see FDTI solvent, 113, 115
Sorption Demontis forcefield, 67
functional form, 254 Sorption Pickett forcefield, 67
perturbation method, 252
results, 257 Sorption Yashonath forcefield, 67
setting up, 256 Soules, T. F., 62, 279
thermodynamic cycle, 255 special-purpose forcefields
residue definitions, 76 availability, 61
restraints, 16, 154 characteristics, 62
angle, 110 specific heat, 210, 211, 240
definition, 15, 98 specific heat at constant pressure, 210
distance, 106
spectral densities, 238
dynamics, 235
inversion, 112 spline switching nonbond cutoff, 133
torsion, 110 Sprague, J. T., 70, 279
Rigby, D., 278, 279 Stapleton, M. R., 280
RNA, 23 States, D. J., 273
Roberts, V. A., 274 statistical ensembles, 191, 205
Rokhlin, V. I., 138, 142, 274 Stern, P. S., 275
Rone, R., 22, 56, 277 Stockfisch, T. P., 276, 277
Rosenthal, A. B., 62, 278 Straatsma, T. P., 180, 279
rotational barriers, heights, 182
stress, 221, 228, 229
Runge–Kutta integrator, 197
control, 225
Ryckaert, J.-P., 236, 278
sign conventions, 222
tensor components, 208
S units, 221
stress–strain relationship, 209, 221, 228
saddle points, 178
structural similarities, 233
Sauer, J., 20, 32, 275
Sun, H., 20, 32, 278, 279
Scale_Terms Parameters, 123
Scheraga, H. A., 277 surface charges, 149
Schmidt, K. E., 138, 142, 278 Sussman, F., 280
Schrödinger equation, 10 Swaminathan, S., 273
Set Parameters, 127 Swift, J. F. P., 277
setting up calculations, 77 switching atom defined, 135
Sharon, R., 275 switching function, effect on nonbond energy, 132
shear stress, 221
shielded dielectric function, 126 symmetry relations, 119


T Universal forcefield
atom type naming, 43
Tai, J. C., 276, 279 availability, 35
Tembe, B. L., 180, 280 characteristics, 21
temperature, 211 charges, 44
and average kinetic energy, 212 implementation, 42
and correct statistical ensemble, 214 parameter generation, 43
calculation of, 213 versions, 44
control, 220
damping, 217
distribution of atomic velocities, 211 V
integration errors, 203 VALBOND
nonperiodic system, 213 characteristics, 21
nonzero, 189 valence exclusions, 123
periodic system, 213 valence interactions, 16
template atoms, 103 van Beest, B. W. H., 65, 280
template forcing, 103, 155 van der Waals
functional forms, 103 combination rules, 124
tensile stress, 221 cutoff distance and, 129, 130
Tesar, A. A., 62, 280 Ewald calculations, 144
tethering, 101, 105 interaction potential, 199
functional forms, 105 interactions, 28, 54, 60
Teukolsky, S. A., 278 van Gunsteren, W. F., 273
Thacher, T. S., 277 van Santen, R. A., 280
thermal expansion, 210, 211 Varshneya, A. K., 62, 279, 280
thermodynamic cycle, 180 Verlet leapfrog integrator, 194, 195
thermodynamic ensembles, 191 Verlet velocity integrator, 194, 195
thermodynamic response function, 207 compared with ABM4 integrator, 204
thermodynamic temperature, 213 Verlet, L., 195, 280
Thomas, J. M., 277, 280 Vetterling, W. T., 278
Tildesley, D. J., 116, 210, 222, 227, 273 vibrational calculations, 185
timestep length in dynamics, 197 computational costs, 187
Tollenaere, J. P., 264 cross terms and, 187
torsions entropy, 187
forcefield accuracy and, 187
forcing, 110
free energy, 182
restraints, 110, 111 frequencies, 60, 167, 182, 185
trajectory from potential energy surface, 184
animation, 238 harmonic, 153
contents, 238 imaginary frequencies, 186
definition, 193 normal mode analysis, 167
length, 241 prerequisites, 185
uses, 238 quality, 185
transition states, 183 quantum mechanical equation, 186
uses, 182
virial, 210
U viscous fluids, 234
united-atom representation, 54 volume, periodic boundary conditions, 222


W
Waldman, M., 20, 125, 276, 280
Warshel, A., 180, 276, 280
Watanabe–Austin forcefield, 67
Watanabe, K., 67, 280
water, 57
constrained, 237
fixed-geometry model, 109, 237
SPC, 109, 237
TIP3P, 109, 237
Weiner, P., 280
Weiner, S. J., 22, 53, 70, 280, 290
Wiberg, K. B., 55, 280
Williams, D. E., 68, 280
Wilson, E. B., 28, 183, 280
Wolff, J., 274
Wolynes, P. G., 277
Woodcock, L. V., 62, 280

Y
Yang, Y., 279
Yan, L., 279
Yashonath, S., 67, 274, 280
Yuh, Y., 279

Z
zeolite forcefield, 61, 62
zero-point corrections, 180, 182, 186
Zirl, D. M., 62, 274, 281
