
UNIVERSITY OF SOUTHAMPTON

Simulation Studies of the Structure and Energetics of a Host-Guest System

Richard Humfry Henchman

A dissertation submitted in partial fulfilment of the requirements for the degree of Doctor of Philosophy at the University of Southampton.

Department of Chemistry

December 1999

UNIVERSITY OF SOUTHAMPTON

ABSTRACT

FACULTY OF SCIENCE

Doctor of Philosophy

SIMULATION STUDIES OF THE STRUCTURE AND ENERGETICS OF A HOST-GUEST SYSTEM

by Richard Humfry Henchman

Computer simulations are used to understand the binding behaviour of a number of amino acid derivatives in macrobicycle 12 in chloroform. Previous experimental work on this system indicated that macrobicycle 12, as well as being enantioselective for L-amino acid derivatives, was able to stabilise the amide bond of the amino acid derivatives in the cis conformation. Monte Carlo (MC) simulations and free energy perturbation (FEP) calculations were able to successfully reproduce the observed behaviour. A detailed analysis was performed to rationalise the selectivity.

In the course of this work, a methodology was developed that made feasible the simulations on the macrobicycle 12 system. The development of a novel MC sampling procedure and the replacement of explicit solvent by the GB/SA continuum model were found necessary to carry out realistic simulations. A new charge derivation called REPD was developed to produce OPLS-like charges by fitting to the molecular electrostatic potential. Free energies of hydration were calculated to test both REPD charges and the relative performance of the FEP and the linear interaction free energy methods.

Acknowledgments
The first person I am most deeply indebted to is my supervisor Jonathan Essex for his guidance, advice, friendliness, encouragement, tact and availability: all the qualities one would want in a supervisor. Next I must express my gratitude to the Commonwealth Scholarship Commission for funding my Ph.D. and that long list of people at the British Council who always used to send me random amounts of free money once in a while.

I am lucky to have a long list of other people to thank. First of all is Lewis, who was very helpful to me, particularly at the start when I was settling down to live in a strange new land. Then there's my Ph.D. brother Ian, with whom I shared all the excitement of each stage of the degree. Discussions with Ian were always very useful, and I must thank him especially for doing the statistical analysis on my free energy data using the LIE method. Andrew "Jif" Lemon's help was invaluable to me in coming to terms with Unix, good old awk and the programs that wrote programs, without which this thesis could not physically have been completed. When Jif left it was Steve who principally helped me out in this area and moved me onto perl. Rich T's help with the GB/SA model was particularly useful, especially given that most of it was done over the telephone. I should also mention Christopher's help in running the PB calculations and in spellchecking his own name in the thesis. I must also thank Oz for maintaining the computers, Ian for helping me run simulations on the University machines, and Ed and Oliver for discussions about ab initio work. Being in the small room towards the end, Tim and I had a number of insightful conversations, some related to work. Julen also deserves a mention for some organic nomenclature. The help of Rob, Adrian Hickford, and others previously mentioned in proofreading was also of great assistance when my ability to pick up mistakes was strongly diminishing.

I would also like to thank all my other friends and group members, too many to mention here, who made sure I enjoyed myself even on the (rare) occasions when work did not. Finally, I am most grateful to my family for allowing me to spend three years away from home to do this work, and for giving me all the support.

Contents
1 Introduction
    1.1 Molecular Association.
    1.2 The Macrobicycle 12 Host-Guest System.
    1.3 Importance of the Host Binding Properties.
    1.4 Aim of This Work.

2 Simulation and Free Energy Methods
    2.1 Simulations Methods.
        2.1.1 The Role of Computer Simulations.
        2.1.2 Representation of the System.
        2.1.3 The OPLS Force Field.
        2.1.4 The System to be Modelled and Other Approximations.
        2.1.5 Molecular Dynamics Simulations.
        2.1.6 Monte Carlo Simulations.
        2.1.7 Molecular Dynamics Versus Monte Carlo.
    2.2 Free Energy Methods.
        2.2.1 The Problem of Calculating Free Energies.
        2.2.2 Free Energy Perturbation.
        2.2.3 Thermodynamic Integration.
        2.2.4 Difficulties With Free Energy Methods.
        2.2.5 Fast Free Energy Methods.
        2.2.6 Choice of Free Energy Method.
    2.3 Applications of Free Energies Methods.
        2.3.1 Problems Calculating Free Energies of Binding.
        2.3.2 Relative Free Energies of Binding.
        2.3.3 Absolute Free Energies of Binding.
        2.3.4 Previous Studies on Host-Guest Systems.
        2.3.5 Free Energies of Solvation.
        2.3.6 Partition Coefficients.
    2.4 Conclusion.

3 Setup of the Host-Guest System
    3.1 Force Field.
        3.1.1 Missing Parameters from the OPLS Force Field.
        3.1.2 Transferable Parameters.
    3.2 Charge Derivation.
        3.2.1 REPD Charges.
        3.2.2 Relative Partition Coefficients.
        3.2.3 Free Energy Protocol.
        3.2.4 Effect of Basis Set, Ab Initio Method and Geometry.
    3.3 Dihedral Parameterisation.
        3.3.1 Calculation of Ab Initio Energy Profile.
        3.3.2 Fitting a Fourier Series to the Energy Profile.
        3.3.3 Parameterisation Complications.
    3.4 Structural Setup.
        3.4.1 The Z-matrix.
        3.4.2 Residue Definitions.
        3.4.3 Residues and Their Application to MC Moves.
    3.5 Simulation Code Customisation and Optimisation.
    3.6 Conclusion.

4 Partial Charge Methods
    4.1 Partial Charges in Force Fields.
        4.1.1 The Use of Charges and Methods to Derive Them.
        4.1.2 Advantages and Disadvantages of OPLS Charges.
        4.1.3 Advantages and Disadvantages of EPD Charges.
    4.2 Development of the REPD Charge Method.
        4.2.1 The EPD Charge Method.
        4.2.2 Basis Set, Ab Initio Method and Geometry.
        4.2.3 Fitting Point Method.
        4.2.4 Multipolar Constraints.
        4.2.5 Charge Restraining.
        4.2.6 A New Restraining Function.
        4.2.7 Choice of Atoms to Restrain.
        4.2.8 Independence of the New Restraint On Point Selection.
        4.2.9 Charge Averaging.
    4.3 The REPD Charge Method.
        4.3.1 Summary of the Method.
        4.3.2 Comparison with EPD and OPLS Charges.
        4.3.3 Influence of Molecule Set on the Parameterisation.
        4.3.4 Conformational Dependence.
    4.4 Conclusion.

5 Testing of REPD Charges by FEP and LIE
    5.1 FEP Free Energies of Hydration.
        5.1.1 The Molecule Test Set.
        5.1.2 Selection of Mutations.
        5.1.3 Simulation Protocol.
        5.1.4 Results.
        5.1.5 Effect of Restraint, Basis Set and Geometry.
        5.1.6 Particular Discrepancies with Experiment.
    5.2 LIE Free Energies of Hydration.
        5.2.1 Form of the LIE Equation.
        5.2.2 LIE Protocol.
        5.2.3 Derivation of the LIE Parameters.
        5.2.4 Performance of LIE Free Energies.
        5.2.5 Overfitting to the Data.
        5.2.6 Alternative LIE Functions.
    5.3 Analysis of the LIE Method.
        5.3.1 Motivation for the Analysis.
        5.3.2 Correlation Analysis.
        5.3.3 Biased Regression Methods.
        5.3.4 The Most Predictive Model.
        5.3.5 MLR Versus CR.
        5.3.6 The Significance of The Electrostatic Term.
    5.4 Conclusion.

6 Methods to Improve Monte Carlo Sampling
    6.1 Identification of Sampling Problem.
        6.1.1 Generation of Possible Host-Guest Structures.
        6.1.2 Analysis of Annealed Structures.
        6.1.3 Sampling From Simulations.
    6.2 Approaches to Improve Sampling.
        6.2.1 Methods to Improve MC Acceptance.
        6.2.2 Biased Sampling Methods.
        6.2.3 More Sophisticated MC Moves.
        6.2.4 Adoption of Methods to Improve Sampling.
    6.3 Additional MC Moves.
        6.3.1 The Conrot Move.
        6.3.2 Implementation and Testing of the Conrot Move.
        6.3.3 Application of the Conrot Move to Macrobicycle 12.
        6.3.4 Acceptance Probability of the Conrot Move.
        6.3.5 Variations of the Conrot Move.
        6.3.6 The Flip Move.
        6.3.7 The Large Dihedral Move.
        6.3.8 Three Part Solute Move.
    6.4 Parameterisation and Implementation of the GB/SA Continuum Model.
        6.4.1 The GB/SA Continuum Model.
        6.4.2 Requirements for GB/SA.
        6.4.3 Parameterisation to Poisson-Boltzmann Free Energies.
        6.4.4 Parameterisation to Experimental Free Energies.
        6.4.5 Performance of the Derived Parameters.
    6.5 Sampling of Macrobicycle 12 in Continuum Chloroform.
    6.6 Conclusion.

7 Free Energy Calculations for Macrobicycle 12
    7.1 The Macrobicycle 12 System.
        7.1.1 The Simulation System.
        7.1.2 Experimental Data.
        7.1.3 The Role of Computer Simulations.
    7.2 Explicit Solvent Free Energy Calculations.
        7.2.1 Gas Phase Simulation Protocol.
        7.2.2 Explicit Chloroform Protocol.
        7.2.3 Window Spacing.
        7.2.4 Guest Free Energies in the Gas Phase.
        7.2.5 Guest Free Energies in Explicit Chloroform.
        7.2.6 Host-Guest Free Energies in Explicit Chloroform.
    7.3 Continuum Solvent Free Energy Calculations.
        7.3.1 Continuum Chloroform Simulation Protocol.
        7.3.2 Guest Free Energies in Continuum Chloroform.
        7.3.3 Host-Guest Free Energies in Continuum Chloroform.
    7.4 Conclusion.

8 Analysis of the Macrobicycle 12 System
    8.1 Description of the Binding Site.
        8.1.1 Host Binding Features.
        8.1.2 Guest Binding Features.
        8.1.3 Origins of Selectivity.
        8.1.4 The V-Model For Binding.
    8.2 Guest Orientation.
    8.3 Hydrogen Bond Analysis.
        8.3.1 Hydrogen Bond Patterns.
        8.3.2 Interpretation of Hydrogen Bond Patterns.
    8.4 Steric Analysis.
        8.4.1 Extracting Meaningful Steric Information.
        8.4.2 Probing the Close Contacts for Different Guests.
        8.4.3 The Nature of the Close Contacts.
    8.5 Energy Analysis.
        8.5.1 Energy Components.
        8.5.2 Interpretation of the Energies.
    8.6 Conformational Analysis.
        8.6.1 Variation of Host Shape With Different Guests.
        8.6.2 Hydrocarbon Chain Conformation.
        8.6.3 Dominant Hydrocarbon Subconformations.
        8.6.4 Phenyl Ring Conformation of the Guest.
    8.7 Rationalisation of Free Energies.
        8.7.1 Binding Motifs Observed in The Simulations.
        8.7.2 Connection Between Binding Free Energies and Motifs.
    8.8 Conclusion.

9 Conclusion

A Charges

Bibliography

Chapter 1 Introduction
1.1 Molecular Association.

The study of molecular association is a problem of great interest in many areas of science, particularly in chemistry and biology. Molecular association, or simply binding, is a key mechanistic step in a wide range of processes, including chemical reactions and their catalysis and inhibition, structural change, chemosensing and pollutant removal. Any systematic, rational advance in these areas must be accompanied by an understanding of the factors that control binding and its consequences. Computer simulation studies are able to provide much information about binding that is inaccessible to both experiment and analytical theory. However, limitations in computational power currently restrict their application to rather small systems of limited complexity, which can adversely affect the degree of realism desired. Nevertheless, reasonably accurate computer simulation studies of moderately sized host-guest systems have recently become of great use in the study of molecular binding [1-5].

Host-guest systems typically comprise a large host molecule containing a specific binding site to which a guest molecule can bind. They are studied for a number of reasons. Firstly, host-guest systems may be of intrinsic interest in themselves. Secondly, they may serve as simpler prototypes of more complex systems; they are more amenable to computational study, both for practical reasons and because the reduced level of complexity makes it easier to deduce cause and effect. Thirdly, they can provide insight into particular types of molecular interaction. Finally, they can serve as rigorous test cases for method development.

Figure 1.1: The macrobicycle 12 host molecule and three guest amino acid derivatives: N-Ac-glycine, N-Ac-L-alanine and N-Ac-L-phenylalanine.

1.2 The Macrobicycle 12 Host-Guest System.

The particular system that is the subject of this work satisfies all four of these motivations for studying host-guest systems. The host is called macrobicycle 12 [6] and it binds small amino acid derivative guests in chloroform. The host and three of these amino acid derivatives are illustrated in Figure 1.1. Macrobicycle 12 is a cup-shaped molecule made up of two rings that binds the amino acid derivatives by virtue of strong hydrogen bonds between the thiourea and amide units of the host and the carboxylate group of the guest. While the host was designed to bind the guest inside the cavity, it is actually able to bind the guest in two possible ways, either inside or outside the cavity.

This host-guest system has been found to possess a number of intriguing binding properties. The first of these is the remarkable ability to stabilise the amide bond of the guest amino acid derivative in the cis conformation when the guest is bound inside the cavity. The second binding property is that the L enantiomer is bound preferentially inside the host, while the D form prefers to bind outside. Thus the cis stabilisation occurs principally for L enantiomers. However, the molecule does not possess marked selectivity for different amino acids. Figure 1.2 shows the guest, N-Ac-L-cis-alanine, bound to a surface representation of the host, macrobicycle 12.

Figure 1.2: N-Ac-L-cis-alanine (ball and stick) bound to macrobicycle 12 (surface representation).

1.3 Importance of the Host Binding Properties.

These binding properties are significant for a number of reasons. The amide bond connecting amino acids is of fundamental importance in determining protein structure. This bond can exist in either the cis or the trans conformation, as shown in Figure 1.3 for N-methylacetamide, the archetypal molecule containing the amide bond. However, in natural proteins there is an almost complete predominance of the trans conformation, with only around 0.05% cis [7]. The commonly observed secondary structure of proteins is largely a result of this trans predominance. This difference in abundance is attributed to the lower energy of the trans conformation due to steric and electronic effects. An experimental measurement of the energy difference for N-methylacetamide gave 2.6 ± 0.4 kcal mol⁻¹ [8]. On the rare occasions that a cis amide bond does occur in a protein, it is usually found adjacent to a proline residue. The presence of the cis conformation, due to its rarity, can have a significant effect on overall protein conformation [9]. Isomerisation from trans to cis is believed to be a significant rate-determining step in protein folding. This is due to the large free energy barrier of at least 14 kcal mol⁻¹ separating the two conformers [8, 10, 11]. The cis-trans isomerases that catalyse this conversion can provide potential targets for pharmaceutical drugs. Indeed, the immunosuppressant agents used in organ transplants, cyclosporin A, FK506 and rapamycin, are inhibitors of such proteins [12]. As well as determining protein structure, the ability to stabilise molecules in a particular conformation can lead to the formation of different products in chemical reactions than would otherwise be obtained. Amide bonds in the cis conformation are also widely found in small cyclic peptides since the cis conformation is conducive to ring structures [13]. The study of cis amide bonds in such large and complex systems is of tremendous interest, yet computer simulation studies of such systems are problematic [14].

Figure 1.3: The cis and trans structures of N-methylacetamide.

Besides conformational stabilisation, enantioselectivity is another binding property that is the subject of much study. In both biological systems and synthetic chemistry, it is critical in determining molecular shape, reactivity and the possible products of chemical reactions. While there exist many naturally occurring enantioselective molecules, much effort in producing synthetic versions has focused on the design of macrocyclic molecules, of which there are numerous examples [15]. Such molecules possess the properties considered essential for enantioselectivity. These properties include good binding capability, strong chiral centres, rigidity, structural complementarity and some degree of symmetry. The desire to keep synthetic molecules smaller than the usually larger, naturally occurring systems for the sake of simplicity coincides with the practical requirements of computer simulations.

Finally, there is the all-important feature of selective binding for different molecules. An understanding of the factors that control this would greatly assist the design of host molecules to bind and manipulate small molecules, and vice versa. This is particularly so for the binding of small peptides. An understanding of the influence of the amino acid side groups is critical for molecular discrimination and protein structure determination. Even though the macrobicycle 12 system does not appear to differentiate markedly between guests in binding strength, there may still be differences in binding motifs that can be exploited in future work to enhance binding strengths.

1.4 Aim of This Work.

Some computational modelling of the structure of the macrobicycle 12 system had shed some light on the reasons for cis stabilisation [6]. The aim of the current work is to perform a complete, systematic study to understand the behaviour of the system. This would involve calculating accurate relative binding free energies for different guests with differing stereochemistry and amide bond conformation. This, combined with a structural and energetic analysis of the resulting equilibrium structures, would provide an understanding of exactly what is causing the selectivities in binding.

As well as possessing interesting binding properties, macrobicycle 12 is an ideal system for computational study. Its small size makes possible the application of the highest quality free energy methods. The small number of interactions reduces the ambiguity common in larger protein-ligand systems regarding the origin of various effects. Yet it is complex enough to provide highly interesting behaviour that cannot currently be rationalised by experiment. Indeed, the system is sufficiently complex that conventional simulation methods proved inadequate, and so a number of methodological developments and implementations became necessary. Novel schemes were used to construct the host to ensure good sampling. New parameters were derived for a number of dihedrals not included in the OPLS force field [16]. A range of optimisation procedures was included in the simulation code to improve speed. A new method was developed for deriving OPLS-like charges [17] for functionalities not covered by the current OPLS force field [16]. The ability of these charges to reproduce experimental free energies of hydration was tested using two free energy methods and their accuracy validated [18]. This work also led to a reappraisal of the applicability of the linear interaction method for calculating free energies [19]. More sophisticated Monte Carlo moves were included to further improve sampling. A new parameterisation was developed for the Generalised Born/Surface Area model for the OPLS-AA force field in chloroform. Finally, a number of improvements were suggested to the free energy calculation protocol. All of these ingredients combined to produce a working methodology to study the macrobicycle 12 system.

Chapter 2 Simulation and Free Energy Methods


2.1 Simulations Methods.

It is not at all surprising that classical observation of real physical phenomena provides a means to understand such phenomena. Nor is it surprising that certain rules and theories may be deduced from this. However, it is conceptually quite remarkable that computers can be used to approximately reproduce real physical behaviour, particularly for large complex systems. This chapter seeks to explain the practical questions of how computer simulations work and how they may be used to look at real physical behaviour. This review is not exhaustive but focuses on the problem in this thesis.

2.1.1 The Role of Computer Simulations.

Computer simulations are now an invaluable tool for examining and understanding phenomena in all scientific disciplines, especially in chemical and biological systems [20-23]. While still quite limited in their application to real systems, they are able to address many deficiencies associated with experimental and other theoretical methods, the other main tools in research. With regard to experiment, computer simulations offer the following advantages: they can provide mechanistic and structural information on an atomic level; they provide a means of obtaining quantities that are unmeasurable by experiment; the degree of control and flexibility inherent in simulations is generally greater; they make possible the study of rarely found or undesirable phenomena; they are only limited by the imagination; while they must ultimately obey the physical laws of nature, their implementation can include tricks that transcend these laws; and finally, there are no messy chemicals. With regard to analytical theory, again, they can provide more information and means of calculating quantities; they can tackle many more types of problems, which are often intractable to theory; in many cases they preclude the need for various assumptions, thus often giving more accurate answers; and just like real experiment, they can produce new, quite unexpected behaviour.

However, computer simulations are nothing without theory and experiment since the models they use must be based on some foundation. Experimentally and theoretically derived inputs are frequently essential in simulation models, while experiment and analytical theory may be used to test simulations. Simulations may also act in the reverse role, testing theories and experiments, and frequently interfacing directly between theory and experiment. The interplay between all three can therefore cultivate the development of each field. Simulation data suggest new possible theories and experiments, and vice versa. Since simulations run almost exclusively on computers, their applicability is limited by computer power. This in turn can lead to very long, large-scale simulations to calculate some property that may be obtained in a fraction of the time by experiment or analytical theory. To be useful, a continual balance must be maintained between the restrictions of system size, timescale and the level of realism so that the systems studied are both of interest and practicable. Nevertheless, increasing computer power will greatly enhance the predictive ability of simulations in the future.

This work on macrobicycle 12 is an example of this interplay between simulation and experiment. The work was originally motivated by experimental studies. The idea was that computer simulations could be performed on the system, firstly, to test the ability of the simulations to reproduce the experimental trends correctly, secondly, to provide insights into the system's behaviour that are unavailable in experimental studies, and thirdly, to use simulations predictively and suggest sensible experiments. Since the simulation techniques initially applied were not able to achieve the first objective, new methods had to be derived, modified and adopted to achieve this goal. What follows next is a discussion of various simulation techniques and a rationalisation for the selection of each technique. While it would be preferable to use the most accurate method in every case, its suitability and expense must be borne in mind and balanced with the objectives of the study in order to produce useful results over accessible timescales.

2.1.2 Representation of the System.

The first question to be decided in this study was how the system was to be physically modelled. There are many scales on which to perform simulations, ranging from atomistic to mesoscopic through to macroscopic. However, in order to study processes on a molecular level and calculate accurate free energies, simulations must be performed on an atomistic scale. The evaluation of physical properties using statistical mechanics requires a formulation to calculate the energy of each structure. Conventional methods used to calculate the energy range from molecular mechanics (MM) methods with empirical, classical force fields to far more accurate but expensive quantum mechanical (QM) methods. The relatively large system size and the need for multiple system configurations ruled out full quantum mechanical methods. Hybrid QM/MM techniques, which model the area of interest by the more accurate QM and the rest by MM, would have been feasible. However, since there was no chemical rearrangement of the bonds, the increased complexity was not deemed necessary to model the system successfully. MM force fields provide a fast, approximate way of calculating the energy and forces of a system. They typically consist of bond, angle and dihedral terms for atoms covalently linked together, and non-bonded interaction terms, as given by Eq. 2.1.

\[ E = E_{\mathrm{bnd}} + E_{\mathrm{ang}} + E_{\mathrm{dih}} + E_{\mathrm{nb}} \tag{2.1} \]


There is a wide variety of MM force fields to choose from. They are all comparable in computational expense. However, there are differences in the complexity and form of the force field, the derivation and availability of parameters, and their design philosophy. Such variation occurs since they are approximate and cannot generally reproduce all possible experimental data simultaneously. There are more complicated force fields that attempt to reproduce experimental or quantum mechanical data for small to medium sized molecules to a high level of accuracy, especially for intramolecular energetics. An example is MM3 [24]. However, these force fields usually contain more complex energy functions and cross terms. There are also force fields designed for modelling larger systems such as proteins. They are simpler in both functional form and parameter types and are designed more for reproducing non-bonded energetics. Such are the attributes required for the macrobicycle 12 system. Widely used force fields of this type include OPLS [25], AMBER [26], GROMOS [27] and CHARMM [28-30].

2.1.3 The OPLS Force Field.

In this work the OPLS-AA force field was adopted. OPLS is an acronym for Optimised Potentials for Liquid Simulations. This acronym summarises the main ideal of this force field, namely to reproduce experimental properties. AA stands for all-atom, which means that all atoms are explicitly modelled. The alternative united atom approach is still in widespread use. United atoms are created by combining hydrogens with the atom to which they are attached in order to reduce the total number of atoms. However, this approximation was not necessary for the small system to be studied here. The functional form of each of the components given in Eq. 2.1 for this force field is as follows. The bond and angle bending contributions are given by

$$ E_{bnd} = \sum_i K_i (r_i - r_{eq,i})^2 \tag{2.2} $$

$$ E_{ang} = \sum_i K_i (\theta_i - \theta_{eq,i})^2 \tag{2.3} $$


where the summations are over all bonds and angles respectively, with $K_i$ the force constants and $r_{eq,i}$ and $\theta_{eq,i}$ the respective reference values. The dihedral energy is given by a three term Fourier series

$$ E_{dih} = \sum_i \left\{ \frac{V_{1,i}}{2}\left[1 + \cos(\phi_i + f_{i,1})\right] + \frac{V_{2,i}}{2}\left[1 - \cos(2\phi_i + f_{i,2})\right] + \frac{V_{3,i}}{2}\left[1 + \cos(3\phi_i + f_{i,3})\right] \right\} \tag{2.4} $$

where $V_{j,i}$ are the coefficients, $f_{i,j}$ are the phase angles, and the sum over $i$ runs over all 1-4 dihedral atom pairs attached to the atoms forming the central bond. Finally, the non-bonded energy consists of a Lennard-Jones term and an electrostatic term given as

$$ E_{nb} = \sum_i \sum_{j>i} \left\{ \frac{q_i q_j}{4\pi\epsilon_0 r_{ij}} + 4\epsilon_{ij}\left[ \left(\frac{\sigma_{ij}}{r_{ij}}\right)^{12} - \left(\frac{\sigma_{ij}}{r_{ij}}\right)^{6} \right] \right\} f_{ij} \tag{2.5} $$

where the double sum is over all distinct atom pairs, $q_i$ are the partial atomic charges, $\epsilon_{ij}$ and $\sigma_{ij}$ are the Lennard-Jones well-depth energy and collision diameter parameters, $r_{ij}$ are the distances between the atoms, and $f_{ij}$ is given by

$$ f_{ij} = \begin{cases} 0 & : i, j \text{ separated by less than 3 bonds} \\ 0.5 & : i, j \text{ separated by 3 bonds} \\ 1 & : \text{otherwise} \end{cases} \tag{2.6} $$

For heteronuclear atom pairs, $\epsilon$ and $\sigma$ are combined using the geometric mean.
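As an illustration of Eqs. 2.5 and 2.6, the following minimal Python sketch evaluates the non-bonded energy of a single atom pair. The function names, the argument layout and the Coulomb conversion constant (approximately 332.06, appropriate for charges in units of e, distances in Å and energies in kcal/mol) are assumptions of this sketch and are not taken from BOSS or MCPRO.

```python
import math

# Approximate Coulomb conversion constant (kcal Angstrom / (mol e^2)).
# This value and the unit system are assumptions of the sketch.
COULOMB_CONSTANT = 332.06


def scaling_factor(bond_separation):
    """f_ij of Eq. 2.6. bond_separation is the number of bonds between the
    two atoms, or None for atoms that are not connected at all."""
    if bond_separation is None or bond_separation > 3:
        return 1.0
    if bond_separation == 3:
        return 0.5
    return 0.0


def pair_energy(q_i, q_j, sigma_i, sigma_j, eps_i, eps_j, r_ij,
                bond_separation=None):
    """Coulomb plus Lennard-Jones energy of one atom pair (one term of Eq. 2.5)."""
    # Geometric-mean combining rules for heteronuclear pairs
    sigma_ij = math.sqrt(sigma_i * sigma_j)
    eps_ij = math.sqrt(eps_i * eps_j)

    coulomb = COULOMB_CONSTANT * q_i * q_j / r_ij
    sr6 = (sigma_ij / r_ij) ** 6
    lennard_jones = 4.0 * eps_ij * (sr6 * sr6 - sr6)
    return (coulomb + lennard_jones) * scaling_factor(bond_separation)
```

For two uncharged atoms at the Lennard-Jones minimum, $r_{ij} = 2^{1/6}\sigma_{ij}$, the function returns $-\epsilon_{ij}$, and half that value if the atoms are separated by exactly three bonds.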

The decision to use OPLS-AA was made for two reasons. Firstly, the OPLS force field is designed to reproduce experimental properties, and secondly, it was incorporated in the simulation packages, BOSS [31] and MCPRO [32], to be used in this work. The AA version was used to achieve more realism and because the system was small enough to afford this additional expense. The one difficulty with this choice was that OPLS lacked certain parameters for this system and these had to be derived (Section 3.1).

One further issue related to the force field is how the solvent is modelled. Ideally, the more realistic means to model chloroform would be to model the molecules explicitly as for the solute. However, the generalised Born/surface area (GB/SA) continuum solvent model [33] was implemented, principally due to sampling problems discussed in Section 6.4 caused by the explicit solvent representation. The parameterisation of this method required the use of yet another continuum method, the Poisson-Boltzmann (PB) method [34].

2.1.4 The System to be Modelled and Other Approximations.

The next issue was system size and what further approximations were necessary to simplify the simulation. In reality, the solute is surrounded by the solvent chloroform, which ideally would extend indefinitely. A balance must be struck between having the solute in a realistic environment and having a sufficiently small system that is feasible to simulate. The possible approaches depend on how the solvent is modelled. If it is modelled explicitly, the first necessary approximation is periodic boundary conditions (PBC). A box even of thousands of chloroform molecules by itself would experience very strong edge effects. Therefore the box is surrounded by periodic images of itself in all three dimensions to remove all surfaces. This is illustrated in Figure 2.1. When a particle leaves the simulation box, it is replaced by a new particle with the same properties coming in at the opposite side of the box. There is a choice of several box shapes, but a cubic box was chosen, being the simplest. The problem introduced by PBC is that atoms now have the ability to "see" themselves, inducing a possible crystallinity artefact into the system. This problem, however, is removed by the next approximation.

Another common approximation is to use a cutoff radius for non-bonded interactions. This is primarily made to reduce the number of energy calculations between an atom and its neighbours to save on computational expense. It is justified on the basis that the energy of interaction with atoms beyond a certain distance is negligible and so can be ignored or approximated by a simple analytical function. This also removes the previously mentioned self-interaction problem as long as the simulation box is made at least twice as large as the cutoff radius. Such an approximation can also be made in solute-solute interactions but this was not necessary given the small size of the solutes in this system.

Figure 2.1: Periodic conditions for a solute in solvent. The dashed line represents the cutoff radius. Interacting solvent molecules within the cutoff radius are shaded darker.

The cutoff radius approximation is reasonable for quickly decaying dispersion interactions but becomes questionable for electrostatic interactions, which decay much more slowly, especially for ions. Four techniques that treat long range electrostatic interactions without using a cutoff are Ewald summation [35], the related faster Particle Mesh Ewald method [36], the Reaction Field method [37], and the Fast Multipole Method [38]. However, these add a significant cost to computations. While the inclusion of one of them would be desirable to examine its effect, the normal cutoff technique was retained on the grounds that there was only one ion in the system with no other ions with which to interact.

One clear difference between explicit and continuum solvent simulations is that no cutoff approximations are required in implicit solvent. The GB/SA model already assumes that the dielectric continuum extends infinitely, while the PB model models the solute in a box of polarisable dielectric surrounded by an infinite unpolarised dielectric.

Having chosen a force field, to calculate free energy changes, an initial starting structure is needed. This is typically an experimental or an energy minimised structure, although either of these structures typically requires some further equilibration to ensure that representative structures are being sampled. Such equilibration is usually performed using the same method as that used to generate an ensemble of configurations from which equilibrium properties may be derived. Two conventional methods to produce such ensembles are molecular dynamics (MD) and Monte Carlo (MC). A summary of each method follows.
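The periodic boundary and cutoff approximations described earlier in this section can be sketched in a few lines of Python. The sketch assumes a cubic box and uses the nearest-image (minimum image) convention; the function names are hypothetical and the code is illustrative only, not the implementation used in BOSS or MCPRO.

```python
import math


def minimum_image_distance(pos_a, pos_b, box_length):
    """Distance between two points in a cubic periodic box, using the
    nearest periodic image of pos_b relative to pos_a."""
    squared = 0.0
    for a, b in zip(pos_a, pos_b):
        delta = a - b
        # Shift the separation by a whole number of box lengths so that
        # it lies within half a box length of zero
        delta -= box_length * round(delta / box_length)
        squared += delta * delta
    return math.sqrt(squared)


def within_cutoff(pos_a, pos_b, box_length, cutoff):
    """True if the nearest image of the pair lies inside the cutoff radius.
    The box must be at least twice the cutoff so that no atom can interact
    with its own image."""
    if cutoff > 0.5 * box_length:
        raise ValueError("cutoff must not exceed half the box length")
    return minimum_image_distance(pos_a, pos_b, box_length) <= cutoff
```

Two atoms at x = 0.5 and x = 9.5 in a box of length 10 are therefore treated as being a distance 1 apart through the periodic boundary, rather than 9 apart within the box.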

2.1.5 Molecular Dynamics Simulations.

MD attempts to simulate the real dynamics of a system by integrating Newton's laws of motion, generating a trajectory of the system through time.

$$ -\nabla U = F = ma \tag{2.7} $$

where $\nabla U$ is the position derivative of the potential energy, $F$ the force, $m$ the mass of the particle and $a$ the acceleration. The assumption is that if a long enough time, $\tau$, is taken, then property $X$, measured along this trajectory, will become representative of properties of the real system and so its time average, $X_{av}$, will give the value of the desired property. This idea is expressed in the equation

$$ X_{av} = \lim_{\tau \to \infty} \frac{1}{\tau} \int_0^{\tau} X(t)\,dt \tag{2.8} $$

Eq. 2.7 can be written down for every particle in the system. This leads to $N$ first order differential equations for the positions and $N$ for the velocities, each as a function of time and of the positions of all other atoms within the cutoff radius. Such coupled equations must be solved numerically at discrete timesteps, usually by a finite difference method. To solve these equations, a number of approximations must be made. The solutions are written as carefully chosen combinations of truncated Taylor series expansions about $t$ as a function of the current positions, velocities and accelerations. This can be done in a number of possible ways, such as the Verlet, leap-frog or velocity-Verlet algorithms. Over the time period, $\delta t$, the force acting on the particles is assumed to be constant. This leads to new expressions for positions and velocities at a later time, $t + \delta t$. The Verlet algorithm, for example, gives the new positions, $r(t + \delta t)$, and, by a central difference, the velocities at the current time, $v(t)$, as

$$ r(t + \delta t) = 2r(t) - r(t - \delta t) + \delta t^2 a(t) \tag{2.9} $$

$$ v(t) = \frac{r(t + \delta t) - r(t - \delta t)}{2\delta t} \tag{2.10} $$

For the Verlet algorithm, the velocities are not actually needed to propagate the trajectory of the system but may still be needed to calculate properties such as the kinetic energy. The equations must be solved at every time step, $\delta t$, which itself must be chosen to be small enough to keep the constant force approximation valid. Otherwise the force applied drifts from the real force and energy is no longer conserved. Therefore, the choice of $\delta t$ is a balance between maximising the length of the simulation and keeping errors due to this approximation small. A typical value is of the order of 1 fs. The configuration produced at each time step in this way generates the required ensemble of configurations from which to calculate molecular properties.
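A minimal sketch of the Verlet scheme for a single particle in one dimension is given below. It is written in reduced units, the function name and arguments are hypothetical, and the position at $t - \delta t$ needed for the first step is obtained from a truncated Taylor expansion, which is one common choice assumed here. The velocity is evaluated by a central difference at the current step.

```python
def verlet(x0, v0, force, mass, dt, n_steps):
    """Propagate one particle in one dimension with the Verlet algorithm.

    force is a function of position. Returns the positions and the
    central-difference velocities at steps 0 .. n_steps - 1."""
    # Position one step before the start, from a truncated Taylor expansion
    x_prev = x0 - v0 * dt + 0.5 * (force(x0) / mass) * dt ** 2
    x = x0
    positions, velocities = [], []
    for _ in range(n_steps):
        # New position from the current and previous positions (cf. Eq. 2.9)
        x_next = 2.0 * x - x_prev + dt ** 2 * force(x) / mass
        # Velocity at the current step by central difference (cf. Eq. 2.10)
        v = (x_next - x_prev) / (2.0 * dt)
        positions.append(x)
        velocities.append(v)
        x_prev, x = x, x_next
    return positions, velocities
```

For a harmonic oscillator with unit mass and force constant, started at x = 1 with zero velocity and propagated with a time step of 0.01, the total energy stays very close to its initial value of 0.5. Increasing the time step makes the fluctuation in the energy grow, which is the balance referred to in the text.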

2.1.6 Monte Carlo Simulations.

MC generates configurations by a different procedure. Every configuration is randomly generated in a prescribed way. How these configurations are generated critically influences the practicality of calculating various properties. For example, to measure a property, $X$, if $N$ configurations were to be taken randomly from a uniform distribution in configuration space, and weighted by the probability, $P_i$, that each occurs, that is, by their Boltzmann factors, then the average value of $X$ is given by

$$ X_{av} = \sum_{i=1}^{N} X_i P_i = \frac{\sum_{i=1}^{N} X_i \exp(-U_i / k_B T)}{\sum_{i=1}^{N} \exp(-U_i / k_B T)} \tag{2.11} $$


where $X_i$ is the value of the property and $U_i$ the energy of a particular configuration $i$. Such an average would be very slow to converge since almost all terms would be negligibly small due to significant overlap, and thus very high energy, between atoms. A more sensible sampling scheme is the commonly used Metropolis sampling [39], which produces configurations both randomly and weighted according to the Boltzmann factor. Thus each configuration generated now contributes equally to the average. With such a sampling scheme, Eq. 2.11 simply becomes

$$ X_{av} = \frac{1}{N} \sum_{i=1}^{N} X_i \tag{2.12} $$

Now, the significant contribution of each configuration leads to much faster convergence of the averages. Furthermore, new configurations are usually generated by small moves from the original configuration and so typically have energies very similar to the old ones, reducing the likelihood that a high energy non-contributing state is attempted and so improving the acceptance rate. Metropolis MC is the sampling method used in this work.

To produce such a biased distribution, one must now reject certain configurations if they do not meet certain criteria. The way such an ensemble of configurations is generated without ever actually calculating a state probability is as follows. Let $\rho_m$ be the probability that the system is in state $m$, let $\alpha_{mn}$ be the probability that a trial move from state $m$ to state $n$ is selected, and let $\pi_{mn}$ be the probability that this move is actually made, that is, selected and then accepted. If the matrix consisting of the elements $\pi_{mn}$ can be constructed so as to satisfy microscopic reversibility, that is,

$$ \pi_{mn} \rho_m = \pi_{nm} \rho_n \tag{2.13} $$

then the ratio of transition probabilities will be Boltzmann weighted, as desired, given by

$$ \frac{\pi_{mn}}{\pi_{nm}} = \frac{\rho_n}{\rho_m} = \exp[-(U_n - U_m)/k_B T] \tag{2.14} $$

In the Metropolis scheme, the $\pi_{mn}$ that achieves this is given by

$$ \pi_{mn} = \begin{cases} \alpha_{mn} & : \rho_n \geq \rho_m,\ m \neq n \\ \alpha_{mn} (\rho_n / \rho_m) & : \rho_n < \rho_m,\ m \neq n \\ 1 - \sum_{n' \neq m} \pi_{mn'} & : m = n \end{cases} \tag{2.15} $$

where the $\alpha$ matrix is assumed to be symmetrical to ensure microscopic reversibility. In practice, a move is first selected from the $\alpha$ matrix. The energy change, $(U_n - U_m)$, is then calculated. If it is negative, then $\rho_n \geq \rho_m$ and the move is accepted. If it is positive, then it is accepted with probability $\rho_n / \rho_m$. This is done by comparing the quantity $\exp[-(U_n - U_m)/k_B T]$ to a random number in the interval [0,1]. If the quantity is greater than the random number, then the attempted configuration is accepted. Otherwise, the original configuration ($m = n$) is accepted again.

The $\alpha$ matrix determines the move type. There is a vast number of ways to choose it. An example of a typical move is the translation of an atom from position $x_m$ in the $x$ direction by a random displacement in the range $[-\Delta x, \Delta x]$, chosen to be symmetric about the original position. $\alpha_{mn}$, now an infinitesimal probability density, is then given by

$$ \alpha_{mn} = \begin{cases} \delta x / (2\Delta x) & : x_n \in [x_m - \Delta x,\ x_m + \Delta x] \\ 0 & : x_n \notin [x_m - \Delta x,\ x_m + \Delta x] \end{cases} \tag{2.16} $$

This equation says that the probability of moving into a region at $x_n$ is the width of this region, $\delta x$, divided by twice the maximum displacement, $\Delta x$, if $x_n$ is within $\Delta x$ of $x_m$, and zero otherwise. A similar principle operates for altering any other coordinate such as a dihedral angle. Much more complex moves are possible. These are discussed in Section 6.1.
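The acceptance procedure and the symmetric translation move described above can be sketched as follows for a single coordinate. This is a minimal illustration in reduced units, not the sampling code of BOSS or MCPRO; the function name, its arguments and the use of a seeded random number generator are assumptions of the sketch.

```python
import math
import random


def metropolis_sample(energy, x0, max_displacement, kT, n_steps, seed=0):
    """Metropolis sampling of one coordinate with uniform trial moves in
    [-max_displacement, +max_displacement] about the current position."""
    rng = random.Random(seed)
    x = x0
    u = energy(x)
    samples = []
    n_accepted = 0
    for _ in range(n_steps):
        # Symmetric trial move about the current position
        x_trial = x + rng.uniform(-max_displacement, max_displacement)
        delta_u = energy(x_trial) - u
        # Downhill moves are always accepted; uphill moves are accepted
        # with probability exp(-delta_u / kT)
        if delta_u <= 0.0 or rng.random() < math.exp(-delta_u / kT):
            x, u = x_trial, u + delta_u
            n_accepted += 1
        # After a rejection the old configuration is counted again
        samples.append(x)
    return samples, n_accepted / n_steps
```

For the harmonic energy $U = x^2/2$ with $k_B T = 1$, the simple average of $x^2$ over the samples, taken as in Eq. 2.12, should approach the exact value of 1 without any Boltzmann factor being evaluated explicitly in the average.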

2.1.7 Molecular Dynamics Versus Monte Carlo.

While MD and MC should in theory give identical answers by the ergodic hypothesis, in practice, a number of differences between them favour one method over the other depending on the system. Firstly, MD naturally samples the NVE ensemble, while MC naturally samples the NVT ensemble. Both can be made to sample the NPT ensemble, the one in which experiments are most commonly performed. However, this is achieved more simply in MC, for which a volume move brings about the necessary volume fluctuations. MD requires more complex techniques involving extended Lagrangian formulations to allow the energy and volume to fluctuate and the temperature and pressure to be constrained. Secondly, MD simulates real time behaviour and so allows dynamical properties such as diffusion to be studied since the momenta are explicitly defined. This is not possible in MC since all properties are derived solely from coordinates. Thirdly, MD naturally allows all degrees of freedom to change, whereas MC only samples the degrees of freedom that comprise move attempts. Thus it is easy in MC to restrict sampling to those degrees of freedom considered important to the problem at hand, whereas MD requires more complex methods such as SHAKE [40] to constrain bond lengths. Fourthly, this freedom of choice in MC move selection can lead to more efficient sampling of configurational space, since larger moves can be attempted that are not restricted by the small time step of MD. Furthermore, moves can be designed to jump over energy barriers over which MD has to incrementally climb. This is related to the fifth difference, that of relative computational expense in configurational space exploration. MD updates the whole system at each configuration, while MC only updates a small section. Thus new MC configurations are usually faster to generate, but more must be generated to move the whole system. Which method is faster is very much dependent on the system. Sixthly, since the forces are included in MD, it is easier to perform motions that are more cooperative and larger than in MC, in which moves are more disjoint and generally have to be small to provide a reasonable acceptance probability.
Seventhly, MD can suffer from numerical problems due to continual approximations in solving Newton's equations.

Given the generally simpler nature of MC, its greater ability to traverse configurational space, and the absence of a need for dynamical information, MC was adopted as the protocol for the generation of configurations. Most properties of interest which depend on the derivative of the partition function, such as the energy, may be calculated using Eq. 2.12. However, measuring free energies is more complex and is discussed in the next section.

2.2 Free Energy Methods.

2.2.1 The Problem of Calculating Free Energies.

The free energy, or its counterpart, the partition function, is the most important quantity in statistical mechanics. Furthermore, from it all other thermodynamic quantities can be derived. Knowing the free energy of a system reveals its stability and allows a prediction of the correct state of a system under a given set of conditions. Much experimental data comes in the form of an equilibrium constant between two states, which is effectively a free energy difference. Thus calculation of free energies is an important way of linking computer simulations and experiment [2, 41-45]. In the NPT ensemble the pertinent free energy function is the Gibbs free energy, $G$. It may be calculated from the ensemble average

$$ G = -k_B T \ln Q_{NPT} = G_{ideal} + k_B T \ln \left\langle \exp\left(\frac{+U}{k_B T}\right) \right\rangle_{NPT} \tag{2.17} $$

where $G_{ideal}$ is the ideal gas part and may be calculated separately if $U$, the potential energy of the system, is independent of particle momenta. What is problematic here is that the exponential depends on $+U$, a large positive number. Therefore, all terms, especially those with the highest energy, will make a significant contribution to the average. Thus not only is Metropolis sampling, which favours low energy states, now inappropriate, but also vastly more configurations must now be sampled. There is therefore no hope of expecting this average to converge, and hence absolute free energies are inaccessible in this way.

2.2.2 Free Energy Perturbation.

However, such a problem can be sidestepped by calculating free energy differences, which only depend on the ratio of partition functions. To evaluate this quantity there are two commonly used well-proven techniques. The first of these is free energy perturbation (FEP) [46]. The free energy difference between two different states A and B is given by

$$ \Delta G = -k_B T \ln \frac{Q_B}{Q_A} = -k_B T \ln \left\langle \exp\left(-\frac{\Delta U_{AB}}{k_B T}\right) \right\rangle_A \tag{2.18} $$

where $\Delta U_{AB} = U_B - U_A$, and the averaging is performed over configurations for state A. Such an expression is exact, but it only converges if the two states A and B are similar enough to each other to keep $\Delta U_{AB}$ small. This is because the end points commonly sample quite different regions of configuration space. What may be the low energy states for one Hamiltonian may become very high energy states for another. Therefore it is frequently necessary to define several arbitrary non-physical intermediate states between A and B to increase the similarity between successive states and thus overlap to a greater extent the configurations that they sample. This arbitrary partitioning is allowed because free energy is a state function independent of the path taken between two points. Conventionally each state is defined by the variable, $\lambda$, which ranges from 0 to 1.

There are two common ways to perturb molecules between states A and B. The conventional way is the so-called single topology method. State A gradually changes into state B. Dummy atoms, atoms with zero charge and Lennard-Jones parameters, are used to grow or remove atoms. The less commonly used alternative is the dual topology method [47]. In this method both A and B are present in the system but they never interact with each other. The perturbation proceeds by gradually turning off the interactions for A and turning on those for B. The single topology method was used in this work. The parameters in the Hamiltonian that differ between A and B are altered according to the value of $\lambda$. Typically this dependence is made to be linear. For example, if A and B differ by a certain bond length, then the intermediate bond length, $r_\lambda$, is given by

$$ r_\lambda = \lambda r_A + (1 - \lambda) r_B \tag{2.19} $$


Alternatively, the whole Hamiltonian itself can be scaled in this way. The total free energy is then the sum of the component free energies for each section obtained using an analogous formula to Eq. 2.18 except with different end points. One point to note is that FEP does not require configurations to be sampled from state B. If this were done, then the reverse free energy may also be calculated. The value of this free energy should be the negative of the former. This can be exploited to produce a faster method, termed the double wide sampling technique, in which only every second window is sampled [48], since the free energy forward and backward can be calculated from one simulation. However, it has been observed in this work and elsewhere [49, 50] that perturbations in the direction of molecules increasing in size have better convergence. This is probably because the reference state, being smaller, is able to sample the perturbed configurations much better than in the reverse case. Therefore, calculating free energies in both directions (double ended sampling) provides a reliable check that successive windows are sampling similar configurations, thus giving a converged average.
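The windowed use of Eq. 2.18 can be illustrated on a model problem for which the answer is known. The sketch below perturbs a one-dimensional harmonic oscillator from force constant $k_A$ to $k_B$, for which the exact result is $\Delta G = (k_B T / 2) \ln(k_B / k_A)$. All names are hypothetical, reduced units are used, and the configurations of each reference state are drawn directly from the exact Gaussian distribution rather than by Metropolis MC, purely to keep the example short.

```python
import math
import random


def fep_delta_g(delta_u_samples, kT):
    """Exponential average of Eq. 2.18 over configurations sampled from
    the reference state."""
    mean_boltzmann = sum(math.exp(-du / kT) for du in delta_u_samples)
    mean_boltzmann /= len(delta_u_samples)
    return -kT * math.log(mean_boltzmann)


def harmonic_fep(k_a, k_b, kT=1.0, n_windows=4, n_samples=50000, seed=0):
    """Sum of window free energies for changing the force constant of
    U = k x^2 / 2 from k_a to k_b through linearly spaced intermediates."""
    rng = random.Random(seed)
    total = 0.0
    for w in range(n_windows):
        k_ref = k_a + (k_b - k_a) * w / n_windows
        k_next = k_a + (k_b - k_a) * (w + 1) / n_windows
        # Configurations of the reference window (exact Gaussian sampling,
        # standing in for an MC simulation of that window)
        sigma = math.sqrt(kT / k_ref)
        xs = [rng.gauss(0.0, sigma) for _ in range(n_samples)]
        delta_u = [0.5 * (k_next - k_ref) * x * x for x in xs]
        total += fep_delta_g(delta_u, kT)
    return total
```

With $k_A = 1$, $k_B = 2$ and $k_B T = 1$, the summed window free energies should lie close to $0.5 \ln 2 \approx 0.347$.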

2.2.3 Thermodynamic Integration.

The second free energy difference technique is thermodynamic integration (TI) [51]. This method gives the free energy difference by

$$ \Delta G = \int_0^1 \left\langle \frac{\partial H}{\partial \lambda} \right\rangle_\lambda d\lambda \tag{2.20} $$

where $H$ is the Hamiltonian. In practice this integral must be broken up into a number of discrete parts along the $\lambda$ coordinate in the same way as for FEP. In this case numerical integration does introduce an approximation, although with enough windows this is usually insignificant compared to the error due to inadequately converged averages. The usual approach (multiconfiguration TI) is to use of the order of 10 windows with adequate equilibration and converged data collection [52]. Another approach, called slow growth, changes the Hamiltonian minutely for every configuration (single configuration TI) [53]. However, such an approach is dubious, as the question arises as to whether the new Hamiltonian at each $\lambda$ is being properly sampled and whether the conformations of the system in general lag behind those appropriate for the value of $\lambda$.
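The effect of the number of windows on the numerical integration can be seen in a small model calculation. The sketch below again uses a harmonic oscillator whose force constant is scaled linearly from $k_A$ at $\lambda = 0$ to $k_B$ at $\lambda = 1$ (this direction of $\lambda$ is an assumption of the sketch). The ensemble average of $\partial H / \partial \lambda$ at each window is taken from the analytic result $\langle x^2 \rangle = k_B T / k_\lambda$ rather than from a simulation, so that only the quadrature error remains. The function names and the choice of the trapezoidal rule are illustrative.

```python
import math


def mean_dh_dlambda(lam, k_a, k_b, kT):
    """Analytic <dH/dlambda> for H = [(1 - lam) k_a + lam k_b] x^2 / 2,
    using <x^2> = kT / k(lam)."""
    k_lam = (1.0 - lam) * k_a + lam * k_b
    return 0.5 * (k_b - k_a) * kT / k_lam


def thermodynamic_integration(k_a, k_b, kT=1.0, n_windows=11):
    """Trapezoidal integration of <dH/dlambda> over equally spaced windows."""
    lambdas = [i / (n_windows - 1) for i in range(n_windows)]
    values = [mean_dh_dlambda(lam, k_a, k_b, kT) for lam in lambdas]
    total = 0.0
    for i in range(n_windows - 1):
        width = lambdas[i + 1] - lambdas[i]
        total += 0.5 * (values[i] + values[i + 1]) * width
    return total


def exact_delta_g(k_a, k_b, kT=1.0):
    """Exact free energy difference for the harmonic model."""
    return 0.5 * kT * math.log(k_b / k_a)
```

Comparing `thermodynamic_integration(1.0, 2.0, n_windows=3)` with the 11-window result against `exact_delta_g(1.0, 2.0)` shows the integration error shrinking as windows are added.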

2.2.4 Difficulties With Free Energy Methods.

As a means of calculating free energy changes, FEP and TI are both capable of giving accurate results in many cases. However, they are also not without their problems. Firstly, like most simulation techniques, the accuracy of their results is limited by the force field and approximations made in energy calculations such as cutoffs. Hence it is quite common to test force fields themselves by comparing free energy results obtained by this method to experiment, as is done in this work in Chapter 5. Secondly, they are limited by computational expense, especially when large mutations are performed for which many intermediate states are required. Thirdly, they require complete sampling of all relevant configurational space. Thus it is also important to assess what length of simulation is necessary for converged results [54-56]. In many systems with complex energy landscapes special sampling techniques must be used, both to overcome large energy barriers and to traverse narrow regions requiring highly cooperative motions. Such an approach was found to be necessary in the macrobicycle 12 system. The addressing of this problem is described in Chapter 6. Fourthly, there are considerable difficulties converting between molecules with different charges since long range interactions have to be carefully treated. Fifthly, other corrections are necessary due to the standard state and changes in symmetry [3].

2.2.5 Fast Free Energy Methods.

There are many other free energy calculation techniques that seek to speed up this rather slow calculation. The objectives of such methods are generally to obtain free energy information for many molecules from only one or two simulations. These methods can be categorised into three approaches.

The first class simulates a particular reference state, often an intermediate between the two end points. The method of Smith et al. [57] seeks to calculate free energy changes by writing the free energy change as a standard Taylor series expansion and calculating all the necessary terms from a single simulation of a reference state. Free energy differences may be calculated between many similar molecules from an ensemble of an intermediate non-physical molecule modelled with a soft core Lennard-Jones potential given in Eq. 2.21 [58].

$$ E_{nb} = 4 \sum_i \sum_{j>i} \epsilon_{ij} \left[ \frac{\sigma_{ij}^{12}}{\left(\delta \sigma_{ij}^6 + r_{ij}^6\right)^2} - \frac{\sigma_{ij}^6}{\delta \sigma_{ij}^6 + r_{ij}^6} \right] \tag{2.21} $$

$\delta$ determines the softness of the atoms. Free energies are then obtained using the FEP formula by mutating to each real molecule. The difficulty these methods suffer from is the standard problem of very different, non-overlapping states.

The second class of methods allows $\lambda$ itself to vary so that different hybrids of the two or more end points appear in the same simulation. Tidor used MD simulations in real space coupled with MC moves in $\lambda$ space [59]. The $\lambda$-dynamics method [60] simulates many ligands together, each with its own $\lambda$ variable which itself is able to fluctuate as a dynamic variable. Similar information can be obtained from the simulated annealing technique of Jarque and Tidor [61] and the chemical MC/MD technique [62, 63], which simulates all molecules together with all but one treated as ghost molecules, and makes MC moves between ligands such that one is always real. The extreme case of simulating only the end points together is possible using the Jumping Between Wells method of Senderowitz et al. [64], which is able to perform MC moves from one molecule directly to the other. All of the approaches can be of much use in ranking relative binding, but their ability to calculate numerical free energy differences is reduced when these differences grow so large that the least stable molecules are not sampled properly.

The final class of methods obtains free energies only from quantities calculated at the end points of a perturbation. Thus this approach requires two simulations. The original linear interaction energy method of Åqvist et al. [65] calculates free energies according to the equation

$$ \Delta G_{bind} = \alpha \langle \Delta U_{vdw} \rangle + 0.5 \langle \Delta U_{elec} \rangle \tag{2.22} $$

where $\langle \Delta U_{vdw} \rangle$ and $\langle \Delta U_{elec} \rangle$ are the average differences in van der Waals and electrostatic energy between the two end points, and $\alpha$ is an arbitrary parameter usually fitted to experiment. Exactly which terms are included on the right-hand side of Eq. 2.22 has recently been the subject of much debate and is studied further in Chapter 5. Recently, another linear interaction energy method has been proposed, termed the generalised linear response method because it combines both the van der Waals and electrostatic energies [66]. The free energy of hydration is given by

$$ \Delta G_{hyd} = 1.49\, n k_B T + \langle V_H \rangle_{0.5} \tag{2.23} $$

where the first term is a cavity term from scaled particle theory, $n$ is the number of atoms, and $\langle V_H \rangle_{0.5}$ is the average solute-solvent energy averaged over an ensemble where the solute is halfway between itself and a point singularity. It has an advantage in that it requires no empirically derived parameters.

Finally, there are a number of other methods that make use of empirical free energy functions [67, 68]. In particular, a fairly fast theoretical approach has been developed by Kolossváry that has been used to calculate conformational free energies [69] and DL isomerism [70]. The testing of this approximate but faster method on the macrobicycle 12 system would be of interest.
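As an aside on the soft core potential of Eq. 2.21, the short sketch below evaluates one pair term. The functional form is the one reconstructed here, $4\epsilon[\sigma^{12}/(s\sigma^6 + r^6)^2 - \sigma^6/(s\sigma^6 + r^6)]$, with the softness parameter written as the hypothetical argument `softness`; both the name and the parameter values in the usage note are assumptions for illustration.

```python
def soft_core_lj(r, sigma, epsilon, softness):
    """Soft core Lennard-Jones energy of one atom pair.

    With softness = 0 this reduces to the ordinary 12-6 potential; with
    softness > 0 the energy remains finite as r goes to zero."""
    s6 = sigma ** 6
    denominator = softness * s6 + r ** 6
    return 4.0 * epsilon * (s6 * s6 / denominator ** 2 - s6 / denominator)
```

At zero separation the energy is $4\epsilon(1/s^2 - 1/s)$, for example $8\epsilon$ for a softness of 0.5, so atoms of the non-physical reference molecule can overlap at a finite energy cost instead of meeting a singularity.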

2.2.6 Choice of Free Energy Method.

Since the calculation of accurate free energies was desired, the number of ligands was small, and the macrobicycle 12 system was small enough, the choice of method was between FEP and TI. Both methods produce similar accuracy for similar computational expense. The inclusion of more windows is simpler in TI than FEP since the former requires only one additional simulation while the latter requires three. For TI the free energy components are also broken up into terms according to the Hamiltonian, allowing an analysis of the contributions, although whether such contributions are meaningful is debatable [71]. Despite these small advantages of TI, the free energy method used in this work was chosen to be FEP, primarily because this was the method included in the simulation packages BOSS and MCPRO.

2.3 Applications of Free Energy Methods.

2.3.1 Problems Calculating Free Energies of Binding.

The aim of this work was to study the binding of amino acid derivatives of various conformations and stereochemistries to macrobicycle 12. The key thermodynamic quantity of binding is not simply the difference in energies but the free energy difference. Formally, this is the free energy difference between the host and guest at infinite separation, and the host and guest bound. Computational chemistry is now able to provide such quantities as well as much other useful information [15]. However, calculation of absolute free energies of binding is a non-trivial exercise. If the end points for the mutation are taken as the free guest and host, and the bound complex, then $\lambda$ would define a reaction coordinate, $r$, in between. The free energy would be obtained by numerically integrating the potential of mean force, $w(r)$, calculated as the guest approaches from a large separation distance, $r_{max}$, towards the host [46]. The free energy is then given by

$$ \Delta G_{bind} = -k_B T \ln \left[ 4\pi \int_0^{r_{max}} r^2 \exp(-w(r)/k_B T)\,dr \right] \tag{2.24} $$

Not only would many windows be required, but each one would require a tremendous amount of sampling of all the possible orientations and internal degrees of freedom, making this method a large computational effort.

2.3.2 Relative Free Energies of Binding.

There are two ways around this problem. The simpler method is to only attempt to calculate relative free energies of binding by making use of thermodynamic cycles [72].


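The relative free energy of binding gives the preference of one solute to bind over another and is given by Eq. 2.25. It is calculated by making use of the following thermodynamic cycle.

$$
\begin{array}{ccc}
\mathrm{H + A} & \xrightarrow{\;\Delta G^{bind}_{A}\;} & \mathrm{HA} \\
\downarrow {\scriptstyle \Delta G^{mut}_{AB}} & & \downarrow {\scriptstyle \Delta G^{mut}_{AB(H)}} \\
\mathrm{H + B} & \xrightarrow{\;\Delta G^{bind}_{B}\;} & \mathrm{HB}
\end{array}
$$

$$ \Delta\Delta G_{AB} = \Delta G^{bind}_{B} - \Delta G^{bind}_{A} = \Delta G^{mut}_{AB(H)} - \Delta G^{mut}_{AB} \tag{2.25} $$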

It can be seen that the relative free energy may be calculated in two ways, either by calculating binding free energies for A and B, or evaluating free energy changes going from A to B when free and when bound in the host. This second kind of perturbation, while unphysical, is much more practical computationally as it only requires the now much smaller perturbation of A to B. The host does not need to be simulated separately since its properties are constant in isolation. This is the type of calculation used in studying the binding of amino acid derivatives to macrobicycle 12 as described in Chapter 7.

2.3.3 Absolute Free Energies of Binding.

There is, however, another more complex approach that can be used to obtain absolute free energies of binding. It is called double decoupling [3] and is based on the double annihilation technique that was originally proposed to calculate such a quantity [73]. It is a special case of the previous situation except with B now replaced by a molecule completely non-interacting with the rest of the system. Corrections are also included to obtain a proper free energy in the standard state. It makes use of Eq. 2.26 derived from the following thermodynamic cycle.


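$$
\begin{array}{ccc}
\mathrm{A(sol) + H(sol)} & \xrightarrow{\;\Delta G^{bind}_{A}\;} & \mathrm{AH(sol)} \\
\downarrow {\scriptstyle \Delta G^{decoup}_{A}} & \swarrow {\scriptstyle \Delta G^{decoup}_{A(H)}} & \\
\mathrm{A(gas) + H(sol)} & &
\end{array}
$$

$$ \Delta G^{bind}_{A} = \Delta G^{decoup}_{A} - \Delta G^{decoup}_{A(H)} \tag{2.26} $$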

The guest is decoupled from the system twice, once in solution and once when bound to the host. Such mutations are performed by gradually turning off the interactions of the molecule with the rest of the system. It is not strictly an annihilation because the molecule has only been removed from the system and still possesses the degrees of freedom of a gas molecule. Such simulations require more care, especially the one in the host. This is firstly because the perturbation is usually much larger and requires many windows. Secondly, the molecule, when it is barely interacting with the system, is able to access more space and may require a restraint. Thirdly, problems can also arise due to the presence of a singularity at the end point, necessitating the use of other functional forms for the potential such as separation-shifted scaling [49], soft core [74], or alternatively a hard sphere which can then be accounted for theoretically [75]. Fourthly, complications arise in the calculation of $\Delta G^{decoup}_{A(H)}$ due to the definitions of the standard state at each end point. A correction must therefore be used [3].

2.3.4 Previous Studies on Host-Guest Systems.

Free energy calculations on host-guest systems have been in common use for fifteen years now. The earliest studies concentrated on molecular species different in only minor ways, but increasingly different molecules are now being studied. There has been much work on the selectivity of macrocycles for single-atom ions differing only in their radii. Two studies on particular halide ion binders demonstrated the preference for smaller ions, principally because larger ions were too large to fit.76,77 Another calixspherand host showed selectivity towards K⁺ over both Na⁺ and Rb⁺.78

The solvent can play an important role in determining binding. In a study by Cho and Kollman79 on the binding of alkali metal ions to a rigid starand host, it was found that the larger the ion's radius, the more strongly it bound to the starand host. This was despite the reverse preference in the gas phase. The reason for the reversal of order is that the desolvation penalty for larger ions in water was found to be smaller. Mordasini Denti et al. confirmed the preferential binding of pyrene to cyclophane in chloroform over that in water,80 attributing the difference to solvent cavitation rather than to different host-guest interactions.

Different host-guest interactions have been studied as well. Duffy and Jorgensen reproduced the relative binding of quinoxaline, pyrazine and pyridine to Rebek's acridine diacid in chloroform81 and rationalised the difference in terms of additional hydrogen bonds and host flexibility. Kirchhoff et al.82 studied the binding of neopentane and tetramethylammonium ion to cryptophane in water for a range of reasons. They examined the different modes of binding for the two guests and their influence on host flexibility. They also studied how these properties varied when the explicit solvent water was replaced by the Poisson continuum solvent model.83 They observed greater conformational freedom for both guest and host in the continuum solvent than in explicit solvent.

Some studies have also focused on enantioselective binding. Costante-Crassous et al.84 were able to reproduce the correct enantioselective complexation of bromochlorofluoromethane to a chiral cryptophane in chloroform and subsequently assign the correct optical activities. Burger et al.85 calculated the relative free energies of binding to a podand ionophore for l and d amino acid-derived substrates. They then predicted a more enantioselective guest and this was subsequently verified by experiment.

The appropriateness of various methods can also be studied. Senderowitz et al.64 used the same podand ionophore system to test the Jumping Between Wells fast free energy method mentioned in Subsection 2.2.5. Eriksson et al.86 found that the use of Particle Mesh Ewald to treat long-range electrostatic effects was essential to model the binding of iminium and guanidinium organic cations to a negatively charged cyclophane host. Mark et al.87 used free energy studies on the binding of para-substituted phenols to α-cyclodextrin to examine the influence of force field and sampling.82,83 Pitera and Kollman62 studied a range of guests in Rebek's tennis ball host to predict which one bound the most strongly and to test their fast CMC/MD method against the slower TI method.

2.3.5 Free Energies of Solvation.

Free energy methods can also be used to calculate a molecule's free energy of solvation.88-91 This quantity is intrinsically related to a molecule's solubility. When calculated in water, it is commonly referred to as the free energy of hydration. Unlike the calculation of absolute free energies of binding, absolute free energies of solvation require only a single perturbation. This perturbation is the decoupling of the molecule from the solvent environment. The resultant free energy of solvation, $\Delta G^{\mathrm{solv}}$, is given by $-\Delta G^{\mathrm{decoup}}$. Relative free energies of hydration, $\Delta\Delta G^{\mathrm{solv}}_{AB}$, between two different molecules, A and B, may also be calculated. This quantity requires two simulations, one in solvent and one in the gas phase, and is given by Eq. 2.27 using the following thermodynamic cycle.

Gas phase: A → B, $\Delta G^{\mathrm{mut}}_{AB(\mathrm{gas})}$
Solvent: A → B, $\Delta G^{\mathrm{mut}}_{AB(\mathrm{sol})}$
Gas phase → solvent: $\Delta G^{\mathrm{solv}}_{A}$ for A, $\Delta G^{\mathrm{solv}}_{B}$ for B

$$\Delta\Delta G^{\mathrm{solv}}_{AB} = \Delta G^{\mathrm{solv}}_{B} - \Delta G^{\mathrm{solv}}_{A} = \Delta G^{\mathrm{mut}}_{AB(\mathrm{sol})} - \Delta G^{\mathrm{mut}}_{AB(\mathrm{gas})} \tag{2.27}$$

Free energies have to be calculated only for the smaller A to B mutations. If the rigid molecule assumption is made, then even $\Delta G^{\mathrm{mut}}_{AB(\mathrm{gas})}$ does not need to be calculated. The reason for this is that the free energy change calculated in solvent is now not strictly $\Delta G^{\mathrm{mut}}_{AB(\mathrm{sol})}$ but $\Delta G^{\mathrm{mut}}_{AB(\mathrm{sol})}$ with the intramolecular free energy change going from A to B removed. This intramolecular free energy change is the same in the gas phase and in solvent since the molecule is rigid. In other words, it is equal to $\Delta G^{\mathrm{mut}}_{AB(\mathrm{gas})}$. Therefore $\Delta\Delta G^{\mathrm{solv}}_{AB}$ simply equals the free energy change in solvent.

Free energies of solvation are commonly used as tests for force field parameterisations and methods. The wide availability of experimental data on free energies of solvation further enhances the power of this approach.92-95 Indeed, in this work, a new charge derivation method, REPD, is proposed (Chapter 4). Free energies of hydration are calculated using such methods to test how well these charges reproduce experimental behaviour. This work is described in Chapter 5.

2.3.6 Partition Coefficients.

Another quantity of interest that can be addressed by free energy calculations is the partition coefficient. This quantity gives the ratio of concentrations of a solute equilibrated between two solvents. It is possible to obtain this quantity directly using a similar idea to calculating free energies of hydration. The free energy is calculated not for the real physical process of transfer from one solvent to another but rather by decoupling the molecule from each solvent. The following thermodynamic cycle shows that the solvent X/solvent Y partition coefficient, $P^{XY}_{A}$, for molecule A is given by the difference in free energies of solvation using Eq. 2.28.

Solvent X, between A and nothing: $\Delta G^{X}_{A}$
Solvent Y, between A and nothing: $\Delta G^{Y}_{A}$
Solvent X to solvent Y: $\Delta G^{XY}_{A}$ for A, $\Delta G = 0$ for nothing

$$-2.3RT \log(P^{XY}_{A}) = \Delta G^{XY}_{A} = \Delta G^{X}_{A} - \Delta G^{Y}_{A} \tag{2.28}$$

A negative value of $\log(P^{XY}_{A})$ indicates that most of the solute is in solvent Y, with the right hand side of Eq. 2.28 being positive. Partition coefficients provide another route to verification of simulation methodology since there is an abundance of experimental data. However, this comparison is not usually made through the use of absolute partition coefficients, since the calculation of these requires two expensive absolute free energy of solvation calculations. The comparison is made through the calculation of relative partition coefficients.96-98 The relative partition coefficient is defined as $P_{AB} = P_{A}/P_{B}$. Using the following thermodynamic cycle, the relative solvent X/solvent Y partition coefficient, $\log(P^{XY}_{AB})$, between A and B is given by Eq. 2.29.

Solvent X, between A and B: $\Delta G^{X}_{AB}$
Solvent Y, between A and B: $\Delta G^{Y}_{AB}$
Solvent X to solvent Y: $\Delta G^{XY}_{A}$ for A, $\Delta G^{XY}_{B}$ for B

$$-2.3RT \log(P^{XY}_{AB}) = \Delta G^{XY}_{A} - \Delta G^{XY}_{B} = \Delta G^{X}_{AB} - \Delta G^{Y}_{AB} \tag{2.29}$$

The two terms on the right hand side of Eq. 2.29 are nothing more than relative free energies and, by comparison to absolute values, are quite easy to calculate using normal free energy methods. In the absence of any experimental free energies of solvation, verification of simulation methodology by experiment must resort to reproducing the relative partition coefficient. This was the case for a new set of thiourea charges used later in this work (Subsection 3.2.2). This method is not as rigorous as comparing to free energies of solvation since it involves differences, but it can still prove useful.
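The relationship between the absolute and relative routes can be sketched numerically. The sign convention below is an assumption read from the statement that a negative log P corresponds to a positive difference in solvation free energies; the solvation free energies themselves are hypothetical numbers chosen only for illustration.

```python
import math

R = 1.987204e-3  # gas constant in kcal mol^-1 K^-1

def log_p(dg_solv_x, dg_solv_y, temperature=298.15):
    """log P for solvent X / solvent Y from solvation free energies (kcal/mol).

    Assumed convention: -2.303 RT log P = dG(X) - dG(Y).
    """
    return -(dg_solv_x - dg_solv_y) / (math.log(10.0) * R * temperature)

def relative_log_p(dg_x_ab, dg_y_ab, temperature=298.15):
    """log P_AB from relative free energies, each taken as dG(A) - dG(B)."""
    return -(dg_x_ab - dg_y_ab) / (math.log(10.0) * R * temperature)

# Hypothetical solvation free energies in kcal/mol, not data from this work.
solv = {"A": {"X": -3.0, "Y": -5.0}, "B": {"X": -2.0, "Y": -6.0}}

log_p_a = log_p(solv["A"]["X"], solv["A"]["Y"])
log_p_b = log_p(solv["B"]["X"], solv["B"]["Y"])
direct = log_p_a - log_p_b  # log(P_A / P_B) from two absolute values

via_relative = relative_log_p(
    solv["A"]["X"] - solv["B"]["X"],
    solv["A"]["Y"] - solv["B"]["Y"],
)
print(direct, via_relative)
```

Both routes give the same number, but the relative route needs only the two A to B differences rather than four absolute solvation free energies.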

2.4 Conclusion.

The rationale for using computer simulations in this work and how they may be used to study particular systems have been presented. The necessary steps outlined in this chapter are now taken in this work. Firstly, the system is set up and the force field parameterised. Since part of this involves a new parameterisation technique, additional computer simulations are undertaken to test the new parameter set. The MC simulation protocol is expanded to be capable of simulating the macrobicycle 12 system. Finally, the binding free energy protocol is established and calculations on the macrobicycle 12 system are performed and rationalised.

Chapter 3 Setup of the Host-Guest System


This chapter is devoted to all the steps necessary to set up the macrobicycle 12 system for free energy calculations. It first deals with the derivation of parameters missing from the OPLS force field, particularly the charges and dihedrals. It continues by discussing starting structures and how the architecture of the system is constructed. Finally, a number of measures to customise and optimise the MCPRO simulation package for the system are described.

3.1 Force Field

3.1.1 Missing Parameters from the OPLS Force Field.

The force field used in this work to model macrobicycle 12 is the OPLS-AA force field25 (see Subsection 2.1.3 for the functional form of the components). The OPLS parameters come from a range of sources. The bond and angle parameters largely come from AMBER;99 the dihedral parameters are chosen to best reproduce the conformational energy profile generated from structures optimised from HF/6-31G* calculations; the non-bonded parameters are optimised to best reproduce experimental liquid phase properties for a variety of molecules. Such properties include enthalpies of vaporisation, densities, heat capacities and compressibilities.

The OPLS force field covers a wide range of functionalities. However, some parameters were missing for macrobicycle 12 and thus had to be obtained by other means. These include bond, angle, dihedral and non-bonded parameters. The assignment of bond, angle, and non-bonded parameters is fairly straightforward if new functionalities are similar to pre-existing functionalities, due to the transferable nature of the OPLS force field. Dihedral parameters, on the other hand, are much more sensitive and must be derived separately for any new dihedrals (Section 3.3). The thiourea group is moderately similar to the urea functionality, which is included in the OPLS force field. However, it was felt that the non-bonded charge parameters important in host-guest binding should be derived specifically for thiourea. The derivation of these parameters is described later in Section 3.2. Table 3.1 contains a summary of all these required parameters and the values derived for them in this work. The locations of the new parameter types in macrobicycle 12 are shown in Figure 3.1. Figure 3.2 contains the location of the new parameter types in the guest.

Table 3.1: Parameters Derived or Assigned for Macrobicycle 12 Not Supplied by the OPLS Force Field.

Bond Parameters
Bond          K / kcal mol⁻¹ Å⁻²     r_eq / Å
B1   C-S                             1.666
B2   C-N                             1.335

Angle Parameters
Angle         K / kcal mol⁻¹ rad⁻²   θ_eq / degree
A1   S-C-N                           122.7
A2   N-C-N                           114.6
A3   C-N-H                           119.8
A4   C-N-C                           121.9
A5   C-C-N    63                     110.1

Dihedral Parameters
Dihedral           V1 / kcal mol⁻¹   V2 / kcal mol⁻¹   V3 / kcal mol⁻¹
D1    C-N-C-S       0.0               2.907             0.0
D2    C-N-C-N      -1.389             2.907             0.0
D3    H-N-C-S       0.0               2.907             0.0
D4    H-N-C-N       0.0               2.907             0.0
D5    C-N-C-C       1.142            -1.579             0.271
D6    C-C-C-C       0.0               0.258             0.0
D7    N-C-C-C       0.0               0.245             0.0
D8    C-N-C-C      -0.930            -1.706             0.0
D9    C-C-N-C       2.166             6.089             0.0
D10   C-N-C-C (a)  -7.340             5.188            -1.974
D11   O-C-C-N       0.0               2.649             0.0

Non-bonded Parameters
Atom         q         σ / Å     ε / kcal mol⁻¹
1      C     0.110     3.500     0.066
2      C     0.140     3.500     0.066
3      C     0.191     3.750     0.105
4      S    -0.385     3.550     0.250
5      N    -0.290     3.250     0.170
6      H     0.272     0.000     0.000
7      C    -0.051     3.500     0.066
8      H     0.083     2.500     0.030
9 (b)  C    -0.020     3.500     0.066
10 (c) C     0.040     3.500     0.066

(a) The last atom is the carboxylate carbon. (b) N-Ac-glycine. (c) N-Ac-alanine, N-Ac-phenylalanine.

Figure 3.1: The location of the new parameter types in macrobicycle 12. The parameters are listed in Table 3.1. Only a cross-section of the host is shown.

3.1.2 Transferable Parameters.

New atom labels were created for the sulfur and the centre carbon of the thiourea unit. This meant defining new parameters for all bonds and angles containing these atoms. New atom labels could have been assigned to other atoms of the thiourea moiety, but this would have required the definition of many more new bond and angle types very similar to pre-existing types for urea. Therefore, reference bond lengths were required only for the C-S and C-N bonds. These values were taken from an MP2/6-31+G* optimised structure of N,N′-dimethylthiourea in the Z,Z conformation, the one present in the host. This geometry was chosen for reasons discussed further on in Section 3.2. Bond force constants were not necessary since these degrees of freedom were not sampled in free energy calculations, for reasons discussed later in Subsection 6.3.3.

Reference angle values for thiourea were also required. The reference angles for S-C-N and N-C-N were obtained from the same MP2/6-31+G* structure of N,N′-dimethylthiourea, while the remaining A3 and A4 angles with nitrogen at their apex were taken as the equivalent urea OPLS parameters. No new angle bending force constants were required for the thiourea moiety, again due to the angles not being sampled. However, the A5 C-C-N parameters elsewhere in macrobicycle 12 were taken to be the same as another chemically very similar angle already defined in the force field.

A number of new atom types had to be defined, each requiring non-bonded parameters. Some new types were necessary since they bridge two different functionalities. This was the case for atoms 1, 2, 9 and 10 in Table 3.1. In these cases the same Lennard-Jones parameters were used as for the atom in the unbridged case and the charges were assumed to be additive in order to preserve overall charge. For example, atom 1 is the carbon connecting the two aromatic rings. It was assigned a charge totalling two toluene carbon atoms and four hydrogens. Non-bonded Lennard-Jones parameters were also needed for atoms 3-8 of the thiourea group. For atoms 3-6, the same Lennard-Jones parameters were taken as for urea, with the exception of the sulfur, for which the OPLS sulfide parameters were used. For atoms 7 and 8, standard hydrocarbon parameters were adopted. The parameters for the amino acid derivative carbon are either those for atom 9 or 10 depending on the side group.

Figure 3.2: The location of the new parameter types in the amino acid derivatives. The parameters for the highlighted carbon depend on the side group, X.

Figure 3.3: The N,N′-diethylthiourea molecule used to parameterise REPD charges for macrobicycle 12. Charges were derived only for the atoms enclosed in the box.

3.2 Charge Derivation.

3.2.1 REPD Charges.

The REPD/6-31+G* charges,17 discussed in Chapter 4, were used for thiourea. The principal motivation for the REPD method is the need to produce charges for this part of macrobicycle 12. REPD charges are designed to replicate OPLS charges while at the same time being easily derivable. It was decided that new charges should be derived as far as the first carbon away from the thiourea. Therefore, the charges were derived on the molecule N,N′-diethylthiourea pictured in Figure 3.3. The end methyl groups had their charges constrained to OPLS values to further reduce any remaining discontinuity between the charge sets. Reference bond and angle values listed in Table 3.1 were used together with the minimum energy conformation most commonly observed in macrobicycle 12. By parameterising to the predominant conformation, the conformational dependence of charges discussed later in Subsection 4.3.4 is minimised. The charges derived in this way for the thiourea moiety in macrobicycle 12 are presented in Table 3.1 as atoms 3-8.
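A simple consistency check on the charges for atoms 3-8 is that the thiourea fragment carries no net charge. The sketch below assumes the fragment is the thiourea unit flanked by two CH2 groups, so that there is one C and one S, two N, two N-H hydrogens, two flanking carbons and four flanking hydrogens; these atom counts are an assumption read from the figures rather than stated in the text.

```python
# Charges for atoms 3-8 of Table 3.1 (in units of e).
charges = {"C3": 0.191, "S4": -0.385, "N5": -0.290,
           "H6": 0.272, "C7": -0.051, "H8": 0.083}

# Assumed number of each atom type in the fragment S=C(NH-CH2-)2.
counts = {"C3": 1, "S4": 1, "N5": 2, "H6": 2, "C7": 2, "H8": 4}

total_charge = sum(charges[name] * counts[name] for name in charges)
print(f"net charge of thiourea fragment: {total_charge:+.3f} e")
```

Under this assumed stoichiometry the charges sum to zero, so the fragment can be joined to the rest of the host without altering the overall charge.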


3.2.2 Relative Partition Coefficients.

Considerable effort was spent in trying to reproduce some experimental behaviour for these charges to validate their use. The only useful experimental property that could be found was the chloroform-water partition coefficient.100 To avoid full free energy of solvation calculations, the relative partition coefficient was calculated96-98 with a similar molecule, acetamide. A full description of the use of relative partition coefficients is given in Subsection 2.3.6. Let T and A represent thiourea and acetamide respectively, and C and W represent chloroform and water. This allows a relative chloroform/water partition coefficient, $\log(P^{CW}_{TA})$, to be calculated as follows.

Chloroform, between thiourea and acetamide: $\Delta G^{C}_{TA}$
Water, between thiourea and acetamide: $\Delta G^{W}_{TA}$
Chloroform to water: $\Delta G^{CW}_{T}$ for thiourea, $\Delta G^{CW}_{A}$ for acetamide

$$-2.3RT \log(P^{CW}_{TA}) = \Delta G^{CW}_{T} - \Delta G^{CW}_{A} = \Delta G^{C}_{TA} - \Delta G^{W}_{TA} \tag{3.1}$$

3.2.3 Free Energy Protocol.

Full details of the FEP protocol can be found in Section 5.1. The protocol described here contains only essential points relating to the mutations in this system. These points relate to the solvent box, number of configurations, MC moves and spacing of windows. The mutations in water were performed in a cubic box of side 25 Å containing 505 TIP4P101 water molecules, while the chloroform box was of side 33 Å and contained 264 chloroform molecules.97 Equilibrium configurations were generated in the NPT ensemble at 25 °C and 1 atm using the MC Metropolis algorithm. There were 3 million (M) configurations of equilibration and 5 M of data collection per window. Maximum move sizes for solute translations and rotations were selected to be 0.15 Å and 15°. The maximum volume move sizes were set to 250 Å³ in water and 390 Å³ in chloroform. Mutations were performed in both directions using 11 windows spaced at Δλ = 0.1 intervals. Figure 3.4 illustrates the mutation performed.

Figure 3.4: The change in geometry for mutating thiourea to acetamide.
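The box sizes and window spacing quoted above can be checked with a few lines of arithmetic. This is only a sketch: the molar masses are standard values supplied here, the quoted box sides are treated as fixed even though the volume fluctuates in the NPT ensemble, and the window endpoints of 0 and 1 are an assumption.

```python
AVOGADRO = 6.02214076e23  # mol^-1

def density_g_per_cm3(n_molecules, molar_mass, box_side_angstrom):
    """Mass density implied by n molecules in a cubic box of the given side."""
    volume_cm3 = (box_side_angstrom * 1.0e-8) ** 3
    return n_molecules * molar_mass / AVOGADRO / volume_cm3

# Molar masses in g/mol are assumptions supplied for this check.
water_density = density_g_per_cm3(505, 18.015, 25.0)
chloroform_density = density_g_per_cm3(264, 119.38, 33.0)
print(f"water: {water_density:.3f} g/cm3, chloroform: {chloroform_density:.3f} g/cm3")

# Eleven windows at a spacing of 0.1 (endpoints 0 and 1 assumed).
windows = [round(0.1 * i, 1) for i in range(11)]

# Configurations per direction: equilibration plus data collection per window.
configs_per_direction = len(windows) * (3_000_000 + 5_000_000)
print(windows, configs_per_direction)
```

The implied densities, about 0.97 and 1.46 g cm⁻³, are close to those of the liquids at room temperature, and each direction of the mutation amounts to 88 M configurations per solvent.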

3.2.4 Effect of Basis Set, Ab Initio Method and Geometry.

Experimentally, $P_{A} = 0.01$ and $P_{T} = 0.00079$,100 which gives $\log(P_{TA}) = -1.10$. The initial calculations were performed using EPD/6-31G* charges and 6-31G* optimised geometries. The EPD/6-31G* charges used are given in Table 3.2. The individual free energies obtained, $\Delta G^{w}_{TA} = 5.92$ kcal mol⁻¹ and $\Delta G^{c}_{TA} = 2.30$ kcal mol⁻¹, gave $\log(P_{TA}) = -2.66$, as shown in Table 3.3. The free energy results were rather disappointing. Such a large negative value was attributed to the increase in dipole moment and polar hydrogen charges for thiourea and was the spur for developing the REPD method. A range of other charge sets and geometries for thiourea were therefore studied.

With the development of the REPD method, the calculation was repeated with the REPD/6-31+G* charges and geometry. The result was improved at $-2.04$ but still too negative. Much of the problem seemed to lie in the large dipole moment, since a solution phase (dioxane) experimental thiourea value is only 4.89 D.102 An assumption that had always been made up to this point was that the HF/6-31+G* geometry was adequate for all compounds. However, such an assumption may be breaking down for thiourea with its large row 3 sulfur atom. The system may be over-polarised. Many combinations of larger basis sets and better methods were therefore tested, but the main improvement was obtained using an MP2/6-31+G* geometry. The principal difference was that this shortened the C-S bond length from 1.683 to 1.661 Å, lowering the dipole moment from 6.34 to 6.01 D. This led to another improvement in $\log(P_{TA})$, now giving $-1.61$.

Another possible variation in geometry lay in the treatment of the NH2 groups. For urea it has been observed that three possible geometries exist since the NH2 groups are not strictly planar and pyramidalise slightly. Transoid urea has one NH2 group facing up and one facing down, cissoid has both NH2 groups facing the same way, while planar ignores this effect and averages the hydrogen geometries, as is assumed for standard force fields. All three geometries are pictured in Figure 3.5. The cissoid geometry did not appear to exist for thiourea. However, a transoid thiourea geometry did exist. Compared to the planar geometry, it had a slightly lower energy and gave a much lower dipole moment of 5.12 D, as seen in Table 3.2. $\log(P_{TA})$ became $-0.39$, more positive now than the experimental value. Therefore the geometry that would give the best result would in theory lie between planar and transoid.

The need to use MP2 geometries was then questioned, since the high polarity could be solely due to the use of a planar geometry. However, an HF/6-31+G* transoid calculation on thiourea gave a dipole moment of 6.28 D, still far too large. Hence the MP2 geometry is still necessary. Given that $\log(P_{TA})$ was so sensitive to such a small change in geometry and that two plausible geometries bounded the experimental result, it was decided that the best geometry would be a planar MP2/6-31+G* geometry, consistent with all other force field assumptions. Hence this was the geometry used to derive charges for the N,N′-dimethylthiourea moiety as given in Table 3.1.

Table 3.2: Charges and Dipole Moments for Thiourea and Acetamide Using a Range of Basis Sets and Geometries.

thiourea
Charge set           C       S        N        H(syn)   H(anti)   μ / D
EPD/6-31G* (a)       0.452   -0.502   -0.746   0.366    0.406     6.39
REPD/6-31+G* (b)     0.075   -0.431   -0.418   0.284    0.312     6.34
REPD/6-31+G* (c)     0.102   -0.404   -0.458   0.290    0.319     6.01
REPD/6-31+G* (d)     0.326   -0.416   -0.616   0.312    0.349     5.12

acetamide
Charge set           C       O        N        C        H(syn)   H(anti)   H_C     μ / D
EPD/6-31G* (a)       0.978   -0.649   -1.099   -0.559   0.447    0.432     0.150   3.92
REPD/6-31+G* (b)     0.528   -0.570   -0.666   -0.011   0.331    0.301     0.029   4.26

(a) HF/6-31G* geometry. (b) HF/6-31+G* geometry. (c) MP2/6-31+G* planar geometry. (d) MP2/6-31+G* transoid geometry.

Table 3.3: $\Delta G_{TA}$ in Water and Chloroform Using Charge Sets Derived with Various Basis Sets and Geometries.

thiourea             acetamide            ΔG^w_TA   ΔG^c_TA   log(P_TA)
EPD/6-31G* (a)       EPD/6-31G* (a)       5.92      2.30      -2.66
REPD/6-31+G* (b)     REPD/6-31+G* (b)     3.79      1.01      -2.04
REPD/6-31+G* (c)     REPD/6-31+G* (b)     2.97      0.77      -1.61
REPD/6-31+G* (d)     REPD/6-31+G* (b)     0.45      -0.08     -0.39

(a) HF/6-31G* geometry. (b) HF/6-31+G* geometry. (c) MP2/6-31+G* planar geometry. (d) MP2/6-31+G* transoid geometry.

Figure 3.5: The transoid, cissoid and planar geometries for urea.
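The numbers quoted in this subsection can be cross-checked with a short calculation. The sketch assumes that the tabulated free energies are thiourea to acetamide mutation free energies, so that each equals the solvation free energy of acetamide minus that of thiourea plus a solvent-independent term; with solvation free energies related to log P as in Subsection 2.3.6, this gives log P_TA = (ΔG^c − ΔG^w)/(2.303RT). The temperature of 298.15 K and this reading of the subscript TA are assumptions.

```python
import math

R = 1.987204e-3   # gas constant in kcal mol^-1 K^-1
T = 298.15        # 25 C in kelvin
RT_LN10 = math.log(10.0) * R * T  # about 1.36 kcal/mol per log unit

# Experimental relative partition coefficient from the two absolute values.
experimental = math.log10(0.00079 / 0.01)

# (thiourea charge set, dG in water, dG in chloroform, tabulated log P_TA)
rows = [
    ("EPD/6-31G*, HF geometry", 5.92, 2.30, -2.66),
    ("REPD/6-31+G*, HF geometry", 3.79, 1.01, -2.04),
    ("REPD/6-31+G*, MP2 planar", 2.97, 0.77, -1.61),
    ("REPD/6-31+G*, MP2 transoid", 0.45, -0.08, -0.39),
]

recomputed = []
for name, dg_water, dg_chloroform, tabulated in rows:
    value = (dg_chloroform - dg_water) / RT_LN10
    recomputed.append(value)
    error_kcal = (value - experimental) * RT_LN10  # deviation as a free energy
    print(f"{name}: log P = {value:.2f} (table {tabulated}), "
          f"error vs experiment = {error_kcal:+.2f} kcal/mol")
```

Each tabulated log P is recovered to within rounding, and the experimental value of about −1.10 falls between the planar and transoid MP2 results, as described above.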

3.3 Dihedral Parameterisation.

3.3.1 Calculation of Ab Initio Energy Profile.

As mentioned in Subsection 3.1.1, dihedral parameters had to be derived for some of the dihedrals in the host and guests, as listed in Table 3.1. These dihedrals were unusual, either because they lay at junctions between different functional geometries or because they involved thiourea. There was an OPLS parameter for D9, but it was felt that this important dihedral determining the cis-trans energy difference should be reparameterised explicitly. The prototype amino acid derivative used here was N-Ac-alanine.

Figure 3.6: The five molecules used for parameterising the dihedral parameters in macrobicycle 12 and the amino acid derivative. (Molecules shown: N,N′-dimethylthiourea, N,N′-ethylmethylthiourea, diaryl methane, N-benzylacetamide and N-Ac-alanine, with dihedrals D1 to D11 marked.)

In the parameterisation process a neutral fragment molecule was selected that was large enough to enclose not only the dihedral in question but all local functionality, to minimise truncation effects. The ends of this molecule were typically capped by hydrogens or methyl groups. The guest was small enough to be treated as a whole. The five molecules used for this parameterisation are pictured in Figure 3.6.

A conformational energy profile for each dihedral was constructed by calculating the energy of an ab initio optimised structure at a few important values of dihedral angle.26 These points correspond to maxima and minima. The ab initio calculations were done using Gaussian 94.103 The energy minima were calculated by optimising the molecule at the HF/6-31G* level, while the maxima were obtained by finding the transition states. Where possible, symmetry was used to reduce the number of ab initio calculations. The transition state calculation was often problematic and usually had to be performed in two stages, by first optimising with the dihedral constrained to an intuitive value near the transition state, and then using the program's transition state search option. The presence of minima and transition states was confirmed by normal mode analyses at these geometries to check the local potential energy curvature. The test is that the frequencies of all normal modes for minima should be real, while for transition states exactly one should be imaginary.

While the most accurate results are not expected for this level of theory and basis set, they are practical for calculations on up to 25 atoms, and in any case, the error will be substantially reduced since only energy differences are required. However, even energy differences have been shown to change with ab initio method. For example, previous ab initio calculations on N,N′-dimethylthiourea104 reveal a 0.8 kcal mol⁻¹ change in the energy difference between the two minima on going from HF/6-31G* to the higher level MP2/6-31G*.
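The normal mode test described above amounts to counting imaginary frequencies. A minimal sketch follows, assuming the common output convention in which imaginary frequencies are reported as negative numbers; the frequency lists are hypothetical and are not results from this work.

```python
def classify_stationary_point(frequencies_cm1):
    """Classify a stationary point from its normal mode frequencies.

    Imaginary frequencies are assumed to be encoded as negative values.
    """
    n_imaginary = sum(1 for f in frequencies_cm1 if f < 0.0)
    if n_imaginary == 0:
        return "minimum"
    if n_imaginary == 1:
        return "transition state"
    return "higher-order saddle point"

# Hypothetical frequencies in cm^-1, for illustration only.
print(classify_stationary_point([112.4, 310.9, 1650.2]))
print(classify_stationary_point([-245.7, 98.1, 1702.5]))
print(classify_stationary_point([-310.0, -55.2, 1400.0]))
```

A structure with more than one imaginary frequency is neither a minimum nor the transition state for rotation about a single dihedral, so it would need to be reoptimised.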

3.3.2 Fitting a Fourier Series to the Energy Profile.

A force field calculation, called dihedral driving, was then performed for the dihedral. This was done using BOSS.105 At a range of dihedral angles from 0° to 360° the energy due to the rest of the force field was calculated by optimising the whole structure with the dihedral constrained to the desired value. This energy is made up of changes in non-bonded, known dihedral, angle, and bond stretching contributions. This creates a second energy profile. The difference between these profiles should be reproduced by the dihedral parameters to be derived. The three-term Fourier series given in Eq. 2.4 was fitted to this difference to produce new Fourier coefficients, $V_{ij}$. While including three terms would always give the best fit, in all but two cases it was clear from the energy profile that only one or two terms were adequate.

The resulting Fourier series is due to all unknown 1-4 dihedral atom pairs about the central bond. Hence the Fourier coefficients had to be partitioned into components due to each pair. This was done according to the following rules based on common OPLS practice. Firstly, any terms involving hydrogen except those in amide or sulfonamide bonds were assumed to be small enough to be set to zero. This assumption significantly reduced the number of terms in most cases, leaving only D3 and D4 as hydrogen-containing dihedrals. Bearing in mind that known dihedrals do not contribute to the fitted coefficients, D5, D9 and D10 could be assigned the whole values of the coefficients. For the remainder, if a non-zero $V_{i1}$ coefficient was present, it was placed entirely in the main chain term. This can be seen for D2, while D1, D3 and D4 only contain a $V_{i2}$ coefficient. Otherwise, the coefficients were split evenly between all components. This was the case with the $V_{i2}$ terms for D1, D2, D3, D4, D6, D7 and D11. The resulting fits to ab initio are shown in Figure 3.7.

Figure 3.7: The conformation energy profiles for the parameterised dihedrals. Note the different energy scales. (Panels: D1, D2, D3, D4; D5; D6; D7; D8; D9; D10; D11. Axes: energy / kcal mol⁻¹ against dihedral angle / degrees, from 0 to 360.)
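The fitting step can be sketched as a linear least-squares problem. Eq. 2.4 lies outside this section, so the usual OPLS form V(φ) = V1/2 (1 + cos φ) + V2/2 (1 − cos 2φ) + V3/2 (1 + cos 3φ) is assumed here; the energy profiles are synthetic, and the function names are hypothetical rather than taken from any package used in this work.

```python
import numpy as np

def fourier_basis(phi_deg):
    """Columns of the assumed three-term OPLS-style Fourier series."""
    phi = np.radians(np.asarray(phi_deg, dtype=float))
    return np.column_stack([
        0.5 * (1.0 + np.cos(phi)),
        0.5 * (1.0 - np.cos(2.0 * phi)),
        0.5 * (1.0 + np.cos(3.0 * phi)),
    ])

def fit_fourier(phi_deg, e_ab_initio, e_force_field):
    """Least-squares V1, V2, V3 reproducing the difference of two profiles."""
    difference = np.asarray(e_ab_initio) - np.asarray(e_force_field)
    coeffs, *_ = np.linalg.lstsq(fourier_basis(phi_deg), difference, rcond=None)
    return coeffs

# Synthetic example: build a target profile from known coefficients
# (the D5 values of Table 3.1) and check that the fit recovers them.
phi = np.arange(0.0, 360.0, 30.0)
known = np.array([1.142, -1.579, 0.271])
e_rest = 0.3 * np.cos(np.radians(phi))       # stand-in for the driven profile
e_target = e_rest + fourier_basis(phi) @ known

fitted = fit_fourier(phi, e_target, e_rest)
print(np.round(fitted, 3))
```

In practice only a few ab initio points are available, so fewer terms may be fitted by dropping columns from the basis, in line with the observation that one or two terms were usually adequate.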

3.3.3 Parameterisation Complications.

There were a number of additional complications. Firstly, all molecules in this study contained not one but two or more unknown dihedrals. Consequently, iterative parameterisation was necessary, fitting one and then applying to the other, and vice versa, until they all gave consistent reproduction of their individual energy profiles with the same parameters.

The second point concerned a difference in the treatment of degrees of freedom by ab initio calculations and the force field. The ab initio calculations were performed using fully flexible geometries to produce realistic energies. However, for the force field dihedral driving to be consistent with the force field used in the simulations, the geometry was constrained, such as forcing aromatic rings to be rigid. Dihedrals derived in this way would ensure that the force field would reproduce ab initio energy barriers correctly. The necessity for using fully relaxed ab initio geometries was nowhere more evident than in preliminary calculations on thiourea. The barrier to rotation of an NH2 group was 10.5 kcal mol⁻¹ if a fully flexible geometry was used, but rose massively to 18.8 kcal mol⁻¹ if a planar, force field-type geometry was enforced on the NH2 group.

A third complication was the degree to which the Fourier series could be made to fit the ab initio points. The priorities of the dihedral parameters were that they reproduce well the ab initio energy minima, followed next by the energy barriers. Therefore a number of approximations were necessary to achieve this. For symmetric profiles, if the ab initio points differed by a few degrees from a critical value for a particular Fourier term, such as 0° or 120°, then the points were moved to the nearest critical angle, but never to a different energy. This ensured that the energy barrier remained the same. For example, it was common to replace a closely spaced double minimum by one minimum at the centre. If the fit was still very bad, then the points that were considered the least important were discarded altogether from the fit. This was often the case for highly irregular energy profiles such as D10. Priority for retaining points was given to minima over transition state values, as can be seen for D6, for example.

A fourth complication arose from multiple minima due to conformational freedom in other dihedrals in the molecule. Where possible the remainder of the molecule was placed in the minimum energy conformation. However, for the D7 dihedral, two closely spaced energy profiles existed. One had the D8 dihedral at 90° and the other at 270°. The different ab initio points are evident in Figure 3.7. Which of these two profiles was lower depended on the value of D7. In this case, the overall profile was taken as the minimum of both profiles, producing cusps in the energy profiles, and the fit was made at the average value of the two angles from each profile.

3.4 Structural Setup.

3.4.1 The Z-matrix.

The original geometry for macrobicycle 12 was taken from the pdb le used in the previous modelling of this molecule.6 Since this work used a united atom model, hydrogen atoms were added to make the model all-atom. All parameter types and geometries were assigned values either from OPLSAA or from the derived parameters in Table 3.1. Finally, the coordinates were converted into OPLS Z-matrix format. Z-matrix construction concerns not simply the initial structure at the start of a simulation. The le contains information about the molecules geometry, parameter types to be used, how residues are assigned, degrees of freedom to be sampled and other information particular to the molecule. Therefore, the way the Z-matrix is dened is vital to the success of the whole simulation, particularly with regards to residue denitions and designation of degrees of freedom, both of which play a major role in the eciency of MC sampling and computations. The fundamental principle of the Z-matrix is that the coordinates of all but the rst three atoms are dened by a bond, angle and dihedral to three other atoms already dened in the Z-matrix. An example Z-matrix is illustrated in Figure 3.8.
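[Structure diagram of methanol with atoms labelled C1, O2, H3, H4, H5 and H6, annotated with the bond lengths and angles tabulated below.]

Number  Atom  Bond to  Bond   Angle to  Angle  Dihedral to  Dihedral
1       C
2       O     1        1.364
3       H     2        0.945  1         108.5
4       H     1        1.090  2         109.5  3            180.0
5       H     1        1.090  2         109.5  4            120.0
6       H     1        1.090  2         109.5  4            -120.0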

Figure 3.8: Structure and Z-matrix for methanol.


Note that the second atom only contains a bond, while the third contains a bond and an angle. Since all other atoms depend on the first three atoms, these three atoms may be used to define whole molecule rotation and translation moves. This approach is quite different to cartesian coordinates, for which every atom is defined independently by x, y and z coordinates. Z-matrix coordinates defined in this way are therefore ideal candidates for MC moves, since they allow the direct alteration of a bond length, angle or dihedral while minimising the distortion in other internal coordinates. Large moves can thus be made for a low energy cost.

There are many possible ways to define a Z-matrix. The choice made determines which particular bonds, angles and dihedrals are sampled in MC moves. The main criterion used to determine whether bonds, angles and dihedrals are explicitly defined for sampling in the Z-matrix is the likelihood that they will vary much during the simulation with minimal increase in the energy. It is usually beneficial to define explicitly in the Z-matrix those coordinates that can vary the most. On the other hand, it may be just as worthwhile defining explicitly those that are not expected to vary. This allows degrees of freedom to be constrained, which is particularly useful for bonds.

The strength and weakness of the Z-matrix is that all other atoms defined with respect to a moving atom will move with it. Thus parts of a molecule can move together over quite large displacements, a more efficient process than each one moving individually. However, it can lead to quite large displacements for atoms distantly connected to the moving atom. In condensed phases or ring systems, large displacements typically lead to large energies and subsequent move rejection. For larger molecules, the means of achieving a compromise between these competing effects is to use residues.
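As an illustration of how a Z-matrix maps onto cartesian coordinates, the following minimal Python sketch rebuilds the methanol geometry of Figure 3.8 and then changes a single dihedral. The function names (place, build, dihedral) and the convention of putting atom 1 at the origin, atom 2 on the x axis and atom 3 in the xy plane are assumptions made for this example; they are not taken from MCPRO or BOSS.

```python
import math

def sub(a, b):
    return [a[0] - b[0], a[1] - b[1], a[2] - b[2]]

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def unit(a):
    n = math.sqrt(dot(a, a))
    return [x / n for x in a]

def place(a, b, c, r, theta, phi):
    """New atom bonded to a (length r), angle theta to b, dihedral phi to c (degrees)."""
    t, p = math.radians(theta), math.radians(phi)
    ba = unit(sub(a, b))                 # axis pointing from b to a
    n = unit(cross(sub(b, c), ba))       # normal to the c-b-a plane
    m = cross(n, ba)                     # in-plane direction perpendicular to the axis
    d = [-r * math.cos(t), r * math.sin(t) * math.cos(p), r * math.sin(t) * math.sin(p)]
    return [a[i] + d[0] * ba[i] + d[1] * m[i] + d[2] * n[i] for i in range(3)]

def build(zmat):
    """zmat rows: (bond_ref, r, angle_ref, theta, dihedral_ref, phi), 1-based references."""
    xyz = []
    for k, (ib, r, ia, theta, idh, phi) in enumerate(zmat):
        if k == 0:
            xyz.append([0.0, 0.0, 0.0])          # assumed: first atom at the origin
        elif k == 1:
            xyz.append([r, 0.0, 0.0])            # assumed: second atom on the x axis
        elif k == 2:
            a, b = xyz[ib - 1], xyz[ia - 1]
            u = unit(sub(b, a))
            t = math.radians(theta)
            xyz.append([a[0] + r * math.cos(t) * u[0], r * math.sin(t), 0.0])
        else:
            xyz.append(place(xyz[ib - 1], xyz[ia - 1], xyz[idh - 1], r, theta, phi))
    return xyz

def distance(p, q):
    d = sub(p, q)
    return math.sqrt(dot(d, d))

def dihedral(p0, p1, p2, p3):
    """Signed torsion angle p0-p1-p2-p3 in degrees."""
    b1, b2, b3 = sub(p1, p0), sub(p2, p1), sub(p3, p2)
    n2 = cross(b2, b3)
    return math.degrees(math.atan2(math.sqrt(dot(b2, b2)) * dot(b1, n2),
                                   dot(cross(b1, b2), n2)))

methanol = [
    (0, 0.0, 0, 0.0, 0, 0.0),           # 1 C
    (1, 1.364, 0, 0.0, 0, 0.0),         # 2 O
    (2, 0.945, 1, 108.5, 0, 0.0),       # 3 H (hydroxyl)
    (1, 1.090, 2, 109.5, 3, 180.0),     # 4 H
    (1, 1.090, 2, 109.5, 4, 120.0),     # 5 H, dihedral defined relative to atom 4
    (1, 1.090, 2, 109.5, 4, -120.0),    # 6 H, dihedral defined relative to atom 4
]
before = build(methanol)

# Change only the dihedral of atom 4; atoms 5 and 6 follow because they reference it.
rotated = list(methanol)
rotated[3] = (1, 1.090, 2, 109.5, 3, 120.0)
after = build(rotated)
print("H5 displacement:", round(distance(before[4], after[4]), 3))
print("C-H5 bond after move:", round(distance(after[0], after[4]), 3))
```

Because atoms 5 and 6 are defined relative to atom 4, altering one number rotates the whole methyl group rigidly: the bond lengths and angles are untouched, which is the property that makes such coordinates attractive for MC moves.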

3.4.2 Residue Definitions.

The partitioning of a molecule into residues for MC simulations is traditionally done for proteins to allow only small parts of the protein to be moved at a time. It involves effectively breaking up a large molecule into smaller molecules, even if they are still connected by covalent bonds. In a Z-matrix, this is done by defining atoms in a


residue independently of atoms in all other residues. MC moves now only adjust the variable degrees of freedom in one residue at a time, reducing the scope of the move.

There are two primary advantages of the residue partitioning. Firstly, since residues are smaller than the whole parent molecule, the tradeoff between larger moves and high MC acceptance rates is more favourable. In other words, moves of significant size for small residues have a better chance of being accepted than large moves of the whole molecule. If a move involving the whole molecule were to be attempted, it would either have to be tiny or else never be accepted, and tiny moves are ineffective in sampling configuration space quickly and extensively. This is particularly important when taking into account computational considerations since moves of only a small number of atoms are much cheaper to perform.

The second benefit of residues is that they ensure that only a small number of atoms are defined with respect to any other atom and that these atoms are spatially close. Thus if any atom in the residue were moved, there would only be a minimal change in the coordinates of other dependent atoms. This further improves the acceptance probability for a move. For example, consider the molecule octane with carbons numbered starting from one end. If all the dihedrals are altered, the other end is likely to move a long way from its original position in cartesian space and will most probably collide with another molecule, dramatically reducing the chances that the move would be accepted. On the other hand, if the molecule is broken up into two butane units, this problem would be reduced. The downside to this is that the connecting bond between the butane units can become strained and the sampling of the dihedral and angles about this bond can be reduced since they are no longer defined explicitly in the Z-matrix.

3.4.3 Residues and Their Application to MC Moves.

This same philosophy was adopted for macrobicycle 12. While there are no natural dividing points such as the amide bonds for proteins, it was possible to break up the host into nine roughly equally sized residues. These can be categorised as one N,N′-dimethylthiourea (thiourea, for short), two hydrocarbon, two amide, two benzamide, and two toluene residues. This partitioning is shown in Figure 3.9. The guest was taken as a single residue. Atoms within each residue were defined in the Z-matrix according to the criteria described earlier.

Figure 3.9: The nine residues of macrobicycle 12. Identical residues are shown in the same colour.

So that all residues could be defined separately, three dummy atoms were placed at the centre of the host as the first three atoms in the Z-matrix. The atoms in each residue were then defined exclusively with respect to these three dummy atoms. The first atom of each residue, the residue anchor, was taken as the one closest to the centre of the residue to minimise large amplitude displacements of distant atoms. The bonds and angles relative to the three centre dummy atoms were assigned a force constant of zero. It was decided that all bonds would be fixed at their reference values, with the exception of the bonds connecting the centre three dummy atoms to the residue anchors, and the residue-connecting bonds, which are sampled indirectly. All aromatic rings would be held rigid and planar. The thiourea unit was also made to be rigid and planar for reasons that will be discussed in Section 6.3. A dummy atom was placed at the intersection of the two C-N bonds of thiourea, as will be explained in Section 6.3. All


other angles and dihedrals were allowed to vary and so were defined in the Z-matrix so as to maximise the chance that they could change significantly. This is particularly important for dihedrals. If the dihedral for a particular atom changes, all attached atoms should be defined to follow it. This allows methylene, aryl, and planar groups to move as complete units without distortion.

Having defined the residues, it is now important to make clear exactly what the MC moves are. A host residue move involves the random selection of a residue, followed by the random change of all variable degrees of freedom in that residue. A regular solute move involves random changes in all variable degrees of freedom in the guest as well as some random translation and rotation of the whole molecule.
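The host residue move described above can be sketched as follows. This is a minimal illustration only: the toy torsional energy, the value of kT, the representation of each residue as a list of dihedral angles and the name residue_move are all assumptions made for the example and do not reproduce the MCPRO implementation.

```python
import math
import random

KT = 0.593  # kcal/mol, roughly kT at 298 K (assumed units)

def toy_energy(residues):
    """Placeholder threefold torsional energy standing in for the force field."""
    return sum(1.5 * (1.0 + math.cos(3.0 * math.radians(phi)))
               for res in residues for phi in res)

def residue_move(residues, max_amp, rng, energy=toy_energy, kt=KT):
    """Attempt one residue move; returns (state, accepted, residue_index)."""
    i = rng.randrange(len(residues))                 # pick one residue at random
    trial = [list(res) for res in residues]
    # Change every variable degree of freedom in the chosen residue only.
    trial[i] = [phi + rng.uniform(-max_amp, max_amp) for phi in trial[i]]
    d_e = energy(trial) - energy(residues)
    # Metropolis criterion.
    if d_e <= 0.0 or rng.random() < math.exp(-d_e / kt):
        return trial, True, i
    return residues, False, i

rng = random.Random(2024)
state = [[60.0, 180.0, -60.0] for _ in range(9)]     # nine residues, three dihedrals each
accepted = 0
for _ in range(2000):
    state, ok, _ = residue_move(state, 15.0, rng)
    accepted += ok
print("acceptance fraction:", accepted / 2000)
```

Only the selected residue is perturbed in each attempt, so in a real code only the energy terms involving that residue would need to be recalculated; the full energy is recomputed here purely for brevity.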

3.5 Simulation Code Customisation and Optimisation.

The simulation package used in this work was MCPRO.32 The original intention had been to use BOSS,105 since this program was supposedly more suited for small molecule simulations while MCPRO was designed specifically for protein systems. However, MCPRO was chosen for two reasons. It allowed the use of residues, the benefits of which have just been described. It also contained an energy updating routine that was more efficient for large systems than the complete energy calculation used in BOSS.

Additional features were required for the macrobicycle 12 system that were not implemented in MCPRO. Therefore, a number of alterations were made to the MCPRO code. These included modifications to the solute-solvent energy calculation, the energy calculation for perturbed solutes, the use of a greater residue, implementation of new MC moves, and the inclusion of the GB/SA solvent continuum method. The first change that required a considerable coding effort was to the non-bonded cutoff protocol for the solvent-solute energy calculation (see Subsection 2.1.4 for details). MCPRO uses a less stringent residue-solvent cutoff radius rather than a full


molecule-solvent cutoff. Only those solvent molecules within the cutoff range of the residue see that residue. This procedure makes simulations for large proteins practicable by reducing the total number of non-bonded interactions. The difficulty with this is that if residues are charged, then large discontinuities in energy may be expected when solvent molecules move across the cutoff radius boundary. For proteins, this effect is assumed to largely cancel, given the large number of charged residues in the neighbourhood of each solvent molecule. However, this effect may be more significant for a smaller molecule with fewer residues. Some of the residues in macrobicycle 12 are indeed charged. Therefore, a full molecule-solvent cutoff radius was implemented for the host. The feathering of the potential that is commonly used at the cutoff radius to soften the discontinuity in energy was turned off due to various complications that arose from the new cutoff procedure. It should be pointed out that the guest solute has a charge of -1, so this energy problem of solvent molecules suddenly seeing a charge still occurs to some extent. Owing to the computational expense of more sophisticated electrostatic treatments or larger cutoff radii, the current procedure was retained.

One area that was pinpointed for optimisation was the calculation of energies for perturbed molecules. MCPRO and BOSS calculate energies for perturbed molecules in exactly the same way as for the reference molecule energy calculation. However, only the part of the molecule that actually perturbs requires a new energy calculation. The energy for the rest of the molecule remains the same. By reusing these constant energies, most energy terms did not have to be recalculated for the perturbed molecules.

The use of residues can lead to problems for the atoms at residue boundaries. If a strict residue boundary is used, these boundary atoms will not be defined in the Z-matrix with any relationship to atoms in adjacent residues.
Thus relative movement of the two residues can lead to distortions for all bonds, angles and dihedrals that lie across the residue boundary. This problem can be partly eliminated by allowing certain atoms at the boundary such as hydrogens to be defined with respect to both residues. If either adjacent residue moves, they will also move. Hence they must be added to the moving residue to form a greater residue for which all energies are re-evaluated. Figure 3.10 shows, for example, the atoms with respect to which the hydrogens at the residue boundary between the thiourea and hydrocarbon residues are defined to achieve this.

[Structure diagram of the atoms at the residue boundary, with the two residues delineated by boxes.]

Z-matrix:
H   bond  angle  dihedral
5   2     1      3
6   2     1      3
7   3     4      2
8   3     4      2

Figure 3.10: The definition of the hydrogens in the Z-matrix at the boundary between the thiourea and hydrocarbon residues (delineated by the boxes).

A number of changes were made to the MC move maximum amplitudes for bonds, angles and dihedrals, primarily as a consequence of the use of the residues. Maximum amplitudes for each dihedral were individually optimised to give a 40% acceptance probability. This ensured that dihedrals in the Z-matrix with the greatest ability to vary were given sufficient opportunity to do so. The maximum amplitudes themselves for each move were scaled down according to how many degrees of freedom were involved in the move. This scaling is necessary because the more degrees of freedom that change, the smaller their maximum move size should be so that the subsequent energy change is not too large. Finally, a large number of modifications were made to improve the sampling of the system. These included the conrot,106 flip and large dihedral MC moves for the


host, special combinations of translation and rotation moves for the guest, and the GB/SA continuum solvent model33 for the solvent. These methods are described in Chapter 6.
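The optimisation of a maximum amplitude towards a 40% acceptance probability, mentioned earlier in this section, can be illustrated with a simple iterative scheme. The sketch below is an assumption about how such tuning might be done and is not the procedure used in this work: it uses a single harmonic toy coordinate with kT = 1, measures the acceptance over short trial runs, and rescales the amplitude by the ratio of measured to target acceptance. The function names acceptance and tune_amplitude are hypothetical.

```python
import math
import random

def acceptance(amp, rng, trials=2000, k=1.0, kt=1.0):
    """Fraction of Metropolis moves accepted for a harmonic toy coordinate."""
    x, accepted = 0.0, 0
    for _ in range(trials):
        trial = x + rng.uniform(-amp, amp)
        d_e = 0.5 * k * (trial * trial - x * x)
        if d_e <= 0.0 or rng.random() < math.exp(-d_e / kt):
            x, accepted = trial, accepted + 1
    return accepted / trials

def tune_amplitude(amp, rng, target=0.40, rounds=20):
    """Rescale amp after each short run; the change per round is limited to a factor of 2."""
    for _ in range(rounds):
        ratio = acceptance(amp, rng) / target
        amp *= min(2.0, max(0.5, ratio))
    return amp

rng = random.Random(12345)
amp = tune_amplitude(0.1, rng)
final = acceptance(amp, rng, trials=20000)
print(f"tuned amplitude {amp:.2f}, acceptance {final:.2f}")
```

An amplitude that is too small gives near-unit acceptance and is increased, while one that is too large is rejected often and is reduced, so the scheme settles close to the target.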

3.6 Conclusion.

Most of the preparations necessary for calculating the free energies on the macrobicycle 12 system have been described. These include parameterisation, structural setup and simulation code customisation and optimisation. However, some of the issues involved in parameterisation and MC sampling require further discussion. The REPD charge derivation method and its testing are described in the next two chapters, while the final issues addressing MC sampling are elaborated on in Chapter 6 before commencing on the free energy calculations themselves.

Chapter 4 Partial Charge Methods


The initial motivation for this work was the need for partial charges for the thiourea unit in macrobicycle 12, since they were unavailable in the OPLS force field (see Subsection 3.1.1). Following convention, charges were to be fitted to the molecular electrostatic potential (MEP) calculated at the 6-31G* level.107-112 In order to validate the use of such charges, thiourea's water/chloroform relative partition coefficient with acetamide would be compared with experiment. However, as described in Subsection 3.2.4, the comparison fared poorly and the charges produced were of much greater magnitude than typical OPLS charges. While geometry and basis set were two factors that had a strong influence on the partition coefficient, variation of these alone proved insufficient to obtain a reasonable comparison. Therefore a method was sought that produced charges that could reproduce the relative partition coefficients and be more OPLS-like. This necessitated a study of the use of charges in force fields and various methods to derive them.17

4.1 Partial Charges in Force Fields.

4.1.1 The Use of Charges and Methods to Derive Them.

The simplest, most widespread method of representing the electrostatic interaction between molecules in a computer simulation is to approximate it with a Coulombic potential between sets of atom-centred charges (see the OPLS force field in Subsection 2.1.3, for example). The deficiencies associated with this method include the monopole approximation itself,113-115 the absence of explicit polarisation in their parameterisation,116 and the dependence of charges on the geometry of the molecule from which they were obtained, and in particular, its conformation.117-123 Although extensions of the basic method to overcome these problems are available, the use of charges is extremely common in the simulation of molecular systems, being almost ubiquitous for biological molecules. This is in part historical, but is also largely due to the computational expense resulting from the increased complexity, implementation and reparameterisation required to incorporate the extensions. Even if these extensions cannot be directly introduced, including an implicit treatment of them in the charge derivation method is desirable.

The use of charges in a computer simulation is an approximate method for modelling electrostatic interactions, and consequently there is neither a single set of parameters capable of reproducing every piece of data, nor is there a unique, correct way to derive these parameters. Indeed, there is a wide variety of methods to parameterise charges. Charges can be derived from either experimental or theoretical data. Of the methods using experimental information, charges may be obtained from the electron densities derived from X-ray diffraction studies.124 Alternatively, charges may be optimised to reproduce structural and thermodynamic experimental data for pure liquids or aqueous solutions of a molecule.
This procedure is used in the popular OPLS force field.25 Hybrid techniques also exist that use both experimental data and quantum mechanical calculations, such as the charge equilibration method,125 while the atomic polar tensors method can use either.126-128 There is a large collection of purely quantum mechanical methods that partition the electron population of each atom according to the calculated molecular orbitals,129-132 as in, for example, the Mulliken population analysis.129 However, for all of these, the partitioning is somewhat arbitrary, and the calculated charges are basis set dependent and generally give a poor reproduction of the MEP.108, 133 An alternative approach is to optimise charges


to reproduce various quantum mechanical quantities such as the calculated interaction energy of a collection of molecular complexes,98 or, alternatively, the MEP,107-112 the molecule's electric field134 or a distributed multipole potential.135-137

4.1.2 Advantages and Disadvantages of OPLS Charges.

Since the OPLS method optimises charges to reproduce condensed phase properties, they are arguably the parameters of choice for condensed phase simulations. In addition, the OPLS method is best able to address the problems previously discussed by including approximate, implicit treatments of polarisation and conformational averaging. However, there are a number of limitations in the OPLS method. Firstly, extensive computer simulations are needed. Secondly, for each chemical functionality, a simple molecule containing that functionality is required for which there is experimental data either of the liquid phase or an aqueous solution. Thirdly, the charges for a given functional group may not always be transferable, particularly for two adjacent polar groups. Thus, because OPLS charges are not available for all types and combinations of functional groups, another reliable and practical method must be used.

4.1.3 Advantages and Disadvantages of EPD Charges.

Since charges are used to model intermolecular interactions, they should be optimised to reproduce molecular data in regions lying outside a molecule's van der Waals volume. Using the MEP to calculate charges is therefore a possibility, whereas use of the electron density is not. Furthermore, the monopole approximation becomes less severe at these longer ranges. Therefore, the widely used MEP procedure seems the next best, simplest alternative. The calculation of these Electrostatic Potential Derived (EPD) charges is described in more detail in Subsection 4.2.1, but in essence the method works by calculating the charges that produce an electrostatic potential that most closely matches the MEP at a number of points spaced evenly around the molecule. The procedure is easy to implement, is practical for even quite large


molecules and, most importantly, can be applied to any functionality.

Figure 4.1: The highlighted carbons and nitrogen are the atoms with buried charges in acetamide.

Nevertheless, a number of difficulties still exist with the use of EPD charges. Firstly, some degree of polarisation is required to model that induced in the condensed phase. Typically, this is introduced in an ad hoc fashion through the use of a low-quality basis set such as 6-31G*,138 although it is possible to calculate charges in a polarisable continuum.139

Secondly, the MEP generally does not provide enough information to determine statistically valid charges for all atoms in a molecule, as demonstrated by a singular value decomposition analysis by Francl et al.111 This essentially implies that the under-determined atoms may adopt charges over a wide range of values without significantly affecting the charge electrostatic potential (CEP) and thus the quality of the fit to the MEP. This is particularly the case for atoms with three or four neighbours, commonly termed buried atoms. The buried atoms are illustrated in Figure 4.1 for acetamide. The charges of buried atoms contribute less to the CEP in regions outside the molecule than surface atom charges. Buried atoms also have a greater number of neighbours. Both of these facts allow the buried atom charges to vary without significantly changing either the CEP or the charges of their neighbours.

Thirdly, EPD charges often do not accord with chemical intuition. This manifests itself in a number of ways. EPD charges can be large in magnitude,140 larger than chemical intuition would suggest, and also larger than the equivalent OPLS parameters. This raises compatibility problems if some combination of the two force fields is desired. EPD charges can also vary significantly for chemically similar atoms in a given homologous series, raising difficulties if a transferable parameter set is desired. Furthermore, the polarity of EPD charges does not always correlate with electronegativities, as is often found for C-H bonds. However, this should not be unexpected given that EPD charges are derived from the MEP and not the charge density.

Fourthly, EPD charges often have quite significant conformational dependence,117-123 a problem exacerbated by their large magnitude.

4.2 Development of the REPD Charge Method.

Given these difficulties with the EPD charges and the unavailability of OPLS charges for some functionalities, ideally a method needs to be developed with the speed and flexibility of the MEP procedure that generates electrostatic parameters of comparable magnitude to the OPLS force field. The method developed in this work seeks to do just that. It is called the Restrained Electrostatic Potential Derived (REPD) charge method.17 It is based on the RESP (Restrained ElectroStatic Potential) procedure of Bayly et al.140 and exploits the statistical indeterminacy of buried charges. What follows is a description of the EPD method, an analysis of its attributes, and the subsequent development of the REPD method.

4.2.1 The EPD Charge Method.

The EPD charge method produces charges in the following way. The MEP may be calculated at any arbitrary point in space, $\mathbf{r}_i$, using the equation

$$V_i = \sum_j \frac{Z_j}{r_{ij}} - \int \frac{\rho(\mathbf{r})\,\mathrm{d}\mathbf{r}}{|\mathbf{r}_i - \mathbf{r}|} \tag{4.1}$$

where $r_{ij}$ is the distance between $\mathbf{r}_i$ and atom $j$ with charge $Z_j$, and $\rho(\mathbf{r})$ is the electron density at each point, $\mathbf{r}$, in space, obtained from a quantum mechanical calculation. Charges, $q_j$, are calculated by performing a least squares fit between the CEP, given by

$$\hat{V}_i = \sum_{j=1}^{M} \frac{q_j}{r_{ij}} \tag{4.2}$$

and the MEP at $N$ points spaced evenly around the molecule. $M$ is the number of atoms in the molecule. That is, the following $\chi^2$ function,

$$\chi^2 = \sum_{i=1}^{N} \left( V_i - \hat{V}_i \right)^2 \tag{4.3}$$

is to be minimised. To find charges that minimise $\chi^2$ in Eq. 4.3, setting the derivative of $\chi^2$ with respect to each charge to zero produces $M$ equations linear in $q_j$ of the form

$$\frac{\partial \chi^2}{\partial q_k} = -2 \sum_{i=1}^{N} \frac{1}{r_{ik}} \left( V_i - \sum_{j=1}^{M} \frac{q_j}{r_{ij}} \right) = 0 \tag{4.4}$$

which can be easily solved in matrix form for $\mathbf{q}$, the vector of charges,

$$\mathbf{A}\mathbf{q} = \mathbf{b} \tag{4.5}$$

where the elements of $\mathbf{A}$ are given by

$$A_{jk} = \sum_{i=1}^{N} \frac{1}{r_{ij}\, r_{ik}} \tag{4.6}$$

and the elements of $\mathbf{b}$ by

$$b_k = \sum_{i=1}^{N} \frac{V_i}{r_{ik}}. \tag{4.7}$$

The quality of the fit can be assessed by the relative root mean square (RRMS) of $\chi^2$, defined by

$$\mathrm{RRMS} = \left( \chi^2 \Big/ \sum_{i=1}^{N} V_i^2 \right)^{1/2} \tag{4.8}$$
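The fitting procedure of this subsection can be sketched in a few lines of Python. To keep the example self-contained, the quantum mechanical MEP is replaced by a synthetic potential generated from three known point charges, so the fit should simply recover those charges. The atom positions, the charges, the use of a Fibonacci lattice for the fitting points and the function names are all assumptions made for illustration; units are arbitrary (charge divided by distance).

```python
import math

def sphere_points(n, radius):
    """Roughly uniform points on a sphere about the origin (Fibonacci lattice)."""
    golden = math.pi * (3.0 - math.sqrt(5.0))
    pts = []
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n
        rho = math.sqrt(1.0 - z * z)
        pts.append([radius * rho * math.cos(golden * i),
                    radius * rho * math.sin(golden * i),
                    radius * z])
    return pts

def solve(a, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit_charges(atoms, points, v):
    """Build A and b as in Eqs. 4.6 and 4.7, solve Eq. 4.5, return (q, RRMS)."""
    inv_r = [[1.0 / math.dist(p, at) for at in atoms] for p in points]
    m = len(atoms)
    a = [[sum(row[j] * row[k] for row in inv_r) for k in range(m)] for j in range(m)]
    b = [sum(vi * row[k] for vi, row in zip(v, inv_r)) for k in range(m)]
    q = solve(a, b)
    chi2 = sum((vi - sum(qj * rj for qj, rj in zip(q, row))) ** 2
               for vi, row in zip(v, inv_r))
    return q, math.sqrt(chi2 / sum(vi * vi for vi in v))

# Synthetic test system (hypothetical positions and charges).
atoms = [[0.0, 0.0, 0.0], [1.2, 0.0, 0.0], [-0.5, 1.0, 0.3]]
true_q = [0.5, -0.6, 0.1]
points = sphere_points(60, 3.0) + sphere_points(90, 4.0)
v = [sum(q / math.dist(p, at) for q, at in zip(true_q, atoms)) for p in points]

q, rrms = fit_charges(atoms, points, v)
print([round(x, 6) for x in q], rrms)
```

With a real MEP the potential cannot be reproduced exactly by atom-centred charges, so the RRMS is non-zero and serves as the measure of fit quality described above.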


Table 4.1: Urea Charges for Various Basis Sets, Methods and Geometries.

Method/Basis Set(a)   C       O        N        H (syn)   H (anti)   μ/D
HF/6-31G              1.314   -0.789   -1.222   0.480     0.480      5.09
HF/6-31G*             1.161   -0.695   -1.153   0.455     0.465      4.62
HF/6-31G**            1.153   -0.697   -1.139   0.451     0.460      4.61
HF/6-31+G*            1.266   -0.740   -1.237   0.482     0.492      4.81
HF/6-31+G**           1.255   -0.741   -1.218   0.477     0.485      4.80
MP2/6-31G*            1.130   -0.688   -1.136   0.451     0.464      4.76
experimental(b)       1.095   -0.708   -1.105   0.463     0.449      5.18

(a) In each case, charges and geometry are calculated using the same basis set and method.
(b) Geometry from diffraction data.142

4.2.2 Basis Set, Ab Initio Method and Geometry.

The first feature examined was the basis set and ab initio method used to calculate the MEP. Charges were generated for a range of basis sets, adding polarisation and diffuse functions, and for the HF method versus MP2. The same basis set and level of theory were used for both the optimisation and the MEP calculation. The calculations were performed using GAMESS141 with the Connolly109 point selection scheme, and urea was forced to be planar. The results, together with dipole moments, μ, are given in Table 4.1 for the molecule urea. It can be seen that there is little variation in charge beyond the 6-31G* level. Diffuse functions increase the charge magnitude and dipole moment marginally, extra polarisation functions for hydrogens make negligible difference, while MP2 charges are almost identical to HF ones. The charges produced using the experimental geometry are similar to the ab initio ones, but the dipole moment obtained is the highest of all of the methods.

The choice of basis set and geometry was made primarily for practical reasons. HF/6-31G* charges and geometries have been found to agree well with experiment,110, 138 are easily calculated for small molecules, and, unlike experimental geometries, are readily obtainable for any common molecular functionality, basis set permitting. The almost identical HF/6-31+G* geometries have similar properties and are only moderately more computationally demanding to calculate. Later free energy work (see Section 5.1) found that the 6-31+G* basis set with its slightly larger dipole


moments reproduced experimental results better. Furthermore, inclusion of diffuse functions is recommended for large atoms such as sulfur. Therefore, the ab initio charges and optimised geometries using both HF/6-31G* and HF/6-31+G* were considered.

Figure 4.2: Fitting points (Connolly's method) around acetamide at which the electrostatic potential is calculated.

4.2.3 Fitting Point Method.

The selection of points was another candidate for influencing charges. This affects what parts of the MEP one considers are important for the EPD charges to reproduce. Typically, points are placed in regions considered important for molecular interactions, namely, beyond some distance from the van der Waals volume of the molecule but not too distant either. Figure 4.2 shows the point spacing around acetamide. There are four common methods used to select points. The CHELP method110 places points sparsely around a number of concentric spheres centred on each atom. CHELPG122 places points on a cubic grid. Connolly's method109 places points much more uniformly on spheres. The geodesic method143 places points even more uniformly on spheres according to the tessellation of an icosahedron. The CHELP method was


Table 4.2: Urea Charges for CHELPG, Connolly and Geodesic Point Selection Schemes.

Point Selection Method   C       O        N        H (syn)   H (anti)   μ/D
CHELPG                   1.152   -0.705   -1.101   0.432     0.446      4.67
Connolly                 1.161   -0.695   -1.153   0.455     0.465      4.62
Geodesic                 1.163   -0.700   -1.146   0.453     0.462      4.64

shown to produce charges that varied widely with how the points were spaced122 and was not further considered. However, the charges produced by the CHELPG method itself have also been shown to depend on the orientation of the grid with respect to the molecule.143 The charges produced by these three methods for urea are given in Table 4.2. Point densities were chosen for each method to give roughly equal numbers of fitting points, approximately 500. From this table little difference can be discerned between the charges produced by any of the methods. The Connolly method was adopted for this work since it is most widely used.

A number of aspects of the Connolly method were varied to determine their effect on charges. These included the point density on each sphere, the number and spacing of spheres, and the distance of the closest and furthest spheres. All of these made negligible difference to the charges produced as long as the number of fitting points was not too low nor the points too close to the van der Waals surface. On the question of point density, Figure 4.3 shows the variation of the EPD/6-31+G* charges for acetamide with the point density used to calculate the charges. At densities above 1 point Å⁻², the charge variation is insignificant for the purposes of reproducing the MEP. The RRMS at this density is calculated to be 0.109 and only improves to 0.108 at the much higher density of 100 points Å⁻². While it has been suggested that thousands of points are necessary to obtain well-defined charges,122 the SVD analysis of Francl et al.111 suggests that the MEP does not contain sufficient information to produce such charges regardless of the number of points. However, the quality of the fit to the MEP is still adequate at point densities of 1 point Å⁻², the density used in this work. A number of other schemes that sampled points at different distances according

to atom type were tested on the basis that the MEP around some atoms was more important. Another scheme weighted points in the least-squares fitting using various functions according to the distance from the molecule.107 Yet another used least absolute value fitting. However, again, no significant difference was observed, and so the conventional Connolly method was retained.

Figure 4.3: Variation of the charges with point density for REPD/6-31+G* acetamide. [Plot of q/e against log(point density / point Å⁻²) for the atoms labelled CH, O, N, CO, HN and HC.]

A final note about point selection is that care must be taken not to include points closer than approximately 1.4 times the molecule's van der Waals radii. Firstly, it is unnecessary to include such close points because the atoms of other molecules are unable to approach to this distance and secondly, it may affect the values of charges obtained. To illustrate this point, charges were calculated using a single sphere of fitting points at a range of radii. Figure 4.4 reveals how the carbon charge of 6-31+G* benzene is severely influenced by the distance at which the sphere of fitting points is chosen. The carbon charge becomes much smaller in magnitude at shorter distances. This variation was found later to have a profound effect on the free energy of hydration for benzene (see Subsection 5.1.6). This effect is probably due to a severe worsening of the point charge approximation and such charges, dependent on the radius chosen, are questionable, as well as being unsuitable for modelling electrostatic interactions at longer ranges. The CHELPG method122 has been frequently implemented (GAMESS,141 Gaussian 94103) including these close points.

Figure 4.4: Influence of the radius of the fitting point sphere (determined by the van der Waals scale factor) on the carbon charge calculated for EPD/6-31+G* benzene. [Plot of q/e against the van der Waals scale factor, from 1.0 to 2.0.]
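The placement of fitting points on shells outside the van der Waals surface can be sketched as below. This is not Connolly's algorithm: a Fibonacci lattice is swapped in as a simple way of spreading points evenly over each sphere. The shell scale factors, the choice of Bondi van der Waals radii, the two-atom test fragment and the function names are all assumptions made for the example; only the minimum scale factor of about 1.4 and the density of 1 point Å⁻² are taken from the text.

```python
import math

# Bondi van der Waals radii in angstrom (an assumed choice for this sketch).
VDW = {"C": 1.70, "N": 1.55, "O": 1.52, "H": 1.20}

def sphere_points(n, radius, centre):
    """Roughly uniform points on a sphere (Fibonacci lattice, not Connolly's method)."""
    golden = math.pi * (3.0 - math.sqrt(5.0))
    pts = []
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n
        rho = math.sqrt(1.0 - z * z)
        pts.append([centre[0] + radius * rho * math.cos(golden * i),
                    centre[1] + radius * rho * math.sin(golden * i),
                    centre[2] + radius * z])
    return pts

def fitting_points(atoms, factors=(1.4, 1.6, 1.8, 2.0), density=1.0):
    """atoms: list of (element, [x, y, z]); density in points per square angstrom."""
    points = []
    for s in factors:
        for el, pos in atoms:
            r = s * VDW[el]
            n = max(1, round(density * 4.0 * math.pi * r * r))
            for p in sphere_points(n, r, pos):
                # Keep a point only if it lies outside every atom's scaled sphere.
                if all(math.dist(p, q) >= s * VDW[e2] - 1e-9 for e2, q in atoms):
                    points.append(p)
    return points

# Hypothetical two-atom fragment used only to exercise the functions.
fragment = [("C", [0.0, 0.0, 0.0]), ("O", [1.2, 0.0, 0.0])]
pts = fitting_points(fragment)
closest = min(math.dist(p, pos) / VDW[el] for p in pts for el, pos in fragment)
print(len(pts), "points; closest approach in units of the vdW radius:", round(closest, 3))
```

Discarding the points that fall inside a neighbouring atom's scaled sphere guarantees that no fitting point lies closer than the smallest scale factor times the van der Waals radius of any atom.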

4.2.4 Multipolar Constraints.

A number of multipolar constraints can be included in the least-squares fit to reproduce ab initio values. These include charge, dipole and quadrupole constraints. The results for urea are presented in Table 4.3. Evidently, the unconstrained charges capture well the electrostatic moments of the MEP and so multipolar constraints proved unnecessary. However, the charge constraint, implemented using Lagrange multipliers,110 was employed to ensure that the absolute molecular charge is exactly correct. In addition, the rounding of charges to three decimal places necessitates manual readjustment to ensure that the total molecular charge is correct. In this adjustment, buried atom charges were given preference since they have less effect on the CEP.

Table 4.3: Urea Charges Using Various Multipole Constraints.

Constraint     C       O        N        H (syn)   H (anti)   μ/D
no restraint   1.161   -0.695   -1.153   0.455     0.465      4.62
charge         1.160   -0.694   -1.153   0.455     0.465      4.62
dipole         1.157   -0.693   -1.150   0.454     0.464      4.60
quadrupole     1.152   -0.688   -1.165   0.462     0.470      4.60
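The charge constraint and the subsequent rounding adjustment can be illustrated with a small numerical sketch. The standard Lagrange multiplier construction is assumed here: the linear system is extended by one row and one column so that the charges sum to the required total. The 2 x 2 matrix, the right-hand side, the unrounded charges and the function names are made-up illustrative values, not data from this work.

```python
def solve(a, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def constrained_charges(a, b, total):
    """Solve the fit with an extra Lagrange multiplier enforcing sum(q) = total."""
    n = len(b)
    aug = [row[:] + [1.0] for row in a] + [[1.0] * n + [0.0]]
    return solve(aug, b + [total])[:n]      # drop the multiplier itself

def round_and_fix(q, total, buried, places=3):
    """Round the charges, then absorb any residual total charge on one buried atom."""
    r = [round(x, places) for x in q]
    r[buried] = round(r[buried] + (total - sum(r)), places)
    return r

# Illustrative 2 x 2 system (arbitrary numbers).
a = [[2.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
q = constrained_charges(a, b, total=0.0)
print([round(x, 4) for x in q])

# Rounding three hypothetical charges leaves a residual that is moved to atom 0.
fixed = round_and_fix([0.33349, -0.66651, 0.33302], total=0.0, buried=0)
print(fixed)
```

Placing the residual on a buried atom follows the preference stated above, since a small change to a buried charge alters the CEP least.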

4.2.5 Charge Restraining.

Since the magnitudes of the charges appeared to be largely independent of parameters used in the EPD method, a simple alternative method that eciently reduces charge magnitudes is linear scaling. This method is commonly used to modify charges tted to semi-empirical MEPs.144 Such a method is, however, too crude since the dipole moment, for example, scales by exactly the same factor. However, a method was found in the literature that aims to reduce the magnitude of charges by using a restraint. It is called the RESP method.140 In the RESP method, charges qj are tted to the MEP while simultaneously being minimised. In this way the large size of EPD charges noted previously is reduced. This is accomplished using a restraining function of the following form in the least-squares t
\[ \chi^2_{\mathrm{rest}} = a \sum_{j=1}^{M} \left( (q_j^2 + b^2)^{1/2} - b \right) \tag{4.9} \]

where M is now the number of non-hydrogen atoms. This function is added to Eq. 4.3 which is then minimised, simultaneously reducing the magnitude of the charges. The parameter a determines the strength of the restraint and takes two values, one for the initial restraint and one in the averaging of conformationally equivalent atoms (see Subsection 4.2.9). These values are 0.0005 and 0.001, respectively. The parameter b determines the tightness of the hyperbola and is given a value of 0.1. The hyperbolic restraining function restrains all charges with approximately the same force except for those with a magnitude comparable to b, for which the restraining force is smaller. The restraint exploits the statistical indeterminacy of buried charges in order to bring down their magnitude without significantly affecting the quality of the fit. When the RESP method was applied to urea, though, it was found to have negligible effect as seen in Table 4.4. On closer inspection, a number of deficiencies in the RESP method were apparent. Firstly, the effect of the restraint is dependent on the selection of points used in the fitting procedure; secondly, the charges do not vary uniformly with a; thirdly, hydrogen atoms are selectively excluded from the restraint; and finally, parameters a and b are somewhat arbitrary. Therefore, a restraining method was sought that eliminates these problems.

Table 4.4: Urea Charges by EPD and RESP Methods [140].

Method   C       O        N        H (syn)   H (anti)   μ/D
EPD      1.161   -0.695   -1.153   0.455     0.465      4.62
RESP     1.005   -0.653   -1.035   0.426     0.433      4.58
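The behaviour of the hyperbolic restraint of Eq. 4.9 can be checked numerically with a short sketch. It evaluates the per-atom penalty and its derivative with respect to the charge, using the values a = 0.0005 and b = 0.1 quoted above; the function names are illustrative only.

```python
import math

def hyperbolic_penalty(q, a=0.0005, b=0.1):
    """Per-atom term of Eq. 4.9: a * (sqrt(q^2 + b^2) - b)."""
    return a * (math.sqrt(q * q + b * b) - b)

def hyperbolic_force(q, a=0.0005, b=0.1):
    """Derivative of the per-atom penalty with respect to q."""
    return a * q / math.sqrt(q * q + b * b)

for q in (0.05, 0.1, 0.5, 1.0):
    # Ratio to the limiting force a: near 1 for |q| >> b, smaller for |q| ~ b.
    print(f"q = {q:4.2f}  force/a = {hyperbolic_force(q) / 0.0005:.3f}")
```

The printed ratios approach one for charges much larger than b and fall well below one for charges comparable to b, which is the behaviour described in the text.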

4.2.6 A New Restraining Function.

The aim of the REPD method is to restrain the charges to become as close as possible to OPLS values. A whole host of methods were implemented and tested. These included the choice of restraining function, restraining different atoms selectively, arbitrarily assigning charges, using different fitting equations to the MEP, or even simply halving charges of buried atoms, which surprisingly made charges more OPLS-like, even if in a rather ad hoc fashion. The general aim in all cases was to observe the properties of each method and to use these to make charges OPLS-like, preferably using as few parameters as possible. In order to gain a proper understanding of how the restraint affected a large number of different molecules, 29 molecules were included in the study. Three restraining functions were of particular interest. One was linear, one hyperbolic, and one quadratic in charge, the hyperbolic function being similar to that of Bayly et al. These functions are, respectively,
\[ \chi^2_{\mathrm{rest}} = a \sum_{j=1}^{M} \sum_{i=1}^{N} \frac{|q_j|}{r_{ij}^2} \tag{4.10} \]

\[ \chi^2_{\mathrm{rest}} = a \sum_{j=1}^{M} \sum_{i=1}^{N} \frac{(q_j^2 + b^2)^{1/2} - b}{r_{ij}^2} \tag{4.11} \]

\[ \chi^2_{\mathrm{rest}} = a \sum_{j=1}^{M} \sum_{i=1}^{N} \frac{q_j^2}{r_{ij}^2} \tag{4.12} \]

Figure 4.5: Effect of the restraint, a, on the 6-31+G* charges for acetamide for the linear, hyperbolic and quadratic restraining functions. [Three panels (linear, hyperbolic, quadratic) plot q/e against a from 0.0000 to 0.0004, with curves labelled CO, HN, HC, CH, O and N.]

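Respectively, the diagonal elements, A_kk, of the A matrix in Eq. 4.5 after the addition of the restraint now become

\[ A_{kk} = \sum_{i=1}^{N} \frac{1}{r_{ik}^2} \left( 1 + \frac{a}{2|q_k|} \right) \tag{4.13} \]

\[ A_{kk} = \sum_{i=1}^{N} \frac{1}{r_{ik}^2} \left( 1 + \frac{a}{2 (q_k^2 + b^2)^{1/2}} \right) \tag{4.14} \]

\[ A_{kk} = \sum_{i=1}^{N} \frac{(1 + a)}{r_{ik}^2} \tag{4.15} \]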

while the off-diagonal elements remain the same as in Eq. 4.5. The value of b for the hyperbolic restraint is taken as 0.1, the same as for RESP. Figure 4.5 shows how the charges for acetamide vary with a for each of the three restraining functions, linear (Eq. 4.10), hyperbolic (Eq. 4.11) and quadratic (Eq. 4.12). The one feature that all restraints have in common is that they restrain buried atoms the most.

The linear restraint applies a constant force to each charge. When one charge reaches zero, all other charges suffer discontinuities in their slopes. This makes the linear restraint unsatisfactory because some charges reach zero before other charges are even mildly affected. Setting buried charges to zero or some other small value may be a promising way of developing transferable force fields, but since the overall goal is to make charges reproduce OPLS parameters, this restraint was not pursued. Using a hyperbolic restraint produced similar results but with these gradient discontinuities rounded off and obscured. This, together with the need for the extra parameter, b, ruled out the use of a hyperbolic function.

The quadratic restraint applies a restraining force proportional to the charge, thus restraining larger charges more. Large charges are generally found for polar atoms or buried atoms. Bayly et al. decided against using a quadratic restraint because larger, polar charges experience a stronger restraint than smaller, non-polar charges. However, it is the overall proximity of the atom to the fitting points, rather than the size of its charge, that determines the sensitivity of the charge to a restraint. As Figure 4.5 shows, the charges of the buried atoms in acetamide, the carbons and nitrogen, are rapidly reduced in magnitude while the well-exposed, polar hydrogen and oxygen charges change much more slowly. The breakthrough for the quadratic restraining function proposed here was that only one single parameter is found necessary to make charges considerably more OPLS-like. One further advantage of using a quadratic restraint is that Eq. 4.5 may be solved in one step. The solving process is more complicated for the methods using linear and hyperbolic restraints since their A_kk values depend on q_k, as seen in Eqs. 4.13 and 4.14. Therefore, for them, Eq. 4.5 must be solved iteratively. From this point on, any reference to the REPD (Restrained Electrostatic Potential Derived) method implies use of the quadratic restraint.

The effect of the quadratic restraint on the fit to the MEP is only minor. Figure 4.6 illustrates how the dipole moment, quadrupole moment and RRMS vary with the quadratic restraint.
The dipole moment and diagonal elements of the quadrupole moment are hardly affected by the restraint, while the off-diagonal elements of the quadrupole moment decrease in magnitude to some extent. A moderate but unavoidable increase in the RRMS accompanies the restraint, but the restrained charges still adequately reproduce the MEP. By comparison, simple linear scaling of 6-31+G* charges by 0.761 gives the best reproduction of the corresponding OPLS values for acetamide. However, the RRMS, originally 0.052, is now 0.245, compared to 0.109 for REPD.

Figure 4.6: Effect of the quadratic restraint parameter, a, on the dipole moment, quadrupole moment and the RRMS for 6-31+G* acetamide. [Panels plot μ/D, the quadrupole components (Qxx, Qyy, Qzz, Qxy, Qxz, Qyz) and the RRMS against a from 0 to 0.0006.]
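The one-step solution made possible by the quadratic restraint can be sketched as follows. This is a simplified illustration under stated assumptions: the total-charge constraint is omitted, the RRMS is taken to be the relative root-mean-square deviation of the fitted potential, and the four-site system with a noisy reference potential is entirely synthetic. The function names are hypothetical.

```python
import numpy as np

def repd_fit(inv_r, potential, a):
    """Solve the restrained normal equations in one step.

    inv_r[i, j] = 1 / r_ij for fitting point i and atom j (Bohr^-1).
    """
    A = inv_r.T @ inv_r
    B = inv_r.T @ potential
    A[np.diag_indices_from(A)] *= 1.0 + a    # Eq. 4.15: A_kk -> (1 + a) A_kk
    return np.linalg.solve(A, B)

def rrms(inv_r, potential, q):
    """Relative RMS deviation between fitted and reference potentials."""
    residual = potential - inv_r @ q
    return np.sqrt(np.sum(residual**2) / np.sum(potential**2))

def restraint(inv_r, q):
    """Quadratic restraint of Eq. 4.12 divided by a."""
    return np.sum((inv_r**2).sum(axis=0) * q**2)

# Hypothetical four-site system with a noisy reference potential.
rng = np.random.default_rng(1)
atoms = rng.uniform(-1.5, 1.5, size=(4, 3))
directions = rng.normal(size=(300, 3))
points = 6.0 * directions / np.linalg.norm(directions, axis=1, keepdims=True)
inv_r = 1.0 / np.linalg.norm(points[:, None, :] - atoms[None, :, :], axis=2)
potential = inv_r @ np.array([0.9, -0.7, -0.5, 0.3]) + rng.normal(scale=1e-3, size=300)

q_free = repd_fit(inv_r, potential, a=0.0)
q_rest = repd_fit(inv_r, potential, a=0.000252)
print(restraint(inv_r, q_free), restraint(inv_r, q_rest))
print(rrms(inv_r, potential, q_free), rrms(inv_r, potential, q_rest))
```

Since the restrained matrix does not depend on the charges, no iteration is needed. The restraint term can only decrease and the RRMS can only increase relative to the unrestrained fit, which mirrors the trade-off discussed above.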

4.2.7 Choice of Atoms to Restrain.

Finally, the restraint in this work is applied to the charges of all atoms without exception. Bayly et al. chose not to restrain hydrogen atom charges because they are already well-defined and therefore do not require restraining. However, it was felt that this exclusion is inconsistent and unnecessary. Atom type does not necessarily correlate with the need for a restraint, for on the one hand it could be argued that other well-defined atoms such as carbonyl oxygens might also not require restraining while on the other, hydrogen atoms in large, folded molecules might be sufficiently buried as to require restraining. In either case, whether or not hydrogens are restrained makes little difference to the charges for two reasons. Firstly, there is strong coupling between adjacent atoms, especially significant for atoms with only one neighbour such as hydrogens. Thus any change in the charge of an atom neighbouring a hydrogen will affect the hydrogen charge indirectly. Secondly, the charges of all well-defined surface atoms, usually including hydrogens, are relatively insensitive to the restraint as is evident in Figure 4.5. To verify this, charges calculated as a function of the restraint with and without the hydrogens restrained were found to differ negligibly. Therefore, for the reasons described above, the charges of all atoms are chosen to be restrained.

4.2.8 Independence of the New Restraint On Point Selection.

The benefit of the quadratic restraining function proposed is that it acts independently of point selection. This is not only convenient if different point densities are desired, but is essential if the restraining function is to act in a similar way on different molecules with different numbers of points. Since the parameter a can be factored into the sum over points used in the fit, the effect of a scales proportionally with A_kk as N is varied. This is not the case for the restraint of Bayly et al. (Eq. 4.9) which has A_kk elements given by
\[ A_{kk} = \sum_{i=1}^{N} \frac{1}{r_{ik}^2} + \frac{a}{(q_k^2 + b^2)^{1/2}} \tag{4.16} \]

The restraint term, being independent of N, becomes insignificant for very large N. Consequently, the effect of their restraint is dependent on N. Figure 4.7 shows how RESP/6-31+G* and REPD/6-31+G* charges for acetamide vary with point density. Clearly, the effect of the additive RESP restraining function is diminished at higher point densities while the multiplicative REPD restraining function acts on all charges similarly for all point densities.
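The different scaling of the two restraints with the number of fitting points can be made concrete with a small numerical sketch. It assumes, purely for illustration, a single atom whose N fitting points all lie at the same distance r, and compares the restraint contribution to A_kk with the fit contribution for the multiplicative form (Eq. 4.15) and the additive form (Eq. 4.16). The function name and the chosen r and q are hypothetical.

```python
import math

def restraint_to_fit_ratio(n_points, r=4.0, q=0.5,
                           a_repd=0.000252, a_resp=0.0005, b=0.1):
    """Ratio of restraint to fit contribution in A_kk for one atom.

    Assumes all n_points fitting points lie at distance r (Bohr) from atom k.
    """
    fit_term = n_points / r**2                       # sum_i 1 / r_ik^2
    repd_term = a_repd * fit_term                    # multiplicative, Eq. 4.15
    resp_term = a_resp / math.sqrt(q * q + b * b)    # additive, Eq. 4.16
    return repd_term / fit_term, resp_term / fit_term

for n in (100, 1000, 10000):
    repd, resp = restraint_to_fit_ratio(n)
    print(f"N = {n:5d}  REPD ratio = {repd:.6f}  RESP ratio = {resp:.2e}")
```

The multiplicative ratio stays equal to a for every N, whereas the additive ratio falls in proportion to 1/N, consistent with the trend described for Figure 4.7.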

4.2.9 Charge Averaging.

One other issue that was addressed was the averaging of geometry and charges. Atoms that are related by symmetry are constrained to have the same charge, such as the hydrogens of formaldehyde. However, atoms that are not symmetry related but conformationally equivalent should have their geometry and charges constrained to be identical if conformational flexibility is allowed in a simulation. An example is methanol which has three conformationally equivalent hydrogens on the carbon as shown in Figure 4.8. This issue of averaging conformationally equivalent atoms is merely a special case of general conformational dependence (see Subsection 4.3.4) in which the new conformation is identical to the old. Geometry averaging can be introduced as a constraint in the ab initio optimisation stage. Charge averaging is effectively a constraint on the fitting procedure and consequently can reduce the quality of the fit. Averaging for atoms that were equivalent by symmetry was found to have a negligible effect on the quality of fit. However, this was not so for conformationally equivalent atoms. The objective is to achieve averaging together with the best fit possible.

Figure 4.7: Comparison between RESP/6-31+G* (dashed line) and REPD/6-31+G* (solid line) charges with a=0.000252 for acetamide as the point density is varied. [The plot shows q/e from -1.0 to 1.0 against point density from 0 to 10.]

Figure 4.8: The three non-equivalent hydrogens highlighted in methanol.

Five alternatives, similar to those in the work of Bayly et al., were considered and tested on methanol. Table 4.5 shows the results for each method.

Table 4.5: REPD/6-31G* Methanol Charges, Dipole Moment and RRMS for Different Geometry and Charge Averaging Methods.

Method             C       O        HO      H (trans)   H (gauche)   μ/D    RRMS
no averaging       0.271   -0.688   0.428    0.041      -0.026       1.84   0.129
geometry only      0.348   -0.708   0.431    0.025      -0.048       1.85   0.130
charge during      0.304   -0.640   0.381   -0.015      -0.015       1.98   0.181
charge after       0.348   -0.708   0.431   -0.024      -0.024       2.16   0.211
charge two stage   0.358   -0.708   0.431   -0.027      -0.027       2.15   0.211

Firstly, no averaging of geometry or charges can be made. Secondly, an averaged geometry may be used. Averaging the geometry is seen to have a moderate effect on charges but little effect on dipole moment or RRMS. Geometry averaging was considered essential for usefulness in force fields and for consistency if any subsequent charge averaging was to be used, and so was retained for testing the next three methods. Thirdly, charge averaging may be performed directly in the fit. Charge averaging is achieved by adding together the rows for these atoms of the matrix equation (Eq. 4.5), reducing the dimensionality of the matrix [140]. This charge averaging approach was found to adversely affect all charges, particularly the oxygen and polar hydrogen, as seen in the "during" row in Table 4.5. Fourthly, a fitting procedure with no constraints was performed, followed by averaging of conformationally equivalent charges. Such averaging is seen to cause quite a large increase in the RRMS and dipole moment. This method was discarded on the grounds that the final charges are then not optimised to reproduce the MEP. Fifthly, a two-stage averaging procedure similar to that of Bayly et al. was tested that captures the advantages of the previous two methods. In this method, a charge fit is first performed without averaging. Then, while freezing the charges of all atoms neither requiring averaging nor adjacent to ones that do, a second fit is performed with conformationally equivalent atoms constrained to be the same, as in the first of these two methods. This freezing in the second stage prevents the averaging from detrimentally affecting the charges of the rest of the molecule. An increase in dipole moment and RRMS is still, however, unavoidable.

The fifth approach was the one adopted as it seemed to be the best compromise between producing averaged charges while still fitting to the MEP. However, unlike the approach of Bayly et al., the same restraint a was applied in both stages. One exception to the averaging rule was that amide hydrogens were excluded from the conformational averaging since their barrier to rotation is sufficiently high to keep them distinguishable in a simulation.
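Averaging charges directly in the fit, the third method described in this subsection, can be sketched as a reduction of the normal equations. The sketch assumes that equivalent atoms share one unknown charge, which amounts to summing both the rows and the columns of the matrix for those atoms; this is one reading of the row-addition described in the text. The six-site system has a random, hypothetical geometry, and the restraint and total-charge constraint are omitted for brevity.

```python
import numpy as np

def fit_with_equivalence(inv_r, potential, groups):
    """Least-squares fit in which each group of atoms shares one charge.

    groups: list of lists of atom indices, each atom in exactly one group.
    """
    n_atoms = inv_r.shape[1]
    S = np.zeros((n_atoms, len(groups)))     # maps unique charges to atoms
    for column, members in enumerate(groups):
        S[members, column] = 1.0
    A = inv_r.T @ inv_r
    B = inv_r.T @ potential
    # Summing the rows and columns of equivalent atoms gives the reduced system.
    q_unique = np.linalg.solve(S.T @ A @ S, S.T @ B)
    return S @ q_unique

# Hypothetical six-site system ordered C, O, HO, H, H, H (random geometry).
rng = np.random.default_rng(2)
atoms = rng.uniform(-2.0, 2.0, size=(6, 3))
directions = rng.normal(size=(400, 3))
points = 8.0 * directions / np.linalg.norm(directions, axis=1, keepdims=True)
inv_r = 1.0 / np.linalg.norm(points[:, None, :] - atoms[None, :, :], axis=2)
potential = inv_r @ np.array([0.27, -0.69, 0.43, 0.05, -0.03, -0.03])
q = fit_with_equivalence(inv_r, potential, [[0], [1], [2], [3, 4, 5]])
print(q)
```

A two-stage variant along the lines adopted here would first run an unconstrained fit, then repeat this reduced fit with the charges of atoms remote from the averaged group held at their first-stage values.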

4.3 The REPD Charge Method.

4.3.1 Summary of the Method.

A method has been derived that addresses the deficiencies of the RESP method and achieves the objective of OPLS-like charges. The full REPD method will now be summarised. Ab initio-optimised geometries are calculated at the Hartree-Fock (HF) level with 6-31G* and 6-31+G* basis sets using Gaussian 94 [103]. Points are selected according to the method of Kollman and Singh [109], in which points are spaced using Connolly's algorithm [145] on four spheres centred on each atom with radii 1.4, 1.6, 1.8 and 2.0 times the atom's van der Waals radius. Points at a distance less than 1.4 times the van der Waals radius of any atom are excluded. The point density on each sphere is 1 point Å⁻², a sufficient density to obtain converged charges. GAMESS [141] is used to generate a table containing the MEP at each point using the same ab initio method and basis set as the geometry optimisation method. All charge fitting is performed using a modified version of the RESP module in the AMBER software package [146], which takes the MEP table from GAMESS as input. Distances r are in units of Bohr and charges q are in electronic charge units. The new restraint is independent of the number of points used in the fitting procedure, is quadratic in charge, is applied uniformly to all atoms, and contains one adjustable parameter that is optimised to yield charges in close agreement with the equivalent OPLS parameters for a wide range of molecules.
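The point selection step summarised above can be sketched in a few lines. Note the substitution: points are spread over each sphere with a Fibonacci spiral rather than Connolly's algorithm, which is an assumption made only to keep the example short. The van der Waals radii in the example are placeholder values, and the function names are hypothetical.

```python
import math

def fibonacci_sphere(n):
    """Return n roughly evenly spaced unit vectors (stand-in for Connolly points)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    vectors = []
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n
        rho = math.sqrt(1.0 - z * z)
        phi = golden_angle * i
        vectors.append((rho * math.cos(phi), rho * math.sin(phi), z))
    return vectors

def fitting_points(atoms, factors=(1.4, 1.6, 1.8, 2.0), density=1.0):
    """Generate fitting points on scaled van der Waals spheres.

    atoms: list of ((x, y, z), vdw_radius) in angstroms;
    density: points per square angstrom on each sphere.
    """
    points = []
    for (cx, cy, cz), radius in atoms:
        for factor in factors:
            r = factor * radius
            n = max(1, round(density * 4.0 * math.pi * r * r))
            for ux, uy, uz in fibonacci_sphere(n):
                p = (cx + r * ux, cy + r * uy, cz + r * uz)
                # Discard points closer than 1.4 vdW radii to any atom
                # (small tolerance keeps points on the atom's own inner sphere).
                if all(math.dist(p, c) >= 1.4 * rv - 1e-9 for c, rv in atoms):
                    points.append(p)
    return points

single = fitting_points([((0.0, 0.0, 0.0), 1.5)])
pair = fitting_points([((0.0, 0.0, 0.0), 1.5), ((1.2, 0.0, 0.0), 1.5)])
print(len(single), len(pair))
```

For an isolated atom every generated point survives the exclusion test, while for two overlapping atoms the points falling inside the neighbour's 1.4 scaled sphere are removed.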


The values of a for the quadratic restraint with the HF/6-31G* and HF/6-31+G* protocols are taken as 0.000184 and 0.000252, respectively. Each parameter a was calculated according to the following procedure. Firstly, the above method was used to calculate charges for a large diverse group of 29 molecules for which there are OPLS charges. The parameter a was then varied until the slope of a plot of all unique charges versus their corresponding OPLS values equalled unity. In this way, charges are obtained that both reproduce the MEP and are comparable in size to OPLS.
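The calibration of a described above amounts to a one-dimensional root search on the regression slope. The sketch below shows that search with a bisection loop. The charge fit itself is replaced by a toy stand-in, clearly hypothetical, in which unrestrained charges are 1.24 times the OPLS values and shrink smoothly with a; the reference charges are also invented for the example.

```python
def slope(x, y):
    """Ordinary least-squares slope of y against x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx

def calibrate(charges_for, q_opls, lo=0.0, hi=0.001, steps=60):
    """Bisect on a until the slope of fitted versus OPLS charges is unity.

    charges_for(a) must return the restrained charges for restraint a;
    the slope is assumed to fall monotonically from above 1 to below 1.
    """
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if slope(q_opls, charges_for(mid)) > 1.0:
            lo = mid      # still under-restrained
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Toy stand-in for the charge fit (hypothetical, not the REPD fit itself).
q_opls = [-0.70, -0.50, -0.18, 0.06, 0.42, 0.90]

def toy_fit(a):
    return [1.24 * q / (1.0 + 1000.0 * a) for q in q_opls]

a_best = calibrate(toy_fit, q_opls)
print(f"a = {a_best:.6f}")
```

In a real calibration `charges_for` would rerun the restrained fit for every molecule in the set and return the pooled unique charges, so each bisection step is far more expensive than in this toy case.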

4.3.2 Comparison with EPD and OPLS Charges.

Four sets of charges have been calculated. These are EPD/6-31G*, REPD/6-31G*, EPD/6-31+G* and REPD/6-31+G*. They differ in their basis set, 6-31G* or 6-31+G*, and whether a restraining function is used. The charge sets are given in Table A.1 in Appendix A, together with the OPLS charges [25]. Figure 4.9 shows correlation plots of the set of unique charges calculated for each basis set versus their OPLS counterparts. For both sets of REPD charges, the slopes of the lines are of course 1.00 by definition and there is a good correlation with OPLS charges, with identical correlation coefficients of 0.97. The correlation is poorer for unrestrained charges, with a correlation coefficient of 0.93 for both EPD/6-31G* and EPD/6-31+G* charges and slopes of 1.24 and 1.31, respectively. In addition, a correlation plot of one restrained charge set against the other (not shown) gives a correlation coefficient of 1.00 and the expected slope of unity. For comparison, the RESP charges also give a better reproduction of OPLS charges than EPD, with a correlation coefficient of 0.96, even though they were not designed to do this. Nevertheless, the slope is still 1.13 and there exist the previously mentioned problems.

4.3.3 Influence of Molecule Set on the Parameterisation.

To determine the dependence of the parameter a on the molecules used in the parameterisation, a cross-validation analysis was used. In this procedure, the 29 molecules were divided up randomly into 5 groups, each containing 5 or 6 molecules. For each group, the a parameter was derived from the remaining 4 groups and then applied to the original group to predict charges for the selection of molecules in that group. A correlation coefficient between these charges and their OPLS counterparts was then calculated. This process was repeated 100 times, giving 500 different selections of the 29 molecules. The average correlation coefficients for the predicted charges with OPLS charges were found to be 0.97 for both REPD/6-31G* and REPD/6-31+G*, the same as the overall correlation coefficient. To provide further insight into how a depends on the choice of molecules, Figure 4.10 shows how the absolute deviation of a from its universal value varies with the number of molecules used to derive a for the 6-31+G* basis set. This graph has been obtained by averaging over 300 random molecule sequences and, by definition, reaches zero at 29 molecules. While there is a clear decreasing dependence of a on molecule selection as more molecules are used, this dependence is still non-negligible for twenty molecules, with a variation of around 0.000020. This corresponds to an average charge variation of 0.01. Clearly, no single a parameter is perfectly suited to all molecule types.

Figure 4.9: Correlation plots of REPD/6-31G*, REPD/6-31+G*, EPD/6-31G* and EPD/6-31+G* charges versus the corresponding OPLS parameters for the 29 molecules listed in Table A.1. [Four panels plot q/e against q_OPLS/e: REPD/6-31G* (m=1.00, r=0.97), REPD/6-31+G* (m=1.00, r=0.97), EPD/6-31G* (m=1.24, r=0.93) and EPD/6-31+G* (m=1.31, r=0.93).]

Figure 4.10: Variation of the absolute error in a from its universal value with number of molecules used to derive a, averaged over 300 random molecule sequences. [The plot shows the mean absolute error in a, from 0.000000 to 0.000200, against the number of molecules, up to 30.]

In order to test how the a parameter performs for each chemical functionality, the molecules were divided up into groups chosen according to functionality, as shown in Table 4.6. Some aromatic molecules are put into two groups, since they contain two of the listed functionalities. Firstly, the universal parameter was applied to each functionality, giving a slope and correlation coefficient. Secondly, a specific a parameter was calculated using only the molecules in each group. The closeness of the correlation coefficient and the gradient to unity, and of the individual a values to the universal a value, indicates the appropriateness of the general restraint for that functionality. Table 4.6 reveals that the ethers, sulfides and amides are over-restrained by the universal a value. The zero value of a for the ethers and sulfides arises because their unrestrained charges are actually smaller than OPLS charges. On the other hand, the carbonyl compounds are not sufficiently restrained. While it is conceivable that different a parameters could be calculated for different functional groups, not only would more parameters be fitted to fewer molecules, but it would then be impossible to calculate a values for functionalities not covered by the OPLS force field. Thus the method only remains practicable if the universal a value is used.

Table 4.6: Slopes, m, Correlation Coefficients, r, and Restraint Parameters, a, for Each Functional Group.

                              6-31G*                       6-31+G*
Functionality                 m       r (a)   a (b)        m       r (a)   a (b)
alcohol/thiol (c)             0.957   0.98    0.000070     0.939   0.99    0.000234
ether/thioether (d)           0.777   0.94    0.000000     0.749   0.95    0.000000
amide (e)                     0.940   0.97    0.000144     0.857   0.97    0.000174
aromatic (f)                  0.957   0.98    0.000138     0.905   0.97    0.000200
aldehyde/ketone (g)           1.073   0.99    0.000284     1.021   0.99    0.000408
carboxylic acid/ester (h)     1.147   0.97    0.000392     1.091   0.97    0.000456
amine (i)                     1.019   0.98    0.000212     0.974   0.98    0.000304
other (j)                     1.125   0.94    0.000354     1.066   0.93    0.000450

(a) Derived by applying the universal a value to each group.
(b) Derived by fitting the charges of the molecules in each group to their OPLS counterparts to give a slope of unity.
(c) Water, methanol, ethanol, methanethiol, ethanethiol, phenol.
(d) Dimethyl ether, diethyl ether, dimethyl sulfide, diethyl sulfide.
(e) Formamide, acetamide, trans-N-methyl acetamide.
(f) Benzene, phenol, aniline, benzonitrile, chlorobenzene, benzoic acid.
(g) Formaldehyde, acetaldehyde, acetone.
(h) Acetic acid, methyl acetate, benzoic acid.
(i) Ammonia, methylamine, ethylamine, aniline.
(j) Methane, ethene, chloroethane.

The charges obtained using all four protocols together with the OPLS charges [25] are given in Table A.1 for all 29 molecules. There are a number of features to note. Firstly, as expected, EPD charges are larger in magnitude than both REPD charges and OPLS charges, especially for buried atoms, while the charges of surface atoms such as in hydroxy groups decrease only a small amount on restraint. Secondly, EPD/6-31+G* charges are generally larger in magnitude than EPD/6-31G* charges, while REPD/6-31+G* and REPD/6-31G* charges are comparable in magnitude with respect to each other and to OPLS charges, as indeed they are designed to be. Thus, as mentioned above, a larger value of a is needed to restrain the larger EPD/6-31+G* charges to their corresponding OPLS values. In Subsection 5.1.5, further comparisons are made between all four charge sets in relation to their dipole moments and free energies of hydration.

Figure 4.11: The two main dihedrals in 1-propylamine. The first conformation is about the C-N bond. [Structure shown: g-g+ 1-propylamine.]
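The grouping used in the cross-validation analysis of Subsection 4.3.3 can be sketched as follows. Only the random partition into training and test sets is shown; deriving a from the training molecules and correlating the predicted charges with OPLS values are left out. The molecule names are placeholders, and the function name and seed are assumptions for the example.

```python
import random

def cross_validation_splits(molecules, n_groups=5, repeats=100, seed=0):
    """Yield (training, test) molecule lists for repeated random grouping."""
    rng = random.Random(seed)
    for _ in range(repeats):
        shuffled = list(molecules)
        rng.shuffle(shuffled)
        # Striding over the shuffled list gives groups of 5 or 6 for 29 molecules.
        groups = [shuffled[g::n_groups] for g in range(n_groups)]
        for g, test in enumerate(groups):
            training = [m for h, grp in enumerate(groups) if h != g for m in grp]
            yield training, test

molecules = [f"mol{i:02d}" for i in range(29)]   # placeholder names
splits = list(cross_validation_splits(molecules))
print(len(splits), sorted({len(test) for _, test in splits}))
```

With 29 molecules, 5 groups and 100 repeats this produces the 500 selections mentioned in the text, each test group holding 5 or 6 molecules.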

4.3.4 Conformational Dependence.

One final aspect of REPD charges to be discussed is their conformational dependence. 1-Propylamine is used here as a test case. 1-Propylamine contains two significant dihedrals as shown in Figure 4.11. The conformational dependence of charges may be assessed by observing how the dipole moment and RRMS of 1-propylamine vary when the charges derived from one conformation are applied to the other conformations [120, 121]. Ideally, this variation should be negligible. The results are shown in Table 4.7. Along the top of this table are the charge sets derived from each conformation, both EPD and REPD. Along the side are the conformations to which these charge sets are applied. The diagonal elements indicate that the charge set is being applied to the conformation from which it was derived. For REPD charges compared to EPD charges, the dipole moments generally deviate less from ab initio and the increase in RRMS is smaller. It can therefore be concluded that the REPD charges are less conformationally dependent and better able to reproduce the MEP for other conformations. In the particular case where charges are applied to their own conformation, however, the relative performance of EPD and REPD charges is very similar.

Table 4.7: Comparison of the Conformational Dependence between EPD and REPD Charges in Relation to the Dipole Moments and RRMS of the Fit for 1-Propylamine.

Effect on the Dipole Moment 6-31+G* Charge set aa ag ga gg + 1.52 1.49 2.05 2.09 2.05 1.94

test conab a formation initio EPD REPD aa 1.50 1.45b 1.39 ag 1.57 1.40 1.35 ga 1.44 1.99 1.76 gg + 1.50 2.27 2.10 g +g + 1.41 2.01 1.79 sum of errorsc 2.14 1.63

g +g +

EPD REPD EPD REPD EPD REPD EPD REPD

1.47 2.24 1.88 1.88 1.54 2.26 1.87 1.45 2.16 1.82 1.94 1.54 2.14 1.81 1.80 1.77 1.60 1.86 1.60 1.67 1.54 1.84 2.37 2.16 1.55 1.53 2.00 1.74 1.80 1.80 1.64 1.89 1.61 1.60 1.52 1.24 2.92 1.68 1.70 0.46 2.25 1.06

sum of errorsc

2.11 1.09

631G* Charge setd 2.05 1.14 2.09 1.08 1.56 0.44 Eect on the RRMS 631+ G* Charge set ag ga gg +

1.38 0.60

test conformationa aa ag ga gg + g +g + total RRMS

aa 0.21b 0.27 0.37 0.46 0.44 1.75

g +g +

EPD REPD EPD REPD EPD REPD EPD REPD EPD REPD

0.22 0.22 0.23 0.43 0.31 0.32 0.28 0.46 0.33 0.30 0.25 0.26 0.43 0.36 0.36 0.31 0.41 0.33 0.31 0.39 0.33 0.28 0.26 0.34 0.31 0.32 0.29 0.41 0.42 0.34 0.54 0.45 0.30 0.30 0.42 0.35 0.38 0.42 0.34 0.38 0.36 0.37 0.31 0.29 0.28 1.62 1.70 1.50 2.06 1.74 1.69 1.51 1.90 1.58 631G* Charge setd 1.66 1.42 1.75 1.53 1.59 1.43

total RRMS

1.76 1.54

1.59 1.39

a Conformations described by (lp)-N-C-C and N-C-C-C torsions. Anti and gauche conformations are denoted a and g, respectively.
b Bold numbers indicate that charges are tested on the same conformation from which they were derived.
c The error is taken relative to the ab initio dipole moment.
d Only the errors and total RRMS are shown. The full table is similar to the 6-31+G* results.

The g-g+ REPD charge set requires particular attention, since the errors in dipole moment and RRMS are particularly small for all conformations, suggesting that this charge set is the most suitable for flexible 1-propylamine simulations. Coincidentally, this is also the charge set with the smallest charge on the nitrogen. These observations regarding dipole moment and RRMS are mainly due to the fact that restrained charges are smaller in magnitude and so the variation of charges with conformation will be correspondingly smaller. One method of dealing with conformational dependence is to produce an averaged set of charges over all conformations. Reynolds et al. [120] perform this averaging over different conformations, each conformation Boltzmann-weighted according to its energy. However, the method is more computationally demanding, only conformational energy minima are considered, and the charges produced are temperature-dependent. The main problem remains that there is generally no unique charge set able to reproduce well the MEP for all the conformations of a molecule. Charges that actually vary with conformation may be implemented in the force field [125], although as yet this solution has not proved popular.
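The Boltzmann-weighted averaging mentioned above can be sketched in a few lines. This is a generic illustration of weighting per-conformation charge sets by exp(-E/RT), not a reproduction of the procedure of Reynolds et al.; the two-conformation, two-atom data and the function name are hypothetical.

```python
import math

R = 8.314462618e-3   # gas constant in kJ mol^-1 K^-1

def boltzmann_average(charge_sets, energies, temperature=298.15):
    """Average per-conformation charge sets with Boltzmann weights.

    charge_sets: one list of atomic charges per conformation;
    energies: conformational energies in kJ/mol.
    """
    e_min = min(energies)   # shift energies so the largest weight is 1
    weights = [math.exp(-(e - e_min) / (R * temperature)) for e in energies]
    total = sum(weights)
    weights = [w / total for w in weights]
    n_atoms = len(charge_sets[0])
    return [sum(w * q[i] for w, q in zip(weights, charge_sets))
            for i in range(n_atoms)]

# Hypothetical two-conformation, two-atom example.
equal = boltzmann_average([[-0.9, 0.9], [-0.7, 0.7]], [0.0, 0.0])
biased = boltzmann_average([[-0.9, 0.9], [-0.7, 0.7]], [0.0, 5.0])
print(equal, biased)
```

The explicit temperature argument makes visible the temperature dependence of the averaged charges noted in the text.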

4.4 Conclusion.

The derivation, formulation and properties of the REPD charge method have been presented. The method is able to achieve its objective of producing OPLS-like charges by a much more accessible route than the OPLS method. This is done simply by fitting to the MEP using a carefully chosen restraint. Charges derived by this method have been used for parameterising the thiourea unit in macrobicycle 12 (see Section 3.2). The usefulness of REPD charges may be further validated by testing their properties against experiment. Therefore the comparison of the free energies of hydration for molecules parameterised with REPD is made in the next chapter.

Chapter 5 Testing of REPD Charges by FEP and LIE


Since REPD charges are intended for use in condensed phase computer simulations, it is imperative that the charges are able to reproduce experimental condensed phase data. Therefore, to examine their performance, the commonly used technique of calculating the free energies of hydration for a range of small molecules and comparing with experiment was employed here. It is already well known that the free energies of hydration of molecules using EPD/6-31G* charges compare reasonably well with experiment with an average error of around 4.5 kJ mol⁻¹ [88-91]. This study allows the effect of the restraint and basis set on MEP charges to be examined [18]. Owing to the computationally intensive nature of the FEP method, an alternative procedure was also tested, that of the linear interaction energy (LIE) method [65, 147, 148]. Owing to potential overfitting problems observed in the LIE parameterisation, a detailed statistical analysis of the method was subsequently performed to determine the best LIE equation.

5.1 FEP Free Energies of Hydration.

5.1.1 The Molecule Test Set.

The free energies of hydration were calculated for the 22 molecules listed in Table 5.1. This list excludes 7 of the 29 molecules used in the original REPD charge parameterisation for the following reasons. The suitability of the rigid molecule approximation was the main criterion for choosing these molecules. This approximation was made to simplify the free energy calculations and reduce other sources of error such as sampling or dihedral angle parameterisation that may complicate the assessment of the performance of each charge set. Thus most of the molecules included contain no significant conformational degrees of freedom, apart from those associated with hydrogens on methyl groups. Of those molecules with conformational degrees of freedom, methyl acetate, acetic acid, phenol and aniline were included to ensure that all chemical functionalities were represented. In addition, acetamide and trans-N-methyl acetamide were used as the barrier to rotation about the amide C-N bond is large enough to justify rigidity. Benzonitrile was excluded because it lacked an experimental free energy of hydration, while formamide and formaldehyde were retained due to their usefulness as intermediates in the free energy calculations. Standard OPLS Lennard-Jones parameters were used [25]. The atomic charges used were those developed in the previous chapter, namely EPD/6-31G*, REPD/6-31G*, EPD/6-31+G*, and REPD/6-31+G*. The 6-31G* and 6-31+G* optimised geometries, matched with the particular basis set used to derive the charges, were adopted for the simulations. The simulations were performed in TIP4P water [101].

5.1.2 Selection of Mutations.

To calculate the free energy of hydration for a rigid molecule, the molecule is mutated from itself to a non-interacting particle in aqueous solution, as described in Subsection 2.3.5. The FEP method was used in this work to calculate this free energy change. As noted earlier in Subsection 2.2.2, the FEP equation (Eq. 2.18) only converges in practice if the two states are very similar. With this in mind, the mutation to nothing was divided into a number of smaller stages. Firstly, molecules were mutated to their next-simplest, most similar molecule, as shown in the mutation tree in Figure 5.1. Where possible, perturbation pathways were chosen to minimise changes in the number and position of atoms. The typical change in structure involved in a perturbation may be seen for the mutation from methyl acetate to acetone, given in Figure 5.2. By calculating the free energy changes for all the stages in the tree, the absolute free energy for any molecule was then calculated by summing the components. In this work, all molecules were eventually mutated to OPLS methane, OPLS benzene or TIP4P water, for which the decoupling free energies have already been determined.149–151 Therefore, only the easier relative free energy calculations had to be performed. To further increase the similarity between end states of the mutation, each of these mutations between molecules was subdivided further into a number of windows defined by the coupling parameter, λ, which varies from 0 to 1.

[Figure 5.1: mutation tree connecting methane, acetaldehyde, formaldehyde, acetone, methyl acetate, ethene, dimethyl ether, dimethyl sulfide, OPLS methane, ammonia, TIP4P water, TIP3P water, water, methanethiol, acetic acid, chloromethane, formamide, methanol, trans-N-methyl acetamide, acetamide, aniline, methylamine, benzene, OPLS benzene, chlorobenzene and phenol to nothing.]

Figure 5.1: Free energy tree showing the mutations performed to calculate the free energies of hydration. Mutations for the grey lines were performed elsewhere.149–151

[Figure 5.2: structures of methyl acetate and acetone, with the disappearing atom shown as a dummy atom (Du).]

Figure 5.2: The change in geometry mutating from methyl acetate to acetone.
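The summation along the mutation tree can be sketched as follows. This is a minimal illustration, not the scripts used in this work: the stage values, the reference value and the path below are placeholders chosen for the example, and each stage is assumed to be stored as the hydration free energy of the starting molecule relative to the end molecule.

```python
# Hypothetical relative hydration free energies (kJ/mol) for single stages.
# Keys are (start, end); values are dG_hyd(start) - dG_hyd(end).
# These numbers are placeholders, not results from the thesis.
STAGES = {
    ("methyl acetate", "acetone"): 2.0,
    ("acetone", "acetaldehyde"): -1.5,
    ("acetaldehyde", "OPLS methane"): -20.0,
}

# Hydration free energy of the reference end point (placeholder value).
REFERENCE = {"OPLS methane": 9.0}


def absolute_hydration(molecule, stages, reference):
    """Sum stage free energies from `molecule` down to a reference end point."""
    # Assumes every non-reference molecule has exactly one outgoing stage.
    next_stage = {start: (end, dg) for (start, end), dg in stages.items()}
    total = 0.0
    current = molecule
    while current not in reference:
        current, dg = next_stage[current]
        total += dg
    return total + reference[current]


print(absolute_hydration("methyl acetate", STAGES, REFERENCE))  # -10.5
```

Storing the tree as a dictionary of edges keeps each relative free energy in one place, so a stage that is rerun with more windows updates every molecule above it in the tree.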

5.1.3

Simulation Protocol.

Free energy simulations were performed using BOSS 3.6.31 Each solute molecule was placed in a cubic box of side 27 Å containing 648 equilibrated water molecules. To ensure faster equilibration, the 7–10 water molecules with the highest interaction energy with the solute were discarded, the number depending on the solute size. Equilibrium configurations were generated in the NPT ensemble at 25 °C and 1 atm using the MC Metropolis algorithm with quadratic feathering to zero over the last 0.5 Å. At each window the system was equilibrated for 3 million (M) configurations followed by 5 M configurations of data collection. As a test for the convergence of the free energies, some of the mutations involving the more polar molecules were extended to 10 M configurations of data collection. Solute moves comprised 1% and volume moves 0.04% of all attempted configurations. Maximum move sizes for solute translations and rotations were selected to be between 0.1–0.3 Å and 10–30°, respectively, to give an acceptance probability of approximately 40%. The maximum volume move sizes were set to 320 Å³. No internal solute moves were attempted owing to the rigid molecule approximation.

The simulations run at each value of λ used exactly the same starting geometry, since it was assumed that the equilibration was adequate. This allowed simulations of all windows to be run simultaneously. Propagating the coordinates from the end of one window to the start of the next can be used to save on equilibration costs. However, it does mean that all windows must run in sequence rather than in parallel, dramatically increasing the length of time for a single calculation.

At each window the free energy change was calculated for mutating to both the next and previous windows using Eq. 2.18. The free energy change was taken as the average of the forward and reverse free energies. Any non-zero difference between the forward and reverse free energies provided by double-ended sampling allows the convergence to be monitored. Such a difference is commonly referred to as the hysteresis. Errors for each window were calculated as half the hysteresis between the forward and reverse free energy changes. Accumulation of the errors gave the total error for the overall perturbation.

The mutations between molecules as given in Figure 5.1 were carried out using the REPD/6-31G* charge set. Geometries and non-bonded parameters were scaled linearly with λ between the initial and final molecules. Atoms that disappeared were mutated to dummy atoms and had their bond lengths reduced to 0.2 Å. If no atoms were destroyed, six windows were used, spaced at λ = 0.0, 0.2, 0.4, 0.6, 0.8 and 1.0. However, additional windows were used in other cases in which significant hysteresis of 1 kJ mol⁻¹ was evident for a given window, particularly where methyl groups were mutated to hydrogens and where polar groups were adjusted. The mutations for trans-N-methyl acetamide and methanol were broken up into two completely separate simulations, in which first the geometry and then the non-bonded parameters were perturbed. The free energies for the molecules parameterised with the remaining charge sets were calculated by mutating their charges to their respective REPD/6-31G* charges in three windows of λ = 0.0, 0.5 and 1.0. Such mutations typically converged very quickly.
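The double-ended estimate and the hysteresis-based error described above can be sketched as follows. This is an illustration only and not the BOSS implementation: the exponential-average formula is the standard FEP expression assumed to correspond to Eq. 2.18, the function names are invented for the example, and the linear summation of window errors is one possible reading of "accumulation".

```python
import math

KT = 2.479  # kJ/mol at 25 °C (gas constant times 298.15 K)


def zwanzig(delta_u, kt=KT):
    """FEP free energy from perturbation energies sampled in the reference state."""
    boltzmann = [math.exp(-du / kt) for du in delta_u]
    return -kt * math.log(sum(boltzmann) / len(boltzmann))


def interval_estimate(du_forward, du_reverse, kt=KT):
    """Free energy and error for one interval between adjacent windows.

    du_forward: U(next) - U(current), sampled at the current window.
    du_reverse: U(current) - U(next), sampled at the next window.
    """
    dg_forward = zwanzig(du_forward, kt)
    dg_reverse = -zwanzig(du_reverse, kt)  # expressed in the forward direction
    hysteresis = abs(dg_forward - dg_reverse)
    return 0.5 * (dg_forward + dg_reverse), 0.5 * hysteresis


def total_free_energy(intervals, kt=KT):
    """Sum interval free energies; errors are added linearly (an assumption)."""
    estimates = [interval_estimate(fwd, rev, kt) for fwd, rev in intervals]
    return sum(e[0] for e in estimates), sum(e[1] for e in estimates)
```

For example, `interval_estimate([1.0, 1.0], [-1.2, -1.2])` combines a forward estimate of 1.0 kJ/mol with a reverse estimate of 1.2 kJ/mol, giving 1.1 kJ/mol with an error of 0.1 kJ/mol.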

5.1.4

Results.

The free energies of hydration calculated by FEP are presented with simulation errors in Table 5.1 for all 22 molecules for each of the four charge sets, together with the experimental values.92–95 The average unsigned error with respect to experiment for the 20 molecules with experimental data, given at the bottom of the table, indicates the overall performance for that charge set. To further clarify the comparison with experiment, Table 5.2 contains the errors with respect to experiment for each molecule and charge set. The average signed error is given at the bottom of this table and indicates whether the free energies are on average too positive or too negative compared with experiment. The longer simulations consisting of 10 M configurations of data collection gave smaller random errors, as expected. Also, the calculated free energies from these longer runs were identical to those obtained using 5 M configurations of data collection to within error. For example, the acetamide to methanol mutation changed from 18.3±3.3 to 20.1±2.4 kJ mol⁻¹, the methanol to methane mutation changed from 24.8±0.8 to 25.6±0.8 kJ mol⁻¹, and the acetic acid to methanol mutation was unchanged at 11.3 kJ mol⁻¹, the error decreasing from 2.4 to 1.5 kJ mol⁻¹. Thus 5 M configurations of data collection was sufficient for the performance of all charge sets to be determined.

Table 5.1: Calculated Free Energies of Hydration (kJ mol⁻¹) Using FEP versus Experiment92–95 for All Four Charge Sets.

Molecule                    ΔG_expt   ΔG EPD/      ΔG REPD/     ΔG EPD/      ΔG REPD/
                                      6-31G*       6-31G*       6-31+G*      6-31+G*
methane                       8.4       9.0±0.8      9.1±0.8      9.0±0.8      9.1±0.8
ethene                        5.3       6.0±1.0      6.4±1.0      4.9±1.0      5.6±1.0
water                       -26.4     -22.0±1.3    -20.7±1.3    -26.4±1.4    -24.7±1.3
methanol                    -21.4     -18.3±1.2    -16.5±1.2    -22.8±1.4    -19.8±1.2
methanethiol                 -5.2      -1.9±1.2     -1.7±1.2     -2.5±1.2     -2.2±1.2
acetaldehyde                -14.7     -14.2±1.7    -12.9±1.7    -18.6±1.7    -16.6±1.7
acetone                     -16.1     -14.5±1.8    -12.8±1.8    -19.2±1.9    -17.0±1.8
acetic acid                 -28.1     -31.8±2.7    -27.8±2.7    -36.0±2.8    -30.0±2.7
methyl acetate              -13.9     -18.9±2.0    -15.9±1.9    -22.4±2.1    -18.0±1.9
ammonia                     -18.0     -10.7±1.6    -12.7±1.6    -15.3±1.6    -12.3±1.6
methylamine                 -19.1     -19.7±1.4    -16.2±1.3    -22.2±1.3    -17.2±1.3
acetamide                   -40.6     -43.1±3.6    -36.6±3.5    -49.8±3.6    -40.9±3.5
trans-N-methyl acetamide    -42.2     -38.8±3.8    -35.4±3.8    -43.8±3.9    -39.5±3.8
dimethyl ether               -7.9      -3.0±1.4     -2.9±1.4     -4.5±1.4     -4.7±1.4
dimethyl sulfide             -6.4       1.6±1.7      1.6±1.7      1.3±1.7      1.4±1.7
chloroethane                 -2.6       1.7±2.2      1.7±2.2      1.7±2.2      1.7±2.2
benzene                      -3.6      -7.8±1.7     -7.6±1.7    -11.5±1.7    -10.8±1.7
phenol                      -27.7     -27.0±2.0    -23.8±2.0    -30.1±2.0    -25.2±2.0
aniline                     -20.5     -26.3±1.8    -19.5±1.7    -35.9±1.8    -19.4±1.7
chlorobenzene                -4.7      -9.7±1.9     -9.5±1.9     -9.7±1.9     -9.4±1.9
formaldehyde                          -11.3±1.3    -10.6±1.3    -14.3±1.3    -13.5±1.3
formamide                             -40.4±2.0    -34.6±1.9    -46.6±2.0    -38.2±2.2
Average unsigned error(a)               3.5          3.7          4.6          2.9

(a) With respect to experiment.


Table 5.2: Errors in ΔG_hyd (kJ mol⁻¹) with Respect to Experiment for All 20 Molecules using All Four Charge Sets.

Molecule                    Error EPD/   Error REPD/   Error EPD/   Error REPD/
                            6-31G*       6-31G*        6-31+G*      6-31+G*
methane                       0.6          0.7           0.6          0.7
ethene                        0.7          1.1          -0.4          0.3
water                         4.4          5.7           0.0          1.7
methanol                      3.1          4.9          -1.4          1.6
methanethiol                  3.3          3.5           2.7          3.0
acetaldehyde                  0.5          1.8          -3.9         -1.9
acetone                       1.6          3.3          -3.1         -0.9
acetic acid                  -3.7          0.3          -7.9         -2.1
methyl acetate               -5.0         -2.0          -8.5         -4.1
ammonia                       7.3          5.3           2.7          5.7
methylamine                  -0.6          2.9          -3.1          1.9
acetamide                    -2.5          4.0          -9.2         -0.3
trans-N-methyl acetamide      3.4          6.8           1.6          2.7
dimethyl ether                4.9          5.0           3.4          3.2
dimethyl sulfide              8.0          8.0           7.7          7.8
chloroethane                  4.3          4.3           4.3          4.3
benzene                      -4.2         -4.0          -7.9         -7.2
phenol                        0.7          3.9          -2.4          2.5
aniline                      -5.8          1.0         -15.4          1.1
chlorobenzene                -5.0         -4.8          -5.0         -4.7
Average Signed Error          0.8          2.6          -2.4          0.8
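As a check on how the summary statistics are defined, the average unsigned and signed errors can be recomputed from the REPD/6-31+G* column of Table 5.2. The short sketch below uses only the tabulated errors; the function names are chosen for this example.

```python
# Errors (kJ/mol) for the REPD/6-31+G* charge set, copied from Table 5.2
# in the order methane ... chlorobenzene.
ERRORS_REPD_PLUS = [
    0.7, 0.3, 1.7, 1.6, 3.0, -1.9, -0.9, -2.1, -4.1, 5.7,
    1.9, -0.3, 2.7, 3.2, 7.8, 4.3, -7.2, 2.5, 1.1, -4.7,
]


def average_unsigned_error(errors):
    """Mean of the absolute errors: overall accuracy of a charge set."""
    return sum(abs(e) for e in errors) / len(errors)


def average_signed_error(errors):
    """Mean of the signed errors: positive means too hydrophobic on average."""
    return sum(errors) / len(errors)


print(round(average_unsigned_error(ERRORS_REPD_PLUS), 1))  # 2.9
print(round(average_signed_error(ERRORS_REPD_PLUS), 1))    # 0.8
```

Both values agree with the entries at the bottom of Tables 5.1 and 5.2 for this charge set.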

5.1.5

Eect of Restraint, Basis Set and Geometry.

From Tables 5.1 and 5.2 it can be seen that the REPD/6-31+G* charges perform the best, with an average absolute error of only 2.9 kJ mol⁻¹. Figure 5.3 gives a clear indication of the good correlation between REPD/6-31+G* and experimental free energies of hydration. The slope of the line of best fit is 0.98 and the correlation coefficient is 0.97. The slope and correlation data for the other three charge sets may be found in Table 5.3. The commonly used EPD/6-31G* results are the next most reliable, with an average absolute error of 3.5 kJ mol⁻¹. The REPD/6-31G* and EPD/6-31+G* charge sets perform less satisfactorily, the former being too hydrophobic and the latter too hydrophilic. Evidently, application of the restraint makes the free energies of hydration less negative, while changing from the 6-31G* to the 6-31+G* basis set makes them more negative. For the REPD/6-31+G* charge set, these effects largely cancel, producing free energies with a similar average signed error to EPD/6-31G* but with an improved average absolute error.

[Figure 5.3: scatter plot of ΔG_hyd(REPD/6-31+G*) / kJ mol⁻¹ against ΔG_hyd(Experimental) / kJ mol⁻¹; m = 0.98, r = 0.97.]

Figure 5.3: FEP REPD/6-31+G* free energies of hydration versus experiment for the molecules listed in Table 5.1. The dashed line is the line of best fit while the solid line has unit slope.

Table 5.3: Slopes, m, and Correlation Coefficients, r, for ΔG_hyd of All Four Charge Sets versus Experiment.

Charge Set       m      r
EPD/6-31G*       1.02   0.96
REPD/6-31G*      0.91   0.97
EPD/6-31+G*      1.15   0.95
REPD/6-31+G*     0.98   0.97

Much of the observed free energy behaviour may be understood in terms of the molecules' dipole moments; these strongly influence the water structure surrounding the solute molecule. Table 5.4 contains the dipole moments for all 22 molecules with all four charge sets. The general trends are as follows. The restraint slightly decreases the dipole moment of 6-31G* charges, by 1.3% on average. This would be expected to have only a small effect on ΔG_hyd but, as Table 5.2 indicates, the average free energy of hydration becomes more positive by 1.8 kJ mol⁻¹. Switching to the 6-31+G* basis set increases the dipole moment by an average of 4.0% for EPD charges and by a similar value for REPD charges. By combining the 6-31+G* basis set with the restraint, these two effects cancel to some extent, giving slightly more polarised molecules with OPLS-like charges which reproduce experimental free energies of hydration well.

Table 5.4: Dipole Moments (D) for All 22 Molecules from All Four Charge Sets.

Molecule                    EPD/      REPD/     EPD/      REPD/
                            6-31G*    6-31G*    6-31+G*   6-31+G*
methane                     0.00      0.00      0.00      0.00
ethene                      0.00      0.00      0.00      0.00
water                       2.25      2.22      2.35      2.30
methanol                    2.15      2.10      2.27      2.20
methanethiol                2.07      2.06      2.12      2.10
acetaldehyde                3.05      3.02      3.30      3.26
acetone                     3.17      3.16      3.42      3.41
acetic acid                 1.72      1.71      1.86      1.85
methyl acetate              1.95      1.97      2.07      2.12
ammonia                     1.96      1.90      1.94      1.84
methylamine                 1.82      1.72      1.86      1.72
acetamide                   4.04      4.00      4.30      4.26
trans-N-methyl acetamide    4.09      4.09      4.29      4.30
dimethyl ether              1.65      1.64      1.72      1.72
dimethyl sulfide            1.99      2.00      2.04      2.04
chloroethane                2.48      2.47      2.53      2.53
benzene                     0.00      0.00      0.00      0.00
phenol                      1.95      1.85      2.01      1.88
aniline                     1.60      1.57      1.58      1.58
chlorobenzene               2.24      2.23      2.21      2.17
formaldehyde                2.68      2.66      2.87      2.87
formamide                   4.08      4.03      4.28      4.22

The results presented are an improvement on those of Carlson et al.,89 who calculated an average error of 4.4 kJ mol⁻¹ for the 13 molecules used in their calculations. In that study, EPD/6-31G* charges were used with OPLS rather than ab initio geometries. To allow a more valid comparison, the 9 molecules used in both this study and the work of Carlson et al. were examined. In this work, the average error using the 9 common molecules was 3.2 kJ mol⁻¹ for EPD/6-31G* and 2.7 kJ mol⁻¹ for REPD/6-31+G*, an improvement on the Carlson et al. result of 4.9 kJ mol⁻¹. While the performance of REPD charges would be expected to be different, there are a number of possible reasons for this difference in EPD charges. These include the use of different geometries, point selection schemes and FEP simulation protocols. As an example of the importance of geometry, 6-31G* optimised acetamide with EPD/6-31G* charges was mutated to the OPLS geometry with EPD/6-31G* charges rederived for that geometry. The free energy for this mutation was calculated to be 6 kJ mol⁻¹. Part of this may be attributed to the increase in dipole moment from 4.04 D to 4.38 D for the OPLS geometry. However, this analysis has only been performed using a small number of molecules. A few poorly reproduced molecules can strongly affect the overall average error. More molecules would need to be considered and a more systematic investigation undertaken to ascertain whether the difference between these two studies is significant.

5.1.6

Particular Discrepancies with Experiment.

For a number of molecules studied here, the calculated free energies of hydration are rather inaccurate, irrespective of charge set. REPD/6-31+G* molecules with an error greater than 4 kJ mol⁻¹ are methyl acetate, benzene and chlorobenzene, which are too hydrophilic, and ammonia, dimethyl sulfide and chloroethane, which are too hydrophobic. The charge parameterisation seems to be wanting for third row atoms such as sulfur and chlorine. All of the charges listed in the previous chapter for sulfur and chlorine deviate significantly from OPLS values, but there is no consistent trend between ΔG_hyd and charge. This does raise concerns with the application of such charges to the macrobicycle 12 system. However, as discussed in Section 3.2, this may be partially put down to 6-31+G* geometries being inadequate. The rigid molecule approximation may be responsible for the discrepancy for a number of larger molecules such as methyl acetate.

Benzene is an especially interesting case. Owing to symmetry, only one parameter is necessary to describe its electrostatic properties within the point charge approximation, making this number particularly critical. A trend is apparent on comparing the benzene free energies with charges. The EPD/6-31G* value in Table 5.1, the OPLS value150 and the value of Carlson et al.89 are, respectively, -7.8 kJ mol⁻¹, -3.8 kJ mol⁻¹ and -1.7 kJ mol⁻¹, while the corresponding charges are -0.133, -0.115 and -0.103; it is clear that benzene's free energy of hydration scales rather sensitively with its carbon charge. A difference of 0.016 in charge between EPD/6-31G* and EPD/6-31+G* benzene results in a large change in free energy of 3.7 kJ mol⁻¹. This sensitivity is especially important given the strong dependence of benzene's EPD charges on point selection, as discussed in Subsection 4.2.3. Fortunately, for molecules with a lower symmetry than benzene, such as aniline and phenol, the agreement with experiment improves.

The disagreement for ammonia is perplexing given the success for water; it may conceivably be related to geometry, the accuracy of which is especially important for small, polar molecules. The above-mentioned discrepancies may also be due to the Lennard-Jones parameters and even the form of the force field itself. If these problem molecules are removed, the average error for REPD/6-31+G* molecules reduces to only 1.7 kJ mol⁻¹, significantly better than the corresponding error of 2.6 kJ mol⁻¹ for EPD/6-31G*.

It has been noted by a number of workers that free energy calculations fail to reproduce the increase in hydrophilicity observed experimentally for acetamide and ammonia when a hydrogen on the nitrogen is replaced by a methyl group.88, 90 For acetamide and trans-N-methyl acetamide, the results of this work are no different. However, for the REPD/6-31+G* charge set, the relative free energy difference was calculated to be incorrect by only 3.0±2.8 kJ mol⁻¹, an improvement on previous studies and arguably within the limits of simulation error. In contrast to previous work, methylamine is found to be more hydrophilic than ammonia, although this result is probably aided by the previously discussed problems for ammonia. Assumptions and approximations inherent in all force fields will inevitably work better for some molecules than others. Nevertheless, the main objective of having a general method that works well for as many molecules as possible has been achieved.


5.2

LIE Free Energies of Hydration.

5.2.1

Form of the LIE Equation.

The Linear Interaction Energy (LIE) method is a fast free energy method that was originally proposed by Åqvist et al.65 for the calculation of binding free energies for protein–ligand systems. This method has the advantage of only requiring simulations at the end points of a mutation, but suffers from the disadvantage of being an approximate, empirical method requiring parameterisation to known free energy data, obtained either from experiment or simulation. In the original formulation,65 binding free energies were assumed to depend on two terms. These were the differences in van der Waals and electrostatic ligand-surrounding energies between simulations of a free ligand in solution and a ligand bound to a solvated protein. Each of these was obtained by averaging the individual values over all equilibrium configurations. The dependence was assumed to be linear, with α and β the respective coefficients, as in the equation

$$\Delta G_{\mathrm{bind}} = \alpha \langle \Delta U_{\mathrm{vdw}} \rangle + \beta \langle \Delta U_{\mathrm{elec}} \rangle \qquad (5.1)$$

β was set to 1/2 in that work, drawing on the Born model for the free energy of hydration of ions. Hence only one parameter, α, was adjusted in fitting to experimental data. Its value for their system was 0.161. The methodology has subsequently been applied to the calculation of free energies of solvation.147, 148 In its application to such calculations, Carlson and Jorgensen147 firstly found that the model would reproduce experiment much better if β was allowed to vary. Furthermore, they studied three additional possible contributions to ΔG_bind. The inclusion of an additional term was deemed necessary because both existing terms were negative and a positive contribution was necessary to account for molecules like methane with a positive free energy of hydration. It was physically rationalised as a water cavitation term. Possible candidates they considered were the molecular surface area, molecular volume and solvent accessible surface area (SASA). SASA is the combined area of all atoms with their radii augmented by a solvent probe radius. In water, this is conventionally set to 1.4 Å. The inclusion of each of these additional terms was able to improve the reproduction of experiment similarly, yet they elected to adopt SASA as the third fitting term since its performance was marginally better. The LIE equation in their work became

$$\Delta G_{\mathrm{hyd}} = \alpha U_{\mathrm{vdw}} + \beta U_{\mathrm{elec}} + \gamma\,(\mathrm{SASA}) \qquad (5.2)$$

where U_elec and U_vdw are the absolute electrostatic and van der Waals energies, respectively, between the solute and water. The α, β and γ parameters Carlson and Jorgensen derived were 0.348, 0.444 and 0.023 fitting to FEP free energies, and 0.489, 0.421 and 0.020 fitting to experiment. Subsequent to this, Åqvist and Hansson clarified their work, saying that a value of 0.5 for β was valid for ionic systems but a value of 0.4 was more appropriate for neutral dipolar molecules.152

5.2.2

LIE Protocol.

Eq. 5.2 was the form of the LIE equation tested in this work. All the necessary data came from the end windows of the previous FEP computer simulations for 22 small molecules. Since the solutes were held rigid in these calculations, SASA was constant for each solute over the simulation. The parameters α, β and γ were optimised using a subplex algorithm153 to minimise the absolute difference between Eq. 5.2 and either FEP or experimental free energies, summed over all molecules in the fitting set. Separate fitting was performed for the molecules in each of the four charge sets. A combined fit using all molecules with all charge sets was also tested. The simulation protocol was identical to that applied in the FEP calculations. U_elec and U_vdw were calculated from the same 5 M configurations used for the first window of that molecule's FEP simulation. SASAs were calculated using Macromodel.154 Formaldehyde and formamide were excluded from the fits to experiment since the experimental data were not available, but were included in the fit to FEP.
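The fitting step can be illustrated with a small, self-contained sketch. It is not the procedure used in this work: ordinary least squares solved through the normal equations is substituted here for the subplex minimisation of summed absolute differences, simply because it needs no external optimiser. The descriptor rows are the EPD/6-31G* entries of Table 5.5 for methane, water, methanol, acetone and benzene, while the target values are synthetic, generated from assumed parameters so that the recovery of those parameters can be checked.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_lie(descriptors, targets):
    """Least-squares (alpha, beta, gamma) for rows of (U_vdw, U_elec, SASA)."""
    k = len(descriptors[0])
    ata = [[sum(d[i] * d[j] for d in descriptors) for j in range(k)]
           for i in range(k)]
    atb = [sum(d[i] * t for d, t in zip(descriptors, targets)) for i in range(k)]
    return solve_linear(ata, atb)


# (U_vdw, U_elec, SASA) from Table 5.5, EPD/6-31G* columns.
ROWS = [
    (-14.3, -0.2, 143.0),   # methane
    (10.6, -85.1, 114.0),   # water
    (-7.0, -67.9, 160.0),   # methanol
    (-26.6, -59.5, 219.0),  # acetone
    (-38.7, -27.9, 245.0),  # benzene
]
TRUE_PARAMS = (0.3, 0.5, 0.1)  # assumed values used to build synthetic targets
TARGETS = [sum(p * d for p, d in zip(TRUE_PARAMS, row)) for row in ROWS]

alpha, beta, gamma = fit_lie(ROWS, TARGETS)
```

With noise-free synthetic targets the fit returns the assumed parameters; with real FEP or experimental free energies a least-squares fit would generally give somewhat different parameters from the absolute-difference criterion described above.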

Table 5.5: van der Waals and Electrostatic Solute-Solvent Energies (kJ mol⁻¹) and Solvent Accessible Surface Areas (Å²) for EPD/6-31G* and REPD/6-31+G* Molecules.

                                  EPD/6-31G*                REPD/6-31+G*
Molecule                    U_vdw    U_elec   SASA     U_vdw    U_elec   SASA
methane                     -14.3     -0.2    143      -13.5     -0.1    144
ethene                      -17.8     -9.9    170      -16.2    -12.9    170
water                        10.6    -85.1    114       11.4    -93.2    114
methanol                     -7.0    -67.9    160       -6.1    -69.8    160
methanethiol                -16.6    -30.6    182      -17.6    -28.3    182
acetaldehyde                -19.3    -50.5    189      -17.2    -59.3    189
acetone                     -26.6    -59.5    219      -24.1    -69.4    219
acetic acid                 -14.7    -96.0    198      -16.1    -87.7    197
methyl acetate              -30.6    -60.5    234      -30.6    -62.4    243
ammonia                       2.4    -58.3    132        1.4    -55.0    132
methylamine                  -5.8    -59.3    172       -5.3    -61.8    172
acetamide                   -15.3   -111.2    204      -18.4   -101.1    204
trans-N-methyl acetamide    -29.3   -107.4    240      -32.3    -95.2    240
dimethyl ether              -23.1    -34.0    197      -20.7    -35.9    197
dimethyl sulfide            -30.2    -16.5    214      -30.5    -18.8    214
chloroethane                -28.2    -17.2    178      -28.9    -17.8    178
benzene                     -38.7    -27.9    245      -38.7    -41.4    244
phenol                      -36.7    -66.5    256      -32.7    -72.7    256
aniline                     -34.8    -74.3    265      -38.4    -61.0    265
chlorobenzene               -48.6    -17.7    243      -49.7    -17.9    243
formaldehyde                -12.9    -40.1    153      -12.2    -45.9    153
formamide                    -8.0   -111.1    171      -10.3    -97.4    171

5.2.3

Derivation of the LIE Parameters.

The energy and SASA data required for the LIE parameterisation have already been calculated for each molecule in water in the FEP Monte Carlo simulations and are presented in Table 5.5 for the EPD/6-31G* and REPD/6-31+G* charge sets. The FEP results suggested that these charge sets were the most useful. The statistical errors on U_vdw and U_elec are standard errors calculated over batch averages of 0.5 M configurations and vary from zero to approximately 2 kJ mol⁻¹ and 6 kJ mol⁻¹, respectively. Comparing the energy data obtained for each molecule, there is no discernible trend between the two charge sets and the SASAs are almost identical. Table 5.6 contains the α, β and γ parameters obtained from fitting the energy components and SASA to either experimental or FEP free energies of hydration using Eq. 5.2.

Table 5.6: The LIE Parameters, α, β and γ, for Each Charge Set Fitted Using Eq. 5.2 to Experiment and FEP.

ΔG Origin     Charge Set       α       β       γ
Experiment    EPD/6-31G*       0.303   0.467   0.078
              REPD/6-31G*      0.382   0.506   0.092
              EPD/6-31+G*      0.366   0.444   0.086
              REPD/6-31+G*     0.370   0.502   0.093
              All Molecules    0.350   0.471   0.085
FEP           EPD/6-31G*       0.678   0.525   0.137
              REPD/6-31G*      0.582   0.491   0.120
              EPD/6-31+G*      0.506   0.519   0.115
              REPD/6-31+G*     0.532   0.551   0.131
              All Molecules    0.594   0.526   0.129

The values of the LIE parameters obtained from the fitting procedure are consistent with those listed earlier for free energies of hydration.147, 148, 152 In particular, the β values are all found to lie in the vicinity of 0.5 as predicted by theory,65 while there is more variation for α and γ. Experimentally fitted values for α and γ lie around 0.35 and 0.08, while FEP-fitted values are 0.6 and 0.12, respectively. This discrepancy is discussed in the next subsection. The parameters derived by fitting to all molecules of all charge sets lie in the range spanned by those from each individual charge set. Overall, though, the parameters do vary somewhat between charge sets, particularly when fitting to FEP free energies of hydration. This may either be a real physical effect or purely statistical. However, in theory the LIE free energies are assumed to depend strictly on U_elec, U_vdw and SASA, regardless of how the charges are derived. Some coupling may be expected between the energy terms for different charge sets, but the SASA term and its associated parameter strictly should not vary if its inclusion is physically the result of solvent cavitation. That the γ parameter does vary is indicative of some correlation between SASA and the other terms. Indeed, the correlation coefficient between α and γ is found to be 0.96. Hence the LIE equation being used may be overfitting to the data.
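To make the use of the fitted parameters concrete, Eq. 5.2 can be evaluated directly from the tabulated data. The sketch below takes the EPD/6-31G* parameters fitted to experiment (Table 5.6) and the EPD/6-31G* energies and SASAs for methane and water (Table 5.5); the function name is chosen for this example.

```python
def lie_free_energy(u_vdw, u_elec, sasa, alpha, beta, gamma):
    """Eq. 5.2: alpha * U_vdw + beta * U_elec + gamma * SASA, in kJ/mol."""
    return alpha * u_vdw + beta * u_elec + gamma * sasa


# EPD/6-31G* parameters fitted to experiment (Table 5.6).
ALPHA, BETA, GAMMA = 0.303, 0.467, 0.078

# (U_vdw, U_elec, SASA) for EPD/6-31G* solutes (Table 5.5).
SOLUTES = {
    "methane": (-14.3, -0.2, 143),
    "water": (10.6, -85.1, 114),
}

for name, (u_vdw, u_elec, sasa) in SOLUTES.items():
    dg = lie_free_energy(u_vdw, u_elec, sasa, ALPHA, BETA, GAMMA)
    print(name, round(dg, 1))  # methane 6.7, water -27.6
```

These values are close to the LIE predictions of 6.8 and -27.6 kJ mol⁻¹ listed for the same molecules in Table 5.8; the small difference for methane presumably reflects rounding of the tabulated parameters and energies.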

Table 5.7: The Slopes, m, Correlation Coefficients, r, Average Errors and CV Errors for Each Charge Set Fitted Using Eq. 5.2 to Experiment and FEP.

ΔG Origin     Charge Set       m      r      Average Error   CV Error
Experiment    EPD/6-31G*       0.98   0.97   2.2             2.9
              REPD/6-31G*      0.89   0.94   3.0             3.7
              EPD/6-31+G*      0.95   0.92   3.6             4.7
              REPD/6-31+G*     0.94   0.96   3.1             4.2
              All Molecules    0.93   0.94   3.2             3.4
FEP           EPD/6-31G*       0.97   0.98   1.9             2.2
              REPD/6-31G*      0.90   0.97   2.5             3.4
              EPD/6-31+G*      0.95   0.98   2.3             2.9
              REPD/6-31+G*     1.01   0.98   1.8             2.4
              All Molecules    0.96   0.98   2.3             2.4
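The slopes and correlation coefficients reported in Table 5.7 are of the kind produced by a simple linear regression of predicted against reference free energies. A minimal sketch is given below; it assumes an ordinary least-squares line with an intercept and the Pearson correlation coefficient, which may differ in detail from the procedure actually used, and the function name is hypothetical.

```python
import math


def slope_and_correlation(reference, predicted):
    """Least-squares slope of predicted on reference, and Pearson r."""
    n = len(reference)
    mean_x = sum(reference) / n
    mean_y = sum(predicted) / n
    sxx = sum((x - mean_x) ** 2 for x in reference)
    syy = sum((y - mean_y) ** 2 for y in predicted)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(reference, predicted))
    return sxy / sxx, sxy / math.sqrt(sxx * syy)


# Toy data lying exactly on a line of slope 2: m = 2.0 and r = 1.0.
m, r = slope_and_correlation([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
```

A slope below one with a high correlation coefficient, as for REPD/6-31G*, indicates predictions that track the reference values well but span a compressed range.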

5.2.4

Performance of LIE Free Energies.

Table 5.7 gives the gradients and correlation coefficients, average unsigned errors and average cross validation (CV) errors of the predicted LIE free energies versus experiment and FEP for all four charge sets. The CV error was calculated by dividing each set of molecules into four groups at random, fitting to three of the groups and applying the resulting LIE parameters to the fourth to give an average error. Each of the four groups was examined in turn and the complete procedure repeated 100 times. The predicted and experimental free energies of hydration, together with the errors between them, are given in Table 5.8 for EPD/6-31G* and REPD/6-31+G*. The results are also presented in graphical form in Figure 5.4 for EPD/6-31G* charges.

Although the utility of the LIE method involves fitting to experimental free energies of hydration, in assessing the performance of the LIE method itself, the results of fitting to FEP values and comparing with experiment should also be examined. As FEP is formally an exact method, the ability of LIE to reproduce FEP results is a critical test of the LIE free energy methodology. It is apparent from Table 5.7 that all the LIE free energies agree well with their equivalent FEP values, to within approximately 2 kJ mol⁻¹. REPD/6-31+G* and EPD/6-31G* are in closest agreement, with mean unsigned errors of 1.8 kJ mol⁻¹ and 1.9 kJ mol⁻¹, respectively.

Table 5.8: Predicted LIE Free Energies of Hydration (kJ mol⁻¹) Fitted to Experiment Using Eq. 5.2 for the EPD/6-31G* and REPD/6-31+G* Charge Sets.

Molecule                    ΔG_expt   ΔG EPD/   Error    ΔG REPD/   Error
                                      6-31G*             6-31+G*
methane                       8.4       6.8      1.6       8.4       0.0
ethene                        5.3       3.3      2.0       3.5       1.8
water                       -26.4     -27.6      1.2     -32.0       5.6
methanol                    -21.4     -21.3     -0.1     -22.3       0.9
methanethiol                 -5.2      -5.0     -0.2      -3.7      -1.5
acetaldehyde                -14.7     -14.7      0.0     -18.5       3.8
acetone                     -16.1     -18.7      2.6     -23.2       7.1
acetic acid                 -28.1     -33.8      5.7     -31.5       3.4
methyl acetate              -13.9     -19.2      5.3     -19.9       6.0
ammonia                     -18.0     -16.1     -1.9     -14.8      -3.2
methylamine                 -19.1     -16.0     -3.1     -16.9      -2.2
acetamide                   -40.6     -40.6     -0.0     -38.5      -2.1
trans-N-methyl acetamide    -42.2     -40.2     -2.0     -37.3      -4.9
dimethyl ether               -7.9      -7.5     -0.4      -7.3      -0.6
dimethyl sulfide             -6.4      -0.1     -6.3      -0.7      -5.7
chloroethane                 -2.6      -2.6     -0.0      -3.1       0.5
benzene                      -3.6      -5.6      2.0     -12.3       8.7
phenol                      -27.7     -22.1     -5.6     -24.6      -3.1
aniline                     -20.5     -24.5      4.0     -20.0      -0.5
chlorobenzene                -4.7      -4.0     -0.7      -4.7       0.0
formaldehyde                   -      -10.7       -      -13.3        -
formamide                      -      -40.9       -      -36.8        -

The ability of LIE to reproduce experimental free energies of hydration examines the reliability not only of the free energy methodology but also of the force field. In this regard, the EPD/6-31G* charge set is the most reliable, with a mean unsigned error of 2.2 kJ mol⁻¹, followed by both REPD charge sets with mean unsigned errors of approximately 3.0 kJ mol⁻¹. For each charge set, these errors compared to experiment are larger than those obtained on fitting to FEP, as would be expected since the reliability of both the LIE relationship and the force field affect the fit. The most noteworthy comparison is that the mean unsigned errors for the LIE method fitted to experiment are at worst the same as the equivalent errors for the FEP calculations reported in Table 5.1. Only the FEP REPD/6-31+G* results are comparable with the LIE results. This suggests that the LIE parameterisation is able to reduce the errors in predicted free energies arising from inadequacies in the force field. This indeed appears to be the case for molecules such as methanethiol, ammonia, dimethyl ether and chlorobenzene. However, this is probably a result of fitting rather than any intrinsic property of the method, since for other molecules, such as acetone, acetic acid, methyl acetate and phenol, LIE fares worse than FEP.

[Figure 5.4: scatter plot of ΔG_hyd(EPD/6-31G*) / kJ mol⁻¹ against ΔG_hyd(Experimental) / kJ mol⁻¹; m = 0.98, r = 0.97.]

Figure 5.4: LIE EPD/6-31G* free energies of hydration fitted using Eq. 5.2 versus experiment for the molecules listed in Table 5.8. The dashed line is the line of best fit while the solid line has unit slope.

The CV errors presented in Table 5.7 give an indication of the sensitivity of the LIE parameters to the molecules used in the fit. This sensitivity is revealed by the degree to which the CV error is worse than the average unsigned error. For the individual charge sets, the CV errors given in Table 5.7 are larger than the average error by between 0.3 and 1.1 kJ mol⁻¹. These CV errors suggest that more molecules should be included in the fitting procedure, especially when fitting to experiment, to reduce the dependence on the molecules used in the parameterisation. Consequently, the parameterisations were repeated using all the molecules from every charge set. In this way, the influence of a few poorly reproduced molecules would be reduced, and indeed the resulting CV error of 3.4 or 2.4 kJ mol⁻¹ for experiment or FEP is now only marginally worse than the average values of 3.2 and 2.3 kJ mol⁻¹, respectively. Nevertheless, the average error is now worsened for the EPD/6-31G* and REPD/6-31+G* charge sets, so it is advisable to retain the parameters derived from each charge set. The REPD/6-31G* and EPD/6-31+G* charge sets, owing to their overall worse performance, are no longer considered for the remainder of this section.
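The cross validation procedure described in this subsection can be sketched generically as follows. The sketch is illustrative rather than a record of the code used: it pools the unsigned errors of all held-out molecules over all repeats (one reading of "average error"), takes the fitting and prediction steps as arguments, and demonstrates them with a one-parameter toy model rather than Eq. 5.2. All function names are hypothetical.

```python
import random


def cross_validation_error(rows, targets, fit, predict,
                           groups=4, repeats=100, seed=0):
    """Average unsigned error on held-out groups over repeated random splits."""
    rng = random.Random(seed)
    indices = list(range(len(rows)))
    errors = []
    for _ in range(repeats):
        rng.shuffle(indices)
        # Deal the shuffled molecules into `groups` roughly equal groups.
        folds = [indices[g::groups] for g in range(groups)]
        for held_out in folds:
            held = set(held_out)
            train = [i for i in indices if i not in held]
            params = fit([rows[i] for i in train], [targets[i] for i in train])
            errors.extend(abs(predict(params, rows[i]) - targets[i])
                          for i in held_out)
    return sum(errors) / len(errors)


# Toy one-parameter model (target = slope * x), used only for demonstration.
def fit_slope(rows, targets):
    return (sum(r[0] * t for r, t in zip(rows, targets))
            / sum(r[0] ** 2 for r in rows))


def predict_slope(slope, row):
    return slope * row[0]


TOY_ROWS = [(float(x),) for x in range(1, 9)]
TOY_TARGETS = [2.0 * r[0] for r in TOY_ROWS]
cv = cross_validation_error(TOY_ROWS, TOY_TARGETS, fit_slope, predict_slope)
```

Because the toy data are noise-free the CV error is zero; with real data the gap between the CV error and the fitted average error measures how strongly the parameters depend on the molecules chosen for the fit.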

5.2.5

Overtting to the Data.

As already indicated, the values of α and γ (the coefficients of Uvdw and SASA) obtained from the fitting procedure vary significantly between charge sets. This result is undesirable and raises the possibility that Eq. 5.2 may in fact be overfitting the data. In this regard, the strong correlation between α and γ observed in Table 5.6 and noted elsewhere[147] is of particular concern, since it suggests that Uvdw and SASA contain similar information regarding free energies. Fitting a separate parameter to each is therefore arguably overfitting.

To gauge the importance of the three molecular quantities used in Eq. 5.2 (Uvdw, Uelec and SASA), a procedure involving the randomisation of each quantity was employed. In this method, the values of a given quantity were reassigned at random among the molecules, while the remaining two quantities were left unaltered. A fit was then performed using Eq. 5.2 and the average error and correlation coefficient calculated. This procedure was repeated 100 times and the overall average error and correlation coefficient obtained. If the calculated free energies of hydration depend strongly on one of these quantities, then the fit would be expected to become extremely poor when that quantity is randomised. On the other hand, if the fit is largely unaffected, it would suggest that use of that particular quantity in the fitting equation is unjustified.

The results for fitting EPD/6-31G* and REPD/6-31+G* energy data to FEP free energies of hydration using Eq. 5.2 are presented in Table 5.9. It is evident that when Uelec is randomised, the average error increases dramatically and the correlation of predicted free energy with FEP is severely degraded. However, when Uvdw or SASA is randomised, the average error and correlation coefficient change far less.

Table 5.9: Average Errors and Correlation Coefficients Obtained when Fitting to FEP Using Eq. 5.2 with Randomised Uelec, Uvdw and SASA.

            EPD/6-31G*              REPD/6-31+G*
Variable    Average Error   r       Average Error   r
Uvdw        3.6             0.94    3.5             0.94
Uelec       9.6             0.62    8.9             0.58
SASA        2.8             0.97    3.0             0.96

These results do not suggest that the energy terms and SASA used in Eq. 5.2 have no influence on the free energies, but they clearly indicate that the electrostatic term is always by far the most significant. At best, the contributions to the total free energy from Uvdw and SASA may simply not be properly resolved. This would be particularly true when there is more noise in the fit, as is the case, for example, when a smaller number of molecules or shorter simulations are used. At worst, these two terms have little if any systematic effect on total free energies of hydration.
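The randomisation procedure described above can be sketched in a few lines. This is only an illustration under stated assumptions and not the code used in this work: the data are synthetic, the helper names (`solve`, `fit_ols`, `randomisation_test`) are hypothetical, and an ordinary least-squares fit of the three-term function without a constant is assumed as the fitting step.

```python
import random

def solve(a, b):
    """Solve a small linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit_ols(columns, y):
    """Least-squares coefficients for y ~ sum_k coef_k * columns[k] (no constant term)."""
    k = len(columns)
    ata = [[sum(p * q for p, q in zip(columns[i], columns[j])) for j in range(k)]
           for i in range(k)]
    aty = [sum(p * q for p, q in zip(columns[i], y)) for i in range(k)]
    return solve(ata, aty)

def mean_abs_error(columns, y, coefs):
    """Average unsigned error of the fitted function over all molecules."""
    pred = [sum(c * col[i] for c, col in zip(coefs, columns)) for i in range(len(y))]
    return sum(abs(p - t) for p, t in zip(pred, y)) / len(y)

def randomisation_test(columns, y, which, repeats=100, seed=0):
    """Average fit error after reassigning descriptor `which` at random among the molecules."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(repeats):
        shuffled = [col[:] for col in columns]
        rng.shuffle(shuffled[which])          # only this descriptor is randomised
        total += mean_abs_error(shuffled, y, fit_ols(shuffled, y))
    return total / repeats

# Synthetic data for 22 hypothetical molecules; the target depends mainly on u_elec.
rng = random.Random(1)
u_vdw = [rng.uniform(-40.0, -10.0) for _ in range(22)]
u_elec = [rng.uniform(-80.0, 0.0) for _ in range(22)]
sasa = [rng.uniform(100.0, 300.0) for _ in range(22)]
dg = [0.45 * e + 0.03 * s + rng.gauss(0.0, 1.0) for e, s in zip(u_elec, sasa)]
columns = [u_vdw, u_elec, sasa]

baseline = mean_abs_error(columns, dg, fit_ols(columns, dg))
errors = [randomisation_test(columns, dg, k) for k in range(3)]  # order: Uvdw, Uelec, SASA
```

With data constructed this way, randomising the electrostatic column should degrade the fit far more than randomising either of the other two, which is the same qualitative signature as that reported in Table 5.9.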

5.2.6 Alternative LIE Functions.

A range of other LIE functions, including various combinations of the Uelec, Uvdw and SASA terms, were examined for fitting EPD/6-31G* energy data to FEP free energies of hydration. A similar analysis has been performed by McDonald et al. in chloroform.[148] The results given in Table 5.10 indicate that reducing the number of parameters in the LIE relationship does indeed lower the fitting ability of the function, as would be expected. However, a number of these relationships perform almost as well as Eq. 5.2, but with fewer parameters, fewer energy terms, or indeed both.

Table 5.10: The LIE Parameters, Average Errors and CV Errors Fitting to FEP Using Each Function for the EPD/6-31G* Charge Set.
Function Uvdw + Uelec + (SASA) 0.678 0.525 Uelec + (SASA) + 1.471 0.424 Uvdw + Uelec + 0.202 0.448 Uvdw + Uelec -0.042 0.337 Uelec + (SASA) 0.426 Uelec + 0.449 Uelec 0.331 0.137 0.030 1.51 14.0 0.039 9.10 error 2.2 3.7 2.1 4.6 3.9 2.8 4.7 CV error 2.9 4.1 3.1 5.8 4.6 3.1 5.3

Two equations of particular interest are the following:

\[ \Delta G = \alpha U_{\mathrm{vdw}} + \beta U_{\mathrm{elec}} + \mathrm{constant} \qquad (5.3) \]

\[ \Delta G = \beta U_{\mathrm{elec}} + \mathrm{constant} \qquad (5.4) \]

The SASA term, usually a positive term, was originally included to allow the possibility of positive free energies of hydration.[147] Eqs. 5.3 and 5.4 also allow for this possibility by virtue of their constant terms being positive in each case. In Eq. 5.3, the SASA term has been replaced by a single constant, and the resulting fit is actually 0.1 kJ mol⁻¹ better than that observed for Eq. 5.2. Given that the statistical errors on Uvdw and Uelec (data not shown) are generally larger than this, the question must be asked as to whether the inclusion of a SASA term is statistically significant. Eq. 5.4 is an interesting two-parameter equation involving only one variable, Uelec. The increase in error of 0.6 kJ mol⁻¹ is of the same order of magnitude as the statistical errors in Uelec.

Finally, fitting Eq. 5.4 to FEP for all four charge sets in turn yields values of the Uelec coefficient that range from 0.418 to 0.449 and constants that range from 7.4 to 9.1 kJ mol⁻¹. Since fitting to FEP free energies of hydration should remove any systematic influence arising from the different nonbonded parameters, one would expect the LIE parameters obtained for each charge set to be virtually identical. Even though fewer parameters are used, they still vary somewhat with charge set, although the variation is smaller than that observed in the parameters derived using Eq. 5.2. A physical interpretation of Eq. 5.4 is that the constant corresponds to some averaged free energy of hydration for a solute with charge parameters of zero. Indeed, the free energies of hydration of methane, ethane and propane are all approximately 8 kJ mol⁻¹,[155] close to the constants obtained on fitting to FEP. Despite these preliminary results, a detailed statistical analysis of the LIE method was clearly necessary, and this work is presented in the next section.
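As an illustration of how a two-parameter function of the form of Eq. 5.4 can be fitted, the sketch below uses the closed-form least-squares solution for a straight line. The energies are made-up, noise-free values chosen so that the known slope and constant are recovered; the function names are hypothetical and plain least squares is an assumption about the fitting step.

```python
def fit_line(x, y):
    """Closed-form least-squares slope and intercept for y = slope * x + intercept."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def mean_unsigned_error(x, y, slope, intercept):
    """Average absolute deviation between fitted and target values."""
    return sum(abs(slope * xi + intercept - yi) for xi, yi in zip(x, y)) / len(x)

# Hypothetical solute-solvent electrostatic energies and target free energies (kJ/mol).
u_elec = [-5.0, -20.0, -35.0, -50.0, -65.0]
dg_target = [0.43 * u + 8.0 for u in u_elec]   # noise-free by construction

slope, intercept = fit_line(u_elec, dg_target)
error = mean_unsigned_error(u_elec, dg_target, slope, intercept)
```

A positive intercept is what allows such a function to return a positive free energy of hydration for a solute whose electrostatic interaction with the solvent is close to zero.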


5.3 Analysis of the LIE Method.

5.3.1 Motivation for the Analysis.

The work in this section, which assesses the validity of the terms in the LIE equation as applied to free energies of hydration, was performed by Wall and Essex, who were also applying the LIE method to the prediction of free energies of binding for inhibitors of the enzyme neuraminidase and undertook a similar analysis for their system.[156] In their work, because of the previously discussed problems with the LIE method, an investigation was carried out into the most appropriate equation for the calculation of free energies of binding to neuraminidase. This study revealed several important factors that must be considered when carrying out such an analysis. Firstly, it highlighted the fact that the widely used Multiple Linear Regression (MLR) method[157] for assessing the significance of variables was not appropriate for the data set used, owing to intrinsic cross correlations within the data.[157] To overcome these correlations, generalised biased regression methods based on orthogonalised variables were applied by implementing the continuum regression (CR) method.[158-160] Secondly, since the aim of LIE equations is ultimately to predict unknown energies, identification of the most appropriate model should be based on predictive ability and not on the quality of fit to the current data set.

Therefore the purpose of the current study became twofold: firstly, to investigate whether the same issues were important for free energy of solvation calculations and, secondly, to identify the most important variables and a valid fitting equation for such calculations. By having a range of charge sets to examine, agreement between different sets would reinforce any subsequent conclusion, while disagreement would indicate possible problems.

5.3.2 Correlation Analysis.

In subsequent discussions the charge sets will be referred to using the following notation: U denotes the unrestrained EPD method and R the REPD method. P indicates that the 6-31+G* basis set, which includes diffuse functions, was used; otherwise the 6-31G* basis set is used. E and S represent fits to experiment and FEP simulation, respectively. For example, UE is the data set where the charges are obtained by applying the EPD method at the 6-31G* level and the fit is to experimental free energies of hydration. Conversely, RPS is the set where charges obtained by the REPD method at the 6-31+G* level are fitted to free energies obtained by FEP simulations.

The first step in the investigation was to perform a correlation analysis to detect any correlations in the energy and SASA data. The number of molecules in the charge sets is 20-22. Therefore, if the correlation coefficient is greater than 0.42 or less than −0.42, the correlation between the two variables is significant at the 95% confidence level. Table 5.11 gives the correlation matrix for a typical data set, RE, and shows two significant correlations, the first between ΔG and Uelec and the second between Uvdw and SASA.

Table 5.11: Correlation Analysis of Energy Components for Data Set RE.

          ΔG       Uvdw     Uelec    SASA
ΔG        1       -0.313   -0.918   -0.112
Uvdw               1       -0.017    0.926
Uelec                       1       -0.183
SASA                                 1

Both these correlations are important, but for different reasons. The first supports the idea that ΔG is strongly dependent on Uelec, while the second suggests that Uvdw and SASA contain similar information and that one of them may be redundant in describing ΔG. The data also suggest that there is a minor correlation between ΔG and Uvdw, and little correlation at all between SASA and ΔG. Similar values for these correlation coefficients were obtained for the other seven data sets.

One possible method considered to determine the correct fitting equation was MLR.[157] In this method, a quantity called the t-statistic is compared with standard tabulated values for each variable, which indicates whether the variable contributes significantly to the dependent variable, ΔG. One important assumption which underpins the MLR procedure is that the variables are independent. The high correlation between Uvdw and SASA violates this assumption, indicating that the use of MLR is inappropriate.
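The significance threshold quoted for the correlation coefficient can be checked with a short calculation. The sketch below assumes the usual t-test for a Pearson correlation, in which the critical value is r = t / sqrt(t² + n − 2); the two-tailed 5% critical t values are taken from standard tables, and the function name and dictionary keys are hypothetical. The correlation coefficients are those of Table 5.11.

```python
import math

def critical_r(t_crit, n):
    """Smallest |r| that is significant for n data points, given the two-tailed
    critical t value for n - 2 degrees of freedom."""
    df = n - 2
    return t_crit / math.sqrt(t_crit ** 2 + df)

# Standard two-tailed 5% critical t values: 2.101 (18 d.f.) and 2.086 (20 d.f.).
r_20 = critical_r(2.101, 20)   # threshold for 20 molecules
r_22 = critical_r(2.086, 22)   # threshold for 22 molecules

# Off-diagonal correlation coefficients from Table 5.11 (data set RE).
table_5_11 = {
    ("dG", "Uvdw"): -0.313,
    ("dG", "Uelec"): -0.918,
    ("dG", "SASA"): -0.112,
    ("Uvdw", "Uelec"): -0.017,
    ("Uvdw", "SASA"): 0.926,
    ("Uelec", "SASA"): -0.183,
}
significant = [pair for pair, r in table_5_11.items() if abs(r) > r_22]
```

Only the ΔG/Uelec and Uvdw/SASA pairs exceed the threshold, in line with the two significant correlations identified in the text.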

5.3.3 Biased Regression Methods.

Having established that MLR is not valid for this data set, biased regression methods had to be applied. These methods construct a series of orthogonal components from linear combinations of the original variables. The regression analysis is then carried out on these orthogonalised components, thereby alleviating the problems associated with correlated variables. Partial Least Squares (PLS) and Principal Components Regression (PCR) are two well known biased regression methods which use different component construction criteria. However, a more recent generalised procedure called Continuum Regression (CR)[158-160] encompasses both these methods as well as Ordinary Least Squares (OLS). The Portsmouth formulation of CR,[159] implemented using the PARAGON drug design software,[161] uses a single parameter, referred to here as the CR parameter, to determine the component construction criteria. The CR parameter takes values between 0 and 1.5, where values of 0 and 1.5 correspond to OLS, 0.5 to PLS and 1 to PCR. Intermediate values represent hybrids of these methods.

The CR procedure involves systematic variation of the CR parameter from 0 to 1.5 to obtain a set of models, each based on a different component construction. Selection of the best model is based on the leave-one-out cross-validated correlation coefficient (q²). This best model is then transformed back to the original data space. CR was applied exhaustively to each data set using every possible combination of the descriptor variables Uvdw, Uelec and SASA. Therefore, not only was the full LIE equation (Eq. 5.2) considered, but so were all equations involving subsets of these descriptor variables. It should be noted that each equation examined also includes a constant term that arises from the back transformation to the original data space. For each data set, the most predictive model was taken as the one with the highest q² value, and the optimum CR parameter as the value that gave this q². The number of components used is decided according to standard significance tests.
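For reference, the leave-one-out q² used to rank the models is conventionally defined as below. The text does not give the expression, so this is the standard definition and is assumed here rather than taken from this work. In it, y_i is the observed free energy for molecule i, ŷ_(i) is the value predicted for molecule i by a model fitted with that molecule left out, ȳ is the mean of the observed values, and N is the number of molecules.

```latex
q^2 = 1 - \frac{\sum_{i=1}^{N} \left( y_i - \hat{y}_{(i)} \right)^2}
               {\sum_{i=1}^{N} \left( y_i - \bar{y} \right)^2}
```

A value of 1 corresponds to perfect prediction of the left-out molecules, while values near or below zero indicate no predictive ability.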

Table 5.12: Best Model for Each Charge Set Showing the Variables Included, Number of Components, q² and the Corresponding Value of the CR Parameter.

Data Set   Uvdw   Uelec   SASA   Components   q²      CR Parameter
UE         *      *              1            0.935   0
RE         *      *              1            0.862   0
UPE        *      *              1            0.817   1.3
RPE        *      *              1            0.886   0
US         *      *              1            0.960   0
RS         *      *       *      2            0.913   0.2
UPS        *      *       *      2            0.949   0.1
RPS        *      *       *      1            0.955   0

5.3.4 The Most Predictive Model.

Table 5.12 lists the most predictive model found for each data set. Figure 5.5 shows an example of the dependence of q² on the CR parameter for the RE charge set. The most common form of the most predictive equation was that containing just Uelec and Uvdw, found in five of the eight cases. The number of components was found to range from 1 up to the number of variables. However, many other models built with different component constructions were almost as predictive as the best model. This is seen in the almost complete independence of q² from the CR parameter, the largest variation in q² being 0.038 for data set UPE. Furthermore, many models containing different descriptor variables produced results almost as good. Indeed, any model including Uelec gave similarly good values of q². Another point of interest is that for five of the eight models, the optimum value of the CR parameter is 0, corresponding to simple OLS. Between the charge sets themselves, it can again be seen that the RP and U charge sets are more predictive than the others, and that S sets are more predictive than E sets. Despite the very similar performance of many models, the best models were adopted for further analysis.

5.3.5 MLR Versus CR.

Since it was observed that, for a given set of variables, OLS gave the best predictions in the majority of cases, it was decided to carry out the model construction process using the build-up MLR analysis, despite the correlations shown earlier, to examine its performance. In this method, all descriptor variables were tested for significance and the most significant one, if it existed, became the first term in the model. This was repeated until no more significant variables could be added. Table 5.13 shows the results of the MLR procedure compared with the best model identified by CR. MLR fails to identify the best model in three cases, and when the optimum model is identified, the q² is sometimes less than that obtained by CR, since MLR optimises r² whereas CR optimises q². It should be noted that the differences in q² are only very subtle, and for this data set the MLR procedure generates highly predictive models.

Figure 5.5: Plot of cross-validated correlation coefficient (q²) versus the CR parameter for the model constructed from Uvdw and Uelec for charge set RE.

Table 5.13: Comparison of Best Model Obtained by CR for Each Charge Set with that Obtained by MLR.
Data Set UE RE UPE RPE US RS UPS RPS Uvdw * * * * * * * * CR Uelec SASA * * * * * * * * * * * q2 0.935 0.862 0.817 0.886 0.960 0.913 0.949 0.955 Uvdw * MLR Uelec SASA * * * * * * * * * * * * q2 0.933 0.862 0.803 0.883 0.960 0.911 0.943 0.955 * * * *

However, CR remains the method of choice, since if applied carefully it will always generate the most predictive model. This is not true of MLR, as was found in the statistical analysis on the neuraminidase system.[156]

Table 5.14: Coefficients, Standard Errors (SE) and Significance for Each Variable for the Best Model for Each Charge Set.

            Uvdw                     Uelec                    SASA                     Constant
Data Set    Coeff.   SE      Sig.    Coeff.   SE      Sig.    Coeff.   SE      Sig.    Value   SE     Sig.
UE          0.073    0.062   n       0.414    0.024   y       -                        8.0     1.8    y
RE          0.179    0.099   n       0.467    0.043   y       -                        11.4    2.8    y
UPE        -0.062    0.091   n       0.334    0.053   y       -                        3.3     3.6    n
RPE         0.103    0.082   n       0.442    0.037   y       -                        10.4    2.6    y
US          0.211    0.053   y       0.454    0.018   y       -                        13.6    1.5    y
RS          0.642    0.391   n       0.513    0.060   y       0.111    0.114   n       3.8     12.0   n
UPS         0.626    0.381   n       0.545    0.063   y       0.162    0.110   n       -5.1    11.5   n
RPS         0.585    0.209   y       0.540    0.036   y       0.135    0.066   y       -0.4    7.4    n

5.3.6 The Significance of the Electrostatic Term.

Once the optimum model had been identified for each data set, a bootstrapping procedure[162] was carried out on that model to estimate the standard errors on the coefficients and hence establish the significance of each variable. Table 5.14 shows the estimate of each coefficient, its standard error and whether or not the associated variable is significant at the 5% level. Bootstrapping suggests that Uvdw is significant for only two models and SASA for only one. This suggested that an LIE equation involving only the constant term and Uelec was justified, and so a model containing just Uelec was studied. Table 5.15 presents the q² for the best model and for the model containing only the constant and the Uelec term. Reducing the model to the one-variable equation causes a negligible reduction in q² for the prediction of experimental results and only a small reduction, of at most 0.08, in q² for the fits to FEP results. It is important to note that when the bootstrapping is carried out again on the model containing Uelec and a constant, both terms are found to be significant in all cases.

Table 5.15: Comparison of Best Model With the Model Containing Just the Electrostatic Variable for Each Charge Set.

            Best Model                        Electrostatic Only
Data Set    Uvdw   Uelec   SASA   q²          Uelec   q²
UE          *      *              0.935       *       0.933
RE          *      *              0.862       *       0.833
UPE         *      *              0.817       *       0.803
RPE         *      *              0.886       *       0.883
US          *      *              0.960       *       0.925
RS          *      *       *      0.913       *       0.833
UPS         *      *       *      0.949       *       0.930
RPS         *      *       *      0.955       *       0.914

Since q² is an indicator, not a definitive measure, of predictive ability, the size of this difference suggests that a model containing only Uelec and a constant (Eq. 5.4) is appropriate for LIE calculations of free energies of hydration, supporting the earlier suspicions. The final coefficients and errors for Eq. 5.4 fitted using CR are given in Table 5.16. It can be seen that the UE error, at 2.6 kJ mol⁻¹, is the smallest of all, smaller even than the US error and smaller than all the FEP errors. Whether or not there is an element of luck is hard to say, but this is nevertheless a remarkable result given that only two parameters are being used and the error is only 0.4 kJ mol⁻¹ worse than the error for the original Eq. 5.2. The Uelec coefficient of 0.405 is also close to the value predicted by Åqvist for dipolar molecules.[152]

Table 5.16: Coefficients for the Uelec + Constant Model with Errors for Each Charge Set.

Data Set   Uelec Coefficient   Constant   Error
UE         0.405               6.0        2.6
RE         0.437               6.1        3.6
UPE        0.342               4.9        4.0
RPE        0.429               7.5        3.5
US         0.427               7.8        3.0
RS         0.418               7.4        3.5
UPS        0.436               8.0        3.0
RPS        0.449               9.1        2.7
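The bootstrapping step can be sketched as follows for the two-parameter model. This is a minimal illustration and not the procedure of the cited reference: the data are synthetic, the function names are hypothetical, molecules are resampled with replacement, and a coefficient is taken to be significant at the 5% level when its magnitude exceeds 1.96 bootstrap standard errors, which is a normal approximation assumed here.

```python
import random
import statistics

def fit_line(x, y):
    """Closed-form least-squares slope and intercept for y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def bootstrap_standard_errors(x, y, resamples=1000, seed=0):
    """Standard deviations of slope and intercept over bootstrap resamples of the molecules."""
    rng = random.Random(seed)
    n = len(x)
    slopes, intercepts = [], []
    while len(slopes) < resamples:
        picks = [rng.randrange(n) for _ in range(n)]   # sample molecules with replacement
        xs = [x[i] for i in picks]
        ys = [y[i] for i in picks]
        if len(set(xs)) < 2:
            continue   # a resample with one distinct x value cannot define a line
        slope, intercept = fit_line(xs, ys)
        slopes.append(slope)
        intercepts.append(intercept)
    return statistics.stdev(slopes), statistics.stdev(intercepts)

# Synthetic data for 20 hypothetical molecules (kJ/mol).
rng = random.Random(42)
u_elec = [rng.uniform(-80.0, 0.0) for _ in range(20)]
dg = [0.43 * u + 8.0 + rng.gauss(0.0, 3.0) for u in u_elec]

slope, intercept = fit_line(u_elec, dg)
se_slope, se_intercept = bootstrap_standard_errors(u_elec, dg)
slope_significant = abs(slope) > 1.96 * se_slope
```

The same resampling loop applies unchanged to models with more variables; only the fitting function would need to return the additional coefficients.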


5.4 Conclusion.

The free energies of hydration for REPD charges have been calculated using FEP. It was found that REPD/6-31+G* charges gave the best reproduction of experiment, marginally better than the commonly used EPD/6-31G* charges. The other charge sets fared significantly worse. This validates the use of REPD charges in the macrobicycle 12 system. The LIE method was found to perform even better, particularly for the EPD/6-31G* charge set, whose LIE predictions, whether fitted to experiment or to FEP results, were superior to those of FEP itself. The form of the LIE equation initially used was found to be overfitting through its Uvdw and SASA terms, either because their contributions were obscured by noise or because they were not significant. Based on this analysis, a new LIE equation suitable for free energies of hydration was proposed, depending only on the solute-solvent electrostatic energy and a constant.

Chapter 6 Methods to Improve Monte Carlo Sampling


For simulations to be useful, they must be able to explore all areas of configurational space accessible to the system. This is because the experimental properties observed for a molecule are averages over all these possible configurations, and so simulations must do likewise. Completely misleading results will almost certainly be obtained if only one area of configurational space is sampled. The three principal difficulties in achieving good sampling in simulations are, firstly, to know exactly what configurational space is available to the molecule, secondly, to make the system explore this space and, thirdly, to explore this space quickly. The sampling in the macrobicycle 12 system in explicit chloroform using the host residue and regular solvent moves defined in Subsection 3.4.3 proved to be negligible. Therefore, as discussed in Chapter 3, new MC moves and a continuum solvent model were introduced. These changes led to significant improvements in sampling.

6.1 Identification of Sampling Problem.

6.1.1 Generation of Possible Host-Guest Structures.

When testing whether a simulation protocol is achieving good sampling of all low energy structures, it is generally not possible to do this by examining the different structures generated by the standard simulation protocol itself. This will only reveal what conformations the protocol can access, not the ones it cannot. Therefore, another method is required to generate all possible structures that the system can adopt. These can then be compared with the structures generated by the simulation protocol in order to make a valid sampling assessment.

Most structural variation in large molecules is due to dihedral angles. Many of these degrees of freedom can vary over a large range with little change in energy, while bonds and angles are restrained close to their reference values. There is some potential for structural variation in macrobicycle 12. Although it is bicyclic and does possess some rigid amide, thiourea and aryl groups, it also has a number of reasonably flexible dihedral angles. These are indicated in Figure 6.1. Guest molecules are fairly rigid, as shown by the dihedral profiles in Figure 3.7. Apart from the structurally insignificant methyl group rotations, only the phenylalanine derivative possesses dihedrals that can lead to different structures. These are also indicated in Figure 6.1.

Figure 6.1: The flexible dihedral angles in macrobicycle 12 (cross-section shown) and N-Ac-phenylalanine.

The method used to generate all possible structures is simulated annealing. Simulated annealing involves performing an MC simulation at an elevated temperature in the gas phase. The Boltzmann factor used in the MC acceptance test is now much larger and so moves have a much higher chance of being accepted. This allows the molecule to sample a wider range of configurations much more quickly than at room temperature. Structures produced at this temperature are unlikely to show much resemblance to the structures of interest at room temperature. Therefore, the temperature is slowly decreased during the simulation, not just to room temperature but all the way to absolute zero. The gradual decrease in temperature ensures that the molecule is gradually directed towards a nearby minimum energy structure that is close to, if not at, the global energy minimum. Too sudden a decrease in temperature can leave the structure trapped in a high energy minimum. Absolute zero rather than room temperature is chosen as the final temperature because at room temperature sampling can still produce many different structures, while at absolute zero structures are fewer in number, more distinct and easier to classify.

In this work, simulated annealing was carried out on the macrobicycle 12 system using MCPRO.[32] Host-guest complexes were constructed by placing in the host cavity one of the three guests with a particular conformation and stereochemistry. The exact nature of the guests is described in full in Section 7.1. This gave 10 types of host-guest complex. Simulations of the host alone may lead to distorted structures that may be of no use when comparing with structures produced by the standard protocol in host-guest free energy calculations. The presence of different guests may also lead to different structures. High temperature simulations were performed at 2000 K for 0.5 M configurations in the gas phase. Regular solute moves made up 10% of attempted configurations, while host residue moves made up the remainder. To ensure that the guest remained inside the host, a small restraint, effectively a weak bond, was placed on the guest to keep it inside the host cavity. The temperature was then lowered in 20 steps to 20 K, each step consisting of 0.1 M MC moves. The restraint holding the guest was removed at the halfway stage of the temperature reduction. The structure was then minimised to 0 K using the Fletcher-Powell algorithm. The procedure was carried out 30 times for each of the 10 possible host-guest complexes, giving 309 structures in total, although more structures were generated for some guests than others.
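The annealing schedule described above can be illustrated on a toy problem. The sketch below is not the MCPRO protocol: it anneals a single torsion on a hypothetical three-well potential, assumes a linear temperature schedule between 2000 K and 20 K (the spacing of the 20 steps is not stated in the text), uses far fewer moves per stage, and omits the guest restraint and the final minimisation. All function names and potential parameters are made up for illustration.

```python
import math
import random

K_B = 0.0019872041  # Boltzmann constant in kcal mol^-1 K^-1

def energy(phi_deg):
    """Hypothetical torsional potential (kcal/mol): global minimum at 180 degrees,
    higher local minima near 60 and 300 degrees."""
    phi = math.radians(phi_deg)
    return 1.5 * (1.0 + math.cos(3.0 * phi)) + 1.0 * (1.0 + math.cos(phi))

def metropolis_stage(phi, temperature, n_moves, rng, max_step=30.0):
    """Run n_moves Metropolis MC moves on the torsion at a fixed temperature."""
    e = energy(phi)
    for _ in range(n_moves):
        trial = (phi + rng.uniform(-max_step, max_step)) % 360.0
        e_trial = energy(trial)
        # Accept downhill moves always, uphill moves with the Boltzmann factor.
        if e_trial <= e or rng.random() < math.exp(-(e_trial - e) / (K_B * temperature)):
            phi, e = trial, e_trial
    return phi

def anneal(phi, t_high=2000.0, t_low=20.0, n_stages=20, n_moves=2000, seed=0):
    """High-temperature run followed by stepwise cooling to t_low."""
    rng = random.Random(seed)
    phi = metropolis_stage(phi, t_high, n_moves, rng)
    for stage in range(1, n_stages + 1):
        temperature = t_high + (t_low - t_high) * stage / n_stages  # assumed linear cooling
        phi = metropolis_stage(phi, temperature, n_moves, rng)
    return phi

# Several independent annealing runs started from the same torsion angle.
final_angles = [anneal(60.0, seed=s) for s in range(5)]
final_energies = [energy(phi) for phi in final_angles]
```

Each run ends close to one of the minima of the toy potential, and repeating the run with different random seeds is what generates a collection of distinct low energy structures.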

Figure 6.2: The distribution of g+, t and g− conformations of the hydrocarbon chain from all structures generated from annealing runs.

6.1.2 Analysis of Annealed Structures.

A wide range of quite different structures of varying energies was generated, although much duplication of structures did still occur. All structures had reasonably similar overall shapes, indicating that the host was relatively rigid. A number of small differences in host structure were observed. Aryl groups were seen to adopt slightly different orientations. Amide groups also moved to some degree, but all structures, with the exception of a few of the high energy ones, had the polar hydrogens of all four amide groups pointing inwards. However, there was one quite significant difference between the structures produced. The conformation of the hydrocarbon chain containing the thiourea unit varied quite substantially between structures. To classify conformations in this chain, each dihedral angle was assigned to either gauche+ (g+), trans (t) or gauche− (g−). Figure 6.2 shows the distribution of gauche+, trans and gauche− conformations present in the hydrocarbon chain of the generated structures. The middle two dihedrals remain in the trans conformation since the thiourea unit is held rigid. The definition of a unique conformation is that all 14 of the dihedral angles in the hydrocarbon chain consist of a unique combination of these three specifiers. Of the 309 structures generated, 156 had unique hydrocarbon chain conformations. Table 6.1 shows the four most common conformations observed.

Table 6.1: Population of the four most common hydrocarbon chain conformations in the annealed structures.

                            Dihedral
Conformation  Population    1    2    3    4    5    6    7    8    9    10   11   12   13   14
1             28            g+   t    t    g−   g−   g−   t    t    g−   g−   g−   t    t    g+
2             18            g+   g−   t    t    g−   g−   t    t    g−   g−   t    t    g−   g+
3             18            g+   g−   t    t    g−   g−   t    t    g−   g−   g−   t    t    g+
4             12            g+   t    t    g−   g−   g−   t    t    t    g+   t    g+   g−   g+

Only so much can be learned from these conformations. There are no clear trends as to which guests preferred which conformations, if indeed there is any such preference. About half of the conformations are within 10 kcal mol⁻¹ of the lowest energy structure for that particular guest, while others are much higher in energy and are likely to be unrepresentative. Furthermore, they are all gas phase minimum energy structures, and the available conformations may well be quite different in chloroform at room temperature. Nevertheless, what is clear is that there are many different structures and that the simulation protocol used for free energy calculations must be able to sample between them.
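The classification of chain conformations described above can be sketched as follows. The boundaries between the three labels are not stated in the text, so the sketch assumes 0 to 120 degrees for g+, 120 to 240 degrees for t and 240 to 360 degrees for g−. The function names are hypothetical and the example chains are made up, with four dihedrals each rather than the 14 of the real hydrocarbon chain.

```python
from collections import Counter

def classify_dihedral(angle_deg):
    """Label a dihedral angle as gauche+ ('g+'), trans ('t') or gauche- ('g-').
    The 120-degree-wide bins are an assumption, not taken from the text."""
    angle = angle_deg % 360.0
    if angle < 120.0:
        return "g+"
    if angle < 240.0:
        return "t"
    return "g-"

def chain_conformation(dihedrals_deg):
    """Tuple of labels that identifies the conformation of a whole chain."""
    return tuple(classify_dihedral(a) for a in dihedrals_deg)

# Three hypothetical annealed structures, each described by four chain dihedrals.
structures = [
    [62.0, 178.0, -175.0, -65.0],
    [58.0, -179.0, 170.0, -70.0],
    [65.0, -60.0, 180.0, 175.0],
]

populations = Counter(chain_conformation(s) for s in structures)
n_unique = len(populations)
most_common_conformation, most_common_count = populations.most_common(1)[0]
```

Because each conformation is reduced to a tuple of labels, counting unique conformations and ranking them by population, as in Table 6.1, becomes a simple tally.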

6.1.3 Sampling From Simulations.

A gas phase simulation was performed using MCPRO.[32] The starting structure was an arbitrarily chosen low energy host-guest structure, produced by simulated annealing, containing the cis-glycine derivative. Its hydrocarbon chain conformation was the same as conformation 2 in Table 6.1. The simulation was run for 5 M configurations at 293 K. Regular solute moves comprised 5% of all attempted moves, the rest being host residue moves. An analysis of the resultant sampling showed that the limited motion of the aryl and amide units of the host was adequately reproduced by this level of sampling. However, the sampling of the hydrocarbon chain was a different story. Figure 6.3 illustrates the resultant sampling of the 12 flexible dihedrals in the hydrocarbon chain. The dihedrals about the CN bonds in thiourea are not shown since they were not sampled. The sampling does not come close to producing even one conformation different from the starting conformation, let alone many of the conformations observed in the annealed structures, with the possible exception of the last dihedral. This poor sampling was the same for the host with other guests, for the host alone, and for the host in explicit chloroform. Clearly, the sampling of the hydrocarbon chain using only regular solute and host residue moves is inadequate. These types of MC moves are incapable, over a reasonable simulation period, of going from one conformation to another, no matter how they are defined in the Z-matrix.

Figure 6.3: The dihedral distribution for the host containing the cis-glycine derivative sampled only with regular solute and host residue MC moves.

To determine whether different host conformations affect free energies, some free energy calculations in chloroform were performed. The full details of these are given in Subsection 7.2.6. To summarise, the results of these calculations were found to depend significantly on hydrocarbon chain conformation. Therefore, no single conformation could be used for free energy simulations. This result made it necessary to search for improved sampling schemes.

6.2 Approaches to Improve Sampling.

6.2.1 Methods to Improve MC Acceptance.

There are a number of techniques available to improve sampling by increasing the Boltzmann factor and hence the acceptance rate. The first of these is simply to run simulations at a higher temperature. However, the annealing runs suggested that very high temperatures of at least 1000 K would be necessary to achieve adequate sampling, leading to problems with solvent heating. The second method is to soften the potential by reducing the strength of various terms in the force field, such as dihedral and non-bonded parameters, or by changing the functional form for non-bonded interactions.[163-167] The third method is 4-dimensional sampling.[168] In this method, distances are defined not in three but in four dimensions. Thus atoms that would normally overlap in three dimensions may lie far apart in the fourth dimension. The benefit of this is that the high energies produced by steric clash are reduced. However, the simulation must first be perturbed from three to four dimensions.

While all these methods may improve sampling, they vastly increase the configurational space that must be sampled, leading to longer simulations. Furthermore, while the improved sampling would help in intermediate non-physical states in free energy calculations, each system must still be perturbed back in separate runs to the standard force field at room temperature, a state at which the sampling would remain poor. The only way around this problem would be to do these end point mutations in one step and in one direction, using the well-sampled system as the reference state. Yet such an approach is likely to suffer from poor convergence because the perturbation is very large. The main application of these methods is to improve sampling in free energy calculations by increasing window spacings, and thus overall calculation speed, rather than to improve sampling for real states.
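The effect of temperature on acceptance can be made concrete with the Metropolis criterion, under which an uphill move of energy ΔE is accepted with probability exp(−ΔE / kT). The sketch below evaluates this for a hypothetical 5 kcal mol⁻¹ uphill move at room temperature and at the elevated temperatures mentioned above; the energy value and function name are chosen purely for illustration.

```python
import math

K_B = 0.0019872041  # Boltzmann constant in kcal mol^-1 K^-1

def metropolis_acceptance(delta_e, temperature):
    """Probability of accepting a move that changes the energy by delta_e (kcal/mol)."""
    if delta_e <= 0.0:
        return 1.0
    return math.exp(-delta_e / (K_B * temperature))

# Hypothetical 5 kcal/mol uphill move at three temperatures (K).
delta_e = 5.0
acceptance = {t: metropolis_acceptance(delta_e, t) for t in (293.0, 1000.0, 2000.0)}
```

The acceptance probability rises by several orders of magnitude between 293 K and 1000 K, which is why barrier crossing becomes feasible only at temperatures that are impractical for a solvated system.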

6.2.2 Biased Sampling Methods.

There are techniques that seek to bias the acceptance of MC moves. Such methods must include a correction to the acceptance criterion to maintain reversibility. These techniques include smart169 and force-bias170 MC that preferentially make moves in the direction of the force. Such moves, though, would be inappropriate for crossing energy barriers. Umbrella sampling171 biases the sampling to a certain region of congurational space by adding a carefully chosen potential function to the energy. For this technique to be useful, the areas that are desired to be sampled should be known and small in number. An umbrella potential may be used to gradually force a system from one state to another along a reaction coordinate. At the end points, the potential is either slowly turned o or a systematic correction is made to account for the potential. Umbrella sampling may be used to calculate a potential of mean force172 by pushing the system slowly from one state to another. It is also possible to perform a free energy perturbation between two states by using the reaction coordinate as the coordinate. Another type of method is the congurational bias method that regrows parts of molecules so as to minimise clash with the rest of the system.22 However, this method requires the molecule to have a free end. Such a move may be used for protein side chains.173 The uctuating potential method of Liu et al.174 samples between the real potential and a softened potential in which the barriers of torsional proles are chopped o. There are the J-Walking175 and S-Walking176 methods that generate trial moves from congurations generated in another simulation run at higher temperature for which the sampling will be improved. Parallel tempering177 is a similar method that


attempts moves to simulations of the system run at a higher temperature. Again, however, the annealing runs suggested that very high temperatures are necessary to obtain improved sampling. There are two similar MC methods that sample configurations not according to the Boltzmann factor but according to some other factor that allows sampling at higher energies. One method is multicanonical sampling,[178] which samples according to the inverse of the energy density. The other is entropy sampling,[179] which samples according to exp(−S(E)), where S(E) is the entropy as a function of the energy. These methods, while promising, suffer from large computational expense since the energy density and S(E) must be calculated iteratively. Another technique is to sample configurational space according to the generalised statistical distribution of Tsallis.[180] This approach gives better sampling of high energy regions.

6.2.3 More Sophisticated MC Moves.

Finally, more sophisticated types of MC moves may be used. There are flip moves that rotate a randomly chosen atom about the axis connecting its two neighbours.[181–183] This move is very localised and induces moderate conformational changes. Another possible MC move is the extended continuum configurational bias method. It acts on a small molecule segment and does not require molecule ends, regrowing atoms into low energy positions subject either to geometric rules that enforce closure [184] or to probability functions.[22] There is another MC move called the concerted rotation.[106] This is a fairly complex move that causes a large variation in a number of consecutive dihedral angles while keeping the ends, bonds and angles fixed. There are also extended concerted rotation moves which alter even more dihedrals.[185] There are other move types such as reptation [186] and end-bridging.[187] However, these two were not applicable to the macrobicycle 12 system, being more appropriate for polymer systems which have chain ends. The jumping between wells (JBW) method [188] first locates a set of conformations that the molecule adopts. It then defines a mapping from each conformation to every other and these mappings comprise the MC moves.


However, this requires a knowledge of the potential energy surface and its minima and thus was not practical for the macrobicycle 12 system since there appeared to be far too many possible conformations, and it is difficult to decide which of these are important. A method that does examine many minima and assesses their contribution to the ensemble is the mining minima technique.[189] It would be possible to apply this method to the macrobicycle 12 system, yet it is rather complex and requires many additional simulations. There are also cluster moves that move atoms together.[190] Another sophisticated move is hybrid Monte Carlo.[191] Configurations are generated by randomly assigning velocities from a Maxwell distribution to each atom, propagating the system for a small unit of time, and then testing the final configuration for acceptance. This was not considered as it would have required implementing molecular dynamics algorithms.

6.2.4 Adoption of Methods to Improve Sampling.

While different sampling schemes and biasing techniques may improve sampling, it was felt that the introduction of more sophisticated MC moves was essential regardless of what other techniques were used, since the current moves did not appear physically capable of reproducing motion that resembled conformational change. Converting between different structures involves the alteration of many degrees of freedom in a cooperative fashion and a means to jump over possibly large energy barriers. A further advantage of only implementing additional MC moves would be that they would require no alterations to the simulation algorithm nor any additional simulations. Of the MC moves available, the concerted rotation was selected since it apparently had all the desired attributes to address the sampling problem. Firstly, it is pseudodynamic and causes a large change in dihedral angles as commonly occurs in conformational change. Secondly, it is localised and leaves other atoms intact. Thirdly, it suffers from no adverse changes in energy due to bonds and angles changing. Only dihedrals and non-bonded interactions contribute to the energy change. Fourthly, despite the numerical complexity it is still a relatively small and cheap move. Fifthly, it has


reasonable acceptance. And sixthly, the source code for the move was available. Its implementation is described in the next section. Three other MC moves were also implemented. The flip move produces moderately large localised changes in dihedrals and was also applied to the hydrocarbon chain of the host. The large dihedral move alters particular dihedrals over large ranges. This move was used to sample the dihedral in the phenylalanine derivative. Finally, the three part solute move was designed to ensure good sampling of the guest in the host cavity. However, despite the inclusion of all these moves, it was found in later free energy calculations (see Subsection 7.2.6) that the explicit representation of the solvent made it impossible for any of these large moves, particularly the conrot move, to be effective in causing conformational change. The sampling observed was as bad as that in Figure 6.3 when only host residue moves were used. Solvent molecules were found to crowd the hydrocarbon chain to such an extent that only small conrot moves were ever accepted, negating the very advantage of the conrot move. Small moves are ineffective because they explore conformational space far too slowly and in particular have problems crossing energy barriers. This is a common problem for large MC moves.[188] Larger conrot moves such as the extended conrot move [185] would suffer even worse problems. Ideally what is needed is a type of MC move that involves the solvent moving at the same time, yet a search of the literature revealed no such method. All the previously discussed methods may well improve sampling to some degree. However, none would be expected to overcome the dilemma of performing large moves without producing significant overlap with solvent. Possible exceptions to this might be 4-dimensional sampling [168] or potential softening using the soft-core potential.[167] However, sampling of the real end points remains a problem for these methods. 
The cluster move of Wu et al.[190] may provide a possible alternative, but this move has only been developed for application to spheres on a lattice. Searches were made for possible reaction coordinates that mapped from one structure preferred by one guest


to another preferred by another guest. Restraints could be used to drive the structure along this reaction coordinate to the other structure, enabling the calculation of a potential of mean force. Not only could no reaction coordinate be found, but nor could any host structures characteristic of each guest. In the end the simplest solution was to replace the explicit solvent with a continuum representation. With this solvent model, solvation energetics would be obtained together with sampling as good as that in the gas phase.

6.3 Additional MC Moves.

6.3.1 The Conrot Move.

A concerted rotation, or conrot, move [106] was designed to provide a way of inducing a significant, coordinated movement of atoms localised in a small section of the molecule. Its primary application is in long chain condensed phase polymer systems, where it is designed to replicate real polymer motions. The actual move involves the alteration of up to seven adjacent dihedral angles in a molecule. It is constructed such that no bonds or angles are changed and that the rest of the molecule remains unaffected. The number of dihedrals actually altered is less if the move is made closer to a chain end. Since the move in this work is to be implemented in a cyclic system in which molecule ends are absent, the full move about seven bonds is always used. The conrot move is illustrated in Figure 6.4. The reason why seven dihedrals are adjusted is as follows. Assuming that the bonds and angles are constant, the only degrees of freedom are dihedral angles. The question is, what is the minimum number of dihedrals that can be altered such that the following three atoms remain fixed in space relative to the fixed atom in front of the moving atoms? If these three atoms are fixed, then all subsequent atoms will also be so, satisfying the requirement that the move is localised. This restriction is equivalent to six constraints since the three atoms possess nine degrees of freedom minus two bonds and one angle which are already constrained. If one dihedral, termed the


driver dihedral, is arbitrarily changed, then the six other dihedrals must be adjusted to satisfy the six constraints. This gives seven dihedrals in total. Thus four atoms actually move in space while all other atoms remain fixed. An important point is that the section of macrobicycle 12 in which the conrot move is to be implemented consists of fourteen dihedrals, although two of these are actually fixed. Hence there is indeed space to incorporate such a move.

Figure 6.4: The seven dihedrals of n-decane that change in the conrot move.

The conrot move works as follows. A driver dihedral and a direction down the chain are randomly selected, and the driver is then randomly altered. The essence of the problem is, for a given random displacement of the driver dihedral, what are the values for the other six dihedrals? Furthermore, if there are chain ends nearby, what modifications are necessary in the algorithm? The solution to this problem involves solving a complex non-linear function. The details of how this problem is solved are discussed elsewhere.[106] For a given new driver value, there may be multiple solutions or there may be no solutions. The number of solutions is always even and is typically 4–12. An example of there being no solutions is the case of a molecule whose dihedrals are all in the trans conformation. If one dihedral is altered slightly, it is impossible for the other dihedrals to adjust and still preserve constant bonds and angles. Recently, an analytical solution has been applied to solve for the other dihedrals, greatly simplifying the problem.[13] The conrot move is then formulated such that it samples configuration space


reversibly as a proper Metropolis MC move. Let m and n denote the original and destination states, respectively. A driver dihedral and direction are randomly selected and the driver is changed by a random amount in the range $[-\delta\phi_{\max}, +\delta\phi_{\max}]$. All $N_n$ final solutions for the remaining dihedrals $\phi_1$–$\phi_6$ are then found, if any. One of these solutions is chosen randomly. The reverse problem is then solved. That is, the driver dihedral for the chosen destination solution is moved back to its original value and all possible $N_m$ values for the other $\phi_1$–$\phi_6$ are again found. One of these must be the original solution, m. A Jacobian determinant is then calculated for both initial and destination states, given by J(m) and J(n), respectively. This is necessary because the solution space for $\phi_1$–$\phi_6$ is not spanned uniformly due to the constraint that the chain end must remain fixed.[106] Finally, the energy change, $\Delta V_{mn}$, must be calculated. The move is then accepted with probability

$$P(m \rightarrow n) = \min\left[1,\ \frac{J(n)/N_n}{J(m)/N_m}\,\exp\left(-\frac{\Delta V_{mn}}{k_B T}\right)\right] \tag{6.1}$$
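As an illustration only, the acceptance test of Equation 6.1 can be transcribed into a few lines of Python. This is a minimal sketch, not the MCPRO or Theodorou code: the function name, the argument names, the default temperature and the use of kcal mol⁻¹ units are all assumptions made for the example, and the Jacobians and solution counts are taken as given inputs.

```python
import math

K_B = 0.0019872041  # Boltzmann constant in kcal mol^-1 K^-1


def conrot_acceptance(j_m, j_n, n_m, n_n, delta_v, temperature=298.15):
    """Acceptance probability of Equation 6.1 for a move m -> n.

    j_m, j_n : Jacobian determinants of the original and destination states
    n_m, n_n : number of solutions found in the reverse and forward problems
    delta_v  : energy change V(n) - V(m) in kcal/mol
    temperature is a hypothetical default, not a value from the text.
    """
    if n_n == 0:
        return 0.0  # no closure solution exists, so the move is rejected
    # Work with the logarithm so a large negative delta_v cannot overflow exp().
    log_p = math.log((j_n / n_n) / (j_m / n_m)) - delta_v / (K_B * temperature)
    return 1.0 if log_p >= 0.0 else math.exp(log_p)
```

With equal Jacobians, equal solution counts and no energy change the function returns 1, and dropping the Jacobian ratio (setting both to 1) gives the uncorrected test whose effect is examined in the next subsection.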

6.3.2 Implementation and Testing of the Conrot Move.

The conrot code was generously supplied by Prof. D. N. Theodorou. However, a large amount of coding was necessary to tailor the code to MCPRO and to the macrobicycle 12 system. It principally involved transforming between different coordinate systems in the two codes, including the Jacobian in the acceptance test, generalising the code to allow different bonds and angles, defining how and where the moves are made in the host, assigning all the residues that move in the move to the greater residue (see Section 3.5), and allowing the thiourea unit to remain rigid in the hydrocarbon chain. A number of these points are elaborated on later. Once the conrot move was implemented, it was necessary to test that it was able to produce the uniform distribution of dihedral angles as is required for a MC move so that microscopic reversibility is satisfied (see Subsection 2.1.6). The smallest molecule on which to test the full seven dihedral move is united-atom n-decane. Since molecule ends were present, all the other types of conrot move that involve changing

[Figure 6.5 plot: normalised distribution (axis from 0.7 to 1.3) against dihedral / degrees (axis to 360), with curves labelled "With Jacobian" and "Without Jacobian".]

Figure 6.5: The dihedral angle distribution for n-decane averaged over all dihedrals and 1 M configurations with and without the Jacobian acceptance correction.

six or fewer dihedral angles were also attempted. The actual number varied depending on the location of the driver angle. It is a purely geometric test and so all bonds and angles were made rigid and no dihedral or non-bonded energetics were included. The simulation was run for 1 M configurations. The maximum driver displacement, $\delta\phi_{\max}$, was set to 180°. The results are presented in Figure 6.5. Uniform sampling is indeed obtained as desired. To make clear the need for the correction to the acceptance test using the Jacobian transformation, the distribution obtained without the correction is also shown in Figure 6.5. This time, non-uniform sampling is obtained. This demonstrates that the correction is essential to obtain the correct dihedral angle distribution. Incidentally, including the correction generally has the effect of rejecting more configurations. The acceptance rate for n-decane was 55 % including the Jacobian and 64 % without. The conrot move is designed to produce large scale dihedral sampling. To give an indication of the possible changes in the dihedral angles in a conrot move, Table 6.2 shows the dihedral angles in united-atom n-decane before and after the move. Figure 6.4 shows the n-decane chain before and after the move. There is clearly a massive change in most dihedral angles. However, most conrot moves are actually a lot smaller in real systems.


Table 6.2: The Typical Change (degrees) in Dihedrals for a Conrot Move for n-Decane (See Figure 6.4 for Illustration).

Dihedral   Before   After   Change
1          61       245     176
2          270      68      158
3          292      137     155
4          60       234     174
5          158      68      90
6          60       90      30
7          51       297     114
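The entries in the Change column are the shortest rotations between the Before and After values once the 360° periodicity of a dihedral is taken into account. The short check below is a sketch of that arithmetic; the helper name `wrapped_change` and the sign convention (wrapping into the range −180° to 180°) are assumptions made for the example, and only the magnitudes are compared with the table.

```python
def wrapped_change(before, after):
    """Smallest signed rotation (degrees) that takes `before` to `after`."""
    return (after - before + 180.0) % 360.0 - 180.0


# Before and After values of the seven dihedrals, taken from Table 6.2.
before = [61, 270, 292, 60, 158, 60, 51]
after = [245, 68, 137, 234, 68, 90, 297]

changes = [wrapped_change(b, a) for b, a in zip(before, after)]
magnitudes = [abs(round(c)) for c in changes]
print(magnitudes)
```

For example, dihedral 1 moves from 61° to 245°, a raw difference of 184°, which is equivalent to a rotation of 176° in the opposite sense.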

6.3.3 Application of the Conrot Move to Macrobicycle 12.

An important consideration in applying the conrot move to macrobicycle 12 was where in the molecule the move is made. The hydrocarbon chain consists of twelve bonds from one junction carbon to the other. This suggests that the twelve dihedrals about these bonds are the candidates for the conrot move. However, two modifications to this were made. Since the end dihedrals are typically sampled less well by conrot moves, an additional dihedral at each end about a bond in the amide/aryl ring was included, giving two more in total. This is illustrated in Figure 6.6. These additional dihedrals would not be expected to be very flexible. In this way the poor sampling at the ends falls on dihedrals that do not require good sampling. To further improve the sampling at the ends, half the conrot moves attempted were made over the end seven dihedrals at each end. The second modification involves the thiourea unit. The two dihedrals about the C–N bonds are rather rigid (see dihedrals D1–D4 in Figure 3.7). For a given driver
[Figure 6.6 diagram: structure of the hydrocarbon chain of macrobicycle 12, carrying the labels "included", "removed" and "Du".]

Figure 6.6: The twelve dihedrals sampled in macrobicycle 12. Note that the two rigid dihedrals are ignored, while two outside the hydrocarbon chain are included.


dihedral, there is little control over the values the other six dihedrals will take. Thus there is a high chance these dihedrals will be altered significantly, raising the energy of the new configuration and leading to its rejection. Therefore, these dihedrals were chosen not to be sampled. Having a gap in the chain like this increases the complexity of the conrot move. An implementation for dealing with these rigid units has been developed by Deem and Bader,[173] who have applied conrot moves to protein systems. Rather than adopting this approach, a much simpler alternative was found. By making the thiourea unit rigid and symmetric, the two C–N bonds would intersect at a single point as shown in Figure 6.6. A dummy atom placed at this point would replace the whole thiourea unit in the hydrocarbon chain for the conrot move. The dihedrals sampled are still exactly those that require sampling in the real system. Thiourea is already a fairly rigid unit so this approximation is reasonable. So in total, twelve dihedrals in the hydrocarbon chain are eligible for the conrot move.

6.3.4 Acceptance Probability of the Conrot Move.

For any MC move, a balance must be struck between the size of the move and its acceptance probability. The large move size and high acceptance observed for the n-decane molecule were a consequence of the flexibility and omission of energetics. For larger molecules with the energetics turned on, the maximum driver displacement, $\delta\phi_{\max}$, has to be considerably reduced in order to produce a reasonable acceptance probability. For example, for the n-decane system discussed earlier, $\delta\phi_{\max}$ was set to 180°. This led to 36 % of the moves being rejected because there were no solutions to the conrot move. Such a rejection rate is affordable for idealised n-decane systems, but for macrobicycle 12 such large moves had negligible acceptance rates. Thus setting $\delta\phi_{\max}$ lower produced a better acceptance probability by making more moves eligible for the acceptance test. The price paid was that conrot moves are now generally smaller. However, even if the change in the driver dihedral is small, the other dihedrals may still move significantly. In macrobicycle 12, setting $\delta\phi_{\max}$ to only 5° gave the best compromise between acceptance and large move sizes. This typically gave around 10 % acceptance for the


host-guest complex. The problem with this is that the first dihedral of a conrot move tends not to be well sampled, leading to poor sampling for dihedrals at the end of the sampled chain. However, the conrot move on the same seven dihedrals starting at the other end does not suffer this restriction, so the reduction in sampling is only partial. Such a small value for the maximum driver displacement, $\delta\phi_{\max}$, has to be used because the host-guest system has a number of features that severely reduce both the acceptance probability and the effectiveness of the conrot move. The first feature is that the hydrocarbon chain can clash with other parts of the host, particularly around the tertiary carbons at the junctions. This problem is exacerbated by the use of an all-atom force field because the large conrot moves lead to even greater displacements in the hydrogen atoms. This was a possible argument for resorting to the simpler united-atom force field. However, modelling the host as realistically as possible remained a priority not to be compromised unless absolutely necessary. The second feature that affects the conrot move is the presence of the guest, particularly since it lies close to the thiourea unit. A rather interesting occurrence arose here that demonstrates the point that a good acceptance probability does not necessarily imply good sampling. The host structure was found to be quite different depending on whether a guest was inside (see Section 6.5 further on). The conformational sampling of the host alone was found to be superior to the sampling of the host when containing the guest. However, the acceptance of the conrot move was still only around 10 %, the same as with the guest. The feature that reduced acceptance probability and conrot effectiveness dramatically was the presence of explicit solvent. 
Explicit solvent reduced the acceptance probability to around 7 % but, as discussed earlier in Subsection 6.2.4, the effectiveness of the conrot move was eliminated, with no large scale conformational changes occurring. Nevertheless, in continuum solvent, the conrot move was found to adequately sample the conformation of macrobicycle 12, as demonstrated later in Section 6.5.


6.3.5 Variations of the Conrot Move.

Two variations of the conrot move were tested to try to improve its effectiveness in explicit solvent. The first method was to allow bonds and angles to randomly vary in the conrot move. This increased flexibility may make it easier for the conrot atoms to arrange themselves in a different conformation. However, the results in explicit solvent were unchanged. The second method was to use configuration bias (CB).[106] The conrot algorithm generates a number of possible final solutions and from these one is randomly chosen, regardless of the energy of this state. CB, instead of choosing a solution randomly, favours solutions of low energies by biasing the choice according to the energies of each solution. On testing this method, on the one hand, it proved to be much slower with all the extra energy calculations for every structure. On the other hand, it gave about a three times higher acceptance probability. However, the overall sampling obtained was very similar. This result was not so surprising. To achieve good sampling, getting a higher acceptance is not necessarily the key, since often it is the higher energy moves that involve large conformational change. These are the very moves that CB throws out. CB conrot does not introduce conrot moves that are any smarter. It only eliminates conrot moves that are less likely to be accepted. Such a feature is not going to improve sampling with explicit solvent. A biasing technique in favour of large conformational change rather than energy may prove more useful. In summary, neither of these methods was able to achieve improved sampling in explicit solvent.

6.3.6 The Flip Move.

The flip move [181–183] is a simple move that alters the dihedral angles and angles around a particular atom. It works by randomly choosing an atom and rotating it by a random amount about the axis connecting its two neighbours, as shown in Figure 6.7. This leads to a change in two angles, four dihedrals and no bond lengths. Quite large maximum displacements are possible: 50° gives 30 % acceptance for the host. If the moves are made too large, the angle distortions start to become


significant.

Figure 6.7: The mechanics of the flip move.

In applying this move to the host, it was applied only to the hydrocarbon chain, as for conrot, since this was the part requiring an improvement in dihedral sampling. However, it was not applied to the dummy atom nor to the two carbons adjacent to it in the thiourea unit since this would have changed the angles in the thiourea unit which are supposed to remain rigid. The maximum displacement used for macrobicycle 12 was set at 50°. By itself, this move was not able to produce conformational change in the gas phase. However, it does produce moderate dihedral sampling intermediate to the conrot and host residue moves.
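The geometry of the flip move can be sketched as a rotation of one atom about the line through its two bonded neighbours. The code below is an illustrative sketch only and is not taken from MCPRO: the function names are hypothetical, the rotation uses Rodrigues' formula, and only the chosen atom is moved (in an all-atom model, any attached hydrogens would presumably need the same rotation).

```python
import math
import random


def rotate_about_axis(p, a, b, angle_deg):
    """Rotate point p by angle_deg about the axis through a and b (Rodrigues' formula)."""
    axis = [b[i] - a[i] for i in range(3)]
    norm = math.sqrt(sum(c * c for c in axis))
    k = [c / norm for c in axis]                 # unit vector along the axis
    v = [p[i] - a[i] for i in range(3)]          # position relative to a
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    k_cross_v = [k[1] * v[2] - k[2] * v[1],
                 k[2] * v[0] - k[0] * v[2],
                 k[0] * v[1] - k[1] * v[0]]
    k_dot_v = sum(k[i] * v[i] for i in range(3))
    return [a[i] + v[i] * cos_t + k_cross_v[i] * sin_t + k[i] * k_dot_v * (1.0 - cos_t)
            for i in range(3)]


def flip_move(prev_atom, atom, next_atom, max_disp_deg=50.0, rng=random):
    """Trial flip: rotate `atom` by a random angle about the prev_atom-next_atom axis."""
    angle = rng.uniform(-max_disp_deg, max_disp_deg)
    return rotate_about_axis(atom, prev_atom, next_atom, angle)
```

Because both neighbours lie on the rotation axis, the two bond lengths to the moved atom are unchanged, which is why the move alters only angles and dihedrals.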

6.3.7 The Large Dihedral Move.

The other special MC move was the so-called large dihedral move. This is simply a Z-matrix coordinate dihedral move with a large maximum displacement of 180°. Its motivation was in achieving good sampling for the phenylalanine derivative. The preferred conformation for the C–C–C–N aryl swing dihedral of this molecule was found to depend on whether the amide bond was in the cis or trans conformation (see Subsection 8.6.4 for more details). However, a large energy barrier of the order of 4 kcal mol⁻¹ separated these two conformations. Normal Z-matrix coordinate moves were unable to cross this barrier. Thus for a cis–trans mutation, a MC move was required that could. The large dihedral move is such a move and is shown in Figure 6.8 for the swing dihedral of NAc-phenylalanine. Its acceptance probability is smaller than that of the usual moves, ranging from 10 % in the gas phase down to around 3 % in explicit chloroform. In the gas phase, all possible conformations were successfully sampled. However, in explicit solvent the sampling problems were similar to those for the conrot move. Most of the time, the large move

simply led to the aryl ring crashing into solvent molecules. This was a further reason to use continuum solvent.

Figure 6.8: The large dihedral move acting on the swing dihedral of NAc-phenylalanine.

6.3.8 Three Part Solute Move.

Since one of the main concerns of the whole project involves how the host and guest interact, it is important that the guest is able to move around significantly within the host. Given that the guest is relatively rigid, a greater emphasis was placed on the ability of the guest to move around within the host cavity rather than the guest's ability to change geometry internally. Therefore a more sophisticated move was designed for the guest. It is called the three part solute move. It consists of three types of move, the type chosen randomly. The first move is predominantly a translation, with a small rotation. The second move is predominantly a rotation with a small translation. The third is a regular solute move which has only small changes in translation and rotation. The maximum amplitudes are 0.4 Å for the translation and 30° for the rotation. The acceptance for the first two types typically lies around only 10 %, while the third has better acceptance of 40 %.
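The selection logic of the three part solute move might be sketched as follows. This is a hedged illustration rather than the MCPRO implementation: the function name is hypothetical, the three types are assumed to be chosen with equal probability, and the "small" amplitudes (0.1 Å and 5°) are placeholder values that are not given in the text. The sketch only draws the trial displacement; applying it to the guest coordinates and testing it for acceptance are left out.

```python
import random

MAX_TRANSLATION = 0.4    # Å, large translation amplitude quoted in the text
MAX_ROTATION = 30.0      # degrees, large rotation amplitude quoted in the text
SMALL_TRANSLATION = 0.1  # Å, hypothetical "small" amplitude (not from the text)
SMALL_ROTATION = 5.0     # degrees, hypothetical "small" amplitude (not from the text)


def three_part_solute_move(rng=random):
    """Pick one of the three move types and draw a trial displacement for the guest."""
    kind = rng.choice(("translation", "rotation", "regular"))
    if kind == "translation":      # mostly translation, small rotation
        t_max, r_max = MAX_TRANSLATION, SMALL_ROTATION
    elif kind == "rotation":       # mostly rotation, small translation
        t_max, r_max = SMALL_TRANSLATION, MAX_ROTATION
    else:                          # regular solute move: both small
        t_max, r_max = SMALL_TRANSLATION, SMALL_ROTATION
    shift = [rng.uniform(-t_max, t_max) for _ in range(3)]  # x, y, z shift in Å
    angle = rng.uniform(-r_max, r_max)                      # rotation in degrees
    return kind, shift, angle
```

Mixing large and small amplitudes in this way lets occasional large displacements explore the cavity while the regular move keeps a reasonable overall acceptance.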


Figure 6.9: Acetamide in a cavity (ε = 1) embedded in a dielectric continuum (ε = 4.81) representing the solvent chloroform.

6.4 Parameterisation and Implementation of the GB/SA Continuum Model.

6.4.1 The GB/SA Continuum Model.

The generalised Born/surface area (GB/SA) continuum solvent model [33] provides an approximate means of studying the behaviour of a solute in solvent without explicitly modelling the solvent. Explicit solvent molecules are replaced by a polarisable dielectric continuum. Figure 6.9 shows acetamide in a cavity of dielectric ε = 1 embedded in chloroform continuum with ε = 4.81. The omission of explicit solvent offers two main advantages. Firstly, it leads to a solute-solvent calculation that is usually much quicker since there are no solute-solvent energies to compute. Secondly, it can lead to increased solute sampling because the solute no longer has the problem of steric clash with solvent molecules. It is this benefit principally that necessitated the use of continuum solvent for the macrobicycle 12 system. The use of continuum solvent effectively increases the speed of sampling of configurational space for the host-guest system because the continuum solvent is always in equilibrium around the solute. Other continuum solvent methods are in common use. These include the polarisable continuum method (PCM) [192] and the Poisson-Boltzmann (PB) method.[34] However,


GB/SA, being one of the fastest and most widely tested,[193] was adopted in this work. The solvation free energy, $G_{\mathrm{sol}}$, in the GB/SA model is broken up into three terms,

$$G_{\mathrm{sol}} = G_{\mathrm{cav}} + G_{\mathrm{vdW}} + G_{\mathrm{pol}} \tag{6.2}$$

where $G_{\mathrm{cav}}$ is primarily due to the entropy cost in forming a cavity for the solute in the solvent, $G_{\mathrm{vdW}}$ is the solute-solvent van der Waals term, and $G_{\mathrm{pol}}$ is the solute-solvent polarisation term. The first two terms are taken to be proportional to the solvent-accessible surface area (SASA) weighted according to atom type by the formula [194]

$$G_{\mathrm{SA}} \equiv G_{\mathrm{cav}} + G_{\mathrm{vdW}} = \sum_i \sigma_i\,(\mathrm{SASA})_i \tag{6.3}$$

where $(\mathrm{SASA})_i$ and $\sigma_i$ are the total area and atomic solvation parameters for atom type i, respectively. $(\mathrm{SASA})_i$ is the area of a particular surface around an atom i that defines the closest distance that a solvent molecule may approach. It is formed by augmenting the van der Waals radius of the atom by the solvent probe radius and removing any of this area inside the volume of other solute atoms. The chloroform probe radius was taken to be 2.5 Å.[195] The $(\mathrm{SASA})_i$ terms are calculated analytically using the method of Richmond for multiple overlapping spheres.[196] While exact, this calculation is computationally very expensive. One important note regarding the use of $G_{\mathrm{SA}}$ as an entropy term is that entropy contributions to free energies are always temperature-dependent. Therefore, the $\sigma_i$ parameters are also temperature-dependent and should not be used at temperatures different from that used in the parameterisation, which effectively amounts to the temperature at which the experimental quantities were obtained. The polarisation energy, $G_{\mathrm{pol}}$, is the energy required to form a cavity of dielectric 1 for a solute of n atoms with charge, $q_i$, each with radius, $\alpha_i$, in a continuum of dielectric constant, $\epsilon$, where the dielectric boundary is taken as the van der Waals


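surface of the molecule. $G_{\mathrm{pol}}$ is equal to the difference between the total electrostatic energy in solution and in vacuum and is given by the generalised Born equation [33]

$$G_{\mathrm{pol}}^{\mathrm{GB}} = -166\left(1 - \frac{1}{\epsilon}\right)\sum_{i=1}^{n}\sum_{j=1}^{n}\frac{q_i q_j}{\sqrt{r_{ij}^2 + \alpha_{ij}^2\,e^{-D_{ij}}}} \tag{6.4}$$

where $r_{ij}$ is the distance between atom i and j, $\alpha_{ij} = \sqrt{\alpha_i \alpha_j}$ and $D_{ij} = r_{ij}^2/(2\alpha_{ij})^2$.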
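To make the structure of Equation 6.4 concrete, the double sum can be written out directly. The following is a minimal sketch and not the Tinker or MCPRO routine: the function name and argument layout are assumptions, the effective Born radii are taken as given inputs, and with the prefactor of 166 the result is in kcal mol⁻¹ for charges in units of e and distances in Å.

```python
import math


def gb_polarisation_energy(coords, charges, born_radii, dielectric=4.81):
    """Generalised Born polarisation energy of Equation 6.4.

    coords     : list of (x, y, z) positions in Å
    charges    : partial charges in units of e
    born_radii : effective Born radii (alpha_i) in Å, assumed already known
    dielectric : solvent dielectric constant (4.81 for chloroform in the text)
    """
    prefactor = -166.0 * (1.0 - 1.0 / dielectric)
    total = 0.0
    n = len(charges)
    for i in range(n):
        for j in range(n):  # the sum includes the i == j self terms
            r2 = sum((coords[i][k] - coords[j][k]) ** 2 for k in range(3))
            alpha_ij = math.sqrt(born_radii[i] * born_radii[j])
            d_ij = r2 / (2.0 * alpha_ij) ** 2
            f_gb = math.sqrt(r2 + alpha_ij ** 2 * math.exp(-d_ij))
            total += charges[i] * charges[j] / f_gb
    return prefactor * total
```

For a single ion the sum reduces to the self term $q^2/\alpha$, recovering the Born equation, while at large separations the denominator tends to $r_{ij}$ and each pair term becomes a screened Coulomb interaction.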

The equation is called generalised because it is derived from the simpler Born equation which gives the energy for charging a single ion in a spherical cavity surrounded by a uniform dielectric. The radius, $\alpha_i$, commonly termed the effective Born radius, corresponds to the radius of that spherical cavity whose electrostatic energy, calculated using the Born equation, is the same as that for the entire molecule with all other atoms in the molecule still displacing solvent and their charges set to zero. It is possible to calculate it exactly by numerically integrating the ratio of the exposed area to the full area of a sphere over all radii centred at the atom i,[33] as given in the formula

$$\frac{1}{\alpha_i} = \int_{\rho_i}^{\infty} \frac{dr}{r^2}\,\frac{A_i(r)}{4\pi r^2} \tag{6.5}$$

where the limit, $\rho_i = 0.5\,(\sigma_i^{\mathrm{LJ}}\,2^{1/6})$, is the atomic van der Waals radius derived from the Lennard-Jones parameter $\sigma_i^{\mathrm{LJ}}$ of atom i, and $A_i(r)$ is the exposed surface area of atom i with radius r. This is a very demanding calculation numerically since $A_i(r)$ must be evaluated at every radius increment. However, it may be calculated in a very quick manner using the approximation of Hawkins et al.[197] called the pairwise descreening approximation. This simply assumes that all other atoms overlapping with atom i do not overlap with each other. Thus the total eclipsed surface area for atom i can be decomposed into pairwise terms solely with one other atom, for which there is a simple analytical formula. Of course, this assumption is not generally true and must be accounted for. The effect of this approximation is to overestimate the eclipsed area. Therefore, a compensating scaling factor, S, is introduced to reduce the van der Waals radii of neighbouring atoms. It simplifies the

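Born radii calculation by replacing the integral with the expression

$$\frac{1}{\alpha_i} = \frac{1}{\rho_i} - \sum_{j \neq i} \frac{1}{2}\left[\frac{1}{L_{ij}} - \frac{1}{U_{ij}} + \frac{r_{ij}}{4}\left(\frac{1}{U_{ij}^2} - \frac{1}{L_{ij}^2}\right) + \frac{1}{2r_{ij}}\ln\frac{L_{ij}}{U_{ij}} + \frac{\rho_j^2}{4r_{ij}}\left(\frac{1}{L_{ij}^2} - \frac{1}{U_{ij}^2}\right)\right] \tag{6.6}$$

where

$$L_{ij} = \begin{cases} 1 & : \; r_{ij} + \rho_j \le \rho_i \\ \rho_i & : \; r_{ij} - \rho_j \le \rho_i < r_{ij} + \rho_j \\ r_{ij} - \rho_j & : \; \rho_i \le r_{ij} - \rho_j \end{cases} \qquad U_{ij} = \begin{cases} 1 & : \; r_{ij} + \rho_j \le \rho_i \\ r_{ij} + \rho_j & : \; \rho_i < r_{ij} + \rho_j \end{cases}$$

except this time, for neighbouring atom j, $\rho_j = S \times 0.5\,(\sigma_j^{\mathrm{LJ}}\,2^{1/6})$ is the atomic van der Waals radius scaled by a screening factor, S. The scaling factor technique is also able to account in the reverse manner for any exposed area that is actually not able to contribute to solvation, such as would be expected for area in narrow gaps between atoms.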
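The pairwise descreening expression of Equation 6.6 is simple enough to sketch in full. The code below is an illustration under stated assumptions, not the Tinker or MCPRO implementation: the function names are hypothetical, a single screening factor is applied to every neighbour, the unscaled van der Waals radii are supplied directly rather than derived from Lennard-Jones parameters, and no safeguard is included for the case where the sum drives $1/\alpha_i$ to zero or below.

```python
import math


def pair_descreening(r_ij, rho_i, rho_j):
    """Contribution of neighbour j to the sum in Equation 6.6 (rho_j already scaled by S)."""
    if r_ij + rho_j <= rho_i:
        return 0.0  # L_ij = U_ij = 1: atom j lies wholly inside atom i, term vanishes
    upper = r_ij + rho_j
    lower = rho_i if r_ij - rho_j <= rho_i else r_ij - rho_j
    inv_l, inv_u = 1.0 / lower, 1.0 / upper
    return 0.5 * (inv_l - inv_u
                  + 0.25 * r_ij * (inv_u ** 2 - inv_l ** 2)
                  + math.log(lower / upper) / (2.0 * r_ij)
                  + rho_j ** 2 / (4.0 * r_ij) * (inv_l ** 2 - inv_u ** 2))


def born_radii(coords, vdw_radii, screening=1.0):
    """Effective Born radii from Equation 6.6 for atoms at `coords` (Å)."""
    radii = []
    for i, (xyz_i, rho_i) in enumerate(zip(coords, vdw_radii)):
        inv_alpha = 1.0 / rho_i
        for j, (xyz_j, rho_j) in enumerate(zip(coords, vdw_radii)):
            if j != i:
                inv_alpha -= pair_descreening(math.dist(xyz_i, xyz_j), rho_i,
                                              screening * rho_j)
        radii.append(1.0 / inv_alpha)
    return radii
```

An isolated atom keeps its van der Waals radius as its Born radius, and each neighbour that displaces solvent increases it; reducing the screening factor below 1 shrinks the neighbours and so reduces this effect, which is the compensation described above.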

6.4.2 Requirements for GB/SA.

To apply the GB/SA model, it first had to be coded into MCPRO. This was done by modifying code taken from an earlier implementation in MCPRO coded by Richard Taylor. In that implementation, the required GB/SA code had in turn been taken and modified from the Tinker software package.198 The surface area and Born calculations were adjusted to be performed for the whole molecule, rather than just on a residue basis. This was done because, after a host residue move, there was little benefit gained in updating the SASA around only that residue, since more than half of the SASA for the whole molecule usually changed. It was a similar story for calculating the Born radii. A particular bug was discovered in the SASA module for an atom that had two neighbours at exactly the same distance away from it, such as the two hydrogens on a methylene carbon. Since the bug was due to random numerical rounding, the easier option was taken to eliminate it, which involved removing the degeneracy by altering one distance in the sixth decimal place. Such a small change would have no effect on the geometry.

Secondly, a number of parameters were required. These are the atomic solvation parameters, $\gamma_i$, and the screening parameter, S. Such parameters are likely to be force field dependent. In the literature, no parameters could be found for OPLS-AA in chloroform. All that could be found in chloroform was parameters using OPLS-UA charges154 or SM5 charges.199 In any case, the implementation of the latter was dubious due to the greater complexity in the $\Delta G_{\mathrm{cav}}$ term and the excessive use of parameters, numbering at least twenty. Therefore, the parameters had to be derived. The standard way to derive parameters is to find the values that reproduce a particular property for a varied range of small molecules. The conventional properties used for GB/SA are experimental free energies of solvation to parameterise $\gamma_i$, and electrostatic free energies obtained from accurate computational calculations such as Poisson-Boltzmann to parameterise S.

6.4.3 Parameterisation to Poisson-Boltzmann Free Energies.

The method used to calculate the $\Delta G_{\mathrm{pol}}$ energies was finite difference Poisson-Boltzmann (PB).34 For this work, only the Poisson equation needs to be solved since the ionic strength is zero. The Poisson equation is given by

$$\nabla \cdot \left[\epsilon(\mathbf{r})\,\nabla\phi(\mathbf{r})\right] = -4\pi\rho(\mathbf{r}) \tag{6.7}$$

and it gives the electrostatic potential, $\phi(\mathbf{r})$, over all space, $\mathbf{r}$. Here, $\epsilon(\mathbf{r})$ is the dielectric constant and $\rho(\mathbf{r})$ is the charge density. This equation must be solved by numerical methods, such as finite difference, for molecules of complex shape. By performing this calculation once in the continuum solvent to give $\phi^{\mathrm{sol}}(\mathbf{r})$ and once in a dielectric of one to give $\phi^{\epsilon=1}(\mathbf{r})$, $\Delta G_{\mathrm{pol}}^{\mathrm{PB}}$ is given by

$$\Delta G_{\mathrm{pol}}^{\mathrm{PB}} = \frac{1}{2}\sum_i q_i\left(\phi_i^{\mathrm{sol}} - \phi_i^{\epsilon=1}\right) \tag{6.8}$$

where the sum is over all grid points. PB calculations were performed with a modified version of UHBD.200 A number of modifications to the original version had been coded into UHBD by Christopher Woods. These were the inclusion of the techniques of charge-antialiasing,201 15 point harmonic dielectric smoothing,201 and solute random rotational averaging.

The protocol for the PB calculations was as follows. The dielectric constant was taken as 4.81 for the continuum and 1 for the dielectric cavity. For the PB calculations, the boundary between these two regions of different dielectric was taken as the solute's solvent accessible boundary using a chloroform probe radius of 2.5 Å. The radius $\rho_H$ for polar hydrogens was set at 1.2 Å since the OPLS-AA force field assigns such atoms a zero van der Waals radius. Charged atoms with radii of zero lead to severe numerical problems in PB202 and infinite energies for GB (see Eq. 6.4). One important difference with GB was that the GB calculations were done using the van der Waals surface as the dielectric boundary. This difference in the treatment of the dielectric boundary is largely historical and practical, and cannot be attributed any physical significance. Each method is approximate and has been found to work best with such assumptions. A 65×65×65 grid of spacing 0.3 Å was used, giving a box size of 19.5 Å. The potential at the box boundary was taken to be the potential calculated as if each of the solute atoms were independent Debye-Hückel spheres. $\Delta G_{\mathrm{pol}}^{\mathrm{PB}}$ was calculated and averaged for ten random orientations of each molecule to remove rotational dependence due to the cubic grid.

20 molecules were used to find the S parameter that made $\Delta G_{\mathrm{pol}}^{\mathrm{GB}}$ give the best reproduction of $\Delta G_{\mathrm{pol}}^{\mathrm{PB}}$. The value obtained for S in this way was 0.56. As expected, it is less than unity and gives an indication of the degree of overlap between atoms. A number of workers have used multiple S parameters, one for each atom type,197, 203 on the grounds that different atoms overlap to different extents. However, not only could more parameters lead to overfitting, but one parameter was found to be quite sufficient. Some workers prefer to include another parameter called the dielectric offset, which moves the position of the dielectric boundary to improve the electrostatic term.33, 154 In chloroform, at least, the dielectric offset was found to be unnecessary.

The resulting energies are given in Table 6.3. A close fit between $\Delta G_{\mathrm{pol}}^{\mathrm{GB}}$ and $\Delta G_{\mathrm{pol}}^{\mathrm{PB}}$ was obtained with an average error of only 0.3 kcal mol⁻¹. Only aniline stands out as significantly different.

Table 6.3: PB, GB, SA, GB/SA and Experimental148, 199, 204 Free Energies for 20 Small Molecules (kcal mol⁻¹)

Molecule          ΔG_pol(PB)  ΔG_pol(GB)  SASA/Å²  ΔG_SA  ΔG_sol  ΔG_expt
water                -6.3        -5.8       228      3.7    -2.1    -2.1
methanol             -4.7        -4.3       297      0.0    -4.3    -3.3
ethanol              -4.5        -4.2       344     -0.8    -5.1    -3.9
acetamide            -7.4        -7.7       352      1.1    -6.6    -7.0
acetone              -3.9        -3.8       372     -1.6    -5.4    -5.0
butanone             -3.7        -3.4       403     -2.1    -5.6    -5.4
acetic acid          -5.1        -5.5       345      0.9    -4.6    -4.6
propanoic acid       -4.9        -5.4       388      0.1    -5.3    -5.4
methyl acetate       -2.9        -2.8       399     -1.8    -4.6    -4.9
acetaldehyde         -3.8        -3.6       332     -1.1    -4.6    -3.7
methylamine          -3.5        -2.8       303      0.3    -2.5    -3.4
dimethylamine        -3.1        -3.2       352     -1.9    -5.1    -3.7
diethyl sulfide      -3.2        -3.1       462     -4.2    -7.3    -6.4
benzene              -1.8        -1.8       409     -3.7    -5.5    -4.6
toluene              -2.1        -2.1       449     -4.0    -6.2    -5.5
chlorobenzene        -1.7        -1.5       439     -4.0    -5.5    -5.8
aniline              -3.5        -2.4       427     -1.4    -3.8    -6.7
phenol               -5.0        -4.8       423     -1.7    -6.5    -7.1
nitrobenzene         -3.5        -3.9       457     -3.9    -7.8    -7.8
pyridine             -3.5        -3.5       398     -2.5    -6.0    -6.5
Average error                     0.3                        0.7
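The two average errors quoted in Table 6.3 can be reproduced directly from the tabulated values. The sketch below assumes that "average error" means the mean unsigned difference, which is not stated explicitly in the text; the variable names are illustrative only.

```python
# (molecule, dG_pol PB, dG_pol GB, dG_sol GB/SA, dG_expt) in kcal/mol, copied from Table 6.3
TABLE_6_3 = [
    ("water", -6.3, -5.8, -2.1, -2.1),
    ("methanol", -4.7, -4.3, -4.3, -3.3),
    ("ethanol", -4.5, -4.2, -5.1, -3.9),
    ("acetamide", -7.4, -7.7, -6.6, -7.0),
    ("acetone", -3.9, -3.8, -5.4, -5.0),
    ("butanone", -3.7, -3.4, -5.6, -5.4),
    ("acetic acid", -5.1, -5.5, -4.6, -4.6),
    ("propanoic acid", -4.9, -5.4, -5.3, -5.4),
    ("methyl acetate", -2.9, -2.8, -4.6, -4.9),
    ("acetaldehyde", -3.8, -3.6, -4.6, -3.7),
    ("methylamine", -3.5, -2.8, -2.5, -3.4),
    ("dimethylamine", -3.1, -3.2, -5.1, -3.7),
    ("diethyl sulfide", -3.2, -3.1, -7.3, -6.4),
    ("benzene", -1.8, -1.8, -5.5, -4.6),
    ("toluene", -2.1, -2.1, -6.2, -5.5),
    ("chlorobenzene", -1.7, -1.5, -5.5, -5.8),
    ("aniline", -3.5, -2.4, -3.8, -6.7),
    ("phenol", -5.0, -4.8, -6.5, -7.1),
    ("nitrobenzene", -3.5, -3.9, -7.8, -7.8),
    ("pyridine", -3.5, -3.5, -6.0, -6.5),
]


def mean_unsigned_error(pairs):
    """Mean of |a - b| over an iterable of (a, b) pairs."""
    pairs = list(pairs)
    return sum(abs(a - b) for a, b in pairs) / len(pairs)


gb_vs_pb = mean_unsigned_error((gb, pb) for _, pb, gb, _, _ in TABLE_6_3)
gbsa_vs_expt = mean_unsigned_error((sol, expt) for _, _, _, sol, expt in TABLE_6_3)
worst_gb = max(TABLE_6_3, key=lambda row: abs(row[2] - row[1]))[0]
```

With this definition, `gb_vs_pb` rounds to 0.3 and `gbsa_vs_expt` to 0.7 kcal mol⁻¹, and the largest GB versus PB deviation belongs to aniline, in line with the text.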

6.4.4 Parameterisation to Experimental Free Energies.

Using the values of $\Delta G_{\mathrm{pol}}^{\mathrm{GB}}$, atomic solvation parameters, $\gamma_i$, were then obtained by fitting $\Delta G_{\mathrm{sol}} = \Delta G_{\mathrm{pol}}^{\mathrm{GB}} + \Delta G_{\mathrm{SA}}$ to experimental free energies.148, 199, 204 Since different atoms are expected to solvate to different extents, it is customary to assign different parameters, $\gamma_i$, to different atom types. Different workers have used a number of such parameters ranging from one33 up to twenty, admittedly using of the order of 100 molecules,199 but this is still too many parameters for the size of data set. An inspection of Table 6.3 reveals that at least two $\gamma_i$ parameters are required because some $\Delta G_{\mathrm{pol}}^{\mathrm{GB}}$ are greater than experiment and some are less. Using the rule that the number of parameters advisable is $\log_3 N$, where N is the number of molecules, this suggests 2-3 as the number of possible parameters. Ideally, more molecules would be used, but experimental free energy data could only be found for 20 varied molecules. Erring on the side of caution, the number of parameters chosen was 2. The best results were obtained when these were assigned according to the ability of the atom to hydrogen bond. Nitrogen and oxygen atoms were assigned one parameter, while the remaining atoms were assigned the other. Such an assignment has been used previously in a free energy model by Fraternali and van Gunsteren.205 The exception to the rule was nitrobenzene, for which all atoms were included in the other category on the grounds that the hydrogen bonding ability was being over-represented. The parameters obtained were 16 and -9 cal Å⁻² for $\gamma_{N/O}$ and $\gamma_{\mathrm{rest}}$, respectively.

Table 6.4: List of All Parameters Used in GB/SA Calculations.

Probe radius / Å   ρ_H / Å   γ_N, γ_O / cal Å⁻²   γ_rest / cal Å⁻²   S
2.5                1.2       16.0                 -9.0               0.56
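As a worked illustration of the two-parameter surface area term, the sketch below evaluates $\Delta G_{\mathrm{SA}} = \sum_i \gamma_i\,\mathrm{SASA}_i$ with one parameter for N and O and another for all remaining atoms. It assumes the second parameter is negative, as the later discussion of the parameter signs indicates, and that the units are cal mol⁻¹ Å⁻². The function name and input format are hypothetical. For molecules with no nitrogen or oxygen only the total SASA matters, so the totals from Table 6.3 can be used directly.

```python
GAMMA_POLAR = 16.0  # cal mol^-1 A^-2, nitrogen and oxygen atoms
GAMMA_REST = -9.0   # cal mol^-1 A^-2, all other atoms (sign assumed negative)


def surface_area_free_energy(atom_areas):
    """Return dG_SA in kcal/mol from (element, SASA in A^2) pairs."""
    total_cal = 0.0
    for element, area in atom_areas:
        gamma = GAMMA_POLAR if element in ("N", "O") else GAMMA_REST
        total_cal += gamma * area
    return total_cal / 1000.0


# Benzene and toluene contain only "rest" atoms, so their total SASA is lumped into one entry.
benzene_sa = surface_area_free_energy([("C", 409.0)])
toluene_sa = surface_area_free_energy([("C", 449.0)])
```

Under these assumptions the benzene and toluene values come out at about -3.7 and -4.0 kcal mol⁻¹, matching the $\Delta G_{\mathrm{SA}}$ column of Table 6.3.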

6.4.5 Performance of the Derived Parameters.

A complete list of all parameters used in GB/SA is given in Table 6.4. The free energy of solvation results with this parameterisation are presented in Table 6.3. An average error with experiment of 0.7 kcal mol⁻¹ was obtained. $\Delta G_{\mathrm{sol}}$ is well reproduced for most molecules, with the exception of aniline, which is too positive, and ethanol and dimethylamine, which are too negative. This is despite the fact that the most recent OPLS parameters are used for dimethylamine.206 No adjustment of the GB/SA parameters was able to bring these molecules into line.

It is interesting to note the sign of the $\gamma_i$ parameters. Chloroform is a hydrophobic solvent, so polar groups solvate less well in it than non-polar groups, hence the positive $\gamma_N$ and $\gamma_O$, and the negative $\gamma_{\mathrm{rest}}$. Going to a three parameter fit with a separate $\gamma$ for oxygen and nitrogen led to a marginal improvement, with an average error now of 0.6 kcal mol⁻¹. The new nitrogen parameter was 7 cal Å⁻², the rest being the same as before. However, the improvement was not enough to justify the inclusion of the extra parameter.

Since the electrostatic energy of solvation in chloroform is expected to be small compared to a solvent like water, one other parameterisation that was considered was to ignore the $\Delta G_{\mathrm{pol}}^{\mathrm{PB}}$ term altogether. The rough correlation between SASA and $\Delta G_{\mathrm{sol}}$ evident in Table 6.3 supports this assumption. $\Delta G_{\mathrm{sol}}$ was then parameterised with two variables according to the equation

$$\Delta G_{\mathrm{sol}} = \gamma \sum_i (\mathrm{SASA})_i + b. \tag{6.9}$$

The parameters obtained were $\gamma$ = -0.021 kcal mol⁻¹ Å⁻² and b = 2.7 kcal mol⁻¹. The average error was 0.7 kcal mol⁻¹, the same as the previous fit which also had 2 parameters. However, one critical outlier, by 2.3 kcal mol⁻¹, was acetamide, whose Born term is quite significant. While neglecting the Born term leaves an inherently simpler model, since the functionality of acetamide is critical to modelling macrobicycle 12, it was decided that this model was not appropriate.

The application of the model to the macrobicycle 12 system is straightforward, except for dummy atoms. While dummy atoms at one end of the perturbation have zero charge, if their Born radii, $\alpha_i$, are also zero, then very near to the ends of the perturbation, dummies have a small but still substantial charge in a very small dielectric cavity. Such a charge has a huge electrostatic energy and the presence of this would lead to energy instabilities. Therefore, the radius of a dummy atom was set to 1.0 Å. Such an atom is safely buried within the real atom it is bonded to and so will not contribute to $\Delta G_{\mathrm{pol}}^{\mathrm{GB}}$ at the mutation end points.

Figure 6.10: The dihedral distribution of the hydrocarbon chain for the free host run over 30 M configurations.
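The SASA-only model of Eq. 6.9 is simple enough to check against the data in Table 6.3. The sketch below is illustrative only: it assumes an ordinary least-squares fit, whereas the fitting criterion actually used is not stated, and it takes the slope as negative, which the tabulated data require. It also evaluates the quoted parameters (-0.021 and 2.7) to locate the largest outlier.

```python
# molecule: (SASA in A^2, experimental dG_sol in kcal/mol), copied from Table 6.3
TABLE_6_3 = {
    "water": (228, -2.1), "methanol": (297, -3.3), "ethanol": (344, -3.9),
    "acetamide": (352, -7.0), "acetone": (372, -5.0), "butanone": (403, -5.4),
    "acetic acid": (345, -4.6), "propanoic acid": (388, -5.4),
    "methyl acetate": (399, -4.9), "acetaldehyde": (332, -3.7),
    "methylamine": (303, -3.4), "dimethylamine": (352, -3.7),
    "diethyl sulfide": (462, -6.4), "benzene": (409, -4.6),
    "toluene": (449, -5.5), "chlorobenzene": (439, -5.8),
    "aniline": (427, -6.7), "phenol": (423, -7.1),
    "nitrobenzene": (457, -7.8), "pyridine": (398, -6.5),
}


def least_squares_line(points):
    """Closed-form least-squares slope and intercept for (x, y) points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def unsigned_errors(slope, intercept):
    """Per-molecule |predicted - experimental| for dG_sol = slope * SASA + intercept."""
    return {
        name: abs(slope * sasa + intercept - expt)
        for name, (sasa, expt) in TABLE_6_3.items()
    }


fitted_gamma, fitted_b = least_squares_line(list(TABLE_6_3.values()))
errors = unsigned_errors(-0.021, 2.7)  # parameters quoted in the text
mean_error = sum(errors.values()) / len(errors)
worst = max(errors, key=errors.get)
```

A least-squares fit gives a slope and intercept close to the quoted values, and with the quoted parameters the mean unsigned error rounds to 0.7 kcal mol⁻¹ with acetamide as the largest outlier at about 2.3 kcal mol⁻¹.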

6.5 Sampling of Macrobicycle 12 in Continuum Chloroform.

Before presenting the free energy results, it is necessary to demonstrate that the hydrocarbon chain of the host is indeed sampling adequately. Figure 6.10 illustrates the sampling in each dihedral. This was taken from a long run of 30 M configurations using the same protocol as in Subsection 7.3.1 but with no guest. The trajectory of the dihedrals is also shown in Figure 6.11. The sampling is now seen to be much improved on that obtained using explicit chloroform or with no extra MC moves. The middle six dihedrals are seen to move very frequently and produce the distribution expected for a hydrocarbon chain, with most in the trans conformation and the rest in the gauche. However, towards the ends of the hydrocarbon chain, the restraint of ring closure appears to limit the sampling. The end dihedrals barely change at all. Due to the $C_2$ symmetry of the host, a symmetrical distribution would be expected for each chain. Sampling is indeed observed to be reasonably symmetrical but not exactly so. Although improved, the sampling is not perfect. Symmetrical sampling is not helped by the fact that the host when alone adopts a very distorted, asymmetric structure.

Figure 6.11: The trajectory for the dihedrals in the hydrocarbon chain for the host alone run over 30 M configurations.

From the run of 30 M configurations, structures were saved every 0.1 M configurations, giving 300 in total. Of these, 97 had unique hydrocarbon chain conformations. Table 6.5 shows the four most common conformations. It can be seen that these are quite different to the most common conformations produced in the annealed structures. This is primarily due to the influence of the guest in the annealed structures and reinforces the importance of having the guest in the host for the annealing structure generation.

Table 6.5: Population of the four most common hydrocarbon chain conformations in the host.

Conformation   1    2    3    4
Population     29   24   15   12
Dihedral 1     t    t    t    t
Dihedral 2     g    g    g+   g
Dihedral 3     t    t    t    t
Dihedral 4     t    t    g+   t
Dihedral 5     g    g    g    g
Dihedral 6     t    g    g    t
Dihedral 7     t    t    t    t
Dihedral 8     t    t    t    t
Dihedral 9     g    g    t    t
Dihedral 10    g    t    g+   t
Dihedral 11    t    t    g+   t
Dihedral 12    t    t    g+   t
Dihedral 13    g    g    t    t
Dihedral 14    t    t    t    t

It is interesting to examine the structure of the host without the guest inside the cavity. A typical host structure is illustrated in Figure 6.12. The host structure with no guest is found to be quite different to the structure found when it is complexed with the guest. A simulation was carried out on a structure that had had the guest removed. As the simulation progressed, the thiourea inverted such that the sulfur pointed into the cavity and the polar hydrogens outwards. The cavity then collapsed in on itself with the two aryl walls coming together. Some degree of internal hydrogen bonding occurred between adjacent amide units. The hydrocarbon chain hung off in a loop with the thiourea lying sideways to the chains. Clearly, the guest, when it binds inside the host, must organise the host to a large extent in order to fit inside. This in turn would be expected to reduce the sampling of the host hydrocarbon chain to some extent. The different influences of each guest on sampling of the hydrocarbon chain are discussed later in Chapter 8.


Figure 6.12: The macrobicycle 12 structure without the guest.

6.6 Conclusion.

Methods to improve sampling of configurational space for the macrobicycle 12 system have been described. The MC moves introduced were the conrot, flip, large dihedral and three part solute moves. The GB/SA continuum solvent model for chloroform was implemented to replace the explicit solvent model. These modifications were shown to lead to a vast improvement in sampling. The simulation protocol is now ready for the calculation of free energies in the macrobicycle 12 system.

Chapter 7 Free Energy Calculations for Macrobicycle 12


The aim of this work is to obtain relative free energies of binding of all combinations of enantiomers and conformations for the various amino acid derivatives. These will be compared with experiment6 and rationalised, ideally leading to predictions of better hosts for binding. The simulation protocol is now ready for testing on the macrobicycle 12 system. What follows is a full description of the experimental system and the data that the simulations are trying to replicate. The free energy protocol and results in explicit solvent are described, together with their resultant sampling problems. These problems led to the implementation of additional MC moves and the continuum solvent model as discussed and implemented in Chapter 6. This in turn allows a comparison of the performance of the two solvation models. The host-guest free energies of binding obtained using the more successful continuum solvent protocol are then described and compared to experimental observations and free energies of binding.

7.1 The Macrobicycle 12 System.

7.1.1 The Simulation System.

Kilburn et al. have been designing receptors for amino acids and peptides. Macrobicycle 12 was designed to bind the carboxylate forms of these molecules using a thiourea moiety. The binding is due to two strong hydrogen bonds between the CO2⁻ of the amino acid derivatives and the six polar hydrogens inside the cavity of the host. Figure 7.1 illustrates the general binding pattern in a cross-section of the host. The centre two thiourea hydrogens are particularly important in this interaction. The host was designed so that its amide groups, two of which contain a chiral centre, would lie adjacent to the guest. These would provide hydrogen bonds to enforce not just amino acid specificity and enantioselectivity, but also conformational specificity, as was subsequently discovered. Two biaryl methane units were included to link up the amide groups to rigidify the structure into a double ring.

Figure 7.1: The general binding mode for N-Ac-L-phenylalanine to macrobicycle 12 (cross-section). The hydrogen bonds on the right between the guest carbonyl oxygen and a polar hydrogen of the host are suspected of stabilising the cis conformation.6

The guests themselves are actually tetrabutylammonium amino acid derivative salts. Their amide nitrogen is capped by an acetyl group in order to keep this end of the molecule neutral. The other end is a charged carboxylate group counterbalanced by a tetrabutylammonium (TBA) ion. The experiments were performed in CDCl3 so that ¹H NMR spectra could be recorded. In the computer simulations, one macrobicycle 12 molecule is modelled with one amino acid. The TBA is not included in the simulations for three reasons. It is not considered important to binding; it is not necessary that the system be neutral in charge; and the inclusion of a second ion would result in considerable sampling and energetic problems.

Seven different amino acids were studied experimentally. Of these, the glycine (Gly), alanine (Ala) and phenylalanine (Phe) derivatives were chosen as a representative sample for study (pictured in Figure 1.1). These abbreviations for the amino acid derivatives are used in the remainder of the text. This sample of three amino acids provides a reasonable range in size of side chains, ranging from a hydrogen for Gly to a methyl for Ala through to a benzyl for Phe. Furthermore, most of the experimental information was obtained for Ala and Phe. The other four amino acids available for testing were asparagine, glutamine, histidine and lysine. They were not considered because the NH2 groups they contain can cause problems with protonation state. The lysine group was also rather flexible, although the conrot moves implemented later in the protocol would have been capable of adequately sampling it. In any case, the aim of the study was not so much to examine amino acid specificity but rather the preferred enantiomer and conformation of each amino acid.

The solvent used was either explicit or continuum chloroform, not CDCl3. This was because only chloroform force field parameters were available, although the difference is expected to be negligible. For example, the OPLS force field does not even treat the hydrogen explicitly and uses a united atom representation for explicit chloroform with the hydrogen absorbed by the central carbon.

7.1.2 Experimental Data.

There were two main sets of experimental data with which to compare the simulations. The first of these is binding data. Binding constants were calculated by comparing the partitioning of guests between water and CDCl3, firstly with no host in the CDCl3, and secondly with the host present. The temperature of the system was 20 °C. The experimental binding data is given in Table 7.1. Benzoic acid and hexanoic acid were also studied to gauge the significance of the carboxylate-host interaction. Cbz-glycylglycine (glycine with a protecting group) was available from the syntheses and so was also tested for completeness. The similar binding data obtained for all molecules suggested that almost all the binding was due to the carboxylate-host interaction and not to the other differences between each amino acid, hence the lack of specificity observed.

Table 7.1: Association Constants for Macrobicycle 12 with Various Tetrabutylammonium Carboxylates in CDCl3.6

Amino acid              Ka / 10³ mol⁻¹   ΔG / kcal mol⁻¹
N-Ac-glycine             68.6 ± 7.9      -6.47 ± 0.07
N-Ac-L-alanine           16.9 ± 5.9      -5.66 ± 0.22
N-Ac-D-alanine           14.6 ± 2.0      -5.57 ± 0.07
N-Ac-L-phenylalanine     22.0 ± 2.8      -5.81 ± 0.07
N-Ac-D-phenylalanine     13.3 ± 7.0      -5.52 ± 0.10
N-Ac-L-asparagine         9.6 ± 7.0      -5.33 ± 0.05
N-Ac-D-asparagine         6.8 ± 2.9      -5.14 ± 0.05
N-Ac-L-glutamine         11.1 ± 8.0      -5.43 ± 0.12
N-Ac-L-histidine          5.8 ± 8.0      -5.04 ± 0.07
N-Ac-L-lysine           130.0 ± 20.0     -6.86 ± 0.07
Cbz-glycylalanine         8.9 ± 1.6      -5.28 ± 0.12
benzoic acid             55.4 ± 17.4     -6.36 ± 0.22
hexanoic acid            28.1 ± 3.3      -5.95 ± 0.07

It was the second set of experimental data, the NMR data, that revealed the interesting binding behaviour of the host. Firstly, ¹H spectra revealed the stereoselectivity of the host. While in each case the guest was observed to bind to the thiourea unit, D guests showed a preference to bind on the outside of the host, while L guests preferred to bind on the inside. Secondly, ROESY spectra indicated the presence of a cis amide bond for the L forms of the two amino acids tested, Ala and Phe, and trans for the D forms. 70% of the L-Phe appeared to be in the cis conformation, suggesting that some was still present in the trans conformation, most likely bound on the outside of the host like the D form.

In addition, some structural modelling work has been done on the system.6 A range of structures were generated using simulated annealing and molecular dynamics simulations with a united atom representation. An analysis of these structures revealed the hydrogen bonding pattern responsible for the binding and stabilisation of the cis amide bond. The carbonyl oxygen of the guest appeared to be hydrogen bonding to an amide hydrogen in the main ring of the host as shown in Figure 7.1. Possible binding structures for the D enantiomers on the outside were also investigated. However, no energetic studies were undertaken, nor were rationalisations offered of why different amino acids bound differently.
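The free energies in Table 7.1 follow from the association constants through $\Delta G = -RT \ln K_a$. The sketch below is a consistency check rather than the procedure used in the experimental work; it assumes a temperature of 293.15 K (20 °C), a 1 M standard state, and that the tabulated $K_a$ values are in units of 10³ per molar.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1
TEMPERATURE = 293.15  # K, i.e. 20 degrees C


def binding_free_energy(k_assoc):
    """Return dG = -RT ln(Ka) in kcal/mol for an association constant in M^-1."""
    return -R_KCAL * TEMPERATURE * math.log(k_assoc)


dg_glycine = binding_free_energy(68.6e3)
dg_lysine = binding_free_energy(130.0e3)
dg_hexanoic = binding_free_energy(28.1e3)
```

Under these assumptions the computed values agree with the tabulated magnitudes to within about 0.02 kcal mol⁻¹; the small residual probably reflects a slightly different temperature or gas constant in the original analysis.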

7.1.3 The Role of Computer Simulations.

It is primarily the results concerning the stabilisation of the cis amide bond that this work aims to rationalise. Computer simulations are ideal for this for a number of reasons. They provide another means of calculating binding free energies. They make possible the study of all binding complexes, both strong and weak. The weak complexes are difficult to probe by experiment since they are rarely observed, if at all. Finally, they provide energetic and structural information that gives clues to how the binding occurs.

The rest of this chapter concerns the free energy calculations themselves. The preference for one particular molecule to bind inside the cavity over another requires only relative free energies of binding. These quantities are calculated using the method described in Subsection 2.3.2. Free energy perturbations are performed once in the host and once in chloroform, and their difference gives the relative binding free energy using Eq. 2.25. The actual free energy mutations performed are between all L and D, cis and trans forms of Gly, Ala and Phe to obtain relative binding free energies. These mutations are shown in Figure 7.2. Perturbations were constructed between molecules that were the most similar in shape so as to keep the mutations small and minimise the computational effort. Gly is mutated to Ala, Ala to Phe, and cis to trans. There are no L to D mutations since these require two large perturbations rather than one. Gly, which has no stereochemistry, serves as the connection between the L and D molecules. Experiment showed that the guests were able to bind either inside or outside the cavity. Since all the interesting binding behaviour appeared to be occurring for guests bound inside the cavity of the host, mutations were only performed for guests in this position. Guests bound outside the cavity are in a more solvent-like environment and will be less influenced by the host. Their relative free energies would be expected to more closely resemble those calculated in pure chloroform. To further understand whether the solvent is able to play a role in stabilising any enantiomer or conformation, free energy perturbations were also performed in the gas phase so that relative free energies of solvation could be obtained using Eq. 2.27. Unlike the free energy of hydration calculations in Chapter 5, the molecules are now flexible, so free energies for the gas phase mutations have to be calculated.

Figure 7.2: The free energy perturbations performed to calculate relative binding free energies. (Species shown: L-cis-Phe, L-trans-Phe, L-cis-Ala, L-trans-Ala, cis-Gly, trans-Gly, D-cis-Ala, D-trans-Ala, D-cis-Phe and D-trans-Phe.)
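The bookkeeping behind the thermodynamic cycle can be sketched as follows. Each mutation contributes the difference between its free energy in the host and in chloroform, and species that are not directly connected, such as the L and D forms of Ala, are related by summing along a path through Gly. All numerical values below are hypothetical placeholders, not results from this work, and the function names are illustrative.

```python
def relative_binding(dg_in_host, dg_in_solvent):
    """Relative binding free energy of a mutation A -> B from the two legs of the cycle."""
    return dg_in_host - dg_in_solvent


def along_path(edges, path):
    """Sum relative binding free energies along a path of species names.

    `edges` maps (start, end) to the value for start -> end; an edge
    traversed backwards contributes with the opposite sign.
    """
    total = 0.0
    for start, end in zip(path, path[1:]):
        if (start, end) in edges:
            total += edges[(start, end)]
        else:
            total -= edges[(end, start)]
    return total


# Hypothetical free energies (kcal/mol) for two mutations starting from cis-Gly.
edges = {
    ("cis-Gly", "L-cis-Ala"): relative_binding(1.2, 0.4),
    ("cis-Gly", "D-cis-Ala"): relative_binding(2.0, 0.4),
}
l_to_d = along_path(edges, ["L-cis-Ala", "cis-Gly", "D-cis-Ala"])
```

With these placeholder numbers the L to D difference is 0.8 kcal mol⁻¹. Because free energy is a state function, any closed loop in Figure 7.2 should sum to zero within error, which gives a useful internal consistency check.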

7.2 Explicit Solvent Free Energy Calculations.

7.2.1 Gas Phase Simulation Protocol.

The gas phase free energy simulations were performed using MCPRO.32 Free energies were calculated using FEP. Configurations were generated at a temperature of 293 K using MC Metropolis sampling. Each window had 3 million (M) configurations of equilibration and 5 M of data collection. For gas phase calculations, this was found to be well in excess of that required to give converged results, but the calculations were so fast that this was not an issue. In the regular solute moves, all Z-matrix variable angles and dihedrals were sampled while all bonds were kept fixed. The only additional special MC move necessary for these calculations was the large dihedral move (see Subsection 6.3.7) to sample the swing dihedral of Phe. This large dihedral move comprised 5% of all attempted configurations. It is important to note that the amide variable dihedral angle was sampled to a small extent but not so much that it could interconvert between cis and trans during a single simulation. This was achieved by setting its maximum amplitude to 27°, which is far too small to allow it to climb over the 14 kcal mol⁻¹ barrier separating the two conformations.

Free energies were calculated using the same method as described in Subsection 5.1.3. The error for each window was taken as the difference between forward and reverse free energies. Combining these errors for all windows gave a total error for the perturbation. With no solvent to crash into, molecules in most cases could perturb quite quickly without introducing hysteresis. Therefore the windows could be spaced fairly widely, with the number of windows ranging from 2 for Gly→Ala mutations up to 10 for Phe cis→trans mutations. Phe required more windows because it is a lot more flexible than Gly. Identical starting geometries were used for each window. Mutations of isolated molecules were only performed for the D isomers since it was assumed and verified that L and D isomers would give exactly the same result in the gas phase. Initial dummy atom bond lengths were set to 0.2 Å. The three types of mutation are illustrated in Figure 7.3. For the cis→trans mutations, an oxygen and three dummy atoms were perturbed to a methyl group, and the reverse process applied for the methyl group, effectively swapping the two.
Such a perturbation is not only simpler than a direct perturbation around the dihedral angle of the amide bond, but it is also a path with a much lower energy barrier, avoiding the large dihedral term and the steric clash which would occur when the perturbation was performed in the host. The Gly→Ala mutations were performed by growing a hydrogen and three dummy atoms into a carbon and three hydrogen atoms, respectively, with concomitant increases in bond lengths. For the Ala→Phe mutations, a hydrogen and ten dummy atoms were grown into a phenyl group, a much larger mutation.

Figure 7.3: The three types of perturbation performed for the amino acid derivatives. (Panels: N-Ac-cis-glycine to N-Ac-trans-glycine; N-Ac-glycine to N-Ac-alanine; N-Ac-alanine to N-Ac-phenylalanine.)
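The per-window free energy and hysteresis estimate described in this protocol can be sketched with the standard exponential averaging (Zwanzig) formula. This is an illustration under stated assumptions, not the MCPRO implementation: the window estimate is taken as the mean of the forward value and the negated reverse value, the window error as their mismatch, and the window errors are combined in quadrature, which the text does not specify.

```python
import math

K_B = 0.0019872041  # Boltzmann constant in kcal mol^-1 K^-1


def zwanzig(energy_differences, temperature=293.0):
    """Free energy from exponential averaging of sampled energy differences (kcal/mol)."""
    kt = K_B * temperature
    shift = min(energy_differences)  # subtract the minimum for numerical stability
    mean_boltzmann = sum(
        math.exp(-(du - shift) / kt) for du in energy_differences
    ) / len(energy_differences)
    return shift - kt * math.log(mean_boltzmann)


def window_free_energy(forward_du, reverse_du, temperature=293.0):
    """Return (estimate, hysteresis) for one window.

    forward_du: energy differences for stepping lambda_i -> lambda_i+1
    reverse_du: energy differences for stepping lambda_i+1 -> lambda_i
    """
    dg_forward = zwanzig(forward_du, temperature)
    dg_reverse = zwanzig(reverse_du, temperature)
    return 0.5 * (dg_forward - dg_reverse), abs(dg_forward + dg_reverse)


def perturbation_free_energy(windows, temperature=293.0):
    """Sum window estimates; combine window errors in quadrature (an assumption)."""
    results = [window_free_energy(f, r, temperature) for f, r in windows]
    total = sum(estimate for estimate, _ in results)
    error = math.sqrt(sum(hysteresis**2 for _, hysteresis in results))
    return total, error
```

For a window whose forward and reverse samples agree perfectly, the hysteresis is zero; a target such as the 0.1 kcal mol⁻¹ hysteresis mentioned later for window spacing would be checked against the second value returned by `window_free_energy`.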

7.2.2 Explicit Chloroform Protocol.

The protocol for the perturbation of the guest alone in explicit chloroform had a number of differences to the gas phase protocol. Simulations were performed in a box of side 33 Å containing 265 OPLS chloroform molecules.97 Configurations were generated in the NPT ensemble at 20 °C and 1 atm. There were 3 M configurations of equilibration and now 10 M of data collection. Preferential sampling207 was used to improve the sampling of solvent molecules around the solute. Periodic boundary conditions were used together with a non-bonded molecule-based cutoff radius of 10 Å, although no feathering of the potential was included for these simulations (see Subsection 3.5). With the exception of the large dihedral move, none of the improved sampling schemes described in Chapter 6 were used in this particular protocol. Maximum move sizes for solute translations and rotations were selected to be 0.15 Å and 15°. The maximum volume move sizes were set to 700 Å³. Regular solute moves were attempted 5% of the time, large dihedral moves 2%, and volume moves 0.7%, with the remainder being solvent moves. The regular solute moves contained solute translations and rotations of 0.015 Å and 1.5°. Between 10 and 15 windows were included and carefully spaced, since the perturbation had to be done slowly to avoid clashes with the solvent. Again, only the D isomers were considered.

Two protocols for the host in chloroform were tested. The first protocol lacked all of the additional sampling schemes discussed in Chapter 6. The second protocol was more advanced and included all the additional MC moves from Section 6.3. Apart from the differences described here, the remainder of the protocol was the same as for the guest in explicit chloroform. Compared to the guest-only simulations, a much larger box was used for both protocols. The box was of dimension 41×44×45 Å and contained 592 chloroform molecules to enclose the larger host. The maximum size of volume moves was set to 1000 Å³ to reflect the larger box volume.

For the first protocol, two different starting structures were examined. The first was taken from a structure used in the previous modelling work on the system.6 The second was another structure of similar energy with the hydrocarbon chain in a different conformation. There were 3 M configurations of equilibration and 10 M configurations of data collection per window. The breakdown of move attempts was 0.017% volume, 3.3% host residue and 1% regular solute, and the rest solvent. Perturbations were only run for the thermodynamic cycle containing the cis and trans forms of Gly and L-Ala in Figure 7.2.
For the second protocol, 17 dierent host starting structures were taken from the lowest energy host-guest structures generated from the simulated annealing runs described in Subsection 6.1.1. The purpose of this was to examine the eect of starting structure on the free energies obtained. 5 M congurations of equilibration and 20 M of data collection were used per window. The breakdown of move attempts was 0.017 % volume, 3.3 % host residue, 2.5 % three part solute for the guest, 1 %

CHAPTER 7. FREE ENERGY CALCULATIONS FOR MACROBICYCLE 12153 conrot and 1 % ip, the remainder being solvent moves. Only the cis-Gly to trans-Gly perturbation was performed for the second protocol.
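The breakdown of move attempts quoted for the second host protocol can be read as a set of selection probabilities. The following is a minimal sketch of how such a breakdown might drive move selection; it is not the MCPRO implementation, and the dictionary keys and the function name `choose_move` are labels assumed here for illustration.

```python
import random

# Move-attempt percentages quoted for the second host protocol in explicit
# chloroform. The key names are illustrative labels, not MCPRO identifiers.
MOVE_PERCENT = {
    "volume": 0.017,
    "host_residue": 3.3,
    "three_part_solute": 2.5,
    "conrot": 1.0,
    "flip": 1.0,
}
# Solvent moves take whatever remains of the 100%.
MOVE_PERCENT["solvent"] = 100.0 - sum(MOVE_PERCENT.values())


def choose_move(rng):
    """Pick one move type with probability proportional to its percentage."""
    names = list(MOVE_PERCENT)
    weights = [MOVE_PERCENT[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the tally is reproducible
    counts = {name: 0 for name in MOVE_PERCENT}
    for _ in range(100_000):
        counts[choose_move(rng)] += 1
    print(counts)
```

With these figures the solvent share works out at a little over 92%, which makes plain how small a fraction of the attempted moves actually acts on the host and guest.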

7.2.3 Window Spacing.

Before presenting the results, it is worth commenting briefly on window spacing in free energy perturbation calculations. The spacing of windows is important for convergence. For a fixed number of total configurations, more windows means a shorter sampling length per window. The optimum situation is to obtain the most accurate free energy change possible for a given total simulation time. This was not so important an issue for the free energy of hydration calculations in Chapter 5, which were fairly cheap, but for the macrobicycle 12 system, the expensive simulations necessitated some window spacing optimisation. Firstly, a balance had to be struck between the number of windows and the length of simulation. Then the windows had to be spaced in the most efficient manner. While there was no hard and fast protocol, the main assessment criterion in this work for evenly spaced windows was reducing the hysteresis between forwards and reverse windows to around 0.1 kcal mol⁻¹ for all windows. Generally, the window spacing was a more critical factor than simulation length in achieving this.

There have been a number of recommendations in the literature as to the optimum way to space windows. One approach is to spread the free energy change evenly between each window,²⁰⁸ an approach that requires iteration. Another is to use the statistical error as the guide, placing additional windows between two existing windows so as to equalise²⁰⁹ or minimise⁵² the error in each direction. Yet another is to equalise the entropy differences for each window,²¹⁰ but this requires a separate simulation to determine the spacing.

A number of these methods involving partitioning according to the errors and free energies were tested. While some success was obtained, such approaches require prior simulation and are generally flawed. The flaw arises because the large energy differences generated for the perturbed states are usually due to Lennard-Jones contributions rather than electrostatics.¹⁵⁵ Hence the free energy, which depends on both terms, does not serve as a suitable guide. Statistical errors can also be misleading since they can arise from other causes and do not necessarily indicate bad window placement. On the one hand, two consecutive windows can give very well converged but quite different results, while on the other hand, the size of the error can scale with the free energy change rather than reflect any underlying statistical uncertainty.

The approach adopted here was to assume that the windows should be spaced to minimise large changes in Lennard-Jones energies. Rather than calculating Lennard-Jones energies, though, it was assumed that the change in the molecule's solvent accessible surface area (SASA) would correlate well with these energies, so that an even spacing of windows with respect to SASA would correspond approximately to an even change in Lennard-Jones energy. At each λ value, SASA could be quickly calculated. Extensive sampling might still be necessary to obtain an average value of SASA. However, due to the approximate nature of the approach, such a refinement was not considered to be necessary and a single configuration was used. The hypothesis was indeed found to hold and subsequently proved of great use in spacing windows to obtain the optimum distribution.

Figure 7.4: The change in SASA and free energy with λ for the cis-Ala to cis-Phe perturbation in explicit chloroform. (Axes: SASA /Å² and ΔG /kcal mol⁻¹ against λ from 0 to 1.)

Figure 7.4 shows how SASA and the free energy vary with λ for the cis-Ala to cis-Phe perturbation in explicit chloroform. The small change in area for small λ allows a large spacing between windows. This leads to large but well converged free energy changes. For high λ, the area changes rapidly, necessitating closely spaced windows. Despite this close spacing, there is still a moderate error in the free energy changes. These errors would have been even worse had the windows been more distantly spaced.

Figure 7.5: The gas phase relative free energy perturbation results (kcal mol⁻¹). The square box gives the closure for each cycle.
  cis-Gly to trans-Gly: -3.19; d-cis-Ala to d-trans-Ala: -3.67; d-cis-Phe to d-trans-Phe: -4.15
  cis-Gly to d-cis-Ala: 3.84; trans-Gly to d-trans-Ala: 3.35
  d-cis-Ala to d-cis-Phe: 3.60; d-trans-Ala to d-trans-Phe: 3.11
  Closures: 0.01 (Gly/Ala cycle) and 0.01 (Ala/Phe cycle)

As for the number of windows, a certain number was required to reach the desired hysteresis goal. However, if too many windows were added, the accumulated error from such a large number of windows outweighed the gain from closer window spacing. Such an imbalance could only be rectified by longer sampling. Hence 10–15 windows were used as a compromise.
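The SASA-guided spacing described in this subsection can be sketched as follows. This is an illustrative reconstruction and not the procedure actually coded for the thesis: it assumes SASA has been evaluated on a coarse λ grid, accumulates the absolute area change along that grid, and places windows at equal increments of the accumulated change using linear interpolation. The function name and the example numbers are invented for illustration.

```python
def space_windows_by_sasa(lambdas, sasa, n_windows):
    """Return n_windows lambda values spaced evenly in cumulative |dSASA|.

    lambdas: increasing lambda grid; sasa: SASA evaluated at each grid point.
    """
    if len(lambdas) != len(sasa) or len(lambdas) < 2 or n_windows < 2:
        raise ValueError("need matching grids of length >= 2 and n_windows >= 2")

    # Cumulative absolute SASA change along the grid (non-decreasing).
    cumulative = [0.0]
    for i in range(1, len(sasa)):
        cumulative.append(cumulative[-1] + abs(sasa[i] - sasa[i - 1]))
    total = cumulative[-1]
    if total == 0.0:
        # No area change at all: fall back to even spacing in lambda.
        lo, hi = lambdas[0], lambdas[-1]
        return [lo + (hi - lo) * k / (n_windows - 1) for k in range(n_windows)]

    windows = [lambdas[0]]
    seg = 1
    for k in range(1, n_windows - 1):
        target = total * k / (n_windows - 1)
        # Advance to the grid segment that contains the target area change.
        while cumulative[seg] < target:
            seg += 1
        span = cumulative[seg] - cumulative[seg - 1]
        frac = (target - cumulative[seg - 1]) / span
        windows.append(lambdas[seg - 1] + frac * (lambdas[seg] - lambdas[seg - 1]))
    windows.append(lambdas[-1])
    return windows


if __name__ == "__main__":
    # Invented SASA profile that changes slowly at small lambda and
    # rapidly at large lambda, mimicking the trend described for Figure 7.4.
    grid = [0.0, 0.25, 0.5, 0.75, 1.0]
    area = [100.0, 102.0, 106.0, 120.0, 150.0]
    print(space_windows_by_sasa(grid, area, 6))
```

For a profile of this shape the windows come out widely spaced at small λ and crowded towards λ = 1, which is the qualitative behaviour the text describes.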

7.2.4 Guest Free Energies in the Gas Phase.

The gas phase free energy results are given in Figure 7.5. The numbers in boxes in the middle of each cycle represent the closure of the thermodynamic cycle. Ideally, this value would be 0. It can be seen that the gas phase runs are very precise, with a closure of 0.01 kcal mol⁻¹ for each cycle. The statistical errors are less than 0.01 kcal mol⁻¹ and are not included since they are insignificant compared to the errors obtained for the chloroform and host simulations.

Figure 7.6: The relative free energies of solvation of the amino acid derivatives in explicit chloroform (kcal mol⁻¹). The square box gives the closure for each cycle.

There are a number of interesting points to note from the results. Firstly, the stability by 3–4 kcal mol⁻¹ of the trans form relative to the cis is evident for all three amino acid derivatives, as would be expected. This is in reasonable accord with previous experimental⁸ and theoretical²¹¹ results, which both gave 2.6 ± 0.4 kcal mol⁻¹ for N-methyl acetamide in water. Importantly, the cis to trans energy difference appears to be largely independent of side group. The second point is the difference in energy between different amino acid derivatives. There is an increase in free energy observed going from Gly to Ala. This is probably due to a rise in internal energy of 3 kcal mol⁻¹ that accompanied this mutation. However, the increase in free energy going from Ala to Phe only has an internal energy rise of 2 kcal mol⁻¹. The difference between the internal energy and free energy probably arises from a loss of entropy due to the presence of a hindering phenyl group.
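The closure quoted for each gas phase cycle is the difference between the two routes around that cycle. The sketch below recomputes it from the values printed in Figure 7.5. The assignment of each number to a particular arrow, and the arrow directions (cis to trans, and smaller to larger side chain), are assumptions made here from the figure layout; the d- prefixes are dropped from the state names for brevity.

```python
# Gas phase free energy changes (kcal/mol) as read from Figure 7.5.
# Which arrow each value belongs to is an assumption of this sketch.
DG = {
    ("cis-Gly", "trans-Gly"): -3.19,
    ("cis-Ala", "trans-Ala"): -3.67,
    ("cis-Phe", "trans-Phe"): -4.15,
    ("cis-Gly", "cis-Ala"): 3.84,
    ("trans-Gly", "trans-Ala"): 3.35,
    ("cis-Ala", "cis-Phe"): 3.60,
    ("trans-Ala", "trans-Phe"): 3.11,
}


def path_sum(path):
    """Sum the free energy changes along consecutive states in a path."""
    return sum(DG[(a, b)] for a, b in zip(path, path[1:]))


def closure(small, large):
    """Absolute difference between the two routes from cis-small to trans-large."""
    via_trans = path_sum([f"cis-{small}", f"trans-{small}", f"trans-{large}"])
    via_cis = path_sum([f"cis-{small}", f"cis-{large}", f"trans-{large}"])
    return round(abs(via_trans - via_cis), 2)


if __name__ == "__main__":
    print(closure("Gly", "Ala"), closure("Ala", "Phe"))
```

Under this assignment both cycles close to 0.01 kcal mol⁻¹, in agreement with the boxed values in the figure, which gives some support to the assumed arrow labelling.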

7.2.5 Guest Free Energies in Explicit Chloroform.

By subtracting the free energy change for a perturbation in the gas phase from the corresponding free energy change in chloroform, relative free energies of solvation are obtained by Eq. 2.27. Figure 7.6 contains the relative free energies of solvation in chloroform between all the amino acid derivatives. Errors are now of the order 0.1–0.3 kcal mol⁻¹ and the thermodynamic closures of 0.09 and 0.22 kcal mol⁻¹ are less exact than before, reflecting the poorer sampling for the solvated system. The solvent has a small and mixed effect on the cis to trans equilibrium. Compared to the respective trans forms, cis-Gly is slightly destabilised by the solvent, cis-Ala is unaffected, while cis-Phe is stabilised by the moderate amount of 0.88 kcal mol⁻¹. Another observation is that the relative free energies of solvation increase in magnitude with increasing molecular size. This is to be expected given the greater number of favourable interactions a larger molecule can have with the solvent.
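The subtraction described at the start of this subsection can be written compactly as below. The symbols are notation assumed here for a perturbation from state A to state B; Eq. 2.27 itself is not reproduced in this section, so this is a restatement of the sentence rather than a quotation of that equation.

```latex
\Delta\Delta G_{\mathrm{solv}}(A \to B)
  = \Delta G_{\mathrm{CHCl_3}}(A \to B) - \Delta G_{\mathrm{gas}}(A \to B)
```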

Figure 7.7: The host in explicit chloroform relative binding free energies (kcal mol⁻¹) using two different starting structures with no special MC moves. The square box gives the closure for each cycle.

7.2.6 Host-Guest Free Energies in Explicit Chloroform.

The relative free energies of binding for the cis and trans conformations of Gly and l-Ala are obtained by subtracting the free energy change for the guest perturbation in chloroform from the free energy change for the same perturbation in the host (Eq. 2.25). The results are presented in Figure 7.7 for the two different starting structures. The closures for these cycles of 0.48 and 0.72 kcal mol⁻¹ are now much worse than for the runs in chloroform, even though both sets of runs have the same 10 M configurations of data collection. Obviously the larger system now being studied would require more sampling to make a fairer comparison. However, the discrepancy in free energies obtained for each structure makes it clear that there is a more serious problem. In the first structure, the relative free energies of the cis to trans structures are 0.10 and 0.68 kcal mol⁻¹. However, in the second structure, the same perturbations have free energies of 3.96 and 3.45 kcal mol⁻¹. Such a dependence of free energies on starting structure is very unsatisfactory and quite misleading. One structure indicates a stabilisation of the cis structure, while another indicates no stabilisation. This, together with the poor sampling of the host in explicit solvent as discussed in Subsection 6.1.3, led to the development of improved sampling schemes involving additional MC moves.

Figure 7.8: Distribution of relative free energies of binding to the host for cis-Gly with respect to trans-Gly using 17 different starting host geometries. (Horizontal axis: relative binding free energy /kcal mol⁻¹.)

These additional MC moves were incorporated into the second protocol in explicit solvent. The free energies were calculated going from cis-Gly to trans-Gly using 17 different starting structures taken from the annealing runs. Figure 7.8 shows the results for the relative free energies of binding obtained. It can be seen that despite the inclusion of the additional MC moves, the relative free energies of binding still show a marked dependence on starting structure. The positive relative binding free energies around 4 kcal mol⁻¹ for some host structures indicate that the host is stabilising the cis conformation. However, for most structures the relative binding free energies are centred around 0 kcal mol⁻¹. It should be noted that all the structures that did produce stronger binding for the cis conformation came from annealed structures containing a cis guest. This suggests that these host structures were possibly biased towards stabilising cis guests. To develop a protocol that could produce good sampling for the system and free energies independent of starting structure, the explicit solvent was replaced by a continuum solvent.

7.3 Continuum Solvent Free Energy Calculations.

7.3.1 Continuum Chloroform Simulation Protocol.

Continuum calculations were performed using the version of MCPRO now including the continuum chloroform GB/SA model parameterised in Section 6.4. The GB/SA protocol was the same as that described in the continuum chloroform parameterisation. Continuum solvation free energies, equivalent to solvation energies, are added to the Hamiltonian in the FEP equation to give the total free energy. Since the SASA calculation is rather time intensive, an area calculation at every configuration would be prohibitively expensive. The assumption was therefore made that SASA varies sufficiently slowly not to require updating at every configuration, and the area was calculated every 100 attempted configurations.

The continuum chloroform protocol implemented was very similar to the gas phase protocol since neither has explicit solvent, apart from the fact that there were 1 M configurations of equilibration and 5 M configurations of data collection per window.

The protocol in the host had a few more differences compared to the gas phase perturbations. A number of preliminary runs were performed using different starting structures to verify that the sampling of the host was now adequate. With the dependence shown to be minimal, host starting structures were taken from the lowest energy host-guest structures generated from the simulated annealing runs described in Subsection 6.1.1 to enable faster equilibration. There were 1 M configurations of equilibration and 8 M configurations of data collection per window. The end windows were run for 10 M configurations to obtain more sampling information for the real physical states. For the end windows, structures were saved every 0.1 M configurations for later analysis in Chapter 8 to give 100 structures in total for each guest. Such a relatively small number of well-spaced, uncorrelated structures was found to be sufficient to deduce the necessary trends. The breakdown of move attempts was 26% three part solute, 5% large dihedral (for Phe), 20% conrot and 13% flip, the remainder being host residue moves. Between 10 and 15 windows were used with the same spacing as in the explicit chloroform simulations. Free energies this time were calculated for all stereochemistries and conformations of all three amino acid derivatives.
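The decision to refresh the surface area only every 100 attempted configurations amounts to caching an expensive quantity. A minimal sketch of that bookkeeping is given below. The class `CachedArea` and the stand-in area function are hypothetical and written only for illustration; they do not represent how MCPRO stores or updates the area.

```python
class CachedArea:
    """Recompute an expensive area only every `interval` attempted moves."""

    def __init__(self, area_fn, interval=100):
        self.area_fn = area_fn    # hypothetical callable: coordinates -> area
        self.interval = interval
        self.attempts = 0         # attempted configurations seen so far
        self.evaluations = 0      # how many times area_fn was actually called
        self.value = None

    def get(self, coords):
        # Refresh on the first call and then once per `interval` attempts;
        # in between, the stale value is reused.
        if self.value is None or self.attempts % self.interval == 0:
            self.value = self.area_fn(coords)
            self.evaluations += 1
        self.attempts += 1
        return self.value


if __name__ == "__main__":
    # Stand-in "area": the sum of the coordinates, purely for demonstration.
    cache = CachedArea(lambda coords: sum(coords), interval=100)
    for step in range(1000):
        cache.get([float(step)])
    print(cache.evaluations)
```

The trade-off is the one stated in the text: between refreshes the energy is evaluated with a slightly out-of-date area, which is acceptable only if the area varies slowly over 100 attempted moves.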

7.3.2 Guest Free Energies in Continuum Chloroform.

The relative free energy of solvation results in continuum chloroform are presented in Figure 7.9. The closures of the cycles at 0.22 and 0.28 kcal mol⁻¹ are reasonable. Errors for each mutation are only 0.1 kcal mol⁻¹ at most. The mutation with the worst error was the d-Phe cis to trans with an error of 0.08 kcal mol⁻¹. It is interesting to note that the errors obtained are greater than those in the gas phase (< 0.01 kcal mol⁻¹), even though the sampling would be expected to be the same. This may possibly be a result of the shorter equilibration of 1 M configurations for the continuum calculations. Otherwise, it may simply be due to the inclusion of an additional energy term, the solvation free energy.

Figure 7.9: The relative free energies of solvation of the amino acid derivatives in continuum chloroform (kcal mol⁻¹). The square box gives the closure for each cycle.

The calculation of relative free energies for the amino acid derivatives in both continuum and explicit solvent allows a speculative comparison of the two models. Compared to the previous explicit solvent simulations (Figure 7.6), the results appear to be somewhat different. The most obvious difference is that the free energy of solvation for Phe of either conformation is only marginally more negative than that for Ala. In explicit solvent, the difference was much larger, with Phe's free energy of solvation lower by 2–3 kcal mol⁻¹. The second difference concerns the cis versus trans stabilisation. trans-Gly appears to be less stable with respect to cis-Gly in continuum, yet the reverse was found in explicit. The same is true for Ala but to a smaller extent. However, the reverse trend is found for Phe. trans-Phe is more stable with respect to cis-Phe in continuum, yet the order was reversed in explicit. These are all relative free energies, so it is not immediately obvious which molecule's absolute free energy of solvation may be varying with the solvent model. However, the likely candidate is Phe. Anything that may affect the simpler Ala or Gly will probably affect Phe in the same way, leading to little observed difference between the molecules. Phe is a more complex molecule and would be expected to be more sensitive to differences in solvent model.

One or both of the solvent models must be failing to some extent. The main weakness of the explicit solvent model is that the sampling of the solutes may not be adequate. However, the solvation energy for the explicit solvent is expected to be more reliable. Sampling problems are likely to be minimal for the continuum model, while its main weakness is that it may be incorrectly parameterised for certain functionalities in the amino acid derivatives, such as the aryl ring, which is the key difference in structure between Ala and Phe.

On the sampling issue, Phe has two dihedrals whose sampling may be considerably hindered by the explicit solvent. If these dihedrals can only vary over a small range, then it is no surprise that the free energies obtained may be different to the continuum case, which does have good sampling. Poor sampling already appears to be the reason for the wide distribution of free energies shown in Figure 7.8 for the cis to trans mutation for Gly.

On the parameterisation issue, it has been noted²¹² that the G_SA free energies of solvation for cyclic groups such as aryl rings should have a more negative dependence on SASA to compensate for the fact that cyclic structures, being more compact, have a smaller SASA. If cyclic atoms were assigned a separate term in a GB/SA parameterisation (Section 6.4), then this would give Phe, with its cyclic aryl group, a more negative GB/SA free energy of solvation, as is observed in explicit solvent. This, though, would have required an extra σᵢ parameter in the GB/SA parameterisation protocol, giving three σᵢ parameters in total, too many for fitting to the free energies of solvation of only 20 molecules. However, an inspection of the free energies of solvation in Table 6.3 for the molecules used in the GB/SA parameterisation reveals that aromatic molecules perform quite well.

If this second explanation is the reason, there is cause for some concern in having used this GB/SA model, particularly for Phe, although the host-guest complexes are also in chloroform, so these effects may cancel out to some extent.

7.3.3 Host-Guest Free Energies in Continuum Chloroform.

Before presenting the relative binding free energies for all guests, the relative binding free energy for the cis-Gly to trans-Gly mutation was calculated using four quite different starting structures. In explicit solvent, the four structures had given relative binding free energies of 0.49, 0.77, 3.09 and 3.94 kcal mol⁻¹ with a maximum error of 0.8 kcal mol⁻¹. In continuum solvent, the same structures gave results of 0.06, 0.23, 0.39 and 0.11 kcal mol⁻¹, respectively, with a maximum error of 0.45 kcal mol⁻¹. Clearly, within the bounds of error, these results are independent of structure.

Figure 7.10: The relative free energies of binding of amino acid derivatives in the host in continuum chloroform (kcal mol⁻¹). The square box gives the closure for each cycle.

Figure 7.10 contains the relative free energies of binding for the guests in the host-guest complex in continuum chloroform. The errors and closures of the cycles with the host present are now somewhat worse than in chloroform alone. The closures range from 0.32 to 1.17 kcal mol⁻¹. The errors range from 0.2 to 0.7 kcal mol⁻¹ and tend to be worse for the cis to trans relative binding free energies. These errors indicate that the sampling is still insufficient to obtain fully converged free energies despite the improvements made. However, the reassuring feature is that the errors now appear small enough compared to the size of the free energies to make meaningful qualitative deductions.

Another feature of the errors is that they are comparable to those found in the poorly sampled perturbations performed earlier in explicit solvent. This is probably the result of two competing effects. The improved sampling in continuum leads to a much greater exploration of configurational space. If many different regions are visited, many different terms will contribute to the average in the FEP equation, leading to a larger error. For the explicit solvent case, if the system remains trapped in one conformation, then most terms in the FEP equation will be of similar value. However, the discrete nature of the solvent can produce wide fluctuations in energies not observed in continuum, since continuum solvent free energies are averaged over all solvent configurations. These two opposing effects probably lead to errors of similar value. It also emphasises the difficulty in explicit solvent that sampling has to average over both solute and solvent degrees of freedom.

As anticipated by the experimental findings, there is a wealth of information in Figure 7.10 addressing the two main points of study in this work regarding enantioselectivity and conformational stabilisation. The first major point of interest is the stabilisation of the cis conformation relative to the trans for l-Ala, l-Phe and apparently d-Phe, but not Gly or d-Ala. The second major point of interest is the stronger binding of the l enantiomer compared to the d enantiomer for both Ala and Phe, particularly for the cis compounds. This suggests that firstly, l enantiomers are likely to complement the host better, and secondly, that the cis stabilisation is selective for the l isomer. These results are in exact accordance with the major findings of experiment. However, on the third point of interest, namely the selectivity of the host, there is more uncertainty.
To observe the relative binding free energies of each amino acid derivative more clearly, the data from Figure 7.10 may be used to construct a table of relative binding free energies with respect to one of the molecules. trans-Gly is selected to be this reference molecule. The relative free energies (ΔG_sim) for most molecules with respect to trans-Gly may be calculated using more than one path. In these cases, either the most direct path was taken or, if there was more than one, the value was taken along the path with the smallest error. For example, ΔG_sim for l-cis-Phe is found by going from trans-Gly to cis-Gly to l-cis-Ala to l-cis-Phe. Table 7.2 contains those data, together with the experimental values. The stronger binding conformation, either cis or trans, as predicted by simulation is marked with an asterisk.

Table 7.2: Relative Free Energies Obtained From Simulation and Experiment.

Molecule                ΔG_sim /kcal mol⁻¹                ΔG_expt /kcal mol⁻¹
                        cis              trans
N-Ac-glycine             0.23 ± 0.28      0*               0.00 ± 0.07
N-Ac-l-alanine          -1.78 ± 0.37*    -0.74 ± 0.17      0.81 ± 0.23
N-Ac-d-alanine           0.25 ± 0.41     -0.15 ± 0.17*     0.91 ± 0.10
N-Ac-l-phenylalanine    -3.58 ± 0.40*    -1.79 ± 0.29      0.67 ± 0.10
N-Ac-d-phenylalanine    -1.40 ± 0.25*    -0.66 ± 0.53      0.96 ± 0.12

Before interpreting this table, two significant differences between the relative binding free energies measured by experiment and by simulation should be noted. Firstly, simulation relative binding free energies were obtained exclusively for the guest inside the host cavity, while the experimental values were obtained for the guest binding to the host anywhere, either inside or outside the cavity, or maybe even exclusively outside. Secondly, simulation gives relative free energies for cis and trans individually, while the experimental values presumably are for the lowest energy conformation, whichever that is.

However, it is possible to make one meaningful comparison concerning the influence of the side chain. Since experimental NMR data indicate that l-Phe and l-Ala both bind in the cavity, their relative binding free energies may be compared with simulation. It can be seen that both experiment and simulation predict stronger binding for l-Phe over l-Ala, but they differ significantly in predicting the extent to which this happens. Simulation predicts l-Phe to be 1.80 kcal mol⁻¹ more stable than l-Ala, while experiment gives only 0.14 kcal mol⁻¹ for the same quantity. Unless the experiment or simulation is in error, such a difference must be put down to l guests binding both inside and outside the cavity. Indeed, experiment has estimated the relative binding populations of inside to outside the cavity at 70:30 for l-Phe. Presumably, binding outside the cavity is much less selective. Hence, since binding at the non-selective outside position is occurring in experiment, the experimental relative binding free energies should be smaller than in the simulation. It is not possible to compare the d molecules since they bind on the outside of the cavity.

For Gly, experiment has not shown whether the guest binds inside or outside the cavity. If it binds inside, then experiment and simulation are in disagreement. Looking at Table 7.2 again, experiment predicts that Gly binds 0.81 kcal mol⁻¹ more stably than l-Ala, while simulation predicts that Gly binds less stably by 2.01 kcal mol⁻¹. This contradiction appears to suggest that Gly may bind outside the cavity. However, if Gly does bind inside the cavity, simulation predicts that it will marginally prefer to adopt the trans conformation. This prediction is currently being tested by experiment.

Concerning relative binding free energies, one possible source of error for simulation is the GB/SA solvation model. As noted earlier in the comparison between explicit and continuum free energies of solvation, Ala and Phe were not as well solvated in the continuum. A less negative free energy in chloroform leads to stronger relative binding. This may account for some of the difference with experiment, and if it were true, it suggests that a better parameterisation of the GB/SA model is required to understand the effect of different side chains. Enantiomers and conformations do not differ in terms of the atom identities present and so are unlikely to be adversely affected if there is a problem with the GB/SA model.

Full elucidation of the binding constants can only be made if more simulations are performed to measure the relative binding free energies of all the guests outside the cavity and the relative binding free energy for guests between the inside and outside of the cavity. The large difference in the structure of the end points for the latter simulation would probably necessitate the calculation of absolute binding free energies to each site. This calculation would be very expensive.
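The side-chain comparison drawn from Table 7.2 is simple arithmetic on the tabulated values, and can be checked with the short sketch below. The dictionaries transcribe the table as reconstructed from the text; comparing each guest through its lower simulated free energy is an assumption made here for illustration, and the function names are not from the original work.

```python
# Relative binding free energies (kcal/mol) from Table 7.2, relative to trans-Gly.
# Simulation entries are (cis, trans); experiment gives one value per guest.
SIM = {
    "Gly": (0.23, 0.00),
    "l-Ala": (-1.78, -0.74),
    "d-Ala": (0.25, -0.15),
    "l-Phe": (-3.58, -1.79),
    "d-Phe": (-1.40, -0.66),
}
EXPT = {"Gly": 0.00, "l-Ala": 0.81, "d-Ala": 0.91, "l-Phe": 0.67, "d-Phe": 0.96}


def preferred(guest):
    """Return the conformation with the lower simulated free energy."""
    cis, trans = SIM[guest]
    return "cis" if cis < trans else "trans"


def sim_difference(a, b):
    """Simulated free energy of a relative to b, each in its preferred form."""
    return round(min(SIM[a]) - min(SIM[b]), 2)


def expt_difference(a, b):
    """Experimental free energy of a relative to b."""
    return round(EXPT[a] - EXPT[b], 2)


if __name__ == "__main__":
    for guest in SIM:
        print(guest, preferred(guest))
    print(sim_difference("l-Phe", "l-Ala"), expt_difference("l-Phe", "l-Ala"))
```

The two differences for l-Phe relative to l-Ala come out at -1.80 and -0.14 kcal mol⁻¹, matching the magnitudes quoted in the discussion, and the preferred conformations agree with the cis stabilisation reported for l-Ala, l-Phe and d-Phe.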

7.4 Conclusion

The application of the simulation protocol to calculate relative free energies of binding for the amino acid derivatives to macrobicycle 12 has been described. The experimental enantioselectivity and stabilisation of the cis amide conformation have been correctly reproduced. Moreover, it has been tentatively predicted that should Gly bind inside the cavity, it binds in the trans conformation. Experiments to test this prediction are in progress. Problems were demonstrated with the use of an explicit solvent model. The GB/SA continuum model was able to provide the means by which results could be obtained that largely agreed with the experimental findings. However, limitations in the ability of GB/SA to model relative solvation energies correctly may affect the relative binding free energies obtained. Nevertheless, the agreement with the other experimental findings reinforces the reliability of the simulation data and gives confidence in using these data to examine precisely their physical origins. This is the subject of the next chapter.

Chapter 8 Analysis of the Macrobicycle 12 System


Experiment has shown that simply changing the side chain of the amino acid is able to produce a diverse range of binding behaviour. Thus, to predict binding behaviour, the exact causal relationship between side chain and binding must be determined. It must be emphasised that this study only examines binding inside the host cavity, since this is the place where experiment indicates that interesting binding selectivity occurs. The main property of the macrobicycle 12 system that causes the diverse binding behaviour for different guests is the availability of a number of possible alternative binding motifs. The different properties of each guest determine which of these are preferred. The purpose of this chapter is to draw the connection between the properties of the guest and the way it binds. Initially, a detailed description of the host and guest is given. A model is then proposed to describe all the possible interactions between them. The model will be used to aid the interpretation of the structures observed in the simulation. Following this, a detailed analysis of all the host-guest complexes is given, focusing on the features that differ between guests and using the model to interpret the observed behaviour. Finally, an overall rationalisation is made about the link between the guest, the binding mode and ultimately the binding free energies.

Figure 8.1: Schematic of the dimensions and important parts of macrobicycle 12. (Labelled parts: benzamide, aryl amide, N-benzylamide and thiourea. Labelled distances: 8 Å, 9–11 Å and 10–12 Å.)

8.1 Description of the Binding Site.

8.1.1 Host Binding Features.

Firstly, consider the host, macrobicycle 12. Figure 8.1 shows a schematic of the host illustrating the layout of its components and a few important distances. The thiourea unit lies in the middle of the cavity and is joined to the main ring by two hydrocarbon chains which stretch along the diagonal between the two junction carbons. The main ring of the host resembles a rhombus of side 8 Å. Two sides are made up of benzamide groups and two are made up of N-benzylamide groups. The diagonal dimensions are approximately 9–11 Å between the tertiary junction carbons and 10–12 Å between the diaryl methane carbons. The depth of the cavity ranges from 3 Å at the junction carbons up to 6 Å at the diaryl methane carbons. The definition of the two depths is shown in Figure 8.2. In this figure, macrobicycle 12 is being viewed along the axis connecting the two junction carbons. Overall, the host molecule possesses C₂ symmetry. This allows the guest to reside in two equivalent orientations.

Figure 8.2: Schematic of the two depths of macrobicycle 12 as viewed along the axis connecting the two junction carbons. (Labelled parts: benzamide, N-benzylamide, hydrocarbon and thiourea. Labelled depths: 4–6 Å and 3 Å.)

The host possesses a number of regions important for binding the guest. These are highlighted in Figure 8.3. The first of these is the set of six polar hydrogens in the cavity. The annealed structures (see Section 6.1) revealed two important facts. Firstly, the polar hydrogens of the amide groups almost always pointed into the cavity. Only a few strained structures had the oxygens pointing inwards and so may be discounted. Secondly, while the thiourea unit preferred to have its hydrogens pointing outwards for the host alone, when a guest was present, they preferred to point inside the cavity. Therefore, the thiourea and the two pairs of amide groups contribute a total of six polar hydrogens with which the guest can bind inside the cavity. The host also possesses four aryl groups available both for hydrogen bonding to polar hydrogens and for interactions with other aryl groups.

Figure 8.3: The host molecule with the polar hydrogens and aryl rings highlighted.

These binding possibilities are further complicated by the flexibility of the host. The hydrocarbon chain is capable of adopting a large number of conformations that influence not only where the hydrogen bonding thiourea unit lies but also the shape of the host cavity. The aryl units can adopt various orientations, also influencing the shape of the cavity. The amide units are relatively rigid but their orientation, nevertheless, is still important as it affects the position of their polar hydrogens. This flexibility emphasises the necessity to examine many structures. It is not sufficient to consider a single host structure and compare how each guest binds to it, because different structures will be more suited to some guests than to others.

8.1.2 Guest Binding Features.

Before describing how these regions come into play in binding, it is necessary to describe the guest. Figure 8.4 highlights the important binding regions in the guest. The guest possesses two highly charged oxygens in a carboxylate group at one end and a carbonyl oxygen on the amide unit. All these oxygens have the potential to hydrogen bond to the polar hydrogens of the host. The guest also contains a polar hydrogen that can repel the host polar hydrogens or form an internal hydrogen bond to one of the carboxylate oxygens. It is not able to hydrogen bond to the carbonyl oxygens of the host since these oxygens were found to lie outside the cavity, as mentioned in the previous Subsection.

[Figure 8.4 structure diagram. Labels: oxygen (O), methyl group (M), side chain (SC), polar hydrogen (H).]

Figure 8.4: The guest molecule (cis-Phe) with the oxygens, polar hydrogens, aryl rings and other important features highlighted. Gly and Ala are similar except in the side chain.

The variation between guest molecules is also important. The main difference arising from guest conformation about the amide bond is that the carbonyl oxygen lies more on the side of the guest for cis but more on the top for trans. The reverse applies for the methyl group. This can be seen in Figure 8.4 by interchanging the amide oxygen and methyl group. The variation with side chain is mainly in size. Gly has no side chain, only a hydrogen. Ala has a larger methyl group, while Phe has a very large phenyl group. Flexible dihedrals in the phenyl group can point it in different directions, and there is also the potential for interactions with the host aryl groups. Differences in stereochemistry for the guest alone cause no difference in potential guest binding sites. However, when combined with the chiral host, differences do arise. These are discussed further on in Subsection 8.1.4.

The guests, as opposed to the host, are much more rigid. All guests contain three significant dihedrals. The lower two dihedrals closer to the carboxylate end are very restrained and change little in value, partly due to stabilisation by an internal hydrogen bond. The third dihedral is the amide dihedral and determines the two conformations of interest for each guest. Phe guests possess two more dihedrals that are moderately flexible. These are the dihedral which swings the aryl group around, and the dihedral which twists the aryl group about its own axis. These two dihedrals are important because they allow Phe to place the aryl group in a number of different positions.

8.1.3 Origins of Selectivity.

Having described all the major binding sites on the host and guest, their binding interactions may now be discussed. There is a wide range of factors determining the interactions possible between the host and guest. The primary interaction stabilising the binding is the hydrogen bonding of the guest carboxylate group to the host. There is a choice of three pairs of hydrogens to which the carboxylate may bind. The carboxylate may choose to bind to the thiourea hydrogen pair in the middle. This allows the rest of the guest to point out into the open end of the cavity. Alternatively, it may bind to an amide pair at one end. In doing this, the guest would have to tip to one side. Either of these cases would involve the formation of between two and four hydrogen bonds. A more complex mode of binding is intermediate between these two extremes. One carboxylate oxygen forms a double hydrogen bond to the thiourea pair and the other oxygen forms a double hydrogen bond to the amide pair, giving four in total. Finally, the carboxylate may be able to have each oxygen bonding to one thiourea hydrogen and two amide hydrogens, giving six hydrogen bonds in total. Which of these binding patterns a guest adopts will depend on both the shape of the guest and the conformation of the host.

The thiourea hydrogens would not be expected to have a strong, direct influence on guest selectivity. Since the thiourea lies in the middle of the cavity, it is very accessible to the carboxylate group that all guests possess. Furthermore, there are a large number of orientations that the guest may adopt while still having two hydrogen bonds between its carboxylate oxygens and the two thiourea hydrogens. This is because one of the oxygens may form a double hydrogen bond about which the whole guest may pivot. Figure 8.5 illustrates three possibilities. It is also possible to form up to four hydrogen bonds between the carboxylate group and the thiourea if the guest in the middle picture twists by 90° about its axis. However, the resulting hydrogen bonds are likely to be weaker and they require a very specific, constrained geometry to form. In this way, thiourea plays an anchor role. It keeps the guest close by offering two easy hydrogen bonds, but has little influence on guest orientation.

[Figure 8.5: three structure diagrams of the guest carboxylate hydrogen bonded to the host thiourea.]

Figure 8.5: Three possible ways that cis-Gly may bind to the thiourea of the host.

The one interesting feature of thiourea that may influence guest selectivity is the exact location of the thiourea group in the host. Since the thiourea group is attached to two flexible hydrocarbon chains, it is able to move relative to the rest of the host to a small but significant extent. Guests that may be trying to improve some other binding interaction elsewhere in the cavity may be able to force the thiourea to another position to obtain both interactions at once. Alternatively, and in a more complex manner, the guest may force a change in the cavity shape. This might lead to a change in hydrocarbon conformation and move the thiourea to a position adverse to the guest.

The amide hydrogens are more relevant for influencing guest selectivity and there are five important points to make about them. The first is that they are potential candidates for hydrogen bonding for both the carboxylate and the carbonyl oxygens of the guest. Thus there is a competition over which oxygens are preferred. The second is that they lie in the cleft at the two shallowest ends of the cavity. Since bulky atoms on the guest might also prefer to reside in this cleft, this results in a competition between hydrogen bond formation and steric relief of large groups. The third point is that in each pair, the N-benzylamide hydrogen lies higher above the thiourea than the benzamide hydrogen. Figure 8.6 illustrates this difference. For the carbonyl oxygen to bond to the lower benzamide hydrogen, the guest must tip over further than for the N-benzylamide hydrogen. By way of contrast, this difference in height is important sterically to guest methyl groups and side chains in the opposite way.
It will be easier sterically for these groups to lie over the lower benzamide than the higher N-benzylamide. This is particularly important for the side chains, which lie lower down on the guest than do the amide methyl groups. Thus, the side chains are closer to the host and have a stronger influence when the guest binds. The fourth point is that the N-benzylamide unit is more flexible than the benzamide unit due to the presence of the extra CH2 group. Such flexibility is important since it allows the attached polar hydrogen to reach up a little if necessary to hydrogen bond to the guest carbonyl oxygen. The final point is that the aryl group in the benzamide unit prefers to lie in the same plane as the amide group and so access to the polar hydrogen of this group is somewhat sterically hindered. On the other hand, the aryl group of the N-benzylamide group prefers to lie at an angle to the amide. Thus the aryl group is more out of the way of the guest.

[Figure 8.6 structure diagram. Labels: N-benzylamide, benzamide.]

Figure 8.6: The difference in height for the polar hydrogens in the N-benzylamide (left) and benzamide units, and the closed (left) and open aryl groups.

It is clear from the discussion about polar hydrogens that the host aryl groups are also important to binding selectivity. As well as the different heights previously described, one other key feature distinguishing the two types of host aryl groups is their orientation about the cavity. The benzamide aryl groups tend to lie in a more open manner with respect to the cavity while the N-benzylamide ones are more closed over the cavity. This difference, shown in Figure 8.6, may have a strong influence on steric clash with the guest side chain and methyl groups since open aryl groups can more easily accommodate these groups.

8.1.4 The V-Model For Binding.

The combination of the different properties of the guest and the intricate features of the host polar hydrogens and aryl groups would be expected to produce a diverse range of binding possibilities. The highly coupled nature of these features further complicates their analysis since they cannot easily be examined in isolation. Changing one apparently small feature such as amide conformation may lead to large structural changes throughout the whole complex.

A model is now proposed to describe all the modes of binding. By picking out a few key interactions, the situation may be considerably simplified. Furthermore, all possible binding modes using these key interactions may be considered and evaluated. This is important because in rationalising binding, more often than not it is the binding modes that do not occur that are of greater assistance in explaining the observed structures. Naturally, models have their limitations and so other factors left out of the model may well have to be considered. It should be noted that this model was not derived completely a priori, before examining the structures observed in the simulations. However, its presentation now will aid later interpretation of the structures.

A simple model for the shape of the guest is now described. The guest, when viewed down its long axis (see Figure 8.4 for definition) with the carboxylate at the other end, is not flat but has the shape of an open V. This is because the side chains, being attached to an sp3 carbon, are not coplanar with the guest amide group. Figure 8.7 illustrates the V-shape for l-cis-Ala as viewed looking from the top down the main axis. This V shape is taken to represent the guest. One end of the V is the side chain, while the other is either the amide oxygen or methyl group, depending on the amide conformation. For cis, this end is the smaller oxygen with potential for hydrogen bond formation. For trans, it is the more bulky methyl group.
Gly lacks one end of the V since it has no side chain. For Ala and Phe, which do have side chains, this V faces opposite ways depending on the guest's stereochemistry. An important feature of the V is that the side group end lies lower down the guest and closer to the carboxylate than the amide end, as seen in Figure 8.4. It is assumed that the carboxylate group lies parallel to the length of the V as indicated in Figure 8.4. It is also assumed that the end of the amide group that points up can be ignored since the side chain, lying below it, is likely to dominate any interactions with the host at this end. Thus the proposed model captures all the important differences present in the guests, apart from the difference due to the size of the side chain.

Figure 8.7: The V-shape of l-cis-Ala as viewed down its long axis. Both ball and stick and van der Waals surface representations are shown. The V faces the other way for d guests.

Now a model for the host is described. The features of the host included in the model are the six polar hydrogens and the four aryl groups. As described in Subsection 8.1.3, two of the hydrogens are high (N-benzylamide hydrogens), while two are low (benzamide hydrogens), and two of the aryl groups are open (benzamide aryl groups), while two are closed (N-benzylamide aryl groups).

When bringing the guest and host together, it is assumed that the carboxylate at the centre of the guest V always bonds with at least one oxygen to the thiourea. Each end of the V will compete to bind with its preferred section of the host. Figure 8.8 shows the four possibilities for the four types of V shape. A favourable interaction for oxygen is with a high hydrogen to form a hydrogen bond, while a favourable interaction for a side chain or methyl group is to find space over a low amide or open aryl group. It is more important to satisfy the side chain since this end of the V lies closer to the host. Therefore, this model of binding predicts the binding order as l-cis > l-trans > d-trans > d-cis. These binding modes are termed the primary binding modes for the V model.

[Figure 8.8 schematic. Panels: l-cis, l-trans, d-cis, d-trans. Key: H, "high" amide hydrogen; L, "low" amide hydrogen; T, thiourea hydrogen; O, oxygen; M, methyl group; SC, side chain; further symbols mark aryl rings and favourable and unfavourable contacts.]

Figure 8.8: The four possible binding motifs for a V-shaped guest.

If the guests do not bind well in one of these primary modes, then they will most likely adopt some other position. The V may do one of four things. These are to tip forward, backwards, sideways, or do a roll. These motions are illustrated in Figure 8.9. Firstly, the V may tip over. Such a move may be possible if there is a clash at only one end. A tip in either direction may help form a hydrogen bond or remove the clash. If the part of the host with which the guest is clashing is low, then the guest may tip forward slightly to place the clashing part of the guest above this part of the host. If the clashing part of the host is too high, then the guest will tip backward and fit its clashing part inside the cavity. This back tip is the second type of motion. Which of these occurs will also depend on whether the clashing part is a lower side chain or a higher amide group. Thirdly, it may tip sideways into the side of the host. This may also occur if there is a clash at only one end. Fourthly, it may roll around an axis coming out of the page so that both ends move. This is likely to happen when both ends are clashing. Such a motion may either be a whole body motion for the guest, or it may occur by a rotation of one of the dihedrals in the guest itself, leaving the carboxylate group fixed with respect to the host. These four possibilities are termed secondary relief modes.

[Figure 8.9: four structure diagrams. Panels: forward tip, back tip, side tip, roll.]

Figure 8.9: The four possible motions available to the guest. These are the forward tip, back tip, side tip and roll. Only selected cross-sections of the host are shown in each diagram.

The aryl groups shown in Figure 8.8 are important in determining where the clashing groups may choose to move. Forwards and backwards tipping is the most likely secondary mode because it keeps the V aligned in the cavity, with the carboxylate always near polar hydrogens. Sideways tipping and rolling do not do this so much and would only be expected to occur if tipping failed or was not possible. From this point onwards, use of the word "tip" will imply the forwards or backwards tip. Sideways tipping will always be explicitly stated as such. It is important to keep in mind that if the guest adopts a secondary relief mode, the guest amide group at the top that has been previously ignored may influence which relief mode occurs.

Therefore, a preliminary prediction of this model is that l-cis guests remain in the favourable position in Figure 8.8. Forming the hydrogen bond in this position will most likely involve some forwards tipping to maximise the strength of the bond. l-trans and d-trans guests will most likely tip to relieve their one bad contact, although some degree of tipping sideways or rolling is possible. d-cis will roll, most likely in a clockwise direction, to remove the bad side chain contact. Of course, Gly, which is not shown, does not have a side group so only the amide conformation influences the binding mode. In this model, there is nothing to stop the oxygen in either case hydrogen bonding to the host amide hydrogens.

Only so much can be rationalised from the shapes of the guests themselves. The real binding motifs are a complex interplay of all these described features. An analysis of the binding structures observed from the computer simulations is now presented using the V model as a guide to elucidate these binding motifs. Pictures of all the main motifs are given at the end of this Chapter on Page 211 and may prove useful to inspect during the analysis.

8.2 Guest Orientation.

The first place to start for such an analysis is to see what the average positions of the guests are in the host. An initial study of the structures revealed that the coordinate that varied the most between different guests was the angle at which they lay in the host. In the binding model, this is the degree to which guests tip forwards or backwards. The other roll and sideways tip angles were seen to vary only to a small degree. Evidently, the host restricts their range. Their effects are best observed in later analyses. This particular tip angle, fb, is calculated from the dot product of the C–C vector of the guest and the junction carbon-junction carbon vector of the host, as shown in Figure 8.10. Table 8.1 shows the value for fb for each guest averaged over the 100 structures used in the analysis, together with its standard deviation. An angle of 0° corresponds to the guest lying sideways with the polar hydrogen pointing straight down (side 1); at 90° the guest is vertical; at 180°, the guest lies the other way with the polar hydrogen pointing straight up (side 2). These angles are average values so some deviation from them does still occur, as indicated by the standard deviations.

[Figure 8.10 structure diagram. Labels: side 1, side 2, fb.]

Figure 8.10: The definition of fb used to define the orientation of the guest with respect to the host. fb is calculated from the dot product of the two dashed vectors shown.

Table 8.1: Tip Angle, fb, For Each Guest in the Host.

Guest                          Angle / degrees
N-Ac-l-cis-phenylalanine        72 ± 4
N-Ac-l-trans-phenylalanine      90 ± 9
N-Ac-l-cis-alanine              82 ± 8
N-Ac-l-trans-alanine           100 ± 9
N-Ac-cis-glycine               115 ± 20
N-Ac-trans-glycine             146 ± 10
N-Ac-d-cis-alanine             130 ± 7
N-Ac-d-trans-alanine           110 ± 14
N-Ac-d-cis-phenylalanine       113 ± 8
N-Ac-d-trans-phenylalanine     116 ± 6

This table shows that l-cis-Phe and l-cis-Ala are tipped to the side containing the carbonyl oxygen at respective angles of 72° and 82°, suggesting the possibility of a hydrogen bond to the amide hydrogens. trans-Gly, with a tipping angle of 146°, represents the other extreme. It may be hydrogen bonding to the amide hydrogens on the opposite side of the cavity. d-cis-Ala tips in the same direction as trans-Gly but to a lesser extent. In between lie l-trans-Phe and l-trans-Ala, which are fairly vertical, while the remainder lie in the range 110–116°.

The angles given are only averaged angles. Their deviations reflect the mobility of each guest. Clearly, cis-Gly appears to be very mobile, followed by d-trans-Ala. At the other extreme, l-cis-Phe appears to be the most constrained. It is interesting to note that trans-Gly is a lot less mobile than cis-Gly. This difference is not simply due to the amide conformation, since the reverse trend is found for d-Ala. This difference for Gly may be due to a possible hydrogen bond gained by trans-Gly in tipping over. Such behaviour may also be present for l-cis-Phe, which appears a lot less mobile than l-trans-Phe. This observed behaviour is consistent with the V model. l-cis guests tip forwards, d guests tip backwards, l-trans guests are fairly level, while Gly, lacking a side chain, appears moderately mobile. A hydrogen bond analysis will make clearer any trends.
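The tip angle used in this section can be illustrated with a short sketch. It is not the analysis code used for the thesis. It assumes that the guest vector and the host junction carbon-junction carbon vector have already been extracted from each saved structure as Cartesian triples; the function names are hypothetical, and the use of a population (rather than sample) standard deviation is an assumption.

```python
import math


def tip_angle_degrees(guest_vec, host_vec):
    """Angle in degrees between two 3-vectors, obtained from their dot product."""
    dot = sum(g * h for g, h in zip(guest_vec, host_vec))
    norm_g = math.sqrt(sum(g * g for g in guest_vec))
    norm_h = math.sqrt(sum(h * h for h in host_vec))
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cos_angle = max(-1.0, min(1.0, dot / (norm_g * norm_h)))
    return math.degrees(math.acos(cos_angle))


def mean_and_std(values):
    """Mean and population standard deviation of a list of angles."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    return mean, math.sqrt(variance)


if __name__ == "__main__":
    # Hypothetical vectors for three structures of one guest.
    host = (1.0, 0.0, 0.0)
    guests = [(0.3, 0.0, 1.0), (0.1, 0.0, 1.0), (0.2, 0.1, 1.0)]
    angles = [tip_angle_degrees(g, host) for g in guests]
    print(mean_and_std(angles))
```

With 100 structures per guest, the list passed to `mean_and_std` would simply hold 100 angles.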

8.3 Hydrogen Bond Analysis.

8.3.1 Hydrogen Bond Patterns.

An explanation for many of the different binding motifs can be found in a hydrogen bond analysis. Hydrogen bonds are strong, energy-lowering interactions that generally favour binding, particularly in non-competitive solvents like chloroform. The more of them there are, the stronger, typically, the overall binding. In this work a hydrogen bond is deemed to exist if the two atoms involved are less than 2.5 Å apart. Two features concerning hydrogen bonds are of particular interest. The first of these is the total number of hydrogen bonds that can occur. The frequency of a given number of hydrogen bonds occurring simultaneously for each guest is given in Figure 8.11. The second feature of interest is the types of hydrogen bonds formed. Figure 8.12 indicates the number of hydrogen bonds of a given type averaged over the simulation for each guest. The height of the entire bar gives the total number of hydrogen bonds on average.

[Figure 8.11 histograms. Rows: N-Ac-l-phenylalanine, N-Ac-l-alanine, N-Ac-glycine, N-Ac-d-alanine, N-Ac-d-phenylalanine. Columns: cis, trans. Horizontal axis: Number of Hydrogen Bonds (1 to 7). Vertical axis ticks: 0, 40, 80.]

Figure 8.11: A histogram of the number of hydrogen bonds for each guest with the host.

It can be seen that all guests are quite capable of forming three to four hydrogen bonds while some are able to form five, six or even seven. A source of this difference lies in the types of hydrogen bonds formed. There are three types of hydrogen bonds. The first type is between the carboxylate and thiourea, the second between the carboxylate and the amides, and the third between the carbonyl and the amides. Figure 8.12 reveals that most guests form two carboxylate-thiourea hydrogen bonds and two carboxylate-amide hydrogen bonds, while only some are able to form carbonyl-amide hydrogen bonds, and if they do, they form one of these, on average.
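The distance criterion and the classification into bond types can be sketched as follows. This is an illustrative reconstruction only: the 2.5 Å cutoff is taken from the text, while the data layout (lists of labelled coordinates), the function names and the group labels are assumptions introduced here. Any guest oxygen and host polar hydrogen closer than the cutoff is counted, so a type not discussed in the text would also be tallied if it occurred.

```python
import math
from collections import Counter

HBOND_CUTOFF = 2.5  # angstroms, the criterion stated in the text


def count_hbonds(guest_oxygens, host_hydrogens, cutoff=HBOND_CUTOFF):
    """Count O...H pairs closer than the cutoff, grouped by bond type.

    guest_oxygens:  list of (group, xyz), e.g. group "carboxylate" or "carbonyl"
    host_hydrogens: list of (group, xyz), e.g. group "thiourea" or "amide"
    """
    counts = Counter()
    for o_group, o_xyz in guest_oxygens:
        for h_group, h_xyz in host_hydrogens:
            if math.dist(o_xyz, h_xyz) < cutoff:
                counts[f"{o_group}-{h_group}"] += 1
    return counts


def hbond_histogram(frames):
    """Frequency of the total hydrogen bond count over a list of structures.

    frames: list of (guest_oxygens, host_hydrogens) pairs, one per structure.
    """
    return Counter(sum(count_hbonds(o, h).values()) for o, h in frames)


if __name__ == "__main__":
    # Hypothetical coordinates for a single structure.
    oxygens = [("carboxylate", (0.0, 0.0, 0.0)),
               ("carboxylate", (2.2, 0.0, 0.0)),
               ("carbonyl", (0.0, 6.0, 0.0))]
    hydrogens = [("thiourea", (0.0, 0.0, 2.0)),
                 ("thiourea", (2.2, 0.0, 2.0)),
                 ("amide", (0.0, 6.0, 2.4)),
                 ("amide", (10.0, 10.0, 10.0))]
    print(count_hbonds(oxygens, hydrogens))
    print(hbond_histogram([(oxygens, hydrogens)]))
```

Averaging the per-type counts over all structures of a guest would give a breakdown of the kind plotted in Figure 8.12, and `hbond_histogram` gives the kind of frequency plot shown in Figure 8.11.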

[Figure 8.12 bar chart. Vertical axis: Number of Hydrogen Bonds. Horizontal axis: Molecule. Legend: Carbonyl-Amide, Carboxylate-Amide, Carboxylate-Thiourea.]

Figure 8.12: The breakdown of the three types of hydrogen bonds present between each guest and the host.

8.3.2 Interpretation of Hydrogen Bond Patterns.

Both Figures 8.11 and 8.12 contain a wealth of information concerning the differences for each guest. What is immediately evident from them is the general trend that the larger the side chain in the l position or, alternatively, the smaller the side chain in the d position, the more hydrogen bonds there are. The main exception to this rule is trans-Gly, which has the largest number of hydrogen bonds of all guests. The reason for this difference becomes clearer when considering the types of hydrogen bonds formed.

Carbonyl to amide hydrogen bonds are only able to form for four guests. Three of these guests are in the cis conformation and either have l stereochemistry or have no stereochemistry. These guests are cis-Gly, l-cis-Ala and l-cis-Phe. Usually, only one of the two bonds forms and it is typically with the higher N-benzylamide hydrogen. The percentage of carbonyl-amide bonds that involves lower benzamide hydrogens is only 17, 8 and 10% for cis-Gly, l-cis-Ala and l-cis-Phe, respectively. In this binding motif, up to six and occasionally seven hydrogen bonds are observed to form. If two of these hydrogen bonds are carboxylate-thiourea, two carboxylate-amide, and two carbonyl-amide, then all this evidence is indicative of the guest tipping over to side 1. This is in agreement with the fb results and the V model.

The fourth guest, which seems to possess the greatest ability to form such carbonyl-amide hydrogen bonds, is trans-Gly. Since it is in the trans conformation, this indicates that the guest tips over in the opposite way to the l-cis guests, allowing the carbonyl to form one to two additional hydrogen bonds to the amide pair. trans-Gly quite commonly forms two carbonyl-amide hydrogen bonds simultaneously, including one to the benzamide hydrogen which now contributes 39% of all carbonyl-amide hydrogen bonds, giving six in total. Evidently, trans-Gly is able to approach much closer to the amide hydrogens than the other three guests, particularly compared to cis-Gly. Again, this supports the large fb value found.

Even though all four of these guests can form the carbonyl-amide hydrogen bond, for trans-Gly there is no polar hydrogen-hydrogen repulsion between host and guest. Instead, the polar hydrogen points up and away, on the other side to the amide hydrogens. Table 8.2 shows the closest H–H contact, rHH, for all guests.

Table 8.2: Average Polar Hydrogen Separation, rHH, For the Guest Amide and the Nearest Host Amide.

Guest                          rHH / Å
N-Ac-l-cis-phenylalanine       2.9
N-Ac-l-trans-phenylalanine     2.2
N-Ac-l-cis-alanine             2.5
N-Ac-l-trans-alanine           2.4
N-Ac-cis-glycine               2.9
N-Ac-trans-glycine             3.3
N-Ac-d-cis-alanine             3.1
N-Ac-d-trans-alanine           2.8
N-Ac-d-cis-phenylalanine       2.7
N-Ac-d-trans-phenylalanine     3.0

This table shows that rHH is 3.3 Å for trans-Gly, 2.9 Å for l-cis-Phe, 2.5 Å for l-cis-Ala and 2.9 Å for cis-Gly. This variation is interesting as the repulsion is smaller for the extreme cases, Phe and Gly, but for different reasons. The lack of tipping for cis-Gly indicates that it barely attempts to form the carbonyl-amide hydrogen bond and thus the two hydrogens never draw close. l-cis-Phe is so successful at forming this bond that the guest is tipped over far enough to separate the two hydrogens. l-cis-Ala lies, on average, in an intermediate state and experiences the strongest H–H repulsion. Two conclusions may be drawn. Firstly, the absence of the H–H repulsion appears to be the reason why trans-Gly easily forms a carbonyl-amide hydrogen bond, while cis-Gly does not. Secondly, the side chain seems to affect the likelihood of the carbonyl-amide hydrogen bond forming for cis guests: the larger the side chain, the greater the chance that these hydrogen bonds will form. Possibly, the guest tips for two reasons: firstly, to form the hydrogen bond, and secondly, to relieve the side chain clash. Such tipping is possible since the side chain lies over the lower, more open benzamide group. Figure 8.13 illustrates this effect graphically. It shows a histogram of the N-benzylamide hydrogen bond distance for these three guests. Clearly, hydrogen bond distances increase as the side chain gets smaller. What may be happening is that the larger side chain lying over the favourable benzamide cleft may be forcing the guest amide oxygen closer to the N-benzylamide unit even if the polar hydrogens are also getting closer. Thus l guests with larger side chains may favour the cis conformation on steric grounds, since the carbonyl oxygen is smaller than the methyl group, as well as due to the hydrogen bond gained.

[Figure 8.13 histogram. Series: N-Ac-l-alanine, N-Ac-glycine, N-Ac-l-phenylalanine. Vertical axis: Frequency (0 to 20). Horizontal axis: Distance / Å.]

Figure 8.13: Histogram of the distances of the carbonyl-N-benzylamide hydrogen bond for cis-Gly, l-cis-Ala and l-cis-Phe.

What is particularly significant from Figures 8.11 and 8.12 is that none of the other six guests are able to form such a carbonyl-amide hydrogen bond. For the trans structures, this result is consistent with the V binding model, which assumes that the guest oxygen is too distant to be considered. However, the d-cis structures must also have trouble forming these hydrogen bonds. One reason for this is that the nearest amide hydrogen is the lower one. A second reason is that their side chain lies over a high N-benzylamide group and so the guest must move to relieve this bad contact. A third reason is the amide hydrogen repulsion. The V model predicts that such guests will tip backwards, forcing the guest amide oxygen away from the host amide group.

Three of the trans compounds, l-trans-Phe, l-trans-Ala and, to a small extent, d-trans-Ala, appear to be able to form five or six hydrogen bonds. Figure 8.12 reveals that these are likely to be carboxylate-amide hydrogen bonds. Evidently, these guests must be able to simultaneously hydrogen bond to amide hydrogens on each side of the cavity. This suggests two things. Firstly, these three guests must be positioned in a fairly vertical position to achieve this and they must be able to place their side groups over a low aryl group. The V model shows that this is easy to do for the l compounds, while d-trans-Ala must tip sideways or roll to some extent to achieve this. This is consistent with the fb angles being close to 90°. Secondly, the host is likely to be adopting a narrower structure to make this possible. In a few instances, trans-Gly and l-trans-Phe are even able to form seven hydrogen bonds. The extra one arises when one carboxylate oxygen simultaneously forms two hydrogen bonds to an amide hydrogen pair and one to a thiourea hydrogen, all the other hydrogen bonds remaining the same. Such a bonding pattern is further evidence that the host conformation is placing the thiourea unit closer to one of the amides to make this possible. The influence of host conformation on binding is discussed later.

To conclude this section, it can be seen that the hydrogen bond pattern varies quite considerably with the conformation, stereochemistry and side chain of the guest. To understand more clearly why only certain hydrogen bonds form in each case, a steric analysis provides more clues.

8.4 Steric Analysis.

8.4.1 Extracting Meaningful Steric Information.

Since the two main factors determining the overall binding are hydrogen bonds and steric strain, it would be ideal to do a steric analysis looking at how distances vary between certain atoms in an analogous fashion to the hydrogen bond analysis. However, looking at distances for steric analyses is more difficult than for the hydrogen bonding analysis. Firstly, the particular atom under consideration can clash sterically with many other atoms rather than just a few easily identifiable ones. This makes necessary the examination of many distances. Secondly, unlike hydrogen bonds, there is hardly any "on" or "off" or degree for steric clash. Rather, there is only "off". This is because steric effects cause such great penalties in energy that they simply do not compromise. Thus even if atoms would like to occupy the same space, they never overlap, except only very slightly. In examining structures, the discrepancy in distance between atoms that do not overlap and atoms that do is too marginal to be significant or detectable. Two atoms pressing up close will look virtually the same as two atoms lying comfortably alongside each other. Thus average distances alone can reveal little. In this section, the wording "steric clash" is used to indicate a close contact which may or may not have a high energy penalty.

There are two possible approaches for detecting steric clash. The first is by measuring energy. The difficulty with energy is that steric clashes are ambiguous to interpret. In the case of significant steric strain, the energy clearly rises. However, differences in energy between close contacts and moderate contacts are small. These small differences are difficult to interpret because energy also encompasses many different effects, making it hard to extract only the steric ones. An energy analysis is described later in Section 8.5. The other method to analyse steric clash is distance-based. It uses a particular radial distribution function about the group of interest called the contact radial probability distribution function (CRP). It is a distribution function of contact distances between Lennard-Jones surfaces. Distribution functions are useful because they add up many small effects that are unnoticeable individually into something from which differences can be discerned. In this analysis, CRP distribution functions are constructed to show the degree to which particular groups are sterically hindered, while examining which parts of the host contribute to the contact distances indicates which parts of the host and guest are closely interacting.

8.4.2 Probing the Close Contacts for Different Guests.

The main parts of the guest that may be involved in steric clash are the side chains and the methyl group on the amide. These are also the components of the guests that differ most significantly with stereochemistry and amide conformation. CRP distribution functions are defined as the radial probability that the closest host atom is at a given contact distance from the guest group. They are constructed as follows. The distance, r, between the centre of every host atom and the nearest atom of the particular guest group is calculated for each configuration. The geometric average of the Lennard-Jones radii of the two atoms involved is then subtracted from this distance to give a contact separation, rC. This separation is binned with a weighting of 1/r^2. No normalisation is applied since the functions are only used for comparison between different guests. The function is expected to rise sharply with increasing distance from zero, although it can be non-zero at negative distances since some slight overlap of atoms does occur. The CRP then decays away at larger distances due to the 1/r^2 weighting and the finite system size.

The CRP functions for the side chain with the rest of the host are shown for the four Ala and the four Phe guests in Figure 8.14. CRP plots indicate steric clash in two ways. The steepness of the gradient of the CRP at rC around 0 Å indicates repulsive steric clash. The more negative rC is for the first peak, the larger the number of repulsive contacts. This is seen to vary marginally for each guest and may vary sufficiently to be significant. The second steric effect, which differs much more between guests, is the degree to which the CRP function clusters around zero. If the guest is constrained in a very tight fit, then the function will tend to cluster around zero. Short range clustering can indicate confinement for that part of the guest. However, whether the lack of mobility observed is actually due to this part of the guest cannot be inferred, since the guest may be restrained by an interaction elsewhere in the system.

[Figure 8.14 plot: two panels, one for the four N-Ac-alanine guests (l-cis, l-trans, d-cis, d-trans) and one for the four N-Ac-phenylalanine guests (l-cis, l-trans, d-cis, d-trans); x axis: rC / Å; y axis: Probability.]

Figure 8.14: The CRP between the host and any atom on the guest side chain.

The CRP functions in Figure 8.14 show that the trends in steric clash for the side chain are clear for Ala. The side chains of both d isomers appear to be more sterically strained and confined than those of the l isomers. This is seen in the more negative first peak and the greater clustering at rC around 0 Å. There appears to be a clear ordering of d-cis-Ala > d-trans-Ala > l-trans-Ala > l-cis-Ala. Such a trend is exactly what the V-binding model predicts for side chains. For Phe, the trends are harder to make out. All aryl side groups experience a similarly large degree of steric strain and confinement. The trans conformations seem to experience a little more strain and confinement, but the difference is marginal. Similar plots were examined for Gly, whose side chains are only hydrogens. For all four hydrogens the distributions were virtually identical, as would be expected, and so are not shown here. The small amount of strain appeared identical for both cis and trans guests.

The same CRP functions may also be plotted for the methyl groups on the guest amides. These are shown for the same Ala and Phe guests in Figure 8.15.

[Figure 8.15 plot: two panels, one for the four N-Ac-alanine guests (l-cis, l-trans, d-cis, d-trans) and one for the four N-Ac-phenylalanine guests (l-cis, l-trans, d-cis, d-trans); x axis: rC / Å; y axis: Probability.]

Figure 8.15: The CRP between the host and any atom on the guest methyl group.

For Ala, the CRP functions of the methyl group are more uniform for all guests. The methyl group of l-trans-Ala experiences the most steric clash and confinement, followed by that of d-cis-Ala. These clashes are curious. The confinement of the methyl group for l-trans but not of the methyl group for d-trans and l-cis is exactly what the V-binding model predicts. However, the clash for d-cis-Ala is possibly due to tipping sideways or rolling caused by the side chain. This motion brings its methyl group into close contact with some other part of the host. Which part of the host is involved is revealed in the next subsection.

The story for the methyl group of the Phe guests is quite different to that for the side chains. There is now a distinct ordering of confinement: l-trans-Phe > l-cis-Phe > d-cis-Phe > d-trans-Phe. The methyl groups for l guests appear to be much more confined, while those for the d guests are a lot more mobile, apparently even more than those of the d-Ala isomers. What must be happening is as follows. The large clash for l-trans-Phe and the lack of steric clash for the d isomers are again predicted by the V model. However, that the clash is smaller for d-Phe than for d-Ala is not predicted. The larger side chain for Phe may possibly be forcing the methyl group more into open space. The V model also does not predict the large clash for l-cis-Phe. Evidently, the tipping of l-cis-Phe to hydrogen bond must be bringing its methyl group into contact with the host somewhere. Since this does not occur for l-cis-Ala, this appears to be another difference between Phe and Ala.

CRP plots for the methyl group of Gly were found to be the same for cis and trans, with little strain for either, and so no figures are shown. CRP plots were also made for the oxygen on the amide of the guest. These showed large short range peaks exactly when the carbonyl-amide hydrogen bonds form, as discussed in Section 8.3, while plots for all other guests appeared similar to each other and showed little clash.

One final point of interest is to compare the heights of the CRP functions for the side chain and methyl group. This comparison is valid and requires no normalisation because the CRP always accumulates only one closest guest group-host distance. The scale on each of these plots is not shown, but the peaks for the side chains are approximately twice as high as those for the methyl groups for both Ala and Phe, indicating that overall the side chains are more responsible for steric clash. This is expected, given that side chains, being lower down the guest, are closer to the host. In addition, the side group for Phe is larger than the methyl group.
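The CRP construction described in this subsection can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the code used in the thesis: the atom layout `(x, y, z, lj_radius)`, the function names `contact_separations` and `accumulate_crp`, and the histogram range and bin count are all hypothetical choices.

```python
import math


def contact_separations(host_atoms, guest_atoms):
    """Return one (r, r_c) pair per host atom for a single configuration.

    Each atom is a tuple (x, y, z, lj_radius) in angstroms (assumed layout).
    r is the distance to the nearest atom of the guest group and r_c is r
    minus the geometric mean of the two Lennard-Jones radii.
    """
    pairs = []
    for hx, hy, hz, h_rad in host_atoms:
        # Nearest guest-group atom to this host atom, together with its radius.
        r, g_rad = min(
            (math.dist((hx, hy, hz), (gx, gy, gz)), g_rad)
            for gx, gy, gz, g_rad in guest_atoms
        )
        pairs.append((r, r - math.sqrt(h_rad * g_rad)))
    return pairs


def accumulate_crp(configurations, r_min=-1.0, r_max=4.0, n_bins=50):
    """Accumulate an unnormalised CRP histogram weighted by 1/r**2.

    configurations is an iterable of (host_atoms, guest_atoms) pairs.
    The range and number of bins are illustrative assumptions.
    """
    width = (r_max - r_min) / n_bins
    histogram = [0.0] * n_bins
    for host_atoms, guest_atoms in configurations:
        for r, r_c in contact_separations(host_atoms, guest_atoms):
            index = math.floor((r_c - r_min) / width)
            if 0 <= index < n_bins:  # separations outside the range are ignored
                histogram[index] += 1.0 / r**2
    return histogram
```

Because no normalisation is applied, histograms built this way are only meaningful when compared between guests that were sampled with the same number of configurations.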

8.4.3 The Nature of the Close Contacts.

While some conclusions have been drawn from CRP distribution functions about which guests have the most steric strain, it would be more informative to know which parts of the host are responsible for this. To achieve this, the closest contact distances between the group of the guest and every atom of the host were averaged over all configurations. The host was divided up into residues in a manner similar to that used in the Monte Carlo moves, but now there are 11 in total rather than 9. These are the thiourea, two hydrocarbon chains, four amide, and four aryl segments. The extent of clash between the guest group and the host residue was binned according to distance. Three bins were used: rC ≤ 0.3 Å, 0.3 Å < rC ≤ 0.6 Å, and rC > 0.6 Å. The results of such an analysis for all guests are presented in Figure 8.16. This figure shows where the close contacts are occurring using a schematic of the host broken down into its residues. Blue indicates the side chain, green the methyl group and red the oxygen. The darker shade of colour is used to indicate the stronger clash, while grey indicates that there is no particular clash. In these structures, the guest is always oriented with its polar amide hydrogen facing the corner of the host containing the residue abbreviations suffixed by 1. In order to emphasise the properties of the different amide and aryl residues, N-benzylamide amide and aryl units are respectively referred to as higher and closed, while benzamide amide and aryl units are termed lower and open.

[Figure 8.16 schematic: ten panels, one per guest (l-cis-Phe, l-cis-Ala, cis-Gly, d-cis-Ala, d-cis-Phe, l-trans-Phe, l-trans-Ala, trans-Gly, d-trans-Ala, d-trans-Phe), each showing the host residues TH, AL1, AL2, HC1, HC2, BC1, BC2, AH1, AH2, BO1 and BO2.]

Key: TH, thiourea; AL, benzamide amide (lower); HC, hydrocarbon; BC, N-benzylamide aryl (closed); AH, N-benzylamide amide (higher); BO, benzamide aryl (open). The suffix 1 denotes the side of the guest amide hydrogen, and 2 the other side.

Figure 8.16: Close contacts with the host for all ten guests. A close contact is indicated for the side chain by blue, for the methyl group by green, and for the oxygen by red.

There is a lot of information concerning the modes of binding in this figure. As mentioned before, care must be taken in interpreting it, since a close distance does not necessarily indicate steric clash. It is not only the close contacts that are of interest, but also the lack of them. First of all, a few general trends can safely be given. The guest amide and side groups generally interact sterically on opposite sides of the host, as would be expected. Another clear trend is that the larger the side group, the greater the number of close contacts, again as expected. This indicates that smaller guests are also more mobile.
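The residue-level classification behind Figure 8.16 can be sketched as follows. This is a hedged illustration only: it assumes that the three distance bins are inclusive at their upper edge (0.3 Å and 0.6 Å), that a residue is represented by the smallest averaged contact distance among its atoms, and that the labels `strong`, `weak` and `none` stand in for the dark, light and grey shades. The function names and the atom-to-residue mapping are hypothetical.

```python
def classify_contact(mean_rc):
    """Map an averaged contact separation (angstroms) onto one of three bins."""
    if mean_rc <= 0.3:
        return "strong"  # darker shade in the schematic
    if mean_rc <= 0.6:
        return "weak"    # lighter shade
    return "none"        # grey: no particular clash


def residue_contacts(mean_rc_by_atom, residue_of_atom):
    """Classify each host residue by its closest averaged atom contact.

    mean_rc_by_atom maps a host atom label to its configuration-averaged
    contact separation; residue_of_atom maps the same labels to residue
    abbreviations such as "AH1" or "BO2".
    """
    closest = {}
    for atom, mean_rc in mean_rc_by_atom.items():
        residue = residue_of_atom[atom]
        closest[residue] = min(mean_rc, closest.get(residue, float("inf")))
    return {residue: classify_contact(rc) for residue, rc in closest.items()}
```

Running this once per guest group (side chain, methyl group, oxygen) would give the three colour layers of the schematic.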

Little can be deduced from the data for Gly except for the trans hydrogen bond, the guest's high mobility and the slight preference for the end opposite to the guest polar hydrogen. To understand the binding for the other guests, applying the V-binding model may prove of use. It is evident that the observed binding pattern is somewhat more complex than that predicted by the simple V model.

Starting with the blue side chains, the steric contacts for all four Ala guests are exactly where the V-binding model predicts, and the clash for d is greater than that for l. For the red oxygen atoms in the cis-Ala guests, there is clear close contact around side 1 due to carbonyl-amide hydrogen bond formation for l-cis-Ala, while there is no sign of any clash for the oxygen in d-cis-Ala. For the green methyl groups of the trans-Ala guests, the predicted clash for l-trans-Ala with the high amide, AH1, occurs, while the absence of a clash for d-trans-Ala is again predicted correctly. What is also evident from these diagrams is some degree of clash due to the group at the top end of the amide, indicating some degree of tipping or rolling. This is seen in the clash of the methyl group with BC2 for l-cis-Ala and the clash with BO2 and HC2 for d-cis-Ala, while the oxygen in d-trans-Ala clashes slightly with BC1.

Now consider Phe. All four guests experience a large degree of clash between their blue side chains and the residues around cleft 2. However, this time, the clash for d guests appears to be reduced compared to that for l-Phe guests. The side chains for l-Phe guests lie near to 4 or 5 residues simultaneously, suggesting that the side chain may lie close to all of them in a close fit, while the d-Phe guests appear to touch the host at only a few points. This indicates that d-Phe guests are unable to approach as closely to the host as l-Phe guests. For the red oxygen, the carbonyl-amide hydrogen bond is again seen for l-cis-Phe. Not only is there no such hydrogen bond for d-cis-Phe but, surprisingly, the oxygen clashes instead with BC1. This suggests quite a strong degree of tipping of the whole guest to side 2. The methyl group of the l-trans-Phe guest may be seen lying quite close to side 1, particularly to the favourable BO1 residue, while in d-trans-Phe, it only draws marginally near to the unfavourable AH1. This indicates that d-trans-Phe tips, while l-trans-Phe does not.

In summary, the observations from the steric analysis have agreed with the f b and hydrogen bond analyses, and the analysis has shed more light on the position of each guest and how it fits into the cavity. At this point, it appears as though there are four general possibilities. There are the two mobile Gly modes, the two forwards tipping l-cis modes with carbonyl-amide hydrogen bonds, the two level l-trans modes, and the four d backwards tipping modes. The V model was able to account for most of these trends, with the exception of the nature of the secondary relief modes.

8.5 Energy Analysis.

8.5.1 Energy Components.

Energies indicate much about the relative stabilities of different complexes. The difficulty is that energies encompass many different effects. These include intermolecular interactions such as hydrogen bonds, interactions and steric effects, and host and guest strain. The coupling of all these contributions makes it rather difficult to deduce particular physical phenomena from energies. The individual force field component energies as given in Eq. 2.1 are of some use in small systems or when differences are obvious and unambiguous, such as the torsional profile in the ethane guest. However, in complex systems differences become many, subtle and distributed over many contributions. It becomes almost impossible, and arguably meaningless, to assign individual terms to a particular effect. Nevertheless, the force field components when grouped into types can still serve some use, particularly for confirming any suspected physical effects.

The total energy, E, and a number of its components are listed in Table 8.3 for each host-guest complex. The components listed are Eint, the combined host and guest internal energies, Exx, the host-guest interaction energy, and Esx, the solvation energy of the whole complex. These three terms combined give E. Edih, the dihedral contribution to Eint, is given. Exx is broken down into two further components, ELJ, the Lennard-Jones energy, and ECoul, the electrostatic energy. Finally, Epol, the polarisation term of Esx, is also given.

Table 8.3: Absolute Total Energy and Components (kcal mol⁻¹) for the Ten Guests.

Guest                       E     Eint  Exx   Esx   Edih  ELJ   ECoul  Epol
N-Ac-l-cis-phenylalanine    -193  -51   -85   -55   20    -19   -66    -49
N-Ac-l-trans-phenylalanine  -195  -48   -91   -55   26    -21   -69    -49
N-Ac-l-cis-alanine          -191  -53   -82   -55   26    -16   -65    -49
N-Ac-l-trans-alanine        -193  -55   -81   -56   25    -16   -64    -50
N-Ac-cis-glycine            -191  -61   -71   -58   24    -13   -58    -52
N-Ac-trans-glycine          -194  -60   -74   -59   24    -13   -61    -53
N-Ac-d-cis-alanine          -188  -58   -70   -59   22    -15   -55    -53
N-Ac-d-trans-alanine        -190  -55   -76   -57   24    -15   -61    -52
N-Ac-d-cis-phenylalanine    -187  -51   -77   -58   23    -17   -59    -52
N-Ac-d-trans-phenylalanine  -190  -55   -73   -61   18    -17   -56    -54

Absolute energies are rather large and it is not easy to spot trends in such numbers. Therefore Table 8.4 is also included. It gives the relative energies, with the most negative value of each energy component becoming the energy zero for that component. Note that because each component is zeroed separately, the components will not add up to the total energy. An extra decimal place is also included for precision.
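The zeroing used to build the relative energies can be illustrated with the total energy column of Table 8.3. The sketch below is not the original analysis script; the dictionary name `TOTAL_ENERGY` and the function `relative_energies` are hypothetical, and because the inputs are the integer-rounded values printed in Table 8.3, the results agree with Table 8.4 only to within rounding.

```python
# Absolute total energies E from Table 8.3 (kcal/mol, rounded to integers).
TOTAL_ENERGY = {
    "N-Ac-l-cis-phenylalanine": -193,
    "N-Ac-l-trans-phenylalanine": -195,
    "N-Ac-l-cis-alanine": -191,
    "N-Ac-l-trans-alanine": -193,
    "N-Ac-cis-glycine": -191,
    "N-Ac-trans-glycine": -194,
    "N-Ac-d-cis-alanine": -188,
    "N-Ac-d-trans-alanine": -190,
    "N-Ac-d-cis-phenylalanine": -187,
    "N-Ac-d-trans-phenylalanine": -190,
}


def relative_energies(absolute):
    """Shift one energy component so that its most negative value is zero."""
    lowest = min(absolute.values())
    return {guest: value - lowest for guest, value in absolute.items()}


if __name__ == "__main__":
    for guest, value in relative_energies(TOTAL_ENERGY).items():
        print(f"{guest:28s}{value:4d}")
```

Applying the same function to each component column separately explains why the relative components no longer sum to the relative total: every column is shifted by its own minimum.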

8.5.2 Interpretation of the Energies.

Before discussing the energies, it is important to remember that they are not free energies, which include entropy. A lower energy value does not necessarily imply that this guest is the strongest binder. It is the free energy data in chloroform and the host that provide this.

Table 8.4: Relative Total Energy and Components (kcal mol⁻¹) for the Ten Guests.

Guest                       E     Eint  Exx   Esx   Edih  ELJ   ECoul  Epol
N-Ac-l-cis-phenylalanine    1.7   9.5   5.3   6.0   1.9   1.9   3.5    5.7
N-Ac-l-trans-phenylalanine  0.0   12.8  0.0   6.3   8.0   0.0   0.0    5.9
N-Ac-l-cis-alanine          3.8   7.5   8.9   6.5   7.5   4.9   4.1    5.8
N-Ac-l-trans-alanine        1.9   5.7   9.8   5.5   6.3   4.5   5.4    4.6
N-Ac-cis-glycine            4.2   0.0   19.6  3.6   5.4   8.3   11.4   2.5
N-Ac-trans-glycine          0.5   0.7   16.3  2.5   6.3   8.1   8.3    2.0
N-Ac-d-cis-alanine          6.4   2.6   20.8  2.0   3.4   6.4   14.5   1.1
N-Ac-d-trans-alanine        4.3   5.2   14.1  3.9   5.3   6.2   8.0    2.9
N-Ac-d-cis-phenylalanine    7.9   9.5   13.9  3.5   5.2   3.8   10.2   2.6
N-Ac-d-trans-phenylalanine  4.5   5.9   17.6  0.0   0.0   4.0   13.7   0.0

The total energy, E, reveals two interesting facts. Firstly, all the trans conformers lie lower in energy than the respective cis conformers. This observation probably reflects the fact that the trans conformers have a lower internal energy than the cis conformers. Even though the free energy calculations predict the cis conformation to be more stable for l-Phe, l-Ala and d-Phe, the origin of this stabilisation does not appear to come from the energies alone. Secondly, within each conformation, there is a general trend of rising energy going from l-Phe to d-Phe, with trans-Gly the only exception. This trend indicates a generally greater stability of l over d.

When looking at the three components of E, however, the story is much more complex. A few general rules can nevertheless be extracted from each component. The larger the guest, the larger Eint. Thus the host and guest geometries are more perturbed for larger guests, as might be expected. Most of the cause for this trend interestingly lies in Edih. A large Eint must be evidence of conformational strain and possibly different conformations. There is one notable exception in this case and this is l-cis-Phe, which has a small Edih term.

The variation in Exx is much more dramatic. l guests clearly have a more negative Exx, indicating a favourable binding contact. d-cis-Ala appears to suffer a particularly unfavourable binding energy. Going down the table, Exx rises but becomes more negative again for larger d compounds. An explanation for this is made clearer by looking at its two components, ELJ and ECoul. Going down the table, ELJ rises as the guest size decreases, reflecting the smaller number of interactions. ELJ becomes more negative again for the larger d guests, but does not recover to the same value as for the large l guests due to worse contacts. The variation in the Coulomb term appears to largely correlate with the number of hydrogen bonds observed earlier in Figure 8.11.

The third component, Esx, varies the least between guests. It appears to run in the reverse order, with l-Phe guests the least stable and d-Phe the most. This may be rationalised by remembering that d complexes tend to bind less tightly, giving them a larger size and a more negative Esx term. Interestingly, though, the cause of this trend lies not so much in the cavity term but in the polarisation term, Epol. Thus the origins of this trend must be more electrostatic in nature than dispersion and cavitation-based. More distantly spaced atoms are able to polarise the solvent to a greater degree, leading to a more negative Esx term. The surface area term, which is not shown, barely changes. The only trend it shows is that it is more negative for larger guests, as would be expected.

In summary, the energy analysis has shown that trans complexes have lower energies than cis complexes and l complexes have a lower energy than d. l guests appear to achieve this principally through a strong host-guest interaction, partially cancelled out by some intramolecular strain and a smaller solvation term. Such intramolecular strain, of which the main component is the dihedral term, may be indicative of conformational rearrangement.

8.6 Conformational Analysis.

8.6.1 Variation of Host Shape With Different Guests.

The final piece in the puzzle for understanding the binding comes from a conformational analysis of the host. There have been a number of indications that the host structure varies for different guests. Indeed, the proper sampling of the host was found to play a major role in obtaining consistent free energies. Therefore it is worthwhile examining the host structure to see what role it plays in binding. Table 8.5 shows how various geometric properties of the host vary when different guests are bound to it. The exact definitions of these parameters are given in Figure 8.17. Most of these differences may be understood by examining the flexible parts of the host. The main dihedrals of interest that bring about different conformations are those in the flexible hydrocarbon chain. However, other dihedrals that still affect binding include the aryl dihedrals of the host and the swing dihedral of the aryl group in Phe.

Table 8.5: The Values of Various Geometric Properties For Each Host-Guest Complex. These Properties are Defined in Figure 8.17. Distances Are in Å and Angles in Degrees.

Guest                       rJC   AC    JC    rHC1  rHC2  rBC1  rBC2  rd
N-Ac-l-cis-phenylalanine    10.6  84    91    6.2   5.9   7.8   8.1   4.5
N-Ac-l-trans-phenylalanine  9.4   74    100   5.6   5.5   7.6   7.6   4.6
N-Ac-l-cis-alanine          10.1  79    93    5.9   5.8   7.6   8.1   5.1
N-Ac-l-trans-alanine        9.6   75    95    5.7   5.9   8.0   7.7   5.4
N-Ac-cis-glycine            10.2  80    90    6.0   6.1   8.0   7.8   5.4
N-Ac-trans-glycine          10.0  78    87    5.9   6.6   8.4   7.8   6.4
N-Ac-d-cis-alanine          10.4  81    84    6.1   6.5   8.3   8.1   6.3
N-Ac-d-trans-alanine        10.0  78    92    5.9   6.0   8.3   7.7   5.5
N-Ac-d-cis-phenylalanine    10.9  84    90    6.2   6.3   8.2   8.2   4.7
N-Ac-d-trans-phenylalanine  11.0  86    88    6.1   6.4   8.0   8.3   4.8

8.6.2 Hydrocarbon Chain Conformation.

The starting point for such an analysis is to examine the number of conformations for each guest arising from the host hydrocarbon chain. For the 10 × 100 configurations saved, 92 unique conformations were counted for all twelve dihedrals for all guests. A conformation was defined similarly to before by dividing each dihedral angle into three sections: gauche+, trans or gauche-. If a conformation consisted of a unique combination of these three specifiers, then it was taken as a new conformation. This is marginally less than the 97 unique conformations observed for the free host out of 300 structures. Comparing the total numbers of unique conformations is not entirely fair, since for the host-guest complexes they are selected from 1000 structures while for the host they are selected from only 300. Since it is likely that many more conformations would be found for the free host from 1000 structures, the general indication is that the total number of unique conformations is greater for the free host. When a guest binds, it has to organise the host, especially if it is to open up the cavity and bind inside, and may therefore limit the number of unique conformations available to the host. Such a reduction has to be compensated for by the gain of a favourable energetic host-guest interaction.

[Figure 8.17 schematic: macrobicycle 12 with the residues TH, HC1, HC2, AL1, AL2, AH1, AH2, BC1, BC2, BO1 and BO2 labelled, and the distances rJC, rHC1, rHC2, rBC1, rBC2 and rd and the angles AC and JC marked.]

Figure 8.17: The definition of the distances and angles given in Table 8.5 illustrated for a schematic of macrobicycle 12.

While 100 structures per guest would not be expected to provide all the conformations possible, they should still find the dominant ones. To give a preliminary idea of the differences in conformation for each guest, the number of unique conformations generated in the simulations for each guest is given in Table 8.6. It can be seen that the number of hydrocarbon conformations the host can access is larger for smaller guests and smaller for larger guests, as would be expected. It is possible that this difference is an artefact due to poorer sampling for larger guests. A comparison with annealed structures can suggest which is the case. The number of unique conformations generated from simulated annealing for each guest is also given in Table 8.6, together with the total number of structures generated for that guest. The trends appear to be the same for both annealed and simulation conformations, indicating that the restriction of the host by the guest is a real physical effect and not a result of poor sampling. The other main trend in Table 8.6 is that there appear to be fewer conformations for the l guests than for d. This may be a consequence of a stronger binding interaction for l guests.

Table 8.6: Number of Unique Hydrocarbon Conformations For Each Host-Guest Complex From Annealed Structures and Simulation.

Guest                       Annealed (Total)  Simulation
N-Ac-l-cis-phenylalanine    7 (8)             12
N-Ac-l-trans-phenylalanine  19 (28)           8
N-Ac-l-cis-alanine          22 (31)           16
N-Ac-l-trans-alanine        20 (30)           18
N-Ac-cis-glycine            24 (32)           22
N-Ac-trans-glycine          29 (40)           32
N-Ac-d-cis-alanine          32 (47)           16
N-Ac-d-trans-alanine        28 (34)           14
N-Ac-d-cis-phenylalanine    26 (39)           17
N-Ac-d-trans-phenylalanine  17 (20)           16

92 conformations are still too many to analyse individually. Most conformations only occur a few times, although some are more prevalent and these warrant closer analysis. Before doing so, more clues concerning which are the important conformations can be obtained from the dihedral distributions. They can also reveal differences for each guest. Indeed, some very interesting differences were observed. Illustrated in Figures 8.18 and 8.19 are the dihedral distributions of the hydrocarbon chain for l-cis-Phe and l-trans-Phe. The sampling compared to the host alone (see Figure 6.10) can be seen to be somewhat reduced. Almost all of the dihedrals appear to be rather more restricted, especially for the larger Phe guests. Despite the restricted sampling, it can be seen that there are a number of differences between each distribution. While eight of the dihedrals (unshaded) remain mostly unchanged for each guest, the four dihedrals that are shaded differ quite dramatically between the two guests. The predominant subconformation is g⁻ttg⁻ for l-cis-Phe and tg⁻g⁻t for l-trans-Phe.

An analysis of the dihedral distributions for all other guests indicated that their dominant subconformation resembled either the subconformation of l-cis-Phe, the subconformation of l-trans-Phe, or something in between. The guests similar to l-cis-Phe were d-trans-Phe, d-cis-Phe and d-cis-Ala. The guest similar to l-trans-Phe was l-trans-Ala. All the other guests showed properties common to both. These are cis-Gly, trans-Gly, d-trans-Ala, and l-cis-Ala. A sample dihedral distribution for one of these, trans-Gly, is given in Figure 8.20. It can be seen that the shaded dihedrals lie in an intermediate state between the two extremes. What is also evident is the greater flexibility in the other dihedrals. This must be a consequence of the smaller guest. Indeed, greater sampling was also found for all the other guests with intermediate subconformation behaviour. On face value, this different subconformation preference for each guest seems rather inexplicable. However, by examining the subconformations more closely, a number of patterns similar to those found in previous analyses may suggest possible reasons.

[Figure 8.18 plot: twelve panels, one per hydrocarbon chain dihedral; x axis: dihedral / degrees (0 to 360); y axis: dihedral distribution (x10 configurations).]

Figure 8.18: The dihedral distribution for all the hydrocarbon dihedrals in the host l-cis-Phe complex.

[Figure 8.19 plot: twelve panels, one per hydrocarbon chain dihedral; x axis: dihedral / degrees (0 to 360); y axis: dihedral distribution (x10 configurations).]

Figure 8.19: The dihedral distribution for all the hydrocarbon dihedrals in the host l-trans-Phe complex.
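The counting of unique hydrocarbon conformations used in this subsection can be sketched as below. This is a minimal illustration rather than the thesis code: it assumes dihedrals are reported in degrees and that the three sections are 0 to 120 (gauche+), 120 to 240 (trans) and 240 to 360 (gauche-); the exact boundaries are not stated in the text, and the function names are hypothetical.

```python
from collections import Counter


def dihedral_state(angle_deg):
    """Assign a dihedral angle to gauche+, trans or gauche-.

    The 120 degree wide sections are an assumption made for illustration.
    """
    angle = angle_deg % 360.0  # fold any input into the range [0, 360)
    if angle < 120.0:
        return "g+"
    if angle < 240.0:
        return "t"
    return "g-"


def conformation(dihedrals):
    """Label a structure by the states of all its chain dihedrals."""
    return tuple(dihedral_state(angle) for angle in dihedrals)


def count_conformations(structures):
    """Count how often each distinct conformation occurs.

    structures is an iterable of dihedral-angle sequences, one per saved
    configuration; the number of keys is the number of unique conformations.
    """
    return Counter(conformation(dihedrals) for dihedrals in structures)
```

With twelve dihedrals per structure, `len(count_conformations(structures))` gives the kind of number reported in Table 8.6, and `most_common()` picks out the prevalent conformations.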

8.6.3 Dominant Hydrocarbon Subconformations.

When only the four dihedrals making up the subconformation are considered, only 26 subconformations are found for all guests. Of these, only six are significantly populated and common to more than one guest. These are listed in Table 8.7 together with the individual guest populations. The dominant subconformations appear to differ by pairs. For example, the first subconformation, g⁻ttg⁻, and the second subconformation, g⁻tg⁻t, differ in the third and fourth dihedrals. This pairing reflects the fact that the restraint of chain closure usually requires two dihedrals to adjust in synchronisation.
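Reducing a full conformation to a subconformation and comparing two subconformations can be sketched as follows. This is a hedged illustration: which four of the twelve dihedrals make up the subconformation is not stated numerically in the text, so the `positions` argument is a hypothetical placeholder, as are the function names. States are written in ASCII as "g+", "t" and "g-".

```python
from collections import Counter


def subconformation(conformation, positions):
    """Keep only the dihedral states at the chosen positions (0-based)."""
    return tuple(conformation[i] for i in positions)


def subconformation_populations(conformations, positions):
    """Count how often each subconformation occurs in a set of structures."""
    return Counter(subconformation(c, positions) for c in conformations)


def differing_positions(first, second):
    """Return the 0-based positions at which two subconformations differ."""
    return [i for i, (a, b) in enumerate(zip(first, second)) if a != b]


if __name__ == "__main__":
    wide = ("g-", "t", "t", "g-")
    second = ("g-", "t", "g-", "t")
    # The two states differ at positions 2 and 3, i.e. the third and
    # fourth dihedrals, matching the pairing described in the text.
    print(differing_positions(wide, second))
```

Counting populations per guest with `subconformation_populations` would give the kind of breakdown summarised in the populations table for the six dominant subconformations.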

[Figure 8.20 plot: twelve panels, one per hydrocarbon chain dihedral; x axis: dihedral / degrees (0 to 360); y axis: dihedral distribution (x10 configurations).]

Figure 8.20: The dihedral distribution for all the hydrocarbon dihedrals in the host trans-Gly complex. The main subconformations appear to be symmetric between the two hydrocarbon chains, while the others are asymmetric. Note that the order in which the guests are listed is deliberately set to be the same as that observed in the trend for the dihedral angle distributions. Any discrepancy between the total number of subconformations and the total, 100, is due to the occurrence of other subconformations not listed here. As observed earlier, two particular subconformations are predominant. The gttg subconformation dominates for the guests at the l-cis-Phe end of the table, while the tggt subconformation on the other hand is the most common for guests at the l-

CHAPTER 8.

ANALYSIS OF THE MACROBICYCLE 12 SYSTEM

204

Table 8.7: The Populations of Each Host Sub-Conformation For Each Guest.
Conformation Total d-Phe l-Phe d-Ala d-Phe trans cis cis cis Gly Gly d-Ala l-Ala l-Ala l-Phe cis trans trans cis trans trans

g- t t g- 363 g- t g- t 66 t g- g- t 168 t g- t g64 g- g+ t g75 g- t g+ g53 % of total 77

92 0 0 0 2 1 95

65 28 0 0 3 4 100

64 0 0 12 3 2 81

51 2 0 0 4 11 68

27 12 11 14 19 13 96

5 0 1 13 13 1 33

17 10 7 6 17 6 72

34 1 42 4 10 3 94

8 8 43 4 4 5 72

0 5 64 11 0 7 87

trans-Phe end. In between, there is a cross-over region in which other intermediate subconformations also occur. This diverse sampling is particularly the case for trans-Gly, which can also access to a large extent a number of other subconformations not listed here. Even though d-trans-Phe appears to be even more extreme than l-cis-Phe, with 92 % of it adopting the first subconformation compared to 65 %, it was not truly representative of its conformational class, since the full conformation of the whole hydrocarbon chain differed to quite an extent for the two d-Phe guests compared to the other eight.

By looking back at the original full conformations, it is possible to extract some information concerning the likelihood of these subconformations occurring. Two full conformations stand out from the total of 1000. One of these occurs 136 times and was found to contain the g- t t g- subconformation. Another occurs 156 times and was found to contain the t g- g- t subconformation. Given that the t g- g- t subconformation only occurs 168 times in total over all full conformations, there appears to be only one main way that the other eight dihedrals can arrange themselves to form this subconformation, suggesting that this is a rather extreme structure. On the other hand, the g- t t g- subconformation occurs 363 times in total. Thus there must be many more ways the other dihedrals can arrange themselves for this subconformation, and hence it is likely to be easier to form.

These two dominant subconformations can go a long way to explaining the geometric differences in host structure. This is done by comparing the geometric properties averaged for each conformation with the respective properties averaged for each guest in Table 8.5. The geometric properties concerned with the hydrocarbon chain are listed in Table 8.8.

Table 8.8: The Values of Various Geometric Properties For Each Subconformation Averaged For Each Host-Guest Complex. These properties are defined in Figure 8.17. Distances are in Å and Angles in Degrees.

Subconformation   rJC    AC   JC   rHC1   rHC2   rBC1   rBC2   rd
g- t t g-         10.6   82   87   6.3    6.3    8.2    8.2    5.6
g- t g- t         10.1   79   92   6.2    5.6    7.9    7.9    5.3
t g- g- t         9.5    74   96   5.8    5.6    7.7    7.7    5.4
t g- t g-         10.1   79   89   5.7    6.3    7.9    8.0    4.5
g- g+ t g-        10.2   80   90   5.7    6.2    7.8    8.0    5.3
g- t g+ g-        10.3   80   89   6.4    5.9    8.1    7.9    5.6

The main difference between the conformations is their width. The average junction carbon to junction carbon distance is 10.6 Å for g- t t g- but only 9.5 Å for t g- g- t. Therefore, from this point on, the g- t t g- subconformation will be referred to as the wide subconformation, while t g- g- t will be referred to as the narrow one. The basic reason why the narrow subconformation is narrow is that the first dihedral of the subconformation points the hydrocarbon chain away from the other junction carbon. For the wide subconformation, however, this dihedral points directly towards the other junction carbon.

There are a number of additional features in the host structure that differentiate between wide and narrow conformations. The two angles that describe the shape of the cavity, AC and JC, are defined in Figure 8.17. For the narrow cavity, AC = 82° and JC = 87° on average over all guests. For the wide cavity, AC = 79° and JC = 92°. The other difference is the length of the hydrocarbon chain segments, given by the distances rHC1 and rHC2. For the wide cavity, rHC1 = rHC2 = 6.3 Å, while the narrow cavity has much shorter hydrocarbon chains with rHC1 = 5.8 Å and rHC2 = 5.6 Å.

Another feature of the host shape that differs between guests is the deep depth of the cavity, rd. This distance is defined in Figure 8.17 and its average values are listed in Table 8.5 for each guest. The shallower cavity depth was not found to vary to any great extent, ranging from 2.9 to 3.7 Å. Variation in this depth depends on


the hydrocarbon chain conformation. The rd value varies a lot more, from 4.6 to 6.3 Å. This depth appears to be controlled by the degree to which the two aryl-amide sides lift up. This lifting up appears to be caused by a narrowing of the AC and JC angles. An examination of Table 8.5 indicates that rd also appears to be influenced by the length of the two N-benzylamide units, rBC1 and rBC2, which make up opposing sides of the main host ring. Definitions and values of these distances are also given in Figure 8.17 and Table 8.5. On average, the distances lie around 8 Å but they can fluctuate by up to 0.4 Å in either direction. The complexes with deep cavities occur for trans-Gly and d-cis-Ala. In these instances, rBC1 increases to 8.3-8.4 Å. However, the role of rBC1 and rBC2 appears to be quite variable. They may also increase to 8.2-8.3 Å to enlarge rJC, the cavity width, as seen for d-cis-Phe and d-trans-Phe. Alternatively, the distances may shorten to 7.6 Å to make the cavity narrower, as occurs for l-trans-Phe, l-trans-Ala and l-cis-Ala.

Another interesting feature concerns the position of the thiourea unit. Any shift of it from the centre may be revealed by a difference between rHC1 and rHC2. This shift to one end of the cavity would make it easier for the guest to tip and make its carboxylate oxygens simultaneously hydrogen bond to both pairs of thiourea and host amide hydrogens. It is difficult to see any trends in thiourea position for the predominant subconformations since they are averaged over so many different guests. However, Table 8.5 shows clearer trends for each guest. A lengthening in rHC1 occurs for l-cis-Phe and lengthening occurs for trans-Gly, d-cis-Ala and the two d-Phe guests. In other words, the hydrocarbon chains are becoming more trans-like. This behaviour is in agreement with the presence of a wider or deeper cavity. Alternatively, these two distances are seen to contract if the cavity becomes narrower, as seen for the two l-trans guests.
In general, the two benzamide aryl rings lie open, while the two N-benzylamide aryl rings lie closed, as discussed earlier. However, in a few instances this was found to change slightly. For Gly, all four aryl rings often turn over to closed, almost trapping


the guest inside. The small size of the guest must be behind this feature. The other deviation occurs for the d-Phe guests. The BC2 aryl group was often found to lie open, while the connecting BO1 aryl group lay closed. This appears to be a result of the large phenyl side group pushing against the BC2 group. Evidently, the backward tipping may be sufficient to reduce the bad contact for d-Ala guests, but for d-Phe guests, the phenyl group can still not sit comfortably.

In summary, it appears that l-trans-Phe and l-trans-Ala prefer narrow cavities, d-trans-Phe, d-cis-Phe, l-cis-Phe and d-cis-Ala prefer wide cavities, while the remaining, smaller guests can fit in a cavity of either size. Preferences are intimately tied in with their f b angles. Guests with f b close to 90° prefer narrow cavities, while guests which tip over prefer wider cavities. What this conformational analysis has shown is the structural variation in the host for different binding modes. Even within a given binding mode, there are still small differences between each complex in order to optimise the overall host-guest interaction.
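The argument made earlier in this section about how readily each subconformation forms rests on simple ratios of the counts quoted in the text. The following Python sketch reproduces that arithmetic. The function name and the dictionary layout are illustrative assumptions; only the four counts (363, 136, 168 and 156 out of 1000 full conformations) come from the text.

```python
def dominant_share(total_count, dominant_count):
    """Fraction of a subconformation's occurrences that come from
    its single most common full conformation."""
    if total_count <= 0:
        raise ValueError("total_count must be positive")
    return dominant_count / total_count

# Counts quoted in the text (out of 1000 full conformations).
# The dictionary keys and layout are illustrative only.
counts = {
    "g- t t g- (wide)": {"total": 363, "dominant": 136},
    "t g- g- t (narrow)": {"total": 168, "dominant": 156},
}

shares = {
    name: dominant_share(c["total"], c["dominant"])
    for name, c in counts.items()
}

for name, share in shares.items():
    print(f"{name}: {share:.0%} from one full conformation")
```

Roughly 93 % of the narrow occurrences come from a single full conformation, against about 37 % for the wide one, which is the numerical basis for describing the narrow subconformation as the more restrictive structure.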

8.6.4 Phenyl Ring Conformation of the Guest.

One final dihedral of interest is the swing dihedral of the Phe guests. This is shown in Figure 8.4. The position of the phenyl group is important in binding to the host for two reasons. Firstly, the phenyl group must be placed into space, while secondly, it may be able to form favourable interactions with the aryl groups of the host ring. Specifically, this dihedral is defined with respect to the carboxylate carbon rather than the nitrogen. There are three possible conformations available for this dihedral, g-, t and g+. These are shown in Figure 8.21. The t conformation has the phenyl ring roughly parallel to the long axis of the guest, g+ points away in the plane of the guest, while g- points perpendicularly away from the plane of the guest. The preferred conformation in chloroform is found to be g+. However, there are indications that the relative energies of each conformation depend on the amide conformation. Figure 8.22 shows the total force field energy for a dihedral drive (see Subsection 3.3.2 for details) around the swing dihedral when the amide conformation is trans and when it is cis. The preference of the trans amide for the g+ swing conformation is clear. However, what is especially interesting is that in the cis amide conformation, the t swing dihedral is not only the most stable conformation of the three but also comparable to the t swing conformation of the trans amide.

Figure 8.21: The three conformations for the swing dihedral of Phe.

Table 8.9 shows the populations of the three conformations for each host-guest complex. There appears to be remarkable variety in the conformation adopted. A combination of the internal energy and steric interactions with the host plays a role in determining which conformations occur. Illustrations of how each Phe guest binds in the host are given in Figure 8.23. The d guests adopt the t conformation almost exclusively. This must be a result of the backward tipping that draws the side chain deeper into the cavity. This leaves the phenyl nowhere to go but straight up.
Figure 8.22: The dihedral angle energy profiles for the swing dihedral in Phe for cis and trans conformations, calculated using force field energies. (Axes: energy / kcal mol⁻¹ against dihedral angle / degrees.)


Table 8.9: The Distribution for the Guest CCCC Dihedral Angle Between the Three Conformations For the Four Phe guests.

Conformation   l-cis-Phe   l-trans-Phe   d-cis-Phe   d-trans-Phe
g+             18          66            0           0
t              5           34            99          99
g-             77          0             1           1

l-trans-Phe prefers to adopt the g+ conformation, the most stable in chloroform. In doing this it places its aryl group over BO2, picking up favourable interactions with it and the adjacent BC1. However, there is still some inclination for it to go t. l-cis-Phe, on the other hand, adopts the g- conformation, although it is also able to access the other two. The forward tipping must raise the phenyl group out of the host to a small degree, making all three conformations accessible. The preferred conformation is interesting, given the higher energy of the g- conformation. This threefold partitioning of conformations is slightly misleading for l-cis-Phe since the g- conformation is not well defined, as seen in Figure 8.22. The average dihedral angle comes out to be 247°, on the edge of the shoulder. Like l-trans-Phe, l-cis-Phe places its phenyl group over BO2, gaining favourable interactions with it and BC1.

The major result that may be drawn from the preferred dihedral conformation is that l-cis-Phe, d-cis-Phe and d-trans-Phe all adopt higher energy conformations. This stabilisation of the swing dihedral is important in itself. However, the even greater implication is that for these higher energy swing conformations, the relative energy of cis and trans amide conformations is greatly reduced compared to that in the lowest energy g+ conformation, as indicated in the dihedral profile in Figure 8.22. Thus for Phe, the cis amide conformation is stabilised internally as a result of a conformational change in the phenyl group induced by a remote steric interaction with the host. One interesting feature of this stabilisation is that the high energy conformation for the swing dihedral is different between l and d Phe. The dihedral energy profiles in Figure 8.22 indicate that the stabilisation is slightly greater for d. This


difference may provide a means to selectively stabilise only one stereoisomer. Since this dihedral profile is only a force field energy profile, it may suffer from some inaccuracy for higher energy conformations. This is because force fields are not perfect and usually give greater priority to getting the low energy conformations correct. Therefore, ab initio calculations need to be carried out on all the conformations to better quantify this effect.

Figure 8.23: Particular structures for the four Phe guests (N-Ac-l-cis-phenylalanine, N-Ac-l-trans-phenylalanine, N-Ac-d-cis-phenylalanine and N-Ac-d-trans-phenylalanine), highlighting the different swing conformations. l-cis-Phe is g-, l-trans-Phe is g+, while both d-Phe guests are t.
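Populations such as those in Table 8.9 can be obtained by assigning each sampled swing dihedral angle to one of the three wells. A minimal Python sketch of such a tally follows. The bin boundaries (0 to 120° for g+, 120 to 240° for t, 240 to 360° for g-) are the conventional staggered-well assignment and are an assumption here, not necessarily the boundaries used in this work, and the sample angles are invented for illustration.

```python
from collections import Counter

def classify_dihedral(angle_deg):
    """Assign a dihedral angle to g+, t or g- using assumed 120-degree bins."""
    angle = angle_deg % 360.0  # map any angle onto [0, 360)
    if angle < 120.0:
        return "g+"
    if angle < 240.0:
        return "t"
    return "g-"

def populations(angles_deg):
    """Percentage of samples falling in each of the three wells."""
    tally = Counter(classify_dihedral(a) for a in angles_deg)
    n = len(angles_deg)
    return {label: 100.0 * tally[label] / n for label in ("g+", "t", "g-")}

# Hypothetical sampled angles in degrees, not simulation data.
sample = [62.0, 175.0, 181.0, 247.0, 290.0, -65.0, 300.0, 58.0]
result = populations(sample)
print(result)
```

Under these assumed bins an angle of 247° falls just inside g-, close to the boundary with t, which is in keeping with the remark that the g- label for l-cis-Phe is not well defined.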


Figure 8.24: The four main binding motifs of the guest in the host shown for representative molecules: N-Ac-l-cis-phenylalanine (Motif 1), N-Ac-l-trans-phenylalanine (Motif 2), N-Ac-d-cis-phenylalanine (Motif 3) and N-Ac-trans-glycine (Motif 4).

8.7 Rationalisation of Free Energies.

8.7.1 Binding Motifs Observed in The Simulations.

Given the large number of binding observations made, a summary of them all is now given in Table 8.10. Four main binding motifs have been observed for the ten guests studied in the simulations. These are illustrated in Figure 8.24. The different motifs that each guest adopts are shown in Table 8.11. Here is a summary of the main characteristics of each motif.


Table 8.10: Summary of the Main Binding Properties For Each Guest. The Presence of a Space Indicates no Special Behaviour.
l-Phe cis l-Phe trans l-Ala cis l-Phe trans Gly cis Gly trans d-Ala cis d-Ala trans d-Phe cis d-Phe trans

Tipping (V = vertical, F = Forward, B = Backward): F V F V V,B B B B B


l-Phe cis l-Phe trans l-Ala cis l-Phe trans Gly cis Gly trans d-Ala cis d-Ala trans d-Phe cis

B
d-Phe trans

CarbonylAmide Hydrogen Bond: * * * * CarboxylateAmide Hydrogen Bond, BOTH SIDES: * * * * Polar Hydrogen Repulsion: * * *
l-Phe cis l-Phe trans l-Ala cis l-Phe trans Gly cis Gly trans d-Ala cis d-Ala trans d-Phe cis d-Phe trans

Side Group Clash: * * * * Methyl Group Clash: * * * * Oxygen Clash (Including Hydrogen Bonds): * * *
l-Phe cis l-Phe trans l-Ala cis l-Phe trans Gly cis Gly trans d-Ala cis

*
d-Ala trans d-Phe cis

*
d-Phe trans
a

Internal Energy (L = Low, H = High): H H L L L Binding Energy: L L L L H H H Solvation Energy: H H H L L


l-Phe cis l-Phe trans l-Ala cis l-Phe trans Gly cis Gly trans d-Ala cis d-Ala trans

H H L
d-Phe cis d-Phe trans

Unusual Cavity Width (W = Wide, N = Narrow): N N W Unusual Cavity Depth (D = Deep, S = Shallow): S S D D S Number of Host Conformations (F = Few, M = Many): F F M M Shifting of Thiourea: * * * * Swing Conformation: g g+ t
a

W S

* t

Low energies are the most negative.


Table 8.11: The Number of Times a Given Binding Motif Occurs For Each Guest. The Dominant Motif is Shown in Bold. Motif 1 2 3 4 l-Phe cis trans 94 6 86 14 l-Ala cis trans 60 35 71 5 29 Gly cis trans 7 25 68 28 72 d-Ala cis trans 27 73 d-Phe cis trans

100

100

100

Motif 1 has the guest tipping forwards moderately with the carbonyl oxygen hydrogen bonding to one or possibly two amide hydrogens of the host. One oxygen of the carboxylate forms a double hydrogen bond to the thiourea, while the other forms a double hydrogen bond to the amide hydrogens on the opposite side to the polar hydrogen of the guest. The side chain typically lies over the benzamide. Such a bonding pattern is indicative of l-cis guests.

Motif 2 has both the carboxylate oxygens hydrogen bonding to at least one amide hydrogen on either side of the cavity, and usually one or two thiourea hydrogens. Guests in this motif are fairly upright and are unable to form carbonyl-amide hydrogen bonds but are able to place their side chains over the benzamide unit so as to prevent tipping. Such a bonding pattern is found for l-trans guests.

Motif 3 has the guest tipping backwards such that the side chain descends partially into the cavity. One carboxylate oxygen forms a double hydrogen bond to the thiourea while the other forms a double hydrogen bond to the amide group on the opposite side to the guest polar hydrogen. This motif is accessible to most guests but it is characteristic of those that are unable to hydrogen bond to the host amides and that are sterically confined. This is particularly so for the d isomers.

Motif 4 has the guest tipped backwards completely on its side with the carbonyl oxygen forming one or two hydrogen bonds to the amide hydrogens on the opposite side of the cavity to the guest polar hydrogen. This motif appears to have fairly stringent requirements that only trans-Gly can satisfy.

These motifs are fairly well defined and rarely overlap. Any borderline cases, of which there were only one or two, were assigned to Motif 3. It can be seen that each guest has a preferred binding motif, but some can adopt a second or third binding motif. This is particularly the case for l guests. That some guests adopt more than one motif demonstrates the finely balanced energetics. Indeed, with the exception of l-cis-Phe, Motif 3 appears accessible to all guests, suggesting that this mode is rather non-selective. It is the presence of the other lower energy modes that brings about the host selectivity. Table 8.11 shows the general trend, going from right to left, that as the d side chain gets smaller and the l side chain gets larger, the preferred motif goes from 3 to 1. This is primarily due to the different heights of the host aryl units at each end of the cavity, particularly the ones near the side chain. The anomalous guest is Gly, which lacks a side chain. This allows trans-Gly to adopt Motif 4 all by itself, while cis-Gly surprisingly adopts Motif 3 most of the time rather than Motif 1 with the additional hydrogen bonds. This fact raises the suspicion that the presence of a correctly placed side chain is critical in stabilising the cis conformation. Alternatively, it is the ability of the host to hold the guest in the correct orientation using the other hydrogen bonds and careful side group positioning that forces the guest amide to be more stable in the cis conformation.

From this analysis, it appears fairly clear that guests that bind in Motif 3 are the worst binders. This would explain the stereoselectivity of the host. On the question of cis amide stabilisation, the V model, which has been shown to be fairly useful in interpreting the results, suggests that the cis conformation is more stable in the host than the trans.
Both l-trans and l-cis structures appear to have a number of preferable interactions such as many hydrogen bonds and minimal steric clash, indicating that these conformations may be comparable in stability to each other. However, a telling feature as to which is more stable lies in the motifs that l-cis-Ala adopts. Table 8.11 reveals that l-cis-Ala is able to sample both Motifs 1 and 2, yet it prefers 1. Assuming that l-trans-Ala and l-cis-Ala have similar properties in


Motif 2, it must be concluded that Motif 1 is more stable than Motif 2. In other words, the cis conformation is more stable than trans. The assumption made for this is reasonable because Motif 2 is characterised by both ends of the amide group being well placed in space. On the question of whether trans-Gly or cis-Gly is more stable, trans-Gly would appear to be more stable since it prefers to adopt Motif 4 with up to six hydrogen bonds. However, the fact that cis-Gly usually chooses to form no more than four hydrogen bonds suggests that there is something to be gained from being mobile and possibly allowing the host to be flexible too.

On the question of selectivity between different amino acid derivatives, it is difficult to predict which binds better, the d guests aside. Larger guests would be expected to have stronger binding energies, but at the same time there may be some reduced freedom and strain for the host and the guest. Given that l-Phe and l-Ala still seem to be able to fit well in the cavity, there is reason to believe that they bind the strongest simply by virtue of their larger size. On the other hand, the side chain only seems to play a steric role in binding and so all side chains may be expected to behave similarly. In any case, all that may be concluded is that no large difference in binding free energy would be expected. Given that the focus of this study was more on the stereochemical and conformational selectivity, this issue was not fully resolved.

8.7.2 Connection Between Binding Free Energies and Motifs.

The basis of rationalisation in this analysis has been shape and energetics. A large amount of information may be obtained and interpreted from such an analysis. However, in terms of using the analysis to predict actual free energies, there are two principal problems. The first is that many effects are qualitative and competing and it is hard to determine which one will dominate. The second problem is that real binding depends on free energies, which include entropy as well as energy. Entropy is much harder to model, apart from qualitative arguments about changes in host flexibility and guest mobility. Apart from a real experiment itself, the only way to determine relative binding free energies is to perform absolute free energy calculations, as has been done in Chapter 5. At this point in this work, both relative free energies and an analysis have been performed, so the ultimate test is to compare how they match up. Since they have both been performed using the same simulation protocol and for binding inside the cavity, their results may be directly compared.

Gly is able to bind the best in either conformation with the greatest number of hydrogen bonds and the most comfortable fit, particularly for trans-Gly. This comes at a small price of straining the host to fully accommodate the guest. The apparent better binding for trans-Gly does not seem to carry across to a better relative binding free energy, presumably due to concomitant entropic restrictions. There is little benefit in adopting the cis amide bond since cis-Gly loses too much in the way of good contacts if it tries to hydrogen bond to the host amide.

On the question of stereoselectivity, the free energies are in strong agreement with the analysis. The inability of d guests to bind properly to the host appears to cost them around 2 kcal mol⁻¹ of free energy compared to l for all enantiomer pairs, with the exception of l-trans-Ala and d-trans-Ala, for which the difference is only around 1 kcal mol⁻¹. This observation is consistent with the V model, which only predicts one bad clash for each.

The cis amide stabilisation free energies also correlate well with the analysis. For Gly, there appears to be no stabilisation of the cis conformation. Evidently, the greater ability of the trans conformation to form more hydrogen bonds comes at the price of loss of mobility and host flexibility. Considering all the other guests, the stabilisation is particularly strong at around 2 kcal mol⁻¹ for l-Phe, moderately strong at 1 kcal mol⁻¹ for l-Ala, absent for d-Ala and moderate again at 1 kcal mol⁻¹ for d-Phe. These results again agree with the analysis.
The carbonyl oxygen-amide hydrogen bond stabilises the cis conformation for both l guests, and the phenyl swing dihedral stabilises d-Phe. The strongest stabilisation, that for l-Phe, seems to be due to the fact that it experiences stabilisation from both effects. The analysis was not able to predict any significant relative binding between different guests. The free energies, on the other hand, predict that l-Phe binds the


strongest, followed by l-Ala and Gly, while there is little difference between d guests. The analysis, being inconclusive, is not inconsistent with this ordering predicted by the free energies. The results predicted by both the free energy calculations and the analysis agree with experiment on the key questions of conformational stabilisation and enantioselectivity for binding inside the cavity. The analysis was not able to fully elucidate relative binding constants for different guests. This is because experimental relative binding free energies were obtained for guests binding in both the non-selective outside position and the selective inside position, and so a direct comparison with experiment is not possible.
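To give a feel for the free energy differences quoted in this section, a difference in binding free energy can be converted into a ratio of binding constants through K1/K2 = exp(ΔΔG/RT). The Python sketch below carries out this conversion. The temperature of 298.15 K is an assumption made purely for illustration and is not taken from the simulation protocol, and the function name is hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1

def binding_ratio(ddg_kcal, temperature=298.15):
    """Ratio of binding constants for a free energy difference in kcal/mol.

    The default temperature is an assumed value for illustration.
    """
    return math.exp(ddg_kcal / (R_KCAL * temperature))

ratio_2 = binding_ratio(2.0)  # e.g. the l versus d difference of about 2 kcal/mol
ratio_1 = binding_ratio(1.0)  # e.g. the l-trans-Ala versus d-trans-Ala difference
print(f"2 kcal/mol -> {ratio_2:.0f}-fold, 1 kcal/mol -> {ratio_1:.1f}-fold")
```

At this assumed temperature a difference of 2 kcal mol⁻¹ corresponds to a preference of a little under thirty-fold, and 1 kcal mol⁻¹ to a little over five-fold.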

8.8 Conclusion.

A detailed analysis of the host-guest structures has been presented. Each complex exhibits a range of interesting features including orientation, hydrogen bonds, steric contacts and conformations. For all guests the binding modes could be classified into four motifs. All guests are able to adopt more than one of these motifs. The analysis was able to explain that the enantioselectivity arose mainly from a badly positioned side chain. The stabilisation of the cis amide conformation was attributed to two factors. One was the formation of a well-placed carbonyl oxygen-amide hydrogen bond. The other was an internal stabilisation of Phe due to the stabilisation of the swing dihedral by a remote steric interaction with the host. Overall, this analysis has fulfilled its objectives and made possible the interpretation of the binding in terms of the fundamental structure of each guest.

Chapter 9 Conclusion
A detailed computer simulation of the binding of amino acid derivatives to macrobicycle 12 has been presented. The primary motivation for this study was to understand the novel binding behaviour that had been observed in experiment. This behaviour was the enantioselectivity and the stabilisation of the cis amide conformation on binding inside the host cavity. A number of steps were required to achieve this goal. Firstly, a range of computer simulation methods were assessed to determine the most suitable method for this system. The decision taken was to calculate free energies using FEP, generating equilibrium configurations using Monte Carlo on a system modelled with the OPLS-AA force field. Secondly, a simulation protocol was developed to model the system. Geometry, dihedral and charge parameters that were missing from the OPLS-AA force field were derived. A number of improvements were made to the simulation program to improve both sampling and speed. Thirdly, REPD, a new charge parameterisation method, was developed to produce OPLS-like charges by fitting to the molecular electrostatic potential. Fourthly, the use of REPD charges was validated by calculating their free energies of hydration using the FEP and LIE free energy methods. This work highlighted a number of caveats in using the LIE method. Fifthly, a number of more sophisticated MC moves and the GB/SA continuum solvent method were implemented into the simulation protocol to improve sampling of the macrobicycle 12 system. Sixthly, FEP free energy calculations were performed to calculate the relative binding free energies of amino acid derivatives to the host. Finally, a


structural and energetic analysis of the macrobicycle 12 system was performed in order to understand the origins of the relative binding free energies. Each of these steps is dealt with in its own chapter. However, it is now worthwhile to comment on the overall findings of this study and its usefulness for the future.

On the methodological side, a simulation protocol has been developed to perform MC simulations on highly constrained cyclic systems. The success of this procedure hinged on replacing an explicit solvent model with a continuum solvent model. Continuum solvent models have their advantages. They are faster and yield better sampling than explicit solvent. However, the explicit nature of solvent is important to many binding phenomena. Therefore, it would be desirable for a method to be developed that allowed large structural change in explicit solvent using reasonable simulation lengths.

The end result of this study was that the relative binding free energies calculated agreed well with the experimental findings. A detailed explanation for the binding behaviour was also proposed, based largely on a model termed the V binding model. The usefulness of computer simulations was demonstrated, not only in reproducing behaviour but also in providing a means to directly understand the system's properties. This analysis led to a number of interesting insights into the binding process.

The first insight is the remarkable way the host and guest interact. Contrary to most host-guest systems, the host when alone adopts quite a different structure to that in the host-guest complex. Evidently, the guest is able to force a large change in host structure to bind inside the cavity. Furthermore, different guests appear to make the host take on slightly different structures. The situation is further complicated by the possibility of binding also occurring outside the cavity. This aspect concerning the location of binding, while not examined in this work, would also be of great interest.
However, computationally, it would be a significantly more demanding task, most likely requiring full free energy of binding calculations, either involving a potential of mean force or double decoupling. As was found in this study, there would be even greater sampling problems.


The second insight is that even for this relatively simple host-guest system, the binding process inside the cavity is still rather complex. The different binding motifs observed arise due to a number of competing effects. Even having established these effects, there is still considerable difficulty in predicting which one dominates. Free energy calculations may provide a direct answer to this problem, although a more qualitative and informative technique may be to physically alter parts of the host and guest and observe what effect that has on binding.

One issue that was not adequately resolved in this study was the selective binding of different amino acid derivatives. Part of the problem lay in the fact that experimental binding constants were not available for binding solely inside the cavity. A proper study of the effect of the side chain would require the testing of many more amino acid derivatives and calculation of the relative free energies of binding to the whole host, both inside and outside the cavity. In any case, indications from experiment and simulation are that the host is not very selective. Nevertheless, having established that the side chain lies on top of the benzamide ring for l guests, it may be possible to alter this part of the host to create some side chain selectivity.

On the issue of selectivity, four different binding motifs were identified by which the guests bound. All guests appeared to be able to bind in Motif 3. However, selectivity arose because some guests were able to adopt other motifs even lower in energy. One possible approach to further enhance the host selectivity may be to alter the host in some way so as to destabilise Motif 3 while leaving the others unchanged. Thus only guests that bound in the other motifs would be able to bind. Differences in the structures observed for different motifs may provide clues as to what alterations may be appropriate.
This approach may be extended to stabilise only one particular motif, or it may be used to develop new motifs.

Stabilisation of the cis amide conformation is quite an achievement. Two means to achieve this have been demonstrated, either by use of a carefully placed hydrogen bond, or by forcing the side chain into a higher energy conformation. However, catalysis of cis-trans isomerism itself in the manner of protein rotamases would be a major


advance. This issue was not examined at all in this study and there are no obvious indications that the macrobicycle 12 system is able to stabilise the transition state for such a process. Nevertheless, the study has shed light on how unstable structures may be stabilised. This knowledge may prove useful in moving closer to this goal.

With ever increasing computer power, the scope of systems accessible to computer simulations is rapidly expanding. Many simulations are now being performed on large protein-ligand systems that are pushing computers to their limits. The further simulations move into areas inaccessible by experiment, the greater the care required to ensure that sound methods are being applied. In addition, interpreting the observed behaviour in such complex systems becomes ever more difficult. For the next few years at least, an important bridge between these large scale simulations and experiment, and a reliable means of testing simulation development, will remain the simulation of host-guest systems.

Appendix A Charges
Table A.1: Listing of OPLS,25 EPD/6-31G*, REPD/6-31G*, EPD/6-31+G* and REPD/6-31+G* Charges for All 29 Molecules Used in the Parameterization of a.

Molecule                 Atom   q OPLS   q EPD     q REPD    q EPD      q REPD
                                         6-31G*    6-31G*    6-31+G*    6-31+G*
methane (CH4)            C      -0.240   -0.484    -0.300    -0.484     -0.264
                         H       0.060    0.121     0.075     0.121      0.066
ethene (C2H4)            C      -0.230   -0.342    -0.328    -0.378     -0.356
                         H       0.115    0.171     0.164     0.189      0.178
water (H2O)              O      -0.834   -0.818    -0.806    -0.862     -0.844
                         H       0.417    0.409     0.403     0.431      0.422
methanol (CH3OH)         C       0.145    0.358     0.225     0.436      0.238
                         O      -0.683   -0.708    -0.664    -0.765     -0.702
                         HO      0.418    0.431     0.418     0.452      0.434
                         HC      0.040   -0.027     0.007    -0.041      0.010
ethanol (CH3CH2OH)       CH2     0.145    0.493     0.282     0.559      0.277
                         CH3    -0.180   -0.253    -0.126    -0.259     -0.109
                         OH     -0.683   -0.705    -0.648    -0.767     -0.687
                         HO      0.418    0.416     0.404     0.440      0.422
                         H2C     0.060   -0.076    -0.019    -0.087     -0.010
                         H3C     0.060    0.067     0.042     0.067      0.039
methanethiol (CH3SH)     C       0.060   -0.199    -0.131    -0.173     -0.102
                         S      -0.435   -0.344    -0.352    -0.357     -0.364
                         HS      0.255    0.210     0.207     0.215      0.211
                         HC      0.040    0.111     0.092     0.105      0.085
ethanethiol (CH3CH2SH)   CH2     0.060    0.050     0.030     0.071      0.042
                         CH3    -0.180   -0.134    -0.072    -0.131     -0.059
                         SH     -0.435   -0.380    -0.376    -0.393     -0.386
                         HS      0.255    0.201     0.200     0.206      0.203
                         H2C     0.060    0.040     0.043     0.035      0.040
                         H3C     0.060    0.061     0.044     0.059      0.000

Molecule / Atom   q OPLS   q EPD    q REPD   q EPD    q REPD
                           6-31G*   6-31G*   6-31+G*  6-31+G*
formaldehyde (CH2O)
  CO               0.450    0.444    0.390    0.475    0.405
  O               -0.450   -0.462   -0.442   -0.495   -0.471
  H                0.000    0.009    0.026    0.010    0.033
acetaldehyde (CH3CHO)
  C                0.450    0.628    0.498    0.669    0.498
  O               -0.450   -0.532   -0.496   -0.571   -0.523
  CH3             -0.180   -0.379   -0.154   -0.416   -0.127
  HC               0.000   -0.041   -0.010   -0.039    0.002
  H3C              0.060    0.108    0.054    0.119    0.050
acetone ((CH3)2CO)
  C                0.470    0.830    0.590    0.870    0.581
  O               -0.470   -0.586   -0.536   -0.628   -0.565
  CH3             -0.180   -0.521   -0.186   -0.517   -0.128
  H                0.060    0.133    0.053    0.132    0.040
acetic acid (CH3COOH)
  C                0.520    0.886    0.711    0.927    0.706
  OH              -0.530   -0.679   -0.629   -0.708   -0.640
  OC              -0.440   -0.622   -0.569   -0.655   -0.585
  CH3             -0.180   -0.459   -0.165   -0.446   -0.106
  HO               0.450    0.454    0.445    0.468    0.454
  H3C              0.060    0.140    0.069    0.138    0.057
methyl acetate (CH3COOCH3)
  C                0.510    1.006    0.771    1.060   -0.410
  OC2             -0.330   -0.479   -0.412   -0.503   -0.599
  OC              -0.430   -0.641   -0.583   -0.676   -0.155
  CH3C            -0.180   -0.720   -0.232   -0.737   -0.029
  CH3O             0.160    0.060   -0.012    0.067    0.053
  H3CC             0.060    0.192    0.071    0.196    0.092
  H3CO             0.030    0.066    0.085    0.758    0.000
ammonia (NH3)
  N               -1.026   -1.104   -1.068   -1.134   -1.080
  H                0.342    0.368    0.356    0.378    0.360
methylamine (CH3NH2)
  C                0.020    0.461    0.244    0.526    0.231
  N               -0.900   -1.050   -0.951   -1.113   -0.973
  HN               0.350    0.380    0.358    0.397    0.365
  HC               0.060   -0.057   -0.003   -0.069    0.004
ethylamine (CH3CH2NH2)
  CH3             -0.180   -0.298   -0.135   -0.280   -0.101
  CH2              0.080    0.652    0.362    0.704    0.333
  N               -0.900   -1.081   -0.957   -1.149   -0.975
  H3C              0.060    0.063    0.032    0.057    0.025
  H2C              0.060   -0.110   -0.035   -0.121   -0.024
  H2N              0.350    0.379    0.352    0.398    0.358
formamide (HCONH2)
  C                0.500    0.741    0.461    0.802    0.433
  O               -0.500   -0.597   -0.520   -0.638   -0.537
  N               -0.760   -1.007   -0.674   -1.080   -0.643
  HC               0.000   -0.003    0.070    0.001    0.096
  HN (cis)         0.380    0.450    0.351    0.479    0.350
  HN (trans)       0.380    0.416    0.312    0.436    0.301

Molecule / Atom   q OPLS   q EPD    q REPD   q EPD    q REPD
                           6-31G*   6-31G*   6-31+G*  6-31+G*
acetamide (CH3CONH2)
  C                0.500    0.978    0.586    1.063    0.528
  O               -0.500   -0.649   -0.555   -0.700   -0.570
  N               -0.760   -1.099   -0.745   -1.205   -0.666
  CH3             -0.180   -0.559   -0.086   -0.558   -0.011
  H2N (cis)        0.380    0.447    0.350    0.485    0.331
  H2N (trans)      0.380    0.432    0.324    0.465    0.301
  H3C              0.060    0.150    0.042    0.150    0.029
trans-N-methylacetamide (CH3CONHCH3)
  C                0.500    0.804    0.486    0.858    0.456
  O               -0.500   -0.609   -0.535   -0.644   -0.551
  CH3C            -0.180   -0.462   -0.057   -0.531   -0.028
  N               -0.500   -0.539   -0.332   -0.558   -0.306
  CH3N             0.020   -0.365   -0.296   -0.406   -0.294
  H3CC             0.060    0.121    0.030    0.140    0.027
  HN               0.300    0.331    0.257    0.339    0.249
  H3CN             0.060    0.159    0.129    0.174    0.131
dimethyl ether ((CH3)2O)
  C                0.110    0.026    0.014    0.018    0.003
  O               -0.400   -0.364   -0.358   -0.378   -0.372
  H                0.030    0.052    0.055    0.057    0.061
diethyl ether ((CH3CH2)2O)
  CH3             -0.180   -0.315   -0.134   -0.317   -0.107
  CH2              0.140    0.464    0.207    0.490    0.180
  O               -0.400   -0.508   -0.388   -0.542   -0.396
  H3C              0.060    0.077    0.039    0.076    0.033
  H2C              0.030   -0.063    0.002   -0.065    0.013
dimethyl sulfide ((CH3)2S)
  C                0.037   -0.207   -0.112   -0.185   -0.080
  S               -0.435   -0.222   -0.256   -0.236   -0.272
  H                0.060    0.106    0.080    0.101    0.072
diethyl sulfide ((CH3CH2)2S)
  CH3             -0.180   -0.208   -0.096   -0.188   -0.067
  CH2              0.098    0.180    0.104    0.203    0.102
  S               -0.435   -0.354   -0.332   -0.374   -0.342
  H3C              0.060    0.065    0.038    0.058    0.030
  H2C              0.060    0.005    0.022   -0.001    0.023
chloroethane (CH3CH2Cl)
  CH2             -0.006    0.012    0.019    0.024    0.028
  CH3             -0.180   -0.104   -0.060   -0.100   -0.047
  Cl              -0.200   -0.231   -0.233   -0.239   -0.241
  H2C              0.103    0.085    0.080    0.084    0.079
  H3C              0.060    0.051    0.038    0.049    0.034
benzene (C6H6)
  C               -0.115   -0.133   -0.132   -0.149   -0.146
  H                0.115    0.133    0.132    0.149    0.146
phenol (C6H5OH)
  C1               0.150    0.543    0.250    0.598    0.246
  O               -0.585   -0.669   -0.582   -0.713   -0.600
  C2              -0.115   -0.387   -0.145   -0.401   -0.125
  C3              -0.115   -0.036   -0.192   -0.044   -0.206
  C4              -0.115   -0.237   -0.103   -0.242   -0.107

Molecule / Atom   q OPLS   q EPD    q REPD   q EPD    q REPD
                           6-31G*   6-31G*   6-31+G*  6-31+G*
phenol (C6H5OH), continued
  H2               0.115    0.180    0.131    0.186    0.129
  H3               0.115    0.129    0.149    0.133    0.152
  H4               0.115    0.142    0.122    0.146    0.128
  HO               0.435    0.449    0.427    0.463    0.433
aniline (C6H5NH2)
  C1               0.100    0.802    0.192    0.959    0.181
  N               -0.900   -1.263   -0.766   -1.439   -0.776
  C2              -0.115   -0.478   -0.137   -0.528   -0.117
  C3              -0.115   -0.016   -0.192   -0.014   -0.203
  C4              -0.115   -0.292   -0.107   -0.312   -0.111
  H2               0.115    0.182    0.122    0.192    0.122
  H3               0.115    0.130    0.142    0.136    0.143
  H4               0.115    0.149    0.115    0.156    0.118
  H2N              0.400    0.484    0.348    0.532    0.349
chlorobenzene (C6H5Cl)
  C1               0.180    0.008    0.016    0.044    0.037
  Cl              -0.180   -0.139   -0.136   -0.150   -0.142
  C2              -0.115   -0.023   -0.049   -0.033   -0.051
  C3              -0.115   -0.205   -0.156   -0.213   -0.162
  C4              -0.115   -0.070   -0.118   -0.073   -0.124
  H2               0.115    0.115    0.119    0.117    0.119
  H3               0.115    0.151    0.138    0.155    0.140
  H4               0.115    0.125    0.134    0.127    0.137
benzonitrile (C6H5CN)
  C1               0.035    0.101    0.067    0.146    0.088
  CN               0.395    0.302    0.289    0.297    0.287
  C2              -0.115   -0.171   -0.128   -0.190   -0.128
  C3              -0.115   -0.133   -0.155   -0.139   -0.166
  C4              -0.115   -0.081   -0.072   -0.077   -0.070
  H2               0.115    0.167    0.157    0.173    0.159
  H3               0.115    0.140    0.141    0.143    0.144
  H4               0.115    0.130    0.132    0.130    0.134
  N               -0.430   -0.458   -0.446   -0.470   -0.457
benzoic acid (C6H5COOH)
  C1              -0.115    0.047    0.077    0.121    0.121
  CO2              0.635    0.710    0.564    0.697    0.524
  C2              -0.115   -0.157   -0.122   -0.188   -0.119
  C3              -0.115   -0.146   -0.157   -0.155   -0.173
  C4              -0.115   -0.083   -0.105   -0.073   -0.103
  H2               0.115    0.155    0.129    0.166    0.126
  H3               0.115    0.137    0.142    0.142    0.148
  H4               0.115    0.125    0.136    0.123    0.138
  OC              -0.440   -0.587   -0.527   -0.603   -0.528
  OH              -0.530   -0.664   -0.565   -0.684   -0.551
  HO               0.450    0.474    0.436    0.489    0.435

Bibliography
1. W. L. Jorgensen, Chemtracts-Org. Chem., 4, 91, (1991).
2. P. A. Kollman, Acc. Chem. Res., 29, 461, (1996).
3. M. K. Gilson, J. A. Given, B. L. Bush and J. A. McCammon, Biophys. J., 72, 1047, (1997).
4. M. L. Lamb and W. L. Jorgensen, Curr. Opin. Chem. Biol., 1, 449, (1997).
5. J. A. McCammon, Curr. Opin. Struct. Biol., 8, 245, (1998).
6. G. J. Pernia, J. D. Kilburn, J. W. Essex, R. J. Mortishire-Smith and M. Rowley, J. Am. Chem. Soc., 118, 10220, (1996).
7. D. E. Stewart, A. Sarkar and J. E. Wampler, J. Mol. Biol., 214, 253, (1990).
8. P. S. Li, X. G. Chen, E. Shulin and S. A. Asher, J. Am. Chem. Soc., 119, 1116, (1997).
9. A. Jabs, M. S. Weiss and R. Hilgenfeld, J. Mol. Biol., 286, 291, (1999).
10. G. Scherer, M. L. Kramer, M. Schutkowski, U. Reimer and G. Fischer, J. Am. Chem. Soc., 120, 5568, (1998).
11. S. F. Gothel and M. A. Marahiel, Cell. Mol. Life Sci., 55, 423, (1999).
12. S. L. Schreiber, Science, 251, 283, (1991).
13. M. H. G. Wu and M. W. Deem, J. Chem. Phys., 111, 6625, (1999).
14. M. Orozco, J. Tirado-Rives and W. L. Jorgensen, Biochemistry, 32, 12864, (1993).
15. X. X. Zhang, J. S. Bradshaw and R. M. Izatt, Chem. Rev., 97, 3313, (1997).
16. W. L. Jorgensen and J. Tirado-Rives, J. Phys. Chem., 100, 14508, (1996).
17. R. H. Henchman and J. W. Essex, J. Comput. Chem., 20, 483, (1999).
18. R. H. Henchman and J. W. Essex, J. Comput. Chem., 20, 499, (1999).
19. I. D. Wall, R. H. Henchman, A. R. Leach, D. W. Salt, M. G. Ford and J. W. Essex, J. Comput. Chem., submitted for publication.
20. M. P. Allen and D. J. Tildesley, Computer Simulation of Liquids, Clarendon, (1987).
21. A. R. Leach, Molecular Modelling. Principles and Applications, Longman, (1996).
22. D. Frenkel and B. Smit, Understanding Molecular Simulation, Academic Press, (1996).
23. F. Jensen, Introduction to Computational Chemistry, Wiley, (1999).
24. N. L. Allinger, X. F. Zhou and J. Bergsma, Theochem, 118, 69, (1994).
25. W. L. Jorgensen, D. S. Maxwell and J. Tirado-Rives, J. Am. Chem. Soc., 118, 11225, (1996).
26. W. D. Cornell, P. Cieplak, C. I. Bayly, I. R. Gould, K. M. Merz, D. M. Ferguson, D. C. Spellmeyer, T. Fox, J. W. Caldwell and P. A. Kollman, J. Am. Chem. Soc., 117, 5179, (1995).
27. W. R. P. Scott, P. H. Hunenberger, I. G. Tironi, A. E. Mark, S. R. Billeter, J. Fennen, A. E. Torda, T. Huber, P. Kruger and W. F. van Gunsteren, J. Phys. Chem. A, 103, 3596, (1999).
28. B. R. Brooks, R. E. Bruccoleri, B. D. Olafson, D. J. States, S. Swaminathan and M. Karplus, J. Comput. Chem., 4, 187, (1983).
29. L. Nilsson and M. Karplus, J. Comput. Chem., 7, 591, (1986).
30. A. D. Mackerell, D. Bashford, M. Bellott, R. L. Dunbrack, J. D. Evanseck, M. J. Field, S. Fischer, J. Gao, H. Guo, S. Ha, D. Joseph-McCarthy, L. Kuchnir, K. Kuczera, F. T. K. Lau, C. Mattos, S. Michnick, T. Ngo, D. T. Nguyen, B. Prodhom, W. E. Reiher, B. Roux, M. Schlenkrich, J. C. Smith, R. Stote, J. Straub, M. Watanabe, J. Wiorkiewiczkuczera, D. Yin and M. Karplus, J. Phys. Chem. B, 102, 3586, (1998).
31. W. L. Jorgensen, BOSS, Version 3.6, Yale University, New Haven, CT, (1995).
32. W. L. Jorgensen, MCPRO, Version 1.5, Yale University, New Haven, CT, (1997).
33. W. C. Still, A. Tempczyk, R. C. Hawley and T. Hendrickson, J. Am. Chem. Soc., 112, 6127, (1990).
34. J. Warwicker and H. C. Watson, J. Mol. Biol., 157, 671, (1982).
35. P. Ewald, Ann. Phys., 64, 253, (1921).
36. H. G. Petersen, J. Chem. Phys., 103, 3668, (1995).
37. H. L. Friedman, Mol. Phys., 29, 1533-1543, (1975).
38. L. Greengard and V. Rokhlin, J. Comput. Phys., 73, 325, (1987).
39. N. Metropolis, A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller and E. Teller, J. Chem. Phys., 21, 1087, (1953).
40. J. P. Ryckaert, G. Ciccotti and H. J. C. Berendsen, J. Comput. Phys., 23, 327, (1977).
41. D. L. Beveridge and F. M. Dicapua, Ann. Rev. Biophys. Biophys. Chem., 18, 431, (1989).
42. T. P. Straatsma and J. A. McCammon, Ann. Rev. Phys. Chem., 43, 407, (1992).
43. C. A. Reynolds, P. M. King and W. G. Richards, Mol. Phys., 76, 251, (1992).
44. P. Kollman, Chem. Rev., 93, 2395, (1993).
45. R. M. Levy and E. Gallicchio, Ann. Rev. Phys. Chem., 49, 531, (1998).
46. R. W. Zwanzig, J. Chem. Phys., 22, 1420, (1954).
47. S. Boresch and M. Karplus, J. Phys. Chem. A, 103, 103, (1999).
48. W. L. Jorgensen and C. Ravimohan, J. Chem. Phys., 83, 3050, (1985).
49. M. Zacharias, T. P. Straatsma and J. A. McCammon, J. Chem. Phys., 100, 9025, (1994).
50. D. A. Kofke and P. T. Cummings, Mol. Phys., 92, 973, (1997).
51. J. G. Kirkwood, J. Chem. Phys., 3, 300, (1935).
52. T. P. Straatsma and J. A. McCammon, J. Chem. Phys., 95, 1175, (1991).
53. T. P. Straatsma, H. J. C. Berendsen and J. P. M. Postma, J. Chem. Phys., 85, 6720, (1986).
54. M. J. Mitchell and J. A. McCammon, J. Comput. Chem., 12, 271, (1991).
55. D. A. Pearlman, J. Comput. Chem., 15, 105, (1994).
56. M. R. Reddy and M. D. Erion, J. Comput. Chem., 20, 1018, (1999).
57. P. E. Smith and W. F. van Gunsteren, J. Chem. Phys., 100, 577, (1994).
58. A. E. Mark, Y. W. Xu, H. Y. Liu and W. F. van Gunsteren, Acta Biochimica Polonica, 42, 525, (1995).
59. B. Tidor, J. Phys. Chem., 97, 1069, (1993).
60. X. J. Kong and C. L. Brooks, J. Chem. Phys., 105, 2414, (1996).
61. C. Jarque and B. Tidor, J. Phys. Chem. B, 101, 9362, (1997).
62. J. Pitera and P. Kollman, J. Am. Chem. Soc., 120, 7557, (1998).
63. M. A. L. Eriksson, J. Pitera and P. A. Kollman, J. Med. Chem., 42, 868, (1999).
64. H. Senderowitz, D. Q. McDonald and W. C. Still, J. Org. Chem., 62, 9123, (1997).
65. J. Aqvist, C. Medina and J. E. Samuelsson, Protein Eng., 7, 385, (1994).
66. X. Chen and A. Tropsha, J. Comput. Chem., 20, 749, (1999).
67. J. Novotny, R. E. Bruccoleri, M. Davis and K. A. Sharp, J. Mol. Biol., 268, 401, (1997).
68. R. S. DeWitte and E. I. Shakhnovich, J. Am. Chem. Soc., 118, 11733, (1996).
69. I. Kolossvary, J. Phys. Chem. A, 101, 9900, (1997).
70. I. Kolossvary, J. Am. Chem. Soc., 119, 10233, (1997).
71. A. E. Mark and W. F. van Gunsteren, J. Mol. Biol., 240, 167, (1994).
72. B. L. Tembe and J. A. McCammon, Comp. Chem., 8, 281, (1984).
73. W. L. Jorgensen, Acc. Chem. Res., 22, 184, (1989).
74. T. C. Beutler, A. E. Mark, R. C. Vanschaik, P. R. Gerber and W. F. van Gunsteren, Chem. Phys. Lett., 222, 529, (1994).
75. G. C. Boulougouris, I. G. Economou and D. N. Theodorou, Mol. Phys., 96, 905, (1999).
76. T. P. Lybrand, J. A. McCammon and G. Wipff, Proc. Natl. Acad. Sci. USA, 83, 833, (1986).
77. N. A. McDonald, E. M. Duffy and W. L. Jorgensen, J. Am. Chem. Soc., 120, 5104, (1998).
78. S. Miyamoto and P. A. Kollman, J. Am. Chem. Soc., 114, 3668, (1992).
79. S. J. Cho and P. A. Kollman, J. Org. Chem., 64, 5787, (1999).
80. T. Z. M. Denti, W. F. van Gunsteren and F. Diederich, J. Am. Chem. Soc., 118, 6044, (1996).
81. E. M. Duffy and W. L. Jorgensen, J. Am. Chem. Soc., 116, 6337, (1994).
82. P. D. Kirchhoff, J. P. Dutasta, A. Collet and J. A. McCammon, J. Am. Chem. Soc., 121, 381, (1999).
83. M. J. Potter, P. D. Kirchhoff, H. A. Carlson and J. A. McCammon, J. Comput. Chem., 20, 956, (1999).
84. J. Costantecrassous, T. J. Marrone, J. M. Briggs, J. A. McCammon and A. Collet, J. Am. Chem. Soc., 119, 3818, (1997).
85. M. T. Burger, A. Armstrong, F. Guarnieri, D. Q. McDonald and W. C. Still, J. Am. Chem. Soc., 116, 3593, (1994).
86. M. A. L. Eriksson, P. Y. Morgantini and P. A. Kollman, J. Phys. Chem. B, 103, 4474, (1999).
87. A. E. Mark, S. P. van Helden, P. E. Smith, L. H. M. Janssen and W. F. van Gunsteren, J. Am. Chem. Soc., 116, 6293, (1994).
88. P. A. Bash, U. C. Singh, F. K. Brown, R. Langridge and P. A. Kollman, Science, 235, 574, (1987).
89. H. A. Carlson, T. B. Nguyen, M. Orozco and W. L. Jorgensen, J. Comput. Chem., 14, 1240, (1993).
90. P. Y. Morgantini and P. A. Kollman, J. Am. Chem. Soc., 117, 6057, (1995).
91. V. Helms and R. C. Wade, J. Comput. Chem., 18, 449, (1997).
92. J. Hine and P. K. Mookerjee, J. Org. Chem., 40, 292, (1975).
93. S. Cabani, P. Gianni, V. Mollica and L. Lepori, J. Sol. Chem., 10, 563, (1981).
94. A. Ben-Naim and Y. Marcus, J. Chem. Phys., 81, 2016, (1984).
95. C. J. Cramer and D. G. Truhlar, J. Comput.-Aided Mol. Design, 6, 629, (1992).
96. J. W. Essex, C. A. Reynolds and W. G. Richards, J. Chem. Soc., Chem. Commun., pp. 1152-1154, (1989).
97. W. L. Jorgensen, J. M. Briggs and M. L. Contreras, J. Phys. Chem., 94, 1683, (1990).
98. E. M. Duffy, D. L. Severance and W. L. Jorgensen, Isr. J. Chem., 33, 323, (1993).
99. S. J. Weiner, P. A. Kollman, D. A. Case, U. C. Singh, C. Ghio, G. Alagona, S. Profeta and P. Weiner, J. Am. Chem. Soc., 106, 765, (1984).
100. K. . Bertil Sandell, Naturwissenschaften, 13, 330, (1953).
101. W. L. Jorgensen, J. Chandrasekhar, J. D. Madura, R. W. Impey and M. L. Klein, J. Chem. Phys., 79, 926, (1983).
102. W. D. Kumler and G. M. Fohlen, J. Am. Chem. Soc., 64, 1944, (1942).
103. M. J. Frisch, G. W. Trucks, H. B. Schlegel, P. M. W. Gill, B. G. Johnson, M. A. Robb, J. R. Cheeseman, G. A. P. T. Keith, J. A. Montgomery, K. Raghavachari, M. A. Al-Laham, V. G. Zakrzewski, J. V. Ortiz, J. B. Foresman, J. Cioslowski, B. B. Stefanov, A. Nanayakkara, M. Challacombe, C. Y. Peng, P. Y. Ayala, W. Chen, M. W. Wong, J. L. Andres, E. S. Replogle, R. Gomperts, R. L. Martin, D. J. Fox, J. S. Binkley, D. J. Defrees, J. Baker, J. P. Stewart, M. Head-Gordon, C. Gonzalez, and J. A. Pople, GAUSSIAN 94, Gaussian, Inc., Pittsburgh PA, revision d.4 edition, (1995).
104. C. C. Chambers, E. F. Archibong, A. Jabalameli, R. H. Sullivan, D. J. Giesen, C. J. Cramer and D. G. Truhlar, Theochem, 425, 61, (1998).
105. W. L. Jorgensen, BOSS, Version 3.8, Yale University, New Haven, CT, (1997).
106. L. R. Dodd, T. D. Boone and D. N. Theodorou, Mol. Phys., 78, 961, (1993).
107. F. A. Momany, J. Phys. Chem., 82, 592, (1978).
108. S. R. Cox and D. E. Williams, J. Comput. Chem., 2, 304, (1981).
109. U. C. Singh and P. A. Kollman, J. Comput. Chem., 5, 129, (1984).
110. L. E. Chirlian and M. M. Francl, J. Comput. Chem., 8, 894, (1987).
111. M. M. Francl, C. Carey, L. E. Chirlian and D. M. Gange, J. Comput. Chem., 17, 367, (1996).
112. R. Soliva, M. Orozco and F. J. Luque, J. Comput. Chem., 18, 980, (1997).
113. D. E. Williams, J. Comput. Chem., 15, 719, (1994).
114. W. A. Sokalski, D. A. Keller, R. L. Ornstein and R. Rein, J. Comput. Chem., 14, 970, (1993).
115. J. G. Angyan and C. Chipot, Int. J. Quant. Chem., 52, 17, (1994).
116. A. Wallqvist and B. J. Berne, J. Phys. Chem., 97, 13841, (1993).
117. D. E. Williams, Biopolymers, 29, 1367, (1990).
118. T. R. Stouch and D. E. Williams, J. Comput. Chem., 13, 622, (1992).
119. T. R. Stouch and D. E. Williams, J. Comput. Chem., 14, 858, (1993).
120. C. A. Reynolds, J. W. Essex and W. G. Richards, J. Am. Chem. Soc., 114, 9075, (1992).
121. W. D. Cornell, P. Cieplak, C. I. Bayly and P. A. Kollman, J. Am. Chem. Soc., 115, 9620, (1993).
122. C. M. Breneman and K. B. Wiberg, J. Comput. Chem., 11, 361, (1990).
123. F. Colonna and E. M. Evleth, Chem. Phys. Lett., 212, 665, (1993).
124. Z. W. Su and P. Coppens, Naturforsch, 48a, 85, (1993).
125. A. K. Rappe and W. A. Goddard, J. Phys. Chem., 95, 3358, (1991).
126. W. T. King and G. B. Mast, J. Phys. Chem., 80, 2521, (1976).
127. J. H. Newton and W. B. Person, J. Chem. Phys., 64, 3036, (1976).
128. J. Cioslowski, J. Am. Chem. Soc., 111, 8333, (1989).
129. R. S. Mulliken, J. Chem. Phys., 23, 1833, (1955).
130. A. E. Reed, R. B. Weinstock and F. Weinhold, J. Chem. Phys., 83, 735, (1985).
131. F. L. Hirschfeld, Theor. Chim. Acta, 44, 129, (1977).
132. R. F. W. Bader, Chem. Rev., 91, 893, (1991).
133. K. B. Wiberg and P. R. Rablen, J. Comput. Chem., 14, 1504, (1993).
134. C. Chipot, B. Maigret, J. L. Rivail and H. A. Scheraga, J. Phys. Chem., 96, 10276, (1992).
135. A. J. Stone and M. Alderton, Mol. Phys., 56, 1047, (1985).
136. F. Colonna, E. Evleth and J. G. Angyan, J. Comput. Chem., 13, 1234, (1992).
137. G. G. Ferenczy, J. Comput. Chem., 12, 913, (1991).
138. L. F. Kuyper, R. N. Hunter, D. Ashton, K. M. Merz and P. A. Kollman, J. Phys. Chem., 95, 6661, (1991).
139. M. Cao, B. J. Teppen, D. M. Miller, J. Pranata and L. Schafer, J. Phys. Chem., 98, 11353, (1994).
140. C. I. Bayly, P. Cieplak, W. D. Cornell and P. A. Kollman, J. Phys. Chem., 97, 10269, (1993).
141. M. W. Schmidt, K. K. Baldridge, J. A. Boatz, S. T. Elbert, M. S. Gordon, J. H. Jensen, S. Koseki, N. Matsunaga, K. A. Nguyen, S. J. Su, T. L. Windus, M. Dupuis and J. A. Montgomery, J. Comput. Chem., 14, 1347, (1993).
142. S. Swaminathan, B. M. Craven and R. K. Mcmullan, Acta. Crystallogr., B40, 300, (1984).
143. M. A. Spackman, J. Comput. Chem., 17, 1, (1996).
144. K. M. Merz, J. Comput. Chem., 13, 749, (1992).
145. M. L. Connolly, J. Appl. Crystallog., 16, 548, (1983).
146. D. A. Pearlman, J. W. Caldwell, G. Seibel, P. A. Weiner and P. A. Kollman, AMBER 4.1 (UCSF), Department of Pharmaceutical Chemistry, University of California, San Francisco, CA, (1991).
147. H. A. Carlson and W. L. Jorgensen, J. Phys. Chem., 99, 10667, (1995).
148. N. A. McDonald, H. A. Carlson and W. L. Jorgensen, J. Phys. Org. Chem., 10, 563, (1997).
149. W. L. Jorgensen, J. F. Blake and J. K. Buckner, Chem. Phys., 129, 193, (1989).
150. W. L. Jorgensen and D. L. Severance, J. Am. Chem. Soc., 112, 4768, (1990).
151. G. Kaminski, E. M. Duffy, T. Matsui and W. L. Jorgensen, J. Phys. Chem., 98, 13077, (1994).
152. J. Aqvist and T. Hansson, J. Phys. Chem., 100, 9512, (1996).
153. T. Rowan, Ph.D. Thesis, Department of Computer Sciences, University of Texas, Austin, (1990).
154. F. Mohamadi, N. G. J. Richards, W. C. Guida, R. Liskamp, M. Lipton, C. Caufield, G. Chang, T. Hendrickson and W. C. Still, MacroModel Version 5.0, J. Comput. Chem., 11, 440, (1990).
155. Y. X. Sun, D. Spellmeyer, D. A. Pearlman and P. Kollman, J. Am. Chem. Soc., 114, 6798, (1992).
156. I. D. Wall, A. R. Leach, D. W. Salt, M. G. Ford and J. W. Essex, J. Med. Chem., in press.
157. D. Livingstone, Data Analysis for Chemists, Oxford University Press, Oxford, (1995).
158. M. Stone and R. J. Brooks, Journal of the Royal Statistical Society Series B-Methodological, 52, 237, (1990).
159. J. Malpass, D. Salt, M. Ford, E. Wynn and D. Livingstone, Continuum regression: A new algorithm for the prediction of biological activity. In Methods and Principles in Medicinal Chemistry, 3, Advanced computer assisted techniques in drug discovery, VCH Publishers, Weinheim, (1995).
160. J. Malpass, D. Salt and M. Ford, Pestic. Sci., 46, 282, (1996).
161. M. Ford, R. Crichton, D. Salt and D. Livingstone, PARAGON Drug Design Software. V1.09.01, University of Portsmouth, UK, (1999).
162. B. Efron and R. Tibshirani, An Introduction to the Bootstrap, Chapman and Hall, New York, (1983).
163. A. E. Mark, W. F. van Gunsteren and H. J. C. Berendsen, J. Chem. Phys., 94, 3808, (1991).
164. G. Verkhivker, R. Elber and W. Nowak, J. Chem. Phys., 97, 7838, (1992).
165. H. Tsujishita, I. Moriguchi and S. Hirono, J. Phys. Chem., 97, 4416, (1993).
166. H. Y. Liu, A. E. Mark and W. F. van Gunsteren, J. Phys. Chem., 100, 9485, (1996).
167. T. Huber, A. E. Torda and W. F. van Gunsteren, J. Phys. Chem. A, 101, 5926, (1997).
168. T. C. Beutler and W. F. van Gunsteren, J. Chem. Phys., 101, 1417, (1994).
169. P. Rossky, P. Doll and H. L. Friedman, J. Chem. Phys., 69, 4628, (1978).
170. C. Pangali, M. Rao and B. J. Berne, Chem. Phys. Lett., 55, 413, (1978).
171. G. M. Torrie and J. P. Valleau, J. Comp. Phys., 23, 187, (1977).
172. W. L. Jorgensen, P. I. M. Detirado and D. L. Severance, J. Am. Chem. Soc., 116, 2199, (1994).
173. M. W. Deem and J. S. Bader, Mol. Phys., 87, 1245, (1996).
174. Z. H. Liu and B. J. Berne, J. Chem. Phys., 99, 6071, (1993).
175. D. D. Frantz, D. L. Freeman and J. D. Doll, J. Chem. Phys., 93, 2769, (1990).
176. R. H. Zhou and B. J. Berne, J. Chem. Phys., 107, 9185, (1997).
177. C. J. Geyer, Computing Science and Statistics, American Statistical Association, New York, (1991).
178. B. A. Berg and T. Neuhaus, Phys. Rev. Lett., 68, 9, (1992).
179. J. Lee, Phys. Rev. Lett., 71, 211, (1993).
180. I. Andricioaei and J. E. Straub, J. Chem. Phys., 107, 9117, (1997).
181. P. H. Verdier and W. H. Stockmayer, J. Chem. Phys., 36, 227, (1962).
182. A. Baumgartner and K. Binder, J. Chem. Phys., 71, 2541, (1979).
183. V. G. Mavrantzas and D. N. Theodorou, Macromolecules, 31, 6310, (1998).
184. F. A. Escobedo and J. J. Depablo, J. Chem. Phys., 102, 2636, (1995).
185. E. Leontidis, J. J. Depablo, M. Laso and U. W. Suter, Adv. Polym. Sci., 116, 283, (1994).
186. F. T. Wall and F. Mandel, J. Chem. Phys., 63, 4592, (1975).
187. P. V. K. Pant and D. N. Theodorou, Macromolecules, 28, 7224, (1995).
188. H. Senderowitz and W. C. Still, J. Comput. Chem., 19, 1736, (1998).
189. M. S. Head, J. A. Given and M. K. Gilson, Biophys. J., 72, WP437, (1997).
190. D. Wu, D. Chandler and B. Smit, J. Phys. Chem., 96, 4077, (1992).
191. S. Duane, A. D. Kennedy, B. J. Pendleton and D. Roweth, Physics Letters B, 195, 216, (1987).
192. S. Miertus, E. Scrocco and J. Tomasi, Chem. Phys., 55, 117, (1981).
193. S. R. Edinger, C. Cortis, P. S. Shenkin and R. A. Friesner, J. Phys. Chem. B, 101, 1190, (1997).
194. L. Wesson and D. Eisenberg, Protein Sci., 1, 227, (1992).
195. R. Luo, M. S. Head, J. A. Given and M. K. Gilson, Biophys. Chem., 78, 183, (1999).
196. T. J. Richmond, J. Mol. Biol., 178, 63, (1984).
197. G. D. Hawkins, C. J. Cramer and D. G. Truhlar, Chem. Phys. Lett., 246, 122, (1995).
198. J. W. Ponder, TINKER, Version 3.6, Washington University, St. Louis, Missouri, (1998).
199. D. J. Giesen, C. C. Chambers, C. J. Cramer and D. G. Truhlar, J. Phys. Chem. B, 101, 2061, (1997).
200. J. D. Madura, M. E. Davis, M. K. Gilson, R. C. Wade, B. A. Luty and J. A. McCammon, Reviews in Computational Chemistry V, VCH Publishers, Inc., pp. 229-267, (1994).
201. R. E. Bruccoleri, J. Novotny and M. E. Davis, J. Comput. Chem., 18, 268, (1997).
202. M. K. Gilson, K. A. Sharp and B. H. Honig, J. Comput. Chem., 9, 327, (1988).
203. B. Jayaram, D. Sprous and D. L. Beveridge, J. Phys. Chem. B, 102, 9571, (1998).
204. J. B. Li, G. D. Hawkins, C. J. Cramer and D. G. Truhlar, Chem. Phys. Lett., 288, 293, (1998).
205. F. Fraternali and W. F. van Gunsteren, J. Mol. Biol., 256, 939, (1996).
206. R. C. Rizzo and W. L. Jorgensen, J. Am. Chem. Soc., 121, 4827, (1999).
207. W. L. Jorgensen, J. Phys. Chem., 87, 5304, (1983).
208. D. A. Pearlman and P. A. Kollman, J. Chem. Phys., 90, 2460, (1989).
209. C. H. Bennett, J. Comput. Phys., 22, 245, (1977).
210. N. D. Lu and D. A. Kofke, J. Chem. Phys., 111, 4414, (1999).
211. W. L. Jorgensen and J. Gao, J. Am. Chem. Soc., 110, 4212, (1988).
212. D. Sitkoff, N. Bental and B. Honig, J. Phys. Chem., 100, 2744, (1996).
