
Marc Härkönen and Hanne Antila

GROMACS modification of two phase thermodynamic program (2PT)

30.1.2014

AALTO UNIVERSITY
SCHOOL OF CHEMICAL TECHNOLOGY

Contents
1 Introduction

2 Source code modification

3 Computational details

4 Results

5 Conclusions

Appendices

A Source code additions
A.1 model.cpp
A.2 trj_header.cpp
A.3 trj_reader.cpp

B Class flowchart

1 Introduction

This is the documentation for the GROMACS [4] compatible modification of the Two-Phase Thermodynamic (2PT) program [1-3]. At the moment only water trajectories are supported; any
input file containing anything other than water will not produce correct results.

2 Source code modification


The 2PT program, written in C++ by Lin et al. [1-3], was modified to accept .gro files as
input. The calculation code was left untouched; only the input-reading sections of
the code were changed. The functions TRJheader::readUSRHeader, TRJ::ReadUSRFrame,
and MODEL::rd_bgf served as the basis for the new GROMACS-compatible functions TRJheader::ReadGROHeader, TRJ::ReadGROFrame, and MODEL::rd_gro. Most of the
additions to the original code are listed in Appendix A. In addition, minor tweaks were
made in other parts of the source code to ensure compatibility with four- and five-point water
models. A simplified flowchart of the program is shown in Appendix B.
Unlike the other supported formats, which require separate structure and trajectory files, this version
needs only a single converted .gro file. The conversion from GROMACS's binary .trr file to the
.gro file is done with GROMACS's own trjconv tool.
Omitting the structure file for ease of use introduces a few constraints on the
input file. The structure is read from the first trajectory frame instead of a separate file.
Since .gro files do not contain any information about bonds, the program assumes that every
molecule is water and places bonds accordingly. The first atom of the trajectory (oxygen)
is bonded to the two following atoms (hydrogens), and so on until the end of the first frame.
Any virtual site that follows is ignored. It is therefore crucial that the atoms are in the
right order in the input file (OHHOHH...). If the water model contains virtual sites, as
TIP4P and TIP5P do, they must come at the end of the molecule and
must not have an atom name starting with O, since the program would otherwise mistake them for
the oxygen atom of the next molecule. For example, the atom order for a trajectory with the
TIP4P water model is OHHMOHHM... Fortunately most, if not all, GROMACS
water models are sorted this way by default. In water models with more than three atoms, the
extra atoms are simply ignored in the thermodynamic calculations.
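The ordering rules above can be illustrated with a minimal C++ sketch. It is not taken from the 2PT source: the helper names atomsPerMolecule, dropVirtualSites, and bondedTo are hypothetical, and the atom names in the usage note are assumed to follow the usual GROMACS TIP4P naming. The sketch assumes that the first atom is an oxygen and that every molecule has the same number of atoms.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: counts the atoms in one water molecule by scanning
// forward from the first oxygen to the next atom whose name starts with 'O'.
std::size_t atomsPerMolecule(const std::vector<std::string>& names) {
    assert(!names.empty() && names[0][0] == 'O');
    std::size_t n = 1;
    while (n < names.size() && names[n][0] != 'O') ++n;
    return n;
}

// Hypothetical helper: keeps only the first three atoms (O, H, H) of every
// molecule and returns their names in the compacted order.
std::vector<std::string> dropVirtualSites(const std::vector<std::string>& names) {
    const std::size_t apm = atomsPerMolecule(names);
    assert(apm >= 3 && names.size() % apm == 0);
    std::vector<std::string> kept(names.size() / apm * 3);
    for (std::size_t i = 0; i < names.size(); ++i) {
        if (i % apm <= 2) {
            // Shift the index down by the virtual sites skipped so far.
            const std::size_t idx = i - (apm - 3) * (i / apm);
            kept[idx] = names[i];
        }
    }
    return kept;
}

// Hypothetical helper: indices bonded to atom i in the compacted O,H,H order.
std::vector<std::size_t> bondedTo(std::size_t i) {
    if (i % 3 == 0) return {i + 1, i + 2};  // oxygen: the two following hydrogens
    return {i - i % 3};                     // hydrogen: the oxygen of its molecule
}
```

For the TIP4P ordering OW, HW1, HW2, MW, OW, HW1, HW2, MW, atomsPerMolecule returns 4 and dropVirtualSites keeps the six O and H atoms, so atom 3 of the compacted list is the second oxygen, bonded to atoms 4 and 5.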
The modification was benchmarked against the data in ref. 2 to verify that the program works correctly. Molecular dynamics simulations were conducted with the GROMACS
v.4.5.5 software package on four different water models.

Table 1: Results from 100 ps simulations of 512 water molecules with different water models.
S is given in J/(mol K); f is dimensionless. Columns marked (Lin) are the reference values from ref. 2.

         SPC (GROMACS)     SPC (Lin)       SPC/E (GROMACS)   SPC/E (Lin)
S_trn    52.97             53.05 ± 0.14    50.02             49.87 ± 0.14
S_rot    11.99             12.03 ± 0.03    10.35             10.41 ± 0.04
S        64.97             65.09 ± 0.13    60.37             60.28 ± 0.16
f_trn    0.29              0.29 ± 0.01     0.24              0.23 ± 0.01
f_rot    0.07              0.07 ± 0.00     0.05              0.05 ± 0.00

         TIP3P (GROMACS)   TIP3P (Lin)     TIP4P (GROMACS)   TIP4P (Lin)
S_trn    55.48             55.59 ± 0.15    49.65             49.79 ± 0.07
S_rot    12.94             12.90 ± 0.04    9.50              9.53 ± 0.07
S        68.42             68.48 ± 0.14    59.15             59.32 ± 0.12
f_trn    0.34              0.34 ± 0.00     0.23              0.24 ± 0.01
f_rot    0.08              0.08 ± 0.00     0.05              0.05 ± 0.00

3 Computational details

The simulation protocol is similar, but not fully equivalent, to the one in ref. 2. The simulation box contained 512 water molecules of one of four water models: SPC, SPC/E, TIP3P,
or TIP4P. First, a simple steepest-descent energy minimization was performed, and then the
system was equilibrated under NVT conditions for 2 ns. Finally, a 100 ps NVT simulation was
run to create the trajectory for the entropy calculations. All simulations were conducted at a
temperature of 298.15 K in a 2.483 nm × 2.483 nm × 2.483 nm periodic box, giving
a density of 1 g/cm^3. The 100 ps simulation had an integration step of 1 fs, and the output
was written every 4 steps. Unlike in the original paper [2], long-range interactions were included
using the particle-mesh Ewald method, because PPPM is not yet supported by GROMACS.
The short-range neighbour-list, electrostatic, and van der Waals cutoffs were all set
to 0.95 nm. The Nosé-Hoover thermostat was used for temperature coupling, with the reference temperature at 298.15 K and the time constant τ_t = 0.1 ps.
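As a consistency check on the box size and trajectory length quoted above, the following sketch computes the edge of a cubic box from the number of molecules and the target density, and the number of frames from the run length and output stride. The function names are hypothetical, the molar mass of water (about 18.015 g/mol) is an assumed input, and the frame count ignores a possible initial frame at t = 0.

```cpp
#include <cassert>
#include <cmath>

// Edge length (nm) of a cubic box that holds nMolecules at the given density.
double cubicBoxEdgeNm(int nMolecules, double molarMassGPerMol, double densityGPerCm3) {
    assert(nMolecules > 0 && molarMassGPerMol > 0.0 && densityGPerCm3 > 0.0);
    const double avogadro = 6.02214076e23;        // particles per mole
    const double massG = nMolecules * molarMassGPerMol / avogadro;
    const double volumeCm3 = massG / densityGPerCm3;
    const double volumeNm3 = volumeCm3 * 1.0e21;  // 1 cm^3 = 1e21 nm^3
    return std::cbrt(volumeNm3);
}

// Number of frames written when output is saved every `stride` steps.
// Assumption: the frame at t = 0 is not counted.
long framesWritten(double lengthPs, double stepFs, int stride) {
    assert(lengthPs >= 0.0 && stepFs > 0.0 && stride > 0);
    const long steps = std::lround(lengthPs * 1000.0 / stepFs);  // 1 ps = 1000 fs
    return steps / stride;
}
```

With 512 molecules, 18.015 g/mol, and 1 g/cm^3, cubicBoxEdgeNm gives roughly 2.483 nm, and framesWritten(100.0, 1.0, 4) gives 25000.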
Once the simulations were complete, the binary .trr and .tpr files were converted with
trjconv to a single .gro file. The .gro file was added to the 2PT control file with the new
keyword IN_GRO.

4 Results
The GROMACS results were benchmarked against the data from Lin et al. [2] The quantities
compared were S_trn, the entropy of the translational component; S_rot, the entropy of
the rotational component; S, the total entropy (S_trn + S_rot); f_trn, the fluidicity of the
translational component; and f_rot, the fluidicity of the rotational component. The results
are shown in Table 1.

5 Conclusions
The entropy values from the modified code mostly fall within the error margins of the original
data by Lin et al. [2] Note that the four-point water model yields less accurate results than
the three-point water models. However, the entropy values differ from the reference data by
less than 0.6%. The small error might be attributed to the use of different simulation
software, a different integration method, or a different method for calculating long-range electrostatics. The
fluidicity values agree much better with the original data.
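The comparison can be made explicit with a short sketch that computes the relative deviation from a reference value and checks whether a result lies within the quoted uncertainty. The function names are hypothetical, and the numbers in the usage note are taken from Table 1.

```cpp
#include <cassert>
#include <cmath>

// Relative deviation of a value from its reference, in percent.
double relativeDeviationPercent(double value, double reference) {
    assert(reference != 0.0);
    return 100.0 * std::fabs(value - reference) / std::fabs(reference);
}

// True when the value lies within reference +/- uncertainty.
bool withinErrorMargin(double value, double reference, double uncertainty) {
    assert(uncertainty >= 0.0);
    return std::fabs(value - reference) <= uncertainty;
}
```

For the total entropy of TIP4P, relativeDeviationPercent(59.15, 59.32) is about 0.29%, while withinErrorMargin(59.15, 59.32, 0.12) is false; for SPC, withinErrorMargin(64.97, 65.09, 0.13) is true.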
In the converted .gro trajectory files, GROMACS has also assigned velocities to the virtual
MW site in addition to the O and H sites, even though MW has no mass.
It is unclear how these velocities are calculated and how they relate to the velocities of the
other sites. Therefore, the cause of the less accurate results with the four-point water
model might be the part of the code that reads the water model, as it ignores the virtual sites.
The code could be improved in a number of ways. At present, only systems consisting solely of water
molecules are supported. The code could be further modified to allow atoms other than hydrogen and oxygen: the function MODEL::element2prp (unused when reading .gro files) recognizes many different elements (H, He, C, N, O, F, Na, S, Cl, Pt). However, the structure-reading function MODEL::rd_gro would then have to be modified to read a structure file
containing bond information, such as a .tpr or .top file. If this is done, a new keyword
will have to be added to the 2PT control file, because the IN_GRO keyword currently reads both
the trajectory and the structure from the same file.

References
[1] Lin, S. T.; Blanco, M.; Goddard, W. A. J. Chem. Phys. 2003, 119, 11792-11805.
[2] Lin, S. T.; Maiti, P. K.; Goddard, W. A. J. Phys. Chem. B 2010, 114, 8191-8198.
[3] Pascal, T. A.; Lin, S. T.; Goddard, W. A. Phys. Chem. Chem. Phys. 2011, 13, 169-181.
[4] Hess, B.; Kutzner, C.; van der Spoel, D.; Lindahl, E. J. Chem. Theory Comput. 2008, 4, 435-447.
[5] Jayaram, B.; Jain, T. Annu. Rev. Biophys. Biomol. Struct. 2004, 33, 343-361.
[6] Mortimer, R. G. Physical Chemistry, Benjamin/Cummings Publishing Company, Inc., 1993.
[7] Leach, A. R. Molecular Modelling: Principles and Applications, Second edition, Pearson Education Ltd., 2001.

Appendices
A Source code additions

A.1 model.cpp

    /*
     * Reads the structure from the first frame of the trajectory.
     * Also determines the number of atoms per molecule and
     * ignores virtual sites (such as the M atom of TIP4P or
     * the L1 and L2 atoms of TIP5P). Once the atom array is
     * complete, creates bonds according to water molecule
     * specifications, assuming atoms are sorted correctly.
     */
    int MODEL::rd_gro(char* name)
    {
        char null[1024];
        stringstream ss;

        cout<<"Reading GRO file "<<name<<"...";

        ifstream inf(name, fstream::in);
        if (!inf.is_open()) {
            cout<<" Error: GRO file "<<name<<" cannot be opened."<<endl;
            return filenotopen;
        }

        // First two lines: title and number of atoms
        inf.getline(null, 1024);
        inf>>natom;

        // Create temporary atom array, which also contains virtual sites
        ATOM* tmpatom = new ATOM[natom];

        // Get atomic info, according to GROMACS formatting
        for (int i=0; i<natom; i++) {
            tmpatom[i].id = i;
            inf>>null>>tmpatom[i].name>>null
               >>tmpatom[i].pv[0]>>tmpatom[i].pv[1]>>tmpatom[i].pv[2]
               >>null>>null>>null;
            // Fix units (nm -> angstrom)
            for (int j=0; j<3; j++) tmpatom[i].pv[j] *= 10;
            // Remove extra letters from name (NOTE: only 1-letter atoms work)
            tmpatom[i].name[1] = '\0';
            // Write forcefield types
            if (tmpatom[i].name[0] == 'O') strcpy(tmpatom[i].fftype, "O_3");
            else if (tmpatom[i].name[0] == 'H') strcpy(tmpatom[i].fftype, "H_H");
            else strcpy(tmpatom[i].fftype, "X_");
        }

        // Find out how many atoms per molecule, remove virtual atoms
        // NOTE: Atoms MUST be in the following order: O,H,H,(M,M,)O,H,H...
        // where M is an optional virtual site
        int atomsPerMolecule = 1;
        // Stop at the next oxygen, or at the end of the array for a single molecule
        while (atomsPerMolecule < natom && tmpatom[atomsPerMolecule].name[0] != 'O')
            atomsPerMolecule++;

        int natom_old = natom;
        natom = natom_old / atomsPerMolecule * 3;  // Save atom number, excluding virtual sites
        prp.atomsPerMolecule = atomsPerMolecule;
        init_atom();

        // Deep copy all atoms except virtual sites
        for (int i=0; i<natom_old; i++)
        {
            if (i % atomsPerMolecule <= 2)
            {
                int idx = i - (atomsPerMolecule-3)*(i/atomsPerMolecule);
                strcpy(atom[idx].name, tmpatom[i].name);
                strcpy(atom[idx].fftype, tmpatom[i].fftype);
                atom[idx].pv[0] = tmpatom[i].pv[0];
                atom[idx].pv[1] = tmpatom[i].pv[1];
                atom[idx].pv[2] = tmpatom[i].pv[2];
                atom[idx].id = idx;
            }
        }
        delete[] tmpatom;

        // Go to the end of the first frame and read box dimensions
        inf.seekg(0);
        periodic = 3;
        for (int i=0; i<natom_old+3; i++)
        {
            inf.getline(null, 1024);
        }
        ss.str(null);
        ss>>cell.la>>cell.lb>>cell.lc;
        cell.la *= 10; cell.lb *= 10; cell.lc *= 10;  // Units nm -> angstrom
        cell.alpha = cell.beta = cell.gamma = 90;
        cell.labc2H();
        cell.H2others();

        // Creates bonds
        // Assumes atoms are in the .gro file in order (O,H,H,O,H,H...)
        for (int i=0; i<natom; i++) {
            if (i%3 == 0)  // Oxygen
            {
                if (atom[i].name[0] != 'O')
                {
                    cout<<"ERROR: Atom order incorrect near atom "
                        <<(atomsPerMolecule-3)*(i/3)+i<<endl;
                    return 1;
                }
                atom[i].ncnt = 2;
                for (int j=0; j<atom[i].ncnt; j++)
                {
                    atom[i].bod[j] = 1;
                    atom[i].cnt[j] = i+j+1;
                }
            }
            else  // Hydrogen
            {
                atom[i].bod[0] = 1;
                atom[i].cnt[0] = i-(i%3);
                atom[i].ncnt = 1;
                if (atom[i].name[0] != 'H')
                {
                    cout<<"ERROR: Atom order incorrect near atom "
                        <<(atomsPerMolecule-3)*(i/3)+i<<endl;
                    return 1;
                }
            }
        }

        return 0;
    }
A.2 trj_header.cpp

    /*
     * Reads the number of atoms, excluding virtual sites.
     */
    int TRJheader::ReadGROHeader(ifstream* intrj, int atomsPerMolecule)
    {
        char junk[1024];
        int old_natom;  // natom including virtual sites
        stringstream ss;
        CleanTRJheader();
        intrj->getline(junk, 1024);  // Read out title line
        intrj->getline(junk, 1024);  // Read out nAtoms line
        ss.str(junk);
        ss>>old_natom;
        totmov = old_natom / atomsPerMolecule * 3;  // Remove virtual sites from the number of atoms
        nmovatm = totmov;
        init_header();
        cout<<" grotrj natom "<<nmovatm<<" ("<<old_natom-totmov<<" virtual sites ignored)"<<endl;
        intrj->seekg(0, ios::beg);  // Rewind so that the first frame can be read again

        return 0;
    }
A.3 trj_reader.cpp

    /*
     * Calculates the total number of frames. .gro files are
     * formatted so that every frame has an equal number of
     * bytes. The total frame number is computed by dividing the
     * total size of the file by the size of one frame.
     */
    int TRJ::init_grotrj(trjItem trjf)
    {
        cout<<"Analyzing GROMACS trajectory file "<<trjf.vel<<endl;
        intrj.open(trjf.vel, ios::in);
        if (!intrj.is_open()) {
            cout<<" Error: Trj file "<<trjf.vel<<" cannot be opened."<<endl;
            return filenotopen;
        }

        // Read header; this call reaches TRJheader::ReadGROHeader
        int i = rd_header(atom);
        if (i) return i;

        // Measure the size of one frame by reading the first one
        location0 = intrj.tellg();
        init_contentDouble();
        CleanTRJcontentDouble();
        ReadGROFrame();
        location1 = intrj.tellg();
        datasize = location1 - location0;

        // Total number of frames = file size / frame size
        intrj.seekg(0, ios::end);
        location2 = intrj.tellg();
        totframe = (location2 - location0) / datasize;

        return 0;
    }

    /*
     * Reads the positions and velocities of all atoms and the
     * value of time in a single frame. Before this function
     * is called, the function TRJ::rd_frame will change the
     * file pointer to point at the beginning of the frame to read.
     */
    void TRJ::ReadGROFrame()
    {
        int i, j, idx;
        int tmp;
        double boxDim[3];
        char junk[1024];
        char* find;
        int apm = prp->atomsPerMolecule;  // Atoms per water molecule
        stringstream ss;

        intrj.getline(junk, 1024);
        find = strstr(junk, "t=");  // Find time info in title line
        if (!find)
        {
            cout<<"WARNING"<<endl;
            cout<<".gro file was not converted with GROMACS trjconv. 2pt will probably fail"<<endl;
        }
        else
        {
            ss.str(find);
            ss.get();
            ss.get();  // Read out chars 't' and '=' from stream
            ss>>prp->time;
        }
        intrj>>tmp;  // Compare atom number from this frame to first one
        if (tmp / apm * 3 != header.nmovatm)
            cout<<"WARNING"<<endl<<"Atom number mismatch, 2pt will probably fail"<<endl;
        intrj.getline(junk, 1024);

        // Read the x, y, and z positions and velocities of all atoms (excluding virtual sites)
        for (i=0; i<tmp; i++)
        {
            intrj.getline(junk, 1024);
            if (i % apm <= 2)
            {
                idx = i - (apm-3)*(i/apm);
                sscanf(junk, "%*5d%*5s%*5s%*5d%8lf%8lf%8lf%8lf%8lf%8lf",
                       &atom[idx].pv[0], &atom[idx].pv[1], &atom[idx].pv[2],
                       &atom[idx].vel[0], &atom[idx].vel[1], &atom[idx].vel[2]);

                for (j=0; j<3; j++)  // Correct units (nm -> angstrom, nm/ps -> angstrom/ps)
                {
                    atom[idx].pv[j] *= 10;
                    atom[idx].vel[j] *= 10;
                }
            }
        }

        // Read box dimensions
        intrj.getline(junk, 1024);
        sscanf(junk, "%lf %lf %lf", &boxDim[0], &boxDim[1], &boxDim[2]);
        prp->V = boxDim[0]*boxDim[1]*boxDim[2]*1000;  // nm^3 -> angstrom^3
    }
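Two steps of the trajectory reader can be sketched in isolation: extracting the time stamp from a frame's title line, and estimating the number of frames from the file size. This is a simplified illustration rather than the 2PT code itself; the helper names parseFrameTime and countFrames are hypothetical, and the title text in the usage note is only an assumed example of a line containing "t=".

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical helper: extracts the time (ps) that follows "t=" in a title
// line. Returns false and leaves *timePs untouched when "t=" is missing.
bool parseFrameTime(const char* title, double* timePs) {
    assert(title != nullptr && timePs != nullptr);
    const char* found = std::strstr(title, "t=");
    if (found == nullptr) return false;
    // Skip the two characters 't' and '='; strtod skips leading blanks.
    *timePs = std::strtod(found + 2, nullptr);
    return true;
}

// Hypothetical helper: number of complete frames in a file, assuming that
// every frame occupies the same number of bytes.
long long countFrames(long long fileBytes, long long frameBytes) {
    assert(fileBytes >= 0 && frameBytes > 0);
    return fileBytes / frameBytes;
}
```

For a title such as "water t=   0.00400", parseFrameTime stores 0.004 in timePs; for a title without "t=" it returns false. countFrames(3000, 1000) gives 3. Checking the result of the search before using it avoids building a string from a null pointer when the title carries no time stamp.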

B Class flowchart

[Figure: simplified class flowchart of the program. Classes shown: ANALYSIS (computes 2PT), MODEL (contains structure information), TRAJECTORY, TRJ, TRJheader, ATOM, CELL (matrix containing the 3 box vectors), and PROPERTY. The legend marks the additions for GROMACS compatibility; the members listed in the figure include MODEL::rd_gro, TRJ::ReadGROFrame, TRJheader::ReadGROHeader, and PROPERTY::atomsPerMolecule.]
