APPROACHES
BY
CONTENTS
Introduction on CoMFA
Biological activity
model
CoMSIA
Conclusion
References
literature is replete with examples where the sole benefit of CoMFA was the
confirmation of a previously determined model. Can we look beyond this limited
information and reveal more significant insights in our model? By the very nature
of the technique, the most crucial step in this 3D approach is the relative
orientation of the test molecules in space. That is, the chosen alignment of the
compounds in the training set is going to have the most profound impact on the
predictive ability of the model. We have already noted the plethora of methods
available to us for structural superimposition. From a purely drug design
perspective, in those instances where one has no knowledge of the three-
dimensional shape of the receptor, a set of alignment rules based upon a
pharmacophore hypothesis will probably be the most valuable. Indeed, perhaps
the principal value of this methodology is the evaluation of such alignments
based upon the predictive power of the derived model. One may conclude that
the greater the predictive power of the model, the more closely the alignment reflects the
bioactive conformation of the molecules. As a simple example of such a
phenomenon, we were interested in a series of oxime analogues as cholinergic
agonists at the m1 receptor sub-type for the treatment of Alzheimer's disease.
The compounds concerned were active as both their syn and anti isomers,
although there was not a strict parallel in their respective SARs. The molecules
were originally modeled with their oxime-substituted side-chains oriented in
opposite positions in space. That is to say, the syn oximes had their
side-chains aligned to one side of the molecule and the anti oximes had theirs placed on the
other side.
Interestingly, when all the analogues were combined into a single CoMFA model
no relationship could be found between the structure and biological activity. The
predictive value, or cross-validated r-squared, was a negative number for this
model, indicating that any predictions made based on this alignment were no
better than simply taking the average of all the compounds' biological activities.
However, if one performed CoMFA on the isomeric pairs individually, a good
predictive model was found for one set of oximes, but not for the other. This led
us to the conclusion that instead of the oxime side-chains reaching out into
opposite points in 3D space, they were probably aligned to a common site. Re-
orientation of the side-chains in such a manner provided a CoMFA model with
high predictive power, thus validating our hypothesis.
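The meaning of a negative cross-validated r-squared can be made concrete with a small sketch. The function below is an illustration only: it uses ordinary least squares as a stand-in for the PLS step, and the name loo_q2 is hypothetical. It computes q2 = 1 - PRESS/SS by leave-one-out, so a model that does no better than the mean of the remaining compounds scores below zero.

```python
import numpy as np

def loo_q2(X, y):
    """Leave-one-out cross-validated r-squared (q2) for a linear model.

    Assumption: ordinary least squares stands in for PLS to keep the
    sketch short; the q2 formula itself is the same.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    press = 0.0
    for i in range(n):
        keep = np.arange(n) != i
        # Fit intercept and slopes on the n-1 retained compounds.
        A = np.column_stack([np.ones(keep.sum()), X[keep]])
        coef, *_ = np.linalg.lstsq(A, y[keep], rcond=None)
        # Predict the compound that was left out.
        pred = np.concatenate([[1.0], X[i]]) @ coef
        press += (y[i] - pred) ** 2
    ss = np.sum((y - y.mean()) ** 2)
    return 1.0 - press / ss
```

With descriptors that carry no information (for example a column of zeros), each left-out compound is predicted by the mean of the others and q2 comes out negative, which is the situation described for the combined syn/anti model.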
CoMFA process: (1) the selected bioactive conformation of each compound is
superimposed; (2) the steric, electrostatic, and sometimes other fields are calculated with
various probe groups around the molecules; (3) the important features related to the
biological activity are extracted by partial least squares statistical analysis; (4) the results are
displayed in contour plots showing the important regions in three-dimensional space that are
highly associated with the biological activity.
1. Biological activity
2. Selection of compounds and series design
3. Generation of 3D structure of the ligand molecules
4. Conformational analysis of each molecule
5. Establishment of the bioactive conformation of each molecule
6. Binding mode and superimposition of the molecules
7. Position of the lattice points
8. Choice of force fields and calculations of the interaction energies
9. Statistical analysis of the data and selection of the 3D-QSAR model
10. Display of the results in contour plots and their interpretation
11. Design and forecasting the activity of unknown compounds.
1. Biological Data
Accurate biological data are essential for any QSAR technique. The
biological data must be obtained for a set of ligands using uniform protocols and
ideally from a single source. The biological activity ranges covered should be as
the data could have discrepancies that may be misleading.
2. Selection of compounds and series design
• Minimization of collinearity
• Maximization of variance and
• Mapping of substituent space with the smallest number of compounds.
Both X-ray crystallography and molecular modelling can provide the starting
structures of the molecules. Computational methods for 3D-structure
generation can be classified into manual, numeric, and automatic methods. In the
manual mode, the user constructs a 3D structure interactively on a 3D computer
graphics interface, either from scratch or from an existing 3D structure, including those in
various fragment libraries. A manually constructed 3D structure can be used as input to
generate many conformations or can be further refined by numerical methods such
as distance geometry or quantum or molecular mechanical methods. When the
structures of a relatively small number of compounds are desired, as is the case for
3D-QSAR studies, this approach is usually used. The automatic methods are often
used for building a 3D-structure database. Although rapid generation of approximate
3D structures is essential for building a 3D structure database, it is usually not as
critical for 3D-QSAR study.
• Molecular mechanics
• Semi-empirical molecular orbital and
• Ab initio molecular orbital methods.
Molecular mechanics methods are fast and accurate for molecules within the
range of the force field parameterization, and can handle very large molecules, such
as enzymes. Ab initio methods are more time consuming.
Some small rigid molecules have only one low-energy conformation, whereas larger and more
flexible molecules have several conformations populated at room or physiological
temperature.
The disadvantage of this method is that the time required for a systematic search
increases exponentially with each additional rotatable bond, and the generation and
energy evaluation of the conformations can easily become impractical. The search must also deal
with the ring-closure problem for cyclic structures, which further limits its efficiency for
compounds with more than 20 dihedral angles. The second conformational analysis
method is molecular dynamics, a method of studying the motions and the
configurational space of a molecular system in which the time evolution, or
trajectory, of a molecule is described by the classical Newtonian equations of
motion.
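The exponential cost of a systematic search mentioned above is easy to quantify. The sketch below assumes a full grid search in which every rotatable bond is sampled at the same fixed torsion increment; the 30 degree default and the function name are illustrative assumptions, not values taken from the text.

```python
def systematic_search_size(n_rotatable_bonds, step_degrees=30):
    """Number of conformations in a full grid search over torsion angles."""
    if 360 % step_degrees != 0:
        raise ValueError("step_degrees must divide 360 evenly")
    per_bond = 360 // step_degrees  # torsion values sampled per bond
    return per_bond ** n_rotatable_bonds

# Each extra rotatable bond multiplies the work by 360 / step_degrees.
for n in (2, 5, 10):
    print(n, systematic_search_size(n))
```

At a 30 degree increment, ten rotatable bonds already require about 6 x 10^10 energy evaluations, which is why pruning or stochastic methods are usually needed for flexible molecules.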
1. The protein must crystallize, and the crystallizing medium is usually far
from the physiological condition. This may mean that the crystalline enzyme is
in an inactive conformation.
2. Data collection usually takes a long time, and this procedure yields a time-averaged
structure.
3. Distortion in the structure involving the active site may arise from the
packing of the enzyme in the crystal.
Different approaches are used to establish the bioactive conformation of
each molecule.
c. Molecular-fitting technique
d. DISCO
Once the bioactive conformations of all the compounds are determined, the
next step in CoMFA is to superimpose and align them in a grid box. This
is one of the most crucial steps in CoMFA, and the results of CoMFA analyses
depend on the alignment of the molecules. A complication in CoMFA is that
the selection of the bioactive conformation and the superimposition of the
molecules each influence the other; therefore the two aspects are
sometimes considered simultaneously.
c. Alignments based on fields or pseudo-fields: this alignment can be
performed using the calculated energy fields instead of atoms or binding-site
points.
When all the compounds are superimposed, they are placed in a grid box for
calculating interaction energies with various probes at each lattice point.
The aspects to consider are the position of the lattice points, the construction of the data table
and interaction energy calculations, and the molecular force fields.
a. Position of the lattice points: In order to place the lattice points around the
molecules, three aspects are of concern.
The usual choice of grid spacing is 2 Å, and the grid box is about 3-4 Å
larger than the union surface of the molecules. Electrostatic interactions are long
range, so a larger grid box may be required.
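The lattice construction described above can be sketched in a few lines. This is a minimal illustration that approximates the "union surface" by the bounding box of the atom centres of the aligned molecules; that simplification, the function name and the default margin of 4 Å are assumptions made for the example.

```python
import numpy as np

def make_lattice(coords, spacing=2.0, margin=4.0):
    """Regular lattice enclosing all atoms, extended by `margin` on each side.

    coords: (n_atoms, 3) coordinates of all superimposed molecules, in Å.
    Assumption: the atom-centre bounding box stands in for the union surface.
    """
    coords = np.asarray(coords, dtype=float)
    lo = coords.min(axis=0) - margin
    hi = coords.max(axis=0) + margin
    # Small tolerance so the upper edge is included when it falls on the grid.
    axes = [np.arange(l, h + 1e-9, spacing) for l, h in zip(lo, hi)]
    gx, gy, gz = np.meshgrid(*axes, indexing="ij")
    return np.column_stack([gx.ravel(), gy.ravel(), gz.ravel()])
```

Halving the spacing multiplies the number of lattice points, and therefore the number of descriptor columns, by roughly eight, which is one reason a 2 Å grid is a common compromise.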
c. Molecular force fields: A force field is an empirical fit to the potential
energy surface. It defines the mathematical form of the equation involving the
coordinates and the parameters adjusted in the empirical fit of the potential
energy surface. A force field uses a combination of bond distances, bond
angles, torsion angles and interatomic distances to describe both bonded interactions and
van der Waals and electrostatic interactions between atoms.
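To illustrate how the non-bonded terms of a force field give interaction energies at a lattice point, the sketch below evaluates a Lennard-Jones 6-12 steric term and a Coulomb electrostatic term between a probe and the atoms of a molecule. The functional forms are the common textbook ones; every numerical parameter (probe charge, well depths, radii, the 30 kcal/mol truncation) and the function name are assumptions chosen for the example rather than values given in this document.

```python
import numpy as np

COULOMB_K = 332.06  # kcal*Angstrom/(mol*e^2), the usual unit conversion factor

def probe_energies(point, atom_xyz, atom_q, atom_eps, atom_rmin,
                   probe_q=1.0, probe_eps=0.1, probe_rmin=1.7, cutoff=30.0):
    """Return (steric, electrostatic) energies of a probe at one lattice point.

    All probe parameters and the cutoff are illustrative assumptions.
    """
    atom_xyz = np.asarray(atom_xyz, dtype=float)
    d = np.linalg.norm(atom_xyz - np.asarray(point, dtype=float), axis=1)
    d = np.maximum(d, 1e-6)  # avoid division by zero on top of an atom
    # Combine atom and probe parameters (geometric mean depth, summed radii).
    eps = np.sqrt(np.asarray(atom_eps, dtype=float) * probe_eps)
    rmin = np.asarray(atom_rmin, dtype=float) + probe_rmin
    ratio6 = (rmin / d) ** 6
    steric = np.sum(eps * (ratio6 ** 2 - 2.0 * ratio6))
    electro = COULOMB_K * probe_q * np.sum(np.asarray(atom_q, dtype=float) / d)
    # Truncate very large energies close to the atoms.
    return float(min(steric, cutoff)), float(np.clip(electro, -cutoff, cutoff))
```

Calling this function at every lattice point, for every molecule, yields the steric and electrostatic columns of the CoMFA data table.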
It was the intention of the authors that this mini-review of Alternative CoMFA
fields be as complete as possible. If we have omitted work in the field, we wish
to apologize in advance.
It is, in fact, a rather simple matter to create a field. All that is actually
necessary is an atomistic parameter and some mathematical functional form
for the distance dependence for that parameter. Then the field is created by
summing the effect of all atoms on each grid point in the cage surrounding the
molecule. Some form of arbitrary cut-off can be imposed to eliminate or
truncate the contribution of grid points that are within van der Waals radii of
molecular atoms.
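The recipe just described (an atomistic parameter, a distance function, a sum over atoms at each grid point, and an optional cut-off inside the van der Waals radii) can be sketched generically. The function and argument names are hypothetical, and zeroing the field inside an atomic radius is only one of the possible cut-off choices.

```python
import numpy as np

def atomistic_field(grid, atom_xyz, atom_param, decay, radii=None):
    """Sum atom_param[i] * decay(r_i) over all atoms at every grid point.

    decay: any function of distance, e.g. lambda r: np.exp(-r).
    radii: optional per-atom radii; grid points inside any radius are
           set to zero (one possible form of the arbitrary cut-off).
    """
    grid = np.asarray(grid, dtype=float)
    atom_xyz = np.asarray(atom_xyz, dtype=float)
    atom_param = np.asarray(atom_param, dtype=float)
    # Distance matrix of shape (n_grid, n_atoms).
    d = np.linalg.norm(grid[:, None, :] - atom_xyz[None, :, :], axis=2)
    field = (atom_param[None, :] * decay(np.maximum(d, 1e-6))).sum(axis=1)
    if radii is not None:
        inside = (d < np.asarray(radii, dtype=float)[None, :]).any(axis=1)
        field[inside] = 0.0
    return field
```

Swapping the decay function and the atomic parameter is all that distinguishes one alternative field from another in this scheme; for example, an exponential decay with hydrophobic atom constants, or an inverse r-cubed decay with an electronic index.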
HINT Fields
One of the earliest efforts is the hydropathic interaction (HINT) technique of
Kellogg et al. The HINT formalism is strongly rooted in the C-LogP technique of
Hansch and Leo. Beginning with the fragment constants used in the computation
of the octanol:water partition coefficient (C-LogP), Abraham and Leo suggested
the further deconstruction of these values into atomic contributions to overall
molecular hydrophobicity. Wireko, Kellogg and Abraham demonstrated that such
values can be calculated and that a hydrophobicity field for a given molecule can
be computed. At each grid intersection point, the net sum of the following
empirical equation is evaluated over all the atoms for a given molecule:
In 1991 Kellogg, Semus and Abraham used the HINT field in a re-examination of
the classic steroid data set with mixed results. While the HINT field contributed
significantly to three-field (steric, electrostatic, HINT) models, the additional field
did not improve the statistical measures of the model. However, the authors
proposed that the additional field adds interpretability to CoMFA models by
being easy to understand in chemical (synthetic/drug design) terms. Later
studies of ryanodines, barbiturates, and other systems[9] have confirmed this
hypothesis. These fields have also been demonstrated to effectively model
experimentally determined logP values in cases where calculated logP (C-LogP)
values fail, such as positional isomers.
Molecular Lipophilicity Potential (MLP) Fields.
Others, for example Norinder and Altomare et al., have used fields based on
Molecular Lipophilicity Potentials (MLP) that were described by Fauchere et al.
in 1988. These fields add lipophilic information through the use of atomistic
hydrophobic parameters as derived by a variety of researchers.
H-bonding Fields.
Kim has reported the use of the direction-dependent 6-4 function of the
GRID program to generate hydrogen bonding fields as descriptors of the
hydrophobic interactions. Specifically, rather than using a raw atom as a
molecular probe, Kim uses a neutral H2O "molecule" with an effective radius of
1.7 Å. Two hydrogen bond donating and two hydrogen bond accepting
properties are assigned to the probe. The probe is allowed to freely rotate
about the grid point in order to optimize the interaction as computed using the
GRID function. The hydrogen bonding potential energy is computed at each
lattice intersection according to the following function:
$$E_{hb} = \left(\frac{C}{d^{6}} - \frac{D}{d^{4}}\right)\cos^{m}\theta$$
where C and D are taken from tables and θ is the angle described by the trio
of donor, hydrogen and acceptor atoms.
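As a small numerical illustration, the hydrogen-bonding function can be evaluated directly if it is read as (C/d^6 - D/d^4) multiplied by cos θ raised to the power m. That reading of the printed formula, the default m = 4, the convention that a linear arrangement corresponds to 180 degrees, and the C and D values used in any call are all assumptions; the real constants come from the GRID tables.

```python
import math

def hbond_energy(d, theta_deg, C, D, m=4):
    """Direction-dependent 6-4 hydrogen-bond energy.

    d: donor-acceptor distance; theta_deg: donor-hydrogen-acceptor angle.
    C, D, m: placeholders for tabulated values (assumed, not from the text).
    """
    radial = C / d ** 6 - D / d ** 4
    angular = math.cos(math.radians(theta_deg)) ** m
    return radial * angular

# Linear geometry gives the full radial term; a 90 degree angle gives ~0.
print(hbond_energy(2.0, 180.0, C=64.0, D=32.0))
print(hbond_energy(2.0, 90.0, C=64.0, D=32.0))
```

The angular factor is what makes the field direction dependent: the same probe distance contributes fully for a linear hydrogen bond and almost nothing for a perpendicular one.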
This approach has been successfully applied to model the hydrophobic effect
of substituents on the aromatic ring of several series of compounds with
respect to alterations in pharmacodynamic as well as chemical equilibrium
constants. In the three cases presented by Kim, the 3D-QSAR models were
consistently more statistically robust than the corresponding classical QSAR
equations based on π substituent constants.
The GRID-based H2O probe has also been used in conjunction with GRID-based
steric (CH3, 1.95 Å radius, 0.0 charge) and electrostatic (H+, 0.0 Å
radius, 1.0 charge) probes to model the receptor binding affinities of a series of
benzodiazepines. In this particular case, a significant correlation (r > 90%) was
found to exist between the steric and hydrophobic fields. The electrostatic and
hydrophobic fields were significantly less collinear (r < 70%) and were included
in the model which best described the binding data. The resulting GRID-
CoMFA model indicated that hydrophobic fields explained 78% of the variance
in the binding data. The electrostatic field accounted for 18%. A standard
CoMFA study using standard probes and steric and electrostatic potential
functions performed on this same data set yielded qualitatively similar results in
that the model based on steric fields alone was found to be the most
descriptive of the data. The statistical significance of the GRID-CoMFA model
suggests that hydrophobicity information is crucial in this particular case and
that the SYBYL-based (Tripos, Inc.) sp3 carbon probe is not sufficient to
describe these effects.
Desolvation Energy Fields.
An exercise (some might say, in futility) was performed to compute the
desolvation free energy fields as a function of hydrophobicity. This was
accomplished using the finite difference approximation method as implemented
in the Delphi program (Biosym-MSI). In Delphi, the linearized Poisson-
Boltzmann equation is numerically solved to compute the electrostatic
contribution to solvation on a regularly-spaced field of points constructed
around a given molecule. It is this feature which ideally suits the results of
Delphi computation for inclusion into a 3D-QSAR model. Desolvation energy
fields are computed as the difference between the solvated (grid dielectric =
80) and in vacuo (grid dielectric = 1) field calculations.
In preliminary studies using inhibitors of angiotensin-converting enzyme (ACE)
and thermolysin, the desolvation energy fields did not successfully model the
hydrophobicities nor the reported binding affinities of training set molecules. It
was interesting, although not totally surprising, that in both of the above
studies, the desolvation energy fields were found to be highly collinear with the
SYBYL generated Coulombic electrostatic potential fields (r > 90%). The Delphi
technique does provide for the generation of mixed desolvated/solvated energy
fields. In structure-based 3D-QSAR studies (i.e., where the target is known), it
may be possible to compute the energy afforded by partial desolvation of the
ligand upon complexation with the target.
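The two operations described in this section, taking the difference between solvated and in vacuo fields and checking how collinear the result is with another field, can be sketched as follows. The sketch assumes the fields are already available as arrays of equal shape on a common grid; it does not call the Delphi program, and both function names are hypothetical.

```python
import numpy as np

def desolvation_field(solvated, in_vacuo):
    """Desolvation field as the solvated minus in vacuo field, point by point."""
    return np.asarray(solvated, dtype=float) - np.asarray(in_vacuo, dtype=float)

def field_correlation(field_a, field_b):
    """Pearson correlation coefficient r between two fields on the same grid."""
    a = np.ravel(np.asarray(field_a, dtype=float))
    b = np.ravel(np.asarray(field_b, dtype=float))
    return float(np.corrcoef(a, b)[0, 1])
```

An absolute r above about 0.9 between two fields, as reported here for the desolvation and Coulombic fields, means the second field adds little independent information to the model.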
Molecular Orbital Fields.
HOMO fields have been shown to be beneficial for the refinement of 3D-QSAR
models for data sets such as the Angiotensin-Converting Enzyme set in order
to more completely describe the interaction between the ionized ligand and the
metal in the molecular binding domain. More recently, molecular orbital fields
have been used in the construction of 3D-QSAR models for molecular reactivity
endpoints. Roy Vaz has examined the sigma (induction and resonance)
constants of amines (NH2-X) with CoMFA fields comprised of total electron
density (calculated with AM1/MOPAC5) and obtained excellent one and two
component PLS models. Vaz has also looked at OH radical formation rates for
substituted phenols and naphthalenes with the electron density fields and
obtained meaningful CoMFA models.
Electrotopological State Fields
Kellogg, Kier and Hall have recently created a 3-D field from the atomistic
Electrotopological State parameter of Kier and Hall. This parameter, which
represents a contraction of free valence (electronegativity) along with
topological information, is totally non-empirical. The "distance function" for the
field decay of the E-State was chosen through multiple CoMFA runs to be
inverse r-cubed. The E-State fields provide remarkably good statistical results
in PLS, comparing quite favorably (q2 = 0.803, 3 components) to the "standard"
CoMFA steric and electrostatic fields.
8. Pretreatment of data
When the data table has been constructed, it is analysed with statistical methods such as
partial least squares (PLS) to extract important features related to the biological
activity. Before the data analysis is performed, the data are pretreated as follows:
a. Reduction of the data.
Reduction by standard deviation.
Reduction by energy cut-off.
Reduction in electrostatic contribution.
Reduction by variable selection.
b. Scaling of the data.
Auto scaling to unit variance.
Block-scaling to constant group variance.
Block-adjusted scaling.
c. Centering of the data.
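Several of the pretreatment steps listed above can be combined in one short routine. The sketch below applies an energy cut-off, removes columns with a small standard deviation, centres the data, and block-scales each field to unit total variance. The thresholds (30 energy units, a minimum standard deviation of 0.1) and the function name are illustrative assumptions.

```python
import numpy as np

def pretreat(X, block_ids, min_sd=0.1, cutoff=30.0):
    """Pretreat a CoMFA-style data table.

    X: (n_compounds, n_lattice_columns) interaction energies.
    block_ids: field label for each column (e.g. 0 = steric, 1 = electrostatic).
    Returns the pretreated table and the boolean mask of retained columns.
    """
    X = np.clip(np.asarray(X, dtype=float), -cutoff, cutoff)  # energy cut-off
    block_ids = np.asarray(block_ids)
    keep = X.std(axis=0, ddof=1) >= min_sd  # reduction by standard deviation
    X, block_ids = X[:, keep], block_ids[keep]
    X = X - X.mean(axis=0)  # centering
    for b in np.unique(block_ids):
        cols = block_ids == b
        # Block scaling: every field gets the same total variance (1.0).
        total_var = X[:, cols].var(axis=0, ddof=1).sum()
        X[:, cols] /= np.sqrt(total_var)
    return X, keep
```

Block scaling keeps a field with many high-variance columns from dominating the PLS model purely because of its size or units.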
9. Statistical analysis of the data and the selection of 3D-QSAR model.
a. Partial least-squares (PLS) analysis
The partial least squares (PLS) regression technique is especially useful in the quite
common case where the number of descriptors (independent variables) is
comparable to or greater than the number of compounds (data points) and/or
there exist other factors leading to correlations between variables. In this case
the solution of the classical least squares problem does not exist or is unstable and
unreliable. The PLS approach, on the other hand, can yield stable and
predictive models even for correlated descriptors.
Partial least squares (PLS) is sometimes called "Projection to Latent Structures"
because of its general strategy. The X variables (the predictors) are reduced to
principal components, as are the Y variables (the dependents). The components
of X are used to predict the scores on the Y components, and the predicted Y
component scores are used to predict the actual values of the Y variables. In
constructing the principal components of X, the PLS algorithm iteratively
maximizes the strength of the relation of successive pairs of X and Y component
scores by maximizing the covariance of each X-score with the Y variables. This
strategy means that while the original X variables may be multicollinear, the X
components used to predict Y will be orthogonal. Also, the X variables may have
missing values, but there will be a computed score for every case on every X
component. Finally, since only a few components (often two or three) will be used
in predictions, PLS coefficients may be computed even when there may have
been more original X variables than observations (though more cases are
recommended). In contrast, any of these three conditions (multicollinearity,
missing values, and too few cases in relation to variables) may well render
traditional OLS regression estimates unreliable (and estimates by other
procedures in the general and generalized linear model families).
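The strategy described above can be made concrete with a compact implementation for a single response variable (often called PLS1), written in the NIPALS style. This is a minimal sketch for illustration, not the routine used in any particular CoMFA package; the function names are hypothetical, and it assumes complete data (no missing values).

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """Fit PLS1 by NIPALS; returns regression coefficients and intercept."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean  # centred residual matrices
    W, P, q = [], [], []
    for _ in range(n_components):
        w = E.T @ f                # direction of maximal covariance with y
        w /= np.linalg.norm(w)
        t = E @ w                  # X-scores (orthogonal across components)
        tt = t @ t
        p = E.T @ t / tt           # X-loadings
        c = f @ t / tt             # regression of y on the scores
        E = E - np.outer(t, p)     # deflate X
        f = f - c * t              # deflate y
        W.append(w); P.append(p); q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)
    return coef, y_mean - x_mean @ coef

def pls1_predict(X, coef, intercept):
    """Predict y for new rows of X."""
    return np.asarray(X, dtype=float) @ coef + intercept
```

Because only a few score vectors are extracted, the fit remains well defined even when X has far more columns than rows, which is the normal situation for a CoMFA data table.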
The advantages of PLS include ability to model multiple dependents as well as
multiple independents; ability to handle multicollinearity among the independents;
robustness in the face of data noise and missing data; and creating independent
latents directly on the basis of crossproducts involving the response variable(s),
making for stronger predictions. Disadvantages of PLS include greater difficulty of
interpreting the loadings of the independent latent variables (which are based on
crossproduct relations with the response variables, not based as in common
factor analysis on covariances among the manifest independents) and because
the distributional properties of estimates are not known, the researcher cannot
assess significance except through bootstrap induction. Overall, the mix of
advantages and disadvantages means PLS is favored as a predictive technique
and not as an interpretive technique, except for exploratory analysis as a prelude
to an interpretive technique such as multiple linear regression or covariance-based
structural equation modeling.
Though developed by Herman Wold for econometrics, PLS first gained popularity
in chemometric research and later industrial applications. It has since spread to
research in education, marketing, and the social sciences.
b. Validation of CoMFA
The most important criterion for selecting a CoMFA model is how well the
model can predict the activity of compounds outside the model, rather
than how well it reproduces the biological activity of the compounds included in
the model.
Various approaches are used for this purpose.
Cross-validation.
Bootstrapping.
Random change of the dependent variable value.
Dividing the original set into the training set and test set.
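Of the approaches listed, random change of the dependent variable (often called y-randomization or y-scrambling) is simple to sketch. The example below uses ordinary least squares as a stand-in for the PLS model to stay short; that substitution, the number of trials and the function names are assumptions made for illustration.

```python
import numpy as np

def r_squared(X, y):
    """Fitted r-squared of a least-squares model with intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)

def y_randomization(X, y, n_trials=200, seed=0):
    """Compare the real r-squared with values obtained after shuffling y."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    real = r_squared(X, y)
    scrambled = np.array([r_squared(X, rng.permutation(y))
                          for _ in range(n_trials)])
    return real, scrambled
```

If the scrambled values are frequently as high as the real one, the original correlation is likely to be a chance correlation rather than a genuine structure-activity relationship.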
c. Derivation of 3D-QSAR model
After the optimum number of components is chosen from the cross-validation
test, the CoMFA model is derived from all compounds and the optimum
number of components. For this model, R2 and s are calculated in the same
way as R2cv and scv, except that PRESS is replaced by the sum of the squares
of the differences between the calculated and the observed biological activities.
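Written out, the relation between the cross-validated and the fitted statistics looks as follows. The symbols are introduced here for illustration: n is the number of compounds, c the number of PLS components, y-hat with subscript (i) the prediction for compound i from a model built without it, and y-hat with subscript i the value calculated by the final model. The divisor n - c - 1 is the commonly used convention and is an assumption, since the document does not state it.

```latex
\begin{align}
SS &= \sum_{i=1}^{n} (y_i - \bar{y})^2, &
\mathrm{PRESS} &= \sum_{i=1}^{n} \bigl(y_i - \hat{y}_{(i)}\bigr)^2, &
SSR &= \sum_{i=1}^{n} \bigl(y_i - \hat{y}_i\bigr)^2, \\
R^2_{cv} &= 1 - \frac{\mathrm{PRESS}}{SS}, &
s_{cv} &= \sqrt{\frac{\mathrm{PRESS}}{n - c - 1}}, \\
R^2 &= 1 - \frac{SSR}{SS}, &
s &= \sqrt{\frac{SSR}{n - c - 1}}.
\end{align}
```

Since the fitted residuals are normally smaller than the leave-out residuals, R2 is usually larger than R2cv, which is why the cross-validated value is the more honest guide to predictive ability.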
If multiple fields are considered important for biological potency, they
can be examined simultaneously or individually, one after another. The results
from three different approaches to considering multiple fields were recently
explored.
Chance correlation
Collinearity
Number of compounds
d. Outlier detection
Outliers and other inhomogeneities can easily be detected from the deviations
in the calculated values of the model or by examining the corresponding
residual plot. They can also be detected by inspecting the residuals from the
cross-validation test.
e. Pitfalls
There are several pitfalls that require careful attention in QSAR. One particular
pitfall in PLS analysis is that there is no guarantee that PLS will find all the
relations contained in the data.
f. Consideration of all compounds and omission of compounds
Unless compounds are purposely left out as a test set, all compounds of the
original data set should be used in deriving the model. The omission of
compounds is usually a negative statement on the quality of the correlation or
the biological data.
10. Display of the results in contour plots and their interpretation
Several types of fields are retrievable from a CoMFA analysis. The coefficient
contour plots are the most often examined but sometimes PLS plots provide
useful information.
11. Design and forecasting the activity of unknown compounds
There are now a few hundred practical applications of CoMFA in drug design.
Most applications are in the field of ligand-protein interactions, describing
affinity or inhibition constants. In addition, CoMFA has been used to correlate
steric and electronic parameters.
The application of CoMFA to in vivo data seems less appropriate, even if
lipophilicity is considered as an additional parameter.
Develop quantitative structure-activity relationships
Predict the properties and activities of untested molecules
Compare different QSAR models statistically and visually
Optimize the properties of a lead compound
Validate models of receptor binding sites
Generate hypotheses about the characteristics of a receptor binding site
Prioritize compounds for synthesis or screening
Determine key structural requirements for high-affinity receptor ligands.
CoMSIA
The CoMSIA technique, introduced by Klebe in 1994, calculates similarity
indices at different points in a regularly spaced grid for pre-aligned
molecules. It has several advantages over the CoMFA technique, such as greater robustness
to both region shifts and small shifts within the alignments, no application of
arbitrary cut-offs, and more intuitively interpretable contour maps. The standard
settings (probe with charge +1, radius 1 Å, hydrophobicity +1, hydrogen-bond
donating +1, hydrogen-bond accepting +1, attenuation factor α of 0.3 and grid
spacing 2 Å) were used in CoMSIA to calculate five different fields, viz. steric,
electrostatic, hydrophobic, acceptor and donor.
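The similarity index behind these fields can be sketched with a Gaussian distance dependence, in which the contribution of each atom to a grid point is its property value times the probe property times exp(-α r²). The function below is an illustrative sketch of that form with the attenuation factor 0.3 quoted above; the function name and the example property values are assumptions.

```python
import numpy as np

def comsia_index(grid, atom_xyz, atom_prop, probe_prop=1.0, alpha=0.3):
    """Gaussian-type similarity index of one molecule at every grid point.

    atom_prop: per-atom property (e.g. partial charge or hydrophobicity).
    probe_prop: property of the probe (+1 in the standard settings).
    """
    grid = np.asarray(grid, dtype=float)
    atom_xyz = np.asarray(atom_xyz, dtype=float)
    atom_prop = np.asarray(atom_prop, dtype=float)
    # Squared distances, shape (n_grid, n_atoms).
    r2 = np.sum((grid[:, None, :] - atom_xyz[None, :, :]) ** 2, axis=2)
    return -probe_prop * (atom_prop[None, :] * np.exp(-alpha * r2)).sum(axis=1)
```

Because the Gaussian stays finite even at zero distance, the index can be evaluated at every grid point, including those inside the molecule, which is why no arbitrary cut-off is needed.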
Conclusion
The 3D QSAR studies carried out using CoMFA and CoMSIA have led to the
identification of the regions important for steric, hydrophobic and electronic
interactions and the derived models well explain the observed variance in the activity
and also provide important insight into structural variations that can lead to the
design of NCEs with high activity.
References
1. P.M. Dean, Molecular Similarity in Drug Design, 1st ed., pp. 291-323.
3. Klebe, G., Abraham, U., Mietzner, T., J. Med. Chem. 1994, 37, 4130-4146.
7. http://www.vcclab.org/lab/pls/m_description.html.
8. http://faculty.chass.ncsu.edu/garson/PA765/pls.html.
9. http://www.netsci.org/Science/Compchem/feature11.html.