CoMFA & CoMSIA AS 3D QSAR
APPROACHES
BY
BHARATH KUMAR INTURI
CONTENTS
• Introduction on CoMFA
• Considered CoMFA Analysis
• Biological activity
• Selection of compounds and series design
• Generation of 3D structure of the ligand molecules
• Conformational analysis of each molecule
• Establishment of the bioactive conformation of each molecule
• Binding mode and superimposition of the molecules
• Position of the lattice points
• Choice of force fields and calculations of the interaction energies
• A statistical analysis of the data and the selection of the 3D-QSAR model
• Display of the results in contour plots and interpretations of them
• Design and forecasting the activity of unknown compounds.
• Miscellaneous aspects of CoMFA
• CoMFA Applications in Drug Design
• CoMSIA
• Conclusion
• References
The CoMFA methodology is a 3D quantitative structure-activity relationship
(QSAR) technique which ultimately allows one to design and predict activities of
molecules. The database of molecules with known properties, the training set,
is suitably aligned in 3D space according to one of various methodologies.
Superimposition techniques include those that maximize the steric overlap,
those that are based upon crystallographic data, those based upon a
pharmacophore theory, those employing a steric and electrostatic alignment
algorithm, those based upon automated field fit methods and those utilizing
pharmacophore mapping programs such as DISCO. Having arrived at whatever
one considers to be the alignment of choice, charges are then calculated for
each molecule at a level of theory deemed appropriate. One can now construct
the fields: steric and electrostatic fields are calculated for each molecule by
interaction with a probe atom at a series of grid points surrounding the aligned
database in 3D space. One then attempts to correlate these field energy terms
with a property of interest by the use of partial least squares (PLS) with
cross-validation, giving a measure of the predictive power of the model. So much
for the crude description of the technique. The question is: does it work, and do
the results mean anything? Perhaps that is a little harsh. Maybe the real question
we should be asking is whether this method provides us with information that
was not available or apparent by other techniques.
One of the most frequently debated topics was whether CoMFA told you
anything more than one may have gathered by simple examination of the data.
In other words, can a medicinal chemist using his knowledge and intuition make
"ball park" predictions of activity for designed compounds, based on the
structure and biological data of a series of analogues? In a congeneric series
where steric factors predominate the answer is obviously that one can. The
literature is replete with examples where the sole benefit of CoMFA was the
confirmation of a previously determined model. Can we look beyond this limited
information and reveal more significant insights in our model? By the very nature
of the technique, the most crucial step in this 3D approach is the relative
orientation of the test molecules in space. That is, the chosen alignment of the
compounds in the training set is going to have the most profound impact on the
predictive ability of the model. We have already noted the plethora of methods
available to us for structural superimposition. From a purely drug design
perspective, in those instances where one has no knowledge of the
three-dimensional shape of the receptor, a set of alignment rules based upon a
pharmacophore hypothesis will probably be the most valuable. Indeed, perhaps
the principal value of this methodology is the evaluation of such alignments
based upon the predictive power of the derived model. One may conclude that
the greater the predictive power of the model, the more the alignment reflects the
bioactive conformation of the molecules. As a simple example of such a
phenomenon, we were interested in a series of oxime analogues as cholinergic
agonists at the m1 receptor sub-type for the treatment of Alzheimer's disease.
The compounds concerned were active as both their syn and anti isomers,
although there was not a strict parallel in their respective SARs. The molecules
were originally modeled with their oxime-substituted side-chains orientated in
opposite positions in space. That is to say that the syn oximes had their
side-chains aligned to one side of the molecule and the anti had theirs placed on
the other side.
Interestingly, when all the analogues were combined into a single CoMFA model
no relationship could be found between the structure and biological activity. The
predictive value, or cross-validated r-square, was a negative number for this
model, indicating that any predictions made based on this alignment were no
better than simply taking the average of all the compounds' biological activities.
However, if one performed CoMFA on the isomeric pairs individually, a good
predictive model was found for one set of oximes, but not for the other. This led
us to the conclusion that instead of the oxime side-chains reaching out into
opposite points in 3D space, they were probably aligned to a common site.
Re-orientation of the side-chains in such a manner provided a CoMFA model with
high predictive power, thus validating our hypothesis.
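As a minimal sketch of what the cross-validated r-square measures, the following
computes a leave-one-out q2 as 1 - PRESS/SS. Ordinary least squares with an
intercept is used here only as a stand-in for the PLS step of CoMFA, and the
function name loo_q2 is illustrative rather than taken from any CoMFA package.

```python
import numpy as np

def loo_q2(X, y):
    """Leave-one-out cross-validated r-square (q2) for a linear model.

    X: (n_compounds, n_descriptors) array, y: (n_compounds,) activities.
    Ordinary least squares is an assumed stand-in for PLS.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    press = 0.0  # predictive residual sum of squares
    for i in range(n):
        keep = np.arange(n) != i
        # Fit on the n-1 remaining compounds
        A = np.column_stack([np.ones(keep.sum()), X[keep]])
        coef, *_ = np.linalg.lstsq(A, y[keep], rcond=None)
        # Predict the compound that was left out
        pred = np.concatenate(([1.0], X[i])) @ coef
        press += (y[i] - pred) ** 2
    ss = np.sum((y - y.mean()) ** 2)  # spread around the mean activity
    return 1.0 - press / ss
```

A q2 near 1 indicates that left-out compounds are predicted well. A q2 of zero
corresponds to predicting the mean activity for every compound, and a negative
value means the model predicts left-out compounds worse than that mean.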
CoMFA process: (1) the selected bioactive conformation of each compound is
superimposed; (2) the steric, electrostatic, and sometimes other fields are calculated with
various probe groups around the molecules; (3) the important features related to the
biological activity are extracted by partial least squares statistical analysis; (4) the results are
displayed in contour plots.
CoMFA is a 3D-QSAR technique employing both interactive graphics and statistical
techniques for correlating the shapes and properties of molecules with their biological
activity. A bioactive conformation of each compound is chosen, and the compounds are
superimposed in a manner defined by the supposed mode of interaction with the
target receptor. The molecules are then compared in three dimensions by means of
fields calculated around them with various probe groups.
The results are then displayed in contour plots showing the important regions in
three-dimensional space that are highly associated with the biological activity.
Steps considered in a CoMFA analysis:
1. Biological activity
2. Selection of compounds and series design
3. Generation of 3D structure of the ligand molecules
4. Conformational analysis of each molecule
5. Establishment of the bioactive conformation of each molecule
6. Binding mode and superimposition of the molecules
7. Position of the lattice points
8. Choice of force fields and calculations of the interaction energies
9. A statistical analysis of the data and the selection of the 3D-QSAR model
10. Display of the results in contour plots and interpretations of them
11. Design and forecasting the activity of unknown compounds.
1. Biological Data
Accurate biological data are essential for any QSAR technique. The
biological data must be obtained for a set of ligands using uniform protocols and
ideally from a single source. The range of biological activity covered should be as
wide as possible. Data pooled from different sources could have discrepancies
that may be misleading.
2. Selection of compounds and series design
The application of CoMFA is a process for optimizing the desired biological
activity while eliminating or reducing side effects through structural modifications.
When we modify the structure of a molecule, we simultaneously make a variety of
changes. The changes affect not only the desired potency of the drug, but also its
absorption, distribution, metabolism and excretion.
It is usually desirable to choose compounds such that their variables for
hydrophobic, steric and electrostatic properties behave independently. It is not easy
to make such a decision by inspection of the compounds. A serious problem in
classical QSAR is often the intercorrelation among different properties; it may be
less of a problem in 3D-QSAR. There are three major issues in choosing substituents
for the modification of compounds:
• Minimization of collinearity
• Maximization of variance and
• Mapping of substituent space with the smallest number of compounds.
3. Generation of three-dimensional structure of the ligand molecules
There are two aspects to be considered in the 3D structure of the ligand
molecule. The first is how to represent the molecular structure accurately in three
dimensions, for which two points need to be discussed: generation of the starting
structure and optimization of the starting structure. The second is how to determine
the bioactive conformation.
Both X-ray crystallography and molecular modelling can provide the starting
structures of the molecules. The computational methods for 3D structure
generation can be classified into manual, numeric, and automatic methods. In the
manual mode, the user constructs a 3D structure interactively on a 3D computer
graphics interface from scratch or from an existing 3D structure, including those in
various fragment libraries. A manually constructed 3D structure can be used as input
to generate many conformations, or can be further refined by several numerical
methods such as distance geometry, quantum mechanical or molecular mechanical
methods. When the structures of a relatively small number of compounds are desired,
as is the case for 3D-QSAR studies, this approach is usually used. The automatic
methods are often used for building a 3D-structure database. Although rapid
generation of approximate 3D structures is essential for building a 3D structure
database, it is usually not as critical for a 3D-QSAR study.
After the initial structure is generated, the geometry has to be refined.
Structure optimization procedures begin with a chemically reasonable 3D starting
structure and attempt to improve its quality by minimizing its conformational
energy by molecular mechanics or molecular orbital methods.
The main methods used in structure optimization are:
• Molecular mechanics
• Semi-empirical molecular orbital and
• ab initio molecular orbital methods.
Molecular mechanics methods are fast and accurate for molecules within the
range of the force field parameterization, and can handle very large molecules, such
as enzymes. Ab initio methods are more time consuming.
4. Conformational analysis of each molecule
Some small rigid molecules have only one low-energy conformation; larger and more
flexible molecules have several conformations populated at room or physiological
temperature.
The first conformational analysis method is the systematic search. Since greater
energy is needed to change a bond length or bond angle than a dihedral
angle, the major difference between various conformations of a molecule is generally
due to dihedral angles. Conformations are generated by systematically varying each
of the dihedral angles in the molecule by some increment whilst keeping the bond
lengths and bond angles fixed. Such a systematic search is called a 'grid search'.
Provided the increment for the torsion angle variation is appropriately small, this
method is an exhaustive search technique and will generate all the conformations
possible.
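The grid search can be sketched as an enumeration of torsion-angle
combinations. This is only an illustration under stated assumptions: the
function names are hypothetical, the increment is assumed to divide 360 evenly,
and the energy evaluation of each conformation is left out.

```python
import itertools

def grid_search_torsions(n_rotatable, increment_deg):
    """Yield every combination of dihedral angles on a regular grid.

    Bond lengths and bond angles are taken as fixed, so a conformation is
    described only by its tuple of torsion angles (in degrees).
    """
    angles = range(0, 360, increment_deg)
    return itertools.product(angles, repeat=n_rotatable)

def count_conformations(n_rotatable, increment_deg):
    """Number of grid conformations: (360 / increment) ** n_rotatable."""
    return (360 // increment_deg) ** n_rotatable

# Two rotatable bonds sampled every 120 degrees give 3 * 3 combinations
for torsions in grid_search_torsions(2, 120):
    print(torsions)
```

With a 30 degree increment, five rotatable bonds already give
count_conformations(5, 30) = 248832 candidate conformations.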
The disadvantage of this method is that the time required for a systematic search
increases exponentially with each additional rotatable bond, and the generation and
energy evaluation of the conformations can easily become impractical. The method
must also deal with the ring closure problem for cyclic structures, and its efficiency
is further limited for compounds with more than 20 dihedral angles. The second
conformational analysis method is molecular dynamics, a method of studying the
motions and the configurational space of a molecular system in which the time
evolution, or trajectory, of a molecule is described by the classical Newtonian
equations of motion.
In the third conformational analysis method, the dynamic behavior of a molecule is
simulated by random changes in the structure. The energy of the changed
configuration is calculated, and if the energy is lower than that of the previous
configuration, the new configuration is accepted. If the energy is higher, an
algorithm is used to determine whether the new configuration is to be accepted.
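The acceptance algorithm is not named in the text. A common choice for this kind
of random search is the Metropolis criterion, which is assumed in the sketch
below; the function name and the default temperature are illustrative.

```python
import math
import random

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K)

def metropolis_accept(e_old, e_new, temperature=300.0, rng=random):
    """Decide whether a randomly changed configuration is kept.

    Energies are in kcal/mol. A lower energy is always accepted; a higher
    energy is accepted with probability exp(-(e_new - e_old) / (k_B * T)).
    """
    if e_new <= e_old:
        return True
    boltzmann_factor = math.exp(-(e_new - e_old) / (K_B * temperature))
    return rng.random() < boltzmann_factor
```

Accepting some uphill moves is what lets the search leave a local energy
minimum instead of stopping at the first one it finds.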
5. Establishment of the bioactive conformation of each molecule
The bioactive conformation of a drug molecule refers to the conformation it adopts
when bound to its target receptor. When CoMFA analysis is performed on a series of
simple analogs, it may not be necessary to establish a bioactive conformation.
Experimentally, the bioactive conformation of a drug can be obtained from structure
determination of the drug-receptor complex by X-ray crystallography or NMR
spectroscopy. X-ray crystallography is the only known method for determining exactly
the three-dimensional shape of macromolecules. There are several disadvantages in
the X-ray crystallography approach:
1. The protein must crystallize, and the crystallizing medium is usually far
from physiological conditions. This may mean that the crystalline enzyme is
in an inactive conformation.
2. Data collection usually takes a long time, and this procedure produces a
time-averaged structure.
3. Distortion in the structure involving the active site may arise from the
packing of the enzyme in the crystal.
NMR spectroscopy is another way of obtaining 3D structural data of proteins,
and is in many ways complementary to X-ray crystallography.
Different approaches are used to establish the bioactive conformation of
each molecule:
a. Active analog approach
b. Ensemble distance geometry approach
c. Molecular-fitting technique
d. DISCO
a. Active analog approach:- This is one of the well-known techniques used to
establish the bioactive conformation of a set of compounds. The method
determines the allowed conformations of all molecules in the study by
systematic search and selects the conformations that satisfy the interatomic
distances in the working pharmacophore.
b. Ensemble distance geometry approach:- This method rapidly determines
whether any set of conformations exists, and generates a random set of
conformations that satisfy the distance constraints.
An advantage of the approach is that it handles cyclic structures without the
ring closure difficulty encountered in the systematic search method.
c. Molecular-fitting technique:- When a rigid compound that binds tightly to
the receptor is available, this method can be used to derive bioactive
conformations. This method is sometimes also called the template forcing
approach.
d. DISCO:- This is a computer program devised to find the bioactive conformation
of molecules. In order to find the bioactive conformation of each
compound, it searches for superposition rules that are satisfied by a set of
conformations for all active molecules, among the conformations that were
supplied to it.
6. Superimposition of the molecules
Once the bioactive conformations of all the compounds are determined, the
next step in CoMFA is to superimpose and align them in a grid box. This
is one of the most crucial steps in CoMFA, and the results of CoMFA analyses
depend on the alignment of the molecules. A complication in CoMFA is that
the selection of the bioactive conformation and the superimposition of the
molecules each influence the choice of the other; therefore the two aspects are
sometimes considered simultaneously.
During the alignment process, it is important to consider the common, difference,
or union volumes of the superimposed active and inactive molecules
separately, as well as together.
Several approaches have been used for the alignment of molecules:
a. Alignment based on atom overlapping
b. Alignment based on receptor binding sites
c. Alignments based on fields or pseudo fields
a. Alignment based on atom overlapping:- This is the most popular and
classic method for molecular alignment. Atoms are overlapped so as to give
the best matching of preselected atom positions. Sometimes the
alignment may be based on a common template.
This method, also called the pharmacophore approach, is powerful in detecting
dissimilarity between similar molecules. A disadvantage of the method is that
it requires corresponding atom pairs, and thus it cannot be applied
to molecules in which the corresponding atoms are difficult to select.
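Given corresponding atom pairs, the best-matching overlay can be found by a
least-squares rigid-body fit. The text does not say which fitting algorithm is
used; the sketch below assumes the Kabsch method, and the function name is
illustrative.

```python
import numpy as np

def kabsch_superimpose(mobile, reference):
    """Rigid-body fit of `mobile` onto `reference`.

    Both inputs are (N, 3) arrays in which row i of one molecule is the
    preselected partner of row i of the other. Returns the fitted
    coordinates of `mobile` and the RMSD over the matched atoms.
    """
    mobile = np.asarray(mobile, dtype=float)
    reference = np.asarray(reference, dtype=float)
    mob_center = mobile.mean(axis=0)
    ref_center = reference.mean(axis=0)
    P = mobile - mob_center
    Q = reference - ref_center
    # Optimal rotation from the SVD of the covariance matrix
    U, _, Vt = np.linalg.svd(P.T @ Q)
    # Guard against an improper rotation (mirror image)
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    fitted = P @ R.T + ref_center
    rmsd = np.sqrt(np.mean(np.sum((fitted - reference) ** 2, axis=1)))
    return fitted, rmsd
```

The need for matched rows in the two arrays is exactly the limitation noted
above: without a clear atom-to-atom correspondence this fit cannot be set up.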
b. Alignment based on receptor binding sites:- This method is similar to
alignment based on atom overlapping, but the points matched are in the receptor:
a receptor site can interact with a variety of orientations of the same functional
group with equal affinity.
Thus, in the binding-site approach, the molecules are superimposed by
overlapping the receptor binding-site points that interact with the ligands rather
than the pharmacophore groups.
c. Alignments based on fields or pseudo fields:- This alignment is
performed using the calculated energy fields instead of atoms or binding-site
points.
In other approaches, electrostatic similarity or molecular surface similarity
indices are used in the superposition.
7. Calculation of the interaction energies
When all the compounds are superimposed, they are located in a grid box for
calculating interaction energies with various probes at each lattice point.
The aspects to be considered are the position of the lattice points, the construction
of the data table with the interaction energy calculations, and the molecular force
fields:
a. Position of the lattice points
b. Construction of data table and interaction energy calculations
c. Molecular force fields
a. Position of the lattice points:- In order to place the lattice points around the
molecules, three aspects are of concern:
• The size of the grid spacing
• The size of the grid box
• The location of the grid box
The usual choice of grid spacing is 2 Å, and the grid box is about 3-4 Å
larger than the union surface of the molecules. Electrostatic interactions are long
range, and a larger grid box may be required.
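A lattice of this kind can be sketched as follows. As a simplifying assumption,
the box is built around the bounding box of the atom coordinates of all aligned
molecules rather than around their union surface; the function name and
argument names are illustrative.

```python
import numpy as np

def build_lattice(coords, spacing=2.0, margin=4.0):
    """Regular grid of lattice points around a set of aligned molecules.

    coords: (N, 3) atom coordinates of all superimposed molecules, in angstroms.
    spacing: grid spacing (2 angstroms is the usual choice named in the text).
    margin: how far the box extends beyond the outermost atoms.
    Returns an (M, 3) array of lattice point coordinates.
    """
    coords = np.asarray(coords, dtype=float)
    lower = coords.min(axis=0) - margin
    upper = coords.max(axis=0) + margin
    # Small tolerance so that the upper edge is included
    axes = [np.arange(lo, hi + 1e-9, spacing) for lo, hi in zip(lower, upper)]
    gx, gy, gz = np.meshgrid(*axes, indexing="ij")
    return np.column_stack([gx.ravel(), gy.ravel(), gz.ravel()])
```

Halving the spacing multiplies the number of lattice points, and therefore the
number of columns in the data table, by roughly eight.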
b. Construction of data table and interaction energy calculations:- Once the
molecules are aligned in a grid box, the next step of CoMFA is to set up a
data table as in classical QSAR. This includes the biological activity or
other properties that are to be correlated and the interaction energy fields.
c. Molecular force fields:- A force field is an empirical fit to the potential
energy surface. It defines the mathematical form of the equation involving the
coordinates and the parameters adjusted in the empirical fit of the potential
energy surface. A force field uses a combination of bond distances, bond
angles, torsion angles and interatomic distances to describe both bonded
interactions and the van der Waals and electrostatic interactions between atoms.
• Standard CoMFA Fields
The standard potential energy fields produced by the out-of-the-box
CoMFA program are steric (van der Waals) and electrostatic (Coulombic).
The standard CoMFA probe is an sp3 hybridized carbon atom with an
effective radius of 1.53 Å and a +1.0 charge. The probe atom to ligand
atom distance-dependence of the potential functions (i.e., the standard
6-12 terms of the Lennard-Jones potential and the r-square term of the Coulombic
potential) results in steep changes as the probe nears the surface of the
molecule. It is the convention to truncate steric values at some arbitrary level
(on the order of 4.0 to 30.0 kcal/mol) to eliminate points both within the van
der Waals shells of molecules and at the periphery of the region so that
effectively a shell of points is used. Electrostatic values are also truncated
at similar levels and most commonly ignored at points inside the molecules.
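The calculation of the two standard fields can be sketched as below. This is an
illustration under stated assumptions, not the SYBYL implementation: the
per-atom radii, well depths and charges are supplied by the caller, the probe
well depth is a placeholder value, and the combining rules (sum of radii,
geometric mean of well depths) are assumed.

```python
import numpy as np

COULOMB_K = 332.06  # converts e^2/angstrom to kcal/mol

def comfa_like_fields(grid, coords, radii, well_depths, charges,
                      probe_radius=1.53, probe_eps=0.107,
                      probe_charge=1.0, cutoff=30.0):
    """Steric (6-12 Lennard-Jones) and electrostatic (1/r^2) probe energies.

    grid: (G, 3) lattice points; coords: (N, 3) atom positions.
    radii, well_depths, charges: (N,) per-atom parameters (assumed inputs).
    probe_eps is a placeholder well depth, not a documented CoMFA value.
    Returns two (G,) arrays in kcal/mol, truncated at +/- cutoff.
    """
    grid = np.asarray(grid, dtype=float)
    coords = np.asarray(coords, dtype=float)
    d = np.linalg.norm(grid[:, None, :] - coords[None, :, :], axis=2)
    d = np.maximum(d, 1e-6)  # avoid division by zero on top of an atom
    r_min = np.asarray(radii, dtype=float) + probe_radius
    eps = np.sqrt(np.asarray(well_depths, dtype=float) * probe_eps)
    ratio6 = (r_min / d) ** 6
    steric = np.sum(eps * (ratio6 ** 2 - 2.0 * ratio6), axis=1)
    # r-square distance dependence, as described in the text
    electro = np.sum(COULOMB_K * probe_charge * np.asarray(charges, dtype=float) / d ** 2, axis=1)
    steric = np.minimum(steric, cutoff)          # truncate steep repulsion
    electro = np.clip(electro, -cutoff, cutoff)  # truncate at similar levels
    return steric, electro
```

The sketch only truncates the electrostatic values; dropping them altogether at
points inside the molecules, as mentioned above, would need an extra mask.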
• New CoMFA Fields
The relatively straightforward nature of the CoMFA paradigm makes it
potentially very powerful. While steric and electrostatic properties of molecules
are major physicochemical properties related to biological activity, they are
purely enthalpic. It is desirable in many cases to characterize additional
properties on a three-dimensional basis. Efforts to include entropic properties
within a CoMFA framework have sought to characterize the hydrophobic nature
of molecules. More recently, reactivity-based fields such as those of molecular
orbitals have also been imported into CoMFA studies. The type of field to be
generated and included in a CoMFA model is limited only by the creativity of
the researcher and the validity of the underlying theory.
It was the intention of the authors that this mini-review of Alternative CoMFA
fields be as complete as possible. If we have omitted work in the field, we wish
to apologize in advance.
It is, in fact, a rather simple matter to create a field. All that is actually
necessary is an atomistic parameter and some mathematical functional form
for the distance dependence for that parameter. Then the field is created by
summing the effect of all atoms on each grid point in the cage surrounding the
molecule. Some form of arbitrary cut-off can be imposed to eliminate or
truncate the contribution of grid points that are within van der Waals radii of
molecular atoms.
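The recipe in this paragraph (an atomistic parameter, a distance function, a sum
over atoms at each grid point, and an optional cut-off) can be sketched
generically. The function and argument names are illustrative, and setting
points inside a van der Waals radius to zero is only one possible cut-off.

```python
import numpy as np

def atomistic_field(grid, coords, atom_params, distance_fn, vdw_radii=None):
    """Build a field by summing a per-atom parameter over all atoms.

    grid: (G, 3) lattice points; coords: (N, 3) atom positions.
    atom_params: (N,) atomistic parameter (charge, hydrophobicity, ...).
    distance_fn: function of distance giving the decay of the parameter.
    vdw_radii: optional (N,) radii; grid points inside any of them are zeroed.
    """
    grid = np.asarray(grid, dtype=float)
    coords = np.asarray(coords, dtype=float)
    d = np.linalg.norm(grid[:, None, :] - coords[None, :, :], axis=2)
    d = np.maximum(d, 1e-6)
    field = np.sum(np.asarray(atom_params, dtype=float) * distance_fn(d), axis=1)
    if vdw_radii is not None:
        inside = np.any(d < np.asarray(vdw_radii, dtype=float), axis=1)
        field[inside] = 0.0  # arbitrary cut-off inside the molecule
    return field

# Example: a field decaying as the inverse cube of the distance
grid = np.array([[2.0, 0.0, 0.0], [0.5, 0.0, 0.0]])
atoms = np.array([[0.0, 0.0, 0.0]])
print(atomistic_field(grid, atoms, np.array([2.0]), lambda d: d ** -3,
                      vdw_radii=np.array([1.7])))
```

Swapping the parameter array or the distance function is all that is needed to
turn one kind of field into another, which is the point the paragraph makes.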
• Inclusion of the Hydrophobic Effect
Steric, electronic, and hydrophobic effects are considered to be among the
primary forces in ligand:receptor interactions. In classical (Hansch-type) QSAR
studies, these three forces are described using scalar descriptors such as Taft
(Es) parameters, Hammett (sigma) parameters, or partition coefficients (logP),
respectively. In 3D-QSAR studies, the steric and electronic effects are approximated
using molecular shape (Lennard-Jones) and charge distribution (Coulombic) potential
energy fields. The missing piece of the puzzle is, of course, a field to represent
the hydrophobic "binding" component. Various approaches have been
advocated for the description of this effect within 3D-QSAR studies. Among
these are the empirically-based hydropathic interaction (HINT) field, a large variety
of lipophilic potential fields, the molecular mechanically based H2O probe for
the description of hydrogen bonds, and the Poisson-Boltzmann finite difference
approach based calculation of desolvation energy fields.
• HINT Fields
One of the earliest efforts is the hydropathic interaction (HINT) technique of
Kellogg et al. The HINT formalism is strongly rooted in the C-LogP technique of
Hansch and Leo. Beginning with the fragment constants used in the computation
of the octanol:water partition coefficient (C-LogP), Abraham and Leo suggested
the further deconstruction of these values into atomic contributions to overall
molecular hydrophobicity. Wireko, Kellogg and Abraham demonstrated that such
values can be calculated and that a hydrophobicity field for a given molecule can
be computed. At each grid intersection point, the net sum of the following
empirical equation is evaluated over all the atoms for a given molecule:
In 1991 Kellogg, Semus and Abraham used the HINT field in a re-examination of
the classic steroid data set with mixed results. While the HINT field contributed
significantly to three-field (steric, electrostatic, HINT) models, the additional field
did not improve the statistical measures of the model. However, the authors
proposed that the additional field adds interpretability to CoMFA models by
being easy to understand in chemical (synthetic/drug design) terms. Later
studies of ryanodines, barbiturates, and other systems[9] have confirmed this
hypothesis. These fields have also been demonstrated to effectively model
experimentally-determined logP values in cases where calculated logP (CLog-P)
values fail - positional isomers, etc.
• Molecular Lipophilicity Potential (MLP) Fields.
Others, for example Norinder and Altomare et al., have used fields based on
Molecular Lipophilicity Potentials (MLP) that were described by Fauchere et al.
in 1988. These fields add lipophilic information through the use of atomistic
hydrophobic parameters as derived by a variety of researchers.
• H-bonding Fields.
Kim has reported the use of the direction-dependent 6-4 function of the
GRID program to generate hydrogen bonding fields as descriptors of the
hydrophobic interactions. Specifically, rather than using a raw atom as a
molecular probe, Kim uses a neutral H2O "molecule" with an effective radius of
1.7 Å. Two hydrogen bond donating and two hydrogen bond accepting
properties are assigned to the probe. The probe is allowed to freely rotate
about the grid point in order to optimize the interaction as computed using the
GRID function. The hydrogen bonding potential energy is computed at each
lattice intersection according to the following function:
$$E_{hb} = \left( \frac{C}{d^{6}} - \frac{D}{d^{4}} \right) \cos^{m}\theta$$
where C and D are taken from tables, θ is the angle described by the trio
of donor, hydrogen and acceptor atoms, and m is the exponent applied to the
cosine term.
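The 6-4 function can be evaluated directly, as in the sketch below. It reads the
angular term as the cosine of the donor-hydrogen-acceptor angle raised to a
power m, which is an assumption about the garbled source equation; the default
m = 4 and the numerical values of C and D in the example are placeholders, not
tabulated GRID parameters.

```python
import math

def hbond_energy(d, theta_deg, C, D, m=4):
    """Direction-dependent 6-4 hydrogen bond energy.

    d: separation used in the 6-4 term (angstroms).
    theta_deg: angle at the hydrogen between donor and acceptor (degrees).
    C, D: tabulated coefficients (placeholders in this sketch).
    m: exponent of the cosine term (assumed default).
    """
    radial = C / d ** 6 - D / d ** 4
    angular = math.cos(math.radians(theta_deg)) ** m
    return radial * angular

# Linear arrangement (180 degrees) keeps the full radial energy
print(hbond_energy(2.0, 180.0, C=64.0, D=32.0))
```

With an even exponent the angular factor is 1 for a linear arrangement and falls
to zero as the angle approaches 90 degrees, which is what makes the function
direction-dependent.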
This approach has been successfully applied to model the hydrophobic effect
of substituents on the aromatic ring of several series of compounds with
respect to alterations in pharmacodynamic as well as chemical equilibrium
constants. In the three cases presented by Kim, the 3D-QSAR models were
consistently more statistically robust than the corresponding classical QSAR
equations based on {pi} substituent constants.
The GRID-based H2O probe has also been used in conjunction with GRID-based
steric (CH3, 1.95 Å radius, 0.0 charge) and electrostatic (H+, 0.0 Å
radius, 1.0 charge) probes to model the receptor binding affinities of a series of
benzodiazepines. In this particular case, a significant correlation (r > 90%) was
found to exist between the steric and hydrophobic fields. The electrostatic and
hydrophobic fields were significantly less collinear (r < 70%) and were included
in the model which best described the binding data. The resulting GRID-
CoMFA model indicated that hydrophobic fields explained 78% of the variance
in the binding data. The electrostatic field accounted for 18%. A standard
CoMFA study using standard probes and steric and electrostatic potential
functions performed on this same data set yielded qualitatively similar results in
that the model based on steric fields alone was found to be the most
descriptive of the data. The statistical significance of the GRID-CoMFA model
suggests that hydrophobicity information is crucial in this particular case and
that the SYBYL-based (Tripos, Inc.) sp3 carbon probe is not sufficient to
describe these effects.
• Desolvation Energy Fields.
An exercise (some might say, in futility) was performed to compute the
desolvation free energy fields as a function of hydrophobicity. This was
accomplished using the finite difference approximation method as implemented
in the Delphi program (Biosym-MSI). In Delphi, the linearized Poisson-
Boltzmann equation is numerically solved to compute the electrostatic
contribution to solvation on a regularly-spaced field of points constructed
around a given molecule. It is this feature which ideally suits the results of
Delphi computation for inclusion into a 3D-QSAR model. Desolvation energy
fields are computed as the difference between the solvated (grid dielectric =
80) and in vacuo (grid dielectric = 1) field calculations.
In preliminary studies using inhibitors of angiotensin-converting enzyme (ACE)
and thermolysin, the desolvation energy fields did not successfully model the
hydrophobicities nor the reported binding affinities of training set molecules. It
was interesting, although not totally surprising, that in both of the above
studies, the desolvation energy fields were found to be highly collinear with the
SYBYL generated Coulombic electrostatic potential fields (r > 90%). The Delphi
technique does provide for the generation of mixed desolvated/solvated energy
fields. In structure-based 3D-QSAR studies (i.e., where the target is known), it
may be possible to compute the energy afforded by partial desolvation of the
ligand upon complexation with the target.
• Molecular Orbital Fields.
In certain instances, a simple Coulomb type field may not be adequate to
represent the electronic characteristics of molecules. This is illustrated by
cases which attempt to model endpoints in which an ionic or charge-transfer
reaction is part of the ligand:target interaction. In these cases, the
three-dimensional characteristics (i.e., size and localization on/around a molecule)
of molecular orbital fields have proven to be useful descriptors. As with the other
fields generated external to the SYBYL-CoMFA program, it is possible to import
these fields into a CoMFA framework. In the case of molecular orbital fields,
molecules with their CoMFA alignment referenced are subjected to
semiempirical MOPAC single-point (keyword: 1SCF) calculations. A selected
orbital (i.e., HOMO or LUMO) for a given molecule is then imported into the
CoMFA defined region, and the electron density at the lattice intersections in
the region is extracted and recorded in the QSAR table as an electrostatic type
field.
HOMO fields have been shown to be beneficial for the refinement of 3D-QSAR
models for data sets such as the Angiotensin-Converting Enzyme set in order
to more completely describe the interaction between the ionized ligand and the
metal in the molecular binding domain. More recently, molecular orbital fields
have been used in the construction of 3D-QSAR models for molecular reactivity
endpoints. Roy Vaz has examined the sigma (induction and resonance)
constants of amines (NH2-X) with CoMFA fields comprised of total electron
density (calculated with AM1/MOPAC5) and obtained excellent one and two
component PLS models. Vaz has also looked at OH radical formation rates for
substituted phenols and naphthalenes with the electron density fields and
obtained meaningful CoMFA models.
• Electrotopological State Fields
Kellogg, Kier and Hall have recently created a 3-D field from the atomistic
Electrotopological State parameter of Kier and Hall. This parameter, which
represents a contraction of free valence (electronegativity) along with
topological information, is totally non-empirical. The "distance function" for the
field decay of the E-State was chosen through multiple CoMFA runs to be
inverse r-cubed. The E-State fields provide remarkably good statistical results
in PLS, comparing quite favorably (q2 = 0.803, 3 components) to the "standard"
CoMFA steric and electrostatic fields.
8. Pretreatment of data
When the data table is constructed, it is analysed with statistical methods such as
partial least squares (PLS) to extract important features related to the biological
activity. Before performing the data analysis, the data are pretreated in the
following ways:
a. Reduction of the data.
• Reduction by standard deviation.
• Reduction by energy cut-off.
• Reduction in electrostatic contribution.
• Reduction by variable selection.
b. Scaling of the data.
• Auto scaling to unit variance.
• Block-scaling to constant group variance.
• Block-adjusted scaling.
c. Centering of the data.
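Three of the listed steps (reduction by standard deviation, centering, and
block-scaling to constant group variance) can be sketched as follows. The
function name, the threshold value and the choice of unit total variance per
block are assumptions made for illustration.

```python
import numpy as np

def pretreat_fields(blocks, min_sd=0.1):
    """Reduce, center and block-scale a list of field blocks.

    blocks: list of (n_compounds, n_points) arrays, one per field type
            (for example steric and electrostatic).
    min_sd: columns whose standard deviation is below this are dropped
            (illustrative threshold).
    Returns one (n_compounds, n_kept_columns) data table.
    """
    treated = []
    for block in blocks:
        block = np.asarray(block, dtype=float)
        sd = block.std(axis=0, ddof=1)
        kept = block[:, sd >= min_sd]         # reduction by standard deviation
        centered = kept - kept.mean(axis=0)   # centering
        total_var = centered.var(axis=0, ddof=1).sum()
        if total_var > 0:
            # Block-scaling: every field block gets the same total variance
            centered = centered / np.sqrt(total_var)
        treated.append(centered)
    return np.hstack(treated)
```

Block-scaling keeps a field with many or large-valued columns from dominating
the analysis simply because of its size.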
9. Statistical analysis of the data and the selection of 3D-QSAR model.
a. Partial least-squares (PLS) analysis
Partial Least Squares (PLS) regression technique is especially useful in quite
common case where the number of descriptors (independent variables) is
comparable to or greater than the number of compounds (data points) and/or
18
there exist other factors leading to correlations between variables. In this case
the solution of classical least squares problem does not exist or is unstable and
unreliable. On the other hand, PLS approach leads to stable, correct and highly
predictive models even for correlated descriptors.
Partial least squares (PLS) is sometimes called "Projection to Latent Structures"
because of its general strategy. The X variables (the predictors) are reduced to
principal components, as are the Y variables (the dependents). The components
of X are used to predict the scores on the Y components, and the predicted Y
component scores are used to predict the actual values of the Y variables. In
constructing the principal components of X, the PLS algorithm iteratively
maximizes the strength of the relation of successive pairs of X and Y component
scores by maximizing the covariance of each X-score with the Y variables. This
strategy means that while the original X variables may be multicollinear, the X
components used to predict Y will be orthogonal. Also, the X variables may have
missing values, but there will be a computed score for every case on every X
component. Finally, since only a few components (often two or three) will be used
in predictions, PLS coefficients may be computed even when there may have
been more original X variables than observations (though more cases are
recommended). In contrast, any of these three conditions (multicollinearity,
missing values, and too few cases in relation to variables) may well render
traditional OLS regression estimates unreliable (and estimates by other
procedures in the general and generalized linear model families).
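The strategy described above can be illustrated with a minimal single-response PLS sketch based on NIPALS-style deflation. It is an assumption-laden teaching example, not the implementation used by any particular package: the function names and the tiny data set are hypothetical, X and y are assumed to be centered beforehand, and missing values are not handled.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def fit_pls1(X, y, n_components):
    """Fit a single-response PLS model on centered X and y.

    Returns a list of (weights, loadings, y_loading) tuples, one per component.
    """
    X = [row[:] for row in X]   # work on copies; the inputs stay untouched
    y = y[:]
    n_vars = len(X[0])
    components = []
    for _ in range(n_components):
        # Weight vector: direction in X-space with maximal covariance with y.
        w = [dot([row[j] for row in X], y) for j in range(n_vars)]
        norm = dot(w, w) ** 0.5
        if norm < 1e-12:
            break               # nothing left in X that co-varies with y
        w = [v / norm for v in w]
        t = [dot(row, w) for row in X]                                   # X-scores
        tt = dot(t, t)
        p = [dot([row[j] for row in X], t) / tt for j in range(n_vars)]  # X-loadings
        q = dot(y, t) / tt                                               # y on scores
        # Deflate: remove what this component explains from X and y.
        X = [[x - ti * pj for x, pj in zip(row, p)] for row, ti in zip(X, t)]
        y = [yi - q * ti for yi, ti in zip(y, t)]
        components.append((w, p, q))
    return components

def predict_pls1(components, x):
    """Predict the centered response for one centered descriptor row."""
    x = x[:]
    y_hat = 0.0
    for w, p, q in components:
        t = dot(x, w)
        y_hat += q * t
        x = [xi - t * pj for xi, pj in zip(x, p)]
    return y_hat

# Hypothetical centered data: 3 compounds, 4 descriptors (more variables than
# compounds; the second and third columns are multiples of the first).
X = [[-1.0, -2.0, 1.0, 0.5],
     [0.0, 0.0, 0.0, -1.0],
     [1.0, 2.0, -1.0, 0.5]]
y = [-1.0, 0.0, 1.0]
model = fit_pls1(X, y, n_components=1)
predictions = [predict_pls1(model, row) for row in X]
print(predictions)
```

Ordinary least squares has no unique solution for this table, whereas a single latent component is enough here because the activity follows the collinear block of descriptors.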
Partial least squares (PLS) regression/path analysis is thus an alternative to OLS
regression, canonical correlation, or structural equation modeling (SEM) for
analysis of systems of independent and response variables. In fact, PLS is
sometimes called "component-based SEM," in contrast to the usual covariance-
based structural equation modeling. PLS is a predictive technique which can
handle many independent variables, even when predictors display
multicollinearity. Like canonical correlation or multivariate GLM, it can also relate
the set of independent variables to a set of multiple dependent (response)
variables. However, PLS is less than satisfactory as an explanatory technique
because it is low in power to filter out variables of minor causal importance.
The advantages of PLS include ability to model multiple dependents as well as
multiple independents; ability to handle multicollinearity among the independents;
robustness in the face of data noise and missing data; and creating independent
latents directly on the basis of crossproducts involving the response variable(s),
making for stronger predictions. Disadvantages of PLS include the greater difficulty of
interpreting the loadings of the independent latent variables (which are based on
cross-product relations with the response variables, not, as in common
factor analysis, on covariances among the manifest independents) and the fact that,
because the distributional properties of the estimates are not known, the researcher
cannot assess significance except through bootstrap induction. Overall, the mix of
advantages and disadvantages means PLS is favored as a predictive technique
and not as an interpretive technique, except for exploratory analysis as a prelude
to an interpretive technique such as multiple linear regression or covariance-based
structural equation modeling.
Though developed by Herman Wold for econometrics, PLS first gained popularity
in chemometric research and later industrial applications. It has since spread to
research in education, marketing, and the social sciences.
PLS may be implemented as a regression model, predicting one or more
dependents from a set of one or more independents; or it can be implemented as
a path model, akin to structural equation modeling. PLS is implemented as a
regression model by SPSS and by SAS's PROC PLS. SmartPLS is the most
prevalent implementation as a path model.
b. Validation of CoMFA
The most important criterion for selecting a CoMFA model is how well the
model can predict the activity of compounds outside the model, rather
than how well it reproduces the biological activity of the compounds included in
the model.
Various approaches are used for this purpose.
 Cross-validation.
 Bootstrapping.
 Random change of the dependent variable value.
 Dividing the original set into the training set and test set.
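Two of the approaches listed above, leave-one-out cross-validation and random change of the dependent variable, can be sketched as follows. To keep the example short, a one-descriptor least-squares line stands in for the PLS model; this stand-in, the function names and the data are all assumptions made for illustration only.

```python
import random

def fit_line(xs, ys):
    """Stand-in model: ordinary least-squares line through (x, y) pairs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: my + slope * (x - mx)

def loo_q2(xs, ys, fit=fit_line):
    """Leave-one-out cross-validated r2 (q2) = 1 - PRESS / SS."""
    press = 0.0
    for i in range(len(xs)):
        # Refit without compound i, then predict the compound that was left out.
        model = fit(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - model(xs[i])) ** 2
    mean_y = sum(ys) / len(ys)
    ss = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - press / ss

def scrambled_q2(xs, ys, n_trials=20, seed=0):
    """Random change of the dependent variable: q2 after shuffling activities."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_trials):
        shuffled = ys[:]
        rng.shuffle(shuffled)
        results.append(loo_q2(xs, shuffled))
    return results

# Hypothetical descriptor values and activities (for example pIC50).
descriptor = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
activity = [5.1, 5.4, 6.2, 6.3, 7.1, 7.4, 7.9, 8.6]
q2 = loo_q2(descriptor, activity)
q2_random = scrambled_q2(descriptor, activity)
print(q2, sum(q2_random) / len(q2_random))
```

A model whose q2 stays high after the activities have been shuffled is probably a chance correlation; a real relationship should give a much lower average q2 for the scrambled data than for the original data.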
c. Derivation of 3D-QSAR model
After the optimum number of components has been chosen from the cross-validation
test, the CoMFA model is derived from all compounds and the optimum
number of components. For this model, R2 and s are calculated in the same
way as R2cv and scv, except that PRESS is replaced by the sum of the squares
of the differences between the calculated and the observed biological activities.
If multiple fields are considered to be important for biological potency, they
can be examined simultaneously, or individually one after another. The results
from three different approaches in considering multiple fields were recently
explored.
 Chance correlation
 Collinearity
 Number of compounds
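The statistics named in the derivation step above are not written out in the text. Under the conventional definitions, and assuming n compounds, c components, observed activities y_i with mean \bar{y}, fitted values \hat{y}_i and leave-one-out predictions \hat{y}_{(i)} (all symbols are introduced here for illustration), they can be written as:

```latex
\begin{align*}
\mathrm{PRESS} &= \sum_{i=1}^{n} \bigl(y_i - \hat{y}_{(i)}\bigr)^2, &
\mathrm{SS} &= \sum_{i=1}^{n} \bigl(y_i - \bar{y}\bigr)^2, \\
R^2_{cv} &= 1 - \frac{\mathrm{PRESS}}{\mathrm{SS}}, &
s_{cv} &= \sqrt{\frac{\mathrm{PRESS}}{n - c - 1}}, \\
R^2 &= 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\mathrm{SS}}, &
s &= \sqrt{\frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n - c - 1}}.
\end{align*}
```

The only difference between the two pairs is whether the prediction for compound i comes from a model built without that compound (cross-validation) or from the final model built on all compounds.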
d. Outlier detection
Outliers and other inhomogeneities can easily be detected from the deviations
in the calculated values of the model or by examining the corresponding
residual plot. They can also be detected by inspecting the residuals from the
cross-validation test.
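One simple way to screen the residuals is to flag compounds whose residual is large compared with the root-mean-square residual of the whole set. The sketch below assumes a threshold of two RMS units; the threshold, the function name and the activity values are hypothetical, and a flagged compound is only a candidate for closer inspection.

```python
def flag_outliers(observed, calculated, threshold=2.0):
    """Return indices of compounds whose |residual| exceeds threshold * RMS."""
    residuals = [o - c for o, c in zip(observed, calculated)]
    rms = (sum(r * r for r in residuals) / len(residuals)) ** 0.5
    if rms == 0.0:
        return []  # perfect fit, nothing to flag
    return [i for i, r in enumerate(residuals) if abs(r) > threshold * rms]

# Hypothetical observed and calculated activities for six compounds.
observed = [5.1, 6.0, 6.8, 7.5, 8.1, 6.2]
calculated = [5.0, 6.1, 6.9, 7.4, 8.0, 7.9]
suspects = flag_outliers(observed, calculated)
print(suspects)
```

The same function can be applied to the cross-validated predictions instead of the calculated values.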
e. Pitfalls
There are several pitfalls that require careful attention in QSAR. One particular
pitfall in PLS analysis is that there is no guarantee that PLS will find all the
relations contained in the data.
f. Consideration of all compounds and omission of compounds
Unless compounds are purposely left out as a test set, all compounds of the
original data set should be used in deriving the model. The omission of
compounds is usually a negative statement on the quality of the correlation or
biological data.
10. Display of the results in contour plots and their interpretation
Several types of fields are retrievable from a CoMFA analysis. The coefficient
contour plots are the most often examined but sometimes PLS plots provide
useful information.
11. Design and forecasting the activity of unknown compounds
12. Miscellaneous aspects of CoMFA

 QSAR validation of CoMFA methodology.
 Non-linear relationships in CoMFA.
 Indicator variable in CoMFA.
 Hydrophobic effects and drug transport, distribution, and elimination.
 Limitation in CoMFA.
 Multiple or alternate binding mode.
 Checklist for CoMFA publication.
CoMFA APPLICATIONS IN DRUG DESIGN
 There are now a few hundred practical applications of CoMFA in drug design.
Most applications are in the field of ligand-protein interactions, describing
affinity or inhibition constants. In addition, CoMFA has been used to correlate
steric and electronic parameters.
 The application of CoMFA to in vivo data seems less appropriate, even if
lipophilicity is considered as an additional parameter.
 Develop quantitative structure-activity relationships
 Predict the properties and activities of untested molecules
 Compare different QSAR models statistically and visually
 Optimize the properties of a lead compound
 Validate models of receptor binding sites
 Generate hypotheses about the characteristics of a receptor binding site
 Prioritize compounds for synthesis or screening
 Determine key structural requirements for high-affinity receptor ligands.
CoMSIA
The CoMSIA technique was introduced by Klebe in 1994. In it, similarity
indices are calculated at the points of a regularly spaced grid for pre-aligned
molecules. It has several advantages over the CoMFA technique, such as greater robustness
regarding both region shifts and small shifts within the alignments, no application of
arbitrary cut-offs, and more intuitively interpretable contour maps. The standard
settings (probe with charge +1, radius 1 Å, hydrophobicity +1, hydrogen-bond
donating +1, hydrogen-bond accepting +1, attenuation factor α of 0.3 and grid
spacing 2 Å) were used in CoMSIA to calculate five different fields, viz. steric,
electrostatic, hydrophobic, acceptor and donor.
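A similarity index of this kind can be sketched as a Gaussian-weighted sum over the atoms of a molecule. The form used below, index = -sum(w_probe * w_atom * exp(-α r²)), is the one commonly quoted for CoMSIA and is taken here as an assumption, as are the function name, the coordinates and the partial charges.

```python
import math

def similarity_index(grid_point, atoms, probe_value=1.0, alpha=0.3):
    """Gaussian-weighted similarity index of one molecule at one grid point.

    atoms is a list of ((x, y, z), property_value) pairs; alpha is the
    attenuation factor and probe_value the probe property (+1 by default).
    """
    total = 0.0
    for position, value in atoms:
        r2 = sum((a - b) ** 2 for a, b in zip(position, grid_point))
        total += probe_value * value * math.exp(-alpha * r2)
    return -total

# Hypothetical two-atom fragment with made-up partial charges.
atoms = [((0.0, 0.0, 0.0), -0.4), ((1.2, 0.0, 0.0), 0.4)]
on_atom = similarity_index((0.0, 0.0, 0.0), atoms)
far_away = similarity_index((10.0, 0.0, 0.0), atoms)
print(on_atom, far_away)
```

Because the Gaussian stays finite at zero distance, the index can be evaluated even at a grid point that coincides with an atom, which is why no energy cut-off is needed.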
Partial least squares analysis
PLS is used to correlate thrombin receptor antagonistic activity with the CoMFA and
CoMSIA values containing the magnitude of steric and electrostatic potentials. The
models were assessed by their cross-validated r2 (q2) using the leave-one-out (LOO)
procedure with the SAMPLS method as implemented in SYBYL. CoMFA standard scaling was
applied to all the CoMFA analyses. The full PLS analysis was run with a column
filtering of 2.0 kcal/mol to reduce the noise and to speed up the calculation. For
CoMSIA too, the SAMPLS method was used; thereafter a full PLS was run using column
filtering of 2 kcal/mol. Autoscaling was applied to all CoMSIA analyses.
Conclusion
The 3D QSAR studies carried out using CoMFA and CoMSIA have led to the
identification of the regions important for steric, hydrophobic and electronic
interactions and the derived models well explain the observed variance in the activity
and also provide important insight into structural variations that can lead to the
design of NCEs with high activity.
References
1. Dean, P.M., Molecular Similarity in Drug Design, 1st Edition, pp. 291-323.
2. Waller, C.L., Marshall, G.R., J. Med. Chem. 1993, 36, 2390-2403.
3. Klebe, G., Abraham, U., Mietzner, T., J. Med. Chem. 1994, 37, 4130-4146.
4. Goodford, P.J., J. Med. Chem. 1985, 28, 849-858.
5. Dixit, A. et al., Bioorg. Med. Chem. 2004, 12, 3591-3598.
6. Wold, S. (1994) PLS for Multivariate Linear Modeling. QSAR: Chemometric
Methods in Molecular Design. Methods and Principles in Medicinal Chemistry,
van de Waterbeemd, H. (Editor), Verlag-Chemie.
7. http://www.vcclab.org/lab/pls/m_description.html.
8. http://faculty.chass.ncsu.edu/garson/PA765/pls.html.
9. http://www.netsci.org/Science/Compchem/feature11.html.