Vous êtes sur la page 1sur 8

Rigid and Flexible Docking Methods

This page describes the problem of rigid molecule docking. We describe two variants of
the problem (ligand-protein and protein-protein docking), and various criteria for "good"
docking. We then describe the DOCK algorithm developed by Kuntz et al, and the basic
principles of the soft docking algorithm by Jiang & Kim and Katchalski-Katzir's docking
algorithm.
Rigid docking
Ligand protein docking
Protein protein docking
The DOCK algorithm
Soft (Flexible) docking

Rigid Docking
In the rigid molecule docking problem we will relate to the molecules as rigid
objects that cannot change their spatial shape during the docking process.
In molecular biology, there are two main problems where the docking problem
arises: The ligand-protein docking and the protein-protein docking problems.
Back to Index.
Ligand-Protein Docking:
This problems involves a large molecule (the protein - also
called the 'receptor') and a small molecule (the ligand) and
is very useful in developing medicines. A common
situation is the `key in lock` situation when the ligand is
docking in a cavity of the protein.

Back to Index.
Protein-Protein Docking:
This problem involves two proteins that are approximately
the same size. Therefore, usually the docking site is a
more "planar" surface than in the ligand-protein docking
and cases where the docking occurs when one molecule is
located inside a cavity in the other molecule, are very rare.
There are several physical and chemical forces that interact between the two
molecules. These forces are used to define various docking scores that measure
how good is each solution. These scores take into account the strength of these
forces and the plausibility of the docking solution.
The most significant forces are:

Electrical forces:

The most significant force that draws parts of the molecules closer
together ot further apart according to their electrical charge.
Van Der Vaals forces:
This force is very significant when the molecules are close and their
contact surface is large.
The formula for this force is A/(r6) - B/(r12) where A and B are constants
and r is the distance between them. Notice that when the distance r is
very small there's a significant rejection force driving them apart.
Hydrogen bonds:
These bonds could significantly strengthen the bond between two
molecules and occurs when one molecule has a Hydrogen atom close to
the docking surface that interacts with an atom from the second molecule
when the docking occurs.
Figure 1 - When the docking
of the molecules is
achieved, a Hydrogen bond
occurs between the Hydrogen
atom of the donor and
another atom (in this case,
an Oxygen atom) in the
acceptor.

The docking problems may be formally stated as follows:


Let A,B be two rigid molecules with their geometric representation in R3. We
would like to find a rigid transformation T:R3-->R3 such that the contact
surface between T.A and B is maximal.
The contact surface is defined as the surface where the distance between the
molecules is smaller than a given threshold.
Notice that we shall usually try to achieve a contact surface which is "large
enough" instead of "maximal" and that we usually try to maximize not the size
of the contact surface but a score measuring the quality of the proposed
docking solutions. These two parameters are correlated but are not
equivalent. Back to Index.

The DOCK algorithm (Kuntz et al.)


The basic version of this algorithm was developed by Kuntz et al. at the
university of california over 15 years ago and, despite its age, is still considered
to be a successful method for solving the rigid docking problem and is used as
a reference point to other algorithms.
The algorithm is rather slow and thus has difficulties in handling large surface.
Therefore it is primarily used in solving ligand-protein docking while trying to
focus on 'interesting' sites on the surface of the molecules.

The basic stages of the algorithm are:

1. Compute the molecular surface using Connolly's method. The output of this
stage is a set of points on the 'smoothed' molecular surface with their normals.
2. SPHGEN (Sphere Generator) - This stage creates a new representation
of the molecular surface of the protein and the ligand using 'pseudo-
atoms' (see below) and then uses this representation to find plausible
docking sites on the molecular surface - these docking sites that
SPHGEN is looking for are cavities in the surface of the receptor. The
rest of the algorithm focuses on these sites only.

This stage is crucial to the success of the algorithm and will be explained
in more detail below.

SPHGEN consists of the following stages:


(I) For each pair on Connolly points pi,pja sphere passing through this
pair is placed such that its center is on one of the points
normal (see Fig.2).
Remember that each pi is a point on the surface of a specific atom of one
of the molecules.
We now define Snorm(i) = {Spheres that their center is on the normal
of pi}
(II) Assuming there are n Connolly points, then for
each 1<=i<=n and pi is on the surface of the receptor, we throw away all
the spheres in Snorm(i) and leave only the one with the smallest radius.
This throws all the spheres that penetrate the surface of the receptor.
(III) Leave only the spheres where Theta < 90o (see Fig.2). Otherwise the
points that define the sphere, pi and pj, are too close to each other and
therefore are not located in a cavity on the receptor's surface.
(IV) For each atom, leave only the sphere with the maximal radius. This
step leaves only the spheres that 'touch' the surface of the atom.
(V) If the points that define the sphere, pi and pj, belong to two different
atoms, and the distance between these atoms on the molecular sequence
is less than 4 - discard the sphere. This is done because the length of a
curve on an Alpha-helix is 3.6 and we wish not to treat these sites as
possible docking sites.

The remaining spheres are called pseudo-atoms.


The next stage looks for clusters of intersecting pseudo-atoms. The
existence of this kind of cluster indicates the existence of a cavity in the
molecular surface, which is considered to be a good docking site.
Figure 2 - Two points pi and pj on the
Connolly surface with their normals (in red).
The sphere in this figure is
in Snorm(i) because its center is onpi's
normal. The angle Theta is marked in blue.

After all this is performed on the receptor the same is done on the ligand,
but this time we take the points and the vectors opposite to their normals,
in order to create the spheres inside the surface instead of outside the
surface.
The result of SPHGEN on the receptor is sometimes called the 'negative
image' and on the ligand it's called the 'positive image'.

3. Matching - For each docking site, the algorithm tries to find a


transformation T that would give a good correspondence between the centers of the
pseudo-atoms of the protein to those of the ligand (in some cases, the centers of the
real atoms of the ligand are used, instead of the centers of its pseudo-atoms).

In some versions of DOCK, clusters of pseudo-atoms are separated into


sub-clusters in order to improve the complexity of this stage. One way of
doing this is to discard the largest sphere in the cluster, which sometimes
causes the cluster to be divided into two sub-
clusters (see Fig.3).
Figure 3 - An example of a cluster of pseudo-
atoms outside the receptor's Connoly surface.
If we will remove the largest pseudo-atom
(here in the middle of the cluster) we would
have two sub-clusters of pseudo-atoms

The matching problem


In the matching problem of rigid docking we are trying to find a
translation and rotation of one molecule, such that good matching
between the interesting points in both molecules is formed.

In Kuntz's group the distances between the point used instead of the
points locations: For each molecule, the receptor (R) and the ligand (L),
an apropriate distance matrix is defined - dRi,j and dLi,j, respectively.
They then try to find two subsets in R and L such that their distances
will be the same, with some tolerance of error. These two subsets define
two subgraphs with almost similar distances between their vertices.
This is done using a method similar to the interpretation tree by
Grimson and Lozano-Perez:
The root has several mathcing candidates and as we go down in the tree
we should check that the distances on the path comply with the required
tolerance of error. In order to have a more efficient algorithm, Kuntz's
group implementation trims off paths where the matching is not good
enough, but this implementation become sensitive to the order of the
points in the tree, therefore the algorithm could have an exponential run-
time.

Another way of solving the matching problem is to find a "large enough"


clique in a matching graph. If we have n points in the receptor
and m point in the ligand, the matching graph has n*mvertices where
each vertex represents a point from the receptor and a point from the
ligand.
Let G=(V,E) be the mathcing graph and let u,v be vertices
in V where u represents uL and uR (points in the ligand and the receptor,
respectively) and v represents vL and vR . An edge e=(u,v)will be added
to E only when ABS [dL(uL,vL) - dR(uR,vR)] < tolerance. Therefore, a
clique in the matching graph defines subsets of points in the ligand and
the receptor with similar distances.

Evaluating the match


In order to evaluate the quality of the match a score is calculated. The
score should take into account the size of the contact surface between the
molecules and should not allow one molecule to penetrate the other.
The DOCK algorithm uses a cubic grid that fills the binding site and
every cell in this grid has a score according to its distance from the
centers of the receptor's atoms:

 1 if the distance is 2.8A - 4.5A.


 -127 if the distance is less than 2.8A.
 0 if the distance is more than 4.5A.

(In some cases the distance 2.8A is replaced by 2.4A)


For each proposed transformation, the poisition of the ligand's points
(i.e. the centers of its atoms or pseudo-atoms) in the grid are calculated
and the score is calculated as the sum of scores of these points.

Additional scores were used in several versions of the algorithm - for


example, in one version a Van der-Vaals energy score was calculated to
the transformations that had a good matching scores.
There were also attemps to refin the matching using molecular
dynamics, but this has not proved as a good method until now. A
possible reason is that during the docking process, several atoms in both
molecules change their position slightly.

Back to Index.

Soft (flexible) Docking


We shall now descibe shortly the principles of two soft docking
algorithms.
Jiang & Kim
An enumeration on the 6-dimensional space of rigid transformation is
performed and these transformations are given score according the their
energetic value. Both molecules are placed on a grid and the matching is
evaluated using the distances between grid cells, the number of
penetrations and the directions of the points' normals. The algorithm
works on the output of Connolly's algorithm and works on the entire
molecular surface (i.e. no cavities are looked for - as opposed to the
DOCK algorithm).
In order to decrease the enumeration, the algorithm uses two resolutions
- low and fine. The low resolution uses ~0.3 points per square angstrem,
and the fine resolution uses ~1 point per square angstrem.

Each cell in the grid is marked as "surface" (if it contains at least one
Connolly point) or "volume" (if it doesn't contain any Connolly point).
Usually, each surgace cell contains 2-3 Connolly points.

An enumeration on the rotations of one of the molecules (usually smaller


one) is performed. For each rotations:

 Calculate the surface and volume cells of the molecule.


 Assuming that there's at least one pair of surface cells (one from
each molecule) that are matched be the transformation, an
enumeration on all of these pairs is performed. For each pair the
transformation is calculated and it is evaluated by checking the
directions of the normals, the number of surface-to-surface
matches and the number of penetrations.
 The good transformations are those who have a small number of
penetrations and a lot of surface-to-surface matches.

This is done first in low resolution and the best results are calculated
again in fine resolution with the addition of an approximated energetic
score. The approximated enrgetic score is calculated according to the
number of "favourable" and "unfavourable" interactions. There are
several categories for the atoms of each molecule and combinations of
these categories are marked as "favourable" if they have a good
contribution to the enrgetic plausibilty of the match, or "unfavourable"
otherwise.
For example, it is unfavourable that an atom with positive charge is
placed near another atom with positive charge, but it is favourable if two
atoms are adjacent if one of them is an H-donor and the other is an H-
acceptor.
Katchalski-Katzir et al. (1992)
The basic idea is to enumerate on the possible translations, while using
FFT to calculate the matching score efficiently. Similar to the previous
algorithm, both molecules are placed on a 3-dimensional grid, but here 3
types of grid cells are defined - "volume", "surface" and "intermediate".
If the molecules are A and B, we define the matrices Al,m,n and Bl,m,n as
follows (l,m,n are the grid coordiantes)

Al,m,n = { 1 - if (l,m,n) is a "surface" cell, q - if (l,m,n) is an


"intermediate" cell, 0 - otherwise }
Bl,m,n = { 1 - if (l,m,n) is a "surface" cell, r - if (l,m,n) is an
"intermediate" cell, 0 - otherwise }

We choose q<0 and r>0 while |q| is large and |r| is small. The scalar
product of these matrices can be efficiently calculated using FFT thus
improving the algorithms performace considerably.

Vous aimerez peut-être aussi