
PROTEIN-LIGAND DOCKING

Submitted by-

Pooja Khurana – 810/BT/07


Seema - 815/BT/07
COMPUTATIONAL BIOLOGY – II LAB.

PRACTICAL:

AIM: To understand the concept of small-molecule (ligand) and protein docking.

A. Introduction.

Molecular docking is a method for predicting the preferred orientation of one molecule relative to
a second when they bind to each other to form a stable complex. Molecular docking can be thought of as a
problem of “lock-and-key”, where one is interested in finding the correct relative orientation of
the “key” which will open up the “lock” (where on the surface of the lock is the key hole,
which direction to turn the key after it is inserted, etc.). For example, the protein can be
thought of as the “lock” and the ligand can be thought of as a “key”.

Computer programs are used to predict or simulate the possible interactions between two
molecules on the basis of their three-dimensional structures. With such software, the
interactions can be viewed and analyzed to answer biologically important questions about a
particular chemical or biological reaction. The analysis usually relies on 3D graphics that can
be manipulated in several ways to explore, at atomic resolution, the interactions between the
atoms of the two interacting molecules. The method can therefore be used not only to predict
possible binders or inhibitors, but also to predict how strong the association between the
molecules (called the binding affinity) can be. Knowing the binding strength (binding energy)
is useful when comparing (ranking) a group of compounds or derivatives to determine which
derivative is the best binder or inhibitor (how strongly a compound will bind to the target).

Prediction of the binding affinity is useful when synthesizing compounds, because one can
predict the affinity of a desired compound for a certain target (say a protein or DNA, with
particular interest in stopping the function of an enzyme/protein or blocking a certain
reaction). One can therefore save a lot of time and money by "experimenting" on the computer
first, before actually going to the lab to make the compound. In addition, one can predict how a
molecule interacts or reacts with another molecule, for example in a protein-protein interaction
within a specific biological reaction of interest, before conducting the ("wet") experiment.

This method is also useful when one wants to screen a number of compounds (so-called "virtual
screening"), say from a natural product or from plants/herbs, to see whether the small molecules
(from the medicinal plants/herbs) will have certain pharmacological effects on a particular
protein or enzyme (for example HIV protease). Large pharmaceutical companies in Europe
and the US have been using this technique for some time in the discovery and development of
new drugs.

Types of docking (molecular docking) in practice:


a. small molecule – protein (called “ligand – protein docking”)
b. protein – protein docking.
c. DNA-ligand docking
d. DNA-protein docking

B. Why is it important?

Because of its ability to predict binding interactions and orientation (in some cases with very
high accuracy relative to the existing crystal structure of the complex studied), docking is
widely used in rational drug design, also called structure-based drug design (that is, the use of
three-dimensional structures to design new drugs with the help of computers and software).

Another reason why many researchers are turning to docking methods to complement their work
is that some information is difficult to obtain experimentally. The ability of computers to
simulate interactions in atomic detail (at the nanoscale), together with the increasing power of
high-performance computing, can provide answers to difficult research problems that cannot be
solved by conventional means. This is why many researchers have directed some (if not all) of
their attention to this technique.

C. Protein-Ligand Docking

Given a protein structure, a binding site and a ligand, the question is how the ligand will bind
to form a complex such that the total energy of the protein-ligand complex is minimized.

That is, in docking we answer two questions:


a. What will be the pose of the molecule in the binding site?
b. What is the binding affinity or a score representing the strength of binding?

In a protein, the binding site (or "active site") is the part of the protein where the ligand
molecule binds. It is generally a cavity on the protein surface and can be identified by looking
at the crystal structure of the protein bound to a known inhibitor. Protein-ligand docking,
however, is not about identifying the binding site but about finding the "pose" or "binding mode"
of the ligand. Pose refers to the geometry (location, orientation and conformation) of the ligand
in the binding site.

I. USES OF PROTEIN-LIGAND DOCKING:

Main uses are:

a. Virtual screening. This refers to the identification of potential lead compounds from a
large dataset. It is the computational analogue of biological screening. The aim is to
score, rank and filter a set of structures using one or more computational methods, e.g.
docking.

Virtual screening can be used to decide which compounds to screen, which libraries to
synthesize and which compounds to purchase from an external company, and to help analyse
the results of an experiment such as a high-throughput screening (HTS) run.
b. To identify the regions that are necessary for binding.
c. To guide the modification of structures so as to improve affinity while avoiding changes
that would clash with the protein.
d. Function prediction.

II. Forces governing the interactions between the ligand and the protein:

These depend on the molecules involved and on the solvent.

- Van der Waals interactions
- Electrostatics
- Hydrophobic contacts
- Hydrogen bonds
- Salt bridges, etc.

Most of these interactions act at short range. They play a major role in deciding the
orientation of the ligand in the binding site of the protein.

III. Components of the docking software.

Typically, docking software consists of two main components:

1. A search algorithm, which generates a large number of poses of a molecule in the
binding site.
2. A scoring function, which determines the score or binding affinity for a particular
pose.

Molecular docking can be divided into two separate problems. The search algorithm should
create an optimum number of configurations that include the experimentally determined
binding modes. These configurations are evaluated using scoring functions to distinguish the
experimental binding modes from all other modes explored through the searching algorithm.

A rigorous search algorithm would go through all possible binding modes between the
two molecules. However, this is impractical due to the size of the search space. Consider a
simple system comprising a ligand with four rotatable bonds and six rigid-body alignment
parameters, and a cubic active site measuring 10^3 Å^3. The translational and rotational
properties add up to six degrees of freedom. If the angles are considered in 10-degree
increments and the translational parameters on a 0.5 Å grid, there are approximately 4 × 10^8
rigid-body states to sample, corresponding to 6 × 10^14 configurations to be searched once the
four rotatable bonds are included. This would require approximately 2 000 000 years of
computational time at a rate of 10 configurations per second. As a consequence, only a small
part of the total conformational space can be sampled, and so a balance must be reached
between the computational expense and the amount of the search space examined.
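
The estimate above can be checked with a few lines of arithmetic. The sketch below is only an illustration: the variable names are ours, and it assumes that each rotation angle and each torsion is sampled over a full 360 degrees.

```python
# Reproduce the search-space estimate quoted in the text.
SIDE_ANGSTROM = 10.0      # edge of the cubic active site (10^3 cubic angstroms)
GRID_STEP = 0.5           # translational grid spacing, in angstroms
ANGLE_STEP = 10           # angular increment, in degrees
ROTATABLE_BONDS = 4       # torsions in the ligand
RATE_PER_SECOND = 10      # configurations evaluated per second

positions_per_axis = int(SIDE_ANGSTROM / GRID_STEP)   # 20 grid points per axis
angles_per_axis = 360 // ANGLE_STEP                   # 36 settings per angle

# Three translations and three rotations give the rigid-body states.
rigid_body_states = positions_per_axis ** 3 * angles_per_axis ** 3
# Each rotatable bond multiplies the count by another 36 settings.
torsion_states = angles_per_axis ** ROTATABLE_BONDS
total_configurations = rigid_body_states * torsion_states

years = total_configurations / RATE_PER_SECOND / (365.25 * 24 * 3600)

print(f"rigid-body states:    {rigid_body_states:.1e}")     # about 4 x 10^8
print(f"total configurations: {total_configurations:.1e}")  # about 6 x 10^14
print(f"years at 10 per s:    {years:.1e}")                 # about 2 x 10^6
```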

Some common search algorithms include:

- Molecular dynamics
- Monte Carlo methods
- Genetic algorithms
- Fragment-based methods
- Point complementarity methods
- Distance geometry methods
- Tabu searches
- Systematic searches

Current docking methods utilize the scoring functions in one of two ways. The first approach
uses the full scoring function to rank a protein-ligand conformation. The system is then
modified by the search algorithm, and the same scoring function is again applied to rank the
new structure. In the alternative approach a two stage scoring function is used. A reduced
function is used in directing the search and a more rigorous one is then used to rank the
resulting structures.
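
The two-stage approach can be sketched as follows. This is a minimal illustration, not the scheme of any particular program: the function names, the one-number "poses" and both toy scoring functions are assumptions made for the example, with lower scores taken to be better.

```python
def two_stage_rank(poses, fast_score, rigorous_score, keep=3):
    """Shortlist poses with a cheap score, then re-rank with a costly one."""
    shortlist = sorted(poses, key=fast_score)[:keep]   # stage 1: reduced function
    return sorted(shortlist, key=rigorous_score)       # stage 2: rigorous function


# Toy example: a "pose" is a single number.
poses = [0.0, 0.4, 1.1, 1.9, 3.5]
fast = lambda p: abs(p - 1.0)            # hypothetical cheap approximation
rigorous = lambda p: (p - 1.2) ** 2      # hypothetical expensive function

ranked = two_stage_rank(poses, fast, rigorous)
print(ranked)   # [1.1, 1.9, 0.4]
```

Only the shortlisted poses are ever passed to the expensive function, which is where the saving comes from.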

Some common scoring functions are:

- Force-field methods
- Empirical free energy scoring functions
- Knowledge-based potentials of mean force
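
As an illustration of the first category, a force-field style score can be sketched as a sum of van der Waals (Lennard-Jones 12-6) and Coulomb terms over all ligand-protein atom pairs. The parameter values (epsilon, sigma, dielectric constant) and the atom format below are assumptions chosen for the example and are not taken from any specific force field.

```python
import math

COULOMB_CONSTANT = 332.06   # kcal*angstrom/(mol*e^2)


def pair_energy(distance, epsilon=0.2, sigma=3.5, q1=0.0, q2=0.0, dielectric=4.0):
    """Lennard-Jones 12-6 plus Coulomb energy (kcal/mol) for one atom pair."""
    ratio6 = (sigma / distance) ** 6
    lennard_jones = 4.0 * epsilon * (ratio6 ** 2 - ratio6)
    coulomb = COULOMB_CONSTANT * q1 * q2 / (dielectric * distance)
    return lennard_jones + coulomb


def score_pose(ligand_atoms, protein_atoms):
    """Sum pair energies; each atom is (x, y, z, partial_charge)."""
    total = 0.0
    for lx, ly, lz, lq in ligand_atoms:
        for px, py, pz, pq in protein_atoms:
            distance = math.dist((lx, ly, lz), (px, py, pz))
            total += pair_energy(distance, q1=lq, q2=pq)
    return total


# Toy example: one ligand atom near two protein atoms.
ligand = [(0.0, 0.0, 0.0, -0.3)]
protein = [(4.0, 0.0, 0.0, 0.3), (0.0, 4.0, 0.0, 0.0)]
print(round(score_pose(ligand, protein), 3))
```

A lower (more negative) score corresponds to a more favourable pose; atoms that overlap give a large positive Lennard-Jones term.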

III (a). Docking methods

Molecular dynamics

These methods involve the calculation of solutions to Newton's equations of motion. Finding
the global minimum energy of a docked complex is difficult, since the energy hypersurface of
a biological system is rugged and hard to traverse.

The problem is approached using standard optimization algorithms, including:

- direct searches, which use only the potential function; they are impractical for large
molecules and suitable only for crude optimization of small molecules far away from the
minimum, e.g. simplex;
- gradient methods, which involve the first derivative of the potential function; they
converge slowly near the minimum and are recommended for initial optimization, e.g.
steepest descent;
- conjugate-gradient methods, in which the history of the search influences the search
direction; they need more computational effort but converge better, e.g. Fletcher-Reeves;
- second-derivative methods, with very efficient convergence, e.g. Newton-Raphson;
- least-squares methods, with good convergence but often computationally too expensive,
e.g. Marquardt.

Often a combination of the methods mentioned above is used, for example a gradient method
for initial optimization and a conjugate-gradient method when nearing the minimum.
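
A gradient method of the steepest-descent type can be sketched on a one-dimensional toy problem: minimizing the Lennard-Jones energy of a single atom pair with respect to their separation. The fixed step size, the reduced units (epsilon = sigma = 1) and the function names are assumptions made for this example; real programs work on many coordinates and usually adapt the step.

```python
def lj_energy(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones 12-6 energy of one atom pair at separation r."""
    return 4.0 * epsilon * ((sigma / r) ** 12 - (sigma / r) ** 6)


def lj_gradient(r, epsilon=1.0, sigma=1.0):
    """First derivative of lj_energy with respect to r."""
    return 4.0 * epsilon * (-12.0 * sigma ** 12 / r ** 13 + 6.0 * sigma ** 6 / r ** 7)


def steepest_descent(r, step=0.01, tolerance=1e-8, max_iterations=10000):
    """Move against the gradient until it is (nearly) zero."""
    for _ in range(max_iterations):
        gradient = lj_gradient(r)
        if abs(gradient) < tolerance:
            break
        r -= step * gradient
    return r


r_min = steepest_descent(1.5)
print(r_min, lj_energy(r_min))   # close to 2**(1/6) and -1.0
```

The analytical minimum of this potential lies at r = 2^(1/6) sigma with energy -epsilon, which gives a simple check on the result.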

Monte Carlo methods

The Monte Carlo simulation method occupies a special place in the history of molecular
modeling, as it was the technique used to perform the first computer simulation of a
molecular system. The expression Monte Carlo simulation is extremely general, and
many algorithms are given that name whenever they contain a stochastic process or some kind
of random sampling. In molecular docking, the expression Monte Carlo usually means
importance sampling, or the Metropolis method. The Metropolis method, which is
actually a Markov chain Monte Carlo method, applies random moves to the system and
then accepts or rejects each move based on a Boltzmann probability. Monte Carlo methods
play an important role in molecular docking, but the variety of different algorithms is
too large to be considered here in detail. Programs using MC methods include AutoDock,
ProDock, ICM, MCDOCK, DockVision, QXP and Affinity.
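
The Metropolis acceptance rule can be sketched on a one-dimensional toy energy function. This is an illustration only: the double-well energy, the step size, the temperature (in units where the Boltzmann constant is 1) and the function names are assumptions and do not come from any of the programs listed above.

```python
import math
import random


def double_well(x):
    """Toy energy with two minima; the one near x = -1 is lower."""
    return (x * x - 1.0) ** 2 + 0.3 * x


def metropolis(energy, x0, steps=5000, step_size=0.5, temperature=1.0, seed=0):
    """Random-walk Metropolis sampling; returns the lowest-energy state seen."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    for _ in range(steps):
        trial = x + rng.uniform(-step_size, step_size)   # random move
        e_trial = energy(trial)
        delta = e_trial - e
        # Accept downhill moves always, uphill moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            x, e = trial, e_trial
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e


best_x, best_e = metropolis(double_well, x0=2.0)
print(best_x, best_e)
```

Because uphill moves are sometimes accepted, the walk can leave a local minimum, which a pure downhill minimizer cannot do.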

Genetic algorithms

Genetic algorithms and evolutionary programming are well suited to docking
problems because of their usefulness in solving complex optimization problems. The
essential idea of genetic algorithms is the evolution of a population of possible solutions via
genetic operators (mutations, crossovers and migrations) to a final population, optimizing a
predefined fitness function. The process of applying genetic algorithms starts with encoding
the variables, in this case the degrees of freedom, into a "genetic code", e.g. binary strings.
Then a random initial population of solutions is created. Genetic operators are then applied to
this population, leading to a new population. This new population is then scored and ranked,
and, following "the survival of the fittest", the probability that a solution reaches the next
iteration round depends on its score. If the size of the population is kept constant, good
solutions will come to dominate the population. It should be noted that genetic algorithms are
well suited to parallel computing. Some programs using GAs are GOLD, AutoDock, DIVALI
and DARWIN.
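
The loop described above can be sketched in a few lines. Note the assumptions: the genes here are real-valued angles rather than binary strings, the fitter half of the population survives unchanged, migration is left out, and the fitness function is a toy with a known optimum. None of this reproduces the implementation of GOLD, AutoDock or the other programs.

```python
import random

TARGET = (60.0, -45.0, 120.0)   # hypothetical optimal torsion angles


def toy_fitness(genes):
    """Lower is better: squared deviation from the target angles."""
    return sum((g - t) ** 2 for g, t in zip(genes, TARGET))


def genetic_search(fitness, n_genes, pop_size=30, generations=60,
                   mutation_rate=0.2, seed=0):
    rng = random.Random(seed)
    # Random initial population; each individual encodes the degrees of freedom.
    population = [[rng.uniform(-180, 180) for _ in range(n_genes)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness)              # score and rank
        survivors = population[: pop_size // 2]   # survival of the fittest
        children = []
        while len(survivors) + len(children) < pop_size:
            mother, father = rng.sample(survivors, 2)
            cut = rng.randrange(1, n_genes) if n_genes > 1 else 0
            child = mother[:cut] + father[cut:]   # one-point crossover
            child = [g + rng.gauss(0, 10) if rng.random() < mutation_rate else g
                     for g in child]              # mutation
            children.append(child)
        population = survivors + children         # population size stays constant
    return min(population, key=fitness)


best = genetic_search(toy_fitness, n_genes=3)
print(best, toy_fitness(best))
```

Because the survivors are copied unchanged into the next generation, the best score in the population can never get worse from one round to the next.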

Fragment-based methods

Fragment-based methods can be described as dividing the ligand into separate portions or
fragments, docking the fragments, and finally linking these fragments together. These
methods require subjective decisions on the importance of the various functional groups in
the ligand, because a good choice of base fragment is essential. A poor choice can
significantly affect the quality of the results. The base fragment must contain the
predominant interactions with the receptor. Early algorithms required manual selection of the
base fragment, but this has been automated in newer implementations. Some well-known
programs using fragment-based methods are FlexX and DOCK.

Point complementarity methods

These methods are based on evaluating the shape and/or chemical complementarity between
interacting molecules. The interacting molecules are usually modeled in a simple way, for
example using spheres or cubes as atoms. The ligand description is then rotated and
translated to obtain the maximum number of matches between ligand and protein surfaces, minus
the number of volume overlaps. Additional constraints may be present, for example a
requirement that interacting surface normals point in approximately opposite directions. Some
algorithms use a 3D grid, which is placed over the protein and over the ligand. Each grid point
is then labeled either as open space or as inside the ligand or protein. Then a correlation
function is created, and this function is optimized using rigid-body translation and rotation.
This often involves traditional shape recognition techniques such as the Fast Fourier Transform
(FFT) with Fourier correlation theory. A high correlation score denotes good surface
complementarity between the molecules. Because many of the methods were originally
created for protein-protein docking, the rigid-body assumption is usually made. This is a
limitation in ligand-protein docking. However, some algorithms are designed for ligand-protein
docking, and these allow some flexibility. Examples of programs using point
complementarity methods are FTDOCK, SANDOCK, FLOG and the Soft Docking algorithm.
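
The grid idea can be illustrated in two dimensions. In the sketch below, protein cells carry a penalty, empty cells touching the protein count as surface contacts, and the ligand is scanned over all translations by brute force. This is a deliberately simplified stand-in: FFT-based programs evaluate the same kind of correlation much faster and also scan rotations, and the shapes, the penalty value and the function names are assumptions made for the example.

```python
OVERLAP_PENALTY = -15   # assumed penalty for a ligand cell inside the protein

PROTEIN = [
    "######",
    "##..##",
    "##..##",
    "......",
    "......",
]
LIGAND = [
    "##",
    "##",
]


def receptor_grid(shape):
    """Label each cell: penalty inside protein, 1 on the contact layer, else 0."""
    rows, cols = len(shape), len(shape[0])
    grid = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if shape[r][c] == "#":
                grid[r][c] = OVERLAP_PENALTY
                continue
            neighbours = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
            if any(0 <= nr < rows and 0 <= nc < cols and shape[nr][nc] == "#"
                   for nr, nc in neighbours):
                grid[r][c] = 1
    return grid


def best_translation(protein_shape, ligand_shape):
    """Return (score, row_shift, col_shift) of the best rigid translation."""
    grid = receptor_grid(protein_shape)
    l_rows, l_cols = len(ligand_shape), len(ligand_shape[0])
    best = None
    for dr in range(len(grid) - l_rows + 1):
        for dc in range(len(grid[0]) - l_cols + 1):
            score = sum(grid[dr + r][dc + c]
                        for r in range(l_rows) for c in range(l_cols)
                        if ligand_shape[r][c] == "#")
            if best is None or score > best[0]:
                best = (score, dr, dc)
    return best


print(best_translation(PROTEIN, LIGAND))   # (4, 1, 2): the ligand fills the pocket
```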

Distance geometry methods

Many types of structural information can be expressed as intra- or intermolecular distances.
The distance geometry formalism allows these distances to be assembled and
three-dimensional structures consistent with them to be calculated. The crucial feature is that
it is not possible to arbitrarily assign values to the inter-atomic distances in a molecule and
always obtain a low-energy conformation. Rather, the inter-atomic distances are closely
interrelated, and many combinations of distances are geometrically impossible. This enables
fast sampling of the conformational space, although it does not always give good results. An
example of a program using distance geometry for the docking problem is DockIt.
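
One simple way in which distances are interrelated is the triangle inequality: for any three atoms, no distance can exceed the sum of the other two. The sketch below checks a distance matrix for such violations. It shows only this one consistency test, not a full distance geometry embedding, and the function name and example matrices are ours.

```python
from itertools import combinations


def triangle_violations(distances):
    """Return atom triples whose pairwise distances cannot form a triangle.

    `distances` is a symmetric matrix (list of lists) of inter-atomic distances.
    """
    bad = []
    for i, j, k in combinations(range(len(distances)), 3):
        a, b, c = distances[i][j], distances[j][k], distances[i][k]
        if a > b + c or b > a + c or c > a + b:
            bad.append((i, j, k))
    return bad


consistent = [[0.0, 1.5, 2.5],
              [1.5, 0.0, 1.5],
              [2.5, 1.5, 0.0]]
impossible = [[0.0, 1.5, 4.0],    # atoms 0 and 2 are too far apart to both
              [1.5, 0.0, 1.5],    # lie 1.5 units from atom 1
              [4.0, 1.5, 0.0]]

print(triangle_violations(consistent))   # []
print(triangle_violations(impossible))   # [(0, 1, 2)]
```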

Tabu searches

These methods are based on stochastic processes, in which new states are randomly generated
from an initial state (referred to as the current solution). These new solutions are then scored
and ranked in ascending order. The best new solution is then chosen as the new current
solution, and the same process is repeated. To avoid loops and ensure diversity of
the current solution, a tabu list is used. This list acts as a memory. It contains information
about previous current solutions, and a new solution is rejected if it resembles a previous
solution too closely. An example of a docking algorithm using tabu search is PRO_LEADS.
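
The procedure can be sketched on a one-dimensional toy problem. The scoring function, the similarity threshold `min_distance`, the list length and the other parameter values are assumptions made for this illustration and are not taken from PRO_LEADS.

```python
import math
import random
from collections import deque


def rugged_score(x):
    """Toy score with several local minima; lower is better."""
    return x * x + 2.0 * math.sin(5.0 * x)


def tabu_search(score, start, iterations=200, n_candidates=20,
                step=0.5, tabu_size=25, min_distance=0.05, seed=0):
    rng = random.Random(seed)
    current = start
    best, best_score = start, score(start)
    tabu = deque([start], maxlen=tabu_size)   # memory of recent current solutions
    for _ in range(iterations):
        # Randomly generate new states from the current solution.
        candidates = [current + rng.uniform(-step, step)
                      for _ in range(n_candidates)]
        candidates.sort(key=score)            # ascending: best candidate first
        for candidate in candidates:
            too_similar = any(abs(candidate - old) < min_distance for old in tabu)
            if not too_similar:
                current = candidate           # best non-tabu move, even if uphill
                tabu.append(current)
                break
        if score(current) < best_score:
            best, best_score = current, score(current)
    return best, best_score


best, best_score = tabu_search(rugged_score, start=3.0)
print(best, best_score)
```

The best non-tabu candidate is accepted even when it scores worse than the current solution, which lets the search climb out of local minima while the tabu list stops it from falling straight back.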

Systematic searches

These methods systematically go through all possible conformations and represent the brute
force solution to the docking problem. All molecules are usually assumed to be rigid and
interaction energy is evaluated from a force field model. Some constraints and restraints can
be used to reduce the dimensionality of the problem.

IV. General Docking Scheme:

1. Part 1: Molecular surface representation
2. Part 2: Features selection
3. Part 3: Matching of critical features
4. Part 4: Filtering and scoring of candidate transformations

V. Software

In addition to the large number of existing docking programs, there are also many molecular
mechanics programs applicable to these problems. Despite the huge variety of available
programs, no single program has become recognized as a standard, although some
programs are very widely used. The programs are generally not easy to use and require some
understanding of the underlying computational principles. As a result, people tend to keep
using the program they already know, even though better options may be available. Some of
the existing programs also appear to be reaching a more mature state, since there is an
increasing number of commercial solutions available. Docking programs are usually sold in a
package with other molecular design software.

It should also be noted that the division made earlier is not very strict, and many programs
would fall into more than one category of methods. Tests have shown that there is no
significant difference in hit rates between different programs and that they all produce false
positives. Because of this, combining different search and scoring functions produces more
reliable results. This has led to the most successful docking programs usually being a
collection of the methods described.

It is also worth remembering that molecular docking software is only as good as its scoring
function. It does not help if we are able to create the right conformation but are not able to
recognize it.

Probably the best-known example of rational drug design is the HIV-1 protease
inhibitor. Starting with X-ray structures of HIV-1 protease, a group of scientists at DuPont
Merck used docking and molecular design software to successfully design an inhibitor.

AutoDock

AutoDock uses Monte Carlo simulated annealing and a Lamarckian genetic algorithm (LGA) to
create a set of possible conformations. The LGA is used as a global optimizer and energy
minimization as a local search method. Possible orientations are evaluated with the AMBER
force field model in conjunction with free energy scoring functions and a large set of
protein-ligand complexes with known protein-ligand constants. The newest, as yet unreleased,
version 4 should include side-chain flexibility. AutoDock has more informative web pages than
its competitors and, because of its free academic license, it is a good starting point when
venturing into the world of molecular docking software.

DOCK

DOCK is one of the oldest and best-known ligand-protein docking programs. The initial
version used rigid ligands; flexibility was later incorporated via incremental construction of
the ligand in the binding pocket. As mentioned above, DOCK is a fragment-based method that
uses shape and chemical complementarity to create possible orientations for the ligand. These
orientations can be scored using three different scoring functions; however, none of them
contains explicit hydrogen-bonding terms, solvation/desolvation terms or hydrophobicity terms,
which limits serious use. DOCK seems to handle apolar binding sites well and is useful for fast
docking, but it is not the most accurate software available.

FlexX

FlexX is another fragment-based method, using flexible ligands and rigid proteins. It uses the
MIMUMBA torsion angle database for the creation of conformers. MIMUMBA is an
interaction geometry database used to describe intermolecular interaction patterns exactly. For
scoring, the Boehm function (with minor adaptations necessary for docking) is applied. FlexX is
introduced here to emphasize the importance of scoring functions. Although FlexX and DOCK
are both fragment-based methods, they produce quite different results. In contrast to
DOCK, which performs well with apolar binding sites, FlexX shows the opposite behavior.
It has a slightly lower hit rate than DOCK but gives better root-mean-square deviation (RMSD)
estimates for compounds with a correctly predicted binding mode. There is an extension of
FlexX called FlexE, with flexible receptors, which has been shown to produce better results
with significantly lower running times.
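
The root-mean-square measure mentioned above compares a predicted pose with a reference (for example, crystallographic) pose atom by atom. A minimal sketch follows, assuming that both coordinate lists hold the same atoms in the same order and already share one coordinate frame; the function name and the toy coordinates are ours.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally ordered coordinate lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


# Toy example: the predicted pose is the reference shifted by 1 angstrom along x.
reference = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
predicted = [(x + 1.0, y, z) for x, y, z in reference]

print(rmsd(reference, predicted))   # 1.0
```

In docking studies a pose is commonly counted as correctly predicted when its RMSD from the experimental pose is below about 2 Å.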

GOLD

GOLD has won many new users during the last few years because of its good results in
impartial tests. It has a good hit rate overall, although it suffers somewhat when dealing with
hydrophobic binding pockets. GOLD uses a genetic algorithm to dock a flexible ligand to
a protein with flexible hydroxyl groups. Otherwise the protein is considered to be rigid. This
makes it a good choice when the binding pocket contains amino acids that form hydrogen
bonds with the ligand. GOLD uses a scoring function that is based on favorable conformations
found in the Cambridge Structural Database and on empirical results on weak chemical
interactions. The development of GOLD is currently focused on improving the computational
algorithm and adding support for parallel processing. GOLD has one of the most
comprehensive validation test sets and is also available for use at CSC.

