
Bioinformatics is the field of science in which biology, computer science, and information technology merge into a single discipline. The ultimate goal of the field is to enable the discovery of new biological insights and to create a global perspective from which unifying principles in biology can be discerned.

Molecular modeling is one of the important areas of bioinformatics.
Computational programs generate molecular data:
• geometries (bond lengths, bond angles, torsion angles)
• energies (heat of formation, activation energy, etc.)
• electronic properties (moments, charges, ionization potential, electron affinity)
• spectroscopic properties (vibrational modes, chemical shifts)
• bulk properties (volumes, surface areas, diffusion, viscosity, etc.)

• Molecular modeling encompasses theoretical methods and computational techniques used to model or mimic the behavior of different molecules.

• The most common feature of molecular modeling techniques is the atomistic-level description of the molecular systems.
• The starting point for many studies is generally a two-dimensional drawing of a compound of interest. These diagrams can range from notebook or "back-of-the-envelope" sketches to electronically stored connection tables in which one defines the types of atoms in the molecule, their hybridization and how they are bonded to each other.
• Carbon dioxide, for example, would be defined as one sp2 oxygen atom (atom number 1) bonded by a double bond to an sp carbon atom (atom number 2), which in turn is bonded by a double bond to a second sp2 oxygen atom (atom number 3).

Atom #   Atom name   Atom type   Bound to atoms
1        O           5           2
2        C           2           1, 3
3        O           5           2

Connection tables are easily stored and searched electronically. However, they must be transformed into three-dimensional representations of chemical structure to study chemical properties.
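A connection table maps naturally onto a small keyed data structure. The following is a minimal sketch in Python, assuming a dictionary layout and helper names (`CO2_TABLE`, `bonds`, `is_consistent`) that are purely illustrative; the atom type codes are copied from the carbon dioxide table as opaque labels.

```python
# Hypothetical in-memory form of the carbon dioxide connection table.
# Keys are atom numbers; "type" holds the atom type code from the table.
CO2_TABLE = {
    1: {"name": "O", "type": 5, "bonded_to": [2]},
    2: {"name": "C", "type": 2, "bonded_to": [1, 3]},
    3: {"name": "O", "type": 5, "bonded_to": [2]},
}


def bonds(table):
    """Return every bond exactly once as a sorted (atom_i, atom_j) pair."""
    pairs = set()
    for i, atom in table.items():
        for j in atom["bonded_to"]:
            pairs.add(tuple(sorted((i, j))))
    return sorted(pairs)


def is_consistent(table):
    """Check that each bond is recorded on both of the atoms it joins."""
    return all(
        i in table[j]["bonded_to"]
        for i, atom in table.items()
        for j in atom["bonded_to"]
    )


if __name__ == "__main__":
    print(bonds(CO2_TABLE))
    print(is_consistent(CO2_TABLE))
```

Such a table holds only connectivity; it carries no coordinates, which is why a separate 3D model-building step is still needed.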
Molecular Mechanics Background

• The "mechanical" molecular model was developed out of a need to describe molecular structures and properties in as practical a manner as possible.

• Molecular mechanics is a mathematical formalism which attempts to reproduce molecular geometries, energies and other features by adjusting bond lengths, bond angles and torsion angles to equilibrium values that are dependent on the hybridization of an atom and its bonding scheme.
Energy Calculation

$$E_{pot} = \sum E_{bnd} + \sum E_{ang} + \sum E_{tor} + \sum E_{oop} + \sum E_{nb} + \sum E_{el}$$

Epot is the total steric energy, which is defined as the difference in energy between a real molecule and an ideal molecule.
Ebnd, the energy resulting from deforming a bond length from its natural value, is calculated using Hooke's equation for the deformation of a spring, $E_{bnd} = \frac{1}{2} K_b (b - b_0)^2$, where $K_b$ is the force constant for the bond, $b_0$ is the equilibrium bond length and $b$ is the current bond length.
Eang, the energy resulting from deforming a bond angle from its natural value, is also calculated from Hooke's Law.
Etor is the energy which results from deforming the torsion or dihedral angle.
Eoop is the out-of-plane bending component of the steric energy.
Enb is the energy arising from non-bonded interactions.
Eel is the energy arising from coulombic forces.
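The bond-stretch term and the summation of components can be sketched in a few lines of Python. This is an illustration only: it assumes the total steric energy is the plain sum of the listed components, and the function names and numeric values are hypothetical, not parameters from any real force field.

```python
def bond_stretch_energy(k_b, b, b0):
    """Hooke's-law bond term: E = 1/2 * Kb * (b - b0)^2."""
    return 0.5 * k_b * (b - b0) ** 2


def total_steric_energy(terms):
    """Sum the component energies (bnd, ang, tor, oop, nb, el)."""
    return sum(terms.values())


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to exercise the functions.
    e_bnd = bond_stretch_energy(k_b=300.0, b=1.6, b0=1.5)
    terms = {"bnd": e_bnd, "ang": 0.8, "tor": 1.2, "oop": 0.0, "nb": -0.5, "el": -0.3}
    print(e_bnd, total_steric_energy(terms))
```

Because the bond term is quadratic, stretching and compressing a bond by the same amount cost the same energy in this model.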


Molecular Dynamics

• Although molecular dynamics is an excellent approach to searching regions of conformational space, it is not an exhaustive search. The active conformation of a molecule can be missed as the dynamics simulation skips over the hills and valleys of the potential energy surface. Since the active conformation at a receptor may not always be the minimum energy structure (defined as the structure with the 3D geometry that places the molecule at the lowest point on the potential energy hypersurface), it is important to examine all potentially accessible conformations.

• For small molecules with a limited number of freely rotating bonds, this can be easily accomplished by driving each torsion angle stepwise over a 360 degree range.

• As an example, consider a graph of the conformationally dependent energy (shown along the Y-axis) of the molecule butane.
Butane Conformers
The number of conformations for a molecule (defined as the "non-identical arrangements of the atoms in a molecule obtainable by rotation about one or more single bonds") grows exponentially with the number of rotatable bonds:

$$\text{Number of conformers} = \left(\frac{360}{\text{angle increment}}\right)^{\#\,\text{rotatable bonds}}$$
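To see how quickly a systematic torsion search grows, the count can be evaluated directly. The sketch below assumes the angle increment is a whole number of degrees that divides 360 evenly; the function names are illustrative.

```python
from itertools import product


def conformer_count(angle_increment, n_rotatable_bonds):
    """(360 / angle increment) raised to the number of rotatable bonds."""
    if angle_increment <= 0 or 360 % angle_increment != 0:
        raise ValueError("angle increment must divide 360 evenly")
    return (360 // angle_increment) ** n_rotatable_bonds


def torsion_grid(angle_increment, n_rotatable_bonds):
    """Yield every combination of torsion angles on the search grid."""
    angles = range(0, 360, angle_increment)
    return product(angles, repeat=n_rotatable_bonds)


if __name__ == "__main__":
    print(conformer_count(30, 1))   # one rotatable bond, 30-degree steps
    print(conformer_count(30, 5))   # five rotatable bonds, same steps
    print(list(torsion_grid(120, 2)))
```

With 30-degree steps, a single rotatable bond gives 12 grid points, while five rotatable bonds already give 12^5 = 248,832, which is why stepwise torsion driving is practical only for small molecules.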


Optimize molecular geometry and calculate physical and electronic properties.

An equally important aspect of CAMD/CADD is the ability to display these properties in a manner which increases the chemist's ability to interpret experimental findings and correlate these findings with structural features.
Molecular surfaces play an important role in these studies.
Direct drug design
In the direct approach, the three-dimensional
features of the known receptor site are
determined from X-ray crystallography to design
a lead molecule. In direct design, the receptor
site geometry is known; the problem is to find a
molecule that satisfies some geometric
constraints and is also a good chemical match.
After finding good candidates according to these
criteria, a docking step with energy minimization
can be used to predict binding strength.
Indirect Drug Design

The indirect drug design approach involves comparative analysis of structural features of known active and inactive molecules that are complementary with a hypothetical receptor site. If the site geometry is not known, as is often the case, the designer must base the design on other ligand molecules that bind well to the site.
• Structure-based drug design (SBDD) is an iterative process, in which macromolecular crystallography has been the predominant technique used to elucidate the three-dimensional structure of drug targets.
• Both nucleic acids and proteins are potential drug targets, but the majority of such targets are proteins.
• Because proteins undergo considerable conformational change upon ligand binding, it is important to design drugs based on the crystallographic structures of protein-ligand complexes, not the unliganded structure.
I. Two case studies for sequence to structure mapping:
• Small changes in protein sequence cause dramatic difference
in drug binding: COX inhibitors

• Large changes in protein sequence still maintain similar structure: G protein coupled receptors

II. Protein Structure Prediction

III. Ligand Docking to Protein Structures


Primary Sequence
MNGTEGPNFY VPFSNKTGVV RSPFEAPQYY LAEPWQFSML AAYMFLLIML GFPINFLTLY
VTVQHKKLRT PLNYILLNLA VADLFMVFGG FTTTLYTSLH GYFVFGPTGC NLEGFFATLG
GEIALWSLVV LAIERYVVVC KPMSNFRFGE NHAIMGVAFT WVMALACAAP PLVGWSRYIP
EGMQCSCGID YYTPHEETNN ESFVIYMFVV HFIIPLIVIF FCYGQLVFTV KEAAAQQQES

Folding

3D Structure
• First (if the structure is known) or second (after structure prediction) step in a drug design project: find a lead structure (= a small molecule which binds to a given target)
• Docking problem: predicting the energetically most favorable complex between a protein and a putative drug molecule
• For a given protein structure, one can apply docking algorithms to virtually search through the space of candidate molecules
• 2 questions:
1. What does the protein-ligand complex look like?
2. What is the affinity with respect to other candidates?
• Find a set of compounds to start with:
    - e.g. from inspecting known ligands for a protein (e.g. the substrate of an enzyme)
    - compounds from a screening experiment of a combinatorial library (in which there is usually a molecular fragment that is common between all molecules of the library, the core, and the fragments attached to the core are R-groups)
    - compounds from a filtering experiment using other software
    - from varying other lead structures or known ligands
    - virtual screening using a fast docking algorithm (typically from a million molecules)
    - de novo design using fragments of compounds
=> get several hundred to thousands of ligands to start with
• Rigid-body docking algorithms
    • Protein and ligand are held fixed in conformational space, which reduces the problem to the search for the relative orientation of the two molecules with lowest energy.
    • All rigid-body docking methods have in common that superposition of point sets is a fundamental sub-problem that has to be solved efficiently.
    • Superposition of point sets: minimize the RMSD

• Flexible ligand docking algorithms
    • Most ligands have large conformational spaces with several low-energy states
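For the rigid-body superposition sub-problem, the quantity being minimized is the root-mean-square deviation (RMSD) between matched points. The sketch below is deliberately partial: it computes the RMSD and removes only the translational offset by centring both point sets on their centroids. The optimal rotation (for example via the Kabsch algorithm) is not implemented here, and all function names are illustrative.

```python
import math


def centroid(points):
    """Mean position of a list of (x, y, z) points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def centered(points):
    """Translate a point set so that its centroid sits at the origin."""
    c = centroid(points)
    return [tuple(p[i] - c[i] for i in range(3)) for p in points]


def rmsd(a, b):
    """Root-mean-square deviation between two equally sized, matched point sets."""
    if not a or len(a) != len(b):
        raise ValueError("point sets must be non-empty and the same size")
    squared = sum((x - y) ** 2 for p, q in zip(a, b) for x, y in zip(p, q))
    return math.sqrt(squared / len(a))


if __name__ == "__main__":
    ligand = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
    shifted = [(x + 5.0, y + 5.0, z + 5.0) for x, y, z in ligand]
    print(rmsd(ligand, shifted))                      # before removing the translation
    print(rmsd(centered(ligand), centered(shifted)))  # after centring both sets
```

Centring is the standard first step of a full superposition, since the optimal translation always aligns the two centroids; only the rotation then remains to be optimized.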
[Figure: a ligand database and a target protein are the inputs to molecular docking; the result is a ligand docked into the protein's active site.]


• DOCK (I. D. Kuntz, UCSF)
• AutoDock (A. Olson, Scripps)
• RosettaDOCK (Baker, U Wash., Gray, JHU)
More information in: http://www.bmm.icnet.uk/~smithgr/soft.html
DOCK works in 5 steps:
• Step 1: Start with coordinates of target receptor
• Step 2: Generate molecular surface for receptor
• Step 3: Fill active site of receptor with spheres
    • potential locations for ligand atoms
• Step 4: Match sphere centers to ligand atoms
    • determines possible orientations for the ligand
• Step 5: Find the top scoring orientation
AutoDock
• Designed to dock flexible ligands into receptor binding sites
• Has a range of powerful optimization algorithms

RosettaDOCK
• Models physical forces
• Creates a large number of decoys
• Degeneracy after clustering is the final criterion in the selection of decoys to output
RANDOM START POSITION
• Creation of a decoy begins with a random orientation of each partner and a translation of one partner along the line of protein centers to create a glancing contact between the proteins
LOW-RESOLUTION MONTE CARLO SEARCH
• Low-resolution representation: N, Cα, C, O for the backbone and a "centroid" for the side-chain
• One partner is translated and rotated around the surface of the other through 500 Monte Carlo move attempts
• The score terms: a reward for contacting residues, a penalty for overlapping residues, an alignment score, residue environment and residue-residue interactions
HIGH-RESOLUTION REFINEMENT
• Explicit side-chains are added to the protein backbones using a rotamer packing algorithm, thus changing the energy surface
• An explicit minimization finds the nearest local minimum accessible via rigid body translation and rotation
• Start and finish positions are compared by the Metropolis criterion
• Before each cycle, the position of one protein is perturbed by random translations and by random rotations
• To simultaneously optimize the side-chain conformations and the rigid body position, the side-chain packing and the minimization operations are repeated 50 times
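The Metropolis comparison of start and finish positions can be sketched as follows. This is a generic form of the criterion, not RosettaDOCK's actual code: the temperature factor `kT`, the function name and the injectable random source `rng` are assumptions made for illustration.

```python
import math
import random


def metropolis_accept(e_start, e_finish, kT=1.0, rng=random.random):
    """Accept a move if it lowers the energy; otherwise accept it with
    probability exp(-(e_finish - e_start) / kT)."""
    if e_finish <= e_start:
        return True
    return rng() < math.exp(-(e_finish - e_start) / kT)


if __name__ == "__main__":
    print(metropolis_accept(-10.0, -12.0))  # downhill move
    print(metropolis_accept(-10.0, -9.5))   # uphill move, accepted only sometimes
```

Occasionally accepting uphill moves is what lets the search escape a local minimum instead of stopping at the first one found by minimization.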
COMPUTATIONAL EFFICIENCY
1. The packing algorithm usually varies the conformation of
one residue at a time; rotamer optimization is performed
once every eight cycles
2. Periodically filter to detect and reject inferior decoys
without further refinement
CLUSTERING & PREDICTIONS
• Repeat the search to create approximately $10^5$ decoys per target
• Cluster the best 200 decoys by a hierarchical clustering algorithm using RMSD
• The clusters with the most members become predictions, ranked by cluster size
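Ranking predictions by cluster size can be illustrated with a small sketch. The slides do not say which linkage rule or distance cutoff is used, so the example below substitutes single-linkage clustering at a fixed RMSD cutoff (equivalent to cutting a single-linkage dendrogram at that height); the function name, cutoff and distance values are hypothetical.

```python
def cluster_by_rmsd(dist, cutoff):
    """Group decoys whose pairwise RMSD is within `cutoff` (single linkage),
    then return the clusters sorted from largest to smallest."""
    n = len(dist)
    parent = list(range(n))

    def find(i):
        # Follow parent links to the cluster root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if dist[i][j] <= cutoff:
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values(), key=len, reverse=True)


if __name__ == "__main__":
    # Hypothetical pairwise RMSD matrix for four decoys.
    dist = [
        [0.0, 1.0, 1.5, 10.0],
        [1.0, 0.0, 1.2, 11.0],
        [1.5, 1.2, 0.0, 9.0],
        [10.0, 11.0, 9.0, 0.0],
    ]
    print(cluster_by_rmsd(dist, cutoff=2.0))
```

The first cluster in the returned list plays the role of the top-ranked prediction, since it has the most members.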
• Download and install ArgusLab on Windows
• Load a PDB file, practice ArgusLab tools
• Follow the tutorial at http://www.arguslab.com/tutorials/tutorial_docking_1.htm
Molecular Docking using ArgusLab:
Example: benzamidine inhibitor docked into beta-trypsin
• Create a binding site from the bound ligand
• Set docking parameters
• Analyze docking results
• Polypeptide builder
The computational molecular docking problem is far from being solved.
There are two major bottlenecks:
1. The algorithms handle only limited flexibility
2. Selective and efficient scoring functions are needed
Molecular Modeling Applications

I. Molecular structures may be generated by a variety of software. The 3D structures of molecules may be created by several common building functions like make-bond, break-bond, fuse rings, delete-atom, add-atom-hydrogens, invert chiral center, etc. Computer modeling allows chemists to build dynamic models of compounds, which in turn allows them to visualize molecular geometry and demonstrate chemical principles.
II. The most important area of the molecular modeling
concept is visualization of molecular structures and
interactions. The molecules are visualized in three
dimensions by various representations like connected
sticks, ball and stick models, space filling
representations and surface displays.
III. The most active area of theoretical research using molecular orbital theory has been in the prediction of the preferred conformation of molecules. The preferred conformation of a molecule is a characteristic structural feature that arises as a response to the forces of attraction and repulsion. The shape should be considered primarily in determining the interaction of the molecule with the receptor.
IV. The 3D structures of many ligands (drug molecules)
that interact with the receptors may be known but the
structures of most receptors are not known. The interaction
of macromolecular receptors and of small drug molecules
is an essential step in many biological processes.
Polymerase Chain Reaction (PCR)
• Invented in 1983 (Cetus Corp)
• Discovery of Taq polymerase in 1985
• Kary Mullis: Nobel Prize 1993
• Widely used method with wide application
• Many variations of commercial kits
• Method for exponential amplification of DNA or RNA sequences
• Basic requirements
    • template DNA or RNA
    • 2 oligonucleotide primers complementary to different regions of the template
    • heat-stable DNA polymerase
    • 4 nucleotides and appropriate buffer
Cycling Program
Step 1: 94 °C for 30 sec
Step 2: 94 °C for 15 sec
Step 3: 55 °C for 30 sec
Step 4: 72 °C for 1.5 min
Step 5: Go to step 2, 35 times
Step 6: 72 °C for 10 min
Step 7: 4 °C forever
Step 8: END
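The cycling program above can be written down as data to estimate the programmed hold time and the ideal amplification. This is a rough sketch under stated assumptions: steps 2 to 4 are taken to run 35 times in total (some thermocyclers interpret "go to" as 35 additional repeats), ramp times between temperatures and the final 4 °C hold are ignored, and the denature/anneal/extend labels in the comments are an annotation added here.

```python
# Each step is (temperature in degrees C, hold time in seconds).
INITIAL_DENATURATION = (94, 30)
CYCLE = [(94, 15), (55, 30), (72, 90)]  # denature, anneal, extend
N_CYCLES = 35                           # assumption: 35 passes in total
FINAL_EXTENSION = (72, 600)


def total_hold_seconds(initial, cycle, n_cycles, final):
    """Sum the programmed hold times, ignoring ramping and the final cold hold."""
    per_cycle = sum(seconds for _, seconds in cycle)
    return initial[1] + n_cycles * per_cycle + final[1]


def ideal_fold_amplification(n_cycles):
    """Upper bound on amplification if every cycle doubled the product perfectly."""
    return 2 ** n_cycles


if __name__ == "__main__":
    seconds = total_hold_seconds(INITIAL_DENATURATION, CYCLE, N_CYCLES, FINAL_EXTENSION)
    print(seconds, seconds / 60.0)
    print(ideal_fold_amplification(N_CYCLES))
```

Real reactions fall short of perfect doubling, so the fold amplification is a ceiling rather than a prediction.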
James D. Watson & Francis Crick, 1953, discovered the structure of DNA
Alexander Todd et al., 1950s, made the first internucleotide bond (cycle time: days)
H.G. Khorana et al., 1960s, made the first oligonucleotide phosphodiester (cycle time: hours)
R. Letsinger et al., 1965, synthesis on solid support led to the first DNA synthesizer
Many researchers, 1970s, phosphotriester method
M. Matteucci & M. Caruthers, 1980s, developed DNA synthesis on inorganic support
S. Beaucage & M. Caruthers, 1981, developed phosphoramidite chemistry
Characteristics of primers:
• Specificity: specific for the intended target sequence (avoid nonspecific hybridization)
• Stability: form a stable duplex with the template under PCR conditions
• Compatibility: primers used as a pair shall work under the same PCR conditions

Thoughts on primer design:
• Uniqueness
• Length
• Base Composition
• Internal Stability
• Melting Temperature
• Annealing Temperature
• Internal Structure
• Primer Pair Matching
• A melting temperature (Tm) in the range of ~52 °C to 65 °C
• Absence of dimerization capability
• Absence of significant hairpin formation (>3 bp)
• Lack of secondary priming sites
• Low specific binding at the 3' end (i.e. lower GC content to avoid mispriming)

• Primer length
• GC%
• Annealing
• 3’ complementarity between primers
• G and C runs at the 3’ end
• Palindromic sequences
There shall be one and only one target site in the template DNA where the primer binds, which means the primer sequence shall be unique in the template DNA.

There shall be no annealing site in possible contaminant sources, such as human, rat, mouse, etc. (BLAST search against the corresponding genome).
Primer length affects uniqueness and melting/annealing temperature. Roughly speaking, the longer the primer, the greater the chance that it is unique, and the higher its melting/annealing temperature.

Generally speaking, a primer has to be at least 15 bases long to ensure uniqueness. Usually, we pick primers 17-28 bases long. This range varies depending on whether you can find unique primers with an appropriate annealing temperature within it.
Melting temperature, Tm: the temperature at which half the DNA strands are single-stranded and half are double-stranded. Tm is characteristic of the DNA composition; DNA with higher G+C content has a higher Tm due to more H bonds.
Calculation (w, x, y and z are the numbers of A, T, G and C bases in the sequence, respectively)
13 bases or shorter: Tm = (wA + xT) * 2 + (yG + zC) * 4
Longer than 13 bases: Tm = 64.9 + 41 * (yG + zC - 16.4) / (wA + xT + yG + zC)

(Formulae are from http://www.basic.northwestern.edu/biotools/oligocalc.html)
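The two Tm formulae translate directly into a short function. The sketch below assumes an unambiguous DNA sequence containing only A, T, G and C, applies the short-oligo formula up to and including 13 bases, and uses an illustrative function name.

```python
def melting_temperature(seq):
    """Estimate Tm in degrees C from base counts using the two formulae above."""
    seq = seq.upper()
    w, x, y, z = (seq.count(base) for base in "ATGC")
    n = w + x + y + z
    if n == 0 or n != len(seq):
        raise ValueError("sequence must be non-empty and contain only A, T, G, C")
    if n < 14:
        # Short oligos: 2 degrees per A/T and 4 degrees per G/C.
        return (w + x) * 2 + (y + z) * 4
    # Longer oligos: G+C fraction formula.
    return 64.9 + 41 * (y + z - 16.4) / n


if __name__ == "__main__":
    print(melting_temperature("ATGCATGC"))
    print(melting_temperature("ATGCATGCATGCATGCATGC"))
```

For the 8-base example the estimate is 4 * 2 + 4 * 4 = 24 °C; for the 20-base example with 10 G/C bases it is 64.9 + 41 * (10 - 16.4) / 20 = 51.78 °C.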


If primers can anneal to themselves (hairpins), or anneal to each other (dimers) rather than to the template, the PCR efficiency will decrease dramatically. Such structures shall be avoided.

However, sometimes these 2 structures are harmless because the annealing temperature does not allow them to form. For example, some dimers or hairpins form at 30 °C, while during the PCR cycle the lowest temperature only drops to 60 °C.
• Primers with stable 5’ termini and unstable 3’ termini give the best performance: this reduces false priming on unknown targets
• Low 3’ stability prevents formation of duplexes that may initiate DNA synthesis: the 5’ end must also pair in order to form a stable duplex
• Optimal terminal ΔG ~ 8.5 kcal/mol; excessively low ΔG reduces priming efficiency
1. Uniqueness: ensure the correct priming site;
2. Length: 17-28 bases. This range varies;
3. Base composition: average (G+C) content around 50-60%; avoid long (A+T)-rich and (G+C)-rich regions if possible;
4. Optimize base pairing: it is critical that the stability at the 5’ end be high and the stability at the 3’ end be relatively low to minimize false priming;
5. Melting temperatures (Tm) between 55 and 80 °C are preferred;
6. Ensure that the primers in a set have annealing temperatures within 2-3 °C of each other;
7. Minimize internal secondary structure: hairpins and dimers shall be avoided.
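Some of the rules above are simple enough to check mechanically. The sketch below covers only length, (G+C) content and Tm matching within a pair; uniqueness, terminal stability and secondary structure need other tools. The thresholds are taken from the list, while the function names and return format are illustrative assumptions.

```python
def gc_percent(seq):
    """Percentage of G and C bases in the sequence."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def check_primer(seq, min_len=17, max_len=28, gc_range=(50.0, 60.0)):
    """Return the names of the rules that the primer violates."""
    problems = []
    if not min_len <= len(seq) <= max_len:
        problems.append("length")
    if not gc_range[0] <= gc_percent(seq) <= gc_range[1]:
        problems.append("gc_content")
    return problems


def pair_tm_matched(tm_a, tm_b, max_diff=3.0):
    """True if two primers' temperatures are within max_diff degrees C."""
    return abs(tm_a - tm_b) <= max_diff


if __name__ == "__main__":
    print(check_primer("ATGCATGCATGCATGCATGC"))
    print(check_primer("ATATATATAT"))
    print(pair_tm_matched(58.0, 60.5))
```

An empty list from `check_primer` means only that these two rules pass, not that the primer is suitable overall.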
Primer design is an art when done by human beings, and a task far better done by machines.

Some primer design programs we use:
- Oligo: Life Science Software, standalone application
- GCG: Accelrys, ICBR maintains the server.
- Primer3: MIT, standalone / web application
  http://www-genome.wi.mit.edu/cgi-bin/primer/primer3_www.cgi
- BioTools: BioTools, Inc. ICBR distributes the license.
- Others: GeneFisher, Primer!, Web Primer, NBI oligo program, etc.

Melting temperature calculation software:
- BioMath: http://www.promega.com/biomath/calc11.htm
