
Meta-Dock

Using the EELA grid for virtual screening


Juan Manuel Hurtado Ramirez
UNAM
Instituto de Biotecnología

Jérôme Verleyen
UNAM
Instituto de Biotecnología

1. Purpose
The purpose of the Meta-Dock application is to provide a grid-based screening method for pharmaceutical
studies. When a protein that causes a disease is known, the next step is to identify a molecule (a ligand)
that is able to block the action of this protein through a binding mechanism. The traditional laboratory
approach is to identify the ligand with chemical methods, but the range of compounds that can be analysed
is limited and the costs are high.

The alternative is an in silico simulation using a docking program. In silico studies make it possible to
dock a large range of ligands in bulk, which is commonly called virtual screening. We propose to start
with AutoDock (http://autodock.scripps.edu/), a program widely used in the academic world. The future
objective is to support more than one docking program.

2. What is Docking?
Molecular docking is a method that predicts the most probable orientation of one molecule relative to a
second when they are bound to each other to form a stable complex. Knowledge of the preferred orientation
may in turn be used to predict the strength of association, or binding affinity, between the two molecules
using, for example, scoring functions.

Docking is frequently used to predict the binding orientation of small molecule drug candidates to their
protein targets in order to in turn predict the affinity and activity of the small molecule. Hence docking
plays an important role in the rational design of drugs. Given the biological and pharmaceutical
significance of molecular docking, considerable efforts have been directed towards improving the
methods used to predict docking.


Figure 1 shows a ligand and Figure 2 shows a protein acting as a receptor. These two molecules can bind
to each other, forming a macromolecular complex that blocks the normal function of the protein
(Figure 3).

Figure 1. Ligand surface

Figure 2. Protein surface


Figure 3. Docking (Macromolecule complex) surface

3. Use Case of the Meta-Dock application


This section presents the workflow for studying a target that the user wants to test against a large set
of ligands. Several steps must be completed before the study can run on the grid. In other words, the
molecules that the user submits must first be prepared for use with the docking program. Figure 4 gives
an overview of the whole use case.


Figure 4. Schema of the use case

3.1. Protein of study

In most cases, the structure of the protein is determined by a physical process such as crystallography
or Nuclear Magnetic Resonance, or by computational modelling, and is delivered as a PDB ("Protein Data
Bank") format file that contains the description of the molecule. Preparing it means that the user must
remove all traces of the solvent used in the physical study, add hydrogen atoms, and assign charges to
all the atoms of the molecule. At the end, the user has a PDBQT format file, that is, a PDB format file
extended with "Q" (charge) and "T" (AutoDock atom type) fields.
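
As an illustration of the "Q" and "T" fields, the following minimal Python sketch reads the charge and the atom type from PDBQT atom records. It assumes that the last two whitespace-separated fields of each ATOM or HETATM line are the charge and the atom type; the function name and the two-atom fragment are hypothetical and serve only as an example.

```python
# Sketch only: assumes the charge ("Q") and the AutoDock atom type ("T")
# are the last two whitespace-separated fields of each ATOM/HETATM line.

def read_charges_and_types(pdbqt_text):
    """Return a list of (atom_name, charge, atom_type) tuples."""
    atoms = []
    for line in pdbqt_text.splitlines():
        if not line.startswith(("ATOM", "HETATM")):
            continue  # skip REMARK, ROOT, BRANCH and other records
        fields = line.split()
        atom_name = fields[2]
        charge = float(fields[-2])
        atom_type = fields[-1]
        atoms.append((atom_name, charge, atom_type))
    return atoms


# Hypothetical fragment, for illustration only.
example = """\
REMARK  illustrative two-atom fragment
ATOM      1  N   ALA A   1      11.104   6.134  -6.504  1.00  0.00    -0.065 N
ATOM      2  CA  ALA A   1      11.639   6.071  -5.147  1.00  0.00     0.136 C
"""

atoms = read_charges_and_types(example)
for name, charge, atom_type in atoms:
    print(name, charge, atom_type)
```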


3.2. Grid Parameter File

At this point, it is necessary to define the grid area of the target, where the docking program will try
to fit the ligand to the receptor. This area should normally contain the active domain (or catalytic
domain) of the target. The active domain is the part of the protein that takes part in the biochemical
process, for example the one involved in a disease. If the user knows roughly where this domain is, he
can define a grid area that includes it. Note that the larger the area, the longer the docking program
takes to search for the best position of the ligand.
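
To give a feeling for how quickly the grid grows with the selected area, here is a small Python sketch that counts grid points. It assumes that the grid is described by a number of intervals per axis, so that each axis has one more point than intervals, and it uses a spacing of 0.375 Angstroms as an illustrative value; both are assumptions, not values taken from this study.

```python
# Sketch only: the interval-based description of the grid and the 0.375
# Angstrom spacing are assumptions used for illustration.

def grid_box(npts, spacing=0.375):
    """Return (number of grid points, box edge lengths in Angstroms).

    npts is the number of intervals along x, y and z, so each axis
    carries npts + 1 grid points.
    """
    points = 1
    for n in npts:
        points *= n + 1
    lengths = tuple(n * spacing for n in npts)
    return points, lengths


small_points, small_lengths = grid_box((40, 40, 40))
large_points, large_lengths = grid_box((80, 80, 80))

print(small_points, small_lengths)
print(large_points, large_lengths)
```

Doubling the edge of the box multiplies the number of grid points by almost eight, which is why a tightly chosen area matters for a screening of thousands of ligands.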

The docking program uses one map for each type of atom present in the area of interest of the receptor
and in the ligand, plus an electrostatic map. These maps are used during the phase that binds a ligand to
the receptor. It is the user's responsibility to decide which atom types to include in these maps.
Normally, this choice depends on the set of ligands used for the docking. We describe below how this
information is made available to the user (see Section 5.1).
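
The relation between the ligand atom types and the maps to compute can be sketched as follows. The file naming scheme (receptor name, atom type, "map" extension, and "e" for the electrostatic map) is an assumption made for this example, and the function name is hypothetical.

```python
# Sketch only: the "<receptor>.<type>.map" and "<receptor>.e.map" names
# are an assumed naming convention, not something fixed by this document.

def required_maps(receptor_stem, ligand_atom_types):
    """List one map per distinct ligand atom type, plus the electrostatic map."""
    types = sorted(set(ligand_atom_types))
    maps = ["%s.%s.map" % (receptor_stem, t) for t in types]
    maps.append("%s.e.map" % receptor_stem)  # electrostatic map
    return maps


# Atom types gathered from a hypothetical ligand set (duplicates allowed).
maps = required_maps("receptor", ["C", "OA", "C", "N", "HD"])
print(maps)
```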

AutoDock uses a file format named AutoGrid Parameter File (GPF) to store the information about the area
of study. The gpf file is used during the initial phase of the screening on the grid. Together with the
pdbqt file, it is what is needed to carry out a screening with AutoDock.

All these steps are carried out by the user with AutoDockTools (ADT), the free GUI of the AutoDock suite.
The MGLTools package contains ADT and other programs. The pdbqt and gpf files are each on the order of
kilobytes in size.

3.3. Which ligands?

In a simple study, ADT lets the user select which ligand to use. In our case, which screens a large set
of ligands, using ADT to select every possible ligand is out of the question. We therefore propose two
possibilities:

• Different sets of ligands, ready to use.

• The user uploads his own set of ligands.

We define this process below, because it is the same whether it is carried out by the maintainer of the
application or by the user.

3.4. Docking selection parameters

AutoDock offers four search methods: simulated annealing (SA), genetic algorithm (GA), local search (LS),
and a hybrid global-local Lamarckian GA (LGA or GALS). At this stage the user defines the parameters that
control the docking search: the seed of the random number generator, the grid energy parameters, the step
size of the ligand (in Angstroms), the number of tries, and so on. More specific parameters can be set
for each search algorithm; for example, for GA the user can modify the number of GA runs, the maximum
number of evaluations, and the crossover rate.
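
A possible way to keep such GA settings together and to write them out as "keyword value" lines is sketched below. The keyword names follow the AutoDock 4 docking parameter file as far as we know it, and the values are purely illustrative, so both should be checked against the AutoDock documentation before use.

```python
# Sketch only: keyword names are believed to match the AutoDock 4 docking
# parameter file; the values are illustrative, not recommendations.

GA_PARAMETERS = [
    ("ga_run", 10),              # number of GA runs
    ("ga_pop_size", 150),        # individuals per generation
    ("ga_num_evals", 2500000),   # maximum number of energy evaluations
    ("ga_crossover_rate", 0.8),  # crossover rate
]


def render_parameters(params):
    """Render (keyword, value) pairs as one 'keyword value' line each."""
    return "\n".join("%s %s" % (key, value) for key, value in params)


text = render_parameters(GA_PARAMETERS)
print(text)
```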


3.5. Running

Finally, the user runs the application, that is, screens all the chosen ligands against his receptor.
The screening consists of many independent tasks, one task per ligand. Before any of them can run,
however, the receptor must be processed with the autogrid program.
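
The structure of a screening (one autogrid task for the receptor, then one independent docking task per ligand) can be sketched as a list of command lines. The executable names "autogrid4" and "autodock4", the file stems, and the function name are assumptions for this example; the commands are only built here, not executed or submitted.

```python
# Sketch only: executable names and file stems are assumptions; nothing
# is executed, the function just builds the command lines.

def build_tasks(receptor_stem, ligand_names,
                autogrid="autogrid4", autodock="autodock4"):
    """Return (grid_task, docking_tasks) as argument lists."""
    # One map calculation for the receptor, shared by all ligands.
    grid_task = [autogrid, "-p", receptor_stem + ".gpf",
                 "-l", receptor_stem + ".glg"]
    # One independent docking task per ligand.
    docking_tasks = [
        [autodock, "-p", name + ".dpf", "-l", name + ".dlg"]
        for name in ligand_names
    ]
    return grid_task, docking_tasks


grid_task, docking_tasks = build_tasks("receptor", ["ligand_a", "ligand_b"])
print(" ".join(grid_task))
for task in docking_tasks:
    print(" ".join(task))
```

Because the docking tasks do not depend on each other, each argument list can become one grid job once the maps are available.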

3.5.1. Calculation of the affinity maps

The autogrid program is used to calculate the affinity maps of the receptor. The protein is embedded in a
three-dimensional grid and a probe atom is placed at each point of this grid. The interaction energy is
calculated and assigned to that point. An affinity grid is calculated for each type of atom. The result
of this calculation is one "map" format file per atom type. With these maps, the docking process is
faster.
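
The idea of precomputing an energy at every grid point can be illustrated with a toy example. The sketch below uses a simple 12-6 Lennard-Jones term with made-up parameters; it is not the force field used by autogrid, and the function name and all numeric values are hypothetical.

```python
import math

# Toy sketch only: a plain 12-6 Lennard-Jones term with made-up parameters,
# NOT the energy function of autogrid.

def toy_affinity_map(receptor_atoms, center, npts, spacing,
                     epsilon=0.2, r_min=4.0):
    """Return {(i, j, k): energy} for a probe atom placed at each grid point."""
    nx, ny, nz = npts
    cx, cy, cz = center
    grid = {}
    for i in range(nx + 1):
        for j in range(ny + 1):
            for k in range(nz + 1):
                # Coordinates of this grid point, box centred on `center`.
                x = cx + (i - nx / 2.0) * spacing
                y = cy + (j - ny / 2.0) * spacing
                z = cz + (k - nz / 2.0) * spacing
                energy = 0.0
                for ax, ay, az in receptor_atoms:
                    r = math.sqrt((x - ax) ** 2 + (y - ay) ** 2 + (z - az) ** 2)
                    r = max(r, 0.5)  # avoid a division by zero on top of an atom
                    ratio = r_min / r
                    energy += epsilon * (ratio ** 12 - 2.0 * ratio ** 6)
                grid[(i, j, k)] = energy
    return grid


# One receptor atom at the origin, grid centred 4 Angstroms away.
grid = toy_affinity_map([(0.0, 0.0, 0.0)], center=(4.0, 0.0, 0.0),
                        npts=(2, 2, 2), spacing=1.0)
print(len(grid), grid[(1, 1, 1)])
```

During docking, the energy of a ligand atom can then be looked up in the grid instead of being recomputed against every receptor atom, which is where the speed-up comes from.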

3.5.2. Screening

The autodock program is in charge of searching for the best position of a ligand on the receptor, using
the maps generated before and the algorithm chosen by the user. This step needs access to the map files
of the receptor and to the pdbqt and dpf files of each ligand. The result is a dlg format file, which can
exceed 30 MB. This file contains information about the best placements of the ligand and the energy of
the interaction with the receptor.

3.5.3. Classification

At the end of the whole screening, we need to rank the results in order to present the user with the best
hits of his search. The ranking is based on the lowest energy of each docking. AutoDock provides a Python
script that does this automatically. It supports different options, which the user should be asked for at
the beginning of the job.
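
The ranking step itself is simple and can be sketched independently of the script shipped with AutoDock. The example below assumes that each dlg file reports its energies on lines containing "Estimated Free Energy of Binding = <value>"; this pattern, the function names, and the sample logs are assumptions for illustration.

```python
import re

# Sketch only: the line pattern searched for in the dlg text is an
# assumption about the log format; the sample logs are invented.

ENERGY_PATTERN = re.compile(
    r"Estimated Free Energy of Binding\s*=\s*([-+]?\d+\.\d+)")


def best_energy(dlg_text):
    """Return the lowest energy reported in one docking log, or None."""
    energies = [float(value) for value in ENERGY_PATTERN.findall(dlg_text)]
    return min(energies) if energies else None


def rank_ligands(dlg_texts, top=10):
    """Return up to `top` (ligand, energy) pairs, lowest energy first."""
    scored = []
    for name, text in dlg_texts.items():
        energy = best_energy(text)
        if energy is not None:
            scored.append((energy, name))
    scored.sort()
    return [(name, energy) for energy, name in scored[:top]]


logs = {
    "ligand_a": ("USER    Estimated Free Energy of Binding    =   -5.10 kcal/mol\n"
                 "USER    Estimated Free Energy of Binding    =   -7.25 kcal/mol\n"),
    "ligand_b": "USER    Estimated Free Energy of Binding    =   -6.00 kcal/mol\n",
    "ligand_c": "docking did not finish\n",
}
ranking = rank_ligands(logs)
print(ranking)
```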

4. Programs to be used
This section presents all the programs needed to offer the Meta-Dock grid application.

4.1. AutoDockTools (ADT)

ADT is used to prepare the receptor for the study. It is a graphical user interface based on Python and
Tcl/Tk. It is included in a larger package, MGLTools (http://mgltools.scripps.edu/downloads).


The installation of this package is very easy, as it comes in binary form. The only requirement is that
the compat-libstdc++ libraries are installed on the machine where ADT will be used. The installation
proceeds as follows:

jojot@sl4 $ wget http://mgltools.scripps.edu/downloads/tars/releases/REL1.5.2/MGLTools-1.5.2-Linux-x86-Install
jojot@sl4 $ chmod +x MGLTools-1.5.2-Linux-x86-Install
jojot@sl4 $ ./MGLTools-1.5.2-Linux-x86-Install

A window will pop up to ask some questions about the licence and the installation directory.

To access the MGLTools programs, we just have to add the installation directory
/dir/of/instalation/of/MGLTools-1.5.2/bin/ to the PATH variable of the user.

4.2. Autogrid and Autodock

Autogrid 4 is used to prepare the atom maps of the receptor, depending on the chosen library of ligands.
Based on the gpf file of the receptor and on the characteristics of the ligands, one file is written for
each type of atom ("*.map" files). These files can be on the order of megabytes, depending on the size of
the selected docking area.

The autodock program is used to compute the binding of a specific ligand to the receptor. It needs all
the map files, the pdbqt files of the ligand and of the receptor, and the GPF file of the receptor. These
files have to be in the directory where the program runs.

Installing these two programs requires a C++ compiler, such as the gcc-c++ package available on
Scientific Linux. No special library is needed. The process uses the familiar "configure/make/make
install" sequence:

jojot@sl4 $ tar xzf autodocksuite-4.0.1-all.tar.gz
jojot@sl4 $ cd autodocksuite-4.0.1
jojot@sl4 $ cd src/autogrid
jojot@sl4 $ ./configure
jojot@sl4 $ make && make install
jojot@sl4 $ cd ../autodock
jojot@sl4 $ ./configure
jojot@sl4 $ make && make install

We have written a SPEC file that allows the creation of an RPM package, which should make it easier to
install the programs wherever they are needed. The tests we ran on different Red Hat distributions
revealed no known issues.


The executables are about 500 KB for autodock and 70 KB for autogrid. On the other hand, the result file
(dlg format) can be larger than 30 MB. This has to be taken into account, since a ligand library could
have 5,000 members. In that case, we would need 150 GB of storage for the results of a single job.
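
The storage figure follows directly from the two numbers given above, as this short Python check shows. It assumes one dlg file per ligand and decimal units (1 GB = 1,000 MB); the function name is hypothetical.

```python
# Sketch only: assumes one dlg file per ligand and 1 GB = 1,000 MB.

def result_storage_gb(n_ligands, dlg_size_mb=30, mb_per_gb=1000):
    """Estimate the storage needed for the dlg files of one screening job."""
    return n_ligands * dlg_size_mb / float(mb_per_gb)


storage = result_storage_gb(5000)
print(storage)
```

Since 30 MB is a lower bound on the size of a large dlg file, the result should be read as a lower bound as well.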

5. Preparation of the Ligands


As with the receptor, the ligands have to be prepared for the docking. That is, each ligand has to be
described in a PDBQT format file. Moreover, we have to determine the set of atom types that covers the
ligands.

If the user has his own set of ligands that he wants to try, preparing it is his own responsibility. That
means he must supply the correct files to the program. Of course, we can provide a method to check that
all the necessary files are present before the run begins.
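
Such a check could look like the following Python sketch. It assumes a flat directory in which every ligand has one pdbqt file and one dpf file named after the ligand; the directory layout and the function name are hypothetical.

```python
import os

# Sketch only: assumes a flat directory with "<ligand>.pdbqt" and
# "<ligand>.dpf" files; the layout is a hypothetical choice.

def missing_ligand_files(ligand_dir, ligand_names,
                         extensions=(".pdbqt", ".dpf")):
    """Return the paths of the expected ligand files that do not exist."""
    missing = []
    for name in ligand_names:
        for ext in extensions:
            path = os.path.join(ligand_dir, name + ext)
            if not os.path.isfile(path):
                missing.append(path)
    return missing
```

An empty list means that the run can start; otherwise the list can be reported to the user before any job is submitted.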

On the other hand, we will offer a set of ligands of interest. Fortunately, AutoDock provides a pair of
Python scripts that facilitate this heavy processing of ligand databases.

5.1. ZINC, a library of ligands

As a first step, we propose to use ZINC (http://zinc.docking.org/) as the set of ligands. We chose it
because the whole database is freely accessible. However, this library includes commercial ligands that
are subject to a usage licence. We have to think of a way to present the two different licences of the
molecules to the end users.

Downloading a set of molecules consists of getting a mol2 file (a multi-molecule file format) that
contains all the information needed to generate a PDB format file for each ligand. AutoDock provides a
Python script, prepare_ligand.py, that can automatically generate from a mol2 file one directory per
ligand, each containing a PDBQT file. All these files are necessary to run the docking program. The size
of a ligand pdbqt file is of the same order as that of the receptor pdbqt file.
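
The first part of this processing, separating a multi-molecule mol2 file into one block per ligand, can be sketched in a few lines of Python. This is not the AutoDock script itself; it only relies on the mol2 convention that every molecule starts with an "@<TRIPOS>MOLECULE" record followed by the molecule name, and the function names and sample molecules are hypothetical.

```python
# Sketch only: splits on "@<TRIPOS>MOLECULE" records; it does not convert
# anything to PDBQT and is not the script shipped with AutoDock.

def split_mol2(mol2_text):
    """Split a multi-molecule mol2 text into one text block per molecule."""
    molecules = []
    current = []
    for line in mol2_text.splitlines():
        is_start = line.startswith("@<TRIPOS>MOLECULE")
        if is_start and current:
            molecules.append("\n".join(current) + "\n")
            current = []
        if is_start or current:
            current.append(line)  # lines before the first record are dropped
    if current:
        molecules.append("\n".join(current) + "\n")
    return molecules


def molecule_name(block):
    """The molecule name is the line right after the MOLECULE record."""
    return block.splitlines()[1].strip()


# Two invented single-atom molecules, for illustration only.
sample = """\
# header comment
@<TRIPOS>MOLECULE
ligand_a
 1 0 0 0 0
SMALL
USER_CHARGES
@<TRIPOS>ATOM
      1 C1   0.000  0.000  0.000 C.3 1 LIG 0.000
@<TRIPOS>MOLECULE
ligand_b
 1 0 0 0 0
SMALL
USER_CHARGES
@<TRIPOS>ATOM
      1 N1   0.000  0.000  0.000 N.3 1 LIG 0.000
"""

blocks = split_mol2(sample)
names = [molecule_name(block) for block in blocks]
print(names)
```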

Moreover, the script is able to generate a dictionary of all the atom types present in the set of
molecules. This information is important from the user's point of view, as it lets him adapt his receptor
in the best way with the ADT program.
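
What such a dictionary might contain can be illustrated as follows. The sketch counts, over a set of ligand PDBQT texts, how often each atom type occurs, assuming that the atom type is the last field of every ATOM or HETATM line; it is not the AutoDock script, and the function name and sample ligands are hypothetical.

```python
from collections import Counter

# Sketch only: assumes the atom type is the last whitespace-separated
# field of each ATOM/HETATM line; sample ligands are invented.

def atom_type_dictionary(pdbqt_texts):
    """Count the atom types found across a collection of PDBQT texts."""
    counts = Counter()
    for text in pdbqt_texts:
        for line in text.splitlines():
            if line.startswith(("ATOM", "HETATM")):
                counts[line.split()[-1]] += 1
    return dict(counts)


ligand_1 = (
    "ATOM      1  C1  LIG     1       0.000   0.000   0.000  0.00  0.00     0.100 C\n"
    "ATOM      2  O1  LIG     1       1.200   0.000   0.000  0.00  0.00    -0.300 OA\n"
)
ligand_2 = (
    "ATOM      1  C1  LIG     1       0.000   0.000   0.000  0.00  0.00     0.050 C\n"
    "ATOM      2  N1  LIG     1       1.400   0.000   0.000  0.00  0.00    -0.200 N\n"
)

types = atom_type_dictionary([ligand_1, ligand_2])
print(sorted(types.items()))
```

The keys of the dictionary are exactly the atom types for which the user needs to request maps when preparing the receptor.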

This processing has to be done before a user sends a job to the grid. The information about each set of
ligands has to be accessible to the user, so that he can choose which one he wants to test. As for the
storage capacity needed, each file describing a ligand is a few KB. If we offer more than 5,000 ligands,
this amounts to about 30 MB (just for the EELA grid school; the number of ligands should be larger on a
production grid).


In the grid school, we will try to offer a general set of ligands and a more specific one (we have not
yet defined it, as the test receptor has not yet been clearly chosen).

6. Benefit of the grid?


The main benefit of using a grid lies in the huge need for computing resources. Some docking searches can
run for hours. As the screening contains thousands of ligands, the possibility of running the docking
program for all ligands at the same time is an attractive aspect. But we are aware that we need control
over this whole process. Moreover, we need storage to save all the ligand databases and all the results.
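
A rough estimate shows the order of magnitude of this benefit. The sketch below assumes that every docking takes the same time, that tasks run in full "waves" of one task per worker, and that scheduling and data transfer cost nothing; the figure of 2 hours per docking and the number of workers are illustrative assumptions, not measurements.

```python
import math

# Sketch only: identical task durations, no scheduling or transfer
# overhead; the numbers used below are illustrative assumptions.

def screening_wall_time_hours(n_ligands, hours_per_docking, n_workers):
    """Wall-clock time when tasks run in waves of n_workers at a time."""
    waves = math.ceil(n_ligands / float(n_workers))
    return waves * hours_per_docking


sequential = screening_wall_time_hours(5000, 2, 1)
on_grid = screening_wall_time_hours(5000, 2, 250)
print(sequential, on_grid)
```

Under these assumptions, a screening that would occupy a single machine for more than a year finishes in less than two days on 250 workers.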
