
Massive Docking of Flexible Ligands Using Environmental Niches in Parallelized Genetic Algorithms

MICHAEL THORMANN,* MIQUEL PONS


Departament de Química Orgànica, Universitat de Barcelona, Martí i Franquès, 1-11, 08028 Barcelona, Spain

Received 19 January 2001; accepted 4 July 2001

ABSTRACT: Virtual screening of large libraries of small compounds requires fast and reliable automatic docking methods. In this article we present a parallel implementation of a genetic algorithm (GA) and the implementation of an enhanced genetic algorithm (EGA) with niching that lead to remarkable speedups compared to the original version of AutoDock 3.0. The niching concept is introduced naturally by sharing genetic information between evolutions of subpopulations that run independently, each on one CPU. A unique set of additionally introduced search parameters that control this information flow has been obtained for drug-like molecules based on the detailed study of three test cases of different complexity. The average docking time for one compound is 8.6 s using eight R10,000 processors running at 200 MHz in an Origin 2000 computer. Different genetic algorithms with and without local search (LS) have been compared on an equal workload basis, showing EGA/LS to be superior over all alternatives because it finds lower energy solutions faster and more often, particularly for high dimensionality problems. © 2001 John Wiley & Sons, Inc. J Comput Chem 22: 1971-1982, 2001

Keywords: automated docking; genetic algorithm; niching; parallel computing

Correspondence to: M. Pons; e-mail: mpons@qo.ub.es or M. Thormann; e-mail: michael.thormann@morphochem.de
*Present address: Morphochem AG, Gmunder Str. 37-37a, 81379 München, Germany
Contract/grant sponsors: EEC (TMR) and Studienstiftung des deutschen Volkes/BASF
Contract/grant sponsor: Spanish Ministerio de Educación y Cultura; contract/grant number: PB97-0933
Contract/grant sponsor: Generalitat de Catalunya

Journal of Computational Chemistry, Vol. 22, No. 16, 1971-1982 (2001)
© 2001 John Wiley & Sons, Inc.

THORMANN AND PONS

Introduction
The number of known three-dimensional structures of biological molecules is growing exponentially. Current efforts to determine the three-dimensional structures of all the expressed proteins in several living organisms are expected to increase the number of available structures dramatically. A significant number of these proteins are potential targets for drug design.

Experimental high throughput screening methods are being increasingly used to find new lead compounds that bind to a target protein.1,2 However, there is a need for faster and less expensive computational methods to perform a virtual prescreening of large compound libraries,3-5 where genetic algorithms (GAs) are successfully used.6,7

Docking algorithms are designed to automatically find the best mode of interaction between a small, possibly flexible, ligand and a large, usually rigid, macromolecular receptor.3,8-14 This is done by minimization of the energy of interaction, which is a complex function of variables describing the position, orientation, and conformation of the ligand.

In this article we present an enhanced search algorithm that employs a parallelized implementation of AutoDock 3.0 on eight CPUs of a Silicon Graphics Origin 2000. Our implementation reduces the real time needed to dock an average ligand to 8.6 s. This would allow the search of a 15,000-compound library in about 1 day.

AutoDock uses a grid-based method to evaluate the interaction energy between the rigid macromolecular receptor and the ligand, and is parametrized to provide estimates of the binding constant.15,16 This approach is well suited for docking of a large number of ligands to the same protein, because the grids are calculated only once. The AutoDock force field has been proven to lead to docked structures that are in close agreement with known X-ray structures (rmsd < 1 Å). Recently, Diller and Verlinde17 have evaluated different search algorithms for the molecular docking problem.

An extension to GAs is the introduction of a Lamarckian genetic algorithm in AutoDock 3.0. The genetic algorithm allows a very efficient search of the high-dimensional search space composed of the position and orientation variables plus all the conformational degrees of freedom of the ligand. Random values are initially assigned independently to the different variables, and random mutations of individual "genes" help maintain the variability. Various genetic operators are applied to the individuals of the population, their fitness is evaluated, and those that have better-than-average fitness create offspring in the next generation. In a Lamarckian GA, the values of the variables are adjusted in a deterministic way by a local optimizer, and these "adapted genes" become heritable traits if better fitness is achieved.

Genetic algorithms are inspired by the evolution of an isolated population of individuals under the pressure of selection. Niching has been proven to be a useful concept in the design of genetic algorithms.18-20 The whole population is divided up into a number of isolated subpopulations, and evolution takes place independently in each subpopulation. Hence, evolutionary pressure is determined only by the competition between individuals inside each subpopulation. Niching introduces variability between the different subpopulations, as each subpopulation stems from different randomly created ancestor genes, and this variability can be exploited at a different level by horizontal gene transfer between subpopulations. The niching concept can be introduced naturally in parallel implementations of genetic algorithms, as the evolution of each subpopulation can be computed in a separate CPU.

Methods
PARALLELIZATION OF AUTODOCK

In nondeterministic methods, correct results have to be sought on a probabilistic basis. The search procedure is repeated several times for a given ligand with the same input for its structure, receptor, and search parameters. The results are then clustered according to geometry, ranked by energy, and written to the output. In our parallel implementation, one individual search is performed in each subthread. In this SMP (shared memory processing) implementation only one copy of the interaction energy grid maps is necessary. Thus, the total amount of RAM required is reduced. As the docking of entire structural databases might involve ligands with many different atom types, the saved RAM can be used to store a larger number of different grid maps. Alternatively, higher resolution grid maps may be used, which could allow higher accuracy in the energy evaluations. Finally, as the grid size scales as n"5, where n is the number of atoms, the more efficient use of RAM may be crucial for the larger macromolecular targets.

Figure 1 shows the implementation of a Genetic Algorithm with Local Search (GA/LS) docking algorithm on a shared memory processor machine, a Silicon Graphics Origin 2000 with 54 processors. The parallelization fulfills the latest OpenMP standard and, thus, makes the implementation portable to a variety of machines. The results presented in this article make use of only eight processors to facilitate implementation on affordable machines, and the program has been ported successfully to an Intel processor-based SMP machine (Dell PowerEdge 6300 with 2 GB RAM and 4 Pentium III Xeon processors running at 500 MHz).

This form of parallelization reduces necessary and slow context switching between serial and parallel execution to a minimum, as parallel regions are entered and left only once for each ligand. Also, synchronization of the concurrently running threads is reduced to a minimum, as evolutionary searches can run uncoupled due to their independence.

FIGURE 1. Current implementation of parallel GA/LS. Most of the work is executed in parallel fashion, considering thread-safe integrity of the data on each of the eight subthreads. Each subthread executes three of the 24 evolution runs.

The parallel implementation results in a reduction of real execution time of the GA/LS-based docking. The maximum speedup that can be achieved theoretically is N, which is equal to the number of processors. Under nonbenchmark conditions, the speedup reaches about 95% of this value.

Independent populations undergoing evolutions can be seen as isolated subpopulations of one whole population in a genetic algorithm with niching. Parallel processing, although not essential for the introduction of niching, offers a natural way of implementation, as the subpopulations are available concurrently. This feature has been used in the implementation of the enhanced genetic algorithm (EGA) described in the following sections.

NOVEL PARALLEL EGA WITH NICHING

There are many possible algorithms to introduce information sharing between subpopulations.21,22 In the following we present a possibility that offers both flexibility and simplicity.

The King and Pawns EGA Algorithm

Gene transfer between individuals ("Pawns") from isolated subpopulations is made through one external individual, which is called "King." The subpopulations do not "see" each other directly, but share information only via the King. The usual genetic operators drive evolution in each subpopulation: Crossover, Mutation, and Elitism, along with the necessary energy evaluation (Mapping) and Generation of new populations. Two new genetic operators carry out the information sharing between subpopulations: Pawn_to_King and King_to_Pawn. Three new parameters that control these genetic operators must be set for the EGA docking protocol: king_freq, king_diff, and pawn_freq.

Pawn_to_King: The fittest Pawn can become the King, i.e., overwrite the actual King's genes with its own ones, if it fulfills two requirements:

1. E_Pawn - king_diff < E_King. Its fit (docking energy) minus a selected value must be lower than the actual King's energy. The parameter king_diff is given in kcal/mol, and can be negative or positive, thus accepting better genes or worse genes for the new King, respectively.
2. The fittest Pawn that fulfills the first requirement must also be selected with the frequency king_freq; i.e., the fittest Pawn is not always accepted if king_freq < 1.0.
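As an illustration only, the two acceptance requirements of the Pawn_to_King operator can be written as a short Python sketch. The function name, its signature, and the use of Python's random module are assumptions made for this illustration; they are not taken from the AutoDock source code.

```python
import random


def pawn_to_king(king_energy, pawn_energy, king_diff, king_freq, rng=random):
    """Return True if the fittest Pawn should overwrite the King's genes.

    Energies and king_diff are in kcal/mol; lower energy means better fit.
    (Hypothetical helper, written only to illustrate the two requirements.)
    """
    # Requirement 1: E_Pawn - king_diff < E_King.
    # A negative king_diff demands a Pawn that is better by a margin;
    # a positive king_diff also admits slightly worse Pawns.
    passes_energy_test = (pawn_energy - king_diff) < king_energy
    # Requirement 2: the Pawn is only accepted with frequency king_freq.
    passes_frequency_test = rng.random() < king_freq
    return passes_energy_test and passes_frequency_test


# With king_freq = 1.0 only the energy criterion decides.
print(pawn_to_king(-10.0, -9.8, king_diff=0.5, king_freq=1.0))   # worse Pawn, tolerated
print(pawn_to_king(-10.0, -9.8, king_diff=-0.5, king_freq=1.0))  # worse Pawn, rejected
```

With king_freq = 0 the function never accepts a Pawn, which matches the statement that king_freq must be greater than zero for the King to be optimized at all.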


FIGURE 2. Order of application of the genetic operators in each subpopulation.


King_to_Pawn: The King inseminates, i.e., overwrites the genes of a randomly selected Pawn with the frequency pawn_freq.

Figure 2 illustrates the order of application of the additional genetic operators in each subpopulation, and Figure 3 shows a scheme of the implementation of the King-and-Pawn algorithm.

At the beginning of the search, each individual of each subpopulation is initialized randomly. Thus, the diversity of genetic information is maximized. Diversity decreases during evolution, and the analyses of success rates over the generations show that a plateau is reached after a rather low number of generations (see Results). New information is introduced by mutations but also because subpopulations are reinitialized [...]. The King individual, which is not reinitialized, carries the valuable information to the next round, and biases the evolution of the new population rapidly [...]. The King, who is initialized with random values at the beginning of the first run, improves during [...] by sharing information with the best Pawns.

From a technical point of view, the dynamic process of creating, copying, and [...] populations becomes relatively time consuming. In the parallel implementation a heap contention problem must be solved. This problem and an excellent solution have been presented by Berger et al.24 Their [...] has been successfully applied in our implementation.

FIGURE 3. The EGA/LS in detail.

Local Search (LS)

In GA/LS, the use of a local optimizer increases the fit of individuals [...]. Local optimization has been shown to dramatically improve the success rates of GA searches at the expense of a larger number of energy evaluations. The choice of the local optimizer determines the number of energy minimization steps and the efficiency of the [...]. Local optimization can, however, cause the system to be trapped in local minima [...]. [...] for a successful protocol is finding a proper balance between convergence of the local [...] and maintenance of the diversity of [...].
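To make the role of the local optimizer more concrete, the following Python sketch shows a generic Solis-Wets style random local search. It is not the AutoDock implementation: the function name, the bias update coefficients (0.2, 0.4, 0.5), and the step-size factors (2.0 and 0.5) are the commonly quoted textbook choices and are assumptions here. The default arguments mirror the sw_ values listed in Table I.

```python
import random


def solis_wets(f, x0, max_its=300, max_succ=4, max_fail=4,
               rho=1.0, lb_rho=0.01, rng=None):
    """Minimize f starting from x0 with an adaptive random walk (sketch)."""
    rng = rng or random.Random(0)
    x = list(x0)
    fx = f(x)
    bias = [0.0] * len(x)
    succ = fail = 0
    for _ in range(max_its):
        if rho < lb_rho:  # step size has shrunk below the lower bound
            break
        dev = [rng.gauss(0.0, rho) for _ in x]
        step = [b + d for b, d in zip(bias, dev)]
        forward = [xi + si for xi, si in zip(x, step)]
        f_forward = f(forward)
        if f_forward < fx:
            # Success in the forward direction: keep it and reinforce the bias.
            x, fx = forward, f_forward
            bias = [0.2 * b + 0.4 * s for b, s in zip(bias, step)]
            succ, fail = succ + 1, 0
        else:
            backward = [xi - si for xi, si in zip(x, step)]
            f_backward = f(backward)
            if f_backward < fx:
                # Success in the opposite direction: reverse the bias.
                x, fx = backward, f_backward
                bias = [b - 0.4 * s for b, s in zip(bias, step)]
                succ, fail = succ + 1, 0
            else:
                # Both trials failed: damp the bias.
                bias = [0.5 * b for b in bias]
                succ, fail = 0, fail + 1
        if succ >= max_succ:    # expand after repeated successes
            rho, succ = 2.0 * rho, 0
        elif fail >= max_fail:  # contract after repeated failures
            rho, fail = 0.5 * rho, 0
    return x, fx


def sphere(v):
    """Toy objective standing in for the docking energy."""
    return sum(vi * vi for vi in v)


best_x, best_f = solis_wets(sphere, [3.0, -2.0])
```

Each iteration costs one or two function evaluations, which illustrates why a local search raises the number of energy evaluations spent per individual.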



The original GA/LS protocol of AutoDock 3.0 (ref. 15) consists of 10 runs with 1,500,000 energy evaluations per ligand using the pseudo-Solis-Wets optimizer. We have explored the pseudo-Solis-Wets and the Solis-Wets optimizers (see Results). Although the latter, in our experience, requires more energy evaluation steps, it succeeds more often and leads to faster convergence. In contrast, the pseudo-Solis-Wets optimizer requires fewer energy evaluations and thus favors global exploration. In our EGA protocol, the diversity is increased by enlarging the total number of individuals spread over the subpopulations and maintained by allowing only weak interaction between them. The best results have been obtained applying the Solis-Wets optimizer.

TEST CASES

During the development of EGA/LS we have focused on three different complexes for which the structure is known. The benzamidine-β-trypsin complex (3ptb) and the XK263-HIV-protease complex (1hvr) were taken directly from the AutoDock 3.0 distribution and, hence, allow straightforward comparison with previous results obtained by Morris et al.15 The forskolin-adenylyl cyclase complex (1ab8) has been studied by Behnke et al.16 The diterpene forskolin is a mid-size drug molecule (MW = 411) that fills the complexity gap between the trivial benzamidine and the more complicated XK263 ligand. The structures of the three ligands are presented in Figure 4.

As a test for a problem of high complexity we have additionally explored the assembly of two fragments of protein A (1bdd). This protein consists of three tightly packed helices that interact by hydrophobic contacts. A 14 amino acid α-helical peptide was removed from the crystal structure of protein A, and we studied the docking of this fragment to re-form the complete structure. For the macromolecular target and the peptidic ligand, polar hydrogens were added, and CHARMm 22.0 partial charges were assigned.

Results and Discussion

OPTIMIZATION OF EGA/LS PARAMETERS

A successful implementation of the EGA/LS algorithm requires finding a proper balance between the parameters that favor variability and those leading to convergence. The optimum is expected to be problem dependent, but it is likely that similar parameters are applicable for problems of similar complexity. Most drugs comply with a series of structural and physico-chemical properties.25 The three test cases are representatives of different levels of complexity in drug-like molecules, and we believe that the parameters reported should be of general use for lead search. For high-dimensional problems, however, these parameters and the docking protocol as well might need adjustment.

The best protocol we have found so far is the 8 · 3 · 10,000 protocol, which consists of eight coupled subpopulations undergoing three rounds of evolutionary search with 10,000 energy evaluations each. Elitism is applied on two different levels: within each subpopulation and from one round to the next via the external King. After the third round has finished, 24 individuals, the best of each subpopulation after termination of each round, are clustered geometrically and ranked by their docking energies.

The frequencies king_freq and pawn_freq control the coupling of the subpopulations. With pawn_freq = 0 there would be no gene transfer between the subpopulations, and the EGA/LS algorithm would be equivalent to 24 conventional GA/LS runs. With pawn_freq = 1, the King would overwrite all individuals when a new generation is created and, hence, diversity would be lost instantaneously. Here, a balance must be found between gene transfer and maintenance of the diversity between the subpopulations. To optimize the King, king_freq must be greater than zero. The way the King evolves is controlled by king_diff. A large negative value for king_diff may prevent evolution of the King and prevent the evolution of the populations. Positive values of king_diff allow the King to be replaced by less fit Pawns. Slightly positive values may be useful in high-dimensional problems to avoid local minima.
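The two limiting cases of pawn_freq discussed above can be reproduced with a small Python sketch of the King_to_Pawn operator. Treating pawn_freq as a per-individual overwrite probability applied once per generation is an interpretation of the text, and the function name and list-based gene representation are assumptions made for illustration.

```python
import random


def king_to_pawn(population, king, pawn_freq, rng):
    """Overwrite each Pawn with a copy of the King's genes with probability
    pawn_freq; return how many Pawns were replaced (illustrative sketch)."""
    replaced = 0
    for i in range(len(population)):
        if rng.random() < pawn_freq:
            population[i] = list(king)  # copy, so later mutations stay independent
            replaced += 1
    return replaced


rng = random.Random(1)
king = [0.0, 0.0, 0.0]
pawns = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(25)]

uncoupled = [list(p) for p in pawns]
n_none = king_to_pawn(uncoupled, king, pawn_freq=0.0, rng=rng)  # no gene transfer

collapsed = [list(p) for p in pawns]
n_all = king_to_pawn(collapsed, king, pawn_freq=1.0, rng=rng)   # diversity lost
```

Intermediate values replace only an occasional Pawn, so the subpopulations stay largely independent while still receiving the King's genes from time to time.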

FIGURE 4. Structures of the ligands used in this work: (a) benzamidine, (b) XK-263, (c) forskolin.



The parameters have been chosen from a large number of random combinations so that the best results are obtained using the three test cases. In exploratory experiments allowing the parameters to cover the entire range of values, 1000 experiments were set up for each test case with random values for the three parameters, with [...] ≤ 0.8, 0 ≤ pawn_freq ≤ [...]. The resulting three-dimensional surfaces have been examined (for an example, see Fig. 5), and the final parameters have been selected from them. The best parameters are presented in Table I. For a population size of 25, the best pawn_freq has been found to be around 3%. This results in 0.75 Pawns on average that are replaced by the King in every generation. Thus, evolution in the subpopulations runs uncoupled for some generations (Fig. 6).

FIGURE 5. Dependence of EGA/LS docking energies on varying values for pawn_freq and king_freq in 1000 experiments with 1hvr/XK-263, employing random values for king_diff between -1 and 2 kcal/mol.

TABLE I. Best Docking Parameters for EGA/LS.

Parameter (a)        Value     Brief description
ga_pop_size          25        number of individuals per subpopulation
ga_num_evals         10,000    number of energy evaluations per round
ga_elitism           1         number of individuals that automatically survive from one generation to the next per subpopulation
ga_mutation_rate     0.06      frequency of individuals undergoing mutation
ga_crossover_rate    0.80      frequency of individuals undergoing crossover
ga_window_size       10        picking the worst individual after 10 generations
ga_cauchy_alpha      0         parameter used in the mutation of genes
ga_cauchy_beta       1         parameter used in the mutation of genes
set_ga                         perform GA search
sw_max_its           300       parameter used by the local optimizer
sw_max_succ          4         parameter used by the local optimizer
sw_max_fail          4         parameter used by the local optimizer
sw_rho               1.0       parameter used by the local optimizer
sw_lb_rho            0.01      parameter used by the local optimizer
ls_search_freq       0.06      frequency of individuals undergoing local optimization
set_sw1                        employ the Solis-Wets optimizer
ega_king_freq        0.150     (b)
ega_pawn_freq        0.030     (b)
ega_king_diff        0.500     (b)
ega_run              24        do that many EGA runs (using eight CPUs, this leads to three rounds)
analysis                       perform ranking and clustering analysis

(a) ga_ and sw_ parameters are standard AutoDock parameters controlling the genetic algorithm and the Solis-Wets local search.
(b) EGA parameters are defined in the text.
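The bookkeeping behind the 8 · 3 · 10,000 protocol and the values in Table I can be checked with a few lines of Python. The class and attribute names below are hypothetical and exist only for this worked example; the numbers are the ones quoted in the text and in Table I.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EgaProtocol:
    subpopulations: int = 8        # one per CPU
    rounds: int = 3                # evolution rounds per subpopulation
    evals_per_round: int = 10_000  # ga_num_evals
    pop_size: int = 25             # ga_pop_size
    pawn_freq: float = 0.030       # ega_pawn_freq

    @property
    def total_energy_evaluations(self) -> int:
        return self.subpopulations * self.rounds * self.evals_per_round

    @property
    def final_candidates(self) -> int:
        # Best individual of each subpopulation after each round (ega_run).
        return self.subpopulations * self.rounds

    @property
    def total_individuals(self) -> int:
        return self.subpopulations * self.pop_size

    @property
    def expected_pawns_replaced(self) -> float:
        # Average number of Pawns overwritten by the King per generation
        # in one subpopulation.
        return self.pop_size * self.pawn_freq


protocol = EgaProtocol()
print(protocol.total_energy_evaluations, protocol.final_candidates,
      protocol.total_individuals, protocol.expected_pawns_replaced)
```

The totals correspond to the 240,000 energy evaluations per solution, the 24 clustered individuals, and the 200 individuals used for the equal-workload comparisons.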

FIGURE 6. Effect of the genetic operator King_to_Pawn in one subpopulation during a successful run of 1hvr/XK263.

EFFECT OF ENVIRONMENTAL NICHES

Our new algorithm was tested with all three cases above, performing 24 evolutions for each ligand. This procedure was repeated 100 times, thus leading to 100 final complex structures (solutions) with corresponding binding energies for each ligand.

To measure the effect of environmental niches, we have compared the success rates after 100 comparable searches on the three test cases using GA, EGA, GA/LS, and EGA/LS. Solutions within a 1 kcal/mol range from the lowest energy solution are considered to be correct. In all three test cases, the lowest energy solutions obtained by EGA/LS and by GA/LS, respectively, were identical.

To compare the different protocols on an equal workload basis, the total number of energy evaluations per solution was kept constant at 240,000. The total number of individuals was set to 200 in all cases except in one GA/LS run, where the number of individuals was reduced to 50, allowing an average number of energy evaluations per individual four times higher. For EGA and EGA/LS the individuals were divided in eight niches containing 25 individuals each. The experiments are illustrated in Figure 7. GA and GA/LS runs provide a single solution. In the EGA and EGA/LS runs the lowest energy conformation out of the 24 resulting from the 8 · 3 · 10,000 protocol was considered as the solution. The 100 ranked docking energies for GA, EGA, GA/LS, and EGA/LS are presented in Figure 8.

The effect of niching can be deduced from the comparison of the EGA and GA runs. In the three test cases, the introduction of environmental niches in EGA leads to the finding of lower energies. The energy differences are larger in the more complex case 1hvr, for which the ligand XK263 has more degrees of freedom (18) than forskolin (12). None of the two genetic algorithms alone, however, is

sufficiently successful in finding reliably correct solutions under the harsh limitation of only 240,000 energy evaluations.

As shown in Morris et al.,15 the local search remarkably enhances the success of genetic algorithms. Figure 8 shows that both EGA and GA are remarkably improved by the [...].

FIGURE 7. The design of the series of experiments using different GA with and without local search, maintaining constant the total amount of energy evaluations per solution.

The King and Pawn algorithm introduces a degree of elitism across the populations. To evaluate the effect of niching independently of [...], we have run the EGA/LS protocol with a single population of 200 individuals [...] elitism [...] comparable to the value of 1 used in each of the eight subpopulations in the 8 · 3 · 10,000 protocol. In the [...] test case, using eight subpopulations leads clearly to the correct solution more often (76%) than using a single population with the same total number of individuals (66%). For the simpler test cases, however, [...] the EGA/LS algorithm finds solutions with lower energies [...] and also finds them more often.

The effect of niching seems to be particularly important in high-dimensional problems. [...] Protein A (1bdd) consists of three helices (α1, α2, and α3) that interact by hydrophobic interactions. Helix α3 was removed and used as ligand for the remaining protein macromolecule. When α3 is docked as a rigid ligand, the resulting complex [...] EGA/LS and GA/LS could still find low energy structures in accordance with the [...] structure [...]. When the side chains were allowed to rotate (22 [...]) [...]. Finally, when the side chain and backbone [...] were allowed to rotate freely, [...] torsional degrees of freedom, the dockings were trapped in local minima. However, the energy

of the solutions found was always much lower in the EGA/LS than in GA/LS runs, indicating that the EGA searches, although finally unsuccessful, proceeded further towards the lowest energy target.

FIGURE 8. Comparison of the docking energies resulting from all genetic-algorithm-based protocols, with and without local search (pseudo-Solis-Wets and Solis-Wets optimizers) and with and without niching (King and Pawn algorithm, three evolution rounds).

EFFECT OF POPULATION REINITIALIZATIONS

Figure 9 shows the probability of finding the correct solution in different generations of three consecutive evolutions using EGA/LS in the three test cases studied. The probabilities were estimated from the frequencies obtained after repeating the experiment 100 times. For flexible ligands, a plateau is reached after about 50 generations in the first evolution, indicating that the search has ended in a local minimum. Reinitialization of all the individuals of the population, except the external King, introduces new genetic information, so that the population gets out of the local minimum and the evolution can proceed. Now, the preoptimized King biases the evolution and the success rate is greatly enhanced. To increase the success rates even further, a third evolution of all subpopulations is performed. This protocol achieves success rates of the order

of 80-100%, depending on the number of torsional degrees of freedom. With rigid ligands, the probability to find the correct solution [...] can be estimated from the frequency of [...] needed to give a probability of success of 0.99 [...].

The number of generations is not fixed in the calculations, and its number depends on the number of energy evaluations that are used by the local [...]. [...] number of generations performed [...] the total number of [...] is fixed. The maximum, minimum, and average [...].

FIGURE 9. Plots of the success rates of EGA/LS after three cycles of 10,000 energy evaluations in eight parallel evolutions, repeated 100 times, for the three model systems benzamidine, forskolin, and XK263.

DOCKING TIMES

The docking times for the three test cases are presented in Table III. These are real times and [...] the new algorithm [...] CPU time required for docking [...].

The average docking time for the three test cases is 8.6 s. In a real search, 35,000 compounds of the [...] database were docked [...] in 84 h (average 8.6 s/ligand). This [...] is partly due to work-sharing between parallel processors, but mostly due to information [...].

[...] the time for the XK263-HIV-protease complex (1hvr) is reduced to 5.0 s. [...] As expected, docking times do not depend only on the [...] torsion angles [...] not be assumed a priori, reducing unnecessary torsional degrees of freedom can [...].
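The reported database throughput follows from simple arithmetic on the average docking time. The helper below is a hypothetical convenience function written for this check; it assumes that ligands are docked one after another at the quoted average real time.

```python
def library_docking_hours(n_compounds, seconds_per_compound):
    """Wall-clock hours needed to dock a library at a given average time per ligand."""
    return n_compounds * seconds_per_compound / 3600.0


# 35,000 compounds at the reported average of 8.6 s per ligand.
hours_35k = library_docking_hours(35_000, 8.6)
print(round(hours_35k, 1))
```

The result of roughly 84 h agrees with the run time quoted for the 35,000-compound search.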

TABLE II. Number of Generations per Round Employing [...].

Test case            Min generations      Max generations      Average generations
                     (200 individuals)    (200 individuals)    (200 individuals)
XK263/1hvr           32                   71                   48.4
Forskolin/1ab8       26                   60                   42.8
Benzamidine/3ptb     31                   [...]                48.3

[...] could be increased up to 10 to have a probability of success of 0.99.

The introduction of niching may have an important effect on the performance of docking protocols that could lead to substantial savings in docking times in the case of complex molecules. On the other hand, careful optimization of the docking protocols is essential to be able to do high throughput docking of large databases.
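The link between the success rate of a single docking experiment and the number of repeats needed for a target probability of success can be sketched as follows. The sketch assumes independent repeats, so that the probability of at least one success in n repeats is 1 - (1 - p)^n; the function names are hypothetical, and the 1 kcal/mol criterion is the one used in this work to count a solution as correct.

```python
import math


def success_rate(energies, tolerance=1.0):
    """Fraction of solutions within `tolerance` kcal/mol of the lowest energy."""
    best = min(energies)
    return sum(1 for e in energies if e - best <= tolerance) / len(energies)


def runs_needed(p_single, p_target=0.99):
    """Smallest n with 1 - (1 - p_single)**n >= p_target (independent repeats)."""
    if p_single <= 0.0:
        raise ValueError("a success rate of zero can never reach the target")
    if p_single >= 1.0:
        return 1
    return math.ceil(math.log(1.0 - p_target) / math.log(1.0 - p_single))


# Toy energies in kcal/mol: three of the four lie within 1 kcal/mol of the best.
rate = success_rate([-10.0, -9.5, -8.0, -9.1])
print(rate, runs_needed(0.5), runs_needed(0.8))
```

Under this assumption a protocol that succeeds in 80% of the experiments needs three repeats to reach a 0.99 probability of success, whereas one that succeeds in only 50% needs seven.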

Conclusions

A new docking algorithm based on AutoDock 3.0 has been implemented. Improvements include a parallelized version of the program and a new genetic algorithm that uses the concept of niching. Individual subpopulations are run on independent CPUs, making full use of the advantages of the parallelized program. The enhanced algorithm allows the docking of a 15,000 drug-like compound database in slightly more than 1 day. EGA/LS seems to be more robust than GA/LS in high-dimensional problems, and can provide much better solutions in a shorter time.

The combination of the genetic algorithm with a local optimizer is necessary for efficient docking of all but the simplest molecules. In the absence of a local optimizer, EGA is always superior over GA and might be applied in cases where local optimization is not possible, for example, when optimizing functional groups in chemical space.

The implementation presented is only one of many conceivable implementations. It represents, however, a suitable tool for massive virtual screening, making possible the docking of entire structural databases in feasible times on affordable machines.

TABLE III. EGA/LS Docking Time.

Test case              Energy evaluations per CPU    Docking time per compound (s)
XK263/1hvr             3 x 10,000                    14.6
Forskolin/1ab8         3 x 10,000                    [...]
Benzamidine/3ptb       3 x 10,000                    3.8
XK263/1hvr (rigid)     3 x 10,000                    5.0

Acknowledgments

We thank Profs. Olson and Goodsell for useful discussions on a previous version of this manuscript, and Prof. E. Giralt for continuous encouragement and support. We thank C4-CESCA for a generous allocation of computer time.

References

1. Hertzberg, R. P.; Pope, A. J. Curr Opin Chem Biol 2000, 4, [...].
2. [...]; Banks, M. Drug [...].
3. Bissantz, C.; Folkers, G.; Rognan, D. J Med Chem 2000, 43, [...].
4. [...]; Murcko, M. A. Drug Discovery [...].
5. Hopfinger, A. J.; Duca, J. S. Curr Opin Biotechnol 2000, 11, [...].
6. Weber, L. Drug Discovery Today 1998, 3, 379.
7. Weber, L. Curr Opin Chem Biol 1998, 2, 381.
8. Gane, P. J.; Dean, P. M. Curr Opin Struct Biol 2000, 10, 401.
9. Sotriffer, C. A.; Flader, W.; Winger, R. H.; Rode, B. M.; Liedl, K. R.; Varga, J. M. Methods 2000, 20, 280.
10. Gohlke, H.; Hendlich, M.; Klebe, G. J Mol Biol 2000, 295, 337.
11. Godden, J. W.; Stahura, F. L.; Bajorath, J. J Comput Chem 1999, 20, 1634.
12. Sternberg, M. J. E.; Gabb, H. A.; Jackson, R. M. Curr Opin Struct Biol 1998, 8, 250.
13. Apostolakis, J.; Pluckthun, A.; Caflisch, A. J Comput Chem 1998, 19, 21.
14. Rarey, M.; Kramer, B.; Lengauer, T.; Klebe, G. J Mol Biol 1996, 261, 470.
15. Morris, G. M.; Goodsell, D. S.; Halliday, R. S.; Huey, R.; Hart, W. E.; Belew, R. K.; Olson, A. J. J Comput Chem 1998, 19, 1639.
16. Behnke, D.; Hennig, L.; Findeisen, M.; Welzel, P.; Müller, D.; Thormann, M.; Hofmann, H. J. Tetrahedron 2000, 56, 1081.
17. Diller, D. J.; Verlinde, C. L. M. J. J Comput Chem 1999, 20, 1740.
18. Gillet, V. J.; Willett, P.; Bradshaw, J.; Green, D. V. S. J Chem Inf Comput Sci 1999, 39, 169.
19. Vieth, M.; Hirst, J. D.; Dominy, B. N.; Daigler, H.; Brooks, C. L., III. J Comput Chem 1998, 19, 1623.
20. Jin, A. Y.; Leung, D. F.; Weaver, D. F. J Comput Chem 1997, 18, 1971.
21. Wild, D. J.; Willett, P. J Chem Inf Comput Sci 1996, 36, 159.
22. Jones, G.; Willett, P.; Glen, R. C.; Leach, A. R.; Taylor, R. J Mol Biol 1997, 267, 727.
23. Hou, T. J.; Wang, J. M.; Liao, N.; Xu, X. J. J Chem Inf Comput Sci 1999, 39, [...].
24. Berger, E. D.; McKinley, K. S.; Blumofe, R. D.; Wilson, P. R. Hoard: A Scalable Memory Allocator for Multithreaded Applications; The Ninth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-IX), Cambridge, MA, November 2000.
25. Lipinski, C. A.; Lombardo, F.; Dominy, B. W.; Feeney, P. J. Adv Drug Delivery Rev 1997, 23, 3.
26. Thormann, M.; Pons, M. unpublished results.