COMPUTER-BASED DE NOVO
DESIGN OF DRUG-LIKE MOLECULES
Gisbert Schneider and Uli Fechner
Abstract | Ever since the first automated de novo design techniques were conceived only
15 years ago, the computer-based design of hit and lead structure candidates has emerged
as a complementary approach to high-throughput screening. Although many challenges
remain, de novo design supports drug discovery projects by generating novel
pharmaceutically active agents with desired properties in a cost- and time-efficient manner.
In this review, we outline the various design concepts and highlight current developments in
computer-based de novo design.
Johann Wolfgang Goethe-University, Institute of Organic Chemistry and Chemical Biology, Beilstein Endowed Chair for Cheminformatics, Marie-Curie-Str. 11, D-60439 Frankfurt am Main, Germany.
Correspondence to G.S.
e-mail: gisbert.schneider@modlab.de
doi:10.1038/nrd1799

DE NOVO DESIGN
The design of bioactive compounds by incremental construction of a ligand model within a model of the receptor or enzyme active site, the structure of which is known from X-ray or NMR data106.

Molecular DE NOVO DESIGN produces novel molecular structures with desired pharmacological properties from scratch. In this approach, a medicinal chemist (and, equally, de novo molecule-design software) is confronted with a virtually infinite search space. The number of chemically feasible, drug-like molecules has been estimated to be in the order of 10^60 to 10^100, from which the most promising candidates have to be selected ('cherry picked')1–3. Such a large space prohibits exhaustive searching, despite great advances in high-throughput screening (HTS) technology. Instead of the systematic construction and evaluation of each individual compound, navigation in the de novo design process relies on the principle of local optimization, which does not necessarily lead to the globally optimal solution: the design process converges on a local or practical optimum. In fact, most software implementations are non-deterministic, and rely on some kind of stochastic structure optimization.

Just as chemists with different backgrounds are likely to propose different molecules as promising solutions, multiple runs with stochastic de novo design software will produce different compounds as a result of the nature of the search algorithm. The trick is to incorporate as much chemical knowledge as possible about the structure of the search space into the design algorithm to facilitate directed navigation towards a goal location. As it is impossible to enumerate all possible virtual molecules in advance because of the problem of combinatorial explosion, only those candidate molecules that represent a local neighbourhood of the search agent are considered at a time. The map guiding the search to an optimum is constructed en passant along the search path, and therefore dynamically evolves during the search process.

The virtual search agent mimics a medicinal chemist, and scoring functions perform a function analogous to virtual assays. In the ideal case, such an in silico laboratory provides a road map that guides the agents to high-quality molecular structures via tractable synthesis routes. Positive design restricts this virtual optimization process to small regions of chemical space that have a higher probability of containing drug-like molecules. Negative design, by contrast, defines tabu zones that are characterized by adverse properties and unwanted structures4,5. Still, there is no guarantee that a molecule will be retrieved from chemical space that finds immediate appraisal from a synthetic chemist. In this context, it is essential that we learn to accept that de novo design will rarely yield novel lead structures with nanomolar activity in the first instance. Rather, the designed structures will probably represent examples of a prospective new lead series with micromolar activities that require further optimization.

The artwork Development II by M. C. Escher provides an illustration of the concept of chemical space (FIG. 1). There might exist several activity islands in chemical space, represented by the well-shaped and formed lizards; that is, drug molecules with desired
pharmacological behaviour. Each island represents a distinct structural class of molecules that are considered to be isofunctional with regard to their primary target. It is impossible to directly hop from island to island. Instead, one has to take a route through areas that are populated by less desirable compounds, depicted by the blurred lizard patterns in the Escher artwork. With increasing distance from an island the essential activity-defining molecular patterns become less pronounced. In other words, SCAFFOLD HOPPING (which might be the desired outcome of a de novo design experiment to obtain new lead series with potentially improved properties or to circumvent intellectual property constraints) depends on a conceptual abstraction from chemical structure. Any successful design attempt that aims to generate novel structures that interact with a given target will have to be grounded on a representation of molecules that allows an escape from one activity island to another. Before reaching a new activity island, however, molecules will be synthesized and tested that have only marginal activity; for example, binding constants in the medium- or even high-micromolar range. Such candidates would usually not be followed up or might not even be recognized at all in a drug discovery project. One reason for this is that wandering through terra incognita in chemical space can only be guided by maps that were created from previously existing knowledge, and extrapolation from this knowledge requires that small steps be made one at a time.

This review gives an overview of computer-based molecular de novo design methods on a conceptual level. We focus on the design of small, drug-like molecules. There are also attempts in the design of peptides6–8 and other polymeric structures9–11 that are not considered here. By means of several successful deployments of de novo design in the hit- and lead-finding stages of the drug discovery process we demonstrate that [...]

Figure 1 | How drug-like chemical space might be structured. M. C. Escher's Development II. The M. C. Escher Company (Baarn, Holland, 2004). All rights reserved.

SCAFFOLD HOPPING
The identification of isofunctional structures with different backbone architectures.

PRIMARY TARGET CONSTRAINTS
All information that is related to the ligand–receptor interaction; that is, the binding affinity of a ligand to the particular biological target.

INTERACTION SITE
A position in space that is not occupied by the receptor and in which a ligand atom favourably interacts with the receptor.

Concepts
Basically, three questions have to be addressed by a de novo design program: how to assemble the candidate compounds; how to evaluate their potential quality; and how to sample the search space effectively. De novo design is faced with the problem of combinatorial explosion: the number of different element types and the way they can be linked together is huge. Furthermore, there are not only a large number of theoretically possible topologies but also a variety of conformations for a single topology. This renders a simple enumeration of all solutions (an exhaustive search) impossible. All algorithmic decisions of a de novo design program are obviously assessed by the quality of their outcome, which, in turn, crucially depends on a meaningful reduction of the search space.

Primary target constraints
What kind of input is necessary with regard to a particular biological target before a de novo design run can be started? This question is directly connected to the quality assessment of candidate compounds, because the constraints extracted from the input are used in scoring the generated structures. All information that is related to the ligand–receptor interaction forms the PRIMARY TARGET CONSTRAINTS for candidate compounds. Such constraints can be gathered both from the three-dimensional receptor structure and from known ligands of the particular target. If the former is consulted, the design strategy is receptor-based; in the latter case, it is ligand-based.

Receptor-based design starts with the determination of the binding site. As complementarities in molecular shape and submolecular physical and chemical properties are important for specific binding, the binding site is then examined to derive shape constraints for a ligand, as well as specific non-covalent ligand–receptor interactions in the form of hypothetical INTERACTION SITES. Interaction sites are typically subdivided into hydrogen bonds, electrostatic and hydrophobic interactions. Receptor groups capable of hydrogen-bonding are of special interest owing to the strongly directional nature of the two interaction partners (hydrogen-bond acceptor and donor) and often form key interaction sites. They allow the assignation of ligand atom positions with a complementary hydrogen-bond type within a small region of space and a defined orientation. Key interaction sites have a major role in the effort to reduce the vast number of possible structures because they define strong and explicit requirements for successful receptor–ligand binding.
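To illustrate how a directional hydrogen-bond interaction site confines a ligand atom to a small region of space with a defined orientation, the following minimal Python sketch tests whether a candidate acceptor position satisfies a donor site. The function names and the distance and angle tolerances (3.5 Å, 120°) are illustrative assumptions and are not taken from any of the programs reviewed here.

```python
import math

def angle_deg(a, b, c):
    """Angle in degrees at vertex b, formed by the points a-b-c."""
    v1 = [a[i] - b[i] for i in range(3)]
    v2 = [c[i] - b[i] for i in range(3)]
    dot = sum(x * y for x, y in zip(v1, v2))
    norm1 = math.sqrt(sum(x * x for x in v1))
    norm2 = math.sqrt(sum(x * x for x in v2))
    cosine = max(-1.0, min(1.0, dot / (norm1 * norm2)))
    return math.degrees(math.acos(cosine))

def satisfies_hbond_site(donor, hydrogen, acceptor,
                         max_distance=3.5, min_angle=120.0):
    """Return True if the donor-acceptor distance and the donor-H-acceptor
    angle both lie within the tolerances (assumed values, in angstrom and
    degrees)."""
    close_enough = math.dist(donor, acceptor) <= max_distance
    well_oriented = angle_deg(donor, hydrogen, acceptor) >= min_angle
    return close_enough and well_oriented
```

A candidate acceptor atom that is close to the donor but lies far off the donor-hydrogen axis is rejected, which is the sense in which such a site imposes a stronger constraint than a purely steric one.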
Table 1 | Selected de novo design programs with their basic properties in chronological order (continued in Table 2)
Name (year) Building Primary Search strategy Structure sampling
blocks target
constraints
At Fr Rc Li DFS BFS Rnd MC EA Gr Lk Lat MD Sto Scoring function
HSITE/2D X X X Fitting and clipping of planar Steric constraints and hydrogen
Skeletons12,31,95 skeletons bonds
(1989)
3D Skeletons32 X X X X Steric constraints and hydrogen
(1990) bonds
Diamond Lattice33 X X X X Steric constraints and hydrogen
(1990) bonds
BUILDER v128 X X X X X Steric constraints and key
(1992) interaction sites
LEGEND20 (1991) X X X X Force field
LUDI13,14,9698 X X X X X Empirical scoring function
(1992) (SCORE1; revised version
SCORE2 in 1998)
NEWLEAD30 X X X X X Steric constraints
(1993)
SPLICE60 (1993) X X X X Pharmacophore and steric
constraints
GenStar34 (1993) X X X X Steric constraints and ligand
enzyme contact
GroupBuild18 X X X X Force field
(1993)
CONCEPTS39 X X X X Empirical scoring function
(1993)
SPROUT17,57-59 X X X X X X Solvent accessible surface,
(1993) hydrogen bonds, electrostatic
and hydrophobic interactions
MCSS & X X X X Simplified van der Waals potential
HOOK25,27 (1994) of non-polar interactions
GrowMol21 (1994) X X X X X Simple empirical scoring function
MCDNLG61 (1995) X X X X Potential energy
At, atoms; BFS, breadth-first search; DFS, depth-first search; EA, evolutionary algorithms; Fr, fragments; Gr, grow; Lat, lattice; Li, ligand; Lk, link; MC, Monte Carlo
sampling with Metropolis criterion; MD, molecular dynamics; QSAR, quantitative structure–activity relationship; Rc, receptor; Rnd, random; Sto, stochastic.
grid-based calculations to extract the most promising interaction sites. The performance of grid-based methods crucially depends on the resolution of the grid. It is evident that higher resolution leads to more grid points and therefore to greater computational costs, so a compromise has to be made. Two de novo approaches25,26 use Multiple Copy Simultaneous Search (MCSS)27 for the generation of primary target constraints. MCSS determines energetically favourable positions and orientations of functional groups in the
Table 2 | Selected de novo design programs with their basic properties in chronological order
Name (year) Building Primary Search strategy Structure sampling
blocks target
constraints
At Fr Rc Li DFS BFS Rnd MC EA Gr Lk Lat MD Sto Scoring function
Chemical X X X X X Combined score of shape, grid-
Genesis22 (1995) based and scalar constraints
DLD26,99 (1995) X X X X Potential-energy function without
electrostatic interactions
PRO_LIGAND15,44,100-103 X X X X X X Empirical scoring function
(1995)
SMoG41,42,104 X X X Knowledge-based scoring function
(1996)
BUILDER v229 X X X X Steric constraints
(1995)
CONCERTS35 X X X X Force field
(1996)
RASSE23 (1996) X X X X Force field augmented by chemical
rules
PRO_SELECT16,40 X X X X Empirical scoring function
(1997)
SkelGen63,64 X X X X X Geometric, connectivity and
(1997) chemical constraints
Nachbar45,105 X X X X Target-specific QSAR model
(1998) based on topological connectivity
descriptor
Globus49 (1999) X X X X Molecular similarity based on
all-atom-pairs-shortest-path
descriptor
DycoBlock36,37 X X X X Force field and solvent-accessible
(1999) surface
LEA47 (2000) X X X X Target-specific QSAR model based
on three-dimensional descriptors
LigBuilder24 (2000) X X X X X Empirical scoring function
TOPAS48 (2000) X X X X Molecular similarity based on
topological pharmacophore and
substructure fingerprints
F-DycoBlock38 X X X X Force field and solvent-accessible
(2001) surface
ADAPT67 (2001) X X X X Weighted sum of DOCK score,
clogP, MM, number of rotatable
bonds and hydrogen bonds
Pellegrini & Field46 X X X X X Target-specific QSAR model
(2003)
SYNOPSIS55 X X X X Two examples: electric dipole
(2003) moment and empirically derived
HIV-RT scoring
CoG50 (2004) X X X X X Molecular similarity based on
fingerprint descriptor
BREED62 (2004) X X X *
*Exhaustive recombination. Exhaustive enumeration; no internal scoring function. At, atoms; BFS, breadth-first search; DFS, depth-first search; EA, evolutionary
algorithms; Fr, fragments; Gr, grow; HIV-RT, human immunodeficiency virus reverse transcriptase; Lat, lattice; Li, ligand; Lk, link; MC, Monte Carlo sampling with
Metropolis criterion; MD, molecular dynamics; MM, molecular mass; QSAR, quantitative structure–activity relationship; Rc, receptor; Rnd, random; Sto, stochastic.
binding site. Multiple copies of functional groups are randomly placed inside the binding pocket. All fragments are then minimized simultaneously using a force field such that the forces among individual functional groups are not considered. Groups are discarded if the interaction energy between them and the protein is above a certain threshold. An MCSS run yields a set of pre-docked fragments that can be further investigated to choose the most promising ones. The same outcome can be achieved by the use of docking software, which is indeed the initial step in some de novo design programs28–30. Rule- and grid-based methods only derive the primary target constraints: the outcome of these two methods is a map of the receptor that pinpoints favourable interaction sites of a ligand. MCSS and the pre-docking of fragments effectively amalgamate the derivation procedure and the first steps of structure [...]

[Box figure: plot with axis label "Property 1"; data points not recoverable.]
[...] the Pareto frontier (orange line). Numbers next to the solutions indicate the number of times they are dominated. The MOGA algorithm has been successfully applied to a number of problems in cheminformatics90–92 and recently made its debut in the field of de novo design50.

[...] years quite a few others followed this concept18,23,35–38.

The first empirical scoring function in the field of de novo design was implemented in the program LUDI13,14. Empirical scoring functions are a weighted sum of individual ligand–receptor interaction types commonly supplemented by penalty terms, such as the number of rotatable ligand bonds. The weights correspond to the average free-energy contribution of a single interaction of that type and are obtained by a regression analysis of a set of receptor–ligand complexes. Interaction types include, for example, hydrogen bonds, electrostatic interactions and hydrophobic interactions. The regression analysis requires both known structures and binding constants, and so the available datasets are limited in size and often feature similar ligands and receptors. This can result in a bias of empirical scoring functions towards specific structural motifs. However, they are fast and have proved their suitability, and are therefore implemented in several de novo design programs15,16,21,24,39,40.

Knowledge-based scoring functions have become popular in the field of docking during the past few years, yet to date there is only a single de novo design program, SMoG41,42, that uses its own implementation of this type of scoring function. Knowledge-based scoring is grounded on a statistical analysis of ligand–receptor complex structures. The frequencies of each possible pair of atoms in contact with each other are determined.
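The weighted-sum form of an empirical scoring function described above can be sketched in a few lines of Python. This is a minimal illustration only: the function name, the interaction labels and all numerical weights are hypothetical placeholders, not the regression-derived parameters of LUDI or of any other program.

```python
def empirical_score(interaction_counts, weights, n_rotatable_bonds, rotor_penalty):
    """Weighted sum over interaction types plus a rotatable-bond penalty.

    Negative weights denote favourable contributions; the penalty term is
    positive. All values are placeholders for regression-derived weights.
    """
    interaction_term = sum(
        weights[kind] * interaction_counts.get(kind, 0) for kind in weights
    )
    return interaction_term + rotor_penalty * n_rotatable_bonds

# Hypothetical weights (arbitrary units), chosen only for illustration.
EXAMPLE_WEIGHTS = {"hydrogen_bond": -1.0, "electrostatic": -2.0, "hydrophobic": -0.5}
```

In a real scoring function the weights would come from fitting measured binding constants of known complexes, which is why the limited size and diversity of such datasets can bias the result.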
[Figure 2 graphic: chemical structure drawings not recoverable. Panel labels: Define binding pocket; Determine interaction sites; Link; b, Place first fragment; Grow; B, Lattice strategy: Fill pocket with lattice points; Find and connect interaction points; Assign molecular framework; Build molecule. Annotations: Ile56, Phe46, Asp37; Ki = 16 µM.]
Figure 2 | Principles of structure-based ligand assembly. A | The link (Aa) and grow (Ab) concepts are displayed for the
example of FK506-binding protein (FKBP12) ligand de novo design. On the basis of an X-ray model of the binding pocket
(PDB-identifier: 1fkf), interaction sites were identified. Selected interaction centres are indicated by blue dots (lipophilic), green
(acceptor) and red (donor) lines. A micromolar inhibitor of FKBP12 was designed using the software package LUDI, representing
one of the first successful prospective applications of de novo design93. The magenta-coloured substructures highlight the
linker in the link approach, and the grown part of the molecule in the grow scenario, respectively. B | The lattice strategy
provides an alternative approach that is grounded on a grid representation of potential positions of ligand atoms within the
binding pocket. Lattice points are indicated by grey dots. Ligand candidates are formed from those points that lie along the
shortest path through the lattice connecting interaction sites (the particular molecule shown is an artificial example).
Interactions found to occur more frequently than would be randomly expected are considered attractive; interactions that occur less frequently are considered repulsive. Only structural information is necessary to derive these frequencies so that a greater number of structures can be included in the analysis. As the available structures are also more diverse than those with known binding affinities, less bias is expected compared with empirical scoring functions.

Ligand-based scoring
If a three-dimensional structure of a particular biological target is unavailable but one or more binding molecules are known, ligand-based design provides an alternative strategy. This scenario holds true for G-protein-coupled receptors (GPCRs), which are the most successful drug targets in terms of therapeutic benefit and potential sales43. Receptor-based structure generation is inevitably confronted with the problem of conformational complexity. A ligand-based strategy, in contrast, can either consider the three-dimensional or the topological structure of one or more known ligands.

One way to use the information inherent to the known actives is the derivation of a three-dimensional ligand PHARMACOPHORE model. Once established, it can be used to obtain a pseudo-receptor model44. This facilitates the application of de novo design programs that were originally developed with a receptor-based strategy in mind to ligand-based design. Alternatively, the three-dimensional ligand pharmacophore model [...]

PHARMACOPHORE
The ensemble of steric and electronic features that is necessary to ensure the optimal supramolecular interactions with a specific biological target structure and to trigger (or to block) its biological response106.

QUANTITATIVE STRUCTURE–ACTIVITY RELATIONSHIPS
(QSAR). Mathematical relationships linking chemical structure and pharmacological activity in a quantitative manner for a series of compounds. Methods that can be used in QSAR include various regression and pattern-recognition techniques106.
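The idea behind knowledge-based scoring (contacts seen more often than randomly expected are attractive, rarer ones repulsive) can be illustrated with a small Python sketch. It assumes a negative log-ratio of observed to expected contact counts, with the expectation taken from the marginal atom-type frequencies; this reference state and the function name are assumptions made for illustration and are not the SMoG implementation.

```python
import math
from collections import Counter

def pair_scores(pair_counts):
    """Score each (ligand_type, receptor_type) contact as -ln(observed / expected).

    The expected count assumes ligand and receptor atom types pair at random
    (product of marginal frequencies). Negative scores mark attractive pairs.
    """
    total = sum(pair_counts.values())
    ligand_totals, receptor_totals = Counter(), Counter()
    for (lig, rec), n in pair_counts.items():
        ligand_totals[lig] += n
        receptor_totals[rec] += n
    scores = {}
    for (lig, rec), n in pair_counts.items():
        if n > 0:  # the logarithm is undefined for unobserved pairs
            expected = ligand_totals[lig] * receptor_totals[rec] / total
            scores[(lig, rec)] = -math.log(n / expected)
    return scores
```

Because only contact counts from structures are required, no binding constants enter the calculation, which mirrors the point made above about the larger and more diverse data basis.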
For example, the building blocks of TOPAS48 are obtained by virtual retro-synthesis of a drug-molecule collection with a set of 11 common organic reactions, the so-called RECAP54 reactions. The same set of reaction schemes is then used to assemble candidate compounds. PRO_SELECT16 adds the template-substituent idea of combinatorial chemistry. The most sophisticated approach is undertaken by SYNOPSIS55. In this program, the building blocks are constituted by a database of available molecules. Structure assembly is guided by 70 different simulated organic synthesis steps so that a synthesis route is proposed for every generated structure. Furthermore, the acceptance of a reaction incorporates information about neighbours of the functional group, other functional groups that might prevent the reaction, and reactivity rankings if a functional group is present more than once in a molecule. An alternative approach is the application of external software that attempts to assess the synthetic accessibility of a set of candidate compounds; for example, CAESA17 or SEEDS56. These programs automatically analyse generalized synthetic routes and select potential precursors from databases of available compounds. CAESA additionally provides an estimation of the ease of synthesis (for example, recognition of complex ring systems or stereocentres) by an expert rule system.

Structure sampling
The basic building blocks for the assembly of candidate structures can be either single atoms or fragments. Atom-based approaches are superior to fragment-based methods in terms of the structural variety that can be generated. But this increase in potential solutions makes it harder to find suitable candidate compounds among the ones that are amenable. Fragment-based design strategies, on the other hand, significantly reduce the size of the search space. This reduction can be called a meaningful reduction if fragments are used that commonly occur in drug molecules. Additionally, the definition of fragment is variable: a fragment can be anything from an atom to a polycyclic ring system, which means that atoms are essentially a subset of fragments. A composition of building blocks that contains small fragments and larger ones allows for structural changes to a different extent depending on the specific requirements. Many of the early de novo design strategies were atom-based. But as the combinatorial problem that is coupled with atoms as building blocks became more and more evident, fragments were generally used. Today, we commonly find building-block sets that are mainly composed of fragments with more than one atom and padded with a few single-atom fragments.

There are several general concepts of structure sampling: linking, growing, lattice-based sampling, random structure mutation, transitions driven by molecular dynamics simulations, and graph-based sampling.

The linking approach13–15,24,25,30,57–60 starts with the placement of building blocks at key interaction sites of the receptor (FIG. 2). These building blocks are either positioned by the de novo program itself or provided by another program (pre-docked building blocks). The positioned building blocks are automatically connected to each other by so-called linkers to yield a complete molecule that satisfies all key interaction sites. Linkers are selected with the objective of forming favourable interactions with the receptor.

The growing procedure13–15,18,20,21,23,32,34,41,42,57–59 starts with a single building block at one of the key interaction sites of the receptor (FIG. 2). This starting point is selected by the program or the user, or is given by a pre-docked fragment. The structure is then grown from the initial building block in an attempt to provide suitable interactions for both the key interaction sites of the receptor and regions of the receptor between two key interaction sites. PRO_SELECT16 introduced the idea of combinatorial chemistry to the linking method. Substituents are linked to a pre-docked scaffold with user-defined attachment sites. Both the growing and the linking strategy have their strengths and weaknesses. The growing strategy can run into difficulties if the active site contains two or more distinct (sub)pockets separated by a large gap in which the possible interactions between small-molecule ligands and the protein are limited18. When fragments are used in combination with the linking approach, slightly misplaced fragments or fragments with no strictly defined spatial orientation can lead to ambiguity during linking. An example is a phenyl ring, which has no preferred orientation in a lipophilic binding pocket.

Another concept is the placement of an atomic lattice in the binding site (FIG. 2). This lattice is made up of regularly arranged sp3 carbon atoms (diamond lattice)33, randomly and evenly distributed atoms29, or pre-docked fragments28. Lattice atoms in the vicinity of different interaction sites are then joined by finding the shortest path through the lattice atoms. Atoms that are part of such a shortest path are connected by newly formed bonds. DLD26 and MCDNLG61 start with a binding site that is filled with a non-physical arrangement of atoms. The initial atom arrangement is termed non-physical because atoms are placed or connected in a way that is non-existent in reality. Examples are participation of atoms in a number of bonds that far exceeds typical chemical bond valences, or arrangements that completely disregard van der Waals radii. Then, an atom is randomly selected and a randomly chosen transition is applied to this atom. Such a transition can be a translation or rotation operation26,61, a change of atom type or bond type, or the appearance or disappearance of an atom26. The guidance of iterated transitions by a potential-energy function finally yields a chemically valid molecule.

A few de novo design methods implement structure sampling that is driven by a molecular dynamics simulation. Covalent connections are formed among the building blocks in a stochastic and reversible manner to dynamically evolve candidate compounds. Initially, building blocks are randomly positioned in the binding site. Subsequent free movement of the building blocks is guided by molecular dynamics equilibria. Every few molecular dynamics steps, one or more building blocks are randomly chosen, all bonds are cleaved and empty valences are filled with bonds to nearby building blocks. The first such program, CONCERTS35, considered interactions between the building blocks and the receptor, as well as between different building blocks. Later approaches36–39 exclusively regarded interactions between the building blocks and the receptor (consensus molecular dynamics). A further addition to this method of structure sampling was a repeated cycle of construction and deconstruction: after a certain period of molecular dynamics simulation, all candidate compounds are dissected into their respective fragments. Before each deconstruction, high-scoring structures are stored in a list for later inspection38.
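The lattice strategy described above, in which lattice atoms near different interaction sites are joined along the shortest path, can be sketched with a breadth-first search. The example assumes a simple cubic lattice of integer coordinates rather than a diamond lattice, and the function name and data layout are hypothetical; it illustrates the path-finding step only, not the full method of any cited program.

```python
from collections import deque

def shortest_lattice_path(start, goal, free_points):
    """Breadth-first search between two lattice points.

    free_points is the set of lattice points not occupied by the receptor.
    Neighbours differ by one step along a single axis (cubic lattice, an
    assumption). Returns the list of points on a shortest path, or None.
    """
    steps = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    previous = {start: None}
    queue = deque([start])
    while queue:
        point = queue.popleft()
        if point == goal:
            path = []
            while point is not None:  # walk back to the start
                path.append(point)
                point = previous[point]
            return path[::-1]
        for dx, dy, dz in steps:
            neighbour = (point[0] + dx, point[1] + dy, point[2] + dz)
            if neighbour in free_points and neighbour not in previous:
                previous[neighbour] = point
                queue.append(neighbour)
    return None
```

The points on the returned path would then be connected by newly formed bonds and assigned atom types, steps that are outside the scope of this sketch.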
Ligand-based de novo design is not provided with interaction sites as primary target constraints. The majority of ligand-based methods operate on the topological molecular graphs and feature an evolutionary algorithm for optimization. This choice implicitly paves the way for their structure sampling technique: genetic operators that are specifically tailored for molecular graphs. Whereas Globus49 only implemented a recombination operator, TOPAS48 solely applies a mutation operator that randomly substitutes whole fragments, thereby obeying the rules of virtual chemical reaction schemes. Nachbar45 and Brown50 developed both mutation and recombination operators. Candidate compounds are mutated by changing an atom element type or bond order, opening or closing rings and expanding or contracting rings. Chemical Genesis22 applies genetic operators to the three-dimensional molecular [...]

[Figure graphic: chemical structure drawings not recoverable. Recoverable labels: Initial state; axis, Tanimoto index (similarity to the template structure); value 0.47.]

[Figure 5 graphic: chemical structures of compounds 1 to 5 not recoverable.]
Figure 5 | Examples of pharmacologically active substances that were designed de novo using computer algorithms.
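As a rough illustration of ligand-based sampling with a fragment-substituting mutation operator and a similarity score, the following Python sketch evolves a list of fragments towards a template by maximizing the Tanimoto index of their combined feature sets. The fragment names, feature labels, parameters and the simple elitist selection scheme are all hypothetical assumptions; real programs such as TOPAS operate on molecular graphs and reaction schemes, which this sketch does not attempt to reproduce.

```python
import random

def tanimoto(features_a, features_b):
    """Tanimoto index of two feature sets: |A and B| / |A or B|."""
    a, b = set(features_a), set(features_b)
    union = a | b
    # Two empty sets are scored 0.0 here (a convention chosen for this sketch).
    return len(a & b) / len(union) if union else 0.0

def evolve(template, fragments, n_slots=3, generations=20, offspring=8, seed=0):
    """Mutate one fragment slot at a time; keep the best child if not worse."""
    rng = random.Random(seed)
    names = sorted(fragments)

    def score(candidate):
        features = set().union(*(fragments[name] for name in candidate))
        return tanimoto(features, template)

    parent = [rng.choice(names) for _ in range(n_slots)]
    history = [score(parent)]
    for _ in range(generations):
        children = []
        for _ in range(offspring):
            child = list(parent)
            child[rng.randrange(n_slots)] = rng.choice(names)  # mutation
            children.append(child)
        best_child = max(children, key=score)
        if score(best_child) >= history[-1]:
            parent = best_child
        history.append(score(parent))
    return parent, history

# Hypothetical fragment library with made-up feature labels.
EXAMPLE_FRAGMENTS = {
    "phenyl": {"aromatic", "lipophilic"},
    "amide": {"donor", "acceptor"},
    "carboxyl": {"acceptor", "anion"},
    "amine": {"donor", "cation"},
    "methyl": {"lipophilic"},
}
EXAMPLE_TEMPLATE = {"aromatic", "lipophilic", "donor", "acceptor", "cation"}
```

Calling `evolve(EXAMPLE_TEMPLATE, EXAMPLE_FRAGMENTS)` returns the final fragment list together with the similarity after each generation; because a child replaces the parent only when it scores at least as well, that sequence never decreases.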
[...] was up to two orders of magnitude higher than that expected for random screening of historical corporate compound collections. Lengthy assay development and costly, logistically complex, HTS efforts were not required in order to generate two novel, selective and patentable hit series whose chemical simplicity can serve as a starting point for further refinement and lead-optimization activities.

How do such automated, entirely computer-generated designs compare with rational, non-automated design approaches (for example, by manual molecular modelling, docking and piecemeal structure modification)? The design of non-peptidic thrombin inhibitors has been the focus of many de novo design projects during recent years and can help answer this question.

Thrombin is a trypsin-like serine protease with a central role in the blood-clotting cascade. A pivotal part of current thrombin inhibitors is a functional group interacting with Asp189 at the bottom of the S1 recognition pocket of the enzyme. As thrombin cleaves fibrinogen after an arginine residue, guanidinium-mimicking fragments are usually selected by de novo design software at this position. The best result of an automated combinatorial docking and design approach using LUDI is compound 6 (Ki = 10 nM), which contains a benzamidine in this position (FIG. 6)76. In this study, p-amino-benzamidine (Ki = 34 µM) was identified as a preferred core fragment for subsequent combinatorial optimization by LUDI (unsubstituted benzamidine alone yields a Ki of 250 µM).

For comparison, a carefully, rationally designed molecule is compound 7 (FIG. 7), which was optimized in a step-wise manner to display selective thrombin inhibition77. Also revealing the benzamidine residue in the S1 pocket, this compound was developed from a rigid tricyclic core structure that pre-organizes this chemotype for thrombin binding78. As a result of a fluorine scan of the precursor molecule, an F···H–C hydrogen bond was shown to significantly enhance the protein–ligand interaction (Ki = 6 nM; 67-fold selectivity compared with trypsin inhibition) (FIG. 7). The increased positive polarization of the hydrogen in the ortho position to the fluorine substituent is also supposed to enhance the edge-to-face interaction with Trp215. Such effects and other types of ligand–receptor interactions, such as arene–arene and cation–π interactions, or water-mediated contacts, are often neglected by de novo design software. These non-covalent interactions can influence the binding mode and affinity of a ligand, and therefore demonstrate limitations of current automated de novo design methods. Starting the amalgamation of fragments from small seed structures, such as the benzamidine fragment in the thrombin example, might even prevent better solutions from being found.

[...] a direct consequence of our lack of knowledge about function-determining features of desired ligand molecules in the early stages of the discovery process79. We expect de novo design to increasingly become a complementary strategy to HTS through a number of routes: by suggesting new chemical entities that were designed taking into consideration several aspects of lead- and drug-likeness and synthetic accessibility; by making full use of pre-existing knowledge, such as reference ligands or receptor models; and by facilitating the design of activity-enriched screening sets with increased hit rates.

Automated de novo design has proven its value for hit and lead-structure identification. Designed molecules provide sometimes astonishing ideas for the medicinal chemist and aid in the development of novel and patentable leads with desired property profiles. To gain further acceptance of computer-generated molecules it will be essential to understand current limitations of the design techniques. One weakness is the inability of de novo design software to sufficiently consider the flexibility of the target protein. Although examples of docking methods exist that consider receptor flexibility, examples in the field of de novo design are sparse38,80. This is probably a result of the combinatorial problem de novo design is faced with; receptor flexibility adds just another aspect of combinatorics on top of this problem. The same holds for flexible alignments of ligand ensembles that provide the basis for a QSAR, pharmacophore or pseudo-receptor model81. One cannot expect good de novo designs from poor initial alignments, because these guide the search through chemical space.

Most pharmaceutical leads are part of a limited set of chemotypes82–84. Many pharmaceutical companies direct their attention towards this limited number of structural classes. The chemical diversity of potential leads discovered by HTS is therefore restricted by the diversity of the screening libraries that are used. De novo design offers a broader exploration of chemical space and therefore makes it possible to identify novel ligand scaffolds, which can be a major competitive advantage85. Fragment-based screening strategies, such as NMR and high-throughput X-ray crystallography, can be used to help identify new chemotypes by suggesting starting orientations of molecular building blocks for subsequent computer-assisted linking and growing86,87. Still, the prediction of crucial properties of a drug molecule (primary target constraints, that is, binding behaviour, as well as secondary constraints such as pharmacokinetic properties) is limited. Molecules that are predicted to be the best from a de novo design run rarely represent the preferred choice of a medicinal chemist. Even though there is fully automated de novo design software, it remains a crucial human task to pick the most promising candidates. An important goal of de
Conclusions novo design is to inspire medicinal chemists through
The drug discovery pipelines of many pharmaceu- the chemical motifs that are identified. Ultimately, the
tical companies are fuelled by HTS as one of the aim is to offer support for hit and lead identification
major sources of new hit-to-lead candidates. This is and widen the chemical horizon.
1. Dobson, C. M. Chemical space and biology. Nature 432, 824–828 (2004).
2. Lipinski, C. & Hopkins, A. Navigating chemical space for biology and medicine. Nature 432, 855–861 (2004).
3. Schneider, G. Trends in virtual combinatorial library design. Curr. Med. Chem. 9, 2095–2101 (2002).
4. Richardson, J. S. & Richardson, D. C. The de novo design of protein structures. Trends Biochem. Sci. 14, 304–309 (1989).
5. Richardson, J. S. et al. Looking at proteins: representations, folding, packing, and design. Biophys. J. 63, 1185–1209 (1992).
6. Moon, J. B. & Howe, W. J. Computer design of bioactive molecules: a method for receptor-based de novo ligand design. Proteins 11, 314–328 (1991).
7. Schneider, G. & Wrede, P. The rational design of amino acid sequences by artificial neural networks and simulated molecular evolution: de novo design of an idealized leader peptidase cleavage site. Biophys. J. 66, 335–344 (1994).
8. Schneider, G. et al. Peptide design by artificial neural networks and computer-based evolutionary search. Proc. Natl Acad. Sci. USA 95, 12179–12184 (1998).
9. Venkatasubramanian, V., Chan, K. & Caruthers, J. M. Computer-aided molecular design using genetic algorithms. Computers Chem. Eng. 18, 833–844 (1994).
10. Venkatasubramanian, V., Sundaram, A., Chan, K. & Caruthers, J. M. in Genetic Algorithms in Molecular Modelling (ed. Devillers, J.) 271–302 (Academic, London, 1996).
11. Sundaram, A. & Venkatasubramanian, V. Parametric sensitivity and search-space characterization studies of genetic algorithms for computer-aided polymer design. J. Chem. Inf. Comput. Sci. 38, 1177–1191 (1998).
12. Danziger, D. J. & Dean, P. M. Automated site-directed drug design: a general algorithm for knowledge acquisition about hydrogen-bonding regions at protein surfaces. Proc. R. Soc. Lond. B 236, 101–113 (1989).
    First work about interaction site derivation from a receptor structure tailored for the use in automated de novo design.
13. Böhm, H.-J. The computer program LUDI: a new simple method for the de-novo design of enzyme inhibitors. J. Comput. Aided Mol. Des. 6, 61–78 (1992).
14. Böhm, H.-J. LUDI: rule-based automatic design of new substituents for enzyme inhibitor leads. J. Comput. Aided Mol. Des. 6, 593–606 (1992).
15. Clark, D. E. et al. PRO LIGAND: an approach to de novo molecular design. 1. Application to the design of organic molecules. J. Comput. Aided Mol. Des. 9, 13–32 (1995).
    A comprehensive approach that adopts a lot of earlier ideas and provides new concepts.
16. Murray, C. W. et al. PRO_SELECT: combining structure-based drug design and combinatorial chemistry for rapid lead discovery. 1. Technology. J. Comput. Aided Mol. Des. 11, 193–207 (1997).
17. Gillett, V. J., Myatt, G., Zsoldos, Z. & Johnson, A. P. SPROUT, HIPPO and CAESA: tools for de novo structure generation and estimation of synthetic accessibility. Perspect. Drug Discov. Des. 3, 34–50 (1995).
18. Rotstein, S. H. & Murcko, M. A. GroupBuild: a fragment-based method for de novo drug design. J. Med. Chem. 36, 1700–1710 (1993).
19. Goodford, P. J. A computational procedure for determining energetically favorable binding sites on biologically important macromolecules. J. Med. Chem. 28, 849–857 (1985).
20. Nishibata, Y. & Itai, A. Automatic creation of drug candidate structures based on receptor structure. Starting point for artificial lead generation. Tetrahedron 47, 8985–8990 (1991).
21. Bohacek, R. S. & McMartin, C. Multiple highly diverse structures complementary to enzyme binding sites: results of extensive application of a de novo design method incorporating combinatorial growth. J. Am. Chem. Soc. 116, 5560–5571 (1994).
22. Glen, R. C. & Payne, A. W. R. A genetic algorithm for the automated generation of molecules within constraints. J. Comput. Aided Mol. Des. 9, 181–202 (1995).
23. Luo, Z., Wang, R. & Lai, L. RASSE: a new method for structure-based drug design. J. Chem. Inf. Comput. Sci. 36, 1187–1194 (1996).
24. Wang, R., Gao, Y. & Lai, L. LigBuilder: a multi-purpose program for structure-based drug design. J. Mol. Model. 6, 498–516 (2000).
25. Eisen, M. B., Wiley, D. C., Karplus, M. & Hubbard, R. E. HOOK: a program for finding novel molecular architectures that satisfy the chemical and steric requirements of a macromolecule binding site. Proteins 19, 199–221 (1994).
26. Miranker, A. & Karplus, M. An automated method for dynamic ligand design. Proteins 23, 472–490 (1995).
27. Miranker, A. & Karplus, M. Functionality maps of binding sites: a multiple copy simultaneous search method. Proteins 11, 29–34 (1991).
28. Lewis, R. A. et al. Automated site-directed drug design using molecular lattices. J. Mol. Graphics 10, 66–78 (1992).
29. Roe, D. C. & Kuntz, I. D. BUILDER v.2: improving the chemistry of a de novo design strategy. J. Comput. Aided Mol. Des. 9, 269–282 (1995).
30. Tschinke, V. & Cohen, N. C. The NEWLEAD program: a new method for the design of candidate structures from pharmacophoric hypothesis. J. Med. Chem. 36, 3863–3870 (1993).
31. Lewis, R. A. & Dean, P. M. Automated site-directed drug design: the formation of molecular templates in primary structure generation. Proc. R. Soc. Lond. B 236, 141–162 (1989).
32. Gillett, V. A., Johnson, A. P., Mata, P. & Sike, S. Automated structure design in 3D. Tetrahedron Comput. Method. 3, 681–696 (1990).
33. Lewis, R. A. Automated site-directed drug design: approaches to the formation of 3D molecular graphs. J. Comput. Aided Mol. Des. 4, 205–210 (1990).
34. Rotstein, S. H. & Murcko, M. A. GenStar: a method for de novo drug design. J. Comput. Aided Mol. Des. 7, 23–43 (1993).
35. Pearlman, D. A. & Murcko, M. A. CONCERTS: dynamic connection of fragments as an approach to de novo ligand design. J. Med. Chem. 39, 1651–1663 (1996).
    Introduces the concept of consensus molecular dynamics as a method for structure sampling to de novo design.
36. Liu, H., Duan, Z., Luo, Q. & Shi, Y. Structure-based ligand design by dynamically assembling molecular building blocks at binding site. Proteins 36, 462–470 (1999).
37. Zhu, J., Yu, H., Fan, H., Liu, H. & Shi, Y. Design of selective inhibitors of cyclooxygenase-2: dynamic assembly of molecular building blocks. J. Comput. Aided Mol. Des. 15, 447–463 (2001).
38. Zhu, J., Fan, H., Liu, H. & Shi, Y. Structure-based ligand design for flexible proteins: application of new F-DycoBlock. J. Comput. Aided Mol. Des. 15, 979–996 (2001).
39. Pearlman, D. A. & Murcko, M. A. CONCEPTS: new dynamic algorithm for de novo design suggestion. J. Comput. Chem. 14, 1184–1193 (1993).
40. Eldridge, M. D., Murray, C. W., Auton, T. R., Paolini, G. V. & Mee, R. P. Empirical scoring functions: I. The development of a fast empirical scoring function to estimate the binding affinity of ligands in receptor complexes. J. Comput. Aided Mol. Des. 11, 425–445 (1997).
41. DeWitte, R. S. & Shakhnovich, E. I. SMoG: de novo design method based on simple, fast, and accurate free energy estimates. 1. Methodology and supporting evidence. J. Am. Chem. Soc. 118, 11733–11744 (1996).
42. Ishchenko, A. V. & Shakhnovich, E. I. SMall Molecule Growth 2001 (SMoG2001): an improved knowledge-based scoring function for protein–ligand interactions. J. Med. Chem. 45, 2770–2780 (2002).
43. Wise, A., Gearing, K. & Rees, S. Target validation of G-protein coupled receptors. Drug Discov. Today 7, 235–246 (2002).
44. Waszkowycz, B. et al. PRO LIGAND: an approach to de novo molecular design. 2. Design of novel molecules from molecular field analysis (MFA) models and pharmacophores. J. Med. Chem. 37, 3994–4002 (1994).
45. Nachbar, R. B. Molecular evolution: automated manipulation of hierarchical chemical topology and its application to average molecular structures. Genet. Programming Evolvable Machines 1, 57–94 (2000).
    Development of genetic operators for graph-based structure sampling and detailed description of the problems that have to be solved.
46. Pellegrini, E. & Field, M. J. Development and testing of a de novo drug-design algorithm. J. Comput. Aided Mol. Des. 17, 621–641 (2003).
47. Douguet, D., Thoreau, E. & Grassy, G. A genetic algorithm for the automated generation of small organic molecules: drug design using an evolutionary algorithm. J. Comput. Aided Mol. Des. 14, 449–466 (2000).
48. Schneider, G., Lee, M.-L., Stahl, M. & Schneider, P. De novo design of molecular architectures by evolutionary assembly of drug-derived building blocks. J. Comput. Aided Mol. Des. 14, 487–494 (2000).
49. Globus, A., Lawton, J. & Wipke, W. T. Automatic molecular design using evolutionary algorithms. Nanotechnology 10, 290–299 (1999).
50. Brown, N., McKay, B., Gilardoni, F. & Gasteiger, J. A graph-based genetic algorithm and its application to the multiobjective evolution of median molecules. J. Chem. Inf. Comput. Sci. 44, 1079–1087 (2004).
51. Lipinski, C. et al. Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings. Adv. Drug. Deliv. Rev. 23, 3–25 (1997).
52. Teague, S. J. et al. The design of leadlike combinatorial libraries. Angew. Chem. Int. Ed. Engl. 38, 3743–3747 (1999).
53. Aronov, A. M. Predictive in silico modeling for hERG channel blockers. Drug Discov. Today 10, 149–155 (2005).
54. Lewell, X. O., Budd, D. B., Watson, S. P. & Hann, M. M. RECAP – Retrosynthetic Combinatorial Analysis Procedure: a powerful new technique for identifying privileged molecular fragments with useful applications in combinatorial chemistry. J. Chem. Inf. Comput. Sci. 38, 511–522 (1998).
55. Vinkers, H. M. et al. SYNOPSIS: SYNthesize and OPtimize System in Silico. J. Med. Chem. 46, 2765–2773 (2003).
56. Honma, T. et al. Structure-based generation of a new class of potent Cdk4 inhibitors: new de novo design strategy and library design. J. Med. Chem. 44, 4615–4627 (2001).
57. Gillett, V., Johnson, P., Mata, P., Sike, S. & Williams, P. SPROUT: a program for structure generation. J. Comput. Aided Mol. Des. 7, 127–153 (1993).
58. Gillet, V. et al. SPROUT: recent developments in the de novo design of molecules. J. Chem. Inf. Comput. Sci. 34, 207–217 (1994).
59. Mata, P. et al. SPROUT: 3D structure generation using templates. J. Chem. Inf. Comput. Sci. 35, 479–493 (1995).
60. Ho, C. M. W. & Marshall, G. R. SPLICE: a program to assemble partial query solutions from three-dimensional database searches into novel ligands. J. Comput. Aided Mol. Des. 7, 623–647 (1993).
61. Gelhaar, D. K. et al. De novo design of enzyme inhibitors by Monte Carlo ligand generation. J. Med. Chem. 38, 466–472 (1995).
62. Pierce, A. C., Rao, G. & Bemis, G. W. BREED: generating novel inhibitors through hybridization of known ligands. Application to CDK2, P38, and HIV protease. J. Med. Chem. 47, 2768–2775 (2004).
63. Todorov, N. P. & Dean, P. M. Evaluation of a method for controlling molecular scaffold diversity in de novo ligand design. J. Comput. Aided Mol. Des. 11, 175–192 (1997).
64. Todorov, N. P. & Dean, P. M. A branch-and-bound method for optimal atom-type assignment in de novo ligand design. J. Comput. Aided Mol. Des. 12, 335–350 (1998).
65. Darwin, C. On the Origin of Species (Facsimile of the First Edition) (Harvard Univ. Press, Cambridge, Massachusetts, 1859/1975).
66. Weininger, D. SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. J. Chem. Inf. Comput. Sci. 28, 31–36 (1988).
67. Pegg, S. C.-H., Haresco, J. J. & Kuntz, I. D. A genetic algorithm for structure-based de novo design. J. Comput. Aided Mol. Des. 15, 911–933 (2001).
68. Schneider, G. & Böhm, H.-J. Virtual screening and fast automated docking methods. Drug Discov. Today 7, 64–70 (2002).
69. Hou, T. & Xu, X. Recent development and application of virtual screening in drug discovery: an overview. Curr. Pharm. Des. 10, 1011–1033 (2004).
70. Honma, T. Recent advances in de novo design strategy for practical lead identification. Med. Res. Rev. 23, 606–632 (2003).
71. Ji, H. et al. Structure-based de novo design, synthesis, and biological evaluation of non-azole inhibitors specific for lanosterol 14α-demethylase of fungi. J. Med. Chem. 46, 474–485 (2003).
72. Perola, E., Walters, W. P. & Charifson, P. S. A detailed comparison of current docking and scoring methods on systems of pharmaceutical relevance. Proteins 56, 235–249 (2003).
73. Schuffenhauer, A. et al. Molecular diversity management strategies for building and enhancement of diverse and focused lead discovery compound screening collections. Comb. Chem. High Throughput Screen. 7, 771–781 (2004).
74. Honma, T. et al. A novel approach for the development of selective Cdk4 inhibitors: library design based on locations of Cdk4 specific amino acid residues. J. Med. Chem. 44, 4628–4640 (2001).
75. Rogers-Evans, M., Alanine, A. I., Bleicher, K. H., Kube, D. & Schneider, G. Identification of novel cannabinoid receptor ligands via evolutionary de novo design and rapid parallel synthesis. QSAR Comb. Sci. 23, 426–430 (2004).
76. Böhm, H.-J., Banner, D. W. & Weber, L. Combinatorial docking and combinatorial chemistry: design of potent non-peptide thrombin inhibitors. J. Comput. Aided Mol. Des. 13, 51–56 (1999).
77. Obst, U., Banner, D. W., Weber, L. & Diederich, F. Molecular recognition at the thrombin active site: structure-based design and synthesis of potent and selective thrombin inhibitors and the X-ray crystal structures of two thrombin-inhibitor complexes. Chem. Biol. 4, 287–295 (1997).
78. Olsen, J. A. et al. A fluorine scan of thrombin inhibitors to map the fluorophilicity/fluorophobicity of an enzyme active site: evidence for C–F···C=O interactions. Angew. Chem. Int. Ed. Engl. 42, 2507–2511 (2003).
79. Gribbon, P. & Sewing, A. High-throughput drug discovery: what can we expect from HTS? Drug Discov. Today 10, 17–22 (2005).
80. Anderson, A. C. & Wright, D. L. The design and docking of virtual compound libraries to structures of drug targets. Curr. Comp. Aided Drug Des. 1, 103–127 (2005).
    An excellent overview of current developments in molecular docking and scoring and its relation to de novo design.
81. Doweyko, A. M. 3D-QSAR illusions. J. Comput. Aided Mol. Des. 18, 587–596 (2004).
82. Bemis, G. W. & Murcko, M. A. The properties of known drugs. 1. Molecular frameworks. J. Med. Chem. 39, 2887–2893 (1996).
83. Müller, G. in Chemogenomics in Drug Discovery (eds Kubinyi, H. & Müller, G.) 7–41 (Wiley-VCH, Weinheim, 2004).
84. Jenkins, J. L., Glick, M. & Davies, J. W. A 3D similarity method for scaffold hopping from known drugs or natural ligands to new chemotypes. J. Med. Chem. 47, 6144–6159 (2004).
85. Bailey, D. & Brown, D. High-throughput chemistry and structure-based design: survival of the smartest. Drug Discov. Today 6, 57–59 (2001).
86. Verdonk, M. L. & Hartshorn, M. J. Structure-guided fragment screening for lead discovery. Curr. Opin. Drug Discov. Devel. 7, 404–410 (2004).
87. Villar, H. O., Yan, J. & Hansen, M. R. Using NMR for ligand discovery and optimization. Curr. Opin. Chem. Biol. 8, 387–391 (2004).
88. Gillet, V. J., Khatib, W., Willett, P., Fleming, P. J. & Green, D. V. S. Combinatorial library design using a multiobjective genetic algorithm. J. Chem. Inf. Comput. Sci. 42, 375–385 (2002).
89. Fonseca, C. M. & Fleming, P. J. in Genetic Algorithms: Proceedings of the Fifth International Conference (ed. Forrest, S.) 416–423 (Morgan Kaufmann, San Mateo, CA, 1993).
90. Handschuh, S., Wagener, M. & Gasteiger, J. Superposition of three-dimensional chemical structures allowing for conformational flexibility by a hybrid method. J. Chem. Inf. Comput. Sci. 38, 220–232 (1998).
91. Agrafiotis, D. K. Multiobjective optimisation of combinatorial libraries. IBM J. Res. Dev. 45, 545–566 (2001).
92. Wright, T., Gillet, V. J., Green, D. V. S. & Pickett, S. D. Optimizing the size and configuration of combinatorial libraries. J. Chem. Inf. Comput. Sci. 43, 381–390 (2003).
93. Babine, R. E. et al. Design, synthesis and X-ray crystallographic studies of novel FKBP-12 ligands. Bioorg. Med. Chem. Lett. 5, 1719–1724 (1995).
94. Schindler, T. et al. Structural mechanism of STI-571 inhibition of Abelson tyrosine kinase. Science 289, 1938–1942 (2000).
95. Lewis, R. A. & Dean, P. M. Automated site-directed drug design: the concept of spacer skeletons for primary structure generation. Proc. R. Soc. Lond. B 236, 125–140 (1989).
    Pioneering theoretical outline to tackle the problem of automated drug design from first principles.
96. Böhm, H.-J. A novel computational tool for automated structure-based drug design. J. Mol. Recognit. 6, 131–137 (1993).
    Concise overview of the early developments of the program LUDI.
97. Böhm, H.-J. The development of a simple empirical scoring function to estimate the binding constant for a protein-ligand complex of known three-dimensional structure. J. Comput. Aided Mol. Des. 8, 243–256 (1994).
98. Böhm, H.-J. Prediction of binding constants of protein ligands: a fast method for the prioritization of hits obtained from de novo design or 3D database search programs. J. Comput. Aided Mol. Des. 12, 309–323 (1998).
99. Stultz, C. M. & Karplus, M. Dynamic ligand design and combinatorial optimization: designing inhibitors to endothiapepsin. Proteins 40, 258–289 (2000).
100. Westhead, D. R. et al. PRO LIGAND: an approach to de novo molecular design. 3. A genetic algorithm for structure refinement. J. Comput. Aided Mol. Des. 9, 139–148 (1995).
101. Frenkel, D. et al. PRO LIGAND: an approach to de novo molecular design. 4. Application to the design of peptides. J. Comput. Aided Mol. Des. 9, 213–225 (1995).
102. Clark, D. E. & Murray, C. W. PRO LIGAND: an approach to de novo molecular design. 5. Tools for the analysis of generated structures. J. Chem. Inf. Comput. Sci. 35, 914–923 (1995).
103. Murray, C. W., Clark, D. E. & Byrne, D. G. PRO LIGAND: an approach to de novo molecular design. 6. Flexible fitting in the design of peptides. J. Comput. Aided Mol. Des. 9, 381–395 (1995).
104. Grzybowski, B. A. et al. Combinatorial computational method gives new picomolar ligands for a known enzyme. Proc. Natl Acad. Sci. USA 99, 1270–1273 (2002).
    Design of a picomolar human carbonic anhydrase II inhibitor, the highest-affinity inhibitor to date, with the program SMoG.
105. Nachbar, R. B. Molecular evolution: a hierarchical representation for chemical topology and its automated manipulation. Proc. 3rd Ann. Genetic Programming Conf. 246–253 (Univ. of Wisconsin, Madison, Wisconsin, 1998).
106. Wermuth, C. G., Gannelin, C. R., Lindberg, P. & Mitscher, L. A. Glossary of terms used in medicinal chemistry. Pure Appl. Chem. 70, 1129–1143 (1998).

Acknowledgements
H. Kubinyi is thanked for helpful discussion and kind support. This work was supported by the Beilstein-Institut zur Förderung der Chemischen Wissenschaften, Frankfurt am Main. U.F. is thankful for a fellowship granted by Aventis Pharma Deutschland GmbH, a company of the Sanofi-Aventis group.

Competing interests statement
The authors declare no competing financial interests.

Online links
FURTHER INFORMATION
Glossary of terms used in medicinal chemistry: http://www.chem.qmul.ac.uk/iupac/medchem/