
Chem Biol Drug Des 2006; 67: 5–12 © 2005 The Authors

Journal compilation © 2005 Blackwell Munksgaard


doi: 10.1111/j.1747-0285.2005.00323.x
Perspective

Structural Interaction Fingerprints: A New Approach to Organizing, Mining, Analyzing, and Designing Protein–Small Molecule Complexes

Juswinder Singh*, Zhan Deng, Gaurav Narale and Claudio Chuaqui

Computational Drug Design Group, Department of Research Informatics, Biogen Idec, 12 Cambridge Center, Cambridge, MA 02142, USA
*Corresponding author: Juswinder Singh, singhjus@yahoo.com

The combination of advances in structure-based drug design efforts in the pharmaceutical industry in parallel with structural genomics initiatives in the public domain has led to an explosion in the number of structures of protein–small molecule complexes. This information has critical importance to both the understanding of the structural basis for molecular recognition in biological systems and the design of better drugs. A significant challenge exists in managing this vast amount of data and fully leveraging it. Here, we review our work to develop a simple, fast way to store, organize, mine, and analyze large numbers of protein–small molecule complexes. We illustrate the utility of the approach to the management of inhibitor complexes from the protein kinase family. Finally, we describe our recent efforts in applying this method to the design of target-focused chemical libraries.

Received 25 October 2005

The past decade has witnessed a dramatic increase in the number of protein–small molecule complex structures from experimental as well as in silico approaches. Over 5000 small molecule complexes have been deposited in public databases [1], and it is likely that a much greater number have been determined within the pharmaceutical industry. With significant progress in structural genomics initiatives [2] and advances of high-throughput crystallography [3] and high-throughput NMR technology [4], the total number of structures will grow at an even greater speed. In parallel to the growth of experimentally determined structures, a wealth of in silico structural information is also being generated from virtual screening efforts. The ability to fully leverage this experimental and in silico information hinges on the ability to organize, analyze, and mine the structural data, both to derive insights into molecular recognition in biological systems and to facilitate the design of novel therapeutics.

Several approaches exist for analyzing the interactions of protein–small molecule complexes. For example, LIGPLOT generates two-dimensional (2D) schematics of protein–small molecule complexes that describe the intermolecular interactions including the strengths of hydrogen bonds and hydrophobic interactions [5]. In addition to 2D approaches, a variety of sophisticated computer graphics programs such as InsightII (Accelrys Inc., Burlington, MA, USA), VIDA (Open Eye, Cambridge, MA, USA), Sybyl (Tripos Inc., St Louis, MO, USA), and Maestro (Schrodinger Inc., New York, NY, USA) are available for analyzing the 3D structure of proteins and their complexes [6]. Although these 2D and 3D approaches are useful when examining small numbers of structures, they are not feasible for the analysis of large datasets. In addition to visual analysis of complexes, it is also common to use energy-based methods to evaluate how favorable the binding interactions are between a protein and a small molecule. These scoring methods are commonly used to rank and filter large datasets of virtual screening results [7]. Recent studies have demonstrated that docking algorithms and the underlying energy-scoring functions are usually able to recapitulate the crystallographically observed binding modes from the set of generated ligand poses. However, these same scoring functions are poor at distinguishing the correct binding mode from incorrect ones, leading to a large number of high-scoring false positives that result in lower enrichment rates for virtual screening [8].

We review our published work to develop a simple and robust approach for representing and analyzing 3D protein–ligand complexes called SIFt [9, 10]. We will show how this method can be applied to visualizing, organizing, analyzing, and mining of protein–inhibitor complexes. To illustrate the utility of the approach, we apply it to the analysis of the protein kinase family as its family members are attractive targets for drug design and there exist over 93 complexes in the public databanks. Finally, we will describe recent work showing how the method can be applied to focusing large chemical libraries to drug targets.

Materials and Methods

What is a SIFt fingerprint?
SIFt stands for Structural Interaction Fingerprint, which is a 1D binary fingerprint representation of the intermolecular interactions in a 3D protein–inhibitor complex [9, 10]. The fingerprint representation of the interaction patterns is compact, and allows for rapid clustering and analysis of large numbers of complexes.

The SIFt methodology has been described in detail previously [9, 10]. The SIFt is calculated on a set of input 3D protein–small molecule complexes. The protein structure may have been determined experimentally by NMR or crystallography, or generated through homology modeling. The small molecule structure may also have been determined experimentally or through in silico molecular modeling studies. The SIFt is generated by first defining the union of those residues that are in contact between the protein and the small molecule complex. The resulting panel of ligand binding site


Figure 1: The procedure used in the generation of the SIFt fingerprint. (A) 3D binding site of a kinase with a small molecule inhibitor bound. (B) Sequence of the positions in the binding site in contact with the small molecule, together with their location in the structure of the kinase (G-loop and β3 to β4). Each binding site position is then represented by a bitstring, with each bit switched to 'on' depending upon whether it is involved in a contact, whether the contact is with the main chain, side chain of the protein, and if the interaction is polar, apolar, hydrogen bond acceptor/donor. (C) Concatenation of all bitstrings for each binding site residue. This process is repeated for all ligands.
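The bitstring construction summarized in Figure 1 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the names `BIT_NAMES`, `residue_bits`, and `sift`, the toy residue numbers, and the representation of interactions as sets of labels are all assumptions made for the example. The bit order simply follows the seven interaction types listed in Materials and Methods.

```python
from typing import Dict, List

# Assumed bit order: the seven interaction types listed in the text.
BIT_NAMES = ("contact", "main_chain", "side_chain", "polar",
             "non_polar", "hb_acceptor", "hb_donor")

def residue_bits(interactions: set) -> List[int]:
    """Seven-bit string for one binding-site residue."""
    return [1 if name in interactions else 0 for name in BIT_NAMES]

def sift(binding_site: List[int], contacts: Dict[int, set]) -> List[int]:
    """Concatenate per-residue bit strings in ascending residue order.

    binding_site: union of residue numbers contacted by any ligand.
    contacts: residue number -> interaction labels seen in this complex.
    """
    bits: List[int] = []
    for res in sorted(binding_site):
        # Residues not contacted in this complex contribute seven zeros.
        bits.extend(residue_bits(contacts.get(res, set())))
    return bits

# Hypothetical toy data: three binding-site residues, one complex.
site = [120, 47, 123]
complex_a = {47: {"contact", "main_chain", "non_polar"},
             123: {"contact", "main_chain", "polar", "hb_donor"}}
fp = sift(site, complex_a)
print(fp)
```

Because every complex is encoded against the same residue list, all fingerprints have the same length (seven bits per residue), which is what makes position-by-position comparison across complexes possible.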

residues, which act as a mask covering all of the interactions occurring between the protein and the ligands, is then used as the common reference frame to construct the interaction fingerprints (Figure 1).

For a group of structures involving the same target protein (e.g., docking results), the ligand binding site is defined as the list of residues comprising the union of all residues involved in ligand binding over the entire library of structures. For the protein kinase–ligand complex structures, however, as the target proteins involved are different, a sequence alignment of the protein binding sites is needed, which can be based upon sequence and/or structural information.

After all the ligand binding site residues are identified and all the protein–ligand intermolecular interactions are calculated, the next step is to classify these interactions. Our current implementation of SIFt uses seven bits for each binding site residue, representing seven different types of interactions. The seven bits are switched on or off according to whether the following interactions are observed: 1) whether a contact is involved at this position; 2) whether it involves the main-chain atoms; 3) whether it involves the side chain; 4) whether it is a polar interaction; 5) whether it is a non-polar interaction; 6) whether it is a hydrogen bond acceptor; 7) whether it is a hydrogen bond donor. By doing so, each residue is represented by a seven-bit-long bit string. The whole interaction fingerprint of the complex is finally constructed by sequentially concatenating the bit string of each binding site residue together, in ascending residue number order. This results in each interaction fingerprint being the same length, enabling easy comparison of interactions at a particular binding site position across a series of complexes.

Building profiles of protein–inhibitor complexes
We have developed a profile-based approach termed p-SIFt [9] that enables us to describe the conservation of interactions between a set of protein–ligand receptor complexes. The p-SIFt approach is based upon profiling methods that have been developed previously to analyze and mine protein sequences and structures for distantly related family members [11, 12]. A sequence profile is essentially a position-specific scoring matrix encoding the probability of finding any of the 20 amino acid residues at that position in the target sequence. In the case of p-SIFt, the SIFts derived from a set of probe structures are used to derive a position-dependent profile encoding the probability that a given interaction at that position is present (Figure 2). The probe set of structures may correspond to gene families, e.g., kinases, or to subfamilies of structures representing ligands with a particular activity or selectivity profile.

We have used the Tanimoto coefficient as a measure to compare the similarity between two SIFt fingerprints, between two p-SIFts, and also between a SIFt and a p-SIFt profile [13]. A set of SIFt patterns can be clustered using the Tanimoto similarity measure by applying a standard hierarchical clustering algorithm [13].

Knowledge-based constraints in focusing chemical libraries
We have developed a new strategy for designing and filtering target-specific chemical libraries based on r-SIFt [14]. r-SIFt is a variation of SIFt that was designed specifically for analyzing compounds of a combinatorial library, where the 'r' in r-SIFt stands for the various R-groups of a combinatorial library. The main difference between the original SIFt and r-SIFt lies in the meanings of the interaction bits that comprise the entire fingerprints. In r-SIFt, the bits represent whether or not a certain R-group or core fragment of the compound satisfies a contact interaction (i.e., within a distance threshold) with a particular protein residue.

r-SIFt incorporates the binding interactions of variable fragments in a combinatorial library. It enables target-specific constraints from structures of protein–small molecule complexes in a binding site to be used to focus a combinatorial library [14]. Our r-SIFt-based library-focusing strategy consists of the following major computational steps, as illustrated in Figure 3: 1) library enumeration and docking wherein a combinatorial library is enumerated, and a small subset with maximized chemical diversity is selected and carefully docked onto the target molecule; 2) calculation of 2D molecular descriptors of all the variable R-groups used in constructing the large combinatorial library; 3) construction of r-SIFt patterns of the docking poses from step 1; 4) clustering and classification of r-SIFt patterns where compounds that are able to bind to the target molecule with the desired binding mode (dockable compounds) are differentiated from non-dockable compounds; 5) generation of predictive models. Based on the classification from step 4 and using the 2D descriptors as predictive variables, a decision tree model is built to predict the dockability of the compounds. This model can then be used to filter the original large combinatorial library.

To evaluate this strategy, we enumerated several different combinatorial libraries using a pyridinyl imidazole inhibitor of p38 (PDB code: 1ouk) as the template scaffold, varying R1, R2, and R3 independently (one at a time) and also simultaneously. Based on the co-crystal structures of 1ouk and other similar inhibitors considered


[Figure 2, panel B plot: x-axis Bit_idx (0–400), y-axis 0–1, annotated by site.]

Figure 2: The procedure used to generate a p-SIFt from a set of SIFts is illustrated in panel (A). The p-SIFt generated from the 93 kinase structures, using all seven bits to compute the SIFts, is shown in panel (B). The p-SIFt is annotated with a topmost bar delineating the general kinase structural features for that portion of the fingerprint; the bar below consists of alternating blocks corresponding to each residue (site in the uniform PKA numbering scheme) in the kinase used to construct the fingerprint; finally, the third bar consists of blocks for each bit representing the interaction features at that site. Reprinted in part with permission from J Med Chem 48 (1), 121–133, 2005. Copyright (2004) American Chemical Society.
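As an illustration of the profile idea, a p-SIFt can be approximated as the per-bit frequency over a set of equal-length SIFts. The sketch below makes that assumption explicitly (the text describes the profile as a position-dependent probability but does not spell out an estimator here); `p_sift`, `conservation_class`, and the toy fingerprints are hypothetical names and data. The class boundaries are the ones quoted in the Results for contact interactions (conserved at 0.7 or above, intermediate from 0.4 up to 0.7, variable below 0.4).

```python
from typing import List

def p_sift(sifts: List[List[int]]) -> List[float]:
    """Per-bit frequency across a set of equal-length binary SIFts."""
    if not sifts:
        raise ValueError("need at least one SIFt")
    length = len(sifts[0])
    if any(len(s) != length for s in sifts):
        raise ValueError("all SIFts must have the same length")
    n = len(sifts)
    return [sum(s[i] for s in sifts) / n for i in range(length)]

def conservation_class(freq: float) -> str:
    """Label a profile value using the thresholds quoted in the Results."""
    if freq >= 0.7:
        return "conserved"
    if freq >= 0.4:
        return "intermediate"
    return "variable"

# Hypothetical toy fingerprints (four complexes, four bits each).
sifts = [[1, 1, 0, 0],
         [1, 0, 0, 1],
         [1, 1, 0, 0],
         [1, 0, 0, 0]]
profile = p_sift(sifts)
labels = [conservation_class(f) for f in profile]
print(profile, labels)
```

A value of 1.0 at a bit means every complex in the probe set shows that interaction; a value near 0 means almost none do.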

[Figure 3, panel A workflow labels: Virtual library; 1. Dock; Docking poses; 2. Calculate molecular descriptors; 3. Generate r-SIFts; r-SIFt patterns; 4. Cluster; Native versus non-native clusters; 5. Build predictive models; Decision tree.]

Figure 3: (A) An overview of the major computational steps in r-SIFt. (B) The Markush definition of the imidazole combinatorial library based upon the p38 pyrimidyl inhibitor in the pdb complex 1ouk. (C) A classification model based upon the p38 pyrimidyl inhibitor library.
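Step 3 of the workflow, generating r-SIFt patterns, relies on bits that record whether a given R-group or core fragment is within a distance threshold of a given residue. The sketch below shows one way such bits could be computed. It is an assumption-laden illustration rather than the published method: the 4.0 Å cutoff, the bit layout (`FRAGMENTS` order within each residue), the function names, and the toy coordinates are all hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float, float]
FRAGMENTS = ("core", "R1", "R2", "R3")  # assumed bit order per residue

def in_contact(frag: Sequence[Coord], res: Sequence[Coord],
               cutoff: float) -> bool:
    """True if any fragment atom lies within `cutoff` of any residue atom."""
    return any(math.dist(a, b) <= cutoff for a in frag for b in res)

def r_sift(fragments: Dict[str, List[Coord]],
           residues: Dict[int, List[Coord]],
           cutoff: float = 4.0) -> List[int]:
    """One contact bit per (residue, fragment) pair, residues ascending."""
    bits: List[int] = []
    for res_id in sorted(residues):
        for name in FRAGMENTS:
            atoms = fragments.get(name, [])  # absent fragment -> bit stays 0
            hit = in_contact(atoms, residues[res_id], cutoff)
            bits.append(1 if hit else 0)
    return bits

# Hypothetical one-atom fragments and residues, coordinates in angstroms.
pose = {"core": [(0.0, 0.0, 0.0)],
        "R1": [(5.0, 0.0, 0.0)],
        "R2": [(0.0, 6.0, 0.0)]}
site = {120: [(7.0, 0.0, 0.0)], 123: [(0.0, 1.0, 0.0)]}
pattern = r_sift(pose, site)
print(pattern)
```

Patterns built this way can then be compared and clustered like ordinary SIFts to separate poses in the desired binding mode from the rest.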


to be in the 'native binding mode,' the R1 groups are expected to interact with the hydrophobic pocket of p38. The R2 portion of 1ouk-inh, on the other hand, is positioned in the vicinity of the adenine-binding site in the hinge region, whereas the R3 group interacts with the phosphate-binding region (P-loop).

These enumerated libraries were subjected to the aforementioned r-SIFt library focusing analysis. In each case, a decision tree predictive model for each respective R-group was generated. Its performance was evaluated using 10-fold cross-validated predictive accuracy on the test set, which was set aside during model building. In addition, the reagent enrichment factor (EF) is measured by the increased concentration of dockable compounds in the selection pool.

$$\mathrm{EF} = \frac{\text{concentration of dockable compounds in final selection pool}}{\text{concentration of dockable compounds in original library}}$$

Details of the analysis procedure are reported elsewhere [14].

Protein kinase-inhibitor dataset
The protein kinase family exemplifies the challenges faced with the large amount of structural data being generated not only on specific drug targets, but also at the gene family level. Here we use 93 small molecule complexes that have been deposited in the public domain, which comprise examples from 34 different kinase family members, 14 different protein kinase subfamilies, and 54 unique kinase small molecule ligands/inhibitors (Deng et al., unpublished results).

Results

Conserved binding interactions across kinase inhibitors
The majority of known kinase inhibitors bind to the ATP site, and few inhibitors have been described that target the substrate binding site. We have used SIFt to examine conserved interactions across protein kinase–inhibitor complexes [9, 10]. These conserved interactions can be used to understand critical determinants for inhibitor binding and selectivity. They can also be used to screen for novel templates.

We determined the degree of conservation of interactions in 93 protein kinase–inhibitor complexes by exploring the frequency of contacts at each of the 56 positions that are involved in binding. These residues include (using the numbering system from the cAMP-dependent protein kinase structure; PDB code 1ATP): the glycine-rich loop, which is a conserved signature of the family and plays a role in binding of ATP (47–57); the hinge region, which is located between the N- and C-terminal domains and plays a role in hydrogen-bonding the adenine moiety of ATP (123–125, 127, 130); β3 (70, 72); the hydrophobic pocket region (95, 104, 105, 118, 119); the 'gatekeeper' residue whose size determines inhibitor access (120); and the activation segment, which is targeted by several inhibitors that stabilize the inactive conformation of the kinase.

We found that 20% of the contact interactions are conserved (≥0.7) over the 93 structures as a whole, 11% are intermediate (0.4 ≤ intermediate < 0.7) in conservation, whereas 69% are variable (<0.4). The canonical interactions are common to all inhibitors and clearly play a critical role in inhibitor binding to the protein kinase family. These interactions could be used as a basic kinase-like binding filter in virtual screening for novel kinase inhibitors.

Clustering of kinase inhibitors based upon interaction fingerprints
A dendrogram derived from comparison of interaction fingerprints of the 93 protein–inhibitor complexes is shown in Figure 4. The objective was to cluster together kinase–inhibitor complexes showing similar interaction patterns. The approach involves computing the similarity metric (Tanimoto coefficient) between all of the fingerprints and then using a hierarchical clustering algorithm to group similar interaction fingerprints together. The dendrogram revealed four major clusters (Figure 4) consisting of ATP analogs, p38 inhibitors, CDK2 inhibitors, as well as those kinase inhibitors that bind to the ATP site but recognize a distinct conformational state in the activation segment of the kinase termed DFG-out. A significant number of protein kinase inhibitors stabilize this conformational state of the enzyme, including Gleevec™ [15]. In addition to these four major clusters, about one-third of the structures are either singletons or form tiny clusters. Interestingly, the major clusters represent different grouping examples of protein–ligand complexes: the first one is made up of the same protein and chemically similar compounds (e.g., p38 in complex with trisubstituted imidazoles); the second group contains the same protein but with a variety of ligands (e.g., CDK2 in complex with different CDK2 chemotypes); the third cluster contains different proteins in complex with chemically similar ligands (e.g., different protein kinases bound to ATP and ATP analogs); and the fourth are those inhibitors that recognize the inactive conformational state (e.g., c-Abl in complex with STI-571).

Profile analysis of kinase complexes and its application to selective enrichment of inhibitors
The p38, CDK2, and ATP SIFt clusters represent a set of structures having similar conserved and variable interactions. In order to compare within and between these clusters we have developed a profile-based methodology, p-SIFt [9]. The p-SIFt approach is analogous to profile-based techniques that have proven to be very useful in the analysis and database mining of groups of protein sequences and structures [11, 12]. The use of profiles provides a sensitive means to compare and contrast multiple inhibitors binding to a drug target. A structural interaction fingerprint profile (p-SIFt) represents the degree to which interactions are conserved across a set of ligand–receptor complexes. The p-SIFt is derived from an array of SIFt patterns, and its derivation from a set of SIFts is shown in Figure 2. Because the interaction fingerprint represents the binding mode of a ligand to a target protein, similar fingerprints imply that the corresponding ligands make similar interactions with the protein.

A useful means to compare p-SIFts is to plot a difference profile computed by the direct subtraction of one p-SIFt from another. This enables a simple and fast way to identify similarities and differences between sets of protein complexes. The difference profile between p38 and CDK2 highlights the importance of the gatekeeper residue in controlling access of small molecule inhibitors to the binding site of kinases (Figure 5). The box in Figure 5 highlights interactions that occur in the hydrophobic pocket of p38 but not in CDK2. This is due to the small Thr 'gatekeeper' in p38 rendering the residues making up the hydrophobic pocket accessible to small molecule


Figure 4: (A) Hierarchical clustering of SIFts from 93 protein kinase small molecule crystal structures. On the right are the dendrogram and the corresponding distance matrix. SIFts are reorganized according to the order given by the dendrogram. Six different regions are labeled above the SIFt heat map. Three major clusters (1–3) are labeled on the left-hand side of the heat map and also a cluster corresponding to the DFG-out conformation of the kinases. (B) Comparison of the binding modes of the three different kinase clusters.
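The clustering behind Figure 4 combines a Tanimoto similarity with hierarchical clustering. The sketch below illustrates the idea in plain Python under stated assumptions: the text only says a "standard hierarchical clustering algorithm" was used, so the average linkage, the similarity cutoff `min_sim`, the function names, and the toy fingerprints are all choices made for this example. The Tanimoto formula is written in its general form, which reduces to shared bits over the union of bits for binary vectors and can also be applied to real-valued profiles.

```python
from itertools import combinations
from typing import List, Sequence

def tanimoto(a: Sequence[float], b: Sequence[float]) -> float:
    """Tanimoto coefficient: a.b / (a.a + b.b - a.b)."""
    ab = sum(x * y for x, y in zip(a, b))
    denom = sum(x * x for x in a) + sum(y * y for y in b) - ab
    return 1.0 if denom == 0 else ab / denom

def cluster(fps: List[Sequence[float]], min_sim: float) -> List[List[int]]:
    """Average-linkage agglomerative clustering (assumed linkage).

    Merging stops once the most similar pair of clusters falls below
    min_sim, which corresponds to cutting the dendrogram at that level.
    """
    clusters = [[i] for i in range(len(fps))]

    def avg_sim(c1: List[int], c2: List[int]) -> float:
        total = sum(tanimoto(fps[i], fps[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        i, j = max(combinations(range(len(clusters)), 2),
                   key=lambda p: avg_sim(clusters[p[0]], clusters[p[1]]))
        if avg_sim(clusters[i], clusters[j]) < min_sim:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe: combinations guarantees i < j
    return [sorted(c) for c in clusters]

# Hypothetical toy fingerprints.
fps = [[1, 1, 0, 0], [1, 1, 1, 0], [0, 0, 1, 1], [0, 0, 0, 1]]
groups = cluster(fps, min_sim=0.4)
print(groups)
```

For datasets of realistic size a library implementation of hierarchical clustering would be preferable; the quadratic search here is only meant to make the procedure explicit.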

[Figure 5, panel A plot: y-axis p38 – CDK2 (−1 to 1), x-axis PKA residue number (47 to 185); regions labeled G-loop, β3 to β4, β5 and hinge, catalytic loop, Mg loop. Panel B: binding sites of p38 and CDK2.]

Figure 5: (A) The contact-only difference profiles between p38 and CDK2. The difference plots range from −1 to 1, where a value of 0 indicates that the interaction is conserved to the same degree in the two sets of structures, whereas a value of −1 or 1 denotes that a conserved interaction in one set of structures is not conserved in the other. The blocks indicate the residues involved in the hydrophobic pocket of p38. (B) Binding sites for p38 and CDK2 with box highlighting the hydrophobic pocket and a cpk representation of gatekeeper residue.
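The difference profile plotted in Figure 5 is a direct element-wise subtraction of one p-SIFt from another. A minimal sketch follows; the function names, the toy profile values, and the 0.7 cutoff used to flag strongly differing bits are illustrative assumptions, not values taken from the paper.

```python
from typing import List

def difference_profile(p_a: List[float], p_b: List[float]) -> List[float]:
    """Element-wise p_a - p_b; each value lies between -1 and 1."""
    if len(p_a) != len(p_b):
        raise ValueError("profiles must have the same length")
    return [a - b for a, b in zip(p_a, p_b)]

def differing_bits(diff: List[float], cutoff: float = 0.7) -> List[int]:
    """Indices where one set conserves an interaction the other lacks."""
    return [i for i, d in enumerate(diff) if abs(d) >= cutoff]

# Hypothetical contact profiles for two sets of complexes.
p38_profile = [1.0, 0.875, 0.125, 0.5]
cdk2_profile = [1.0, 0.125, 0.875, 0.5]
diff = difference_profile(p38_profile, cdk2_profile)
print(diff, differing_bits(diff))
```

Values near 0 mark interactions conserved to the same degree in both sets, while values near 1 or −1 mark interactions characteristic of only one set, such as the hydrophobic pocket contacts seen for p38.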

inhibitors (see box in Figure 5B), whereas bulky residues at position 120, such as the Phe in CDK2, restrict access to the hydrophobic pocket, limiting the contacts available to a putative inhibitor.

The difference profiles exhibit clear regions where inhibitors bind to protein kinases in unique ways. These observations suggest that p-SIFts can be used to model the selectivity of inhibitors based on


the types of interactions they are able to satisfy when binding to the kinase. In order to validate the use of p-SIFts as selectivity filters, we carried out a self-recognition experiment using the set of 93 X-ray structures as a test data set. The p-SIFts derived from 50% of the complexes making up the ATPg, p38, and CDK2 clusters were all successful at recognition of their own members relative to the counter targets [9]. This supports the application of profiles to filtering docking results selectively against specific drug targets.

SIFt-based screening for p38 inhibitors
Our SIFt analysis has identified clear interaction preferences for ATP, p38, and CDK2 clusters as well as a canonical set of conserved interactions common to all ligands bound to kinases at the ATP binding site. In this section, we will demonstrate how p-SIFt can be applied in a VS workflow that can be tailored to a specific target without having to rely solely on the ambiguities of energy-based scoring. To this end, we have tested the performance of p-SIFt-based scoring in a typical database enrichment application using p38 as the target [9].

A database containing 16 known inhibitors of p38 was spiked into a background of 1000 diverse commercially available compounds and docked against the X-ray structure of p38 (PDB code 1a9u). We then analyzed whether p-SIFt provided any advantage over popular scoring functions in enrichment of inhibitors by plotting the percentage of actives recovered as a function of the percentage of the database screened.

For p38, the enrichment obtained by applying p-SIFt scoring provided markedly superior results over those obtained using energy scoring functions. In Figure 6, the enrichment curves and cumulative enrichment factors for p38 are presented, comparing the traditional and p-SIFt scoring approaches. p-SIFt scoring performs close to the ideal enrichment curve over the first 2% of the database, meaning that 14 of the 16 known p38 actives were in the top 20 ranked ligands. Upon examination of the docking poses it was discovered that for the other two inhibitors correct poses were never generated in the initial pose pool.

Both SIFt and p-SIFt have been demonstrated to be effective knowledge-based pose filters. We have shown that when applied in conjunction with energy-based scoring functions it is possible to derive hybrid-scoring schemes that can yield significantly improved database enrichment rates. The SIFt component of the hybrid scheme complements the physics- or rules-based functions by eliminating false poses and thus effectively compensating for the weakest feature of the scoring functions. In fact we were able to show that once incorrect binding modes are removed, all of the scoring functions tested performed equally well in database enrichment against CDK2 kinase [9].

Structure-based focusing of chemical libraries
The past decade has witnessed significant advances in combinatorial chemistry. With the discovery and availability of more reagents and reaction schemes, as well as the advances of chemical synthesis methods, the number of compounds that are synthetically feasible is daunting. How to intelligently leverage the 3D structural information of target molecules and use it in designing target-focused libraries is of great interest in the field.

Figure 3 shows the general workflow of our proposed strategy for designing target-focused chemical libraries, where information on the desired binding mode can be directly embedded into the reagent selection process. Key to this approach is to use the r-SIFt method to effectively classify compounds based on whether or not they can interact with the target while satisfying desired binding patterns, then use machine learning techniques to build filtering rules that can be applied to large libraries.

To investigate the performance of an r-SIFt based library filtering strategy, we focused a large combinatorial library down to an optimal set of commercially available reagents for each variable R-group. The goal of this example was to build filtering rules for each of the variable R-groups from small subsets of representative reagents. The rules can then be used to filter very large original reagent libraries to identify a list of suitable reagents that would be able to constitute dockable compounds (i.e., compounds that will be predicted to bind in the desired binding mode to the target by docking programs), thus eliminating the need to screen the whole combinatorial library, which could be extremely time consuming.

The combinatorial chemical library was based upon an imidazole inhibitor of p38 (PDB code 1ouk; Figure 3B), which we defined as an imidazole core attached to R1, R2, and R3. We first built decision tree predictive models for each of the variable groups (R1, R2, and R3) from libraries where a single R-group was varied independently, while keeping the other R-groups fixed. Our results showed that these independent models had 60–80% accuracy at predicting

[Figure 6 plot: x-axis % library screened (0–16), y-axis % actives recovered (0–100); curves for p-SIFt scoring, traditional ChemScore, traditional PMF, and random.]

Figure 6: Enrichment curves obtained for VS against p38. The enrichment curves were derived using the scoring methods described in Ref. [9]. The ChemScore and PMF curves were obtained using the traditional scoring scheme, using the ChemScore and PMF scoring functions, respectively, for both final pose selection and ligand ranking. Other scoring functions performed similarly under the traditional scoring scheme. The distribution of hits expected by chance is shown in black.
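An enrichment curve of the kind shown in Figure 6 plots the percentage of actives recovered against the percentage of the ranked library screened. The sketch below computes those points from a ranked list of active/inactive flags; `enrichment_curve` and the ten-compound toy ranking are hypothetical and are not the paper's data.

```python
from typing import List, Tuple

def enrichment_curve(ranked_is_active: List[bool]) -> List[Tuple[float, float]]:
    """(% library screened, % actives recovered) after each ranked compound.

    ranked_is_active[0] is the best-scoring compound.
    """
    total = len(ranked_is_active)
    n_actives = sum(ranked_is_active)
    if n_actives == 0:
        raise ValueError("ranking contains no actives")
    points: List[Tuple[float, float]] = []
    found = 0
    for rank, active in enumerate(ranked_is_active, start=1):
        found += active  # True counts as 1
        points.append((100.0 * rank / total, 100.0 * found / n_actives))
    return points

# Hypothetical ranking of ten compounds, three of them active.
ranking = [True, True, False, True, False,
           False, False, False, False, False]
curve = enrichment_curve(ranking)
print(curve[3])
```

Applying the same arithmetic to the numbers reported in the text (16 actives spiked into 1000 background compounds, 14 actives in the top 20) gives 87.5% of actives recovered after screening about 2% of the library.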


[Figure 7 diagram labels: Lead discovery; Xray Complex; SIFT profile; Virtual screening; Confirm hits; Knowledge-based lead discovery. Annotations: structures of many drug targets are known; clear preferences in interaction patterns of scaffolds and inhibitors; incorporate structural information of protein–SM complexes into VS paradigm.]

Figure 7: Integration of SIFt into the structure-based drug design workflow. The experimental structures of a drug target in complex with small molecules are used to generate SIFt and p-SIFt. This is used to filter a virtual chemical library, which is used to identify compounds for testing. These are confirmed as hits and their structures are determined, thus leading to further cycles of structure-based drug design.
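The fold-enrichment reported for library focusing is an enrichment factor (EF), defined in Materials and Methods as the concentration of dockable compounds in the final selection pool divided by their concentration in the original library. A small worked sketch follows; the function names and all counts are hypothetical, chosen only so the arithmetic is easy to follow.

```python
def concentration(n_dockable: int, n_total: int) -> float:
    """Fraction of compounds in a pool that are dockable."""
    if n_total <= 0:
        raise ValueError("pool must contain at least one compound")
    return n_dockable / n_total

def enrichment_factor(dockable_selected: int, n_selected: int,
                      dockable_library: int, n_library: int) -> float:
    """EF = concentration in selection pool / concentration in library."""
    library_conc = concentration(dockable_library, n_library)
    if library_conc == 0:
        raise ValueError("library contains no dockable compounds")
    return concentration(dockable_selected, n_selected) / library_conc

# Hypothetical counts (not the paper's data):
# 96 of 128 selected compounds dock, versus 32 of 1024 in the library.
ef = enrichment_factor(96, 128, 32, 1024)
print(ef)  # 0.75 / 0.03125
```

An EF of 1 means the selection is no better than drawing compounds at random from the original library.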

respective R-groups that would dock with desired binding mode, and 70–80% accuracy for R-groups that would not dock. Interestingly, these predictive models generated from the 1ouk-inh template were found transferable to other templates, as long as the R-groups were targeted at the same binding locus. To further test the validity of treating R1, R2, and R3 as independent, we carried out the r-SIFt procedure on a library where both R1 and R2 were varied simultaneously. The accuracies of the R1 and R2 models derived from this coupled library were found to be comparable to those obtained from the independent R-group libraries. Finally, to evaluate the accuracy of our approach on the full combinatorial library, a combinatorial library based on the 1ouk p38 small molecule template varying R1, R2, and R3 simultaneously was enumerated and a test subset focused using the independent R-group predictive models. Using r-SIFt focusing, we were able to select a small pool of compounds in which the concentration of dockable compounds was 24-fold higher than that in the original library; that is, the reagents selected were enriched 24-fold for compounds able to adopt the desired binding mode.

Conclusion and Future Directions

We have described a novel approach called SIFt, which takes 3D interaction patterns from protein complexes and translates them into 1D fingerprints that are easy to store, mine, and analyze. In addition to our work on SIFt, several other groups have recently reported using new approaches to handling protein–ligand complexes [16, 17]. These approaches should provide a valuable set of tools for structural biologists and computational chemists to manage the large amounts of structural data on protein–small molecule complexes that are being generated.

In the era of whole genome analysis, the ability to understand the selectivity determinants against whole protein families is a significant challenge. Our results with p-SIFt suggest that it is able to selectively enrich compounds against specific kinases. With r-SIFt we have shown that we are able to explore very large chemical libraries and achieve high enrichment rates for those compounds likely to bind without the time-consuming effort of docking the whole library. r-SIFt should provide a powerful complement to combinatorial chemistry to identify which areas of chemical space are most productive in terms of binding to a receptor. We also envision that p-SIFt will enable virtual chemical libraries to be computationally tested against panels of drug target family members and enable identification of selective or pan inhibitors (Figure 7). The combination of these tools coupled with high-throughput experimental approaches should allow us to fully leverage the potential of the vast amount of structural data being generated on drug targets.

Acknowledgments
We would like to thank Rainer Fuchs, Donovan Chin, Alexey Lugovskoy, Prashant Singh, Jennifer Campbell, and Herman van Vlijmen for useful discussions during the course of this work.

References
1. Yamaguchi A., Iida K., Matsui N., Tomoda S., Yura K., Go M. (2004) Het-PDB Navi: a database for protein–small molecule interactions. J Biochem (Tokyo);135:79–84.
2. Todd A.E., Marsden R.L., Thornton J.M., Orengo C.A. (2005) Progress of structural genomics initiatives: an analysis of solved target structures. J Mol Biol;348:1235–1260.
3. Sharff A., Jhoti H. (2003) High-throughput crystallography to enhance drug discovery. Curr Opin Chem Biol;3:340–345.
4. Hajduk P.J., Gerfin T., Boehlen J.M., Haberli M., Marek D., Fesik S.W. (1999) High-throughput nuclear magnetic resonance-based screening. J Med Chem;42:2315–2317.
5. Wallace A.C., Laskowski R.A., Thornton J.M. (1995) LIGPLOT: a program to generate schematic diagrams of protein–ligand interactions. Protein Eng;8:127–134.
6. Olson A.J., Pique M.E. (1998) Visualizing the future of molecular graphics. SAR QSAR Environ Res;8:233–247.


7. Good A. (2001) Structure-based virtual screening protocols. Curr Opin Drug Discov Dev;4:301–307.
8. Warren G.L., Andrews C., Capelli A.-M., Clarke B., LaLonde J., Lambert M.H., Lindvall M., Nevins N., Semus S.F., Senger S., Tedesco G., Wall I.D., Woolven J.M., Peishoff C.E., Head M.S. (2005) A critical assessment of docking programs and scoring functions. J Med Chem. doi: 10.1021/jm050362n
9. Chuaqui C., Deng Z., Singh J. (2005) Interaction profiles of protein kinase–inhibitor complexes and their application to virtual screening. J Med Chem;48:121–133.
10. Deng Z., Chuaqui C., Singh J. (2004) Structural interaction fingerprint (SIFt): a novel method for analyzing three-dimensional protein–ligand binding interactions. J Med Chem;47:337–344.
11. Bowie J.U., Luthy R., Eisenberg D.A. (1991) A method to identify protein sequences that fold into a known three-dimensional structure. Science;253:164–170.
12. Luthy R., Xenarios I., Bucher P. (1994) Improving the sensitivity of the sequence profile method. Protein Sci;3:139–146.
13. Raymond J.W., Blankley C.J., Willett P. (2003) Comparison of chemical clustering methods using graph- and fingerprint-based similarity measures. J Mol Graph Model;21:421–433.
14. Deng Z., Chuaqui C., Singh J. Knowledge-based design of target-focused libraries using protein–ligand interaction constraints. J Med Chem;in press.
15. Schindler T., Bornmann W., Pellicena P., Miller W.T., Clarkson B., Kuriyan J. (2000) Structural mechanism for STI-571 inhibition of abelson tyrosine kinase. Science;289:1938–1942.
16. Kelly M.D., Mancera R.L. (2004) Expanded interaction fingerprint method for analyzing ligand binding modes in docking and structure-based drug design. J Chem Inf Comput Sci;44:1942–1951.
17. Kroemer R.T., Vulpetti A., McDonald J.J., Rohrer D.C., Trosset J.Y., Giordanetto F., Cotesta S., McMartin C., Kihlen M., Stouten P.F. (2004) Assessment of docking poses: interactions-based accuracy classification (IBAC) versus crystal structure deviations. J Chem Inf Comput Sci;44:871–881.
