
out drugs

❦

Quantitative structure–activity analysis works to isolate drug candidates from the vast well of data.

BY NANCY OGIHARA

…a single drug to market, the pharmaceutical industry is seeking ways to improve the efficiency of the discovery process. Technology is making drug discovery and development more efficient through the use of computational methods, particularly in ligand-based design. With today's high-throughput methods and increasingly robust algorithms, hundreds of thousands of compounds can be screened for binding to protein targets faster and more accurately than ever.

The quantitative structure–activity relationship (QSAR) is a routine tool in drug discovery that computational scientists use to analyze large sets of candidate drug molecules. Sophisticated descriptors have been developed to characterize the three-dimensional geometry and chemistry of small molecules. In rational drug design, the QSAR can help identify the features of a molecule that control activity, which is critical information for the medicinal chemist. The QSAR can also be used to select the best candidate molecules from large compound libraries, reducing testing time and costs.

Drug discovery

QSAR methods date back to the 1800s, when scientists first correlated alcohol toxicity with hydrophobicity (1). Today's drug design efforts, however, are laboriously quantitative, incorporating molecular structure description, combinatorial mathematics, statistics, computer simulations, and database analysis. In today's data-rich environment, QSAR methods enable users to make the most of the available data. Because they can be applied quickly and easily, the methods are useful as a screening tool, identifying the drug candidates that are likely to be most effective so that more costly experimental or computational work can be focused on them. They also help scientists understand complex, multicomponent problems that often defy study by experiment or simulation.
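As a rough illustration of the screening idea, the sketch below scores a small compound library with an already-fitted linear QSAR equation and keeps only the top-ranked candidates for follow-up work. The coefficients, descriptor names (`logP`, `dipole`, `rot_bonds`), and compound IDs are hypothetical placeholders, not values from any real model.

```python
from __future__ import annotations

import heapq

# Hypothetical linear QSAR equation: activity = INTERCEPT + sum(coef * descriptor).
# Coefficients and descriptor names are illustrative placeholders only.
INTERCEPT = 1.2
COEFFICIENTS = {"logP": 0.45, "dipole": -0.10, "rot_bonds": -0.05}


def predict_activity(descriptors: dict[str, float]) -> float:
    """Apply the linear equation to one compound's descriptor values."""
    return INTERCEPT + sum(coef * descriptors[name] for name, coef in COEFFICIENTS.items())


def screen(library: dict[str, dict[str, float]], top_k: int) -> list[tuple[str, float]]:
    """Score every compound; return the top_k (compound_id, predicted_activity) pairs, best first."""
    scored = ((cid, predict_activity(desc)) for cid, desc in library.items())
    return heapq.nlargest(top_k, scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Hypothetical three-compound library.
    library = {
        "cmpd-001": {"logP": 3.1, "dipole": 2.0, "rot_bonds": 4},
        "cmpd-002": {"logP": 1.2, "dipole": 5.5, "rot_bonds": 9},
        "cmpd-003": {"logP": 4.0, "dipole": 1.0, "rot_bonds": 2},
    }
    for cid, score in screen(library, top_k=2):
        print(f"{cid}: predicted activity {score:.2f}")
```

`heapq.nlargest` keeps only a short list in memory, which suits the case where a very large library is scored but only a handful of compounds go forward to costly experiments.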

QSARs identify a mathematical relationship between some property of a molecular system, such as its ability to inhibit a family of enzymes, and a series of "descriptors" representing chemical or geometric characteristics. Typical descriptors include thermodynamic properties (such as enthalpies and entropies), electronic properties, or functions related to molecular shape (such as molecular weight, volume, polar surface area, dipole moment, and number of rotatable bonds). A typical QSAR spreadsheet is composed of rows representing the molecules or compounds in the data set and columns representing descriptor values. The property of interest occupies one more column of the table. Figure 1 shows an example of a QSAR study table.
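The spreadsheet layout just described maps naturally onto a list of records. The sketch below stores the seven rows of the Figure 1 example (values transcribed as printed there) and extracts the descriptor matrix and the property vector that a fitting routine would consume. The class and method names (`QSARTable`, `descriptor_matrix`, `property_vector`) are our own, chosen for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class QSARTable:
    """A QSAR study table: one row per compound, one column per descriptor,
    plus one column for the property being modeled."""

    descriptor_names: list[str]
    property_name: str
    rows: dict[str, dict[str, float]] = field(default_factory=dict)

    def add(self, compound_id: str, prop: float, **descriptors: float) -> None:
        """Add a compound row; every declared descriptor must be supplied."""
        missing = set(self.descriptor_names) - set(descriptors)
        if missing:
            raise ValueError(f"{compound_id}: missing descriptors {sorted(missing)}")
        self.rows[compound_id] = {self.property_name: prop, **descriptors}

    def descriptor_matrix(self) -> list[list[float]]:
        """Rows in insertion order, columns in descriptor_names order."""
        return [[row[name] for name in self.descriptor_names] for row in self.rows.values()]

    def property_vector(self) -> list[float]:
        """Property values aligned row-for-row with descriptor_matrix()."""
        return [row[self.property_name] for row in self.rows.values()]


# Values transcribed from Figure 1: Y1 = Activity; X1..X4 = Apol, Area, Dipole, Energy.
FIGURE_1 = [
    ("1", 3.150, 1.06e4, 270.566, 7.139, 133.003),
    ("2", 3.450, 9.55e3, 242.417, 2.056, 100.681),
    ("3", 4.130, 1.17e4, 252.990, 1.037, 103.760),
    ("4", 3.450, 1.17e4, 257.214, 2.313, 109.687),
    ("5", 3.690, 8.65e3, 215.372, 1.028, 90.970),
    ("6", 4.010, 1.17e4, 242.563, 2.286, 93.813),
    ("7", 4.280, 1.17e4, 251.587, 1.558, 100.894),
]

table = QSARTable(["Apol", "Area", "Dipole", "Energy"], "Activity")
for cid, activity, apol, area, dipole, energy in FIGURE_1:
    table.add(cid, activity, Apol=apol, Area=area, Dipole=dipole, Energy=energy)

print(len(table.property_vector()), "compounds x", len(table.descriptor_names), "descriptors")
```

Rejecting rows with missing descriptors up front keeps the matrix rectangular, which every regression method discussed in this article assumes.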

The relationship between structure and activity is derived empirically by analyzing a set of molecules for which values of the property and the descriptors are known. A study may involve many descriptors, making derivation of QSARs a complex statistical exercise. But this exercise is easily automated, and its benefits are significant. QSARs identify the key structural and chemical factors that determine the property of interest. They can then be applied to predict that property for new molecules, assisting in the optimized design of materials, drugs, and chemicals.

QSAR tools help explain and predict properties based on statistical correlations. Using these tools, researchers may develop predictive models by analyzing the correlations in their own data, or they may apply established models to predict properties. In the example in Figure 1, the property of interest is the activity of a set of drug molecules, listed in the column labeled "Activity". For each molecule, the activity is compiled from experimental data entered by the user or computed using a simulation method. Similarly, each descriptor cell in the table is filled using known data from experiments or databases or by computing the value of the descriptor.
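Some descriptor cells are computed rather than measured. As a minimal example, molecular weight, one of the shape-related descriptors listed earlier, can be computed from a molecular formula. This sketch assumes a flat formula string such as `C2H6O` (no parentheses, charges, or isotopes) and includes only a handful of elements with their standard atomic weights; the function name is hypothetical.

```python
import re

# Standard atomic weights (g/mol) for a few common elements; extend as needed.
ATOMIC_WEIGHTS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999, "S": 32.06, "Cl": 35.45}

# One element symbol (capital letter, optional lowercase letter) plus an optional count.
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def molecular_weight(formula: str) -> float:
    """Sum atomic weights for a flat formula such as 'C2H6O'."""
    total = 0.0
    consumed = 0
    for match in _TOKEN.finditer(formula):
        if match.start() != consumed:  # characters skipped between tokens
            raise ValueError(f"unparseable formula: {formula!r}")
        element, count = match.group(1), match.group(2)
        if element not in ATOMIC_WEIGHTS:
            raise KeyError(f"no atomic weight for {element!r}")
        total += ATOMIC_WEIGHTS[element] * (int(count) if count else 1)
        consumed = match.end()
    if consumed != len(formula):  # trailing characters left unparsed
        raise ValueError(f"unparseable formula: {formula!r}")
    return total


if __name__ == "__main__":
    print(f"ethanol C2H6O: {molecular_weight('C2H6O'):.3f} g/mol")
```

A production workflow would compute such values with a cheminformatics toolkit from a full structure rather than a formula; the point here is only that a descriptor column can be filled by calculation.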

Researchers generate a QSAR by analyzing all of this data to establish an equation that best describes the relationship between the property (activity) and the descriptors. Methods that can be used to establish this correlation include regression techniques, principal-component analyses (PCAs), and genetic algorithms. The QSAR tells you which descriptors are most statistically significant in determining the property, allowing you to focus your studies on the molecular characteristics that …

2003 AMERICAN CHEMICAL SOCIETY
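For the regression route named above, a bare-bones ordinary least-squares fit shows how an equation is derived from a QSAR table and how descriptors can be ranked by their t-statistics (coefficient divided by its standard error). This is a pure-Python sketch for small tables under the usual OLS assumptions, not the procedure of any particular QSAR package; the function names are hypothetical.

```python
from __future__ import annotations

import math


def _invert(matrix: list[list[float]]) -> list[list[float]]:
    """Gauss-Jordan inversion with partial pivoting (fine for small, well-conditioned systems)."""
    n = len(matrix)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix; descriptors may be collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fit_mlr(X: list[list[float]], y: list[float]) -> tuple[list[float], list[float]]:
    """Fit y = b0 + b1*x1 + ... + bk*xk by ordinary least squares.

    Returns (coefficients, t_statistics), intercept first. Raises if there are
    not more compounds than parameters; a perfect fit (zero residuals) would
    make the t-statistics undefined.
    """
    n = len(y)
    A = [[1.0] + list(row) for row in X]  # prepend the intercept column
    p = len(A[0])
    if n <= p:
        raise ValueError("need more compounds than fitted parameters")
    # Normal equations: (A^T A) b = A^T y
    ata = [[sum(A[k][i] * A[k][j] for k in range(n)) for j in range(p)] for i in range(p)]
    aty = [sum(A[k][i] * y[k] for k in range(n)) for i in range(p)]
    ata_inv = _invert(ata)
    coef = [sum(ata_inv[i][j] * aty[j] for j in range(p)) for i in range(p)]
    residuals = [y[k] - sum(coef[i] * A[k][i] for i in range(p)) for k in range(n)]
    s2 = sum(r * r for r in residuals) / (n - p)  # residual variance
    t_stats = [coef[i] / math.sqrt(s2 * ata_inv[i][i]) for i in range(p)]
    return coef, t_stats
```

Solving the normal equations directly is the simplest correct formulation but is less numerically stable than a QR or SVD solver; for real data sets a numerical library is the better choice. Converting a t-statistic to a significance level additionally requires the t distribution with n - p degrees of freedom.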

Statistical methods

QSARs were pioneered in the 1960s by Corwin Hansch and colleagues at Pomona College (www.pomona.edu) and the University of Iowa (www.uiowa.edu), who used multiple linear regression to describe activity as a function of chemical structure (2). However, a limitation of this method was that it required large numbers of compounds to explore structural combinations.
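To make "activity as a function of chemical structure" concrete, here is one widely quoted parabolic form of the Hansch equation. It is given for orientation and is not reproduced from reference (2): C is the molar concentration producing a fixed biological response, P the octanol/water partition coefficient, σ the Hammett electronic substituent constant, and a, b, ρ, and c are coefficients fitted by multiple linear regression.

```latex
\log\frac{1}{C} = -a\,(\log P)^{2} + b\,\log P + \rho\,\sigma + c
```

Every fitted coefficient consumes a degree of freedom: fitting k coefficients to n compounds leaves n - k residual degrees of freedom, so the four coefficients above fitted to six compounds leave only two. Adding descriptors to explore more structural combinations raises k, which is why this approach demanded large compound sets.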

Data reduction techniques such as PCA helped overcome this requirement of a high observation-to-parameter ratio (3). By reducing the number of variables that describe biological activity or chemical properties to a smaller number of independent, orthogonal components, regression can be performed on these principal components. As a result, redundancies are removed and intercorrelation in the data is minimized.

Figure 1. Tabling descriptors. An example of a QSAR study table.

Structure   Activity (Y1)   Apol (X1)   Area (X2)   Dipole (X3)   Energy (X4)
1           3.150           1.06E+04    270.566     7.139         133.003
2           3.450           9.55E+03    242.417     2.056         100.681
3           4.130           1.17E+04    252.990     1.037         103.760
4           3.450           1.17E+04    257.214     2.313         109.687
5           3.690           8.65E+03    215.372     1.028          90.970
6           4.010           1.17E+04    242.563     2.286          93.813
7           4.280           1.17E+04    251.587     1.558         100.894

Partial least squares (PLS) goes one step further than PCA by including cross-validation, a technique in which compounds are left out and their values are predicted by the relationship established from the other compounds. The actual predictive ability of the final model is then evaluated by how well it predicts these held-out, unbiased data. Although PCA and PLS can produce highly predictive QSAR models, their main drawback is their limited ability to yield interpretable models.

Stepwise regression methods, such as forward-stepping linear regression, have come into popular use because they can produce models with a reasonable level of interpretability and are easily applied to original descriptor sets. These methods, however, rely on obtaining sufficient response levels from individual variables in isolation. With extremely large data sets, the signal-to-noise ratio of a single variable is not always apparent.

Genetic algorithms

Genetic function approximation (GFA) is part of a powerful class of computational techniques known as genetic algorithms (4, 5). Incorporated into QSAR model development, genetic algorithms help researchers find optimum solutions for combinatorial problems. Genetic algorithms offer a significant advantage in that, unlike the methods described in the preceding section, they consider variables in combination with one another instead of just in isolation. Genetic algorithms also keep the original descriptors rather than converting them into principal components, thereby retaining the desired level of interpretability of the final QSAR model.

GFA takes genetic algorithms a step further by automating the QSAR model search: it combines a genetic algorithm with statistical modeling tools, rapidly generating a population of statistically valid structure–activity models rather than a single model.

These algorithms use a "survival of the fittest" strategy to determine whether a solution makes it to the next stage. Beginning with a population of randomly constructed QSAR models, GFA rates them by using an error measure that estimates each model's relative predictiveness. Researchers then "evolve" the population by repeatedly selecting two better-rated models to serve as "parents" and then creating a next-generation, or child, model by using terms from each of the parent models. This new model replaces the worst-rated model in the population, and as evolution proceeds, the population becomes enriched with models of higher and higher quality. Although one can simply select the best-scoring model from the population, selecting the best models still relies on scientific knowledge and intuition about the appropriateness of the features and their combinations.

Overall, GFA greatly simplifies identification of the significant variables in statistical analyses. GFA is ideal when a data set contains many more descriptors than samples, when selecting among competing correlated descriptors, or when there may be nonlinear relationships in the data. In these cases, GFA rapidly points to the most information-rich combinations of features and exposes patterns in the data set that may otherwise remain hidden (see box, "Predicting drug toxicity", p 32).

Recursive partitioning

Recursive partitioning as implemented in developing QSAR models for drug discovery provides the ability to derive decision-tree-based QSAR models, which can be used to qualitatively predict activities or activity classes in structure–activity relationship analysis or in focused library design.

Recursive partitioning is defined as the division of compound sets into groups of higher and lower response as a function of their descriptors. Recursive partitioning has been used for many years in the credit and insurance world. When someone applies for credit, information "descriptors" about the applicant, such as gender, income, age, and college education, can be immediately gathered and interpreted. The decision path determined by a person's descriptive details and characteristics dictates what his or her interest rates or premiums will be. The same holds for recursive partitioning in drug discovery and QSAR, but in this case, the decision path provides a way of predicting activity for a given compound.

Recursive partitioning is especially good for large amounts of data that are difficult to sieve into usable divisions of classification. The solution is to partition the data, or divide it into bifurcated decision "trees" or categories. In so doing, you find out which characteristics are unique to the compounds and correlate those qualities with drug activity.

A variation of the recursive partitioning method is multi-Y recursive partitioning, which uses neural networks to screen a library against any number of protein targets (multiple Y, 6). In addition to being able to gen[…], this method offers improved sensitivity when analyzing the importance of variables and higher tolerance for noise and outliers, particularly when evaluating large, complex data sets. The advantage of this method is efficiency, allowing more opportunities to use a single screened data set against, for example, multiple diseases.

The bottom line

Successful applications of QSAR technology to drug discovery research are becoming increasingly commonplace. Computational scientists and experimentalists are adding QSAR methods to their arsenal of discovery tools to complement computational methods. However, with virtual high-throughput screening comes the challenge of effectively dealing with large, noisy, and often complex data sets. "The industry still struggles to enrich lead portfolios," says Omoshile Clement, senior product manager of rational and combinatorial drug design at Accelrys (www.accelrys.com). "The challenge remains to improve signal-to-noise ratios while reducing the threat of overtraining from too many variables." Many approaches loom on the horizon, ready to be adopted, from the incorporation of non-decision-tree variables and the use of artificial intelligence in neural networks (see also Sites and Software, p 23) to the addition of ADME (absorption, distribution, metabolism, and excretion) properties as both variables and descriptors to filter and refine QSAR data sets. Although still in its infancy, the use of ADME properties shows promise of further improving and enhancing QSAR methods.

What's in store for QSARs?

Several companies provide computational tools that researchers can use to perform QSAR analysis for drug discovery. These tools incorporate statistical and graphical models of biological activity or properties from molecular structures, which in turn are used to make activity predictions for untested compounds. These tools have stemmed from a number of theoretical approaches developed in recent years to better predict activity with QSAR models. Commercial suppliers include

- Accelrys (www.accelrys.com), whose Cerius2 environment includes C2.GA, C2.QSAR+, C2.CSAR, and C2.NNet; multi-Y recursive partitioning, genetic algorithms, GFA, nonlinear PCA, and PLS.
- Tripos (www.tripos.com), which markets HQSAR and QSAR with CoMFA, including molecular field generation, PCA, PLS regression, and hierarchical clustering.
- Chemical Computing Group (www.chemcomp.com), with its molecular operating environment (MOE), QuaSAR-Binary, and Binary QSAR.

Predicting drug toxicity

The initial process of drug development involves screening candidate molecules for optimal therapeutic index. Screening is greatly facilitated by the use of computational models that circumvent extensive laboratory studies. The following case study illustrates the use of genetic function approximation (GFA) to predict toxicity.

Scientists from a major pharmaceutical company applied GFA, with a range of molecular descriptors and nonlinear functions, to a diverse set of experimental compounds to develop a broadly applicable model for predicting cytotoxicity. The researchers assayed the viability of human dermal fibroblasts and determined inhibitory concentration 50% (IC50) values (the molar concentration of drug compound required to kill 50% of the fibroblast cells). They then used octanol/water partition coefficient (LogP) values from the Pomona College database and calculated molecular hydrophobicity (ClogP). The researchers calculated other descriptors using Accelrys's C2.QSAR+ with energy-minimized structures for each molecule. They tabulated the IC50 values and descriptors for each molecule and performed a linear regression of LogP, a stepwise linear regression of the entire descriptor set, a GFA regression with linear operators, and a GFA regression with nonlinear operators, using the genetic algorithms module of the Cerius2 program.

The predictive capability of the nonlinear GFA model was significantly better than that of the linear LogP model. With a diverse set of compounds, it is unlikely that a single mechanism defines the toxic effect of all compounds, and the LogP model was insufficient for highly toxic compounds. The GFA model, however, was capable of fitting these data and can therefore be used as a reasonable predictive model for in vitro cytotoxicity.

Figure 2. Nonlinear GFA model (green) versus linear LogP model (red). [Plot of predicted versus experimental values; x axis: Experimental, y axis: Predicted.]

References

(1) Borman, S. Chem. Eng. News 1990, 68, 20–23.
(2) Hansch, C.; et al. J. Amer. Chem.
(3) Sharaf, M. A.; Illman, D. A.; Kowalski, B. R. In Chemometrics; Wiley: New York, 1986; p 179.
(4) Rogers, D.; Hopfinger, A. J. J. Chem. Inf. Comput. Sci. 1994, 34, 854–866.
(5) Rogers, D. In Proc. 7th Intern. Conf. Genetic Algorithms, East Lansing, MI, 1997.
(6) Zupan, J.; Gasteiger, J., Eds. Neural Networks in Chemistry and Drug Design, 2nd ed.; Wiley-VCH: Weinheim, 1999.

Nancy Ogihara is a marketing communications specialist for Accelrys (www.accelrys.com). Send your comments or questions about this article to mdd@acs.org or to the Editorial Office address on page 3.

KEY TERMS: automation, high throughput, informatics, medicinal chemistry, modeling, screening
