
Full Papers

Design of Discovery Libraries for Solids Based on QSAR Models


D. Farrusseng (a), C. Klanner (b), L. Baumes (a), M. Lengliz (a), C. Mirodatos (a), F. Schüth (b)*
(a) Institut de Recherches sur la Catalyse – CNRS – 2, Av. A. Einstein, F-69626 Villeurbanne, France
(b) MPI für Kohlenforschung, Kaiser-Wilhelm-Platz 1, D-45470 Mülheim, Germany, E-mail: schueth@lcofo.mpg.de

Keywords: Combinatorial chemistry, High-throughput screening, Material library design, Virtual
screening, Descriptor, Data mining, Heterogeneous catalysis

Received: 17. 09. 2004; Accepted: 27. 10. 2004

Abstract
A method is described which is used to construct a descriptor vector of solid catalysts in
the oxidation of propene. Different methods are described which allow one to construct a
correlation between characteristics of the catalysts and their performance in propene
oxidation. Successful descriptor vectors are generated which predict catalytic performance
substantially better than statistically expected. These descriptor vectors do not contain
explicit information on the elemental composition of the catalysts any more, but only
parameters that are either derived from the elemental composition, such as the enthalpy
of oxide formation, or are related to the synthetic method. The general concept can
probably be extended to the development of descriptors for solids to be used in other
applications as well.

1 Introduction

Over only a few years, the Quantitative Structure–Activity Relationship (QSAR) approach has
substantially changed the process of drug discovery, because it enables one to select an
adapted and, as opposed to a random selection, improved subset of experiments among an
infinite number of candidate molecules by applying virtual screening techniques. In addition,
the number of compounds that can be taken into consideration is increased by orders of
magnitude with respect to a conventional strategy. This helps to increase the chance of
discoveries while saving experiments by discarding compounds before they have to be
synthesized. Depending on whether the goal is to design discovery or targeted libraries,
different criteria for library design are applied. For a discovery library, samples that
constitute the subset should be as diverse as possible and the set should cover the whole
search space. For a targeted library, on the other hand, one seeks drug candidates that are
similar to a lead structure. Thus, the diversity profiling of drugs/molecules is the key
concept in the design of libraries of molecules. It relies on the “similar property
principle”, that is, the assumption that structurally similar molecules should have similar
biological activities. The similarity/diversity of two molecules can be assessed by measuring
a “distance” between them according to different methods, such as, for instance, Euclidean
metrics or Tanimoto indices. It is clear that the distance calculation depends strongly on
the coordinates of the two molecules, which in turn rely on the way they have been encoded.
Molecules and drugs, which can be described at the atomic level, can be encoded as
fingerprints that enable their molecular features or molecular characteristics to be captured
through the so-called “descriptors”. These descriptors form a vector of variables, so that a
library of compounds can be described by a tabular array where a row represents a molecule
and the columns the different descriptors. Molecules can be described by a vast number of
descriptors that are related to structural features or molecular properties. Commonly,
property descriptors are sub-divided into one-, two-, and three-dimensional (1D, 2D, and 3D,
respectively) descriptors, indicating the required type of structural representation of a
molecule for its calculation. 1D descriptors include bulk properties and physicochemical
parameters, for instance, molecular weight. Properties that can be computed from a
2D-structure representation include, for example, defined structural fragments and
connectivity indices. 3D properties are, for instance, the solvent-accessible surface area,
molecular volumes, or spatial pharmacophores. After selection of the relevant descriptors,
the diversity of any two molecules can be assessed by computing the distance between them,
based on the descriptors.

The combinatorial approach has recently been extended from drug discovery to materials
science and catalysis [1–6]. Unfortunately, the QSAR approach cannot be equally well
transferred to the discovery of materials, because (i) solids can hardly be characterized at
the atomic scale, which is a serious obstacle for fingerprint encoding, as mentioned above
for molecules, and (ii) the similar property principle is not generally applicable to solids
in materials science [7]. Hence, true QSAR and virtual screening

78 QSAR Comb. Sci. 2005, 24 DOI: 10.1002/qsar.200420066  2005 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim

methods based on descriptors have not been applied to catalysts so far, even though
data-mining methods have recently been used [8–13]. Therefore, there is no method today for
the library design of solids that can guarantee that the properties of the solids will be
diverse. It is the high complexity of solids, as compared to molecules or drugs, which makes
the design of libraries a serious challenge [14]. One of the main hurdles is the description
of a solid, especially if the solid has not been synthesized in reality and, therefore, no
characterization results are available.

The motivation of this work is to demonstrate for the first time an implementation strategy
for the building of a QSAR analogue model for solids. Since the solids are not represented as
structures, in the following, the term QPAR (Quantitative Property–Activity Relationship)
will be used. The methodology to generate and select relevant descriptors is described.
Different QPAR models are compared and discussed with respect to previous knowledge and the
available literature. A short account of some aspects of this work has been given recently
[15].

2 Materials and Methods

2.1 Workflow Overview

Figure 1 gives a general description of the workflow used in this study, which, however, is
generic and should be applicable to other problems as well. The data-processing steps to
generate the catalyst descriptors are shown on the left side in Figure 1, while the workflow
to assess and classify the catalytic behavior experimentally is depicted on the right side.
The whole process consists of the following stages: (1) collection and synthesis of a diverse
library of solids, in which the selection is based upon experience and chemical intuition;
(2) description of these solids by numerous attributes; (3) selection of a subset of relevant
and uncorrelated attributes to create a first possible descriptor vector; (4) high-throughput
(HT) testing of the whole library of solids in a catalytic reaction and classification of the
catalysts in distinct classes of performance, resulting in clusters containing solids which
exhibit similar catalytic properties; and (5) computing QPAR models between descriptor
vectors and catalytic performances, where in this process the descriptor vectors are further
modified. Details of each step are reported in the following sections.

2.2 Sample Collection and HT Synthesis

For the first stage, we collected approximately 500 different solids with the aim to cover a
wide chemical space. Hence, samples had to be as diverse as possible with respect to the
elements, the material types, and the synthesis procedures, which also means that compounds
were included in the study for which it was clear that they were essentially inactive in the
target reaction, such as silica. This initial diverse library was created based on a priori
knowledge and intuition. A first selection of 367 catalysts was carried out among solids
already available in our laboratories. Then, 100 additional solids were synthesized by means
of HT equipment in order to expand the chemical space with respect to the element and support
representation.

Figure 1. General workflow for the development of descriptors for solid catalysts; for explanation, see text.
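
The diversity requirement for a discovery library presupposes a way to measure how far apart
two samples are once they are encoded as descriptor vectors. The following Python sketch
illustrates the two distance measures named in the Introduction (Euclidean and Tanimoto) and,
as an assumption that goes beyond this work, a greedy max-min rule for picking a diverse
subset. All function names and the toy vectors are hypothetical; the library described here
was assembled from experience and intuition, not with this procedure.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two descriptor vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def tanimoto_distance(a, b):
    """1 - Tanimoto similarity for binary fingerprints given as 0/1 sequences."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    # Two all-zero fingerprints are treated as identical (distance 0).
    return 0.0 if either == 0 else 1.0 - both / either


def maxmin_subset(vectors, k, dist=euclidean):
    """Greedy max-min selection (illustrative heuristic, not the authors' method).

    Starts from the first sample and repeatedly adds the sample whose
    nearest already-chosen neighbour is farthest away.
    """
    chosen = [0]
    while len(chosen) < k:
        candidates = [i for i in range(len(vectors)) if i not in chosen]
        best = max(
            candidates,
            key=lambda i: min(dist(vectors[i], vectors[j]) for j in chosen),
        )
        chosen.append(best)
    return chosen
```

With such a distance in hand, a candidate that lies close to an already selected sample adds
little diversity, whereas a distant one extends the coverage of the search space.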


Thus, we ensured that most of the elements of the periodic table (excluding exotics) were
represented, and that the occurrence of each element was well distributed over the library.
As it is far beyond the scope of this work, the preparation method of each solid catalyst
will not be discussed in detail. However, synthetic methods include impregnation, ion
exchange, precipitation, co-precipitation, deposition–precipitation, the activated-carbon
route [16], sol-gel synthesis, and others.

2.3 Computation of Catalyst Attributes

A Microsoft Access database was implemented to record the description of the catalyst
synthesis and to calculate automatically thousands of attributes (also called metadata) for
each catalyst. A variety of information is recorded in the database, namely, (1) synthesis
parameters of the catalysts, (2) elemental composition of the catalysts, (3) properties of
constituent elements, (4) properties of constituent-element oxides, and (5) properties of
constituent-element ions (Figure 2). The entries concerning the properties of elements,
element oxides, and element ions were collected from the Handbook of Chemistry and Physics
[17] and other sources of physico-chemical data. The workflow that describes the
combinatorial generation of the attributes is shown in Figure 3. By combining the elemental
composition of a catalyst and the respective properties of the elements, ions, or oxides,
3100 attributes were computed in a combinatorial manner using operators such as the average
of a certain property for the constituents of the catalyst, the maximum, the minimum,
weighted averages, and so on. For instance, from the enthalpies of formation of the element
oxides, one can compute the average enthalpy of formation, or the spread between the highest
and the lowest enthalpy of formation. Such values should, in some complex way, be related to
the availability of oxygen at the surface of a solid catalyst, and thus to the performance in
an oxidation reaction. The motivation to generate a vast number of attributes is the fact
that relevant and discriminative attributes are a priori not known, and obviously the
relationship between properties of a catalyst and its performance is not simple.

Finally, different types of attribute sets were generated: X1 accounted for the composition
of all catalysts (60 in total), X2 consists of parameters calculated on the basis of X1 and
physical data (3100 in total), and X3 contains synthesis parameters of the solid catalysts.
The last attribute set consists of 19 categorical (mostly binary) variables which provide
information on the last synthesis step, namely main synthesis parameters, precursor types,
additive types, etc.

Figure 2. Simplified scheme of the database for solid catalysts.

2.4 Search Space Reduction – the Descriptor Vector

The number of attributes must be reduced since there are no modeling techniques available
which are able to deal with 3000 attributes. We assumed that a set of around 75 attributes
would be a maximum with respect to the number of catalysts (467), otherwise the risk of
self-learning would be too high. The task thus consists of selecting discriminative
attributes that are not strongly correlated with each other. A ranking of the 3119 attributes
(X2 and X3 attribute sets), based on their discriminative power, was carried out using the
Relief algorithm [18]. Then, 56 attributes were selected from the 3100 X2 attributes with
respect to their ranking and in order to map different properties (Table 1). This ensemble of
56 X2 attributes plus the 19 X3 attributes forms a first descriptor vector which is used as
input for the QPAR model (and which was further reduced during the model building). Note that
this descriptor vector does not contain information on the elemental composition any more.
This only enters the vector via the computed values that are correlated to the elements
present in the catalyst.

2.5 HT Testing and Clustering of Catalytic Performance

The library of solids has been tested in the gas-phase oxidation of propene with oxygen. This
reaction offers the advantage of providing a wide spectrum of possible products from C1 to
C6, including a number of alkenes and oxygenates. A complete list is given in Table 2. Data
analysis serves to classify the performances of all catalysts into a small number of distinct
classes that are representative of typical catalytic behavior. An automated HT set-up has
been used to evaluate the performance of each catalyst. It consists of a set of mass-flow
controllers for oxygen, propene, and nitrogen, which feed the reagents via a common line into
a 16-fold plug flow reactor. The principal set-up of this system corresponds to the one
described in Ref. [19]. A gas chromatograph equipped with a capillary column and
flame-ionization detector in combination with a methanizer has been used for analysis.
Catalytic tests were carried out with a gas consisting of 1% propene, 5% oxygen (slightly
over-stoichiometric for full oxidation to water and carbon dioxide), and nitrogen as balance,
at a space velocity of 225 mL h⁻¹ (g_cat)⁻¹ at five different temperatures (200, 250, 300,
400, and 500 °C). The catalyst was used as grains of between 250 and 500 µm in size. Propene
conversion and the selectivity to 27 products were determined for each catalyst. Each test
was repeated twice. Each full cycle to analyze 16 catalysts needs 8 h. This means that the
catalysts were not analyzed at the same time on stream. However, the activation or
deactivation information was also taken into account to some extent by calculating the
difference between the propene conversion of the first and the second measurement. Using this
procedure, the performance of the 467 catalysts was described by 120 variables: propene
conversion, 21 selectivities, deactivation behavior, and mass balance (a measure for possible
coking or residue formation on the catalysts) at five different temperatures. Because of
obvious strong correlations in product selectivities accompanied with low variance, some
composite variables were constructed by the combination of variables to reduce the complexity
of the problem. The variables mass balance at all temperatures and temporal behavior at
200 °C, 250 °C, and 300 °C have been discarded, as their information content was extremely
low. It is expected that this will be different for other reaction conditions under net
reducing conditions, where coke formation can be a serious problem. 17 variables with high
information content have been taken directly for analysis: conversion, selectivity to CO and
to CO2 at all five temperatures, and temporal behavior at 400 °C and 500 °C. Four variables


Figure 3. Scheme for the generation of the attribute set X2. For the elements only one
operation was necessary; for oxides and ions two steps were needed: first the different
states for one element had to be taken into account, and in the second step the attributes
over the different constituents in the catalyst were generated. (SD = standard deviation).
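
The one-step and two-step schemes of Figure 3 can be illustrated with a short Python sketch.
It is a minimal example under stated assumptions, not the Access implementation used in this
work: the operator set is a small subset of those mentioned in Section 2.3, the function and
attribute names are hypothetical, and the property values are placeholders rather than
handbook data.

```python
from statistics import mean

# Placeholder numbers for illustration only (not handbook data).
ELEMENT_PE = {"Mo": 2.2, "V": 1.6, "O": 3.4}         # one value per element
OXIDE_DENSITY = {"Mo": [4.7, 6.5], "V": [3.4, 4.3]}  # one value per known oxide

# Small, assumed subset of the operators mentioned in the text.
OPERATORS = {
    "mean": mean,
    "min": min,
    "max": max,
    "dif": lambda values: max(values) - min(values),  # highest minus lowest
}


def element_attributes(composition, table, name):
    """One-step case: apply each operator over the constituents of a catalyst."""
    present = [el for el in composition if el in table]
    values = [table[el] for el in present]
    attrs = {f"{op}_{name}": fn(values) for op, fn in OPERATORS.items()}
    # Composition-weighted mean over the same constituents.
    total = sum(composition[el] for el in present)
    attrs[f"wm_{name}"] = sum(composition[el] * table[el] for el in present) / total
    return attrs


def oxide_attributes(composition, table, name):
    """Two-step case: reduce the oxides of each element, then the constituents."""
    present = [el for el in composition if el in table]
    attrs = {}
    for el_op, el_fn in OPERATORS.items():
        per_element = [el_fn(table[el]) for el in present]   # step 1
        for cat_op, cat_fn in OPERATORS.items():
            attrs[f"{cat_op}_{el_op}_{name}"] = cat_fn(per_element)  # step 2
    return attrs
```

Combining every element-level operator with every catalyst-level operator is what makes the
number of attributes grow so quickly: four operators already give 16 attributes for a single
oxide property.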

with medium information content were taken into account by summing up the respective variable
for all five temperatures, giving formaldehyde_{all temperatures},
acetaldehyde_{all temperatures}, acrolein_{all temperatures}, and benzene_{all temperatures}.
Variables that were assumed to be important from a chemical point of view, but had a low
variance, were taken into account by grouping products with similar chemical function or
chemical structure. The values of the respective groups for all temperatures were summed up,
giving Σ C3H6O_{all temperatures} (propionaldehyde and acetone), Σ acids_{all temperatures}
(acetic acid and acrylic acid), Σ alkanes_{all temperatures} (ethane, butane, and pentane),
Σ C4 hydrocarbons_{all temperatures} (1-butene, 2-butene, and 2-methylpropene),
Σ C5 hydrocarbons_{all temperatures} (2-methyl-1-butene, 2-methyl-2-butene, and pentenes),
and Σ C6 hydrocarbons_{all temperatures} (Σ C6H12). Then, a principal component analysis
(PCA) was carried out to decrease the dimensionality and to orthogonalize the dataset. From
the scores on the PC axes, distinct catalytic classes were generated by means of hierarchical
clustering (Ward's distance) and the k-means technique as implemented in Statistica 6.1.

Table 1. A list of the 56 attributes from the attribute set X2 used for the correlation.
Attributes 1–24 are element properties and 25–56 are oxide and ion properties. Since often
more than one oxide exists, an additional operation to calculate the value for an element is
necessary, for instance, the mean of the densities of different oxides for one element.

No.  Code              Property                                                        Calculation for catalyst
1    meanbc_1ie        first ionization energy                                         mean, all metals and semi-metals
2    meanbc_ar         atomic radius                                                   mean, all metals and semi-metals
3    difec_ar          atomic radius                                                   difference from highest to lowest value
4    meanbc_bseo       bond strength element–oxygen                                    mean, all metals and semi-metals
5    difec_bseo        bond strength element–oxygen                                    difference from highest to lowest value
6    meanbc_bsee       bond strength element–element                                   mean, all metals and semi-metals
7    difec_bsee        bond strength element–element                                   difference from highest to lowest value
8    meanec_ea         electron affinity                                               mean, all elements
9    meanbc_ea         electron affinity                                               mean, all metals and semi-metals
10   difec_ea          electron affinity                                               difference from highest to lowest value
11   meanec_pe         Pauling electronegativity                                       mean, all elements
12   meanbc_pe         Pauling electronegativity                                       mean, all metals and semi-metals
13   difec_pe          Pauling electronegativity                                       difference from highest to lowest value
14   minec_nffefmsmo   normalized formation free-enthalpy for most stable metal oxide  minimum value, all elements
15   maxec_nffefmsmo   normalized formation free-enthalpy for most stable metal oxide  maximum value, all elements
16   minec_sedmsmoom   smallest formation free-enthalpy difference from the most       minimum value, all elements
                       stable metal oxide to another metal oxide
17   maxec_sedmsmoom   smallest formation free-enthalpy difference from the most       maximum value, all elements
                       stable metal oxide to another metal oxide
18   wmec_ms           molar mass                                                      weighted mean, all elements
19   meanec_ms         molar mass                                                      mean, all elements
20   meanbc_ms         molar mass                                                      mean, all metals and semi-metals
21   wmbc_no           number of element oxides                                        weighted mean, all metals and semi-metals
22   minec_no          number of element oxides                                        minimum value, all elements
23   maxec_no          number of element oxides                                        maximum value, all elements
24   nvec              number of elements in catalyst                                  number, all elements

No.  Code                   Property                           Calculation for element                         Calculation for catalyst
25   meanec_moe_d           density of oxides                  mean                                            mean, all elements
26   meanec_moe_dc          dielectric constant of oxides      mean                                            mean, all elements
27   meanec_meanvhosoe_ffe  formation free-enthalpy of oxides  mean value for the highest oxidation state      mean, all elements
28   meanec_difoe_ffe       formation free-enthalpy of oxides  difference highest to lowest value              mean, all elements
29   meanec_moe_ffe         formation free-enthalpy of oxides  mean                                            mean, all elements
30   meanec_meanvhosoe_mp   melting point of oxides            mean value for the highest oxidation state      mean, all elements
31   meanec_moe_mp          melting point of oxides            mean                                            mean, all elements
32   meanec_difoe_mp        melting point of oxides            difference highest to lowest value              mean, all elements
33   meanec_nvoe_os         oxidation states of oxides         number                                          mean, all elements
34   sumec_nvoe_os          oxidation states of oxides         number                                          sum, all elements
35   meanbc_mie_cn          coordination number of ions        mean                                            mean, all metals and semi-metals
36   meanbc_difie_cn        coordination number of ions        difference highest to lowest value              mean, all metals and semi-metals
37   meanec_mie_cn          coordination number of ions        mean                                            mean, all elements
38   meanec_difie_cn        coordination number of ions        difference highest to lowest value              mean, all elements
39   meanbc_mie_icp         ionic covalency parameter of ions  mean                                            mean, all metals and semi-metals
40   meanbc_difie_icp       ionic covalency parameter of ions  difference highest to lowest value              mean, all metals and semi-metals
41   meanec_mie_icp         ionic covalency parameter of ions  mean                                            mean, all elements
42   meanec_difie_icp       ionic covalency parameter of ions  difference highest to lowest value              mean, all elements
43   meanbc_mie_ir          ionic radius of ions               mean                                            mean, all metals and semi-metals
44   meanbc_difie_ir        ionic radius of ions               difference highest to lowest value              mean, all metals and semi-metals
45   meanec_mie_ir          ionic radius of ions               mean                                            mean, all elements
46   meanec_difie_ir        ionic radius of ions               difference highest to lowest value              mean, all elements
47   meanbc_mie_l           optical basicity of ions           mean                                            mean, all metals and semi-metals
48   meanbc_difie_l         optical basicity of ions           difference highest to lowest value              mean, all metals and semi-metals
49   meanec_mie_l           optical basicity of ions           mean                                            mean, all elements
50   meanec_difie_l         optical basicity of ions           difference highest to lowest value              mean, all elements
51   meanec_maxvhosie_os    oxidation states of ions           maximum value for the highest oxidation state   mean, all elements
52   meanec_difie_os        oxidation states of ions           difference highest to lowest value              mean, all elements
53   sumec_sumie_os         oxidation states of ions           sum                                             sum, all elements
54   sumec_nvie_os          oxidation states of ions           number                                          sum, all elements
55   sumec_minvhosie_os     oxidation states of ions           minimum value for the highest oxidation state   sum, all elements
56   sumec_maxvhosie_os     oxidation states of ions           maximum value for the highest oxidation state   sum, all elements

Table 2. Products of the reaction clearly identified by gas chromatography.

C1   methane, formaldehyde, carbon monoxide, carbon dioxide
C2   ethane, acetaldehyde, acetic acid
C3   propene, allyl alcohol, propylene oxide, acetone, propionaldehyde, acrolein, acrylic acid
C4   butane, 2-methylpropene, 1-butene, 2-butene
C5   pentane, 2-methyl-1-butene, 2-methyl-2-butene
C6   hexane, cyclohexane, benzene

2.6 QPAR Model

The task here consists of modeling the correlation between the descriptors (56 + 19) and the
performance clusters, which represent typical catalytic behavior. In other words, we seek
classification models that enable us to assign a catalyst to a specific cluster. The model
quality is assessed by means of prediction-rate criteria. Two classification techniques were
used: Artificial Neural Networks (ANN) and Classification trees, as implemented in
Statistica 6.1.

In the search for appropriate ANN models, both Multi-Layer Perceptron (MLP) and Probabilistic
Neural Networks (PNN) were applied using the Intelligent Solver of Statistica. In order to
get robust models, we have enabled the solver to discard attributes during the screening of
the neural networks. In addition, after models were built, a pruning was performed for
discarding irrelevant variables. Pruning refers to a sensitivity analysis, i.e., a ranking of
variables based on their discriminative power. The dataset of 467 catalysts was randomly
divided into three subsets. The learning step was performed on one half of the whole dataset,
the verification step on one quarter, and the independent testing step on the remaining
quarter.

For the Classification tree models, we used the C&RT method with the Gini criterion as
splitting condition and FACT-style direct stopping as pruning rule (Statistica 6.1).
Cross-validation was carried out on one third of the dataset to ensure that models were not
prone to overlearning.
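
Two ingredients of this section lend themselves to a compact illustration: the random
division into learning, verification, and testing subsets, and the Gini criterion used to
choose a split in a classification tree. The Python sketch below is a simplified stand-in and
does not reproduce the Statistica 6.1 implementation; the function names, the seed, and the
rounding of subset sizes are assumptions, and the exact subset sizes used in the study may
differ.

```python
import random
from collections import Counter


def split_half_quarter_quarter(n_samples, seed=0):
    """Randomly divide sample indices into learning/verification/testing subsets."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)  # fixed seed only for reproducibility
    n_learn = n_samples // 2
    n_verify = n_samples // 4
    return (
        idx[:n_learn],
        idx[n_learn:n_learn + n_verify],
        idx[n_learn + n_verify:],  # remainder goes to the testing subset
    )


def gini(labels):
    """Gini impurity of a list of class labels (0 for a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_threshold(values, labels):
    """Best binary split of one continuous descriptor by weighted Gini impurity.

    Returns (threshold, impurity); threshold is None if no split improves
    on the impurity of the unsplit node.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate identical values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best
```

A full tree would apply `best_threshold` to every descriptor at every node and stop according
to a pruning rule; only the split criterion itself is shown here.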


2.7 Quality Assessment of QPAR Models

The assessment was performed by comparing the predictions made by the model with our
observations. The results are reported in a so-called confusion matrix. This table reveals
the number of correctly classified catalysts, how many catalysts were misclassified, and for
which classes the misclassification occurred. The prediction rate is derived from the
confusion matrix to estimate the quality of the prediction. It accounts for the correctly
classified cases in the respective predicted class and can be considered as a statistical
benchmark for the quality of the prediction, since it can directly be compared to the “ratio”
given in the confusion matrix. This ratio gives the statistical distribution probabilities
into the respective classes for the original dataset. Hence, a model becomes meaningful when
the prediction rates are higher than the corresponding ratios.

3 Results

3.1 Feature Selection – Descriptor Vector

In order to assess the relevance of the different attributes, the Relief algorithm was used,
which indicates the discriminative power of the attributes with respect to the different
classes of catalytic behavior. Among the 3119 attributes, 5 attributes describing the
synthesis procedure are by far the most relevant and 11 attributes related to the synthesis
are in the top 50. Therefore, all 19 synthesis attributes from attribute set X3 were
selected, as their influence on the catalytic performance was estimated to be very strong.
This also corresponds to experience in the field of catalysis, where it is known that the
synthesis method to create a catalyst is of paramount importance in determining the
performance. On the other hand, clear trends of the relevance of the continuous attributes
from the attribute set X2 can hardly be identified. Nevertheless, it was noticeable that
attributes calculated with simple operators such as the minimum of a set of values for a
given catalyst, the maximum, or the average have more discriminative power than other, more
complex operators such as standard deviation. Attributes generated by the complex operators
were thus discarded. In order to reduce the number of attributes, strongly correlated
attributes (e.g., melting point and boiling point of elements) and attributes based on
properties that were only accessible for few elements (for instance, heat capacity of element
oxides) were discarded. Finally, among the rest we decided to pick 56 attributes from the X2
attribute set according to their ranking, in order to map all defined properties. The list of
continuous descriptors is reported in Table 1. A PCA on the 56 continuous attributes was
carried out in order to get insights into the features of the dataset. The eigenvalue plot
shows that up to eight PCs are required to explain 75% of the variance, which indicates the
high dimensionality of the search space (Figure 4). In addition, the loading plots PC1 versus
PC2 and PC3 versus PC4 enable us to visualize the fairly good covering of the variable search
space for the attributes which contain most information (Figures 5a and 5b). From the
distribution study of the attributes one can extract that most of them follow a normal
distribution, as shown for the example of the attribute meanec_mie_l (#49) reported in
Figure 6a. This is in contrast with the attribute set X1, which accounts for the elemental
compositions (Figure 6b). Indeed, the box and whisker plot indicates that the median and
quartiles for most of the variables regarding the elemental compositions are equal to zero.
This results from the fact that a catalyst contains typically between three and eight
elements, which implies that all other elements are assigned the value 0. Therefore, in
addition to higher information content of the attributes, generating the X2 attribute set
enables us to obtain normally distributed attributes, which is beneficial with respect to the
QPAR modeling task. In conclusion, this ensemble of 56 X2 attributes plus the 19 attributes
from X3 forms the initial descriptor vector that is used as input for the QPAR model.

Figure 4. Eigenvalue plot and the corresponding percentage of variance for each principal
component.

Figure 5. Loading plots for the PCA analysis of the selected 56 continuous attributes from
the attribute set X2. Projection of all variables on the PC1/PC2 plane (a) and the PC3/PC4
plane (b).

3.2 Data Analysis and Clustering of Catalytic Performance

Catalytic performance for 467 very different catalytic materials in a reaction such as
propene oxidation is highly complex. Altogether 120 variables fully describe the catalytic
performance. As already explained in the methods section, various variables among the 120
have been grouped since they were obviously very strongly correlated, resulting in a reduced
dataset of 27. In order to make the dataset orthogonal for allowing proper clustering, a
subsequent PCA step was carried out. Taking the first eight PCs, 79.4% of the variance-based
information is retained. Most of the variables are represented above 75%, except the
variables accounting for the selectivities to formaldehyde and acetaldehyde (35 and 45%,
respectively). The first PCs are closely correlated with specific catalytic behavior. High
PC1 levels mean high conversion at low temperature and high CO2 selectivity, whereas low PC1
levels mean high CO selectivity. High PC2 values, on the other hand, are related to high
selectivity for alkane and alkene formation. Some obvious trends concerning the variables
conversion, CO, and CO2 at all temperatures can also be distinguished in the PC1–PC2 and
PC1–PC3 loading plots (Figures 7a and 7b). All conversion variables and all CO2 variables are
directly correlated with each other and both are inversely correlated with all CO variables.
It can also be observed that all other variables are independent (orthogonal) of propene
conversion, CO, and CO2. The variables benzene, Σ acids, Σ alkanes, Σ C4 hydrocarbons,
Σ C5 hydrocarbons, and Σ C6 hydrocarbons are strongly correlated and orthogonal to all
remaining variables.

Different clustering methods were applied to classify the catalytic behavior of the 467
catalysts. The clustering was performed on the PCA scores (coordinates of catalysts in the
space defined by the first 8 PC axes). The selection criteria were (1) to generate a minimum
number of classes

Figure 6. a) Typical distribution of normalized values shown for the example of the attribute
meanec_mie_l (#49). b) Box and whisker plot for the distribution of the elements in the
catalysts. Since most catalysts only contain few elements, most values for a given catalyst
are zero. Since most materials are oxides, only the median for oxygen differs substantially
from zero. Quartiles significantly differing from zero are only observed for silicon and
aluminium, which occur frequently in the supports.


Figure 7. Loading plots for the PCA analysis of the catalytic performance data. Projection of
all variables on the PC1/PC2 plane (a) and the PC1/PC3 plane (b).

Figure 8. Hierarchical tree plot for all 467 catalysts. Cutting between 100 and 200 generates
four well-separated clusters.

Figure 9. Number of cases in each cluster for k-means cluster analysis based on eight
principal components.

which represent the different obvious catalytic behavior (coverage), (2) to avoid that a
class is represented by fewer than 15 catalysts of the whole population (representativeness),
and (3) to get classes as distinct as possible. Additionally, the goal of the clustering was
to identify each cluster with distinct chemical behavior. With respect to these criteria, the
hierarchical clustering indicates that four distinct classes can be generated when cutting at
a linkage distance of 150 (Figure 8). On the other hand, k-means clustering enables us to
generate five distinct classes with a higher degree of discrimination. The distribution of
the 467 catalysts in the five clusters is shown in Figure 9. It reveals that four classes
encompass about 100 catalysts each, whereas the last class contains only 17 catalysts, which
represent about 4% of the whole dataset. The results of the k-means clustering are shown on
the PCA score plot (Figure 10). For further analysis, these five clusters were investigated,
since they fulfill all our requirements. In addition, it is possible to assign a specific
catalytic behavior to each class. The chemical significance of each of the clusters is:

cluster #1: low conversion, high selectivity to CO2,
cluster #2: medium conversion, high selectivity to CO2,
cluster #3: low conversion, high selectivity to CO, partial oxidation products,
cluster #4: low selectivity to (CO2 + CO), hydrocarbons, and
cluster #5: high conversion, high selectivity to CO2.

3.3 QPAR Model

After the selection of the 56 (X2) plus 19 (X3) attributes and the identification of five
distinct classes of catalytic behavior, a model was built in order to establish Quantitative
Property–Activity Relationships between the attributes and the classes. In addition, because
models can highlight the most discriminative attributes, a last selection of attrib-

QSAR Comb. Sci. 2005, 24  2005 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim 87
Full Papers D. Farrusseng et al.

Table 3. Contingency table for the ANN analysis, reporting the


predictions for the training (top), verification (middle) and test-
ing (bottom) datasets. ANN: Probabilistic Neural Network
(PNN) with 51 input nodes: 119 nodes in the first layer and
240 nodes in the second layer.
training 1 2 3 4 5 sum ratio pred. rate sensibility
1 predicted 54 0 2 1 0 57 0.24 0.95 9,83
2 predicted 1 69 1 0 2 73 0.30 0.95 0.96
3 predicted 3 1 67 0 0 61 0.25 0.93 0.95
4 predicted 0 0 0 7 0 7 0.03 1.00 0.86
5 predicted 0 2 0 0 40 42 0.18 0.95 0.95
sum/mean 68 72 80 8 42 240 0.96 0.93

verification 1 2 3 4 5 sum ratio pred. rate sensibility


1 predicted 14 2 5 2 2 25 0.27 0.56 0.45
2 predicted 10 18 1 0 7 36 0.26 0.50 0.62
3 predicted 7 2 22 0 0 31 0.25 0.71 0.79
4 predicted 0 0 0 1 1 2 0.03 0.50 0.33
5 predicted 0 7 0 0 18 20 0.20 0.65 0.57
sum/mean 31 29 28 3 23 114 0.59 0.55

test 1 2 3 4 5 sum ratio pred. rate sensibility


1 predicted 14 3 7 1 1 26 0.25 0.54 0.50
2 predicted 4 14 3 0 13 34 0.25 0.41 0.50
3 predicted 9 5 12 0 1 27 0.21 0.44 0.50
4 predicted 0 0 1 5 0 6 0.05 0.83 0.83
5 predicted 1 6 1 0 12 20 0.24 0.60 0.44
sum/mean 28 28 24 6 27 113 0.57 0.56

Figure 10. Score plots showing the results of k-means cluster


analysis based on eight principal components.
be pointed out that the PNN model enables good predic-
tions to be performed on cluster #4 when considering the
utes was performed, yielding the final descriptor that has very low number of catalysts used in the learning step
the most robust predictive power. Two different classifica- (eight and three catalysts for learning and verification, re-
tion tools were used to build the QPAR model, namely spectively). The lowest prediction performance is associat-
ANN and Classification tree techniques. ed with cluster #2. This results from the fact that catalysts
Both MLP and PNN neural networks with two hidden belonging to cluster #5 are misclassified in cluster #2 and
layers gave high overall prediction rates. The best PNN vice versa. These misclassifications originate from the jux-
network found is characterized by 51 nodes in the input taposition of the two clusters which can be seen in the
layer, and 119 and 240 in the first and second hidden lay- PCA plots. Chemically, the performance of the catalysts in
ers, respectively. For this model, the results of the predic- these two clusters is related (differing only with respect to
tions for the training, verification and testing datasets, re- the level of conversion), so that this misclassification is
spectively, are reported in the contingency table (Table 3). less severe than, for instance, misclassification into cluster
Values reported on the diagonal correspond to correct #3 or #4.
classifications while the other entries account for misclassi- Applying the pruning method, the number of attributes
fications. For example, in the quality assessment for the was reduced from 75 to 51 resulting in the generation of a
training dataset, 54 catalysts were correctly classified in descriptor vector with more condensed information con-
cluster #1, whereas one was misclassified in cluster #2, and tent. Similar results were obtained with MLP neural net-
3 misclassified in cluster #3. Prediction rates (number of works. The best architecture found consists of 45 nodes in
correctly classified catalysts divided by the number of all the input layers, and 144 and 27 in the first and second hid-
catalysts assigned to this class) have been calculated for den layers, respectively.
each catalyst class. The tables reveal very good learning The Classification tree analysis was based on a further
performance with prediction rates above 93% for all reduced attribute set of 42 attributes based on a weight
classes. The prediction rates for the verification and the analysis by ANN. A loss matrix had to be used to achieve
test datasets at about 57% indicate that over learning did discriminative models. A loss matrix consists of a square
not take place, i.e., the model can perform predictions matrix of coefficients (Table 4) multiplied by a vector of
with a high confidence level on “new catalysts”. It has to class probabilities to form a vector of cost estimates. The

diagonal of a loss matrix is always equal to zero, because a correct classification has zero cost. The value 1 indicates no adjusted preference, while all values > 1 induce a preference for certain classifications. With the loss matrix as set, it becomes more “costly” to misclassify cases from small clusters than from large clusters. In other words, using a loss matrix makes it possible to improve the prediction rates on smaller classes. This is desirable, because these small clusters contain the catalysts producing partial oxidation products or hydrocarbons, which are more valuable products than CO2 and CO. The search for the best combination of misclassification costs was carried out by trial and error to obtain high prediction rates in all five classes.

Table 4. Loss matrix for the Classification tree analysis, which indicates the “costs” for misclassification of catalysts in the respective classes.

Loss matrix      1      2      3      4      5
1-predicted      0      1      4      5      1
2-predicted      3      0      1.2    5      3
3-predicted      1.1    1      0      1.5    1
4-predicted      1      1      1      0      1
5-predicted      1      1.5    1.1    1.1    0

The best classification tree yielded a model based on only 23 attributes, 33 split nodes, and 34 terminal nodes (leaves). Because the large size of the classification tree results in a complex scheme, a complete graph representing the tree structure cannot be shown here. However, the first nodes and a few leaves of a simpler model are depicted in Figure 11 as an example. The bar chart on top of the tree presents the distribution of the catalysts in the five clusters as already shown in Figure 9. The whole dataset is first divided into two smaller datasets according to the most discriminative splitting rule, i.e., whether catalysts show values for maxec_nffefmsmo (#15) higher or lower than 153.1. Then, successive splitting conditions are used to generate nodes that are aimed at creating clusters which are clearly separated from each other. At the end of each branch, the terminal nodes contain the results of the predictions. For example, when all splitting conditions which define the terminal node * are fulfilled, the prediction rate is 84% with respect to cluster #1, whereas the initial probability of cluster #1 is 25% (node 1). The results of the terminal nodes are gathered by cluster prediction and then reported in the confusion matrix (Table 5), which shows the prediction rates for all catalyst classes and for the learning and cross-validation datasets, respectively.

Figure 11. Schematic representation of the first nodes and leaves of the classification tree. The numbers in the boxes and the bars represent the number of cases left in each class after application of the corresponding splitting rule.

Judging from the prediction performance, the learning has proceeded rather well, since the model can predict all five clusters satisfactorily, with an overall prediction rate of 0.68. In order to validate the model, a prediction test was carried out with independent catalysts (one third of the dataset). Here too, the prediction rates for each class are well above the distribution probabilities, although the prediction performance is significantly inferior to that for the training set. In addition, the good prediction for catalytic behavior #4 has to be pointed out. Indeed, the model allows us to predict, with a confidence rate of around 40%, that a catalyst would belong to the class of partial oxidation catalysts. In contrast,
with random selection, one would only have a 4% chance of identifying such a catalyst.

Table 5. Contingency table for the Classification tree analysis, which reports the predictions for the training and testing datasets.

training        1    2    3    4    5    sum   ratio   pred. rate   sensibility
1 predicted    49   14   13    0    4     80    0.25         0.61          0.64
2 predicted     5   58    0    0    7     70    0.28         0.83          0.67
3 predicted    16    1   54    1    1     73    0.24         0.74          0.74
4 predicted     6    5    5   10    1     27    0.04         0.37          0.91
5 predicted     1    8    1    0   49     59    0.20         0.83          0.79
sum/mean       77   86   73   11   62    309                 0.68          0.75

test            1    2    3    4    5    sum   ratio   pred. rate   sensibility
1 predicted    18    7   14    0    7     46    0.25         0.39          0.45
2 predicted     6   19   10    1    9     45    0.27         0.42          0.44
3 predicted    11    2   12    0    0     25    0.25         0.48          0.31
4 predicted     2    3    2    5    0     12    0.04         0.42          0.83
5 predicted     3   12    1    0   14     30    0.19         0.47          0.47
sum/mean       40   43   39    6   30    158                 0.44          0.50

From the tree structure and splitting conditions, 34 rules are derived, yielding an explicit model. A prediction rule corresponds to the ensemble of splitting conditions from the top node to a terminal node. The collection of all pathways to each terminal node in text form results in a “recipe” which, in contrast to ANN, enables us to predict the catalytic behavior of a new catalyst in a straightforward manner. As an example, some of the 34 rules are reported in Table 6.

Table 6. Examples of rules that allow the prediction of the performance cluster into which a catalyst falls. Explanation of symbols: see Table 1. Both sets of rules predominantly sort catalysts into cluster #1.
Cluster 1 2 3 4 5 1 2 3 4 5
Terminal node 14 Terminal node 26
Cases 10 3 0 0 0 13 0 3 0 0
Rule 1 maxec_nffefmsmo   153.1 maxec_nffefmsmo   153.1
Rule 2 meanbc_bseo  185.3 meanbc_bseo  185.3
Rule 3 meanec_ea  0.75 meanec_ea > 0.75
Rule 4 meanbc_difie_l  0.09 meanec_difie_l > 0.11
Rule 5 sumec_minvhosie_os  7

4 Discussion

In the course of the work on this project, several important issues were encountered, which shall be addressed in this section together with the discussion of the results and the wider implications this work may have.

4.1 Synthesis Coding

Appropriate encoding of the synthesis procedure, even in a simplified form, is a very difficult task if one deals with solid catalysts. On the other hand, the synthesis protocol is often as important as the chemical composition with respect to the catalytic performance. Great care therefore has to be taken to capture the essentials of the synthetic procedure.

In general, all solids have been synthesized in several steps. However, the number of steps is a matter of definition. For example, Cu/MCM-41 (MCM-41: mesoporous silica as discovered by scientists of Mobil Oil Corporation [20]) would intuitively be classified as a two-step reaction (preparation of the support, which is not commercially available and has to be synthesized, and subsequent impregnation), while Cu/SiO2 made by impregnation of a commercial SiO2 support would normally be assumed to be a one-step reaction. However, although the support was bought from a supplier, its synthesis, for instance via flame pyrolysis, should also be considered as a synthetic step, resulting, altogether, in a two-step reaction. For an ion-exchange reaction, it is also a matter of definition whether each exchange step counts as a reaction step or whether the whole exchange procedure is considered as a single step. This creates a problem in coding the synthesis procedure: the more steps that are defined and the more precisely each single step is described, the more entries are obtained that are equal to zero and therefore carry no information. This results from the fact that each catalyst has to be described with the same attributes in order to allow meaningful correlation in the model-building step. For example, assuming that each catalyst should be described by four synthesis steps and that each step consists of 15 parameters (altogether 60 parameters), the entry for a catalyst synthesized in a single step (for instance by precipitation) would contain at least 45 variables without any information. Hence, the goal was to find a good compromise between a precise description and a sufficiently simple coding, by either discarding or regrouping information to avoid the above-mentioned problem. For this study, it was decided to restrict the information to the last synthetic step, which was encoded by 19 different categorical attributes coding the type of the synthesis reaction (such as ion-exchange or impregnation), solvents, precursors, and the presence of supports and additives, such as chlorine, alkali metal, or others.

In a more advanced stage of this technology and on a broader data basis of catalysts, one can, and should, certainly encode the synthesis of the solids in more detail. However, on the basis of fewer than 500 catalysts, this did not seem to be appropriate.
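
The sparsity argument of this section can be made concrete with a minimal sketch. It pads every synthesis to a fixed number of steps and counts the entries that exist only because a step is absent. The numbers of steps and parameters (4 and 15) are those of the example in the text; the function names and the use of zero as the filler value are assumptions made for illustration and are not the coding actually used in this study.

```python
def padded_synthesis_vector(steps, max_steps=4, params_per_step=15):
    """Flatten per-step parameter lists into one fixed-length vector.

    Steps that do not exist are filled with zeros (an assumed filler),
    which is the source of the uninformative entries discussed above.
    """
    if len(steps) > max_steps:
        raise ValueError("more steps than the coding scheme allows")
    vector = []
    for step in steps:
        if len(step) != params_per_step:
            raise ValueError("each step needs exactly params_per_step values")
        vector.extend(step)
    # Pad the unused steps with zeros.
    vector.extend([0] * (params_per_step * (max_steps - len(steps))))
    return vector


def count_padding(n_steps, max_steps=4, params_per_step=15):
    """Number of entries that are zero only because a step does not exist."""
    return params_per_step * (max_steps - n_steps)


# Hypothetical one-step synthesis (e.g. a precipitation) with 15 parameters.
one_step = [[1] * 15]
vector = padded_synthesis_vector(one_step)
print(len(vector))       # 60
print(count_padding(1))  # 45
```

The count of 45 is a lower bound, since some of the 15 parameters of the single real step may be zero as well. Restricting the coding to the last synthetic step, as done here, corresponds to `max_steps=1` and removes the padding altogether.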

4.2 Design of Relevant Descriptors


Software-based methods alone were not sufficient to reduce the more than 3000 attributes to the significant ones that could be used to construct suitable descriptor vectors. Chemical knowledge and intuition were thus used to design possibly useful elements of the descriptor vector. This shall be illustrated with some examples. Two properties were introduced to account for the stability of oxides and their ability to change oxidation states. These properties were assumed to be correlated with the availability of oxygen atoms for an oxidation reaction, such as propene oxidation. A normalized formation free enthalpy of the element oxides was introduced (calculated by dividing the formation free enthalpy by the number of metal atoms in the formula unit). The oxide with the lowest value of all possible oxides in the catalyst is referred to as the most stable oxide. Additionally, the difference between the most stable oxide and the second-most stable oxide was taken into account. As another attribute based on element oxide characteristics, the sum of all accessible oxidation states in oxides of all elements in the catalyst was expected to correlate with the reduction and oxidation properties of the catalyst. Intuitively, one would expect that oxidation and reduction steps would be easier if many oxidation states are accessible under ambient conditions.

Figure 12. Prediction rates for each cluster in comparison to the statistical probability of a catalyst to belong to the cluster. a) ANN of the MLP type based on 45 attributes. b) Decision tree C&RT based on 23 attributes.
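
As a minimal sketch, the two oxide-stability attributes described in this section might be computed for one element as follows. The oxide labels and enthalpy values are hypothetical placeholders rather than tabulated data, and the function names are illustrative assumptions; the sketch only mirrors the normalization and ranking steps stated in the text.

```python
def normalized_formation_enthalpy(delta_g, n_metal_atoms):
    """Formation free enthalpy divided by the metal atoms per formula unit."""
    return delta_g / n_metal_atoms


def oxide_stability_attributes(oxides):
    """Return (most stable oxide, its normalized value, gap to the next one).

    `oxides` maps a label to (formation free enthalpy, metal atoms per
    formula unit). The most stable oxide has the lowest normalized value.
    """
    if len(oxides) < 2:
        raise ValueError("at least two oxides are needed to compute the gap")
    normalized = {
        name: normalized_formation_enthalpy(dg, n)
        for name, (dg, n) in oxides.items()
    }
    ranked = sorted(normalized.items(), key=lambda item: item[1])
    (best_name, best_value), (_, second_value) = ranked[0], ranked[1]
    return best_name, best_value, second_value - best_value


# Hypothetical oxides of a metal M; the values are placeholders only.
oxides_of_m = {
    "MO": (-250.0, 1),
    "M2O3": (-740.0, 2),
    "M3O4": (-1020.0, 3),
}
name, value, gap = oxide_stability_attributes(oxides_of_m)
print(name, value, gap)  # M2O3 -370.0 30.0
```

With these placeholder numbers, M2O3 is the most stable oxide per metal atom, and the gap of 30.0 to M3O4 plays the role of the difference between the most stable and the second-most stable oxide.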

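The quantities compared in Figure 12 can be recomputed from a contingency table. The sketch below uses the test-set counts of Table 5 (rows: predicted class, columns: observed class) and derives, for each class, the prediction rate, the sensibility, and the initial probability of the class. The function and variable names are illustrative assumptions.

```python
# Test-set counts from Table 5: rows = predicted class, columns = observed class.
TABLE5_TEST = [
    [18, 7, 14, 0, 7],
    [6, 19, 10, 1, 9],
    [11, 2, 12, 0, 0],
    [2, 3, 2, 5, 0],
    [3, 12, 1, 0, 14],
]


def class_metrics(table):
    """Return per-class (prediction rate, sensibility, initial probability)."""
    total = sum(sum(row) for row in table)
    metrics = []
    for k in range(len(table)):
        predicted_as_k = sum(table[k])             # row sum
        observed_k = sum(row[k] for row in table)  # column sum
        correct = table[k][k]                      # diagonal entry
        metrics.append((
            correct / predicted_as_k,  # prediction rate
            correct / observed_k,      # sensibility
            observed_k / total,        # initial probability of the class
        ))
    return metrics


for k, (rate, sens, prior) in enumerate(class_metrics(TABLE5_TEST), start=1):
    print(f"cluster #{k}: pred. rate {rate:.2f}, "
          f"sensibility {sens:.2f}, initial probability {prior:.2f}")
```

For cluster #4, this gives a prediction rate of about 0.42 against an initial probability of about 0.04, which is the kind of gap between model and random selection that the bar charts visualize.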
4.3 QPAR Models


The comparison of prediction performance based on the whole dataset for ANN and Classification tree models is shown in Figure 12. The prediction rates are compared to the initial probability for a catalyst to belong to a given cluster. The gap between the prediction rate and the respective initial probability reveals how well the model can predict performance. From the bar chart it is obvious that classification trees performed less well than ANN. On the other hand, the simplicity and readability of Classification tree models is a major advantage, since much information can be gained on the most relevant descriptors and guidelines, as will be discussed in the following section. It should also be pointed out that many more catalysts should be screened to allow a fully reliable comparison between the two modeling techniques, especially since a significant part of the different datasets was reserved either for verification or for cross-validation.

When complex catalytic reactions are analyzed, such as the partial oxidation of hydrocarbons, it is obvious that a very limited number of catalyst types will have the desired performance, compared to the infinite number of potential candidates. From a statistical point of view, it follows that the most interesting catalytic behavior corresponds to the class that contains the smallest number of catalysts. This is an issue for the modeling task, because the learning on limited numbers of cases cannot be optimal, and because the construction of the models is usually biased by the distribution of the classes, i.e., models tend to learn better on larger classes than on smaller ones. For the latter problem, the bias can be corrected by using misclassification-cost schemes that allow the learning on targeted classes to be forced. The design of the cost matrix is not straightforward and is usually carried out by trial-and-error processes.

4.4 Descriptor Vector

The goal of this study was the evaluation of a descriptor vector that consists of relevant attributes. This descriptor vector should allow the correlation of properties of solid catalysts with the respective performance of the catalysts, and would thereby enable the design of libraries suitable for HT testing. On the other hand, from the inspection of classification rules or attributes selected for the descriptor vector by ANN, one may learn more about
the decisive factors that influence the catalytic behavior of solids in certain catalytic reactions.

For the different types of ANN that were tested in this study, certain attributes were consistently selected as input variables by many ANN. If one analyzes these attributes, one can identify certain trends. Considering the attributes related to properties of the elements, variables based on the atomic radius (ar, #2, 3), the electron affinity (ea, #8, 9, 10), the normalized formation free enthalpy of the most stable metal oxide (nffefmsmo, #14, 15), and the smallest energy difference between the most stable metal oxide and another metal oxide (sedmsmoom, #16, 17) seemed to be of major importance. The number of elements in a catalyst (nvec, #24) was also significant. When examining the attributes related to the element oxides, only two attributes based on the melting point (mp, #31, 32) seemed to have any significance. When analyzing the attributes related to element ions, the ionic radius (ir), coordination number (cn, #35–38), and ionic covalent parameter (icp, #39–42) seemed to be the most important variables. The list of synthesis attributes contains the highest number of attributes (eight) that were selected as input variables in all networks. This is reasonable from a chemical point of view, as synthesis parameters refer directly to the experimentally tested solid catalyst.

All types of analysis revealed that the stability of oxides has a strong impact on the performance of catalysts in the oxidation of propene. This is a result that chemical intuition would also have given. However, in the framework of this study this conclusion was reached without additional interference by a chemist, and one could therefore, with some justification, say that the methodology used in this investigation has implemented chemical intuition on a basic level in a software program.

5 Conclusions

We have described the implementation of a methodology that is the basis for a virtual screening of complex solids with respect to their catalytic properties. These solids are generated at random from available elements via a set of different synthetic procedures. Screening in silico then allows one to identify those samples which should be experimentally investigated. Since the coding is set up in a manner that corresponds relatively closely to a synthetic procedure, there is a high probability that the suggested samples can indeed be synthesized.

This approach could be very valuable, especially in reactions where no good lead is available as yet, since for such reactions thousands of samples may have to be tested before any activity is discovered at all. Prescreening to reduce the number of tests to be performed is therefore almost mandatory. Moreover, if descriptors with high predictive power are discovered, one may be able to extract information on the relevant properties that a catalyst should have for a specific reaction.

At this point it is not clear just how general descriptor vectors for catalytic reactions will be, i.e., whether a descriptor for propene oxidation catalysts, such as developed in this study, will also be valid for other alkenes or even for hydrocarbon oxidation reactions. The establishment of a much broader database is necessary to verify and further develop the descriptor concepts for catalysts. It is expected that there will be an intimate interplay between the refinement of descriptors and the testing of new solids in catalytic reactions. The broader the experimental database becomes, the more discriminative will be the descriptors developed on this basis, which in turn allows more focused catalytic testing.

In addition, the concept seems to be more broadly applicable. Any field in which the correlation between properties of solids and performance in a given application is complex and development is, to a large extent, empirically governed, may benefit from the possibility of virtual screening as the first stage of a high-throughput program. It will be interesting to see how fast these methods will find their way into the laboratories active in this field.

Acknowledgements

We thank the Marie Curie Fellowship Association and Région Rhône-Alpes (Programme EuroDoc) for having supported the students' mobility and training. In addition, we would like to thank the Leibniz program of the DFG and the FCI, who provided funding in addition to the basic funding by the Max-Planck-Gesellschaft and the CNRS.

References

[1] I. E. Maxwell, Nature 1998, 394, 325.
[2] B. Jandeleit, D. J. Schaefer, T. S. Powers, H. W. Turner, W. H. Weinberg, Angew. Chem. Int. Ed. 1999, 38, 2494.
[3] W. F. Maier, Angew. Chem. Int. Ed. 1999, 38, 1216.
[4] S. Senkan, Angew. Chem. Int. Ed. 2001, 40, 312.
[5] Y. Yamada, T. Kobayashi, Chem. Sens. 1999, 15, 100.
[6] J. M. Newsam, F. Schüth, Biotechnol. Bioeng. 1999, 61, 203.
[7] C. Klanner, D. Farrusseng, L. Baumes, C. Mirodatos, F. Schüth, QSAR Comb. Sci. 2003, 22, 729.
[8] D. Wolf, O. V. Buyevskaya, M. Baerns, Appl. Catal., A: Gen. 2000, 200, 63.
[9] U. Rodemerck, M. Baerns, M. Holena, D. Wolf, Appl. Surf. Sci. 2004, 223, 168.
[10] A. Corma, J. M. Serra, A. Chica, in Principles and Methods for Accelerated Catalyst Design and Testing (Eds.: E. G. Derouane, V. Parmon, F. Lemos, F. R. Ribeiro), Kluwer, Dordrecht, The Netherlands 2002, p. 153.
[11] J. M. Serra, A. Corma, E. Argente, S. Valero, V. Botti, Appl. Catal., A: Gen. 2003, 254, 133.
[12] J. M. Serra, A. Corma, D. Farrusseng, L. Baumes, C. Mirodatos, C. Flego, C. Perego, Catal. Today 2003, 82, 67.
[13] D. Farrusseng, L. Baumes, C. Mirodatos, in High-Throughput Analysis: A Tool For Combinatorial Materials Science (Eds.: R. A. Potyrailo, E. J. Amis), Kluwer Academic/Plenum Publishers, New York 2004, p. 551.
[14] J. Cawse, Experimental Design for Combinatorial and High Throughput Materials Development, John Wiley & Sons, Weinheim, Germany 2002.
[15] C. Klanner, D. Farrusseng, L. Baumes, M. Lengliz, C. Mirodatos, F. Schüth, Angew. Chem. Int. Ed. 2004, 43, 5347.
[16] M. Schwickardi, T. Johann, W. Schmidt, F. Schüth, Chem. Mater. 2002, 14, 3913.
[17] For example: Handbook of Chemistry and Physics, 77th Edition (Eds.: D. R. Lide, H. P. R. Frederikse), CRC Press, Boca Raton 1996–1997.
[18] K. Kira, L. Rendell, A practical approach to feature selection, in Proceedings of the 9th International Conference on Machine Learning (Aberdeen, Scotland, July 1992) (Eds.: D. Sleeman, P. Edwards), Morgan Kaufmann 1992, pp. 249–256.
[19] C. Hoffmann, A. Wolf, F. Schüth, Angew. Chem. Int. Ed. 1999, 38, 2800.
[20] a) C. T. Kresge, M. E. Leonowicz, W. J. Roth, J. C. Vartuli, J. S. Beck, Nature 1992, 359, 710.
