
Chem. Mater. 2006, 18, 3287-3296

A New Mapping/Exploration Approach for HT Synthesis of Zeolites


Avelino Corma,* Manuel Moliner, Jose M. Serra, Pedro Serna, María J. Díaz-Cabañas, and
Laurent A. Baumes
Instituto de Tecnología Química, UPV-CSIC, Universidad Politécnica de Valencia,
Avda. de los Naranjos s/n, 46022 Valencia, Spain
Received March 15, 2006. Revised Manuscript Received May 4, 2006

This work shows a methodology for the synthesis of self-assembled organic-inorganic materials which
integrates high-throughput tools for the synthesis and characterization of solid materials and data-mining
techniques in materials science. This is illustrated by a detailed exploration of the hydrothermal synthesis
in the system SiO2:GeO2:Al2O3:F-:H2O:N(16) methylsparteinium. Data analysis and dimensional reduction
were conducted by using principal components analysis and clustering algorithms, allowing the definition
of a new and suitable structural vector which summarizes the X-ray diffraction characterization data as
well as an improvement of data visualization and interpretation. Different modeling techniques were
applied for the prediction of the properties of the materials considering the synthesis descriptors as input
of the model. Furthermore, different “material property” descriptors were considered as outcome of the
model, that is, the crystallinity of the formed phases, structural principal components computed by principal
component analysis, or clustering results. It was found that the final properties of the materials could be
successfully modeled using artificial neural networks and decision trees.

1. Introduction

The application of combinatorial and high-throughput (HT) techniques to materials science can help chemists to increase the number of variables of a given process that can be studied in a reasonable time period as well as to increase the number of samples produced and characterized (refs 1-3). Moreover, data mining and database technology are applied for the analysis and modeling of the large amounts of data generated, allowing in turn a speeding up of the discovery and optimization process while establishing scientific principles.

In recent years, the usefulness of HT methods has been proven for the discovery of solid functional materials (refs 4-8). Indeed, these methods allow the simultaneous study of numerous synthesis and processing variables, this being especially important when dealing with highly nonlinear and multidimensional systems as is the case for the synthesis of microporous molecular sieve systems.

The hydrothermal crystallization processes of microporous materials are governed by a large number of parameters which determine the phases formed and the crystallization kinetics. Despite the notable efforts made to rationalize the synthesis of zeolites (refs 9-12), the relationship between synthesis variables and the zeolitic structure formed is not clearly understood, because of the metastable nature of zeolites and the complexity of the involved synthesis mechanisms. As a result of this, the discovery of new microporous materials is still predominantly an empirical process, though strongly helped by accumulated experience. High-throughput methods should be useful in this field (refs 13-17) to determine the effect of different synthesis parameters and to help in the discovery of new zeolites.

Very recently, a new zeolite, named ITQ-21, containing Si, Ge, and optionally Al as framework cations was reported (ref 18). This material presents a unique pore topology formed by nearly spherical large cavities of 1.18 nm diameter joined to six other neighbored cavities by circular 12-ring pore windows with an aperture of 0.74 nm, which results in a three-directional channel system of fully interconnected

* To whom correspondence should be addressed. Tel.: 34(96)3877800. Fax: 34(96)3877809. E-mail: acorma@itq.upv.es.
(1) Combinatorial Materials Science; Xiang, X. D., Takeuchi, I., Eds.; Dekker: New York, 2003.
(2) Koinuma, H.; Takeuchi, I. Nat. Mater. 2004, 3, 429-438.
(3) Hanak, J. J. Appl. Surf. Sci. 2004, 223, 1-8.
(4) Gorer, A. U.S. Patent 6.723.678, 2004, to Symyx Technologies Inc.
(5) Sohn, K. S.; Seo, S. Y.; Park, H. D. Electrochem. Solid State Lett. 2001, 4, H26-H29.
(6) Boussie, T. R.; Diamond, G. M.; Goh, C.; Hall, K. A.; LaPointe, A. M.; Cheryl Lund, M. L.; Murphy, V.; Shoemaker, J. A. W.; Tracht, U.; Turner, H.; Zhang, J.; Uno, T.; Rosen, R. K.; Stevens, J. C. J. Am. Chem. Soc. 2003, 125, 4306-4317.
(7) Corma, A.; Serra, J. M.; Serna, P.; Argente, E.; Valero, S.; Botti, V. J. Catal. 2005, 229, 513-524.
(8) Klanner, C.; Farrusseng, D.; Baumes, L. A.; Mirodatos, C.; Schuth, F. Angew. Chem., Int. Ed. 2004, 43 (40), 5347-5349.
(9) Piccione, P. M.; Yang, S.; Navrotsky, A.; Davis, M. E. J. Phys Chem. B 2002, 106, 3629.
(10) Corma, A.; Davis, M. E. ChemPhysChem. 2004, 5 (3), 304-313.
(11) Schüth, F.; Schmidt, W. Adv. Eng. Mater. 2002, 4 (5), 269-279.
(12) Rajagopalan, A.; Suh, C.; Li, X.; Rajan, K. Appl. Catal., A 2003, 254, 147-160.
(13) Akporiaye, D. E.; Dahl, I. M.; Karlsson, A.; Wendelbo, R. Angew. Chem., Int. Ed. 1998, 37 (5), 609-611.
(14) Holmgren, J.; Bem, D.; Bricker, M.; Gillespie, R.; Lewis, G.; Akporiaye, D.; Dahl, I.; Karlsson, A.; Plassen, M.; Wendelbo, R. Stud. Surf. Sci. Catal. 2001, 135, 461-470.
(15) Bricker, M. L.; Sachtler, J. W. A.; Gillespie, R. D.; McGoneral, C. P.; Vega, H.; Bem, D. S.; Holmgren, J. S. Appl. Surf. Sci. 2004, 223 (1-3), 109-117.
(16) Pescarmona, P. P.; Rops, J. J. T.; van der Waal, J. C.; Jansen, J. C.; Maschmeyer, T. J. Mol. Chem. A 2002, 182-183, 319-325.
(17) Klein, J.; Lehmann, C. W.; Schmidt, H. W.; Maier, W. F. Angew. Chem., Int. Ed. 1999, 38, 3369.
(18) Corma, A.; Díaz-Cabañas, M. J.; Martínez-Triguero, J.; Rey, F.; Rius, J. Nature 2002, 418, 514-517.

10.1021/cm060620k CCC: $33.50 © 2006 American Chemical Society


Published on Web 06/20/2006
3288 Chem. Mater., Vol. 18, No. 14, 2006 Corma et al.

large cavities. This zeolite was synthesized using a large and rigid structure-directing agent, N(16)-methylsparteinium (MSPT), and the directing effect of Ge toward the formation of structures containing double four rings seems decisive for the synthesis of ITQ-21 (ref 19). Zeolite ITQ-30 (ref 20) is a new structure of the MWW family, which is more closely related to MCM-56 (ref 21) but with clearly different X-ray diffraction (XRD) features. The thermal and hydrothermal stability of zeolites increases as the germanium content decreases. Furthermore, it is important for catalytic applications to find out the synthesis conditions in which fully crystalline samples of ITQ-21 could be obtained with the lowest amount (or none) of Ge and the highest acidity [determined by the (Si + Ge)/Al ratio].

Classical designs of experiments (DoE) (ref 22), like factorial or combination designs, have been applied successfully when exploring the synthesis gel conditions aimed at the discovery of new zeolites or the optimization of existing ones (refs 23-25). It is clear that the synthesis variables should be carefully selected in order to cover the largest part of the most promising parameter space, while keeping the total number of experiments at a reasonable and feasible level. Moreover, the HT methods currently applied for parallel hydrothermal synthesis strongly constrain how the synthesis parameters can be experimentally studied. For instance, when using autoclave arrays (multiautoclaves with 15-96 wells), the intensive exploration of crystallization temperature and time is restricted. Therefore, DoE strategies should be developed which consider the specific aspects of HT methods in this field, while minimizing the number of experiments. On the basis of the data analysis/mining methodology applied in this work, we propose a new mapping/exploration approach for reducing the screening of low-promise conditions, within the multivariate synthesis spaces found in microporous systems.

2. Experimental Section and the Design of Experiments

A detailed exploration of the hydrothermal synthesis in system SiO2:GeO2:Al2O3:F-:H2O:MSPT has been performed, to understand the influence of these factors on the growth of ITQ-21 and ITQ-30, at 175 °C under static conditions. Parallel syntheses were developed using a robotic system and 15-fold Teflon-lined stainless steel autoclaves for the crystallization (ref 25). Crystallinity was measured by means of XRD, using a multisample Philips X'Pert diffractometer employing Cu Kα radiation. A factorial experimental design (4·3²·2² = 144) was selected for studying simultaneously the concentrations of the components in the starting gel, that is, Al/(Si + Ge), MSPT/(Si + Ge), F-/(Si + Ge), and Si/Ge molar ratios, as well as the crystallization time. Table 1 shows the values and levels considered for the different variables. For experimental details, see the Supporting Information.

Different data-mining techniques have been applied to extract knowledge about the relationships between synthesis conditions and the occurrence of different zeolite phases, minimizing the human participation in the analysis of the great amount of data generated. Furthermore, the advantages of data-mining techniques when processing, visualizing, and interpreting this type of nonlinear data have been shown. In this sense, three issues are key in our methodology: (i) the analysis and extraction of knowledge (i.e., Pareto analysis and data visualization techniques), (ii) a reduction of the complexity/dimensionality of the problem, minimizing the information loss (i.e., clustering analysis and principal component analysis, PCA), and (iii) modeling, enabling one to make a priori predictions (i.e., classification trees and neural networks, NNs). Moreover, this approach combining diverse data-mining techniques has been shown as a realistic way of statistically treating data from materials science. Finally, we have used the NN model based on ITQ-21 crystallinity to minimize the germanium content present in the final structure, to increase its thermal stability, while maintaining high crystallinity. More details for data-mining techniques are described in the Supporting Information.

Table 1. Levels and Ranges of Synthesis Factors Employed in the Experimental Design

                                      variation ranges
factor          number of levels   level 1   level 2   level 3   level 4
time (days)     2                  1         5
Si/Ge           4                  15        20        25        50
Al/(Si + Ge)    3                  0.02      0.04      0.067
MSPT/(Si + Ge)  2                  0.25      0.5
F/(Si + Ge)     2                  0.25      0.5
H2O/(Si + Ge)   3                  2         5         10

Figure 1. Phase diagram showing the occurring materials as a function of the five synthesis variables (starting gel molar ratios and crystallization time).

3. Results and Discussion

3.1. Screening Results: Phase Diagram. Figure 1 shows the phase diagram obtained following the factorial design

(19) Blasco, T.; Corma, A.; Díaz-Cabañas, M. J.; Rey, F.; Rius, J.; Sastre, G.; Vidal-Moya, J. A. J. Am. Chem. Soc. 2004, 126, 13414-13423.
(20) Corma, A.; Díaz-Cabañas, M. J.; Moliner, M.; Martínez, C. Discovery of a new catalytically active and selective zeolite (ITQ-30) by high-throughput synthesis techniques. J. Catal. in press.
(21) Fung, A. S.; Lawton, S. L.; Roth, W. J. U.S. Patent 5 362 697, 1994, to Mobil Oil Corp.
(22) Montgomery, D. C. Design and Analysis of Experiments, 4th ed.; John Wiley & Sons Inc.: New York, 1997.
(23) Tagliabue, M.; Carluccio, L. C.; Ghisletti, D.; Perego, C. Catal. Today 2003, 81, 405-412.
(24) Holmgren, J.; Bem, D.; Bricker, M. L.; Gillespie, R. D.; Lewis, G.; Akporiaye, D.; Dahl, I.; Karlsson, A.; Plassen, M.; Wendelbo, R. Proceedings of the 13th International Zeolite Conference; Montpellier, France, July 8-13, 2001; Galarneau, A., Di Renzo, F., Fajula, F., Vedrine, J., Eds.; Stud. Surf. Sci. Catal. 2001, 135, 461.
(25) Moliner, M.; Serra, J. M.; Corma, A.; Argente, E.; Valero, S.; Botti, V. Microporous Mesoporous Mater. 2005, 78, 73-81.
(26) Lobo, R. F.; Davis, M. E. Microporous Mater. 1994, 3, 61.

Figure 2. XRD patterns of ITQ-21 and ITQ-30.

Figure 3. Standardized Pareto chart for ITQ-21 and ITQ-30 formation, showing the effect of the different synthesis factors on the crystallinity of each
zeolite. The length of each bar displayed in the frequency histogram is proportional to the absolute value of its associated estimated effect.

described above. ITQ-21, ITQ-30, and amorphous material were obtained in the explored space. The standard X-ray diffractograms for each crystalline phase are shown in Figure 2. Automatic calculation of the occurrence and crystallinity was done by integrating the area of the characteristic peaks for each phase and referring this to the fully crystalline materials. For ITQ-21, the integrated area spans the 2θ range between 25.4 and 27.2°, and for ITQ-30, the range is between 24.6 and 25.4°. Because ITQ-30 also presents diffraction peaks in the 25.4-27.2° region, the percentage of ITQ-30 is subtracted considering the crystallinity measured from the peak located at 25.0°. Considering the crystallinity of the synthesized materials, three different groups have been created. A material is qualified as "amorphous" if both the ITQ-21 and ITQ-30 crystallinities are below 20%. "ITQ-21" is defined as a material for which the ITQ-21 crystallinity is higher than 20% and ITQ-30 below 20%. If the ITQ-30 crystallinity is greater than 20%, the material is noted as "ITQ-30".

A first approach using Pareto analysis shows in Figure 3 the relative influence of each synthesis factor over the crystallinity of ITQ-21 and ITQ-30 samples. In this chart, the length of each bar is the estimated effect divided by its standard error, which is equivalent to computing a t statistic for each effect. Bars which extend beyond the vertical line on the plot correspond to effects that are statistically significant at the 95% confidence level. This statistical way of understanding the results allows quantification of the hypothetical weight of the factors in the growth of materials. Both ITQ-21 and ITQ-30 seem to be quite influenced in a negative sense by water and aluminum content; that is, the more water or the higher Al/(Si + Ge), the less crystalline are the samples. Next, MSPT/(Si + Ge) and F/(Si + Ge) play a positive role in the formation of ITQ-21 and ITQ-30. However, some important differences can be observed when comparing the analyses for ITQ-21 and ITQ-30. On one hand, the relative importance of MSPT/(Si + Ge) and F/(Si + Ge) is higher for ITQ-30, because only in a few small zones can this material be obtained with the minimum content of MSPT/(Si + Ge) and F/(Si + Ge). On the other hand, Si/Ge appears as an important negative factor for ITQ-21 samples, while it becomes slightly positive for ITQ-30 samples. This result has to be understood as a penalization for the growth of ITQ-21 when increasing the Si/Ge ratio, because the crystallinity decreases but also some syntheses change to ITQ-30. The same reasoning explains the slight benefit of Si/Ge for ITQ-30, taking into account a balance between the loss of crystallinity and the appearance of new ITQ-30 points. However, ITQ-21 samples appear with a lower Si/Ge content. Finally, the relative influence of time for these materials is quite different, being much more important in the case of ITQ-30 than in that of ITQ-21. This effect of time could be understood as a retransformation process of ITQ-21, in such a way that ITQ-30 can only be obtained in 1 day when working with the maximum levels of MSPT/(Si + Ge) and F/(Si + Ge) and the minimum level of Al/(Si + Ge).

3.2. Analysis and Knowledge Extraction from HT Experimental Data. In this section, different techniques of unsupervised analysis will be applied to the original data set derived from the XRD characterization of the whole set of samples, allowing an improvement in data visualization, classification, and the subsequent knowledge extraction. Indeed, structural vectors will be computed from the raw characterization data by means of dimensional reduction and analysis techniques, that is, clustering algorithms and PCA.
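As a rough illustration of the screening step, the sketch below enumerates a full factorial design from the levels in Table 1 and applies the 20% crystallinity rule of Section 3.1 to label a sample. Two points are assumptions rather than statements from the text: Table 1 lists six factors, whose full crossing would give 288 runs, so MSPT/(Si + Ge) and F/(Si + Ge) are treated here as one coupled two-level factor to reproduce 4·3²·2² = 144; and a crystallinity of exactly 20% is treated as "below" the threshold, a case the text leaves open. The names `LEVELS`, `full_factorial`, and `label_phase` are hypothetical.

```python
from itertools import product

# Levels copied from Table 1. Coupling MSPT and F into a single factor is an
# ASSUMPTION made only so that the run count matches 4 * 3^2 * 2^2 = 144.
LEVELS = {
    "Si/Ge": [15, 20, 25, 50],
    "Al/(Si+Ge)": [0.02, 0.04, 0.067],
    "H2O/(Si+Ge)": [2, 5, 10],
    "MSPT=F/(Si+Ge)": [0.25, 0.5],
    "time_days": [1, 5],
}


def full_factorial(levels):
    """Return every combination of factor levels as a list of dicts."""
    names = list(levels)
    return [dict(zip(names, combo)) for combo in product(*levels.values())]


def label_phase(itq21_pct, itq30_pct, threshold=20.0):
    """Label a sample from its two crystallinities (in percent).

    ITQ-30 above the threshold wins; otherwise ITQ-21 above the threshold;
    otherwise the sample is called amorphous. Values equal to the threshold
    count as below it (assumption: the text does not cover this case).
    """
    if itq30_pct > threshold:
        return "ITQ-30"
    if itq21_pct > threshold:
        return "ITQ-21"
    return "amorphous"


runs = full_factorial(LEVELS)
print(len(runs))  # 144
```

Keeping the design as a list of dictionaries makes it easy to attach the measured crystallinities and the resulting label to each run later on.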

Figure 4. Tree diagram (dendrogram) showing the Euclidean distances between the different clusters and subclusters.

Clustering analyses of raw XRD data allow classification of the as-synthesized samples into different structural groups without applying any previous knowledge. That can be of interest when the resulting materials contain mixtures of phases or unknown phases, where the conventional phase identification systems find difficulties. Moreover, this type of data classification allows the achievement of high degrees of automation in the high-throughput experimental workflow.
3.2.A. Clustering Analysis. The k-means clustering algorithm examines each sample from the population and assigns it to one of the clusters, trying to minimize the intraclass variance and maximize the interclass variance. The centroid of one cluster is iteratively computed when a new component is added to the cluster, this process being repeated until all of the components are grouped into the selected number of clusters. This methodology is sensitive to the initialization of the centroids: depending on the first randomly chosen centroids, the final solution can change considerably. Therefore, numerous assignments have been performed in order to get a stable and representative solution.
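The sensitivity to initialization, and the remedy of repeating the assignment many times, can be sketched as follows. This is a minimal pure-Python illustration and not the authors' implementation: it uses the batch (Lloyd) form of k-means, in which centroids are recomputed after all samples are assigned, rather than the incremental update described above, and it keeps the restart with the lowest within-cluster sum of squares. The function names and the default of 20 restarts are assumptions.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def kmeans_once(data, k, rng, max_iter=100):
    """One k-means run from k randomly chosen samples as initial centroids."""
    centroids = [list(p) for p in rng.sample(data, k)]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in data
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(data, labels) if lab == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = mean(members)
    inertia = sum(sq_dist(p, centroids[lab]) for p, lab in zip(data, labels))
    return labels, centroids, inertia


def kmeans(data, k, n_restarts=20, seed=0):
    """Repeat k-means from different random starts and keep the best run."""
    rng = random.Random(seed)
    runs = (kmeans_once(data, k, rng) for _ in range(n_restarts))
    return min(runs, key=lambda run: run[2])
```

For XRD data, each element of `data` would be one diffractogram (a list of intensities), and `k = 3` would correspond to the three groups used in the text.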
A first data set constituted by the XRD data of each sample has been taken into account for the clustering analysis. This involves vectors with 800 attributes, corresponding to the intensities obtained for each diffraction angle of the 144 samples. The number of clusters chosen to perform the later analysis was investigated by means of a tree diagram (called a dendrogram), using Ward's clustering method (see the Clustering Analysis section in the Supporting Information). In this tree diagram (Figure 4), the different groups of samples are plotted as a function of the relative diversity of each group (linkage distance). This classification analysis shows that two big clusters can be clearly recognized, corresponding to amorphous and crystalline materials, whereas the last cluster can be split into two new groups, corresponding to ITQ-21 and ITQ-30 samples. More specific subclusters can be related to slight differences in the XRD diffractograms for a given structure, because of changes in their crystallinity or germanium contents. From a practical point of view, we have selected a number of three clusters, to make a first classification based on the three types of materials identified manually, that is, amorphous, ITQ-21, and ITQ-30.

Figure 5. XRD measurements of the as-synthesized samples ordered considering the cluster distribution obtained by the k-means algorithm using the second data set.

Figure 6. Identification of the formed phase using a k-means clustering analysis.

Figure 7. Averaged XRD diffractogram for the three clusters obtained by k-means analysis.

Table 2. Clustering Analysis Carried out Using the XRD Data, Showing the Match between Clustering Results and Phase Identification

                   clustering k-means match
clusters       specific 2θ range match (%)   complete 2θ range match (%)
1. amorphous   87.3                          99.0
2. ITQ-21      100.0                         89.7
3. ITQ-30      92.3                          69.2

Figure 8. Distribution of the three different phases in the SPC coordinates. (PCA computed using the whole of the XRD data, first data set.)

A second data set constituted by XRD data from the characteristic 2θ range (24.5-27.5°) of ITQ-30 for each sample was considered. Figure 5 shows a general visualization of the XRD data, ordered according to the clusters obtained by the k-means clustering algorithm using the second data set. Figure 6 shows the good match between the clusters obtained by k-means analysis for both data sets and the corresponding material/phase. The clustering analysis using the whole of the XRD data allows one to accurately distinguish amorphous and crystalline materials, whereas it fails only in a few samples when distinguishing between ITQ-21 and ITQ-30 phases (Table 2). However, it is possible to improve the quality of the separation between ITQ-21 and ITQ-30 samples by taking into account only the range of 2θ where these two structures present different peaks (24.5-27.5°). The k-means clustering in this way allows a strong improvement of the classification between both phases, although the classification

accuracy of the amorphous samples is reduced. Figure 7 presents the averaged XRD pattern for each cluster (first data set), showing the good match between the clustering analysis and phase identification (see the real diffractograms of standard ITQ-21 and ITQ-30 samples mentioned previously). The characteristic peaks of ITQ-30 can be observed, and the averaged diffractogram can be clearly distinguished from the ITQ-21 XRD pattern.

3.2.B. Principal Component Analysis. The principal components computed from the whole of the XRD data will be referred to as structural principal components (SPCs) from here on. When PCA techniques are applied, it is possible to reduce the XRD vector of each sample (a vector with 800 intensities, one for each 2θ angle) to a vector with only three new variables (SPCs), without a loss of the main information of the original data because 81.8% of the cumulative variance has been extracted. The corresponding percentage of variance for each component (SPC#1, SPC#2, and SPC#3) is 39.8%, 32.8%, and 9.2%, respectively. Because of the simplification of the original vector, we can now provide an easy visualization of the distribution of the samples in the virtual three-dimensional SPC space. The results of the k-means clustering algorithm and the PCA can be combined, as shown in Figure 8. SPC projections of the samples are clearly separated from one cluster to another.

Diffraction data usually contain information about the type of crystalline phase as well as about the crystallinity of the material, crystallite size, zeolite framework composition, and so forth. Indeed, the fine-tuning of ITQ-21 crystallite size has been reported (ref 19) from nanocrystals to large crystals by controlling the rates of nucleation and crystal growth, through the H2O/(Si + Ge) ratio. In the present study, trying to rationalize the meaning of SPC space, we will study the variation of phase crystallinity and framework composition inside this new space. On one hand, Figure 9 shows the distribution of ITQ-21 and ITQ-30 samples with different degrees of crystallinity in the SPC space. It can be seen that they are clearly distributed in the space, it being possible to correlate crystallinity against SPCs. On the other hand, the correlation between the germanium content in the ITQ-21 framework and the SPC was studied. Given that the Si/Ge ratio in the starting gel has been shown to be a very influential factor on the final crystallinity of ITQ-21 (see the Pareto analysis in Figure 3), the variation of the Si/Ge was followed apart from the correlation between the SPC and crystallinity. Specifically, Figure 10 represents the third SPC as a function of Si/Ge, for three different degrees of crystallinity. It is clear that SPC#3 is strongly correlated with the structural changes produced by the Si/Ge framework variation. In fact, this correlation is attributed to the information extracted by PC analysis from the XRD peak shift produced by the isomorphic substitution of Si by Ge in the zeolite framework, as can be clearly seen in the Figure 10 inset. No correlation was found between Si/Ge and the remaining two SPCs.

Figure 9. Identification of different structural properties in the SPC space: distribution of ITQ-21 and ITQ-30 with different ranges of crystallinity.

Figure 10. Identification of different structural properties in the SPC space for ITQ-21 samples: correlation between SPC#3 and Si/Ge in the starting gel,
for three different degrees of crystallinity. Inset: Partial diffractograms corresponding to four samples with different Si/Ge ratios and the same crystallinity
(20%), showing the peak shift.
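The variance bookkeeping behind the SPCs is simple enough to check by hand. The sketch below, a minimal illustration with hypothetical helper names, converts eigenvalues to explained-variance fractions, sums the three fractions reported in Section 3.2.B, and shows how a diffractogram would be mapped to SPC coordinates once the mean pattern and the component vectors are known. The eigenvectors themselves are not given in the text, so `project` is only demonstrated on a toy input.

```python
def explained_variance(eigenvalues):
    """Fraction of the total variance carried by each principal component."""
    total = sum(eigenvalues)
    return [ev / total for ev in eigenvalues]


def project(x, mean, components):
    """Coordinates of sample x on each component after mean-centering.

    `components` is a list of unit-length eigenvectors, each with the same
    length as x (800 intensities in the case described in the text).
    """
    centered = [xi - mi for xi, mi in zip(x, mean)]
    return [sum(c * v for c, v in zip(comp, centered)) for comp in components]


# Fractions reported in the text for SPC#1, SPC#2, and SPC#3.
reported = [0.398, 0.328, 0.092]
print(f"{sum(reported):.1%}")  # 81.8%
```

The same `project` step applies unchanged to samples that were not used to compute the components, which is how unseen materials can be placed in an existing SPC space.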

Consequently, SPCs contain the summarized information of XRD patterns concerning the different structural and morphological changes in the whole of the materials explored. These results demonstrate that the application of dimensional reduction techniques, such as PCA, to the raw XRD data allows one to obtain a new series of structural components in a fully automated manner, which entirely describes the properties of the synthesized samples. In addition, these structural vectors can be used to improve the prediction performance of QSAR/QSPR models, such as NNs, as well as the development of new exploration tools (mapping) of nonlinear and multidimensional spaces, such as those found in the development of new microporous materials.

Table 3. NN and Decision Tree Prediction Performances of the Obtained Phase Using the Synthesis Variables as Model Input

class       % DT accuracy   % NN accuracy
amorphous   92.16           96.08
ITQ-21      93.10           93.10
ITQ-30      92.31           92.31

Figure 11. Prediction performance of the NN model using the synthesis factors as input and the crystallinity of ITQ-21 and ITQ-30 as output. (Net topology 5_10_4_2, trained using BackProp with the Momentum algorithm and 80% data.)

3.3. Construction of Predictive Models (QSPR/QSAR).
3.3.A. Predictive Modeling of Material Properties from Synthesis Descriptors. As a first step, NN models were obtained using the synthesis descriptors as input and the zeolite crystallinity as output. Very good prediction results could be obtained using a NN with a two-hidden-layer topology and the back propagation training algorithm (α = 0.3). A total of 70% of the data were employed for the training process and the rest for testing. Figure 11 shows the experimental and predicted crystallinity for both zeolites, clearly illustrating the high accuracy of the model despite the experimental error associated with the synthesis and characterization steps. Subsequently, this predictive model was applied for finding the theoretical synthesis conditions that optimize the ITQ-21 crystallinity while keeping the molar ratio Si/Ge > 30. Three different sets of conditions with predicted crystallinity around 60% were selected for experimental testing, with 2 days of crystallization time. The

Figure 12. Decision tree ID3-IV obtained using synthesis descriptors as model input and phase clusters as output. [The importance of each factor is as follows: Si/Ge 100%, Al/(Si + Ge) 79%, MSPT/(Si + Ge) 72%, H2O/(Si + Ge) 70%, and crystallization time 38%.] The initial data partition, called the initial branch or root, encompasses all data records. This root is split into subsets or child branches on the basis of the value of a particular input field, and these may in turn be split again into sub-branches and so on.

Figure 13. NN prediction performance of the SPC using the synthesis factors as input. The correlation factor for the crystallinity of ITQ-21 and ITQ-30
is 0.960 and 0.958, respectively. The inset shows the topology of the best NN.

Figure 14. Eigenvalues for two different data set sizes: on the left-hand side, 60% of the whole available amount of experiments is considered, while on
the right side, only 40% is used for the calculation of the eigenvectors.

experimental crystallinity achieved was slightly lower than expected, being close to 50% for these samples, as shown in Figure 11 (filled squares).

Subsequently, predictive models based on decision trees and NNs were computed using just the type of formed material as output data. Figure 12 shows the best decision tree found, describing successfully the type of material formed as a function of the synthesis variables. Table 3 compares the prediction performance of the NN and decision tree models, with very high accuracy, although the NN model is slightly better. The relative importance of each input factor in the occurrence of each phase follows, in both models, the order Si/Ge > Al/(Si + Ge) > MSPT ≈ H2O/(Si + Ge) > time, contrasting with the standardized effect observed for the crystallinity of each phase (Figure 3), where H2O/(Si + Ge) and Al/(Si + Ge) played the major roles for ITQ-21 and ITQ-30, respectively.

As a second step, predictive models were computed using the SPCs as output for the model, whereas synthesis variables were used as input. This approach may allow prediction of the structural properties of a material, it being possible to distinguish between the type of phase (known or unknown), crystallinity, framework composition, and so forth. The SPC output is well-suited when the aims of the exploration are both the discovery of new structures and the optimization of a determined feature when competing phases are also formed. Given that synthesis variables have been shown as the main factors in the growth of both ITQ-21 and ITQ-30 by the Pareto analysis, and bearing in mind that SPCs are strongly correlated with the type of material formed, its crystallinity, and its framework composition, there is no doubt about the existence of clear relationships between synthesis descriptors and SPCs. Following this approach, an accurate NN model was obtained using the available data (70% for training and 30% for validation), trained following the back propagation algorithm (α = 0.3). Figure 13 shows the observed SPCs versus the predicted ones, the averaged prediction error for the test samples being in the range of 10%.

Considering all of the predictive results based on decision trees and NNs, we can see in Figure 12 that the lowest Ge content in the ITQ-21 zeolite that can be synthesized with high crystallinity is for a Si/Ge ratio of 37.5. This is in agreement with previous results (ref 19) that suggest that ITQ-21 could be obtained for a Si/Ge ratio of 25, but not for 50.
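The way a decision tree such as the one in Figure 12 chooses which synthesis variable to split on can be sketched with the information-gain criterion of the classic ID3 algorithm. The caption names the tree "ID3-IV"; whether that variant uses exactly this criterion is not stated in the text, so the code below is an assumption-laden illustration. The four toy samples are invented for the example and are not data from the study.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, feature):
    """Entropy reduction obtained by splitting the samples on one feature."""
    n = len(labels)
    groups = {}
    for row, y in zip(rows, labels):
        groups.setdefault(row[feature], []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy samples: the phase depends on Si/Ge only, not on time.
toy_rows = [
    {"Si/Ge": 15, "time": 1},
    {"Si/Ge": 15, "time": 5},
    {"Si/Ge": 50, "time": 1},
    {"Si/Ge": 50, "time": 5},
]
toy_labels = ["ITQ-21", "ITQ-21", "ITQ-30", "ITQ-30"]

best = max(["Si/Ge", "time"], key=lambda f: information_gain(toy_rows, toy_labels, f))
print(best)  # Si/Ge
```

A tree builder would apply this choice at the root and then recurse on each subset, which is the root/child-branch structure described in the Figure 12 caption.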

Figure 15. 3D scatter plot with the first three principal components. On the left-hand side are represented the experiments corresponding to the 40% of the
entire data set used for the calculation of the eigenvectors, while on the right side, unseen materials are projected.

This helps to fine-tune the best synthesis conditions for the lowest Ge-content ITQ-21 samples, which will have the maximum stability and better catalytic performance.

Table 4. Best Selected NN: MLP 3:3-10-3:1 (a)

                              real classes
                   training set:            test set:
                   100% recognition         96% recognition
predicted class    1     2     3            1     2     3
1                  35    0     0            58    0     0
2                  0     16    0            0     17    0
3                  0     0     6            1     2     9

(a) NN prediction performances of the obtained phase using the SPC coordinates as input.

3.3.B. Predictive Modeling of Phase Type from the Structural Principal Components. Finally, the correlation between SPCs and the type of structure by NN modeling was studied. Care is required during this study in order not to overfit the data but also to present a realistic methodology. Therefore, the stability of the approach is tested by drastically reducing the number of experiments that are used for producing the PCA. Two different sizes, 40% and 60% of the whole available data set, have been used for the calculation of the eigenvectors, and the first three principal components have been kept for both analyses (see Figure 14). Then, the remaining unseen experimental data (60% and 40%, respectively) are projected into the modified space using the analytic definition of the selected principal components (i.e., the first three components; see Figure 15). Then, NNs are trained using only the materials used for the PCA calculations, with PCA coordinates as input and phase types as output. Therefore, when the coordinates of the unseen solids are calculated through the PCA axes definition, the NN is used in a second step to assign them a label corresponding to the expected phase class. Table 4 indicates the recognition rates for both training and test sets considering the most drastic PCA study (i.e., 40% of the data for

Figure 16. Data mining applied in the development of new solid materials: methodology for automated data analysis, visualization, and QSPR modeling.

component calculation). It can be argued that the NN plays a rather small role because the separation between classes in the PCA space is sharp. However, the results are excellent, and this approach appears to be of great interest.

4. Conclusions

This work shows a complete study integrating high-throughput tools for the synthesis and characterization of solid materials and data-mining techniques in the discovery and optimization of new microporous materials. The phase diagram of the system SiO2:GeO2:Al2O3:F-:H2O:N(16)-methylsparteinium hydroxide has been systematically explored following a factorial design, the effect of the starting gel composition being determined, as well as that of the crystallization time. Two different zeolites (ITQ-21 and ITQ-30) were detected within the explored space.

Data visualization and dimensional reduction were conducted by using principal components analysis and clustering algorithms, allowing extraction of the desired structural vectors from the XRD characterization data. These unsupervised techniques give a view of the screening results closer to the topology of the explored multidimensional space, including information about the formed phase(s), crystallinity of the material, particle size, and isomorphic substitution degree, and they also reduce the experimental noise of the original characterization data. Moreover, the automation of this type of analysis can be easily implemented without any prior knowledge of the problem.

Different modeling techniques were applied for the prediction of the properties of the materials obtained, considering the synthesis data as input of the model. Furthermore, different "material property" descriptors were considered as outcome of the model, that is, crystallinity of the formed phase, SPCs computed by PCA, or clustering results. It was found that the final properties of the materials could be successfully modeled using neural networks, obtaining high-quality predictions, especially when applying SPCs as model output.

This proposed methodology (see Figure 16) for unsupervised characterization analysis and subsequent predictive modeling could be applied when other material properties are to be explored or optimized, such as, for instance, acidity, fluorescence/phosphorescence, or adsorption properties, and when other characterization techniques are employed, such as Raman, NMR, photoluminescence spectroscopy, and IR imaging. Finally, these predictive models could be used for guiding the next experimental round, allowing one to skip the screening of virtually low-performing materials and promoting the synthesis of new dissimilar materials (with respect to the explored space) and therefore accelerating the multiparametric space exploration.

Acknowledgment. Financial support from the Spanish government (Project MAT 2003-07945-C02-01 and Grants TIC2003-07369-C02-01 and FPU AP2003-4635) and the E.U. Commission (TOPCOMBI Project) is gratefully acknowledged. The authors thank I. Millet and J. Herrera for technical assistance.

Supporting Information Available: Details for data mining techniques. This material is available free of charge via the Internet at http://pubs.acs.org.

CM060620K
