This work shows a methodology for the synthesis of self-assembled organic-inorganic materials which
integrates high-throughput tools for the synthesis and characterization of solid materials and data-mining
techniques in materials science. This is illustrated by a detailed exploration of the hydrothermal synthesis
in the system SiO2:GeO2:Al2O3:F-:H2O:N(16) methylsparteinium. Data analysis and dimensional reduction
were conducted by using principal components analysis and clustering algorithms, allowing the definition
of a new and suitable structural vector which summarizes the X-ray diffraction characterization data as
well as an improvement of data visualization and interpretation. Different modeling techniques were
applied for the prediction of the properties of the materials considering the synthesis descriptors as input
of the model. Furthermore, different “material property” descriptors were considered as outcome of the
model, that is, the crystallinity of the formed phases, structural principal components computed by principal
component analysis, or clustering results. It was found that the final properties of the materials could be
successfully modeled using artificial neural networks and decision trees.
large cavities. This zeolite was synthesized using a large and rigid structure-directing agent,
N(16)-methylsparteinium (MSPT), and the directing effect of Ge toward the formation of structures
containing double four rings seems decisive for the synthesis of ITQ-21 (ref 19). Zeolite ITQ-30
(ref 20) is a new structure of the MWW family, which is more closely related to MCM-56 (ref 21) but
with clearly different X-ray diffraction (XRD) features. The thermal and hydrothermal stability of
zeolites increases as the germanium content decreases. Furthermore, it is important for catalytic
applications to find the synthesis conditions under which fully crystalline samples of ITQ-21 can
be obtained with the lowest amount (or none) of Ge and the highest acidity [determined by the
(Si + Ge)/Al ratio].

Table 1. Levels and Ranges of Synthesis Factors Employed in the Experimental Design

                   number of    variation ranges
factor             levels       level 1   level 2   level 3   level 4
time (days)        2            1         5
Si/Ge              4            15        20        25        50
Al/(Si + Ge)       3            0.02      0.04      0.067
MSPT/(Si + Ge)     2            0.25      0.5
F/(Si + Ge)        2            0.25      0.5
H2O/(Si + Ge)      3            2         5         10
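As a sketch of how the levels in Table 1 span the synthesis space, the grid below enumerates every
combination with `itertools.product`. The six rows of Table 1 multiply to 288 combinations, whereas
the factorial design used in this work has 144 runs. The sketch therefore assumes, purely for
illustration, that MSPT/(Si + Ge) and F/(Si + Ge) are set to the same level in each run; the text
does not state how the design treats these two factors, and the dictionary keys are hypothetical
names.

```python
from itertools import product

# Levels as listed in Table 1.
levels = {
    "time_days": [1, 5],
    "Si/Ge": [15, 20, 25, 50],
    "Al/(Si+Ge)": [0.02, 0.04, 0.067],
    "H2O/(Si+Ge)": [2, 5, 10],
    # ASSUMPTION (not stated in the text): MSPT/(Si+Ge) and F/(Si+Ge)
    # are varied together, so they share a single two-level factor here.
    "MSPT=F/(Si+Ge)": [0.25, 0.5],
}

names = list(levels)
# One dictionary per experiment, covering the full factorial grid.
design = [dict(zip(names, combo)) for combo in product(*levels.values())]

print(len(design))  # 2 * 4 * 3 * 3 * 2 = 144
```

Each entry of `design` is one gel composition plus crystallization time; a 15-well multiautoclave
would then take a batch of 15 such entries.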
Classical designs of experiments (DoE) (ref 22), like factorial or combination designs, have been
applied successfully when exploring the synthesis gel conditions aimed at the discovery of new
zeolites or the optimization of existing ones (refs 23-25). It is clear that the synthesis
variables should be carefully selected in order to cover the largest part of the most promising
parameter space, while keeping the total number of experiments at a reasonable and feasible level.
Moreover, the HT methods currently applied for parallel hydrothermal synthesis strongly constrain
how the synthesis parameters can be experimentally studied. For instance, when using autoclave
arrays (multiautoclaves with 15-96 wells), the intensive exploration of crystallization temperature
and time is restricted. Therefore, DoE strategies should be developed which consider the specific
aspects of HT methods in this field, while minimizing the number of experiments. On the basis of
the data analysis/mining methodology applied in this work, we propose a new mapping/exploration
approach for reducing the screening of low-promise conditions within the multivariate synthesis
spaces found in microporous systems.

Figure 1. Phase diagram showing the occurring materials as a function of the five synthesis
variables (starting gel molar ratios and crystallization time).

2. Experimental Section and the Design of Experiments

A detailed exploration of the hydrothermal synthesis in the system SiO2:GeO2:Al2O3:F-:H2O:MSPT has
been performed, to understand the influence of these factors on the growth of ITQ-21 and ITQ-30, at
175 °C under static conditions. Parallel syntheses were carried out using a robotic system and
15-fold Teflon-lined stainless steel autoclaves for the crystallization (ref 25). Crystallinity was
measured by means of XRD, using a multisample Philips X'Pert diffractometer employing Cu Kα
radiation. A factorial experimental design (4·3²·2² = 144) was selected for studying simultaneously
the concentrations of the components in the starting gel, that is, Al/(Si + Ge), MSPT/(Si + Ge),
F-/(Si + Ge), and Si/Ge molar ratios, as well as the crystallization time. Table 1 shows the values
and levels considered for the different variables. For experimental details, see the Supporting
Information.

Different data-mining techniques have been applied to extract knowledge about the relationships
between synthesis conditions and the occurrence of different zeolite phases, minimizing the human
participation in the analysis of the great amount of data generated. Furthermore, the advantages of
data-mining techniques when processing, visualizing, and interpreting this type of nonlinear data
have been shown. In this sense, three issues are key in our methodology: (i) the analysis and
extraction of knowledge (i.e., Pareto analysis and data visualization techniques), (ii) a reduction
of the complexity/dimensionality of the problem, minimizing the information loss (i.e., clustering
analysis and principal component analysis, PCA), and (iii) modeling, enabling one to make a priori
predictions (i.e., classification trees and neural networks, NNs). Moreover, this approach
combining diverse data-mining techniques has been shown to be a realistic way of statistically
treating data from materials science. Finally, we have used the NN model based on ITQ-21
crystallinity to minimize the germanium content present in the final structure, to increase its
thermal stability, while maintaining high crystallinity. More details on the data-mining techniques
are given in the Supporting Information.

(19) Blasco, T.; Corma, A.; Díaz-Cabañas, M. J.; Rey, F.; Rius, J.; Sastre, G.; Vidal-Moya, J. A.
J. Am. Chem. Soc. 2004, 126, 13414-13423.
(20) Corma, A.; Díaz-Cabañas, M. J.; Moliner, M.; Martínez, C. Discovery of a new catalytically
active and selective zeolite (ITQ-30) by high-throughput synthesis techniques. J. Catal. in press.
(21) Fung, A. S.; Lawton, S. L.; Roth, W. J. U.S. Patent 5 362 697, 1994, to Mobil Oil Corp.
(22) Montgomery, D. C. Design and Analysis of Experiments, 4th ed.; John Wiley & Sons Inc.: New
York, 1997.
(23) Tagliabue, M.; Carluccio, L. C.; Ghisletti, D.; Perego, C. Catal. Today 2003, 81, 405-412.
(24) Holmgren, J.; Bem, D.; Bricker, M. L.; Gillespie, R. D.; Lewis, G.; Akporiaye, D.; Dahl, I.;
Karlsson, A.; Plassen, M.; Wendelbo, R. Proceedings of the 13th International Zeolite Conference;
Montpellier, France, July 8-13, 2001; Galarneau, A., Di Renzo, F., Fajula, F., Vedrine, J., Eds.;
Stud. Surf. Sci. Catal. 2001, 135, 461.
(25) Moliner, M.; Serra, J. M.; Corma, A.; Argente, E.; Valero, S.; Botti, V. Microporous
Mesoporous Mater. 2005, 78, 73-81.
(26) Lobo, R. F.; Davis, M. E. Microporous Mater. 1994, 3, 61.

A New Mapping/Exploration Approach Chem. Mater., Vol. 18, No. 14, 2006 3289

3. Results and Discussion

3.1. Screening Results: Phase Diagram. Figure 1 shows the phase diagram obtained following the
factorial design
described above. ITQ-21, ITQ-30, and amorphous material were obtained in the explored space. The
standard X-ray diffractograms for each crystalline phase are shown in Figure 2. The occurrence and
crystallinity of each phase were calculated automatically by integrating the area of its
characteristic peaks and referring this area to that of the fully crystalline material. For ITQ-21,
the integration range is 2θ = 25.4-27.2°, and for ITQ-30, it is 24.6-25.4°. Because ITQ-30 also
presents diffraction peaks in the 25.4-27.2° region, the ITQ-30 contribution is subtracted,
considering the crystallinity measured from the peak located at 25.0°. Considering the
crystallinity of the synthesized materials, three different groups have been created. A material is
qualified as "amorphous" if both the ITQ-21 and ITQ-30 crystallinities are below 20%. "ITQ-21" is
defined as a material for which the ITQ-21 crystallinity is higher than 20% and the ITQ-30
crystallinity is below 20%. If the ITQ-30 crystallinity is greater than 20%, the material is noted
as "ITQ-30".

A first approach using Pareto analysis shows in Figure 3 the relative influence of each synthesis
factor on the crystallinity of ITQ-21 and ITQ-30 samples. In this chart, the length of each bar is
the estimated effect divided by its standard error, which is equivalent to computing a t statistic
for each effect. Bars which extend beyond the vertical line on the plot correspond to effects that
are statistically significant at the 95% confidence level. This statistical way of understanding
the results allows quantification of the hypothetical weight of the factors in the growth of
materials. Both ITQ-21 and ITQ-30 seem to be quite influenced in a negative sense by water and
aluminum content; that is, the more water or the higher Al/(Si + Ge), the less crystalline are the
samples. Next, MSPT/(Si + Ge) and F/(Si + Ge) play a positive role in the formation of ITQ-21 and
ITQ-30. However, some important differences can be observed when comparing the analyses for ITQ-21
and ITQ-30. On one hand, the relative importance of MSPT/(Si + Ge) and F/(Si + Ge) is higher for
ITQ-30, because only in a few small zones can this material be obtained with the minimum content of
MSPT/(Si + Ge) and F/(Si + Ge). On the other hand, Si/Ge appears as an important negative factor
for ITQ-21 samples, while it becomes slightly positive for ITQ-30 samples. This result has to be
understood as a penalization for the growth of ITQ-21 when increasing the Si/Ge ratio, because the
crystallinity decreases but also some syntheses change to ITQ-30. The same reasoning explains the
slight benefit of Si/Ge for ITQ-30, taking into account a balance between the loss of crystallinity
and the appearance of new ITQ-30 points. However, ITQ-21 samples appear with a lower Si/Ge content.
Finally, the relative influence of time for these materials is quite different, being much more
important in the case of ITQ-30 than in that of ITQ-21. This effect of time could be understood as
a retransformation process of ITQ-21, in such a way that ITQ-30 can only be obtained in 1 day when
working with the maximum levels of MSPT/(Si + Ge) and F/(Si + Ge) and the minimum level of
Al/(Si + Ge).

Figure 3. Standardized Pareto chart for ITQ-21 and ITQ-30 formation, showing the effect of the
different synthesis factors on the crystallinity of each zeolite. The length of each bar displayed
in the frequency histogram is proportional to the absolute value of its associated estimated
effect.

3.2. Analysis and Knowledge Extraction from HT Experimental Data. In this section, different
techniques of unsupervised analysis will be applied to the original data set derived from the XRD
characterization of the whole set of samples, allowing an improvement in data visualization,
classification, and the subsequent knowledge extraction. Indeed, structural vectors will be
computed from the raw characterization data by means of dimensional reduction and analysis
techniques, that is, clustering algorithms and PCA.
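The phase-assignment rule of Section 3.1 (20% crystallinity thresholds for "amorphous", "ITQ-21",
and "ITQ-30") can be written compactly. The sketch below is only an illustration of that rule: the
function name is hypothetical, and because the text does not say how a crystallinity of exactly 20%
is treated, the code assumes that such a value counts as crystalline.

```python
THRESHOLD = 20.0  # percent crystallinity, as given in the text


def label_phase(itq21: float, itq30: float) -> str:
    """Return the group label for one sample from its two crystallinities (%)."""
    # ITQ-30 is checked first: the text labels a sample "ITQ-30" whenever its
    # ITQ-30 crystallinity exceeds the threshold, whatever the ITQ-21 value.
    # ASSUMPTION: a value of exactly 20% is treated as crystalline.
    if itq30 >= THRESHOLD:
        return "ITQ-30"
    if itq21 >= THRESHOLD:
        return "ITQ-21"
    return "amorphous"


print(label_phase(5.0, 5.0))    # amorphous
print(label_phase(80.0, 5.0))   # ITQ-21
print(label_phase(60.0, 30.0))  # ITQ-30
```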
Figure 4. Tree diagram (dendrogram) showing the Euclidean distances between the different clusters and subclusters.
Figure 7. Averaged XRD diffractogram for the three clusters obtained by k-means analysis.
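Figure 7 rests on averaging the diffractograms that fall in the same k-means cluster. A minimal
sketch of that idea follows, assuming Lloyd-style k-means with squared Euclidean distance and
caller-supplied starting centroids; the function name and the three-point "patterns" are invented
toy values for illustration, not XRD data from this work.

```python
def kmeans(patterns, start_centroids, iters=10):
    """Lloyd-style k-means; returns (assignments, centroids)."""
    centroids = [list(c) for c in start_centroids]  # work on a copy
    assignments = []
    for _ in range(iters):
        # Assign each pattern to its nearest centroid (squared Euclidean distance).
        assignments = [
            min(range(len(centroids)),
                key=lambda k: sum((a - b) ** 2 for a, b in zip(p, centroids[k])))
            for p in patterns
        ]
        # Recompute each centroid as the averaged pattern of its members.
        for k in range(len(centroids)):
            members = [p for p, a in zip(patterns, assignments) if a == k]
            if members:  # an empty cluster keeps its previous centroid
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return assignments, centroids


# Toy "diffractograms" with three intensity points each (hypothetical values).
patterns = [[9, 1, 0], [8, 2, 0], [1, 9, 0], [2, 8, 0], [0, 1, 1], [1, 0, 0]]
labels, averaged = kmeans(patterns, [patterns[0], patterns[2], patterns[4]])

print(labels)    # [0, 0, 1, 1, 2, 2]
print(averaged)  # [[8.5, 1.5, 0.0], [1.5, 8.5, 0.0], [0.5, 0.5, 0.5]]
```

Each row of `averaged` plays the role of one averaged diffractogram in Figure 7.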
Figure 10. Identification of different structural properties in the SPC space for ITQ-21 samples: correlation between SPC#3 and Si/Ge in the starting gel,
for three different degrees of crystallinity. Inset: Partial diffractograms corresponding to four samples with different Si/Ge ratios and the same crystallinity
(20%), showing the peak shift.
Figure 12. Decision tree ID3-IV obtained using synthesis descriptors as model input and phase
clusters as output. [The importance of each factor is as follows: Si/Ge 100%, Al/(Si + Ge) 79%,
MSPT/(Si + Ge) 72%, H2O/(Si + Ge) 70%, and crystallization time 38%.] The initial data partition,
called the initial branch or root, encompasses all data records. This root is split into subsets,
or child branches, on the basis of the value of a particular input field; each subset may in turn
be split again into sub-branches, and so on.
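The root-splitting step described in the caption can be illustrated with the information-gain
criterion used by ID3-style trees in general; whether the ID3-IV variant used here applies exactly
this criterion is not stated in the text, so this is a sketch under that assumption. The records,
field names, and class labels below are hypothetical toy values, not results from this work.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(records, field, target):
    """Entropy reduction obtained by splitting `records` on `field`."""
    parent = entropy([r[target] for r in records])
    groups = {}
    for r in records:
        groups.setdefault(r[field], []).append(r[target])
    # Weighted entropy of the child branches created by the split.
    children = sum(len(g) / len(records) * entropy(g) for g in groups.values())
    return parent - children


# Hypothetical toy records (NOT data from the paper).
records = [
    {"Si/Ge": 15, "time": 1, "cluster": "A"},
    {"Si/Ge": 15, "time": 5, "cluster": "A"},
    {"Si/Ge": 50, "time": 1, "cluster": "B"},
    {"Si/Ge": 50, "time": 5, "cluster": "B"},
]

# The root is split on the field with the largest gain.
root_field = max(["Si/Ge", "time"], key=lambda f: information_gain(records, f, "cluster"))
print(root_field)  # Si/Ge
```

In this toy set the Si/Ge split separates the two classes completely (gain of 1 bit), while the
time split leaves both branches mixed (gain of 0).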
Figure 13. NN prediction performance of the SPC using the synthesis factors as input. The
correlation factors for the crystallinity of ITQ-21 and ITQ-30 are 0.960 and 0.958, respectively.
The inset shows the topology of the best NN.
Figure 14. Eigenvalues for two different data set sizes: on the left-hand side, 60% of the whole available amount of experiments is considered, while on
the right side, only 40% is used for the calculation of the eigenvectors.
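The subset-then-project procedure behind Figures 14 and 15 can be sketched as follows: the
principal axes are computed from a fraction of the samples only, and the remaining samples are then
projected onto those fixed axes without refitting. This is a minimal illustration in plain Python
(power iteration with deflation, chosen only to keep the example dependency-free), not the
implementation used in this work; the function names and the synthetic four-feature data are
assumptions. With 144 samples, a 40% subset gives 57 fitting samples and 87 unseen ones, which is
consistent with the training and test totals of Table 4.

```python
import random


def fit_pca(rows, k, iters=200, seed=0):
    """Return (mean, eigenvalues, components) for the k leading principal axes."""
    n, d = len(rows), len(rows[0])
    mean = [sum(col) / n for col in zip(*rows)]
    centered = [[x - m for x, m in zip(row, mean)] for row in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    rng = random.Random(seed)
    eigenvalues, components = [], []
    for _ in range(k):
        v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
        for _ in range(iters):  # power iteration towards the leading axis
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = sum(x * x for x in w) ** 0.5
            v = [x / norm for x in w]
        lam = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
        eigenvalues.append(lam)
        components.append(v)
        # Deflation: remove the axis just found so the next one can emerge.
        for i in range(d):
            for j in range(d):
                cov[i][j] -= lam * v[i] * v[j]
    return mean, eigenvalues, components


def project(rows, mean, components):
    """Coordinates of `rows` on axes fitted elsewhere (no refitting)."""
    return [[sum((x - m) * c for x, m, c in zip(row, mean, axis))
             for axis in components] for row in rows]


# Synthetic stand-in for 144 characterization vectors (NOT data from the paper).
rng = random.Random(1)
data = []
for _ in range(144):
    t, u, w = rng.gauss(0, 3), rng.gauss(0, 1), rng.gauss(0, 0.3)
    data.append([t + rng.gauss(0, 0.05), t + rng.gauss(0, 0.05), u, w])

rng.shuffle(data)
n_fit = int(0.4 * len(data))                    # 57 samples define the axes
fit_rows, unseen_rows = data[:n_fit], data[n_fit:]

mean, eigenvalues, components = fit_pca(fit_rows, k=3)
scores_fit = project(fit_rows, mean, components)
scores_unseen = project(unseen_rows, mean, components)  # 87 projected samples
```

The rows of `scores_fit` could then serve as inputs for training a classifier, and the rows of
`scores_unseen` as inputs for testing it on materials that played no part in defining the axes.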
experimental crystallinity achieved was slightly lower than expected, being for the samples close
to 50, as can be seen in Figure 11 (filled squares).

Subsequently, predictive models based on decision trees and NNs were computed using just the type
of formed material as output data. Figure 12 shows the best decision tree found, which successfully
describes the type of material formed as a function of the synthesis variables. Table 3 compares
the prediction performance of the NN and decision tree models; both reach very high accuracy,
although the NN model is slightly better. The relative importance of each input factor in the
occurrence of each phase follows, in both models, the order Si/Ge > Al/(Si + Ge) >
MSPT/(Si + Ge) ≈ H2O/(Si + Ge) > time, contrasting with the standardized effect observed for the
crystallinity of each phase (Figure 3), where H2O/(Si + Ge) and Al/(Si + Ge) played the major roles
for ITQ-21 and ITQ-30, respectively.

As a second step, predictive models were computed using the SPCs as output for the model, whereas
synthesis variables were used as input. This approach may allow prediction of the structural
properties of a material, it being possible to distinguish between the type of phase (known or
unknown), crystallinity, framework composition, and so forth. The SPC output is well-suited when
the aims of the exploration are both the discovery of new structures and the optimization of a
determined feature when competing phases are also formed. Given that synthesis variables have been
shown by the Pareto analysis to be the main factors in the growth of both ITQ-21 and ITQ-30, and
bearing in mind that SPCs are strongly correlated with the type of material formed, its
crystallinity, and its framework composition, there is no doubt about the existence of clear
relationships between synthesis descriptors and SPCs. Following this approach, an accurate NN model
was obtained using the available data (70% for training and 30% for validation), trained following
the back propagation algorithm (α = 0.3). Figure 13 shows the observed SPCs versus the predicted
ones, the averaged prediction error for the test samples being in the range of 10%.

Considering all of the predictive results based on decision trees and NNs, we can see in Figure 12
that the lowest Ge content at which the ITQ-21 zeolite can be synthesized with high crystallinity
corresponds to a Si/Ge ratio of 37.5. This is in agreement with previous results (ref 19) that
suggest that ITQ-21 could be obtained for a Si/Ge ratio of 25, but not for 50.
Figure 15. 3D scatter plot with the first three principal components. On the left-hand side are represented the experiments corresponding to the 40% of the
entire data set used for the calculation of the eigenvectors, while on the right side, unseen materials are projected.
This helps to fine-tune the best synthesis conditions for the lowest Ge-content ITQ-21 samples,
which will have the maximum stability and better catalytic performance.

Table 4. Best Selected NN: MLP 3:3-10-3:1 (a)

                    Real Classes
                    training set:             test set:
                    100% recognition          96% recognition
predicted class     1       2       3         1       2       3
1                   35      0       0         58      0       0
2                   0       16      0         0       17      0
3                   0       0       6         1       2       9

(a) NN prediction performances of the obtained phase using the SPC coordinates as input.

3.3.B. Predictive Modeling of Phase Type from the Structural Principal Components. Finally, the
correlation between SPCs and the type of structure by NN modeling was studied. Care is required in
this study not to overfit the data while still presenting a realistic methodology. Therefore, the
stability of the approach is tested by drastically reducing the number of experiments that are used
for producing the PCA. Two different sizes, 40% and 60% of the whole available data set, have been
used for the calculation of the eigenvectors, and the first three principal components have been
kept for both analyses (see Figure 14). Then, the remaining unseen experimental data (60% and 40%,
respectively) are projected into the modified space using the analytic definition of the selected
principal components (i.e., the first three components; see Figure 15). Then, NNs are trained using
only the materials used for the PCA calculations, with PCA coordinates as input and phase types as
output. Therefore, when the coordinates of the unseen solids are calculated through the PCA axes
definition, the NN is used in a second step to assign them a label corresponding to the expected
phase class. Table 4 indicates the recognition rates for both training and test sets considering
the most drastic PCA study (i.e., 40% of the data for
component calculation). It can be argued that the NN plays a rather small role because the
separation between classes in the PCA space is sharp. However, the results are excellent, and this
approach appears to be of great interest.

4. Conclusions

This work shows a complete study integrating high-throughput tools for the synthesis and
characterization of solid materials and data-mining techniques in the discovery and optimization of
new microporous materials. The phase diagram of the system
SiO2:GeO2:Al2O3:F-:H2O:N(16)-methylsparteinium hydroxide has been systematically explored following
a factorial design, the effect of the starting gel composition being determined, as well as that of
the crystallization time. Two different zeolites (ITQ-21 and ITQ-30) were detected within the
explored space.

Data visualization and dimensional reduction were conducted by using principal components analysis
and clustering algorithms, allowing extraction of the desired structural vectors from the XRD
characterization data. These unsupervised techniques provide a view of the screening results closer
to the topology of the explored multidimensional space, including information about the formed
phase(s), crystallinity of the material, particle size, and isomorphic substitution degree, while
also reducing the experimental noise of the original characterization data. Moreover, the
automation of this type of analysis can be easily implemented without any prior knowledge of the
problem.

Different modeling techniques were applied for the prediction of the properties of the materials
obtained, considering the synthesis data as input of the model. Furthermore, different "material
property" descriptors were considered as outcome of the model, that is, crystallinity of the formed
phase, SPCs computed by PCA, or clustering results. It was found that the final properties of the
materials could be successfully modeled using neural networks, obtaining high-quality predictions,
especially when applying SPCs as model output.

This proposed methodology (see Figure 16) for unsupervised characterization analysis and subsequent
predictive modeling could be applied when other material properties are to be explored or
optimized, such as, for instance, acidity, fluorescence/phosphorescence, or adsorption properties,
and when other characterization techniques are employed, such as Raman, NMR, photoluminescence
spectroscopy, and IR imaging. Finally, these predictive models could be used for guiding the next
experimental round, allowing one to skip the screening of virtually low-performing materials and
promoting the synthesis of new dissimilar materials (with respect to the explored space) and
therefore accelerating the multiparametric space exploration.

Figure 16. Data mining applied in the development of new solid materials: methodology for automated
data analysis, visualization, and QSPR modeling.

Acknowledgment. Financial support from the Spanish government (Project MAT 2003-07945-C02-01 and
Grants TIC2003-07369-C02-01 and FPU AP2003-4635) and the E.U. Commission (TOPCOMBI Project) is
gratefully acknowledged. The authors thank I. Millet and J. Herrera for technical assistance.

Supporting Information Available: Details for data mining techniques. This material is available
free of charge via the Internet at http://pubs.acs.org.

CM060620K