© 1996-2002
http://www.acc.umu.se/~tnkjtg/Chemometrics/Editorial
Introduction to Statistical Experimental Design - What is it? Why and Where is it Useful?
Johan Trygg & Svante Wold
University of Queensland, Australia & Umeå University, Sweden

OUTLINE
1.) Introduction
2.) Stages in the DoE process
3.) Analysing the resulting experimental data
4.) Applications of DoE
5.) References
6.) Appendix
7.) DoE Exercises
of the response
5. Isolated, unconnected experiments
6. Slow growth of knowledge, no mapping of experimental space.

Any measurement and experiment is influenced by noise. Under stable conditions, any process varies around its mean +/- 3 std dev. Two COST experiments may give two different results, but with no estimate (or only a poor one) of the noise level. But by making a set of well-planned experiments with correct analysis (DoE), we can separate "real effects" from noise, draw correct conclusions and act correctly. This means decreased variability and quality improvement. So, to investigate systems involving several factors in the presence of variability or noise, one needs a better strategy than changing one separate factor at a time, and that strategy is given by experimental design (DoE).

1.2 What to do instead - Design of Experiments (DoE)
• Which factors have a real influence on the response?
• What are the best settings of the factors to achieve optimal conditions for best performance of a system?
• What are the predicted values of the responses for given settings of the factors in a model?

In 1925 Fisher started the development of methods of statistical experimental design. These methods have been further refined by Yule, Box, Stu and Bill Hunter, Scheffé, Cox, Taguchi, and others, so that today they comprise a toolbox for virtually any optimization problem. The basic idea is to devise a small set of experiments in which all pertinent factors are varied systematically. This set usually does not include more than ten to twenty experiments. The subsequent analysis of the resulting experimental data will identify the optimal conditions, the factors that most influence the results and those that do not, the presence of interactions and synergisms, and so on. The most important aspect of statistical experimental designs is that they provide a strict mathematical framework for changing all pertinent factors simultaneously, and achieve this in a small number of experimental runs. Most of us can only grasp the effect of one factor at a time in our minds, and that leads to the inefficient COST approach. We need the mathematics (and the computer) to keep track of the factors and their combinations.
• All factors are varied together over a set of experimental runs
• Noise is decreased by means of averaging
• The functional space is efficiently mapped, interactions and synergisms are seen

If these objectives cannot be put into words, there is no reason to continue, because it shows that the investigators don't know what to do.

2.2 Screening (many factors)
Finding out a little about many factors: which factors are the dominating ones? To ensure that uncontrolled factors (humidity, etc.) do not bias the results, perform the runs in random order. Screening experiments give information about
• What are the important factors
• If we are in the correct region (ranges)
• If there is curvature and if it masks the effects
• What to do next

Pareto's principle states that 20 % of the data (factors) account for 80 % of the information. Screening designs provide simple models with information about dominating variables, and information about ranges. In addition, they require few experiments per factor, which means that relevant information is gained in only a few experiments.
Linear models and interaction models are sufficient, since we are only interested in the effects. We merely ask whether a factor influences the response, not how.
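The screening idea described on this page (vary all factors together, run in random order, and ask only whether a factor matters) can be sketched in a few lines. This is a minimal illustration, not the authors' procedure: the two-level full factorial layout, the coded levels -1/+1, the function names and the simulated system in `simulate` are all assumptions made for the example.

```python
import itertools
import numpy as np

def full_factorial_two_level(n_factors):
    """Return all 2**n_factors runs with coded factor levels -1 / +1."""
    return np.array(list(itertools.product([-1.0, 1.0], repeat=n_factors)))

def main_effects(design, response):
    """Main effect of a factor: mean response at +1 minus mean response at -1."""
    return np.array([
        response[design[:, j] > 0].mean() - response[design[:, j] < 0].mean()
        for j in range(design.shape[1])
    ])

def simulate(run, rng):
    """Hypothetical system, used only to generate example responses."""
    x1, x2, x3 = run
    return 10.0 + 3.0 * x1 + 0.2 * x2 - 2.0 * x3 + rng.normal(scale=0.3)

rng = np.random.default_rng(0)
design = full_factorial_two_level(3)        # 8 runs, 3 factors
run_order = rng.permutation(len(design))    # randomized run order

# Perform the runs in random order, store results in design order.
response = np.empty(len(design))
for i in run_order:
    response[i] = simulate(design[i], rng)

effects = main_effects(design, response)
for name, effect in zip(["x1", "x2", "x3"], effects):
    print(f"{name}: estimated main effect = {effect:+.2f}")
```

Because every factor is changed in every run, each main effect is an average over four pairs of runs, which is where the noise reduction by averaging comes from. A full factorial grows quickly with the number of factors, so real screening studies typically use fractional designs instead.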
and optimization (few factors)
After screening, the goal of the investigation is usually to create a valid map of the experimental domain (local space) given by the significant factors and their ranges. This is done with a quadratic polynomial model. The higher-order models have increased complexity and therefore also require more experiments per factor than screening designs.
Different types of RSM designs
• Three level factorial designs
- See http://www.itl.nist.gov/div898/handbook/pri/section3/pri339.htm for more info.
• Central composite designs (CCD)
- See http://www.itl.nist.gov/div898/handbook/pri/section3/pri3361.htm for more info.
• Box Behnken designs
- See http://www.itl.nist.gov/div898/handbook/pri/section3/pri3362.htm for more info.
• D-optimal designs
- See http://www.itl.nist.gov/div898/handbook/pri/section5/pri521.htm for more info.

2.5 Robustness testing [read more in Ref 8]
In robustness testing of, for instance, an analytical method, the aim is to explore how sensitive the responses are to small changes in the factor settings. Ideally, a robustness test should show that the responses are not sensitive to small fluctuations in the factors, that is, the results are the same for all experiments. Robustness testing is usually applied as the last test just before the release of a product or a method. When performing a robustness test of a method, the objective is
• to ascertain that the method is robust to small fluctuations in the factor levels,
• and, if non-robustness is detected…
• to understand how to alter the bounds of the factors so that robustness may still be claimed.

Robustness is achieved when the designer understands these potential sources of variation and takes steps to desensitize the product to them. G. Taguchi, a Japanese engineer, had a big effect on quality control and experimental design in the 1980s and 1990s. The Taguchi Methods (see http://www.stat.rutgers.edu/~buyske/591/lect10.pdf for more info.) are a well-known strategy in robustness testing. Usually a fractional factorial (see section 2.2.3 in this editorial) or Plackett-Burman design is used.

3. Analysing the resulting experimental data
After the planning stage, when the set of experiments has been laid out according to a statistical design, the planned experiments are carried out, either in parallel or one after another. Each experiment gives results, i.e. values of the response variables. Thereafter, these data are analysed by means of multiple regression, or generalisations thereof such as the PLS and O-PLS methods (see earlier Editorials 2002). This gives a model relating the factors to the results, showing which factors are important and how they combine in influencing the results. The model is then used to make predictions, e.g. how to set the factors to achieve desired (optimal) results. The fitted model is reviewed by…
• Examining the coefficients and their 95% confidence intervals, or normal probability plots of effects and interactions.
• Examining the ANOVA table, checking for curvature
• Plotting residuals, normal probability plot of residuals, and run order residuals
• Checking for the optimal transformation of the response through the use of the Box Cox plot.
• Conclusions:
1. Select dominating factors
2. Check and modify ranges
3. Look for curvature

3.1 Multiple Linear Regression (MLR)
Traditionally, the most frequently used method for finding the regression coefficients b is the ordinary least squares method, where:
$y = Xb + f$
$b = (X^{T}X)^{-1}X^{T}y$
This minimizes the residual sum of squares ($f^{T}f$), which is equivalent to maximizing the fit to y. In order to estimate b, MLR requires that the X-variables are linearly independent ($X^{T}X$ of full rank). It is also important to note that MLR fits one response at a time and hence assumes the responses to be independent.

3.2 Partial Least Squares Projections to Latent Structures (PLS)
PLS is one of the most common methods for analyzing multivariate data where a quantitative relationship between a descriptor matrix X and a response matrix Y is sought. The PLS model can be expressed by:

Model of X: $X = TP^{T} + E$
Model of Y: $Y = TC^{T} + F$

PLS contains MLR as a special case: when X has full column rank and all possible components are extracted, the PLS regression coefficients and the MLR coefficients are identical.

3.3 Orthogonal projections to latent structures (O-PLS)
The recent O-PLS methods [O-PLS and O2-PLS] (see Editorial from April 2002 on O-PLS) are improved modifications of the NIPALS PLS algorithm. The development of O-PLS has, like the orthogonal signal correction (OSC) filters (March 2002 Editorial), been driven by the large amount of non-correlated variation present in the data sets today, especially in a multivariate calibration situation. The interpretational ability of the other inverse regression models (PLS, PCR, MLR) largely depends on the degree of systematic orthogonal variation in X with regard to Y.

The basic idea of O2-PLS is to divide the systematic part of X and Y into two parts, one which is related to both X and Y, and one that is not. For each matrix, the latter is computed in a way that makes it orthogonal to the other matrix, i.e. completely independent. If X or Y contains strong but irrelevant variation, O2-PLS improves the interpretational ability of the parameters in the model, e.g. score plots and loading plots, compared to the MLR and PLS methods (and other methods with similar properties such as ridge regression).
Thus the O2-PLS model can be written as a factor analysis model, where some factors (T) are common to both X and Y:

X model: $X = TW^{T} + T_{Y\text{-ortho}}P_{Y\text{-ortho}}^{T} + E$
Y model: $Y = UC^{T} + U_{X\text{-ortho}}P_{X\text{-ortho}}^{T} + F$
Prediction of Y: $\hat{Y} = TC^{T}$

3.4 ANOVA - Analysis of Variance
ANOVA breaks up sums of squares into components and compares their sizes with an F-test.

SS_Y = SS_Regression + SS_Residual
SS_Residual = SS_lack_of_fit + SS_pure_error
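The least squares fit of section 3.1 and the ANOVA decomposition of section 3.4 can be tied together in a short sketch. The data below are hypothetical (a 2x2 factorial with three replicated centre points), the function names are illustrative, and the sums of squares are mean-corrected, which assumes that the model contains a constant term. The split of the residual into lack of fit and pure error is only possible because some settings are replicated.

```python
import numpy as np

def fit_mlr(X, y):
    """Ordinary least squares, b = (X^T X)^-1 X^T y, solved without an explicit inverse."""
    return np.linalg.solve(X.T @ X, X.T @ y)

def anova_decomposition(X, y, settings):
    """Split the mean-corrected sum of squares of y into ANOVA components."""
    b = fit_mlr(X, y)
    y_hat = X @ b
    ss_y = np.sum((y - y.mean()) ** 2)
    ss_regression = np.sum((y_hat - y.mean()) ** 2)
    ss_residual = np.sum((y - y_hat) ** 2)

    # Pure error: scatter of replicates around their own group mean.
    ss_pure_error = 0.0
    for s in np.unique(settings, axis=0):
        group = y[np.all(settings == s, axis=1)]
        ss_pure_error += np.sum((group - group.mean()) ** 2)
    ss_lack_of_fit = ss_residual - ss_pure_error

    n, p = X.shape
    f_ratio = (ss_regression / (p - 1)) / (ss_residual / (n - p))
    return {
        "SS_Y": ss_y,
        "SS_Regression": ss_regression,
        "SS_Residual": ss_residual,
        "SS_lack_of_fit": ss_lack_of_fit,
        "SS_pure_error": ss_pure_error,
        "F": f_ratio,
    }

# Hypothetical data: four corner runs plus three centre-point replicates.
settings = np.array([[-1, -1], [1, -1], [-1, 1], [1, 1],
                     [0, 0], [0, 0], [0, 0]], dtype=float)
y = np.array([5.1, 8.9, 6.2, 10.1, 7.4, 7.6, 7.5])
X = np.column_stack([np.ones(len(y)), settings])  # constant + two linear terms

table = anova_decomposition(X, y, settings)
for name, value in table.items():
    print(f"{name:>15s} = {value:.4f}")
```

The F ratio printed here compares regression to residual variance; turning it into a p-value requires the F distribution with (p - 1, n - p) degrees of freedom, which is left out of the sketch.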
Drug industry
1. Pharmaceutics, formulation for
drug release, hardness of pills,…
2. Organic chemistry, synthesis,
drug design, …
3. Analytical chemistry, Separation
[HPLC, …], resolution, speed.
4. Pharmacology
5. Process optimization and control, synthesis, fermentation, separations, …
Process industry
1. Process optimization and control (yield, purity, throughput time, pollution, energy consumption)
2. Product quality and performance
(material strength, warp, color,
taste, odour)
3. Product stability versus process
variation
Appendix
Model Parameters
Goodness of fit statistics, information
about model adequacy
PRESS, Predicted Residual Sum of Squares: sum of squared differences between predicted and observed y-values (over all cross-validation rounds)
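As a rough illustration of the PRESS definition, the sketch below computes it for an ordinary least squares model. It assumes leave-one-out rounds (each observation is left out once and predicted from the rest); the source does not say how the rounds are formed, and the data and the function name `press_loo` are hypothetical.

```python
import numpy as np

def press_loo(X, y):
    """PRESS with leave-one-out rounds: refit without run i, then predict run i."""
    n = len(y)
    press = 0.0
    for i in range(n):
        keep = np.arange(n) != i
        b, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        press += (y[i] - X[i] @ b) ** 2
    return press

# Hypothetical data: one factor at six settings, model with constant + linear term.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.1, 2.9, 5.2, 7.1, 8.8, 11.2])
X = np.column_stack([np.ones_like(x), x])

press = press_loo(X, y)
b_all, *_ = np.linalg.lstsq(X, y, rcond=None)
rss = np.sum((y - X @ b_all) ** 2)
print(f"Residual sum of squares (fit):  {rss:.4f}")
print(f"PRESS (leave-one-out rounds):   {press:.4f}")
```

PRESS is never smaller than the residual sum of squares of the fitted model, because each prediction is made by a model that did not see the observation being predicted. A PRESS much larger than the residual sum of squares suggests the model fits the data better than it predicts.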
May 2002: Wavelets in Chemometrics - compression, denoising and feature extraction. Johan Trygg
April 2002: How to create an OSC filter for PLS and end up with a new generic modelling method, O-PLS. Johan Trygg