12/7/2010
Summary
This document describes the construction and analysis of designs that include both quantitative
and categorical factors. The DOE Wizard facilitates the construction and analysis of such designs
by:
Example
As an example, an experiment involving 3 factors will be considered, similar to that described by
Box, Hunter and Hunter (2005). An investigation was conducted in a pilot plant to study the
effect of three factors:
Y: yield (%)
Had there been only 2 levels of X3, that factor could have been handled as a quantitative factor
via a single indicator variable taking the value -1 for one type of catalyst and +1 for the other.
However, the 3 levels of catalyst make it more appropriate to handle that factor as a true
categorical variable.
2009 by StatPoint Technologies, Inc. DOE Wizard – Quantitative and Categorical Factors - 1
STATGRAPHICS – Rev. 12/7/2010
Design Creation
To begin the design creation process, start with an empty StatFolio. Select DOE – Experimental
Design Wizard to load the DOE Wizard’s main window. Then push each button in sequence to
create the design.
The first step of the design creation process displays a dialog box used to specify the response
variables. For the current example, there is a single response variable:
Impact: The relative importance of each response (not relevant if only one response).
Sensitivity: The importance of being close to the best desired value (in this case, the
Maximum). Setting Sensitivity to Medium implies that the desirability attributed to the
response rises linearly between the Minimum and Maximum values indicated.
The second step displays a dialog box used to specify the factors that will be varied. In the
current example, there are 3 factors:
Type – The first 2 factors are Continuous, while the third factor is Categorical.
Levels – a list of the levels at which the categorical factor will be run, separated by commas.
Since all of the factors are controllable process factors, only one Options button is enabled.
Pressing that button displays a second dialog box:
Levels – the number of levels at which each factor should be run. This can only be
changed for the continuous factors.
Replicate design - if a number other than 0 is entered, the entire design will be repeated
the indicated number of times.
Randomize - check this box to randomly order the runs in the experiment.
Randomization is generally a good idea, since it can reduce the effect of lurking variables
such as trends over time. However, when replicating the examples in this documentation,
do not randomize the designs.
For the current experiment, a 3x3x3 factorial design with 27 runs has been selected. This design
leaves 15 degrees of freedom available for estimating the experimental error.
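The run count and error degrees of freedom quoted above can be checked with a short sketch. It assumes the three levels of each quantitative factor are the ones that appear in the design worksheet (160, 170, 180 degrees C and 20, 30, 40 %) and uses the 12 coefficients that the wizard reports for the default model; the variable names are illustrative only.

```python
from itertools import product

# Factor levels as they appear in the design worksheet of this document.
temperature = [160.0, 170.0, 180.0]   # degrees C
concentration = [20.0, 30.0, 40.0]    # %
catalyst = ["A", "B", "C"]

# Every combination of the three factors: 3 x 3 x 3 candidate runs.
candidate_runs = list(product(temperature, concentration, catalyst))

n_runs = len(candidate_runs)
n_coefficients = 12                   # size of the default model, per the wizard
error_df = n_runs - n_coefficients    # runs left over for estimating error

print(n_runs, error_df)
```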
The tentatively selected design is displayed in the Select Design dialog box:
If the design is acceptable, press OK to save it to the STATGRAPHICS DataBook and return to
the DOE Wizard’s main window, which should now contain a summary of the design:
Before evaluating the properties of the design, a tentative model must be specified. Pressing the
fourth button on the DOE Wizard’s toolbar displays a dialog box to make that choice:
The default model includes main effects for each of the 3 experimental factors, interactions
between each pair of factors, and quadratic terms for the continuous factors. Selected terms
could be excluded by double-clicking on them with the left mouse button.
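As a rough illustration of what the default model contains, the sketch below builds one row of the model matrix for a single run. The helper names (`code`, `catalyst_indicators`, `model_row`) are hypothetical, and the coding of the categorical factor (two indicator columns, with level A set to -1 in both) is one reading of the coding described in this document, not a confirmed description of the program's internals.

```python
TERMS = ["const", "A", "B", "C1", "C2", "AA", "AB", "AC1", "AC2", "BB", "BC1", "BC2"]

def code(value, low, high):
    """Rescale a quantitative level so that low -> -1 and high -> +1."""
    return (2.0 * value - (low + high)) / (high - low)

def catalyst_indicators(level):
    """Two indicator columns for the 3-level factor (assumed coding)."""
    return {"A": (-1.0, -1.0), "B": (1.0, 0.0), "C": (0.0, 1.0)}[level]

def model_row(temp, conc, cat):
    """Constant, main effects, quadratic terms, and two-factor interactions."""
    a = code(temp, 160.0, 180.0)
    b = code(conc, 20.0, 40.0)
    c1, c2 = catalyst_indicators(cat)
    return [1.0, a, b, c1, c2, a * a, a * b, a * c1, a * c2, b * b, b * c1, b * c2]

print(len(TERMS), model_row(180.0, 20.0, "C"))
```

Counting the entries of `TERMS` gives the 12 coefficients of the default model: one constant, four main-effect columns, two quadratic terms, and five interaction columns.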
The basic design that was constructed has a total of 27 runs, leaving 15 degrees of freedom to
estimate the experimental error. If each run were very expensive, a smaller design might be
desired. To reduce the number of runs, press the button labeled Step 5: Select runs to display the
following dialog box:
In the bottom left is a field where the desired number of runs should be specified. As indicated,
the default model has 12 coefficients. It is usually a good idea to select at least 3 more runs than
there are coefficients in the selected model.
To select the runs, press either of the 2 buttons on the dialog box. Since the number of ways of
choosing subsets of the candidate runs is too large to check all possibilities, STATGRAPHICS
(like other programs) uses a selection algorithm to choose a subset. The Forward method begins
with the runs that have already been performed (if any) and adds runs one at a time, adding at
each step the run that adds the most to the D-efficiency of the experiment. The Backward method
begins with all of the candidate runs and removes runs one at a time, removing at each step the
run that adds the least to the D-efficiency of the experiment. In either case, once the desired
number of runs has been selected, an exchange algorithm can be performed. This algorithm tests
all pairs of runs consisting of one that has been selected and one that has not, making any
exchanges that would increase the efficiency of the experiment. Exchanges continue until no
further improvements can be made by switching one run that’s been selected with one run that
has not been selected.
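The backward method followed by the exchange step can be sketched as below. This is a minimal illustration, not the STATGRAPHICS implementation: it assumes the 12-term model with an assumed coding of the factors, and it ranks subsets by the determinant of X'X, which orders designs of the same size in the same way as D-efficiency. The function names are hypothetical, and the subset it finds need not match the runs selected by the program.

```python
from itertools import product
import numpy as np

def model_row(t, c, cat):
    """One row of the 12-column model matrix (coding is an assumption)."""
    a = (t - 170.0) / 10.0            # temperature coded to -1, 0, +1
    b = (c - 30.0) / 10.0             # concentration coded to -1, 0, +1
    c1, c2 = {"A": (-1.0, -1.0), "B": (1.0, 0.0), "C": (0.0, 1.0)}[cat]
    return [1.0, a, b, c1, c2, a * a, a * b, a * c1, a * c2, b * b, b * c1, b * c2]

candidates = list(product([160.0, 170.0, 180.0], [20.0, 30.0, 40.0], "ABC"))
X = np.array([model_row(*run) for run in candidates])

def log_det(rows):
    """log|X'X| for the chosen rows; -inf if the model cannot be estimated."""
    sign, value = np.linalg.slogdet(X[rows].T @ X[rows])
    return value if sign > 0 else -np.inf

def backward(n_wanted):
    """Start from all candidates and drop the least useful run each step."""
    chosen = list(range(len(X)))
    while len(chosen) > n_wanted:
        drop = max(chosen, key=lambda r: log_det([i for i in chosen if i != r]))
        chosen.remove(drop)
    return chosen

def exchange(chosen):
    """Swap a selected run for an unselected one while the determinant improves."""
    chosen = list(chosen)
    improved = True
    while improved:
        improved = False
        for pos in range(len(chosen)):
            for r in range(len(X)):
                if r in chosen:
                    continue
                trial = chosen[:pos] + [r] + chosen[pos + 1:]
                if log_det(trial) > log_det(chosen) + 1e-9:
                    chosen, improved = trial, True
    return sorted(chosen)

start = backward(17)
selected = exchange(start)
for index in selected:
    print(candidates[index])
```

A forward variant would start from an empty (or previously run) set and add the run that increases the determinant the most at each step.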
For the example, the program was asked to find 17 runs using backward selection with the
exchange algorithm. When the algorithm is complete, the selected rows will be highlighted in
red:
The efficiencies of the selected design will also be displayed. You can try another algorithm or
press OK to accept the selection, at which point the rows of the datasheet will be reduced to the
selected runs:
If the selection is acceptable, press Step 7: Save experiment to save the reduced number of runs.
You can also use the Design Plot to display the final design:
[Figure: Design Plot of the selected runs, with axes temperature, concentration, and catalyst]
For each catalyst, runs are performed at the 4 combinations of low and high temperature and low
and high concentration. A run is also performed with catalyst B at a middle level of the
quantitative factors. For catalyst C, 4 star points are added to estimate the quadratic effects of
temperature and concentration.
Design Properties
Several of the selections presented when pressing button #6 are helpful in evaluating the selected
design:
Design Worksheet
The design worksheet shows the 17 runs that have been selected, in the order they are to be run:
Worksheet for <untitled> - Design with both quantitative and categorical factors
run  temperature  concentration  catalyst  yield
     degrees C    %              type      %
1    160.0        20.0           A
2    160.0        20.0           B
3    160.0        20.0           C
4    160.0        30.0           C
5    160.0        40.0           A
6    160.0        40.0           B
7    160.0        40.0           C
8    170.0        20.0           C
9    170.0        30.0           B
10   170.0        40.0           C
11   180.0        20.0           A
12   180.0        20.0           B
13   180.0        20.0           C
14   180.0        30.0           C
15   180.0        40.0           A
16   180.0        40.0           B
17   180.0        40.0           C
ANOVA Table
The ANOVA table shows the breakdown of the degrees of freedom in the design:
ANOVA Table
Source D.F.
Model 11
Total Error 5
Lack-of-fit 5
Pure error 0
Total (corr.) 16
The StatAdvisor
The ANOVA table shows the degrees of freedom that will be available for estimating experimental error. Two estimates are
commonly used: total error, which includes degrees of freedom that could have been used to estimate effects that are not in the
current model, and pure error which comes only from replicated runs. In this case, the total error has 5 degrees of freedom,
while there are 0 degrees of freedom for pure error. In general, it's a good idea to have at least three or four error degrees of
freedom available when testing the statistical significance of estimated effects. Otherwise, the statistical tests will have very
little power.
11 of the 16 total degrees of freedom are used to estimate the main effects, quadratic effects, and
two-factor interactions.
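The degrees of freedom in the ANOVA table follow directly from the worksheet. The sketch below is a simple bookkeeping check under the assumption that the model has 12 coefficients including the constant; pure error is counted from repeated factor combinations, of which this design has none.

```python
from collections import Counter

# (temperature, concentration, catalyst) for the 17 runs in the worksheet.
runs = [
    (160, 20, "A"), (160, 20, "B"), (160, 20, "C"), (160, 30, "C"),
    (160, 40, "A"), (160, 40, "B"), (160, 40, "C"), (170, 20, "C"),
    (170, 30, "B"), (170, 40, "C"), (180, 20, "A"), (180, 20, "B"),
    (180, 20, "C"), (180, 30, "C"), (180, 40, "A"), (180, 40, "B"),
    (180, 40, "C"),
]

n = len(runs)
p = 12                                  # coefficients, including the constant
total_df = n - 1                        # corrected for the mean
model_df = p - 1
total_error_df = total_df - model_df
# Pure error comes only from repeated factor combinations.
pure_error_df = sum(count - 1 for count in Counter(runs).values())
lack_of_fit_df = total_error_df - pure_error_df

print(model_df, total_error_df, lack_of_fit_df, pure_error_df, total_df)
```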
Model Coefficients
The coefficients displayed are based on a standardized model in which the quantitative factors
are coded as -1 when at their low level and +1 when at their high level. For categorical factors at
k levels, k - 1 indicator variables are created according to:
Xk-1 = -1 for level 1, 1 for level k, and 0 for all other levels
The variance inflation factors (VIF) indicate the extent to which the variance of each estimate is
inflated due to the non-orthogonality of the selected runs. In this case, the inflation is minor.
Once the experiment has been created and any additional runs entered, it must be saved on disk.
Press the button labeled Step 7 and select a name for the experiment file:
Design files are extended data files and have the extension .sgx. They include the data together
with other information that was entered on the input dialog boxes.
To reopen an experiment file, select Open Data File from the File menu. The data will be loaded
into the datasheet, and the Experimental Design Wizard window will be displayed.
Important Notes:
1. If more than one sample was taken at each set of experimental conditions, the data values
should be entered into data tables B through Z. The summary statistics in data table A
will then be automatically calculated from the other tables. Do not treat the samples as
replicates unless you actually reset the process between each sample.
2. If any experiments were not performed, leave the corresponding cell blank. The program
will recognize the imbalance in the design and handle it.
3. If any experimental runs were done at conditions different than originally planned,
change the entries in the experimental factor columns to correspond to the values that
were actually used.
4. If additional runs were performed, you may add them to the bottom of the datasheet.
They will be included in the fit.
Step #8: Analyze data
Once the data have been entered, press the button labeled Step #8 on the Experiment Design
Wizard toolbar. This will display a dialog box listing each of the response variables:
If more than one response has been measured, you should repeat this step once for each response.
Analysis Summary
When a response is analyzed, a new window is created providing numerous tables and graphs
summarizing the results. The pane in the upper left corner of the window displays an Analysis
Summary:
Number of runs: 17
Analysis of Effects
Categorical factors:
C=catalyst (type)
Quantitative factors:
A=temperature (degrees C)
B=concentration (%)
Source  Sum of Squares  Df  Mean Square  F-Ratio   P-Value
A       1807.0          1   1807.0       574.463   0.0000
B       80.4834         1   80.4834      25.5864   0.0039
C       10.1939         2   5.09694      1.62036   0.2868
AA      0.803512        1   0.803512     0.255444  0.6348
AB      0.1875          1   0.1875       0.059608  0.8168
AC      217.361         2   108.681      34.5506   0.0012
BB      6.18395         1   6.18395      1.96593   0.2198
BC      3.28464         2   1.64232      0.522109  0.6224
Analysis of Variance: a decomposition of the sum of squares for the response variable into
components for the model and for the residuals. The F-test tests the statistical significance of
the model as a whole. A small P-value (less than 0.05 if operating at the 5% significance
level) indicates that at least one factor in the model is significantly related to the dependent
variable. In the current example, the model is highly significant.
o R-squared - represents the percentage of the variability in the response variable which
has been explained by the fitted regression model, ranging from 0% to 100%.
o Adjusted R-Squared – the R-squared statistic, adjusted for the number of coefficients
in the model. This value is often used to compare models with different numbers of
coefficients.
o Standard Error of Est. – the estimated standard deviation of the residuals (the
deviations around the model). This value is used to create prediction limits for new
observations.
o Mean Absolute Error – the average absolute value of the residuals.
o Durbin-Watson Statistic – a measure of serial correlation in the residuals. If the
residuals vary randomly, this value should be close to 2. A small P-value indicates a
non-random pattern in the residuals. For data recorded over time, a small P-value
could indicate that some trend over time has not been accounted for. In the current
example, the P-value is less than 0.05, so there may be some serial correlation in
the residuals.
o Lag 1 residual autocorrelation: a measure of the serial correlation between
consecutive residuals on a scale ranging from -1 to +1.
Analysis of Effects: decomposition of the model sum of squares into components for each
term in the fitted statistical model, including main effects (such as A), two-factor interactions
(such as AB), and quadratic effects (such as AA). Based on the settings specified on the
Analysis Options dialog box, either Type III or Type I sums of squares are displayed. The
sums of squares test the marginal significance of each factor, assuming it was the last to be
entered into the model. Small P-values indicate significant effects. In this example, three
effects have P-values less than 0.05 and are thus statistically significant at the 5% level.
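The two serial-correlation measures in the list above can be computed from the residuals taken in run order. The sketch below uses the textbook definitions, which are assumed (not confirmed) to match what the program reports; the residual values are made up purely for illustration.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences divided by the sum of squares."""
    numerator = sum((residuals[t] - residuals[t - 1]) ** 2
                    for t in range(1, len(residuals)))
    denominator = sum(e * e for e in residuals)
    return numerator / denominator

def lag1_autocorrelation(residuals):
    """Correlation between consecutive residuals (residual mean taken as 0)."""
    numerator = sum(residuals[t] * residuals[t - 1]
                    for t in range(1, len(residuals)))
    denominator = sum(e * e for e in residuals)
    return numerator / denominator

# Hypothetical residuals that alternate in sign from run to run.
example = [1.0, -1.0, 1.0, -1.0]
print(durbin_watson(example), lag1_autocorrelation(example))
```

For long series the two quantities are linked approximately by DW = 2(1 - r1), which is why values near 2 correspond to little serial correlation.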
In many cases, it may be desirable to remove insignificant effects from the model. This is done
by selecting Analysis Options, which displays the dialog box shown below:
Double-click on an effect to move it from the Include field to the Exclude field or vice versa.
Then press OK to refit the model. For the sample data, removing the 4 insignificant effects
(AA, AB, BB, and BC) yields a simpler model in which every remaining effect except the main
effect of C is statistically significant at the 5% level:
Analysis of Effects
Categorical factors:
C=catalyst (type)
Quantitative factors:
A=temperature (degrees C)
B=concentration (%)
Source  Sum of Squares  Df  Mean Square  F-Ratio  P-Value
A       1807.0          1   1807.0       702.657  0.0000
B       81.6029         1   81.6029      31.7314  0.0002
C       8.78007         2   4.39004      1.70707  0.2302
AC      217.361         2   108.681      42.2607  0.0000
Included in the summary are the P-value for the fitted model and the R-squared statistics.
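Because each F-ratio is the effect's mean square divided by the mean squared error, every row of the reduced-model table should imply the same error mean square. The sketch below is only a consistency check on the printed values; the error mean square itself is not shown in the table, so the figure it recovers (about 2.57) is an inference.

```python
# (source, sum of squares, df, F-ratio) copied from the reduced-model table.
rows = [
    ("A", 1807.0, 1, 702.657),
    ("B", 81.6029, 1, 31.7314),
    ("C", 8.78007, 2, 1.70707),
    ("AC", 217.361, 2, 42.2607),
]

implied_mse = {}
for source, sum_of_squares, df, f_ratio in rows:
    mean_square = sum_of_squares / df          # Mean Square column
    implied_mse[source] = mean_square / f_ratio  # F = MS / MSE, so MSE = MS / F

for source, mse in implied_mse.items():
    print(f"{source:3s} implied MSE = {mse:.4f}")
```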
Pareto Chart
The analysis window summarizes the contribution of each effect to the overall variability in the
response using a Pareto chart:
[Figure: Pareto chart of the contribution to variation (%) of each effect, with bars marked as significant at 5% or not significant]
The length of each bar equals the contribution of an effect to the overall variation in the
response, where an effect’s contribution is calculated by dividing its sum of squares by the total
corrected sum of squares from the ANOVA table. The color of each bar indicates whether an
effect is statistically significant at the indicated significance level.
Means Table
The analysis window also displays the estimated mean response at different locations within the
design space:
The line labeled Grand mean shows the estimated mean at the center of the design space, where:
1. Each quantitative factor is set halfway between its low and high levels.
2. For each categorical factor, the results are averaged over all levels of that factor.
Estimated means are also shown for each level of the experimental factors and for combinations
of pairs of factors.
In addition to the means, the table also shows the estimated standard error for each mean and
interval estimates. The Pane Options dialog box selects the type of limits displayed:
1. Confidence limits: a separate confidence interval is displayed for each mean, calculated
using Student’s t distribution according to
$\hat{Y} \pm t_{\alpha/2,\,n-p}\; s_{\hat{Y}}$ (1)
where $\hat{Y}$ is the predicted mean response, $s_{\hat{Y}}$ is the estimated standard error, n is the
number of observed runs, and p is the number of coefficients in the fitted model.
2. Bonferroni limits (by effect): limits are calculated for each variable and combination of
two variables, calculated using Student’s t distribution according to
$\hat{Y} \pm t_{\alpha/(2g),\,n-p}\; s_{\hat{Y}}$ (2)
where g is the number of levels for the variable or combination of variables. Bonferroni
limits provide at least the stated confidence for all means in the set.
3. Working-Hotelling limits: limits are calculated for all means simultaneously using
Snedecor’s F distribution according to
$\hat{Y} \pm \sqrt{p\,F_{1-\alpha,\,p,\,n-p}}\; s_{\hat{Y}}$ (3)
Working-Hotelling limits provide at least the stated confidence for all of the means.
[Figure: Main Effects Plot of yield versus temperature, concentration, and catalyst]
For continuous factors such as temperature, the estimated mean is displayed everywhere between
the low and high levels. For categorical factors such as catalyst, point symbols indicate the
discrete locations at which the means are calculated.
If the plot is too crowded, Pane Options may be used to select a subset of the factors:
Interaction Plot
The means displayed in the Means Table for combinations of factors can be displayed
graphically using the Interaction Plot:
[Figure: Interaction Plot of yield versus temperature, with one line for each catalyst (A, B, and C)]
As with the Main Effects Plot, the factors displayed may be selected using Pane Options:
If Reverse Factors is checked, the second factor rather than the first will be plotted along the
horizontal axis.
$Y_i = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_{p-1} X_{p-1}$ (4)
where the β’s are coefficients that need to be estimated from the data and the X’s are calculated
from the levels of the experimental factors. The table of Standardized Regression Coefficients
displays the estimated β’s when the X’s are defined as follows:
With this scaling, X = -1 at the low level, 0 at the center level, and +1 at the high level.
Xj = 0 otherwise
With this scaling, the β’s contrast each additional level of the factor with the first level.
It provides the estimated coefficients, standard errors, and variance inflation factors. The
variance inflation factors measure how large the variance of the coefficients is compared to what
it would be if the independent variables were uncorrelated. Values greater than 10.0 usually
indicate serious multicollinearity amongst the predictor variables, which leads to imprecise
estimates of the model coefficients.
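Variance inflation factors for the 17 selected runs can be approximated with the sketch below. It computes each VIF as the corresponding diagonal element of the inverse of the correlation matrix of the model columns. The coding of the factors, in particular the two catalyst indicator columns, is an assumption, so the printed values may differ from those shown by the program.

```python
import numpy as np

# (temperature, concentration, catalyst) for the 17 runs in the worksheet.
runs = [
    (160, 20, "A"), (160, 20, "B"), (160, 20, "C"), (160, 30, "C"),
    (160, 40, "A"), (160, 40, "B"), (160, 40, "C"), (170, 20, "C"),
    (170, 30, "B"), (170, 40, "C"), (180, 20, "A"), (180, 20, "B"),
    (180, 20, "C"), (180, 30, "C"), (180, 40, "A"), (180, 40, "B"),
    (180, 40, "C"),
]
names = ["A", "B", "C1", "C2", "AA", "AB", "AC1", "AC2", "BB", "BC1", "BC2"]

def model_terms(t, c, cat):
    """Model columns without the constant (coding is an assumption)."""
    a = (t - 170.0) / 10.0
    b = (c - 30.0) / 10.0
    c1, c2 = {"A": (-1.0, -1.0), "B": (1.0, 0.0), "C": (0.0, 1.0)}[cat]
    return [a, b, c1, c2, a * a, a * b, a * c1, a * c2, b * b, b * c1, b * c2]

X = np.array([model_terms(*run) for run in runs])

# VIF_j = 1 / (1 - R_j^2), which equals the j-th diagonal element of the
# inverse of the correlation matrix of the columns.
correlation = np.corrcoef(X, rowvar=False)
vif = np.diag(np.linalg.inv(correlation))

for name, value in zip(names, vif):
    print(f"{name:4s} {value:6.2f}")
```

A VIF of exactly 1 means the column is uncorrelated with all the others, as it would be in a fully orthogonal design.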
The Predictions table described later uses this model to predict values of yield at different
combinations of the experimental factors.
Trace Plot
An interesting way of visualizing the effects of each factor in the fitted regression models is
through a Trace Plot. Starting at a selected reference point within the experiment region, each
continuous factor is moved above and below that point, holding each of the other factors
constant. The estimated response is then displayed, as in the plot below:
[Figure: Trace Plot of yield versus factor range (-1 to 1), with one line each for temperature and concentration]
In the trace plot, the horizontal axis is defined as
$x = \dfrac{2\,(X_i - \mathrm{ref}_i)}{\mathrm{high}_i - \mathrm{low}_i}$ (7)
where $X_i$ is the value of factor i, $\mathrm{ref}_i$ is the reference point for factor i, $\mathrm{high}_i$ is the high level for
factor i defined when the experiment was created, and $\mathrm{low}_i$ is the low level. The default reference
point corresponds to the center of the design space for the continuous factors and the first level
of the categorical factors.
By comparing the changes in yield over the range of each factor, the most important factors can
be determined. If a line is relatively flat for a particular factor, the response is insensitive to
changes in that factor in the vicinity of the reference point.
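The horizontal coordinate of the trace plot is a simple rescaling of each factor, as the sketch below illustrates for temperature with the default reference point at the center of its range. The function name `trace_coordinate` is hypothetical.

```python
def trace_coordinate(value, reference, low, high):
    """Horizontal position of a factor setting on the trace plot."""
    return 2.0 * (value - reference) / (high - low)

# Temperature runs from 160 to 180, with the reference point at 170.
for temperature in (160.0, 165.0, 170.0, 180.0):
    print(temperature, trace_coordinate(temperature, 170.0, 160.0, 180.0))
```

With the reference at the center, the low and high levels map to -1 and +1, which matches the range of the horizontal axis in the plot.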
[Figure: surface plot of the predicted response versus temperature and concentration]
The Pane Options dialog box is used to select the type of plot to be displayed:
Type: type of response plot to create. The predicted response may be plotted as a surface, a two-
dimensional contour plot, a three-dimensional contour plot, or a three-dimensional mesh plot.
o From: location at which the first contour line is drawn, or the start of the first region.
o To: location at which the last contour line is drawn, or the end of the last region.
o Lines: if selected, a sequence of contour lines is drawn at selected levels of the predicted
response, as on a topographical map.
o Painted Regions: if selected, a set of regions is drawn covering various ranges of the
predicted response.
o Continuous with Grid: draws contours using a continuous range of colors and adds a
grid.
Resolution: defines the resolution m of an m-by-m grid of predicted values which is used to
draw the surface and contour lines. Increasing the resolution may improve the smoothness
and definition of the plots, at the expense of computer time and memory.
o Horizontal Divisions: the number of divisions along the first experimental axis. This
determines how many vertical lines will be drawn on the surface plot.
o Vertical Divisions: the number of divisions along the second experimental axis. This
determines how many horizontal lines will be drawn on the surface plot.
o Contours Below: requests that a contour plot, of type specified below, be drawn in the
bottom face of the 3-D plot.
o Wire Frame: requests that the surface be drawn using cross-hatched lines as shown in
the figure above. This is the most effective choice for black-and-white presentation.
o Contoured: requests that the surface be drawn showing contour levels of the response.
Factors button: displays a dialog box to select the factors to be plotted on each axis and the
levels at which the other factors will be held:
The current example plots the predicted response versus temperature and concentration, when
catalyst = A.
The same information shown as a contour plot with continuous contours is displayed below:
[Figure: contour plot of the predicted response versus temperature and concentration, with contour levels from 66.6 to 83.4]
Predictions
The Predictions pane may be used to generate predictions from the fitted model:
The table may include all rows in the datasheet, or only rows for which the value of the response
variable Y has not been entered. The latter feature allows the analyst to make predictions at
combinations of X that were not included in the experiment. For example, the above table shows
the result of adding an 18th row with temperature = 175, concentration = 35, and catalyst = A.
The predicted value of yield is 69.62. The 95% confidence interval for the mean value of yield at
that same combination of the factors ranges from 67.5 to 71.7.
One other noticeable entry in the above table is the Studentized residual for row #7. The
Studentized residual measures the difference between the observed response and the predicted
response, in units of its standard error, when the observation in question is not used to fit the
model. The Studentized residual for observation #7 equals 2.8. Values in excess of 3.0 are
unusual and would typically require further scrutiny.
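The Studentized residual described above can be sketched by literally refitting the model without each observation in turn. This is a minimal illustration using ordinary least squares, not the program's own code; the function name and the small data set are hypothetical.

```python
import numpy as np

def studentized_deleted_residuals(X, y):
    """Leave each run out, refit, and scale its prediction error."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    result = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        X_rest, y_rest = X[keep], y[keep]
        beta, *_ = np.linalg.lstsq(X_rest, y_rest, rcond=None)
        residuals = y_rest - X_rest @ beta
        mse_rest = residuals @ residuals / (n - 1 - p)
        deleted = y[i] - X[i] @ beta            # observed minus predicted
        variance = mse_rest * (1.0 + X[i] @ np.linalg.inv(X_rest.T @ X_rest) @ X[i])
        result[i] = deleted / np.sqrt(variance)
    return result

# Made-up straight-line data, for illustration only.
X_demo = np.column_stack([np.ones(6), np.arange(6.0)])
y_demo = np.array([1.0, 2.1, 2.9, 4.2, 4.8, 7.5])
print(np.round(studentized_deleted_residuals(X_demo, y_demo), 2))
```

In practice the same values can be obtained from a single fit using the diagonal of the hat matrix, which avoids the n separate refits.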
Pane Options
2. Fitted Y - the predicted values Yi calculated from the fitted model.
$d_i = Y_i - \hat{Y}_{(i)}$ (8)
$e_i^* = \dfrac{d_i}{s(d_i)}$ (9)
where
$s^2(d_i) = MSE_{(i)} \left( 1 + X_i' \left( X_{(i)}' X_{(i)} \right)^{-1} X_i \right)$ (10)
The deleted residuals should follow a t distribution with n - p - 1 degrees of freedom,
where p is the number of estimated coefficients in the fitted model.
1. Standard Errors for Forecasts - the standard error for new observations at a selected
combination of the experimental factors Xh, given by
$\sqrt{MSE \left( 1 + X_h' (X'X)^{-1} X_h \right)}$ (11)
2. Confidence Limits for Individual Forecasts - confidence limits for new observations at a
selected combination of the experimental factors Xh, given by
$\hat{Y}_h \pm t_{\alpha/2,\,n-p} \sqrt{MSE \left( 1 + X_h' (X'X)^{-1} X_h \right)}$ (12)
3. Confidence Limits for Forecast Means - confidence limits for the mean response at a
selected combination of the experimental factors Xh, given by
$\hat{Y}_h \pm t_{\alpha/2,\,n-p} \sqrt{MSE \left( X_h' (X'X)^{-1} X_h \right)}$ (13)
Predict - whether forecasts are displayed for all of the runs in the experiment data file, or
only for runs that have a missing value in the response column.
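The difference between the forecast and mean limits listed above comes down to the extra "1 +" inside the square root. The sketch below computes both standard errors for a given model matrix; the function names are hypothetical, the critical value of Student's t is assumed to be supplied by the user (for example from a table), and the small design used in the example is made up.

```python
import numpy as np

def standard_errors(X, mse, x_h):
    """Return (standard error of a new observation, standard error of the mean)."""
    X = np.asarray(X, dtype=float)
    x_h = np.asarray(x_h, dtype=float)
    leverage = x_h @ np.linalg.inv(X.T @ X) @ x_h
    se_forecast = np.sqrt(mse * (1.0 + leverage))
    se_mean = np.sqrt(mse * leverage)
    return se_forecast, se_mean

def limits(y_hat, standard_error, t_critical):
    """Symmetric limits around the predicted value."""
    return y_hat - t_critical * standard_error, y_hat + t_critical * standard_error

# Hypothetical 4-run design with a constant and one coded factor.
X_demo = [[1, -1], [1, 1], [1, -1], [1, 1]]
se_forecast, se_mean = standard_errors(X_demo, mse=4.0, x_h=[1, 0])
print(se_forecast, se_mean)
```

Limits for individual forecasts are always wider than limits for the mean at the same factor settings, since they also include the run-to-run experimental error.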
Unusual Residuals
The Unusual Residuals table displays all rows with Studentized residuals less than -2 or greater
than +2:
While row #7 appears on the list, its Studentized residual is less than 3 in absolute value, so it may not be a serious concern.
Diagnostic Plots
Several plots are also provided under Diagnostic Plots to examine the residuals from the fitted
model. The Pane Options dialog box displays the various choices, which include the following:
[Figure: Plot of yield, observed versus predicted values]
If the model fits well, the values should lie close to the line, as in the example above. Curvature
around the line may suggest the need to transform the values of Yi using a logarithm or similar
function.
[Figure: residuals versus predicted values]
The residuals should vary randomly around the line. Changes in the magnitude of the residuals
from left to right may signal that the variance of the experimental error varies with the mean
level of the response. Such heteroscedasticity may frequently be eliminated by a variance-
stabilizing transformation such as a logarithm or a square root.
[Figure: residuals versus run number]
Any non-random pattern may indicate a time trend or other effect. In such cases, addition of a
factor to account for the change may improve the fit of the model. The above plot does suggest
an increase in variability during the second half of the experiment, which would be worthy of
further investigation.
Residuals versus Factor
This plot displays the residuals ei versus the observed values of a selected experimental factor:
[Figure: residuals versus temperature]
Any curvature around the line may suggest the need for a model with quadratic effects. The
above plot suggests that the variability of the residuals at the middle temperature may be
somewhat less than that of the residuals at the low and high temperatures.
[Figure: normal probability plot of the residuals, with percentage on the horizontal axis]
If the experimental error follows a normal distribution, the points should lie along a straight line.
Pane Options
Plot versus: selects the experimental factor to be shown in the plot, for those plots where a
factor is needed.
Fitted Line: specifies whether a line should be fit to the data on the normal probability plot.
Optimization
Each analysis window gives the ability to optimize an individual response variable. The
Optimization text pane will automatically calculate and display the optimal settings of the
experimental factors:
Optimize Response
Goal: maximize yield
Goal - the type of optimization desired, defined when the experiment was created.
Optimum value - the predicted response at the optimum setting.
Low - the low level of the region over which the optimization is performed.
High - the high level of the region over which the optimization is performed.
In the above example, yield has been maximized with respect to temperature, concentration and
catalyst.
Pane Options
Factor – If the box for a factor is checked, it will be optimized over the indicated range if the
factor is continuous or over all levels if the factor is categorical. Otherwise, it will be
constrained to match the value specified in the Hold field.
Low – lowest level considered for each factor.
Hold – if not being optimized, the level at which each factor is set.
Optimization
Once a statistical model has been developed for each response, the analyst may now determine
what combination of factors will yield the best results for all responses simultaneously.
Returning to the main DOE Wizard window and pressing the button labeled Step #9 begins
a search of the experimental region for the combination of the experimental factors that maximizes
the desirability of the result. To avoid finding a local optimum, a search is performed beginning
at each design point.
When the optimization is complete, a message similar to that shown below will be displayed:
The dialog box indicates the “Desirability” of the final result, based on a metric designed to
balance competing requirements of multiple responses (see the document titled DOE Wizard for
full details). The value displayed in this case indicates that the predicted yield at the optimum
factor settings is 39.55% of the distance between 80 and 100, which was the desired range
specified when the design was created.
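For a single response that is to be maximized with Medium sensitivity, the desirability rises linearly between the specified minimum and maximum. The sketch below reproduces the value quoted above from the predicted yield of 87.91 reported for the optimum; the function name is hypothetical, and the general multi-response metric is described in the DOE Wizard document rather than here.

```python
def desirability_maximize(predicted, minimum, maximum):
    """Linear desirability: 0 at or below the minimum, 1 at or above the maximum."""
    if predicted <= minimum:
        return 0.0
    if predicted >= maximum:
        return 1.0
    return (predicted - minimum) / (maximum - minimum)

# Predicted yield at the optimum, with the desired range of 80 to 100.
d = desirability_maximize(87.91, 80.0, 100.0)
print(round(d, 4))
```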
If you press OK, additional information will be added to the main DOE Wizard window:
The table shows the estimated response at the optimal settings of the experimental factors. For
the sample data, it is estimated that the mean yield will equal 87.91 when the factors are set at
temperature = 180, concentration = 20, and catalyst = C. The 95% confidence interval for the
mean yield ranges between 85.76 and 90.06.
If you push the Tables and Graphs button on the analysis toolbar, you can display the estimated
desirability throughout the experimental region. An interesting type of display is the contoured
surface plot shown below (use Pane Options and the Factors button to select the factors to plot
on each axis):
[Figure: Desirability Plot for catalyst=C. Contoured surface of desirability (0.0 to 1.0) versus
temperature (160 to 180) and concentration (20 to 40).]
It is clear that the best place to operate is in the right front corner of the experimental region.
Step 10: Save results
The button labeled Step 10 allows you to save the results in a StatFolio:
Actually, the StatFolio can be saved at any point and reloaded at a later date.
IMPORTANT: When using the Experimental Design Wizard, two files are created:
1. An experiment file with the extension .sgx which stores information about the
experimental data.
2. A StatFolio with the extension .sgp that stores the results of the analysis.
If you move the experiment to another computer, be sure to transfer both files.
Since the conclusions from the design are fairly clear, there is no need to augment the design.
Extrapolation
The maximum predicted yield within the design space is 87.91%. To use the statistical model to
predict settings of the factors outside the experimental region that might produce even better
results, press the button labeled Step 12. The following dialog box will be displayed:
Change: the factors you wish to consider changing. Only quantitative factors may be
checked.
Display steps of: The program will begin at the starting location and follow the path of
steepest ascent in an attempt to increase the desirability of the predicted response. Specify the
increment of increased desirability at which the results should be displayed.
Low and high: The limits within which the factors will be changed.
In this case, we have asked the program to search from the derived optimal conditions and display
improvements of 1% in desirability.
The results of the search are shown in the following table, which will be added to the main DOE
Wizard window:
Step temperature concentration catalyst
8 181.6 19.7649 C
9 181.8 19.7356 C
10 182.0 19.7062 C
11 182.2 19.6768 C
12 182.4 19.6474 C
13 182.6 19.618 C
14 182.8 19.5886 C
15 183.0 19.5593 C
16 183.2 19.5299 C
17 183.4 19.5005 C
18 183.6 19.4711 C
19 183.8 19.4417 C
20 184.0 19.4123 C
21 184.2 19.383 C
22 184.4 19.3536 C
23 184.6 19.3242 C
24 184.8 19.2948 C
25 185.0 19.2654 C
26 185.2 19.236 C
27 185.4 19.2067 C
28 185.6 19.1773 C
29 185.8 19.1479 C
30 186.0 19.1185 C
31 186.2 19.0891 C
32 186.4 19.0598 C
33 186.6 19.0304 C
34 186.8 19.001 C
35 187.0 18.9716 C
36 187.2 18.9422 C
37 187.3 18.8422 C
The program suggests that the best course of action would be to increase temperature while
decreasing concentration. While the model predicts a yield of 100% at a temperature of 187.3
degrees and a concentration of 18.84%, confirmatory runs would be necessary to determine
whether the model gives reasonable predictions that far outside the experimental region.
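As a rough check on the table above, the sketch below takes a few of its rows and confirms that the path is, to within rounding, a straight line. The choice of rows and the variable names are illustrative, and the linear back-extrapolation to step 0 is an inference from the listed rows rather than output from the program.

```python
# (step, temperature, concentration) copied from rows of the table above.
rows = [
    (8, 181.6, 19.7649),
    (15, 183.0, 19.5593),
    (25, 185.0, 19.2654),
    (36, 187.2, 18.9422),
]

first, last = rows[0], rows[-1]
n_steps = last[0] - first[0]

# Average change per displayed step along the path.
d_temp = (last[1] - first[1]) / n_steps
d_conc = (last[2] - first[2]) / n_steps

# Largest deviation of the intermediate rows from a straight line.
max_dev = max(
    abs(conc - (first[2] + (step - first[0]) * d_conc))
    for step, _, conc in rows
)

# Extend the line back to step 0 (an inference, not program output).
start_temp = first[1] - first[0] * d_temp
start_conc = first[2] - first[0] * d_conc

print(round(d_temp, 4), round(d_conc, 4))
print(round(max_dev, 4))
print(round(start_temp, 2), round(start_conc, 2))
```

Between steps 8 and 36, each displayed step raises temperature by about 0.2 degrees and lowers concentration by about 0.029. Extending that line back to step 0 gives approximately temperature = 180 and concentration = 20, which is consistent with a search that starts from the optimum found within the experimental region.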