DOE Wizard - Quantitative and Categorical Factors

STATGRAPHICS – Rev. 12/7/2010
DOE Wizard – Designs with Both Quantitative and Categorical Factors
Summary
This document describes the construction and analysis of designs that include both quantitative
and categorical factors. The DOE Wizard facilitates the construction and analysis of such designs
by:
1. Creating a multilevel factorial design involving all combinations of selected levels of each factor.
2. Reducing the number of runs if desired using the D-efficiency criterion.
3. Analyzing the results using a general linear model.
Example
As an example, an experiment involving 3 factors will be considered, similar to that described by
Box, Hunter and Hunter (2005). An investigation was conducted in a pilot plant to study the
effect of three factors:
X1: temperature (160 – 180 degrees C)
X2: concentration (20% – 40%)
X3: type of catalyst (A, B and C)
There is one response variable:
Y: yield (%)
Had there been only 2 levels of X3, that factor could have been handled as a quantitative factor
via a single indicator variable taking the value -1 for one type of catalyst and +1 for the other.
However, the 3 levels of catalyst make it more appropriate to handle that factor as a true
categorical variable.
Sample StatFolio: doewiz both.sgp
© 2009 by StatPoint Technologies, Inc.
Design Creation
To begin the design creation process, start with an empty StatFolio. Select DOE – Experimental
Design Wizard to load the DOE Wizard’s main window. Then push each button in sequence to
create the design.
Step #1 – Define Responses
The first step of the design creation process displays a dialog box used to specify the response
variables. For the current example, there is a single response variable:
• Name: The name for the variable is yield.
• Units: Yield is measured as a percentage.
• Analyze: The parameter of interest is the mean percent yield.
• Goal: The goal of the experiment is to maximize the mean.
• Impact: The relative importance of each response (not relevant if only one response).
• Sensitivity: The importance of being close to the best desired value (in this case, the Maximum). Setting Sensitivity to Medium implies that the desirability attributed to the response rises linearly between the Minimum and Maximum values indicated.
• Minimum and Maximum: Range of desirable values for the response.

Step #2 – Define Experimental Factors
The second step displays a dialog box used to specify the factors that will be varied. In the
current example, there are 3 factors:
• Name – Each factor must be assigned a unique name.
• Units – Units are optional.
• Type – The first 2 factors are Continuous, while the third factor is Categorical.
• Role – The role of each factor is Controllable.
• Low - the lower limits for the continuous factors.
• High - the upper limits for the continuous factors.
• Levels – a list of the levels at which the categorical factor will be run, separated by commas.
Step #3 – Select Design
The third step begins by displaying the dialog box shown below:
Since all of the factors are controllable process factors, only one Options button is enabled.
Pressing that button displays a second dialog box:
• Levels – the number of levels at which each factor should be run. This can only be changed for the continuous factors.
• Replicate design - if a number other than 0 is entered, the entire design will be repeated the indicated number of times.
• Randomize - check this box to randomly order the runs in the experiment. Randomization is generally a good idea, since it can reduce the effect of lurking variables such as trends over time. However, when replicating the examples in this documentation, do not randomize the designs.
For the current experiment, a 3x3x3 factorial design with 27 runs has been selected. This design
leaves 15 degrees of freedom available for estimating the experimental error.
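As a quick check on these counts, the short Python sketch below enumerates the 3x3x3 candidate runs and counts the coefficients in the default model described in Step #4. It assumes the k - 1 columns per k-level categorical factor coding described later in this document; the variable names are illustrative only.

```python
from itertools import product

# Factor levels used in the example (3 levels each)
temperature = [160.0, 170.0, 180.0]    # degrees C
concentration = [20.0, 30.0, 40.0]     # %
catalyst = ["A", "B", "C"]

# Full factorial: every combination of the selected levels
runs = list(product(temperature, concentration, catalyst))

# Columns in the default model; a k-level categorical factor needs k - 1 columns
c = len(catalyst) - 1
n_coef = (1                 # constant
          + 1 + 1 + c       # main effects A, B, C
          + 1 + 1           # quadratic effects AA, BB
          + 1 + c + c)      # interactions AB, AC, BC
error_df = len(runs) - n_coef

print(len(runs), n_coef, error_df)   # 27 12 15
```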
The tentatively selected design is displayed in the Select Design dialog box:
If the design is acceptable, press OK to save it to the STATGRAPHICS DataBook and return to
the DOE Wizard’s main window, which should now contain a summary of the design:
Step #4: Specify Model
Before evaluating the properties of the design, a tentative model must be specified. Pressing the
fourth button on the DOE Wizard’s toolbar displays a dialog box to make that choice:
The default model includes main effects for each of the 3 experimental factors, interactions
between each pair of factors, and quadratic terms for the continuous factors. Selected terms
can be excluded by double-clicking on them with the left mouse button.
Step #5: Select Runs
The basic design that was constructed has a total of 27 runs, leaving 15 degrees of freedom to
estimate the experimental error. If each run were very expensive, a smaller design might be
desired. To reduce the number of runs, press the button labeled Step 5: Select runs to display the
following dialog box:
In the bottom left is a field where the desired number of runs should be specified. As indicated,
the default model has 12 coefficients. It is usually a good idea to select at least 3 more runs than
there are coefficients in the selected model.
To select the runs, press either of the 2 buttons on the dialog box. Since the number of ways of
choosing subsets of the candidate runs is too large to check all possibilities, STATGRAPHICS
(like other programs) uses a selection algorithm to choose a subset. The Forward method begins
with the runs that have already been performed (if any) and adds runs one at a time, adding at
each step the run that adds the most to the D-efficiency of the experiment. The Backward method
begins with all of the candidate runs and removes runs one at a time, removing at each step the
run that adds the least to the D-efficiency of the experiment. In either case, once the desired
number of runs has been selected, an exchange algorithm can be performed. This algorithm tests
all pairs of runs consisting of one that has been selected and one that has not, making any
exchanges that would increase the efficiency of the experiment. Exchanges continue until no
further improvements can be made by switching one run that’s been selected with one run that
has not been selected.
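The selection procedure can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not STATGRAPHICS' own implementation: it scores a subset of runs by det(X'X), which for a fixed number of runs ranks subsets the same way as D-efficiency, performs backward elimination down to the requested size, and then applies a simple first-improvement exchange. The helper names (model_row, info_det, select_runs) and the coding of the 12-term default model are assumptions made for the sketch, and ties may be broken differently than in the program, so the runs it picks need not match the design shown below.

```python
from itertools import product

CAT = {"A": (-1, -1), "B": (1, 0), "C": (0, 1)}  # k - 1 = 2 catalyst columns

def model_row(t, c, cat):
    """12-term default model: constant, A, B, C(2), AA, BB, AB, AC(2), BC(2)."""
    a, b = (t - 170.0) / 10.0, (c - 30.0) / 10.0   # continuous factors coded -1..+1
    c1, c2 = CAT[cat]
    return [1.0, a, b, c1, c2, a * a, b * b, a * b, a * c1, a * c2, b * c1, b * c2]

def det(m):
    """Determinant via Gaussian elimination with partial pivoting."""
    m = [r[:] for r in m]
    n, d = len(m), 1.0
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(m[r][i]))
        if abs(m[p][i]) < 1e-12:
            return 0.0
        if p != i:
            m[i], m[p] = m[p], m[i]
            d = -d
        d *= m[i][i]
        for r in range(i + 1, n):
            f = m[r][i] / m[i][i]
            for col in range(i, n):
                m[r][col] -= f * m[i][col]
    return d

def info_det(rows):
    """det(X'X) for a list of model rows; larger means more D-efficient."""
    k = len(rows[0])
    return det([[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)])

def select_runs(candidates, n_runs):
    X = [model_row(*run) for run in candidates]

    def score(idx):
        return info_det([X[j] for j in idx])

    # Backward: repeatedly drop the run whose removal keeps det(X'X) largest
    chosen = list(range(len(candidates)))
    while len(chosen) > n_runs:
        drop = max(chosen, key=lambda i: score([j for j in chosen if j != i]))
        chosen.remove(drop)

    # Exchange: swap a selected run for an unselected one while det(X'X) improves
    best = score(chosen)
    improved = True
    while improved:
        improved = False
        for pos in range(len(chosen)):
            for cand in range(len(candidates)):
                if cand in chosen:
                    continue
                trial = chosen[:pos] + [cand] + chosen[pos + 1:]
                value = score(trial)
                if value > best * (1.0 + 1e-9):
                    chosen, best, improved = trial, value, True
                    break
            if improved:
                break
    return sorted(chosen), best

candidates = list(product([160.0, 170.0, 180.0], [20.0, 30.0, 40.0], ["A", "B", "C"]))
picked, d = select_runs(candidates, 17)
for i in picked:
    print(candidates[i])
```

Because each accepted exchange strictly increases det(X'X) and there are finitely many subsets, the exchange loop always terminates.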
For the example, the program was asked to find 17 runs using backward selection with the
exchange algorithm. When the algorithm is complete, the selected rows will be highlighted in
red:
The efficiencies of the selected design will also be displayed. You can try another algorithm or
press OK to accept the selection, at which point the rows of the datasheet will be reduced to the
selected runs:
The main DOE Wizard window will reflect the design:
If the selection is acceptable, press Step 7: Save experiment to save the reduced number of runs.
You can also use the Design Plot to display the final design:
[Figure: Design Plot, "Design with both quantitative and categorical factors", showing the selected runs by temperature, concentration and catalyst]
For each catalyst, runs are performed at the 4 combinations of low and high temperature and low
and high concentration. A run is also performed with catalyst B at a middle level of the
quantitative factors. For catalyst C, 4 star points are added to estimate the quadratic effects of
temperature and concentration.
Design Properties
Step #6: Evaluate Design
Several of the selections presented when pressing button #6 are helpful in evaluating the selected
design:
Design Worksheet
The design worksheet shows the 17 runs that have been selected, in the order they are to be run:
Worksheet for <untitled> - Design with both quantitative and categorical factors
run temperature concentration catalyst yield
degrees C % type %
1 160.0 20.0 A
2 160.0 20.0 B
3 160.0 20.0 C
4 160.0 30.0 C
5 160.0 40.0 A
6 160.0 40.0 B
7 160.0 40.0 C
8 170.0 20.0 C
9 170.0 30.0 B
10 170.0 40.0 C
11 180.0 20.0 A
12 180.0 20.0 B
13 180.0 20.0 C
14 180.0 30.0 C
15 180.0 40.0 A
16 180.0 40.0 B
17 180.0 40.0 C
ANOVA Table
The ANOVA table shows the breakdown of the degrees of freedom in the design:
ANOVA Table
Source D.F.
Model 11
Total Error 5
Lack-of-fit 5
Pure error 0
Total (corr.) 16
The StatAdvisor
The ANOVA table shows the degrees of freedom that will be available for estimating experimental error. Two estimates are
commonly used: total error, which includes degrees of freedom that could have been used to estimate effects that are not in the
current model, and pure error, which comes only from replicated runs. In this case, the total error has 5 degrees of freedom,
while there are 0 degrees of freedom for pure error. In general, it's a good idea to have at least three or four error degrees of
freedom available when testing the statistical significance of estimated effects. Otherwise, the statistical tests will have very
little power.
11 of the 16 total degrees of freedom are used to estimate the main effects, quadratic effects, and
two-factor interactions.
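The degree-of-freedom bookkeeping in this table can be reproduced with a few lines of Python. The sketch below assumes the 17 runs from the worksheet and the 12-coefficient default model, and counts pure error from exact repeats of a design point, which is the usual definition; the function name is illustrative.

```python
from collections import Counter

# Design points from the worksheet: (temperature, concentration, catalyst)
worksheet = [
    (160, 20, "A"), (160, 20, "B"), (160, 20, "C"), (160, 30, "C"),
    (160, 40, "A"), (160, 40, "B"), (160, 40, "C"), (170, 20, "C"),
    (170, 30, "B"), (170, 40, "C"), (180, 20, "A"), (180, 20, "B"),
    (180, 20, "C"), (180, 30, "C"), (180, 40, "A"), (180, 40, "B"),
    (180, 40, "C"),
]

def df_breakdown(points, n_coef):
    """Degrees-of-freedom rows of the design ANOVA table."""
    total = len(points) - 1        # corrected total
    model = n_coef - 1             # every coefficient except the constant
    error = total - model          # total error
    # Pure error: a design point run m times contributes m - 1 d.f.
    pure = sum(m - 1 for m in Counter(points).values())
    return {"Model": model, "Total Error": error, "Lack-of-fit": error - pure,
            "Pure error": pure, "Total (corr.)": total}

print(df_breakdown(worksheet, 12))
```

Appending a repeat of an existing run, for example df_breakdown(worksheet + worksheet[:1], 12), adds one degree of freedom for pure error while lack-of-fit stays at 5.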
Model Coefficients
The table of model coefficients is shown below:
Model Coefficients
Coefficient  Standard Error  VIF  Ri-Squared  Power at SN = 0.5  Power at SN = 1.0  Power at SN = 2.0
A 0.272166 1.03704 0.0357143 11.73% 32.07% 83.25%
B 0.272166 1.03704 0.0357143 11.73% 32.07% 83.25%
C 0.363976 1.18451 0.155772 8.73% 20.22% 59.86%
C 0.336523 1.25239 0.201525 9.37% 22.82% 66.42%
AA 0.665062 1.09276 0.0848861 6.11% 9.48% 23.25%
AB 0.288675 1.0 0.0 10.97% 29.15% 78.91%
AC 0.396746 1.25926 0.205882 8.13% 17.79% 52.90%
AC 0.360041 1.2963 0.228571 8.81% 20.56% 60.77%
BB 0.665062 1.09276 0.0848861 6.11% 9.48% 23.25%
BC 0.396746 1.25926 0.205882 8.13% 17.79% 52.90%
BC 0.360041 1.2963 0.228571 8.81% 20.56% 60.77%
alpha = 5.0%, sigma estimated from total error with 5 d.f.
The coefficients displayed are based on a standardized model in which the quantitative factors
are coded as -1 when at their low level and +1 when at their high level. For categorical factors at
k levels, k - 1 indicator variables are created according to:
X1 = -1 for level 1, 1 for level 2, and 0 for all other levels
X2 = -1 for level 1, 1 for level 3, and 0 for all other levels
...
X(k-1) = -1 for level 1, 1 for level k, and 0 for all other levels
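A minimal Python sketch of this coding, together with the -1/+1 scaling of the quantitative factors, is shown below. The function names are illustrative and are not part of STATGRAPHICS.

```python
def code_continuous(value, low, high):
    """Map the low level to -1, the center to 0 and the high level to +1."""
    center = (low + high) / 2.0
    return 2.0 * (value - center) / (high - low)

def code_categorical(level, levels):
    """k - 1 columns: column j is -1 at the first level, +1 at level j+1, 0 otherwise."""
    idx = levels.index(level)
    cols = []
    for j in range(1, len(levels)):
        if idx == 0:
            cols.append(-1)
        elif idx == j:
            cols.append(1)
        else:
            cols.append(0)
    return cols

catalysts = ["A", "B", "C"]
for lev in catalysts:
    print(lev, code_categorical(lev, catalysts))
# A [-1, -1]
# B [1, 0]
# C [0, 1]
print(code_continuous(165.0, 160.0, 180.0))   # -0.5
```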
The variance inflation factors (VIF) indicate the extent to which the variance of each estimate is
inflated due to the non-orthogonality of the selected runs. In this case, the inflation is minor.
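To show where the VIF column comes from, the sketch below computes VIF_j = 1 / (1 - R_j²) for each column of the 12-term model, using the 17 runs in the worksheet and the coding described above. It relies on the standard identity that VIF_j equals S_jj times the j-th diagonal element of S⁻¹, where S is the cross-product matrix of the centered predictor columns. The helper names are hypothetical; the Ri-Squared column in the table corresponds to 1 - 1/VIF.

```python
CAT = {"A": (-1, -1), "B": (1, 0), "C": (0, 1)}
NAMES = ["A", "B", "C", "C", "AA", "AB", "AC", "AC", "BB", "BC", "BC"]

def terms(t, c, cat):
    """Non-constant columns of the 12-term model, in the table's order."""
    a, b = (t - 170.0) / 10.0, (c - 30.0) / 10.0
    c1, c2 = CAT[cat]
    return [a, b, c1, c2, a * a, a * b, a * c1, a * c2, b * b, b * c1, b * c2]

def inverse(m):
    """Gauss-Jordan inverse with partial pivoting (m must be nonsingular)."""
    n = len(m)
    aug = [row[:] + [float(i == j) for j in range(n)] for i, row in enumerate(m)]
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(aug[r][i]))
        aug[i], aug[p] = aug[p], aug[i]
        piv = aug[i][i]
        aug[i] = [v / piv for v in aug[i]]
        for r in range(n):
            if r != i:
                f = aug[r][i]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[i])]
    return [row[n:] for row in aug]

def vifs(rows):
    """VIF for every column, from the centered cross-product matrix."""
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    cen = [[r[j] - means[j] for j in range(k)] for r in rows]
    s = [[sum(r[i] * r[j] for r in cen) for j in range(k)] for i in range(k)]
    s_inv = inverse(s)
    return [s[j][j] * s_inv[j][j] for j in range(k)]

worksheet = [
    (160, 20, "A"), (160, 20, "B"), (160, 20, "C"), (160, 30, "C"),
    (160, 40, "A"), (160, 40, "B"), (160, 40, "C"), (170, 20, "C"),
    (170, 30, "B"), (170, 40, "C"), (180, 20, "A"), (180, 20, "B"),
    (180, 20, "C"), (180, 30, "C"), (180, 40, "A"), (180, 40, "B"),
    (180, 40, "C"),
]
v = vifs([terms(*run) for run in worksheet])
for name, value in zip(NAMES, v):
    print(f"{name:>2}  VIF = {value:.5f}  Ri-Squared = {1 - 1 / value:.6f}")
```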
Saving the Design File
Step #7: Save experiment
Once the experiment has been created and any additional runs entered, it must be saved on disk.
Press the button labeled Step 7 and select a name for the experiment file:
Design files are extended data files and have the extension .sgx. They include the data together
with other information that was entered on the input dialog boxes.
To reopen an experiment file, select Open Data File from the File menu. The data will be loaded
into the datasheet, and the Experimental Design Wizard window will be displayed.
Analyzing the Results

After the design file has been created and saved, the experiments would be performed. At a later
date, once the results have been collected, the experimenter would return to STATGRAPHICS
and reopen the saved design file using the Open Data Source selection on the main File menu.
The results can then be typed into the response columns. The results for the example are
displayed below:
run temperature concentration catalyst yield
degrees C % type %
1 160.0 20.0 A 63.7
2 160.0 20.0 B 59.7
3 160.0 20.0 C 53.7
4 160.0 30.0 C 50.3
5 160.0 40.0 A 56.8
6 160.0 40.0 B 54.5
7 160.0 40.0 C 53.0
8 170.0 20.0 C 73.3
9 170.0 30.0 B 67.2
10 170.0 40.0 C 66.6
11 180.0 20.0 A 77.2
12 180.0 20.0 B 80.8
13 180.0 20.0 C 88.3
14 180.0 30.0 C 85.1
15 180.0 40.0 A 71.5
16 180.0 40.0 B 78.3
17 180.0 40.0 C 82.2
Important Notes:
1. If more than one sample was taken at each set of experimental conditions, the data values
should be entered into data tables B through Z. The summary statistics in data table A
will then be automatically calculated from the other tables. Do not treat the samples as
replicates unless you actually reset the process between each sample.
2. If any experiments were not performed, leave the corresponding cell blank. The program
will recognize the imbalance in the design and handle it.
3. If any experimental runs were done at conditions different from those originally planned,
change the entries in the experimental factor columns to correspond to the values that
were actually used.
4. If additional runs were performed, you may add them to the bottom of the datasheet.
They will be included in the fit.
Step #8: Analyze data
Once the data have been entered, press the button labeled Step #8 on the Experiment Design
Wizard toolbar. This will display a dialog box listing each of the response variables:
• Response: column containing the response variable to be analyzed.
• Transformation: the desired transformation to be applied before the model is fit.
• Power and addend: the transformation parameters if a Power or Box-Cox transformation is selected.
If more than one response has been measured, you should repeat this step once for each response.
Analysis Summary
When a response is analyzed, a new window is created providing numerous tables and graphs
summarizing the analysis. The pane in the upper left corner of the window displays an Analysis
Summary:
Analyze Experiment - yield

File name: C:\DocData16\both.sgx
Comment: Design with both quantitative and categorical factors
Number of runs: 17
Analysis of Variance for yield (%)

Source Sum of Squares Df Mean Square F-Ratio P-Value
Model 2423.51 11 220.319 70.0415 0.0001
Residual 15.7278 5 3.14555
Lack-of-fit 5
Pure error 0
Total (corr.) 2439.24 16
R-squared = 99.3552 percent
R-squared (adjusted for d.f.) = 97.9367 percent
Standard Error of Est. = 1.77357
Mean absolute error = 0.822775
Durbin-Watson statistic = 1.56465 (P=0.0470)
Lag 1 residual autocorrelation = 0.167273
Analysis of Effects
Categorical factors:
C=catalyst (type)
Quantitative factors:
A=temperature (degrees C)
B=concentration (%)
A 1807.0 1 1807.0 574.463 0.0000
B 80.4834 1 80.4834 25.5864 0.0039
C 10.1939 2 5.09694 1.62036 0.2868
AA 0.803512 1 0.803512 0.255444 0.6348
AB 0.1875 1 0.1875 0.059608 0.8168
AC 217.361 2 108.681 34.5506 0.0012
BB 6.18395 1 6.18395 1.96593 0.2198
BC 3.28464 2 1.64232 0.522109 0.6224
Included in the output are:
• Analysis of Variance: a decomposition of the sum of squares for the response variable into components for the model and for the residuals. The F-test tests the statistical significance of the model as a whole. A small P-value (less than 0.05 if operating at the 5% significance level) indicates that at least one factor in the model is significantly related to the dependent variable. In the current example, the model is highly significant.
• Model statistics: summarize the fitted model. Included are:
o R-squared - represents the percentage of the variability in the response variable which has been explained by the fitted regression model, ranging from 0% to 100%.
o Adjusted R-Squared – the R-squared statistic, adjusted for the number of coefficients in the model. This value is often used to compare models with different numbers of coefficients.
o Standard Error of Est. – the estimated standard deviation of the residuals (the deviations around the model). This value is used to create prediction limits for new observations.
o Mean Absolute Error – the average absolute value of the residuals.
o Durbin-Watson Statistic – a measure of serial correlation in the residuals. If the residuals vary randomly, this value should be close to 2. A small P-value indicates a non-random pattern in the residuals. For data recorded over time, a small P-value could indicate that some trend over time has not been accounted for. In the current example, the P-value is less than 0.05, so there may be some serial correlation in the residuals.
o Lag 1 residual autocorrelation: a measure of the serial correlation between consecutive residuals on a scale ranging from -1 to +1.
• Analysis of Effects: decomposition of the model sum of squares into components for each term in the fitted statistical model, including main effects (such as A), two-factor interactions (such as AB), and quadratic effects (such as AA). Based on the settings specified on the Analysis Options dialog box, either Type III or Type I sums of squares are displayed. Type III sums of squares test the marginal significance of each factor, assuming it was the last to be entered into the model, while Type I sums of squares measure the contribution of each term in the order in which the terms are entered. Small P-values indicate significant effects. In this example, three effects (A, B and AC) have P-values less than 0.05 and are thus statistically significant at the 5% level.
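The model statistics above follow directly from the ANOVA table. The Python sketch below recomputes R-squared, adjusted R-squared, the standard error of the estimate and the F-ratio from the sums of squares and degrees of freedom shown earlier. It also includes the common textbook definitions of the Durbin-Watson statistic and the lag-1 residual autocorrelation; those two definitions are assumed here rather than taken from STATGRAPHICS, and the Durbin-Watson P-value is not computed.

```python
def model_summary(ss_model, df_model, ss_resid, df_resid):
    """Summary statistics from a one-line model ANOVA."""
    ss_total = ss_model + ss_resid
    df_total = df_model + df_resid
    ms_resid = ss_resid / df_resid
    return {
        "F-ratio": (ss_model / df_model) / ms_resid,
        "R-squared": 100.0 * ss_model / ss_total,
        "Adj. R-squared": 100.0 * (1.0 - ms_resid / (ss_total / df_total)),
        "Std. error of est.": ms_resid ** 0.5,
    }

def durbin_watson(resid):
    """Sum of squared successive differences over the residual sum of squares."""
    num = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid)))
    return num / sum(e * e for e in resid)

def lag1_autocorrelation(resid):
    """Correlation between consecutive residuals (residuals assumed to have mean 0)."""
    num = sum(resid[t] * resid[t - 1] for t in range(1, len(resid)))
    return num / sum(e * e for e in resid)

# Values from the Analysis of Variance table for yield
summary = model_summary(2423.51, 11, 15.7278, 5)
for key, value in summary.items():
    print(f"{key}: {value:.4f}")
```

For long residual series the two serial-correlation measures are roughly related by DW ≈ 2(1 - r1), which is why a Durbin-Watson value well below 2 goes with positive lag-1 autocorrelation.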
Simplifying the Model
In many cases, it may be desirable to remove insignificant effects from the model. This is done
by selecting Analysis Options, which displays the dialog box shown below:
Double-click on an effect to move it from the Include field to the Exclude field or vice versa.
Then press OK to refit the model. For the sample data, removing the 4 effects AA, AB, BB and BC
yields a simpler model. All remaining effects are statistically significant except the catalyst main
effect C, which is kept in the model because catalyst also appears in the significant AC interaction:
Analysis of Effects
Categorical factors:
C=catalyst (type)
Quantitative factors:
A=temperature (degrees C)
B=concentration (%)
A 1807.0 1 1807.0 702.657 0.0000
B 81.6029 1 81.6029 31.7314 0.0002
C 8.78007 2 4.39004 1.70707 0.2302
AC 217.361 2 108.681 42.2607 0.0000
In the main DOE Wizard window, additional information will be added:
Step 8: Analyze the experimental results

Model yield
Transformation none
Model d.f. 6
P-value 0.0000
Error d.f. 10
Stnd. error 1.60364
R-squared 98.95
Adj. R-squared 98.31
Included in the summary are the P-value for the fitted model and the R-squared statistics.
Pareto Chart
The analysis window summarizes the contribution of each effect to the overall variability in the
response using a Pareto chart:
[Figure: Pareto Chart for yield, showing each effect's contribution to variation (%), with bars marked Sig. at 5% or Not sig.]
The length of each bar equals the contribution of an effect to the overall variation in the
response, where an effect’s contribution is calculated by dividing its sum of squares by the total
corrected sum of squares from the ANOVA table. The color of each bar indicates whether an
effect is statistically significant at the indicated significance level.
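For the simplified model, the bar lengths can be computed as described, using the sums of squares from the reduced Analysis of Effects table and the total corrected sum of squares (2439.24) from the ANOVA table. A minimal Python sketch:

```python
# Sums of squares from the simplified model's Analysis of Effects table
effect_ss = {"A": 1807.0, "B": 81.6029, "C": 8.78007, "AC": 217.361}
total_ss = 2439.24  # Total (corr.) sum of squares from the ANOVA table

# Contribution of each effect = its sum of squares / total corrected SS
contribution = {name: 100.0 * ss / total_ss for name, ss in effect_ss.items()}

# Pareto ordering: largest contribution first
for name, pct in sorted(contribution.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:>2}  {pct:5.2f}%")
```

Because these are marginal (Type III) sums of squares, the percentages need not add up to the model's R-squared.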
Means Table
The analysis window also displays the estimated mean response at different locations within the
design space:
Means table for yield with 95% confidence intervals
Factor Level  Estimated Mean  Standard Error  Lower Limit  Upper Limit
Grand mean 68.1542 0.405341 67.251 69.0573
temperature
160.0 56.5847 0.595647 55.2575 57.9119
170.0 68.1542 0.405341 67.251 69.0573
180.0 79.7236 0.595647 78.3964 81.0508
concentration
20.0 70.5685 0.589909 69.2541 71.8829
30.0 68.1542 0.405341 67.251 69.0573
40.0 65.7399 0.589909 64.4255 67.0543
catalyst
A 67.3 0.801822 65.5134 69.0866
B 68.1 0.717171 66.502 69.698
C 69.0625 0.566974 67.7992 70.3258
temperature by concentration
160.0,20.0 58.999 0.733816 57.364 60.6341
160.0,30.0 56.5847 0.595647 55.2575 57.9119
160.0,40.0 54.1704 0.733816 52.5354 55.8055
170.0,20.0 70.5685 0.589909 69.2541 71.8829
170.0,30.0 68.1542 0.405341 67.251 69.0573
170.0,40.0 65.7399 0.589909 64.4255 67.0543
180.0,20.0 82.1379 0.733816 80.5028 83.7729
180.0,30.0 79.7236 0.595647 78.3964 81.0508
180.0,40.0 77.3093 0.733816 75.6743 78.9444
temperature by catalyst
160.0,A 60.25 1.13395 57.7234 62.7766
160.0,B 56.875 1.07576 54.4781 59.2719
160.0,C 52.6292 0.866066 50.6994 54.5589
170.0,A 67.3 0.801822 65.5134 69.0866
170.0,B 68.1 0.717171 66.502 69.698
170.0,C 69.0625 0.566974 67.7992 70.3258
180.0,A 74.35 1.13395 71.8234 76.8766
180.0,B 79.325 1.07576 76.9281 81.7219
180.0,C 85.4958 0.866066 83.5661 87.4256
concentration by catalyst
20.0,A 69.7143 0.90918 67.6885 71.7401
20.0,B 70.5143 0.835479 68.6527 72.3759
20.0,C 71.4768 0.710739 69.8932 73.0604
30.0,A 67.3 0.801822 65.5134 69.0866
30.0,B 68.1 0.717171 66.502 69.698
30.0,C 69.0625 0.566974 67.7992 70.3258
40.0,A 64.8857 0.90918 62.8599 66.9115
40.0,B 65.6857 0.835479 63.8241 67.5473
40.0,C 66.6482 0.710739 65.0646 68.2318
The line labeled Grand mean shows the estimated mean at the center of the design space, where:
1. Each quantitative factor is set halfway between its low and high levels.
2. For each categorical factor, the results are averaged over all levels of that factor.
Estimated means are also shown for each level of the experimental factors and for combinations
of pairs of factors.
In addition to the means, the table also shows the estimated standard error for each mean and
interval estimates. The Pane Options dialog box selects the type of limits displayed:
The choices are:
1. Confidence limits: a separate confidence interval is displayed for each mean, calculated
using Student's t distribution according to

$$\hat{Y} \pm t_{\alpha/2,\,n-p}\, s_{\hat{Y}} \qquad (1)$$

where $\hat{Y}$ is the predicted mean response, $s_{\hat{Y}}$ is the estimated standard error, n is the
number of observed runs, and p is the number of coefficients in the fitted model.
2. Bonferroni limits (by effect): limits are calculated for each variable and for each combination of
two variables using Student's t distribution according to

$$\hat{Y} \pm t_{\alpha/(2g),\,n-p}\, s_{\hat{Y}} \qquad (2)$$

where g is the number of levels for the variable or combination of variables. Bonferroni
limits provide at least the stated confidence for all means in the set.
3. Working-Hotelling limits: limits are calculated for all means simultaneously using
Snedecor's F distribution according to

$$\hat{Y} \pm \sqrt{p\,F_{1-\alpha,\,p,\,n-p}}\; s_{\hat{Y}} \qquad (3)$$

Working-Hotelling limits provide at least the stated confidence for all of the means.
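As a check on the Confidence limits option, the Python sketch below applies equation (1) to two rows of the Means Table. It assumes the simplified model, so n - p = 17 - 7 = 10 error degrees of freedom (matching the Error d.f. shown in the DOE Wizard window), and it hard-codes t(0.025, 10) = 2.228139 from a t table rather than computing the quantile. Bonferroni or Working-Hotelling limits would only change the multiplier.

```python
# Two-sided 95% multiplier t(alpha/2, n - p) for n - p = 10, taken from a t table
T_CRIT = 2.228139

def confidence_limits(mean, std_error, multiplier=T_CRIT):
    """Equation (1): estimated mean +/- multiplier * standard error."""
    half_width = multiplier * std_error
    return mean - half_width, mean + half_width

grand = confidence_limits(68.1542, 0.405341)   # Grand mean row
cat_a = confidence_limits(67.3, 0.801822)      # catalyst = A row
print(grand, cat_a)
```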
Main Effects Plot

The means displayed in the Means Table can be displayed graphically using the Main Effects
Plot:
[Figure: Main Effects Plot for yield, showing yield versus temperature (160.0 to 180.0), concentration (20.0 to 40.0) and catalyst (A, B, C)]
For continuous factors such as temperature, the estimated mean is displayed everywhere between
the low and high levels. For categorical factors such as catalyst, point symbols indicate the
discrete locations at which the means are calculated.
If the plot is too crowded, Pane Options may be used to select a subset of the factors:
Interaction Plot
The means displayed in the Means Table for combinations of factors can be displayed
graphically using the Interaction Plot:
[Figure: Interaction Plot for yield, showing yield versus temperature (160.0 to 180.0) with a separate line for each catalyst (A, B, C)]
As with the Main Effects Plot, the factors displayed may be selected using Pane Options:
If Reverse Factors is checked, the second factor rather than the first will be plotted along the
horizontal axis.
Standardized Regression Coefficients

The statistical output displayed above is based on fitting a multiple linear regression model. Such
models take the general form

$$Y_i = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_{p-1} X_{p-1} \qquad (4)$$

where the β's are coefficients that need to be estimated from the data and the X's are calculated
from the levels of the experimental factors. The table of Standardized Regression Coefficients
displays the estimated β's when the X's are defined as follows:
1. For continuous factors, a single X is created by

$$X = \frac{2\,(\text{factor value} - \text{center level})}{\text{high level} - \text{low level}} \qquad (5)$$

With this scaling, X = -1 at the low level, 0 at the center level, and +1 at the high level.
2. For categorical factors at k levels, k - 1 X's are created where

$$X_j = \begin{cases} -1 & \text{if the factor is at level 1} \\ +1 & \text{if the factor is at level } j+1 \\ 0 & \text{otherwise} \end{cases} \qquad (6)$$

With this scaling, each β_j measures how far the estimated mean at level j+1 lies from the average
over all k levels (at the center of the continuous factors); the effect of level 1 equals minus the sum of the other β_j's.
The table for the current example is shown below:
Standardized Regression Coefficients

Effect Estimate Stnd. Error V.I.F.
Constant 68.1542 0.405341
A:temperature 11.5694 0.436456 1.03704
B:concentration -2.41429 0.428592 1.0
C:catalyst -0.0541667 0.579436 1.16732
C:catalyst 0.908333 0.521013 1.16732
AC -0.344444 0.636239 1.25926
AC 4.86389 0.577378 1.2963
It provides the estimated coefficients, standard errors, and variance inflation factors. The
variance inflation factors measure how large the variance of the coefficients is compared to what
it would be if the independent variables were uncorrelated. Values greater than 10.0 usually
indicate serious multicollinearity amongst the predictor variables, which leads to imprecise
estimates of the model coefficients.
The StatAdvisor shows the fitted model as:
yield = 68.1542 + 11.5694*temperature - 2.41429*concentration - 0.0541667*catalyst + 0.908333*catalyst - 0.344444*temperature*catalyst + 4.86389*temperature*catalyst
The Predictions table described later uses this model to predict values of yield at different
combinations of the experimental factors.
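A minimal Python sketch of how this fitted model turns factor settings into a prediction is shown below. It assumes the coding of equations (5) and (6), with temperature centered at 170 and concentration at 30, and labels the two catalyst columns C1 and C2 (names chosen here for illustration) because the StatAdvisor output uses the name catalyst for both. The coefficients are copied from the Standardized Regression Coefficients table.

```python
# Coefficients from the Standardized Regression Coefficients table
COEF = {"constant": 68.1542, "A": 11.5694, "B": -2.41429,
        "C1": -0.0541667, "C2": 0.908333, "AC1": -0.344444, "AC2": 4.86389}
CAT = {"A": (-1, -1), "B": (1, 0), "C": (0, 1)}   # equation (6) with k = 3

def predict_yield(temperature, concentration, catalyst):
    a = (temperature - 170.0) / 10.0      # equation (5), range 160-180
    b = (concentration - 30.0) / 10.0     # equation (5), range 20-40
    c1, c2 = CAT[catalyst]
    return (COEF["constant"] + COEF["A"] * a + COEF["B"] * b
            + COEF["C1"] * c1 + COEF["C2"] * c2
            + COEF["AC1"] * a * c1 + COEF["AC2"] * a * c2)

print(round(predict_yield(175, 35, "A"), 2))   # 69.62
```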
Unstandardized Regression Coefficients

The regression model can also be written with the continuous factors expressed in their original
units. The estimated coefficients for this unstandardized model are shown below for the sample
data:
Unstandardized Regression Coefficients

Coefficient Estimate
constant -121.284
A:temperature 1.15694
B:concentration -0.241429
C:catalyst 5.80139
C:catalyst -81.7778
AC -0.0344444
AC 0.486389
The StatAdvisor shows the fitted model as:
yield = -121.284 + 1.15694*temperature - 0.241429*concentration + 5.80139*catalyst - 81.7778*catalyst - 0.0344444*temperature*catalyst + 0.486389*temperature*catalyst
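The unstandardized coefficients are a reparameterization of the standardized ones. Substituting temperature = 170 + 10·a and concentration = 30 + 10·b (the inverse of equation (5) for this example) into the standardized model and collecting terms gives the conversion sketched below in Python. The dictionary keys are illustrative names; the catalyst columns keep the -1/0/+1 coding of equation (6).

```python
# Standardized coefficients (C1, C2 = the two catalyst columns)
STD = {"constant": 68.1542, "A": 11.5694, "B": -2.41429, "C1": -0.0541667,
       "C2": 0.908333, "AC1": -0.344444, "AC2": 4.86389}

def unstandardize(std, t_center=170.0, t_half=10.0, c_center=30.0, c_half=10.0):
    """Rewrite y = b0 + bA*a + bB*b + c1*X1 + c2*X2 + d1*a*X1 + d2*a*X2,
    with a = (T - t_center)/t_half and b = (Conc - c_center)/c_half,
    in terms of T and Conc in their original units."""
    return {
        "constant": std["constant"] - std["A"] * t_center / t_half
                    - std["B"] * c_center / c_half,
        "temperature": std["A"] / t_half,
        "concentration": std["B"] / c_half,
        "catalyst (C1)": std["C1"] - std["AC1"] * t_center / t_half,
        "catalyst (C2)": std["C2"] - std["AC2"] * t_center / t_half,
        "temperature*catalyst (C1)": std["AC1"] / t_half,
        "temperature*catalyst (C2)": std["AC2"] / t_half,
    }

for name, value in unstandardize(STD).items():
    print(f"{name}: {value:.6g}")
```

Because the published standardized coefficients are rounded, the recomputed constant (about -121.283) differs from the table's -121.284 in the third decimal place.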
Trace Plot
An interesting way of visualizing the effects of each factor in the fitted regression models is
through a Trace Plot. Starting at a selected reference point within the experiment region, each
continuous factor is moved above and below that point, holding each of the other factors
constant. The estimated response is then displayed, as in the plot below:
[Figure: Trace Plot for yield, reference point temperature = 170.0, concentration = 30.0, catalyst = A; yield versus factor range (-1 to 1) for temperature and concentration]
In the trace plot, the horizontal axis is defined as

$$x = \frac{2\,(X_i - \mathrm{ref}_i)}{\mathrm{high}_i - \mathrm{low}_i} \qquad (7)$$

where Xi is the value of factor i, refi is the reference point for factor i, highi is the high level for
factor i defined when the experiment was created, and lowi is the low level. The default reference
point corresponds to the center of the design space for the continuous factors and the first level
of the categorical factors.
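The sketch below shows how a trace for each continuous factor could be generated from the fitted model: a value x on the horizontal axis is mapped back to a factor setting by inverting equation (7), and the model is evaluated with every other factor held at the default reference point described above (170.0, 30.0, catalyst A). The coefficients come from the Standardized Regression Coefficients table; the function names are hypothetical.

```python
COEF = {"constant": 68.1542, "A": 11.5694, "B": -2.41429, "C1": -0.0541667,
        "C2": 0.908333, "AC1": -0.344444, "AC2": 4.86389}
CAT = {"A": (-1, -1), "B": (1, 0), "C": (0, 1)}
LIMITS = {"temperature": (160.0, 180.0), "concentration": (20.0, 40.0)}
REF = {"temperature": 170.0, "concentration": 30.0, "catalyst": "A"}

def predict(temperature, concentration, catalyst):
    """Fitted yield from the standardized model."""
    a, b = (temperature - 170.0) / 10.0, (concentration - 30.0) / 10.0
    c1, c2 = CAT[catalyst]
    return (COEF["constant"] + COEF["A"] * a + COEF["B"] * b + COEF["C1"] * c1
            + COEF["C2"] * c2 + COEF["AC1"] * a * c1 + COEF["AC2"] * a * c2)

def trace_x(value, ref, low, high):
    """Equation (7): position of a factor value on the trace plot's axis."""
    return 2.0 * (value - ref) / (high - low)

def trace(factor, steps=5):
    """Fitted yield as one continuous factor moves from x = -1 to +1
    while every other factor stays at the reference point."""
    low, high = LIMITS[factor]
    points = []
    for k in range(steps):
        x = -1.0 + 2.0 * k / (steps - 1)
        setting = dict(REF)
        setting[factor] = REF[factor] + x * (high - low) / 2.0   # inverse of (7)
        points.append((x, predict(**setting)))
    return points

for factor in LIMITS:
    print(factor, [(x, round(yhat, 2)) for x, yhat in trace(factor)])
```

The steeper temperature trace compared with the concentration trace reflects the much larger temperature coefficient.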
By comparing the changes in yield over the range of each factor, the most important factors can
be determined. If a line is relatively flat for a particular factor, the response is insensitive to
changes in that factor in the vicinity of the reference point.
Surface and Contour Plots

The fitted statistical models can also be displayed using surface and contour plots. A typical
surface plot shows the predicted response as a function of two continuous factors, with all other
factors held constant:
[Figure: Estimated Response Surface for yield with catalyst = A, plotted as a surface over temperature (160 to 180) and concentration (20 to 40)]
The Pane Options dialog box is used to select the type of plot to be displayed:
• Type: type of response plot to create. The fitted response may be plotted as a surface, a two-dimensional contour plot, a three-dimensional contour plot, or a three-dimensional mesh plot.
• Contours – options for a contour plot.
o From: location at which the first contour line is drawn, or the start of the first region.
o To: location at which the last contour line is drawn, or the end of the last region.
o By: spacing between contour lines or regions.
o Lines: if selected, a sequence of contour lines is drawn at selected levels of the predicted response, as on a topographical map.
o Painted Regions: if selected, a set of regions is drawn covering various ranges of the predicted response.
o Continuous: draws contours using a continuous range of colors.
o Continuous with Grid: draws contours using a continuous range of colors and adds a grid.
• Resolution: defines the resolution m of an m-by-m grid of predicted values which is used to draw the surface and contour lines. Increasing the resolution may improve the smoothness and definition of the plots, at the expense of computer time and memory.
• Surface – options for a surface plot.
o Horizontal Divisions: the number of divisions along the first experimental axis. This determines how many vertical lines will be drawn on the surface plot.
o Vertical Divisions: the number of divisions along the second experimental axis. This determines how many horizontal lines will be drawn on the surface plot.
o Contours Below: requests that a contour plot, of the type specified below, be drawn on the bottom face of the 3-D plot.
o Wire Frame: requests that the surface be drawn using cross-hatched lines as shown in the figure above. This is the most effective choice for black-and-white presentation.
o Solid: requests that the surface be drawn using a solid color.
o Contoured: requests that the surface be drawn showing contour levels of the response.
• Factors button: displays a dialog box to select the factors to be plotted on each axis and the levels at which the other factors will be held:
The current example plots the predicted response versus temperature and concentration, when
catalyst = A.
The same information shown as a contour plot with continuous contours is displayed below:
[Figure: Estimated Response Surface for yield with catalyst = A, shown as a contour plot of concentration (20 to 40) versus temperature (160 to 180)]
Predictions
The Predictions pane may be used to generate predictions from the fitted model:
Estimation Results for yield

Row  Observed Value  Fitted Value  Residual  Studentized Residual  Lower 95.0% CL for Mean  Upper 95.0% CL for Mean
1 63.7 62.6643 1.03571 0.985092 59.9632 65.3653
2 59.7 59.2893 0.410714 0.353652 56.7091 61.8695
3 53.7 55.0435 -1.34345 -1.05572 52.8904 57.1965
4 50.3 52.6292 -2.32917 -1.95375 50.6994 54.5589
5 56.8 57.8357 -1.03571 -0.985092 55.1347 60.5368
6 54.5 54.4607 0.0392857 0.0335971 51.8805 57.0409
7 53.0 50.2149 2.78512 2.84548 48.0618 52.368
8 73.3 71.4768 1.82321 1.31347 69.8932 73.0604
9 67.2 68.1 -0.9 -0.607342 66.502 69.698
10 66.6 66.6482 -0.0482143 -0.0318201 65.0646 68.2318
11 77.2 76.7643 0.435714 0.39717 74.0632 79.4653
12 80.8 81.7393 -0.939286 -0.833662 79.1591 84.3195
13 88.3 87.9101 0.389881 0.290357 85.757 90.0632
14 85.1 85.4958 -0.395833 -0.279436 83.5661 87.4256
15 71.5 71.9357 -0.435714 -0.39717 69.2347 74.6368
16 78.3 76.9107 1.38929 1.29382 74.3305 79.4909
17 82.2 83.0815 -0.881548 -0.669541 80.9285 85.2346
18 69.6179 67.5641 71.6716
The table may include all rows in the datasheet, or only rows for which the value of the response
variable Y has not been entered. The latter feature allows the analyst to make predictions at
combinations of X that were not included in the experiment. For example, the above table shows
the result of adding an 18th row with temperature = 175, concentration = 35, and catalyst = A.
The predicted value of yield is 69.62. The 95% confidence interval for the mean value of yield at
that same combination of the factors ranges from 67.56 to 71.67.
One other noticeable entry in the above table is the Studentized residual for row #7. The
Studentized residual measures the difference between the observed response and the predicted
response, in units of its standard error, when the observation in question is not used to fit the
model. The Studentized residual for observation #7 equals 2.85. Studentized residuals greater than 3.0
in absolute value are unusual and would typically require further scrutiny.
Pane Options
 Include: items to include in the table:
1. Observed Y - the observed response values Yi .
2. Fitted Y - the predicted values Yi calculated from the fitted model.
3. Residuals - the residuals ei .
4. Studentized Residuals - a type of standardized residual, where each residual is divided by
an estimate of its standard error. STATGRAPHICS computes Studentized deleted
residuals, in which each observation is removed one at a time and the model refit without
that data value. The deleted residual then equals the observed response minus the value
predicted from a model fit without that observation, i.e.,
$$d_i = Y_i - \hat{Y}_{(i)} \qquad (8)$$
The Studentized residual is calculated from
$$e_i^{*} = \frac{d_i}{s(d_i)} \qquad (9)$$
where
$$s^2(d_i) = MSE_{(i)}\left[1 + X_i'\left(X_{(i)}'X_{(i)}\right)^{-1}X_i\right] \qquad (10)$$
Here MSE(i) and X(i) are the mean squared error and the model matrix of the fit made without
observation i, and Xi is the vector of model terms for observation i. The Studentized deleted
residuals should follow a t distribution with n - p - 1 degrees of freedom, where p is the
number of estimated coefficients in the fitted model.
5. Standard Errors for Forecasts - the standard error for new observations at a selected
combination of the experimental factors Xh, given by
$$\sqrt{MSE\left[1 + X_h'(X'X)^{-1}X_h\right]} \qquad (11)$$
6. Confidence Limits for Individual Forecasts - confidence limits for new observations at a
selected combination of the experimental factors Xh, given by
$$\hat{Y}_h \pm t_{\alpha/2,\,n-p}\sqrt{MSE\left[1 + X_h'(X'X)^{-1}X_h\right]} \qquad (12)$$
7. Confidence Limits for Forecast Means - confidence limits for the mean response at a
selected combination of the experimental factors Xh, given by
$$\hat{Y}_h \pm t_{\alpha/2,\,n-p}\sqrt{MSE\; X_h'(X'X)^{-1}X_h} \qquad (13)$$
 Predict - whether forecasts are displayed for all of the runs in the experiment data file, or
only for runs that have a missing value in the response column.
 Confidence level - the confidence levels for the intervals.
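Equations (11) to (13) can be evaluated directly from the model matrix. The sketch below is a minimal NumPy illustration under stated assumptions: X is the n x p model matrix including the intercept column, x_h is the corresponding row of model terms at the new factor settings, and the t quantile with n - p degrees of freedom is passed in (it could be obtained from, e.g., scipy.stats.t.ppf). The function name and the data are hypothetical, and this is not a description of how STATGRAPHICS computes the limits internally.

```python
import numpy as np


def forecast_limits(X, y, x_h, t_crit):
    """Return (y_hat, se_new, individual_limits, mean_limits) at the row x_h."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    mse = np.sum((y - X @ beta) ** 2) / (n - p)       # MSE with n - p df
    q = x_h @ np.linalg.inv(X.T @ X) @ x_h            # X_h'(X'X)^-1 X_h
    y_hat = x_h @ beta
    se_new = np.sqrt(mse * (1.0 + q))                 # eq. (11): new observation
    se_mean = np.sqrt(mse * q)                        # standard error of the mean
    individual = (y_hat - t_crit * se_new, y_hat + t_crit * se_new)   # eq. (12)
    mean = (y_hat - t_crit * se_mean, y_hat + t_crit * se_mean)       # eq. (13)
    return y_hat, se_new, individual, mean


# Hypothetical data: straight-line model, 12 runs, so n - p = 10.
rng = np.random.default_rng(1)
x = np.linspace(-1.0, 1.0, 12)
X_demo = np.column_stack([np.ones_like(x), x])
y_demo = 3.0 + 2.0 * x + rng.normal(scale=0.5, size=x.size)
x_new = np.array([1.0, 0.5])   # intercept term, then x = 0.5 (an unobserved setting)
T_975_10 = 2.228               # approximate t quantile for 95% limits with 10 df

y_hat, se_new, individual, mean = forecast_limits(X_demo, y_demo, x_new, T_975_10)
print(y_hat, individual, mean)
```

The individual limits are always wider than the limits for the mean, because equation (12) adds the variance of a single new observation (the "1 +" term) to the uncertainty in the fitted mean.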
Unusual Residuals
The Unusual Residuals table displays all rows with Studentized residuals less than -2 or greater
than +2:
Unusual Residuals for yield

Row   Y      Predicted Y   Residual   Studentized Residual
7     53.0   50.2149       2.78512    2.85
Although row #7 appears on the list, its Studentized residual is less than 3 in absolute value, so
it may not be a serious concern.
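The screening rule used by this table (Studentized residuals below -2 or above +2) is easy to reproduce. The sketch below applies it to the Studentized residuals listed in the table of observed and fitted values earlier in this section, rows 3 to 17 only, since rows 1 and 2 are not reproduced in this excerpt. The function name is hypothetical.

```python
# Studentized residuals copied from the table of observed and fitted values
# (rows 3-17; rows 1 and 2 are not reproduced in this excerpt).
studentized = {
    3: -1.05572, 4: -1.95375, 5: -0.985092, 6: 0.0335971, 7: 2.84548,
    8: 1.31347, 9: -0.607342, 10: -0.0318201, 11: 0.39717, 12: -0.833662,
    13: 0.290357, 14: -0.279436, 15: -0.39717, 16: 1.29382, 17: -0.669541,
}


def unusual_rows(residuals, threshold=2.0):
    """Rows whose Studentized residual lies below -threshold or above +threshold."""
    return [row for row, r in sorted(residuals.items()) if abs(r) > threshold]


flagged = unusual_rows(studentized)        # the +/-2 rule used by the table
serious = unusual_rows(studentized, 3.0)   # the stricter +/-3 rule
print(flagged)  # [7]
print(serious)  # []
```

Row 4 (-1.95) falls just inside the +/-2 band, which is why only row 7 is listed.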
Diagnostic Plots
Several plots are also provided under Diagnostic Plots to examine the residuals from the fitted
model. The Pane Options dialog box displays the various choices, which include the following:
Observed versus Predicted

This plot displays the observed response Yi versus the fitted values Yi , together with a diagonal
line:
[Figure: Plot of yield (observed versus predicted)]
If the model fits well, the values should lie close to the line, as in the example above. Curvature
around the line may suggest the need to transform the values of Yi using a logarithm or similar
function.
Residual versus Predicted

This plot displays the residuals ei versus the fitted values Yi , with a horizontal line at zero:
[Figure: Residual Plot for yield (residual versus predicted)]
The residuals should vary randomly around the line. Changes in the magnitude of the residuals
from left to right may signal that the variance of the experimental error varies with the mean
level of the response. Such heteroscedasticity may frequently be eliminated by a variance-
stabilizing transformation such as a logarithm or a square root.
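The plot is the primary tool here, but a rough numeric companion can help when the pattern is borderline. The sketch below is a simple heuristic chosen for illustration, not a STATGRAPHICS statistic and not a formal test: it sorts the residuals by fitted value and compares the standard deviation of the upper half with that of the lower half. A ratio well above 1 is consistent with error variance that grows with the mean. The function name and data are hypothetical.

```python
import numpy as np


def spread_ratio(fitted, residuals):
    """Residual standard deviation for the upper half of the fitted values
    divided by that for the lower half."""
    order = np.argsort(fitted)
    r = np.asarray(residuals, dtype=float)[order]
    half = r.size // 2
    # With an odd number of runs the middle residual is left out of both halves.
    return np.std(r[-half:], ddof=1) / np.std(r[:half], ddof=1)


# Hypothetical residuals whose scatter widens as the fitted value increases.
fitted = np.linspace(50.0, 90.0, 10)
widening = np.array([0.2, -0.2, 0.3, -0.3, 0.2, -1.5, 1.6, -1.4, 1.5, -1.6])
steady = np.array([1.0, -1.0] * 5)

print(spread_ratio(fitted, widening))  # well above 1
print(spread_ratio(fitted, steady))    # 1.0
```

If the ratio is large, refitting the model to log(Y) or sqrt(Y) and recomputing the ratio is one way to judge whether the transformation has stabilized the variance.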
Residuals versus Run Order

This plot displays the residuals ei versus run number i, with a horizontal line at zero:
[Figure: residual versus run number]
Any non-random pattern may indicate a time trend or other effect. In such cases, addition of a
factor to account for the change may improve the fit of the model. The above plot does suggest
an increase in variability during the second half of the experiment, which would be worthy of
further investigation.
Residuals versus Factor
This plot displays the residuals ei versus the observed values of a selected experimental factor:
[Figure: residual versus temperature]
Any curvature in the pattern of residuals may suggest the need for a model with quadratic effects.
The above plot suggests that the variability amongst the replicated values at the centerpoint may
be somewhat less than that of the residuals at the low and high levels of temperature.
Normal Probability Plot of Residuals

This plot displays the residuals ei versus quantiles of a normal distribution, with an optional
fitted line as reference:
[Figure: Normal Probability Plot for Residuals (residuals versus percentage)]
If the experimental error follows a normal distribution, the points should lie along a straight line.
Pane Options
 Plot: the type of plot to be created.
 Plot versus: selects the experimental factor to be shown in the plot, for those plots where a
factor is needed.
 Direction: defines the orientation of the normal probability plot.
 Fitted Line: specifies whether a line should be fit to the data on the normal probability plot.
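The coordinates of a normal probability plot can be computed with the Python standard library alone. This sketch pairs each sorted residual with the standard normal quantile of the plotting position (i - 0.5)/n, which also gives the percentage shown on the horizontal axis. That plotting position is one common convention and is an assumption here, since this document does not state the exact formula STATGRAPHICS uses. The function name and data are hypothetical.

```python
from statistics import NormalDist


def normal_plot_points(residuals):
    """Return (normal quantile, percentage, residual) triples in plotting order."""
    r = sorted(residuals)
    n = len(r)
    std_normal = NormalDist()          # mean 0, standard deviation 1
    points = []
    for i, value in enumerate(r, start=1):
        prob = (i - 0.5) / n           # plotting position for the i-th smallest residual
        points.append((std_normal.inv_cdf(prob), 100.0 * prob, value))
    return points


# Hypothetical residuals; points lying near a straight line support normality.
demo = normal_plot_points([0.3, -1.2, 2.0, 0.1, -0.4])
for z, pct, e in demo:
    print(f"{z:6.3f}  {pct:5.1f}%  {e:5.2f}")
```

Plotting the residual against either the quantile or the percentage gives the same picture; the percentage scale is simply a relabeled quantile axis.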
Optimization
Each analysis window gives the ability to optimize an individual response variable. The
Optimization text pane will automatically calculate and display the optimal settings of the
experimental factors:
Optimize Response
Goal: maximize yield
Optimum value = 87.9101
Factor Low High Optimum

temperature 160.0 180.0 180.0
concentration 20.0 40.0 20.0
catalyst C
The table shows:
 Goal - the type of optimization desired, defined when the experiment was created.
 Optimum value - the predicted response at the optimum setting.
 Low - the low level of the region over which the optimization is performed.
 High - the high level of the region over which the optimization is performed.
 Optimum - the optimum setting of the experimental factors.
In the above example, yield has been maximized with respect to temperature, concentration and
catalyst.
Pane Options
 Factor – If the box for a factor is checked, it will be optimized over the indicated range if the
factor is continuous or over all levels if the factor is categorical. Otherwise, it will be
constrained to match the value specified in the Hold field.
 Low – lowest level considered for each factor.
 High – highest level considered for each factor.
 Hold – if not being optimized, the level at which each factor is set.
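The Factor, Low, High and Hold options describe a search region that mixes continuous ranges with categorical levels. The sketch below shows one simple way to search such a region, a brute-force grid search, under explicit assumptions: the model function and its coefficients are hypothetical placeholders (they are not the fitted yield model), and the optimization algorithm STATGRAPHICS actually uses is not described here and may differ.

```python
from itertools import product


def optimize(model, factors, steps=21):
    """Grid-search maximization of model(settings) over a mixed factor region.

    Each value in `factors` is one of:
      ("range", low, high)   continuous factor searched over [low, high]
      ("levels", [a, b, c])  categorical factor searched over all levels
      ("hold", value)        factor fixed at its Hold value
    """
    names, grids = [], []
    for name, spec in factors.items():
        kind = spec[0]
        if kind == "range":
            low, high = spec[1], spec[2]
            grid = [low + (high - low) * k / (steps - 1) for k in range(steps)]
        elif kind == "levels":
            grid = list(spec[1])
        elif kind == "hold":
            grid = [spec[1]]
        else:
            raise ValueError(f"unknown factor spec {spec!r}")
        names.append(name)
        grids.append(grid)
    best = max(product(*grids), key=lambda combo: model(dict(zip(names, combo))))
    settings = dict(zip(names, best))
    return settings, model(settings)


# Hypothetical model for illustration only; NOT the fitted yield model.
CATALYST_SHIFT = {"A": 0.0, "B": -1.0, "C": 3.0}


def toy_yield(s):
    return (60.0 + 0.5 * (s["temperature"] - 170.0)
            - 0.2 * (s["concentration"] - 30.0)
            + CATALYST_SHIFT[s["catalyst"]])


region = {
    "temperature": ("range", 160.0, 180.0),
    "concentration": ("range", 20.0, 40.0),
    "catalyst": ("levels", ["A", "B", "C"]),
}
best, value = optimize(toy_yield, region)
print(best)  # {'temperature': 180.0, 'concentration': 20.0, 'catalyst': 'C'}

held = dict(region, catalyst=("hold", "A"))   # catalyst unchecked and held at A
best_a, value_a = optimize(toy_yield, held)
```

A grid search is transparent but coarse; a finer grid or a local optimizer started from the best grid point would refine the continuous settings.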
Optimization
Step #9: Optimize responses
Once a statistical model has been developed for each response, the analyst may now determine
what combination of factors will yield the best results for all responses simultaneously.
Returning to the main DOE Wizard window and pressing the button labeled Step #9 begins a
search of the experimental region for the combination of the experimental factors that maximizes
the desirability of the result. To avoid stopping at a local optimum, a separate search is started
at each design point.
When the optimization is complete, a message similar to that shown below will be displayed:
The dialog box indicates the “Desirability” of the final result, based on a metric designed to
balance competing requirements of multiple responses (see the document titled DOE Wizard for
full details). The value displayed in this case indicates that the predicted yield at the optimum
factor settings is 39.55% of the distance between 80 and 100, which was the desired range
specified when the design was created.
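The 39.55% figure is simple arithmetic on a linear desirability scale. The sketch below assumes the one-sided linear form for a maximize goal (with no shape exponent) and, for several responses, the geometric mean as the overall desirability. Both are common conventions and are assumptions here, since the full metric is described in the separate DOE Wizard document. The function names are hypothetical.

```python
from math import prod


def desirability_maximize(y, low, high):
    """One-sided linear desirability for a 'maximize' goal:
    0 at or below `low`, 1 at or above `high`, linear in between."""
    if y <= low:
        return 0.0
    if y >= high:
        return 1.0
    return (y - low) / (high - low)


def overall_desirability(ds):
    """Geometric mean of individual desirabilities (assumed combination rule)."""
    return prod(ds) ** (1.0 / len(ds))


# Predicted yield at the optimum, with the desired range 80 to 100 from the text.
d = desirability_maximize(87.9101, 80.0, 100.0)
print(round(d, 4))                     # 0.3955
print(overall_desirability([d]) == d)  # with a single response, D equals d
```

The cap at 1 is consistent with the extrapolation table under Step 12, where a predicted yield of 100.186 is reported with a desirability of exactly 1.0.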
If you press OK, additional information will be added to the main DOE Wizard window:
Step 9: Optimize the responses

Response Values at Optimum
Response Prediction Lower 95.0% Limit Upper 95.0% Limit Desirability
yield 87.9101 85.757 90.0632 0.395506
Factor Settings at Optimum

Factor Setting
temperature 180.0
concentration 20.0
catalyst C
The table shows the estimated response at the optimal settings of the experimental factors. For
the sample data, it is estimated that the mean yield will equal 87.91 when the factors are set at
temperature = 180, concentration = 20, and catalyst = C. The 95% confidence interval for the
mean yield ranges between 85.76 and 90.06.
If you push the Tables and Graphs button on the analysis toolbar, you can display the estimated
desirability throughout the experimental region. An interesting type of display is the contoured
surface plot shown below (use Pane Options and the Factors button to select the factors to plot
on each axis):
[Figure: Desirability Plot for catalyst = C (desirability versus temperature and concentration)]
It is clear that the best place to operate is in the right front corner of the experimental region,
at high temperature and low concentration.
Step 10: Save results
The button labeled Step 10 allows you to save the results in a StatFolio:
In fact, the StatFolio can be saved at any point and reloaded at a later date.
IMPORTANT: When using the Experimental Design Wizard, two files are created:
1. An experiment file with the extension .sgx which stores information about the
experimental data.
2. A StatFolio with the extension .sgp that stores the results of the analysis.
If you move the experiment to another computer, be sure to transfer both files.
Step 11: Augment Design
Since the conclusions from the design are fairly clear, there is no need to augment the design.
Extrapolation
Step 12: Extrapolate
The maximum predicted yield within the design space is 87.91%. To use the statistical model to
predict settings of the factors outside the experimental region that might produce even better
results, press the button labeled Step 12. The following dialog box will be displayed:
 Start at: the position from which to start the search.
 Change: the factors you wish to consider changing. Only quantitative factors may be
checked.
 Display steps of: The program will begin at the starting location and follow the path of
steepest ascent in an attempt to increase the desirability of the predicted response. Specify the
increment of increased desirability at which the results should be displayed.
 Low and high: The limits within which the factors will be changed.
In this case, we have asked the program to search from the derived optimal conditions and to
display improvements of 1% in desirability.
The results of the search are shown in the following table, which will be added to the main DOE
Wizard window:
Step 12: Extrapolate model

Extrapolated Response Values
Step Desirability yield
0 0.395506 87.9101
1 0.412294 88.2459
2 0.429082 88.5816
3 0.44587 88.9174
4 0.462658 89.2532
5 0.479446 89.5889
6 0.496234 89.9247
7 0.513022 90.2604
8 0.52981 90.5962
9 0.546598 90.932
10 0.563386 91.2677
11 0.580174 91.6035
12 0.596962 91.9392
13 0.61375 92.275
14 0.630538 92.6108
15 0.647326 92.9465
16 0.664114 93.2823
17 0.680902 93.618
18 0.69769 93.9538
19 0.714478 94.2896
20 0.731266 94.6253
21 0.748054 94.9611
22 0.764843 95.2969
23 0.781631 95.6326
24 0.798419 95.9684
25 0.815207 96.3041
26 0.831995 96.6399
27 0.848783 96.9757
28 0.865571 97.3114
29 0.882359 97.6472
30 0.899147 97.9829
31 0.915935 98.3187
32 0.932723 98.6545
33 0.949511 98.9902
34 0.966299 99.326
35 0.983087 99.6617
36 0.999875 99.9975
37 1.0 100.186
Factor Settings for Extrapolation

Step temperature concentration catalyst
0 180.0 20.0 C
1 180.2 19.9706 C
2 180.4 19.9412 C
3 180.6 19.9119 C
4 180.8 19.8825 C
5 181.0 19.8531 C
6 181.2 19.8237 C
7 181.4 19.7943 C
8 181.6 19.7649 C
9 181.8 19.7356 C
10 182.0 19.7062 C
11 182.2 19.6768 C
12 182.4 19.6474 C
13 182.6 19.618 C
14 182.8 19.5886 C
15 183.0 19.5593 C
16 183.2 19.5299 C
17 183.4 19.5005 C
18 183.6 19.4711 C
19 183.8 19.4417 C
20 184.0 19.4123 C
21 184.2 19.383 C
22 184.4 19.3536 C
23 184.6 19.3242 C
24 184.8 19.2948 C
25 185.0 19.2654 C
26 185.2 19.236 C
27 185.4 19.2067 C
28 185.6 19.1773 C
29 185.8 19.1479 C
30 186.0 19.1185 C
31 186.2 19.0891 C
32 186.4 19.0598 C
33 186.6 19.0304 C
34 186.8 19.001 C
35 187.0 18.9716 C
36 187.2 18.9422 C
37 187.3 18.8422 C
The program suggests that the best course of action would be to increase temperature while
decreasing concentration. While the model predicts a yield of 100% at a temperature of 187.3
degrees and a concentration of 18.84%, confirmatory runs would be necessary to determine
whether the model gives reasonable predictions that far outside the experimental region.
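For a first-order (linear) model, the path of steepest ascent is a straight line in coded units that points along the vector of coefficients; the roughly constant increments in the table above are consistent with such a path. The sketch below traces one under stated assumptions: the slopes are hypothetical (not the fitted coefficients), factors are coded as X = 2(value - center)/(high - low), and the search stops at hypothetical coded limits. The program's own search follows the desirability and may bend when the model contains interaction or quadratic terms.

```python
import math


def steepest_ascent(slopes, start, step, low, high, max_steps):
    """Points on the steepest-ascent path of y = b0 + sum(b_k * x_k), coded units."""
    norm = math.sqrt(sum(b * b for b in slopes))
    direction = [b / norm for b in slopes]          # unit vector along the gradient
    path = [list(start)]
    for _ in range(max_steps):
        nxt = [x + step * u for x, u in zip(path[-1], direction)]
        if any(v < lo or v > hi for v, lo, hi in zip(nxt, low, high)):
            break                                   # stay inside the search limits
        path.append(nxt)
    return path


def to_actual(coded, low, high):
    """Invert the coding X = 2 (value - center) / (high - low)."""
    center, half_range = (low + high) / 2.0, (high - low) / 2.0
    return center + coded * half_range


# Hypothetical slopes (coded units) for temperature and concentration,
# starting from the optimum found above: temperature 180, concentration 20.
slopes = [1.0, -0.2]
path = steepest_ascent(slopes, start=[1.0, -1.0], step=0.05,
                       low=[-1.0, -2.0], high=[1.7, 1.0], max_steps=10)
for x_t, x_c in path:
    print(f"temperature {to_actual(x_t, 160, 180):6.2f}  "
          f"concentration {to_actual(x_c, 20, 40):5.2f}")
```

Each step raises temperature and lowers concentration in a fixed ratio set by the slopes, which is the same qualitative direction the program recommends. As the text notes, points this far outside the design still need confirmatory runs.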
