
Quality Herald, November - December 2011
www.asqdetroit.org

The Relationship of Alpha, Beta and Power in Design of Experiments

By Larry Scott

Design of Experiments (DOE) has rapidly become a tool of necessity for any organization with the need to improve a process or a product, or to efficiently develop new technologies. Numerous advances have been made over the past 20+ years, aided by the roll-out of Six Sigma programs and the availability of powerful, inexpensive software specific to the DOE methodology. An understanding of several basic concepts allows the user to be more effective in applying the DOE tools. One overarching concept is power: the probability of detecting a variable effect of at least Δy given an expected standard deviation of σ.

Power calculations have been applied routinely over the decades by statisticians and industrial engineers to estimate sample size. Today, commercially available software includes the power function as an independent calculation used in sampling and DOE replicate estimates. In the early 2000s, one software developer1 integrated the power metric directly into the DOE software, making it an essential component of the design evaluation process. This pre-test evaluation generates an estimate of test replicates for a given matrix design at a desired power level. Maintaining a reasonable power level, say 80% or greater, improves the likelihood of successfully identifying important variables.
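To make the replicate estimate concrete, here is a minimal sketch of such a pre-test sample-size calculation in Python. The statsmodels library stands in for the DOE package cited in the article, and the signal and noise values are hypothetical, chosen only for illustration.

```python
# A minimal pre-test sample-size sketch. statsmodels stands in for the
# DOE package cited in the article; the signal and noise values are
# hypothetical, chosen only for illustration.
from statsmodels.stats.power import TTestIndPower

signal = 0.5   # smallest effect worth detecting (assumed delta-y)
noise = 1.0    # expected standard deviation (assumed sigma)

# Solve for runs per group giving 80% power in a two-sided,
# two-sample t-test at the conventional 5% alpha.
n = TTestIndPower().solve_power(effect_size=signal / noise,
                                alpha=0.05, power=0.80,
                                alternative='two-sided')
print(f"Runs per group for 80% power: {n:.1f}")  # about 64
```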
The power concept ties to several statistical concepts, including hypothesis testing and the alpha and beta risk metrics. These statistical relationships have traditionally been considered difficult to apply without an extensive background in statistics or industrial engineering. However, integrated software that requires inputs such as power has become paramount in generating high-quality DOE designs.

The Power Relationship


In an effort to relate these statistical concepts to the application of DOE, some review is in order. The first tier illustration in Figure 1 represents a normal distribution curve with a mean of 10.0 and a standard deviation of 1.0, assigned a 90% confidence interval (CI). The region between the confidence limits, or the acceptance region, represents 90% of the distribution area, with the remaining area split evenly at 5% in each tail.

The combined areas of the tier 1 tails represent the alpha risk, in this case 10%. The alpha metric represents the risk of falsely detecting an effect. In other words, it reflects the probability of falsely rejecting the hypothesis test, H0: μ1 = μ2, implying a difference between the test mean (μ1 = 10.0) and the sample mean (μ2 = 10.0) when no difference exists. For this case, there is a 10% probability that an independent sampling would generate a mean outside the acceptance region.
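The tier 1 quantities can be reproduced in a few lines of Python; scipy is an assumption here, as the article performs this work in dedicated DOE software.

```python
# Tier 1 of Figure 1: normal distribution, mean 10.0, standard
# deviation 1.0, 90% acceptance region (5% alpha in each tail).
from scipy.stats import norm

mu, sigma, alpha = 10.0, 1.0, 0.10

# Two-sided acceptance region covering the central 90% of the area.
lower = norm.ppf(alpha / 2, loc=mu, scale=sigma)      # about 8.36
upper = norm.ppf(1 - alpha / 2, loc=mu, scale=sigma)  # about 11.64
print(f"Acceptance region: [{lower:.2f}, {upper:.2f}]")

# Probability a sample from the same population still lands outside
# the region: the alpha risk (type I error).
p_outside = norm.cdf(lower, mu, sigma) + norm.sf(upper, mu, sigma)
print(f"Alpha risk: {p_outside:.0%}")  # 10%
```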
Let's examine the hypothesis test options illustrated in Figure 1. In option 1, a sample mean (μ2) falls inside the acceptance region, where the null hypothesis (H0) fails to be rejected, indicating μ1 = μ2. In option 2, a sample mean (μ2) falls outside the acceptance region in support of the alternative hypothesis (Ha), implying a difference relative to the test mean (μ1), i.e. μ1 ≠ μ2. The alpha risk represents the probability that the sample mean falls outside the acceptance region when no difference exists between the sample and test means.
Responding to a false effect could prompt an effort to capture a non-existent improvement, resulting in wasted resources. This condition is referred to as a type I error. The power of this test is the probability that the null hypothesis will be rejected when it is indeed false.
A more common situation is illustrated in tier 2 of Figure 1. In this example, the new population mean of 10.5 clearly falls outside the acceptance region, but a portion of its distribution overlaps the acceptance region. A sample mean falling outside the acceptance region, where μ1 ≠ μ2, warrants rejection of the null hypothesis, H0: μ1 = μ2. This reflects the accurate conclusion that the new mean (10.5) is statistically different from the test mean (10.0).
The overlap region represents a possible sampling drawn from a population with a mean of 10.5 that nevertheless yields a sample mean falling within the acceptance region. The overlap represents the beta risk, the probability of failing to detect a real effect, which in this case is approximately 30%. This condition implies no difference between the sample mean and the test mean, μ1 = μ2, when a difference does exist, i.e. μ1 ≠ μ2. A failed detection would likely result in no effort to capture a potential improvement. This condition is referred to as a type II error.

For the tier 2 illustration, the area outside the acceptance region represents the power of the test, which is equal to 100% × (1 − β), or 70%. In other words, the probability of finding the variable effect is 70%, with a beta risk (β) of 30%.
The preferred power level for a designed experiment is equal to or greater than 80%. Eighty percent is a trade-off between resource usage, i.e. the number of tests or replicates, and the chance of missing a true difference in means.

The tier 3 distribution represents a high power condition. The beta risk for this case is the probability that a sample drawn from a population with a mean of 11.0 would yield a mean within the acceptance region, a possible but unlikely condition represented by the infinite reach of the left tail of the distribution across the acceptance region.
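A sketch of the tier 2 and tier 3 beta and power computations follows. The article does not state the sample size behind Figure 1, so the standard error of 0.23 below is an assumption chosen to reproduce the quoted ~30% beta; with a different standard error the same code yields different values.

```python
# Tiers 2 and 3 of Figure 1: beta risk and power for a shifted mean.
# The standard error of 0.23 is an assumption (the article does not
# give Figure 1's sample size) chosen to reproduce the quoted values.
from scipy.stats import norm

mu0, se, alpha = 10.0, 0.23, 0.10
lower = norm.ppf(alpha / 2, loc=mu0, scale=se)
upper = norm.ppf(1 - alpha / 2, loc=mu0, scale=se)

# Tier 2: beta is the chance a sample from the shifted population
# (mean 10.5) still lands inside the acceptance region (type II error).
beta = norm.cdf(upper, 10.5, se) - norm.cdf(lower, 10.5, se)
print(f"Beta: {beta:.0%}, Power: {1 - beta:.0%}")  # about 30% / 70%

# Tier 3: a larger shift (mean 11.0) drives beta toward zero.
beta3 = norm.cdf(upper, 11.0, se) - norm.cdf(lower, 11.0, se)
print(f"Tier 3 power: {1 - beta3:.0%}")  # essentially 100%
```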

Improving Power of a DOE


Power is a multi-dimensional concept that depends on the elements previously described: signal, noise, alpha, beta, etc. Management of these components offers opportunities to improve test power, subsequently improving the chances of a successful designed experiment. Several improvement options are outlined here2.

1) The size of the difference in the output response. The larger the response delta, or signal (Δy), of the system signal-to-noise ratio (Δy/σ), the greater the power. For a given noise level, a larger signal provides greater power.
2) The size of the experimental error. For a given signal (Δy), a smaller experimental error (σ), as represented by system noise, provides greater power. Options to reduce system noise run the gamut from clarifying standard operating procedures (SOPs) or simplifying operator work instructions, to reduce the need for procedural interpretation and minimize operator decisions, to implementing better equipment to reduce common cause variation.
3) The size of the alpha (α) risk. The alpha risk represents the risk of falsely detecting a variable effect. Most software packages default to an alpha of 5%, yielding a confidence of 95%. Depending on the phase of experimentation3, for example the discovery or screening phase, a sufficient power level can be achieved by selecting a larger alpha, say 10%. An alpha of 10% in an early phase of experimentation still provides a relatively high level of confidence (90%).
Alpha and beta risks are inversely related: decreasing alpha, say from 10% to 5%, while maintaining a constant sample size results in an increased beta, and vice versa. To decrease both α and β risks simultaneously, either increase the sample size through additional replicates or larger designs, or increase the delta between the sample and test means (Δy) during planning of the designed experiment (see the sketch following this list).
4) Apply an appropriate experimental design. An orthogonal or balanced design matrix provides a high level of power. However, as the number of input factors or levels increases, orthogonal designs become large, providing power to estimate coefficients for higher-order regression terms (e.g., ABC) that are likely insignificant, thereby wasting resources. The experimenter needs to understand design selection as it relates to the phase of experimentation, i.e. discovery, breakthrough, optimization and validation, as well as design resolution4, to optimize results with a minimum number of experimental treatments.
5) The number of replicates. Experiment replicates and repeats are often confused concepts. Replicates require re-setting an experimental treatment; repeats are simply a sequence of measurements taken from the same experimental replicate. Replicates provide a more accurate measure of system variation than repeats do, as well as higher power.
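As referenced in item 3, the sketch below illustrates the alpha/beta/sample-size trade-off using statsmodels; the signal-to-noise ratio of 0.8 is an assumed value for illustration, not a figure from the article.

```python
# Items 3 and 5 in code: for a fixed signal-to-noise ratio (an assumed
# delta-y/sigma of 0.8), tightening alpha lowers power at a fixed
# sample size, while extra replicates restore it.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
effect = 0.8  # assumed signal-to-noise ratio, delta-y / sigma

for a in (0.10, 0.05):
    p = analysis.power(effect_size=effect, nobs1=20, alpha=a)
    print(f"alpha={a:.2f}, n=20 per group -> power={p:.0%}")
# alpha=0.10 -> ~80% power; alpha=0.05 -> ~69% power

# Replicates per group needed to hold 80% power at the tighter alpha.
n = analysis.solve_power(effect_size=effect, alpha=0.05, power=0.80)
print(f"n per group for 80% power at alpha=0.05: {n:.1f}")  # about 26
```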

The benefits of a high quality DOE include the identification and quantification of significant effects, reduced resource usage through hidden replication, and the generation of useful empirical models. One key aspect of a quality design is its power to detect significant variable effects. Power relates to several statistical elements, including system signal and noise, as well as alpha and beta risks. The ability to optimize these components allows the experimenter to generate high-quality, robust designs that yield accurate models.
1. Design Expert Software, Stat-Ease, Inc., www.statease.com.
2. Experiment Design Made Easy, Stat-Ease, Inc. Training Manual, Version 18.06, Section 2, Page 19.
3. L. A. Scott, "Design of Experiments: Strategy," Quality Herald Newsletter, Volume 21, Issue 3, 2006.
4. L. A. Scott, "Design of Experiments: Array Designs," Quality Herald Newsletter, Volume 21, Issue 2, 2006.

