
1.3.5.14. Anderson-Darling Test http://www.itl.nist.gov/div898/handbook/eda/section3/eda35e.htm


1.3.5.14. Anderson-Darling Test

Purpose: Test for Distributional Adequacy
The Anderson-Darling test (Stephens, 1974) is used to test whether a sample of
data came from a population with a specific distribution. It is a modification
of the Kolmogorov-Smirnov (K-S) test that gives more weight to the tails than
the K-S test does. The K-S test is distribution free in the sense that its
critical values do not depend on the specific distribution being tested (note
that this is true only for a fully specified distribution, i.e., one whose
parameters are known). The Anderson-Darling test makes use of the specific
distribution in calculating critical values. This has the advantage of allowing
a more sensitive test and the disadvantage that critical values must be
calculated for each distribution. Currently, tables of critical values are
available for the normal, uniform, lognormal, exponential, Weibull, extreme
value type I, generalized Pareto, and logistic distributions. We do not provide
the tables of critical values in this Handbook (see Stephens 1974, 1976, 1977,
and 1979) since this test is usually applied with a statistical software
program that will print the relevant critical values.
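
The difference in tail weighting can be made explicit by writing both
statistics in terms of the empirical distribution function. The block below is
a sketch in standard textbook notation, not taken from this Handbook page: F_n
(the empirical distribution function of the n observations) and D (the K-S
statistic) are symbols introduced here for illustration. The factor
1 / (F(x) [1 - F(x)]) grows without bound as F(x) approaches 0 or 1, which is
what gives discrepancies in the tails more weight.

```latex
% Kolmogorov-Smirnov: largest absolute gap, weighted equally everywhere
D = \sup_x \left| F_n(x) - F(x) \right|

% Anderson-Darling: squared gap, weighted more heavily in the tails
A^2 = n \int_{-\infty}^{\infty}
      \frac{\left[ F_n(x) - F(x) \right]^2}{F(x)\,\left[ 1 - F(x) \right]} \, dF(x)
```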

The Anderson-Darling test is an alternative to the chi-square and
Kolmogorov-Smirnov goodness-of-fit tests.

Definition
The Anderson-Darling test is defined as:

H0: The data follow a specified distribution.
Ha: The data do not follow the specified distribution.
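Test Statistic:
The Anderson-Darling test statistic is defined as

    A^2 = -N - S

where

    S = \sum_{i=1}^{N} \frac{2i - 1}{N} \left[ \ln F(Y_i) + \ln\left( 1 - F(Y_{N+1-i}) \right) \right]

N is the sample size and F is the cumulative distribution function of the
specified distribution. Note that the Y_i are the ordered data.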
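
As an illustration of the definition, the following minimal Python sketch
computes the unadjusted statistic A^2 = -N - S for a fully specified
distribution. The function names anderson_darling and normal_cdf are
hypothetical helpers chosen for this example; they do not come from the
Handbook or from any particular library, and only the Python standard library
is assumed.

```python
import math


def normal_cdf(x, mu=0.0, sigma=1.0):
    """CDF of a normal distribution whose mean and sd are assumed known."""
    return 0.5 * math.erfc(-(x - mu) / (sigma * math.sqrt(2.0)))


def anderson_darling(data, cdf):
    """Return the unadjusted A^2 = -N - S for `data` against `cdf`.

    `cdf` must be fully specified and must return values strictly between
    0 and 1 for every observation, otherwise the logarithms are undefined.
    """
    y = sorted(data)                  # Y_1 <= ... <= Y_N, the ordered data
    n = len(y)
    s = 0.0
    for i in range(1, n + 1):         # i is 1-based, as in the formula
        f_low = cdf(y[i - 1])         # F(Y_i)
        f_high = cdf(y[n - i])        # F(Y_{N+1-i})
        s += (2 * i - 1) / n * (math.log(f_low) + math.log(1.0 - f_high))
    return -n - s


if __name__ == "__main__":
    sample = [-1.2, 0.3, 0.8, -0.4, 1.9, 0.1, -0.7]
    print(anderson_darling(sample, normal_cdf))
```

Sorting inside the function means the caller may pass the observations in any
order. Passing the CDF as an argument keeps the same code usable for any
hypothesized distribution.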
Significance Level: α


Critical Region:
The critical values for the Anderson-Darling test are dependent on the specific
distribution that is being tested. Tabulated values and formulas have been
published (Stephens, 1974, 1976, 1977, 1979) for a few specific distributions
(normal, lognormal, exponential, Weibull, logistic, extreme value type I). The
test is a one-sided test, and the hypothesis that the distribution is of a
specific form is rejected if the test statistic, A^2, is greater than the
critical value.

Note that for a given distribution, the Anderson-Darling statistic may be
multiplied by a constant (which usually depends on the sample size, N). These
constants are given in the various papers by Stephens. In the sample output
below, the test statistic values are adjusted. Also, be aware that different
constants (and therefore critical values) have been published. When using a
given set of critical values, apply the constant that was published with them
(the needed constant is typically given with the critical values).
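
To make the adjustment concrete, here is a hedged Python sketch for the normal
case with mean and standard deviation estimated from the data. It assumes the
commonly cited multiplier 1 + 0.75/N + 2.25/N^2, which is usually paired with a
5 % critical value of 0.752. The Handbook text does not state which constant
its output used, so treat this factor as an assumption. The function name
adjusted_a2_normal is hypothetical.

```python
import math
import statistics


def adjusted_a2_normal(data):
    """Return (A^2, adjusted A^2) for a normality test with estimated parameters.

    The adjustment factor 1 + 0.75/N + 2.25/N^2 is an assumption; other
    published constants exist and come with different critical values.
    """
    n = len(data)
    mean = statistics.fmean(data)
    sd = statistics.stdev(data)            # sample standard deviation (N - 1)
    z = sorted((x - mean) / sd for x in data)
    tiny = 1e-300                          # floor that keeps log() defined
    s = 0.0
    for i in range(1, n + 1):
        f = 0.5 * math.erfc(-z[i - 1] / math.sqrt(2.0))    # F(Y_i)
        sf = 0.5 * math.erfc(z[n - i] / math.sqrt(2.0))    # 1 - F(Y_{N+1-i})
        s += (2 * i - 1) / n * (math.log(max(f, tiny)) + math.log(max(sf, tiny)))
    a2 = -n - s
    factor = 1.0 + 0.75 / n + 2.25 / n ** 2
    return a2, a2 * factor


if __name__ == "__main__":
    sample = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 3.9, 4.4, 3.1]
    raw, adjusted = adjusted_a2_normal(sample)
    print(f"A^2 = {raw:.4f}, adjusted A^2 = {adjusted:.4f}")
```

The factor approaches 1 as N grows, so the adjustment matters mainly for small
samples. Whatever constant is used, the adjusted value must be compared only
with critical values published for that same constant.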

Sample Output
We generated 1,000 random numbers for normal, double exponential, Cauchy, and
lognormal distributions. In all four cases, the Anderson-Darling test was
applied to test for a normal distribution.

The normal random numbers were stored in the variable Y1, the double
exponential random numbers were stored in the variable Y2, the Cauchy
random numbers were stored in the variable Y3, and the lognormal random
numbers were stored in the variable Y4.

Distribution               Mean       Standard Deviation
-------------------------  ---------  ------------------
Normal (Y1)                0.004360    1.001816
Double Exponential (Y2)    0.020349    1.321627
Cauchy (Y3)                1.503854   35.130590
Lognormal (Y4)             1.518372    1.719969

H0: the data are normally distributed
Ha: the data are not normally distributed

Y1 adjusted test statistic: A^2 =   0.2576
Y2 adjusted test statistic: A^2 =   5.8492
Y3 adjusted test statistic: A^2 = 288.7863
Y4 adjusted test statistic: A^2 =  83.3935

Significance level: α = 0.05
Critical value: 0.752
Critical region: Reject H0 if A^2 > 0.752

When the data were generated using a normal distribution, the test statistic
was small and the hypothesis of normality was not rejected. When the data
were generated using the double exponential, Cauchy, and lognormal
distributions, the test statistics were large, and the hypothesis of an
underlying normal distribution was rejected at the 0.05 significance level.
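
An experiment of this kind can be sketched in Python as follows. This is not
the code used to produce the output above (the Handbook mentions Dataplot and
R). The seed, the distribution parameters, the helper names, and the
adjustment factor 1 + 0.75/N + 2.25/N^2 are all assumptions made for
illustration, so the printed statistics will differ from the values shown
above.

```python
import math
import random
import statistics

CRITICAL_5PCT = 0.752  # critical value quoted in the sample output


def adjusted_a2_normal(data):
    """Adjusted A^2 for normality with estimated mean and sd (assumed factor)."""
    n = len(data)
    mean = statistics.fmean(data)
    sd = statistics.stdev(data)
    z = sorted((x - mean) / sd for x in data)
    tiny = 1e-300                          # keeps log() defined for outliers
    s = 0.0
    for i in range(1, n + 1):
        f = 0.5 * math.erfc(-z[i - 1] / math.sqrt(2.0))    # F(Y_i)
        sf = 0.5 * math.erfc(z[n - i] / math.sqrt(2.0))    # 1 - F(Y_{N+1-i})
        s += (2 * i - 1) / n * (math.log(max(f, tiny)) + math.log(max(sf, tiny)))
    return (-n - s) * (1.0 + 0.75 / n + 2.25 / n ** 2)


def draw_samples(n=1000, seed=12345):
    """Draw n values from each of the four distributions (assumed parameters)."""
    rng = random.Random(seed)
    return {
        "Normal (Y1)": [rng.gauss(0.0, 1.0) for _ in range(n)],
        # The difference of two unit exponentials is a standard double exponential.
        "Double Exponential (Y2)": [
            rng.expovariate(1.0) - rng.expovariate(1.0) for _ in range(n)
        ],
        # Inverse-CDF draw from a standard Cauchy distribution.
        "Cauchy (Y3)": [math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)],
        "Lognormal (Y4)": [rng.lognormvariate(0.0, 1.0) for _ in range(n)],
    }


if __name__ == "__main__":
    for name, values in draw_samples().items():
        a2 = adjusted_a2_normal(values)
        verdict = "reject H0" if a2 > CRITICAL_5PCT else "do not reject H0"
        print(f"{name:<26} A^2 = {a2:10.4f}  {verdict}")
```

Because the test is run at the 0.05 level, a truly normal sample is still
rejected about 5 % of the time, so the verdict for Y1 depends on the seed.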


Questions
The Anderson-Darling test can be used to answer the following questions:

Are the data from a normal distribution?
Are the data from a lognormal distribution?
Are the data from a Weibull distribution?
Are the data from an exponential distribution?
Are the data from a logistic distribution?

Importance
Many statistical tests and procedures are based on specific distributional
assumptions. The assumption of normality is particularly common in
classical statistical tests. Much reliability modeling is based on the
assumption that the data follow a Weibull distribution.

There are many non-parametric and robust techniques that do not make
strong distributional assumptions. However, techniques based on specific
distributional assumptions are in general more powerful than non-parametric
and robust techniques. Therefore, if the distributional assumptions can be
validated, the techniques based on them are generally preferred.

Related Techniques
Chi-Square goodness-of-fit Test
Kolmogorov-Smirnov Test
Shapiro-Wilk Normality Test
Probability Plot
Probability Plot Correlation Coefficient Plot

Case Study
Josephson junction cryothermometry case study.

Software
The Anderson-Darling goodness-of-fit test is available in some general
purpose statistical software programs. Both Dataplot code and R code can be
used to generate the analyses in this section.
