
The Kolmogorov-Smirnov Test

Vasileios Hatzivassiloglou University of Texas at Dallas

Kolmogorov-Smirnov test
A fully non-parametric test for comparing two distributions
Does not depend on approximations for the distribution

Empirical distribution function


For a random variable X and a sample {x1, x2, ..., xn}, the empirical distribution function of X is defined as

$$F_X(x) = \frac{1}{n} \sum_{i=1}^{n} I(x_i \le x)$$

where I(condition) is the indicator function, i.e., 1 if the condition is true and 0 otherwise
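
The definition above translates almost directly into code. The following is a minimal Python sketch, not part of the original slides; the function name `ecdf` and the four-value sample are illustrative assumptions.

```python
import bisect


def ecdf(sample):
    """Return the empirical distribution function of a sample.

    The returned function F satisfies F(x) = (1/n) * #{i : x_i <= x}.
    """
    data = sorted(sample)
    n = len(data)

    def F(x):
        # bisect_right counts how many sorted values are <= x,
        # i.e. the sum of the indicators I(x_i <= x).
        return bisect.bisect_right(data, x) / n

    return F


# Hypothetical toy sample, chosen only so the counts are easy to check by hand.
F = ecdf([0.5, 1.5, 1.5, 3.0])
print(F(0.0))  # 0.0  (no value is <= 0.0)
print(F(1.5))  # 0.75 (three of the four values are <= 1.5)
print(F(3.0))  # 1.0  (all values are <= 3.0)
```

Sorting once and using binary search keeps each evaluation at O(log n); a direct sum over the indicators would give the same values.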

Example data
F_X is an estimate of the cumulative probability function of X

Consider the following example data:
{1.26, 0.34, 0.70, 1.75, 50.57, 1.55, 0.08, 0.42, 0.50, 3.20, 0.15, 0.49, 0.95, 0.24, 1.37, 0.17, 6.98, 0.10, 0.94, 0.38}
n = 20
Is this data normal?

Examining the data


Sorted data: {0.08, 0.10, 0.15, 0.17, 0.24, 0.34, 0.38, 0.42, 0.49, 0.50, 0.70, 0.94, 0.95, 1.26, 1.37, 1.55, 1.75, 3.20, 6.98, 50.57}
Mean: 3.61, Standard deviation: 11.2
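
The summary values can be reproduced with Python's standard library. This is a sketch added for illustration, not taken from the slides; the variable names are assumptions. The quoted standard deviation of 11.2 corresponds to the sample version (dividing by n - 1); the population version comes out slightly smaller.

```python
import statistics

# Example data from the slides (n = 20).
data = [1.26, 0.34, 0.70, 1.75, 50.57, 1.55, 0.08, 0.42, 0.50, 3.20,
        0.15, 0.49, 0.95, 0.24, 1.37, 0.17, 6.98, 0.10, 0.94, 0.38]

mean = statistics.mean(data)
sd_sample = statistics.stdev(data)       # divides by n - 1
sd_population = statistics.pstdev(data)  # divides by n

print(sorted(data))
print(round(mean, 2))           # 3.61
print(round(sd_sample, 1))      # 11.2
print(round(sd_population, 1))  # 10.9
```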

Examining the data for normality


For normal data,
about 16% of the samples should fall more than one s.d. below the mean (3.61 - 11.2 = -7.59); here none of the samples are even negative
about 2% should fall more than two standard deviations above the mean (3.61 + 2 × 11.2 = 26.01); here one of the 20 samples (5%) lies far beyond that value (50.57)
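
The informal check above can be written out as a short calculation. The sketch below is an illustration, not part of the slides: it uses the rounded mean and standard deviation quoted earlier, and compares the observed tail fractions with the tail probabilities of a standard normal distribution obtained from `statistics.NormalDist`.

```python
from statistics import NormalDist

# Example data from the slides (n = 20).
data = [1.26, 0.34, 0.70, 1.75, 50.57, 1.55, 0.08, 0.42, 0.50, 3.20,
        0.15, 0.49, 0.95, 0.24, 1.37, 0.17, 6.98, 0.10, 0.94, 0.38]
mean, sd = 3.61, 11.2  # rounded values quoted on the slides

n = len(data)
# Observed fractions in the two tails.
frac_below = sum(x < mean - sd for x in data) / n
frac_above = sum(x > mean + 2 * sd for x in data) / n

# Tail probabilities expected under a normal model.
std_normal = NormalDist()               # mean 0, s.d. 1
expected_below = std_normal.cdf(-1)     # P(Z < -1), roughly 0.159
expected_above = 1 - std_normal.cdf(2)  # P(Z > 2), roughly 0.023

print(f"below mean - 1 s.d.: observed {frac_below:.3f}, normal {expected_below:.3f}")
print(f"above mean + 2 s.d.: observed {frac_above:.3f}, normal {expected_above:.3f}")
```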

Example empirical distribution

Log transformation

The Kolmogorov-Smirnov test


Given two cumulative probability functions F_X and F_Y, the test statistics are

$$D^+ = \max_x \left( F_X(x) - F_Y(x) \right)$$

$$D^- = \max_x \left( F_Y(x) - F_X(x) \right)$$

Usually the value D=max{D+, D-} is used (although its distribution is harder to study than either D+ or D-)
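
As one possible illustration of these statistics, the sketch below takes F_X to be the empirical distribution function of the example data and F_Y to be a normal distribution with the mean and standard deviation quoted earlier. That choice of F_Y is an assumption made here, since the slides do not say which reference distribution was used, and the helper name `ks_statistics` is hypothetical. Only the statistics are computed; no significance level is derived.

```python
from statistics import NormalDist

# Example data from the slides (n = 20).
data = [1.26, 0.34, 0.70, 1.75, 50.57, 1.55, 0.08, 0.42, 0.50, 3.20,
        0.15, 0.49, 0.95, 0.24, 1.37, 0.17, 6.98, 0.10, 0.94, 0.38]


def ks_statistics(sample, cdf):
    """Return (D+, D-, D) between the sample's empirical distribution
    function F_X and a continuous reference distribution function F_Y."""
    xs = sorted(sample)
    n = len(xs)
    # F_X jumps from (i - 1)/n to i/n at the i-th sorted value, so
    # F_X - F_Y peaks at a data point (just after the jump) and
    # F_Y - F_X peaks just before a data point.
    d_plus = max(i / n - cdf(x) for i, x in enumerate(xs, start=1))
    d_minus = max(cdf(x) - (i - 1) / n for i, x in enumerate(xs, start=1))
    return d_plus, d_minus, max(d_plus, d_minus)


# Assumption: reference normal with the rounded mean and s.d. from the slides.
reference = NormalDist(mu=3.61, sigma=11.2)
d_plus, d_minus, d = ks_statistics(data, reference.cdf)
print(d_plus, d_minus, d)
```

Because the reference parameters are estimated from the same data, standard Kolmogorov-Smirnov critical values would not apply directly to this D.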

Comparing distributions


Advantages of Kolmogorov-Smirnov
It is non-parametric and hence robust
It does not rely only on the location of the mean (as the t-test does)
It works for non-normal data (the t-test can fail if the data is too far from normal)
It is not sensitive to scaling
It is more powerful than the χ² test
However, it is less sensitive than the t-test if the data is indeed normal
