There has been a release of toxic substances at a factory. The leak was measured with 2 sensors across 7 different sites for up to 4 hours, and once an hour before the accident.

Our main goal is to answer the following questions:
1. Are the measured PPM values at the different sites worse than the normal level?
2. If so, which sites are really affected?

The normal PPM levels are given by the Before site. To determine whether the PPM values from the other sites are worse than normal, we compared each hourly measurement made after the accident with the hour from before the accident. If most hours of a given site appear worse, we can say that the site is generally affected.

Overall Structure
The general order in which we proceeded to analyse the data is as follows.

Hampel's test checks for outliers based on the deviation of each observation from the median. We performed Hampel's test per sample. In the above example the Z value is greater than 3.5, hence the observation is an outlier. We use the result of Hampel's test for the rest of our analysis because it detects outliers better, and detects more of them, than Tukey's.

Advanced Normality tests
Most of the samples appeared to have outliers, so we performed the rest of the tests with the outliers removed. The above example represents the set of thorough normality tests we performed on each sample. If the p-value of any of the tests is >= 0.05, the sample could be normal. But this gives rise to the issue that some tests may accept normality while others reject it. To avoid this, we prioritise the result of each test by its strength, so that a consensus can be drawn from this order: Shapiro-Wilk > Anderson-Darling > Cramer-von Mises > Kolmogorov-Smirnov. In the above example, all the tests accept normality, hence the sample is normal. The result of this step allows us to decide whether to use parametric or non-parametric (distribution-free) tests for comparing each sample against the Before site.

Levene's test is a non-parametric test for variances. The null hypothesis is that the variances of the two samples are equal. The above figure is a table of Site-Hour samples which have the same variance as the Before site: their p-values are clearly >= 0.05, so the null hypothesis is not rejected. The rest of the samples have a different spread.

Wilcoxon rank sum test
The K-S test is sensitive to any difference in the underlying distributions of the two samples. A substantial difference in shape, spread or median results in a small p-value. Therefore we applied the Wilcoxon rank sum test to detect the actual location shift. It tests the equality of the medians of two given samples. We found that only two samples (above table) have the same median as the Before site. One important thing to note is that this test makes the strong assumption that the two input samples have equal variance, and the Levene's test results above show that the samples in the table do have equal variance.

t-test
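As a rough illustration of two steps described in this section, the Hampel outlier screen and the normality priority order that selects between the t-test and the Wilcoxon rank sum test, a minimal Python sketch could look like the following. It is not the code used for the analysis. The function names, the 1.4826 scaling of the median absolute deviation, the rule that both samples must look normal before a parametric test is chosen, and all numbers are assumptions made for illustration.

```python
from statistics import median

# Priority order stated in the text, strongest normality test first.
PRIORITY = ["Shapiro-Wilk", "Anderson-Darling", "Cramer-von Mises", "Kolmogorov-Smirnov"]


def hampel_outliers(values, threshold=3.5):
    """Return observations whose scaled deviation from the median exceeds threshold."""
    med = median(values)
    mad = median(abs(v - med) for v in values)  # median absolute deviation
    if mad == 0:
        raise ValueError("MAD is zero; the Z value is undefined for this sample")
    # Assumption: 1.4826 makes the MAD comparable to a standard deviation.
    return [v for v in values if abs(v - med) / (1.4826 * mad) > threshold]


def is_normal(p_values, alpha=0.05):
    """Verdict of the highest-priority normality test present in p_values."""
    for name in PRIORITY:
        if name in p_values:
            return p_values[name] >= alpha
    raise ValueError("no recognised normality test in p_values")


def choose_comparison(sample_p, before_p):
    """Pick the test family for comparing one sample against the Before site."""
    # Assumption: the parametric test is used only when both samples look normal.
    if is_normal(sample_p) and is_normal(before_p):
        return "t-test"
    return "Wilcoxon rank sum test"


# Illustrative numbers only, not taken from the factory data.
readings = [10, 11, 10, 12, 11, 10, 50]
print(hampel_outliers(readings))

sample_p = {"Shapiro-Wilk": 0.03, "Kolmogorov-Smirnov": 0.40}
before_p = {"Shapiro-Wilk": 0.21, "Anderson-Darling": 0.18}
print(choose_comparison(sample_p, before_p))
```

In the second call the Kolmogorov-Smirnov p-value alone would accept normality, but the higher-priority Shapiro-Wilk result rejects it, so the sketch falls back to the non-parametric comparison.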