Accred Qual Assur (2009) 14:73–78
DOI 10.1007/s00769-008-0457-8

GENERAL PAPER

The relationship between accreditation status and performance in a proficiency test


Michael Thompson · Kenneth Mathieson · Linda Owen · Andrew P. Damant · Roger Wood

Received: 1 May 2008 / Accepted: 28 August 2008 / Published online: 16 September 2008
© Springer-Verlag 2008

Abstract In the FAPAS proficiency testing scheme, participants are asked to state whether the analytical method used was accredited or not accredited. It is thus possible to compare the stated accreditation status with performance in the scheme. For this purpose, fifty qualifying examples of analyte/test material combination were selected at random from the reports from the year 2006. The accredited and non-accredited subsets of results from each example were subjected to a statistical analysis to determine whether any significant differences between the distributions of results could be detected. Outliers were removed from the datasets before the main statistical tests and considered separately. The inlying data were subjected to non-parametric tests for differences in central tendency and dispersion. A few significant examples were found, but could be reasonably attributed to chance. Among the inliers there were no grounds to reject the overall null hypothesis, that is, that accreditation has no effect on performance. However, the proportion of outliers was about twice as high among the non-accredited group.

Keywords Proficiency test · Accreditation status · Performance

Introduction

The relationship between accreditation status and performance in proficiency tests is of considerable interest. Accreditation usually requires participation in a proficiency test where one is available. However, not all participants are accredited, and this circumstance allows performance to be compared between accredited and non-accredited participants where the information is available. Such studies have been previously reported [1–3]. Of course, it is not safe naively to equate performance in a proficiency test with general performance.

Prima facie we would expect laboratories using accredited methods to perform better, as their results are regularly subjected to third-party scrutiny. However, accreditation agencies seldom make such a claim and there are indeed cogent reasons why differences between subsets might be small. For example, most non-accredited laboratories would be very much aware of the quality requirements in their own sector and may be working to standards that mimic accreditation. Non-accredited participants may well have accreditation as a goal for the near future. Moreover, they will be trying to comply with the scheme's performance criterion, where such exists. Given these features, it is not surprising that some previous studies have failed to find any noteworthy differences between the accredited and non-accredited subsets. In contrast, however, one study [3] has reported that accredited laboratories performed substantially better.

M. Thompson (✉)
School of Biological and Chemical Sciences, Birkbeck (University of London), Malet Street, London WC1E 7HX, UK
e-mail: m.thompson@bbk.ac.uk

K. Mathieson · L. Owen
FAPAS, Central Science Laboratory, Sand Hutton, York, YO41 1LZ, UK

A. P. Damant
Food Standards Agency, Aviation House, 125 Kingsway, London WC2B 6NH, UK

R. Wood
Food Standards Agency, c/o Lincolne, Sutton and Wood Ltd, 70-80 Oak Street, Norwich NR3 3AQ, UK


These reported studies, however, have been based on results extracted from a limited range of analytical methods and test materials. Moreover, some of the conclusions were based on an analysis of scored and classified results, which may have reduced the information content potentially available in the raw results. The present study was designed to be as general as possible, by using raw results from a large number of different analyte/matrix combinations from a single proficiency testing scheme.

Selection of datasets

Results from 50 different combinations of analyte and test material were taken for study from results of the FAPAS proficiency testing scheme for food analysis (FAPAS Secretariat, Central Science Laboratory, Sand Hutton, York, YO41 1LZ, UK). Results were selected initially by listing in a random order all of the 206 FAPAS reports from the year 2006. Qualifying reports and analytes were then taken in order from the randomised list. To qualify, reports needed the following properties:

• more than 20 rounds of the test had previously taken place, so that the selected tests could be regarded as stabilised in their characteristics;
• more than 14 participants reported results in each of the accredited and non-accredited subsets, to ensure that statistical tests had a reasonable power;
• reported results were largely or wholly on interval or ratio scales of measurement;
• the measurand was a chemical entity;
• reported results had a sufficient number of significant figures in comparison with the dispersion to make a statistical analysis meaningful.

Repeated selection of the same analyte/test material combination was avoided if possible. Because of a dearth of qualifying instances, however, a small amount of duplication was necessary. Many of the reports included results for several analytes. Usually in such instances the results were derived from different analytical methods and were for all practical purposes independent, even though the participants were mostly common. In most instances, individual analytes were excluded from the study if there was a strong correlation between the results, as sometimes occurs when several analytes are determined simultaneously in a single procedure. In some instances, studies with fewer than 15 participants in either group were included when other analytes in the report qualified or some results were excluded as outliers from the main statistical analysis. All of the reported raw results in selected reports and analytes were considered in the study.

The accreditation status of each participant was determined by the response to the question "Is your laboratory accredited for the method used?", to which participants could only answer "yes" or "no" or refrain from answering. No further information on accreditation status was available. There were no apparent reasons to assume that the stated accreditation status of a method was incorrect or acting as a proxy for a different attribute of the participants.

Statistical analysis

Each combination of test material and analyte was considered separately. Outliers were defined as results falling outside the limits $\hat{\mu} \pm 3.5\hat{\sigma}$, where the statistics were estimated from the whole dataset by Huber's H15 procedure. The factor 3.5 is, of course, arbitrary, but the value was selected because of the tendency of results from proficiency tests to be heavily tailed but to include a small proportion of extreme outliers. Exclusion of the results in the heavy tails by the use of a smaller factor would have resulted in inappropriately small variance estimates (which were not robustified in subsequent tests). Outliers thus identified were excluded from the main statistical analysis, but examined separately.

The inlying results were classified into the subsets (a) accredited, (b) not accredited, and (c) unspecified, and the results were first examined visually (see Fig. 1 for an example) to ensure that no unexpected circumstance would be likely to affect the outcome of the subsequent statistical analysis. No statistical examination of the unspecified group was undertaken: grounds for assuming that the unspecified group was part of the non-accredited group were considered inadequate to justify combining them. However, the unspecified group was examined visually but offered no unexpected features that suggested more detailed examination.

Fig. 1 Results for arsenic (ppm mass fraction) in FAPAS according to accreditation status. The results show a strong deviation from the normal distribution, which is not related to status
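The outlier rule described under Statistical analysis can be illustrated with a short sketch. This is not the authors' implementation: it assumes that the H15 estimates are obtained by the commonly used iterative winsorisation at 1.5 robust standard deviations, with the correction factor 1.134, started from the median and the scaled median absolute deviation. The function names huber_h15 and split_outliers are hypothetical.

```python
import statistics


def huber_h15(values, c=1.5, tol=1e-6, max_iter=200):
    """Robust mean and standard deviation by iterative winsorisation (assumed H15 form)."""
    mu = statistics.median(values)
    # Starting scale: median absolute deviation scaled for the normal distribution.
    sigma = 1.4826 * statistics.median([abs(x - mu) for x in values])
    if sigma == 0:
        return mu, 0.0  # degenerate case: more than half the results are identical
    for _ in range(max_iter):
        lo, hi = mu - c * sigma, mu + c * sigma
        # Pull results beyond mu +/- c*sigma back to those limits.
        winsorised = [min(max(x, lo), hi) for x in values]
        new_mu = statistics.mean(winsorised)
        # 1.134 compensates for the variance lost by winsorising at c = 1.5.
        new_sigma = 1.134 * statistics.stdev(winsorised)
        converged = (abs(new_mu - mu) <= tol * new_sigma
                     and abs(new_sigma - sigma) <= tol * new_sigma)
        mu, sigma = new_mu, new_sigma
        if converged or sigma == 0:
            break
    return mu, sigma


def split_outliers(values, k=3.5):
    """Separate results inside and outside the limits mu +/- k*sigma."""
    mu, sigma = huber_h15(values)
    lo, hi = mu - k * sigma, mu + k * sigma
    inliers = [x for x in values if lo <= x <= hi]
    outliers = [x for x in values if not lo <= x <= hi]
    return inliers, outliers


results = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 10.1, 9.9, 25.0]
inliers, outliers = split_outliers(results)
print(outliers)
```

The example data are invented for illustration. Because the limits are computed from the whole dataset, the same limits apply to accredited and non-accredited results alike.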


Table 1 Results of statistical analysis

Column headings: FAPAS round, Test material, Analyte, Mass fraction or unit, Concentration, No. observations (Acc, Not), No. outliers (Acc, Not), Median difference (Value, p), Variance ratio (Value, p)

Mass fraction or unit: % % % % % % % % % ppm ppb ppb ppb ppb % % % ppb ppm ppm ppm ppm ppm ppm ppm ppm ppb ppb mEq kg-1 % mg L-1 mg L-1 mg L-1 ppb ppb ppb mg L-1 ppb ppb ppb ppb ppb ppb

Concentration: 52.42 3.98 25.32 2.043 1.33 1.92 35.32 29.05 5.5 394 1.28 0.682 1.41 0.37 7.233 60.48 31.83 66.5 8.35 1.37 16.2 5.92 6.04 10.1 15.35 3.91 2282 686 10.86 0.25 266 42.2 21.1 2.53 6.09 118 192 96.5 1197 33.2 267 307 427 196

No. observations, Acc: 78 68 49 72 37 43 22 21 21 37 32 30 31 29 44 41 40 32 19 37 29 24 17 28 32 21 25 24 30 18 17 16 22 33 43 27 28 17 24 39 19 32 33 22

No. observations, Not: 23 27 9 21 22 26 14 15 18 17 17 14 14 11 11 11 11 12 18 18 19 18 18 16 13 21 15 11 16 23 12 14 14 21 25 23 15 18 25 18 22 25 22 21

No. outliers, Acc: 1 3 2 4 1 4 0 1 0 1 0 0 0 1 2 3 3 2 0 2 0 3 0 0 0 0 0 0 0 0 2 2 1 1 0 2 0 0 2 1 0 1 1 2

No. outliers, Not: 1 2 1 1 3 0 2 1 0 1 1 2 3 3 0 0 0 5 1 0 0 0 1 1 1 0 0 0 1 1 4 3 1 2 1 0 1 1 2 0 1 0 2 1

Median difference, Value: 0.14 -0.04 -0.03 -0.01 0.005 -0.03 0.21 0.34 -0.34 64.7 0 -0.02 0.09 0.02 0.101 0 0 -4.2 0.49 0.05 -0.22 0.05 0.035 -0.06 0.03 -0.15 496 86 0.126 -0.01 22.37 2.17 0.48 -0.03 -0.1 4 -6.6 -1.16 63.5 -2 3 1.64 30.5 8.45

Median difference, p: 0.18 0.01 0.88 0.28 0.83 0.33 0.57 0.35 0.19 0.97 0.69 0.51 0.68 0.192 1 0.97 0.42 0.41 0.35 0.74 0.32 0.84 0.52 0.62 0.24 0.07 0.23 0.64 0.33 0.061 0.48 0.82 0.84 0.69 0.6 0.276 0.226 0.315 0.9 0.92 0.37 0.16

Variance ratio Value p 0.77 1 1.24 1.05 2.23 0.81 2.17 0.57 1.02 0.45 0.42 0.72 1.32 3.61 2.28 1.48 0.19 0.77 0.47 0.54 0.41 0.52 1.03 0.75 0.83 0.72 0.47 0.45 0.72 0.39 0.967 0.34 0.39 1.13 0.872 3.31 0.48 0.79 1.01 1.88 0.62 0.52 0.49 0.92 0.56 0.83 0.051 0.67 0.21 0.54 0.21 0.608 0.14 0.21 0.62 0.538 0.0015 0.041 0.29 0.033 0.57 0.17 0.22 0.19 0.25 0.89 0.81 0.68 0.49 0.2 0.34 0.58 0.62 0.26 0.97 0.023 0.078 0.75 0.9 0.072 0.059 0.6 0.988 0.0925 0.3 0.28

0148

Meat

Moisture Ash Fat Nitrogen Na Cl

2811

Honey

Fructose Glucose Sucrose

2046 0481

Meat Meat

SO2 Aflatoxin B1 Aflatoxin B2 Aflatoxin G1 Aflatoxin G2

0.0003 1.24

1445

Oils/fats

Saturated FA Monounsat FA Polyunsat FA

0762

Crab meat

Hg As Cd Cu

1054

Feeding stuff Ash Crude fibre Moisture Protein Total oil

2228 1448 0362

Maize Cooking oil Soft drink

Fumonisin B1 Fumonisin B2 Peroxide value Acidity Allura red Carmoisine Sunset yellow

0.0074 0.67

1746 1749 0272 0359 2229 0544 0770

Cereal Cereal Pig muscle Coffee Wheat flour Soya flour

Ochratoxin Ochratoxin Chlortetracycline Caffeine Deoxynivalenol Total As Cd Pb Total Hg

Sulfadimethoxine ppb

Vegetable oil Gamma-HCH


Table 1 continued

FAPAS  Test         Analyte       Mass fraction  Concentration  No. observations  No. outliers  Median difference  Variance ratio
round  material                   or unit                       Acc     Not       Acc    Not    Value     p        Value   p
0150   Canned meat  Moisture      %              75.87          62      18        1      0      0.13      0.12     1.04    0.83
                    Ash           %              0.707          53      17        4      2      0         0.98     0.99    0.87
                    Na            %              0.194          31      16        0      3      0.003     0.56     0.64    0.5
                    Cl            %              0.285          37      18        0      2      -0.01     0.59     1.02    0.9
0360   Tonic water  Acesulfame-K  mg L-1         39.3           27      12        2      2      -0.155    0.75     1.58    0.32
                    Aspartame     mg L-1         153            24      15        1      1      2.95      0.39     4.8     0.0035

Concentrations are expressed as mass fraction unless otherwise stated. FA, fatty acid

The accredited and not-accredited subsets were examined for differences between the central tendencies and between the dispersions. First they were subjected to the Mann–Whitney test of the difference between the medians. A non-parametric test was preferred because of the considerable deviation of some of the subsets from the normal distribution (see for example Fig. 1). They were then subjected to the variance ratio test, but with p values estimated by the bootstrap, as in some instances they would otherwise be unduly affected by non-normality. The results are shown in Table 1. (The subsets were also subjected to the Kolmogorov–Smirnov two-sample test (example in Fig. 2) but, in the event, this test was more conservative than the previous combination and provided no extra information.)

When the 50 sets of statistics were accumulated, they were subjected to an overview, (a) by comparing the proportions of outliers (as defined above) deleted from the accredited and not-accredited groups, and (b) by examining the sets of 50 collected p values. The p value is the probability of observing the actual test statistic, or a more extreme value, if the null hypothesis is true. Under that circumstance, the collected p values would be a random sample from a uniform distribution over the range (0, 1), that is, p ~ U(0, 1). This was tested via the Kolmogorov–Smirnov one-sample test.
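The text does not specify how the bootstrap p values for the variance ratio were obtained. As one possible reading, and purely as an illustrative sketch, the following assumes a resampling scheme in which each subset is centred on its own mean, the centred values are pooled, and two groups of the original sizes are repeatedly drawn with replacement from the pool. The function name, the number of resamples and the seed are all assumptions.

```python
import math
import random
import statistics


def variance_ratio_bootstrap_p(a, b, n_boot=2000, seed=0):
    """Two-sided bootstrap p value for the ratio of two sample variances."""
    rng = random.Random(seed)
    # Work with |log ratio| so that ratios above and below unity count equally.
    observed = abs(math.log(statistics.variance(a) / statistics.variance(b)))
    mean_a, mean_b = statistics.mean(a), statistics.mean(b)
    # Under the null hypothesis both groups share one dispersion,
    # so the centred values are pooled before resampling.
    pooled = [x - mean_a for x in a] + [x - mean_b for x in b]
    valid = 0
    at_least_as_extreme = 0
    for _ in range(n_boot):
        var_a = statistics.variance(rng.choices(pooled, k=len(a)))
        var_b = statistics.variance(rng.choices(pooled, k=len(b)))
        if var_a == 0 or var_b == 0:
            continue  # skip degenerate resamples with no spread
        valid += 1
        if abs(math.log(var_a / var_b)) >= observed:
            at_least_as_extreme += 1
    # Adding one to numerator and denominator avoids reporting p = 0.
    return (at_least_as_extreme + 1) / (valid + 1)


wide = [float(i) for i in range(-10, 11)]
narrow = [i / 100 for i in range(-10, 11)]
print(variance_ratio_bootstrap_p(wide, narrow, n_boot=500))
```

The example data are invented. The point of the resampling is that the reference distribution of the ratio comes from the data themselves rather than from the F distribution, which presumes normality.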

Results and discussion

Central tendency (inlying results)

Of the 50 individual studies, three gave rise to median differences that were significant at a confidence level of 95%, in comparison with the expectation of 2.5 instances under a general null hypothesis. Of these, one difference (ash in Round 0148) was of small and inconsequential magnitude for practical purposes (a relative difference of 1% between medians) and the significance was the outcome of an unusually small dispersion of results. The other two occurred where the particular analyte was seldom subjected to proficiency testing, although the series itself (including many possible analytes) was well established.

When the results were considered as a whole, the p values were distributed in a manner not visually distinguishable from a random sample from a uniform distribution of range (0, 1) (Fig. 3), which is what would be expected under the general null hypothesis. The Kolmogorov–Smirnov one-sample statistic provided a two-tailed probability of p > 0.2 (Fig. 4). Thus there were no grounds for rejecting the null hypothesis that accredited and non-accredited laboratories in this study provided results with identical medians. This outcome is not surprising, as there were no obvious reasons for suspecting a difference of the central tendency between accredited and non-accredited laboratories. One possible cause might have been the use of different analytical methods in the two subsets of laboratories. No such influence was evident in this study in those instances where a p value of less than 0.05 was apparent.
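The expectation of 2.5 instances is simply 50 × 0.05. How unusual three significant results would be can be gauged with a binomial calculation, sketched below under the simplifying assumption that the 50 tests are independent, which the selection of several analytes from one report only approximately satisfies. The function name is hypothetical.

```python
from math import comb


def prob_at_least(k, n=50, alpha=0.05):
    """P(X >= k) for X ~ Binomial(n, alpha): chance of k or more nominally significant tests."""
    below = sum(comb(n, i) * alpha**i * (1 - alpha) ** (n - i) for i in range(k))
    return 1.0 - below


print(50 * 0.05)                    # expected number of significant results under the null
print(round(prob_at_least(3), 2))   # chance of three or more out of 50
```

Under these assumptions the chance of three or more significant results is roughly 0.46, so the observed count is unremarkable. The same call with k = 5 applies to the count reported for the variance ratio in the Dispersion section.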

Fig. 2 Cumulative distributions of results for arsenic (ppm mass fraction) from FAPAS, showing the format for the two-sample Kolmogorov–Smirnov test


Fig. 3 Probability values from the significance tests for median difference equal to zero, and variance ratio equal to unity. The probabilities refer to the observation of data at least as extreme as those encountered, under the assumptions of the null hypothesis, independence and randomness
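The comparison of a set of collected p values with the uniform distribution U(0, 1) can be sketched as follows. The D statistic is the standard one-sample Kolmogorov–Smirnov quantity. The probability is taken from the asymptotic Kolmogorov series with a common small-sample correction, which is an assumption made here and not necessarily the procedure used in the study. Function names are hypothetical.

```python
import math


def ks_uniform_statistic(p_values):
    """Maximum vertical distance D between the empirical CDF and that of U(0, 1)."""
    xs = sorted(p_values)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        # The uniform CDF at x is x itself; check just after and just before each step.
        d = max(d, i / n - x, x - (i - 1) / n)
    return d


def ks_asymptotic_p(d, n, terms=1000):
    """Approximate two-tailed probability for D from the Kolmogorov series."""
    lam = (math.sqrt(n) + 0.12 + 0.11 / math.sqrt(n)) * d
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                for k in range(1, terms + 1))
    # Clamp to [0, 1] to guard against rounding in the alternating sum.
    return min(1.0, max(0.0, 2.0 * total))


# Invented p values, evenly spread over (0, 1), as expected under the null hypothesis.
example = [(i + 0.5) / 50 for i in range(50)]
d = ks_uniform_statistic(example)
print(d, ks_asymptotic_p(d, len(example)))
```

An excess of small p values would show up as a large D located near the lower end of the range, which is why the position of the maximum deviation is informative as well as its size.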

Dispersion

Variance ratio among inliers

Of the 50 individual studies, five provided variance ratios that were significantly different from unity at a confidence level of 95%, in comparison with an expectation of 2.5 instances. Of these, three showed a greater variance among the accredited laboratories. In only two instances did the non-accredited participants produce results with a greater variance. With the small numbers of observations, the power of these 50 individual tests was not high. However, a higher power could be achieved by considering the results overall. As with medians, under the general null hypothesis, we would expect the distribution of collected p values to be p ~ U(0, 1). When the results were thus considered, the p values (Fig. 3) were not visually distinguishable from a random sample and the Kolmogorov–Smirnov one-sample statistic provided a two-tailed probability of 0.05 < p < 0.1. Moreover, the D-statistic (the maximum vertical distance between the cumulative probability distributions) occurred towards the centre of the distribution (Fig. 5) so was not indicative of a possible excess of p values less than 0.05. Thus, in this study, there were no grounds for rejecting the general null hypothesis, namely that the dispersions of results from accredited and non-accredited laboratories were identical.

Fig. 5 Cumulative distribution of two-tailed p values from the variance ratio test for difference between dispersions compared with the uniform distribution expected under the general null hypothesis. The maximum deviation (D) occurs towards the centre of the distribution

Proportion of outliers

The numbers of outliers in individual studies were too small to determine whether the proportions in the accredited and the not-accredited groups were significantly different. When the 50 studies were examined as a whole, however, the proportion of outliers was higher in the not-accredited group (7.0%, 61/870) than the accredited group (3.4%, 56/1637). It is difficult to test these proportions rigorously for significant difference because they accumulated from 50 separate binomial distributions with different unknown probabilities. However, an approximate test produced a two-sided probability of 0.05, so there is little room for doubt that the difference between accredited and non-accredited methods is real in this respect.
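The approximate test applied to the outlier proportions is not described. One simple candidate, shown here only as a sketch, is the pooled two-proportion z-test applied to the overall counts. It treats each group as a single binomial sample and so ignores exactly the heterogeneity between the 50 studies that is noted above. It should not be read as the calculation behind the reported probability. The function name is hypothetical.

```python
import math


def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion z statistic and its two-sided normal probability."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    # Standard error of the difference under the null of one common proportion.
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # erfc(|z| / sqrt(2)) equals the two-sided tail area of the standard normal.
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return z, p_two_sided


# Counts taken from the text: 61 of 870 not-accredited, 56 of 1637 accredited.
z, p = two_proportion_z(61, 870, 56, 1637)
print(round(61 / 870, 3), round(56 / 1637, 3), z, p)
```

A stratified analysis, for example one that compares the two groups within each study before combining, would respect the structure of the data better at the cost of very small counts per stratum.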

Conclusions

No differences between laboratories using accredited and non-accredited methods were found in this study when inlying results were considered. There was a slight excess of instances where the probabilities of the variance ratio tests fell below 0.05, but these were nearly evenly divided between the two extremes of the variance ratio. Laboratories using accredited methods gave rise overall to a smaller proportion of outliers. These outcomes can safely be regarded as representative of the food sector (in 2006) because the individual sets of results were selected from an international proficiency test in a way that (a) obviated investigator preferences and (b) included a wide range of analyte/matrix combinations.

Fig. 4 Cumulative distribution of two-tailed p values from the Mann–Whitney test for difference between medians compared with the uniform distribution expected under the general null hypothesis


These findings are consistent with those of King et al. [1] and Visser [2] but apparently in sharp contrast with those of Morris and Macey [3]. The reasons for this inconsistency are not immediately apparent, but may reflect the fact that the last study was based on results that had been converted to scores and classified by a procedure that was incompletely defined in the paper. It is also possible that accreditation has a less discriminating role in the food analysis sector than in others.

References

1. King B, Boley N, Kannan G (1999) Accredit Qual Assur 4:280–291. doi:10.1007/s007690050369
2. Visser RG (1999) Accredit Qual Assur 4:108–110. doi:10.1007/s007690050326
3. Morris A, Macey D (2004) Accredit Qual Assur 9:52–54. doi:10.1007/s00769-003-0736-3
