In the Classroom
Using Candy Samples To Learn about Sampling Techniques and Statistical Data Evaluation
Larissa S. Canaes, Marcel L. Brancalion, Adriana V. Rossi, and Susanne Rath*
Department of Analytical Chemistry, Institute of Chemistry, State University of Campinas, 13084-971 Campinas, SP, Brasil; *raths@iqm.unicamp.br
The first step in any chemical analysis is to obtain an analytical sample of the bulk material. In effect, the reliability of an analytical result is often conditioned by the quality of the original sample. The sample must have the same chemical and physical properties as the raw material, so that it represents well the material that will be analyzed. In these cases, sampling is directly responsible for the accuracy of the analytical results.

The best way to sample would be to obtain large samples at random from the total population, based on the idea that as the sample size approaches the population size the errors decrease to zero (1). In practice, some factors, such as measurement costs and the facilities needed for manipulating huge amounts of the bulk material, make it impractical to select large, essentially unlimited samples. A typical random sample is usually far smaller than desired, raising concerns about how accurately the sample really represents the bulk material. This doubt can be answered by statistical analysis of the data (2).

These facts justify the need for students to understand all the challenges involved in sampling techniques, the first step of any chemical analysis. In spite of that, students in classroom experiments are usually presented with homogeneous samples, so they tend to believe that sampling and statistical analyses are not problems that they have to deal with. Some references do describe ways to present to students how random sampling works and how it could be representative (1, 3), while others propose different exercises emphasizing statistical analysis of data (4–6). However, these exercises usually require extended laboratory periods or are very theoretical. In 2000, Ross (7) proposed a simple, fast, and didactic classroom exercise using colored candies, which could easily demonstrate the effect of sample and particle size in sampling. However, this paper does not address statistics; this is more fully explored by Vitha and Carr (2).

Inspired by these papers (2, 7), we developed and implemented a more complete classroom exercise for undergraduate and beginning graduate students to explore both sampling and statistics. It is an easy, interesting exercise that takes ~1.5 hours to demonstrate the effects involved in sampling techniques (sample amount and particle size and the representativeness of the sample in relation to the bulk material). This exercise also includes a simple statistical approach to commonly used parameters (mean, median, standard deviation, errors, quartiles, and confidence limits), presentation of results, graphs (histogram, boxplot, and whisker plot) and related tests (normality, outliers, significance) using parametric and nonparametric statistical methods.
Procedure
Materials
For the sampling exercise, we used sugar-coated, round chocolate candies available in several colors, sizes, and types. All of the candies were purchased at a local market; the quantities given should be adequate for undertaking this activity in a class of 10–35 students. The variety of sizes, shapes, and colors is important for representing heterogeneity in data.
• 10 packages (104 g) of M&M ^{1} candies
• 1 package (98 g) of M&M candies with peanuts
• 2 packages (35.2 g) of M&M Minis ^{2}
• 1 package (80 g) of Confeti ^{3} chocolate candies
• Disposable gloves
• Plastic tray or paper plates
• Paper cups (50 and 200 mL capacities)
The gloves, plastic tray or paper plates, and paper cups were used in all the sample manipulation with hygiene in mind, so that the samples could be eaten at the end of the experiment. This exercise has been developed and used with students in both an undergraduate classical analytical chemistry course and a graduate course, Statistics in Analytical Chemistry, from 2000 to present.
Data Acquisition
In the first step, students were divided into ten groups and each group was responsible for data acquisition from one bag of the regular M&M candies. Each group counted the candies, separating and reporting them by color. The students were asked to compile the data and to start the statistical evaluation.
Parametric and Nonparametric Approaches
In order to show the students the theoretical and practical differences between parametric and nonparametric approaches, the raw data were statistically evaluated by comparison of parameters and representations, and by the application of tests from both statistical methods. The results were also statistically compared to an average composition of the bags, provided at the manufacturer’s Web site. ^{1}
Sample Amount Effect—Part I
Next, all the M&M candies from the 10 bags that had been sorted by color were put together in a large plastic tray to simulate the population of a “bulk material”. The students obtained this composition by aggregating the data of all 10 bags. Then, the groups collected “samples” of the bulk material, using two different sizes of cups (50 and 200 mL). Each group sampled five times. In the next step, a reducing procedure by quartering was used to enable the students to evaluate how representative each kind of sampling is. For the quartering procedure, the candies were uniformly spread inside a circular tray and then divided radially into quarters. The opposite quarters were combined. This process was repeated two more times. The statistical treatment of data was the same as used before.
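The quartering procedure described above can be mimicked in a few lines of Python. This is only an illustrative sketch, not the authors' protocol: the bulk composition, the function names `quarter_once` and `reduce_by_quartering`, and the use of a random shuffle to stand in for spreading the candies on the tray are all assumptions made for the example.

```python
import random

def quarter_once(candies, rng):
    """One quartering step: spread the pile, cut it into four quarters,
    and keep two opposite quarters (the other two are discarded)."""
    pile = list(candies)
    rng.shuffle(pile)                      # stands in for spreading the candies uniformly
    q = len(pile) // 4
    quarters = [pile[0:q], pile[q:2 * q], pile[2 * q:3 * q], pile[3 * q:]]
    return quarters[0] + quarters[2]       # combine opposite quarters

def reduce_by_quartering(candies, steps, seed=0):
    """Repeat the quartering step `steps` times on the retained material."""
    rng = random.Random(seed)
    for _ in range(steps):
        candies = quarter_once(candies, rng)
    return candies

# Hypothetical bulk material (not the composition used in the article).
bulk = ["blue"] * 200 + ["orange"] * 300 + ["yellow"] * 500
reduced = reduce_by_quartering(bulk, steps=3)   # one step plus "two more times"
print(len(reduced), round(100 * reduced.count("yellow") / len(reduced), 1))
```

Each step roughly halves the amount of material while, for a well-mixed pile, leaving the proportions close to those of the bulk.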
© Division of Chemical Education • www.JCE.DivCHED.org • Vol. 85 No. 8 August 2008 • Journal of Chemical Education
Sample Amount Effect—Part II
All the candies were returned to the tray and some different candies (purple in color; 0.99% of the total in the bag) from another manufacturer were added to the mix. In our study, Confeti ^{3} candies were used because they are very similar to M&Ms in terms of size. The two sampling procedures used in Part I and described above were also used in Part II, although just the purple candies were counted and analyzed statistically, because this condition simulates an analyte present in low concentration as well as mimicking the effect of using different sampling methods.
Particle Size Effect
The last observations explored the influence of different particle sizes on a sampling method, using a simple visual exercise. Students combined candies of different sizes (two bags of M&M peanut candies, one bag of M&M Minis, and some of the regular M&Ms used before) in a large glass jar, and mixed them very well to qualitatively observe the size distribution after different mixing procedures.
Data Acquisition
After counting and compiling the raw data, the students were asked to organize these data by percentage of candies of each color. We instruct students to be aware of significant figures and observe that the smallest unit possible is one candy. As the number of candies per bag is about one hundred, the number of candies of each color has two significant figures. Thus, the percentage of candies must be represented by a maximum of two significant digits, without decimals. Table 1 reports the results from a typical classroom exercise used for the statistical treatments.

In this exercise, each bag was considered as a sample originating from a total population (in the manufacturer’s production line), the colors were the property being measured, and the counting results were the measurements in a total of ten replicates represented by the number of bags used.
The Parametric Approach
In order to introduce the concepts important to a parametric approach, the different ideas of random (or indeterminate), systematic (or determinate), and gross errors were presented. The instructor discussed how different types of error affect the final results. In the present experiment, a gross error is exemplified by accidentally dropping some or all of the candies (loss of sample); as a consequence the experiment must be restarted with a new bag.

In some instances one set of measurements apparently lies an abnormal distance from other values in a random sample from a population. Such measurements, called outliers, may be related to human errors and must be removed or corrected because they interfere with the precision and accuracy of the results. In a sense, this definition leaves it up to the analyst to decide what will be considered abnormal. The students were asked about possible outliers. It was explained that before abnormal observations can be singled out, it is necessary to characterize normal observations; a statistical test that can identify outliers should be used. The students pointed out some possible outliers from Table 1, but after they evaluated the data using Dixon’s Q-test (8) at the 95% confidence level (P = 0.05), no values were rejected.

After that, the students were asked to represent the results in a simple way, retaining the sampling information, that is, the average value for the frequency of each color and the distribution about the mean. For this, they used the arithmetic mean ($\bar{x}$) and the confidence interval of the mean at two confidence levels (95 and 99%), which served to introduce these statistical concepts (8). The calculated parameters (mean, standard deviation (s), and relative standard deviation (RSD) or coefficient of variation (CV), as well as Student’s t values) are shown in Table 2, using the data from Table 1. Note that the mean, s, and RSD are presented with three significant figures, because the total number of candies was higher than 1000 (four significant figures) and the number of candies of each color was given with three significant figures. For the confidence interval the results were rounded.
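As an illustration of the outlier check mentioned above, the following sketch applies Dixon's Q statistic (gap between the suspect value and its nearest neighbour, divided by the range) to the percentages of Table 1. The critical value `Q_CRIT` is an assumption taken from commonly tabulated values for n = 10 at P = 0.05 and should be confirmed against the table in ref 8; the function name `dixon_q` is likewise only illustrative.

```python
def dixon_q(values):
    """Return (Q for the lowest value, Q for the highest value)."""
    x = sorted(values)
    spread = x[-1] - x[0]
    if spread == 0:
        return 0.0, 0.0
    return (x[1] - x[0]) / spread, (x[-1] - x[-2]) / spread

Q_CRIT = 0.466  # assumed critical value for n = 10, P = 0.05; check your own table

# Percentages per bag, copied from Table 1 (bags 1-10).
table1 = {
    "blue":   [29, 23, 27, 15, 11, 10, 15, 10, 18, 12],
    "brown":  [13, 11, 14, 9, 19, 15, 14, 13, 11, 10],
    "green":  [13, 12, 13, 15, 11, 11, 10, 12, 14, 12],
    "orange": [10, 16, 20, 26, 23, 24, 28, 30, 24, 27],
    "red":    [18, 8, 8, 16, 12, 20, 11, 17, 17, 13],
    "yellow": [17, 30, 18, 19, 24, 20, 22, 18, 16, 26],
}

for color, pct in table1.items():
    q_low, q_high = dixon_q(pct)
    verdict = "reject" if max(q_low, q_high) > Q_CRIT else "retain"
    print(f"{color:7s} Q_low={q_low:.2f} Q_high={q_high:.2f} -> {verdict}")
```

With this critical value no color yields a Q above the threshold, which agrees with the statement that no values were rejected.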
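Table 1. Comparison of the Dispersion of Colors in the Candy Samples

Candies Classified by Color, %

| Group (Bag) | Candies per Bag | Blue | Brown | Green | Orange | Red | Yellow |
|---|---|---|---|---|---|---|---|
| 1 | 127 | 29 | 13 | 13 | 10 | 18 | 17 |
| 2 | 127 | 23 | 11 | 12 | 16 | 8 | 30 |
| 3 | 128 | 27 | 14 | 13 | 20 | 8 | 18 |
| 4 | 119 | 15 | 9 | 15 | 26 | 16 | 19 |
| 5 | 114 | 11 | 19 | 11 | 23 | 12 | 24 |
| 6 | 118 | 10 | 15 | 11 | 24 | 20 | 20 |
| 7 | 115 | 15 | 14 | 10 | 28 | 11 | 22 |
| 8 | 119 | 10 | 13 | 12 | 30 | 17 | 18 |
| 9 | 115 | 18 | 11 | 14 | 24 | 17 | 16 |
| 10 | 114 | 12 | 10 | 12 | 27 | 13 | 26 |
| Total | 1196 | | | | | | |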
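The parametric summary built from Table 1 (mean, standard deviation, RSD, and confidence interval) can be reproduced with the Python standard library. The sketch below is illustrative only; the helper name `summarize` is an assumption, and the critical t values for 9 degrees of freedom (2.26 and 3.25) are the ones reported in the article.

```python
import math
import statistics

T_CRIT_95 = 2.26  # Student's t, 9 degrees of freedom, P = 0.05
T_CRIT_99 = 3.25  # Student's t, 9 degrees of freedom, P = 0.01

# Percentages per bag, copied from Table 1 (bags 1-10).
table1 = {
    "blue":   [29, 23, 27, 15, 11, 10, 15, 10, 18, 12],
    "brown":  [13, 11, 14, 9, 19, 15, 14, 13, 11, 10],
    "green":  [13, 12, 13, 15, 11, 11, 10, 12, 14, 12],
    "orange": [10, 16, 20, 26, 23, 24, 28, 30, 24, 27],
    "red":    [18, 8, 8, 16, 12, 20, 11, 17, 17, 13],
    "yellow": [17, 30, 18, 19, 24, 20, 22, 18, 16, 26],
}

def summarize(values, t_crit):
    """Return mean, sample standard deviation, RSD (%), and CI half-width."""
    n = len(values)
    mean = statistics.mean(values)
    s = statistics.stdev(values)              # uses n - 1 in the denominator
    rsd = 100 * s / mean
    half_width = t_crit * s / math.sqrt(n)    # confidence interval is mean +/- half_width
    return mean, s, rsd, half_width

for color, pct in table1.items():
    mean, s, rsd, ci95 = summarize(pct, T_CRIT_95)
    ci99 = summarize(pct, T_CRIT_99)[3]
    print(f"{color:7s} mean={mean:.1f} s={s:.2f} RSD={rsd:.1f}% "
          f"CI95=+/-{ci95:.0f} CI99=+/-{ci99:.0f}")
```

Rounding the half-widths to whole percentages, as the text does, gives the compact "mean ± interval" form used to report the results.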
In the next exercise the students had to verify whether a significant difference existed between the means reported for each color (n = 10 bags) and the values supplied by the manufacturer (used as the actual values, μ), in order to evaluate whether the data obtained from the sampling experiment accurately represent the bulk material or bulk sample or, in this case, a large population, denoted by the provided values. Significance testing was introduced and a null hypothesis ($H_0$) was formulated: the two population means are equal. It is important to emphasize that to accept a hypothesis does not mean that it is true, only that we do not have evidence to believe otherwise. Thus hypothesis tests are usually stated in terms of both a condition that is doubted (null hypothesis) and a condition that is believed (alternative hypothesis). In our study the alternative hypothesis would be that the two population means are not equal. The students are asked to test the hypothesis using a t-test (8) for each color and a significance level of 0.05. The significance level, P, defines the sensitivity of the test. A value of P = 0.05 means that we inadvertently reject the null hypothesis 5% of the time when it is in fact true. The choice of P is somewhat arbitrary, although in practice a value of 0.05 is commonly used in analytical chemistry.
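The significance test against the manufacturer's values can be sketched as follows, using the bag percentages of Table 1 and the manufacturer's values listed in Table 2. The function name `one_sample_t` is an assumption, and the critical value 2.26 is the tabulated two-sided t for 9 degrees of freedom at P = 0.05. Recomputed statistics may not reproduce every printed digit of Table 2; the point of the sketch is the accept/reject decision.

```python
import math
import statistics

T_CRIT = 2.26  # two-sided critical t, 9 degrees of freedom, P = 0.05

# color: (manufacturer value mu in %, percentages of bags 1-10 from Table 1)
data = {
    "blue":   (14.3, [29, 23, 27, 15, 11, 10, 15, 10, 18, 12]),
    "brown":  (14.3, [13, 11, 14, 9, 19, 15, 14, 13, 11, 10]),
    "green":  (14.3, [13, 12, 13, 15, 11, 11, 10, 12, 14, 12]),
    "orange": (21.4, [10, 16, 20, 26, 23, 24, 28, 30, 24, 27]),
    "red":    (14.3, [18, 8, 8, 16, 12, 20, 11, 17, 17, 13]),
    "yellow": (21.4, [17, 30, 18, 19, 24, 20, 22, 18, 16, 26]),
}

def one_sample_t(values, mu):
    """t = |mean - mu| * sqrt(n) / s for a comparison with a known value mu."""
    n = len(values)
    return abs(statistics.mean(values) - mu) * math.sqrt(n) / statistics.stdev(values)

significant = {}
for color, (mu, pct) in data.items():
    t = one_sample_t(pct, mu)
    significant[color] = t > T_CRIT       # True means H0 is rejected at P = 0.05
    print(f"{color:7s} t={t:.2f} significant={significant[color]}")
```

Only the green candies exceed the critical value, so only for that color is the null hypothesis rejected.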
The comparisons were made by using the critical values of t for 9 degrees of freedom, 2.26 and 3.25, for significance levels of P = 0.05 and P = 0.01, respectively (8). Significant differences between the samples evaluated and the population (values supplied by the manufacturer) were observed at both significance levels only for the green candies, the color that also presented the lowest standard deviation and, as a consequence, the narrowest confidence interval (see Table 2). For the green candies the null hypothesis is rejected, which is statistically understandable: a data distribution with a smaller dispersion means greater precision about the arithmetic mean, so any value that is not very close to the mean will not be contained inside the confidence interval provided by the Gaussian distribution curve. The opposite effect could be observed for larger dispersions of data (see the data for the blue and orange candies).

Table 2. Parametric Approach to Analyzing the Data

Candies Classified by Color, %

| Data Parameters ^{a} | Blue | Brown | Green | Orange | Red | Yellow |
|---|---|---|---|---|---|---|
| Values supplied by the manufacturer ^{b} | 14.3 | 14.3 | 14.3 | 21.4 | 14.3 | 21.4 |
| Mean, $\bar{x}$ (n = 10) | 17.0 | 12.9 | 12.3 | 22.8 | 14.0 | 21.0 |
| Standard deviation (s) | 7.06 | 2.88 | 1.49 | 6.03 | 4.22 | 4.47 |
| RSD ^{c} | 41.5 | 22.3 | 12.1 | 26.4 | 30.1 | 21.3 |
| Confidence interval, ^{d} P = 0.05 | 17 ± 5 | 13 ± 2 | 12 ± 1 | 23 ± 4 | 14 ± 3 | 21 ± 3 |
| Confidence interval, ^{d} P = 0.01 | 17 ± 7 | 13 ± 3 | 12 ± 2 | 23 ± 6 | 14 ± 4 | 21 ± 5 |
| Student’s t = $\lvert \bar{x} - \mu \rvert \sqrt{n}/s$ (t statistic values) | 1.2 | 1.4 | 4.9 | 0.84 | 0.23 | 0.28 |
| Is there a significant difference? ^{e} | No | No | Yes | No | No | No |

^{a} Based on the data reported in Table 1.
^{b} These mean values are reported at the manufacturer’s Web site: http://global.mms.com/br/about/products/milkchocolate.jsp (accessed Jun 2008).
^{c} RSD is the relative standard deviation, and is given by $100\,s/\bar{x}$.
^{d} The confidence interval is determined by $\bar{x} - t s/\sqrt{n} < \mu < \bar{x} + t s/\sqrt{n}$, where t is the critical value of Student’s t-test. Confidence levels of 95% and 99% are represented by probability values of P = 0.05 and 0.01, respectively.
^{e} Comparison of the mean value supplied by the manufacturer and the mean value of the ten bags evaluated in this experiment.

New information was then introduced to the students: bags 1–3 and bags 4–10 came from sample batches A and B, respectively. The students were asked whether a significant difference existed between the two sample batches of the candies in relation to each color (see Table 3). In this case, we compared two sample means, $\bar{x}_A$ and $\bar{x}_B$, which correspond to sample batches A and B, respectively. Taking the null hypothesis that the two means are equal, we need to test whether ($\bar{x}_A - \bar{x}_B$) differs significantly from zero. First the F-test was applied for the comparison of standard deviations (8). The standard deviations of the two samples did not differ significantly, which allows calculation of a pooled estimate of the standard deviation from the two individual standard deviations, $s_A$ and $s_B$. In turn, the value of t was obtained and compared with the critical value of t, using 8 degrees of freedom [($n_A + n_B$) − 2].

Table 3. Parametric Comparison of the Two Random Samples of Candies

Candies Classified by Color, %

| Parameters | Blue | Brown | Green | Orange | Red | Yellow |
|---|---|---|---|---|---|---|
| Sample batch A mean, $\bar{x}_A$ (n = 3) | 26 | 13 | 13 | 15 | 11 | 22 |
| Sample batch B mean, $\bar{x}_B$ (n = 7) | 13 | 13 | 12 | 26 | 15 | 21 |
| Sample batch A standard deviation, $s_A$ | 3.0 | 1.5 | 0.58 | 5.0 | 5.8 | 7.2 |
| Sample batch B standard deviation, $s_B$ | 3.0 | 3.4 | 1.8 | 2.5 | 3.2 | 3.5 |
| Is there a significant difference? | Yes | No | No | Yes | No | No |

Note: Significance of the comparison of the mean values between the two sample batches ($\bar{x}_A - \bar{x}_B$) evaluated at P = 0.05.

Typical statistical tests incorporate assumptions about the underlying normal (Gaussian) distribution of data, and hence rely on distribution parameters. Statistical values such as means, standard deviations, and confidence limits are, strictly speaking, for a large population size. In analytical chemistry, we generally deal with small sets of data, sometimes fewer than five results, and in some instances we are interested in methods that do not require the assumption of normally distributed data. Methods that make no assumptions about the shape of a data set’s distribution are called nonparametric or distribution-free methods.

Figure 1. Box-and-whisker plots for fractions (%) of each color of candies; the horizontal axis is the color of candy. (The black squares inside the boxes represent the means.)

Table 4. Comparison of the Five-Number Summary for Each Color

| Sample Color | Minimum, % | Lower Quartile, % | Median, % | Upper Quartile, % | Maximum, % |
|---|---|---|---|---|---|
| Blue | 10 | 11 | 15 | 23 | 29 |
| Brown | 9 | 11 | 13 | 14 | 19 |
| Green | 10 | 11 | 12 | 13 | 15 |
| Orange | 10 | 20 | 24 | 27 | 30 |
| Red | 8 | 11 | 13 | 17 | 20 |
| Yellow | 16 | 18 | 19 | 24 | 30 |

Table 5. Comparison of Results Obtained by Parametric and Nonparametric Approaches

| Sample Color | Parametric (Mean), % | Nonparametric (Median), % |
|---|---|---|
| Blue | 17.0 | 15 |
| Brown | 12.9 | 13 |
| Green | 12.3 | 12 |
| Orange | 22.8 | 24 |
| Red | 14.0 | 13 |
| Yellow | 21.0 | 19 |
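The comparison of batches A (bags 1–3) and B (bags 4–10) summarized in Table 3 can be sketched with a pooled two-sample t-test. The sketch assumes, as the text reports, that the F-test found no significant difference between the standard deviations, so pooling is allowed. The function name `pooled_t` is an assumption, and the critical value 2.31 is the standard tabulated two-sided t for 8 degrees of freedom at P = 0.05.

```python
import math
import statistics

T_CRIT_8DF = 2.31  # two-sided critical t, 8 degrees of freedom, P = 0.05

# Percentages per bag, copied from Table 1 (bags 1-10).
table1 = {
    "blue":   [29, 23, 27, 15, 11, 10, 15, 10, 18, 12],
    "brown":  [13, 11, 14, 9, 19, 15, 14, 13, 11, 10],
    "green":  [13, 12, 13, 15, 11, 11, 10, 12, 14, 12],
    "orange": [10, 16, 20, 26, 23, 24, 28, 30, 24, 27],
    "red":    [18, 8, 8, 16, 12, 20, 11, 17, 17, 13],
    "yellow": [17, 30, 18, 19, 24, 20, 22, 18, 16, 26],
}

def pooled_t(a, b):
    """Two-sample t statistic using a pooled estimate of the standard deviation."""
    na, nb = len(a), len(b)
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    s_pooled = math.sqrt(((na - 1) * var_a + (nb - 1) * var_b) / (na + nb - 2))
    return (statistics.mean(a) - statistics.mean(b)) / (s_pooled * math.sqrt(1 / na + 1 / nb))

differs = {}
for color, pct in table1.items():
    batch_a, batch_b = pct[:3], pct[3:]   # bags 1-3 and bags 4-10
    t = pooled_t(batch_a, batch_b)
    differs[color] = abs(t) > T_CRIT_8DF
    print(f"{color:7s} t={t:6.2f} significant={differs[color]}")
```

Under these assumptions the blue and orange candies differ significantly between the two batches, in line with the last row of Table 3.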
A Nonparametric Approach
A nonparametric approach to data analysis uses the same data (Table 1); however, instead of the mean, the students are now asked to calculate the median for each color from the 10 bags. In addition, the lower and upper quartiles should be calculated, as well as the smallest (minimum) and the greatest (maximum) values in the distribution. These five values are then represented in a simple visual way by a box-and-whisker plot (8) (Figure 1), which shows immediately the center, the spread, and the overall range of the distribution.

A box-and-whisker plot consists of a rectangle (the box) with two lines (the whiskers) extending from opposite edges of the box, and a further line in the box, crossing it parallel to the edges. The ends of the whiskers indicate the range of the data, the edges of the box from which the whiskers protrude represent the upper and lower quartiles, and the line crossing the box represents the median of the data. A box-and-whisker plot, accompanied by a numerical scale, is a graphical representation of the five-number summary; thus, the data set is described by its extremes, its lower and upper quartiles, and its median (see Table 4). The plot shows at a glance the spread and the symmetry of the data (8). After considering these results, no values were rejected.

The comparison of results obtained by the parametric and nonparametric approaches is shown in Table 5. The differences between the mean and the median are not significant, indicating that the data can be considered as drawn from a normal distribution, which makes sense for the sampling exercise used. One method of testing this hypothesis is by using a $\chi^2$ test. This method, unfortunately, is only reliable in cases with at least 50 data points.
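A five-number summary of the kind plotted in a box-and-whisker diagram can be computed as sketched below for the blue candies of Table 1. The quartile convention used here (medians of the lower and upper halves of the sorted data) is an assumption; textbooks and software use several conventions, so individual values can differ slightly from a published table. The function name `five_number_summary` is also illustrative.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, lower quartile, median, upper quartile, maximum).

    Quartiles are taken as the medians of the lower and upper halves of the
    sorted data; for an odd number of points the middle value is left out of
    both halves. Expects at least two values.
    """
    x = sorted(values)
    half = len(x) // 2
    lower, upper = x[:half], x[-half:]
    return (x[0], statistics.median(lower), statistics.median(x),
            statistics.median(upper), x[-1])

# Percentages of blue candies in bags 1-10, copied from Table 1.
blue = [29, 23, 27, 15, 11, 10, 15, 10, 18, 12]
print(five_number_summary(blue))
```

The five values map directly onto the plot: the ends of the whiskers, the two edges of the box, and the line inside the box.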
Sample Amount Effect
At this point the students are told of the relationships of the operations involved in sampling and analysis. The concepts of primary sample (bulk sample), reduced sample, subsample, laboratory sample, and test sample are discussed. It is pointed out that the term “sample” implies the existence of sampling error, which arises from a lack of homogeneity in the population. Since sampling error is always associated with analytical error, it must be isolated by the statistical procedure of analysis of variance (9). It will be assumed in our experiment that all the candies from the 10 bags represent the bulk sample and the task now is to obtain the laboratory sample from the bulk sample of the material.

In this part of the exercise the students discussed and simulated different conditions of sampling (using different containers and reducing by quartering) from a bulk sample with known composition, which was obtained by mixing all the candies in a large container. The data collected and the basic statistical parameters calculated are presented in Table 6, in terms of percentages. The values of the relative errors for each sampling procedure are shown in parentheses. The same statistical treatments applied before were also used in this case.
Table 6. Dispersion of Candies Reported by Color for Each Sample

Candies Classified by Color, % (Relative Error)

| Sampling Method | Sample | Total Candies | Blue | Brown | Green | Orange | Red | Yellow |
|---|---|---|---|---|---|---|---|---|
| Observed means of the population (µ) | | 1196 | 17.0 | 12.9 | 12.3 | 22.8 | 14.0 | 21.0 |
| Small container (50 mL cup) | 1 | 40 | 15 (−12) | 8 (−38) | 15 (−22) | 10 (−56) | 20 (−43) | 32 (−52) |
| Small container (50 mL cup) | 2 | 49 | 16 (−5.9) | 16 (24) | 4 (−67) | 33 (45) | 6 (−133) | 25 (19) |
| Small container (50 mL cup) | 3 | 43 | 18 (5.9) | 12 (−7.0) | 19 (54) | 19 (−17) | 21 (50) | 11 (−47) |
| Small container (50 mL cup) | 4 | 42 | 17 (0) | 12 (−7.0) | 17 (38) | 21 (−7.9) | 9 (−35) | 24 (14) |
| Small container (50 mL cup) | 5 | 35 | 14 (−18) | 6 (−53) | 20 (63) | 6 (−74) | 23 (64) | 31 (48) |
| Large container (200 mL cup) | 1 | 177 | 15 (−12) | 11 (−15) | 16 (30) | 22 (−3.5) | 11 (−21) | 25 (19) |
| Large container (200 mL cup) | 2 | 153 | 18 (5.9) | 12 (−7.0) | 11 (−11) | 20 (−12) | 14 (0) | 25 (19) |
| Large container (200 mL cup) | 3 | 185 | 24 (41) | 15 (16) | 10 (−19) | 19 (−17) | 13 (−7.1) | 19 (−9.5) |
| Large container (200 mL cup) | 4 | 191 | 15 (−12) | 17 (32) | 12 (−2.4) | 20 (−12) | 17 (21) | 19 (−9.5) |
| Large container (200 mL cup) | 5 | 192 | 19 (12) | 9 (−30) | 15 (22) | 24 (5.3) | 12 (−14) | 21 (0) |
| Reducing by quartering | 1 | 265 | 16 (5.9) | 14 (8.5) | 12 (2.4) | 23 (0.0088) | 13 (7.1) | 22 (4.7) |

Note: Values in parentheses are the relative errors, calculated by $100(x - \mu)/\mu$, where µ is the known value of the bulk sample.
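The relative error defined in the note to Table 6 is simple to compute. The sketch below applies it to the first sample taken with the 200 mL cup; the function name `relative_error` and the dictionary layout are assumptions made for the example.

```python
def relative_error(observed, true_value):
    """Relative error in percent: 100 * (x - mu) / mu."""
    return 100 * (observed - true_value) / true_value

# Observed means of the population (mu) and sample 1 of the 200 mL cup, from Table 6.
mu = {"blue": 17.0, "brown": 12.9, "green": 12.3, "orange": 22.8, "red": 14.0, "yellow": 21.0}
sample = {"blue": 15, "brown": 11, "green": 16, "orange": 22, "red": 11, "yellow": 25}

errors = {color: relative_error(sample[color], mu[color]) for color in mu}
for color, err in errors.items():
    print(f"{color:7s} {err:+.1f}%")
```

A negative value means the sample under-represents that color relative to the bulk; a positive value means it over-represents it.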
Table 7. Comparison of the Percentage of Purple Candies in Each Sample

Purple Candies or “Analyte”, % (Relative Error)

| Sample | Actual Value (µ) | Sampled with a 50 mL Cup | Sampled with a 200 mL Cup | Reduced by Quartering |
|---|---|---|---|---|
| 1 | 0.99 | 2.5 (152) | 1.0 (1.0) | 0.98 (–1.0) |
| 2 | 0.99 | 0.0 (–100) | 1.9 (92) | — |
| 3 | 0.99 | 1.9 (92) | 0.49 (–50) | — |

Note: Values in parentheses are the relative errors, calculated by $100(x - \mu)/\mu$, where µ is the known value of the bulk sample with purple candy added.
These experiments show that the number of candies sampled influences the values of the relative errors. Whereas with the small sampling cup the relative errors varied from −133 to 64%, with the larger cup the values ranged from −30 to 41%. Thus, the larger container resulted in a relative error about three times smaller than that provided by the smaller one. These numbers show the students the improvement in sampling caused by increasing the sample amount from approximately 42 to 180 candies per collection. For reducing by quartering, even though the procedure was carried out only once, in contrast to the five replicates for the other samplings, the relative errors observed ranged from 0.0088 to 8.5%, still smaller than the values observed with the other sampling procedures. This is because reducing by quartering results in a larger sample (265 candies in this case) and because this method was developed to optimize a sampling condition, resulting in smaller errors (8).

Another discussion topic concerns the dispersion of data points that results when the same sampling procedure is used repeatedly. The values were presented graphically (Figure 2), so students can observe that a greater dispersion usually occurs with a smaller sampling container than with a larger one (blue and brown were exceptions).

In the same way, the students also simulated a sample with an analyte present in low concentration, by the addition of 0.99% purple candies to the total material. Once more, the effect of using different sampling procedures was evaluated, as summarized in Table 7.
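The effect of sample amount on dispersion can also be explored by simulation. The sketch below repeatedly draws random samples of about 42 and about 180 candies from an assumed bulk and compares the spread of the observed percentage of blue candies. The bulk counts are an approximation obtained by applying the mean percentages of Table 6 to 1196 candies (rounding gives 1195), not the authors' raw counts; the function name `percent_spread`, the number of draws, and the seed are likewise assumptions.

```python
import random
import statistics

# Assumed bulk composition (approximate counts, see the lead-in).
BULK_COUNTS = {"blue": 203, "brown": 154, "green": 147,
               "orange": 273, "red": 167, "yellow": 251}
bulk = [color for color, n in BULK_COUNTS.items() for _ in range(n)]

def percent_spread(sample_size, color, draws=2000, seed=1):
    """Standard deviation of the observed % of `color` over repeated random samples."""
    rng = random.Random(seed)
    observed = []
    for _ in range(draws):
        sample = rng.sample(bulk, sample_size)   # sampling without replacement
        observed.append(100 * sample.count(color) / sample_size)
    return statistics.stdev(observed)

small = percent_spread(42, "blue")    # roughly one 50 mL cup
large = percent_spread(180, "blue")   # roughly one 200 mL cup
print(f"spread with 42 candies: {small:.1f}%, with 180 candies: {large:.1f}%")
```

In such a simulation the larger sample gives a clearly narrower spread, which mirrors the trend the students observe with the two cup sizes.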
Figure 2. Percentage of candies sampled using the small container (50 mL) and the larger container (200 mL), plotted by color of candy. The solid bar represents the expected value of each color, as provided by the manufacturer. ^{1}
Figure 3. The size gradient formed by the different-sized candies mixed inside a glass jar.
The students could note again the reduction in sampling error as the sample amount is increased, denoted by the decrease in relative errors obtained using the small cup, the larger cup, and by quartering. This means that, as the amount sampled becomes larger, it better represents the bulk sample, up to the limit of the entire sample, which represents the actual value of the material.
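One way to see why a small sample can miss a low-concentration analyte entirely, as happened with one of the 50 mL cup samples in Table 7, is to compute the probability that a random sample contains no purple candy at all. The sketch below uses the hypergeometric probability; the counts (12 purple candies among 1208 in total, which is about 0.99%) are assumed for illustration, since the article reports only the percentage, and the function name `prob_no_analyte` is hypothetical.

```python
from math import comb

def prob_no_analyte(total, analyte, sample_size):
    """Hypergeometric probability that a random sample contains no analyte candies."""
    return comb(total - analyte, sample_size) / comb(total, sample_size)

TOTAL, PURPLE = 1208, 12   # assumed counts: 12 / 1208 is about 0.99%

p_small = prob_no_analyte(TOTAL, PURPLE, 42)    # roughly one 50 mL cup
p_large = prob_no_analyte(TOTAL, PURPLE, 180)   # roughly one 200 mL cup
print(f"P(no purple candy): {p_small:.2f} with 42 candies, {p_large:.2f} with 180 candies")
```

Under these assumptions a 42-candy sample misses the analyte more often than not, whereas a 180-candy sample rarely does.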
Particle Size Effect
In this exercise, students easily observe the effect of different particle sizes during the sampling of solid materials. All the different candies were mixed together inside the flask, resulting in a size distribution in which the smaller candies accumulated at the bottom of the flask, while the bigger candies were more evident at the top of the flask. Figure 3 shows a photograph of this phenomenon. After this visual experiment, the students were questioned about the errors in sampling that might result from this effect, namely, size segregation in real samples. The students were also asked about possible ways of eliminating this type of error. In real applications, for example, the simplest procedure used is a sequence of three unitary operations: grinding, homogenization (by mixing), and separation of the samples by different ranges of mesh (defined ranges of particle sizes) before sampling occurs.
Conclusions
This classroom experiment with candies has been used for six consecutive semesters in a chemistry course for undergraduates, and in a graduate chemistry course. It successfully introduced the undergraduate students to important concepts of statistics and sampling techniques. This approach is easy to implement and engages students in learning in a stimulating way that is lucid and concrete, as well as tasty, because all the statistical data used were obtained by the students themselves.
Notes
1. M&M candies have chocolate interiors; some also have peanuts or almonds in the center. For more information, see the manufacturer’s Web page: http://global.mms.com/br/about/products/milkchocolate.jsp (accessed Jun 2008).
2. As the name implies, M&M Minis are smaller than the conventional version.
3. Confeti candies are manufactured by Kraft Foods Brazil S. A.
Acknowledgments
The authors are grateful to all the students who participated in the exercises, and thank C. H. Collins for language assistance.
Literature Cited
8. Miller, J. C.; Miller, J. N. Statistics for Analytical Chemistry, 3rd ed.; Ellis Horwood PTR Prentice Hall: New York, USA, 1994.
9. Horwitz, W. Pure Appl. Chem. 1990, 62, 1193–1208.
Supporting JCE Online Material
Abstract and keywords
Full text (PDF)
Links to cited URLs and JCE articles
Color figures