# Skittles Count Group Project

Hope Carter
Fawnna Hentish
Dylan Dusoe
Alyssa Hewitson

Introduction

Detailed Goal:

This paper serves as a record of our group's hypothesis and the series of tests we used to evaluate whether that hypothesis is correct or incorrect. Our group will use statistics to determine whether bags of Skittles contain more of one color than the others. Statistics gives us a series of steps for deciding whether the data support or contradict our hypothesis. We will use methods learned in class: organizing and analyzing collected data, drawing conclusions using confidence intervals and hypothesis tests, and presenting our work in an organized paper at the end. To begin, our hypothesis is that bags of Skittles have more RED candies than any other color.

Method:

We must first start our tests by collecting data. Since it is impossible to test every bag of Skittles, we will use a sample to represent the entire population of Skittles. The sample data used for this report was gathered from fourteen students of an evening Math 1040 class at Salt Lake Community College. Each of the fourteen students purchased one 2.17-ounce bag of Original Skittles and recorded the color data from their individual bag. Each student then e-mailed their individual data to the class's professor, who compiled all fourteen individual sets of data into one comprehensive list (Image A), which we as a group will be using as reference. Our group of four students was then tasked with organizing and analyzing the sample data and drawing conclusions using confidence intervals and hypothesis tests.

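| Red | Orange | Yellow | Green | Purple | Total |
|-----|--------|--------|-------|--------|-------|
| 10 | 16 | 9 | 12 | 12 | 59 |
| 11 | 8 | 14 | 18 | 9 | 60 |
| 13 | 11 | 9 | 9 | 16 | 58 |
| 22 | 14 | 9 | 13 | 4 | 62 |
| 13 | 12 | 15 | 9 | 11 | 60 |
| 14 | 11 | 13 | 11 | 14 | 63 |
| 11 | 12 | 10 | 13 | 12 | 58 |
| 11 | 14 | 8 | 18 | 9 | 60 |
| 12 | 11 | 15 | 12 | 11 | 61 |
| 15 | 11 | 12 | 9 | 13 | 60 |
| 18 | 10 | 16 | 8 | 12 | 64 |
| 18 | 12 | 9 | 8 | 11 | 58 |
| 12 | 14 | 12 | 8 | 14 | 60 |
| 16 | 8 | 9 | 15 | 8 | 56 |

Image A

Organizing and Displaying Categorical Data: COLORS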

Out of the fourteen individual 2.17-ounce bags of Original Skittles, there was a total of 839 candies. The total number of RED candies was 196, ORANGE was 164, YELLOW was 160, GREEN was 163, and PURPLE was 156. Below are pie charts showing the color percentages for the entire class sample and for my individual bag. My individual bag of Skittles contained 58 candies out of the class total of 839. All my color percentages are smaller than the corresponding class percentages, except for the PURPLE candies, where my percentage (27.59%) is about one and a half times the class percentage (18.59%). I originally expected that my percentages would be the same as those of the entire class, since the candies are the same brand and the bags are the same size. However, after compiling the data into graphs, I noticed that each individual bag has slightly different percentages, but when you compile all the data as a whole, you start to recognize a pattern of which colors will usually be the largest in number per bag of candy.
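
Sample Size of Candies: Entire Class

| Color | Number | Percent (pie chart label) |
|-------|--------|---------------------------|
| Red | 196 | 23% |
| Orange | 164 | 20% |
| Yellow | 160 | 19% |
| Green | 163 | 19% |
| Purple | 156 | 19% |

My Individual Skittle Data

| Color | Number | Percent (pie chart label) |
|-------|--------|---------------------------|
| Red | 13 | 22% |
| Orange | 11 | 19% |
| Yellow | 9 | 15.5% |
| Green | 9 | 15.5% |
| Purple | 16 | 28% |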

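Sample Size of Candies: Entire Class

| Color | Occurrences | Percent of Total | Cumulative Percent |
|-------|-------------|------------------|--------------------|
| Red | 196 | 23.36% | 23.36% |
| Orange | 164 | 19.55% | 42.91% |
| Yellow | 160 | 19.07% | 61.98% |
| Green | 163 | 19.43% | 81.41% |
| Purple | 156 | 18.59% | 100.00% |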
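
As a rough check on the table above, the percent-of-total and cumulative-percent columns can be recomputed from the color counts. The sketch below is only an illustration; the function name `frequency_table` is our own choice and was not part of the original class work.

```python
# Color counts for the whole class sample, taken from the table above.
counts = {"Red": 196, "Orange": 164, "Yellow": 160, "Green": 163, "Purple": 156}


def frequency_table(counts):
    """Return rows of (color, count, percent of total, cumulative percent)."""
    total = sum(counts.values())
    rows = []
    running = 0
    for color, n in counts.items():
        running += n  # running count used for the cumulative column
        rows.append((color, n, 100 * n / total, 100 * running / total))
    return rows


table = frequency_table(counts)
for color, n, pct, cum in table:
    print(f"{color:<7}{n:>5}{pct:>8.2f}%{cum:>9.2f}%")
```

Passing the counts from a single bag (for example 13, 11, 9, 9, 16) to the same function would give the individual-sample version of the table.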

[Pareto chart: Sample Size of Candies: Entire Class. Bars show occurrences (0 to 250) for Red, Orange, Yellow, Green, and Purple; the line shows cumulative percent (0% to 100%).]

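My Individual Sample

| Color | Occurrences | Percent of Total | Cumulative Percent |
|-------|-------------|------------------|--------------------|
| Red | 13 | 22.41% | 22.41% |
| Orange | 11 | 18.97% | 41.38% |
| Yellow | 9 | 15.52% | 56.90% |
| Green | 9 | 15.52% | 72.41% |
| Purple | 16 | 27.59% | 100.00% |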

[Pareto chart: My Individual Sample. Bars show occurrences (0 to 18) for Red, Orange, Yellow, Green, and Purple; the line shows cumulative percent (0% to 100%).]

Based on the charts showing the percentages of the entire sample versus my individual sample, I must conclude that the ratio of candy colors fluctuates from bag to bag. However, when you gather a large enough sample, like our entire class, you can find some similarities and the percentages even out.

Organizing and Displaying Quantitative Data: the Number of Candies per Bag:

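[Histogram and box plot: Number of Candies per Bag]

Shown above are a histogram and a box plot for the number of Skittles per bag for our class sample. This information can also be seen in the table below, which shows the 5-number summary.

| Variable | N | Mean | Std. Dev. | Min | Q1 | Median | Q3 | Max |
|----------|---|------|-----------|-----|----|--------|----|-----|
| Total | 14 | 59.9 | 2.13 | 56 | 58 | 60 | 61 | 64 |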

My observation of this data is that the distribution is approximately normal and isn't skewed in either direction. The graph did reflect what I expected to see, which was consistency: the bags are sold by weight, so there shouldn't be a large range in the number per bag. This is also shown by the standard deviation being only 2.13, which means that even two standard deviations from the mean is only a four- or five-Skittle difference per bag. In my own bag of Skittles I had 63, which isn't far from the mean of 59.9. Our sample data included fourteen bags of Skittles.
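
The summary statistics quoted above can be reproduced from the Total column of Image A. The following is a minimal sketch using Python's standard `statistics` module; the helper `five_number_summary` is our own name, and it assumes quartiles are taken as the medians of the lower and upper halves of the sorted data, which may differ from the method used by the software in class.

```python
import statistics

# Candies per bag for the fourteen bags (Total column of Image A).
totals = [59, 60, 58, 62, 60, 63, 58, 60, 61, 60, 64, 58, 60, 56]


def five_number_summary(values):
    """Min, Q1, median, Q3, max, with quartiles as medians of each half."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    upper = data[-half:]  # leaves out the middle value when the count is odd
    return (data[0], statistics.median(lower), statistics.median(data),
            statistics.median(upper), data[-1])


mean = statistics.mean(totals)
std_dev = statistics.stdev(totals)  # sample standard deviation (divides by n - 1)
summary = five_number_summary(totals)
print(f"n={len(totals)} mean={mean:.1f} s={std_dev:.2f} summary={summary}")
```

With these fourteen totals the sketch gives a mean of about 59.9, a standard deviation of about 2.13, and a five-number summary of 56, 58, 60, 61, 64.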

Categorical data is sorted by a quality or attribute rather than measured on a numeric scale. An example of this in our data is the color of each Skittle: you can count how many candies fall into each color category, but you can't do arithmetic on the colors themselves. Other examples of categorical data would be yes or no questions, or pass or fail grades. There isn't a way to rank these things; we can only organize them by category. Pie or bar charts are the best fit for categorical data because they compare the size of the categories, with each slice or bar showing its proportion of the whole.

Quantitative data is measured on a numeric scale, and you can do math with it. In our data it would be the total number of Skittles per bag. It would also make sense to use quantitative data when looking at height, weight, or temperatures. This data is best represented by a histogram or box plot, because they show how the values are distributed across each number, or class of numbers. They also show the underlying structure of the data, and can be used to figure out if the distribution is normal or skewed.

Confidence Intervals

A confidence interval is a range of values, built at a specified confidence level, that is likely to contain the population parameter. Confidence intervals are used in statistics to estimate a parameter; the range of values reflects a margin of error, because we cannot be 100% certain. How confident you want to be also affects the margin of error. For example, if you want to be 99% confident, the range of numbers will be larger than if you only want to be 95% confident.

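For the first confidence interval, I constructed a 99% confidence interval estimate for the true proportion of yellow candies. The sample proportion of yellow candies is 19.07% (160 out of 839) with a margin of error of about 3.49 percentage points, so we can be 99% confident that the true proportion of yellow candies is between roughly 15.6% and 22.6%.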
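
The 99% interval for the proportion of yellow candies can be sketched as follows. This assumes the usual normal-approximation interval for a proportion taught in an introductory course; the function name `proportion_ci` is our own, and the counts (160 yellow out of 839) come from the class table.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(successes, n, confidence):
    """Normal-approximation confidence interval for a population proportion."""
    p_hat = successes / n
    # Two-sided critical value, e.g. about 2.576 for 99% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin, margin


low, high, margin = proportion_ci(160, 839, 0.99)
print(f"99% CI for yellow: {low:.4f} to {high:.4f} (margin {margin:.4f})")
```

Running the same function with a confidence of 0.95 gives a narrower interval, which matches the point made earlier that a higher confidence level widens the range.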

For the second confidence interval, I constructed a 95% confidence interval for the mean number of candies per bag. What I found is that the mean is around 60 candies per bag.

You would construct a confidence interval when you need to estimate a value for a whole population but can only gather data from a sample. By constructing a confidence interval, you get an estimate of what the population value may be, along with a margin of error.
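
The 95% interval for the mean number of candies per bag can be sketched in the same way. Because there are only fourteen bags, this assumes a t interval with 13 degrees of freedom; the critical value 2.160 is copied from a standard t table rather than computed, and the names `mean_ci` and `T_CRIT_95_DF13` are our own.

```python
from math import sqrt
import statistics

# Candies per bag for the fourteen bags (Total column of Image A).
totals = [59, 60, 58, 62, 60, 63, 58, 60, 61, 60, 64, 58, 60, 56]

# Assumed two-sided 95% critical value for 13 degrees of freedom (t table).
T_CRIT_95_DF13 = 2.160


def mean_ci(values, t_crit):
    """t-based confidence interval for a population mean."""
    n = len(values)
    mean = statistics.mean(values)
    margin = t_crit * statistics.stdev(values) / sqrt(n)
    return mean - margin, mean + margin


low, high = mean_ci(totals, T_CRIT_95_DF13)
print(f"95% CI for mean candies per bag: {low:.2f} to {high:.2f}")
```

The resulting interval contains 60, which is consistent with the estimate of around 60 candies per bag.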

Hypothesis Tests

A hypothesis test uses data to determine whether to reject, or fail to reject, a claim. Failing to reject a claim does not prove it; it only means the data do not provide enough evidence against it. Using a hypothesis test can help verify claims made about data, for example to determine whether or not you are getting what you paid for as the customer. This is also useful for quality control in manufacturing.

The first hypothesis test we had to do was to test the claim that 20% of all Skittles candies are red. After calculating the critical values at a 0.05 significance level and the z-score, we found that a 20% proportion of red Skittles is not plausible, and so we rejected the claim. Above are the calculations for the first hypothesis test.
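
The first test can be sketched as a two-tailed one-proportion z-test. The counts (196 red out of 839) come from the class table; the function name `one_proportion_z_test` is our own, and the two-tailed setup is an assumption based on the claim being that the proportion equals 20%.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z_test(successes, n, p0, alpha):
    """Two-tailed z-test of the claim that the population proportion is p0."""
    p_hat = successes / n
    # The standard error uses the claimed proportion p0, not the sample proportion.
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, z_crit, p_value, abs(z) > z_crit


z, z_crit, p_value, reject = one_proportion_z_test(196, 839, 0.20, 0.05)
print(f"z={z:.2f} critical=+/-{z_crit:.2f} p-value={p_value:.4f} reject={reject}")
```

The z-score falls outside the critical values, which agrees with the decision to reject the 20% claim.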

The second hypothesis test was to test the claim that the mean number of candies in a bag of Skittles is 55. By calculating the critical values at a 0.01 significance level with 13 degrees of freedom (fourteen bags minus one), we found that in order to fail to reject that claim, our t value must fall between -3.012 and 3.012. In reality, the t value came out to be 8.61, which is far outside that range; therefore, we reject the claim that the mean number of candies in a bag of Skittles is 55. Above are the calculations for the second hypothesis test.
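
The second test can be sketched as a two-tailed one-sample t-test on the fourteen bag totals. This assumes 13 degrees of freedom (fourteen bags minus one), for which a standard t table lists a critical value of about 3.012 at the 0.01 significance level; that value is hard-coded here, and the names `one_sample_t` and `T_CRIT_01_DF13` are our own.

```python
from math import sqrt
import statistics

# Candies per bag for the fourteen bags (Total column of Image A).
totals = [59, 60, 58, 62, 60, 63, 58, 60, 61, 60, 64, 58, 60, 56]

# Assumed two-sided critical value for alpha = 0.01, 13 degrees of freedom.
T_CRIT_01_DF13 = 3.012


def one_sample_t(values, mu0):
    """t statistic for the claim that the population mean equals mu0."""
    n = len(values)
    standard_error = statistics.stdev(values) / sqrt(n)
    return (statistics.mean(values) - mu0) / standard_error


t = one_sample_t(totals, 55)
reject = abs(t) > T_CRIT_01_DF13
print(f"t={t:.2f} critical=+/-{T_CRIT_01_DF13} reject={reject}")
```

Using the unrounded mean, the statistic comes out near 8.66; the 8.61 reported above appears to come from using the rounded mean of 59.9. Either way the value is far beyond the critical value, so the claim is rejected.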

The conditions for doing interval estimates and hypothesis tests are that the sample needs to be random, the sample size needs to be large enough (n > 30), and the sample needs to be less than 10% of the population. Our sample was random and was less than 5% of the population, but our sample of fourteen bags did not meet the size requirement. Since our sample size was smaller than thirty, our conclusions about the population could be incorrect. The sampling method could have been improved by collecting more sample data to meet the requirements of hypothesis testing. The conclusions drawn from this research cannot be said to be valid, since our sample size was not large enough.