
Skittles Project Stats 1040

Jordan Greenhalgh

Each class member bought a bag of Original Skittles and counted the number of candies of each color
in the bag. The data from every student were compiled into one large class sample. Working both
individually and in groups of five, we used the data to create graphs and to calculate proportions and
confidence intervals. In this way, material from the class was applied to a real-world example.
Data:
The data from my individual bag and from the overall class sample were put into a table and displayed
in bar graphs of frequency counts.

Count      Red   Orange   Yellow   Green   Purple   Total
My bag      11       12       12      16       11      62
Class     1164     1117     1189    1087     1093    5650

[Bar graph: "My bag"; frequency of each color (red, orange, yellow, green, purple); vertical axis from 0 to 18]

[Bar graph: "Class sample"; frequency of each color (red, orange, yellow, green, purple); vertical axis from 1020 to 1200]

I expected each color to be represented about equally in every sample. The class graph supports
this: the colors are present in almost equal amounts in the class sample. The only surprise was that
green and purple are slightly lower than the other colors; I had expected yellow to be slightly lower
since it is a less favored flavor of candy. There do not appear to be any outliers in the color counts.
The distribution in my graph differs from the distribution in the class graph. Relative to the class
sample, my bag has fewer yellow and red candies and many more green candies. Additionally, the bars
on my graph appear closer together in height than those on the class graph.
Using the data, each color's proportion of the whole was calculated by dividing its count by the
total count of 5,650 Skittles. As a group we discussed what type of sampling method was used. The
proportions were then used to make a pie chart and a Pareto chart.
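The proportion calculation described above can be sketched in a few lines. This is only an illustration using the class counts from the data table; the variable names are my own and not part of the original project.

```python
# Class color counts taken from the data table in this report.
class_counts = {
    "red": 1164,
    "orange": 1117,
    "yellow": 1189,
    "green": 1087,
    "purple": 1093,
}

total = sum(class_counts.values())  # total candies in the class sample

# Proportion of each color = color count / total count.
proportions = {color: count / total for color, count in class_counts.items()}

# A Pareto chart orders the categories from most to least frequent.
pareto_order = sorted(class_counts, key=class_counts.get, reverse=True)

for color in pareto_order:
    print(f"{color:<7} {class_counts[color]:>5}  {proportions[color]:.1%}")
```

Sorting by count first is what distinguishes the Pareto chart from an ordinary bar graph of the same data.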
Guess! What do you expect the proportions to be? Why?
As a group, we guessed that the proportions of yellow and red candies would be slightly higher in the
class sample, because we believe consumers are more attracted to those colors and the manufacturer may
use this as a marketing tactic. We therefore expected slightly higher counts for those colors: not
substantially higher, just a little above all of the other colors.

[Pie chart: "Total Class Skittles"; Yellow 21.0%, Red 20.6%, Orange 19.8%, Purple 19.3%, Green 19.2%]

[Pareto chart: "Skittle Counts"; frequency by color: Yellow 1189, Red 1164, Orange 1117, Purple 1093, Green 1087; vertical axis from 1020 to 1200]

Does the class represent a random sample? What would the population be?
The class data do not represent a random sample, but rather a convenience sample. In statistics,
random specifically means that all members of a population have an equal and independent chance of
being selected. The sample is the 94 bags of 2.17 oz Original Skittles bought by the students in our
Statistics class. The population is all 2.17 oz bags of Original Skittles. This population is not
limited to the state of Utah or to the US, since we are sure the bags are sold in other parts of the
world. We believe that each of us bought a bag from the most convenient place, such as the closest
supermarket or gas station. A bag of Skittles sold in another state or country is therefore part of the
population but does not have the same chance of being selected as one sold down the street from us,
so this is not a random sample.

Next we calculated the mean number of candies per bag, the standard deviation and the
five-number summary. The number of candies in each individual bag was then used to draw
both a histogram and a boxplot. Both charts showed that the sample had outliers, which were affecting
the mean number of candies per bag.
Organizing and Displaying Quantitative Data: The Number of Candies per Bag
Mean number of candies per bag: 60.1
Standard deviation of candies per bag: 5.6
Five-number summary of candies per bag: Min: 37, Q1: 58, Med: 60, Q3: 62, Max: 82
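The five-number summary is enough to check whether the minimum and maximum count as outliers. The sketch below assumes the common 1.5 × IQR fence rule; the report does not say which outlier rule was used, so treat the rule and the variable names as assumptions.

```python
# Five-number summary reported for the number of candies per bag.
minimum, q1, median, q3, maximum = 37, 58, 60, 62, 82

iqr = q3 - q1  # interquartile range

# Assumed rule: values beyond 1.5 * IQR from the quartiles are outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

min_is_outlier = minimum < lower_fence
max_is_outlier = maximum > upper_fence

print(f"IQR = {iqr}, fences = ({lower_fence}, {upper_fence})")
print(f"min {minimum} outlier: {min_is_outlier}; max {maximum} outlier: {max_is_outlier}")
```

Under this rule both the smallest bag (37) and the largest bag (82) fall outside the fences.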
[Histogram: "Class Skittles Data"; frequency of the number of candies per bag]

[Boxplot: "Class Skittles Data"; number of candies per bag]

The shape of this distribution is roughly symmetric: the whiskers on the box plot are about the
same length, with the low value of thirty-seven skittles balanced by the one bag of eighty-two. This is
what I expected, since I thought the bags would all hold about the same number of skittles, with a
couple holding a lot fewer and a couple holding a lot more. The overall class data agree with my single
bag data. My bag had sixty-two skittles, and the mean for the ninety-four class bags was 60.1, so
there was less than a two-skittle difference, which is well within one standard deviation of about five
and a half skittles.
Categorical data is also called qualitative or attribute data, since its values are not
numerical measurements but a countable number of categories or groups that may or may not have a
logical order. Examples of this type of data include gender, day of the week or class type. Each
category can also be assigned a number as a label for use in calculations, such as male being coded as
zero and female as one. Quantitative data can be ordered or measured with definite numbers. Examples
of this type of data include the weight of an object, the number of complaints or the number of students
taking a specific class. Since categorical data describes which group each observation falls into,
graphs such as pie charts and bar graphs work best for it: they are split into categories, and the size
of each slice or bar reflects the frequency within that category. Box plots and scatter plots do not
work as well, since they treat each value as a point on a numerical scale rather than as a count within
a category. Box plots and scatter plots work better for quantitative data because they show the spread
and changes in numerical values, which pie charts and bar graphs, being broken into categories, do not
show as well. Frequency calculations, such as percentages, make sense for categorical data because what
matters is how often each category occurs, not a numerical value. Summary calculations, such as the
mean, median and standard deviation, make sense for quantitative data, where each numerical value
contributes to the result. They do not work for categorical data, since its values are descriptive
labels.
Now that we had calculated and displayed different values from the data, we calculated
confidence intervals in order to capture the true population values. These intervals were for the
proportion of yellow candies, the mean number of candies per bag and the standard deviation of the
number of candies per bag.
A confidence level describes how much certainty is attached to an interval estimate calculated
from a sample. The population parameter is estimated to lie between a lower bound and an upper bound,
both of which are calculated from the sample data. If the same sampling method were used to select many
more samples from the population and an interval were calculated from each, then at a 95% confidence
level we would expect about 95% of those intervals to contain the true population value. For example,
if we calculate a 95% confidence interval, we are 95% confident that the true population value falls
within the calculated interval. This leaves a 5% risk that the true value does not fall within the
interval.
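The meaning of a confidence level can be illustrated with a small simulation. This is a sketch only: the "true" proportion, the sample size and the number of trials are hypothetical values chosen for illustration, not figures from the class data, and the interval used is the same normal-approximation interval for a proportion that this report uses.

```python
import math
import random

random.seed(1040)  # fixed seed so the run is repeatable

TRUE_P = 0.21   # hypothetical true proportion (illustration only)
N = 500         # hypothetical sample size for each simulated sample
TRIALS = 1000   # number of simulated samples
Z_95 = 1.96     # critical value for a 95% confidence level


def interval_covers_truth():
    """Draw one sample, build a 95% interval, report whether it contains TRUE_P."""
    successes = sum(random.random() < TRUE_P for _ in range(N))
    p_hat = successes / N
    margin = Z_95 * math.sqrt(p_hat * (1 - p_hat) / N)
    return p_hat - margin <= TRUE_P <= p_hat + margin


coverage = sum(interval_covers_truth() for _ in range(TRIALS)) / TRIALS
print(f"{coverage:.1%} of the {TRIALS} intervals contain the true proportion")
```

The printed share should land near 95%, which is what the confidence level refers to: the long-run success rate of the method, not a probability statement about any single interval.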
n = 5650 (total candies in the whole sample)
n = 94 (total number of bags in the sample)
$\hat{p} = 1189/5650 \approx 0.2104$ (proportion of yellow candies)
s = 5.6
$\bar{x} = 60.1$
Conditions:
$n\hat{p}(1-\hat{p}) \ge 10$: $5650\,(1189/5650)\,(1 - 1189/5650) = 938.78 \ge 10$
$n \le 0.05N$: $5650 \times 100/5 = 113{,}000 < N$, the number of all candies in the population. Therefore, $5650 \le 0.05N$
n = 94 is a large sample that is also from a normally distributed population
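99% confidence interval estimate for the population proportion of yellow candies.
Lower and upper bounds: $\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$
$(1-\alpha)\cdot 100\% = 99\%$, so $1-\alpha = 0.99$, $\alpha = 0.01$ and $\alpha/2 = 0.005$
$z_{0.005} = 2.575$
$\hat{p}(1-\hat{p})/n = (1189/5650)(1 - 1189/5650)/5650 \approx 2.9408 \times 10^{-5}$
$E$ (margin of error) $= z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n} = 2.575\sqrt{2.9408 \times 10^{-5}} \approx 0.014$
$\hat{p} \pm E = 1189/5650 \pm 0.014$
(0.196, 0.224)
We are 99% confident that the proportion of yellow candies is between 0.196 and 0.224.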
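The proportion interval can be re-checked with a short script. This is a minimal sketch that takes the critical value 2.575 from the text as given rather than computing it; the variable names are illustrative.

```python
import math

n = 5650        # total candies in the class sample
yellow = 1189   # yellow candies in the class sample
z = 2.575       # critical value for 99% confidence, as quoted in the text

p_hat = yellow / n

# Condition check: n * p_hat * (1 - p_hat) should be at least 10.
condition_value = n * p_hat * (1 - p_hat)

standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
margin = z * standard_error

lower = p_hat - margin
upper = p_hat + margin

print(f"n*p*(1-p) = {condition_value:.2f}")
print(f"E = {margin:.4f}, interval = ({lower:.3f}, {upper:.3f})")
```

With these inputs the interval rounds to (0.196, 0.224).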
Construct a 95% confidence interval estimate for the population mean number of candies per bag.
(critical value) $t_{\alpha/2} = t_{0.025}$ with $n - 1 = 93$ degrees of freedom, from invT: $t_{0.025} \approx 1.99$
$E = t_{\alpha/2}\,(s/\sqrt{n}) = 1.99\,(5.6/\sqrt{94}) \approx 1.15$
Lower bound: 60.1 - 1.15 = 58.95
Upper bound: 60.1 + 1.15 = 61.25
(58.95, 61.25)

We are 95% confident that the true population mean number of candies per bag is
between 58.95 and 61.25.
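The margin of error for the mean can be recomputed from the summary values listed in the setup (n = 94 bags, s = 5.6, mean 60.1). This is a sketch that takes the critical value 1.99 from the text as given rather than looking it up; the variable names are illustrative.

```python
import math

n = 94          # number of bags in the class sample
x_bar = 60.1    # sample mean number of candies per bag
s = 5.6         # sample standard deviation
t_crit = 1.99   # t critical value for 95% confidence, as quoted in the text

degrees_of_freedom = n - 1

# Margin of error for a mean when the population standard deviation is unknown.
margin = t_crit * s / math.sqrt(n)

lower = x_bar - margin
upper = x_bar + margin

print(f"df = {degrees_of_freedom}, E = {margin:.2f}")
print(f"interval = ({lower:.2f}, {upper:.2f})")
```

With these inputs the margin of error is about 1.15 and the interval comes out at about (58.95, 61.25).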
98% confidence interval estimate for the population standard deviation of the number of candies per
bag
Lower and upper bounds: $\sqrt{(n-1)s^2/\chi^2_R} < \sigma < \sqrt{(n-1)s^2/\chi^2_L}$
$(1-\alpha)\cdot 100\% = 98\%$, so $1-\alpha = 0.98$, $\alpha = 0.02$ and $\alpha/2 = 0.01$
$\chi^2_R = \chi^2_{0.01} = 124.116$
$\chi^2_L = \chi^2_{1-0.01} = \chi^2_{0.99} = 61.754$
$\sqrt{93(31.36)/124.116} < \sigma < \sqrt{93(31.36)/61.754}$
(4.847, 6.872)
We are 98% confident that the population standard deviation of the number of candies per bag
is between 4.847 and 6.872.
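The standard deviation interval can be re-checked the same way. This sketch uses the two chi-square critical values exactly as quoted in the text; they match the df = 90 row of a standard chi-square table, which is the nearest tabulated row to n - 1 = 93, so exact software values for 93 degrees of freedom would differ slightly. The variable names are illustrative.

```python
import math

n = 94                # number of bags in the class sample
s = 5.6               # sample standard deviation
chi2_right = 124.116  # right-tail critical value, as quoted in the text
chi2_left = 61.754    # left-tail critical value, as quoted in the text

degrees_of_freedom = n - 1
sample_variance = s ** 2

# The larger critical value gives the lower bound, the smaller gives the upper bound.
lower = math.sqrt(degrees_of_freedom * sample_variance / chi2_right)
upper = math.sqrt(degrees_of_freedom * sample_variance / chi2_left)

print(f"{lower:.3f} < sigma < {upper:.3f}")
```

Unlike the intervals for the proportion and the mean, this interval is not symmetric around s = 5.6, because the chi-square distribution is skewed.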
Each part of this project correlated with the material being taught in the class at
the time, demonstrating how that material can be applied. The project helped me to better learn
and retain the material from assignments by making it more applicable. The modules and my statlab
helped me learn the information, but it still seemed somewhat arbitrary and not applicable outside of
example problems. Applying the same principles and equations to actual data collected as a class
demonstrated how statistics is used in real-world situations. For example, it helped solidify in my
mind what confidence intervals are and how they are calculated without the help of the hint buttons
that are available on assignments. It showed that even though we each had our own data set, the true
population value could fall within the interval. Working in groups for part of the project also let me
use classmates who are learning the same material as a resource, both to fix misconceptions I had and
to check my work as I explained and showed it to them.
Since this project helped me better understand the material, I have an easier time seeing how it
applies in other classes and in real-world situations. Other classes such as psychology use statistics a
lot, and understanding how researchers analyzed their experimental data helps me better evaluate their
research. Knowing that their confidence level is ninety-five percent tells me that they are ninety-five
percent confident the actual value is within their interval and that there is still a five percent
chance that it is not. If their data had outliers, I can understand why they left those points out of
calculations, or I can evaluate how the outliers are affecting the measurements they are reporting. This
will be particularly helpful in my future career as a physical therapist, since the medical field relies
on many studies to determine the best methods for treating injuries and specific populations.
