
Skittles Group Project

Wagma Shour
Anthony Anderson
Aubre Mcqueen
Alexander Vincent

Introduction:

We were each told to buy a 2.17 ounce bag of Skittles candy and separate the candies by color. After
separating them, we counted how many of each color we had and emailed the counts and the total to our
professor. The professor then compiled a complete dataset of every student's totals. From this dataset we
calculated the mean, standard deviation and five-number summary, and then arranged the data in graphs to
show the distribution and proportions of the combined data.

Organizing and Displaying Categorical Data: Colors

Skittle Class Data


Color of Candies   Class Total   Percentage   Cumulative Percentage
Red                330           18.5%        18.5%
Orange             345           19.3%        37.8%
Yellow             364           20.4%        58.2%
Green              368           20.6%        78.8%
Purple             379           21.2%        100.0%
Total              1786          100.0%

Personal Data

Color of Candies   Personal Total   Percentage
Red                14               24.1%
Orange             10               17.2%
Yellow             12               20.7%
Green              11               19.0%
Purple             11               19.0%
Total              58               100.0%
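The percentage and cumulative percentage columns in the class table can be reproduced from the raw counts. The following is a minimal sketch, assuming the class totals listed above; the function name `percentage_table` is illustrative and not part of the original project.

```python
# Class totals by color, copied from the Skittle Class Data table.
class_counts = {"Red": 330, "Orange": 345, "Yellow": 364, "Green": 368, "Purple": 379}


def percentage_table(counts):
    """Return (color, count, percent, cumulative percent) rows for a dict of counts."""
    total = sum(counts.values())
    rows = []
    running = 0
    for color, count in counts.items():
        running += count
        rows.append((color, count, 100 * count / total, 100 * running / total))
    return rows


for color, count, pct, cum in percentage_table(class_counts):
    print(f"{color:<8}{count:>6}{pct:>8.1f}%{cum:>8.1f}%")
```

Passing the personal counts (14, 10, 12, 11, 11) to the same function gives the percentages in the Personal Data table.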

Our graphs reflect the data recorded from the class totals. They show that purple is the most
common color in a small bag of Skittles (21.2%), while red is the least common (18.5%). These
results differ from my own bag of candy, in which red is the most common color (24.1%) and
orange is the least common (17.2%). I would have imagined that red would be the more common
color because of its significance in culture and in food in general, whereas purple is a less
usual color for food.
Organizing and Displaying Quantitative Data: The Number of Candies per Bag

For the sample size used (30), the following data is calculated:

Mean: 59.5
Standard Deviation: 1.72

5-Number Summary:
Min = 57
Q1 = 58
Med = 59
Q3 = 61
Max = 64

Boxplot:

Frequency Histogram:
Both the boxplot and the frequency histogram indicate that the distribution of the data is
skewed to the right. In the class data set, the mean (59.5) is greater than the median (59), and a
mean greater than the median usually goes with a right-skewed distribution, so both charts show
what was expected. For my bag of candy, the five per-color counts (14, 10, 12, 11, 11) have a mean
of 11.6 and a median of 11, so the mean is again slightly greater than the median, which also
suggests a slight skew to the right, although these are counts per color rather than totals per
bag. The total number of candies in my bag is 58, and the total number of bags in the sample is 30.
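As a check on the personal-bag figures, the sketch below computes the mean and median of the five per-color counts from the Personal Data table and applies the mean-versus-median rule of thumb described above. The helper name `skew_direction` is illustrative, and the rule is only a rough heuristic, especially for five values.

```python
from statistics import mean, median

# Per-color counts from the Personal Data table: red, orange, yellow, green, purple.
personal_counts = [14, 10, 12, 11, 11]


def skew_direction(values):
    """Rough heuristic: compare the mean with the median to guess the skew."""
    m, med = mean(values), median(values)
    if m > med:
        return "right"
    if m < med:
        return "left"
    return "roughly symmetric"


print("total :", sum(personal_counts))
print("mean  :", mean(personal_counts))
print("median:", median(personal_counts))
print("skew  :", skew_direction(personal_counts))
```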

Reflection:

Categorical or qualitative data can be split into 2 levels of measurement. The first is nominal,
where the values are names or labels with no natural order, such as yes or no answers or the colors
of Skittles. The second is ordinal, where the categories have a meaningful order, such as the
grading system consisting of grades A, B, C, D and F. Quantitative data can also be split into 2
levels. The first is interval, where differences between values are meaningful but there is no
true zero, so ratios are not meaningful, as with temperature in degrees Fahrenheit. The second is
ratio, where there is a true zero, so both differences and ratios are meaningful, as with the
number of candies in a bag.
For categorical data, pie charts and bar graphs are the best ways to represent the data. A pie
chart shows how large each category is in comparison to the whole, while a bar graph makes it
easier to compare the categories with one another. For quantitative data, the best graphs are
histograms and boxplots because they show the shape, center and spread of the distribution.

Confidence Interval Estimates:

The main idea behind a confidence level is to be some amount, X%, confident that the true
population value, such as the mean, lies within a confidence interval. Through the confidence
interval, we can also see how precise our estimate is and how large the margin of error is. The
larger the sample size, the smaller the margin of error.
1. Construct a 99% confidence interval estimate for the true proportion of yellow candies:
From our data, x = 364 for yellow candies and n = 1786 for the total number of candies.
Using the confidence interval formula for a proportion: p̂ ± zα/2 * √(p̂(1 - p̂)/n)
p̂ = x/n = 364/1786 = 0.2038
zα/2 = 2.575
So, 0.2038 ± 2.575 * √(0.2038(1 - 0.2038)/1786) = 0.2038 ± 0.0245 = (0.179, 0.228). From this
result, we are 99% confident that the true proportion of yellow candies falls between 0.179 and
0.228; in other words, out of every 100 candies we expect about 18 to 23 to be yellow.
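The proportion interval can be recomputed directly from x = 364 and n = 1786. This is a minimal sketch using only the Python standard library; the function name `proportion_ci` is illustrative. It takes the critical value from `statistics.NormalDist` (about 2.576) rather than the table value 2.575, so the last digit may differ slightly from a hand calculation.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(x, n, confidence):
    """Normal-approximation confidence interval for a population proportion."""
    p_hat = x / n
    # Two-sided critical value: the area in each tail is (1 - confidence) / 2.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


low, high = proportion_ci(364, 1786, 0.99)
print(f"99% CI for the proportion of yellow candies: ({low:.3f}, {high:.3f})")
```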
2. Construct a 95% confidence interval estimate for the true mean number of candies per bag:
Using the confidence interval formula for a mean: x̅ ± tα/2 * s/√n
From our data, n = 30, x̅ = 59.5 and s = 1.72, and tα/2 = 2.045 from the t-table (n - 1 = 29 degrees of freedom).
59.5 ± 2.045 * 1.72/√30 = 59.5 ± 0.642 = (58.858, 60.142)
From the results, we are 95% confident that the true mean number of candies per bag falls
between 58.858 and 60.142.
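The interval for the mean can be checked the same way. The sketch below assumes the summary statistics reported above (x̅ = 59.5, s = 1.72, n = 30) and takes the critical value 2.045 from the t-table as an input, since the Python standard library has no t-distribution; `mean_ci` is an illustrative name.

```python
from math import sqrt


def mean_ci(x_bar, s, n, t_crit):
    """t-based confidence interval for a mean, given a critical value from a t-table."""
    margin = t_crit * s / sqrt(n)
    return x_bar - margin, x_bar + margin


# t_crit = 2.045 is the table value for 95% confidence and 29 degrees of freedom.
low, high = mean_ci(x_bar=59.5, s=1.72, n=30, t_crit=2.045)
print(f"95% CI for the mean number of candies per bag: ({low:.3f}, {high:.3f})")
```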
3. Construct a 98% confidence interval estimate for the standard deviation of the number of
candies per bag:
Using the confidence interval formula for a population standard deviation: √((n - 1)s²/χ²R) < σ < √((n - 1)s²/χ²L)
From our data: n = 30 and s = 1.72. From the chi-square table with 29 degrees of freedom, χ²R = 49.588 and χ²L = 14.256.
√(29 * 1.72²/49.588) < σ < √(29 * 1.72²/14.256), which gives (1.315, 2.453)
From the results above, we are 98% confident that the standard deviation of the number of candies
per bag falls between 1.315 and 2.453. This interval assumes that the number of candies per bag is
approximately normally distributed.
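A confidence interval for a standard deviation is normally built from the chi-square distribution rather than from a formula of the form x̅ ± E, which estimates a mean. The following is a minimal sketch under that assumption. The critical values 49.588 and 14.256 are standard chi-square table entries for 29 degrees of freedom at a 98% confidence level and are passed in by hand; `std_dev_ci` is an illustrative name.

```python
from math import sqrt


def std_dev_ci(s, n, chi2_right, chi2_left):
    """Chi-square confidence interval for a population standard deviation.

    chi2_right and chi2_left are the right- and left-tail critical values
    for n - 1 degrees of freedom, looked up in a chi-square table.
    """
    sum_of_squares = (n - 1) * s ** 2
    return sqrt(sum_of_squares / chi2_right), sqrt(sum_of_squares / chi2_left)


# Table values for 29 degrees of freedom with 0.01 in each tail (assumed from a table).
low, high = std_dev_ci(s=1.72, n=30, chi2_right=49.588, chi2_left=14.256)
print(f"98% CI for the standard deviation: ({low:.3f}, {high:.3f})")
```

The method relies on the bag counts being roughly normally distributed, so the result should be read with some caution for skewed data.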

Hypothesis test:

A hypothesis test is a procedure that uses sample data to assess the validity of a claim about a
population. When someone questions previous research or observations, a hypothesis may be made
that there have been changes since the initial data was reported. Hypothesis tests are done to see
whether or not the previous claims are still supported by the data.

Use a 0.05 significance level to test the claim that 20% of all Skittles candies are red.
Use a 0.01 significance level to test the claim that the mean number of candies in a bag of Skittles is
55.
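In both of the hypothesis tests, the null hypothesis was the claim. For the proportion of red
candies, the class data give p̂ = 330/1786 = 0.185 and a test statistic of z ≈ -1.61, which lies
inside the critical values of ±1.96, so we fail to reject the claim that 20% of all Skittles
candies are red. For the mean, t = (59.5 - 55)/(1.72/√30) ≈ 14.33, which is far beyond the critical
value of 2.756 (29 degrees of freedom), so we reject the claim that the mean number of candies in
a bag is 55; the sample mean of 59.5 is well above 55.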
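The report does not show the test statistics, so the sketch below illustrates how they could be computed from the class data given earlier (330 red candies out of 1786, x̅ = 59.5, s = 1.72, n = 30). It assumes two-tailed tests. The function names are illustrative, and the t critical value 2.756 is a table value for 29 degrees of freedom entered by hand because the standard library has no t-distribution.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z(x, n, p0):
    """Return the z statistic and two-tailed p-value for H0: p = p0."""
    p_hat = x / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


def one_mean_t_stat(x_bar, mu0, s, n):
    """Return the t statistic for H0: mu = mu0."""
    return (x_bar - mu0) / (s / sqrt(n))


# Claim 1: 20% of all Skittles are red, tested at the 0.05 level.
z, p_value = one_proportion_z(x=330, n=1786, p0=0.20)
print(f"z = {z:.2f}, p-value = {p_value:.3f}, reject H0: {p_value < 0.05}")

# Claim 2: the mean number of candies per bag is 55, tested at the 0.01 level.
t = one_mean_t_stat(x_bar=59.5, mu0=55, s=1.72, n=30)
T_CRIT = 2.756  # two-tailed table value for alpha = 0.01 and 29 degrees of freedom
print(f"t = {t:.2f}, reject H0: {abs(t) > T_CRIT}")
```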

Reflection:

The conditions for doing the hypothesis test and the confidence interval for a proportion are that
the sample is no more than 5% of the population (n ≤ 0.05N) and that n*p0(1 - p0) ≥ 10. Possible
sources of error include miscounting the total number of Skittles per bag, purchasing the wrong
size bag of candies, or counting half-candies as whole candies. The sampling method could be
improved by handing out the bags of candy to each person instead of having everyone purchase
them on their own.

Our conclusion for the Skittles group project is that the data from an individual bag did not
match the data for the entire class. A single bag is too small a sample, so a bigger sample size is
needed in order to estimate values for the whole population. Overall, this project allowed us to
gain a better perspective on how to apply statistics in a real-life situation.
