Nicole Denkers

Statistics 1040 Final Project


Introduction:
For this data set, each student in the class bought one 2.17 oz bag of Original Skittles, sorted
the candies into the five colors (Red, Orange, Yellow, Green, and Purple), and counted each color.
The individual counts were then submitted and combined into a single class data sheet listing the
frequency of each color and the total number of candies in each bag. The purpose was to show how
much the color counts and the total count vary from bag to bag.
Number of Candies in My Bag of Skittles

Color     Red    Orange   Yellow   Green   Purple   Total
Count     12     16       7        16      10       61

Total Number of Candies in Class Sample:

Color        Red    Orange   Yellow   Green   Purple   Total
Count        317    346      321      352     298      1634
Proportion   19%    21%      20%      22%     18%


Organizing and Displaying Categorical Data: Colors


The proportion of each color in the class sample was calculated by dividing the total number of
candies of that color by the total number of Skittles collected by the class. The pie chart is
close to what I expected to see: the five colors appear in roughly even proportions, each between
18% and 22%. The proportions are not exactly equal, however, which indicates that the colors are
not packed in a perfectly even ratio and that bags of the same weight do not all hold the same
number of Skittles. The class data broadly agree with my individual count, in which Green and
Orange were also the most common colors. In descending order, the class totals are: Green,
Orange, Yellow, Red, and finally Purple.
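
The proportion calculation described above can be sketched in a few lines of Python. This is only an illustration using the class totals from the table; the variable names are my own and are not part of the original Excel work.

```python
# Class color totals as reported in the class sample table.
class_counts = {"Red": 317, "Orange": 346, "Yellow": 321, "Green": 352, "Purple": 298}

# Total number of Skittles collected by the class.
total = sum(class_counts.values())

# Proportion of each color = count for that color / total count.
proportions = {color: count / total for color, count in class_counts.items()}

# Colors ordered from most common to least common.
ranking = sorted(class_counts, key=class_counts.get, reverse=True)

for color in ranking:
    print(f"{color:<7}{class_counts[color]:>5}{proportions[color]:>7.0%}")
```

Sorting by count gives the same descending order of colors listed in the paragraph above.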
Frequency Distribution

# of Skittles Per Bag   Frequency
50-52                   0
53-55                   1
56-58                   5
59-61                   11
62-64                   10


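Five-Number Summary of Candies per Bag, by Color

            Red    Orange   Yellow   Green   Purple
MIN         5      7        5        6       5
25% (Q1)    10     10       9.5      11.5    9
MED         12     13       12       14      11
75% (Q3)    14     14       14.5     15      13
MAX         18     21       21       17      16

Box Plot Construction Values (differences between successive summary values)

            Red    Orange   Yellow   Green   Purple
MIN         5      7        5        6       5
25% - MIN   5      3        4.5      5.5     4
MED - 25%   2      3        2.5      2.5     2
75% - MED   2      1        2.5      1       2
MAX - 75%   4      7        6.5      2       3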

Organizing and Displaying Quantitative Data: the Number of Candies per Bag
Each bag supposedly weighs the 2.17 oz indicated on the packaging, yet the number of candies
varied from bag to bag. The class sample of 27 bags contained 1634 candies in total, giving a
mean of 60.519 candies per bag with a Standard Deviation of 2.471. The frequency distribution is
roughly bell-shaped and concentrated between 56 and 64 candies per bag, with one low bag (in the
53-55 class) as a possible outlier. Because the tail extends toward the lower counts, the graphs
appear slightly skewed to the left. This is not what I expected, as I had assumed that the same
number of candies of each color/flavor would be packed into every bag. My own bag, with 61
candies, is consistent with the class data.
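
As a quick check on the summary values above, the following Python sketch recomputes the number of bags and the mean from the frequency distribution and the class total. The midpoint-based estimate is my own addition for comparison only; the project's mean was computed from the individual bag counts, which are not listed here.

```python
# Frequency distribution of candies per bag: (class low, class high, frequency).
classes = [(50, 52, 0), (53, 55, 1), (56, 58, 5), (59, 61, 11), (62, 64, 10)]

# The frequencies should add up to the number of bags in the class sample.
n_bags = sum(freq for _, _, freq in classes)

# Mean candies per bag from the class total.
total_candies = 1634
mean_per_bag = total_candies / n_bags

# Rough estimate of the mean using class midpoints (an approximation,
# since each bag is replaced by the middle of its class).
grouped_mean = sum((low + high) / 2 * freq for low, high, freq in classes) / n_bags

print(n_bags, round(mean_per_bag, 3), round(grouped_mean, 3))
```

The midpoint estimate lands close to the exact mean, which suggests the frequency table and the class total are consistent with each other.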

Reflection
Quantitative data consist of values that can be counted or measured; in our case, the number of
Skittles per bag. Categorical (or Qualitative) data consist of labels that describe a
characteristic and cannot be measured numerically; in our case, the colors of the Skittles.
Quantitative data can be represented using Scatter Plots, Dot Plots, Stem Plots, and Time Series
Plots.
Categorical Data can be represented using Pie Charts, Pareto Charts, and Bar Graphs.

Confidence Interval Estimates:


A Confidence Interval gives a range of values that, at a stated level of confidence, is expected
to contain a population parameter.

A Confidence Interval was used to estimate the proportion of all Skittles that are Yellow. We are
99% confident that the interval from 0.171 to 0.221 contains the true population proportion of
Yellow Skittles. This means that if samples of Skittles bags were selected repeatedly and an
interval was built from each sample, about 99% of those intervals would contain the true value of
the population proportion.
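
The interval for the Yellow proportion can be approximated with the usual normal-approximation formula, p-hat plus or minus z times sqrt(p-hat * (1 - p-hat) / n). The sketch below assumes that formula and uses the class counts (321 Yellow out of 1634); the function name is hypothetical, and the last digit of the endpoints may differ slightly from the reported values depending on rounding.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(successes, n, confidence):
    """Normal-approximation confidence interval for a population proportion."""
    p_hat = successes / n
    # Critical z value for a two-sided interval at the given confidence level.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# Yellow Skittles in the class sample: 321 out of 1634, at 99% confidence.
low, high = proportion_ci(321, 1634, 0.99)
print(f"{low:.3f} to {high:.3f}")
```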


A Confidence Interval was used to estimate the mean number of Skittles per bag. We are 95%
confident that the interval from 60.011 to 61.989 contains the mean number of candies per bag in
the population. This means that if samples of Skittles bags were selected repeatedly and an
interval was built from each sample, about 95% of those intervals would contain the true value of
the population mean.

A Confidence Interval was used to estimate the Standard Deviation of the number of Skittles per
bag. We are 98% confident that the interval from 1.887 to 3.650 contains the Standard Deviation
of the number of candies per bag in the population of Skittles. This means that if samples of
bags were selected repeatedly and an interval was built from each sample, about 98% of those
intervals would contain the true value of the population Standard Deviation.
Hypothesis Tests:
Hypothesis testing is the procedure in which statistical analysis is used to decide whether to
reject or fail to reject the null hypothesis. Its purpose is to assess whether the sample data
are consistent with a claim about a population parameter.

The Test Statistic, -0.606, is not within the rejection region, so we do not reject H0.

There is not sufficient evidence to reject the claim that 20% of all Skittles are Red.
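
The test statistic for the Red claim can be reproduced with a one-proportion z test, z = (p-hat - p0) / sqrt(p0 * (1 - p0) / n). The sketch below assumes a two-sided test and uses the class counts (317 Red out of 1634); the function name and the two-sided p-value are my own additions and do not appear in the original output.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z_test(successes, n, p0):
    """Return the z statistic and two-sided p-value for H0: p = p0."""
    p_hat = successes / n
    # Standard error is computed under the null hypothesis (using p0).
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Claim: 20% of all Skittles are Red. Class sample: 317 Red out of 1634.
z, p_value = one_proportion_z_test(317, 1634, 0.20)
print(f"z = {z:.3f}, p-value = {p_value:.3f}")
```

A z value this close to zero gives a large p-value, which agrees with the decision not to reject the claim.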


The Test Statistic, 12.471, is within the rejection region, so we reject H0. There is sufficient
evidence to reject the claim that the mean number of candies in a bag of Skittles is 55.
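
The test statistic for the mean follows the one-sample t formula, t = (x-bar - mu0) / (s / sqrt(n)). The sketch below applies it twice: once with the mean and Standard Deviation stated earlier (60.519 and 2.471), and once with rounded inputs of 61 and 2.5. The rounded inputs are an assumption on my part; they are shown only because they give a statistic matching the reported 12.471. Either way, the statistic is far into the rejection region.

```python
from math import sqrt


def t_statistic(sample_mean, sample_sd, n, mu0):
    """One-sample t statistic for H0: population mean = mu0."""
    return (sample_mean - mu0) / (sample_sd / sqrt(n))


n_bags = 27
claimed_mean = 55

# Using the summary values stated in the quantitative data section.
t_stated = t_statistic(60.519, 2.471, n_bags, claimed_mean)

# ASSUMPTION: rounded inputs (mean 61, SD 2.5), not stated in the report.
t_rounded = t_statistic(61, 2.5, n_bags, claimed_mean)

print(f"stated inputs:  t = {t_stated:.3f}")
print(f"rounded inputs: t = {t_rounded:.3f}")
```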

REFLECTION
Interval Estimates and Hypothesis Tests for:
Population Proportions:
1. The sample must be of random observations
-This condition was met
2. The conditions for the binomial distribution must be met and satisfied (i.e. fixed number of
trials, trials are independent, 2 categories for outcomes, and the probability remains constant for
each trial)
-Binomial distribution condition is met
3. At least 5 successes (np) and 5 failures (nq) must occur (n=1634)
- This condition was also met
Population Mean:

1. The sample must be of random observations
- This condition was met
2. The population must be normally distributed OR the number of observations must be > 30
- The sample size condition was not met (n = 27 bags), but the data were approximately normally
distributed, so the overall condition was met
Population Standard Deviation
1. The sample must be of random observations
- This condition was met
2. The population must be normally distributed
- This condition was met
Possible sources of error include miscounting, incorrect data entry, color blindness, incorrect
use of Excel, and simple miscalculation.
The sampling method could be improved by using a larger sample size, counting each bag more than
once to verify the result, or having another person recount or double-check the data collected.