Ana de Freitas
Salt Lake Community College
Introduction:
In my introduction to statistics class, we are required to do a term project. This project
involves Skittles and statistical analysis. Each student was given a bag of Original Skittles and
counted how many Skittles of each color were in the bag (disregarding any partial Skittles). We
could then compare our individual data to the class data using graphs and summary statistics.
This allows us as students to use many concepts we have learned in class, including organizing
data, analyzing data, and interpreting the results. This report shows the data I collected, my
calculations, and the process I followed. It also explains the meaning or significance of the
data and results.
[Figure: pie chart of class totals by color; labeled slices include Green 291 (18%), Orange 315 (19%), and Yellow 328 (20%)]

Colors:        Red    Orange   Yellow   Green   Purple
My data:       14     12       15       5       14
Class total:   376    315      328      291     329

Total number of Skittles (class) = 1639

Proportion of Red:    (376/1639) = 0.229
Proportion of Orange: (315/1639) = 0.192
Proportion of Yellow: (328/1639) = 0.200
Proportion of Green:  (291/1639) = 0.178
Proportion of Purple: (329/1639) = 0.201
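As a quick check on the arithmetic, the class proportions can be recomputed in a few lines of Python. This is only a sketch: the variable names (class_counts, proportions) are made up for illustration, and the counts are the class totals listed above.

```python
# Class totals by color, copied from the class data above.
class_counts = {"Red": 376, "Orange": 315, "Yellow": 328, "Green": 291, "Purple": 329}

# Total number of candies counted by the class.
total = sum(class_counts.values())

# Proportion of each color = color count / total count, rounded to 3 decimals.
proportions = {color: round(count / total, 3) for color, count in class_counts.items()}

for color, p in proportions.items():
    print(f"Proportion of {color}: ({class_counts[color]}/{total}) = {p:.3f}")
```

Rounding to three decimals matches the precision used in the report.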
[Figures: four bar charts of Count by Color (Red, Orange, Yellow, Green, Purple)]
Five-number summary (candies per bag):
Minimum = 58.0
Q1 = 59.0
Q2 (median) = 61.0
Q3 = 62.0
Maximum = 64.0

Bin   Frequency
57    0
58    1
59    6
60    5
61    7
62    5
63    2
64    1
65    0

[Figure: Frequency Histogram of candies per bag; x-axis bins 58 to 65 and "More", y-axis Frequency from 0 to 8]
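The summary values can be recovered from the frequency table with Python's standard statistics module. This is a sketch under one assumption worth flagging: quartile rules differ between textbooks and calculators, and the module's default ("exclusive") method is used here only because it agrees with the quartiles reported above. The names frequency and bags are illustrative.

```python
import statistics

# Candies-per-bag frequency table from the report (bin value -> number of bags).
frequency = {57: 0, 58: 1, 59: 6, 60: 5, 61: 7, 62: 5, 63: 2, 64: 1, 65: 0}

# Expand the table into one sorted list with one entry per bag.
bags = sorted(value for value, count in frequency.items() for _ in range(count))

n = len(bags)                    # number of bags in the class sample
mean = statistics.mean(bags)     # sample mean of candies per bag
stdev = statistics.stdev(bags)   # sample standard deviation (n - 1 denominator)

# Quartiles with the module's default "exclusive" method (an assumption;
# other quartile conventions can give slightly different values).
q1, q2, q3 = statistics.quantiles(bags, n=4)

print(f"n = {n}, mean = {mean:.1f}, s = {stdev:.1f}")
print(f"min = {min(bags)}, Q1 = {q1}, median = {q2}, Q3 = {q3}, max = {max(bags)}")
```

The bag counts add up to the class total of 1639 candies, which is a useful consistency check between the color table and the frequency table.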
Box Plot:
[Figure: box plot on an axis from 50 to 70, with markers at 58, 59, 61, 62, and 64]
Reflection:
Quantitative data consists of numbers that represent counts or measurements. Categorical
(qualitative) data consists of labels or names that are not numerical.
Qualitative:
Pie graphs are great for data that falls into categories. Each category is represented by a
slice of the pie, and the area of the slice is proportional to the category's percentage.
Calculations are limited with qualitative data. Examples include finding the mode or a
percentage. If you ask a person a yes or no question, there are only two categories into which
their answer can fall, and you can calculate the percentage of people who said yes or no. Bar
charts are a good choice for qualitative data as well.
Quantitative:
Numerical data can be represented in many ways. One can use graphs or tables. Some
graphs include line graphs, bar graphs, dot plots, time-series plots, scatterplots,
and histograms. When working with a set of numbers, one can calculate the mean,
median, standard deviation, frequency, mode, maximum, minimum, etc. It really just
depends on what one is looking for. It would not make sense to use numerical data
when the numbers have no relationship to what a person is trying to find. The
numbers should have significance or meaning behind them. It is very important to
label graphs correctly. If I tell someone I have 42, they will most likely wonder
what it is I have. It could be 42 grams, 42 shoes, etc. Graphs help you see
relationships between variables, so a visual representation of data is useful to
have.
Confidence Interval Estimates:
A confidence interval is a range of values, derived from sample statistics,
which is likely to include the value of an unknown population parameter. We use
confidence intervals when we want to estimate a population proportion, mean, or
standard deviation and show how precise that estimate is.
a). 99% confidence interval estimate for the true proportion of green candies.
.154 < p < .202
b). 95% confidence interval estimate for the true mean number of candies per bag.
60.106 < μ < 61.294
c). 98% confidence interval estimate for the standard deviation of the number of
candies/bag.
1.132 < σ < 2.189
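The three intervals can be approximately reproduced as follows. This is a sketch under stated assumptions, not the hand calculation itself: the summary values x_bar = 60.7 and s = 1.5 are not stated in the report and are assumed here (they are the mean and standard deviation of the frequency table, rounded to one decimal), the green proportion is rounded to 0.178 as in the proportion table, and the t and chi-square critical values are hardcoded from standard tables for 26 degrees of freedom.

```python
import math
from statistics import NormalDist

# (a) 99% CI for the proportion of green candies (normal approximation).
n_candies = 1639
p_hat = round(291 / n_candies, 3)             # 0.178, rounded as in the report
z_crit = NormalDist().inv_cdf(1 - 0.01 / 2)   # about 2.576
e_p = z_crit * math.sqrt(p_hat * (1 - p_hat) / n_candies)
ci_p = (p_hat - e_p, p_hat + e_p)

# (b) 95% CI for the mean number of candies per bag (t distribution).
n_bags, x_bar, s = 27, 60.7, 1.5              # assumed rounded summary values
t_crit = 2.056                                # table value: t(0.025), df = 26
e_mu = t_crit * s / math.sqrt(n_bags)
ci_mu = (x_bar - e_mu, x_bar + e_mu)

# (c) 98% CI for the standard deviation (chi-square distribution).
chi2_right, chi2_left = 45.642, 12.198        # table values: df = 26, areas 0.01 and 0.99
df = n_bags - 1
ci_sigma = (math.sqrt(df * s**2 / chi2_right), math.sqrt(df * s**2 / chi2_left))

print(f"(a) {ci_p[0]:.3f} < p < {ci_p[1]:.3f}")
print(f"(b) {ci_mu[0]:.3f} < mu < {ci_mu[1]:.3f}")
print(f"(c) {ci_sigma[0]:.3f} < sigma < {ci_sigma[1]:.3f}")
```

Small differences in the last decimal place can come from rounding the inputs or the critical values at different steps.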
Hypothesis Tests:
Hypothesis testing refers to the formal procedures used by statisticians to reject or fail to
reject statistical hypotheses. A statistical hypothesis is a claim or assumption about a
population parameter.
a). 0.05 significance level to test the claim that 20% of all Skittles candies are purple.
Fail to reject Ho
b). 0.01 significance level to test the claim that the mean number of candies in a bag of Skittles
is 62.0.
Reject Ho
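The two test statistics can be sketched in Python as a check on the conclusions. Several things here are assumptions rather than part of the report: both tests are treated as two-tailed, x_bar = 60.7 and s = 1.5 are rounded summary values from the frequency table, the critical value 2.779 is the table value of t for 26 degrees of freedom and 0.005 in each tail, and alpha_a is set to the conventional 0.05.

```python
import math
from statistics import NormalDist

# (a) H0: p = 0.20 for purple candies, two-tailed z test for a proportion.
x_purple, n_candies, p0 = 329, 1639, 0.20
p_hat = x_purple / n_candies
z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n_candies)
p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # two-tailed p-value
alpha_a = 0.05                                 # assumed significance level
reject_a = p_value < alpha_a

# (b) H0: mu = 62.0 candies per bag, two-tailed t test for a mean.
n_bags, x_bar, s, mu0 = 27, 60.7, 1.5, 62.0    # assumed rounded summary values
t = (x_bar - mu0) / (s / math.sqrt(n_bags))
t_crit = 2.779                                 # table value: t(0.005), df = 26
reject_b = abs(t) > t_crit

print(f"(a) z = {z:.3f}, p-value = {p_value:.3f}, reject H0: {reject_a}")
print(f"(b) t = {t:.3f}, critical value = {t_crit}, reject H0: {reject_b}")
```

With a p-value this large in test (a), the conclusion does not depend on the exact significance level chosen. In test (b), the sample mean is far enough below 62.0 that the test statistic falls well inside the rejection region.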
MY HAND-WRITTEN CALCULATIONS AND RESULTS ARE POSTED ON A SEPARATE PDF.
Reflection:
The conditions for:
Proportion: simple random sample; the conditions for a binomial distribution are met; at least
5 successes and at least 5 failures.
Mean: simple random sample; the population is normally distributed or the sample size is greater than 30.
Standard Deviation: simple random sample; the population must be normally distributed.
There are a lot of checkpoints when it comes to statistics. One requirement that I know we did
not meet was the sample size. Our sample size (n) was under 30, so the methods for the mean
depend on the data being approximately normal. Our frequency histogram has an overall bell
shape, but the 59 bin (59 candies per bag) has a noticeable bump (peak). This could be read as
a departure from normality, but it reflects the data accurately and the overall shape is still
bell-like, so I would consider the distribution to be approximately normal. The sampling method
could have been improved by using a larger sample, for example by compiling data from more than
one class. There is also the chance of a Type I or Type II error: we may be rejecting a true
null hypothesis (Type I) or failing to reject a false null hypothesis (Type II).