
Hannah

Horrocks Math 1040

Skittles Term Project


Introduction:

To further develop our understanding of statistics, we were assigned the Skittles Project. I
collected data from a bag of Skittles, using my bag as the sample and everyone else's data as the
population. I then graphed the data by color on both a pie chart and a bar chart. To see how many
Skittles were in each bag, I created a boxplot and a frequency histogram. I also calculated
confidence interval estimates for the proportion of yellow candies and for the true mean number
of Skittles per bag. Lastly, to test the claims that 20% of the candies are red and that the true
mean number of Skittles per bag is 55, I performed hypothesis tests on the data.

Organizing and Displaying Categorical Data: Colors

The pie chart didn't turn out how I was expecting. StatCrunch rounded every color's share to 20%,
but when rounded to three decimal places, the shares aren't all 20%. The bar chart shows the data
better: it is easier to see that yellow is lacking overall. The class data closely match my own
data, but there are slight differences. For example, in my bag the lowest counts were orange and
green, which were tied at 10 each.

Organizing and Displaying Quantitative Data: Number of Candies per Bag

Mean: 60.3    Standard Deviation: 3.40    N = 51 (number of bags)
Minimum: 50    Quartile I: 59    Median: 61    Quartile III: 62    Maximum: 65
My bag = 65 candies

According to the boxplot, the data are skewed left. The histogram looks a little different, but
overall the distribution still appears skewed left. The graphs display what I was expecting to
see based on the data. My bag contained 65 Skittles, which is the maximum in the class data.
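
The left skew can be cross-checked from the summary numbers alone. The following is a minimal sketch in Python, assuming the common 1.5 × IQR rule for flagging outliers (a convention that is not stated in the project itself). It uses only the five-number summary and mean reported above.

```python
# Five-number summary and mean as reported for the class data (candies per bag).
minimum, q1, median, q3, maximum = 50, 59, 61, 62, 65
mean = 60.3

iqr = q3 - q1                  # interquartile range
lower_fence = q1 - 1.5 * iqr   # values below this are flagged as low outliers
upper_fence = q3 + 1.5 * iqr   # values above this are flagged as high outliers

print(f"IQR = {iqr}, fences = ({lower_fence}, {upper_fence})")
print("minimum is a low outlier:", minimum < lower_fence)
print("maximum is a high outlier:", maximum > upper_fence)
print("mean is below the median:", mean < median)
```

Under that rule the minimum of 50 falls below the lower fence while the maximum of 65 stays inside the upper fence, and the mean sits below the median. Both observations are consistent with a left-skewed distribution.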

Reflection: Categorical and Quantitative Data

Categorical data is information that is divided into categories, such as the colors of the
Skittles. For categorical data the best charts to use are pie charts and bar graphs, because they
show the different categories most effectively. It wouldn't make sense to put the different colors
of Skittles into a boxplot, because color is qualitative, not quantitative. Quantitative data
expresses a quantity, amount, or range. The best way to present this information is with a
boxplot, because it shows the range and where most of the data is located. It also shows the
maximum and minimum clearly for the reader. It wouldn't make sense to put the 5-number summary
into a pie chart, because you wouldn't be able to see the distribution and shape of the data.

Confidence Interval Estimates

A confidence interval gives a range of plausible values for a population parameter, stated
together with a confidence level that describes how often the method captures the true value.
Using calculator: x = 581 (yellow candies), n = 3076 (Skittles total), C-level = 0.99
(0.1707, 0.20706)
Using calculator: mean = 60.3, standard deviation = 3.40, n = 51, C-level = 0.95
(59.344, 61.256)
Interpretation: We are 99% confident that the true proportion of yellow candies is between
0.1707 and 0.2071. We are 95% confident that the true mean number of candies per
bag is between 59.344 and 61.256.
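
Both intervals above can be approximately reproduced by hand. The sketch below is an illustration in Python, not the calculator's actual routine. It assumes a normal-approximation (z) interval for the proportion and a t interval for the mean. The function names are hypothetical, and the critical value t* = 2.0086 is an assumed table value for 50 degrees of freedom.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(x, n, level):
    """Normal-approximation (z) interval for a population proportion."""
    p_hat = x / n
    z_star = NormalDist().inv_cdf(1 - (1 - level) / 2)
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

def mean_ci(x_bar, s, n, t_star):
    """t interval for a population mean; t_star is supplied by the caller."""
    margin = t_star * s / sqrt(n)
    return x_bar - margin, x_bar + margin

# Yellow candies: x = 581 out of n = 3076, 99% confidence.
low_p, high_p = proportion_ci(581, 3076, 0.99)

# Candies per bag: mean 60.3, s = 3.40, n = 51 bags, 95% confidence.
# ASSUMPTION: t* = 2.0086 is the t-table value for 50 degrees of freedom.
low_m, high_m = mean_ci(60.3, 3.40, 51, t_star=2.0086)

print(f"proportion: ({low_p:.4f}, {high_p:.4f})")
print(f"mean:       ({low_m:.3f}, {high_m:.3f})")
```

The results should land very close to the reported intervals. Small differences in the last decimal place can come from rounding the mean and standard deviation before entering them.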

Hypothesis Tests

Hypothesis testing is a procedure, based on sample evidence and probability, used to test
statements regarding a characteristic of one or more populations. The main point is to decide
whether to reject or fail to reject the null hypothesis based on the data given. If you reject
the null hypothesis, the data support the alternative hypothesis.
Using calculator: p0 = 0.20, x = 642 (red candies), n = 3076 (Skittles total), alternative p ≠ 0.20
z = 1.21, P-value = 0.23, α = 0.05
H0: p = 0.20    H1: p ≠ 0.20
Since the P-value is greater than α, we do not reject H0: there is insufficient evidence
to conclude that the proportion of red Skittles differs from 20%.
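
The test statistic and P-value for the red candies can be checked with a short calculation. This is a minimal sketch in Python, assuming a two-sided one-proportion z test with the standard error computed under H0. The function name is hypothetical, and the code is a cross-check rather than the calculator's own procedure.

```python
from math import sqrt
from statistics import NormalDist

def one_proportion_z_test(x, n, p0):
    """Two-sided z test of H0: p = p0 using the normal approximation."""
    p_hat = x / n
    se = sqrt(p0 * (1 - p0) / n)                   # standard error under H0
    z = (p_hat - p0) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # both tails
    return z, p_value

z, p_value = one_proportion_z_test(x=642, n=3076, p0=0.20)
alpha = 0.05

print(f"z = {z:.2f}, P-value = {p_value:.2f}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```

With the counts above, the P-value is well above α = 0.05, which matches the decision not to reject H0.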
Using calculator: μ0 = 55, x-bar = 60.3, standard deviation = 3.40, n = 3076, alternative μ ≠ 55
t = 86.46, P-value ≈ 0, α = 0.01
H0: μ = 55    H1: μ ≠ 55
Since the P-value is less than α, we reject H0: there is sufficient evidence that the
true mean number of Skittles per bag is not equal to 55.
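
The t statistic reported above is roughly what the one-sample formula returns when n = 3076 (the total number of candies) is entered. Because the mean and standard deviation describe candies per bag, the sample size for this test would normally be the 51 bags. The sketch below is a cross-check, not the original calculation, and it computes the statistic both ways. The function name is hypothetical, and the critical value 2.678 is an assumed t-table value for α = 0.01 (two-sided) with 50 degrees of freedom.

```python
from math import sqrt

def t_statistic(x_bar, mu0, s, n):
    """One-sample t statistic for H0: mu = mu0."""
    return (x_bar - mu0) / (s / sqrt(n))

t_candies = t_statistic(60.3, 55, 3.40, n=3076)  # n as entered in the report
t_bags = t_statistic(60.3, 55, 3.40, n=51)       # n = number of bags

# ASSUMPTION: 2.678 is the two-sided critical value from a t table
# for alpha = 0.01 and 50 degrees of freedom.
t_critical = 2.678

print(f"t with n = 3076: {t_candies:.2f}")
print(f"t with n = 51:   {t_bags:.2f}")
print("reject H0 with n = 51:", abs(t_bags) > t_critical)
```

In both cases the statistic is far beyond the critical value, so the decision to reject H0 does not depend on which sample size is used.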

Reflection: Confidence Intervals and Hypothesis Testing

There are three conditions for doing interval estimates and hypothesis tests. First, the data
must come from a random sample. Second, the distribution must be considered normal (if n ≥ 30,
it is approximately normal). If n < 30, we can check this by drawing a boxplot and seeing
that there are no outliers. Lastly, the data must be independent, n ≤ 0.05N. Our samples met all
three of these conditions. The bags that the students bought could have come from anywhere,
there are more than 30 bags of Skittles in the data, and the data are independent because the
number of Skittles we collected is less than 5% of all the Skittles in America.
Errors could have occurred while gathering this data. For example, we are all just college
students, so I would bet that some of the students did not actually go out and buy a bag of
Skittles and instead faked their data. To reduce this source of error, we could have brought our
bags of Skittles to class and counted them there, so we would know that everyone actually
bought a bag.
After researching and gathering all of the data, I came to the conclusion that, in general,
the colors of Skittles are pretty evenly distributed. The frequency graph showed that a majority
of the students had close to the same number of Skittles in their bags. Also, bags generally
contained more red candies than any other color.
