
Angelica Langford

Skittles Term Project

Question: How many Skittles are in each bag of Skittles in the world?

In this project we show the benefit of using Excel to create charts and tables for statistics. Our class has 25 people, and each of us bought a bag of Skittles. We counted the candies of each color and used the class data to make a pie chart, a Pareto chart, a five-number summary, a boxplot, and a histogram. We end with several confidence intervals and hypothesis tests.

Data for the class

Number of candies of each color in each bag

 #  Name        Red  Orange  Yellow  Green  Purple  Total
 1  Karen        10      14       8     10      18     60
 2  Shalon       19      11       9      9      12     60
 3  Samuel       11      14      10     15      10     60
 4  Leslie       15      17       7     15       8     62
 5  Dialma       14       5       9     18      15     61
 6  Maria        13      11       8      5      21     58
 7  Margrethe    10       9      19     11      12     61
 8  Haley        14      19      12      8       9     62
 9  Rupa         12      11      15      9       8     55
10  Heather      10      16      12     11      10     59
11  Allie        13       8      12     12      13     58
12  Brad         19      14       5     13       9     60
13  Bridgette    13      12      18     11       7     61
14  Milene       10       9      12     13      18     62
15  Jameson       6      14      11     17      14     62
16  Jessica       9      19      11      9      15     63
17  Marie         9      11      18     16       5     59
18  Emmoly       10      13      16     11      10     60
19  Cole         12      12       8     14      15     61
20  Angelica      8      13      13     15      13     62
21  Jessica      10      12       9     11      20     62
22  Eli          14      10      18      5      15     62
23  Dallin        5       9      20     11      15     60
24  Adam         12      15       7     14      14     62
25  Nate         14       9      12     14      11     60
    TOTAL       292     307     299    297     317   1512

Data for me alone

20 Angelica: 8 red, 13 orange, 13 yellow, 15 green, 13 purple, 62 total

Pie Charts

[Pie chart: Class Skittles Data, with one slice each for red, green, orange, purple, and yellow candies]

Pareto Charts

[Pareto chart: Class Skittles Data, number of candies of each color, vertical axis from 0 to 350]

[Bar chart: Class Skittles Data for Every Student, number of red, green, orange, purple, and yellow candies for students 1 to 25, vertical axis from 0 to 25]
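
The ordering behind a Pareto chart can be sketched in a few lines of Python. This is only a rough sketch and not the Excel workflow used for the charts in this project. The color totals come from the TOTAL row of the class data, and the variable names are made up for illustration.

```python
# Color totals taken from the TOTAL row of the class data table.
totals = {"red": 292, "orange": 307, "yellow": 299, "green": 297, "purple": 317}

grand_total = sum(totals.values())

# A Pareto chart orders the categories from most to least frequent
# and tracks the running (cumulative) share of the total.
pareto = sorted(totals.items(), key=lambda item: item[1], reverse=True)

cumulative = 0
for color, count in pareto:
    cumulative += count
    share = count / grand_total
    running = cumulative / grand_total
    print(f"{color:<7}{count:>5}{share:>8.1%}{running:>8.1%}")
```

The sorted order is what separates a Pareto chart from an ordinary bar chart of the same counts.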

Organizing and Displaying Quantitative Data: the Number of Candies per Bag

Mean: 60.48
Std. Dev: 1.76

5 NUMBER SUMMARY
Min: 55
Q1: 60
Median: 61
Q3: 62
Maximum: 63

[Box and whisker plot of the five-number summary]

[3D histogram of the five-number summary: minimum, Q1, median, Q3, maximum; vertical axis from 50 to 64]
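
The summary statistics can be recomputed from the Total column of the class data. The sketch below uses Python's standard statistics module rather than Excel, so it is a cross-check under that assumption and not the method used in class. The list name bag_totals is made up for illustration. For this data set the "inclusive" and "exclusive" quartile methods give the same quartiles, so the choice of method does not matter here.

```python
import statistics

# Total candies per bag, in the order the 25 bags appear in the class table.
bag_totals = [60, 60, 60, 62, 61, 58, 61, 62, 55, 59, 58, 60, 61,
              62, 62, 63, 59, 60, 61, 62, 62, 62, 60, 62, 60]

mean = statistics.mean(bag_totals)
s = statistics.stdev(bag_totals)  # sample standard deviation (divides by n - 1)

# Quartiles by linear interpolation between sorted values.
q1, median, q3 = statistics.quantiles(bag_totals, n=4, method="inclusive")

five_number = (min(bag_totals), q1, median, q3, max(bag_totals))

print(f"n = {len(bag_totals)}, mean = {mean:.2f}, s = {s:.2f}")
print("min, Q1, median, Q3, max =", five_number)
```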

Graph Questions:

What is the shape of the distribution? The shape is approximately normal.

Do the graphs reflect what you expected to see? Yes, I expected to see a normal distribution, with everyone having about the same number of candies in their bags.

Does the overall data collected by the whole class agree with your own data from a single bag of candies? Yes. The class data was quite similar to mine.

In addition to the summary statistics and boxplot, include the number of candies from your own bag and the total number of bags in the sample. I had 62 candies in my bag, and there were a total of 25 bags in the sample of the class.

Explain the difference between categorical and quantitative data. Categorical data is not based on measurements or numbers but on categories. Quantitative data is based on numbers and things that can be calculated with numbers.

What types of graphs make sense and what types of graphs do not make sense for categorical data? It would make sense to make a graph of the hair colors in your school. It would not make sense to make a graph of the number of lockers in your school.

For quantitative data? It would make sense to make a graph of the number of cars on your street. It would not make sense to make a graph of the colors of all of the cars on your street.

Explain why. Categorical data is more about categories and colors, etc. whereas quantitative data is more about calculations and numbers.

What types of calculations make sense and what types of calculations do not make sense for categorical data? It would make sense to calculate how many races there are in America, to put them into columns. It would not make sense to calculate how many times a day people die.

For quantitative data? It would make sense to calculate the number of weeks in a seventy year period. It would not make sense to calculate the holidays and what days they land on.

Explain why. Because categorical data relies more on categories, not calculations of numbers, and quantitative data is the opposite.

Confidence Interval Estimates

Statisticians use a confidence interval to describe the amount of uncertainty associated with a sample estimate of a population parameter. Basically, it tells you how confident you can be that a sample estimate is close to the population value. We are trying to figure out whether the sample of Skittles we got looks like the population.

Construct a 95% confidence interval estimate for the true proportion of purple candies.

n = 1512 = sample size

p^ = total # purple candies / total # candies = 317/1512 = .21

Confidence level: 95%

α = .05

z_α/2 = 1.96 (z score associated with an area of .025 in each tail)

Estimating population parameter: (p^ - E < p < p^ + E)

E = margin of error = z_α/2 * sqrt(p^(1-p^)/n) = .021

Population parameter: (.189 < p < .230)

This means that we are 95% confident that the population proportion of purple skittles falls between these two numbers.
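
The proportion interval can be checked with a short Python sketch. It assumes the normal approximation used in the formula above and takes the purple count and sample size from the class totals. The variable names are made up for illustration, and the critical value comes from the standard library instead of a z table.

```python
import math
from statistics import NormalDist

purple = 317        # total purple candies in the class sample
n = 1512            # total candies in the class sample
confidence = 0.95

p_hat = purple / n
alpha = 1 - confidence

# Two-tailed critical value z_(alpha/2); about 1.96 for 95% confidence.
z = NormalDist().inv_cdf(1 - alpha / 2)

# Margin of error E = z * sqrt(p_hat * (1 - p_hat) / n)
margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
lower, upper = p_hat - margin, p_hat + margin

print(f"p_hat = {p_hat:.3f}, E = {margin:.3f}")
print(f"{lower:.3f} < p < {upper:.3f}")
```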

Construct a 99% confidence interval estimate for the true mean number of candies per bag.

n = 25

Mean = 60.48

Confidence level: 99%

α = .01

Estimating population parameter: (mean - E < μ < mean + E)

E = margin of error = t_α/2 * s/sqrt(n) = .985 (s = std. dev)

Population parameter: (59.5 < μ < 61.5)

This means that we are 99% confident that the population mean for the number of Skittles in a bag falls between these two numbers.
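
The same arithmetic can be sketched in Python. The sample size, mean, and standard deviation are the values used in this project. The critical value 2.797 is an assumption on my part: it is the usual t-table entry for 24 degrees of freedom with .005 in each tail, typed in by hand because the standard library has no t distribution.

```python
import math

# Summary values for the class sample.
n = 25
mean = 60.48
s = 1.76  # sample standard deviation

# Assumption: t_(alpha/2) for df = 24 and alpha = .01, read from a t table.
t_crit = 2.797

# Margin of error E = t_(alpha/2) * s / sqrt(n)
margin = t_crit * s / math.sqrt(n)
lower, upper = mean - margin, mean + margin

print(f"E = {margin:.3f}")
print(f"{lower:.1f} < mu < {upper:.1f}")
```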

Construct a 98% confidence interval estimate for the standard deviation of the number of candies per bag.

DF = 24

n = 25

s = 1.76

Confidence level: 98%

α = .02

α/2 = .01

Estimating population parameter: sqrt((n-1)s^2 / χ^2_R) < σ < sqrt((n-1)s^2 / χ^2_L)

χ^2_R = 42.980

χ^2_L = 10.856

Population parameter: Using the equation above, I plugged in the numbers to come up with this confidence interval:

1.315 < σ < 2.617

This means that we are 98% confident that the population standard deviation of candies per bag falls between these two numbers.
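
The interval formula for a standard deviation can be checked numerically. This sketch plugs in the sample size, the sample standard deviation, and the two chi-square critical values listed above (42.980 and 10.856 for 24 degrees of freedom). The critical values are taken as given and not computed, and the variable names are made up for illustration.

```python
import math

n = 25
s = 1.76              # sample standard deviation
chi2_right = 42.980   # right-tail critical value, df = 24, area .01
chi2_left = 10.856    # left-tail critical value, df = 24, area .01

df = n - 1

# sqrt((n-1)s^2 / chi2_right) < sigma < sqrt((n-1)s^2 / chi2_left)
lower = math.sqrt(df * s**2 / chi2_right)
upper = math.sqrt(df * s**2 / chi2_left)

print(f"{lower:.3f} < sigma < {upper:.3f}")
```

A quick sanity check on any interval of this kind is that the sample standard deviation s should fall inside it.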

Hypothesis Tests

Explain in general the purpose and meaning of a hypothesis test. A hypothesis is a claim or statement about a property of a population. A hypothesis test is a procedure for testing a claim about a property of a population.

Use a 0.01 significance level to test the claim that 20% of all Skittles candies are green.

α = .01

α/2 = .005

n = 1512

p^ = # of green candies / total # of candies = 297/1512 = .1964

Null Hypothesis: H0: p = .20

Alternative Hypothesis: H1: p ≠ .20

This will be a two-tailed test.

z = (p^ - p)/sqrt(p(1-p)/n) = (.1964 - .20)/sqrt(.20(1-.20)/1512) = -.350

Critical z values: ±2.575

[Graph: standard normal curve with an area of .005 in each tail beyond the critical values -2.575 and +2.575; test statistic z = -.350]

The graph shows that our test statistic is nowhere near the critical values, so it does not fall in the rejection region. Fail to reject H0.

Conclusion: There is not sufficient evidence to reject the claim that 20% of all Skittles candies are green.
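
The green-candy test can be sketched in Python as a cross-check. The counts come from the class totals, and the variable names are made up for illustration. Two small differences from the hand calculation are expected: the code keeps p^ unrounded, so z comes out near -0.347 instead of -.350, and the computed critical value is about 2.576 where the table value is 2.575. The p-value line is an extra that is not part of the critical-value method used above.

```python
import math
from statistics import NormalDist

green = 297     # total green candies in the class sample
n = 1512        # total candies in the class sample
p0 = 0.20       # claimed proportion under H0
alpha = 0.01

p_hat = green / n

# Test statistic z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)

# Two-tailed critical value and p-value from the standard normal curve.
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
p_value = 2 * NormalDist().cdf(-abs(z))

reject = abs(z) > z_crit
print(f"z = {z:.3f}, critical values = +/-{z_crit:.3f}")
print("Reject H0" if reject else "Fail to reject H0")
```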

Use a 0.05 significance level to test the claim that the mean number of candies in a bag of Skittles is 56.

DF = 24

n = 25

mean = 60.5

s = 1.76

μ = 56

α = .05

Null Hypothesis: H0: μ = 56

Alternative Hypothesis: H1: μ ≠ 56

t = (mean - μ)/(s/sqrt(n)) = (60.5 - 56)/(1.76/sqrt(25)) = 12.78 (already very far from the claimed mean)

Critical t values: ±2.064

[Graph: t distribution with critical values -2.064 and +2.064; test statistic t = 12.78]

According to our graph, our test statistic is outside of our critical range, therefore there is sufficient evidence to reject the hypothesis that the mean number of candies in a Skittles bag is 56. Reject H0.
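
The test statistic for the mean can be reproduced with a few lines of Python. The inputs are the values listed for this test, including the critical value 2.064, which is taken as given from the t table and not computed. The variable names are made up for illustration.

```python
import math

n = 25
mean = 60.5     # sample mean, rounded as in the hand calculation
s = 1.76        # sample standard deviation
mu0 = 56        # claimed population mean under H0
t_crit = 2.064  # two-tailed critical value for df = 24, alpha = .05 (from the t table)

# Test statistic t = (mean - mu0) / (s / sqrt(n))
t = (mean - mu0) / (s / math.sqrt(n))

reject = abs(t) > t_crit
print(f"t = {t:.2f}, critical values = +/-{t_crit}")
print("Reject H0" if reject else "Fail to reject H0")
```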

Reflection

State the conditions for doing interval estimates and hypothesis tests for population proportions and discuss whether or not your samples met these conditions.

- Interval estimates: To design one, you must describe the amount of uncertainty associated with a sample estimate of a population proportion parameter. You have to choose the formula (p^ - E< p < p^ + E). My sample met these conditions.

- Hypothesis tests: You must have a claim about a population proportion that you want to test. You have three formulas to choose from, and you choose one. My sample met these conditions.

State the conditions for doing interval estimates and hypothesis tests for population means and discuss whether or not your samples met these conditions.

- Interval estimates: To design one, you must describe the amount of uncertainty associated with a sample estimate of a population mean parameter. You have to choose the formula (mean - E < μ < mean + E). My sample met these conditions.

- Hypothesis tests: You must have a claim about a population mean that you want to test. You have three formulas to choose from, and you choose one. My sample met these conditions.

State the conditions for doing interval estimates for population standard deviations and discuss whether or not your samples met these conditions.

- Interval estimates: To design one, you must describe the amount of uncertainty associated with a sample estimate of a population standard deviation parameter. You have to choose the formula (sqrt((n-1)s^2 / χ^2_R) < σ < sqrt((n-1)s^2 / χ^2_L)). My sample met these conditions.

- Hypothesis tests: You must have a claim about a population standard deviation that you want to test. You have three formulas to choose from, and you choose one. My sample met these conditions.

What possible errors could have been made by using this data?

Our class could have had bags of Skittles that were above or below average. Our sample of 25 bags is pretty small for drawing conclusions about the population. Counting or calculation errors could also have been made in the data, and not counting the broken pieces of Skittles at the bottoms of the bags could make a slight difference.

How could the sampling method be improved? By obtaining a larger sample, we could do the same statistical experiment and get more accurate estimates of the population.

State what conclusions you have drawn from your statistical research. I have concluded that our data wasn't flawed, but the sample size is too small. The data we did collect was approximately normally distributed and had pretty average numbers of Skittles. The pie chart and Pareto chart show the numbers and colors of candies well. The confidence intervals estimate the population values from our sample, but with such a small sample, who's to know. The hypothesis testing worked great for testing the claims that were given. Overall this statistical research method is very important and applicable to everyday life.

PROJECT REFLECTION:

What have you learned as a result of this project? This project has helped me, while in my statistics course, to learn how to apply the concepts we have been learning to the real world.

Discuss how the math skills that you applied in this project will impact other classes you will take in your school career. The math skills I have applied in this class will help me in other classes that I take. Take entropy, for example: calculating entropy without Excel and without knowing how to make charts is miserable. But with the concepts and skills I have learned in this class, calculating entropy for my physics class was a breeze.

Identify specific parts of the project and your own process in completing the project that may have applications for other classes. This project can have an impact on a lot of other classes, especially other math classes, and it sharpens your thinking skills. Distributions appear in physics a ton.

Discuss how the project helped to develop your problem solving skills. This project helped me develop problem solving skills through learning to follow a pattern of thinking that leads to an answer, not just plugging numbers into a formula. It helps when there are real-world concepts to apply them to, like how many people drink alcohol in the state of Utah.

Discuss how this project changed the way you think about real-world math applications.If