
Wanicia Surur

Tyson Bowden
Jayne Schiess
Cierra Ribald
Alex Panas
George Macey

Math 1040 Skittles Term Project


Introduction (Wanicia)

Color                                Count

Red Skittles                         561
Orange Skittles                      555
Yellow Skittles                      589
Green Skittles                       607
Purple Skittles                      651

Total number of Skittles in class    2963


Each student was asked to buy their own bag of Skittles, so not every bag of Skittles in the
region had an equal chance of being selected. Even so, the distribution of bags from the central
plant or warehouse was most likely random. The Skittles company most likely loads the bags by
weight rather than counting colors. Assuming students did not make any biased decisions about
which bag to grab off the shelf, every bag produced had an equal chance of being shipped to any
location in the country and of being selected at random by a student in the class.

Skittles Color     Class Total   Class Proportion   My Total   My Proportion

Red Skittles       561           0.189              12         0.193
Orange Skittles    555           0.187              12         0.193
Yellow Skittles    589           0.198              12         0.193
Green Skittles     607           0.204              12         0.193
Purple Skittles    651           0.219              14         0.225

Total Skittles     2963                             62
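
The proportions in the table can be checked with a few lines of Python. This is only a sketch: the counts are copied from the table, while the names `class_counts` and `proportions` are made up for the example. The table's values appear to be cut off after three decimals rather than rounded, so a rounded result may differ in the last digit.

```python
# Class color counts copied from the table above.
class_counts = {
    "Red": 561,
    "Orange": 555,
    "Yellow": 589,
    "Green": 607,
    "Purple": 651,
}


def proportions(counts):
    """Return each color's share of the total count."""
    total = sum(counts.values())
    return {color: n / total for color, n in counts.items()}


class_total = sum(class_counts.values())
class_props = proportions(class_counts)

# Print one row per color, then the total.
for color, p in class_props.items():
    print(f"{color:<7} {class_counts[color]:>5} {p:.4f}")
print(f"{'Total':<7} {class_total:>5}")
```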

Categorical and quantitative are two types of variables used in statistics. Categorical data
is also known as qualitative data because it describes a quality, such as the color of a Skittle.
Quantitative data describes a quantity, such as the number of Skittles in each bag.
Since the type of data is different with each variable, different graphs are used to best
represent each type. Graphs that make categories easy to compare are best for categorical
data. A pie chart would be most effective for showing the percentage of each color of Skittle.
Bar graphs would also be good for this kind of data, and they work well for quantitative data
too. Stem-and-leaf plots, histograms, and dot plots are all effective graphs for quantitative
data because they show how the quantities are distributed.
Calculations for categorical and quantitative data are different because finding the mean of
something categorical is not meaningful. A good calculation for categorical data is finding
which color occurs most often, in other words the mode. The calculations best suited to
quantitative data are the mean and the median: the mean gives the average quantity and the
median gives the middle value of the data.
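
As a rough illustration of the difference, the sketch below finds the mode of the class color counts and the mean and median of a list of bag sizes. The color counts come from the table above. The `bag_sizes` list is hypothetical, invented only to show the calculation, because the individual bag counts are not listed in this report.

```python
from statistics import mean, median

# Categorical data: class color counts from the table above.
color_counts = {"Red": 561, "Orange": 555, "Yellow": 589, "Green": 607, "Purple": 651}

# The mode is the color with the largest count.
modal_color = max(color_counts, key=color_counts.get)

# Quantitative data: HYPOTHETICAL bag sizes, for illustration only.
bag_sizes = [55, 58, 59, 60, 60, 61, 62, 65]

average_size = mean(bag_sizes)   # average number of candies per bag
middle_size = median(bag_sizes)  # middle value of the sorted bag sizes

print("Modal color:", modal_color)
print("Mean bag size:", average_size)
print("Median bag size:", middle_size)
```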

Part 3 (Jayne)
Mean: 60.1
SD: 2.05
Min: 55
Q1: 59
Med: 60
Q3: 62
Max: 65
Number of bags = 49
Number of Skittles = 2947

[Figure: Number of candies per bag]


After looking over the data, the distribution is approximately normal; however, the box plot
makes it look slightly skewed to the right. This is what I would expect to see because there is
not a drastic change between the numbers. However, I would expect fewer bags to have a lot of
candies because the company is trying to save money by putting less candy in each bag. I had
55 candies in my one bag, so my bag is on the lower end of the graph and doesn't really agree
with the other 48 bags (49 bags total).
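
One way to judge how unusual the 55-candy bag is, sketched below, is to compare it with the summary statistics listed above. The 1.5 × IQR fence rule and the z-score are common conventions assumed here; they are not taken from the original analysis, and the variable names are made up for the example.

```python
# Summary statistics copied from the list above.
mean_bag, sd_bag = 60.1, 2.05
q1, q3 = 59, 62
my_bag = 55

# Assumed convention: values beyond 1.5 * IQR from the quartiles are outliers.
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr


def is_outlier(value):
    """Return True if the value falls outside the 1.5 * IQR fences."""
    return value < lower_fence or value > upper_fence


# Number of standard deviations between my bag and the class mean.
z_my_bag = (my_bag - mean_bag) / sd_bag

print("Fences:", lower_fence, upper_fence)
print("My bag is an outlier:", is_outlier(my_bag))
print("z-score of my bag:", round(z_my_bag, 2))
```

Under these assumptions the bag sits well below the mean but still inside the fences.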

Part 4 (Alex)

When we select a sample to represent a population, the sample will almost never
represent the population perfectly. The purpose of a confidence interval is to give a range
in which we are reasonably confident the population parameter lies. A 99% confidence level
means that 99% of all possible samples of the same size result in an interval that includes the
parameter, and 1% of the samples result in an interval that does not capture the parameter.
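
The meaning of the confidence level can be illustrated with a small simulation. The sketch below is not part of the original analysis: the true proportion of 0.20, the sample size of 500, and the 2000 repetitions are all hypothetical choices, and the function names are made up. It draws many samples, builds a 99% interval from each, and counts how often the interval captures the true proportion.

```python
import random
from math import sqrt
from statistics import NormalDist


def proportion_interval(successes, n, confidence):
    """Normal-approximation confidence interval for a proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p_hat = successes / n
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


def coverage(true_p, n, confidence, trials, seed=0):
    """Fraction of simulated samples whose interval contains true_p."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        # Simulate one sample of n candies and count the "successes".
        successes = sum(rng.random() < true_p for _ in range(n))
        low, high = proportion_interval(successes, n, confidence)
        hits += low <= true_p <= high
    return hits / trials


# HYPOTHETICAL settings, chosen only for illustration.
rate = coverage(true_p=0.20, n=500, confidence=0.99, trials=2000)
print("Share of intervals that captured the true proportion:", rate)
```

The printed share should land close to 0.99, though it will not match exactly because the simulation is random and the interval formula is an approximation.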

1. Construct a 99% confidence interval estimate for the true proportion of yellow candies.
n = 2947 (total # of candies in the sample)
x = 609 (total # of yellow candies in the sample)
α = 0.01
(Critical Value) z_α/2 = z_0.005 = 2.575

p̂ = x / n = 609 / 2947 = 0.2067

E = z_α/2 · √(p̂(1 − p̂) / n) = 2.575 · √(0.2067(1 − 0.2067) / 2947) = 0.0192

Using the TI-84: STAT → Test → 1PropZInt

Lower Bound = 0.1874
Upper Bound = 0.2259

2. Construct a 95% confidence interval estimate for the true mean number of candies per
bag
n = 50 (total # of bags in the sample)
s = 2.0716 (calculated with values of candies per bag)
Degrees of freedom = 49
(Critical Value) t_α/2 = t_0.025 = 2.009

x̄ = 2947 / 50 = 58.94

E = t_α/2 · s / √n = (2.009)(2.0716 / √50) = 0.5886

Lower Bound: 58.94 − 0.5886 = 58.3514
Upper Bound: 58.94 + 0.5886 = 59.5286

3. Construct a 98% confidence interval estimate for the standard deviation of the number
of candies per bag
n = 50
s = 2.0716
s² = 4.2915
α = 0.02
Degrees of freedom = 49

Lower Bound = √((50 − 1)(4.2915) / 76.154) = √2.7613 = 1.6617

Upper Bound = √((50 − 1)(4.2915) / 29.707) = √7.0786 = 2.6606

4. Discuss and interpret the results of each of your three interval estimates.
1. We are 99% confident that the population proportion of yellow Skittles lies between
0.1874 and 0.2259.
2. From the information gathered, we are 95% confident that the true value of the
population mean number of candies per bag is between 58.3514 and 59.5286.
3. We can say with 98% confidence that the true value of the population standard
deviation of the number of candies per bag is between 1.6617 and 2.6606.
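
The three interval estimates above can be reproduced with a short Python sketch. The sample values (n, x, s) are taken from the calculations above. The normal critical value is computed with the standard library, so it comes out near 2.5758 rather than the table value 2.575. The t and chi-square critical values are hard-coded from the numbers used above, because Python's standard library has no inverse for those distributions. All variable names are made up for the example.

```python
from math import sqrt
from statistics import NormalDist

# Sample values copied from the calculations above.
n_candies, n_yellow = 2947, 609
n_bags, s = 50, 2.0716

# 1. 99% interval for the proportion of yellow candies.
z_crit = NormalDist().inv_cdf(1 - 0.01 / 2)  # about 2.5758
p_hat = n_yellow / n_candies
e_prop = z_crit * sqrt(p_hat * (1 - p_hat) / n_candies)
prop_interval = (p_hat - e_prop, p_hat + e_prop)

# 2. 95% interval for the mean number of candies per bag.
t_crit = 2.009  # table value used above (df = 49)
x_bar = n_candies / n_bags
e_mean = t_crit * s / sqrt(n_bags)
mean_interval = (x_bar - e_mean, x_bar + e_mean)

# 3. 98% interval for the standard deviation of candies per bag.
chi2_right, chi2_left = 76.154, 29.707  # critical values used above
df = n_bags - 1
sd_interval = (
    sqrt(df * s**2 / chi2_right),
    sqrt(df * s**2 / chi2_left),
)

for label, (low, high) in [
    ("Proportion of yellow", prop_interval),
    ("Mean candies per bag", mean_interval),
    ("SD of candies per bag", sd_interval),
]:
    print(f"{label}: {low:.4f} to {high:.4f}")
```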

Part 5 (George)

Hypothesis Tests

A hypothesis test is a data analysis that tests a claim about the value of a population
proportion, a population mean, or a population standard deviation. The purpose of a hypothesis
test is to reach a conclusion about a claim. By evaluating two mutually exclusive statements
about a population, a hypothesis test aims to determine which statement is best supported by
the sample data. In other words, when we say a result is statistically significant, we mean the
sample data gave sufficient evidence to reject the null hypothesis.

Step 1: Specify the null hypothesis
Step 2: Specify the alternative hypothesis
Step 3: Set the significance level
Step 4: Calculate the test statistic
Step 5: Draw a conclusion about the claim

Use a 0.05 significance level to test the claim that 20% of all Skittles are red.
THE CLAIM: 20% of all Skittles are red (p = 0.20)
1. The null hypothesis is H₀: p = 0.20
2. The alternative hypothesis is H₁: p ≠ 0.20
3. The significance level is α = 0.05
The handwritten version is as follows:

Next: Use a 0.01 significance level to test the claim that the mean number of candies in a bag
of Skittles is 55.
THE CLAIM: the mean number of candies is 55 (μ = 55)
1. The null hypothesis is H₀: μ = 55
2. The alternative hypothesis is H₁: μ ≠ 55
3. The significance level is α = 0.01

The handwritten equation is as follows:


Hypothesis tests:

I chose the critical value method to conduct the hypothesis tests. I used a significance level
of 0.05 to test the null hypothesis that 20% of all Skittles candies are red. Since our test
statistic of -2.966 falls in the rejection (critical) region, below the critical value of -1.96,
there is sufficient evidence to warrant rejection of the claim (null hypothesis) that 20% of all
Skittles candies are red. I used a significance level of 0.01 to test the null hypothesis that the
mean number of candies in a bag of Skittles is 55. The critical values were -2.575 and 2.575.
The test statistic of 17.378 falls in the rejection region, above 2.575. Therefore, there is also
strong evidence to warrant rejection of the claim (null hypothesis) that the mean number of
candies in a bag of Skittles is 55.
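
The critical value method described above can be sketched in Python. The test statistics -2.966 and 17.378 and the significance levels are taken from the text. The function name `two_tailed_decision` is made up for the example. The sketch uses normal critical values for both tests, matching the ±1.96 and ±2.575 limits quoted above, and does not claim this is the only reasonable choice for the test about the mean.

```python
from statistics import NormalDist


def two_tailed_decision(test_statistic, alpha):
    """Return the positive critical value and whether H0 is rejected."""
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    # Reject H0 when the statistic lies beyond either critical value.
    reject = abs(test_statistic) > critical
    return critical, reject


# Claim: 20% of all Skittles are red, tested at the 0.05 level.
crit_red, reject_red = two_tailed_decision(-2.966, 0.05)

# Claim: the mean number of candies per bag is 55, tested at the 0.01 level.
crit_mean, reject_mean = two_tailed_decision(17.378, 0.01)

print(f"Red proportion: critical value ±{crit_red:.3f}, reject H0: {reject_red}")
print(f"Mean per bag:   critical value ±{crit_mean:.3f}, reject H0: {reject_mean}")
```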
The conditions needed for interval estimates include that the sample is a random sample and
that the population is normally distributed. Our samples met these requirements as our data
came from a subset of samples that were part of a larger set. The amount of information
collected, with our total n being 2947, allows our sample to be treated as normally distributed.
Errors that could have been made with this data are counting errors in how many candies were
in a bag or how many there were of a certain color. The sampling method could be improved by
increasing the sample size and by having every person participate. Another way to improve the
sampling method would be to buy bags from different areas of the world rather than all locally.
The conclusion I have drawn from this research is that the true mean number of candies in each
Skittles bag is close to the sample mean we found from our data.

Skittles Statistics Investigation

Hypothesis

In a bag of Skittles there are equal numbers of the five different colors (Green, Orange, Purple,
Yellow and Red). We aim to investigate this by carrying out a statistical analysis of a sample of
bags of Skittles.

Step 1 – Sampling

A random sample of six bags of Skittles has been selected.

Step 2 – Data Collection

We will collect the data for each individual bag and all of these will be combined to give pooled
data.

Step 3 – Analysis and Presentation of Data

1. Comment on the modal color for each bag and for the pooled data.
2. Use the pooled data to draw a Pie Chart for the different colors.
3. Work out the Experimental Probability for picking Green or NOT Green. Do this using first 10
trials and then 30 trials.
4. Work out the Theoretical Probability of choosing a Green Skittle from the pooled sample.
5. How many Skittles would you expect to be in a seventh bag if it was opened?
6. Use Expectation to work out how many Green Skittles you would expect to be in the seventh
bag.
7. Comment on whether the outcome predicted in part 6 would realistically occur.
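
The analysis steps above could be carried out along the lines of the sketch below. No bag data is given in this outline, so the six bags in `bags` are entirely hypothetical and exist only to show the calculations. The function and variable names are also made up for the example.

```python
import random
from collections import Counter

# HYPOTHETICAL color counts for six bags, for illustration only.
bags = [
    {"Green": 14, "Orange": 11, "Purple": 13, "Yellow": 10, "Red": 14},
    {"Green": 10, "Orange": 13, "Purple": 12, "Yellow": 12, "Red": 11},
    {"Green": 14, "Orange": 10, "Purple": 11, "Yellow": 13, "Red": 13},
    {"Green": 11, "Orange": 12, "Purple": 14, "Yellow": 11, "Red": 11},
    {"Green": 13, "Orange": 12, "Purple": 10, "Yellow": 14, "Red": 13},
    {"Green": 12, "Orange": 14, "Purple": 12, "Yellow": 12, "Red": 10},
]

# Pool the six bags into one set of counts.
pooled = sum((Counter(bag) for bag in bags), Counter())
pooled_total = sum(pooled.values())

# Modal color of the pooled data.
modal_color = max(pooled, key=pooled.get)

# Theoretical probability of Green from the pooled sample.
p_green = pooled["Green"] / pooled_total

# Expected size of a seventh bag and expected number of Green in it.
expected_bag_size = pooled_total / len(bags)
expected_green = p_green * expected_bag_size


def experimental_p_green(trials, seed=0):
    """Draw Skittles with replacement and return the share that are Green."""
    rng = random.Random(seed)
    population = list(pooled.elements())
    draws = rng.choices(population, k=trials)
    return draws.count("Green") / trials


print("Modal color:", modal_color)
print("Theoretical P(Green):", round(p_green, 3))
print("Expected bag size:", round(expected_bag_size, 1))
print("Expected Green in bag 7:", round(expected_green, 1))
print("Experimental P(Green), 10 trials:", experimental_p_green(10))
print("Experimental P(Green), 30 trials:", experimental_p_green(30))
```

The expected number of Green Skittles is unlikely to be a whole number, which bears on the question of whether the predicted outcome could realistically occur.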

Step 4 – Conclusions

Do you agree or disagree with the initial hypothesis? Give reasons for your answer.

Step 5 – Improvements and Extensions

How could the experiment be improved, or what extensions could you carry out?
