
Part 2

StatCrunch Row Numbers: 10, 20, 17

Number of Skittles by color:

                  Red   Orange   Yellow   Green   Purple   Total
Bag 1 (Row 10)     13     12       16       13       6       60
Bag 2 (Row 20)     18     13        7        7      17       62
Bag 3 (Row 17)     10     14       14       16       7       61
Sample Totals      41     39       37       36      30      183

The StatCrunch rows our group used are 10, 20, and 17. The count of each candy color in the three bags of Skittles is displayed in the table above.
To help students organize the collected data, our instructor created a spreadsheet listing each student's bag of Skittles and its count of each color. There are 23 rows (n = 23), one for each of the 23 students. To randomly pick three bags, my partner, Jose, printed the data collection the instructor provided on Canvas. Next, I went online to http://www.mathgoodies.com/calculators/random_no_custom.html to access a custom random number generator. I entered the lower limit (1) and the upper limit (23) and clicked ENTER. The numbers selected, in order, were 10, 20, and 17. As each number was picked, Jose read across the corresponding row of the data collection while I recorded the values on my paper. We verified that the numbers he read were in the same color order as the columns of my table. After I finished recording Row 17, I used a calculator to add up each row to get the total for each bag, and then added each column to get the sample total for each color. I then asked him to go home and recalculate using the numbers I wrote down to make sure the sums agreed with my table. We used a cluster sampling method: the class data (population) was gathered by student, with each student's bag representing a cluster. Because we randomly selected three clusters and used all of the data in each cluster, our project used cluster sampling. The step in which we chose the bags with a random number generator was simple random sampling of the clusters.
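The row selection described above can be sketched in a few lines of Python. This is only an illustration, not the generator on the website we used: the function name `pick_bags` and the choice to draw without replacement (so that no bag is picked twice) are assumptions.

```python
import random

def pick_bags(n_rows=23, k=3, seed=None):
    """Return k distinct row numbers drawn uniformly from 1..n_rows.

    Assumption: rows are drawn without replacement, so the same bag
    cannot be selected twice.
    """
    rng = random.Random(seed)
    return rng.sample(range(1, n_rows + 1), k)

rows = pick_bags()
print("Selected StatCrunch rows:", rows)
```

Each selected row is one cluster (one student's bag), and every candy in that bag is then included in the sample.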
The correct sample totals for each color are red 41, orange 39, yellow 37, green 36, and purple 30, and n (the size of the sample) is 183.
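The row and column sums can be double-checked with a short script. This is a minimal sketch: the per-bag counts below are an assumption, chosen so that every row adds up to the stated bag total and every column adds up to the stated color total, and the variable names are illustrative only.

```python
COLORS = ["red", "orange", "yellow", "green", "purple"]

# Assumed per-bag counts, keyed by StatCrunch row number.
# They are consistent with the stated bag totals (60, 62, 61)
# and color totals (41, 39, 37, 36, 30).
bags = {
    10: [13, 12, 16, 13, 6],
    20: [18, 13, 7, 7, 17],
    17: [10, 14, 14, 16, 7],
}

# Sum across each row to get the number of candies per bag.
bag_totals = {row: sum(counts) for row, counts in bags.items()}

# Sum down each column to get the sample total per color.
color_totals = {
    color: sum(counts[i] for counts in bags.values())
    for i, color in enumerate(COLORS)
}

n = sum(bag_totals.values())
print(bag_totals, color_totals, n)
```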
There are a few possible sources of error in these data. The first is the source of the bags of Skittles. Since all the students live and attend class in Utah, I can only conclude that these bags may be representative of Skittles sold in Utah. Moreover, since the students attend class at the Salt Lake Community College Taylorsville Campus, the bags used in the study may be consistent only regionally, say within 20 miles of Taylorsville, because all the nearby retailers may be receiving the same batch of Skittles. One way to control for this would be to require all students to buy their bag of Skittles from one specific retailer, for example the Walmart on 54th. Another method that may be better would be to divide the students into several groups, require each group to purchase its bags from a specific retailer, and include that information in the data collection. A second possible error could have occurred in our group during the transcribing process, specifically when Jose was reading numbers to me while I recorded them. A third possible error comes from using the online random number generator. These random number generators are carefully programmed, but the machine is always at the mercy of its programming. The results may be sufficiently complex to make the pattern difficult to identify, but because they are produced by a carefully defined and consistently repeated algorithm, the numbers are not truly random. In addition to all of these errors, there is also human error in the class data, such as a student accidentally dropping Skittles while counting.
I do believe my sample is representative of the class data set. The total number of Skittles in each bag is relatively close to the totals in the other sets of data. The number of Skittles of each color is relatively consistent when comparing the three bags to each other. There isn't a number that I have found surprising. At first glance, I feel our sample is representative of the class data set. Further evaluation of outliers can determine whether or not these numbers are considered usual.
Part 3
Candy colors in the Skittles Project are categorical data. I know this because categorical data deal with descriptions: they can be observed but not measured. In this case, the candies differ only in their color; there isn't a measurable difference between two Skittles of different colors. A candy is either red or not, which means color is measured on a nominal scale. Nominal scales are used for labeling variables, without any quantitative value.

It is not appropriate to discuss the shape of the distribution for candy color because color is categorical data. In a Pareto chart or pie chart, the order of the categories is arbitrary and the differences between categories have no numerical meaning, so looking at an overall shape of the distribution is not meaningful.
The number of candies per bag is quantitative data because it consists of numbers representing counts or measurements. Quantitative data are information about quantities: that is, information that can be measured and written down with numbers. The differences between those numbers have meaning.

[Box plot of number of candies per bag. Median = red line; mean = green line.]
5 Number Summary: 53, 58, 61, 61, 65
It is appropriate to discuss the shape of the distribution for the number of candies per bag. Since the data are quantitative, we can interpret the differences between the numbers. Looking at the box plot, we see that the distribution is not normal (symmetrical); it is skewed left. The mean (60) is below the median (61), and the tail extends toward the lower values, so the mass of the distribution is concentrated on the right of the figure. The outlier (a number that lies an abnormal distance from the other values in a random sample from a population), 53, is not a usual number of Skittles in a 2.17 ounce bag.
Part 4
Proportion of Each Color
Red Skittles: 291/1380
Orange Skittles: 301/1380
Yellow Skittles: 295/1380
Green Skittles: 243/1380
Purple Skittles: 250/1380
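As a quick check, the class proportions can be converted to decimals with a short Python sketch. The counts come from the list above; the variable names are illustrative.

```python
# Class totals for each color, taken from the proportions listed above.
counts = {"red": 291, "orange": 301, "yellow": 295, "green": 243, "purple": 250}

total = sum(counts.values())  # total candies in the class data

# Proportion of each color = color count / total count.
proportions = {color: k / total for color, k in counts.items()}

for color, p in proportions.items():
    print(f"{color:>6}: {counts[color]}/{total} = {p:.4f}")
```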


Summary statistics:

Column      n    Mean   Std. dev.   Min   Max   Q1   Q3   IQR   Mode   Median
Frequency   23   60     2.5584086   53    65    58   61   3     61     61

5 number summary: 53, 58, 61, 61, 65


Outlier Range
Lower fence = 58 - (1.5*3) = 53.5
Upper fence = 61 + (1.5*3) = 65.5
Outliers: 53
The total number of candies in my bag, 61, is not an outlier.
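The fence calculation above follows the usual 1.5 x IQR rule, which can be sketched as follows. The function names `outlier_fences` and `is_outlier` are illustrative.

```python
def outlier_fences(q1, q3, k=1.5):
    """Return (lower, upper) fences using the k * IQR rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Q1 = 58 and Q3 = 61 come from the summary statistics.
lower, upper = outlier_fences(58, 61)

def is_outlier(x):
    """A value is an outlier if it falls outside the fences."""
    return x < lower or x > upper

print(lower, upper)    # fences
print(is_outlier(53))  # minimum bag total in the class data
print(is_outlier(61))  # my own bag
```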
Part 5
The purpose of taking a random sample from a lot or population and computing a statistic, such as the mean, from the data is to approximate the corresponding value for the population. How well the sample statistic estimates the underlying population value is always an issue. A confidence interval addresses this issue by providing a range of values that is likely to contain the population parameter of interest.
Confidence intervals are constructed at a confidence level, such as 90 percent, selected by the user. This means that if the same population is sampled on numerous occasions and an interval estimate is made on each occasion, the resulting intervals would bracket the true population parameter in approximately 90 percent of the cases.
Interpretation:
A confidence interval was constructed to estimate the true proportion of Skittles that are yellow. Based on these calculations, we are 99% confident that the interval from 0.1856 to 0.2424 actually contains the true value of the population proportion (p). This means that if we were to randomly select many different samples of the same size (1380 candies) and construct the corresponding confidence intervals, about 99% of them would actually contain the true value of the population proportion p. The proportion of yellow candies in the single bag of candy that I purchased is a likely value for the true population proportion because it lies between the two interval limits (13 yellow candies out of 61 total candies is a proportion of .2131).
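The interval for the proportion of yellow candies can be reproduced approximately with the standard formula p-hat ± z * sqrt(p-hat * q-hat / n). This is a sketch, not the attached work: the critical value 2.576 is the usual table value for 99% confidence, and the function name is illustrative. The limits it produces agree with the reported interval to about three decimal places; the small remaining difference is presumably due to rounding p-hat before computing the margin of error.

```python
import math

def proportion_ci(x, n, z):
    """Confidence interval for a population proportion (normal approximation)."""
    p_hat = x / n
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

Z_99 = 2.576  # table value of z for 99% confidence (assumed, not from the source)

# 295 yellow candies out of 1380 in the class data.
low, high = proportion_ci(295, 1380, Z_99)
print(f"99% CI for p (yellow): {low:.4f} to {high:.4f}")

# Proportion of yellow candies in my own bag: 13 out of 61.
my_bag = 13 / 61
print(f"My bag: {my_bag:.4f}, inside interval: {low < my_bag < high}")
```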

A confidence interval was constructed to estimate the true mean number of Skittles per bag. Based on these calculations, I am 95% confident that the interval from 58.89 to 61.11 actually contains the true value of the mean number of candies per bag in the population (μ). This means that if we were to randomly select many different samples of the same size (23 bags of Skittles) and construct the corresponding confidence intervals, about 95% of them would actually contain the true value of the population mean μ. The bag of candy I purchased contains 61 Skittles, and that number lies within the two calculated interval limits; therefore it is a likely value for the population mean.
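The interval for the mean can be checked with the formula x-bar ± t * s / sqrt(n). In this sketch the critical value 2.074 is the usual table value of t for 95% confidence with 22 degrees of freedom (an assumption, since the source does not list it), and the function name is illustrative.

```python
import math

def mean_ci(mean, s, n, t_crit):
    """Confidence interval for a population mean when sigma is unknown."""
    margin = t_crit * s / math.sqrt(n)
    return mean - margin, mean + margin

T_95_DF22 = 2.074  # table value of t for 95% confidence, df = 22 (assumed)

# Sample mean, standard deviation, and size from the summary statistics.
low, high = mean_ci(60, 2.5584086, 23, T_95_DF22)
print(f"95% CI for mu: {low:.2f} to {high:.2f}")
```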

A confidence interval was constructed to estimate the true standard deviation of the number of Skittles per bag. Based on the results, I am 98% confident that the interval from 1.8905 to 3.8847 actually contains the true value of the standard deviation of the number of candies per bag in the population (σ). This means that if we were to randomly select many different samples of the same size (23 bags of Skittles) and construct the corresponding confidence intervals, about 98% of them would actually contain the true value of the population standard deviation σ. Based on my interval for the true standard deviation of the number of candies per bag (1.8905 < σ < 3.8847), it does appear that the manufacturing process does a consistent job of putting candies into 2.17 ounce bags. The standard deviation I computed from our class sample is 2.5584, and it is a possible value for the true standard deviation of the Skittles population.
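The interval for the standard deviation uses the chi-square distribution: sqrt((n-1)s²/χ²_R) < σ < sqrt((n-1)s²/χ²_L). In this sketch the two critical values are the usual table values for 98% confidence with 22 degrees of freedom (assumed here, since the source does not list them), and the function name is illustrative.

```python
import math

def sigma_ci(s, n, chi2_left, chi2_right):
    """Confidence interval for a population standard deviation."""
    ss = (n - 1) * s ** 2  # (n - 1) * sample variance
    return math.sqrt(ss / chi2_right), math.sqrt(ss / chi2_left)

# Table values for df = 22 and 98% confidence (assumed):
CHI2_LEFT = 9.542    # area 0.99 to the right
CHI2_RIGHT = 40.289  # area 0.01 to the right

low, high = sigma_ci(2.5584086, 23, CHI2_LEFT, CHI2_RIGHT)
print(f"98% CI for sigma: {low:.4f} to {high:.4f}")
```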

Part 6
Hypothesis testing refers to the formal procedures used in statistical analysis to reject or fail to reject statistical hypotheses. A statistical hypothesis is an assumption about a population parameter. This assumption may or may not be true. The usual process of hypothesis testing consists of several steps. A basic outline is as follows:
- Formulate the null hypothesis (H0) and the alternative hypothesis (H1).
- Identify a test statistic that can be used to assess the truth of the null hypothesis.
- Draw a graph that includes the test statistic, critical values, and critical region (if using the critical value method).
- Reject the null hypothesis (H0) if the test statistic is in the critical region. Fail to reject the null hypothesis if the test statistic is not in the critical region.
- Restate this decision in simple, non-technical terms, and address the original claim.

1. Hypothesis test of the claim that 20% of all Skittles candies are red (see attached work):
Conclusion: Since the test statistic, 1.0123, is not in the rejection region, we fail to reject the null hypothesis. There is not sufficient evidence to warrant rejection of the claim that 20% of Skittles candies are red.
2. Hypothesis test of the claim that the mean number of candies in a bag of Skittles is more than 55:
Conclusion: The critical value is 2.508 and the test statistic is 9.3727. Since the test statistic is greater than the critical value of 2.508, it falls in the critical region of this right-tailed test, and we reject the null hypothesis. Because we reject the null hypothesis, there is sufficient evidence to support the claim that the mean number of candies in a bag of Skittles is greater than 55.
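Both test statistics can be recomputed and compared with their critical values in a short sketch. Several things here are assumptions rather than the attached work: the function names, the two-tailed critical value 1.96 for the proportion test (the significance level for that test is not stated in this report, so 0.05 is assumed), and the use of the unrounded sample proportion, which gives z of about 1.01 rather than exactly 1.0123.

```python
import math

def z_stat_proportion(x, n, p0):
    """Test statistic for a claim about a population proportion."""
    p_hat = x / n
    return (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)

def t_stat_mean(mean, s, n, mu0):
    """Test statistic for a claim about a population mean (sigma unknown)."""
    return (mean - mu0) / (s / math.sqrt(n))

# Test 1: H0: p = 0.20 vs H1: p != 0.20 (two-tailed).
Z_CRIT = 1.96  # assumed: alpha = 0.05, two-tailed
z = z_stat_proportion(291, 1380, 0.20)
reject_1 = abs(z) > Z_CRIT

# Test 2: H0: mu = 55 vs H1: mu > 55 (right-tailed).
T_CRIT = 2.508  # critical value reported in the text (df = 22)
t = t_stat_mean(60, 2.5584086, 23, 55)
reject_2 = t > T_CRIT

print(f"z = {z:.4f}, reject H0: {reject_1}")
print(f"t = {t:.4f}, reject H0: {reject_2}")
```

A test statistic is in the critical region of a right-tailed test only when it exceeds the critical value, which is the comparison made for the second test.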
The requirements for each hypothesis test:
(1) Hypothesis test requirements for a claim about a population proportion p:
- The sample observations are a simple random sample.
- The conditions for a binomial distribution are satisfied. (There is a fixed number of independent trials having constant probabilities, and each trial has two outcome categories of success and failure.)
- The conditions np is greater than or equal to 5 and nq is greater than or equal to 5 are both satisfied.
(2) Hypothesis test requirements for a claim about a population mean (with sigma not known):
- The sample is a simple random sample.
- Either or both of these conditions are satisfied: the population is normally distributed or n is greater than 30.

For the proportion, our sample did not meet the requirement of a simple random sample. Each student in the class was asked to go to a store and purchase a bag of Skittles, so the bags were not randomly chosen but were bought wherever students could find them (out of convenience rather than by using a random generator, etc.), and all the candies in each bag (cluster) were then added to the class data. There was a fixed number of independent trials, since the color of one candy didn't affect the color of the next candy. In this hypothesis test, each candy was either red or not red, so each trial had only two outcomes. The conditions np (1380*.20 = 276) and nq (1380*.80 = 1104) are both greater than 5, which satisfies the last condition.
For mu, we used a cluster sampling method: the class data (population) was gathered by student, with each student's bag representing a cluster. Because we randomly selected three clusters and used all of the data in each cluster, our project used cluster sampling. The step in which we chose the bags with a random number generator was simple random sampling of the clusters. Judging by the graph from Part 3, the population appears to be approximately normally distributed.

Part 7
I do not believe the height of the person who purchased a bag of candies has any relationship to the bag other than that he or she bought it. Height cannot be used to predict the number of candies in a bag of Skittles because the two variables are independent of each other.
The explanatory variable is the height of the person who purchased the bag of candies. The response variable is the number of candies in the bag of Skittles.
r = -.192
There is not a significant relationship between the two variables: the critical value is .396, and |-.192| = .192 is not greater than the critical value.
Regression equation: y = -.14x + 69.5
y = -.14(63.5) + 69.5 = 60.61
It is not appropriate to use height to predict the number of candies per bag because the two variables show no significant relationship.
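The significance check and the regression calculation above can be sketched as follows. The function names are illustrative, and the fallback to the sample mean (60, from the summary statistics) when the correlation is not significant is the usual textbook convention, stated here as an assumption.

```python
def is_significant(r, r_crit):
    """Linear correlation is significant when |r| exceeds the critical value."""
    return abs(r) > r_crit

def predict(x, slope=-0.14, intercept=69.5):
    """Predicted number of candies from the regression equation y = -.14x + 69.5."""
    return slope * x + intercept

r, r_crit = -0.192, 0.396
significant = is_significant(r, r_crit)

y_hat = predict(63.5)  # value from the regression line
Y_MEAN = 60            # sample mean number of candies per bag

# Assumed convention: without a significant correlation, the best
# predicted value is simply the sample mean of the response variable.
best_prediction = y_hat if significant else Y_MEAN

print(significant, round(y_hat, 2), best_prediction)
```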

Even assuming there were a significant relationship between height and number of candies per bag, it would not be appropriate to use Yao Ming's height to predict the number of candies per bag because he is considered an outlier: his height lies far outside the range of heights in our sample.

Part 8