
# Group Project #1: Skittles

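Individual data

| Color | Count | Proportion |
|---|---|---|
| Red | 10 | .169 |
| Orange | 8 | .136 |
| Yellow | 13 | .220 |
| Green | 10 | .169 |
| Purple | 18 | .305 |

Class data

| Color | Count | Proportion |
|---|---|---|
| Red | 242 | .202 |
| Orange | 234 | .196 |
| Yellow | 252 | .211 |
| Green | 240 | .201 |
| Purple | 228 | .191 |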

This graph is a multiple bar graph showing the proportion of each color of Skittles. Blue bars show the class proportions and red bars show my individual proportions. The colors are arranged in descending order of my individual proportions. The y-axis shows proportions and the x-axis shows colors. Multiple bar graphs are used to compare two or more data sets. Placing the class and individual proportions side by side makes it easier to compare and contrast the data.

Comparing my individual data with the class data shows that the colors are close to an even distribution. The exception is my individual purple, which has a noticeably higher proportion than the rest of my colors. I expected some variation, but I didn't expect any of my colors to differ that much from the class proportions. My individual purple proportion was .305, while the class proportion was .191, a difference of about .114. The other color that varied noticeably from the class total is orange: my individual bag had a proportion of .136, while the class proportion is .196. If some of the purple Skittles in my bag had been orange, my proportions would have been a fairly close match to the class proportions.
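
The proportions in both tables, and the gaps discussed above, can be recomputed directly from the counts. The sketch below is a minimal Python check that uses only the counts listed in the individual and class tables. The helper names `proportions` and `differences` are illustrative and are not part of the original project.

```python
# Counts copied from the individual and class tables above.
individual = {"Red": 10, "Orange": 8, "Yellow": 13, "Green": 10, "Purple": 18}
class_data = {"Red": 242, "Orange": 234, "Yellow": 252, "Green": 240, "Purple": 228}


def proportions(counts):
    """Illustrative helper: each color's share of the total, rounded to 3 decimals."""
    total = sum(counts.values())
    return {color: round(n / total, 3) for color, n in counts.items()}


def differences(mine, everyone):
    """Illustrative helper: individual proportion minus class proportion per color."""
    p_mine, p_all = proportions(mine), proportions(everyone)
    return {color: round(p_mine[color] - p_all[color], 3) for color in mine}


for color, diff in differences(individual, class_data).items():
    print(f"{color:<7}{diff:+.3f}")
```

Run as written, it prints the individual-minus-class difference for each color. Purple (+0.114) and orange (-0.060) are the largest gaps, which matches the discussion above.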

The class data consist of all the Skittles in the class's bags combined, 1,196 candies in total. This is our sample, and the population is all Skittles candies of this type. We treat the class data as a random sample because each color of Skittle has an equal chance of being selected for each bag. We created a pie chart and a Pareto chart. We found the Pareto chart more effective at showing the data totals, because it does a better job of showing the relative sizes of the different components. Our charts show that yellow is the most frequent color and purple is the least frequent, but the colors are fairly evenly distributed in the class totals.
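
A Pareto chart is a bar graph with the categories sorted from most to least frequent, and it is often drawn with a cumulative-percentage line. The sketch below takes the class counts from the table above and produces the bar order and cumulative percentages such a chart would display. The helper `pareto_table` is a hypothetical name used only for illustration.

```python
# Class counts copied from the class data table.
class_data = {"Red": 242, "Orange": 234, "Yellow": 252, "Green": 240, "Purple": 228}


def pareto_table(counts):
    """Illustrative helper: sort categories by count (largest first) and attach
    the cumulative percentage of the total reached after each category."""
    total = sum(counts.values())
    rows, running = [], 0
    for color, n in sorted(counts.items(), key=lambda item: item[1], reverse=True):
        running += n
        rows.append((color, n, round(100 * running / total, 1)))
    return rows


for color, n, cumulative in pareto_table(class_data):
    print(f"{color:<7}{n:>5}{cumulative:>7.1f}%")
```

The resulting order is yellow, red, green, orange, purple. This agrees with yellow being the most frequent color and purple the least.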

- Mean number of candies per bag: 59.8
- Standard deviation of the number of candies per bag: 3.33
- 5-number summary (minimum, Q1, median, Q3, maximum): 50, 59, 60, 61, 65
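
These summary statistics can be reproduced from the 20 per-bag counts. The individual bag counts are not listed in this report, so the sketch below only defines the calculation. The function names are hypothetical. The quartile rule is also an assumption: it takes the median of each half and leaves out the middle value when the count is odd, but calculators and software do not all compute quartiles the same way.

```python
import statistics


def five_number_summary(values):
    """Hypothetical helper: (min, Q1, median, Q3, max).

    Assumed quartile rule: Q1 and Q3 are the medians of the lower and upper
    halves; with an odd count the overall median is excluded from both halves.
    """
    data = sorted(values)
    n = len(data)
    lower = data[: n // 2]
    upper = data[(n + 1) // 2:]
    return (data[0], statistics.median(lower), statistics.median(data),
            statistics.median(upper), data[-1])


def describe(bag_counts):
    """Hypothetical helper: mean, sample standard deviation (n - 1 divisor),
    and five-number summary of the candies-per-bag counts."""
    return (statistics.mean(bag_counts),
            statistics.stdev(bag_counts),
            five_number_summary(bag_counts))
```

Calling `describe` on the class's 20 bag counts should give a mean of 59.8 and a sample standard deviation of about 3.33. The quartiles will match 59 and 61 only if the same quartile convention was used.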

The graphs have a roughly bell shape: the frequencies start low, rise to a peak, and then decrease again. The graphs matched my expectations. I believed the bags would not all contain the same number of Skittles, since each bag holds a somewhat random amount, but I expected the counts to be similar. My personal bag contained 59 Skittles, which equals the first quartile (Q1) of the 20 class bags combined.

Categorical data can be sorted into a countable number of categories or groups, which may or may not have a logical order. With categorical data, items are grouped according to some common property and the number of members in each group is recorded. Examples include vehicle type, letter grades, and sex (male/female). Quantitative data consist of numerical measurements or counts that can be ordered and measured, such as length, weight, and height. When graphed, their values are often grouped into intervals.

Bar graphs are frequently used with categorical data to compare the sizes of categories. For example, for political affiliation (Democrat, Republican, or Liberal), the height of each bar would show the count for that category. Another graph associated with categorical data is the pie chart, which shows each category's percentage as a slice of the whole. An example is a class's grades, where the percentages of A's, B's, C's, and D's together represent the whole class.

Stemplots are graphs associated with quantitative data. They display the shape of a distribution and organize the numbers to make them easier to understand. They keep the actual numerical values of the observations, with each value split into two parts, a stem and a leaf. Histograms are also useful with quantitative data. Like stemplots, histograms show the shape of the distribution of the observations. Both can show whether the data are bell-shaped, right-skewed, left-skewed, or none of these, and both are useful for identifying outliers. Boxplots are also graphs for quantitative data. They display the 5-number summary, which includes the minimum, first quartile, median, third quartile, and maximum.
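
Many programs use Tukey's rule to flag outliers on a boxplot: values below Q1 − 1.5 × IQR or above Q3 + 1.5 × IQR are plotted as individual points. This report does not say which rule its boxplot used, so the sketch below assumes that convention and applies it to the class five-number summary of candies per bag (50, 59, 60, 61, 65). `tukey_fences` is an illustrative helper name.

```python
def tukey_fences(q1, q3, k=1.5):
    """Illustrative helper: lower and upper outlier fences, Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Five-number summary of candies per bag reported earlier in this project.
minimum, q1, median, q3, maximum = 50, 59, 60, 61, 65
low_fence, high_fence = tukey_fences(q1, q3)
print(low_fence, high_fence)                      # 56.0 64.0
print(minimum < low_fence, maximum > high_fence)  # True True
```

With IQR = 61 − 59 = 2, the fences fall at 56 and 64. Under this rule, the smallest bag (50 candies) and the largest bag (65 candies) would both be marked as outliers.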

1. Construct a 99% confidence interval estimate for the population proportion of yellow candies.

$$n = 1196,\quad x = 252,\quad \hat{p} = \frac{x}{n} = 0.211,\quad \hat{q} = 1 - \hat{p} = 0.789,\quad z_{\alpha/2} = 2.575$$

$$E = z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}} = 0.0304$$

$$\hat{p} - E < p < \hat{p} + E \quad\Longrightarrow\quad 0.1806 < p < 0.2414$$

We are 99% confident that the interval from .1806 to .2414 contains the true population proportion of yellow candies.
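
The hand calculation above can be checked with a short script. This is a minimal sketch of the one-proportion z interval, p̂ ± z_{α/2} · √(p̂q̂/n), using the counts and the table value z_{α/2} = 2.575 from the report. `proportion_ci` is an illustrative helper name.

```python
import math


def proportion_ci(x, n, z):
    """Illustrative helper: margin of error E and the interval p_hat - E < p < p_hat + E."""
    p_hat = x / n
    q_hat = 1 - p_hat
    e = z * math.sqrt(p_hat * q_hat / n)
    return e, p_hat - e, p_hat + e


# Yellow candies in the class data, with the table value z_{alpha/2} = 2.575.
e, lower, upper = proportion_ci(252, 1196, 2.575)
print(f"E = {e:.4f}, {lower:.4f} < p < {upper:.4f}")
```

It prints `E = 0.0304, 0.1803 < p < 0.2411`. The fourth decimal differs slightly from .1806 and .2414 because the script keeps p̂ = 252/1196 unrounded instead of rounding it to .211. If a table value is not wanted, `statistics.NormalDist().inv_cdf(0.995)` gives the unrounded critical value, about 2.5758.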

2. Construct a 95% confidence interval estimate for the population mean number of candies per bag.

$$n = 20,\quad \bar{x} = 59.8,\quad s = 3.33,\quad t_{\alpha/2} = 2.093 \ (19 \text{ degrees of freedom})$$

$$E = t_{\alpha/2}\,\frac{s}{\sqrt{n}} = 1.558$$

$$\bar{x} - E < \mu < \bar{x} + E \quad\Longrightarrow\quad 58.242 < \mu < 61.358$$

We are 95% confident that the interval from 58.242 to 61.358 contains the true population mean number of candies per bag.
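
The same interval can be computed in a few lines. Python's standard library has no t-distribution quantile function, so this sketch takes t_{α/2} = 2.093 from a t table, as the report does. `mean_ci` is an illustrative helper name.

```python
import math


def mean_ci(x_bar, s, n, t_crit):
    """Illustrative helper: E = t * s / sqrt(n) and the interval x_bar - E < mu < x_bar + E."""
    e = t_crit * s / math.sqrt(n)
    return e, x_bar - e, x_bar + e


# Class summary statistics; 2.093 is the table value for 19 degrees of freedom.
e, lower, upper = mean_ci(59.8, 3.33, 20, 2.093)
print(f"E = {e:.3f}, {lower:.3f} < mu < {upper:.3f}")
```

It prints `E = 1.558, 58.242 < mu < 61.358`, which agrees with the hand calculation.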

3. Construct a 98% confidence interval estimate for the population standard deviation of the number of candies per bag.

$$n = 20,\quad s = 3.33,\quad \chi^2_R = 36.191,\quad \chi^2_L = 7.633 \ (19 \text{ degrees of freedom})$$

$$\sqrt{\frac{(n-1)s^2}{\chi^2_R}} < \sigma < \sqrt{\frac{(n-1)s^2}{\chi^2_L}} \quad\Longrightarrow\quad 2.413 < \sigma < 5.254$$

We are 98% confident that the interval from 2.413 to 5.254 contains the true population standard deviation of the number of candies per bag.
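
The chi-square interval for σ is a short calculation once the two critical values are known. This sketch passes in the table values χ²_R = 36.191 and χ²_L = 7.633 used in the report, because the standard library has no chi-square quantile function. `sd_ci` is an illustrative helper name.

```python
import math


def sd_ci(s, n, chi2_right, chi2_left):
    """Illustrative helper: sqrt((n-1)s^2/chi2_R) < sigma < sqrt((n-1)s^2/chi2_L)."""
    ss = (n - 1) * s ** 2
    return math.sqrt(ss / chi2_right), math.sqrt(ss / chi2_left)


lower, upper = sd_ci(3.33, 20, 36.191, 7.633)
print(f"{lower:.3f} < sigma < {upper:.3f}")
```

It prints `2.413 < sigma < 5.254`. The larger critical value, χ²_R, produces the lower bound.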

A confidence interval is a range of values used to estimate the true value of a population parameter. It is calculated from the sample observations, so it differs from sample to sample. If the sampling were repeated many times, the proportion of intervals containing the parameter of interest would equal the confidence level. The width of the interval depends on the level of confidence you seek: the higher the confidence level, the wider the interval. If you want to be 99% confident about the value of a population parameter, you will have a wider range of values than if you want to be 90% confident. The confidence level is the probability 1 − α (such as 0.95, or 95%) that the confidence interval actually contains the population parameter, assuming the estimation process is repeated a large number of times.
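
A simulation can illustrate the repeated-sampling interpretation of the confidence level. The sketch below assumes, purely hypothetically, that the true proportion of some color is exactly 0.20. It draws many simulated samples of 1,196 candies, builds the same kind of z interval used for the yellow proportion, and counts how often the interval contains 0.20. The helper names, the number of trials, and the random seed are illustrative choices.

```python
import math
import random


def wald_interval(successes, n, z):
    """Illustrative helper: the interval p_hat - E < p < p_hat + E with E = z*sqrt(p_hat*q_hat/n)."""
    p_hat = successes / n
    e = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - e, p_hat + e


def simulated_coverage(true_p, n, z, trials=1000, seed=1):
    """Illustrative helper: fraction of simulated samples whose interval contains true_p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Each candy independently has probability true_p of being the chosen color.
        successes = sum(rng.random() < true_p for _ in range(n))
        low, high = wald_interval(successes, n, z)
        hits += low < true_p < high
    return hits / trials


# Hypothetical setup: true proportion 0.20, samples the size of the class data.
for label, z in (("90%", 1.645), ("99%", 2.575)):
    print(label, simulated_coverage(0.20, 1196, z))
```

The printed fractions should come out close to 0.90 and 0.99. For any single sample, the 99% interval (z = 2.575) is wider than the 90% interval (z = 1.645), which is the trade-off described above.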

1. In statistics, a hypothesis is a claim or statement about a property of a population. This claim may or may not be true. A hypothesis test is a procedure for testing a claim about a property of a population.

2. Use a 0.05 significance level to test the claim that 20% of all Skittles candies are red, using the entire class data set as your sample.

$$H_0: p = 0.20 \qquad H_1: p \neq 0.20$$

$$z = \frac{\hat{p} - p}{\sqrt{\dfrac{pq}{n}}} = \frac{0.2023 - 0.20}{\sqrt{\dfrac{(0.20)(0.80)}{1196}}} \approx 0.20, \qquad P\text{-value} = 0.8414$$

Since the P-value of .8414 is greater than the 0.05 significance level, we fail to reject the null hypothesis. There is insufficient evidence to support the alternative hypothesis, so the claim that 20% of all Skittles candies are red cannot be rejected.
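
The test statistic and P-value can be checked in Python. This sketch treats the claim p = 0.20 as the null hypothesis with a two-tailed alternative and uses `statistics.NormalDist` for the standard normal distribution. `one_proportion_z_test` is an illustrative helper name.

```python
import math
from statistics import NormalDist


def one_proportion_z_test(x, n, p0):
    """Illustrative helper: two-tailed z test of H0: p = p0 against H1: p != p0."""
    p_hat = x / n
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Red candies in the class data.
z, p_value = one_proportion_z_test(242, 1196, 0.20)
alpha = 0.05
print(f"z = {z:.3f}, P-value = {p_value:.4f}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```

It reports z ≈ 0.202 and a P-value of about 0.8396, and the decision is to fail to reject H0. The P-value is slightly below .8414 because z is not rounded to 0.20 before the normal-table lookup.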

3. Use a 0.01 significance level to test the claim that the mean number of candies in a bag of Skittles is 55, using the entire class data set as your sample.

$$H_0: \mu = 55 \qquad H_1: \mu \neq 55$$

$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{59.8 - 55}{3.33/\sqrt{20}} \approx 6.446, \qquad P\text{-value} \approx 0.0000$$

Since the P-value of .0000 is less than the 0.01 significance level, we reject the null hypothesis. There is sufficient evidence to support the alternative hypothesis and to reject the claim that the mean number of candies per bag is 55.
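
Python's standard library has no t-distribution CDF, so instead of computing the P-value, this sketch uses the equivalent critical-value approach: reject H0 when |t| exceeds the two-tailed critical value t_{0.005} = 2.861 for 19 degrees of freedom. That value comes from a standard t table and is an assumption of this sketch, not a value given in the report. `one_sample_t` is an illustrative helper name.

```python
import math


def one_sample_t(x_bar, mu0, s, n):
    """Illustrative helper: t statistic for H0: mu = mu0 (n - 1 degrees of freedom)."""
    return (x_bar - mu0) / (s / math.sqrt(n))


t = one_sample_t(59.8, 55, 3.33, 20)
t_crit = 2.861  # assumed table value: two-tailed, alpha = 0.01, df = 19
print(f"t = {t:.3f}")
print("reject H0" if abs(t) > t_crit else "fail to reject H0")
```

It prints `t = 6.446` and rejects H0. This is the same decision as the P-value approach, since 6.446 is far beyond 2.861.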

This term project has helped us better understand how to collect data, analyze it, and draw conclusions from it. We learned how to graph data and how to find the 5-number summary. We also learned how to construct confidence intervals, which give a range of plausible values, at a stated level of confidence, for a population mean, proportion, or standard deviation. We also learned how to do hypothesis testing, which lets us judge whether claims may or may not be true. Everything we have learned can be used in our own lives and throughout our schooling. I will better understand statistical data in my other classes, and I can even apply confidence intervals and hypothesis tests to those situations. This class has definitely changed the way I think about real-world applications of statistics. Before taking this class, I was unaware of the many formulas and methods that can be applied to statistical data. This class has broadened my understanding and will change the way I perceive statistical information in the future.