                    Count (my bag)    Class
Number of Red             17           234
Number of Orange          15           243
Number of Yellow          12           235
Number of Green            9           258
Number of Purple          16           228
Total                     69          1198
The graphs mostly reflect what I expected to see. I expected one of two things to
happen. Either there would be roughly 20% of each color, since five colors distributed
evenly would each make up one fifth of the total, or one color would be a lot more
common than the rest, because every time I get a bag of Skittles I feel like I get
almost all purple. Even though in theory I thought the colors might be evenly
distributed, it was a pleasant surprise to find that they are. I did not notice any
outliers in the class data. The distribution in my individual bag, however, does not
quite match the class distribution. In my bag about 24.64% of the candies were red,
whereas the class had about 19.53%. For orange I had 21.74% and the class had 20.28%,
which was pretty close. For yellow I had about 17.39% and the class had 19.62%. For
green I had about 13.04% and the class had a much higher 21.54%. For purple I was all
the way up at about 23.19%, while the class was lower at about 19.03%. I had
proportionally more reds, oranges, and purples, and fewer yellows and greens, than the
class as a whole.
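The percentages quoted above can be reproduced with a few lines of Python. This is only a sketch for checking the arithmetic; the counts come from the table in this report, while the function and variable names are made up for the example.

```python
# Counts taken from the table in this report.
my_bag = {"Red": 17, "Orange": 15, "Yellow": 12, "Green": 9, "Purple": 16}
class_total = {"Red": 234, "Orange": 243, "Yellow": 235, "Green": 258, "Purple": 228}


def percentages(counts):
    """Return each color's share of the total as a percentage, rounded to 2 places."""
    total = sum(counts.values())
    return {color: round(100 * n / total, 2) for color, n in counts.items()}


bag_pct = percentages(my_bag)
class_pct = percentages(class_total)

# Print the two distributions side by side for comparison.
for color in my_bag:
    print(f"{color:<7} bag {bag_pct[color]:5.2f}%   class {class_pct[color]:5.2f}%")
```

Each percentage is simply the color count divided by the total (69 for the bag, 1198 for the class), so small differences from the figures in the text come only from rounding.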
1. Determine the proportion of each color within the overall sample gathered by the class.
2. In StatCrunch, create a pie chart and a Pareto chart for the total number of candies of each
color in our class data set. Submit copies of your graphs in this report.
Pie Chart:
Pareto Chart:
3. Does the class data represent a random sample? What would the population be? Collaborate
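The original graphs were made in StatCrunch and are not reproduced here. As a rough sketch of what a Pareto chart encodes, the class counts can be sorted from largest to smallest with a running cumulative percentage. The counts come from the table in this report; the variable names are made up for the example.

```python
# Class counts taken from the table in this report.
class_total = {"Red": 234, "Orange": 243, "Yellow": 235, "Green": 258, "Purple": 228}
total = sum(class_total.values())

# A Pareto chart orders the categories from the largest count to the smallest.
ordered = sorted(class_total.items(), key=lambda item: item[1], reverse=True)

# Build (color, count, cumulative percent) rows in that order.
pareto_rows = []
cumulative = 0
for color, count in ordered:
    cumulative += count
    pareto_rows.append((color, count, round(100 * cumulative / total, 1)))

for color, count, cum_pct in pareto_rows:
    print(f"{color:<7} {count:4d}  cumulative {cum_pct:5.1f}%")
```

The bars of the chart follow the order of the rows, and the cumulative column is what the optional cumulative line on a Pareto chart would trace.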
1. The distribution of the total number of candies per bag was skewed left. Before
seeing the numbers I expected it to be symmetrical, but after seeing them I
guessed it would be skewed left because of the low value of 38, while the rest
of the bags were closer to the 60s. The data from the class is broadly similar
to my own bag of candies, although I was on the high end: I had 69 candies in
my bag. The 20 bags collected from the class ranged from 38 to 69, with a
median of 61, so my bag was at the maximum for the class. On a box plot my bag
would sit at the very top of the plot and might be flagged as an outlier.
2. The difference between categorical and quantitative data is that quantitative
data are things that can be measured or counted, such as length in inches,
number of questions on an exam, weight in pounds, or time in minutes.
Categorical data, by contrast, are not numerical measurements; they place each
observation into a group, such as gender, model of car, or pass/fail. Since
categorical data are grouped into categories, bar graphs, line graphs, and pie
charts are great at displaying them. These graphs are easy to understand
because the groups are clearly indicated and the axes are labeled. They also
make it easy for the reader to see the proportions of the different groups that
are being compared.
Graphs best used for quantitative data are stem and leaf plots, histograms, and
box plots. Stem and leaf plots are useful because they organize the numbers and
show the shape of the distribution, which gives a good overall impression of
the data. Histograms are a great way to show quantitative data because they
also show the distribution of the observations, based on frequencies and
intervals. Box plots are good for quantitative data because they show how the
data are spread out using the five-number summary: the minimum, Q1, the median,
Q3, and the maximum. In a box plot, a rectangular box represents the middle
half of the data, between Q1 and Q3. Whiskers extend from the box toward the
minimum and maximum, and any outliers show up as separate points beyond the
whiskers.
The values you would record for categorical data are quite different: you could
group cars by color, make, year, or type (SUV, truck, etc.). The values you
would record for quantitative data are numbers, such as the number of people in
a movie theater on a Friday night or the number of cars in a parking lot. These
are meaningful measures that can be counted.
3. Boxplot:
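The box plot image is not reproduced here. As a sketch of what goes into one, the five-number summary and the common 1.5 × IQR rule for flagging outliers can be computed as below. The sample values are hypothetical and are NOT the class data, and the quartile convention (`method="inclusive"`) is an assumption: StatCrunch and graphing calculators may use a slightly different rule and give slightly different quartiles.

```python
from statistics import quantiles


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a list of numbers."""
    data = sorted(values)
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    return data[0], q1, median, q3, data[-1]


def outlier_fences(q1, q3):
    """Return the lower and upper fences of the 1.5 * IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


# Hypothetical values for illustration only; these are NOT the class data.
example = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]

minimum, q1, median, q3, maximum = five_number_summary(example)
low_fence, high_fence = outlier_fences(q1, q3)

# Any value outside the fences would be drawn as a separate point.
outliers = [x for x in example if x < low_fence or x > high_fence]
print(minimum, q1, median, q3, maximum, outliers)
```

With the real candies-per-bag totals in place of `example`, the same check would show whether the bags of 38 or 69 candies actually fall outside the fences.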
Confidence Interval
A confidence interval is a range of values, calculated from sample data, that is
likely to contain an unknown population parameter. Its width reflects the margin of
error allowed in the data and the analysis. It is an observed interval, so it changes
from sample to sample. The confidence level, stated as a percentage, describes how
confident we are that the method produces an interval containing the true value of
the parameter.
1. p = (0.167, 0.226)
2. μ = (58, 63)
3. σ = (4.35 [5 candies], 9.48 [10 candies])
4.
For #1, the 99% confidence interval for the population proportion, we took the
proportion of yellow candies among all the candies our class sampled and plugged those
numbers (235/1198) into the 1-proportion Z-Interval function on my calculator. The
interval that came out means, to me, that I have 99% confidence that the proportion of
yellow candies in the parent population is between 16.7% and 22.6%.
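The calculator result for #1 can be checked by hand. The sketch below assumes the calculator uses the standard one-proportion z-interval formula, p̂ ± z·sqrt(p̂(1 − p̂)/n); the function name is made up for the example.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z_interval(successes, n, confidence):
    """Return (low, high) for a one-proportion z-interval."""
    p_hat = successes / n
    # Two-sided critical value, e.g. about 2.576 for 99% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# 235 yellow candies out of 1198 in the class sample, 99% confidence.
low, high = one_proportion_z_interval(235, 1198, 0.99)
print(f"({low:.3f}, {high:.3f})")
```

Rounded to three decimal places, this agrees with the interval (0.167, 0.226) reported above.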
For #2, the 95% confidence interval for the population mean, we took the total number
of candies from each bag and plugged those totals into the T-Interval function on the
calculator. The results indicate that I have 95% confidence that the mean number of
candies per 12 ounce bag sold (the parent population) is between 58 and 63 candies.
For #3, the 98% confidence interval for the population standard deviation of candies
per bag, we took the candies-per-bag totals, graphed the right-skewed distribution
used for standard deviations (the chi-square distribution), and plugged the sample
size, the sample standard deviation, and the confidence level into the equation. From
the results, I can say I am 98% confident that the parent population standard
deviation of candies per 12 ounce bag is between 4.35 (5) and 9.48 (10) candies.
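The report does not write out the equation used for #3. The usual textbook interval for a population standard deviation, which is presumably what was used, is based on the chi-square distribution with n − 1 degrees of freedom. Here n is the number of bags, s is the sample standard deviation, and α = 0.02 for 98% confidence; the critical values are right-tail values.

```latex
\sqrt{\frac{(n-1)\,s^{2}}{\chi^{2}_{\alpha/2}}}
\;<\; \sigma \;<\;
\sqrt{\frac{(n-1)\,s^{2}}{\chi^{2}_{1-\alpha/2}}}
```

Because the chi-square distribution is right skewed, the interval is not symmetric around s, which is why 4.35 and 9.48 are not equally far from the sample standard deviation.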
We started this exercise by checking whether the conditions for a normal model were
met. On a graph on my calculator the data looked normal, and np(1 − p) ≥ 10; also, our
sample is certainly less than 5% of the parent population.
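The two conditions mentioned above can be checked with the class counts for yellow candies. This is a sketch assuming the sample proportion is used in place of p; the variable names are made up for the example.

```python
# Class sample: 235 yellow candies out of 1198.
n = 1198
yellow = 235
p_hat = yellow / n

# Condition 1: n * p * (1 - p) should be at least 10.
condition_value = n * p_hat * (1 - p_hat)
meets_condition = condition_value >= 10

# Condition 2: the sample is under 5% of the population whenever the
# population has more than 20 times as many candies as the sample.
min_population = 20 * n

print(round(condition_value, 1), meets_condition, min_population)
```

The first condition is met with plenty of room, and the second holds as long as more than 23,960 candies have been produced, which is a safe assumption.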
the person every day in class. I feel like this helped my communication and time management
skills: in order to work with a group online, it was very important not to procrastinate, so that
everyone had a fair chance to participate.
In part two we made a pie chart and a Pareto chart of the data obtained. These graphs showed
us potential surprises and outliers in the class's data. They also showed how the candies were
distributed across the colors. This will help me in future classes to have a better understanding
of how outliers affect the data, and of how to make and understand graphs in general. It also
showed how the class's data as a whole differed from our individual data, which taught me about
sample sizes and how samples can vary.
In part 3 we learned about the shapes of distributions and how they can change, how to
make graphs for the different types of data, and what to calculate for categorical and
quantitative data.
After doing this project I feel more confident in my problem-solving skills and my ability
to use math in real-life applications. I now know how to compute my GPA instead of looking it
up all the time, I know how to group gathered data more effectively, and my graphing skills
have improved. I have also been able to make more sense of medical journals I have read.