
Stats 1040

April 29th 2016

Prof. Ping

Report Introduction

As a class, our elementary statistics course carried out a project to statistically analyze the colors of Skittles found in a bag of candies. Each student took part by buying their own bag of Skittles and recording the count of each candy color in that bag. These numbers were then compiled by our professor, who returned a spreadsheet with the total count of each candy color, the total count of all candies, and the sample size (the number of bags). With this data, we got the chance as students to look critically at what we found as a class about the color distribution in a random bag of candies.

Organizing and Displaying Categorical Data: Colors

Pie Chart: Overall Combined Class Skittle Colors

[Pie chart of class color proportions: Green 0.215, Orange 0.212, Yellow 0.196, Red 0.194, Purple 0.182]

Colors    Frequency    Cum. Freq.    Cum %
GREEN     352          352           21.542%
ORANGE    346          698           42.717%
YELLOW    321          1019          62.362%
RED       317          1336          81.763%
PURPLE    298          1634          100.000%
Total     1634

[Pareto chart: Skittle QTY (vertical axis, 270 to 360) by Skittle Color: GREEN 352, ORANGE 346, YELLOW 321, RED 317, PURPLE 298]

Pictured above are two graphs that help put this data into perspective. In the pie chart, the candy colors appear fairly evenly distributed: the proportions differ by only a few percentage points from color to color (from 0.182 to 0.215). The Pareto chart shows the same counts in descending order. For our class project, green was the most common candy color and purple was the least common. Together, these graphs don't quite show what I expected to see within the bags of Skittles collectively. I assumed that the distribution would be more even, with the same number of candies of each color per bag, because the packaging of Skittles is likely highly automated.
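As a rough check on the frequency table and Pareto ordering, the columns can be recomputed from the five class totals. This is only a sketch: the counts come from the table in this report, while the variable names (`counts`, `pareto`, `cum_freq`) are illustrative choices of mine.

```python
from itertools import accumulate

# Class-wide color counts, taken from the frequency table in this report.
counts = {"GREEN": 352, "ORANGE": 346, "YELLOW": 321, "RED": 317, "PURPLE": 298}

total = sum(counts.values())  # all candies in the class sample

# A Pareto chart orders the categories from most to least frequent.
pareto = sorted(counts.items(), key=lambda item: item[1], reverse=True)

# Running totals give the cumulative frequency column.
cum_freq = list(accumulate(freq for _, freq in pareto))

# Columns: color, frequency, relative frequency, cumulative frequency, cumulative %.
for (color, freq), cum in zip(pareto, cum_freq):
    print(f"{color:<7} {freq:>4} {freq / total:6.3f} {cum:>5} {cum / total:8.3%}")
```

The relative frequency column is the same set of proportions shown in the pie chart.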

[Pie chart of my bag's color proportions: Orange 0.250, Green 0.233, Red 0.200, Yellow 0.183, Purple 0.133]

The data from my personal bag of Skittles is shown in the pie chart (above) and bar chart (next page). The most common color in my bag was orange, with green a close second. As in the class data, the least frequent color was purple. My data broadly matches the class collective, showing a higher likelihood of green and orange Skittles than purple Skittles.

[Bar chart of my bag: Skittle QTY (vertical axis, 0 to 15) by color (Purple, Green, Yellow, Orange, Red); visible data labels: 14, 15, 11, 12]

Organizing and Displaying Quantitative Data: the number of candies per bag

[Bar chart of class color counts: Skittle QTY (vertical axis, 270 to 360) by Skittle Color: Green, 352; Orange, 346; Yellow, 321; Red, 317; Purple, 298]

[Frequency histogram: Number of Bags (vertical axis) by number of candies per bag (horizontal axis, 54 to 64)]

Let us now discuss the number of Skittles per bag submitted by the students of our class. Our sample size for this project was 27 bags of Skittles (one submitted by each student). The mean number of candies per bag was 60.5. The median for our sample was 61, the modes were 61 and 62, and the counts ranged from 54 to 64 (a range of 10). The sample standard deviation of our data is 2.5. Pictured below is a box plot with a five-number summary of the sample data set.
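The summary statistics above can be recomputed with Python's standard `statistics` module. This sketch assumes that the per-bag counts are the data labels printed on the cumulative frequency chart later in this report; the list `bags` is my reconstruction from those labels, and the quartiles use the module's default "exclusive" method, which may differ from the textbook's method for other data sets.

```python
import statistics

# Candies per bag for the 27 class bags. Assumption: these are the per-bag
# data labels shown on the cumulative frequency chart in this report.
bags = ([64] * 4 + [63] + [62] * 5 + [61] * 5 + [60] * 4
        + [59] * 2 + [58] * 3 + [57] * 2 + [54])

n = len(bags)
mean = statistics.mean(bags)
median = statistics.median(bags)
modes = statistics.multimode(bags)      # every value tied for most frequent
data_range = max(bags) - min(bags)
stdev = statistics.stdev(bags)          # sample standard deviation (n - 1 denominator)
q1, q2, q3 = statistics.quantiles(bags, n=4)  # quartiles, default "exclusive" method

print("n:", n, "mean:", round(mean, 1), "median:", median)
print("modes:", sorted(modes), "range:", data_range, "s:", round(stdev, 1))
print("five-number summary:", min(bags), q1, q2, q3, max(bags))
```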

# of each bag (Class Sample Size: 27 bags)

MIN    Q1 (25%)    MED    Q3 (75%)    MAX
54     59          61     62          64

[Box plot of the number of candies per bag; horizontal axis from 53 to 65]

The mean reveals that the average number of candies per bag is 60.5. This is consistent with my personal bag, which contained 60 candies. If we observe the five-number summary plot, we can see that our data is slightly skewed to the left: more bags fall at the high end of the range than at the low end. We can see this by observing that the median (Quartile 2) sits to the right of the center of our range, with a long tail reaching down to 54. The graph does not match what I expected to see, given that the Skittles packaging process is highly automated. I assumed that there would be little variation in the number of Skittles per bag, since each candy should weigh approximately the same and machine packaging should fill each bag to nearly the same weight (in ounces). Within our data set, I feel that 54 may be an outlier, and that the rest of the students' bags held between 57 and 64 candies.
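One common way to check the suspicion about the 54-candy bag is the 1.5 × IQR rule. The sketch below applies it to the quartiles from the five-number summary; the rule is a convention that I am assuming here, not something the project required.

```python
# Quartiles and extremes from the five-number summary in this report.
q1, q3 = 59, 62
smallest_bag, largest_bag = 54, 64

iqr = q3 - q1  # interquartile range

# Conventional fences: values beyond 1.5 * IQR from the quartiles are flagged.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

low_is_outlier = smallest_bag < lower_fence
high_is_outlier = largest_bag > upper_fence

print("fences:", lower_fence, upper_fence)
print("54 flagged as outlier:", low_is_outlier)
print("64 flagged as outlier:", high_is_outlier)
```

Under this rule the lower fence is 54.5, so the 54-candy bag falls just below it, while the largest bag is well inside the upper fence.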

When observing the data pictured in the frequency histogram, we can see that a nearly normal distribution of candies per bag emerges. The box plot is also a great visual for the distribution that occurred within our data set; however, I believe mine needs further corrections. (The max numbers are incorrect at this time; I will correct them for a future turn-in.)

Reflection

The differences between categorical and quantitative data are important to understand. Not all data is created equal. As we learned in the first few lectures of our class, some data cannot be measured or used in numerical calculations. These types of data include things like gender, yes or no, or pass or fail, and are considered categorical data. Categorical data generally cannot be ranked, but can be put into categories. Another common piece of categorical data is color, which generally cannot be ranked either. However, in our term project we were able to rank the colors by counting how often each one occurred in our data. The colors themselves are still categorical, but their frequencies are quantitative data. Quantitative data can be observed and ranked by its numerical properties. Common quantitative data sets are heights in inches, time in seconds, and weight in pounds. These things can easily be ranked, organized, and used in further calculations.

The graphs used to display these two categories of data are not created equal either. Certain graphs and charts are better than others for the job at hand. For instance, with categorical data, pie charts work wonders as a visual aid representing percentages of the data when applicable. Other simple charts work well, depending on the data in question. A data set of preferred car manufacturers in the state of Utah could be displayed by a bar graph, where each bar represents a car make and the vertical axis displays the percentage of Utah drivers in the data set who prefer that make. Quantitative data can be displayed well on a variety of charts and graphs. Favorable formats are boxplots, stem-and-leaf plots, and frequency histograms. The important thing to note here is that, depending on what you want to emphasize, you can pick a graph or chart accordingly to display your data. You should pick the chart that makes your data easiest to read and comprehend. Of the charts I made, my favorite for the quantitative data in our term project is the frequency histogram for the color of candies and their frequency distribution.

[Cumulative frequency chart: "# Each Bag" and "Cum. Freq." plotted as Skittle QTY (vertical axis, 0 to 1800) for each of the 27 bags, which are labeled by student initials]

# Each Bag:   64, 64, 64, 64, 63, 62, 62, 62, 62, 62, 61, 61, 61, 61, 61, 60, 60, 60, 60, 59, 59, 58, 58, 58, 57, 57, 54

Cum. Freq.:   64, 128, 192, 256, 319, 381, 443, 505, 567, 629, 690, 751, 812, 873, 934, 994, 1054, 1114, 1174, 1233, 1292, 1350, 1408, 1466, 1523, 1580, 1634

A confidence interval gives a range of values, computed from sample data, that is likely to contain a specified population parameter. Within inferential statistics, 95% and 99% are the most commonly used confidence levels. A good way to understand the confidence level is to imagine many samples of data, all collected using the same methods. For this example we will use a 95% confidence level. From each sample we would calculate the lower and upper limits of an interval for the population parameter, and we would expect about 95% of those intervals to contain the true value. These calculations can be performed with the formulas shown below.

Construct a 99% Confidence Interval Estimate for the true proportion of yellow candies.

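Yellow candies x = 321, Total candies n = 1634

$\hat{p} = \dfrac{x}{n} = \dfrac{321}{1634} = 0.196$

Critical Value: $z_{\alpha/2} = 2.575$

$E = z_{\alpha/2}\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}} = 2.575\sqrt{\dfrac{(0.196)(0.804)}{1634}} = 0.025$

Proportion: $0.171 < p < 0.221$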

First we were asked to construct a 99% confidence interval estimate for the true proportion of yellow candies. Our "p hat" represents the proportion of yellow candies in our sample, 0.196. To find the margin of error, we looked up the critical value for our confidence level in z-score table A-2, which is 2.575. The margin of error for this sample and confidence level was 0.025. The interval is simple to build: subtract the margin of error from p hat for the lower limit, and add it for the upper limit. This gave the interval 0.171 < p < 0.221 for the true proportion of yellow candies. In other words, based on our sample, we are 99% confident that between 17% and 22% of all Skittles are yellow.
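The proportion interval can be retraced step by step in a few lines. This sketch deliberately follows the report's hand calculation: it uses the table value 2.575 and rounds p hat and the margin of error to three decimals before combining them. Those rounding choices are my assumption about how the numbers were obtained.

```python
from math import sqrt

x, n = 321, 1634        # yellow candies and total candies in the class sample
z_crit = 2.575          # 99% critical value as read from the report's z table

p_hat = round(x / n, 3)                                   # sample proportion
margin = round(z_crit * sqrt(p_hat * (1 - p_hat) / n), 3)  # margin of error E

lower = round(p_hat - margin, 3)
upper = round(p_hat + margin, 3)

print(f"p_hat = {p_hat}, E = {margin}")
print(f"{lower} < p < {upper}")
```

Carrying full precision instead of rounding at each step moves the upper limit slightly, to about 0.222, so the third decimal depends on when the rounding is done.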

Construct a 95% Confidence Interval Estimate for the true mean number of candies per bag.

n = 27, Degrees of Freedom = 26, s = 2.5, $\bar{x} = 60.5$

$E = t_{\alpha/2}\dfrac{s}{\sqrt{n}} = 2.056 \cdot \dfrac{2.5}{\sqrt{27}} = 0.989$

$59.5 < \mu < 61.5$

We were then asked to construct a 95% confidence interval estimate for the true mean number of candies per bag. To find the t critical value, I consulted the t critical values chart A-3 with 26 degrees of freedom and found 2.056. The margin of error found using this critical value was 0.989. The interval is built the same way as for the proportion (shown above): we subtract and add the margin of error to our sample mean, or x bar, to get the lower and upper limits. X bar = 60.5, so our interval for the true mean is 59.5 < μ < 61.5. This tells us that, based on our data, we are 95% confident that the true mean number of candies in a bag of Skittles is somewhere between 59.5 and 61.5.
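The same arithmetic for the mean can be sketched as follows. The critical value 2.056 is taken as given from the report's t table rather than computed, since Python's standard library has no t distribution; the variable names are illustrative.

```python
from math import sqrt

n, x_bar, s = 27, 60.5, 2.5   # sample size, mean, and standard deviation from the report
t_crit = 2.056                # t critical value for 26 degrees of freedom (table A-3)

margin = t_crit * s / sqrt(n)             # margin of error E
lower, upper = x_bar - margin, x_bar + margin

print(f"E = {margin:.3f}")
print(f"{lower:.1f} < mu < {upper:.1f}")
```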

Construct a 98% Confidence Interval Estimate for the standard deviation of the numbers of candies per bag.

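n = 27, Degrees of Freedom = 26, s = 2.5

$\chi^2_R = 45.642$, $\chi^2_L = 12.198$

$\sqrt{\dfrac{(n-1)s^2}{\chi^2_R}} < \sigma < \sqrt{\dfrac{(n-1)s^2}{\chi^2_L}}$

$\sqrt{\dfrac{(26)(2.5)^2}{45.642}} < \sigma < \sqrt{\dfrac{(26)(2.5)^2}{12.198}}$

$1.88 < \sigma < 3.65$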

The final step in calculating confidence intervals for our project was to construct a 98% confidence interval estimate for the standard deviation of the number of candies per bag. Our sample standard deviation was 2.5. Referring to table A-4 in our text with 26 degrees of freedom, we found the chi-square critical values to be 45.642 (right) and 12.198 (left). Using these critical values in the formula for the standard deviation interval gave 1.88 < σ < 3.65. This tells us, with 98% confidence, the lower and upper limits for the population standard deviation.
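A short sketch of the standard deviation interval, again treating the two chi-square critical values as table lookups from the report rather than computing them:

```python
from math import sqrt

n, s = 27, 2.5                         # sample size and sample standard deviation
chi2_right, chi2_left = 45.642, 12.198  # chi-square critical values, 26 df (table A-4)

# The larger (right) critical value gives the lower limit, and vice versa.
lower = sqrt((n - 1) * s**2 / chi2_right)
upper = sqrt((n - 1) * s**2 / chi2_left)

print(f"{lower:.3f} < sigma < {upper:.3f}")
```

To three decimals the limits come out near 1.887 and 3.650; the report states the lower limit as 1.88.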

Hypothesis Testing

Hypothesis testing is a valuable tool used by statisticians and researchers around the world. The basic principles are straightforward, and can be mathematically described and solved with the following steps:

1. Identify the null hypothesis and alternative hypothesis from a given claim, and express both in symbolic form
2. Calculate the value of the test statistic, given a claim and sample data
3. Choose the sampling distribution that is relevant
4. Either find the P-value or identify the critical value(s)
5. State the conclusion about the claim (based on the original claim) in simple and nontechnical terms

*Conditions found on pg 382 of text

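$H_0: p = 0.20$, $H_1: p \neq 0.20$, $\alpha = 0.05$

Red x = 317, n = 1634

$\hat{p} = \dfrac{317}{1634} = 0.194$

$z = \dfrac{\hat{p} - p}{\sqrt{\dfrac{p(1-p)}{n}}} = \dfrac{0.194 - 0.20}{\sqrt{\dfrac{(0.20)(0.80)}{1634}}} = -0.61$

Critical Value = $\pm 1.96$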

In our first round of hypothesis testing, we were asked to test the claim that 20% of the Skittles in a bag of candies are red, using a 0.05 significance level. The first step is to state our null and alternative hypotheses. Next we solved for p-hat, the proportion of red Skittles in our class sample, which is 0.194. Using p-hat we can solve for the test statistic, which is -0.61. The critical value that corresponds to our significance level is 1.96, and that is the boundary against which we compare our test statistic. Since this is a two-tailed test, we can see that -0.61 falls between the critical values of -1.96 and +1.96, outside the rejection region. In this instance we fail to reject the null hypothesis: there is not sufficient evidence to warrant rejection of the claim that 20% of Skittles candies are red.
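The test of the red proportion can be sketched with the standard library's `NormalDist`. Here the critical value and a two-tailed P-value are computed rather than read from a table; the P-value is an addition of mine for illustration, since the report used the critical value method.

```python
from math import sqrt
from statistics import NormalDist

x, n = 317, 1634     # red candies and total candies in the class sample
p0 = 0.20            # claimed proportion under the null hypothesis
alpha = 0.05         # significance level

p_hat = x / n
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)       # test statistic

std_normal = NormalDist()
z_crit = std_normal.inv_cdf(1 - alpha / 2)       # two-tailed critical value
p_value = 2 * std_normal.cdf(-abs(z))            # two-tailed P-value

reject_null = abs(z) > z_crit

print(f"z = {z:.2f}, critical values = +/-{z_crit:.2f}, P-value = {p_value:.3f}")
print("reject H0:", reject_null)
```

Both routes lead to the same decision: the test statistic lies between the critical values and the P-value is well above 0.05, so the null hypothesis is not rejected.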

$H_0: \mu = 55$, $H_1: \mu \neq 55$, $\alpha = 0.01$

n = 27, Degrees of Freedom = 26, $\bar{x} = 60.5$, s = 2.5, C.V. = $\pm 2.779$

$t = \dfrac{\bar{x} - \mu}{s/\sqrt{n}} = \dfrac{60.5 - 55}{2.5/\sqrt{27}} = 11.432$

Test statistic fell within rejection region. There is sufficient evidence to warrant rejection of the claim that the

mean number of candies in a bag of Skittles is 55.

Our second hypothesis test was of the claim that the mean number of candies in a bag of Skittles is 55, using a significance level of 0.01. Here we stated our null and alternative hypotheses, and then located our critical value using table A-3, which is 2.779. Solving for our test statistic, we found a value of 11.432. This result was startling to me, for it fell much further out on the distribution than any I had seen previously. This test statistic was well within the rejection region. Therefore there is sufficient evidence to warrant rejection of the claim that the mean number of candies in a bag of Skittles is 55. After consulting our original class data, it became exceedingly clear why this test statistic was so far into the rejection region. The majority of our class bags contained more than 57 candies, with the mean falling at 60.5. However, it was interesting to test a much lower hypothesized mean to see the results that it provided.
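The t test for the mean follows the same pattern. In this sketch the critical value 2.779 is taken from the report's table lookup (two-tailed, 0.01 significance, 26 degrees of freedom), and the variable names are illustrative.

```python
from math import sqrt

n, x_bar, s = 27, 60.5, 2.5   # sample size, mean, and standard deviation from the report
mu0 = 55                      # claimed mean under the null hypothesis
t_crit = 2.779                # two-tailed critical value from table A-3

standard_error = s / sqrt(n)
t = (x_bar - mu0) / standard_error    # test statistic

reject_null = abs(t) > t_crit

print(f"t = {t:.3f}, critical values = +/-{t_crit}")
print("reject H0:", reject_null)
```

The sample mean sits more than eleven standard errors above 55, which is why the test statistic lands so deep in the rejection region.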

I felt it was a cool opportunity to apply the concepts we have been studying in class to something tangible, like data we collected ourselves. Performing the calculations on our own sample has been challenging and interesting, and a good change from our textbook. I felt that the confidence interval calculations were appropriate to our sample. For example, the question about the proportion of yellow candies revealed an interval of 17% to 22%. When I reviewed the data gathered by my individual classmates, this matched up fairly well, and even better than I expected. This observation makes me feel confident about the concept of using confidence intervals within inferential statistics.

The hypothesis testing was also fun and challenging, and even had me stuck on a few occasions. Solving

the test statistics based on our class data provided some insight that allowed us to make conclusions about our

data. When the test statistic value was so off the chart in part two, it was a great representation for me about

how a hypothesis can be a good place to start and actually lead to some solid inferences. I also really enjoyed

the visual representation of how far gone our test statistic was in that example.

I think errors can easily be made in equations and problems like these. For one thing, in the beginning I felt like the bag with only 54 candies was most certainly an outlier. An outlier can affect the class mean and standard deviation. It might have been possible to reject that bag from our sample, since the rest of the class displayed a normal distribution. In addition, I made many calculation errors throughout the project. I think human error is common with almost anything, and it is something to take into account, especially with statistics. I was fortunately able to work with peers and my professor to iron out flaws in my rounding and even data entry errors in my calculations. This project taught me a lot about teamwork, and I benefited greatly from connecting with my classmates and working together. I also feel that we all did much better on the test after practicing these skills in this project.

To conclude what the statistical analysis of Skittles has revealed, I believe there may be cause enough to state a hypothesis of my own. When counting out the candies in my personal bag, a thought occurred to me: what if the frequency of the colors in the bags is affected by what it costs the Skittles company to manufacture candies of that color? Sure enough, after reviewing the class-wide collection of data, it is clear to see that not all colors are created equal. Purple and red had the lowest counts, while orange and green had the highest. Could it be rational to assume that orange and green are cheaper colors to manufacture? I feel we would need a larger sample of Skittles to test this hypothesis. In addition, I feel we would also need wholesale price information for the ingredients used to manufacture the candies.

Reflection

This project has proven itself to be a fun and intuitive process. I have learned a lot about how to better

apply the concepts I have learned from this class, and I have surely learned a lot more about skittle color and

quantity frequency distributions!

I have admittedly struggled with math for what has seemed to be my entire life. Yet I feel very fortunate to have taken statistics this semester. Statistics has been the most rewarding and intuitive experience I can recall having within mathematics. The concepts were easy to grasp, and the ability to use visual aids truly helped me process and comprehend the statistics being found. I really enjoyed the majority of the learning processes and lessons. I look forward to the lessons I have learned being tools in my tool belt that I can use later in my schooling career. In spring of 2017, I will transfer to the University of Utah to complete my bachelor's in Psychology. After completing my undergraduate degree, I plan to apply to the Master of Occupational Therapy program, also offered at the U. I have learned throughout my schooling, especially in the sciences involved with Psychology, that statistics are heavily present. I acknowledge that as I dive more fully into research and methods within my undergrad, the careful, keen eye and knowledge base I have gained here in my statistics class will help me in phenomenal ways.

In this project my classmates and I learned more about how to engage with hypothesis testing. I think that it is a pivotal part of inferential statistics, and is truly relevant to where I want to be in my future. Hypothesis testing still doesn't feel exactly like my strong suit, so to speak, yet I have gained valuable insight from using the techniques practically. My highest praise and thanks go out to the small study group I have worked with this semester. As a team, every Saturday we would hash out new material and explain it to one another so we could gain a better understanding. After missing a lecture while I was out of town, I felt completely lost with the concepts involved in hypothesis testing: the set-up, the test statistics, the critical values, literally everything. If it weren't for the time spent with my teammates, I would have done poorly on the final section of this term project as well as our final section test. One skill that will endure well past my schooling is the ability to work with my cohorts and build powerful interpersonal relationships. Many of my most successful school, work, and life moments have stemmed from my ability to connect and network with a team of people around me. I will hold this value in high esteem as I push forward in schooling, and in life.

Problem solving was a huge component of this project. At times the information and set-up would seem relatively straightforward and intuitive. At other times, I found it increasingly challenging to extract the correct data from our sample in order to get the proper proportions or test statistics. This became frustrating and overwhelming to me. However, it gave me a wonderful chance to dive in deeper and read with more careful study and comprehension. I felt I gained more from the set-up of these problems than I did from the standard textbook questions. The more visual and practical application provided a sturdy foundation from which to spring into better understanding. The other huge element of critical thinking and problem solving was the presentation of this project. I have almost zero experience with Excel, and I remember feeling completely overwhelmed with the task at hand. How on Earth were all of these graphs and charts going to happen?! Magically?! It was daunting, to say the least. Fortunately for us these days, YouTube had all of the answers. I learned through a lot of trial and error that Excel is actually a fairly intuitive program, with far more robust possibilities than I originally expected. I learned a ton while performing this project in both Excel and Word.

After this class, I have already noticed that I more critically observe and assess the statistics I see in my day-to-day life. I believe that a fuller understanding of statistical methods can prepare the everyday consumer to be mindful. In our current day and age it is so common for us to be bombarded by statistics, but they are generally so heavily biased that it is clear that the only intention present is the goal of selling a product. This must be the price of a consumer and capitalist society, but I do not think ignorance is bliss. Having even a basic understanding of how statistics are found can help us to stay away from hoax products, or even just to look twice before believing something we read online. I think it is important to have statistics that are as unbiased as possible, and I will continue to honor that concept as I push into research methods of my own in the next few years of my continued education.
