Lecture Stats


and Economics

Module 1: Probability Theory and Statistical Inference

Spring 2010

Lecture 3: Continuous probability distributions

Priyantha Wijayatunga, Department of Statistics, Umeå University

priyantha.wijayatunga@stat.umu.se

These materials are adapted from copyrighted lecture slides (© 2009 W.H. Freeman and Company) from the homepage of the book The Practice of Business Statistics: Using Data for Decisions, Second Edition, by Moore, McCabe, Duckworth and Alwan.

Continuous probability

distributions

Probability density

Sampling distributions

Distributions

Let X denote the # of days a student comes to class (in a week).

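The probability distribution is

$$P(X = x) = p(x) = \begin{cases} 0.1 & \text{if } x = 1 \\ 0.2 & \text{if } x = 2 \\ 0.2 & \text{if } x = 3 \\ 0.3 & \text{if } x = 4 \\ 0.2 & \text{if } x = 5 \end{cases}$$

Then: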

1) What is the probability that a student comes to class more than 3 days?

2) What is the probability that a student comes to class 2 or 3 days?
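
The two questions can be checked by summing p(x) over the values that make up each event. The sketch below is only an illustration; the helper name `prob` is hypothetical and not part of the lecture. It gives 0.5 for "more than 3 days" and 0.4 for "2 or 3 days".

```python
# Probability distribution of X = number of days a student comes to class.
pmf = {1: 0.1, 2: 0.2, 3: 0.2, 4: 0.3, 5: 0.2}


def prob(event):
    """Sum p(x) over the values x for which event(x) is true."""
    return sum(p for x, p in pmf.items() if event(x))


total = prob(lambda x: True)            # all probabilities together
p_more_than_3 = prob(lambda x: x > 3)   # p(4) + p(5)
p_2_or_3 = prob(lambda x: x in (2, 3))  # p(2) + p(3)

print(round(total, 10), round(p_more_than_3, 10), round(p_2_or_3, 10))
```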

Continuous Probability Distributions

A continuous random variable X takes all values in an interval.

Example: There is an infinity of numbers between 0 and 1 (e.g., 0.001, 0.4, 0.0063876).

Its probability distribution is described by a density curve (also called a density function or probability density).

The probability of any event is the area under the density curve for the values of X that make up the event.

This is a uniform density curve for the variable X.

The probability that X falls between 0.3 and 0.7 is the area under the density curve for that interval:

$$P(0.3 \le X \le 0.7) = (0.7 - 0.3) \times 1 = 0.4$$

Density function of X:

$$f(x) = \begin{cases} 1 & \text{for } 0 \le x \le 1 \\ 0 & \text{for } x < 0 \text{ or } x > 1 \end{cases}$$

Intervals

All continuous probability distributions assign probability 0 to every

individual outcome. Only intervals can have a positive probability, represented

by the area under the density curve for that interval.

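$$P(X = 1) = (1 - 1) \times 1 = 0 \quad \text{(height = 1)}$$

The probability of an interval is the same whether boundary values are included or excluded:

$$P(0 \le X \le 0.5) = (0.5 - 0) \times 1 = 0.5$$

$$P(0 < X < 0.5) = (0.5 - 0) \times 1 = 0.5$$

$$P(X < 0.5 \text{ or } X > 0.8) = P(X < 0.5) + P(X > 0.8) = 1 - P(0.5 < X < 0.8) = 0.7$$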

If all possible outcomes are equally likely (for example, obtaining any value from 0 to 1 is equally likely), then

$$P(0.3 \le X \le 0.7) = 0.4$$

Similarly, $P(X < 0.5 \text{ or } X > 0.8) = 0.5 + 0.2 = 0.7$.

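If the outcomes are equally likely for any value between two numbers a and b, where a < b (the random variable X can take any value between a and b), then X has a uniform distribution with density

$$f(x) = \begin{cases} \dfrac{1}{b - a} & \text{if } a \le x \le b \\ 0 & \text{otherwise} \end{cases}$$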

Example: The time (in minutes) that a student takes to solve a math problem is known to be any number between 10 and 20, with equal chances.

Find the probability that a student takes more than 6 but less than 12 minutes to solve a given math problem.
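
For a uniform density the probability of an interval is just the length of its overlap with [a, b] times the height 1/(b - a). A minimal sketch of that calculation follows; the function name `uniform_prob` is an illustrative choice, not something from the lecture. For the exercise, only the part of (6, 12) that lies inside [10, 20] counts, so the answer is 2/10 = 0.2.

```python
def uniform_prob(lo, hi, a, b):
    """P(lo < X < hi) for X uniform on [a, b]: overlap length times 1/(b - a)."""
    if not a < b:
        raise ValueError("need a < b")
    # Only the part of (lo, hi) that lies inside [a, b] carries probability.
    overlap = max(0.0, min(hi, b) - max(lo, a))
    return overlap / (b - a)


p_unit = uniform_prob(0.3, 0.7, 0, 1)   # example on [0, 1] from the slides
p_math = uniform_prob(6, 12, 10, 20)    # math-problem exercise
print(round(p_unit, 4), round(p_math, 4))
```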

The shaded area under a density curve shows the proportion, or %, of individuals in a population with values of X between x1 and x2.

Because the probability of drawing one individual at random depends on the frequency of this type of individual in the population, the probability is also the shaded area under the curve.

[Figure: density curve with shaded area = % of individuals with X such that x1 < X < x2.]

in a recent year had the normal distribution with mean $\mu = 18.6$ and standard deviation $\sigma = 5.9$.

What is the probability that a randomly chosen student scores 21 or

higher?
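
One way to answer this without a printed table is sketched below, using `statistics.NormalDist` from the Python standard library (Python 3.8 or later) in place of Table A. The variable names are illustrative. The z-score is about 0.41 and the probability of scoring 21 or higher comes out at roughly 0.34.

```python
from statistics import NormalDist

# Scores are modelled as normal with mean 18.6 and standard deviation 5.9.
scores = NormalDist(mu=18.6, sigma=5.9)

# Standardize the cutoff, then take the area to the right of it.
z = (21 - scores.mean) / scores.stdev
p_21_or_higher = 1 - scores.cdf(21)

print(round(z, 2), round(p_21_or_higher, 4))
```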

Normal probability

distributions

The probability distribution of many random variables is a normal distribution. It shows what values the random variable can take and is used to assign probabilities to those values.

Example: Probability distribution of women's heights. Here, since we chose a woman randomly, her height, X, is a random variable.

To calculate probabilities, we standardize the random variable (z-score) and use Table A.

Normal distributions

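Normal or Gaussian distributions are a family of symmetrical, bell-shaped density curves defined by a mean $\mu$ (mu) and a standard deviation $\sigma$ (sigma): $N(\mu, \sigma)$.

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^{2}}$$

e = 2.71828..., the base of the natural logarithm

$\pi$ = pi = 3.14159...

[Figure: Here means are the same ($\mu = 15$) while standard deviations are different ($\sigma$ = 2, 4, and 6).]

[Figure: Here means are different ($\mu$ = 10, 15, and 20) while standard deviations are the same ($\sigma = 3$).]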
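
To see how the standard deviation changes the shape of the curve, the density formula can be evaluated directly. The sketch below is only illustrative (the function name `normal_pdf` is an assumption); it computes the height of the curve at the mean for the three curves with mean 15 and standard deviations 2, 4 and 6. A larger standard deviation gives a lower, wider curve.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma) at x, written out from the formula."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


# Height of each curve at its mean (mu = 15) for sigma = 2, 4, 6.
peaks = {sigma: normal_pdf(15, 15, sigma) for sigma in (2, 4, 6)}
for sigma, height in peaks.items():
    print(sigma, round(height, 4))
```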

[Figure: Normal density curve with mean = 64.5 and the inflection point marked.]

Because all Normal distributions share the same properties, we can standardize our data to transform any Normal curve $N(\mu, \sigma)$ into the standard Normal curve $N(0,1)$.

$$N(64.5, 2.5) \;\Rightarrow\; N(0,1)$$

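Standardizing: calculating z-scores

A z-score measures the number of standard deviations that a data value x is from the mean $\mu$.

$$z = \frac{x - \mu}{\sigma}$$

When x is 1 standard deviation larger than the mean, then z = 1:

$$\text{for } x = \mu + \sigma, \quad z = \frac{\mu + \sigma - \mu}{\sigma} = \frac{\sigma}{\sigma} = 1$$

When x is 2 standard deviations larger than the mean, then z = 2:

$$\text{for } x = \mu + 2\sigma, \quad z = \frac{\mu + 2\sigma - \mu}{\sigma} = \frac{2\sigma}{\sigma} = 2$$

When x is smaller than the mean, z is negative.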
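
As a small illustration of standardizing, the sketch below applies the z-score formula to the N(64.5, 2.5) heights distribution used in these slides. The function name `z_score` is illustrative, and the value 62 is chosen here only to show a negative z-score; it does not come from the lecture.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean mu."""
    return (x - mu) / sigma


z_above = z_score(67, 64.5, 2.5)   # one standard deviation above the mean
z_below = z_score(62, 64.5, 2.5)   # hypothetical value below the mean
print(z_above, z_below)
```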

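Women's heights follow the $N(\mu, \sigma) = N(64.5, 2.5)$ distribution. What percent of women are shorter than 67 inches tall?

mean $\mu$ = 64.5"

standard deviation $\sigma$ = 2.5"

x (height) = 67"

[Figure: Normal curve with the area to the left of x = 67 (z = 1) shaded, Area = ???; the mean $\mu$ = 64.5 is at z = 0.]

$$z = \frac{x - \mu}{\sigma} = \frac{67 - 64.5}{2.5} = \frac{2.5}{2.5} = 1$$

so 67" is 1 standard deviation above the mean.

Because of the 68-95-99.7 rule, we can conclude that the percent of women shorter than 67" should be, approximately, 0.68 + half of (1 - 0.68) = 0.84, or 84%.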

What is the probability, if we pick one woman at random, that her height will be

some value X? For instance, between 68 and 70 inches P(68 < X < 70)?

Because the woman is selected at random, X is a random variable.

$$z = \frac{x - \mu}{\sigma}, \qquad N(\mu, \sigma) = N(64.5, 2.5)$$

For x = 68",

$$z = \frac{68 - 64.5}{2.5} = 1.4, \quad \text{area to the left} = 0.9192$$

For x = 70",

$$z = \frac{70 - 64.5}{2.5} = 2.2, \quad \text{area to the left} = 0.9861$$

The area under the curve for the interval [68" to 70"] is 0.9861 - 0.9192 = 0.0669.

Thus, the probability that a randomly chosen woman falls into this range is 6.69%.

P(68 < X < 70) = 6.69%
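
The same interval probability can be reproduced numerically. The sketch below uses `statistics.NormalDist` from the Python standard library as a stand-in for Table A; the variable names are illustrative. It subtracts the two left-tail areas and gives approximately 0.0669.

```python
from statistics import NormalDist

heights = NormalDist(mu=64.5, sigma=2.5)   # women's heights, in inches

area_left_70 = heights.cdf(70)   # area to the left of z = 2.2
area_left_68 = heights.cdf(68)   # area to the left of z = 1.4

# Probability of the interval = larger left area minus smaller left area.
p_between = area_left_70 - area_left_68
print(round(area_left_68, 4), round(area_left_70, 4), round(p_between, 4))
```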

Using Table A

Table A gives the area under the standard Normal curve to the left of any z value.

0.0082 is the area under N(0,1) left of z = -2.40

0.0080 is the area under N(0,1) left of z = -2.41

0.0069 is the area under N(0,1) left of z = -2.46

For z = 1.00, the area under

the standard Normal curve

to the left of z is 0.8413.

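$N(\mu, \sigma) = N(64.5, 2.5)$

[Figure: Normal curve with $\mu$ = 64.5 and x = 67 (z = 1); area to the left ≈ 0.84, area to the right ≈ 0.16.]

Conclusion:

84.13% of women are shorter than 67".

By subtraction, 1 - 0.8413 = 0.1587, so 15.87% of women are taller than 67".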

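Because the Normal distribution is symmetrical, there are 2 ways to calculate the area under the standard Normal curve to the right of a z-value:

area right of z = area left of -z

area right of z = 1 - area left of z

[Figure: for z = -2.33, the area to the left is 0.0099 and the area to the right is 0.9901.]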

To calculate the area between 2 z-values, first get the area under N(0,1) to the left for each z-value from Table A.

Then subtract the smaller area from the larger area.

A common mistake made by students is to subtract both z-values, but the Normal curve is not uniform.

area between z1 and z2 = area left of z1 - area left of z2

The area under N(0,1) for a single value of z is zero. (Try calculating the area to the left of z minus that same area!)

The NCAA requires athletes to score at least 820 on the combined math and verbal SAT exam to compete in their first college year. The SAT scores of 2003 were approximately normal with mean 1026 and standard deviation 209.

What proportion of all students would be NCAA qualifiers (SAT ≥ 820)?

$$x = 820, \quad \mu = 1026, \quad \sigma = 209$$

$$z = \frac{x - \mu}{\sigma} = \frac{820 - 1026}{209} = \frac{-206}{209} \approx -0.99$$

Table A: the area under N(0,1) to the left of z = -0.99 is 0.1611, or approx. 16%.

area right of 820 = total area - area left of 820 = 1 - 0.1611 ≈ 84%

Note: The actual data may contain students who scored exactly 820 on the SAT. However, the proportion of scores exactly equal to 820 is 0 for a normal distribution; this is a consequence of the idealized smoothing of density curves.

The NCAA defines a "partial qualifier", eligible to practice and receive an athletic scholarship but not to compete, as an athlete whose combined SAT score is at least 720.

What proportion of all students who take the SAT would be partial qualifiers? That is, what proportion have scores between 720 and 820?

$$x = 720, \quad \mu = 1026, \quad \sigma = 209$$

$$z = \frac{x - \mu}{\sigma} = \frac{720 - 1026}{209} = \frac{-306}{209} \approx -1.46$$

Table A: the area under N(0,1) to the left of z = -1.46 is 0.0721, or approx. 7%.

area between 720 and 820 = area left of 820 - area left of 720 = 0.1611 - 0.0721 ≈ 9%

About 9% of all students who take the SAT have scores between 720 and 820.
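
Both SAT proportions can be checked in a few lines. The sketch below uses `statistics.NormalDist` from the Python standard library rather than Table A, so the results differ slightly from the rounded table values (z is not rounded to two decimals first). The variable names are illustrative.

```python
from statistics import NormalDist

sat = NormalDist(mu=1026, sigma=209)   # 2003 SAT scores, approximately normal

# Qualifiers: area to the right of 820.
p_qualifier = 1 - sat.cdf(820)

# Partial qualifiers: area between 720 and 820.
p_partial = sat.cdf(820) - sat.cdf(720)

print(round(p_qualifier, 3), round(p_partial, 3))
```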

A useful property of normally distributed data is that we can manipulate it and then find answers to questions that involve comparing seemingly non-comparable distributions.

We do this by standardizing the data. All this involves is changing the scale so that the mean is now 0 and the standard deviation is 1. If you do this to different distributions, it makes them comparable.

$$z = \frac{x - \mu}{\sigma} \quad \Rightarrow \quad N(0,1)$$

Backward normal calculations: We may also want to find the observed range of values that correspond to a given proportion under the curve.

For that, we use Table A backward:

- we first find the desired area/proportion in the body of the table,
- we then read the corresponding z-value from the left column and top row.

For an area to the left of 1.25% (0.0125), the z-value is -2.24.

The miles per gallon ratings of 2001 model compact cars follow approximately the N(25.7, 5.88) distribution. How many miles per gallon must a vehicle get to place in the top 10% of all 2001 model compact cars?

1. z = 1.28 is the standardized value with area 0.9 to its left and 0.1 to its right.

2. Unstandardize:

$$\frac{x - 25.7}{5.88} = 1.28$$

Solving for x gives x = 25.7 + 1.28 × 5.88 ≈ 33.2 miles per gallon.
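
A backward calculation like this one corresponds to the inverse of the cumulative distribution function. As a sketch, `statistics.NormalDist.inv_cdf` from the Python standard library can play the role of reading Table A backward; the variable names are illustrative. The result agrees with the table-based answer of about 33.2 miles per gallon.

```python
from statistics import NormalDist

mpg = NormalDist(mu=25.7, sigma=5.88)   # 2001 model compact cars

# Step 1: z-value with area 0.9 to its left (top 10% lies to its right).
z = NormalDist().inv_cdf(0.90)

# Step 2: unstandardize, x = mu + z * sigma.
x = mpg.mean + z * mpg.stdev

print(round(z, 2), round(x, 1))
```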

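probability tables

[Figure: standard normal density curve (x-axis: Z, y-axis: density) with the area to the right of 1.87 shaded: P(Z > 1.87) = 0.03.]

$$P(X > 11.025) = P\left(\frac{X - 10}{\sqrt{0.3}} > \frac{11.025 - 10}{\sqrt{0.3}}\right) = P(Z > 1.87) = 1 - P(Z \le 1.87) = 1 - 0.9693 = 0.0307$$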

One way to assess if a distribution is indeed approximately normal is to plot the data on a normal quantile plot.

The data points are ranked and the percentile ranks are converted to z-scores with Table A. The z-scores are then used for the x axis, against which the data are plotted on the y axis of the normal quantile plot.

If the distribution is indeed normal, the plot will show a straight line, indicating a good match between the data and a normal distribution.

Systematic deviations from a straight line indicate a non-normal distribution. Outliers appear as points that are far away from the overall pattern of the plot.
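
The construction of a normal quantile plot can be sketched as follows. Two assumptions are made here that are not in the lecture: the percentile rank of the i-th smallest of n values is taken as (i - 0.5)/n, which is one common convention among several, and the sample values are made up for illustration. `statistics.NormalDist.inv_cdf` stands in for Table A.

```python
from statistics import NormalDist


def normal_quantile_points(data):
    """Return (z, value) pairs: x axis = z-score, y axis = ranked data."""
    n = len(data)
    standard = NormalDist()          # N(0, 1)
    ordered = sorted(data)           # rank the data points
    points = []
    for i, value in enumerate(ordered, start=1):
        percentile = (i - 0.5) / n   # assumed plotting position
        points.append((standard.inv_cdf(percentile), value))
    return points


sample = [4.1, 5.0, 5.3, 5.9, 6.2, 6.8, 7.4]   # hypothetical data
points = normal_quantile_points(sample)
for z, value in points:
    print(round(z, 2), value)
```

Plotting these pairs and checking whether they fall close to a straight line is then a visual step, for which any plotting tool can be used.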

[Figure: Normal quantile plot of the earnings of 15 black female hourly workers at National Bank. This distribution is roughly Normal except for one low outlier.]

[Figure: Normal quantile plot of the salaries of Cincinnati Reds players on opening day of the 2000 season. This distribution is skewed to the right.]

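As the number of randomly drawn observations in a sample increases, the mean of the sample $\bar{x}$ gets closer and closer to the population mean $\mu$.

This is the law of large numbers. It is valid for any population.

Note: It is tempting to expect this predictability from just a few random observations, but it is wrong. The law of large numbers only applies to really large numbers.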
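
The law of large numbers is easy to watch in a simulation. The sketch below uses rolls of a fair die as the population, which is an example chosen here and not taken from the lecture; its population mean is 3.5. The running sample mean is recorded at a few sample sizes and tends to settle near 3.5 only as the number of rolls becomes large.

```python
import random

rng = random.Random(0)   # fixed seed so the run is repeatable


def running_means(n_rolls, checkpoints):
    """Roll a fair die n_rolls times; record the sample mean at checkpoints."""
    total = 0
    means = {}
    for i in range(1, n_rolls + 1):
        total += rng.randint(1, 6)
        if i in checkpoints:
            means[i] = total / i
    return means


means = running_means(100_000, {10, 100, 1_000, 10_000, 100_000})
for n, m in sorted(means.items()):
    print(n, round(m, 3))
```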

What is a sampling distribution?

The sampling distribution of a statistic is the distribution of all possible values taken by the statistic when all possible samples of a fixed size n are taken from the population. It is a theoretical idea: we do not actually build it.

The sampling distribution of a statistic is the probability distribution of that statistic.

Sampling distribution of the sample mean

We take many random samples of a given size n from a population with mean $\mu$ and standard deviation $\sigma$.

Some sample means will be above the population mean $\mu$ and some will be below, making up the sampling distribution.

[Figure: sampling distribution of x bar, with a histogram of some sample averages.]

The mean of the sampling distribution of the sample mean $\bar{x}$ is equal to the population mean $\mu$.

The standard deviation of the sampling distribution is $\sigma/\sqrt{n}$, where n is the sample size.

Mean of a sampling distribution of $\bar{x}$: Sample means do not fall systematically above or below $\mu$, even if the distribution of the raw data is skewed. Thus, the mean of the sampling distribution is an unbiased estimate of the population mean $\mu$: it will be "correct on average" in many samples.

Standard deviation of a sampling distribution of $\bar{x}$: It is smaller than the standard deviation of the population by a factor of $\sqrt{n}$. Averages are less variable than individual observations. Also, the results of large samples are less variable than the results of small samples.

For normally distributed populations

When a variable in a population is normally distributed, the sampling distribution of the sample mean $\bar{x}$ for all possible samples of size n is also normally distributed.

If the population is $N(\mu, \sigma)$, then the sample means distribution is $N(\mu, \sigma/\sqrt{n})$.

[Figure: population distribution compared with the narrower sampling distribution.]

Central Limit Theorem: When randomly sampling from any population with mean $\mu$ and standard deviation $\sigma$, when n is large enough, the sampling distribution of $\bar{x}$ is approximately normal: $\bar{x} \sim N(\mu, \sigma/\sqrt{n})$.

[Figure, four panels:
- Population with strongly skewed distribution
- Sampling distribution of $\bar{x}$ for n = 2 observations
- Sampling distribution of $\bar{x}$ for n = 10 observations
- Sampling distribution of $\bar{x}$ for n = 25 observations]

[Figure: histogram of 1000 sample means of 50-sized samples from Bin(5, 0.7); x-axis: sample mean, y-axis: density.]

Take random samples with n = 50 and get their sample means.

The relative frequency distribution is approximately normal (bell shaped), with mean = 3.50164 and sd = 0.1471508.

For comparison, the theoretical standard deviation is

$$\frac{\sigma}{\sqrt{n}} = \frac{1.024695}{\sqrt{50}} = 0.1449138$$
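
An experiment of this kind can be repeated with a short simulation. The sketch below draws 1000 samples of size 50 from Bin(5, 0.7) using only the Python standard library; the seed and helper names are illustrative choices, so the simulated mean and sd will not match the slide's figures digit for digit. The theoretical values are the population mean 5 × 0.7 = 3.5 and the standard deviation sqrt(5 × 0.7 × 0.3)/sqrt(50).

```python
import math
import random
import statistics

rng = random.Random(1)   # fixed seed so the run is repeatable


def binomial_draw(trials, p):
    """One draw from Bin(trials, p): count successes in independent trials."""
    return sum(rng.random() < p for _ in range(trials))


def sample_mean(n):
    """Mean of one random sample of size n from Bin(5, 0.7)."""
    return statistics.fmean(binomial_draw(5, 0.7) for _ in range(n))


means = [sample_mean(50) for _ in range(1000)]

pop_sd = math.sqrt(5 * 0.7 * 0.3)        # population standard deviation
theory_sd = pop_sd / math.sqrt(50)       # sigma / sqrt(n)
sim_mean = statistics.fmean(means)
sim_sd = statistics.stdev(means)

print(round(sim_mean, 3), round(sim_sd, 4), round(theory_sd, 7))
```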

In a large population of adults, the mean IQ is 112 with standard deviation 20. Suppose 200 adults are randomly selected for a market research campaign.

The distribution of the sample mean IQ is:

B) Approximately normal, mean 112, standard deviation 20

C) Approximately normal, mean 112, standard deviation 1.414

D) Approximately normal, mean 112, standard deviation 0.1

Application

Hypokalemia is diagnosed when blood potassium levels are low, below 3.5 mEq/dl. Let's assume that we know a patient whose measured potassium levels vary daily according to a normal distribution $N(\mu = 3.8, \sigma = 0.2)$.

If only one measurement is made, what is the probability that this patient will be misdiagnosed hypokalemic?

$$z = \frac{x - \mu}{\sigma} = \frac{3.5 - 3.8}{0.2} = -1.5$$

If instead measurements are taken on 4 separate days and averaged, what is the probability of such a misdiagnosis?

$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{3.5 - 3.8}{0.2/\sqrt{4}} = -3$$

Note: Make sure to standardize (z) using the standard deviation for the sampling distribution.
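
Both misdiagnosis probabilities can be worked out numerically. The sketch below uses `statistics.NormalDist` from the Python standard library in place of Table A, with n = 4 measurements for the averaged case; the variable names are illustrative. A single measurement falls below 3.5 with probability of about 0.067, while the mean of 4 measurements does so with probability of about 0.001.

```python
import math
from statistics import NormalDist

mu, sigma, cutoff = 3.8, 0.2, 3.5

# One measurement: standardize with the population standard deviation.
p_single = NormalDist(mu, sigma).cdf(cutoff)

# Mean of n = 4 measurements: standardize with sigma / sqrt(n).
n = 4
p_mean_of_4 = NormalDist(mu, sigma / math.sqrt(n)).cdf(cutoff)

print(round(p_single, 4), round(p_mean_of_4, 4))
```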

Income distribution

Let's consider the very large database of individual incomes from the Bureau of Labor Statistics as our population. It is strongly right skewed.

We take 1000 SRSs of 100 incomes, calculate the sample mean for each, and make a histogram of these 1000 means.

We also take 1000 SRSs of 25 incomes, calculate the sample mean for each, and make a histogram of these 1000 means.

Which histogram corresponds to the samples of size 100? Of size 25?

It depends on the population distribution. More observations are

required if the population distribution is far from normal.

distribution from a strong skewness or even mild outliers.

skewness and outliers.

Even for strange population distributions we can assume a normal sampling distribution of the mean and work with it to solve problems.
