
Probability Distribution

The Concept of “Distribution”

• Measurements of any variable, even repeated measurements of the same variable on the same subject, will always vary.
• The pattern of variation of a variable is called its distribution, which can be described both mathematically and graphically.
• In essence, the distribution records all possible numerical values of a variable and how often each value occurs (its frequency).
• The most famous example of a distribution is the bell-shaped normal distribution.
9/14/2015 2
Discrete Probability Distributions
• Binomial distribution: each trial has only 2 possible outcomes. There is a fixed number of trials, the results of the trials are independent, and the random variable counts the number of "successes".
  e.g. flipping a coin and counting the number of heads in 10 trials.

• Poisson distribution: the random variable can take any non-negative integer value (0, 1, 2, ...), with no upper limit.
  Counts often follow a Poisson distribution (e.g. the number of ambulances needed in a city on a given night).
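
As a rough illustration of the two distributions above, the sketch below evaluates their probability formulas using only the Python standard library. The function names are made up for this example, and the average of 3 ambulance calls per night is an assumed number, not a value from the slides.

```python
from math import comb, exp, factorial


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(exactly k successes in n independent trials with success probability p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k: int, rate: float) -> float:
    """P(count == k) when the average count is `rate`."""
    return exp(-rate) * rate**k / factorial(k)


# Coin example from the slide: number of heads in 10 flips of a fair coin.
heads = [binomial_pmf(k, n=10, p=0.5) for k in range(11)]
print(f"P(exactly 5 heads in 10 flips) = {heads[5]:.4f}")

# Ambulance example: the rate of 3 calls per night is an assumption for illustration.
print(f"P(no ambulance needed tonight) = {poisson_pmf(0, rate=3.0):.4f}")
```

The binomial probabilities for k = 0, ..., 10 add up to 1, as any probability distribution must.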

Discrete Random Variable

• A discrete random variable X has a countable number of possible values (a finite list x1, …, xk in the table below). The probability distribution of X lists the values and their probabilities.

Value of X     x1   x2   x3   …   xk
Probability    p1   p2   p3   …   pk

1. Every probability pi is a number between 0 and 1.
2. The sum of the probabilities must be 1.
• Find the probability of any event by adding the probabilities of the particular values that make up the event.
Example
• The instructor in a large class gives 15% each of A’s and D’s, 30% each of B’s and C’s, and 10% F’s. The student’s grade on a 4-point scale is a random variable X (A=4).

Grade          F=0    D=1    C=2    B=3    A=4
Probability    0.10   0.15   0.30   0.30   0.15

• What is the probability that a student selected at random will have a B or better?
• ANSWER: P(grade of 3 or 4) = P(X=3) + P(X=4) = 0.30 + 0.15 = 0.45
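
The grade example can be checked with a few lines of Python. This is only a sketch: the dictionary layout and the helper name `event_probability` are choices made for this illustration.

```python
from math import isclose

# Grade distribution from the slide: value of X -> probability.
grade_probs = {0: 0.10, 1: 0.15, 2: 0.30, 3: 0.30, 4: 0.15}

# Rule 1: every probability is a number between 0 and 1.
assert all(0.0 <= p <= 1.0 for p in grade_probs.values())
# Rule 2: the probabilities sum to 1 (compared with a tolerance because of floating point).
assert isclose(sum(grade_probs.values()), 1.0)


def event_probability(dist, event):
    """Add the probabilities of the values for which `event` is true."""
    return sum(p for x, p in dist.items() if event(x))


p_b_or_better = event_probability(grade_probs, lambda x: x >= 3)
print(f"P(B or better) = {p_b_or_better:.2f}")
```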

Continuous Probability Distributions

• Between two values of a continuous random variable we can always find a third.
• A histogram is used to represent a discrete probability distribution, and a smooth curve called the probability density is used to represent a continuous probability distribution.

The Histogram and the Probability Density

[Figure: histogram. x-axis: Systolic BP (mmHg), 80 to 160; y-axis: Percentage of Men.]

A histogram of BP measurements from a sample of 113 men. We are going to compare this to a histogram for a larger sample, and for the entire population.

The Histogram and the Probability Density

[Figure: histogram. x-axis: Blood Pressure (mmHg), 80 to 140; y-axis: Percentage of Men.]

Histogram of blood pressure measurements, this time for a sample of 5,000 men: notice how the shape of the histogram is more defined than with the previous sample of 113 men.

The Histogram and the Probability Density

[Figure: smooth density curve. x-axis: Systolic BP (mmHg), 80 to 180; y-axis: Percentage.]

The probability density for BP values in the entire population of men: because the population is treated as infinite, there are no bars, and the y-axis cannot show actual counts.

The Histogram and the Probability Density

• The probability density is a smooth, idealized curve that shows the shape of the distribution in the population.
• The area under the curve over an interval represents the percentage of the population in that interval.

Normal Distribution

[Figure: Histogram of Blood Pressure, Sample of 113 Men. x-axis: Blood Pressure, 80 to 160; y-axis: Number of Men, 0 to 20.]

The normal (Gaussian) distribution with the same mean and standard deviation (superimposed).

Normal Distribution
Q: Is every variable normally distributed?
A: Absolutely not.

Q: Then why do we spend so much time studying the normal distribution?
A: Some variables are normally distributed; a bigger reason is the “Central Limit Theorem”: averages of many independent observations tend to be approximately normally distributed, even when the individual observations are not.

Normal Distribution

• There are lots of normal distributions!
• You can tell which normal distribution you have by knowing the mean and standard deviation.
• The mean is the center.
• The standard deviation measures the spread (variability).

[Figure: two normal curves, each with its mean and standard deviation marked.]

Normal Distribution
• The most common continuous distribution is the normal distribution, the bell-shaped curve.
• The normal curve is unimodal and symmetric about its mean (μ).
• In this distribution the mean, median and mode are all identical.
• The standard deviation (σ) specifies the amount of dispersion around the mean.
• The two parameters μ and σ completely define a normal curve.

[Figure, three panels: (1) typical normal density with mean = 5 and variance = 1; (2) two normal densities with different mean values and the same variance; (3) two normal densities with different variances and the same mean.]

Normal Distribution - Notes
• The total area enclosed by the normal distribution curve is 1.0, and the cumulative probabilities are given by:
  $F(x) = P(X \le x)$
• Calculating cumulative probabilities from the normal distribution (area under the curve) is a numerical problem; no closed-form formula exists.
• There are tables and Excel functions to calculate these probabilities.
• The tables are for normally distributed random variables with mean = 0 and variance = 1 (μ = 0 and σ = 1): the STANDARD NORMAL VARIABLE.

The 68-95-99.7 Rule for the Normal Distribution
• 68% of the observations fall within one standard deviation of the mean
• 95% of the observations fall within two standard deviations of the mean
• 99.7% of the observations fall within three standard deviations of the mean
• When applied to ‘real data’, these estimates are considered approximate!
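
The three percentages in the rule can be reproduced numerically. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the helper name `fraction_within` is an assumption of this example.

```python
from statistics import NormalDist


def fraction_within(k: float) -> float:
    """Fraction of a normal population within k standard deviations of the mean."""
    z = NormalDist()  # standard normal: mean 0, standard deviation 1
    return z.cdf(k) - z.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} SD of the mean: {fraction_within(k):.4f}")
```

The exact values are close to, but not exactly, 68%, 95% and 99.7%, which is why the rule is stated with rounded figures.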

Distributions of Blood Pressure

[Figure: normal density of systolic BP with mean = 125 mmHg and s = 14 mmHg. x-axis ticks at 83, 97, 111, 125, 139, 153, 167 (the mean ± 1, 2 and 3 SD); the central 68%, 95% and 99.7% regions are marked.]

The 68-95-99.7 rule applied to the distribution of systolic blood pressure in men.

Standard Normal Variable

• It is customary to call a standard normal random variable Z.
• The outcomes of the random variable Z are denoted by z.
• The table on the coming slide gives the area under the curve (probability) between the mean and z.
• The probabilities in the table are the likelihood that a randomly selected value of Z is greater than 0 (the mean of the standard normal) and less than or equal to a given value z, i.e. $P(0 \le Z \le z)$.
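
A table of this kind can be regenerated from the cumulative distribution function. The following is a minimal sketch, assuming Python 3.8+ for `statistics.NormalDist`; `area_mean_to_z` is a name invented for this illustration.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def area_mean_to_z(z: float) -> float:
    """Area under the standard normal curve between 0 (the mean) and z, for z >= 0."""
    if z < 0:
        raise ValueError("this style of table lists only z >= 0; use symmetry for negative z")
    # P(Z <= z) includes the whole lower half (0.5), so subtract it.
    return STANDARD_NORMAL.cdf(z) - 0.5


for z in (0.5, 1.0, 1.28, 1.58):
    print(f"z = {z:4.2f}   area between 0 and z = {area_mean_to_z(z):.4f}")
```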
Standard Normal Curve

Standard Normal Distribution

[Figure: standard normal curve split at the mean; 50% of the probability (0.5) lies on each side.]

Standard Normal Distribution

[Figure: standard normal distribution with the central 95% area marked; 2.5% of the probability lies in each tail.]

Calculating Probabilities

• Probability calculations are always concerned with finding the probability that the variable assumes any value in an interval between two specific points a and b.
• The probability that a continuous variable assumes a value between a and b is the area under the graph of the density between a and b.

Example 1
• What is the probability of obtaining a z value between -1 and 1?
Want this area: [Figure: shaded area under the standard normal curve between z = -1 and z = 1.]

$P(-1 \le z \le 1) = P(-1 \le z \le 0) + P(0 \le z \le 1)$
$= 2 \times P(0 \le z \le 1)$
$= 2 \times 0.3413 = 0.6826$

Example 2
• What is the probability of obtaining a z value between 1 and 1.58?
Want this area: [Figure: shaded area under the standard normal curve between z = 1 and z = 1.58.]

$P(1 \le z \le 1.58) = P(0 \le z \le 1.58) - P(0 \le z \le 1)$
$= 0.4429 - 0.3413 = 0.1016$

Example 3
• What is the probability of obtaining a z value of -0.5 or larger?
Want this area: [Figure: shaded area under the standard normal curve to the right of z = -0.5.]

$P(-0.5 \le z) = P(-0.5 \le z \le 0) + P(0 \le z)$
$= 0.1915 + 0.5 = 0.6915$

Example 4
• Find a z value such that the probability of obtaining a larger z value is only 0.10.
[Figure: standard normal curve with an upper-tail area of 0.10 shaded. What is this z value?]

$P(z > ?) = 0.10$
• Scanning the table in App. B we find 0.3997 (the area between 0 and 1.28).
• $P(z > 1.28) = 0.5 - P(0 \le z \le 1.28) = 0.5 - 0.3997 \approx 0.10$, so the required value is z = 1.28.

Standard Normal Scores
• How many standard deviations away from the mean are you?
• Standard Score (Z) = (Observation - mean) / Standard deviation
• If the observations are normally distributed, “Z” is normal with mean 0 and standard deviation of 1.

Standard Normal Scores
A standard score of:
• Z = 1: The observation lies one SD above the mean
• Z = 2: The observation lies two SD above the mean
• Z = -1: The observation lies one SD below the mean
• Z = -2: The observation lies two SD below the mean

Standard Normal Scores
• Example: Male blood pressure, mean = 125 mmHg, s = 14 mmHg
  BP = 167 mmHg: $Z = \frac{167 - 125}{14} = 3.0$
  BP = 97 mmHg: $Z = \frac{97 - 125}{14} = -2.0$
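
The standard-score formula is a one-line function. The sketch below applies it to the two blood pressure values from the example; the function and constant names are chosen for this illustration.

```python
def z_score(observation: float, mean: float, sd: float) -> float:
    """Number of standard deviations by which `observation` lies above (+) or below (-) the mean."""
    return (observation - mean) / sd


MEAN_BP = 125.0  # mmHg, from the slide
SD_BP = 14.0     # mmHg, from the slide

for bp in (167.0, 97.0):
    print(f"BP = {bp:.0f} mmHg  ->  Z = {z_score(bp, MEAN_BP, SD_BP):+.1f}")
```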

What is the Usefulness of a Standard Normal Score?

• It tells you how many SDs (s) an observation is from the mean.
• Thus, it is a way of quickly assessing how “unusual” an observation is.
• Example: Suppose the mean BP is 125 mmHg, and standard deviation = 14 mmHg
  - Is 167 mmHg an unusually high measure?
  - If we know Z = 3.0, does that help us?

Fraction of Population

Z      Within Z SDs     More than Z SDs     More than Z SDs above
       of the mean      above the mean      or below the mean
1.0    68.27%           15.87%              31.73%
2.0    95.45%           2.28%               4.55%
2.5    98.76%           0.62%               1.24%
3.0    99.73%           0.13%               0.27%
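
The table above can be regenerated from the standard normal cumulative distribution function. This is a sketch assuming Python 3.8+ (`statistics.NormalDist`); `tail_fractions` is a hypothetical helper name.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal


def tail_fractions(z: float):
    """Return (within z SDs, more than z SDs above, more than z SDs above or below)."""
    above = 1.0 - Z.cdf(z)   # upper tail only
    either = 2.0 * above     # both tails, by symmetry
    within = 1.0 - either    # the central region
    return within, above, either


print("Z     within    above     above or below")
for z in (1.0, 2.0, 2.5, 3.0):
    within, above, either = tail_fractions(z)
    print(f"{z:3.1f}   {within:7.2%}   {above:6.2%}    {either:6.2%}")
```

For Z = 3.0 the upper tail is about 0.13%, which answers the blood pressure question: a reading of 167 mmHg would be exceeded by roughly 1 man in 1,000 if BP were exactly normal.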

Why Do We Like The Normal Distribution So Much?

• There is nothing “special” about standard normal scores
  - These can be computed for observations from any sample/population of continuous data values
  - The score measures how far an observation is from its mean in standard units of statistical distance
• But if the distribution is not normal, we may not be able to use the Z-score approach to convert scores into probabilities with the normal table.

Review of Standard Normal

• Recall: Z ~ N(0,1) means that Z is normally distributed with a mean (μ) = 0 and standard deviation (σ) = 1.
• To calculate the probability that Z will have values between a and b [P(a ≤ z ≤ b)] we calculate the area under the standard normal curve between a and b.
• These probability calculations can be done using the table of the Normal Distribution.

Standardization cont.

• The cumulative probability associated with a normally distributed random variable X is:

$F(x) = P(X \le x)$
$= P\left(\frac{X - \mu}{\sigma} \le \frac{x - \mu}{\sigma}\right)$
$= P\left(Z \le \frac{x - \mu}{\sigma}\right)$
$= P(Z \le z)$, where $z = \frac{x - \mu}{\sigma}$

Example
• Suppose X represents the weight of a 5-year-old male child sampled from a normal distribution (μ = 43 lbs and σ = 5 lbs). Find the probability that the weight of the male child is less than 35.5 lbs.
Want this area: [Figure: shaded area under the normal curve to the left of 35.5 lbs.]

$P(X < 35.5) = P\left(\frac{X - \mu}{\sigma} < \frac{35.5 - 43}{5}\right)$
$= P(Z < -1.5)$
$= 0.5 - P(0 \le Z \le 1.5) = 0.5 - 0.4332 = 0.0668$

ANSWER: 0.067
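
The same answer can be obtained either directly from the N(43, 5) distribution or by standardizing first. The sketch below does both with `statistics.NormalDist` (Python 3.8+); the variable names are illustrative.

```python
from statistics import NormalDist

weight = NormalDist(mu=43.0, sigma=5.0)  # weight of a 5-year-old boy, in lbs

x = 35.5
z = (x - weight.mean) / weight.stdev     # standardize: z = (x - mu) / sigma

p_direct = weight.cdf(x)                 # P(X < 35.5) on the original scale
p_standardized = NormalDist().cdf(z)     # P(Z < z) on the standard normal scale

print(f"z = {z:.2f}")
print(f"P(X < {x}) = {p_direct:.4f} (direct), {p_standardized:.4f} (standardized)")
```

Both routes give the same probability, which is the point of standardization: one table for N(0, 1) serves every normal distribution.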

Example
• Suppose X represents the weight of a 5-year-old male child sampled from a normal distribution (μ = 43 lbs and σ = 5 lbs). Find the probability that the weight of the male child is greater than 34 lbs and less than 52 lbs.

$P(34 < X < 52) = P(X < 52) - P(X < 34)$
$= P\left(\frac{X - \mu}{\sigma} < \frac{52 - 43}{5}\right) - P\left(\frac{X - \mu}{\sigma} < \frac{34 - 43}{5}\right)$
$= P(Z < 1.8) - P(Z < -1.8)$
$= [0.5 + P(0 \le Z \le 1.8)] - [0.5 - P(0 \le Z \le 1.8)]$
$= 0.4641 + 0.4641 = 0.9282$

Want this area: [Figure: shaded area under the normal curve between 34 lbs and 52 lbs.]

Example
• Suppose X represents the weight of a 5-year-old male child sampled from a normal distribution (μ = 43 lbs and σ = 5 lbs). Find the weight such that 20% of the children will weigh more than this weight.

$P(X > ?) = 0.20$
$P\left(\frac{X - \mu}{\sigma} > \frac{? - 43}{5}\right) = 0.20$
$P\left(Z > \frac{? - 43}{5}\right) = 0.20$

Looking in Appendix B we see that the critical value z = 0.84 corresponds to an area under the curve (between 0 and z) of 0.2995, or approximately 0.30, which leaves 0.5 - 0.30 = 0.20 in the upper tail.

$\frac{? - 43}{5} = 0.84$
$? = (0.84 \times 5) + 43$
$? = 47.2$ lbs

[Figure: normal curve with the upper-tail area shaded. This area is equal to 0.20.]

T-Distribution
• Similar to the standard normal in that it is unimodal, bell-shaped and symmetric.
• The tails of the distribution are “thicker” than those of the standard normal.
• The distribution is indexed by “degrees of freedom” (df).
• The degrees of freedom measure the amount of information available in the data set that can be used for estimating the population variance (df = n-1).
• The area under the curve still equals 1.
• Probabilities for the t-distribution with infinite df equal those of the standard normal.

T-Distribution
• The table of the t-distribution gives the probability to the right of a critical value, i.e. the area in the upper tail.
• We are only given the area (or probability) for a few selected critical values for each degree of freedom.

T-Distribution Example
• For a t-curve from a sample of size 15, find the area to the left of 2.145.

• Answer: df = 15-1 = 14
  In the table of the t-distribution, the area to the right of 2.145 is 0.025.
  Therefore the area to the left of 2.145 is:
  1 - 0.025 = 0.975
