Probability Distribution

The Concept of "Distribution"

- Measurements on any variable, even the same variable on the same subject, will always vary.
- The pattern of variation of a variable is called its distribution, which can be described both mathematically and graphically.
- In essence, the distribution records all possible numerical values of a variable and how often each value occurs (its frequency).
- The most famous example of a distribution is the bell-shaped normal distribution.
9/14/2015 2
Discrete Probability Distributions
- Binomial distribution: each trial has only two possible outcomes (e.g. success or failure), there is a fixed number of trials, the results of the trials are independent, and the probability of success is the same on every trial. The random variable is the number of successes.
  e.g. flipping a coin 10 times and counting the number of heads.
- Poisson distribution: the random variable can take any non-negative integer value (0, 1, 2, ...), with no fixed upper limit.
  Counts of events in a fixed interval of time or space often follow a Poisson distribution (e.g. the number of ambulances needed in a city in a given night).
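The two probability mass functions can be sketched in Python with only the standard library. This is a minimal illustration, not a library API: `binomial_pmf` and `poisson_pmf` are hypothetical helper names, and the Poisson rate of 3 ambulances per night is an assumed value, not one from the slides.

```python
import math

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k): k successes in n independent trials with success probability p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k): k events when events occur at an average rate lam per interval."""
    return math.exp(-lam) * lam**k / math.factorial(k)

# Number of heads in 10 flips of a fair coin.
p_five_heads = binomial_pmf(5, n=10, p=0.5)

# Assumed (hypothetical) rate: on average 3 ambulances needed per night.
lam = 3.0
p_no_ambulance = poisson_pmf(0, lam)

print(f"P(5 heads in 10 flips) = {p_five_heads:.4f}")    # 0.2461
print(f"P(0 ambulances)        = {p_no_ambulance:.4f}")  # 0.0498
```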
Discrete Random Variable
- A discrete random variable X has a finite (or countably infinite) number of possible values. The probability distribution of X lists the values and their probabilities:

  Value of X     x1   x2   x3   …   xk
  Probability    p1   p2   p3   …   pk

  1. Every probability pi is a number between 0 and 1.
  2. The sum of the probabilities must be 1.
- Find the probability of any event by adding the probabilities of the particular values that make up the event.
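The two rules and the adding-up step can be sketched as follows, assuming the distribution is stored as a Python dict from value to probability. `check_pmf` and `prob_of_event` are hypothetical names, and the fair die is an illustrative example not taken from the slides. Exact `Fraction` values are used so the sum-to-1 check is not upset by floating-point rounding.

```python
from fractions import Fraction

def check_pmf(pmf: dict) -> None:
    """Raise ValueError unless every probability is in [0, 1] and they sum to 1."""
    if any(not 0 <= p <= 1 for p in pmf.values()):
        raise ValueError("each probability must lie between 0 and 1")
    if sum(pmf.values()) != 1:
        raise ValueError("the probabilities must sum to 1")

def prob_of_event(pmf: dict, event: set) -> Fraction:
    """P(event) = sum of the probabilities of the values that make up the event."""
    check_pmf(pmf)
    return sum((p for x, p in pmf.items() if x in event), Fraction(0))

# Illustrative example: a fair six-sided die.
die = {x: Fraction(1, 6) for x in range(1, 7)}
p_even = prob_of_event(die, {2, 4, 6})
print(p_even)  # 1/2
```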
Example
- The instructor in a large class gives 15% each of A's and D's, 30% each of B's and C's, and 10% F's. The grade of a randomly selected student on a 4-point scale is a random variable X (A = 4).

  Grade         F=0    D=1    C=2    B=3    A=4
  Probability   0.10   0.15   0.30   0.30   0.15

- What is the probability that a student selected at random will have a B or better?
- ANSWER: P(grade of 3 or 4) = P(X = 3) + P(X = 4) = 0.30 + 0.15 = 0.45
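The grade example can be checked in a few lines of Python. The dict layout and variable names are illustrative choices; the probabilities are the ones in the grade table, stored as exact fractions.

```python
from fractions import Fraction

# Grade distribution (F=0, D=1, C=2, B=3, A=4) as exact fractions.
grades = {
    0: Fraction(10, 100),
    1: Fraction(15, 100),
    2: Fraction(30, 100),
    3: Fraction(30, 100),
    4: Fraction(15, 100),
}

assert sum(grades.values()) == 1  # rule 2: the probabilities sum to 1

# "B or better" is the event {X = 3 or X = 4}.
p_b_or_better = sum(p for x, p in grades.items() if x >= 3)
print(float(p_b_or_better))  # 0.45
```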
Continuous Probability Distributions
- Between any two values of a continuous random variable we can always find a third.
- A histogram is used to represent a discrete probability distribution, while a smooth curve called the probability density is used to represent a continuous probability distribution.
The Histogram and the Probability Density
[Figure: histogram of systolic BP (mmHg); y-axis: percentage of men]
Histogram of BP measurements from a sample of 113 men.
We are going to compare this to a histogram for a larger sample, and for the entire population.
Density
[Figure: histogram of blood pressure (mmHg); y-axis: percentage of men]
Histogram of blood pressure measurements, this time for a sample of 5,000 men: notice how the shape of the histogram is more defined than with the previous sample of 113 men.
Density
[Figure: probability density of systolic BP (mmHg) in the population; y-axis: percentage]
The probability density for BP values in the entire population of men: because the population is infinite, there are no bars, and the y-axis cannot show actual counts.
Density
- The probability density is a smooth, idealized curve that shows the shape of the distribution in the population.
- The area under the curve over an interval represents the proportion of the population with values in that interval; the total area under the curve is 1.
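To make "area under the curve" concrete, the sketch below approximates the area under a density with the trapezoidal rule and compares it with the exact cumulative probabilities from Python's `statistics.NormalDist`. The normal density with mean 125 and SD 14 mmHg is an assumption for illustration, and `area_under` is a hypothetical helper.

```python
from statistics import NormalDist

def area_under(pdf, a: float, b: float, steps: int = 10_000) -> float:
    """Approximate the area under pdf between a and b with the trapezoidal rule."""
    h = (b - a) / steps
    inner = sum(pdf(a + i * h) for i in range(1, steps))
    return h * (0.5 * pdf(a) + inner + 0.5 * pdf(b))

# Assumed population density: normal with mean 125 and SD 14 (mmHg).
bp = NormalDist(mu=125, sigma=14)

share_120_to_140 = area_under(bp.pdf, 120, 140)
total = area_under(bp.pdf, 125 - 10 * 14, 125 + 10 * 14)  # +/- 10 SD covers ~all area

print(f"proportion between 120 and 140 mmHg: {share_120_to_140:.4f}")
print(f"exact value from the CDF:            {bp.cdf(140) - bp.cdf(120):.4f}")
print(f"total area:                          {total:.6f}")
```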
Normal Distribution
[Figure: histogram of blood pressure for a sample of 113 men; x-axis: blood pressure; y-axis: number of men]
The normal (Gaussian) distribution with the same mean and standard deviation is superimposed.
Normal Distribution
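Q: Is every variable normally distributed?
A: Absolutely not.
Q: Then why do we spend so much time studying the normal distribution?
A: Some variables are approximately normally distributed; a bigger reason is the “Central Limit Theorem”: the distribution of a sample mean tends toward a normal distribution as the sample size grows, whatever the shape of the population distribution (provided its variance is finite).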
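A small simulation can illustrate the Central Limit Theorem. This is a sketch under stated assumptions: the skewed exponential population (mean 1, SD 1), the sample size n = 30 and the 10,000 repetitions are all hypothetical choices. If the theorem applies, the sample means should be centered near 1, have an SD near 1/√30 ≈ 0.183, and about 68% of them should fall within one such SD of 1, as for a normal distribution.

```python
import random
import statistics

random.seed(1)   # fixed seed so the run is repeatable

n = 30           # size of each sample (assumed)
reps = 10_000    # number of simulated samples (assumed)

# Strongly right-skewed population: exponential with mean 1 and SD 1.
sample_means = [
    statistics.fmean(random.expovariate(1.0) for _ in range(n))
    for _ in range(reps)
]

se = 1 / n**0.5  # CLT prediction for the SD of the sample means
within_1se = sum(abs(m - 1) <= se for m in sample_means) / reps

print(f"mean of sample means: {statistics.fmean(sample_means):.3f} (theory 1.000)")
print(f"SD of sample means:   {statistics.stdev(sample_means):.3f} (theory {se:.3f})")
print(f"share within 1 SD:    {within_1se:.3f} (normal theory 0.683)")
```

The exact printed values depend on the random seed; the point is only their approximate agreement with the normal predictions.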
Normal Distribution
- There are lots of normal distributions!
- You can tell which normal distribution you have by knowing the mean and standard deviation:
  - The mean is the center.
  - The standard deviation measures the spread (variability).
[Figure: two normal curves, each with its mean and standard deviation marked]
Normal Distribution
- The most common continuous distribution is the normal distribution: the bell-shaped curve.
- The normal curve is unimodal and symmetric about its mean (μ).
- In this distribution the mean, median and mode are all identical.
- The standard deviation (σ) specifies the amount of dispersion around the mean.
- The two parameters μ and σ completely define a normal curve.
[Figure panels: (1) typical normal density with mean = 5 and variance = 1; (2) two normal densities with different means and the same variance; (3) two normal densities with different variances and the same mean]
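The claim that μ and σ completely define a normal curve is easiest to see from the density formula, $f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$, which involves no other parameters. The sketch below codes that formula directly (`normal_pdf` is a hypothetical name), checks symmetry about the mean, and compares it with `statistics.NormalDist.pdf`; μ = 5 and σ = 1 match the first figure panel.

```python
import math
from statistics import NormalDist

def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Normal density: exp(-(x - mu)^2 / (2 sigma^2)) / (sigma * sqrt(2 pi))."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))

mu, sigma = 5.0, 1.0  # mean 5, variance 1

# Symmetry about the mean: equal heights at mu - d and mu + d.
assert math.isclose(normal_pdf(mu - 1.3, mu, sigma), normal_pdf(mu + 1.3, mu, sigma))

# The peak (mode) sits at the mean, with height 1 / (sigma * sqrt(2 pi)).
peak = normal_pdf(mu, mu, sigma)
print(f"peak height: {peak:.4f}")  # 0.3989

# Agrees with the standard library's implementation.
assert math.isclose(normal_pdf(6.2, mu, sigma), NormalDist(mu, sigma).pdf(6.2))
```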
Normal Distribution - Notes
- The total area under the normal distribution curve is 1.0, and the cumulative probabilities are given by:
  $$F(x) = P(X \le x)$$
- Calculating cumulative probabilities from the normal distribution (the area under the curve) is a numerical problem: no simple closed-form formula exists.
- There are tables and Excel functions to calculate these probabilities.
- The tables are for normally distributed random variables with mean 0 and variance 1 (μ = 0 and σ = 1), called the STANDARD NORMAL VARIABLE.
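Although there is no elementary formula for the normal cumulative probability, it can be written in terms of the error function, $\Phi(z) = \tfrac12\left[1 + \operatorname{erf}(z/\sqrt{2})\right]$, which Python exposes as `math.erf`. A minimal sketch, with hypothetical function names:

```python
import math

def std_normal_cdf(z: float) -> float:
    """Phi(z) = P(Z <= z) for Z ~ N(0, 1), written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """F(x) = P(X <= x) for X ~ N(mu, sigma^2), by standardizing x first."""
    return std_normal_cdf((x - mu) / sigma)

for z in (0.0, 1.0, 1.96):
    print(f"Phi({z:4.2f}) = {std_normal_cdf(z):.4f}")
```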
The 68-95-99.7 Rule for the Normal Distribution
- 68% of the observations fall within one standard deviation of the mean.
- 95% of the observations fall within two standard deviations of the mean.
- 99.7% of the observations fall within three standard deviations of the mean.
- When applied to real data, these percentages are only approximate!
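The rounded percentages in the rule can be compared with the exact normal values using Python's `statistics.NormalDist`; a minimal sketch:

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, SD 1

# Exact share of a normal population within k SDs of the mean: P(-k <= Z <= k).
within = {k: Z.cdf(k) - Z.cdf(-k) for k in (1, 2, 3)}

for k, share in within.items():
    print(f"within {k} SD: {share:.2%}")
```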
Distributions of Blood Pressure
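[Figure: normal curve of systolic blood pressure with mean = 125 mmHg and s = 14 mmHg; x-axis marked at 83, 97, 111, 125, 139, 153 and 167 mmHg, with the central 68%, 95% and 99.7% regions shown]
The 68-95-99.7 rule applied to the distribution of systolic blood pressure in men.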
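The cut-off points on the figure's x-axis are just mean ± k × SD for k = 1, 2, 3. A short sketch of that arithmetic, using the mean of 125 mmHg and SD of 14 mmHg from the figure (the variable names are illustrative):

```python
mean, sd = 125, 14  # systolic BP in men, mmHg

# Interval mean +/- k * SD for k = 1, 2, 3.
intervals = {k: (mean - k * sd, mean + k * sd) for k in (1, 2, 3)}
shares = {1: "68%", 2: "95%", 3: "99.7%"}

for k, (low, high) in intervals.items():
    print(f"about {shares[k]:>5} of men between {low} and {high} mmHg")
```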
Standard Normal Variable
- It is customary to call a standard normal random variable Z.
- The outcomes of the random variable Z are denoted by z.
- The standard normal table (Appendix B) gives the area under the curve (probability) between the mean and z.
- That is, the probabilities in the table are the likelihood that a randomly selected value of Z is greater than 0 (the mean of the standard normal) and less than or equal to a given value z: $P(0 \le Z \le z)$.
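Entries of a table of $P(0 \le Z \le z)$ can be reproduced as Φ(z) − 0.5 with Python's `statistics.NormalDist`. This is a sketch for checking table look-ups, not a substitute for the printed table; `table_area` is a hypothetical name, and the z values shown are ones used in the examples of these notes.

```python
from statistics import NormalDist

def table_area(z: float) -> float:
    """Area under the standard normal curve between 0 and z (for z >= 0)."""
    return NormalDist().cdf(z) - 0.5

# A few rows as they would appear in a table of P(0 <= Z <= z).
rows = {z: round(table_area(z), 4) for z in (0.50, 1.00, 1.28, 1.58, 1.80)}

for z, area in rows.items():
    print(f"z = {z:.2f}  area = {area:.4f}")
```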
Standard Normal Curve
[Figure: standard normal distribution; 50% of the probability (probability = 0.5) lies on each side of the mean]
[Figure: standard normal distribution with the central 95% area marked; 2.5% of the probability lies in each tail]
Calculating Probabilities
- For a continuous variable, probability calculations are always concerned with finding the probability that the variable takes a value in an interval between two specific points a and b.
- The probability that a continuous variable takes a value between a and b is the area under the graph of the density between a and b. (As a consequence, the probability of any single exact value is 0.)
Example 1
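- What is the probability of obtaining a z value between -1 and 1?
[Figure: the required area, under the standard normal curve between -1 and 1]
$$
\begin{aligned}
P(-1 \le z \le 1) &= P(-1 \le z \le 0) + P(0 \le z \le 1) \\
&= 2 \times P(0 \le z \le 1) \\
&= 2 \times 0.3413 = 0.6826
\end{aligned}
$$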
Example 2
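- What is the probability of obtaining a z value between 1 and 1.58?
[Figure: the required area, under the standard normal curve between 1 and 1.58]
$$
\begin{aligned}
P(1 \le z \le 1.58) &= P(0 \le z \le 1.58) - P(0 \le z \le 1) \\
&= 0.4429 - 0.3413 = 0.1016
\end{aligned}
$$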
Example 3
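- What is the probability of obtaining a z value of -0.5 or larger?
[Figure: the required area, under the standard normal curve to the right of -0.5]
$$
\begin{aligned}
P(-0.5 \le z) &= P(-0.5 \le z \le 0) + P(0 \le z) \\
&= P(0 \le z \le 0.5) + P(0 \le z) \\
&= 0.1915 + 0.5 = 0.6915
\end{aligned}
$$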
Example 4
- Find a z value such that the probability of obtaining a larger z value is only 0.10.
[Figure: upper-tail area of 0.10 under the standard normal curve, to the right of the unknown z value]
$$P(z > ?) = 0.10$$
- Scanning the table in Appendix B, we find 0.3997 (the area between 0 and 1.28), so
$$P(z > 1.28) = 0.5 - P(0 \le z \le 1.28) = 0.5 - 0.3997 = 0.1003 \approx 0.10$$
- The required z value is therefore 1.28.
Standard Normal Scores
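- How many standard deviations away from the mean are you?
- Standard score:
  $$Z = \frac{\text{observation} - \text{mean}}{\text{standard deviation}}$$
- Z has mean 0 and standard deviation 1; if the observations are normally distributed, Z follows the standard normal distribution.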
A standard score of:
- Z = 1: the observation lies 1 SD above the mean
- Z = 2: the observation lies 2 SD above the mean
- Z = -1: the observation lies 1 SD below the mean
- Z = -2: the observation lies 2 SD below the mean
- Example: male blood pressure, mean = 125, s = 14 mmHg
  - BP = 167 mmHg: $Z = \frac{167 - 125}{14} = 3.0$
  - BP = 97 mmHg: $Z = \frac{97 - 125}{14} = -2.0$
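The standard-score formula is a one-liner; the sketch below applies it to the two blood-pressure readings in the example. `z_score` is a hypothetical helper name.

```python
def z_score(x: float, mean: float, sd: float) -> float:
    """Number of SDs that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

mean, sd = 125, 14  # male systolic BP, mmHg
for bp in (167, 97):
    print(f"BP = {bp} mmHg -> Z = {z_score(bp, mean, sd):+.1f}")
```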
What is the Usefulness of a Standard Normal Score?
- It tells you how many SDs (s) an observation is from the mean.
- Thus, it is a way of quickly assessing how "unusual" an observation is.
- Example: suppose the mean BP is 125 mmHg and the standard deviation is 14 mmHg.
  - Is 167 mmHg an unusually high measure?
  - If we know Z = 3.0, does that help us?
Fraction of Population

Z      Within Z SDs    More than Z SDs    More than Z SDs above
       of the mean     above the mean     or below the mean
1.0    68.27%          15.87%             31.73%
2.0    95.45%           2.28%              4.55%
2.5    98.76%           0.62%              1.24%
3.0    99.73%           0.13%              0.27%
Why Do We Like The Normal Distribution So Much?
- There is nothing "special" about standard normal scores.
  - These can be computed for observations from any sample or population of continuous data values.
  - The score measures how far an observation is from its mean in standard units of statistical distance.
- But if the distribution is not normal, we may not be able to use the Z-score approach: the normal-table probabilities may not apply.
Review of Standard Normal
- Recall: Z ~ N(0, 1) means that Z is normally distributed with mean (μ) = 0 and standard deviation (σ) = 1.
- To calculate the probability that Z takes values between a and b, $P(a \le Z \le b)$, we calculate the area under the standard normal curve between a and b.
- These probability calculations can be done using the table of the normal distribution.
Standardization cont.
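- The cumulative probability associated with a normally distributed random variable X is:
$$
\begin{aligned}
F(x) = P(X \le x) &= P\left(\frac{X - \mu}{\sigma} \le \frac{x - \mu}{\sigma}\right) \\
&= P\left(Z \le \frac{x - \mu}{\sigma}\right) \\
&= P(Z \le z), \quad \text{where } z = \frac{x - \mu}{\sigma}
\end{aligned}
$$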
Example
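- Suppose X represents the weight of a 5-year-old male child sampled from a normal distribution (μ = 43 lbs and σ = 5 lbs). Find the probability that the weight of the male child is less than 35.5 lbs.
[Figure: the required area, under the normal curve to the left of 35.5 lbs]
$$
\begin{aligned}
P(X < 35.5) &= P\left(\frac{X - \mu}{\sigma} < \frac{35.5 - 43}{5}\right) \\
&= P(Z < -1.5) \\
&= 0.5 - P(0 < Z < 1.5) \\
&= 0.5 - 0.4332 = 0.0668
\end{aligned}
$$
ANSWER: 0.067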
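The same calculation in Python, standardizing first and then reading the standard normal cumulative probability; it also confirms that this matches `NormalDist(43, 5).cdf(35.5)` computed directly. A minimal sketch:

```python
from statistics import NormalDist

mu, sigma = 43, 5              # weight of 5-year-old boys, lbs
z = (35.5 - mu) / sigma        # standardize: -1.5
p = NormalDist().cdf(z)        # P(Z < -1.5) = P(X < 35.5)

assert abs(p - NormalDist(mu, sigma).cdf(35.5)) < 1e-12
print(f"z = {z}, P(X < 35.5) = {p:.3f}")
```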
Example
- Suppose X represents the weight of a 5-year-old male child sampled from a normal distribution (μ = 43 lbs and σ = 5 lbs). Find the probability that the weight of the male child is greater than 34 lbs and less than 52 lbs.
[Figure: the required area, under the normal curve between 34 and 52 lbs]
$$
\begin{aligned}
P(34 < X < 52) &= P(X < 52) - P(X < 34) \\
&= P\left(\frac{X - \mu}{\sigma} < \frac{52 - 43}{5}\right) - P\left(\frac{X - \mu}{\sigma} < \frac{34 - 43}{5}\right) \\
&= P(Z < 1.8) - P(Z < -1.8) \\
&= \bigl(0.5 + P(0 < Z < 1.8)\bigr) - \bigl(0.5 - P(0 < Z < 1.8)\bigr) \\
&= 0.4641 + 0.4641 = 0.9282
\end{aligned}
$$
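A check of the between-two-values calculation with Python's `statistics.NormalDist`; the small difference from 0.9282 comes from rounding the table entry 0.4641.

```python
from statistics import NormalDist

mu, sigma = 43, 5                                      # weight of 5-year-old boys, lbs
weight = NormalDist(mu, sigma)

z_low, z_high = (34 - mu) / sigma, (52 - mu) / sigma   # -1.8 and 1.8
p = weight.cdf(52) - weight.cdf(34)                    # P(34 < X < 52)
print(f"z from {z_low} to {z_high}: P = {p:.4f}")
```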
Example
- Suppose X represents the weight of a 5-year-old male child sampled from a normal distribution (μ = 43 lbs and σ = 5 lbs). Find the weight such that 20% of the children will weigh more than this weight.
[Figure: upper-tail area of 0.20 under the normal curve, to the right of the unknown weight]
$$
P(X > ?) = P\left(\frac{X - \mu}{\sigma} > \frac{? - 43}{5}\right) = P\left(Z > \frac{? - 43}{5}\right) = 0.20
$$
Looking in Appendix B, we see that the critical value z = 0.84 corresponds to an area between 0 and z of 0.2995, or approximately 0.30, which leaves about 0.20 in the upper tail. Therefore
$$
\frac{? - 43}{5} = 0.84 \quad\Rightarrow\quad ? = (0.84 \times 5) + 43 = 47.2 \text{ lbs}
$$
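This example inverts the standardization: find z with 80% of the area below it, then convert back with weight = μ + zσ. `NormalDist.inv_cdf` returns z ≈ 0.8416 rather than the rounded table value 0.84, but the weight still rounds to 47.2 lbs. A sketch:

```python
from statistics import NormalDist

mu, sigma = 43, 5                 # weight of 5-year-old boys, lbs
z = NormalDist().inv_cdf(0.80)    # 20% above <=> 80% below
cutoff = mu + z * sigma           # undo the standardization
print(f"z = {z:.4f}, weight = {cutoff:.1f} lbs")
```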
T-Distribution
- Similar to the standard normal in that it is unimodal, bell-shaped and symmetric.
- The tails of the distribution are "thicker" than those of the standard normal.
- The distribution is indexed by "degrees of freedom" (df).
- The degrees of freedom measure the amount of information available in the data set that can be used for estimating the population variance (for a single sample of size n, df = n - 1).
- The area under the curve still equals 1.
- As df increases, the t-distribution approaches the standard normal; with infinite df, its probabilities equal those of the standard normal.
T-Distribution
- The table of the t-distribution gives the probability to the right of a critical value, i.e. the area in the upper tail.
- We are only given the area (or probability) for a few selected critical values for each degree of freedom.
T-Distribution Example
- For a t-curve from a sample of size 15, find the area to the left of 2.145.
- Answer: df = 15 - 1 = 14.
  In the table of the t-distribution, the area to the right of 2.145 is 0.025.
  Therefore the area to the left of 2.145 is: 1 - 0.025 = 0.975
