Académique Documents
Professionnel Documents
Culture Documents
Probability Distributions
Probability: With random sampling or a
randomized experiment, the probability an
observation takes a particular value is the
proportion of times that outcome would occur in
a long sequence of observations.
Usually corresponds to a population proportion
(and thus falls between 0 and 1) for some real or
conceptual population.
Basic probability rules
Let A, B denotes possible outcomes
P(not A) = 1 P(A)
For distinct (separate) possible outcomes A and B,
P(A or B) = P(A) + P(B)
If A and B not distinct,
P(A and B) = P(A)P(B given A)
For independent outcomes, P(B given A) = P(B),
so P(A and B) = P(A)P(B).
Happiness (2008 GSS data)
Income Very Pretty Not too Total
-------------------------------
Above Aver. 164 233 26 423
Average 293 473 117 883
Below Aver. 132 383 172 687
------------------------------
Total 589 1089 315 1993
Let A = above average income, B = very happy
P(A) estimated by 423/1993 = 0.212 (a marginal probability),
P(not A) = 1 P(A) = 0.788
P(B) estimated by 589/1993 = 0.296
P(B given A) estimated by 164/423 = 0.388 (a conditional prob.)
P(A and B) = P(A)P(B given A) estimated by 0.212(0.388) = 0.082
(which equals 164/1993, a joint probability)
P(B) = 0.296, P(B given A) = 0.388, so A and B not independent.
If A and B were independent, then
P(A and B) = P(A)P(B) = 0.212(0.296) = 0.063
But, P(A and B) = 0.082.
Inference questions: In this sample, A and B are not
independent, but can we infer anything about whether
events involving happiness and events involving family
income are independent (or dependent) in the
population?
If dependent, what is the nature of the dependence?
(does higher income tend to go with higher happiness?)
Success of New England Patriots
Before season, suppose we let
A1 = win game 1
A2 = win game 2, , A16 = win game 16
If Las Vegas odds makers predict P(A1) = 0.55, P(A2)
= 0.45, then P(A1 and A2) = P(win both games) =
(0.55)(0.45) = 0.2475, if outcomes independent
If P(A1) = P(A2) = = P(A16) = 0.50 and indep.
(like flipping a coin), then
P(A1 and A2 and A3 and A16) = P(win all 16 games)
= (0.50)(0.50)(0.50) = (0.50)
16
= 0.000015.
Probability distribution
of a variable
Lists the possible outcomes for the random variable
and their probabilities
Discrete variable:
Assign probabilities P(y) to individual values y, with
0 ( ) 1, ( ) 1 P y P y s s E =
Example: Randomly sample 3 people and ask
whether they favor (F) or oppose (O)
legalization of same-sex marriage
y = number who favor (0, 1, 2, or 3)
For possible samples of size n = 3,
Sample y Sample y
(O, O, O) 0 (O, F, F) 2
(O, O, F) 1 (F, O, F) 2
(O, F, O) 1 (F, F, O) 2
(F, O, O) 1 (F, F, F) 3
If population equally split between F and O, these eight
samples are equally likely.
Probability distribution of y is then
y P(y)
0 1/8
1 3/8
2 3/8
3 1/8
(special case of binomial distribution, introduced in
Chap. 6). In practice, probability distributions are
often estimated from sample data, and then have the
form of frequency distributions
Probability distribution for y = happiness
(estimated from GSS)
Outcome y Probability
Very happy 0.30
Pretty happy 0.54
Not too happy 0.16
Example: GSS results on y = number of people
you knew personally who committed suicide in
past 12 months (variable suiknew).
Estimated probability distribution is
y P(y)
0 .895
1 .084
2 .015
3 .006
Like frequency distributions, probability
distributions have descriptive measures,
such as mean and standard deviation
Mean (expected value) -
( ) ( ) E Y yP y = =