Distributions
Random Variables - Random outcomes corresponding
to subjects randomly selected from a population.
Probability Distributions - A listing of the possible
outcomes and their probabilities (discrete r.v.s) or their
densities (continuous r.v.s)
Normal Distribution - Bell-shaped continuous
distribution widely used in statistical inference
Sampling Distributions - Distributions corresponding
to sample statistics (such as mean and proportion)
computed from random samples
Discrete Probability Distributions
Discrete RV - Random variable that can
take on a finite (or countably infinite) set of
discontinuous possible outcomes (Y)
Discrete Probability Distribution - Listing
of outcomes and their corresponding
probabilities (y , P(y))
$0 \le P(y) \le 1$ for all $y$
$\sum_{\text{all } y} P(y) = 1$
Example - Supreme Court Vacancies
Supreme Court Vacancies by Year, 1837-1975
Y = # vacancies in a randomly selected year
# Vacancies (y)    Frequency (# of Years)    Proportion (P(y))
0                   81                       81/139 = .5827
1                   43                       43/139 = .3094
2                   14                       14/139 = .1007
3                    1                        1/139 = .0072
>3                   0                        0/139 = .0000
Total              139                       1.0000
Source: R.J. Morrison (1977), FDR and the Supreme Court: An Example of the Use of Probability Theory in Political History,
History and Theory, Vol. 16, pp 137-146
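The step from frequencies to a probability distribution can be sketched in a few lines of Python. This is only an illustration: the names `to_distribution` and `is_valid_distribution` are hypothetical helpers, and the counts are copied from the vacancy table.

```python
from fractions import Fraction

# Frequencies from the vacancy table: number of years with y vacancies.
frequencies = {0: 81, 1: 43, 2: 14, 3: 1}

def to_distribution(freqs):
    """Convert outcome frequencies into exact probabilities P(y)."""
    total = sum(freqs.values())
    return {y: Fraction(count, total) for y, count in freqs.items()}

def is_valid_distribution(dist):
    """Check 0 <= P(y) <= 1 for every y and that the probabilities sum to 1."""
    return all(0 <= p <= 1 for p in dist.values()) and sum(dist.values()) == 1

vacancy_dist = to_distribution(frequencies)
for y, p in vacancy_dist.items():
    print(f"P({y}) = {float(p):.4f}")
print("valid distribution:", is_valid_distribution(vacancy_dist))
```

Exact fractions are used so that the sum-to-one check is not affected by floating-point rounding.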
Parameters of a P.D.
Mean (aka Expected Value) - Long run
average outcome
$\mu = E(Y) = \sum_{\text{all } y} y\,P(y)$
Standard Deviation - Measure of the typical
distance of an outcome from the mean
$\sigma = \sqrt{E\left[(Y-\mu)^2\right]} = \sqrt{\sum_{\text{all } y} (y-\mu)^2 P(y)} = \sqrt{\sum_{\text{all } y} y^2 P(y) - \mu^2}$
Example - Supreme Court Vacancies
y        P(y)      yP(y)     y^2 P(y)
0        .5827     .0000     .0000
1        .3094     .3094     .3094
2        .1007     .2014     .4028
3        .0072     .0216     .0648
Total    1.0000    .5324     .7770
$\mu = \sum y\,P(y) = .5324$
$\sigma = \sqrt{\sum y^2 P(y) - \mu^2} = \sqrt{.7770 - (.5324)^2} = \sqrt{.4936} = .7025$
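The same arithmetic can be checked with a short Python sketch. The function names `mean` and `std_dev` are illustrative, and the probabilities are the rounded values from the table, so the results match the hand calculation to four decimals.

```python
import math

# Rounded probabilities P(y) from the vacancy table.
dist = {0: 0.5827, 1: 0.3094, 2: 0.1007, 3: 0.0072}

def mean(dist):
    """Expected value: sum of y * P(y) over all outcomes."""
    return sum(y * p for y, p in dist.items())

def std_dev(dist):
    """Standard deviation via the shortcut sqrt(sum(y^2 * P(y)) - mu^2)."""
    mu = mean(dist)
    return math.sqrt(sum(y ** 2 * p for y, p in dist.items()) - mu ** 2)

mu = mean(dist)
sigma = std_dev(dist)
print(f"mu = {mu:.4f}, sigma = {sigma:.4f}")
```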
Normal Distribution
Bell-shaped, symmetric family of distributions
Classified by 2 parameters: mean ($\mu$) and standard
deviation ($\sigma$). These represent location and spread
Random variables that are approximately normal have
the following properties wrt individual measurements:
Approximately half (50%) fall above (and below) mean
Approximately 68% fall within 1 standard deviation of mean
Approximately 95% fall within 2 standard deviations of mean
Virtually all fall within 3 standard deviations of mean
Notation when Y is normally distributed with mean $\mu$ and
standard deviation $\sigma$:
$Y \sim N(\mu, \sigma)$
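The 68% / 95% / "virtually all" properties can be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library; `prob_within` is a hypothetical helper name.

```python
from statistics import NormalDist

def prob_within(k, mu=0.0, sigma=1.0):
    """P(mu - k*sigma <= Y <= mu + k*sigma) for Y ~ N(mu, sigma)."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)

for k in (1, 2, 3):
    print(f"within {k} SD of the mean: {prob_within(k):.4f}")
```

The result does not depend on the particular mean and standard deviation passed in, which is why these properties hold for every member of the normal family. Values read from a printed Z-table may differ in the fourth decimal because of rounding.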
Normal Distribution
[Figure: two plots, INCHESM (cases weighted by PCTM) and INCHESF (cases weighted by PCTF)]
$Y \sim N(\mu, \sigma) \;\Rightarrow\; Z = \dfrac{Y - \mu}{\sigma} \sim N(0, 1)$
Probabilities of certain ranges of values and specific
percentiles of interest can be obtained through the
standard normal (Z) distribution
Standard Normal (Z) Distribution
Standard Normal Distribution Characteristics:
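$P(Z \ge 0) = P(Y \ge \mu) = 0.5000$
$P(-1 \le Z \le 1) = P(\mu - \sigma \le Y \le \mu + \sigma) = 0.6826$
$P(-2 \le Z \le 2) = P(\mu - 2\sigma \le Y \le \mu + 2\sigma) = 0.9544$
$P(Z \ge z_a) = P(Z \le -z_a) = a$ (using Z-table)
$Y_p = \mu + z_p \sigma$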
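In place of a printed Z-table, the critical values can be computed directly. The following is a minimal sketch using the standard library; `upper_tail_z` is a hypothetical name for the function that returns the value cutting off an upper-tail area a, and the list of tail areas is only an illustrative choice.

```python
from statistics import NormalDist

Z = NormalDist(0.0, 1.0)  # standard normal distribution

def upper_tail_z(a):
    """Return z_a such that P(Z >= z_a) = a."""
    return Z.inv_cdf(1.0 - a)

for a in (0.10, 0.05, 0.025, 0.01):
    z_a = upper_tail_z(a)
    # By symmetry, the lower-tail area below -z_a is also a.
    print(f"a = {a}: z_a = {z_a:.3f}, P(Z <= -z_a) = {Z.cdf(-z_a):.3f}")
```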
Example - Adult Male Heights
Above what height do the tallest 5% of males lie?
Step 1 - $Y \sim N(69.1, 2.6)$
Step 2 - Want to determine the 95th percentile ($p = .95$)
Step 3 - Since $100p > 50$, $a = 1 - p = 0.05$ and
$z_p = z_a = z_{.05} = 1.645$
Step 4 - $Y_{.95} = 69.1 + (1.645)(2.6) = 73.4$
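The four steps can be reproduced with the inverse CDF of the normal distribution. This sketch assumes the mean 69.1 and standard deviation 2.6 given in Step 1; the variable names are illustrative.

```python
from statistics import NormalDist

# Adult male heights, Y ~ N(69.1, 2.6), as given in Step 1.
heights = NormalDist(mu=69.1, sigma=2.6)

p = 0.95                       # percentile of interest
cutoff = heights.inv_cdf(p)    # equals mu + z_p * sigma
tail = 1.0 - heights.cdf(cutoff)

print(f"95th percentile: {cutoff:.1f}")
print(f"share of males above the cutoff: {tail:.2f}")
```

Calling `inv_cdf` on the N(69.1, 2.6) distribution is equivalent to looking up the standard normal value and rescaling it by the mean and standard deviation.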
$Y = \mu + (Y - \mu) = \mu + \varepsilon$, where $\varepsilon = Y - \mu$
Here $\mu$ is a fixed constant and $\varepsilon$ is a random variable
In practice $\mu$ will be unknown, and we will use sample data to
estimate or make statements regarding its value
Sampling Distributions and the Central Limit Theorem
Sample statistics based on random samples are also random
variables and have sampling distributions that are probability
distributions for the statistic (outcomes that would vary across
samples)
When samples are large and measurements are independent,
many estimators have approximately normal sampling distributions (CLT):
Sample Mean: $\bar{Y} \sim N\left(\mu, \dfrac{\sigma}{\sqrt{n}}\right)$
Sample Proportion: $\hat{\pi} \sim N\left(\pi, \sqrt{\dfrac{\pi(1-\pi)}{n}}\right)$
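A small simulation can illustrate both results. The sketch below is an assumption-laden illustration, not part of the original material: the Exponential(1) population (mean 1, standard deviation 1), the population proportion 0.3, the sample size, the number of repetitions, and the seed are all hypothetical choices.

```python
import math
import random
import statistics

def sample_means(n, reps, rng):
    """Means of `reps` samples of size n from an Exponential(1) population."""
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(reps)
    ]

def sample_proportions(n, reps, pi, rng):
    """Proportions of successes in `reps` samples of n Bernoulli(pi) trials."""
    return [
        sum(rng.random() < pi for _ in range(n)) / n
        for _ in range(reps)
    ]

rng = random.Random(12345)  # fixed seed so the run is repeatable
n, reps = 100, 2000         # hypothetical sample size and number of samples

# Exponential(1) has mu = 1 and sigma = 1, so theory gives sigma / sqrt(n) = 0.1.
means = sample_means(n, reps, rng)
print("sample means:   center", round(statistics.fmean(means), 3),
      "spread", round(statistics.stdev(means), 3),
      "theory", 1.0, round(1.0 / math.sqrt(n), 3))

pi = 0.3  # hypothetical population proportion
props = sample_proportions(n, reps, pi, rng)
print("sample props:   center", round(statistics.fmean(props), 3),
      "spread", round(statistics.stdev(props), 3),
      "theory", pi, round(math.sqrt(pi * (1 - pi) / n), 3))
```

The population here is strongly skewed, yet the simulated sample means should center near the population mean with spread close to the standard deviation divided by the square root of n, as the formulas state.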
Example - Adult Female Heights
$\bar{Y} \sim N\left(63.5, \dfrac{2.5}{\sqrt{100}}\right) = N(63.5, 0.25)$
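As a rough check of this example, the standard error and a probability statement about the sample mean can be computed as follows. The half-unit window around the mean is an illustrative choice, not something stated in the example.

```python
import math
from statistics import NormalDist

# Values from the example: population mean 63.5, SD 2.5, sample size 100.
mu, sigma, n = 63.5, 2.5, 100

standard_error = sigma / math.sqrt(n)   # SD of the sample mean
ybar = NormalDist(mu, standard_error)   # sampling distribution of the mean

# Probability that a sample mean lands within 0.5 of mu (2 standard errors).
prob = ybar.cdf(mu + 0.5) - ybar.cdf(mu - 0.5)

print(f"standard error = {standard_error}")
print(f"P({mu - 0.5} <= sample mean <= {mu + 0.5}) = {prob:.4f}")
```

Individual heights vary with standard deviation 2.5, but means of 100 heights vary with standard deviation only 0.25, so nearly all sample means fall within half a unit of 63.5.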