
DAO1704/DSC1007 Lecture 4

Discrete Probability Distribution


Agenda
• Binomial Distribution
• Poisson Distribution
• Covariance and correlation
• Joint probability distributions and independence
• Linear Functions of a Random Variable
• Sums of Random Variables
Binomial Distribution
Summary Measures of Probability Distributions
Where do probability distributions come from?

I. Empirically (from data)

Example: KFC sells chicken in “buckets” of 2, 3, 4, 8, 12, 16 or 20 pieces.


Over the last week, orders for fried chicken had the following data:

Pieces in order (xi)   Orders   Probability
2                      170      0.170
3                      200      0.200
4                      260      0.260
8                      165      0.165
12                     120      0.120
16                     50       0.050
20                     35       0.035
Total                  1,000

Let X = number of pieces of chicken in an order.


Develop a probability distribution for X.
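A small Python sketch of the empirical approach (the lecture itself works by hand and in Excel): each probability is the relative frequency of that order size over the 1,000 orders. The expected value E(X) is computed only as an illustration; the slide does not ask for it.

```python
# Empirical probability distribution for X = number of pieces in an order,
# built from last week's order counts (relative frequencies).
orders = {2: 170, 3: 200, 4: 260, 8: 165, 12: 120, 16: 50, 20: 35}
total = sum(orders.values())  # 1,000 orders in total

# P(X = x) = (number of orders with x pieces) / (total number of orders)
pmf = {x: count / total for x, count in orders.items()}

for x, p in pmf.items():
    print(f"P(X = {x:2d}) = {p:.3f}")

# Expected number of pieces per order: E(X) = sum over x of x * P(X = x)
mean = sum(x * p for x, p in pmf.items())
print(f"E(X) = {mean:.2f}")
```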
• If there are 10 tutors participating in this
experiment, what is the probability that 3
international students are chosen?
Binomial Distribution
n independent trials

Each trial has exactly two outcomes: “success” or “failure”

Each trial has the same probability: success p, failure 1 − p
Binomial Distribution

Let X be the number of successes in the n trials. We say that X obeys a binomial distribution with parameters n and p: Binomial(n, p) or B(n, p).

Binomial Distribution

$$P(X = x) = \frac{n!}{x!\,(n-x)!}\, p^{x} (1-p)^{n-x}, \quad x = 0, 1, \ldots, n$$
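The formula translates directly into code. This is a minimal sketch, assuming Python 3.8+ for `math.comb`, which computes n!/(x!(n−x)!). The parameters n = 15, p = 0.75 are borrowed from the laser example later in this lecture.

```python
from math import comb

def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for X ~ Binomial(n, p); comb(n, x) equals n! / (x! (n - x)!)."""
    if not 0 <= x <= n:
        return 0.0
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# Example: all probabilities for X ~ Binomial(15, 0.75)
n, p = 15, 0.75
probs = [binomial_pmf(x, n, p) for x in range(n + 1)]
print(f"P(X = 15) = {probs[15]:.6f}")
print(f"Sum of P(X = x) over x = 0..n: {sum(probs):.6f}")
```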
Binomial Distribution

Expected Value and Variance

If X obeys a binomial distribution with parameters n and p, then the mean, variance and standard deviation of X are:

Mean: $E(X) = \mu_X = np$

Variance: $Var(X) = \sigma_X^2 = np(1 - p)$

Std deviation: $\sigma_X = \sqrt{np(1 - p)}$
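As a sanity check (not a proof), the sketch below computes the mean and variance straight from the definition E(X) = Σ x P(X = x) and compares them with np and np(1 − p). The choice n = 15, p = 0.75 is only an example.

```python
from math import comb, sqrt

def binomial_pmf(x, n, p):
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 15, 0.75
# Mean and variance computed directly from the probability distribution
mean = sum(x * binomial_pmf(x, n, p) for x in range(n + 1))
var = sum((x - mean) ** 2 * binomial_pmf(x, n, p) for x in range(n + 1))

print(f"E(X):   direct = {mean:.4f}, formula np = {n * p:.4f}")
print(f"Var(X): direct = {var:.4f}, formula np(1 - p) = {n * p * (1 - p):.4f}")
print(f"SD(X):  direct = {sqrt(var):.4f}, formula sqrt(np(1 - p)) = {sqrt(n * p * (1 - p)):.4f}")
```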


Binomial Distribution

EXCEL Function: BINOMDIST(x, n, p, cumulative)

cumulative = 0 (or FALSE): returns P(X = x)
cumulative = 1 (or TRUE): returns P(X ≤ x)
Binomial Distribution

EXCEL Function: BINOMDIST(x, n, p, cumulative)

Example summary:
X = number of lasers (out of 15) that will pass the test
X ~ Binomial(15, 0.75)

P(X = 15) = BINOMDIST(15, 15, 0.75, 0) = 0.013363

P(X ≥ 14) = 1 − P(X ≤ 13)
          = 1 − BINOMDIST(13, 15, 0.75, 1) = 0.0802
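For readers without Excel, here is a hedged Python stand-in for the two BINOMDIST calls above. The helper name `binomdist` is hypothetical and only mirrors the argument order (x, n, p, cumulative) of the Excel function; it is not an Excel or library API.

```python
from math import comb

def binomdist(x: int, n: int, p: float, cumulative: bool) -> float:
    """Hypothetical Python stand-in for Excel's BINOMDIST(x, n, p, cumulative)."""
    if cumulative:
        # P(X <= x): add up the point probabilities for k = 0, 1, ..., x
        return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x + 1))
    # P(X = x)
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# Laser example: X ~ Binomial(15, 0.75)
p_all_pass = binomdist(15, 15, 0.75, False)        # P(X = 15)
p_at_least_14 = 1 - binomdist(13, 15, 0.75, True)  # P(X >= 14) = 1 - P(X <= 13)

print(f"P(X = 15)  = {p_all_pass:.6f}")
print(f"P(X >= 14) = {p_at_least_14:.4f}")
```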
Application of Binomial Distribution
It all starts with a mysterious
phone call…

Hi Dear friend. This is Mike.


I am an investment broker from the L.Q.Z. Company.
… Yes! L.Q.Z. Now you remember ah. The L.Q.Z. loh.
I have a very good news to share with you.
We have done a very thorough research on 400 mutual funds.
You know what? We found a star fund that has beaten a
standard market index in 37 out of 52 weeks.

YES! 37 out of 52 weeks!


If you invested in this fund, a simple math can tell you how
much you can earn.
Your math must be good, right? Sure one…
So… why bother studying so hard?
Invest with me and you will be rich soon.
Question 1

• We say a fund beats the market purely by chance if each week the fund has a fifty-fifty chance of beating the market index, independently of its performance in other weeks.
• What is the probability for such a fund to beat the market in at least 37 out of 52 weeks?
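Under the "purely by chance" assumption, the number of winning weeks is X ~ Binomial(52, 0.5), so the question asks for P(X ≥ 37). A minimal sketch (Python 3.8+ for `math.comb`):

```python
from math import comb

n, p = 52, 0.5  # 52 weeks, fifty-fifty chance of beating the index each week

# P(X >= 37) = sum of P(X = k) for k = 37, ..., 52, with X ~ Binomial(52, 0.5)
p_at_least_37 = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(37, n + 1))
print(f"P(X >= 37) = {p_at_least_37:.4f}")
```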
Question 2

• Suppose that all 400 funds beat the market purely by chance. What is the probability that the best of them beats the market in at least 37 out of 52 weeks?

• Conclusion?
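A sketch for Question 2. It rests on an extra assumption the slide does not state: the 400 funds' weekly results are independent of one another. The best fund reaches 37 wins exactly when at least one fund does, so the answer is 1 − (1 − q)^400, where q is the single-fund probability from Question 1.

```python
from math import comb

n, p, funds = 52, 0.5, 400
# q = P(one chance-driven fund beats the market in at least 37 of 52 weeks)
q = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(37, n + 1))

# Assuming independent funds: P(no fund reaches 37) = (1 - q) ** funds
p_best_at_least_37 = 1 - (1 - q) ** funds
print(f"P(best of {funds} funds beats the market >= 37 weeks) = {p_best_at_least_37:.3f}")
```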
Poisson Distribution
Bortkiewicz (1898)

The Law of Small Numbers

Number of soldiers killed by horse kicks in each Prussian cavalry corps, by year:
Year GC C1 C2 C3 C4 C5 C6 C7 C8 C9 C10 C11 C14 C15
1875 0 0 0 0 0 0 0 1 1 0 0 0 1 0
1876 2 0 0 0 1 0 0 0 0 0 0 0 1 1
1877 2 0 0 0 0 0 1 1 0 0 1 0 2 0
1878 1 2 2 1 1 0 0 0 0 0 1 0 1 0
1879 0 0 0 1 1 2 2 0 1 0 0 2 1 0
1880 0 3 2 1 1 1 0 0 0 2 1 4 3 0
1881 1 0 0 2 1 0 0 1 0 1 0 0 0 0
1882 1 2 0 0 0 0 1 0 1 1 2 1 4 1
1883 0 0 1 2 0 1 2 1 0 1 0 3 0 0
1884 3 0 1 0 0 0 0 1 0 0 2 0 1 1
1885 0 0 0 0 0 0 1 0 0 2 0 1 0 1
1886 2 1 0 0 1 1 1 0 0 1 0 1 3 0
1887 1 1 2 1 0 0 3 2 1 1 0 1 2 0
1888 0 1 1 0 0 1 1 0 0 0 0 1 1 0
1889 0 0 1 1 0 1 1 0 0 1 2 2 0 2
1890 1 2 0 2 0 1 1 2 0 2 1 1 2 2
1891 0 0 0 1 1 1 0 1 1 0 3 3 1 0
1892 1 3 2 0 1 1 3 0 1 1 0 1 1 0
1893 0 1 0 0 0 1 0 2 0 0 1 3 0 0
1894 1 0 0 0 0 0 0 0 1 0 1 1 0 0
Poisson Distribution

• The Poisson distribution was derived by the French mathematician Siméon Poisson in 1837.
• His name is one of the 72 names inscribed on the Eiffel Tower.
Poisson Distribution
Useful for modelling the number of occurrences of an event over a
specified interval of time or space.
Examples :
•number of customer orders received in one hour
•number of failures in a large computer system per month

Properties
•Probability of an occurrence is the same for any two intervals of
equal length.
•Occurrences in nonoverlapping intervals are independent of one
another.
Poisson Distribution

• Examples and Applications:


– The number of phone calls arriving at a call centre per minute.
– The number of goals in sports involving two competing teams.
– The number of infant deaths per year.
– The number of defects found per newly TOP flat.
–…
Poisson Distribution

A random variable X is said to be a Poisson r.v. with parameter λ (> 0) if it has the probability function

$$P(X = i) = \frac{e^{-\lambda} \lambda^{i}}{i!}, \quad i = 0, 1, 2, \ldots$$

Note:
X is a discrete r.v. that takes on values 0, 1, 2, . . .

$$P(X = 0) = \frac{e^{-\lambda} \lambda^{0}}{0!} = e^{-\lambda}$$

$$P(X = 1) = \frac{e^{-\lambda} \lambda^{1}}{1!} = \lambda e^{-\lambda}$$
Poisson Distribution

It can be shown that

Mean: $E(X) = \lambda$

Variance: $Var(X) = \lambda$

Thus, the parameter λ can be interpreted as the average number of occurrences per unit time or space.
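A numerical illustration (not a proof) that the Poisson mean and variance both equal λ. The value λ = 3 is just an example, and cutting the infinite sum off at i = 100 is an assumption that leaves only a negligible tail for this λ.

```python
from math import exp, factorial

def poisson_pmf(i: int, lam: float) -> float:
    """P(X = i) = e^(-lam) * lam^i / i! for X ~ Poisson(lam)."""
    return exp(-lam) * lam**i / factorial(i)

lam = 3.0
# The support 0, 1, 2, ... is infinite; stopping at i = 100 is a truncation
# assumption that leaves only a negligible tail for lam = 3.
support = range(101)
total_prob = sum(poisson_pmf(i, lam) for i in support)
mean = sum(i * poisson_pmf(i, lam) for i in support)
var = sum((i - mean) ** 2 * poisson_pmf(i, lam) for i in support)
print(f"sum of P = {total_prob:.6f}, E(X) = {mean:.6f}, Var(X) = {var:.6f}")
```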
Poisson Distribution


Example
Patients arrive at the A & E of a hospital at an average rate of 6 per hour on weekend evenings. What is the probability of 4 arrivals in 30 minutes on a weekend evening?

We can expect patient arrivals to be approximately Poisson.

Average arrival rate is 6 / hour.

Let X be the number of patient arrivals in 30 minutes.

X is Poisson with parameter λ = 6 × 0.5 = 3

$$P(X = 4) = \frac{e^{-3}\, 3^{4}}{4!} = \frac{(0.0497871)(81)}{24} \approx 0.1680$$
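The same calculation as a short Python sketch; the only modelling step is scaling the hourly rate to the 30-minute interval to get λ.

```python
from math import exp, factorial

rate_per_hour = 6
hours = 0.5
lam = rate_per_hour * hours  # expected arrivals in 30 minutes = 3

# P(X = 4) = e^(-lam) * lam^4 / 4!
p4 = exp(-lam) * lam**4 / factorial(4)
print(f"lambda = {lam}, P(X = 4) = {p4:.4f}")
```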
Poisson Distribution

Excel Function: POISSON(x, λ, cumulative)

cumulative = 0 (or FALSE): returns P(X = x)
cumulative = 1 (or TRUE): returns P(X ≤ x)
Poisson Distribution

Excel Function: POISSON(x, λ, cumulative)

Example summary:
Let X be the number of patient arrivals in 30 minutes.
X is Poisson with parameter λ = 3
P(X = 4) = POISSON(4, 3, 0) = 0.1680
Example
• In 1898 L. J. Bortkiewicz published a book entitled The Law of Small Numbers. He used data collected over 20 years to show that the number of soldiers killed by horse kicks each year in each corps in the Prussian cavalry followed a Poisson distribution with a mean of 0.63.
– What is the probability of at least one death caused by a horse kick in a corps in a year?
– What is the probability that there is no death caused by a horse kick in a corps over 6 years?
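One way to work both questions, as a sketch. The second answer assumes deaths in different years are independent with the same yearly rate (the Poisson properties listed earlier), so the 6-year count is Poisson with mean 6 × 0.63 = 3.78.

```python
from math import exp

lam_year = 0.63  # mean deaths per corps per year, as stated above

# P(X >= 1) = 1 - P(X = 0) = 1 - e^(-lambda)
p_at_least_one = 1 - exp(-lam_year)

# Assuming independent years: 6-year count ~ Poisson(6 * 0.63), so P(no death) = e^(-3.78)
lam_6_years = 6 * lam_year
p_none_6_years = exp(-lam_6_years)

print(f"P(at least one death in a year) = {p_at_least_one:.4f}")
print(f"P(no death over 6 years)        = {p_none_6_years:.4f}")
```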
Linear Function of a Random Variable
Linear Functions of a Random Variable
Example
Suppose daily demand for croissants at a bakery shop is given by the following distribution. Let X = daily demand for croissants.

Daily Demand   Probability
60             0.05
64             0.15
68             0.20
72             0.25
75             0.15
77             0.10
80             0.10

We can easily compute: E(X) = 71.15, Var(X) = 29.5275

Suppose it costs $135 per day to run the croissant operation, and that
the cost of producing one croissant is $0.75.
Daily cost of croissant operations = 0.75 X + 135
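The "easy" computation of E(X) and Var(X) from the demand table, written out as a sketch:

```python
demand = [60, 64, 68, 72, 75, 77, 80]
prob = [0.05, 0.15, 0.20, 0.25, 0.15, 0.10, 0.10]

# E(X) = sum x P(X = x);  Var(X) = sum (x - E(X))^2 P(X = x)
mean_x = sum(x * p for x, p in zip(demand, prob))
var_x = sum((x - mean_x) ** 2 * p for x, p in zip(demand, prob))
print(f"E(X) = {mean_x:.2f}, Var(X) = {var_x:.4f}")
```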
Linear Functions of a Random Variable
Example

X    Probability   Y = 0.75X + 135
60   0.05          180.00
64   0.15          183.00
68   0.20          186.00
72   0.25          189.00
75   0.15          191.25
77   0.10          192.75
80   0.10          195.00

E(X) = 71.15        E(Y) = ?
Var(X) = 29.5275    Var(Y) = ?

How are the means and variances related?

E(Y) = 0.75 E(X) + 135

Var(Y) = 0.75² Var(X)
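A quick check, as a sketch: compute E(Y) and Var(Y) directly from the Y column and compare with a E(X) + b and a² Var(X) for a = 0.75, b = 135.

```python
demand = [60, 64, 68, 72, 75, 77, 80]
prob = [0.05, 0.15, 0.20, 0.25, 0.15, 0.10, 0.10]
a, b = 0.75, 135  # Y = 0.75 X + 135 (daily cost)

def mean_var(values, probs):
    """Return (mean, variance) of a discrete distribution."""
    m = sum(v * p for v, p in zip(values, probs))
    return m, sum((v - m) ** 2 * p for v, p in zip(values, probs))

mean_x, var_x = mean_var(demand, prob)
cost = [a * x + b for x in demand]
mean_y, var_y = mean_var(cost, prob)  # computed directly from the Y column

print(f"E(Y):   direct = {mean_y:.4f}, a E(X) + b = {a * mean_x + b:.4f}")
print(f"Var(Y): direct = {var_y:.4f}, a^2 Var(X) = {a**2 * var_x:.4f}")
```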
Linear Functions of a Random Variable

Note: if Y = aX + b, then

E(Y) = a E(X) + b
Var(Y) = a² Var(X)

These formulas apply to continuous r.v.’s as well.
Covariance and Correlation
Covariance and Correlation

How do we summarize the relationship between two variables?

Specifically: how do we summarize what we observe in a scatter plot?

Examples:
Unemployment rate vs Crime rate
Stock market vs Property market
Time spent on DSC1007 vs DSC1007 Exam marks
Covariance and Correlation
Q: How do we describe the relationship between two rv’s?
Example: A chain of upscale cafés sells gourmet hot coffees and cold beverages. From past sales data, daily sales at one of their cafés obey the following probability distribution for (X, Y), where
X = # hot coffees sold per day, Y = # cold beverages sold per day

Probability (pi)   No. of Hot Coffees Sold (xi)   No. of Cold Drinks Sold (yi)
0.10               360                            360
0.10               790                            110
0.15               840                            30
0.05               260                            90
0.15               190                            450
0.10               300                            230
0.10               490                            60
0.10               150                            290
0.10               550                            140
0.05               510                            290
Mean                                457.00                         210.00
Standard Deviation                  244.28                         145.64
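A sketch that reproduces the Mean and Standard Deviation rows from the joint table, treating each row as one outcome with probability pi:

```python
from math import sqrt

# Joint distribution of (X, Y) = (# hot coffees, # cold drinks) from the table
probs = [0.10, 0.10, 0.15, 0.05, 0.15, 0.10, 0.10, 0.10, 0.10, 0.05]
hot   = [360, 790, 840, 260, 190, 300, 490, 150, 550, 510]
cold  = [360, 110,  30,  90, 450, 230,  60, 290, 140, 290]

def mean(values):
    return sum(p * v for p, v in zip(probs, values))

def sd(values):
    m = mean(values)
    return sqrt(sum(p * (v - m) ** 2 for p, v in zip(probs, values)))

print(f"Mean: X = {mean(hot):.2f}, Y = {mean(cold):.2f}")
print(f"SD:   X = {sd(hot):.2f}, Y = {sd(cold):.2f}")
```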
From the scatter plot, we can conclude that the sales of hot coffees and the sales of cold beverages are negatively related.

Is it correct?
Covariance and Correlation
We now define the covariance of two random variables X and Y with means μX and μY:

Probability            X    Y
P(X = x1, Y = y1)      x1   y1
P(X = x2, Y = y2)      x2   y2
⋮                      ⋮    ⋮
P(X = xN, Y = yN)      xN   yN

Covariance
$$Cov(X, Y) = E[(X - \mu_X)(Y - \mu_Y)] = \sum_{i} P(X = x_i, Y = y_i)\,(x_i - \mu_X)(y_i - \mu_Y)$$
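Applying the definition to the café table, as a sketch. Each row of the table is one pair (x_i, y_i) with its joint probability.

```python
probs = [0.10, 0.10, 0.15, 0.05, 0.15, 0.10, 0.10, 0.10, 0.10, 0.05]
hot   = [360, 790, 840, 260, 190, 300, 490, 150, 550, 510]   # X
cold  = [360, 110,  30,  90, 450, 230,  60, 290, 140, 290]   # Y

mu_x = sum(p * x for p, x in zip(probs, hot))
mu_y = sum(p * y for p, y in zip(probs, cold))

# Cov(X, Y) = sum_i P(X = x_i, Y = y_i) (x_i - mu_X)(y_i - mu_Y)
cov_xy = sum(p * (x - mu_x) * (y - mu_y) for p, x, y in zip(probs, hot, cold))
print(f"Cov(X, Y) = {cov_xy:,.0f}")
```

The result is negative, consistent with the negative relationship suggested by the scatter plot.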
Covariance and Correlation

Observe from the above that:

$$Cov(X, X) = E[(X - \mu_X)^2] = Var(X)$$

$$Cov(X, Y) = E(XY) - E(X)E(Y) = E(XY) - \mu_X \mu_Y$$
The bigger the covariance is, the stronger
the relationship is.

Is it true?
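A numerical check of the two identities Cov(X, X) = Var(X) and Cov(X, Y) = E(XY) − E(X)E(Y) on the café data, as a sketch:

```python
probs = [0.10, 0.10, 0.15, 0.05, 0.15, 0.10, 0.10, 0.10, 0.10, 0.05]
hot   = [360, 790, 840, 260, 190, 300, 490, 150, 550, 510]   # X
cold  = [360, 110,  30,  90, 450, 230,  60, 290, 140, 290]   # Y

def expect(values):
    """Expected value of a quantity listed row by row under the joint distribution."""
    return sum(p * v for p, v in zip(probs, values))

e_x, e_y = expect(hot), expect(cold)
e_xy = expect([x * y for x, y in zip(hot, cold)])

cov_definition = expect([(x - e_x) * (y - e_y) for x, y in zip(hot, cold)])
cov_shortcut = e_xy - e_x * e_y               # E(XY) - E(X)E(Y)
var_x = expect([(x - e_x) ** 2 for x in hot])  # Cov(X, X) = Var(X)

print(f"E(XY) = {e_xy:,.0f}, E(X)E(Y) = {e_x * e_y:,.0f}")
print(f"Cov by definition = {cov_definition:,.0f}, by shortcut = {cov_shortcut:,.0f}")
print(f"Cov(X, X) = Var(X) = {var_x:,.0f}")
```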
Covariance and Correlation

We introduce a standardized measure of interdependence between two rv’s:

$$Corr(X, Y) = \frac{Cov(X, Y)}{\sigma_X \sigma_Y}$$

Comments:
• The measure of correlation is unit-free.
• Corr(X, Y) is always between −1.0 and 1.0
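A sketch illustrating "unit-free" on the café data: rescaling X and Y by positive constants changes the covariance but not the correlation. The factors 1.5 and 2.5 are the per-cup and per-glass prices used later in this lecture; any positive constants would do.

```python
from math import sqrt

probs = [0.10, 0.10, 0.15, 0.05, 0.15, 0.10, 0.10, 0.10, 0.10, 0.05]
hot   = [360, 790, 840, 260, 190, 300, 490, 150, 550, 510]   # X (cups)
cold  = [360, 110,  30,  90, 450, 230,  60, 290, 140, 290]   # Y (glasses)

def mean(values):
    return sum(p * v for p, v in zip(probs, values))

def cov(u, v):
    mu, mv = mean(u), mean(v)
    return sum(p * (s - mu) * (t - mv) for p, s, t in zip(probs, u, v))

def corr(u, v):
    # Corr = Cov(U, V) / (sigma_U sigma_V), with sigma^2 = Cov(U, U)
    return cov(u, v) / sqrt(cov(u, u) * cov(v, v))

# Same sales measured in dollars instead of cups/glasses
revenue_hot = [1.5 * x for x in hot]
revenue_cold = [2.5 * y for y in cold]

print(f"Cov in units   = {cov(hot, cold):,.0f}, Corr = {corr(hot, cold):.3f}")
print(f"Cov in dollars = {cov(revenue_hot, revenue_cold):,.0f}, Corr = {corr(revenue_hot, revenue_cold):.3f}")
```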
Covariance and Correlation

Corr(X, Y) = 1.0: perfect positive linear relationship
Corr(X, Y) = 0: no linear relationship between X and Y
Corr(X, Y) = −1.0: perfect negative linear relationship
Covariance and Correlation
If higher than average values of X are apt to occur with higher
than average values of Y, then Cov(X, Y) > 0 and Corr(X, Y) > 0.
X and Y are positively correlated.

If higher than average values of X are apt to occur with lower than
average values of Y, then Cov(X, Y) < 0 and Corr(X, Y) < 0.
X and Y are negatively correlated.
Given the following two variables:

X Y
1 2
1 3
1 4
1 5

Without calculation, are they correlated?


Given the following two variables:

X Y
1 2
1 3
1 4
1 5
2 1
3 1
4 1
5 1

Without calculation, are they correlated?


Correlation vs. Causality
Covariance and Correlation

Correlation is not the same as Causality!

Common fallacy
A occurs in correlation with B.
Therefore: A causes B.

Reverse Causation
The more firemen fighting a fire, the bigger the fire is observed to be.

Therefore, firemen cause fire.


Bidirectional Causation

Higher jobless claims cause the stock market to decline, and vice versa.


Third Factor Causation

• Young children who sleep with the light on are much more likely to develop myopia in later life.
• Therefore, sleeping with the light on causes myopia.
Third Factor Causation

• As ice cream sales increase, the rate of drowning deaths increases sharply.
• Therefore, ice cream causes drowning.
Pure Coincidence
David Leinweber’s Finding

Guess: what is the indicator with the most statistically significant correlation with the S&P 500 index?

Butter production in Bangladesh

Source: D. Leinweber, "Stupid Data Miner Tricks: Overfitting the S&P 500"


Explain this???
• The University of California, Berkeley was sued for bias against women who had applied for admission to graduate school there.

Overall:
          Applicants   Admitted
Men       8442         44%
Women     4321         35%

By department:
              Men                      Women
Department    Applicants   Admitted    Applicants   Admitted
A             825          62%         108          82%
B             560          63%         25           68%
C             325          37%         593          34%
D             417          33%         375          35%
E             191          28%         393          24%
F             272          6%          341          7%
Simpson’s Paradox

• Simpson’s Paradox is a paradox in which a correlation present in different groups is reversed when the groups are combined.
Explanation
• Women tended to apply to competitive departments with low rates of admission even
among qualified applicants, whereas men tended to apply to less competitive
departments with high rates of admission among the qualified applicants.
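A sketch reproducing the paradox from the six departments in the table. Admitted counts are rebuilt from the rounded percentages, so they are approximate, and pooling only departments A to F gives different figures from the overall 44% and 35%, which cover all applicants.

```python
# Six departments: ((men applicants, men admission rate), (women applicants, women rate)).
# Admitted counts are rebuilt from rounded percentages, so they are approximate.
departments = {
    "A": ((825, 0.62), (108, 0.82)),
    "B": ((560, 0.63), (25, 0.68)),
    "C": ((325, 0.37), (593, 0.34)),
    "D": ((417, 0.33), (375, 0.35)),
    "E": ((191, 0.28), (393, 0.24)),
    "F": ((272, 0.06), (341, 0.07)),
}

women_higher = [d for d, (men, women) in departments.items() if women[1] > men[1]]
print("Departments where women's admission rate is higher:", women_higher)

def pooled_rate(index):
    """Pooled admission rate over A-F (index 0 = men, 1 = women)."""
    applicants = sum(groups[index][0] for groups in departments.values())
    admitted = sum(groups[index][0] * groups[index][1] for groups in departments.values())
    return admitted / applicants

men_rate, women_rate = pooled_rate(0), pooled_rate(1)
print(f"Pooled over A-F: men {men_rate:.1%}, women {women_rate:.1%}")
```

Women do better in four of the six departments, yet the pooled rate favours men, because most women applied to departments C to F, where admission rates are low for everyone.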
Joint Probability Distribution
Joint Probability Distributions

Consider two random variables X and Y that assume values given by

Probability               X    Y
p1 = P(X = x1, Y = y1)    x1   y1
p2 = P(X = x2, Y = y2)    x2   y2
⋮                         ⋮    ⋮
pN = P(X = xN, Y = yN)    xN   yN

Denote f(xi, yi) = P(X = xi, Y = yi) = pi.

f is called the joint probability distribution function of (X, Y).


Joint Probability Distributions
The concept of independent events leads quite naturally to a similar definition for independent random variables.

Two random variables X and Y are said to be independent if

P(X = x, Y = y) = P(X = x) P(Y = y) for all values x and y.

Roughly: X and Y are independent if knowing the value of one does not change the distribution of the other.

Thus, if X and Y are independent, then

E(XY) = E(X) E(Y)

It follows that if X and Y are independent, then

Cov(X, Y) = 0 (or Corr(X, Y) = 0)
Covariance and Correlation
We know : independent random variables are always uncorrelated.

But : dependent random variables may also be uncorrelated!!

Example: Consider r.v.’s X and Y with the following joint probability distribution
P(X, Y)   X    Y
1/3       −1   1
1/3       0    −1
1/3       1    1

Check: are X and Y independent?

P(X = 1) = 1/3, P(Y = 1) = 2/3, P(X = 1, Y = 1) = 1/3

Check: are X and Y uncorrelated?

Check whether Corr(X, Y) = 0.
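A sketch of both checks using exact fractions. The minus signs in this table appear to have been lost in extraction; the code assumes the joint distribution is (−1, 1), (0, −1), (1, 1), each with probability 1/3, which is consistent with P(X = 1) = 1/3 and P(Y = 1) = 2/3.

```python
from fractions import Fraction as F

# Assumed joint distribution (lost minus signs restored): X in {-1, 0, 1}, Y in {-1, 1}
joint = {(-1, 1): F(1, 3), (0, -1): F(1, 3), (1, 1): F(1, 3)}

# Independence check: compare P(X=1, Y=1) with P(X=1) P(Y=1)
p_x1 = sum(p for (x, y), p in joint.items() if x == 1)
p_y1 = sum(p for (x, y), p in joint.items() if y == 1)
p_x1_y1 = joint.get((1, 1), F(0))
print(f"P(X=1) P(Y=1) = {p_x1 * p_y1}, P(X=1, Y=1) = {p_x1_y1}")  # 2/9 vs 1/3: dependent

# Correlation check: Cov(X, Y) = E(XY) - E(X)E(Y)
e_x = sum(p * x for (x, y), p in joint.items())
e_y = sum(p * y for (x, y), p in joint.items())
e_xy = sum(p * x * y for (x, y), p in joint.items())
cov = e_xy - e_x * e_y
print(f"Cov(X, Y) = {cov}")  # 0: uncorrelated
```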
Sum of Random Variables
Sum of Random Variables
We have seen that if X is a r.v., then aX + b is a r.v., and

E(aX + b) = a E(X) + b
Var(aX + b) = a² Var(X)

Note: these formulas apply to continuous r.v.’s as well.

Suppose now we have two r.v.’s : X and Y


Then the sum X + Y is also a r.v.
Q: What is the mean and variance of X + Y ?

Similarly for the weighted sum aX + bY


Sum of Random Variables
Example: A chain of upscale cafés sells gourmet hot coffees and cold beverages. From past sales data, daily sales at one of their cafés obey the same probability distribution for (X, Y) as in the café example above, where X = # hot coffees and Y = # cold beverages sold per day.

E(X) = 457.00, Var(X) = 59,671, SD(X) = 244.28
E(Y) = 210.00, Var(Y) = 21,210, SD(Y) = 145.64
Cov(X, Y) = −27,260

Suppose: cold beverages (Y) are $2.50/glass; hot coffees (X) are $1.50/cup.
Sum of Random Variables

Mean
$$E(aX + bY) = aE(X) + bE(Y)$$

Variance
$$Var(aX + bY) = a^2 Var(X) + b^2 Var(Y) + 2ab\,Cov(X, Y)$$

or:
$$Var(aX + bY) = a^2 Var(X) + b^2 Var(Y) + 2ab\,\sigma_X \sigma_Y\,Corr(X, Y)$$

Note: Formulas apply to continuous r.v.’s as well.
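A sketch checking the variance formula on the café data for a = b = 1, i.e. the total number of drinks X + Y: the variance computed directly from the X + Y values should match the formula.

```python
probs = [0.10, 0.10, 0.15, 0.05, 0.15, 0.10, 0.10, 0.10, 0.10, 0.05]
hot   = [360, 790, 840, 260, 190, 300, 490, 150, 550, 510]   # X
cold  = [360, 110,  30,  90, 450, 230,  60, 290, 140, 290]   # Y

def mean(values):
    return sum(p * v for p, v in zip(probs, values))

def cov(u, v):
    mu, mv = mean(u), mean(v)
    return sum(p * (s - mu) * (t - mv) for p, s, t in zip(probs, u, v))

a, b = 1, 1  # aX + bY = X + Y, the total number of drinks sold per day
total = [a * x + b * y for x, y in zip(hot, cold)]

var_direct = cov(total, total)  # Var(aX + bY) straight from the table
var_formula = a**2 * cov(hot, hot) + b**2 * cov(cold, cold) + 2 * a * b * cov(hot, cold)

print(f"E(X + Y) = {mean(total):.0f}")
print(f"Var(X + Y): direct = {var_direct:,.0f}, formula = {var_formula:,.0f}")
```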


Sum of Random Variables

If X and Y are independent, then Cov(X, Y) = 0, so the covariance term drops out:

Variance
$$Var(aX + bY) = a^2 Var(X) + b^2 Var(Y)$$
Sum of Random Variables
E(X) = 457, Var(X) = 59,671        cold beverages (Y) are $2.50/glass;
E(Y) = 210, Var(Y) = 21,210        hot coffees (X) are $1.50/cup.
Cov(X, Y) = −27,260

Determine the following:

Mean and standard deviation of daily sales of cold beverages:
E(2.5Y) = 2.5 E(Y)        SD(2.5Y) = 2.5 SD(Y) = 2.5 √Var(Y)

Mean and standard deviation of daily sales of hot coffees:
E(1.5X) = 1.5 E(X)        SD(1.5X) = 1.5 SD(X) = 1.5 √Var(X)

Mean and standard deviation of total daily sales of all beverages:
E(1.5X + 2.5Y) = ?        SD(1.5X + 2.5Y) = √Var(1.5X + 2.5Y)

$$= \sqrt{1.5^2\,Var(X) + 2.5^2\,Var(Y) + 2(1.5)(2.5)\,Cov(X, Y)}$$
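A sketch working the three parts from the summary statistics. It uses Cov(X, Y) = −27,260; computing the covariance directly from the café joint table gives this negative value, so the covariance term reduces the variance of total sales.

```python
from math import sqrt

# Summary statistics (X = hot coffees, Y = cold beverages)
e_x, var_x = 457, 59_671
e_y, var_y = 210, 21_210
cov_xy = -27_260
price_hot, price_cold = 1.5, 2.5

# Cold beverages: E(2.5Y) = 2.5 E(Y), SD(2.5Y) = 2.5 SD(Y)
print(f"Cold:  mean = {price_cold * e_y:.2f}, SD = {price_cold * sqrt(var_y):.2f}")
# Hot coffees: E(1.5X) = 1.5 E(X), SD(1.5X) = 1.5 SD(X)
print(f"Hot:   mean = {price_hot * e_x:.2f}, SD = {price_hot * sqrt(var_x):.2f}")

# Total: Var(aX + bY) = a^2 Var(X) + b^2 Var(Y) + 2ab Cov(X, Y)
mean_total = price_hot * e_x + price_cold * e_y
var_total = price_hot**2 * var_x + price_cold**2 * var_y + 2 * price_hot * price_cold * cov_xy
print(f"Total: mean = {mean_total:.2f}, SD = {sqrt(var_total):.2f}")
```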