
ECON509 Probability and Statistics

Slides 2

Bilkent

This Version: 21 Oct 2013

(Bilkent) ECON509 This Version: 21 Oct 2013 1 / 110


Introduction

In this part of the lectures, our interest is in the behaviour of functions of a random
variable, $X$, whose cdf we know.
This part is heavily based on Casella & Berger, Chapters 2 and 3.
Specifically, we cover material from Sections 2.1, 2.2, 2.3, 2.6, 3.1, 3.2, 3.3 and 3.5.

(Bilkent) ECON509 This Version: 21 Oct 2013 2 / 110


Distributions of Functions of a Random Variable

If $X$ is a random variable with cdf $F_X$, then $Y = g(X)$ is also a random variable.
Importantly, since $Y$ is a function of $X$, we can determine its random behaviour in
terms of the behaviour of $X$.
Then, for any set $A$,
$$P(Y \in A) = P(g(X) \in A).$$
This clearly shows that the distribution of $Y$ depends on the function $g(\cdot)$ and the
cdf $F_X$.
Formally,
$$g(x) : \mathcal{X} \to \mathcal{Y},$$
where $\mathcal{X}$ and $\mathcal{Y}$ are the sample spaces of $X$ and $Y$, respectively.

(Bilkent) ECON509 This Version: 21 Oct 2013 3 / 110


Distributions of Functions of a Random Variable

Notice that the mapping $g(\cdot)$ is associated with the inverse mapping $g^{-1}(\cdot)$, a
mapping from the subsets of $\mathcal{Y}$ to those of $\mathcal{X}$:
$$g^{-1}(A) = \{x \in \mathcal{X} : g(x) \in A\}. \qquad (1)$$
Therefore, the mapping $g^{-1}(\cdot)$ takes sets into sets; that is, $g^{-1}(A)$ is the set of
points in $\mathcal{X}$ that $g(x)$ takes into the set $A$.
If $A = \{y\}$, a point set, then
$$g^{-1}(\{y\}) = \{x \in \mathcal{X} : g(x) = y\}.$$
Now, if $Y = g(X)$, then for all $A \subset \mathcal{Y}$,
$$P(Y \in A) = P(g(X) \in A) = P(\{x \in \mathcal{X} : g(x) \in A\}) = P(X \in g^{-1}(A)), \qquad (2)$$
where the last equality follows from (1). This defines the probability distribution of $Y$.

(Bilkent) ECON509 This Version: 21 Oct 2013 4 / 110


Distributions of Functions of a Random Variable

If $X$ is a discrete random variable, then $\mathcal{X}$ is countable. The sample space for
$Y = g(X)$ is $\mathcal{Y} = \{y : y = g(x),\ x \in \mathcal{X}\}$, which is also a countable set. Thus, $Y$ is
also a discrete random variable.
Then, using (2), the pmf for $Y$ is given by
$$f_Y(z) = P(Y = z) = \sum_{q \in g^{-1}(z)} P(X = q) = \sum_{q \in g^{-1}(z)} f_X(q), \quad \text{for } z \in \mathcal{Y},$$
and $f_Y(z) = 0$ for $z \notin \mathcal{Y}$.
Note that $\sum_{q \in g^{-1}(z)} P(X = q)$ simply amounts to adding up the probabilities
$P(X = q)$ for all $q$ such that $g(q) = z$.

(Bilkent) ECON509 This Version: 21 Oct 2013 5 / 110


Distributions of Functions of a Random Variable
Example (2.1.1): A discrete random variable $X$ has a binomial distribution if its
pmf is of the form
$$f_X(x) = P(X = x) = \binom{n}{x} p^x (1-p)^{n-x}, \quad x = 0, 1, \ldots, n,$$
where $n$ is a positive integer and $0 \leq p \leq 1$. Values such as $n$ and $p$ are called
parameters of a distribution. Different parameter values imply different distributions.
Consider $Y = g(X)$ where $g(x) = n - x$. Therefore, $Y = n - X$. Here
$\mathcal{X} = \{0, 1, \ldots, n\}$ and $\mathcal{Y} = \{y : y = g(x),\ x \in \mathcal{X}\} = \{0, 1, \ldots, n\}$.
For any $y \in \mathcal{Y}$, $n - x = g(x) = y$ if and only if $x = n - y$. Thus, $g^{-1}(y)$ is the
single point $x = n - y$.
Now,
$$f_Y(y) = \sum_{x \in g^{-1}(y)} f_X(x) = f_X(n-y) = \binom{n}{n-y} p^{n-y} (1-p)^{n-(n-y)} = \binom{n}{y} (1-p)^y p^{n-y},$$
since $\binom{n}{n-y} = \frac{n!}{(n-y)!\,y!} = \binom{n}{y}$.
Therefore, $Y$ also has a binomial distribution, but with parameters $n$ and $1-p$.
(Bilkent) ECON509 This Version: 21 Oct 2013 6 / 110
Distributions of Functions of a Random Variable

Behind all these dry results, our main objective actually is to obtain simple formulas
for the cdf and pdf of $Y$ in terms of the mapping $g(\cdot)$ and the cdf and pdf of $X$.
Now, the cdf of $Y = g(X)$ is
$$F_Y(y) = P(Y \leq y) = P(g(X) \leq y) = P(\{x \in \mathcal{X} : g(x) \leq y\}) = \int_{\{x \in \mathcal{X} : g(x) \leq y\}} f_X(x)\,dx.$$
However, in some cases it might be difficult to identify the set $\{x \in \mathcal{X} : g(x) \leq y\}$.

(Bilkent) ECON509 This Version: 21 Oct 2013 7 / 110


Distributions of Functions of a Random Variable
First, we consider the following result, which is not of direct interest, but it will help
us to obtain a useful theorem.
Theorem (2.1.3): Let $X$ have cdf $F_X(x)$, let $Y = g(X)$ and let $\mathcal{X}$ and $\mathcal{Y}$ be
defined as
$$\mathcal{X} = \{x : f_X(x) > 0\} \quad \text{and} \quad \mathcal{Y} = \{y : y = g(x) \text{ for some } x \in \mathcal{X}\}. \qquad (3)$$
1. If $g$ is an increasing function on $\mathcal{X}$, $F_Y(y) = F_X(g^{-1}(y))$ for $y \in \mathcal{Y}$.
2. If $g$ is a decreasing function on $\mathcal{X}$ and $X$ is a continuous random variable,
$F_Y(y) = 1 - F_X(g^{-1}(y))$ for $y \in \mathcal{Y}$.
Proof: If $g$ is increasing,
$$\{x \in \mathcal{X} : g(x) \leq y\} = \{x \in \mathcal{X} : g^{-1}(g(x)) \leq g^{-1}(y)\} = \{x \in \mathcal{X} : x \leq g^{-1}(y)\}.$$
If, on the other hand, $g$ is decreasing, then
$$\{x \in \mathcal{X} : g(x) \leq y\} = \{x \in \mathcal{X} : g^{-1}(g(x)) \geq g^{-1}(y)\} = \{x \in \mathcal{X} : x \geq g^{-1}(y)\}.$$
Note the change of direction of the inequality in the second case.

(Bilkent) ECON509 This Version: 21 Oct 2013 8 / 110


Distributions of Functions of a Random Variable

Now, for increasing $g(x)$,
$$F_Y(y) = \int_{\{x \in \mathcal{X} :\, x \leq g^{-1}(y)\}} f_X(x)\,dx = \int_{-\infty}^{g^{-1}(y)} f_X(x)\,dx = F_X(g^{-1}(y)).$$
Finally, for decreasing $g(x)$,
$$F_Y(y) = \int_{\{x \in \mathcal{X} :\, x \geq g^{-1}(y)\}} f_X(x)\,dx = \int_{g^{-1}(y)}^{\infty} f_X(x)\,dx = 1 - \int_{-\infty}^{g^{-1}(y)} f_X(x)\,dx = 1 - F_X(g^{-1}(y)).$$
Now comes the more interesting result.

(Bilkent) ECON509 This Version: 21 Oct 2013 9 / 110


Distributions of Functions of a Random Variable

Theorem (2.1.5): Let $X$ have pdf $f_X(x)$ and let $Y = g(X)$, where $g$ is a monotone
function. Let $\mathcal{X}$ and $\mathcal{Y}$ be defined as in (3). Suppose that $f_X(x)$ is continuous on $\mathcal{X}$
and that $g^{-1}(y)$ has a continuous derivative on $\mathcal{Y}$. Then the pdf of $Y$ is given by
$$f_Y(y) = \begin{cases} f_X(g^{-1}(y)) \left| \frac{d}{dy} g^{-1}(y) \right| & y \in \mathcal{Y} \\ 0 & \text{otherwise} \end{cases}.$$
Proof: From Theorem (2.1.3) we have, by the chain rule,
$$f_Y(y) = \frac{d}{dy} F_Y(y) = \begin{cases} f_X(g^{-1}(y)) \frac{d}{dy} g^{-1}(y) & \text{if } g \text{ is increasing} \\ -f_X(g^{-1}(y)) \frac{d}{dy} g^{-1}(y) & \text{if } g \text{ is decreasing} \end{cases},$$
which is identical to the result given in Theorem (2.1.5).

(Bilkent) ECON509 This Version: 21 Oct 2013 10 / 110


Distributions of Functions of a Random Variable

Example (2.1.4): Suppose $X \sim f_X(x) = 1$ for $0 < x < 1$ and $0$ otherwise, which is
the uniform$(0, 1)$ distribution. Observe that $F_X(x) = x$, $0 < x < 1$. We now make
the transformation $Y = g(X) = -\log X$. Then,
$$g'(x) = \frac{d}{dx}(-\log x) = -\frac{1}{x} < 0 \quad \text{for } 0 < x < 1;$$
hence, $g(x)$ is monotone and has a continuous derivative on $0 < x < 1$. Also,
$\mathcal{Y} = (0, \infty)$. Observe that $g^{-1}(y) = e^{-y}$. Then, using Theorem (2.1.5),
$$f_Y(y) = 1 \cdot \left| -e^{-y} \right| = e^{-y} \quad \text{if } 0 < y < \infty.$$
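As an informal sanity check (not from the textbook; it assumes NumPy is available), a small simulation confirms that $-\log X$ with $X \sim$ uniform$(0,1)$ behaves like an exponential$(1)$ random variable:

```python
import numpy as np

rng = np.random.default_rng(0)
u = rng.uniform(size=1_000_000)   # X ~ uniform(0,1)
y = -np.log(u)                    # Y = -log X

# For an exponential(1) variable: E[Y] = 1 and P(Y <= 1) = 1 - exp(-1) ~ 0.632
print(y.mean())                   # should be close to 1
print((y <= 1.0).mean(), 1 - np.exp(-1.0))
```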

(Bilkent) ECON509 This Version: 21 Oct 2013 11 / 110


Distributions of Functions of a Random Variable

Example (2.1.6): Let $f_X(x)$ be the gamma pdf
$$f(x) = \frac{1}{(n-1)!\,\beta^n} x^{n-1} e^{-x/\beta}, \quad 0 < x < \infty,$$
where $\beta > 0$ and $n$ is a positive integer. We want to find the pdf of $g(X) = 1/X$.
Note that here the support sets $\mathcal{X}$ and $\mathcal{Y}$ are both the interval $(0, \infty)$. Now,
$y = g(x) = 1/x$, and so $g^{-1}(y) = 1/y$ and $\frac{d}{dy} g^{-1}(y) = -1/y^2$. By Theorem (2.1.5), for
$y \in (0, \infty)$, we obtain
$$f_Y(y) = f_X(g^{-1}(y)) \left| \frac{d}{dy} g^{-1}(y) \right| = \frac{1}{(n-1)!\,\beta^n} \left(\frac{1}{y}\right)^{n-1} e^{-1/(\beta y)} \frac{1}{y^2} = \frac{1}{(n-1)!\,\beta^n} \left(\frac{1}{y}\right)^{n+1} e^{-1/(\beta y)}.$$
This is a special case of a pdf known as the inverted gamma pdf.

(Bilkent) ECON509 This Version: 21 Oct 2013 12 / 110


Distributions of Functions of a Random Variable
In many cases, $g(\cdot)$ may be neither increasing nor decreasing. However, as long as it
is monotone over certain intervals, we are still doing fine.
Example (2.1.7): Suppose $X$ is a continuous random variable. For $y > 0$, the cdf of
$Y = X^2$ is
$$F_Y(y) = P(Y \leq y) = P(X^2 \leq y) = P(-\sqrt{y} \leq X \leq \sqrt{y}).$$
Because $X$ is continuous, we can drop the equality from the left endpoint and obtain
$$F_Y(y) = P(-\sqrt{y} < X \leq \sqrt{y}) = P(X \leq \sqrt{y}) - P(X \leq -\sqrt{y}) = F_X(\sqrt{y}) - F_X(-\sqrt{y}).$$
Then,
$$f_Y(y) = \frac{d}{dy} F_Y(y) = \frac{d}{dy}\left[ F_X(\sqrt{y}) - F_X(-\sqrt{y}) \right] = \frac{1}{2\sqrt{y}} f_X(\sqrt{y}) + \frac{1}{2\sqrt{y}} f_X(-\sqrt{y}).$$
Importantly, the pdf of $Y$ consists of two pieces which represent the intervals where
$g(x) = x^2$ is monotone.
(Bilkent) ECON509 This Version: 21 Oct 2013 13 / 110
Distributions of Functions of a Random Variable

The above idea can be extended to cases where we need a larger number of intervals
to obtain, if you like, interval-by-interval monotonicity.
Theorem (2.1.8): Let $X$ have pdf $f_X(x)$, let $Y = g(X)$ and define the sample
space $\mathcal{X}$ as in (3). Suppose there exists a partition $A_0, A_1, A_2, \ldots, A_k$ of $\mathcal{X}$ such that
$P(X \in A_0) = 0$ and $f_X(x)$ is continuous on each $A_i$. Further, suppose there exist
functions $g_1(x), \ldots, g_k(x)$ defined on $A_1, \ldots, A_k$, respectively, satisfying
1. $g(x) = g_i(x)$, for $x \in A_i$,
2. $g_i(x)$ is monotone on $A_i$,
3. the set $\mathcal{Y} = \{y : y = g_i(x) \text{ for some } x \in A_i\}$ is the same for each $i = 1, \ldots, k$,
4. $g_i^{-1}(y)$ has a continuous derivative on $\mathcal{Y}$, for each $i = 1, \ldots, k$.
Then,
$$f_Y(y) = \begin{cases} \sum_{i=1}^{k} f_X(g_i^{-1}(y)) \left| \frac{d}{dy} g_i^{-1}(y) \right| & y \in \mathcal{Y} \\ 0 & \text{otherwise} \end{cases}.$$

(Bilkent) ECON509 This Version: 21 Oct 2013 14 / 110


Distributions of Functions of a Random Variable
Example (2.1.9): Let $X$ have the standard normal distribution,
$$f_X(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}, \quad -\infty < x < \infty.$$
Consider $Y = X^2$. The function $g(x) = x^2$ is monotone on $(-\infty, 0)$ and on $(0, \infty)$.
The set $\mathcal{Y} = (0, \infty)$. Applying Theorem (2.1.8), we take
$$A_0 = \{0\}; \qquad A_1 = (-\infty, 0),\ g_1(x) = x^2,\ g_1^{-1}(y) = -\sqrt{y}; \qquad A_2 = (0, \infty),\ g_2(x) = x^2,\ g_2^{-1}(y) = \sqrt{y}.$$
Then, the pdf of $Y$ is
$$f_Y(y) = \frac{1}{\sqrt{2\pi}} e^{-(-\sqrt{y})^2/2} \left| -\frac{1}{2\sqrt{y}} \right| + \frac{1}{\sqrt{2\pi}} e^{-(\sqrt{y})^2/2} \left| \frac{1}{2\sqrt{y}} \right| = \frac{1}{\sqrt{2\pi}} \frac{1}{\sqrt{y}} e^{-y/2}, \quad 0 < y < \infty.$$
This is the pdf of a chi squared random variable with 1 degree of freedom. You will
use this distribution frequently in your later econometrics courses.
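An informal check of this derivation (my addition, assuming NumPy and SciPy are available): squaring draws from a standard normal should reproduce $\chi^2_{(1)}$ probabilities.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
z = rng.standard_normal(1_000_000)
y = z**2                                   # Y = Z^2

# Compare the empirical cdf of Y at a few points with the chi2(1) cdf
for q in (0.5, 1.0, 2.0):
    print(q, (y <= q).mean(), stats.chi2.cdf(q, df=1))
```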

(Bilkent) ECON509 This Version: 21 Oct 2013 15 / 110


Distributions of Functions of a Random Variable

We finish with another important result known as the Probability Integral Transform.
Theorem (2.1.10): Let $X$ have continuous cdf $F_X(x)$ and define the random
variable $Y$ as $Y = F_X(X)$. Then, $Y$ is uniformly distributed on $(0, 1)$, that is,
$$P(Y \leq y) = y, \quad 0 < y < 1.$$
Proof: For $Y = F_X(X)$ we have, for $0 < y < 1$,
$$P(Y \leq y) = P(F_X(X) \leq y) = P(F_X^{-1}[F_X(X)] \leq F_X^{-1}(y)) = P(X \leq F_X^{-1}(y)) = F_X(F_X^{-1}(y)) = y.$$
At the endpoints we have $P(Y \leq y) = 1$ for $y \geq 1$ and $P(Y \leq y) = 0$ for $y \leq 0$.
Hence, $Y$ has a uniform distribution.

(Bilkent) ECON509 This Version: 21 Oct 2013 16 / 110


Distributions of Functions of a Random Variable

Why is this result useful?
This result connects any random variable with some cdf $F_X(x)$ with a uniformly
distributed random variable. Hence, if we want to simulate random numbers from
some distribution $F_X(x)$, all we have to do is generate uniformly distributed
random variables, $Y$, and then solve $F_X(x) = y$ for $x$. As long as we can compute
$F_X^{-1}(y)$, we can generate random numbers from the distribution $F_X(x)$.
This result is also employed when using Copula methods. Copulas are used to model
dependence between random variables in financial econometrics.
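A minimal sketch of this inverse-transform idea (my addition, assuming NumPy; the exponential cdf $F_X(x) = 1 - e^{-x/\lambda}$ is chosen because its inverse has a closed form):

```python
import numpy as np

rng = np.random.default_rng(42)
lam = 2.0                                  # illustrative scale parameter

u = rng.uniform(size=500_000)              # U ~ uniform(0,1)
x = -lam * np.log(1.0 - u)                 # x = F^{-1}(u) solves 1 - exp(-x/lam) = u

# The draws should have mean lam and variance lam**2
print(x.mean(), x.var())
```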

(Bilkent) ECON509 This Version: 21 Oct 2013 17 / 110


Expected Value

In this section, we will introduce one of the most widely used concepts in
econometrics, the expected value.
As we will see in more detail, this is one of the moments that a random variable can
possess.
This concept is akin to the concept of “average.” The standard “average” is an
arithmetic average where all available observations are weighted equally.
The expected value, on the other hand, is the average of all possible values a
random variable can take, weighted by the probability distribution.
The question is: which value would we expect the random variable to take, on
average?

(Bilkent) ECON509 This Version: 21 Oct 2013 18 / 110


Expected Value

Definition (2.2.1): The expected value or mean of a random variable $g(X)$,
denoted by $E[g(X)]$, is
$$E[g(X)] = \begin{cases} \int_{-\infty}^{\infty} g(x) f_X(x)\,dx & \text{if } X \text{ is continuous} \\ \sum_{x \in \mathcal{X}} g(x) f_X(x) = \sum_{x \in \mathcal{X}} g(x) P(X = x) & \text{if } X \text{ is discrete} \end{cases},$$
provided that the integral or sum exists. If $E[|g(X)|] = \infty$, we say that $E[g(X)]$
does not exist.
In both cases, the idea is that we are taking the average of $g(x)$ over all of its
possible values ($x \in \mathcal{X}$), where these values are weighted by the respective value of
the pdf, $f_X(x)$.

(Bilkent) ECON509 This Version: 21 Oct 2013 19 / 110


Expected Value
Example (2.2.2): Suppose $X$ has an exponential$(\lambda)$ distribution, that is, it has pdf
given by
$$f_X(x) = \frac{1}{\lambda} e^{-x/\lambda}, \quad 0 \leq x < \infty, \quad \lambda > 0.$$
Then,
$$E[X] = \int_0^{\infty} \frac{x}{\lambda} e^{-x/\lambda}\,dx = \left[ -x e^{-x/\lambda} \right]_0^{\infty} + \int_0^{\infty} e^{-x/\lambda}\,dx \qquad (4)$$
$$= \int_0^{\infty} e^{-x/\lambda}\,dx = \lambda. \qquad (5)$$
To obtain this result, we use a method called integration by parts. This is based on
$$\int u\,dv = uv - \int v\,du.$$
Then, taking
$$u = x, \quad du = dx, \qquad v = -e^{-x/\lambda}, \quad dv = \frac{1}{\lambda} e^{-x/\lambda}\,dx,$$
gives (4).

(Bilkent) ECON509 This Version: 21 Oct 2013 20 / 110


Expected Value

To obtain (5), notice that, by L'Hôpital's Rule,
$$\lim_{x \to \infty} \frac{x}{e^{x/\lambda}} = \lim_{x \to \infty} \frac{\frac{d}{dx} x}{\frac{d}{dx} e^{x/\lambda}} = \lim_{x \to \infty} \frac{1}{\frac{1}{\lambda} e^{x/\lambda}} = 0.$$
Finally,
$$\int_0^{\infty} e^{-x/\lambda}\,dx = \left[ -\lambda e^{-x/\lambda} \right]_0^{\infty} = \lambda.$$

(Bilkent) ECON509 This Version: 21 Oct 2013 21 / 110


Expected Value

A very useful property of the expectation operator is that it is a linear operator.
For example, consider some $X$ such that $E[X] = \mu$.
Then, for two constants $a$ and $b$,
$$E[a + Xb] = a + E[Xb] = a + bE[X] = a + b\mu.$$
Notice that, clearly, the expectation of a constant is equal to itself.

(Bilkent) ECON509 This Version: 21 Oct 2013 22 / 110


Expected Value
More generally, we have the following results.
Theorem (2.2.5): Let $X$ be a random variable and let $a$, $b$ and $c$ be constants.
Then for any functions $g_1(x)$ and $g_2(x)$ whose expectations exist,
1. $E[a g_1(X) + b g_2(X) + c] = aE[g_1(X)] + bE[g_2(X)] + c$.
2. If $g_1(x) \geq 0$ for all $x$, then $E[g_1(X)] \geq 0$.
3. If $g_1(x) \geq g_2(x)$ for all $x$, then $E[g_1(X)] \geq E[g_2(X)]$.
4. If $a \leq g_1(x) \leq b$ for all $x$, then $a \leq E[g_1(X)] \leq b$.
Proof: The most useful of these results is (1). Observe that, by definition,
$$E[a g_1(X) + b g_2(X) + c] = \int_{-\infty}^{\infty} [a g_1(x) + b g_2(x) + c] f_X(x)\,dx$$
$$= a \int_{-\infty}^{\infty} g_1(x) f_X(x)\,dx + b \int_{-\infty}^{\infty} g_2(x) f_X(x)\,dx + c \int_{-\infty}^{\infty} f_X(x)\,dx$$
$$= aE[g_1(X)] + bE[g_2(X)] + c \cdot 1.$$

(Bilkent) ECON509 This Version: 21 Oct 2013 23 / 110


Expected Value

Also consider (3). Now,
$$E[g_1(X)] - E[g_2(X)] = \int_{-\infty}^{\infty} g_1(x) f_X(x)\,dx - \int_{-\infty}^{\infty} g_2(x) f_X(x)\,dx = \int_{-\infty}^{\infty} [g_1(x) - g_2(x)] f_X(x)\,dx.$$
But we know that $g_1(x) - g_2(x) \geq 0$ for all $x$! So, we are integrating over a function
which is nonnegative everywhere, and we are weighting by $f_X(x) \geq 0$ for all $x$.
Hence, $E[g_1(X)] - E[g_2(X)] \geq 0$. The rest can be shown similarly.

(Bilkent) ECON509 This Version: 21 Oct 2013 24 / 110


Expected Value

When evaluating expectations of nonlinear functions of $X$, we can proceed in one of
two ways. From the definition of $E[g(X)]$, we could directly calculate
$$E[g(X)] = \int_{-\infty}^{\infty} g(x) f_X(x)\,dx.$$
But we could also find the pdf $f_Y(y)$ of $Y = g(X)$ and we would have
$$E[g(X)] = E[Y] = \int_{-\infty}^{\infty} y f_Y(y)\,dy.$$
Let's consider the following examples. We will first find the pdf of the inverted
gamma distribution. Then, we will use this result to find the expectation of a
random variable which has the inverted gamma distribution.

(Bilkent) ECON509 This Version: 21 Oct 2013 25 / 110


Expected Value
Example (2.2.7): Let $X$ have a uniform distribution, such that
$$f_X(x) = \begin{cases} 1 & \text{if } 0 \leq x \leq 1 \\ 0 & \text{otherwise} \end{cases}.$$
Define $g(X) = -\log X$. Then,
$$E[g(X)] = E[-\log X] = \int_0^1 (-\log x)\,dx = \left[ -x \log x + x \right]_0^1 = 1,$$
where we use integration by parts. We can also use $f_Y(y)$ to calculate $E[Y]$ directly.
In Example (2.1.4), it was shown that, for the case at hand,
$$f_Y(y) = e^{-y} \quad \text{if } 0 < y < \infty.$$
Remember from Example (2.2.2) that for $Y \sim f_Y(y) = \lambda^{-1} e^{-y/\lambda}$, where
$0 \leq y < \infty$ and $\lambda > 0$, we have $E[Y] = \lambda$. Notice that this comes down to the same
pdf if we pick $\lambda = 1$. Hence, $E[Y] = 1$.
Note that the textbook is a bit vague on the set of possible values for $X$. In Example
(2.1.4) we have $0 < x < 1$ while in Example (2.2.7), the textbook actually considers
$0 \leq x \leq 1$.

(Bilkent) ECON509 This Version: 21 Oct 2013 26 / 110


Moments

Another widely used moment is the variance of a random variable. Obviously, this
moment measures the variation/dispersion/spread of the random variable (around its
expectation).
While the expectation is usually denoted by µ, σ2 is generally used for variance.
Variance is a second-order moment. If available, higher order moments of a random
variable can be calculated, as well.
For example, the third and fourth moments are concerned with how symmetric and
fat-tailed the underlying distribution is. We will talk more about these.

(Bilkent) ECON509 This Version: 21 Oct 2013 27 / 110


Moments

Definition (2.3.1): For each integer $n$, the $n$th moment of $X$ is
$$\mu_n' = E[X^n].$$
The $n$th central moment of $X$, $\mu_n$, is
$$\mu_n = E[(X - \mu)^n],$$
where $\mu = \mu_1' = E[X]$.
Definition (2.3.2): The variance of a random variable $X$ is its second central
moment,
$$\operatorname{Var}(X) = E\left[ (X - \mu)^2 \right],$$
while $\sqrt{\operatorname{Var}(X)}$ is known as the standard deviation of $X$.
Importantly,
$$\operatorname{Var}(X) = E\left[ (X - \mu)^2 \right] = E[X^2] - \mu^2.$$

(Bilkent) ECON509 This Version: 21 Oct 2013 28 / 110


Moments

Variance/risk receives huge attention in finance and financial econometrics. Some
of the most central topics in these areas are either directly or indirectly related to
variance (and covariance).
Think of a portfolio consisting of many assets. These assets have different means
and variances. Some assets are very secure (little variance) but bring a lower return.
On the other hand, some of the highest expected returns could be due to highly
risky assets.
Obviously, other things being equal, if two portfolios have the same return, then the
one with lower risk (variance) would be preferred. These issues fall under the focus
of financial economics.
Financial econometrics is more concerned with modelling risk itself by looking at
large amounts of daily and/or intra-daily data. The idea is that we can look at how
volatile an asset was in the past, in order to model its volatility today. This idea and
the resulting GARCH-type literature earned Robert F. Engle the Nobel prize in
2003. See the original paper by Engle (1982, Econometrica).

(Bilkent) ECON509 This Version: 21 Oct 2013 29 / 110


Moments

Let us digress for a moment and briefly mention another important concept:
covariance. When it exists, the covariance of two random variables $X$ and $Y$ is
defined as
$$\operatorname{Cov}(X, Y) = E\left( \{X - E[X]\}\{Y - E[Y]\} \right).$$
We will talk in more detail about this when we deal with multivariate random
variables. For the time being, suffice it to say that the covariance is a measure of the
co-movement between two random variables.

(Bilkent) ECON509 This Version: 21 Oct 2013 30 / 110


Common Families of Distributions

We will be dealing with moments in more detail. However, this is a good time to
stop and review some of the commonly used distributions.
Usually, one deals with families of distributions. Families of distributions are indexed
by one or more parameters, which allow one to vary certain characteristics of the
distribution while staying within one functional form.
To give an example, one of the most commonly employed distributions, the Normal
distribution, has two parameters: the mean and the variance, denoted by $\mu$ and $\sigma^2$,
respectively.
Although one might know the value of σ2 for the random variable at hand, the
actual value of µ might be unknown.
In that case, the distribution will be indexed by µ, and the behaviour of the random
variable will change as µ varies.

(Bilkent) ECON509 This Version: 21 Oct 2013 31 / 110


Common Families of Distributions

Figure: The normal distribution centred at zero, for varying values of $\sigma^2$ ($\sigma^2 = 1, 2, 4$).


(Bilkent) ECON509 This Version: 21 Oct 2013 32 / 110
Common Families of Distributions
Discrete Distributions

This part will also provide a good opportunity to put some of the abstract concepts
we have learned into action.
We start with discrete distributions.
A random variable X is said to have a discrete distribution if the range of X , the
sample space, is countable.

(Bilkent) ECON509 This Version: 21 Oct 2013 33 / 110


Common Families of Distributions
Discrete Distributions: Discrete Uniform Distribution

A random variable $X$ has a discrete uniform$(1, N)$ distribution if
$$P(X = x \mid N) = \frac{1}{N}, \quad x = 1, 2, \ldots, N,$$
where $N$ is a specified integer. This distribution puts equal mass on each of the
outcomes $1, 2, \ldots, N$.
Some useful results: Remember that
$$\sum_{i=1}^{k} i = \frac{k(k+1)}{2} \quad \text{and} \quad \sum_{i=1}^{k} i^2 = \frac{k(k+1)(2k+1)}{6}. \qquad (6)$$
Using (6) we can now calculate the mean and the variance.

(Bilkent) ECON509 This Version: 21 Oct 2013 34 / 110


Common Families of Distributions
Discrete Distributions: Discrete Uniform Distribution

Now,
$$E[X] = \sum_{x=1}^{N} x P(X = x \mid N) = \sum_{x=1}^{N} x \frac{1}{N} = \frac{1}{N} \frac{N(N+1)}{2} = \frac{N+1}{2},$$
$$\text{and} \quad E[X^2] = \sum_{x=1}^{N} x^2 P(X = x \mid N) = \sum_{x=1}^{N} x^2 \frac{1}{N} = \frac{(N+1)(2N+1)}{6}.$$
Therefore, since $\operatorname{Var}(X) = E[X^2] - \{E[X]\}^2$, we have
$$\operatorname{Var}(X) = \frac{(N+1)(2N+1)}{6} - \left( \frac{N+1}{2} \right)^2 = \frac{(N+1)(N-1)}{12}.$$
It is easy to generalise this distribution so that the sample space is any range of
integers, $N_0, N_0 + 1, \ldots, N_1$, with pmf $P(X = x \mid N_0, N_1) = 1/(N_1 - N_0 + 1)$.
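A quick numerical check of these two formulas (my addition, plain Python), for one illustrative value of $N$:

```python
N = 10
xs = range(1, N + 1)

mean = sum(x / N for x in xs)               # E[X]
second = sum(x**2 / N for x in xs)          # E[X^2]
var = second - mean**2

print(mean, (N + 1) / 2)                    # both 5.5
print(var, (N + 1) * (N - 1) / 12)          # both 8.25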

(Bilkent) ECON509 This Version: 21 Oct 2013 35 / 110


Common Families of Distributions
Discrete Distributions: Binomial Distribution

This is based on a Bernoulli trial (after James Bernoulli), which is an experiment
with two, and only two, possible outcomes.
A random variable $X$ has a Bernoulli$(p)$ distribution if
$$X = \begin{cases} 1 & \text{with probability } p \\ 0 & \text{with probability } 1-p \end{cases}, \quad 0 \leq p \leq 1.$$
$X = 1$ is often termed a "success" and $p$ is, accordingly, the probability of success.
Similarly, $X = 0$ is termed a "failure."
Now,
$$E[X] = 1 \cdot p + 0 \cdot (1-p) = p, \quad \text{and} \quad \operatorname{Var}(X) = (1-p)^2 p + (0-p)^2 (1-p) = p(1-p).$$
Examples:
1. Tossing a coin ($p$ = probability of a head, $X = 1$ if heads)
2. Roulette ($X = 1$ if red occurs, $p$ = probability of red)
3. Election polls ($X = 1$ if candidate A gets vote)
4. Incidence of disease ($p$ = probability that a random person gets infected)

(Bilkent) ECON509 This Version: 21 Oct 2013 36 / 110


Common Families of Distributions
Discrete Distributions: Binomial Distribution

We can extend the scope to a collection of many independent trials.
Define
$$A_i = \{X = 1 \text{ on the } i\text{th trial}\}, \quad i = 1, 2, \ldots, n.$$
Assuming that $A_1, \ldots, A_n$ are independent events, we can derive the distribution of
the total number of successes in $n$ trials. Define $Y$ = "total number of successes in $n$
trials".
The event $\{Y = y\}$ means that out of $n$ trials, $y$ resulted in success. Therefore,
$n - y$ trials have been unsuccessful.
In other words, exactly $y$ of $A_1, \ldots, A_n$ must have occurred.
There are many possible orderings of the events that would lead to this outcome.
Any particular such ordering has probability
$$p^y (1-p)^{n-y}.$$
Since there are $\binom{n}{y}$ such sequences, we have
$$P(Y = y \mid n, p) = \binom{n}{y} p^y (1-p)^{n-y}, \quad y = 0, 1, 2, \ldots, n,$$
and $Y$ is called a binomial$(n, p)$ random variable.
(Bilkent) ECON509 This Version: 21 Oct 2013 37 / 110
Common Families of Distributions
Discrete Distributions: Binomial Distribution

The following theorem, called the "Binomial Theorem," is a useful result, which can
be used to show that $\sum_{y=0}^{n} P(Y = y) = 1$ for the binomial$(n, p)$ random variable
mentioned above.
Theorem (3.2.2): For any real numbers $x$ and $y$ and integer $n \geq 0$,
$$(x + y)^n = \sum_{i=0}^{n} \binom{n}{i} x^i y^{n-i}.$$
Proof: We have $(x + y)^n = \underbrace{(x + y) \cdots (x + y)}_{n \text{ times}}$. From each factor $(x + y)$, we
choose either an $x$ or a $y$, and multiply together the $n$ choices. For $i = 0$ we
choose no $x$'s; for $i = 1$ we choose exactly one $x$; for $i = 17$ we choose seventeen $x$'s, and so on.
These selections give us $x^i y^{n-i}$ each time. Now, for each of these particular
values of $i$, we have $\binom{n}{i}$ different ways of making the selection. Hence, the result
follows.
Now, for $x = p$ and $y = 1 - p$, we get
$$\sum_{i=0}^{n} P(Y = i) = \sum_{i=0}^{n} \binom{n}{i} p^i (1-p)^{n-i} = (p + (1-p))^n = 1.$$

(Bilkent) ECON509 This Version: 21 Oct 2013 38 / 110


Common Families of Distributions
Discrete Distributions: Binomial Distribution

Let's calculate the mean and variance for the Binomial Distribution.
Example (2.2.3):
$$E[X] = \sum_{x=0}^{n} x \binom{n}{x} p^x (1-p)^{n-x} = \sum_{x=1}^{n} x \binom{n}{x} p^x (1-p)^{n-x},$$
since $x \binom{n}{x} p^x (1-p)^{n-x} = 0$ for $x = 0$.
We can show that $x \binom{n}{x} = n \binom{n-1}{x-1}$:
$$x \binom{n}{x} = x\,\frac{n!}{x!(n-x)!} = n\,\frac{(n-1)!}{x!(n-x)!}\,x = n\,\frac{(n-1)!}{(x-1)!(n-x)!} = n \binom{n-1}{x-1}.$$

(Bilkent) ECON509 This Version: 21 Oct 2013 39 / 110


Common Families of Distributions
Discrete Distributions: Binomial Distribution

Then,
$$E[X] = \sum_{x=1}^{n} n \binom{n-1}{x-1} p^x (1-p)^{n-x} = \sum_{y=0}^{n-1} n \binom{n-1}{y} p^{y+1} (1-p)^{n-(y+1)} = np \sum_{y=0}^{n-1} \binom{n-1}{y} p^y (1-p)^{n-1-y} = np.$$
The second equality above follows by substituting $y = x - 1$. The interval for
summation changes accordingly. The final result becomes clear once we observe that
$P(Y = y \mid n-1, p) = \binom{n-1}{y} p^y (1-p)^{n-1-y}$. Hence, by definition,
$\sum_{y=0}^{n-1} P(Y = y \mid n-1, p) = 1$.
To calculate the variance, we need to calculate $E[X^2]$.
Example (2.3.5): Start with
$$E[X^2] = \sum_{x=0}^{n} x^2 \binom{n}{x} p^x (1-p)^{n-x}, \quad x = 0, 1, \ldots, n.$$

(Bilkent) ECON509 This Version: 21 Oct 2013 40 / 110


Common Families of Distributions
Discrete Distributions: Binomial Distribution

Observe that
$$x^2 \binom{n}{x} = x^2 \frac{n!}{(n-x)!\,x!} = x\,n\,\frac{(n-1)!}{(n-x)!\,(x-1)!} = x\,n \binom{n-1}{x-1}.$$
Then,
$$\sum_{x=0}^{n} x^2 \binom{n}{x} p^x (1-p)^{n-x} = \sum_{x=1}^{n} x\,n \binom{n-1}{x-1} p^x (1-p)^{n-x}$$
$$= n \sum_{y=0}^{n-1} (y+1) \binom{n-1}{y} p^{y+1} (1-p)^{n-y-1}$$
$$= np \sum_{y=0}^{n-1} y \binom{n-1}{y} p^y (1-p)^{n-y-1} + np \sum_{y=0}^{n-1} \binom{n-1}{y} p^y (1-p)^{n-y-1}.$$

(Bilkent) ECON509 This Version: 21 Oct 2013 41 / 110


Common Families of Distributions
Discrete Distributions: Binomial Distribution

We have to deal with
$$np \sum_{y=0}^{n-1} y \binom{n-1}{y} p^y (1-p)^{n-y-1} + np \sum_{y=0}^{n-1} \binom{n-1}{y} p^y (1-p)^{n-y-1}.$$
Think about the first sum first. It is equal to $E[Z]$ for $Z \sim$ binomial$(n-1, p)$.
What about the second sum? Following a similar reasoning as above, it is equal to
one.
Hence,
$$E[X^2] = np \underbrace{\sum_{y=0}^{n-1} y \binom{n-1}{y} p^y (1-p)^{n-y-1}}_{(n-1)p} + np \underbrace{\sum_{y=0}^{n-1} \binom{n-1}{y} p^y (1-p)^{n-y-1}}_{1} = n(n-1)p^2 + np.$$
Finally,
$$\operatorname{Var}(X) = E[X^2] - \mu^2 = n(n-1)p^2 + np - (np)^2 = -np^2 + np = np(1-p).$$
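These closed forms are easy to confirm numerically (my addition, assuming SciPy), for instance with the illustrative values $n = 10$ and $p = 0.3$:

```python
from scipy import stats

n, p = 10, 0.3
mean, var = stats.binom.stats(n, p, moments="mv")

print(mean, n * p)              # 3.0 vs n*p
print(var, n * p * (1 - p))     # 2.1 vs n*p*(1-p)
```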

(Bilkent) ECON509 This Version: 21 Oct 2013 42 / 110


Common Families of Distributions
Discrete Distributions: Binomial Distribution

Example (3.2.3): Suppose we are interested in finding the probability of obtaining
at least one 6 in four rolls of a fair die. This experiment can be modelled as a
sequence of four Bernoulli trials with success probability $P(\text{die shows 6}) = p = 1/6$.
Define the random variable $X$ by
$$X = \text{total number of 6s in four rolls}.$$
Then $X \sim \text{binomial}(4, 1/6)$ and
$$P(\text{at least one 6}) = P(X > 0) = 1 - P(X = 0) = 1 - \binom{4}{0} \left( \frac{1}{6} \right)^0 \left( \frac{5}{6} \right)^4 = 1 - \left( \frac{5}{6} \right)^4 = .518.$$

(Bilkent) ECON509 This Version: 21 Oct 2013 43 / 110


Common Families of Distributions
Discrete Distributions: Binomial Distribution

Example (3.2.3) - cont'd: Now we consider another game: throw a pair of dice 24
times and ask for the probability of at least one double 6. This, again, can be
modelled by the binomial distribution with success probability $p$, where
$$p = P(\text{roll a double 6}) = \frac{1}{36}.$$
So, if $Y$ = number of double 6s in 24 rolls, $Y \sim \text{binomial}(24, 1/36)$ and
$$P(\text{at least one double 6}) = P(Y > 0) = 1 - P(Y = 0) = 1 - \binom{24}{0} \left( \frac{1}{36} \right)^0 \left( \frac{35}{36} \right)^{24} = 1 - \left( \frac{35}{36} \right)^{24} = .491.$$
This is the calculation originally done in the eighteenth century by Pascal at the
request of the gambler de Méré, who thought both events had the same probability.
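A short computation reproducing both of de Méré's probabilities (my addition, plain Python):

```python
# P(at least one 6 in 4 rolls of one die)
p_single = 1 - (5 / 6) ** 4

# P(at least one double 6 in 24 rolls of a pair of dice)
p_double = 1 - (35 / 36) ** 24

print(round(p_single, 3), round(p_double, 3))   # 0.518 and 0.491
```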

(Bilkent) ECON509 This Version: 21 Oct 2013 44 / 110


Common Families of Distributions
Discrete Distributions: Poisson Distribution

If we are modelling a phenomenon in which we are waiting for an occurrence (such


as waiting for a bus, waiting for customers to arrive in a bank), the number of
occurrences in a given time interval can sometimes be modelled by the Poisson
distribution.
The basic assumption is as follows: for small time intervals, the probability of an
arrival is proportional to the length of waiting time.
If we are waiting for the bus, the probability that a bus will arrive within the next
hour is higher than the probability that it will arrive within 5 minutes.
Similarly, the more we wait, the more likely it is that a customer will enter the bank.
Other possible applications are the distribution of bomb hits in an area or the
distribution of fish in a lake.
The only parameter is λ, also sometimes called the “intensity parameter.”

(Bilkent) ECON509 This Version: 21 Oct 2013 45 / 110


Common Families of Distributions
Discrete Distributions: Poisson Distribution

A random variable $X$, taking values in the nonnegative integers, has a Poisson$(\lambda)$
distribution if
$$P(X = x \mid \lambda) = \frac{e^{-\lambda} \lambda^x}{x!}, \quad x = 0, 1, \ldots\ .$$
Is $\sum_{x=0}^{\infty} P(X = x \mid \lambda) = 1$? The Taylor series expansion of $e^y$ is given by
$$e^y = \sum_{i=0}^{\infty} \frac{y^i}{i!}.$$
Don't worry about this for the moment, if you are not familiar with this expansion.
We will cover it later in the course.
So,
$$\sum_{x=0}^{\infty} \frac{e^{-\lambda} \lambda^x}{x!} = e^{-\lambda} \underbrace{\sum_{x=0}^{\infty} \frac{\lambda^x}{x!}}_{e^{\lambda}} = 1.$$

(Bilkent) ECON509 This Version: 21 Oct 2013 46 / 110


Common Families of Distributions
Discrete Distributions: Poisson Distribution

Let's calculate the mean:
$$E[X] = \sum_{x=0}^{\infty} x \frac{e^{-\lambda} \lambda^x}{x!} = \lambda e^{-\lambda} \sum_{x=1}^{\infty} \frac{\lambda^{x-1}}{(x-1)!} = \lambda e^{-\lambda} \underbrace{\sum_{y=0}^{\infty} \frac{\lambda^y}{y!}}_{e^{\lambda}} = \lambda,$$
where we substituted $y = x - 1$.

(Bilkent) ECON509 This Version: 21 Oct 2013 47 / 110


Common Families of Distributions
Discrete Distributions: Poisson Distribution

Similarly,
$$E[X^2] = \sum_{x=0}^{\infty} x^2 \frac{e^{-\lambda} \lambda^x}{x!} = \lambda \sum_{x=1}^{\infty} x e^{-\lambda} \frac{\lambda^{x-1}}{(x-1)!}$$
$$= \lambda \sum_{x=1}^{\infty} (x-1) e^{-\lambda} \frac{\lambda^{x-1}}{(x-1)!} + \lambda \sum_{x=1}^{\infty} e^{-\lambda} \frac{\lambda^{x-1}}{(x-1)!}$$
$$= \lambda \sum_{y=0}^{\infty} y e^{-\lambda} \frac{\lambda^y}{y!} + \lambda e^{-\lambda} \underbrace{\sum_{y=0}^{\infty} \frac{\lambda^y}{y!}}_{e^{\lambda}},$$
where $\sum_{y=0}^{\infty} y e^{-\lambda} \frac{\lambda^y}{y!} = E[Y] = \lambda$ for $Y \sim \text{Poisson}(\lambda)$. Therefore,
$$E[X^2] = \lambda^2 + \lambda.$$
Thus,
$$\operatorname{Var}(X) = E[X^2] - \mu^2 = \lambda^2 + \lambda - \lambda^2 = \lambda.$$

(Bilkent) ECON509 This Version: 21 Oct 2013 48 / 110


Common Families of Distributions
Discrete Distributions: Poisson Distribution

Example (3.2.4): As an example of a waiting-for-occurrence application, consider a
telephone operator who, on average, handles five calls every 3 minutes. What is the
probability that there will be no calls in the next minute? At least two calls?
If we let $X$ = number of calls in a minute, then $X$ has a Poisson distribution with
$E[X] = \lambda = 5/3$. So,
$$P(\text{no calls in the next minute}) = P(X = 0) = \frac{e^{-5/3} (5/3)^0}{0!} = e^{-5/3} = .189,$$
$$\text{and} \quad P(\text{at least two calls in the next minute}) = P(X \geq 2) = 1 - P(X = 0) - P(X = 1) = 1 - .189 - \frac{e^{-5/3} (5/3)^1}{1!} = .496.$$
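The same two numbers can be obtained directly from a Poisson pmf routine (my addition, assuming SciPy):

```python
from scipy import stats

lam = 5 / 3
p0 = stats.poisson.pmf(0, lam)          # P(X = 0)
p1 = stats.poisson.pmf(1, lam)          # P(X = 1)

print(round(p0, 3))                     # 0.189
print(round(1 - p0 - p1, 3))            # 0.496, i.e. P(X >= 2)
```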

(Bilkent) ECON509 This Version: 21 Oct 2013 49 / 110


Common Families of Distributions
Discrete Distributions: Poisson Distribution

Examples of other discrete distributions are geometric distribution, negative binomial


distribution, hypergeometric distribution etc. You can check them out yourselves; we
will not cover them here.

(Bilkent) ECON509 This Version: 21 Oct 2013 50 / 110


Common Families of Distributions
Continuous Distributions

We will now cover some continuous distributions. These are


1 Uniform distribution
2 Gamma distribution
3 χ2 -distribution
4 Exponential distribution
5 Normal distribution
6 t-distribution

(Bilkent) ECON509 This Version: 21 Oct 2013 51 / 110


Common Families of Distributions
Continuous Distributions: Uniform Distribution

The continuous uniform distribution is defined by spreading mass uniformly over an
interval $[a, b]$. Its pdf is given by
$$f(x \mid a, b) = \begin{cases} \frac{1}{b-a} & \text{if } x \in [a, b] \\ 0 & \text{otherwise} \end{cases}.$$
One can easily show that
$$\int_a^b f(x)\,dx = 1, \qquad E[X] = \frac{b+a}{2}, \qquad \operatorname{Var}(X) = \frac{(b-a)^2}{12}.$$
In many cases, when people say Uniform distribution, they implicitly mean
$(a, b) = (0, 1)$.

(Bilkent) ECON509 This Version: 21 Oct 2013 52 / 110


Common Families of Distributions
Continuous Distributions: Gamma Distribution

The gamma family of distributions is a flexible family of distributions on $[0, \infty)$.
If $\alpha$ is a positive constant, the gamma function is given by
$$\Gamma(\alpha) = \int_0^{\infty} t^{\alpha-1} e^{-t}\,dt < \infty.$$
Importantly, it can be verified using integration by parts that
$$\Gamma(\alpha + 1) = \alpha \Gamma(\alpha), \quad \alpha > 0.$$
Notice that
$$\Gamma(1) = \int_0^{\infty} e^{-t}\,dt = 1,$$
and so, for any integer $n > 0$,
$$\Gamma(n) = (n-1)!\ .$$
Finally,
$$\Gamma(1/2) = \sqrt{\pi}.$$
These results make life a lot easier when calculating $\Gamma(c)$ for some positive $c$.

(Bilkent) ECON509 This Version: 21 Oct 2013 53 / 110


Common Families of Distributions
Continuous Distributions: Gamma Distribution

The gamma$(\alpha, \beta)$ family is given by
$$f(x \mid \alpha, \beta) = \frac{1}{\Gamma(\alpha) \beta^{\alpha}} x^{\alpha-1} e^{-x/\beta}, \quad 0 < x < \infty, \quad \alpha > 0, \quad \beta > 0.$$
$\alpha$ is known as the shape parameter, since it most influences the peakedness of the
distribution.
$\beta$ is called the scale parameter, since most of its influence is on the spread of the
distribution.

(Bilkent) ECON509 This Version: 21 Oct 2013 54 / 110


Common Families of Distributions
Continuous Distributions: Gamma Distribution

Figure: Plots of the Gamma distribution for various values of $\alpha$ and $\beta$ ($\alpha=1,\beta=2$; $\alpha=2,\beta=2$; $\alpha=3,\beta=2$; $\alpha=5,\beta=1$; $\alpha=9,\beta=0.5$).

(Bilkent) ECON509 This Version: 21 Oct 2013 55 / 110


Common Families of Distributions
Continuous Distributions: Gamma Distribution
Now,
$$E[X] = \frac{1}{\Gamma(\alpha) \beta^{\alpha}} \int_0^{\infty} x \cdot x^{\alpha-1} e^{-x/\beta}\,dx.$$
This is almost identical to $\int_0^{\infty} f(x \mid \alpha+1, \beta)\,dx$. Notice that, by definition,
$$\int_0^{\infty} f(x \mid \alpha, \beta)\,dx = \int_0^{\infty} \frac{1}{\Gamma(\alpha) \beta^{\alpha}} x^{\alpha-1} e^{-x/\beta}\,dx = 1.$$
Then,
$$\int_0^{\infty} x^{\alpha-1} e^{-x/\beta}\,dx = \Gamma(\alpha) \beta^{\alpha}, \qquad \int_0^{\infty} x^{\alpha} e^{-x/\beta}\,dx = \Gamma(\alpha+1) \beta^{\alpha+1}.$$
So,
$$E[X] = \frac{1}{\Gamma(\alpha) \beta^{\alpha}} \int_0^{\infty} x \cdot x^{\alpha-1} e^{-x/\beta}\,dx = \frac{\Gamma(\alpha+1) \beta^{\alpha+1}}{\Gamma(\alpha) \beta^{\alpha}} = \frac{\alpha \Gamma(\alpha) \beta^{\alpha+1}}{\Gamma(\alpha) \beta^{\alpha}} = \alpha\beta.$$
You will be asked in an exercise to show that
$$\operatorname{Var}(X) = \alpha\beta^2.$$
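Both moments can be checked by numerical integration against the gamma pdf (my addition, assuming SciPy; the values $\alpha = 3$, $\beta = 2$ are illustrative):

```python
import numpy as np
from scipy import integrate, special

alpha, beta = 3.0, 2.0
pdf = lambda x: x**(alpha - 1) * np.exp(-x / beta) / (special.gamma(alpha) * beta**alpha)

mean, _ = integrate.quad(lambda x: x * pdf(x), 0, np.inf)
second, _ = integrate.quad(lambda x: x**2 * pdf(x), 0, np.inf)

print(mean, alpha * beta)                    # ~6.0 for both
print(second - mean**2, alpha * beta**2)     # ~12.0 for both
```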

(Bilkent) ECON509 This Version: 21 Oct 2013 56 / 110


Common Families of Distributions
Continuous Distributions: Chi Squared Distribution

Some interesting distribution functions are special cases of the gamma distribution.
For example, for $\alpha = p/2$, where $p$ is an integer, and $\beta = 2$,
$$f(x \mid \alpha, \beta) = f(x \mid p) = \frac{1}{\Gamma(p/2)\,2^{p/2}} x^{(p/2)-1} e^{-x/2}, \quad 0 < x < \infty.$$
This is known as the pdf of the $\chi^2_{(p)}$ distribution, read "chi squared distribution with
$p$ degrees of freedom".
Obviously, for $X \sim \chi^2_{(p)}$,
$$E[X] = p \quad \text{and} \quad \operatorname{Var}(X) = 2p.$$
This distribution is used heavily in hypothesis testing. You will come back to this
distribution next term.

(Bilkent) ECON509 This Version: 21 Oct 2013 57 / 110


Common Families of Distributions
Continuous Distributions: Exponential Distribution

Now consider $\alpha = 1$:
$$f(x \mid \alpha, \beta) = f(x \mid 1, \beta) = \frac{1}{\Gamma(1) \beta^1} x^0 e^{-x/\beta} = \frac{1}{\beta} e^{-x/\beta}, \quad 0 < x < \infty.$$
Again, using our previous results, for $X \sim \text{exponential}(\beta)$ we have
$$E[X] = \beta \quad \text{and} \quad \operatorname{Var}(X) = \beta^2.$$
A peculiar feature of this distribution is that it has no memory.

(Bilkent) ECON509 This Version: 21 Oct 2013 58 / 110


Common Families of Distributions
Continuous Distributions: Exponential Distribution

If $X \sim \text{exponential}(\beta)$, then, for $s > t \geq 0$,
$$P(X > s \mid X > t) = \frac{P(X > s,\ X > t)}{P(X > t)} = \frac{P(X > s)}{P(X > t)} = \frac{\int_s^{\infty} \frac{1}{\beta} e^{-x/\beta}\,dx}{\int_t^{\infty} \frac{1}{\beta} e^{-x/\beta}\,dx} = \frac{e^{-s/\beta}}{e^{-t/\beta}} = e^{-(s-t)/\beta} = P(X > s - t).$$
This is because
$$\int_{s-t}^{\infty} \frac{1}{\beta} e^{-x/\beta}\,dx = \left[ -e^{-x/\beta} \right]_{x=s-t}^{\infty} = e^{-(s-t)/\beta}.$$
What does this mean? When calculating $P(X > s \mid X > t)$, what matters is not
whether $X$ has passed a threshold or not. What matters is the distance between the
threshold and the value to be reached.
If Mr X has been fired twice, what is the probability that he will be fired three
times? It is not different from the probability that a person, who has never been
fired, is fired. History does not matter.
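A small simulation illustrating the memoryless property (my addition, assuming NumPy; the values $\beta = 2$, $t = 1$, $s = 3$ are illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)
beta, t, s = 2.0, 1.0, 3.0
x = rng.exponential(scale=beta, size=2_000_000)

cond = (x > s).sum() / (x > t).sum()    # estimate of P(X > s | X > t)
fresh = (x > s - t).mean()              # estimate of P(X > s - t)

print(cond, fresh, np.exp(-(s - t) / beta))   # all three should be close
```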

(Bilkent) ECON509 This Version: 21 Oct 2013 59 / 110


Common Families of Distributions
Continuous Distributions: Exponential Distribution

Example 3.3.1: Let $X \sim \text{gamma}(\alpha, \beta)$ where $\alpha$ is an integer and let
$Y \sim \text{Poisson}(x/\beta)$. Then, for any $x$,
$$P(X \leq x) = P(Y \geq \alpha). \qquad (7)$$
Let's sketch the proof of this result. Remember that, if $\alpha$ is an integer, then
$$\Gamma(\alpha) = (\alpha - 1)!\ .$$
Hence,
$$P(X \leq x) = \frac{1}{(\alpha-1)!\,\beta^{\alpha}} \int_0^x t^{\alpha-1} e^{-t/\beta}\,dt = \frac{1}{(\alpha-1)!\,\beta^{\alpha}} \left[ \left. -t^{\alpha-1} \beta e^{-t/\beta} \right|_{t=0}^{x} + \int_0^x (\alpha-1) t^{\alpha-2} \beta e^{-t/\beta}\,dt \right],$$
where we use integration by parts with
$$u = t^{\alpha-1}, \quad du = (\alpha-1) t^{\alpha-2}\,dt, \qquad v = -\beta e^{-t/\beta}, \quad dv = e^{-t/\beta}\,dt.$$
(Bilkent) ECON509 This Version: 21 Oct 2013 60 / 110


Common Families of Distributions
Continuous Distributions: Exponential Distribution

Now,
$$P(X \leq x) = \frac{1}{(\alpha-1)!\,\beta^{\alpha}} \left[ \left. -t^{\alpha-1} \beta e^{-t/\beta} \right|_{t=0}^{x} + \int_0^x (\alpha-1) t^{\alpha-2} \beta e^{-t/\beta}\,dt \right]$$
$$= -\frac{(x/\beta)^{\alpha-1} e^{-x/\beta}}{(\alpha-1)!} + \frac{1}{(\alpha-2)!\,\beta^{\alpha-1}} \int_0^x t^{\alpha-2} e^{-t/\beta}\,dt$$
$$= -P(Y = \alpha - 1) + \frac{1}{(\alpha-2)!\,\beta^{\alpha-1}} \int_0^x t^{\alpha-2} e^{-t/\beta}\,dt,$$
where $Y \sim \text{Poisson}(x/\beta)$. If we keep doing the same operation, we will eventually
obtain (7).

(Bilkent) ECON509 This Version: 21 Oct 2013 61 / 110


Common Families of Distributions
Continuous Distributions: Normal Distribution

The normal distribution or the Gaussian distribution is the one distribution which
you should know by heart!
Why is this distribution so popular?
1. Analytical tractability.
2. It is bell shaped and symmetric.
3. It is central to the Central Limit Theorem; this type of result guarantees that, under
(mild) conditions, the normal distribution can be used to approximate a large variety of
distributions in large samples.
The distribution has two parameters: mean and variance, denoted by $\mu$ and $\sigma^2$,
respectively.
The pdf is given by
$$f(x \mid \mu, \sigma^2) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left( -\frac{1}{2} \frac{(x-\mu)^2}{\sigma^2} \right).$$
You should memorise this!!!

(Bilkent) ECON509 This Version: 21 Oct 2013 62 / 110


Common Families of Distributions
Continuous Distributions: Normal Distribution

This distribution is usually denoted as $N(\mu, \sigma^2)$.
A very useful result is that for $X \sim N(\mu, \sigma^2)$,
$$Z = \frac{X - \mu}{\sigma} \sim N(0, 1).$$
$N(0, 1)$ is known as the standard normal distribution.
To see this, consider the following:
$$P(Z \leq z) = P\left( \frac{X - \mu}{\sigma} \leq z \right) = P(X \leq z\sigma + \mu) = \int_{-\infty}^{z\sigma + \mu} \frac{1}{\sigma\sqrt{2\pi}} e^{-(x-\mu)^2/(2\sigma^2)}\,dx = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}} e^{-t^2/2}\,dt,$$
where we substitute $t = (x - \mu)/\sigma$. Notice that this implies that $dt/dx = 1/\sigma$.
This shows that $P(Z \leq z)$ is the standard normal cdf.

(Bilkent) ECON509 This Version: 21 Oct 2013 63 / 110


Common Families of Distributions
Continuous Distributions: Normal Distribution

Then, we can do all calculations for the standard normal variable and then convert
these results for whatever normal random variable we have in mind.
Consider, for $Z \sim N(0, 1)$, the following:
$$E[Z] = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} z e^{-z^2/2}\,dz = \left[ -\frac{1}{\sqrt{2\pi}} e^{-z^2/2} \right]_{-\infty}^{\infty} = 0.$$
Then, to find $E[X]$ for $X \sim N(\mu, \sigma^2)$, we can use $X = \mu + Z\sigma$:
$$E[X] = E[\mu + Z\sigma] = \mu + \sigma E[Z] = \mu + \sigma \cdot 0 = \mu.$$
Similarly,
$$\operatorname{Var}(X) = \operatorname{Var}(\mu + Z\sigma) = \sigma^2 \operatorname{Var}(Z) = \sigma^2.$$
What about
$$\int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}} e^{-z^2/2}\,dz = 1\ ?$$

(Bilkent) ECON509 This Version: 21 Oct 2013 64 / 110


Common Families of Distributions
Continuous Distributions: Normal Distribution

Remember first that
$$\sqrt{\pi} = \Gamma\left( \frac{1}{2} \right) = \int_0^{\infty} w^{-1/2} e^{-w}\,dw.$$
Moreover, the integrand is symmetric about 0. So, first observe that for $w = z^2/2$
we have $dw = z\,dz$. Then, using this substitution we obtain
$$\int_0^{\infty} \frac{1}{\sqrt{2\pi}} e^{-z^2/2}\,dz = \int_0^{\infty} \frac{1}{\sqrt{2\pi}} \frac{1}{\sqrt{2w}} e^{-w}\,dw = \frac{1}{2\sqrt{\pi}} \underbrace{\int_0^{\infty} w^{-1/2} e^{-w}\,dw}_{\Gamma(1/2)} = \frac{1}{2}.$$
Hence,
$$\int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}} e^{-z^2/2}\,dz = 1.$$

(Bilkent) ECON509 This Version: 21 Oct 2013 65 / 110


Common Families of Distributions
Continuous Distributions: Normal Distribution

An important characteristic of the normal distribution is that the shape and location
of the distribution are determined completely by its two parameters.
It can be shown easily that the normal pdf has its maximum at $x = \mu$.
The probability content within 1, 2, or 3 standard deviations of the mean is
$$P(|X - \mu| \leq \sigma) = P(|Z| \leq 1) = .6826,$$
$$P(|X - \mu| \leq 2\sigma) = P(|Z| \leq 2) = .9544,$$
$$P(|X - \mu| \leq 3\sigma) = P(|Z| \leq 3) = .9974,$$
where $X \sim N(\mu, \sigma^2)$ and $Z \sim N(0, 1)$.
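These three probabilities are easy to reproduce from the standard normal cdf (my addition, assuming SciPy):

```python
from scipy import stats

for k in (1, 2, 3):
    # P(|Z| <= k) = Phi(k) - Phi(-k)
    print(k, stats.norm.cdf(k) - stats.norm.cdf(-k))
# prints approximately 0.6827, 0.9545, 0.9973
```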

(Bilkent) ECON509 This Version: 21 Oct 2013 66 / 110


Common Families of Distributions
Continuous Distributions: Normal Distribution

Figure: The standard normal distribution.

(Bilkent) ECON509 This Version: 21 Oct 2013 67 / 110


Common Families of Distributions
Continuous Distributions: Normal Distribution

Figure: The standard normal distribution.

(Bilkent) ECON509 This Version: 21 Oct 2013 68 / 110


Common Families of Distributions
Continuous Distributions: Lognormal Distribution

Let $X$ be a random variable such that
$$\log X \sim N(\mu, \sigma^2).$$
Then, $X$ is said to have a lognormal distribution.
By using a transformation argument (Theorem (2.1.5)), the pdf of $X$ is given by
$$f(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \frac{1}{x} \exp\left( -\frac{(\log x - \mu)^2}{2\sigma^2} \right),$$
where $0 < x < \infty$, $-\infty < \mu < \infty$, and $\sigma > 0$.
How? Take $W = \log X$. We start from the distribution of $W$ and want to find the
distribution of $X = \exp W$. Then, $g(W) = \exp(W)$ and $g^{-1}(X) = \log(X)$. The
rest follows by using Theorem (2.1.5).

(Bilkent) ECON509 This Version: 21 Oct 2013 69 / 110


Common Families of Distributions
Continuous Distributions: Lognormal Distribution

We will prove later that
$$E[X] = \exp\left( \mu + \frac{\sigma^2}{2} \right) \quad \text{and} \quad \operatorname{Var}(X) = \exp\left[ 2(\mu + \sigma^2) \right] - \exp\left[ 2\mu + \sigma^2 \right].$$
Why use this distribution? It is similar in appearance to the Gamma distribution: it
is skewed to the right. It is convenient for some variables which are skewed to the
right, such as income.
But why not use the Gamma instead? The lognormal is based on the normal distribution, so
it allows one to use normal-theory statistics, which is technically more convenient.

(Bilkent) ECON509 This Version: 21 Oct 2013 70 / 110


Common Families of Distributions
Continuous Distributions: t-Distribution

This distribution is also known as Student's $t$ distribution.
It is generated by a ratio of two random variables. Suppose $Z \sim N(0, 1)$, $X \sim \chi^2_v$
and that $X$ and $Z$ are independent. Then,
$$t_v = \frac{Z}{\sqrt{X/v}}$$
is a random variable with a $t$-distribution with $v$ degrees of freedom.
This distribution is also symmetric about 0.
An important feature of Student's $t$ random variables is that
$$E[|t_v|^r] < \infty \quad \text{iff} \quad v > r.$$
When $v = 1$, the distribution is called the Cauchy distribution. Note that in this
case, even the mean does not exist.

(Bilkent) ECON509 This Version: 21 Oct 2013 71 / 110


Common Families of Distributions
Location and Scale Families

The use of the term "family" will be a bit less intuitive in this part.
We will consider three types of families under this heading: location families, scale
families and location-scale families.
Let's start with the following theorem.
Theorem (3.5.1): Let $f(x)$ be any pdf and let $\mu$ and $\sigma > 0$ be any given constants.
Then the function
$$g(x \mid \mu, \sigma) = \frac{1}{\sigma} f\left( \frac{x - \mu}{\sigma} \right)$$
is a pdf.

(Bilkent) ECON509 This Version: 21 Oct 2013 72 / 110


Common Families of Distributions
Location and Scale Families

Proof: What we need to check is that
$$\frac{1}{\sigma} f\left( \frac{x - \mu}{\sigma} \right), \qquad (8)$$
as a function of $x$, is a pdf for every value of $\mu$ and $\sigma$. In other words, we need to
ensure that (8)
1. is nonnegative,
2. integrates to one.
Observe that $f(\cdot)$ is a pdf, so $f\left( \frac{x-\mu}{\sigma} \right) \geq 0$ for all values of $x$, $\mu$ and $\sigma$.
Consequently, $\frac{1}{\sigma} f\left( \frac{x-\mu}{\sigma} \right) \geq 0$.
Moreover,
$$\int_{-\infty}^{\infty} \frac{1}{\sigma} f\left( \frac{x - \mu}{\sigma} \right) dx = \int_{-\infty}^{\infty} f(y)\,dy = 1,$$
by substituting $y = \frac{x - \mu}{\sigma}$, which implies $dy = \frac{1}{\sigma}\,dx$. The second equality follows
from the fact that $f(\cdot)$ is a pdf.

(Bilkent) ECON509 This Version: 21 Oct 2013 73 / 110


Common Families of Distributions
Location and Scale Families

Definition (3.5.2): Let $f(x)$ be any pdf. Then the family of pdfs $f(x - \mu)$, indexed
by the parameter $\mu$, $-\infty < \mu < \infty$, is called the location family with standard pdf
$f(x)$ and $\mu$ is called the location parameter for the family.
This simply changes the location of the distribution without changing any other
properties of it.
Figure: The standard pdf $f(x)$ and the shifted pdf $f(x - \mu)$ with $\mu = 3$.

(Bilkent) ECON509 This Version: 21 Oct 2013 74 / 110


Common Families of Distributions
Location and Scale Families

In general, at $x = \mu + a$, we have $f(x - \mu) = f(a)$. Therefore, the transformation
means that all points have been shifted by $\mu$ units.
So, the distributions that belong to a location family are identical except that their
locations vary according to the particular value of $\mu$.
In the previous figure, $\mu = 3$. What this does is simply shift the distribution by
three units.
If $X$ has pdf $f(x - \mu)$, then, for example,
$$P(-1 \leq X \leq 2 \mid \mu = 0) = P(\mu - 1 \leq X \leq \mu + 2 \mid \mu = 3) = P(2 \leq X \leq 5 \mid \mu = 3).$$

(Bilkent) ECON509 This Version: 21 Oct 2013 75 / 110


Common Families of Distributions
Location and Scale Families

Some of the families of continuous distributions discussed here have location families.
Consider the normal distribution for some specified $\sigma > 0$:
$$f(x \mid \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{x^2}{2\sigma^2} \right).$$
By replacing $x$ with $x - \mu$, we will obtain a normal distribution with a different mean
for each particular value of $\mu$. Hence, the location family with standard pdf $f(x)$ is the
set of normal distributions with unknown mean $\mu$ and known variance $\sigma^2$.
Definition (3.5.2) simply says that we can start with any pdf $f(x)$ and introduce a
location parameter $\mu$ which will generate a family of pdfs.

(Bilkent) ECON509 This Version: 21 Oct 2013 76 / 110


Common Families of Distributions
Location and Scale Families

Now, consider $X$ such that its pdf is given by $f(x - \mu)$.
Then, one can represent $X$ as $X = Z + \mu$, where $Z$ is a random variable with pdf
$f(z)$. We will formalise this idea in a moment.
However, for the time being, it would be revealing to think about situations where
considering a framework of location families would be useful.
Suppose we want to measure some physical constant $\mu$, but our measurement is
subject to a measurement error, $Z$. Then, what we observe is $X = Z + \mu$. For some
reason, we might have a good idea about the distribution of this measurement error,
with pdf $f(z)$. Then, the pdf of the observed value is given by $f(x - \mu)$.
Now, let $Z$ be the reaction time of any given driver, with known pdf $f(z)$. Let $\mu$
be the treatment effect for, say, three glasses of wine; obviously, alcohol
consumption affects the reaction time. Then, after the treatment, the reaction time
will be $X = Z + \mu$. Consequently, the family of possible distributions for $X$ will be
given by $f(x - \mu)$.

(Bilkent) ECON509 This Version: 21 Oct 2013 77 / 110


Common Families of Distributions
Location and Scale Families

We now introduce scale families.
Definition (3.5.4): Let $f(x)$ be any pdf. Then for any $\sigma > 0$, the family of pdfs
$(1/\sigma)f(x/\sigma)$, indexed by the parameter $\sigma$, is called the scale family with standard
pdf $f(x)$ and $\sigma$ is called the scale parameter of the family.
This usually serves the purpose of either stretching ($\sigma > 1$) or contracting ($\sigma < 1$)
the distribution while still keeping the same shape.
Most often, this is used when either $f(x)$ is symmetric about 0 or positive only for
$x > 0$. Correspondingly, stretching will either be symmetric around zero or in the
positive direction, respectively.
Examples of scale families are the normal family with $\mu = 0$ and $\sigma$ the scale
parameter, the gamma family if $\alpha$ is a fixed value and $\beta$ is the scale parameter, and
the exponential family.
In any case, the standard pdf is obtained by setting $\sigma = 1$, which yields
$(1/1)f(x/1) = f(x)$.

(Bilkent) ECON509 This Version: 21 Oct 2013 78 / 110


Common Families of Distributions
Location and Scale Families

Figure: The normal distribution centred at zero, for varying values of $\sigma^2$ ($\sigma^2 = 1, 2, 4$).

(Bilkent) ECON509 This Version: 21 Oct 2013 79 / 110


Common Families of Distributions
Location and Scale Families

Definition (3.5.5): Let $f(x)$ be any pdf. Then for any $\mu$, $-\infty < \mu < \infty$, and any
$\sigma > 0$, the family of pdfs
$$\frac{1}{\sigma} f\left( \frac{x - \mu}{\sigma} \right),$$
indexed by the parameter $(\mu, \sigma)$, is called the location-scale family with standard
pdf $f(x)$; $\mu$ is called the location parameter and $\sigma$ is called the scale parameter.
From the previous discussion, it is obvious that the scale parameter is used to
stretch/contract the distribution while the location parameter shifts the distribution.
The normal family is an example of a location-scale family.

(Bilkent) ECON509 This Version: 21 Oct 2013 80 / 110


Common Families of Distributions
Location and Scale Families

Figure: An example of a location-scale family: the normal distribution for varying values of $\mu$ and $\sigma$.

(Bilkent) ECON509 This Version: 21 Oct 2013 81 / 110


Common Families of Distributions
Location and Scale Families

Theorem (3.5.6): Let $f(\cdot)$ be any pdf. Let $\mu$ be any real number, and let $\sigma$ be any
positive real number. Then $X$ is a random variable with pdf
$$\frac{1}{\sigma} f\left( \frac{x - \mu}{\sigma} \right)$$
if and only if there exists a random variable $Z$ with pdf $f(z)$ and $X = \sigma Z + \mu$.
Proof: Let $g(z) = \sigma z + \mu$. Then $X = g(Z)$, $g$ is a monotone function,
$g^{-1}(x) = (x - \mu)/\sigma$, and $(d/dx) g^{-1}(x) = 1/\sigma$. Thus, by Theorem (2.1.5), the
pdf of $X$ is
$$f_X(x) = f_Z(g^{-1}(x)) \left| \frac{d}{dx} g^{-1}(x) \right| = \frac{1}{\sigma} f\left( \frac{x - \mu}{\sigma} \right).$$
This proves the "if" part.

(Bilkent) ECON509 This Version: 21 Oct 2013 82 / 110


Common Families of Distributions
Location and Scale Families

Proof (cont'd): Now define $g(x) = (x - \mu)/\sigma$ and let $Z = g(X)$. Theorem (2.1.5)
again applies: $g^{-1}(z) = \sigma z + \mu$, $(d/dz) g^{-1}(z) = \sigma$, and the pdf of $Z$ is
$$f_Z(z) = f_X(g^{-1}(z)) \left| \frac{d}{dz} g^{-1}(z) \right| = \frac{1}{\sigma} f\left( \frac{\sigma z + \mu - \mu}{\sigma} \right) \sigma = f(z).$$
Also,
$$\sigma Z + \mu = \sigma g(X) + \mu = \sigma \frac{X - \mu}{\sigma} + \mu = X.$$
This proves the "only if" part.

(Bilkent) ECON509 This Version: 21 Oct 2013 83 / 110


Common Families of Distributions
Location and Scale Families

Now, let's consider $Z = (X - \mu)/\sigma$ again, for a second. According to Theorem
(3.5.6), the pdf for $Z$ is given by
$$f_Z(z) = f(z),$$
which is the same as $\frac{1}{\sigma} f\left( \frac{x - \mu}{\sigma} \right)$ for $\mu = 0$ and $\sigma = 1$.
Therefore, the distribution of $Z$ is that member of the location-scale family
corresponding to $\mu = 0$ and $\sigma = 1$. For the normal family, remember that we have
already shown that for $Z$ defined as above, $Z$ is a normally distributed random
variable with $\mu = 0$ and $\sigma = 1$.

(Bilkent) ECON509 This Version: 21 Oct 2013 84 / 110


Common Families of Distributions
Location and Scale Families

Note that probabilities for any member of a location-scale family may be computed
in terms of the standard variable $Z$.
This is because
$$P(X \leq x) = P\left( \frac{X - \mu}{\sigma} \leq \frac{x - \mu}{\sigma} \right) = P\left( Z \leq \frac{x - \mu}{\sigma} \right).$$
Consider the normal family. If we know $P\left( Z \leq \frac{x - \mu}{\sigma} \right)$ for all values of $x$, $\mu$ and $\sigma$,
where $Z$ is distributed with the standard normal distribution, then we can calculate
$P(X \leq x)$ for all values of $x$ where $X$ is a normally distributed random variable with
some mean $\mu$ and variance $\sigma^2$.
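A concrete illustration (my addition, assuming SciPy; the values $\mu = 2$, $\sigma = 3$, $x = 5$ are illustrative): the cdf of $X \sim N(2, 9)$ evaluated through the standard normal cdf.

```python
from scipy import stats

mu, sigma, x = 2.0, 3.0, 5.0

via_standard = stats.norm.cdf((x - mu) / sigma)       # P(Z <= (x - mu)/sigma)
direct = stats.norm.cdf(x, loc=mu, scale=sigma)       # P(X <= x) computed directly

print(via_standard, direct)                           # identical, ~0.8413
```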
There is another special family of densities known as the exponential family, used
very much in statistics and to some extent in econometrics, thanks to its convenient
properties. For the time being we do not cover this topic, but will return to it if time
permits.

(Bilkent) ECON509 This Version: 21 Oct 2013 85 / 110


Moments and Moment Generating Functions

Now, let’s get back on track.


So far we have spoken mainly about the first two orders of moments.
We now introduce a new function that is associated with a probability distribution,
the moment generating function.
This function can be used to obtain moments of a random variable.
In practice, it is much easier in many cases to calculate moments directly than to
use the moment generating function. However, the main use of the mgf is not to
generate moments, but to help in characterising a distribution.

(Bilkent) ECON509 This Version: 21 Oct 2013 86 / 110


Moments and Moment Generating Functions

Definition (2.3.6): Let $X$ be a random variable with cdf $F_X$. The moment
generating function (mgf) of $X$ (or $F_X$), denoted by $M_X(t)$, is
$$M_X(t) = E[e^{tX}],$$
provided that the expectation exists for $t$ in some neighbourhood of 0. That is, there
is an $h > 0$ such that, for all $t$ in $-h < t < h$, $E[e^{tX}]$ exists. If the expectation does
not exist in a neighbourhood of 0, we say that the mgf does not exist.
We can write the mgf of $X$ as
$$M_X(t) = \int_{-\infty}^{\infty} e^{tx} f_X(x)\,dx \quad \text{if } X \text{ is continuous},$$
$$M_X(t) = \sum_x e^{tx} P(X = x) \quad \text{if } X \text{ is discrete}.$$
But why is this called a moment generating function?

(Bilkent) ECON509 This Version: 21 Oct 2013 87 / 110


Moments and Moment Generating Functions

Theorem (2.3.7): If $X$ has mgf $M_X(t)$, then
$$E[X^n] = M_X^{(n)}(0),$$
where we define
$$M_X^{(n)}(0) = \left. \frac{d^n}{dt^n} M_X(t) \right|_{t=0}.$$
That is, the $n$th moment is equal to the $n$th derivative of $M_X(t)$ evaluated at $t = 0$.
Proof: Assuming that we can differentiate under the integral sign,
$$\frac{d}{dt} M_X(t) = \frac{d}{dt} \int_{-\infty}^{\infty} e^{tx} f_X(x)\,dx = \int_{-\infty}^{\infty} \frac{d}{dt} e^{tx} f_X(x)\,dx = \int_{-\infty}^{\infty} (x e^{tx}) f_X(x)\,dx = E[X e^{tX}].$$
Hence,
$$\left. \frac{d}{dt} M_X(t) \right|_{t=0} = E[X e^{tX}] \Big|_{t=0} = E[X].$$

(Bilkent) ECON509 This Version: 21 Oct 2013 88 / 110


Moments and Moment Generating Functions

Similarly,
$$\frac{d^2}{dt^2} M_X(t) = \frac{d^2}{dt^2} \int_{-\infty}^{\infty} e^{tx} f_X(x)\,dx = \int_{-\infty}^{\infty} \frac{d^2}{dt^2} e^{tx} f_X(x)\,dx = \int_{-\infty}^{\infty} (x^2 e^{tx}) f_X(x)\,dx = E[X^2 e^{tX}],$$
and
$$\left. \frac{d^2}{dt^2} M_X(t) \right|_{t=0} = E[X^2 e^{tX}] \Big|_{t=0} = E[X^2].$$
Proceeding in the same manner, it can be shown that
$$\left. \frac{d^n}{dt^n} M_X(t) \right|_{t=0} = E[X^n e^{tX}] \Big|_{t=0} = E[X^n].$$
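The mechanics of Theorem (2.3.7) can be checked symbolically (my addition, assuming SymPy), here for the exponential($\beta$) mgf $M_X(t) = (1-\beta t)^{-1}$, which is the $\alpha = 1$ special case of the gamma mgf derived later:

```python
import sympy as sp

t, beta = sp.symbols("t beta", positive=True)
M = 1 / (1 - beta * t)                 # mgf of the exponential(beta) distribution

first = sp.diff(M, t).subs(t, 0)       # E[X]
second = sp.diff(M, t, 2).subs(t, 0)   # E[X^2]

print(first, second, sp.simplify(second - first**2))   # beta, 2*beta**2, beta**2
```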

(Bilkent) ECON509 This Version: 21 Oct 2013 89 / 110


Moments and Moment Generating Functions

There is an alternative, perhaps less formal, proof which we will also go through.
But first we have to informally introduce a useful tool.
Definition: If a function $g(x)$ has derivatives of order $r$, that is,
$$g^{(r)}(x) = \frac{d^r}{dx^r} g(x)$$
exists, then for any constant $a$, the Taylor expansion of order $r$ about $a$ is
$$g(x) = \sum_{i=0}^{r} \frac{g^{(i)}(a)}{i!} (x - a)^i + R,$$
where $R$ is some (hopefully negligible) remainder term.
For example, the second order expansion is given by
$$g(x) = g(a) + g'(a)(x - a) + \frac{1}{2} g''(a)(x - a)^2 + R.$$
A more formal discussion of this useful tool will be provided later in the course.

(Bilkent) ECON509 This Version: 21 Oct 2013 90 / 110


Moments and Moment Generating Functions
Now, let's get back to the moment generating function.
Informal Proof (Gallant (1997), p. 105):
$$e^{tX} = \left. \frac{e^{tX}}{0!} \right|_{X=0} (X - 0)^0 + \left. \frac{t e^{tX}}{1!} \right|_{X=0} (X - 0)^1 + \left. \frac{t^2 e^{tX}}{2!} \right|_{X=0} (X - 0)^2 + \left. \frac{t^3 e^{tX}}{3!} \right|_{X=0} (X - 0)^3 + \ldots$$
$$= \frac{e^{t \cdot 0}}{0!} (X - 0)^0 + \frac{t e^{t \cdot 0}}{1!} (X - 0)^1 + \frac{t^2 e^{t \cdot 0}}{2!} (X - 0)^2 + \frac{t^3 e^{t \cdot 0}}{3!} (X - 0)^3 + \ldots$$
$$= 1 + tX + \frac{1}{2!} t^2 X^2 + \frac{1}{3!} t^3 X^3 + \ldots,$$
which we obtain by the Taylor expansion of $e^{tX}$ about $X = 0$.
Then,
$$\left. \frac{d^j}{dt^j} M_X(t) \right|_{t=0} = \left. \frac{d^j}{dt^j} E\left[ e^{tX} \right] \right|_{t=0} \qquad (9)$$
$$= \left. E\left[ X^j + t X^{j+1} + \frac{1}{2!} t^2 X^{j+2} + \ldots \right] \right|_{t=0} \qquad (10)$$
$$= E[X^j].$$

(Bilkent) ECON509 This Version: 21 Oct 2013 91 / 110


Moments and Moment Generating Functions

A Word of Caution: Notice that
$$\left. \frac{d^j}{dt^j} M_X(t) \right|_{t=0} = \left. \frac{d^j}{dt^j} E[e^{tX}] \right|_{t=0}.$$
So, we have changed the order of expectation (integration) and differentiation in
(9). If we had not done so, then we would not have obtained the convenient
expression in (10).

(Bilkent) ECON509 This Version: 21 Oct 2013 92 / 110


Moments and Moment Generating Functions
Gamma mgf

We will now talk about the moment generating functions of some common
distributions. But first we introduce the concept of a kernel.
Definition: The kernel of a function is the main part of the function, the part that
remains when constants are disregarded.
Example (2.3.8): Remember that the gamma pdf is
$$f(x) = \frac{1}{\Gamma(\alpha) \beta^{\alpha}} x^{\alpha-1} e^{-x/\beta}, \quad 0 < x < \infty, \quad \alpha > 0, \quad \beta > 0.$$
So its kernel is given by $x^{\alpha-1} e^{-x/\beta}$.
Now,
$$M_X(t) = \frac{1}{\Gamma(\alpha) \beta^{\alpha}} \int_0^{\infty} e^{tx} x^{\alpha-1} e^{-x/\beta}\,dx = \frac{1}{\Gamma(\alpha) \beta^{\alpha}} \int_0^{\infty} x^{\alpha-1} \exp\left( -\left( \frac{1}{\beta} - t \right) x \right) dx$$
$$= \frac{1}{\Gamma(\alpha) \beta^{\alpha}} \int_0^{\infty} x^{\alpha-1} \exp\left( -\frac{x}{\frac{\beta}{1 - \beta t}} \right) dx. \qquad (11)$$

(Bilkent) ECON509 This Version: 21 Oct 2013 93 / 110


Moments and Moment Generating Functions
Gamma mgf
Notice that $x^{\alpha-1} \exp\left( -\frac{x}{\beta/(1-\beta t)} \right)$ is the kernel of another gamma pdf (to see
this, substitute $b = \beta(1 - \beta t)^{-1}$).
In addition, we know that for any $a > 0$, $b > 0$,
$$\int_0^{\infty} \frac{1}{\Gamma(a) b^a} x^{a-1} e^{-x/b}\,dx = 1$$
(can you see why?). Therefore,
$$\int_0^{\infty} x^{a-1} e^{-x/b}\,dx = \Gamma(a) b^a.$$
Hence, picking $a = \alpha$ and $b = \beta(1 - \beta t)^{-1}$,
$$M_X(t) = \frac{\Gamma(a) b^a}{\Gamma(\alpha) \beta^{\alpha}} = \frac{\Gamma(\alpha) \beta^{\alpha}}{\Gamma(\alpha) \beta^{\alpha}} \left( \frac{1}{1 - \beta t} \right)^{\alpha} = \left( \frac{1}{1 - \beta t} \right)^{\alpha} \quad \text{if } t < \frac{1}{\beta}.$$
If $t \geq \frac{1}{\beta}$, then $\beta(1 - \beta t)^{-1} < 0$ and the integral in (11) is infinite.
Then,
$$E[X] = \left. \frac{d}{dt} M_X(t) \right|_{t=0} = \left. \frac{\alpha\beta}{(1 - \beta t)^{\alpha+1}} \right|_{t=0} = \alpha\beta,$$
as we have shown previously.
(Bilkent) ECON509 This Version: 21 Oct 2013 94 / 110
Moments and Moment Generating Functions
Binomial mgf

Example (2.3.9): Remember that the binomial$(n, p)$ pmf is given by
$$f_X(x) = \binom{n}{x} p^x (1-p)^{n-x}.$$
Then,
$$M_X(t) = \sum_{x=0}^{n} e^{tx} \binom{n}{x} p^x (1-p)^{n-x} = \sum_{x=0}^{n} \binom{n}{x} (p e^t)^x (1-p)^{n-x}.$$
Remember the binomial formula from Theorem (3.2.2):
$$(u + v)^n = \sum_{x=0}^{n} \binom{n}{x} u^x v^{n-x}.$$
Then, for $u = p e^t$ and $v = 1 - p$, we have
$$M_X(t) = \left[ p e^t + (1 - p) \right]^n.$$

(Bilkent) ECON509 This Version: 21 Oct 2013 95 / 110


Moments and Moment Generating Functions
Normal mgf

Now consider the pdf for $X \sim N(\mu, \sigma^2)$:
$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[ -\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^2 \right], \quad -\infty < x < \infty.$$
The mgf is given by
$$M_X(t) = E[e^{Xt}] = \int_{-\infty}^{\infty} e^{tx} \frac{1}{\sqrt{2\pi\sigma^2}} e^{-(x-\mu)^2/(2\sigma^2)}\,dx = \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi\sigma^2}} e^{-(x^2 - 2\mu x + \mu^2 - 2\sigma^2 t x)/(2\sigma^2)}\,dx.$$
Observe that one can complete the square as follows:
$$x^2 - 2\mu x + \mu^2 - 2\sigma^2 t x = x^2 - 2(\mu + \sigma^2 t)x + \mu^2$$
$$= \left[ x - (\mu + \sigma^2 t) \right]^2 - (\mu + \sigma^2 t)^2 + \mu^2$$
$$= \left[ x - (\mu + \sigma^2 t) \right]^2 - \left[ 2\mu\sigma^2 t + (\sigma^2 t)^2 \right].$$

(Bilkent) ECON509 This Version: 21 Oct 2013 96 / 110


Moments and Moment Generating Functions
Normal mgf

Therefore,
$$M_X(t) = \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\left\{ \left[ x - (\mu + \sigma^2 t) \right]^2 - \left[ 2\mu\sigma^2 t + (\sigma^2 t)^2 \right] \right\}/(2\sigma^2)}\,dx$$
$$= e^{\left[ 2\mu\sigma^2 t + (\sigma^2 t)^2 \right]/(2\sigma^2)} \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\left[ x - (\mu + \sigma^2 t) \right]^2/(2\sigma^2)}\,dx.$$
Notice that
$$g(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\left[ x - (\mu + \sigma^2 t) \right]^2/(2\sigma^2)}$$
can be considered as the pdf of a random variable $Y \sim N(a, b)$ where $a = \mu + \sigma^2 t$
and $b = \sigma^2$.

(Bilkent) ECON509 This Version: 21 Oct 2013 97 / 110


Moments and Moment Generating Functions
Normal mgf

Then
$$M_X(t) = e^{\left[ 2\mu\sigma^2 t + (\sigma^2 t)^2 \right]/(2\sigma^2)} = \exp\left( \mu t + \frac{\sigma^2 t^2}{2} \right).$$
Clearly,
$$E[X] = \left. \frac{d}{dt} M_X(t) \right|_{t=0} = \left. (\mu + \sigma^2 t) \exp\left( \mu t + \frac{\sigma^2 t^2}{2} \right) \right|_{t=0} = \mu,$$
$$E[X^2] = \left. \frac{d^2}{dt^2} M_X(t) \right|_{t=0} = \left. \sigma^2 \exp\left( \mu t + \frac{\sigma^2 t^2}{2} \right) \right|_{t=0} + \left. (\mu + \sigma^2 t)^2 \exp\left( \mu t + \frac{\sigma^2 t^2}{2} \right) \right|_{t=0} = \sigma^2 + \mu^2,$$
$$\operatorname{Var}(X) = E[X^2] - \{E[X]\}^2 = \sigma^2 + \mu^2 - \mu^2 = \sigma^2.$$
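The same differentiation can be delegated to a computer algebra system as a check (my addition, assuming SymPy):

```python
import sympy as sp

t, mu, sigma = sp.symbols("t mu sigma", real=True, positive=True)
M = sp.exp(mu * t + sigma**2 * t**2 / 2)     # normal mgf

EX = sp.diff(M, t).subs(t, 0)                # first moment
EX2 = sp.diff(M, t, 2).subs(t, 0)            # second moment

print(EX)                                    # mu
print(sp.simplify(EX2 - EX**2))              # sigma**2
```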



Moments and Moment Generating Functions

The major usefulness of the mgf is not its ability to generate moments.
The more important feature is that, in many cases, the moment generating function can characterise a distribution.
However, there are some technical difficulties associated with this feature.
Now, if the mgf exists, it characterises an infinite set of moments. Does characterising the infinite set of moments uniquely determine a distribution function?
Unfortunately, no.
There may actually be two distinct random variables having the same moments.



Moments and Moment Generating Functions

Example (2.3.10): Consider
\[
f_1(x) = \frac{1}{\sqrt{2\pi x^2}} \exp\left[-(\log x)^2/2\right], \qquad 0 \leq x < \infty,
\]
\[
f_2(x) = f_1(x)[1 + \sin(2\pi \log x)], \qquad 0 \leq x < \infty.
\]
Now, for $X_1 \sim f_1(x)$,
\[
E[X_1^r] = e^{r^2/2}, \qquad r = 0, 1, \ldots,
\]
so $X_1$ has all of its moments.
Now, take $X_2 \sim f_2(x)$. Then,
\[
E[X_2^r] = \int_0^{\infty} x^r f_1(x)[1 + \sin(2\pi \log x)]\, dx
= E[X_1^r] + \int_0^{\infty} x^r f_1(x) \sin(2\pi \log x)\, dx.
\]
It can be shown that the second integral is actually equal to zero for $r = 0, 1, \ldots$ .
Hence, even though $X_1$ and $X_2$ have distinct pdfs, they have the same moments for all $r$.
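For the curious, a numerical illustration (not part of the original example; it assumes numpy and scipy, and uses the change of variable $y = \log x$) suggests that the perturbation term indeed integrates to zero for the first few integer $r$, while the moments of $f_1$ match $e^{r^2/2}$:

```python
# Illustrative only: after the change of variable y = log x, the r-th moment of f1 is
# the integral of e^{ry} phi(y) and the perturbation term is the integral of
# e^{ry} phi(y) sin(2*pi*y), where phi is the standard normal pdf.
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

for r in (0, 1, 2, 3):
    moment, _ = quad(lambda y: np.exp(r * y) * norm.pdf(y), -np.inf, np.inf)
    perturb, _ = quad(lambda y: np.exp(r * y) * norm.pdf(y) * np.sin(2 * np.pi * y),
                      -np.inf, np.inf)
    print(r, round(moment, 4), round(np.exp(r**2 / 2), 4), round(perturb, 8))
```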



Moments and Moment Generating Functions

Is this the end of the story?


If random variables have bounded support, then the problem of uniqueness of moments does not occur. In this case, the infinite sequence of moments does uniquely determine the distribution.
In addition, if the mgf exists in a neighbourhood of 0, then the distribution is uniquely determined, no matter what its support.
Theorem (2.3.11): Let $F_X(x)$ and $F_Y(y)$ be two cdfs all of whose moments exist.
1 If $X$ and $Y$ have bounded support, then $F_X(u) = F_Y(u)$ for all $u$ if and only if $E[X^r] = E[Y^r]$ for all integers $r = 0, 1, 2, \ldots$ .
2 If the moment generating functions exist and $M_X(t) = M_Y(t)$ for all $t$ in some neighbourhood of 0, then $F_X(u) = F_Y(u)$ for all $u$.
Notice that in (1), we only say that the moments of all orders exist. This does not
necessarily mean that the mgf exists, as well.
Existence of all moments is not equivalent to existence of the moment generating
function!



Moments and Moment Generating Functions

Now consider the following Theorem.


Theorem (2.3.12): Suppose $\{X_i,\ i = 1, 2, \ldots\}$ is a sequence of random variables, each with mgf $M_{X_i}(t)$. Furthermore, suppose that
\[
\lim_{i \to \infty} M_{X_i}(t) = M_X(t), \qquad \text{for all } t \text{ in a neighbourhood of } 0,
\]
and $M_X(t)$ is an mgf. Then, there is a unique cdf $F_X$ whose moments are determined by $M_X(t)$ and, for all $x$ where $F_X(x)$ is continuous, we have
\[
\lim_{i \to \infty} F_{X_i}(x) = F_X(x).
\]
That is, convergence, for $|t| < h$, of mgfs to an mgf implies convergence of cdfs.



Moments and Moment Generating Functions

The following discussion is an aside.


The possible nonuniqueness of the moment sequence is a nuisance. Even if we can show that a sequence of moments converges, we will not be able to conclude formally that the variables converge.
To do so, we would have to verify the uniqueness of the moment sequence, and this is a very difficult job. For $X \sim F_X$, one would have to show that
\[
\lim_{T \to \infty} \sum_{r=1}^{T} \left\{ E[X^{2r}] \right\}^{-1/2r} = +\infty.
\]
However, if the sequence of mgfs converges in a neighbourhood of 0, then the random variables converge.
Thus, convergence of mgfs is sufficient, but not necessary, for the sequence of random variables to converge.



Moments and Moment Generating Functions

Example (2.3.13): In many elementary textbooks, it is mentioned that one can approximate binomial probabilities by Poisson probabilities, under certain assumptions on the parameters.
Let $X \sim \text{binomial}(n, p)$ and $Y \sim \text{Poisson}(\lambda)$. Let also $\lambda = np$. Then, the rule of thumb for this approximation to work is that $n$ is large and $np$ is small.
Let's see what the moment generating functions say about this. We have already shown that
\[
M_X(t) = [pe^{t} + (1-p)]^{n}.
\]
You will also show in a homework that
\[
M_Y(t) = e^{\lambda(e^{t} - 1)}.
\]



Moments and Moment Generating Functions

The following Lemma will be useful in obtaining the desired result.


Lemma (2.3.14): Let $a_1, a_2, \ldots$ be a sequence of numbers converging to $a$, that is, $\lim_{n \to \infty} a_n = a$. Then,
\[
\lim_{n \to \infty} \left(1 + \frac{a_n}{n}\right)^{n} = e^{a}.
\]
Now, define
\[
M_{X_n}(t) = [pe^{t} + (1-p)]^{n} = \left[1 + \frac{1}{n}(e^{t} - 1)np\right]^{n} = \left[1 + \frac{1}{n}(e^{t} - 1)\lambda\right]^{n},
\]
and, letting $a_n = a = \lambda(e^{t} - 1)$,
\[
\lim_{n \to \infty} M_{X_n}(t) = \lim_{n \to \infty} \left[1 + \frac{1}{n}(e^{t} - 1)\lambda\right]^{n} = \lim_{n \to \infty} \left[1 + \frac{a_n}{n}\right]^{n} = \exp\left[\lambda(e^{t} - 1)\right] = M_Y(t).
\]
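A quick numerical illustration of this convergence (a sketch only, assuming numpy; the values of $\lambda$ and $t$ are arbitrary choices) evaluates the binomial mgf at a fixed $t$ as $n$ grows with $\lambda = np$ held fixed:

```python
# Illustrative only: with lambda = n*p held fixed, the binomial mgf approaches
# the Poisson mgf as n grows, in line with Theorem (2.3.12) and Lemma (2.3.14).
import numpy as np

lam, t = 2.0, 0.5
for n in (10, 100, 1000, 10000):
    p = lam / n
    print(n, round((p * np.exp(t) + (1 - p)) ** n, 6))      # binomial mgf at t
print("Poisson:", round(np.exp(lam * (np.exp(t) - 1)), 6))  # Poisson mgf at t
```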



Moments and Moment Generating Functions

We finish the discussion of moment generating functions with the following Theorem.
Theorem (2.3.15): For any constants $a$ and $b$, the mgf of the random variable $aX + b$ is given by
\[
M_{aX+b}(t) = e^{bt} M_X(at).
\]
Proof: This is pretty easy to show:
\[
M_{aX+b}(t) = E\left[e^{(aX+b)t}\right] = E\left[e^{(aX)t} e^{bt}\right] = e^{bt} E\left[e^{(at)X}\right] = e^{bt} M_X(at),
\]
where the last line follows from the definition of the moment generating function.
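A small Monte Carlo sanity check of the theorem (illustrative only; it assumes numpy, uses the normal mgf derived earlier, and the values of $a$, $b$, $\mu$, $\sigma$ and $t$ are arbitrary):

```python
# Illustrative only: Monte Carlo check of M_{aX+b}(t) = e^{bt} M_X(at) for normal X.
import numpy as np

mu, sigma, a, b, t = 1.0, 0.5, 2.0, -3.0, 0.4
rng = np.random.default_rng(3)
x = rng.normal(mu, sigma, size=1_000_000)

lhs = np.mean(np.exp(t * (a * x + b)))                                   # simulated M_{aX+b}(t)
rhs = np.exp(b * t) * np.exp(mu * (a * t) + sigma**2 * (a * t)**2 / 2)   # e^{bt} M_X(at)
print(lhs, rhs)
```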



Moments and Moment Generating Functions
Other “Generating Functions”

It is now more or less clear that using moment generating functions is not the most
straightforward way to determine the distribution function. However, there are other
“generating functions” we can consider.
Definition: The characteristic function of $X$ is defined by
\[
\varphi_X(t) = E[e^{itX}] = E[\cos tX] + iE[\sin tX],
\]
where $i$ is the complex number $\sqrt{-1}$.
The main drawback of this function is that it is more complicated to work with, since it involves integration with complex numbers.
However, there are many advantages:
1 When the moments of $F_X$ exist, $\varphi_X$ can be used to generate them.
2 The characteristic function always exists (since both the sine and cosine functions are bounded by one).
3 The characteristic function completely determines the distribution. In other words, there is a unique characteristic function for every distribution function.
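As an illustration (a sketch only, assuming numpy; it uses the known closed form $\exp(i\mu t - \sigma^2 t^2/2)$ of the normal characteristic function as the reference, which is not derived in these slides), the empirical characteristic function of simulated normal draws can be compared with the exact one:

```python
# Illustrative only: empirical characteristic function of N(mu, sigma^2) draws versus
# the known closed form exp(i*mu*t - sigma^2*t^2/2).
import numpy as np

mu, sigma, t = 1.0, 2.0, 0.7
rng = np.random.default_rng(2)
x = rng.normal(mu, sigma, size=1_000_000)

empirical = np.mean(np.exp(1j * t * x))              # mean of e^{itX}
exact = np.exp(1j * mu * t - sigma**2 * t**2 / 2)    # normal characteristic function
print(empirical, exact)
```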



Moments and Moment Generating Functions
Other “Generating Functions”

Theorem 11.5, Severini (2005): Suppose $X_k$, $k = 1, 2, \ldots$, is a sequence of random variables, each with characteristic function $\varphi_{X_k}(t)$, and let $\varphi_X(t)$ denote the characteristic function of $X$. Then,
\[
\lim_{k \to \infty} \varphi_{X_k}(t) = \varphi_X(t), \qquad \text{for all } -\infty < t < \infty,
\]
if and only if
\[
\lim_{k \to \infty} F_{X_k}(x) = F_X(x)
\]
at every $x$ where $F_X(x)$ is continuous. Here, $F_{X_k}(x)$ and $F_X(x)$ are the cdfs of $X_k$ and $X$, respectively.



Moments and Moment Generating Functions
Other “Generating Functions”

Another generating function, closely related to the moment generating function, is


the cumulant generating function.
Definition: For a random variable $X$, the cumulant generating function is
\[
K_X(t) = \log E[e^{tX}] = \log M_X(t).
\]
When the moment generating function exists in a neighbourhood of 0, then $K_X(t)$ has the following Taylor expansion:
\[
K_X(t) = \sum_{n=1}^{\infty} \frac{1}{n!} \kappa_n(X) t^{n}.
\]
Here, $\kappa_n(X)$ is called the $n$-th cumulant of $X$.



Moments and Moment Generating Functions
Other “Generating Functions”

The cumulant generating function is, obviously, closely related to the moment generating function. In fact, $\kappa_n(X)$ can be expressed in terms of moments (and vice versa). For example,
\[
E[X] = \kappa_1(X),
\]
\[
E[X^2] = \kappa_2(X) + [\kappa_1(X)]^2,
\]
\[
E[X^3] = \kappa_3(X) + 3\kappa_2(X)\kappa_1(X) + [\kappa_1(X)]^3,
\]
and so on. Then, for example,
\[
\mathrm{Var}(X) = E[X^2] - \{E[X]\}^2 = \kappa_2(X) + [\kappa_1(X)]^2 - [\kappa_1(X)]^2 = \kappa_2(X).
\]
So the expected value and the variance correspond to the first two cumulants!
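As a brief illustration (a sympy-based sketch, not part of the original slides), the Taylor expansion above means that $\kappa_n(X)$ is the $n$-th derivative of $K_X(t)$ at $t = 0$. For the gamma distribution, $K_X(t) = -\alpha \log(1-\beta t)$ by the mgf derived earlier, and the first three cumulants come out as $\alpha\beta$, $\alpha\beta^2$ and $2\alpha\beta^3$:

```python
# Illustrative only: cumulants as derivatives of K_X(t) at t = 0, for the gamma case.
import sympy as sp

t, alpha, beta = sp.symbols('t alpha beta', positive=True)
K = -alpha * sp.log(1 - beta * t)            # gamma cgf: log M_X(t), valid for t < 1/beta

for n in (1, 2, 3):
    kappa_n = sp.diff(K, t, n).subs(t, 0)    # n-th cumulant
    print(n, sp.simplify(kappa_n))           # alpha*beta, alpha*beta**2, 2*alpha*beta**3
```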
Depending on the question at hand, it might be more convenient to work with
cumulant or moment generating functions. In any case, information on the
cumulants can give us information on the moments (and vice versa).
A detailed reference for cumulant generating functions is “Tensor Methods in
Statistics” by McCullagh (1987).

