Lecture Notes 3
Renato Feres
The cumulative distribution function of X is defined (both for continuous and discrete random variables) as

    F_X(x) = P(X ≤ x), for all x.

In terms of the probability density function, F_X(x) takes the form

    F_X(x) = P(X ≤ x) = ∫_{−∞}^{x} f_X(z) dz.
1.2 Expectation
The most basic parameter associated to a random variable is its expected value or mean. Fix a probability space (S, F, P) and let X : S → R be a random variable.
Definition 1.1 (Expectation) The expectation or mean value of the random variable X is defined as

    E[X] = Σ_i x_i P(X = x_i)    if X is discrete,
    E[X] = ∫ x f_X(x) dx         if X is continuous.
Example 1.1 (A game of dice) A game consists of tossing a die and receiving a payoff X equal to $n for n pips. It is natural to define the fair price to play one round of the game as the expected value of X. If you could play the game for less than E[X], you would make a sure profit by playing it long enough, and if you pay more you are sure to lose money in the long run. The fair price is then

    E[X] = Σ_{i=1}^{6} i/6 = 21/6 = $3.50.
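As a quick numerical check, the fair price can be estimated in Matlab by averaging a large number of simulated tosses; the following minimal sketch should give a value close to 3.5.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%Monte Carlo check of the fair price of the dice game.
n=100000;                  %number of simulated tosses
payoff=ceil(6*rand(1,n));  %each toss pays $1,...,$6 with equal probability
fairprice=mean(payoff)     %should be close to 3.5
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%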
Example 1.2 (Waiting in line) Let us suppose that the waiting time to be served at the post office at a particular location and time of day is known to follow an exponential distribution with parameter λ = 6 (in units of 1/hour). What is the expected waiting time? We now have a continuous random variable T with probability density function f_T(t) = λe^{−λt}. The expected value is easily calculated to be

    E[T] = ∫_0^∞ t λe^{−λt} dt = 1/λ.

Therefore, the mean waiting time is one-sixth of an hour, or 10 minutes.
For discrete random variables, the corresponding integral reduces to the sum in Definition 1.1.
Here are a few simple properties of expectations.
Proposition 1.3 Let X and Y be random variables on the probability space (S, F, P). Then:
1. If X ≥ 0 then E[X] ≥ 0.
2. For any real number a, E[aX] = aE[X].
3. E[X + Y] = E[X] + E[Y].
4. If X is constant equal to a, then E[a] = a.
5. E[XY]² ≤ E[X²]E[Y²], with equality if and only if X and Y are linearly dependent, i.e., there are constants a, b, not both zero, such that P(aX + bY = 0) = 1. (This is the Cauchy-Schwarz inequality.)
6. If X and Y are independent and both E[|X|] and E[|Y|] are finite, then E[XY] = E[X]E[Y].
It is not difficult to obtain from the definition that if Y = g(X) for some function g(x) and X is a continuous random variable with probability density function f_X(x), then

    E[g(X)] = ∫_S g(X(s)) dP(s) = ∫ g(x) f_X(x) dx.
1.3 Variance
The variance of a random variable X refines our knowledge of the probability
distribution of X by giving a broad measure of how X is dispersed around its
mean.
Definition 1.2 (Variance) Let (S, F, P) be a probability space and consider a random variable X : S → R with expectation m = E[X]. (We assume that m exists and is finite.) We define the variance of X as the mean square of the difference X − m; that is,

    Var(X) = E[(X − m)²] = ∫_S (X(s) − m)² dP(s).

The standard deviation of X is defined as σ(X) = √Var(X).
Example 1.3 Let D be the determinant of the matrix

    D = det ( X  Y
              Z  T ),

where X, Y, Z, T are independent random variables uniformly distributed in [0, 1]. We wish to find E[D] and Var(D). The probability space of this problem is S = [0, 1]^4, F the σ-algebra generated by 4-dimensional parallelepipeds, and P the 4-dimensional volume obtained by integrating dV = dx dy dz dt. Thus

    E[D] = ∫_0^1 ∫_0^1 ∫_0^1 ∫_0^1 (xt − zy) dx dy dz dt = 0.

The variance is

    Var(D) = ∫_0^1 ∫_0^1 ∫_0^1 ∫_0^1 (xt − zy)² dx dy dz dt = 7/72.
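The values E[D] = 0 and Var(D) = 7/72 ≈ 0.097 can also be checked by a short Monte Carlo sketch along the following lines.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%Monte Carlo check of E[D]=0 and Var(D)=7/72 for the random determinant.
n=200000;
x=rand(n,1); y=rand(n,1); z=rand(n,1); t=rand(n,1);
d=x.*t-z.*y;       %determinant of each simulated 2-by-2 matrix
meanD=mean(d)      %should be close to 0
varD=var(d)        %should be close to 7/72, about 0.097
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%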
Some of the general properties of variance are enumerated in the next proposition. They can be derived from the definitions by simple calculations. The details are left as exercises. In particular, if X_1, . . . , X_n are independent random variables with common variance σ² and

    S_n = X_1 + · · · + X_n,

then

    Var(S_n / n) = σ² / n.
Mean and variance are examples of moments and central moments of probability distributions. These are defined as follows.
Definition 1.3 (Moments) The moment of order k = 1, 2, 3, . . . of a random variable X is defined as

    E[X^k] = ∫_S X(s)^k dP(s).
The meaning of the central moments, and the variance in particular, is easier to interpret using Chebyshev's inequality. Broadly speaking, this inequality says that if the central moments are small, then the random variable cannot deviate much from its mean.
Theorem 1.1 The following inequalities hold.
1. Let X ≥ 0 (with probability 1) and let k ∈ N. Then, for every ε > 0,

    P(X ≥ ε) ≤ E[X^k]/ε^k.

2. If X has mean m and finite variance, then, for every ε > 0,

    P(|X − m| ≥ ε) ≤ Var(X)/ε².

3. If X has mean m and standard deviation σ, then, for every c > 0,

    P(|X − m| ≥ cσ) ≤ 1/c².

Proof. The third inequality follows from the second by taking ε = cσ, and the second follows from the first by substituting |X − m| for X and taking k = 2. Thus we only need to prove the first. This is done by noting that

    E[X^k] = ∫_S X(s)^k dP(s)
           ≥ ∫_{ {s ∈ S : X(s) ≥ ε} } X(s)^k dP(s)
           ≥ ∫_{ {s ∈ S : X(s) ≥ ε} } ε^k dP(s)
           = ε^k P(X ≥ ε).
Example 1.5 (Tosses of a fair coin) We make N = 1000 tosses of a fair coin and denote by S_N the number of heads. Notice that S_N = X_1 + X_2 + · · · + X_N, where X_i is 1 if the i-th toss comes up heads, and 0 if tails. We assume that the X_i are independent and P(X_i = 0) = P(X_i = 1) = 1/2. Then E[S_N] = N/2 = 500 and Var(S_N) = N/4 = 250. From the second inequality in Theorem 1.1 we have, for any ε > 0,

    P(|S_N − 500| ≥ ε) ≤ 250/ε².

A better estimate of the dispersion around the mean will be provided by the central limit theorem, discussed later.
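The bound above can also be compared with simulation. The following minimal sketch (the choice of deviation 50 and the number of runs are arbitrary) repeats the 1000-toss experiment many times and records how often |S_N − 500| ≥ 50; the observed frequency should be well below the Chebyshev bound 250/50² = 0.1.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%Empirical check of the Chebyshev bound for N=1000 fair coin tosses.
N=1000; runs=2000; d=50;
S=sum(rand(N,runs)<0.5);       %S(j) = number of heads in run j
frac=mean(abs(S-500)>=d)       %observed frequency of large deviations
bound=250/d^2                  %Chebyshev bound, here 0.1
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%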
2 The Laws of Large Numbers
The frequentist interpretation of probability rests on the intuitive idea that if
we perform a large number of independent trials of an experiment, yielding
numerical outcomes x1 , x2 , . . . , then the averages (x1 + + xn )/n converge to
some value x as n grows. For example, if we toss a fair coin n times and let
xi be 0 or 1 when the coin gives, respectively, head or tail, then the running
averages should converge to the relative frequency of tails, which is 0.5. We
discuss now two theorems that make this idea precise.
Throughout, we write S_n = X_1 + · · · + X_n for the sum of the first n outcomes.
2.2 The strong law of large numbers
The weak law of large numbers, applied to a sequence X_i ∈ {0, 1} of coin tosses, says that S_n/n must lie in an arbitrarily small interval around 1/2 with high probability (arbitrarily close to 1) if n is taken big enough. A stronger statement would be to say that, with probability one, a sequence of coin tosses yields a sum S_n such that S_n/n actually converges to 1/2.
To explain the meaning of the stronger claim, let us be more explicit and view the random variables as functions X_i : S → R on the same probability space (S, F, P). Then, for each s ∈ S we can consider the sample sequence X_1(s), X_2(s), . . . , as well as the arithmetic averages S_n(s)/n, and ask whether S_n(s)/n (an ordinary sequence of numbers) actually converges to 1/2. The strong law of large numbers states that the set of s for which this holds is an event of probability 1. This is a much more subtle result than the weak law, and we will be content with simply stating the general theorem.
3 The Central Limit Theorem
Let S_n = X_1 + X_2 + · · · + X_n. If the X_i are independent and identically distributed with mean μ and variance σ², then the standardized sum

    Z_n = (S_n − nμ)/(σ√n)

converges in distribution to a standard normal random variable. In other words,

    P(Z_n ≤ z) → Φ(z) = (1/√(2π)) ∫_{−∞}^{z} e^{−u²/2} du

as n → ∞.
Figure 1: Convolution powers of the function f(x) = 1 over [−1, 1]. By the central limit theorem, after centering and re-scaling (not done in the figure), f^{∗n} approaches a normal distribution.
Example 3.1 (Die tossing) Consider the experiment of tossing a fair die n times. Let X_i be the number obtained in the i-th toss and S_n = X_1 + · · · + X_n. The X_i are independent and have a common discrete distribution with mean μ = 3.5 and variance σ² = 35/12. Assuming n = 1000, by the central limit theorem S_n has approximately the normal distribution with mean μ(S_n) = 3500 and standard deviation σ(S_n) = √(35 · 1000/12), which is approximately 54. Therefore, if we simulate the experiment of tossing a die 1000 times, repeat the experiment a number of times (say 500), and plot a histogram of the result, what we obtain should be approximated by the function

    f(x) = (1/(σ√(2π))) e^{−(1/2)((x − μ)/σ)²},

where μ = 3500 and σ = 54.
Figure 2: Comparison between the sample distribution given by the stem plot
and the normal distribution for the experiment of tossing a die 1000 times and
counting the total number of pips.
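A figure of this kind can be produced along the following lines (the number of repetitions and the number of bins are arbitrary choices in this sketch).
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%Simulate 500 repetitions of 1000 die tosses and compare the histogram
%of the totals with the normal density suggested by the CLT.
m=500; n=1000;
S=sum(ceil(6*rand(n,m)));             %S(j) = total number of pips in repetition j
[counts,xout]=hist(S,30);
stem(xout,counts/m)                   %relative frequencies
hold on
mu=3500; sigma=sqrt(35*n/12); dx=xout(2)-xout(1);
plot(xout,(1/(sigma*sqrt(2*pi)))*exp(-0.5*((xout-mu)/sigma).^2)*dx)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%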
Example 3.2 (Die tossing II) We would like to compute the probability that after tossing a die 1000 times, one obtains more than 150 6s. Here, we consider the random variables X_i, i = 1, . . . , 1000, taking values in {0, 1}, with P(X_i = 1) = 1/6. (X_i = 1 represents the event of getting a 6 in the i-th toss.) Writing S_n = X_1 + · · · + X_n, we wish to compute the probability P(S_{1000} > 150). Each X_i has mean p = 1/6 and standard deviation √(p(1 − p)). By the central limit theorem, we approximate the probability distribution of S_n by a normal distribution with mean μ = 1000p and standard deviation σ = √(1000 p(1 − p)). This gives approximately μ = 166.67 and σ = 11.79. Now, the distribution of (S_n − μ)/σ is
approximately the standard normal, so we can write

    P(S_{1000} > 150) ≈ P(Z > (150 − 166.67)/11.79) = (1/√(2π)) ∫_{−1.41}^{∞} e^{−z²/2} dz ≈ 0.92.

The integral above was evaluated numerically by a simple Riemann sum over the interval [−1.41, 10] with step size 0.01. We conclude that the probability of obtaining more than 150 6s in 1000 tosses is approximately 0.92.
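The Riemann sum just described amounts to the following two lines of Matlab, which should return a value close to 0.92.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%Riemann-sum evaluation of P(Z > -1.41) for a standard normal Z.
z=-1.41:0.01:10;
p=sum(exp(-0.5*z.^2))*0.01/sqrt(2*pi)   %approximately 0.92
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%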
This is the basic idea behind Monte Carlo integration. It may happen that we cannot simulate realizations of X, but we can simulate realizations y_1, y_2, . . . , y_n of a random variable Y with probability density function h(x) which is related to X in that h(x) is not 0 unless f(x) is 0. In this case we can write

    E[g(X)] = ∫ g(x) f(x) dx
            = ∫ [g(x) f(x)/h(x)] h(x) dx
            = E[g(Y) f(Y)/h(Y)]
            ≈ (1/n) Σ_{i=1}^{n} g(y_i) f(y_i)/h(y_i).
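As a minimal illustration of this reweighting (the target density f(x) = 3x² on [0, 1] and the proposal Y ∼ U(0, 1) are choices made here only for the sake of the example), the estimate of E[X] = 3/4 can be computed as follows.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%Importance-sampling sketch: estimate E[X] for X with density f(x)=3x^2
%on [0,1], using draws from Y~U(0,1), whose density is h(y)=1 on [0,1].
n=100000;
y=rand(1,n);          %realizations of Y
w=3*y.^2;             %importance weights f(y)/h(y)
estimate=mean(y.*w)   %should be close to 3/4
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%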
The above procedure is known as importance sampling. We now need ways to simulate realizations of random variables. This is typically not an easy task, but a few general techniques are available. We describe some below.
A standard way of producing pseudo-random numbers is a congruential generator, which generates integers u_0, u_1, u_2, . . . by the recurrence

    u_{n+1} = (K u_n + b) mod M,

where M and K are integers. Since only finitely many different numbers occur, the modulus M should be chosen as large as possible. To prevent cycling with a period less than M, the multiplier K should be taken relatively prime to M. Typically b is set to 0, in which case the pseudo-random number generator is called a multiplicative congruential generator. A good choice of parameters is K = 7^5 and M = 2^31 − 1. Then x = u/M gives a pseudo-random number in [0, 1] that simulates a uniformly distributed random variable.
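A minimal implementation of this generator could look as follows (the function name lcguniform and its interface are illustrative; in practice one simply uses Matlab's rand).
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%Multiplicative congruential generator with K=7^5 and M=2^31-1.
%Returns n pseudo-random numbers in [0,1]; u0 is an integer seed
%in {1,...,M-1}.  (Function name and interface are illustrative.)
function x=lcguniform(u0,n)
K=7^5; M=2^31-1;
u=u0;
x=zeros(1,n);
for i=1:n
    u=mod(K*u,M);    %exact in double precision since K*u < 2^53
    x(i)=u/M;
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%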
We may return to this topic later and discuss statistical tests for evaluating
the quality of such pseudo-random numbers. For now, we will continue to take
for granted that this is a good way to simulate uniform random numbers over
[0, 1].
If we want to simulate a random point in the square [0, 1] × [0, 1], we can naturally do it by picking a pair of independent random variables X_1, X_2 with the uniform distribution on [0, 1]. (Similarly for cubes [0, 1]^n in any dimension n.)
For example, the following script estimates π by drawing n points uniformly on the square [−1, 1]² and computing the fraction that falls in the unit disc (see figure 3):
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
rand('seed',121)                      %fix the seed of the generator
n=500000;
X=2*rand(n,2)-1;                      %n uniform points in [-1,1]^2
a=4*sum(X(:,1).^2 + X(:,2).^2<=1)/n   %4*(fraction in the unit disc), estimates pi
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Figure 3: Simulation of 1000 random points on the square [−1, 1]² with the uniform distribution. To approximate the ratio of the area of the disc to the area of the square, we compute the fraction of points that fall in the disc.
The above example should prompt the question: how do we estimate the error involved in, say, our calculation of π, and how do we determine the number of random points needed for a given precision? First, consider the probability space S = [−1, 1]² with probability measure given by

    P(E) = (1/4) ∫∫_E dx dy,

and the random variable D : S → {0, 1} which is 1 for a point in the disc and 0 for a point in the complement of the disc. The expected value of D is μ = π/4 and the variance is easily calculated to be σ² = μ(1 − μ) = π(4 − π)/16. If we draw n independent points on the square, and call the outcomes D_1, D_2, . . . , D_n, then the fraction of points in the disc is given by the random variable

    D̄_n = (D_1 + · · · + D_n)/n.

As we have already seen (proposition 1.5), D̄_n must have mean value μ and standard deviation σ/√n. Fix a positive number K. One way to estimate the error in our calculation is to ask for the probability that |D̄_n − μ| is bigger than the error Kσ/√n. Equivalently, we ask for the probability P(|Z_n| ≥ K), where

    Z_n = (D̄_n − μ)/(σ/√n).
This probability can now be estimated using the central limit theorem. Recall that the probability density of Z_n, for big n, is very nearly a standard normal distribution. Thus

    P(|D̄_n − μ| ≥ Kσ/√n) ≈ (2/√(2π)) ∫_K^∞ e^{−z²/2} dz.
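For instance, taking K = 2 (roughly the 95% level for a standard normal), the error in the π estimate above, which is 4 times the error in D̄_n, can be bounded as in the following sketch.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%Rough error bound for the Monte Carlo estimate of pi with n points.
n=500000; K=2;                  %K=2 corresponds roughly to a 95% level
sigma=sqrt(pi*(4-pi)/16);
err=4*K*sigma/sqrt(n)           %error bound for the estimate of pi
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%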
Example 4.2 The random variable X = a + (b − a)U clearly has the uniform distribution on [a, b] if U ∼ U(0, 1). This follows from the proposition since

    F(x) = (x − a)/(b − a),

which has inverse function F^{−1}(u) = a + (b − a)u.
Example 4.3 (Exponential random variables) If U ∼ U(0, 1) and λ > 0, then

    X = −(1/λ) log(U)

has an exponential distribution with parameter λ. In fact, an exponential random variable has PDF

    f(x) = λe^{−λx}

and its cumulative distribution function is easily obtained by explicit integration:

    F(x) = 1 − e^{−λx}.

Therefore,

    F^{−1}(u) = −(1/λ) log(1 − u).

But 1 − U ∼ U(0, 1) if U ∼ U(0, 1), so we have the claim.
Figure 4: Stem plot of the relative frequencies of 5000 simulated Exp(1) values, together with the graph of the exponential density (produced by the commands listed below).
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function y=exponential(lambda,n)
%Simulates n independent realizations of a
%random variable with the exponential
%distribution with parameter lambda.
y=-log(rand(1,n))/lambda;
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Here are the commands used to produce figure 4.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
y=exponential(1,5000);
[n,xout]=hist(y,40);
stem(xout,n/5000)
grid
hold on
dx=xout(2)-xout(1);
fdx=exp(-xout)*dx;
plot(xout,fdx)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Given a discrete distribution with probabilities p_1, p_2, . . . and cumulative sums q_0 = 0, q_k = p_1 + · · · + p_k, a random variable X with P(X = k) = p_k can be simulated from a single U ∼ U(0, 1) by setting

    X = min{k : q_k ≥ U}.

Indeed,

    P(X = k) = P(U ∈ (q_{k−1}, q_k]) = q_k − q_{k−1} = p_k.
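A direct Matlab implementation of this rule could look as follows (the function name and interface are illustrative).
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%Simulates one draw from the discrete distribution with probability
%vector p=[p1 ... pr] by the rule X=min{k: qk>=U}.
%(Function name and interface are illustrative.)
function x=discretesample(p)
q=cumsum(p);      %cumulative sums q1,...,qr
u=rand;
x=find(q>=u,1);   %smallest k with qk>=u
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%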
4.4 Scaling
We have already seen that if Y = aX + b for a non-zero a, then

    f_Y(y) = (1/|a|) f((y − b)/a).
4.5 The uniform rejection method
The methods of this and the next subsection are examples of the rejection sampler method.
Suppose we want to simulate a random variable with PDF f(x) such that f(x) is zero outside of the interval [a, b] and f(x) ≤ L for all x. Choose X ∼ U(a, b) and Y ∼ U(0, L) independently. If Y < f(X), accept X as the simulated value we want. If the acceptance condition is not satisfied, draw a new pair and repeat until it holds. Then take the X for which Y < f(X) as the output of the algorithm and call it X. This procedure is referred to as the uniform rejection method for density f(x).
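As a sketch of how this could be implemented (the function name and the use of a function handle for f are choices made here, not taken from the notes):
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%Uniform rejection method: simulates one value with PDF f on [a,b],
%where f is a function handle and L is an upper bound for f.
%(Function name and interface are illustrative.)
function x=uniformreject(f,a,b,L)
while 1
    X=a+(b-a)*rand;     %candidate, uniform on [a,b]
    Y=L*rand;           %uniform on [0,L]
    if Y<f(X)
        x=X;            %accept
        return
    end
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
For example, uniformreject(@(x) 0.5*sin(x),0,pi,0.5) simulates one value from the density (1/2) sin(x) on [0, π] used in the examples below.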
Proposition 4.2 (Uniform rejection method) The uniform rejection method for f(x) described above simulates a random variable X with probability distribution f(x).
Proof. Let A represent the region in [a, b] × [0, L] consisting of points (x, y) such that y < f(x). We call A the acceptance region. As above, we denote by (X, Y) a random variable uniformly distributed on [a, b] × [0, L]. Let F(x) = P(X ≤ x) denote the cumulative distribution function of the accepted value X. We wish to show that F(x) = ∫_a^x f(s) ds. This is a consequence of the following calculation, which uses the continuous version of the total probability formula and the key fact: P((X, Y) ∈ A | X = s) = f(s)/L.

    F(x) = P(X ≤ x)
         = P(X ≤ x | (X, Y) ∈ A)
         = P({X ≤ x} ∩ {(X, Y) ∈ A}) / P((X, Y) ∈ A)
         = [ (1/(b − a)) ∫_a^b P({X ≤ x} ∩ {(X, Y) ∈ A} | X = s) ds ] / [ (1/(b − a)) ∫_a^b P((X, Y) ∈ A | X = s) ds ]
         = ∫_a^x P((X, Y) ∈ A | X = s) ds / ∫_a^b P((X, Y) ∈ A | X = s) ds
         = ∫_a^x (f(s)/L) ds / ∫_a^b (f(s)/L) ds
         = ∫_a^x f(s) ds.
It is clear that the efficiency of the rejection method will depend on the probability that a random point (X, Y) will be accepted, i.e., will fall in the acceptance region A. This probability can be estimated as follows:

    P(accept) = P((X, Y) ∈ A)
              = (1/(b − a)) ∫_a^b P((X, Y) ∈ A | X = s) ds
              = (1/((b − a)L)) ∫_a^b f(s) ds
              = 1/((b − a)L).
dx=xout(2)-xout(1);
fdx=(1/2)*sin(xout)*dx;
plot(xout,fdx)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
apply the integral form of the total probability formula.) The result is

    P(accept) = 1/a.
Proposition 4.3 (The envelope method) The envelope method for f(x) described above simulates a random variable X with probability distribution f(x).
Proof. The argument is essentially the same as for the uniform rejection method. Note now that P(U ≤ f(Y) | Y = s) = f(s)/(a g(s)). With this in mind, we have:

    F(x) = P(X ≤ x)
         = P(Y ≤ x | U ≤ f(Y))
         = P({Y ≤ x} ∩ {U ≤ f(Y)}) / P(U ≤ f(Y))
         = ∫ P({Y ≤ x} ∩ {U ≤ f(Y)} | Y = s) g(s) ds / ∫ P(U ≤ f(Y) | Y = s) g(s) ds
         = ∫_{−∞}^{x} P(U ≤ f(Y) | Y = s) g(s) ds / ∫ P(U ≤ f(Y) | Y = s) g(s) ds
         = ∫_{−∞}^{x} (f(s)/a) ds / ∫ (f(s)/a) ds
         = ∫_{−∞}^{x} f(s) ds.
Example 4.5 (Envelope method) This is the same as the previous example, but we now approach the problem via the envelope method. We wish to simulate a random variable X with PDF (1/2) sin(x) over [0, π]. We first simulate a random variable Y with probability density g(y), where

    g(y) = (4/π²) y          if y ∈ [0, π/2],
    g(y) = (4/π²) (π − y)    if y ∈ [π/2, π].

Notice that f(x) ≤ a g(x) for a = π²/8. Therefore, the envelope method will have probability of acceptance 1/a ≈ 0.81. To simulate the random variable Y, note that g(x) = (h ∗ h)(x), where h(x) = 2/π over [0, π/2]. Therefore, we can take Y = V_1 + V_2, where the V_i are independent, identically distributed uniform random variables over [0, π/2].
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function x=samplefromsine2(n)
%Simulates n independent realizations of a random
%variable with PDF (1/2)sin(x) over the interval
%[0,pi], using the envelope method.
x=[];
for i=1:n
U=1/2;
Y=0;
while U>=(1/2)*sin(Y)
Y=(pi/2)*sum(rand(1,2));
U=(pi^2/8)*((2/pi)-(4/pi^2)*abs(Y-pi/2))*rand;
end
x=[x Y];
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
The variance is similarly calculated. Its value is

    Var(X) = (n² − 1)/12.
The following is a simple way to simulate a DU(n) random variable:
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function y=discreteuniform(n,m)
%Simulates m independent samples of a DU(n) random variable.
y=[];
for i=1:m
y=[y ceil(n*rand)];
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Example 5.1 (Urn problem) An urn contains N balls, of which K are black and N − K are red. We draw n balls with replacement and count the number X of black balls drawn. Let p = K/N. Then X ∼ B(n, p).
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function y=binomial(n,p,m)
%Simulates drawing m independent samples of a
%binomial random variable B(n,p).
y=sum(rand(n,m)<=p);
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Example 5.2 Suppose that 100 independent observations are taken from a uni-
form distribution on [0, 1]. We partition the interval into 10 equal subintervals
(bins), and record the numbers X1 , . . . , X10 of observations that fall in each
bin. The information is then represented as a histogram, or bar graph, in which the bar over the bin labeled i has height X_i. The histogram (X_1, . . . , X_{10}) can then be viewed as a random vector with the multinomial distribution, where n = 100 and p_i = 1/10 for i = 1, 2, . . . , 10.
The following program simulates one sample draw of a random variable with
the multinomial distribution.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function y=multinomial(n,p)
%Simulates drawing a sample vector y=[y1, ... yr]
%with the multinomial distribution M(n,p),
%where p=[p1 ... pr] is a probability vector.
r=length(p);
x=rand(n,1);
a=0;
A=zeros(n,1);
for i=1:r
A=A+i*(a<=x & x<a+p(i));
a=a+p(i);
end
y=zeros(1,r);
for j=1:r
y(j)=sum(A==j);
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
A random variable X has the geometric distribution with parameter p, written X ∼ Geom(p), if

    P(X = k) = (1 − p)^{k−1} p,   k = 1, 2, . . . .

If X_1, X_2, . . . are independent Bernoulli trials with success probability p, then the number of trials up to and including the first success is geometric:

    X = min{n ≥ 1 : X_n = 1} ∼ Geom(p).

The mean is

    E[X] = Σ_{i=1}^{∞} i P(X = i) = Σ_{i=1}^{∞} i (1 − p)^{i−1} p = p/(1 − (1 − p))² = 1/p.

Similarly,

    E[X²] = Σ_{i=1}^{∞} i² P(X = i) = Σ_{i=1}^{∞} i² (1 − p)^{i−1} p = p (1 + (1 − p))/(1 − (1 − p))³ = (2 − p)/p²,

from which we obtain the variance:

    Var(X) = E[X²] − E[X]² = (2 − p)/p² − 1/p² = (1 − p)/p².

We have used above the following formulas, valid for |a| < 1:

    Σ_{i=1}^{∞} a^{i−1} = 1/(1 − a),   Σ_{i=1}^{∞} i a^{i−1} = 1/(1 − a)²,   Σ_{i=1}^{∞} i² a^{i−1} = (1 + a)/(1 − a)³.
Example 5.3 (Waiting for a six) How long should we expect to wait for a 6 in a sequence of die tosses? Let X denote the number of tosses until 6 appears for the first time. Then the probability that X = k is

    P(X = k) = (5/6)^{k−1} (1/6).

In other words, we have k − 1 failures, each with probability 5/6, followed by a success, with probability 1/6. The expected value of X is

    E[X] = Σ_{k=1}^{∞} k P(X = k) = Σ_{k=1}^{∞} k (5/6)^{k−1} (1/6) = 6.
The following program simulates one sample draw of a random variable with
the Geom(p) distribution.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function y=geometric(p)
%Simulates one draw of a geometric
%random variable with parameter p.
a=0;
y=0;
while a==0
y=y+1;
a=(rand<p);
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
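As a quick check of Example 5.3, the average of many simulated waiting times produced by this function should be close to 6.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%Average waiting time for a 6 over many simulated runs.
m=10000;
w=zeros(1,m);
for i=1:m
    w(i)=geometric(1/6);
end
meanwait=mean(w)    %should be close to 6
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%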
5.5 The negative binomial distribution
A random variable X has the negative binomial distribution, also called the Pascal distribution, denoted X ∼ NB(n, p), if there exist an integer n ≥ 1 and a real number p ∈ (0, 1) such that

    P(X = n + k) = C(n + k − 1, k) p^n (1 − p)^k,   k = 0, 1, 2, . . . ,

where C(m, j) = m!/(j!(m − j)!) denotes the binomial coefficient. Its mean and variance are

    E[X] = n/p,   Var(X) = n(1 − p)/p².
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function y=negbinomial(n,p)
%Simulates one draw of a negative binomial
%random variable with parameters n and p.
y=0;
for i=1:n
a=0;
u=0;
while a==0
u=u+1;
a=(rand<p);
end
y=y+u;
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
5.6 The Poisson distribution
A random variable X has the Poisson distribution with parameter λ > 0, written X ∼ Po(λ), if

    P(X = k) = e^{−λ} λ^k / k!,   k = 0, 1, 2, . . . .

This is a ubiquitous distribution and we will encounter it many times in the course. One way to think about the Poisson distribution is as the limit of a
binomial distribution B(n, p) as n → ∞ and p → 0, while λ = np = E[X] remains constant. In fact, replacing p by λ/n in the binomial distribution gives

    P(X = k) = C(n, k) (λ/n)^k (1 − λ/n)^{n−k}
             = [n!/(k!(n − k)!)] (λ^k/n^k) (1 − λ/n)^{n−k}
             = (λ^k/k!) [n!/((n − k)! n^k)] [(1 − λ/n)^n / (1 − λ/n)^k]
             = (λ^k/k!) [n(n − 1)(n − 2) · · · (n − k + 1)/n^k] [(1 − λ/n)^n / (1 − λ/n)^k]
             → (λ^k/k!) e^{−λ}.

Notice that we have used the limit (1 − λ/n)^n → e^{−λ}.
The expectation and variance of a Poisson random variable are easily calculated from the definition or by taking the limit of the corresponding quantities for the binomial distribution. The result is:

    E[X] = λ,   Var(X) = λ.

One noteworthy property of Poisson random variables is that, if X ∼ Po(λ) and Y ∼ Po(μ) are independent Poisson random variables, then Z = X + Y ∼ Po(λ + μ).
A numerical example may help clarify the meaning of the Poisson distribution. Consider the interval [0, 1] partitioned into a large number, n, of subintervals of equal length: [0, 1/n), [1/n, 2/n), . . . , [(n − 1)/n, 1]. To each subinterval we randomly assign a value 1 with a small probability λ/n (for a fixed λ) and 0 with probability 1 − λ/n. Let X be the number of 1s. Then, for large n, the random variable X is approximately Poisson with parameter λ. The following Matlab script illustrates this procedure. It produces samples of a Poisson random variable with parameter λ = 3 over the interval [0, 1] and a graph that shows the positions where an event occurs.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%Approximate Poisson random variable, X, with parameter lambda.
lambda=3;
n=500;
p=lambda/n;
a=(rand(1,n)<p);
x=1/n:1/n:1;
X=sum(a)
stem(x,a)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
The following script simulates the successive event times of a Poisson process with rate lambda over a time interval [0, T], by accumulating independent exponential interarrival times (the values lambda = 3 and T = 10 below are illustrative choices).
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
lambda=3;            %rate of the Poisson process (illustrative value)
T=10;                %length of the time interval (illustrative value)
for i=1:1000
    z(i,1)=(1/lambda)*log(1/(1-rand(1,1))); %interarrival times
    if i==1
        t(i,1)=z(i);
    else
        t(i,1)=t(i-1)+z(i,1);
    end
    if t(i)>T
        break
    end
end
M=length(t)-1;
a=t(1:M);            %event times in [0,T]
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
5.7 The hypergeometric distribution
A random variable X is said to have a hypergeometric distribution if there exist positive integers r, n and m such that for any k = 0, 1, 2, . . . , m we have

    P(X = k) = C(r, k) C(n − r, m − k) / C(n, m).

The binomial, the multinomial and the Poisson distributions arise when one wants to count the number of successes in situations that generally correspond to drawing from a population with replacement. The hypergeometric distribution arises when the experiment involves drawing without replacement.
The expectation and variance of a random variable X having the hypergeometric distribution are given by

    E[X] = mp,   Var(X) = mpq (n − m)/(n − 1),

where p = r/n and q = 1 − p.
Example 5.4 (Drawing without replacement) An urn contains n balls, r of which are black and n − r are red. We draw m balls from the urn without replacement and denote by X the number of black balls among them. Then X has the hypergeometric distribution with parameters r, n and m.
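Drawing without replacement is easy to simulate directly by permuting the urn; the following sketch (the function name is illustrative) produces one such draw.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%Simulates one hypergeometric draw: m balls taken without replacement
%from an urn containing r black balls out of n.
%(Function name and interface are illustrative.)
function x=hypergeom(r,n,m)
urn=[ones(1,r) zeros(1,n-r)];   %1 = black ball, 0 = red ball
idx=randperm(n);                %random order of the n balls
x=sum(urn(idx(1:m)));           %black balls among the first m drawn
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%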
5.8 The uniform distribution
A random variable X has a uniform distribution over the range [a, b], written X ∼ U(a, b), if the PDF is given by

    f_X(x) = 1/(b − a)   if a ≤ x ≤ b,
    f_X(x) = 0           otherwise.
Exponential random variables often arise as random times. We will see them very often. The following propositions contain some of their most notable properties. The first property can be interpreted as follows. Suppose that something has a random life span which is exponentially distributed with parameter λ. (For example, an atom of a radioactive element.) Then, having survived for a time t, the probability of surviving for an additional time s is the same probability it had initially to survive for a time s. Thus the system does not keep any memory of the passage of time. To put it differently, if an entity has a life span that is exponentially distributed, its death cannot be due to an aging mechanism since, having survived to time t, the chances of it surviving an extra time s are the same as the chances that it would have survived to time s from the very beginning. More precisely, we have the following proposition.

Proposition 5.2 (Memorylessness) If X ∼ Exp(λ), then

    P(X > t + s | X > t) = P(X > s)

for any s, t ≥ 0.

Proof. Since P(X > x) = e^{−λx} for x ≥ 0,

    P(X > t + s | X > t) = P(X > t + s)/P(X > t) = e^{−λ(t+s)}/e^{−λt} = e^{−λs} = P(X > s).
The next proposition states that the inter-event times of a Poisson process with rate λ are exponentially distributed with parameter λ.
Proposition 5.3 Consider a Poisson process with rate λ. Let T be the time to the first event (after 0). Then T ∼ Exp(λ).
Proof. Let N_t be the number of events in the interval (0, t] (for given fixed t > 0). Then N_t ∼ Po(λt). Consider the cumulative distribution function of T:

    F_T(t) = P(T ≤ t)
           = 1 − P(T > t)
           = 1 − P(N_t = 0)
           = 1 − (λt)^0 e^{−λt}/0!
           = 1 − e^{−λt}.

Note that

    P(T ≤ h)/h = (1 − e^{−λh})/h = (1 − (1 − λh) + O(h²))/h → λ

as h → 0. So for very small h, P(T ≤ h) is approximately λh and, due to the independence property of the Poisson process, this is the probability of an event in any time interval of length h. The Poisson process can therefore be thought of as a process with constant event hazard λ, where the hazard is essentially a measure of event density on the time axis. The exponential distribution with parameter λ can therefore also be reinterpreted as the time to an event of constant hazard λ.
The next proposition describes the distribution of the minimum of a collec-
tion of independent exponential random variables.
Proof. First note that for X ∼ Exp(λ) we have P(X > x) = e^{−λx}. Then, if X_1, . . . , X_n are independent with X_i ∼ Exp(λ_i) and X_0 = min{X_1, . . . , X_n},

    P(X_0 > x) = P(X_1 > x) · · · P(X_n > x)
               = e^{−x(λ_1 + · · · + λ_n)}
               = e^{−λ_0 x},

so X_0 ∼ Exp(λ_0), where λ_0 = λ_1 + · · · + λ_n.
Proof. Suppose X ∼ Exp(λ) and Y ∼ Exp(μ) are independent, and let f denote the density of Y. Then

    P(X < Y) = ∫_0^∞ P(X < Y | Y = y) f(y) dy
             = ∫_0^∞ P(X < y) f(y) dy
             = ∫_0^∞ (1 − e^{−λy}) μe^{−μy} dy
             = λ/(λ + μ).
The next result gives the likelihood of a particular exponential random variable of an independent collection being the smallest.
Proof. For each j, define the random variable Y = min_{k≠j} {X_k} and set λ'_j = λ_0 − λ_j. Then Y ∼ Exp(λ'_j) and, by the two previous results,

    P(J = j) = P(X_j < min_{k≠j} {X_k})
             = P(X_j < Y)
             = λ_j/(λ_j + λ'_j)
             = λ_j/λ_0.
Note that the PDF is symmetric about x = μ, so the median and mean of the distribution will be μ. Checking that the density integrates to 1 requires the well-known integral

    ∫_{−∞}^{∞} e^{−αx²} dx = √(π/α),   α > 0.

We leave this calculation and that of the variance as an exercise. The result is

    E[X] = μ,   Var(X) = σ².

If X_1 ∼ N(μ_1, σ_1²) and X_2 ∼ N(μ_2, σ_2²) are independent, then Y = X_1 + X_2 satisfies

    Y ∼ N(μ_1 + μ_2, σ_1² + σ_2²).
In Matlab we can sample a standard normal random variable using the
command randn, which has the same usage as rand. This is what we will
typically use when sampling from a normal distribution.
One simple way to generate a normal random variable is to use the central limit theorem. Consider

    Z = √(12/n) Σ_{i=1}^{n} (U_i − 1/2),

where the U_i are independent random variables with the uniform distribution on [0, 1]. Then Z has mean 0 and variance 1 and is approximately normal. This method is not very efficient since it requires sampling from the uniform distribution many times for a single realization of the normal distribution.
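A minimal sketch of this recipe, with the classical choice n = 12 (so that the scale factor √(12/n) equals 1):
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%Approximate standard normal sample via the CLT recipe above, n=12.
n=12;
Z=sqrt(12/n)*sum(rand(n,1)-0.5)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%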
Therefore the PDF of (R, Θ) splits as a product of a (constant) function of θ and a function of r. This shows that R and Θ are independent, Θ ∼ U(0, 2π), and R has PDF r e^{−r²/2}. Applying the transformation formula again, this time for the function g(r) = r², we obtain that the PDF of R² is

    f_{R²}(u) = (1/2) e^{−u/2}.

This shows that R² and Θ are as claimed. The converse is shown similarly.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function y=stdnormal2d(n)
%Simulates n realizations of a pair of independent
%standard normal random variables (X1,X2) using
%the Box-Muller method. Requires the function
%exponential(lambda,n). The output is a matrix
%of size 2-by-n.
theta=2*pi*rand(1,n);
r=sqrt(exponential(0.5,n));   %R^2 ~ Exp(1/2)
x1=r.*cos(theta);
x2=r.*sin(theta);
y=[x1;x2];
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
5.13 The χ²_n distribution
If Z_i are independent standard normal random variables, then

    X = Σ_{i=1}^{n} Z_i²

has a χ²_n distribution.
It is not difficult to show from the definition that Γ(1) = 1 and Γ(x + 1) = xΓ(x). If x = n is a positive integer, it follows that Γ(n + 1) = n!. Also worth noting, Γ(1/2) = √π.
A random variable X has a gamma distribution with parameters α, λ > 0, written X ∼ Γ(α, λ), if it has PDF

    f(x) = λ^α x^{α−1} e^{−λx} / Γ(α)   if x > 0,
    f(x) = 0                             if x ≤ 0.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function y=gamma(r,n)
%Simulates n independent realizations of a Gamma(r,1)
%random variable.
for j=1:n
t=2;
u=3;
while (t<u)
e=-r*log(rand);
t=((e/r)^(r-1))*exp((1-r)*(e/r-1));
u=rand;
end
y(j)=e;
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
5.15 The beta distribution
A Beta(a, b) random variable can be obtained from two independent gamma random variables: if X_1 ∼ Γ(a, 1) and X_2 ∼ Γ(b, 1) are independent, then X_1/(X_1 + X_2) ∼ Beta(a, b). This is what the following program uses.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function y=beta(a,b,n)
%Simulates n independent realizations of a Beta(a,b)
%random variable.
x1=gamma(a,n);
x2=gamma(b,n);
y=x1./(x1+x2);
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
6.1 Convolution of PDFs
Exercise 6.8 Let f(x) and g(x) be two functions of a real variable x ∈ R. Suppose that f(x) is zero for x in the complement of the interval [a, b], and g(x) is zero in the complement of [c, d]. Show that (f ∗ g)(x) is zero in the complement of [a + c, b + d]. Hint: show that the convolution of the indicator functions of the first two intervals is zero in the complement of the third.
Exercise 6.9 Suppose that f(x) and g(x) are two functions of a real variable x ∈ R which are zero outside of the intervals [a, b] and [c, d], respectively. We wish to obtain an approximation formula for the convolution h = f ∗ g of f and g by discretizing the convolution integral. Assume that the lengths of the two intervals are multiples of a common small positive step size e. This means that there are positive integers N and M such that

    (b − a)/N = e = (d − c)/M.

Show that the approximation of (f ∗ g)(x) over the interval [a + c, b + d] by a Riemann sum discretization of the convolution integral is

    h(x_j) = Σ_{i=max{1, j−N}}^{min{j, M+1}} f(a + (j − i)e) g(c + (i − 1)e) e,

for j = 1, . . . , N + M + 1, where x_j = a + c + (j − 1)e.
The following script implements the approximation for the convolution in-
tegral to obtain the nth convolution power of a function f .
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function g=convolution(f,a,b,n)
%Input  - f vector discretization of a function
%       - a and b are the left and right endpoints
%         of an interval outside of which f is zero
%       - n degree of convolution
%Output - g vector approximating the n-th degree
%         convolution of f with itself over the interval [na, nb]
N=length(f)-1;
e=(b-a)/N;            %discretization step
g=f;
for k=2:n
    x=[k*a:e:k*b];
    h=zeros(size(x)); %k-th convolution power, to be filled in
    for j=1:k*N+1
        for i=max([j-N,1]):min([j,(k-1)*N+1])
            h(j)=h(j)+f(j-i+1)*g(i)*e;
        end
    end
    g=h;
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Exercise 6.10 Let f be the function f (x) = cx2 over the interval [1, 1], where
c is a normalization constant. We discretize it and write in Matlab:
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
x=[-1:0.01:1]; f=x.^2; f=f/sum(f);
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
hold off
stem(xout,n/M) %stem plot of the relative frequencies
f=(1/(s*sqrt(2*pi)))*exp(-0.5*((x-m)/s).^2)*dx;
hold on
plot(x,f)
grid
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Exercise 6.11 (Transformation method) Write a program to simulate a random variable X with probability density function f(x) = 3x² over the interval [0, 1] using the transformation method. Simulate 1000 realizations of X and plot a stem plot with 20 bins. On the same coordinate system, superpose the graph of f(x) (appropriately normalized so as to give the correct frequencies for each bin).
Exercise 6.12 (Uniform rejection method) Write a program to simulate a random variable taking values in [−1, 1] with probability density

    f(x) = (3/4)(1 − x²).

Use the uniform rejection method. Simulate 1000 realizations of X and plot a stem plot with 20 bins. On the same coordinate system, superpose the graph of f(x) (appropriately normalized so as to give the correct frequencies for each bin).
Multinomial

    P(n_1, n_2, . . . , n_k) = [(Σ_{i} n_i)! / (n_1! n_2! · · · n_k!)] Π_{i=1}^{k} p_i^{n_i},   Σ_{i=1}^{k} p_i = 1.

Geometric

    P(X = k) = (1 − p)^{k−1} p,   k = 1, 2, . . . .

Poisson

    P(X = k) = λ^k e^{−λ}/k!,   k = 0, 1, 2, . . . .

Uniform

    f(x) = (1/(b − a)) I_{[a,b]}(x),   x ∈ R.

Negative binomial

    P(X = n + k) = C(n + k − 1, k) p^n (1 − p)^k,   k = 0, 1, 2, . . . .

Exponential

    f(x) = λe^{−λx},   x ≥ 0.

Gamma

    f(x) = λ^n x^{n−1} e^{−λx} / Γ(n),   x ≥ 0.

Chi-square

    f(x) = x^{ν/2−1} e^{−x/2} / (Γ(ν/2) 2^{ν/2}),   x ≥ 0.

t

    f(x) = [Γ((ν + 1)/2) / (Γ(ν/2) √(νπ))] (1 + x²/ν)^{−(ν+1)/2},   x ∈ R.

Cauchy

    f(x) = (1/π) β/(β² + (x − α)²),   x ∈ R.

Weibull

    f(x) = αλ(λx)^{α−1} e^{−(λx)^α},   x ≥ 0.

Normal

    f(x) = (1/(σ√(2π))) e^{−(1/2)((x − μ)/σ)²},   x ∈ R.

Half-normal

    f(x) = (1/σ)√(2/π) e^{−(1/2)(x/σ)²},   x ≥ 0.

Multivariate normal

    f(x) = (2π)^{−k/2} |det(C)|^{−1/2} exp(−(1/2)(x − μ)′ C^{−1}(x − μ)),   x ∈ R^k.

Logistic

    f(x) = β e^{−(α+βx)} / (1 + e^{−(α+βx)})²,   x ∈ R.

Log-logistic

    f(x) = e^{α} β x^{−(β+1)} / (1 + e^{α} x^{−β})²,   x ≥ 0.