Random number generation Generating random numbers Generating random vectors

Lecture 8 - Random number generation in R

Björn Andersson (w/ Xijia Liu)


Department of Statistics, Uppsala University

February 26, 2015


Table of Contents

1 Random number generation


True random numbers
Pseudo-random numbers
Direct methods
Indirect methods
Composition methods

2 Generating random numbers

3 Generating random vectors


True random numbers

"True" random numbers

A sequence of "true" random numbers is a sequence which is in practice impossible to predict. That is, it is not derived only from a computer algorithm. Examples:
- Coin toss, throwing a die
- Atmospheric data (http://www.random.org)
- Random numbers derived from physical phenomena which are nondeterministic

"True" random numbers are typically used in situations where the decisions based on the sequence are particularly important or where security requires that the sequence is impossible to predict, for example gambling and encryption.


Actually true random numbers

A truly random sequence of numbers is one which is in principle impossible to predict. Quantum mechanical properties are used to derive the random numbers.
- Radioactive decay (http://www.fourmilab.ch/hotbits)
- Photons travelling through a semi-transparent mirror

These methods are slow and can contain biases from the measuring instruments. However, if you want a truly random decision, use these methods.


Pseudo-random numbers

Pseudo-random number: definition

Uniform pseudo-random number generator (from Monte Carlo Statistical Methods, Robert and Casella, 2004):

A uniform pseudo-random number generator is an algorithm which, starting from an initial value $u_0$ and a transformation $D$, produces a sequence $(u_i) = (D^i(u_0))$ of values in $[0, 1]$. For all $n$, the values $(u_1, \ldots, u_n)$ reproduce the behaviour of an i.i.d. sample of uniform random variables when compared through a usual set of tests.


Why use pseudo-random numbers?

"True" random numbers are impractical for routine use since they require physical phenomena to derive. For statistical applications pseudo-random numbers are most often used. These are derived from a computer algorithm, so a sequence of these numbers can in principle be predicted.

Pseudo-random numbers
- are fast to calculate
- have bias properties which are easy to control
- are reproducible


Congruential generators

- Let $m$ be a large integer and let $b$ be another integer which is smaller than $m$.
- Select an integer $x_0$ between 1 and $m$. We call $x_0$ the seed.
- Note the modulo operation (modulus): $a \bmod m$ equals the remainder of $a/m$.

With the chosen seed $x_0$ we have the following algorithm (Lehmer), starting from $i = 1$:
1. Let $x_i = b x_{i-1} \bmod m$
2. Set $u_i = x_i / m$ as the pseudo-random number
3. Let $i = i + 1$ and return to 1.


Congruential generator example

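Let $m = 11$, $b = 3$ and $x_0 = 2$. Then

$x_1 = 3 \times 2 \bmod 11 = 6$, $u_1 = 6/11 \approx 0.545$
$x_2 = 3 \times 6 \bmod 11 = 7$, $u_2 = 7/11 \approx 0.636$
$x_3 = 3 \times 7 \bmod 11 = 10$, $u_3 = 10/11 \approx 0.909$
$x_4 = 3 \times 10 \bmod 11 = 8$, $u_4 = 8/11 \approx 0.727$
$x_5 = 3 \times 8 \bmod 11 = 2$, $u_5 = 2/11 \approx 0.182$

Since $x_5 = x_0$, the sequence repeats from here: the cycle length is 5.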


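Congruential generator in R code

> congruential <- function(m, b, x0, n){
+   out <- numeric(n)        # storage for the n pseudo-random numbers
+   xi <- x0                 # current state, initialised with the seed
+   for(i in seq_len(n)){
+     xi <- (b*xi) %% m      # x_i = b * x_{i-1} mod m
+     out[i] <- xi/m         # u_i = x_i / m
+   }
+   return(out)
+ }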


Congruential generator parameters

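The integers $m$ and $b$ must be chosen properly:
- The integer $m$ should be large and not be divisible by $b$; choosing prime numbers achieves this.
- The value of $b$ is often chosen to be near the square root of $m$ and should ensure that the cycle length is as long as possible (at most $m - 1$ for this multiplicative generator, since the state 0 can never be left).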
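
The effect of $b$ on the cycle length can be checked by brute force for small $m$. The following is a minimal sketch, not part of the original slides; `lehmer_period` is a hypothetical helper name, and the check is only feasible for small moduli.

```r
# Hypothetical helper (not from the lecture): counts the steps until the
# state of the multiplicative congruential generator returns to the seed.
# If the seed is never revisited, the loop stops after m + 1 steps.
lehmer_period <- function(m, b, x0) {
  xi <- x0
  steps <- 0
  repeat {
    xi <- (b * xi) %% m
    steps <- steps + 1
    if (xi == x0 || steps > m) break
  }
  steps
}

lehmer_period(11, 3, 2)  # m = 11, b = 3, x0 = 2: states 6, 7, 10, 8, 2
lehmer_period(11, 2, 2)  # b = 2 visits all of 1, ..., 10 before repeating
```

With $m = 11$, the multiplier $b = 3$ gives a cycle of length 5, while $b = 2$ reaches the longest possible cycle of length 10.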


Uniform random numbers in R

The default R generator of uniform pseudo-random numbers is the Mersenne twister. It has a period of $2^{19937} - 1$ and passes most randomness tests.

runif(n, min = 0, max = 1)


Note: choosing the seed

- The seed determines which pseudo-random numbers you will obtain.
- If you want a less predictable random sequence you may choose the seed based on the current time or some similar method.
- To get reproducible results, choose the random seed yourself and record it.
- The R function set.seed() specifies the seed:

> set.seed(1)
> runif(1)
[1] 0.2655087


Inverse-transform method

Let $X$ be a random variable with cdf $F$. Since $F$ is a non-decreasing function, the generalized inverse $F^{-1}$ may be defined as

$$F^{-1}(y) = \inf\{x : F(x) \geq y\}, \quad 0 \leq y \leq 1.$$

The operator $\inf A$ means the greatest lower bound of the set $A$. Similarly, $\sup A$ means the least upper bound of the set $A$.

Proposition (Probability integral transform)
If $U \sim U(0, 1)$, then the random variable $F^{-1}(U)$ has cdf $F$.


Inverse-transform method

Inverse-transform algorithm:
1. Generate $u$ from $U(0, 1)$
2. Return $x = F^{-1}(u)$.

Thus we may generate random numbers from $F$ by generating a random number $u$ from the standard uniform distribution and then evaluating the inverse of $F$ at $u$.


Inverse-transform method: example

The objective is to generate pseudo-random numbers from $X \sim U(a, b)$. The cdf of $X$ is given by

$$F(x) = \frac{x - a}{b - a}, \quad a \leq x \leq b.$$

We solve $u = F(x)$ in terms of $x$:

$$u = F(x) \iff u = \frac{x - a}{b - a} \iff x = (b - a)u + a.$$

Hence we obtain

$$F^{-1}(u) = (b - a)u + a.$$


Inverse-transform method: example in R

> genunif <- function(a, b){
+   u <- runif(1)              # u from U(0, 1)
+   return((b - a) * u + a)    # F^{-1}(u) for U(a, b)
+ }
> set.seed(1)
> genunif(-2, 2)
[1] -0.9379653
> set.seed(1)
> runif(1, -2, 2)
[1] -0.9379653


Transform method

- Sometimes it is not possible to obtain a closed-form expression for the inverse of the desired cdf, e.g. for the chi-square, F and t distributions.
- However, these distributions may be related to other random variables which can be generated easily.
- Hence, first a random variable $X$ can be generated and then a transformation of $X$ provides the random variable desired.


Transform method: Box and Müller

A transform method to generate standard normal random variables $X$ and $Y$.
1. Generate $U_1$ and $U_2$ independently from $U(0, 1)$
2. Make the transformation:

$$X = (-2 \ln U_1)^{1/2} \cos(2\pi U_2)$$
$$Y = (-2 \ln U_1)^{1/2} \sin(2\pi U_2)$$

Such $X$ and $Y$ are independent $N(0, 1)$ random variables.


Transform method: Box and Müller

> BoxMul <- function(n){
+   # n must be even: the first n/2 uniforms act as U1, the rest as U2
+   u <- runif(n)
+   x <- sqrt(-2 * log(u[1:(n/2)])) *
+     cos(2 * pi * u[(n/2 + 1):n])
+   y <- sqrt(-2 * log(u[1:(n/2)])) *
+     sin(2 * pi * u[(n/2 + 1):n])
+   return(c(x, y))
+ }
> xy <- BoxMul(10000)
> shapiro.test(xy[1:5000])$p.value
[1] 0.478975
> shapiro.test(xy[5001:10000])$p.value
[1] 0.07263341
> cor(xy[1:5000], xy[5001:10000])
[1] -0.00960596

Transform method: example

We want to generate $Z \sim N(\mu, \sigma^2)$. However, we do not need to work with the cdf of $Z$ directly, since for $X \sim N(0, 1)$ with density $f_X(x)$, $Z = \mu + \sigma X$ has the desired distribution.

> set.seed(1)
> z1 <- 2 + 2 * rnorm(10)
> set.seed(1)
> z2 <- rnorm(10, 2, 2)
> z1 - z2
[1] 0 0 0 0 0 0 0 0 0 0


Acceptance-rejection method

Noting that we can write a density $f_X$ as

$$\int_0^{f_X(x)} du = u \Big|_{u=0}^{u=f_X(x)} = f_X(x),$$

we find that $f_X$ can be seen as the marginal density of $X$ under the joint distribution $(X, U) \sim U\{(x, u) : 0 < u < f_X(x)\}$. We can then generate from the joint distribution by generating random uniform numbers on the constrained set $\{(x, u) : 0 < u < f_X(x)\}$. And, since the marginal distribution of $X$ has density $f_X$, by generating a uniform variable on $\{(x, u) : 0 < u < f_X(x)\}$ we have generated a random variable from $f_X$. Hence simulating $X \sim f_X(x)$ is equivalent to simulating $(X, U) \sim U\{(x, u) : 0 < u < f_X(x)\}$.


Acceptance-rejection method

As an example, assume that $\int_a^b f_X(x)\,dx = 1$ and that $f_X$ is bounded by $c$. We can then simulate the random pair $(Y, U) \sim U\{(y, u) : a < y < b,\ 0 < u < c\}$ by simulating $Y \sim U(a, b)$ and $U \mid Y = y \sim U(0, c)$, and then accept $y$ as a realization of $X$ only if $0 < u < f_X(y)$ is satisfied.


Naive acceptance-rejection method

The preceding motivates a method of generating random numbers. We wish to generate random numbers from density $f$.
- Assume that the target pdf $f$ is bounded on a finite interval $[a, b]$ and is zero outside this interval.
- Let $c = \sup\{f(x) : x \in [a, b]\}$.

Under these conditions we have the following algorithm (Naive A-R algorithm):
1. Generate $U$ from $U(a, b)$
2. Generate $V$ from $U(0, c)$
3. If $V \leq f(U)$, return $X = U$. Otherwise, return to 1.


Naive acceptance-rejection method: example

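We want to generate random numbers from $X \sim \mathrm{Beta}(\alpha, \beta)$, where $\alpha, \beta > 1$. We note that for such $\alpha, \beta$,

$$\sup\{f_X(x) : x \in [0, 1]\} = f_X\left(\frac{\alpha - 1}{\alpha + \beta - 2}\right).$$

We proceed as follows:
1. Generate $U$ from $U(0, 1)$
2. Generate $V$ from $U\left(0, f_X\left(\frac{\alpha - 1}{\alpha + \beta - 2}\right)\right)$
3. If $V \leq f_X(U)$, return $X = U$. Else, return to 1.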


Naive acceptance-rejection method: R code

> randbeta <- function(n, alpha, beta){
+   mode <- (alpha - 1) / (alpha + beta - 2)   # location of the maximum
+   c <- dbeta(mode, alpha, beta)               # supremum of the density
+   betanumber <- function(alpha, beta, c) {
+     repeat{
+       u <- runif(1)                           # candidate from U(0, 1)
+       v <- runif(1, 0, c)                     # height from U(0, c)
+       if(v <= dbeta(u, alpha, beta)) break    # accept if under the pdf
+     }
+     return(u)
+   }
+   return(replicate(n, betanumber(alpha, beta, c)))
+ }


Naive acceptance-rejection method: notes

Things to note:
- The probability of acceptance is

$$P(\text{accept}) = \frac{1}{c(b - a)},$$

that is, the proportion of the bounding rectangle $[a, b] \times [0, c]$ covered by the area under the pdf of the desired R.V. Hence, on average we need to repeat the procedure $c(b - a)$ times to obtain one such number.
- The domain of the desired density must be a bounded interval, otherwise we cannot generate the corresponding uniformly distributed candidate.
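
The acceptance probability can be checked empirically. The sketch below is an illustration only: it assumes a Beta(2, 2) target on $[0, 1]$, and the variable names (`shape1`, `shape2`, `cmax`) are chosen here, not taken from the slides.

```r
set.seed(1)
shape1 <- 2
shape2 <- 2
mode <- (shape1 - 1) / (shape1 + shape2 - 2)
cmax <- dbeta(mode, shape1, shape2)        # supremum of the density (1.5 here)

n <- 100000
u <- runif(n)                              # candidates from U(0, 1)
v <- runif(n, 0, cmax)                     # heights from U(0, c)
accepted <- v <= dbeta(u, shape1, shape2)  # naive A-R acceptance condition

rate <- mean(accepted)                     # empirical acceptance rate
theory <- 1 / (cmax * (1 - 0))             # 1 / (c (b - a)) with a = 0, b = 1
c(empirical = rate, theoretical = theory)
```

For this target the theoretical acceptance probability is $1/1.5 \approx 0.667$, so roughly 1.5 candidate pairs are needed per accepted number.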


Acceptance-rejection method
The efficiency of the method can be improved by making an adjustment to the algorithm. Again, we want to generate random numbers from a density $f$. Let $X$ and $Y$ be random variables with density functions $f$ and $g$. Assume that we can readily generate random numbers from $g$. Assume also that there is a constant $c$ such that

$$c = \sup_t \frac{f(t)}{g(t)} < \infty,$$

where the supremum is taken over all $t$ such that $f(t) > 0$. Then the following algorithm (A-R algorithm) can be applied to generate the random variable $X$:
1. Generate $y$ from $g$
2. Generate $v$ from $U(0, 1)$
3. If $v < \frac{f(y)}{c\, g(y)}$, return $x = y$. Otherwise go back to 1.

Acceptance-rejection method: generate N(0,1)

We note first that we may generate a random number $y$ from $N(0, 1)$ by first generating a random number from

$$f(x) = \sqrt{\frac{2}{\pi}}\, e^{-x^2/2}, \quad x \geq 0,$$

and then assigning a random sign to the resulting number. Hence we proceed by generating a random number from $f$ using the A-R algorithm.


Acceptance-rejection method: generate N(0,1)


Generate from $f$:
- Select $g(x) = e^{-x}$, the pdf of the $\mathrm{Exp}(1)$ distribution.
- Find the smallest constant $C$ such that $f(x) \leq C g(x)$. This gives $C = \sqrt{2e/\pi}$.
- Generate $X$ from $\mathrm{Exp}(1)$ and $U$ from $U(0, 1)$. The acceptance condition $U \leq \frac{f(X)}{C e^{-X}}$ is then

$$U \leq \frac{\sqrt{2/\pi}\, e^{-X^2/2}}{\sqrt{2e/\pi}\, e^{-X}} \iff U \leq e^{-\frac{X^2 - 2X + 1}{2}} \iff U \leq e^{-\frac{(X - 1)^2}{2}}$$
$$\iff \ln(U) \leq -\frac{(X - 1)^2}{2} \iff -\ln(U) \geq \frac{(X - 1)^2}{2}$$

- Note that $-\ln U$ and $X$ are independent and both $\mathrm{Exp}(1)$ distributed.

Acceptance-rejection method: generate N(0,1)

The preceding results provide the following algorithm (Normal generation by the A-R algorithm):
1. Generate $V_1$ and $V_2$ independently from $\mathrm{Exp}(1)$.
2. If $V_1 \geq \frac{(V_2 - 1)^2}{2}$ go to 3, otherwise go back to 1.
3. Generate $u$ from $U(0, 1)$. If $u \geq 0.5$, return $V_2$ as the random number from $N(0, 1)$, otherwise return $-V_2$.

Exercise: write a function in R with the algorithm above.
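
One possible sketch of the three steps above follows. It is not the lecture's own solution: the function name `rnorm_ar` is hypothetical, and R's built-in `rexp()` is assumed as the source of Exp(1) numbers for brevity (the inverse-transform method would serve equally well).

```r
# Hypothetical helper: N(0, 1) numbers by acceptance-rejection with an
# Exp(1) proposal for the half-normal density, followed by a random sign.
rnorm_ar <- function(n) {
  one_draw <- function() {
    repeat {
      v1 <- rexp(1)                      # step 1: V1 ~ Exp(1)
      v2 <- rexp(1)                      #         V2 ~ Exp(1)
      if (v1 >= (v2 - 1)^2 / 2) break    # step 2: accept V2
    }
    if (runif(1) >= 0.5) v2 else -v2     # step 3: attach a random sign
  }
  replicate(n, one_draw())
}

set.seed(1)
z <- rnorm_ar(10000)
c(mean = mean(z), sd = sd(z))
```

The sample mean and standard deviation should be close to 0 and 1; a normality test such as `shapiro.test()` on a subsample of at most 5000 values gives a further check.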


Sums or Convolutions

- Let $X$ be a random variable. Let $X_1, \ldots, X_n$ be i.i.d. R.V.s such that $X_j \sim X$ for all $j \in \{1, \ldots, n\}$. Let $S_n = X_1 + \cdots + X_n$.
- The distribution function of the sum $S_n$ is called the $n$-fold convolution of $X$ and is denoted $F_X^{*(n)}$.
- We may generate a random number from $S_n$ by generating random numbers from the components of $S_n$ and summing them.

Examples:
- Simulate from $\mathrm{Bin}(n, p)$ by summing $n$ i.i.d. $\mathrm{Be}(p)$ random numbers.
- Simulate from $\chi^2(n)$ by summing the squares of $n$ i.i.d. $N(0, 1)$ random numbers.
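
The two examples can be sketched in R as follows. The function names `rbinom_sum` and `rchisq_sum` are hypothetical and chosen for this illustration; `rnorm()` is assumed as the source of N(0, 1) numbers.

```r
# Hypothetical helper: Bin(size, p) as a sum of `size` Be(p) numbers,
# each obtained by comparing a U(0, 1) number with p.
rbinom_sum <- function(n_draws, size, p) {
  replicate(n_draws, sum(runif(size) <= p))
}

# Hypothetical helper: chi-square(df) as a sum of df squared N(0, 1) numbers.
rchisq_sum <- function(n_draws, df) {
  replicate(n_draws, sum(rnorm(df)^2))
}

set.seed(1)
x <- rbinom_sum(10000, size = 10, p = 0.3)
y <- rchisq_sum(10000, df = 4)
c(mean_x = mean(x), mean_y = mean(y))   # theoretical means: 3 and 4
```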

Table of Contents

1 Random number generation

2 Generating random numbers


Continuous distributions
Discrete distributions

3 Generating random vectors


Continuous distributions

Exponential distribution

We want to generate random numbers from $X \sim \mathrm{Exp}(\lambda)$. The cdf is

$$F(x) = 1 - e^{-\lambda x}, \quad x \geq 0.$$

We apply the inverse-transform method and solve $u = F(x)$ in terms of $x$:

$$u = 1 - e^{-\lambda x} \iff \ln(e^{-\lambda x}) = \ln(1 - u) \iff x = -\frac{\ln(1 - u)}{\lambda}$$

So $F^{-1}(u) = -\frac{\ln(1 - u)}{\lambda}$ and we have the algorithm:
1. Generate $u$ from $U(0, 1)$
2. Return $x = -\frac{\ln(1 - u)}{\lambda}$
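
The two steps translate directly into vectorised R code. This is a minimal sketch; the function name `rexp_inv` is hypothetical.

```r
# Hypothetical helper: Exp(lambda) numbers by the inverse-transform method.
rexp_inv <- function(n, lambda) {
  u <- runif(n)            # step 1: u from U(0, 1)
  -log(1 - u) / lambda     # step 2: x = F^{-1}(u)
}

set.seed(1)
x <- rexp_inv(10000, lambda = 2)
mean(x)                    # theoretical mean is 1 / lambda = 0.5
```

Since $1 - U$ is also $U(0, 1)$ distributed, `-log(u) / lambda` would give numbers from the same distribution.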


Normal distribution

- We may generate from $Z \sim N(\mu, \sigma^2)$ by first generating $X \sim N(0, 1)$ random variables and then applying the transformation $Z = \mu + \sigma X$.
- The $N(0, 1)$ random number may be generated by the Box-Müller approach or the A-R method described previously.


Gamma distribution

For a R.V. $Y \sim \mathrm{Ga}(\alpha, \beta)$, where $\alpha$ is integer-valued, we note that, for i.i.d. $\mathrm{Exp}(\beta)$ R.V.s $X_1, \ldots, X_\alpha$, $Y \sim \sum_{i=1}^{\alpha} X_i$.

Hence we may generate a $\mathrm{Ga}(\alpha, \beta)$ random number by generating $\alpha$ i.i.d. $\mathrm{Exp}(\beta)$ random numbers and summing them.

For non-integer $\alpha$, a method based on the A-R algorithm may be used. Exercise: implement the above method in R.
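
A possible sketch of the integer-$\alpha$ case follows. It is not the lecture's own solution: the name `rgamma_sum` is hypothetical, $\beta$ is assumed to be a rate parameter (consistent with the cdf $1 - e^{-\lambda x}$ used for the exponential distribution), and the exponential numbers are obtained by the inverse-transform method.

```r
# Hypothetical helper: Ga(alpha, beta) numbers for integer alpha, as the
# sum of alpha i.i.d. Exp(beta) numbers (beta assumed to be a rate).
rgamma_sum <- function(n, alpha, beta) {
  stopifnot(alpha == round(alpha), alpha >= 1)
  one_draw <- function() sum(-log(1 - runif(alpha)) / beta)
  replicate(n, one_draw())
}

set.seed(1)
y <- rgamma_sum(10000, alpha = 3, beta = 2)
mean(y)   # theoretical mean is alpha / beta = 1.5
```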


Beta distribution
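We have, for $X \sim \mathrm{Beta}(\alpha, \beta)$,

$$f(x) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)}\, x^{\alpha - 1}(1 - x)^{\beta - 1}, \quad 0 \leq x \leq 1.$$

- Since the support is $[0, 1]$ we may use the acceptance-rejection method.
- If either $\alpha$ or $\beta$ equals 1, the density is

$$f(x; \beta = 1) = \alpha x^{\alpha - 1}, \quad 0 \leq x \leq 1$$

or

$$f(x; \alpha = 1) = \beta(1 - x)^{\beta - 1}, \quad 0 \leq x \leq 1$$

and the inverse-transform method can be used.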

Beta distribution

Also note the relationship between two Ga-distributed R.V.s and the beta distribution:

Proposition
For independent random variables $Y_1 \sim \mathrm{Ga}(\alpha, 1)$ and $Y_2 \sim \mathrm{Ga}(\beta, 1)$,

$$X = \frac{Y_1}{Y_1 + Y_2} \sim \mathrm{Beta}(\alpha, \beta).$$

Hence we may generate random numbers from two independent Ga-distributed R.V.s and obtain a random number from a Beta-distributed R.V.
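
The proposition can be sketched in R as below. The name `rbeta_gamma` is hypothetical, and R's built-in `rgamma()` is assumed as the Gamma generator so that non-integer shapes are covered as well.

```r
# Hypothetical helper: Beta(alpha, beta) numbers from two independent
# Gamma numbers with unit rate.
rbeta_gamma <- function(n, alpha, beta) {
  y1 <- rgamma(n, shape = alpha, rate = 1)
  y2 <- rgamma(n, shape = beta, rate = 1)
  y1 / (y1 + y2)
}

set.seed(1)
x <- rbeta_gamma(10000, alpha = 2, beta = 3)
mean(x)   # theoretical mean is alpha / (alpha + beta) = 0.4
```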


Discrete distributions

Bernoulli distribution

We want to generate a random number from $\mathrm{Be}(p)$. The pmf is

$$p(x) = p^x (1 - p)^{1 - x}, \quad x = 0, 1.$$

Hence $p(0) = 1 - p$ and $p(1) = p$. We thus have the following algorithm:
1. Generate $u$ from $U(0, 1)$
2. If $u \leq p$, return $x = 1$. Otherwise return $x = 0$.
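
In R the two steps reduce to a single vectorised comparison. A minimal sketch, with the hypothetical function name `rbern`:

```r
# Hypothetical helper: Be(p) numbers by comparing U(0, 1) numbers with p.
rbern <- function(n, p) {
  as.integer(runif(n) <= p)   # 1 if u <= p, otherwise 0
}

set.seed(1)
x <- rbern(10000, p = 0.3)
mean(x)                       # proportion of ones, should be near p = 0.3
```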


Binomial distribution

The binomial distribution can be simulated by considering a sum of $n$ independent Bernoulli R.V.s. However, for large $n$ this is inefficient. Instead, the normal approximation to the binomial distribution can be used for large $n$.
- Let $X \sim \mathrm{Bin}(n, p)$. Then, for large $n$, $X$ is approximately $N(np, np(1 - p))$ distributed.
- To obtain a better approximation a continuity correction can be applied, and so the $N(np - 1/2, np(1 - p))$ distribution is considered.

Hence we can generate random numbers from $\mathrm{Bin}(n, p)$:
1. Generate $z$ from $N(0, 1)$.
2. Return $x = \max\{0, \lfloor np - 1/2 + z\sqrt{np(1 - p)} \rfloor\}$ as the random number from $\mathrm{Bin}(n, p)$.


Geometric distribution
For a R.V. $X \sim \mathrm{Ge}(p)$ the pmf is:

$$p(x) = p(1 - p)^{x - 1}, \quad x = 1, 2, \ldots$$

We may view $X$ as the number of trials required until the first success occurs in a series of independent Bernoulli trials.

Proposition
If $Y$ is an $\mathrm{Exp}(\lambda)$ R.V., where $e^{-\lambda} = 1 - p$, then $X = \lfloor Y \rfloor + 1$ has the $\mathrm{Ge}(p)$ distribution.

Note that:

$$e^{-\lambda} = 1 - p \iff -\lambda = \ln(1 - p) \iff \lambda = -\ln(1 - p)$$

Hence, if we generate a random number from $\mathrm{Exp}(-\ln(1 - p))$, we can generate a random number from $\mathrm{Ge}(p)$.
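
A minimal sketch of this construction is given below. The name `rgeom_exp` is hypothetical, and the exponential numbers are assumed to come from the inverse-transform method.

```r
# Hypothetical helper: Ge(p) numbers on 1, 2, ... via X = floor(Y) + 1,
# where Y ~ Exp(lambda) with lambda = -log(1 - p).
rgeom_exp <- function(n, p) {
  lambda <- -log(1 - p)
  y <- -log(1 - runif(n)) / lambda   # Exp(lambda) by inverse transform
  floor(y) + 1
}

set.seed(1)
x <- rgeom_exp(10000, p = 0.25)
mean(x)                              # theoretical mean is 1 / p = 4
```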

Poisson distribution
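For $X \sim \mathrm{Po}(\lambda)$, the pmf is

$$p(x) = \frac{e^{-\lambda} \lambda^x}{x!}, \quad x = 0, 1, \ldots;\ 0 \leq \lambda < \infty.$$

We can interpret a $\mathrm{Po}(\lambda)$ R.V. as the maximum number of independent $\mathrm{Exp}(\lambda)$ R.V.s whose sum does not exceed 1.

Proposition
Let $Y_1, Y_2, \ldots$ be independent R.V.s such that $Y_j \sim \mathrm{Exp}(\lambda)$ for all $j$. Then

$$X = \max\Big\{n : \sum_{j=1}^{n} Y_j \leq 1\Big\}$$

has the $\mathrm{Po}(\lambda)$ distribution, where $X = 0$ if $Y_1 > 1$.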



Poisson distribution

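Hence, to generate a $\mathrm{Po}(\lambda)$ R.V. we generate $\mathrm{Exp}(\lambda)$ R.V.s until their sum is larger than 1, and return the number of R.V.s generated before the sum exceeded 1 (one less than the total number generated).

Exercise: implement the above method to generate $\mathrm{Po}(\lambda)$ in R.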
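
One way to approach the exercise is sketched below; it is not the lecture's own solution. The name `rpois_exp` is hypothetical and R's built-in `rexp()` is assumed as the exponential generator.

```r
# Hypothetical helper: Po(lambda) numbers by counting how many Exp(lambda)
# numbers can be added before the running sum exceeds 1.
rpois_exp <- function(n, lambda) {
  one_draw <- function() {
    total <- 0
    count <- 0
    repeat {
      total <- total + rexp(1, rate = lambda)
      if (total > 1) break     # the last number pushed the sum past 1
      count <- count + 1       # otherwise it still fits below 1
    }
    count
  }
  replicate(n, one_draw())
}

set.seed(1)
x <- rpois_exp(10000, lambda = 3)
c(mean = mean(x), var = var(x))   # both should be near lambda = 3
```

The expected number of exponential numbers per draw is $\lambda + 1$, so this approach becomes slow for large $\lambda$.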


Table of Contents

1 Random number generation

2 Generating random numbers

3 Generating random vectors


Multivariate normal distribution

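A random $d$-vector $\mathbf{x}$ has the multivariate normal (MVN) distribution, $N_d(\boldsymbol{\mu}, \boldsymbol{\Sigma})$, if its density is

$$f(\mathbf{x}) = |2\pi\boldsymbol{\Sigma}|^{-1/2} \exp\left\{-\frac{1}{2}(\mathbf{x} - \boldsymbol{\mu})' \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu})\right\}.$$

Note that for $\mathbf{y} = \mathbf{C}\mathbf{x} + \mathbf{b}$, $\mathbf{y} \sim N_d(\mathbf{C}\boldsymbol{\mu} + \mathbf{b}, \mathbf{C}\boldsymbol{\Sigma}\mathbf{C}')$. Hence, for $\boldsymbol{\mu} = \mathbf{0}$ and $\boldsymbol{\Sigma} = \mathbf{I}$ we have $\mathbf{C}\mathbf{z} + \mathbf{b} \sim N_d(\mathbf{b}, \mathbf{C}\mathbf{C}')$, where $\mathbf{z} \sim N_d(\mathbf{0}, \mathbf{I})$. As such, if we want to simulate from $\mathbf{x} \sim N_d(\boldsymbol{\mu}, \boldsymbol{\Sigma})$, $\boldsymbol{\Sigma} = \mathbf{C}\mathbf{C}'$, we can set $\mathbf{x} = \mathbf{C}\mathbf{z} + \boldsymbol{\mu}$. The vector $\mathbf{z}$ can be simulated using standard procedures for the normal distribution. Thus we only need to find a matrix $\mathbf{C}$ which fulfills $\boldsymbol{\Sigma} = \mathbf{C}\mathbf{C}'$, which can be done by the spectral decomposition, the Cholesky factorization or the singular value decomposition (in R, functions eigen, chol and svd).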
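
A minimal sketch using the Cholesky factorization is given below. The name `rmvn_chol` and the example values of the mean vector and covariance matrix are hypothetical. Note that R's `chol()` returns an upper-triangular factor $\mathbf{R}$ with $\mathbf{R}'\mathbf{R} = \boldsymbol{\Sigma}$, so $\mathbf{C} = \mathbf{R}'$ is used here.

```r
# Hypothetical helper: n draws from N_d(mu, Sigma) via x = C z + mu,
# with C taken from the Cholesky factorization of Sigma.
rmvn_chol <- function(n, mu, Sigma) {
  d <- length(mu)
  C <- t(chol(Sigma))                   # lower-triangular, C %*% t(C) = Sigma
  z <- matrix(rnorm(n * d), nrow = d)   # each column is one z ~ N_d(0, I)
  t(C %*% z + mu)                       # mu is recycled down each column
}

set.seed(1)
mu <- c(1, -1)
Sigma <- matrix(c(2, 0.5, 0.5, 1), nrow = 2)
x <- rmvn_chol(20000, mu, Sigma)        # 20000 x 2 matrix, one draw per row
colMeans(x)                             # should be near mu
cov(x)                                  # should be near Sigma
```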
