R Lecture 08

Lecture 8 - Random number generation in R
Björn Andersson (w/ Xijia Liu)

Department of Statistics, Uppsala University
February 26, 2015
Table of Contents
1 Random number generation

True random numbers
Pseudo-random numbers
Direct methods
Indirect methods
Composition methods
2 Generating random numbers
3 Generating random vectors
True random numbers
”True” random numbers
A sequence of ”true” random numbers is a sequence which is in
practice impossible to predict. That is, it is not derived only from a
computer algorithm. Examples:
Coin toss, throw of a die
Atmospheric data (http://www.random.org)
True random numbers derived from physical phenomena
which are nondeterministic
”True” random numbers are typically used in situations where the
decisions based on the sequence are particularly important or where
security requires that the sequence is impossible to predict.
Gambling, encryption etc.
True random numbers
Actually true random numbers
A truly random sequence of numbers is one which is in principle
impossible to predict. Quantum mechanical properties are used to
derive the random numbers.
Radioactive decay (http://www.fourmilab.ch/hotbits)
Photons travelling through a semi-transparent mirror
These methods are slow and can contain biases from the
measuring instruments. However, if you want a truly random
decision, use these methods.
Pseudo-random number: definition
Uniform pseudo-random number generator (from Monte Carlo
Statistical Methods, Robert and Casella, 2004):
A uniform pseudo-random number generator is an algorithm
which, starting from an initial value $u_0$ and a transformation
$D$, produces a sequence $(u_i) = (D^i(u_0))$ of values in $[0, 1]$.
For all $n$, the values $(u_1, \ldots, u_n)$ reproduce the behaviour of
an i.i.d. sample of uniform random variables when compared
through a usual set of tests.
Why use pseudo-random numbers?
The ”true” random numbers are not practical to use since they
require physical phenomena to derive. For statistical applications
pseudo-random numbers are most often used. These are derived
from a computer algorithm and as such it is possible to predict a
sequence of these numbers. Pseudo-random numbers
are fast to calculate
have bias properties which are easy to control
are reproducible
Congruential generators
Let $m$ be a large integer and let $b$ be another integer which is
smaller than $m$.
Select an integer $x_0$ between 1 and $m - 1$. We call $x_0$ the seed.
Note the modulo operation (modulus): $a \bmod m$ equals the
remainder when $a$ is divided by $m$.
With the chosen seed $x_0$ we have the following algorithm
(Lehmer), starting from $i = 1$:
1 Let $x_i = b x_{i-1} \bmod m$
2 Set $u_i = x_i / m$ as the pseudo-random number
3 Let $i = i + 1$ and return to step 1
Congruential generator example
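Let $m = 11$, $b = 3$ and $x_0 = 2$. Then
$x_1 = 3 \times 2 \bmod 11 = 6$, $u_1 = 6/11 \approx 0.545$
$x_2 = 3 \times 6 \bmod 11 = 7$, $u_2 = 7/11 \approx 0.636$
$x_3 = 3 \times 7 \bmod 11 = 10$, $u_3 = 10/11 \approx 0.909$
$x_4 = 3 \times 10 \bmod 11 = 8$, $u_4 = 8/11 \approx 0.727$
$x_5 = 3 \times 8 \bmod 11 = 2$, $u_5 = 2/11 \approx 0.182$
Since $x_5 = x_0$, the sequence repeats from here on.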
Congruential generator in R code
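> congruential <- function(m, b, x0, n){
+   out <- numeric(n)         # storage for u_1, ..., u_n
+   xi <- x0                  # current state, starts at the seed
+   for(i in seq_len(n)){
+     xi <- (b * xi) %% m     # x_i = b * x_{i-1} mod m
+     out[i] <- xi / m        # u_i = x_i / m
+   }
+   return(out)
+ }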
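As a quick check, the generator can be run with the parameters of the worked example ($m = 11$, $b = 3$, $x_0 = 2$). The sketch below repeats the function definition so that it runs on its own; the call with `n = 10` is only an illustration.

```r
# Lehmer generator, repeated here so the block is self-contained
congruential <- function(m, b, x0, n) {
  out <- numeric(n)
  xi <- x0
  for (i in seq_len(n)) {
    xi <- (b * xi) %% m   # next state
    out[i] <- xi / m      # scaled to the unit interval
  }
  out
}

u <- congruential(m = 11, b = 3, x0 = 2, n = 10)
round(u, 3)      # pseudo-random numbers u_1, ..., u_10
round(u * 11)    # the underlying states x_1, ..., x_10
```

With these parameters the states repeat after five steps, far short of the largest possible cycle, which illustrates why $m$ and $b$ must be chosen with care.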
Congruential generator parameters
The integers $m$ and $b$ must be chosen properly:
The integer $m$ should be large and not be divisible by $b$;
choose prime numbers.
The value of $b$ is often chosen to be near the square root of $m$
and should ensure that the cycle length is as long as possible
(at most $m - 1$ for this generator, since the state 0 must be avoided).
Uniform random numbers in R
The default R generator of uniform pseudo-random numbers is the
Mersenne twister. It has a period of $2^{19937} - 1$ and passes most
randomness tests.
runif(n, min = 0, max = 1)
Note: choosing the seed
The seed determines which pseudo-random numbers you will
obtain.
If you want a less predictable random sequence you may
choose the seed based on the current time or some other
related method.
To get reproducible results, choose the random seed yourself
and remember it.
The R function set.seed() specifies the seed
> set.seed(1)
> runif(1)
[1] 0.2655087
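The reproducibility provided by `set.seed()` can be checked directly. In the sketch below the seed value 2015 is arbitrary and chosen only for illustration.

```r
set.seed(2015)       # arbitrary seed, chosen only for illustration
a <- runif(3)
set.seed(2015)       # resetting the seed restarts the sequence
b <- runif(3)
identical(a, b)      # same seed, same numbers
```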
Inverse-transform method
Let $X$ be a random variable with cdf $F$. Since $F$ is a
non-decreasing function, the generalized inverse $F^{-1}$ may be
defined as
$$F^{-1}(y) = \inf\{x : F(x) \geq y\}, \quad 0 \leq y \leq 1.$$
The operator $\inf A$ means the greatest lower bound of the set $A$.
Similarly, $\sup A$ means the least upper bound of the set $A$.
Proposition (Probability integral transform)
If $U \sim U(0, 1)$, then the random variable $F^{-1}(U)$ has cdf $F$.
Inverse-transform method
Inverse-transform algorithm:
1 Generate $u$ from $U(0, 1)$
2 Return $x = F^{-1}(u)$.
Thus we may generate random numbers from $F$ by generating a
random number $u$ from the standard uniform distribution and then
evaluating the inverse of $F$ at $u$.
Inverse-transform method: example
The objective is to generate pseudo-random numbers from
$X \sim U(a, b)$. The cdf of $X$ is given by
$$F(x) = \frac{x - a}{b - a}, \quad a \leq x \leq b.$$
We solve $u = F(x)$ in terms of $x$:
$$u = F(x) \iff u = \frac{x - a}{b - a} \iff x = (b - a)u + a.$$
Hence we retrieve
$$F^{-1}(u) = (b - a)u + a.$$
Inverse-transform method: example in R
> genunif <- function(a, b){
+   u <- runif(1)               # u from U(0, 1)
+   return((b - a) * u + a)     # F^{-1}(u) for U(a, b)
+ }
> set.seed(1)
> genunif(-2, 2)
[1] -0.9379653
> set.seed(1)
> runif(1, -2, 2)
[1] -0.9379653
Transform method
Sometimes it is not possible to obtain a closed-form expression
for the inverse of the desired cdf, e.g. for the chi-square, F and t
distributions.
However, these distributions may be related to other random
variables which can be generated easily.
Hence, first $X$ can be generated and then a transformation of
$X$ provides the random variable desired.
Transform method: Box and Muller
A transform method to generate standard normal random variables
$X$ and $Y$.
1 Generate $U_1$ and $U_2$ independently from $U(0, 1)$
2 Make the transformation:
$$X = (-2 \ln U_1)^{1/2} \cos(2\pi U_2)$$
$$Y = (-2 \ln U_1)^{1/2} \sin(2\pi U_2)$$
Such $X$ and $Y$ are independent $N(0, 1)$ random variables.
Transform method: Box and Muller
> BoxMul <- function(n){
+   # n should be even: the first n/2 uniforms play the role of U1,
+   # the last n/2 the role of U2
+   u <- runif(n)
+   x <- sqrt(-2 * log(u[1:(n/2)])) *
+     cos(2 * pi * u[(n/2 + 1):n])
+   y <- sqrt(-2 * log(u[1:(n/2)])) *
+     sin(2 * pi * u[(n/2 + 1):n])
+   return(c(x, y))
+ }
> xy <- BoxMul(10000)
> shapiro.test(xy[1:5000])$p.value
[1] 0.478975
> shapiro.test(xy[5001:10000])$p.value
[1] 0.07263341
> cor(xy[1:5000], xy[5001:10000])
[1] -0.00960596
Transform method: example
We want to generate $Z \sim N(\mu, \sigma^2)$. However, we do not
need to work with the cdf of $Z$ directly, since for $X \sim N(0, 1)$
we have $Z = \mu + \sigma X$.
> set.seed(1)
> z1 <- 2 + 2 * rnorm(10)
> set.seed(1)
> z2 <- rnorm(10, 2, 2)
> z1 - z2
[1] 0 0 0 0 0 0 0 0 0 0
Acceptance-rejection method
Noting that we can write a density $f_X$ as
$$\int_0^{f_X(x)} du = u \Big|_{u=0}^{u=f_X(x)} = f_X(x),$$
we find that $f_X$ can be seen as the marginal density of the joint
distribution $(X, U) \sim U\{(x, u) : 0 < u < f_X(x)\}$. We can then
generate from the joint distribution by generating uniform random
numbers on the constrained set $\{(x, u) : 0 < u < f_X(x)\}$. And,
since the marginal distribution of $X$ is $f_X$, by generating a uniform
variable on $\{(x, u) : 0 < u < f_X(x)\}$ we have generated a random
variable from $f_X$. Hence simulating $X \sim f_X(x)$ is equivalent to
simulating $(X, U) \sim U\{(x, u) : 0 < u < f_X(x)\}$.
As an example, assume that $\int_a^b f_X(x)\,dx = 1$ and that $f_X$ is
bounded by $c$. We can then simulate the random pair
$(Y, U) \sim U\{(y, u) : a < y < b,\ 0 < u < c\}$ by simulating $Y \sim U(a, b)$ and
$U \mid Y = y \sim U(0, c)$ and then only accept $y$ as a
realization of $X$ if $0 < u < f_X(y)$ is satisfied.
Naive acceptance-rejection method
The preceding motivates a method of generating random numbers.
We wish to generate random numbers from a density $f$.
Assume that the target pdf $f$ is bounded on a finite interval
$[a, b]$ and is zero outside this interval.
Let $c = \sup\{f(x) : x \in [a, b]\}$.
Under these conditions we have the following algorithm (naive A-R
algorithm):
1 Generate $U$ from $U(a, b)$
2 Generate $V$ from $U(0, c)$
3 If $V \leq f(U)$, return $X = U$. Otherwise, return to 1.
Naive acceptance-rejection method: example
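We want to generate random numbers from $X \sim \mathrm{Beta}(\alpha, \beta)$,
where $\alpha, \beta > 1$. We note that for such $\alpha, \beta$,
$$\sup\{f_X(x) : x \in [0, 1]\} = f_X\left(\frac{\alpha - 1}{\alpha + \beta - 2}\right).$$
We proceed as follows:
1 Generate $U$ from $U(0, 1)$
2 Generate $V$ from $U\left(0, f_X\left(\frac{\alpha - 1}{\alpha + \beta - 2}\right)\right)$
3 If $V \leq f_X(U)$, return $X = U$. Else, return to 1.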
Naive acceptance-rejection method: R code
> randbeta <- function(n, alpha, beta){
+   mode <- (alpha - 1) / (alpha + beta - 2)    # maximizer of the density
+   c <- dbeta(mode, alpha, beta)               # c = sup f_X
+   betanumber <- function(alpha, beta, c) {
+     repeat{
+       u <- runif(1)                           # candidate from U(0, 1)
+       v <- runif(1, 0, c)                     # height from U(0, c)
+       if(v <= dbeta(u, alpha, beta)) break    # accept if below the density
+     }
+     return(u)
+   }
+   return(replicate(n, betanumber(alpha, beta, c)))
+ }
Naive acceptance-rejection method: notes
Things to note:
The probability of acceptance is
$$P(\text{accept}) = \frac{1}{c(b - a)},$$
that is, the proportion of the rectangle $[a, b] \times [0, c]$ covered by the pdf of the
desired R.V. Hence, on average we need to repeat the
procedure $c(b - a)$ times to retrieve one such number.
The domain of the desired density must be a bounded interval,
otherwise we can't produce the corresponding uniformly
distributed random variable.
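The acceptance probability can be checked empirically. The sketch below uses the Beta target on $[0, 1]$, so $b - a = 1$; the parameter values $\alpha = 2$, $\beta = 3$ and the number of trials are chosen only for illustration.

```r
set.seed(1)
alpha <- 2; beta <- 3                        # illustrative parameter values
c_max <- dbeta((alpha - 1) / (alpha + beta - 2), alpha, beta)  # c = sup f_X
n_try <- 100000
u <- runif(n_try)                            # candidates from U(0, 1)
v <- runif(n_try, 0, c_max)                  # heights from U(0, c)
accepted <- v <= dbeta(u, alpha, beta)       # acceptance step
mean(accepted)                               # empirical acceptance rate
1 / c_max                                    # theoretical rate 1 / (c (b - a))
```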
The efficiency of the method can be improved by making an
adjustment to the algorithm. Again, we want to generate random
numbers from a density $f$. Let $X$ and $Y$ be random variables with
density functions $f$ and $g$. Assume that we can readily generate
random numbers from $g$. Assume also that there is a constant $c$
such that
$$c = \sup_t \frac{f(t)}{g(t)} < \infty,$$
where the supremum is over all $t$ such that $f(t) > 0$. Then the following algorithm (A-R
algorithm) can be applied to generate the random variable $X$:
1 Generate $y$ from $g$
2 Generate $v$ from $U(0, 1)$
3 If $v < \frac{f(y)}{c\, g(y)}$, return $x = y$. Otherwise go back to 1.
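The three steps can be sketched as a generic R function. The function name `ar_sample` and its arguments (`f`, `g`, `rg`, `c`) are hypothetical and introduced only for this illustration; the example call uses a Beta(2, 3) target with a $U(0, 1)$ proposal, for which $c = 16/9$.

```r
# Generic acceptance-rejection sampler (illustrative sketch).
# f: target density, g: proposal density,
# rg: function drawing one value from g, c: constant with f <= c * g.
ar_sample <- function(n, f, g, rg, c) {
  out <- numeric(n)
  for (i in seq_len(n)) {
    repeat {
      y <- rg()                           # step 1: candidate from g
      v <- runif(1)                       # step 2: uniform number
      if (v < f(y) / (c * g(y))) break    # step 3: accept or retry
    }
    out[i] <- y
  }
  out
}

set.seed(1)
x <- ar_sample(2000,
               f  = function(t) dbeta(t, 2, 3),
               g  = function(t) dunif(t),
               rg = function() runif(1),
               c  = 16 / 9)
mean(x)    # should be close to the Beta(2, 3) mean 2/5
```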
Acceptance-rejection method: generate N(0,1)
We note first that we may generate a random number $y$ from
$N(0, 1)$ by first generating a random number from
$$f(x) = \sqrt{\frac{2}{\pi}}\, e^{-x^2/2}, \quad x \geq 0,$$
and then assigning a random sign to the resulting number. Hence we
proceed by generating a random number from $f$ using the A-R
algorithm.

Generate from $f$:
Select $g(x) = e^{-x}$, the pdf of the $\mathrm{Exp}(1)$ distribution.
Find the smallest constant $C$ such that $f(x) \leq C g(x)$. This is
$C = \sqrt{2e/\pi}$.
Generate $X$ from $\mathrm{Exp}(1)$ and $U$ from $U(0, 1)$. The
acceptance condition $U \leq \frac{f(X)}{C e^{-X}}$ is then
$$U \leq \frac{\sqrt{2/\pi}\, e^{-X^2/2}}{\sqrt{2e/\pi}\, e^{-X}} \iff U \leq e^{-\frac{X^2 - 2X + 1}{2}} \iff U \leq e^{-\frac{(X - 1)^2}{2}}$$
$$\iff \ln(U) \leq -\frac{(X - 1)^2}{2} \iff -\ln(U) \geq \frac{(X - 1)^2}{2}.$$
Note that $-\ln U$ and $X$ are independent and both $\mathrm{Exp}(1)$
distributed.
The preceding results provide the following algorithm (Normal
generation by the A-R algorithm):
1 Generate $V_1$ and $V_2$ independently from $\mathrm{Exp}(1)$.
2 If $V_1 \geq \frac{(V_2 - 1)^2}{2}$ go to 3, otherwise go back to 1.
3 Generate $u$ from $U(0, 1)$. If $u \geq 0.5$, return $V_2$ as the random
number from $N(0, 1)$, otherwise return $-V_2$.
Exercise: write a function in R with the algorithm above.
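One possible way to approach the exercise is sketched below. The function name `rnorm_ar` is hypothetical, and base R's `rexp()` is used for the $\mathrm{Exp}(1)$ numbers as a convenience; other ways of generating them would serve equally well.

```r
# Normal generation by the A-R algorithm (one possible sketch)
rnorm_ar <- function(n) {
  out <- numeric(n)
  for (i in seq_len(n)) {
    repeat {
      v <- rexp(2)                           # step 1: V1, V2 independent Exp(1)
      if (v[1] >= (v[2] - 1)^2 / 2) break    # step 2: acceptance condition
    }
    # step 3: attach a random sign with probability 1/2 each
    out[i] <- if (runif(1) >= 0.5) v[2] else -v[2]
  }
  out
}

set.seed(1)
z <- rnorm_ar(5000)
c(mean(z), sd(z))    # should be near 0 and 1
```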
Sums or Convolutions
Let $X$ be a random variable. Let $X_1, \ldots, X_n$ be i.i.d. R.V.s
such that $X_j \sim X$ for all $j \in \{1, \ldots, n\}$. Let $S_n = X_1 + \cdots + X_n$.
The distribution function of the sum $S_n$ is called the $n$-fold
convolution of $X$ and is denoted $F_X^{*(n)}$.
We may generate a random number from $S_n$ by generating
random numbers from the components of $S_n$ and summing
them.
Examples:
Simulate from $\mathrm{Bin}(n, p)$ by summing $n$ i.i.d. $\mathrm{Be}(p)$ random
numbers.
Simulate from $\chi^2(n)$ by summing the squares of $n$ i.i.d. $N(0, 1)$
random numbers.
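Both examples can be sketched in a few lines of R. The values of `n`, `p` and the number of replications below are arbitrary choices for illustration; each column of the matrices holds the components of one sum.

```r
set.seed(1)
n <- 10; p <- 0.3; reps <- 5000              # illustrative values
# Bin(n, p): each column sums n Be(p) indicators
x_bin <- colSums(matrix(runif(n * reps) <= p, nrow = n))
# chi-square(n): each column sums n squared N(0, 1) numbers
x_chi <- colSums(matrix(rnorm(n * reps)^2, nrow = n))
c(mean(x_bin), n * p)    # sample mean next to the theoretical mean np
c(mean(x_chi), n)        # sample mean next to the theoretical mean n
```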
Table of Contents

Continuous distributions
Discrete distributions
Exponential distribution
We want to generate random numbers from $X \sim \mathrm{Exp}(\lambda)$. The cdf
is
$$F(x) = 1 - e^{-\lambda x}, \quad x \geq 0.$$
We apply the inverse-transform method and solve $u = F(x)$ in
terms of $x$:
$$u = 1 - e^{-\lambda x} \iff \ln(e^{-\lambda x}) = \ln(1 - u) \iff x = -\frac{\ln(1 - u)}{\lambda}$$
So $F^{-1}(u) = -\frac{\ln(1 - u)}{\lambda}$ and we have the algorithm:
1 Generate $u$ from $U(0, 1)$
2 Return $x = -\frac{\ln(1 - u)}{\lambda}$
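The two steps translate directly into R. The function name `rexp_inv` is hypothetical and the rate $\lambda = 2$ is chosen only for illustration.

```r
# Inverse-transform sampler for Exp(lambda)
rexp_inv <- function(n, lambda) {
  u <- runif(n)            # step 1: u from U(0, 1)
  -log(1 - u) / lambda     # step 2: x = F^{-1}(u)
}

set.seed(1)
x <- rexp_inv(10000, lambda = 2)
mean(x)    # should be close to 1 / lambda = 0.5
```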
Normal distribution
We may generate from $Z \sim N(\mu, \sigma^2)$ by first generating
$X \sim N(0, 1)$ random variables and then applying the
transformation $Z = \mu + \sigma X$.
The $N(0, 1)$ random number may be generated by the
Box-Muller approach or the A-R method described previously.
Gamma distribution
For a R.V. $Y \sim \mathrm{Ga}(\alpha, \beta)$, where $\alpha$ is integer-valued, we note
that, for i.i.d. $\mathrm{Exp}(\beta)$ R.V.s $X_1, \ldots, X_\alpha$, $Y \sim \sum_{i=1}^{\alpha} X_i$.
Hence we may generate a $\mathrm{Ga}(\alpha, \beta)$ random number by
generating $\alpha$ i.i.d. $\mathrm{Exp}(\beta)$ random numbers and summing them.
For non-integer $\alpha$, a method based on the A-R algorithm may be
used. Exercise: implement the above method in R.
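A possible sketch of the integer-$\alpha$ case follows. It assumes that $\beta$ is the rate parameter, matching the $\mathrm{Exp}(\lambda)$ parametrization used earlier; the function name `rgamma_sum` is hypothetical.

```r
# Ga(alpha, beta) for integer alpha as a sum of alpha Exp(beta) numbers.
# Assumption: beta is the rate, so the mean is alpha / beta.
rgamma_sum <- function(n, alpha, beta) {
  # Exp(beta) numbers by inverse transform (-log(U) is Exp(1) since 1 - U is uniform)
  e <- matrix(-log(runif(n * alpha)) / beta, nrow = alpha)
  colSums(e)               # each column sums alpha exponentials
}

set.seed(1)
y <- rgamma_sum(10000, alpha = 3, beta = 2)
mean(y)    # should be close to alpha / beta = 1.5
```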
Beta distribution
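We have, for $X \sim \mathrm{Beta}(\alpha, \beta)$,
$$f(x) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)} x^{\alpha - 1} (1 - x)^{\beta - 1}, \quad 0 \leq x \leq 1.$$
Since the support is $[0, 1]$ we may use the acceptance-rejection
method.
If either $\alpha$ or $\beta$ equals 1, the density is
$$f(x; \beta = 1) = \alpha x^{\alpha - 1}, \quad 0 \leq x \leq 1$$
or
$$f(x; \alpha = 1) = \beta (1 - x)^{\beta - 1}, \quad 0 \leq x \leq 1$$
and the inverse-transform method can be used.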
Beta distribution
Also note the relationship between two Ga-distributed R.V.s and
the beta distribution:
Proposition
For independent random variables $Y_1 \sim \mathrm{Ga}(\alpha, 1)$ and
$Y_2 \sim \mathrm{Ga}(\beta, 1)$,
$$X = \frac{Y_1}{Y_1 + Y_2} \sim \mathrm{Beta}(\alpha, \beta).$$
Hence we may generate random numbers from two independent
Ga-distributed R.V.s and retrieve a random number from a
Beta-distributed R.V.
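The proposition can be sketched in R as follows. The function name `rbeta_gamma` is hypothetical, and base R's `rgamma()` is used for the gamma step as a convenience.

```r
# Beta(alpha, beta) from two independent gamma random numbers
rbeta_gamma <- function(n, alpha, beta) {
  y1 <- rgamma(n, shape = alpha, rate = 1)   # Y1 from Ga(alpha, 1)
  y2 <- rgamma(n, shape = beta, rate = 1)    # Y2 from Ga(beta, 1)
  y1 / (y1 + y2)
}

set.seed(1)
x <- rbeta_gamma(10000, alpha = 2, beta = 3)
mean(x)    # should be close to alpha / (alpha + beta) = 0.4
```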
Bernoulli distribution
We want to generate a random number from $\mathrm{Be}(p)$. The pmf is
$$p(x) = p^x (1 - p)^{1 - x}, \quad x = 0, 1.$$
Hence $p(0) = 1 - p$ and $p(1) = p$. We thus have the following
algorithm:
1 Generate $u$ from $U(0, 1)$
2 If $u \leq p$, return $x = 1$. Otherwise return $x = 0$.
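In R the algorithm reduces to a comparison. The function name `rbern` is hypothetical and $p = 0.3$ is chosen only for illustration.

```r
# Be(p) random numbers: 1 if u <= p, otherwise 0
rbern <- function(n, p) as.integer(runif(n) <= p)

set.seed(1)
x <- rbern(10000, p = 0.3)
mean(x)    # proportion of ones, should be close to p
```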
Binomial distribution
The binomial distribution can be simulated by considering a
sum of $n$ independent Bernoulli R.V.s. However, for large $n$ this is
inefficient. Instead, the normal approximation to the binomial
distribution can be used for large $n$.
Let $X \sim \mathrm{Bin}(n, p)$. Then, for large $n$, $X$ is approximately $N(np, np(1 - p))$ distributed.
To obtain a better approximation a continuity correction can be
applied, which amounts to rounding the normal random number to
the nearest integer.
Hence we can generate approximate random numbers from $\mathrm{Bin}(n, p)$:
1 Generate $z$ from $N(0, 1)$.
2 Return $x = \max\{0, \lfloor np + 1/2 + z\sqrt{np(1 - p)} \rfloor\}$ as the
random number from $\mathrm{Bin}(n, p)$.
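A minimal sketch of the normal approximation in R is given below. It rounds the normal number to the nearest integer, i.e. it uses `floor(n * p + 0.5 + z * sd)`; this form of the continuity correction, the additional cap at $n$ and the function name `rbinom_approx` are assumptions of the sketch.

```r
# Approximate Bin(n, p) random numbers via the normal approximation.
# Assumption: rounding to the nearest integer, with results kept in 0, ..., n.
rbinom_approx <- function(N, n, p) {
  z <- rnorm(N)                                         # step 1
  x <- floor(n * p + 0.5 + z * sqrt(n * p * (1 - p)))   # step 2
  pmin(n, pmax(0, x))
}

set.seed(1)
x <- rbinom_approx(10000, n = 1000, p = 0.4)
c(mean(x), 1000 * 0.4)    # sample mean next to np
```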
Geometric distribution
For a R.V. $X \sim \mathrm{Ge}(p)$ the pmf is:
$$p(x) = p(1 - p)^{x - 1}, \quad x = 1, 2, \ldots.$$
We may view $X$ as the number of trials required until the first
success occurs in a series of independent Bernoulli trials.
Proposition
If $Y$ is an $\mathrm{Exp}(\lambda)$ R.V., where $e^{-\lambda} = 1 - p$, then $X = \lfloor Y \rfloor + 1$ has
the $\mathrm{Ge}(p)$ distribution.
Note that:
$$e^{-\lambda} = 1 - p \iff -\lambda = \ln(1 - p) \iff \lambda = -\ln(1 - p)$$
Hence, if we generate a random number from $\mathrm{Exp}(-\ln(1 - p))$,
we can generate a random number from $\mathrm{Ge}(p)$.
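The proposition can be sketched in R as follows. The function name `rgeom_exp` is hypothetical and $p = 0.25$ is chosen only for illustration.

```r
# Ge(p) random numbers on 1, 2, ... from exponential random numbers
rgeom_exp <- function(n, p) {
  lambda <- -log(1 - p)          # rate such that exp(-lambda) = 1 - p
  y <- rexp(n, rate = lambda)    # Y from Exp(lambda)
  floor(y) + 1                   # X = floor(Y) + 1
}

set.seed(1)
x <- rgeom_exp(10000, p = 0.25)
mean(x)    # should be close to 1 / p = 4
```

Note that base R's `rgeom()` counts the failures before the first success, so its values start at 0 rather than 1.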
Poisson distribution
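For $X \sim \mathrm{Po}(\lambda)$, the pmf is
$$p(x) = \frac{e^{-\lambda} \lambda^x}{x!}, \quad x = 0, 1, \ldots; \quad 0 < \lambda < \infty.$$
We can interpret a $\mathrm{Po}(\lambda)$ R.V. as the maximum number of
independent $\mathrm{Exp}(\lambda)$ R.V.s whose sum does not exceed 1.
Proposition
Let $Y_1, Y_2, \ldots$ be independent R.V.s such that
$Y_j \sim \mathrm{Exp}(\lambda)$ for all $j$. Then
$$X = \max\Big\{n : \sum_{j=1}^{n} Y_j \leq 1\Big\}$$
has the $\mathrm{Po}(\lambda)$ distribution (with $X = 0$ if $Y_1 > 1$).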
Poisson distribution
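Hence, to generate a $\mathrm{Po}(\lambda)$ R.V. we generate $\mathrm{Exp}(\lambda)$ R.V.s until
their sum is larger than 1. The number of R.V.s generated, minus
one, is the $\mathrm{Po}(\lambda)$ random number.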
Exercise: implement the above method to generate Po(λ) in R.
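One possible way to approach the exercise is sketched below. The function names `rpois_exp` and `one_draw` are hypothetical, and base R's `rexp()` supplies the exponential numbers.

```r
# Po(lambda) random numbers by counting exponentials whose sum stays below 1
rpois_exp <- function(n, lambda) {
  one_draw <- function() {
    count <- 0
    total <- rexp(1, rate = lambda)
    while (total <= 1) {                 # sum has not yet exceeded 1
      count <- count + 1
      total <- total + rexp(1, rate = lambda)
    }
    count                                # exponentials that fit before exceeding 1
  }
  replicate(n, one_draw())
}

set.seed(1)
x <- rpois_exp(5000, lambda = 3)
c(mean(x), var(x))    # both should be close to lambda = 3
```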
Multivariate normal distribution
A random $d$-vector $x$ has the multivariate normal (MVN)
distribution, $N_d(\mu, \Sigma)$, if its density is
$$f(x) = |2\pi\Sigma|^{-1/2} \exp\left(-\frac{1}{2}(x - \mu)' \Sigma^{-1} (x - \mu)\right).$$
Note that for $y = Cx + b$, $y \sim N_d(C\mu + b, C\Sigma C')$. Hence, for
$\mu = 0$ and $\Sigma = I$ we have $Cz + b \sim N_d(b, CC')$, where
$z \sim N_d(0, I)$. As such, if we want to simulate $x \sim N_d(\mu, \Sigma)$ with
$\Sigma = CC'$, we can set $x = Cz + \mu$. The vector $z$ can be simulated using
standard procedures for the normal distribution. Thus we only need
to find a matrix $C$ which fulfills $\Sigma = CC'$, which can be done by
the spectral decomposition, the Cholesky factorization or the singular
value decomposition (in R, functions eigen, chol and svd).
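A minimal sketch using the Cholesky factorization is given below. The function name `rmvn_chol` and the example values of `mu` and `Sigma` are hypothetical. Since `chol()` returns an upper-triangular matrix `R` with `t(R) %*% R` equal to `Sigma`, the matrix $C$ is taken as its transpose.

```r
# Simulate n vectors from N_d(mu, Sigma) via x = C z + mu with Sigma = C C'
rmvn_chol <- function(n, mu, Sigma) {
  d <- length(mu)
  C <- t(chol(Sigma))                    # lower-triangular factor, C %*% t(C) = Sigma
  z <- matrix(rnorm(d * n), nrow = d)    # each column is a N_d(0, I) vector
  t(C %*% z + mu)                        # mu is added to every column; one vector per row
}

set.seed(1)
mu <- c(1, -1)
Sigma <- matrix(c(2, 0.5, 0.5, 1), nrow = 2)
x <- rmvn_chol(20000, mu, Sigma)
colMeans(x)    # should be close to mu
cov(x)         # should be close to Sigma
```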