Topic 7
Contents
7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
7.2 The Linear Congruential Method . . . . . . . . . . . . . . . . . . . . . . . . . . 5
7.3 Tests For Random Numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
7.3.1 Frequency Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
7.3.2 Runs Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
7.3.3 Poker Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
7.4 Random Variate Generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
7.4.1 Inverse Transform Method . . . . . . . . . . . . . . . . . . . . . . . . . . 8
7.4.2 Acceptance/Rejection Method . . . . . . . . . . . . . . . . . . . . . . . 10
7.5 Empirical Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
7.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2 TOPIC 7. RANDOM NUMBER GENERATION
7.1 Introduction
In this unit we talk about random numbers. In particular, we will address the following
questions:
01314 51333 6
Ideally, you have looked for patterns in these sets of numbers, or you might have thought
about how these numbers were created.
The data of the previous activity were generated as follows:
0;
my office phone number;
random data from a normal distribution;
random data from an exponential distribution;
the first 1250 digits of .
Let us first try to figure out what random means. Possible synonyms are typical,
representative, arbitrary, accidental, scrambled, aimless, patternless, senseless,
causeless, independent, incidental, indeterminate, indescribable, undirected,
uncontrolled, uncertain, unrelated, unordered, unpredictable.
An informal definition of randomness could be the following: something is random if it is
not possible to describe it more briefly than by reproducing it. To be more precise, we will
use the following working definition for random numbers: a (long) sequence of numbers
is random if the shortest computer program printing out the sequence has about the same
size (length) as the sequence.
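A formal definition of randomness for infinite sequences is the following: an infinite
sequence of numbers is random if, for every block length $k$, the probability that a
particular block of length $k$ is observed is $b^{-k}$, where $b$ is the number of possible
values of each number (for example $b = 10$ for decimal digits).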
For example, for a sequence of numbers between 1 and 6 to be random, it is not enough
that each number occurs with probability 1/6. We also require that each pair of
numbers 11, 12, 13, ..., 66 occurs with probability 1/36, that each triple of numbers
like 111, 123, 461, ... occurs with probability 1/216, and so on.
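The block condition can be checked empirically by counting blocks. The following is a minimal sketch, not a formal test. It assumes Python's built-in `random` module as a stand-in source of die rolls. It also counts non-overlapping blocks, which is a choice made here and is not specified by the definition. The helper name `block_frequencies` is hypothetical.

```python
import random
from collections import Counter


def block_frequencies(seq, k):
    """Relative frequency of each non-overlapping block of length k in seq."""
    blocks = [tuple(seq[i:i + k]) for i in range(0, len(seq) - k + 1, k)]
    counts = Counter(blocks)
    return {block: n / len(blocks) for block, n in counts.items()}


# Simulated die rolls from Python's generator (seeded for repeatability).
rng = random.Random(2004)
rolls = [rng.randint(1, 6) for _ in range(60000)]
singles = block_frequencies(rolls, 1)  # each of the 6 values close to 1/6
pairs = block_frequencies(rolls, 2)    # each of the 36 pairs close to 1/36

# A perfectly periodic sequence passes the single-value check exactly ...
cyclic = [1, 2, 3, 4, 5, 6] * 10000
print(block_frequencies(cyclic, 1)[(3,)])  # exactly 1/6
# ... but only 3 of the 36 possible pairs ever occur: (1,2), (3,4), (5,6).
print(sorted(block_frequencies(cyclic, 2)))
```

The periodic sequence shows why the definition asks for every block length. Equal single-value frequencies alone do not rule out an obvious pattern.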
We first consider random numbers for the uniform distribution on the interval [0,1]. Recall
that for this distribution the mean and variance are given by $\mu = 1/2$ and $\sigma^2 = 1/12$.
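Both values follow directly from the uniform density $f(u) = 1$ on $[0, 1]$:

```latex
\mu = E[U] = \int_0^1 u \, du = \frac{1}{2}, \qquad
\sigma^2 = E[U^2] - \mu^2 = \int_0^1 u^2 \, du - \frac{1}{4} = \frac{1}{3} - \frac{1}{4} = \frac{1}{12}.
```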
In practice we will have to apply tests to sequences of such numbers to check whether
they are sufficiently random or not.
There are also some more practical issues, even if we accept that we can only generate
pseudo-random numbers:
Example
Problem:
We consider the values $a = 13$, $c = 0$, $m = 2^6 = 64$ and $X_0 = 1, 2, 3, 4$.
Solution:
i           0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16
X_0 = 1     1  13  41  21  17  29  57  37  33  45   9  53  49  61  25   5   1
X_0 = 2     2  26  18  42  34  58  50  10   2
X_0 = 3     3  39  59  63  51  23  43  47  35   7  27  31  19  55  11  15   3
X_0 = 4     4  52  36  20   4
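These numbers show how vital a good choice of the parameters is. For $X_0 = 1$ and $X_0 = 3$
the period is 16, but for $X_0 = 2$ it is only 8 and for $X_0 = 4$ only 4. Also note the large
gaps between the pseudo-random numbers. Even with the period length of 16, the random
numbers $R_i = X_i/64$ are spaced $4/64 = 0.0625$ apart, so the gaps between them are quite large.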
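The example can be reproduced with a few lines of Python. This minimal sketch assumes the usual congruential recurrence $X_{i+1} = (aX_i + c) \bmod m$ with the example's values $a = 13$, $c = 0$ and $m = 64$. The function name `lcg_cycle` is hypothetical.

```python
def lcg_cycle(seed, a=13, c=0, m=64):
    """Return X_0, X_1, ... up to and including the first return to X_0.

    Uses X_{i+1} = (a * X_i + c) mod m; with c = 0 this is the multiplicative
    generator of the example. Stops after m steps in case the seed never recurs,
    so len(xs) - 1 is the period only when the seed does recur.
    """
    xs = [seed]
    x = seed
    for _ in range(m):
        x = (a * x + c) % m
        xs.append(x)
        if x == seed:
            break
    return xs


for seed in (1, 2, 3, 4):
    xs = lcg_cycle(seed)
    print(f"X_0 = {seed}: period {len(xs) - 1}: {xs}")  # periods 16, 8, 16, 4
```

Dividing each $X_i$ by $m = 64$ gives the pseudo-random numbers $R_i$ in $[0, 1)$. For the cycles that start at 1 or 3, the values are spaced $4/64$ apart.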
a particular digit, and then uses the Kolmogorov-Smirnov test to compare with the
expected size of gaps.
Poker test. Treats numbers grouped together as a poker hand. These hands are
compared to what is expected using the chi-square test.
We will treat some of these tests with the following set of data, created using the C++
The first test is a frequency test. We sub-divide the interval [0,1] into 10 intervals
of equal length (0,0.1), ..., (0.9,1) and count the number $O_i$ of observed numbers in each
interval $i$. The statistic
$$\chi_0^2 = \sum_{i=1}^{10} \frac{(O_i - E_i)^2}{E_i}$$
has a chi-square distribution with $10 - 1 = 9$ degrees of freedom. Here $E_i$ is the expected
number in each interval, which is $E_i = N/10 = 5$ for our $N = 50$ random numbers.
We test the null hypothesis that the data are from a uniform distribution. The hypothesis
is accepted if $\chi_0^2 < \chi^2_{0.05,9} = 16.919$.
We find the following data:
Interval i             1    2    3    4    5    6    7    8    9   10
O_i                    1    4    9    8    2    3    8    5    5    5
(O_i - E_i)^2/E_i    3.2  0.2  3.2  1.8  1.8  0.8  1.8    0    0    0
and $\chi_0^2 = 12.8 < 16.919$. Thus, the hypothesis is accepted.
This frequency test is usually applied if there is enough data so that each $E_i \ge 5$.
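The computation can be sketched in Python. This uses the observed counts $O_i$ = 1, 4, 9, 8, 2, 3, 8, 5, 5, 5 from the example and the tabulated critical value $\chi^2_{0.05,9} = 16.919$. The helper names are hypothetical. `interval_counts` shows how the counts would be obtained from raw numbers in $[0, 1]$, although the example's 50 raw numbers are not reproduced here.

```python
def interval_counts(numbers, k=10):
    """Count how many numbers in [0, 1] fall into each of k equal subintervals."""
    counts = [0] * k
    for u in numbers:
        counts[min(int(u * k), k - 1)] += 1  # put u == 1.0 into the last bin
    return counts


def chi_square_statistic(observed, expected):
    """chi^2_0 = sum over i of (O_i - E_i)^2 / E_i."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [1, 4, 9, 8, 2, 3, 8, 5, 5, 5]        # O_i from the worked example
n = sum(observed)                                # 50 numbers
expected = [n / len(observed)] * len(observed)   # E_i = 5
chi2_0 = chi_square_statistic(observed, expected)
critical = 16.919                                # chi^2_{0.05,9} from a table
print(round(chi2_0, 3), "accept" if chi2_0 < critical else "reject")  # 12.8 accept
```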
7.3.2 Runs Test
The first runs test is about ups and downs, i.e., whether the next number in the list
of random numbers is above or below the current number. The following sequence is
obtained from our data set.
+ - - + - + + - - + - + - + -
+ - + - - + - - + - + - + - -
- + + + - + - - + - + - - + -
+ - - +
A consecutive sequence of +'s or -'s is called a run. If the sequence is truly random,
the mean and variance of the number of runs $a$ are given by
$$\mu_a = \frac{2N - 1}{3} \quad\text{and}\quad \sigma_a^2 = \frac{16N - 29}{90},$$
where $N$ is the total number of random numbers. If $N$ is sufficiently large, the distribution
of the number of runs is approximately normal, and we can use standard hypothesis testing
with the statistic $Z_0 = (a - \mu_a)/\sigma_a$.
We test the null hypothesis that the numbers are independent and uniformly distributed,
using this normal approximation.
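Here is a minimal sketch of the runs-up-and-down test in Python, applied to the sign sequence listed above. It uses the standard mean $(2N-1)/3$ and variance $(16N-29)/90$ for the number of runs. The two-sided 5% level (critical value 1.96) is an assumption, because the significance level in the original text is not legible. The function names are hypothetical.

```python
import math


def count_runs(signs):
    """Number of runs (maximal blocks of equal symbols) in a '+'/'-' string."""
    return 1 + sum(1 for s, t in zip(signs, signs[1:]) if s != t)


def runs_up_down_z(signs):
    """Return (a, Z_0) with Z_0 = (a - mu_a) / sigma_a for runs up and down."""
    n = len(signs) + 1                 # N numbers give N - 1 up/down signs
    a = count_runs(signs)
    mu = (2 * n - 1) / 3
    sigma = math.sqrt((16 * n - 29) / 90)
    return a, (a - mu) / sigma


# The sign sequence from the text, written without spaces (49 signs, N = 50).
signs = ("+--+-++--+-+-+-"
         "+-+--+--+-+-+--"
         "-+++-+--+-+--+-"
         "+--+")
a, z0 = runs_up_down_z(signs)
z_critical = 1.96                      # assumed two-sided test at the 5% level
print(a, round(z0, 2), "accept" if abs(z0) <= z_critical else "reject")
```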
$90/100 = 90\%$ will have different digits.
We again use the chi-square test, with the random variable
$$\chi_0^2 = \sum_i \frac{(O_i - E_i)^2}{E_i},$$
where $O_i$ are the observed frequencies and $E_i$ the expected frequencies.
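The text above is fragmentary, so the following is only a sketch under an assumption. It takes the two-digit version of the poker test, which matches the remark that 90% of numbers have different digits. Each number's first two decimal digits are classed as "different" (expected proportion 0.9) or "same" (expected 0.1), giving 1 degree of freedom. The input numbers and the function name are hypothetical.

```python
def two_digit_poker(numbers):
    """Two-digit poker test on numbers in [0, 1).

    Returns (observed, chi2_0), where observed = [different, same] counts and
    the expected proportions are 0.9 and 0.1.
    """
    same = 0
    for u in numbers:
        digits = f"{u:.10f}".split(".")[1]  # decimal digits, avoids float truncation
        if digits[0] == digits[1]:
            same += 1
    n = len(numbers)
    observed = [n - same, same]
    expected = [0.9 * n, 0.1 * n]
    chi2_0 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return observed, chi2_0


observed, chi2_0 = two_digit_poker([0.1134, 0.4473, 0.2900, 0.5012])
# With 1 degree of freedom, compare chi2_0 with chi^2_{0.05,1} = 3.841.
```

A real application would need far more numbers, so that each expected count is at least 5.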
With our data of 50 random numbers we find
Figure 7.1
Taking the derivative of $F$ with respect to $x$ shows that the density function is indeed
$f(x) = F'(x)$.
This result yields the following algorithm for generating random numbers for a probability
distribution with density function $f(x)$ and cumulative distribution
$F(x) = \int_{-\infty}^{x} f(t)\,dt$:
1. Draw uniformly random numbers $R$ between 0 and 1.
2. Substitute into $R = F(x)$ and solve for $x$.
In practice the last step is often difficult or even impossible, and we have to look for
approximations of the function $x = F^{-1}(R)$.
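Here is a minimal sketch of the two steps in Python. It uses the exponential distribution, where the inverse has a closed form: $F(x) = 1 - e^{-\lambda x}$ gives $x = -\ln(1-R)/\lambda$. The rate `lam` and the helper names are illustrative assumptions. The worked solution of the exponential example below is not reproduced in this text.

```python
import math
import random


def inverse_transform(inv_cdf, n, rng=random):
    """Return n variates x = F^{-1}(R) with R uniform on [0, 1)."""
    return [inv_cdf(rng.random()) for _ in range(n)]


def exponential_inv_cdf(r, lam=1.0):
    """Inverse of F(x) = 1 - exp(-lam * x): x = -ln(1 - r) / lam."""
    return -math.log(1.0 - r) / lam  # 1 - r lies in (0, 1], so the log is defined


rng = random.Random(42)
sample = inverse_transform(lambda r: exponential_inv_cdf(r, lam=2.0), 100000, rng)
print(sum(sample) / len(sample))  # close to the mean 1/lam = 0.5
```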
© Heriot-Watt University 2004
7.4. RANDOM VARIATE GENERATION 9
Examples
1.
Problem:
We look at the exponential distribution with density function
2.
Problem:
We look at the uniform distribution on the interval $[a, b]$. We generate uniformly random
numbers $R$ on the unit interval $[0, 1]$. To get random numbers for the uniform distribution
on $[a, b]$, recall that the density function for this distribution is
$$f(x) = \begin{cases} \dfrac{1}{b-a} & \text{if } a \le x \le b,\\[4pt] 0 & \text{else,} \end{cases}$$
so that the cumulative distribution is $F(x) = 0$ if $x < a$, $F(x) = 1$ if $x > b$, and
$$F(x) = \int_a^x \frac{1}{b-a}\,dt = \frac{x-a}{b-a} \quad\text{for } a \le x \le b.$$
Solution:
So we solve $R = F(x) = \frac{x-a}{b-a}$ for $x$ and get $x = a + (b-a)R$. Thus drawing random
numbers $R$ for the uniform distribution on $[0, 1]$ and calculating $x = a + (b-a)R$ gives
random numbers for the uniform distribution on the interval $[a, b]$.
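In Python, the formula $x = a + (b-a)R$ becomes a one-line generator. This is a minimal sketch. The name `uniform_ab` and the interval $[2, 5]$ in the usage are illustrative.

```python
import random


def uniform_ab(a, b, n, rng=random):
    """Variates on [a, b] via the inverse transform x = a + (b - a) R."""
    return [a + (b - a) * rng.random() for _ in range(n)]


rng = random.Random(1)
xs = uniform_ab(2.0, 5.0, 50000, rng)
print(min(xs), max(xs), sum(xs) / len(xs))  # mean close to (a + b)/2 = 3.5
```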
The technique above works well if the cumulative distribution has a closed form and if
we can calculate the inverse of this function, i.e., solve for $x$. However, for many
important distributions, notably the normal distribution, this is not possible. In practice
we have to approximate the function $x = F^{-1}(R)$. A simple and reasonably good
approximation for the inverse of the cumulative distribution of the standard normal
distribution is the function
$$x = F^{-1}(R) \approx \frac{R^{0.135} - (1-R)^{0.135}}{0.1975}.$$
The approximation gives at least one correct decimal of the inverse for $R$ taking values
between 0.00135 and 0.9986 (see Schmeiser, Approximations to the inverse cumulative
normal function for use on hand calculators, Applied Statistics 28:175-176, 1979).
Another (much better) approximation (Odeh, Evans, Algorithm AS 70: percentage points
of the normal distribution, Applied Statistics 23:96-97, 1974) is
$$x = y + \frac{p_0 + p_1 y + p_2 y^2 + p_3 y^3 + p_4 y^4}{q_0 + q_1 y + q_2 y^2 + q_3 y^3 + q_4 y^4}, \qquad y = \sqrt{-2\ln(1-R)},$$
where $p_i$, $q_i$ are the following values:
$p_0 = -0.322232431088$              $q_0 = 0.099348462606$
$p_1 = -1$                           $q_1 = 0.588581570495$
$p_2 = -0.342242088547$              $q_2 = 0.531103462366$
$p_3 = -0.0204231210245$             $q_3 = 0.10353775285$
$p_4 = -0.453642210148 \times 10^{-4}$   $q_4 = 0.38560700634 \times 10^{-2}$
This approximation is accurate up to about 6 digits for $0.5 \le R < 1$. Symmetry of
the normal distribution allows us to extend this to the interval $0 < R \le 0.5$ using the
transformation $R \to 1 - R$ and $x \to -x$.
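Both approximations are short to code. The following is a sketch in Python. The coefficients are the published AS 70 values. The symmetry step $R \to 1-R$, $x \to -x$ handles $R < 0.5$. `statistics.NormalDist().inv_cdf` (Python 3.8+) is used only as a reference for comparison. The function names are hypothetical.

```python
import math
from statistics import NormalDist

# Coefficients of Odeh and Evans (1974), Algorithm AS 70.
P = (-0.322232431088, -1.0, -0.342242088547, -0.0204231210245, -0.453642210148e-4)
Q = (0.099348462606, 0.588581570495, 0.531103462366, 0.10353775285, 0.38560700634e-2)


def schmeiser(r):
    """Schmeiser's one-decimal approximation to the standard normal inverse."""
    return (r ** 0.135 - (1.0 - r) ** 0.135) / 0.1975


def odeh_evans(r):
    """AS 70 rational approximation to F^{-1}(r) for the standard normal."""
    if not 0.0 < r < 1.0:
        raise ValueError("r must lie strictly between 0 and 1")
    if r < 0.5:
        # Symmetry of the normal distribution: F^{-1}(r) = -F^{-1}(1 - r).
        return -odeh_evans(1.0 - r)
    y = math.sqrt(-2.0 * math.log(1.0 - r))
    num = sum(p * y ** k for k, p in enumerate(P))
    den = sum(q * y ** k for k, q in enumerate(Q))
    return y + num / den


for r in (0.025, 0.5, 0.9):
    exact = NormalDist().inv_cdf(r)
    print(f"R={r}: exact {exact:.6f}, Schmeiser {schmeiser(r):.4f}, "
          f"Odeh-Evans {odeh_evans(r):.6f}")
```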
We note that the literature shows various modifications of this algorithm to increase
speed.
Examples
1.
Problem:
We consider the density function
$$f(x) = \begin{cases} x/2 & \text{if } 0 \le x \le 2,\\ 0 & \text{else,} \end{cases}$$
whose graph on $[0, 2]$ is simply a straight line.
R        x        R ≤ x/2? (y/n)   accept (a) / reject (r)
0.9501 0.9931 n r
0.2311 1.7995 y a
0.6068 1.6433 y a
0.4860 1.2898 y a
0.8913 1.6359 n r
0.7621 1.3205 n r
0.4565 0.6839 n r
0.0185 0.5795 y a
0.8214 0.6824 n r
0.4447 1.0682 y a
0.6154 1.4542 y a
0.7919 0.6189 n r
0.9218 1.1361 n r
0.7382 0.7408 n r
0.1763 1.4055 y a
0.4057 0.8898 y a
0.9355 1.3891 n r
0.9169 1.2426 n r
0.4103 1.0931 y a
0.8936 1.5896 n r
Thus, a sequence of random numbers for the density $f$ is 1.7995, 1.6433, 1.2898, 0.5795, 1.0682, 1.4542, 1.4055, 0.8898, 1.0931.
Of course, it would be much easier to use the inverse transform technique for this simple
example.
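The table can be reproduced in Python. This sketch assumes that the first column is the uniform number $R$ on $[0, 1]$ and the second the candidate $x$, uniform on $[0, 2]$. A candidate is accepted when $R \le f(x)/\max f = x/2$. This rule reproduces every y/n decision in the table. The helper names are hypothetical.

```python
import random

# (R, x) pairs from the table: R uniform on [0, 1], x uniform on [0, 2].
PAIRS = [
    (0.9501, 0.9931), (0.2311, 1.7995), (0.6068, 1.6433), (0.4860, 1.2898),
    (0.8913, 1.6359), (0.7621, 1.3205), (0.4565, 0.6839), (0.0185, 0.5795),
    (0.8214, 0.6824), (0.4447, 1.0682), (0.6154, 1.4542), (0.7919, 0.6189),
    (0.9218, 1.1361), (0.7382, 0.7408), (0.1763, 1.4055), (0.4057, 0.8898),
    (0.9355, 1.3891), (0.9169, 1.2426), (0.4103, 1.0931), (0.8936, 1.5896),
]


def f(x):
    """Target density f(x) = x/2 on [0, 2]; its maximum value is 1."""
    return x / 2 if 0.0 <= x <= 2.0 else 0.0


def accept(r, x, f_max=1.0):
    """Accept the candidate x when R <= f(x) / max f."""
    return r <= f(x) / f_max


accepted = [x for r, x in PAIRS if accept(r, x)]


def sample_f(n, rng=random):
    """Draw n variates from f by acceptance/rejection with uniform candidates."""
    out = []
    while len(out) < n:
        r, x = rng.random(), 2.0 * rng.random()
        if accept(r, x):
            out.append(x)
    return out
```

About half of the candidates are rejected here. For this density, the inverse transform gives $x = 2\sqrt{R}$ directly, since $F(x) = x^2/4$ on $[0, 2]$.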
2.
Problem:
Consider the density function $f(x)$, which is plotted in Figure 7.4.
$$g(x) = \begin{cases} \dots & \text{if } \dots \\ -0.7495x + 1.5736 & \text{if } \dots \\ 0 & \text{else} \end{cases}$$
This piecewise linear function will play the role of $g$ in the algorithm. Note that $g$ is not a
density function: even though it is nonnegative, it does not satisfy $\int_{-\infty}^{\infty} g(x)\,dx = 1$.
Indeed, that integral is just the area under the piecewise linear function $g$. Call this value
$A$ (for area); then $h(x) = g(x)/A$ is a density function, and for $c = A$ we have
$f(x) \le c\,h(x) = g(x)$, that is, $g$ dominates $f$.
Thus the algorithm becomes as follows:
• Draw $U$ uniformly on $[0, 1]$.
• Draw $Y$ with density $h$.
• If $U \le f(Y)/g(Y)$, then accept $Y$, otherwise reject $Y$.
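To draw $Y$ with density $h$, let us use the inverse transform method. With $h(x) = g(x)/A$
and the cumulative distribution $H(x) = \int_{-\infty}^{x} h(s)\,ds$, we have
$$h(x) = \begin{cases} \dots & \text{if } \dots \\ -0.6285x + 1.3070 & \text{if } \dots \\ 0 & \text{else,} \end{cases}$$
depicted in Figure 7.7: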
[Figure 7.7: horizontal axis from -0.2 to 2.2, vertical axis from 0 to 1.4]
Example
Problem:
Suppose that data is collected about repair times (in hours):
7.5. EMPIRICAL DISTRIBUTIONS 19
Figure 7.10
If we need more random numbers we will use a more systematic approach.
If $0 \le x \le 0.5$, the cumulative probability function is $R = F(x) = 0.62x$, or $x = 1.6129R$.
Here $R$ takes values between 0 ($x = 0$) and 0.31 ($x = 0.5$).
If $0.5 < x \le 1$, the cumulative function has slope $(0.41 - 0.31)/(1 - 0.5) = 0.2$, and the
intercept $b$ is found by solving the equation $0.41 = 0.2 \cdot 1.0 + b$, using the fact that the
point $(1.0, 0.41)$ is on the line. Thus, the intercept is 0.21, the cumulative function has the
equation $R = 0.2x + 0.21$, and $R$ takes values between 0.31 and 0.41. The inverse is the function
$$x = \frac{R - 0.21}{0.2} = 5R - 1.05.$$
In general, if we have a line through $(x_1, R_1)$ and $(x_2, R_2)$, then the slope of the line is
$\frac{R_2 - R_1}{x_2 - x_1}$, and the intercept $b$ is found by solving, for example,
$R_1 = \frac{R_2 - R_1}{x_2 - x_1}\,x_1 + b$, so that
$$b = \frac{x_2 R_1 - x_1 R_2}{x_2 - x_1}.$$
This way we get the following table describing the inverse of the cumulative distribution:
$0.00 \le R \le 0.31$:   $x = 1.6129R$
$0.31 < R \le 0.41$:   $x = 5R - 1.05$
$0.41 < R \le 0.66$:   $x = 2R + 0.18$
$0.66 < R \le 1.00$:   $x = 1.47R + 0.53$
Figure 7.11
This function can now be used to generate random numbers for our empirical distribution
using the inverse transform method.
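A minimal sketch of this inverse in Python follows. It interpolates linearly between the breakpoints $(x_i, R_i)$ of the cumulative distribution. The first three breakpoints, $(0, 0)$, $(0.5, 0.31)$ and $(1.0, 0.41)$, are stated in the example. The last two, $(1.5, 0.66)$ and $(2.0, 1.00)$, are read from the partly garbled inverse table and should be treated as assumptions. The function name is hypothetical.

```python
import bisect
import random

# Breakpoints (x_i, R_i) of the piecewise linear cumulative distribution.
# (1.5, 0.66) and (2.0, 1.00) are assumptions, see the lead-in.
XS = [0.0, 0.5, 1.0, 1.5, 2.0]
RS = [0.0, 0.31, 0.41, 0.66, 1.00]


def empirical_inverse(r):
    """x = F^{-1}(r), interpolating linearly between neighbouring breakpoints."""
    if not 0.0 <= r <= 1.0:
        raise ValueError("r must lie in [0, 1]")
    i = max(1, bisect.bisect_left(RS, r))  # segment with RS[i-1] <= r <= RS[i]
    slope = (XS[i] - XS[i - 1]) / (RS[i] - RS[i - 1])
    return XS[i - 1] + slope * (r - RS[i - 1])


rng = random.Random(7)
repair_times = [empirical_inverse(rng.random()) for _ in range(5)]
```

For example, `empirical_inverse(0.36)` gives 0.75, which agrees with $x = 5R - 1.05$ on the second segment.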
7.6 Summary
Complex systems are often simulated using computers. For this we need random
numbers distributed according to various distributions.
One basic problem is that random numbers generated by a computer are
only pseudo-random. Nevertheless, we have to design methods to generate such
numbers that come close to being random.
In practice it is enough to implement a random number generator for the uniform
distribution on the unit interval [0, 1]. Usually the linear congruential method is
used. Random numbers for other distributions can be generated using the inverse
transform technique. This technique also applies to empirical distributions.
In practice the acceptance/rejection method is often used to generate random
numbers for density functions where the inverse transform method does not apply.
Tests are employed to check whether sequences of numbers can be considered
random or not.
Glossary
acceptance/rejection method
A method to generate random numbers for a density $f$ by using random numbers
from another density $g$ such that $f \le c\,g$ for some constant $c$. Some of the random
numbers for $g$ are accepted to become random numbers for $f$, some are rejected.
frequency test
The frequency test uses the chi-square distribution to test how the frequency
distribution of a sequence of pseudo-random numbers compares to random
numbers of the uniform distribution.
poker test
The poker test for a sequence of pseudo-random numbers treats numbers grouped
together as a poker hand. These hands are compared to what is expected using
the chi-square test.
random numbers
An infinite sequence of numbers is random if, for every block length $k$, the
probability that a particular block of length $k$ is observed is $b^{-k}$, where $b$ is the number of possible values.
runs test
The simple runs tests check for ups and downs, or for values above and below the
mean, in a sequence of pseudo-random numbers.
seed
The seed is the starting value in a method to generate random numbers.