
CPSC 531

Systems Modeling and Simulation


Review
2
Independent Events
Independent events are those that have no effect on each other: knowing that one of them occurs provides no information about the occurrence of the other event.
Mathematically, A and B are independent if
P(A|B) = P(A) and P(B|A) = P(B)
From the definition of conditional probability, P(AB) = P(A|B)P(B). Therefore, if A and B are independent events, P(AB) = P(A)P(B).
3
Law of Total Probability
Let B_1, B_2, ..., B_k be mutually disjoint and collectively exhaustive events from the sample space S. Then, for any event A in S, we have

P(A) = \sum_{j=1}^{k} P(B_j) P(A | B_j)
Explanation:
A = (B_1 ∩ A) ∪ (B_2 ∩ A) ∪ ... ∪ (B_k ∩ A)
The (B_j ∩ A)'s are disjoint events. Therefore, using the laws of conditional probability, we get

P(A) = \sum_{j=1}^{k} P(B_j ∩ A) = \sum_{j=1}^{k} P(B_j) P(A | B_j), if P(B_j) > 0 for j = 1, ..., k
[Figure: Venn diagram of a partition B_1, B_2, B_3, B_4 of the sample space, with the event A overlapping each B_i.]
4
Bayes' Theorem
Partition: The events B_1, B_2, B_3, ..., B_k form a partition of a set S if they are mutually disjoint and

\bigcup_{i=1}^{k} B_i = S

Bayes' Theorem: Suppose that B_1, B_2, B_3, ..., B_k form a partition of the sample space S such that P(B_j) > 0 for j = 1, ..., k. Let A be an event in S such that P(A) > 0. Then, for i = 1, ..., k,

P(B_i | A) = \frac{P(B_i) P(A | B_i)}{\sum_{j=1}^{k} P(B_j) P(A | B_j)}
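To make the last two results concrete, here is a minimal Python sketch (not part of the original deck) that applies the law of total probability and Bayes' theorem to a hypothetical three-event partition; the prior and likelihood values are assumed purely for illustration.

```python
# Minimal sketch: law of total probability and Bayes' theorem
# for a hypothetical partition B1, B2, B3 of the sample space.
priors = [0.5, 0.3, 0.2]          # P(B_j); must sum to 1 (assumed values)
likelihoods = [0.02, 0.05, 0.10]  # P(A | B_j) (assumed values)

# Law of total probability: P(A) = sum_j P(B_j) * P(A | B_j)
p_a = sum(p * l for p, l in zip(priors, likelihoods))

# Bayes' theorem: P(B_i | A) = P(B_i) * P(A | B_i) / P(A)
posteriors = [p * l / p_a for p, l in zip(priors, likelihoods)]

print(f"P(A) = {p_a:.3f}")                # 0.045
print([round(q, 3) for q in posteriors])  # [0.222, 0.333, 0.444], sums to 1
```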
5
Random Variables
A random variable is a real-valued mapping that assigns a numerical value to each possible outcome of an experiment.
Consider the arrival of jobs at a CPU. Let X be the number of jobs that arrive per unit time. X is a random variable that can take the values {0, 1, 2, ...}.
6
Discrete Random Variables and PMF
A random variable X is said to be discrete if the set of possible values of X is finite or, at most, a countably infinite sequence of different values.
Discrete random variables are characterized by the probabilities of the values attained by the variable. These probabilities are referred to as the Probability Mass Function (PMF) of X. Mathematically, we define the PMF as:

p_X(x) = P(X = x) = P({s | X(s) = x}) = \sum_{s : X(s) = x} P(s)
7
Properties of PMF and CDF
PMF:
0 ≤ p_X(x) ≤ 1, for all x
\sum_x p_X(x) = 1  (equivalently, \sum_i p_X(x_i) = 1)

CDF:
F_X(t) = P(X ≤ t) = \sum_{x ≤ t} p_X(x)
0 ≤ F_X(x) ≤ 1
8
Expectation
Definition: the expectation of X is the weighted average of the possible values of X:

E[X] = \sum_x x p_X(x)

Expectation is linear:

E[cX] = c E[X]
E[\sum_{i=1}^{n} c_i X_i] = \sum_{i=1}^{n} c_i E[X_i]

where the c_i's are constants. Linearity works even if the X_i's are not independent.
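As a quick illustration (an addition, not from the deck; the two distributions below are chosen arbitrarily), the following sketch estimates E[c_1 X_1 + c_2 X_2] by simulation for two dependent random variables and compares it with c_1 E[X_1] + c_2 E[X_2].

```python
import random

random.seed(1)
n = 100_000
c1, c2 = 2.0, 3.0

# X1 is uniform on {0,...,9}; X2 = X1 + Bernoulli(0.5),
# so X1 and X2 are clearly NOT independent.
total = 0.0
for _ in range(n):
    x1 = random.randint(0, 9)                      # E[X1] = 4.5
    x2 = x1 + (1 if random.random() < 0.5 else 0)  # E[X2] = 4.5 + 0.5 = 5.0
    total += c1 * x1 + c2 * x2

estimate = total / n
analytic = c1 * 4.5 + c2 * 5.0    # linearity: c1*E[X1] + c2*E[X2] = 24.0
print(round(estimate, 2), analytic)
```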
9
Binomial Random Variable
Consider n Bernoulli trials, where each trial can result in a success with probability p. The number of successes X in such an n-trial sequence is a binomial random variable.
The PMF of this random variable is given by:

p_X(k) = P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}, for k = 0, 1, 2, ..., n, and 0 otherwise

where p is the probability of success of a Bernoulli trial.
E[X] = np
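A short sketch (an addition; n = 10 and p = 0.5 are example values matching the plot on the next slide) that evaluates the binomial PMF directly and checks that it sums to one with mean np.

```python
from math import comb

n, p = 10, 0.5   # example parameters

def binom_pmf(k, n, p):
    """P(X = k) for a binomial(n, p) random variable."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

pmf = [binom_pmf(k, n, p) for k in range(n + 1)]
print(round(sum(pmf), 6))                                 # 1.0 (PMF sums to one)
print(round(sum(k * pk for k, pk in enumerate(pmf)), 6))  # 5.0 = n*p
```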
10
Binomial PMF
[Figure: Binomial PMF, P(X = k) vs. number of successes k, for n = 10 with p = 1/4, p = 1/2, and p = 3/4.]
11
Geometric Random Variable
The number of Bernoulli trials, X, until the first success is a geometric random variable.
PMF:
p_X(k) = p (1-p)^{k-1}, for k = 1, 2, ..., and 0 otherwise
CDF:
F_X(t) = \sum_{i=1}^{t} p (1-p)^{i-1} = 1 - (1-p)^t, t ≥ 0
Mean and variance:
E[X] = \frac{1}{p}, Var(X) = \frac{1-p}{p^2}
12
Geometric PMF
[Figure: Geometric PMF, P(X = k) vs. number of trials until first success k, for p = 0.5.]
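The sketch below (an addition; p = 0.5 as in the plot above) checks the geometric PMF and mean by simulating Bernoulli trials until the first success.

```python
import random

random.seed(42)
p = 0.5
n_runs = 100_000

def trials_until_success(p):
    """Count Bernoulli trials up to and including the first success."""
    k = 1
    while random.random() >= p:   # failure with probability 1 - p
        k += 1
    return k

samples = [trials_until_success(p) for _ in range(n_runs)]

# Empirical vs. analytic: P(X = 2) = (1-p)*p, E[X] = 1/p
print(round(sum(1 for x in samples if x == 2) / n_runs, 3), (1 - p) * p)
print(round(sum(samples) / n_runs, 3), 1 / p)
```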
13
Example: Modeling Packet Loss
A geometric r.v. gives the number of trials required to get the first success.
It is easy to see that p_X(k) = (1-p)^{k-1} p, k = 1, 2, ..., where p is the probability of success of a trial.
Modeling packet losses seen at a router:
We can model the losses using a Bernoulli process {Y_0, Y_1, Y_2, ...}, where Y_i represents a Bernoulli trial for packet number i.
We can say:
P{Y_i = 1} = p (i.e., a packet loss)
P{Y_i = 0} = 1 - p (i.e., no loss)
So the number of successful packet transmissions before the first loss, X, is geometrically distributed:
P{X = n} = p (1-p)^{n-1}, n = 1, 2, ... (the "good length" distribution)
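Below is a minimal simulation sketch of this model (an addition; the per-packet loss probability p = 0.1 is assumed). It simulates the Bernoulli loss process and compares the empirical distribution of the index of the first lost packet with the geometric PMF.

```python
import random

random.seed(7)
p = 0.1          # assumed per-packet loss probability
n_runs = 100_000

def first_loss_index(p):
    """Simulate Y_1, Y_2, ... (loss with probability p) and
    return the index of the first lost packet."""
    n = 1
    while random.random() >= p:   # packet delivered with probability 1 - p
        n += 1
    return n

samples = [first_loss_index(p) for _ in range(n_runs)]

# Compare empirical frequencies with the geometric PMF p * (1-p)**(n-1)
for n in range(1, 6):
    emp = sum(1 for x in samples if x == n) / n_runs
    print(n, round(emp, 4), round(p * (1 - p) ** (n - 1), 4))
```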
14
Poisson Random Variable
A discrete random variable X that takes only non-negative integer values is said to be Poisson with parameter λ > 0 if X has the following PMF:

p_X(k) = \frac{\lambda^k}{k!} e^{-\lambda}, for k = 0, 1, 2, ..., and 0 otherwise

The Poisson PMF with parameter λ is a good approximation of the Binomial PMF with parameters n and p, provided λ = np, n is very large, and p is very small.
15
Poisson PMF
[Figure: Poisson PMF, P(X = k) vs. number of events k, for λ = 0.5, λ = 1, and λ = 5.]
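A brief sketch (an addition; λ = 1, one of the values plotted above) that evaluates the Poisson PMF and confirms it sums to one with mean λ.

```python
from math import exp, factorial

lam = 1.0   # rate parameter lambda, chosen for illustration

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson(lambda) random variable."""
    return exp(-lam) * lam**k / factorial(k)

pmf = [poisson_pmf(k, lam) for k in range(100)]   # tail beyond k = 100 is negligible
print(round(sum(pmf), 6))                                 # ~1.0
print(round(sum(k * pk for k, pk in enumerate(pmf)), 6))  # ~1.0 = lambda
```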
16
Poisson Approximation to Binomial
[Figure: side-by-side PMFs of a Binomial distribution (n = 100, p = 0.02) and a Poisson distribution (λ = 2), plotted for k from 0 to 10.]
A Binomial distribution with large n and small p can be approximated by a Poisson distribution with λ = np.
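The comparison can be reproduced numerically; the sketch below (an addition) prints both PMFs side by side for the same n = 100, p = 0.02, λ = 2 used in the plots.

```python
from math import comb, exp, factorial

n, p = 100, 0.02
lam = n * p   # lambda = 2

# Binomial(n, p) PMF next to Poisson(lambda) PMF for small k
for k in range(6):
    binom = comb(n, k) * p**k * (1 - p)**(n - k)
    poisson = exp(-lam) * lam**k / factorial(k)
    print(k, round(binom, 4), round(poisson, 4))
```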
17
Poisson Random Variable (cont.)
CDF of a Poisson random variable:

F_X(t) = \sum_{k=0}^{t} \frac{\lambda^k}{k!} e^{-\lambda}, t ≥ 0

Mean and variance:
E[X] = Var(X) = λ
Consider N independent Poisson random variables X_i, i = 1, 2, 3, ..., N, with parameters λ_i. Then X = X_1 + X_2 + ... + X_N is also a Poisson r.v. with parameter λ = λ_1 + λ_2 + ... + λ_N.
18
Example: Job arrivals
Consider modeling the number of job arrivals at a shop in an interval (0, t].
Let λ be the rate of arrival of jobs.
In a small interval Δt → 0:
P{one arrival in Δt} = λ Δt
P{two or more arrivals in Δt} is negligible
Divide the interval (0, t] into n subintervals of equal length.
Assume the arrival of jobs in each subinterval is independent of arrivals in the other subintervals.
19
Example: Job arrivals (cont.)
If n → ∞, the interval can be viewed as a sequence of Bernoulli trials with
p = λ Δt = λt / n
The number of successes k in n trials is then given by the Binomial PMF:

P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}
20
Example: Job arrivals (cont.)
Substitute p = λt/n into the Binomial PMF for k = 0, 1, ..., n:

P(X = k) = \binom{n}{k} \left(\frac{\lambda t}{n}\right)^k \left(1 - \frac{\lambda t}{n}\right)^{n-k}

Letting n → ∞, the above reduces to

P(X = k) = \frac{(\lambda t)^k}{k!} e^{-\lambda t}, k = 0, 1, 2, ...

so the probability of k events in the time interval (0, 1] (taking t = 1) is (λ^k / k!) e^{-λ}, which is the Poisson distribution.
21
Continuous Random Variable
A random variable X is said to be continuous if there exists a non-negative function f(x), x ∈ (-∞, ∞), with the property that for any set A of real numbers:

P({X ∈ A}) = \int_A f(x) dx

f(x) is called the probability density function (PDF) of X.
22
Properties of PDF
f(x) ≥ 0, for all x
\int_{-\infty}^{\infty} f(x) dx = 1  (i.e., the area under the curve equals one)
For an interval B = [a, b], we want to find P({X ∈ B}):
P({a ≤ X ≤ b}) = \int_a^b f(x) dx
23
Properties of PDF (continued)
Continuous distributions assign probability 0 to individual values:

P({X = a}) = \int_a^a f(x) dx = 0

Consequence of the above property:

P({a ≤ X ≤ b}) = P({a < X ≤ b}) = P({a ≤ X < b}) = P({a < X < b})
24
Cumulative Distribution Function
The CDF F_X(·) of a continuous random variable X with PDF f_X(·) can be obtained as follows:

F_X(x) = P({X ∈ (-∞, x]}) = \int_{-\infty}^{x} f_X(t) dt
25
CDF - PDF Relationship
The PDF can be obtained from the CDF and vice versa:

f_X(x) = \frac{dF_X(x)}{dx} = F'_X(x)

The distribution of a continuous random variable can therefore be represented using either the PDF or the CDF.
26
PDF and CDF of Uniform R.V.
The PDF of a uniform random variable X on the interval [a, b] is:

f(x) = \frac{1}{b - a} for a < x < b, and 0 otherwise

The CDF of X is:

F(x) = 0 for x ≤ a;  F(x) = \frac{x - a}{b - a} for a < x < b;  F(x) = 1 for x ≥ b

How did we get F(x)? For a < x < b,

F(x) = \int_a^x \frac{1}{b - a} dt = \frac{x - a}{b - a}
27
Uniform R.V. PDF and CDF
[Figure: PDF and CDF of a uniform random variable with a = 1, b = 3, plotted for x from 0 to 4.]
28
Exponential Distribution
PDF:
f_X(x) = λ e^{-λx} for x ≥ 0, and 0 otherwise
CDF:
F_X(x) = 0 for x < 0;  F_X(x) = 1 - e^{-λx} for x ≥ 0
Mean:
E[X] = 1/λ
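A minimal sketch (an addition; λ = 2 assumed) that draws exponential samples by inverse-transform sampling of the CDF above and checks the sample mean against 1/λ.

```python
import random
from math import log

random.seed(3)
lam = 2.0        # rate parameter lambda, assumed for illustration
n = 100_000

# Inverse transform: if U ~ Uniform(0,1), then X = -ln(1 - U)/lambda
# has CDF F(x) = 1 - exp(-lambda * x).
samples = [-log(1.0 - random.random()) / lam for _ in range(n)]

print(round(sum(samples) / n, 3), 1 / lam)   # sample mean vs. E[X] = 1/lambda
```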
29
Exponential Models
This distribution has been used to model:
Inter-arrival times between IP packets
Inter-arrival times between calls at a call
centre
Inter-arrival times between web sessions from
a web client
Service time distributions
Lifetime of products
Widely used in queuing theory
30
Exponential PDF and CDF
[Figure: exponential PDFs for λ = 0.5, 1.0, 2.0, 4.0, and the exponential CDF for λ = 2.0, plotted for x from 0 to 5.]
31
Memory-less Property of the Exponential Distribution
Suppose inter-arrival times of IP packets are modelled using the exponential distribution. The memory-less property states that the distribution of the remaining time until the next packet arrival does not depend on how long there have been no packet arrivals.
Suppose X is an exponentially distributed r.v. and X ≥ t (i.e., no arrival during the first t time units). Then,
P({X ≥ t + h | X ≥ t}) = P({X ≥ h})
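The memory-less property can be checked by simulation; the sketch below (an addition; the λ, t, and h values are assumed) estimates P(X ≥ t + h | X ≥ t) and compares it with P(X ≥ h) = e^{-λh}.

```python
import random
from math import exp, log

random.seed(5)
lam, t, h = 1.0, 2.0, 0.5      # assumed values
n = 200_000

# Exponential samples via inverse transform
samples = [-log(1.0 - random.random()) / lam for _ in range(n)]

survivors = [x for x in samples if x >= t]
cond = sum(1 for x in survivors if x >= t + h) / len(survivors)  # P(X >= t+h | X >= t)
uncond = sum(1 for x in samples if x >= h) / n                   # P(X >= h)

print(round(cond, 3), round(uncond, 3), round(exp(-lam * h), 3))  # all approximately 0.607
```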
32
Pareto Distribution
If X is a random variable with a Pareto distribution, then its PDF is given by:

f(x) = k \frac{x_m^k}{x^{k+1}}, x ≥ x_m, x_m > 0, k > 0

where x_m, the minimum possible value of X, is also called a location parameter, and k, which is positive, is also called a shape parameter.
The CDF of the Pareto distribution is given by:

F(x) = 1 - \left(\frac{x_m}{x}\right)^k
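A sketch (an addition; x_m = 3 and k = 1.2, matching the plots on the next slide) that draws Pareto samples by inverting the CDF and checks the empirical CDF at a few points.

```python
import random

random.seed(9)
x_m, k = 3.0, 1.2     # location and shape parameters, as in the example plots
n = 100_000

# Inverse transform: F(x) = 1 - (x_m / x)**k  =>  x = x_m * (1 - U)**(-1/k)
samples = [x_m * (1.0 - random.random()) ** (-1.0 / k) for _ in range(n)]

for x in (5.0, 10.0, 100.0):
    emp = sum(1 for s in samples if s <= x) / n
    analytic = 1.0 - (x_m / x) ** k
    print(x, round(emp, 3), round(analytic, 3))
```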
33
Pareto PDF and CDF
[Figure: Pareto PDF f(x) and CDF F(x) with x_min = 3, k = 1.2, plotted on a logarithmic x-axis from 10 to 1000.]
Example: distribution of file sizes on a web server.
The PDF shows a high probability that a file is under 10 KB in size, and a very small probability of it being larger than 100 KB.
The CDF curve shows the proportion of files below a given size threshold, e.g. nearly all files are under 100 KB in size.
34
Pareto Models
This highly skewed distribution is heavy-tailed, meaning that the random variable can take extreme values.
Common models:
Distribution of income
Distribution of files in a P2P system
The values of oil reserves in oil fields (a few large
fields, many small fields)
The length distribution in jobs assigned to
supercomputers (a few large ones, many small ones)
The standardized price returns on individual stocks
35
Normal Distribution
X is a normal random variable with mean μ and variance σ² if X has the following PDF:

f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-(x-\mu)^2 / (2\sigma^2)}, -∞ < x < ∞

The CDF of a normal distribution is:

F_X(x) = P({X ≤ x}) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}\,\sigma} e^{-(t-\mu)^2 / (2\sigma^2)} dt

There is no closed form for F_X(x).
36
PDF of Normal Distribution (μ = 0)
[Figure: normal PDFs with μ = 0 and σ = 0.5, σ = 1, and σ = 2, plotted for x from -6 to 6.]
This PDF has a bell shape with its peak at x = 0.
37
Standard Normal Distribution
If X is normally distributed with parameters μ and σ², then

Z = \frac{X - \mu}{\sigma}

is normally distributed with parameters 0 and 1. Z is called the standard normal distribution.
38
Computing CDF of Normal Distribution
Transform X to the standard normal distribution Z and use tables:
If X ~ N(μ, σ²), then Z = (X - μ)/σ ~ N(0, 1)
The area under the standard normal curve in (-∞, z] is equal to the area under the normal curve in (-∞, x].
The same method is used to obtain P(a < X < b), by calculating P(X < b) - P(X < a).
Alternative formula:

F_X(x) = \frac{1}{2} + \frac{1}{2} \operatorname{erf}\!\left(\frac{x - \mu}{\sigma\sqrt{2}}\right)
39
Normal CDF - Example
F_X(x) = P({X ≤ x}) = P(Z ≤ z) for z = (x - μ)/σ
P(Z ≤ z) is read from the table called the "Standard Normal distribution table".
The table gives the area under the curve, but always check which area is given, e.g. over the interval (0, z).
For X ~ N(5, 4), find F_X(5.4):
F_X(5.4) = P({X ≤ 5.4}) = P(Z ≤ z) for z = (5.4 - 5)/2 = 0.2
40
Normal CDF - Example
Rows mark the z-value up to the first decimal digit; columns mark the second decimal digit of z.
For z = 0.20, read the value at row 0.2, column 0.00: 0.0793.
Then account for the rest of the area to the left of the y-axis to get F_X(5.4) = 0.5 + 0.0793 = 0.5793.
z     0.00    0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09
0.0   0.0000  0.0040  0.0080  0.0120  0.0160  0.0199  0.0239  0.0279  0.0319  0.0359
0.1   0.0398  0.0438  0.0478  0.0517  0.0557  0.0596  0.0636  0.0675  0.0714  0.0753
0.2   0.0793  0.0832  0.0871  0.0910  0.0948  0.0987  0.1026  0.1064  0.1103  0.1141
0.3   0.1179  0.1217  0.1255  0.1293  0.1331  0.1368  0.1406  0.1443  0.1480  0.1517
0.4   0.1554  0.1591  0.1628  0.1664  0.1700  0.1736  0.1772  0.1808  0.1844  0.1879
0.5   0.1915  0.1950  0.1985  0.2019  0.2054  0.2088  0.2123  0.2157  0.2190  0.2224
0.6   0.2257  0.2291  0.2324  0.2357  0.2389  0.2422  0.2454  0.2486  0.2517  0.2549
0.7   0.2580  0.2611  0.2642  0.2673  0.2704  0.2734  0.2764  0.2794  0.2823  0.2852
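The table lookup can be cross-checked with the alternative erf formula from two slides back; the sketch below (an addition; math.erf is a Python standard-library function) reproduces the N(5, 4) example.

```python
from math import erf, sqrt

def normal_cdf(x, mu, var):
    """F_X(x) for a Normal(mu, var) random variable via the erf formula."""
    return 0.5 + 0.5 * erf((x - mu) / sqrt(2.0 * var))

# Example from the preceding slides: X ~ N(5, 4), F_X(5.4) = P(Z <= 0.2)
print(round(normal_cdf(5.4, 5.0, 4.0), 4))   # 0.5793
```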
41
Chi-Square Test
Prepare a histogram of the empirical data with k cells.
Let O_i and E_i be the observed and expected frequency of the i-th cell, respectively. Compute the following:

\chi_0^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}

\chi_0^2 has a Chi-Square distribution with (k - 1) degrees of freedom.
42
Chi-Square Test (continued )
Define a null hypothesis, H_0, that the observations come from a specified distribution.
The null hypothesis cannot be rejected at a significance level of α if

\chi_0^2 < \chi^2_{[1-α, k-s-1]}

where the critical value \chi^2_{[1-α, k-s-1]} is obtained from a table (s is the number of distribution parameters estimated from the data).
Meaning of significance level: α = P(reject H_0 | H_0 is true)
43
More on Chi-Square Test
Errors in cells with small E_i's affect the test statistic more than cells with large E_i's.
The minimum size of E_i is debated: [BCNN05] recommends a value of 3 or more; if a cell's expected frequency is smaller, combine adjacent cells.
The test is designed for discrete distributions and large sample sizes only. For continuous distributions, the Chi-Square test is only an approximation (i.e., the level of significance holds only as n → ∞).
44
Chi-Square Test Example
Example: 500 random numbers generated using a random number generator; observations categorized into cells at intervals of 0.1, between 0 and 1. At a level of significance of 0.1, are these numbers IID U(0,1)?

Interval   O_i   E_i   (O_i - E_i)^2 / E_i
1          50    50    0
2          48    50    0.08
3          49    50    0.02
4          42    50    1.28
5          52    50    0.08
6          45    50    0.5
7          63    50    3.38
8          54    50    0.32
9          50    50    0
10         47    50    0.18
Total      500         5.84

\chi_0^2 = 5.84; \chi^2_{[0.9, 9]} = 14.68 from the table.
The hypothesis is accepted at a significance level of 0.10.
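The test statistic for this example can be reproduced in a few lines of Python (an addition; the critical value 14.68 is taken from the table, as on the slide).

```python
observed = [50, 48, 49, 42, 52, 45, 63, 54, 50, 47]   # O_i from the table
expected = [50.0] * 10                                 # E_i = 500 / 10 cells

chi2_0 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
critical = 14.68   # chi-square critical value, 9 degrees of freedom, alpha = 0.1

print(round(chi2_0, 2), critical, chi2_0 < critical)   # 5.84 14.68 True
```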
45
Fundamental a.k.a. Operational Laws
Utilization Law
Forced Flow Law
Service Demand Law
Little's Law
Interactive Response Time Law
46
Utilization Law
U_i = \frac{B_i}{T} = \frac{C_i}{T} \cdot \frac{B_i}{C_i} = X_i S_i

(T: measurement interval; B_i: busy time of resource i; C_i: number of completions at resource i; X_i = C_i/T: throughput; S_i = B_i/C_i: average service time)
Utilization of a resource (system) is equal to the product of the throughput of the resource (system) and the average service time of the resource (system).
Utilization of a resource is the fraction of time that the resource is busy.
U_i is always between 0 and 1.
47
Forced Flow Law
Each system-level request may require multiple visits to a system resource.
E.g., a database transaction may require several disk accesses.
This law relates the system throughput to the resource throughput:
X_k = V_k X_0
[Figure: a system composed of several resources (1, 2, 3), each visited by requests.]
A system consists of many resources.
V_i := average number of visits per request to resource i
X_i := throughput at resource i
X_0 := system throughput
48
Service Demand Law
D_i := mean time spent by a typical request obtaining service from resource i
Contrast D_i with S_i:
S_i := mean service time per visit to resource i
D_i = S_i V_i
D_i = (U_i / X_i)(X_i / X_0) = U_i / X_0
Typically X_0 and U_i are easier to obtain than S_i and V_i.
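A tiny numeric sketch (an addition; all values are hypothetical) tying the utilization, forced flow, and service demand laws together for one disk resource.

```python
# Hypothetical measurements for a disk resource i in a transaction system
X0 = 20.0    # system throughput: 20 transactions/second
V_i = 3.0    # average disk visits per transaction
S_i = 0.01   # average service time per disk visit (seconds)

X_i = V_i * X0    # Forced Flow Law: resource throughput = 60 visits/s
U_i = X_i * S_i   # Utilization Law: U_i = X_i * S_i = 0.6
D_i = S_i * V_i   # Service Demand Law: D_i = 0.03 s per transaction

# Note that D_i equals U_i / X0, as stated on the slide
print(X_i, round(U_i, 3), round(D_i, 4), round(U_i / X0, 4))
```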
49
Little's Law
The most famous operational law.
The average number in the system equals the product of the departure rate of customers (i.e., the throughput of the system) and the average time each customer spends in the system:
N_i = X_i R_i
[Figure: a black box (think of a pub) with arrivals, completions, N customers in the system, and residence time R_i.]
50
Interactive Response Time Law
X_0 = system throughput
N = number of clients (terminals)
Z = client's average think time
R = average system response time
Let N_t = average number of clients in think mode
Let N_w = average number of clients waiting for a response
N_t + N_w = N
N_t = X_0 Z [Box 1]
N_w = X_0 R [Box 2]
N = X_0 (R + Z)
R = (N / X_0) - Z
[Figure: terminals (Box 1) with think time Z connected to the subsystem (Box 2) with response time R; X_0 is the throughput between them.]
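Finally, a short sketch (an addition; the numbers are hypothetical) applying the interactive response time law together with Little's Law.

```python
# Hypothetical interactive system: N clients with think time Z,
# measured system throughput X0.
N = 100       # number of clients (terminals)
Z = 5.0       # average think time (seconds)
X0 = 15.0     # system throughput (requests/second)

R = N / X0 - Z          # Interactive Response Time Law: R = N/X0 - Z
N_in_system = X0 * R    # Little's Law: average number of requests in the system

print(round(R, 2), round(N_in_system, 1))   # 1.67 25.0
```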