
1. Two Random Variables


• One random variable can be considered as a mapping from the
sample space to the real line.
• Two random variables can be considered as a mapping from the
sample space to the plane.
• Note that given the outcome of the experiment, both X and Y are
determined simultaneously.

Joint Probability Mass Function

• Assume that $(X, Y)$ takes on a countable set of values, i.e., both X and Y are discrete random variables.
• The joint probability mass function specifies the probabilities of the product-form event $\{X = x_j\} \cap \{Y = y_k\}$:
$$p_{X,Y}(x_j, y_k) = P[\{X = x_j\} \cap \{Y = y_k\}]$$
• The probability of any event A is obtained by summing over all outcomes in A:
$$P[A] = \sum_{(x_j, y_k) \in A} p_{X,Y}(x_j, y_k)$$
• The sum over all outcomes is 1:
$$\sum_{x_j} \sum_{y_k} p_{X,Y}(x_j, y_k) = 1$$
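In code, a joint pmf can be stored as a table and event probabilities computed by summation. A minimal Python sketch, assuming a small hypothetical pmf (the numbers are illustrative, not from the slides):

```python
import numpy as np

# Hypothetical joint pmf of (X, Y): rows index x_j in {0, 1, 2}, columns y_k in {0, 1}.
p_xy = np.array([[0.10, 0.20],
                 [0.30, 0.15],
                 [0.15, 0.10]])

assert np.isclose(p_xy.sum(), 1.0)  # the sum over all outcomes is 1

# P[A] for the event A = {X + Y <= 1}: sum the pmf over all (x_j, y_k) in A.
xs, ys = np.arange(3), np.arange(2)
in_A = xs[:, None] + ys[None, :] <= 1
print(p_xy[in_A].sum())  # 0.10 + 0.20 + 0.30 = 0.6
```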

Marginal Statistics

• The marginal probability distribution of a random variable in a joint distribution is simply the probability of the random variable when considered in isolation.
• The marginal pmf of X is
$$p_X[x_j] = P[X = x_j] = P[X = x_j, \; Y = \text{anything}] = \sum_{k=1}^{\infty} p_{X,Y}[x_j, y_k]$$
• Similarly,
$$p_Y[y_k] = \sum_{j=1}^{\infty} p_{X,Y}[x_j, y_k]$$
Example

• Toss two fair dice, and define X = the number of dice showing an even face, Y = the number of dice showing an odd face, and Z = the number of dice showing a face of 4 or more. Then
$$P[\{X = 2\} \cap \{Y = 2\}] = 0 \qquad\text{while}\qquad P[\{X = 2\} \cap \{Z = 2\}] = \frac{1}{9}$$
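These values can be checked by enumerating the 36 equally likely outcomes. A sketch, where the definitions of X, Y, and Z above are reconstructions chosen to be consistent with the stated probabilities:

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))  # 36 equally likely (die1, die2) pairs

def prob(event):
    return Fraction(sum(event(d1, d2) for d1, d2 in outcomes), len(outcomes))

# Reconstructed definitions: X = # even faces, Y = # odd faces, Z = # faces >= 4.
X = lambda d1, d2: (d1 % 2 == 0) + (d2 % 2 == 0)
Y = lambda d1, d2: (d1 % 2 == 1) + (d2 % 2 == 1)
Z = lambda d1, d2: (d1 >= 4) + (d2 >= 4)

print(prob(lambda d1, d2: X(d1, d2) == 2 and Y(d1, d2) == 2))  # 0
print(prob(lambda d1, d2: X(d1, d2) == 2 and Z(d1, d2) == 2))  # 1/9
```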
Joint Cumulative Distribution Function

• The joint cumulative distribution function of two random variables X and Y is defined as
$$F_{X,Y}(x, y) = P[\{X \le x\} \cap \{Y \le y\}], \qquad x, y \in \mathbb{R}$$
[Figure: the semi-infinite region $\{X \le x, Y \le y\}$ below and to the left of the point (x, y).]
Joint density

• The joint probability density function (pdf) f(x, y) of X and Y is the derivative of the joint distribution:
$$f(x, y) = \frac{\partial^2 F(x, y)}{\partial x \, \partial y}$$
• Properties:
$$F(x, y) = \int_{-\infty}^{y} \int_{-\infty}^{x} f(\alpha, \beta)\, d\alpha\, d\beta$$
$$\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(\alpha, \beta)\, d\alpha\, d\beta = 1$$
$$P[(X, Y) \in A] = \iint_A f(\alpha, \beta)\, d\alpha\, d\beta$$
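These properties can be verified numerically for a concrete density. A sketch using the hypothetical pdf f(x, y) = x + y on the unit square (zero elsewhere), which integrates to 1:

```python
from scipy.integrate import dblquad

# Hypothetical joint pdf: f(x, y) = x + y on the unit square, 0 elsewhere.
f = lambda y, x: x + y  # dblquad expects func(y, x), integrating over y first

# Total probability mass is 1.
total, _ = dblquad(f, 0, 1, 0, 1)
print(total)  # 1.0

# P[(X, Y) in A] for A = {x + y <= 1}: integrate f over the region A.
p_a, _ = dblquad(f, 0, 1, 0, lambda x: 1 - x)
print(p_a)  # 1/3
```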
The joint pdf is not a probability

• Note that the value of the density function is not a probability.


• However, it is large in regions of the plane with high probability and small in regions of the plane with low probability:
$$P[x < X \le x + dx, \; y < Y \le y + dy] = \int_{y}^{y+dy} \int_{x}^{x+dx} f(\alpha, \beta)\, d\alpha\, d\beta \approx f(x, y)\, dx\, dy$$
[Figure: an infinitesimal dx-by-dy rectangle at the point (x, y) in the (X, Y) plane.]

• It is true that f(x,y) ≥ 0 for all x and y. However, it is not necessarily true that f(x,y)
≤ 1.
Marginal densities

• The marginal densities fX(x) and fY(y) can always be recovered from the joint density f(x, y):
$$f_X(x) = \frac{dF_X(x)}{dx} = \frac{dF_{X,Y}(x, \infty)}{dx} = \frac{d}{dx} \int_{-\infty}^{x} \left[ \int_{-\infty}^{\infty} f_{X,Y}(\alpha, \beta)\, d\beta \right] d\alpha = \int_{-\infty}^{\infty} f_{X,Y}(x, \beta)\, d\beta$$
$$f_Y(y) = \int_{-\infty}^{\infty} f_{X,Y}(\alpha, y)\, d\alpha$$
[Figure: for fX(x), integrate the joint density along a vertical line at x; for fY(y), integrate along a horizontal line at y.]


• However, it is not generally possible to determine the joint density from the
marginal densities.
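A quick numerical check of the marginalization integral, again for the hypothetical pdf f(x, y) = x + y on the unit square, whose marginal is f_X(x) = x + 1/2:

```python
import numpy as np
from scipy.integrate import quad

# The same hypothetical joint pdf as before: f(x, y) = x + y on the unit square.
f = lambda x, y: x + y

# Marginal f_X(x) = integral of f(x, y) over y; analytically this is x + 1/2.
for x in [0.0, 0.25, 0.5, 1.0]:
    fx, _ = quad(lambda y: f(x, y), 0, 1)
    print(x, fx, np.isclose(fx, x + 0.5))  # True for every x
```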
Independence

• Definition: Two random variables X and Y are said to be independent or statistically independent if, for any events $A_X$ and $A_Y$ on the real line,
$$P[\{X \in A_X\} \cap \{Y \in A_Y\}] = P[X \in A_X]\, P[Y \in A_Y]$$

• In other words, the probability of a product-form event can be expressed as the product of the probabilities of the events in X and Y.
• When random variables are independent, knowledge of the probabilities of the RVs in isolation is sufficient to specify the probabilities of joint events.

Theorem

The following three statements are equivalent:


1. X and Y are independent.
2. $F_{X,Y}(x, y) = F_X(x)\, F_Y(y)$ for all x and y.
3. $f_{X,Y}(x, y) = f_X(x)\, f_Y(y)$ for all x and y.

Proof:
(1) → (2): Set $A_X = \{X \le x\}$ and $A_Y = \{Y \le y\}$.
(2) → (3): Differentiate $F_{X,Y}(x, y)$ with respect to x and y.
(3) → (1): For any $A_X$ and $A_Y$,
$$P[\{X \in A_X\} \cap \{Y \in A_Y\}] = \int_{A_Y} \int_{A_X} f_{X,Y}(x, y)\, dx\, dy = \left[ \int_{A_X} f_X(x)\, dx \right] \left[ \int_{A_Y} f_Y(y)\, dy \right] = P[X \in A_X]\, P[Y \in A_Y]$$
Additional facts about independent RVs

• Discrete random variables X and Y are independent if and only if the joint pmf is the product of the marginal pmfs:
$$p_{X,Y}(x_j, y_k) = p_X(x_j)\, p_Y(y_k)$$
• If X and Y are independent, then any functions (linear or nonlinear) of X and Y, e.g.
$$W = g(X), \qquad Z = h(Y),$$
are also independent.
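The discrete criterion suggests a direct test: compare the joint pmf table with the outer product of its marginals. A sketch (both tables are hypothetical):

```python
import numpy as np

# A joint pmf is independent iff it equals the outer product of its marginals.
def is_independent(p_xy, tol=1e-12):
    p_x = p_xy.sum(axis=1)
    p_y = p_xy.sum(axis=0)
    return np.allclose(p_xy, np.outer(p_x, p_y), atol=tol)

# A product-form pmf (independent) vs. the earlier hypothetical table (not).
print(is_independent(np.outer([0.2, 0.8], [0.5, 0.3, 0.2])))                  # True
print(is_independent(np.array([[0.10, 0.20], [0.30, 0.15], [0.15, 0.10]])))  # False
```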

Conditional Probability Density Functions

If X is continuous, then P[X = x] = 0. Thus, we define
$$P[A \mid X = x] = \lim_{h \to 0} P[A \mid x < X \le x + h]$$
We can define the conditional cdf of Y given X by
$$F_{Y|X}(y \mid x) = \lim_{h \to 0} \frac{P[\{Y \le y\} \cap \{x < X \le x + h\}]}{P[x < X \le x + h]} = \lim_{h \to 0} \frac{\int_{x}^{x+h} \int_{-\infty}^{y} f_{X,Y}(\alpha, \beta)\, d\beta\, d\alpha}{\int_{x}^{x+h} f_X(\alpha)\, d\alpha}$$
$$= \lim_{h \to 0} \frac{h \int_{-\infty}^{y} f_{X,Y}(x, \beta)\, d\beta}{h\, f_X(x)} = \frac{\int_{-\infty}^{y} f_{X,Y}(x, \beta)\, d\beta}{f_X(x)} \qquad \text{if } f_X(x) \ne 0$$
Differentiating with respect to y,
$$f_{Y|X}(y \mid x) = \frac{f_{X,Y}(x, y)}{f_X(x)}$$
Graphical Interpretation

• The conditional pdf is the probability that Y is in the infinitesimal strip (y, y + dy), given that X is in the infinitesimal strip (x, x + dx).
• The conditional pdf is the mass of the red column divided by the mass of the entire yellow strip.
• It can also be interpreted as a slice of the joint density at X = x, normalized to unit area:
$$f_{Y|X}(y \mid x) = \frac{f_{X,Y}(x, y)}{\int_{-\infty}^{\infty} f_{X,Y}(x, y)\, dy}$$
[Figure: the joint density over the (X, Y) plane, with a vertical strip at X = x.]
Properties of the Conditional Density
• Computing the joint density:
$$f_{X,Y}(x, y) = f_{X|Y}(x \mid y)\, f_Y(y) \qquad\text{or}\qquad f_{X,Y}(x, y) = f_{Y|X}(y \mid x)\, f_X(x)$$
• Computing a marginal density (total probability):
$$f_Y(y) = \int_{-\infty}^{\infty} f_{Y|X}(y \mid x)\, f_X(x)\, dx \qquad \text{(continuous random variables)}$$
$$p_Y(j) = \sum_k p_{Y|X}(j \mid k)\, p_X(k) \qquad \text{(discrete random variables)}$$
• Bayes' theorem:
$$f_{X|Y}(x \mid y) = \frac{f_{Y|X}(y \mid x)\, f_X(x)}{\int_{-\infty}^{\infty} f_{Y|X}(y \mid x)\, f_X(x)\, dx}$$
• If X and Y are independent,
$$f_{X|Y}(x \mid y) = f_X(x) \qquad\text{and}\qquad f_{Y|X}(y \mid x) = f_Y(y)$$
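The discrete total-probability and Bayes formulas map directly onto array operations. A sketch with hypothetical tables:

```python
import numpy as np

# Hypothetical prior p_X over k = 0, 1 and conditional pmf p_{Y|X}(j | k):
# rows index j, columns index k.
p_x = np.array([0.4, 0.6])
p_y_given_x = np.array([[0.9, 0.2],
                        [0.1, 0.8]])

p_y = p_y_given_x @ p_x  # total probability: p_Y(j) = sum_k p_{Y|X}(j|k) p_X(k)
print(p_y)               # [0.48 0.52]

# Bayes: p_{X|Y}(k | j) = p_{Y|X}(j | k) p_X(k) / p_Y(j), here for j = 0.
p_x_given_y0 = p_y_given_x[0] * p_x / p_y[0]
print(p_x_given_y0)      # [0.75 0.25]
```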
Removing Conditioning via Expectation
Since E[Y | X] is a function of X, it is also a random variable. Taking the expectation over X, the conditioning disappears:
$$E_X\big[\, E[Y \mid X] \,\big] = E[Y]$$
(the outer expectation is taken over X, the inner one over Y).

This is an "expectation" version of the total probability theorem.

More generally, for any function g,
$$E_X\big[\, E[g(X, Y) \mid X] \,\big] = E[g(X, Y)]$$
2. N Random Variables

• An N-dimensional random vector is a mapping from a probability space to $\mathbb{R}^N$.
• For example, N = 3:
[Figure: an outcome in the sample space S is mapped to the point $(X_1, X_2, X_3)$ in $\mathbb{R}^3$.]
Joint Cumulative Distribution Function
$$F_{X_1, X_2, \dots, X_n}(x_1, x_2, \dots, x_n) = P[X_1 \le x_1, X_2 \le x_2, \dots, X_n \le x_n]$$

• To eliminate a variable, set its argument to infinity:
$$F_{X_1, \dots, X_{n-1}}(x_1, \dots, x_{n-1}) = F_{X_1, \dots, X_n}(x_1, \dots, x_{n-1}, \infty)$$
Joint Probability Density/Mass Functions
• Joint probability density function:
$$f_{X_1, \dots, X_n}(x_1, \dots, x_n) = \frac{\partial^n}{\partial x_1 \cdots \partial x_n} F_{X_1, \dots, X_n}(x_1, \dots, x_n)$$
• For any set A,
$$P[A] = \int \cdots \int_A f_{X_1, \dots, X_n}(x_1, \dots, x_n)\, dx_1\, dx_2 \cdots dx_n$$
• Joint probability mass function:
$$p_{X_1, \dots, X_n}(x_1, \dots, x_n) = P[X_1 = x_1, X_2 = x_2, \dots, X_n = x_n]$$
• For any set A,
$$P[A] = \sum_{(x_1, \dots, x_n) \in A} p_{X_1, \dots, X_n}(x_1, \dots, x_n)$$

Marginal Statistics

• Eliminate a variable from a pdf (pmf) by integrating (summing) it out:
$$f_{X_1 \cdots X_{n-1}}(x_1, \dots, x_{n-1}) = \int_{-\infty}^{\infty} f_{X_1 \cdots X_n}(x_1, \dots, x_{n-1}, x_n)\, dx_n$$
$$p_{X_1 \cdots X_{n-1}}(x_1, \dots, x_{n-1}) = \sum_{x_n} p_{X_1 \cdots X_n}(x_1, \dots, x_{n-1}, x_n)$$
• Get marginals by successive integrations (summations):
$$f_{X_1}(x_1) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f_{X_1 \cdots X_n}(x_1, x_2, \dots, x_n)\, dx_2 \cdots dx_n$$
$$p_{X_1}(x_1) = \sum_{x_2} \cdots \sum_{x_n} p_{X_1 \cdots X_n}(x_1, x_2, \dots, x_n)$$
Conditional densities:
$$f_{X_1 X_2 | X_3}(x_1, x_2 \mid x_3) = \frac{f_{X_1 X_2 X_3}(x_1, x_2, x_3)}{f_{X_3}(x_3)}$$
$$f_{X_1 | X_2 X_3}(x_1 \mid x_2, x_3) = \frac{f_{X_1 X_2 X_3}(x_1, x_2, x_3)}{f_{X_2 X_3}(x_2, x_3)}$$

Conditional probability mass functions:
$$p_{X_1 X_2 | X_3}(x_1, x_2 \mid x_3) = \frac{p_{X_1 X_2 X_3}(x_1, x_2, x_3)}{p_{X_3}(x_3)}$$
$$p_{X_1 | X_2 X_3}(x_1 \mid x_2, x_3) = \frac{p_{X_1 X_2 X_3}(x_1, x_2, x_3)}{p_{X_2 X_3}(x_2, x_3)}$$
Independence

• The following statements are equivalent


1. $X_1, X_2, \dots, X_n$ are independent, i.e., for all $A_1, A_2, \dots, A_n$,
$$P[X_1 \in A_1, X_2 \in A_2, \dots, X_n \in A_n] = P[X_1 \in A_1]\, P[X_2 \in A_2] \cdots P[X_n \in A_n]$$
2. $F_{X_1, \dots, X_n}(x_1, \dots, x_n) = F_{X_1}(x_1)\, F_{X_2}(x_2) \cdots F_{X_n}(x_n)$
3. $f_{X_1, \dots, X_n}(x_1, \dots, x_n) = f_{X_1}(x_1)\, f_{X_2}(x_2) \cdots f_{X_n}(x_n)$
Expected value of a function of random variables
Z  g ( X 1 , X 2 ,..., X n )
Suppose Z is a function of n random variables X1, X2, … Xn.
  
E[ Z ]    ...
  
g ( x1 , x2 ,..., xn ) f ( x1 , x2 ,..., xn )dx1dx2 ...dxn

The expected value of Z can be computed from the joint density:

E[ Z ]  ... g ( x , x ,..., x ) p( x , x ,..., x )


x1 x2 xn
1 2 n 1 2 n

If the Xi are discrete,

23
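In practice, E[Z] is often estimated by Monte Carlo rather than by the n-fold integral. A sketch for the hypothetical choice g(x1, x2) = max(x1, x2) with i.i.d. Uniform(0, 1) inputs, whose exact mean is 2/3:

```python
import numpy as np

rng = np.random.default_rng(1)

# Monte Carlo estimate of E[Z] for Z = g(X1, X2) = max(X1, X2),
# with X1, X2 i.i.d. Uniform(0, 1); the exact value is 2/3.
x1, x2 = rng.uniform(size=(2, 1_000_000))
print(np.maximum(x1, x2).mean())  # about 0.6667
```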
Linearity of the Expectation

• For any two random variables, the mean of the sum is the sum of the means. Let Z = X + Y:
$$E[Z] = E[X + Y] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} (x + y)\, f(x, y)\, dx\, dy$$
$$= \int_{-\infty}^{\infty} x \left[ \int_{-\infty}^{\infty} f(x, y)\, dy \right] dx + \int_{-\infty}^{\infty} y \left[ \int_{-\infty}^{\infty} f(x, y)\, dx \right] dy$$
$$= \int_{-\infty}^{\infty} x\, f_X(x)\, dx + \int_{-\infty}^{\infty} y\, f_Y(y)\, dy = E[X] + E[Y]$$
• More generally, the order of expectation and any linear operation can be swapped. For example,
$$E\left[ \sum_i a_i X_i \right] = \sum_i a_i\, E[X_i], \qquad \text{where the } a_i \text{ are real constants.}$$
Expectation of the product

• If X and Y are independent, then the expectation of the product is the product of the expected values:
$$E[XY] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} xy\, f(x, y)\, dx\, dy = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} xy\, f_X(x)\, f_Y(y)\, dx\, dy \quad \text{(by independence)}$$
$$= \left[ \int_{-\infty}^{\infty} x\, f_X(x)\, dx \right] \left[ \int_{-\infty}^{\infty} y\, f_Y(y)\, dy \right] = E[X]\, E[Y]$$
• More generally, if the Xi are independent and the gi(·) are arbitrary functions,
$$E[g_1(X_1)\, g_2(X_2) \cdots g_n(X_n)] = E[g_1(X_1)]\, E[g_2(X_2)] \cdots E[g_n(X_n)]$$
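A quick simulation check, with hypothetical distributions for X and Y, including a dependent case where the factorization fails:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1_000_000

# Independent X ~ Uniform(0, 1) and Y ~ Exponential(1): E[XY] = E[X] E[Y] = 0.5.
x, y = rng.uniform(size=n), rng.exponential(size=n)
print((x * y).mean(), x.mean() * y.mean())  # both approximately 0.5

# Dependence breaks the factorization: with Y = X, E[XY] = E[X^2] = 1/3 != 1/4.
print((x * x).mean())  # about 0.3333
```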
Moments and Central Moments

Joint moment (the jk-th moment):
$$E[X^j Y^k] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} x^j y^k f(x, y)\, dx\, dy$$

Central moment:
$$E[(X - E[X])^j\, (Y - E[Y])^k]$$

The first and second moments are the most widely used:

means: $E[X]$, $E[Y]$
variances: $E[(X - E[X])^2] = \mathrm{VAR}[X]$, $E[(Y - E[Y])^2] = \mathrm{VAR}[Y]$
covariance: $E[(X - E[X])(Y - E[Y])] = \mathrm{COV}(X, Y)$
correlation: $E[XY]$
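Sample versions of these quantities are available directly in NumPy. A sketch with a hypothetical correlated pair:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1_000_000

# Correlated pair: Y = X + noise, so COV(X, Y) = VAR[X] = 1/12 for X ~ Uniform(0, 1).
x = rng.uniform(size=n)
y = x + rng.normal(scale=0.1, size=n)

print(np.cov(x, y)[0, 1])  # sample covariance, about 1/12 = 0.0833
print(np.mean(x * y))      # correlation E[XY] = E[X^2] = 1/3 here
```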
Variance of Sum
Let Z = X + Y, where X and Y are random variables. The variance of Z is
$$\mathrm{VAR}[Z] = \mathrm{VAR}[X] + 2\,\mathrm{COV}(X, Y) + \mathrm{VAR}[Y]$$

Proof:
$$\mathrm{VAR}[Z] = E[(Z - E[Z])^2] = E[(X + Y - E[X] - E[Y])^2] = E\big[ \big( (X - E[X]) + (Y - E[Y]) \big)^2 \big]$$
$$= E[(X - E[X])^2] + 2\, E[(X - E[X])(Y - E[Y])] + E[(Y - E[Y])^2] = \mathrm{VAR}[X] + 2\,\mathrm{COV}(X, Y) + \mathrm{VAR}[Y]$$

Note: If X and Y are uncorrelated, then $\mathrm{VAR}[Z] = \mathrm{VAR}[X] + \mathrm{VAR}[Y]$, i.e., the variance of the sum is the sum of the variances.
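A numerical check of the identity for a hypothetical correlated pair, where the cross term is clearly visible:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1_000_000

# Correlated X and Y (Y contains X as a component), so COV(X, Y) > 0.
x = rng.uniform(size=n)
y = x + rng.normal(scale=0.1, size=n)
z = x + y

lhs = z.var()
rhs = x.var() + 2 * np.cov(x, y)[0, 1] + y.var()
print(lhs, rhs)  # equal up to sampling error, and larger than x.var() + y.var()
```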
