
Random Variables and Probability Distributions

Outline of Lecture
• Random Variables
  – Discrete Random Variables.
  – Continuous Random Variables.
• Probability Distribution Functions.
  – Discrete and Continuous.
  – PDF and PMF.
• Expectation of Random Variables.
• Propagation through Linear and Nonlinear Models.
• Multivariate Probability Density Functions.
• Some Important Probability Distribution Functions.

Random Variables

• A random variable is a function that associates a numerical value with each outcome of an experiment.
  – Function values are real numbers and depend on "chance".
• The function that assigns a value to each outcome is fixed and deterministic.
  – The randomness is due to the underlying randomness of the argument of the function X.
  – If we roll a pair of dice, then the sum of the two face values is a random variable.
• Random variables can be discrete or continuous.
  – Discrete: countable range.
  – Continuous: uncountable range.

Discrete Random Variables

• A random variable X and the corresponding distribution are said to be discrete if the number of values for which X has non-zero probability is finite.
• Probability Mass Function of X:

  f(x) = pⱼ  when x = xⱼ
  f(x) = 0   otherwise

• Probability Distribution Function of X:

  F(x) = P(X ≤ x)

• Properties of the Distribution Function:
  – Monotonically increasing.
  – Right continuous.
  – 0 ≤ F(x) ≤ 1.
  – P(a < X ≤ b) = F(b) − F(a).
Examples

• Let X denote the number of heads when a biased coin with probability of heads p is tossed twice.
  – X can take the value 0, 1 or 2.

  F(x) = 0                                 x < 0
         (1 − p)²                          0 ≤ x < 1
         (1 − p)² + 2p(1 − p) = 1 − p²     1 ≤ x < 2
         1                                 x ≥ 2

• Let X denote the random variable equal to the sum of two fair dice.
  – The random variable can take any integer value between 2 and 12.
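A quick brute-force check of the dice claim above. This is a minimal sketch (standard library only; nothing here comes from the slides) that enumerates all 36 outcomes and tabulates the PMF and the distribution function of the dice sum:

```python
# Enumerate all 36 equally likely outcomes of two fair dice and tabulate
# the PMF and CDF of their sum; the support comes out as 2..12.
from collections import Counter
from fractions import Fraction

counts = Counter(d1 + d2 for d1 in range(1, 7) for d2 in range(1, 7))
total = sum(counts.values())  # 36

cdf = Fraction(0)
for s in sorted(counts):
    pmf = Fraction(counts[s], total)
    cdf += pmf
    print(f"s={s:2d}  P(X=s)={pmf}  P(X<=s)={cdf}")
```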

Continuous Random Variables and Distributions

• X is a continuous random variable if there exists a non-negative function f(x), defined on the real line, with the property that

  P(X ≤ x) = F(x) = ∫_{−∞}^{x} f(y) dy,    F′(x) = f(x)

• The integrand f(y) is called a probability density function, and it satisfies

  ∫_{−∞}^{∞} f(x) dx = 1

• Properties:

  P(a ≤ X ≤ b) = F(b) − F(a) = ∫_{a}^{b} f(x) dx
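To make the two integral properties concrete, here is a minimal sketch that numerically integrates an example density over the whole line and over an interval. The standard normal density and scipy are assumptions for illustration, not part of the slides:

```python
# Check that an example density integrates to 1 over the real line, and
# that an interval probability equals F(b) - F(a) = integral of f over [a, b].
import numpy as np
from scipy import integrate

f = lambda x: np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)  # standard normal pdf

total, _ = integrate.quad(f, -np.inf, np.inf)  # ~1.0
p_ab, _ = integrate.quad(f, -1.0, 2.0)         # P(-1 <= X <= 2)
print(total, p_ab)                             # 1.0000...  0.8185...
```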

Continuous Random Variables and Distributions

• The probability that a continuous random variable assumes any one particular value is zero:

  P(X = a) = ∫_{a}^{a} f(x) dx = 0

• This does not mean that the event will never occur.
  – It occurs infrequently, and its relative frequency converges to zero.
  – f(a) large ⇒ the probability mass is very dense near a.
  – f(a) small ⇒ the probability mass is not very dense near a.
• f(a) is a measure of how likely it is that the random variable will be near a:

  P(a − ε ≤ X ≤ a + ε) = ∫_{a−ε}^{a+ε} f(x) dx ≈ 2ε f(a)

Difference Between PDF and PMF

• A probability density function does not define a probability but a probability density.
  – To obtain a probability we must integrate it over an interval.
• A probability mass function gives a true probability.
  – It does not need to be integrated to obtain a probability.
• A probability distribution function is either continuous or has jump discontinuities. For an interval from a to b, consider:

  1) P(a < X < b)    2) P(a < X ≤ b)    3) P(a ≤ X < b)    4) P(a ≤ X ≤ b)

  – Are they equal? For a continuous random variable all four coincide, since each endpoint carries zero probability; where F jumps at a or b they differ (see the sketch below).
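A minimal sketch of the question above: for a discrete variable (the two-dice sum, where F has jumps) the four probabilities all differ, while for a continuous variable they coincide. The endpoints a = 4, b = 10 and the use of scipy are assumptions for illustration:

```python
# Compare the four interval probabilities for a discrete variable (two-dice
# sum) against a continuous one (standard normal), where all four coincide.
from fractions import Fraction
from scipy.stats import norm

pmf = {s: Fraction(6 - abs(s - 7), 36) for s in range(2, 13)}  # dice-sum PMF
a, b = 4, 10
for lo, hi, label in [(a + 1, b - 1, "P(a<X<b)  "),
                      (a + 1, b,     "P(a<X<=b) "),
                      (a,     b - 1, "P(a<=X<b) "),
                      (a,     b,     "P(a<=X<=b)")]:
    print(label, sum(p for s, p in pmf.items() if lo <= s <= hi))

# Continuous case: every variant equals F(b) - F(a).
print("continuous:", norm.cdf(b) - norm.cdf(a))
```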


Statistical Characterization of Random Variables

• Recall, a random variable denotes the numerical attribute assigned to an outcome of an experiment.
• We cannot be certain which value of X will be observed on a particular trial.
• Will the average of all the values be the same for two different sets of trials?

  x̄ = (x₁ + x₂ + ⋯ + xₙ)/n,    ȳ = (y₁ + y₂ + ⋯ + yₙ)/n

• Recall, probability is approximately equal to relative frequency.
  – Approximately np₁ of the xᵢ's have the value u₁.

  x̄ = (x₁ + x₂ + ⋯ + xₙ)/n = (np₁u₁ + ⋯ + npₘuₘ)/n = Σᵢ uᵢ pᵢ

Statistical Characterization of Random Variables

• Expected Value:
  – The expected value of a discrete random variable x is found by multiplying each value of the random variable by its probability and then summing over all values of x:

    E[x] = Σₓ x P(x) = Σₓ x f(x)

  – The expected value is equivalent to the center-of-mass concept:

    r̄ = Σᵢ rᵢmᵢ / Σᵢ mᵢ

  – That is why it is also called the first moment.
  – A body is perfectly balanced about its center of mass.
• The expected value of x is the "balancing point" for the probability mass function of x.
  – The expected value equals the point of symmetry in the case of a symmetric pmf/pdf.
Statistical Characterization of Random Variables

• Law of the Unconscious Statistician (LOTUS): we can take the expectation of any function of a random variable. For y = g(x),

  E[y] = Σ_y y f(y) = Σₓ g(x) f(x)

• This balance point is the value expected for g(x) over all possible repetitions of the experiment involving the random variable x.
• The expected value of a continuous random variable with density function f(x) is given by

  E(x) = ∫_{−∞}^{∞} x f(x) dx
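A minimal sketch of LOTUS on a single fair die: summing g(x) against the PMF agrees with a Monte Carlo average of g over simulated outcomes. The choice g(x) = x² and numpy are assumptions for illustration:

```python
# LOTUS: E[g(X)] = sum_x g(x) f(x), checked against an empirical average.
import numpy as np

rng = np.random.default_rng(0)
faces = np.arange(1, 7)
g = lambda x: x**2

lotus = sum(g(x) * (1 / 6) for x in faces)        # exact: 91/6 = 15.1667
mc = g(rng.choice(faces, size=1_000_000)).mean()  # empirical estimate
print(lotus, mc)
```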


Example

• Let us assume that we have agreed to pay $1 for each dot showing when a pair of dice is thrown. We are interested in knowing how much we would lose on average.

  Value of x | Frequency | Probability Function | Probability Distribution Function
  2          | 1         | P(x=2)  = 1/36       | P(x≤2)  = 1/36
  3          | 2         | P(x=3)  = 2/36       | P(x≤3)  = 3/36
  4          | 3         | P(x=4)  = 3/36       | P(x≤4)  = 6/36
  5          | 4         | P(x=5)  = 4/36       | P(x≤5)  = 10/36
  6          | 5         | P(x=6)  = 5/36       | P(x≤6)  = 15/36
  7          | 6         | P(x=7)  = 6/36       | P(x≤7)  = 21/36
  8          | 5         | P(x=8)  = 5/36       | P(x≤8)  = 26/36
  9          | 4         | P(x=9)  = 4/36       | P(x≤9)  = 30/36
  10         | 3         | P(x=10) = 3/36       | P(x≤10) = 33/36
  11         | 2         | P(x=11) = 2/36       | P(x≤11) = 35/36
  12         | 1         | P(x=12) = 1/36       | P(x≤12) = 1
  Sum        | 36        | 1.00                 |

• Average amount we pay = (($2×1) + ($3×2) + ⋯ + ($12×1))/36 = $7
• E(x) = $2(1/36) + $3(2/36) + ⋯ + $12(1/36) = $7
Example (continued)

• Let us assume that we had agreed to pay an amount equal to the square of the sum of the dots showing on a throw of the dice.
  – What would be the average loss this time?
• Will it be ($7)² = $49.00?
• Actually, now we are interested in calculating E[x²].
  – E[x²] = ($2)²(1/36) + ⋯ + ($12)²(1/36) = $54.83 ≠ $49
  – This result also emphasizes that (E[x])² ≠ E[x²].
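Both payouts take a few lines to verify; this minimal sketch (standard library only) reproduces E[x] = $7 and E[x²] ≈ $54.83:

```python
# Exact E[x] and E[x^2] for the sum of two fair dice, using the triangular
# PMF P(X=s) = (6 - |s-7|)/36 on s = 2..12.
from fractions import Fraction

pmf = {s: Fraction(6 - abs(s - 7), 36) for s in range(2, 13)}
E_x  = sum(s * p for s, p in pmf.items())      # 7
E_x2 = sum(s**2 * p for s, p in pmf.items())   # 329/6 ~ 54.83
print(E_x, float(E_x2), float(E_x2 - E_x**2))  # variance 35/6 as a bonus
```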

Expectation Rules

• Rule 1: E[k] = k, where k is a constant.
• Rule 2: E[kx] = kE[x].
• Rule 3: E[x ± y] = E[x] ± E[y].
• Rule 4: If x and y are independent, E[xy] = E[x]E[y].
• Rule 5: V[k] = 0, where k is a constant.
• Rule 6: V[kx] = k²V[x].
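A numerical spot-check of Rules 2, 3, 4 and 6 on simulated data; the particular distributions of x and y, and numpy itself, are assumptions for illustration:

```python
# Each printed pair should agree up to Monte Carlo noise.
import numpy as np

rng = np.random.default_rng(1)
x = rng.exponential(2.0, 1_000_000)
y = rng.normal(3.0, 1.0, 1_000_000)   # independent of x
k = 5.0

print(np.mean(k * x), k * np.mean(x))           # Rule 2: E[kx] = kE[x]
print(np.mean(x + y), np.mean(x) + np.mean(y))  # Rule 3: E[x+y] = E[x]+E[y]
print(np.mean(x * y), np.mean(x) * np.mean(y))  # Rule 4 (independence)
print(np.var(k * x), k**2 * np.var(x))          # Rule 6: V[kx] = k^2 V[x]
```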

Variance of Random Variable

• The variance of a random variable x is defined as

  V(x) = σ² = E[(x − μ)²]

  Expanding,

  V(x) = E[x² − 2μx + μ²]
       = E[x²] − 2(E[x])² + (E[x])²
       = E[x²] − (E[x])²

• This result is also known as the "Parallel Axis Theorem".

Propagation of moments and density function through linear models

• y = ax + b
  – Given: μ = E[x] and σ² = V[x].
  – To find: E[y] and V[y].

  E[y] = E[ax] + E[b] = aE[x] + b = aμ + b
  V[y] = V[ax] + V[b] = a²V[x] + 0 = a²σ²

• Let us define z = (x − μ)/σ.
  – Here, a = 1/σ and b = −μ/σ.
  – Therefore, E[z] = 0 and V[z] = 1.
  – z is generally known as the "standardized variable".
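A minimal sketch of standardization: z = (x − μ)/σ has (approximately) zero mean and unit variance whatever the original distribution. The gamma sample and numpy are assumptions for illustration:

```python
# Standardize a skewed sample and confirm E[z] ~ 0 and V[z] ~ 1.
import numpy as np

rng = np.random.default_rng(2)
x = rng.gamma(shape=3.0, scale=2.0, size=1_000_000)  # any distribution works
z = (x - x.mean()) / x.std()
print(z.mean(), z.var())  # ~0 and ~1
```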
Propagation of moments and density function through non-linear models

• If x is a random variable with probability density function p(x), and y = f(x) is a one-to-one transformation that is differentiable for all x, then the probability density function of y is given by
  – p(y) = p(x)|J|⁻¹, for all x given by x = f⁻¹(y),
  – where |J| is the determinant of the Jacobian matrix J.
• Example: let y = ax² and

  p(x) = (1/(σₓ√(2π))) exp(−x²/(2σₓ²))

  NOTE: for each value of y there are two values of x, so the transformation is not one-to-one and the contributions of both branches x = ±√(y/a) must be added:

  p(y) = (1/(σₓ√(2πay))) exp(−y/(2aσₓ²)),   y > 0
  p(y) = 0, otherwise

  We can also show that

  E(y) = aσₓ²   and   V(y) = 2a²σₓ⁴
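The example can be cross-checked by simulation: draw x from N(0, σₓ²), form y = ax², and compare the sample moments with aσₓ² and 2a²σₓ⁴. The parameter values and numpy are assumptions for illustration:

```python
# Monte Carlo check of the moments of y = a*x^2 for x ~ N(0, sigma^2).
import numpy as np

rng = np.random.default_rng(3)
a, sigma = 2.0, 1.5
x = rng.normal(0.0, sigma, 2_000_000)
y = a * x**2

print(y.mean(), a * sigma**2)        # both ~4.5
print(y.var(), 2 * a**2 * sigma**4)  # both ~40.5
```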

Random Variables

• One random variable depicts one physical phenomenon.
  – Example: a web server.
• A random vector is just an extension of a random variable.
  – A vector random variable X is a function that assigns a vector of real numbers to each outcome in the sample space.
  – e.g., Sample Space = set of people.
  – Random vector = [X = weight, Y = height of a person].
• A random point (X, Y) carries more information than X or Y alone.
  – It describes the joint behavior of X and Y.
• The joint probability distribution function:

  F(x, y) = P({X ≤ x} ∩ {Y ≤ y})

• What happens as x → ±∞ and y → ±∞?
Random Vectors

• Joint Probability Functions:
  – Joint Probability Distribution Function:

    F(X) = P[{X₁ ≤ x₁} ∩ {X₂ ≤ x₂} ∩ ⋯ ∩ {Xₙ ≤ xₙ}]

  – Joint Probability Density Function:

    f(x) = ∂ⁿF(X) / (∂x₁ ∂x₂ ⋯ ∂xₙ)

• Marginal Probability Functions: a marginal probability function is obtained by summing or integrating out the variables that are of no interest:

  f_X(x) = Σ_y p(x, y)   or   f_X(x) = ∫_{−∞}^{∞} f(x, y) dy
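A minimal sketch of marginalization: numerically integrate a joint density over y and compare against the known marginal of x. The bivariate normal joint is an arbitrary example, and scipy is assumed:

```python
# Integrate f(x0, y) over y on a grid; the result should match the exact
# marginal of X, which for this covariance is the standard normal density.
import numpy as np
from scipy.stats import multivariate_normal, norm

joint = multivariate_normal(mean=[0, 0], cov=[[1.0, 0.6], [0.6, 2.0]])
y = np.linspace(-10, 10, 4001)
dy = y[1] - y[0]

x0 = 0.7
pts = np.column_stack([np.full_like(y, x0), y])
f_x0 = joint.pdf(pts).sum() * dy   # numerically integrate out y
print(f_x0, norm.pdf(x0))          # both ~0.3123
```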

Multivariate Expectations

• E(X) = ∫_{−∞}^{∞} x f_X(x) dx, where the marginal density is

  f_X(x) = ∫_{−∞}^{∞} f_{X,Y}(x, y) dy

• What about g(X, Y) = X + Y?

  E(X) = ∫_{−∞}^{∞} x f_X(x) dx = ∫∫ x f_{X,Y}(x, y) dx dy

  E(Y) = ∫_{−∞}^{∞} y f_Y(y) dy = ∫∫ y f_{X,Y}(x, y) dy dx

  E(g(X)) = ∫_{−∞}^{∞} g(x) f_X(x) dx = ∫∫ g(x) f_{X,Y}(x, y) dy dx

  E(h(Y)) = ∫_{−∞}^{∞} h(y) f_Y(y) dy = ∫∫ h(y) f_{X,Y}(x, y) dx dy

  E(g(X, Y)) = ∫∫ g(x, y) f_{X,Y}(x, y) dx dy

  In particular, E(X + Y) = ∫∫ (x + y) f_{X,Y}(x, y) dx dy = E(X) + E(Y).
Multivariate Expectations

• Mean Vector:

  E[x] = [E[x₁]  E[x₂]  ⋯  E[xₙ]]

• The expected value of g(x₁, x₂, …, xₙ) is given by

  E[g(x)] = Σ_{xₙ} ⋯ Σ_{x₁} g(x) f(x)   or   ∫_{xₙ} ⋯ ∫_{x₁} g(x) f(x) dx

• Covariance Matrix:

  cov[x] = P = E[(x − μ)(x − μ)ᵀ] = E[xxᵀ] − μμᵀ

  where S = E[xxᵀ] is known as the autocorrelation matrix.

  NOTE: P = D R D, where D = diag(σ₁, σ₂, …, σₙ) and R is the correlation matrix

  R = [ 1    ρ₁₂  ⋯  ρ₁ₙ ]
      [ ρ₂₁  1    ⋯  ρ₂ₙ ]
      [ ⋮    ⋮    ⋱  ⋮   ]
      [ ρₙ₁  ρₙ₂  ⋯  1   ]
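A minimal sketch verifying P = E[xxᵀ] − μμᵀ and the decomposition P = D R D on simulated data; the particular two-dimensional covariance and numpy are assumptions for illustration:

```python
# Estimate the autocorrelation S, covariance P, and check P = D R D.
import numpy as np

rng = np.random.default_rng(4)
x = rng.multivariate_normal([1.0, -2.0], [[2.0, 0.8], [0.8, 1.0]], 500_000)

mu = x.mean(axis=0)
S = x.T @ x / len(x)        # autocorrelation matrix E[xx^T]
P = S - np.outer(mu, mu)    # covariance P = S - mu mu^T

D = np.diag(np.sqrt(np.diag(P)))        # diag of standard deviations
R = np.corrcoef(x, rowvar=False)        # correlation matrix
print(np.allclose(P, D @ R @ D, atol=1e-3))  # True
```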


Covariance Matrix

• The covariance matrix indicates the tendency of each pair of dimensions in a random vector to vary together, i.e. to "co-vary".
• Properties of the covariance matrix:
  – The covariance matrix is square.
  – The covariance matrix is positive semi-definite, i.e. xᵀPx ≥ 0; it is positive definite (xᵀPx > 0) unless some linear combination of the components is deterministic.
  – The covariance matrix is symmetric, i.e. P = Pᵀ.
  – If xᵢ and xⱼ tend to increase together, then Pᵢⱼ > 0.
  – If xᵢ and xⱼ are uncorrelated, then Pᵢⱼ = 0.

Independent Variables

• Recall, two random variables are said to be independent if knowing the value of one tells you nothing about the other variable.
  – The joint probability density function is the product of the marginal probability density functions.
  – Cov(X, Y) = 0 if X and Y are independent.
  – E(XY) = E(X)E(Y).
• Two variables are said to be uncorrelated if Cov(X, Y) = 0.
  – Independent variables are uncorrelated, but the converse is not true.
• Cov(X, Y) = 0 only means that the defining integral E[(X − μ_X)(Y − μ_Y)] equals zero.
  – It tells us that the distribution is balanced in some way, but says nothing about the distribution's values.
  – Example: (X, Y) uniformly distributed on the unit circle (see the sketch below).
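A minimal sketch of that example, reading "unit circle" as the circumference (a uniform disk behaves the same way): the covariance is numerically zero even though X and Y are completely dependent through X² + Y² = 1. numpy is assumed:

```python
# Uncorrelated but dependent: (X, Y) uniform on the unit circle.
import numpy as np

rng = np.random.default_rng(5)
theta = rng.uniform(0.0, 2.0 * np.pi, 1_000_000)
x, y = np.cos(theta), np.sin(theta)

print(np.mean(x * y) - np.mean(x) * np.mean(y))  # ~0: uncorrelated
print(np.mean(x**2 + y**2))                      # exactly 1: fully dependent
```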

Gaussian or Normal Distribution

• The normal distribution is the most widely known and used distribution in the field of statistics.
  – Many natural phenomena can be approximated by the normal distribution.
• Central Limit Theorem:
  – The central limit theorem states that, given a distribution with mean μ and variance σ², the sampling distribution of the mean approaches a normal distribution with mean μ and variance σ²/N as the sample size N increases.
• Normal Density Function:

  f(x) = (1/(σ√(2π))) exp(−(x − μ)²/(2σ²)),   −∞ < x < ∞

  [Figure: bell-shaped density with peak ≈ 0.399/σ at x = μ and ticks at μ ± σ and μ ± 2σ]


Multivariate Normal Distribution

• Multivariate Gaussian Density Function:

  f(X) = (2π)^(−n/2) |R|^(−1/2) exp( −½ (X − μ)ᵀ R⁻¹ (X − μ) )

• How do we find a surface of equal probability?

  ½ (X − μ)ᵀ R⁻¹ (X − μ) = constant

• Moreover, one is often interested in the probability that X lies inside such a quadratic hypersurface.
  – For example, what is the probability of lying inside the 1-σ ellipsoid?
• Diagonalize the covariance, R = CΣCᵀ with Σ = diag(σ₁², σ₂², …, σₙ²); rotate, Y = Cᵀ(X − μ); and scale, zᵢ = Yᵢ/σᵢ. Then

  P( z₁² + z₂² + ⋯ + zₙ² ≤ c² ) = ∫_V f(z) dV

  where V is the sphere z₁² + ⋯ + zₙ² ≤ c².

Multivariate Normal Distribution

• Yᵢ represents coordinates in the Cartesian principal-axis system, and σᵢ² is the variance along the i-th principal axis.
• The probability of lying inside the 1σ, 2σ or 3σ ellipsoid decreases as the dimensionality increases (the "curse of dimensionality"):

  n\c |   1   |   2   |   3
  ----+-------+-------+------
   1  | 0.683 | 0.955 | 0.997
   2  | 0.394 | 0.865 | 0.989
   3  | 0.200 | 0.739 | 0.971
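The table can be reproduced from the observation above: z₁² + ⋯ + zₙ² is chi-square distributed with n degrees of freedom, so the probability of lying inside the c-σ ellipsoid is the chi-square CDF evaluated at c². A minimal sketch, assuming scipy (small differences from the table are rounding):

```python
# P(inside the c-sigma ellipsoid in n dimensions) = chi2.cdf(c**2, df=n).
from scipy.stats import chi2

for n in (1, 2, 3):
    print(n, [round(chi2.cdf(c**2, n), 3) for c in (1, 2, 3)])
# 1 [0.683, 0.954, 0.997]
# 2 [0.393, 0.865, 0.989]
# 3 [0.199, 0.739, 0.971]
```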

Summary of Probability Distribution Functions

Distribution        | Parameters                       | Characteristics       | Probability Function            | Mean | Variance
--------------------+----------------------------------+-----------------------+---------------------------------+------+---------
Discrete            |                                  |                       |                                 |      |
Binomial            | 0 ≤ p ≤ 1 and n = 0, 1, 2, …     | Skewed unless p = 0.5 | nCx p^x q^(n−x)                 | np   | npq
Hypergeometric      | M = 0…N, n = 0…N, N = 0, 1, 2, … | Skewed                | MCx · (N−M)C(n−x) / NCn         | nM/N | nM(N−M)(N−n) / (N²(N−1))
Poisson             | λ > 0                            | Positively skewed     | λ^x e^(−λ) / x!                 | λ    | λ
Continuous          |                                  |                       |                                 |      |
Normal              | −∞ < μ < ∞ and σ > 0             | Symmetric about μ     | (1/(σ√(2π))) e^(−(x−μ)²/(2σ²))  | μ    | σ²
Standardized Normal |                                  | Symmetric about zero  | (1/√(2π)) e^(−x²/2)             | 0    | 1
Exponential         | λ > 0                            | Positively skewed     | λ e^(−λt)                       | 1/λ  | 1/λ²

A distribution is skewed if it has most of its values either to the right or to the left of its mean.

Properties of Estimators

• Unbiasedness
  – On average, the value of the parameter being estimated is equal to the true value:  E[x̂] = x.
• Efficiency
  – Has a relatively small variance.
  – The values of the parameters being estimated should not vary much from sample to sample.
• Sufficiency
  – Uses as much as possible of the information available in the samples.
• Consistency
  – As the sample size increases, the estimated value approaches the true value.
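A minimal sketch of unbiasedness using the variance parameter as an example: the 1/(n − 1) sample variance is unbiased, while the 1/n version systematically underestimates σ². numpy and the chosen sample size are assumptions for illustration:

```python
# Average many sample variances of size n = 5 from N(0, 9).
import numpy as np

rng = np.random.default_rng(7)
n = 5
samples = rng.normal(0.0, 3.0, (200_000, n))

print(np.var(samples, axis=1, ddof=1).mean())  # ~9.0: unbiased
print(np.var(samples, axis=1, ddof=0).mean())  # ~7.2 = (n-1)/n * 9: biased
```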

