
Random Variables and Probability Distributions

Outline of Lecture
• Random Variables
  – Discrete Random Variables.
  – Continuous Random Variables.
• Probability Distribution Functions.
  – Discrete and Continuous.
  – PDF and PMF.
• Expectation of Random Variables.
• Propagation through Linear and Nonlinear Models.
• Multivariate Probability Density Functions.
• Some Important Probability Distribution Functions.

Random Variables

• A random variable is a function that associates a numerical value with each outcome of an experiment.
  – Function values are real numbers and depend on "chance".
• The function that assigns a value to each outcome is fixed and deterministic.
  – The randomness is due to the underlying randomness of the argument of the function X.
  – If we roll a pair of dice, then the sum of the two face values is a random variable.
• Random variables can be discrete or continuous.
  – Discrete: countable range.
  – Continuous: uncountable range.

Discrete Random Variables

• A random variable X and the corresponding distribution are said to be discrete if the number of values for which X has non-zero probability is finite.
• Probability Mass Function of X:

  f(x) = pⱼ  when x = xⱼ
  f(x) = 0   otherwise

• Probability Distribution Function of X:

  F(x) = P(X ≤ x)

• Properties of the Distribution Function:
  – Monotonically increasing.
  – Right continuous.
  – 0 ≤ F(x) ≤ 1.
  – P(a < X ≤ b) = F(b) − F(a).
Examples

• Let X denote the number of heads when a biased coin with probability of heads p is tossed twice.
  – X can take the value 0, 1 or 2.

  F(x) = 0                                 x < 0
         (1 − p)²                          0 ≤ x < 1
         (1 − p)² + 2p(1 − p) = 1 − p²     1 ≤ x < 2
         1                                 x ≥ 2

• Let X denote the random variable equal to the sum of two fair dice.
  – The random variable can take any integer value between 2 and 12.
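A quick brute-force check of the dice claim above. This is a minimal sketch (standard library only; nothing here comes from the slides) that enumerates all 36 outcomes and tabulates the PMF and the distribution function of the dice sum:

```python
# Enumerate all 36 equally likely outcomes of two fair dice and tabulate
# the PMF and CDF of their sum; the support comes out as 2..12.
from collections import Counter
from fractions import Fraction

counts = Counter(d1 + d2 for d1 in range(1, 7) for d2 in range(1, 7))
total = sum(counts.values())  # 36

cdf = Fraction(0)
for s in sorted(counts):
    pmf = Fraction(counts[s], total)
    cdf += pmf
    print(f"s={s:2d}  P(X=s)={pmf}  P(X<=s)={cdf}")
```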

Continuous Random Variables and Distributions

• X is a continuous random variable if there exists a non-negative function f(x), defined on the real line, with the property that

  P(X ≤ x) = F(x) = ∫_{−∞}^{x} f(y) dy,    F′(x) = f(x)

• The integrand f(y) is called a probability density function, and it satisfies

  ∫_{−∞}^{∞} f(x) dx = 1

• Properties:

  P(a ≤ X ≤ b) = F(b) − F(a) = ∫_{a}^{b} f(x) dx
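To make the two integral properties concrete, here is a minimal sketch that numerically integrates an example density over the whole line and over an interval. The standard normal density and scipy are assumptions for illustration, not part of the slides:

```python
# Check that an example density integrates to 1 over the real line, and
# that an interval probability equals F(b) - F(a) = integral of f over [a, b].
import numpy as np
from scipy import integrate

f = lambda x: np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)  # standard normal pdf

total, _ = integrate.quad(f, -np.inf, np.inf)  # ~1.0
p_ab, _ = integrate.quad(f, -1.0, 2.0)         # P(-1 <= X <= 2)
print(total, p_ab)                             # 1.0000...  0.8185...
```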

Continuous Random Variables and Distributions

• The probability that a continuous random variable assumes any one particular value is zero:

  P(X = a) = ∫_{a}^{a} f(x) dx = 0

• This does not mean that the event will never occur.
  – It occurs infrequently, and its relative frequency converges to zero.
  – f(a) large ⇒ the probability mass is very dense near a.
  – f(a) small ⇒ the probability mass is not very dense near a.
• f(a) is a measure of how likely it is that the random variable will be near a:

  P(a − ε ≤ X ≤ a + ε) = ∫_{a−ε}^{a+ε} f(x) dx ≈ 2ε f(a)

Difference Between PDF and PMF

• A probability density function does not define a probability but a probability density.
  – To obtain a probability we must integrate it over an interval.
• A probability mass function gives a true probability.
  – It does not need to be integrated to obtain a probability.
• A probability distribution function is either continuous or has jump discontinuities. For an interval from a to b, consider:

  1) P(a < X < b)    2) P(a < X ≤ b)    3) P(a ≤ X < b)    4) P(a ≤ X ≤ b)

  – Are they equal? For a continuous random variable all four coincide, since each endpoint carries zero probability; where F jumps at a or b they differ (see the sketch below).
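A minimal sketch of the question above: for a discrete variable (the two-dice sum, where F has jumps) the four probabilities all differ, while for a continuous variable they coincide. The endpoints a = 4, b = 10 and the use of scipy are assumptions for illustration:

```python
# Compare the four interval probabilities for a discrete variable (two-dice
# sum) against a continuous one (standard normal), where all four coincide.
from fractions import Fraction
from scipy.stats import norm

pmf = {s: Fraction(6 - abs(s - 7), 36) for s in range(2, 13)}  # dice-sum PMF
a, b = 4, 10
for lo, hi, label in [(a + 1, b - 1, "P(a<X<b)  "),
                      (a + 1, b,     "P(a<X<=b) "),
                      (a,     b - 1, "P(a<=X<b) "),
                      (a,     b,     "P(a<=X<=b)")]:
    print(label, sum(p for s, p in pmf.items() if lo <= s <= hi))

# Continuous case: every variant equals F(b) - F(a).
print("continuous:", norm.cdf(b) - norm.cdf(a))
```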


Statistical Characterization of Random Variables

• Recall, a random variable denotes the numerical attribute assigned to an outcome of an experiment.
• We cannot be certain which value of X will be observed on a particular trial.
• Will the average of all the values be the same for two different sets of trials?

  x̄ = (x₁ + x₂ + ⋯ + xₙ)/n,    ȳ = (y₁ + y₂ + ⋯ + yₙ)/n

• Recall, probability is approximately equal to relative frequency.
  – Approximately np₁ of the xᵢ's have the value u₁.

  x̄ = (x₁ + x₂ + ⋯ + xₙ)/n = (np₁u₁ + ⋯ + npₘuₘ)/n = Σᵢ uᵢ pᵢ

Statistical Characterization of Random Variables

• Expected Value:
  – The expected value of a discrete random variable x is found by multiplying each value of the random variable by its probability and then summing over all values of x:

    E[x] = Σₓ x P(x) = Σₓ x f(x)

  – The expected value is equivalent to the center-of-mass concept:

    r̄ = Σᵢ rᵢmᵢ / Σᵢ mᵢ

  – That is why it is also called the first moment.
  – A body is perfectly balanced about its center of mass.
• The expected value of x is the "balancing point" for the probability mass function of x.
  – The expected value equals the point of symmetry in the case of a symmetric pmf/pdf.
Statistical Characterization of Random Variables

• Law of the Unconscious Statistician (LOTUS): we can take the expectation of any function of a random variable. For y = g(x),

  E[y] = Σ_y y f(y) = Σₓ g(x) f(x)

• This balance point is the value expected for g(x) over all possible repetitions of the experiment involving the random variable x.
• The expected value of a continuous random variable with density function f(x) is given by

  E(x) = ∫_{−∞}^{∞} x f(x) dx
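A minimal sketch of LOTUS on a single fair die: summing g(x) against the PMF agrees with a Monte Carlo average of g over simulated outcomes. The choice g(x) = x² and numpy are assumptions for illustration:

```python
# LOTUS: E[g(X)] = sum_x g(x) f(x), checked against an empirical average.
import numpy as np

rng = np.random.default_rng(0)
faces = np.arange(1, 7)
g = lambda x: x**2

lotus = sum(g(x) * (1 / 6) for x in faces)        # exact: 91/6 = 15.1667
mc = g(rng.choice(faces, size=1_000_000)).mean()  # empirical estimate
print(lotus, mc)
```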


Example

• Let us assume that we have agreed to pay $1 for each dot showing when a pair of dice is thrown. We are interested in knowing how much we would lose on average.

  Value of x | Frequency | Probability Function | Probability Distribution Function
  2          | 1         | P(x=2)  = 1/36       | P(x≤2)  = 1/36
  3          | 2         | P(x=3)  = 2/36       | P(x≤3)  = 3/36
  4          | 3         | P(x=4)  = 3/36       | P(x≤4)  = 6/36
  5          | 4         | P(x=5)  = 4/36       | P(x≤5)  = 10/36
  6          | 5         | P(x=6)  = 5/36       | P(x≤6)  = 15/36
  7          | 6         | P(x=7)  = 6/36       | P(x≤7)  = 21/36
  8          | 5         | P(x=8)  = 5/36       | P(x≤8)  = 26/36
  9          | 4         | P(x=9)  = 4/36       | P(x≤9)  = 30/36
  10         | 3         | P(x=10) = 3/36       | P(x≤10) = 33/36
  11         | 2         | P(x=11) = 2/36       | P(x≤11) = 35/36
  12         | 1         | P(x=12) = 1/36       | P(x≤12) = 1
  Sum        | 36        | 1.00                 |

• Average amount we pay = (($2×1) + ($3×2) + ⋯ + ($12×1))/36 = $7
• E(x) = $2(1/36) + $3(2/36) + ⋯ + $12(1/36) = $7
Example (continued)

• Let us assume that we had agreed to pay an amount equal to the square of the sum of the dots showing on a throw of the dice.
  – What would be the average loss this time?
• Will it be ($7)² = $49.00?
• Actually, now we are interested in calculating E[x²].
  – E[x²] = ($2)²(1/36) + ⋯ + ($12)²(1/36) = $54.83 ≠ $49
  – This result also emphasizes that (E[x])² ≠ E[x²].
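Both payouts take a few lines to verify; this minimal sketch (standard library only) reproduces E[x] = $7 and E[x²] ≈ $54.83:

```python
# Exact E[x] and E[x^2] for the sum of two fair dice, using the triangular
# PMF P(X=s) = (6 - |s-7|)/36 on s = 2..12.
from fractions import Fraction

pmf = {s: Fraction(6 - abs(s - 7), 36) for s in range(2, 13)}
E_x  = sum(s * p for s, p in pmf.items())      # 7
E_x2 = sum(s**2 * p for s, p in pmf.items())   # 329/6 ~ 54.83
print(E_x, float(E_x2), float(E_x2 - E_x**2))  # variance 35/6 as a bonus
```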

Expectation Rules

• Rule 1: E[k] = k, where k is a constant.
• Rule 2: E[kx] = kE[x].
• Rule 3: E[x ± y] = E[x] ± E[y].
• Rule 4: If x and y are independent, E[xy] = E[x]E[y].
• Rule 5: V[k] = 0, where k is a constant.
• Rule 6: V[kx] = k²V[x].
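A numerical spot-check of Rules 2, 3, 4 and 6 on simulated data; the particular distributions of x and y, and numpy itself, are assumptions for illustration:

```python
# Each printed pair should agree up to Monte Carlo noise.
import numpy as np

rng = np.random.default_rng(1)
x = rng.exponential(2.0, 1_000_000)
y = rng.normal(3.0, 1.0, 1_000_000)   # independent of x
k = 5.0

print(np.mean(k * x), k * np.mean(x))           # Rule 2: E[kx] = kE[x]
print(np.mean(x + y), np.mean(x) + np.mean(y))  # Rule 3: E[x+y] = E[x]+E[y]
print(np.mean(x * y), np.mean(x) * np.mean(y))  # Rule 4 (independence)
print(np.var(k * x), k**2 * np.var(x))          # Rule 6: V[kx] = k^2 V[x]
```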

Variance of Random Variable

• The variance of a random variable x is defined as

  V(x) = σ² = E[(x − μ)²]

  Expanding,

  V(x) = E[x² − 2μx + μ²]
       = E[x²] − 2(E[x])² + (E[x])²
       = E[x²] − (E[x])²

• This result is also known as the "Parallel Axis Theorem".

Propagation of moments and density function through linear models

• y = ax + b
  – Given: μ = E[x] and σ² = V[x].
  – To find: E[y] and V[y].

  E[y] = E[ax] + E[b] = aE[x] + b = aμ + b
  V[y] = V[ax] + V[b] = a²V[x] + 0 = a²σ²

• Let us define z = (x − μ)/σ.
  – Here, a = 1/σ and b = −μ/σ.
  – Therefore, E[z] = 0 and V[z] = 1.
  – z is generally known as the "standardized variable".
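A minimal sketch of standardization: z = (x − μ)/σ has (approximately) zero mean and unit variance whatever the original distribution. The gamma sample and numpy are assumptions for illustration:

```python
# Standardize a skewed sample and confirm E[z] ~ 0 and V[z] ~ 1.
import numpy as np

rng = np.random.default_rng(2)
x = rng.gamma(shape=3.0, scale=2.0, size=1_000_000)  # any distribution works
z = (x - x.mean()) / x.std()
print(z.mean(), z.var())  # ~0 and ~1
```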
Propagation of moments and density function through non-linear models

• If x is a random variable with probability density function p(x), and y = f(x) is a one-to-one transformation that is differentiable for all x, then the probability density function of y is given by
  – p(y) = p(x)|J|⁻¹, for all x given by x = f⁻¹(y),
  – where |J| is the determinant of the Jacobian matrix J.
• Example: let y = ax² and

  p(x) = (1/(σₓ√(2π))) exp(−x²/(2σₓ²))

  NOTE: for each value of y there are two values of x, so the transformation is not one-to-one and the contributions of both branches x = ±√(y/a) must be added:

  p(y) = (1/(σₓ√(2πay))) exp(−y/(2aσₓ²)),   y > 0
  p(y) = 0, otherwise

  We can also show that

  E(y) = aσₓ²   and   V(y) = 2a²σₓ⁴
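The example can be cross-checked by simulation: draw x from N(0, σₓ²), form y = ax², and compare the sample moments with aσₓ² and 2a²σₓ⁴. The parameter values and numpy are assumptions for illustration:

```python
# Monte Carlo check of the moments of y = a*x^2 for x ~ N(0, sigma^2).
import numpy as np

rng = np.random.default_rng(3)
a, sigma = 2.0, 1.5
x = rng.normal(0.0, sigma, 2_000_000)
y = a * x**2

print(y.mean(), a * sigma**2)        # both ~4.5
print(y.var(), 2 * a**2 * sigma**4)  # both ~40.5
```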

Random Variables

• One random variable depicts one physical phenomenon.
  – Example: a web server.
• A random vector is just an extension of a random variable.
  – A vector random variable X is a function that assigns a vector of real numbers to each outcome in the sample space.
  – e.g., Sample Space = set of people.
  – Random vector = [X = weight, Y = height of a person].
• A random point (X, Y) carries more information than X or Y alone.
  – It describes the joint behavior of X and Y.
• The joint probability distribution function:

  F(x, y) = P({X ≤ x} ∩ {Y ≤ y})

• What happens as x → ±∞ and y → ±∞?
Random Vectors

• Joint Probability Functions:
  – Joint Probability Distribution Function:

    F(X) = P[{X₁ ≤ x₁} ∩ {X₂ ≤ x₂} ∩ ⋯ ∩ {Xₙ ≤ xₙ}]

  – Joint Probability Density Function:

    f(x) = ∂ⁿF(X) / (∂x₁ ∂x₂ ⋯ ∂xₙ)

• Marginal Probability Functions: a marginal probability function is obtained by summing or integrating out the variables that are of no interest:

  f_X(x) = Σ_y p(x, y)   or   f_X(x) = ∫_{−∞}^{∞} f(x, y) dy
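A minimal sketch of marginalization: numerically integrate a joint density over y and compare against the known marginal of x. The bivariate normal joint is an arbitrary example, and scipy is assumed:

```python
# Integrate f(x0, y) over y on a grid; the result should match the exact
# marginal of X, which for this covariance is the standard normal density.
import numpy as np
from scipy.stats import multivariate_normal, norm

joint = multivariate_normal(mean=[0, 0], cov=[[1.0, 0.6], [0.6, 2.0]])
y = np.linspace(-10, 10, 4001)
dy = y[1] - y[0]

x0 = 0.7
pts = np.column_stack([np.full_like(y, x0), y])
f_x0 = joint.pdf(pts).sum() * dy   # numerically integrate out y
print(f_x0, norm.pdf(x0))          # both ~0.3123
```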

Multivariate Expectations

• E(X) = ∫_{−∞}^{∞} x f_X(x) dx, where the marginal density is

  f_X(x) = ∫_{−∞}^{∞} f_{X,Y}(x, y) dy

• What about g(X, Y) = X + Y?

  E(X) = ∫_{−∞}^{∞} x f_X(x) dx = ∫∫ x f_{X,Y}(x, y) dx dy

  E(Y) = ∫_{−∞}^{∞} y f_Y(y) dy = ∫∫ y f_{X,Y}(x, y) dy dx

  E(g(X)) = ∫_{−∞}^{∞} g(x) f_X(x) dx = ∫∫ g(x) f_{X,Y}(x, y) dy dx

  E(h(Y)) = ∫_{−∞}^{∞} h(y) f_Y(y) dy = ∫∫ h(y) f_{X,Y}(x, y) dx dy

  E(g(X, Y)) = ∫∫ g(x, y) f_{X,Y}(x, y) dx dy

  In particular, E(X + Y) = ∫∫ (x + y) f_{X,Y}(x, y) dx dy = E(X) + E(Y).
Multivariate Expectations

• Mean Vector:

  E[x] = [E[x₁]  E[x₂]  ⋯  E[xₙ]]

• The expected value of g(x₁, x₂, …, xₙ) is given by

  E[g(x)] = Σ_{xₙ} ⋯ Σ_{x₁} g(x) f(x)   or   ∫_{xₙ} ⋯ ∫_{x₁} g(x) f(x) dx

• Covariance Matrix:

  cov[x] = P = E[(x − μ)(x − μ)ᵀ] = E[xxᵀ] − μμᵀ

  where S = E[xxᵀ] is known as the autocorrelation matrix.

  NOTE: P = D R D, where D = diag(σ₁, σ₂, …, σₙ) and R is the correlation matrix

  R = [ 1    ρ₁₂  ⋯  ρ₁ₙ ]
      [ ρ₂₁  1    ⋯  ρ₂ₙ ]
      [ ⋮    ⋮    ⋱  ⋮   ]
      [ ρₙ₁  ρₙ₂  ⋯  1   ]
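A minimal sketch verifying P = E[xxᵀ] − μμᵀ and the decomposition P = D R D on simulated data; the particular two-dimensional covariance and numpy are assumptions for illustration:

```python
# Estimate the autocorrelation S, covariance P, and check P = D R D.
import numpy as np

rng = np.random.default_rng(4)
x = rng.multivariate_normal([1.0, -2.0], [[2.0, 0.8], [0.8, 1.0]], 500_000)

mu = x.mean(axis=0)
S = x.T @ x / len(x)        # autocorrelation matrix E[xx^T]
P = S - np.outer(mu, mu)    # covariance P = S - mu mu^T

D = np.diag(np.sqrt(np.diag(P)))        # diag of standard deviations
R = np.corrcoef(x, rowvar=False)        # correlation matrix
print(np.allclose(P, D @ R @ D, atol=1e-3))  # True
```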


Covariance Matrix

• The covariance matrix indicates the tendency of each pair of dimensions in a random vector to vary together, i.e. to "co-vary".
• Properties of the covariance matrix:
  – The covariance matrix is square.
  – The covariance matrix is positive semi-definite, i.e. xᵀPx ≥ 0; it is positive definite (xᵀPx > 0) unless some linear combination of the components is deterministic.
  – The covariance matrix is symmetric, i.e. P = Pᵀ.
  – If xᵢ and xⱼ tend to increase together, then Pᵢⱼ > 0.
  – If xᵢ and xⱼ are uncorrelated, then Pᵢⱼ = 0.

Independent Variables

• Recall, two random variables are said to be independent if knowing the value of one tells you nothing about the other variable.
  – The joint probability density function is the product of the marginal probability density functions.
  – Cov(X, Y) = 0 if X and Y are independent.
  – E(XY) = E(X)E(Y).
• Two variables are said to be uncorrelated if Cov(X, Y) = 0.
  – Independent variables are uncorrelated, but the converse is not true.
• Cov(X, Y) = 0 only means that the defining integral E[(X − μ_X)(Y − μ_Y)] equals zero.
  – It tells us that the distribution is balanced in some way, but says nothing about the distribution's values.
  – Example: (X, Y) uniformly distributed on the unit circle (see the sketch below).
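A minimal sketch of that example, reading "unit circle" as the circumference (a uniform disk behaves the same way): the covariance is numerically zero even though X and Y are completely dependent through X² + Y² = 1. numpy is assumed:

```python
# Uncorrelated but dependent: (X, Y) uniform on the unit circle.
import numpy as np

rng = np.random.default_rng(5)
theta = rng.uniform(0.0, 2.0 * np.pi, 1_000_000)
x, y = np.cos(theta), np.sin(theta)

print(np.mean(x * y) - np.mean(x) * np.mean(y))  # ~0: uncorrelated
print(np.mean(x**2 + y**2))                      # exactly 1: fully dependent
```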

Gaussian or Normal Distribution

• The normal distribution is the most widely known and used distribution in the field of statistics.
  – Many natural phenomena can be approximated by the normal distribution.
• Central Limit Theorem:
  – The central limit theorem states that, given a distribution with mean μ and variance σ², the sampling distribution of the mean approaches a normal distribution with mean μ and variance σ²/N as the sample size N increases.
• Normal Density Function:

  f(x) = (1/(σ√(2π))) exp(−(x − μ)²/(2σ²)),   −∞ < x < ∞

  [Figure: bell-shaped density with peak ≈ 0.399/σ at x = μ and ticks at μ ± σ and μ ± 2σ]


Multivariate Normal Distribution

• Multivariate Gaussian Density Function:

  f(X) = (2π)^(−n/2) |R|^(−1/2) exp( −½ (X − μ)ᵀ R⁻¹ (X − μ) )

• How do we find a surface of equal probability?

  ½ (X − μ)ᵀ R⁻¹ (X − μ) = constant

• Moreover, one is often interested in the probability that X lies inside such a quadratic hypersurface.
  – For example, what is the probability of lying inside the 1-σ ellipsoid?
• Diagonalize the covariance, R = CΣCᵀ with Σ = diag(σ₁², σ₂², …, σₙ²); rotate, Y = Cᵀ(X − μ); and scale, zᵢ = Yᵢ/σᵢ. Then

  P( z₁² + z₂² + ⋯ + zₙ² ≤ c² ) = ∫_V f(z) dV

  where V is the sphere z₁² + ⋯ + zₙ² ≤ c².

Multivariate Normal Distribution

• Yᵢ represents coordinates in the Cartesian principal-axis system, and σᵢ² is the variance along the i-th principal axis.
• The probability of lying inside the 1σ, 2σ or 3σ ellipsoid decreases as the dimensionality increases (the "curse of dimensionality"):

  n\c |   1   |   2   |   3
  ----+-------+-------+------
   1  | 0.683 | 0.955 | 0.997
   2  | 0.394 | 0.865 | 0.989
   3  | 0.200 | 0.739 | 0.971
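The table can be reproduced from the observation above: z₁² + ⋯ + zₙ² is chi-square distributed with n degrees of freedom, so the probability of lying inside the c-σ ellipsoid is the chi-square CDF evaluated at c². A minimal sketch, assuming scipy (small differences from the table are rounding):

```python
# P(inside the c-sigma ellipsoid in n dimensions) = chi2.cdf(c**2, df=n).
from scipy.stats import chi2

for n in (1, 2, 3):
    print(n, [round(chi2.cdf(c**2, n), 3) for c in (1, 2, 3)])
# 1 [0.683, 0.954, 0.997]
# 2 [0.393, 0.865, 0.989]
# 3 [0.199, 0.739, 0.971]
```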

Summary of Probability Distribution Functions

Distribution        | Parameters                       | Characteristics       | Probability Function            | Mean | Variance
--------------------+----------------------------------+-----------------------+---------------------------------+------+---------
Discrete            |                                  |                       |                                 |      |
Binomial            | 0 ≤ p ≤ 1 and n = 0, 1, 2, …     | Skewed unless p = 0.5 | nCx p^x q^(n−x)                 | np   | npq
Hypergeometric      | M = 0…N, n = 0…N, N = 0, 1, 2, … | Skewed                | MCx · (N−M)C(n−x) / NCn         | nM/N | nM(N−M)(N−n) / (N²(N−1))
Poisson             | λ > 0                            | Positively skewed     | λ^x e^(−λ) / x!                 | λ    | λ
Continuous          |                                  |                       |                                 |      |
Normal              | −∞ < μ < ∞ and σ > 0             | Symmetric about μ     | (1/(σ√(2π))) e^(−(x−μ)²/(2σ²))  | μ    | σ²
Standardized Normal |                                  | Symmetric about zero  | (1/√(2π)) e^(−x²/2)             | 0    | 1
Exponential         | λ > 0                            | Positively skewed     | λ e^(−λt)                       | 1/λ  | 1/λ²

A distribution is skewed if it has most of its values either to the right or to the left of its mean.

Properties of Estimators

• Unbiasedness
  – On average, the value of the parameter being estimated is equal to the true value:  E[x̂] = x.
• Efficiency
  – Has a relatively small variance.
  – The values of the parameters being estimated should not vary much from sample to sample.
• Sufficiency
  – Uses as much as possible of the information available in the samples.
• Consistency
  – As the sample size increases, the estimated value approaches the true value.
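A minimal sketch of unbiasedness using the variance parameter as an example: the 1/(n − 1) sample variance is unbiased, while the 1/n version systematically underestimates σ². numpy and the chosen sample size are assumptions for illustration:

```python
# Average many sample variances of size n = 5 from N(0, 9).
import numpy as np

rng = np.random.default_rng(7)
n = 5
samples = rng.normal(0.0, 3.0, (200_000, n))

print(np.var(samples, axis=1, ddof=1).mean())  # ~9.0: unbiased
print(np.var(samples, axis=1, ddof=0).mean())  # ~7.2 = (n-1)/n * 9: biased
```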

