
PROBABILITY & RANDOM PROCESSES

We have looked at a sample space S, a σ-field ℱ of subsets of the sample space S, and a probability assignment p on the members of ℱ satisfying the probability axioms. Bayes' theorem and the total probability theorem were discussed. We talked about the σ-field not being the power set of the sample space when S is uncountable.

Let us move on to Random Variables. To use the properties of functions of a real variable we need to move from p[·] on ℱ to functions on the real line ℝ. We look at Borel sets, which are the members of the σ-field generated by the unbounded intervals (−∞, x]. What are the sets in this σ-field?

(a, b] = (−∞, b] \ (−∞, a]
{a} = ⋂ₙ (a − 1/n, a + 1/n]
[a, b] = (a, b] ∪ {a}
[a, b) = [a, b] \ {b}
(a, b) = (a, b] \ {b}

Also (−∞, b), [a, ∞) and (a, ∞) are members of the σ-field, in addition to countable sets and complements of countable sets. This collection of subsets of ℝ is called the Borel sets, ℬ(ℝ) or just ℬ.

Random Variable
Consider a random experiment with probability space (S, ℱ, p).
Definition: A random variable X is a mapping from S to ℝ such that sets in ℬ are pulled back to sets in ℱ, i.e. X: S → ℝ such that X⁻¹(B) ∈ ℱ for all B ∈ ℬ.
Will every function X: S → ℝ be a random variable? NO!

Example: S = {1, 2, 3, …, n} and ℱ = {∅, S, {1}, {2, 3, …, n}}. Then the function X(i) = 2i, i = 1, 2, …, n, is not a random variable: for instance X⁻¹((−∞, 4]) = {1, 2} ∉ ℱ. If ℱ = 2^S then every function X will be measurable. We usually take ℱ = 2^S when S is countable.

When S is uncountable we must have ℱ ≠ 2^S, and then there do exist functions which are not random variables (non-measurable). The only solace is that measurable functions are closed under sums, products and limits.
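For a finite sample space the measurability condition above can be checked exhaustively. The sketch below (an illustration, not from the notes; the helper name `is_measurable` is ours) tests whether the preimage of every subset of X's range lies in ℱ, and confirms that X(i) = 2i fails for the field in the example while a function constant on the atoms {1} and {2, …, n} succeeds.

```python
from itertools import chain, combinations

# Sketch: for a FINITE sample space, X is measurable w.r.t. a field F
# iff the preimage of every subset of X's range belongs to F.
def is_measurable(X, S, F):
    F = {frozenset(A) for A in F}
    values = {X(s) for s in S}
    subsets = chain.from_iterable(combinations(sorted(values), r)
                                  for r in range(len(values) + 1))
    for T in subsets:
        preimage = frozenset(s for s in S if X(s) in T)
        if preimage not in F:
            return False
    return True

n = 5
S = set(range(1, n + 1))
F = [set(), S, {1}, set(range(2, n + 1))]   # the field from the example

X = lambda i: 2 * i              # X^{-1}({4}) = {2} is not in F: not measurable
Y = lambda i: 0 if i == 1 else 7 # constant on the atoms of F: measurable
print(is_measurable(X, S, F))    # False
print(is_measurable(Y, S, F))    # True
```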

Cumulative Distribution Function

Consider a probability model (S, ℱ, p) and a random variable X defined on S. For a fixed x ∈ ℝ, define the set Ax as Ax = {s ∈ S | X(s) ≤ x}. We typically write Ax as {X ≤ x}. Therefore p[Ax] = p[{X ≤ x}] = p[X ≤ x]. To get p it is enough to assign probabilities to the collection of events {Ax : x ∈ ℝ}.
Definition: The cumulative distribution function (cdf) of a random variable (rv) X is FX(x) = p[X ≤ x], x ∈ ℝ. Note that FX has a single real argument, while the right hand side is the function p applied to the set {X ≤ x}.

Properties of cdf:
1. FX(x) is non-negative, i.e. FX(x) ≥ 0 for all x ∈ ℝ.
2. FX(x) ≤ 1 for all x ∈ ℝ.
3. FX(x) is a non-decreasing function, i.e. FX(x) ≤ FX(y) when x < y.
4. FX(x) is continuous from the right, i.e. lim_{h→0⁺} FX(a + h) = FX(a) for all a ∈ ℝ.

Inverse problem: Will every function with the above four properties be the cdf of some random experiment? Yes, of course! One can be constructed on the sample space [0, 1]; this is used in simulation.

We have three types of random variables and their cdfs: discrete, continuous and mixed.

How to calculate probabilities using the cdf?
Example 1: Interval (a, b]. Define the set A as {s ∈ S | a < X(s) ≤ b} = {a < X ≤ b}. Then A = {X ≤ b} \ {X ≤ a}, i.e. {X ≤ b} = A ∪ {X ≤ a}, which is a disjoint union. Hence
p[a < X ≤ b] = FX(b) − FX(a).
Example 2: Interval [a, b]:
p[a ≤ X ≤ b] = FX(b) − FX(a) + p[X = a].
Using {X = a} = ⋂ₙ (a − 1/n, a + 1/n] we get
p[X = a] = lim_{n→∞} p[a − 1/n < X ≤ a + 1/n] = lim_{n→∞} FX(a + 1/n) − lim_{n→∞} FX(a − 1/n) = FX(a) − FX(a⁻),
using the right continuity of FX.

Example 3: Interval [a, b): p[a ≤ X < b] = FX(b) − FX(a) + p[X = a] − p[X = b].
Example 4: Interval (a, b): p[a < X < b] = FX(b) − FX(a) − p[X = b].
Example 5: Interval (a, ∞): p[a < X] = 1 − FX(a).
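These interval formulas can be exercised numerically. A minimal sketch, using the exponential cdf FX(x) = 1 − e^{−λx} as an illustration (λ, a, b chosen by us): since this cdf is continuous, p[X = a] = FX(a) − FX(a⁻) = 0 and all four interval probabilities coincide.

```python
import math

lam = 2.0
F = lambda x: 1.0 - math.exp(-lam * x) if x >= 0 else 0.0  # exponential cdf

a, b = 0.5, 1.5
p_open_closed = F(b) - F(a)        # p[a < X <= b], Example 1
p_point = 0.0                      # p[X = a] = F(a) - F(a-) = 0: continuous cdf
p_closed = F(b) - F(a) + p_point   # p[a <= X <= b], Example 2 (same value here)
print(p_open_closed, p_closed)
```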

Conditional cdf:
Consider a probability model (S, ℱ, p), an arbitrary event A ∈ ℱ, and a rv X defined on S. Since FX(x) is the probability of the event B = {X ≤ x}, conditional probabilities p[B|A] make sense when p[A] > 0.
Definition: The conditional cdf (cumulative distribution function) of a rv X, given an event A such that p[A] > 0, is defined as
FX(x|A) = p[{X ≤ x} ∩ A] / p[A].
A may or may not be related to the rv X.
Till now: We have reduced the probability assignment problem from ℱ to specifying a function FX(x) of a single argument, and seen how to calculate probabilities of intervals and of their unions, complements, etc. This will be enough to calculate a lot of probabilities when used along with probability axiom 3. But how to find the cdf? Big question! And even if we find one, how do we check axiom 3?

Probability Density Function

Definition: The probability density function (pdf) of a rv X, denoted by fX(x), is defined as the derivative of FX(x) wherever it exists:
fX(x) = dFX(x)/dx, x ∈ ℝ.
Note: The derivative can exist only where the cdf is continuous, and even where the cdf is continuous it may have different left and right derivatives at a point x = a. If the cdf has a countable number of discontinuities, the definition is still valid if we use the Dirac delta function δ(x) at the points of discontinuity.

Properties of pdf:
1. fX(x) is a non-negative function, i.e. fX(x) ≥ 0 for all x ∈ ℝ.
2. fX(x) integrates to 1, i.e. ∫_{−∞}^{∞} fX(x) dx = 1.

If a function f satisfies the above two properties, is there a rv X such that X has f(x) as its pdf? Yes, as we can get a cdf from this function f(x). Note that
∫_{−∞}^{y} fX(x) dx = FX(y).
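The relation ∫_{−∞}^{y} fX(x) dx = FX(y) can be checked by direct numerical integration. A sketch, with the exponential pdf as the illustrative example (λ, y, and the midpoint-rule resolution are our choices):

```python
import math

lam = 1.0
f = lambda x: lam * math.exp(-lam * x)   # exponential pdf, x >= 0

def cdf_numeric(y, steps=20000):
    # midpoint-rule integration of f over [0, y]
    h = y / steps
    return sum(f((k + 0.5) * h) for k in range(steps)) * h

y = 2.0
approx = cdf_numeric(y)
exact = 1.0 - math.exp(-lam * y)   # closed-form F_X(y)
print(abs(approx - exact))         # very small
```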

Probabilities through pdf:
1. p[a < X ≤ b] = ∫_a^b fX(x) dx (integral over (a, b])
2. p[a ≤ X ≤ b] = ∫_a^b fX(x) dx (integral over [a, b])
3. p[a ≤ X < b] = ∫_a^b fX(x) dx (integral over [a, b))
4. p[a < X < b] = ∫_a^b fX(x) dx (integral over (a, b))
5. p[a < X] = ∫_a^∞ fX(x) dx

In the computation of these probabilities the relevant delta functions must be used: a delta function sitting at an endpoint contributes only when that endpoint belongs to the interval, which is what distinguishes cases 1–4. The additivity axiom follows from the additivity of integrals. (There are subtle mathematical points when the events are not intervals.)

Conditional pdf
Definition: The conditional pdf of a rv X, given an event A with p[A] > 0, is defined as
fX(x|A) = dFX(x|A)/dx.
Pdf in practice: From the definition of the pdf we can give the following interpretation:
fX(x) ≈ (p[X ≤ x + dx] − p[X ≤ x]) / dx, for very small dx,
i.e. fX(x) dx ≈ p[X ≤ x + dx] − p[X ≤ x] = p[x < X ≤ x + dx].
State of affairs till now: we are able to assign probabilities satisfying axiom 3; we need to assign probabilities only to infinitesimal intervals (x, x + dx]; we need not worry about more mathematical rigour than this!
Probabilistic model again! Consider a random experiment E with initial probability model (S, ℱ, p). Suppose we define a random variable X on S with cdf FX(x) and pdf fX(x). The random variable X transforms (S, ℱ, p) into [ℝ, ℬ(ℝ), FX(x)], or equivalently into [ℝ, ℬ(ℝ), fX(x)]. Note that regardless of the random experiment and the random variable X, the sample space ℝ and the collection ℬ(ℝ) are the same for all experiments. Thus the original random experiment can be kept implicitly in mind, and we take the liberty of calling fX(x) [or FX(x)] the probabilistic model of the random experiment! This is our focus from now on.
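The "pdf in practice" interpretation is just a finite-difference derivative of the cdf. A sketch using the standard normal cdf via `math.erf` (our choice of example; x and dx are arbitrary):

```python
import math

# standard normal cdf and pdf
Phi = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
phi = lambda x: math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

x, dx = 0.7, 1e-6
# f_X(x) ~ (p[X <= x+dx] - p[X <= x]) / dx for very small dx
approx = (Phi(x + dx) - Phi(x)) / dx
print(abs(approx - phi(x)))   # close to 0
```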

Standard Random Variable Models
There are infinitely many discrete and continuous random variables. Among these, a few have wide applications.
Discrete Models: If the rv takes a countable number of values {ak} then its pdf is a sum of delta functions:
fX(x) = Σ_k p[X = ak] δ(x − ak).
Alternatively we give the probability mass function (pmf)
p[X = ak] = pk, where pk > 0 and Σ_k pk = 1.
1. Bernoulli: The Bernoulli rv X takes only two values, 0 and 1. Therefore SX = {0, 1}. Its pmf is given in terms of the parameter p: p[X = 1] = p, p[X = 0] = 1 − p.
Examples: a coin flip, the indicator function of an event (e.g. rolling a six with a die), an arrival in a very small time slot of length Δt.
2. Binomial: The binomial random variable X with parameters n, p has sample space SX = {0, 1, 2, …, n}. Here p ∈ [0, 1] is a probability and n is a positive integer. The pmf is
p[X = k] = C(n, k) p^k (1 − p)^{n−k}, k = 0, 1, …, n.
The binomial rv X = Σ_{k=1}^{n} Xk is the sum of n Bernoulli rvs Xk.
Examples: the number of heads in n flips of a coin, the total number of arrivals in an interval.
3. Geometric: The geometric rv X with parameter p has sample space SX = {0, 1, 2, …}, i.e. all non-negative integers. The pmf is
p[X = k] = p(1 − p)^k, k = 0, 1, 2, …
Example: the number of tails before the first head when tossing a coin.
4. Negative Binomial: The negative binomial (Pascal) random variable X with parameters r, p has sample space SX = {r, r + 1, r + 2, …}, where r is a positive integer and p is a probability. The pmf is
p[X = k] = C(k − 1, r − 1) p^r (1 − p)^{k−r}, k = r, r + 1, …
Example: the number of tosses required to get the r-th head.
5. Poisson: The Poisson rv X with parameter λ has sample space SX = {0, 1, 2, …}, where λ > 0 is a positive real number. Its pmf is
p[X = k] = (λ^k / k!) e^{−λ}, k = 0, 1, 2, …
Examples: the number of arrivals in a queueing system, the number of errors in a printed page.
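As a quick sanity check on these pmfs (parameter values are our illustrative choices), the binomial pmf sums to 1 exactly and has mean np, and a truncated Poisson series sums to nearly 1:

```python
import math

n, p, lam = 10, 0.3, 4.0

# binomial pmf: C(n,k) p^k (1-p)^(n-k)
binom = [math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
# Poisson pmf (lam^k / k!) e^{-lam}, truncated at k = 59
poisson = [lam**k * math.exp(-lam) / math.factorial(k) for k in range(60)]

total_b = sum(binom)
mean_b = sum(k * q for k, q in enumerate(binom))
total_p = sum(poisson)
print(total_b, mean_b, total_p)   # 1, n*p = 3.0, ~1
```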

6. Discrete Uniform: The discrete uniform rv X with parameters a, b, where a and b are integers, has sample space SX = {a, a+1, …, b−1, b}. Its pmf is
p[X = k] = 1/(b − a + 1) for k = a, a+1, …, b−1, b.

Continuous Models: If the rv takes a continuum of values (uncountable) then the rv X is called a continuous rv. For continuous random variables the cdf is continuous.
1. Uniform: The continuous uniform rv X with parameters a, b, where a and b are real with a < b, has sample space SX = [a, b]. The pdf is
fX(x) = 1/(b − a) for x ∈ [a, b], and 0 otherwise.
2. Gaussian (Normal): The Gaussian random variable X is defined on the sample space ℝ with parameters μ, σ², where μ is a real parameter and σ² is any positive real number. The pdf is
fX(x) = (1/(σ√(2π))) e^{−(x−μ)²/(2σ²)}, x ∈ (−∞, ∞).

The standard normal rv has parameters μ = 0 and σ² = 1, i.e. pdf
φ(x) = (1/√(2π)) e^{−x²/2}.
The values of the integrals Φ(x) = ∫_{−∞}^{x} φ(u) du are given in normal tables.
Examples: noise-like natural phenomena, the distribution of heights of people in a certain age group, etc.
3. Exponential: The exponential rv X with parameter λ > 0 has sample space SX = [0, ∞). The pdf is
fX(x) = λ e^{−λx}, x ∈ [0, ∞).
Examples: the time between two consecutive arrivals, the life of electronic components.
4. Gamma: The gamma random variable X with parameters a, λ has sample space SX = [0, ∞). The parameters a, λ are real positive numbers. The pdf is
fX(x) = λ(λx)^{a−1} e^{−λx} / Γ(a), x ∈ [0, ∞),
where Γ(a) = ∫_0^∞ s^{a−1} e^{−s} ds is the so-called gamma function. The sum of a independent exponential rvs is gamma when a is a positive integer.
5. Cauchy: The Cauchy rv X with parameters a, b has sample space SX = (−∞, ∞). Here a, b are real numbers with b > 0. The pdf is
fX(x) = b / (π[(x − a)² + b²]), x ∈ (−∞, ∞).
This is used in communication theory.
6. Erlang: This model is the special case of the gamma random variable where a is a positive integer n. Sample space SX = [0, ∞). The pdf is
fX(x) = λ(λx)^{n−1} e^{−λx} / (n − 1)!, x ∈ [0, ∞).
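The Cauchy pdf above integrates in closed form, giving the cdf F(x) = 1/2 + arctan((x − a)/b)/π. A sketch (a, b are our illustrative choices) confirming that the total probability tends to 1 over a wide window, even though, as noted later, this rv has no mean:

```python
import math

a, b = 0.0, 2.0
# Cauchy cdf obtained by integrating f(x) = b / (pi*((x-a)^2 + b^2))
F = lambda x: 0.5 + math.atan((x - a) / b) / math.pi

total = F(1e8) - F(-1e8)
print(total)   # very close to 1; the heavy tails decay only like 1/x^2
```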

Moments of Random Variables

The pdf fX(x) of a rv X gives complete information about the rv, i.e. the probability of any event involving the random variable can be obtained from it. Sometimes we might not need all that information, or might not have it. Especially when modelling real-world situations we might not have all the information required to determine the pdf. We might instead use single numbers representing the whole model, like "the average weight of an 11-year-old in Tamil Nadu is 32.5 kg" or "the average mileage of a Maruti 800 car less than a year old is 22 km/litre". All this is partial information. But what kind of partial information will be useful? This partial information is provided by the moments of the pdf. Moments play a fundamental role. The notion of first moment (average) is used very widely as a representation of the random variable. (Chebyshev used higher order moments in the central limit theorem.)
Average and Variance of a rv X
Definition: The average of a rv X, denoted by E(X), is defined as
E(X) = ∫_{−∞}^{∞} x fX(x) dx.
E(X) is also called the expected value, expectation, first moment or mean of X. We usually write μ for E(X).

Note that if two rvs X1, X2 have the same pdf then they have the same average; the mean is a property of the pdf, not of the rv. Question: Is the mean well defined for every pdf fX(x)? It is well defined only when
∫_{−∞}^{∞} |x| fX(x) dx < ∞,
i.e. when the integral in the definition of E(X) converges absolutely. For example, the Cauchy rv does not have a mean.

Definition: The variance of a random variable X (with a finite mean μ), denoted by var(X), is defined as
var(X) = ∫_{−∞}^{∞} (x − μ)² fX(x) dx.
The square root of the variance, √var(X), is called the standard deviation of X and is usually denoted by σX. The variance is also denoted by σX².
Question: Is the variance well defined for every pdf fX(x)? It is, as the integrand in the definition of variance is always non-negative, even though the integral may equal +∞. When we fit pdfs to experimental data, means and variances will definitely be well defined since fX(x) will vanish outside a finite interval.
Properties of mean and variance:
1. Shifting a rv by a constant c: E(X + c) = E(X) + c, Var(X + c) = Var(X).
2. Scaling a rv by a constant c: E(cX) = c·E(X), Var(cX) = c²·Var(X).
3. Relationship between variance and mean: Var(X) = E(X²) − μ².
Higher Moments:
Definition: The nth raw moment of a rv X, denoted by E(Xⁿ), is defined as
E(Xⁿ) = ∫_{−∞}^{∞} xⁿ fX(x) dx.
Definition: The nth central moment of a rv X, denoted by EC(Xⁿ), is defined as
EC(Xⁿ) = ∫_{−∞}^{∞} (x − μ)ⁿ fX(x) dx.
Hence we get Var(X) = E(X²) − μ².
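The identity Var(X) = E(X²) − μ² can be verified by numerical integration of the moment definitions. A sketch using the exponential pdf (λ and the integration window are our illustrative choices); the expected answers are μ = 1/λ = 2 and Var(X) = 1/λ² = 4:

```python
import math

lam = 0.5
f = lambda x: lam * math.exp(-lam * x)   # exponential pdf

def moment(n, upper=200.0, steps=100000):
    # midpoint-rule approximation of the n-th raw moment
    h = upper / steps
    return sum(((k + 0.5) * h)**n * f((k + 0.5) * h) for k in range(steps)) * h

mu = moment(1)            # 1/lam = 2
var = moment(2) - mu**2   # E(X^2) - mu^2 = 1/lam^2 = 4
print(mu, var)
```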

Average and Variance of the models:

Random Variable         Average          Variance
1. Bernoulli            p                p(1 − p)
2. Binomial             np               np(1 − p)
3. Geometric            (1 − p)/p        (1 − p)/p²
4. Negative Binomial    r/p              r(1 − p)/p²
5. Poisson              λ                λ
6. Discrete Uniform     (a + b)/2        (b − a)(b − a + 2)/12
7. Uniform              (a + b)/2        (b − a)²/12
8. Gaussian             μ                σ²
9. Exponential          1/λ              1/λ²
10. Gamma               a/λ              a/λ²
11. Cauchy              does not exist   does not exist
12. Erlang              n/λ              n/λ²


Transformations of Random Variables

In engineering we use many transformations, collectively called data processing. Special cases are filtering, quantizing, clipping, predicting, etc. Transformations are classified as memoryless or with memory. A memoryless transformation is of the kind Y(s) = g(X(s)); the value of the output depends on only one value of the input. Quantizing and clipping are memoryless; transformations with memory include ARMA (autoregressive moving average) filters.
Question: If the input X is a rv, will the output Y be a rv? Yes, if and only if g is a measurable function. We consider only such functions g.
PDF and CDF of the transformed random variable
If X is a given rv and the processing function is g, what is the cdf or pdf of Y? If A = {Y ≤ y} then FY(y) = p(A). To find FY we need to relate A to an equivalent event involving the rv X. This event is B = {s | g(X(s)) ≤ y} = {s | X(s) ≤ g⁻¹(y)} (loosely, as g might not be invertible). Thus
FY(y) = p[Y ≤ y] = p[g(X) ≤ y] = p[X ≤ g⁻¹(y)] (again loosely).
The right-most probability can be evaluated from the cdf of X.
Theorem: Consider a rv X with pdf fX(x) and a given transformation Y = g(X). If g⁻¹(y) = {x1, x2, …, xn}, then the pdf of Y is given by
fY(y) = Σ_{l=1}^{n} fX(xl) · 1/|dg(xl)/dx|.
If for a given y the set g⁻¹(y) is empty, then fY(y) = 0.
Note: If n is uncountably infinite, the function g(x) must be flat over some region of x. Then we can write p[Y = k] = p[x1 ≤ X ≤ x2], and if the right-hand-side probability is not zero then Y has a point probability, i.e. Y must have a discrete part.
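A worked instance of the theorem (our choice of example, not from the notes): take Y = g(X) = X² with X standard normal. Then g⁻¹(y) = {+√y, −√y} and |dg/dx| = 2|x|, so fY(y) = [φ(√y) + φ(−√y)]/(2√y), which is exactly the chi-square density with one degree of freedom.

```python
import math

phi = lambda x: math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)  # N(0,1) pdf

def f_Y(y):
    # theorem applied to g(x) = x^2: two roots +-sqrt(y), |g'(x)| = 2|x| = 2*sqrt(y)
    r = math.sqrt(y)
    return (phi(r) + phi(-r)) / (2.0 * r)

y = 1.7
exact = math.exp(-y / 2.0) / math.sqrt(2.0 * math.pi * y)  # chi-square(1) pdf
print(abs(f_Y(y) - exact))   # essentially 0
```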

Some important Transformations:

1. Linear (Affine) transformation: Let Y = aX + b where a and b are real, a ≠ 0. Then
fY(y) = (1/|a|) fX((y − b)/a).
The cdf is
FY(y) = FX((y − b)/a) if a > 0,
FY(y) = 1 − FX((y − b)/a) if a < 0.
2. Cosine Transformation of a Uniform Random Variable: This transformation is often used in communication channels. Let X be a uniform rv with range [0, 2π), and let Y = cos(X). Note that for each value y in (−1, 1), the set cos⁻¹(y) has two elements, θ and 2π − θ, where 0 < θ < π and cos(θ) = y. Also fX(x) = 1/(2π), 0 ≤ x < 2π, and |dg/dx| = |sin(x)| = √(1 − y²). Hence
fY(y) = 1/(2π√(1 − y²)) + 1/(2π√(1 − y²)) = 1/(π√(1 − y²)), −1 < y < 1.
Note that the pdf approaches ∞ as |y| approaches 1.
3. Clipping Transformation: This arises in signal processing applications. Let X be a rv with range SX = (−∞, ∞). The transformation is
g(x) = x if |x| < a, g(x) = a if x ≥ a, g(x) = −a if x ≤ −a.
Here Y will be a mixed type of rv:
fY(y) = fX(y) for |y| < a, with δ(y − a)·p[X ≥ a] at y = a and δ(y + a)·p[X ≤ −a] at y = −a.
4. Quantizer Transformation: A quantizer converts its input (continuous or discrete) into discrete-valued outputs. For example, sound, video and voltage signals are quantized before storage in computers. The generic form of a quantizer with parameters d, a is defined as follows:
g(x) = a for x ∈ [0, d),
g(x) = d + a for x ∈ [d, 2d),
g(x) = kd + a for x ∈ [kd, (k + 1)d),
g(x) = −d + a for x ∈ [−d, 0),
g(x) = −(k + 1)d + a for x ∈ [−(k + 1)d, −kd).
Let X be a random variable with SX = (−∞, ∞). As g(x) is discrete valued, the pdf fY(y) will contain only δ(·) functions. The probability mass function of Y is
p[Y = kd + a] = p[kd ≤ X < (k + 1)d], k = 0, 1, 2, …
and similarly
p[Y = −(k + 1)d + a] = p[−(k + 1)d ≤ X < −kd], k = 0, 1, 2, …
5. Logarithmic transformation: This is used when certain random variables have values covering several orders of magnitude; the probabilities also may range widely, say from 0.1 to 0.00001 (10⁻⁵). Such values are converted to a logarithmic scale. Here g(x) = log(x), so the range of X must be (0, ∞). Log is a monotone function, so g⁻¹(y) is the singleton {e^y}. Also dg/dx = 1/x, and hence
fY(y) = e^y · fX(e^y), y ∈ (−∞, ∞).
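The quantizer pmf can be computed directly from the input cdf. A sketch (our illustrative choice: X exponential, so only the non-negative levels carry probability): p[Y = kd + a] = FX((k + 1)d) − FX(kd) = e^{−λkd}(1 − e^{−λd}), which happens to be a geometric pmf over the quantization levels.

```python
import math

lam, d, a = 1.0, 0.5, 0.25
F = lambda x: 1.0 - math.exp(-lam * x)   # exponential cdf, x >= 0

# p[Y = k*d + a] = p[k*d <= X < (k+1)*d] = F((k+1)d) - F(kd)
pmf = [F((k + 1) * d) - F(k * d) for k in range(200)]
print(sum(pmf))                           # ~1: the levels exhaust the range of X
print(pmf[0], 1.0 - math.exp(-lam * d))   # equal: geometric form
```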

Transforms of Probability Density Functions

We discuss three transforms: the characteristic function, defined for all random variables taking positive as well as negative values; the Laplace transform, defined for random variables taking only positive real values; and the probability generating function, defined for discrete random variables taking non-negative integer values.
Definition: The characteristic function of the pdf fX(x) is
ΦX(ω) = E(e^{jωX}) = ∫_{−∞}^{∞} fX(x) e^{jωx} dx, ω ∈ ℝ.
This is the Fourier transform of the pdf fX(x) but for the sign of the exponent. ΦX(ω) is well defined for all pdfs since |e^{jωx}| = 1, and |ΦX(ω)| ≤ 1.
Note 1: Characteristic functions are associated with pdfs rather than random variables.
Note 2: There is a one-to-one correspondence between a pdf and its characteristic function. We can recover the pdf from the characteristic function ΦX(ω) by the inverse transform
fX(x) = (1/2π) ∫_{−∞}^{∞} ΦX(ω) e^{−jωx} dω, x ∈ ℝ.
Properties of characteristic functions:
Theorem: If ΦX(ω) is n times differentiable at the point ω = 0, then
E(Xⁿ) = (1/jⁿ) · dⁿΦX(ω)/dωⁿ at ω = 0.
Examples:
Exponential pdf: If X is an exponential random variable with parameter λ, then its characteristic function is ΦX(ω) = λ/(λ − jω).
Gaussian pdf: If X is a Gaussian random variable with parameters μ, σ², then ΦX(ω) = e^{jμω − σ²ω²/2}.

Laplace transforms:
Definition: The Laplace transform of the pdf fX(x) of a positive random variable X is defined as
LX(s) = E(e^{−sX}) = ∫_0^∞ fX(x) e^{−sx} dx.
Here s is a complex number with positive real part; this is a sufficient condition for LX(s) to be well defined. An inversion formula also holds for the Laplace transform, and there is a one-to-one correspondence between fX(x) and LX(s):
fX(x) = (1/2πj) ∫_{c−j∞}^{c+j∞} LX(s) e^{sx} ds, x ∈ ℝ.
Theorem: Let X be a random variable with Laplace transform LX(s) that is n times differentiable at s = 0. Then
E(Xⁿ) = (−1)ⁿ · dⁿLX(s)/dsⁿ at s = 0.
Note: Expanding LX(s) about the origin s = 0, we get
LX(s) = Σ_{n≥0} (−1)ⁿ E(Xⁿ) sⁿ / n!.
Examples:
Gamma pdf: If X is a gamma random variable with parameters λ and a, its Laplace transform is
LX(s) = (λ/(λ + s))^a.
Note: We use the gamma integral Γ(a) = ∫_0^∞ y^{a−1} e^{−y} dy.
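The gamma example can be confirmed numerically. A sketch with an integer shape parameter a (an Erlang rv, so Γ(a) = (a − 1)!; λ, a and s are our illustrative choices), comparing midpoint-rule integration of fX(x)e^{−sx} against (λ/(λ + s))^a:

```python
import math

lam, a_par, s = 2.0, 3, 0.7

def gamma_pdf(x):
    # lam * (lam*x)^(a-1) * e^{-lam*x} / Gamma(a), with Gamma(a) = (a-1)! here
    return lam * (lam * x)**(a_par - 1) * math.exp(-lam * x) / math.factorial(a_par - 1)

def laplace_numeric(s, upper=40.0, steps=100000):
    # midpoint-rule approximation of E(e^{-sX}) = integral of f(x) e^{-sx}
    h = upper / steps
    return sum(gamma_pdf((k + 0.5) * h) * math.exp(-s * (k + 0.5) * h)
               for k in range(steps)) * h

approx = laplace_numeric(s)
exact = (lam / (lam + s))**a_par
print(approx, exact)   # ~ equal
```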

Bernoulli pmf: For a Bernoulli random variable the Laplace transform is (1 − p) + p e^{−s}.
Probability Generating Function: For a discrete random variable X taking non-negative integer values, let pX(k) represent p[X = k].
Definition: The probability generating function of the pmf pX(k) is defined as
GX(z) = E(z^X) = Σ_{k≥0} pX(k) z^k.
The parameter z is in general a complex number.
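A sketch of the probability generating function for a Poisson pmf (λ and the truncation point are our choices): GX(z) = Σ_k pX(k) z^k equals e^{λ(z−1)}, GX(1) is the total probability 1, and the derivative GX′(1), approximated here by a finite difference, recovers the mean λ.

```python
import math

lam = 3.0
K = 80   # series truncation; the Poisson tail beyond 80 is negligible for lam = 3

def G(z):
    # G_X(z) = sum_k p_X(k) z^k for the Poisson pmf
    return sum(lam**k * math.exp(-lam) / math.factorial(k) * z**k
               for k in range(K))

print(G(1.0))                                 # ~1: total probability
print(abs(G(0.5) - math.exp(lam * (0.5 - 1.0))))   # ~0: matches e^{lam(z-1)}
dG = (G(1.0 + 1e-6) - G(1.0)) / 1e-6
print(dG)                                     # ~lam: G'(1) = E(X)
```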