Counting:

Multiplication rule: n1 · n2 · · · nr outcomes if an experiment has r steps and the ith step has ni outcomes.

Ordered, without replacement: n(n − 1)(n − 2) · · · (n − k + 1) = n!/(n − k)! ways to choose k items out of n without replacement, if the order in which the k items are chosen matters (e.g., out of 10 people, choose a president, vice-president, and treasurer).

Unordered, without replacement: (n choose k) = n!/((n − k)! k!) ways to choose k items out of n without replacement if order does not matter (e.g., out of 10 people, choose a team of 3).

Ordered, with replacement: n^k ways to choose k items out of n with replacement if order matters (e.g., roll 8 dice and place them in a line: n = 6, k = 8).

Unordered, with replacement: (n + k − 1 choose k) ways to choose k items out of n with replacement if order does not matter (e.g., shake 8 dice in a cup and pour them onto the table).

Probability

Naive definition:

P(A) = (number of outcomes favorable to A) / (number of outcomes)

Assumes a finite sample space with all outcomes equally likely.

Axiomatic definition: P is a function from the sample space S to the interval [0, 1] such that P(∅) = 0, P(S) = 1, and for disjoint A_i, P(∪_i A_i) = Σ_i P(A_i).

Conditional Probability

P(A|B) = P(A ∩ B) / P(B)

and in general P(A|B) ≠ P(B|A).

Random Variables (r.v.)

A random variable is a function that takes every outcome in the sample space and assigns to it a real number. It is a numerical summary of the experiment.

The distribution of a r.v. specifies the probability that the r.v. will take on any given value or range of values. For a discrete r.v., we can know the distribution by knowing the probability mass function (PMF), P(X = x) for all x.

Expectation and Variance

Let X be a discrete r.v.; then

E[X] = Σ_x x P(X = x)

Linearity: Let X, Y be r.v.s and let a be a fixed constant; then

E[aX + Y] = a E[X] + E[Y]

Let X be a discrete r.v.; then

Var(X) = E[(X − E[X])²] = E[X²] − E[X]²

Variance is NOT linear: let X, Y be r.v.s and let a be a fixed constant; then

Var(X + Y) ≠ Var(X) + Var(Y)   (unless X and Y are independent)
Var(aX) = a² Var(X)

Bernoulli Distribution

X ~ Bern(p) and has PMF

P(X = 1) = p and P(X = 0) = 1 − p.

Let X be the outcome of one experiment that results in either success (with probability p) or failure (with probability 1 − p); then X ~ Bern(p).

Mean: E[X] = p

Binomial Distribution

X ~ Bin(n, p) and has PMF

P(X = k) = (n choose k) p^k (1 − p)^(n−k)   for k ∈ {0, 1, . . . , n}

Let X be the number of successes out of n independent Bernoulli experiments, each with probability p of success; then X ~ Bin(n, p).

Geometric Distribution

Let X be the number of failures before the first success in a sequence of Bernoulli experiments, each with probability p of success; then X ~ Geom(p).

Mean: E[X] = (1 − p)/p

Hyper-Geometric Distribution

X ~ HGeom(w, b, n) and has PMF

P(X = k) = (w choose k)(b choose n − k) / (w + b choose n)   for k ∈ {0, 1, 2, . . . }
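As a quick numerical sanity check (my own sketch, not part of the original sheet), the Binomial PMF above can be verified directly: it sums to 1 over k = 0, . . . , n, and its mean is n·p.

```python
# Quick check (illustrative, not from the sheet) of the Binomial facts
# above: the PMF sums to 1 and the mean E[X] equals n*p.
from math import comb

n, p = 10, 0.3
pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

total = sum(pmf)                               # should equal 1
mean = sum(k * q for k, q in enumerate(pmf))   # should equal n*p = 3.0
print(total, mean)
```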
Let X be the number of successful units drawn out of n draws without replacement from a population of w + b units, where w is the number of successful units in the population; then X ~ HGeom(w, b, n).

Mean: E[X] = n · w/(w + b)

Poisson Distribution

X ~ Pois(λ) and has PMF

P(X = k) = λ^k e^(−λ) / k!

Let X be the count of a particular rare event for a set period of time, say an hour, which has an expectation of λ; then X ~ Pois(λ).

Uniform Distribution

X ~ Unif[a, b] has CDF (if u ∈ [a, b])

F_U(u) = ∫_a^u 1/(b − a) dx = (u − a)/(b − a).

Key property: Universality of the Uniform: Let U be any continuous random variable with CDF F_U(u); then F_U(U) ~ Unif[0, 1].

Theorem 1. Let U ~ Unif[0, 1] and let X = F^(−1)(U); then X has CDF F.

Proof.

P(X ≤ x) = P(F^(−1)(U) ≤ x) = P(U ≤ F(x)) = F(x)

Theorem 2. Let X be a r.v. with CDF F_X(x); then Y = F(X) ~ Unif[0, 1].

Proof. It is clear that Y takes values in (0, 1); then P(Y ≤ y) = 0 for y ≤ 0 and P(Y ≤ y) = 1 for y ≥ 1. So for y ∈ (0, 1),

P(Y ≤ y) = P(F(X) ≤ y) = P(X ≤ F^(−1)(y)) = F(F^(−1)(y)) = y

So Y has a Unif(0, 1) CDF.

The MGF really does give you all of the moments of X! For example, if we wish to find the nth moment of X we just take the nth derivative of the MGF with respect to t and evaluate it at t = 0:

(d^n/dt^n) M_X(t) |_(t=0) = n(n − 1) · · · 1 · E[X^n]/n! + (terms containing t, which vanish at t = 0) = E[X^n]
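Theorem 1 is exactly the inverse-CDF sampling recipe. As an illustration (my own sketch, not from the sheet), take F to be the Exponential(1) CDF F(x) = 1 − e^(−x), so F^(−1)(u) = −ln(1 − u); feeding uniform draws through F^(−1) should produce samples whose mean is close to the Exponential(1) mean of 1.

```python
# Illustration (not from the sheet) of Theorem 1: if U ~ Unif[0, 1] and
# X = F^{-1}(U), then X has CDF F.  Here F is the Exponential(1) CDF,
# F(x) = 1 - e^{-x}, so F^{-1}(u) = -ln(1 - u).
import random
from math import log

random.seed(0)
samples = [-log(1 - random.random()) for _ in range(100_000)]
mean = sum(samples) / len(samples)
print(mean)   # should be close to 1, the Exponential(1) mean
```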
If the MGF exists then we can write down the Taylor expansion around t = 0:

M_X(t) = Σ_(n=0)^∞ E[X^n] t^n / n!

If the MGF exists then it uniquely determines the distribution of X, just like the CDF and PDF/PMF. If two r.v.s have the same MGF they must have the same distribution! But be careful: if two random variables that don't have MGFs have all of the same moments, that doesn't mean that they have the same distribution!!

If X and Y are independent then

M_(X+Y)(t) = M_X(t) M_Y(t)

Joint Distributions: The joint distribution of two random variables X and Y has pdf

f_(X,Y)(x, y) = f_(Y|X)(y|x) f_X(x) = f_(X|Y)(x|y) f_Y(y)

which can be obtained by differentiating the CDF:

f_(X,Y)(x, y) = ∂²/∂x∂y F_(X,Y)(x, y)

To make sure we have a valid PDF it must integrate to 1!

∫∫ f_(X,Y)(x, y) dx dy = 1

The marginal distribution, that is the distribution of each of the variables forgetting about the other one, or just not caring what value it takes, can be obtained by integrating it out, which is also known as marginalizing:

f_X(x) = ∫ f_(X,Y)(x, y) dy

To compute P(X ∈ 𝒳, Y ∈ 𝒴) we simply integrate over the sets 𝒳, 𝒴 (for example 𝒳 = [0, 4] and 𝒴 = [−10, 10]):

P(X ∈ 𝒳, Y ∈ 𝒴) = ∫_𝒳 ∫_𝒴 f_(X,Y)(x, y) dx dy

Covariance and Correlation: Let X and Y be two random variables; then the covariance is

cov(X, Y) = E[XY] − E[X]E[Y]

and the correlation is

−1 ≤ corr(X, Y) = cov(X, Y) / √(var(X) var(Y)) ≤ 1

The covariance tells you how variable X reacts to a change in variable Y; correlation is just standardized! For example, if two variables are the same then the correlation is 1!

Multinomial Distribution: Imagine throwing n objects into k bins, each with probability p1, . . . , pk, such that Σ_i pi = 1; then you end up with the multinomial distribution! X ~ Mult_k(n, p):

P(X1 = x1, . . . , Xk = xk) = n!/(x1! · · · xk!) p1^(x1) · · · pk^(xk)

where x1 + · · · + xk = n.

Multivariate Normal Distribution (MVN): We say X1, . . . , Xn follows a MVN with mean μ and covariance matrix Σ if any linear combination is normal:

X1 t1 + · · · + Xn tn ~ Normal

Transformation: Let X be a continuous r.v. with PDF f_X and let Y = g(X) be a new random variable (note we require g(·) to be differentiable and strictly increasing/decreasing). Then the PDF of Y is given by

f_Y(y) = f_X(x) |∂x/∂y|

where the right-hand side should be written as a function of y, like so:

f_Y(y) = f_X(g^(−1)(y)) |d/dy g^(−1)(y)|

If (U, V) is a random vector with joint PDF f_(U,V) and (X, Y) = g(U, V), where g satisfies conditions analogous to the one-dimensional case, then

f_(X,Y)(x, y) = f_(U,V)(u, v) |det( ∂u/∂x  ∂u/∂y ; ∂v/∂x  ∂v/∂y )|

Conditional Expectation: Let X and Y be two random variables; then the conditional expectation of Y given X is the random variable

g(X) = E[Y|X].

Notice that, unlike the ordinary expectation, the conditional expectation gives us a random variable rather than a number!! The best way to think of this is: given that you have observed X = x, how does this change our belief about the expectation of Y? Looking at it this way automatically gives you that if X and Y are independent, then

E[Y|X] = E[Y].

The Law of Iterated Expectations (a.k.a. Adam's Law): Let X, Y be two random variables such that E[|X|] < ∞; then

E[X] = E_Y[E_(X|Y)[X|Y]]

For example, if Y is discrete and takes values in a set 𝒴,

E[X] = Σ_(y∈𝒴) P(Y = y) E_(X|Y)[X|Y = y]

The Law of Iterated Variances (a.k.a. Eve's Law): Let X, Y be two random variables such that E[|X|²] < ∞; then

var(X) = E_Y[var_(X|Y)(X|Y)] + var_Y(E_(X|Y)[X|Y])

Inequalities
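Adam's law is easy to check by hand on a small two-stage experiment (a toy example of my own, not from the sheet): first pick one of two coins at random (Y), then flip it once (X).

```python
# Worked check (illustrative, not from the sheet) of Adam's law:
# E[X] = sum_y P(Y = y) * E[X | Y = y] for a two-stage experiment.
p_coin = {0: 0.5, 1: 0.5}   # P(Y = y): pick either coin equally often
bias = {0: 0.2, 1: 0.6}     # E[X | Y = y]: each coin's heads probability

ex = sum(p_coin[y] * bias[y] for y in p_coin)
print(ex)   # 0.5*0.2 + 0.5*0.6 = 0.4
```

Conditioning on which coin was picked splits the overall expectation into a weighted average of the per-coin expectations, which is exactly what the formula says.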
1. Cauchy-Schwarz: X, Y r.v.s with finite variance:

|E[XY]| ≤ √(E[X²] E[Y²])

2. Jensen's: X a r.v. and g a convex (or concave) function: E[g(X)] ≥ g(E[X]) (or E[g(X)] ≤ g(E[X])), e.g. E[X²] ≥ E[X]² and E[log(X)] ≤ log E[X].

3. Markov's: X a r.v. and a > 0: P(|X| ≥ a) ≤ E[|X|]/a

4. Chebyshev's: X a r.v. with mean μ and variance σ²: P(|X − μ| ≥ a) ≤ σ²/a²

Classification of States:

Recurrent state: starting at that state, the chain will return to it with probability 1. If the chain is irreducible, all states are recurrent.

Transient state: not recurrent; starting at that state, the chain will eventually leave and never return.

Period of state i: the gcd of the possible numbers of steps it can take to return to i when starting at i. Aperiodic state: a state with period 1. Aperiodic chain: all states have period 1.
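The gcd definition of a period can be computed mechanically for a small chain. A minimal sketch (the toy chain and helper names are my own, not from the sheet): a two-state chain that deterministically alternates 0 → 1 → 0 can only return to state 0 in an even number of steps, so state 0 has period 2.

```python
# Illustrative sketch (not from the sheet): the period of state i is the
# gcd of the lengths n for which some n-step path returns to i.  This
# chain alternates 0 -> 1 -> 0, so state 0 has period 2.
from functools import reduce
from math import gcd

steps = {0: [1], 1: [0]}   # possible one-step transitions

def return_lengths(start, max_len=10):
    """Lengths n <= max_len for which some n-step path returns to start."""
    lengths, frontier = [], {start}
    for n in range(1, max_len + 1):
        # frontier = set of states reachable in exactly n steps
        frontier = {j for i in frontier for j in steps[i]}
        if start in frontier:
            lengths.append(n)
    return lengths

period = reduce(gcd, return_lengths(0))
print(period)   # 2
```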