
STAT 110 Cheat Sheet

TOPICS

Counting:

Multiplication rule: n_1 · n_2 · ... · n_r outcomes if experiment has r steps and ith step has n_i outcomes.

Ordered, without replacement: n(n-1)(n-2)...(n-k+1) = n!/(n-k)! ways to choose k items out of n without replacement, if the order in which the k items are chosen matters (e.g., out of 10 people, choose a president, vice-president, and treasurer).

Unordered, without replacement: (n choose k) = n!/((n-k)! k!) ways to choose k items out of n without replacement if order does not matter (e.g., out of 10 people, choose a team of 3).

Ordered, with replacement: n^k ways to choose k items out of n with replacement if order matters (e.g., roll 8 dice and place them in a line: n = 6, k = 8).

Unordered, with replacement: (n+k-1 choose k) ways to choose k items out of n with replacement if order does not matter (e.g., shake 8 dice in a cup and pour them onto the table).
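
The four counting rules map directly onto Python's math.comb and math.perm; a quick sketch using the sheet's own examples (10 people, 8 six-sided dice):

    # Numeric check of the four counting rules (Python 3.8+ for math.comb/perm).
    from math import comb, perm

    print(perm(10, 3))        # ordered, without replacement: 10*9*8 = 720
    print(comb(10, 3))        # unordered, without replacement: 120
    print(6 ** 8)             # ordered, with replacement: 1679616
    print(comb(6 + 8 - 1, 8)) # unordered, with replacement: C(13, 8) = 1287
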
Probability

Naive definition:

P(A) = (number of outcomes favorable to A) / (number of outcomes)

Assumes a finite sample space and that all outcomes are equally likely.

Axiomatic definition: P is a function assigning each event A ⊆ S (the sample space) a number in the interval [0, 1], such that P(∅) = 0, P(S) = 1, and for disjoint A_i, P(∪_i A_i) = Σ_i P(A_i).

Conditional Probability

P(A|B) = P(A ∩ B) / P(B)

and in general P(A|B) ≠ P(B|A).

Bayes' rule

P(A|B) = P(B|A)P(A) / P(B)
       = P(B|A)P(A) / (P(B|A)P(A) + P(B|A^c)P(A^c))
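
A worked Bayes' rule sketch with invented numbers (1% prevalence, a test with 95% sensitivity and a 5% false-positive rate; none of these figures come from the notes):

    # Bayes' rule with made-up numbers: A = "has disease", B = "tests positive".
    p_d = 0.01                      # P(A)
    p_pos_given_d = 0.95            # P(B|A)
    p_pos_given_not_d = 0.05        # P(B|A^c)
    posterior = (p_pos_given_d * p_d) / (
        p_pos_given_d * p_d + p_pos_given_not_d * (1 - p_d))
    print(round(posterior, 3))      # 0.161 -- most positives are still false positives
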

Random Variables (r.v.)

A random variable is a function that takes every outcome in the sample space and assigns to it a real number. It is a numerical summary of the experiment.

The distribution of a r.v. specifies the probability that the r.v. will take on any given value or range of values. For a discrete r.v., we can know the distribution by knowing the probability mass function (PMF), P(X = x) for all x.

Expectation and Variance

Let X be a discrete r.v.; then

E[X] = Σ_x x P(X = x)

Linearity: Let X, Y be r.v.s and let a be a fixed constant; then

E[aX + Y] = a E[X] + E[Y]

Let X be a discrete r.v.; then

Var(X) = E[(X - E[X])^2] = E[X^2] - E[X]^2

Variance is NOT linear: let X, Y be r.v.s and let a be a fixed constant; then

Var(X + Y) ≠ Var(X) + Var(Y)   (unless X and Y are independent)
Var(aX) = a^2 Var(X)

Bernoulli Distribution

X ~ Bern(p) has PMF

P(X = 1) = p and P(X = 0) = 1 - p.

Let X be the outcome of one experiment that results in either success (with probability p) or failure (with probability 1 - p); then X ~ Bern(p).

Mean: E[X] = p

Binomial Distribution

X ~ Bin(n, p) has PMF

P(X = k) = (n choose k) p^k (1-p)^(n-k)   for k ∈ {0, 1, ..., n}

Let X be the number of successes out of n independent Bernoulli experiments, each with probability p of success; then X ~ Bin(n, p).

Mean: E[X] = np

Geometric Distribution

X ~ Geom(p) has PMF

P(X = k) = p(1-p)^k   for k ∈ {0, 1, 2, ...}

Let X be the number of failures before the first success in a sequence of Bernoulli experiments, each with probability p of success; then X ~ Geom(p).

Mean: E[X] = (1-p)/p

Hyper-Geometric Distribution

X ~ HGeom(w, b, n) has PMF

P(X = k) = (w choose k)(b choose n-k) / (w+b choose n)   for k ∈ {0, 1, 2, ...}

Let X be the number of successful units drawn out of n draws without replacement from a population of w + b units, where w is the number of successful units; then X ~ HGeom(w, b, n).

Mean: E[X] = n · w/(w+b)

Poisson Distribution

X ~ Pois(λ) has PMF

P(X = k) = e^(-λ) λ^k / k!

Let X be the count of a particular rare event for a set period of time, say an hour, which has an expectation of λ; then X ~ Pois(λ).

Mean: E[X] = λ
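
The discrete distributions above are all in scipy.stats; a small sanity check of the means, with arbitrary illustrative parameters:

    # Sanity-check the means/PMFs above with scipy.stats (parameters are arbitrary).
    # Note: scipy's geom counts trials until the first success; shifting with loc=-1
    # matches the "failures before the first success" convention used here.
    from scipy import stats

    n, p = 10, 0.3
    w, b, draws = 6, 4, 5

    print(round(stats.binom(n, p).mean(), 3))                # np = 3.0
    print(round(stats.geom(p, loc=-1).mean(), 3))            # (1-p)/p ≈ 2.333
    print(round(stats.hypergeom(w + b, w, draws).mean(), 3)) # draws*w/(w+b) = 3.0
    print(round(stats.poisson(2.5).pmf(0), 3))               # e^(-2.5) ≈ 0.082
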


Negative Binomial

X ~ NBin(r, p) has PMF

P(X = n) = (n+r-1 choose r-1) (1-p)^n p^r

Let X be the number of failures before the r-th success when the probability of success is p; then X ~ NBin(r, p).

Mean: E[X] = r(1-p)/p

Fundamental Bridge: Let I_A be the indicator of the occurrence of event A. Then the fundamental bridge is

P(A) = E[I_A].
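
The fundamental bridge is what justifies Monte Carlo estimation of a probability: simulate the experiment many times and average the indicator of A. A minimal sketch, with "two dice summing to 7" as an invented example event:

    # Estimate P(A) as the average of the indicator I_A over simulated experiments
    # (fundamental bridge: P(A) = E[I_A]). The event "two dice sum to 7" is illustrative.
    import random

    trials = 100_000
    hits = sum(1 for _ in range(trials)
               if random.randint(1, 6) + random.randint(1, 6) == 7)
    print(hits / trials)  # should be close to 6/36 ≈ 0.167
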
Maths

Σ_{n=0}^∞ a^n = 1/(1-a)   for |a| < 1

Σ_{n=0}^∞ x^n/n! = e^x

Σ_{j=0}^n (n choose j) x^(n-j) y^j = (x + y)^n

Uniform Distribution

Let U ~ Unif[a, b]; then U has pdf

f_U(u) = 1/(b-a)

and CDF (for u ∈ [a, b])

F_U(u) = ∫_a^u 1/(b-a) dx = (u-a)/(b-a).

Key property: Universality of the Uniform: Let X be any random variable with a continuous CDF F_X; then F_X(X) ~ Unif[0, 1].

Theorem 1. Let U ~ Unif[0, 1] and let X = F^(-1)(U); then X has CDF F.

Proof.

P(X ≤ x) = P(F^(-1)(U) ≤ x) = P(U ≤ F(x)) = F(x).

Theorem 2. Let X be a r.v. with continuous, strictly increasing CDF F_X(x); then Y = F_X(X) ~ Unif[0, 1].

Proof. It is clear that Y takes values in (0, 1), so P(Y ≤ y) = 0 for y ≤ 0 and P(Y ≤ y) = 1 for y ≥ 1. For y ∈ (0, 1),

P(Y ≤ y) = P(F(X) ≤ y) = P(X ≤ F^(-1)(y)) = F(F^(-1)(y)) = y.

So Y has the Unif(0, 1) CDF.
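
Theorem 1 is the basis of inverse-CDF sampling. A minimal sketch for Expo(λ), whose CDF F(x) = 1 - e^(-λx) inverts to F^(-1)(u) = -log(1-u)/λ; the value λ = 2 is arbitrary:

    # Inverse-CDF sampling (Theorem 1): X = F^-1(U) with U ~ Unif[0,1] has CDF F.
    # Illustrated for Expo(lam), where F^-1(u) = -log(1 - u) / lam; lam = 2 is arbitrary.
    import math
    import random

    lam = 2.0
    samples = [-math.log(1 - random.random()) / lam for _ in range(100_000)]
    print(sum(samples) / len(samples))  # sample mean should be close to 1/lam = 0.5
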

Exponential Distribution

Let X ~ Expo(λ); then X has pdf

f_X(x) = λ e^(-λx),   x > 0

and CDF

F_X(x) = 1 - e^(-λx).

Relationship with the uniform: X = -log U ~ Expo(1) if U ~ Unif[0, 1].

Key property: Memorylessness: having already waited t minutes in line gives you no idea of how much longer you have to wait!!

Normal (Gaussian) Distribution

Let X ~ N(μ, σ^2); then X has pdf (the CDF is nasty)

f_X(x) = 1/√(2πσ^2) · e^(-(x-μ)^2 / (2σ^2))

Relationship to uniform: complicated, so use the Box-Muller transform.

Key property: The most magical distribution; it just pops up everywhere! It also tells you that ∫_{-∞}^{∞} e^(-x^2/2) dx = √(2π) (why?).
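
A minimal sketch of the Box-Muller transform mentioned above: two independent Unif(0,1) draws become two independent standard normals.

    # Box-Muller transform: (U1, U2) i.i.d. Unif(0,1) -> two independent N(0,1) draws.
    import math
    import random

    def box_muller():
        u1, u2 = random.random(), random.random()
        r = math.sqrt(-2.0 * math.log(1.0 - u1))   # 1 - u1 avoids log(0)
        return r * math.cos(2 * math.pi * u2), r * math.sin(2 * math.pi * u2)

    zs = [z for _ in range(50_000) for z in box_muller()]
    mean = sum(zs) / len(zs)
    var = sum(z * z for z in zs) / len(zs) - mean ** 2
    print(round(mean, 2), round(var, 2))  # close to 0 and 1
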
Moment Generating Function (MGF)

Let X be any random variable; then the MGF of X is

M_X(t) = E[e^(tX)]
       = E[1 + tX + t^2 X^2 / 2! + ...]
       = 1 + t E[X] + t^2 E[X^2] / 2! + ...

Notice that the MGF may not be finite for all values of t, but for it to exist we require it to be finite in a small neighborhood around t = 0.

The MGF really does give you all of the moments of X! For example, if we wish to find the nth moment of X we just take the nth derivative of the MGF with respect to t and evaluate it at t = 0:

d^n/dt^n M_X(t) = n(n-1)···1 · E[X^n] / n! + terms containing t
                = E[X^n] + terms containing t,

so evaluating at t = 0 gives M_X^(n)(0) = E[X^n].

If the MGF exists then we can write down the Taylor expansion around t = 0:

M_X(t) = Σ_{n=0}^∞ E[X^n] t^n / n!

If the MGF exists then it uniquely determines the distribution of X, just like the CDF and PDF/PMF. If two r.v.s have the same MGF they must have the same distribution! But be careful: if two random variables that don't have MGFs have all of the same moments, that doesn't mean that they have the same distribution!!

If X and Y are independent then

M_{X+Y}(t) = M_X(t) M_Y(t)
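
A symbolic check of "take the nth derivative and evaluate at t = 0", using sympy and the Expo(λ) MGF M(t) = λ/(λ - t) (a standard MGF, not derived in these notes):

    # Moments from the MGF: the n-th derivative at t = 0 gives E[X^n].
    # Illustrated with the Expo(lam) MGF, M(t) = lam / (lam - t)  (valid for t < lam).
    import sympy as sp

    t, lam = sp.symbols('t lam', positive=True)
    M = lam / (lam - t)

    for n in range(1, 4):
        moment = sp.diff(M, t, n).subs(t, 0)
        print(n, sp.simplify(moment))   # 1/lam, 2/lam**2, 6/lam**3, i.e. n!/lam**n
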
Joint Distributions: The joint distribution of two random variables X and Y has pdf

f_{X,Y}(x, y) = f_{Y|X}(y|x) f_X(x) = f_{X|Y}(x|y) f_Y(y)

which can be obtained by differentiating the CDF:

f_{X,Y}(x, y) = ∂^2/∂x∂y F_{X,Y}(x, y)

To make sure we have a valid PDF it must integrate to 1!

∫∫ f_{X,Y}(x, y) dx dy = 1

The marginal distribution, that is the distribution of each of the variables forgetting about the other one (or just not caring what value it takes), can be obtained by integrating the other variable out, which is also known as marginalizing:

f_X(x) = ∫ f_{X,Y}(x, y) dy

To compute P(X ∈ A, Y ∈ B) we simply integrate over the sets A, B (for example A = [0, 4] and B = [-10, 10]):

P(X ∈ A, Y ∈ B) = ∫_A ∫_B f_{X,Y}(x, y) dx dy

Covariance and Correlation: Let X and Y be two random variables; then the covariance is

cov(X, Y) = E[XY] - E[X]E[Y]

and the correlation is

-1 ≤ corr(X, Y) = cov(X, Y) / √(var(X) var(Y)) ≤ 1

The covariance tells you how variable X reacts to a change in variable Y; correlation is just standardized! For example, if two variables are the same then the correlation is 1!

Multinomial Distribution: Imagine throwing n objects into k bins, each with probability p_1, ..., p_k, such that Σ_i p_i = 1; then you end up with the multinomial distribution! X ~ Mult_k(n, p):

P(X_1 = x_1, ..., X_k = x_k) = n!/(x_1! ... x_k!) · p_1^(x_1) ... p_k^(x_k)

where x_1 + ... + x_k = n.

Multivariate Normal Distribution (MVN): We say X_1, ..., X_n follow a MVN with mean μ and covariance matrix Σ if any linear combination is Normal:

X_1 t_1 + ... + X_n t_n ~ Normal

Transformation: Let X be a continuous r.v. with PDF f_X and let Y = g(X) be a new random variable (note we require g(·) to be differentiable and strictly increasing/decreasing). Then the PDF of Y is given by

f_Y(y) = f_X(x) |dx/dy|

where the right-hand side should be written as a function of y, like so:

f_Y(y) = f_X(g^(-1)(y)) |d/dy g^(-1)(y)|

If (U, V) is a random vector with joint PDF f_{U,V} and (X, Y) = g(U, V), where g satisfies conditions analogous to the one-dimensional case, then

f_{X,Y}(x, y) = f_{U,V}(u, v) |det ∂(u,v)/∂(x,y)|

where ∂(u,v)/∂(x,y) is the Jacobian matrix with entries ∂u/∂x, ∂u/∂y, ∂v/∂x, ∂v/∂y.
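
A quick simulation sketch of the one-dimensional formula with an invented transformation: if X ~ Expo(1) and Y = X^2, then g^(-1)(y) = √y and the formula gives f_Y(y) = e^(-√y)/(2√y):

    # Check the change-of-variables formula by simulation: X ~ Expo(1), Y = X^2,
    # so f_Y(y) = exp(-sqrt(y)) / (2*sqrt(y)).  (Choice of transformation is illustrative.)
    import math
    import random

    ys = [random.expovariate(1.0) ** 2 for _ in range(200_000)]
    y0, h = 1.0, 0.05                       # estimate the density in a small bin around y0
    density_mc = sum(y0 - h / 2 < y < y0 + h / 2 for y in ys) / (len(ys) * h)
    density_formula = math.exp(-math.sqrt(y0)) / (2 * math.sqrt(y0))
    print(round(density_mc, 3), round(density_formula, 3))  # both ≈ 0.184
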
Conditional Expectation: Let X and Y be two random variables; then the conditional expectation of Y given X is the random variable

g(X) = E[Y|X].

Notice that, unlike the ordinary expectation, the conditional expectation gives us a random variable rather than a number!! The best way to think of this is: given that you have observed X = x, how does this change our belief about the expectation of Y? Looking at it this way automatically gives you that if X and Y are independent then

E[Y|X] = E[Y].

The Law of Iterated Expectations (a.k.a. Adam's Law): Let X, Y be two random variables such that E[|X|] < ∞; then

E[X] = E_Y[ E_{X|Y}[X|Y] ]

For example, if Y is discrete and takes values in a set S:

E[X] = Σ_{y ∈ S} P(Y = y) E_{X|Y}[X | Y = y]

The Law of Iterated Variances (a.k.a. Eve's Law): Let X, Y be two random variables such that E[|X|^2] < ∞; then

var(X) = E_Y[ var_{X|Y}(X|Y) ] + var_Y( E_{X|Y}[X|Y] )
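
A simulation sketch of Adam's and Eve's laws for an invented two-stage experiment (Y equals 2 or 6 with equal probability and X | Y ~ Pois(Y), so E[X] = 4 and var(X) = E[Y] + var(Y) = 8):

    # Simulation check of Adam's law (E[X] = E[E[X|Y]]) and Eve's law
    # (var(X) = E[var(X|Y)] + var(E[X|Y])) for a made-up two-stage experiment.
    import numpy as np

    rng = np.random.default_rng(0)
    y = rng.choice([2.0, 6.0], size=500_000)
    x = rng.poisson(y)
    print(round(x.mean(), 2), round(x.var(), 2))  # ≈ 4.0 and ≈ 8.0
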

Inequalities

1. Cauchy-Schwarz: X, Y r.v.s with finite variance.

   |E[XY]| ≤ √(E[X^2] E[Y^2])

2. Jensen's: X r.v. and g a convex (or concave) function.

   E[g(X)] ≥ g(E[X])   (or E[g(X)] ≤ g(E[X]))

   e.g. E[X^2] ≥ E[X]^2 and E[log(X)] ≤ log E[X]

3. Markov: X r.v. and a > 0.

   P(|X| ≥ a) ≤ E[|X|] / a

4. Chebyshev: X r.v. with mean μ and variance σ^2.

   P(|X - μ| ≥ a) ≤ σ^2 / a^2

Law of Large Numbers (LLN): Let X_1, X_2, X_3, ... be i.i.d. random variables drawn from some distribution with mean μ = E[X_i] for all i; then

lim_{n→∞} X̄_n = lim_{n→∞} (X_1 + ... + X_n)/n = μ

The Central Limit Theorem (CLT): Let Y be the sum of n (for some large n) i.i.d. random variables drawn from some distribution, with mean E[Y] = μ_Y and variance var(Y) = σ_Y^2; then

Y ≈ N(μ_Y, σ_Y^2),

where ≈ denotes "is approximately distributed as". When we use the CLT we usually have that Y_1 = X_1 + ... + X_n or Y_2 = X̄_n = (1/n)(X_1 + ... + X_n). Specifically, if we assume that each X_i has mean μ_X and variance σ_X^2, then

Y_1 ≈ N(n μ_X, n σ_X^2),
Y_2 = X̄_n ≈ N(μ_X, σ_X^2 / n).
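
A minimal CLT sketch: the sum of n i.i.d. Unif(0,1) draws (each with mean 1/2 and variance 1/12) should be approximately N(n/2, n/12); the values of n and the number of replications are arbitrary:

    # CLT sketch: the sum of n i.i.d. Unif(0,1) r.v.s is approximately N(n/2, n/12).
    import numpy as np

    rng = np.random.default_rng(1)
    n, reps = 30, 200_000
    sums = rng.random((reps, n)).sum(axis=1)
    print(round(sums.mean(), 2), round(sums.var(), 2))   # ≈ 15.0 and ≈ 2.5
    # Fraction within one SD of the mean -- about 0.68 if the normal approximation holds:
    sd = np.sqrt(n / 12)
    print(round(np.mean(np.abs(sums - n / 2) < sd), 2))
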
Markov Condition:

A Markov chain X_0, X_1, X_2, ... on state space {1, ..., M} satisfies the Markov condition

P(X_{n+1} = j | X_0 = i_0, X_1 = i_1, ..., X_{n-1} = i_{n-1}, X_n = i) = P(X_{n+1} = j | X_n = i)

Only today's information matters for predicting tomorrow; additional information about the past is redundant given today's information. Given the present, the past and future are conditionally independent.

Transition Matrix:

The (i, j) entry of transition matrix Q gives the probability of going from state i to state j in one step: q_{ij} = P(X_{n+1} = j | X_n = i).

Q^m is the m-step transition matrix: q_{ij}^(m) = P(X_{n+m} = j | X_n = i).

If v is the row vector encoding the PMF of X_0, such that P(X_0 = i) = v_i, then the PMF of X_n is vQ^n.

Irreducible chain: can get from any state to any other state in a finite number of steps.

Classification of States:

Recurrent state: starting at that state, will return with probability 1. If the chain is irreducible, all states are recurrent.

Transient state: not recurrent; starting at that state, will eventually leave and never return.

Period of state i: gcd of the possible numbers of steps it can take to return to i when starting at i. Aperiodic state: state with period 1. Aperiodic chain: all states have period 1.

Stationary Distribution:

The stationary distribution is a probability row vector s satisfying sQ = s. Assuming irreducibility and aperiodicity, s is unique.

s_i is the long-run probability of being at state i: lim_{m→∞} Q^m is a matrix whose rows are all s.

Expected time to return to i, starting at i, is 1/s_i.
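
A numerical sketch with a made-up two-state transition matrix: the rows of Q^m converge to the stationary distribution s, which can also be read off as the left eigenvector of Q with eigenvalue 1:

    # Stationary distribution of a made-up 2-state chain: rows of Q^m converge to s,
    # and s solves sQ = s (here s = (5/6, 1/6)).
    import numpy as np

    Q = np.array([[0.9, 0.1],
                  [0.5, 0.5]])
    print(np.linalg.matrix_power(Q, 50))         # both rows ≈ [0.833, 0.167]

    # Directly: s is the left eigenvector of Q with eigenvalue 1, normalized to sum to 1.
    vals, vecs = np.linalg.eig(Q.T)
    s = np.real(vecs[:, np.argmin(np.abs(vals - 1))])
    print(s / s.sum())                           # ≈ [0.833, 0.167]
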

Reversibility:

Reversible chain: there exists s satisfying s_i q_{ij} = s_j q_{ji} for all i and j. Then s is the stationary distribution, assuming the entries sum to 1.

Important cases of reversibility:

Columns of Q sum to 1: s is uniform over all states. (Note that this applies to all symmetric transition matrices.)

Random walk on undirected network: s is proportional to degree sequence.

Metropolis algorithm:

Start with desired distribution s and an arbitrary chain with transition matrix P.

Define new chain whose transitions are

q_{ij} = p_{ij} · min( s_j p_{ji} / (s_i p_{ij}), 1 )

p_{ij} is the proposal probability, and min(s_j p_{ji} / (s_i p_{ij}), 1) is the acceptance probability.

The new chain satisfies s_i q_{ij} = s_j q_{ji}, hence has stationary distribution s.
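
A minimal sketch of the Metropolis algorithm for a discrete, invented target s, using a symmetric ±1 random-walk proposal so the acceptance probability reduces to min(s_j/s_i, 1):

    # Metropolis sketch for a discrete target s on states {0, ..., M-1}, using a
    # symmetric +/-1 random-walk proposal (out-of-range proposals are rejected).
    # The target s here is made up purely for illustration.
    import random
    from collections import Counter

    s = [0.1, 0.2, 0.4, 0.3]          # target stationary distribution (sums to 1)
    M = len(s)

    state, visits = 0, Counter()
    for _ in range(200_000):
        proposal = state + random.choice([-1, 1])
        if 0 <= proposal < M and random.random() < min(s[proposal] / s[state], 1):
            state = proposal
        visits[state] += 1

    total = sum(visits.values())
    print([round(visits[i] / total, 2) for i in range(M)])   # ≈ [0.1, 0.2, 0.4, 0.3]
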
