EE514A / Fall 2013 / Info Theory – Lecture 0 (Sep 26th, 2013)
K. Mohan, Prof. J. Bilmes
Outline
1 Probability basics
2 Basics of Convexity
3 Basic Probabilistic Inequalities
4 Concentration Inequalities
5 Relative Entropy Inequalities
Probability
Frequentist viewpoint. To compute the probability of an event
A ⊂ Ω, count the number of occurrences N(A) of A in N random
experiments. Then

    P(A) = lim_{N→∞} N(A)/N.

E.g., a coin was tossed 100 times and 51 heads were counted, so
P(heads) ≈ 51/100. By the W.L.L.N. (weak law of large numbers, see
later slides) this estimate converges to the actual probability of
heads. If it's a fair coin, this should be 0.5.

If A and B are two disjoint events, N(A ∪ B) = N(A) + N(B).
Thus, the frequentist viewpoint seems to imply
P(A ∪ B) = P(A) + P(B) if A, B are disjoint events.
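As an illustration (not from the original slides), here is a minimal Python sketch of the frequentist estimate, assuming numpy is available; the event A is "heads":

    import numpy as np

    rng = np.random.default_rng(0)
    # Toss a fair coin N times (1 = heads) and report the relative
    # frequency N(A)/N, which should approach P(A) = 0.5 as N grows.
    for N in [100, 10_000, 1_000_000]:
        tosses = rng.integers(0, 2, size=N)
        print(N, tosses.mean())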
Axioms of Probability
1 P(∅) = 0
2 P(Ω) = 1
3 If A_1, A_2, ..., A_k is a collection of mutually disjoint events
  (i.e., A_i ⊂ Ω and A_i ∩ A_j = ∅ for i ≠ j), then
  P(∪_{i=1}^k A_i) = Σ_{i=1}^k P(A_i).
E.g.: two simultaneous 'fair' coin tosses. Sample space:
{(H,H), (H,T), (T,H), (T,T)}, with each outcome having probability 1/4.
By axiom 3, P(at least one head) = P((H,H) ∪ (H,T) ∪ (T,H)) = 3/4 = 0.75.
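A quick enumeration of this example (an illustrative sketch, not part of the slides):

    from itertools import product

    # The four equally likely outcomes of two fair coin tosses.
    omega = list(product("HT", repeat=2))
    event = [w for w in omega if "H" in w]  # at least one head
    print(len(event) / len(omega))  # 0.75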
Bayes’ rule
Let's say we are given P(A|B_1) and we need to compute P(B_1|A). Let
B_1, ..., B_k be disjoint events whose union has probability 1. Then,

    P(B_1|A) = P(A|B_1) P(B_1) / P(A)
             = P(A|B_1) P(B_1) / Σ_{i=1}^k P(A ∩ B_i)
             = P(A|B_1) P(B_1) / Σ_{i=1}^k P(A|B_i) P(B_i)
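For concreteness, a small numeric sketch with hypothetical numbers (not from the slides): one of two coins is picked uniformly at random, B_1 = the coin is fair, B_2 = the coin has P(heads) = 0.9, and A = the toss comes up heads.

    # Bayes' rule with hypothetical numbers: recover P(B1 | A).
    p_B = [0.5, 0.5]          # priors P(B1), P(B2)
    p_A_given_B = [0.5, 0.9]  # likelihoods P(A | Bi)
    p_A = sum(l * pb for l, pb in zip(p_A_given_B, p_B))  # total probability
    print(p_A_given_B[0] * p_B[0] / p_A)  # 0.25/0.70 ≈ 0.357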
Random Variables
Expectation
More on Expectation
Conditional Expectation: Let X, Y be two random variables.
Given that Y = y, we can talk of the expectation of X conditioned
on the information that Y = y:

    E[X | Y = y] = Σ_x x P(X = x | Y = y)
Law of Total Expectation: Let X, Y be two random variables.
The law of total expectation then states that:

    E[X] = E[E[X|Y]]

The above equation can be interpreted as follows: the expectation of X
can be computed by first taking the expectation over the randomness
present only in X with Y held fixed (the inner expectation E[X|Y],
which is a function of Y), and then taking the expectation of that
function with respect to the randomness in Y. Thus, the 'total
expectation' of X can be computed by evaluating a sequence of partial
expectations.
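A small sketch (illustrative, with a made-up joint pmf) that checks E[X] = E[E[X|Y]] numerically:

    import numpy as np

    # Hypothetical joint pmf p[x, y] for X in {0, 1, 2}, Y in {0, 1}.
    p = np.array([[0.10, 0.20],
                  [0.30, 0.15],
                  [0.05, 0.20]])
    xs = np.array([0.0, 1.0, 2.0])

    p_y = p.sum(axis=0)                                # marginal P(Y = y)
    e_x_given_y = (xs[:, None] * p).sum(axis=0) / p_y  # E[X | Y = y]
    print((e_x_given_y * p_y).sum())  # E[E[X|Y]] = 0.95
    print((xs[:, None] * p).sum())    # E[X] directly, also 0.95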
Stochastic Process
Convex Set
A set C is a convex set if for any two elements x, y ∈ C and any
0 ≤ θ ≤ 1, θx + (1 − θ)y ∈ C.
Convex Function
A function f : X → R is convex if X is a convex set and for any two
x, y ∈ X and any 0 ≤ θ ≤ 1 it holds that
f(θx + (1 − θ)y) ≤ θf(x) + (1 − θ)f(y).
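A quick numerical spot-check of this definition on f(x) = x² (an illustrative sketch):

    import numpy as np

    rng = np.random.default_rng(1)
    f = lambda x: x ** 2  # a convex function on R

    # Sample random chords and verify f(θx + (1-θ)y) <= θ f(x) + (1-θ) f(y).
    x, y = rng.normal(size=1000), rng.normal(size=1000)
    theta = rng.uniform(size=1000)
    assert np.all(f(theta * x + (1 - theta) * y)
                  <= theta * f(x) + (1 - theta) * f(y) + 1e-12)
    print("convexity inequality held on all sampled chords")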
Support function: f(y) = max_{x ∈ C} x^T y is convex for any set C,
since it is a pointwise maximum of linear functions of y.
Negative entropy: f(x) = Σ_i x_i log(x_i) is convex.
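A spot-check of the convexity of negative entropy on the probability simplex (an illustrative sketch):

    import numpy as np

    rng = np.random.default_rng(2)
    neg_entropy = lambda x: np.sum(x * np.log(x))

    # Random pairs of points on the simplex, random chord parameter t.
    for _ in range(1000):
        x, y = rng.dirichlet(np.ones(5)), rng.dirichlet(np.ones(5))
        t = rng.uniform()
        assert (neg_entropy(t * x + (1 - t) * y)
                <= t * neg_entropy(x) + (1 - t) * neg_entropy(y) + 1e-12)
    print("negative entropy passed the convexity check")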
Cauchy-Schwarz Inequality

Theorem 3.1
Let X and Y be any two random variables. Then it holds that

    |E[XY]| ≤ √(E[X²]) √(E[Y²]).

Proof.
Assume E[X²], E[Y²] > 0 (otherwise X or Y is 0 almost surely and the
inequality is immediate). Let U = X/√(E[X²]) and V = Y/√(E[Y²]), and
note that E[U²] = E[V²] = 1. We need to show that |E[UV]| ≤ 1. For any
two U, V it holds pointwise that |UV| ≤ (|U|² + |V|²)/2. Now,
|E[UV]| ≤ E[|UV|] ≤ (1/2) E[|U|² + |V|²] = 1.
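An empirical check on correlated samples (an illustrative sketch):

    import numpy as np

    rng = np.random.default_rng(3)
    # Cauchy-Schwarz: |E[XY]| <= sqrt(E[X^2]) sqrt(E[Y^2]).
    X = rng.normal(size=100_000)
    Y = 0.7 * X + rng.normal(size=100_000)  # an arbitrary correlated Y
    lhs = abs(np.mean(X * Y))
    rhs = np.sqrt(np.mean(X ** 2)) * np.sqrt(np.mean(Y ** 2))
    print(lhs, "<=", rhs)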
Jensen's Inequality

Theorem 3.2
Let f be a convex function and X a random variable. Then
E[f(X)] ≥ f(E[X]).
Proof.
The proof uses the convexity of f (assume for simplicity that f is
differentiable). First note from the first-order conditions for
convexity that for any two x, y: f(x) ≥ f(y) + f′(y)(x − y).
Now, let x = X, y = E[X]. Then we have that
f(X) ≥ f(E[X]) + f′(E[X])(X − E[X]). Taking expectations on both
sides of the previous inequality and using E[X − E[X]] = 0, it follows
that E[f(X)] ≥ f(E[X]).
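A numeric illustration with the convex function f(x) = x² (a sketch, not from the slides):

    import numpy as np

    rng = np.random.default_rng(4)
    # Jensen with f(x) = x^2: E[X^2] >= (E[X])^2 for any X.
    X = rng.exponential(scale=2.0, size=100_000)
    print(np.mean(X ** 2), ">=", np.mean(X) ** 2)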
Applications of Jensen’s
Note that Jensen's inequality also holds for compositions. Let g be any
function and f a convex function. Then f(E[g(X)]) ≤ E[f(g(X))], by
applying Theorem 3.2 to the random variable g(X).

This gives the monotonicity of moments: for 0 < s < t,
(E[|X|^s])^{1/s} ≤ (E[|X|^t])^{1/t}.

Proof.
Since s < t, the function f(x) = x^{t/s} is convex on R_+. Thus,
(E[|X|^s])^{t/s} = f(E[|X|^s]) ≤ E[f(|X|^s)] = E[|X|^t]. The result now
follows by taking the t-th root on both sides of the inequality.
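A quick check of the moment monotonicity (an illustrative sketch):

    import numpy as np

    rng = np.random.default_rng(5)
    # (E|X|^s)^(1/s) should be nondecreasing in s.
    X = rng.normal(size=100_000)
    for s in [1, 2, 3, 4]:
        print(s, np.mean(np.abs(X) ** s) ** (1 / s))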
The GM-HM inequality also follows from Jensen's. I.e., for any
a_1, a_2, ..., a_n > 0 it holds that:

    ( ∏_{i=1}^n a_i )^{1/n} ≥ n / ( Σ_{i=1}^n 1/a_i )
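A numeric spot-check (an illustrative sketch):

    import numpy as np

    rng = np.random.default_rng(6)
    # Geometric mean >= harmonic mean for positive numbers.
    a = rng.uniform(0.1, 10.0, size=8)
    gm = np.exp(np.mean(np.log(a)))  # geometric mean, computed in log space
    hm = len(a) / np.sum(1.0 / a)    # harmonic mean
    print(gm, ">=", hm)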
Markov's Inequality

Theorem
Let X be a nonnegative random variable (with density f) and t > 0.
Then P(X > t) ≤ E[X]/t.

Proof.
E[X] = ∫_0^∞ x f(x) dx
     = ∫_0^t x f(x) dx + ∫_t^∞ x f(x) dx
     ≥ ∫_t^∞ x f(x) dx ≥ t ∫_t^∞ f(x) dx = t P(X > t).
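An empirical check on an exponential random variable (an illustrative sketch):

    import numpy as np

    rng = np.random.default_rng(7)
    # Markov: P(X > t) <= E[X]/t for nonnegative X, here Exponential(1).
    X = rng.exponential(size=100_000)
    t = 3.0
    print(np.mean(X > t), "<=", np.mean(X) / t)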
Chebyshev's Inequality

Theorem
For any random variable X with finite variance and any s > 0,
P(|X − E[X]| ≥ s) ≤ Var[X]/s².

Proof.
This is an application of Markov's inequality. Set Z = |X − E[X]|² and
apply Markov's inequality to Z with t = s². Note that E[Z] = Var[X].
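A quick empirical check (an illustrative sketch):

    import numpy as np

    rng = np.random.default_rng(8)
    # Chebyshev: P(|X - E[X]| >= s) <= Var[X]/s^2.
    X = rng.normal(loc=1.0, scale=2.0, size=100_000)
    s = 4.0
    print(np.mean(np.abs(X - X.mean()) >= s), "<=", X.var() / s ** 2)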
Weak Law of Large Numbers
Let X_1, ..., X_n be i.i.d. random variables with finite mean and
variance. Then the sample mean X̄ = (1/n) Σ_{i=1}^n X_i converges in
probability to the mean.

Proof.
W.l.o.g. assume each X_i has mean 0 and variance 1. Then the variance
of X̄ = (1/n) Σ_{i=1}^n X_i equals 1/n. By Chebyshev's inequality, for
any ε > 0, P(|X̄| ≥ ε) ≤ 1/(nε²) → 0 as n → ∞, so X̄ converges in
probability to 0.
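A simulation echoing the earlier coin example (an illustrative sketch):

    import numpy as np

    rng = np.random.default_rng(9)
    # WLLN: the sample mean of fair-coin tosses concentrates around 0.5.
    for n in [100, 10_000, 1_000_000]:
        print(n, rng.integers(0, 2, size=n).mean())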
Relative Entropy
Let X ∼ p and Y ∼ q be discrete random variables with distributions p
and q. The relative entropy (KL divergence) between them is

    D(X||Y) = E_p [ log ( p(X)/q(X) ) ]
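Computing this quantity for two small hypothetical pmfs (an illustrative sketch):

    import numpy as np

    # KL divergence between two made-up pmfs on three outcomes.
    p = np.array([0.5, 0.3, 0.2])
    q = np.array([0.4, 0.4, 0.2])
    print(np.sum(p * np.log(p / q)))  # a small positive number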
Relative entropy is always nonnegative: D(X||Y) ≥ 0.

Proof.
The proof follows from a simple application of Jensen's inequality
(Theorem 3.2). Note that f(x) = log x is concave, so Jensen's
inequality reverses: E[log g(X)] ≤ log E[g(X)]. Let g(X) = q(X)/p(X).
Then,

    −D(X||Y) = E_p [ log ( q(X)/p(X) ) ]
             = E_p [ log g(X) ]
             ≤ log E_p [ g(X) ]
             = log Σ_x p(x) q(x)/p(x) = log 1 = 0.
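A spot-check of nonnegativity over random distribution pairs (an illustrative sketch):

    import numpy as np

    rng = np.random.default_rng(10)
    # D(p||q) >= 0 for randomly drawn pmf pairs on four outcomes.
    for _ in range(1000):
        p, q = rng.dirichlet(np.ones(4)), rng.dirichlet(np.ones(4))
        assert np.sum(p * np.log(p / q)) >= -1e-12
    print("D(p||q) was nonnegative for all sampled pairs")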
A tighter bound

Lemma 5.2
Let D(p||q) denote the relative entropy (in nats) between Bernoulli(p)
and Bernoulli(q) distributions, with p, q ∈ (0, 1). Then
D(p||q) ≥ 2(p − q)² (this is the Bernoulli case of Pinsker's
inequality).

Proof.
Note that x(1 − x) ≤ 1/4 for all x, so 1/(x(1 − x)) ≥ 4. Hence,

    D(p||q) = ∫_q^p ( p/x − (1 − p)/(1 − x) ) dx
            = ∫_q^p ( p(1 − x) − (1 − p)x ) / ( x(1 − x) ) dx
            = ∫_q^p ( p − x ) / ( x(1 − x) ) dx
            ≥ 4 ∫_q^p (p − x) dx
            = 2(p − q)²
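A numeric check of the bound (an illustrative sketch):

    import numpy as np

    rng = np.random.default_rng(11)

    def bern_kl(p, q):
        # KL divergence (nats) between Bernoulli(p) and Bernoulli(q).
        return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

    # D(p||q) >= 2 (p - q)^2 on random (p, q) pairs.
    for _ in range(1000):
        p, q = rng.uniform(0.01, 0.99, size=2)
        assert bern_kl(p, q) >= 2 * (p - q) ** 2 - 1e-12
    print("the bound held on all sampled (p, q)")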
Summary