Rate distortion: An introduction
Paulo J S G Ferreira
SPL Signal Processing Lab / IEETA
Univ. Aveiro, Portugal
Contents

Preliminaries
Rate distortion
Computation algorithms
Entropy

The entropy of a discrete random variable X with
probability mass function p(x) is

H(X) = \sum_x p(x) \log \frac{1}{p(x)} = -\sum_x p(x) \log p(x)
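A minimal Python sketch of this definition (the example distributions are illustrative, not from the slides):

    import math

    def entropy(p):
        # H(X) = -sum_x p(x) log2 p(x); terms with p(x) = 0 contribute nothing
        return -sum(px * math.log2(px) for px in p if px > 0)

    print(entropy([0.5, 0.5]))   # 1.0 bit: a fair coin
    print(entropy([0.9, 0.1]))   # ~0.469 bits: a biased coin is more predictable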
Relative entropy

The relative entropy measures the discrepancy between
two probability mass functions (it is not symmetric, so it
is not a true distance):

D(p \| q) = \sum_x p(x) \log \frac{p(x)}{q(x)} = E_{p(x)} \log \frac{p(x)}{q(x)}
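A small Python sketch (distributions again hypothetical) showing D(p||q) and its asymmetry:

    import math

    def kl(p, q):
        # D(p||q) = sum_x p(x) log2(p(x)/q(x)); assumes q(x) > 0 wherever p(x) > 0
        return sum(px * math.log2(px / qx) for px, qx in zip(p, q) if px > 0)

    p, q = [0.5, 0.5], [0.9, 0.1]
    print(kl(p, q))   # ~0.737 bits
    print(kl(q, p))   # ~0.531 bits: D(p||q) != D(q||p) in general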
Conditional entropy

H(X|Y) is the expected value of the entropies of the
conditional distributions p(x|y), averaged over the
conditioning variable Y:

H(X|Y) = \sum_y p(y) H(X|Y = y) = -\sum_{x,y} p(x,y) \log p(x|y)
Mutual information

I(X, Y) = \sum_{x,y} p(x,y) \log \frac{p(x,y)}{p(x)p(y)} = E_{p(x,y)} \log \frac{p(x,y)}{p(x)p(y)}

Meaning:
How far X and Y are from being independent
The information that X contains about Y
Mutual information (continued)

I(X, Y) = \sum_{x,y} p(x,y) \log \frac{p(x,y)}{p(x)p(y)}
        = \sum_{x,y} p(x,y) \log \frac{p(x|y)p(y)}{p(x)p(y)}
        = \sum_{x,y} p(x,y) \log \frac{p(x|y)}{p(x)}
        = H(X) - H(X|Y)

Meaning: measures the reduction in the uncertainty of X
due to knowing Y
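A Python sketch checking both expressions for I(X, Y) on a small, made-up joint pmf:

    import math

    pxy = [[0.4, 0.1],    # hypothetical joint pmf p(x, y) on {0,1} x {0,1}
           [0.1, 0.4]]
    px = [sum(row) for row in pxy]          # marginal p(x)
    py = [sum(col) for col in zip(*pxy)]    # marginal p(y)

    # I(X, Y) = sum p(x,y) log2( p(x,y) / (p(x) p(y)) )
    I = sum(pxy[i][j] * math.log2(pxy[i][j] / (px[i] * py[j]))
            for i in range(2) for j in range(2) if pxy[i][j] > 0)

    # H(X) - H(X|Y), using p(x|y) = p(x,y) / p(y)
    HX = -sum(p * math.log2(p) for p in px if p > 0)
    HXgY = -sum(pxy[i][j] * math.log2(pxy[i][j] / py[j])
                for i in range(2) for j in range(2) if pxy[i][j] > 0)

    print(I, HX - HXgY)   # both ~0.278 bits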
Chain rules

The simplest is

H(X, Y) = H(X) + H(Y|X)

Compare with p(x, y) = p(y|x) p(x)

By repeated application

H(X, Y, Z) = H(X, Y) + H(Z|X, Y)
           = H(X) + H(Y|X) + H(Z|X, Y)
Differential entropy

The continuous case is mathematically much more subtle
The differential entropy of X with probability density f(x) is

h(X) = -\int f(x) \log f(x) \, dx
Gaussian variable

The differential entropy of a Gaussian variable with
density g(x) is

h(X) = -\int g(x) \log g(x) \, dx, \qquad g(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}}

     = -\int g(x) \left( -\log \sqrt{2\pi\sigma^2} - \frac{(x-\mu)^2}{2\sigma^2} \log e \right) dx

     = \frac{1}{2} \log 2\pi\sigma^2 + \frac{1}{2} \log e = \frac{1}{2} \log 2\pi e \sigma^2
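A Monte Carlo sanity check in Python (a sketch, with an arbitrary sigma): the sample average of -log2 g(X) approaches the closed form (1/2) log2(2 pi e sigma^2):

    import math, random

    sigma, n = 2.0, 200_000
    xs = [random.gauss(0.0, sigma) for _ in range(n)]

    def log2_g(x):
        # log2 of the N(0, sigma^2) density
        return -0.5 * math.log2(2 * math.pi * sigma ** 2) \
               - (x * x) / (2 * sigma ** 2) * math.log2(math.e)

    h_mc = -sum(log2_g(x) for x in xs) / n
    h_exact = 0.5 * math.log2(2 * math.pi * math.e * sigma ** 2)
    print(h_mc, h_exact)   # both ~3.047 bits for sigma = 2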
Gaussian variable (continued)

The Gaussian maximises differential entropy for a given
variance, since the relative entropy is nonnegative:

\int f(x) \log \frac{f(x)}{g(x)} \, dx \ge 0
Main ideas
Weak typicality

Let X^n be an i.i.d. sequence X_1, X_2, \ldots, X_n
It follows that

p(X^n) = p(X_1) p(X_2) \cdots p(X_n)

The weakly typical set is formed by sequences such that

-\frac{1}{n} \log p(x^n) \approx H(X)

The term on the left is the empirical entropy
The probability of the typical set approaches 1 as n \to \infty
This does not mean that most sequences are typical but
rather that the non-typical sequences have small
probability
The most likely sequence may not be weakly typical
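A Python sketch (parameters arbitrary): for a Bernoulli(p) source the empirical entropy concentrates around H(X), while the single most likely sequence (all zeros when p < 1/2) does not:

    import math, random

    p, n = 0.2, 2000
    H = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))   # ~0.722 bits

    def empirical_entropy(k):
        # -(1/n) log2 p(x^n) for a sequence with k ones and n - k zeros
        return -(k * math.log2(p) + (n - k) * math.log2(1 - p)) / n

    for _ in range(5):
        k = sum(random.random() < p for _ in range(n))
        print(round(empirical_entropy(k), 3), "vs H(X) =", round(H, 3))

    print(empirical_entropy(0))   # all-zeros sequence: ~0.322, far from H(X)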
Strong typicality

It is stronger than weak typicality
It allows better probability bounds
The probability of a non-typical sequence not only goes to
zero, it satisfies an exponential bound:

P[X^n \notin T(\varepsilon)] \le 2^{-n \varphi(\varepsilon)}

where \varphi is a positive function
Proof depends on a Chernoff-type bound: u(x - a) \le 2^{b(x-a)} for b > 0
yields

E[u(X - a)] = P(X \ge a) \le E[2^{b(X-a)}] = 2^{-ba} E[2^{bX}]
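An empirical illustration in Python (sizes and thresholds arbitrary): the probability that the letter frequency N(1)/n deviates from p by more than eps shrinks rapidly with n, as the Chernoff-type bound predicts:

    import random

    p, eps, trials = 0.3, 0.05, 20_000
    for n in (50, 100, 200, 400):
        bad = sum(1 for _ in range(trials)
                  if abs(sum(random.random() < p for _ in range(n)) / n - p) > eps)
        print(n, bad / trials)   # fraction of non-typical sequences drops with n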
Is everything lost?
The problem
Specific cases

Lossless case
Special case: zero distortion, source coding theorem, etc.

Low distortion
What is the minimum distortion that can be achieved for a
certain channel?
[Figure: the rate-distortion plane, rate R versus distortion; marked regimes: minimise distortion, minimise rate, lossless.]
Terminology

x^n \to [Encoder] \to [Decoder] \to \hat{x}^n
Average distortion

The average distortion is

E_{p(x,\hat{x})} \, d(x, \hat{x})

Note that

\sum_{x,\hat{x}} p(x, \hat{x}) \, d(x, \hat{x}) = \sum_{x,\hat{x}} p(x) \, p(\hat{x}|x) \, d(x, \hat{x})
The Hamming distortion:

d(x, \hat{x}) = \begin{cases} 0, & x = \hat{x} \\ 1, & x \ne \hat{x} \end{cases}
x^n \to [Encoder] \to f_n(x^n) \to [Decoder] \to \hat{x}^n
Why sequences?
Code distortion
The rate-distortion function:

R(D) = \inf_{(R,D) \in C} R

The distortion-rate function:

D(R) = \inf_{(R,D) \in C} D
[Figure: the information rate-distortion function R^{(I)}(D) in the rate-distortion plane.]
\min_{p(\hat{x}|x):\ E[d(x,\hat{x})] \le D} I(X, \hat{X})
Binary source with P(X = 0) = q
[Plot: R(D) for a binary source, one curve for each of p = 0.1, 0.2 and 0.5; both axes run from 0 to 1.]
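The plotted curves follow the standard closed form for a Bernoulli(p) source with Hamming distortion, R(D) = H(p) - H(D) for 0 <= D <= min(p, 1-p) and 0 beyond; a Python sketch:

    import math

    def Hb(x):
        # binary entropy in bits, with Hb(0) = Hb(1) = 0
        return 0.0 if x in (0.0, 1.0) else -x * math.log2(x) - (1 - x) * math.log2(1 - x)

    def R_binary(D, p):
        # R(D) = H(p) - H(D) for 0 <= D < min(p, 1-p), and 0 otherwise
        return Hb(p) - Hb(D) if 0 <= D < min(p, 1 - p) else 0.0

    for D in (0.0, 0.05, 0.1, 0.2):
        print(D, round(R_binary(D, 0.2), 3))   # rate needed at each distortion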
Proof

The idea is to lower bound I(X, \hat{X}), then show that the
bound is achievable
We know that

I(X, \hat{X}) = H(X) - H(X|\hat{X})
Proof (continued)

Let \oplus denote addition modulo 2 (that is, exclusive-or)
Note (why?)

H(X|\hat{X}) = H(X \oplus \hat{X} \,|\, \hat{X})
Solution for D < \sigma^2
I(X, \hat{X}) = h(X) - h(X|\hat{X})
             = h(X) - h(X - \hat{X} \,|\, \hat{X})
             \ge h(X) - h(X - \hat{X})

To minimise this, maximise h(X - \hat{X}); thus X - \hat{X} should be Gaussian
Constraint: E[(X - \hat{X})^2] \le D, hence the variance of X - \hat{X} can be at most D
Result:

I(X, \hat{X}) \ge \frac{1}{2} \log 2\pi e \sigma^2 - \frac{1}{2} \log 2\pi e D = \frac{1}{2} \log \frac{\sigma^2}{D}
R(D) = \begin{cases} \frac{1}{2} \log \frac{\sigma^2}{D}, & D < \sigma^2 \\ 0, & \text{otherwise} \end{cases}

[Plot: R(D) for Gaussian sources with variance 1, 2 and 3.]
Solving R = \frac{1}{2} \log \frac{\sigma^2}{D} for the distortion gives

D(R) = \sigma^2 \, 2^{-2R}

The signal-to-noise ratio is

SNR = 10 \log_{10} \frac{\sigma^2}{D}

Then SNR = 20 R \log_{10} 2 \approx 6.02\,R dB, about 6 dB per bit
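A Python sketch of this "6 dB per bit" rule (source variance arbitrary):

    import math

    sigma2 = 1.0   # source variance, an arbitrary choice
    for R in range(1, 6):
        D = sigma2 * 2 ** (-2 * R)
        snr = 10 * math.log10(sigma2 / D)   # = 20 R log10(2) ~ 6.02 R dB
        print(R, D, round(snr, 2))          # each extra bit adds ~6.02 dB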
R(D) = \min I(X^n, \hat{X}^n)

The minimum is over all density functions f(\hat{x}^n | x^n) such that

E[d(X^n, \hat{X}^n)] \le D
For independent Gaussian components, minimise the total rate

\sum_i \frac{1}{2} \log \frac{\sigma_i^2}{D_i}

subject to \sum_i D_i = D. With a Lagrange multiplier \lambda,

L = \sum_i \frac{1}{2} \ln \frac{\sigma_i^2}{D_i} + \lambda \sum_i D_i

\frac{\partial L}{\partial D_i} = -\frac{1}{2 D_i} + \lambda = 0 \quad\Rightarrow\quad D_i = \frac{1}{2\lambda}, \text{ the same for all } i
R(D) = \sum_i \frac{1}{2} \log \frac{\sigma_i^2}{D_i}, \qquad
D_i = \begin{cases} c, & c < \sigma_i^2 \\ \sigma_i^2, & c \ge \sigma_i^2 \end{cases}
[Figure: reverse water-filling. A common level c is chosen; components X_1, ..., X_4 whose variances \sigma_i^2 exceed c get D_i = c, the rest get D_i = \sigma_i^2. Labels: D_1, D_2, D_3, D_4.]
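A Python sketch of reverse water-filling (the function name and the bisection search are my own choices; it assumes D does not exceed the total variance): find the level c with sum_i min(c, sigma_i^2) = D, then read off the D_i and the total rate:

    import math

    def reverse_waterfill(sigma2s, D, iters=100):
        # Bisect on the water level c so that sum_i min(c, sigma_i^2) = D
        lo, hi = 0.0, max(sigma2s)
        for _ in range(iters):
            c = (lo + hi) / 2
            if sum(min(c, s) for s in sigma2s) < D:
                lo = c
            else:
                hi = c
        Ds = [min(c, s) for s in sigma2s]   # per-component distortions
        R = sum(0.5 * math.log2(s / d) for s, d in zip(sigma2s, Ds) if d < s)
        return Ds, R

    Ds, R = reverse_waterfill([4.0, 2.0, 1.0, 0.5], D=2.0)
    print([round(d, 3) for d in Ds], round(R, 3))   # level c = 0.5, R = 3 bits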
as n \to \infty
The same argument applies to the other conditions
As n \to \infty the conditions will be met
\left| \frac{N(a)}{n} - p(a) \right| < \frac{\varepsilon}{|A|}

Also required: symbols of zero probability do not occur:
N(a) = 0 if p(a) = 0
\left| \frac{N(a,b)}{n} - p(a,b) \right| < \frac{\varepsilon}{|A| \, |B|} \quad \text{for all possible } a, b

Again, pairs of zero probability must not occur
Set size
Converse: for any code with rate R and expected distortion at most D,

nR \ge H(\hat{X}^n) \ge I(X^n, \hat{X}^n)        (entropy is nonnegative)
   = H(X^n) - H(X^n|\hat{X}^n)        (definition)
   = \sum_i H(X_i) - H(X^n|\hat{X}^n)        (independence)
   = \sum_i H(X_i) - \sum_i H(X_i \,|\, \hat{X}^n, X_1, \ldots, X_{i-1})        (chain rule)
   \ge \sum_i H(X_i) - \sum_i H(X_i|\hat{X}_i)        (conditioning reduces entropy)
   = \sum_i I(X_i, \hat{X}_i)
   \ge \sum_i R(E[d(X_i, \hat{X}_i)])        (definition of R(D))
   = n \cdot \frac{1}{n} \sum_i R(E[d(X_i, \hat{X}_i)])
   \ge n R\Big(\frac{1}{n} \sum_i E[d(X_i, \hat{X}_i)]\Big)        (convexity of R(\cdot))
   = n R(E[d(X^n, \hat{X}^n)])        (definition)
   \ge n R(D)        (E[d(X^n, \hat{X}^n)] \le D)
Alternating projections

Consider two subspaces A and B of a common parent
space
A point in the intersection can be found by alternating
projections

[Figure: alternating projections between A and B, starting from x and converging to a point in the intersection.]
POCS

The method depends on the possibility of defining a
projection
Subspaces are obviously convex
Everything works for convex sets because projections are
well defined

[Figure: projections onto a convex set A.]
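A tiny Python sketch of alternating projections onto two convex sets (two lines in the plane, chosen purely for illustration); the iterates converge to the intersection:

    # A = the x-axis, B = the line y = x; they meet at the origin
    def proj_A(pt):
        # orthogonal projection onto the x-axis
        return (pt[0], 0.0)

    def proj_B(pt):
        # orthogonal projection onto the line y = x
        m = (pt[0] + pt[1]) / 2
        return (m, m)

    pt = (3.0, 1.0)   # arbitrary starting point
    for _ in range(25):
        pt = proj_B(proj_A(pt))
    print(pt)   # approaches (0, 0), a point in the intersection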
R(D) = \min_{p(\hat{x}|x):\ \sum p(x) p(\hat{x}|x) d(x,\hat{x}) \le D} I(X, \hat{X})

     = \min_{p(\hat{x}|x):\ \sum p(x) p(\hat{x}|x) d(x,\hat{x}) \le D} \ \sum_{x,\hat{x}} p(x) p(\hat{x}|x) \log \frac{p(\hat{x}|x)}{p(\hat{x})}

     = \min_{p(\hat{x}|x):\ \sum p(x) p(\hat{x}|x) d(x,\hat{x}) \le D} D\big( p(x) p(\hat{x}|x) \,\|\, p(x) p(\hat{x}) \big)

     = \min_{q(\hat{x})} \ \min_{p(\hat{x}|x):\ \sum p(x) p(\hat{x}|x) d(x,\hat{x}) \le D} D\big( p(x) p(\hat{x}|x) \,\|\, p(x) q(\hat{x}) \big)
Blahut-Arimoto
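A Python sketch of the Blahut-Arimoto alternation for the double minimisation above (the iteration count and the test parameters are arbitrary choices): for each Lagrange parameter s > 0, alternate between the optimal test channel and the optimal output marginal, producing one (D, R) point on the curve:

    import math

    def blahut_arimoto(p_x, d, s, iters=500):
        # One (D, R) point on the rate-distortion curve for slope parameter s > 0
        nx, nxh = len(p_x), len(d[0])
        q = [1.0 / nxh] * nxh   # initial output marginal q(xhat)
        for _ in range(iters):
            # optimal channel for fixed q: p(xhat|x) ~ q(xhat) exp(-s d(x, xhat))
            w = [[q[j] * math.exp(-s * d[i][j]) for j in range(nxh)] for i in range(nx)]
            pc = [[w[i][j] / sum(w[i]) for j in range(nxh)] for i in range(nx)]
            # optimal marginal for fixed channel: q(xhat) = sum_x p(x) p(xhat|x)
            q = [sum(p_x[i] * pc[i][j] for i in range(nx)) for j in range(nxh)]
        D = sum(p_x[i] * pc[i][j] * d[i][j] for i in range(nx) for j in range(nxh))
        R = sum(p_x[i] * pc[i][j] * math.log2(pc[i][j] / q[j])
                for i in range(nx) for j in range(nxh) if pc[i][j] > 0)
        return D, R

    # Binary source with P(X = 1) = 0.2, Hamming distortion:
    # the points should fall on R(D) = H(0.2) - H(D)
    d = [[0, 1], [1, 0]]
    for s in (2.0, 4.0, 8.0):
        D, R = blahut_arimoto([0.8, 0.2], d, s)
        print(round(D, 4), round(R, 4))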