
Technion Course 048703 - Spring 2008:

Noise Removal - An Information Theoretic View


Lecture Report: Denoising of Analog Data
Lecture by Tsachy Weissman
Report by Renata Goldman
Lecture 7 (15/06/2008)
Covered Topics: Universal Denoising
Kernel Density Estimation based techniques
Context Quantization based techniques
1 Lecture Summary
In this lecture we continue to explore the extension of the DUDE to the setting
of denoising of analog data. Before addressing the main topic, we bring up the
definition of the Lévy distance, which will be useful for the analysis.
Definition (Lévy distance): The Lévy distance is a metric $d_L$ on the space of
distribution functions $F, G : \mathbb{R} \to [0, 1]$ of one-dimensional random
variables, defined by
$$d_L(F, G) \triangleq \inf\{\varepsilon > 0 : F(x - \varepsilon) - \varepsilon \le G(x) \le F(x + \varepsilon) + \varepsilon, \ \forall x\}.$$
If between the graphs of $F$ and $G$ one inscribes squares with sides parallel
to the coordinate axes (at points of discontinuity of a graph, vertical segments
are added), then the side of the largest of them is equal to $d_L(F, G)$ [2].
The definition of the Lévy metric carries over to the set $\mathcal{M}$ of all
non-decreasing functions on $\mathbb{R}$ (infinite values of the metric being allowed).
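The defining condition suggests a direct numerical approximation (our illustration, not part of the lecture): scan a grid of candidate $\varepsilon$ values and return the smallest one for which the sandwich condition holds on a grid of points. All names and grids below are illustrative choices.

```python
import numpy as np
from scipy.stats import norm

def levy_distance(F, G, xs, eps_grid):
    """Grid approximation of the Levy distance between CDFs F and G.

    Returns the smallest eps in eps_grid (assumed increasing) such that
    F(x - eps) - eps <= G(x) <= F(x + eps) + eps for all x in xs.
    """
    Gx = G(xs)
    for eps in eps_grid:
        if np.all(F(xs - eps) - eps <= Gx) and np.all(Gx <= F(xs + eps) + eps):
            return eps
    return eps_grid[-1]  # condition not satisfied within the grid

# Example: two Gaussian CDFs with slightly different means.
xs = np.linspace(-10, 10, 2001)
eps_grid = np.linspace(0.0, 1.0, 1001)
print(levy_distance(norm(0, 1).cdf, norm(0.3, 1).cdf, xs, eps_grid))
```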
Recalling previous lectures, one view of the DUDE was to partition the data
into subsequences according to context (contexts should not be too long, so that
we may still infer some statistics from their recurrence in the data, yet not too
short, so that we may decide effectively when to correct an error), and then to
compete with a "symbol-by-symbol genie" who knows both the clean input and the
noisy output and chooses the best sliding-window denoising rule based on them.
The extension of this view to the analog setup requires some method of
quantization of the context (so that we may infer the statistics of its appearance
in the data) and a competition with a "less powerful genie" who knows the
noisy sequence but only the empirical distribution of the input (given analog data,
the probability that the same sequence repeats itself is zero, so it would be
impossible to compete with a genie who knows the input sequence exactly).
At this point it is useful to define some operators that will be required for
the proposed solution. For a given distribution $P$, let $\Gamma P$ be the output
distribution of the channel with transition kernel $\Pi$ when the input has
distribution $P$:
$$G = \Gamma P \iff g(z) = \int \Pi(x, z)\, dP(x) \quad \left(= E\{\Pi(X, z)\}, \ X \sim P\right).$$
Define the set $S_\Gamma = \{\Gamma P : P \text{ is a CDF}\}$. Notice that on
$S_\Gamma$, the set of all possible output distributions, the operator $\Gamma$
is invertible ($\Gamma^{-1}$ exists).
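To make the operator concrete, here is a small numerical sketch (our addition; the additive Gaussian channel and all names are illustrative assumptions): for a discrete input distribution with mass points $x_j$ and weights $p_j$, the output density is simply $g(z) = \sum_j p_j \Pi(x_j, z)$.

```python
import numpy as np

def gaussian_kernel(x, z, sigma=1.0):
    """Transition kernel Pi(x, z) of an additive Gaussian noise channel."""
    return np.exp(-(z - x) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

def gamma_operator(mass_points, weights, z, kernel=gaussian_kernel):
    """Output density g(z) = sum_j p_j * Pi(x_j, z) of a discrete input P."""
    return sum(p * kernel(x, z) for x, p in zip(mass_points, weights))

# Example: input takes values -1 and +1 with equal probability.
z = np.linspace(-5, 5, 11)
print(gamma_operator([-1.0, 1.0], [0.5, 0.5], z))
```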
The problem is then set up according to the following premises:
1. Bounded input alphabet; same estimator alphabet; output alphabet over
all real numbers: $\mathcal{X} = \hat{\mathcal{X}} = [-B, B]$, $\mathcal{Z} = \mathbb{R}$.
2. Unknown clean input sequence.
3. Known memoryless channel, characterized by a transition kernel $\Pi$,
where $\Pi(x, \cdot)$ is the output density when the input is $x$, such that:
(a) given the input sequence, the output samples are independently distributed;
(b) $\{\Pi(x, \cdot)\}_{x \in [-B, B]}$ is uniformly continuous under $\ell_1$;
(c) $\Gamma^{-1}$ is uniformly continuous under the Lévy metric;
(d) $\{\Pi(x, \cdot)\}_x$ is uniformly tight, which means that $\forall \varepsilon > 0$ there exists $B_\varepsilon$ such that $\int_{-B_\varepsilon}^{B_\varepsilon} \Pi(x, z)\, dz \ge 1 - \varepsilon$ for all $x$.
4. The loss function $\Lambda(x, \hat{x}) \ge 0$ is uniformly continuous, with $\Lambda(x, x) = 0$.
Figure 1: Problem Setup
Before we continue our analysis, let us denote by $P \otimes \Pi$ the joint
distribution of the channel input $X \sim P$ and the corresponding output $Z$
(whose marginal is $\Gamma P$). The conditional distribution of $X$ given $Z = z$
induced by $P \otimes \Pi$ is denoted $[P \otimes \Pi]_{X|Z=z}$.
Finally, let $P_{\text{emp}}(X^n)$ be the empirical distribution of the clean input sequence.
For $U(P) = \min_{\hat{x}} \int \Lambda(x, \hat{x})\, dP(x)$, the best estimator will be
$$\hat{X}_{\text{Bayes}}(P) = \arg\min_{\hat{x}} \int \Lambda(x, \hat{x})\, dP(x).$$
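As an illustration (our addition, not from the lecture), the Bayes response can be approximated by a grid search over candidate estimates for any loss $\Lambda$ and any discrete $P$; all names below are hypothetical.

```python
import numpy as np

def bayes_response(loss, mass_points, weights, candidates):
    """X_Bayes(P): the candidate x_hat minimizing sum_j p_j * loss(x_j, x_hat)
    for a discrete distribution P given by mass points and weights."""
    risks = [sum(p * loss(x, x_hat) for x, p in zip(mass_points, weights))
             for x_hat in candidates]
    return candidates[int(np.argmin(risks))]

# Example: under squared-error loss the Bayes response is the mean of P.
sq_loss = lambda x, x_hat: (x - x_hat) ** 2
print(bayes_response(sq_loss, [-1.0, 0.0, 2.0], [0.2, 0.5, 0.3],
                     np.linspace(-2, 2, 401)))  # ~0.4
```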
The goal of this lecture is to examine Context Quantization based techniques
for universal denoising. These techniques consist of building a sequence of denoisers
$\{\hat{X}^n_{\text{CQUDE}}\}_{n \ge 1}$ which are:
universal: $\lim_{n \to \infty} E\left[L_{\hat{X}^n_{\text{CQUDE}}}(X^n, Z^n)\right] = D(X, \Pi)$ for every stationary $X$, where $L_{\hat{X}^n}(X^n, Z^n) = \frac{1}{n}\sum_{i=1}^n \Lambda\left(X_i, \hat{X}^n(Z^n)[i]\right)$ is the normalized cumulative loss and $D(X, \Pi)$ is the best asymptotic performance attainable with knowledge of the distribution of $X$ (the denoisability);
(logically) implementable.
The principle of construction of this sequence of denoisers is to divide the noisy
output $Z^n$ into subsequences through context quantization, and to compete with
a "symbol-by-symbol genie" within every sub-group. In our case, the "genie" knows
the empirical distribution of the source, $P_{\text{emp}}(X^n)$.
For the competition with the "symbol-by-symbol genie", let us assume a
semi-stochastic setting ($X^n$ is an individual, deterministic sequence). Define:
$$D_0(X^n) = \min_{\psi : \mathbb{R} \to \mathbb{R}} E\left[\frac{1}{n} \sum_{i=1}^n \Lambda(X_i, \psi(Z_i))\right].$$
It may be shown (Exercise 3a) that the minimum is achieved by $\psi(z) = \hat{X}_{\text{Bayes}}\left([P_{\text{emp}}(X^n) \otimes \Pi]_{X|Z=z}\right)$; for squared-error loss this is simply the posterior mean $E[X \mid Z = z]$ under the joint distribution $P_{\text{emp}}(X^n) \otimes \Pi$.
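To make the genie's rule concrete (our sketch, not from the lecture): for a discrete input distribution, the posterior $[P \otimes \Pi]_{X|Z=z}$ has weights proportional to $p_j \Pi(x_j, z)$, and under squared-error loss the Bayes response is the posterior mean. The channel and all names below are illustrative assumptions.

```python
import numpy as np

def posterior(mass_points, weights, kernel, z):
    """Posterior [P x Pi]_{X|Z=z}: weights w_j proportional to p_j * Pi(x_j, z)."""
    w = np.array([p * kernel(x, z) for x, p in zip(mass_points, weights)])
    return w / w.sum()

def genie_rule(mass_points, weights, kernel, z):
    """psi(z) under squared-error loss: the posterior mean E[X | Z = z]."""
    w = posterior(mass_points, weights, kernel, z)
    return float(np.dot(w, mass_points))

# Example: binary input {-1, +1} through Gaussian noise with sigma = 0.5;
# the rule reduces to tanh(z / sigma^2), about 0.834 at z = 0.3.
gauss = lambda x, z, s=0.5: np.exp(-(z - x) ** 2 / (2 * s * s)) / (s * np.sqrt(2 * np.pi))
print(genie_rule([-1.0, 1.0], [0.5, 0.5], gauss, z=0.3))
```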
We conclude that a denoiser will need an estimate of $P_{\text{emp}}(X^n)$ in order
to compete well with this genie. Let us use Kernel Density Estimation on the output:
$$\hat{f}_Z(z) = \frac{1}{nh} \sum_{i=1}^n K\left(\frac{z - Z_i}{h}\right),$$
where the $Z_i$ are independent (due to the semi-stochastic setting), with $Z_i \sim \Pi(X_i, \cdot)$. Using the operator $\Gamma$ defined earlier, the output density may be described as $f_Z = \Gamma P_{\text{emp}}(X^n)$, and $\hat{f}_Z(z)$ is a good estimator of it (in the sense made precise in Exercise 4). We may then approximate the empirical distribution of the clean input from the analog data $Z^n$ in the following fashion: let $\mathcal{F}_M$ be the family of discrete distributions with up to $M$ mass points on a uniform grid over $[-B, B]$. Define:
$$\hat{P}^{M}_{\text{emp}}(Z^n) = \arg\min_{P_X \in \mathcal{F}_M} \int \left|\hat{f}_Z - \Gamma P_X\right|.$$
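Exercise 5 asks to check that this is a convex optimization problem; indeed, after discretizing $z$, it becomes a linear program in the weight vector $p$ and per-point slack variables. A minimal sketch (our addition; the Gaussian channel, grids, and the stand-in for $\hat{f}_Z$ are all illustrative assumptions):

```python
import numpy as np
from scipy.optimize import linprog
from scipy.stats import norm

B, M, sigma = 1.0, 21, 0.5
x_grid = np.linspace(-B, B, M)          # candidate input mass points
z_grid = np.linspace(-4, 4, 161)        # output discretization
dz = z_grid[1] - z_grid[0]

# A[i, j] = Pi(x_j, z_i): output density at z_i when the input is x_j.
A = norm.pdf(z_grid[:, None], loc=x_grid[None, :], scale=sigma)

# Stand-in for the kernel density estimate f_hat (output of a two-point input).
f_hat = 0.5 * norm.pdf(z_grid, -0.5, sigma) + 0.5 * norm.pdf(z_grid, 0.5, sigma)

# Minimize sum_i t_i * dz subject to -t <= A p - f_hat <= t, p >= 0, sum p = 1.
n_z = len(z_grid)
c = np.concatenate([np.zeros(M), dz * np.ones(n_z)])
A_ub = np.block([[A, -np.eye(n_z)], [-A, -np.eye(n_z)]])
b_ub = np.concatenate([f_hat, -f_hat])
A_eq = np.concatenate([np.ones(M), np.zeros(n_z)])[None, :]
res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
              bounds=[(0, None)] * (M + n_z))
p_hat = res.x[:M]                       # estimated input weights
print("L1 fit:", res.fun, "heavy mass points:", x_grid[p_hat > 0.05])
```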
Theorem: If $M = M_n \xrightarrow[n \to \infty]{} \infty$, then
$$d_L\left(\hat{P}^{M}_{\text{emp}}(Z^n),\ P_{\text{emp}}(X^n)\right) \xrightarrow[n \to \infty]{} 0 \quad \text{a.s.}, \ \forall X,$$
where $\mathcal{F}_M \circ P$ denotes the distribution in $\mathcal{F}_M$ closest to $P$ in the Lévy metric (this projection appears in the proof below).
Proof: It is sufficient to prove that
$$\varepsilon_n \triangleq d_L\left(\Gamma \hat{P}^{M}_{\text{emp}}(Z^n),\ \Gamma P_{\text{emp}}(X^n)\right) \xrightarrow[n \to \infty]{} 0 \quad \text{a.s.}, \ \forall X.$$
This is due to $\Gamma^{-1}$ being uniformly continuous, with some modulus of continuity $\delta(\cdot)$, $\delta(\varepsilon) \xrightarrow[\varepsilon \to 0]{} 0$: then
$$d_L\left(\hat{P}^{M}_{\text{emp}}(Z^n),\ P_{\text{emp}}(X^n)\right) = d_L\left(\Gamma^{-1}\Gamma\hat{P}^{M}_{\text{emp}}(Z^n),\ \Gamma^{-1}\Gamma P_{\text{emp}}(X^n)\right) \le \delta(\varepsilon_n) \xrightarrow[n \to \infty]{} 0.$$
Since $d_L(F, G) \le \int |f - g|$ whenever $F$ and $G$ have densities $f$ and $g$ (Exercise 1), it suffices to bound the $\ell_1$ distance between the corresponding densities:
$$\begin{aligned}
\int \left|\Gamma \hat{P}^{M}_{\text{emp}}(Z^n) - \Gamma P_{\text{emp}}(X^n)\right|
&\le \int \left|\Gamma \hat{P}^{M}_{\text{emp}}(Z^n) - \hat{f}_Z\right| + \int \left|\hat{f}_Z - \Gamma P_{\text{emp}}(X^n)\right| \\
&\le \int \left|\Gamma\left(\mathcal{F}_M \circ P_{\text{emp}}(X^n)\right) - \hat{f}_Z\right| + \int \left|\hat{f}_Z - \Gamma P_{\text{emp}}(X^n)\right| \\
&\le \int \left|\Gamma\left(\mathcal{F}_M \circ P_{\text{emp}}(X^n)\right) - \Gamma P_{\text{emp}}(X^n)\right| + 2\int \left|\Gamma P_{\text{emp}}(X^n) - \hat{f}_Z\right| \\
&\le \delta'\left(d_L\left(\mathcal{F}_M \circ P_{\text{emp}}(X^n),\ P_{\text{emp}}(X^n)\right)\right) + 2\int \left|\Gamma P_{\text{emp}}(X^n) - \hat{f}_Z\right| \\
&\le \delta'\left(\frac{2B}{M}\right) + 2\int \left|\Gamma P_{\text{emp}}(X^n) - \hat{f}_Z\right| \xrightarrow[n \to \infty]{} 0 \quad \text{a.s.}, \ \forall X.
\end{aligned}$$
Here the second inequality holds because $\hat{P}^{M}_{\text{emp}}(Z^n)$ minimizes $\int |\Gamma P_X - \hat{f}_Z|$ over $\mathcal{F}_M$ while $\mathcal{F}_M \circ P_{\text{emp}}(X^n) \in \mathcal{F}_M$; the third is the triangle inequality; the fourth uses the uniform continuity of $\Gamma$ from the Lévy metric to $\ell_1$ (Exercise 2), with modulus $\delta'$; the fifth uses $d_L(\mathcal{F}_M \circ P, P) \le 2B/M$ (Exercise 6); and the remaining term vanishes almost surely by the consistency of the kernel density estimate (Exercise 4).
2 Appendix
2.1 Kernel Density Estimation
Definition (Kernel density estimation): Kernel density estimation is a
way of estimating the probability density function of a random variable:
given data from a sample of a population, it allows extrapolating the density
to the entire population. A histogram can be thought of as a collection of point
samples from a kernel density estimate in which the kernel is a uniform box the
width of the histogram bin.
If $x_1, x_2, \ldots, x_N \sim f$ is an i.i.d. sample of a random variable, then the kernel
density approximation of its probability density function is
$$\hat{f}_h(x) = \frac{1}{Nh} \sum_{i=1}^N K\left(\frac{x - x_i}{h}\right),$$
where $K$ is some kernel and $h$ is the bandwidth (smoothing parameter).
Additional definition (Kernel): A kernel is a weighting function used in
non-parametric estimation techniques. It is a non-negative, real-valued,
integrable function $K(\cdot)$ satisfying the following two requirements:
$$\int_{-\infty}^{+\infty} K(u)\, du = 1, \qquad K(-u) = K(u) \quad \forall u.$$
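A minimal implementation with a Gaussian kernel (our sketch; the sample and bandwidth are illustrative choices):

```python
import numpy as np

def kde(samples, x, h):
    """Kernel density estimate f_h(x) = 1/(N h) * sum_i K((x - x_i)/h),
    using the Gaussian kernel K(u) = exp(-u^2/2) / sqrt(2*pi)."""
    samples = np.asarray(samples, dtype=float)
    x = np.atleast_1d(np.asarray(x, dtype=float))
    u = (x[:, None] - samples[None, :]) / h
    K = np.exp(-u ** 2 / 2) / np.sqrt(2 * np.pi)
    return K.sum(axis=1) / (samples.size * h)

# Example: bimodal sample; evaluate the estimate at a few points.
rng = np.random.default_rng(0)
samples = np.concatenate([rng.normal(-2, 1, 500), rng.normal(2, 1, 500)])
print(kde(samples, [-2.0, 0.0, 2.0], h=0.3))
```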
2.2 Quantization
Definition (quantization): Quantization is the process of approximating a
continuous range of values (or a very large set of possible discrete values)
by a relatively small set of discrete symbols or integer values. The input
to a quantizer is the original data, and the output is always one of a
finite number of levels. The function used in this process has a discrete,
usually finite, set of output values and is called the quantizer. Clearly,
since this is a process of approximation, a good quantizer is one which
represents the original signal with minimum loss or distortion.
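A uniform scalar quantizer of the kind used for context quantization can be sketched as follows (our illustration; the parameters are arbitrary):

```python
import numpy as np

def uniform_quantize(x, B, M):
    """Map x in [-B, B] to the nearest of M uniformly spaced levels.
    Returns (index, level); values outside [-B, B] are clipped."""
    levels = np.linspace(-B, B, M)
    idx = int(np.argmin(np.abs(levels - np.clip(x, -B, B))))
    return idx, levels[idx]

# Example: quantize a context value to one of 8 levels on [-1, 1].
print(uniform_quantize(0.37, B=1.0, M=8))
```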
3 Proposed Exercises
1. Show that if $F$ and $G$ have densities $f$ and $g$, respectively, then $d_L(F, G) \le \int |f - g|$.
2. Show that the transition kernel $\{\Pi(x, \cdot)\}_x$ is uniformly continuous under $\ell_1$ if and only if the operator $\Gamma$ is uniformly continuous (from the Lévy metric to $\ell_1$). In other words, show that $\int |\Gamma F - \Gamma G| \le \delta(d_L(F, G))$, where $\delta(\varepsilon) \xrightarrow[\varepsilon \to 0]{} 0$.
3. Show that:
(a) the minimum in $D_0(X^n)$ is achieved by $\psi(z) = \hat{X}_{\text{Bayes}}\left([P_{\text{emp}}(X^n) \otimes \Pi]_{X|Z=z}\right)$;
(b) if $d_L\left(\hat{P}_{\text{emp}}(Z^n),\ P_{\text{emp}}(X^n)\right) \xrightarrow[n \to \infty]{} 0$ a.s. $\forall X$, where $\hat{P}_{\text{emp}}(Z^n)$ is the estimate of the input empirical distribution based on the output data, and if $\hat{X}^n(Z^n)[i] = \hat{X}_{\text{Bayes}}\left([\hat{P}_{\text{emp}}(Z^n) \otimes \Pi]_{X|Z=Z_i}\right)$, then
$$\lim_{n \to \infty} \left[L_{\hat{X}^n}(X^n, Z^n) - D_0(X^n)\right] = 0 \quad \text{a.s.}, \ \forall X.$$
4. Prove that $\hat{f}_Z(z)$ is a good estimator of the output density in the following sense: let $J_n = \int \left|\hat{f}_Z - \Gamma P_{\text{emp}}(X^n)\right|$; prove that if $h_n \xrightarrow[n \to \infty]{} 0$ and $n h_n \xrightarrow[n \to \infty]{} \infty$, then $\forall \varepsilon > 0$ there exists $r > 0$ such that $P(J_n > \varepsilon) \le e^{-nr}$, $\forall n, X$.
5. Check that $\hat{P}^{M}_{\text{emp}}(Z^n) = \arg\min_{P_X \in \mathcal{F}_M} \int \left|\hat{f}_Z - \Gamma P_X\right|$ is a convex optimization problem.
6. Show that $d_L(\mathcal{F}_M \circ P, P) \le \frac{2B}{M}$ for all $P \in \mathcal{M}([-B, B])$, where $\mathcal{M}([-B, B])$ is the set of all distributions with support contained in $[-B, B]$.
References
[1] Lecture by Tsachy Weissman at Technion - Israel Institute of Technology, June 15th, 2008.
[2] Encyclopedia of Mathematics, "Lévy metric" (http://eom.springer.de/l/l058310.htm).
[3] Seminar: "Discrete to Analog and Back: The DUDE Framework for Denoising Discrete and Analog Data" by Tsachy Weissman at Technion - Israel Institute of Technology, June 11th, 2008.
[4] Wikipedia, the free encyclopedia (http://en.wikipedia.org).
