Computational Aspects in Statistical Signal Processing
D. Kundu
Article · January 2004
Signal processing may broadly be considered to involve the recovery of information from physical observations. The received signal is usually disturbed by thermal, electrical, atmospheric or intentional interferences. Since the received signal cannot be predicted deterministically, statistical methods are needed to describe it. The problem of detecting a signal in the midst of noise arises in a wide variety of areas such as communications, radio location of objects, seismic signal processing and computer-assisted medical diagnosis. Statistical signal processing is also used in many physical science applications, such as geophysics, acoustics, texture classification, voice recognition and meteorology, among many others. During the past fifteen years, tremendous advances have been achieved in digital technology in general and in statistical signal processing in particular. Several sophisticated statistical tools have been used quite effectively to solve various important signal processing problems, and various problem-specific algorithms have evolved for analyzing different types of signals; they are found to be quite effective in practice.

The main purpose of this chapter is to familiarize the statistical community with some of the signal processing problems where statisticians can play a very important role in providing satisfactory solutions. We provide examples of different statistical signal processing models and describe in detail the computational issues involved in estimating the parameters of one of the most important models, namely the undamped exponential model. The undamped exponential model plays a significant role in statistical signal processing and related areas.
372
Statistical Computing
We also mention that statistical signal processing is a huge area, and it is not possible to cover all the topics in this limited space; we provide several important references for further reading. To begin with, we give several examples to motivate the reader and to convey a feeling for the subject.
Example 1: In speech signal processing, the accurate determination of the resonant frequencies of the vocal tract (i.e. the formant frequencies) in various articulatory configurations is of interest, both in the synthesis and in the analysis of speech. In particular, for vowel sounds the formant frequencies play a dominant role in determining which vowel is produced by a speaker and which vowel is perceived by a listener. If it is assumed that the vocal tract is represented by a tube of varying cross-sectional area, and that the configuration of the vocal tract varies little during a time interval of one pitch period, then the pressure variation p(t) at the acoustic pick-up at the time point t can be written as
p(t) = \sum_{i=1}^{K} c_i e^{s_i t}

(see Fant 1960; Pinson 1963). Here the c_i and s_i are complex parameters. Thus, for a given sample of an individual pitch frame, it is important to estimate the amplitudes c_i and the poles s_i.
Example 2: In radioactive tracer studies, the analysis of data obtained from different biological experiments is of increasing importance. The tracer material may simultaneously be undergoing diffusion, excretion or interaction in any biological system. In order to analyze such a complex biological system it is often assumed that the biological system is a simple compartment model. Such a model is valid to the extent that the results calculated on its basis agree with those actually obtained in a real system. A further simplification can be made by assuming that the system is in a steady state, that is, the inflow of the stable material is equal to the outflow. It has been observed by Sheppard and Householder (1951) that under the above assumptions the specific activity at the time point t can be well approximated by
f(t) = \sum_{i=0}^{p} N_i e^{\lambda_i t},

where f(t) is the specific activity of the material at the time point t, the N_i are the amplitudes, the \lambda_i are the rate constants and p represents the number of components. It is important to have a method of estimating the N_i and \lambda_i, as well as p.
Example 3: Next consider a signal y(t) composed of M unknown damped
or undamped sinusoidal signals, closely spaced in frequency, i.e.,

y(t) = \sum_{k=1}^{M} A_k e^{-\delta_k t + j 2\pi\omega_k t},

where the A_k are amplitudes, the \delta_k (> 0) are the damping factors, the \omega_k are the frequencies and j = \sqrt{-1}. Given the signal y(t) at finitely many time points, the problem is to estimate the signal parameters, namely the A_k, \delta_k, \omega_k and M. This is called the spectral resolution problem and is very important in digital signal processing.
Example 4: In texture analysis it is observed that certain metal textures can be modeled by the following two-dimensional model:
y(m, n) = \sum_{k=1}^{P} \left( A_k \cos(m\lambda_k + n\mu_k) + B_k \sin(m\lambda_k + n\mu_k) \right),

where y(m, n) represents the gray level at the (m, n)-th location, the (\mu_k, \lambda_k) are the unknown two-dimensional frequencies and the A_k and B_k are unknown amplitudes. Given the two-dimensional noise-corrupted image, it is often required to estimate the unknown parameters A_k, B_k, \mu_k, \lambda_k and P. For different forms of textures, the reader is referred to the recent articles of Zhang and Mandrekar (2002) or Kundu and Nandi (2003).
Example 5: Consider a linear array of P sensors which receives signals, say x(t) = (x_1(t), \ldots, x_M(t)), from M sources at the time point t. The signals arrive at the array at angles \theta_1, \ldots, \theta_M with respect to the line of the array. One of the sensors is taken to be the reference element. The signals are assumed to travel through a medium that only introduces propagation delay. In this situation, the output at any of the sensors can be represented as a time-advanced or time-delayed version of the signal at the reference element. The output vector y(t) can in this case be written as
y(t) = A x(t) + \epsilon(t); \quad t = 1, \ldots, N,
where A is a P × M matrix of parameters which represents the time delays. The matrix A can be represented as

A = [a(\omega_1), \ldots, a(\omega_M)],

where

\omega_k = \pi \cos(\theta_k) \quad \text{and} \quad a(\omega_k) = [1, e^{-j\omega_k}, \ldots, e^{-j(P-1)\omega_k}]^T.
With suitable assumptions on the distributions of x(\cdot) and \epsilon(\cdot), given a sample at N time points, the problem is to estimate the number of sources M and also the directions of arrival \theta_1, \ldots, \theta_M.
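The array model above is straightforward to simulate. The following NumPy sketch builds the steering matrix A and generates noisy snapshots y(t) = A x(t) + ε(t); the number of sensors, the number of sources and the arrival angles are illustrative assumptions, not values from the text:

```python
import numpy as np

P, M, N = 6, 2, 200                       # sensors, sources, snapshots (assumed)
theta = np.deg2rad([60.0, 100.0])         # assumed directions of arrival
omega = np.pi * np.cos(theta)             # omega_k = pi * cos(theta_k)

# steering vectors a(w) = [1, e^{-jw}, ..., e^{-j(P-1)w}]^T, stacked as columns
A = np.exp(-1j * np.outer(np.arange(P), omega))   # P x M steering matrix

rng = np.random.default_rng(1)
# complex source signals x(t) and a small additive noise eps(t)
x = (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2)
eps = 0.01 * (rng.standard_normal((P, N)) + 1j * rng.standard_normal((P, N))) / np.sqrt(2)
y = A @ x + eps                           # y(t) = A x(t) + eps(t), t = 1..N
```

The first row of A is all ones because the first sensor is the reference element with zero delay.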
Throughout this chapter, scalar quantities are denoted by regular lower or upper case letters; lower and upper case bold typefaces are used for vectors and matrices. For a real matrix A, A^T denotes the transpose of the matrix A, and for a complex matrix A, A^H denotes the complex conjugate transpose. An n × n diagonal matrix with diagonal entries \lambda_1, \ldots, \lambda_n is denoted by diag\{\lambda_1, \ldots, \lambda_n\}. If A is an m × n real matrix of full column rank, then the projection matrix on the column space of A is denoted by P_A = A(A^T A)^{-1} A^T.
We need the following definition and one matrix theory result. For detailed discussions of matrix theory, the reader is referred to Rao (1973), Marple (1987) and Davis (1979).
Definition 1: An n × n matrix J is called a reflection (or exchange) matrix if

J = \begin{bmatrix}
0 & 0 & \cdots & 0 & 1 \\
0 & 0 & \cdots & 1 & 0 \\
\vdots & \vdots & \iddots & \vdots & \vdots \\
0 & 1 & \cdots & 0 & 0 \\
1 & 0 & \cdots & 0 & 0
\end{bmatrix},

that is, J has ones along the anti-diagonal and zeros everywhere else.
Result 1: (Spectral Decomposition) If an n × n matrix A is Hermitian, then all its eigenvalues are real and it is possible to find n normalized eigenvectors v_1, \ldots, v_n, corresponding to the n eigenvalues \lambda_1, \ldots, \lambda_n, such that

A = \sum_{i=1}^{n} \lambda_i v_i v_i^H.
If all the \lambda_i are nonzero, then it is immediate from Result 1 that

A^{-1} = \sum_{i=1}^{n} \frac{1}{\lambda_i} v_i v_i^H.
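Both identities are easy to check numerically. A minimal NumPy sketch, using a randomly generated Hermitian matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
A = B + B.conj().T                    # Hermitian by construction

lam, V = np.linalg.eigh(A)            # real eigenvalues, orthonormal eigenvectors

# Result 1: A = sum_i lambda_i v_i v_i^H
A_rec = sum(lam[i] * np.outer(V[:, i], V[:, i].conj()) for i in range(4))

# inverse formula: A^{-1} = sum_i (1/lambda_i) v_i v_i^H (all lambda_i nonzero here)
A_inv = sum((1 / lam[i]) * np.outer(V[:, i], V[:, i].conj()) for i in range(4))
```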
Now we present one important result used in statistical signal processing, known as Prony's method. It was originally proposed more than two hundred years ago by Prony (1795), a chemical engineer. It is described in several numerical analysis textbooks and papers; see for example Froberg (1969), Hildebrand (1956), Lanczos (1964) and Barrodale and Oleski (1981). Prony first observed that for arbitrary real constants \alpha_1, \ldots, \alpha_M and for distinct constants \beta_1, \ldots, \beta_M, if

\mu_i = \alpha_1 e^{\beta_1 i} + \ldots + \alpha_M e^{\beta_M i}; \quad i = 1, \ldots, n,   (14.1)

then there exist (M + 1) constants \{g_0, \ldots, g_M\} such that

\begin{bmatrix}
\mu_1 & \cdots & \mu_{M+1} \\
\vdots & \ddots & \vdots \\
\mu_{n-M} & \cdots & \mu_n
\end{bmatrix}
\begin{bmatrix} g_0 \\ \vdots \\ g_M \end{bmatrix}
=
\begin{bmatrix} 0 \\ \vdots \\ 0 \end{bmatrix}.   (14.2)
Note that without loss of generality we can always put a restriction on the g_i such that \sum_{i=0}^{M} g_i^2 = 1. The above n − M equations are called Prony's equations. Moreover, the roots of the polynomial equation

g_0 + g_1 z + \ldots + g_M z^M = 0   (14.3)
are e^{\beta_1}, \ldots, e^{\beta_M}. Therefore, there is a one-to-one correspondence between \{g_0, \ldots, g_M\} such that

\sum_{i=0}^{M} g_i^2 = 1, \quad g_0 > 0,   (14.4)

and the nonlinear parameters \{\beta_1, \ldots, \beta_M\}, as given above. It is also interesting to note that g_0, \ldots, g_M are independent of the linear parameters \alpha_1, \ldots, \alpha_M. One interesting question is how to recover \alpha_1, \ldots, \alpha_M and \beta_1, \ldots, \beta_M if \mu_1, \ldots, \mu_n are given.
It is clear that for given \mu_1, \ldots, \mu_n, the constants g_0, \ldots, g_M satisfy (14.2), and they can be easily recovered by solving the linear equations (14.2). From g_0, \ldots, g_M, by solving (14.3), \beta_1, \ldots, \beta_M can be obtained. Now to recover \alpha_1, \ldots, \alpha_M, we write (14.1) as
\mu = X\alpha,   (14.5)

where \mu = (\mu_1, \ldots, \mu_n)^T and \alpha = (\alpha_1, \ldots, \alpha_M)^T are n × 1 and M × 1 vectors respectively. The n × M matrix X is as follows:

X = \begin{bmatrix}
e^{\beta_1} & \cdots & e^{\beta_M} \\
\vdots & \ddots & \vdots \\
e^{n\beta_1} & \cdots & e^{n\beta_M}
\end{bmatrix}.

Therefore, \alpha = (X^T X)^{-1} X^T \mu; note that X^T X is a full rank matrix, since the \beta_i are distinct and M < n.
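The whole recovery scheme — solve (14.2) for g, root (14.3) to get the \beta_i, then solve (14.5) for \alpha by least squares — fits in a few lines of NumPy. This is a sketch for exact, noise-free \mu_i; the test values of \alpha and \beta at the end are illustrative assumptions:

```python
import numpy as np

def prony_fit(mu, M):
    """Recover alpha, beta from mu_i = sum_k alpha_k * exp(beta_k * i), i = 1..n."""
    n = len(mu)
    # Prony's equations (14.2): row t is [mu_t, ..., mu_{t+M}], t = 1..n-M
    H = np.column_stack([mu[k:n - M + k] for k in range(M + 1)])
    # g = unit-norm null vector of H (smallest right singular vector)
    g = np.linalg.svd(H)[2][-1]
    # roots of g0 + g1 z + ... + gM z^M are exp(beta_k), by (14.3);
    # np.roots expects the highest-degree coefficient first
    z = np.roots(g[::-1]) + 0j
    beta = np.log(z)
    # design matrix X of (14.5): X_{ik} = exp(beta_k * i)
    X = np.exp(np.arange(1, n + 1)[:, None] * beta[None, :])
    alpha = np.linalg.lstsq(X, mu, rcond=None)[0]
    return alpha, beta

# noise-free check with assumed values alpha = (2, 3), beta = (-0.5, -0.1)
i = np.arange(1, 21)
mu = 2 * np.exp(-0.5 * i) + 3 * np.exp(-0.1 * i)
alpha, beta = prony_fit(mu, 2)
```

With noisy data this plain null-vector step degrades quickly, which is one motivation for the more careful algorithms discussed later in the chapter.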
The estimation of the frequencies of sinusoidal components embedded in additive white noise is a fundamental problem in signal processing. It arises in many areas of signal detection. The problem can be written mathematically as follows:

y(n) = \sum_{i=1}^{M} A_i e^{j\omega_i n} + z(n),   (14.6)

where A_1, \ldots, A_M are unknown complex amplitudes, \omega_1, \ldots, \omega_M are unknown frequencies with \omega_i \in (0, 2\pi), and the z(n) are independent and identically distributed random variables with mean zero and finite variance \sigma^2/2
for both the real and imaginary parts. Given a sample of size N, namely y(1), \ldots, y(N), the problem is to estimate the unknown amplitudes, the unknown frequencies and sometimes also the order M. For a single sinusoid, or for multiple sinusoids whose frequencies are well separated, the periodogram function can be used as a frequency estimator and it provides an optimal solution. But for unresolvable frequencies the periodogram cannot be used, and in this case the optimal solution is not very practical either. The optimal solutions, namely the maximum likelihood estimators or the least squares estimators, involve heavy computation, and general purpose algorithms like the Gauss-Newton algorithm or the Newton-Raphson algorithm usually do not work. This has led to the introduction of several suboptimal solutions based on the eigenspace approach. Interestingly, though, most of the proposed methods use the idea of Prony's algorithm (as discussed in the previous section) in some way or other. It is observed that Prony's equations work in the complex model as well, and in the case of the undamped exponential model (14.6) there exists a symmetry relation between the Prony coefficients. These symmetry relations can be used quite effectively to develop efficient numerical algorithms.
14.3.1 Symmetry Relation
If we denote \mu_n = E(y(n)) = \sum_{i=1}^{M} A_i e^{j\omega_i n}, then it is clear that there exist constants g_0, \ldots, g_M, with \sum_{i=0}^{M} |g_i|^2 = 1, such that

\begin{bmatrix}
\mu_1 & \cdots & \mu_{M+1} \\
\vdots & \ddots & \vdots \\
\mu_{n-M} & \cdots & \mu_n
\end{bmatrix}
\begin{bmatrix} g_0 \\ \vdots \\ g_M \end{bmatrix}
=
\begin{bmatrix} 0 \\ \vdots \\ 0 \end{bmatrix}.   (14.7)
Moreover, z_1 = e^{j\omega_1}, \ldots, z_M = e^{j\omega_M} are the roots of the polynomial equation

P(z) = g_0 + g_1 z + \ldots + g_M z^M = 0.
Note that

|z_1| = \ldots = |z_M| = 1, \quad \bar{z}_i = z_i^{-1}; \quad i = 1, \ldots, M.
Define the polynomial Q(z) by

Q(z) = \overline{P(1/\bar{z})} = \bar{g}_0 + \bar{g}_1 z^{-1} + \ldots + \bar{g}_M z^{-M}.   (14.8)

Since \bar{z}_i = z_i^{-1}, conjugating P(z_i) = 0 shows that Q(z_i) = 0; hence by (14.8) the degree-M polynomial z^M Q(z) = \bar{g}_M + \ldots + \bar{g}_0 z^M has the same roots as P(z). Comparing the coefficients of the two polynomials yields
\frac{g_k}{g_M} = \frac{\bar{g}_{M-k}}{\bar{g}_0}; \quad k = 0, \ldots, M.
Taking k = 0 above gives |g_0| = |g_M|. Define

b_k = g_k \left( \frac{\bar{g}_0}{g_M} \right)^{1/2}; \quad k = 0, \ldots, M,

thus

b_k = \bar{b}_{M-k}; \quad k = 0, \ldots, M.   (14.9)

The condition (14.9) is the conjugate symmetry property and can be written compactly as

b = J\bar{b},

where b = (b_0, \ldots, b_M)^T and J is an exchange matrix as defined before. Therefore, without loss of generality it is possible to say that the vector g = (g_0, \ldots, g_M)^T with \sum_{i=0}^{M} |g_i|^2 = 1 which satisfies (14.7) also satisfies

g = J\bar{g}.

This will be used later on to develop efficient numerical algorithms to obtain estimates of the unknown parameters of the model (14.6).
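The conjugate-symmetry claim is easy to verify numerically: build the polynomial with roots e^{j\omega_k} for any distinct frequencies (the values below are arbitrary illustrative choices), remove the common phase factor, and check that g = J\bar{g}:

```python
import numpy as np

omega = np.array([0.5, 1.3, 2.2])      # arbitrary illustrative frequencies
c = np.poly(np.exp(1j * omega))        # polynomial coefficients, highest degree first
g = c[::-1]                            # ascending order: g0 + g1 z + ... + gM z^M
# roots on the unit circle imply g_k = e^{j*theta} * conj(g_{M-k}) for a common theta;
# multiplying g by e^{-j*theta/2} removes the phase factor
theta = np.angle(g[0] / np.conj(g[-1]))
g = g * np.exp(-1j * theta / 2)
g = g / np.linalg.norm(g)              # normalization: sum |g_i|^2 = 1
J = np.eye(len(g))[::-1]               # exchange matrix of Definition 1
# now g = J * conj(g), the symmetry relation used by the CML method below
```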
14.3.2 Least Squares Estimators and Their Properties
In this subsection, we consider the least squares estimators and provide their asymptotic properties. The model (14.6) is a nonlinear model, and therefore it is difficult to obtain theoretically any small sample properties of the estimators. The asymptotic variances of the least squares estimators attain the Cramer-Rao lower bound under the assumption that the errors are independently and identically distributed normal random variables. Therefore, it is reasonable to compare the variances of different estimators with the asymptotic variances of the least squares estimators.
The least squares estimators of the unknown parameters of the model (14.6) can be obtained by minimizing

Q(A, \omega) = \sum_{n=1}^{N} \left| y(n) - \sum_{i=1}^{M} A_i e^{j\omega_i n} \right|^2   (14.10)
with respect to A = (A_1, \ldots, A_M) = (A_{1R} + jA_{1C}, \ldots, A_{MR} + jA_{MC}) and \omega = (\omega_1, \ldots, \omega_M), where A_{kR} and A_{kC} denote the real and imaginary parts of A_k respectively. The least squares estimator of \theta = (A_{1R}, A_{1C}, \omega_1, \ldots, A_{MR}, A_{MC}, \omega_M) will be denoted by \hat{\theta} = (\hat{A}_{1R}, \hat{A}_{1C}, \hat{\omega}_1, \ldots, \hat{A}_{MR}, \hat{A}_{MC}, \hat{\omega}_M). We use \hat{\sigma}^2 = \frac{1}{N} Q(\hat{A}, \hat{\omega}) as an estimator of \sigma^2. Unfortunately, the least squares estimators cannot be obtained very easily in this case. Most of the general purpose algorithms, for example the Gauss-Newton algorithm or the Newton-Raphson algorithm, do not work well here. The main reason is that the function Q(A, \omega) has many local minima, and therefore most general purpose algorithms tend to terminate at an early stage at a local minimum rather than
at the global minimum, even if the initial estimates are very good. We provide some special purpose algorithms to obtain the least squares estimators, but before that we state some of their properties. We introduce the following notation. Consider the 3M × 3M block diagonal matrix
D = \begin{bmatrix}
D_1 & 0 & \cdots & 0 \\
0 & D_2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & D_M
\end{bmatrix},

where each D_k is the 3 × 3 diagonal matrix

D_k = diag\{ N^{1/2}, N^{1/2}, N^{3/2} \}.

Also let

\Sigma = \begin{bmatrix}
\Sigma_1 & 0 & \cdots & 0 \\
0 & \Sigma_2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \Sigma_M
\end{bmatrix},
where each \Sigma_k is the 3 × 3 matrix

\Sigma_k = \begin{bmatrix}
\frac{1}{2} + \frac{3}{2} \frac{A_{kC}^2}{|A_k|^2} & -\frac{3}{2} \frac{A_{kR} A_{kC}}{|A_k|^2} & \frac{3 A_{kC}}{|A_k|^2} \\
-\frac{3}{2} \frac{A_{kR} A_{kC}}{|A_k|^2} & \frac{1}{2} + \frac{3}{2} \frac{A_{kR}^2}{|A_k|^2} & -\frac{3 A_{kR}}{|A_k|^2} \\
\frac{3 A_{kC}}{|A_k|^2} & -\frac{3 A_{kR}}{|A_k|^2} & \frac{6}{|A_k|^2}
\end{bmatrix}.
Now we can state the main result, from Rao and Zhao (1993) or Kundu and Mitra (1997).

Result 2: Under very minor assumptions, it can be shown that \hat{A}, \hat{\omega} and \hat{\sigma}^2 are strongly consistent estimators of A, \omega and \sigma^2 respectively. Moreover, (\hat{\theta} - \theta)D converges in distribution to a 3M-variate normal distribution with mean vector zero and covariance matrix \sigma^2 \Sigma, where D and \Sigma are as defined above.
In this subsection we discuss different iterative methods to compute the least squares estimators. The least squares estimators of the unknown parameters can be obtained by minimizing (14.10) with respect to A and \omega. The expression Q(A, \omega) defined in (14.10) can be written as

Q(A, \omega) = (Y - C(\omega)A)^H (Y - C(\omega)A).   (14.11)
Here the vector A is the same as defined in subsection 14.3.2, Y = (y(1), \ldots, y(N))^T, and C(\omega) is the N × M matrix

C(\omega) = \begin{bmatrix}
e^{j\omega_1} & \cdots & e^{j\omega_M} \\
\vdots & \ddots & \vdots \\
e^{jN\omega_1} & \cdots & e^{jN\omega_M}
\end{bmatrix}.
Observe that here the linear parameters A_i are separable from the nonlinear parameters \omega_i. For a fixed \omega, the minimization of Q(A, \omega) with respect to A is a simple linear regression problem: the value of A, say \hat{A}(\omega), which minimizes (14.11) is

\hat{A}(\omega) = (C(\omega)^H C(\omega))^{-1} C(\omega)^H Y.

Substituting \hat{A}(\omega) back into (14.11), we obtain

R(\omega) = Q(\hat{A}(\omega), \omega) = Y^H (I - P_C) Y,   (14.12)

where P_C = C(\omega)(C(\omega)^H C(\omega))^{-1} C(\omega)^H is the projection matrix on the space spanned by the columns of the matrix C(\omega). Therefore, the least squares estimator of \omega can be obtained by minimizing R(\omega) with respect to \omega. Since minimizing R(\omega) is a lower dimensional optimization problem than minimizing Q(A, \omega), it is more economical from the computational point of view. If \hat{\omega} minimizes R(\omega), then \hat{A}(\hat{\omega}), say \hat{A}, is the least squares estimator of A. Once (\hat{A}, \hat{\omega}) is obtained, the estimator of \sigma^2 becomes \hat{\sigma}^2 = \frac{1}{N} Q(\hat{A}, \hat{\omega}).
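As a small illustration of this separable least squares idea, the profile criterion R(ω) of (14.12) can be evaluated via ordinary least squares and, for a single sinusoid, minimized by a crude grid search. All numerical values below (true frequency, amplitude, noise level) are simulated assumptions:

```python
import numpy as np

def profile_R(omega, y):
    """R(omega) = Y^H (I - P_C) Y and the profiled amplitude A_hat(omega)."""
    n = np.arange(1, len(y) + 1)
    C = np.exp(1j * np.outer(n, omega))            # N x M matrix C(omega)
    A_hat, *_ = np.linalg.lstsq(C, y, rcond=None)  # (C^H C)^{-1} C^H Y
    r = y - C @ A_hat
    return np.real(r.conj() @ r), A_hat

# simulated data: one sinusoid at omega = 1.1 plus complex white noise
rng = np.random.default_rng(0)
N = 100
n = np.arange(1, N + 1)
y = (2 + 1j) * np.exp(1j * 1.1 * n) \
    + 0.1 * (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2)

# crude grid search over the one-dimensional profile criterion R(omega)
grid = np.linspace(0.05, 2 * np.pi - 0.05, 4000)
omega_hat = grid[np.argmin([profile_R(np.array([w]), y)[0] for w in grid])]
```

A grid search is viable only for well-separated frequencies and small M; the algorithms below exist precisely because this brute-force approach does not scale.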
Now we discuss how to minimize (14.12) with respect to \omega. It is clearly a nonlinear problem, and one would expect to use some standard nonlinear minimization algorithm like the Gauss-Newton algorithm, the Newton-Raphson algorithm or one of their variants to obtain the required solution. Unfortunately, it is well known that most of the standard algorithms do not work well in this situation. Usually they take a long time to converge even from good starting values, and sometimes they converge to a local minimum. For this reason, several special purpose algorithms have been proposed to solve this optimization problem, and we discuss them in the subsequent subsections.
14.4.1 Iterative Quadratic Maximum Likelihood Estimators
The Iterative Quadratic Maximum Likelihood (IQML) method was proposed by Bresler and Macovski (1986). This algorithm mainly suggests how to minimize R(\omega) with respect to \omega. Let us write R(\omega) as follows:

R(\omega) = Y^H (I - P_C) Y = Y^H P_G Y,   (14.13)
where P_G = G(G^H G)^{-1} G^H and

G = \begin{bmatrix}
\bar{g}_0 & 0 & \cdots & 0 \\
\bar{g}_1 & \bar{g}_0 & & \vdots \\
\vdots & \bar{g}_1 & \ddots & 0 \\
\bar{g}_M & \vdots & \ddots & \bar{g}_0 \\
0 & \bar{g}_M & & \vdots \\
\vdots & & \ddots & \vdots \\
0 & 0 & \cdots & \bar{g}_M
\end{bmatrix},   (14.14)

i.e. the k-th column of G contains \bar{g}_0, \ldots, \bar{g}_M in rows k through k + M, and zeros elsewhere.
Here G is an N × (N − M) matrix with G^H C(\omega) = 0; therefore I − P_C = P_G, which implies the second equality in (14.13). Moreover, because of the structure of the matrix G, R(\omega) can be written as

R(\omega) = Y^H P_G Y = g^H Y_D^H (G^H G)^{-1} Y_D g.

Here the vector g is the same as defined before, and Y_D is the (N − M) × (M + 1) data matrix

Y_D = \begin{bmatrix}
y(1) & \cdots & y(M+1) \\
\vdots & \ddots & \vdots \\
y(N-M) & \cdots & y(N)
\end{bmatrix}.
The minimization of (14.13) can be performed using the following algorithm.

Algorithm IQML

[1] Suppose at the k-th step the value of the vector g is g_{(k)}.

[2] Compute the matrix C^{(k)} = Y_D^H (G_{(k)}^H G_{(k)})^{-1} Y_D; here G_{(k)} is the matrix G in (14.14) obtained by replacing g with g_{(k)}.

[3] Solve the quadratic minimization problem

\min_{x : \|x\| = 1} x^H C^{(k)} x

and suppose the solution is g_{(k+1)}.

[4] Check the convergence condition \|g_{(k+1)} - g_{(k)}\| < \epsilon (some preassigned value). If the convergence condition is met, go to step [5]; otherwise set k = k + 1 and go to step [1].

[5] Put \hat{g} = (\hat{g}_0, \ldots, \hat{g}_M) = g_{(k+1)} and solve the polynomial equation

\hat{g}_0 + \hat{g}_1 z + \ldots + \hat{g}_M z^M = 0.   (14.15)

Obtain the M roots of (14.15) as \hat{\rho}_1 e^{j\hat{\omega}_1}, \ldots, \hat{\rho}_M e^{j\hat{\omega}_M}. Consider (\hat{\omega}_1, \ldots, \hat{\omega}_M) as the least squares estimators of (\omega_1, \ldots, \omega_M).
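A bare-bones NumPy rendering of these steps follows. It is a sketch, not a production implementation: the test signal is noise-free (two assumed frequencies), a fixed iteration count replaces the convergence check of step [4], and the smallest-eigenvector solution of step [3] uses the fact that C^{(k)} is Hermitian:

```python
import numpy as np

def iqml(y, M, iters=20):
    N = len(y)
    # data matrix Y_D: row t is [y(t+1), ..., y(t+M+1)]
    YD = np.column_stack([y[k:N - M + k] for k in range(M + 1)])
    g = np.ones(M + 1, dtype=complex) / np.sqrt(M + 1)   # crude starting value
    for _ in range(iters):
        # step [2]: build G from the current g, form C = Y_D^H (G^H G)^{-1} Y_D
        G = np.zeros((N, N - M), dtype=complex)
        for k in range(N - M):
            G[k:k + M + 1, k] = np.conj(g)
        C = YD.conj().T @ np.linalg.solve(G.conj().T @ G, YD)
        # step [3]: the minimizer of x^H C x over ||x|| = 1 is the eigenvector
        # belonging to the smallest eigenvalue of the Hermitian matrix C
        w, V = np.linalg.eigh(C)
        g = V[:, 0]
    # step [5]: frequencies from the roots of g0 + g1 z + ... + gM z^M
    roots = np.roots(g[::-1])
    return np.sort(np.angle(roots) % (2 * np.pi))

# noise-free demo: two undamped exponentials at omega = 1.0 and 2.0 (assumed)
n = np.arange(1, 51)
omega_hat = iqml(np.exp(1j * 1.0 * n) + np.exp(1j * 2.0 * n), M=2)
```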
14.4.2 Constrained Maximum Likelihood Method

In the IQML method the symmetric structure of the vector g is not used. The Constrained Maximum Likelihood (CML) method was proposed by Kannan and Kundu (1994). The basic idea of the CML method is to minimize Y^H P_G Y with respect to the vector g when it satisfies the symmetric structure (14.9). It is clear that R(\omega) can also be written as

R(\omega) = Y^H (I - P_C) Y = Y^H P_B Y,
where P_B = B(B^H B)^{-1} B^H and

B = \begin{bmatrix}
\bar{b}_0 & 0 & \cdots & 0 \\
\bar{b}_1 & \bar{b}_0 & & \vdots \\
\vdots & \bar{b}_1 & \ddots & 0 \\
\bar{b}_M & \vdots & \ddots & \bar{b}_0 \\
0 & \bar{b}_M & & \vdots \\
\vdots & & \ddots & \vdots \\
0 & 0 & \cdots & \bar{b}_M
\end{bmatrix}.   (14.16)
Moreover, the b_i satisfy the symmetry relation (14.9). Therefore the problem reduces to minimizing Y^H P_B Y with respect to b = (b_0, \ldots, b_M)^T such that b^H b = 1 and b_k = \bar{b}_{M-k} for k = 0, \ldots, M. The constrained minimization can be done in the following way. Let us write b = c + jd, where

c^T = (c_0, c_1, \ldots, c_{\frac{M}{2}-1}, c_{\frac{M}{2}}, c_{\frac{M}{2}-1}, \ldots, c_1, c_0),
d^T = (d_0, d_1, \ldots, d_{\frac{M}{2}-1}, 0, -d_{\frac{M}{2}-1}, \ldots, -d_1, -d_0),

if M is even, and

c^T = (c_0, c_1, \ldots, c_{\frac{M-1}{2}}, c_{\frac{M-1}{2}}, \ldots, c_1, c_0),
d^T = (d_0, d_1, \ldots, d_{\frac{M-1}{2}}, -d_{\frac{M-1}{2}}, \ldots, -d_1, -d_0),

if M is odd. The matrix B can be written as B = [\bar{b}^1, \ldots, \bar{b}^{N-M}], where each column (\bar{b}^i)^T is of the form [0, b, 0], i.e. the vector b padded with zeros. Let U and V denote the real and imaginary parts of the matrix B. Assuming that M is odd, B can be written as

B = \sum_{\alpha=0}^{\frac{M-1}{2}} (c_\alpha U_\alpha + j d_\alpha V_\alpha),

where the U_\alpha, V_\alpha, \alpha = 0, 1, \ldots, \frac{M-1}{2}, are N × (N − M) matrices with entries 0 and ±1 only. The minimization of Y^H P_B Y (as given in (14.16)) can be carried out by differentiating Y^H P_B Y with respect to c_\alpha and d_\alpha for \alpha = 0, 1, \ldots, \frac{M-1}{2}, which is equivalent to solving a matrix equation of the form

D(\tilde{c}, \tilde{d}) \begin{bmatrix} \tilde{c} \\ \tilde{d} \end{bmatrix} = 0,

where D is an (M + 1) × (M + 1) matrix. Here \tilde{c}^T = (c_0, \ldots, c_{\frac{M-1}{2}}) and \tilde{d}^T = (d_0, \ldots, d_{\frac{M-1}{2}}). The matrix D can be written in the partitioned form

D = \begin{bmatrix} \tilde{A} & \tilde{B} \\ \tilde{B}^H & \tilde{C} \end{bmatrix},   (14.17)

where \tilde{A}, \tilde{B}, \tilde{C} are all \frac{M+1}{2} × \frac{M+1}{2} matrices, with entries

\tilde{A}_{ik} = Y^H U_i (B^H B)^{-1} U_k^H Y - Y^H B (B^H B)^{-1} (U_i^T U_k + U_k^T U_i)(B^H B)^{-1} B^H Y + Y^H U_k (B^H B)^{-1} U_i^T Y,

\tilde{B}_{ik} = -j Y^H U_i (B^H B)^{-1} V_k^H Y - j Y^H B (B^H B)^{-1} (U_i^T V_k - V_k^T U_i)(B^H B)^{-1} B^H Y + j Y^H V_k (B^H B)^{-1} U_i^T Y,

\tilde{C}_{ik} = Y^H V_i (B^H B)^{-1} V_k^H Y - Y^H B (B^H B)^{-1} (V_i^T V_k + V_k^T V_i)(B^H B)^{-1} B^H Y + Y^H V_k (B^H B)^{-1} V_i^T Y,

for i, k = 0, 1, \ldots, \frac{M-1}{2}.
Similarly, when M is even, define \tilde{c}^T = (c_0, \ldots, c_{M/2}) and \tilde{d}^T = (d_0, \ldots, d_{M/2 - 1}). In this case also the matrix D can be partitioned as in (14.17), but here \tilde{A}, \tilde{B}, \tilde{C} are (\frac{M}{2} + 1) × (\frac{M}{2} + 1), (\frac{M}{2} + 1) × \frac{M}{2} and \frac{M}{2} × \frac{M}{2} matrices respectively. The (i, k)-th elements of \tilde{A}, \tilde{B}, \tilde{C} are the same as before, with the appropriate ranges of i and k. The matrix D is a real symmetric matrix in both cases, and we need to solve a matrix equation of the form

D(x)x = 0, \quad \text{subject to} \quad \|x\| = 1.   (14.18)
The constrained minimization problem is thus transformed into a real-valued nonlinear eigenvalue problem: if \hat{x} satisfies (14.18), then \hat{x} must be an eigenvector of the matrix D(\hat{x}). The CML method suggests the following iterative technique to solve (14.18):

\left( D(x^{(k)}) - \lambda^{(k+1)} I \right) x^{(k+1)} = 0, \quad \|x^{(k+1)}\| = 1,

where \lambda^{(k+1)} is the eigenvalue of D(x^{(k)}) closest to zero and x^{(k+1)} is the corresponding eigenvector. The iterative process should be stopped when |\lambda^{(k+1)}| is small compared to \|D\|, the largest eigenvalue of the matrix D. The CML estimators can be obtained using the following algorithm:
CML Algorithm:
[1] Suppose at the i-th step the value of the vector x is x^{(i)}. Normalize it: x^{(i)} = x^{(i)} / \|x^{(i)}\|.

[2] Calculate the matrix D(x^{(i)}).

[3] Find the eigenvalue \lambda^{(i+1)} of D(x^{(i)}) closest to zero and normalize the corresponding eigenvector x^{(i+1)}.

[4] Test convergence by checking whether |\lambda^{(i+1)}| < \epsilon \|D\| for a preassigned tolerance \epsilon.

[5] If the convergence condition in step [4] is not met, set i := i + 1 and go to step [2].
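The fixed-point iteration itself is generic: repeatedly take the eigenvector of D(x) whose eigenvalue is closest to zero. The sketch below demonstrates it on a toy problem with D(x) = A − (x^T A x)I (so that D(x)x = 0 holds exactly when x is an eigenvector of A); this toy D is an assumption for illustration, and the actual CML matrix of (14.17) would be substituted in practice:

```python
import numpy as np

def nonlinear_eig(D, x0, tol=1e-10, max_iter=100):
    """Solve D(x) x = 0, ||x|| = 1 by the eigenvalue-closest-to-zero iteration."""
    x = x0 / np.linalg.norm(x0)
    lam = np.inf
    for _ in range(max_iter):
        w, V = np.linalg.eigh(D(x))      # D(x) is real symmetric
        k = np.argmin(np.abs(w))         # eigenvalue closest to zero
        x, lam = V[:, k], w[k]           # eigh returns unit-norm eigenvectors
        if np.abs(lam) < tol * np.max(np.abs(w)):
            break                        # |lambda| small compared to ||D||
    return x, lam

# toy stand-in for the CML matrix: D(x) = A - (x^T A x) I
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])
D = lambda x: A - (x @ A @ x) * np.eye(3)
x, lam = nonlinear_eig(D, np.ones(3))
```

At the solution, x satisfies A x = (x^T A x) x, i.e. it is an eigenpair of A, as the toy construction requires.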
14.4.3 Expectation Maximization Algorithm
The expectation maximization (EM) algorithm, developed by Dempster, Laird and Rubin (1977), is a general method for solving the maximum likelihood estimation problem when the data are incomplete. Details on the EM algorithm can be found in Chapter 4. Although the EM algorithm was originally designed for incomplete data, it can also be used when the data are complete. The EM algorithm can be used quite effectively to estimate the unknown amplitudes and frequencies of the model (14.6); the method was proposed by Feder and Weinstein (1988). In developing the EM algorithm it is assumed that the errors are i.i.d. Gaussian random variables; this assumption can be relaxed, although we do not pursue that here. In formulating the EM algorithm, however, one needs to know the distribution of the error random variables.
For better understanding we explain the EM algorithm briefly here. Let Y denote the observed (possibly incomplete) data possessing the probability density function f_Y(y; \theta), indexed by the parameter vector \theta \in \Theta \subset R^k, and let X denote the 'complete' data vector related to Y by

H(X) = Y,

where H(\cdot) is a many-to-one noninvertible function. Therefore, the density function of X, say f_X(x; \theta), can be written as

f_X(x; \theta) = f_{X|Y=y}(x; \theta) f_Y(y; \theta) \quad \forall H(x) = y,   (14.19)

where f_{X|Y=y}(x; \theta) is the conditional probability density function of X given Y = y. Taking the logarithm on both sides of (14.19),

\ln f_Y(y; \theta) = \ln f_X(x; \theta) - \ln f_{X|Y=y}(x; \theta).