Random Variable
Definition
Numerical characterization of outcome of a random
event
Examples
1) Number on rolled dice
2) Temperature at specified time of day
3) Stock Market at close
4) Height of wheel going over a rocky road
Random Variable
Non-examples (but we can map these into RVs):
1) ‘Heads’ or ‘Tails’ on coin
2) Red or Black ball from urn
Two Types of Random Variables
Random Variable
Discrete RV: die, stocks. Continuous RV: temperature, wheel height.
PDF for Continuous RV
Given Continuous RV X…
What is the probability that X = x0 ?
Oddity : P(X = x0) = 0
Otherwise the Prob. “Sums” to infinity
Need to think of Prob. Density Function (PDF)
P(x_0 < X < x_0 + \Delta) = \text{area shown under } p_X(x) = \int_{x_0}^{x_0+\Delta} p_X(x)\,dx
Most Commonly Used PDF: Gaussian
p_X(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-m)^2/2\sigma^2}
and for the zero-mean case:
p_X(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-x^2/2\sigma^2}
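A minimal MATLAB sketch (not from the slides; the mean and sigma values are assumed for illustration) that evaluates this PDF and checks that it integrates to 1 and that ±1σ holds 68.3% of the probability:

% Minimal sketch: evaluate the Gaussian PDF and verify its area properties
m = 2; sigma = 1.5;                      % assumed example values
x = linspace(m - 6*sigma, m + 6*sigma, 10001);
p = exp(-(x - m).^2 / (2*sigma^2)) / (sigma*sqrt(2*pi));
total = trapz(x, p)                      % should be ~1
in1sig = (x >= m - sigma) & (x <= m + sigma);
area1sig = trapz(x(in1sig), p(in1sig))   % should be ~0.683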
Effect of Variance on Gaussian PDF
[Figure: three sketches of p_X(x) centered at x = m. The area within ±1σ of the mean is 0.683 = 68.3%. Small σ ⇒ small variability (small uncertainty); large σ ⇒ large variability (large uncertainty).]
Why Is Gaussian Used?
Central Limit theorem (CLT)
The sum of N independent RVs has a pdf
that tends to be Gaussian as N → ∞
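A minimal MATLAB sketch of the CLT (not from the slides; the choice of uniform RVs and the sample counts are assumptions): summing N independent uniforms and comparing the histogram to a Gaussian with matching mean and variance.

% Minimal sketch: histogram of a sum of N uniforms vs. the Gaussian limit
N = 30;                          % number of RVs in each sum
M = 1e5;                         % number of Monte Carlo trials
sums = sum(rand(N, M), 1);       % each column is one sum of N uniforms
histogram(sums, 100, 'Normalization', 'pdf'); hold on;
mu = N*0.5; sig = sqrt(N/12);    % mean/std of a sum of N U(0,1) RVs
x = linspace(min(sums), max(sums), 400);
plot(x, exp(-(x-mu).^2/(2*sig^2))/(sig*sqrt(2*pi)), 'r', 'LineWidth', 2);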
Joint PDF of RVs X and Y: p_{XY}(x, y)
Describes probabilities of joint events concerning X and Y. For example, the probability that X lies in interval [a,b] and Y lies in interval [c,d] is given by:
\Pr\{(a < X < b) \text{ and } (c < Y < d)\} = \int_a^b \int_c^d p_{XY}(x, y)\, dy\, dx
Conditional PDF ("slice and normalize"): holding y fixed and slicing the joint PDF gives
p_{X|Y=y}(x \mid y) = \frac{p_{XY}(x, y)}{p_Y(y)}
which equals p_X(x) when X and Y are independent.
Independent and Dependent Gaussian PDFs
[Figure: contours of p_{XY}(x,y). Independent case (non-zero mean): different slices give the same normalized curves. Dependent case: different slices give different normalized curves.]
An "Independent RV" Result
p_{XY}(x, y) = p_X(x)\, p_Y(y)
Here's why:
p_{Y|X=x}(y \mid x) = \frac{p_{XY}(x, y)}{p_X(x)} = \frac{p_X(x)\, p_Y(y)}{p_X(x)} = p_Y(y)
Characterizing RVs
PDF tells everything about an RV
– but sometimes they are “more than we need/know”
So… we make do with a few characteristics:
– Mean of an RV (Describes the centroid of PDF)
– Variance of an RV (Describes the spread of PDF)
– Correlation of RVs (Describes “tilt” of joint PDF)
Symbolically: E{X}
Motivating Idea of Mean of RV
Motivation First w/ “Data Analysis View”
Consider RV X = Score on a test Data: x1, x2,… xN
Test average:
\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i = \frac{N_0 V_0 + N_1 V_1 + \cdots + N_{100} V_{100}}{N} = \sum_{i=0}^{100} V_i\,\frac{N_i}{N}
where N_i = # of scores of value V_i, N = \sum_i N_i (total # of scores), and N_i/N \approx P(X = V_i).
This is called the Data Analysis View (statistics). But it motivates the Data Modeling View (probability).
Theoretical View of Mean
Data Analysis View leads to Probability Theory:
Data Modeling
For Discrete Random Variables:
E\{X\} = \sum_{i=1}^{n} x_i\, P_X(x_i) \quad (P_X \text{ is the probability function})
This motivates the form for a Continuous RV:
E\{X\} = \int_{-\infty}^{\infty} x\, p_X(x)\, dx \quad (p_X \text{ is the probability density function; } x \text{ is a dummy variable})
By the "Law of Large Numbers", the theoretical (data-modeling) mean matches the data-analysis average:
E\{X\} = \int_{-\infty}^{\infty} x\, p_X(x)\, dx \;\approx\; \bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i
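A minimal MATLAB sketch of this Law-of-Large-Numbers link (not from the slides; the Gaussian model and sample sizes are assumptions):

% Minimal sketch: sample average of N draws approaches the theoretical mean
mu = 3; sigma = 2;                  % assumed Gaussian example
Ns = round(logspace(1, 5, 20));     % sample sizes from 10 to 100000
avgs = zeros(size(Ns));
for k = 1:numel(Ns)
    x = mu + sigma*randn(Ns(k), 1); % N draws of X ~ N(mu, sigma^2)
    avgs(k) = mean(x);              % data-analysis view of the mean
end
semilogx(Ns, avgs, 'o-', Ns, mu*ones(size(Ns)), 'r--');
xlabel('N'); ylabel('sample average');   % converges to E{X} = mu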
Variance of RV
There are similar Data vs. Theory Views here…
But let's go right to the theory!!
\sigma_X^2 = E\{(X - m_X)^2\} = \int (x - m_X)^2\, p_X(x)\,dx
which reduces to \int x^2\, p_X(x)\,dx when the mean m_X = 0.
Motivating Idea of Correlation
Motivate First w/ Data Analysis View
Consider a random experiment that observes the
outcomes of two RVs:
Example: 2 RVs X and Y representing height and weight, respectively
[Figure: scatter plot of (x, y) data; height and weight form a positively correlated cloud.]
Illustrating 3 Main Types of Correlation
Data Analysis View: C_{xy} = \frac{1}{N}\sum_{i=1}^{N}(x_i - \bar{x})(y_i - \bar{y})
Data Modeling View (covariance):
\sigma_{XY} = E\{(X - \bar{X})(Y - \bar{Y})\} = \int\!\!\int (x - \bar{X})(y - \bar{Y})\, p_{XY}(x, y)\, dx\, dy
If X = Y: \sigma_{XY} = \sigma_X^2 = \sigma_Y^2
If \sigma_{XY} = E\{(X - \bar{X})(Y - \bar{Y})\} = 0, then E\{XY\} = \bar{X}\,\bar{Y} = E\{X\}E\{Y\}. (E\{XY\} is called the "correlation of X & Y".) Independence, p_{XY}(x,y) = p_X(x)p_Y(y), is sufficient for this.
Correlation Coefficient: \rho_{XY} = \frac{\sigma_{XY}}{\sigma_X \sigma_Y}, \qquad -1 \le \rho_{XY} \le 1
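A minimal MATLAB sketch (not from the slides; the correlation value 0.7 is an assumption) comparing the data-analysis covariance and correlation coefficient to the model values:

% Minimal sketch: sample covariance/correlation for correlated Gaussians
N = 1e5; rho = 0.7;                       % assumed true correlation
x = randn(N,1);
y = rho*x + sqrt(1 - rho^2)*randn(N,1);   % y correlated with x, unit variance
Cxy = mean((x - mean(x)).*(y - mean(y)))  % sample covariance
rho_hat = Cxy / (std(x)*std(y))           % sample correlation, ~0.7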
Covariance and Correlation For
Random Vectors…
\mathbf{x} = [X_1\; X_2\; \cdots\; X_N]^T
Correlation Matrix:
\mathbf{R}_x = E\{\mathbf{x}\mathbf{x}^T\} = \begin{bmatrix} E\{X_1 X_1\} & E\{X_1 X_2\} & \cdots & E\{X_1 X_N\} \\ E\{X_2 X_1\} & E\{X_2 X_2\} & \cdots & E\{X_2 X_N\} \\ \vdots & & \ddots & \vdots \\ E\{X_N X_1\} & E\{X_N X_2\} & \cdots & E\{X_N X_N\} \end{bmatrix}
Covariance Matrix:
\mathbf{C}_x = E\{(\mathbf{x} - \bar{\mathbf{x}})(\mathbf{x} - \bar{\mathbf{x}})^T\}
A Few Properties of Expected Value
E\{X + Y\} = E\{X\} + E\{Y\} \qquad E\{aX\} = aE\{X\} \qquad E\{f(X)\} = \int f(x)\, p_X(x)\,dx
\mathrm{var}\{aX\} = a^2\sigma_X^2 \qquad \mathrm{var}\{a + X\} = \sigma_X^2
\mathrm{var}\{X + Y\} = \sigma_X^2 + \sigma_Y^2 + 2\sigma_{XY}, \quad\text{which reduces to } \sigma_X^2 + \sigma_Y^2 \text{ if } X \text{ \& } Y \text{ are uncorrelated}
Derivation (using zero-mean versions X_z = X - \bar{X}, Y_z = Y - \bar{Y}):
\mathrm{var}\{X+Y\} = E\{(X + Y - \bar{X} - \bar{Y})^2\} = E\{(X_z + Y_z)^2\} = E\{X_z^2\} + E\{Y_z^2\} + 2E\{X_z Y_z\} = \sigma_X^2 + \sigma_Y^2 + 2\sigma_{XY}
Joint PDF for Gaussian
Let x = [X1 X2 … XN]T be a vector of random variables. These random variables
are said to be jointly Gaussian if they have the following PDF
p(\mathbf{x}) = \frac{1}{(2\pi)^{N/2}\sqrt{\det(\mathbf{C}_x)}}\exp\!\left[-\tfrac{1}{2}(\mathbf{x} - \boldsymbol{\mu}_x)^T \mathbf{C}_x^{-1}(\mathbf{x} - \boldsymbol{\mu}_x)\right]
A linear transform of jointly Gaussian RVs, \mathbf{y} = \mathbf{A}\mathbf{x} + \mathbf{b}, is also jointly Gaussian, with
\boldsymbol{\mu}_y = E\{\mathbf{y}\} = \mathbf{A}\boldsymbol{\mu}_x + \mathbf{b}
\mathbf{C}_y = E\{(\mathbf{y} - \boldsymbol{\mu}_y)(\mathbf{y} - \boldsymbol{\mu}_y)^T\} = \mathbf{A}\mathbf{C}_x\mathbf{A}^T
A special case of this is the sum of jointly Gaussian RVs… which can be handled using A = [1 1 1 … 1].
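A minimal MATLAB sketch (not from the slides; the mean, covariance, A, and b values are assumptions) that checks the transform rules by Monte Carlo:

% Minimal sketch: verify mu_y = A*mu_x + b and C_y = A*C_x*A'
N = 1e5;
mu_x = [1; 2];  C_x = [2 0.5; 0.5 1];   % assumed mean and covariance
A = [1 1; 2 -1];  b = [0; 3];
L = chol(C_x, 'lower');                 % factor so L*L' = C_x
x = repmat(mu_x, 1, N) + L*randn(2, N); % draws with mean mu_x, cov C_x
y = A*x + repmat(b, 1, N);
mu_y_hat = mean(y, 2)                   % ~ A*mu_x + b
C_y_hat = cov(y')                       % ~ A*C_x*A'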
Moments of Gaussian RVs
Let X be zero mean Gaussian with variance σ2
Let X1 X2 X3 X4 be any four jointly Gaussian random variables with zero mean
Then the fourth moment factors into pairs (the standard zero-mean Gaussian moment-factoring result):
E\{X_1 X_2 X_3 X_4\} = E\{X_1 X_2\}E\{X_3 X_4\} + E\{X_1 X_3\}E\{X_2 X_4\} + E\{X_1 X_4\}E\{X_2 X_3\}
In particular, E\{X^4\} = 3\sigma^4 for zero-mean Gaussian X.
Note that this can be applied to find E\{X^2 Y^2\} if X and Y are jointly Gaussian.
Chi-Squared Distribution
Let X_1, X_2, …, X_N be a set of zero-mean independent jointly Gaussian random variables, each with unit variance, and let Y = \sum_{n=1}^{N} X_n^2. Then Y is chi-squared with N degrees of freedom:
p(y) = \begin{cases} \dfrac{1}{2^{N/2}\Gamma(N/2)}\, y^{(N/2)-1} e^{-y/2}, & y \ge 0 \\ 0, & y < 0 \end{cases}
For this RV we have E\{Y\} = N and \mathrm{var}\{Y\} = 2N.
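A minimal MATLAB sketch (not from the slides; N and the trial count are assumptions) checking the chi-squared moments by simulation:

% Minimal sketch: sum of squares of N unit-variance Gaussians is chi-squared(N)
N = 5; M = 1e5;
Y = sum(randn(N, M).^2, 1);   % each column: one chi-squared(N) draw
mean(Y)                       % ~ N
var(Y)                        % ~ 2N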
Review of
Matrices and Vectors
Vectors & Vector Spaces
Definition of Vector: a collection of complex or real numbers, generally put in a column:
\mathbf{v} = \begin{bmatrix} v_1 \\ \vdots \\ v_N \end{bmatrix} = [v_1\; \cdots\; v_N]^T \quad (T \text{ denotes transpose})
Vector addition is element-by-element:
\mathbf{a} + \mathbf{b} = [a_1 + b_1\; \cdots\; a_N + b_N]^T
Definition of Scalar: a real or complex number.
Scalar multiplication is element-by-element: \alpha\mathbf{a} = [\alpha a_1\; \cdots\; \alpha a_N]^T
2. Associativity: (\mathbf{x} + \mathbf{y}) + \mathbf{z} = \mathbf{x} + (\mathbf{y} + \mathbf{z}); \quad \alpha(\beta\mathbf{x}) = (\alpha\beta)\mathbf{x}
3. Distributivity: \alpha(\mathbf{x} + \mathbf{y}) = \alpha\mathbf{x} + \alpha\mathbf{y}; \quad (\alpha + \beta)\mathbf{x} = \alpha\mathbf{x} + \beta\mathbf{x}
4. Scalar Unity & Scalar Zero: 1\mathbf{x} = \mathbf{x}; \quad 0\mathbf{x} = \mathbf{0}, where \mathbf{0} is the zero vector of all zeros
Definition of a Vector Space: A set V of N-dimensional vectors
(with a corresponding set of scalars) such that the set of vectors
is:
(i) “closed” under vector addition
(ii) “closed” under scalar multiplication
In other words:
• addition of vectors – gives another vector in the set
• multiplying a vector by a scalar – gives another vector in the set
Examples:
1. The space R2 is a subspace of R3.
2. Any plane in R3 that passes through the origin is a subspace
3. Any line passing through the origin in R2 is a subspace of R2
4. The set R2 is NOT a subspace of C2 because R2 isn’t closed
under complex scalars (a subspace must retain the original
space’s set of scalars)
Geometric Structure of Vector Space
Length of a Vector (Vector Norm): For any vector v in CN
we define its length (or “norm”) to be
\|\mathbf{v}\|_2 = \sqrt{\sum_{i=1}^{N} |v_i|^2}
Properties of the norm:
\|\alpha\mathbf{v}_1 + \beta\mathbf{v}_2\|_2 \le |\alpha|\,\|\mathbf{v}_1\|_2 + |\beta|\,\|\mathbf{v}_2\|_2 \quad\text{(triangle inequality)}
\|\mathbf{v}\|_2 < \infty \quad \forall\, \mathbf{v} \in C^N
\|\mathbf{v}\|_2 = 0 \text{ iff } \mathbf{v} = \mathbf{0}
Distance Between Vectors: the distance between two
vectors in a vector space with the two norm is defined by:
d ( v1 , v 2 ) = v1 − v 2 2
[Figure: vectors v_1 and v_2 drawn from the origin; the distance is the length of the difference vector v_1 − v_2.]
Angle Between Vectors & Inner Product:
Motivate the idea in R^2: let \mathbf{v} = [A\cos\theta\;\; A\sin\theta]^T and \mathbf{u} = [1\;\; 0]^T.
Note that: \sum_{i=1}^{2} u_i v_i = 1\cdot A\cos\theta + 0\cdot A\sin\theta = A\cos\theta
so this sum extracts the projection of v onto u — and hence the angle θ between them.
Inner Product Between Vectors :
Define the inner product between two complex vectors in CN by:
\langle \mathbf{u}, \mathbf{v} \rangle = \sum_{i=1}^{N} u_i v_i^*
3. Linking Inner Product to Norm: \|\mathbf{v}\|_2^2 = \langle \mathbf{v}, \mathbf{v} \rangle
Building Vectors From Other Vectors
Can we find a set of “prototype” vectors {v1, v2, …, vM} from
which we can build all other vectors in some given vector space V
by using linear combinations of the vi?
\mathbf{v} = \sum_{k=1}^{M} \alpha_k \mathbf{v}_k \qquad \mathbf{u} = \sum_{k=1}^{M} \beta_k \mathbf{v}_k
Same "ingredients"… just different amounts of them!!!
Expansion and Transformation
Fact: For a given basis {v_1, v_2, …, v_N}, the expansion of a vector v in V is unique. That is, for each v there is only one, unique set of coefficients {\alpha_1, \alpha_2, \ldots, \alpha_N} such that
\mathbf{v} = \sum_{k=1}^{N} \alpha_k \mathbf{v}_k
DFT from Basis Viewpoint:
If we have a discrete-time signal x[n] for n = 0, 1, … N-1
x = [x[0] x[1] ! x[ N − 1]]
T
Define vector:
Define an orthogonal basis from the exponentials used in the IDFT:
\mathbf{d}_k = \left[1,\; e^{j2\pi k\cdot 1/N},\; e^{j2\pi k\cdot 2/N},\; \ldots,\; e^{j2\pi k(N-1)/N}\right]^T, \quad k = 0, 1, \ldots, N-1
The expansion coefficients are inner products (normalized by N): \alpha_k = \frac{1}{N}\langle \mathbf{x}, \mathbf{d}_k \rangle
Then…
\mathbf{x} = \sum_{k=0}^{N-1} \alpha_k \mathbf{d}_k
Example: DFT Coefficients as Inner Products:
Recall: N-pt. IDFT is an expansion of the signal vector in terms of
N Orthogonal vectors. Thus
X[k] = \langle \mathbf{x}, \mathbf{d}_k \rangle = \sum_{n=0}^{N-1} x[n]\, d_k^*[n] = \sum_{n=0}^{N-1} x[n]\, e^{-j2\pi kn/N}
See "reading notes" for some details about normalization issues in this case.
Matrices
Matrix: Is an array of (real or complex) numbers organized in
rows and columns. Here is a 3×4 example:
\mathbf{A} = \begin{bmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{21} & a_{22} & a_{23} & a_{24} \\ a_{31} & a_{32} & a_{33} & a_{34} \end{bmatrix}
To see this:
\mathbf{V}\mathbf{V}^H = \begin{bmatrix} \langle \mathbf{v}_1, \mathbf{v}_1 \rangle & \langle \mathbf{v}_1, \mathbf{v}_2 \rangle & \cdots & \langle \mathbf{v}_1, \mathbf{v}_N \rangle \\ \langle \mathbf{v}_2, \mathbf{v}_1 \rangle & \langle \mathbf{v}_2, \mathbf{v}_2 \rangle & \cdots & \langle \mathbf{v}_2, \mathbf{v}_N \rangle \\ \vdots & & \ddots & \vdots \\ \langle \mathbf{v}_N, \mathbf{v}_1 \rangle & \langle \mathbf{v}_N, \mathbf{v}_2 \rangle & \cdots & \langle \mathbf{v}_N, \mathbf{v}_N \rangle \end{bmatrix} = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix} = \mathbf{I}
Inner products are 0 or 1 because this is an ON basis.
Unitary and Orthogonal Matrices
A unitary matrix is a complex matrix A whose inverse is A-1 = AH
For the real-valued matrix case… we get a special case of “unitary”
the idea of “unitary matrix” becomes “orthogonal matrix”
for which A-1 = AT
Two Properties of Unitary Matrices: Let U be a unitary matrix
and let y1 = Ux1 and y2 = Ux2
1. They preserve norms: ||yi|| = ||xi||.
2. They preserve inner products: < y1, y2 > = < x1, x2 >
That is the “geometry” of the old space is preserved by the unitary
matrix as it transforms into the new space.
(These are the same as the preservation properties of ON basis.)
DFT from Unitary Matrix Viewpoint:
Consider a discrete-time signal x[n] for n = 0, 1, … N-1.
We've already seen the DFT in a basis viewpoint:
\mathbf{x} = \sum_{k=0}^{N-1} \underbrace{\tfrac{1}{N} X[k]}_{\alpha_k}\, \mathbf{d}_k
Now we can view the DFT as a transform from the unitary matrix viewpoint:
\mathbf{D} = [\mathbf{d}_0 \,|\, \mathbf{d}_1 \,|\, \cdots \,|\, \mathbf{d}_{N-1}] = \begin{bmatrix} 1 & 1 & \cdots & 1 \\ 1 & e^{j2\pi\cdot 1\cdot 1/N} & \cdots & e^{j2\pi(N-1)\cdot 1/N} \\ \vdots & \vdots & & \vdots \\ 1 & e^{j2\pi\cdot 1(N-1)/N} & \cdots & e^{j2\pi(N-1)(N-1)/N} \end{bmatrix}
DFT: \tilde{\mathbf{x}} = \mathbf{D}^H\mathbf{x} \qquad IDFT: \mathbf{x} = \tfrac{1}{N}\mathbf{D}\tilde{\mathbf{x}}
(Actually D is not unitary but N^{-1/2}D is unitary… see reading notes)
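A minimal MATLAB sketch (not from the slides; N = 8 is an assumption) verifying the unitary-matrix view of the DFT against MATLAB's fft:

% Minimal sketch: build D from the IDFT exponentials and check the claims
N = 8; n = 0:N-1;
D = exp(1j*2*pi*(n.'*n)/N);      % D(k+1,n+1) = e^{j*2*pi*k*n/N}
U = D / sqrt(N);
norm(U'*U - eye(N))              % ~0: N^(-1/2) D is unitary
x = randn(N,1);
norm(D'*x - fft(x))              % ~0: DFT is x_tilde = D^H x
norm(x - D*(D'*x)/N)             % ~0: IDFT is x = (1/N) D x_tilde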
Geometry Preservation of Unitary Matrix Mappings
Recall… unitary matrices map in such a way that the sizes of
vectors and the orientation between vectors is not changed.
[Figure: a unitary mapping takes x_1, x_2 to y_1, y_2 with all lengths and angles preserved.]
Unitary mappings just "rigidly rotate" the space.
Effect of Non-Unitary Matrix Mappings
[Figure: a non-unitary mapping distorts lengths and angles as it takes x_1, x_2 to y_1, y_2.]
More on Matrices as Transforms
We’ll limit ourselves here to real-valued vectors and matrices
[Figure: matrix A maps x ∈ R^n to y = Ax ∈ R^m.]
Otherwise… range(A) ⊂ R^m …because the columns don't span R^m
Rank of a Matrix: rank(A) = largest # of linearly independent
columns (or rows) of matrix A
For an m×n matrix we have that rank(A) ≤ min(m,n)
An m×n matrix A has “full rank” when rank(A) = min(m,n)
Example: This matrix has rank 3 because the 4th column can be written as a combination of the first 3 columns:
\mathbf{A} = \begin{bmatrix} 1 & 0 & 0 & 1 \\ 0 & 1 & 0 & 2 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}
Characterizing “Tall Matrix” Mappings
We are interested in answering: given a vector y, which vector(s) x mapped into it via matrix A?
“Tall Matrix” (m > n) Case
If y does not lie in range(A), then there is No Solution
If y lies in range(A), then there is a solution (but not
necessarily just one unique solution)
y = Ax
y∉range(A) y∈range(A)
39/45
Characterizing "Square Matrix" Mappings
If A is full rank, then range(A) = R^n and every y has exactly one solution, x = A^{-1}y; if A is rank-deficient, a y ∉ range(A) has no solution and a y ∈ range(A) has infinitely many.
Eigenvalues and Eigenvectors of Square Matrices
If matrix A is n×n, then A maps Rn → Rn
Q: For a given n×n matrix A, which vectors get mapped into
being almost themselves???
More precisely… Which vectors get mapped to a scalar multiple
of themselves???
Even more precisely… which vectors v satisfy the following:
Av = λv
Input Output
These vectors are “special” and are called the eigenvectors of A.
The scalar λ is that e-vector’s corresponding eigenvalue.
[Figure: an eigenvector v maps to Av, which lies along v itself.]
“Eigen-Facts for Symmetric Matrices”
• If n×n real matrix A is symmetric, then
– e-vectors corresponding to distinct e-values are orthonormal
– e-values are real valued
– can decompose A as A = VΛ V T
V = [v1 v2 ! vn ] VV T = I
Λ = diag{λ1 , λ2 ,…, λn }
• If, further, A is pos. def. (semi-def.), then
– e-values are positive (non-negative)
– rank(A) = # of non-zero e-values
• Pos. Def. ⇒ Full Rank (and therefore invertible)
• Pos. Semi-Def. ⇒ Not Full Rank (and therefore not invertible)
– When A is P.D., then we can write
\mathbf{A}^{-1} = \mathbf{V}\boldsymbol{\Lambda}^{-1}\mathbf{V}^T, \qquad \boldsymbol{\Lambda}^{-1} = \mathrm{diag}\{1/\lambda_1,\, 1/\lambda_2,\, \ldots,\, 1/\lambda_n\}
For P.D. A, A^{-1} has the same e-vectors and has reciprocal e-values.
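A minimal MATLAB sketch (not from the slides; the random 4×4 construction is an assumption) checking these eigen-facts:

% Minimal sketch: eigen-decomposition of a symmetric PD matrix and its inverse
B = randn(4); A = B*B' + 4*eye(4);        % symmetric positive definite
[V, Lam] = eig(A);                        % A = V*Lam*V', with V'*V = I
norm(A - V*Lam*V')                        % ~0: decomposition holds
norm(inv(A) - V*diag(1./diag(Lam))*V')    % ~0: same e-vectors, reciprocal e-values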
Other Matrix Issues
We’ll limit our discussion to real-valued matrices and vectors
Quadratic Forms and Positive-(Semi)Definite Matrices
Quadratic Form = Matrix form for a 2nd-order multivariate
polynomial
Example: \mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} (variable), \quad \mathbf{A} = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} (fixed)
The quadratic form of matrix A is:
Q_A(x_1, x_2) = \mathbf{x}^T\mathbf{A}\mathbf{x} \quad (1\times 2)\cdot(2\times 2)\cdot(2\times 1) = (1\times 1)\ \text{scalar}
= \sum_{i=1}^{2}\sum_{j=1}^{2} a_{ij}x_i x_j = a_{11}x_1^2 + a_{22}x_2^2 + (a_{12} + a_{21})x_1 x_2
• Values of the elements of matrix A determine the characteristics
of the quadratic form QA(x)
– If QA(x) ≥ 0 ∀x ≠ 0… then say that QA(x) is “positive semi-definite”
– If QA(x) > 0 ∀x ≠ 0… then say that QA(x) is “positive definite”
– Otherwise say that QA(x) is “non-definite”
• These terms carry over to the matrix that defines the Quad Form
– If QA(x) ≥ 0 ∀x ≠ 0… then say that A is “positive semi-definite”
– If QA(x) > 0 ∀x ≠ 0… then say that A is “positive definite”
Ch. 1 Introduction to Estimation
An Example Estimation Problem: DSB Rx
s(t; f_o, \phi_o) = m(t)\cos(2\pi f_o t + \phi_o)
[Figure: DSB receiver block diagram with spectra S(f), M(f), \hat{M}(f). The received x(t) = s(t) + w(t) — electronics adds noise w(t), usually "white" — passes through a BPF & amp, is mixed with \cos(2\pi\hat{f}_o t + \hat{\phi}_o) from an oscillator, then audio amplified; an estimation algorithm supplies \hat{f}_o & \hat{\phi}_o.]
PDF of Estimate
Because estimates are RVs we describe them with a PDF…
Will depend on:
1. structure of s[n]
2. probability model of w[n]
3. form of est. function g(x)
p(\hat{f}_o): the mean of this PDF measures its centroid.
Desire: E\{\hat{f}_o\} = f_o \quad\text{and}\quad \sigma^2_{\hat{f}_o} = E\left\{\left(\hat{f}_o - E\{\hat{f}_o\}\right)^2\right\} = \text{small}
1.2 Mathematical Estimation Problem
General Mathematical Statement of Estimation Problem:
For… Measured Data x = [ x[0] x[1] … x[N-1] ]
Unknown Parameter θ = [θ1 θ2 … θp ]
θ is Not Random
x is an N-dimensional random data vector
Q: What captures all the statistical information needed for an
estimation problem ?
A: Need the N-dimensional PDF of the data, parameterized by θ
Ex. Modeling Data with Linear Trend
See Fig. 1.6 in Text
x[n] = \underbrace{A + Bn}_{s[n;A,B]} + w[n]
1.3 Assessing Estimator Performance
Can only do this when the value of θ is known:
• Theoretical Analysis, Simulations, Field Tests, etc.
Desire an "unbiased" estimator with small variance:
\sigma^2_{\hat\theta} = E\left\{\left(\hat\theta - E\{\hat\theta\}\right)^2\right\} = \text{small}
Equivalent View of Assessing Performance
Define estimation error: e = \hat\theta - \theta (so \hat\theta = \theta + e); \hat\theta and e are RVs, \theta is not.
Desire: E\{e\} = 0 ("unbiased") and \sigma_e^2 = E\{(e - E\{e\})^2\} = \text{small}
Example: DC Level in AWGN
Model: x[n ] = A + w[n ], n = 0, 1, … , N − 1
Gaussian, zero mean, variance σ2
White (uncorrelated sample-to-sample)
PDF of an individual data sample:
p(x[i]) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\!\left[-\frac{(x[i]-A)^2}{2\sigma^2}\right]
Sample-mean estimator: \hat{A} = \frac{1}{N}\sum_{n=0}^{N-1} x[n] \;\Rightarrow\; E\{\hat{A}\} = A. Yes! Unbiased!
• Can we get a small variance? Due to independence (white & Gaussian ⇒ independent):
\mathrm{var}(\hat{A}) = \mathrm{var}\!\left(\frac{1}{N}\sum_{n=0}^{N-1} x[n]\right) = \frac{1}{N^2}\sum_{n=0}^{N-1}\mathrm{var}(x[n]) = \frac{N\sigma^2}{N^2} = \frac{\sigma^2}{N}
\Rightarrow \mathrm{var}(\hat{A}) = \sigma^2/N: can make the variance small by increasing N!!!
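A minimal MATLAB sketch (not from the slides; the A, σ, N values are assumptions) verifying the unbiasedness and the σ²/N variance by Monte Carlo:

% Minimal sketch: sample-mean estimator of a DC level in WGN
A = 1.7; sigma = 2; N = 50; M = 2e4;
x = A + sigma*randn(N, M);    % M independent data records
A_hat = mean(x, 1);           % M estimates
bias = mean(A_hat) - A        % ~0
var_hat = var(A_hat)          % ~ sigma^2/N = 0.08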
Theoretical Analysis vs. Simulations
• Ideally we'd like to always be able to theoretically analyze the problem to find the bias and variance of the estimator
– Theoretical results show how performance depends on the problem
specifications
• But sometimes we make use of simulations
– to verify that our theoretical analysis is correct
– sometimes can’t find theoretical results
Course Goal = Find “Optimal” Estimators
• There are several different definitions or criteria for optimality!
• Most Logical: Minimum MSE (Mean-Square-Error)
– Minimum MSE: mse(\hat\theta) = E\{(\hat\theta - \theta)^2\} — see Sect. 2.4
To see this result, decompose the MSE (define the bias b(\theta) = E\{\hat\theta\} - \theta):
mse(\hat\theta) = E\{(\hat\theta - \theta)^2\} = E\left\{\left[(\hat\theta - E\{\hat\theta\}) + (E\{\hat\theta\} - \theta)\right]^2\right\}
= E\{(\hat\theta - E\{\hat\theta\})^2\} + 2\,b(\theta)\underbrace{E\{\hat\theta - E\{\hat\theta\}\}}_{=0} + b^2(\theta)
= \mathrm{var}\{\hat\theta\} + b^2(\theta)
Minimum Variance
Unbiased Estimators
Ch. 2: Minimum Variance Unbiased Est.
MVU
Basic Idea of MVU: Out of all unbiased estimates,
find the one with the lowest variance
(This avoids the realizability problem of MSE)
E\{\hat\theta\} = \theta \quad\text{for all } \theta
Example: Estimate DC in White Uniform Noise
x [n ] = A + w [n ] n = 0 ,1, ..., N − 1
Unbiased Estimator:
\hat{A} = \frac{1}{N}\sum_{n=0}^{N-1} x[n] \quad\text{(same as before)}: \; E\{\hat{A}\} = A \text{ regardless of the value of } A
Biased Estimator:
\check{A}: a modified version of \hat{A} whose mean depends on the value of A. Its bias b(A) = E\{\check{A}\} - A turns out to be 0 for some values of A (A ≥ 1) but ≠ 0 for others (A < 1) ⇒ a biased estimator, since unbiasedness must hold for every value of A.
2.4 Minimum Variance Criterion
(Recall problem with MMSE criteria)
(The bias term is 0 for an MVU estimator.)
[Figure: two plots of var\{\hat\theta_i\} vs. \theta for candidates \hat\theta_1, \hat\theta_2, \hat\theta_3. In one, a single estimator has the lowest variance for all \theta — an MVU estimator exists; in the other, no estimator is lowest for every \theta — no MVU estimator exists.]
2.6 Finding the MVU Estimator
Even if MVU exists: may not be able to find it!!
Then an estimator is notated as: \hat{\boldsymbol\theta} = [\hat\theta_1\;\hat\theta_2\;\cdots\;\hat\theta_p]^T
Example PDF (DC level, single sample x[0]):
p(x[0]; A) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\!\left[-\frac{(x[0]-A)^2}{2\sigma^2}\right]
Define: Likelihood Function (LF)
The LF = the PDF p(x;θ )
-E\left\{\frac{\partial^2 \ln p(\mathbf{x};\theta)}{\partial\theta^2}\right\} = \text{“expected sharpness of the LF”}
evaluated at \theta = true value; E\{\cdot\} is w.r.t. p(\mathbf{x};\theta)
3.4 Cramer-Rao Lower Bound
Theorem 3.1 CRLB for Scalar Parameter
Assume the "regularity" condition is met: E\left\{\frac{\partial \ln p(\mathbf{x};\theta)}{\partial\theta}\right\} = 0 \;\;\forall\theta
Then
\sigma^2_{\hat\theta} \ge \frac{1}{-E\left\{\dfrac{\partial^2 \ln p(\mathbf{x};\theta)}{\partial\theta^2}\right\}}\Bigg|_{\theta = \text{true value}}
The right-hand side is the CRLB. The expectation is
E\left\{\frac{\partial^2 \ln p(\mathbf{x};\theta)}{\partial\theta^2}\right\} = \int \frac{\partial^2 \ln p(\mathbf{x};\theta)}{\partial\theta^2}\, p(\mathbf{x};\theta)\, d\mathbf{x}
Steps to Find the CRLB
1. Write the log-likelihood function as a function of θ: ln p(x;θ)
For the DC-level example:
p(\mathbf{x};A) = \frac{1}{(2\pi\sigma^2)^{N/2}}\exp\!\left[-\frac{1}{2\sigma^2}\sum_{n=0}^{N-1}(x[n]-A)^2\right]
Now take ln to get the LLF:
\ln p(\mathbf{x};A) = -\frac{N}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{n=0}^{N-1}(x[n]-A)^2
(the first term has zero A-derivative; the second is the part that matters)
Now take the first partial w.r.t. A (the sample mean \bar{x} appears):
\frac{\partial}{\partial A}\ln p(\mathbf{x};A) = \frac{1}{\sigma^2}\sum_{n=0}^{N-1}(x[n]-A) = \frac{N}{\sigma^2}(\bar{x} - A) \quad (\star)
Differentiating again, negating, and taking the expectation gives the CRLB = \sigma^2/N:
• Doesn't depend on A
• For fixed N & \sigma^2: increases linearly with \sigma^2, decreases inversely with N
Continuation of Theorem 3.1 on CRLB
There exists an unbiased estimator that attains the CRLB iff:
\frac{\partial \ln p(\mathbf{x};\theta)}{\partial\theta} = I(\theta)\,[g(\mathbf{x}) - \theta] \quad (\star)
for some functions I(\theta) and g(\mathbf{x}). Furthermore, the estimator that achieves the CRLB is then given by \hat\theta = g(\mathbf{x}); since no unbiased estimator can do better… this is the MVU estimate!!
For the DC-level example, (\star) gives
I(A) = \frac{N}{\sigma^2} \;\Rightarrow\; \mathrm{var}\{\hat{A}\} = \frac{\sigma^2}{N} = \text{CRLB}, \qquad \hat\theta = g(\mathbf{x}) = \bar{x} = \frac{1}{N}\sum_{n=0}^{N-1} x[n]
Notes:
• Not all estimators are efficient (see next example: Phase Est.)
• Not even all MVU estimators are efficient
Signal Model: x[n] = \underbrace{A\cos(2\pi f_o n + \phi_o)}_{s[n;\phi_o]} + w[n], \quad w[n]\ \text{AWGN w/ zero mean \& variance } \sigma^2
Signal-to-Noise Ratio: \mathrm{SNR} = \frac{\text{Signal Power}}{\text{Noise Power}} = \frac{A^2/2}{\sigma^2} = \frac{A^2}{2\sigma^2}
Assumptions:
1. 0 < fo < ½ ( fo is in cycles/sample)
2. A and fo are known (we’ll remove this assumption later)
Problem: Find the CRLB for estimating the phase.
We need the PDF (exploit whiteness and the exponential form):
p(\mathbf{x};\phi) = \frac{1}{(2\pi\sigma^2)^{N/2}}\exp\!\left[-\frac{1}{2\sigma^2}\sum_{n=0}^{N-1}\big(x[n]-A\cos(2\pi f_o n + \phi)\big)^2\right]
Now taking the log gets rid of the exponential; then taking the partial derivative gives (see book for details):
\frac{\partial \ln p(\mathbf{x};\phi)}{\partial\phi} = -\frac{A}{\sigma^2}\sum_{n=0}^{N-1}\left[x[n]\sin(2\pi f_o n + \phi) - \frac{A}{2}\sin(4\pi f_o n + 2\phi)\right]
Differentiating again, negating, and taking the expectation (using E\{x[n]\} = A\cos(2\pi f_o n + \phi)) produces a cos² term; a trig identity then gives
-E\left\{\frac{\partial^2 \ln p(\mathbf{x};\phi)}{\partial\phi^2}\right\} = \frac{A^2}{2\sigma^2}\left[\sum_{n=0}^{N-1} 1 - \sum_{n=0}^{N-1}\cos(4\pi f_o n + 2\phi)\right] \approx \frac{NA^2}{2\sigma^2} = N\times\mathrm{SNR}
(the first sum is N; the second is << N if f_o is not near 0 or ½)
Now… invert to get the CRLB (SNR in non-dB form):
\mathrm{var}\{\hat\phi\} \ge \frac{1}{N\times\mathrm{SNR}}
Consider the "incremental sensitivity" of p(\mathbf{x};\theta) to changes in \theta. Letting \Delta\theta \to 0:
\tilde{S}_\theta^p(\mathbf{x}) = \lim_{\Delta\theta\to 0} S_\theta^p(\mathbf{x}) = \frac{\partial p(\mathbf{x};\theta)}{\partial\theta}\cdot\frac{\theta}{p(\mathbf{x};\theta)} = \theta\,\frac{\partial \ln p(\mathbf{x};\theta)}{\partial\theta}
Has the needed properties for “info” (as does “Shannon Info”):
1. I(θ ) ≥ 0 (easy to see using the alternate form of CRLB)
2. I(θ ) is additive for independent observations
follows from: ln p(x;θ ) = ln ∏ p( x[n];θ ) = ∑ ln[ p( x[n];θ )]
n n
For a deterministic signal s[n;\theta] in white, Gaussian, zero-mean noise:
Q: What is the CRLB?
\mathrm{var}(\hat\theta) \ge \frac{\sigma^2}{\displaystyle\sum_{n=0}^{N-1}\left(\frac{\partial s[n;\theta]}{\partial\theta}\right)^2}
Note: \partial s[n;\theta]/\partial\theta tells how sensitive the signal is to the parameter.
[Figure: CRLB vs. f_o (cycles/sample). Top panel: bound on variance; bottom panel: bound on std. dev., CRLB^{1/2} (cycles/sample). The bound blows up for f_o near 0 or ½.]
3.6 Transformation of Parameters
Say there is a parameter θ with known CRLBθ
But imagine that we instead are interested in estimating
some other parameter α that is a function of θ :
α = g(θ )
Q: What is CRLBα ?
\mathrm{var}(\hat\alpha) \ge \mathrm{CRLB}_\alpha = \left(\frac{\partial g(\theta)}{\partial\theta}\right)^2 \mathrm{CRLB}_\theta \quad\text{(proved in Appendix 3B)}
The derivative captures the sensitivity of \alpha to \theta.
Example: a start sensor and a stop sensor measure elapsed time T over a known distance D; the possible accuracy is set by CRLB_T. But… we really want to measure speed V = D/T. Find CRLB_V:
\mathrm{CRLB}_V = \left(\frac{\partial}{\partial T}\frac{D}{T}\right)^2\mathrm{CRLB}_T = \left(-\frac{D}{T^2}\right)^2\mathrm{CRLB}_T = \frac{V^4}{D^2}\,\mathrm{CRLB}_T
Accuracy bound: \sigma_V \ge \frac{V^2}{D}\sqrt{\mathrm{CRLB}_T}\;(m/s)
• Less accurate at high speeds (quadratic) • More accurate over large distances
Effect of Transformation on Efficiency
Suppose you have an efficient estimator of θ : θˆ
But… you are really interested in estimating α = g(θ )
[Figure: PDF of \hat\theta mapped through g(\cdot). Small N case: the PDF is widely spread over the nonlinear mapping. Large N case: the PDF is concentrated onto a linearized section, so efficiency is (asymptotically) preserved.]
3.7 CRLB for Vector Parameter Case
Vector Parameter: \boldsymbol\theta = [\theta_1\;\theta_2\;\cdots\;\theta_p]^T, its estimate \hat{\boldsymbol\theta} = [\hat\theta_1\;\hat\theta_2\;\cdots\;\hat\theta_p]^T, and estimate covariance
\mathbf{C}_{\hat\theta} = E\{[\hat{\boldsymbol\theta} - \boldsymbol\theta][\hat{\boldsymbol\theta} - \boldsymbol\theta]^T\}
For example, for \boldsymbol\theta = [x\;y\;z]^T:
\mathbf{C}_{\hat\theta} = \begin{bmatrix} \mathrm{var}(\hat{x}) & \mathrm{cov}(\hat{x},\hat{y}) & \mathrm{cov}(\hat{x},\hat{z}) \\ \mathrm{cov}(\hat{y},\hat{x}) & \mathrm{var}(\hat{y}) & \mathrm{cov}(\hat{y},\hat{z}) \\ \mathrm{cov}(\hat{z},\hat{x}) & \mathrm{cov}(\hat{z},\hat{y}) & \mathrm{var}(\hat{z}) \end{bmatrix}
Fisher Information Matrix
For the vector parameter case…
The CRLB Matrix
Then, under the same kind of regularity conditions,
the CRLB matrix is the inverse of the FIM:
CRLB = I −1 (θ)
CRLB Off-Diagonal Elements Insight Not In Book
[Figure: two scatter plots of location-error estimates (\hat{x}_e, \hat{y}_e), each with the same per-coordinate std. devs. \sigma_{\hat{x}_e}, \sigma_{\hat{y}_e}, but one uncorrelated (axis-aligned cloud) and one correlated (tilted cloud).]
Each case has the same variances… but the location accuracy characteristics are very different. ⇒ This is the effect of the off-diagonal elements of the covariance.
Should consider the effect of off-diagonal CRLB elements!!!
CRLB Matrix and Error Ellipsoids Not In Book
For a zero-mean Gaussian estimation error:
p(\hat{\boldsymbol\theta}) = \frac{1}{\sqrt{(2\pi)^N |\mathbf{C}_{\hat\theta}|}}\exp\!\left[-\tfrac{1}{2}\hat{\boldsymbol\theta}^T\mathbf{C}_{\hat\theta}^{-1}\hat{\boldsymbol\theta}\right]
Contours of constant PDF are ellipses \hat{\boldsymbol\theta}^T\mathbf{C}_{\hat\theta}^{-1}\hat{\boldsymbol\theta} = k with k = -2\ln(1 - P_e), where P_e is the probability that the estimate will lie inside the ellipse.
[Figure: error ellipses with extents ~2\sigma_{\hat{x}_e} and ~2\sigma_{\hat{y}_e}, tilted or axis-aligned depending on correlation.]
Ellipsoids and Eigen-Structure (Not In Book)
Different notations for the gradient:
\mathrm{grad}\,\phi(x_1,\ldots,x_n) = \nabla_x\phi(\mathbf{x}) = \frac{\partial\phi(\mathbf{x})}{\partial\mathbf{x}} = \left[\frac{\partial\phi}{\partial x_1}\;\cdots\;\frac{\partial\phi}{\partial x_n}\right]^T
For our quadratic form function we have:
\phi(\mathbf{x}) = \mathbf{x}^T\mathbf{A}\mathbf{x} = \sum_i\sum_j a_{ij}x_i x_j \;\Rightarrow\; \frac{\partial\phi}{\partial x_k} = \sum_i\sum_j a_{ij}\frac{\partial(x_i x_j)}{\partial x_k} \quad (\clubsuit)
Product rule: \frac{\partial(x_i x_j)}{\partial x_k} = \frac{\partial x_i}{\partial x_k}x_j + x_i\frac{\partial x_j}{\partial x_k} = \delta_{ik}x_j + x_i\delta_{jk} \quad (\clubsuit\clubsuit)
By symmetry (a_{ik} = a_{ki}): \frac{\partial\phi}{\partial x_k} = 2\sum_j a_{kj}x_j
And from this we get: \nabla_x(\mathbf{x}^T\mathbf{A}\mathbf{x}) = 2\mathbf{A}\mathbf{x}
Since the gradient is ⊥ to the ellipse \langle\mathbf{A}\mathbf{x}, \mathbf{x}\rangle = k, this says \mathbf{A}\mathbf{x} is ⊥ to the ellipse. At a principal axis, \mathbf{A}\mathbf{x} points along \mathbf{x} itself: \mathbf{A}\mathbf{x} = \lambda\mathbf{x} — the eigenvectors are the principal axes!!!
Note: This says that if A has a zero eigenvalue, then the error ellipse will have an infinite-length principal axis ⇒ NOT GOOD!!
Application of Eigen-Results to Error Ellipsoids
The Error Ellipsoid corresponding to the estimator covariance
matrix \mathbf{C}_{\hat\theta} must satisfy: \hat{\boldsymbol\theta}^T\mathbf{C}_{\hat\theta}^{-1}\hat{\boldsymbol\theta} = k
Note that the error ellipse is formed using the inverse covariance. Thus finding the eigenvectors/values of \mathbf{C}_{\hat\theta}^{-1} shows the structure of the error ellipse.
Recall: a positive definite matrix A and its inverse A^{-1} have the same eigenvectors and reciprocal eigenvalues.
Illustrate with the 2-D case: \hat{\boldsymbol\theta}^T\mathbf{C}_{\hat\theta}^{-1}\hat{\boldsymbol\theta} = k
[Figure: ellipse in the (\hat\theta_1, \hat\theta_2) plane with principal axes along the eigenvectors v_1, v_2 of \mathbf{C}_{\hat\theta} (not the inverse!) and semi-axis lengths \sqrt{k\lambda_1}, \sqrt{k\lambda_2}.]
The CRLB/FIM Ellipse
Can make an ellipse from the CRLB Matrix…
instead of the Cov. Matrix
This ellipse will be the smallest error ellipse that an unbiased estimator
can achieve!
3.8 Vector Transformations
Just like for the scalar case…. α = g(θ)
If you know CRLBθ you can find CRLBα
\mathrm{CRLB}_\alpha = \frac{\partial\mathbf{g}(\boldsymbol\theta)}{\partial\boldsymbol\theta}\,\underbrace{\mathbf{I}^{-1}(\boldsymbol\theta)}_{\text{CRLB on }\theta}\,\frac{\partial\mathbf{g}(\boldsymbol\theta)^T}{\partial\boldsymbol\theta}
where \partial\mathbf{g}(\boldsymbol\theta)/\partial\boldsymbol\theta is the Jacobian matrix (see p. 46).
Example: Usually can estimate Range (R) and Bearing (ϕ) directly
But might really want emitter (x, y)
1
Example of Vector Transform: can estimate Range (R) and Bearing (\phi) directly, but might really want the emitter location (x_e, y_e):
Direct parameters \boldsymbol\theta = \begin{bmatrix} R \\ \phi \end{bmatrix} \;\mapsto\; mapped parameters \boldsymbol\alpha = \mathbf{g}(\boldsymbol\theta) = \begin{bmatrix} x_e \\ y_e \end{bmatrix} = \begin{bmatrix} R\cos\phi \\ R\sin\phi \end{bmatrix}
Jacobian matrix:
\frac{\partial\mathbf{g}(\boldsymbol\theta)}{\partial\boldsymbol\theta} = \begin{bmatrix} \dfrac{\partial R\cos\phi}{\partial R} & \dfrac{\partial R\cos\phi}{\partial\phi} \\ \dfrac{\partial R\sin\phi}{\partial R} & \dfrac{\partial R\sin\phi}{\partial\phi} \end{bmatrix} = \begin{bmatrix} \cos\phi & -R\sin\phi \\ \sin\phi & R\cos\phi \end{bmatrix}
\mathrm{CRLB}_\alpha = \frac{\partial\mathbf{g}(\boldsymbol\theta)}{\partial\boldsymbol\theta}\,\mathrm{CRLB}_\theta\,\frac{\partial\mathbf{g}(\boldsymbol\theta)^T}{\partial\boldsymbol\theta}
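A minimal MATLAB sketch (not from the slides; the range, bearing, and bound values are assumptions) applying this Jacobian transformation:

% Minimal sketch: map a range/bearing CRLB into an (x_e, y_e) CRLB
R = 1000; phi = pi/4;                    % assumed true parameters
CRLB_theta = diag([25, (0.01)^2]);       % assumed bounds: R (m^2), phi (rad^2)
J = [cos(phi), -R*sin(phi); ...
     sin(phi),  R*cos(phi)];             % Jacobian dg/dtheta
CRLB_alpha = J * CRLB_theta * J'         % bound on (x_e, y_e) covariance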
3.9 CRLB for General Gaussian Case
In Sect. 3.5 we saw the CRLB for “signal + AWGN”
For that case we saw: Deterministic Signal w/
The PDF’s parameter-dependence Scalar Deterministic Parameter
showed up only in the mean of the PDF
4
Gen. Gauss. Ex.: Time-Difference-of-Arrival
Given receivers Rx1 and Rx2 observing a transmitted signal, e.g. x_1(t) = s(t - \Delta\tau) + w_1(t), stack the data: \mathbf{x} = [\mathbf{x}_1^T\;\mathbf{x}_2^T]^T.
Case #1 (deterministic signal): the \Delta\tau-dependence is in the mean, \boldsymbol\mu(\Delta\tau) = [s_1[0;\Delta\tau]\;\cdots\;s_1[N-1;\Delta\tau]\;s_2[0;\Delta\tau]\;\cdots\;s_2[N-1;\Delta\tau]]^T, with \mathbf{C}(\Delta\tau) = \mathbf{C} (so one FIM term drops).
Case #2 (random signal): the \Delta\tau-dependence is in the covariance, \boldsymbol\mu(\Delta\tau) = \mathbf{0} and
\mathbf{C}(\Delta\tau) = \begin{bmatrix} \mathbf{C}_{11} & \mathbf{C}_{12}(\Delta\tau) \\ \mathbf{C}_{21}(\Delta\tau) & \mathbf{C}_{22} \end{bmatrix}, \quad \mathbf{C}_{ii} = \mathbf{C}_{s_i s_i} + \mathbf{C}_{w_i w_i}, \quad \mathbf{C}_{ij}(\Delta\tau) = \mathbf{C}_{s_i s_j}(\Delta\tau)
Comments on General Gaussian CRLB
It is interesting to note that for any given problem you may find
each case used in the literature!!!
6
3.11 CRLB Examples
We’ll now apply the CRLB theory to several examples of
practical signal processing problems.
We’ll revisit these examples in Ch. 7… we’ll derive ML
estimators that will get close to achieving the CRLB
1. Range Estimation
– sonar, radar, robotics, emitter location
2. Sinusoidal Parameter Estimation (Amp., Frequency, Phase)
– sonar, radar, communication receivers (recall DSB Example), etc.
3. Bearing Estimation
– sonar, radar, emitter location
4. Autoregressive Parameter Estimation
– speech processing, econometrics
1
Ex. 1 Range Estimation Problem
Transmit Pulse: s(t) nonzero over t∈[0,Ts]
Receive Reflection: s(t – τo)
Measure Time Delay: τo
[Figure: bandlimited pulse s(t) nonzero over [0, T_s]; the receive chain (BPF & amp) outputs x(t) containing s(t − τ_o); the noise w(t) is white Gaussian with PSD N_o/2 over the band [−B, B].]
Range Estimation D-T Signal Model
After sampling, the noise has \sigma^2 = BN_o and
x[n] = \begin{cases} w[n], & 0 \le n \le n_o - 1 \\ s(n\Delta - \tau_o) + w[n], & n_o \le n \le n_o + M - 1 \\ w[n], & n_o + M \le n \le N - 1 \end{cases}
Range Estimation CRLB
Now apply the standard CRLB result for signal + WGN (plug in… and keep the non-zero terms):
\mathrm{var}(\hat\tau_o) \ge \frac{\sigma^2}{\displaystyle\sum_{n=0}^{N-1}\left(\frac{\partial s[n;\tau_o]}{\partial\tau_o}\right)^2} = \frac{\sigma^2}{\displaystyle\sum_{n=n_o}^{n_o+M-1}\left(\frac{\partial s(n\Delta-\tau_o)}{\partial\tau_o}\right)^2} = \frac{\sigma^2}{\displaystyle\sum_{n=0}^{M-1}\left(\frac{\partial s(t)}{\partial t}\Big|_{t=n\Delta}\right)^2}
Approximating the sum by an integral (\sum \approx \tfrac{1}{\Delta}\int) and using \sigma^2\Delta = N_o/2:
\mathrm{var}(\hat\tau_o) \ge \frac{\sigma^2}{\frac{1}{\Delta}\int_0^{T_s}\left(\frac{\partial s(t)}{\partial t}\right)^2 dt} = \frac{N_o/2}{\int_0^{T_s}\left(\frac{\partial s(t)}{\partial t}\right)^2 dt}
Using the FT differentiation theorem & Parseval, with signal energy E_s = \int_0^{T_s} s^2(t)\,dt and the RMS bandwidth measure
B_{rms}^2 = \frac{\int_{-\infty}^{\infty}(2\pi f)^2|S(f)|^2\,df}{\int_{-\infty}^{\infty}|S(f)|^2\,df} \quad (B_{rms}\text{ is the “RMS BW”})
gives
\mathrm{var}(\hat\tau_o) \ge \frac{1}{\dfrac{E_s}{N_o/2}\,B_{rms}^2}
where E_s/(N_o/2) is a type of "SNR".
Range Estimation CRLB (cont.)
Using these ideas we arrive at the CRLB on the delay:
\mathrm{var}(\hat\tau_o) \ge \frac{1}{\mathrm{SNR}_E\times B_{rms}^2}\;(\sec^2)
where, with signal power P_s = E_s/T_s and noise power P_n = \frac{N_o}{2}\times(2B),
\mathrm{SNR}_E = \frac{E_s}{N_o/2} = 2BT_s\,\mathrm{SNR}, \qquad \mathrm{SNR} = \frac{P_s}{P_n}
Thus…
\mathrm{var}(\hat\tau_o) \ge \frac{1}{2BT_s\,\mathrm{SNR}\times B_{rms}^2}\;(\sec^2)
Range Estimation CRLB (cont.)
Converting to range with R = c\tau_o/2 (transformation of parameters):
\mathrm{var}(\hat{R}) \ge \left(\frac{\partial R}{\partial\tau_o}\right)^2\mathrm{CRLB}_{\hat\tau_o} = \frac{c^2/4}{2BT_s\,\mathrm{SNR}\times B_{rms}^2}\;(m^2)
Ex. 2 Sinusoid Estimation CRLB Problem
Given DT signal samples of a sinusoid in noise….
Estimate its amplitude, frequency, and phase
x[n] = A\cos(\Omega_o n + \phi) + w[n], \quad n = 0, 1, \ldots, N-1
\mathrm{SNR} = \frac{P_s}{P_n} = \frac{A^2/2}{\sigma^2} = \frac{A^2}{2\sigma^2}
Sinusoid Estimation CRLB Approach
Approach:
• Find Fisher Info Matrix
• Invert to get CRLB matrix
• Look at diagonal elements to get bounds on parm variances
Sinusoid Estimation Fisher Info Elements
Taking the partial derivatives and using approximations given in
book (valid when Ωo is not near 0 or π) : θ = [ A Ω o φ ]T
[\mathbf{I}(\boldsymbol\theta)]_{11} = \frac{1}{\sigma^2}\sum_{n=0}^{N-1}\cos^2(\Omega_o n + \phi) = \frac{1}{2\sigma^2}\sum_{n=0}^{N-1}\big(1 + \cos(2\Omega_o n + 2\phi)\big) \approx \frac{N}{2\sigma^2}
[\mathbf{I}(\boldsymbol\theta)]_{12} = [\mathbf{I}(\boldsymbol\theta)]_{21} = -\frac{1}{\sigma^2}\sum_{n=0}^{N-1} An\cos(\Omega_o n+\phi)\sin(\Omega_o n+\phi) = -\frac{A}{2\sigma^2}\sum_{n=0}^{N-1} n\sin(2\Omega_o n + 2\phi) \approx 0
[\mathbf{I}(\boldsymbol\theta)]_{13} = [\mathbf{I}(\boldsymbol\theta)]_{31} = -\frac{1}{\sigma^2}\sum_{n=0}^{N-1} A\cos(\Omega_o n+\phi)\sin(\Omega_o n+\phi) = -\frac{A}{2\sigma^2}\sum_{n=0}^{N-1}\sin(2\Omega_o n + 2\phi) \approx 0
[\mathbf{I}(\boldsymbol\theta)]_{22} = \frac{1}{\sigma^2}\sum_{n=0}^{N-1}(An)^2\sin^2(\Omega_o n+\phi) = \frac{A^2}{2\sigma^2}\sum_{n=0}^{N-1} n^2\big(1 - \cos(2\Omega_o n + 2\phi)\big) \approx \frac{A^2}{2\sigma^2}\sum_{n=0}^{N-1} n^2
[\mathbf{I}(\boldsymbol\theta)]_{23} = [\mathbf{I}(\boldsymbol\theta)]_{32} = \frac{1}{\sigma^2}\sum_{n=0}^{N-1} A^2 n\sin^2(\Omega_o n+\phi) \approx \frac{A^2}{2\sigma^2}\sum_{n=0}^{N-1} n
[\mathbf{I}(\boldsymbol\theta)]_{33} = \frac{1}{\sigma^2}\sum_{n=0}^{N-1} A^2\sin^2(\Omega_o n+\phi) \approx \frac{NA^2}{2\sigma^2}
Sinusoid Estimation Fisher Info Matrix
\mathbf{I}(\boldsymbol\theta) \approx \begin{bmatrix} \dfrac{N}{2\sigma^2} & 0 & 0 \\ 0 & \dfrac{A^2}{2\sigma^2}\sum_{n=0}^{N-1} n^2 & \dfrac{A^2}{2\sigma^2}\sum_{n=0}^{N-1} n \\ 0 & \dfrac{A^2}{2\sigma^2}\sum_{n=0}^{N-1} n & \dfrac{NA^2}{2\sigma^2} \end{bmatrix}, \qquad \boldsymbol\theta = [A\;\Omega_o\;\phi]^T
Recall… \mathrm{SNR} = \frac{A^2}{2\sigma^2} and use closed-form results for these sums.
Sinusoid Estimation CRLBs (using the co-factor & det approach… helped by the 0's)
Inverting the FIM by hand gives the CRLB matrix… and then extracting the diagonal elements gives the three bounds:
\mathrm{var}(\hat{A}) \ge \frac{2\sigma^2}{N}\;(\text{volts}^2)
\mathrm{var}(\hat\Omega_o) \ge \frac{12}{\mathrm{SNR}\times N(N^2-1)}\;((\text{rad/sample})^2) \quad\text{(to convert to Hz}^2\text{, multiply by }(F_s/2\pi)^2)
\mathrm{var}(\hat\phi) \ge \frac{2(2N-1)}{\mathrm{SNR}\times N(N+1)} \approx \frac{4}{\mathrm{SNR}\times N}\;(\text{rad}^2)
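A minimal MATLAB sketch (not from the slides; N and the SNR value are assumptions) evaluating the three bounds numerically:

% Minimal sketch: numeric values of the sinusoid CRLBs
N = 256; SNR = 10^(10/10); sigma2 = 1;   % assumed: 10 dB SNR
crlb_A   = 2*sigma2/N                    % volts^2
crlb_Om  = 12/(SNR*N*(N^2 - 1))          % (rad/sample)^2
crlb_phi = 2*(2*N - 1)/(SNR*N*(N + 1))   % rad^2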
Ex. 3 Bearing Estimation CRLB Problem
Figure 3.8 Emits or reflects
from textbook: signal s(t)
s(t) = A\cos(2\pi f_o t + \phi) \quad\text{(simple model)}
Propagation time to the nth sensor (spacing d, speed c, bearing \beta):
t_n = t_0 - n\,\frac{d}{c}\cos\beta, \quad n = 0, 1, \ldots, M-1
Signal at the nth sensor:
s_n(t) = \alpha s(t - t_n) = A\cos\!\left(2\pi f_o\left[t - t_0 + n\frac{d}{c}\cos\beta\right] + \phi\right)
Bearing Estimation Snapshot of Sensor Signals
Now instead of sampling each sensor at lots of time instants…
we just grab one “snapshot” of all M sensors at a single instant ts
s_n(t_s) = A\cos\!\left(2\pi f_o\left[t_s - t_0 + n\frac{d}{c}\cos\beta\right] + \phi\right) = A\cos(\Omega_s n + \tilde\phi)
a spatial sinusoid with spatial frequency \Omega_s = \frac{2\pi f_o d}{c}\cos\beta.
x[n] = s_n(t_s) + w[n] = A\cos(\Omega_s n + \tilde\phi) + w[n]
Each w[n] is a noise sample that comes from a different sensor, so model them as uncorrelated Gaussian RVs (same as white temporal noise); assume each sensor has the same noise variance \sigma^2.
So… the parameters to consider are: \boldsymbol\theta = [A\;\Omega_s\;\tilde\phi]^T
which get transformed to:
\boldsymbol\alpha = \mathbf{g}(\boldsymbol\theta) = \begin{bmatrix} A \\ \beta \\ \tilde\phi \end{bmatrix} = \begin{bmatrix} A \\ \arccos\!\left(\dfrac{c\,\Omega_s}{2\pi f_o d}\right) \\ \tilde\phi \end{bmatrix} \quad (\beta\text{ is the parameter of interest!})
Bearing Estimation CRLB Result
Using the FIM for the sinusoidal parameter problem… together
with the transform. of parms result (see book p. 59 for details):
\mathrm{var}(\hat\beta) \ge \frac{12}{(2\pi)^2\,\mathrm{SNR}\times M\,\dfrac{M+1}{M-1}\left(\dfrac{L}{\lambda}\right)^2\sin^2(\beta)}\;(\text{rad}^2)
Define: L_r = L/\lambda, the array length "in wavelengths", where L = array physical length in meters, M = number of array elements, \lambda = c/f_o = wavelength in meters (per cycle).
• Bearing Accuracy:
– Decreases as 1/SNR – Depends on the actual bearing \beta
– Decreases as 1/M Best at \beta = \pi/2 ("broadside")
– Decreases as 1/L_r^2 Impossible at \beta = 0! ("endfire")
σ u2 p
ln Pxx ( f ; θ) = ln
p
2
= ln σ u2 − ln 1 + ∑ a[m]e − j 2πfm
m =1
1+ ∑ a[m]e − j 2πfm
m =1
19
AR Estimation CRLB Asymptotic Result
After taking these derivatives… you get results that can be
simplified using properties of FT and convolution. Complicated
dependence on
The final result is:
\mathrm{var}(\hat{a}[k]) \ge \frac{\sigma_u^2}{N}\left[\mathbf{R}_{xx}^{-1}\right]_{kk}, \quad k = 1, 2, \ldots, p \quad\text{(complicated dependence on the AC matrix!!)}
\mathrm{var}(\hat\sigma_u^2) \ge \frac{2\sigma_u^4}{N}
Both decrease as 1/N. The bound on \hat{a}[1] improves as the pole gets closer to the unit circle… PSDs with sharp peaks are easier to estimate.
CRLB Example:
Single-Rx Emitter Location via Doppler
[Figure: a moving receiver observes s(t; f_1) over [t_1, t_1+T], s(t; f_2) over [t_2, t_2+T], and s(t; f_3) over [t_3, t_3+T] from an emitter at unknown (X, Y, Z, f_o).]
Problem Background
Radar to be Located: at Unknown Location (X,Y,Z)
Transmits Radar Signal at Unknown Carrier Frequency fo
Physics of Problem
Relative motion between emitter and receiver causes a Doppler shift of the carrier frequency (u(t) is the unit vector along the line-of-sight, v(t) the receiver velocity):
f(t, \mathbf{x}) = f_o - \frac{f_o}{c}\,\mathbf{v}(t)\cdot\mathbf{u}(t) = f_o - \frac{f_o}{c}\cdot\frac{V_x(t)\big(X_p(t)-X\big) + V_y(t)\big(Y_p(t)-Y\big) + V_z(t)\big(Z_p(t)-Z\big)}{\sqrt{\big(X_p(t)-X\big)^2 + \big(Y_p(t)-Y\big)^2 + \big(Z_p(t)-Z\big)^2}}
The measured frequencies are noisy: \tilde{f}(t_i, \mathbf{x}) = f(t_i, \mathbf{x}) + v(t_i)
Estimation Problem Statement
Given the data vector (a vector-valued function of the vector x):
\tilde{\mathbf{f}}(\mathbf{x}) = [\tilde{f}(t_1, \mathbf{x})\;\tilde{f}(t_2, \mathbf{x})\;\cdots\;\tilde{f}(t_N, \mathbf{x})]^T
(I use J for the FIM instead of I to avoid confusion with the identity matrix.)
Convenient Form for FIM — define "The Jacobian" of f(x):
\mathbf{H} = \frac{\partial}{\partial\mathbf{x}}\mathbf{f}(\mathbf{x})\Big|_{\mathbf{x}=\text{true value}} = [\mathbf{h}_1\,|\,\mathbf{h}_2\,|\,\mathbf{h}_3\,|\,\mathbf{h}_4], \qquad \mathbf{h}_j = \begin{bmatrix} \partial f(t_1,\mathbf{x})/\partial x_j \\ \partial f(t_2,\mathbf{x})/\partial x_j \\ \vdots \\ \partial f(t_N,\mathbf{x})/\partial x_j \end{bmatrix}_{\mathbf{x}=\text{true value}}
\mathbf{J} = \mathbf{H}^T\mathbf{C}^{-1}\mathbf{H}
CRLB Matrix
The Cramer-Rao bound covariance matrix then is:
\mathbf{C}_{CRB}(\mathbf{x}) = \mathbf{J}^{-1} = \left[\mathbf{H}^T\mathbf{C}^{-1}\mathbf{H}\right]^{-1}
For a zero-mean Gaussian parameter-error vector partitioned into x and y blocks:
p(\boldsymbol\theta) = \frac{1}{(2\pi)^{N/2}\sqrt{\det(\mathbf{C}_\theta)}}\exp\!\left\{-\tfrac{1}{2}\boldsymbol\theta^T\mathbf{C}_\theta^{-1}\boldsymbol\theta\right\}, \qquad \mathbf{C}_\theta = \begin{bmatrix} \mathbf{C}_x & \mathbf{C}_{xy} \\ \mathbf{C}_{yx} & \mathbf{C}_y \end{bmatrix}
Finding Projections
To find the projection of the CRLB ellipse:
1. Invert the FIM to get CCRB
2. Select the submatrix CCRB,sub from CCRB
3. Invert CCRB,sub to get Jproj
4. Compute the ellipse for the quadratic form of Jproj
Mathematically: \mathbf{C}_{CRB,sub} = \mathbf{P}\mathbf{C}_{CRB}\mathbf{P}^T, so
\mathbf{J}_{proj} = \left(\mathbf{P}\mathbf{J}^{-1}\mathbf{P}^T\right)^{-1}
P is a matrix formed from the identity matrix: keep only the rows of the variables being projected onto.
Slices of Error Ellipsoids
Q: What happens if one parameter were perfectly known?
Capture this by setting that parameter's error to zero ⇒ a slice through the error ellipsoid.
Impact:
• slice = projection when ellipsoid not tilted
• slice < projection when ellipsoid is tilted.
Recall: Correlation causes tilt
12
Chapter 4
Linear Models
1
General Linear Model
Recall signal + WGN case: x[n] = s[n;θ] + w[n]
x = s(θ) + w Here, dependence on θ is general
Now we consider a special case: Linear “Observations”:
s(θ) = Hθ + b
p×1 known “offset”(p×1)
N×1 known “observation
matrix” (N×p)
Q: Why?
3
Importance of The Linear Model
There are several reasons:
\hat{\boldsymbol\theta}_{MVU} = \left(\mathbf{H}^T\mathbf{C}^{-1}\mathbf{H}\right)^{-1}\mathbf{H}^T\mathbf{C}^{-1}(\mathbf{x}-\mathbf{b}) \quad\text{… as we'll see!!!}
MVUE for Linear Model
Theorem: The MVUE for the General Linear Model and its
covariance (i.e. its accuracy performance) are given by:
\hat{\boldsymbol\theta}_{MVU} = \left(\mathbf{H}^T\mathbf{C}^{-1}\mathbf{H}\right)^{-1}\mathbf{H}^T\mathbf{C}^{-1}(\mathbf{x}-\mathbf{b})
\mathbf{C}_{\hat\theta} = \left(\mathbf{H}^T\mathbf{C}^{-1}\mathbf{H}\right)^{-1}
and achieves the CRLB.
Proof: We’ll do this for the b = 0 case but it can easily be done
for the more general case.
\frac{\partial\ln p}{\partial\boldsymbol\theta} = -\frac{1}{2}\frac{\partial}{\partial\boldsymbol\theta}\Big[\underbrace{\mathbf{x}^T\mathbf{C}^{-1}\mathbf{x}}_{\text{constant w.r.t. }\theta} - \underbrace{2\,\mathbf{x}^T\mathbf{C}^{-1}\mathbf{H}\boldsymbol\theta}_{\text{linear w.r.t. }\theta} + \underbrace{\boldsymbol\theta^T\mathbf{H}^T\mathbf{C}^{-1}\mathbf{H}\boldsymbol\theta}_{\text{quadratic w.r.t. }\theta}\Big]
(Note: H^T C^{-1} H is symmetric.)
Thus, from (A1.2): for pos. def. C there exists an N×N invertible matrix D such that \mathbf{C}^{-1} = \mathbf{D}^T\mathbf{D}. Claim: \tilde{\mathbf{w}} = \mathbf{D}\mathbf{w} is white:
E\{\tilde{\mathbf{w}}\tilde{\mathbf{w}}^T\} = E\{(\mathbf{D}\mathbf{w})(\mathbf{D}\mathbf{w})^T\} = \mathbf{D}E\{\mathbf{w}\mathbf{w}^T\}\mathbf{D}^T = \mathbf{D}\mathbf{C}\mathbf{D}^T = \mathbf{D}\mathbf{D}^{-1}(\mathbf{D}^T)^{-1}\mathbf{D}^T = \mathbf{I}
So the MVUE can be implemented as a whitening filter D applied to x, followed by the MVUE for the linear model with white noise.
Ex. 4.1: Curve Fitting
Caution: The “Linear” in “Linear Model”
does not come from fitting straight lines to data
It is more general than that !!
x[n] Data
8
Ex. 4.2: Fourier Analysis (not most general)
Data Model: x[n] = \sum_{k=1}^{M} a_k\cos\!\left(\frac{2\pi kn}{N}\right) + \sum_{k=1}^{M} b_k\sin\!\left(\frac{2\pi kn}{N}\right) + w[n] \quad\text{(AWGN)}
Parameters to estimate: the a_k and b_k.
Observation matrix H: down each column, \cos(2\pi kn/N) or \sin(2\pi kn/N) for n = 0, 1, \ldots, N-1, with one column per k = 1, 2, \ldots, M.
Now apply the MVUE Theorem for the Linear Model (C = \sigma^2\mathbf{I}):
\hat{\boldsymbol\theta}_{MVU} = \left(\mathbf{H}^T\mathbf{H}\right)^{-1}\mathbf{H}^T\mathbf{x} = \frac{2}{N}\mathbf{H}^T\mathbf{x}, \quad\text{since } \mathbf{H}^T\mathbf{H} = \frac{N}{2}\mathbf{I}
using the standard orthogonality of sinusoids (see book). Each Fourier coefficient estimate is found by the inner product of a column of H with the data vector x.
\mathbf{C}_{\hat\theta} = \sigma^2\left(\mathbf{H}^T\mathbf{H}\right)^{-1} = \frac{2\sigma^2}{N}\mathbf{I}
and achieves the CRLB.
Q: What signal u[n] is best to use ?
1
Motivation for BLUE
Except for Linear Model case, the optimal MVU estimator might:
1. not even exist
2. be difficult or impossible to find
⇒ Resort to a sub-optimal estimate
BLUE is one such sub-optimal estimate
Idea for BLUE:
1. Restrict estimate to be linear in data x
2. Restrict estimate to be unbiased
3. Find the best one (i.e. with minimum variance)
Linear
Unbiased
Variance
3
6.4 Finding The BLUE (Scalar Case)
1. Constrain to be linear: \hat\theta = \sum_{n=0}^{N-1} a_n x[n]
2. Constrain to be unbiased: E\{\hat\theta\} = \sum_{n=0}^{N-1} a_n E\{x[n]\} = \theta
Finding BLUE for Scalar Linear Observations
Consider scalar-parameter linear observation:
x[n] = θs[n] + w[n] ⇒ E{x[n]} = θs[n]
Then for the unbiased condition we need:
E\{\hat\theta\} = \theta \;\Rightarrow\; \sum_{n=0}^{N-1} a_n s[n] = 1 \;\Rightarrow\; \mathbf{a}^T\mathbf{s} = 1
This tells how to choose the weights to use in the BLUE estimator form \hat\theta = \sum_{n=0}^{N-1} a_n x[n].
\mathrm{var}\{\hat\theta_{BLU}\} = \mathrm{var}\{\mathbf{a}^T\mathbf{x}\} = \mathbf{a}^T\mathbf{C}\mathbf{a} \quad\text{(like } \mathrm{var}\{aX\} = a^2\,\mathrm{var}\{X\}\text{)}
Goal: minimize aTCa subject to aTs = 1
⇒ Constrained optimization
Appendix 6A: use Lagrange multipliers. Minimize J = \mathbf{a}^T\mathbf{C}\mathbf{a} + \lambda(\mathbf{a}^T\mathbf{s} - 1):
Set \frac{\partial J}{\partial\mathbf{a}} = \mathbf{0} \;\Rightarrow\; \mathbf{a} = -\frac{\lambda}{2}\mathbf{C}^{-1}\mathbf{s}
Impose \mathbf{a}^T\mathbf{s} = 1: \; -\frac{\lambda}{2}\,\mathbf{s}^T\mathbf{C}^{-1}\mathbf{s} = 1 \;\Rightarrow\; -\frac{\lambda}{2} = \frac{1}{\mathbf{s}^T\mathbf{C}^{-1}\mathbf{s}} \;\Rightarrow\; \mathbf{a} = \frac{\mathbf{C}^{-1}\mathbf{s}}{\mathbf{s}^T\mathbf{C}^{-1}\mathbf{s}}
\hat\theta_{BLUE} = \mathbf{a}^T\mathbf{x} = \frac{\mathbf{s}^T\mathbf{C}^{-1}\mathbf{x}}{\mathbf{s}^T\mathbf{C}^{-1}\mathbf{s}}, \qquad \mathrm{var}(\hat\theta) = \frac{1}{\mathbf{s}^T\mathbf{C}^{-1}\mathbf{s}}
6.5 Vector Parameter Case: Gauss-Markov Thm
Gauss-Markov Theorem:
If data can be modeled as having linear observations in noise:
x = Hθ + w
Known Matrix Known Mean & Cov
(PDF is otherwise
arbitrary & unknown)
Then the BLUE is: \hat{\boldsymbol\theta}_{BLUE} = \left(\mathbf{H}^T\mathbf{C}^{-1}\mathbf{H}\right)^{-1}\mathbf{H}^T\mathbf{C}^{-1}\mathbf{x}
and its covariance is: \mathbf{C}_{\hat\theta} = \left(\mathbf{H}^T\mathbf{C}^{-1}\mathbf{H}\right)^{-1}
[Figure: three receivers Rx1 (x_1,y_1), Rx2 (x_2,y_2), Rx3 (x_3,y_3) receive delayed copies s(t − t_1), s(t − t_2), s(t − t_3); each constant TDOA, \tau_{12} = t_2 − t_1 and \tau_{23} = t_3 − t_2, defines a hyperbola. TDOA = Time-Difference-of-Arrival.]
TOA model: t_i = T_o + R_i/c + \varepsilon_i, \quad i = 0, 1, \ldots, N-1
Apply to TOA (removing the known nominal-range term and linearizing about a nominal emitter location, with known coefficients A_i, B_i):
\tilde{t}_i = t_i - \frac{R_n}{c} = T_o + \frac{A_i}{c}\delta x_s + \frac{B_i}{c}\delta y_s + \varepsilon_i
Conversion to TDOA Model (N–1 TDOAs rather than N TOAs)
TDOAs: \tau_i = \tilde{t}_i - \tilde{t}_{i-1}, \quad i = 1, 2, \ldots, N-1
\tau_i = \frac{A_i - A_{i-1}}{c}\delta x_s + \frac{B_i - B_{i-1}}{c}\delta y_s + \underbrace{\varepsilon_i - \varepsilon_{i-1}}_{\text{correlated noise}}
In matrix form: \mathbf{x} = \mathbf{H}\boldsymbol\theta + \mathbf{w}, with \mathbf{x} = [\tau_1\;\tau_2\;\cdots\;\tau_{N-1}]^T, \boldsymbol\theta = [\delta x_s\;\delta y_s]^T,
\mathbf{H} = \frac{1}{c}\begin{bmatrix} A_1 - A_0 & B_1 - B_0 \\ A_2 - A_1 & B_2 - B_1 \\ \vdots & \vdots \\ A_{N-1} - A_{N-2} & B_{N-1} - B_{N-2} \end{bmatrix}, \qquad \mathbf{w} = \begin{bmatrix} \varepsilon_1 - \varepsilon_0 \\ \varepsilon_2 - \varepsilon_1 \\ \vdots \\ \varepsilon_{N-1} - \varepsilon_{N-2} \end{bmatrix} = \mathbf{A}\boldsymbol\varepsilon
Apply TDOA Result to Simple Geometry
[Figure: transmitter at range R broadside to a 3-element receiver line array Rx1, Rx2, Rx3 with spacing d; α is the angle from the center receiver to the outer receivers.]
Then can show:
\mathbf{C}_{\hat\theta} = \sigma^2 c^2\begin{bmatrix} \dfrac{1}{2\cos^2\alpha} & 0 \\ 0 & \dfrac{3/2}{(1-\sin\alpha)^2} \end{bmatrix}
A diagonal error covariance ⇒ an aligned error ellipse.
[Figure: normalized accuracies \sigma_x/c\sigma and \sigma_y/c\sigma vs. \alpha (degrees), log scale. Std. dev. is used to show the units of X & Y; values are normalized by c\sigma — get actual values by multiplying by your specific c\sigma value.]
Motivation for MLE
Problems: 1. MVUE often does not exist or can’t be found
<See Ex. 7.1 in the textbook for such a case>
2. BLUE may not be applicable (x ≠ Hθ + w)
This makes the MLE one of the most popular practical methods
Rationale for MLE
Choose the parameter value that:
makes the data you did observe…
the most likely data to have been observed!!!
Consider 2 possible parameter values: θ1 & θ2
Ask the following: If θi were really the true value, what is the
probability that I would get the data set I really got ?
Let this probability be Pi
Definition of the MLE
θˆML is the value of θ that maximizes the “Likelihood
Function” p(x;θ) for the specific measured data x
[Figure: likelihood function p(x;θ) vs. θ; \hat\theta_{ML} is located at its peak.]
Note: Because ln(z) is a monotonically increasing function…
θˆML maximizes the log likelihood function ln{p(x; θ)}
Expand this (here the model is x[n] ∼ N(A, A) — the noise variance also equals A — see book): setting the LLF derivative to zero, expanding, and canceling terms, then manipulating, gives
\hat{A}^2 + \hat{A} - \frac{1}{N}\sum_{n=0}^{N-1} x^2[n] = 0
Asymptotically… unbiased & efficient:
\mathrm{var}(\hat{A}) \to \frac{A^2}{N\left(A + \tfrac{1}{2}\right)} = \mathrm{CRLB}
7.5 Properties of the MLE (or… “Why We Love MLE”)
Monte Carlo Simulations: see Appendix 7A
A methodology for doing computer simulations to evaluate
performance of any estimation method Not just for the MLE!!!
Illustrate for deterministic signal s[n; θ ] in AWGN
Monte Carlo Simulation:
Data Collection:
1. Select a particular true parameter value, θtrue
- you are often interested in doing this for a variety of values of θ
so you would run one MC simulation for each θ value of interest
2. Generate signal having true θ: s[n;θt] (call it s in matlab)
3. Generate WGN having unit variance
w = randn ( size(s) );
4. Form measured data: x = s + sigma*w;
- choose σ to get the desired SNR
- usually want to run at many SNR values
→ do one MC simulation for each SNR value
Data Collection (Continued):
5. Compute estimate from data x
6. Repeat steps 3-5 M times
- (call M “# of MC runs” or just “# of runs”)
7. Store all M estimates in a vector EST (assumes scalar θ)
Statistical Evaluation:
1. Compute bias: b = \frac{1}{M}\sum_{i=1}^{M}\hat\theta_i - \theta_{true}
2. Compute error RMS: RMS = \sqrt{\frac{1}{M}\sum_{i=1}^{M}\left(\hat\theta_i - \theta_{true}\right)^2}
Now explore (via plots) how bias, RMS, and VAR vary with: θ value, SNR value, N value, etc.
Is b ≈ 0? Is RMS ≈ (CRLB)^{1/2}?
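A minimal MATLAB sketch (not from the slides; the parameter values are assumptions) pulling the above steps into one script, using the sample-mean estimator of a DC level in WGN — any estimator could be dropped in at the "estimate" line:

% Minimal sketch: Monte Carlo evaluation of an estimator
theta_true = 2; N = 100; sigma = 1; M = 5000;
s = theta_true * ones(N, 1);          % signal having the true theta
EST = zeros(M, 1);
for i = 1:M
    w = randn(size(s));               % unit-variance WGN
    x = s + sigma*w;                  % measured data at this SNR
    EST(i) = mean(x);                 % estimate from data x
end
b   = mean(EST) - theta_true             % bias, ~0
RMS = sqrt(mean((EST - theta_true).^2))  % error RMS
sqrt(sigma^2/N)                          % (CRLB)^(1/2) for comparison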
Ex. 7.6: Phase Estimation for a Sinusoid
Some Applications:
1. Demodulation of phase coherent modulations
(e.g., DSB, SSB, PSK, QAM, etc.)
2. Phase-Based Bearing Estimation
Recall the CRLB: \mathrm{var}(\hat\phi) \ge \frac{2\sigma^2}{NA^2} = \frac{1}{N\cdot\mathrm{SNR}}
For this problem… all methods for finding the MVUE will fail!!
⇒ So… try MLE!!
So first we write the likelihood function:
p(\mathbf{x};\phi) = \frac{1}{(2\pi\sigma^2)^{N/2}}\exp\!\left[-\frac{1}{2\sigma^2}\sum_{n=0}^{N-1}\big(x[n]-A\cos(2\pi f_o n + \phi)\big)^2\right]
GOAL: finding the \phi that maximizes this is equivalent to minimizing
J(\phi) = \sum_{n=0}^{N-1}\big(x[n]-A\cos(2\pi f_o n + \phi)\big)^2
(we end up in the same place if we maximize the LLF). Setting \partial J(\phi)/\partial\phi = 0 and dropping a double-frequency term that is ≈ 0 (sin and cos are ⊥ when summed over full cycles) gives the MLE phase estimate as the solution of
\sum_{n=0}^{N-1} x[n]\sin\!\big(2\pi f_o n + \hat\phi\big) = 0
(Interpret via inner product or correlation.)
12
Now… using a trig identity and then re-arranging gives:
\cos(\hat\phi)\sum_n x[n]\sin(2\pi f_o n) = -\sin(\hat\phi)\sum_n x[n]\cos(2\pi f_o n)
[Figure: quadrature receiver — x(t) mixed with cos(2πf_o t) and −sin(2πf_o t), followed by LPFs producing in-phase and quadrature components.] The "sums" in the above equation play the role of the LPFs in the figure (why?). Thus, the ML phase estimator can be viewed as: atan of the ratio of Q/I.
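A minimal MATLAB sketch (not from the slides; the signal parameters are assumptions) of this Q/I form of the ML phase estimator:

% Minimal sketch: ML phase estimate via atan of Q over I
N = 512; fo = 0.12; A = 1; phi = 0.7; sigma = 1;
n = (0:N-1).';
x = A*cos(2*pi*fo*n + phi) + sigma*randn(N,1);
I = sum(x .* cos(2*pi*fo*n));       % in-phase correlation
Q = sum(x .* sin(2*pi*fo*n));       % quadrature correlation
phi_hat = atan2(-Q, I)              % satisfies the ML condition; ~0.7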
Monte Carlo Results for ML Phase Estimation
14
7.6 MLE for Transformed Parameters
Given PDF p(x;θ ) but want an estimate of α = g (θ )
What is the MLE for α ??
\hat\alpha_{ML} maximizes p(\mathbf{x}; g^{-1}(\alpha)) when \alpha = g(\theta) is one-to-one.
2. If \alpha = g(\theta) is not a one-to-one function, we need to define a modified likelihood function:
\bar{p}_T(\mathbf{x};\alpha) = \max_{\{\theta:\;\alpha = g(\theta)\}} p(\mathbf{x};\theta)
• For each \alpha, find all \theta's that map to it. • Extract the largest value of p(\mathbf{x};\theta) over this set of \theta's. Then \hat\alpha_{ML} maximizes \bar{p}_T(\mathbf{x};\alpha).
Invariance Property of MLE Another Big
Advantage of MLE!
Theorem 7.2: Invariance Property of MLE
If parameter θ is mapped according to α = g(θ ) then the
MLE of α is given by
αˆ = g (θˆ)
where θˆ is the MLE for θ found by maximizing p(x;θ )
Note: when g(θ ) is not one-to-one the MLE for α maximizes
the modified likelihood function
“Proof”:
Easy to see when g(θ ) is one-to-one
2
Ex. 7.9: Estimate Power of DC Level in AWGN
x[n] = A + w[n] noise is N(0,σ2) & White
α = A2
Want to estimate the power: \alpha = A^2. The map is not one-to-one (A = \pm\sqrt\alpha), so for each \alpha value there are 2 PDFs to consider:
\bar{p}_{T_1}(\mathbf{x};\alpha) = \frac{1}{(2\pi\sigma^2)^{N/2}}\exp\!\left[-\frac{1}{2\sigma^2}\sum_n\big(x[n]-\sqrt\alpha\big)^2\right]
\bar{p}_{T_2}(\mathbf{x};\alpha) = \frac{1}{(2\pi\sigma^2)^{N/2}}\exp\!\left[-\frac{1}{2\sigma^2}\sum_n\big(x[n]+\sqrt\alpha\big)^2\right]
Then:
\hat\alpha_{ML} = \arg\max_{\alpha\ge 0}\left\{p(\mathbf{x};\sqrt\alpha),\; p(\mathbf{x};-\sqrt\alpha)\right\} = \left[\arg\max_{-\infty<A<\infty} p(\mathbf{x};A)\right]^2 = \left[\hat{A}_{ML}\right]^2
a demonstration that the invariance result holds for this example.
Ex. 7.10: Estimate Power of WGN in dB
x[n] = w[n] WGN w/ var = σ2 unknown
Recall: Pnoise = σ2
Can show that the MLE for the variance is: \hat{P}_{noise} = \frac{1}{N}\sum_{n=0}^{N-1} x^2[n]
By the invariance property, the MLE of the noise power in dB is 10\log_{10}\hat{P}_{noise}.
7.7: Numerical Determination of MLE
Note: In all previous examples we ended up with a closed-form expression for the MLE: \hat\theta_{ML} = f(\mathbf{x})
So…we can’t always find a closed-form MLE!
But a main advantage of MLE is:
We can always find it numerically!!!
(Not always computationally efficiently, though)
Iterative Methods for Numerical MLE
Step #1: Pick some “initial estimate” θˆ0
Step #2: Iteratively improve it using
θˆi +1 = f (θˆi , x ) such that lim p( x ;θ i ) = max p( x;θ )
i →∞ θ
“Hill Climbing in the Fog”
p(x;θ ) Note: A so-called “Greedy”
maximization algorithm will
always move up even
though taking an occasional
θ step downward may be the
θˆ0 θˆ1 θˆ2
better global strategy!
Convergence Issues:
1. May not converge
2. May converge, but to local maximum
- good initial guess is needed !!
- can use rough grid search to initialize
- can use multiple initializations 7
Iterative Method: Newton-Raphson MLE
The MLE is the maximum of the LF… so set derivative to 0:
∂ ln p ( x;θ ) So… MLE is a
=0
$!∂# θ!" zero of g(θ )
∆
= g (θ )
Newton-Raphson is a numerical method for finding the zero of a function… so it can be applied here. Linearize g(\theta) with a truncated Taylor series:
g(\theta) \approx g(\hat\theta_k) + \frac{dg(\theta)}{d\theta}\Big|_{\theta=\hat\theta_k}(\theta - \hat\theta_k)
Set this to 0 and solve for \hat\theta_{k+1}:
\hat\theta_{k+1} = \hat\theta_k - \frac{g(\hat\theta_k)}{\dfrac{dg(\theta)}{d\theta}\Big|_{\theta=\hat\theta_k}}
Now… using our "definition of convenience" g(\theta) = \frac{\partial\ln p(\mathbf{x};\theta)}{\partial\theta}, the Newton-Raphson MLE iteration is:
\hat\theta_{k+1} = \hat\theta_k - \left[\frac{\partial^2\ln p(\mathbf{x};\theta)}{\partial\theta^2}\right]^{-1}\frac{\partial\ln p(\mathbf{x};\theta)}{\partial\theta}\Bigg|_{\theta=\hat\theta_k}
Iterate until the convergence criterion is met: |\hat\theta_{k+1} - \hat\theta_k| < \varepsilon (you get to choose ε).
Look familiar??? Looks like I(\theta), except: I(\theta) is evaluated at the true \theta, and has an expected value.
Generally: for a given PDF model, compute the derivatives analytically… or compute the derivatives numerically:
\frac{\partial\ln p(\mathbf{x};\theta)}{\partial\theta}\Big|_{\hat\theta_k} \approx \frac{\ln p(\mathbf{x};\hat\theta_k + \Delta\theta) - \ln p(\mathbf{x};\hat\theta_k)}{\Delta\theta}
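A minimal MATLAB sketch of this iteration (not from the slides; the sinusoid-phase example and finite-difference derivatives are assumptions):

% Minimal sketch: Newton-Raphson on the LLF derivative, numeric derivatives
N = 512; fo = 0.12; A = 1; phi_true = 0.7; sigma = 1;
n = (0:N-1).';
x = A*cos(2*pi*fo*n + phi_true) + sigma*randn(N,1);
negJ = @(phi) -sum((x - A*cos(2*pi*fo*n + phi)).^2); % LLF up to constants
d = 1e-5; phi_hat = 0.5;                             % initial estimate
for k = 1:20
    g  = (negJ(phi_hat + d) - negJ(phi_hat - d)) / (2*d);            % 1st deriv
    gp = (negJ(phi_hat + d) - 2*negJ(phi_hat) + negJ(phi_hat - d)) / d^2; % 2nd
    step = g/gp;
    phi_hat = phi_hat - step;                        % Newton-Raphson update
    if abs(step) < 1e-8, break; end                  % convergence criterion
end
phi_hat                                              % ~ phi_true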
Convergence Issues of Newton-Raphson:
1. May not converge
2. May converge, but to local maximum
- good initial guess is needed !!
- can use rough grid search to initialize
- can use multiple initializations
[Figure: plot of \partial\ln p(\mathbf{x};\theta)/\partial\theta vs. \theta showing Newton-Raphson iterates \hat\theta_0, \hat\theta_1, \hat\theta_2, \hat\theta_3 that can jump between zeros.]
Derivative w.r.t. a vector:
\frac{\partial f(\boldsymbol\theta)}{\partial\boldsymbol\theta} = \begin{bmatrix} \partial f(\boldsymbol\theta)/\partial\theta_1 \\ \partial f(\boldsymbol\theta)/\partial\theta_2 \\ \vdots \\ \partial f(\boldsymbol\theta)/\partial\theta_p \end{bmatrix}
Ex. 7.12: Estimate DC Level and Variance
x[n] = A + w[n] noise is N(0,σ2) and white
A
Estimate: DC level A and noise variance \sigma^2 \;\Rightarrow\; \boldsymbol\theta = [A\;\sigma^2]^T
LF is: p(\mathbf{x};A,\sigma^2) = \frac{1}{(2\pi\sigma^2)^{N/2}}\exp\!\left[-\frac{1}{2\sigma^2}\sum_{n=0}^{N-1}(x[n]-A)^2\right]
Solve \frac{\partial\ln p(\mathbf{x};\boldsymbol\theta)}{\partial\boldsymbol\theta} = \mathbf{0}:
\frac{\partial\ln p}{\partial A} = \frac{1}{\sigma^2}\sum_{n=0}^{N-1}(x[n]-A) = \frac{N}{\sigma^2}(\bar{x} - A) = 0
\frac{\partial\ln p}{\partial\sigma^2} = -\frac{N}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{n=0}^{N-1}(x[n]-A)^2 = 0
\Rightarrow\; \hat{\boldsymbol\theta}_{ML} = \begin{bmatrix} \bar{x} \\ \frac{1}{N}\sum_n(x[n]-\bar{x})^2 \end{bmatrix}
Asymptotically: \hat{\boldsymbol\theta}_{ML} \overset{a}{\sim} N\big(\boldsymbol\theta,\, \mathbf{I}^{-1}(\boldsymbol\theta)\big)
Ex. 7.12 Revisited
It can be shown that:
E\{\hat{\boldsymbol\theta}\} = \begin{bmatrix} A \\ \dfrac{N-1}{N}\sigma^2 \end{bmatrix}, \qquad \mathrm{cov}\{\hat{\boldsymbol\theta}\} = \begin{bmatrix} \dfrac{\sigma^2}{N} & 0 \\ 0 & \dfrac{2(N-1)}{N^2}\sigma^4 \end{bmatrix}
For large N then:
E\{\hat{\boldsymbol\theta}\} \approx \begin{bmatrix} A \\ \sigma^2 \end{bmatrix} = \boldsymbol\theta, \qquad \mathrm{cov}\{\hat{\boldsymbol\theta}\} \approx \begin{bmatrix} \dfrac{\sigma^2}{N} & 0 \\ 0 & \dfrac{2\sigma^4}{N} \end{bmatrix} = \mathbf{I}^{-1}(\boldsymbol\theta)
which we see satisfies the asymptotic property. The diagonal covariance is why we could "decouple" the estimates.
MLE for the General Gaussian Case
Let the data be general Gaussian: x ~ N (µ(θ), C(θ))
Thus \partial\ln p(\mathbf{x};\boldsymbol\theta)/\partial\boldsymbol\theta will depend in general on \frac{\partial\boldsymbol\mu(\boldsymbol\theta)}{\partial\boldsymbol\theta} and \frac{\partial\mathbf{C}(\boldsymbol\theta)}{\partial\boldsymbol\theta}.
For each k = 1, 2, …, p set \partial\ln p(\mathbf{x};\boldsymbol\theta)/\partial\theta_k = 0. This gives p simultaneous equations, the kth one being:
-\tfrac{1}{2}\,\mathrm{tr}\!\left[\mathbf{C}^{-1}(\boldsymbol\theta)\frac{\partial\mathbf{C}(\boldsymbol\theta)}{\partial\theta_k}\right] + \frac{\partial\boldsymbol\mu(\boldsymbol\theta)^T}{\partial\theta_k}\mathbf{C}^{-1}(\boldsymbol\theta)[\mathbf{x}-\boldsymbol\mu(\boldsymbol\theta)] + \tfrac{1}{2}[\mathbf{x}-\boldsymbol\mu(\boldsymbol\theta)]^T\mathbf{C}^{-1}(\boldsymbol\theta)\frac{\partial\mathbf{C}(\boldsymbol\theta)}{\partial\theta_k}\mathbf{C}^{-1}(\boldsymbol\theta)[\mathbf{x}-\boldsymbol\mu(\boldsymbol\theta)] = 0
Note: for the "deterministic signal + noise" case the covariance-derivative terms are zero; for the linear model \boldsymbol\mu(\boldsymbol\theta) = \mathbf{H}\boldsymbol\theta, solving this gives:
\hat{\boldsymbol\theta}_{ML} = \left(\mathbf{H}^T\mathbf{C}^{-1}\mathbf{H}\right)^{-1}\mathbf{H}^T\mathbf{C}^{-1}\mathbf{x}
Hey! Same as chapter 4's MVU for the linear model.
\hat{\boldsymbol\theta}_{ML} \sim N\big(\boldsymbol\theta,\, (\mathbf{H}^T\mathbf{C}^{-1}\mathbf{H})^{-1}\big) — EXACT… not asymptotic!!
Numerical Solutions for Vector Case
Obvious generalizations… see p. 187
Get
Numerically
17
7.9 Asymptotic MLE
Useful when data samples x[n] come from a WSS process
18
7.10 MLE Examples
We’ll now apply the MLE theory to several examples of
practical signal processing problems.
These are the same examples for which we derived the CRLB
in Ch. 3
1. Range Estimation
– sonar, radar, robotics, emitter location
2. Sinusoidal Parameter Estimation (Amp., Frequency, Phase)
– sonar, radar, communication receivers (recall DSB Example), etc.
3. Bearing Estimation
We – sonar, radar, emitter location
Will
Cover 4. Autoregressive Parameter Estimation
– speech processing, econometrics
See Book
1
Ex. 1 Range Estimation Problem
Transmit Pulse: s(t) nonzero over t∈[0,Ts]
Receive Reflection: s(t – τo)
Measure Time Delay: τo
[Figure: bandlimited pulse s(t) nonzero over [0, T_s]; the receive chain (BPF & amp) outputs x(t) containing s(t − τ_o); the noise w(t) is white Gaussian with PSD N_o/2 over the band [−B, B].]
Range Estimation D-T Signal Model
After sampling, the noise has \sigma^2 = BN_o and
x[n] = \begin{cases} w[n], & 0 \le n \le n_o - 1 \\ s[n-n_o] + w[n], & n_o \le n \le n_o + M - 1 \\ w[n], & n_o + M \le n \le N - 1 \end{cases}
Range Estimation Likelihood Function
White and Gaussian ⇒ Independent ⇒ Product of PDFs
3 different PDFs – one for each subinterval
The three subintervals give (with C_N a constant independent of n_o):
p(\mathbf{x};n_o) = C_N\exp\!\left[-\frac{\sum_{n=0}^{N-1} x^2[n]}{2\sigma^2}\right]\cdot\exp\!\left[-\frac{1}{2\sigma^2}\sum_{n=n_o}^{n_o+M-1}\big(-2x[n]s[n-n_o] + s^2[n-n_o]\big)\right]
The first factor does not depend on n_o; we must minimize the bracketed sum (or maximize its negative) over values of n_o.
Range Estimation ML Condition
So maximize this: 2\sum_{n=n_o}^{n_o+M-1} x[n]s[n-n_o] - \sum_{n=n_o}^{n_o+M-1} s^2[n-n_o]
The second sum is the signal energy, which does not depend on n_o. So maximize this:
\sum_{n=0}^{N-1} x[n]s[n-n_o]
i.e., find the peak of the cross-correlation C_{xs}[m] = \sum_{n=0}^{N-1} x[n]s[n-m].
Warning: when signals are complex (e.g., ELPS) find the peak of |C_{xs}[m]|.
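A minimal MATLAB sketch (not from the slides; the pulse shape and values are assumptions) of this correlate-and-peak delay estimator:

% Minimal sketch: ML delay estimate via cross-correlation peak
N = 400; M = 50; no_true = 123; sigma = 0.5;
s = sin(pi*(1:M)'/M).^2;              % assumed example pulse shape
x = sigma*randn(N,1);
x(no_true+1 : no_true+M) = x(no_true+1 : no_true+M) + s;
C = zeros(N-M+1, 1);
for m = 0:N-M                         % C_xs[m] = sum_n x[n] s[n-m]
    C(m+1) = x(m+1 : m+M).' * s;
end
[~, idx] = max(C);
no_hat = idx - 1                      % ~ no_true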
= J ( A, Ω o ,φ )
For MLE: Minimize This
7
Sinusoid Parameter Estimation ML Condition
To make things easier…
Define:
c(Ωo) = [1 cos(Ωo) cos(Ωo2) … cos(Ωo(N-1))]T
s(Ωo) = [0 sin(Ωo) sin(Ωo2) … sin(Ωo(N-1))]T
and…
H(Ωo) = [c(Ωo) s(Ωo)] an Nx2 matrix
8
Then: J'(\alpha_1, \alpha_2, \Omega_o) = [\mathbf{x} - \mathbf{H}(\Omega_o)\boldsymbol\alpha]^T[\mathbf{x} - \mathbf{H}(\Omega_o)\boldsymbol\alpha]
Looks like the linear model case… except for the \Omega_o dependence of \mathbf{H}(\Omega_o). For fixed \Omega_o:
\hat{\boldsymbol\alpha} = \left[\mathbf{H}^T(\Omega_o)\mathbf{H}(\Omega_o)\right]^{-1}\mathbf{H}^T(\Omega_o)\mathbf{x}
Then plug that into J'(\alpha_1, \alpha_2, \Omega_o):
J'(\hat\alpha_1, \hat\alpha_2, \Omega_o) = [\mathbf{x} - \mathbf{H}(\Omega_o)\hat{\boldsymbol\alpha}]^T[\mathbf{x} - \mathbf{H}(\Omega_o)\hat{\boldsymbol\alpha}] = \mathbf{x}^T\left[\mathbf{I} - \mathbf{H}(\Omega_o)\left[\mathbf{H}^T(\Omega_o)\mathbf{H}(\Omega_o)\right]^{-1}\mathbf{H}^T(\Omega_o)\right]\mathbf{x}
= \mathbf{x}^T\mathbf{x} - \mathbf{x}^T\mathbf{H}(\Omega_o)\left[\mathbf{H}^T(\Omega_o)\mathbf{H}(\Omega_o)\right]^{-1}\mathbf{H}^T(\Omega_o)\mathbf{x}
Minimizing over \Omega_o ⇔ maximizing the second term.
Sinusoid Parms. Exact MLE Procedure
Step 1: Maximize "this term" over \Omega_o to find \hat\Omega_o (could do numerically):
\hat\Omega_o = \arg\max_{0<\Omega_o<\pi} \mathbf{x}^T\mathbf{H}(\Omega_o)\left[\mathbf{H}^T(\Omega_o)\mathbf{H}(\Omega_o)\right]^{-1}\mathbf{H}^T(\Omega_o)\mathbf{x}
Step 2: Use the result of Step 1 to get \hat{\boldsymbol\alpha} = \left[\mathbf{H}^T(\hat\Omega_o)\mathbf{H}(\hat\Omega_o)\right]^{-1}\mathbf{H}^T(\hat\Omega_o)\mathbf{x}
Step 3: Convert the Step 2 result by solving \hat\alpha_1 = \hat{A}\cos(\hat\phi), \; \hat\alpha_2 = -\hat{A}\sin(\hat\phi) for \hat{A} and \hat\phi.
Sinusoid Parms. Approx. MLE Procedure
First we look at a specific structure:
\mathbf{x}^T\mathbf{H}\left[\mathbf{H}^T\mathbf{H}\right]^{-1}\mathbf{H}^T\mathbf{x} = \begin{bmatrix} \mathbf{c}^T(\Omega_o)\mathbf{x} \\ \mathbf{s}^T(\Omega_o)\mathbf{x} \end{bmatrix}^T \begin{bmatrix} \mathbf{c}^T(\Omega_o)\mathbf{c}(\Omega_o) & \mathbf{c}^T(\Omega_o)\mathbf{s}(\Omega_o) \\ \mathbf{s}^T(\Omega_o)\mathbf{c}(\Omega_o) & \mathbf{s}^T(\Omega_o)\mathbf{s}(\Omega_o) \end{bmatrix}^{-1} \begin{bmatrix} \mathbf{c}^T(\Omega_o)\mathbf{x} \\ \mathbf{s}^T(\Omega_o)\mathbf{x} \end{bmatrix}
Then… if \Omega_o is not near 0 or \pi, approximately \mathbf{H}^T\mathbf{H} \approx \begin{bmatrix} N/2 & 0 \\ 0 & N/2 \end{bmatrix}, and Step 1 becomes
\hat\Omega_o = \arg\max_{0<\Omega<\pi} \frac{2}{N}\left|\sum_{n=0}^{N-1} x[n]e^{-j\Omega n}\right|^2 = \arg\max_{0<\Omega<\pi} |X(\Omega)|^2
\hat\phi = \angle X(\hat\Omega_o)
The processing is implemented as follows. Given the data x[n], n = 0, 1, 2, …, N-1:
1. Compute the DFT X[m], m = 0, 1, 2, …, M-1 of the data.
• Zero-pad to length M = 4N to ensure a dense grid of frequency points.
• Use the FFT algorithm for computational efficiency.
2. Find the peak location \hat\Omega_o of |X[m]|^2 over 0 < \Omega < \pi, and read off the phase there.
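A minimal MATLAB sketch of this zero-padded FFT search (not from the slides; the signal parameters are assumptions):

% Minimal sketch: approximate sinusoid MLE via zero-padded FFT peak
N = 256; fo = 0.11; A = 1; phi = 0.7; sigma = 1;
n = (0:N-1).';
x = A*cos(2*pi*fo*n + phi) + sigma*randn(N,1);
M = 4*N;                               % zero-pad for a dense grid
X = fft(x, M);
[~, idx] = max(abs(X(2:M/2)).^2);      % search 0 < Omega < pi
Om_hat  = 2*pi*idx/M                   % ~ 2*pi*fo
phi_hat = angle(X(idx+1))              % ~ phi
A_hat   = 2*abs(X(idx+1))/N            % ~ A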
Ex. 3 Bearing Estimation MLE
Figure 3.8 Emits or reflects
from textbook: signal s(t)
s(t) = A\cos(2\pi f_o t + \phi) \quad\text{(simple model)}
x[n] = s_n(t_s) + w[n] = A\cos(\Omega_s n + \tilde\phi) + w[n]
1/42
TDOA/FDOA LOCATION
[Figure: TDOA/FDOA geometry. Three receivers collect s(t - t_i)e^{j\omega_i t} from the emitted signal s(t) and share data over data links. FDOA (Frequency-Difference-Of-Arrival): \nu_{21} = \omega_2 - \omega_1 = const, \nu_{23} = \omega_2 - \omega_3 = const. TDOA (Time-Difference-Of-Arrival): \tau_{21} = t_2 - t_1 = const, \tau_{23} = t_2 - t_3 = const.]
Next 2/42
Classical TDOA/FDOA Emitter Location:
[Figure: receiver pairs cross-correlate their signals r_1(t) via CAF_{12}(\tau,\omega) and CAF_{13}(\tau,\omega) to produce (TDOA_{12}, FDOA_{12}) and (TDOA_{13}, FDOA_{13}).]
Stage 1: Estimating TDOA/FDOA
Next 4/42
SIGNAL MODEL
Will process the equivalent lowpass (LPE) signal, BW = B Hz, representing an RF signal with RF BW = B Hz, sampled at F_s > B complex samples/sec; collection time T sec.
At each receiver: [Figure: X_{RF}(f) → BPF → ADC → Make LPE Signal → Equalize → X_{LPE}(f).]
DOPPLER & DELAY MODEL
Propagation time: \tau(t) = R(t)/c with R(t) = R_o + vt + (a/2)t^2 + \cdots
Use the linear approximation — assumes a small change in velocity over the observation interval.
Analytic signal of Tx: \tilde{s}(t) = E(t)e^{j[\omega_c t + \phi(t)]}
Analytic signal of Rx: \tilde{s}_r(t) = \tilde{s}([1-v/c]t - \tau_d) = E([1-v/c]t - \tau_d)\,e^{j\{\omega_c([1-v/c]t - \tau_d) + \phi([1-v/c]t - \tau_d)\}}
Now what? Notice that v << c, so (1 − v/c) ≈ 1. Say v = –300 m/s (–670 mph); then v/c = –300/3×10^8 = –10^{-6} and (1 – v/c) = 1.000001. Now assume E(t) & \phi(t) vary slowly enough that E([1-v/c]t) \approx E(t) and \phi([1-v/c]t) \approx \phi(t) for the range of v of interest. This is called the Narrowband Approximation.
DOPPLER & DELAY MODEL (continued)
Narrowband analytic signal model:
\tilde{s}_r(t) = E(t-\tau_d)\,e^{j\{\omega_c t - \omega_c(v/c)t - \omega_c\tau_d + \phi(t-\tau_d)\}} = \underbrace{e^{-j\omega_c\tau_d}}_{\text{constant phase}}\;\underbrace{e^{-j\omega_c(v/c)t}}_{\text{Doppler}}\;\underbrace{e^{j\omega_c t}}_{\text{carrier}}\;\underbrace{E(t-\tau_d)e^{j\phi(t-\tau_d)}}_{\text{LPE signal, time-shifted by }\tau_d}
with phase \alpha = -\omega_c\tau_d and Doppler \omega_d = \omega_c v/c.
“Hz” 2 2
1 t
σ FDOA ≥ 2
Trms =
2π 2 T rms BT × SNR eff
∫
2
s (t ) dt
Problem with Stein’s CRLBs M. Fowler X. Hu, “Signal Models for TDOA/FDOA
Estimation,” IEEE T. AES, Oct. 2008.
Stein’s paper does not derive these CRLB results… rather they are just
stated.
There is no mention of what signal model is assumed….
And, it turns out that matters very much!!!
Next 9/42
TDOA/FDOA CRLB History Lesson
Next 10/42
M. Fowler X. Hu, “Signal Models
Signals: Sonar vs. RF for TDOA/FDOA Estimation,”
IEEE T. AES, Oct. 2008.
r_1[n] = e^{j\phi}s(nT - \tau_1)e^{j\nu_1 nT} + w_1[n]
r_2[n] = s(nT) + w_2[n]
\mathbf{r} = \begin{bmatrix} \mathbf{r}_1 \\ \mathbf{r}_2 \end{bmatrix}
Noise model: zero-mean WSS processes, Gaussian, independent of each other.
This much is the same for each case…
At least when the narrowband approximation can be used…
which we assume here so we can focus on the impact of
differences in the statistical model.
Next 11/42
M. Fowler X. Hu, “Signal Models
Signal Models: Sonar vs. RF for TDOA/FDOA Estimation,”
IEEE T. AES, Oct. 2008.
Next 12/42
M. Fowler X. Hu, “Signal Models
PDFs: Sonar vs. RF for TDOA/FDOA Estimation,”
IEEE T. AES, Oct. 2008.
Next 15/42
A. Yeredor & E. Angel
Correct CRLB for RF Signals
r_1[n] = s(nT) + w_1[n]
r_2[n] = ae^{j\phi}\underbrace{s(nT-\tau)}_{s_\tau[n]}e^{j\nu nT} + w_2[n], \qquad -\frac{N}{2} \le n \le \frac{N}{2}-1
Signal model: deterministic; complex baseband; s[n] itself is UN-known — must estimate! Noise model: zero-mean WSS processes, white (can generalize to colored noise), Gaussian, independent of each other, complex baseband.
Define:
\mathbf{s} \triangleq \left[s\!\left(-\tfrac{N}{2}\right)\; s\!\left(-\tfrac{N}{2}+1\right)\;\cdots\; s\!\left(\tfrac{N}{2}-1\right)\right]^T, \qquad \mathbf{s}_\tau \triangleq \left[s_\tau\!\left(-\tfrac{N}{2}\right)\; s_\tau\!\left(-\tfrac{N}{2}+1\right)\;\cdots\; s_\tau\!\left(\tfrac{N}{2}-1\right)\right]^T
A. Yeredor & E. Angel
Correct CRLB for RF Signals
Now using a property of the DFT: \mathbf{s}_\tau = \mathbf{F}^H\mathbf{D}_\tau\mathbf{F}\mathbf{s} (pad zeros to account for the DFT's circular nature), where
F is the (unitary) DFT matrix: \mathbf{F} = \frac{1}{\sqrt{N}}\exp\!\left(-j\frac{2\pi}{N}\mathbf{n}\mathbf{n}^T\right), \qquad \mathbf{n} = \left[-\tfrac{N}{2},\ldots,\tfrac{N}{2}-1\right]^T
\mathbf{D}_\tau is the "delay" matrix: \mathbf{D}_\tau = \mathrm{diag}\!\left\{\exp\!\left(-j\frac{2\pi}{N}n\tau\right)\right\} (\tau in "samples"; models delay)
\mathbf{D}_\nu is the "Doppler" matrix: \mathbf{D}_\nu = \mathrm{diag}\{\exp(-j\,n\nu)\} (\nu in "rad/sample", (-\pi,\pi); models Doppler)
Then get:
\mathbf{r}_1 = \mathbf{s} + \mathbf{v}_1, \qquad \mathbf{r}_2 = ae^{j\phi}\mathbf{D}_\nu\mathbf{F}^H\mathbf{D}_\tau\mathbf{F}\mathbf{s} + \mathbf{v}_2
A. Yeredor & E. Angel
Correct CRLB for RF Signals
Recall: must treat s as unknown! Write \mathbf{Q}_{\tau,\nu} = \mathbf{D}_\nu\mathbf{F}^H\mathbf{D}_\tau\mathbf{F}. Then
\boldsymbol\mu_\theta = E\{\mathbf{r}\} = \begin{bmatrix} \mathbf{s} \\ ae^{j\phi}\mathbf{Q}_{\tau,\nu}\mathbf{s} \end{bmatrix}, \qquad \mathbf{C}_\theta = \mathrm{cov}\{\mathbf{r}\} = \boldsymbol\Lambda = \begin{bmatrix} \sigma_1^2\mathbf{I} & \mathbf{0} \\ \mathbf{0} & \sigma_2^2\mathbf{I} \end{bmatrix} \quad\text{(no }\boldsymbol\theta\text{ dependence! Easy inversion!)}
General Gaussian FIM elements:
[\mathbf{J}_\theta]_{ij} = 2\,\mathrm{Re}\!\left[\frac{\partial\boldsymbol\mu_\theta^H}{\partial\theta_i}\mathbf{C}_\theta^{-1}\frac{\partial\boldsymbol\mu_\theta}{\partial\theta_j}\right] + \underbrace{\mathrm{tr}\!\left[\mathbf{C}_\theta^{-1}\frac{\partial\mathbf{C}_\theta}{\partial\theta_i}\mathbf{C}_\theta^{-1}\frac{\partial\mathbf{C}_\theta}{\partial\theta_j}\right]}_{\text{this term is zero!}}
A. Yeredor & E. Angel
Correct CRLB for RF Signals
With \boldsymbol\theta = [\mathrm{Re}\{\mathbf{s}\}\;\mathrm{Im}\{\mathbf{s}\}\;a\;\phi\;\tau\;\nu] and \boldsymbol\gamma = [a\;\phi\;\tau\;\nu]:
\mathbf{J}_\theta = 2\,\mathrm{Re}\!\left[\frac{\partial\boldsymbol\mu_\theta^H}{\partial\boldsymbol\theta}\boldsymbol\Lambda^{-1}\frac{\partial\boldsymbol\mu_\theta}{\partial\boldsymbol\theta}\right] = \frac{2}{\sigma_1^2}\begin{bmatrix} (1+\eta a^2)\mathbf{I} & \mathbf{0} & \eta a\,\mathrm{Re}\{\mathbf{B}\} \\ \mathbf{0} & (1+\eta a^2)\mathbf{I} & \eta a\,\mathrm{Im}\{\mathbf{B}\} \\ \eta a\,\mathrm{Re}\{\mathbf{B}^H\} & -\eta a\,\mathrm{Im}\{\mathbf{B}^H\} & \eta\,\mathrm{Re}\{\mathbf{G}^H\mathbf{G}\} \end{bmatrix}
where \eta \triangleq \frac{\sigma_1^2}{\sigma_2^2}, \qquad \mathbf{G} \triangleq \frac{\partial\,ae^{j\phi}\mathbf{Q}_{\tau,\nu}\mathbf{s}}{\partial\boldsymbol\gamma}, \qquad \mathbf{B} \triangleq e^{-j\phi}\mathbf{Q}_{\tau,\nu}^H\mathbf{G}
Now we could get the CRLB matrix for the full parameter vector, \mathrm{CRLB}_\theta = \mathbf{J}_\theta^{-1}, but we really only want it w.r.t. \boldsymbol\gamma.
A. Yeredor & E. Angel
Correct CRLB for RF Signals
Partition: \mathrm{CRLB}_\theta = \begin{bmatrix} \mathbf{J}^{-1}_{\mathrm{Re}\{s\},\mathrm{Im}\{s\}} & ?? \\ ?? & \mathbf{J}_\gamma^{-1} \end{bmatrix}; \quad\text{define } \mathrm{CRLB}_\gamma = \mathbf{J}_\gamma^{-1}
A. Yeredor & E. Angel
Correct CRLB for RF Signals
The final result for the FIM of interest (after eliminating the nuisance signal parameters) is:
\mathbf{J}_{\phi,\tau,\nu} = \begin{bmatrix} \mathbf{s}^H\mathbf{s} & -\mathbf{s}^H\mathbf{s}' & \mathbf{s}^H\mathbf{N}\mathbf{s} \\ -\mathbf{s}'^H\mathbf{s} & \mathbf{s}'^H\mathbf{s}' & -\mathrm{Re}\{\mathbf{s}'^H\mathbf{Q}_{\tau,\nu}^H\mathbf{N}\mathbf{s}\} \\ \mathbf{s}^H\mathbf{N}\mathbf{s} & -\mathrm{Re}\{\mathbf{s}'^H\mathbf{Q}_{\tau,\nu}^H\mathbf{N}\mathbf{s}\} & \mathbf{s}^H\mathbf{N}^2\mathbf{s} \end{bmatrix}
where
\mathbf{s}' = \frac{2\pi}{N}\mathbf{F}^H\mathbf{N}\mathbf{F}\mathbf{s}, \qquad \mathbf{N} = \mathrm{diag}\{\mathbf{n}\}, \qquad \mathbf{n} = \left[-\tfrac{N}{2},\ldots,\tfrac{N}{2}-1\right]^T, \qquad \mathbf{Q}_{\tau,\nu} = \mathbf{D}_\nu\mathbf{F}^H\mathbf{D}_\tau\mathbf{F}
So… use all these boxes to compute J, then invert it to get the CRLB!
A. Yeredor & E. Angel
Correct CRLB for RF Signals
We can interpret some of these FIM terms:
\mathbf{s}^H\mathbf{s} = \sum_n |s[n]|^2 \quad\text{(signal energy)}
\mathbf{s}'^H\mathbf{s}' = \left(\frac{2\pi}{N}\right)^2(\mathbf{F}\mathbf{s})^H\mathbf{N}^2(\mathbf{F}\mathbf{s}) \quad\text{(an RMS-bandwidth-type term)}
\mathbf{s}^H\mathbf{N}^2\mathbf{s} \quad\text{(an RMS-duration-type term)}
i.e., the discrete counterparts of
B_{rms}^2 = \frac{\int f^2|S(f)|^2\,df}{\int |S(f)|^2\,df}, \qquad T_{rms}^2 = \frac{\int t^2|s(t)|^2\,dt}{\int |s(t)|^2\,dt}
A. Yeredor & E. Angel
Correct CRLB for RF Signals
[Figure: CRLB comparison curves — the CRB for the "specific" (deterministic) model, CRB_FH (Fowler-Hu), and the Wax CRB for the WSS Gaussian model.]
M. Fowler X. Hu, “Signal Models
MLE: Sonar vs. Radar/Comm for TDOA/FDOA Estimation,”
IEEE T. AES, Oct. 2008.
Setting the general Gaussian LLF derivative to zero:
-\mathrm{tr}\!\left[\mathbf{C}_\theta^{-1}\frac{\partial\mathbf{C}_\theta}{\partial\theta_i}\right] + [\mathbf{r}-\boldsymbol\mu_\theta]^H\mathbf{C}_\theta^{-1}\frac{\partial\mathbf{C}_\theta}{\partial\theta_i}\mathbf{C}_\theta^{-1}[\mathbf{r}-\boldsymbol\mu_\theta] + 2\,\mathrm{Re}\!\left[[\mathbf{r}-\boldsymbol\mu_\theta]^H\mathbf{C}_\theta^{-1}\frac{\partial\boldsymbol\mu_\theta}{\partial\theta_i}\right] = 0
Sonar (random-signal) case ⇒ \hat{\boldsymbol\theta}_{ML,ac} = \arg\max_\theta\{-\mathbf{r}^H\mathbf{C}_\theta^{-1}\mathbf{r}\}; RF (deterministic-signal) case ⇒ \hat{\boldsymbol\theta}_{ML,em} = \arg\max_\theta\{2\,\mathrm{Re}\{\mathbf{r}^H\mathbf{C}^{-1}\mathbf{s}_\theta\} - \mathbf{s}_\theta^H\mathbf{C}^{-1}\mathbf{s}_\theta\}
[Figure: receiver filters H(f), cross-correlate, find peak.]
S. Stein, “Differential Delay/Doppler
ML Estimator for TDOA/FDOA ML Estimation with Unknown
Signals,” IEEE Trans. on SP, 1993.
Next 26/42
S. Stein, "Differential Delay/Doppler ML Estimation with Unknown Signals," IEEE Trans. on SP, 1993 — DFT view:
[Equations lost in extraction.] The likelihood to minimize splits into three terms, with the data X appearing only in one of them. So…
• the first term of L1 is not needed
• the second term of L1 led to the signal estimate
• the third term of L1 … look at now!
S. Stein, “Differential Delay/Doppler
ML Estimator for TDOA/FDOA ML Estimation with Unknown
Signals,” IEEE Trans. on SP, 1993.
"Compare" the LPE signals s_1(t), s_2(t) at the two receivers for all delays \tau and Dopplers \omega, and find the peak of |A(\omega,\tau)|, the Ambiguity Function:
A(\omega,\tau) = \int_0^T s_1(t)\,s_2(t+\tau)\,e^{-j\omega t}\,dt
The peak occurs at (\omega_d, \tau_d).
COMPUTING THE AMBIGUITY FUNCTION

Direct computation based on the equation for the ambiguity function leads to computationally inefficient methods; efficient implementations exploit the FFT.
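One common efficient route is to fix each candidate delay and let a single FFT sweep all Doppler bins at once. A hedged sketch on sampled data (circular shifts stand in for true delays, which is adequate for illustration):

```python
import numpy as np

def cross_ambiguity(s1, s2, max_lag):
    """FFT-based cross-ambiguity on a sampled grid: one FFT per candidate delay."""
    Nt = len(s1)
    lags = np.arange(-max_lag, max_lag + 1)
    A = np.zeros((Nt, len(lags)), dtype=complex)
    for k, tau in enumerate(lags):
        prod = s1 * np.conj(np.roll(s2, -tau))   # s1(t) * conj(s2(t + tau))
        A[:, k] = np.fft.fft(prod)               # A(omega, tau) over all FFT bins
    return A, lags

# Peak of |A| gives the (Doppler bin, delay) estimate:
# A, lags = cross_ambiguity(s1, s2, max_lag=20)
# w_bin, t_idx = np.unravel_index(np.argmax(np.abs(A)), A.shape)
```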
Stage 2: Estimating Geo-Location
TDOA/FDOA LOCATION

Centralized network of P receivers:
- "P-choose-2" pairs, giving "P-choose-2" TDOA measurements and "P-choose-2" FDOA measurements
- Warning: watch out for the correlation effect due to signal data in common across pairs.

[Figure: P platforms connected to a central node by data links.]
TDOA/FDOA LOCATION

Pair-wise network of P receivers:
- P/2 pairs, giving P/2 TDOA measurements and P/2 FDOA measurements
- Many ways to select the P/2 pairs
- Warning: not all pairings are equally good! (In the figure, the dashed pairs are better.)
TDOA/FDOA Measurement Model

Given N TDOA/FDOA measurement pairs with corresponding 2×2 covariance matrices:

$$(\hat\tau_1,\hat\nu_1),\,(\hat\tau_2,\hat\nu_2),\,\ldots,\,(\hat\tau_N,\hat\nu_N) \qquad \mathbf{C}_1,\,\mathbf{C}_2,\,\ldots,\,\mathbf{C}_N$$

(Assume a pair-wise network, so the TDOA/FDOA pairs are uncorrelated.)

Those are the TDOA/FDOA estimates; the true values are denoted $(\tau_1,\nu_1),\,(\tau_2,\nu_2),\,\ldots,\,(\tau_N,\nu_N)$ and stacked into a "signal" vector

$$\mathbf{s} = [s_1\ s_2\ \cdots\ s_{2N}]^T \quad\text{with}\quad s_{2n-1} = \tau_n,\ \ s_{2n} = \nu_n,\quad n = 1,2,\ldots,N$$
TDOA/FDOA Measurement Model (cont.)

Each of these measurements has an error associated with it, so $\mathbf{r} = \mathbf{s} + \boldsymbol\varepsilon$. Because the measurements were estimated using an ML estimator (with a sufficiently large number of signal samples), we know the error vector ε is a zero-mean Gaussian vector with covariance matrix

$$\mathbf{C} = \operatorname{diag}\{\mathbf{C}_1,\mathbf{C}_2,\ldots,\mathbf{C}_N\} = \begin{bmatrix}\mathbf{C}_1 & \mathbf{0} & \cdots & \mathbf{0}\\ \mathbf{0} & \mathbf{C}_2 & & \vdots\\ \vdots & & \ddots & \mathbf{0}\\ \mathbf{0} & \cdots & \mathbf{0} & \mathbf{C}_N\end{bmatrix}$$

(block diagonal; again assumes the TDOA/FDOA pairs are uncorrelated!)

The true TDOA/FDOA values depend on:
- Emitter parameters (position and transmit frequency): $\mathbf{x}_e = [x_e\ y_e\ z_e\ f_e]^T$
- The receivers' nav data (positions & velocities), the totality of which is called $\mathbf{x}_r$

To complete the model we need to know how $\mathbf{s}(\mathbf{x}_e;\mathbf{x}_r)$ depends on $\mathbf{x}_e$ and $\mathbf{x}_r$. Thus we need to find TDOA & FDOA as functions of $\mathbf{x}_e$ and $\mathbf{x}_r$.
TDOA/FDOA Measurement Model (cont.)

Here we'll simplify to the x-y plane; the extension is straightforward. Two receivers with $(x_1, y_1, V_{x1}, V_{y1})$ and $(x_2, y_2, V_{x2}, V_{y2})$; emitter at $(x_e, y_e)$. (Let $R_i$ be the range between receiver i and the emitter; c is the speed of light.)

TDOA:

$$s_1(x_e,y_e) = \tau_{12} = \frac{R_1 - R_2}{c} = \frac{1}{c}\left[\sqrt{(x_1-x_e)^2+(y_1-y_e)^2} - \sqrt{(x_2-x_e)^2+(y_2-y_e)^2}\right]$$

FDOA:

$$s_2(x_e,y_e,f_e) = \nu_{12} = \frac{f_e}{c}\frac{d}{dt}(R_1-R_2) = \frac{f_e}{c}\left[\frac{(x_1-x_e)V_{x1}+(y_1-y_e)V_{y1}}{\sqrt{(x_1-x_e)^2+(y_1-y_e)^2}} - \frac{(x_2-x_e)V_{x2}+(y_2-y_e)V_{y2}}{\sqrt{(x_2-x_e)^2+(y_2-y_e)^2}}\right]$$
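A direct transcription of these two formulas in Python (the receiver-tuple convention (x, y, Vx, Vy) is mine, chosen for this sketch):

```python
import numpy as np

C_LIGHT = 3.0e8  # speed of light, m/s

def tdoa_fdoa(xe, ye, fe, rx1, rx2):
    """TDOA (s) and FDOA (Hz) for an emitter at (xe, ye) with carrier fe,
    given receiver tuples rx = (x, y, Vx, Vy). Follows the x-y plane model above."""
    x1, y1, vx1, vy1 = rx1
    x2, y2, vx2, vy2 = rx2
    R1 = np.hypot(x1 - xe, y1 - ye)              # range: receiver 1 to emitter
    R2 = np.hypot(x2 - xe, y2 - ye)              # range: receiver 2 to emitter
    tau = (R1 - R2) / C_LIGHT                    # TDOA
    nu = (fe / C_LIGHT) * (
        ((x1 - xe) * vx1 + (y1 - ye) * vy1) / R1      # LOS-projected velocity, Rx 1
        - ((x2 - xe) * vx2 + (y2 - ye) * vy2) / R2)   # LOS-projected velocity, Rx 2
    return tau, nu
```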
CRLB for Geo-Location via TDOA/FDOA

Recall: for the general Gaussian data case the CRLB depends on a FIM with structure like this:

$$[\mathbf{J}(\boldsymbol\theta)]_{nm} = \left[\frac{\partial\boldsymbol\mu_x(\boldsymbol\theta)}{\partial\theta_n}\right]^T\mathbf{C}_x^{-1}(\boldsymbol\theta)\,\frac{\partial\boldsymbol\mu_x(\boldsymbol\theta)}{\partial\theta_m} + \frac{1}{2}\operatorname{tr}\left[\mathbf{C}_x^{-1}(\boldsymbol\theta)\frac{\partial\mathbf{C}_x(\boldsymbol\theta)}{\partial\theta_n}\,\mathbf{C}_x^{-1}(\boldsymbol\theta)\frac{\partial\mathbf{C}_x(\boldsymbol\theta)}{\partial\theta_m}\right]$$

(first term: variability of the mean w.r.t. the parameters; second term: variability of the covariance w.r.t. the parameters)

Here we have a deterministic "signal" plus Gaussian noise, so we only have the first term. Using the notation introduced here gives

$$\mathbf{C}_{CRLB}(\mathbf{x}_e) = \left[\frac{\partial\mathbf{s}^T(\mathbf{x}_e)}{\partial\mathbf{x}_e}\,\mathbf{C}^{-1}\,\frac{\partial\mathbf{s}(\mathbf{x}_e)}{\partial\mathbf{x}_e}\right]^{-1}$$

Defining the Jacobian $\mathbf{H} = \left[\dfrac{\partial\mathbf{s}}{\partial x_e}\ \ \dfrac{\partial\mathbf{s}}{\partial y_e}\ \ \dfrac{\partial\mathbf{s}}{\partial z_e}\ \ \dfrac{\partial\mathbf{s}}{\partial f_e}\right]$ gives

$$\mathbf{C}_{CRLB}(\mathbf{x}_e) = \left[\mathbf{H}^T\mathbf{C}^{-1}\mathbf{H}\right]^{-1}$$
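A sketch that builds H by finite differences and evaluates $[\mathbf{H}^T\mathbf{C}^{-1}\mathbf{H}]^{-1}$, reusing the tdoa_fdoa function above. The parameterization [xe, ye, fe] is an assumption matching the simplified x-y-plane model (no $z_e$ there):

```python
import numpy as np

def crlb_geo(xe_vec, rx_pairs, C):
    """CRLB = (H^T C^{-1} H)^{-1} with a finite-difference Jacobian.
    xe_vec = [xe, ye, fe]; rx_pairs is a list of (rx1, rx2) receiver tuples;
    C is the block-diagonal covariance of all stacked TDOA/FDOA measurements."""
    def s_of(p):
        out = []
        for rx1, rx2 in rx_pairs:                 # one (TDOA, FDOA) per pair
            out.extend(tdoa_fdoa(p[0], p[1], p[2], rx1, rx2))
        return np.array(out)
    eps = 1e-4
    s0 = s_of(xe_vec)
    H = np.column_stack([(s_of(xe_vec + eps * np.eye(3)[i]) - s0) / eps
                         for i in range(3)])      # Jacobian: one column per parameter
    Cinv = np.linalg.inv(C)
    return np.linalg.inv(H.T @ Cinv @ H)          # CRLB covariance matrix
```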
CRLB for Geo-Loc. via TDOA/FDOA (cont.)

[Figure: geometry and TDOA vs. FDOA trade-offs.]
Estimator for Geo-Location via TDOA/FDOA

Because we have used the ML estimator to get the TDOA/FDOA estimates, the ML's asymptotic properties tell us that we have Gaussian TDOA/FDOA measurements.

Because the TDOA/FDOA measurement model is nonlinear, it is unlikely that we can find a truly optimal estimate… so we again resort to the ML. For the ML of a nonlinear signal in Gaussian noise we generally have to proceed numerically. One way to do the numerical MLE is Newton-Raphson (vector version needed):

$$\hat{\boldsymbol\theta}_{k+1} = \hat{\boldsymbol\theta}_k - \left[\frac{\partial^2\ln p(\mathbf{x};\boldsymbol\theta)}{\partial\boldsymbol\theta\,\partial\boldsymbol\theta^T}\right]^{-1}\left.\frac{\partial\ln p(\mathbf{x};\boldsymbol\theta)}{\partial\boldsymbol\theta}\right|_{\boldsymbol\theta=\hat{\boldsymbol\theta}_k}$$
8.3 The Least-Squares (LS) Approach

All the previous methods we've studied required a probabilistic model for the data: we needed the PDF p(x;θ). The LS approach does not.

[Figure, similar to Fig. 8.1(a): a signal model s[n;θ] plus a model-error input gives the true signal s_true[n;θ]; adding measurement noise w[n] gives the data x[n] = s_true[n;θ] + w[n] = s[n;θ] + e[n], where e[n] lumps together model error and measurement error.]
Least-Squares Criterion

[Figure: the data x[n] minus the signal-model output s[n;θ̂] gives the error ε[n].]

Minimize the LS cost:

$$J(\boldsymbol\theta) = \sum_{n=0}^{N-1}\varepsilon^2[n] = \sum_{n=0}^{N-1}\left(x[n]-s[n;\boldsymbol\theta]\right)^2$$

Say you know x[10] was poor in quality compared to the other data. You'd want to de-emphasize its importance in the sum of squares, which leads to the weighted LS cost:

$$J(\boldsymbol\theta) = \sum_{n=0}^{N-1}w_n\left(x[n]-s[n;\boldsymbol\theta]\right)^2$$
8.4 Linear Least-Squares

A linear least-squares problem is one where the parameter observation model is linear: $\mathbf{s} = \mathbf{H}\boldsymbol\theta$, so $\mathbf{x} = \mathbf{H}\boldsymbol\theta + \mathbf{e}$, with x an N×1 data vector, θ a p×1 parameter vector (p = order of the model), and H a known N×p matrix. The LS cost becomes

$$J(\boldsymbol\theta) = \sum_{n=0}^{N-1}\left(x[n]-s[n;\boldsymbol\theta]\right)^2 = (\mathbf{x}-\mathbf{H}\boldsymbol\theta)^T(\mathbf{x}-\mathbf{H}\boldsymbol\theta)$$

Now, to minimize, first expand:

$$J(\boldsymbol\theta) = \mathbf{x}^T\mathbf{x} - \mathbf{x}^T\mathbf{H}\boldsymbol\theta - \boldsymbol\theta^T\mathbf{H}^T\mathbf{x} + \boldsymbol\theta^T\mathbf{H}^T\mathbf{H}\boldsymbol\theta = \mathbf{x}^T\mathbf{x} - 2\mathbf{x}^T\mathbf{H}\boldsymbol\theta + \boldsymbol\theta^T\mathbf{H}^T\mathbf{H}\boldsymbol\theta$$

(using that a scalar equals its transpose, so $\boldsymbol\theta^T\mathbf{H}^T\mathbf{x} = (\boldsymbol\theta^T\mathbf{H}^T\mathbf{x})^T = \mathbf{x}^T\mathbf{H}\boldsymbol\theta$)

Now setting $\partial J(\boldsymbol\theta)/\partial\boldsymbol\theta = \mathbf{0}$ gives $-2\mathbf{H}^T\mathbf{x} + 2\mathbf{H}^T\mathbf{H}\hat{\boldsymbol\theta} = \mathbf{0}$, called the "LS normal equations":

$$\mathbf{H}^T\mathbf{H}\hat{\boldsymbol\theta} = \mathbf{H}^T\mathbf{x} \qquad\Rightarrow\qquad \hat{\boldsymbol\theta}_{LS} = (\mathbf{H}^T\mathbf{H})^{-1}\mathbf{H}^T\mathbf{x}, \qquad \hat{\mathbf{s}}_{LS} = \mathbf{H}\hat{\boldsymbol\theta}_{LS} = \mathbf{H}(\mathbf{H}^T\mathbf{H})^{-1}\mathbf{H}^T\mathbf{x}$$
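In numpy this is a few lines; the line-fit model and numbers below are illustrative:

```python
import numpy as np

# Illustrative line fit s[n] = theta0 + theta1*n posed as x = H theta + e.
rng = np.random.default_rng(1)
N = 50
n = np.arange(N)
H = np.column_stack([np.ones(N), n])              # known N x p matrix, p = 2
x = 1.0 + 0.5 * n + 0.3 * rng.standard_normal(N)  # synthetic data (true theta = [1, 0.5])

theta_ls = np.linalg.solve(H.T @ H, H.T @ x)      # solve the normal equations
theta_ls2, *_ = np.linalg.lstsq(H, x, rcond=None) # numerically preferred route
```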
Comparing the Linear LSE to Other Estimates

Model                                          Estimate
x = Hθ + e  (no probability model needed)      θ̂_LS  = (HᵀH)⁻¹Hᵀx
x = Hθ + w  (PDF unknown, white)               θ̂_BLUE = (HᵀH)⁻¹Hᵀx
x = Hθ + w  (PDF Gaussian, white)              θ̂_ML  = (HᵀH)⁻¹Hᵀx
x = Hθ + w  (PDF Gaussian, white)              θ̂_MVU = (HᵀH)⁻¹Hᵀx

If you assume Gaussian and apply these estimators, BUT you are WRONG… you at least get the LSE!
The LS Cost for Linear LS

For the linear LS problem, what is the resulting LS cost for using $\hat{\boldsymbol\theta}_{LS} = (\mathbf{H}^T\mathbf{H})^{-1}\mathbf{H}^T\mathbf{x}$?

$$J_{min} = (\mathbf{x}-\mathbf{H}\hat{\boldsymbol\theta}_{LS})^T(\mathbf{x}-\mathbf{H}\hat{\boldsymbol\theta}_{LS}) = \left(\mathbf{x}-\mathbf{H}(\mathbf{H}^T\mathbf{H})^{-1}\mathbf{H}^T\mathbf{x}\right)^T\left(\mathbf{x}-\mathbf{H}(\mathbf{H}^T\mathbf{H})^{-1}\mathbf{H}^T\mathbf{x}\right)$$

Using properties of the transpose and factoring out the x's:

$$J_{min} = \mathbf{x}^T\left[\mathbf{I}-\mathbf{H}(\mathbf{H}^T\mathbf{H})^{-1}\mathbf{H}^T\right]\left[\mathbf{I}-\mathbf{H}(\mathbf{H}^T\mathbf{H})^{-1}\mathbf{H}^T\right]\mathbf{x}$$

Easily verified: $\mathbf{A} = \mathbf{I}-\mathbf{H}(\mathbf{H}^T\mathbf{H})^{-1}\mathbf{H}^T$ satisfies $\mathbf{A}\mathbf{A} = \mathbf{A}$. (Note: if AA = A, then A is called idempotent.) So

$$J_{min} = \mathbf{x}^T\left[\mathbf{I}-\mathbf{H}(\mathbf{H}^T\mathbf{H})^{-1}\mathbf{H}^T\right]\mathbf{x} = \mathbf{x}^T\mathbf{x} - \mathbf{x}^T\mathbf{H}(\mathbf{H}^T\mathbf{H})^{-1}\mathbf{H}^T\mathbf{x}$$

with $0 \le J_{min} \le \|\mathbf{x}\|^2$.
Weighted LS for Linear LS

Recall: de-emphasize bad samples' importance in the sum of squares:

$$J(\boldsymbol\theta) = \sum_{n=0}^{N-1}w_n\left(x[n]-s[n;\boldsymbol\theta]\right)^2$$

For the linear model this gives (with W the diagonal matrix of weights):

$$\hat{\boldsymbol\theta}_{WLS} = (\mathbf{H}^T\mathbf{W}\mathbf{H})^{-1}\mathbf{H}^T\mathbf{W}\mathbf{x} \qquad J_{min} = \mathbf{x}^T\left[\mathbf{W}-\mathbf{W}\mathbf{H}(\mathbf{H}^T\mathbf{W}\mathbf{H})^{-1}\mathbf{H}^T\mathbf{W}\right]\mathbf{x}$$

[Figure: θ lives in $\mathbb{R}^p$, the data in $\mathbb{R}^N$ with N > p; Range(H) ⊂ $\mathbb{R}^N$. The signal s lies in a subspace of $\mathbb{R}^N$, while x can lie anywhere in $\mathbb{R}^N$.]
LS Geometry Example: N = 3, p = 2

(Notation a bit different from the book.) Here x = s + e: the "noise" takes s out of Range(H) and into $\mathbb{R}^3$. The LS error is ε = x − ŝ, and ε ⊥ h_i.

[Figure: the columns of H lie in a plane, the "subspace" $S^2$ spanned by the columns of H ($S^p$ in general); $\hat{\mathbf{s}} = \theta_1\mathbf{h}_1 + \theta_2\mathbf{h}_2$ lies in that plane while e lifts x out of it.]
LS Orthogonality Principle

The LS error vector must be ⊥ to all columns of H:

$$\boldsymbol\varepsilon^T\mathbf{H} = \mathbf{0}^T \quad\text{or}\quad \mathbf{H}^T\boldsymbol\varepsilon = \mathbf{0}$$

Can use this property to derive the LS estimate:

$$\mathbf{H}^T\boldsymbol\varepsilon = \mathbf{0} \;\Rightarrow\; \mathbf{H}^T(\mathbf{x}-\mathbf{H}\boldsymbol\theta) = \mathbf{0} \;\Rightarrow\; \mathbf{H}^T\mathbf{H}\boldsymbol\theta = \mathbf{H}^T\mathbf{x} \;\Rightarrow\; \hat{\boldsymbol\theta}_{LS} = (\mathbf{H}^T\mathbf{H})^{-1}\mathbf{H}^T\mathbf{x}$$

Same answer as before, but no derivatives to worry about!

[Figure: $(\mathbf{H}^T\mathbf{H})^{-1}\mathbf{H}^T$ acts like an inverse from $\mathbb{R}^N$ back to $\mathbb{R}^p$; it is called the pseudo-inverse of H.]
LS Projection Viewpoint

From the $\mathbb{R}^3$ example earlier, we see that ŝ must lie "right below" x. From our earlier results we have:

$$\hat{\mathbf{s}} = \mathbf{H}\hat{\boldsymbol\theta}_{LS} = \underbrace{\mathbf{H}(\mathbf{H}^T\mathbf{H})^{-1}\mathbf{H}^T}_{\triangleq\,\mathbf{P}_H}\,\mathbf{x} = \mathbf{P}_H\,\mathbf{x}$$
Aside on Projections

If something is "on the floor", its projection onto the floor is itself:

$$\text{if } \mathbf{z}\in\operatorname{Range}(\mathbf{H}), \text{ then } \mathbf{P}_H\mathbf{z} = \mathbf{z}, \quad\text{where } \mathbf{P}_H = \mathbf{H}(\mathbf{H}^T\mathbf{H})^{-1}\mathbf{H}^T \text{ (easily verified)}$$
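Both properties are easy to check numerically, e.g.:

```python
import numpy as np

# Numerical check of the projection properties with a random tall H.
rng = np.random.default_rng(2)
H = rng.standard_normal((6, 2))                  # N = 6, p = 2
P = H @ np.linalg.inv(H.T @ H) @ H.T             # P_H = H (H^T H)^{-1} H^T
z = H @ np.array([1.5, -2.0])                    # z lies in Range(H)
assert np.allclose(P @ P, P)                     # idempotent: P_H P_H = P_H
assert np.allclose(P @ z, z)                     # P_H z = z on Range(H)
```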
What Happens w/ Orthonormal Columns of H

Recall the general linear LS solution $\hat{\boldsymbol\theta}_{LS} = (\mathbf{H}^T\mathbf{H})^{-1}\mathbf{H}^T\mathbf{x}$, where

$$\mathbf{H}^T\mathbf{H} = \begin{bmatrix}\langle\mathbf{h}_1,\mathbf{h}_1\rangle & \langle\mathbf{h}_1,\mathbf{h}_2\rangle & \cdots & \langle\mathbf{h}_1,\mathbf{h}_p\rangle\\ \langle\mathbf{h}_2,\mathbf{h}_1\rangle & \langle\mathbf{h}_2,\mathbf{h}_2\rangle & \cdots & \langle\mathbf{h}_2,\mathbf{h}_p\rangle\\ \vdots & \vdots & \ddots & \vdots\\ \langle\mathbf{h}_p,\mathbf{h}_1\rangle & \langle\mathbf{h}_p,\mathbf{h}_2\rangle & \cdots & \langle\mathbf{h}_p,\mathbf{h}_p\rangle\end{bmatrix}$$

With orthonormal columns this matrix of inner products is just I, so $\hat{\boldsymbol\theta}_{LS} = \mathbf{H}^T\mathbf{x}$.
Geometry with Orthonormal Columns of H

Re-write this LS solution component-wise as $\hat\theta_i = \mathbf{h}_i^T\mathbf{x}$, the inner product between the i-th column and the data vector. Then we have:

$$\hat{\mathbf{s}} = \mathbf{H}\hat{\boldsymbol\theta} = \sum_{i=1}^{p}\hat\theta_i\,\mathbf{h}_i = \sum_{i=1}^{p}\underbrace{(\mathbf{h}_i^T\mathbf{x})\,\mathbf{h}_i}_{\text{projection of }\mathbf{x}\text{ onto the }\mathbf{h}_i\text{ axis}}$$

When the columns of H are ⊥ we can first find the projection onto each 1-D subspace independently, then add these independently derived results:

$$\hat{\mathbf{s}} = (\mathbf{h}_1^T\mathbf{x})\,\mathbf{h}_1 + (\mathbf{h}_2^T\mathbf{x})\,\mathbf{h}_2 \qquad \text{Nice!}$$
8.6 Order-Recursive LS

Motivate this idea with curve fitting. Given data s[0], s[1], …, s[N−1] at n = 0, 1, 2, …, N−1, fit models of increasing order p (p = # of parameters in the model).

[Figure: the same data fit with models of order p = 1, 2, 3, 4.]
Choosing the Best Model Order

Q: Should you pick the order p that gives the smallest J_min?
A: NO!!!! Fact: J_min(p) is monotonically non-increasing as the order p increases.

[Figure: $\tilde{\mathbf{h}}_3$, the orthogonalized version of $\mathbf{h}_3$, shown relative to $S^2$, the 2-D space spanned by $\mathbf{h}_1$ & $\mathbf{h}_2$ ( = Range($\mathbf{H}_2$)).]
The update from order k to order k+1 adds the component of x along the orthogonalized new column $\tilde{\mathbf{h}}_{k+1} = \mathbf{P}_k^\perp\mathbf{h}_{k+1}$:

$$\Delta\hat{\mathbf{s}}_{k+1} = \left\langle\mathbf{x},\frac{\tilde{\mathbf{h}}_{k+1}}{\|\tilde{\mathbf{h}}_{k+1}\|}\right\rangle\frac{\tilde{\mathbf{h}}_{k+1}}{\|\tilde{\mathbf{h}}_{k+1}\|} = \frac{\mathbf{x}^T\tilde{\mathbf{h}}_{k+1}}{\|\tilde{\mathbf{h}}_{k+1}\|^2}\,\tilde{\mathbf{h}}_{k+1} = \underbrace{\frac{\mathbf{x}^T\mathbf{P}_k^\perp\mathbf{h}_{k+1}}{\|\mathbf{P}_k^\perp\mathbf{h}_{k+1}\|^2}}_{\text{scalar!}}\,\mathbf{P}_k^\perp\mathbf{h}_{k+1}$$

so that $\hat{\mathbf{s}}_{k+1} = \mathbf{H}_k\hat{\boldsymbol\theta}_k + \Delta\hat{\mathbf{s}}_{k+1}$. Now we have:

$$\hat{\mathbf{s}}_{k+1} = \mathbf{H}_k\hat{\boldsymbol\theta}_k + \frac{\mathbf{x}^T\mathbf{P}_k^\perp\mathbf{h}_{k+1}}{\|\mathbf{P}_k^\perp\mathbf{h}_{k+1}\|^2}\,\mathbf{P}_k^\perp\mathbf{h}_{k+1}$$

The fraction is a scalar, so it can be moved and transposed; writing out $\mathbf{P}_k^\perp = \mathbf{I}-\mathbf{P}_k$ and writing out $\|\cdot\|^2$ using the fact that $\mathbf{P}_k^\perp$ is idempotent:

$$\hat{\mathbf{s}}_{k+1} = \mathbf{H}_k\hat{\boldsymbol\theta}_k + (\mathbf{I}-\mathbf{P}_k)\,\mathbf{h}_{k+1}\,\underbrace{\frac{\mathbf{h}_{k+1}^T\mathbf{P}_k^\perp\mathbf{x}}{\mathbf{h}_{k+1}^T\mathbf{P}_k^\perp\mathbf{h}_{k+1}}}_{\text{scalar… define as }b}$$

Clearly this identifies $\hat{\boldsymbol\theta}_{k+1}$.
Order-Recursive LS Solution

$$\hat{\boldsymbol\theta}_{k+1} = \begin{bmatrix}\hat{\boldsymbol\theta}_k - (\mathbf{H}_k^T\mathbf{H}_k)^{-1}\mathbf{H}_k^T\mathbf{h}_{k+1}\,\dfrac{\mathbf{h}_{k+1}^T\mathbf{P}_k^\perp\mathbf{x}}{\mathbf{h}_{k+1}^T\mathbf{P}_k^\perp\mathbf{h}_{k+1}}\\[2ex] \dfrac{\mathbf{h}_{k+1}^T\mathbf{P}_k^\perp\mathbf{x}}{\mathbf{h}_{k+1}^T\mathbf{P}_k^\perp\mathbf{h}_{k+1}}\end{bmatrix}$$
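A direct transcription of this update as a function (the names are illustrative, not the book's):

```python
import numpy as np

def order_update(theta_k, Hk, h_new, x):
    """One order-recursive LS update, following the stacked formula above."""
    HtH_inv = np.linalg.inv(Hk.T @ Hk)
    Pk_perp = np.eye(len(x)) - Hk @ HtH_inv @ Hk.T          # I - P_k
    b = (h_new @ Pk_perp @ x) / (h_new @ Pk_perp @ h_new)   # new coefficient
    top = theta_k - HtH_inv @ (Hk.T @ h_new) * b            # corrected old block
    return np.append(top, b)                                # theta_{k+1}
```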
8.7 Sequential LS

Consider estimating a DC level sequentially: when a new sample x[N] arrives, the sample mean updates as

$$\hat A_N = \frac{1}{N+1}\sum_{n=0}^{N}x[n] = \frac{1}{N+1}\left[N\cdot\underbrace{\frac{1}{N}\sum_{n=0}^{N-1}x[n]}_{\hat A_{N-1}} + x[N]\right] = \underbrace{\frac{N}{N+1}}_{=\,1-\frac{1}{N+1}}\hat A_{N-1} + \frac{1}{N+1}\,x[N]$$

$$\hat A_N = \underbrace{\hat A_{N-1}}_{\text{old estimate}} + \frac{1}{N+1}\underbrace{\big(x[N] - \underbrace{\hat A_{N-1}}_{\text{prediction of the new data}}\big)}_{\text{prediction error}}$$
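In code, the recursion is one line per new sample:

```python
import numpy as np

def sequential_mean(x):
    """Recursive sample mean: A_N = A_{N-1} + (x[N] - A_{N-1})/(N+1)."""
    A = 0.0
    for N, xN in enumerate(x):
        A = A + (xN - A) / (N + 1)   # old estimate + gain * prediction error
    return A

assert np.isclose(sequential_mean([1.0, 2.0, 3.0]), 2.0)
```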
Weighted Sequential LS for DC-Level Case

This is an even better illustration. Assumed model: x[n] = A + w[n], where w[n] has unknown PDF but known time-dependent variance $\operatorname{var}\{w[n]\} = \sigma_n^2$. Standard WLS gives:

$$\hat A_{N-1} = \frac{\displaystyle\sum_{n=0}^{N-1}\frac{x[n]}{\sigma_n^2}}{\displaystyle\sum_{n=0}^{N-1}\frac{1}{\sigma_n^2}}$$
Exploring the Gain Term

We know that $\operatorname{var}(\hat A_{N-1}) = \dfrac{1}{\sum_{n=0}^{N-1}1/\sigma_n^2}$, and using it in the gain $k_N$ we get

$$k_N = \frac{\overbrace{\operatorname{var}(\hat A_{N-1})}^{\text{"poorness" of current estimate}}}{\operatorname{var}(\hat A_{N-1}) + \underbrace{\sigma_N^2}_{\text{"poorness" (variance) of the new data}}}$$

Note: $0 \le k_N \le 1$. If $\operatorname{var}(\hat A_{N-1}) \ll \sigma_N^2$ (good estimate, bad data), then $k_N \approx 0$; in the opposite case $k_N \approx 1$.

For the general sequential LS setup: $\hat{\boldsymbol\theta}_{n-1}$ is the LS estimate using the data $\mathbf{x}_{n-1}$, and $\boldsymbol\Sigma_{n-1} \triangleq \operatorname{cov}\{\hat{\boldsymbol\theta}_{n-1}\}$ is the quality measure of that estimate.
Sequential LS Block Diagram

[Block diagram: from $\boldsymbol\Sigma_{n-1}$, $\mathbf{h}_n$, $\sigma_n^2$ compute the gain; form the predicted observation $\mathbf{h}_n^T\hat{\boldsymbol\theta}_{n-1}$ from the previous estimate (held in a one-sample delay $z^{-1}$); combine the new observation with the gain-weighted prediction error to get the updated estimate.]
8.8 Constrained LS
Why Constrain? Because sometimes we know (or believe!)
certain values are not allowed for θ
For example: In emitter location you may know that the emitter’s
range can’t exceed the “radio horizon”
You may also know that the emitter is on the left side of the
aircraft (because you got a strong signal from the left-side
antennas and a weak one from the right-side antennas)
Constrained LS Problem Statement

Say that $S_c$ is the set of allowable θ values (due to constraints). Then we seek $\hat{\boldsymbol\theta}_{CLS}\in S_c$ such that

$$\left\|\mathbf{x}-\mathbf{H}\hat{\boldsymbol\theta}_{CLS}\right\|^2 = \min_{\boldsymbol\theta\in S_c}\left\|\mathbf{x}-\mathbf{H}\boldsymbol\theta\right\|^2$$

[Figure: contours of $(\mathbf{x}-\mathbf{H}\boldsymbol\theta)^T(\mathbf{x}-\mathbf{H}\boldsymbol\theta)$ with a 2-D linear equality constraint line; the constrained minimum lies on the line, away from the unconstrained minimum.]
Constrained Optimization: Lagrange Multiplier

[Figure: contours of $f(x_1,x_2)$ together with the constraint $g(x_1,x_2) = C$, written as $h(x_1,x_2) = g(x_1,x_2) - C = 0$; for a linear constraint the gradient is

$$\nabla h(x_1,x_2) = \begin{bmatrix}\partial h/\partial x_1\\ \partial h/\partial x_2\end{bmatrix} = \begin{bmatrix}a\\ b\end{bmatrix}\ .]$$
For a linear equality constraint $\mathbf{A}\boldsymbol\theta = \mathbf{b}$:

1. Set the gradient of the Lagrangian to zero:

$$-2\mathbf{H}^T\mathbf{x} + 2\mathbf{H}^T\mathbf{H}\boldsymbol\theta + \mathbf{A}^T\boldsymbol\lambda = \mathbf{0} \;\Rightarrow\; \hat{\boldsymbol\theta}_c(\boldsymbol\lambda) = \underbrace{(\mathbf{H}^T\mathbf{H})^{-1}\mathbf{H}^T\mathbf{x}}_{\hat{\boldsymbol\theta}_{uc}\ \text{(unconstrained estimate)}} - \tfrac{1}{2}(\mathbf{H}^T\mathbf{H})^{-1}\mathbf{A}^T\boldsymbol\lambda$$

2. Enforce the constraint $\mathbf{A}\hat{\boldsymbol\theta}_c(\boldsymbol\lambda) = \mathbf{b}$ to find λ:

$$\boldsymbol\lambda_c = 2\left[\mathbf{A}(\mathbf{H}^T\mathbf{H})^{-1}\mathbf{A}^T\right]^{-1}\left(\mathbf{A}\hat{\boldsymbol\theta}_{uc}-\mathbf{b}\right)$$

3. Plug in to get the constrained solution $\hat{\boldsymbol\theta}_c = \hat{\boldsymbol\theta}_c(\boldsymbol\lambda_c)$:

$$\hat{\boldsymbol\theta}_c = \hat{\boldsymbol\theta}_{uc} - \underbrace{(\mathbf{H}^T\mathbf{H})^{-1}\mathbf{A}^T\left[\mathbf{A}(\mathbf{H}^T\mathbf{H})^{-1}\mathbf{A}^T\right]^{-1}}_{\text{"correction term"}}\underbrace{\left(\mathbf{A}\hat{\boldsymbol\theta}_{uc}-\mathbf{b}\right)}_{\text{amount of constraint deviation}}$$
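A sketch of this three-step recipe as one function (names are mine):

```python
import numpy as np

def constrained_ls(H, x, A, b):
    """Equality-constrained LS: theta_c = theta_uc - correction * deviation."""
    HtH_inv = np.linalg.inv(H.T @ H)
    theta_uc = HtH_inv @ H.T @ x                           # 1) unconstrained LS
    K = HtH_inv @ A.T @ np.linalg.inv(A @ HtH_inv @ A.T)   # 2) correction term
    return theta_uc - K @ (A @ theta_uc - b)               # 3) enforce A theta = b
```

One can check that the output satisfies the constraint exactly: A @ constrained_ls(H, x, A, b) equals b up to round-off.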
Geometry of Constrained Linear LS

The above result can be interpreted geometrically:

[Figure: the unconstrained projection $\hat{\mathbf{s}}_{uc}$ of s, and the constrained solution $\hat{\mathbf{s}}_c$, which lies on the constraint line.]
8.9 Nonlinear LS

One useful trick is separability. Partition $\boldsymbol\theta = [\underbrace{A_1\ A_2\ A_3}_{\boldsymbol\beta^T}\ \underbrace{r}_{\alpha}]^T$; then we can write $\mathbf{s}(\boldsymbol\theta) = \mathbf{H}(r)\,\boldsymbol\beta$ with

$$\mathbf{H}(r) = \begin{bmatrix}1 & 1 & 1\\ r & r^2 & r^3\\ \vdots & \vdots & \vdots\\ r^{N-1} & r^{2(N-1)} & r^{3(N-1)}\end{bmatrix}$$

For each fixed r the linear part has the closed-form solution $\hat{\boldsymbol\beta}(r) = [\mathbf{H}^T(r)\mathbf{H}(r)]^{-1}\mathbf{H}^T(r)\,\mathbf{x}$. Then we need to minimize

$$J(r) = \left[\mathbf{x}-\mathbf{H}(r)\hat{\boldsymbol\beta}(r)\right]^T\left[\mathbf{x}-\mathbf{H}(r)\hat{\boldsymbol\beta}(r)\right]$$

which depends on only one variable, so one might conceivably just compute it on a grid and find the minimum.
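A sketch of that 1-D grid search (assuming, say, 0 < r < 1 so the powers are well behaved):

```python
import numpy as np

def separable_ls(x, r_grid):
    """For each candidate r, solve the linear part in closed form,
    then pick the r that minimizes the residual J(r)."""
    N = len(x)
    n = np.arange(N)[:, None]
    best = (np.inf, None, None)
    for r in r_grid:
        Hr = r ** (n * np.array([1, 2, 3]))            # columns r^n, r^{2n}, r^{3n}
        beta, *_ = np.linalg.lstsq(Hr, x, rcond=None)  # beta_hat(r), closed form
        J = np.sum((x - Hr @ beta) ** 2)               # J(r)
        if J < best[0]:
            best = (J, r, beta)
    return best                                        # (J_min, r_hat, beta_hat)
```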
In general, the nonlinear LS cost is $J(\boldsymbol\theta) = \sum_{i=0}^{N-1}(x[i]-s_\theta[i])^2$. Taking the partials gives:

$$\frac{\partial J(\boldsymbol\theta)}{\partial\theta_j} = -2\sum_{i=0}^{N-1}\underbrace{\left(x[i]-s_\theta[i]\right)}_{\triangleq\, r_i}\,\underbrace{\frac{\partial s_\theta[i]}{\partial\theta_j}}_{\triangleq\, h_{ij}}$$

(the factor of −2 can be ignored when setting this to zero… why?)

Now set to zero: $\sum_{i=0}^{N-1}r_i\,h_{ij} = 0$ for $j = 1,\ldots,p$, i.e.

$$\mathbf{g}(\boldsymbol\theta) = \mathbf{H}_\theta^T\,\mathbf{r}_\theta = \mathbf{0}$$

a matrix-times-vector equation that depends nonlinearly on θ, with

$$\mathbf{H}_\theta = \begin{bmatrix}\frac{\partial s_\theta[0]}{\partial\theta_1} & \frac{\partial s_\theta[0]}{\partial\theta_2} & \cdots & \frac{\partial s_\theta[0]}{\partial\theta_p}\\ \frac{\partial s_\theta[1]}{\partial\theta_1} & \frac{\partial s_\theta[1]}{\partial\theta_2} & \cdots & \frac{\partial s_\theta[1]}{\partial\theta_p}\\ \vdots & \vdots & \ddots & \vdots\\ \frac{\partial s_\theta[N-1]}{\partial\theta_1} & \frac{\partial s_\theta[N-1]}{\partial\theta_2} & \cdots & \frac{\partial s_\theta[N-1]}{\partial\theta_p}\end{bmatrix} \qquad \mathbf{r}_\theta = \begin{bmatrix}x[0]-s_\theta[0]\\ x[1]-s_\theta[1]\\ \vdots\\ x[N-1]-s_\theta[N-1]\end{bmatrix}$$

Then the equation to solve is:

$$\mathbf{g}(\boldsymbol\theta) = \mathbf{H}_\theta^T\mathbf{r}_\theta = \sum_{n=0}^{N-1}r_\theta[n]\,\mathbf{h}_n(\boldsymbol\theta) = \mathbf{0}$$

where $\mathbf{h}_n(\boldsymbol\theta)$ is the n-th row of $\mathbf{H}_\theta$ written as a column.
For Newton-Raphson we linearize g(θ) around our current estimate and iterate:

$$\hat{\boldsymbol\theta}_{k+1} = \hat{\boldsymbol\theta}_k - \left.\left[\frac{\partial\mathbf{g}(\boldsymbol\theta)}{\partial\boldsymbol\theta}\right]^{-1}\mathbf{g}(\boldsymbol\theta)\right|_{\boldsymbol\theta=\hat{\boldsymbol\theta}_k} = \hat{\boldsymbol\theta}_k - \left.\left[\frac{\partial\,\mathbf{H}_\theta^T\mathbf{r}_\theta}{\partial\boldsymbol\theta}\right]^{-1}\mathbf{H}_\theta^T\mathbf{r}_\theta\right|_{\boldsymbol\theta=\hat{\boldsymbol\theta}_k}$$

We need the derivative in brackets. By the product rule:

$$\frac{\partial\,\mathbf{H}_\theta^T\mathbf{r}_\theta}{\partial\boldsymbol\theta} = \frac{\partial}{\partial\boldsymbol\theta}\sum_{n=0}^{N-1}\mathbf{h}_n(\boldsymbol\theta)\,r_\theta[n] = \sum_{n=0}^{N-1}\underbrace{\frac{\partial\mathbf{h}_n(\boldsymbol\theta)}{\partial\boldsymbol\theta}}_{\triangleq\,\mathbf{G}_n(\boldsymbol\theta)}r_\theta[n] + \underbrace{\sum_{n=0}^{N-1}\mathbf{h}_n(\boldsymbol\theta)\frac{\partial r_\theta[n]}{\partial\boldsymbol\theta}}_{=\,-\mathbf{H}_\theta^T\mathbf{H}_\theta}$$

where

$$[\mathbf{G}_n(\boldsymbol\theta)]_{ij} = \frac{\partial^2 s_\theta[n]}{\partial\theta_i\,\partial\theta_j},\quad i,j = 1,2,\ldots,p \qquad\text{and}\qquad \frac{\partial r_\theta[n]}{\partial\boldsymbol\theta} = \frac{\partial\left(x[n]-s_\theta[n]\right)}{\partial\boldsymbol\theta} = -\frac{\partial s_\theta[n]}{\partial\boldsymbol\theta}$$

So:

$$\frac{\partial\,\mathbf{H}_\theta^T\mathbf{r}_\theta}{\partial\boldsymbol\theta} = \sum_{n=0}^{N-1}\mathbf{G}_n(\boldsymbol\theta)\left(x[n]-s_\theta[n]\right) - \mathbf{H}_\theta^T\mathbf{H}_\theta$$
So the Newton-Raphson method becomes:

$$\hat{\boldsymbol\theta}_{k+1} = \hat{\boldsymbol\theta}_k - \left.\left[\frac{\partial\,\mathbf{H}_\theta^T\mathbf{r}_\theta}{\partial\boldsymbol\theta}\right]^{-1}\mathbf{H}_\theta^T\mathbf{r}_\theta\right|_{\boldsymbol\theta=\hat{\boldsymbol\theta}_k} = \hat{\boldsymbol\theta}_k + \left[\mathbf{H}_{\hat\theta_k}^T\mathbf{H}_{\hat\theta_k} - \sum_{n=0}^{N-1}\mathbf{G}_n(\hat{\boldsymbol\theta}_k)\left(x[n]-s_{\hat\theta_k}[n]\right)\right]^{-1}\mathbf{H}_{\hat\theta_k}^T\left(\mathbf{x}-\mathbf{s}_{\hat\theta_k}\right)$$

The Gauss-Newton method instead linearizes the signal model around the current estimate, $\mathbf{s}_\theta \approx \mathbf{s}_{\hat\theta_k} + \mathbf{H}_{\hat\theta_k}(\boldsymbol\theta-\hat{\boldsymbol\theta}_k)$, so the cost becomes

$$J(\boldsymbol\theta) \approx \left[\mathbf{x}-\left\{\mathbf{s}_{\hat\theta_k}+\mathbf{H}_{\hat\theta_k}(\boldsymbol\theta-\hat{\boldsymbol\theta}_k)\right\}\right]^T\left[\mathbf{x}-\left\{\mathbf{s}_{\hat\theta_k}+\mathbf{H}_{\hat\theta_k}(\boldsymbol\theta-\hat{\boldsymbol\theta}_k)\right\}\right] = \left[\underbrace{\mathbf{x}-\mathbf{s}_{\hat\theta_k}+\mathbf{H}_{\hat\theta_k}\hat{\boldsymbol\theta}_k}_{\triangleq\,\mathbf{y}\ \text{(all known things)}}-\mathbf{H}_{\hat\theta_k}\boldsymbol\theta\right]^T\left[\mathbf{y}-\mathbf{H}_{\hat\theta_k}\boldsymbol\theta\right]$$
This gives a form for the LS cost that looks like a linear problem!

$$J(\boldsymbol\theta) = \left[\mathbf{y}-\mathbf{H}_{\hat\theta_k}\boldsymbol\theta\right]^T\left[\mathbf{y}-\mathbf{H}_{\hat\theta_k}\boldsymbol\theta\right]$$

so the linear LS solution applies:

$$\hat{\boldsymbol\theta}_{k+1} = \left[\mathbf{H}_{\hat\theta_k}^T\mathbf{H}_{\hat\theta_k}\right]^{-1}\mathbf{H}_{\hat\theta_k}^T\left(\mathbf{x}-\mathbf{s}_{\hat\theta_k}+\mathbf{H}_{\hat\theta_k}\hat{\boldsymbol\theta}_k\right) = \underbrace{\left[\mathbf{H}_{\hat\theta_k}^T\mathbf{H}_{\hat\theta_k}\right]^{-1}\mathbf{H}_{\hat\theta_k}^T\mathbf{H}_{\hat\theta_k}}_{=\,\mathbf{I}}\hat{\boldsymbol\theta}_k + \left[\mathbf{H}_{\hat\theta_k}^T\mathbf{H}_{\hat\theta_k}\right]^{-1}\mathbf{H}_{\hat\theta_k}^T\left(\mathbf{x}-\mathbf{s}_{\hat\theta_k}\right)$$

Gauss-Newton LS iteration:

$$\hat{\boldsymbol\theta}_{k+1} = \hat{\boldsymbol\theta}_k + \left[\mathbf{H}_{\hat\theta_k}^T\mathbf{H}_{\hat\theta_k}\right]^{-1}\mathbf{H}_{\hat\theta_k}^T\left(\mathbf{x}-\mathbf{s}_{\hat\theta_k}\right)$$

Gauss-Newton LS iteration steps:
1. Start with an initial estimate.
2. Iterate the above equation until the change is "small".
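A sketch of the iteration for a hypothetical one-parameter model $s[n;\theta] = e^{-\theta n}$ (this example model is mine, chosen only to make the steps concrete):

```python
import numpy as np

def gauss_newton(x, theta0, n_iter=20):
    """Gauss-Newton for the illustrative model s[n; theta] = exp(-theta*n)."""
    n = np.arange(len(x))
    theta = theta0
    for _ in range(n_iter):
        s = np.exp(-theta * n)                            # s_{theta_k}
        H = (-n * s)[:, None]                             # Jacobian ds/dtheta, N x 1
        step, *_ = np.linalg.lstsq(H, x - s, rcond=None)  # (H^T H)^{-1} H^T (x - s)
        theta = theta + step[0]
    return theta
```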
Newton-Raphson vs. Gauss-Newton

How do these two methods compare?

G-N:
$$\hat{\boldsymbol\theta}_{k+1} = \hat{\boldsymbol\theta}_k + \left[\mathbf{H}_{\hat\theta_k}^T\mathbf{H}_{\hat\theta_k}\right]^{-1}\mathbf{H}_{\hat\theta_k}^T\left(\mathbf{x}-\mathbf{s}_{\hat\theta_k}\right)$$

N-R:
$$\hat{\boldsymbol\theta}_{k+1} = \hat{\boldsymbol\theta}_k + \left[\mathbf{H}_{\hat\theta_k}^T\mathbf{H}_{\hat\theta_k} - \sum_{n=0}^{N-1}\mathbf{G}_n(\hat{\boldsymbol\theta}_k)\left(x[n]-s_{\hat\theta_k}[n]\right)\right]^{-1}\mathbf{H}_{\hat\theta_k}^T\left(\mathbf{x}-\mathbf{s}_{\hat\theta_k}\right)$$

The only difference is the second-derivative term $\sum_n\mathbf{G}_n(\cdot)$ inside the inverse; Gauss-Newton drops it.
8.10 Signal Processing Examples of LS

We'll briefly look at two examples from the book. Book examples:
1. Digital filter design
2. AR parameter estimation for the ARMA model
3. Adaptive noise cancellation
4. Phase-locked loop (used in phase-coherent demodulation)

The two examples we will cover highlight the flexibility of the LS viewpoint! Then (in separate note files) we'll look in detail at two emitter location examples not in the book.
Ex. 8.11 Filter Design by Prony's LS Method

The problem:
- You have some desired impulse response h_d[n]
- Find a rational transfer function with impulse response h[n] ≈ h_d[n]
Ex. 8.13 Adaptive Noise Cancellation (done a bit differently from the book)

[Block diagram: the primary input x[n] = d[n] + i[n] (desired signal + interference); a reference $\tilde i[n]$ drives an adaptive FIR filter whose output $\hat i[n]$ is subtracted, giving $\hat d[n]$, the estimate of the desired signal with "cancelled" interference.]

1. Fetal monitoring: the mother's heartbeat sensed via the chest provides the reference; the adaptive filter has to mimic the TF of the chest-to-stomach propagation.
2. Noise Canceling Headphones

[Block diagram: the ear receives the music signal m[n] plus ambient noise i[n]; a reference $\tilde i[n]$ drives an adaptive FIR filter producing $\hat i[n]$, so the ear hears $m[n] + i[n] - \hat i[n]$ and the noise terms cancel.]
3. Bistatic Radar System

[Block diagram: the receiver collects x[n] = t[n] + d_t[n], the target return t[n] (desired) plus direct-path transmit signal d_t[n] (interference); the directly received d[n] drives an adaptive FIR filter producing $\hat d_t[n]$; subtracting gives $\hat t[n]$, which feeds delay/Doppler radar processing.]
LS and Adaptive Noise Cancellation

Goal: adjust the filter coefficients to cancel the interference. There are many signal processing approaches to this problem; we'll look at it from a LS point of view: adjust the filter coefficients to minimize $J = \sum_n\hat d^2[n]$. Because i[n] is uncorrelated with d[n], minimizing J is essentially the same as driving the interference term to zero.

To let the filter adapt over time, use an exponentially weighted running cost:

$$J[n] = \sum_{k=0}^{n}\lambda^{n-k}\left[x[k]-\hat i[k]\right]^2 = \sum_{k=0}^{n}\lambda^{n-k}\left[x[k]-\sum_{l=0}^{p-1}h_n[l]\,\tilde i[k-l]\right]^2$$

where λ is the forgetting factor (0 < λ < 1); a small λ quickly "down-weights" the past errors.
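A deliberately direct (and inefficient) sketch that re-solves the weighted LS at every step, purely to mirror the cost J[n] above; an RLS-style recursion would be the efficient implementation:

```python
import numpy as np

def ewls_cancel(x, i_ref, p=8, lam=0.99):
    """Exponentially weighted LS noise canceller: at each n, fit the p FIR taps
    h_n to the reference i_ref by minimizing J[n], then output d_hat[n]."""
    n_samp = len(x)
    h = np.zeros(p)
    d_hat = np.zeros(n_samp)
    for n in range(p, n_samp):
        ks = np.arange(p, n + 1)
        T = np.stack([i_ref[k - np.arange(p)] for k in ks])  # rows [i[k],...,i[k-p+1]]
        w = np.sqrt(lam ** (n - ks))                         # sqrt of exp. weights
        h, *_ = np.linalg.lstsq(w[:, None] * T, w * x[p:n + 1], rcond=None)
        d_hat[n] = x[n] - T[-1] @ h                          # cancelled output
    return d_hat, h
```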