Abstract
In this book, the extended Kalman filter (EKF) has been used as the standard technique for performing recursive nonlinear estimation. The EKF algorithm, however, provides only an approximation to optimal nonlinear estimation. In this chapter, we point out the underlying assumptions and flaws in the EKF, and present an alternative filter with performance superior to that of the EKF. This algorithm, referred to as the unscented Kalman filter (UKF), was first proposed by Julier et al. [24, 22, 23], and further developed by Wan and van der Merwe [54, 53, 49, 50].
The basic difference between the EKF and UKF stems from the manner in which Gaussian random variables (GRVs) are represented for propagation through the system dynamics. In the EKF, the state distribution is approximated by a GRV, which is then propagated analytically through the first-order linearization of the nonlinear system. This can introduce large errors in the true posterior mean and covariance of the transformed GRV, which may lead to sub-optimal performance and sometimes divergence of the filter. The UKF addresses this problem by using a deterministic sampling approach. The state distribution is again approximated by a GRV, but is now represented using a minimal set of carefully chosen sample points. These sample points completely capture the true mean and covariance of the GRV, and, when propagated through the true nonlinear system, capture the posterior mean and covariance accurately to the 2nd order (Taylor series expansion) for any nonlinearity. The EKF, in contrast, only achieves first-order accuracy. No explicit Jacobian or Hessian calculations are necessary for the UKF. Remarkably, the computational complexity of the UKF is of the same order as that of the EKF.
Julier and Uhlmann demonstrated the substantial performance gains of the UKF in the context of state estimation for nonlinear control. A number of theoretical results were also derived. This chapter reviews this work, and presents extensions to a broader class of nonlinear estimation problems, including nonlinear system identification, training of neural networks, and dual estimation problems.
Additional material includes the development of an unscented Kalman smoother (UKS), specification of efficient recursive square-root implementations, and a novel use of the UKF to improve particle filters [49].
7.1. INTRODUCTION
Figure 7.1: Block diagram of the discrete-time nonlinear dynamic system: the input u_k and state x_k drive the process model F to produce x_{k+1}, while the observation model H maps x_k to the output y_k; v_k is the process noise and n_k the measurement noise.
The process noise driving the dynamic system is v_k, and the observation noise is given by n_k. Note that we are not assuming additivity of the noise sources. The system dynamic models F and H are assumed known. A simple block diagram of this system is shown in Figure 7.1. In state estimation, the EKF is the standard method of choice to achieve a recursive (approximate) maximum-likelihood estimate of the state x_k. For completeness we will review the EKF and its underlying assumptions in Section 7.2, to help motivate the presentation of the UKF for state estimation in Section 7.3.
Parameter Estimation
Parameter estimation, sometimes referred to as system identification or machine learning, involves determining a nonlinear mapping

\[ y_k = G(x_k, w), \tag{7.3} \]

where x_k is the input, y_k is the output, and the nonlinear map G(·) is parameterized by the vector w. The nonlinear map, for example, may be a feedforward or recurrent neural network (w are the weights), with numerous applications in regression, classification, and dynamic modeling. Learning corresponds to estimating the parameters w. Typically, a training set is provided with sample pairs consisting of known inputs and desired outputs, {x_k, d_k}. The error of the machine is defined as e_k = d_k − G(x_k, w), and the goal of learning involves solving for the parameters w in order to minimize the expectation of some given function of the error.
While a number of optimization approaches exist (e.g., gradient descent using backpropagation), the EKF may be used to estimate the parameters by writing a new state-space representation,

\[ w_{k+1} = w_k + r_k \tag{7.4} \]
\[ d_k = G(x_k, w_k) + e_k, \tag{7.5} \]

where the parameters w_k correspond to a stationary process with identity state-transition matrix, driven by process noise r_k (the choice of variance determines convergence and tracking performance, and will be discussed in further detail in Section 7.4). The output d_k corresponds to a nonlinear observation on w_k. The EKF can then be applied directly as an efficient "second-order" technique for learning the parameters. The use of the EKF for training neural networks has been developed by Singhal and Wu [47] and Puskorius and Feldkamp [41], and is covered in Chapter 2 of this text. The use of the UKF in this role is developed in Section 7.4.
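To make the state-space formulation of Equations 7.4 and 7.5 concrete, the following minimal Python sketch runs an EKF over the random-walk parameter model; the two-parameter toy model G, the noise settings, and the finite-difference Jacobian are illustrative assumptions, not the chapter's code.

```python
import numpy as np

def G(x, w):
    # toy nonlinear "network": scalar output, two parameters (illustrative)
    return w[0] * np.tanh(w[1] * x)

def jacobian(x, w, eps=1e-6):
    # numerical Jacobian of G with respect to the parameters w
    J = np.zeros((1, w.size))
    for i in range(w.size):
        dw = np.zeros_like(w); dw[i] = eps
        J[0, i] = (G(x, w + dw) - G(x, w - dw)) / (2 * eps)
    return J

rng = np.random.default_rng(0)
w_true = np.array([2.0, 0.7])
w_hat, P = np.zeros(2), np.eye(2)            # initial estimate and covariance
Rr, Re = 1e-4 * np.eye(2), 0.01              # process-noise cov. and error variance

for k in range(500):
    x = rng.uniform(-3, 3)
    d = G(x, w_true) + rng.normal(0, np.sqrt(Re))  # noisy desired output d_k
    P = P + Rr                                     # time update of Eq. 7.4
    H = jacobian(x, w_hat)                         # linearize observation, Eq. 7.5
    S = H @ P @ H.T + Re                           # innovation variance
    K = P @ H.T / S                                # Kalman gain
    w_hat = w_hat + (K * (d - G(x, w_hat))).ravel()
    P = P - K @ H @ P

print("estimated parameters:", w_hat)              # approaches w_true
```

The identity state transition shows up as the trivial time update P ← P + R_r; all of the nonlinearity enters through the observation G.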
Dual Estimation
A special case of machine learning arises when the input x_k is unobserved, requiring the coupling of both state estimation and parameter estimation. For these dual estimation problems, we again consider
In the next section we review optimal estimation to explain the basic assumptions and flaws with the EKF. This will motivate the use of the UKF as a method to amend these flaws. A detailed development of the UKF is given in Section 7.3. The remainder of the chapter will then be divided based on the application areas reviewed above. We conclude the chapter in Section 7.6 with the unscented particle filter, in which the UKF is used to improve sequential Monte Carlo-based filtering methods. Appendix A provides a derivation of the accuracy of the UKF. Appendix B details an efficient square-root implementation of the UKF.

7.2. OPTIMAL RECURSIVE ESTIMATION AND THE EKF
This recursion specifies the current state density as a function of the previous density and the most recent measurement data. The state-space model comes into play by specifying the state-transition probability p(x_k | x_{k-1}) and the measurement probability or likelihood, p(y_k | x_k).¹ Specifically, p(x_k | x_{k-1}) is determined by the innovations noise density p(v_k) with the state-update equation

\[ x_{k+1} = F(x_k, u_k, v_k). \tag{7.12} \]

For example, given an additive noise model with Gaussian density, p(v_k) = N(0, R_v), then p(x_k | x_{k-1}) = N(F(x_{k-1}, u_{k-1}), R_v). Similarly, p(y_k | x_k) is determined by the observation noise density p(n_k) and the measurement equation

\[ y_k = H(x_k, n_k). \tag{7.13} \]

1 Note that we do not write the implicit dependence on the observed input u_k, as it is not a random variable.
In principle, knowledge of these densities and the initial condition p(x_0 | y_0) = p(y_0 | x_0) p(x_0) / p(y_0) determines p(x_k | Y_0^k) for all k. Unfortunately, the multi-dimensional integration indicated by Equations 7.9, 7.10 and 7.11 makes a closed-form solution intractable for most systems. The only general approach is to apply Monte Carlo sampling techniques that essentially convert integrals to finite sums, which converge to the true solution in the limit. The particle filter discussed in the last section of this chapter is an example of such an approach.
If we make the basic assumption that all densities remain Gaussian, then the Bayesian recursion can be greatly simplified. In this case, only the conditional mean x̂_k = E[x_k | Y_0^k] and covariance P_{x_k} need to be evaluated. It is straightforward to show that this leads to the recursive estimation

\[ \hat x_k = (\text{prediction of } x_k) + K_k \left[ y_k - (\text{prediction of } y_k) \right] \tag{7.14} \]
\[ P_{x_k} = P_{x_k}^{-} - K_k P_{\tilde y_k} K_k^T. \tag{7.15} \]
While this is a linear recursion, we have not assumed linearity of the model. The optimal terms in this recursion are given by

\[ \hat x_k^- = E\left[ F(x_{k-1}, u_{k-1}, v_{k-1}) \right] \tag{7.16} \]
\[ K_k = P_{x_k y_k} P_{\tilde y_k \tilde y_k}^{-1} \tag{7.17} \]
\[ \hat y_k^- = E\left[ H(x_k^-, n_k) \right], \tag{7.18} \]

where the optimal prediction (i.e., prior mean) of x_k is written as x̂_k^-, and corresponds to the expectation of a nonlinear function of the random variables x_{k-1} and v_{k-1} (a similar interpretation holds for the optimal prediction ŷ_k^-). The optimal gain term K_k is expressed as a function of posterior covariance matrices (with ỹ_k = y_k − ŷ_k^-). Note that evaluation of the covariance terms also requires taking expectations of a nonlinear function of the prior state variable. P_{x_k}^- is the prediction of the covariance of x_k, and P_{\tilde y_k} is the covariance of ỹ_k.
The celebrated Kalman filter [25] calculates all terms in these equations exactly in the linear case, and can be viewed as an efficient method for analytically propagating a GRV through linear system dynamics. For nonlinear models, however, the EKF approximates the optimal terms as:

\[ \hat x_k^- \approx F(\hat x_{k-1}, u_{k-1}, \bar v) \tag{7.19} \]
\[ K_k \approx \hat P_{x_k y_k} \hat P_{\tilde y_k \tilde y_k}^{-1} \tag{7.20} \]
\[ \hat y_k^- \approx H(\hat x_k^-, \bar n), \tag{7.21} \]

where predictions are approximated simply as functions of the prior mean value (no expectation taken)². The covariances are determined by linearizing the dynamic equations (x_{k+1} ≈ A x_k + B_u u_k + B v_k, y_k ≈ C x_k + D n_k), and then determining the posterior covariance matrices analytically for the linear system. In other words, in the EKF the state distribution is approximated by a GRV, which is then propagated analytically through the "first-order" linearization of the nonlinear system. The explicit equations for the EKF are given in Table 7.2. As such, the EKF can be viewed as providing "first-order" approximations to the optimal terms³. These approximations, however, can introduce large errors in the true posterior mean and covariance of the transformed (Gaussian) random variable, which may lead to sub-optimal performance and sometimes divergence of the filter⁴. It is these "flaws" which will be addressed in the next section using the UKF.

2 The noise means are denoted by n̄ = E[n] and v̄ = E[v], and are usually assumed to equal zero.
Initialize with:

\[ \hat x_0 = E[x_0] \tag{7.22} \]
\[ P_{x_0} = E\left[(x_0 - \hat x_0)(x_0 - \hat x_0)^T\right] \tag{7.23} \]

For k ∈ {1, …, ∞}, the time-update equations of the extended Kalman filter are:

\[ \hat x_k^- = F(\hat x_{k-1}, u_{k-1}, \bar v) \tag{7.24} \]
\[ P_{x_k}^- = A_{k-1} P_{x_{k-1}} A_{k-1}^T + B_k R^v B_k^T \tag{7.25} \]
4 The iterated EKF recalculates the EKF equations at the current time step by redefining the nominal state estimate and relinearizing the measurement equations. It is capable of providing better performance than the basic EKF, especially in the case of significant nonlinearity in the measurement function [20]. We have not performed a comparison to the UKF at this time, though a similar procedure may also be adapted to iterate the UKF.
7.3. THE UNSCENTED KALMAN FILTER
These sigma points capture the posterior mean and covariance accurately to the 2nd order (Taylor series expansion) for any nonlinearity. To elaborate on this, we begin by explaining the unscented transformation.
Figure 7.2: Block diagram of the unscented transformation (UT). A set of sigma points {X_i} = { x̄, x̄ ± (√((L+λ) P_x))_i } is propagated through the nonlinear function f(·); the weighted sample mean and weighted sample covariance of the transformed points give the posterior mean ȳ and covariance P_y.
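As a concrete illustration of the UT just described, the following minimal Python sketch forms the 2L+1 sigma points, pushes them through a nonlinearity, and computes the weighted posterior statistics. The (α, β, κ) parameterization matches the weight definitions used in this chapter (Eqn. 7.34); the default values and the polar-to-Cartesian test function are illustrative assumptions.

```python
import numpy as np

def unscented_transform(f, x_mean, P, alpha=1.0, beta=0.0, kappa=2.0):
    L = x_mean.size
    lam = alpha**2 * (L + kappa) - L
    S = np.linalg.cholesky((L + lam) * P)         # scaled matrix square root

    # 2L+1 sigma points: the mean, plus/minus the square-root columns
    sigmas = np.vstack([x_mean, x_mean + S.T, x_mean - S.T])

    Wm = np.full(2 * L + 1, 0.5 / (L + lam))      # mean weights
    Wc = Wm.copy()                                # covariance weights
    Wm[0] = lam / (L + lam)
    Wc[0] = lam / (L + lam) + (1 - alpha**2 + beta)

    Y = np.array([f(s) for s in sigmas])          # propagate through nonlinearity
    y_mean = Wm @ Y
    dY = Y - y_mean
    P_y = (Wc[:, None] * dY).T @ dY               # weighted outer products
    return y_mean, P_y

# usage: push a Gaussian through a polar-to-Cartesian map (illustrative)
f = lambda s: np.array([s[0] * np.cos(s[1]), s[0] * np.sin(s[1])])
m, P_y = unscented_transform(f, np.array([1.0, 0.3]), np.diag([0.01, 0.05]))
```

Note that only function evaluations of f are required; no Jacobian is formed at any point.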
Valuable insight into the UT can also be gained by relating it to a numerical technique for evaluating integrals called Gaussian quadrature. Ito and Xiong [19] recently showed the relation between the UT and the Gauss-Hermite quadrature rule⁵ in the context of state estimation. A close similarity also exists between the UT and the central difference interpolation based filtering (CDF) techniques developed separately by Ito and Xiong [19] and Nørgaard, Poulsen and Ravn [39]. In [50], van der Merwe and Wan show how the UKF and CDF can be unified in a general family of derivative-free Kalman filters for nonlinear estimation.
A simple example is shown in Figure 7.3 for a 2-dimensional system: the left plot shows the true mean and covariance propagation using Monte Carlo sampling; the center plots show the results using a linearization approach, as would be done in the EKF; the right plots show the performance of the UT (note only 5 sigma points are required). The superior performance of the UT is clear.
Implementation Variations
For the special (but often found) case where the process and measurement noise are purely additive, the computational complexity of the UKF can be reduced. In such a case, the system state need not be augmented with the noise RVs. This reduces the dimension of the sigma points as well as the total number of sigma points used. The covariances of the noise sources are then incorporated into the state covariance using a simple additive procedure. This implementation is given in Table 7.3.2.

5 In the scalar case, the Gauss-Hermite rule is given by \( \int_{-\infty}^{\infty} f(x)\, \frac{1}{\sqrt{2\pi}} e^{-x^2/2}\, dx = \sum_{i=1}^{m} w_i f(x_i) \), where the equality holds for all polynomials f(·) of degree up to 2m − 1, and the quadrature points x_i and weights w_i are determined according to the rule type (see [19] for details). For higher dimensions the Gauss-Hermite rule requires on the order of m^L functional evaluations, where L is the dimension of the state. For the scalar case, the UT with α = 1, β = 0, and κ = 2 coincides with the three-point Gauss-Hermite quadrature rule.

Figure 7.3: Example of the UT for mean and covariance propagation: (a) actual (Monte Carlo sampling of the true mean and covariance), (b) first-order linearization (EKF, with P_y = A^T P_x A), (c) UT (transformed sigma points with weighted sample mean and covariance).
The complexity of the algorithm is of order L³, where L is the dimension of the state. This is the same complexity as the EKF. The most costly operation is forming the sample prior covariance matrix P_k^-. Depending on the form of F, this may be simplified; e.g., for univariate time series or with parameter estimation (see Section 7.4), the complexity reduces to order L².
A number of variations for numerical purposes are also possible. For example, the matrix square root, which can be implemented directly using a Cholesky factorization, is in general of order L³/6. However, the covariance matrices are expressed recursively, and thus the square root can be computed in only order ML² (M is the dimension of the output y_k) by performing a recursive update to the Cholesky factorization. Details of an efficient recursive square-root UKF implementation are given in Appendix B.
Initialize with:

\[ P_0^a = E\big[(x_0^a - \hat x_0^a)(x_0^a - \hat x_0^a)^T\big] = \begin{bmatrix} P_0 & 0 & 0 \\ 0 & R^v & 0 \\ 0 & 0 & R^n \end{bmatrix} \tag{7.38} \]

For k ∈ {1, …, ∞}, calculate the sigma points:

\[ \mathcal{X}_{k-1}^a = \big[ \hat x_{k-1}^a \quad \hat x_{k-1}^a + \gamma\sqrt{P_{k-1}^a} \quad \hat x_{k-1}^a - \gamma\sqrt{P_{k-1}^a} \big] \tag{7.39} \]

Time update:

\[ \mathcal{X}_{k|k-1}^x = F\big[ \mathcal{X}_{k-1}^x, u_{k-1}, \mathcal{X}_{k-1}^v \big] \tag{7.40} \]
\[ \hat x_k^- = \sum_{i=0}^{2L} W_i^{(m)} \mathcal{X}_{i,k|k-1}^x \tag{7.41} \]
\[ P_k^- = \sum_{i=0}^{2L} W_i^{(c)} \big[ \mathcal{X}_{i,k|k-1}^x - \hat x_k^- \big]\big[ \mathcal{X}_{i,k|k-1}^x - \hat x_k^- \big]^T \tag{7.42} \]
\[ \mathcal{Y}_{k|k-1} = H\big[ \mathcal{X}_{k|k-1}^x, \mathcal{X}_{k-1}^n \big] \tag{7.43} \]
\[ \hat y_k^- = \sum_{i=0}^{2L} W_i^{(m)} \mathcal{Y}_{i,k|k-1} \tag{7.44} \]
Initialize with:

\[ \hat x_0 = E[x_0] \tag{7.50} \]
\[ P_0 = E\big[(x_0 - \hat x_0)(x_0 - \hat x_0)^T\big] \tag{7.51} \]

For k ∈ {1, …, ∞}, calculate the sigma points:

\[ \mathcal{X}_{k-1} = \big[ \hat x_{k-1} \quad \hat x_{k-1} + \gamma\sqrt{P_{k-1}} \quad \hat x_{k-1} - \gamma\sqrt{P_{k-1}} \big] \tag{7.52} \]

Time update:

\[ \mathcal{X}_{k|k-1} = F\big[ \mathcal{X}_{k-1}, u_{k-1} \big] \tag{7.53} \]
\[ \hat x_k^- = \sum_{i=0}^{2L} W_i^{(m)} \mathcal{X}_{i,k|k-1} \tag{7.54} \]
\[ P_k^- = \sum_{i=0}^{2L} W_i^{(c)} \big[ \mathcal{X}_{i,k|k-1} - \hat x_k^- \big]\big[ \mathcal{X}_{i,k|k-1} - \hat x_k^- \big]^T + R^v \tag{7.55} \]

(redraw sigma points)⁶

\[ \mathcal{X}_{k|k-1} = \big[ \hat x_k^- \quad \hat x_k^- + \gamma\sqrt{P_k^-} \quad \hat x_k^- - \gamma\sqrt{P_k^-} \big] \tag{7.56} \]
\[ \mathcal{Y}_{k|k-1} = H\big[ \mathcal{X}_{k|k-1} \big] \tag{7.57} \]
\[ \hat y_k^- = \sum_{i=0}^{2L} W_i^{(m)} \mathcal{Y}_{i,k|k-1} \tag{7.58} \]

6 Here we have to redraw a new set of sigma points to incorporate the effect of the additive process noise. Alternatively, we could augment the already propagated set of sigma points, X_{k|k-1}, with sigma points derived from the matrix square root of the process-noise covariance. This would also require the sigma-point weights to be recalculated, as the effective number of sigma points has doubled. A possible advantage of this approach would be that we are not discarding any odd-moment information captured by the propagated sigma points, as would be the case if we redrew them from P_k^-.
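A minimal Python sketch of this additive-noise time update (Eqs. 7.52-7.58) follows; the weight vectors Wm and Wc are assumed to be computed as in the UT sketch earlier, and F, H are placeholder model functions.

```python
import numpy as np

def sigma_points(x, P, gamma):
    S = np.linalg.cholesky(P)
    return np.vstack([x, x + gamma * S.T, x - gamma * S.T])   # Eq. 7.52

def ukf_time_update(x, P, u, F, H, Rv, Wm, Wc, gamma):
    X = sigma_points(x, P, gamma)
    X_prop = np.array([F(s, u) for s in X])                   # Eq. 7.53
    x_pred = Wm @ X_prop                                      # Eq. 7.54
    dX = X_prop - x_pred
    P_pred = (Wc[:, None] * dX).T @ dX + Rv                   # Eq. 7.55
    X_new = sigma_points(x_pred, P_pred, gamma)               # Eq. 7.56 (redraw)
    Y = np.array([H(s) for s in X_new])                       # Eq. 7.57
    y_pred = Wm @ Y                                           # Eq. 7.58
    return x_pred, P_pred, X_new, Y, y_pred
```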
[Figure: double inverted pendulum on a cart, with cart mass M, control force u, link lengths l_1 and l_2, masses m_1 and m_2, and angles θ_1 and θ_2.]
The UKF was originally designed for state estimation applied to nonlinear control applications requiring full-state feedback [24, 22, 23]. We provide an example for a double inverted pendulum control system. In addition, we provide a new application example corresponding to noisy time-series estimation with neural networks.
The dynamics of the double inverted pendulum are given by

\[ (m_1 + 2m_2)\, l_1 \ddot{x} \cos\theta_1 + 4\!\left(\tfrac{m_1}{3} + m_2\right) l_1^2\, \ddot\theta_1 + 2 m_2 l_1 l_2\, \ddot\theta_2 \cos(\theta_2 - \theta_1) \tag{7.66} \]
\[ \quad = (m_1 + 2m_2)\, g\, l_1 \sin\theta_1 + 2 m_2 l_1 l_2\, \dot\theta_2^2 \sin(\theta_2 - \theta_1), \tag{7.67} \]
\[ m_2 \ddot{x}\, l_2 \cos\theta_2 + 2 m_2 l_1 l_2\, \ddot\theta_1 \cos(\theta_2 - \theta_1) + \tfrac{4}{3} m_2 l_2^2\, \ddot\theta_2 \tag{7.68} \]
\[ \quad = m_2 g l_2 \sin\theta_2 - 2 m_2 l_1 l_2\, \dot\theta_1^2 \sin(\theta_2 - \theta_1). \tag{7.69} \]
These continuous-time dynamics are discretized with a sampling period of 0.02 seconds. The pendulum is stabilized by applying a control force, u, to the cart. In this case we use a state-dependent Riccati equation (SDRE) controller to stabilize the system⁷. A state estimator is run outside the control loop in order to compare the EKF to the UKF (i.e., the estimated states are not fed back for control; they are used for evaluation purposes only). The observation corresponds to noisy measurements of the cart position, cart velocity, and angle of the top pendulum. This is a challenging problem, as no measurements are made of the bottom pendulum angle, nor of the angular velocity of the top pendulum. For this experiment, the pendulum is initialized in a jack-knife position (+25/−24 degrees) with a cart offset of 0.5 meters. The resulting state estimates are shown in Figure 7.5. Clearly the UKF is better able to track the unobserved states⁸. If the estimated states are used for feedback in the control loop, the UKF system is still able to stabilize the pendulum, while the EKF system crashes. We will return to the double inverted pendulum problem later in this chapter for both model estimation and dual estimation.

7 An SDRE controller [5] is designed by formulating the dynamic equations as x_{k+1} = A(x_k) x_k + B(x_k) u_k. Note, this representation is not a linearization, but rather a reformulation of the nonlinear dynamics into a pseudo-linear form. Based on this state-space representation, we design an optimal LQR controller, u_k = −R^{-1} B^T(x_k) P(x_k) x_k ≡ −K(x_k) x_k, where P(x_k) is a solution of the standard Riccati equations using the state-dependent matrices A(x_k) and B(x_k). The procedure is repeated at every time step at the current state x_k and provides local asymptotic stability of the plant [5]. The approach has been found to be far more robust than LQR controllers based on standard linearization techniques, as well as many alternative "advanced" nonlinear control approaches.

8 Note that if all six states are observed with noise, then the performance of the EKF and UKF are comparable.
Figure 7.5: State estimation for the double inverted pendulum problem. Only three noisy states are observed: cart position, cart velocity, and the angle of the top pendulum [10 dB SNR; α = 1, β = 0, κ = 0]. The six panels plot the true state, noisy observations, EKF estimates, and UKF estimates for the cart position and velocity, the two pendulum angles, and the two pendulum angular velocities.
Figure 7.6: Estimation of the Mackey-Glass time series with the EKF and UKF using a known model. The top panels show a segment (k = 200-300) of the clean series, the noisy series, and the UKF estimates; the bottom panel compares the normalized MSE estimation errors of the EKF and UKF over the complete sequence.
\[ y_k = \begin{bmatrix} 1 & 0 & \cdots & 0 \end{bmatrix} x_k + n_k \tag{7.71} \]

In the estimation problem, the noisy time series y_k is the only observed input to either the EKF or UKF algorithms (both utilize the known neural network model). Figure 7.6 shows a sub-segment of the estimates generated by both the EKF and the UKF (the original noisy time series has a 3 dB SNR). The superior performance of the UKF is clearly visible.
As has been discussed, the Kalman filter is a recursive algorithm providing the conditional expectation of the state x_k given all observations Y_0^k up to the current time k. In contrast, the Kalman smoother estimates the state given all observations past and future, Y_0^N, where N is the final time. Kalman smoothers are commonly used for applications such as trajectory planning, noncausal noise reduction, and the E-step in the EM algorithm [46, 12]. A thorough treatment of the Kalman smoother in the linear case is given in [29]. The basic idea is to run a Kalman filter forward in time to estimate the mean and covariance (x̂_k^f, P_k^f) of the state given past data. A second Kalman filter is then run backward in time to produce a backward-time predicted mean and covariance (x̂_k^b, P_k^b) given the future data. These two estimates are then combined, producing the following smoothed statistics given all the data:

\[ (P_k^s)^{-1} = (P_k^f)^{-1} + (P_k^b)^{-1} \tag{7.72} \]
\[ \hat x_k^s = P_k^s \left[ (P_k^b)^{-1} \hat x_k^b + (P_k^f)^{-1} \hat x_k^f \right]. \tag{7.73} \]
For the nonlinear case, the EKF replaces the Kalman filter. The use of the EKF for the forward filter is straightforward. However, implementation of the backward filter is achieved by using the following linearized backward-time system:

\[ x_{k-1} = A^{-1} x_k - A^{-1} B v_k; \tag{7.74} \]

i.e., the forward nonlinear dynamics are linearized, and then inverted for the backward model. A linear Kalman filter is then applied.
Our proposed unscented Kalman smoother (UKS) replaces the EKF with the UKF. In addition, we consider using a nonlinear backward model as well, either derived from first principles or by training a backward predictor using a neural network model, as illustrated for the time-series case in Figure 7.7. The nonlinear backward model allows us to take full advantage of the UKF, which requires no linearization step.
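The fusion of Equations 7.72 and 7.73 is itself only a few lines; a minimal sketch (assuming the forward and backward statistics come from the two filter passes described above):

```python
import numpy as np

def smooth(xf, Pf, xb, Pb):
    # information-form combination of forward and backward estimates
    Pf_inv = np.linalg.inv(Pf)
    Pb_inv = np.linalg.inv(Pb)
    Ps = np.linalg.inv(Pf_inv + Pb_inv)        # Eq. 7.72
    xs = Ps @ (Pb_inv @ xb + Pf_inv @ xf)      # Eq. 7.73
    return xs, Ps
```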
To illustrate performance, we reconsider the noisy Mackey-Glass time-series problem of the previous section, as well as a second time series generated using a chaotic autoregressive neural network. The table below compares smoother performance. In this case, the network models are trained on the clean time series, and then tested on the noisy data using either the standard extended Kalman smoother with linearized backward model (EKS1), an extended Kalman smoother with a second nonlinear backward model (EKS2), or the unscented Kalman smoother (UKS). The forward (F), backward (B), and smoothed (S) estimation errors are reported. Again, the performance benefits of the unscented approach are clear.
[Figure 7.7: backward prediction of a time series with a neural network.]

7.4. UKF PARAMETER ESTIMATION
Thus if the "noise" covariance R_e is a constant diagonal matrix, then, in fact, it cancels out of the algorithm (this can be shown explicitly), and hence it can be set arbitrarily (e.g., R_e = 0.5 I). Alternatively, R_e can be set to specify a weighted MSE cost. The innovations covariance E[r_k r_k^T] = R_{r_k}, on the other hand, affects the convergence rate and tracking performance. Roughly speaking, the larger the covariance, the more quickly older data is discarded. There are several options on how to choose R_{r_k} (a short code sketch of the options follows the list):
- Set R_{r_k} to an arbitrary "fixed" diagonal value, which may then be "annealed" towards zero as training continues.

- Set R_{r_k} = (λ_RLS^{-1} − 1) P_{w_k}, where λ_RLS ∈ (0, 1] is often referred to as the "forgetting factor," as defined in the recursive least squares (RLS) algorithm [14]. This provides for an approximate exponentially decaying weighting on past data, and is described more fully in [37]. Note that λ_RLS should not be confused with the λ used for sigma-point calculation.

- Set R_{r_k} = (1 − α_RM) R_{r_{k-1}} + α_RM K_k^w [d_k − G(x_k, ŵ)][d_k − G(x_k, ŵ)]^T (K_k^w)^T, which is a Robbins-Monro stochastic approximation scheme for estimating the innovations [32]. The method assumes that the covariance of the Kalman update model should be consistent with the actual update model. Typically, R_{r_k} is also constrained to be a diagonal matrix, which implies an independence assumption on the parameters. Note that a similar update may also be used for R_{e_k}.
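As a compact summary, the three options can be sketched in code as follows (the names anneal_factor, lam_rls, and alpha_rm are illustrative assumptions, not the chapter's notation; K is the Kalman gain, P_w the parameter covariance, and err = d_k − G(x_k, ŵ)):

```python
import numpy as np

def rr_fixed_annealed(Rr, anneal_factor=0.995):
    return anneal_factor * Rr                      # "fixed" value annealed to zero

def rr_forgetting(P_w, lam_rls=0.9995):
    return (1.0 / lam_rls - 1.0) * P_w             # RLS forgetting-factor form

def rr_robbins_monro(Rr_prev, K, err, alpha_rm=0.05):
    outer = K @ np.outer(err, err) @ K.T           # K (d - G)(d - G)^T K^T
    Rr = (1 - alpha_rm) * Rr_prev + alpha_rm * outer
    return np.diag(np.diag(Rr))                    # diagonal constraint
```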
Our experience indicates that the "Robbins-Monro" method provides the fastest rate of absolute convergence and the lowest final MMSE values (see the experiments in the next section). The "fixed" R_{r_k} in combination with annealing can also achieve good final MMSE performance, but requires more monitoring and greater prior knowledge of the noise levels. For problems where the MMSE is zero, the covariance should be lower-bounded to prevent the algorithm from stalling and potential numerical problems. The "forgetting-factor" and "fixed" R_{r_k} methods are most appropriate for on-line learning problems in which tracking of time-varying parameters is necessary. In this case, the parameter covariance stays lower-bounded, allowing the most recent data to be emphasized. This leads to some misadjustment, but also keeps the Kalman gain sufficiently large to maintain good tracking. In general, the various trade-offs between these different approaches are still an area of open research.
The UKF represents an alternative to the EKF for parameter estimation. However, as the state-transition function is linear, the advantage of the UKF may not be as obvious. Note that the observation function is still nonlinear. Furthermore, the EKF essentially builds up an approximation to the expected Hessian by taking outer products of the gradient. The UKF, however, may provide a more accurate estimate through direct approximation of the expectation of the Hessian. While both the EKF and UKF can be expected to achieve similar final MMSE performance, their convergence properties may differ. In addition, a distinct advantage of the UKF occurs when either the architecture or the error metric is such that differentiation with respect to the parameters is not easily derived, as is necessary in the EKF. The UKF effectively evaluates both the Jacobian and Hessian precisely through its sigma-point propagation, without the need to perform any analytic differentiation.
Specific equations for UKF parameter estimation are given in Table 7.4.1. Simplifications have been made relative to the state UKF, accounting for the specific form of the state-transition function. In Table 7.4.1 we have provided two options for how the function output d̂_k is achieved. In the first option, the output is given as

\[ \hat d_k = \sum_{i=0}^{2L} W_i^{(m)} \mathcal{D}_{i,k|k-1} \approx E\big[ G(x_k, w_k) \big], \tag{7.92} \]
corresponding to the direct interpretation of the UKF equations. The output is the expected value (mean) of a function of the random variable w_k. In the second option, we have

\[ \hat d_k = G(x_k, \hat w_k^-), \tag{7.93} \]

corresponding to the typical interpretation, in which the output is the function evaluated with the current "best" set of parameters. This option yields convergence performance that is indistinguishable from the EKF. The first option, however, has different convergence characteristics, and requires further explanation. In the state-space approach to parameter estimation, absolute convergence is achieved when the parameter covariance P_{w_k} goes to zero (this also forces the Kalman gain to zero).
Initialize with:

\[ \hat w_0 = E[w] \tag{7.79} \]
\[ P_{w_0} = E\big[(w - \hat w_0)(w - \hat w_0)^T\big] \tag{7.80} \]

For k ∈ {1, …, ∞},

Time update and sigma-point calculation:

\[ \hat w_k^- = \hat w_{k-1} \tag{7.81} \]
\[ P_{w_k}^- = P_{w_{k-1}} + R_{r_{k-1}} \tag{7.82} \]
\[ \mathcal{W}_{k|k-1} = \big[ \hat w_k^- \quad \hat w_k^- + \gamma\sqrt{P_{w_k}^-} \quad \hat w_k^- - \gamma\sqrt{P_{w_k}^-} \big] \tag{7.83} \]
\[ \mathcal{D}_{k|k-1} = G\big[ x_k, \mathcal{W}_{k|k-1} \big] \tag{7.84} \]

option 1: \[ \hat d_k = \sum_{i=0}^{2L} W_i^{(m)} \mathcal{D}_{i,k|k-1} \tag{7.85} \]
option 2: \[ \hat d_k = G(x_k, \hat w_k^-) \tag{7.86} \]

Measurement update equations:

\[ P_{\tilde d_k \tilde d_k} = \sum_{i=0}^{2L} W_i^{(c)} \big[ \mathcal{D}_{i,k|k-1} - \hat d_k \big]\big[ \mathcal{D}_{i,k|k-1} - \hat d_k \big]^T + R_{e_k} \tag{7.87} \]
\[ P_{w_k d_k} = \sum_{i=0}^{2L} W_i^{(c)} \big[ \mathcal{W}_{i,k|k-1} - \hat w_k^- \big]\big[ \mathcal{D}_{i,k|k-1} - \hat d_k \big]^T \tag{7.88} \]
\[ K_k = P_{w_k d_k} P_{\tilde d_k \tilde d_k}^{-1} \tag{7.89} \]
\[ \hat w_k = \hat w_k^- + K_k (d_k - \hat d_k) \tag{7.90} \]
\[ P_{w_k} = P_{w_k}^- - K_k P_{\tilde d_k \tilde d_k} K_k^T \tag{7.91} \]

where γ = √(L + λ); λ is the composite scaling parameter, L is the dimension of the state, R_r is the process-noise covariance, R_e is the measurement-noise covariance, and W_i are the weights as calculated in Eqn. 7.34.

Table 7.4.1: UKF - Parameter Estimation
At this point, the output for either option is identical. However, prior to this, the finite covariance provides a form of averaging on the output of the function, which in turn prevents the parameters from going to the minimum of the error surface. Thus the method may help avoid falling into a local minimum. Furthermore, it provides a form of built-in regularization for short or noisy data sets that are prone to overfitting (exact specification of the level of regularization requires further study).
Note that the complexity of the UKF algorithm is still of order L³ (L is the number of parameters), due to the need to compute a matrix square root at each time step. An order L² complexity (the same as the EKF) can be achieved by using a recursive square-root formulation, as given in Appendix B.
Figure 7.8: (top) MacKay Robot-Arm problem: comparison of learning curves (training-set MSE versus epochs) for EKF and UKF training; 2-12-2 MLP; 'annealing' noise estimation. (bottom) Ikeda chaotic time series: comparison of learning curves for EKF and UKF training; 10-7-1 MLP; 'Robbins-Monro' noise estimation.
We have performed a number of experiments to illustrate the performance of the UKF parameter-estimation approach. The first set of experiments corresponds to benchmark problems for neural network training, and serves to illustrate some of the differences between the EKF and UKF, as well as the different options discussed above. Two parametric optimization problems are also included, corresponding to model estimation of the double pendulum and the benchmark "Rosenbrock's Banana" optimization problem.
The Mackey-Glass and Ikeda time series are used. The plots show only comparisons for the UKF (the differences are similar for the EKF). In general, the Robbins-Monro method is the most robust approach, with the fastest rate of convergence. In some examples we have seen faster convergence with the "annealed" approach; however, this also requires additional insight and heuristic methods to monitor the learning. We should reiterate that the "fixed" and "lambda" approaches are more appropriate for on-line tracking problems.
Figure 7.9: Neural network parameter estimation using different methods for noise estimation ('fixed', 'lambda', 'anneal', 'Robbins-Monro'; training-set MSE versus epochs). (top) Ikeda chaotic time series. (bottom) Mackey-Glass chaotic time series. [UKF: α = 1e−4, β = 2, κ = 3 − n; n = state dimension.]
Figure 7.10: Singhal and Wu's four-region classification problem: averaged RMSE learning curves (top), and the classification regions produced by the EKF-trained and UKF-trained networks (bottom). UKF settings: [α = 1e−4, β = 2, κ = 3 − n; n = state dimension]; 2-10-10-4 MLP; Robbins-Monro; 1 epoch = 100 random examples.
Figure 7.11: Inverted double pendulum parameter estimation: model MSE versus iteration for the EKF and UKF. UKF settings: [α = 1e−4, β = 2, κ = 3 − n; n = state dimension]; Robbins-Monro.
where the target "observation" is fixed at zero, and e_k is an error term, resulting in the optimization of the sum of instantaneous costs J_k = e_k^T e_k. The MSE cost is optimized by setting e_k = d_k − G(x_k, w_k). However, arbitrary costs (e.g., cross-entropy) can also be minimized, simply by specifying e_k appropriately. Further discussion of this approach was given in Chapter 5 of this book. Reformulation of the UKF equations requires changing only the effective output to be e_k, and setting the desired response to zero.
For the example at hand, we set e_k = [10(x_2 − x_1²)  (1 − x_1)]^T. Furthermore, since this optimization problem is a special case of 'noiseless' parameter estimation, where the actual error can be minimized to zero, we make use of Equation 7.93 (option 2) to calculate the output of the UKF algorithm. This allows the UKF to reach the true minimum of the error surface more rapidly⁹. We also set the scaling parameter α to a small value, which we have found to be appropriate for zero-MSE problems. Under these circumstances the performance of the UKF and EKF is indistinguishable, as illustrated in Figure 7.12. Overall, the performance of the two filters is comparable or superior to a number of alternative optimization approaches (e.g., Davidon-Fletcher-Powell, Levenberg-Marquardt, etc.; see "optdemo" in MATLAB). The main purpose of this example was to illustrate the versatility of the UKF for general optimization problems.
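For reference, the error vector used here maps directly onto the classic Rosenbrock objective; a tiny sketch (the function names are illustrative):

```python
import numpy as np

def banana_error(w):
    # e = [10 (x2 - x1^2), 1 - x1], so e^T e = 100 (x2 - x1^2)^2 + (1 - x1)^2,
    # the Rosenbrock "banana" function with its minimum at (1, 1)
    x1, x2 = w
    return np.array([10.0 * (x2 - x1**2), 1.0 - x1])

def banana_cost(w):
    e = banana_error(w)
    return float(e @ e)
```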
Figure 7.12: Rosenbrock's 'Banana' optimization problem: function value f(X) and model error (MSE) versus k for the EKF and UKF. UKF settings: [α = 1e−4, β = 2, κ = 3 − n; n = state dimension]; fixed R_r.
9 Note that the use of option 1, where the expected value of the function is used as the output, essentially involves averaging of the output based on the current parameter covariance. This slows convergence in the case where zero MSE is possible, since convergence of the state covariance to zero would also be necessary, through proper annealing of the state noise innovations R_r.
7.5. UKF DUAL ESTIMATION
In the dual extended Kalman filter [52], a separate state-space representation is used for the signal and the weights. Two EKFs are run simultaneously for signal and weight estimation. At every time step, the current estimate of the weights is used in the signal filter, and the current estimate of the signal state is used in the weight filter. In the dual UKF algorithm, both state and weight estimation are done with the UKF.

In the joint extended Kalman filter [36], the signal-state and weight vectors are concatenated into a single, joint state vector, [x_k^T w_k^T]^T. Estimation is done recursively by writing the state-space equations for the joint state as

\[ \begin{bmatrix} x_{k+1} \\ w_{k+1} \end{bmatrix} = \begin{bmatrix} F(x_k, u_k, w_k) \\ I\, w_k \end{bmatrix} + \begin{bmatrix} B\, v_k \\ r_k \end{bmatrix} \tag{7.97} \]
\[ y_k = \begin{bmatrix} 1 & 0 & \cdots & 0 \end{bmatrix} \begin{bmatrix} x_k \\ w_k \end{bmatrix} + n_k, \tag{7.98} \]

and running an EKF on the joint state space to produce simultaneous estimates of the states x_k and w. Again, our approach is to use the UKF instead of the EKF.
Noisy Time-Series
We present results on two time series to provide a clear illustration of the use of the UKF over the EKF. The first series is again the Mackey-Glass-30 chaotic series with additive noise (SNR ≈ 3 dB). The second time series (also chaotic) comes from an autoregressive neural network with random weights, driven by Gaussian process noise and also corrupted by additive white Gaussian noise (SNR ≈ 3 dB). A standard 6-10-1 MLP with tanh hidden activation functions and a linear output layer was used for all the filters in the Mackey-Glass problem. A 5-3-1 MLP was used for the second problem. The process- and measurement-noise variances associated with the state were assumed to be known. Note that in contrast to the state-estimation example in the previous section, only the noisy time series is observed. A clean reference is never provided for training.
Example training curves for the different dual and joint Kalman-based estimation methods are shown in Figure 7.13. A final estimate for the Mackey-Glass series is also shown for the dual UKF. The superior performance of the UKF-based algorithms is clear.
Figure 7.13: Comparative learning curves and results for the dual estimation experiments: normalized MSE versus epoch for the chaotic AR neural network (top) and the Mackey-Glass chaotic time series (bottom), for the dual EKF, dual UKF, joint EKF, and joint UKF. Curves are averaged over 10 and 3 runs, respectively, using different initial weights. "Fixed" innovation covariances are used in the joint algorithms; "annealed" covariances are used for the weight filter in the dual algorithms.
Mode Estimation
This example illustrates the use of the joint UKF for estimating the modes of a mass-and-spring system (see Figure 7.14). This work was performed at the University of Washington by Mark Campbell and Shelby Brunke. While the system is linear, direct estimation of the natural frequencies, ω_1 and ω_2, jointly with the states is a nonlinear estimation problem. Figure 7.15 compares the performance of the EKF and the UKF. Note that the EKF does not converge to the true value for ω_2. For this experiment, the input process-noise SNR is approximately 100 dB, and the measured positions y_1 and y_2 have additive noise at a 60 dB SNR (these settings effectively turn the task into a pure parameter-estimation problem). A fixed innovations R_r was used for the parameter estimation in the joint algorithms. Sampling was done at the Nyquist rate (based on ω_2), which emphasizes the effect of linearization in the EKF. For faster sampling rates the performance of the EKF and UKF become more similar.
[Figure 7.14: two-mass, two-spring system driven by unit-variance white-noise inputs u_1 = N(0,1) and u_2 = N(0,1), with measured positions y_1 and y_2; the linear state-space dynamics of the positions and velocities (x_1, ẋ_1, x_2, ẋ_2) depend on the natural frequencies ω_1 and ω_2.]
[Figure 7.15: joint estimation of the natural frequencies for the mass-and-spring system: EKF and UKF estimates of ω_1 (rad/s) and ω_2 (rad/s) versus time (s), together with the actual values. The EKF fails to converge to the true ω_2.]
Figure 7.16: F-15 model joint estimation: velocity, altitude, and lift estimation versus time, with and without failure (actual and UKF estimates). Note that the estimated and true values of the state are indistinguishable at this resolution.
Figure 7.17: Double inverted pendulum joint estimation: estimated states and parameters. Only angle 1 and angle 2 are plotted (in radians); the parameter estimates (cart mass, pendulum 1 length and mass, pendulum 2 length and mass) are shown converging to the true model parameters over time.
7.6. THE UNSCENTED PARTICLE FILTER
where the random samples {x_k^{(i)}; i = 1, …, N} are drawn from p(x_k | Y_0^k) and δ(·) denotes the Dirac delta function. The posterior filtering density, p(x_k | Y_0^k), is a marginal of the full posterior density given by p(X_0^k | Y_0^k). Consequently, any expectations of the form

\[ E[g(x_k)] = \int g(x_k)\, p(x_k \mid Y_0^k)\, dx_k \tag{7.99} \]

can be approximated by the sample average

\[ E[g(x_k)] \approx \frac{1}{N} \sum_{i=1}^{N} g\big(x_k^{(i)}\big). \tag{7.100} \]

For example, letting g(x) = x yields the optimal MMSE estimate x̂_k = E[x_k | Y_0^k]. The particles x_k^{(i)} are assumed to be independent and identically distributed (i.i.d.) for the approximation to hold. As N goes to infinity, the estimate converges to the true expectation almost surely. Sampling from the filtering posterior is only a special case of Monte Carlo simulation, which in general deals with the complete posterior density, p(X_0^k | Y_0^k). We will use this more general form to derive the particle filter algorithm.
It is often impossible to sample directly from the posterior density function. However, we can circumvent this difficulty by making use of importance sampling and alternatively sampling from a known proposal distribution q(X_0^k | Y_0^k). The exact form of this distribution is a critical design issue, and it is usually chosen in order to facilitate easy sampling. The details of this choice are discussed later. Given this proposal distribution, we can make use of the following substitution:
\[ E[g_k(X_0^k)] = \int g_k(X_0^k)\, \frac{p(X_0^k \mid Y_0^k)}{q(X_0^k \mid Y_0^k)}\, q(X_0^k \mid Y_0^k)\, dX_0^k \]
\[ = \int g_k(X_0^k)\, \frac{p(Y_0^k \mid X_0^k)\, p(X_0^k)}{p(Y_0^k)\, q(X_0^k \mid Y_0^k)}\, q(X_0^k \mid Y_0^k)\, dX_0^k \]
\[ = \int g_k(X_0^k)\, \frac{w_k(X_0^k)}{p(Y_0^k)}\, q(X_0^k \mid Y_0^k)\, dX_0^k, \]

where the variables w_k(X_0^k) are known as the unnormalized importance weights,

\[ w_k = \frac{p(Y_0^k \mid X_0^k)\, p(X_0^k)}{q(X_0^k \mid Y_0^k)}. \tag{7.101} \]
We can get rid of the unknown normalizing density p(Y_0^k) as follows:

\[ E[g_k(X_0^k)] = \frac{1}{p(Y_0^k)} \int g_k(X_0^k)\, w_k(X_0^k)\, q(X_0^k \mid Y_0^k)\, dX_0^k \]
\[ = \frac{\int g_k(X_0^k)\, w_k(X_0^k)\, q(X_0^k \mid Y_0^k)\, dX_0^k}{\int p(Y_0^k \mid X_0^k)\, p(X_0^k)\, \frac{q(X_0^k \mid Y_0^k)}{q(X_0^k \mid Y_0^k)}\, dX_0^k} \]
\[ = \frac{\int g_k(X_0^k)\, w_k(X_0^k)\, q(X_0^k \mid Y_0^k)\, dX_0^k}{\int w_k(X_0^k)\, q(X_0^k \mid Y_0^k)\, dX_0^k} \]
\[ = \frac{E_{q(\cdot \mid Y_0^k)}\!\big[ w_k(X_0^k)\, g_k(X_0^k) \big]}{E_{q(\cdot \mid Y_0^k)}\!\big[ w_k(X_0^k) \big]}, \]

where the notation E_{q(·|Y_0^k)} has been used to emphasize that the expectations are taken over the proposal distribution q(·|Y_0^k).
A sequential update to the importance weights is achieved by expanding the proposal distribution as q(X_0^k | Y_0^k) = q(X_0^{k-1} | Y_0^{k-1}) q(x_k | X_0^{k-1}, Y_0^k), where we are making the assumption that the current state is not dependent on future observations. Furthermore, under our assumption that the states correspond to a Markov process and that the observations are conditionally independent given the states, we can arrive at the recursive update

\[ w_k = w_{k-1}\, \frac{p(y_k \mid x_k)\, p(x_k \mid x_{k-1})}{q(x_k \mid X_0^{k-1}, Y_0^k)}. \tag{7.102} \]
Equation 7.102 provides a mechanism to sequentially update the importance weights, given an appropriate choice of proposal distribution, q(x_k | X_0^{k-1}, Y_0^k). Since we can sample from the proposal distribution and evaluate the likelihood p(y_k | x_k) and transition probabilities p(x_k | x_{k-1}), all we need to do is generate a prior set of samples and iteratively compute the importance weights. This procedure then allows us to evaluate the expectations of interest by the following estimate:

\[ E[g(X_0^k)] \approx \frac{ \frac{1}{N} \sum_{i=1}^{N} g\big(x_{0:k}^{(i)}\big)\, w_k\big(x_{0:k}^{(i)}\big) }{ \frac{1}{N} \sum_{i=1}^{N} w_k\big(x_{0:k}^{(i)}\big) } = \sum_{i=1}^{N} g\big(x_{0:k}^{(i)}\big)\, \tilde w_k\big(x_{0:k}^{(i)}\big), \tag{7.103} \]

where the normalized importance weights are given by \( \tilde w_k^{(i)} = w_k^{(i)} / \sum_{j=1}^{N} w_k^{(j)} \), and x_{0:k}^{(i)} denotes the i-th sample trajectory drawn from the proposal distribution q(x_k | X_0^{k-1}, Y_0^k). This estimate asymptotically
converges if the expectation and variance of g(X_0^k) and w_k exist and are bounded, and if the support of the proposal distribution includes the support of the posterior distribution. Thus, as N tends to infinity, the posterior density function can be approximated arbitrarily well by the point-mass estimate

\[ \hat p(X_0^k \mid Y_0^k) = \sum_{i=1}^{N} \tilde w_k^{(i)}\, \delta\big(X_0^k - x_{0:k}^{(i)}\big). \tag{7.104} \]
In the case of filtering, we do not need to keep the whole history of the sample trajectories: only the current set of samples at time k is needed to calculate expectations of the form given in Equations 7.99 and 7.100. To do this, we simply set g(X_0^k) = g(x_k). These point-mass estimates can approximate any general distribution arbitrarily well, limited only by the number of particles used and how well the above-mentioned importance-sampling conditions are met. In contrast, the posterior distribution calculated by the EKF is a minimum-variance Gaussian approximation to the true distribution, which inherently cannot capture complex structure like multi-modality, skewness, or other higher-order moments.
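Putting the pieces together, one step of a generic SIR particle filter built on the weight recursion of Equation 7.102, with the transition prior as proposal, might look as follows. This is a sketch: f_sample and loglik are placeholder hooks for the model densities, and the resample-when-degenerate rule is a common heuristic rather than this chapter's specific algorithm.

```python
import numpy as np

def pf_step(particles, weights, y, f_sample, loglik, rng):
    N = particles.shape[0]
    particles = f_sample(particles, rng)              # x_k ~ p(x_k | x_{k-1})
    weights = weights * np.exp(loglik(y, particles))  # w_k ∝ w_{k-1} p(y_k | x_k)
    weights = weights / weights.sum()                 # normalize, Eq. 7.103
    n_eff = 1.0 / np.sum(weights**2)                  # effective sample size
    if n_eff < N / 2:                                 # resample when degenerate
        idx = rng.choice(N, size=N, p=weights)
        particles, weights = particles[idx], np.full(N, 1.0 / N)
    return particles, weights
```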
Figure 7.18: Resampling process, whereby a random measure {x_k^{(i)}, w̃_k^{(i)}} is mapped into an equally weighted random measure {x_k^{(j)}, N^{-1}} by sampling the cdf of the weights. The index i is drawn from a uniform distribution.
The number of children of each particle is then set to N_i = N_i^A + N_i^B. This procedure is computationally cheaper than pure SIR and also has lower sample variance. Thus, residual resampling is used for all experiments in Section 7.6.2 (in general, we have found that the specific choice of resampling scheme does not significantly affect the performance of the particle filter).
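A sketch of residual resampling consistent with this description (the split into a deterministic part N_i^A = ⌊N w̃_i⌋ plus a stochastic part N_i^B drawn from the residual weights is the standard construction; anything beyond N_i = N_i^A + N_i^B above is an assumption here):

```python
import numpy as np

def residual_resample(weights, rng):
    N = weights.size
    n_copies = np.floor(N * weights).astype(int)      # deterministic part N_i^A
    residual = N * weights - n_copies                 # leftover weight mass
    n_left = N - n_copies.sum()
    if n_left > 0:
        residual = residual / residual.sum()
        n_copies += rng.multinomial(n_left, residual) # stochastic part N_i^B
    return np.repeat(np.arange(N), n_copies)          # indices of the children
```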
After the selection/resampling step at time k, we obtain N particles distributed marginally approximately according to the posterior distribution. Since the selection step favors the creation of multiple copies of the "fittest" particles, many particles may end up having no children (N_i = 0), whereas others might end up having a large number of children, the extreme case being N_i = N for a particular value i. In this case, there is a severe depletion of samples. Therefore, an additional procedure is often required to introduce sample variety after the selection step without affecting the validity of the approximation they infer. This is achieved by performing a single MCMC step on each particle. The basic idea is that if the particles are already distributed according to the posterior p(x_k | Y_0^k) (which is the case), then applying a Markov chain transition kernel with the same invariant distribution to each particle results in a set of new particles distributed according to the posterior of interest. However, the new particles may move to more interesting areas of the state space. Details on the MCMC step are given in [49]. For our experiments in Section 7.6.2, we found the need for an MCMC step to be unnecessary. However, this cannot be assumed in general.
The pseudo-code of a generic particle filter is presented in Algorithm 7.6.1. In implementing this algorithm, the choice of the proposal distribution q(x_k | X_0^{k-1}, Y_0^k) is the most critical design issue. The optimal proposal distribution (which minimizes the variance of the importance weights) is given by [27, 30, 55]

\[ q(x_k \mid X_0^{k-1}, Y_0^k) = p(x_k \mid X_0^{k-1}, Y_0^k), \tag{7.106} \]

i.e., the true conditional state density given the previous state history and all observations. Sampling from this is, of course, impractical for arbitrary densities (recall the motivation for using importance sampling in the first place). Consequently, the transition prior is the most popular choice of proposal
Figure 7.19: Including the most current observation in the proposal distribution allows us to move the samples in the prior to regions of high likelihood. This is of paramount importance if the likelihood happens to lie in one of the tails of the prior distribution, or if it is too narrow (low measurement error).
(d) Output: The output of the algorithm is a set of samples that can be used to approximate the posterior distribution as follows:

\[ \hat p(x_k \mid Y_0^k) = \frac{1}{N} \sum_{i=1}^{N} \delta\big(x_k - x_k^{(i)}\big). \]

The optimal MMSE estimator is given as

\[ \hat x_k = E(x_k \mid Y_0^k) \approx \frac{1}{N} \sum_{i=1}^{N} x_k^{(i)}. \]

Similar expectations of the function g(x_k) can also be calculated as a sample average.
have a greater support overlap with the true posterior distribution than the overlap achieved by the EKF estimates. In addition, the scaling parameters used for sigma-point selection can be optimized to capture certain characteristics of the prior distribution, if known; i.e., the algorithm can be modified to work with distributions that have heavier tails than Gaussian distributions, such as Cauchy or Student-t distributions. The new filter that results from using a UKF for proposal-distribution generation within a particle-filter framework is called the unscented particle filter (UPF). Referring to Algorithm 7.6.1 for the generic particle filter, the first item in the importance sampling step becomes:
\[ \mathcal{X}_{k|k-1}^{(i)x} = F\big( \mathcal{X}_{k-1}^{(i)x}, u_k, \mathcal{X}_{k-1}^{(i)v} \big), \qquad \bar x_{k|k-1}^{(i)} = \sum_{j=0}^{2L} W_j^{(m)} \mathcal{X}_{j,k|k-1}^{(i)x} \tag{7.113} \]
\[ P_{k|k-1}^{(i)} = \sum_{j=0}^{2L} W_j^{(c)} \big[ \mathcal{X}_{j,k|k-1}^{(i)x} - \bar x_{k|k-1}^{(i)} \big] \big[ \mathcal{X}_{j,k|k-1}^{(i)x} - \bar x_{k|k-1}^{(i)} \big]^T \tag{7.114} \]
\[ \mathcal{Y}_{k|k-1}^{(i)} = H\big( \mathcal{X}_{k|k-1}^{(i)x}, \mathcal{X}_{k-1}^{(i)n} \big), \qquad \bar y_{k|k-1}^{(i)} = \sum_{j=0}^{2L} W_j^{(m)} \mathcal{Y}_{j,k|k-1}^{(i)} \tag{7.115} \]
\[ P_{\tilde y_k \tilde y_k} = \sum_{j=0}^{2L} W_j^{(c)} \big[ \mathcal{Y}_{j,k|k-1}^{(i)} - \bar y_{k|k-1}^{(i)} \big] \big[ \mathcal{Y}_{j,k|k-1}^{(i)} - \bar y_{k|k-1}^{(i)} \big]^T \tag{7.116} \]
\[ P_{x_k y_k} = \sum_{j=0}^{2L} W_j^{(c)} \big[ \mathcal{X}_{j,k|k-1}^{(i)} - \bar x_{k|k-1}^{(i)} \big] \big[ \mathcal{Y}_{j,k|k-1}^{(i)} - \bar y_{k|k-1}^{(i)} \big]^T \tag{7.117} \]
The performance of the unscented particle filter is compared on two estimation problems. The first problem is a synthetic scalar estimation problem, and the second is a real-world problem concerning the pricing of financial instruments.
Synthetic Experiment

For this experiment, a time series was generated by the following process model:

\[ x_{k+1} = 1 + \sin(\omega t) + \phi_1 x_k + v_k, \tag{7.120} \]
where v_k is a Gamma Ga(3, 2) random variable modeling the process noise, and ω = 0.04 and φ_1 = 0.5 are scalar parameters. A non-stationary observation model,

\[ y_k = \begin{cases} \phi_2 x_k^2 + n_k, & t \le 30 \\ \phi_3 x_k - 2 + n_k, & t > 30, \end{cases} \tag{7.121} \]

is used, with φ_2 = 0.2 and φ_3 = 0.5. The observation noise, n_k, is drawn from a Gaussian distribution N(0, 0.00001). Given only the noisy observations y_k, the different filters were used to estimate the underlying clean state sequence x_k for k = 1, …, 60. The experiment was repeated 100 times with random re-initialization for each run. All of the particle filters used 200 particles and residual resampling. The UKF parameters were set to α = 1, β = 0, and κ = 2; these parameters are optimal for the scalar case. Table 7.1 summarizes the performance of the different filters, showing the means and variances of the mean-square error (MSE) of the state estimates. Figure 7.20 compares the estimates generated from a single run of the different particle filters. The superior performance of the unscented particle filter (UPF) is clearly evident.
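For reference, the data-generating process of Equations 7.120 and 7.121 can be reproduced in a few lines; a sketch, where the shape/scale convention assumed for Ga(3, 2) is an interpretation of the text:

```python
import numpy as np

rng = np.random.default_rng(0)
omega, phi1, phi2, phi3 = 0.04, 0.5, 0.2, 0.5
x = np.zeros(61)
y = np.zeros(61)
for t in range(1, 61):
    v = rng.gamma(shape=3.0, scale=2.0)                  # v_k ~ Ga(3, 2)
    x[t] = 1 + np.sin(omega * t) + phi1 * x[t - 1] + v   # Eq. 7.120
    n = rng.normal(0.0, np.sqrt(1e-5))                   # n_k ~ N(0, 0.00001)
    y[t] = phi2 * x[t]**2 + n if t <= 30 else phi3 * x[t] - 2 + n  # Eq. 7.121
```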
Algorithm                                                     MSE (mean)   MSE (var)
Extended Kalman Filter (EKF)                                  0.374        0.015
Unscented Kalman Filter (UKF)                                 0.280        0.012
Particle Filter: generic                                      0.424        0.053
Particle Filter: MCMC move step                               0.417        0.055
Particle Filter: EKF proposal                                 0.310        0.016
Particle Filter: EKF proposal and MCMC move step              0.307        0.015
Particle Filter: UKF proposal ("Unscented Particle Filter")   0.070        0.006
Particle Filter: UKF proposal and MCMC move step              0.074        0.008

Table 7.1: State-estimation experiment results: the mean and variance of the MSE calculated over 100 independent runs.
Figure 7.20: Plot of estimates generated by the different filters on the synthetic state-estimation experiment: the true state x(t) together with the generic PF, PF-EKF, and PF-UKF estimates over time.
Figure 7.21: Probability smile for options on the FTSE-100 index (1994). Although the volatility smile indicates that the option with strike price equal to 3225 is under-priced, the shape of the probability gives us a warning against the hypothesis that the option is under-priced. Posterior mean estimates obtained with the Black-Scholes model and particle filter [*], 4th-order polynomial fit [-], and hypothesized volatility [o].
Option type   Algorithm                        Mean NSE
Call          Trivial                          0.078
              Extended Kalman Filter (EKF)     0.037
              Unscented Kalman Filter (UKF)    0.037
              Particle Filter: generic         0.037
              Particle Filter: EKF proposal    0.009
              Unscented Particle Filter        0.009
Put           Trivial                          0.035
              Extended Kalman Filter (EKF)     0.023
              Unscented Kalman Filter (UKF)    0.023
              Particle Filter: generic         0.023
              Particle Filter: EKF proposal    0.007
              Unscented Particle Filter        0.008

Table 7.2: One-step-ahead normalized square errors over 100 runs. The trivial prediction is obtained by assuming that the price on the following day corresponds to the current price.
[Figure: estimated interest rate (×10⁻³) and volatility versus time (days).]

7.7. CONCLUSIONS
The UKF addresses many of the approximation issues of the EKF, and consistently achieves an equal or better level of performance at a comparable level of complexity. The performance benefits of the UKF-based algorithms were demonstrated in a number of application domains, including state estimation, dual estimation, and parameter estimation.
There are a number of clear advantages to the UKF. First, the mean and covariance of the state estimate are calculated to second order or better, as opposed to first order in the EKF. This provides for a more accurate implementation of the optimal recursive estimation equations, which is the basis for both the EKF and UKF. While the equations specifying the UKF may appear more complicated than those of the EKF, the actual computational complexity is equivalent. For state estimation, both algorithms are in general of order L³ (where L is the dimension of the state). For parameter estimation, both algorithms are of order L² (where L is the number of parameters). An efficient recursive square-root implementation (see Appendix B) was necessary to achieve this level of complexity in the parameter-estimation case. Furthermore, a distinct advantage of the UKF is its ease of implementation. In contrast to the EKF, no analytical derivatives (Jacobians or Hessians) need to be calculated. The utility of this is especially valuable in situations where the system is a 'black box' model in which the internal dynamic equations are unavailable. In order to apply an EKF to such systems, derivatives must be found either from a principled analytical re-derivation of the system or through costly and often inaccurate numerical methods (e.g., by perturbation). The UKF, on the other hand, relies only on functional evaluations (inputs and outputs), through the use of deterministically drawn samples from the prior distribution of the state random variable. From a coding perspective, this also allows for a much more general and modular implementation.
Even though the UKF has clear advantages over the EKF, there are still a number of limitations. As in the EKF, it makes a Gaussian assumption on the probability density of the state random variable. Often this assumption is valid, and numerous real-world applications have been successfully implemented based on it. However, for certain problems (e.g., multi-modal object tracking), a Gaussian assumption will not suffice, and the UKF (or EKF) cannot be applied with confidence. In such examples, one has to resort to more powerful, but also more computationally expensive, filtering paradigms such as particle filters (see Section 7.6). Finally, another implementation limitation, leading to some uncertainty, is the necessity to choose the three unscented-transformation parameters (i.e., α, β, and κ). While we have attempted to provide some guidelines on how to choose these parameters, the optimal selection clearly depends on the specifics of the problem at hand and is not fully understood. In general, the choice of settings does not appear critical for state estimation, but has a greater effect on performance and convergence properties for parameter estimation. Our current work focuses on addressing this issue through the development of a unified and adaptive way of calculating the optimal value of these parameters. Other areas of open research include utilizing the UKF for estimation of noise covariances, extension of the UKF to recurrent architectures that may require dynamic derivatives (see Chapters 2 and 5), and the use of the UKF and smoother in the Expectation-Maximization algorithm (see Chapter 6). Clearly, we have only begun to scratch the surface of the numerous applications that can benefit from use of the UKF.
Acknowledgements
This work was sponsored in part by NSF under grants ECS-0083106 and IRI-9712346, and DARPA
under grant F33615-98-C-3516.
Appendix A

If we consider the prior variable x as being perturbed about a mean x̄ by a zero-mean disturbance δx with covariance P_x, then the Taylor series expansion of the nonlinear transformation f(x) about x̄ is

\[ f(x) = f(\bar x + \delta x) = \sum_{n=0}^{\infty} \left. \frac{(\delta x \cdot \nabla_x)^n f(x)}{n!} \right|_{x = \bar x}. \tag{7.124} \]

If we define the operator D_{δx}^n f as

\[ D_{\delta x}^n f \triangleq \left. (\delta x \cdot \nabla_x)^n f(x) \right|_{x = \bar x}, \tag{7.125} \]

then the Taylor series expansion of the nonlinear transformation y = f(x) can be written as

\[ y = f(x) = f(\bar x) + D_{\delta x} f + \frac{1}{2} D_{\delta x}^2 f + \frac{1}{3!} D_{\delta x}^3 f + \frac{1}{4!} D_{\delta x}^4 f + \cdots \tag{7.126} \]
The UT calculates the posterior mean from the propagated sigma points using Equation 7.33. The sigma points are given by

\[ \mathcal{X}_i = \bar x \pm \left( \sqrt{(L+\kappa)\, P_x} \right)_i = \bar x \pm \tilde\sigma_i, \]

where σ_i denotes the i-th column¹² of the matrix square root of P_x. This implies that \( \sum_{i=1}^{L} \sigma_i \sigma_i^T = P_x \). Given this formulation of the sigma points, we can again write the propagation of each point through the nonlinear function as a Taylor series expansion about x̄:

\[ \mathcal{Y}_i = f(\mathcal{X}_i) = f(\bar x) + D_{\tilde\sigma_i} f + \frac{1}{2} D_{\tilde\sigma_i}^2 f + \frac{1}{3!} D_{\tilde\sigma_i}^3 f + \frac{1}{4!} D_{\tilde\sigma_i}^4 f + \cdots \]

Using Equation 7.33, the UT predicted mean is

\[ \bar y_{UT} = \frac{\kappa}{L+\kappa}\, f(\bar x) + \frac{1}{2(L+\kappa)} \sum_{i=1}^{2L} \left[ f(\bar x) + D_{\tilde\sigma_i} f + \frac{1}{2} D_{\tilde\sigma_i}^2 f + \frac{1}{3!} D_{\tilde\sigma_i}^3 f + \frac{1}{4!} D_{\tilde\sigma_i}^4 f + \cdots \right] \]
\[ = f(\bar x) + \frac{1}{2(L+\kappa)} \sum_{i=1}^{2L} \left[ D_{\tilde\sigma_i} f + \frac{1}{2} D_{\tilde\sigma_i}^2 f + \frac{1}{3!} D_{\tilde\sigma_i}^3 f + \frac{1}{4!} D_{\tilde\sigma_i}^4 f + \cdots \right]. \]

Since the sigma points are symmetrically distributed around x̄, all the odd moments are zero. This results in the following simplification:

\[ \bar y_{UT} = f(\bar x) + \frac{1}{2(L+\kappa)} \sum_{i=1}^{2L} \left[ \frac{1}{2} D_{\tilde\sigma_i}^2 f + \frac{1}{4!} D_{\tilde\sigma_i}^4 f + \frac{1}{6!} D_{\tilde\sigma_i}^6 f + \cdots \right], \]

and since

\[ \frac{1}{2(L+\kappa)} \sum_{i=1}^{2L} \frac{1}{2} D_{\tilde\sigma_i}^2 f = \frac{1}{2(L+\kappa)} (\nabla f)^T \left[ \frac{1}{2} \sum_{i=1}^{2L} \sqrt{L+\kappa}\,\sigma_i\, \sqrt{L+\kappa}\,\sigma_i^T \right] (\nabla f) = \frac{1}{2} \left. \left( \nabla^T P_x \nabla \right) f(x) \right|_{x=\bar x}, \]

11 This includes probability distributions like Gaussian, Student-t, etc.
12 See Section 7.3 for details of exactly how the sigma points are calculated.
where A_x is the Jacobian matrix of f(x) evaluated at x̄. It can be shown (using a similar approach as for the posterior mean) that the posterior covariance calculated by the UT is given by

\[
\begin{aligned}
(P_y)_{UT} ={}& A_x P_x A_x^T - \frac{1}{4} \left[ \left(\nabla^T P_x \nabla\right) f(x) \right] \left[ \left(\nabla^T P_x \nabla\right) f(x) \right]^T \Big|_{x=\bar x} \\
&+ \frac{1}{2(L+\kappa)} \sum_{k=1}^{2L} \left. \sum_{i=1}^{\infty}\sum_{j=1}^{\infty} \frac{1}{i!\,j!}\, D_{\tilde\sigma_k}^{i} f \left( D_{\tilde\sigma_k}^{j} f \right)^T \right|_{i=j\neq 1} \\
&- \left. \sum_{i=1}^{\infty}\sum_{j=1}^{\infty} \frac{1}{(2i)!\,(2j)!\,4(L+\kappa)^2} \sum_{k=1}^{2L}\sum_{m=1}^{2L} D_{\tilde\sigma_k}^{2i} f \left( D_{\tilde\sigma_m}^{2j} f \right)^T \right|_{i=j\neq 1}.
\end{aligned}
\tag{7.134}
\]

Comparing Equations 7.133 and 7.134, it is clear that the UT again calculates the posterior covariance accurately to the first two terms, with errors only introduced in the fourth- and higher-order moments.
Julier shows in [22] how the absolute term-by-term errors of these higher-order moments are again consistently smaller for the UT than for the linearized case, which truncates the Taylor series after the first term, i.e.,

$$(P_y)_{LIN} = A_x P_x A_x^T. \qquad (7.135)$$

For this derivation, we assumed the value of the $\beta$ parameter in the UT to be 0. If prior knowledge about the shape of the prior distribution of $x$ is available, $\beta$ can be set to a non-zero value that minimizes the error in some of the higher-order ($\geq 4$) moments. Julier shows in [21] how the error in the kurtosis of the posterior distribution is minimized for a Gaussian $x$ when $\beta = 2$.
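The second-order accuracy claim is easy to see numerically. The sketch below (illustrative only; the scalar nonlinearity, prior moments, and the choice $\lambda = 2$ so that $L+\lambda = 3$ matches the Gaussian fourth moment, with $\alpha=1$, $\beta=0$, are all arbitrary choices) compares the linearized and UT posterior moments of $y = \sin(x)$ against a large Monte Carlo sample:

```python
import numpy as np

rng = np.random.default_rng(0)
xbar, sx = 1.0, 0.5
f, fprime = np.sin, np.cos               # nonlinearity and its derivative

# Monte Carlo reference for the true posterior moments
ys = f(rng.normal(xbar, sx, 1_000_000))
print("MC :", ys.mean(), ys.var())

# First-order linearization (Eqn. 7.135): mean f(xbar), covariance Ax Px Ax^T
print("LIN:", f(xbar), (fprime(xbar) * sx) ** 2)

# Unscented transform, L = 1, lambda = 2 (so L + lambda = 3)
L, lam = 1, 2.0
w = np.full(2 * L + 1, 1.0 / (2 * (L + lam)))
w[0] = lam / (L + lam)
X = xbar + np.sqrt(L + lam) * sx * np.array([0.0, 1.0, -1.0])
Y = f(X)
m = w @ Y
print("UT :", m, w @ (Y - m) ** 2)
```

Running this, the UT mean lands essentially on top of the Monte Carlo mean, while the linearized mean misses the second-order bias term entirely.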
Initialize with:

$$\hat{x}_0 = E[x_0], \qquad S_0 = \mathrm{chol}\left\{E\left[(x_0-\hat{x}_0)(x_0-\hat{x}_0)^T\right]\right\} \qquad (7.136)$$

For $k \in \{1, \ldots, \infty\}$:

Sigma-point calculation and time update:

$$\mathcal{X}_{k-1} = \left[\hat{x}_{k-1} \quad \hat{x}_{k-1}+\gamma S_{k-1} \quad \hat{x}_{k-1}-\gamma S_{k-1}\right] \qquad (7.137)$$
$$\mathcal{X}_{k|k-1} = F\left[\mathcal{X}_{k-1},\, u_{k-1}\right] \qquad (7.138)$$
$$\hat{x}_k^- = \sum_{i=0}^{2L} W_i^{(m)} \mathcal{X}_{i,k|k-1} \qquad (7.139)$$
$$S_k^- = \mathrm{qr}\left\{\left[\sqrt{W_1^{(c)}}\left(\mathcal{X}_{1:2L,k|k-1}-\hat{x}_k^-\right) \quad \sqrt{R^v}\right]\right\} \qquad (7.140)$$
$$S_k^- = \mathrm{cholupdate}\left\{S_k^-,\; \mathcal{X}_{0,k|k-1}-\hat{x}_k^-,\; W_0^{(c)}\right\} \qquad (7.141)$$

(redraw sigma points)¹⁴

$$\mathcal{X}_{k|k-1} = \left[\hat{x}_k^- \quad \hat{x}_k^-+\gamma S_k^- \quad \hat{x}_k^--\gamma S_k^-\right] \qquad (7.142)$$
$$\mathcal{Y}_{k|k-1} = H\left[\mathcal{X}_{k|k-1}\right] \qquad (7.143)$$
$$\hat{y}_k^- = \sum_{i=0}^{2L} W_i^{(m)} \mathcal{Y}_{i,k|k-1} \qquad (7.144)$$
¹⁴Here we have to redraw a new set of sigma points to incorporate the effect of the additive process noise. Alternatively, we could augment the already propagated set of sigma points, $\mathcal{X}_{k|k-1}$, with sigma points derived from the matrix square root of the process-noise covariance. This would also require the sigma-point weights to be recalculated, since the effective number of sigma points has doubled. A possible advantage of this approach is that we would not be discarding any odd-moment information captured by the propagated sigma points, as would be the case if we redraw them from $P_k^-$.
Square-Root State-Estimation

As in the original UKF, the filter is initialized by calculating the matrix square root of the state covariance once via a Cholesky factorization (Eqn. 7.136). However, the propagated and updated Cholesky factor is then used in subsequent iterations to directly form the sigma points. In Eqn. 7.140 the time update of the Cholesky factor, $S^-$, is calculated using a QR decomposition of the compound matrix containing the weighted propagated sigma points and the matrix square root of the additive process-noise covariance. The subsequent Cholesky update (or downdate) in Eqn. 7.141 is necessary since the zeroth weight, $W_0^{(c)}$, may be negative. These two steps replace the time update of $P^-$ in Eqn. 7.55, and are also $O(L^3)$.
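As an illustration of these two steps, the following sketch (a minimal interpretation, not the authors' code; it adopts the convention that $S$ is the upper-triangular QR factor with $S^T S = P$, and assumes the weights, propagated sigma points, predicted mean, and $\sqrt{R^v}$ are supplied by the surrounding filter) implements Eqns. 7.140 and 7.141. SciPy has no cholupdate, so the classical rank-1 recursion is written out; the helper is reused in the measurement-update sketch further below.

```python
import numpy as np

def cholupdate(S, u, sign=+1.0):
    # Rank-1 update (sign=+1) or downdate (sign=-1) of an upper-triangular
    # factor S with S^T S = P, returning S' with S'^T S' = P + sign * u u^T.
    S, u = S.copy(), np.asarray(u, dtype=float).copy()
    for k in range(u.size):
        r = np.sqrt(S[k, k] ** 2 + sign * u[k] ** 2)
        c, s = r / S[k, k], u[k] / S[k, k]
        S[k, k] = r
        S[k, k + 1:] = (S[k, k + 1:] + sign * s * u[k + 1:]) / c
        u[k + 1:] = c * u[k + 1:] - s * S[k, k + 1:]
    return S

def sr_time_update(Xprop, xpred, Wc, sqrtRv):
    # Eqn. 7.140: QR of the compound matrix of weighted, centered sigma
    # points 1..2L and the square root of the process-noise covariance.
    C = np.hstack([np.sqrt(Wc[1]) * (Xprop[:, 1:] - xpred[:, None]), sqrtRv])
    S = np.linalg.qr(C.T, mode='reduced')[1]      # S^T S = C C^T
    S *= np.sign(np.diag(S))[:, None]             # fix QR sign ambiguity
    # Eqn. 7.141: fold in the zeroth sigma point; W0(c) may be negative,
    # in which case this is a downdate.
    u = np.sqrt(abs(Wc[0])) * (Xprop[:, 0] - xpred)
    return cholupdate(S, u, sign=np.sign(Wc[0]))
```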
The same two-step approach is applied to the calculation of the Cholesky factor, $S_{\tilde{y}}$, of the observation-error covariance in Eqns. 7.145 and 7.146. This step is $O(LM^2)$, where $M$ is the observation dimension. In contrast to the way the Kalman gain is calculated in the standard UKF (see Eqn. 7.61), we now use two nested inverse (or least-squares) solutions to the following expansion of Eqn. 7.61: $K_k (S_{\tilde{y}_k} S_{\tilde{y}_k}^T) = P_{x_k y_k}$. Since $S_{\tilde{y}}$ is square and triangular, efficient "back-substitutions" can be used to solve for $K_k$ directly, without the need for a matrix inversion.
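A minimal sketch of the two nested triangular solves follows (again under the assumed convention that the QR factor satisfies $S_{\tilde{y}}^T S_{\tilde{y}} = P_{\tilde{y}}$; solve_triangular is SciPy's back-substitution routine):

```python
from scipy.linalg import solve_triangular

def gain_from_sqrt(Pxy, Sy):
    # Solve K (Sy^T Sy) = Pxy with two back-substitutions, no explicit inverse.
    Zt = solve_triangular(Sy, Pxy.T, trans='T', lower=False)  # Sy^T Z^T = Pxy^T
    Kt = solve_triangular(Sy, Zt, lower=False)                # Sy  K^T = Z^T
    return Kt.T
```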
Finally, the posterior measurement update of the Cholesky factor of the state covariance is calculated in Eqn. 7.150 by applying $M$ sequential Cholesky downdates to $S_k^-$. The downdate vectors are the columns of $U = K_k S_{\tilde{y}_k}$. This replaces the posterior update of $P_k$ in Eqn. 7.63, and is also $O(LM^2)$.
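The corresponding sketch of the downdate step reuses the cholupdate helper from the time-update sketch above (same conventions and caveats; note that under the $S^T S = P$ convention adopted there, the downdate vectors become the columns of $K S_{\tilde{y}}^T$):

```python
def sr_measurement_update(S, K, Sy):
    # M sequential rank-1 downdates of the state factor (Eqn. 7.150 analogue):
    # S^T S <- S^T S - (K Sy^T)(K Sy^T)^T, one column at a time.
    U = K @ Sy.T
    for j in range(U.shape[1]):
        S = cholupdate(S, U[:, j], sign=-1.0)
    return S
```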
Square-Root Parameter-Estimation

The parameter-estimation algorithm follows a framework similar to that of the state-estimation square-root UKF. However, an $O(ML^2)$ algorithm, as opposed to $O(L^3)$, is possible by taking advantage of the linear state-transition function. Specifically, the time update of the state covariance is given simply by $P_{w_k}^- = P_{w_{k-1}} + R_{k-1}^r$ (see Section 7.4 for a discussion on selecting $R_{k-1}^r$). In the square-root filters, $S_{w_k}$ may thus be updated directly in Eqn. 7.153 using one of two options: (1) $S_{w_k}^- = \lambda_{RLS}^{-1/2} S_{w_{k-1}}$, corresponding to an exponential weighting on past data; (2) $S_{w_k}^- = S_{w_{k-1}} + D_{r_{k-1}}$, where the diagonal matrix $D_{r_{k-1}}$ is chosen to approximate the effects of annealing a diagonal process-noise covariance $R_k^r$.¹⁵ Both options avoid the costly $O(L^3)$ QR and Cholesky-based updates necessary in the state-estimation filter.
¹⁵This update ensures that the main diagonal of $P_{w_k}$ is exact. However, additional off-diagonal cross-terms $S_{w_{k-1}} D_{r_{k-1}}^T + D_{r_{k-1}} S_{w_{k-1}}^T$ are also introduced (though the effect appears negligible).
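Each option of Eqn. 7.153 is a one-line update of the factor, as the sketch below shows (illustrative only; the forgetting factor and annealing amount are placeholder values, not recommendations):

```python
import numpy as np

def sw_time_update(Sw, option='rls', lam_rls=0.9995, dr=1e-4):
    if option == 'rls':
        # Option 1: exponential weighting of past data
        return Sw / np.sqrt(lam_rls)
    # Option 2: diagonal "annealing" term added directly to the factor
    return Sw + dr * np.eye(Sw.shape[0])
```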
Initialize with:

$$\hat{w}_0 = E[w], \qquad S_{w_0} = \mathrm{chol}\left\{E\left[(w-\hat{w}_0)(w-\hat{w}_0)^T\right]\right\} \qquad (7.151)$$

For $k \in \{1, \ldots, \infty\}$:

Time update and sigma-point calculation:

$$\hat{w}_k^- = \hat{w}_{k-1} \qquad (7.152)$$
$$S_{w_k}^- = \lambda_{RLS}^{-1/2} S_{w_{k-1}} \quad \text{or} \quad S_{w_k}^- = S_{w_{k-1}} + D_{r_{k-1}} \qquad (7.153)$$
$$\mathcal{W}_{k|k-1} = \left[\hat{w}_k^- \quad \hat{w}_k^-+\gamma S_{w_k}^- \quad \hat{w}_k^--\gamma S_{w_k}^-\right] \qquad (7.154)$$
$$\mathcal{D}_{k|k-1} = G\left[x_k,\, \mathcal{W}_{k|k-1}\right] \qquad (7.155)$$
$$\hat{d}_k = \sum_{i=0}^{2L} W_i^{(m)} \mathcal{D}_{i,k|k-1} \qquad (7.156)$$
Bibliography

[1] D. Avitzour. A stochastic simulation Bayesian approach to multitarget tracking. IEE Proceedings on Radar, Sonar and Navigation, 142(2):41-44, 1995.

[2] E. R. Beadle and P. M. Djuric. A fast weighted Bayesian bootstrap filter for nonlinear model state estimation. IEEE Transactions on Aerospace and Electronic Systems, 33(1):338-343, 1997.

[3] F. Black and M. Scholes. The pricing of options and corporate liabilities. Journal of Political Economy, 81:637-659, 1973.

[4] R. W. Brumbaugh. An Aircraft Model for the AIAA Controls Design Challenge. PRC Inc., Edwards, CA.

[5] J. R. Cloutier, C. N. D'Souza, and C. P. Mracek. Nonlinear regulation and nonlinear H-infinity control via the state-dependent Riccati equation technique: Part 1, Theory. In Proceedings of the International Conference on Nonlinear Problems in Aviation and Aerospace, Daytona Beach, FL, May 1996.

[6] J. F. G. de Freitas. Bayesian Methods for Neural Networks. PhD thesis, Cambridge University Engineering Department, 1999.

[7] J. F. G. de Freitas, M. Niranjan, A. H. Gee, and A. Doucet. Sequential Monte Carlo methods to train neural network models. Neural Computation, 12(4):955-993, 2000.

[8] A. Doucet. On sequential simulation-based methods for Bayesian filtering. Technical Report CUED/F-INFENG/TR 310, Department of Engineering, Cambridge University, 1998.

[9] A. Doucet, J. F. G. de Freitas, and N. J. Gordon. Introduction to sequential Monte Carlo methods. In A. Doucet, J. F. G. de Freitas, and N. J. Gordon, editors, Sequential Monte Carlo Methods in Practice. Springer-Verlag, 2000.

[10] J. P. Dutton. Development of a Nonlinear Simulation for the McDonnell Douglas F-15 Eagle with a Longitudinal TECS Control Law. Master's thesis, Dept. of Aeronautics and Astronautics, University of Washington, 1994.

[11] B. Efron. The Bootstrap, Jackknife and other Resampling Plans. SIAM, Philadelphia, 1982.

[12] Z. Ghahramani and S. T. Roweis. Learning nonlinear dynamical systems using an EM algorithm. In M. J. Kearns, S. A. Solla, and D. A. Cohn, editors, Advances in Neural Information Processing Systems 11: Proceedings of the 1998 Conference. MIT Press, 1999.

[13] N. J. Gordon, D. J. Salmond, and A. F. M. Smith. Novel approach to nonlinear/non-Gaussian Bayesian state estimation. IEE Proceedings-F, 140(2):107-113, April 1993.
[49] R. van der Merwe, J. F. G. de Freitas, A. Doucet, and E. A. Wan. The Unscented Particle Filter. Technical Report CUED/F-INFENG/TR 380, Cambridge University Engineering Department, Cambridge, England, August 2000.

[50] R. van der Merwe and E. A. Wan. Efficient derivative-free Kalman filters for online learning. In European Symposium on Artificial Neural Networks (ESANN), Bruges, Belgium, April 2001 (to appear).

[51] R. van der Merwe and E. A. Wan. The square-root unscented Kalman filter for state and parameter-estimation. In International Conference on Acoustics, Speech, and Signal Processing, Salt Lake City, Utah, May 2001 (to appear).

[52] E. A. Wan and A. T. Nelson. Neural dual extended Kalman filtering: applications in speech enhancement and monaural blind signal separation. In Proc. Neural Networks for Signal Processing Workshop. IEEE, 1997.

[53] E. A. Wan and R. van der Merwe. The Unscented Kalman Filter for Nonlinear Estimation. In Proceedings of Symposium 2000 on Adaptive Systems for Signal Processing, Communication and Control (AS-SPCC), Lake Louise, Alberta, Canada, October 2000. IEEE.

[54] E. A. Wan, R. van der Merwe, and A. T. Nelson. Dual Estimation and the Unscented Transformation. In S. A. Solla, T. K. Leen, and K.-R. Muller, editors, Advances in Neural Information Processing Systems 12, pages 666-672. MIT Press, 2000.

[55] V. S. Zaritskii, V. B. Svetnik, and L. I. Shimelevich. Monte-Carlo techniques in problems of optimal information processing. Automation and Remote Control, 36(3):2015-2022, 1975.