Abstract
In this book, the extended Kalman filter (EKF) has been used as the standard technique for performing recursive nonlinear estimation. The EKF algorithm, however, provides only an approximation to optimal nonlinear estimation. In this chapter, we point out the underlying assumptions and flaws in the EKF, and present an alternative filter with performance superior to that of the EKF. This algorithm, referred to as the unscented Kalman filter (UKF), was first proposed by Julier et al. [24, 22, 23], and further developed by Wan and van der Merwe [54, 53, 49, 50].
The basic difference between the EKF and UKF stems from the manner in which Gaussian random variables (GRVs) are represented for propagation through the system dynamics. In the EKF, the state distribution is approximated by a GRV, which is then propagated analytically through the first-order linearization of the nonlinear system. This can introduce large errors in the true posterior mean and covariance of the transformed GRV, which may lead to sub-optimal performance and sometimes divergence of the filter. The UKF addresses this problem by using a deterministic sampling approach. The state distribution is again approximated by a GRV, but is now represented using a minimal set of carefully chosen sample points. These sample points completely capture the true mean and covariance of the GRV, and, when propagated through the true nonlinear system, capture the posterior mean and covariance accurately to the 2nd order (Taylor series expansion) for any nonlinearity. The EKF, in contrast, only achieves first-order accuracy. No explicit Jacobian or Hessian calculations are necessary for the UKF. Remarkably, the computational complexity of the UKF is of the same order as that of the EKF.
Julier and Uhlmann demonstrated the substantial performance gains of the UKF in the context of state estimation for nonlinear control. A number of theoretical results were also derived. This chapter reviews this work, and presents extensions to a broader class of nonlinear estimation problems, including nonlinear system identification, training of neural networks, and dual estimation problems.
Additional material includes the development of an unscented Kalman smoother (UKS), specification of efficient recursive square-root implementations, and a novel use of the UKF to improve particle filters [49].
7.1. INTRODUCTION
Figure 7.1: Block diagram of the discrete-time nonlinear dynamic system: the input u_k and state x_k drive the process model F to produce x_{k+1}, while the observation model H maps x_k to the output y_k; v_k is the process noise and n_k the measurement noise.
The process noise driving the dynamic system is v_k, and the observation noise is given by n_k. Note that we are not assuming additivity of the noise sources. The system dynamic models F and H are assumed known. A simple block diagram of this system is shown in Figure 7.1. In state estimation, the EKF is the standard method of choice to achieve a recursive (approximate) maximum-likelihood estimate of the state x_k. For completeness we will review the EKF and its underlying assumptions in Section 7.2, to help motivate the presentation of the UKF for state estimation in Section 7.3.
Parameter Estimation
Parameter estimation, sometimes referred to as system identification or machine learning, involves determining a nonlinear mapping

\[ y_k = G(x_k, w), \tag{7.3} \]

where x_k is the input, y_k is the output, and the nonlinear map G(·) is parameterized by the vector w. The nonlinear map, for example, may be a feedforward or recurrent neural network (w are the weights), with numerous applications in regression, classification, and dynamic modeling. Learning corresponds to estimating the parameters w. Typically, a training set is provided with sample pairs consisting of known inputs and desired outputs, {x_k, d_k}. The error of the machine is defined as e_k = d_k − G(x_k, w), and the goal of learning involves solving for the parameters w in order to minimize the expectation of some given function of the error.
While a number of optimization approaches exist (e.g., gradient descent using backpropagation), the EKF may be used to estimate the parameters by writing a new state-space representation,

\[ w_{k+1} = w_k + r_k \tag{7.4} \]
\[ d_k = G(x_k, w_k) + e_k, \tag{7.5} \]

where the parameters w_k correspond to a stationary process with identity state-transition matrix, driven by process noise r_k (the choice of variance determines convergence and tracking performance, and will be discussed in further detail in Section 7.4). The output d_k corresponds to a nonlinear observation on w_k. The EKF can then be applied directly as an efficient "second-order" technique for learning the parameters. The use of the EKF for training neural networks has been developed by Singhal and Wu [47] and Puskorius and Feldkamp [41], and is covered in Chapter 2 of this text. The use of the UKF in this role is developed in Section 7.4.
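To make the state-space formulation of Equations 7.4 and 7.5 concrete, the following minimal Python sketch runs an EKF over the random-walk parameter model; the two-parameter toy model G, the noise settings, and the finite-difference Jacobian are illustrative assumptions, not the chapter's code.

```python
import numpy as np

def G(x, w):
    # toy nonlinear "network": scalar output, two parameters (illustrative)
    return w[0] * np.tanh(w[1] * x)

def jacobian(x, w, eps=1e-6):
    # numerical Jacobian of G with respect to the parameters w
    J = np.zeros((1, w.size))
    for i in range(w.size):
        dw = np.zeros_like(w); dw[i] = eps
        J[0, i] = (G(x, w + dw) - G(x, w - dw)) / (2 * eps)
    return J

rng = np.random.default_rng(0)
w_true = np.array([2.0, 0.7])
w_hat, P = np.zeros(2), np.eye(2)            # initial estimate and covariance
Rr, Re = 1e-4 * np.eye(2), 0.01              # process-noise cov. and error variance

for k in range(500):
    x = rng.uniform(-3, 3)
    d = G(x, w_true) + rng.normal(0, np.sqrt(Re))  # noisy desired output d_k
    P = P + Rr                                     # time update of Eq. 7.4
    H = jacobian(x, w_hat)                         # linearize observation, Eq. 7.5
    S = H @ P @ H.T + Re                           # innovation variance
    K = P @ H.T / S                                # Kalman gain
    w_hat = w_hat + (K * (d - G(x, w_hat))).ravel()
    P = P - K @ H @ P

print("estimated parameters:", w_hat)              # approaches w_true
```

The identity state transition shows up as the trivial time update P ← P + R_r; all of the nonlinearity enters through the observation G.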
Dual Estimation
A special case of machine learning arises when the input x_k is unobserved, requiring the coupling of both state estimation and parameter estimation. For these dual estimation problems, we again consider
In the next section we review optimal estimation to explain the basic assumptions and flaws with the EKF. This will motivate the use of the UKF as a method to amend these flaws. A detailed development of the UKF is given in Section 7.3. The remainder of the chapter will then be divided based on the application areas reviewed above. We conclude the chapter in Section 7.6 with the unscented particle filter, in which the UKF is used to improve sequential Monte Carlo-based filtering methods. Appendix A provides a derivation of the accuracy of the UKF. Appendix B details an efficient square-root implementation of the UKF.

7.2. OPTIMAL RECURSIVE ESTIMATION AND THE EKF
This recursion specifies the current state density as a function of the previous density and the most recent measurement data. The state-space model comes into play by specifying the state-transition probability p(x_k | x_{k-1}) and the measurement probability or likelihood, p(y_k | x_k).¹ Specifically, p(x_k | x_{k-1}) is determined by the innovations noise density p(v_k) with the state-update equation

\[ x_{k+1} = F(x_k, u_k, v_k). \tag{7.12} \]

For example, given an additive noise model with Gaussian density, p(v_k) = N(0, R_v), then p(x_k | x_{k-1}) = N(F(x_{k-1}, u_{k-1}), R_v). Similarly, p(y_k | x_k) is determined by the observation noise density p(n_k) and the measurement equation

\[ y_k = H(x_k, n_k). \tag{7.13} \]

1 Note that we do not write the implicit dependence on the observed input u_k, as it is not a random variable.
In principle, knowledge of these densities and the initial condition p(x_0 | y_0) = p(y_0 | x_0) p(x_0) / p(y_0) determines p(x_k | Y_0^k) for all k. Unfortunately, the multi-dimensional integration indicated by Equations 7.9, 7.10 and 7.11 makes a closed-form solution intractable for most systems. The only general approach is to apply Monte Carlo sampling techniques that essentially convert integrals to finite sums, which converge to the true solution in the limit. The particle filter discussed in the last section of this chapter is an example of such an approach.
If we make the basic assumption that all densities remain Gaussian, then the Bayesian recursion can be greatly simplified. In this case, only the conditional mean x̂_k = E[x_k | Y_0^k] and covariance P_{x_k} need to be evaluated. It is straightforward to show that this leads to the recursive estimation

\[ \hat x_k = (\text{prediction of } x_k) + K_k \left[ y_k - (\text{prediction of } y_k) \right] \tag{7.14} \]
\[ P_{x_k} = P_{x_k}^{-} - K_k P_{\tilde y_k} K_k^T. \tag{7.15} \]
While this is a linear recursion, we have not assumed linearity of the model. The optimal terms in this recursion are given by

\[ \hat x_k^- = E\left[ F(x_{k-1}, u_{k-1}, v_{k-1}) \right] \tag{7.16} \]
\[ K_k = P_{x_k y_k} P_{\tilde y_k \tilde y_k}^{-1} \tag{7.17} \]
\[ \hat y_k^- = E\left[ H(x_k^-, n_k) \right], \tag{7.18} \]

where the optimal prediction (i.e., prior mean) of x_k is written as x̂_k^-, and corresponds to the expectation of a nonlinear function of the random variables x_{k-1} and v_{k-1} (a similar interpretation holds for the optimal prediction ŷ_k^-). The optimal gain term K_k is expressed as a function of posterior covariance matrices (with ỹ_k = y_k − ŷ_k^-). Note that evaluation of the covariance terms also requires taking expectations of a nonlinear function of the prior state variable. P_{x_k}^- is the prediction of the covariance of x_k, and P_{\tilde y_k} is the covariance of ỹ_k.
The celebrated Kalman filter [25] calculates all terms in these equations exactly in the linear case, and can be viewed as an efficient method for analytically propagating a GRV through linear system dynamics. For nonlinear models, however, the EKF approximates the optimal terms as:

\[ \hat x_k^- \approx F(\hat x_{k-1}, u_{k-1}, \bar v) \tag{7.19} \]
\[ K_k \approx \hat P_{x_k y_k} \hat P_{\tilde y_k \tilde y_k}^{-1} \tag{7.20} \]
\[ \hat y_k^- \approx H(\hat x_k^-, \bar n), \tag{7.21} \]

where predictions are approximated simply as functions of the prior mean value (no expectation taken)². The covariances are determined by linearizing the dynamic equations (x_{k+1} ≈ A x_k + B_u u_k + B v_k, y_k ≈ C x_k + D n_k), and then determining the posterior covariance matrices analytically for the linear system. In other words, in the EKF the state distribution is approximated by a GRV, which is then propagated analytically through the "first-order" linearization of the nonlinear system. The explicit equations for the EKF are given in Table 7.2. As such, the EKF can be viewed as providing "first-order" approximations to the optimal terms³. These approximations, however, can introduce large errors in the true posterior mean and covariance of the transformed (Gaussian) random variable, which may lead to sub-optimal performance and sometimes divergence of the filter⁴. It is these "flaws" which will be addressed in the next section using the UKF.

2 The noise means are denoted by n̄ = E[n] and v̄ = E[v], and are usually assumed to equal zero.
Initialize with:

\[ \hat x_0 = E[x_0] \tag{7.22} \]
\[ P_{x_0} = E\left[(x_0 - \hat x_0)(x_0 - \hat x_0)^T\right] \tag{7.23} \]

For k ∈ {1, …, ∞}, the time-update equations of the extended Kalman filter are:

\[ \hat x_k^- = F(\hat x_{k-1}, u_{k-1}, \bar v) \tag{7.24} \]
\[ P_{x_k}^- = A_{k-1} P_{x_{k-1}} A_{k-1}^T + B_k R^v B_k^T \tag{7.25} \]
4 The iterated EKF recalculates the EKF equations at the current time step by redefining the nominal state estimate and relinearizing the measurement equations. It is capable of providing better performance than the basic EKF, especially in the case of significant nonlinearity in the measurement function [20]. We have not performed a comparison to the UKF at this time, though a similar procedure may also be adapted to iterate the UKF.
7.3. THE UNSCENTED KALMAN FILTER
These sigma points capture the posterior mean and covariance accurately to the 2nd order (Taylor series expansion) for any nonlinearity. To elaborate on this, we begin by explaining the unscented transformation.
Figure 7.2: Block diagram of the unscented transformation (UT). A set of sigma points {X_i} = { x̄, x̄ ± (√((L+λ) P_x))_i } is propagated through the nonlinear function f(·); the weighted sample mean and weighted sample covariance of the transformed points give the posterior mean ȳ and covariance P_y.
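As a concrete illustration of the UT just described, the following minimal Python sketch forms the 2L+1 sigma points, pushes them through a nonlinearity, and computes the weighted posterior statistics. The (α, β, κ) parameterization matches the weight definitions used in this chapter (Eqn. 7.34); the default values and the polar-to-Cartesian test function are illustrative assumptions.

```python
import numpy as np

def unscented_transform(f, x_mean, P, alpha=1.0, beta=0.0, kappa=2.0):
    L = x_mean.size
    lam = alpha**2 * (L + kappa) - L
    S = np.linalg.cholesky((L + lam) * P)         # scaled matrix square root

    # 2L+1 sigma points: the mean, plus/minus the square-root columns
    sigmas = np.vstack([x_mean, x_mean + S.T, x_mean - S.T])

    Wm = np.full(2 * L + 1, 0.5 / (L + lam))      # mean weights
    Wc = Wm.copy()                                # covariance weights
    Wm[0] = lam / (L + lam)
    Wc[0] = lam / (L + lam) + (1 - alpha**2 + beta)

    Y = np.array([f(s) for s in sigmas])          # propagate through nonlinearity
    y_mean = Wm @ Y
    dY = Y - y_mean
    P_y = (Wc[:, None] * dY).T @ dY               # weighted outer products
    return y_mean, P_y

# usage: push a Gaussian through a polar-to-Cartesian map (illustrative)
f = lambda s: np.array([s[0] * np.cos(s[1]), s[0] * np.sin(s[1])])
m, P_y = unscented_transform(f, np.array([1.0, 0.3]), np.diag([0.01, 0.05]))
```

Note that only function evaluations of f are required; no Jacobian is formed at any point.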
Valuable insight into the UT can also be gained by relating it to a numerical technique for evaluating integrals called Gaussian quadrature. Ito and Xiong [19] recently showed the relation between the UT and the Gauss-Hermite quadrature rule⁵ in the context of state estimation. A close similarity also exists between the UT and the central difference interpolation based filtering (CDF) techniques developed separately by Ito and Xiong [19] and Nørgaard, Poulsen and Ravn [39]. In [50], van der Merwe and Wan show how the UKF and CDF can be unified in a general family of derivative-free Kalman filters for nonlinear estimation.
A simple example is shown in Figure 7.3 for a 2-dimensional system: the left plot shows the true mean and covariance propagation using Monte Carlo sampling; the center plots show the results using a linearization approach, as would be done in the EKF; the right plots show the performance of the UT (note only 5 sigma points are required). The superior performance of the UT is clear.
Implementation Variations
For the special (but often found) case where the process and measurement noise are purely additive, the computational complexity of the UKF can be reduced. In such a case, the system state need not be augmented with the noise RVs. This reduces the dimension of the sigma points as well as the total number of sigma points used. The covariances of the noise sources are then incorporated into the state covariance using a simple additive procedure. This implementation is given in Table 7.3.2.

5 In the scalar case, the Gauss-Hermite rule is given by \( \int_{-\infty}^{\infty} f(x)\, \frac{1}{\sqrt{2\pi}} e^{-x^2/2}\, dx = \sum_{i=1}^{m} w_i f(x_i) \), where the equality holds for all polynomials f(·) of degree up to 2m − 1, and the quadrature points x_i and weights w_i are determined according to the rule type (see [19] for details). For higher dimensions the Gauss-Hermite rule requires on the order of m^L functional evaluations, where L is the dimension of the state. For the scalar case, the UT with α = 1, β = 0, and κ = 2 coincides with the three-point Gauss-Hermite quadrature rule.

Figure 7.3: Example of the UT for mean and covariance propagation: (a) actual (Monte Carlo sampling of the true mean and covariance), (b) first-order linearization (EKF, with P_y = A^T P_x A), (c) UT (transformed sigma points with weighted sample mean and covariance).
The complexity of the algorithm is of order L³, where L is the dimension of the state. This is the same complexity as the EKF. The most costly operation is forming the sample prior covariance matrix P_k^-. Depending on the form of F, this may be simplified; e.g., for univariate time series or with parameter estimation (see Section 7.4), the complexity reduces to order L².
A number of variations for numerical purposes are also possible. For example, the matrix square root, which can be implemented directly using a Cholesky factorization, is in general of order L³/6. However, the covariance matrices are expressed recursively, and thus the square root can be computed in only order ML² (M is the dimension of the output y_k) by performing a recursive update to the Cholesky factorization. Details of an efficient recursive square-root UKF implementation are given in Appendix B.
Initialize with:

\[ P_0^a = E\big[(x_0^a - \hat x_0^a)(x_0^a - \hat x_0^a)^T\big] = \begin{bmatrix} P_0 & 0 & 0 \\ 0 & R^v & 0 \\ 0 & 0 & R^n \end{bmatrix} \tag{7.38} \]

For k ∈ {1, …, ∞}, calculate the sigma points:

\[ \mathcal{X}_{k-1}^a = \big[ \hat x_{k-1}^a \quad \hat x_{k-1}^a + \gamma\sqrt{P_{k-1}^a} \quad \hat x_{k-1}^a - \gamma\sqrt{P_{k-1}^a} \big] \tag{7.39} \]

Time update:

\[ \mathcal{X}_{k|k-1}^x = F\big[ \mathcal{X}_{k-1}^x, u_{k-1}, \mathcal{X}_{k-1}^v \big] \tag{7.40} \]
\[ \hat x_k^- = \sum_{i=0}^{2L} W_i^{(m)} \mathcal{X}_{i,k|k-1}^x \tag{7.41} \]
\[ P_k^- = \sum_{i=0}^{2L} W_i^{(c)} \big[ \mathcal{X}_{i,k|k-1}^x - \hat x_k^- \big]\big[ \mathcal{X}_{i,k|k-1}^x - \hat x_k^- \big]^T \tag{7.42} \]
\[ \mathcal{Y}_{k|k-1} = H\big[ \mathcal{X}_{k|k-1}^x, \mathcal{X}_{k-1}^n \big] \tag{7.43} \]
\[ \hat y_k^- = \sum_{i=0}^{2L} W_i^{(m)} \mathcal{Y}_{i,k|k-1} \tag{7.44} \]
Initialize with:

\[ \hat x_0 = E[x_0] \tag{7.50} \]
\[ P_0 = E\big[(x_0 - \hat x_0)(x_0 - \hat x_0)^T\big] \tag{7.51} \]

For k ∈ {1, …, ∞}, calculate the sigma points:

\[ \mathcal{X}_{k-1} = \big[ \hat x_{k-1} \quad \hat x_{k-1} + \gamma\sqrt{P_{k-1}} \quad \hat x_{k-1} - \gamma\sqrt{P_{k-1}} \big] \tag{7.52} \]

Time update:

\[ \mathcal{X}_{k|k-1} = F\big[ \mathcal{X}_{k-1}, u_{k-1} \big] \tag{7.53} \]
\[ \hat x_k^- = \sum_{i=0}^{2L} W_i^{(m)} \mathcal{X}_{i,k|k-1} \tag{7.54} \]
\[ P_k^- = \sum_{i=0}^{2L} W_i^{(c)} \big[ \mathcal{X}_{i,k|k-1} - \hat x_k^- \big]\big[ \mathcal{X}_{i,k|k-1} - \hat x_k^- \big]^T + R^v \tag{7.55} \]

(redraw sigma points)⁶

\[ \mathcal{X}_{k|k-1} = \big[ \hat x_k^- \quad \hat x_k^- + \gamma\sqrt{P_k^-} \quad \hat x_k^- - \gamma\sqrt{P_k^-} \big] \tag{7.56} \]
\[ \mathcal{Y}_{k|k-1} = H\big[ \mathcal{X}_{k|k-1} \big] \tag{7.57} \]
\[ \hat y_k^- = \sum_{i=0}^{2L} W_i^{(m)} \mathcal{Y}_{i,k|k-1} \tag{7.58} \]

6 Here we have to redraw a new set of sigma points to incorporate the effect of the additive process noise. Alternatively, we could augment the already propagated set of sigma points, X_{k|k-1}, with sigma points derived from the matrix square root of the process-noise covariance. This would also require the sigma-point weights to be recalculated, as the effective number of sigma points has doubled. A possible advantage of this approach would be that we are not discarding any odd-moment information captured by the propagated sigma points, as would be the case if we redrew them from P_k^-.
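A minimal Python sketch of this additive-noise time update (Eqs. 7.52-7.58) follows; the weight vectors Wm and Wc are assumed to be computed as in the UT sketch earlier, and F, H are placeholder model functions.

```python
import numpy as np

def sigma_points(x, P, gamma):
    S = np.linalg.cholesky(P)
    return np.vstack([x, x + gamma * S.T, x - gamma * S.T])   # Eq. 7.52

def ukf_time_update(x, P, u, F, H, Rv, Wm, Wc, gamma):
    X = sigma_points(x, P, gamma)
    X_prop = np.array([F(s, u) for s in X])                   # Eq. 7.53
    x_pred = Wm @ X_prop                                      # Eq. 7.54
    dX = X_prop - x_pred
    P_pred = (Wc[:, None] * dX).T @ dX + Rv                   # Eq. 7.55
    X_new = sigma_points(x_pred, P_pred, gamma)               # Eq. 7.56 (redraw)
    Y = np.array([H(s) for s in X_new])                       # Eq. 7.57
    y_pred = Wm @ Y                                           # Eq. 7.58
    return x_pred, P_pred, X_new, Y, y_pred
```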
[Figure: double inverted pendulum on a cart, with cart mass M, control force u, link lengths l_1 and l_2, masses m_1 and m_2, and angles θ_1 and θ_2.]
The UKF was originally designed for state estimation applied to nonlinear control applications requiring full-state feedback [24, 22, 23]. We provide an example for a double inverted pendulum control system. In addition, we provide a new application example corresponding to noisy time-series estimation with neural networks.
The dynamics of the double inverted pendulum are given by

\[ (m_1 + 2m_2)\, l_1 \ddot{x} \cos\theta_1 + 4\!\left(\tfrac{m_1}{3} + m_2\right) l_1^2\, \ddot\theta_1 + 2 m_2 l_1 l_2\, \ddot\theta_2 \cos(\theta_2 - \theta_1) \tag{7.66} \]
\[ \quad = (m_1 + 2m_2)\, g\, l_1 \sin\theta_1 + 2 m_2 l_1 l_2\, \dot\theta_2^2 \sin(\theta_2 - \theta_1), \tag{7.67} \]
\[ m_2 \ddot{x}\, l_2 \cos\theta_2 + 2 m_2 l_1 l_2\, \ddot\theta_1 \cos(\theta_2 - \theta_1) + \tfrac{4}{3} m_2 l_2^2\, \ddot\theta_2 \tag{7.68} \]
\[ \quad = m_2 g l_2 \sin\theta_2 - 2 m_2 l_1 l_2\, \dot\theta_1^2 \sin(\theta_2 - \theta_1). \tag{7.69} \]
These continuous-time dynamics are discretized with a sampling period of 0.02 seconds. The pendulum is stabilized by applying a control force, u, to the cart. In this case we use a state-dependent Riccati equation (SDRE) controller to stabilize the system⁷. A state estimator is run outside the control loop in order to compare the EKF to the UKF (i.e., the estimated states are not fed back for control; they are used for evaluation purposes only). The observation corresponds to noisy measurements of the cart position, cart velocity, and angle of the top pendulum. This is a challenging problem, as no measurements are made of the bottom pendulum angle, nor of the angular velocity of the top pendulum. For this experiment, the pendulum is initialized in a jack-knife position (+25/−24 degrees) with a cart offset of 0.5 meters. The resulting state estimates are shown in Figure 7.5. Clearly the UKF is better able to track the unobserved states⁸. If the estimated states are used for feedback in the control loop, the UKF system is still able to stabilize the pendulum, while the EKF system crashes. We will return to the double inverted pendulum problem later in this chapter for both model estimation and dual estimation.

7 An SDRE controller [5] is designed by formulating the dynamic equations as x_{k+1} = A(x_k) x_k + B(x_k) u_k. Note, this representation is not a linearization, but rather a reformulation of the nonlinear dynamics into a pseudo-linear form. Based on this state-space representation, we design an optimal LQR controller, u_k = −R^{-1} B^T(x_k) P(x_k) x_k ≡ −K(x_k) x_k, where P(x_k) is a solution of the standard Riccati equations using the state-dependent matrices A(x_k) and B(x_k). The procedure is repeated at every time step at the current state x_k and provides local asymptotic stability of the plant [5]. The approach has been found to be far more robust than LQR controllers based on standard linearization techniques, as well as many alternative "advanced" nonlinear control approaches.

8 Note that if all six states are observed with noise, then the performance of the EKF and UKF are comparable.
Figure 7.5: State estimation for the double inverted pendulum problem. Only three noisy states are observed: cart position, cart velocity, and the angle of the top pendulum [10 dB SNR; α = 1, β = 0, κ = 0]. The six panels plot the true state, noisy observations, EKF estimates, and UKF estimates for the cart position and velocity, the two pendulum angles, and the two pendulum angular velocities.
Figure 7.6: Estimation of the Mackey-Glass time series with the EKF and UKF using a known model. The top panels show a segment (k = 200-300) of the clean series, the noisy series, and the UKF estimates; the bottom panel compares the normalized MSE estimation errors of the EKF and UKF over the complete sequence.
\[ y_k = \begin{bmatrix} 1 & 0 & \cdots & 0 \end{bmatrix} x_k + n_k \tag{7.71} \]

In the estimation problem, the noisy time series y_k is the only observed input to either the EKF or UKF algorithms (both utilize the known neural network model). Figure 7.6 shows a sub-segment of the estimates generated by both the EKF and the UKF (the original noisy time series has a 3 dB SNR). The superior performance of the UKF is clearly visible.
As has been discussed, the Kalman filter is a recursive algorithm providing the conditional expectation of the state x_k given all observations Y_0^k up to the current time k. In contrast, the Kalman smoother estimates the state given all observations past and future, Y_0^N, where N is the final time. Kalman smoothers are commonly used for applications such as trajectory planning, noncausal noise reduction, and the E-step in the EM algorithm [46, 12]. A thorough treatment of the Kalman smoother in the linear case is given in [29]. The basic idea is to run a Kalman filter forward in time to estimate the mean and covariance (x̂_k^f, P_k^f) of the state given past data. A second Kalman filter is then run backward in time to produce a backward-time predicted mean and covariance (x̂_k^b, P_k^b) given the future data. These two estimates are then combined, producing the following smoothed statistics given all the data:

\[ (P_k^s)^{-1} = (P_k^f)^{-1} + (P_k^b)^{-1} \tag{7.72} \]
\[ \hat x_k^s = P_k^s \left[ (P_k^b)^{-1} \hat x_k^b + (P_k^f)^{-1} \hat x_k^f \right]. \tag{7.73} \]
For the nonlinear case, the EKF replaces the Kalman filter. The use of the EKF for the forward filter is straightforward. However, implementation of the backward filter is achieved by using the following linearized backward-time system:

\[ x_{k-1} = A^{-1} x_k - A^{-1} B v_k; \tag{7.74} \]

i.e., the forward nonlinear dynamics are linearized, and then inverted for the backward model. A linear Kalman filter is then applied.
Our proposed unscented Kalman smoother (UKS) replaces the EKF with the UKF. In addition, we consider using a nonlinear backward model as well, either derived from first principles or by training a backward predictor using a neural network model, as illustrated for the time-series case in Figure 7.7. The nonlinear backward model allows us to take full advantage of the UKF, which requires no linearization step.
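The fusion of Equations 7.72 and 7.73 is itself only a few lines; a minimal sketch (assuming the forward and backward statistics come from the two filter passes described above):

```python
import numpy as np

def smooth(xf, Pf, xb, Pb):
    # information-form combination of forward and backward estimates
    Pf_inv = np.linalg.inv(Pf)
    Pb_inv = np.linalg.inv(Pb)
    Ps = np.linalg.inv(Pf_inv + Pb_inv)        # Eq. 7.72
    xs = Ps @ (Pb_inv @ xb + Pf_inv @ xf)      # Eq. 7.73
    return xs, Ps
```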
To illustrate performance, we reconsider the noisy Mackey-Glass time-series problem of the previous section, as well as a second time series generated using a chaotic autoregressive neural network. The table below compares smoother performance. In this case, the network models are trained on the clean time series, and then tested on the noisy data using either the standard extended Kalman smoother with linearized backward model (EKS1), an extended Kalman smoother with a second nonlinear backward model (EKS2), or the unscented Kalman smoother (UKS). The forward (F), backward (B), and smoothed (S) estimation errors are reported. Again, the performance benefits of the unscented approach are clear.
[Figure 7.7: backward prediction of a time series with a neural network.]

7.4. UKF PARAMETER ESTIMATION
Thus if the "noise" covariance R_e is a constant diagonal matrix, then, in fact, it cancels out of the algorithm (this can be shown explicitly), and hence it can be set arbitrarily (e.g., R_e = 0.5 I). Alternatively, R_e can be set to specify a weighted MSE cost. The innovations covariance E[r_k r_k^T] = R_{r_k}, on the other hand, affects the convergence rate and tracking performance. Roughly speaking, the larger the covariance, the more quickly older data is discarded. There are several options on how to choose R_{r_k} (a short code sketch of the options follows the list):
- Set R_{r_k} to an arbitrary "fixed" diagonal value, which may then be "annealed" towards zero as training continues.

- Set R_{r_k} = (λ_RLS^{-1} − 1) P_{w_k}, where λ_RLS ∈ (0, 1] is often referred to as the "forgetting factor," as defined in the recursive least squares (RLS) algorithm [14]. This provides for an approximate exponentially decaying weighting on past data, and is described more fully in [37]. Note that λ_RLS should not be confused with the λ used for sigma-point calculation.

- Set R_{r_k} = (1 − α_RM) R_{r_{k-1}} + α_RM K_k^w [d_k − G(x_k, ŵ)][d_k − G(x_k, ŵ)]^T (K_k^w)^T, which is a Robbins-Monro stochastic approximation scheme for estimating the innovations [32]. The method assumes that the covariance of the Kalman update model should be consistent with the actual update model. Typically, R_{r_k} is also constrained to be a diagonal matrix, which implies an independence assumption on the parameters. Note that a similar update may also be used for R_{e_k}.
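As a compact summary, the three options can be sketched in code as follows (the names anneal_factor, lam_rls, and alpha_rm are illustrative assumptions, not the chapter's notation; K is the Kalman gain, P_w the parameter covariance, and err = d_k − G(x_k, ŵ)):

```python
import numpy as np

def rr_fixed_annealed(Rr, anneal_factor=0.995):
    return anneal_factor * Rr                      # "fixed" value annealed to zero

def rr_forgetting(P_w, lam_rls=0.9995):
    return (1.0 / lam_rls - 1.0) * P_w             # RLS forgetting-factor form

def rr_robbins_monro(Rr_prev, K, err, alpha_rm=0.05):
    outer = K @ np.outer(err, err) @ K.T           # K (d - G)(d - G)^T K^T
    Rr = (1 - alpha_rm) * Rr_prev + alpha_rm * outer
    return np.diag(np.diag(Rr))                    # diagonal constraint
```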
Our experience indicates that the "Robbins-Monro" method provides the fastest rate of absolute convergence and the lowest final MMSE values (see the experiments in the next section). The "fixed" R_{r_k} in combination with annealing can also achieve good final MMSE performance, but requires more monitoring and greater prior knowledge of the noise levels. For problems where the MMSE is zero, the covariance should be lower-bounded to prevent the algorithm from stalling and potential numerical problems. The "forgetting-factor" and "fixed" R_{r_k} methods are most appropriate for on-line learning problems in which tracking of time-varying parameters is necessary. In this case, the parameter covariance stays lower-bounded, allowing the most recent data to be emphasized. This leads to some misadjustment, but also keeps the Kalman gain sufficiently large to maintain good tracking. In general, the various trade-offs between these different approaches are still an area of open research.
The UKF represents an alternative to the EKF for parameter estimation. However, as the state-transition function is linear, the advantage of the UKF may not be as obvious. Note that the observation function is still nonlinear. Furthermore, the EKF essentially builds up an approximation to the expected Hessian by taking outer products of the gradient. The UKF, however, may provide a more accurate estimate through direct approximation of the expectation of the Hessian. While both the EKF and UKF can be expected to achieve similar final MMSE performance, their convergence properties may differ. In addition, a distinct advantage of the UKF occurs when either the architecture or the error metric is such that differentiation with respect to the parameters is not easily derived, as is necessary in the EKF. The UKF effectively evaluates both the Jacobian and Hessian precisely through its sigma-point propagation, without the need to perform any analytic differentiation.
Specific equations for UKF parameter estimation are given in Table 7.4.1. Simplifications have been made relative to the state UKF, accounting for the specific form of the state-transition function. In Table 7.4.1 we have provided two options for how the function output d̂_k is achieved. In the first option, the output is given as

\[ \hat d_k = \sum_{i=0}^{2L} W_i^{(m)} \mathcal{D}_{i,k|k-1} \approx E\big[ G(x_k, w_k) \big], \tag{7.92} \]
corresponding to the direct interpretation of the UKF equations. The output is the expected value (mean) of a function of the random variable w_k. In the second option, we have

\[ \hat d_k = G(x_k, \hat w_k^-), \tag{7.93} \]

corresponding to the typical interpretation, in which the output is the function evaluated with the current "best" set of parameters. This option yields convergence performance that is indistinguishable from the EKF. The first option, however, has different convergence characteristics, and requires further explanation. In the state-space approach to parameter estimation, absolute convergence is achieved when the parameter covariance P_{w_k} goes to zero (this also forces the Kalman gain to zero).
Initialize with:

\[ \hat w_0 = E[w] \tag{7.79} \]
\[ P_{w_0} = E\big[(w - \hat w_0)(w - \hat w_0)^T\big] \tag{7.80} \]

For k ∈ {1, …, ∞},

Time update and sigma-point calculation:

\[ \hat w_k^- = \hat w_{k-1} \tag{7.81} \]
\[ P_{w_k}^- = P_{w_{k-1}} + R_{r_{k-1}} \tag{7.82} \]
\[ \mathcal{W}_{k|k-1} = \big[ \hat w_k^- \quad \hat w_k^- + \gamma\sqrt{P_{w_k}^-} \quad \hat w_k^- - \gamma\sqrt{P_{w_k}^-} \big] \tag{7.83} \]
\[ \mathcal{D}_{k|k-1} = G\big[ x_k, \mathcal{W}_{k|k-1} \big] \tag{7.84} \]

option 1: \[ \hat d_k = \sum_{i=0}^{2L} W_i^{(m)} \mathcal{D}_{i,k|k-1} \tag{7.85} \]
option 2: \[ \hat d_k = G(x_k, \hat w_k^-) \tag{7.86} \]

Measurement update equations:

\[ P_{\tilde d_k \tilde d_k} = \sum_{i=0}^{2L} W_i^{(c)} \big[ \mathcal{D}_{i,k|k-1} - \hat d_k \big]\big[ \mathcal{D}_{i,k|k-1} - \hat d_k \big]^T + R_{e_k} \tag{7.87} \]
\[ P_{w_k d_k} = \sum_{i=0}^{2L} W_i^{(c)} \big[ \mathcal{W}_{i,k|k-1} - \hat w_k^- \big]\big[ \mathcal{D}_{i,k|k-1} - \hat d_k \big]^T \tag{7.88} \]
\[ K_k = P_{w_k d_k} P_{\tilde d_k \tilde d_k}^{-1} \tag{7.89} \]
\[ \hat w_k = \hat w_k^- + K_k (d_k - \hat d_k) \tag{7.90} \]
\[ P_{w_k} = P_{w_k}^- - K_k P_{\tilde d_k \tilde d_k} K_k^T \tag{7.91} \]

where γ = √(L + λ); λ is the composite scaling parameter, L is the dimension of the state, R_r is the process-noise covariance, R_e is the measurement-noise covariance, and W_i are the weights as calculated in Eqn. 7.34.

Table 7.4.1: UKF - Parameter Estimation
At this point, the output for either option is identical. However, prior to this, the finite covariance provides a form of averaging on the output of the function, which in turn prevents the parameters from going to the minimum of the error surface. Thus the method may help avoid falling into a local minimum. Furthermore, it provides a form of built-in regularization for short or noisy data sets that are prone to overfitting (exact specification of the level of regularization requires further study).
Note that the complexity of the UKF algorithm is still of order L³ (L is the number of parameters), due to the need to compute a matrix square root at each time step. An order L² complexity (the same as the EKF) can be achieved by using a recursive square-root formulation, as given in Appendix B.
Figure 7.8: (top) MacKay Robot-Arm problem: comparison of learning curves (training-set MSE versus epochs) for EKF and UKF training; 2-12-2 MLP; 'annealing' noise estimation. (bottom) Ikeda chaotic time series: comparison of learning curves for EKF and UKF training; 10-7-1 MLP; 'Robbins-Monro' noise estimation.
We have performed a number of experiments to illustrate the performance of the UKF parameter-estimation approach. The first set of experiments corresponds to benchmark problems for neural network training, and serves to illustrate some of the differences between the EKF and UKF, as well as the different options discussed above. Two parametric optimization problems are also included, corresponding to model estimation of the double pendulum and the benchmark "Rosenbrock's Banana" optimization problem.
The Mackey-Glass and Ikeda time series are used. The plots show only comparisons for the UKF (the differences are similar for the EKF). In general, the Robbins-Monro method is the most robust approach, with the fastest rate of convergence. In some examples we have seen faster convergence with the "annealed" approach; however, this also requires additional insight and heuristic methods to monitor the learning. We should reiterate that the "fixed" and "lambda" approaches are more appropriate for on-line tracking problems.
Figure 7.9: Neural network parameter estimation using different methods for noise estimation ('fixed', 'lambda', 'anneal', 'Robbins-Monro'; training-set MSE versus epochs). (top) Ikeda chaotic time series. (bottom) Mackey-Glass chaotic time series. [UKF: α = 1e−4, β = 2, κ = 3 − n; n = state dimension.]
Figure 7.10: Singhal and Wu's four-region classification problem: averaged RMSE learning curves (top), and the classification regions produced by the EKF-trained and UKF-trained networks (bottom). UKF settings: [α = 1e−4, β = 2, κ = 3 − n; n = state dimension]; 2-10-10-4 MLP; Robbins-Monro; 1 epoch = 100 random examples.
Figure 7.11: Inverted double pendulum parameter estimation: model MSE versus iteration for the EKF and UKF. UKF settings: [α = 1e−4, β = 2, κ = 3 − n; n = state dimension]; Robbins-Monro.
where the target "observation" is fixed at zero, and e_k is an error term, resulting in the optimization of the sum of instantaneous costs J_k = e_k^T e_k. The MSE cost is optimized by setting e_k = d_k − G(x_k, w_k). However, arbitrary costs (e.g., cross-entropy) can also be minimized, simply by specifying e_k appropriately. Further discussion of this approach was given in Chapter 5 of this book. Reformulation of the UKF equations requires changing only the effective output to be e_k, and setting the desired response to zero.
For the example at hand, we set e_k = [10(x_2 − x_1²)  (1 − x_1)]^T. Furthermore, since this optimization problem is a special case of 'noiseless' parameter estimation, where the actual error can be minimized to zero, we make use of Equation 7.93 (option 2) to calculate the output of the UKF algorithm. This allows the UKF to reach the true minimum of the error surface more rapidly⁹. We also set the scaling parameter α to a small value, which we have found to be appropriate for zero-MSE problems. Under these circumstances the performance of the UKF and EKF is indistinguishable, as illustrated in Figure 7.12. Overall, the performance of the two filters is comparable or superior to a number of alternative optimization approaches (e.g., Davidon-Fletcher-Powell, Levenberg-Marquardt, etc.; see "optdemo" in MATLAB). The main purpose of this example was to illustrate the versatility of the UKF for general optimization problems.
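For reference, the error vector used here maps directly onto the classic Rosenbrock objective; a tiny sketch (the function names are illustrative):

```python
import numpy as np

def banana_error(w):
    # e = [10 (x2 - x1^2), 1 - x1], so e^T e = 100 (x2 - x1^2)^2 + (1 - x1)^2,
    # the Rosenbrock "banana" function with its minimum at (1, 1)
    x1, x2 = w
    return np.array([10.0 * (x2 - x1**2), 1.0 - x1])

def banana_cost(w):
    e = banana_error(w)
    return float(e @ e)
```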
Figure 7.12: Rosenbrock's 'Banana' optimization problem: function value f(X) and model error (MSE) versus k for the EKF and UKF. UKF settings: [α = 1e−4, β = 2, κ = 3 − n; n = state dimension]; fixed R_r.
9 Note that the use of option 1, where the expected value of the function is used as the output, essentially involves averaging of the output based on the current parameter covariance. This slows convergence in the case where zero MSE is possible, since convergence of the state covariance to zero would also be necessary, through proper annealing of the state noise innovations R_r.
7.5. UKF DUAL ESTIMATION
In the dual extended Kalman filter [52], a separate state-space representation is used for the signal and the weights. Two EKFs are run simultaneously for signal and weight estimation. At every time step, the current estimate of the weights is used in the signal filter, and the current estimate of the signal state is used in the weight filter. In the dual UKF algorithm, both state and weight estimation are done with the UKF.

In the joint extended Kalman filter [36], the signal-state and weight vectors are concatenated into a single, joint state vector, [x_k^T w_k^T]^T. Estimation is done recursively by writing the state-space equations for the joint state as

\[ \begin{bmatrix} x_{k+1} \\ w_{k+1} \end{bmatrix} = \begin{bmatrix} F(x_k, u_k, w_k) \\ I\, w_k \end{bmatrix} + \begin{bmatrix} B\, v_k \\ r_k \end{bmatrix} \tag{7.97} \]
\[ y_k = \begin{bmatrix} 1 & 0 & \cdots & 0 \end{bmatrix} \begin{bmatrix} x_k \\ w_k \end{bmatrix} + n_k, \tag{7.98} \]

and running an EKF on the joint state space to produce simultaneous estimates of the states x_k and w. Again, our approach is to use the UKF instead of the EKF.
Noisy Time-Series
We present results on two time series to provide a clear illustration of the use of the UKF over the EKF. The first series is again the Mackey-Glass-30 chaotic series with additive noise (SNR ≈ 3 dB). The second time series (also chaotic) comes from an autoregressive neural network with random weights, driven by Gaussian process noise and also corrupted by additive white Gaussian noise (SNR ≈ 3 dB). A standard 6-10-1 MLP with tanh hidden activation functions and a linear output layer was used for all the filters in the Mackey-Glass problem. A 5-3-1 MLP was used for the second problem. The process- and measurement-noise variances associated with the state were assumed to be known. Note that in contrast to the state-estimation example in the previous section, only the noisy time series is observed. A clean reference is never provided for training.
Example training curves for the different dual and joint Kalman-based estimation methods are shown in Figure 7.13. A final estimate for the Mackey-Glass series is also shown for the dual UKF. The superior performance of the UKF-based algorithms is clear.
Figure 7.13: Comparative learning curves and results for the dual estimation experiments: normalized MSE versus epoch for the chaotic AR neural network (top) and the Mackey-Glass chaotic time series (bottom), for the dual EKF, dual UKF, joint EKF, and joint UKF. Curves are averaged over 10 and 3 runs, respectively, using different initial weights. "Fixed" innovation covariances are used in the joint algorithms; "annealed" covariances are used for the weight filter in the dual algorithms.
Mode Estimation
This example illustrates the use of the joint UKF for estimating the modes of a mass-and-spring system (see Figure 7.14). This work was performed at the University of Washington by Mark Campbell and Shelby Brunke. While the system is linear, direct estimation of the natural frequencies, ω_1 and ω_2, jointly with the states is a nonlinear estimation problem. Figure 7.15 compares the performance of the EKF and the UKF. Note that the EKF does not converge to the true value for ω_2. For this experiment, the input process-noise SNR is approximately 100 dB, and the measured positions y_1 and y_2 have additive noise at a 60 dB SNR (these settings effectively turn the task into a pure parameter-estimation problem). A fixed innovations R_r was used for the parameter estimation in the joint algorithms. Sampling was done at the Nyquist rate (based on ω_2), which emphasizes the effect of linearization in the EKF. For faster sampling rates the performance of the EKF and UKF become more similar.
[Figure 7.14: two-mass, two-spring system driven by unit-variance white-noise inputs u_1 = N(0,1) and u_2 = N(0,1), with measured positions y_1 and y_2; the linear state-space dynamics of the positions and velocities (x_1, ẋ_1, x_2, ẋ_2) depend on the natural frequencies ω_1 and ω_2.]
[Figure 7.15: joint estimation of the natural frequencies for the mass-and-spring system: EKF and UKF estimates of ω_1 (rad/s) and ω_2 (rad/s) versus time (s), together with the actual values. The EKF fails to converge to the true ω_2.]
Figure 7.16: F-15 model joint estimation: velocity, altitude, and lift estimation versus time, with and without failure (actual and UKF estimates). Note that the estimated and true values of the state are indistinguishable at this resolution.
Figure 7.17: Double inverted pendulum joint estimation: estimated states and parameters. Only angle 1 and angle 2 are plotted (in radians); the parameter estimates (cart mass, pendulum 1 length and mass, pendulum 2 length and mass) are shown converging to the true model parameters over time.
7.6. THE UNSCENTED PARTICLE FILTER
where the random samples {x_k^{(i)}; i = 1, …, N} are drawn from p(x_k | Y_0^k) and δ(·) denotes the Dirac delta function. The posterior filtering density, p(x_k | Y_0^k), is a marginal of the full posterior density given by p(X_0^k | Y_0^k). Consequently, any expectations of the form

\[ E[g(x_k)] = \int g(x_k)\, p(x_k \mid Y_0^k)\, dx_k \tag{7.99} \]

can be approximated by the sample average

\[ E[g(x_k)] \approx \frac{1}{N} \sum_{i=1}^{N} g\big(x_k^{(i)}\big). \tag{7.100} \]

For example, letting g(x) = x yields the optimal MMSE estimate x̂_k = E[x_k | Y_0^k]. The particles x_k^{(i)} are assumed to be independent and identically distributed (i.i.d.) for the approximation to hold. As N goes to infinity, the estimate converges to the true expectation almost surely. Sampling from the filtering posterior is only a special case of Monte Carlo simulation, which in general deals with the complete posterior density, p(X_0^k | Y_0^k). We will use this more general form to derive the particle filter algorithm.
It is often impossible to sample directly from the posterior density function. However, we can circumvent this difficulty by making use of importance sampling and alternatively sampling from a known proposal distribution q(X_0^k | Y_0^k). The exact form of this distribution is a critical design issue, and it is usually chosen in order to facilitate easy sampling. The details of this choice are discussed later. Given this proposal distribution, we can make use of the following substitution:
\[ E[g_k(X_0^k)] = \int g_k(X_0^k)\, \frac{p(X_0^k \mid Y_0^k)}{q(X_0^k \mid Y_0^k)}\, q(X_0^k \mid Y_0^k)\, dX_0^k \]
\[ = \int g_k(X_0^k)\, \frac{p(Y_0^k \mid X_0^k)\, p(X_0^k)}{p(Y_0^k)\, q(X_0^k \mid Y_0^k)}\, q(X_0^k \mid Y_0^k)\, dX_0^k \]
\[ = \int g_k(X_0^k)\, \frac{w_k(X_0^k)}{p(Y_0^k)}\, q(X_0^k \mid Y_0^k)\, dX_0^k, \]

where the variables w_k(X_0^k) are known as the unnormalized importance weights,

\[ w_k = \frac{p(Y_0^k \mid X_0^k)\, p(X_0^k)}{q(X_0^k \mid Y_0^k)}. \tag{7.101} \]
We can get rid of the unknown normalizing density p(Y_0^k) as follows:

\[ E[g_k(X_0^k)] = \frac{1}{p(Y_0^k)} \int g_k(X_0^k)\, w_k(X_0^k)\, q(X_0^k \mid Y_0^k)\, dX_0^k \]
\[ = \frac{\int g_k(X_0^k)\, w_k(X_0^k)\, q(X_0^k \mid Y_0^k)\, dX_0^k}{\int p(Y_0^k \mid X_0^k)\, p(X_0^k)\, \frac{q(X_0^k \mid Y_0^k)}{q(X_0^k \mid Y_0^k)}\, dX_0^k} \]
\[ = \frac{\int g_k(X_0^k)\, w_k(X_0^k)\, q(X_0^k \mid Y_0^k)\, dX_0^k}{\int w_k(X_0^k)\, q(X_0^k \mid Y_0^k)\, dX_0^k} \]
\[ = \frac{E_{q(\cdot \mid Y_0^k)}\!\big[ w_k(X_0^k)\, g_k(X_0^k) \big]}{E_{q(\cdot \mid Y_0^k)}\!\big[ w_k(X_0^k) \big]}, \]

where the notation E_{q(·|Y_0^k)} has been used to emphasize that the expectations are taken over the proposal distribution q(·|Y_0^k).
A sequential update to the importance weights is achieved by expanding the proposal distribution as q(X_0^k | Y_0^k) = q(X_0^{k-1} | Y_0^{k-1}) q(x_k | X_0^{k-1}, Y_0^k), where we are making the assumption that the current state is not dependent on future observations. Furthermore, under our assumption that the states correspond to a Markov process and that the observations are conditionally independent given the states, we can arrive at the recursive update

\[ w_k = w_{k-1}\, \frac{p(y_k \mid x_k)\, p(x_k \mid x_{k-1})}{q(x_k \mid X_0^{k-1}, Y_0^k)}. \tag{7.102} \]
Equation 7.102 provides a mechanism to sequentially update the importance weights, given an appropriate choice of proposal distribution, q(x_k | X_0^{k-1}, Y_0^k). Since we can sample from the proposal distribution and evaluate the likelihood p(y_k | x_k) and transition probabilities p(x_k | x_{k-1}), all we need to do is generate a prior set of samples and iteratively compute the importance weights. This procedure then allows us to evaluate the expectations of interest by the following estimate:

\[ E[g(X_0^k)] \approx \frac{ \frac{1}{N} \sum_{i=1}^{N} g\big(x_{0:k}^{(i)}\big)\, w_k\big(x_{0:k}^{(i)}\big) }{ \frac{1}{N} \sum_{i=1}^{N} w_k\big(x_{0:k}^{(i)}\big) } = \sum_{i=1}^{N} g\big(x_{0:k}^{(i)}\big)\, \tilde w_k\big(x_{0:k}^{(i)}\big), \tag{7.103} \]

where the normalized importance weights are given by \( \tilde w_k^{(i)} = w_k^{(i)} / \sum_{j=1}^{N} w_k^{(j)} \), and x_{0:k}^{(i)} denotes the i-th sample trajectory drawn from the proposal distribution q(x_k | X_0^{k-1}, Y_0^k). This estimate asymptotically
converges if the expectation and variance of g(X_0^k) and w_k exist and are bounded, and if the support of the proposal distribution includes the support of the posterior distribution. Thus, as N tends to infinity, the posterior density function can be approximated arbitrarily well by the point-mass estimate

\[ \hat p(X_0^k \mid Y_0^k) = \sum_{i=1}^{N} \tilde w_k^{(i)}\, \delta\big(X_0^k - x_{0:k}^{(i)}\big). \tag{7.104} \]
In the case of filtering, we do not need to keep the whole history of the sample trajectories: only the current set of samples at time k is needed to calculate expectations of the form given in Equations 7.99 and 7.100. To do this, we simply set g(X_0^k) = g(x_k). These point-mass estimates can approximate any general distribution arbitrarily well, limited only by the number of particles used and how well the above-mentioned importance-sampling conditions are met. In contrast, the posterior distribution calculated by the EKF is a minimum-variance Gaussian approximation to the true distribution, which inherently cannot capture complex structure like multi-modality, skewness, or other higher-order moments.
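Putting the pieces together, one step of a generic SIR particle filter built on the weight recursion of Equation 7.102, with the transition prior as proposal, might look as follows. This is a sketch: f_sample and loglik are placeholder hooks for the model densities, and the resample-when-degenerate rule is a common heuristic rather than this chapter's specific algorithm.

```python
import numpy as np

def pf_step(particles, weights, y, f_sample, loglik, rng):
    N = particles.shape[0]
    particles = f_sample(particles, rng)              # x_k ~ p(x_k | x_{k-1})
    weights = weights * np.exp(loglik(y, particles))  # w_k ∝ w_{k-1} p(y_k | x_k)
    weights = weights / weights.sum()                 # normalize, Eq. 7.103
    n_eff = 1.0 / np.sum(weights**2)                  # effective sample size
    if n_eff < N / 2:                                 # resample when degenerate
        idx = rng.choice(N, size=N, p=weights)
        particles, weights = particles[idx], np.full(N, 1.0 / N)
    return particles, weights
```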
Figure 7.18: Resampling process, whereby a random measure {x_k^{(i)}, w̃_k^{(i)}} is mapped into an equally weighted random measure {x_k^{(j)}, N^{-1}} by sampling the cdf of the weights. The index i is drawn from a uniform distribution.
The number of children of each particle is then set to N_i = N_i^A + N_i^B. This procedure is computationally cheaper than pure SIR and also has lower sample variance. Thus, residual resampling is used for all experiments in Section 7.6.2 (in general, we have found that the specific choice of resampling scheme does not significantly affect the performance of the particle filter).
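A sketch of residual resampling consistent with this description (the split into a deterministic part N_i^A = ⌊N w̃_i⌋ plus a stochastic part N_i^B drawn from the residual weights is the standard construction; anything beyond N_i = N_i^A + N_i^B above is an assumption here):

```python
import numpy as np

def residual_resample(weights, rng):
    N = weights.size
    n_copies = np.floor(N * weights).astype(int)      # deterministic part N_i^A
    residual = N * weights - n_copies                 # leftover weight mass
    n_left = N - n_copies.sum()
    if n_left > 0:
        residual = residual / residual.sum()
        n_copies += rng.multinomial(n_left, residual) # stochastic part N_i^B
    return np.repeat(np.arange(N), n_copies)          # indices of the children
```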
After the selection/resampling step at time k, we obtain N particles distributed marginally approximately according to the posterior distribution. Since the selection step favors the creation of multiple copies of the "fittest" particles, many particles may end up having no children (N_i = 0), whereas others might end up having a large number of children, the extreme case being N_i = N for a particular value i. In this case, there is a severe depletion of samples. Therefore, an additional procedure is often required to introduce sample variety after the selection step without affecting the validity of the approximation they infer. This is achieved by performing a single MCMC step on each particle. The basic idea is that if the particles are already distributed according to the posterior p(x_k | Y_0^k) (which is the case), then applying a Markov chain transition kernel with the same invariant distribution to each particle results in a set of new particles distributed according to the posterior of interest. However, the new particles may move to more interesting areas of the state space. Details on the MCMC step are given in [49]. For our experiments in Section 7.6.2, we found the need for an MCMC step to be unnecessary. However, this cannot be assumed in general.
The pseudo-code of a generic particle filter is presented in Algorithm 7.6.1. In implementing this algorithm, the choice of the proposal distribution q(x_k | X_0^{k-1}, Y_0^k) is the most critical design issue. The optimal proposal distribution (which minimizes the variance of the importance weights) is given by [27, 30, 55]

\[ q(x_k \mid X_0^{k-1}, Y_0^k) = p(x_k \mid X_0^{k-1}, Y_0^k), \tag{7.106} \]

i.e., the true conditional state density given the previous state history and all observations. Sampling from this is, of course, impractical for arbitrary densities (recall the motivation for using importance sampling in the first place). Consequently, the transition prior is the most popular choice of proposal
Figure 7.19: Including the most current observation in the proposal distribution allows us to move the samples in the prior to regions of high likelihood. This is of paramount importance if the likelihood happens to lie in one of the tails of the prior distribution, or if it is too narrow (low measurement error).
(d) Output: The output of the algorithm is a set of samples that can be used to approximate the posterior distribution as follows:

\[ \hat p(x_k \mid Y_0^k) = \frac{1}{N} \sum_{i=1}^{N} \delta\big(x_k - x_k^{(i)}\big). \]

The optimal MMSE estimator is given as

\[ \hat x_k = E(x_k \mid Y_0^k) \approx \frac{1}{N} \sum_{i=1}^{N} x_k^{(i)}. \]

Similar expectations of the function g(x_k) can also be calculated as a sample average.
have a greater support overlap with the true posterior distribution than the overlap achieved by the EKF estimates. In addition, the scaling parameters used for sigma-point selection can be optimized to capture certain characteristics of the prior distribution, if known; i.e., the algorithm can be modified to work with distributions that have heavier tails than Gaussian distributions, such as Cauchy or Student-t distributions. The new filter that results from using a UKF for proposal-distribution generation within a particle-filter framework is called the unscented particle filter (UPF). Referring to Algorithm 7.6.1 for the generic particle filter, the first item in the importance sampling step becomes:
\[ \mathcal{X}_{k|k-1}^{(i)x} = F\big( \mathcal{X}_{k-1}^{(i)x}, u_k, \mathcal{X}_{k-1}^{(i)v} \big), \qquad \bar x_{k|k-1}^{(i)} = \sum_{j=0}^{2L} W_j^{(m)} \mathcal{X}_{j,k|k-1}^{(i)x} \tag{7.113} \]
\[ P_{k|k-1}^{(i)} = \sum_{j=0}^{2L} W_j^{(c)} \big[ \mathcal{X}_{j,k|k-1}^{(i)x} - \bar x_{k|k-1}^{(i)} \big] \big[ \mathcal{X}_{j,k|k-1}^{(i)x} - \bar x_{k|k-1}^{(i)} \big]^T \tag{7.114} \]
\[ \mathcal{Y}_{k|k-1}^{(i)} = H\big( \mathcal{X}_{k|k-1}^{(i)x}, \mathcal{X}_{k-1}^{(i)n} \big), \qquad \bar y_{k|k-1}^{(i)} = \sum_{j=0}^{2L} W_j^{(m)} \mathcal{Y}_{j,k|k-1}^{(i)} \tag{7.115} \]
\[ P_{\tilde y_k \tilde y_k} = \sum_{j=0}^{2L} W_j^{(c)} \big[ \mathcal{Y}_{j,k|k-1}^{(i)} - \bar y_{k|k-1}^{(i)} \big] \big[ \mathcal{Y}_{j,k|k-1}^{(i)} - \bar y_{k|k-1}^{(i)} \big]^T \tag{7.116} \]
\[ P_{x_k y_k} = \sum_{j=0}^{2L} W_j^{(c)} \big[ \mathcal{X}_{j,k|k-1}^{(i)} - \bar x_{k|k-1}^{(i)} \big] \big[ \mathcal{Y}_{j,k|k-1}^{(i)} - \bar y_{k|k-1}^{(i)} \big]^T \tag{7.117} \]
The performance of the unscented particle filter is compared on two estimation problems. The first problem is a synthetic scalar estimation problem, and the second is a real-world problem concerning the pricing of financial instruments.
Synthetic Experiment

For this experiment, a time series was generated by the following process model:

\[ x_{k+1} = 1 + \sin(\omega t) + \phi_1 x_k + v_k, \tag{7.120} \]
where v_k is a Gamma Ga(3, 2) random variable modeling the process noise, and ω = 0.04 and φ_1 = 0.5 are scalar parameters. A non-stationary observation model,

\[ y_k = \begin{cases} \phi_2 x_k^2 + n_k, & t \le 30 \\ \phi_3 x_k - 2 + n_k, & t > 30, \end{cases} \tag{7.121} \]

is used, with φ_2 = 0.2 and φ_3 = 0.5. The observation noise, n_k, is drawn from a Gaussian distribution N(0, 0.00001). Given only the noisy observations y_k, the different filters were used to estimate the underlying clean state sequence x_k for k = 1, …, 60. The experiment was repeated 100 times with random re-initialization for each run. All of the particle filters used 200 particles and residual resampling. The UKF parameters were set to α = 1, β = 0, and κ = 2; these parameters are optimal for the scalar case. Table 7.1 summarizes the performance of the different filters, showing the means and variances of the mean-square error (MSE) of the state estimates. Figure 7.20 compares the estimates generated from a single run of the different particle filters. The superior performance of the unscented particle filter (UPF) is clearly evident.
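For reference, the data-generating process of Equations 7.120 and 7.121 can be reproduced in a few lines; a sketch, where the shape/scale convention assumed for Ga(3, 2) is an interpretation of the text:

```python
import numpy as np

rng = np.random.default_rng(0)
omega, phi1, phi2, phi3 = 0.04, 0.5, 0.2, 0.5
x = np.zeros(61)
y = np.zeros(61)
for t in range(1, 61):
    v = rng.gamma(shape=3.0, scale=2.0)                  # v_k ~ Ga(3, 2)
    x[t] = 1 + np.sin(omega * t) + phi1 * x[t - 1] + v   # Eq. 7.120
    n = rng.normal(0.0, np.sqrt(1e-5))                   # n_k ~ N(0, 0.00001)
    y[t] = phi2 * x[t]**2 + n if t <= 30 else phi3 * x[t] - 2 + n  # Eq. 7.121
```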
Algorithm                                                     MSE (mean)   MSE (var)
Extended Kalman Filter (EKF)                                  0.374        0.015
Unscented Kalman Filter (UKF)                                 0.280        0.012
Particle Filter: generic                                      0.424        0.053
Particle Filter: MCMC move step                               0.417        0.055
Particle Filter: EKF proposal                                 0.310        0.016
Particle Filter: EKF proposal and MCMC move step              0.307        0.015
Particle Filter: UKF proposal ("Unscented Particle Filter")   0.070        0.006
Particle Filter: UKF proposal and MCMC move step              0.074        0.008

Table 7.1: State-estimation experiment results: the mean and variance of the MSE calculated over 100 independent runs.
Figure 7.20: Plot of estimates generated by the different filters on the synthetic state-estimation experiment: the true state x(t) together with the generic PF, PF-EKF, and PF-UKF estimates over time.
Figure 7.21: Probability smile for options on the FTSE-100 index (1994). Although the volatility smile indicates that the option with strike price equal to 3225 is under-priced, the shape of the probability gives us a warning against the hypothesis that the option is under-priced. Posterior mean estimates obtained with the Black-Scholes model and particle filter [*], 4th-order polynomial fit [-], and hypothesized volatility [o].
Option type   Algorithm                        Mean NSE
Call          Trivial                          0.078
              Extended Kalman Filter (EKF)     0.037
              Unscented Kalman Filter (UKF)    0.037
              Particle Filter: generic         0.037
              Particle Filter: EKF proposal    0.009
              Unscented Particle Filter        0.009
Put           Trivial                          0.035
              Extended Kalman Filter (EKF)     0.023
              Unscented Kalman Filter (UKF)    0.023
              Particle Filter: generic         0.023
              Particle Filter: EKF proposal    0.007
              Unscented Particle Filter        0.008

Table 7.2: One-step-ahead normalized square errors over 100 runs. The trivial prediction is obtained by assuming that the price on the following day corresponds to the current price.
[Figure: estimated interest rate (×10⁻³) and volatility versus time (days).]

7.7. CONCLUSIONS
The UKF addresses many of the approximation issues of the EKF, and consistently achieves an equal or better level of performance at a comparable level of complexity. The performance benefits of the UKF-based algorithms were demonstrated in a number of application domains, including state estimation, dual estimation, and parameter estimation.
There are a number of clear advantages to the UKF. First, the mean and covariance of the state estimate are calculated to second order or better, as opposed to first order in the EKF. This provides for a more accurate implementation of the optimal recursive estimation equations, which is the basis for both the EKF and UKF. While the equations specifying the UKF may appear more complicated than those of the EKF, the actual computational complexity is equivalent. For state estimation, both algorithms are in general of order L³ (where L is the dimension of the state). For parameter estimation, both algorithms are of order L² (where L is the number of parameters). An efficient recursive square-root implementation (see Appendix B) was necessary to achieve this level of complexity in the parameter-estimation case. Furthermore, a distinct advantage of the UKF is its ease of implementation. In contrast to the EKF, no analytical derivatives (Jacobians or Hessians) need to be calculated. The utility of this is especially valuable in situations where the system is a 'black box' model in which the internal dynamic equations are unavailable. In order to apply an EKF to such systems, derivatives must be found either from a principled analytical re-derivation of the system or through costly and often inaccurate numerical methods (e.g., by perturbation). The UKF, on the other hand, relies only on functional evaluations (inputs and outputs), through the use of deterministically drawn samples from the prior distribution of the state random variable. From a coding perspective, this also allows for a much more general and modular implementation.
Even though the UKF has clear advantages over the EKF, there are still a number of limitations. As in the EKF, it makes a Gaussian assumption on the probability density of the state random variable. Often this assumption is valid, and numerous real-world applications have been successfully implemented based on it. However, for certain problems (e.g., multi-modal object tracking), a Gaussian assumption will not suffice, and the UKF (or EKF) cannot be applied with confidence. In such examples, one has to resort to more powerful, but also more computationally expensive, filtering paradigms such as particle filters (see Section 7.6). Finally, another implementation limitation, leading to some uncertainty, is the necessity to choose the three unscented-transformation parameters (i.e., α, β, and κ). While we have attempted to provide some guidelines on how to choose these parameters, the optimal selection clearly depends on the specifics of the problem at hand and is not fully understood. In general, the choice of settings does not appear critical for state estimation, but has a greater effect on performance and convergence properties for parameter estimation. Our current work focuses on addressing this issue through the development of a unified and adaptive way of calculating the optimal value of these parameters. Other areas of open research include utilizing the UKF for estimation of noise covariances, extension of the UKF to recurrent architectures that may require dynamic derivatives (see Chapters 2 and 5), and the use of the UKF and smoother in the Expectation-Maximization algorithm (see Chapter 6). Clearly, we have only begun to scratch the surface of the numerous applications that can benefit from use of the UKF.
Acknowledgements
This work was sponsored in part by NSF under grants ECS-0083106 and IRI-9712346, and DARPA
under grant F33615-98-C-3516.
Appendix A

If we consider the prior variable x as being perturbed about a mean x̄ by a zero-mean disturbance δx with covariance P_x, then the Taylor series expansion of the nonlinear transformation f(x) about x̄ is

\[ f(x) = f(\bar x + \delta x) = \sum_{n=0}^{\infty} \left. \frac{(\delta x \cdot \nabla_x)^n f(x)}{n!} \right|_{x = \bar x}. \tag{7.124} \]

If we define the operator D_{δx}^n f as

\[ D_{\delta x}^n f \triangleq \left. (\delta x \cdot \nabla_x)^n f(x) \right|_{x = \bar x}, \tag{7.125} \]

then the Taylor series expansion of the nonlinear transformation y = f(x) can be written as

\[ y = f(x) = f(\bar x) + D_{\delta x} f + \frac{1}{2} D_{\delta x}^2 f + \frac{1}{3!} D_{\delta x}^3 f + \frac{1}{4!} D_{\delta x}^4 f + \cdots \tag{7.126} \]
The UT calculates the posterior mean from the propagated sigma points using Equation 7.33. The sigma points are given by

\[ \mathcal{X}_i = \bar x \pm \left( \sqrt{(L+\kappa)\, P_x} \right)_i = \bar x \pm \tilde\sigma_i, \]

where σ_i denotes the i-th column¹² of the matrix square root of P_x. This implies that \( \sum_{i=1}^{L} \sigma_i \sigma_i^T = P_x \). Given this formulation of the sigma points, we can again write the propagation of each point through the nonlinear function as a Taylor series expansion about x̄:

\[ \mathcal{Y}_i = f(\mathcal{X}_i) = f(\bar x) + D_{\tilde\sigma_i} f + \frac{1}{2} D_{\tilde\sigma_i}^2 f + \frac{1}{3!} D_{\tilde\sigma_i}^3 f + \frac{1}{4!} D_{\tilde\sigma_i}^4 f + \cdots \]

Using Equation 7.33, the UT predicted mean is

\[ \bar y_{UT} = \frac{\kappa}{L+\kappa}\, f(\bar x) + \frac{1}{2(L+\kappa)} \sum_{i=1}^{2L} \left[ f(\bar x) + D_{\tilde\sigma_i} f + \frac{1}{2} D_{\tilde\sigma_i}^2 f + \frac{1}{3!} D_{\tilde\sigma_i}^3 f + \frac{1}{4!} D_{\tilde\sigma_i}^4 f + \cdots \right] \]
\[ = f(\bar x) + \frac{1}{2(L+\kappa)} \sum_{i=1}^{2L} \left[ D_{\tilde\sigma_i} f + \frac{1}{2} D_{\tilde\sigma_i}^2 f + \frac{1}{3!} D_{\tilde\sigma_i}^3 f + \frac{1}{4!} D_{\tilde\sigma_i}^4 f + \cdots \right]. \]

Since the sigma points are symmetrically distributed around x̄, all the odd moments are zero. This results in the following simplification:

\[ \bar y_{UT} = f(\bar x) + \frac{1}{2(L+\kappa)} \sum_{i=1}^{2L} \left[ \frac{1}{2} D_{\tilde\sigma_i}^2 f + \frac{1}{4!} D_{\tilde\sigma_i}^4 f + \frac{1}{6!} D_{\tilde\sigma_i}^6 f + \cdots \right], \]

and since

\[ \frac{1}{2(L+\kappa)} \sum_{i=1}^{2L} \frac{1}{2} D_{\tilde\sigma_i}^2 f = \frac{1}{2(L+\kappa)} (\nabla f)^T \left[ \frac{1}{2} \sum_{i=1}^{2L} \sqrt{L+\kappa}\,\sigma_i\, \sqrt{L+\kappa}\,\sigma_i^T \right] (\nabla f) = \frac{1}{2} \left. \left( \nabla^T P_x \nabla \right) f(x) \right|_{x=\bar x}, \]

11 This includes probability distributions like Gaussian, Student-t, etc.
12 See Section 7.3 for details of exactly how the sigma points are calculated.
where A_x is the Jacobian matrix of f(x) evaluated at x̄. It can be shown (using a similar approach as for the posterior mean) that the posterior covariance calculated by the UT is given by

\[
\begin{aligned}
(P_y)_{UT} ={}& A_x P_x A_x^T - \frac{1}{4} \left[ \left(\nabla^T P_x \nabla\right) f(x) \right] \left[ \left(\nabla^T P_x \nabla\right) f(x) \right]^T \Big|_{x=\bar x} \\
&+ \frac{1}{2(L+\kappa)} \sum_{k=1}^{2L} \left. \sum_{i=1}^{\infty}\sum_{j=1}^{\infty} \frac{1}{i!\,j!}\, D_{\tilde\sigma_k}^{i} f \left( D_{\tilde\sigma_k}^{j} f \right)^T \right|_{i=j\neq 1} \\
&- \left. \sum_{i=1}^{\infty}\sum_{j=1}^{\infty} \frac{1}{(2i)!\,(2j)!\,4(L+\kappa)^2} \sum_{k=1}^{2L}\sum_{m=1}^{2L} D_{\tilde\sigma_k}^{2i} f \left( D_{\tilde\sigma_m}^{2j} f \right)^T \right|_{i=j\neq 1}.
\end{aligned}
\tag{7.134}
\]

Comparing Equations 7.133 and 7.134, it is clear that the UT again calculates the posterior covariance accurately to the first two terms, with errors only introduced in the fourth- and higher-order moments.
Julier shows in [22] how the absolute term-by-term errors of these higher-order moments are again consistently smaller for the UT than for the linearized case, which truncates the Taylor series after the first term, i.e.,

$$(P_y)_{LIN} = A_x P_x A_x^T. \qquad (7.135)$$

For this derivation, we assumed the value of the $\beta$ parameter in the UT to be 0. If prior knowledge about the shape of the prior distribution of $x$ is available, $\beta$ can be set to a non-zero value that minimizes the error in some of the higher-order ($\geq 4$) moments. Julier shows in [21] how the error in the kurtosis of the posterior distribution is minimized for a Gaussian $x$ when $\beta = 2$.
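The second-order accuracy claim is easy to see numerically. The sketch below (illustrative only; the scalar nonlinearity, prior moments, and the choice $\lambda = 2$ so that $L+\lambda = 3$ matches the Gaussian fourth moment, with $\alpha=1$, $\beta=0$, are all arbitrary choices) compares the linearized and UT posterior moments of $y = \sin(x)$ against a large Monte Carlo sample:

```python
import numpy as np

rng = np.random.default_rng(0)
xbar, sx = 1.0, 0.5
f, fprime = np.sin, np.cos               # nonlinearity and its derivative

# Monte Carlo reference for the true posterior moments
ys = f(rng.normal(xbar, sx, 1_000_000))
print("MC :", ys.mean(), ys.var())

# First-order linearization (Eqn. 7.135): mean f(xbar), covariance Ax Px Ax^T
print("LIN:", f(xbar), (fprime(xbar) * sx) ** 2)

# Unscented transform, L = 1, lambda = 2 (so L + lambda = 3)
L, lam = 1, 2.0
w = np.full(2 * L + 1, 1.0 / (2 * (L + lam)))
w[0] = lam / (L + lam)
X = xbar + np.sqrt(L + lam) * sx * np.array([0.0, 1.0, -1.0])
Y = f(X)
m = w @ Y
print("UT :", m, w @ (Y - m) ** 2)
```

Running this, the UT mean lands essentially on top of the Monte Carlo mean, while the linearized mean misses the second-order bias term entirely.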
Initialize with:

$$\hat{x}_0 = E[x_0], \qquad S_0 = \mathrm{chol}\left\{E\left[(x_0-\hat{x}_0)(x_0-\hat{x}_0)^T\right]\right\} \qquad (7.136)$$

For $k \in \{1, \ldots, \infty\}$:

Sigma-point calculation and time update:

$$\mathcal{X}_{k-1} = \left[\hat{x}_{k-1} \quad \hat{x}_{k-1}+\gamma S_{k-1} \quad \hat{x}_{k-1}-\gamma S_{k-1}\right] \qquad (7.137)$$
$$\mathcal{X}_{k|k-1} = F\left[\mathcal{X}_{k-1},\, u_{k-1}\right] \qquad (7.138)$$
$$\hat{x}_k^- = \sum_{i=0}^{2L} W_i^{(m)} \mathcal{X}_{i,k|k-1} \qquad (7.139)$$
$$S_k^- = \mathrm{qr}\left\{\left[\sqrt{W_1^{(c)}}\left(\mathcal{X}_{1:2L,k|k-1}-\hat{x}_k^-\right) \quad \sqrt{R^v}\right]\right\} \qquad (7.140)$$
$$S_k^- = \mathrm{cholupdate}\left\{S_k^-,\; \mathcal{X}_{0,k|k-1}-\hat{x}_k^-,\; W_0^{(c)}\right\} \qquad (7.141)$$

(redraw sigma points)¹⁴

$$\mathcal{X}_{k|k-1} = \left[\hat{x}_k^- \quad \hat{x}_k^-+\gamma S_k^- \quad \hat{x}_k^--\gamma S_k^-\right] \qquad (7.142)$$
$$\mathcal{Y}_{k|k-1} = H\left[\mathcal{X}_{k|k-1}\right] \qquad (7.143)$$
$$\hat{y}_k^- = \sum_{i=0}^{2L} W_i^{(m)} \mathcal{Y}_{i,k|k-1} \qquad (7.144)$$
¹⁴Here we have to redraw a new set of sigma points to incorporate the effect of the additive process noise. Alternatively, we could augment the already propagated set of sigma points, $\mathcal{X}_{k|k-1}$, with sigma points derived from the matrix square root of the process-noise covariance. This would also require the sigma-point weights to be recalculated, since the effective number of sigma points has doubled. A possible advantage of this approach is that we would not be discarding any odd-moment information captured by the propagated sigma points, as would be the case if we redraw them from $P_k^-$.
Square-Root State-Estimation

As in the original UKF, the filter is initialized by calculating the matrix square root of the state covariance once via a Cholesky factorization (Eqn. 7.136). However, the propagated and updated Cholesky factor is then used in subsequent iterations to directly form the sigma points. In Eqn. 7.140 the time update of the Cholesky factor, $S^-$, is calculated using a QR decomposition of the compound matrix containing the weighted propagated sigma points and the matrix square root of the additive process-noise covariance. The subsequent Cholesky update (or downdate) in Eqn. 7.141 is necessary since the zeroth weight, $W_0^{(c)}$, may be negative. These two steps replace the time update of $P^-$ in Eqn. 7.55, and are also $O(L^3)$.
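As an illustration of these two steps, the following sketch (a minimal interpretation, not the authors' code; it adopts the convention that $S$ is the upper-triangular QR factor with $S^T S = P$, and assumes the weights, propagated sigma points, predicted mean, and $\sqrt{R^v}$ are supplied by the surrounding filter) implements Eqns. 7.140 and 7.141. SciPy has no cholupdate, so the classical rank-1 recursion is written out; the helper is reused in the measurement-update sketch further below.

```python
import numpy as np

def cholupdate(S, u, sign=+1.0):
    # Rank-1 update (sign=+1) or downdate (sign=-1) of an upper-triangular
    # factor S with S^T S = P, returning S' with S'^T S' = P + sign * u u^T.
    S, u = S.copy(), np.asarray(u, dtype=float).copy()
    for k in range(u.size):
        r = np.sqrt(S[k, k] ** 2 + sign * u[k] ** 2)
        c, s = r / S[k, k], u[k] / S[k, k]
        S[k, k] = r
        S[k, k + 1:] = (S[k, k + 1:] + sign * s * u[k + 1:]) / c
        u[k + 1:] = c * u[k + 1:] - s * S[k, k + 1:]
    return S

def sr_time_update(Xprop, xpred, Wc, sqrtRv):
    # Eqn. 7.140: QR of the compound matrix of weighted, centered sigma
    # points 1..2L and the square root of the process-noise covariance.
    C = np.hstack([np.sqrt(Wc[1]) * (Xprop[:, 1:] - xpred[:, None]), sqrtRv])
    S = np.linalg.qr(C.T, mode='reduced')[1]      # S^T S = C C^T
    S *= np.sign(np.diag(S))[:, None]             # fix QR sign ambiguity
    # Eqn. 7.141: fold in the zeroth sigma point; W0(c) may be negative,
    # in which case this is a downdate.
    u = np.sqrt(abs(Wc[0])) * (Xprop[:, 0] - xpred)
    return cholupdate(S, u, sign=np.sign(Wc[0]))
```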
The same two-step approach is applied to the calculation of the Cholesky factor, $S_{\tilde{y}}$, of the observation-error covariance in Eqns. 7.145 and 7.146. This step is $O(LM^2)$, where $M$ is the observation dimension. In contrast to the way the Kalman gain is calculated in the standard UKF (see Eqn. 7.61), we now use two nested inverse (or least-squares) solutions to the following expansion of Eqn. 7.61: $K_k (S_{\tilde{y}_k} S_{\tilde{y}_k}^T) = P_{x_k y_k}$. Since $S_{\tilde{y}}$ is square and triangular, efficient "back-substitutions" can be used to solve for $K_k$ directly, without the need for a matrix inversion.
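A minimal sketch of the two nested triangular solves follows (again under the assumed convention that the QR factor satisfies $S_{\tilde{y}}^T S_{\tilde{y}} = P_{\tilde{y}}$; solve_triangular is SciPy's back-substitution routine):

```python
from scipy.linalg import solve_triangular

def gain_from_sqrt(Pxy, Sy):
    # Solve K (Sy^T Sy) = Pxy with two back-substitutions, no explicit inverse.
    Zt = solve_triangular(Sy, Pxy.T, trans='T', lower=False)  # Sy^T Z^T = Pxy^T
    Kt = solve_triangular(Sy, Zt, lower=False)                # Sy  K^T = Z^T
    return Kt.T
```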
Finally, the posterior measurement update of the Cholesky factor of the state covariance is calculated in Eqn. 7.150 by applying $M$ sequential Cholesky downdates to $S_k^-$. The downdate vectors are the columns of $U = K_k S_{\tilde{y}_k}$. This replaces the posterior update of $P_k$ in Eqn. 7.63, and is also $O(LM^2)$.
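The corresponding sketch of the downdate step reuses the cholupdate helper from the time-update sketch above (same conventions and caveats; note that under the $S^T S = P$ convention adopted there, the downdate vectors become the columns of $K S_{\tilde{y}}^T$):

```python
def sr_measurement_update(S, K, Sy):
    # M sequential rank-1 downdates of the state factor (Eqn. 7.150 analogue):
    # S^T S <- S^T S - (K Sy^T)(K Sy^T)^T, one column at a time.
    U = K @ Sy.T
    for j in range(U.shape[1]):
        S = cholupdate(S, U[:, j], sign=-1.0)
    return S
```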
Square-Root Parameter-Estimation

The parameter-estimation algorithm follows a framework similar to that of the state-estimation square-root UKF. However, an $O(ML^2)$ algorithm, as opposed to $O(L^3)$, is possible by taking advantage of the linear state-transition function. Specifically, the time update of the state covariance is given simply by $P_{w_k}^- = P_{w_{k-1}} + R_{k-1}^r$ (see Section 7.4 for a discussion on selecting $R_{k-1}^r$). In the square-root filters, $S_{w_k}$ may thus be updated directly in Eqn. 7.153 using one of two options: (1) $S_{w_k}^- = \lambda_{RLS}^{-1/2} S_{w_{k-1}}$, corresponding to an exponential weighting on past data; (2) $S_{w_k}^- = S_{w_{k-1}} + D_{r_{k-1}}$, where the diagonal matrix $D_{r_{k-1}}$ is chosen to approximate the effects of annealing a diagonal process-noise covariance $R_k^r$.¹⁵ Both options avoid the costly $O(L^3)$ QR and Cholesky-based updates necessary in the state-estimation filter.
¹⁵This update ensures that the main diagonal of $P_{w_k}$ is exact. However, additional off-diagonal cross-terms $S_{w_{k-1}} D_{r_{k-1}}^T + D_{r_{k-1}} S_{w_{k-1}}^T$ are also introduced (though the effect appears negligible).
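Each option of Eqn. 7.153 is a one-line update of the factor, as the sketch below shows (illustrative only; the forgetting factor and annealing amount are placeholder values, not recommendations):

```python
import numpy as np

def sw_time_update(Sw, option='rls', lam_rls=0.9995, dr=1e-4):
    if option == 'rls':
        # Option 1: exponential weighting of past data
        return Sw / np.sqrt(lam_rls)
    # Option 2: diagonal "annealing" term added directly to the factor
    return Sw + dr * np.eye(Sw.shape[0])
```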
Initialize with:

$$\hat{w}_0 = E[w], \qquad S_{w_0} = \mathrm{chol}\left\{E\left[(w-\hat{w}_0)(w-\hat{w}_0)^T\right]\right\} \qquad (7.151)$$

For $k \in \{1, \ldots, \infty\}$:

Time update and sigma-point calculation:

$$\hat{w}_k^- = \hat{w}_{k-1} \qquad (7.152)$$
$$S_{w_k}^- = \lambda_{RLS}^{-1/2} S_{w_{k-1}} \quad \text{or} \quad S_{w_k}^- = S_{w_{k-1}} + D_{r_{k-1}} \qquad (7.153)$$
$$\mathcal{W}_{k|k-1} = \left[\hat{w}_k^- \quad \hat{w}_k^-+\gamma S_{w_k}^- \quad \hat{w}_k^--\gamma S_{w_k}^-\right] \qquad (7.154)$$
$$\mathcal{D}_{k|k-1} = G\left[x_k,\, \mathcal{W}_{k|k-1}\right] \qquad (7.155)$$
$$\hat{d}_k = \sum_{i=0}^{2L} W_i^{(m)} \mathcal{D}_{i,k|k-1} \qquad (7.156)$$
Bibliography

[1] D. Avitzour. A stochastic simulation Bayesian approach to multitarget tracking. IEE Proceedings on Radar, Sonar and Navigation, 142(2):41-44, 1995.

[2] E. R. Beadle and P. M. Djuric. A fast weighted Bayesian bootstrap filter for nonlinear model state estimation. IEEE Transactions on Aerospace and Electronic Systems, 33(1):338-343, 1997.

[3] F. Black and M. Scholes. The pricing of options and corporate liabilities. Journal of Political Economy, 81:637-659, 1973.

[4] R. W. Brumbaugh. An Aircraft Model for the AIAA Controls Design Challenge. PRC Inc., Edwards, CA.

[5] J. R. Cloutier, C. N. D'Souza, and C. P. Mracek. Nonlinear regulation and nonlinear H-infinity control via the state-dependent Riccati equation technique: Part 1, Theory. In Proceedings of the International Conference on Nonlinear Problems in Aviation and Aerospace, Daytona Beach, FL, May 1996.

[6] J. F. G. de Freitas. Bayesian Methods for Neural Networks. PhD thesis, Cambridge University Engineering Department, 1999.

[7] J. F. G. de Freitas, M. Niranjan, A. H. Gee, and A. Doucet. Sequential Monte Carlo methods to train neural network models. Neural Computation, 12(4):955-993, 2000.

[8] A. Doucet. On sequential simulation-based methods for Bayesian filtering. Technical Report CUED/F-INFENG/TR 310, Department of Engineering, Cambridge University, 1998.

[9] A. Doucet, J. F. G. de Freitas, and N. J. Gordon. Introduction to sequential Monte Carlo methods. In A. Doucet, J. F. G. de Freitas, and N. J. Gordon, editors, Sequential Monte Carlo Methods in Practice. Springer-Verlag, 2000.

[10] J. P. Dutton. Development of a Nonlinear Simulation for the McDonnell Douglas F-15 Eagle with a Longitudinal TECS Control Law. Master's thesis, Dept. of Aeronautics and Astronautics, University of Washington, 1994.

[11] B. Efron. The Bootstrap, Jackknife and other Resampling Plans. SIAM, Philadelphia, 1982.

[12] Z. Ghahramani and S. T. Roweis. Learning nonlinear dynamical systems using an EM algorithm. In M. J. Kearns, S. A. Solla, and D. A. Cohn, editors, Advances in Neural Information Processing Systems 11: Proceedings of the 1998 Conference. MIT Press, 1999.

[13] N. J. Gordon, D. J. Salmond, and A. F. M. Smith. Novel approach to nonlinear/non-Gaussian Bayesian state estimation. IEE Proceedings-F, 140(2):107-113, April 1993.
[49] R. van der Merwe, J. F. G. de Freitas, A. Doucet, and E. A. Wan. The Unscented Particle Filter. Technical Report CUED/F-INFENG/TR 380, Cambridge University Engineering Department, Cambridge, England, August 2000.

[50] R. van der Merwe and E. A. Wan. Efficient derivative-free Kalman filters for online learning. In European Symposium on Artificial Neural Networks (ESANN), Bruges, Belgium, April 2001 (to appear).

[51] R. van der Merwe and E. A. Wan. The square-root unscented Kalman filter for state and parameter-estimation. In International Conference on Acoustics, Speech, and Signal Processing, Salt Lake City, Utah, May 2001 (to appear).

[52] E. A. Wan and A. T. Nelson. Neural dual extended Kalman filtering: applications in speech enhancement and monaural blind signal separation. In Proc. Neural Networks for Signal Processing Workshop. IEEE, 1997.

[53] E. A. Wan and R. van der Merwe. The Unscented Kalman Filter for Nonlinear Estimation. In Proceedings of Symposium 2000 on Adaptive Systems for Signal Processing, Communication and Control (AS-SPCC), Lake Louise, Alberta, Canada, October 2000. IEEE.

[54] E. A. Wan, R. van der Merwe, and A. T. Nelson. Dual Estimation and the Unscented Transformation. In S. A. Solla, T. K. Leen, and K.-R. Muller, editors, Advances in Neural Information Processing Systems 12, pages 666-672. MIT Press, 2000.

[55] V. S. Zaritskii, V. B. Svetnik, and L. I. Shimelevich. Monte-Carlo techniques in problems of optimal information processing. Automation and Remote Control, 36(3):2015-2022, 1975.