Augmented Particle Filters


Jonghyun Yun, Fan Yang, and Yuguo Chen

Abstract
Particle filters have been widely used for online filtering problems in state space models. The currently available proposal distributions depend either only on the state dynamics, or only on the observation, or on both sources of information, but in the latter case they are not available for general state space models. In this article, we develop a new particle filtering algorithm, called the augmented particle filter (APF), for online filtering problems in state space models. The APF combines two sets of particles from the observation equation and the state equation, and the state space is augmented to facilitate the weight computation. Theoretical justification of the APF is provided, and the connection between the APF and the optimal particle filter (OPF) in some special state space models is investigated. The APF shares similar properties with the OPF, but the APF can be applied to a much wider range of models than the OPF. Simulation studies show that the APF performs similarly to or better than the OPF when the OPF is available, and the APF can perform better than other filtering algorithms in the literature when the OPF is not available.

KEY WORDS: Nonlinear filtering; Particle filter; Sequential Monte Carlo; State space model.


Jonghyun Yun is Assistant Professor, Department of Mathematical Sciences, University of Texas at El Paso, El Paso, TX 79968 (Email: jyun@utep.edu). Fan Yang is a Ph.D. candidate, Department of Statistics, University of Illinois at Urbana-Champaign, Champaign, IL 61820 (Email: fyang15@illinois.edu). Yuguo Chen is Professor, Department of Statistics, University of Illinois at Urbana-Champaign, Champaign, IL 61820 (Email: yuguo@illinois.edu). This work was partially supported by the National Science Foundation grant ATM-0620550. The authors thank the editor, the associate editor, and three referees for valuable suggestions.


1 Introduction
A discrete-time state space model (SSM) is a stochastic process $(X_t, Y_t)$ such that (i) $\{X_t\}$ is a Markov process that is not observed, and (ii) $Y_t$ is observed at time $t$ and its conditional distribution given $\{X_s, s \le t\}$ depends only on $X_t$. State space models are often characterized by a combination of the state equation and the observation equation as follows:

$$y_t = h_t(x_t, u_t), \quad \text{observation equation}; \qquad x_t = f_t(x_{t-1}, v_t), \quad \text{state equation}, \tag{1}$$

where $v_t$ and $u_t$ are independent error terms with known distributions. A graphical illustration of the state space model is given in Figure 1. SSMs have applications in many fields, including signal processing, image analysis, speech recognition, DNA sequence analysis, oceanography, and time series modeling; see Rabiner (1989), Elliott et al. (1995), Gordon et al. (1995), Durbin et al. (1998), Liu (2001), Tsay (2002), Bertino et al. (2003) and Butala et al. (2009).
One important problem concerning discrete-time SSMs is computing $E[g(X_{0:t}) \mid y_{1:t}]$, the expectation of $g(X_{0:t})$ with respect to the posterior distribution of the hidden states $X_{0:t} := \{X_0, \ldots, X_t\}$ given the current and past observations $y_{1:t} := \{y_1, \ldots, y_t\}$. This is the filtering problem, and $E[g(X_{0:t}) \mid y_{1:t}]$ is the Bayes estimate of $g(X_{0:t})$ with respect to the squared error loss. In real applications, recursive algorithms are usually preferred for fast online updating of the filter.

Since direct computation of $E[g(X_{0:t}) \mid y_{1:t}]$ is not feasible except in some special cases, particle filters (PFs), also known as sequential Monte Carlo (SMC) methods, have been used to give approximate answers to filtering problems. These methods use a large number of samples (particles) with their associated weights to approximate the posterior distribution. The random samples are generated sequentially from a proposal distribution based on the importance sampling procedure, and the importance weight is the ratio of the target distribution and the proposal distribution. The efficiency of the particle filter depends heavily on the choice of the proposal distribution.
Many different particle filtering methods have been introduced to design the proposal distribu-
tion. The independent particle filter (Lin et al., 2005) and the naive particle filter (or bootstrap filter,
Gordon et al., 1993) rely on either the observation or the state equation to build up the proposal
distribution. The optimal particle filter (OPF) (Liu and Chen, 1998; Doucet et al., 2000) constructs


the proposal distribution based on both the state equation and the observation, but this proposal
distribution is often difficult to sample from directly for general SSMs and the importance weight
could be hard to compute. Some other particle filtering methods, such as Gaussian particle filters
(Kotecha and Djuric, 2003) and auxiliary particle filters (Pitt and Shephard, 1999), often require
approximation of either the target distribution or the proposal distribution.
In this paper, we propose a new approach named the augmented particle filter (APF), in which
another hidden state is introduced to facilitate the sampling and the weight computation. The APF
combines two sets of particles from the observation and state equations, so its proposal distribution
also combines information from both sources as in the OPF. However, unlike the OPF and some
other filtering algorithms, the APF can be applied to general SSMs, and it does not require special
structures of the model or any approximation to the target or proposal distribution. We find through
simulation studies that the APF performs similarly to or better than the OPF when the OPF is
available, and the APF can perform better than other filtering algorithms in the literature when the
OPF is not available.
After reviewing previous work on particle filters in Section 2, we give the framework of the
augmented particle filter and its justification in Sections 3 and 4. Some special cases of the APF
and its connection to the OPF are discussed in Section 5. Simulation studies are presented in
Section 6 to compare the performance of different methods. Finally, some conclusions are given
in Section 7.

2 Review of Particle Filters


We focus on the online filtering problem in the state space model (1), assuming that the observations $y_t$ arrive sequentially over time. The particle filter (PF) is a method to generate a weighted sample (particles) $\{x_{0:T}^{(i)}, w_T^{(i)}\}_{i=1}^N$ from the posterior distribution of $X_{0:T}$ given $y_{1:T}$ in the state space model. Based on the weighted sample, we are able to estimate $E[g(X_{0:T}) \mid y_{1:T}]$. However, due to the size of the time step $T$, the dimension of $X_{0:T}$ could be large, so it is usually difficult to draw the sample $x_{0:T}$ all at once. To circumvent this difficulty, the PF generates the sample recursively based on the

following decomposition of the posterior distribution:


$$p(x_{0:T} \mid y_{1:T}) = p(x_{0:T-1} \mid y_{1:T-1})\, \frac{p(y_T \mid x_T)\, p(x_T \mid x_{T-1})}{p(y_T \mid y_{1:T-1})} = \frac{p(x_0) \prod_{t=1}^T p(y_t \mid x_t)\, p(x_t \mid x_{t-1})}{p(y_1) \prod_{t=2}^T p(y_t \mid y_{1:t-1})}.$$

Assuming $p(x_0)$ is known, $E[g(X_{0:T}) \mid y_{1:T}]$ can be rewritten as

$$\int \cdots \int g(x_{0:T})\, C_T\, \frac{p(x_0) \prod_{t=1}^T p(y_t \mid x_t)\, p(x_t \mid x_{t-1})}{q(x_0) \prod_{t=1}^T q(x_t \mid y_t, x_{t-1})}\; q(x_0) \prod_{t=1}^T q(x_t \mid y_t, x_{t-1})\; dx_0\, dx_1 \cdots dx_T,$$

where $C_T^{-1} = p(y_1) \prod_{t=2}^T p(y_t \mid y_{1:t-1})$ is a normalizing constant coming from the un-normalized densities in the integrand, and $q(x_0)$ and $q(x_t \mid y_t, x_{t-1})$ are the proposal distributions. Based on this decomposition, the PF draws the weighted samples in a recursive way as follows:

1. Draw $x_0^{(i)}$ from $q(x_0)$ for $i = 1, \ldots, N$. Compute the importance weight as $w_0^{(i)} = p(x_0^{(i)})/q(x_0^{(i)})$, $i = 1, \ldots, N$, and normalize the weight by $\tilde{w}_0^{(i)} = w_0^{(i)} / \sum_{i=1}^N w_0^{(i)}$.

2. At each time step $t = 1, \ldots, T$, draw $x_t^{(i)}$ from the proposal distribution $q(x_t \mid y_t, x_{t-1}^{(i)})$. Compute the importance weight as

$$w_t^{(i)} = \frac{p(y_t \mid x_t^{(i)})\, p(x_t^{(i)} \mid x_{t-1}^{(i)})}{q(x_t^{(i)} \mid y_t, x_{t-1}^{(i)})}\, w_{t-1}^{(i)}, \quad i = 1, \ldots, N, \tag{2}$$

and normalize the weight as $\tilde{w}_t^{(i)} = w_t^{(i)} / \sum_{i=1}^N w_t^{(i)}$.

The importance weights are normalized in Step 2 because the densities in (2) are often only known up to the normalizing constant $C_t$, which depends only on the $y_t$'s. Using the normalized importance weights $\tilde{w}_t^{(i)}$, the estimate of $E[g(X_{0:t}) \mid y_{1:t}]$ is $\sum_{i=1}^N \tilde{w}_t^{(i)} g(x_{0:t}^{(i)})$. The general theory of importance sampling implies that

$$\sum_{i=1}^N \tilde{w}_t^{(i)} g(x_{0:t}^{(i)}) \xrightarrow{a.s.} E[g(X_{0:t}) \mid y_{1:t}] \quad \text{as } N \to \infty. \tag{3}$$

In addition, we can approximate the target posterior as

$$p(x_{0:t} \mid y_{1:t}) \approx \sum_{i=1}^N \tilde{w}_t^{(i)}\, \delta_{x_{0:t}^{(i)}}(x_{0:t}),$$

where $\delta_{x_{0:t}^{(i)}}(x_{0:t}) = 1$ if $x_{0:t} = x_{0:t}^{(i)}$ and 0 otherwise. More convergence results of PFs are given in Doucet et al. (2000).
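To make the recursion concrete, the following is a minimal Python/NumPy sketch of the generic PF update for a scalar state with $g(x) = x$; the function names, interfaces, and the resampling-at-every-step choice are our own illustrative assumptions, not part of the original algorithm statement.

```python
import numpy as np

def particle_filter(y, sample_q0, w0, sample_q, weight_ratio, N=1000, seed=0):
    """Generic PF skeleton for the recursion above (scalar state, g(x) = x).

    sample_q(yt, x_prev, rng):        draws x_t^(i) from q(x_t | y_t, x_{t-1}^(i))
    weight_ratio(yt, x_new, x_prev):  p(y_t|x_t) p(x_t|x_{t-1}) / q(x_t|y_t, x_{t-1})
    """
    rng = np.random.default_rng(seed)
    x = sample_q0(N, rng)                        # x_0^(i) ~ q(x_0)
    w = w0(x)                                    # w_0^(i) = p(x_0^(i)) / q(x_0^(i))
    estimates = []
    for yt in y:
        x_new = sample_q(yt, x, rng)             # propose from q(x_t | y_t, x_{t-1})
        w = w * weight_ratio(yt, x_new, x)       # weight recursion (2)
        w_norm = w / w.sum()                     # normalized weights
        estimates.append(np.sum(w_norm * x_new)) # estimate of E[X_t | y_{1:t}], cf. (3)
        idx = rng.choice(N, size=N, p=w_norm)    # multinomial resampling
        x, w = x_new[idx], np.ones(N)
    return np.array(estimates)
```

With `sample_q` ignoring $y_t$ and drawing from the state equation, and `weight_ratio` returning $p(y_t \mid x_t)$, this skeleton reduces to the naive (bootstrap) filter reviewed below.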


The performance of the particle filter depends heavily on the quality of the proposal distribution, and the variance of the importance weights is often used to measure this quality. The ideal scenario is to choose the proposal distribution to be the same as the target distribution $p(x_{0:t} \mid y_{1:t})$, so that all the importance weights are the same and the variance of the weights is 0. However, it is usually impossible to sample from $p(x_{0:t} \mid y_{1:t})$ directly except for some special SSMs. If the proposal distribution is far from $p(x_{0:t} \mid y_{1:t})$, the variance of the importance weights tends to be large. In the worst-case scenario, only a few particles with large weights dominate the estimate in (3). This is called the degeneracy of the importance weights, which makes the PF algorithm inefficient.

In the following we review three popular choices of the proposal distribution. The naive particle filter (NPF) proposed by Gordon et al. (1993) chooses the proposal as

$$q(x_t \mid y_t, x_{t-1}) = p(x_t \mid x_{t-1}),$$

and the weight update is quite simple for this case: $w_t = w_{t-1}\, p(y_t \mid x_t)$. Hence, no information from the observation $y_t$ is used to generate the particles at time $t$. The independent particle filter (IPF) introduced by Lin et al. (2005) is another extreme case because the construction of the proposal distribution is based on the observation $y_t$ only:

$$q(x_t \mid y_t, x_{t-1}) = q(x_t \mid y_t).$$

Since the particles generated from this proposal are independent of the history $x_{0:t-1}$, the particles at time $t$ can be matched with any particles in the history by a random permutation. After the matching is done, we can calculate the importance weight as

$$w_t = \frac{p(y_t \mid x_t)\, p(x_t \mid x_{t-1})}{q(x_t \mid y_t)}\, w_{t-1}.$$

The optimal particle filter (OPF, Liu and Chen, 1998; Doucet et al., 2000) chooses the proposal distribution as

$$q(x_t \mid y_t, x_{t-1}) = p(x_t \mid y_t, x_{t-1}),$$

and its importance weight is $w_t = w_{t-1}\, p(y_t \mid x_{t-1})$. This proposal is considered optimal in the sense that it minimizes the variance of the weights conditional upon $x_{0:t-1}$ and $y_{1:t}$ (Doucet and Gordon, 1999). That is, the variance of $w_t = w_{t-1}\, p(y_t \mid x_t)\, p(x_t \mid x_{t-1})/q(x_t \mid y_t, x_{t-1})$ achieves its minimum when


$q(x_t \mid y_t, x_{t-1})$ is chosen to be $p(x_t \mid y_t, x_{t-1})$. However, the OPF can be implemented for only a very limited class of SSMs, because the proposal and the weight update are intractable in general.

For general SSMs, the NPF is often used because the OPF is usually not available and incorporating the observation $y_t$ into the proposal could be challenging. For the NPF, the proposal distribution may not be close to the target distribution because the proposal depends solely on the state equation and does not use any information from the observation $y_t$. This could lead to a large variance of the importance weights, and the problem would be even worse if the dimension of the model is high. In the next section, we propose the augmented particle filter (APF), whose proposal distribution is widely applicable and combines information from both the observation and state equations.

3 The Augmented Particle Filter


We first introduce the SSM with the augmented state space. As illustrated in Figure 2, the augmented state space is represented by another hidden layer of nodes $x_1^f, x_2^f, \ldots$. Given $x_{t-1}$, the augmented state $x_t^f$ depends only on $x_{t-1}$, and $x_t^f$ does not depend on the observations, other state vectors, or other augmented state vectors in the model. The augmented state space model can be represented as

$$y_t \mid x_t = h_t(x_t, u_t), \quad \text{the observation equation};$$
$$x_t \mid x_{t-1} = f_t(x_{t-1}, v_t), \quad \text{the state equation}; \tag{4}$$
$$x_t^f \mid x_{t-1} = f_t(x_{t-1}, v_t^f), \quad \text{the augmented state equation},$$

where the error term $v_t^f$ usually has the same distribution as $v_t$. Here we use the state equation to specify $x_t^f \mid x_{t-1}$, but $x_t^f \mid x_{t-1}$ can be specified in other ways as long as $x_t^f$ depends only on $x_{t-1}$ (see an example in Section 6.2). The observation and state equations in (4) are the same as those in the standard SSM (1), and only the augmented state equation is new.

In the augmented SSM, the target distribution is $p(x_{0:t}, x_{1:t}^f \mid y_{1:t}) = p(x_{1:t}^f \mid x_{0:t})\, p(x_{0:t} \mid y_{1:t})$, and its marginal distribution $p(x_{0:t} \mid y_{1:t})$ is the same as that in the standard SSM (1). Therefore, the inference based on the augmented SSM is valid. By considering the augmented state space, we


make the state space two times larger than the original state space, but we show in the following that the augmented SSM can facilitate the sampling and weight computation in the augmented particle filter.

Before we describe the APF algorithm, we define some notation. Let $x_t^{(i)}$ and $x_t^{f,(i)}$ denote the samples generated for the hidden state $x_t$ and the augmented state $x_t^f$ respectively, and let $x_t^{l,(i)}$ denote a sample from a proposal distribution which depends solely on the likelihood function associated with the current observation $y_t$. The construction of the APF proposal can be summarized as follows. First, we draw a forecast particle $x_t^{f,(i)} = f_t(x_{t-1}^{(i)}, v_t^{f,(i)})$ by evolving the particle according to the state equation, and draw a likelihood particle $x_t^{l,(i)}$ from a proposal distribution based on the likelihood function $p(y_t \mid x_t)$. Then we combine the two sets of particles to incorporate the information contained in both the observation and state equations.
The APF for the augmented SSM (4) is given as follows. At the initial step $t = 0$, draw $x_0^{(i)}$ from $q(x_0)$, for $i = 1, \ldots, N$, and compute the importance weight as $w_0^{(i)} = p(x_0^{(i)})/q(x_0^{(i)})$. For $t = 1, \ldots, T$, we repeat the following steps:

1. Draw $x_t^{f,(i)}$ (forecast particle) from $p(x_t^f \mid x_{t-1}^{(i)})$, which can be easily obtained by evolving through the state equation.

2. Draw $x_t^{l,(i)}$ (likelihood particle) from a proposal distribution $q_l(x_t^l \mid y_t)$ whose functional form is close to $p(y_t \mid x_t)$.

3. Let $\dot{h}_t(\tilde{x}_t, u_t)$ denote the derivative of $h_t(x_t, u_t)$ with respect to $x_t$ at $\tilde{x}_t$, where $\tilde{x}_t$ is a temporary estimate of $x_t$. Evaluate $\tilde{H}_t := E[\dot{h}_t(\tilde{x}_t, u_t) \mid \tilde{x}_t]$ and $\tilde{R}_t := \mathrm{Var}[h_t(\tilde{x}_t, u_t) \mid \tilde{x}_t]$.

4. Evaluate $\tilde{Q}_t := \mathrm{Var}[f_t(\tilde{x}_{t-1}, v_t^f) \mid \tilde{x}_{t-1}]$.

5. Let $\Sigma_t = (\tilde{H}_t' \tilde{R}_t^{-1} \tilde{H}_t)^{-1}$. Then, combine the two particles from Steps 1 and 2 as

$$x_t^{(i)} = (\Sigma_t^{-1} + \tilde{Q}_t^{-1})^{-1} (\Sigma_t^{-1} x_t^{l,(i)} + \tilde{Q}_t^{-1} x_t^{f,(i)}) \tag{5}$$
$$= \tilde{Q}_t (\Sigma_t + \tilde{Q}_t)^{-1} x_t^{l,(i)} + \Sigma_t (\Sigma_t + \tilde{Q}_t)^{-1} x_t^{f,(i)}. \tag{6}$$

6. Calculate the importance weight of $(x_{0:t}^{(i)}, x_{1:t}^{f,(i)})$ as

$$w_t^{(i)} = \frac{p(y_t \mid x_t^{(i)})\, p(x_t^{(i)} \mid x_{t-1}^{(i)})}{q_l(x_t^{l,(i)} \mid y_t)}\, w_{t-1}^{(i)}. \tag{7}$$

The APF incorporates information from both the observation and state equations to construct the proposal distribution, so it can be viewed as a combination of the IPF and NPF. In the APF, the amount of information contained in the observation equation and the state equation is represented by the conditional variances of $y_t \mid x_t$ and $x_t^f \mid x_{t-1}$. Thus, the APF requires evaluating $\mathrm{Var}(y_t \mid x_t)$ and $\mathrm{Var}(x_t^f \mid x_{t-1})$ to determine the weights to put on the forecast particle and the likelihood particle in forming the final particle. Since the state vectors are unknown, we need to estimate the variance terms by linearizing the equations and plugging PF estimates into the equations (see Steps 3 and 4), unless the model has additive errors.
The justification of the APF and the weight computation in Step 6, and the reason to include
the augmented state space will be given in Section 4. Here are a few remarks on the general APF.

1. If $p(y_t \mid x_t)$ is proportional to a probability density function of $x_t$ when viewed as a function of $x_t$ with $y_t$ fixed, we recommend choosing $q_l(x_t^l \mid y_t) \propto p(y_t \mid x_t^l)$ in Step 2. As an alternative, when the SSM has additive observation noise, we can linearize $h(x_t)$ with respect to $x_t$ at the modes of the likelihood function $p(y_t \mid x_t)$, and then substitute $h(x_t)$ with its linearization in $p(y_t \mid x_t)$ to construct $q_l(x_t^l \mid y_t)$. When $p(y_t \mid x_t)$ depends on only a subset of the state components, we can divide the state into two parts and only generate likelihood particles for the state components directly related to $y_t$. This idea is used in Lin et al. (2005) to handle similar situations. See examples in Sections 6.2 and 6.4.

2. When $p(y_t \mid x_t)$ is proportional to a proper density with respect to $x_t$, we can take an optional resampling step right after Step 2. The weight at Step 2 can be computed as

$$w_t^{l,(i)} = \frac{p(y_t \mid x_t^{l,(i)})}{q_l(x_t^{l,(i)} \mid y_t)}. \tag{8}$$

We resample $x_t^{l,(i)}$ with probability proportional to $w_t^{l,(i)}$, so $x_t^{l,(i)}$ approximately follows a distribution proportional to $p(y_t \mid x_t)$. In this case, the proposal $q(x_t^{(i)} \mid y_t, x_t^{f,(i)})$ in (12) would


be approximately proportional to $p(y_t \mid x_t^{l,(i)})$ instead of $q_l(x_t^{l,(i)} \mid y_t)$, and we need to substitute $q_l(x_t^{l,(i)} \mid y_t)$ with $p(y_t \mid x_t^{l,(i)})$ in (7).

3. In Step 3, we usually choose the temporary estimate $\tilde{x}_t$ as the mode of $p(y_t \mid x_t)$ (viewed as a function of $x_t$ with $y_t$ fixed). We used this choice for all simulation studies in Section 6. Based on the empirical performance, this choice of $\tilde{x}_t$ is robust for non-linear and non-Gaussian SSMs (see Sections 6.1, 6.2 and 6.4). If it is difficult to find the mode, we can obtain a temporary estimate by generating samples from $q_l(x_t^l \mid y_t)$, and then constructing $\tilde{x}_t = (\sum_{j=1}^L w_t^{l,(j)} x_t^{l,(j)})/(\sum_{j=1}^L w_t^{l,(j)})$, which mimics the estimate of $E(X_t \mid y_t)$ with a flat prior on $x_t$. This choice is more suitable for SSMs with $\int p(y_t \mid x_t)\, dx_t < \infty$, although it can provide a temporary estimate even when this condition is not satisfied. If the function $h_t(x_t, u_t)$ is not differentiable at $\tilde{x}_t$, or if the analytical derivative of $h_t(x_t, u_t)$ is not available, we could set $\dot{h}_t$ as a numerical derivative at $(\tilde{x}_t, E(u_t))$.

4. If $\tilde{H}_t$, $\tilde{R}_t$, or $\tilde{Q}_t$ in Steps 3 and 4 cannot be computed analytically, we can estimate them by their sample counterparts. Later we will see from (12) that these estimates will not cause problems for the APF.

5. In Step 5, if $\tilde{H}_t' \tilde{R}_t^{-1} \tilde{H}_t$ is not invertible, we can replace $\Sigma_t^{-1}$ by $\tilde{H}_t' \tilde{R}_t^{-1} \tilde{H}_t$ in (5) and use (5) to compute $x_t^{(i)}$. If $\tilde{H}_t' \tilde{R}_t^{-1} \tilde{H}_t$ is invertible, both (5) and (6) can be used, and it is usually faster to compute with (6).

6. Besides the linear combination in (6), the APF allows other ways to combine the forecast and likelihood particles. For example, if we believe the state equation is not informative, we can inflate the variance components in $\tilde{Q}_t$, so the final particle would put more weight on $x_t^{l,(i)}$, which is the particle from the observation equation (like the IPF). Another case is to inflate the variance components in $\Sigma_t$ when the observation equation is not informative. In this case, the final particle would be close to $x_t^{f,(i)}$, which is the particle from the state equation (like the NPF). Thus, the NPF and IPF can be viewed as two extreme cases of the APF.

7. The APF can adopt the multiple matching technique to match the likelihood particle $x_t^{l,(i)}$ and the forecast particle $x_t^{f,(i)}$ in any order (a sketch of this matching is given below). This idea was proposed by Lin et al. (2005) to reduce the variance of the importance weights. Let $K_m = (k_{m,1}, \ldots, k_{m,N})$, $m = 1, \ldots, M$, be $M$ different permutations of $(1, \ldots, N)$. For each permutation $K_m$, the likelihood particle $x_t^{l,(i)}$ can be combined with the permuted forecast particle through the linear combination

$$x_{t,m}^{(i)} = \tilde{Q}_t (\Sigma_t + \tilde{Q}_t)^{-1} x_t^{l,(i)} + \Sigma_t (\Sigma_t + \tilde{Q}_t)^{-1} x_t^{f,(k_{m,i})}, \tag{9}$$

and the importance weight is

$$w_{t,m}^{(i)} = \frac{p(y_t \mid x_{t,m}^{(i)})\, p(x_{t,m}^{(i)} \mid x_{t-1}^{(k_{m,i})})}{q_l(x_t^{l,(i)} \mid y_t)}\, w_{t-1}^{(k_{m,i})}. \tag{10}$$

After obtaining $M \times N$ weighted samples $\{x_{t,m}^{(i)}, w_{t,m}^{(i)}\}$, $i = 1, \ldots, N$, $m = 1, \ldots, M$, we estimate $E[g(X_t) \mid y_{1:t}]$ by

$$\widehat{E[g(X_t) \mid y_{1:t}]} = \frac{\sum_{m=1}^M \sum_{i=1}^N w_{t,m}^{(i)}\, g(x_{t,m}^{(i)})}{\sum_{m=1}^M \sum_{i=1}^N w_{t,m}^{(i)}}.$$

To obtain $N$ particles to evolve to the next time step $t + 1$, we perform the selection step as follows. For each $i$, we select one particle from $\{x_{t,m}^{(i)}\}_{m=1}^M$ with probability proportional to $\{w_{t,m}^{(i)}\}_{m=1}^M$, and set the selected particle and its history to be $x_{0:t}^{(i)}$ with importance weight $w_t^{(i)} = \sum_{m=1}^M w_{t,m}^{(i)} / M$.
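The following is a sketch of the multiple matching and selection steps, under the simplifying assumption of a scalar state so that the coefficients in (9) are scalars; the vectorized density functions passed in are hypothetical interfaces, and log weights are used for numerical stability.

```python
import numpy as np

def multiple_matching(xl, xf, x_prev, logw_prev, Sigma_t, Q_t,
                      log_py, log_ptrans, log_ql, M, rng):
    """Combine likelihood particles xl with M random permutations of the
    forecast particles xf, as in (9)-(10), then perform the selection step."""
    N = len(xl)
    a = Q_t / (Sigma_t + Q_t)        # coefficient on the likelihood particle
    b = Sigma_t / (Sigma_t + Q_t)    # coefficient on the forecast particle
    x_all = np.empty((M, N))
    logw_all = np.empty((M, N))
    for m in range(M):
        k = rng.permutation(N)                         # permutation K_m
        x = a * xl + b * xf[k]                         # combination (9)
        x_all[m] = x
        logw_all[m] = (log_py(x) + log_ptrans(x, x_prev[k])
                       - log_ql(xl) + logw_prev[k])    # weight (10)
    # selection: for each i, keep one of its M matches; weight = average over m
    shifted = np.exp(logw_all - logw_all.max(axis=0, keepdims=True))
    probs = shifted / shifted.sum(axis=0, keepdims=True)
    m_sel = np.array([rng.choice(M, p=probs[:, i]) for i in range(N)])
    x_sel = x_all[m_sel, np.arange(N)]
    logw_sel = logw_all.max(axis=0) + np.log(shifted.mean(axis=0))
    return x_sel, logw_sel
```

The selected index `m_sel[i]` also identifies the ancestor whose history the selected particle inherits; that bookkeeping is omitted here for brevity.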

4 Theoretical Justification of the APF


The APF proposal draws $x_t^{f,(i)}$ in the first step and constructs the final particle $x_t^{(i)}$ by combining $x_t^{f,(i)}$ and $x_t^{l,(i)}$. Thus, the APF proposal can be written as a joint distribution

$$q(x_t^{(i)}, x_t^{f,(i)} \mid y_t, x_{t-1}^{(i)}) = q(x_t^{(i)} \mid y_t, x_t^{f,(i)})\, p(x_t^{f,(i)} \mid x_{t-1}^{(i)}). \tag{11}$$

The sampling of $x_t^{(i)} \mid y_t, x_t^{f,(i)}$ involves two steps: 1) sample $x_t^{l,(i)}$ from $q_l(x_t^l \mid y_t)$, and 2) combine $x_t^{l,(i)}$ with $x_t^{f,(i)}$ via (6). Notice that in our sampling procedure $x_t^{l,(i)}$ depends only on $y_t$, and $x_t^{f,(i)}$ is treated as a constant in $q(\cdot \mid y_t, x_t^{f,(i)})$, so $x_t^{l,(i)} \mid y_t, x_t^{f,(i)}$ can be reduced to $x_t^{l,(i)} \mid y_t$. Thus, the proposal


$q(x_t^{(i)} \mid y_t, x_t^{f,(i)})$ can be written as

$$\begin{aligned} q(x_t^{(i)} \mid y_t, x_t^{f,(i)}) &= q\{(\Sigma_t^{-1} + \tilde{Q}_t^{-1})^{-1} (\Sigma_t^{-1} x_t^{l,(i)} + \tilde{Q}_t^{-1} x_t^{f,(i)}) \mid y_t, x_t^{f,(i)}\} \\ &= q_l((\Sigma_t^{-1} + \tilde{Q}_t^{-1})^{-1} \Sigma_t^{-1} x_t^{l,(i)} \mid y_t) \\ &= q_l(x_t^{l,(i)} \mid y_t)\, |(\Sigma_t^{-1} + \tilde{Q}_t^{-1})^{-1} \Sigma_t^{-1}|^{-1} \\ &\propto q_l(x_t^{l,(i)} \mid y_t). \end{aligned} \tag{12}$$

The constant $|(\Sigma_t^{-1} + \tilde{Q}_t^{-1})^{-1} \Sigma_t^{-1}|^{-1}$ is ignored because it is the same for all particles $x_t^{(i)}$. This explains why the APF algorithm with $\tilde{H}_t$, $\tilde{R}_t$ and $\tilde{Q}_t$ replaced by their sample counterparts is still justified.
The following theorem, whose proof is given in Appendix A, justifies the weight computation in Step 6 of the APF algorithm.

Theorem 4.1. In the APF algorithm, the importance weight $w_t^{(i)}$ can be computed recursively as in Equation (7).

Now we explain why the augmented state $x_t^f$ is needed to facilitate the weight computation. Without considering $x_t^f$, our proposal distribution at time $t$ is the marginal distribution $q(x_t^{(i)} \mid y_t, x_{t-1}^{(i)})$, the target distribution is $p(x_{0:t} \mid y_{1:t})$, and the corresponding importance weight is

$$w_t^{(i)} = \frac{p(y_t \mid x_t^{(i)})\, p(x_t^{(i)} \mid x_{t-1}^{(i)})}{q(x_t^{(i)} \mid y_t, x_{t-1}^{(i)})}\, w_{t-1}^{(i)}. \tag{13}$$

Based on (11), (12) and (6), the marginal distribution $q(x_t^{(i)} \mid y_t, x_{t-1}^{(i)})$ is

$$\begin{aligned} q(x_t^{(i)} \mid y_t, x_{t-1}^{(i)}) &= \int q(x_t^{(i)}, x_t^{f,(i)} \mid y_t, x_{t-1}^{(i)})\, dx_t^{f,(i)} \\ &= \int q(x_t^{(i)} \mid y_t, x_t^{f,(i)})\, p(x_t^{f,(i)} \mid x_{t-1}^{(i)})\, dx_t^{f,(i)} \\ &= \int q_l(x_t^{l,(i)} \mid y_t)\, |(\Sigma_t^{-1} + \tilde{Q}_t^{-1})^{-1} \Sigma_t^{-1}|^{-1}\, p(x_t^{f,(i)} \mid x_{t-1}^{(i)})\, dx_t^{f,(i)} \\ &= |(\Sigma_t^{-1} + \tilde{Q}_t^{-1})^{-1} \Sigma_t^{-1}|^{-1} \int q_l\big(\Sigma_t \{(\Sigma_t^{-1} + \tilde{Q}_t^{-1}) x_t^{(i)} - \tilde{Q}_t^{-1} x_t^{f,(i)}\} \mid y_t\big)\, p(x_t^{f,(i)} \mid x_{t-1}^{(i)})\, dx_t^{f,(i)}. \end{aligned} \tag{14}$$


The integral in (14) usually cannot be solved analytically. If we estimate it by the naive Monte Carlo method as

$$\hat{q}(x_t^{(i)} \mid y_t, x_{t-1}^{(i)}) = \frac{1}{L} \sum_{j=1}^L q_l\big(\Sigma_t \{(\Sigma_t^{-1} + \tilde{Q}_t^{-1}) x_t^{(i)} - \tilde{Q}_t^{-1} x_t^{f,(j)}\} \mid y_t\big)\, |(\Sigma_t^{-1} + \tilde{Q}_t^{-1})^{-1} \Sigma_t^{-1}|^{-1},$$

where $x_t^{f,(j)}$, $j = 1, \ldots, L$, are i.i.d. samples from $p(x_t^f \mid x_{t-1}^{(i)})$, it will introduce Monte Carlo errors into the computation of the importance weight (13). Also, for $N$ particles, we need to evaluate $q_l\big(\Sigma_t \{(\Sigma_t^{-1} + \tilde{Q}_t^{-1}) x_t^{(i)} - \tilde{Q}_t^{-1} x_t^{f,(j)}\} \mid y_t\big)$ a total of $N \times L$ times, which is computationally very expensive. By adding the augmented state vectors and considering the joint posterior of the hidden state and the augmented state, we can avoid the evaluation of the integral (14), and the importance weight (7) can be computed exactly. This is the advantage of, and the motivation for, considering the augmented state space models.
More theoretical justification of the APF is provided in Section 5.1 and Appendix B. In Section 5.1, we show that for SSMs with Gaussian additive noise and linear observation equations, the marginal proposal distribution of $x_t^{(i)}$ in the APF proposal is the same as the OPF proposal distribution. In Appendix B, we show that for non-linear SSMs with non-Gaussian measurement noise and Gaussian state noise, the marginal proposal distribution of $x_t^{(i)}$ in the APF proposal is the closest (in the sense of the KL divergence) to the OPF proposal distribution among the APF, IPF and NPF. These theoretical properties of the APF provide justification for using (5) and (6) to combine the particles, and show the advantage of the APF over the IPF and NPF.

5 The APF for SSMs with Additive Errors


In this section, we give the APF algorithm for the SSM with additive error terms. This is a widely used class of SSMs, which can be represented by the following equations:

$$y_t \mid x_t = h_t(x_t) + u_t, \quad u_t \sim (0, R_t);$$
$$x_t \mid x_{t-1} = f_t(x_{t-1}) + v_t, \quad v_t \sim (0, Q_t), \tag{15}$$

where $(\mu, \Sigma)$ denotes a distribution with mean $\mu$ and variance $\Sigma$. We do not restrict our model to have normal errors here. The APF for this model is simpler than the APF for general SSMs in


Section 3. In particular, the evaluation of $\tilde{R}_t$ and $\tilde{Q}_t$ (which measure the amount of information contained in the observation and state equations) is much simpler for model (15) because

$$\tilde{R}_t = \mathrm{Var}(h_t(\tilde{x}_t) + u_t \mid \tilde{x}_t) = \mathrm{Var}(u_t) = R_t;$$
$$\tilde{Q}_t = \mathrm{Var}(f_t(\tilde{x}_{t-1}) + v_t \mid \tilde{x}_{t-1}) = \mathrm{Var}(v_t) = Q_t.$$

The estimation of $\tilde{H}_t$ also requires less computational cost than that for the general SSM. The detailed APF algorithm for model (15) is given as follows. At the initial step $t = 0$, draw $x_0^{(i)}$ from $q(x_0)$ for $i = 1, \ldots, N$, and compute the importance weight as $w_0^{(i)} = p(x_0^{(i)})/q(x_0^{(i)})$. For $t = 1, \ldots, T$, we repeat the following steps:

1. Draw $x_t^{f,(i)}$ from $p(x_t^f \mid x_{t-1}^{(i)})$.

2. Draw $x_t^{l,(i)}$ from a proposal distribution $q_l(x_t^l \mid y_t)$.

3. Let $\tilde{H}_t$ denote the derivative of $h_t(x_t)$ with respect to $x_t$ at $\tilde{x}_t$, where $\tilde{x}_t$ is a temporary estimate of $x_t$.

4. Let $\Sigma_t = (\tilde{H}_t' R_t^{-1} \tilde{H}_t)^{-1}$. Then, combine the two particles from Steps 1 and 2 as in (5), or

$$x_t^{(i)} = Q_t (\Sigma_t + Q_t)^{-1} x_t^{l,(i)} + \Sigma_t (\Sigma_t + Q_t)^{-1} x_t^{f,(i)}.$$

5. Calculate the importance weight as

$$w_t^{(i)} = \frac{p(y_t \mid x_t^{(i)})\, p(x_t^{(i)} \mid x_{t-1}^{(i)})}{q_l(x_t^{l,(i)} \mid y_t)}\, w_{t-1}^{(i)}.$$
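The following sketch carries out one time step of this algorithm for a scalar state, with a Gaussian $q_l$ centered at the temporary estimate $\tilde{x}_t$; the helper names and interfaces are our own illustrative choices.

```python
import numpy as np

def norm_logpdf(x, mean, var):
    return -0.5 * (np.log(2 * np.pi * var) + (x - mean) ** 2 / var)

def apf_step(x_prev, logw_prev, f, log_py, x_tmp, Sigma_t, Qt, rng):
    """One APF step for the additive-error model (15), scalar state.

    f:       state map f_t (vectorized)
    log_py:  x -> log p(y_t | x) (vectorized)
    x_tmp:   temporary estimate x~_t (e.g., a mode of p(y_t | x_t))
    Sigma_t: (H_t' R_t^{-1} H_t)^{-1} evaluated at x_tmp
    Qt:      state noise variance
    """
    N = len(x_prev)
    xf = f(x_prev) + rng.normal(0.0, np.sqrt(Qt), N)    # Step 1: forecast particles
    xl = rng.normal(x_tmp, np.sqrt(Sigma_t), N)         # Step 2: likelihood particles
    a = Qt / (Sigma_t + Qt)                             # Step 4: combination (6)
    x = a * xl + (1 - a) * xf
    logw = (log_py(x) + norm_logpdf(x, f(x_prev), Qt)   # Step 5: weight
            - norm_logpdf(xl, x_tmp, Sigma_t) + logw_prev)
    return x, logw
```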

5.1 SSMs with Gaussian Additive Noise and Linear Observation Equations

In this section, we consider SSMs with Gaussian additive errors and a linear operator $H_t$ in the observation equation. This is a special class of the SSMs with additive errors in Section 5, and they can be represented by

$$y_t \mid x_t = H_t x_t + u_t, \quad u_t \sim N(0, R_t);$$
$$x_t \mid x_{t-1} = f_t(x_{t-1}) + v_t, \quad v_t \sim N(0, Q_t). \tag{16}$$


Model (16) is of particular interest for two reasons. First, the proposal distribution for the APF is simple because $\tilde{R}_t = R_t$, $\tilde{Q}_t = Q_t$, and the derivative of $h_t(x_t)$ is $H_t$. Second, the OPF is available for this model, and there is an interesting connection between the proposals of the OPF and APF.

Here is the APF algorithm for model (16). At the initial step $t = 0$, draw $x_0^{(i)}$ from $q(x_0)$ for $i = 1, \ldots, N$, and compute the importance weight as $w_0^{(i)} = p(x_0^{(i)})/q(x_0^{(i)})$. For $t = 1, \ldots, T$, we repeat the following steps.

1. Draw $x_t^{f,(i)}$ by $x_t^{f,(i)} = f_t(x_{t-1}^{(i)}) + v_t^{(i)}$, where $v_t^{(i)} \sim N(0, Q_t)$.

2. Draw $x_t^{l,(i)}$ from the proposal distribution $N(H_t'(H_t H_t')^{-1} y_t, \Sigma_t)$, where $\Sigma_t = (H_t' R_t^{-1} H_t)^{-1}$. This proposal distribution $q_l(x_t^l \mid y_t)$ is proportional to $p(y_t \mid x_t)$. If the matrix $H_t H_t'$ or $H_t' R_t^{-1} H_t$ is not invertible, we may use the tapering idea to keep only the diagonal of the matrix, or add $\epsilon I$ to the matrix to make it invertible, where $\epsilon$ is a small positive constant and $I$ is the identity matrix.

3. Construct the final particle by combining the particles from the last two steps as in (6), or

$$x_t^{(i)} = (\Sigma_t^{-1} + Q_t^{-1})^{-1} (\Sigma_t^{-1} x_t^{l,(i)} + Q_t^{-1} x_t^{f,(i)}). \tag{17}$$

4. The importance weight is

$$w_t^{(i)} = \frac{p(y_t \mid x_t^{(i)})\, p(x_t^{(i)} \mid x_{t-1}^{(i)})}{q_l(x_t^{l,(i)} \mid y_t)}\, w_{t-1}^{(i)}.$$

For model (16), the marginal proposal distribution of $x_t^{(i)}$ in the APF proposal is available in closed form, and it is shown in Proposition 5.1 that this marginal proposal distribution is the same as the OPF proposal distribution if $H_t H_t'$ and $H_t' R_t^{-1} H_t$ are invertible. The OPF is available for model (16), and its proposal distribution at time $t$ is

$$p(x_t^{(i)} \mid y_t, x_{t-1}^{(i)}) = N(\mu_t^{(i)}, \Omega_t), \tag{18}$$

where $\Omega_t = (\Sigma_t^{-1} + Q_t^{-1})^{-1}$ and $\mu_t^{(i)} = \Omega_t (H_t' R_t^{-1} y_t + Q_t^{-1} f_t(x_{t-1}^{(i)}))$.

Proposition 5.1. For the SSM in (16) with invertible $H_t H_t'$ and $H_t' R_t^{-1} H_t$, the marginal proposal distribution of $x_t^{(i)}$ in the APF proposal is the same as the OPF proposal distribution (18).


The proof is in Appendix C. This proposition indicates that the particle $x_t^{(i)}$ in the APF is generated in the same way as in the OPF for model (16). If we ignore the augmented state space, the importance weight for $x_t^{(i)}$ in the APF will be computed in the same way as in the OPF, i.e.,

$$w_t^{(i)} = p(y_t \mid x_{t-1}^{(i)})\, w_{t-1}^{(i)} \propto \exp\left\{ -\frac{1}{2} (y_t - H_t f_t(x_{t-1}^{(i)}))' (R_t + H_t Q_t H_t')^{-1} (y_t - H_t f_t(x_{t-1}^{(i)})) \right\} w_{t-1}^{(i)}. \tag{19}$$

So if we ignore the augmented state space, the APF and the OPF are equivalent for this model. For general SSMs, the OPF is usually not available, but the APF can still be implemented.
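Proposition 5.1 can be checked numerically: in one dimension, combining $x_t^l \sim N(y_t/H_t, \Sigma_t)$ and $x_t^f \sim N(f_t(x_{t-1}), Q_t)$ via (17) reproduces the OPF proposal moments in (18). A small sanity check with arbitrary scalar values (our own choices):

```python
import numpy as np

rng = np.random.default_rng(0)
H, R, Q, fx, y = 1.7, 0.5, 2.0, -0.3, 1.2   # scalar H_t, R_t, Q_t, f_t(x_{t-1}), y_t
Sigma = R / H**2                            # (H' R^{-1} H)^{-1} in one dimension

# APF: draw likelihood and forecast particles and combine them as in (17)
xl = rng.normal(y / H, np.sqrt(Sigma), 10**6)
xf = rng.normal(fx, np.sqrt(Q), 10**6)
x = (xl / Sigma + xf / Q) / (1 / Sigma + 1 / Q)

# OPF proposal moments (18)
Omega = 1 / (1 / Sigma + 1 / Q)
mu = Omega * (H * y / R + fx / Q)
print(x.mean() - mu, x.var() - Omega)       # both differences are close to 0
```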

6 Simulation Studies
The simulations were coded in MATLAB and run on a UNIX machine with a 2.40 GHz processor.

6.1 Nonlinear Filtering

We consider the following nonlinear SSM from Gordon et al. (1993):

$$y_t \mid x_t = x_t^2/20 + u_t;$$
$$x_t \mid x_{t-1} = 0.5 x_{t-1} + \frac{25 x_{t-1}}{1 + x_{t-1}^2} + 8\cos(1.2(t-1)) + v_t, \tag{20}$$

where $u_t \sim N(0, \sigma^2)$ and $v_t \sim N(0, \tau^2)$. We set the initial distribution as $x_0 \sim N(0, 4)$. The OPF is usually not available when the observation equation is nonlinear, so we compare the performance of the NPF, IPF and APF for estimating $E(X_t \mid y_{1:t})$.

The proposal for the NPF is given by the state equation in (20). The IPF proposal suggested by Lin et al. (2005) is based on linearizing the observation equation:

$$q(x_t \mid y_t) = \begin{cases} 0.5\, N(c, s^2) + 0.5\, N(-c, s^2), & y_t > 0; \\ N(0, 25\sigma^2), & y_t \le 0, \end{cases} \tag{21}$$

where $c = \sqrt{20 y_t}$ and $s^2 = \min(5\sigma^2/y_t, 25\sigma^2)$. Because the model (20) has additive error terms, the APF in Section 5 can be implemented. The forecast particle $x_t^{f,(i)}$ is drawn from the state equation, and the sampling distribution $q_l(x_t^l \mid y_t)$ for the likelihood particle $x_t^{l,(i)}$ is the same as (21). The three coefficients for the linear combination are obtained as $\tilde{H}_t = \tilde{x}_t/10$, $\tilde{R}_t = \sigma^2$, and $\tilde{Q}_t = \tau^2$, where $\tilde{x}_t = c$ if $y_t > 0$, and 0 otherwise. The final particle of the APF at time $t$ is

$$x_t^{(i)} = \left( \frac{\tilde{x}_t^2}{100\sigma^2} + \frac{1}{\tau^2} \right)^{-1} \left( \frac{\tilde{x}_t^2}{100\sigma^2}\, x_t^{l,(i)} + \frac{1}{\tau^2}\, x_t^{f,(i)} \right).$$
The importance weights for the three methods are computed recursively as follows:

$$\text{NPF}: \quad w_t^{(i)} = N\!\left( y_t \,\Big|\, \frac{(x_t^{(i)})^2}{20},\, \sigma^2 \right) w_{t-1}^{(i)};$$

$$\text{IPF}: \quad w_t^{(i)} = \frac{ N\!\left( y_t \,\Big|\, \frac{(x_t^{l,(i)})^2}{20},\, \sigma^2 \right) N\!\left( x_t^{l,(i)} \,\Big|\, 0.5 x_{t-1}^{(i)} + \frac{25 x_{t-1}^{(i)}}{1 + (x_{t-1}^{(i)})^2} + 8\cos(1.2(t-1)),\, \tau^2 \right) }{ q_l(x_t^{l,(i)} \mid y_t) }\, w_{t-1}^{(i)};$$

$$\text{APF}: \quad w_t^{(i)} = \begin{cases} \dfrac{ N\!\left( y_t \,\Big|\, \frac{(x_t^{(i)})^2}{20},\, \sigma^2 \right) N\!\left( x_t^{(i)} \,\Big|\, 0.5 x_{t-1}^{(i)} + \frac{25 x_{t-1}^{(i)}}{1 + (x_{t-1}^{(i)})^2} + 8\cos(1.2(t-1)),\, \tau^2 \right) }{ q_l(x_t^{l,(i)} \mid y_t) }\, w_{t-1}^{(i)}, & y_t > 0; \\[2ex] N\!\left( y_t \,\Big|\, \frac{(x_t^{(i)})^2}{20},\, \sigma^2 \right) w_{t-1}^{(i)}, & y_t \le 0, \end{cases}$$

where $N(x \mid \mu, \sigma^2)$ denotes the normal density with mean $\mu$ and variance $\sigma^2$ evaluated at $x$.
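A sketch of one APF update for model (20), following the formulas above (our own code; the particle arrays, $\sigma$ and $\tau$ are inputs, and the $y_t \le 0$ branch reduces to the NPF move since $\tilde{x}_t = 0$):

```python
import numpy as np

def state_mean(x_prev, t):
    return 0.5 * x_prev + 25 * x_prev / (1 + x_prev**2) + 8 * np.cos(1.2 * (t - 1))

def norm_logpdf(x, mean, var):
    return -0.5 * (np.log(2 * np.pi * var) + (x - mean)**2 / var)

def apf_step_model20(x_prev, logw_prev, yt, t, sigma, tau, rng):
    N = len(x_prev)
    xf = state_mean(x_prev, t) + rng.normal(0, tau, N)      # forecast particles
    if yt > 0:
        c = np.sqrt(20 * yt)                                # modes of the likelihood
        s2 = min(5 * sigma**2 / yt, 25 * sigma**2)
        sign = rng.choice([-1.0, 1.0], size=N)
        xl = rng.normal(sign * c, np.sqrt(s2), N)           # draw from mixture (21)
        prec_l = 20 * yt / (100 * sigma**2)                 # Sigma_t^{-1} = x~_t^2/(100 sigma^2)
        x = (prec_l * xl + xf / tau**2) / (prec_l + 1 / tau**2)
        log_ql = np.log(0.5) + np.logaddexp(                # mixture density (21)
            norm_logpdf(xl, c, s2), norm_logpdf(xl, -c, s2))
        logw = (norm_logpdf(yt, x**2 / 20, sigma**2)
                + norm_logpdf(x, state_mean(x_prev, t), tau**2)
                - log_ql + logw_prev)
    else:                                                   # x~_t = 0: pure NPF move
        x = xf
        logw = norm_logpdf(yt, x**2 / 20, sigma**2) + logw_prev
    return x, logw
```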
To measure the performance of each method, we compute the root mean square error (RMSE) and its standard error (se(RMSE)). Let $\hat{X}_{0:T}$ denote the filtering estimate

$$\hat{X}_{0:T} = (E(X_0 \mid y_1), E(X_1 \mid y_{1:2}), \ldots, E(X_T \mid y_{1:T})).$$

We have

$$\mathrm{RMSE} = \frac{1}{K} \sum_{k=1}^K \sqrt{ \frac{1}{T} \| \hat{X}_{0:T}^k - X_{0:T}^k \|^2 };$$
$$\mathrm{se(RMSE)} = \sqrt{ \frac{1}{K}\, \widehat{\mathrm{Var}}\left( \sqrt{ \frac{1}{T} \| \hat{X}_{0:T}^k - X_{0:T}^k \|^2 } \right) },$$

where $K = 30{,}000$ is the number of independent repeated experiments, and the total number of time steps is $T = 100$.
We consider sixteen different combinations of $\sigma \in \{1/8, 1/4, 1/2, 1\}$ and $\tau \in \{2, 4, 8, 16\}$. The APF, IPF, and NPF are implemented with resampling at every time step and no multiple matching. In addition, we also include in the comparison the auxiliary particle filter (Aux) (Pitt and Shephard, 1999) and two sub-optimal filters: the unscented Kalman filter (UKF) (Wan and van der Merwe, 2000) and the extended Kalman filter (EKF) (Xiao and Hill, 1996). In the first stage of the Aux, the index $k$ is sampled with probability proportional to $p(y_t \mid \mu_t^{(k)})\, w_{t-1}^{(k)}$, where $\mu_t^{(k)} = E(x_t \mid x_{t-1}^{(k)})$, in order to simulate particles associated with a high predictive observation density. Then the $x_t$'s are drawn from $p(x_t \mid x_{t-1}^{(k)})$, followed by the evaluation of the importance weight $w_t^{(i)} = p(y_t \mid x_t^{(i)})/p(y_t \mid \mu_t^{(k^i)})$, where $\mu_t^{(k^i)}$ is the conditional expectation associated with the ancestor of $x_t^{(i)}$. The UKF is implemented with three sigma points at each $t$. For the four particle filter algorithms (APF, IPF, NPF, Aux), the number of particles for each method (see Table 1) is chosen to ensure that the computation time is about the same for all four methods. The results for the six algorithms are presented in Table 2. We can see that for small observation noise $\sigma$ and large state noise $\tau$, the IPF outperforms the NPF. For large $\sigma$ and small $\tau$, the NPF is better than the IPF. The APF combines the strengths of the NPF and IPF, and it outperforms the other two methods in all settings except the few cases with $\tau = 2$, in which the RMSE of the APF is between the RMSEs of the other two methods. The performance of the Aux is similar to the NPF, and it is outperformed by the APF in all settings. Both the EKF and the UKF suffer from the bias introduced by deterministic approximations of the target density.
We also implemented the APF and IPF with resampling at every time step and multiple matching. Again, the number of particles and the number of matchings for each method (see Table 1) are chosen to ensure that the computation time is about the same. The results are given in Table 3. For all the settings, the APF performs better than the IPF.
In addition, we implemented two other approximation methods: the unscented particle filter (UPF) (van der Merwe et al., 2000) and the implicit particle filter (ImPF) (Chorin and Tu, 2009). For the ImPF, we used the quadratic approximation explained in Chorin et al. (2010). These two methods are time-consuming because the UPF requires constructing UKF proposals for each particle at each time step, and the ImPF requires an optimization step to generate particles from the high-probability region of the target distribution. If the computation time is fixed as in Table 1, the number of particles is too small for both methods to obtain stable results. Therefore we chose the number of particles to be 100 for the UPF and ImPF, without requiring the computation time to be the same as for the other methods. The CPU time for one experiment is about 3.686 seconds for the UPF and 516.556 seconds for the ImPF. We repeated the experiments $K = 3{,}000$ times, and the results are presented in Table 4. Despite their significantly longer computation times, the UPF and ImPF have larger RMSEs than the APF.


6.2 Maneuvering Target Tracking

We consider the tracking problem given in Ikoma et al. (2001) and Lin et al. (2005), which aims to track a maneuvering target (e.g., a ship or an aircraft) over time in seconds. Let $\theta_\tau$ be a 6-dimensional vector with its first two components $\theta_{\tau,1:2}$ representing the position, the middle two components $\theta_{\tau,3:4}$ representing the velocity, and the last two components $\theta_{\tau,5:6}$ representing the acceleration of the target at time $\tau$ in the Cartesian space. The dynamics of the target is given as a differential equation, and the discretization of the model is as follows (see Ikoma et al. (2001) for details):

$$\theta_{\tau+\Delta} = A(\Delta)\theta_\tau + B(\Delta)v_\tau, \tag{22}$$

where

$$A(\Delta) = \begin{pmatrix} 1 & 0 & \Delta & 0 & a_1 & 0 \\ 0 & 1 & 0 & \Delta & 0 & a_1 \\ 0 & 0 & 1 & 0 & a_2 & 0 \\ 0 & 0 & 0 & 1 & 0 & a_2 \\ 0 & 0 & 0 & 0 & e^{-\Delta/\lambda} & 0 \\ 0 & 0 & 0 & 0 & 0 & e^{-\Delta/\lambda} \end{pmatrix}, \quad B(\Delta) = \begin{pmatrix} b_1 & 0 \\ 0 & b_1 \\ b_2 & 0 \\ 0 & b_2 \\ b_3 & 0 \\ 0 & b_3 \end{pmatrix}. \tag{23}$$
The components $a_1$, $a_2$, $b_1$, $b_2$, and $b_3$ in (23) are deterministic functions of the time step $\Delta$ and the parameter $\lambda$; see Ikoma et al. (2001) for their exact forms.
Following Ikoma et al. (2001), an independent Cauchy state noise is assumed for $v_\tau = (v_{\tau,1}, v_{\tau,2})'$, with the density

$$p(v_{\tau,i}) = \frac{q}{\pi(v_{\tau,i}^2 + q^2)}. \tag{24}$$
To put the problem in the SSM framework, let $x_t$ denote $\theta_{t\Delta}$, and let $y_t$ denote the observation at time $t$, which is the angle and distance of the target from the origin measured by a radar. Then we have the following multi-dimensional nonlinear SSM:

$$y_t = h(x_t) + u_t; \quad x_t = A(\Delta)x_{t-1} + B(\Delta)v_t, \tag{25}$$


where

$$h(x_t) = \left( \arctan\left( \frac{x_{t,1}}{x_{t,2}} \right),\; \sqrt{x_{t,1}^2 + x_{t,2}^2} \right)', \tag{26}$$

and $v_t$ has the same Cauchy distribution as in (24). Following Ikoma et al. (2001), we assume a Gaussian observation noise

$$u_t \sim N(0, R) \quad \text{with} \quad R = \begin{pmatrix} 10^{-10} & 0 \\ 0 & 10^2 \end{pmatrix}. \tag{27}$$
Since the OPF is not available for this model, we compared five methods with resampling at every step: the APF, IPF, NPF, UPF and ImPF. The UPF and ImPF are implemented in the same way as in Section 6.1. The NPF uses the state equation to generate particles. For the IPF, since the observation $y_t$ is directly related to only the first two components $(x_{t,1}, x_{t,2})$ of the state vector, we use a modification of the IPF, denoted by MIPF-1 in Lin et al. (2005), which deals with the case that the observation is directly related to only part of the state vector. In this case, the particle for the IPF is generated in two steps: the components directly related to the observation are generated first, and the remaining components are generated afterwards. Notice that the observation $y_t$ is normally distributed with a function of the position vector, $h(x_{t,1:2})$, as the mean, and the function $h$ is invertible, so mimicking the proposal for the likelihood particle in Section 5.1, we choose the proposal for the IPF to be

$$q_l(x_{t,1:2} \mid y_t) = N((\tilde{x}_{t,1}, \tilde{x}_{t,2})', \Sigma_t), \tag{28}$$

where

$$(\tilde{x}_{t,1}, \tilde{x}_{t,2})' = h^{-1}(y_t); \quad \Sigma_t^{-1} = I_2 \circ (\tilde{H}_t' R^{-1} \tilde{H}_t).$$

Here $I_2$ is the $2 \times 2$ identity matrix and $\circ$ denotes the Hadamard product of matrices (componentwise matrix product). The matrix $\Sigma_t^{-1}$ is tapered to prevent potential matrix inversion problems.

The entries in $\tilde{H}_t$, which is the derivative of $h(\cdot)$ at the mode of $p(y_t \mid x_t)$, are as follows:

$$\tilde{H}_t = \begin{pmatrix} \dfrac{1}{\tilde{x}_{t,2}\left(1 + \left(\frac{\tilde{x}_{t,1}}{\tilde{x}_{t,2}}\right)^2\right)} & -\dfrac{\tilde{x}_{t,1}}{\tilde{x}_{t,2}^2\left(1 + \left(\frac{\tilde{x}_{t,1}}{\tilde{x}_{t,2}}\right)^2\right)} \\ (\tilde{x}_{t,1}^2 + \tilde{x}_{t,2}^2)^{-0.5}\, \tilde{x}_{t,1} & (\tilde{x}_{t,1}^2 + \tilde{x}_{t,2}^2)^{-0.5}\, \tilde{x}_{t,2} \end{pmatrix}. \tag{29}$$


Given $x_{t,1:2}$ and $x_{t-1}$, the velocity and acceleration $x_{t,3:6}$ follow a degenerate distribution $p(x_{t,3:6} \mid x_{t,1:2}, x_{t-1})$, and $x_{t,3:6}$ can be computed deterministically.

For the APF, we sample the position vector $x_{t,1:2}^{(i)}$ by combining $x_{t,1:2}^{f,(i)}$ from the augmented state density $p(x_t^f \mid x_{t-1}^{(i)})$ and $x_{t,1:2}^{l,(i)}$ from the proposal (28) through the linear combination in (6). Because the variance $\tilde{Q}_t$, which appears in the coefficients of the linear combination, is infinite for the Cauchy state noise, we truncate the Cauchy density in (24) to its 99.99% highest density region to estimate $\tilde{Q}_t$. The augmented state density $p(x_t^f \mid x_{t-1})$ follows the dynamics in (22), but we set $q$ for the density of the Cauchy noise in (24) to be a large value ($q = 1000$ in our simulation). This is because the Cauchy state noise is heavy-tailed, and we want to flatten the density of the augmented state vector to cover the tail region with high probability. The augmentation is only for the position vector $x_{t,1:2}$, because the velocity and acceleration $x_{t,3:6}$ can be computed deterministically given $x_{t,1:2}$ and $x_{t-1}$. The APF importance weight at time $t$ is

$$w_t^{(i)} = \frac{N(y_t \mid h(x_{t,1:2}^{(i)}), R)\, p(x_{t,1:2}^{(i)} \mid x_{t-1}^{(i)})}{N(x_{t,1:2}^{l,(i)} \mid h^{-1}(y_t), \Sigma_t)}\, w_{t-1}^{(i)}. \tag{30}$$

We did not consider multiple matching for the IPF and APF in this example because we focus on comparing the performance of different proposal distributions.
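For reference, the measurement function (26), the inverse $h^{-1}$ used to center the proposal (28), and the derivative (29) can be written compactly as follows (our own sketch; the inverse is valid on the branch $x_{t,2} > 0$ of the arctangent, and (29) is shown in its simplified form):

```python
import numpy as np

def h(pos):
    """Measurement (26): angle and distance of position pos = (x1, x2)."""
    x1, x2 = pos
    return np.array([np.arctan(x1 / x2), np.hypot(x1, x2)])

def h_inv(y):
    """Inverse of (26) used to center the proposal (28); assumes x2 > 0."""
    angle, dist = y
    return np.array([dist * np.sin(angle), dist * np.cos(angle)])

def H_tilde(pos):
    """Derivative (29) of h at pos, with the entries simplified."""
    x1, x2 = pos
    r2, r = x1**2 + x2**2, np.hypot(x1, x2)
    return np.array([[x2 / r2, -x1 / r2],
                     [x1 / r,   x2 / r]])
```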
In our simulation, the initial distribution at $\tau = 0$ of the position, velocity, and acceleration of the target is $N((50000, 5000, 0, 10, 0, 0)', I_6)$. We sample the true trajectory of the target from (22) with $\Delta = 0.01$. The observations arrive every 3.75 seconds, so for the implementation we use the dynamics in (25) with $\Delta = 3.75$. We track the target for the first 375 seconds, so the total number of time steps is $T = 100$. We set $\lambda = 1000$ in (23), $q = 1$ in (24), and the number of particles $N = 1000$. We consider three different values of the parameter: 0.01, 1, and 100. In each setting, the experiment is repeated 100 times. The results are given in Table 5. The APF has smaller RMSEs than the other four methods, which shows that balancing the information from the past particles and the current observation in the APF works well for the tracking problem.


6.3 Linear Gaussian Models

In this example, we consider a slightly higher-dimensional SSM with linear state and observation equations and Gaussian additive noise:

$$y_t \mid x_t = H x_t + u_t, \quad u_t \sim N(0, R);$$
$$x_t \mid x_{t-1} = x_{t-1} + v_t, \quad v_t \sim N(0, Q). \tag{31}$$

The dimensions of $x_t$ and $y_t$ are both set to 50, and the time step is considered up to $T = 300$. The state vector is assumed to evolve according to a random walk, which is a popular choice when the state dynamics is unknown. The matrix $H$ is very sparse, which is often the case in high-dimensional SSMs. We generated the matrix $H$ as follows. In each row, we first chose a position randomly from that row, and then that position together with its two neighboring positions (one neighboring position if the initial position is at the edge) are the nonzero components of that row, and all other components in that row are 0. The value of each nonzero component is drawn from the standard normal distribution. The matrix $H$ used for the experiment is given in Figure 3. The covariance matrix $Q$ for the state noise is chosen to be a banded matrix with 4 on the diagonal, 0.5 on the first off-diagonal, 0.1 on the second off-diagonal, and 0 for all other components. The covariance matrix $R$ for the observation noise is $\frac{1}{4} I_{50}$, where the subscript 50 indicates the dimension of the identity matrix.
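The sparse $H$ and banded $Q$ described above can be generated as follows (a sketch of the stated construction; the random seed is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
d = 50

# sparse H: in each row, one random position and its neighbors are N(0,1), rest 0
H = np.zeros((d, d))
for i in range(d):
    j = rng.integers(d)
    cols = [c for c in (j - 1, j, j + 1) if 0 <= c < d]
    H[i, cols] = rng.standard_normal(len(cols))

# banded Q: 4 on the diagonal, 0.5 and 0.1 on the first two off-diagonals
Q = (4 * np.eye(d) + 0.5 * (np.eye(d, k=1) + np.eye(d, k=-1))
     + 0.1 * (np.eye(d, k=2) + np.eye(d, k=-2)))
R = 0.25 * np.eye(d)   # observation covariance (1/4) I_50
```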
We compared five methods: the APF, IPF, NPF, OPF, and the ensemble Kalman filter (EnKF) proposed by Evensen (1994). The proposal for the NPF is $N(x_t \mid x_{t-1}^{(i)}, Q)$, and its importance weight is $w_t^{(i)} = N(y_t \mid H x_t^{(i)}, \frac{1}{4} I_{50})\, w_{t-1}^{(i)}$. The sampling procedure for the APF is given in Section 5.1 with $f_t(x_{t-1}) = x_{t-1}$. The forecast particle is generated from the state equation, and the likelihood particle based on the observation equation is drawn from

$$q_l(x_t^l \mid y_t) = N\!\left( H'(HH')^{-1} y_t,\; \frac{1}{4}(H'H)^{-1} \right),$$

which is also used as the IPF proposal. We implemented the IPF with no multiple matching, so its importance weight is

$$w_t^{(i)} = N(x_t^{(i)} \mid x_{t-1}^{(i)}, Q)\, w_{t-1}^{(i)}.$$


The proposal for the OPF is

$$q(x_t^{(i)} \mid y_t, x_{t-1}^{(i)}) = N\!\left( x_t \,\Big|\; \Omega_t \left( 4H' y_t + Q^{-1} x_{t-1}^{(i)} \right),\; \Omega_t \right),$$

where $\Omega_t = (4H'H + Q^{-1})^{-1}$, and the importance weight is

$$w_t^{(i)} = \exp\left\{ -\frac{1}{2} (y_t - H x_{t-1}^{(i)})' \left( \frac{1}{4} I_{50} + HQH' \right)^{-1} (y_t - H x_{t-1}^{(i)}) \right\} w_{t-1}^{(i)}. \tag{32}$$

The number of particles $N$ for each experiment is set to 100, and the experiment is repeated $K = 1{,}000$ times. The results of the different algorithms without and with resampling are given in Tables 6 and 7, respectively. The NPF and IPF without resampling cannot give stable results, so they are not included in Table 6. From the results, we can see that the performance of the APF is close to that of the OPF, and both of them have significantly smaller RMSEs than the other methods, including the EnKF. The NPF and IPF give quite large RMSEs, which indicates that it could be very risky to utilize only one equation in the SSM to construct the proposal distribution.

6.4 Lorenz-96 Model

The Lorenz-96 model (Lorenz, 2006) is used to study high-dimensional chaotic dynamics such as the atmosphere. It is an SSM continuous in time with a spatially discrete state space. The Lorenz-96 model is defined by a set of differential equations over time $\tau$:

$$\frac{d\theta_{\tau,j}}{d\tau} \equiv g_j(\theta_{\tau,j}) = (\theta_{\tau,(j+1 \bmod k)} - \theta_{\tau,(j-2 \bmod k)})\, \theta_{\tau,(j-1 \bmod k)} - \theta_{\tau,j} + F, \quad j = 1, \ldots, k,$$

where $\theta_{\tau,j}$ can represent some atmospheric quantity (e.g., pressure, density, or moisture) at the $j$-th position on a single latitude circle at time $\tau$. After discretizing the Lorenz-96 model via the fourth-order Runge-Kutta method with time step $\Delta = 0.05$, we have:

$$\theta_{\tau+\Delta,j} \approx f_j(\theta_{\tau,j}) \equiv \theta_{\tau,j} + \frac{1}{6}(k_{1,j} + 2k_{2,j} + 2k_{3,j} + k_{4,j}), \quad j = 1, \ldots, k, \tag{33}$$

where

$$k_{1,j} = \Delta\, g_j(\theta_{\tau,j}), \quad k_{2,j} = \Delta\, g_j\!\left(\theta_{\tau,j} + \frac{1}{2} k_{1,j}\right), \quad k_{3,j} = \Delta\, g_j\!\left(\theta_{\tau,j} + \frac{1}{2} k_{2,j}\right), \quad k_{4,j} = \Delta\, g_j(\theta_{\tau,j} + k_{3,j}).$$

The spatial relations in the state space are given as $\theta_{\tau,0} := \theta_{\tau,k}$, $\theta_{\tau,-1} := \theta_{\tau,k-1}$, and $\theta_{\tau,k+1} := \theta_{\tau,1}$. For notational convenience, let $x_{t+1,j}$ denote $\theta_{(t+1)\Delta,j}$, the $j$-th spatial component of the discretized state vector at time step $t+1$. Then, we have the discretized nonlinear state equation after adding random perturbations to (33):

$$x_t = \left( f_1(x_{t-1,1}), \ldots, f_k(x_{t-1,k}) \right)' + v_t, \tag{34}$$

where $v_t$ is Gaussian noise with mean 0 and a banded covariance matrix $Q$ (see Figure 4), which incorporates the discretization error and the spatial correlation into the model.
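A sketch of the discretized state map $f = (f_1, \ldots, f_k)$ in (33)-(34), with the $\Delta$ factors absorbed into the $k_{i,j}$ terms as in a standard Runge-Kutta step (our own implementation):

```python
import numpy as np

def lorenz96_drift(theta, F=8.0):
    """g_j in the Lorenz-96 equations, with circular boundary conditions."""
    return ((np.roll(theta, -1) - np.roll(theta, 2)) * np.roll(theta, 1)
            - theta + F)

def rk4_step(theta, delta=0.05, F=8.0):
    """One step of (33): theta at time tau + Delta from theta at time tau."""
    k1 = delta * lorenz96_drift(theta, F)
    k2 = delta * lorenz96_drift(theta + 0.5 * k1, F)
    k3 = delta * lorenz96_drift(theta + 0.5 * k2, F)
    k4 = delta * lorenz96_drift(theta + k3, F)
    return theta + (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
```

Here `np.roll` implements the circular index arithmetic $\theta_{\tau,0} := \theta_{\tau,k}$, $\theta_{\tau,-1} := \theta_{\tau,k-1}$, $\theta_{\tau,k+1} := \theta_{\tau,1}$.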
The dimension of $x_t$ is $k = 40$, and the constant $F = 8$. We consider two different observation equations:

1. full observation: $y_{t,j} = x_{t,j} + u_{t,j}$ for $j = 1, 2, \ldots, k$;

2. half observation: $y_{t,j} = x_{t,2j-1} + u_{t,j}$ for $j = 1, 2, \ldots, k/2$,

where the $u_{t,j}$ follow independent $N(0, 0.05^2)$ distributions.


Four algorithms are implemented for this model: the APF, IPF, NPF and OPF. For the full observation case, the dimension of $y_t$ is 40, and the proposal based on the observation equation is chosen to be $q_l(x_t^l \mid y_t) = N(y_t, 0.05^2 I_{40})$ for the IPF and APF. The importance weight of the IPF is $w_t^{(i)} = p(x_t^{(i)} \mid x_{t-1}^{(i)})\, w_{t-1}^{(i)}$. The APF generates $x_t^{f,(i)}$ from the state equation with a tapered covariance matrix $\tilde{Q} = I_{40} \circ Q$. The APF particles are combined as

$$x_t^{(i)} = \left( \frac{1}{0.05^2} I_{40} + \tilde{Q}^{-1} \right)^{-1} \left( \frac{1}{0.05^2}\, x_t^{l,(i)} + \tilde{Q}^{-1} x_t^{f,(i)} \right),$$

and the importance weight is

$$w_t^{(i)} = \frac{N(y_t \mid x_t^{(i)}, 0.05^2 I_{40})\, p(x_t^{(i)} \mid x_{t-1}^{(i)})}{N(x_t^{l,(i)} \mid y_t, 0.05^2 I_{40})}\, w_{t-1}^{(i)}.$$


For the half observation case, the dimension of $y_t$ is 20, and we can partition the state space into two parts: 1) $x_{t,A} := \{x_{t,2j-1}\}_{j=1,2,\ldots,k/2}$; and 2) $x_{t,B} := \{x_{t,2j}\}_{j=1,2,\ldots,k/2}$. The second state vector $x_{t,B}$ does not depend on $y_t$. Hence, when we implement the APF, only $x_{t,A}$ will be updated by $y_t$, and $x_{t,B}$ will be updated through the state equation. That is, the final particle $x_t^{(i)} = (x_{t,A}^{(i)}, x_{t,B}^{(i)})$ at time $t$ is

$$x_{t,A}^{(i)} = \left( \frac{1}{0.05^2} I_{20} + \tilde{Q}_A^{-1} \right)^{-1} \left( \frac{1}{0.05^2}\, x_{t,A}^{l,(i)} + \tilde{Q}_A^{-1} x_{t,A}^{f,(i)} \right); \quad x_{t,B}^{(i)} = x_{t,B}^{f,(i)},$$

where $\tilde{Q}_A = I_{20} \circ Q_A$ and $Q_A$ is the sub-matrix of $Q$ whose components correspond to $x_{t,A}$. Both $x_{t,A}^f$ and $x_{t,B}^f$ are generated from the augmented state dynamics. The proposal for $x_{t,A}^l$ based on the observation equation is $q_l(x_{t,A}^l \mid y_t) = N(y_t, 0.05^2 I_{20})$. The importance weight is

$$w_t^{(i)} = \frac{p(y_t \mid x_t^{(i)})\, p(x_t^{(i)} \mid x_{t-1}^{(i)})\, p_{\tilde{Q}}(x_t^{f,(i)} \mid x_{t-1}^{(i)})}{q(x_t^{(i)} \mid y_t, x_t^{f,(i)})\, p_{\tilde{Q}}(x_t^{f,(i)} \mid x_{t-1}^{(i)})}\, w_{t-1}^{(i)} = \frac{p(y_t \mid x_{t,A}^{(i)})\, p(x_t^{(i)} \mid x_{t-1}^{(i)})}{q_l(x_{t,A}^{l,(i)} \mid y_t)}\, w_{t-1}^{(i)}.$$

The proposal distribution for the IPF is chosen to be $N(x_{t,A} \mid y_t, 0.05^2 I_{20})\, p(x_{t,B} \mid x_{t-1})$, so the IPF importance weight is $w_t^{(i)} = p(x_{t,A}^{(i)} \mid x_{t-1}^{(i)})\, w_{t-1}^{(i)}$.

For both the full and half observation cases, the NPF utilizes $p(x_t \mid x_{t-1})$ as the proposal distribution, and its weight is $w_t^{(i)} = N(y_t \mid H x_t^{(i)}, 0.05^2 I_{\dim(y_t)})\, w_{t-1}^{(i)}$. The OPF utilizes the proposal in (18) because the model has Gaussian additive noise and a linear observation equation. The importance weight of the OPF is

$$w_t^{(i)} = \exp\left\{ -\frac{1}{2} (y_t - H x_{t-1}^{(i)})' (R + HQH')^{-1} (y_t - H x_{t-1}^{(i)}) \right\} w_{t-1}^{(i)}, \tag{35}$$

where $R + HQH' = 0.05^2 I_{20} + Q_A$ and $H x_{t-1} = x_{t-1,A}$ for the half observation case, and $R + HQH' = 0.05^2 I_{40} + Q$ and $H x_{t-1} = x_{t-1}$ for the full observation case.
The RMSEs of the different methods based on 100 independent experiments are presented in Table 8. The data for each experiment are generated with the number of time steps $T = 1{,}200$. The four particle filtering methods (APF, IPF, NPF, and OPF) are implemented with and without resampling, and the number of particles $N$ for each method is chosen to ensure that the computation time is about the same for all four methods. The results in Table 8 show that the APF performs better than the other three methods. In high-dimensional SSMs, using proposal distributions that generate highly correlated state vectors is not efficient, since it might result in highly skewed importance weights. This partially explains why the NPF performs poorly. The flexible design of the APF enables us to use the tapered covariance matrix, and such a localization technique enhances the performance of particle filters in high-dimensional SSMs. The IPF and OPF work well when the observations are used to construct the proposal distribution for all components of the state vector (the full observation case). When some components of the state vector are sampled from a distribution without using the observations (IPF), or when highly correlated state vectors are sampled from a distribution that contains only partial observations (OPF), the IPF and OPF proposals are much less efficient than the APF proposal (the half observation case).

7 Discussion
Because the particle filter is a sequential importance sampling algorithm, choosing a good proposal distribution is the key to the efficiency of the filtering algorithm. In this paper, we develop the augmented particle filter, whose proposal distribution combines information from both the observation and state equations. The APF can be viewed as an extension of the OPF because the APF performs similarly to the OPF when the OPF is available, but the APF can be applied to more general state space models than the OPF. The simulation studies show that the APF performs better than the IPF and NPF in general.

The APF has the potential to be used in high-dimensional state space models. Its efficient proposal, combined with other dimension reduction techniques, can potentially avoid the curse of dimensionality discussed in Bengtsson et al. (2008).

Appendix A: Proof of Theorem 4.1


We first consider the target distribution $p(x_{0:t}, x_{1:t}^f \mid y_{1:t})$. Recall that given $x_{t-1}$, $x_t^f$ depends only on $x_{t-1}$ through the state equation $p(x_t^f \mid x_{t-1})$, and it does not depend on other state vectors or observations. Thus, we have

$$\begin{aligned} p(x_{0:t}, x_{1:t}^f \mid y_{1:t}) &= \frac{p(x_{0:t}, x_{1:t}^f, y_{1:t})}{p(y_t \mid y_{1:t-1})\, p(y_{1:t-1})} \\ &= \frac{p(y_t \mid x_t)\, p(x_t \mid x_{t-1})\, p(x_t^f \mid x_{t-1})}{p(y_t \mid y_{1:t-1})} \cdot \frac{p(x_{0:t-1}, x_{1:t-1}^f, y_{1:t-1})}{p(y_{1:t-1})} \\ &= \frac{p(y_t \mid x_t)\, p(x_t \mid x_{t-1})\, p(x_t^f \mid x_{t-1})}{p(y_t \mid y_{1:t-1})}\, p(x_{0:t-1}, x_{1:t-1}^f \mid y_{1:t-1}). \end{aligned}$$

The proposal distribution is $q(x_0^{(i)}) \prod_{k=1}^t q(x_k^{(i)}, x_k^{f,(i)} \mid y_k, x_{k-1}^{(i)})$. So the importance weight is

$$\begin{aligned} w_t^{(i)} &= \frac{p(x_{0:t}^{(i)}, x_{1:t}^{f,(i)} \mid y_{1:t})}{q(x_0^{(i)}) \prod_{k=1}^t q(x_k^{(i)}, x_k^{f,(i)} \mid y_k, x_{k-1}^{(i)})} \\ &\propto \frac{p(y_t \mid x_t^{(i)})\, p(x_t^{(i)} \mid x_{t-1}^{(i)})\, p(x_t^{f,(i)} \mid x_{t-1}^{(i)})}{q(x_t^{(i)}, x_t^{f,(i)} \mid y_t, x_{t-1}^{(i)})} \cdot \frac{p(x_{0:t-1}^{(i)}, x_{1:t-1}^{f,(i)} \mid y_{1:t-1})}{q(x_0^{(i)}) \prod_{k=1}^{t-1} q(x_k^{(i)}, x_k^{f,(i)} \mid y_k, x_{k-1}^{(i)})} \\ &= \frac{p(y_t \mid x_t^{(i)})\, p(x_t^{(i)} \mid x_{t-1}^{(i)})\, p(x_t^{f,(i)} \mid x_{t-1}^{(i)})}{q(x_t^{(i)} \mid y_t, x_t^{f,(i)})\, p(x_t^{f,(i)} \mid x_{t-1}^{(i)})}\, w_{t-1}^{(i)} \\ &= \frac{p(y_t \mid x_t^{(i)})\, p(x_t^{(i)} \mid x_{t-1}^{(i)})}{q_l(x_t^{l,(i)} \mid y_t)\, |(\Sigma_t^{-1} + \tilde{Q}_t^{-1})^{-1} \Sigma_t^{-1}|^{-1}}\, w_{t-1}^{(i)} \\ &\propto \frac{p(y_t \mid x_t^{(i)})\, p(x_t^{(i)} \mid x_{t-1}^{(i)})}{q_l(x_t^{l,(i)} \mid y_t)}\, w_{t-1}^{(i)}. \end{aligned}$$

Therefore the recursive formula for the importance weight in Equation (7) is valid.

Appendix B: Theoretical Properties of the APF


The APF relocates the forecast particle based on the current observation, and utilizes the augmented space to facilitate the calculation of the importance weights. To investigate the theoretical properties of the APF and compare it with the IPF and NPF, it is easier to look at the marginal proposal of $x_t^{(i)}$ in the APF, which is given in (14). We will show that the marginalized APF proposal is closer to the optimal one than the IPF and NPF proposals.

We consider the following general non-linear SSM with non-Gaussian measurement noise:

$$y_t \mid x_t = h_t(x_t, u_t), \quad u_t \sim F_t;$$
$$x_t \mid x_{t-1} = f_t(x_{t-1}) + v_t, \quad v_t \sim N(0, Q_t). \tag{36}$$

The APF generates particles as follows. At the initial step $t = 0$, draw $x_0^{(i)}$ from $q(x_0)$ for $i = 1, \ldots, N$. For $t = 1, \ldots, T$, we repeat the following steps:

1. Draw $x_t^{f,(i)}$ from $p(x_t^f \mid x_{t-1}^{(i)})$.

2. Let $\tilde{x}_t$ be a mode of $p(y_t \mid x_t)$, and let $\dot{h}_t(\tilde{x}_t, u_t)$ denote the derivative of $h_t(x_t, u_t)$ with respect to $x_t$ at $\tilde{x}_t$. Evaluate $\tilde{H}_t := E[\dot{h}_t(\tilde{x}_t, u_t) \mid \tilde{x}_t]$ and $\tilde{R}_t := \mathrm{Var}[h_t(\tilde{x}_t, u_t) \mid \tilde{x}_t]$.

3. Draw $x_t^{l,(i)}$ from the proposal distribution $q_l(x_t^l \mid y_t) = N(x_t^l \mid \tilde{x}_t, \Sigma_t)$, where $\Sigma_t = (\tilde{H}_t' \tilde{R}_t^{-1} \tilde{H}_t)^{-1}$.

4. Then, combine the two particles from Steps 1 and 3 as

$$x_t^{(i)} = Q_t (\Sigma_t + Q_t)^{-1} x_t^{l,(i)} + \Sigma_t (\Sigma_t + Q_t)^{-1} x_t^{f,(i)}.$$

The marginalized APF proposal is $q(x_t \mid x_{t-1}^{(i)}, y_t) = N(x_t \mid \mu_t^{(i)}, \Omega_t)$, where $\Omega_t = (\Sigma_t^{-1} + Q_t^{-1})^{-1}$ and $\mu_t^{(i)} = \Omega_t (\Sigma_t^{-1} \tilde{x}_t + Q_t^{-1} f_t(x_{t-1}^{(i)}))$. For the IPF, we use $q(x_t \mid y_t) = N(x_t \mid \tilde{x}_t, \Sigma_t)$, which is the proposal for the likelihood particle in the APF. The NPF uses the state equation to generate particles.
In order to compare the APF, NPF and IPF with the optimal proposal, we further assume that $p(y_t \mid x_t)$ can be represented by a mixture of normal densities, i.e., $p(y_t \mid x_t) = \sum_k p_{kt}\, N(x_t \mid \mu_{kt}, R_{kt})$, so that the optimal proposal is $p(x_t \mid x_{t-1}^{(i)}, y_t) = \sum_k p_{kt}\, N(x_t \mid \mu_{kt}^{(i)}, B_{kt})$, where $B_{kt}^{-1} = R_{kt}^{-1} + Q_t^{-1}$ and $\mu_{kt}^{(i)} = B_{kt} R_{kt}^{-1} \mu_{kt} + B_{kt} Q_t^{-1} f(x_{t-1}^{(i)})$. The mixture normal representation of the optimal proposal does not mean this form is available for sampling in practice.

We use the Kullback-Leibler (KL) divergence between the optimal proposal $p(x_t \mid x_{t-1}^{(i)}, y_t)$ and the APF, NPF and IPF proposals to compare the three filters. The KL divergence between $p(x_t \mid x_{t-1}^{(i)}, y_t)$ and some other proposal $q(x_t)$ is

$$D[p(x_t \mid x_{t-1}^{(i)}, y_t)\, \| \, q(x_t)] = E[\log p(x_t \mid x_{t-1}^{(i)}, y_t)] - E[\log q(x_t)],$$

where the expectation is taken with respect to $p(x_t \mid x_{t-1}^{(i)}, y_t) = \sum_k p_{kt}\, N(x_t \mid \mu_{kt}^{(i)}, B_{kt})$. The first term on the right-hand side is the same for all three filters, so we focus on the second term. Let $D_{\mathrm{APF}} = -E[\log q(x_t \mid x_{t-1}^{(i)}, y_t)]$, $D_{\mathrm{NPF}} = -E[\log p(x_t \mid x_{t-1}^{(i)})]$, and $D_{\mathrm{IPF}} = -E[\log q(x_t \mid y_t)]$. Then

$$D_{\mathrm{APF}} = \sum_k \frac{p_{kt}}{2} \left[ \mathrm{tr}(\Omega_t^{-1} B_{kt}) + \log |\Omega_t| + (\mu_t^{(i)} - \mu_{kt}^{(i)})' \Omega_t^{-1} (\mu_t^{(i)} - \mu_{kt}^{(i)}) \right];$$
$$D_{\mathrm{NPF}} = \sum_k \frac{p_{kt}}{2} \left[ \mathrm{tr}(Q_t^{-1} B_{kt}) + \log |Q_t| + (f(x_{t-1}^{(i)}) - \mu_{kt}^{(i)})' Q_t^{-1} (f(x_{t-1}^{(i)}) - \mu_{kt}^{(i)}) \right];$$
$$D_{\mathrm{IPF}} = \sum_k \frac{p_{kt}}{2} \left[ \mathrm{tr}(\Sigma_t^{-1} B_{kt}) + \log |\Sigma_t| + (\tilde{x}_t - \mu_{kt}^{(i)})' \Sigma_t^{-1} (\tilde{x}_t - \mu_{kt}^{(i)}) \right].$$

There are three terms on the right-hand side of each of the above three equations. We first compare the first two terms, and then compare the last term.
To compare the first two terms, we use the fact that the trace of a matrix $A$ can be approximated up to the first order as $\mathrm{tr}(A) \approx \log |I + A|$. Therefore $\mathrm{tr}(\Omega_t^{-1} B_{kt}) + \log |\Omega_t| \approx \log |\Omega_t + B_{kt}|$, $\mathrm{tr}(Q_t^{-1} B_{kt}) + \log |Q_t| \approx \log |Q_t + B_{kt}|$, and $\mathrm{tr}(\Sigma_t^{-1} B_{kt}) + \log |\Sigma_t| \approx \log |\Sigma_t + B_{kt}|$. Since

$$\Omega_t + B_{kt} = (\Sigma_t^{-1} + Q_t^{-1})^{-1} + B_{kt} = Q_t + B_{kt} - Q_t (\Sigma_t + Q_t)^{-1} Q_t,$$

we have

$$|\Omega_t + B_{kt}| \le |Q_t + B_{kt}|.$$

Similarly, we have $|\Omega_t + B_{kt}| \le |\Sigma_t + B_{kt}|$. So the first two terms on the right-hand side of $D_{\mathrm{APF}}$ are less than or equal to the corresponding two terms in $D_{\mathrm{NPF}}$ and $D_{\mathrm{IPF}}$.

Next, we show the same conclusion for the last term $\sum_k p_{kt} (\mu_t^{(i)} - \mu_{kt}^{(i)})' \Omega_t^{-1} (\mu_t^{(i)} - \mu_{kt}^{(i)})$ in $D_{\mathrm{APF}}$. To make the comparison easier, we assume that for some $\gamma \in (0, 1)$, $\Sigma_t = \frac{\gamma}{1-\gamma} Q_t$ and $R_{kt} = \frac{\gamma}{1-\gamma} Q_t$, so that $\mu_t^{(i)} = \gamma f(x_{t-1}^{(i)}) + (1-\gamma)\tilde{x}_t$, $\mu_{kt}^{(i)} = \gamma f(x_{t-1}^{(i)}) + (1-\gamma)\mu_{kt}$, and $\Omega_t = \gamma Q_t$. If $p(y_t \mid x_t)$ is symmetric

P
and centered at xt = xt , then k pkt ( xt kt )0 Q1 (i)
t ( xt f (xt1 )) = 0. So we have,

X
pkt (t(i) kt(i) )0 [t ]1 (t(i) kt(i) ) (37)
k
(1 )2 X
= pkt ( xt kt )0 Q1
t ( xt kt );
k
X
(i)
pkt ( f (xt1 ) kt(i) )0 Q1 (i) (i)
t ( f (xt1 ) kt ) (38)
k
X
(i)
= (1 )2 ( xt f (xt1 ))0 Q1 (i)
t ( xt f (xt1 )) + (1 )
2
pkt ( xt kt )0 Q1
t ( xt kt );
k
X
pkt ( xt kt(i) )0 t1 ( xt kt(i) ) (39)
k
(1 )3 X
(i)
= (1 )( xt f (xt1 ))0 Q1 (i)
t ( xt f (xt1 )) + pkt ( xt kt )0 Q1
t ( xt kt ).
k

In many SSMs with fast evolving hidden processes, the particles f(x_{t-1}^{(i)}) are not tightly distributed around \bar{x}_t, so A_t^{(i)} \equiv (\bar{x}_t - f(x_{t-1}^{(i)}))' Q_t^{-1} (\bar{x}_t - f(x_{t-1}^{(i)})) is often much greater than B_t \equiv \frac{(1-\alpha)^2}{\alpha} \sum_k p_{kt} (\bar{x}_t - \mu_{kt})' Q_t^{-1} (\bar{x}_t - \mu_{kt}). In particular, if A_t^{(i)} \geq \frac{1}{1-\alpha} B_t, then the last term of D_{APF} in (37) is less than or equal to the corresponding terms of D_{NPF} and D_{IPF} in (38) and (39). Combining this conclusion with the one for the first two terms in D_{APF}, we have D_{APF} \leq \min(D_{NPF}, D_{IPF}), which implies that the KL divergence between the optimal proposal and the APF proposal is the smallest among the three filters.
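The identities (37)-(39) and the sufficient condition can be seen in action with a short scalar sketch. All values below (\alpha, Q_t, the mixture, f(x_{t-1}^{(i)}) and \bar{x}_t) are made-up illustration values, with the mixture means placed symmetrically around \bar{x}_t so that the cross term vanishes.

```python
import numpy as np

# Arbitrary scalar illustration values.
alpha, Q = 0.3, 1.0
f_prev, xbar = 2.0, 0.0                    # fast-moving state: f(x_{t-1}) far from xbar
p = np.array([0.5, 0.5])
mu = np.array([-0.4, 0.4])                 # symmetric around xbar, so cross term is 0

mu_t = alpha * f_prev + (1 - alpha) * xbar
mu_k = alpha * f_prev + (1 - alpha) * mu
Lam, Sigma = alpha * Q, alpha / (1 - alpha) * Q

term37 = np.sum(p * (mu_t - mu_k) ** 2 / Lam)    # last term of D_APF
term38 = np.sum(p * (f_prev - mu_k) ** 2 / Q)    # last term of D_NPF
term39 = np.sum(p * (xbar - mu_k) ** 2 / Sigma)  # last term of D_IPF

A = (xbar - f_prev) ** 2 / Q                             # A_t^{(i)}
B = (1 - alpha) ** 2 / alpha * np.sum(p * (xbar - mu) ** 2 / Q)  # B_t

assert np.isclose(term37, B)               # identity (37)
assert A >= B / (1 - alpha)                # the sufficient condition holds here
assert term37 <= min(term38, term39)       # hence (37) is the smallest
```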

Appendix C: Proof of Proposition 5.1


The particle x_t^{(i)} in the APF is a linear combination of x_t^{f,(i)} and x_t^{l,(i)} as given in (17). Steps 1 and 2 of the APF indicate that x_t^{f,(i)} follows N(f_t(x_{t-1}^{(i)}), Q_t) and x_t^{l,(i)} follows N(H_t' (H_t H_t')^{-1} y_t, \Sigma_t), and the particles x_t^{f,(i)} and x_t^{l,(i)} are generated independently; therefore x_t^{(i)}, which is a linear combination of x_t^{f,(i)} and x_t^{l,(i)}, also follows a normal distribution. So we only need to compute the mean and variance of the marginal distribution of x_t^{(i)} given y_t and x_{t-1}^{(i)}, as follows.


E(x_t^{(i)} | y_t, x_{t-1}^{(i)})
    = E\{(\Sigma_t^{-1} + Q_t^{-1})^{-1} (\Sigma_t^{-1} x_t^{l,(i)} + Q_t^{-1} x_t^{f,(i)}) | y_t, x_{t-1}^{(i)}\}
    = (\Sigma_t^{-1} + Q_t^{-1})^{-1} \{H_t' R_t^{-1} H_t H_t' (H_t H_t')^{-1} y_t + Q_t^{-1} f_t(x_{t-1}^{(i)})\}
    = (\Sigma_t^{-1} + Q_t^{-1})^{-1} \{H_t' R_t^{-1} y_t + Q_t^{-1} f_t(x_{t-1}^{(i)})\}
    = \mu_t^{(i)};

Var(x_t^{(i)} | y_t, x_{t-1}^{(i)})
    = Var\{(\Sigma_t^{-1} + Q_t^{-1})^{-1} (\Sigma_t^{-1} x_t^{l,(i)} + Q_t^{-1} x_t^{f,(i)}) | y_t, x_{t-1}^{(i)}\}
    = (\Sigma_t^{-1} + Q_t^{-1})^{-1} \{\Sigma_t^{-1} \Sigma_t \Sigma_t^{-1} + Q_t^{-1} Q_t Q_t^{-1}\} (\Sigma_t^{-1} + Q_t^{-1})^{-1}
    = (\Sigma_t^{-1} + Q_t^{-1})^{-1}
    = \Lambda_t.


Thus, the marginal proposal distribution q(x_t^{(i)} | y_t, x_{t-1}^{(i)}) in the APF is N(\mu_t^{(i)}, \Lambda_t), which is the same as the OPF proposal (18).
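Proposition 5.1 can also be checked by simulation: draw many forecast/likelihood particle pairs, combine them as in (17), and compare the empirical mean and covariance of the combined particles with \mu_t^{(i)} and \Lambda_t. The sketch below does this for one arbitrary two-dimensional configuration; H_t, R_t, Q_t, f_t(x_{t-1}^{(i)}) and y_t are illustration values, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(2)

# Arbitrary 2-D illustration values.
H = np.array([[1.0, 0.5], [0.0, 1.0]])   # observation matrix
R = 0.5 * np.eye(2)                      # observation noise covariance
Q = np.eye(2)                            # state noise covariance
f_prev = np.array([1.0, -1.0])           # f_t(x_{t-1}^{(i)})
y = np.array([0.2, 0.8])

Sigma = np.linalg.inv(H.T @ np.linalg.inv(R) @ H)
xbar = H.T @ np.linalg.inv(H @ H.T) @ y

n = 200_000
x_f = rng.multivariate_normal(f_prev, Q, size=n)     # forecast particles
x_l = rng.multivariate_normal(xbar, Sigma, size=n)   # likelihood particles
W = Q @ np.linalg.inv(Sigma + Q)                     # weight on x_l, as in (17)
x = x_l @ W.T + x_f @ (np.eye(2) - W).T              # combined APF particles

Lam = np.linalg.inv(np.linalg.inv(Sigma) + np.linalg.inv(Q))
mu = Lam @ (np.linalg.inv(Sigma) @ xbar + np.linalg.inv(Q) @ f_prev)

print(np.allclose(x.mean(axis=0), mu, atol=0.01))    # empirical mean ~ mu_t^{(i)}
print(np.allclose(np.cov(x.T), Lam, atol=0.01))      # empirical cov  ~ Lambda_t
```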


Figure 1: A graphical illustration of the state space model.


Figure 2: An illustration of the augmented state space model.


Figure 3: The sparse matrix H used in model (31).

Figure 4: A graphical representation of Q (up to a scaling constant). Each component is converted to the gray scale with white being 1 and black being 0.


Table 1: The number of particles and the number of matchings chosen for each method in the
simulations to ensure approximately the same computation time.

Method   CPU time (sec)   # of particles   # of matchings
APF      0.062            1,075            1
IPF      0.062            1,150            1
APF      0.063            435              3
IPF      0.063            670              6
NPF      0.063            1,850            1
Aux      0.063            1,475            1
EKF      0.012            --               --
UKF      0.028            --               --


Table 2: The RMSEs and their standard errors (in parentheses) of the six algorithms for
different combinations of the two model parameters (rows and columns, respectively).

                     = 2              = 4              = 8              = 16
= 0.125   APF        3.481 (0.008)    4.931 (0.008)    8.783 (0.014)    17.364 (0.025)
          IPF        3.697 (0.008)    5.120 (0.008)    8.902 (0.014)    17.431 (0.025)
          NPF        3.514 (0.008)    5.353 (0.013)    10.331 (0.024)   22.508 (0.046)
          Aux        6.389 (0.019)    7.575 (0.020)    11.210 (0.024)   21.430 (0.039)
          EKF        28.045 (0.166)   43.883 (0.171)   89.939 (0.396)   225.391 (1.527)
          UKF        24.305 (0.072)   27.575 (0.088)   39.297 (0.186)   113.148 (0.797)
= 0.25    APF        3.609 (0.008)    4.981 (0.008)    8.802 (0.014)    17.382 (0.026)
          IPF        3.815 (0.008)    5.161 (0.008)    8.916 (0.014)    17.455 (0.026)
          NPF        3.577 (0.008)    5.175 (0.011)    9.679 (0.021)    21.230 (0.045)
          Aux        5.143 (0.015)    6.820 (0.017)    10.959 (0.022)   21.348 (0.039)
          EKF        23.581 (0.131)   32.943 (0.150)   64.335 (0.270)   166.614 (0.956)
          UKF        17.176 (0.039)   21.391 (0.054)   31.897 (0.106)   88.747 (0.457)
= 0.5     APF        3.769 (0.007)    5.081 (0.008)    8.848 (0.014)    17.330 (0.025)
          IPF        3.997 (0.008)    5.253 (0.008)    8.961 (0.014)    17.403 (0.025)
          NPF        3.702 (0.007)    5.162 (0.010)    9.342 (0.019)    19.988 (0.042)
          Aux        4.398 (0.012)    6.361 (0.015)    10.730 (0.020)   21.217 (0.037)
          EKF        19.983 (0.092)   26.062 (0.137)   44.953 (0.199)   117.943 (0.537)
          UKF        12.543 (0.025)   15.986 (0.034)   25.487 (0.066)   69.504 (0.264)
= 1       APF        4.015 (0.007)    5.249 (0.008)    8.861 (0.014)    17.359 (0.026)
          IPF        4.241 (0.007)    5.420 (0.008)    8.979 (0.014)    17.425 (0.025)
          NPF        3.908 (0.006)    5.257 (0.008)    9.136 (0.017)    19.076 (0.038)
          Aux        4.117 (0.008)    6.006 (0.014)    10.469 (0.019)   21.117 (0.036)
          EKF        15.390 (0.062)   22.274 (0.102)   31.815 (0.138)   80.913 (0.330)
          UKF        9.522 (0.019)    11.447 (0.024)   19.331 (0.044)   53.772 (0.143)


Table 3: The RMSEs and their standard errors (in parentheses) of the APF (top) and IPF (bottom)
with resampling and multiple matching for different combinations of the two model parameters.

                     = 2             = 4             = 8             = 16
= 0.125   APF        3.468 (0.004)   4.959 (0.005)   8.822 (0.008)   17.387 (0.015)
          IPF        3.660 (0.004)   5.109 (0.005)   8.905 (0.008)   17.422 (0.015)
= 0.25    APF        3.593 (0.004)   5.029 (0.005)   8.854 (0.008)   17.379 (0.015)
          IPF        3.770 (0.004)   5.148 (0.005)   8.912 (0.008)   17.409 (0.015)
= 0.5     APF        3.780 (0.004)   5.149 (0.005)   8.901 (0.008)   17.376 (0.015)
          IPF        3.941 (0.004)   5.247 (0.005)   8.946 (0.008)   17.401 (0.015)
= 1       APF        4.017 (0.004)   5.320 (0.005)   8.952 (0.008)   17.412 (0.015)
          IPF        4.167 (0.004)   5.399 (0.004)   8.979 (0.008)   17.416 (0.015)


Table 4: The RMSEs and their standard errors (in parentheses) of the UPF (top) and ImPF (bottom)
for different combinations of the two model parameters.

                     = 2              = 4              = 8              = 16
= 0.125   UPF        6.630 (0.053)    8.110 (0.044)    13.328 (0.053)   24.963 (0.083)
          ImPF       10.582 (0.007)   11.163 (0.011)   13.504 (0.018)   20.851 (0.033)
= 0.25    UPF        5.141 (0.040)    6.755 (0.038)    12.187 (0.051)   24.600 (0.087)
          ImPF       10.460 (0.010)   11.136 (0.015)   13.924 (0.027)   22.201 (0.054)
= 0.5     UPF        4.523 (0.028)    6.011 (0.032)    10.909 (0.046)   23.631 (0.086)
          ImPF       11.256 (0.027)   12.428 (0.028)   15.054 (0.036)   23.712 (0.069)
= 1       UPF        4.216 (0.017)    5.656 (0.023)    10.007 (0.038)   22.137 (0.084)
          ImPF       8.581 (0.052)    12.688 (0.043)   16.681 (0.048)   25.867 (0.075)


Table 5: The RMSEs and their standard errors of five filtering methods for model (25).

                  RMSE       se(RMSE)   CPU Time (sec)
= 0.01   APF      1.770      1.101      1.512
         IPF      3.183      1.069      1.053
         NPF      2820.432   431.089    0.926
         UPF      4.889      2.924      282.140
         ImPF     188.753    6.125      21991.150
= 1      APF      1.925      1.075      1.591
         IPF      5.670      0.981      1.083
         NPF      2590.967   1215.372   0.995
         UPF      3.547      0.803      281.150
         ImPF     28.650     1.325      28973.821
= 100    APF      9.494      1.216      1.535
         IPF      43.987     2.565      1.056
         NPF      520.428    349.574    0.966
         UPF      11.042     0.595      274.880
         ImPF     16.566     1.104      29715.241


Table 6: The RMSEs and their standard errors of the APF, OPF and EnKF without resampling for
model (31).
RMSE se(RMSE) CPU Time (sec)
APF 38.704 0.364 11.540
OPF 37.582 0.355 7.234
EnKF 46.098 0.534 3.578


Table 7: The RMSEs and their standard errors of four filtering methods with resampling for model
(31).
RMSE se(RMSE) CPU Time (sec)
APF 35.870 0.411 11.943
OPF 35.156 0.450 7.537
IPF 941.913 8.519 5.489
NPF 101.322 0.506 2.683


Table 8: The RMSEs and their standard errors of different methods for model (34).
Half Observation RMSE se(RMSE) N CPU Time (sec)
APF (without resampling) 5.230 0.043 50 1.460
IPF (without resampling) 18.341 0.064 114 1.477
NPF (without resampling) 32.113 0.016 155 1.491
OPF (without resampling) 25.403 0.028 120 1.513
APF (with resampling) 3.923 0.040 50 1.578
IPF (with resampling) 17.035 0.073 110 1.588
NPF (with resampling) 31.453 0.017 150 1.629
OPF (with resampling) 25.578 0.028 110 1.590
Full Observation RMSE se(RMSE) N CPU Time (sec)
APF (without resampling) 0.368 0.000 50 1.579
IPF (without resampling) 0.447 0.000 175 1.641
NPF (without resampling) 32.109 0.016 175 1.693
OPF (without resampling) 0.730 0.031 125 1.608
APF (with resampling) 0.360 0.001 50 1.682
IPF (with resampling) 0.396 0.001 160 1.669
NPF (with resampling) 31.265 0.018 155 1.711
OPF (with resampling) 0.546 0.032 120 1.677
