MultiSLEX JASA2005

SLEX Analysis of Multivariate Nonstationary
Time Series
Hernando O MBAO, Rainer VON S ACHS, and Wensheng G UO
We develop a procedure for analyzing multivariate nonstationary time series using the SLEX library (smooth localized complex exponentials), which is a collection of bases, each basis consisting of waveforms that are orthogonal and time-localized versions of the Fourier
complex exponentials. Under the SLEX framework, we build a family of multivariate models that can explicitly characterize the timevarying spectral and coherence properties. Every model has a spectral representation in terms of a unique SLEX basis. Before selecting
a model, we first decompose the multivariate time series into nonstationary components with uncorrelated (nonredundant) spectral information. The best SLEX model is selected using the penalized log energy criterion, which we derive in this article to be the KullbackLeibler
distance between a model and the SLEX principal components of the multivariate time series. The model selection criterion takes into
account all of the pairwise cross-correlation simultaneously in the multivariate time series. The proposed SLEX analysis gives results that
are easy to interpret, because it is an automatic time-dependent generalization of the classical Fourier analysis of stationary time series.
Moreover, the SLEX method uses computationally efficient algorithms and hence is able to provide a systematic framework for extracting
spectral features from a massive dataset. We illustrate the SLEX analysis with an application to a multichannel brain wave dataset recorded
during an epileptic seizure.
KEY WORDS: Multivariate nonstationary time series; SLEX model; SLEX principal components; SLEX transform; Time-frequency
analysis; Time-varying spectral matrix.
1. INTRODUCTION
Many empirical multivariate time series are nonstationary.
For example, electroencephalograms (EEGs) recorded during
an epileptic seizure display amplitudes and spectral distribution that vary over time (Fig. 1). These EEGs cannot be analyzed adequately using classical Fourier methods because their
autospectral and cross-spectral properties change with time.
In addition to nonstationarity, the massiveness of EEG datasets
presents yet another level of complexity, because it is not uncommon to record EEGs at dozens of channels, each with
hundreds of thousands of observations. Finally, these EEGs
can be highly correlated, and hence one would need to develop a methodology that can efficiently extract nonredundant
information.
The goal of this article is to develop a systematic framework
for analyzing multivariate nonstationary time series and provide theoretical justification so that we can perform rigorous
analysis of the dataset. The primary tool for our work is the
SLEX (smooth localized complex exponentials) library, which
is a collection of bases, each of which consists of SLEX waveforms that are time-localized and orthogonal generalizations of
the Fourier complex exponentials.
The first task is to develop a family of models, using the
SLEX library, that can explicitly characterize the time-changing
spectral and coherence features of the multivariate time series. Every model in the family has a spectral representation
in terms of a unique SLEX basis. Thus the SLEX analysis is
an appealing approach to nonstationary time series, because it
is a time-dependent analogue of Fourier analysis for stationary
processes.
Hernando Ombao is Assistant Professor, Department of Statistics and Department of Psychology, University of Illinois at Urbana-Champaign, Champaign, IL 61822 (E-mail: ombao@uiuc.edu). His research was supported in part
by National Science Foundation grants DMS-01-02511 and DMS-04-05243.
Rainer von Sachs is Professor, Institut de Statistique, Universit Catholique
de Louvain, Belgium. His research was supported in part by the Belgian government via the Projet dActions de Recherche Concertes No. 98/03-217
and the IAP research network nr. P5/24. Wensheng Guo is Assistant Professor, Department of Biostatistics, University of Pennsylvania. His research was
supported in part by National Institutes of Mental Health grant 62298 and
CA 84438. The authors thank the editor, associate editor, and referees for the
constructive comments and suggestions.
Next we develop a criterion for selecting the best SLEX

model (or, equivalently, segmentation of the multivariate time
series). Before model selection, we address the problem of
high dimensionality and multicollinearity in the multivariate
time series. Our approach is not to perform many separate
pairwise bivariate analyses, because this can be overwhelming to the user. Moreover, it does not provide the big picture
of how all components of the multivariate time series simultaneously interact with each other. Rather, our approach is to
systematically extract a set of nonredundant spectral information that will be used in model selection and further analysis.
This leads to the first main contribution of this article. We develop a method of feature extraction by applying a time-varying
eigenvalueeigenvector decomposition of the SLEX spectral
density matrix. The outputs of this step are the zero-coherency
SLEX principal components whose spectra vary over time.
Our model selection criterion is based on log energy, which
we derive, as the second main contribution of this article, to
be the KullbackLeibler (KL) distance between a SLEX model
and the unknown process that generated the time series dataset.
The nonredundant spectral information of the SLEX principal
components will be used in model selection. Thus the criterion
takes into account the full information in the spectral matrix,
that is, the complete autocorrelation and cross-correlation structure in the signals. In practice, the log energy criterion tends to
select the model with very small time blocks. To prevent this
unnecessary splitting of stationary blocks from happening too
often, we add a complexity penalty, resulting in a complexitypenalized log energy criterion.
We now highlight some work in the statistics and signal processing literature that uses localized libraries. Nason,
von Sachs, and Kroisandt (NvSK) (2000) proposed a model
that uses wavelets as stochastic building blocks. They defined
a wavelet spectra that is a time-scale decomposition of the nonstationary signal. Ombao, Raz, von Sachs, and Guo (ORvSG)
(2002) developed a model that gives a spectral representation using the SLEX waveforms. The SLEX model admits a
519
2005 American Statistical Association

Journal of the American Statistical Association
June 2005, Vol. 100, No. 470, Theory and Methods
DOI 10.1198/016214504000001448
520
Journal of the American Statistical Association, June 2005
Figure 1. EEG Recorded During an Epileptic Seizure. T = 8,192. Sampling rate is 100 Hz. Only 8 EEG plots are shown, although the analysis
was conducted on the dataset that consists of p = 18 channels. The EEG plots in the left column are recordings from the left side of the brain:
T3 (left temporal lobe), F3 (left frontal), C3 (left central), and P3 (left parietal). The plots in the right column are recordings from the right side of the
brain: T4 (right temporal lobe), F4 (right frontal), C4 (right central), and P4 (right parietal).
spectrum that is a decomposition of power across time and

frequency. Thus it is a time-dependent analog of the Crmer
representation for stationary processes. Both models are univariate. In this article we develop a multivariate model that
takes into account the cross-correlation (coherence) structure
of the signals. The NvSK and ORvSG models assume that the
basis for signal representation is already known. In this article we develop a data-adaptive procedure for selecting the best
basis. Our criterion may be viewed as a generalization of that
used by Ombao, Raz, von Sachs, and Malow (ORvSM) (2001).
However, our current approach is not based on several separate
bivariate analyses. It explicitly takes into account the coherence
between all components of the multivariate time series simultaneously.
The application of the KL criterion, under the context of
localized libraries, is not confined to model selection. Saito
(1994a) developed a KL criterion for selecting a localized discriminant basis (LBD) that can best illuminate the differences
between classes of time series. Huang, Ombao, and Stoffer
(2004) adapted the LDB idea to the SLEX library and developed a classification criterion that is consistent. We also
mention the minimum description length (MDL) principle as
a possible criterion for model selection for multivariate time series. The MDL principle has been used in the contexts of noise
suppression and signal compression by Saito (1994b) and signal estimation by Moulin (1996).
The rest of the article is organized as follows. In Section 2
we discuss the essential ideas of the SLEX library. In Section 3
we present our new multivariate nonstationary SLEX model. In

Section 4 we develop the theoretical motivation for our model
selection criterion, and in Section 5 discuss the SLEX principal
components analysis (PCA) as a solution to the inherent problem of high dimensionality. In Section 6 we present the complete algorithm for selecting a model and estimating the spectral
density matrix. In Section 7, we illustrate the proposed SLEX
analysis by application to an 18-channel epileptic seizure EEG
dataset.
2. THE SLEX TRANSFORM
2.1 The SLEX Waveforms
Fourier analysis is the classical approach to investigating the spectral properties of stationary time series. The
Fourier basis functions, however, cannot adequately represent
processes whose spectral properties evolve with time. The historical approach to studying nonstationary time series is to
use tapered or windowed Fourier exponentials (Daubechies
1992). A windowed Fourier function is of the form F (u) =
(u) exp(i2u), where is a taper with compact support
and (1/2, 1/2]. Although the windowed Fourier functions are localized in time, they are in general nonorthogonal.
In fact, the BalianLow theorem says that no smooth window
exists such that the windowed Fourier basis functions are simultaneously orthogonal and localized (Wickerhauser 1994).
Orthogonality is an important property. It makes models mathematically elegant and facilitates the theoretical development
Ombao, von Sachs, and Guo: SLEX Analysis
521
of a model. Moreover, orthogonal transforms preserve the energy of the time series and allow use of the best basis algorithm
(BBA) of Coifman and Wickerhauser (1992), which is computationally efficient and hence facilitates the analysis of massive
datasets.
We develop a procedure that uses the SLEX basis functions that are simultaneously orthogonal and localized in time
and frequency. They evade the BalianLow obstruction because they are constructed by applying a projection operator,
rather than a single taper, on the Fourier waveforms. The details on the construction of a projection operator were given
by Wickerhauser (1994). Ombao (1999) showed that the application of a projection operator is identical to applying two
smooth and compactly supported windows to the Fourier basis
functions. A SLEX basis function (u) has the form
(u) = + (u) exp(i2u) + (u) exp(i2u),
(1)
where (1/2, 1/2], and u [, 1 + ], where 0 < < .5.

The windows + and are constructed by using rising cutoff
functions. These windows come in pairs; that is, once + is
specified, is determined. Plots of the window pairs and the
real and imaginary parts of a SLEX waveform were given by
ORvSM (2001). The SLEX basis functions are generalizations
of the windowed Fourier functions. Note that setting (u) = 0
for all u in (1) reduces the SLEX function to the windowed
Fourier function. The unique feature of the SLEX function is
the second window , which results in waveforms that are
simultaneously localized and orthogonal.
2.2 The SLEX Library
The SLEX library is a collection of bases, each of which
consists of the SLEX waveforms that are localized; thus they
are able to capture the local spectral features of the time series. The SLEX library allows a flexible and rich representation the observed time series. To illustrate these ideas, we
construct a SLEX library in Figure 2 with level J = 2. There
are seven dyadic time blocks in this library. These are S(0, 0),
which covers the entire time series; S(1, 0) and S(1, 1), which
are the two half blocks; and S(2, b), b = 0, 1, 2, 3, which are
the four quarter blocks. Note that in general, for each resolution
level j = 0, 1, . . . , J, there 2j time blocks each with length T/2j .
We adopt the notation S( j, b) to denote the block b on level j
where b = 0, 1, . . . , 2j 1. There are five possible bases from
this particular SLEX library. One particular basis is composed
of blocks S(1, 0), S(2, 2), and S(2, 3), which correspond to the
shaded blocks in Figure 2. We point out that each basis is allowed to have multi-resolution scales, that is, a basis can have
time blocks with different lengths. This is ideal for processes
Figure 2. A SLEX Library With Level J = 2. The shaded blocks represent one basis from the SLEX library.
whose regimes of stationarity have lengths that also vary with

time. In choosing the finest time scale (or deepest level) of
the transform J, the statistician will need some advice from
the scientific expert. For example, neurologists can give some
guidance regarding an appropriate time resolution of EEGs. In
general, the blocks should be sufficiently small so that we can
be confident that the time series is stationary in these blocks.
At the same time, the blocks should not be smaller than what is
necessary to control the variance of the spectral estimator.
2.3 Computing the SLEX Transform
The SLEX transform is a collection of coefficients corresponding to the SLEX waveforms in the SLEX library. We
demonstrate that the SLEX coefficients can be computed using the fast Fourier transform (FFT). Let Xt be one component
of a p-variate time series Xt of length T. The SLEX coefficients
(corresponding to Xt ) on block S( j, b) are defined as

(k ) = (Mj )1/2
Xt j,b,k ,t
dj,b
t
t 0
1/2
= (Mj )
+
Xt exp[i2k (t 0 )]
|S|
t
t 0

+ (Mj )1/2
Xt exp[i2k (t 0 )],
|S|
t
where Mj = |S( j, b)| = T/2j and j,b,k ,t = S( j,b),k ,t is
the SLEX basis vector on block S( j, b) oscillating at frequency k = k/Mj , where k = Mj /2 + 1, . . . , Mj /2. We
also note that the edge blocks in each level j, namely
S( j, 0) and S( j, 2j 1), are padded with 0s when we compute the SLEX transform. Finally, using the FFT, the number
of operations needed to compute the SLEX transform has order
of magnitude O[T(log2 T)2 ].
3. THE MULTIVARIATE SLEX MODEL
To motivate our multivariate SLEX model, we first take
a look at the Cramr (spectral) representation of a 0-mean
p-dimensional stationary time series Xt (t = 0, 1, . . . ), which
is
1/2
A() exp(i2t) dZ().
(2)
Xt =
1/2
The elements in the Cramr representation are the complex

exponentials, which are the building blocks of this stochastic representation; the p p transfer function matrix, A(),
which is Hermitian and constant in time; and the 0-mean
orthonormal random process dZ(). The spectral matrix,
f() = A()A (), where A denotes the complex conjugate
transpose of A, is constant in time. For nonstationary processes,
the transfer function (and hence the spectral density matrix)
is allowed to change over time. Dahlhaus (2001) developed
a time-dependent generalization of the Crmer representation
in (2). In this article we develop the SLEX model for multivariate nonstationary time series, which is complementary to
the Dahlhaus model. However, we place an important consideration in developing a model that allows for computationally
efficient methods of estimation, which is a practical necessity
when dealing with massive datasets. An essential distinction
522
between the Dahlhaus and SLEX models is in the representation, that is, Fourier versus SLEX waveforms. The spectral representation of the SLEX model replaces the Fourier complex
exponentials in (2) by the SLEX basis waveforms. Moreover,
for each T, the SLEX model provides an explicit partitioning of
the time-frequency plane. This enables the construction of estimators that are directly adapted to the model. Furthermore,
the SLEX model allows for inference based on resampling
(as explored in Ombao, von Sachs, and Guo 2002) because
its discrete time-frequency representation or synthesis equation makes it easy to generate nonstationary time series from
any SLEX model. We now formally define the multivariate
1 , . . . , X p ) , t = 0, . . . , T 1}
SLEX model. Let Xt,T = {(Xt,T
t,T
be a mean-0 p-variate nonstationary time
series. Let BT be the
basis in the SLEX library, and define i Si to be the blocks
in BT .

Definition 1. For a given T and basis
i Si BT , the
SLEX model of a 0-mean p-dimensional nonstationary random
process is
Xt,T =
i:
Si BT
Mi
M
i /2

ki =Mi /2+1
Si ,ki ,T Si ,ki ,t zSi ,ki ,
(3)
where (a) Si ,ki ,T is a sequence (in T) of fixed complexvalued p p transfer function matrices defined on block Si
and frequency ki satisfying Si ,ki ,T = Si ,ki ,T for all ki =
p
1, . . . , Mi /2 1; (b) zSi ,ki = [z1Si ,ki , . . . , zSi ,ki ] is a p 1
(complex-valued) random vector that satisfies, for all ki , kj 0,
the following: zSi ,ki = zSi ,ki and cov[zSi ,ki , zSj ,kj ] = Ipp when
ki /Mi = kj /Mj and 0pp otherwise.
Remark 1. a. The SLEX spectral density defined at rescaled
time u = t/T [uT] Si and frequency ki is fT (u, k ) =
|Si ,ki ,T |2 1Si ([uT]). The SLEX spectral density matrix varies
with time but is constant within each block Si .
b. At frequency ki = 0, 1/2, zSi ,ki is a p 1 complexvalued random vector; the real and imaginary parts of each
component are uncorrelated, with mean 0 and variance equal
to 1/2. At ki = 0, 1/2 then each component is real valued with
mean 0 and unit variance. Moreover, for each frequency ki /Mi ,
the components of zSi ,ki are uncorrelated.
c. For all i, j such that ki /|Si | = kj /|Sj |, cov(zSi ,ki ,
zSj ,kj ) = Ip ( p p identity matrix) and is the 0 matrix otherwise.
d. The diagonal elements are the spectral density matrix, denoted by f , = 1, . . . , p, are the autospectra, the
off-diagonal elements are the cross-spectra fm that satisfy
fm = fm , the phase
spectrum is arg( fm ), and the coherency
is Rm = | fm |/ f fmm .

e. Our approach to estimating fT (u, k ) is parallel to the
p
classical approach. Let dSi (ki ) = [dS1i (ki ), . . . , dSi (ki )] be
the SLEX vector of coefficients at time block Si and frequency ki . Compute the SLEX periodogram matrix ISi (ki ) =
dSi (ki )dSi (ki ), and smooth the SLEX periodogram matrix
over frequency. We discuss this further in Section 6.
f. Although in this article we do not elaborate on asymptotic theory (i.e., letting T tend to ), we mention that this is
possible by a straightforward analogy to the univariate model of
ORvSG (2002). To summarize, when we let T , we embed the SLEX model into a sequence of models with a spectral
density matrix that approaches, in the limit, a smoothly varying
function f(u, ) that does not depend on T and that we call the
evolutionary SLEX spectral density matrix. Every element
of f(u, ) for a fixed frequency is defined via a Haar wavelet
representation as a function of time, which when truncated to
some fixed scale in the (Haar) wavelet representation, JT , gives
the SLEX spectral density matrix for the time series of length T.
4. THE MODEL SELECTION CRITERION
We now explain the basic methodology for selecting the best
SLEX model for our observed possibly highly multivariate time
series, taking into account the full multivariate information. In
other words, we select the model (or the segmentation in time)
that best matches the full content of the p p SLEX spectral
density matrix fT (u, k ), which is estimated by smoothing the
SLEX periodogram matrix. To present the basic idea of our new
criterion in this section, we do not yet include the principal
components (PC) approach of the next section, which we use
to reduce the complexity of this multivariate model selection
problem. We also defer all technical details of the derivation to
the Appendix. We must emphasize that our new model selection criterion here is not merely a generalization of the bivariate
SLEX analysis of ORvSM (2001). Our new criterion explicitly
takes into account the autocorrelation and cross-correlation information from all time series channels simultaneously.
Our model selection criterion is based on minimizing a
complexity-penalized log energy cost function. This function
is additive over all time segments, or blocks, that make up one
SLEX model. The cost at one block S( j, b) is defined as
Mj /2
Cost( j, b) =

log det Ij,b (k ) + j,b ( p) Mj ,
(4)
k=Mj /2+1
where Ij,b is the smoothed periodogram matrix and j,b ( p) is

the data-driven complexity penalty for block Sj,b . Let hj,b be the
bandwidth used in smoothing the SLEX periodogram matrix.
A simple version of the complexity parameter that we use is
j,b ( p) = pj,b , where j,b takes the form

j,b = j,b (hj,b ) = log10 (e) hj,b 2 log Mj .
(5)

The cost for a particular model i Si BT is the sum of
the costs for all blocks in BT . For example, in Figure 2,
the cost for the model that corresponds to the shaded blocks
is C(1, 0) + C(2, 2) + C(2, 3). In the actual implementation,
we use the best basis algorithm (BBA) of Coifman and
Wickerhauser (1992). The essential idea is to compare the
cost at a parent block and the children blocks. If C( j, b) <
C( j + 1, 2b) + C( j + 1, 2b + 1), then we choose the parent
block S( j, b). Otherwise, we choose the children blocks.
In the Appendix we explain in more detail the log energy
and complexity penalty part of the criterion, which are inspired
by ideas from Donoho, Mallat, and von Sachs (DMvS) (2000).
Basically, minimizing the log energy part amounts to searching among all Gaussian SLEX processes for the one (with
block-stationary covariance matrix) that has minimal Shannon
entropy. This is actually equivalent to minimization of a KL
distance betweeen Gaussian processes. So in some sense we
are looking for the best KL approximation of a block-stationary

covariance SLEX process to our given data. In the multivariate case this involves computing the determinant of the (estimated) spectral matrix, which is quite naturally a challenge for
practical use of the criterion, especially when it it close to 0,
which happens when the data exhibit a high degree of multicollinearity. This problem is addressed in Section 5, where PCA
will help us use uncorrelated components that still carry the full
multivariate spectral information.
To avoid fitting a process with too many blocks of stationarity, a penalty must be added to this log energy
criterion. The

form of this penalty being proportional to Mj , the root of the
length of blocks S( j, b), can be motivated by expressing the sum
Mj /2
k=Mj /2+1 log det Ij,b (k ) as a projection onto a Haar wavelet
vector. This Haar wavelet lives as a function of frequencies on
this block S(j, b), and being orthonormal it is hence proportional to 1/ Mj . In other words, this is the correct normalization of this sum in terms of constant mean squared energy
over the block under consideration. Again, DMvS showed that
adding this penalty amounts to (Haar) wavelet thresholding of
the log periodograms and stabilizes the empirical log energy
criterion in the sense that its minimizer gives (asymptotically)
the correct segmentation (by log energy dissipation). In the Appendix we detail this and show that we can safely use the same
argument even if applied to the already kernel-smoothed SLEX
periodograms.
Finally, the question arises how to choose the complexity
penalty parameter j,b . We actually determine this quantity
from the data. We interpret our BBA search algorithm over
block partitions in time as a (univariate) variant of the dyadic
CART algorithm of Donoho (1997). In this algorithm exclusion or inclusion of a block is analogous to (Haar) wavelet
thresholding of the coefficients that represent this block S( j, b)
of length Mj . A common possibility is to use the universal
threshold, which is proportional to the standard deviation
of

these coefficients, times the well-known factor of 2 log Mj .
Hence we have to calculate the variance of these wavelet coefficients, for which we take the asymptotic variance of log
kernel-smoothed spectral estimates following Brillinger (1981,
cor. 5.6.3). Details on the exact derivation are to be found again
in our Appendix.
Note that the multivariate cost (4) is a generalization of the
cost for the univariate time series Xt , which is

Mj /2
,
k=Mj /2+1 log(Ij,b (k )) + j,b Mj . Note also that ORvSM
(2001) used the cost as the sum of the univariate costs for each
of the bivariate components. This simplification would be acceptable if the components of the multivariate time series are
uncorrelated, because the spectral matrix would be diagonal.
However, in many practical situations the components of multivariate time series show a high degree of cross-correlation.
Thus any multivariate model selection criterion must take into
account the cross-relationships between the components.
5. MODEL SELECTION BASED ON NONREDUNDANT
SPECTRAL INFORMATION
In practice, we frequently encounter multivariate time series
that exhibit a high degree of multicollinearity. This suggests
the need to extract nonredundant pertinent spectral information, from a massive high-dimensional dataset, that are to be
523
used in model selection and further analysis. There has been

a long history of analyzing high-dimensional multivariate data
by some sort of decomposition into uncorrelated components,
like independent component analysis (e.g., Cardoso 1998).
In analyzing high-dimensional data, frequency domain PCA
plays some dominant role in the treatment of dynamic models, such as vector autoregressive moving average (VARMA)
or dynamic factor models in econometric applications (Forni,
Hallin, Lippi, and Reichlin 2000). To our knowledge, however,
there exists no statistical approach of explicitly modeling the
deviation from stationarity in data with some time-varying autocorrelation and cross-correlation structure. Stock and Watson
(1998) gave a very general framework for a possibly timevarying factor model description in economic index theory.
In addition, Aguilar and West (2000) provided a comprehensive discussion on dynamic factor models for multivariate
financial time series and the incorporation of stochastic volatility components for latent factor processes. We finally point
out the work of West, Prado, and Krystal (1999) and the corresponding software that is available for downloading at the
website http://www.isds.duke.edu/%7EmW/tvar.html. This approach, which is complementary to our procedure, explores latent structure in multivariate nonstationary time series. It works
well for short time series but can be computationally costly, and
hence suggests subsampling the time series as a way to circumvent this problem. However, we avoid subsampling, because it
can lead to a loss of high-frequency information, which can be
valuable when analyzing epileptic seizure EEGs.
In this article we extract the nonredundant spectral features
in the dataset by computing the eigenvalueeigenvector decomposition of the time-varying SLEX matrix. This procedure
gives the SLEX principal components that are nonstationary
and have zero coherency. Hence the SLEX PCs contain nonredundant spectral information that is used in selecting the best
model (or best segmentation). Finally, we note that the role of
the SLEX PCs in our analysis goes beyond model selection.
The SLEX PC approach can provide systematic guidance for
the user to focus on a fewer number of channels that would
warrant further analyses.
5.1 Principal Components for Stationary Time Series
Brillinger (1981) motivated frequency domain PCA for stationary time series in the following way. Suppose that we
have a p-variate time series Xt = [X1 (t), . . . , Xp (t)] , with
mean 0 and spectral density matrix f(), which we now want
to approximate by a q-variate process Rt = [R1,t , . . . , Rq,t ]
(q p),
whose components have zero coherency, defined by
Rt =
= ct V , where {cr } is a p q valued filter sat
isfying
r= |cr | < . The filter coefficients {cr } are obtained using an optimality criterion that we now describe.
Suppose that we want to be
able to reconstruct the {Xt } from
the q-variate R(t) by
X(t) =
=
bt R(), where the filter
br is a p q matrix that satisfies
r= |br | < . We want

X(t) to be such that the mean squared approximation error
E[(Xt
Xt ) (Xt
Xt )] is minimized. To ease our discussion,
we suppose that the eigenvalues of f() are unique and let
v1 (), v2 (), . . . , vq () be the q largest eigenvalues arranged
in decreasing magnitude. Denote the corresponding eigenvectors by V 1 (), V 2 (), . . . , V q (). The solution to this problem
524
1/2
is to choose c = 1/2 c() exp(i2) d, where c() is the
matrix consisting of eigenvectors V 1 (), . . . , V q (). It turns
out that the spectrum of the mth principal component Rm,t at
frequency is the mth largest eigenvalue vm (). (For an excellent discussion on the applications of frequency domain PCA
in stationary time series, see Shumway and Stoffer 2000.)
ified. It implicitly renders irrelevant those components that do

not contribute much to the variance. From a numerical standpoint, it also avoids computational problems, because the term
wd log(vd ) is assigned the value 0 when vd and wd are both
close to 0, that is, when the absolute and relative contribution
to variance are small.
6. THE ALGORITHM OF THE MULTIVARIATE
SLEX METHOD
5.2 SLEX Principal Components Analysis for

Multivariate Nonstationary Time Series
In the nonstationary case, the filter coefficients are allowed
to vary over time. Equivalently, the spectra of the SLEX PCs,
which are the eigenvalues of the time-varying spectral density
matrix, also evolve with time. Our goal is to first decompose
the multivariate nonstationary time series into the SLEX PCs,
which are nonstationary components that have zero coherency.
The time-varying filter and SLEX PC spectra at block S( j, b)
are obtained by performing an eigenvalueeigenvector decomposition of the estimated spectral density matrix Ij,b (k ) for
each k . The best model (or best segmentation) is obtained by
applying the penalized log energy criterion on the SLEX PC.
The spectra of the SLEX PCs are simply the eigenvalues of
p
the spectral density matrix. Let v1j,b (k ), . . . , vj,b (k ) denote the
p eigenvalues arranged in decreasing magnitude. If we have
a prior knowledge on the reduced dimension q p, then the penalized log energy criterion (4) at block S( j, b) can be defined
in terms of the q SLEX PCs as
Mj /2
Cost( j, b) =
q

log(vdj,b (k )) + qj,b Mj .
(6)
k=Mj /2+1 d=1
In practice, however, q is rarely known. Moreover, even in the

stationary situation there is no consensus on the best approach
to selecting q. In this article we adopt a data-adaptive approach
that does not require the user to specify q. The basic idea is
to assign a weight to each SLEX component that is proportional to its variance (spectrum). Essentially, SLEX PCs with
larger eigenvalues are given more weight and those with smaller
eigenvalues are given smaller weights. The weight wdj,b (k ) of
the SLEX PC with the dth largest eigenvalue is defined as
p

wdj,b (k ) = vdj,b (k )
vcj,b (k ),
(7)
c=1
which is simply the proportion of variance explained by the

dth component at frequency k , in block S( j, b). The cost at
block S( j, b) is
Mj /2
Cost( j, b) =
p

wdj,b (k ) log vdj,b (k ) + j,b Mj ,
k=Mj /2+1 d=1
(8)
log(vdj,b (k ))
is the logarithm of the spectrum

where, as before,
of the dth principal component at frequency k in block S( j, b)
having applied PCA to the optimally smoothed periodogram
matrix. Note that this cost is based on the weighted eigenvalues; hence we do not need the factor q in the complexity
penalty term. One attractive feature of weighting the eigenvalues is that the optimal number q need not be explicitly spec-
The algorithm of the SLEX method incorporates the theoretical derivations in Section 4 and the practical considerations on
extracting nonredundant spectral information discussed in the
preceding section. The algorithm is as follows:
( ) (k = M /2 + 1, . . . , M /2) be the
Step 1. Let dj,b
k
j
j
SLEX coefficients of time series X ( = 1, . . . , p) on
block S( j, b). Next we compute the SLEX raw periodograms
,m
( )d m ( ), where
(k ) = dj,b
and cross-periodograms, Ij,b
k j,b k
, m = 1, . . . , p. Denote the resulting periodogram matrix
by Ij,b (k ).
,m
(k ) at each block
Step 2. Smooth the elements of Ij,b
S( j, b) using a kernel smoother with an optimal bandwidth
selected by a criterion based on generalized cross-validation
,
be the smoothed
(GCV) of the gamma deviance. Let Ij,b,h
SLEX periodograms in block S( j, b) of X , = 1, . . . , p, being smoothed using the block-specific bandwidth hj,b . We denote the error degrees of freedom by dfj,b,h = 1 trace(Hh )/
(Mj /2 + 1), where Hh is the smoother matrix. Then hj,b is
the minimizer of the sum of the GCV functions for {Xt },
( = 1, . . . , p),
GCV j,b (h) =
p
GCV j,b (h),
(9)
=1
where
GCV j,b (h)
=
Mj /2
2
(dfj,b,h )2
qk
k=0
,
(k )
Ij,b
,
(k )
Ij,b,h
log
,
(k )
Ij,b
,
(k )
Ij,b,h

1 ,
(10)
where qk = 1/2 when k = 0, Mj /2 and qk = 1 otherwise.

Step 3. Compute the eigenvalues at each block S( j, b) and
frequency k from the optimally smoothed periodogram matrix Ij,b (k ). Arrange them in decreasing order v1j,b (k ), . . . ,
p
vj,b (k ). Compute the weights wdj,b (k ), d = 1, . . . , p, that will
be assigned to the eigenvalues according to (7).
Step 4. Model selection: Compute the cost for each
block S( j, b),
Mj /2
Cost( j, b) =
p
k=Mj /2+1 d=1

wdj,b (k ) log vdj,b (k ) + j,b Mj ,
(11)
where the complexity parameter

j,b is defined in (5). The cost
for a particular model with basis i Si BT is the sum of the
costs for all blocks in the basis BT used for this model. The best
basis (or, equivalently, the best model or best segmentation) in
the SLEX library is the one that minimizes the cost. In the actual
search, we implement the BBA.
525
Step 5a. Point estimation: Extract the optimally smoothed

SLEX auto- and cross-periodograms (from step 2) at each of
the blocks S(j, b) of the best model. We denote the estimate of
the time-varying coherency and phase between X and X m as

(m,m)
,m
,
R ,m
(
)
=
|
I
(
)|/
(k )Ij,b (k )
(12)
Ij,b
k
k
j,b
j,b
and
,m (k ) = arg(I,m (k )).
j,b
j,b
(13)
Step 5b. Compute the confidence intervals using the asymptotic distribution theory:
a. SLEX spectrum: As developed by ORvSG (2002),
I , (k )/f , (k ) is asymptotically distributed as 2 /(2hj,b )
2hj,b
j,b
j,b
when k = 0, M/2 and h2j,b /(hj,b ) otherwise.
,m
b. SLEX coherence: Let rj,b
(k ) = tanh1 [R ,m
j,b (k )]. Then
,m
(k ) is asymptotically normal with mean and variance aprj,b
,m
(k ) and 1/[2(2hj,b p)]. This follows readily
proximately rj,b
from work by Goodman (1963) and Brillinger (1981, sec. 8.6).
c. On the time-dependent generalization of the distribution
of the log of eigenvalues and eigenvectors: We can extend the
results of Brillinger [1981, eqs. (9.4.18) and (9.4.22)]. Note
that log10 vdj,b (k ) is approximately unbiased and normally

distributed with standard deviation log10 (e)/ 2hj,b . Moreover, define the estimated and true cth element of the dth
c,d
c,d
time-varying eigenvector to be Vj,b
(k ) and Uj,b
(k ). Then
c,d
c,d
|Vj,b
(k )Uj,b
(k )|2

T2
is distributed as 22 /2, where

T2 = (hj,b )1/2 vdj,b (k )

c,d
vd,b (k )[vdj,b (k ) vj,b (k )]2 |Vj,b

(k )|2 .
=d
These approximate distributional results hold as long as the

block size Mj is sufficiently large (which, in our own experience, Mj 64 gives reasonable results).
Remark 2. a. The details on the smoothing procedure for
the SLEX periodograms, which behave asymptotically like tapered Fourier periodograms, were reported by Ombao, Raz,
Strawderman, and von Sachs (2001). Moreover, a single bandwidth is used in smoothing all components of the periodogram
matrix, which is sufficient to guarantee that the estimate of the
spectral density matrix is positive definite [i.e., the coherence
estimates (0, 1)].
b. The order of magnitude of the number of operations
for computing the SLEX transform is O[T(log2 T)2 ]; for computing, the SLEX eigenvalueeigenvector decomposition is
O[T 3 log2 T], and for model selection, it is O[T].
using either J = 6 or J = 7 is appropriate, and it turned out

that both resulted in identically best models. Before model
selection, the SLEX PCs were obtained via the time-varying
eigenvalueeigenvector decomposition of the SLEX matrix.
The best model has change points that occur at approximately 20, 30, 36, 38, 40, 61, 72, 73, 74, and 77 seconds from
the starting time. According to the neurologist, the physical
manifestations of seizure became evident around 40 seconds
from the start of recording. However, before this, the SLEX
analysis revealed that changes in the electrical activity of the
brain were already starting.
In the absence of any patient information, one would have
18!
= 153 pairwise cross-correlations, which
to examine all 2!16!
could be overwhelming. The SLEX PCA method was able to
systematically filter the nonredundant information and guided
the user to focus on the most interesting channels (T3 and T4
in this particular example). The first and second SLEX PCs account for approximately 70% of the variance in the EEGs. Thus,
instead of examining each of the 18 channels, we focus our attention on these two SLEX PCs. In the ensuing discussion, we
refer to the frequency bands as follows: delta is 04 Hz, alpha
is 812 Hz, and beta is 1830 Hz. The time-varying spectra of
the first SLEX PC [Fig. 3(a)] account primarily for the increase
in power in the lower frequencies after the onset of seizure. The
second SLEX PC [Fig. 3(b)], on the other hand, accounts for
the spread of power from the delta band to the alpha band. The
95% pointwise confidence intervals for log spectra at the alpha
and beta bands of the first SLEX PC (Fig. 4) clearly indicates
that the spectral power before seizure (t < 40 seconds) is significantly lower than that after seizure. We further examined
the first and second PCs by computing the magnitudes of their
eigenvectors at the delta, alpha, and beta bands. The magnitudes, given in Figures 5 and 6, can be interpreted as the timevarying weights at the EEG channels. Note that although the
information on the patient is available, the model selection and
(a)
(b)
7. THE SLEX ANALYSIS OF MULTICHANNEL EEGs

The dataset is an 18-channel EEG recorded from a patient of
Dr. Malow (neurologist at the University of Michigan). Each
EEG time series has length T = 8,192, recorded for about
82 seconds and then sampled at the rate of 100 Hz. The first
step in our method was to build the family of SLEX models and then select the one that is best according to our penalized log energy criterion. The neurologist suggested that
Figure 3. Time-Varying Spectra of the First (a) and Second (b) SLEX
PCs.
526

(a)
(b)
Figure 4. Pointwise Confidence Interval for the Time-Varying Log

Spectra of the First PC Component at the (a) 10 Hz (in the alpha range)
and (b) 25 Hz (beta range).
SLEX PCA steps in our analysis did not use any such information. It is very encouraging that the SLEX method, indepen(a)
dently of any patient information available, was able to identify

a set of the most interesting channels that included the T3 (left
temporal) channelthe site of seizure onset! In this article we
report only the eight largest weights. We observe that for the
first SLEX PC, most of the weights are concentrated on the T3
(left temporal lobe) and T4 (right temporal lobe). For the second SLEX PC, the weights are quite diffused at the temporal
and frontal lobe areas.
Estimates of the SLEX time-varying spectra at the eight
channels are illustrated in Figure 7. The primary information conveyed in the spectral plots is that the distribution
of power over frequency indeed evolves during the seizure
process. We see that power at the lower frequencies is increased and that power is spread to middle and higher frequencies during seizure. It is important to note that features
of the 18-channel EEGs were captured by the first and second
SLEX PCs.
One of the main goals in our analysis was to use coherence
to study connectivity between active brain areas; that is, how
the neuronal activity in one brain area may influence another.
As suggested by the first eigenvector, we examined two networks, namely coherence between T3 and the other channels
and coherence between T4 and the other channels. As shown in
Figures 8 and 9, the connectivity between brain areas changes
throughout the duration of the epileptic seizure. It is fascinating to note that in Figure 8, the coherence between T3 and
those at the left side of the brain, namely left parietal (P3), left
(b)
(c)
Figure 5. Time-Varying Weights of the First SLEX PC at (a) the Delta Frequency Band (14 Hz), (b) Alpha Frequency Band (812 Hz), and
(c) Higher Beta Frequency Band (2530 Hz). Darker shades represent larger weights.

(a)
527
(b)
(c)
Figure 6. Time-Varying Weights of the Second SLEX PC at the (a) Delta Frequency Band (14 Hz), (b) Alpha Frequency Band (812 Hz), and
(c) Higher Beta Frequency Band (2530 Hz). Darker shades represent larger weights.
Figure 7. SLEX Autospectral Estimates at the Eight Channels Whose Weights Are the Heaviest.
528

(a)
(b)
Figure 8. SLEX Coherence Estimates Between T3 and Channels on the (a) Left Side of the Brain (namely F, C3, and P3), and (b) Right Side of
the Brain (namely F4, C4, P4, and T4).
(a)
(b)
Figure 9. SLEX Coherence Estimates Between T4 and Channels on the (a) Left Side of the Brain (namely F3, C3, P3, and T3), (b) Right Side
of the Brain (namely F4, C4, and P4).
frontal (F3), and left central (C3), are very similar to each other.
Moreover, coherence between T3 and the counterparts on the
right side (P4, F4, and C4) are quite different. This suggests
that connectivity between brain areas on the same side of the
brain behave in a similar manner in the duration of the seizure.
A second interesting observation is the bilaterality (symmetry)
of the coherence. In Figure 9 the coherence pattern on T4 is
similar to that on T3. Bilateral synchrony is quite fascinating,
albeit not completely well understood. It suggests rapid propagation of a seizure from the left temporal to be to the right
temporal lobe. In addition, bilateral synchrony of the temporal
lobe is not uncommon and has been observed in experimental
paradigms (see Grunwald et al. 1999).
8. CONCLUSION
We have developed a systematic, flexible, and computationally efficient procedure for analyzing multivariate nonstationary time series using the SLEX library for representing the
time-varying autospectral and cross-spectral features of the signal. We addressed the problem of dimensionality by decomposing the multivariate time series into nonstationary components
with uncorrelated (nonredundant) spectral information. The
best model is selected using the penalized log energy criterion,
which we derived in this article to be the KL distance between
a model and the SLEX components. The proposed SLEX analysis for multivariate nonstationary time series yields results that
are easy to interpret, because it is an automatic time-dependent
generalization of the classical Fourier analysis of stationary
time series.
In summary, we have combined the development of a new
computationally efficient frequency-domain PCA-based algorithm and a new multivariate model of piecewise stationarity. We use this model to justify our automated search for the
segments of piecewise stationarity. In other words, our algorithm proves useful in particular for the model class of SLEX
processes that we propose in this article. However, we like
to have our particular model of nearly piecewise stationarity
understood in the sense of giving a good approximative description of data that deviate from second-order stationarity in
a broader sense. In particular, ORvSG (2002) showed the asymptotic equivalence between the univariate SLEX and univariate Dahlhaus model of local stationarity. A similar consideration is possible for the multivariate models. Once the best
adapted segmentation is chosen, one can develop asymptotic
theory of consistently estimating the model quantities, in as
much as it is done in the aforementioned univariate treatment
of ORvSG (2002).
APPENDIX: DERIVATION OF
COMPLEXITYPENALIZED
KULLBACKLEIBLER CRITERION
A.1 Motivation of the Log Energy Part
DMvS (2000) proposed a log energy criterion for analyzing univariate locally stationary processes using the local cosine transform.
The goal of DMvS (2000) is quite distinct from our own pursuits,
but the basic ideas can be adapted to our setup where we explicitly
model multivariate time series by selecting a basis from the SLEX library. Suppose now that we have a nonstationary Gaussian process
with probability law P and unknown covariance matrix . Our aim
529
is to approximate by a block-stationary covariance matrix that is the

Fourier back transform of the SLEX spectrum (which is piecewise constant in time). This amounts to approximating the given nonstationary
Gaussian process by an appropriate Gaussian SLEX process.
Let f (x) be the density of P and define the Shannon entropy as

(A.1)
H(P ) = f (x) log f (x) dx.
Note that for a Gaussian time series X1 , . . . , XT with probability law
P = N(0, ),
H(P ) = 1/2 log det() + T/2 log(2 ) + T/2.
(A.2)
This can be shown by simply plugging into (A.1) for the density f (x) the form of the Gaussian density and using the substitution
y = 1/2 x.
The basis minimizing the log-entropy cost among all bases in the
SLEX library of bases gives rise to finding the block-stationary covariance matrix, , say, which minimizes the Shannon entropy. The
link between these two minimization problems is given by searching among all covariance matrices of Gaussian probability laws that
are diagonal in some basis B in the given finite library of orthogonal
Fourier-like bases, here the SLEX bases. is actually the best approximation to the covariance matrix in the sense of the KL distance between the two Gaussian probability measures P and Q , with densities f and g. This distance is K(Q |P ) = f (x) log[f (x)/g(x)] dx
and has the following connection to the Shannon entropy in (A.2),
2K(Q |P ) = log det log det = 2H(Q ) 2H(P ).
(A.3)
The second equality follows directly from (A.2), whereas the first is
due to the special structure of being an approximation to and being
diagonal in B. For reasons of completeness, we give its proof now.
Proof of (A.3). We first note that = DB = B diag(B B)B
(i.e., conceptually, first is rotated into the basis B, then its offdiagonals are set to 0, then the result is back-rotated). Observing further that

K(Q |P ) = f (x) log( f (x)) log(g(x)) dx

= Ef log( f (x)) log(g(x)) ,
we get with

2 log( f (x)) log(g(x)) = log det + x 1 x log det x 1 x
and with Ef (xx ) = that

2Ef [x 1 x x 1 x] = tr( 1 Ef (xx )) tr( 1 Ef (xx )
= tr[ 1 ] T.
Now tr[ 1 ] = T, because tr[ 1 ] = tr[diag(B B)1 BB] = T
(as B1 = B , and because all diagonal elements in the resulting matrix are 1). Putting this together gives

2K(Q |P ) = 2Ef log( f (x)) log(g(x)) = log det log det ,
which ends the proof of the first equality in (A.3).
Equation (A.3) shows that minimizing the KL distance K(Q |P )
is the same as minimizing H(Q ). This follows from H(P ) being the entropy of the true unknown probability measure P and
hence being a constant with respect to the minimization problem.
Furthermore,
(A.2) shows that up to constants not depending on ,
2H(Q ) = k log(k ), where the k are the eigenvalues of . Note
first that for the univariate case of SLEX, these eigenvalues are approximately equal to the spectrum fT (k ) on the block S( j,b) of stationarity of the SLEX process. More specifically, still first for the
530
univariate case, it can be shown that, approximately, for the covariance

matrix M of dimension M M of one block of length M, say,

2H(Q M ) = log det( M ) T log fT () d
M/2
log fT (k ),
k=M/2+1
again, up to constants not depending on M . Now for our multivariate

case, we get

log det(fT (k )).
2H(Q ) = log det()
k
This has been shown in a similar context by Dahlhaus (2001,

prop. 2.5), which gives rates of this approximation even for the general locally stationary situation. On the empirical level, of course, the
theoretical quantities in the spectral matrix fT of one block need to be
Clearly, log det(I)
is a biased esreplaced by the estimated quantities I.
timator of log det f. However, the bias is asymptotically constant across
frequency (see, e.g., Brockwell and Davis 1991, sec. 10.3.2). Therefore, the effect of the bias is applied equally to all models in the SLEX
family, giving a constant shift in the penalized log energy criterion.
A.2 The Complexity Penalty Term
For simplicity for the univariate situation only, we motivate the
form
of the penalized criterion,
which for
block ( j, b) takes the

form k j log Ij,b (k ) + j Mj . Write j = k j log Ij,b (k ) and

j = k j log fj,b (k ), where fj,b () denotes the spectrum at frequency which is constant on the block ( j, b) in time. Now, by analogy to DMvS (2000, sec. 14), calculating wavelet transforms of the
log of the periodograms (resulting from the local cosine transform)
with respect to Haar wavelets, we can show the following result onthe
coarsest scaling coefficient corresponding to the vector j = 1/ Mj
as a function of frequencies
k , k j , supported on block
( j, b) of
length Mj : j = j , log Ij,b Mj and j = j , log fj,b Mj . Hence,
by DMvS [2000, eq. (14.6)],

P j / Mj + j j / Mj ( j, b) 1,
(A.4)
as the sample size T . This result comes from wavelet softthresholding of log-periodogram like statistics whenever the threshold
j is appropriately chosen to take the asymptotic distribution of these
statistics in consideration (cf. our considerations in Sec. A.3).
This allows the followinginteresting interpretation of (A.4). It is
sufficient to add a penalty j Mj to the empirical cost built with the
raw log-periodograms to ensure that the penalised cost is with high
probability never smaller than the true cost (based on the true logspectrum)which is a log-entropy dissipation condition (following
sec. 9.1). But this is a very nice simultaneous overestimation result
that pays off when minimizing over a finite selection of SLEX models, that is, block segments ( j, b). It protects simultaneously against
oversplitting and undersplitting, because for T sufficiently high, the
minimizer of the penalized empirical cost is equal to the minimizer
of the true cost. It remains to show that replacing in (A.4) the log
of the raw periodograms by the log of the kernel-smoothed periodograms, we do not lose this nice property.
For this, we observe

that, due to the concavity of the logarithm,
k j log fj,b (k )

j,b (k ), where log(I)
j,b (k ) denotes, as intermediate reflog(I)
k j
erence, the kernel smoother over log-periodograms. Further, due to
linearity and the normalization of the kernel weights to sum up to

j,b (k )
unity, k j log(I)
k j log Ij,b (k ), where stands
for equality if we neglect the different behavior of the periodograms
at frequencies 0 and .
A.3 Deriving the Complexity Penalty Parameter of (5)

As already indicated in the previous section, our proposed
data-driven penalty parameter is based on concepts developed for universal thresholds in wavelet thresholding as well as in complexitypenalization problems (Antoniadis and Fan 2001; Donoho 1997).
The universal threshold for keeping or killing wavelet coefficients
over a block of length Mj is proportional to the standard
deviation of

these coefficients times the well-known factor of 2 log Mj for protecting against large deviations of Gaussian variables. Because the
variance of Haar wavelet coefficients is actually equal to the one of
Haar scaling coefficients, we need calculate only the latter. Following
Brillinger (1981, cor. 5.6.3), we can calculate the asymptotic variance
of the (univariate) log kernel-smoothed spectral estimate log fhj,b (k )
at frequency k in the block S( j, b) of length Mj , smoothed by a
bandwidth hj,b to be var(log fhj,b (k )) = (log10 (e))2 /(Mj hj,b ) + o(1).
Summing over Mj frequencies in this block (compare the form of
our cost function) and treating all individual spectral estimates as
(approximately) independent over Fourier frequencies in this block,
the total (asymptotic) variance is Mj times the individual one de
rived earlier: var( k j log fhj,b (k )) Mj (log10 (e))2 /(Mj hj,b ) =
(log10 (e))2 /(hj,b ). Taking square roots, this motivates our choice of
the scale and block dependent j,b in (5), that is, j,b = log10 (e)

(hj,b )1 2 log(Mj ). In our approach, we use one bandwidth hj,b in
smoothing all autospectra, to ensure that our spectral matrix estimates
are nonnegative definite. As a result, the contribution of each component of the multivariate time series to the complexity penalty term is
the same. Thus in our multivariate situation, we simply use the complexity penalty parameter j,b ( p) = pj,b .
[Received November 2003. Revised July 2004.]
REFERENCES
Aguilar, O., and West, M. (2000), Bayesian Dynamic Factor Models and Variance Matrix Discounting for Portfolio Allocation, Journal of Business &
Economic Statistics, 18, 338357.
Antoniadis, A., and Fan, J. (2001), Regularization of Wavelet Approximations (with discussion), Journal of the American Statistical Association, 96,
939967.
Brillinger, D. (1981), Time Series: Data Analysis and Theory, Oakland, CA:
Holden-Day.
Brockwell, P., and Davis, R. (1991), Time Series: Theory and Mehods (2nd ed.),
New York: Springer-Verlag.
Cardoso, J. F. (1998), Multidimensional Independent Component Analysis,
in Proceedings of the ICASSP, 1998, Seattle.
Coifman, R., and Wickerhauser, M. (1992), Entropy-Based Algorithms
for Best Basis Selection, IEEE Transactions on Information Theory, 32,
712718.
Dahlhaus, R. (2001), A Likelihood Approximation for Locally Stationary
Processes, The Annals of Statistics, 28, 17621794.
Daubechies, I. (1992), Ten Lectures on Wavelets, Philadelphia: Society for Applied and Industrial Mathematics.
Donoho, D. (1997), CART and Best-Ortho-Basis: A Connection, The Annals
of Statistics, 5, 18701911.
Donoho, D., Mallat, S., and von Sachs, R. (2000), Estimating Covariances of
Locally Stationary Processes: Rates of Convergence of Best Basis Methods,
Technical Report 517, Stanford University, Dept. of Statistics.
Forni, M., Hallin, M., Lippi, M., and Reichlin, L. (2000), The Generalized
Dynamic-Factor Model: Identification and Estimation, Review of Economics
and Statistics, 82, 540554.
Goodman, N. (1963), Statistical Analysis Based Upon a Certain Multivariate
Complex Gaussian Distribution (an Introduction), The Annals of Mathematical Statistics, 34, 152177.
Grunwald, T., Beck, H., Lehnertz, K., Blmcke, I., Pezer, N., Kutas, M.,
Kurthen, M., Karakas, H., Van Roost, D., Wiestler, O., and Elger, C. (1999),
Limbic P300s in Temporal Lobe Epilepsy With and Without Ammons Horn
Sclerosis, European Journal of Neuroscience, 11, 18991906.
Hastie, T., and Tibshirani, R. (1990), Generalized Additive Models, London:
Chapman & Hall.
Huang, H.-Y., Ombao, H., and Stoffer, D. (2004), Discrimination and Classification of Nonstationary Time Series Using the SLEX Model, Journal of the
American Statistical Association, 99, 763774.

Moulin, P. (1996), Signal Estimation Using Adapted Tree-Structured Bases
and the MDL Principle, in Proceedings of IEEE Signal Processing Symposium on Time-Frequency and Time-Scale Analysis, Paris, pp. 141143.
Nason, G., von Sachs, R., and Kroisandt, G. (2000), Wavelet Processes and
Adaptive Estimation of the Evolutionary Wavelet Spectrum, Journal of the
Royal Statistical Society, Ser. B, 62, 271292.
Ombao, H. (1999), Theoretical Aspects of the SLEX, technical report, University of Pittsburgh, Dept. of Statistics.
Ombao, H., von Sachs, R., and Guo, W. (2000), Estimation and Inference for
Time-Varying Spectra of Locally Stationary SLEX Processes, in Proceedings of the 2nd International Symposium on Frontiers of Time Series Modeling, Nara, Japan, December 1417, 2000.
Ombao, H., Raz, J., Strawderman, R., and von Sachs, R. (2001), A Simple
GCV Method of Span Selection for Periodogram Smoothing, Biometrika,
88, 11861192.
Ombao, H., Raz, J., von Sachs, R., and Guo, W. (2002), The SLEX Model
of a Non-Stationary Random Process, Annals of the Institute of Statistical
Mathematics, 54, 171200.
531
Ombao, H., Raz, J., von Sachs, R., and Malow, B. (2001), Automatic Statistical
Analysis of Bivariate Nonstationary Time Series, Journal of the American
Statistical Association, 96, 543560.
Saito, N. (1994a), Local Feature Extraction and Its Applications, unpublished
doctoral dissertation, Yale University, Dept. of Mathematics.
(1994b), Simultaneous Noise Suppression and Signal Compression
Using a Library of Orthonormal Bases and the Minimum Description Length
Principle Criterion, in Wavelets in Geophysics, eds. E. Foufoula-Georgiou
and P. Kumar, New York: Academic Press.
Shumway, R. H., and Stoffer, D. S. (2000), Time Series Analysis and Its Applications, New York: Springer-Verlag.
Stock, J. H., and Watson, M. W. (1998), Diffusion Indexes, Working Paper W6702, National Bureau of Economic Research.
West, M., Prado, R., and Krystal, A. (1999), Evaluation and Comparison of
EEG Traces: Latent Structure in Nonstationary Time Series, Journal of the
American Statistical Association, 94, 10831094.
Wickerhauser, M. (1994), Adapted Wavelet Analysis From Theory to Software,
Wellesley, MA: IEEE Press.

MultiSLEX JASA2005

Transféré par

Informations du document

Titre original

Copyright

Formats disponibles

Partager ce document

Partager ou intégrer le document

Options de partage

Avez-vous trouvé ce document utile ?

Ce contenu est-il inapproprié ?

Droits d'auteur :

Formats disponibles

MultiSLEX JASA2005

Transféré par

Droits d'auteur :

Formats disponibles

SLEX Analysis of Multivariate Nonstationary

Next we develop a criterion for selecting the best SLEX

2005 American Statistical Association

Journal of the American Statistical Association, June 2005

spectrum that is a decomposition of power across time and

we present our new multivariate nonstationary SLEX model. In

Ombao, von Sachs, and Guo: SLEX Analysis

where (1/2, 1/2], and u [, 1 + ], where 0 < < .5.

whose regimes of stationarity have lengths that also vary with

The elements in the Cramr representation are the complex

Journal of the American Statistical Association, June 2005

Si ,ki ,T Si ,ki ,t zSi ,ki ,

is Rm = | fm |/ f fmm .

where Ij,b is the smoothed periodogram matrix and j,b ( p) is

Ombao, von Sachs, and Guo: SLEX Analysis

are looking for the best KL approximation of a block-stationary

used in model selection and further analysis. There has been

Journal of the American Statistical Association, June 2005

ified. It implicitly renders irrelevant those components that do

5.2 SLEX Principal Components Analysis for

k=Mj /2+1 d=1

In practice, however, q is rarely known. Moreover, even in the

which is simply the proportion of variance explained by the

k=Mj /2+1 d=1

is the logarithm of the spectrum

GCV j,b (h) =

GCV j,b (h),

where qk = 1/2 when k = 0, Mj /2 and qk = 1 otherwise.

k=Mj /2+1 d=1

where the complexity parameter 

Ombao, von Sachs, and Guo: SLEX Analysis

Step 5a. Point estimation: Extract the optimally smoothed

is distributed as 22 /2, where

vd,b (k )[vdj,b (k ) vj,b (k )]2 |Vj,b

These approximate distributional results hold as long as the

using either J = 6 or J = 7 is appropriate, and it turned out

7. THE SLEX ANALYSIS OF MULTICHANNEL EEGs

Journal of the American Statistical Association, June 2005

Figure 4. Pointwise Confidence Interval for the Time-Varying Log

dently of any patient information available, was able to identify

Ombao, von Sachs, and Guo: SLEX Analysis

Journal of the American Statistical Association, June 2005

Ombao, von Sachs, and Guo: SLEX Analysis

is to approximate by a block-stationary covariance matrix that is the

Journal of the American Statistical Association, June 2005

univariate case, it can be shown that, approximately, for the covariance

again, up to constants not depending on M . Now for our multivariate

This has been shown in a similar context by Dahlhaus (2001,

A.3 Deriving the Complexity Penalty Parameter of (5)

Ombao, von Sachs, and Guo: SLEX Analysis

Vous aimerez peut-être aussi

is Rm = | fm |/ f fmm .

GCV j,b (h),

where the complexity parameter

vd,b (k )[vdj,b (k ) vj,b (k )]2 |Vj,b