
ALEXANDRE TEMPOREL

(AT1691)

ADVANCED TOPICS IN
MACHINE LEARNING

- GUEST#2 -

WAVELETS FOR TIME-SERIES DATA MINING
By Dr Yonghong Peng

UNIVERSITY OF BRISTOL

MARCH 2002
1) WHAT IS A TIME SERIES AND TIME-SERIES DATA MINING (TSDM)?
A time series is a sampled signal: a collection of observations recorded over time, such as
temperature, water level or market data.

Time series can be used in the field of data mining for different purposes:

• Pattern discovery: data transformation techniques let us represent time series data in a
different form in which a pattern shows up more clearly. After a mathematical
transformation such as the Fourier Transform (FT) or the Wavelet Transform (WT), a pattern
can be discovered, compared with another one and used for the purposes described
below.
• Pattern comparing and matching: two series that look very similar in the time domain, and
are hard to tell apart there, can become clearly different in another domain (again using a
data transformation such as FT or WT). Once patterns have been discovered in, for example,
two distinct time series, they can be compared to see whether they match. A match may
indicate an underlying concept from which rules can then be created.
• Trend detection and forecasting: once a data set has been transformed into another
representation (such as the frequency domain), it may be easier to examine for possible
trends in the original time-domain data. Such a trend is interesting because it can help us
to forecast an event, for example an error.
• Knowledge generation: from trend detection, we can predict or forecast what the following
data may be and consequently construct knowledge. For example, using TSDM, we could
produce a rule which says that when the stock price of company A shows a steep increase,
the stock price of company B shows a similar trend within the next 30 minutes. This rule is
created because of the similarities between the time series of companies A and B.
• Data pre-processing: two other common applications of TSDM are filtering and data
normalisation. We will see that a data series contains spectra (frequency bands) that can be
examined independently. A noise filter can be applied to each band, removing the
coefficients classified as noise by setting them to zero. We can also use TSDM techniques
to rescale a data set, for example to improve the readability of a graph by changing the
Y-axis scaling factor (e.g. a new Y axis going from 0 to 1).
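
The rescaling mentioned in the last bullet can be sketched as follows. This is a minimal illustration that assumes plain min-max scaling, which the lecture does not name explicitly; the function name and the sample readings are hypothetical.

```python
def rescale_to_unit_interval(values):
    """Min-max rescaling: map the smallest value to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant series has no spread; map every point to 0.0 by convention.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical temperature readings, for illustration only.
readings = [12.0, 15.0, 18.0, 21.0, 24.0]
scaled = rescale_to_unit_interval(readings)
print(scaled)  # [0.0, 0.25, 0.5, 0.75, 1.0]
```

The shape of the series is unchanged; only the Y-axis range is different, which is what makes two series with different units easier to compare on one graph.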

2) TRANSFORMATION AND DATA REPRESENTATION


There are different techniques for transforming a data set into another form. We can use
mathematical transformations to translate our time series input data into another domain, in order
to obtain information that is not readily available in the raw signal.

2.1) Fourier Transform

The Fourier Transform translates a function from the time domain into a new function in the
frequency domain. The signal can then be analysed for its frequency content, because the Fourier
coefficients of the transformed function represent the contribution of each sine and cosine function
at each frequency. (See Graphs 1 to 4 in section 2.2, which show two input signals and their
respective Fourier Transforms.)
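
For reference, the standard textbook definition of the continuous Fourier Transform is given below. The symbols x(t) for the time-domain signal and X(f) for its transform are notational assumptions, not taken from the lecture. The second identity (Euler's formula) is what makes each coefficient a contribution of a sine and a cosine at frequency f.

```latex
X(f) = \int_{-\infty}^{\infty} x(t)\, e^{-2\pi i f t}\, dt,
\qquad
e^{-2\pi i f t} = \cos(2\pi f t) - i \sin(2\pi f t)
```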

2.2) Problem with Fourier Transform

[Graph 1, Graph 2, Graph 3, Graph 4]

Graphs illustrating signals in the time domain and their Fourier Transforms in the frequency domain [1]

The signal in Graph 1 contains four frequency components at all times (10, 25, 50 and 100 Hz),
whereas Graph 3 shows a signal containing the same frequency components but in different
intervals: a 100 Hz sinusoid from 0 to 300 ms, 50 Hz from 300 to 600 ms, 25 Hz from 600 to 800 ms
and 10 Hz from 800 to 1000 ms. The problem is that the Fourier Transforms of both signals (Graphs 2
and 4) show the spectral content of the input signals but do not tell us where in time those spectral
components appear. The time information is lost in the frequency domain.
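
The loss of time information can be checked numerically. The sketch below rebuilds the two signals described above and evaluates a few bins of a discrete Fourier transform computed directly from its definition. The 1000 Hz sampling rate, the unit amplitudes and all function names are assumptions made for illustration.

```python
import cmath
import math

SAMPLE_RATE = 1000          # samples per second (assumed)
N = 1000                    # one second of signal, so DFT bin k corresponds to k Hz
FREQS = [10, 25, 50, 100]   # Hz, as in the text


def stationary_signal():
    """All four sinusoids present at all times."""
    return [sum(math.sin(2 * math.pi * f * n / SAMPLE_RATE) for f in FREQS)
            for n in range(N)]


def nonstationary_signal():
    """One sinusoid per interval: 100 Hz, then 50 Hz, then 25 Hz, then 10 Hz."""
    def freq_at(n):
        if n < 300:
            return 100
        if n < 600:
            return 50
        if n < 800:
            return 25
        return 10
    return [math.sin(2 * math.pi * freq_at(n) * n / SAMPLE_RATE) for n in range(N)]


def dft_magnitude(signal, k):
    """Magnitude of DFT bin k, computed directly from the definition."""
    n_samples = len(signal)
    total = sum(x * cmath.exp(-2j * math.pi * k * n / n_samples)
                for n, x in enumerate(signal))
    return abs(total)


# Compare the four frequencies of interest with an unrelated one (75 Hz).
for name, sig in [("stationary", stationary_signal()),
                  ("non-stationary", nonstationary_signal())]:
    mags = {f: round(dft_magnitude(sig, f), 1) for f in FREQS + [75]}
    print(name, mags)
```

In both cases the bins at 10, 25, 50 and 100 Hz stand out clearly against the 75 Hz bin, and nothing in the magnitudes says when each component was present.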

2.3) Short-Time Fourier Transform (STFT)

When the time localization of the spectral components is needed, we need a transform that gives a
time-frequency representation of the signal. The Short-Time Fourier Transform (STFT) does this by
analysing the signal through a moving window. The input signal is chopped up into sections (each the
width of the window), and each section is analysed for its frequency content separately. The effect of
the window is to localize the signal in time. The output may, for example, have the format
< [(A,B), C], etc. > where A and B are the frequency-amplitude pair of the signal at time C
(representing a short interval). Narrow windows give good time resolution but poor frequency
resolution, while wide windows give good frequency resolution but poor time resolution. The problem
with the STFT is that it tells us which spectral components occur within an interval of time, at a
resolution fixed by the window width, whereas what we really need is which spectral components exist
at any time instant. This limitation can be overcome by using the wavelet transform, which gives us a
variable resolution.
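
A minimal sketch of the moving-window idea follows, applied to the non-stationary signal of section 2.2. It assumes a rectangular (untapered) window, a 1000 Hz sampling rate and a deliberately simplified output that keeps only the strongest component of each section; all names are hypothetical.

```python
import cmath
import math

SAMPLE_RATE = 1000  # samples per second (assumed)


def example_signal():
    """100 Hz for 0-300 ms, 50 Hz for 300-600 ms, 25 Hz for 600-800 ms, then 10 Hz."""
    def freq_at(n):
        if n < 300:
            return 100
        if n < 600:
            return 50
        if n < 800:
            return 25
        return 10
    return [math.sin(2 * math.pi * freq_at(n) * n / SAMPLE_RATE) for n in range(1000)]


def dominant_component(section):
    """Return (frequency in Hz, amplitude) of the strongest DFT bin of one section."""
    width = len(section)
    best_bin, best_mag = 0, 0.0
    for k in range(width // 2 + 1):
        mag = abs(sum(x * cmath.exp(-2j * math.pi * k * n / width)
                      for n, x in enumerate(section)))
        if mag > best_mag:
            best_bin, best_mag = k, mag
    freq = best_bin * SAMPLE_RATE / width
    amplitude = 2 * best_mag / width  # valid for a sinusoid at an interior bin
    return freq, amplitude


def stft(signal, width, hop):
    """Slide a window over the signal; emit one [(freq, amp), time] entry per section."""
    frames = []
    for start in range(0, len(signal) - width + 1, hop):
        freq, amp = dominant_component(signal[start:start + width])
        frames.append(((freq, amp), start / SAMPLE_RATE))
    return frames


for (freq, amp), t in stft(example_signal(), width=200, hop=200):
    print(f"section starting at {t:.1f} s: {freq:.0f} Hz, amplitude {amp:.2f}")
```

With a 200-sample window the frequency resolution is 5 Hz and the time resolution is 200 ms. The section starting at 0.2 s straddles the change from 100 Hz to 50 Hz, so its result is ambiguous, which is exactly the fixed-resolution limitation described above.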

2.4) Wavelet Transform vs STFT

An advantage of wavelet transforms is that the window width varies.

• In order to isolate signal discontinuities or when the input signal contains high frequencies, we
need to have some very short basis functions.

• However, in order to obtain detailed frequency analysis, we would like to have some very
long basis functions.

A way to achieve this is to have short high-frequency basis functions and long low-frequency ones.
Wavelet transforms have this feature, unlike the STFT, which uses a moving window of fixed width
for all frequencies contained in the signal.

Figure 5 shows the coverage of the time-frequency plane by an STFT. Because a single window is used
for all frequencies, the resolution of the analysis is the same at all locations. Figure 6, which
represents an analysis using wavelet techniques, shows windows of different sizes, giving better
information on the processed signal.

[Figure 5, Figure 6]

Coverage of the time-frequency plane for the STFT (Figure 5) and the wavelet transform (Figure 6). [2]

2.5) Wavelet Transform

Wavelets allow a time series to be viewed at multiple resolutions. Each resolution reflects a different
frequency. The wavelet technique takes averages and differences of a signal, breaking the signal down
into spectra (bands). Each step of the wavelet transform produces two sets of values: a set of
averages and a set of differences (also known as wavelet coefficients). Each of these two sets is
half the size of the step's input data. For example, if the time series contains 256 elements, the
first step will produce 128 averages and 128 coefficients. The averages then become the input for the
next step (e.g. 128 averages resulting in a new set of 64 averages and 64 coefficients). This
continues until one average and one coefficient are calculated.
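
The averaging and differencing steps can be sketched as below. This assumes the simplest case, a Haar-style transform with unnormalised pairwise averages (a + b) / 2 and differences (a - b) / 2, and an input whose length is a power of two; the function names are hypothetical.

```python
def haar_step(values):
    """One step: pairwise averages and pairwise differences (wavelet coefficients)."""
    averages = [(values[i] + values[i + 1]) / 2 for i in range(0, len(values), 2)]
    differences = [(values[i] - values[i + 1]) / 2 for i in range(0, len(values), 2)]
    return averages, differences


def haar_transform(values):
    """Repeat haar_step on the averages until a single average remains.

    Returns (final_average, bands), where bands[0] is the first (finest) band.
    The input length is assumed to be a power of two.
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    bands = []
    averages = list(values)
    while len(averages) > 1:
        averages, differences = haar_step(averages)
        bands.append(differences)
    return averages[0], bands


average, bands = haar_transform([9, 7, 3, 5])
print(average, bands)  # 6.0 [[1.0, -1.0], [2.0]]

# The 256-element case from the text: band sizes halve at every step.
_, bands_256 = haar_transform(list(range(256)))
print([len(band) for band in bands_256])  # [128, 64, 32, 16, 8, 4, 2, 1]
```

The final average together with all the coefficient bands holds exactly as many numbers as the input (1 + 128 + 64 + ... + 1 = 256), so no information is lost by the decomposition.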

The averages and differences are computed across a window of values. A wavelet algorithm
calculates each new average and difference by shifting this window over the input data. For example,
if the input time series contains 256 values, the window is shifted by two elements, 128 times, when
calculating the averages and differences. The next step of the calculation uses the previous set of
averages, again shifting the window by two elements. In terms of the original data, the window
therefore doubles in width at each step.

The coefficient (difference) bands generated by a wavelet calculation, whose sizes are powers of two,
reflect changes in the time series at various resolutions. The first coefficient band generated
reflects the highest-frequency changes. Each later band reflects changes at lower and lower frequencies.
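
To make the link between bands and frequencies concrete, the sketch below measures how much "energy" (sum of squared coefficients) two toy signals put into each band. It assumes a Haar-style decomposition with unnormalised averages and differences and a power-of-two input length; the signals and function names are illustrative assumptions.

```python
def haar_bands(values):
    """Coefficient bands of a Haar-style decomposition, finest band first.

    The input length is assumed to be a power of two.
    """
    bands = []
    averages = list(values)
    while len(averages) > 1:
        pairs = [(averages[i], averages[i + 1]) for i in range(0, len(averages), 2)]
        bands.append([(a - b) / 2 for a, b in pairs])
        averages = [(a + b) / 2 for a, b in pairs]
    return bands


def band_energies(values):
    """Sum of squared coefficients in each band."""
    return [sum(c * c for c in band) for band in haar_bands(values)]


fast = [1, -1, 1, -1, 1, -1, 1, -1]   # changes at every sample
slow = [1, 1, 1, 1, -1, -1, -1, -1]   # changes once, halfway through

print(band_energies(fast))  # [4.0, 0.0, 0.0]
print(band_energies(slow))  # [0.0, 0.0, 1.0]
```

The rapidly alternating signal is captured entirely by the first band, while the slowly varying one only shows up in the last band.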

3) USE OF WAVELET TECHNIQUES FOR TSDM


3.1) Pattern generation
Wavelets provide powerful methods for finding hidden patterns in time series. Once discovered, these
structures are useful for solving system classification problems and for predicting events in time series.
Example application areas include motor diagnostics, speech recognition and financial time series
prediction.

Figure 7

Figure 8

Figures 7 and 8 illustrate two different data formats that can be passed to a data miner. The algorithm
within the data miner can then use these input data to extract useful rules (generating knowledge for
classification), predict events or find trends. In Figure 7, the data are presented to the data miner
as discrete points over time for different frequency bands, whereas Figure 8 presents the data in a
frequency-time manner, as described at the end of section 2.5 of this report, for a pattern
delimited by a window.

3.2) Denoising noisy data [2]

In many fields of engineering, we are faced with the problem of recovering a true signal from
incomplete, indirect or noisy data.

Figure 9

“Before” and “after”: a magnetic resonance signal processed to remove noise [2]

The technique works in the following way. When you decompose a data set using wavelets, you use
filters that act as averaging filters and others that produce details. Some of the resulting wavelet
coefficients correspond to details in the data set. If the details are small, they can be omitted without
substantially affecting the main features of the data set. The idea of thresholding, then, is to set to
zero all coefficients that are smaller than a particular threshold. The thresholded coefficients are
then used in an inverse wavelet transformation to reconstruct the data set. Figure 9 is a pair of
"before" and "after" illustrations of a nuclear magnetic resonance (NMR) signal. The signal is
transformed, thresholded and inverse-transformed. The technique is a significant step forward in
handling noisy data because the denoising is carried out without smoothing out the sharp structures.
The result is a cleaned-up signal that still shows important details. [2]
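
The transform, threshold and inverse-transform sequence can be sketched as follows. This is an illustration under stated assumptions, not the method used for the NMR example: it uses a Haar-style transform with unnormalised averages and differences, hard thresholding on the coefficient magnitude, a power-of-two input length, and a made-up noisy step signal.

```python
def haar_forward(values):
    """Decompose into (final average, coefficient bands), finest band first."""
    bands = []
    averages = list(values)
    while len(averages) > 1:
        pairs = [(averages[i], averages[i + 1]) for i in range(0, len(averages), 2)]
        bands.append([(a - b) / 2 for a, b in pairs])
        averages = [(a + b) / 2 for a, b in pairs]
    return averages[0], bands


def haar_inverse(average, bands):
    """Rebuild the series: each (average, difference) pair yields avg + diff, avg - diff."""
    values = [average]
    for band in reversed(bands):
        rebuilt = []
        for avg, diff in zip(values, band):
            rebuilt.extend([avg + diff, avg - diff])
        values = rebuilt
    return values


def denoise(values, threshold):
    """Transform, zero every coefficient whose magnitude is below threshold, invert."""
    average, bands = haar_forward(values)
    kept = [[c if abs(c) >= threshold else 0.0 for c in band] for band in bands]
    return haar_inverse(average, kept)


# Hypothetical data: a sharp step from 4 to 10 with a small ripple of +/- 0.1 on top.
noisy = [4.1, 3.9, 4.1, 3.9, 10.1, 9.9, 10.1, 9.9]
clean = denoise(noisy, threshold=0.5)
print([round(v, 6) for v in clean])  # [4.0, 4.0, 4.0, 4.0, 10.0, 10.0, 10.0, 10.0]
```

The ripple lives in small first-band coefficients and is removed, while the large coefficient that encodes the step is kept, so the sharp edge survives. A plain moving average would have blurred that edge.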

Bibliography
[1] R. Polikar, "The Wavelet Tutorial"
http://engineering.rowan.edu/~polikar/WAVELETS/WTpart1.html
[2] A. Graps, "An Introduction To Wavelets"
http://www.amara.com/ftpstuff/IEEEwavelet.ps.gz
[3] "Applying the Haar Wavelet Transform to Time Series Information"
http://www.bearcave.com/misl/misl_tech/wavelets/haar.html#WhyHaar
[4] "A Survey of Temporal Knowledge Discovery Paradigms and Methods"
http://www.cs.flinders.edu.au/People/John_Roddick/Papers/TKDE_Survey.pdf
