Taking first differences is a very useful tool for removing non-stationarity, but sometimes the differenced data will not appear stationary and it may be necessary to difference the data a second time:

y''_t = y'_t − y'_{t−1} = (y_t − y_{t−1}) − (y_{t−1} − y_{t−2}) = y_t − 2y_{t−1} + y_{t−2}
In practice, it is quite rare to proceed beyond second-order differences.
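The two differencing steps can be sketched in code. A minimal Python illustration (the series and function name are ours, chosen for the example):

```python
def difference(y, lag=1):
    """Return the lagged differences y_t - y_{t-lag} of a series."""
    return [y[t] - y[t - lag] for t in range(lag, len(y))]

# A quadratic trend is non-stationary; its first differences still trend,
# but its second differences are constant.
y = [t * t for t in range(10)]   # y_t = t^2
d1 = difference(y)               # y'_t  = y_t - y_{t-1}
d2 = difference(d1)              # y''_t = y_t - 2*y_{t-1} + y_{t-2}
print(d2)                        # -> [2, 2, 2, 2, 2, 2, 2, 2]
```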
I. Rethemiotaki, E. Zervas, HOU DEEE12, December 2-4, 2012, Paris-France
Seasonal differencing

With non-stationary seasonal data, a seasonal difference must be applied. A seasonal difference is the difference between an observation and the corresponding observation from the previous year:

y'_t = y_t − y_{t−s}

where φ_1, ..., φ_p and θ_1, ..., θ_q are the parameters of the models, C is the constant, ε_t are white-noise error terms, and s is the number of periods per season.
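A seasonal difference is the same operation with lag s; a short Python sketch (monthly data, s = 12, artificial series chosen for illustration):

```python
import math

def seasonal_difference(y, s):
    """Return y'_t = y_t - y_{t-s}: each value minus the one a season earlier."""
    return [y[t] - y[t - s] for t in range(s, len(y))]

# A purely periodic series with period 12 is removed entirely
# by a single seasonal difference.
s = 12
y = [math.sin(2 * math.pi * t / s) for t in range(3 * s)]
d = seasonal_difference(y, s)
print(max(abs(v) for v in d))  # numerically zero
```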
ARIMA MODELS
The Box-Jenkins Approach
Advantages
Derived from solid mathematical-statistics foundations
ARIMA models are a family of models, and the BJ approach is a strategy for choosing the best model out of this family
It can be shown that an appropriate ARIMA model can produce optimal univariate forecasts
Disadvantages
Requires a large number of observations for model identification
Hard to explain and interpret to unsophisticated users
2.2.1 OUTLINE OF TIME SERIES
STATIONARY TIME SERIES
METHOD OF DIFFERENCING
THE WHITE NOISE MODEL
ARIMA MODELS
- DESCRIPTION OF THE MODELS
- GENERAL STRATEGY
Identification
Estimation and testing
Application
The Box-Jenkins model-building process
Model identification (differencing the series to achieve stationarity), then model estimation, then: is the model adequate? If no, modify the model and re-estimate; if yes, use it to produce forecasts.
The Box-Jenkins model-building process (cont.)
I. Identification
Data preparation
- Transform the data to stabilize the variance
- Difference the data to obtain a stationary series
Model selection
- Examine the data, ACF, and PACF to identify potential models (the autocorrelation and partial autocorrelation functions are valuable tools for investigating the properties of a time series)
The Box-Jenkins model-building process (cont.)
II. Estimation and testing
Estimation
- Estimate the parameters in potential models
- Select the best model using suitable criteria
Diagnostics
- Check the ACF/PACF of the residuals
- Do tests of the residuals
- Are the residuals white noise?
III. Application
Forecasting: use the best model
Identification Tools
Correlogram: a graph showing the ACF and the PACF at different lags.
Autocorrelation function (ACF): autocorrelations are statistical measures indicating how a time series is related to itself over time. The autocorrelation at lag 1 is the correlation between the original series y_t and the same series shifted by one period (y_{t-1}).
r_k = Σ_{t=1}^{n−k} (y_t − ȳ)(y_{t+k} − ȳ) / Σ_{t=1}^{n} (y_t − ȳ)²
Partial autocorrelation function (PACF): measures the correlation between (time series) observations that are k time periods apart, after controlling for correlations at intermediate lags (i.e., lags less than k). In other words, it is the correlation between Y_t and Y_{t−k} after removing the effects of the intermediate Y's.
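The sample autocorrelation r_k can be computed directly from the formula above; a minimal Python sketch:

```python
def acf(y, k):
    """Sample autocorrelation at lag k:
    r_k = sum_{t=1..n-k} (y_t - ybar)(y_{t+k} - ybar) / sum_{t=1..n} (y_t - ybar)^2
    """
    n = len(y)
    ybar = sum(y) / n
    num = sum((y[t] - ybar) * (y[t + k] - ybar) for t in range(n - k))
    den = sum((v - ybar) ** 2 for v in y)
    return num / den

# A trending (non-stationary) series shows positive lag-1 autocorrelation.
print(acf([1, 2, 3, 4, 5], 1))  # -> 0.4
```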
EXAMPLE:
(a) A correlogram of a nonstationary time series
(b) A correlogram of a stationary time series after 1st-order differencing
I. Model identification
Plot the data.
Identify any unusual observations.
If necessary, transform the data to stabilize the variance (logarithmic or power transformation of the data).
Check the time series plot, ACF, and PACF of the data (possibly transformed) for stationarity.
If the time plot shows the data scattered horizontally around a constant mean, and the ACF and PACF drop to or near zero quickly, then the data are stationary.
Construction of the time series chart: identification of non-stationarity
[Figure: time series plot (concentration vs. time, observations 0-8000, values roughly 0-180) illustrating non-stationary behavior.]
I. Model identification
Use differencing to transform the data into a stationary series:
- For non-seasonal data, take first differences.
- For seasonal data, take seasonal differences.
Check the plots again; if they still appear non-stationary, take the differences of the differenced data.
FIRST DIFFERENCES OF THE DATA
Transform the data into a stationary series.
I. Model identification
When stationarity has been achieved, check the ACF and PACF plots for any pattern remaining. There are three possibilities:
AR or MA models
- No significant ACF after time lag q indicates that MA(q) may be appropriate.
- No significant PACF after time lag p indicates that AR(p) may be appropriate.
I. Model identification
Seasonality is present if the ACF and/or PACF at the seasonal lags are large and significant.
If no clear MA or AR model is suggested, a mixture model may be appropriate.
Theoretical Patterns of ACF and PACF

Type of Model | Typical Pattern of ACF | Typical Pattern of PACF
AR(p) | Decays exponentially or with a damped sine-wave pattern, or both | Significant spikes through lags p
MA(q) | Significant spikes through lags q | Declines exponentially
ARMA(p,q) | Exponential decay | Exponential decay
Theoretical Patterns of ACF and PACF: Examples
[Figures: ACF and PACF plots over lags 1-10 for MA(1), MA(2), AR(1), and ARMA processes.]
I. Model identification
The general non-seasonal model is known as ARIMA(p, d, q):
- p is the number of autoregressive terms.
- d is the number of differences.
- q is the number of moving-average terms.
The ARIMA models can be extended to handle seasonal components of a data series. The general shorthand notation is ARIMA(p, d, q)(P, D, Q)_s, where s is the number of periods per season.
I. Model identification
Note that if any of p, d, or q is equal to zero, the model can be written in a shorthand notation by dropping the unused part.
Examples:
ARIMA(2, 0, 0) = AR(2)
ARIMA(1, 0, 1) = ARMA(1, 1)
Example: ARIMA(1, 0, 0) time series data
The ACF and PACF can be used to identify an AR(1) model:
- The autocorrelations decay exponentially.
- There is a single significant partial autocorrelation.
[Figures: time series plot of the AR1 data series; autocorrelation and partial autocorrelation functions with 5% significance limits.]
Example: ARIMA(0, 0, 1) time series data
The ACF and PACF can be used to identify an MA(1) model:
- There is only one significant autocorrelation, at time lag 1.
- The partial autocorrelations decay exponentially.
[Figures: time series plot of the MA1 data series; autocorrelation and partial autocorrelation functions with 5% significance limits.]
Example: seasonal ARIMA (0,1,1)(0,1,1)_12
We take a seasonal difference and then difference the data again to achieve stationarity.
- The PACF shows exponential decay in values.
- The ACF shows a significant value at time lag 1; this suggests an MA(1) model.
- The ACF also shows a significant value at time lag 12; this suggests a seasonal MA(1).
[Figures: time series plot of the first difference of the seasonal series (monthly data, Jan 1964 - Jan 1973); ACF and PACF with 5% significance limits.]
II. Estimating the parameters
Once a tentative model has been selected, the parameters for the model (φ_1, φ_2, ..., φ_p) must be estimated.
e.g. ARIMA(1, 0, 0): y_t = C + φ_1 y_{t−1} + e_t
where φ_j is the jth autoregression parameter and e_t is the error term at time t.
One method of estimating the parameters is the maximum likelihood procedure.
II. Estimating the parameters
Maximum likelihood procedure:
- After the determination of the estimates and their standard errors, the t-values can be constructed.
- The parameters that are judged significantly different from zero are retained in the fitted model.
- The parameters that are not significantly different from zero are dropped from the model.
II. Estimation of AR, MA, and ARMA models
There are two criteria often used that reflect the closeness of fit of the model and the number of parameters estimated.
- One is the Akaike information criterion (AIC):
  AIC = -2*ln(likelihood) + 2*k
- The other is the Schwarz Bayesian criterion (SBC), also called the Bayesian information criterion (BIC):
  BIC = -2*ln(likelihood) + ln(N)*k
where:
k = the number of parameters estimated in the model (k = p + q + P + Q)
N = the number of observations
If we are considering several ARMA models, we choose the one with the lowest AIC or BIC.
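Both criteria are one-line computations once the maximized log-likelihood is available; a Python sketch (the candidate log-likelihoods below are made-up numbers for illustration):

```python
import math

def aic(loglik, k):
    """Akaike information criterion: AIC = -2*ln(likelihood) + 2*k."""
    return -2.0 * loglik + 2.0 * k

def bic(loglik, k, n):
    """Bayesian information criterion: BIC = -2*ln(likelihood) + ln(N)*k."""
    return -2.0 * loglik + math.log(n) * k

# Two hypothetical candidates fitted to N = 100 observations:
n = 100
candidates = {"ARMA(1,1)": (-100.0, 2), "ARMA(2,2)": (-99.0, 4)}
best = min(candidates, key=lambda m: aic(*candidates[m]))
print(best)  # -> ARMA(1,1): the extra parameters do not pay for the small gain
```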
II. Diagnostic checking
Before using the model for forecasting, it must be checked for adequacy. A model is adequate if the residuals left over after fitting the model are simply white noise. A test can also be applied to the residuals as an additional test of fit.
II. Diagnostic checking
Example: the residuals are white noise; the ACF and PACF are well within their two-standard-error limits.
[Figures: ACF and PACF of residuals for ISC over lags 1-16, with 5% significance limits.]
II. Diagnostic checking
The portmanteau statistics test whether the first m residual autocorrelations r_k are jointly zero:
Box-Pierce: Q = N Σ_{k=1}^{m} r_k²
Ljung-Box: LB = n(n+2) Σ_{k=1}^{m} r_k² / (n−k)
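The Ljung-Box statistic can be computed from the residual autocorrelations; a self-contained Python sketch reusing the sample-ACF formula from the identification slides:

```python
def acf(y, k):
    """Sample autocorrelation of y at lag k."""
    n = len(y)
    ybar = sum(y) / n
    num = sum((y[t] - ybar) * (y[t + k] - ybar) for t in range(n - k))
    return num / sum((v - ybar) ** 2 for v in y)

def ljung_box(residuals, m):
    """LB = n(n+2) * sum_{k=1..m} r_k^2 / (n-k).
    Large values (vs. a chi-square reference) reject white noise."""
    n = len(residuals)
    return n * (n + 2) * sum(acf(residuals, k) ** 2 / (n - k)
                             for k in range(1, m + 1))

# A strongly alternating "residual" series is far from white noise:
print(ljung_box([1, -1, 1, -1], 1))  # -> 4.5
```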
III. Application
Forecasting: use the model to forecast future values.
e.g.: the forecasts are generated by the following equation:
Ŷ_t = c + φ_1 Y_{t−1} + φ_2 Y_{t−2}
Ŷ_66 = 284.9 − 0.324(195) + 0.219(300) = 287.4
Ŷ_67 = 284.9 − 0.324(287.4) + 0.219(195) = 234.5
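Reading the slide's coefficients as c = 284.9, φ_1 = -0.324, φ_2 = 0.219 (the signs are our reading of the garbled figures; they reproduce the printed results), the two forecasts follow from the AR(2) recursion, with the first forecast fed back in as a lagged value for the second:

```python
def ar2_forecast(y_lag1, y_lag2, c=284.9, phi1=-0.324, phi2=0.219):
    """One-step AR(2) forecast: Y_t = c + phi1*Y_{t-1} + phi2*Y_{t-2}."""
    return c + phi1 * y_lag1 + phi2 * y_lag2

y64, y65 = 300.0, 195.0
y66 = ar2_forecast(y65, y64)  # 284.9 - 0.324*195 + 0.219*300
y67 = ar2_forecast(y66, y65)  # the Y_66 forecast becomes the lag-1 value
print(round(y66, 1), round(y67, 1))  # -> 287.4 234.5
```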
III. The predictive ability of the model
How can we test whether a forecast is accurate or not?
- Mean absolute error (MAE)
- Root mean squared error (RMSE)
- Mean absolute percentage error (MAPE)
where A_t is the actual value and F_t is the forecast value. The best model minimizes these error indicators.
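The three error indicators are easy to compute; a minimal Python sketch (the actual/forecast values are made up for illustration):

```python
import math

def mae(actual, forecast):
    """Mean absolute error: mean of |A_t - F_t|."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)

def rmse(actual, forecast):
    """Root mean squared error: sqrt of the mean of (A_t - F_t)^2."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))

def mape(actual, forecast):
    """Mean absolute percentage error, in percent (actuals must be nonzero)."""
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)

A = [100.0, 200.0, 400.0]  # actual values A_t
F = [110.0, 190.0, 400.0]  # forecast values F_t
print(round(mae(A, F), 3), round(rmse(A, F), 3), round(mape(A, F), 2))
```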
Forecasting: use model to forecast:
Example
3. APPLICATION IN THE CASE OF ATHENS
DATA SOURCES
The Hellenic Ministry of Environment, Energy and Climate Change has a network of stations measuring the main air pollutants:
- Carbon monoxide (CO)
- Nitrogen monoxide (NO)
- Nitrogen dioxide (NO2)
- Sulfur dioxide (SO2)
- Ozone (O3)
- Particulate matter < 10 μm (PM10)
- Particulate matter < 2.5 μm (PM2.5)
RESULTS
1. Statistical analysis of the air pollutant carbon monoxide (CO) at 3 stations:
- Patision (urban-traffic station)
- Peristeri (urban station)
- Lycovrisi (suburban station)
Study of the descriptive statistical measures of annual, monthly, daily and hourly concentrations of carbon monoxide.
2. Time series analysis of CO in the year 2011 with ARIMA models.
1. STATISTICAL ANALYSIS:
Mean - Standard deviation
1. STATISTICAL ANALYSIS:
Relative standard deviation (coefficient of variation)
RSD of annual concentration
1. STATISTICAL ANALYSIS:
Skewness - Kurtosis
1. STATISTICAL ANALYSIS:
Median
1. STATISTICAL ANALYSIS:
Percentile - Patision station
1. STATISTICAL ANALYSIS:
Mean of monthly concentration - Patision station
[Figure: mean monthly CO concentration (mg/m3) at the Patision station, Jan-Dec, for 2001 and 2011.]
1. STATISTICAL ANALYSIS:
Mean of monthly concentration - Peristeri station
[Figure: mean monthly CO concentration (mg/m3) at the Peristeri station, Jan-Dec, for 2001 and 2011.]
1. STATISTICAL ANALYSIS:
Mean of daily concentration - Patision station
[Figure: mean daily CO concentration (mg/m3) at the Patision station, Monday-Sunday, for 2001 and 2011.]
1. STATISTICAL ANALYSIS:
Mean of daily concentration - Peristeri station
[Figure: mean daily CO concentration (mg/m3) at the Peristeri station, Monday-Sunday, for 2001 and 2011.]
1. STATISTICAL ANALYSIS:
Mean of hourly concentration - Patision station
[Figure: mean hourly CO concentration (mg/m3) at the Patision station, hours 1-24, for 2001 and 2011.]
1. STATISTICAL ANALYSIS:
Mean of hourly concentration - Peristeri station
[Figure: mean hourly CO concentration (mg/m3) at the Peristeri station, hours 1-24, for 2001 and 2011.]
2. TIME SERIES ANALYSIS:
Construction of the time series chart
2. TIME SERIES ANALYSIS:
Transform the data into a stationary series
We take a seasonal difference and then difference the data again (to achieve stationarity).
2. TIME SERIES ANALYSIS:
ACF and PACF
We check the ACF and PACF plots for any pattern remaining:
- The PACF shows exponential decay in values.
- The ACF shows a significant value at time lag 2; this suggests an MA(2) model.
- The ACF also shows a significant value at time lag 24; this suggests a seasonal MA(1).
A seasonal ARIMA model fits our data.
2. TIME SERIES ANALYSIS:
Akaike information criterion (AIC)
We use the Akaike information criterion (AIC) to find the best model for our data. We choose the one with the lowest AIC: ARIMA(1,1,2)x(1,1,1)_24
(order of AR = 1, order of differencing = 1, order of MA = 2; order of seasonal AR (SAR) = 1, order of seasonal differencing = 1, order of seasonal MA (SMA) = 1)
ARIMA(1,1,2)x(1,1,1)_24:
(1 − Φ_1 B^24)(1 − φ_1 B)(1 − B)(1 − B^24) x_t = (1 − θ_1 B − θ_2 B²)(1 − Θ_1 B^24) ε_t
where B^j x_t = x_{t−j} (B is called the backward-shift operator) and φ_1, ..., φ_p, θ_1, ..., θ_q, Φ_1, ..., Φ_P, Θ_1, ..., Θ_Q are the parameters of the model.
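The backshift algebra can be checked numerically: (1 − B)(1 − B^24) applied one factor at a time must equal the expanded operator x_t − x_{t−1} − x_{t−24} + x_{t−25}. A small Python sketch (integer series so the comparison is exact):

```python
def apply_diff(x, lag):
    """Apply (1 - B^lag) to a series: x_t - x_{t-lag}."""
    return [x[t] - x[t - lag] for t in range(lag, len(x))]

s = 24
x = [3 * t + (t % s) ** 2 for t in range(5 * s)]  # trend plus a period-24 pattern

# (1 - B)(1 - B^24) x_t, applied one factor at a time...
combined = apply_diff(apply_diff(x, s), 1)
# ...equals the expanded operator x_t - x_{t-1} - x_{t-24} + x_{t-25}
expanded = [x[t] - x[t - 1] - x[t - s] + x[t - s - 1]
            for t in range(s + 1, len(x))]
print(combined == expanded)  # -> True
```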
2. TIME SERIES ANALYSIS:
Estimating the parameters
We use the maximum likelihood procedure to estimate the parameters of the model ARIMA(1,1,2)x(1,1,1)_24:

Parameter | Estimate
AR(1), φ_1 | 0,757063
MA(1), θ_1 | 0,861985
MA(2), θ_2 | 0,0795282
SAR(1), Φ_1 | 0,0622429
SMA(1), Θ_1 | 0,987624
2. TIME SERIES ANALYSIS:
Test of residuals
Before using the model for forecasting, it must be checked for adequacy. The residuals are white noise: the ACF is well within its two-standard-error limits.
2. TIME SERIES ANALYSIS:
Forecasting
We use the model to forecast future values.
[Figure: time sequence plot for CO, ARIMA(1,1,2)x(1,1,1)_24, showing actual values, forecasts, and 95.0% limits.]
2. TIME SERIES ANALYSIS:
THE PREDICTIVE ABILITY OF THE MODEL
The three statistics, mean absolute error (MAE), root mean squared error (RMSE), and mean absolute percentage error (MAPE), measure the magnitude of the errors.
In this case, the model was estimated from the first 8699 data values; 50 data values at the end of the time series were withheld to validate the model. The table shows the error statistics for both the estimation and validation periods. If the results are considerably worse in the validation period, the model is not likely to perform as well as otherwise expected in forecasting the future.

Statistic | Estimation Period | Validation Period
RMSE | 0,47183 | 0,634806
MAE | 0,321131 | 0,491219
MAPE | 28,4324 | 23,6307
2. TIME SERIES ANALYSIS:
Forecasting
[Figure: time series forecast of CO (mg/m3) over the withheld values, actual vs. forecast.]
It can be seen that the predicted values produced by our proposed model follow the actual values of CO emissions closely. This not only shows that our proposed model is capable of forecasting CO emissions, but also speaks to the usefulness of the model.
2. TIME SERIES ANALYSIS: Forecasting
Relationship between actual and estimated values
[Figure: scatter plot of estimated vs. actual CO values (mg/m3, 2011), with fitted line y = 0,859x + 0,214 and R² = 0,831.]
CONCLUSIONS:
DESCRIPTIVE STATISTICAL ANALYSIS
From the application of descriptive statistical analysis of carbon monoxide in the case of Athens, we conclude that:
The mean and the standard deviation of the annual concentration of CO are greater at the urban-traffic Patision station. This is expected, since the biggest source of CO is road transport. There is a downward trend in the concentration of the pollutant over the years, due mostly to the replacement of old vehicles with new-technology ones.
The relative standard deviation is very large at the suburban station and is stable over time at all three stations.
The skewness of the annual concentration of CO is greater at the suburban Lykovrisi station. The asymmetry is positive at all three stations, showing that the tail of the distribution is to the right of the peak; this means that there are many values with high concentrations.
The kurtosis of the annual concentration of CO is positive at all three stations. The distribution is peaked, so the observed values are close to the average.
CONCLUSIONS:
DESCRIPTIVE STATISTICAL ANALYSIS
The median of the annual concentration of CO behaves similarly to the mean.
The mean monthly concentration of CO at both the Patision and Peristeri stations shows a decline over the last decade. Higher values of carbon monoxide are observed in the winter months, with a maximum in December, due to the weather and traffic conditions during this period. A decrease is observed in the summer months, due to weather conditions and the reduction in traffic during summer vacation.
Similarly, the mean daily concentration of CO at both the Patision and Peristeri stations shows a decline over the last decade. Higher values of carbon monoxide are observed midweek, while the concentrations of CO decrease during the weekend.
CO pollution shows two peaks: 08:00-10:00 and 20:00-22:00.
CONCLUSIONS:
TIME SERIES ANALYSIS
A non-stationary time series statistical model with trend and seasonal effects is used to predict future estimates of carbon monoxide emissions in Athens. The data set consists of hourly atmospheric CO concentrations in 2011.
The developed process is evaluated with various statistical criteria to attest to its quality.
Finally, the accuracy of the proposed model is tested by predicting and analyzing the CO emissions for 50 data values. The results show that the model predicts the future values well.
Application of descriptive statistical analysis and time series analysis on atmospheric pollution
Thank you very much for your attention
Efthimios Zervas
Hellenic Open University - Greece
zervas@eap.gr