Forecasting with Artificial Neural Networks
Sven F. Crone
Centre for Forecasting, Department of Management Science, Lancaster University Management School
email: s.crone@neural-forecasting.com
slides, data & additional info on www.neural-forecasting.com
Agenda
1. Forecasting?
1. Forecasting as predictive Regression
2. Time series prediction vs. causal prediction
3. Why NN for Forecasting?
2. Neural Networks?
3. Forecasting with Neural Networks
4. How to write a good Neural Network forecasting paper!
Forecasting or Prediction?
ALGORITHMS by data mining task:
Clustering: K-means clustering, neural networks
Feature selection: principal component analysis, discriminant analysis
Link analysis: association rules, temporal association rules
Classification: decision trees, logistic regression, class entropy, neural networks (MLP, RBFN, GRNN)
Regression / forecasting: linear regression, nonlinear regression, exponential smoothing, (S)ARIMA(x), neural networks
Forecasting or Classification?
The scales of the dependent and independent variables determine the method:
Metric dependent, metric independent: Regression, Time Series Analysis
Metric dependent, nominal independent: Analysis of Variance
Nominal dependent, metric independent: Classification
Nominal dependent, nominal independent: Contingency Analysis
Ordinal scales: DOWNSCALE to the next lower scale
Supervised learning: data matrix of cases with inputs and a target
Agenda
Forecasting Models
Definition
A time series is a series of temporally ordered, comparable observations $y_t$ recorded at equidistant time intervals.
Notation
$Y_t$ represents the observation of period t, t = 1, 2, …, n
Approach
A time series combines a systematic part and a random part; unfortunately we cannot observe either of these!
Forecasting methods try to isolate the systematic part
Forecasts are based on the systematic part
The random part determines the distribution shape
Assumption
Data observed over time is comparable:
The time periods are of identical lengths (check!)
The units they are measured in remain unchanged (check!)
The definitions of what is being measured remain unchanged (check!)
They are correctly measured (check!)
Data errors arise from sampling, from bias in the instruments or the responses, and from transcription.
Assumption
There exists a cause-effect relationship that keeps repeating itself with the yearly calendar
The cause-effect relationship may be treated as a BLACK BOX
TIME-STABILITY-HYPOTHESIS assumes no change: the causal relationship remains intact indefinitely into the future!
The time series can be explained & predicted solely from previous observations of the series
[Diagrams: time series components]
Trend: long-term movement in series
Seasonal: regular fluctuation within a year (or shorter period), superimposed on trend and cycle
Cycle (Diagram 1.3): alternating upswings of varied length and intensity; regular fluctuation superimposed on trend (period may be random)
Irregular (Diagram 1.4): random movements and those which reflect unusual events
Signal
level L
trend T
seasonality S
Noise
irregular / error 'e'
[Time series plots: irregular patterns]
PULSE / level shift
trended / seasonal development, etc.
Trend changes (slope, direction)
Seasonal pattern changes & shifts
STATIONARY time series with level shift
Time Series decomposed into Components: level, trend, random
REGULAR vs. IRREGULAR Time Series Patterns
[Example plots of monthly and yearly time series; legend: Datenwert original = original data value, Korrigierter Datenwert = corrected data value]
$Y_t$ = sales or observation of the time series at point t; consists of a combination $f(\cdot)$ of components, with different possibilities to combine the components:
$L$: base level
$S_t$: seasonal component
$T_t$: trend component
$E_t$: irregular or error component
Additive model: $Y_t = L + S_t + T_t + E_t$
[Plots: original time series (Original-Zeitreihen) and its components; additive trend effect vs. multiplicative trend effect]
Agenda
Forecasting with Artificial Neural Networks
1. Forecasting?
1. Forecasting as predictive Regression
2. Time series prediction vs. causal prediction
3. SARIMA-Modelling
1. SARIMA Differencing
2. SARIMA Autoregressive Terms
3. SARIMA Moving Average Terms
4. SARIMA Seasonal Terms
4. Why NN for Forecasting?
2. Neural Networks?
3. Forecasting with Neural Networks
4. How to write a good Neural Network forecasting paper!
$\phi_p(B)\,(1-B)^d Z_t = c + \theta_q(B)\,e_t$
Model selection
Examine ACF & PACF
Identify potential models (p,q)(sp,sq)
Model Application
Use selected model to forecast
ARIMA-Modelling
ARIMA(p,d,q)-Models
ARIMA - Autoregressive Terms AR(p), with p=order of the autoregressive part
ARIMA - Order of Integration, d=degree of first differencing/integration involved
ARIMA - Moving Average Terms MA(q), with q=order of the moving average of error
SARIMA (p,d,q)(P,D,Q)s, with (P,D,Q) the seasonal process at seasonal lag s
Objective
Identify the appropriate ARIMA model for the time series
Identify AR-term
Identify I-term
Identify MA-term
Identification through
Autocorrelation Function
Partial Autocorrelation Function
EVIC05 Sven F. Crone - www.bis-lab.com
Recap:
Let the mean of the time series at t be $\mu_t = E(Y_t)$, the autocovariances $\gamma_{t,t-\tau} = \mathrm{cov}(Y_t, Y_{t-\tau})$, and the variance $\gamma_{t,t} = \mathrm{var}(Y_t)$.
Definition
A time series is stationary if its mean level $\mu_t$ is constant for all t, and its variance and covariances $\gamma_{t,t-\tau}$ are constant for all t.
In other words: all properties of the distribution (mean, variance, skewness, kurtosis etc.) of a random sample of the time series are independent of the absolute time t of drawing the sample; identity of mean & variance across time.
First differences: $x_t - x_{t-1}$; applying differencing twice gives second-order differences, d = 2
Integration
Differencing: $Z_t = Y_t - Y_{t-1}$
Transforms: logarithms etc.
where $Z_t$ is a transform of the variable of interest $Y_t$, chosen so that $Z_t$, $Z_t - Z_{t-1}$, $(Z_t - Z_{t-1}) - (Z_{t-1} - Z_{t-2})$, … is stationary
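As a minimal MATLAB sketch of the differencing transforms (the example series is hypothetical):

% first and second-order differencing, plus a log transform
y = [112 118 132 129 121 135 148 148 136 119]';  % hypothetical series
z1 = diff(y);        % first differences Z_t = Y_t - Y_{t-1}, d = 1
z2 = diff(y, 2);     % second-order differences, d = 2
zlog = diff(log(y)); % log transform before differencing (variance stabilisation)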
Agenda
Forecasting with Artificial Neural Networks
1. Forecasting?
1. Forecasting as predictive Regression
2. Time series prediction vs. causal prediction
3. SARIMA-Modelling
1. SARIMA Differencing
2. SARIMA Autoregressive Terms
3. SARIMA Moving Average Terms
4. SARIMA Seasonal Terms
4. Why NN for Forecasting?
2. Neural Networks?
3. Forecasting with Neural Networks
4. How to write a good Neural Network forecasting paper!
Problems
Independence of residuals often violated (heteroscedasticity)
Determining number of past values problematic
$r_k = \dfrac{\sum_{t=k+1}^{n} (Y_t - \bar{Y})(Y_{t-k} - \bar{Y})}{\sum_{t=1}^{n} (Y_t - \bar{Y})^2}$
Graphical interpretation
[Scatter plot of $x_t$ against $x_{t-1}$ (range 100–220): low autocorrelations]
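As a minimal MATLAB sketch of the autocorrelation coefficient above (series y and lag k assumed given):

function r = acf_lag(y, k)
% sample autocorrelation coefficient r_k of series y at lag k
n = length(y);
ybar = mean(y);
num = sum((y(k+1:n) - ybar) .* (y(1:n-k) - ybar));
den = sum((y - ybar).^2);
r = num / den;
end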
ARIMA models: Parameter p
E.g. time series $Y_t$: 7, 8, 7, 6, 5, 4, 5, 6, 4, …
Autocorrelations $r_k$ gathered at lags 1, 2, … make up the autocorrelation function (ACF)
[Bar chart: ACF over lags 1–3, range -0.6 to 0.6]
[ACF and PACF bar charts over lags 1–16 with confidence intervals for AR processes: the ACF decays gradually while the PACF cuts off after lag p]
AR(2) model = ARIMA(2,0,0):
$Y_t = c + \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + e_t$
AR(1) model:
$Y_t = c + \phi_1 Y_{t-1} + e_t$
e.g. $Y_t = 1.1 + 0.8\,Y_{t-1} + e_t$
[Plot of a simulated AR(1) series over 50 periods]
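As a minimal MATLAB sketch simulating the AR(1) example above (the noise level is an assumption):

% simulate Y_t = 1.1 + 0.8*Y_{t-1} + e_t over 50 periods
rng(2);
n = 50; c = 1.1; phi = 0.8;
e = randn(n, 1);                 % assumed e_t ~ N(0,1)
Y = zeros(n, 1);
Y(1) = c / (1 - phi);            % start at the process mean
for t = 2:n
    Y(t) = c + phi * Y(t-1) + e(t);
end
plot(Y); xlabel('period'); ylabel('Y_t');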
Agenda
Forecasting with Artificial Neural Networks
1. Forecasting?
1. Forecasting as predictive Regression
2. Time series prediction vs. causal prediction
3. SARIMA-Modelling
1. SARIMA Differencing
2. SARIMA Autoregressive Terms
3. SARIMA Moving Average Terms
4. SARIMA Seasonal Terms
4. Why NN for Forecasting?
2. Neural Networks?
3. Forecasting with Neural Networks
4. How to write a good Neural Network forecasting paper!
$Y_t = c + e_t - \theta_1 e_{t-1} - \theta_2 e_{t-2} - \dots - \theta_q e_{t-q}$
ARIMA(0,0,q) model = MA(q) model
[ACF and PACF bar charts over lags 1–16 with confidence intervals for an MA(1) process]
MA(1) model = ARIMA(0,0,1):
$Y_t = c + e_t - \theta_1 e_{t-1}$
[ACF and PACF bar charts over lags 1–16 with confidence intervals: 1st, 2nd & 3rd lag significant in the ACF, pattern in the PACF]
MA(1) example:
$Y_t = c + e_t - \theta_1 e_{t-1}$
e.g. $Y_t = 10 + e_t + 0.2\,e_{t-1}$
[Plot of a simulated MA(1) series over 50 periods]
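As a minimal MATLAB sketch simulating the MA(1) example above (the noise level is an assumption):

% simulate Y_t = 10 + e_t + 0.2*e_{t-1} over 50 periods
rng(3);
n = 50; c = 10; theta = 0.2;     % sign of theta as in the example
e = randn(n + 1, 1);             % one extra draw for e_0
Y = c + e(2:end) + theta * e(1:end-1);
plot(Y); xlabel('period'); ylabel('Y_t');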
ARMA(1,1) model = ARIMA(1,0,1):
$Y_t = c + \phi_1 Y_{t-1} + e_t - \theta_1 e_{t-1}$
[ACF and PACF bar charts over lags 1–16 with confidence intervals for an ARMA(1,1) process]
Agenda
Forecasting with Artificial Neural Networks
1. Forecasting?
1. Forecasting as predictive Regression
2. Time series prediction vs. causal prediction
3. SARIMA-Modelling
1. SARIMA Differencing
2. SARIMA Autoregressive Terms
3. SARIMA Moving Average Terms
4. SARIMA Seasonal Terms
5. SARIMAX Seasonal ARIMA with Interventions
4. Why NN for Forecasting?
2. Neural Networks?
3. Forecasting with Neural Networks
4. How to write a good Neural Network forecasting paper!
Seasonality in ARIMA models
[ACF plot over lags 1–36 of a seasonal series]
Seasonal differencing: $\nabla_s Y_t = (1 - B^s)\,Y_t$
Seasonality in ARIMA models
[ACF and PACF plots over lags 1–36 with upper/lower limits: seasonal spikes at multiples of lag 12 = monthly data]
Seasonality in ARIMA-Models
s Yt = Yt-Yt-s =YtBsYt=(1-Bs)Yt
Seasonal difference followed by a first difference: (1-B) (1-Bs)
Yt
Seasonal ARIMA(1,1,1)(1,1,1)4-modell
(1 1B ) (1 1B ) (1 B ) (1 B
4 4
)t
Y = c + (1 1 B ) ( 1 ) et
1 B 4
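As a minimal MATLAB sketch of seasonal plus first differencing (the quarterly series is hypothetical):

% seasonal differencing (1 - B^s), then a first difference (1 - B)
s = 4;                                   % quarterly seasonality as in the example
y = cumsum(randn(60, 1)) + 5 * repmat([1 2 3 4]', 15, 1);  % hypothetical series
zs = y(s+1:end) - y(1:end-s);            % z_t = y_t - y_{t-s}
z = diff(zs);                            % (1-B)(1-B^s) y_t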
Agenda
Forecasting with Artificial Neural Networks
1. Forecasting?
1. Forecasting as predictive Regression
2. Time series prediction vs. causal prediction
3. SARIMA-Modelling
1. SARIMA Differencing
2. SARIMA Autoregressive Terms
3. SARIMA Moving Average Terms
4. SARIMA Seasonal Terms
5. SARIMAX Seasonal ARIMA with Interventions
4. Why NN for Forecasting?
2. Neural Networks?
3. Forecasting with Neural Networks
4. How to write a good Neural Network forecasting paper!
Forecasting Models
Causal Prediction
ARX(p)-Models
Agenda
1. Forecasting?
1. Forecasting as predictive Regression
2. Time series prediction vs. causal prediction
3. SARIMA-Modelling
4. Why NN for Forecasting?
2. Neural Networks?
3. Forecasting with Neural Networks
4. How to write a good Neural Network forecasting paper!
Pattern or noise?
[Three example series: pattern or noise?]
[Chart: number of publications (citations) on neural network forecasting per year, 1987–2003; trend $R^2 = 0.9036$]
Application areas:
Informatics
Engineering: control applications in plants, automatic target recognition (DARPA), explosive detection at airports, mineral identification (NASA Mars Explorer), starting & landing of Jumbo Jets (NASA)
Meteorology / weather: rainfall prediction, El Niño effects
Corporate business: credit card fraud detection, simulated forecasting methods
[Chart: number of publications by business forecasting domain]
[Survey of forecasting methods used in practice, % of replies]
Objective methods:
Averages 25%
Autoregressive methods 7%
Exponential smoothing 24%
Trend extrapolation 35%
Neural networks 9%
Causal methods: econometric models 23%, regression 69%
Judgemental (subjective) methods:
Analogies 23%
Delphi 22%
PERT 6%
Surveys 49%
Agenda
PQ-diagram
[Plot: in-sample observations & forecasts vs. out-of-sample hold-out; training and validation sets in-sample, test set out-of-sample; actual values, NN forecasted values, absolute forecasting errors]
Agenda
History
Developed in interdisciplinary research (McCulloch/Pitts 1943)
Motivation from the functions of natural neural networks:
neurobiological motivation
application-oriented motivation [Smith & Gupta, 2000]
Timeline: Turing 1936; McCulloch/Pitts 1943; Hebb 1949; Minsky builds 1st neurocomputer 1954; Dartmouth Project 1956; Rosenblatt 1959; Minsky/Papert 1969; Kohonen 1972; Werbos 1974; Rumelhart/Hinton/Williams 1986; 1st IJCNN 1987; 1st journals 1988
Industry milestones: GE 1954 1st computer payroll system; INTEL 1971 1st microprocessor; IBM 1981 introduces PC; Neuralware founded 1987; White 1988 1st paper on forecasting; SAS 1997 Enterprise Miner; IBM 1998 $70bn BI initiative
Agenda
[Network diagram: nodes with outputs $o_i, o_j$ and weights $w_{i,j}$; hidden nodes $u_{n+1} \dots u_{n+6}$, output node $n+h$]
Mathematics as abstract representations of reality:
$o_i = \tanh\left(\sum_j w_{ji}\, o_j - \theta_i\right)$
use in software simulators, hardware, engineering etc., e.g. reading a network structure in a MATLAB software simulator:
neural_net = eval(net_name);                      % fetch the trained network object by name
[num_rows, ins] = size(neural_net.iw{1});         % input weight matrix -> number of inputs
[outs, num_cols] = size(neural_net.lw{neural_net.numLayers, neural_net.numLayers-1});  % output-layer weights
if (strcmp(neural_net.adaptFcn, ''))              % no adaption function set -> radial basis function net
    net_type = 'RBF';
else
    net_type = 'MLP';
end
fid = fopen(path, 'w');                           % open target file for writing the export
Alternative notations
Information processing in neurons / nodes
[Biological representation vs. graphical notation]
Input function: $net_i = \sum_j w_{ij}\, in_j$
Activation function: $a_i = f(net_i - \theta_i)$
Output: $out_i$
Neuron / node $u_i$ with inputs $in_j$ and weights $w_{i,j}$; unidirectional information processing
Threshold (step) activation:
$out_i = \begin{cases} 1 & \text{if } \sum_j w_{ji} o_j - \theta_i \ge 0 \\ 0 & \text{if } \sum_j w_{ji} o_j - \theta_i < 0 \end{cases}$
Input Functions
Worked example (threshold activation):
inputs $o_1 = 0.71$, $o_2 = 4$, $o_3 = 1$; weights $w_{1,i} = 2.2$, $w_{2,i} = -1.84$, $w_{3,i} = 9.01$; bias input $o_0 = 1$ with $w_{0,i} = -8.0$ (threshold $\theta_i = 8.0$)
$net_i = 2.2 \cdot 0.71 + 4.0 \cdot (-1.84) + 1.0 \cdot 9.01 = 3.212$
$3.212 - 8.0 = -4.788 < 0 \;\Rightarrow\; o_i = 0.00$
$net_i = \sum_j w_{ij} o_j$, $a_i = f(net_i)$, $o_i = \begin{cases} 1 & \text{if } \sum_j w_{ji} o_j - \theta_i \ge 0 \\ 0 & \text{if } \sum_j w_{ji} o_j - \theta_i < 0 \end{cases}$
Hyperbolic Tangent
Logistic Function
Worked example (tanh activation), same inputs and weights:
$net_i = 2.2 \cdot 0.71 + 4.0 \cdot (-1.84) + 1.0 \cdot 9.01 = 3.212$
$3.212 - 8.0 = -4.788$
$o_i = \tanh(-4.788) \approx -0.9999$
$net_i = \sum_j w_{ij} o_j$, $a_i = f(net_i)$, $o_i = \tanh\left(\sum_j w_{ji} o_j - \theta_i\right)$
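As a minimal MATLAB sketch of this node computation (values as in the example above):

% single node: net input, then threshold and tanh activation
o = [0.71; 4.0; 1.0];            % inputs o_1..o_3
w = [2.2; -1.84; 9.01];          % weights w_1..w_3
theta = 8.0;                     % threshold (bias weight w_0 = -8.0 on o_0 = 1)
net = w' * o;                    % = 3.212
out_step = double(net - theta >= 0);   % threshold activation -> 0
out_tanh = tanh(net - theta);          % tanh activation -> approx. -0.9999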
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n + \varepsilon$
[Diagram: inputs $x_1 \dots x_n$ weighted by $\beta_1 \dots \beta_n$ feeding output node $y$]
Also: [diagram of lagged inputs $y_t, y_{t-1}, y_{t-2}, \dots, y_{t-n-1}$ through input nodes $u_1 \dots u_n$ and hidden nodes $u_{n+1} \dots u_{n+5}$ to output $y_{t+1}$]
Simplification for complex models!
Combination of Nodes
Each hidden node j computes
$o_j = \tanh\left(\sum_{i=1}^{N} o_i w_{ij} - \theta_j\right) = \dfrac{e^{\left(\sum_{i=1}^{N} o_i w_{ij} - \theta_j\right)} - e^{-\left(\sum_{i=1}^{N} o_i w_{ij} - \theta_j\right)}}{e^{\left(\sum_{i=1}^{N} o_i w_{ij} - \theta_j\right)} + e^{-\left(\sum_{i=1}^{N} o_i w_{ij} - \theta_j\right)}}$
and the output node l combines the hidden outputs again:
$o_l = \tanh\left(\sum_j o_j w_{jl} - \theta_l\right)$
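As a minimal MATLAB sketch of this two-layer forward pass (layer sizes and weights are hypothetical):

% forward pass of a small MLP with tanh nodes
rng(1);                          % reproducible example weights
N = 3; H = 5;                    % assumed: 3 inputs, 5 hidden nodes
o = randn(N, 1);                 % input vector o_1..o_N
W1 = randn(H, N); th1 = randn(H, 1);   % hidden-layer weights & thresholds
w2 = randn(1, H); th2 = randn;         % output-node weights & threshold
o_hidden = tanh(W1 * o - th1);   % hidden outputs o_j
o_l = tanh(w2 * o_hidden - th2); % network output o_l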
Agenda
Hebbian Learning
$\Delta w_{ij} = \eta\, o_i\, a_j$ (weight change proportional to pre- and post-synaptic activation)
[Diagram: inputs $x_1, x_2, x_3$ into a network of nodes 1–10 with weight matrix W, e.g. weights $w_{3,8}$, $w_{3,10}$]
Error $E = o - t$: output vector $o$ compared against teaching output $t$
Backpropagation of error
With logistic activation $o_{pj} = \dfrac{1}{1 + e^{-net_{pj}}}$ and $\dfrac{\partial net_{pk}}{\partial o_{pj}} = w_{kj}$, the error signals become
$\delta_{pj} = \begin{cases} o_j (1 - o_j)(t_j - o_j) & \text{for output nodes } j \\ o_j (1 - o_j) \sum_k \delta_k w_{jk} & \text{for hidden nodes } j \end{cases}$
In general, for cost function $C(t_{pj}, o_{pj})$:
$\delta_{pj} = \begin{cases} -\dfrac{\partial C(t_{pj}, o_{pj})}{\partial o_{pj}}\, f_j'(net_{pj}) & \text{if unit } j \text{ is in the output layer} \\ f_j'(net_{pj}) \sum_k \delta_{pk} w_{jk} & \text{if unit } j \text{ is in a hidden layer} \end{cases}$
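As a minimal MATLAB sketch of the delta rule for one logistic output node (inputs, weights, target and learning rate are hypothetical):

% one backpropagation step for a single logistic output node
eta = 0.1;                       % assumed learning rate
o_in = [0.5; 0.2; 0.9];          % hidden-layer outputs feeding node j
w = [0.3; -0.7; 0.1];            % weights into node j
t = 1.0;                         % teaching output
o_j = 1 / (1 + exp(-(w' * o_in)));       % logistic activation
delta = o_j * (1 - o_j) * (t - o_j);     % delta for an output node
w = w + eta * delta * o_in;              % weight update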
[Error surface over weight $w_j$: random starting points 1 and 2; local minimum vs. GLOBAL minimum]
[3D plots of an error surface]
local search: step size fixed, follow steepest descent
local optimum = any valley; global optimum = deepest valley with lowest error
Agenda
1. Forecasting?
2. Neural Networks?
3. Forecasting with Neural Networks
1. NN models for Time Series & Dynamic Causal Prediction
2. NN experiments
3. Process of NN modelling
4. How to write a good Neural Network forecasting paper!
$y_{t+h} = f(x_t) + \varepsilon_{t+h}$
$y_{t+h}$ = forecast for t+h
$f(\cdot)$ = linear / non-linear function
$x_t$ = vector of observations in t
$\varepsilon_{t+h}$ = independent error term in t+h
Interpretation
Weights represent autoregressive terms
Same problems / shortcomings as standard AR-models!
Extensions
Multiple output nodes = simultaneous autoregression models
Non-linearity through different activation function in output node
[Diagram: lagged inputs $y_t, y_{t-1}, y_{t-2}, \dots, y_{t-n-1}$ through input nodes $u_1, u_2, u_3, \dots, u_n$ to output node $u_j$ forecasting $y_{t+1}$]
$y_{t+1} = f(y_t, y_{t-1}, y_{t-2}, \dots, y_{t-n-1})$
$y_{t+1} = y_t w_{t,j} + y_{t-1} w_{t-1,j} + y_{t-2} w_{t-2,j} + \dots + y_{t-n-1} w_{t-n-1,j} - \theta_j$
= linear autoregressive AR(p) model
Extensions
Additional layers with nonlinear nodes; linear activation function in output layer
$y_{t+1} = f(y_t, y_{t-1}, y_{t-2}, \dots, y_{t-n-1})$
$y_{t+1} = \tanh\left(\sum_{i=t-n-1}^{t} y_i w_{ij} - \theta_j\right)$
= nonlinear autoregressive AR(p) model
Interpretation
Autoregressive modeling: an AR(p) approach WITHOUT the moving average terms of errors; nonlinear ARIMA
Similar problems / shortcomings as standard AR-models!
Extensions
Multiple output nodes = simultaneous autoregression models
[Diagram: lagged inputs through two hidden layers ($u_{n+1} \dots u_{n+5}$) to output $y_{t+1}$]
$y_{t+1} = f(y_t, y_{t-1}, y_{t-2}, \dots, y_{t-n-1})$
$y_{t+1} = \tanh\left(\sum_k w_{kj} \tanh\left(\sum_i w_{ik} \tanh\left(\sum_j w_{ji}\, y_{t-j} - \theta_i\right) - \theta_k\right) - \theta_j\right)$
= nonlinear autoregressive AR(p) model
Interpretation
As single autoregressive modeling AR(p), extended to multiple forecast horizons
[Diagram: hidden nodes $u_{n+1} \dots u_{n+4}$ feeding output nodes $u_{n+5} \dots u_{n+h}$]
$y_{t+1}, y_{t+2}, \dots, y_{t+n} = f(y_t, y_{t-1}, y_{t-2}, \dots, y_{t-n-1})$
Interpretation
As single autoregressive modeling AR(p), with an additional event term to explain external events
Extensions
Multiple output nodes = simultaneous multiple regression
[Diagram: lagged inputs plus event input through hidden nodes $u_{n+1} \dots u_{n+4}$ to output nodes $u_{n+5}, \dots$]
$y_{t+1}, y_{t+2}, \dots, y_{t+n} = f(y_t, y_{t-1}, y_{t-2}, \dots, y_{t-n-1})$
Causal inputs, e.g. Max Temperature, Rainfall, Sunshine Hours:
$y = f(x_1, x_2, x_3, \dots, x_n)$
$y = x_1 w_{1j} + x_2 w_{2j} + x_3 w_{3j} + \dots + x_n w_{nj} - \theta_j$
= linear regression model
Causal inputs, e.g. Max Temperature, Rainfall, Sunshine Hours:
$y_{t+1} = f(y_t, y_{t-1}, y_{t-2}, \dots, y_{t-n-1})$
$y_{t+1} = \mathrm{logistic}\left(\sum_{i=t-n-1}^{t} y_i w_{ij} - \theta_j\right)$
= nonlinear multiple (logistic) regression model
Interpretation
Similar to linear multiple regression modeling
Without nonlinearity in output: weighted expert regime on nonlinear regression
With nonlinearity in output layer: ???
[Diagram: causal inputs through hidden nodes $u_{n+1} \dots u_{n+4}$ to output node $u_{n+5}$]
$y = f(x_1, x_2, x_3, \dots, x_n)$
$y = x_1 w_{1j} + x_2 w_{2j} + x_3 w_{3j} + \dots + x_n w_{nj} - \theta_j$
= nonlinear regression model
Focus
Problem!
BUT:
Can model MA(q)-process through extended AR(p) window!
Can model SARMAX-processes through recurrent NN
Agenda
1. Forecasting?
2. Neural Networks?
3. Forecasting with Neural Networks
1. NN models for Time Series & Dynamic Causal Prediction
2. NN experiments
3. Process of NN modelling
4. How to write a good Neural Network forecasting paper!
Simulation of Neural Network prediction of Artificial Time Series
Agenda
Data Pre-processing
Transformation
Scaling
Normalizing to [0;1] or [-1;1]
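As a minimal MATLAB sketch of the scaling step (the example series is hypothetical):

% linear scaling of a series y to [0;1] and [-1;1]
y = [112 118 132 129 121 135]';          % hypothetical series
y01 = (y - min(y)) / (max(y) - min(y));  % scaled to [0;1]
y11 = 2 * y01 - 1;                       % scaled to [-1;1]
% invert after forecasting: y = y01 * (max(y) - min(y)) + min(y)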
NN Modelling Process
Modelling of NN architecture (manual decisions require expert knowledge):
Number of INPUT nodes
Number of HIDDEN nodes
Number of HIDDEN LAYERS
Number of OUTPUT nodes
Information processing in nodes (activation functions)
Interconnection of nodes
Training:
Initializing of weights (how often?)
Training method (backprop, higher order, …)
Training parameters
Evaluation of best model (early stopping)
Evaluation:
Evaluation criteria & selected dataset
Specification of a network as a tuple of design decisions:
D = [DSE, DSA]  dataset: selection, sampling
P = [C, N, S]  preprocessing: correction, normalization, scaling
A = [NI, NS, NL, NO, K, T]  architecture: no. of input nodes, no. of hidden nodes, no. of hidden layers, no. of output nodes, connectivity / weight matrix, activation strategy
U = [FI, FA, FO]  signal processing: input function, activation function, output function
L = [G, PT,L, IP, IN, B]  learning: choice of algorithm, learning parameters per phase & layer, initialization procedure, number of initializations, stopping method & parameters
O  objective function
Simulation Experiments
Data Preprocessing
Data Transformation
Verification, correction & editing (data entry errors etc.)
Coding of variables
Scaling of variables
Selection of independent variables (PCA)
Outlier removal
Missing value imputation
Data Coding
Binary coding of external events: n and n-1 coding have no significant impact; n-coding appears to be more robust (despite issues of multicollinearity)
Outliers
extreme values, coding errors, data errors
Actions:
Eliminate outliers (delete records)
Replace / impute values as missing values
Binning of variables = rescaling
Normalisation of variables = scaling
Asymmetry of observations
Transformation of data (functional transformation of values): linearization or normalisation
Rescale (DOWNSCALE) data to allow better analysis by binning of data (grouping of data into groups): ordinal scale!
Nominal variable, e.g. magazine type (…, 2 = Woman)
Recode as 1-of-N coding: 3 new bit-variables
1 0 0 Business Press
0 1 0 Sports & Fun
0 0 1 Woman
Recode as 1-of-(N-1) coding: 2 new bit-variables
1 0 Business Press
0 1 Sports & Fun
0 0 Woman
Solutions
Missing value of interval scale: mean, median, etc.
Missing value of nominal scale: most prominent value in feature set
Simulation Experiments
Interconnection of Nodes
Agenda
[Chart: MAE against number of initialisations (1, 5, 10, 25, 50, 100, 200, 400, 800, 1600), showing lowest and highest error per setting]
Simulation Experiments
Agenda
Experimental Results
[Scatter plot of training error vs. validation error (range 0.00–0.25) for all trained networks: significant positive correlations between training & validation set errors; yet networks with low validation error can show high test error, and networks with higher validation error can show lower test error]
[S-diagram of error correlations for the TOP 1000 ANNs, once ordered by test error and once by validation error: E Train, E Valid, E Test with 20-ANN moving averages; decreasing correlation and high variance on the test error]
MAPE & MSE are subject to upward bias by a single bad forecast
Alternative measures may be based on the median instead of the mean:
Median Absolute Percentage Error
$MdAPE_f = \mathrm{Med}\left(\left|\dfrac{e_{f,t}}{y_t}\right| \cdot 100\right)$
Median Squared Error
$MdSE_f = \mathrm{Med}(e_{f,t}^2)$
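As a minimal MATLAB sketch of these median-based measures (actuals and forecasts are hypothetical):

% median-based error measures for forecasts f of actuals y
y = [100 120 90 110 95]';            % hypothetical actuals
f = [105 112 97 108 140]';           % hypothetical forecasts, incl. one bad one
e = y - f;                           % forecast errors e_{f,t}
MdAPE = median(abs(e ./ y) * 100);   % median absolute percentage error
MdSE = median(e.^2);                 % median squared error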
Theil's U:
$U = \sqrt{\dfrac{\sum_t \left(\dfrac{\hat{y}_{t+f|t} - y_{t+f}}{y_t}\right)^2}{\sum_t \left(\dfrac{y_t - y_{t+f}}{y_t}\right)^2}}$
(U < 1: the forecast outperforms the naïve no-change forecast)
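As a minimal MATLAB sketch of Theil's U for one-step-ahead forecasts (actuals and forecasts are hypothetical):

% Theil's U: forecast errors relative to the naive no-change forecast
y = [100 104 101 110 108 115]';      % hypothetical actuals
f = [102 103 104 108 112 113]';      % hypothetical forecasts, f(t) predicts y(t)
num = sum(((f(2:end) - y(2:end)) ./ y(1:end-1)).^2);
den = sum(((y(1:end-1) - y(2:end)) ./ y(1:end-1)).^2);
U = sqrt(num / den);                 % U < 1 beats the naive forecast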
Simulation Experiments
Agenda
1. Forecasting?
2. Neural Networks?
3. Forecasting with Neural Networks
1. NN models for Time Series & Dynamic Causal Prediction
2. NN experiments
3. Process of NN modelling
4. How to write a good Neural Network forecasting paper!
Valid Experiments
Evaluate using ex ante accuracy (HOLD-OUT data)
Use training & validation set for training & model selection
NEVER use test data except for the final evaluation of accuracy!
Evaluate across multiple time series
Evaluate against benchmark methods (NAÏVE + domain!)
Evaluate using multiple & robust error measures (not MSE!)
Evaluate using multiple out-of-samples (time series origins)
Evaluate as an empirical forecasting competition!
Reliable Results
Document all parameter choices
Document all relevant modelling decisions in the process
Rigorous documentation to allow re-simulation by others!
Forecasting Competition
Split up time series data into 2 sets PLUS multiple ORIGINS!
Select forecasting model: select best parameters on IN-SAMPLE DATA
Forecast next values for DIFFERENT HORIZONS: t+1, t+3, …, t+18?
Evaluate error on hold-out OUT-OF-SAMPLE DATA
Choose model with lowest AVERAGE error on OUT-OF-SAMPLE DATA (see the sketch below)
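As a minimal MATLAB sketch of this hold-out evaluation against a naïve benchmark (series, split point and error measure are assumptions):

% hold-out evaluation: select on in-sample data, evaluate out-of-sample
rng(4);
y = cumsum(randn(100, 1)) + 50;   % hypothetical time series
split = 80;                       % in-sample / out-of-sample boundary
test = y(split+1:end);            % hold-out data, never used for fitting
f_naive = y(split:end-1);         % naive benchmark: y_t forecasts y_{t+1}
mae_naive = mean(abs(test - f_naive));
% fit candidate models on y(1:split) only, forecast the same hold-out,
% and choose the model with the lowest average out-of-sample error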
Results M3-competition
simple methods outperform complex ones
exponential smoothing OK
neural networks not necessary
forecasting VALUE depends on VALUE of INVENTORY DECISION
[Diagram: rolling-origin evaluation with forecasts for horizons t+1, t+2, t+3; SIMULATED = EX POST forecasts]
Further Information
Journals
Forecasting rather than technical Neural Networks literature!
JBF Journal of Business Forecasting
IJF International Journal of Forecasting
JoF Journal of Forecasting
Agenda
1. Process of NN Modelling
2. Tips & Tricks for Improving Neural Networks based forecasts
a. Copper Price Forecasting
b. Questions & Answers and Discussion
a. Advantages & Disadvantages of Neural Networks
b. Discussion
Advantages
+ ANN can forecast any time series pattern (t+1!)
+ without preprocessing
+ no model selection needed!
+ ANN offer many degrees of freedom in modeling
+ freedom in forecasting with one single model
+ complete model repository: linear models, nonlinear models, autoregression models, single & multiple regression
+ multiple step ahead forecasts
Disadvantages
- many degrees of freedom in modeling: experience essential!
- research not consistent
- explanation & interpretation of ANN weights IMPOSSIBLE (nonlinear combination!)
- impact of events not directly deducible
-
Contact Information
Sven F. Crone
Research Associate
Internet www.lums.lancs.ac.uk
eMail s.crone@lancaster.ac.uk