How to do Neural Network Forecasting with limited maths!
Agenda
1. Forecasting?
2. Neural Networks?
3. …
4. …
Forecasting or Prediction?
Data Mining: application of data analysis algorithms & …
TASKS and ALGORITHMS:
Descriptive Data Mining
- Categorisation: Clustering (k-means clustering, neural networks)
- Summarisation & Visualisation
- Feature Selection
- Association Analysis (association rules)
- Sequence Discovery (temporal association rules)
Predictive Data Mining
- Categorisation: Classification (decision trees, neural networks, discriminant analysis)
- Regression (linear regression, logistic regression, nonlinear regression, neural networks: MLP, RBFN, GRNN)
- Time Series Analysis (exponential smoothing, (S)ARIMA(x), neural networks)
Forecasting or Classification
Choice of method by scale of the dependent and independent variables (ordinal scales: DOWNSCALE to nominal):
Supervised learning (dependent variable present):
- Metric dependent: Regression / Time Series Analysis (metric independents); Analysis of Variance (nominal independents)
- Nominal dependent: Classification
Unsupervised learning (NONE, no dependent): Contingency Analysis, Principal Component Analysis, Clustering
[Figure: data matrix of cases x inputs with a target column (supervised learning) vs. cases x inputs without a target (unsupervised learning)]
Forecasting or Classification?
What the experts say
Forecasting Models
Time series analysis vs. causal modelling
Forecasting Methods
Objective Forecasting Methods
- Time Series Methods: Averages, Moving Averages, Naive Methods, Exponential Smoothing (Simple ES, Linear ES, Seasonal ES, Dampened Trend ES), Simple Regression, Autoregression ARIMA, Neural Networks
- Causal Methods: Linear Regression, Multiple Regression, Dynamic Regression, Vector Autoregression, Intervention models, Neural Networks
Subjective Forecasting Methods (prophecy & educated guessing): Analogies, Delphi, PERT, Survey techniques
Notation
Approach
Assumption: data errors arise from sampling, from bias in the instruments or the responses, or from transcription.
[Figure: example time series plots by year]
Regular fluctuation superimposed on a trend (the period may be random).
Signal: level L, trend T, seasonality S. Noise: irregular error e.
Sales = LEVEL + TREND + SEASONALITY + RANDOM ERROR
[Figure: monthly time series (Aug 2000 to Jul 2002), original data value vs. corrected data value, showing a PULSE, a LEVEL SHIFT and STRUCTURAL BREAKS]
Time Series decomposed into Components: Level, Trend, Random
REGULAR Time Series Patterns: STATIONARY Time Series, SEASONAL Time Series, TRENDED Time Series
IRREGULAR Time Series Patterns: FLUCTUATING Time Series, INTERMITTENT Time Series
$Y_t = f(E_t)$: the time series is influenced by level & random fluctuations.
$Y_t = f(S_t, E_t)$: the time series is influenced by level, season and random fluctuations.
$Y_t = f(T_t, E_t)$: the time series is influenced by trend from level and random fluctuations.
INTERMITTENT series: the number of periods with zero sales is high (ca. 30%-40%), plus PULSES, LEVEL SHIFTS and STRUCTURAL BREAKS!

$Y_t = f(S_t, T_t, E_t)$
$Y_t$ = sales or observation of the time series at point t; it consists of a combination of:
- $S_t$: base level + seasonal component
- $T_t$: trend component
- $E_t$: irregular or random error component
There are different possibilities to combine the components:
Additive Model: $Y_t = L + S_t + T_t + E_t$
Multiplicative Model: $Y_t = L \cdot S_t \cdot T_t \cdot E_t$
Classification of component combinations: No Trend Effect, Additive Trend Effect or Multiplicative Trend Effect, combined with an Additive Seasonal Effect or a Multiplicative Seasonal Effect.
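To make the two combination schemes concrete, here is a minimal MATLAB sketch (all component values are invented for illustration) that builds one additive and one multiplicative series from level, trend, seasonal and error components:

% Minimal sketch: combining time series components (illustrative values only)
t = (1:48)';                              % four years of monthly observations
L = 100;                                  % base level
T = 0.5 * t;                              % linear trend component
S = 10 * sin(2*pi*t/12);                  % seasonal component with period 12
E = randn(48, 1);                         % irregular random error
Y_add  = L + S + T + E;                   % additive model: Yt = L + St + Tt + Et
% multiplicative model: Yt = L * St * Tt * Et, components expressed as factors around 1
Y_mult = L .* (1 + S/100) .* (1 + T/100) .* (1 + E/100);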
SARIMA-Modelling
ARIMA models were popularised by George Box & Gwilym Jenkins in the 1970s (the names are often used synonymously). The models are widely studied: Box & Jenkins put together the theoretical underpinning required to understand & use ARIMA and defined a general notation for dealing with ARIMA models:
$\phi_p(B)\,(1-B)^d\,Z_t = \mu + \theta_q(B)\,e_t$
Data Preparation: transform the time series for stationarity; difference the time series for stationarity.
Model Selection
Model Estimation
Model Application: use the selected model to forecast.
ARIMA-Modelling: ARIMA(p,d,q)-Models
- AR(p): autoregressive terms, with p = order of the autoregressive part
- I(d): order of integration, with d = degree of first differencing / integration involved
- MA(q): moving average terms, with q = order of the moving average of the errors
- SARIMA(p,d,q)(P,D,Q)s: with (P,D,Q) the seasonal process at the seasonal lags s
Objective: identify the model orders through the Autocorrelation Function (ACF) and the Partial Autocorrelation Function (PACF).
Recap: Definition
$\mu_t = E(Y_t)$, $\gamma_{t,t} = \mathrm{var}(Y_t)$
Stationarity: $\mu(A) = \mu(B)$, $\mathrm{var}(A) = \mathrm{var}(B)$, etc.
For this time series: $\mu(B) > \mu(A)$, i.e. a trend, so the time series is non-stationary.
E.g. $x_t = \{2, 4, 6, 8, 10\}$; first differences $x_t - x_{t-1}$: 4-2=2, 6-4=2, 8-6=2, 10-8=2, so $Y_t = \{2, 2, 2, 2\}$ is stationary.
Differencing: $Z_t = Y_t - Y_{t-1}$
Tests: Dickey-Fuller Test, Serial Correlation Test, Runs Test
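The differencing step is one line in MATLAB; a minimal sketch reproducing the example above:

% Differencing a trended series for stationarity (example from above)
x = [2 4 6 8 10];          % trended series x_t
z = diff(x);               % z_t = x_t - x_(t-1)
disp(z)                    % prints 2 2 2 2: constant mean, stationary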
Autoregressive AR(p)-Models
AR model: $Y_t = c + \phi_1 Y_{t-1} + e_t$, with $\phi_1$ the weight of the AR relationship, $Y_{t-1}$ the observation in t-1 (independent variable) and $e_t$ a random component (white noise).
Problems: Box-Pierce test, Ljung-Box test.
$r_k = \dfrac{\sum_{t=k+1}^{n}(Y_t - \bar{Y})(Y_{t-k} - \bar{Y})}{\sum_{t=1}^{n}(Y_t - \bar{Y})^2}$
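As a minimal MATLAB transcription of this formula (the function name sample_acf is mine):

% Sample autocorrelation r_k for lags 1..max_lag (base MATLAB, no toolboxes)
function r = sample_acf(y, max_lag)
    y    = y(:);
    n    = numel(y);
    ybar = mean(y);
    den  = sum((y - ybar).^2);                               % denominator: sum of (Yt - Ybar)^2
    r    = zeros(max_lag, 1);
    for k = 1:max_lag
        num  = sum((y(k+1:n) - ybar) .* (y(1:n-k) - ybar));  % numerator, sum over t = k+1..n
        r(k) = num / den;
    end
end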
[Figure: graphical interpretation: scatter plot of $x_t$ against $x_{t-1}$, values roughly 100 to 220]
ARIMA-Models: Parameter p
E.g. time series $Y_t$ = 7, 8, 7, 6, 5, 4, 5, 6, 4, 5, ...
lag 1: pairs (7,8), (8,7), (7,6), (6,5), (5,4), (4,5), (5,6), (6,4), ... give $r_1 = .62$
lag 2: $r_2 = .32$
lag 3: $r_3 = .15$
The autocorrelations $r_k$ gathered at lags 1, 2, ... make up the autocorrelation function (ACF).
[Figure: ACF bar chart with confidence limits over lag numbers 1 to 16 for random independent observations: no significant autocorrelations]
An AR(1) process? Autocorrelation function: pattern in ACF.
[Figure: ACF decays over lags 1 to 16; PACF shows a single spike at lag 1; dashed confidence intervals]
AR(1) model: $Y_t = c + \phi_1 Y_{t-1} + e_t$ = ARIMA(1,0,0)
Autocorrelation function: pattern in ACF.
[Figure: ACF decays over lags 1 to 16; PACF shows spikes at lags 1 and 2]
AR(2) model: $Y_t = c + \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + e_t$ = ARIMA(2,0,0)
Autocorrelation function: pattern in ACF is a dampened sine.
[Figure: ACF follows a dampened sine wave; PACF cuts off after lag 2]
AR(2) model: $Y_t = c + \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + e_t$ = ARIMA(2,0,0)
Example: $Y_t = c + \phi_1 Y_{t-1} + e_t = 1.1 + 0.8\,Y_{t-1} + e_t$
[Figure: simulated realisation of this AR(1) process over ca. 50 periods]
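A minimal MATLAB sketch simulating this AR(1) example (starting at the process mean is an assumption of mine):

% Simulate Y_t = 1.1 + 0.8*Y_(t-1) + e_t over 50 periods
n = 50;
e = randn(n, 1);                   % white noise e_t
Y = zeros(n, 1);
Y(1) = 1.1 / (1 - 0.8);            % start at the process mean, 5.5
for t = 2:n
    Y(t) = 1.1 + 0.8 * Y(t-1) + e(t);
end
plot(Y)                            % one realisation of the AR(1) process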
Moving Average MA(q)-Models
ARIMA(0,0,q)-model = MA(q)-model: $Y_t = c + e_t - \theta_1 e_{t-1} - \theta_2 e_{t-2} - \dots - \theta_q e_{t-q}$
[Figure: ACF shows a single spike at lag 1; PACF decays over lags 1 to 16 (pattern in PACF); dashed confidence intervals]
MA(1) model: $Y_t = c + e_t - \theta_1 e_{t-1}$ = ARIMA(0,0,1)
[Figure: ACF with spikes at the first lags; PACF decays (pattern in PACF)]
MA(3) model: = ARIMA(0,0,3)
Example MA(1): $Y_t = c + e_t - \theta_1 e_{t-1} = 10 + e_t + 0.2\,e_{t-1}$
[Figure: simulated realisation of this MA(1) process over ca. 50 periods]
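And a matching MATLAB sketch for the MA(1) example:

% Simulate Y_t = 10 + e_t + 0.2*e_(t-1) over 50 periods
n = 50;
e = randn(n + 1, 1);               % one extra draw so that e_0 exists
Y = zeros(n, 1);
for t = 1:n
    Y(t) = 10 + e(t+1) + 0.2 * e(t);   % e(t+1) plays e_t, e(t) plays e_(t-1)
end
plot(Y)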
ARMA(1,1)-Model = ARIMA(1,0,1)-Model: $Y_t = c + \phi_1 Y_{t-1} + e_t - \theta_1 e_{t-1}$
Autocorrelation function: [Figure: both ACF and PACF decay gradually; dashed confidence intervals]
Seasonality in ARIMA-Models
Identifying seasonal data: spikes in the ACF / PACF at the seasonal lags.
Differences: Simple: $\Delta Y_t = (1-B)Y_t$; Seasonal: $\Delta_s Y_t = (1-B^s)Y_t$, with s = seasonality, e.g. 4, 12.
[Figure: ACF with upper and lower confidence limits]
Seasonality in ARIMA-Models
[Figure: ACF and PACF up to lag 34 with upper and lower confidence limits; seasonal spikes = monthly data]
Seasonality in ARIMA-Models: extension of the notation of the backshift operator
$\Delta_s Y_t = Y_t - Y_{t-s} = Y_t - B^s Y_t = (1-B^s)Y_t$
Seasonal difference followed by a first difference: $(1-B)(1-B^s)Y_t$
Seasonal ARIMA(1,1,1)(1,1,1)4-model:
$(1-\phi_1 B)(1-\Phi_1 B^4)(1-B)(1-B^4)\,Y_t = c + (1-\theta_1 B)(1-\Theta_1 B^4)\,e_t$
with the factors: non-seasonal AR(1), seasonal AR(1), non-seasonal difference, seasonal difference, non-seasonal MA(1), seasonal MA(1).
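Both differencing operators are simple index shifts; a minimal MATLAB sketch on a toy series of my own construction:

% Simple and seasonal differencing with s = 12 (monthly data)
y   = cumsum(randn(60, 1)) + 10 * repmat(sin(2*pi*(1:12)'/12), 5, 1);  % toy trended, seasonal series
d1  = y(2:end)  - y(1:end-1);     % (1-B)y_t: first difference
d12 = y(13:end) - y(1:end-12);    % (1-B^12)y_t: seasonal difference
dd  = d12(2:end) - d12(1:end-1);  % (1-B)(1-B^12)y_t: seasonal then first difference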
Forecasting Models
Time series analysis vs. causal modelling
Causal Prediction
ARX(p)-Models
Fresh products supermarket sales: seasonal, events, heteroscedastic noise; the real model is unknown.
BL(p,q) Bilinear Autoregressive Model: $y_t = 0.7\,y_{t-1}\,\varepsilon_{t-2} + \varepsilon_t$
TAR(p) Threshold Autoregressive Model: $y_t = 0.9\,y_{t-1} + \varepsilon_t$ for $|y_{t-1}| \le 1$; $y_t = -0.3\,y_{t-1} + \varepsilon_t$ for $|y_{t-1}| > 1$
Random walk: $y_t = y_{t-1} + \varepsilon_t$
NN are nonparametric.
[Figure: number of publications by business forecasting domain, 1987 to 2003, with linear trend, R² = 0.9036]
Application domains: eMail & URL filtering, virus scanning (Symantec Norton Antivirus), speech recognition & optical character recognition, meteorology / weather (rainfall prediction, El Niño effects), engineering, informatics, neurophysiology; business: marketing, finance, production, general & corporate business, sales forecasting, product sales, electrical load.
[Figure: forecasting methods in use, survey at 5 IBF conferences in 2001, 240 forecasters, 13 industries: judgemental (subjective) methods, regression, trend extrapolation, exponential smoothing, averages, decomposition, econometric models, autoregressive methods, analogies, Delphi, surveys, PERT and neural networks, with the share of replies per method]
Processing
Input: causal variables, image data (pixels/bits), fingerprints, chemical readouts -> Black Box -> Output: time series prediction, class memberships, dependent variables, class probabilities, principal components.
Input: nominal, interval or ratio scale (not ordinal); explanatory variables, dummy variables -> Black Box -> Output: ratio scale; regression, single period ahead or multiple periods ahead.
[Figure: time series of actual values vs. NN forecasted values; validate = test; PQ diagram]
Brains are the most efficient & complex computers known to date.
Human Brain vs. Computer (PCs):
- Processing speed: …
- Neurons / transistors: …
- Weight: 1500 grams vs. kilograms to tons!
- Energy consumption: 10^-16 Joule vs. 10^-6 Joule
- Computation (vision): 100 steps vs. billions of steps
History: Turing 1936; McCulloch / Pitts 1943; Hebb 1949; Minsky 1954; Dartmouth Project 1956; Rosenblatt 1959; Minsky / Papert 1969; Kohonen 1972; Werbos 1974; Rumelhart / Hinton / Williams 1986; 1st IJCNN 1987; 1st journals 1988; White 1988.
Industry: GE 1954; INTEL founded 1971; SAS Enterprise Miner 1997; IBM $70bn BI initiative 1998.
Eras: Neuroscience -> Applications in Engineering (Pattern Recognition & Control) -> Applications in Business (Forecasting).
$net_i = \sum_j w_{ij}\,o_j$, $a_i = f(net_i)$, $o_i = a_i$
Mathematics as abstract representations of reality:
$o_i = \tanh\left(\sum_j w_{ji}\,o_j - \theta_i\right)$
% Derive network dimensions and type from a stored network object
neural_net = eval(net_name);                              % fetch the network by its name
[num_rows, ins]  = size(neural_net.iw{1});                % input weight matrix: number of inputs
[outs, num_cols] = size(neural_net.lw{neural_net.numLayers, neural_net.numLayers-1});  % last layer weights: number of outputs
if strcmp(neural_net.adaptFcn, '')                        % no adaption function set: radial basis function net
    net_type = 'RBF';
else
    net_type = 'MLP';
end
fid = fopen(path, 'w');                                   % open the export file for writing
Alternative notations: information processing in neurons / nodes can be given as a Biological Representation, a Graphical Notation or a Mathematical Representation.
Inputs $in_1, in_2, \dots, in_j$ with weights $w_{i,1}, w_{i,2}, \dots, w_{i,j}$; $\theta_i$ = bias.
Input function: $net_i = \sum_j w_{ij}\,in_j$
Activation function: $a_i = f(net_i - \theta_i)$
Output function: $o_i = a_i$
Threshold unit: $y_i = 1$ if $\sum_j w_{ji}\,x_j - \theta_i \ge 0$, $y_i = 0$ if $\sum_j w_{ji}\,x_j - \theta_i < 0$
alternative: $y_i = \tanh\left(\sum_j w_{ji}\,x_j - \theta_i\right)$
Output: $out_i = 1$ if $\sum_j w_{ji}\,o_j - \theta_i \ge 0$, else $out_i = 0$
Input functions, e.g. $a_j = f_{act}(net_j - \theta_j)$:
inputs $o_1 = 2.2$, $o_2 = 4.0$, bias input $o_0 = 1.0$; weights $w_{1,i} = 0.71$, $w_{2,i} = -1.84$, $w_{0,i} = 9.01$; threshold $\theta = 8.0$
information processing: 2.2 * 0.71 + 4.0 * (-1.84) + 1.0 * 9.01 = 3.212; 3.212 - 8.0 = -4.778; -4.778 < 0, so the output is $o_j = 0.00$
with $net_i = \sum_j w_{ij}\,o_j$, $a_i = f(net_i)$ and $o_i = 1$ if $\sum_j w_{ji}\,o_j - \theta_i \ge 0$, $o_i = 0$ if $\sum_j w_{ji}\,o_j - \theta_i < 0$.
The same node with a nonlinear activation function, e.g. $a_j = f_{act}(net_j - \theta_j)$ with the hyperbolic tangent or the logistic function:
inputs and weights as above; information processing: 2.2 * 0.71 + 4.0 * (-1.84) + 1.0 * 9.01 = 3.212; 3.212 - 8.0 = -4.778
output: $o_j = \tanh(-4.778) = -0.9998$
with $net_i = \sum_j w_{ij}\,o_j$, $a_i = f(net_i)$, $o_i = \tanh\left(\sum_j w_{ji}\,o_j - \theta_i\right)$
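A one-line check of this worked example in MATLAB:

% One node's forward pass, reproducing the worked example above
o     = [2.2; 4.0; 1.0];           % inputs o1, o2 and the constant bias input o0 = 1.0
w     = [0.71; -1.84; 9.01];       % weights w1,i, w2,i, w0,i
theta = 8.0;                       % threshold
a     = w' * o - theta;            % 3.212 - 8.0 = -4.778
out_step = double(a >= 0);         % binary threshold unit: 0.00
out_tanh = tanh(a);                % hyperbolic tangent: saturates near -1 (about -0.9999)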
Single Linear Regression as a directed graph: inputs $x_1, x_2, \dots, x_n$ plus a constant feed one output node.
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n + \varepsilon$
Unidirectional information processing, with weights $w_i$ and bias $\theta_0$:
$\tanh\left(\sum_{i=1}^{N} x_i w_{ij} - \theta_j\right) = \dfrac{e^{\sum_{i=1}^{N} x_i w_{ij} - \theta_j} - e^{-\left(\sum_{i=1}^{N} x_i w_{ij} - \theta_j\right)}}{e^{\sum_{i=1}^{N} x_i w_{ij} - \theta_j} + e^{-\left(\sum_{i=1}^{N} x_i w_{ij} - \theta_j\right)}}$
[Figure: feedforward network: input nodes $u_1 \dots u_n$ take $y_t, y_{t-1}, y_{t-2}, \dots, y_{t-n-1}$, hidden nodes $u_{n+1} \dots u_{n+5}$ combine them, and the output node gives $y_{t+1}$]
Simplification for complex models! Combination of nodes: simple processing per node; the combination of simple nodes creates complex behaviour.
At every hidden and output node the same signal processing is applied to the incoming outputs $o_i$ and weights $w_{ij}$:
$\tanh\left(\sum_{i=1}^{N} o_i w_{ij} - \theta_j\right) = \dfrac{e^{\sum_{i=1}^{N} o_i w_{ij} - \theta_j} - e^{-\left(\sum_{i=1}^{N} o_i w_{ij} - \theta_j\right)}}{e^{\sum_{i=1}^{N} o_i w_{ij} - \theta_j} + e^{-\left(\sum_{i=1}^{N} o_i w_{ij} - \theta_j\right)}}$
[Figure: network with input layer (u1 to u4, inputs X3, X4, ...), hidden layers (u5 to u11), and output layer (u12 to u14, outputs o1, o2, o3), with weights such as w5,8 and w8,12]
input-layer, hidden-layers, output-layer = neural network
Across the layers, the output is a nesting of the node function, e.g.
$o_k = \tanh\left(\sum_j w_{kj}\,\tanh\left(\sum_i w_{ji}\,\tanh\left(\sum_l w_{il}\,o_l - \theta_l\right) - \theta_i\right) - \theta_j\right)$, fitted such that the error $\rightarrow$ Min!
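A minimal MATLAB sketch of this nested forward pass (layer sizes are illustrative, weights random):

% Forward pass through a fully connected MLP with two hidden layers
x  = randn(4, 1);                      % input layer: 4 inputs
W1 = randn(6, 4);  th1 = randn(6, 1);  % input -> hidden layer 1
W2 = randn(5, 6);  th2 = randn(5, 1);  % hidden 1 -> hidden 2
W3 = randn(3, 5);  th3 = randn(3, 1);  % hidden 2 -> output layer
h1 = tanh(W1 * x  - th1);              % each node: tanh(weighted sum - theta)
h2 = tanh(W2 * h1 - th2);
o  = tanh(W3 * h2 - th3);              % outputs o1, o2, o3: the nested tanh formula above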
Neural network vs. statistics terminology:
- input nodes: independent variables
- output node(s): dependent variable(s)
- training: parameterization
- weights: parameters
Network Training
Hebbian Learning: HEBB introduced the idea of learning by adapting weights in [0,1]:
$\Delta w_{ij} = \eta\,o_i\,a_j$
Delta-learning rule of Widrow-Hoff:
$\Delta w_{ij} = \eta\,o_i\,(t_j - a_j) = \eta\,o_i\,(t_j - o_j) = \eta\,o_i\,\delta_j$
[Figure: network with input vector (x1, x2, x3), nodes 1 to 10 connected through a 10 x 10 weight matrix (entries such as w3,8, w3,10, w4,9), output vector o, teaching output t and error E = o - t]
Backpropagation of the error through the generalized delta rule: $\Delta w_{ij} = \eta\,o_i\,\delta_j$, with
$\delta_j = f_j'(net_j)\,(t_j - o_j)$ for output nodes $j$
$\delta_j = f_j'(net_j)\,\sum_k \delta_k w_{jk}$ for hidden nodes $j$
Derivation: $\delta_{pj} = -\dfrac{\partial C(t_{pj}, o_{pj})}{\partial net_{pj}} = -\dfrac{\partial C(t_{pj}, o_{pj})}{\partial o_{pj}} \cdot \dfrac{\partial o_{pj}}{\partial net_{pj}}$, with $\dfrac{\partial o_{pj}}{\partial net_{pj}} = f_j'(net_{pj})$.
For hidden units the chain rule runs over all successor nodes $k$ with $net_{pk} = \sum_i w_{ki}\,o_{pi}$:
$\dfrac{\partial C(t_{pj}, o_{pj})}{\partial o_{pj}} = \sum_k \dfrac{\partial C(t_{pj}, o_{pj})}{\partial net_{pk}} \cdot \dfrac{\partial net_{pk}}{\partial o_{pj}} = -\sum_k \delta_{pk}\,w_{kj}$
$\delta_{pj} = \begin{cases} -\dfrac{\partial C(t_{pj}, o_{pj})}{\partial o_{pj}}\,f_j'(net_{pj}) & \text{if unit } j \text{ is in the output layer} \\ f_j'(net_{pj})\,\sum_k \delta_{pk}\,w_{jk} & \text{if unit } j \text{ is in a hidden layer} \end{cases}$
With the logistic activation function $f(net_j) = \dfrac{1}{1 + e^{-\sum_i o_i(t)\,w_{ij}}}$ the derivative is $f'(net_j) = o_j\,(1 - o_j)$, so
$\delta_j = o_j\,(1 - o_j)\,(t_j - o_j)$ for output nodes $j$
$\delta_j = o_j\,(1 - o_j)\,\sum_k \delta_k w_{jk}$ for hidden nodes $j$
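One full update step of this rule for a small one-hidden-layer network, as a minimal MATLAB sketch (learning rate, sizes and data are illustrative):

% One backpropagation step with logistic units (biases omitted for brevity)
eta = 0.1;                                 % learning rate
x   = [0.5; 0.2];  t = 0.9;                % one training pattern and its teaching output
W1  = randn(3, 2); W2 = randn(1, 3);       % hidden and output weights
sig = @(z) 1 ./ (1 + exp(-z));             % logistic activation function

h = sig(W1 * x);                           % forward pass: hidden outputs o_j
o = sig(W2 * h);                           % network output o_k

d_out = o .* (1 - o) .* (t - o);           % delta_j = oj(1-oj)(tj-oj) at the output node
d_hid = h .* (1 - h) .* (W2' * d_out);     % delta_j = oj(1-oj) * sum_k delta_k w_jk, hidden nodes

W2 = W2 + eta * d_out * h';                % delta rule: dw_ij = eta * o_i * delta_j
W1 = W1 + eta * d_hid * x';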
[Figure: error surface over the weights $w_j$, like a valley in the mountains: random starting point 1 and random starting point 2 run into different local minima, while the GLOBAL minimum lies elsewhere]
Local search: fixed step size, following the steepest descent.
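The search itself is plain gradient descent with a fixed step size; a minimal one-dimensional MATLAB sketch (the error function is chosen by me to have several minima):

% Steepest descent with a fixed step size on a toy error surface
E  = @(w) w.^4 - 3*w.^2 + w;           % toy error function with two minima
dE = @(w) 4*w.^3 - 6*w + 1;            % its gradient
w    = 2;                              % random starting point
step = 0.01;                           % fixed step size
for it = 1:1000
    w = w - step * dE(w);              % move against the gradient
end
% different starting points end in different (local) minima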
$y_{t+h} = f(\mathbf{x}_t) + \varepsilon_{t+h}$
with $y_{t+h}$ = forecast for t+h; $f(\cdot)$ = linear / non-linear function; $\mathbf{x}_t$ = vector of observations in t; $\varepsilon_{t+h}$ = independent error term in t+h.
Feedforward NN (MLP etc.): nonlinear AR(p), or a hierarchy of nonlinear AR(p).
Recurrent NN (Elman, Jordan): nonlinear ARMA(p,q).
$y_{t+1} = f(y_t, y_{t-1}, y_{t-2}, \dots, y_{t-n-1})$
Training loop: present a new data pattern to the neural network; calculate the neural network output from the input values (forward pass); compare the neural network forecast against the actual value; backpropagation: change the weights to reduce the output forecast error.
[Figure: inputs $y_t, y_{t-1}, y_{t-2}, \dots, y_{t-n-1}$ flow forward through hidden nodes to the forecast $y_{t+1}$; the error is propagated back]
Extensions
$y_{t+1} = f(y_t, y_{t-1}, y_{t-2}, \dots, y_{t-n-1}) = y_t w_{tj} + y_{t-1} w_{t-1,j} + y_{t-2} w_{t-2,j} + \dots + y_{t-n-1} w_{t-n-1,j} - \theta_j$
= linear autoregressive AR(p)-model
$y_{t+1} = \tanh\left(\sum_{i=t-n-1}^{t} y_i w_{ij} - \theta_j\right)$
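Before either variant can be trained, the series has to be cut into sliding windows; a minimal MATLAB sketch (window length n = 4 is illustrative):

% Build the sliding-window input matrix for y_(t+1) = f(y_t, ..., y_(t-n+1))
y = sin((1:100)'/6) + 0.1 * randn(100, 1);   % toy time series
n = 4;                                        % number of lagged inputs
X = zeros(100 - n, n);  T = zeros(100 - n, 1);
for t = n:99
    X(t-n+1, :) = y(t:-1:t-n+1)';            % inputs y_t, y_(t-1), ..., y_(t-n+1)
    T(t-n+1)    = y(t+1);                    % target y_(t+1)
end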
Extensions: a linear activation function in the output layer.
Similar problems / shortcomings as standard AR-models!
$y_{t+1} = f(y_t, y_{t-1}, y_{t-2}, \dots, y_{t-n-1})$
Extensions: multiple output nodes = simultaneous autoregression models.
$y_{t+1}, y_{t+2}, \dots, y_{t+n} = f(y_t, y_{t-1}, y_{t-2}, \dots, y_{t-n-1})$
Nonlinear autoregressive AR(p)-model.
Extensions: additional explanatory variables, e.g. Max Temperature, Rainfall, Sunshine Hours.
$y_{t+1}, y_{t+2}, \dots, y_{t+n} = f(y_t, y_{t-1}, y_{t-2}, \dots, y_{t-n-1})$ plus explanatory inputs: nonlinear autoregressive ARX(p)-model.
$y = f(x_1, x_2, x_3, \dots, x_n)$ with $y = x_1 w_{1j} + x_2 w_{2j} + x_3 w_{3j} + \dots + x_n w_{nj} - \theta_j$: Linear Regression Model.
With nonlinearity in the output layer, e.g. $y = \mathrm{logistic}\left(\sum_i x_i w_{ij} - \theta_j\right)$: Nonlinear Regression Model.
Problem! The MLP is the most common NN architecture used, BUT: MLPs with a sliding window can ONLY capture …
Simulation of Neural Network prediction of Artificial Time Series
NN Modelling Process: 1. Preprocessing; 2. Modelling NN Architecture; 3. Training; 4. Evaluation.
Preprocessing: transformation, scaling, normalizing to [0;1] or [-1;1]; then modelling of the NN architecture and training. The manual decisions require expert knowledge:
- D = [DSE, DSA]: dataset: selection, sampling
- P = [C, ...]: preprocessing: correction, normalization, scaling
- A = [NI, NL, NS, NO, IN]: architecture: no. of input nodes, no. of hidden layers, no. of hidden nodes, no. of output nodes, connectivity / weight matrix
- U = [FI, FA, FO]: signal processing: input function, activation function, output function
- L = [G, PT,L, IP]: learning: choice of algorithm, learning parameters per phase & layer, initializations (procedure, number of initializations)
- O: objective function, stopping method & parameters, activation strategy
Simulation Experiments
Data Preprocessing
Data Transformation & Data Coding
[Figure: scatter of X1 = hair length vs. X2 = income; raw income scale 0 to 100,000 rescaled to 0.0 to 1.0]
Linear scaling: $y = \dfrac{x - \min(x)}{\max(x) - \min(x)}$, e.g. with $\min(x) = 0$.
Scaling to the target interval [-1;1]: $y = -1 + 2\,\dfrac{x - \min(x)}{\max(x) - \min(x)}$.
Standardisation / Normalisation: match the target interval to the activation function, logistic [0;1] or tanh [-1;1].
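Both scalings in MATLAB (the raw values are illustrative):

% Linear scaling into [0;1] and [-1;1]
x   = [12; 45; 80; 119.95];                          % illustrative raw values
x01 = (x - min(x)) ./ (max(x) - min(x));             % [0;1], for logistic units
x11 = -1 + 2 * (x - min(x)) ./ (max(x) - min(x));    % [-1;1], for tanh units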
Extreme values, coding errors and data errors distort the scaling. Actions: removal of observations; transform the data (functional transformation of values), i.e. linearization or normalisation.
Coding of nominal variables, 1-of-N: 1 0 0 = Business Press, ..., 0 0 1 = Woman. Recode as 1-of-(N-1) coding with 2 new bit variables: 1 0 = Business Press, 0 0 = Woman.
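The recoding in MATLAB (the class labels are illustrative):

% 1-of-N vs. 1-of-(N-1) coding of a nominal variable with 3 classes
class      = [1; 3; 2];            % e.g. 1 = Business Press, ..., 3 = Woman
I          = eye(3);
one_of_N   = I(class, :);          % 1-of-N: one bit per class
one_of_Nm1 = one_of_N(:, 1:2);     % 1-of-(N-1): drop the redundant bit, third class becomes 0 0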
Missing values: a missing feature value for an instance; some methods interpret it as 0! Others create a special class for missing values.
[Figure: scatter of X1 = hair length vs. X2 = income with missing entries]
Solutions: …
Simulation Experiments: combinations of hidden / output activation functions: Sig-Id; Sig-Sig (bounded & additional nonlinear layer); TanH-Id; TanH-TanH (bounded & additional nonlinear layer); interconnection of nodes.
Training
[Figure: MAE of the best and worst network found, plotted against the number of random initialisations (1, 10, 25, 50, 100, 200, 400, 800, 1600): lowest error vs. highest error]
Simulation Experiments
Experimental Results: experiments ranked by validation error

Rank by valid-error | Train | Validation | Test | ANN ID
overall lowest | 0.009207 | 0.011455 | 0.017760 |
overall highest | 0.155513 | 0.146016 | 0.398628 |
1st | 0.010850 | 0.011455 | 0.043413 | 39 (3579)
2nd | 0.009732 | 0.012093 | 0.023367 | 10 (5873)
25th | 0.009632 | 0.013650 | 0.025886 | 8 (919)
14400th | 0.014504 | 0.146016 | 0.398628 | 33 (12226)
[Figure: train, validation and test errors of the networks, one marker per network, with percentile bands for train, validation and test]
Error correlations across the 14400 ANNs (Train - Validate, Validate - Test, Train - Test): 0.7786**, 0.9750**, 0.7686**; 0.2652**, 0.0917**, 0.4204**; 0.2067**, 0.1276**, 0.4004** (**: significant).
[Figure: train, validation and test errors of the networks ordered by train error and by validation error, with 20-ANN and 150-ANN moving averages: decreasing correlation and high variance on the test error; the same results ordered by training & test error]
$e_t = y_t - F_t$
$ME = \frac{1}{n}\sum_{k=1}^{n}\left(Y_{t+k} - F_{t+k}\right)$
$MSE = \frac{1}{n}\sum_{k=1}^{n}\left(Y_{t+k} - F_{t+k}\right)^2$
$MAE = \frac{1}{n}\sum_{k=1}^{n}\left|Y_{t+k} - F_{t+k}\right|$
$MAPE = \frac{1}{n}\sum_{k=1}^{n}\left|\frac{Y_{t+k} - F_{t+k}}{Y_{t+k}}\right|$
Rolling-origin versions over origins $t = T, \dots, T+n-k$:
$MAE(k) = \frac{1}{n-k+1}\sum_{t=T}^{T+n-k}\left|Y_{t+k} - F_t(k)\right|$, $MAPE(k) = \frac{1}{n-k+1}\sum_{t=T}^{T+n-k}\left|\frac{Y_{t+k} - F_t(k)}{Y_{t+k}}\right|$
Median Absolute Percentage Error: $MdAPE_f = \mathrm{Med}\left(\left|\frac{e_{f,t}}{y_t}\right| \cdot 100\right)$
Median Squared Error: $MdSE_f = \mathrm{Med}\left(e_{f,t}^2\right)$
Theil's U, against the naive forecast $\hat{y}_{t+f|t} = y_t$:
$U = \sqrt{\dfrac{\sum_t\left(\hat{y}_{t+f} - y_{t+f}\right)^2}{\sum_t\left(y_t - y_{t+f}\right)^2}}$
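The point measures in MATLAB (toy actuals and forecasts of my own):

% Common forecast error measures for actuals Y and forecasts F
Y = [100; 104; 98; 103; 107];       % actual values (illustrative)
F = [ 99; 105; 99; 101; 108];       % forecasts (illustrative)
e = Y - F;                           % e_t = y_t - F_t
ME    = mean(e);                     % mean error (bias)
MSE   = mean(e.^2);                  % mean squared error
MAE   = mean(abs(e));                % mean absolute error
MAPE  = mean(abs(e ./ Y));           % mean absolute percentage error
MdAPE = median(abs(e ./ Y)) * 100;   % median absolute percentage error
MdSE  = median(e.^2);                % median squared error
% Theil's U against the naive no-change forecast y_(t+1) = y_t:
U = sqrt(sum((F(2:end) - Y(2:end)).^2) / sum((Y(1:end-1) - Y(2:end)).^2));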
Simulation Experiments
Use the training & validation sets for training & model selection.
NEVER use the test data, except for the final evaluation of accuracy!
Results M3-competition
[Table: baseline sales of 90 to 110 units with forecasts of Method A and Method B for t+1, t+2, t+3; the errors of 10, 20, 30 and 50 units determine which method (A or B) is preferred]
Software Simulators
High End / Midprice: Alyuda NeuroSolutions, NeuroShell Predictor, NeuroSolutions, NeuralPower, PredictorPro.
Research oriented: SNNS, JNNS (JavaSNNS), JOONE, Matlab Library, R-package, NeuroLab.
Benchmark data: M3-competition, airline-data, lynx-data, beer-data.
Vendors & products: Ward Systems ("Let your systems learn the wisdom of age and experience"), Attrasoft Inc., Promised Land, NeuroDimension Inc., Easy NN, Easy NN Plus, NeuroSolutions Consultant, NeuroSolutions for Excel, NeuroSolutions for Matlab, Trading Solutions, SPSS, SAS.
Further Information: literature & websites, journals, associations, conferences, newsgroups (news.comp.ai.nn), call experts (you know me ;-)
Agenda: Business Forecasting with Artificial Neural Networks
1. Process of NN Modelling
2. Tips & Tricks for Improving Neural Network based forecasts
   a. Copper Price Forecasting
   b. Questions & Answers and Discussion
   a. Advantages & Disadvantages of Neural Networks
   b. Discussion
Disadvantages
Sven F. Crone
crone@bis-lab.de
www.lums.lancs.ac.uk
Summary Day I
- ANN can forecast any time series pattern (t+1!)
- without preprocessing
- no model selection needed!
- ANN offer many degrees of freedom in modelling; experience is essential!
- Research is not consistent
Complimentary support: support through MBA master thesis in mutual projects.
Contact Information: Sven F. Crone, Research Associate