Abhishek Rohatgi
Machine Learning Methods for Power Markets
Master Thesis
PSL 1217
EEH Power Systems Laboratory
Swiss Federal Institute of Technology (ETH) Zurich
Examiner: Prof. Dr. Goran Andersson
Supervisor: Marcus Hildmann
Zurich, September 17, 2012
Do not believe in anything simply because you have heard it.
Do not believe in anything simply because it is spoken and rumored by many.
Do not believe in anything simply because it is found written in your religious
books.
Do not believe in anything merely on the authority of your teachers and
elders.
Do not believe in traditions because they have been handed down for many
generations.
But after observation and analysis, when you find that anything agrees with
reason and is conducive to the good and benefit of one and all, then accept
it and live up to it.
-Gautama Buddha
Preface
This report is a result of my master thesis carried out at the Power Systems Laboratory (PSL), ETH Zürich, from March 2012 to August 2012. Using this opportunity, I would like to thank several people who gave me full support during my days at PSL. I would like to thank Marcus Hildmann for his support, useful ideas and comments on my work. I am thankful to Prof. Goran Andersson for giving me the opportunity for the project work at PSL. Last but not least, I am grateful to my colleagues at PSL for the welcoming atmosphere and fruitful discussions.

Abhishek Rohatgi
Zürich, September 17, 2012
Abstract
To manage risks in electricity markets, forecasting of market variables like the spot price, load demand and the Hourly Price Forward Curve (HPFC) is an important research area. Forecasting using linear estimation methods suffers from the problems of under-fitting and over-fitting. Ordinary Least Squares (OLS), a popular linear estimation method, estimates the mean of the data. Profile forecasting of time series needs non-linear estimation methods. In this thesis, Support Vector Machines (SVM) and Extreme Learning Machines (ELM) are used for the estimation of time series. The thesis consists of two parts. In the first part, the SVM and ELM algorithms are presented and simulated for forecasting of the spot price time series. In the second half of the thesis, the problem of constrained estimation is explained and two methods based on SVM and ELM theory are suggested.

The Support Vector Machine is a machine learning algorithm used for function estimation. It converts the non-linear function estimation problem into a convex optimization problem by using functions called kernels. The Extreme Learning Machine is a learning algorithm for Single Layer Feedforward Neural Networks; it estimates the function using the Moore-Penrose generalized inverse matrix of the activation function of the neural network.

A case study of spot price forecasting for Germany is presented using both learning algorithms. After identifying the characteristics of the spot price time series, a Non-Linear Autoregressive model with Exogenous inputs (NARX) is proposed to capture the dynamics of spot prices. To simulate the spot prices, a computationally simple version of SVM called the Least Squares Support Vector Machine (LSSVM) is used. 1-day ahead, 3-day ahead and 5-day ahead forecasting is simulated for different lags of the spot price time series. LSSVM performs better than ELM for out-of-sample forecasting. However, the parameters of LSSVM need to be tuned before training, and the tuning process is computationally intensive; hence, ELM is much faster than LSSVM. Additionally, the out-of-sample residuals are analyzed for autocorrelation to establish the validity of the model.

Constrained estimation means solving the model subject to external constraints. It is required for the estimation of time series like the HPFC, PV in-feed etc. A proposal is made to include the external constraints in the SVM and ELM theory. The proposed SVM and ELM are applied to a case study of photovoltaic in-feed forecasting. Results are presented for a few test constraints. Both SVM and ELM produce good results for constrained estimation. Finally, the thesis is concluded with a discussion of future work on the application of SVM and ELM for time series analysis and constrained estimation.
Kurzfassung

To manage the risks in the electricity markets, the forecasting of variables such as the spot price, the loads and the Hourly Price Forward Curve (HPFC) is an important research area. Forecasting methods based on linear estimation suffer from the problems of under-fitting and over-fitting.
$$R_{emp} = \frac{1}{N} \sum_{k=1}^{N} \left| y_k - w^T x_k - b \right|_{\varepsilon} \qquad (3.2)$$
The cost function $R_{emp}$ is based on Vapnik's $\varepsilon$-insensitive loss function, which is defined as [2]:

$$\lvert y - f(x) \rvert_{\varepsilon} = \begin{cases} 0, & \text{if } \lvert y - f(x) \rvert \le \varepsilon \\ \lvert y - f(x) \rvert - \varepsilon, & \text{otherwise} \end{cases} \qquad (3.3)$$
The variable $\varepsilon$ controls the accuracy and is predefined. The regression problem in eq.(3.1) is estimated using the following optimization problem [2]:

$$\begin{aligned} \min_{w,b}\; J_p(w) = \tfrac{1}{2} w^T w \quad \text{such that} \quad & y_k - w^T x_k - b \le \varepsilon, && k = 1, \dots, N \\ & w^T x_k + b - y_k \le \varepsilon, && k = 1, \dots, N \end{aligned} \qquad (3.4)$$
The inequalities in eq.(3.4) mean that the training data lie inside the $\varepsilon$-tube of accuracy. However, it is possible that the training data might lie outside the accuracy region, and hence eq.(3.4) is modified to include two slack variables $\xi, \xi^*$ as follows [2]:

$$\begin{aligned} \min_{w,b,\xi,\xi^*}\; J_p(w,\xi,\xi^*) = \tfrac{1}{2} w^T w + c \sum_{k=1}^{N} (\xi_k + \xi_k^*) \quad \text{such that} \quad & y_k - w^T x_k - b \le \varepsilon + \xi_k, && k = 1, \dots, N \\ & w^T x_k + b - y_k \le \varepsilon + \xi_k^*, && k = 1, \dots, N \\ & \xi_k, \xi_k^* \ge 0, && k = 1, \dots, N \end{aligned} \qquad (3.5)$$
where $c$ is a regularization constant. The Lagrangian of eq.(3.5) is

$$\begin{aligned} L(w,b,\xi,\xi^*;\alpha,\alpha^*,\eta,\eta^*) = {} & \tfrac{1}{2} w^T w + c \sum_{k=1}^{N} (\xi_k + \xi_k^*) - \sum_{k=1}^{N} \alpha_k (\varepsilon + \xi_k - y_k + w^T x_k + b) \\ & - \sum_{k=1}^{N} \alpha_k^* (\varepsilon + \xi_k^* + y_k - w^T x_k - b) - \sum_{k=1}^{N} (\eta_k \xi_k + \eta_k^* \xi_k^*) \end{aligned} \qquad (3.6)$$
The Karush-Kuhn-Tucker (KKT) conditions of optimality [19] give the following equations [2]:

$$\begin{aligned} & y_k - w^T x_k - b \le \varepsilon + \xi_k, && k = 1, \dots, N \\ & w^T x_k + b - y_k \le \varepsilon + \xi_k^*, && k = 1, \dots, N \\ & \alpha_k, \alpha_k^*, \eta_k, \eta_k^* \ge 0, && k = 1, \dots, N \\ & \alpha_k (\varepsilon + \xi_k - y_k + w^T x_k + b) = 0, && k = 1, \dots, N \\ & \alpha_k^* (\varepsilon + \xi_k^* + y_k - w^T x_k - b) = 0, && k = 1, \dots, N \\ & \eta_k \xi_k = 0, && k = 1, \dots, N \\ & \eta_k^* \xi_k^* = 0, && k = 1, \dots, N \\ & \frac{\partial L}{\partial w} = 0 \;\Rightarrow\; w = \sum_{k=1}^{N} (\alpha_k - \alpha_k^*)\, x_k \\ & \frac{\partial L}{\partial b} = 0 \;\Rightarrow\; \sum_{k=1}^{N} (-\alpha_k + \alpha_k^*) = 0 \\ & \frac{\partial L}{\partial \xi_k} = 0 \;\Rightarrow\; c - \alpha_k - \eta_k = 0 \\ & \frac{\partial L}{\partial \xi_k^*} = 0 \;\Rightarrow\; c - \alpha_k^* - \eta_k^* = 0 \end{aligned} \qquad (3.7)$$
Eq.(3.7) and eq.(3.6) together give the following dual problem:

$$\begin{aligned} \max_{\alpha,\alpha^*}\; J_d = {} & -\tfrac{1}{2} \sum_{k,l=1}^{N} (\alpha_k - \alpha_k^*)(\alpha_l - \alpha_l^*)\, x_k^T x_l - \varepsilon \sum_{k=1}^{N} (\alpha_k + \alpha_k^*) + \sum_{k=1}^{N} y_k (\alpha_k - \alpha_k^*) \\ \text{such that} \quad & \sum_{k=1}^{N} (\alpha_k - \alpha_k^*) = 0, \qquad \alpha_k, \alpha_k^* \in [0, c] \end{aligned} \qquad (3.8)$$
Using the value of $w$ from eq.(3.7) in terms of $\alpha$ and $\alpha^*$, the estimated function from eq.(3.1) can be written as follows:

$$f(x) = \sum_{k=1}^{N} (\alpha_k - \alpha_k^*)\, x_k^T x + b$$
3.1.2 Nonlinear Extension of SVM

For extending the linear SVM to non-linear systems, the regression problem is written as follows in the primal space:

$$f(x) = w^T \varphi(x) + b \qquad (3.9)$$
The training data is $\{x_k, y_k\}_{k=1}^{N}$ and $\varphi(\cdot): \mathbb{R}^n \to \mathbb{R}^{n_h}$ is a mapping from the input space to a high dimensional feature space. The optimization problem in the primal space is [2]:
$$\begin{aligned} \min_{w,b,\xi,\xi^*}\; J_p(w,\xi,\xi^*) = \tfrac{1}{2} w^T w + c \sum_{k=1}^{N} (\xi_k + \xi_k^*) \quad \text{such that} \quad & y_k - w^T \varphi(x_k) - b \le \varepsilon + \xi_k, && k = 1, \dots, N \\ & w^T \varphi(x_k) + b - y_k \le \varepsilon + \xi_k^*, && k = 1, \dots, N \\ & \xi_k, \xi_k^* \ge 0, && k = 1, \dots, N \end{aligned} \qquad (3.10)$$
After taking the Lagrangian and applying the conditions of optimality [19], the problem can be written in the dual space as [2]:

$$\begin{aligned} \max_{\alpha,\alpha^*}\; J_D(\alpha,\alpha^*) = {} & -\tfrac{1}{2} \sum_{k,l=1}^{N} (\alpha_k - \alpha_k^*)(\alpha_l - \alpha_l^*)\, K(x_k, x_l) - \varepsilon \sum_{k=1}^{N} (\alpha_k + \alpha_k^*) + \sum_{k=1}^{N} y_k (\alpha_k - \alpha_k^*) \\ \text{such that} \quad & \sum_{k=1}^{N} (\alpha_k - \alpha_k^*) = 0, \qquad \alpha_k, \alpha_k^* \in [0, c] \end{aligned} \qquad (3.11)$$
Here $K$ is the kernel, defined as $K(x_k, x_l) = \varphi(x_k)^T \varphi(x_l)$. During the transformation of the optimization problem from the primal to the dual (eq.(3.10) to eq.(3.11)), the non-linear effects are moved to the kernel, and eq.(3.11) becomes a convex optimization problem. This is also explained in fig. 3.1. The estimated function can then be written as

$$f(x) = \sum_{k=1}^{N} (\alpha_k - \alpha_k^*)\, K(x, x_k) + b \qquad (3.12)$$

In eq.(3.12), the output is written only in terms of the Lagrange multipliers and the kernel function. Hence, for estimation problems, one does not need to know the underlying feature space $\varphi(x)$. This is explained in more detail in the following section on kernels.
3.1.3 Kernels

Kernels are a class of functions extensively used in statistics and probability theory. A kernel function $K$ maps $\mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$ [2]. The advantage of kernel functions is that they can be used to avoid the explicit construction of the feature space $\varphi(x)$ required for the non-linear SVMs.

Figure 3.1: Primal and dual optimization problem of SVM

Fig. 3.1 shows that the kernel functions remove the non-linear constraints from the primal optimization problem. Any symmetric continuous function $K(x, z)$ that satisfies the Mercer condition [2] can be expressed as
$$K(x, z) = \sum_{i=1}^{n_H} \lambda_i\, \varphi_i(x)\, \varphi_i(z) \qquad (3.13)$$

where $\varphi(x)$ is a mapping from $\mathbb{R}^n$ to the Hilbert space $\mathcal{H}$, $\lambda_i$ is a positive number, $x, z \in \mathbb{R}^n$ and $n_H$ is the dimension of the Hilbert space. Eq.(3.13) can be written as:
$$K(x, z) = \sum_{i=1}^{n_H} \left( \sqrt{\lambda_i}\, \varphi_i(x) \right) \left( \sqrt{\lambda_i}\, \varphi_i(z) \right)$$

and then, if we define $\Phi_i(x) = \sqrt{\lambda_i}\, \varphi_i(x)$ and $\Phi_i(z) = \sqrt{\lambda_i}\, \varphi_i(z)$, this leads to

$$K(x, z) = \Phi(x)^T \Phi(z)$$
For example, if $\varphi(x): \mathbb{R} \to \mathbb{R}^3$ is defined as $\varphi(x) = \begin{bmatrix} x^2 & \sqrt{2}x & 1 \end{bmatrix}^T$ [10], then

$$\varphi(x)^T \varphi(z) = \begin{bmatrix} x^2 & \sqrt{2}x & 1 \end{bmatrix} \begin{bmatrix} z^2 \\ \sqrt{2}z \\ 1 \end{bmatrix} = x^2 z^2 + 2xz + 1 = (xz + 1)^2 \qquad (3.14)$$
This can be represented by polynomial kernels, given by

$$K(x, z) = (xz + c)^d \qquad (3.15)$$

with $c = 1$ and $d = 2$. In general, polynomial kernels can be used to represent any feature map consisting of all possible product monomials of $x$ up to degree $d$, having dimension $n_H = \binom{n+d}{n}$. So, by defining a polynomial kernel, there is no need to explicitly define the high dimensional feature space $\varphi(x)$. Different types of kernel functions exist for application to non-linear systems. A Gaussian kernel is defined as

$$K(x, z) = \exp\!\left( -\frac{\lVert x - z \rVert^2}{\sigma^2} \right) \qquad (3.16)$$

where $\sigma$ is a tuning parameter. For a Gaussian kernel, $\varphi(x)$ is infinite dimensional [10]. In this thesis, the Gaussian kernel is used.
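As a quick numerical check of eqs.(3.14)-(3.16), the following MATLAB sketch verifies the polynomial kernel identity and builds a Gaussian kernel matrix. It is illustrative only (the variable names are not from the thesis code) and assumes MATLAB R2016b or newer for the implicit expansion in the distance computation.

% Sketch: polynomial kernel identity of eq.(3.14) and Gaussian kernel of eq.(3.16).
x = 0.7; z = -1.3;
phi = @(u) [u^2; sqrt(2)*u; 1];              % explicit feature map phi: R -> R^3
lhs = phi(x)' * phi(z);                      % phi(x)^T phi(z)
rhs = (x*z + 1)^2;                           % polynomial kernel (xz + c)^d, c = 1, d = 2
fprintf('feature map: %.6f, kernel: %.6f\n', lhs, rhs);   % the two values coincide

% Gaussian kernel matrix K(x_k, x_l) = exp(-||x_k - x_l||^2 / sigma^2)
X = randn(10, 2);                            % 10 sample points in R^2
sigma = 1.5;
D2 = sum(X.^2, 2) + sum(X.^2, 2)' - 2*(X*X');   % pairwise squared distances
K = exp(-D2 / sigma^2);                      % 10x10 symmetric kernel matrix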
3.1.4 Least Squares Support Vector Machines

LSSVM is a method to reduce the computational effort required to solve the QP problems of the SVM. The optimization problem of the SVM contains inequality constraints. By removing all the inequality constraints and substituting them by equality constraints as shown below, it is possible to reduce the computational effort, since the dual problem is reduced to a system of linear equations.

$$\min_{w,b,e}\; J_p(w, e) = \tfrac{1}{2} w^T w + \gamma\, \tfrac{1}{2} \sum_{k=1}^{N} e_k^2 \quad \text{such that} \quad y_k - w^T \varphi(x_k) - b = e_k, \quad k = 1, \dots, N \qquad (3.17)$$
The Lagrangian of the primal problem of LSSVM in eq.(3.17) is

$$L(w, b, e; \alpha) = \tfrac{1}{2} w^T w + \gamma\, \tfrac{1}{2} \sum_{k=1}^{N} e_k^2 - \sum_{k=1}^{N} \alpha_k \left( w^T \varphi(x_k) + b + e_k - y_k \right) \qquad (3.18)$$

where $\alpha_k$ are the Lagrange multipliers.
After applying the conditions of optimality [19], the dual problem of eq.(3.17) is obtained as a system of linear equations in $\alpha$ and $b$ [2]:

$$\begin{bmatrix} 0 & 1_v^T \\ 1_v & \Omega + I/\gamma \end{bmatrix} \begin{bmatrix} b \\ \alpha \end{bmatrix} = \begin{bmatrix} 0 \\ y \end{bmatrix} \qquad (3.19)$$
where $y = [y_1; \dots; y_N]$, $1_v = [1; \dots; 1]$, $\alpha = [\alpha_1; \dots; \alpha_N]$ and $\Omega_{kl} = \varphi(x_k)^T \varphi(x_l) = K(x_k, x_l)$, $K$ being the kernel. The estimated function is

$$y(x) = \sum_{k=1}^{N} \alpha_k\, K(x, x_k) + b \qquad (3.20)$$

In LSSVM, there are no QP problems to solve. By using linear solvers (which are faster than convex optimization solvers), the computational speed is increased several times.
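A minimal MATLAB sketch of eqs.(3.19)-(3.20) is given below. It is not the thesis implementation (which relies on CVX); the function names are hypothetical, a Gaussian kernel is assumed, and the parameters (gamma, sigma) are assumed to be tuned already (Section 3.1.5).

% Sketch: LSSVM training (eq.(3.19)) and prediction (eq.(3.20)).
function [alpha, b] = lssvm_train(X, y, gamma, sigma)
    N = size(X, 1);
    Omega = gauss_kernel(X, X, sigma);           % Omega_kl = K(x_k, x_l)
    A = [0, ones(1, N); ones(N, 1), Omega + eye(N)/gamma];
    sol = A \ [0; y];                            % one linear solve instead of a QP
    b = sol(1); alpha = sol(2:end);
end

function yhat = lssvm_predict(Xnew, X, alpha, b, sigma)
    yhat = gauss_kernel(Xnew, X, sigma) * alpha + b;   % y(x) = sum_k alpha_k K(x,x_k) + b
end

function K = gauss_kernel(A, B, sigma)
    D2 = sum(A.^2, 2) + sum(B.^2, 2)' - 2*(A*B');      % pairwise squared distances
    K = exp(-D2 / sigma^2);                            % Gaussian kernel, eq.(3.16)
end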
3.1.5 Tuning parameters

The performance of LSSVM depends on the choice of the regularization parameter $\gamma$ in eq.(3.17) and any other parameters used ($c$ and $d$ for the polynomial kernel (eq.(3.15)) or $\sigma$ for the Gaussian kernel (eq.(3.16))). Since the Gaussian kernel is used in this thesis, tuning of parameters refers to tuning of $(\gamma, \sigma)$ unless otherwise stated. The most popular techniques for tuning the parameters are cross-validation and Bayesian inference. Cross-validation is based on selecting parameters after evaluating the performance of a pre-defined grid of parameters on the training data. In Bayesian inference, the parameters are assumed to have a certain probability density function. For the determination of the tuning parameters in this project, the cross-validation technique is used. The algorithm of m-fold cross-validation is outlined in Algorithm 1 [10]:
input : Training data $T = \{(x_k, y_k)\}_{k=1}^{N}$
output: Tuned parameters $(\gamma, \sigma)$
begin
    Divide $T$ into $m$ parts $T_1, \dots, T_m$ such that $T = \bigcup_{k=1}^{m} T_k$;
    Define an $N_1 \times N_2$ grid of $\gamma$ and $\sigma$;
    for all combinations of $\gamma$ and $\sigma$ do
        for k = 1:m do
            Define a set $S_k = \bigcup_{i=1, i \ne k}^{m} T_i$;
            Train the SVM on $S_k$;
            Calculate the performance of the SVM on the set $T_k$. This can be done by defining a loss function (for example, Mean Square Error);
        end
    end
    Select the $\gamma$ and $\sigma$ with the lowest value of the loss function;
end
Algorithm 1: m-fold cross validation for parameter selection [10]
The most common values of m are 5 and 10. For m = N (the number of training samples), the procedure is called leave-one-out cross validation, which is less biased than 5-fold or 10-fold cross validation. However, the choice of m also depends on the size of the data, and leave-one-out cross validation is computationally more intensive than other m-fold cross validations. For this reason, 10-fold cross validation is used for tuning the parameters, and the loss function used is the Mean Absolute Percentage Error (MAPE).
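The grid search of Algorithm 1 can be sketched in MATLAB as follows. Training data X (inputs) and y (targets) are assumed to be given, the helpers lssvm_train/lssvm_predict are the hypothetical ones sketched in Section 3.1.4, and the (gamma, sigma) grid matches the one described later in Section 6.2.1.

% Sketch: m-fold cross validation (Algorithm 1) with MAPE as the loss function.
m = 10;                                      % 10-fold, as used in this thesis
gammas = 2.^(1:25); sigmas = 2.^(1:20);      % pre-defined parameter grid
N = size(X, 1);
fold = mod(randperm(N), m) + 1;              % assign every sample to one of m folds
best = struct('mape', inf, 'gamma', NaN, 'sigma', NaN);
for g = gammas
    for s = sigmas
        mape = 0;
        for k = 1:m
            tr = (fold ~= k); te = ~tr;      % S_k and T_k of Algorithm 1
            [alpha, b] = lssvm_train(X(tr,:), y(tr), g, s);
            yhat = lssvm_predict(X(te,:), X(tr,:), alpha, b, s);
            mape = mape + mean(abs((y(te) - yhat) ./ y(te))) / m;
        end
        if mape < best.mape                  % keep the grid point with the lowest loss
            best = struct('mape', mape, 'gamma', g, 'sigma', s);
        end
    end
end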
3.2 Extreme Learning Machines

The learning speed of SLFNs is very slow, since all the parameters need to be tuned iteratively. In [3], the authors have proposed a new learning algorithm for single-hidden layer feedforward neural networks (SLFNs) called the Extreme Learning Machine (ELM). ELM is based on random hidden nodes, which means that the activation function parameters are chosen randomly. The output weights of the SLFN are then determined analytically.
3.2.1 ELM theory

Figure 3.2: Single Layer Feedforward Neural Network

Consider an SLFN with $N_h$ hidden nodes. The training data is $\{x_k, y_k\}_{k=1}^{N}$ where $x \in \mathbb{R}^n$ and $y \in \mathbb{R}^m$. The SLFN can be written as

$$\sum_{i=1}^{N_h} \beta_i\, h_i(x_k) = y_k, \qquad k = 1, \dots, N \qquad (3.21)$$

$$h_i(x_k) = h(w_i \cdot x_k + b_i) \qquad (3.22)$$

where $w_i \in \mathbb{R}^n$ is the weight vector between the $n$ input nodes and the $i$th hidden node, $\beta_i \in \mathbb{R}^m$ is the weight vector between the $i$th hidden node and the output nodes, $b_i$ is the threshold of the $i$th hidden node and $h$ is the activation function. Fig. 3.2 shows the architecture of the SLFN. In matrix form, eq.(3.21) is

$$H\beta = Y \qquad (3.23)$$
where

$$H_{N \times N_h} = \begin{bmatrix} h(w_1 \cdot x_1 + b_1) & \cdots & h(w_{N_h} \cdot x_1 + b_{N_h}) \\ \vdots & \ddots & \vdots \\ h(w_1 \cdot x_N + b_1) & \cdots & h(w_{N_h} \cdot x_N + b_{N_h}) \end{bmatrix} \qquad (3.24)$$

$$\beta_{N_h \times m} = \begin{bmatrix} \beta_1^T \\ \vdots \\ \beta_{N_h}^T \end{bmatrix} \quad \text{and} \quad Y_{N \times m} = \begin{bmatrix} y_1^T \\ \vdots \\ y_N^T \end{bmatrix} \qquad (3.25)$$

$H$ is called the hidden layer matrix. It gives the transformation from the input space to the hidden neuron space. The ELM is based on the following two theorems [3]:
Theorem 1. Given a standard SLFN with $N$ hidden nodes and an activation function $h: \mathbb{R} \to \mathbb{R}$ which is infinitely differentiable in any interval, for $N$ arbitrary distinct samples $\{x_k, y_k\}_{k=1}^{N}$ where $x \in \mathbb{R}^n$ and $y \in \mathbb{R}^m$, and for any $w_i$ and $b_i$ randomly chosen from any intervals of $\mathbb{R}^n$ and $\mathbb{R}$, respectively, according to any continuous probability distribution, with probability one the hidden layer output matrix $H$ of the SLFN is invertible and $\lVert H\beta - Y \rVert = 0$.

Theorem 2. Given any small positive value $\epsilon > 0$ and an activation function $h: \mathbb{R} \to \mathbb{R}$ which is infinitely differentiable in any interval, there exists $N_h \le N$ such that for $N$ arbitrary distinct samples $\{x_k, y_k\}_{k=1}^{N}$ where $x \in \mathbb{R}^n$ and $y \in \mathbb{R}^m$, and for any $w_i$ and $b_i$ randomly chosen from any intervals of $\mathbb{R}^n$ and $\mathbb{R}$, respectively, according to any continuous probability distribution, with probability one $\lVert H_{N \times N_h}\beta - Y \rVert < \epsilon$.
For the proof of both theorems, see [3]. For training the SLFN, eq.(3.23) leads to finding $\hat{w}_i$, $\hat{b}_i$ and $\hat{\beta}$ such that

$$\lVert H(\hat{w}, \hat{b})\hat{\beta} - Y \rVert = \min_{w_i, b_i, \beta}\, \lVert H(w, b)\beta - Y \rVert \qquad (3.26)$$

where $w = [w_1, \dots, w_{N_h}]$ and $b = [b_1, \dots, b_{N_h}]$. Since $w$ and $b$ are chosen randomly, eq.(3.26) reduces to

$$\lVert H(w, b)\hat{\beta} - Y \rVert = \min_{\beta}\, \lVert H(w, b)\beta - Y \rVert \qquad (3.27)$$

The solution of eq.(3.27) is given by

$$\hat{\beta} = H^{\dagger} Y \qquad (3.28)$$
where $H^{\dagger}$ is the Moore-Penrose generalized inverse of $H$. The solution $\hat{\beta} = H^{\dagger}Y$ is the minimum norm least squares solution of the linear system $H\beta = Y$:

$$\lVert H\hat{\beta} - Y \rVert = \min_{z}\, \lVert Hz - Y \rVert \qquad (3.29)$$

According to [21], for feedforward neural networks that reach a small training error, the smaller the norm of the weights, the better the generalization performance of the network. Since the Moore-Penrose generalized inverse gives the least norm solution [20], ELM has good generalization performance.
ELM Algorithm

Given the training data $\{x_k, y_k\}_{k=1}^{N}$ where $x \in \mathbb{R}^n$ and $y \in \mathbb{R}^m$, the activation function $h$ and the number of hidden nodes $N_h$, the ELM algorithm can be written as [3]:

1. Assign the weights $w_i$ and thresholds $b_i$ randomly.
2. Calculate $H$.
3. Calculate $\hat{\beta}$ using eq.(3.28).
4. The output for a new input $x$ is given by $f(x) = h(x)\hat{\beta}$, where $h(x) = [h(w_1 \cdot x + b_1) \dots h(w_{N_h} \cdot x + b_{N_h})]$ and $w$ and $b$ are from step 1.
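The four steps translate almost directly into MATLAB. The sketch below is a minimal illustration with hypothetical function names and a sigmoid activation; pinv computes the Moore-Penrose generalized inverse of eq.(3.28).

% Sketch: ELM training and prediction for an SLFN with Nh hidden nodes.
function [W, bias, beta] = elm_train(X, Y, Nh)
    n = size(X, 2);
    W = randn(Nh, n); bias = randn(Nh, 1);       % step 1: random weights and thresholds
    H = 1 ./ (1 + exp(-(X*W' + bias')));         % step 2: N x Nh hidden layer matrix
    beta = pinv(H) * Y;                          % step 3: least norm solution, eq.(3.28)
end

function Yhat = elm_predict(Xnew, W, bias, beta)
    H = 1 ./ (1 + exp(-(Xnew*W' + bias')));      % h(x) for the new inputs
    Yhat = H * beta;                             % step 4: f(x) = h(x) * beta
end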
Chapter 4

Case Study - Spot Price Forecasting

4.1 Characteristics of Electricity Prices

The characteristics of electricity that make it different from other commodities are [16]:
1. A real time commodity: Electricity is a real time commodity, which means that it must be consumed at the same time it is generated. Any imbalance between production and consumption will lead to a deviation in frequency, and this will affect the stability of the grid. Higher production than consumption will increase the frequency, and lower production than consumption will decrease the frequency.

2. Non storable good: It is not possible to store electricity. Options to store electricity exist at a small scale (e.g. batteries), but storage is difficult on a large scale. The pricing of a forward contract depends on the storage costs; hence, the pricing of electricity contracts cannot be done similarly to commodities that can be stored.

3. Characteristics of demand and supply: Electricity is an essential commodity, which makes the demand for electricity inelastic. Supply is decided by the merit order curve of the power plants. If the demand is low, it can be met by base load generators. With the increase in renewable in-feed, the supply curve has changed significantly.
Due to the different nature of electricity as compared to other commodities, the spot price of electricity is difficult to predict. Also, since electricity cannot be stored, the pricing of electricity forward contracts cannot be done on the basis of the classic formula of forward pricing:

$$F_t = (S_0 + U)\, e^{r(T-t)}$$

where $F_t$ is the forward price, $S_0$ is the spot price at $t = 0$ and $U$ is the storage cost, with interest rate $r$ and time to maturity $T - t$. The characteristics of electricity prices are described below [16]:
1. Multi-scale seasonality: Electricity prices show seasonal patterns. They show intra-day, weekly and monthly seasonal cycles (hence the name multi-scale seasonality). During a day, prices are higher at noon and during the evening because of high economic activity. Also, weekdays have higher prices than weekends due to higher demand. The price level also varies with the months: during winter, the high heating requirement drives the electricity prices higher.

2. Dependency on external factors: Electricity prices also depend on external factors like temperature and load. The prices closely follow the trends of the load profile; a high load increases the price.

3. Mean reversion: The mean reversion property means that the spot prices tend to move towards an average value. The variations in the spot prices are assumed to be temporary, and the modeling of prices is done using a stochastic approach.

4. Jumps: Electricity prices can move to high levels in a very short period of time. Since electricity cannot be stored, any extreme event (e.g. a plant outage) can drive the prices higher in a short time interval.
4.2 Methods for Price Forecasting

4.2.1 Importance and need of price forecasting

The introduction of deregulated electricity markets has brought the need to study the electricity industry in more detail. On the generation side, many private players have entered. Power exchanges have been set up to facilitate electricity trading, which has led to the entry of trading companies. The retail and distribution side has also been privatized, and the residential sector is opening slowly as well. The structural change in the electricity industry has given rise to many risks, and the spot price is one of the basic elements for managing these risks. The markets are still not fully liberalized, which further increases the need for price forecasting. Also, the technical issues related to electricity make it very difficult for the markets to set the electricity prices. Fundamentally, the prices are governed by the marginal costs of the generators, congestion in the transmission grid and other grid security issues, fuel prices, the demand for electricity and the policies set up by the regulator. The lack of liquidity in the futures markets for electricity also stresses the need to forecast the prices.
4.2.2 Modeling of Electricity Prices

The choice of a modeling method for prices depends on the application of the model. Long term price forecasting is used for investment decisions and power system planning. Short term price forecasting is mainly used for the day-ahead planning of the generation schedule or by trading companies for bidding purposes. For example, accurate forecasting will help the generators to plan the generation schedule with the aim of profit maximization. For long term price forecasting, the model must be able to capture the fundamental elements that form the prices, like the costs of various generation technologies, emission constraints, demand, congestion in the grid etc. For short term price forecasting, the model should be able to capture the statistical properties of the electricity prices over time. Accordingly, the modeling approaches for price forecasting are divided into two categories [13]:

1. Fundamental models: Fundamental models are optimization problems that aim to determine the marginal cost of every technology/power plant in a specified region. The market price is determined by the merit order curve and the demand for electricity. The most common inputs to a fundamental model are:

- regional demand
- power plant cost curves
- plant data like retirement
- fuel prices
- emissions constraints
- transmission constraints

These inputs are used to form a cost optimization problem considering the energy constraints, reserve capacity requirements, transmission constraints and unit chronological constraints. The resulting optimization problem can be solved using an optimization solver, and the results are the marginal costs of every power plant considered in the optimization equation. These can further be used to find the cash flows, cross-border flows, available transmission capacity and emissions.
2. Quantitative models: Quantitative models are used to find the statistical characteristics of the spot prices, with the main aim of risk management [11]. The ultimate motive is not to give an accurate number for the price value but to find the relative movement of the prices over short term time horizons. The volatility of spot prices is the most important variable for risk managers. Stochastic Differential Equations (SDE) are typically used to characterize the properties of spot prices like random walk, mean reversion and jumps. A few simple models based on SDEs are enumerated below [16]:
- Random walk: In this model, prices are assumed to follow a random walk. The SDE for a random walk model is:

$$dP_t = \mu P_t\, dt + \sigma P_t\, dW_t$$

Here, the prices are modelled as a Geometric Brownian Motion. $P_t$ denotes the spot price, $W_t$ is a Wiener process, $\mu$ is a drift term, $\sigma$ is the volatility and $t$ is time.

- Mean reversion: In addition to the random walk of the prices, this model also captures the mean reversion property of electricity prices. The SDE for a mean reversion model is:

$$dP_t = a(\mu - P_t)\, dt + \sigma P_t\, dW_t$$

Here, $a$ is the speed of mean reversion.

- Mean reversion with jumps: The mean reversion model can be extended to explain the jumps seen in the price time series as follows:

$$dP_t = a(\mu - P_t)\, dt + \sigma P_t\, dW_t + k\, dq_t$$

Here, $k\, dq_t$ represents the jumps, with $k$ representing the severity of the jump and $q_t$ the frequency of the jumps. For a detailed explanation of these models, refer to [16]. A small simulation sketch of the jump model follows this list.
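To make the jump model concrete, the following MATLAB sketch simulates one path of the mean reversion model with jumps using a simple Euler discretization. All parameter values are illustrative assumptions, not estimates from this thesis.

% Sketch: Euler simulation of dP = a*(mu - P)*dt + sigma*P*dW + k*dq.
a = 5; mu = 45; sigma = 0.6; dt = 1/24;      % hourly steps, mean level 45 euro/MWh
lambda = 0.02; k = 60;                       % jump probability per step and severity
T = 24*7; P = zeros(T, 1); P(1) = mu;        % one week of hourly prices
for t = 1:T-1
    dW = sqrt(dt) * randn;                   % Wiener increment
    dq = (rand < lambda);                    % jump indicator for this step
    P(t+1) = P(t) + a*(mu - P(t))*dt + sigma*P(t)*dW + k*dq;
end
plot(P); xlabel('hours'); ylabel('euro/MWh');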
In [1], the authors have proposed combining the two approaches. The argument is that, since the markets are not mature and liquidity is low, it is necessary to look for additional data; hence the modeling can be made more comprehensive if the fundamental approach and the quantitative approach are combined. Fig. 4.1 explains the approach. The bottom-up approach consists of making a model based on considerations of the fundamental structure of the system (here electricity spot prices). The fundamental model requires external data (prices depend on factors like temperature and load). The fundamental model can be combined with the financial approach, which is based on model building using stochastic differential equations. The combination of the fundamental approach and the financial approach is used to build price scenarios.

Figure 4.1: Combining the fundamental and quantitative approaches for price forecasting [1]
Chapter 5

Model Representation and Estimation

5.1 Time Series Analysis

To build a model for time series forecasting, a three step procedure is followed [22]:

1. Identification: In this step, the data is analyzed to find the characteristics of the time series. For example, a time series might have significant autocorrelation; the autocorrelation function can be used to find suitable lag factors to be used in the modeling. Also, the dependence of the time series on external factors should be analyzed.

2. Estimation: Based on the information obtained in the first step, a model is proposed. This model should be able to explain all the underlying characteristics like moving average, autocorrelation, dependency on external factors etc.

3. Diagnostic checking: This step involves applying statistical tests to verify whether the model is able to capture all the dynamics of the time series. One of the ways is to check the autocorrelation of the residuals. There should not be any significant degree of autocorrelation among the residuals. If the residuals show some correlation, it means that the model is not able to capture all the dynamics of the time series.

Fig. 5.1 shows a sample spot price time series and the steps involved in the forecasting.
Figure 5.1: Steps in time series forecasting

5.2 Short Term Spot Price Model

The spot price profile is defined as

$$y_t = \bar{y}_t + \epsilon_t \qquad (5.1)$$

where $y_t$ is the spot price with average $\bar{y}_t$, and $\epsilon_t$ is a white noise process with zero expectation and a finite variance, i.e. $\epsilon_t \sim (0, \sigma^2)$. The model for the hourly spot price forecasting is

$$\hat{y}_t = \hat{\bar{y}}_t + \hat{\epsilon}_t \qquad (5.2)$$

Here, $\hat{y}_t$ is the estimated spot price with estimated average $\hat{\bar{y}}_t$, and $\hat{\epsilon}_t$ is a white noise process with zero expectation and a finite variance, i.e. $\hat{\epsilon}_t \sim (0, \hat{\sigma}^2)$. The estimated average is written as a function of a regression vector $x_t \in \mathbb{R}^n$:

$$\hat{\bar{y}}_t = f(x_t) \qquad (5.3)$$
Characteristics of Spot Prices

Fig. 5.2 shows the autocorrelation of the hourly spot price series for Germany, taken from the EPEX website for the month of March 2012. It shows that the spot prices are significantly correlated with the previous hour values. The blue lines show the 95% confidence bounds; any value of autocorrelation above these bounds is termed significant and must be explained by the proposed model.

Fig. 5.3 shows the relation between electricity price and load. The profiles of load and price follow the same pattern: when the load increases, the price increases, and when it decreases, the price decreases. Fig. 5.4 shows the relation between temperature and price for a summer day. A low temperature implies a low price due to less cooling requirement.
5.2. SHORT TERM SPOT PRICE MODEL 29
0 5 10 15 20
0.2
0
0.2
0.4
0.6
0.8
Lag
S
a
m
p
l
e
A
u
t
o
c
o
r
r
e
l
a
t
i
o
n
Autocorrelation of a sample spot price time series
Figure 5.2: Autocorrelation of the EPEX prices for March 2012
Figure 5.3: Correlation of electricity prices and load
Figure 5.4: Correlation of electricity prices and temperature
Fig. 5.5 shows the multi-scale seasonality of the spot prices. The hour axis shows the 168 hours of a week, starting from Monday. The week axis shows the weeks of 4 months, starting from February. Looking along the hour axis, spot prices are high during weekdays and low during weekends. Also, a day has two peaks. The economic activity is high during the week, and hence the prices are high. The intra-day peaks are explained by the relatively high consumption of electricity at noon and during the evening. Along the week axis, February shows high prices due to the high heating requirement. Prices decrease during the summer as no heating is required.
Model for spot prices

The regression vector should be able to capture the multi-scale seasonality and the dependency of the spot prices on external factors. It should also be able to explain the strong autocorrelation observed in the price series. In the proposed model, the evolution of spot prices is explained by the values of the spot prices of the previous hours, a set of exogenous variables and a set of dummy variables (fig. 5.6). The exogenous variables include previous year load data and weather variables; dummy variables are used for seasonality. All of them are described below:

1. Weather variables include the maximum temperature $T^{max}_t$, minimum temperature $T^{min}_t$, mean temperature $T^{mean}_t$, wind speed $W^s_t$ and precipitation $PP_t$. They can be grouped together as $Wea_t$:

$$Wea_t = \begin{bmatrix} T^{max}_t & T^{min}_t & T^{mean}_t & W^s_t & PP_t \end{bmatrix}$$

1. Mean Absolute Error (MAE): It is defined as follows:

$$\mathrm{MAE} = \frac{1}{N} \sum_{t=1}^{N} \left\lvert y_t - \hat{y}_t \right\rvert$$
Figure 5.6: NARX model for electricity spot prices

2. Mean Absolute Percentage Error (MAPE): The Mean Absolute Percentage Error is another accuracy measure for time series forecasts. The accuracy of the model is determined by the absolute value of the percentage errors. It is defined as follows:

$$\mathrm{MAPE} = \frac{1}{N} \sum_{t=1}^{N} \left\lvert \frac{y_t - \hat{y}_t}{y_t} \right\rvert$$
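For reference, these accuracy measures, together with the standard deviation of the error reported in the tables of Chapter 6, can be computed with a few lines of MATLAB. The helper below is a hypothetical sketch, not the thesis code.

% Sketch: forecast accuracy measures used in Chapter 6.
function [mape, mae, stde] = accuracy(y, yhat)
    e = y - yhat;                 % forecast errors
    mape = mean(abs(e ./ y));     % Mean Absolute Percentage Error
    mae  = mean(abs(e));          % Mean Absolute Error
    stde = std(e);                % standard deviation of the error
end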
Chapter 6

Empirical Analysis - Price Forecasting

In this chapter, the results of the spot price model are presented. The code for the model is written in MATLAB, and CVX [19] is used as the optimization solver.
6.1 Spot Price Model

Figure 6.1: Spot price time series for Germany (Feb. 2012 to May 2012); mean 43.96, std 17.94 euro/MWh
The spot price model presented in the previous chapter is applied to the spot price time series for Germany taken from the EPEX website [23]. Fig. 6.1 shows the spot price time series, and the following points describe the data used:

- The spot prices are from February 2012 to May 2012.
- The resolution of the prices is one hour.
- Unless stated otherwise, the length of the in-sample data is 70% of the training data. The rest of the training data is used for the out-of-sample predictions to determine the performance of the model. The lengths of both the in-sample and out-of-sample data are rounded off to the nearest multiple of 24.
- The vertical load data is taken from the TSO website [26].
- All the data for the weather variables come from Bloomberg.
Figure 6.2: Autocorrelation of the spot price time series
Fig. 6.2 shows the autocorrelation plot for the price time series up to a lag of 200. The two blue lines above and below the x-axis show the 95% confidence bounds. The prices show a high degree of autocorrelation. In the model, the amount of lag is kept as a variable, and the performance indicators are then used to find the best choice of the lag value. The model is tested for different lags (1 day, 2 day and 4 day) and forecasting horizons (1 day, 3 days and 5 days).
6.2 LSSVM and ELM - Simulation Results for Spot Prices

6.2.1 Parameter Selection for LSSVM

The LSSVM method described in eq.(3.19) is applied to the spot price model. The LSSVM model requires tuning of the parameters $\gamma$ (eq.(3.17)) and $\sigma$ (the Gaussian kernel parameter). 10-fold cross-validation is used for tuning $(\gamma, \sigma)$: $\gamma$ controls the error term in the optimization problem of LSSVM, and $\sigma$ controls the shape of the Gaussian kernel. The algorithm for m-fold cross-validation was explained previously. The grid for selecting $(\gamma, \sigma)$ is a $25 \times 20$ grid of the different combinations of the elements of the vectors $(2^1, 2^2, \dots, 2^{25})$ and $(2^1, 2^2, \dots, 2^{20})$. Fig. 6.3 - Fig. 6.5 show the cross-validation scores for 1 day lag, 2 day lag and 4 day lag. The value of the MAPE depends on the choice of $(\gamma, \sigma)$. Also, the amount of lag affects the MAPE results, which indicates that different lags need to be tested for better results.
Figure 6.3: Cross validation scores (1 day lag)
6.2.2 Training Results

Fig. 6.6 shows the results of the LSSVM model on the training data. In hours 0 to 200, the LSSVM method is able to capture the information from the high prices, and hence the actual spot prices (blue points) are covered by the LSSVM model (red line). In other cases, for example from hours 600 to 1200, the LSSVM model does not cover the actual spot price. This is expected, because the prices are not high in this region and hence there is no information to predict the spikes. It also shows that the LSSVM method does not suffer from over-fitting: matching of the training results of LSSVM with spikes in a region that does not contain high prices would indicate over-fitting.
Figure 6.4: Cross validation scores (2 day lag)
Figure 6.5: Cross validation scores (4 day lag)
Figure 6.6: LSSVM training results (RBF kernel, $\gamma = 88.7925$, $\sigma^2 = 53678.8552$); data points in blue, estimation in red
In regions where the prices do not contain spikes, the LSSVM model is fully able to capture the trends in the time series. Fig. 6.7 shows the training results of ELM. Compared to LSSVM, the ELM results show over-fitting. In hours 0 to 200, ELM captures the information similarly to LSSVM, but over hours 0 to 1200, ELM also captures all the peaks even though the prices in the surrounding region are not high. This is because ELM is able to obtain zero error on the in-sample data (Theorem 1) if the number of hidden neurons is greater than the length of the in-sample data. This enables ELM to capture all the peaks in the training data and causes over-fitting.
Figure 6.7: ELM training results (mean 46.43, std 19.47 euro/MWh)
6.2.3 Forecasting Performance

The forecasting performance of LSSVM and ELM is tested on in-sample data and out-of-sample data. The forecasting horizons are 1 day, 3 days and 5 days. Only the forecasting for 5 days is presented here; for the rest of the cases, please see Appendix 1.

In-Sample

Fig. 6.8 shows the forecasting of LSSVM and ELM on in-sample data. The forecasting is done on a one step ahead basis: the forecasted value of the previous hour is used to update the input vector (autoregressive terms), and the updated input vector is then used to forecast the price in the next hour. The in-sample prediction by the LSSVM model is able to capture the spot price characteristics, as illustrated by the low value of the MAPE (for Fig. 6.8, the MAPE is 0.0483). In general, the LSSVM model performs well on the in-sample data. A typical daily profile of spot prices has two peaks: one in the afternoon and the other in the evening. LSSVM is able to capture both peaks, as shown in Fig. 6.8. ELM also gives a low error in in-sample forecasting and is able to capture the intra-day peaks as well. Both LSSVM and ELM are able to capture all the profile variations of the price in the in-sample data (price estimation around hour 80 in fig. 6.8). All the peaks are also forecasted.
Figure 6.8: LSSVM In-Sample fit - 5 Day forecast (legend: ELM, SVM, Actual)
Out of Sample

Fig. 6.9 shows the out-of-sample simulations. The out-of-sample simulation has a higher MAPE than the in-sample fit for both LSSVM and ELM. The error increases as the horizon of the forecasting is increased. However, the intra-day seasonality is successfully captured by the out-of-sample simulations for both
Figure 6.9: LSSVM Out-of-Sample fit - 5 Day forecast (legend: ELM, SVM, Actual)
LSSVM and ELM. The in-sample simulations are better than the out-of-sample ones in capturing the spikes. For the 5 day ahead out-of-sample forecasting, the MAPE for LSSVM is 0.144 and for ELM it is 0.403. This is because ELM shows a better fit on the in-sample data: over-fitting on the in-sample data causes the ELM to give a less accurate forecast than LSSVM on the out-of-sample data. Both LSSVM and ELM are not able to capture the small profile variations of the spot price. This is expected, because the model aims to capture the characteristics of the spot price based on seasonality and external factors; for the small profile variations, more information is required.
6.2.4 Residual Analysis

Analysis of the residuals is the last step of time series modeling. A model can be validated by performing a few statistical tests on the residuals. The residuals should be checked for any autocorrelation, and the assumption of white noise with zero mean and finite variance should also be verified. The residuals are analyzed by the following methods:

Autocorrelation of residuals

Fig. 6.10 and fig. 6.11 show the autocorrelation of the residuals for LSSVM and ELM. The blue lines show the 95% confidence bounds; any value outside these bounds shows a significant autocorrelation. For LSSVM, only lag 2 shows significant values. For ELM, lags 2 and 3 show significant values. All the other values for the different lags are within the bounds represented by the blue lines. This validates the model. If the residuals are found to be autocorrelated, it means that the model is not able to capture all the dynamics of the time series. In such a case, the model must be changed.
Figure 6.10: Autocorrelation of the out-of-sample residuals for LSSVM
One way to change the NARX model if the residuals are found to be autocorrelated is to use an AR-NARX model [2]. The AR-NARX model includes terms for the autocorrelated residuals and hence can correct the missing dynamics of the time series in the results of the NARX model.
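A toolbox-free MATLAB sketch of this residual check might look as follows. It assumes the out-of-sample residuals are available in a vector res and uses the usual +/-1.96/sqrt(N) white noise bounds for the 95% confidence level.

% Sketch: sample autocorrelation of the residuals with 95% confidence bounds.
res = res - mean(res);                       % center the residuals
N = numel(res); maxlag = 20;
acf = zeros(maxlag, 1);
for h = 1:maxlag
    acf(h) = sum(res(1+h:N) .* res(1:N-h)) / sum(res.^2);   % sample ACF at lag h
end
bound = 1.96 / sqrt(N);                      % approximate bounds under white noise
stem(1:maxlag, acf); hold on;
plot([1 maxlag], [bound bound], 'b', [1 maxlag], [-bound -bound], 'b');
% lags outside the bounds indicate dynamics the model has not captured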
Histogram

Eq.(5.1) assumes that the prices can be described by an average value and an error term, and the error term is assumed to be white noise. To validate this, a histogram is used. The results for LSSVM and ELM are shown in fig. 6.12 and fig. 6.13; the normal distribution is shown by the red line. For both LSSVM and ELM, the assumption of white noise does not hold: the distribution of the residuals does not fit the normal distribution. To overcome this, bootstrapping [24] can be used to report the results of the spot prices along with the residuals.
6.3 Forecast Accuracy Analysis

Table 6.1 and Table 6.2 show the complete results for the 1 day ahead, 3 day ahead and 5 day ahead forecasts for both LSSVM and ELM. The results are simulated for 1 day, 2 day and 4 day lags. Every simulation is reported with three figures: MAPE, MAE and the standard deviation of the error. Results are given for in-sample and out-of-sample simulations. LSSVM gives better results for the 4 day lag as compared to the 1 day and 2 day lags. The magnitude of the MAPE increases from 1 day ahead to 5 day ahead forecasting. Also, the MAE of LSSVM is lower for the 4 day lag in comparison to the 1 day and 2 day lags.
Figure 6.11: Autocorrelation of the out-of-sample residuals for ELM
Figure 6.12: Histogram of the out-of-sample residuals for LSSVM (mean 2.94, std 3.82, MAPE 0.0969, MSE 22.6082)
Figure 6.13: Histogram of the out-of-sample residuals for ELM (mean 9.57, std 7.33, MAPE 0.2429, MSE 143.1494)
For the in-sample simulations of ELM, the MAPE for the 1 day lag is comparable to the 4 day lag (lower for 1 day ahead and 3 day ahead, but higher for 5 day ahead). For the out-of-sample simulations of ELM, the MAPE for the 4 day lag is lower than for the 1 day lag. So, the 4 day lag seems to be a better choice than the 1 day and 2 day lags. Also, ELM shows a lower MAPE than LSSVM for the in-sample simulations, but for the out-of-sample simulations, the MAPE of LSSVM is lower than that of ELM.
1 day ahead forecast
             | 1 day lag        | 2 day lag        | 4 day lag
             | LSSVM    ELM     | LSSVM    ELM     | LSSVM    ELM
MAPE         | 0.196    0.039   | 0.167    0.366   | 0.048    0.099
MAE          | 18.2623  3.4595  | 15.2421  34.2326 | 4.8743   9.4666
Std of Error | 21.922   3.223   | 17.779   27.139  | 6.11     8.472

3 day ahead forecast
             | 1 day lag        | 2 day lag        | 4 day lag
             | LSSVM    ELM     | LSSVM    ELM     | LSSVM    ELM
MAPE         | 0.27     0.08    | 0.309    0.378   | 0.061    0.123
MAE          | 21.4112  7.7861  | 22.038   38.6319 | 5.7389   11.4071
Std of Error | 28.408   9.954   | 29.853   31.417  | 7.921    9.408

5 day ahead forecast
             | 1 day lag        | 2 day lag        | 4 day lag
             | LSSVM    ELM     | LSSVM    ELM     | LSSVM    ELM
MAPE         | 0.312    0.096   | 0.389    0.384   | 0.119    0.094
MAE          | 23.4406  9.104   | 26.0821  38.1143 | 9.2411   9.2426
Std of Error | 29.97    13.365  | 30.665   34.086  | 11.165   12.07

Table 6.1: Model Performance - In-sample
1 day ahead forecast
             | 1 day lag         | 2 day lag         | 4 day lag
             | LSSVM    ELM      | LSSVM    ELM      | LSSVM     ELM
MAPE         | 0.11987  0.45476  | 0.13309  0.27384  | 0.096908  0.24286
MAE          | 4.8983   18.8295  | 5.3834   11.2104  | 3.961     10.1423
Std of Error | 4.7967   9.6386   | 6.3221   5.9224   | 3.8184    7.3289

3 day ahead forecast
             | 1 day lag         | 2 day lag         | 4 day lag
             | LSSVM    ELM      | LSSVM    ELM      | LSSVM     ELM
MAPE         | 0.21148  0.70117  | 0.18462  0.31356  | 0.11746   0.2278
MAE          | 10.2485  28.9651  | 8.2508   12.5524  | 5.1676    9.6407
Std of Error | 13.9248  13.2899  | 10.343   8.483    | 6.2144    8.162

5 day ahead forecast
             | 1 day lag         | 2 day lag         | 4 day lag
             | LSSVM    ELM      | LSSVM    ELM      | LSSVM     ELM
MAPE         | 0.23927  0.78456  | 0.19857  0.47402  | 0.14435   0.40396
MAE          | 11.5346  34.4959  | 9.3133   20.6412  | 6.5877    18.0641
Std of Error | 14.3283  17.7412  | 11.186   17.5195  | 7.3444    19.796

Table 6.2: Model Performance - Out of Sample
6.4 Transition Case

Figure 6.14: In-sample spot prices for the transition case
In this section, LSSVM and ELM are simulated on a transition period. Fig. 6.14 shows the data used to train the LSSVM and ELM; it contains a highly volatile price series, with prices varying from 210 euro/MWh down to 20 euro/MWh. The simulation is done for the prices shown in fig. 6.15, which vary from 25 euro/MWh to 85 euro/MWh. This helps in comparing the performance of LSSVM and ELM for a transition period (e.g. after extreme events like a grid failure).

Fig. 6.16 and fig. 6.17 show the in-sample and out-of-sample simulation results. Exact values for the forecast accuracy measures (MAPE, MAE and Std of Error) are given in Table 6.3 and Table 6.4. The LSSVM and ELM results show characteristics similar to the results presented earlier. For the 4 day lag, LSSVM works better than ELM for out-of-sample forecasting. The relative values of the MAPE for both LSSVM and ELM are of the same magnitude as those of the results presented in the non-transition case. This shows that LSSVM and ELM produce good results for transition periods as well.
6.5 LSSVM and ELM - Execution Time

The main bottleneck of the LSSVM algorithm is the tuning of the parameters. LSSVM requires the regularization parameter and any kernel parameters to be tuned before forecasting. The parameters are tuned by cross-validation; as cross-validation is computationally intensive, the time taken for tuning the parameters is large. Fig. 6.18 and fig. 6.19 show the MATLAB profiler results for LSSVM and ELM.
Figure 6.15: Out-of-sample spot prices for the transition case
Figure 6.16: LSSVM and ELM performance for the transition case (in-sample)
Figure 6.17: LSSVM and ELM performance for the transition case (out of sample)
1 day ahead forecast
             | 1 day lag         | 2 day lag         | 4 day lag
             | LSSVM    ELM      | LSSVM    ELM      | LSSVM     ELM
MAPE         | 0.10862  0.010998 | 0.12179  0.20265  | 0.046016  0.067095
MAE          | 10.9525  1.0042   | 11.3353  20.1434  | 4.5813    6.5181
Std of Error | 15.3677  1.1374   | 14.7118  18.7345  | 5.568     6.0637

3 day ahead forecast
             | 1 day lag         | 2 day lag         | 4 day lag
             | LSSVM    ELM      | LSSVM    ELM      | LSSVM     ELM
MAPE         | 0.28503  0.024226 | 0.22258  0.20116  | 0.054821  0.062249
MAE          | 30.5387  2.0749   | 17.7301  18.1292  | 5.5993    5.5309
Std of Error | 34.5162  3.1191   | 24.3197  23.0094  | 7.3096    7.2614

5 day ahead forecast
             | 1 day lag         | 2 day lag         | 4 day lag
             | LSSVM    ELM      | LSSVM    ELM      | LSSVM     ELM
MAPE         | 0.31801  0.069171 | 0.29327  0.21263  | 0.08856   0.062601
MAE          | 32.5884  5.8066   | 21.4408  18.0301  | 7.6937    6.065
Std of Error | 35.2139  9.6145   | 26.2777  24.0909  | 10.3579   10.0281

Table 6.3: Model Performance - In-sample, transition case
1 day ahead forecast
             | 1 day lag         | 2 day lag         | 4 day lag
             | LSSVM    ELM      | LSSVM    ELM      | LSSVM     ELM
MAPE         | 0.19984  0.40075  | 0.22542  0.2096   | 0.20274   0.073761
MAE          | 7.912    17.3135  | 10.597   10.3734  | 8.9115    3.2521
Std of Error | 11.3417  12.6874  | 9.2753   10.5921  | 10.8091   4.2678

3 day ahead forecast
             | 1 day lag         | 2 day lag         | 4 day lag
             | LSSVM    ELM      | LSSVM    ELM      | LSSVM     ELM
MAPE         | 0.36261  0.57271  | 0.26487  0.20324  | 0.20811   0.52137
MAE          | 16.5731  25.8249  | 12.7893  9.4969   | 9.7692    24.3478
Std of Error | 18.6107  16.3675  | 10.5217  10.9103  | 8.3409    24.0487

5 day ahead forecast
             | 1 day lag         | 2 day lag         | 4 day lag
             | LSSVM    ELM      | LSSVM    ELM      | LSSVM     ELM
MAPE         | 0.35352  0.75766  | 0.3071   0.39624  | 0.2076    0.64918
MAE          | 17.0662  34.9022  | 15.1247  18.9643  | 10.1683   30.0395
Std of Error | 18.3342  16.0931  | 13.8459  13.8474  | 8.4558    21.7506

Table 6.4: Model Performance - Out of Sample, transition case
Figure 6.18: MATLAB profiler results for LSSVM
Figure 6.19: MATLAB profiler results for ELM
The profiler gives the running time taken by all the functions used in the code. The function crossvallinsvm implements the cross-validation for LSSVM; it takes 22329 seconds to execute. For ELM, there is no parameter to be tuned. The number of hidden neurons is a parameter for ELM, but if the number of hidden neurons is of the same magnitude as the length of the training data, the results do not depend on the number of hidden neurons [3]. The function trainelmmoore implements the ELM, and it takes just 1.6 seconds. Hence, ELM is much faster than LSSVM.
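For reference, timings like these can be reproduced with MATLAB's built-in profiler or with tic/toc. In the sketch below the calling signatures of crossvallinsvm and trainelmmoore are assumptions, since the thesis code is not reproduced here.

% Sketch: measuring execution time of the tuning and training steps.
profile on;
[gamma, sigma] = crossvallinsvm(X, y);       % dominant cost: cross-validation
profile viewer;                              % per-function running times

t = tic;
beta = trainelmmoore(X, y, Nh);              % ELM training: essentially one pinv solve
fprintf('ELM training took %.2f s\n', toc(t));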
Part II
Constrained Estimation
Chapter 7
Constrained Estimation
Figure 7.1: Constrained Estimation
In forecasting, it is sometimes desirable to impose constraints during the estimation step. Fig. 7.1 shows non-constrained estimation and constrained estimation. In non-constrained estimation, a model is proposed for a given time series, it is trained using training data, and predictions are made. This approach is used in the first part of the thesis. In constrained estimation, constraints are added at the time of training the model, as shown in fig. 7.1. The model is trained such that it fits the training data as well as satisfies the constraints. This helps in making predictions that follow the constraints. For example, consider the forecasting of PV in-feed. Based on an empirical study that relates the feed-in tariff (FIT) and the investment in the PV industry, it is possible to define a set of constraints that relate the possible changes in the FIT to the amount of PV in-feed. These constraints can be incorporated in the model at the time of training, and then predictions can be made. This results in a model that fits the training data and at the same time also satisfies the constraints. This part of the thesis explains constrained estimation problems under the framework of Support Vector Machines and Extreme Learning Machines.
7.1 SVM and ELM for constrained estimation

To apply the non-linear estimation algorithms to constrained estimation, modified Support Vector Machines and Extreme Learning Machines are proposed in this chapter. For using the support vector regression theory, it is proposed to use the SVM with random feature spaces [5], and for using the ELM theory, an optimization based ELM is used [8]. The constrained estimation problem can be written as:

$$f(x) = w^T \varphi(x) + b \qquad (7.1)$$

$$\text{such that} \quad w^T \varphi(x_\tau) \le \Gamma_\tau \qquad (7.2)$$

Here, $f(x)$ is the function to be estimated, $x$ is the input vector, $\varphi$ is a high dimensional mapping similar to the estimation problem discussed previously, and $(w, b)$ are the parameters of the model. The given constraint shows just one example of the possible constraints: it means that during prediction the value of the function should not be more than $\Gamma_\tau$ for the inputs $x_\tau$.
7.2 SVM with external constraints

To solve estimation problems with constraints, it is required to include the external constraints during the formulation of the optimization problem of the SVM. Consider the optimization problem of a standard SVM as given in eq.(3.4). To solve a time series estimation problem with external constraints using SVM theory, the optimization problem is modified along the lines of eq.(7.2) as
$$\begin{aligned} \min_{w,b,\xi,\xi^*}\; J_p(w,\xi,\xi^*) = \tfrac{1}{2} w^T w + c \sum_{k=1}^{N} (\xi_k + \xi_k^*) \quad \text{such that} \quad & y_k - w^T \varphi(x_k) - b \le \varepsilon + \xi_k, && k = 1, \dots, N \\ & w^T \varphi(x_k) + b - y_k \le \varepsilon + \xi_k^*, && k = 1, \dots, N \\ & \xi_k, \xi_k^* \ge 0, && k = 1, \dots, N \\ & w^T \varphi(x_\tau) + b \le \Gamma_\tau, && \tau \in S,\; S \subseteq \{N+1, \dots, M\} \end{aligned} \qquad (7.3)$$

Eq.(7.3) is written for a time series defined by a training set $\{x_k, y_k\}_{k=1}^{N}$; $\tau$ denotes the future times for which we want to add constraints, and $M \ge N + 1$. For the other parameters, please refer to eq.(3.4). One way to solve the primal problem of eq.(7.3) is to convert it into the dual problem, as described next.
7.2.1 Solving the dual problem

The Lagrangian of the primal problem in eq.(7.3) is

$$\begin{aligned} L(w,b,\xi,\xi^*;\alpha,\alpha^*,\eta,\eta^*,\lambda) = {} & \tfrac{1}{2} w^T w + c \sum_{k=1}^{N} (\xi_k + \xi_k^*) - \sum_{k=1}^{N} \alpha_k (\varepsilon + \xi_k - y_k + w^T \varphi(x_k) + b) \\ & - \sum_{k=1}^{N} \alpha_k^* (\varepsilon + \xi_k^* + y_k - w^T \varphi(x_k) - b) - \sum_{k=1}^{N} (\eta_k \xi_k + \eta_k^* \xi_k^*) \\ & + \sum_{\tau=N+1}^{M} \lambda_\tau \left( w^T \varphi(x_\tau) + b - \Gamma_\tau \right) \end{aligned} \qquad (7.4)$$
The conditions of optimality give

$$\begin{aligned} & \frac{\partial L}{\partial w} = 0 \;\Rightarrow\; w = \sum_{k=1}^{N} (\alpha_k - \alpha_k^*)\varphi(x_k) - \sum_{\tau=N+1}^{M} \lambda_\tau \varphi(x_\tau) \\ & \frac{\partial L}{\partial b} = 0 \;\Rightarrow\; \sum_{k=1}^{N} (-\alpha_k + \alpha_k^*) + \sum_{\tau=N+1}^{M} \lambda_\tau = 0 \\ & \frac{\partial L}{\partial \xi_k} = 0 \;\Rightarrow\; c - \alpha_k - \eta_k = 0 \\ & \frac{\partial L}{\partial \xi_k^*} = 0 \;\Rightarrow\; c - \alpha_k^* - \eta_k^* = 0 \end{aligned} \qquad (7.5)$$
The dual problem can be written using eq.(7.5) and eq.(7.4):

$$\begin{aligned} \max_{\alpha,\alpha^*,\lambda}\; J_d = {} & -\tfrac{1}{2} \sum_{k,l=1}^{N} (\alpha_k - \alpha_k^*)(\alpha_l - \alpha_l^*) K(x_k, x_l) + \sum_{k=1}^{N} \sum_{\tau=N+1}^{M} (\alpha_k - \alpha_k^*) \lambda_\tau K(x_k, x_\tau) \\ & - \tfrac{1}{2} \sum_{\tau,\nu=N+1}^{M} \lambda_\tau \lambda_\nu K(x_\tau, x_\nu) - \varepsilon \sum_{k=1}^{N} (\alpha_k + \alpha_k^*) + \sum_{k=1}^{N} y_k (\alpha_k - \alpha_k^*) - \sum_{\tau=N+1}^{M} \lambda_\tau \Gamma_\tau \\ \text{such that} \quad & \sum_{k=1}^{N} (-\alpha_k + \alpha_k^*) + \sum_{\tau=N+1}^{M} \lambda_\tau = 0, \qquad \alpha_k, \alpha_k^* \in [0, c], \qquad \lambda_\tau \ge 0 \end{aligned} \qquad (7.6)$$

Using the value of $w$ from eq.(7.5), the estimated function from eq.(7.1) can be written as follows:

$$f(x) = \sum_{k=1}^{N} (\alpha_k - \alpha_k^*) K(x, x_k) - \sum_{\tau=N+1}^{M} \lambda_\tau K(x, x_\tau) + b$$
To solve the constrained estimation with the dual problem, it is required to explicitly rewrite the dual problem every time more constraints are added or the current constraints are modified. Also, it is not possible to write the dual problem in terms of kernel functions for every type of constraint. Next, the SVM with random feature spaces is described, which makes it possible to solve the optimization problem without explicitly converting it to the dual form, so that a kernel function need not be defined.¹

¹The random feature space can be related to a kernel known as the ELM kernel [5].
7.2.2 Solving with the random feature space

Figure 7.2: Random feature space based SVM for constrained estimation

Frenay et al. [5] have proposed a way to merge the ELM and SVM approaches by defining a new method to explicitly construct the feature space. This feature space is called a random feature space, as the parameters used to define it can be selected randomly. In ELM, the input vectors are mapped to the hidden layer neurons by a randomly generated matrix [3]. This is analogous to defining a new feature space where the hidden layer acts as a
transformation from the input vector space to the hidden neuron space. So, for example, the feature space can be defined as follows for a sigmoidal function:

$$\varphi_i(x_k) = \frac{1}{1 + \exp(-w_i \cdot x_k - b_i)}, \qquad i = 1, \dots, h$$

$$\varphi(x_k) = \begin{bmatrix} \varphi_1(x_k) & \varphi_2(x_k) & \dots & \varphi_h(x_k) \end{bmatrix} \qquad (7.7)$$
The mapping $\varphi(\cdot): \mathbb{R}^n \to \mathbb{R}^h$ takes the input vector $x_k \in \mathbb{R}^n$ to the h-dimensional space $\mathbb{R}^h$, where $h$ is the dimension of the high dimensional feature space. In [25], Liu also proposes to use explicitly defined feature spaces to form an Extreme Support Vector Machine (ESVM). The optimization problem in eq.(7.3) can now be solved without kernels, as the feature mapping is known. Knowing the feature mapping makes it possible to write the external constraints directly in the optimization problem, rather than solving the dual problem, which utilizes the kernels but needs to be reformulated every time the constraints are changed. In [5], an ELM kernel has been introduced based on random feature spaces; it enables the use of the Fixed Size SVM approach in the case of large data sets. The ELM kernel [5] is defined as

$$k(x_k, x_l) = \frac{1}{p}\, \varphi(x_k)^T \varphi(x_l)$$

where $\varphi$ is defined as in equation 7.7. Fig. 7.2 shows the SVM method for constrained estimation.
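With an explicit random feature map, eq.(7.3) becomes an ordinary quadratic program that a generic QP solver can handle directly. The sketch below is one possible formulation using quadprog from MATLAB's Optimization Toolbox (the thesis itself uses CVX). The training data X, y, the constrained inputs Xc and the bounds Gamma are assumed to be given; all other names are illustrative.

% Sketch: constrained SVM estimation, eq.(7.3), with a random feature space.
h = 100; N = size(X, 1); M = size(Xc, 1);
W = randn(h, size(X, 2)); b0 = randn(h, 1);      % random feature space parameters
phi = @(A) 1 ./ (1 + exp(-(A*W' + b0')));        % sigmoid feature map, eq.(7.7)
Phi = phi(X); Phic = phi(Xc);

% decision vector zv = [w; b; xi; xi*]; objective 0.5*w'*w + c*sum(xi + xi*)
c = 10; eps_ = 0.01;
Hq  = blkdiag(eye(h), 0, zeros(2*N));
f   = [zeros(h+1, 1); c*ones(2*N, 1)];
A   = [-Phi, -ones(N,1), -eye(N),  zeros(N);     % y - w'phi(x) - b <= eps + xi
        Phi,  ones(N,1), zeros(N), -eye(N);      % w'phi(x) + b - y <= eps + xi*
        Phic, ones(M,1), zeros(M, 2*N)];         % w'phi(x_tau) + b <= Gamma_tau
rhs = [eps_ - y; eps_ + y; Gamma];
lb  = [-inf(h+1, 1); zeros(2*N, 1)];             % slack variables are non-negative
zv  = quadprog(Hq, f, A, rhs, [], [], lb, []);
w = zv(1:h); b = zv(h+1);
f_hat = @(A) phi(A)*w + b;                       % constrained estimate of eq.(7.1)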
7.3 ELM with external constraints

The SVM output can be written as

$$f(x) = \sum_{s=1}^{N_s} (\alpha_s - \alpha_s^*) K(x, x_s) + b \qquad (7.8)$$

Eq.(7.8) suggests that the SVM can be compared to a generalized single-hidden layer feedforward network [8]. Note that eq.(7.8) is the same as eq.(3.12), where $s$ denotes the support vectors; $\alpha$ is zero for all but the support vectors. A comparison of eq.(7.8) and eq.(3.22) suggests that the kernel $K(x, x_s)$ is comparable to the activation function $h(x)$, and the Lagrangian factors $(\alpha_s - \alpha_s^*)$ are comparable to the output weights $\beta$. So, there is a possibility of combining ELM and SVM [8]. In [8], Huang et al. have suggested two ways to combine SVM and ELM theory:

- using random kernels
- using an optimization based ELM

For this thesis, the optimization based ELM is more suitable, as it helps to include a set of external constraints. For details on the kernel based ELM, please refer to [8] and [9].
7.3.1 Optimization based ELM

Consider the training data $\{x_k, y_k\}_{k=1}^{N}$ where $x \in \mathbb{R}^n$ and $y \in \mathbb{R}^m$. The ELM as presented in eq.(3.28) obtains zero training error, based on Theorem 1. The ELM based on the Moore-Penrose generalized inverse is a solution to the following problem:

$$\text{minimize} \;\; \sum_{i=1}^{N} \lVert h(x_i)\beta - y_i \rVert \qquad \text{and} \qquad \text{minimize} \;\; \lVert \beta \rVert \qquad (7.9)$$

The Moore-Penrose generalized inverse gives a least squares solution to eq.(7.9). It is also possible to formulate the ELM as an optimization problem with an error bound, as presented below [8], [9]:

$$\begin{aligned} \min_{\beta,\xi_1,\xi_2}\; L_p = \tfrac{1}{2} \lVert \beta \rVert^2 + C \sum_{i=1}^{N} (\xi_{1i} + \xi_{2i}) \quad \text{such that} \quad & h(x_i)\beta - y_i \le \varepsilon + \xi_{1i}, && i = 1, \dots, N \\ & y_i - h(x_i)\beta \le \varepsilon + \xi_{2i}, && i = 1, \dots, N \\ & \xi_{1i}, \xi_{2i} \ge 0, && i = 1, \dots, N \end{aligned} \qquad (7.10)$$
The ELM presented in eq.(7.10) prevents possible over-fitting, as it assumes error bounds, and it has the possibility of improving the generalization performance [8]. The activation function is formed using random $w$ and $b$, as described earlier. So, in this case, it is possible to include a set of external constraints, as described in the next section.

Optimization based ELM with external constraints

The optimization based ELM provides a way to include sets of external constraints. As already explained, in time series forecasting it is highly desirable to have the possibility of including external constraints. From here on, the optimization based ELM is called the ELM-variant [8]. The ELM-variant with a set of external constraints is described below.
$$\begin{aligned} \min_{\beta,\xi_1,\xi_2}\; L_p = \tfrac{1}{2} \lVert \beta \rVert^2 + C \sum_{i=1}^{N} (\xi_{1i} + \xi_{2i}) \quad \text{such that} \quad & h(x_i)\beta - y_i \le \varepsilon + \xi_{1i}, && i = 1, \dots, N \\ & y_i - h(x_i)\beta \le \varepsilon + \xi_{2i}, && i = 1, \dots, N \\ & \xi_{1i}, \xi_{2i} \ge 0, && i = 1, \dots, N \end{aligned} \qquad (7.11)$$

$$h(x_\tau)\beta \le \Gamma_\tau, \qquad \tau \in S,\; S \subseteq \{N+1, \dots, M\} \qquad (7.12)$$

Here $\tau$ denotes the future times for which we want to add constraints, and $M \ge N + 1$. Since the activation function can be formed by using random $w$ and $b$, the optimization problem can be extended to include external constraints as given in eq.(7.12). Fig. 7.3 shows the ELM-variant with external constraints.
Figure 7.3: ELM-variant with external constraints
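The ELM-variant of eqs.(7.11)-(7.12) can be written as a quadratic program in the same way. The sketch below assumes the hidden layer matrices H (training inputs) and Hc (constrained inputs), built with random w and b as in Section 3.2, and the bound vector Gamma are given; quadprog again stands in for the CVX formulation used in the thesis.

% Sketch: ELM-variant with external constraints, eqs.(7.11)-(7.12), as a QP.
[N, Nh] = size(H); M = size(Hc, 1);
C = 10; eps_ = 0.01;
Hq  = blkdiag(eye(Nh), zeros(2*N));              % 0.5*||beta||^2 term
f   = [zeros(Nh, 1); C*ones(2*N, 1)];
A   = [ H, -eye(N),  zeros(N);                   % H*beta - y <= eps + xi1
       -H, zeros(N), -eye(N);                    % y - H*beta <= eps + xi2
        Hc, zeros(M, 2*N)];                      % Hc*beta <= Gamma, eq.(7.12)
rhs = [eps_ + y; eps_ - y; Gamma];
lb  = [-inf(Nh, 1); zeros(2*N, 1)];
zv  = quadprog(Hq, f, A, rhs, [], [], lb, []);
beta = zv(1:Nh);                                 % constrained output weights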
7.4 Results for an artificial known process

The random feature space based SVM and the ELM-variant are applied to an artificial process in this section. The function to be estimated is:

$$y = (x^2 - 1)\, x^4 \exp(-x)$$

subject to constraints of the form

$$f(x_{test}) \ge \mathrm{L.B.} \qquad \text{or} \qquad \mathrm{L.B.} \le f(x_{test}) \le \mathrm{U.B.} \qquad (7.13)$$

where L.B. means lower bound and U.B. means upper bound.
7.4.1 Random feature space based SVM

The SVM is applied to the function of eq.(7.13). The feature space is chosen explicitly as a sigmoidal function, and the parameters $(w, b)$ are chosen randomly. The dimension of the feature space is predetermined; it does not affect the results as long as it is of the same magnitude as the size of the input data. Fig. 7.4 shows the result of the SVM without any constraints.
Figure 7.4: SVM results (out of sample)
Constraint: $f(x_{test}) \ge 0$

Figure 7.5 shows the results of the SVM for this constraint. In fig. 7.4 the SVM estimates negative values. This constraint directs the optimization solver to find $(w, b)$ such that for the points in the testing data $x_{test}$, the function is always greater than or equal to zero.
Constraint: $0 \le f(x_{test}) \le 1.4$

Figure 7.6 shows the results of the SVM for this constraint. In fig. 7.5, the maximum estimated value is more than 1.4. This constraint puts an upper bound on the estimated value, such that for the points in the testing data $x_{test}$ the function is always greater than or equal to zero but less than 1.4.
7.4.2 ELM variant

The ELM-variant is applied to the function in eq.(7.13). Fig. 7.7 shows the results of the ELM-variant without any constraints.

Constraint: $f(x_{test}) \ge 0$

Figure 7.8 shows the results of the ELM variant for this constraint. This constraint directs the optimization solver to find $\beta$ such that for the points
Figure 7.5: SVM results for constraint 1 (out of sample)
Figure 7.6: SVM results for constraint 2 (out of sample)
in the testing data $x_{test}$, the function is always greater than or equal to zero.

Constraint: $0 \le f(x_{test}) \le 1.5$

Figure 7.9 shows the results of the ELM variant for this constraint. In fig. 7.8, the maximum estimated value is more than 1.5. This constraint puts an upper bound on the estimated value, such that for the points in the testing data $x_{test}$ the function is always greater than or equal to zero but less than 1.5.
Figure 7.7: ELM results (out of sample)
Figure 7.8: ELM results for constraint 1 (out of sample)
Figure 7.9: ELM results for constraint 2 (out of sample)
Chapter 8

Case Study - PV Infeed Forecasting

8.1 Photovoltaic Infeed forecast model

The underlying equation for the PV infeed forecast is the same as for the spot price forecast. The PV infeed can be written as:

$$y_t = \bar{y}_t + \epsilon_t \qquad (8.1)$$
where $y_t$ is the PV infeed with average $\bar{y}_t$, and $\epsilon_t$ is a white noise process with zero expectation and a finite variance, i.e. $\epsilon_t \sim (0, \sigma^2)$. The model for the hourly PV infeed is

$$\hat{y}_t = \hat{\bar{y}}_t + \hat{\epsilon}_t \qquad (8.2)$$

Here, $\hat{y}_t$ is the estimated PV infeed with estimated average $\hat{\bar{y}}_t$, and $\hat{\epsilon}_t$ is a white noise process with zero expectation and a finite variance, i.e. $\hat{\epsilon}_t \sim (0, \hat{\sigma}^2)$. The estimated average is written as a function of a regression vector $x_t \in \mathbb{R}^n$:

$$\hat{\bar{y}}_t = f(x_t) \qquad (8.3)$$
8.2 Characteristics of PV infeed

The photovoltaic infeed depends on a variety of factors. The amount of diffused radiation received on earth is correlated to the temperature of the region; it can also be correlated to the precipitation. Additionally, the PV time series of two given years show a significant degree of cross correlation. The PV infeed shows multi-scale seasonality as well. All these characteristics are discussed below in detail:
Autocorrelation

Fig. 8.1 shows the cross-correlation between the PV infeed for 2012 (March-June) and 2011 (March-June). The blue lines show the 95% confidence bounds; any value of the cross correlation factor outside the blue lines shows a significant degree of cross-correlation.
Figure 8.1: Cross correlation of the PV infeed time series for 2011 and 2012
Multi-Scale Seasonality

The PV time series shows different types of seasonality. Fig. 8.2 shows the PV infeed for 4 months in a 3-d view. The PV infeed is plotted for every hour and arranged by different weeks. The hour axis shows the 168 hours of the week and reveals the intra-day seasonality: the in-feed starts at zero, increases as the sun increases its elevation angle, and then decreases to zero again at night. The week axis shows the PV infeed over a period of 4 months, from March to June. The amount of in-feed increases as the months change from March to June and summer approaches.
Temperature

The PV in-feed also depends on the temperature. The temperature of a region can give an indication of the amount of diffused radiation received from the Sun during a given day. So, the correlation between PV infeed and temperature can be used to build a model for the PV in-feed. Fig. 8.3 shows the relation between PV in-feed and temperature.
Figure 8.2: Multi-scale seasonality of the PV infeed
Figure 8.3: Correlation of PV infeed and mean temperature
8.3 Model for PV infeed

Based on the analysis of the PV time series, a NARX model is proposed. NARX models are able to explain the autoregressive components and the effect of external factors. The input vector for the NARX model of PV consists of:

1. The autoregressive part: The PV infeed is regressed with a window consisting of previous year PV infeed values. The value of the lag is kept as a variable.

2. The external inputs: The external inputs taken in the input vector are: weather variables, including the maximum temperature $T^{max}_t$, minimum temperature $T^{min}_t$, mean temperature $T^{mean}_t$, heating days $Hd_t$, cooling days $Cd_t$ and precipitation $PP_t$. They can be grouped together as $Wea_t$:

$$Wea_t = \begin{bmatrix} T^{max}_t & T^{min}_t & T^{mean}_t & Hd_t & Cd_t & PP_t \end{bmatrix}$$