Abhishek Rohatgi
Machine Learning Methods for Power Markets
Master Thesis
PSL 1217
EEH Power Systems Laboratory
Swiss Federal Institute of Technology (ETH) Zurich
Examiner: Prof. Dr. Goran Andersson
Supervisor: Marcus Hildmann
Zurich, September 17, 2012
Do not believe in anything simply because you have heard it.
Do not believe in anything simply because it is spoken and rumored by many.
Do not believe in anything simply because it is found written in your religious
books.
Do not believe in anything merely on the authority of your teachers and
elders.
Do not believe in traditions because they have been handed down for many
generations.
But after observation and analysis, when you find that anything agrees with
reason and is conducive to the good and benefit of one and all, then accept
it and live up to it.
-Gautama Buddha
Preface
This report is a result of my master thesis carried out at the Power Systems Laboratory (PSL), ETH Zürich, from March 2012 to August 2012. Using this opportunity, I would like to thank several people who gave me full support during my days at PSL. I would like to thank Marcus Hildmann for his support, useful ideas and comments on my work. I am thankful to Prof. Goran Andersson for giving me the opportunity for the project work at PSL. Last but not least, I am grateful to my colleagues at PSL for the welcoming atmosphere and fruitful discussions.

Abhishek Rohatgi
Zürich, September 17, 2012
Abstract
To manage risks in electricity markets, forecasting of market variables like the spot price, load demand and the Hourly Price Forward Curve (HPFC) is an important research area. Forecasting using linear estimation methods suffers from the problems of under-fitting and over-fitting. Ordinary Least Squares (OLS), a popular linear estimation method, estimates the mean of the data. Profile forecasting of time series needs non-linear estimation methods. In this thesis, Support Vector Machines (SVM) and Extreme Learning Machines (ELM) are used for the estimation of time series. The thesis consists of two parts. In the first part, the SVM and ELM algorithms are presented and simulated for forecasting of the spot price time series. In the second half of the thesis, the problem of constrained estimation is explained and two methods based on SVM and ELM theory are suggested.

The Support Vector Machine is a machine learning algorithm used for function estimation. It converts the non-linear function estimation problem into a convex optimization problem by using functions called kernels. The Extreme Learning Machine is a learning algorithm for Single Layer Feedforward Neural Networks; it estimates the function using the Moore-Penrose generalized inverse matrix of the activation function of the neural network.

A case study of spot price forecasting for Germany is presented using both learning algorithms. After identifying the characteristics of the spot price time series, a Non-Linear Autoregressive model with Exogenous inputs (NARX) is proposed to capture the dynamics of spot prices. To simulate the spot prices, a computationally simple version of SVM called the Least Squares Support Vector Machine (LSSVM) is used. 1-day ahead, 3-day ahead and 5-day ahead forecasting is simulated for different lags of the spot price time series. LSSVM performs better than ELM for out-of-sample forecasting. However, the parameters of LSSVM need to be tuned before training, and the tuning process is computationally intensive; hence, ELM is much faster than LSSVM. Additionally, the out-of-sample residuals are analyzed for autocorrelation to establish the validity of the model.

Constrained estimation means solving the model subject to external constraints. It is required for the estimation of time series like the HPFC, PV in-feed etc. A proposal is made to include the external constraints in the SVM and ELM theory. The proposed SVM and ELM are applied to a case study of photovoltaic in-feed forecasting. Results are presented for a few test constraints. Both SVM and ELM produce good results for constrained estimation. Finally, the thesis is concluded with a discussion of future work on the application of SVM and ELM for time series analysis and constrained estimation.
Kurzfassung

To manage the risks in the electricity markets, the forecasting of variables such as the spot price, the loads and the Hourly Price Forward Curve (HPFC) is an important research area. Forecasting methods based on linear estimation suffer from the problems of under-fitting and over-fitting.
$$R_{emp} = \frac{1}{N} \sum_{k=1}^{N} \left| y_k - w^T x_k - b \right|_{\varepsilon} \qquad (3.2)$$
The cost function $R_{emp}$ is based on Vapnik's $\varepsilon$-insensitive loss function, which is defined as [2]:

$$\lvert y - f(x) \rvert_{\varepsilon} = \begin{cases} 0, & \text{if } \lvert y - f(x) \rvert \le \varepsilon \\ \lvert y - f(x) \rvert - \varepsilon, & \text{otherwise} \end{cases} \qquad (3.3)$$
The variable $\varepsilon$ controls the accuracy and is predefined. The regression problem in eq.(3.1) is estimated using the following optimization problem [2]:

$$\begin{aligned} \min_{w,b}\; J_p(w) = \tfrac{1}{2} w^T w \quad \text{such that} \quad & y_k - w^T x_k - b \le \varepsilon, && k = 1, \dots, N \\ & w^T x_k + b - y_k \le \varepsilon, && k = 1, \dots, N \end{aligned} \qquad (3.4)$$
The inequalities in eq.(3.4) mean that the training data lie inside the $\varepsilon$-tube of accuracy. However, it is possible that the training data might lie outside the accuracy region, and hence eq.(3.4) is modified to include two slack variables $\xi, \xi^*$ as follows [2]:

$$\begin{aligned} \min_{w,b,\xi,\xi^*}\; J_p(w,\xi,\xi^*) = \tfrac{1}{2} w^T w + c \sum_{k=1}^{N} (\xi_k + \xi_k^*) \quad \text{such that} \quad & y_k - w^T x_k - b \le \varepsilon + \xi_k, && k = 1, \dots, N \\ & w^T x_k + b - y_k \le \varepsilon + \xi_k^*, && k = 1, \dots, N \\ & \xi_k, \xi_k^* \ge 0, && k = 1, \dots, N \end{aligned} \qquad (3.5)$$
where $c$ is a regularization constant. The Lagrangian of eq.(3.5) is

$$\begin{aligned} L(w,b,\xi,\xi^*;\alpha,\alpha^*,\eta,\eta^*) = {} & \tfrac{1}{2} w^T w + c \sum_{k=1}^{N} (\xi_k + \xi_k^*) - \sum_{k=1}^{N} \alpha_k (\varepsilon + \xi_k - y_k + w^T x_k + b) \\ & - \sum_{k=1}^{N} \alpha_k^* (\varepsilon + \xi_k^* + y_k - w^T x_k - b) - \sum_{k=1}^{N} (\eta_k \xi_k + \eta_k^* \xi_k^*) \end{aligned} \qquad (3.6)$$
The Karush-Kuhn-Tucker (KKT) conditions of optimality [19] give the following equations [2]:

$$\begin{aligned} & y_k - w^T x_k - b \le \varepsilon + \xi_k, && k = 1, \dots, N \\ & w^T x_k + b - y_k \le \varepsilon + \xi_k^*, && k = 1, \dots, N \\ & \alpha_k, \alpha_k^*, \eta_k, \eta_k^* \ge 0, && k = 1, \dots, N \\ & \alpha_k (\varepsilon + \xi_k - y_k + w^T x_k + b) = 0, && k = 1, \dots, N \\ & \alpha_k^* (\varepsilon + \xi_k^* + y_k - w^T x_k - b) = 0, && k = 1, \dots, N \\ & \eta_k \xi_k = 0, && k = 1, \dots, N \\ & \eta_k^* \xi_k^* = 0, && k = 1, \dots, N \\ & \frac{\partial L}{\partial w} = 0 \;\Rightarrow\; w = \sum_{k=1}^{N} (\alpha_k - \alpha_k^*)\, x_k \\ & \frac{\partial L}{\partial b} = 0 \;\Rightarrow\; \sum_{k=1}^{N} (-\alpha_k + \alpha_k^*) = 0 \\ & \frac{\partial L}{\partial \xi_k} = 0 \;\Rightarrow\; c - \alpha_k - \eta_k = 0 \\ & \frac{\partial L}{\partial \xi_k^*} = 0 \;\Rightarrow\; c - \alpha_k^* - \eta_k^* = 0 \end{aligned} \qquad (3.7)$$
Eq.(3.7) and eq.(3.6) together give the following dual problem:

$$\begin{aligned} \max_{\alpha,\alpha^*}\; J_d = {} & -\tfrac{1}{2} \sum_{k,l=1}^{N} (\alpha_k - \alpha_k^*)(\alpha_l - \alpha_l^*)\, x_k^T x_l - \varepsilon \sum_{k=1}^{N} (\alpha_k + \alpha_k^*) + \sum_{k=1}^{N} y_k (\alpha_k - \alpha_k^*) \\ \text{such that} \quad & \sum_{k=1}^{N} (\alpha_k - \alpha_k^*) = 0, \qquad \alpha_k, \alpha_k^* \in [0, c] \end{aligned} \qquad (3.8)$$
Using the value of $w$ from eq.(3.7) in terms of $\alpha$ and $\alpha^*$, the estimated function from eq.(3.1) can be written as follows:

$$f(x) = \sum_{k=1}^{N} (\alpha_k - \alpha_k^*)\, x_k^T x + b$$
3.1.2 Nonlinear Extension of SVM

For extending the linear SVM to non-linear systems, the regression problem is written as follows in the primal space:

$$f(x) = w^T \varphi(x) + b \qquad (3.9)$$
The training data is $\{x_k, y_k\}_{k=1}^{N}$ and $\varphi(\cdot): \mathbb{R}^n \to \mathbb{R}^{n_h}$ is a mapping from the input space to a high dimensional feature space. The optimization problem in the primal space is [2]:
$$\begin{aligned} \min_{w,b,\xi,\xi^*}\; J_p(w,\xi,\xi^*) = \tfrac{1}{2} w^T w + c \sum_{k=1}^{N} (\xi_k + \xi_k^*) \quad \text{such that} \quad & y_k - w^T \varphi(x_k) - b \le \varepsilon + \xi_k, && k = 1, \dots, N \\ & w^T \varphi(x_k) + b - y_k \le \varepsilon + \xi_k^*, && k = 1, \dots, N \\ & \xi_k, \xi_k^* \ge 0, && k = 1, \dots, N \end{aligned} \qquad (3.10)$$
After taking the Lagrangian and applying the conditions of optimality [19], the problem can be written in the dual space as [2]:

$$\begin{aligned} \max_{\alpha,\alpha^*}\; J_D(\alpha,\alpha^*) = {} & -\tfrac{1}{2} \sum_{k,l=1}^{N} (\alpha_k - \alpha_k^*)(\alpha_l - \alpha_l^*)\, K(x_k, x_l) - \varepsilon \sum_{k=1}^{N} (\alpha_k + \alpha_k^*) + \sum_{k=1}^{N} y_k (\alpha_k - \alpha_k^*) \\ \text{such that} \quad & \sum_{k=1}^{N} (\alpha_k - \alpha_k^*) = 0, \qquad \alpha_k, \alpha_k^* \in [0, c] \end{aligned} \qquad (3.11)$$
Here $K$ is the kernel, defined as $K(x_k, x_l) = \varphi(x_k)^T \varphi(x_l)$. During the transformation of the optimization problem from the primal to the dual (eq.(3.10) to eq.(3.11)), the non-linear effects are moved to the kernel, and eq.(3.11) becomes a convex optimization problem. This is also explained in fig. 3.1. The estimated function can then be written as

$$f(x) = \sum_{k=1}^{N} (\alpha_k - \alpha_k^*)\, K(x, x_k) + b \qquad (3.12)$$

In eq.(3.12), the output is written only in terms of the Lagrange multipliers and the kernel function. Hence, for estimation problems, one does not need to know the underlying feature space $\varphi(x)$. This is explained in more detail in the following section on kernels.
3.1.3 Kernels

Kernels are a class of functions extensively used in statistics and probability theory. A kernel function $K$ maps $\mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$ [2]. The advantage of kernel functions is that they can be used to avoid the explicit construction of the feature space $\varphi(x)$ required for the non-linear SVMs.

Figure 3.1: Primal and dual optimization problem of SVM

Fig. 3.1 shows that the kernel functions remove the non-linear constraints from the primal optimization problem. Any symmetric continuous function $K(x, z)$ that satisfies the Mercer condition [2] can be expressed as
$$K(x, z) = \sum_{i=1}^{n_H} \lambda_i\, \varphi_i(x)\, \varphi_i(z) \qquad (3.13)$$

where $\varphi(x)$ is a mapping from $\mathbb{R}^n$ to the Hilbert space $\mathcal{H}$, $\lambda_i$ is a positive number, $x, z \in \mathbb{R}^n$ and $n_H$ is the dimension of the Hilbert space. Eq.(3.13) can be written as:
$$K(x, z) = \sum_{i=1}^{n_H} \left( \sqrt{\lambda_i}\, \varphi_i(x) \right) \left( \sqrt{\lambda_i}\, \varphi_i(z) \right)$$

and then, if we define $\Phi_i(x) = \sqrt{\lambda_i}\, \varphi_i(x)$ and $\Phi_i(z) = \sqrt{\lambda_i}\, \varphi_i(z)$, this leads to

$$K(x, z) = \Phi(x)^T \Phi(z)$$
For example, if $\varphi(x): \mathbb{R} \to \mathbb{R}^3$ is defined as $\varphi(x) = \begin{bmatrix} x^2 & \sqrt{2}x & 1 \end{bmatrix}^T$ [10], then

$$\varphi(x)^T \varphi(z) = \begin{bmatrix} x^2 & \sqrt{2}x & 1 \end{bmatrix} \begin{bmatrix} z^2 \\ \sqrt{2}z \\ 1 \end{bmatrix} = x^2 z^2 + 2xz + 1 = (xz + 1)^2 \qquad (3.14)$$
This can be represented by polynomial kernels, given by

$$K(x, z) = (xz + c)^d \qquad (3.15)$$

with $c = 1$ and $d = 2$. In general, polynomial kernels can be used to represent any feature map consisting of all possible product monomials of $x$ up to degree $d$, having dimension $n_H = \binom{n+d}{n}$. So, by defining a polynomial kernel, there is no need to explicitly define the high dimensional feature space $\varphi(x)$. Different types of kernel functions exist for application to non-linear systems. A Gaussian kernel is defined as

$$K(x, z) = \exp\!\left( -\frac{\lVert x - z \rVert^2}{\sigma^2} \right) \qquad (3.16)$$

where $\sigma$ is a tuning parameter. For a Gaussian kernel, $\varphi(x)$ is infinite dimensional [10]. In this thesis, the Gaussian kernel is used.
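As a quick numerical check of eqs.(3.14)-(3.16), the following MATLAB sketch verifies the polynomial kernel identity and builds a Gaussian kernel matrix. It is illustrative only (the variable names are not from the thesis code) and assumes MATLAB R2016b or newer for the implicit expansion in the distance computation.

% Sketch: polynomial kernel identity of eq.(3.14) and Gaussian kernel of eq.(3.16).
x = 0.7; z = -1.3;
phi = @(u) [u^2; sqrt(2)*u; 1];              % explicit feature map phi: R -> R^3
lhs = phi(x)' * phi(z);                      % phi(x)^T phi(z)
rhs = (x*z + 1)^2;                           % polynomial kernel (xz + c)^d, c = 1, d = 2
fprintf('feature map: %.6f, kernel: %.6f\n', lhs, rhs);   % the two values coincide

% Gaussian kernel matrix K(x_k, x_l) = exp(-||x_k - x_l||^2 / sigma^2)
X = randn(10, 2);                            % 10 sample points in R^2
sigma = 1.5;
D2 = sum(X.^2, 2) + sum(X.^2, 2)' - 2*(X*X');   % pairwise squared distances
K = exp(-D2 / sigma^2);                      % 10x10 symmetric kernel matrix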
3.1.4 Least Squares Support Vector Machines

LSSVM is a method to reduce the computational effort required to solve the QP problems of the SVM. The optimization problem of the SVM contains inequality constraints. By removing all the inequality constraints and substituting them by equality constraints as shown below, it is possible to reduce the computational effort, since the dual problem is reduced to a system of linear equations.

$$\min_{w,b,e}\; J_p(w, e) = \tfrac{1}{2} w^T w + \gamma\, \tfrac{1}{2} \sum_{k=1}^{N} e_k^2 \quad \text{such that} \quad y_k - w^T \varphi(x_k) - b = e_k, \quad k = 1, \dots, N \qquad (3.17)$$
The Lagrangian of the primal problem of LSSVM in eq.(3.17) is

$$L(w, b, e; \alpha) = \tfrac{1}{2} w^T w + \gamma\, \tfrac{1}{2} \sum_{k=1}^{N} e_k^2 - \sum_{k=1}^{N} \alpha_k \left( w^T \varphi(x_k) + b + e_k - y_k \right) \qquad (3.18)$$

where $\alpha_k$ are the Lagrange multipliers.
After applying the conditions of optimality [19], the dual problem of eq.(3.17) is obtained as a system of linear equations in $\alpha$ and $b$ [2]:

$$\begin{bmatrix} 0 & 1_v^T \\ 1_v & \Omega + I/\gamma \end{bmatrix} \begin{bmatrix} b \\ \alpha \end{bmatrix} = \begin{bmatrix} 0 \\ y \end{bmatrix} \qquad (3.19)$$
where $y = [y_1; \dots; y_N]$, $1_v = [1; \dots; 1]$, $\alpha = [\alpha_1; \dots; \alpha_N]$ and $\Omega_{kl} = \varphi(x_k)^T \varphi(x_l) = K(x_k, x_l)$, $K$ being the kernel. The estimated function is

$$y(x) = \sum_{k=1}^{N} \alpha_k\, K(x, x_k) + b \qquad (3.20)$$

In LSSVM, there are no QP problems to solve. By using linear solvers (which are faster than convex optimization solvers), the computational speed is increased several times.
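A minimal MATLAB sketch of eqs.(3.19)-(3.20) is given below. It is not the thesis implementation (which relies on CVX); the function names are hypothetical, a Gaussian kernel is assumed, and the parameters (gamma, sigma) are assumed to be tuned already (Section 3.1.5).

% Sketch: LSSVM training (eq.(3.19)) and prediction (eq.(3.20)).
function [alpha, b] = lssvm_train(X, y, gamma, sigma)
    N = size(X, 1);
    Omega = gauss_kernel(X, X, sigma);           % Omega_kl = K(x_k, x_l)
    A = [0, ones(1, N); ones(N, 1), Omega + eye(N)/gamma];
    sol = A \ [0; y];                            % one linear solve instead of a QP
    b = sol(1); alpha = sol(2:end);
end

function yhat = lssvm_predict(Xnew, X, alpha, b, sigma)
    yhat = gauss_kernel(Xnew, X, sigma) * alpha + b;   % y(x) = sum_k alpha_k K(x,x_k) + b
end

function K = gauss_kernel(A, B, sigma)
    D2 = sum(A.^2, 2) + sum(B.^2, 2)' - 2*(A*B');      % pairwise squared distances
    K = exp(-D2 / sigma^2);                            % Gaussian kernel, eq.(3.16)
end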
3.1.5 Tuning parameters

The performance of LSSVM depends on the choice of the regularization parameter $\gamma$ in eq.(3.17) and any other parameters used ($c$ and $d$ for the polynomial kernel (eq.(3.15)) or $\sigma$ for the Gaussian kernel (eq.(3.16))). Since the Gaussian kernel is used in this thesis, tuning of parameters refers to tuning of $(\gamma, \sigma)$ unless otherwise stated. The most popular techniques for tuning the parameters are cross-validation and Bayesian inference. Cross-validation is based on selecting parameters after evaluating the performance of a pre-defined grid of parameters on the training data. In Bayesian inference, the parameters are assumed to have a certain probability density function. For the determination of the tuning parameters in this project, the cross-validation technique is used. The algorithm of m-fold cross-validation is outlined in Algorithm 1 [10]:
input : Training data $T = \{(x_k, y_k)\}_{k=1}^{N}$
output: Tuned parameters $(\gamma, \sigma)$
begin
    Divide $T$ into $m$ parts $T_1, \dots, T_m$ such that $T = \bigcup_{k=1}^{m} T_k$;
    Define an $N_1 \times N_2$ grid of $\gamma$ and $\sigma$;
    for all combinations of $\gamma$ and $\sigma$ do
        for k = 1:m do
            Define a set $S_k = \bigcup_{i=1, i \ne k}^{m} T_i$;
            Train the SVM on $S_k$;
            Calculate the performance of the SVM on the set $T_k$. This can be done by defining a loss function (for example, Mean Square Error);
        end
    end
    Select the $\gamma$ and $\sigma$ with the lowest value of the loss function;
end
Algorithm 1: m-fold cross validation for parameter selection [10]
The most common values of m are 5 and 10. For m = N (the number of training samples), the procedure is called leave-one-out cross validation, which is less biased than 5-fold or 10-fold cross validation. However, the choice of m also depends on the size of the data, and leave-one-out cross validation is computationally more intensive than other m-fold cross validations. For this reason, 10-fold cross validation is used for tuning the parameters, and the loss function used is the Mean Absolute Percentage Error (MAPE).
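The grid search of Algorithm 1 can be sketched in MATLAB as follows. Training data X (inputs) and y (targets) are assumed to be given, the helpers lssvm_train/lssvm_predict are the hypothetical ones sketched in Section 3.1.4, and the (gamma, sigma) grid matches the one described later in Section 6.2.1.

% Sketch: m-fold cross validation (Algorithm 1) with MAPE as the loss function.
m = 10;                                      % 10-fold, as used in this thesis
gammas = 2.^(1:25); sigmas = 2.^(1:20);      % pre-defined parameter grid
N = size(X, 1);
fold = mod(randperm(N), m) + 1;              % assign every sample to one of m folds
best = struct('mape', inf, 'gamma', NaN, 'sigma', NaN);
for g = gammas
    for s = sigmas
        mape = 0;
        for k = 1:m
            tr = (fold ~= k); te = ~tr;      % S_k and T_k of Algorithm 1
            [alpha, b] = lssvm_train(X(tr,:), y(tr), g, s);
            yhat = lssvm_predict(X(te,:), X(tr,:), alpha, b, s);
            mape = mape + mean(abs((y(te) - yhat) ./ y(te))) / m;
        end
        if mape < best.mape                  % keep the grid point with the lowest loss
            best = struct('mape', mape, 'gamma', g, 'sigma', s);
        end
    end
end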
3.2 Extreme Learning Machines

The learning speed of SLFNs is very slow, since all the parameters need to be tuned iteratively. In [3], the authors have proposed a new learning algorithm for single-hidden layer feedforward neural networks (SLFNs) called the Extreme Learning Machine (ELM). ELM is based on random hidden nodes, which means that the activation function parameters are chosen randomly. The output weights of the SLFN are then determined analytically.
3.2.1 ELM theory

Figure 3.2: Single Layer Feedforward Neural Network

Consider an SLFN with $N_h$ hidden nodes. The training data is $\{x_k, y_k\}_{k=1}^{N}$ where $x \in \mathbb{R}^n$ and $y \in \mathbb{R}^m$. The SLFN can be written as

$$\sum_{i=1}^{N_h} \beta_i\, h_i(x_k) = y_k, \qquad k = 1, \dots, N \qquad (3.21)$$

$$h_i(x_k) = h(w_i \cdot x_k + b_i) \qquad (3.22)$$

where $w_i \in \mathbb{R}^n$ is the weight vector between the $n$ input nodes and the $i$th hidden node, $\beta_i \in \mathbb{R}^m$ is the weight vector between the $i$th hidden node and the output nodes, $b_i$ is the threshold of the $i$th hidden node and $h$ is the activation function. Fig. 3.2 shows the architecture of the SLFN. In matrix form, eq.(3.21) is

$$H\beta = Y \qquad (3.23)$$
where

$$H_{N \times N_h} = \begin{bmatrix} h(w_1 \cdot x_1 + b_1) & \cdots & h(w_{N_h} \cdot x_1 + b_{N_h}) \\ \vdots & \ddots & \vdots \\ h(w_1 \cdot x_N + b_1) & \cdots & h(w_{N_h} \cdot x_N + b_{N_h}) \end{bmatrix} \qquad (3.24)$$

$$\beta_{N_h \times m} = \begin{bmatrix} \beta_1^T \\ \vdots \\ \beta_{N_h}^T \end{bmatrix} \quad \text{and} \quad Y_{N \times m} = \begin{bmatrix} y_1^T \\ \vdots \\ y_N^T \end{bmatrix} \qquad (3.25)$$

$H$ is called the hidden layer matrix. It gives the transformation from the input space to the hidden neuron space. The ELM is based on the following two theorems [3]:
Theorem 1. Given a standard SLFN with $N$ hidden nodes and an activation function $h: \mathbb{R} \to \mathbb{R}$ which is infinitely differentiable in any interval, for $N$ arbitrary distinct samples $\{x_k, y_k\}_{k=1}^{N}$ where $x \in \mathbb{R}^n$ and $y \in \mathbb{R}^m$, and for any $w_i$ and $b_i$ randomly chosen from any intervals of $\mathbb{R}^n$ and $\mathbb{R}$, respectively, according to any continuous probability distribution, with probability one the hidden layer output matrix $H$ of the SLFN is invertible and $\lVert H\beta - Y \rVert = 0$.

Theorem 2. Given any small positive value $\epsilon > 0$ and an activation function $h: \mathbb{R} \to \mathbb{R}$ which is infinitely differentiable in any interval, there exists $N_h \le N$ such that for $N$ arbitrary distinct samples $\{x_k, y_k\}_{k=1}^{N}$ where $x \in \mathbb{R}^n$ and $y \in \mathbb{R}^m$, and for any $w_i$ and $b_i$ randomly chosen from any intervals of $\mathbb{R}^n$ and $\mathbb{R}$, respectively, according to any continuous probability distribution, with probability one $\lVert H_{N \times N_h}\beta - Y \rVert < \epsilon$.
For the proof of both theorems, see [3]. For training the SLFN, eq.(3.23) leads to finding $\hat{w}_i$, $\hat{b}_i$ and $\hat{\beta}$ such that

$$\lVert H(\hat{w}, \hat{b})\hat{\beta} - Y \rVert = \min_{w_i, b_i, \beta}\, \lVert H(w, b)\beta - Y \rVert \qquad (3.26)$$

where $w = [w_1, \dots, w_{N_h}]$ and $b = [b_1, \dots, b_{N_h}]$. Since $w$ and $b$ are chosen randomly, eq.(3.26) reduces to

$$\lVert H(w, b)\hat{\beta} - Y \rVert = \min_{\beta}\, \lVert H(w, b)\beta - Y \rVert \qquad (3.27)$$

The solution of eq.(3.27) is given by

$$\hat{\beta} = H^{\dagger} Y \qquad (3.28)$$
where $H^{\dagger}$ is the Moore-Penrose generalized inverse of $H$. The solution $\hat{\beta} = H^{\dagger}Y$ is the minimum norm least squares solution of the linear system $H\beta = Y$:

$$\lVert H\hat{\beta} - Y \rVert = \min_{z}\, \lVert Hz - Y \rVert \qquad (3.29)$$

According to [21], for feedforward neural networks that reach a small training error, the smaller the norm of the weights, the better the generalization performance of the network. Since the Moore-Penrose generalized inverse gives the least norm solution [20], ELM has good generalization performance.
ELM Algorithm

Given the training data $\{x_k, y_k\}_{k=1}^{N}$ where $x \in \mathbb{R}^n$ and $y \in \mathbb{R}^m$, the activation function $h$ and the number of hidden nodes $N_h$, the ELM algorithm can be written as [3]:

1. Assign the weights $w_i$ and thresholds $b_i$ randomly.
2. Calculate $H$.
3. Calculate $\hat{\beta}$ using eq.(3.28).
4. The output for a new input $x$ is given by $f(x) = h(x)\hat{\beta}$, where $h(x) = [h(w_1 \cdot x + b_1) \dots h(w_{N_h} \cdot x + b_{N_h})]$ and $w$ and $b$ are from step 1.
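The four steps translate almost directly into MATLAB. The sketch below is a minimal illustration with hypothetical function names and a sigmoid activation; pinv computes the Moore-Penrose generalized inverse of eq.(3.28).

% Sketch: ELM training and prediction for an SLFN with Nh hidden nodes.
function [W, bias, beta] = elm_train(X, Y, Nh)
    n = size(X, 2);
    W = randn(Nh, n); bias = randn(Nh, 1);       % step 1: random weights and thresholds
    H = 1 ./ (1 + exp(-(X*W' + bias')));         % step 2: N x Nh hidden layer matrix
    beta = pinv(H) * Y;                          % step 3: least norm solution, eq.(3.28)
end

function Yhat = elm_predict(Xnew, W, bias, beta)
    H = 1 ./ (1 + exp(-(Xnew*W' + bias')));      % h(x) for the new inputs
    Yhat = H * beta;                             % step 4: f(x) = h(x) * beta
end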
Chapter 4

Case Study - Spot Price Forecasting

4.1 Characteristics of Electricity Prices

The characteristics of electricity that make it different from other commodities are [16]:
1. A real time commodity: Electricity is a real time commodity, which means that it must be consumed at the same time it is generated. Any imbalance between production and consumption will lead to a deviation in frequency, and this will affect the stability of the grid. Higher production than consumption will increase the frequency, and lower production than consumption will decrease the frequency.

2. Non storable good: It is not possible to store electricity. Options to store electricity exist at a small scale (e.g. batteries), but storage is difficult on a large scale. The pricing of a forward contract depends on the storage costs; hence, the pricing of electricity contracts cannot be done similarly to commodities that can be stored.

3. Characteristics of demand and supply: Electricity is an essential commodity, which makes the demand for electricity inelastic. Supply is decided by the merit order curve of the power plants. If the demand is low, it can be met by base load generators. With the increase in renewable in-feed, the supply curve has changed significantly.
Due to the different nature of electricity as compared to other commodities, the spot price of electricity is difficult to predict. Also, since electricity cannot be stored, the pricing of electricity forward contracts cannot be done on the basis of the classic formula of forward pricing:

$$F_t = (S_0 + U)\, e^{r(T-t)}$$

where $F_t$ is the forward price, $S_0$ is the spot price at $t = 0$ and $U$ is the storage cost, with interest rate $r$ and time to maturity $T - t$. The characteristics of electricity prices are described below [16]:
1. Multi-scale seasonality: Electricity prices show seasonal patterns. They show intra-day, weekly and monthly seasonal cycles (hence the name multi-scale seasonality). During a day, prices are higher at noon and during the evening because of high economic activity. Also, weekdays have higher prices than weekends due to higher demand. The price level also varies with the months: during winter, the high heating requirement drives the electricity prices higher.

2. Dependency on external factors: Electricity prices also depend on external factors like temperature and load. The prices closely follow the trends of the load profile; a high load increases the price.

3. Mean reversion: The mean reversion property means that the spot prices tend to move towards an average value. The variations in the spot prices are assumed to be temporary, and the modeling of prices is done using a stochastic approach.

4. Jumps: Electricity prices can move to high levels in a very short period of time. Since electricity cannot be stored, any extreme event (e.g. a plant outage) can drive the prices higher in a short time interval.
4.2 Methods for Price Forecasting

4.2.1 Importance and need of price forecasting

The introduction of deregulated electricity markets has brought the need to study the electricity industry in more detail. On the generation side, many private players have entered. Power exchanges have been set up to facilitate electricity trading, which has led to the entry of trading companies. The retail and distribution side has also been privatized, and the residential sector is opening slowly as well. The structural change in the electricity industry has given rise to many risks, and the spot price is one of the basic elements for managing these risks. The markets are still not fully liberalized, which further increases the need for price forecasting. Also, the technical issues related to electricity make it very difficult for the markets to set the electricity prices. Fundamentally, the prices are governed by the marginal costs of the generators, congestion in the transmission grid and other grid security issues, fuel prices, the demand for electricity and the policies set up by the regulator. The lack of liquidity in the futures markets for electricity also stresses the need to forecast the prices.
4.2.2 Modeling of Electricity Prices

The choice of a modeling method for prices depends on the application of the model. Long term price forecasting is used for investment decisions and power system planning. Short term price forecasting is mainly used for the day-ahead planning of the generation schedule or by trading companies for bidding purposes. For example, accurate forecasting will help the generators to plan the generation schedule with the aim of profit maximization. For long term price forecasting, the model must be able to capture the fundamental elements that form the prices, like the costs of various generation technologies, emission constraints, demand, congestion in the grid etc. For short term price forecasting, the model should be able to capture the statistical properties of the electricity prices over time. Accordingly, the modeling approaches for price forecasting are divided into two categories [13]:

1. Fundamental models: Fundamental models are optimization problems that aim to determine the marginal cost of every technology/power plant in a specified region. The market price is determined by the merit order curve and the demand for electricity. The most common inputs to a fundamental model are:

- regional demand
- power plant cost curves
- plant data like retirement
- fuel prices
- emissions constraints
- transmission constraints

These inputs are used to form a cost optimization problem considering the energy constraints, reserve capacity requirements, transmission constraints and unit chronological constraints. The resulting optimization problem can be solved using an optimization solver, and the results are the marginal costs of every power plant considered in the optimization equation. These can further be used to find the cash flows, cross-border flows, available transmission capacity and emissions.
2. Quantitative models: Quantitative models are used to find the statistical characteristics of the spot prices, with the main aim of risk management [11]. The ultimate motive is not to give an accurate number for the price value but to find the relative movement of the prices over short term time horizons. The volatility of spot prices is the most important variable for risk managers. Stochastic Differential Equations (SDE) are typically used to characterize the properties of spot prices like random walk, mean reversion and jumps. A few simple models based on SDEs are enumerated below [16]:
- Random walk: In this model, prices are assumed to follow a random walk. The SDE for a random walk model is:

$$dP_t = \mu P_t\, dt + \sigma P_t\, dW_t$$

Here, the prices are modelled as a Geometric Brownian Motion. $P_t$ denotes the spot price, $W_t$ is a Wiener process, $\mu$ is a drift term, $\sigma$ is the volatility and $t$ is time.

- Mean reversion: In addition to the random walk of the prices, this model also captures the mean reversion property of electricity prices. The SDE for a mean reversion model is:

$$dP_t = a(\mu - P_t)\, dt + \sigma P_t\, dW_t$$

Here, $a$ is the speed of mean reversion.

- Mean reversion with jumps: The mean reversion model can be extended to explain the jumps seen in the price time series as follows:

$$dP_t = a(\mu - P_t)\, dt + \sigma P_t\, dW_t + k\, dq_t$$

Here, $k\, dq_t$ represents the jumps, with $k$ representing the severity of the jump and $q_t$ the frequency of the jumps. For a detailed explanation of these models, refer to [16]. A small simulation sketch of the jump model follows this list.
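To make the jump model concrete, the following MATLAB sketch simulates one path of the mean reversion model with jumps using a simple Euler discretization. All parameter values are illustrative assumptions, not estimates from this thesis.

% Sketch: Euler simulation of dP = a*(mu - P)*dt + sigma*P*dW + k*dq.
a = 5; mu = 45; sigma = 0.6; dt = 1/24;      % hourly steps, mean level 45 euro/MWh
lambda = 0.02; k = 60;                       % jump probability per step and severity
T = 24*7; P = zeros(T, 1); P(1) = mu;        % one week of hourly prices
for t = 1:T-1
    dW = sqrt(dt) * randn;                   % Wiener increment
    dq = (rand < lambda);                    % jump indicator for this step
    P(t+1) = P(t) + a*(mu - P(t))*dt + sigma*P(t)*dW + k*dq;
end
plot(P); xlabel('hours'); ylabel('euro/MWh');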
In [1], the authors have proposed combining the two approaches. The argument is that, since the markets are not mature and liquidity is low, it is necessary to look for additional data; hence the modeling can be made more comprehensive if the fundamental approach and the quantitative approach are combined. Fig. 4.1 explains the approach. The bottom-up approach consists of making a model based on considerations of the fundamental structure of the system (here electricity spot prices). The fundamental model requires external data (prices depend on factors like temperature and load). The fundamental model can be combined with the financial approach, which is based on model building using stochastic differential equations. The combination of the fundamental approach and the financial approach is used to build price scenarios.

Figure 4.1: Combining the fundamental and quantitative approaches for price forecasting [1]
Chapter 5

Model Representation and Estimation

5.1 Time Series Analysis

To build a model for time series forecasting, a three step procedure is followed [22]:

1. Identification: In this step, the data is analyzed to find the characteristics of the time series. For example, a time series might have significant autocorrelation; the autocorrelation function can be used to find suitable lag factors to be used in the modeling. Also, the dependence of the time series on external factors should be analyzed.

2. Estimation: Based on the information obtained in the first step, a model is proposed. This model should be able to explain all the underlying characteristics like moving average, autocorrelation, dependency on external factors etc.

3. Diagnostic checking: This step involves applying statistical tests to verify whether the model is able to capture all the dynamics of the time series. One of the ways is to check the autocorrelation of the residuals. There should not be any significant degree of autocorrelation among the residuals. If the residuals show some correlation, it means that the model is not able to capture all the dynamics of the time series.

Fig. 5.1 shows a sample spot price time series and the steps involved in the forecasting.
Figure 5.1: Steps in time series forecasting

5.2 Short Term Spot Price Model

The spot price profile is defined as

$$y_t = \bar{y}_t + \epsilon_t \qquad (5.1)$$

where $y_t$ is the spot price with average $\bar{y}_t$, and $\epsilon_t$ is a white noise process with zero expectation and a finite variance, i.e. $\epsilon_t \sim (0, \sigma^2)$. The model for the hourly spot price forecasting is

$$\hat{y}_t = \hat{\bar{y}}_t + \hat{\epsilon}_t \qquad (5.2)$$

Here, $\hat{y}_t$ is the estimated spot price with estimated average $\hat{\bar{y}}_t$, and $\hat{\epsilon}_t$ is a white noise process with zero expectation and a finite variance, i.e. $\hat{\epsilon}_t \sim (0, \hat{\sigma}^2)$. The estimated average is written as a function of a regression vector $x_t \in \mathbb{R}^n$:

$$\hat{\bar{y}}_t = f(x_t) \qquad (5.3)$$
Characteristics of Spot Prices

Fig. 5.2 shows the autocorrelation of the hourly spot price series for Germany, taken from the EPEX website for the month of March 2012. It shows that the spot prices are significantly correlated with the previous hour values. The blue lines show the 95% confidence bounds; any value of autocorrelation above these bounds is termed significant and must be explained by the proposed model.

Fig. 5.3 shows the relation between electricity price and load. The profiles of load and price follow the same pattern: when the load increases, the price increases, and when it decreases, the price decreases. Fig. 5.4 shows the relation between temperature and price for a summer day. A low temperature implies a low price due to less cooling requirement.
5.2. SHORT TERM SPOT PRICE MODEL 29
0 5 10 15 20
0.2
0
0.2
0.4
0.6
0.8
Lag
S
a
m
p
l
e
A
u
t
o
c
o
r
r
e
l
a
t
i
o
n
Autocorrelation of a sample spot price time series
Figure 5.2: Autocorrelation of the EPEX prices for March 2012
Figure 5.3: Correlation of electricity prices and load
Figure 5.4: Correlation of electricity prices and temperature
Fig. 5.5 shows the multi-scale seasonality of the spot prices. The hour axis shows the 168 hours of a week, starting from Monday. The week axis shows the weeks of 4 months, starting from February. Looking along the hour axis, spot prices are high during weekdays and low during weekends. Also, a day has two peaks. The economic activity is high during the week, and hence the prices are high. The intra-day peaks are explained by the relatively high consumption of electricity at noon and during the evening. Along the week axis, February shows high prices due to the high heating requirement. Prices decrease during the summer as no heating is required.
Model for spot prices

The regression vector should be able to capture the multi-scale seasonality and the dependency of the spot prices on external factors. It should also be able to explain the strong autocorrelation observed in the price series. In the proposed model, the evolution of spot prices is explained by the values of the spot prices of the previous hours, a set of exogenous variables and a set of dummy variables (fig. 5.6). The exogenous variables include previous year load data and weather variables; dummy variables are used for seasonality. All of them are described below:

1. Weather variables include the maximum temperature $T^{max}_t$, minimum temperature $T^{min}_t$, mean temperature $T^{mean}_t$, wind speed $W^s_t$ and precipitation $PP_t$. They can be grouped together as $Wea_t$:

$$Wea_t = \begin{bmatrix} T^{max}_t & T^{min}_t & T^{mean}_t & W^s_t & PP_t \end{bmatrix}$$

1. Mean Absolute Error (MAE): It is defined as follows:

$$\mathrm{MAE} = \frac{1}{N} \sum_{t=1}^{N} \left\lvert y_t - \hat{y}_t \right\rvert$$
Figure 5.6: NARX model for electricity spot prices

2. Mean Absolute Percentage Error (MAPE): The Mean Absolute Percentage Error is another accuracy measure for time series forecasts. The accuracy of the model is determined by the absolute value of the percentage errors. It is defined as follows:

$$\mathrm{MAPE} = \frac{1}{N} \sum_{t=1}^{N} \left\lvert \frac{y_t - \hat{y}_t}{y_t} \right\rvert$$
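For reference, these accuracy measures, together with the standard deviation of the error reported in the tables of Chapter 6, can be computed with a few lines of MATLAB. The helper below is a hypothetical sketch, not the thesis code.

% Sketch: forecast accuracy measures used in Chapter 6.
function [mape, mae, stde] = accuracy(y, yhat)
    e = y - yhat;                 % forecast errors
    mape = mean(abs(e ./ y));     % Mean Absolute Percentage Error
    mae  = mean(abs(e));          % Mean Absolute Error
    stde = std(e);                % standard deviation of the error
end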
Chapter 6

Empirical Analysis - Price Forecasting

In this chapter, the results of the spot price model are presented. The code for the model is written in MATLAB, and CVX [19] is used as the optimization solver.
6.1 Spot Price Model

Figure 6.1: Spot price time series for Germany (Feb. 2012 to May 2012); mean 43.96, std 17.94 euro/MWh
The spot price model presented in the previous chapter is applied to the spot price time series for Germany taken from the EPEX website [23]. Fig. 6.1 shows the spot price time series, and the following points describe the data used:

- The spot prices are from February 2012 to May 2012.
- The resolution of the prices is one hour.
- Unless stated otherwise, the length of the in-sample data is 70% of the training data. The rest of the training data is used for the out-of-sample predictions to determine the performance of the model. The lengths of both the in-sample and out-of-sample data are rounded off to the nearest multiple of 24.
- The vertical load data is taken from the TSO website [26].
- All the data for the weather variables come from Bloomberg.
Figure 6.2: Autocorrelation of the spot price time series
Fig. 6.2 shows the autocorrelation plot for the price time series up to a lag of 200. The two blue lines above and below the x-axis show the 95% confidence bounds. The prices show a high degree of autocorrelation. In the model, the amount of lag is kept as a variable, and the performance indicators are then used to find the best choice of the lag value. The model is tested for different lags (1 day, 2 day and 4 day) and forecasting horizons (1 day, 3 days and 5 days).
6.2 LSSVM and ELM - Simulation Results for Spot Prices

6.2.1 Parameter Selection for LSSVM

The LSSVM method described in eq.(3.19) is applied to the spot price model. The LSSVM model requires tuning of the parameters $\gamma$ (eq.(3.17)) and $\sigma$ (the Gaussian kernel parameter). 10-fold cross-validation is used for tuning $(\gamma, \sigma)$: $\gamma$ controls the error term in the optimization problem of LSSVM, and $\sigma$ controls the shape of the Gaussian kernel. The algorithm for m-fold cross-validation was explained previously. The grid for selecting $(\gamma, \sigma)$ is a $25 \times 20$ grid of the different combinations of the elements of the vectors $(2^1, 2^2, \dots, 2^{25})$ and $(2^1, 2^2, \dots, 2^{20})$. Fig. 6.3 - Fig. 6.5 show the cross-validation scores for 1 day lag, 2 day lag and 4 day lag. The value of the MAPE depends on the choice of $(\gamma, \sigma)$. Also, the amount of lag affects the MAPE results, which indicates that different lags need to be tested for better results.
Figure 6.3: Cross validation scores (1 day lag)
6.2.2 Training Results

Fig. 6.6 shows the results of the LSSVM model on the training data. In hours 0 to 200, the LSSVM method is able to capture the information from the high prices, and hence the actual spot prices (blue points) are covered by the LSSVM model (red line). In other cases, for example from hours 600 to 1200, the LSSVM model does not cover the actual spot price. This is expected, because the prices are not high in this region and hence there is no information to predict the spikes. It also shows that the LSSVM method does not suffer from over-fitting: matching of the training results of LSSVM with spikes in a region that does not contain high prices would indicate over-fitting.
Figure 6.4: Cross validation scores (2 day lag)
Figure 6.5: Cross validation scores (4 day lag)
Figure 6.6: LSSVM training results (RBF kernel, $\gamma = 88.7925$, $\sigma^2 = 53678.8552$); data points in blue, estimation in red
In regions where the prices do not contain spikes, the LSSVM model is fully able to capture the trends in the time series. Fig. 6.7 shows the training results of ELM. Compared to LSSVM, the ELM results show over-fitting. In hours 0 to 200, ELM captures the information similarly to LSSVM, but over hours 0 to 1200, ELM also captures all the peaks even though the prices in the surrounding region are not high. This is because ELM is able to obtain zero error on the in-sample data (Theorem 1) if the number of hidden neurons is greater than the length of the in-sample data. This enables ELM to capture all the peaks in the training data and causes over-fitting.
Figure 6.7: ELM training results (mean 46.43, std 19.47 euro/MWh)
6.2.3 Forecasting Performance

The forecasting performance of LSSVM and ELM is tested on in-sample data and out-of-sample data. The forecasting horizons are 1 day, 3 days and 5 days. Only the forecasting for 5 days is presented here; for the rest of the cases, please see Appendix 1.

In-Sample

Fig. 6.8 shows the forecasting of LSSVM and ELM on in-sample data. The forecasting is done on a one step ahead basis: the forecasted value of the previous hour is used to update the input vector (autoregressive terms), and the updated input vector is then used to forecast the price in the next hour. The in-sample prediction by the LSSVM model is able to capture the spot price characteristics, as illustrated by the low value of the MAPE (for Fig. 6.8, the MAPE is 0.0483). In general, the LSSVM model performs well on the in-sample data. A typical daily profile of spot prices has two peaks: one in the afternoon and the other in the evening. LSSVM is able to capture both peaks, as shown in Fig. 6.8. ELM also gives a low error in in-sample forecasting and is able to capture the intra-day peaks as well. Both LSSVM and ELM are able to capture all the profile variations of the price in the in-sample data (price estimation around hour 80 in fig. 6.8). All the peaks are also forecasted.
Figure 6.8: LSSVM In-Sample fit - 5 Day forecast (legend: ELM, SVM, Actual)
Out of Sample

Fig. 6.9 shows the out-of-sample simulations. The out-of-sample simulation has a higher MAPE than the in-sample fit for both LSSVM and ELM. The error increases as the horizon of the forecasting is increased. However, the intra-day seasonality is successfully captured by the out-of-sample simulations for both
Figure 6.9: LSSVM Out-of-Sample fit - 5 Day forecast (legend: ELM, SVM, Actual)
LSSVM and ELM. The in-sample simulations are better than the out-of-sample ones in capturing the spikes. For the 5 day ahead out-of-sample forecasting, the MAPE for LSSVM is 0.144 and for ELM it is 0.403. This is because ELM shows a better fit on the in-sample data: over-fitting on the in-sample data causes the ELM to give a less accurate forecast than LSSVM on the out-of-sample data. Both LSSVM and ELM are not able to capture the small profile variations of the spot price. This is expected, because the model aims to capture the characteristics of the spot price based on seasonality and external factors; for the small profile variations, more information is required.
6.2.4 Residual Analysis

Analysis of the residuals is the last step of time series modeling. A model can be validated by performing a few statistical tests on the residuals. The residuals should be checked for any autocorrelation, and the assumption of white noise with zero mean and finite variance should also be verified. The residuals are analyzed by the following methods:

Autocorrelation of residuals

Fig. 6.10 and fig. 6.11 show the autocorrelation of the residuals for LSSVM and ELM. The blue lines show the 95% confidence bounds; any value outside these bounds shows a significant autocorrelation. For LSSVM, only lag 2 shows significant values. For ELM, lags 2 and 3 show significant values. All the other values for the different lags are within the bounds represented by the blue lines. This validates the model. If the residuals are found to be autocorrelated, it means that the model is not able to capture all the dynamics of the time series. In such a case, the model must be changed.
Figure 6.10: Autocorrelation of the out-of-sample residuals for LSSVM
One way to change the NARX model if the residuals are found to be autocorrelated is to use an AR-NARX model [2]. The AR-NARX model includes terms for the autocorrelated residuals and hence can correct the missing dynamics of the time series in the results of the NARX model.
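A toolbox-free MATLAB sketch of this residual check might look as follows. It assumes the out-of-sample residuals are available in a vector res and uses the usual +/-1.96/sqrt(N) white noise bounds for the 95% confidence level.

% Sketch: sample autocorrelation of the residuals with 95% confidence bounds.
res = res - mean(res);                       % center the residuals
N = numel(res); maxlag = 20;
acf = zeros(maxlag, 1);
for h = 1:maxlag
    acf(h) = sum(res(1+h:N) .* res(1:N-h)) / sum(res.^2);   % sample ACF at lag h
end
bound = 1.96 / sqrt(N);                      % approximate bounds under white noise
stem(1:maxlag, acf); hold on;
plot([1 maxlag], [bound bound], 'b', [1 maxlag], [-bound -bound], 'b');
% lags outside the bounds indicate dynamics the model has not captured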
Histogram

Eq.(5.1) assumes that the prices can be described by an average value and an error term, and the error term is assumed to be white noise. To validate this, a histogram is used. The results for LSSVM and ELM are shown in fig. 6.12 and fig. 6.13; the normal distribution is shown by the red line. For both LSSVM and ELM, the assumption of white noise does not hold: the distribution of the residuals does not fit the normal distribution. To overcome this, bootstrapping [24] can be used to report the results of the spot prices along with the residuals.
6.3 Forecast Accuracy Analysis

Table 6.1 and Table 6.2 show the complete results for the 1 day ahead, 3 day ahead and 5 day ahead forecasts for both LSSVM and ELM. The results are simulated for 1 day, 2 day and 4 day lags. Every simulation is reported with three figures: MAPE, MAE and the standard deviation of the error. Results are given for in-sample and out-of-sample simulations. LSSVM gives better results for the 4 day lag as compared to the 1 day and 2 day lags. The magnitude of the MAPE increases from 1 day ahead to 5 day ahead forecasting. Also, the MAE of LSSVM is lower for the 4 day lag in comparison to the 1 day and 2 day lags.
Figure 6.11: Autocorrelation of the out-of-sample residuals for ELM
Figure 6.12: Histogram of the out-of-sample residuals for LSSVM (mean 2.94, std 3.82, MAPE 0.0969, MSE 22.6082)
Figure 6.13: Histogram of the out-of-sample residuals for ELM (mean 9.57, std 7.33, MAPE 0.2429, MSE 143.1494)
For the in-sample simulations of ELM, the MAPE for the 1 day lag is comparable to the 4 day lag (lower for 1 day ahead and 3 day ahead, but higher for 5 day ahead). For the out-of-sample simulations of ELM, the MAPE for the 4 day lag is lower than for the 1 day lag. So, the 4 day lag seems to be a better choice than the 1 day and 2 day lags. Also, ELM shows a lower MAPE than LSSVM for the in-sample simulations, but for the out-of-sample simulations, the MAPE of LSSVM is lower than that of ELM.
1 day ahead forecast
             | 1 day lag        | 2 day lag        | 4 day lag
             | LSSVM    ELM     | LSSVM    ELM     | LSSVM    ELM
MAPE         | 0.196    0.039   | 0.167    0.366   | 0.048    0.099
MAE          | 18.2623  3.4595  | 15.2421  34.2326 | 4.8743   9.4666
Std of Error | 21.922   3.223   | 17.779   27.139  | 6.11     8.472

3 day ahead forecast
             | 1 day lag        | 2 day lag        | 4 day lag
             | LSSVM    ELM     | LSSVM    ELM     | LSSVM    ELM
MAPE         | 0.27     0.08    | 0.309    0.378   | 0.061    0.123
MAE          | 21.4112  7.7861  | 22.038   38.6319 | 5.7389   11.4071
Std of Error | 28.408   9.954   | 29.853   31.417  | 7.921    9.408

5 day ahead forecast
             | 1 day lag        | 2 day lag        | 4 day lag
             | LSSVM    ELM     | LSSVM    ELM     | LSSVM    ELM
MAPE         | 0.312    0.096   | 0.389    0.384   | 0.119    0.094
MAE          | 23.4406  9.104   | 26.0821  38.1143 | 9.2411   9.2426
Std of Error | 29.97    13.365  | 30.665   34.086  | 11.165   12.07

Table 6.1: Model Performance - In-sample
1 day ahead forecast
             | 1 day lag         | 2 day lag         | 4 day lag
             | LSSVM    ELM      | LSSVM    ELM      | LSSVM     ELM
MAPE         | 0.11987  0.45476  | 0.13309  0.27384  | 0.096908  0.24286
MAE          | 4.8983   18.8295  | 5.3834   11.2104  | 3.961     10.1423
Std of Error | 4.7967   9.6386   | 6.3221   5.9224   | 3.8184    7.3289

3 day ahead forecast
             | 1 day lag         | 2 day lag         | 4 day lag
             | LSSVM    ELM      | LSSVM    ELM      | LSSVM     ELM
MAPE         | 0.21148  0.70117  | 0.18462  0.31356  | 0.11746   0.2278
MAE          | 10.2485  28.9651  | 8.2508   12.5524  | 5.1676    9.6407
Std of Error | 13.9248  13.2899  | 10.343   8.483    | 6.2144    8.162

5 day ahead forecast
             | 1 day lag         | 2 day lag         | 4 day lag
             | LSSVM    ELM      | LSSVM    ELM      | LSSVM     ELM
MAPE         | 0.23927  0.78456  | 0.19857  0.47402  | 0.14435   0.40396
MAE          | 11.5346  34.4959  | 9.3133   20.6412  | 6.5877    18.0641
Std of Error | 14.3283  17.7412  | 11.186   17.5195  | 7.3444    19.796

Table 6.2: Model Performance - Out of Sample
6.4 Transition Case

Figure 6.14: In-sample spot prices for the transition case
In this section, LSSVM and ELM are simulated on a transition period. Fig. 6.14 shows the data used to train the LSSVM and ELM; it contains a highly volatile price series, with prices varying from 210 euro/MWh down to 20 euro/MWh. The simulation is done for the prices shown in fig. 6.15, which vary from 25 euro/MWh to 85 euro/MWh. This helps in comparing the performance of LSSVM and ELM for a transition period (e.g. after extreme events like a grid failure).

Fig. 6.16 and fig. 6.17 show the in-sample and out-of-sample simulation results. Exact values for the forecast accuracy measures (MAPE, MAE and Std of Error) are given in Table 6.3 and Table 6.4. The LSSVM and ELM results show characteristics similar to the results presented earlier. For the 4 day lag, LSSVM works better than ELM for out-of-sample forecasting. The relative values of the MAPE for both LSSVM and ELM are of the same magnitude as those of the results presented in the non-transition case. This shows that LSSVM and ELM produce good results for transition periods as well.
6.5 LSSVM and ELM - Execution Time

The main bottleneck of the LSSVM algorithm is the tuning of the parameters. LSSVM requires the regularization parameter and any kernel parameters to be tuned before forecasting. The parameters are tuned by cross-validation; as cross-validation is computationally intensive, the time taken for tuning the parameters is large. Fig. 6.18 and fig. 6.19 show the MATLAB profiler results for LSSVM and ELM.
Figure 6.15: Out-of-sample spot prices for the transition case
Figure 6.16: LSSVM and ELM performance for the transition case (in-sample)
Figure 6.17: LSSVM and ELM performance for the transition case (out of sample)
1 day ahead forecast
             | 1 day lag         | 2 day lag         | 4 day lag
             | LSSVM    ELM      | LSSVM    ELM      | LSSVM     ELM
MAPE         | 0.10862  0.010998 | 0.12179  0.20265  | 0.046016  0.067095
MAE          | 10.9525  1.0042   | 11.3353  20.1434  | 4.5813    6.5181
Std of Error | 15.3677  1.1374   | 14.7118  18.7345  | 5.568     6.0637

3 day ahead forecast
             | 1 day lag         | 2 day lag         | 4 day lag
             | LSSVM    ELM      | LSSVM    ELM      | LSSVM     ELM
MAPE         | 0.28503  0.024226 | 0.22258  0.20116  | 0.054821  0.062249
MAE          | 30.5387  2.0749   | 17.7301  18.1292  | 5.5993    5.5309
Std of Error | 34.5162  3.1191   | 24.3197  23.0094  | 7.3096    7.2614

5 day ahead forecast
             | 1 day lag         | 2 day lag         | 4 day lag
             | LSSVM    ELM      | LSSVM    ELM      | LSSVM     ELM
MAPE         | 0.31801  0.069171 | 0.29327  0.21263  | 0.08856   0.062601
MAE          | 32.5884  5.8066   | 21.4408  18.0301  | 7.6937    6.065
Std of Error | 35.2139  9.6145   | 26.2777  24.0909  | 10.3579   10.0281

Table 6.3: Model Performance - In-sample, transition case
1 day ahead forecast
             | 1 day lag         | 2 day lag         | 4 day lag
             | LSSVM    ELM      | LSSVM    ELM      | LSSVM     ELM
MAPE         | 0.19984  0.40075  | 0.22542  0.2096   | 0.20274   0.073761
MAE          | 7.912    17.3135  | 10.597   10.3734  | 8.9115    3.2521
Std of Error | 11.3417  12.6874  | 9.2753   10.5921  | 10.8091   4.2678

3 day ahead forecast
             | 1 day lag         | 2 day lag         | 4 day lag
             | LSSVM    ELM      | LSSVM    ELM      | LSSVM     ELM
MAPE         | 0.36261  0.57271  | 0.26487  0.20324  | 0.20811   0.52137
MAE          | 16.5731  25.8249  | 12.7893  9.4969   | 9.7692    24.3478
Std of Error | 18.6107  16.3675  | 10.5217  10.9103  | 8.3409    24.0487

5 day ahead forecast
             | 1 day lag         | 2 day lag         | 4 day lag
             | LSSVM    ELM      | LSSVM    ELM      | LSSVM     ELM
MAPE         | 0.35352  0.75766  | 0.3071   0.39624  | 0.2076    0.64918
MAE          | 17.0662  34.9022  | 15.1247  18.9643  | 10.1683   30.0395
Std of Error | 18.3342  16.0931  | 13.8459  13.8474  | 8.4558    21.7506

Table 6.4: Model Performance - Out of Sample, transition case
Figure 6.18: MATLAB profiler results for LSSVM
Figure 6.19: MATLAB profiler results for ELM
The profiler gives the running time taken by all the functions used in the code. The function crossvallinsvm implements the cross-validation for LSSVM; it takes 22329 seconds to execute. For ELM, there is no parameter to be tuned. The number of hidden neurons is a parameter for ELM, but if the number of hidden neurons is of the same magnitude as the length of the training data, the results do not depend on the number of hidden neurons [3]. The function trainelmmoore implements the ELM, and it takes just 1.6 seconds. Hence, ELM is much faster than LSSVM.
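For reference, timings like these can be reproduced with MATLAB's built-in profiler or with tic/toc. In the sketch below the calling signatures of crossvallinsvm and trainelmmoore are assumptions, since the thesis code is not reproduced here.

% Sketch: measuring execution time of the tuning and training steps.
profile on;
[gamma, sigma] = crossvallinsvm(X, y);       % dominant cost: cross-validation
profile viewer;                              % per-function running times

t = tic;
beta = trainelmmoore(X, y, Nh);              % ELM training: essentially one pinv solve
fprintf('ELM training took %.2f s\n', toc(t));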
Part II
Constrained Estimation
Chapter 7
Constrained Estimation
Figure 7.1: Constrained Estimation
In forecasting, it is sometimes desirable to impose constraints during the estimation step. Fig. 7.1 shows non-constrained estimation and constrained estimation. In non-constrained estimation, a model is proposed for a given time series, it is trained using training data, and predictions are made. This approach is used in the first part of the thesis. In constrained estimation, constraints are added at the time of training the model, as shown in fig. 7.1. The model is trained such that it fits the training data as well as satisfies the constraints. This helps in making predictions that follow the constraints. For example, consider the forecasting of PV in-feed. Based on an empirical study that relates the feed-in tariff (FIT) and the investment in the PV industry, it is possible to define a set of constraints that relate the possible changes in the FIT to the amount of PV in-feed. These constraints can be incorporated in the model at the time of training, and then predictions can be made. This results in a model that fits the training data and at the same time also satisfies the constraints. This part of the thesis explains constrained estimation problems under the framework of Support Vector Machines and Extreme Learning Machines.
7.1 SVM and ELM for constrained estimation

To apply the non-linear estimation algorithms to constrained estimation, modified Support Vector Machines and Extreme Learning Machines are proposed in this chapter. For using the support vector regression theory, it is proposed to use the SVM with random feature spaces [5], and for using the ELM theory, an optimization based ELM is used [8]. The constrained estimation problem can be written as:

$$f(x) = w^T \varphi(x) + b \qquad (7.1)$$

$$\text{such that} \quad w^T \varphi(x_\tau) \le \Gamma_\tau \qquad (7.2)$$

Here, $f(x)$ is the function to be estimated, $x$ is the input vector, $\varphi$ is a high dimensional mapping similar to the estimation problem discussed previously, and $(w, b)$ are the parameters of the model. The given constraint shows just one example of the possible constraints: it means that during prediction the value of the function should not be more than $\Gamma_\tau$ for the inputs $x_\tau$.
7.2 SVM with external constraints

To solve estimation problems with constraints, it is required to include the external constraints during the formulation of the optimization problem of the SVM. Consider the optimization problem of a standard SVM as given in eq.(3.4). To solve a time series estimation problem with external constraints using SVM theory, the optimization problem is modified along the lines of eq.(7.2) as
$$\begin{aligned} \min_{w,b,\xi,\xi^*}\; J_p(w,\xi,\xi^*) = \tfrac{1}{2} w^T w + c \sum_{k=1}^{N} (\xi_k + \xi_k^*) \quad \text{such that} \quad & y_k - w^T \varphi(x_k) - b \le \varepsilon + \xi_k, && k = 1, \dots, N \\ & w^T \varphi(x_k) + b - y_k \le \varepsilon + \xi_k^*, && k = 1, \dots, N \\ & \xi_k, \xi_k^* \ge 0, && k = 1, \dots, N \\ & w^T \varphi(x_\tau) + b \le \Gamma_\tau, && \tau \in S,\; S \subseteq \{N+1, \dots, M\} \end{aligned} \qquad (7.3)$$

Eq.(7.3) is written for a time series defined by a training set $\{x_k, y_k\}_{k=1}^{N}$; $\tau$ denotes the future times for which we want to add constraints, and $M \ge N + 1$. For the other parameters, please refer to eq.(3.4). One way to solve the primal problem of eq.(7.3) is to convert it into the dual problem, as described next.
7.2.1 Solving the dual problem

The Lagrangian of the primal problem in eq.(7.3) is

$$\begin{aligned} L(w,b,\xi,\xi^*;\alpha,\alpha^*,\eta,\eta^*,\lambda) = {} & \tfrac{1}{2} w^T w + c \sum_{k=1}^{N} (\xi_k + \xi_k^*) - \sum_{k=1}^{N} \alpha_k (\varepsilon + \xi_k - y_k + w^T \varphi(x_k) + b) \\ & - \sum_{k=1}^{N} \alpha_k^* (\varepsilon + \xi_k^* + y_k - w^T \varphi(x_k) - b) - \sum_{k=1}^{N} (\eta_k \xi_k + \eta_k^* \xi_k^*) \\ & + \sum_{\tau=N+1}^{M} \lambda_\tau \left( w^T \varphi(x_\tau) + b - \Gamma_\tau \right) \end{aligned} \qquad (7.4)$$
The conditions of optimality give

$$\begin{aligned} & \frac{\partial L}{\partial w} = 0 \;\Rightarrow\; w = \sum_{k=1}^{N} (\alpha_k - \alpha_k^*)\varphi(x_k) - \sum_{\tau=N+1}^{M} \lambda_\tau \varphi(x_\tau) \\ & \frac{\partial L}{\partial b} = 0 \;\Rightarrow\; \sum_{k=1}^{N} (-\alpha_k + \alpha_k^*) + \sum_{\tau=N+1}^{M} \lambda_\tau = 0 \\ & \frac{\partial L}{\partial \xi_k} = 0 \;\Rightarrow\; c - \alpha_k - \eta_k = 0 \\ & \frac{\partial L}{\partial \xi_k^*} = 0 \;\Rightarrow\; c - \alpha_k^* - \eta_k^* = 0 \end{aligned} \qquad (7.5)$$
The dual problem can be written using eq.(7.5) and eq.(7.4):

$$\begin{aligned} \max_{\alpha,\alpha^*,\lambda}\; J_d = {} & -\tfrac{1}{2} \sum_{k,l=1}^{N} (\alpha_k - \alpha_k^*)(\alpha_l - \alpha_l^*) K(x_k, x_l) + \sum_{k=1}^{N} \sum_{\tau=N+1}^{M} (\alpha_k - \alpha_k^*) \lambda_\tau K(x_k, x_\tau) \\ & - \tfrac{1}{2} \sum_{\tau,\nu=N+1}^{M} \lambda_\tau \lambda_\nu K(x_\tau, x_\nu) - \varepsilon \sum_{k=1}^{N} (\alpha_k + \alpha_k^*) + \sum_{k=1}^{N} y_k (\alpha_k - \alpha_k^*) - \sum_{\tau=N+1}^{M} \lambda_\tau \Gamma_\tau \\ \text{such that} \quad & \sum_{k=1}^{N} (-\alpha_k + \alpha_k^*) + \sum_{\tau=N+1}^{M} \lambda_\tau = 0, \qquad \alpha_k, \alpha_k^* \in [0, c], \qquad \lambda_\tau \ge 0 \end{aligned} \qquad (7.6)$$

Using the value of $w$ from eq.(7.5), the estimated function from eq.(7.1) can be written as follows:

$$f(x) = \sum_{k=1}^{N} (\alpha_k - \alpha_k^*) K(x, x_k) - \sum_{\tau=N+1}^{M} \lambda_\tau K(x, x_\tau) + b$$
To solve the constrained estimation with the dual problem, it is required to explicitly rewrite the dual problem every time more constraints are added or the current constraints are modified. Also, it is not possible to write the dual problem in terms of kernel functions for every type of constraint. Next, the SVM with random feature spaces is described, which makes it possible to solve the optimization problem without explicitly converting it to the dual form, so that a kernel function need not be defined.¹

¹The random feature space can be related to a kernel known as the ELM kernel [5].
7.2.2 Solving with the random feature space

Figure 7.2: Random feature space based SVM for constrained estimation

Frenay et al. [5] have proposed a way to merge the ELM and SVM approaches by defining a new method to explicitly construct the feature space. This feature space is called a random feature space, as the parameters used to define it can be selected randomly. In ELM, the input vectors are mapped to the hidden layer neurons by a randomly generated matrix [3]. This is analogous to defining a new feature space where the hidden layer acts as a
transformation from the input vector space to the hidden neuron space. So, for example, the feature space can be defined as follows for a sigmoidal function:

$$\varphi_i(x_k) = \frac{1}{1 + \exp(-w_i \cdot x_k - b_i)}, \qquad i = 1, \dots, h$$

$$\varphi(x_k) = \begin{bmatrix} \varphi_1(x_k) & \varphi_2(x_k) & \dots & \varphi_h(x_k) \end{bmatrix} \qquad (7.7)$$
The mapping $\varphi(\cdot): \mathbb{R}^n \to \mathbb{R}^h$ takes the input vector $x_k \in \mathbb{R}^n$ to the h-dimensional space $\mathbb{R}^h$, where $h$ is the dimension of the high dimensional feature space. In [25], Liu also proposes to use explicitly defined feature spaces to form an Extreme Support Vector Machine (ESVM). The optimization problem in eq.(7.3) can now be solved without kernels, as the feature mapping is known. Knowing the feature mapping makes it possible to write the external constraints directly in the optimization problem, rather than solving the dual problem, which utilizes the kernels but needs to be reformulated every time the constraints are changed. In [5], an ELM kernel has been introduced based on random feature spaces; it enables the use of the Fixed Size SVM approach in the case of large data sets. The ELM kernel [5] is defined as

$$k(x_k, x_l) = \frac{1}{p}\, \varphi(x_k)^T \varphi(x_l)$$

where $\varphi$ is defined as in equation 7.7. Fig. 7.2 shows the SVM method for constrained estimation.
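With an explicit random feature map, eq.(7.3) becomes an ordinary quadratic program that a generic QP solver can handle directly. The sketch below is one possible formulation using quadprog from MATLAB's Optimization Toolbox (the thesis itself uses CVX). The training data X, y, the constrained inputs Xc and the bounds Gamma are assumed to be given; all other names are illustrative.

% Sketch: constrained SVM estimation, eq.(7.3), with a random feature space.
h = 100; N = size(X, 1); M = size(Xc, 1);
W = randn(h, size(X, 2)); b0 = randn(h, 1);      % random feature space parameters
phi = @(A) 1 ./ (1 + exp(-(A*W' + b0')));        % sigmoid feature map, eq.(7.7)
Phi = phi(X); Phic = phi(Xc);

% decision vector zv = [w; b; xi; xi*]; objective 0.5*w'*w + c*sum(xi + xi*)
c = 10; eps_ = 0.01;
Hq  = blkdiag(eye(h), 0, zeros(2*N));
f   = [zeros(h+1, 1); c*ones(2*N, 1)];
A   = [-Phi, -ones(N,1), -eye(N),  zeros(N);     % y - w'phi(x) - b <= eps + xi
        Phi,  ones(N,1), zeros(N), -eye(N);      % w'phi(x) + b - y <= eps + xi*
        Phic, ones(M,1), zeros(M, 2*N)];         % w'phi(x_tau) + b <= Gamma_tau
rhs = [eps_ - y; eps_ + y; Gamma];
lb  = [-inf(h+1, 1); zeros(2*N, 1)];             % slack variables are non-negative
zv  = quadprog(Hq, f, A, rhs, [], [], lb, []);
w = zv(1:h); b = zv(h+1);
f_hat = @(A) phi(A)*w + b;                       % constrained estimate of eq.(7.1)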
7.3 ELM with external constraints

The SVM output can be written as

$$f(x) = \sum_{s=1}^{N_s} (\alpha_s - \alpha_s^*) K(x, x_s) + b \qquad (7.8)$$

Eq.(7.8) suggests that the SVM can be compared to a generalized single-hidden layer feedforward network [8]. Note that eq.(7.8) is the same as eq.(3.12), where $s$ denotes the support vectors; $\alpha$ is zero for all but the support vectors. A comparison of eq.(7.8) and eq.(3.22) suggests that the kernel $K(x, x_s)$ is comparable to the activation function $h(x)$, and the Lagrangian factors $(\alpha_s - \alpha_s^*)$ are comparable to the output weights $\beta$. So, there is a possibility of combining ELM and SVM [8]. In [8], Huang et al. have suggested two ways to combine SVM and ELM theory:

- using random kernels
- using an optimization based ELM

For this thesis, the optimization based ELM is more suitable, as it helps to include a set of external constraints. For details on the kernel based ELM, please refer to [8] and [9].
7.3.1 Optimization based ELM

Consider the training data $\{x_k, y_k\}_{k=1}^{N}$ where $x \in \mathbb{R}^n$ and $y \in \mathbb{R}^m$. The ELM as presented in eq.(3.28) obtains zero training error, based on Theorem 1. The ELM based on the Moore-Penrose generalized inverse is a solution to the following problem:

$$\text{minimize} \;\; \sum_{i=1}^{N} \lVert h(x_i)\beta - y_i \rVert \qquad \text{and} \qquad \text{minimize} \;\; \lVert \beta \rVert \qquad (7.9)$$

The Moore-Penrose generalized inverse gives a least squares solution to eq.(7.9). It is also possible to formulate the ELM as an optimization problem with an error bound, as presented below [8], [9]:

$$\begin{aligned} \min_{\beta,\xi_1,\xi_2}\; L_p = \tfrac{1}{2} \lVert \beta \rVert^2 + C \sum_{i=1}^{N} (\xi_{1i} + \xi_{2i}) \quad \text{such that} \quad & h(x_i)\beta - y_i \le \varepsilon + \xi_{1i}, && i = 1, \dots, N \\ & y_i - h(x_i)\beta \le \varepsilon + \xi_{2i}, && i = 1, \dots, N \\ & \xi_{1i}, \xi_{2i} \ge 0, && i = 1, \dots, N \end{aligned} \qquad (7.10)$$
The ELM presented in eq.(7.10) prevents possible over-fitting, as it assumes error bounds, and it has the possibility of improving the generalization performance [8]. The activation function is formed using random $w$ and $b$, as described earlier. So, in this case, it is possible to include a set of external constraints, as described in the next section.

Optimization based ELM with external constraints

The optimization based ELM provides a way to include sets of external constraints. As already explained, in time series forecasting it is highly desirable to have the possibility of including external constraints. From here on, the optimization based ELM is called the ELM-variant [8]. The ELM-variant with a set of external constraints is described below.
$$\begin{aligned} \min_{\beta,\xi_1,\xi_2}\; L_p = \tfrac{1}{2} \lVert \beta \rVert^2 + C \sum_{i=1}^{N} (\xi_{1i} + \xi_{2i}) \quad \text{such that} \quad & h(x_i)\beta - y_i \le \varepsilon + \xi_{1i}, && i = 1, \dots, N \\ & y_i - h(x_i)\beta \le \varepsilon + \xi_{2i}, && i = 1, \dots, N \\ & \xi_{1i}, \xi_{2i} \ge 0, && i = 1, \dots, N \end{aligned} \qquad (7.11)$$

$$h(x_\tau)\beta \le \Gamma_\tau, \qquad \tau \in S,\; S \subseteq \{N+1, \dots, M\} \qquad (7.12)$$

Here $\tau$ denotes the future times for which we want to add constraints, and $M \ge N + 1$. Since the activation function can be formed by using random $w$ and $b$, the optimization problem can be extended to include external constraints as given in eq.(7.12). Fig. 7.3 shows the ELM-variant with external constraints.
Figure 7.3: ELM-variant with external constraints
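The ELM-variant of eqs.(7.11)-(7.12) can be written as a quadratic program in the same way. The sketch below assumes the hidden layer matrices H (training inputs) and Hc (constrained inputs), built with random w and b as in Section 3.2, and the bound vector Gamma are given; quadprog again stands in for the CVX formulation used in the thesis.

% Sketch: ELM-variant with external constraints, eqs.(7.11)-(7.12), as a QP.
[N, Nh] = size(H); M = size(Hc, 1);
C = 10; eps_ = 0.01;
Hq  = blkdiag(eye(Nh), zeros(2*N));              % 0.5*||beta||^2 term
f   = [zeros(Nh, 1); C*ones(2*N, 1)];
A   = [ H, -eye(N),  zeros(N);                   % H*beta - y <= eps + xi1
       -H, zeros(N), -eye(N);                    % y - H*beta <= eps + xi2
        Hc, zeros(M, 2*N)];                      % Hc*beta <= Gamma, eq.(7.12)
rhs = [eps_ + y; eps_ - y; Gamma];
lb  = [-inf(Nh, 1); zeros(2*N, 1)];
zv  = quadprog(Hq, f, A, rhs, [], [], lb, []);
beta = zv(1:Nh);                                 % constrained output weights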
7.4 Results for an artificial known process

The random feature space based SVM and the ELM-variant are applied to an artificial process in this section. The function to be estimated is:

$$y = (x^2 - 1)\, x^4 \exp(-x)$$

subject to constraints of the form

$$f(x_{test}) \ge \mathrm{L.B.} \qquad \text{or} \qquad \mathrm{L.B.} \le f(x_{test}) \le \mathrm{U.B.} \qquad (7.13)$$

where L.B. means lower bound and U.B. means upper bound.
7.4.1 Random feature space based SVM

The SVM is applied to the function of eq.(7.13). The feature space is chosen explicitly as a sigmoidal function, and the parameters $(w, b)$ are chosen randomly. The dimension of the feature space is predetermined; it does not affect the results as long as it is of the same magnitude as the size of the input data. Fig. 7.4 shows the result of the SVM without any constraints.
Figure 7.4: SVM results (out of sample)
Constraint: $f(x_{test}) \ge 0$

Figure 7.5 shows the results of the SVM for this constraint. In fig. 7.4 the SVM estimates negative values. This constraint directs the optimization solver to find $(w, b)$ such that for the points in the testing data $x_{test}$, the function is always greater than or equal to zero.
Constraint: $0 \le f(x_{test}) \le 1.4$

Figure 7.6 shows the results of the SVM for this constraint. In fig. 7.5, the maximum estimated value is more than 1.4. This constraint puts an upper bound on the estimated value, such that for the points in the testing data $x_{test}$ the function is always greater than or equal to zero but less than 1.4.
7.4.2 ELM variant

The ELM-variant is applied to the function in eq.(7.13). Fig. 7.7 shows the results of the ELM-variant without any constraints.

Constraint: $f(x_{test}) \ge 0$

Figure 7.8 shows the results of the ELM variant for this constraint. This constraint directs the optimization solver to find $\beta$ such that for the points
Figure 7.5: SVM results for constraint 1 (out of sample)
Figure 7.6: SVM results for constraint 2 (out of sample)
in the testing data $x_{test}$, the function is always greater than or equal to zero.

Constraint: $0 \le f(x_{test}) \le 1.5$

Figure 7.9 shows the results of the ELM variant for this constraint. In fig. 7.8, the maximum estimated value is more than 1.5. This constraint puts an upper bound on the estimated value, such that for the points in the testing data $x_{test}$ the function is always greater than or equal to zero but less than 1.5.
Figure 7.7: ELM results (out of sample)
Figure 7.8: ELM results for constraint 1 (out of sample)
Figure 7.9: ELM results for constraint 2 (out of sample)
Chapter 8

Case Study - PV Infeed Forecasting

8.1 Photovoltaic Infeed forecast model

The underlying equation for the PV infeed forecast is the same as for the spot price forecast. The PV infeed can be written as:

$$y_t = \bar{y}_t + \epsilon_t \qquad (8.1)$$
where $y_t$ is the PV infeed with average $\bar{y}_t$, and $\epsilon_t$ is a white noise process with zero expectation and a finite variance, i.e. $\epsilon_t \sim (0, \sigma^2)$. The model for the hourly PV infeed is

$$\hat{y}_t = \hat{\bar{y}}_t + \hat{\epsilon}_t \qquad (8.2)$$

Here, $\hat{y}_t$ is the estimated PV infeed with estimated average $\hat{\bar{y}}_t$, and $\hat{\epsilon}_t$ is a white noise process with zero expectation and a finite variance, i.e. $\hat{\epsilon}_t \sim (0, \hat{\sigma}^2)$. The estimated average is written as a function of a regression vector $x_t \in \mathbb{R}^n$:

$$\hat{\bar{y}}_t = f(x_t) \qquad (8.3)$$
8.2 Characteristics of PV infeed

The photovoltaic infeed depends on a variety of factors. The amount of diffused radiation received on earth is correlated to the temperature of the region; it can also be correlated to the precipitation. Additionally, the PV time series of two given years show a significant degree of cross correlation. The PV infeed shows multi-scale seasonality as well. All these characteristics are discussed below in detail:
Autocorrelation

Fig. 8.1 shows the cross-correlation between the PV infeed for 2012 (March-June) and 2011 (March-June). The blue lines show the 95% confidence bounds; any value of the cross correlation factor outside the blue lines shows a significant degree of cross-correlation.
Figure 8.1: Cross correlation of the PV infeed time series for 2011 and 2012
Multi-Scale Seasonality

The PV time series shows different types of seasonality. Fig. 8.2 shows the PV infeed for 4 months in a 3-d view. The PV infeed is plotted for every hour and arranged by different weeks. The hour axis shows the 168 hours of the week and reveals the intra-day seasonality: the in-feed starts at zero, increases as the sun increases its elevation angle, and then decreases to zero again at night. The week axis shows the PV infeed over a period of 4 months, from March to June. The amount of in-feed increases as the months change from March to June and summer approaches.
Temperature

The PV in-feed also depends on the temperature. The temperature of a region can give an indication of the amount of diffused radiation received from the Sun during a given day. So, the correlation between PV infeed and temperature can be used to build a model for the PV in-feed. Fig. 8.3 shows the relation between PV in-feed and temperature.
Figure 8.2: Multi-scale seasonality of the PV infeed
Figure 8.3: Correlation of PV infeed and mean temperature
8.3 Model for PV infeed

Based on the analysis of the PV time series, a NARX model is proposed. NARX models are able to explain the autoregressive components and the effect of external factors. The input vector for the NARX model of PV consists of:

1. The autoregressive part: The PV infeed is regressed with a window consisting of previous year PV infeed values. The value of the lag is kept as a variable.

2. The external inputs: The external inputs taken in the input vector are: weather variables, including the maximum temperature $T^{max}_t$, minimum temperature $T^{min}_t$, mean temperature $T^{mean}_t$, heating days $Hd_t$, cooling days $Cd_t$ and precipitation $PP_t$. They can be grouped together as $Wea_t$:

$$Wea_t = \begin{bmatrix} T^{max}_t & T^{min}_t & T^{mean}_t & Hd_t & Cd_t & PP_t \end{bmatrix}$$