Abstract: In this paper we focus on the long-term load forecasting problem, that is, the prediction of energy consumption for several months ahead (up to one or more years), which is useful for the proper scheduling of operative conditions (such as the planning of fuel supply). While several effective techniques are available in the short-term framework, no reliable methods have been proposed for long-term predictions. For this purpose, we describe in this work a new procedure, which exploits the Empirical Mode Decomposition method to disaggregate a time series into two sets of components, respectively describing the trend and the local oscillations of the energy consumption values. These sets are then used for training Support Vector Regression models. The experimental results, obtained on both a public-domain and an office building dataset, validate the effectiveness of the proposed method.

Index Terms: Empirical mode decomposition, load forecasting, support vector regression.

Manuscript received August 26, 2011; revised January 11, 2012, July 27, 2012, October 13, 2012; accepted December 09, 2012. Date of publication February 06, 2013; date of current version February 27, 2013. Paper no. TSG-00407-2011.
The authors are with the DITEN, University of Genoa, Via Opera Pia 11A, I-16145 Genoa, Italy (e-mail: Luca.Ghelardoni@unige.it; Alessandro.Ghio@unige.it; Davide.Anguita@unige.it).
Digital Object Identifier 10.1109/TSG.2012.2235089

I. INTRODUCTION

The problem of load forecasting represents a crucial issue for operational planners and researchers, and it has therefore captured the attention of both academia and the power industry in recent years. As a matter of fact, energy suppliers are facing ever greater competition, so that factors like quality and continuity of the offered service, long-term maintenance scheduling, and optimization of the dispatched power flow must be properly taken into account. These claims are validated by the growing interest in smart grids [1], [2], which exploit a large number of new technologies in power generation, transmission, distribution, and utilization [1], [3], [4] as well as advanced computational methods [5] in order to optimize power management and energy saving.

It is worth noting that the quality of the load estimation has a remarkable impact on the economic operation of the power distribution system, as many decisions can be based on the predicted values: for example, the energy demand can be evaluated in advance, which helps to relieve possible conflicts with the energy supply (e.g., due to an insufficient fuel provision). The importance of such accurate forecasts will further increase in the near future, because of the remarkable changes occurring in the structure of the power industry (e.g., due to deregulation) [6].

The energy load is characterized by complex and highly non-linear implicit relationships among several factors, such as weather, seasonality, climatic conditions, and past energy utilization trends. Due to the intrinsic difficulties of the load estimation, many papers address the problem of short-term forecasting, that is, the (e.g., hourly) prediction of electricity demand for successive time-steps ranging from one hour to a few days ahead [7]–[11]. In this framework, the techniques based on Auto-Regressive (AR) models [12], Neural Networks (NN) [13]–[16], and Support Vector Regression (SVR) [17], [18] often guarantee a satisfying degree of accuracy. In particular, the SVR-based approach [19], [20] is attractive due to its adaptability and robustness in dealing with the non-linear phenomena generating the data. Moreover, while NNs often get trapped in local minima during the training phase, the SVR learning problem corresponds to the minimization of a convex cost function, which guarantees that the global minimum is found.

However, even the SVR partially neglects the underlying phenomena originating the data (e.g., local characteristics of the time series, such as peaks at particular hours of the day), especially when highly non-linear and non-stationary series are considered. In this framework, preprocessing decomposition methods can help the SVR, as they determine characteristic time/frequency scales for the series and emphasize the local trends characterizing the data. In particular, if the exploited decomposition technique satisfies the conditions of adaptiveness (i.e., the method is data-driven), locality (i.e., local properties of the oscillatory modes are emphasized) and completeness (i.e., the original signal can be exactly reconstructed by re-assembling the components), it is possible to show that SVR models trained using the decomposed signals are more effective than regressors trained using the raw time series [21].

Several decomposition methods have been proposed in the literature [22]:
• Fourier Series [23] are widely used in signal processing, but are most useful when the underlying process is linear and the time series is stationary;
• Wavelet analysis [24], despite being attractive because of the locality of the approach and the uniform temporal resolution used for all the frequency scales, is non-adaptive and is therefore most useful when characterizing gradual frequency changes in the time series;
• Principal Component Analysis (PCA) [25] is an adaptive approach, but unfortunately the distribution of the eigenvalues does not yield characteristic time/frequency scales.
Empirical Mode Decomposition (EMD) [26], [27], instead, is particularly suitable as a preprocessing method in non-linear and non-stationary series analysis: a signal is detrended and then decomposed into a small number of Intrinsic Mode Functions (IMFs). While the extracted trend describes the general behavior of the time series, the IMFs obey the requirements of completeness, locality, and adaptiveness, which make EMD an appealing decomposition method. In fact, the IMFs are modulated both in amplitude and frequency, in order to capture the local characteristics of the time-varying signal (e.g., the information concerning higher order statistical moments is kept in the set of components). Due to its properties, EMD has been effectively used as a preprocessing step for SVR in the short-term framework [21]: once the trend and the IMFs are extracted, SVR models are trained on each of these components and used for predicting values in the near future.

However, the scheduling of many operative conditions of the energy distribution network requires that a wider time-horizon is taken into account [28]: for example, load forecasting is important for contract evaluations and the preparation of various sophisticated financial products on the liberalized energy market [29]. Furthermore, the ability to perform accurate medium-term (i.e., the hourly forecasting of electricity demand for some days up to a few months ahead) and long-term (i.e., the hourly load forecasting for several months up to one or more years ahead) predictions is a prerequisite for the success of a utility company [28], as it enables the timely implementation of crucial operational decisions leading to improved network reliability and to fewer equipment failures and blackouts [30]. For such purposes, the mere estimation of the daily peak load [31] gives insufficient information to the electric network manager; at the same time, the long-term forecasting (see footnote 1), obtained using the conventional short-term techniques [32], is often unreliable because of the error propagation in the estimation of the load values [33]. Unfortunately, this issue afflicts the conventional SVR-based and the combined EMD-SVR approaches too: in fact, high frequency components, which are not properly captured by SVR [33], are typical of energy load time series and are preserved in the decomposed IMFs, due to the properties of locality and adaptiveness of EMD. Moreover, the exploitation of exogenous inputs to improve the forecasting, such as economic growth and heterogeneous weather/seasonal parameters (e.g., temperature, humidity index, or wind-chill index [17]), is not an option either [33], as predicting these factors in the long-term framework is at least as complex as the energy load forecasting problem itself. In fact, to the best of the authors' knowledge, no effective approaches have been proposed so far to target long-term forecasting problems.

Footnote 1: Since, for the scopes of our work, medium- and long-term predictions are characterized by similar difficulties (though the latter is even more challenging), they are typically likened: thus, for the sake of simplicity, in this paper we will only distinguish between short- and long-term load forecasting methods.

In this paper, we present a new technique, which addresses the long-term load forecasting problem by splitting it into two sub-problems: on one hand, the IMFs which better describe the long-term trend of the time series are selected and predicted by using a generalization of the EMD-SVR approach of [21]; on the other hand, the components that describe the local properties, intrinsic to electric load patterns, are forecasted by exploiting the cyclicities and the capacity of EMD to accurately capture the underlying characteristics of the observed time series for training the SVR models. The outputs of the two models are then re-assembled to obtain the load prediction, thanks to the appealing completeness property of EMD.

The paper is organized as follows: in Section II we briefly review the Support Vector Machine for Regression algorithm, while in Section III we present the Empirical Mode Decomposition; then, in Section IV, we present the combined EMD-SVR approach of [21], which is particularly suitable for short-term forecasting applications. Section V presents the proposed method, targeted towards the long-term prediction framework. Finally, in Section VI, we present some experimental results on real-world energy consumption series, which show the effectiveness of our method.

II. SUPPORT VECTOR MACHINE FOR REGRESSION AND TIME SERIES PREDICTION

A. Support Vector Regression

Let us consider a set of training data $\{(x_1, y_1), \ldots, (x_l, y_l)\}$, where, in general, $x_i \in \mathbb{R}^d$ and $y_i \in \mathbb{R}$. When targeting a regression problem, our objective is to find the best approximating function $f(x)$ which, in principle, should not deviate from the target function by more than a user-defined $\varepsilon$, that is $|f(x_i) - y_i| \le \varepsilon$. By applying a map $\phi: \mathbb{R}^d \to \mathbb{R}^D$, with $D > d$, to the patterns so that, in the new space, the problem can be more efficiently solved by linear regression, we get the following Support Vector Regression (SVR) formulation [20] for training a model:

$$\min_{\alpha, \alpha^*} \; \frac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l} (\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*) K(x_i, x_j) + \varepsilon \sum_{i=1}^{l} (\alpha_i + \alpha_i^*) - \sum_{i=1}^{l} y_i (\alpha_i - \alpha_i^*) \quad \text{s.t.} \quad 0 \le \alpha_i, \alpha_i^* \le C, \quad \sum_{i=1}^{l} (\alpha_i - \alpha_i^*) = 0 \tag{1}$$

where $C$ is a hyperparameter, which must be properly tuned, and $K(x_i, x_j) = \phi(x_i) \cdot \phi(x_j)$ is the kernel function. In regression problems, two of the most common kernel functions are the linear and the Gaussian ones of (2) and (3) [34]:

$$K(x_i, x_j) = x_i \cdot x_j \tag{2}$$

$$K(x_i, x_j) = \exp\left(-\gamma \|x_i - x_j\|^2\right), \quad \gamma > 0 \tag{3}$$

Once the training phase is concluded and a model is identified, the SVR feed-forward function can be evaluated as follows:

$$f(x) = \sum_{i=1}^{l} (\alpha_i - \alpha_i^*) K(x_i, x) + b \tag{4}$$

where the bias $b$ can be derived when Problem (1) is solved [19] from the Karush-Kuhn-Tucker (KKT) conditions.
GHELARDONI et al.: ENERGY LOAD FORECASTING USING EMPIRICAL MODE DECOMPOSITION AND SUPPORT VECTOR REGRESSION 551
B. Support Vector Regression for Time Series Analysis

Let $X = \{x(t_1), \ldots, x(t_l)\}$ be a time series, where the continuous signal $x(t)$, observed for a time-interval $T$, is sampled at the associated time-steps $t_1, \ldots, t_l$ (see footnote 2). Two possibilities exist for using SVR in the time series prediction problem.

Footnote 2: Note that the time-steps can also contain information concerning special days and holiday events: as these exogenous inputs are deterministic, they need not be predicted and thus can be safely exploited also in the long-term framework.

The first alternative consists in creating a set $D_R = \{(z_i, x(t_i))\}$, based on $X$, where $z_i = [x(t_{i-m}), \ldots, x(t_{i-1})]$ and $m$ is a user-defined parameter, which corresponds to the number of previous time-steps considered for the regression problem [35]. $D_R$ can thus be exploited for training an SVR model. During the feed-forward phase, when future values of the time series must be estimated, the outputs of the model for the previous steps are recursively used as inputs for the successive forecasts. Thus $z_i$, during the forecasting phase, will also contain the predicted values $\hat{x}(t)$: for this reason, we define this procedure as Recursive-SVR (R-SVR).

A second alternative for using SVR models in the time series prediction problem consists in exploiting the time-steps $t_i$ as inputs. In this case, an SVR regressor is learnt on a training set $D_A = \{(t_i, x(t_i))\}$, obtained by juxtaposing the two sets $\{t_i\}$ and $X$: in other words, an approximation of the function $x(t)$ is obtained in the time-interval $T$. During the feed-forward phase, the outputs of the model are simply obtained using time-steps as inputs: thus, we define this procedure as Approximating-SVR (A-SVR).

The R-SVR procedure is most useful when values must be predicted outside the time-interval $T$ of observation and is remarkably effective when low frequency components characterize the samples [21]. On the contrary, due to its scarce extrapolation capabilities, A-SVR is seldom used in time series problems, though it represents a valuable alternative to R-SVR when the time-interval is fixed and high frequency oscillations afflict the data.

III. EMPIRICAL MODE DECOMPOSITION

The Empirical Mode Decomposition (EMD) technique [26], [27] consists in decomposing a signal $x(t)$ into a sum of functions, such that each of them satisfies two conditions:
1) The number of zero crossings must be equal to or differ from the number of extrema by no more than one;
2) The mean value of the two envelopes, defined by interpolating, respectively, the local maxima and minima of the function, must be equal to zero.

If both these properties hold, the decomposition is complete, local, and adaptive [26]. Waveforms satisfying the two previous conditions give information concerning the intrinsic oscillation modes of the data and, thus, are called Intrinsic Mode Functions (IMFs). The IMFs represent oscillatory modes, but are much more general than harmonic functions. In fact, it is worth noting that the IMFs are modulated both in amplitude and in frequency and, consequently, are not restricted to be stationary. A pure amplitude or frequency modulated signal represents a particular case of IMF.

In the following, one state-of-the-art algorithm to implement the EMD, namely the technique based on the sifting process, is presented, which disassembles the original signal into the IMF components [36].

A. The Sifting Process

The sifting process consists of four main steps:
1) Identify all the extrema in the original series $x(t)$;
2) Interpolate the local maxima to obtain the upper envelope of the data: for this purpose, cubic splines are used as they guarantee a trade-off between the interpolating capabilities of the curve and the computational time needed to identify the coefficients of the function;
3) Repeat the previous step for interpolating the local minima to obtain the lower envelope;
4) Compute the mean $m_1(t)$ of the two envelopes and the function $h_1(t) = x(t) - m_1(t)$.

Ideally, $h_1(t)$ satisfies the properties of an IMF and the process is over; in many practical cases, $h_1(t)$ is not an IMF, thus the previous procedure must be reiterated by using the function $h_1(t)$ in place of the original one, and the following result is obtained:

$$h_{11}(t) = h_1(t) - m_{11}(t) \tag{5}$$

In practice, this procedure is usually reiterated up to $k$ times, even though at some iterations the IMF properties are satisfied:

$$h_{1k}(t) = h_{1(k-1)}(t) - m_{1k}(t) \tag{6}$$

In fact, the main objectives of the sifting process are to eliminate riding waves and to improve symmetry with respect to zero: ideally, the best possible solution is thus obtained for large values of $k$. Unfortunately, the iteration of the procedure has a side-effect: the amplitude of the IMF-candidate function $h_{1k}(t)$ is smoothed at every iteration. In other words, a proper stopping criterion must be defined. For this purpose, an index has been proposed in [37]: a preselected number $S$ is defined a priori and the iterative procedure is stopped once the resulting function has respected the IMF properties $S$ times. No rigorous methods have been proposed for properly selecting the value of $S$; however, from a practical point of view, the value suggested in [37] is usually considered an effective choice.

After the stopping criterion is satisfied, $h_{1k}(t)$ is taken as the first component of the EMD:

$$c_1(t) = h_{1k}(t) \tag{7}$$

Further decomposed signals can be obtained by re-sifting the residue, computed as

$$r_1(t) = x(t) - c_1(t) \tag{8}$$

This procedure is repeated to find all the IMFs, i.e., until the residue $r_n(t)$ is characterized by fewer than two extrema and then cannot be further decomposed. It is worth noting that, as candidate functions which include high frequency components often require a larger number of steps to converge to IMFs, their amplitude is usually more smoothed.
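The Recursive-SVR idea described in Section II-B (a window of previous values as input, with predictions fed back as inputs for later steps) can be sketched as follows. This is only an illustrative sketch: `predict_one` stands for any trained one-step regressor (an SVR in the paper), and the helper names and the toy regressor are hypothetical.

```python
def make_lagged_dataset(series, m):
    # Build (window of m previous values, next value) pairs for training.
    inputs, targets = [], []
    for i in range(m, len(series)):
        inputs.append(list(series[i - m:i]))
        targets.append(series[i])
    return inputs, targets

def recursive_forecast(predict_one, history, m, horizon):
    # predict_one: callable mapping a window of m values to the next value.
    # Each prediction is appended to the window and reused as an input,
    # which is also how errors can propagate over long horizons.
    window = list(history[-m:])
    forecasts = []
    for _ in range(horizon):
        y_hat = predict_one(window)
        forecasts.append(y_hat)
        window = window[1:] + [y_hat]
    return forecasts

# Toy usage: a stand-in "model" that continues a linear ramp.
X, y = make_lagged_dataset([1, 2, 3, 4, 5], m=2)
print(X, y)
print(recursive_forecast(lambda w: w[-1] + 1, [1, 2, 3], m=2, horizon=3))
```

An Approximating-SVR would instead be trained on pairs of time-step and observed value, so that forecasting reduces to evaluating the model at the desired time-steps.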
552 IEEE TRANSACTIONS ON SMART GRID, VOL. 4, NO. 1, MARCH 2013
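A single sifting iteration (extrema detection, upper and lower envelopes, subtraction of their mean) can be sketched in a few lines of Python. Note the deliberate simplification: the paper uses cubic splines for the envelopes, whereas this sketch swaps in piecewise-linear interpolation, held flat beyond the outermost extrema, to stay dependency-free. All function names are hypothetical.

```python
def local_extrema(x):
    # Indices of strict interior local maxima and minima.
    maxima, minima = [], []
    for i in range(1, len(x) - 1):
        if x[i - 1] < x[i] > x[i + 1]:
            maxima.append(i)
        elif x[i - 1] > x[i] < x[i + 1]:
            minima.append(i)
    return maxima, minima

def envelope(x, idx):
    # Piecewise-linear envelope through the points (i, x[i]) for i in idx,
    # held constant before the first and after the last extremum.
    env = []
    for t in range(len(x)):
        if t <= idx[0]:
            env.append(float(x[idx[0]]))
        elif t >= idx[-1]:
            env.append(float(x[idx[-1]]))
        else:
            for a, b in zip(idx, idx[1:]):
                if a <= t <= b:
                    w = (t - a) / (b - a)
                    env.append((1 - w) * x[a] + w * x[b])
                    break
    return env

def sift_once(x):
    # One sifting step: subtract the mean of the two envelopes.
    # Returns None when the signal has no maxima or no minima left.
    maxima, minima = local_extrema(x)
    if not maxima or not minima:
        return None
    upper = envelope(x, maxima)
    lower = envelope(x, minima)
    return [xi - (u + l) / 2 for xi, u, l in zip(x, upper, lower)]

# Toy usage: an oscillation riding on a constant offset of 2.
print(sift_once([2, 3, 2, 1, 2, 3, 2, 1, 2]))
```

A full decomposition would repeat this step until a stopping criterion is met, store the result as an IMF, subtract it from the signal, and re-sift the residue.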
From the definition of $r_n(t)$, the obtained IMFs can be re-assembled in order to obtain the original data:

$$x(t) = \sum_{i=1}^{n} c_i(t) + r_n(t) \tag{9}$$

Note that the IMFs have local mean equal to zero, so the last residue $r_n(t)$ represents the trend (i.e., the longest period component) of the original signal.

IV. THE COMBINED EMD-SVR APPROACH

In this section, we briefly review the combined EMD-SVR approach, presented in [21] for the short-term load forecasting problem. Given a time series of $l$ elements, the training phase of the method consists of a few steps:
1) EMD is applied to identify the $n$ IMFs, defined as $c_i(t)$, for $i = 1, \ldots, n$. For the sake of simplicity, let $c_{n+1}(t)$ be the residue (i.e., the trend);
2) The number $m$ of previous values to be considered for R-SVR (see Section II-B) is defined by the user (or chosen via a conventional model selection procedure [38], such as K-fold Cross Validation [39]);
3) The $n+1$ datasets for the R-SVR training are created: $D_i = \{(z_{i,j}, c_i(t_j))\}$, where $z_{i,j} = [c_i(t_{j-m}), \ldots, c_i(t_{j-1})]$;
4) $n+1$ R-SVR models are trained on the $n$ IMFs and the trend.

During the feed-forward phase, the prediction is obtained by re-assembling the $n+1$ values, predicted by the R-SVR models, analogously to (9).

Due to the appealing properties of EMD, R-SVR models trained on the IMFs are usually effective and obtain good results in the short-term framework (i.e., some hours up to a few days ahead) [21]. Unfortunately, the generalization of this approach to the long-term analysis is not effective, mainly due to the limited capabilities of R-SVR in dealing with the higher frequency components included in the original series and preserved in some of the IMFs: as a matter of fact, error propagation in the prediction can cause the forecasted load to rapidly diverge from the real value when the time-horizon is extended to some weeks or months [33]. In order to avoid these large errors, we should be able to capture possible "basic" cyclicities (e.g., due to seasonality), which are maintained in the identified IMFs by construction. In the next section, we present a generalization of the EMD-SVR approach.

V. THE EMD-SVR APPROACH FOR LONG-TERM LOAD FORECASTING

Broadly speaking, we can divide the IMFs into two categories:
1) some IMFs, which are identified in few sifting steps, are characterized by larger values of amplitude and, thus, can be considered the most informative components;
2) other IMFs contain high frequency components and need more sifting iterations, which cause a smoothing in the amplitude.

From a qualitative point of view, the first set of IMFs, which we define as Principal-IMFs (P-IMFs), describes the long-term trend of the time series, while the second set of IMFs, namely the Behavioral-IMFs (B-IMFs), captures local features of the signal (e.g., peak values or oscillations due to environmental factors). In order to obtain a reliable prediction, the long-term load forecasting method should simultaneously:
• Approximate the P-IMFs, where high frequency oscillations are usually absent;
• Approximate the B-IMFs, which are characterized by high frequency components, through the exploitation of the information regarding the cyclicities embedded in them.

A. Identification of the Most Informative IMFs

In order to rank the most informative IMFs, once the EMD procedure ends, we can make use of conventional techniques, exploited in other decomposition methods as well (such as the PCA [25]), which compute the energy associated to each component. In the case of the IMFs and the residue, we define the overall energy of the whole time series as

$$E = \sum_{i=1}^{n+1} \sum_{j=1}^{l} c_i(t_j)^2 \tag{10}$$

The percentage of energy associated to the $i$-th IMF is:

$$E_i\% = \frac{\sum_{j=1}^{l} c_i(t_j)^2}{E} \cdot 100\% \tag{11}$$

where $i = 1, \ldots, n+1$ and the $(n+1)$-th component represents the residue (which, for the sake of simplicity, is likened to an IMF). Once the values of energy are computed for the IMFs and the residue, we can simply rank the components in descending order and select the first $p$ IMFs which guarantee a cumulative energy level greater than a user-defined and application-dependent value $E_{th}\%$ (with the components indexed in ranked order):

$$\sum_{i=1}^{p} E_i\% > E_{th}\% \tag{12}$$

These first $p$ components constitute the set of P-IMFs, while the remaining ones are classified as B-IMFs.

B. P-IMFs Prediction

Let us consider the P-IMFs. Our aim is to predict the behavior of each series by using R-SVR models. It is worth noting that, as high frequency oscillations are usually absent in the P-IMFs, we can safely use the technique presented in [21] and summarized in Section IV for approximating them: this method allows us to achieve a small forecasting error even in a long-term prediction. In the following, we will refer to this procedure as recursive Support Vector regression for P-IMFs prediction (SVP).

During the training phase, $p$ R-SVR models are created for the P-IMFs: let $\hat{c}_i(t)$ be the prediction for the $i$-th component, where $i = 1, \ldots, p$. We can re-assemble the predicted values in the SVP feed-forward phase, obtaining:

$$\hat{x}_P(t) = \sum_{i=1}^{p} \hat{c}_i(t) \tag{13}$$
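The energy-based ranking of the components into P-IMFs and B-IMFs can be sketched as below. The sketch assumes that the energy of a component is the sum of its squared samples and that the total energy is the sum of the component energies; these definitions, the function name, and the toy data are assumptions made for illustration.

```python
def rank_by_energy(components, threshold_pct):
    # components: list of equally long sequences (IMFs plus the residue).
    # Energy of a component is assumed to be the sum of squared samples.
    energies = [sum(v * v for v in c) for c in components]
    total = sum(energies)
    pct = [100.0 * e / total for e in energies]
    # Rank component indices by decreasing energy percentage.
    order = sorted(range(len(components)), key=lambda i: pct[i], reverse=True)
    principal, cumulative = [], 0.0
    for i in order:
        principal.append(i)
        cumulative += pct[i]
        if cumulative > threshold_pct:
            break
    behavioral = [i for i in order if i not in principal]
    return principal, behavioral, pct

# Toy usage with three made-up components of very different amplitude.
comps = [[10.0, 10.0], [1.0, 1.0], [0.1, 0.1]]
print(rank_by_energy(comps, 95.0))
```

With a threshold such as the 99% mentioned in the experiments, only the few dominant components end up in the principal set, while all remaining ones are treated as behavioral.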
(15)

The final training set for the A-SVR learning is generated by juxtaposing the two time series. In other words, the inputs are the periodic time-steps, while the targets consist in the output of the B-IMFs re-assembling. From the A-SVR learning phase, we obtain a weighted-average over the user-defined periods of the behavioral time series (see also (14)). We define this procedure as approximating Support Vector regression for B-IMFs prediction (SVB). The feed-forward phase consists in simply mapping the corresponding time-step, according to the chosen periods, and exploiting the SVB model to obtain the predicted output.

The main drawback of the SVB approach consists in the partial smoothing of the amplitude values of the output with

(16)

From the point of view of the computational effort needed to forecast the load values in the long-term framework, it is worth underlining that:
• the training phase of the SVP+SVB procedure is more computationally intensive than other state-of-the-art techniques, proposed in the short-term framework. This is mainly due to both the time required by the sifting process of EMD to identify the IMFs, and the resources needed by the R-SVR and the A-SVR training procedures. However, the models are usually built offline, while the online predictions are obtained by exploiting the feed-forward functions of the R-SVR and A-SVR regressors;
• the online prediction phase of the SVP+SVB algorithm, once the models are created, only requires that SVR feed-forward functions [see (4)] are computed. In addition to being computationally non-intensive, the SVP+SVB
TABLE I
EUNITE DATASET: ERROR AND CORRELATION ON THE TEST SET

cover more than 99% of the total energy and, thus, are classified as P-IMFs.

Table I shows the results, where the proposed method (SVP+SVB approach) is validated against the ARMA, the conventional (linear and Gaussian) R-SVR, the conventional EMD-SVR technique of [21] and the predictions obtained by simply applying the SVP procedure. It is worth noting that:
• due to the error propagation effect, the non-linear Gaussian R-SVR performs much worse than the linear R-SVR. On the other hand, the linear R-SVR and the ARMA model are not able to properly capture the characteristics of the series, so the predictions rapidly diverge from the real values in the long-term framework;
• due to its appealing properties, the EMD technique enhances the extrapolation capabilities of the R-SVR. Even the conventional EMD-SVR approach of [21], which is targeted toward short-term forecasting problems, is characterized by an acceptable long-term performance;
• despite neglecting local oscillations, the performance indexes of the SVP method underline the validity of this approach, especially when one is interested in studying only the long-term trend of a series;
• the best performance is obtained by the proposed SVP+SVB technique, which improves by more than 20% the MAPE of the EMD-SVR approach [21].

C. Office Building Dataset

Further experimental results have been obtained on a dataset where electricity load consumption values have been collected for two years (2010–2011) in an office building in Italy. Twelve measurements per day have been recorded, thus the whole dataset, analogously to the previous one, consists of 8760 samples: one year (2010) was used as a training set, while the second year (2011) was used as a test set. The whole series is shown in Fig. 2.

Fig. 2. The load consumption series, collected in an office building in Italy.

By applying the EMD procedure to the series used as training set, we obtain 11 IMFs and the residue, which will be considered as an IMF for the sake of simplicity. Once ranked, it is possible to show that the residue and one of the IMFs cover more than 99% of the total energy and, thus, are classified as P-IMFs.

Then, we apply the SVP and the SVB procedures and we re-assemble the predictions. We compare the performance of the proposed SVP+SVB approach (and of the forecasting obtained by only exploiting the SVP procedure) against four state-of-the-art techniques for load forecasting, as in the previous experiments.

TABLE II
OFFICE BUILDING DATASET: ERROR AND CORRELATION ON THE TEST SET

Table II presents the results obtained on the test set: analogous conclusions can be drawn. In fact, though Gaussian R-SVR is less affected by error propagation effects than in the previous experiments, the predictions obtained with ARMA and conventional (both linear and non-linear) R-SVR models tend to diverge from the true values. On the contrary, the EMD procedure remarkably improves the performance of the Recursive Support Vector Regression, even when the short-term EMD-SVR technique of [21] is exploited. Though the SVP represents an appealing alternative when one is not interested in capturing the local oscillations of a series, the best performance is obtained by the proposed SVP+SVB method, which remarkably improves the results of the previously cited EMD-SVR short-term approach.

VII. CONCLUSION

We presented in this paper a new approach to the long-term load forecasting problem. In particular, the proposed method performs a long-term prediction of energy consumption time series up to one year in the future with an hourly time scale and, thus, represents an appealing solution when the scheduling of operative conditions of the energy distribution network requires that a wider time-horizon is taken into account.

The proposed approach exploits the Empirical Mode Decomposition and ad hoc techniques, based on Support Vector Regression, in order to obtain an accurate long-term estimation of the load: in particular, the SVP procedure estimates the P-IMFs, which describe the general trend of the series, while the SVB method models the B-IMFs, which give information regarding local characteristics of the data. The proposed approach has been tested against some conventional state-of-the-art techniques on both a public-domain and an office building dataset, where electricity load demand values are provided: the experimental results clearly show that the SVP+SVB algorithm obtains remarkable improvements in the long-term framework [4], [16].
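The experiments report an error index (MAPE) and a correlation between predicted and actual load on the test set. As an illustrative sketch only, the following Python functions compute the mean absolute percentage error and the Pearson correlation coefficient; the exact definitions used in the paper are not visible in this excerpt, so these standard forms and the function names are assumptions.

```python
import math

def mape(actual, predicted):
    # Mean absolute percentage error, in percent.
    # Assumes no actual value is zero.
    n = len(actual)
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / n

def pearson(actual, predicted):
    # Pearson correlation coefficient between the two series.
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)

# Toy usage with made-up load values.
print(mape([100.0, 200.0], [110.0, 180.0]))
print(pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```

A lower MAPE and a correlation closer to one would both indicate a forecast that tracks the measured series more closely.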