
IEEE TRANSACTIONS ON SMART GRID, VOL. 4, NO. 1, MARCH 2013 549

Energy Load Forecasting Using Empirical Mode Decomposition and Support Vector Regression
Luca Ghelardoni, Student Member, IEEE, Alessandro Ghio, Member, IEEE, and
Davide Anguita, Senior Member, IEEE

Abstract: In this paper we focus on the long-term load forecasting problem, that is, the prediction of energy consumption for several months ahead (up to one or more years), which is useful for properly scheduling operative conditions (such as the planning of fuel supply). While several effective techniques are available in the short-term framework, no reliable methods have been proposed for long-term predictions. For this purpose, we describe in this work a new procedure, which exploits the Empirical Mode Decomposition method to disaggregate a time series into two sets of components, respectively describing the trend and the local oscillations of the energy consumption values. These sets are then used for training Support Vector Regression models. The experimental results, obtained on both a public-domain and an office building dataset, validate the effectiveness of the proposed method.

Index Terms: Empirical mode decomposition, load forecasting, support vector regression.

Manuscript received August 26, 2011; revised January 11, 2012, July 27, 2012, October 13, 2012; accepted December 09, 2012. Date of publication February 06, 2013; date of current version February 27, 2013. Paper no. TSG-00407-2011.
The authors are with the DITEN, University of Genoa, Via Opera Pia 11A, I-16145 Genoa, Italy (e-mail: Luca.Ghelardoni@unige.it; Alessandro.Ghio@unige.it; Davide.Anguita@unige.it).
Digital Object Identifier 10.1109/TSG.2012.2235089
1949-3053/$31.00 © 2013 IEEE

I. INTRODUCTION

The problem of load forecasting represents a crucial issue for operational planners and researchers, and it has captured the attention of both academia and the power industry in recent years. As a matter of fact, energy suppliers are facing ever greater competition, so that factors like quality and continuity of the offered service, long-term maintenance scheduling, and optimization of the dispatched power flow must be properly taken into account. These claims are validated by the growing interest in smart grids [1], [2], which exploit a large number of new technologies in power generation, transmission, distribution, and utilization [1], [3], [4], as well as advanced computational methods [5], in order to optimize power management and energy saving.

It is worth noting that the quality of the load estimation has a remarkable impact on the economic operation of the power distribution system, as many decisions can be based on the predicted values: for example, the energy demand can be evaluated a priori, thus relieving possible conflicts with the energy supply (e.g., due to an insufficient fuel provision). The importance of such accurate forecasts will further increase in the near future, because of the remarkable changes occurring in the structure of the power industry (e.g., due to deregulation) [6].

The energy load is characterized by complex and highly non-linear implicit relationships among several factors, such as weather, seasonality, climatic conditions, and the past energy utilization trend. Due to the intrinsic difficulties of load estimation, many papers address the problem of short-term forecasting, that is, the (e.g., hourly) prediction of electricity demand for successive time-steps ranging from one hour to a few days ahead [7]–[11]. In this framework, the techniques based on Auto-Regressive (AR) models [12], Neural Networks (NN) [13]–[16], and Support Vector Regression (SVR) [17], [18] often guarantee a satisfying degree of accuracy. In particular, the SVR-based approach [19], [20] is attractive due to its adaptability and robustness in dealing with the non-linear phenomena generating the data. Moreover, while NNs often get trapped in local minima during the training phase, the SVR learning problem corresponds to the minimization of a convex cost function, which guarantees that the global minimum is found.

However, even the SVR partially neglects the underlying phenomena originating the data (e.g., local characteristics of the time series, such as peaks at particular hours of the day), especially when highly non-linear and non-stationary series are considered. In this framework, preprocessing decomposition methods can help the SVR, as they determine characteristic time/frequency scales for the series and emphasize the local trends characterizing the data. In particular, if the exploited decomposition technique satisfies the conditions of adaptiveness (i.e., the method is data-driven), locality (i.e., local properties of the oscillatory modes are emphasized) and completeness (i.e., the original signal can be exactly reconstructed by re-assembling the components), it is possible to show that SVR models trained using the decomposed signals are more effective than regressors trained using the raw time series [21].

Several decomposition methods have been proposed in the literature [22]:
• Fourier Series [23] are widely used in signal processing, but are most useful when the underlying process is linear and the time series is stationary;
• Wavelet analysis [24], despite being attractive because of the locality of the approach and the uniform temporal resolution used for all the frequency scales, is non-adaptive and is therefore most useful when characterizing gradual frequency changes in the time series;
• Principal Component Analysis (PCA) [25] is an adaptive approach, but unfortunately the distribution of the eigenvalues does not yield characteristic time/frequency scales.



Empirical Mode Decomposition (EMD) [26], [27], instead, is particularly suitable as a preprocessing method in non-linear and non-stationary series analysis: a signal is detrended and then decomposed into a small number of Intrinsic Mode Functions (IMFs). While the extracted trend describes the general behavior of the time series, the IMFs satisfy the requirements of completeness, locality, and adaptiveness, which make EMD an appealing decomposition method. In fact, the IMFs are modulated both in amplitude and frequency, in order to capture the local characteristics of the time-varying signal (e.g., the information concerning higher order statistical moments is kept in the set of components). Due to its properties, EMD has been effectively used as a preprocessing step for SVR in the short-term framework [21]: once the trend and the IMFs are extracted, SVR models are trained on each of these components and used for predicting values in the near future.

However, the scheduling of many operative conditions of the energy distribution network requires that a wider time-horizon is taken into account [28]: for example, load forecasting is important for contract evaluations and the preparation of various sophisticated financial products on the liberalized energy market [29]. Furthermore, the ability to perform accurate medium-term (i.e., the hourly forecasting of electricity demand for some days up to a few months ahead) and long-term (i.e., the hourly load forecasting for several months up to one or more years ahead) predictions is a prerequisite for the success of a utility company [28], as it enables the timely implementation of crucial operational decisions, leading to improved network reliability and fewer equipment failures and blackouts [30]. For such purposes, the mere estimation of the daily peak load [31] gives insufficient information to the electric network manager; at the same time, long-term forecasting¹ obtained using the conventional short-term techniques [32] is often unreliable because of the error propagation in the estimation of the load values [33]. Unfortunately, this issue afflicts the conventional SVR-based and the combined EMD-SVR approaches too: in fact, high frequency components, which are not properly captured by SVR [33], are typical of energy load time series and are preserved in the decomposed IMFs, due to the properties of locality and adaptiveness of EMD. Moreover, the exploitation of exogenous inputs to improve the forecasting, such as economic growth and heterogeneous weather/seasonal parameters (e.g., temperature, humidity index, or wind-chill index [17]), is not an option either [33], as predicting these factors in the long-term framework is at least as complex as the energy load forecasting problem itself. In fact, to the best of the authors' knowledge, no effective approaches have been proposed so far to target long-term forecasting problems.

¹ Since, for the scope of our work, medium- and long-term predictions are characterized by similar difficulties (though the latter is even more challenging), they are typically treated alike: thus, for the sake of simplicity, in this paper we will only distinguish between short- and long-term load forecasting methods.

In this paper, we present a new technique, which addresses the long-term load forecasting problem by splitting it into two sub-problems: on one hand, the IMFs which better describe the long-term trend of the time series are selected and predicted by using a generalization of the EMD-SVR approach of [21]; on the other hand, the components that describe the local properties, intrinsic to electric load patterns, are forecasted by exploiting the cyclicities and the capacity of EMD to accurately capture the underlying characteristics of the observed time series for training the SVR models. The outputs of the two models are then re-assembled to obtain the load prediction, thanks to the appealing completeness property of EMD.

The paper is organized as follows: in Section II we briefly review the Support Vector Machine for Regression algorithm, while in Section III we present the Empirical Mode Decomposition; then, in Section IV, we present the combined EMD-SVR approach of [21], which is particularly suitable for short-term forecasting applications. Section V presents the proposed method, targeted towards the long-term prediction framework. Finally, in Section VI, we present some experimental results on real-world energy consumption series, which show the effectiveness of our method.

II. SUPPORT VECTOR MACHINE FOR REGRESSION AND TIME SERIES PREDICTION

A. Support Vector Regression

Let us consider a set of training data $D = \{(\mathbf{x}_1, y_1), \ldots, (\mathbf{x}_l, y_l)\}$, where, in general, $\mathbf{x}_i \in \mathbb{R}^d$ and $y_i \in \mathbb{R}$. When targeting a regression problem, our objective is to find the best approximating function $f(\mathbf{x})$ which, in principle, should not deviate from the target function by more than a user-defined $\epsilon$, that is $|f(\mathbf{x}_i) - y_i| \le \epsilon$. By applying a map $\phi : \mathbb{R}^d \to \mathbb{R}^{D}$, with $D > d$, to the patterns, so that, in the new space, the problem can be more efficiently solved by linear regression, we get the following Support Vector Regression (SVR) formulation [20] for training a model:

$$
\begin{aligned}
\min_{\alpha, \alpha^*} \quad & \frac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l} (\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*) K(\mathbf{x}_i, \mathbf{x}_j) + \epsilon \sum_{i=1}^{l} (\alpha_i + \alpha_i^*) - \sum_{i=1}^{l} y_i (\alpha_i - \alpha_i^*) \\
\text{s.t.} \quad & 0 \le \alpha_i, \alpha_i^* \le C, \quad i = 1, \ldots, l, \qquad \sum_{i=1}^{l} (\alpha_i - \alpha_i^*) = 0
\end{aligned}
\tag{1}
$$

where $C$ is a hyperparameter, which must be properly tuned, and $K(\mathbf{x}_i, \mathbf{x}_j) = \phi(\mathbf{x}_i) \cdot \phi(\mathbf{x}_j)$ is the kernel function. In regression problems, two of the most common kernel functions are the linear and the Gaussian ones of (2) and (3) [34]:

$$K(\mathbf{x}_i, \mathbf{x}_j) = \mathbf{x}_i \cdot \mathbf{x}_j \tag{2}$$

$$K(\mathbf{x}_i, \mathbf{x}_j) = \exp\left(-\gamma \|\mathbf{x}_i - \mathbf{x}_j\|^2\right) \tag{3}$$

Once the training phase is concluded and a model is identified, the SVR feed-forward function can be evaluated as follows:

$$f(\mathbf{x}) = \sum_{i=1}^{l} (\alpha_i - \alpha_i^*) K(\mathbf{x}_i, \mathbf{x}) + b \tag{4}$$

where the bias $b$ can be derived, once Problem (1) is solved [19], from the Karush-Kuhn-Tucker (KKT) conditions.
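To make the feed-forward function (4) concrete, the following minimal sketch evaluates a kernel expansion with the linear and Gaussian kernels of (2) and (3). It assumes that training has already produced one weight per training pattern (in the usual SVR dual these are differences of the dual variables) and a bias; the function and argument names are illustrative and do not come from the paper.

```python
import math


def linear_kernel(a, b):
    """Linear kernel: the dot product of the two patterns."""
    return sum(ai * bi for ai, bi in zip(a, b))


def gaussian_kernel(a, b, gamma=1.0):
    """Gaussian kernel: exp(-gamma * squared Euclidean distance)."""
    squared_distance = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-gamma * squared_distance)


def svr_predict(x, patterns, weights, bias, kernel):
    """Kernel expansion over the training patterns plus a bias term.

    `weights` is assumed to hold one coefficient per training pattern,
    as obtained from an already solved training problem.
    """
    return sum(w * kernel(p, x) for p, w in zip(patterns, weights)) + bias


# Toy model with two training patterns (values chosen only for illustration).
patterns = [[1.0, 0.0], [0.0, 1.0]]
weights = [2.0, -1.0]
prediction = svr_predict([3.0, 4.0], patterns, weights, 0.5, linear_kernel)
```

Only patterns with a non-zero weight contribute to the sum, which is why the prediction cost depends on the number of support vectors rather than on the size of the training set.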

B. Support Vector Regression for Time Series Analysis

Let $S = \{s_1, \ldots, s_N\}$ be a time series, where the continuous signal $s(t)$, observed for a time-interval $T$, is sampled at the associated time-steps² $\tau = \{t_1, \ldots, t_N\}$. Two possibilities exist for using SVR in the time series prediction problem.

The first alternative consists in creating a set $D_R = \{(\mathbf{x}_i, y_i)\}$, based on $S$, where $\mathbf{x}_i = [s_{i-n}, \ldots, s_{i-1}]$, $y_i = s_i$, and $n$ is a user-defined parameter, which corresponds to the number of previous time-steps considered for the regression problem [35]. $D_R$ can thus be exploited for training an SVR model. During the feed-forward phase, when future values of the time series must be estimated, the outputs of the model for the previous steps are recursively used as inputs for the successive forecasts. Thus $\mathbf{x}_i$, during the forecasting phase, will also contain the predicted values $\hat{s}_i$: for this reason, we define this procedure as Recursive-SVR (R-SVR).

A second alternative for using SVR models in the time series prediction problem consists in exploiting the time-steps $\tau$ as inputs. In this case, an SVR regressor is learnt on a training set $D_A = \{(t_i, s_i)\}$, obtained by juxtaposing the two sets $\tau$ and $S$: in other words, an approximation of the function $s(t)$ is obtained in the time-interval $T$. During the feed-forward phase, the outputs of the model are simply obtained using time-steps as inputs: thus, we define this procedure as Approximating-SVR (A-SVR).

The R-SVR procedure is most useful when values must be predicted outside the time-interval of observation $T$ and proves remarkably effective when low frequency components characterize the samples [21]. On the contrary, due to its scarce extrapolation capabilities, A-SVR is seldom used in time series problems, though it represents a valuable alternative to R-SVR when the time-interval is fixed and high frequency oscillations afflict the data.

² Note that the time-steps can also contain information concerning special days and holiday events: as these exogenous inputs are deterministic, they need not be predicted and thus can be safely exploited also in the long-term framework.

III. EMPIRICAL MODE DECOMPOSITION

The Empirical Mode Decomposition (EMD) technique [26], [27] consists in decomposing a signal $s(t)$ into a sum of functions $c_j(t)$, such that each of them satisfies two conditions:
1) The number of zero crossings must be equal to or differ from the number of extrema by no more than one;
2) The mean value of the two envelopes, defined by interpolating, respectively, the local maxima and minima of $c_j(t)$, must be equal to zero.
If both these properties hold, the decomposition is complete, local, and adaptive [26]. Waveforms satisfying the two previous conditions give information concerning the intrinsic oscillation modes of the data and, thus, are called Intrinsic Mode Functions (IMFs). The IMFs represent oscillatory modes, but are much more general than harmonic functions. In fact, it is worth noting that the IMFs are modulated both in amplitude and in frequency and, consequently, are not restricted to be stationary. A pure amplitude or frequency modulated signal represents a particular case of IMF.

In the following, one state-of-the-art algorithm to implement the EMD, namely the technique based on the sifting process, is presented, which disassembles the original signal into its IMF components [36].

A. The Sifting Process

The sifting process consists of four main steps:
1) Identify all the extrema in the original series $s(t)$;
2) Interpolate the local maxima to obtain the upper envelope of the data: for this purpose, cubic splines are used, as they guarantee a trade-off between the interpolating capabilities of the curve and the computational time needed to identify the coefficients of the function;
3) Repeat the previous step for interpolating the local minima to obtain the lower envelope;
4) Compute the mean $m_1(t)$ of the two envelopes and the function $h_1(t) = s(t) - m_1(t)$.
Ideally, $h_1(t)$ satisfies the properties of an IMF and the process is over; in many practical cases, $h_1(t)$ is not an IMF, thus the previous procedure must be reiterated by using the function $h_1(t)$ in place of the original one $s(t)$, and the following result is obtained:

$$h_{11}(t) = h_1(t) - m_{11}(t) \tag{5}$$

In practice, this procedure is usually reiterated up to $k$ times, even though at some iterations the IMF properties are satisfied:

$$h_{1k}(t) = h_{1(k-1)}(t) - m_{1k}(t) \tag{6}$$

In fact, the main objectives of the sifting process are to eliminate riding waves and to improve symmetry with respect to zero: ideally, the best possible solution is thus obtained for large values of $k$. Unfortunately, the iteration of the procedure has a side-effect: the amplitude of the IMF-candidate function $h_{1k}(t)$ is smoothed at every iteration. In other words, a proper stopping criterion must be defined. For this purpose, an index has been proposed in [37]: a preselected number is defined a priori and the iterative procedure is stopped once the resulting function has respected the IMF properties that number of times. No rigorous methods have been proposed for properly selecting this value; however, from a practical point of view, the range of values suggested in [37] is usually considered an effective choice.

After the stopping criterion is satisfied, $h_{1k}(t)$ is taken as the first component of the EMD:

$$c_1(t) = h_{1k}(t) \tag{7}$$

Further decomposed signals can be obtained by re-sifting the residue, computed as

$$r_1(t) = s(t) - c_1(t) \tag{8}$$

This procedure is repeated to find all the IMFs, i.e., until the residue $r_M(t)$ is characterized by fewer than two extrema and thus cannot be further decomposed. It is worth noting that, as candidate functions which include high frequency components often require a larger number of steps to converge to IMFs, their amplitude is usually more smoothed.
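The four sifting steps can be sketched as follows. This is only an illustration under stated assumptions: piecewise-linear interpolation stands in for the cubic splines used in the paper (to keep the sketch dependency-free), the envelopes are held constant outside the first and last extremum, and all function names are hypothetical.

```python
from bisect import bisect_right


def local_extrema(x):
    """Step 1: indices of the strict local maxima and minima of x."""
    maxima, minima = [], []
    for i in range(1, len(x) - 1):
        if x[i - 1] < x[i] > x[i + 1]:
            maxima.append(i)
        elif x[i - 1] > x[i] < x[i + 1]:
            minima.append(i)
    return maxima, minima


def envelope(x, knots):
    """Steps 2-3: envelope through the samples at `knots`.

    Assumption: linear interpolation replaces the cubic splines of the
    paper; values are held constant before the first and after the last knot.
    """
    env = []
    for i in range(len(x)):
        if i <= knots[0]:
            env.append(float(x[knots[0]]))
        elif i >= knots[-1]:
            env.append(float(x[knots[-1]]))
        else:
            k = bisect_right(knots, i)  # knots[k - 1] <= i < knots[k]
            left, right = knots[k - 1], knots[k]
            w = (i - left) / (right - left)
            env.append((1 - w) * x[left] + w * x[right])
    return env


def sift_once(x):
    """Step 4: subtract the mean of the two envelopes from the signal.

    Returns None when the signal has no maxima or no minima, i.e. when
    it cannot be sifted any further.
    """
    maxima, minima = local_extrema(x)
    if not maxima or not minima:
        return None
    upper = envelope(x, maxima)
    lower = envelope(x, minima)
    return [xi - (u + l) / 2 for xi, u, l in zip(x, upper, lower)]


# An oscillation riding on a constant offset: one pass removes the offset.
signal = [5, 6, 5, 4, 5, 6, 5, 4, 5]
candidate = sift_once(signal)
```

In a full decomposition, `sift_once` would be applied repeatedly until a stopping criterion is met, and the whole procedure would then restart on the residue.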

From the definition of $r_j(t)$, the obtained IMFs can be re-assembled in order to obtain the original data:

$$s(t) = \sum_{j=1}^{M} c_j(t) + r_M(t) \tag{9}$$

Note that the IMFs have local mean equal to zero, so the last residue $r_M(t)$ represents the trend (i.e., the longest period component) of the original signal.

IV. THE COMBINED EMD-SVR APPROACH

In this section, we briefly review the combined EMD-SVR approach, presented in [21] for the short-term load forecasting problem. Given a time series of $N$ elements, the training phase of the method consists of a few steps:
1) EMD is applied to identify the $M$ IMFs, defined as $\mathbf{c}_j = \{c_{j,1}, \ldots, c_{j,N}\}$, for $j = 1, \ldots, M$. For the sake of simplicity, let $\mathbf{c}_{M+1}$ be the residue (i.e., the trend);
2) The number $n$ of previous values to be considered for R-SVR (see Section II-B) is defined by the user (or chosen via a conventional model selection procedure [38], such as K-fold Cross Validation [39]);
3) The $M+1$ datasets for the R-SVR training are created: $D_j = \{(\mathbf{x}_i^{j}, c_{j,i})\}$, where $\mathbf{x}_i^{j} = [c_{j,i-n}, \ldots, c_{j,i-1}]$;
4) $M+1$ R-SVR models are trained on the IMFs and the trend.
During the feed-forward phase, the prediction is obtained by re-assembling the $M+1$ values predicted by the R-SVR models, analogously to (9).

Due to the appealing properties of EMD, R-SVR models trained on the IMFs are usually effective and give good results in the short-term framework (i.e., some hours up to a few days ahead) [21]. Unfortunately, the generalization of this approach to the long-term analysis is not effective, mainly due to the limited capabilities of R-SVR in dealing with the higher frequency components included in the original series and preserved in some of the IMFs: as a matter of fact, error propagation in the prediction can cause the forecasted load to rapidly diverge from the real value when the time-horizon is extended to some weeks or months [33]. In order to avoid these large errors, we should be able to capture possible "basic" cyclicities (e.g., due to seasonality), which are maintained in the identified IMFs by construction. In the next section, we present a generalization of the EMD-SVR approach.

V. THE EMD-SVR APPROACH FOR LONG-TERM LOAD FORECASTING

Broadly speaking, we can divide the IMFs into two categories:
1) some IMFs, which are identified in a few sifting steps, are characterized by larger values of amplitude and, thus, can be considered the most informative components;
2) other IMFs contain high frequency components and need more sifting iterations, which cause a smoothing in the amplitude.

From a qualitative point of view, the first set of IMFs, which we define Principal-IMFs (P-IMFs), describes the long-term trend of the time series, while the second set of IMFs, namely the Behavioral-IMFs (B-IMFs), captures local features of the signal (e.g., peak values or oscillations due to environmental factors). In order to obtain a reliable prediction, the long-term load forecasting method should simultaneously:
• Approximate the P-IMFs, where high frequency oscillations are usually absent;
• Approximate the B-IMFs, which are characterized by high frequency components, through the exploitation of the information regarding the cyclicities (embedded in them).

A. Identification of the Most Informative IMFs

In order to rank the most informative IMFs, once the EMD procedure ends, we can make use of conventional techniques, exploited in other decomposition methods as well (such as the PCA [25]), which compute the energy associated to each component. In the case of the $M$ IMFs and the residue, we define the overall energy of the whole time series as

$$E = \sum_{j=1}^{M+1} \sum_{i=1}^{N} c_{j,i}^{2} \tag{10}$$

The percentage of energy associated to the $j$-th IMF is:

$$E_j\% = \frac{\sum_{i=1}^{N} c_{j,i}^{2}}{E} \cdot 100\% \tag{11}$$

where $j = 1, \ldots, M+1$ and the $(M+1)$-th component represents the residue (which, for the sake of simplicity, is treated as an IMF). Once the values of energy are computed for the IMFs and the residue, we can simply rank the components in descending order and select the first $p$ IMFs which guarantee a cumulative energy level greater than a user-defined and application-dependent value $E_{th}\% \le 100\%$:

$$\sum_{j=1}^{p} E_j\% \ge E_{th}\% \tag{12}$$

where the index $j$ follows the ranked order. These first $p$ components constitute the set of P-IMFs, while the remaining ones are classified as B-IMFs.

B. P-IMFs Prediction

Let us consider the $p$ P-IMFs. Our aim is to predict the behavior of each series by using R-SVR models. It is worth noting that, as high frequency oscillations are usually absent in the P-IMFs, we can safely use the technique presented in [21] and summarized in Section IV for approximating them: this method allows us to achieve a small forecasting error even in a long-term prediction. In the following, we will refer to this procedure as recursive Support Vector regression for P-IMFs prediction (SVP).

During the training phase, $p$ R-SVR models are created for the P-IMFs: let $\hat{c}_j$ be the prediction for the $j$-th component, where $j = 1, \ldots, p$. We can re-assemble the predicted values in the SVP feed-forward phase, obtaining:

$$\hat{s}^{P} = \sum_{j=1}^{p} \hat{c}_j \tag{13}$$
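The energy-based ranking of Section V-A can be sketched as below. The sketch assumes that each component (IMFs plus residue) is available as a plain list of samples, that the overall energy is the sum of the component energies, and that the cumulative threshold is reached with a "greater than or equal" test; the function name and the toy components are hypothetical.

```python
def split_by_energy(components, threshold_pct):
    """Rank components by energy and split them into P-IMFs and B-IMFs.

    Returns two lists of component indices: the principal ones, whose
    cumulative energy percentage first reaches `threshold_pct`, and the
    remaining behavioral ones, both in descending order of energy.
    """
    energies = [sum(v * v for v in comp) for comp in components]
    total = sum(energies)
    ranked = sorted(range(len(components)), key=lambda j: energies[j], reverse=True)

    principal = []
    cumulative_pct = 0.0
    for j in ranked:
        principal.append(j)
        cumulative_pct += 100.0 * energies[j] / total
        if cumulative_pct >= threshold_pct:
            break

    behavioral = [j for j in ranked if j not in principal]
    return principal, behavioral


# Toy components: a small fast oscillation, a larger one, and a flat trend.
components = [
    [0.1, -0.1, 0.1, -0.1],
    [3.0, -3.0, 3.0, -3.0],
    [10.0, 10.0, 10.0, 10.0],
]
p_imfs, b_imfs = split_by_energy(components, 99.0)
```

With these toy values the trend alone carries about 92% of the energy, so the 99% threshold is only reached once the larger oscillation is added; the small oscillation is left as the single behavioral component.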

C. B-IMFs Prediction

The B-IMFs usually contain higher frequency components. Unfortunately, the presence of short period oscillations causes the prediction with the conventional combined EMD-SVR approach to be afflicted by large errors because of the scarce extrapolation capabilities which are typical of the R-SVR models in these cases. In order to circumvent this issue, we have to resort to A-SVR and to exploit additional and application-dependent information.

For training purposes, let us consider the ranked set of $M$ IMFs and the residue (see Section V-A). As a first step, since our target is to avoid overfitting every single B-IMF, which could cause error propagation problems, we create a new series $\mathbf{b} = \{b_1, \ldots, b_N\}$ by simply re-assembling the identified B-IMFs, that is

$$b_i = \sum_{j=p+1}^{M+1} c_{j,i} \tag{14}$$

where the first $p$ ranked components (i.e., the P-IMFs) are not considered and $c_{j,i}$ is the $i$-th element of the $j$-th IMF.

As a second issue, it is well-known that in load forecasting problems the energy demand has a periodic behavior which strongly depends on many factors [17]: for example, energy load series are characterized by periods which approximately equal weeks, months and seasons. In such problems, the time-step information $\tau = \{t_1, \ldots, t_N\}$ associated to the original time series $S$, as introduced in Section II, usually consists of an integer index: the first measure has index 1 and the last one has index $N$. In order to take into account the periodicity, which is intrinsic to the energy time series, a proper mapping $P : \tau \to \tau'$ must be applied, so that the index is normalized with reference to the value of the period. For example, $q$ sub-series, one for each of $q$ months, of $m$ elements (the number of measures-per-month) are created, which are characterized by an analogous behavior. In other words, $q$ and $m$ are such that $q \cdot m = N$. A new time series $\tau'$ is created by applying the mapping to all the values in $\tau$:

$$\tau' = \{P(t_1), \ldots, P(t_N)\} \tag{15}$$

The final training set for the A-SVR learning is generated by juxtaposing the two time series $\tau'$ and $\mathbf{b}$, that is $D_B = \{(P(t_i), b_i)\}$. In other words, the inputs are the periodic time-steps, while the targets consist of the output of the B-IMFs re-assembling. From the A-SVR learning phase, we obtain a weighted-average over the user-defined periods of the behavioral time series $\mathbf{b}$ (see also (14)). We define this procedure as approximating Support Vector regression for B-IMFs prediction (SVB). The feed-forward phase consists in simply mapping the corresponding time-step, according to the chosen $P$, and exploiting the SVB model to obtain the predicted output.

The main drawback of the SVB approach consists in the partial smoothing of the amplitude values of the output $\hat{s}^{B}$ with respect to the original B-IMFs, due to the weighted-averaging operation: however, it is worth underlining that the B-IMFs are the least informative components and that the trend is well approximated through the SVP technique of Section V-B. The main advantage of this method consists in the exploitation of the intrinsic features of the time series, such as the cyclicities, which allows the A-SVR to suitably capture high frequency components in the target series.

D. The SVP+SVB Algorithm

The combined SVP+SVB algorithm for long-term prediction puts together the different pieces of the complex puzzle of load forecasting.

As a first step, given a time series $S = \{s_1, \ldots, s_N\}$ and an associated set of time-steps $\tau = \{t_1, \ldots, t_N\}$, we create the model during the training phase:
1) apply the EMD method to find the residue and the IMFs. The latter can then be divided into P-IMFs and B-IMFs by exploiting the results of Section V-A;
2) apply the SVP method of Section V-B and find $p$ R-SVR models for the P-IMFs;
3) apply the SVB method of Section V-C and find the A-SVR model which best approximates the summation of the B-IMFs [see (14)].

Once the models have been created and we want to estimate the output at a generic time-step $t_f$ in the future (where $f > N$), we have to implement the following feed-forward prediction steps:
1) apply the feed-forward functions of the $p$ SVP models to predict every output from $t_{N+1}$ up to $t_f$, following the R-SVR procedure described in Section II-B. The outputs of these models are re-assembled in the SVP prediction $\hat{s}^{P}_f$, as shown in (13);
2) map $t_f$, according to the function $P$ defined during the training phase;
3) apply the feed-forward function of the SVB model to forecast the behavioral value at $P(t_f)$. The output of this model is $\hat{s}^{B}_f$;
4) the final predicted output of the series at $t_f$ is computed as

$$\hat{s}_f = \hat{s}^{P}_f + \hat{s}^{B}_f \tag{16}$$

From the point of view of the computational effort needed to forecast the load values in the long-term framework, it is worth underlining that:
• the training phase of the SVP+SVB procedure is more computationally intensive than other state-of-the-art techniques proposed in the short-term framework. This is mainly due to both the time required by the sifting process of EMD to identify the IMFs, and the resources needed by the $p$ R-SVR and the A-SVR training procedures. However, the models are usually built offline, while the online predictions are obtained by exploiting the feed-forward functions of the R-SVR and A-SVR regressors;
• the online prediction phase of the SVP+SVB algorithm, once the models are created, only requires that $p+1$ SVR feed-forward functions [see (4)] are computed. In addition to being computationally non-intensive, the SVP+SVB

prediction phase can be either easily parallelized on multi-core CPUs or, if required by the application, properly sped up by exploiting special purpose hardware devices [40].
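The feed-forward combination of the trend and behavioral predictions can be illustrated with the sketch below. It rests on two loudly stated assumptions: the periodic mapping is taken to be a simple wrap-around of a 1-based integer index onto one period, which is only one possible form of the mapping, and a plain per-position average replaces the trained A-SVR model as a stand-in for its weighted-averaging behavior. All names are hypothetical.

```python
def periodic_index(t, period):
    """Assumed mapping: wrap a 1-based time-step onto the range 1..period."""
    return (t - 1) % period + 1


def fit_periodic_mean(behavioral_series, period):
    """Stand-in for the A-SVR model (not an SVR).

    Averages the behavioral series at each position within the period,
    using the 1-based position of each sample as its time-step.
    """
    sums = [0.0] * period
    counts = [0] * period
    for t, value in enumerate(behavioral_series, start=1):
        k = periodic_index(t, period) - 1
        sums[k] += value
        counts[k] += 1
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]


def combine(trend_prediction, behavioral_model, t_future, period):
    """Final output: trend prediction plus behavioral value at the mapped step."""
    k = periodic_index(t_future, period) - 1
    return trend_prediction + behavioral_model[k]


# Toy behavioral series with period 2 and a trend prediction of 10.0.
model = fit_periodic_mean([1.0, -1.0, 3.0, -3.0], period=2)
forecast_t5 = combine(10.0, model, t_future=5, period=2)
forecast_t6 = combine(10.0, model, t_future=6, period=2)
```

Because the behavioral model is indexed by the mapped time-step only, it can be evaluated at any future step without feeding back earlier predictions, which is what keeps this part of the forecast free of error propagation.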

VI. EXPERIMENTAL RESULTS

In this section, we test the SVP+SVB approach on two real-world datasets, which are exploited to show the effectiveness of the method and to compare its performance against the other state-of-the-art techniques available in the literature.

In particular, as graphical results are not shown due to space constraints, we use the Mean Absolute Error (MAE), the Mean Absolute Percentage Error (MAPE), the Normalized Mean Square Error (NMSE) and the Relative Error Percentage (REP) as error indexes for the comparison [42]. Furthermore, the Pearson Product-Moment Correlation Coefficient (PPMCC) is used, which measures the correlation between the forecasted and the true values:

$$\rho = \frac{\sum_{i=1}^{m} (y_i - \bar{y})(\hat{y}_i - \bar{\hat{y}})}{\sqrt{\sum_{i=1}^{m} (y_i - \bar{y})^2} \, \sqrt{\sum_{i=1}^{m} (\hat{y}_i - \bar{\hat{y}})^2}} \tag{17}$$

where $m$ is the number of test patterns, $\hat{y}_i$ is the estimated output, and

$$\bar{y} = \frac{1}{m} \sum_{i=1}^{m} y_i, \qquad \bar{\hat{y}} = \frac{1}{m} \sum_{i=1}^{m} \hat{y}_i \tag{18}$$

A. Experimental Setup

The experimental setup for our method is the following:
• We apply the EMD procedure to the series, extracting the IMFs and the residue. The P-IMFs are selected so as to include at least 99% of the overall energy of the IMFs ($E_{th}\% = 99\%$ in (12));
• In order to avoid possible numerical issues, the IMFs and the residue are normalized so that most of the values lie in a bounded range;
• The SVP procedure of Section V-B is applied, where R-SVR models are trained using a Gaussian kernel. The optimal R-SVR hyperparameter values are found using a conventional 10-fold Cross Validation procedure and a grid search, in order to test several values of $C$ and $\gamma$ [38], [39]. The searching space for the hyperparameter is set to [10, 1000];
• The SVB procedure of Section V-C is applied, where the experimental setup for A-SVR is analogous to the one for the SVP method.

B. EUNITE Dataset

In order to test the performance of the proposed technique on real-world data, we exploit the public-domain dataset used for the EUNITE competition [41], where electricity load demand values, recorded every two hours in Eastern Slovakia during 1997 and 1998, are provided. The whole time series, shown in Fig. 1, consists of 8760 samples, 12 for each day of the year. For the competition, the whole 1997–1998 set had to be used in order to predict the load values for January 1999, thus a short-term framework problem was targeted. On the contrary, as we are interested in the long-term prediction problem, we consider the first 4380 values (which correspond to the electric load values recorded in 1997) as a training set, while the second half of the series (year 1998) is used as an independent test set.

Fig. 1. Original EUNITE competition electric load time series.

To the best of the authors' knowledge, given the hourly resolution targeted by our approach, no effective long-term prediction techniques have been presented so far in the literature, since several proposals cope with the simpler problem of forecasting daily or weekly load peaks. Thus, we compared the proposed method with the state-of-the-art techniques for short-term load prediction that are able to cope with high-resolution time series. In particular, four different approaches have been taken into account for this purpose:
• Auto-Regressive Moving-Average (ARMA) models: in our application, both the AR and the MA orders of the forecaster have been properly tuned by using a KCV procedure;
• A standard R-SVR regressor, based on a linear kernel, which represents a generalization of the conventional ARMA models. The experimental setup is analogous to the one applied to our technique (see Section VI-A), with the exception that the hyperparameter $\gamma$ is not used;
• A standard R-SVR regressor, which exploits a Gaussian kernel and is learnt using the original training time series. In this case, the experimental setup is the same as the one used for our method;
• The models created by exploiting the EMD-SVR of [21]. In this case, a Gaussian R-SVR regressor is trained on each of the IMFs and the residue: the predictions are then re-assembled, analogously to (9). Once again, we use the same experimental setup as our proposed method.

By applying the EMD procedure, summarized in Section III, to the training series, we obtain 13 IMFs and the residue: as already remarked in Section V-A, in the following we treat the residue like the other IMFs for the sake of simplicity. Then, we exploit (10) and (11) in order to rank the most informative components (i.e., the ones characterized by the largest values of amplitude). It is possible to show that the residue and one of the IMFs, whose energy percentages are computed according to (11),
GHELARDONI et al.: ENERGY LOAD FORECASTING USING EMPIRICAL MODE DECOMPOSITION AND SUPPORT VECTOR REGRESSION 555

TABLE I
EUNITE DATASET—ERROR AND CORRELATION ON THE TEST SET

cover more than 99% of the total energy and, thus, are classified
as P-IMFs.
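The ranking step described above can be illustrated with a minimal sketch. It assumes that the energy of a component is the sum of its squared samples and that the share of each component is taken relative to the sum of all component energies; the exact definitions are those of (10) and (11), which are not reproduced here. The function names, the 99% threshold argument, and the toy data are hypothetical and serve only as an illustration.

```python
def rank_components_by_energy(components):
    """Return (index, energy share in %) pairs, largest share first.

    Assumption: energy of a component = sum of its squared samples.
    """
    energies = [sum(x * x for x in comp) for comp in components]
    total = sum(energies)
    shares = [100.0 * e / total for e in energies]
    return sorted(enumerate(shares), key=lambda pair: pair[1], reverse=True)


def select_p_imfs(components, threshold=99.0):
    """Take top-ranked components until their cumulative share exceeds threshold."""
    selected, covered = [], 0.0
    for index, share in rank_components_by_energy(components):
        selected.append(index)
        covered += share
        if covered > threshold:
            break
    return selected, covered


# Toy decomposition (made-up values): two weak oscillations and one dominant trend.
toy_components = [
    [0.1, -0.1, 0.1, -0.1],    # fast, low-energy oscillation
    [0.2, 0.0, -0.2, 0.0],     # slower, low-energy oscillation
    [10.0, 10.5, 11.0, 11.5],  # residue-like trend
]
p_imfs, covered = select_p_imfs(toy_components)
print("P-IMF indices:", p_imfs, "covered share (%):", round(covered, 2))
```

In this toy case the trend-like component alone exceeds the threshold, so it is the only one selected; the remaining components would play the role of B-IMFs.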
Table I shows the results, where the proposed method (SVP+SVB approach) is validated against the ARMA, the conventional (linear and Gaussian) R-SVR, the conventional EMD-SVR technique of [21], and the predictions obtained by simply applying the SVP procedure. It is straightforward to note that:
• due to the error propagation effect, the non-linear Gaussian R-SVR performs much worse than the linear R-SVR. On the other hand, the linear R-SVR and the ARMA model are not able to properly capture the characteristics of the series and, as a consequence, the predictions rapidly diverge from the real values in the long-term framework;
• due to its appealing properties, the EMD technique enhances the extrapolation capabilities of the R-SVR. Even the conventional EMD-SVR approach of [21], which is targeted toward short-term forecasting problems, is characterized by an acceptable long-term performance;
• despite neglecting local oscillations, the performance indexes of the SVP method underline the validity of this approach, especially when one is interested in studying only the long-term trend of a series;
• the best performance is obtained by the proposed technique, which improves the MAPE of the EMD-SVR approach [21] by more than 20%.

C. Office Building Dataset
Further experimental results have been obtained on a dataset where electricity load consumption values have been collected for two years (2010–2011) in an office building in Italy. Twelve measurements per day have been recorded, thus the whole dataset, analogously to the previous one, consists of 8760 samples: one year (2010) was used as a training set, while the second year (2011) was used as a test set. The whole series is shown in Fig. 2.

Fig. 2. The load consumption series, collected in an office building in Italy.

By applying the EMD procedure to the series used as training set, we obtain 11 IMFs and the residue, which will be treated as an additional IMF for the sake of simplicity. Once ranked, it is possible to show that the residue and the IMF cover more than 99% of the total energy and, thus, are classified as P-IMFs. Then, we apply the SVP and the SVB procedures and we re-assemble the predictions. We compare the performance of the proposed SVP+SVB approach (and of the forecast obtained by exploiting only the SVP procedure) against four state-of-the-art techniques for load forecasting, as in the previous experiments.

TABLE II
OFFICE BUILDING DATASET—ERROR AND CORRELATION ON THE TEST SET

Table II presents the results obtained on the test set: analogous conclusions can be drawn. In fact, though the Gaussian R-SVR is less affected by error propagation effects than in the previous experiments, the predictions obtained with ARMA and conventional (both linear and non-linear) R-SVR models tend to diverge from the true values. On the contrary, the EMD procedure remarkably improves the performance of the Recursive Support Vector Regression, even when the short-term EMD-SVR technique of [21] is exploited. Though the SVP represents an appealing alternative when one is not interested in capturing the local oscillations of a series, the best performance is obtained by the proposed SVP+SVB method, which remarkably improves the results of the previously cited EMD-SVR short-term approach.

VII. CONCLUSION
We presented in this paper a new approach to the long-term load forecasting problem. In particular, the proposed method performs a long-term prediction of energy consumption time series up to one year in the future with an hourly time scale and, thus, represents an appealing solution when the scheduling of operative conditions of the energy distribution network requires that a wider time horizon is taken into account.
The proposed approach exploits the Empirical Mode Decomposition and ad hoc techniques, based on Support Vector Regression, in order to obtain an accurate long-term estimation of the load: in particular, the SVP procedure estimates the P-IMFs, which describe the general trend of the series, while the SVB method models the B-IMFs, which give information regarding local characteristics of the data. The proposed approach has been tested against some conventional state-of-the-art techniques on both a public-domain and an office building dataset, where electricity load demand values are provided: the experimental results clearly show that the SVP+SVB algorithm obtains remarkable improvements in the long-term framework [4], [16].
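As a complement to the error and correlation indexes reported in Tables I and II, the following minimal sketch shows how such indexes could be computed on a test set. It assumes the usual definition of the Mean Absolute Percentage Error (MAPE) and Pearson's linear correlation coefficient; the exact indexes used in the tables are defined elsewhere in the paper and may differ. The function names and the toy values are hypothetical.

```python
import math


def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent (assumes no zero in actual)."""
    errors = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)


def pearson_correlation(actual, predicted):
    """Pearson's linear correlation between the real and the predicted series."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)


def relative_improvement(baseline_error, new_error):
    """Percentage reduction of an error index with respect to a baseline."""
    return 100.0 * (baseline_error - new_error) / baseline_error


# Toy values (made up): real load versus forecast on three time steps.
real_load = [100.0, 200.0, 400.0]
forecast = [110.0, 190.0, 400.0]
print("MAPE (%):", mape(real_load, forecast))
print("correlation:", pearson_correlation(real_load, forecast))
print("improvement (%):", relative_improvement(5.0, 4.0))
```

With this definition of relative improvement, a statement such as "improves the MAPE by more than 20%" means that the new MAPE is below 80% of the baseline MAPE.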

ACKNOWLEDGMENT
The authors would like to thank Prof. Stefano Massucco (DITEN, University of Genoa) and his laboratory for providing one of the datasets used for the experiments.

REFERENCES
[1] G. W. Arnold, “Challenges and opportunities in smart grid: A position article,” Proc. IEEE, vol. 99, pp. 922–927, 2011.
[2] Public Law 110–140—Energy Independence and Security Act, U.S. Government, 2007.
[3] U.S. Department of Energy, “The smart grid: An introduction” [Online]. Available: http://www.oe.energy.gov/SmartGridIntroduction.htm, 2008.
[4] G. K. Venayagamoorthy, “Dynamic, stochastic, computational, and scalable technologies for smart grids,” IEEE Comput. Intell. Mag., vol. 6, no. 3, pp. 22–35, Aug. 2011.
[5] G. K. Venayagamoorthy, “Potentials and promises of computational intelligence for smart grids,” in Proc. IEEE Power Gen. Soc. Gen. Meet., Calgary, AB, Canada, 2009.
[6] K. LaCommare and K. J. Eto, “Understanding the cost of power interruptions to U.S. electricity customers,” Berkeley, CA, Lawrence Berkeley National Laboratory, LBNL-55718, 2004.
[7] N. Amjady, F. Keynia, and H. Zareipour, “Short-term load forecast of microgrids by a new bilevel prediction strategy,” IEEE Trans. Smart Grid, vol. 1, no. 3, pp. 286–294, 2010.
[8] J. Q. Li, C. L. Niu, J. Z. Liu, and J. J. Gu, “The application of data mining in electric short-term load forecasting,” in Proc. IEEE Int. Conf. Fuzzy Syst. Knowl. Discovery, 2008, pp. 519–522.
[9] F. Martinez-Alvarez, A. Troncoso, J. C. Riquelme, and J. S. Aguilar-Ruiz, “LBF: A labeled-based forecasting algorithm and its application to electricity price time series,” in Proc. IEEE Int. Conf. Data Mining, 2008, pp. 453–461.
[10] D. X. Niu and Y. L. Wang, “Support vector machines based on data mining technology in power load forecasting,” in Proc. IEEE Int. Conf. Wireless Commun., Netw., Mobile Comput., 2007, pp. 5373–5376.
[11] H. Mao, X. J. Zeng, G. Leng, Y. J. Zhai, and J. A. Keane, “Short-term and midterm load forecasting using a bilevel optimization model,” IEEE Trans. Power Syst., vol. 24, no. 2, pp. 1080–1090, 2009.
[12] C. Chatfield, The Analysis of Time Series: An Introduction. London, U.K.: Chapman & Hall/CRC, 2003.
[13] T. Hill, M. O’Connor, and W. Remus, “Neural network models for time series forecasts,” Manage. Sci., vol. 42, pp. 1082–1092, 1996.
[14] X. Cai, N. Zhang, G. K. Venayagamoorthy, and D. C. Wunsch, II, “Time series prediction with recurrent neural networks using a hybrid PSO-EA algorithm,” Neurocomputing, vol. 70, no. 13–15, pp. 2342–2353, Aug. 2007.
[15] R. L. Welch, S. M. Ruffing, and G. K. Venayagamoorthy, “Comparison of feedforward and feedback neural network architectures for short term wind speed prediction,” in Proc. IEEE Int. Joint Conf. Neural Netw., Atlanta, GA, USA, Jun. 2009, pp. 3335–3340.
[16] B. Luitel and G. K. Venayagamoorthy, “Decentralized asynchronous learning in cellular neural networks,” IEEE Trans. Neural Netw. Learn. Syst., vol. 23, no. 11, pp. 1755–1766, Nov. 2012.
[17] D. Niu, Y. Wang, and D. D. Wu, “Power load forecasting using support vector machine and ant colony optimization,” Expert Syst. Appl., vol. 37, pp. 2531–2539, 2010.
[18] Y. C. Guo, “Knowledge-enabled short-term load forecasting based on pattern-base using classification & regression tree and support vector regression,” in Proc. IEEE Int. Conf. Natural Comput., 2009, pp. 425–429.
[19] V. Vapnik, “An overview of statistical learning theory,” IEEE Trans. Neural Netw., vol. 10, pp. 988–999, 1999.
[20] A. Smola and B. Scholkopf, “A tutorial on support vector regression,” Stat. Comput., pp. 199–222, 2004.
[21] C. J. Yu, Y. Y. He, and T. F. Quan, “Frequency spectrum prediction method based on EMD and SVR,” in Proc. Int. Conf. Intell. Syst. Design Appl., 2008, pp. 39–44.
[22] R. Liu, “Empirical mode decomposition: A useful technique for neuroscience?,” Comput. J. Club, 2002.
[23] S. Papoulis, The Fourier Transform and Its Applications. New York: McGraw Hill, 1960.
[24] J. J. Benedetto and M. Frazier, Wavelets: Mathematics and Applications. Boca Raton, FL, USA: CRC, 1994.
[25] I. T. Jolliffe, Principal Component Analysis. New York: Springer, 2002.
[26] N. E. Huang, Z. Shen, S. R. Long, M. C. Wu, H. H. Shih, Q. Zheng, N. C. Yen, C. C. Tung, and H. H. Liu, “The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis,” in Proc. Roy. Soc. Lond. A, 1998, vol. 454, pp. 903–995.
[27] N. E. Huang, “Empirical mode decomposition for analyzing acoustic signal,” U.S. Patent 10-073857 pending, 2003.
[28] R. Ruffing and G. K. Venayagamoorthy, “Short to medium range time series prediction of solar irradiance using an echo state network,” in Proc. Intell. Syst. Appl. Power Syst., Curitiba, Brazil, 2009, pp. 1–6.
[29] P. J. Santos, A. G. Martins, A. J. Pires, J. F. Martins, and R. V. Mendes, “Short-term load forecast using trend information and process reconstruction,” Int. J. Energy Res., vol. 30, pp. 811–822, 2006.
[30] E. A. Feinberg and D. Genethliou, “Load forecasting,” Appl. Math. Restructured Elect. Power Syst., pp. 269–285, 2005.
[31] H. K. Mohamed, S. M. El-Debeiky, H. M. Mahmoud, and K. M. El Destawy, “Data mining for electrical load forecasting in Egyptian electrical network,” in Proc. Int. Conf. Comput. Eng. Syst., 2006, pp. 460–465.
[32] M. N. Maralloo, A. R. Koushki, C. Lucas, and A. Kalhor, “Long term electrical load forecasting via a neurofuzzy model,” in Proc. Int. CSI Comput. Conf., 2009, pp. 35–40.
[33] B. J. Chen, M. W. Chang, and C. J. Lin, “Load forecasting using support vector machines: A study on EUNITE competition 2001,” IEEE Trans. Power Syst., vol. 19, pp. 1821–1830, 2004.
[34] V. Vapnik, Statistical Learning Theory. New York: Wiley, 1998.
[35] F. Takens, “Detecting strange attractors in turbulence,” in Dynamical Systems and Turbulence, D. A. Rand and L. S. Young, Eds. New York: Springer-Verlag, 1981, pp. 366–381.
[36] I. Magrin-Chagnolleau and R. G. Baraniuk, “Empirical mode decomposition based time-frequency attributes,” in Proc. 69th SEG Meet., Houston, TX, USA, 1999, pp. 1949–1952.
[37] N. E. Huang, M. C. Wu, S. R. Long, S. S. P. Shen, W. Qu, P. Gloersen, and K. L. Fan, “A confidence limit for empirical mode decomposition and Hilbert spectral analysis,” in Proc. Roy. Soc. Lond. A, 2003, vol. 459, pp. 2317–2345.
[38] C. W. Hsu, C. C. Chang, and C. J. Lin, “A practical guide to support vector classification,” Dept. of Computer Science, National Taiwan University, Tech. Rep., 2003.
[39] D. Anguita, A. Ghio, S. Ridella, and D. Sterpi, “K-fold cross validation for error rate estimate in support vector machines,” in Proc. DMIN Int. Conf. Data Mining, Las Vegas, NV, USA, 2009.
[40] D. Anguita, A. Ghio, S. Pischiutta, and S. Ridella, “A support vector machine with integer parameters,” Neurocomputing, vol. 72, pp. 480–489, 2008.
[41] European Network on Intelligent Technologies for Smart Adaptive Systems [Online]. Available: http://www.eunite.org [Online]. Available: http://neuron.tuke.sk/competition/
[42] E. E. Elattar, J. Goulermas, and Q. H. Wu, “Electric load forecasting based on locally weighted support vector regression,” IEEE Trans. Syst., Man, Cybern. C, Appl. Rev., vol. 40, pp. 438–447, Jul. 2010.

Luca Ghelardoni was born in Genova, Italy, in 1984. He received the M.S. degree in electronic engineering from the University of Genova in 2009. He is currently working toward the Ph.D. degree in “knowledge and information science” at the University of Genova.
He is currently working in the development of data analysis and time series prediction techniques for industrial applications. His main research activities concern machine learning approaches for time series prediction.

Alessandro Ghio was born in Chiavari, Italy, in 1982. He received the M.S. degree in electronic engineering and the Ph.D. degree in “knowledge and information science” from the University of Genova, Italy, in 2010.
His main research activities concern both theoretical and practical aspects of smart systems based on computational intelligence and machine learning methods. Currently, he also works as an IT consultant.

Davide Anguita (S’93–M’95) received the Laurea degree in electronic engineering and the Ph.D. degree in computer science and electronic engineering from the University of Genova, Italy, in 1989 and 1993, respectively.
After working as a Research Associate at the International Computer Science Institute, Berkeley, CA, on special-purpose processors for neurocomputing, he joined the Department of Biophysical and Electronic Engineering (DIBE, now DITEN Dept.), University of Genova, where he is currently an Associate Professor of Smart Electronic Systems. His current research focuses on the theory and application of kernel methods and artificial neural networks.
