[...] of partition rules developed from explanatory variables. The output variable is classified by the rules on the explanatory variables that give the maximum quality to the decision tree. This particular model has been applied creatively and successfully to different purposes. It has been employed in a number of steady-state security assessment applications [10,14-19], financial applications [20] and electricity market applications [21-23]. The main advantage of decision trees is the easy interpretability of the results and the supply of probability values without assuming normal distributions.

This paper summarizes three different applications, developed in the Spanish electricity market context, that the authors have addressed using decision trees, with encouraging results. The objective of the paper is to point out the great versatility of decision trees and their suitability for very diverse real applications, being highly interpretable and well suited to probabilistic approaches. In the first application, decision trees have been used to predict stochastic residual demand curves in the Spanish electricity market. Secondly, a methodology based [...] Finally, decision trees have also been applied to predict the [...]

II. REVIEW OF DECISION TREES

A decision tree is a classification data-mining tool aimed at extracting useful information contained in large data sets, so that it can be used to support decision-making processes. A decision tree is made up of a set of nodes that classify the past realizations of an objective variable. Each classification is achieved by separation rules based on the numerical or categorical values of the explanatory variables. The classification rules of each node are derived from a mathematical process that minimizes the impurity of the resulting nodes, using the available learning set. In this way, by evaluating the separation rules on the numerical or categorical values of the explanatory variables, a final node is reached. In a final node, no more separation rules are applied.

The final nodes can be used to estimate the probability of each pattern. If the node depicted in Fig 1 is the final node reached, the probability of pattern P is computed by dividing the number of examples (past realizations) of pattern P contained in the node, $Ex_P$, by the total number of examples contained in the node, $\sum_p Ex_p$. Fig 1 shows how the probability of the first pattern is computed:

$$\rho_{P1} = \frac{Ex_{P1}}{\sum_{p} Ex_{p}}$$

Fig 1: Structure of a decision tree node and computation of each pattern probability.

This way of estimating probabilities does not require the assumption of specific probability distributions for the variables, which represents an important advantage of this methodology.

To illustrate the use of a decision tree in an estimation process, a simplified virtual decision tree (Fig 2), composed of seven nodes, has been built to forecast an objective variable that can take 8 different patterns.

The first node contains the whole learning set. The first separation is done using the value of the DGI explanatory variable. Patterns 2, 3, 6, 7 and 8 of the objective variable go to the left branch. The rest of the patterns are classified in the right one. Both left and right branches have one or more separation rules. The left branch separates using only the DGI explanatory variable. The right branch also uses the value of the line status explanatory variable. As can be seen in Fig 2, the patterns are successively classified until a final node is reached. The final nodes of the tree usually contain more than one possible pattern. It is not advisable to force the tree to expand until all final nodes are pure (that is, containing a single pattern), because the complexity of the decision tree then grows far beyond the information provided by the learning set, causing an over-fitted tree. Conversely, if the number of nodes of the tree is too small, the tree is over-smoothed: the model is not complex enough to capture the basic information provided in the learning set. Over-fitting and over-smoothing both impair the capability of the decision tree to predict accurate values for new data not used to train the tree. Decision trees in real applications usually contain between 50 and 100 nodes.
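The estimation process described above can be sketched in code. The following is a minimal illustration, not the authors' implementation: the tree layout follows the seven-node description (a first DGI rule, a second DGI rule on the left branch, a line status rule on the right branch), but the thresholds, the category name in_service, the example counts in each final node, and all function names are hypothetical.

```python
from collections import Counter


def pattern_probabilities(leaf_examples):
    """Estimate each pattern's probability in a final node as Ex_P / sum_p Ex_p."""
    counts = Counter(leaf_examples)
    total = sum(counts.values())
    return {pattern: counts[pattern] / total for pattern in sorted(counts)}


def find_final_node(node, sample):
    """Apply separation rules from the first node until a final node is reached."""
    while "examples" not in node:          # nodes without examples carry a rule
        value = sample[node["variable"]]
        if "threshold" in node:            # numerical rule: value < threshold?
            go_left = value < node["threshold"]
        else:                              # categorical rule: value == category?
            go_left = value == node["category"]
        node = node["left"] if go_left else node["right"]
    return node


# Hypothetical seven-node tree (3 rule nodes + 4 final nodes). Thresholds,
# categories and example counts are invented for illustration only.
TREE = {
    "variable": "dgi", "threshold": 500.0,
    "left": {   # patterns 2, 3, 6, 7, 8: separated again using DGI only
        "variable": "dgi", "threshold": 200.0,
        "left": {"examples": [2, 2, 2, 3]},
        "right": {"examples": [6, 6, 7, 8, 8]},
    },
    "right": {  # patterns 1, 4, 5: separated using the line status
        "variable": "line_status", "category": "in_service",
        "left": {"examples": [1, 1, 4]},
        "right": {"examples": [5, 5, 5, 4]},
    },
}


def forecast(sample, tree=TREE):
    """Return the pattern probabilities of the final node reached by `sample`."""
    return pattern_probabilities(find_final_node(tree, sample)["examples"])


if __name__ == "__main__":
    print(forecast({"dgi": 120.0, "line_status": "in_service"}))
    print(forecast({"dgi": 650.0, "line_status": "out_of_service"}))
```

With these invented counts, a sample with a DGI of 120 reaches the final node holding three examples of pattern 2 and one of pattern 3, so the estimated probabilities are 0.75 and 0.25. No probability distribution is assumed at any point; the estimates are plain relative frequencies.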
When training a decision tree (that is, when constructing the separation rules), a portion of the total available data must be set aside to assess the generalization capability of the tree. The data set aside is referred to as the test set, while the portion of the available data used to build the tree is referred to as the learning set. In practical decision tree applications, 25% of the initial available data is typically reserved for the test set and the remaining 75% forms the learning set. Similar values of the efficiency index of the tree on the learning set (used to build the tree) and on the test set (used to assess generalization) are a valuable indication that the tree is neither over-fitted nor over-smoothed. Such efficiency indexes reveal a good forecasting ability of the decision tree, at least in the short term, while the explanatory factors do not change. Once the tree has been built, if its efficiency drops significantly when it is applied to new data, new explanatory factors or market conditions may need to be considered, and a new analysis must be carried out to update the decision tree.

[...] each curve must be sampled at the same energy vector $(q_1, \ldots, q_n)$ in order to perform mathematical computations between curves, or to classify them.

Fig 3: Sampling of the RDC at a set of n equally spaced values of energy (axes: amount of energy versus price).

B. Description of the methodology

Fig 4 depicts the methodology for estimating the RDC patterns, which comprises five steps.
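The common-grid sampling of Fig 3 can be sketched as follows. This is a minimal illustration under stated assumptions: each curve is represented as a list of (energy, price) points sorted by energy, values between points are obtained by linear interpolation, and the Euclidean distance stands in for a generic computation between curves. None of these choices, nor the function names, are taken from the paper.

```python
from bisect import bisect_right


def energy_grid(q_min, q_max, n):
    """Return n equally spaced energy values q_1..q_n between q_min and q_max."""
    step = (q_max - q_min) / (n - 1)
    return [q_min + k * step for k in range(n)]


def sample_curve(points, grid):
    """Sample a curve, given as (energy, price) points sorted by energy, on `grid`.

    Linear interpolation between points is an assumption; grid values outside
    the curve's energy range are clamped to the first or last price.
    """
    energies = [q for q, _ in points]
    prices = [p for _, p in points]
    sampled = []
    for q in grid:
        if q <= energies[0]:
            sampled.append(prices[0])
        elif q >= energies[-1]:
            sampled.append(prices[-1])
        else:
            j = bisect_right(energies, q)  # energies[j - 1] <= q < energies[j]
            q0, q1 = energies[j - 1], energies[j]
            p0, p1 = prices[j - 1], prices[j]
            sampled.append(p0 + (p1 - p0) * (q - q0) / (q1 - q0))
    return sampled


def distance(prices_a, prices_b):
    """Euclidean distance between two curves sampled on the same grid."""
    return sum((a - b) ** 2 for a, b in zip(prices_a, prices_b)) ** 0.5


if __name__ == "__main__":
    # Two hypothetical curves defined on different energy breakpoints.
    curve_a = [(0, 60), (1000, 40), (2000, 10)]
    curve_b = [(0, 50), (500, 45), (2000, 15)]
    grid = energy_grid(0, 2000, 5)
    sampled_a = sample_curve(curve_a, grid)
    sampled_b = sample_curve(curve_b, grid)
    print(sampled_a, sampled_b, distance(sampled_a, sampled_b))
```

Once every curve is sampled on the same grid, it becomes a vector of n prices, so curves with different original breakpoints can be compared term by term or passed to a clustering routine.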
III. RESIDUAL DEMAND CURVE ESTIMATION

A. Motivation

The electricity industry has experienced a process of deregulation in an increasing number of countries. Within this [...]

A. Motivation

Power system constraints are addressed in Spain by increasing and decreasing the generation of connected units, and by connecting off-line ones, based on the generation bids submitted by the generating agents into the market [27-30]. A generator offer consists of a set of power-price blocks for each hour of the following day. A complex minimum-income condition is also submitted in the offer. This condition consists of a fixed income term and a variable income term. The fixed term internalises the start-up cost of the generating unit.

Voltage violations are more frequent than overloads in the Spanish system, due to the lack of reactive power in the areas where they occur and the existence of a large generation imbalance between exporting and importing areas. The generation-demand imbalance of an electrical area is computed by subtracting the total area generation from the total area demand, and thus it indicates the magnitude of the energy transport [...]

[Figure: methodology steps. Correlations study: initial selection of variables and study of the correlations of the explanatory variables. Clustering: classification of the curves of each explanatory variable and of the daily load curves (objective variable), plotted over hours 0 to 24; patterns are obtained for each cluster. Decision tree: a decision tree, with rules of the form x1 < X1?, x2 < X2?, is built in order to obtain the probability of each daily load pattern.]
[...] when any credible contingency occurs. This verification includes running a number of static security assessment tools, which can be very time consuming. In this context, the need to forecast power system scenarios for the following hours emerges, with two main objectives: (a) to foresee potential network problems in normal operating conditions or under the occurrence of contingencies, and (b) to anticipate preventive or corrective measures for selected contingencies in order to comply with the N-1 or N-2 security criteria. A convenient time scope for forecasting future network scenarios may vary between several hours and a few days, depending on factors such as the complexity of the network (number of nodes, branches and generators), the time consumption of the available security assessment tools, the risk aversion of the system operator in charge of managing the transmission system, or the accepted accuracy of the predictions.

In order to anticipate the future state of the power system, different power system data must be forecasted: network topology, active and reactive power loads, generation of each generating unit, wind power prediction, and the position of voltage control resources (which include the generator voltages, transformer ratios and shunt devices). Network topology is provided by the maintenance scheduling of the elements of the transmission system performed by the system operator; active and reactive power loads are easily predicted using the temperatures of the different geographical areas; generation is provided by the market clearing of the units; and wind power prediction and generation voltages are obtained from internal models using time series.

Switched shunt devices (reactors and capacitors) are usually large, and cannot be treated as continuous variables while maintaining high accuracy. Their discrete nature suggests that classification tools such as decision trees are appropriate to predict their value.

It should be noted that decision trees have also been tried to forecast the transformer ratio of automatic load tap changing (ALTC) transformers, but no completely successful results have been obtained and further research must be undertaken. One possible reason is the almost continuous nature of their value: although transformer taps are discrete, they may be treated as continuous because the difference between two successive transformer taps is usually small.

B. Description of the methodology

The methodology for estimating shunt devices comprises three steps: identification of explanatory variables, development of the decision tree, and evaluation of the results on the power flow results.

First, the potential explanatory variables are selected. One forecasting tree is trained for each shunt device (the Spanish power system comprises a total of 51 shunt devices). In order to build the trees, historical values of the shunt devices and explanatory variables have been gathered for three months. Once the Mvar value of each shunt device has been estimated, all the forecasted values of the power system data are collected together and a power system scenario is built and converged.

It should be noted that the objective of the forecasting tool is not the prediction of individual shunt device values; rather, the main interest lies in forecasting the overall power system scenarios. The quality of the overall forecasting tool can be measured by comparing the predicted scenarios with the scenarios actually provided by the state estimator of the energy management system. For this purpose, two comparison indexes have been defined: the branch flow quality index and the bus voltage quality index. A challenging line of research currently focuses on detecting which of the forecasted power system data are responsible for the main inaccuracies revealed by the quality indexes, and on analyzing their impact on running the security assessment tools.

VI. CONCLUSIONS

This paper has shown the great versatility of decision trees and their suitability for very diverse probabilistic real applications. The main advantage of decision trees is the easy interpretability of the results and the supply of probability values without assuming normal distributions. Decision trees have been used to predict stochastic residual demand curves in the Spanish electricity market, to estimate the daily load pattern of units and to predict the values of reactors and capacitors of the Spanish power system in a short-term time scope. In addition, the paper has provided a comprehensive review of the different forecasting techniques available, pointing out their advantages and disadvantages.

VII. REFERENCES

[1] B. S. Everitt, Cluster Analysis, Third Edition, New York: Wiley, 1993.
[2] A. Baillo, “A methodology to develop optimal schedules and offering strategies for a generation company operating in a short-term electricity market”, PhD Thesis, Comillas University, October 2002.
[3] G. E. P. Box, G. M. Jenkins, G. C. Reinsel, “Time Series Analysis”, Prentice Hall, 1994.
[4] A. Martín-Calmarza, I. De-la-Fuente, “New forecasting method for the residual demand curves using time series (ARIMA) models”, in Proc. 7th International Conference on Probabilistic Methods, Naples, Italy, September 2002.
[5] A. Pankratz, “Forecasting with Dynamic Regression Models”, John Wiley & Sons, 1991.
[6] C. M. Bishop, “Neural Networks for Pattern Recognition”, New York: Oxford University Press, 1995.
[7] A. Muñoz and T. Czernichow, “Variable Selection through Statistical Sensibility Analysis: Application to Feedforward and Recurrent Neural Networks”, Tech. Rep. 95-07-01, Institut National de Télécommunications (INT-SIM), Paris, 1995.
[8] Y. Bengio, P. Frasconi, “Input-Output HMM's for Sequence Processing”, IEEE Transactions on Neural Networks, vol. 7, no. 5, 1996.
[9] A. Mateo, A. Muñoz, J. González, “Modeling and forecasting electricity prices with input/output hidden Markov models”, IEEE Transactions on Power Systems, vol. 20, no. 1, pp. 13-24, February 2005.
[10] L. Wehenkel, Automatic Learning Techniques in Power Systems, Boston: Kluwer Academic, 1999.
[11] E. Sánchez-Úbeda, “Models for Data Analysis: contributions to automatic learning”, PhD Thesis, Comillas University, October 1999.
[12] S. Makridakis, S. C. Wheelwright, and R. J. Hyndman, “Forecasting”, New York: John Wiley and Sons, 1998.
[13] Y. Xia, H. Leung, N. Xie, and E. Bosse, “A New Regression Estimator With Neural Network Realization”, IEEE Transactions on Signal Processing, vol. 53, February 2005.