$$d_A(s,t) = \frac{1}{n^{(s)}\, n^{(t)}} \sum_{i=1}^{n^{(s)}} \sum_{j=1}^{n^{(t)}} d\big(x_i^{(s)}, x_j^{(t)}\big) \qquad (1)$$
Fig. 2. Hierarchical clustering using the Ward linkage criterion.
where $n^{(s)}$ and $n^{(t)}$ are the number of objects in $s$ and $t$, $x_i^{(s)}$ and $x_j^{(t)}$ are the generic objects in $s$ and $t$, and the distance $d(\cdot)$ is evaluated by using the Euclidean norm.
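For illustration, a minimal Python sketch of hierarchical clustering with SciPy follows; the array `profiles` and the random data are placeholders, not the paper's data set.

```python
# Illustrative sketch only: hierarchical clustering of daily load
# diagrams, assuming one 96-point per-unit diagram per row.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

rng = np.random.default_rng(0)
profiles = rng.random((234, 96))        # placeholder load diagrams

# Build the binary tree with the average linkage criterion of (1);
# method="ward" selects the Ward criterion discussed below.
tree = linkage(profiles, method="average", metric="euclidean")

# Cut the tree into a desired number of clusters (here K = 16).
labels = fcluster(tree, t=16, criterion="maxclust")
```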
The hierarchical tree of Fig. 1 is obtained by grouping the load profiles of the data set by this method. The samples are represented on the horizontal axis, whereas the distances between clusters are on the vertical axis. The height of the vertical branches represents the distance between each pair of merged clusters. The clusters are then constructed by choosing in the binary tree the maximum admissible distance, or by selecting directly the distance corresponding to the desired number of clusters.

In the Ward linkage criterion, the clusters are formed so as to minimise the increase of the within-cluster sums of squares. The distance $d_W(s,t)$ between two clusters $s$ and $t$ is then measured as the increase of these sums of squares if the two clusters were merged, which for cluster centroids $\bar{x}^{(s)}$ and $\bar{x}^{(t)}$ takes the standard form

$$d_W(s,t) = \frac{n^{(s)}\, n^{(t)}}{n^{(s)} + n^{(t)}}\, d^2\big(\bar{x}^{(s)}, \bar{x}^{(t)}\big) \qquad (2)$$

B. k-means

The classical k-means clustering [16] groups a data set of samples $x^{(m)}$ ($m = 1, \ldots, M$) into $k = 1, \ldots, K$ clusters by means of an iterative procedure. A first guess is made for the $K$ cluster centres $c^{(k)}$ (usually chosen at random among the samples of the data set). The $K$ centres classify the samples, in the sense that the sample $x^{(m)}$ belongs to cluster $k$ if the distance $\|x^{(m)} - c^{(k)}\|$ is the minimum of all the $K$ distances. The estimated centres are used to classify the samples into clusters (usually by the Euclidean norm) and their values $c^{(k)}$ are recalculated. The procedure is repeated until stabilisation of the cluster centres. Clearly, the optimal number of clusters is not known a priori, and the clustering quality depends on the value of $K$.
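A minimal NumPy sketch of this k-means loop is given below; the function and variable names are illustrative, not the paper's.

```python
import numpy as np

def k_means(x, K, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # First guess: K centres chosen at random among the samples.
    c = x[rng.choice(len(x), size=K, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assign each sample to the closest centre (Euclidean norm).
        d = np.linalg.norm(x[:, None, :] - c[None, :, :], axis=2)
        k = d.argmin(axis=1)
        # Recalculate each centre as the mean of its assigned samples.
        new_c = np.array([x[k == j].mean(axis=0) if np.any(k == j) else c[j]
                          for j in range(K)])
        if np.allclose(new_c, c):       # stabilisation of the centres
            break
        c = new_c
    return c, k
```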
C. Fuzzy k-means
Fuzzy k-means clustering [17] is rather similar to k-means clustering, but each sample $x^{(m)}$ has a grade of membership $a_{mk}$ to each cluster $k$. The procedure is initialised by choosing $K$ samples $c^{(k)}$ as cluster centres and assigning to the $M$ samples of the data set a membership degree with respect to the $K$ centres. Each cluster centre $c^{(k)}$ is then updated by replacing its value with the fuzzy mean of all the samples with regard to the cluster $k$:
$$c^{(k)} = \left( \sum_{m=1}^{M} a_{mk}^{z} \right)^{-1} \sum_{m=1}^{M} a_{mk}^{z}\, x^{(m)} \qquad (3)$$

where $z$ is the fuzzification exponent.
The procedure is repeated until stabilisation of the cluster
centres. Again, the number of clusters and membership
criteria are user-defined parameters, which have to be
tuned by trial-and-error.
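A sketch of the centre update (3) follows, assuming a membership matrix `a` of shape (M, K) and fuzzification exponent `z`; these identifiers follow the reconstruction of (3) above and are illustrative.

```python
import numpy as np

def fuzzy_centres(x, a, z=2.0):
    w = a ** z                              # a_mk^z
    # c^(k) = sum_m a_mk^z x^(m) / sum_m a_mk^z, for every cluster k
    return (w.T @ x) / w.sum(axis=0)[:, None]
```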
D. Modified follow-the-leader
Let us consider the follow-the-leader procedure introduced in [18], which does not require initialisation of the number of clusters and uses an iterative process to compute the cluster centres. A first cycle of the algorithm sets the number $K$ of clusters and the number of patterns $n^{(k)}$ belonging to each cluster $k = 1, \ldots, K$ by using a follow-the-leader approach depending on a distance threshold. The subsequent cycles refine the clusters, possibly reassigning the patterns to the closest clusters. The procedure stops when no pattern changes cluster during a cycle. The process is essentially controlled by the distance threshold, which has to be chosen by trial and error.
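An illustrative sketch of the first follow-the-leader cycle is given below; `rho` stands for the distance threshold (the symbol is assumed, since the excerpt leaves it unnamed) and all identifiers are ours.

```python
import numpy as np

def follow_the_leader(x, rho):
    centres, counts, labels = [], [], []
    for p in x:
        d = [np.linalg.norm(p - c) for c in centres]
        if d and min(d) <= rho:
            k = int(np.argmin(d))
            counts[k] += 1
            # Update the leader as the running mean of its patterns.
            centres[k] += (p - centres[k]) / counts[k]
        else:
            # Pattern farther than rho from every centre: new cluster.
            centres.append(np.asarray(p, dtype=float).copy())
            counts.append(1)
            k = len(centres) - 1
        labels.append(k)
    return np.array(centres), np.array(labels)
```

Subsequent cycles would reassign each pattern to its closest centre and recompute the centres until no pattern changes cluster.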
This procedure has been modified by the authors to fit the needs of the proposed classification, with two different objectives. The first objective is to take into account the different dispersion of the data in the input vector. For this purpose, the Euclidean metric used in the original algorithm has been modified by introducing for each index a weighting factor $\sigma_h^2 / \bar{\sigma}^2$, where $\sigma_h^2$ is the variance of the $h$-th feature computed from all the load diagrams in the initial population and $\bar{\sigma}^2$ is the average value of the variance for $h = 1, \ldots, H$. As such, the impact of the indices having a high variance is amplified in the computation of the weighted Euclidean distance.
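A sketch of this weighted Euclidean distance follows: each feature $h$ is weighted by its variance divided by the average variance, both computed on the initial population (the identifiers are ours).

```python
import numpy as np

def weighted_distance(p, q, population):
    var_h = population.var(axis=0)      # variance of each feature h
    w = var_h / var_h.mean()            # weights sigma_h^2 / mean variance
    return np.sqrt(np.sum(w * (p - q) ** 2))
```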
Fig. 3. Structure of the uni-dimensional SOM.
III. METRICS FOR ASSESSING CLUSTERING ADEQUACY
We consider any clustering algorithm forming $K$ customer classes, from which we build the corresponding $K$ class representative load diagrams by computing the weighted average of the initial load diagrams, assuming the reference powers as weights. In order to rank the adequacy of the clustering results, we define the distance $d(r^{(k)}, L^{(k)})$ between a representative load diagram $r^{(k)}$ and the subset $L^{(k)}$, computed as the geometric mean of the Euclidean distances between $r^{(k)}$ and each member of $L^{(k)}$, and the infra-set mean distance $\hat{d}(L^{(k)})$ among the members of $L^{(k)}$.

The first adequacy indicator is the Mean Index Adequacy (MIA):

$$\mathrm{MIA} = \sqrt{\frac{1}{K} \sum_{k=1}^{K} d^2\big(r^{(k)}, L^{(k)}\big)} \qquad (5)$$
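An illustrative sketch of the class-to-subset distance and of the MIA of (5) follows; `reps` is a (K, H) array of class representatives and `subsets` a list of (n_k, H) member arrays (the identifiers are ours).

```python
import numpy as np

def d_r_L(r_k, L_k):
    # Geometric mean of the Euclidean distances from r_k to each member
    # (assumes no member coincides exactly with r_k).
    dist = np.linalg.norm(L_k - r_k, axis=1)
    return np.exp(np.log(dist).mean())

def mia(reps, subsets):
    K = len(subsets)
    return np.sqrt(sum(d_r_L(reps[k], subsets[k]) ** 2
                       for k in range(K)) / K)
```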
The second indicator is the Similarity Matrix Indicator (SMI), proposed by the authors [6], defined as the maximum off-diagonal element of the symmetrical similarity matrix, whose terms are built by computing a logarithmic function of the Euclidean distance between any pair of class representative load diagrams, for $i, j = 1, \ldots, K$:

$$\mathrm{SMI} = \max_{i > j} \left( 1 - \frac{1}{\ln d\big(r^{(i)}, r^{(j)}\big)} \right)^{-1} \qquad (6)$$
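A sketch of the SMI as reconstructed in (6) is given below; if the original expression differs, only the term inside the comprehension changes (the identifiers are ours).

```python
import numpy as np

def smi(reps):
    K = len(reps)
    vals = [1.0 / (1.0 - 1.0 / np.log(np.linalg.norm(reps[i] - reps[j])))
            for i in range(K) for j in range(i)]
    return max(vals)
```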
The Clustering Dispersion Indicator (CDI) compares the infra-set mean distances within the clusters with the infra-set mean distance of the set $R$ of the class representative load diagrams:

$$\mathrm{CDI} = \frac{1}{\hat{d}(R)} \sqrt{\frac{1}{K} \sum_{k=1}^{K} \hat{d}^{\,2}\big(L^{(k)}\big)} \qquad (7)$$

The last indicator is an Euclidean form of the Davies-Bouldin Index (DBI) [20], representing the system-wide average of the similarity measures of each cluster with its most similar cluster, for $i, j = 1, \ldots, K$:

$$\mathrm{DBI} = \frac{1}{K} \sum_{i=1}^{K} \max_{j \neq i} \left\{ \frac{\hat{d}\big(L^{(i)}\big) + \hat{d}\big(L^{(j)}\big)}{d\big(r^{(i)}, r^{(j)}\big)} \right\} \qquad (8)$$
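An illustrative sketch of the CDI (7) and the Euclidean DBI (8) follows. The infra-set mean distance `d_hat` is taken here as the RMS Euclidean distance over all member pairs, an assumption since the excerpt does not reproduce its exact definition (the identifiers are ours).

```python
from itertools import combinations

import numpy as np

def d_hat(members):
    pair_d2 = [np.sum((a - b) ** 2) for a, b in combinations(members, 2)]
    return np.sqrt(np.mean(pair_d2))

def cdi(subsets, reps):
    num = np.sqrt(np.mean([d_hat(L) ** 2 for L in subsets]))
    return num / d_hat(reps)                        # equation (7)

def dbi(subsets, reps):
    K = len(reps)
    total = sum(max((d_hat(subsets[i]) + d_hat(subsets[j]))
                    / np.linalg.norm(reps[i] - reps[j])
                    for j in range(K) if j != i)
                for i in range(K))
    return total / K                                # equation (8)
```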
IV. CASE STUDY RESULTS

We consider a set of 234 non-residential customers
connected to the MV distribution system [12]. The
representative load diagram of each customer is obtained
by averaging the data measured with a 15-minute cadence
in a day with a given loading condition (spring weekdays).
Then, each representative load diagram contains 96 values.
Each clustering algorithm assigns the representative load
diagram of each customer to a specific cluster, providing a
complete and non-overlapping positioning of all customers.
Most of the clustering algorithms used require the number of clusters to be assigned in advance. The only exception is the modified follow-the-leader algorithm, in which the number of clusters decreases as the distance threshold increases. As such, the distance threshold has been adjusted during the analysis in order to obtain the same numbers of clusters imposed on the other algorithms. Results of different types of analysis are presented in the sequel.
Fig. 4. Clustering results for the modified follow-the-leader algorithm. Horizontal axis: quarters of hour. Vertical axis: per unit power.
Fig. 5. Clustering results for various clustering algorithms: a) hierarchical (average); b) hierarchical (Ward); c) k-means; d) fuzzy k-means; e) uni-dimensional SOM. Horizontal axis: quarters of hour. Vertical axis: per unit power.
A. Customer class formation for a specified number of classes

We run the clustering algorithms with the number of customer classes set to K = 16. Fig. 4 and Fig. 5 show the results obtained from the modified follow-the-leader algorithm (with the distance threshold set to 2.266) and from the other clustering algorithms, respectively. In each plot, the red line represents the class representative load diagram, while the blue lines show the load diagrams falling into each customer class. The results show that two algorithms, the modified follow-the-leader and the hierarchical clustering run with the average distance linkage criterion, are able to provide a highly detailed separation of the clusters, easily isolating uncommon load patterns.
B. Indicative evaluation of computational speed

We performed an indicative evaluation of the computational speed for K = 16 customer classes. Even though clustering analysis is normally performed off-line, so that computational speed may not be a primary issue, indications on computation times may be useful for choosing the clustering algorithm when the number of customers is large. Computation time depends on the concurrent PC load, so the duration indicated is the average value obtained from several computations for the same case. Table I compares the average computation times obtained from a set of tests of the algorithms. The Relative Computation Time Index (RCTI), defined for each algorithm as the ratio between its computation time and that of the k-means algorithm (0.11 seconds on a Pentium III 733 MHz PC), is used to represent the results concisely. It can be noted that the k-means, the modified follow-the-leader and the uni-dimensional SOM algorithms are significantly faster than the others. Fig. 6 shows the bubbles of activity created by the uni-dimensional SOM. Each unit corresponds to a customer class, and the circle size is proportional to the number of customers included in each class.
Fig. 6. Bubbles of activity created by the uni-dimensional SOM.
Fig. 7. Adequacy ranking with the MIA indicator. Horizontal axis: number of customer classes (10 to 20). Vertical axis: MIA. One curve per algorithm: follow-the-leader, k-means, uni-dimensional SOM, fuzzy k-means, hierarchical (average), hierarchical (Ward).

Fig. 8. Adequacy ranking with the SMI indicator. Same axes and curves as Fig. 7, with SMI on the vertical axis.
TABLE I
RELATIVE COMPUTATION TIME INDEX

Clustering algorithm          RCTI
k-means                          1
Modified follow-the-leader       4
uni-dimensional SOM              7
Hierarchical (Ward)             25
Hierarchical (average)          53
Fuzzy k-means                  270
Fig. 9. Adequacy ranking with the CDI indicator. Fig. 10. Adequacy ranking with the DBI indicator. Same axes and curves as Fig. 7, with CDI and DBI on the vertical axis, respectively.
C. Adequacy assessment of the clustering algorithms

The significant differences in computational speed could suggest using the k-means algorithm. However, the most important information for evaluating the effectiveness of a clustering algorithm concerns clustering adequacy. We performed repeated computations of the clustering algorithms by varying the imposed number of customer classes, and we computed the adequacy indicators for all algorithms. The results are shown in Fig. 7 to Fig. 10. The analysis performed shows that the information provided by the adequacy indicators is highly consistent: for the same number of customer classes, the adequacy ranking (based on increasing values of the indicators) is nearly the same across the indicators.
V. CONCLUDING REMARKS
The results of the adequacy assessment show that two algorithms, the modified follow-the-leader and the hierarchical clustering run with the average distance linkage criterion, emerge as the most promising ones. Both algorithms are able to provide a highly detailed separation of the clusters, isolating load patterns with uncommon behaviour and creating large groups containing the remaining load diagrams. The other algorithms tend to distribute the load diagrams among the groups formed. An overall evaluation of the algorithms leads to considering the modified follow-the-leader as the most efficient one, on the basis of both clustering adequacy and computational speed.
The choice of the most convenient algorithm depends on the operator's needs. If a detailed customer classification is needed, a promising strategy could be to use the modified follow-the-leader algorithm (or the hierarchical clustering run with the average distance linkage criterion) to obtain a first macro-classification in which uncommon load diagrams are isolated. The study may then be refined by running the same algorithm on selected macro-classes.
VI. REFERENCES

[1] S.V. Allera and A.G. Horsburgh, "Load profiling for the energy trading and settlements in the UK electricity markets," Proc. DistribuTECH Europe DA/DSM Conference, London, UK, October 27-29, 1998.
[2] P. Stephenson, I. Lungu, M. Paun, I. Silvas and G. Tupu, "Tariff Development for Consumer Groups in Internal European Electricity Markets," Proc. CIRED 2001, Amsterdam, The Netherlands, June 18-21, 2001, paper 5.3.
[3] G. Chicco, R. Napoli, P. Postolache, M. Scutariu and C. Toader, "Customer Characterisation Options for Improving the Tariff Offer," IEEE Trans. on Power Systems, 18, 1 (February 2003) 381-387.
[4] C.S. Chen, M.S. Kang, J.C. Hwang and C.W. Huang, "Synthesis of power system load profiles by class load study," Electrical Power and Energy Systems 22 (2000) 325-330.
[5] G. Chicco, R. Napoli, P. Postolache, M. Scutariu and C. Toader, "Electric Energy Customer Characterization for Developing Dedicated Market Strategies," Proc. IEEE Porto PowerTech, Porto, Portugal, September 10-13, 2001, paper POM5-378.
[6] G. Chicco, R. Napoli, F. Piglione, P. Postolache, M. Scutariu and C. Toader, "A Review of Concepts and Techniques for Emergent Customer Categorisation," Proc. Telmark Discussion Forum, London, UK, September 2-4, 2002, paper 2_4.
[7] A.K. Jain, M.N. Murty and P.J. Flynn, "Data Clustering: a Review," ACM Computing Surveys 31, 3 (September 1999) 264-323.
[8] B.D. Pitt and D.S. Kirschen, "Application of Data Mining Techniques to Load Profiling," Proc. IEEE PICA'99, Santa Clara, CA, May 16-21, 1999, pp. 131-136.
[9] D. Gerbec, S. Gasperic, I. Simon and F. Gubina, "Hierarchic Clustering Methods for Consumers Load Profile Determination," Proc. 2nd Balkan Power Conference, Belgrade, Yugoslavia, June 19-21, 2002, pp. 9-15.
[10] A. Nazarko and Z.A. Styczynski, "Application of Statistical and Neural Approaches to the Daily Load Profile Modelling in Power Distribution Systems," Proc. IEEE Transmission and Distribution Conference, New Orleans, LA, April 11-16, 1999, Vol. 1, pp. 320-325.
[11] R. Lamedica, L. Santolamazza, G. Fracassi, G. Martinelli and A. Prudenzi, "A Novel Methodology Based on Clustering Techniques for Automatic Processing of MV Feeder Daily Load Patterns," Proc. IEEE PES Summer Meeting 2000, Seattle, WA, July 16-20, 2000, Vol. 1, pp. 96-101.
[12] G. Chicco, R. Napoli, F. Piglione, P. Postolache, M. Scutariu and C. Toader, "Options to Classify Electricity Customers," Proc. MedPower 2002, Athens, Greece, November 4-6, 2002, paper MED02-234.
[13] A.P. Birch, C.S. Özveren and A.T. Sapeluk, "A generic load profile technique using fuzzy classification," Proc. IEE Metering and Tariff for Energy Supply Conference, July 3-5, 1996, pp. 203-207.
[14] M.R. Anderberg, Cluster Analysis for Applications, Academic Press, New York, 1973.
[15] J.H. Ward, "Hierarchical grouping to optimise an objective function," Journal of the American Statistical Association, 58 (1963) 236-244.
[16] J.T. Tou and R.C. Gonzalez, Pattern Recognition Principles, Addison-Wesley, 1974.
[17] J.C. Bezdek, Pattern Recognition with Fuzzy Objective Function Algorithms, Plenum Press, New York, 1981.
[18] Y.-H. Pao and D.J. Sobajic, "Combined use of unsupervised and supervised learning for dynamic security assessment," IEEE Trans. on Power Systems 7, 2 (May 1992) 878-884.
[19] T. Kohonen, Self-Organisation and Associative Memory, 3rd Ed., Springer-Verlag, Berlin, 1989.
[20] D.L. Davies and D.W. Bouldin, "A Cluster Separation Measure," IEEE Trans. on Pattern Analysis and Machine Intelligence PAMI-1, 2 (April 1979) 224-227.

VII. BIOGRAPHIES
Gianfranco Chicco received his Ph.D. degree in
Electrotechnical Engineering in Italy in 1992. He
is Associate Professor of Distribution Systems at
the Politecnico di Torino, Italy. His research
activities include power systems and distribution
systems analysis, competitive electricity markets,
and power quality.
Roberto Napoli graduated in Electrotechnical
Engineering at the Politecnico di Torino, Italy, in
1969. He is Professor of Electric Power Systems at
the Politecnico di Torino and chairman of the
Italian Electric Power Systems National Research
Group. His research activities include power
system analysis, planning and control, artificial
intelligence applications, and competitive
electricity markets.
Federico Piglione graduated in Electrotechnical
Engineering at the Politecnico di Torino, Italy, in
1977. He is Associate Professor of Industrial
Electrical Systems at the Politecnico di Torino,
Italy. His major research interests include power
system analysis, load forecasting, neural networks,
and artificial intelligence applications to power
systems.