Smart Metering

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TSG.2017.2680542, IEEE Transactions on Smart Grid
Identifying Topology of Low Voltage Distribution Networks Based on Smart Meter Data
Satya Jayadev P, Student Member, IEEE, Nirav Bhatt, Member, IEEE,
Ramkrishna Pasumarthy, Member, IEEE and Aravind Rajeswaran
Abstract: In a power distribution network, network topology information is essential for efficient operation. This connectivity information is often not available at the low voltage level because of unreported changes that happen from time to time. In this paper, we propose a novel data-driven approach to identify the underlying network topology of low voltage (LV) distribution networks, including the load phase connectivity, from time series of energy measurements. The proposed method applies Principal Component Analysis (PCA) and its graph-theoretic interpretation to infer the steady state network topology from smart meter energy measurements. The method is demonstrated through simulations on randomly generated networks and on the IEEE-recognized Roy Billinton distribution test system.

Index Terms: Phase Identification, Distribution Network Topology, Smart Meters, Principal Component Analysis, Graph Theory

The financial support to Satya Jayadev P. from the Data Science Initiative Grant of IIT Madras, and to Nirav Bhatt from the Department of Science & Technology, India, through the INSPIRE Faculty Fellowship is acknowledged.
Satya Jayadev P and Ramkrishna Pasumarthy are with the Department of Electrical Engineering, and Nirav Bhatt is with the Department of Chemical Engineering, Indian Institute of Technology Madras, India. Aravind Rajeswaran is with the Departments of CSE and EE at the University of Washington, Seattle.
ee15d202@smail.iitm.ac.in, niravbhatt@iitm.ac.in, ramkrishna@ee.iitm.ac.in, aravraj@cs.washington.edu
1949-3053 (c) 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

I. INTRODUCTION

THE complexity of power distribution networks is increasing day by day with advancements in technology and the addition of new and sophisticated components to the power grid. Power system control and monitoring have traditionally been concentrated at the generation, transmission and high voltage distribution levels. The need for advanced control at the Low Voltage (LV) distribution level arose with the advent of active distribution networks with intermittent Distributed Energy Resources (DERs), such as solar and wind energy, and plug-in devices such as electric vehicles. Research is actively pursued in the areas of control and automation of distribution networks, and in many cases, it is assumed that network topology information is available [1]–[3].

The topology of an LV distribution network gives the connectivity among its numerous assets, such as feeders, distribution transformers, distributors and consumers. Knowledge of the underlying network topology is useful for efficient integration of renewable energy sources and efficient management of outages in distribution networks [4], [5]. Further, accurate network topology information is essential for reliable state estimation in a distribution network [6].

Some LV consumers are single-phase and draw power from one of the three phases of a distribution transformer. The phase connectivity of these consumers also forms part of the network topology information. This information is important for maintaining load and voltage balance across the three phases of distribution transformers and distribution feeders. Unbalanced loads on transformers and feeders lead to higher copper losses and voltage drops, and consequently affect the life of the assets [7].

The network topology information might not be accurately available at all times because of changes due to network reconfiguration, repairs, maintenance and load balancing [7], [8]. Moreover, consumers might have a facility to switch between phases when a phase trips, thus changing the topology. Often, network operators are not aware of such changes in the topology [9].

A number of attempts have been made to solve the problems of phase identification and topology identification. Smart grid technologies have further intensified the search for new methods of inferring connectivity. The methods for topology identification can be classified into two categories: (i) hardware-based methods and (ii) software-based methods.

Hardware-based methods include microprocessor-based phase identification systems and signal injection devices designed for phase measurement [10], [11]. However, the additional hardware and staff required for these devices to work make these options costly.

Software-based methods have become popular with the advent of Advanced Metering Infrastructure (AMI), such as smart meters, and Phasor Measurement Units (PMUs). These devices are installed at important nodal points and generate large amounts of data at regular time intervals, which can be collected and analysed at centralised data centres. In the literature, researchers have proposed methods for analysing these data for topology identification.

Some software-based methods were presented prior to the development of AMI. One is a search algorithm that determines phase information using power flow measurements and load data [12]; its drawback is that it ignores noise and uncertainty in the data. Another method proposed real-time monitoring of changes in the underlying network topology based on the status of circuit breakers [13].

The latest methods include optimization-based approaches to infer phase connectivity and network topology from time series of power measurements [14], [15]. The authors proposed a Mixed Integer Programming (MIP) based solution, which is computationally intensive to solve without relaxing the constraints. In [16], a technique to identify the phases based on cross-correlation of time series of voltage measurements is presented. The authors of [17] proposed a linear regression based phase identification algorithm that considers the correlation between consumer voltage and substation voltage; it requires the Geographical Information System (GIS) model, which may not always be available. In another phase identification method, data obtained from micro-synchrophasor measurement units (μPMUs) are analysed [18]. In [6], [19], methods were proposed to infer topology from time series of PMU measurements. In [20], an algorithm to identify the network topology using voltage correlation analysis is presented. In [21], a hypothesis testing based technique for topology identification is proposed using the signals generated by the PLC network laid in a smart grid.

TABLE I: Notations
Symbol                           Meaning
$z_i^m(j)$                       Energy measured at the $i$th node in the $j$th time interval
$z_i^t(j)$                       Actual energy consumed at the $i$th node in the $j$th time interval
$\mathbf{Z}$                     Data matrix stacked with $N$ vectors of energy measurements
$n$                              Number of nodes or meters
$N$                              Number of measurements or samples per node
$n_l$                            Number of layers in a tree network
$n_i$                            Number of independent variables
$n_d$                            Number of dependent variables
$c$                              Number of consumer nodes in a network
$\mathbf{C}$                     Constraint matrix
$\mathbf{R}$                     Regression matrix
$\mathbf{E}$                     Error matrix stacked with the error vectors $\mathbf{e}(j)$ in $N$ time intervals
$\boldsymbol{\Sigma}_e$          Error covariance matrix
$\boldsymbol{\lambda}(j)$        Vector of technical losses in the $j$th time interval
$\boldsymbol{\epsilon}(j)$       Vector of random errors in measurements in the $j$th time interval
$\boldsymbol{\delta}(j)$         Vector of errors due to imperfect synchronization of meters in the $j$th time interval
$\boldsymbol{\Sigma}_\lambda$    Covariance matrix of technical losses
$\boldsymbol{\Sigma}_\epsilon$   Error covariance matrix corresponding to random errors in measurements
$\boldsymbol{\Sigma}_\delta$     Error covariance matrix corresponding to errors due to imperfect synchronization

In this work, we propose a novel method for identifying the underlying steady state network topology from smart meter energy measurements. Our approach integrates PCA and its graph-theoretic interpretation (as shown in [22]) to identify the network topology from time series of energy measurements, based on the principle of energy conservation. To account for the high penetration of DERs, we show how our formulation handles DERs installed behind energy meters.
We also consider the presence of technical losses, random errors in measurements and errors due to imperfect time synchronization of smart meters.

As a preliminary, in our prior work [23], we introduced this approach for phase identification. In this paper, we elucidate our method in detail and extend it to the problem of network topology identification. We present the algorithms explicitly and corroborate them through simulations on randomly generated networks and on the Roy Billinton test system. A simple example is also presented for better understanding. We also analyse the time complexity of the algorithm.

The rest of the paper is organised as follows. Section II revisits some necessary preliminaries. The mathematical formulation of the problem is given in Section III. The proposed solution and algorithms are presented in Section IV. Finally, the simulation results with time complexity analysis and the conclusions are provided in Sections V and VI, respectively.

II. PRELIMINARIES

A. Principal Component Analysis (PCA)
PCA is one of the most widely used tools of multivariate data analysis, with many applications. PCA has been applied to identify a model in the presence of noise [24], [25]. In this paper, we use PCA-based model identification from measurements, which we revisit next.

1) Model Identification using PCA: Let $\mathbf{z}_m(j)$ be a sample of $n$ variables measured at the $j$th time instance:
$$\mathbf{z}_m(j) = \left[z_1^m(j),\ z_2^m(j),\ \ldots,\ z_n^m(j)\right]^T \qquad (1)$$
Generally, the measured values are corrupted by random noise, leading to errors in the samples. The vector of measured variables can be written as:
$$\mathbf{z}_m(j) = \mathbf{z}_t(j) + \mathbf{e}(j) \qquad (2)$$
where $\mathbf{z}_t(j)$ is the vector of true values of the variables at the $j$th time instance and $\mathbf{e}(j)$ is the vector of errors due to noise. It is assumed that the errors are independent and identically distributed (i.i.d.) Gaussian:
$$\mathbf{e}(j) \sim \mathcal{N}(\mathbf{0}, \sigma_e^2\mathbf{I}) \qquad (3)$$
where $\sigma_e^2$ is the error variance, $\mathcal{N}$ denotes the Gaussian distribution, and $\mathbf{I}$ is the $(n \times n)$ identity matrix. The $n$ variables are linearly related by the following model:
$$\mathbf{C}\mathbf{z}_t(j) = \mathbf{0} \qquad (4)$$
where $\mathbf{C}$ is a $(p \times n)$-dimensional constraint matrix, with $p$ being the number of linear relationships. From the measurement vectors available at $N$ time instants, an $(n \times N)$-dimensional matrix $\mathbf{Z}$ can be constructed by stacking the vectors $\mathbf{z}_m(j)$, $j = 1, 2, \ldots, N$. Eq. (4) indicates that the noise-free data lie in an $(n-p)$-dimensional subspace orthogonal to the $p$-dimensional subspace spanned by the rows of $\mathbf{C}$. The objective of model identification using PCA is to estimate the $(n-p)$-dimensional true data subspace and the $p$-dimensional constraint subspace, given the data matrix $\mathbf{Z}$.

In PCA, these subspaces are obtained from the eigenvectors of the covariance matrix $\mathbf{S}_z = \mathbf{Z}\mathbf{Z}^T$. The subspaces are identified such that the sum of the squared differences between the measured values and the denoised estimates of the variables is minimised [25]. The eigenvectors of the covariance matrix can be determined using the Singular Value Decomposition (SVD) of $\mathbf{Z}$ as follows:
$$\mathrm{SVD}(\mathbf{Z}) = \mathbf{U}_1\mathbf{S}_1\mathbf{V}_1^T + \mathbf{U}_2\mathbf{S}_2\mathbf{V}_2^T \qquad (5)$$
where $\mathbf{U}_1$ contains the orthonormal eigenvectors corresponding to the $(n-p)$ largest eigenvalues of $\mathbf{S}_z$, while $\mathbf{U}_2$ contains the orthonormal eigenvectors corresponding to the $p$ smallest eigenvalues of $\mathbf{S}_z$. $\mathbf{S}_1$ and $\mathbf{S}_2$ are diagonal matrices with the singular values
of $\mathbf{Z}$. It has been shown that $\mathcal{S}_R(\mathbf{U}_2^T) = \mathcal{S}_R(\mathbf{C})$, where $\mathcal{S}_R(\cdot)$ denotes the subspace spanned by the rows of the matrix $(\cdot)$ [26]. Then, $\mathbf{U}_2^T$ satisfies the following relationship:
$$\mathbf{U}_2^T\mathbf{z} = \mathbf{0} \qquad (6)$$
Hence, $\mathbf{U}_2^T$ gives the constraint matrix. Note, however, that the estimated constraint matrix suffers from rotational ambiguity:
$$\mathbf{U}_2^T\mathbf{z} = \hat{\mathbf{C}}\mathbf{z} = \mathbf{Q}\mathbf{C}\mathbf{z} = \mathbf{0} \qquad (7)$$
where $\mathbf{Q}$ is a non-singular matrix. Hence, the estimated constraint matrix $\hat{\mathbf{C}}$ is not unique and may not be the one with a direct physical interpretation.

A regression model can also be obtained by partitioning the variables into a set of dependent variables $\mathbf{z}_d$ of dimension $n_d = p$ and a set of independent variables $\mathbf{z}_i$ of dimension $n_i = n - p$. The columns of $\hat{\mathbf{C}}$ corresponding to $\mathbf{z}_d$ and $\mathbf{z}_i$ can be partitioned accordingly as $\hat{\mathbf{C}} = [\hat{\mathbf{C}}_d, \hat{\mathbf{C}}_i]$, where $\hat{\mathbf{C}}_d$ and $\hat{\mathbf{C}}_i$ are $(n_d \times n_d)$- and $(n_d \times n_i)$-dimensional matrices, respectively. Then, from Eq. (7) we obtain
$$\hat{\mathbf{C}}_d\mathbf{z}_d + \hat{\mathbf{C}}_i\mathbf{z}_i = \mathbf{0}. \qquad (8)$$
Since $\hat{\mathbf{C}}_d$ is of full rank, we can express Eq. (8) in terms of the estimated regression matrix relating the dependent and independent variables as follows:
$$\mathbf{z}_d = -\hat{\mathbf{C}}_d^{-1}\hat{\mathbf{C}}_i\mathbf{z}_i = \mathbf{R}\mathbf{z}_i, \qquad (9)$$
where $\mathbf{R}$ is the $(n_d \times n_i)$-dimensional regression matrix. The regression matrix $\mathbf{R}$ is proven to be unique [26].

The estimate of the subspace of $\mathbf{C}$ is not optimal in the maximum likelihood sense when the i.i.d. assumption of Eq. (3) does not hold, i.e., when $\mathbf{e}(j) \sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma}_e)$ with a general covariance matrix $\boldsymbol{\Sigma}_e$. In such cases, two approaches proposed in [25] can be used to estimate $\mathbf{C}$. Next, we briefly describe one of them, which applies when the error covariance matrix $\boldsymbol{\Sigma}_e$ is known; for details of both approaches, refer to [25]. This approach transforms the data matrix by scaling it with the Cholesky factor of the error covariance matrix. The Cholesky decomposition of $\boldsymbol{\Sigma}_e$ is given by:
$$\boldsymbol{\Sigma}_e = \mathbf{L}\mathbf{L}^T \qquad (10)$$
where $\mathbf{L}$ is an $(n \times n)$-dimensional lower triangular matrix. The noisy data matrix is transformed into $\mathbf{Z}_s$ as follows:
$$\mathbf{Z}_s = \mathbf{L}^{-1}\mathbf{Z} = \mathbf{L}^{-1}\mathbf{Z}_t + \mathbf{L}^{-1}\mathbf{E} \qquad (11)$$
where $\mathbf{E}$ is the error matrix and $\mathbf{Z}_t$ is the data matrix of true values. The covariance matrix of the transformed matrix is:
$$\mathbf{S}_{z_s} = \mathbf{Z}_s\mathbf{Z}_s^T \qquad (12)$$
It has been shown that applying PCA to $\mathbf{Z}_s$ gives an estimate of the constraint matrix pertaining to the transformed data, to which the inverse transformation is applied to obtain the constraint matrix corresponding to the original data [26]. We apply PCA to $\mathbf{Z}_s$ and obtain estimates of the constraint matrix and the regression matrix corresponding to the original data as follows:
$$\mathrm{SVD}(\mathbf{Z}_s) = \mathbf{U}_{1s}\mathbf{S}_{1s}\mathbf{V}_{1s}^T + \mathbf{U}_{2s}\mathbf{S}_{2s}\mathbf{V}_{2s}^T \qquad (13)$$
$$\hat{\mathbf{C}} = \mathbf{U}_{2s}^T\mathbf{L}^{-1} \qquad (14)$$
$$\mathbf{R} = -\hat{\mathbf{C}}_d^{-1}\hat{\mathbf{C}}_i \qquad (15)$$

B. Graph Theory Overview
In this section, certain concepts of algebraic graph theory pertaining to this work are reviewed.

1) Graph and Sub-Graph: A graph $G = (\mathcal{N}_G, \mathcal{E}_G)$ consists of a set of nodes $\mathcal{N}_G$ and a set of edges $\mathcal{E}_G$, whose connectivity represents a network of physical or abstract elements. A graph is said to be directed if its edges are directed from one node to another. $S$ is a sub-graph of $G$ with set of nodes $\mathcal{N}_S \subseteq \mathcal{N}_G$ and set of edges $\mathcal{E}_S \subseteq \mathcal{E}_G$ such that $\mathcal{E}_S$ contains all edges of $G$ with both end points in $\mathcal{N}_S$.
A graph is said to be connected if there exists a path between every pair of its nodes; otherwise, the graph is disconnected. The maximal connected sub-graphs of a disconnected graph are referred to as its components.

2) Tree: A tree is a connected graph with no circuits; any two nodes in a tree are connected by a unique path. A directed tree is a directed graph whose underlying undirected graph is a tree. A disconnected graph whose components are trees is called a forest. Fig. 1 shows a directed forest with three tree components. In a directed graph, a parent node is a node that has an edge directed to one or more other nodes, called its child nodes. In Fig. 1, the nodes C1, C2 and C3 are child nodes of the parent node P1.

[Fig. 1: A forest with three tree components. Parent nodes P1, P2 and P3 are connected to child nodes C1 to C9 through directed edges a to i.]

3) Incidence Matrix: The incidence matrix $\mathbf{A}$ of a directed graph $G = (\mathcal{N}_G, \mathcal{E}_G)$ describes the incidence of edges on nodes, and its elements are defined as follows:
$$A_{ij} = \begin{cases} +1, & \text{if edge } j \text{ enters node } i \\ -1, & \text{if edge } j \text{ leaves node } i \\ 0, & \text{if edge } j \text{ is not incident on node } i \end{cases} \qquad (16)$$
It is of dimension $n \times e$, where $n = |\mathcal{N}_G|$ and $e = |\mathcal{E}_G|$.

Proposition 1: A directed graph (or a directed forest) can be uniquely constructed from its incidence matrix, provided there are no self loops.

Proof 1: The proof of the Proposition is similar to the proof of Theorem 8 in Chapter 3 of [27]. By definition of the incidence matrix $\mathbf{A}$, the numbers of nodes and edges of the graph are known from the numbers of rows and columns of $\mathbf{A}$. We now have to show that the edge connectivity between the nodes can be uniquely determined from the elements of $\mathbf{A}$. Each column of $\mathbf{A}$, corresponding to an edge, has exactly one $-1$ element and exactly one $+1$ element, which uniquely point to the
nodes where that particular edge begins and ends. Thus, this information gives the connectivity of all the edges without any ambiguity. Since there are no self loops (which cannot be represented in the incidence matrix of a directed graph), no information is lost, and the graph reconstruction is complete and unique.

C. Types of Distribution Networks
Based on their topology, distribution networks are classified as:
i) Radial distribution networks: In this configuration, each of the feeders and distributors is fed by a single source. Hence, this type of network is characterized by the existence of a unique path from the source (substation) to each of the consumers.
ii) Ring main distribution networks: In this configuration, the feeders and distributors may be connected to multiple sources for higher reliability of power supply. Hence, multiple sources are available for feeding a load, and there may be multiple paths between such sources and loads. However, during steady network operation, the tie-switches and circuit breakers are configured such that only one source feeds a load and the electrically active path between them is unique. Hence, the active steady state network can still be considered radial.

III. PROBLEM FORMULATION

A. Distribution Network as a Tree
[Fig. 2: Tree representation of network topology, with the substation node at the root, feeder and transformer nodes at the intermediate levels, and consumer nodes at the leaves.]

The topology of a distribution network can be regarded as the connectivity between the meters installed at the substation, feeders, transformers and consumer mains. A graph can be constructed by assigning a node to each of the meters and representing the connections between them as edges. Since the paths from the substation to each of the consumers are unique, as described in Section II-C, the graph of a distribution network is a tree, as shown in Fig. 2. For the meters of 3-phase loads and 3-phase transformers, we assign three separate nodes corresponding to the three phases, in order to maintain consistency with the single-phase loads. The main advantage of such an assignment is that it also allows us to determine the phase identity of single-phase consumers¹.

¹In this paper, the words meters, nodes and variables are interchangeable. Also, the words samples and readings are interchangeable.

B. Energy Measurements
Since we assume that smart meters are installed at all the nodal points in the network, energy measurements in watt-hours (Wh) are obtained from the meters at regular time intervals, generally fifteen or thirty minutes. These measurements are collected at a centralised location and stacked together to form a data matrix, $\mathbf{Z}$, as follows:
$$\mathbf{Z} = [z_{ij}]_{(n \times N)} \qquad (17)$$
where $z_{ij}$, henceforth denoted as $z_i^m(j)$, is the energy measurement corresponding to the $i$th node in the $j$th time interval, $n$ is the number of nodes in the network and $N$ is the number of measurements captured per node.

C. Energy Conservation
In this section, the concept of energy conservation is illustrated using an example. Consider the graph of a power network with eight energy meters (denoted as nodes 1, 2, ..., 8) connected through seven power lines (denoted as edges a, b, ..., g), as shown in Fig. 3. The incoming energy is captured by an energy meter at each of the nodes.

The principle of conservation of energy implies that, at any node, the sum of the energies of the incoming lines is equal to the sum of the energies of the outgoing lines. Assuming noise-free readings, the meter readings at the nodes 1, 2, ..., 8 in Fig. 3 are related by the following equations, for all times $j = 1, \ldots, N$:
$$z_1^t(j) = z_2^t(j) + z_3^t(j) \qquad (18)$$
$$z_2^t(j) = z_4^t(j) + z_5^t(j) \qquad (19)$$
$$z_3^t(j) = z_6^t(j) + z_7^t(j) + z_8^t(j) \qquad (20)$$
Note that, due to energy conservation, the reading of a parent node (meter) is equal to the sum of the readings of its child nodes (meters) in the graph of a distribution network. This principle leads to a set of linear equations between the nodal readings, described by:
$$z_k^t(j) = \sum_{i \in I_k} z_i^t(j), \quad k \in K \qquad (21)$$
where $K$ is the set of all parent nodes in the graph and $I_k$ is the set of child nodes of the parent node $k$.

[Fig. 3: A distribution network with eight energy meters (nodes 1 to 8) connected through seven power lines (edges a to g). Node 1 feeds nodes 2 and 3 through edges a and b; node 2 feeds nodes 4 and 5, and node 3 feeds nodes 6, 7 and 8, through edges c to g.]
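To illustrate how the conservation equations (18) to (20) become a linear model of the form of Eq. (4) that PCA can recover, the following Python sketch simulates the network of Fig. 3, adds small i.i.d. Gaussian noise as in Eqs. (2) and (3), and estimates the regression matrix of Eq. (9) from the SVD of Eq. (5). It is a minimal sketch under stated assumptions, not the authors' implementation: the leaf loads (uniform between 50 and 500 Wh), the noise standard deviation (0.5 Wh), the number of intervals (N = 500) and the helper name `estimate_regression` are hypothetical choices made for illustration.

```python
import numpy as np


def estimate_regression(Z, n_dep):
    """Estimate R in z_d = R z_i (Eq. (9)) by plain PCA with i.i.d. errors.

    Hypothetical helper. Z is an (n x N) data matrix whose first n_dep rows
    are the dependent variables; the remaining rows are independent.
    """
    # Eq. (5): NumPy returns singular values in decreasing order, so the last
    # n_dep left singular vectors span the constraint subspace (U_2).
    U, _, _ = np.linalg.svd(Z, full_matrices=False)
    C_hat = U[:, -n_dep:].T                 # Eq. (6): estimated constraint matrix
    C_d, C_i = C_hat[:, :n_dep], C_hat[:, n_dep:]
    return -np.linalg.solve(C_d, C_i)       # Eq. (9): R = -C_d^{-1} C_i


rng = np.random.default_rng(0)
N = 500                                          # number of intervals (assumed)
leaves = rng.uniform(50.0, 500.0, size=(5, N))   # nodes 4..8, energy in Wh (assumed)
z2 = leaves[0] + leaves[1]                       # Eq. (19)
z3 = leaves[2] + leaves[3] + leaves[4]           # Eq. (20)
z1 = z2 + z3                                     # Eq. (18)
Z_true = np.vstack([z1, z2, z3, leaves])         # rows ordered as nodes 1..8
Z = Z_true + rng.normal(0.0, 0.5, size=Z_true.shape)  # Eqs. (2)-(3)

R = estimate_regression(Z, n_dep=3)
print(np.round(R, 2))
```

With nodes 1, 2 and 3 taken as dependent variables, the rows of R should be close to [1, 1, 1, 1, 1], [1, 1, 0, 0, 0] and [0, 0, 1, 1, 1]: each parent reading is expressed through the leaf readings 4 to 8, which is the content of Eqs. (18) to (20) once node 1 is written in terms of the leaves. Because Eq. (4) has no constant term, the sketch applies the SVD to Z without mean-centring, matching $\mathbf{S}_z = \mathbf{Z}\mathbf{Z}^T$ in Section II-A.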
[Fig. 4: A network with DERs. Phase P of a transformer feeds Consumers a, b and c; each consumer has an energy meter (M) with a DER behind it.]

D. Network with Distributed Energy Resources (DERs)
The penetration of DERs in distribution networks is gradually increasing, and it is imperative to consider their presence in the network. DERs are generally installed behind consumer meters; they supply power to the consumers, or feed power into the grid when local supply exceeds demand. The utilities compensate the consumers for this feed-in of excess supply through net metering [28]. With net metering, the meter output is the net imbalance of local demand and supply.

The advantage of working with energy measurements and energy conservation is that they are unaffected by net metering. For example, consider a transformer phase, say Phase P, feeding three consumers, say a, b and c, each with a DER connected, as shown in Fig. 4. The energy meter of each consumer measures the net energy consumed. Suppose that in the $j$th time interval, the net consumption of Consumer a is negative while that of Consumers b and c is positive, for example, $z_a^t(j) = -20$ units, $z_b^t(j) = 50$ units and $z_c^t(j) = 100$ units. The energy fed in through meter a is consumed by Consumers b and c, while the rest of the demand is supplied by Phase P. Thus, in an ideal scenario, $z_P^t(j) = 130$ units, and energy conservation still holds, i.e., $z_P^t(j) = z_a^t(j) + z_b^t(j) + z_c^t(j) = -20 + 50 + 100$. The formulation therefore inherently accounts for DERs in the network, and the methods proposed in the following sections can also be applied to networks with DERs, as becomes evident in Section IV. However, inferring the presence of DERs in the underlying network is not within the scope of this work.

E. Losses and Errors
In practice, we have to account for technical losses and for sources of noise in the measurements, namely random errors and clock synchronization errors in the smart meter readings. These losses and errors are modelled next.

1) Technical losses: The technical losses include constant losses, such as iron losses and dielectric losses, and variable losses, such as copper losses. The latter vary with the load in the network and also depend on the length of the lines. These losses introduce an offset in the energy conservation equations written on the nodal readings. Let $\boldsymbol{\lambda}(j)$ be the vector of technical losses reflected in the measurements of the dependent nodes in the $j$th time interval. The losses can be modelled as Gaussian with a non-zero mean and heteroscedastic variance as follows:
$$\boldsymbol{\lambda}(j) \sim \mathcal{N}(\boldsymbol{\mu}_\lambda, \boldsymbol{\Sigma}_\lambda) \qquad (22)$$
where $\boldsymbol{\mu}_\lambda$ is the vector of means and $\boldsymbol{\Sigma}_\lambda$ is the covariance matrix. The mean captures the sum of the constant losses and the average of the variable losses, while the variance captures the change in the variable losses. Since the losses in different lines are uncorrelated, $\boldsymbol{\Sigma}_\lambda$ is a diagonal matrix.

2) Random errors in meter readings: The latest ANSI standard for electricity meters stipulates that electricity meters must be of accuracy class 0.2 or 0.5 [29]. This means that the meter reading lies within ±0.2% or ±0.5% of the true value for class 0.2 and class 0.5 meters, respectively. This error can also be modelled as Gaussian, with each variable having a different error variance. The distribution of the error vector $\boldsymbol{\epsilon}(j)$ due to random errors in the readings during the $j$th time interval is given by:
$$\boldsymbol{\epsilon}(j) \sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma}_\epsilon) \qquad (23)$$
where $\boldsymbol{\Sigma}_\epsilon$ is a diagonal error covariance matrix, since the errors are uncorrelated.

3) Clock synchronization errors (CSE): Although the clocks of all the meters are assumed to be synchronized, the synchronism may not be perfect, which makes the time intervals of the energy measurements vary. The variation is generally of the order of milliseconds to a few seconds. The error introduced by this variation is modelled as a zero-mean Gaussian distribution as follows:
$$\boldsymbol{\delta}(j) \sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma}_\delta) \qquad (24)$$
where $\boldsymbol{\Sigma}_\delta$ is an $(n \times n)$-dimensional diagonal matrix.
With random errors and clock synchronization errors, the measurements at the $j$th interval can be written as:
$$\mathbf{z}_m(j) = \mathbf{z}_t(j) + \boldsymbol{\epsilon}(j) + \boldsymbol{\delta}(j). \qquad (25)$$
Assuming that the cross-correlation between the different error components is negligible, we get
$$\boldsymbol{\epsilon}(j) + \boldsymbol{\delta}(j) \sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma}_e), \quad \boldsymbol{\Sigma}_e = \boldsymbol{\Sigma}_\epsilon + \boldsymbol{\Sigma}_\delta. \qquad (26)$$

IV. PHASE AND TOPOLOGY IDENTIFICATION

Throughout this section, the following assumptions are made:
1) There is no theft of electricity and there are no un-metered loads in the network.
2) The topology of the underlying network remains unaltered while the $N$ measurements are captured.

A. Phase Identification
In the phase identification problem, we can distinguish two kinds of nodes: (i) three nodes corresponding to the three phases of a transformer, and (ii) consumer nodes. Since each consumer node is connected to only one of the phases, the graph turns out to be a forest with three trees. Each tree
has a parent node representing a phase and a number of child nodes representing consumers. The problem of phase identification is then to determine which child nodes are descendants of which parent node.

Let us consider the forest shown in Fig. 1. In this forest, the phase meters are parent nodes and the consumer meters are child nodes. With edges directed from parent to child, the incidence matrix $\mathbf{A}$ of this network, following Eq. (16), is:
$$\mathbf{A} = \begin{array}{c|rrrrrrrrr}
 & a & b & c & d & e & f & g & h & i \\ \hline
P1 & -1 & -1 & -1 & 0 & 0 & 0 & 0 & 0 & 0 \\
P2 & 0 & 0 & 0 & -1 & -1 & -1 & 0 & 0 & 0 \\
P3 & 0 & 0 & 0 & 0 & 0 & 0 & -1 & -1 & -1 \\
C1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
C2 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
C3 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
C4 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
C5 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
C6 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
C7 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
C8 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
C9 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1
\end{array}$$
The sub-matrix formed by the first three rows of $\mathbf{A}$ (related to the parent nodes) provides the edge connectivity of the given network: since each child node has exactly one incoming edge, the child rows only record which edge ends at which consumer. Indeed, inferring this sub-matrix of $\mathbf{A}$ from measurements is sufficient to obtain the edge connectivity of the graph, and this connectivity is unique according to Proposition 1. In general, the incidence matrix can be split as:
$$\mathbf{A} = \begin{bmatrix} \mathbf{A}_d \\ \mathbf{A}_i \end{bmatrix} \qquad (27)$$
where $\mathbf{A}_d$ contains the rows corresponding to the parent nodes and $\mathbf{A}_i$ the rows corresponding to the child nodes.

Due to the nature of phase and consumer measurements, the parent node variables can be taken as the dependent variables and the child node variables as the independent variables. With this notation, the regression matrix $\mathbf{R}$ given by PCA on the measurements relates the parent and child nodes in accordance with the principle of energy conservation. It can be verified that the regression matrix $\mathbf{R}$, which regresses the dependent variables on the independent variables, is in fact the matrix $\mathbf{A}_d$ with a negative sign, i.e., $\mathbf{R} = -\mathbf{A}_d$ when the edges are ordered by child node as above (so that $\mathbf{A}_i = \mathbf{I}$). The uniqueness of $\mathbf{R}$ makes it comparable element-wise to $\mathbf{A}_d$, and hence the connectivity of the underlying graph can be inferred from $\mathbf{R}$.

In the presence of technical losses, the model of Eq. (4) becomes $\mathbf{C}\mathbf{z}_t(j) = \boldsymbol{\lambda}(j)$. Moreover, plain PCA assumes that the measurement errors are i.i.d. Hence, to estimate $\mathbf{C}$ through PCA, $\boldsymbol{\mu}_\lambda$, $\boldsymbol{\Sigma}_\lambda$, $\boldsymbol{\Sigma}_\epsilon$ and $\boldsymbol{\Sigma}_\delta$ must be estimated from the data, and $\mathbf{Z}$ needs to be pre-processed. The elements of $\boldsymbol{\mu}_\lambda$ are estimated from the mean of the total technical loss over all samples. Let $\mathcal{P}$ and $\mathcal{C}$ be the sets of rows of $\mathbf{Z}$ corresponding to phase nodes and consumer nodes, respectively. The mean total loss $\mu_t$ is calculated from the difference between the sum of the phase readings and the sum of the consumer readings, as follows:
$$\mu_t = \frac{1}{N}\sum_{j=1}^{N}\left(\sum_{k \in \mathcal{P}} z_k^m(j) - \sum_{i \in \mathcal{C}} z_i^m(j)\right) \qquad (28)$$

The elements of $\boldsymbol{\mu}_\lambda$ corresponding to each phase are estimated as fractions of $\mu_t$ proportional to the mean of the respective phase readings. The element of $\boldsymbol{\mu}_\lambda$ corresponding to the $k$th phase is estimated as:
$$\boldsymbol{\mu}_\lambda(k) = \mu_t\,\frac{\sum_{j=1}^{N} z_k^m(j)}{\sum_{k' \in \mathcal{P}}\sum_{j=1}^{N} z_{k'}^m(j)}, \quad k \in \mathcal{P}. \qquad (29)$$
The pre-processing step subtracts $\boldsymbol{\mu}_\lambda$ (whose entries for the consumer nodes are zero) from each of the samples to ensure zero offset in the model, as follows:
$$\bar{\mathbf{z}}(j) = \mathbf{z}_m(j) - \boldsymbol{\mu}_\lambda, \quad j = 1, \ldots, N. \qquad (30)$$
$\boldsymbol{\Sigma}_\lambda$ has only three variance elements, corresponding to the three phases. They are obtained by dividing the variance of the total technical loss among the phases in proportion to the variances of the respective phase readings. The variance of the total technical loss, denoted by $l_t$, is estimated as:
$$\mathrm{Var}[l_t] = \frac{1}{N}\sum_{j=1}^{N}\left(\sum_{k \in \mathcal{P}} z_k^m(j) - \sum_{i \in \mathcal{C}} z_i^m(j) - \mu_t\right)^2 \qquad (31)$$
The diagonal element of $\boldsymbol{\Sigma}_\lambda$ corresponding to the $k$th phase is estimated as:
$$\boldsymbol{\Sigma}_\lambda(k) = \mathrm{Var}[l_t]\,\frac{\mathrm{Var}[z_k]}{\sum_{k' \in \mathcal{P}}\mathrm{Var}[z_{k'}]}, \quad k \in \mathcal{P}, \qquad (32)$$
where $\mathrm{Var}[z_k]$ is the variance of the $N$ readings of the $k$th phase. The diagonal elements of $\boldsymbol{\Sigma}_\epsilon$ are estimated based on the accuracy class of the meters. Let $\alpha$ be the accuracy class of a meter, so that the random errors in all the meter readings lie within $\alpha$ percent of the reading. As nearly all values of a Gaussian distribution lie within three standard deviations of its mean, we equate $\alpha$ percent of the mean of the readings to three times the standard deviation of the random errors. The diagonal element of $\boldsymbol{\Sigma}_\epsilon$ corresponding to the $i$th variable is estimated as:
$$\boldsymbol{\Sigma}_\epsilon(i) = \left(\frac{\alpha\,\bar{z}_i}{3 \times 100}\right)^2, \quad i \in \mathcal{P} \cup \mathcal{C}, \qquad (33)$$
where $\bar{z}_i$ is the mean of the $i$th variable. The standard deviation of the error due to imperfect time synchronization is taken as the deviation in the reading caused by a one-second change in the time interval. For each variable, this deviation is estimated from the mean of its readings. Hence, the diagonal element of $\boldsymbol{\Sigma}_\delta$ corresponding to the $i$th variable is estimated as:
$$\boldsymbol{\Sigma}_\delta(i) = \left(\frac{\bar{z}_i}{60T}\right)^2, \quad i \in \mathcal{P} \cup \mathcal{C}, \qquad (34)$$
where $T$ is the time interval of a reading in minutes.
Now, $\boldsymbol{\Sigma}_e$ is calculated following Eq. (26). Let $\bar{\mathbf{Z}}$ be the data matrix after the pre-processing step of offset separation. PCA is applied to $\bar{\mathbf{Z}}$ as described in Section II-A1 to estimate $\mathbf{R}$, and the phase connectivity is inferred. Algorithm 1 describes the method for phase identification.
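The steps of Algorithm 1 can be strung together in a short Python sketch. It is our illustration under stated assumptions, not the authors' code: the helper name `identify_phases`, the synthetic network (three phases feeding nine consumers through a hypothetical assignment `true_phase`), the Gaussian technical losses with mean 8 Wh and standard deviation 2 Wh per phase, the accuracy class α = 0.5, the reading interval T = 15 minutes and N = 200 samples are all assumed. Rows of Z are ordered with the phase meters (set P) first, followed by the consumer meters (set C).

```python
import numpy as np


def identify_phases(Z, n_phases, alpha, T):
    """Sketch of Algorithm 1 (hypothetical helper, not the authors' code).

    Z: (n x N) measurements; the first n_phases rows are phase meters (set P),
    the remaining rows are consumer meters (set C).
    alpha: meter accuracy class in percent; T: reading interval in minutes.
    """
    P, Cons = Z[:n_phases], Z[n_phases:]
    gap = P.sum(axis=0) - Cons.sum(axis=0)          # total loss in each interval
    mu_t = gap.mean()                                # Eq. (28)
    mu = np.zeros(Z.shape[0])                        # losses offset phase rows only
    mu[:n_phases] = mu_t * P.sum(axis=1) / P.sum()   # Eq. (29)
    Z_bar = Z - mu[:, None]                          # Eq. (30)

    var_lt = np.mean((gap - mu_t) ** 2)              # Eq. (31)
    var_p = P.var(axis=1)
    sig_lambda = var_lt * var_p / var_p.sum()        # Eq. (32)
    z_mean = Z.mean(axis=1)
    sig_eps = (alpha * z_mean / 300.0) ** 2          # Eq. (33)
    sig_delta = (z_mean / (60.0 * T)) ** 2           # Eq. (34)
    sig_e = sig_eps + sig_delta                      # Eq. (26), kept diagonal
    sig_e[:n_phases] += sig_lambda                   # step 5: add loss variances

    L = np.linalg.cholesky(np.diag(sig_e))           # Eq. (10)
    L_inv = np.linalg.inv(L)
    Z_s = L_inv @ Z_bar                              # Eq. (11)
    U, _, _ = np.linalg.svd(Z_s, full_matrices=False)  # Eq. (13)
    C_hat = U[:, -n_phases:].T @ L_inv               # Eq. (14)
    R = -np.linalg.solve(C_hat[:, :n_phases], C_hat[:, n_phases:])  # Eq. (15)

    # Step 8: in each column, the entry closest to 1 marks the parent phase.
    phase_of_consumer = np.argmin(np.abs(R - 1.0), axis=0)
    return R, phase_of_consumer


rng = np.random.default_rng(1)
N, T, alpha = 200, 15, 0.5                           # assumed settings
true_phase = np.array([0, 2, 1, 1, 0, 2, 2, 0, 1])   # hypothetical assignment
loads = rng.uniform(20.0, 300.0, size=(9, N))        # consumer energy in Wh
phases = np.zeros((3, N))
for consumer, k in enumerate(true_phase):
    phases[k] += loads[consumer]                     # energy conservation, Eq. (21)
phases += rng.normal(8.0, 2.0, size=phases.shape)    # technical losses, Eq. (22)
Z_true = np.vstack([phases, loads])
noise_sd = alpha * Z_true / 300.0                    # accuracy-class errors, Eq. (23)
Z = Z_true + rng.normal(0.0, 1.0, size=Z_true.shape) * noise_sd

R, estimated_phase = identify_phases(Z, n_phases=3, alpha=alpha, T=T)
print(np.round(R, 2))
print(estimated_phase)
```

With these settings, the printed assignment should match `true_phase`, and each column of R should have one entry near 1 and the others near 0. Because Eqs. (23), (24) and (32) all yield diagonal covariances, the Cholesky factor of Eq. (10) here is just the element-wise square root; calling `np.linalg.cholesky` anyway keeps the sketch aligned with the general case described in Section II-A.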
Algorithm 1 Phase Identification Algorithm 2 Topology Identification

1: Start with N samples of measurements stacked in Z 1: Start with N samples of measurements stacked in Z and
2: Estimate as per Eq. (29) let l = 1.
3: Subtract from all columns of Z as per Eq. (30) to get 2: while l nl do
T
Z 3: Let Z = [zTi ] i Nl+1 Nl
4: Estimate , and as per Eqs. (31) to (34) 4: Estimate from Z as per Eq. (29)
5: Calculate e using Eq. (26) and add variance elements of 5: Subtract from all columns of Z as per Eq. (30)
to corresponding elements of e to get Z
6: Compute C by applying PCA on Z following Eqs. (10) 6: Estimate , and from Z as per Eqs. (31) to
to (15) (34)
7: Calculate R as per Eq. (9) 7: Calculate e using Eq. (26) and add variance elements
8: Round off R to truncate deviations due to noise and of to corresponding elements of e
numerical residues. In each column, the element closest 8: Compute C by applying PCA on Z following
to 1 is rounded to 1 and rest 0. Eqs. (10) to (15)
9: Infer phase connectivity from R 9: Calculate R as per Eq. (9)
10: End 10: Round off R to truncate deviations due to noise and
numerical residues.
11: Let Rl = R and l = l + 1
B. Topology Identification 12: Infer the topology from R1 , ..., RL1 .
The solution to the phase identification problem can be 13: End
extended to the topology identification problem by visualizing
the tree structures in a layered manner. The nodes of the tree
can be separated into layers, with each layer having meters (nodes) operating at a known voltage level, as indicated in Fig. 5. Any two successive layers, when visualized separately, appear as a forest of directed trees.

Now, the problem is reduced to finding the connectivity of a forest of directed trees, which is similar to the phase identification problem. By inferring the connectivity between all pairs of successive layers, the complete network topology can be identified.

Fig. 5: Layer-wise tree representation of network topology (Layer 1, Voltage Level 1, at the bottom up to Layer 4, Voltage Level 4, at the top).

Let the layers be numbered from bottom to top as shown in Fig. 5. Let $n_l$ be the number of layers and $N_l$ be the set of nodes present in layer $l$. Let $z_i^T$ be the $i$th row of the data matrix $Z$. Algorithm 2 gives the procedure for topology identification.

V. SIMULATION RESULTS

The proposed algorithms are demonstrated through simulations on noisy data sets. The simulations are conducted in MATLAB® R2014a.

A. Phase Identification

First, we illustrate the proposed algorithm on a simple example. Consider a phase connectivity network with twelve nodes, as shown in Fig. 6. The network has nine consumers connected to different phases of a transformer. We generated nine sample readings, including losses and errors, for all the meters in this network. The measurement matrix $Z$ is given in Table II, with each column being a sample vector. Steps 2 to 5 of Algorithm 1 are applied to $Z$ to compute the constraint matrix $A$.²

² Note that $A$ can only be computed up to the rotational ambiguity mentioned in Section II-A1. Hence, its numerical values are not given here.

TABLE II: Measurements for the network in Fig. 6 (rows: nodes; columns: samples j)

Node    j=1    j=2    j=3    j=4    j=5    j=6    j=7    j=8    j=9
   1  308.4  341.6  217.3  175.3  273.0  388.3  365.5  311.7  317.4
   2  147.3  232.4  143.7    2.5  203.4  152.1  119.2   67.2   64.7
   3   57.4   12.0   41.9   65.0   49.6  151.1  205.9  203.5  199.6
   4   86.5   78.1   20.4   98.8    9.3   64.8   21.7   24.3   33.8
   5  299.7  847.6  360.5  477.2  728.8  689.6  622.5  537.8  524.9
   6  198.6  380.1  156.4  232.7  246.7  283.7  158.0  344.1  367.8
   7   28.7   80.0   14.3   50.4   61.2   70.2   38.4   73.2   88.9
   8   55.1  342.8  169.8  164.5  382.0  301.5  391.0   93.7   38.9
   9  409.2  271.8  294.8  466.2  270.0  449.9  478.0  590.0  433.4
  10  127.7  143.5  242.5  122.0   98.9  252.3   64.8  167.6  242.1
  11   30.3   53.3    0.1   88.6   40.5   29.9   95.1   46.1   28.5
  12  233.2   60.9   29.3  230.1  114.3  144.9  288.6  344.4  138.8

The corresponding regression matrix $R$ is estimated to be:

$$R = \begin{bmatrix} 0.97 & 1.16 & 1.21 & 0.13 & 0.04 & 0.04 & 0.13 & 0.29 & 0.05 \\ 0.1 & 0.07 & 0.11 & 1.12 & 0.83 & 1.08 & 0.13 & 0.25 & 0.05 \\ 0.11 & 0.07 & 0.03 & 0.13 & 0.46 & 0.05 & 0.96 & 0.82 & 1.01 \end{bmatrix}$$

It can be observed that the network connectivity can be inferred from the rounded $R$ matrix. Rounding off $R$ gives (rows: phase nodes; columns: consumer nodes):

Phase \ Consumer   2  3  4  6  7  8  10  11  12
       1           1  1  1  0  0  0   0   0   0
       5           0  0  0  1  1  1   0   0   0
       9           0  0  0  0  0  0   1   1   1

That is, consumers 2, 3 and 4 are connected to phase 1, consumers 6, 7 and 8 to phase 5, and consumers 10, 11 and 12 to phase 9.

Next, we demonstrate the algorithm on larger networks. The network is built using random number generators in MATLAB® as follows:
1) The number of consumers connected per phase is chosen randomly (uniformly) between 75 and 100.
2) The $N$ readings for each of the consumer meters are sampled from one of three uniform distributions, with ranges (0, 100), (0, 300) and (0, 500), to account for consumers with different ranges of loads.
3) The $N$ readings for each of the three phase meters are then determined by summing the meter readings of the consumers connected to them.
4) The relative distances of the consumers from the transformer are assigned randomly from a set of numbers. The product of these distances with the respective consumer readings is taken and scaled to the range (5, 10). As the technical losses depend on consumer loads and their distances from the transformer, these scaled products are taken as percentages of the consumer readings to calculate the losses. The losses are then added to the corresponding phase readings.
5) Random errors are introduced by assuming 0.5 accuracy class meters.
6) To account for synchronization errors, a 15-minute time interval is assumed and Gaussian error is added, with standard deviation equal to the deviation in the reading caused by a one-second change in the interval.

The algorithm is then applied to a hundred data sets, with different values of $N$ as multiples of $c$, where $c$ is the number of consumer nodes. The time taken to arrive at the solution (Windows 10, Intel i5-4200U 1.64 GHz processor, 6 GB RAM) is plotted against the number of nodes for different numbers of readings in Fig. 7.

Fig. 7: No. of nodes vs. simulation time (x-axis: number of nodes n, 50 to 300; y-axis: t in seconds, 0 to 0.25; one curve each for N = c, 2c, 3c, 4c and 5c).

It can be observed from Fig. 7 that the time taken for phase identification is on the order of milliseconds, while an alternative method in [14], which uses the same type of data, reports times on the order of tens of seconds. Assuming that the computational power used in both cases is of the same order, our method clearly outperforms the method proposed in [14] in terms of time.

To compare the identification capability of our algorithm with that proposed in [14], phase identification of 10 randomly generated networks with up to 200 consumer nodes was performed using both algorithms, and the success rates are reported in Table III. The success rate is the percentage of networks that are exactly identified out of the total number of networks on which the algorithms are tested. It can be observed that our method performs better on this front as well.

TABLE III: Results of comparative simulations

No. of samples (N) | Success rate (%), PCA-based method | Success rate (%), MIP-based method
c                  | 0                                  | 0
2c                 | 100                                | 10
3c                 | 100                                | 90
4c                 | 100                                | 80

Fig. 6: An example graph for phase identification (phase nodes 1, 5 and 9; consumer nodes 2, 3, 4, 6, 7, 8, 10, 11 and 12; edges labelled a to i).

B. Topology Identification

The proposed algorithm for topology identification is tested by simulating data for Bus 2 of the Roy Billinton distribution test system [30], which has 2004 nodes as per our formulation. The simulation is conducted as follows:
1) The $N$ readings for each of the consumer meters were sampled from a uniform distribution with mean and maximum equal to the average and peak loads of the consumers, as given in [30].
2) The relative distances of the consumers from their source transformers, and of the transformers from their source feeders, were randomly assigned.
3) The transformer and feeder meter readings at each of the $N$ time intervals were then determined by appropriate summation of the consumer meter readings.
4) The noise in the samples due to technical losses, random errors and time synchronization errors was added in the same way as described in Section V-A.

The above simulation is repeated 10 times with different numbers of readings, $N$. The success rate and the average time taken for the algorithm to arrive at the solution are shown in Table IV.

C. Time complexity of the algorithms

The time complexity of an algorithm is a formal measure of how fast the algorithm runs and how well it scales with an increase in the number of inputs. In the proposed algorithms, the inputs are the number of nodes $n$ and the number of samples per node $N$, both of which increase with the size of the network. Thus,
the running time of our algorithms depends on the values of $n$ and $N$.

Time complexity is generally expressed through the O-notation, which gives an asymptotic upper bound on the running time of an algorithm. The complexity can be determined by finding the most time-expensive step of the algorithm. In both proposed algorithms, the most expensive step is the application of PCA to the data, which involves the singular value decomposition of $Z$. Its time complexity is $O(N^2 n)$, so the proposed algorithms run in polynomial time. Hence, our algorithm is fast and scales well to a large number of nodes.

As an illustration, we also verified that a polynomial of degree three approximately fits the points plotted with the number of nodes $n$ on the x-axis and the simulation time $t$ on the y-axis, for $N = 5c$. This is consistent with the bound above: with $N = 5c$ and $c$ growing in proportion to $n$, $O(N^2 n)$ becomes $O(n^3)$. The fitted curve is presented in Fig. 8.

Fig. 8: Polynomial fit on no. of nodes vs. simulation time (x-axis: number of nodes n, 80 to 280; y-axis: t in seconds, 0 to 0.22).

TABLE IV: Topology identification: simulation results

No. of samples (N) | Success rate (%) | Average time (s)
c                  | 10               | 4.02
2c                 | 100              | 7.47
3c                 | 100              | 10.54
4c                 | 100              | 13.91
5c                 | 100              | 18.15

VI. CONCLUSION

In this paper, we proposed a novel data-driven approach for inferring the phase connectivity and network topology of an LV distribution network. The proposed approach uses PCA and its graph-theoretic interpretation to infer the topology from energy measurements. The proposed algorithms have been corroborated by simulations on random networks and on the Roy Billinton distribution test system. Due to the presence of losses and noise in the measurements, accurate topology identification requires more than $N = c$ measurements per node in most cases. Further, the problem can be solved in polynomial time, with time complexity $O(N^2 n)$; hence, the solution can be transferred to practice in a straightforward manner.

In the future, we propose to use this technique for solving the problems of detecting changes in the topology, loss estimation, and detecting non-technical losses such as power theft. We also propose to extend this approach to infer the underlying network when data are missing.

ACKNOWLEDGMENT

We would like to thank Prof. S. Narasimhan of IIT Madras for his valuable inputs.

REFERENCES

[1] P. John Dirkman, "Enhancing utility outage management system performance," Schneider Electric White Paper, 2014.
[2] J. Fan, "The evolution of distribution," IEEE Power and Energy Magazine, vol. 7, pp. 63–68, 2009.
[3] W. Kersting, Distribution System Modeling and Analysis, 2nd ed. CRC Press, 2007.
[4] C. Lueken, P. M. Carvalho, and J. Apt, "Distribution grid reconfiguration reduces power losses and helps integrate renewables," Energy Policy, vol. 48, pp. 260–273, 2012.
[5] F. Melo, C. Candido, C. Fortunato, N. Silva, F. Campos, and P. Reis, "Distribution automation on LV and MV using distributed intelligence," in IEEE 22nd International Conference and Exhibition on Electricity Distribution, 2013, pp. 1–4.
[6] G. Cavraro, "Modeling, control and identification of a smart grid," Ph.D. thesis, University of Padova, 2015.
[7] K. Dickson, "Reduction of power losses using phase load balancing method in power networks," in World Congress on Engineering and Computer Science, San Francisco, USA, 2009.
[8] D. Das, "A fuzzy multiobjective approach for network reconfiguration of distribution systems," IEEE Transactions on Power Delivery, vol. 21, pp. 202–209, 2006.
[9] J. Huang, V. Gupta, and Y.-F. Huang, "Electric grid state estimators for distribution systems with microgrids," in Annual Conference on Information Sciences and Systems (CISS), Princeton, USA, 2012.
[10] C. S. Chen, T. T. Ku, and C. H. Lin, "Design of phase identification system to support three-phase loading balance of distribution feeders," in Industrial and Commercial Power Systems Technical Conference (I&CPS), Baltimore, USA, 2011, pp. 1–8.
[11] S. Zhiyu, M. Jaksic, P. Mattavelli, D. Boroyevich, J. Verhulst, and M. Belkhayat, "Three-phase AC system impedance measurement unit (IMU) using chirp signal injection," in Applied Power Electronics Conference and Exposition (APEC), 2013 Twenty-Eighth Annual IEEE, 2013.
[12] M. Dilek, R. P. Broadwater, and R. Sequin, "Phase prediction in distribution systems," IEEE Power Engineering Society Winter Meeting, 2002.
[13] M. Kezunovic, "Monitoring of power system topology in real-time," in 39th Hawaii International Conference on System Sciences, 2006.
[14] V. Arya, D. Seetharam, S. Kalyanaraman, K. Dontas, C. Pavlovski, S. Hoy, and J. R. Kalagnanam, "Phase identification in smart grids," in IEEE International Conference on Smart Grid Communications, Brussels, Belgium, 2011, pp. 1–6.
[15] V. Arya, T. Jayram, S. Pal, and S. Kalyanaraman, "Inferring connectivity model from meter measurements in distribution networks," in 4th International Conference on Future Energy Systems, 2013.
[16] H. Pezeshki and H. Wolfs, "Consumer phase identification in a three phase unbalanced LV distribution network," IEEE PES Innovative Smart Grid Technologies, Europe, 2012.
[17] A. Tom, "Advanced metering for phase identification, transformer identification, and secondary modeling," IEEE Transactions on Smart Grid, vol. 4, 2013.
[18] M. H. Wen, R. Arghandeh, A. von Meier, Poolla, and V. O. Li, "Phase identification in distribution networks with micro-synchrophasors," IEEE Power and Energy Society General Meeting, Denver, CO, 2015.
[19] S. Wiel, R. Bent, E. Casleton, and E. Lawrence, "Identification of topology changes in power grids using phasor measurements," Applied Stochastic Models in Business and Industry, vol. 30, no. 6, pp. 740–752, 2014.
[20] S. Bolognani, N. Bof, D. Michelotti, R. Muraro, and L. Schenato, "Identification of power distribution network topology via voltage correlation analysis," in 52nd IEEE Conference on Decision and Control, Florence, Italy, 2013.
[21] T. Erseghe, S. Tomasin, and A. Vigato, "Topology estimation for smart micro grids via powerline communications," IEEE Transactions on Signal Processing, vol. 61, no. 13, pp. 3368–3377, 2013.
[22] A. Rajeswaran and S. Narasimhan, "Network topology identification using PCA and its graph theoretic interpretations," arXiv preprint arXiv:1506.00438, 2015.
[23] P. S. Jayadev, A. Rajeswaran, N. P. Bhatt, and P. Ramkrishna, "A novel approach for phase identification in smart grids using graph theory and principal component analysis," in American Control Conference, Boston, USA, 2016.
[24] I. Jolliffe, Principal Component Analysis, 2nd ed. Springer-Verlag, New York, 2002.
[25] S. Narasimhan and S. Shah, "Model identification and error covariance matrix estimation from noisy data using PCA," Control Engineering Practice, vol. 16, pp. 146–155, 2008.
[26] S. Narasimhan and N. P. Bhatt, "Deconstructing principal component analysis using a data reconciliation perspective," Computers & Chemical Engineering, 2015.
[27] B. Andrasfai, Graph Theory: Flows, Matrices. Akademiai Kiado, Budapest, 1991.
[28] P. Andreas, K. George, and H. Ioannis, "A review of net metering mechanism for electricity renewable energy sources," J. Energy and Environment, vol. 4, pp. 975–1002, 2013.
[29] ANSI C12.20-2010, American National Standard for Electricity Meters, pp. 1–11, 2010.
[30] R. Allan, R. Billinton, I. Sjarief, L. Goel, and K. So, "A reliability test system for educational purposes - basic distribution system data and results," IEEE Transactions on Power Systems, vol. 6, no. 2, pp. 813–820, 1991.
Satya Jayadev P earned his Bachelor's degree in Electrical and Electronics Engineering from Gayatri Vidya Parishad College of Engineering. He is currently a PhD scholar in the Department of Electrical Engineering at IIT Madras, affiliated with the ILDS and Systems & Control groups. His research interests include the analysis, optimization and control of smart electrical grids using tools from machine learning.
Nirav Bhatt earned his Master's degree in Chemical Engineering from IIT Madras, and his Docteur ès Sciences (DSc) from EPFL, Switzerland. He is currently an INSPIRE Fellow with the Systems & Control Group at IIT Madras. His research interests include modeling and identification of network systems, and machine learning and data analysis of engineering and financial systems.
Ramkrishna Pasumarthy obtained his PhD in Systems and Control from the University of Twente, The Netherlands, in 2006. He is currently an Assistant Professor in the Department of Electrical Engineering, IIT Madras, India. His research interests lie in the area of modeling and control of physical systems, infinite dimensional systems and control of computing systems.
Aravind Rajeswaran is a PhD student at the University of Washington, Seattle, affiliated with the EE and CSE departments. For his work on optimization and graph algorithms in the context of computational sustainability, he received the best undergraduate thesis award from IIT Madras. His current research is on intelligent control of robots and animated characters using reinforcement learning and trajectory optimization.
