Quality
Mohammad Azam^a, Fang Tu^a, Yuri Shlapak^a, Thiagalingam Kirubarajan^a, Krishna Pattipati^a* and Rajaiah Karanam^b
a
The deregulation of energy markets, the ongoing advances in communication networks, the proliferation of intelligent
metering and protective power devices, and the standardization of software/hardware interfaces are creating a dramatic shift
in the way facilities acquire and utilize information about their power usage. Currently available power management
systems gather a vast amount of information in the form of power usage, voltages, currents, and their time-dependent
waveforms from a variety of devices (for example, circuit breakers, transformers, energy and power quality meters,
protective relays, programmable logic controllers, motor control centers). What is lacking is an information processing and
decision support infrastructure that turns this voluminous information into usable operational and management knowledge,
so that facilities can manage the health of their equipment and power quality, minimize downtime and outages, and
optimize operations to improve productivity.
This paper considers the problem of capacity and reliability analysis of power systems with very high
availability requirements (e.g., systems providing energy to data centers and communication networks with desired
availability of up to 0.9999999). The real-time capacity and margin analysis helps operators plan for additional loads and
schedule repair/replacement activities. The reliability analysis, based on a computationally efficient sum of disjoint
products, enables analysts to decide the optimum levels of redundancy, and aids operators in prioritizing the maintenance
options for a given budget and in monitoring the system for capacity margin. The resulting analytical and software tool is
demonstrated on a sample data center.
Keywords: Power quality, reliability, capacity margin, sum of disjoint products, maintenance scheduling, residual life.
1. INTRODUCTION
The key factors that play a significant role in reliable, efficient and cost effective power supply to production or service
facilities are as follows:
1.
2. Power quality
3.
4.
Among the factors listed above, the first three are critical to high-tech industries such as semiconductor device fabrication
facilities, Internet data centers and telecommunication switching centers. This is because these industries cannot tolerate
even the slightest interruption of power and are highly sensitive to power quality variations.
When the primary power source (usually a feeder from the power company) fails, emergency/standby sources need to be
online within a specified time window to ensure continued satisfactory operation of a facility. Standby generators and
uninterruptible power supplies (UPS) are the most commonly used secondary sources of power. Superconducting
magnetic energy storage (SMES) is also being incorporated into the existing secondary sources of power supply.
Further author information: (Send correspondence to Krishna Pattipati)
Krishna Pattipati: E-mail: krishna@engr.uconn.edu
Mohammad Azam: E-mail: tinku@engr.uconn.edu
Power quality is the degree to which the utilization and delivery of electrical power affects the performance of electrical
equipment. Any power line disturbance (e.g., sags, spikes, surges, electrical noise, harmonics, brownouts) that affects the
performance of sensitive electronic equipment is related to power quality. As sensitive electronic loads proliferate on
commercial utility grids, the concern over power quality also increases. In modern industrial power systems, devices such as
power line conditioners (PLC), isolation transformers, transient voltage surge suppressors (TVSS), static transfer switches
(STS) and reactors are employed to provide immunity against power quality problems. Due to relatively small power
handling capacity, these devices are deployed only in some specific downstream areas of an industrial power system (e.g.,
servers, semiconductor fabrication equipment).
A system operates in a healthy state when it has sufficient capacity reserve to meet a deterministic criterion such as the
loss of the largest unit. In a marginal state, the system is not in any difficulty, but does not have sufficient margin to meet
the specified criterion. In the at-risk state, the system load exceeds the available capacity. In critical systems, information
on the unused capacity (if any) of an installation is useful in deciding where to place additional equipment in the existing
floor space while keeping the system in a healthy state.
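The three operating states described above can be illustrated with a minimal Python sketch. The function name `classify_state`, the unit ratings, and the use of "loss of the largest unit" as the deterministic criterion are assumptions chosen for illustration; they are not taken from the software described in this paper.

```python
def classify_state(unit_capacities, load):
    """Classify a system as 'healthy', 'marginal', or 'at risk'.

    Assumed deterministic criterion (illustrative): loss of the largest unit.
    """
    total = sum(unit_capacities)
    if load > total:
        return "at risk"      # load exceeds the available capacity
    # Capacity that would remain if the largest unit were lost
    remaining = total - max(unit_capacities)
    if load <= remaining:
        return "healthy"      # the criterion is met with the current load
    return "marginal"         # no difficulty now, but the criterion is not met


units_kw = [500.0, 500.0, 750.0]   # hypothetical unit ratings in kW
for load_kw in (900.0, 1200.0, 1800.0):
    print(load_kw, classify_state(units_kw, load_kw))
```

With these hypothetical ratings, the same installation moves from healthy to marginal to at risk as the load grows, which is the information an operator needs before adding equipment.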
All power users pay energy and demand charges. Users whose consumption patterns fluctuate widely need intelligent
load-scheduling schemes to avoid the penalty for over-peaking. By scheduling standby generators, industries can save on
demand charges while continuing normal operation.
Despite the use of the above schemes, industrial power systems do not attain complete immunity against interruptions and
unhealthy operation. The major causes are the simultaneous failure of both primary and secondary power sources and the
deterioration in equipment performance due to disturbances and aging. A prediction tool capable of forecasting plant
reliability, dynamically scheduling maintenance activities and analyzing capacity margin can significantly enhance the
degree of immunity.
In this paper we focus on the first three problems. We developed an efficient software tool for evaluating the reliability
of a power system. The reliability analysis identifies the components that are likely to jeopardize the stringent availability
requirements. Routine maintenance alone is almost always insufficient to strictly ensure the desired level of system
availability. Here, we propose a dynamic maintenance schedule to satisfy the demand at a particular level of availability at
minimum cost, or alternatively maximize the system availability within a given budget. We also focus on the computation of
excess capacity at different points of the system based on real-time monitored data (taking the power quality information into
account). The excess capacity analysis enables an operator to ascertain the bottlenecks in the system; this aids in ensuring
sufficient capacity margins.
The paper is organized as follows. Section 2 describes the basic methodology for reliability and capacity margin
analysis, and maintenance scheduling. Section 3 applies the methodology to the power system of a data center. Section 4
summarizes the work and proposes future plans.
2. METHODOLOGY
The theoretical approach for reliability and capacity margin analysis is discussed in this section.
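Consider a system of n components, and let the indicator variable $x_i$ describe the state of the ith component:

$$x_i = \begin{cases} 1, & \text{if the ith component is functioning} \\ 0, & \text{if the ith component has failed} \end{cases} \tag{1}$$

We can introduce a function called the structure function $\phi(\mathbf{x})$ to determine whether the system is functioning or not. It is
defined by

$$\phi(\mathbf{x}) = \begin{cases} 1, & \text{if the system is functioning when the state vector is } \mathbf{x} \\ 0, & \text{if the system has failed when the state vector is } \mathbf{x} \end{cases} \tag{2}$$

where $\mathbf{x} = (x_1, \ldots, x_n)$ is the state vector. We assume that the replacement of a failed component with a
functioning one causes no deterioration in the performance of the system, i.e., the system is monotonic. This implies that
$\phi(\mathbf{x})$ is a monotonically increasing function of $\mathbf{x}$; that is, if $x_i \le y_i$ for $i = 1, 2, \ldots, n$, then $\phi(\mathbf{x}) \le \phi(\mathbf{y})$.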
A state vector $\mathbf{x}$ is called a path vector if $\phi(\mathbf{x}) = 1$. If, in addition, $\phi(\mathbf{y}) = 0$ for all $\mathbf{y} < \mathbf{x}$, then $\mathbf{x}$ is said to be a
minimal path vector. If $\mathbf{x}$ is a minimal path vector, then the set $A = \{i : x_i = 1\}$ is called a minimal path set. Thus, a minimal
path set is a minimal set of components whose functioning ensures the functioning of the system. Alternatively, a state
vector $\mathbf{x}$ is called a cut vector if $\phi(\mathbf{x}) = 0$. If, in addition, $\phi(\mathbf{y}) = 1$ for all $\mathbf{y} > \mathbf{x}$, then $\mathbf{x}$ is said to be a minimal cut vector. If
$\mathbf{x}$ is a minimal cut vector, then the set $C = \{i : x_i = 0\}$ is called a minimal cut set. In other words, a minimal cut set is a
minimal set of components whose failure ensures the failure of the system. Minimal cut sets are more suitable for analyzing
a system in the failure space, while minimal path sets are suitable for analysis in the success space. In the following, we will
use the minimal path set approach [7].
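Let $\{A_j\}_{j=1}^{s}$ denote the minimal path sets of a given system. We define $\alpha_j(\mathbf{x})$ by

$$\alpha_j(\mathbf{x}) = \prod_{i \in A_j} x_i \tag{3}$$

so that $\alpha_j(\mathbf{x}) = 1$ if and only if every component of $A_j$ is functioning. A system will function if and only if all the
components of at least one of the minimal path sets are functioning. Consequently,

$$\phi(\mathbf{x}) = \max_j \alpha_j(\mathbf{x}) = \max_j \prod_{i \in A_j} x_i = \begin{cases} 1, & \text{if } \alpha_j(\mathbf{x}) = 1 \text{ for some } j \\ 0, & \text{if } \alpha_j(\mathbf{x}) = 0 \text{ for all } j \end{cases} \tag{4}$$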
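The rule that the system functions when all components of at least one minimal path set function can be written directly as code. The sketch below is illustrative only: the function name `structure_function` and the three-component example system are assumptions, and `min` over 0/1 states is used in place of a product of indicators (the two are equivalent for binary states).

```python
def structure_function(state, min_path_sets):
    """Return 1 if every component of at least one minimal path set is up, else 0.

    state: dict mapping component name to 0 (failed) or 1 (functioning).
    min_path_sets: iterable of sets of component names.
    """
    return max(min(state[c] for c in path) for path in min_path_sets)


# Hypothetical system: component 'a' in series with the parallel pair 'b', 'c'
paths = [{"a", "b"}, {"a", "c"}]
print(structure_function({"a": 1, "b": 0, "c": 1}, paths))  # 1: path {a, c} is intact
print(structure_function({"a": 0, "b": 1, "c": 1}, paths))  # 0: every path needs 'a'
```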
Further, we assume that the state of the ith component, $x_i$, is a random variable such that
$P\{x_i = 1\} = p_i = 1 - P\{x_i = 0\}$. The value $p_i$, which equals the probability that the ith component is functioning, is
called the reliability of the ith component. The reliability of the system, $r$, is defined as

$$r = P\{\phi(\mathbf{x}) = 1\} = E\{\phi(\mathbf{x})\}$$

When the components, i.e., the random variables $x_i$, $i = 1, \ldots, n$, are independent, $r$
can be expressed as a function of the component reliabilities. That is,

$$r = r(\mathbf{p}) \tag{5}$$

where $\mathbf{p} = (p_1, \ldots, p_n)$. The function $r(\mathbf{p})$ is called the reliability function. When the lifetimes of components are
exponentially distributed, the reliability of the ith component at time $t$ (given that it was operational at $t = 0$) can be computed
as $e^{-\lambda_i t}$. Here, $\lambda_i = T_i^{-1}$, where $T_i$ is the mean time to failure (MTTF) of the ith component.
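As a small numerical illustration of the reliability function, the sketch below evaluates the expectation of the structure function by brute-force enumeration of all component states, assuming independent components with exponentially distributed lifetimes. It assumes that $T_i$ is the mean time to failure of the ith component; the function names, the three-component system, and the MTTF values are hypothetical. Enumeration costs $2^n$ evaluations, so it is only practical for small n.

```python
import math
from itertools import product


def component_reliability(t, mttf):
    """Reliability at time t for an exponential lifetime with the given MTTF (assumed T_i)."""
    return math.exp(-t / mttf)


def system_reliability(p, min_path_sets):
    """r(p) = E[phi(x)] for independent components, by enumerating all 2^n states."""
    names = sorted(p)
    r = 0.0
    for bits in product((0, 1), repeat=len(names)):
        state = dict(zip(names, bits))
        # The system works if all components of at least one path set are up
        if any(all(state[c] for c in path) for path in min_path_sets):
            prob = 1.0
            for c in names:
                prob *= p[c] if state[c] else 1.0 - p[c]
            r += prob
    return r


mttf_hours = {"a": 50_000.0, "b": 20_000.0, "c": 20_000.0}   # hypothetical MTTFs
paths = [{"a", "b"}, {"a", "c"}]                              # hypothetical system
t = 8_760.0                                                   # one year, in hours
p = {c: component_reliability(t, m) for c, m in mttf_hours.items()}
print(system_reliability(p, paths))
```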
We can illustrate the concepts of minimal path sets described above via the following example. Suppose the system
consists of eight components arranged as in Fig. 1, where the list of minimal path sets is also shown.
[Figure 1. Example system with components labeled A through H, shown together with its list of minimal path sets.]
(7)
(8)
To compute the value of r, we need to simplify the above expression as the sum of disjoint products (mutually exclusive
events). However, this simplification is an NP-hard problem. In the following, we employ an efficient method for computing
the sum of disjoint products (SDP).
2.1.2 Simplification of the reliability expression
There are several algorithms for obtaining the SDP. Among them, the Abraham algorithm [1], the Abraham-Locks Revised (ALR)
algorithm [2], the Abraham-Locks-Wilson (ALW) algorithm [3] and Klaus Heidtmann's (KDH88) algorithm [4] are efficient for
different types of systems. For example, KDH88 and the Abraham algorithm are very efficient for small-sized networks, ALR
works well with both medium and small-sized networks, while ALW can handle even larger networks (although its
implementation is more complex). Considering both complexity of implementation and speed of operation, we
find that the ALR algorithm works most efficiently for our sample problem (the power system of a data center). The algorithm,
in a concise form, is presented below:
1. Find all the minimal paths and order them according to the following rules:
   a) Order by the size of the term; smaller terms precede larger terms.
   b) For each group of terms of the same size, do lexicographic ordering. For example, abc precedes abd, abd
      precedes bce, etc.
2. Suppose there are altogether s minimal paths, $\{A_1, \ldots, A_s\}$. For $i = 2, \ldots, s$, determine whether $A_i$ is
   disjoint with $A_1, \ldots, A_{i-1}$ by the following steps:
   i) Form a polynomial, where each term is the set of variables in a prior path that are not also in the incumbent.
   ii) Simplify the polynomial by absorption (we denote this polynomial by $AP_j$).
   iii) Invert the simplified form by an iterative rapid minimized inversion procedure. Order the inverted terms
      according to the rules stated in step 1 (we denote this polynomial by $AmP_j$).
   iv) Convert the minimized inverted form into a disjoint polynomial, $AD_j$, using the steps described below:
      a) Form a polynomial consisting entirely of 0-valued variables, where each term of this polynomial consists
         only of those variables that (A) are in a prior sister term of the inverted form, and (B) are not included
         in the incumbent term of the inverted form.
         Invert and minimize (all variables are now 1-valued); we used Shier and Whited's SW 3 method [5] for fast
         inversion.
      d) If the minimized inverted form consists of 1 term, continue; else, if it contains factors of the form
         $x + y + z$, etc., put these factors into disjoint form, i.e., $x + \bar{x}y + \bar{x}\bar{y}z$, etc.
      e) Multiply the factors obtained in step (d) by the incumbent term from the inverted form and augment the
         disjoint subpolynomial. When all terms of the inverted form for the incumbent minimal path have been
         processed, proceed to step (v) of the main algorithm.
   v) Multiply each term of the disjoint polynomial by all of the 1-valued variables of the incumbent path.
   vi) Augment the system polynomial.
   vii) Go to the next path set that is found to be non-disjoint in step 1 and start over from step 2(i).

The ALR algorithm requires that the indicator variables $x_i$ be represented by letters. So, from here onwards, $x_A$ will be
denoted as $A$, $x_B$ will be denoted as $B$, etc. Hence, we need to rewrite the expression for r in (7) as

$$r = E\{\phi(\mathbf{x})\} = E\{\max(BFH, BGH, ACEH, ACFH, ADFH, ADGH, BDFH, BDGH, ACEFH)\}$$
For the example in Fig. 1, the ALR algorithm works as follows:
Term 1: $A_1 = BFH$
Labeled as disjoint according to step 1 (as this is the first term of the ordered path set).
Term 2: $A_2 = BGH$
Not disjoint with $A_1$. Step 2(i) gives rise to the polynomial $AP_2 = F$, which absorption in step 2(ii) leaves unchanged.
After performing inversion (as stated in step 2(iii)), we obtain $AmP_2 = \bar{F}$. Since no ordering is required here, we
perform step 2(iv) next and determine $AD_2 = \bar{F}$. Step 2(v) results in the disjoint term $B\bar{F}GH$.
Term 3: $A_3 = ACEH$
Not disjoint with $(A_1, A_2)$. Step 2(i) gives rise to the polynomial $AP_3 = BF + BG$, which absorption in step 2(ii) leaves
unchanged. After performing inversion (as stated in step 2(iii)), we obtain $AmP_3 = \bar{B} + \bar{F}\bar{G}$. Again, since no
ordering is required here, we perform step 2(iv) next and determine $AD_3 = \bar{B} + B\bar{F}\bar{G}$. Step 2(v) results in the
disjoint terms $A\bar{B}CEH$ and $ABCE\bar{F}\bar{G}H$.
Performing disjoint operations for the rest of the path sets, we finally obtain
( x ) = BFH + B FGH + A BCEH + ABCE F GH + A BC E FH + ABC DFH + ABC D EGH + ABC DE FGH
+ ABDF GH + A BD FGH + A BC DEF GH
from which system reliability r can be computed directly.
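To make the idea of a sum of disjoint products concrete, the following Python sketch builds disjoint terms for the nine path sets listed in this example and checks the resulting reliability against brute-force enumeration. It is not the ALR procedure: it uses a much simpler single-variable inversion, expanding the complement of a product $r_1 r_2 \cdots$ as $\bar{r}_1 + r_1\bar{r}_2 + r_1 r_2\bar{r}_3 + \cdots$, so it may produce more terms than ALR would. The function names and the component reliability of 0.95 are assumptions for illustration.

```python
from itertools import product


def sdp_terms(paths):
    """Return disjoint terms (pos, neg): pos variables are up, neg variables are down."""
    disjoint = []
    for j, path in enumerate(paths):
        # Start from the incumbent path and make it disjoint with every prior path
        terms = [(frozenset(path), frozenset())]
        for prior in paths[:j]:
            updated = []
            for pos, neg in terms:
                rest = sorted(set(prior) - pos)   # prior-path variables not yet forced up
                if not rest:
                    continue                      # term implies the prior path: drop it
                if neg & set(rest):
                    updated.append((pos, neg))    # already disjoint with the prior path
                    continue
                # not(r1 r2 ...) = ~r1 + r1 ~r2 + r1 r2 ~r3 + ...
                for k, var in enumerate(rest):
                    updated.append((pos | frozenset(rest[:k]), neg | {var}))
            terms = updated
        disjoint.extend(terms)
    return disjoint


def term_probability(pos, neg, p):
    """Probability of one product term for independent components."""
    prob = 1.0
    for c in pos:
        prob *= p[c]
    for c in neg:
        prob *= 1.0 - p[c]
    return prob


def brute_force(paths, p):
    """Reference value: enumerate all component states and sum the working ones."""
    names = sorted(p)
    total = 0.0
    for bits in product((0, 1), repeat=len(names)):
        state = dict(zip(names, bits))
        if any(all(state[c] for c in path) for path in paths):
            prob = 1.0
            for c in names:
                prob *= p[c] if state[c] else 1.0 - p[c]
            total += prob
    return total


paths = ["BFH", "BGH", "ACEH", "ACFH", "ADFH", "ADGH", "BDFH", "BDGH", "ACEFH"]
p = {c: 0.95 for c in "ABCDEFGH"}   # hypothetical component reliabilities
terms = sdp_terms(paths)
r_sdp = sum(term_probability(pos, neg, p) for pos, neg in terms)
print(r_sdp, brute_force(paths, p))
```

Because the terms are mutually exclusive, their probabilities simply add, which is what makes the disjoint form convenient once it has been obtained. For the first three paths this simple scheme yields the same disjoint terms as the worked example.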
Rapid inversion is simultaneous multiplication and inversion. Every term of the product is a Boolean product of a term T of the
multiplicand, which is a Boolean product of the inverses of several variables, by the inverse V of a single variable. When T
contains V, the T-by-V product is T, and the term-by-V products for all the terms of the multiplicand that differ from T in that
they contain all of the variables of T except V are absorbed into T. For example, let the multiplicand be
$P = T + \bar{A}\bar{B}\bar{D} + \bar{B}\bar{C}\bar{G} + \bar{B}\bar{D}\bar{E}\bar{F}$, the variable $V = \bar{A}$ and the term $T = \bar{A}\bar{B}\bar{C}$. Then the resultant product is
$P \cdot V = T + \bar{A}\bar{B}\bar{D}$.
1. Minimum-cost schedule for a required availability $a_{min}$:
   i) Replace the ith component, $x_i$ ($i = 1, \ldots, n$), and compute the system availability $a$. If $a > a_{min}$, then insert $x_i$ in
      a successful solution set, $S_{success}$.
   ii) Replace pairs of components $x_j$ and $x_k$, $k \ne j$, that are not already in $S_{success}$ ($j = 1, \ldots, n$; $k = 1, \ldots, n$).
      Compute the system availability. If the current $a > a_{min}$, then include $\{x_j, x_k\}$ in the successful solution set, $S_{success}$.
   iii) Continue to increase the number of components to be replaced and determine all the elements of $S_{success}$.
   iv) Compute the cost to implement each successful solution and find the one with minimum cost.
2. Maximum-availability schedule for a given budget:
   i) From the set of components $x_1, \ldots, x_n$, take m components at a time to form the candidate replacement
      combinations.
   ii) Find the combinations with replacement cost less than or equal to the specified budget and compute availability with
      replacement of each combination at $t = t_c$. Find the combination that results in maximum availability and label it as
      the best solution.
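The two maintenance-scheduling searches (minimum cost for a required availability, and maximum availability within a budget) can be sketched as exhaustive searches over small replacement sets. This is a simplified illustration and not the tool's implementation: the availability model `availability_after` (a series system whose replaced components are reset to 0.9999), the component names, the costs, and the limit `m` on the number of simultaneous replacements are all assumptions.

```python
from itertools import combinations


def availability_after(replaced, base):
    """Hypothetical series-system availability; replaced components are reset to 0.9999."""
    a = 1.0
    for name, value in base.items():
        a *= 0.9999 if name in replaced else value
    return a


def min_cost_plan(base, cost, a_min, m=2):
    """Cheapest set of at most m replacements giving availability > a_min, or None."""
    best = None
    for size in range(1, m + 1):
        for combo in combinations(sorted(base), size):
            if availability_after(combo, base) > a_min:
                total = sum(cost[x] for x in combo)
                if best is None or total < best[1]:
                    best = (combo, total)
    return best


def max_availability_plan(base, cost, budget, m=2):
    """Most available set of at most m replacements costing no more than budget, or None."""
    best = None
    for size in range(1, m + 1):
        for combo in combinations(sorted(base), size):
            if sum(cost[x] for x in combo) <= budget:
                a = availability_after(combo, base)
                if best is None or a > best[1]:
                    best = (combo, a)
    return best


base = {"gen": 0.990, "ups": 0.995, "swgr": 0.999}   # hypothetical availabilities
cost = {"gen": 50.0, "ups": 30.0, "swgr": 10.0}       # hypothetical replacement costs
print(min_cost_plan(base, cost, a_min=0.993))
print(max_availability_plan(base, cost, budget=60.0))
```

The number of candidate sets grows combinatorially with m, which is one practical reason to keep the number of simultaneous replacements small.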
In practical situations, the budget allocated for dynamic maintenance seldom covers the replacement of more than a few
components at a time. Consequently, we set m = 2 as the default. However, the software is able to handle higher values of m.

[Figure: block diagram of the tool. Recoverable block labels: Power System; Monitored Data; Sensors/Component State
Evaluators; Component specifications, System parameters; Theoretical Reliability Predictor. A legend marks partially
implemented blocks.]

1. Sensors/Component State Evaluator: This block translates the monitored data into useful information for evaluating
   and estimating the states of components in the system. By using signal processing techniques and making
   comparisons with the conditions expected/specified at different levels of performance, useful information concerning
   the residual lifetime of a component is extracted here [6].
2. Lifetime Data Modifier: Theoretical evaluation of reliability is solely based on the components' MTTF data. Since
   the real operating conditions of a component differ from the assumed ones, discrepancies arise between specified and true
   MTTFs. By using the historical data on reduction of lifetime related to the deterioration of performance and comparing
   it to the processed information on component state, better predictions of MTTFs can be made. This block will be
   implemented by making use of neural network techniques [6].
3. Theoretical Reliability Predictor: Based on the system component specifications and residual lifetime predictions,
   this module predicts the theoretical reliability of the system. It also provides information on fault propagation in the
   system.
4. Availability Evaluation & Maintenance Scheduling Block: This block computes the availability of the system
   projected at different instants in the future and also proposes alternative solutions for maintenance scheduling.
5. Comparison and Best Solution Evaluation Block: After generating a series of alternative solutions, this block selects
   the one that provides the maximum availability under the specified maintenance budget, or the one that ensures a given
   level of availability at minimum cost.
After deployment at a particular site, the accuracy of lifetime data modifier block improves with usage. Implementation of
this block should enhance the performance of the reliability predictor.
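Consider a unit $U_i$ with $j_i$ incoming lines $\{L^i_{in_q}\}_{q=1}^{j_i}$ and $k_i$ outgoing lines $\{L^i_{out_m}\}_{m=1}^{k_i}$. Let the units connected to the
incoming and outgoing lines be $\{U^i_{in_q}\}_{q=1}^{j_i}$ and $\{U^i_{out_m}\}_{m=1}^{k_i}$.
Let the corresponding rated capacities of the units connected to incoming and outgoing lines be
$\{C^i_{in_q}\}_{q=1}^{j_i}$ and $\{C^i_{out_m}\}_{m=1}^{k_i}$, respectively. Let the measured voltages, currents and power factors on the outgoing lines be
$\{V^i_{out_m}, I^i_{out_m}, pf^i_{out_m}\}_{m=1}^{k_i}$.

[Figure: unit $U_i$ with incoming lines $L^i_{in_1}, \ldots, L^i_{in_{j_i}}$ from units $U^i_{in_1}, \ldots, U^i_{in_{j_i}}$ and outgoing lines
$L^i_{out_1}, \ldots, L^i_{out_{k_i}}$ to units $U^i_{out_1}, \ldots, U^i_{out_{k_i}}$.]

The power delivered to $U^i_{out_m}$ from $U_i$ is $P^i_{out_m} = V^i_{out_m} \cdot I^i_{out_m} \cdot pf^i_{out_m}$. Hence, the total power output from $U_i$ is

$$P^i_o = \sum_{m=1}^{k_i} P^i_{out_m}$$

The sum of the capacities of the input units is given by

$$C^i_{in} = \sum_{q=1}^{j_i} C^i_{in_q}$$

The excess capacity of $U_i$ can be evaluated as

$$C_{Excess_i} = \min(C_i, C^i_{in}) - P^i_o$$

where $C_i$ is the rated capacity of $U_i$.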
In order to improve reliability, redundant components are added to a power system. Typically, one finds four different
types of redundancy situations:
1) A unit has no redundancy (represented by O, "Operating alone").
2) A unit has redundancy and the redundant units are operational and are sharing load (represented by O/S,
   "Operating and Sharing").
3) A unit has redundancy, but the redundant units are not operational and require a specified amount of time to be
   brought into operation (represented by R/C, "Redundant Cold Spare").
4) A unit has redundancy and the redundant units can be readily brought into operation (represented by R/H,
   "Redundant Hot Spare").
For case 1, the expression for $C_{Excess_i}$ can be readily used. If we encounter the O/S type of redundancy, then we need to run a
load flow (power flow) analysis on the system to determine the exact amount of power being handled by the unit and to
compute $C_{Excess_i}$ for each unit. For cases 3 and 4, the excess capacities of the redundant units are zero. The capacity margin of
the system is evaluated as

$$C_{margin}^{system} = \min_i \left( C_{Excess_i} \right)$$

The capacity margin of a system is a dynamic performance indicator; it
is evaluated from the monitored real-time data and is vital for the well-being analysis of a system.
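The excess-capacity and capacity-margin calculations can be illustrated with a short Python sketch for units operating alone (case 1). It assumes the excess capacity of a unit is the smaller of its own rating and the summed ratings of its upstream units, minus the power it delivers, with source nodes limited only by their own rating. The function names, the three-unit example, the numeric ratings (in VA) and the alarm limit are hypothetical.

```python
def output_power(lines):
    """Total output power: sum of V * I * pf over the outgoing lines."""
    return sum(v * i * pf for v, i, pf in lines)


def excess_capacity(rated, input_capacities, lines, is_source=False):
    """Excess capacity of one unit (assumed form, units operating alone).

    Sources are limited by their own rating only; other units are limited by the
    smaller of their own rating and the summed ratings of the upstream units.
    """
    limit = rated if is_source else min(rated, sum(input_capacities))
    return limit - output_power(lines)


def capacity_margin(excess_by_unit, alarm_limit):
    """Return (bottleneck unit, system margin, alarm flag)."""
    bottleneck = min(excess_by_unit, key=excess_by_unit.get)
    margin = excess_by_unit[bottleneck]
    return bottleneck, margin, margin < alarm_limit


# Hypothetical measurements: (voltage in V, current in A, power factor) per outgoing line
excess = {
    "utility": excess_capacity(150_000.0, [], [(480.0, 150.0, 0.9)], is_source=True),
    "ups": excess_capacity(100_000.0, [150_000.0], [(480.0, 100.0, 0.9), (480.0, 50.0, 0.8)]),
    "pdu": excess_capacity(50_000.0, [100_000.0], [(208.0, 100.0, 0.95)]),
}
print(excess)
print(capacity_margin(excess, alarm_limit=20_000.0))
```

The unit with the smallest excess is the bottleneck, and comparing the margin with a specified limit is one simple way to raise an alarm before the system leaves the healthy state.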
The above stated procedure gives satisfactory results only under ideal operating conditions, i.e., under the situation
where the power quality is excellent and the devices experience no derating phenomena. In practice, one needs to consider
the effects of harmonics and decrease in capacity of devices due to non-ideal operating conditions. These are currently being
incorporated into our software.
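[Figure: flowchart of the capacity margin analysis. Recoverable labels: monitored power data; modification of monitored
data; trend data; load flow analysis; decision "Is it an in node (source)?". If yes, the marginal capacity of node i is
$C_i - P_o^i$; otherwise it is $\min(C_i, \sum_j C_{ji}) - P_o^i$.]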
acquired. This module also identifies the device (component) having minimum capacity margin. If this capacity margin falls
below a specified limit, which is essential for a healthy system, then an alarm is triggered.
** We are in the process of implementing this block. At present, monitored data is fed directly to the Load Flow sub-module.
The Load Flow sub-module has been designed with provisions to choose among the Newton-Raphson, Gauss-Seidel and Fast
Decoupled computation methods.
[Figure 5 content. Recoverable labels: 480 VAC utility feeds A and B; 480 VAC generators and generator switchgear; for each
of System A and System B, utility distribution switchgear, computer switchgear, mechanical switchgear with MCCs,
UPS/bypass with battery, UPS switchboard and critical switchboard; maintenance switchgear; load bank; static switch;
PDUs; RPPs; dual-corded and single-corded servers.]
Figure 5. One line diagram of a data center with 480VAC primary feed
there is a wide diversity in the power handling capacities of these devices due to various derating phenomena. Along with the
monitored data, some deterministic models (e.g., for reduction of power handling capacity with aging, effect of harmonics
on power handling capacity, etc.) are needed to determine the excess capacity of the system.
3.2 Evaluation of Reliability/Availability and Excess Capacity of Power System of a Data Center
3.2.1 Reliability/Availability evaluation and maintenance scheduling
The software enables the graphical entry of the one line diagram of a system in terms of nodes (corresponding to devices) and
edges (representing signal/power carrying lines) as well as the specifications of components (both nodes and edges) through
multiple dialog boxes. Fig. 6.1 shows the one line diagram of a power system, while Fig. 6.2 shows how a user can define the
parameters for reliability improvement and maintenance scheduling.
Figure 6.1. User Interface (One line diagram of data center power system)
The user can define lifetime distributions of components (the default is exponential; normal and Weibull options are also
available) and distribution parameters for reliability/availability evaluation. For maintenance scheduling, the user can define
the component replacement cost, replacement budget, reliability lower bound, and desired reliability improvement. In addition,
the user can insert faults in the system to simulate system operation under faulty conditions.
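For reference, the reliability (survival) functions of the three lifetime distributions mentioned above can be written in a few lines of Python. This is a generic sketch using the standard textbook forms, not code from the tool; the function and parameter names are assumptions.

```python
import math


def reliability_exponential(t, mttf):
    """R(t) = exp(-t / MTTF) for an exponentially distributed lifetime."""
    return math.exp(-t / mttf)


def reliability_weibull(t, scale, shape):
    """R(t) = exp(-(t / scale)^shape) for a Weibull distributed lifetime."""
    return math.exp(-((t / scale) ** shape))


def reliability_normal(t, mean, std):
    """R(t) = 1 - Phi((t - mean) / std), computed with the complementary error function."""
    return 0.5 * math.erfc((t - mean) / (std * math.sqrt(2.0)))


# Hypothetical parameters, in hours
print(reliability_exponential(8_760.0, 50_000.0))
print(reliability_weibull(8_760.0, 50_000.0, 1.5))
print(reliability_normal(8_760.0, 40_000.0, 10_000.0))
```

A Weibull shape parameter of 1 reduces to the exponential case, which gives a quick consistency check when switching between distributions.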
Figures 7.1-7.2 show reliability/availability under maintained and non-maintained conditions
for the specifications given in Fig. 6.2. From Fig. 7.1, it can be seen that the components indexed 3 and 10 (i.e., Gen. A and
the maintenance switchgear, respectively) in the block diagram were replaced (i.e., maintenance was performed) when the
system reliability/availability had fallen below the specified lower bound. Fig. 7.2 simulates the situation where no
maintenance is scheduled.
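The qualitative behavior of the maintained and non-maintained cases can be reproduced with a toy simulation. The sketch below is not the tool's scheduling logic: it assumes a series system with exponential lifetimes, measures each component's reliability from its last renewal, and renews the least reliable component whenever the system reliability falls below the lower bound. The function name, the two components, and all numeric values are hypothetical.

```python
import math


def simulate(mttf, lower_bound, horizon, step):
    """Series system; renew the weakest component when reliability drops below the bound."""
    last_renewal = {c: 0.0 for c in mttf}
    history, replacements = [], []
    n_steps = int(round(horizon / step))
    for k in range(n_steps + 1):
        t = k * step
        # Component reliability measured from its last renewal time
        rel = {c: math.exp(-(t - last_renewal[c]) / mttf[c]) for c in mttf}
        system = math.prod(rel.values())
        if system < lower_bound:
            weakest = min(rel, key=rel.get)
            last_renewal[weakest] = t            # replacement renews the component
            replacements.append((t, weakest))
            rel[weakest] = 1.0
            system = math.prod(rel.values())
        history.append((t, system))
    return history, replacements


mttf_hours = {"gen": 1_000.0, "swgr": 5_000.0}   # hypothetical MTTFs
maintained, events = simulate(mttf_hours, lower_bound=0.95, horizon=200.0, step=10.0)
unmaintained, _ = simulate(mttf_hours, lower_bound=0.0, horizon=200.0, step=10.0)
print(events)
print(maintained[-1], unmaintained[-1])
```

Setting the lower bound to zero disables replacement, which corresponds to the non-maintained case.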
3.2.2 Excess capacity computation
Excess capacity computation requires some component specifications from the user and real-time data from different
monitoring devices. The user needs to specify different line and bus parameters (e.g., line resistance and reactance, half line
susceptance, line tap setting, bus type and min/max Mvar for the generation buses). Bus voltage, current, power factor, load
MW and Mvar, generator MW and Mvar, and injected Mvar values are initialized with data obtained from the monitoring
system. Node ratings, such as rated voltage, rated current, rated capacity and desired capacity margin, are to be specified by
the user. The computed excess capacity values appear on screen below the corresponding nodes and present the user with an
overall picture of the excess capacity of the system.
This performance was registered on a PC with an Intel Pentium III 667 MHz processor.
REFERENCES
[1] J. A. Abraham, "An improved algorithm for network reliability," IEEE Trans. Reliability, vol. R-28, 1979, pp. 58-61.
[2] M. O. Locks, "A minimizing algorithm for sum of disjoint products," IEEE Trans. Reliability, vol. R-36, 1987, pp. 445-453.
[3] J. M. Wilson, "An improved minimizing algorithm for sum of disjoint products," IEEE Trans. Reliability, vol. 39, 1990,
pp. 42-45.
[4] K. D. Heidtmann, "Smaller sums of disjoint products by subproduct inversion," IEEE Trans. Reliability, vol. 38, 1989,
pp. 305-311.
[5] D. R. Shier and D. E. Whited, "Algorithms for generating minimal cut sets by inversion," IEEE Trans. Reliability,
vol. R-34, 1985, pp. 314-319.
[6] Fang Tu, "Signal processing & neural network toolbox and its application to failure diagnosis and prognosis," SPIE
Conference on Fault Diagnosis, Prognosis and System Health Management, Orlando, Florida, April 2001.
[7] Sheldon M. Ross, Introduction to Probability Models, New York: Academic Press, 1980, pp. 477-486.
[8] Hadi Saadat, Power System Analysis, WCB/McGraw-Hill Book Company, New York, 1999, ch. 6, pp. 189-240.
[9] Alexander Kusko, Emergency Standby Power Systems, McGraw-Hill Book Company, New York, 1989.
[10] Math H. J. Bollen, Understanding Power Quality Problems, IEEE Press, New Jersey, 2000.
[11] W. Edward Reid, "Power quality issues: standards and guidelines," IEEE Trans. Industry Applications, vol. 32, 1996,
pp. 625-629.
[12] IEEE Recommended Practice for the Design of Reliable Industrial and Commercial Power Systems (The Gold Book),
IEEE Standard 493, 1990.
[13] A. Arsoy, S. M. Halpin, Y. Liu and P. F. Ribeiro, Modeling and Simulation of Power System Harmonics, IEEE product #
EC 102 (CD-ROM), 1999.