
Performance Evaluation of Counter-Based Dynamic Load Balancing Schemes for Massive Contingency Analysis with Different Computing Environments

Yousu Chen, Member, IEEE, Zhenyu Huang, Senior Member, IEEE, and Daniel Chavarría-Miranda

Abstract—Contingency analysis is a key function in the Energy Management System (EMS) to assess the impact of various combinations of power system component failures based on state estimation. Contingency analysis is also extensively used in power market operation for feasibility tests of market solutions. High performance computing holds the promise of faster analysis of more contingency cases for the purpose of safe and reliable operation of today's power grids with less operating margin and more intermittent renewable energy sources. This paper evaluates the performance of counter-based dynamic load balancing schemes for massive contingency analysis under different computing environments. Insights from the performance evaluation can be used as guidance for users to select suitable schemes in the application of massive contingency analysis. Case studies, as well as MATLAB simulations, of massive contingency cases using the Western Electricity Coordinating Council power grid model are presented to illustrate the application of high performance computing with counter-based dynamic load balancing schemes.

Index Terms—Contingency Analysis, Energy Management System, Parallel Computing, Dynamic Load Balancing.

I. INTRODUCTION

Contingency analysis is a key function in the Energy Management System (EMS), which assesses the ability of the power grid to sustain various combinations of power grid component failures based on state estimates. Because of the heavy computation involved, today's contingency analysis can be updated only every few minutes for only a selected set of "N-1" contingency cases (i.e., failures of one component). For example, the EMS system at the Bonneville Power Administration (BPA), one of the well-maintained systems, runs 500 contingency cases in a time interval of five minutes. Though this has been common industry practice, analysis based on limited "N-1" cases may not be adequate to assess the vulnerability of today's power grids due to new developments in power grid and market operations.

The trend of operating power grids closer to their capacity and integrating more and more intermittent renewable energy demands faster analysis of massive contingency cases to safely and reliably operate today's power grids. As electricity demand continues to grow, the personnel operating and managing the power grid are facing new fundamental challenges.

One of the new challenges in power grid operation is the separation of administrative boundaries, called Balancing Areas (or BAs), which own, operate, and/or manage their own areas of the grid. When performing contingency analysis, each BA looks no further than its own boundaries. The case of contingencies occurring simultaneously in multiple BAs, which would likely have a very large system-wide impact, is not considered. This requires "N-x" contingency analysis, i.e., analysis of the simultaneous occurrence of multiple contingencies in multiple BAs.

Another challenge is that, as current power grid operation is closer to the edge of stability, the consequence could be massive blackouts resulting in significant disruption of electricity supplies and economic losses [1][2]. Power grid blackouts often involve the failure of multiple elements, as revealed in recent examples. Preventing and mitigating blackouts requires "N-x" contingency analysis. The North American Electric Reliability Corporation (NERC) is moving to mandate contingency analysis from "N-1" to "N-x" in its grid operation standards [3]. All this calls for a massive number of contingency cases to be analyzed. As an example, the Western Electricity Coordinating Council (WECC) system has about 20,000 elements. Full "N-1" contingency analysis constitutes 20,000 cases, "N-2" is roughly 10^8 cases, and the number increases exponentially with "N-x".

Earlier high-performance computing (HPC) applications to power system problems such as state estimation and contingency analysis have achieved promising results [5][6][7]. The authors' previous work [4] established the framework of "N-x" parallel massive contingency analysis with a dynamic load balancing scheme using a single counter, illustrated by case studies of massive 300,000-contingency-case analysis using the WECC power grid model. The scalability of dynamic load balancing schemes with a large number of processors is likely to be limited by counter congestion.
978-1-4244-6551-4/10/$26.00 ©2010 IEEE
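The combinatorial growth of "N-x" case counts described above can be checked directly. A short sketch (the ~20,000-element count is the paper's approximate WECC figure):

```python
from math import comb

# Approximate element count for the WECC system (per the paper, ~20,000).
N_ELEMENTS = 20_000

def num_contingency_cases(n_elements: int, x: int) -> int:
    """Number of distinct N-x contingency cases: ways to choose x failed elements."""
    return comb(n_elements, x)

print(num_contingency_cases(N_ELEMENTS, 1))  # 20,000 "N-1" cases
print(num_contingency_cases(N_ELEMENTS, 2))  # ~2 x 10^8 "N-2" cases
```

For x = 2 this already approaches 2 x 10^8 cases, which is why exhaustive "N-x" analysis quickly becomes infeasible without HPC and contingency screening.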



In this paper, we continue to investigate counter congestion management with multiple-counter dynamic load balancing schemes under different computing environments and provide guidance for users to select suitable schemes to meet their own needs.

This paper starts with an introduction of counter-based dynamic load balancing schemes in Section II and an overview of two main network environments, InfiniBand and Ethernet, in Section III, followed by Section IV on the MATLAB simulation results for multi-counter based dynamic load balancing schemes. Section V presents the actual case studies and performance analysis using an HP cluster computer, NWICEB. Section VI provides guidance for users to select suitable schemes and discusses relevant issues on contingency selection and decision support capabilities in the context of massive contingency analysis. Section VII concludes the paper with suggested future work.

II. COUNTER-BASED DYNAMIC LOAD BALANCING SCHEME

Contingency analysis is inherently a parallel process because multiple contingency cases can be easily divided onto multiple processors and communication between different processors is minimal. The data access is homogeneous. Therefore, cluster-based parallel machines are well suited for contingency analysis. For the same reason, the challenge in parallel contingency analysis is not in the low-level algorithm parallelization but in the computational load balancing (task partitioning) to achieve evenness of execution time across multiple processors.

The framework of parallel contingency analysis is shown in Figure 1 [4]. Each contingency case is essentially a power flow solution. In our investigation, the full Newton-Raphson power flow solution is implemented. Given a solved base case, each contingency updates its admittance matrix with an incremental change from the base case. One processor is designated as the master processor (Proc 0 in Figure 1) to manage case allocation and load balancing, in addition to running contingency cases.

Proc 0:
(1) Distribute base case Y0 matrix
(2) Perform load balancing (static/dynamic)
(3) Distribute case information to other processors
(4) Perform contingency analysis

Other Proc's (Proc 1, Proc 2, ..., Proc N):
(1) Update Y matrix based on case information: Y = Y0 + ΔY
(2) Perform contingency analysis

Figure 1 Framework of parallel massive contingency analysis

There are two main categories of load balancing schemes: static load balancing and dynamic load balancing. The static load balancing scheme pre-allocates an equal number of cases to each processor, while the dynamic load balancing scheme allocates tasks to processors based on processor availability. With the static load balancing scheme, the overall computational efficiency is determined by the longest execution time of the individual processors. Hence, the computational power is not fully utilized, as many processors are idle while waiting for the last task to finish. With the dynamic load balancing scheme, the computation time on each processor is optimally equalized.

Figure 2 shows the performance comparison of static and dynamic computational load balancing schemes with WECC full N-1 contingency cases (17,346 cases). Clearly, dynamic load balancing has better linear scalability. Figure 3 shows the evenness of execution time with different load balancing schemes on 32 processors, where the variation of execution time with the dynamic load balancing scheme is much smaller than that of its counterpart.

Figure 2 Performance comparison of static and dynamic computational load balancing schemes with WECC full N-1 contingency cases (speedup vs. number of processors, 14,000-bus WECC full N-1 analysis)

Figure 3 Evenness of execution time for WECC full N-1 contingency analysis with different computational load balancing schemes (execution time per processor on 32 processors; average 834 sec)

The speedup performance of the dynamic load balancing scheme can be estimated using the following equation:

Speedup = N_P * (t_c + t_io) / (t_c + t_io + t_cnt + (N_P - 1) * t_w / 2)    (1)

where N_P is the number of processors, t_cnt is the counter updating time, t_w is the waiting time due to counter congestion, and t_c and t_io are the average computation time and I/O time, respectively. In order to improve speedup performance, both the counter updating time t_cnt and the waiting time t_w need to be reduced.
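Equation (1) can be evaluated directly to explore this trade-off. A minimal sketch with hypothetical timing values (the per-case, I/O, counter, and wait times below are illustrative, not measured):

```python
def estimated_speedup(n_p: int, t_c: float, t_io: float,
                      t_cnt: float, t_w: float) -> float:
    """Speedup model of Eq. (1):
    N_P * (t_c + t_io) / (t_c + t_io + t_cnt + (N_P - 1) * t_w / 2)."""
    return n_p * (t_c + t_io) / (t_c + t_io + t_cnt + (n_p - 1) * t_w / 2.0)

# Illustrative values: 0.5 s computation and 0.05 s I/O per case,
# 0.1 ms counter update, 0.1 ms average wait per congestion event.
for n_p in (32, 128, 512):
    print(n_p, round(estimated_speedup(n_p, 0.5, 0.05, 1e-4, 1e-4), 1))
```

The (N_P - 1) * t_w / 2 term in the denominator grows with the processor count, which is the analytical reason counter congestion eventually limits scalability.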

The counter update time t_cnt is mainly determined by the network bandwidth and speed. Minimizing t_cnt usually means selecting a high-performance network connection between processors. The waiting time t_w is due to counter congestion. Though more processors would improve the speedup performance, they also increase the possibility of counter congestion, as shown in (1). Counter congestion will occur when multiple requests arrive at the same time. Therefore, the scalability of dynamic load balancing schemes is likely to be limited by counter congestion.

In order to better manage counter congestion, we propose a multi-counter based dynamic load balancing scheme with task stealing. The framework of this new scheme is illustrated in Figure 4 with two counters. An equal number of cases is pre-allocated to each of the two counter groups. Each group has its own counter (Proc 0 in Figure 4). Inside each group, the dynamic load balancing scheme is applied based on the availability of processors. When the pre-allocated tasks are finished in one group, the counter in this group "steals" tasks from the other group to continue the computation until all tasks are done. By implementing the multi-counter dynamic load balancing scheme, counter congestion can be reduced, and further speedup is expected.

Figure 4 Framework of multi-counter based dynamic load balancing scheme with task stealing

The cost of minimizing counter congestion using multi-counter schemes is the overhead of managing multiple counters. Even though additional counters can reduce counter congestion, it is possible that this overhead would compromise the benefit gained by reducing counter congestion. Therefore, it is important to evaluate the performance of these load balancing schemes and determine under what conditions the multi-counter scheme has superior performance over the single-counter scheme.

III. NETWORK ENVIRONMENT COMPARISONS

As stated earlier, minimizing the counter updating time t_cnt usually means choosing a high-performance network connection among processors. However, due to cost issues, PC-based Ethernet networks dominate the current utility control center environment. In order to better understand the effect of high-performance networks, it is important to study the performance of multi-counter dynamic load balancing schemes under different network environments.

The main network properties are latency and bandwidth. In this paper, we are interested in the latency and bandwidth of two common networks: 1GB/Sec Ethernet and InfiniBand. The typical values of latency and bandwidth for these two networks are listed in Table 1. The latency of the InfiniBand network is approximately 1/20 of that of Ethernet.

TABLE 1 TYPICAL VALUES OF LATENCY AND BANDWIDTH IN 1GB/SEC ETHERNET AND INFINIBAND NETWORKS

                   1GB/Sec Ethernet    InfiniBand
Latency (μSec)     ~30                 ~1-2
Bandwidth          100~200 MB/Sec      10GB/Sec*

* InfiniBand is a type of communications link between processors and I/O devices that offers throughput of up to 2.5 GB/Sec, and it can achieve 10GB/Sec or higher bandwidth through double-rate and quad-rate techniques.

IV. MATLAB SIMULATION RESULTS

Before the actual case studies are conducted, MATLAB simulations are performed to predict the performance of multi-counter schemes under different simulated computing environments. The advantage of using simulated environments is the flexibility of studying different configurations and reducing the time spent implementing the schemes on actual parallel computers.

The main factors that could affect speedup include: (a) the number of processors, N_P; (b) the number of cases, N; (c) the latency of the network communication; (d) the counter updating time; and (e) the bandwidth of the network. Since the effects of factors (c), (d), and (e) are equivalent in terms of their contribution to the total time, these factors can be treated as one term, t_cnt, for the purpose of the MATLAB simulation. In order to study the sensitivities of the factors N_P, N, and t_cnt, two sets of simulations are studied with the number of processors ranging from 2^0 up to 2^10: (a) different numbers of cases, N = 500, 5000, 20000, with respect to different numbers of processors; and (b) different values of t_cnt, t_cnt = 0.0001, 0.002, and 0.005, with respect to different numbers of processors.

In order to make the simulation data closer to the actual computational time, the actual computational time for WECC full N-1 contingency analysis is studied based on its histogram and is then used in the MATLAB simulation studies.

Figure 5 shows the speedup performance with different numbers of cases. t_cnt is fixed at 0.0001 for this set of simulations. The horizontal axis is the number of processors in base-2 exponentials, and the vertical axis is the speedup. It is clear that better speedup can be achieved when the number of cases increases. This statement has been confirmed by the case studies in [4]: with 512 processors, the speedup is 462 for 20,000 contingency cases, 503 for 150K cases, and 507 for 300K cases.

The sensitivity of t_cnt is shown in Figure 6. In this simulation, the number of contingency cases is fixed at 20,000. As in Figure 5, the horizontal axis shows the number of processors in base-2 exponentials, and the vertical axis shows the speedup.
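The kind of simulation described above can be sketched in a few lines. The following Python analogue of the MATLAB setup serializes counter updates to model congestion; case times are drawn from a uniform distribution here (an assumption; the paper uses the measured WECC histogram):

```python
import heapq
import random

def simulate_dynamic_lb(n_proc: int, case_times: list, t_cnt: float) -> float:
    """Event-driven sketch of single-counter dynamic load balancing.

    Each idle processor grabs the next case from a shared counter; counter
    updates are serialized, so concurrent requests queue behind each other.
    Returns estimated speedup: ideal serial time / simulated parallel time.
    """
    counter_free = 0.0                      # time the shared counter is next free
    next_case = 0
    idle = [(0.0, p) for p in range(n_proc)]  # (time processor becomes idle, id)
    heapq.heapify(idle)
    finish = 0.0
    while next_case < len(case_times):
        t, p = heapq.heappop(idle)
        start = max(t, counter_free)        # wait if another request holds the counter
        counter_free = start + t_cnt
        done = start + t_cnt + case_times[next_case]
        next_case += 1
        finish = max(finish, done)
        heapq.heappush(idle, (done, p))
    return sum(case_times) / finish

random.seed(0)
times = [random.uniform(0.3, 0.7) for _ in range(5000)]
print(simulate_dynamic_lb(32, times, t_cnt=1e-4))   # near-ideal speedup
print(simulate_dynamic_lb(32, times, t_cnt=1e-2))   # degraded by counter cost
```

Raising t_cnt in this sketch reproduces the qualitative trend of Figure 6: a slower "counter" (i.e., a slower network) lowers the achievable speedup for a given processor count.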

There are three observations from Figure 6. The first observation is that, for a given number of processors, a larger t_cnt, which represents a low-speed communication network (e.g., an Ethernet network), yields less speedup for both the single-counter and two-counter schemes. The second observation is that, with a larger t_cnt, the two-counter dynamic load balancing scheme shows better performance than the single-counter scheme as the number of processors increases; the larger the number of processors, the better the two-counter scheme performs. The third observation is that when the number of processors is relatively small, e.g., less than 128, and t_cnt = 0.0001 s, which represents a high-speed communication network, the performance of the single-counter scheme is better than that of the two-counter scheme. However, the two-counter scheme outperforms the single-counter scheme when t_cnt is larger, even with fewer than 128 processors.

Figure 5: Single-counter vs. two-counter comparison for different numbers of cases with respect to different numbers of processors

Figure 6: Single-counter vs. two-counter comparison for different counter times with respect to different numbers of processors

The third observation is important for most utility/control center users who want to implement massive contingency analysis in their current environments. Currently, most utilities/control centers do not own a large number of processors, and they mostly use Ethernet network environments. Therefore, the two-counter dynamic load balancing scheme would be useful for their contingency analysis applications.

V. CASE STUDIES OF MASSIVE CONTINGENCY ANALYSIS WITH DIFFERENT COUNTER-BASED SCHEMES

The massive contingency analysis framework with the single-counter and two-counter dynamic computational load balancing schemes is implemented on the NWICEB cluster machine, which has a total of 128 processors with a high-speed InfiniBand communication link. The 14,000-bus WECC power grid model is used as the study model. In order to simulate the Ethernet environment on the NWICEB machine, the counter update operation is executed 20 times more to mimic the low-speed communication; the factor of 20 is used because the latency of an Ethernet network is approximately 20 times that of InfiniBand.

Three scenarios with different numbers of contingency cases, N = 500, 5000, and 17346, are tested on NWICEB using both the single-counter and two-counter schemes. Two different environments (InfiniBand and simulated Ethernet) are compared. The execution time of all scenarios, excluding disk I/O time for the purpose of eliminating side effects, with the InfiniBand network is listed in Table 2, while Table 3 shows the execution time with the simulated Ethernet network.

TABLE 2 EXECUTION TIME* IN SECONDS OF MASSIVE CONTINGENCY ANALYSIS ON THE NWICEB MACHINE UNDER INFINIBAND NETWORK

# of cases        500              5000             full N-1 (17346)
# of procs     1_cnt    2_cnt    1_cnt    2_cnt    1_cnt    2_cnt
1              260.7    260.7    2285     2285     9038     9038
2              147.39   154.40   1327.9   1348.5   5095.6   5177.2
4              85.487   91.243   748.69   751.41   2880.3   2887.5
8              50.1     53.044   406.3    417.26   1488     1546.6
16             28.494   27.609   205.95   210.78   765      767.6
32             14.8325  14.949   108.46   110.35   397.81   386.2
64             8.1825   9.4521   62.339   57.92    187      201.9
128            5.7331   6.2403   30.01    33.251   96.63    97.87
* Disk I/O time excluded

TABLE 3 EXECUTION TIME* IN SECONDS OF MASSIVE CONTINGENCY ANALYSIS ON THE NWICEB MACHINE UNDER SIMULATED ETHERNET NETWORK

# of cases        500              5000             full N-1 (17346)
# of procs     1_cnt    2_cnt    1_cnt    2_cnt    1_cnt    2_cnt
1              261.12   261.12   2359.9   2359.9   8994.1   8994.1
2              147.51   144.16   1341.5   1316.4   5098.4   5015.2
4              83.546   85.655   754.4    764.7    2869.9   2804.6
8              49.191   48.145   402.45   419.18   1569.5   1532.7
16             26.236   26.007   207.39   210.18   804.6    778.95
32             14.12    16.903   109.03   112.7    408.36   399.34
64             8.1423   8.7142   58.26    60.537   211.03   202.66
128            5.8304   6.869    29.959   32.098   112.17   108.17
* Disk I/O time excluded

In Table 2, the execution times with the two-counter scheme are generally larger than those with the single-counter scheme, which indicates that the two-counter dynamic load balancing scheme does not show better speedup performance than the single-counter scheme under the InfiniBand environment.
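The speedups behind this comparison can be recomputed directly from Table 2; for example, for the full N-1, single-counter column:

```python
# Single-counter execution times (s) for full N-1 (17,346 cases) from Table 2
# (NWICEB, InfiniBand, disk I/O excluded).
times_1cnt = {1: 9038.0, 2: 5095.6, 4: 2880.3, 8: 1488.0,
              16: 765.0, 32: 397.81, 64: 187.0, 128: 96.63}

def speedups(times: dict) -> dict:
    """Speedup S(p) = T(1) / T(p) relative to the single-processor run."""
    t1 = times[1]
    return {p: t1 / t for p, t in sorted(times.items())}

for p, s in speedups(times_1cnt).items():
    print(f"{p:4d} procs: speedup {s:6.1f}  (efficiency {s / p:.2f})")
```

At 128 processors this gives a speedup of roughly 93, i.e., a parallel efficiency around 0.73 for the measured single-counter runs.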

Corresponding to Table 2, the speedup results for the single-counter vs. two-counter schemes under the InfiniBand environment are shown in Figure 7. This result matches the MATLAB simulation in Section IV. The main reason for this phenomenon is that the latency of InfiniBand is low and the communication speed is fast. As such, counter congestion is less likely to happen with a relatively small number of processors. Therefore, the overhead introduced by an additional counter impairs the performance under the testing circumstances. As shown by the MATLAB simulation, the two-counter scheme is more suitable for a larger number of processors, i.e., with a large number of processors, the two-counter scheme will improve the computational performance of massive contingency analysis.

Figure 7: Single-counter vs. two-counter comparison with respect to different numbers of processors under the InfiniBand environment

When the communication speed is relatively low, counter congestion is more likely to happen. As shown in Table 3, when the number of contingency cases is large (N = 17,346), the two-counter scheme can improve the overall performance under the simulated Ethernet environment. Very importantly, this is true even when the number of processors is relatively small, as long as the number of contingency cases is large enough. For example, for full N-1 contingency analysis with 16 processors, the execution time with a single counter is 804.6 seconds, while the time with two counters is 778.95 seconds, about 26 seconds less. The speedup of the single-counter vs. two-counter schemes under the simulated Ethernet environment is shown in Figure 8. Figure 8 shows that when N is large (N = 17,346), the performance of the two-counter scheme is better than that of the single-counter scheme, while the performance of the two-counter scheme is worse when the number of cases is small. These results match the MATLAB simulation in Section IV, Figure 6.

VI. DISCUSSION

The case studies, as well as the MATLAB simulation results, reveal insights regarding the performance of load balancing schemes, which can serve as guidance for utilities to select suitable counter schemes to implement massive contingency analysis under their computing environments. Considerations for the implementation include:

(a) In the case of a computing environment with an Ethernet network or equivalent, no high-speed networking capabilities, and a large number of processors, the two-counter dynamic scheme is expected to have better performance for the application of massive contingency analysis;

(b) In the case of a computing environment with a high-speed communication network but only a small number of processors, the single-counter dynamic scheme is suggested;

(c) In the case of a computing environment with a high-speed communication network and a dedicated cluster computer with a large number of processors, the performance of the two-counter dynamic scheme will be better than that of the single-counter scheme.

These considerations can be used as guidance for the actual implementation of massive contingency analysis.

Figure 8: Single-counter vs. two-counter comparison with respect to different numbers of processors under the simulated Ethernet environment

As mentioned in the Introduction, the number of cases increases exponentially as the "x" in "N-x" increases. When N-x contingencies are considered, there are two major issues: a massive number of cases and a massive amount of data. Since the sheer number of contingency cases makes even simplified computation of all cases impractical, solving the first issue requires smart contingency selection methods as well as high performance computing (HPC) techniques and hardware. The technical challenge of the second issue is how to navigate through the vast volume of data and help grid operators manage the complexity of operations and decide among multiple choices of actions. The state-of-the-art industrial tools use tabular forms to present contingency analysis results. When massive "N-x" contingency cases are analyzed and the system is heavily stressed, the tabular method of display is rapidly overloaded, and it is then impossible for an operator to sift through the large amounts of violation data and understand the system situation within seconds or minutes. Thus the usefulness of massive contingency analysis is undermined and the HPC benefit is diminished.
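The scheme-selection guidance in (a)-(c) above can be condensed into a small helper. This is a sketch only: the "large cluster" threshold is a hypothetical parameter, since the case studies here cover at most 128 processors:

```python
def suggest_counter_scheme(high_speed_network: bool, n_processors: int,
                           large_cluster_threshold: int = 128) -> str:
    """Map guidelines (a)-(c) to a counter-scheme suggestion.

    (a) low-speed (Ethernet-class) network -> two-counter scheme;
    (b) high-speed network, small processor count -> single-counter scheme;
    (c) high-speed network, large dedicated cluster -> two-counter scheme.
    The threshold separating (b) from (c) is an assumed parameter.
    """
    if not high_speed_network:
        return "two-counter"
    return "two-counter" if n_processors > large_cluster_threshold else "single-counter"

print(suggest_counter_scheme(high_speed_network=True, n_processors=64))  # single-counter
```

In practice the crossover point depends on the measured counter update time, so the threshold should be calibrated against runs like those in Tables 2 and 3.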

In order to solve the second issue, advanced visualization techniques, as well as human factors, are needed to provide real-time situational awareness and help operators to anticipate, recognize, and respond to emergencies in time.

VII. CONCLUSIONS

The performance of single-counter and two-counter based dynamic computational load balancing schemes is simulated in MATLAB, and the schemes are implemented on a cluster computer with an InfiniBand network environment and a simulated Ethernet network environment. The observations from the actual case studies match the findings of the MATLAB simulation. The findings from the MATLAB simulation results and the case studies can serve as guidance in using a high-performance computer for the application of massive contingency analysis with different network environments. The single-counter dynamic load balancing scheme is suggested for users who own a high-speed communication network but only have a small number of processors. The two-counter dynamic load balancing scheme is suitable for users who either have a large number of processors or have a small number of processors with a relatively low-speed network.

Future work shall include the study of smart contingency screening and advanced decision support through techniques such as visualization and parallel post-processing.

VIII. REFERENCES

[1] D. N. Kosterev, C. W. Taylor, and W. A. Mittelstadt, "Model Validation for the August 10, 1996 WSCC System Outage," IEEE Trans. Power Syst., vol. 14, no. 3, pp. 967-979, August 1999.
[2] U.S.-Canada Power System Outage Task Force, "Final Report on the August 14, 2003 Blackout in the United States and Canada: Causes and Recommendations," Apr. 2004. Available at https://reports.energy.gov/.
[3] NERC standards, Transmission System Standards – Normal and Emergency Conditions, available at www.nerc.com.
[4] Z. Huang, Y. Chen, and J. Nieplocha, "Massive Contingency Analysis with High Performance Computing," in: Proceedings of the IEEE Power Engineering Society General Meeting 2009, Calgary, Canada, July 26-30, 2009.
[5] Zhenyu Huang and Jarek Nieplocha, "Transforming Power Grid Operations via High-Performance Computing," in: Proceedings of the IEEE Power and Energy Society General Meeting 2008, Pittsburgh, PA, USA, July 20-24, 2008.
[6] J. Nieplocha, A. Marquez, V. Tipparaju, D. Chavarría-Miranda, R. Guttromson, and Zhenyu Huang, "Towards Efficient Power System State Estimators on Shared Memory Computers," in: Proceedings of the IEEE Power Engineering Society General Meeting 2006, Montreal, Canada, June 18-22, 2006.
[7] Zhenyu Huang, Ross Guttromson, Jarek Nieplocha, and Rob Pratt, "Transforming Power Grid Operations via High-Performance Computing," Scientific Computing, April 2007.

IX. BIOGRAPHIES

Yousu Chen (M'07) received his B.E. in Electrical Engineering from Sichuan University, China, his M.S. in Electrical Engineering from Nankai University, China, and his M.S. in Environmental Engineering from Washington State University. Currently he is a Research Engineer at the Pacific Northwest National Laboratory in Richland, Washington. His main research areas are high-performance computing applications, power system stability and control, and power system operations. Mr. Chen is an IEEE member and the Chair of the Richland Chapter of the Power Engineering Society.

Zhenyu Huang (M'01, SM'05) received his B. Eng. from Huazhong University of Science and Technology, Wuhan, China, and Ph.D. from Tsinghua University, Beijing, China, in 1994 and 1999, respectively. From 1998 to 2002, he conducted research at the University of Hong Kong, McGill University, and the University of Alberta. He is currently a staff research engineer at the Pacific Northwest National Laboratory, Richland, WA, and a licensed professional engineer in the state of Washington. His research interests include power system stability and control, high-performance computing applications, and power system signal processing.

Daniel Chavarría-Miranda is a Senior Research Scientist at the Pacific Northwest National Laboratory in Richland, Washington. He can be reached at 902 Battelle Blvd. MSIN K7-90, Richland, WA 99352, E-Mail: daniel.chavarria@pnl.gov.
