
732

ADVANCED UTILITY DATA MANAGEMENT


AND ANALYTICS FOR IMPROVED OPERATION
SITUATIONAL AWARENESS OF EPU OPERATIONS

JOINT WORKING GROUP


D2/C2.41

JUNE 2018
ADVANCED UTILITY DATA MANAGEMENT
AND ANALYTICS FOR IMPROVED
OPERATION SITUATIONAL AWARENESS
OF EPU OPERATIONS

JWG D2/C2.41

Members

A. DEL ROSSO, Convenor US T. BORST, Secretary NL


R. JAMIESON UK T. XIA US
M. EIJGELAAR NL D. CORTINAS FR
M. HABJA FR E. PFAEHLER DE
G. ARROYO FIGUEROA MX S. CHEN US
G. LAKOTA SI S. RAJAGOPALAN US
S. NOURI IR X. CHEN CN

Contributing Members

G. SANTAMARÍA MX A. HERNÁNDEZ MX
M.Y. HERNANDEZ PÉREZ MX D. MARAGAL US
M. BASTOS BR

Copyright © 2018
“All rights to this Technical Brochure are retained by CIGRE. It is strictly prohibited to reproduce or provide this publication in
any form or by any means to any third party. Only CIGRE Collective Members companies are allowed to store their copy on
their internal intranet or other company network provided access is restricted to their own employees. No part of this
publication may be reproduced or utilized without permission from CIGRE”.

Disclaimer notice
“CIGRE gives no warranty or assurance about the contents of this publication, nor does it accept any responsibility, as to the
accuracy or exhaustiveness of the information. All implied warranties and conditions are excluded to the maximum extent
permitted by law”.

ISBN : 978-2-85873-434-4


EXECUTIVE SUMMARY
Objective
The CIGRE Joint Working Group No. D2/C2.41 is a joint effort between the study committees D2 and
C2. It has surveyed and examined current practices, industry trends, and new research on the use of
various data sources and analytics tools to enhance situational awareness of system operators, as well
as on the data-integration and -management technologies to facilitate effective implementation of
data-analytics applications in the control room and to support operation engineers.
Motivation
The increasing complexity and interconnectivity of modern electric grids, in addition to the highly
stringent reliability, economic, and environmental constraints, impose the need to provide system
operators and operation engineers with better tools for assessing system conditions and to support
them on making critical decisions. Fortunately, the large variety of internal and external data sources
that are available to electric utilities opens up the possibility to implement advanced data-analytics
and -visualization technologies to improve the way the system is operated and controlled. Analytics
algorithms capable of synthesizing actionable information from the raw data can be used to provide
tools that use real-time data streams to support fast, accurate, and adaptable decisions that solve critical
problems at the right moment, as well as to plan mitigation actions against anticipated system
security issues.
Using data to make critical operational and business decisions is certainly not new to the electricity
industry. Indeed, techniques for data analysis have been applied to several areas such as load
forecasting, predictive asset maintenance, crew scheduling, outage management, and demand
response, among others. Nevertheless, the maturity and practical implementation of data-analytics
applications to support the operation of power systems remains relatively low compared to other
areas and industries. Therefore, it is very valuable to examine how advanced data-analytics
technologies can be further used to solve the emerging critical challenges in operating electric
systems.
Approach
The content of the technical brochure is broken down into the major areas that are relevant for the
development and implementation of data-analytics tools, which are: data and information sources,
data-analytics techniques to interpret this data, applications of data analytics in system operations,
data integration and modelling to integrate data into operations, and data quality.
This document has six main sections, each of them addressing one of these topical areas. The content
in each section is intended to provide the reader with an informed and comprehensive starting point
to understand the relevant issues and challenges in each area. The sections discuss latest advances in
terms of data-analytics methodologies, data-management and -integration tools, applications
development, and new trends and emerging technologies.
Value
This technical brochure provides useful insight on how advanced data-analytics techniques and tools
that integrate various data sources can be used to improve situational awareness of those who
operate power systems and to support various operation functions. This work is expected to be useful
for Cigre members in the following areas:
 Operators of transmission and distribution systems will gain knowledge on how new data-
analytics and -visualization technologies can help improve situational awareness.
 Product vendors will be assisted in identifying gaps in the market and potentially new uses for
existing products.
 Application and system developers will better understand what the challenges are for operations
and the need for better analytics tools.
 Researchers will be assisted in recognizing new areas for research and the application of this
research.
 Consultants and project engineers will find relevant reference material.


Summary of Relevant Conclusions and Takeaway


The importance of decision support and situational awareness for system operators is becoming more
prominent as significant changes occur in the way systems need to be operated and controlled.
Improved situational awareness at all control levels is necessary to ensure that operational decisions
are properly made and executed, which is critical for maintaining the integrity of system operations.
Therefore, while operators will remain at the core of grid operations, it is becoming more and more
important that they are supported by advanced data-analytics and -visualization tools.
As in other complex systems, the amount of automation used in power systems to assist operators has
increased to a great extent. Even though advanced automation is essential in modern power systems,
there is still a human in the loop with the potential for human error, especially if an operator has
limited visibility of system conditions. Indeed, one of the risks of increased automation in the grid is
that operators may become less aware of current system conditions. Consequently, operation support
tools should be able to process the incoming data and present it to the operator in a concise and effective manner.
Data Sources, Data Management, and Integration
Over the past few years, many companies have started their “big data” projects and are competing to
bring a set of information and communication technologies that are largely new to the utility industry.
Several tools that make use of advanced data-analytics techniques and integrate data from various
sources have been developed and implemented. These tools serve a variety of functions to support
system operation, including: tools to detect system events, identify and analyze faults, conduct wide-
area monitoring, monitor and analyze equipment health, trend and forecast load, monitor the
conditions of renewables and systems, and make recommendations for system operation. Many data-
analytics techniques have been around for years, many of them appearing in the 1990s. Also, a great
deal of research has been devoted to the use of data analytics in operation support. The difference
today is the advancement made in handling big data analytics and its adoption throughout the
industry.
Visual analytics is key to improving an operator's situational awareness and ability to make effective
decisions. Providing interactive visual interfaces helps analysts and system operators to get a better
impression of possible symptoms and suspicious behavior and to understand the performance of a
power system, thereby increasing situational awareness. The way in which system data
is presented to the operator can support the strengths and reduce the effects of limitations of human
perception and performance. Advanced visualization techniques enable a wider array of situation-
awareness capabilities to handle the increased complexity of system operation.
The way that data and information can be displayed and exposed to operators has evolved to a great
extent over the years, as the technology has progressed. Today’s visualization technologies have
advanced a long way from old-fashioned static visualization used in the past. Newer visualization
platforms include geographic-based dynamic visualization with user-friendly interfaces and real-time
measurements and analytical results from measurement-based and model-based tools that populate
the system map. Visual aids such as color contouring, 2-D and 3-D bubbles and cones, animation,
geo-spatial representation, display profiles, and integrated system views are widely used in newer
visualization tools.
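As a simple illustration of the color-contouring technique mentioned above, the sketch below interpolates a handful of hypothetical per-unit bus voltages onto a geographic grid and renders them as a filled contour map with bus locations overlaid. The coordinates, values, and color scale are invented for the example and do not come from any tool described in this brochure.

```python
# Illustrative sketch only: color contouring of bus voltage magnitudes over a
# geographic area, one of the visual aids mentioned above. All coordinates and
# values are invented for the example.
import numpy as np
from scipy.interpolate import griddata
import matplotlib.pyplot as plt

# Hypothetical bus locations (longitude, latitude) and per-unit voltage magnitudes
lon = np.array([2.35, 2.80, 3.10, 2.10, 2.60, 3.40])
lat = np.array([48.85, 48.60, 49.10, 49.00, 48.40, 48.75])
v_pu = np.array([1.01, 0.97, 0.99, 1.02, 0.95, 0.98])

# Interpolate the scattered measurements onto a regular grid for contouring
grid_lon, grid_lat = np.meshgrid(np.linspace(lon.min(), lon.max(), 200),
                                 np.linspace(lat.min(), lat.max(), 200))
grid_v = griddata((lon, lat), v_pu, (grid_lon, grid_lat), method='linear')

fig, ax = plt.subplots()
contour = ax.contourf(grid_lon, grid_lat, grid_v, levels=20, cmap='RdYlGn')
ax.scatter(lon, lat, c='k', s=20)                 # bus locations
fig.colorbar(contour, ax=ax, label='Voltage magnitude (p.u.)')
ax.set_xlabel('Longitude')
ax.set_ylabel('Latitude')
ax.set_title('Voltage contour (illustrative data)')
plt.show()
```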
New trends in visualization in control rooms are based on the concept of integrated space and time,
which is intended to help operators to assess current situations in a static fashion, to understand and
visualize evolving conditions in a power system, and to get better prepared to implement effective
control actions. In general terms, integrated space-time tools include three main functions: situational
awareness on first sight, projection of future status, and recommendations for operators.
As an example, RTE in France has developed a time-driven concept of situational awareness. The
objective is to create an application that provides the operator with a single user interface based on the
hyper-vision concept. The application is intended to help system operators to focus on the actions that
they must take by presenting at the right time the relevant information that they need to make the
right decisions. The designed system—called Apogee—performs security analysis on forecasted
system conditions in the near future through data-analytics and modelling tools and displays relevant
information to the operator only when it is needed. That is, the hyper-vision user interface remains
empty as long as no potential unsecure conditions are detected within the time horizon of the
analysis. In that way, the tool reduces the effects of limitations of human perception and improves
operator responsiveness and performance.
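To make the hyper-vision idea concrete, the minimal sketch below mimics the behavior described above: an operator view that stays empty unless the security analysis of forecasted states anticipates a constraint within the look-ahead horizon. The data structures, loading limit, and function names are illustrative assumptions and do not represent RTE's actual Apogee implementation.

```python
# Minimal sketch of the "display only when needed" idea described above.
# This is NOT RTE's Apogee implementation; the security check below is a
# stand-in for whatever contingency/forecast analysis the real tool performs.
from dataclasses import dataclass
from typing import List

@dataclass
class ForecastedState:
    minutes_ahead: int
    branch_loadings_pct: dict   # branch id -> forecasted loading in % of rating

def find_constraints(states: List[ForecastedState], limit_pct: float = 100.0):
    """Return (time, branch, loading) triples that violate the loading limit."""
    violations = []
    for state in states:
        for branch, loading in state.branch_loadings_pct.items():
            if loading > limit_pct:
                violations.append((state.minutes_ahead, branch, loading))
    return violations

def render_operator_view(states: List[ForecastedState]) -> str:
    """The view stays empty unless an insecure condition is anticipated."""
    violations = find_constraints(states)
    if not violations:
        return ""   # nothing to show: no action required from the operator
    lines = ["Anticipated constraints:"]
    for minutes, branch, loading in sorted(violations):
        lines.append(f"  in {minutes:3d} min: {branch} at {loading:.0f}% of rating")
    return "\n".join(lines)

# Hypothetical forecasted states over a 2-hour horizon
horizon = [
    ForecastedState(30,  {"LINE_A": 82.0, "LINE_B": 91.0}),
    ForecastedState(60,  {"LINE_A": 96.0, "LINE_B": 104.0}),
    ForecastedState(120, {"LINE_A": 103.0, "LINE_B": 99.0}),
]
print(render_operator_view(horizon) or "(interface remains empty)")
```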


Barriers and Needs for Research and Development


In general terms, it can be stated that adoption of data analytics to support the operation of a power
system is gaining momentum, as utilities have started realizing the great potential of the technology.
Nevertheless, there is still a long way to go before these technologies are widely used across the
various operation functions. One of the barriers to more extensive use of data-analytics tools that
integrate various data sources is the shortage of standardized data structures. Well-defined
standards-based data models are essential to support advanced applications, analytics, and
visualizations used in grid operations. Even though significant progress has been made in this area,
more effective and accurate data models and procedures are needed for ensuring data integrity and
availability of the right data in the right format.
Another aspect that to some extent hinders the implementation of data-analytics solutions in system
operation is the lack of understanding of the value and accuracy of these technologies. Traditionally,
tools used in control centers and operation engineering have been mostly based on system models
and simulations. Because a diverse set of power system and external sensor data sources is now
becoming available, a hybrid approach can be used to develop superior technical approaches and
software tools that have the potential to be implemented in system control rooms to support system
operators. Those tools will combine conventional analytics techniques based on physical models with
heuristic data analytics and decision-making methodologies. For instance, simulation engines would
perform contingency analysis and vulnerability/risk analysis across several different possible scenarios
that may be built with the help of data collected from a variety of sources.
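The following sketch illustrates, in simplified form, the hybrid workflow described above: representative scenarios are derived from (synthetic) historical data with a standard clustering technique, and each scenario is then screened against a set of contingencies by a placeholder model-based check. The clustering choice, limits, and contingency names are assumptions made purely for illustration; a real implementation would call power-flow or dynamic-simulation engines instead of the toy check.

```python
# Schematic sketch of the hybrid approach outlined above: scenarios derived from
# historical/measured data (here via simple K-means clustering) are screened with a
# placeholder model-based contingency check. All names and numbers are illustrative.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Hypothetical historical operating points: [total load (MW), wind output (MW)]
history = np.column_stack([rng.normal(5000, 600, 500), rng.normal(1200, 400, 500)])

# Data-driven step: condense history into a few representative scenarios
scenarios = KMeans(n_clusters=4, n_init=10, random_state=0).fit(history).cluster_centers_

def contingency_check(load_mw: float, wind_mw: float, outage: str) -> bool:
    """Placeholder for a physics/model-based security assessment.
    A real tool would run power-flow or dynamic simulation for the outage."""
    net_import = load_mw - wind_mw
    limit = 4200 if outage == "TIE_LINE_1" else 4600
    return net_import <= limit   # True = secure

contingencies = ["TIE_LINE_1", "GEN_UNIT_7"]
for load, wind in scenarios:
    for outage in contingencies:
        secure = contingency_check(load, wind, outage)
        print(f"scenario load={load:5.0f} MW wind={wind:4.0f} MW, "
              f"outage {outage}: {'secure' if secure else 'INSECURE'}")
```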


CONTENTS
EXECUTIVE SUMMARY ............................................................................................................................... 3
OBJECTIVE .................................................................................................................................................................................. 3
MOTIVATION .............................................................................................................................................................................. 3
APPROACH .................................................................................................................................................................................. 3
VALUE ........................................................................................................................................................................................... 3
SUMMARY OF RELEVANT CONCLUSIONS AND TAKEAWAY ..................................................................................... 4

1. INTRODUCTION AND BACKGROUND...................................................................................... 11


1.1 MOTIVATION ............................................................................................................................................................... 11
1.2 OBJECTIVE .................................................................................................................................................................... 12
1.3 APPROACH ................................................................................................................................................................... 12
1.4 SITUATIONAL AWARENESS....................................................................................................................................... 14
WHY IS IT IMPORTANT....................................................................................................................................................... 16
CONCLUSION .......................................................................................................................................................................... 18

2. DATA SOURCES IN ELECTRIC POWER SYSTEMS .................................................................... 21


2.1 DATA FROM MONITORING AND PROTECTION DEVICES................................................................................... 21
2.1.1 Digital protective relays .................................................................................................................................... 21
2.1.2 Recorders ............................................................................................................................................................. 22
2.1.3 Revenue meters ................................................................................................................................................... 22
2.1.4 Synchrophasors .................................................................................................................................................. 23
2.1.5 Remote terminal unit (RTU) ................................................................................................................................ 23
2.1.6 Power quality meters ......................................................................................................................................... 23
2.1.7 SCADA (supervisory control and data acquisition) ....................................................................................... 23
2.2 DATA FROM EQUIPMENT SENSORS ........................................................................................................................ 24
2.2.1 Circuit breaker .................................................................................................................................................... 24
2.2.2 Transformer.......................................................................................................................................................... 25
2.2.3 Distributed generation (solar and wind) ......................................................................................................... 27
2.2.4 BESS: Battery energy storage system ............................................................................................................. 28
2.2.5 New sensors for equipment monitoring ........................................................................................................... 30
2.3 NON-ELECTRICAL DATA SOURCES (EXTERNAL DATA) ......................................................................................... 31
2.4 COMMUNICATION REQUIREMENTS FOR SMART GRID DATA ........................................................................... 32
2.5 REFERENCES .................................................................................................................................................................. 36

3. DATA-ANALYTICS TECHNIQUES ................................................................................................ 37


3.1 DATA MINING AND ASSOCIATION RULES ............................................................................................................ 38
3.1.1 Brief definition ..................................................................................................................................................... 38
3.1.2 Technical description .......................................................................................................................................... 38
3.1.3 Application domains ........................................................................................................................................... 38
3.1.4 Potential applications......................................................................................................................................... 39
3.2 K-NEAREST NEIGHBOR............................................................................................................................................... 39
3.2.1 Brief definition ..................................................................................................................................................... 39
3.2.2 Technical description .......................................................................................................................................... 39
3.2.3 Application domains ........................................................................................................................................... 39
3.2.4 Potential applications in smart grid ................................................................................................................. 39
3.3 MACHINE LEARNING .................................................................................................................................................. 40
3.3.1 Supervised and unsupervised learning ........................................................................................................... 40


3.3.2 Linear regression ................................................................................................................................................. 41


3.3.3 Decision and regression trees ........................................................................................................................... 43
3.3.4 Artificial neural network .................................................................................................................................... 45
3.3.5 Support vector machine (SVM) ......................................................................................................................... 47
3.3.6 K-means and clustering ...................................................................................................................................... 49
3.4 PROBABILISTIC NETWORKS ...................................................................................................................................... 51
3.4.1 Bayesian networks .............................................................................................................................................. 51
3.4.2 Bayesian classifiers ............................................................................................................................................. 53
3.4.3 Decision networks ................................................................................................................................................ 53
3.5 DEEP LEARNING ........................................................................................................................................................... 54
3.5.1 Brief definition ..................................................................................................................................................... 54
3.5.2 Technical description .......................................................................................................................................... 54
3.5.3 Potential applications in smart grid ................................................................................................................. 55
3.6 VISUAL ANALYTICS ..................................................................................................................................................... 55
3.6.1 Brief definition ..................................................................................................................................................... 55
3.6.2 Related research areas and challenges ......................................................................................................... 55
3.6.3 The visual-analytics process .............................................................................................................................. 56
3.7 REFERENCES .................................................................................................................................................................. 57

4. APPLICATIONS OF DATA ANALYTICS IN SYSTEM OPERATIONS ........................................ 61


4.1 INTRODUCTION ........................................................................................................................................................... 61
4.2 DATA VISUALIZATIONS IN REAL-TIME SYSTEM OPERATION .............................................................................. 61
4.2.1 Visualization technologies in control centers .................................................................................................. 62
4.2.2 Example of control room visualization at ISO – ERCOT case ..................................................................... 71
4.2.3 Emerging trends in control room visualization................................................................................................ 74
4.3 DATA ANALYTICS IN SYSTEM OPERATION SUPPORT PROCESSES ................................................................... 77
4.3.1 Real-time situational awareness with PMU data........................................................................................... 77
4.3.2 Fault identification, location, and analysis ..................................................................................................... 80
4.3.3 Real-time stability assessment .......................................................................................................................... 84
4.3.4 Alarm processing and filtering ......................................................................................................................... 85
4.3.5 Renewable energy generation forecasting and storage analytics ........................................................... 85
4.3.6 Damage prediction (weather related or due to other causes) ................................................................... 86
4.3.7 Outage restoration analytics ............................................................................................................................ 87
4.3.8 Power quality analytics (including voltage control) ...................................................................................... 88
4.3.9 Peak load management (via demand-side management analytics) ......................................................... 89
4.3.10 Load research analytics and energy portfolio management analytics ..................................................... 89
4.3.11 Non-technical loss analytics ............................................................................................................................... 89
4.3.12 Physical and cyber security assessment analytics ......................................................................................... 89
4.3.13 Dynamic assessment of transmission line capacity (dynamic line rating) .................................................. 90
4.3.14 Cable thermal monitoring.................................................................................................................................. 93
4.4 SUMMARY OF INDUSTRY SURVEY ........................................................................................................................... 94
4.5 REFERENCES .................................................................................................................................................................. 99

5. DATA INTEGRATION AND MODELING .................................................................................. 103


5.1 DATA MODELING PROCESSES FOR SYSTEM OPERATIONS ............................................................................. 103
5.1.1 Model information and its usage ................................................................................................................... 103
5.1.2 Model update procedure and lifecycle ........................................................................................................ 106
5.2 DATA MODELS AND OPEN STANDARDS.............................................................................................................. 107
5.2.1 Why do we need a common data model?................................................................................................... 107
5.2.2 IEC standardized data models ....................................................................................................................... 107
5.2.3 Example of harmonization between CIM and IEC 61850 ........................................................................ 112
5.3 IMPACT OF NEW TECHNOLOGIES AND NEW DATA SOURCES ON DATA MODELING ............................ 112
5.3.1 Impact of Synchrophasors on operations data modeling .......................................................................... 112
5.3.2 Impact of renewable energy on operations data modeling..................................................................... 113
5.3.3 Impact of equipment health condition monitoring on operations data modeling .................................. 115
5.4 ADVANCED DATA INTEGRATION MODELING CASE STUDY ............................................................................ 121


5.5 CONCLUSIONS .......................................................................................................................................................... 123


5.6 REFERENCES ................................................................................................................................................................ 124

6. DATA QUALITY AND VALIDATION ......................................................................................... 125


6.1 INTRODUCTION ......................................................................................................................................................... 125
6.2 DATA QUALITY PROBLEMS ...................................................................................................................................... 125
6.3 DATA QUALITY ASSESSMENT ................................................................................................................................. 126
6.3.1 Data Interpolation ............................................................................................................................................ 127
6.3.2 Data Profiling .................................................................................................................................................... 128
6.3.3 Data Quality Assessment Framework ............................................................................................................. 129
6.4 DATA QUALITY PROBLEM CORRECTION .............................................................................................................. 130
6.4.1 Impact Assessment............................................................................................................................................. 131
6.4.2 Correction and Cleaning ................................................................................................................................. 131
6.4.3 Scavenging of Essential Causes ...................................................................................................................... 133
6.4.4 Monitoring and Prevention .............................................................................................................................. 133
6.5 CONCLUSIONS .......................................................................................................................................................... 133
6.6 REFERENCES ................................................................................................................................................................ 134

7. CONCLUSION ............................................................................................................................. 135

FIGURES AND ILLUSTRATIONS


Figure 1-1: Aspects to address the development and implementation of analytics techniques using
various data sources .................................................................................................................. 13
Figure 1-2: levels of situational awareness................................................................................... 15
Figure 1-3: Representation of operator mental model based on training and experience .................. 16
Figure 1-4: Relationship between analytics and visualization complexity ......................................... 17
Figure 1-5: Illustration of the role of system operators in a highly automated environment
Figure 2-1: Basic structure of a battery energy storage system...................................................... 29
Figure 2-2: Requirements of a smart grid network........................................................................ 34
Figure 3-1: k-NN classification of abnormal PMU data ................................................................... 40
Figure 3-2: Supervised learning (upper rectangle) and unsupervised learning (lower rectangle) ....... 41
Figure 3-3: A simple DT model for detecting faults in a transmission line........................................ 44
Figure 3-4: Schematic diagram of a feed-forward ANN ................................................................. 46
Figure 3-5: Information processing in an ANN .............................................................................. 46
Figure 3-6: A toy example of a linearly separable problem ............................................................ 48
Figure 3-7: A toy example of clustering transmission lines during storm using K-means ................... 50
Figure 3-8: Example of a simple Bayesian network
Figure 3-9: Examples of conditional probability tables .................................................................. 52
Figure 3-10: Deep learning components ...................................................................................... 55
Figure 3-11: The visual analytics process ..................................................................................... 56
Figure 4-1: Overview of a control center monitor display .............................................................. 62
Figure 4-2: Examples of schematic network diagrams ................................................................... 63
Figure 4-3: Contour showing voltage magnitudes with values below 0.98 per unit........................... 64
Figure 4-4: Examples of contour gradients for continuous values ................................................... 64
Figure 4-5: Situational awareness by 2D bubbles ......................................................................... 65
Figure 4-6: 3D display showing bus voltages and generator reserves ............................................. 66
Figure 4-7: Example of situation awareness by 3D cones .............................................................. 66
Figure 4-8: Example of situation awareness by 3D cones .............................................................. 67
Figure 4-9: Example of animated power flow arrows in distribution feeders .................................... 67
Figure 4-10: Visualization of dispersed generation in operator workstation at RTE ........................... 68
Figure 4-11: Visualization of dispersed generation in the general panel at Red Eléctrica de España .. 68


Figure 4-12: Examples of geospatial network diagrams ................................................................. 69


Figure 4-13: Integrated system view with Icons and Info boxes .................................................... 70
Figure 4-14: Distribution network visualization ............................................................................. 70
Figure 4-15: Distribution network visualization ............................................................................. 70
Figure 4-16: ERCOT control room - 2016 ..................................................................................... 72
Figure 4-17: ERCOT control room - load and generation details display and quick start/non-spin
graphs...................................................................................................................................... 73
Figure 4-18: ERCOT control room – wind generation .................................................................... 73
Figure 4-19: ERCOT control room - real-time sequence monitor .................................................... 74
Figure 4-20: ERCOT control room - system voltage overview display .............................................. 74
Figure 4-21: CORESO control room (www.coreso.eu) ................................................................... 75
Figure 4-22: Example of control actions displayed in the main interface of Apogee .......................... 76
Figure 4-23: Example of the time-based constraint display in Apogee ............................................ 77
Figure 4-24: Swings in the map .................................................................................................. 79
Figure 4-25: power system status, change time range .................................................................. 79
Figure 4-26: Schematic display with recognized islands................................................................. 80
Figure 4-27: Phase angle difference of the voltages between different PMUs .................................. 80
Figure 4-28: Correlation of a feeder outage and lightning strike..................................................... 82
Figure 4-29: Smart Cable Guard system and web interface, showing the location of increasing partial
discharge activity over time ........................................................................................................ 83
Figure 4-30: Example of open PQ Dashboard display .................................................................... 83
Figure 4-31: Example of synchrophasor-based frequency stability monitoring ................................ 84
Figure 4-32: Framework proposed in [17][18] for real-time dynamic security assessment combining
PMU data analytics and high performance dynamic simulation....................................................... 85
Figure 4-33: Wind forecasting and optimization tools .................................................................... 86
Figure 4-34: Digging damage prediction model ............................................................................ 87
Figure 4-35: Smart meter based outage management .................................................................. 88
Figure 4-36: Power quality analytics tool ..................................................................................... 88
Figure 4-37: Analyzing system load ............................................................................................. 89
Figure 4-38: Dynamic powerline capacity assessment ................................................................... 90
Figure 4-39: SUMO architecture .................................................................................................. 91
Figure 4-40: Exceptional weather events ..................................................................................... 92
Figure 4-41: Thunderstorm – lightning activity and rainfall event notification .................................. 92
Figure 4-42: Visualization platform ODIN-VIS screenshot .............................................................. 92
Figure 4-43: National Grid (U.K.) Cable Thermal Monitor............................................................... 93
Figure 4-44: Responses to survey – Section 1, Question 1 ............................................................ 95
Figure 4-45: Responses to survey – Section 1, Question 2 ............................................................ 95
Figure 5-1: Dominion Virginia Power EMS modeling data ............................................................ 105
Figure 5-2: EMS Winter Build Lifecycle ...................................................................................... 106
Figure 5-3: Common Utility EMS modeling update process ......................................................... 107
Figure 5-4: Data modeling on Smart Grid Architecture Model framework ...................................... 108
Figure 5-5: IEC 61850 modeling approach [7] ............................................................................. 110
Figure 5-6: Sources and actors [8] ............................................................................................. 111
Figure 5-7: RES Data Integration/Modeling Diagram ................................................................... 115
Figure 5-8: Proposed Concept to Incorporate Equipment Condition Information indices into PRA
Calculations ............................................................................................................................ 118
Figure 5-9: Overview CIM class model for breaker health integration environment ........................ 120
Figure 5-10: Location of UML diagrams and modifications for the breaker health integration ........... 121
Figure 5-11: Asset and Network Model Integrated Solution Architecture ....................................... 122

TABLES
Table 2-1: Monitored parameters of circuit breakers ..................................................................... 25
Table 2-2: Status of different condition assessment techniques for power transformers ................... 26
Table 2-3: Different sensors and output data ............................................................................... 27
Table 2.4: Solar measurement and description ............................................................................. 27
Table 2.5: Wind turbine sensors and applications ......................................................................... 28
Table 2-4: Example of the minimum required BESS signals for an EMS (SICAM microgrid control) ..... 30


Table 2-4: General requirement of communication in power system............................................... 32


Table 2-5: Networks and associated communication requirements ................................................. 33
Table 2-6: Communication requirements in terms of latency and data time window ........................ 33
Table 2-7: Network requirements for smart grid applications ......................................................... 34
Table 2-8: Technology supporting each particular application (L – Low, M – Medium, H – High) ....... 35
Table 2-9: Communication technology options ............................................................................. 36


1. INTRODUCTION AND BACKGROUND


1.1 MOTIVATION
The motivation behind this technical brochure is to assist in addressing the growing gap between the
challenges arising from an increasingly interconnected world and humans who are still required in the
control loop. The topics of situational awareness and, in particular, data management and analytics are
large subject fields in their own right; combined, they arguably pose a challenge too large to solve as a
single project. Therefore, this brochure breaks the challenge down into many
component parts and tries to focus on the relevant areas within these subjects.
This growing gap between the new challenges and the humans in the control room is driven by a wide
variety of factors, each with its own internal drivers that do not necessarily take into account the direct
impact the gap is having on the electricity utility industry. With the increasing complexity and
interconnectivity of the grid, the scope and difficulty of maintaining adequate situational awareness
have grown.
As a consequence, there is a need to furnish system operators and operation engineers with better
tools for assessing system conditions and for providing effective and timely decision-making and
remedial reactions to an incident. It is not enough to just understand the current state. Situational
awareness implies also the ability to anticipate system changes and their impact on system security.
These issues will only become more challenging as the wide variety of technologies categorized under
the generic “smart grid” concept is deployed, including advanced control, monitoring systems, and a
wide array of new measurement devices on the transmission system, in substations, and on consumers'
premises. These technologies and systems result in the collection of tremendous
amounts of data related to the performance and management of the transmission system. In addition,
there are many other new sources of data that can be very valuable for planning, operating, and
managing the system such as external GIS data, satellite data, weather data, lightning data, and data
from renewable resources, storage, and demand response. Some of these diverse sources of data
represent tremendous opportunities to operate the transmission system more efficiently and more
reliably.
The aim of retaining or even increasing situational awareness as the system becomes more complex is
to guarantee the quality of decision-making regarding system integrity. This requires combining the
aforementioned areas of data management with new types of analytics to make use of such
data, as well as proper information and computation technologies and procedures to properly
integrate and manage the data. Advanced data-analytics techniques have been developed and used in
a variety of applications in many different industries and organizations. Using data to make critical
operational and business decisions is certainly not new to the electricity industry. Indeed, techniques
for data analysis have been applied to several areas such as load forecasting, predictive asset
maintenance, crew scheduling, outage management, and demand response, among others.
Additionally, big data analytics is being used in distribution systems to convert massive data streams
from smart meters and distributed energy resources into actionable information for grid operations.
Nevertheless, it is very valuable to examine how data analytics experience from other industries, as
well as from former implementations in the utility industry, can be used to solve the emerging critical
challenges of electric systems, considering that the power industry is not as mature as other
businesses in its use of analytics but has great opportunities ahead to address the challenges of
situational awareness.
To ensure that this brochure is relevant to this wide area of users, a survey was prepared (see Section
4) and circulated among as many Cigre members as possible across a diverse geographical area. The
results from this survey confirmed that this technical brochure addresses a challenge that many
members feel is growing in importance.


1.2 OBJECTIVE
The objective of this technical brochure is to address the increasing importance of situational
awareness in grid operation and to give an overview of the most relevant developments in data
analytics and data integration associated with situational awareness. It aims to identify future needs
by addressing some of the fundamental questions implied by the growing challenge of maintaining
adequate situational awareness in an increasingly complex system:
 What does situational awareness mean for the electricity utility industry?
 What are the future needs for improved situational awareness and better operator decision-
support tools (including tools for operation engineers, protection, etc.)?
 Is the new data that is available through smart grid investment useful for accomplishing the
future needs and requirements for future solutions?
 Who are these new data sources for, and do they fulfill their needs?
 What data-analytics techniques and tools are needed to transform the large inflow of data
into actionable information?
 What is the present status of the use of data analytics to support the operation of power
systems?
 What technologies are needed for handling data and performing integration?
 What models and data-integration technologies are needed to automate the processes and
enable actionable information, based on data from many sources, to reach the appropriate
users?
 How can quality of data be properly assessed and improved to make the analytic solution
more valuable and reliable?
 What are organizations doing in this space currently and in the future?
 What areas do organizations need to focus on to address this challenge?
Cigre is ideally placed to draw together these different elements because it has a large knowledgeable
contributor base and can disseminate any learning to a wide audience. Therefore, this technical
brochure is aimed to assist its members in the following areas:
 Transmission and distribution operations: essential for all levels of operation, from distribution
up to system operators, to gain knowledge on how new data-analytics and -visualization
technologies can help improve situational awareness.
 Product vendors: assist in identifying gaps in the market and potentially new uses for existing
products.
 Application and system developers: better understand what the challenges are for operations
and the need for better analytics tools.
 Researchers: assist in recognizing new areas for research and the application of this research.
 Consultants and project engineers: provide relevant reference material.

1.3 APPROACH
This technical brochure aims to present the collective thinking from a wide range of industry experts
across a broad range of perspectives to the different challenges involved in providing and maintaining
situational awareness by breaking the problem down into the different areas involved in this
challenge.
Collection of rich data—complemented by system modeling, advanced data analytics, and emerging
decision-support tools—has the potential to enable predictive analytics that can enhance situational
awareness and improve decision-making. In general terms, the development and successful
implementation of data-analytics tools involve addressing specific aspects in various domains, as
depicted in Figure 1-1.


[Figure 1-1 diagram showing the interlinked aspects: use cases; data sources; advanced data-analytics techniques; data models and integration; data quality and validation; and new decision-support tools]
Figure 1-1: Aspects to address the development and implementation of analytics techniques using
various data sources

The main aspects associated with these areas can be summarized as follows:


 Identify data sources: There are an increasing number and variety of data sources in
electric power systems arising from grid modernization and investments in the smart grid. In
order to examine data-analytics applications and the best solutions to deploy, it is essential to
understand the characteristics and availability of such large volumes of data.
 Identify advanced analytics techniques: A suite of data-analytics techniques can be used
for different applications to support system operations. Data analytics can reveal patterns,
predict the prospective outcomes, and recommend appropriate decisions. In combination with
visualization, data-analytics techniques can be effectively used to improve situational
awareness of operators. Analytics algorithms can also be used to examine raw event data to
provide descriptive analysis of the event. Understanding the underlying theory behind the
analytics tools, their common and potential uses, and the advantages and implementation
challenges is critical to select the techniques that best suit the problem at hand.
 Identify use cases: The first step is to define the use case applications. The tools for
operation support that can benefit from advanced analytics are not only tools to be used by
system operators in the control room but also applications for engineers who support various
operations related processes such as contingency analysis, outage scheduling and
management, load and renewable forecasting, protection, models management, components
rating calculation, compliance service, and special operation studies. From the technical
viewpoint, the applications can be decided based on specific needs and preferences. However,
the final decision may require consideration of several other factors such as alignment with
the overall enterprise data-analytics strategy and roadmap.
 Apply data models and integration: Well-defined standards-based data models are
essential to support advanced applications, analytics, and visualizations used in grid
operations. These include the need for accurate data models, procedures for ensuring data
integrity, and availability of the right data in the right format. Indeed, data interoperability has
been one of the main challenges for implementation of data analytics that use multiple data
sources. There will be a growing importance of sharing data between parties, and this
common approach is vital to ensure that this can take place efficiently and effectively.
 Improve data quality and validation: It is clear that the overall quality of the data used in
analytics applications significantly impacts the accuracy and trustworthiness of the outcome.
It is then essential to put in place quality-assessment and -improvement processes to ensure
that the data used in the various applications meet the minimum standards of data quality to
guarantee meaningful results.
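As a minimal illustration of the kind of quality-assessment process referred to in the last item, the sketch below computes a few basic indicators (completeness, plausible-range violations, and timestamp gaps) for a stream of measurements. The column name, reporting period, and thresholds are assumptions chosen for the example, not prescribed values.

```python
# Minimal illustration of basic data-quality checks of the kind mentioned above
# (completeness, plausible range, timestamp gaps). Column names and thresholds
# are assumptions for the example only.
import pandas as pd

def assess_quality(df: pd.DataFrame,
                   value_col: str = "voltage_pu",
                   expected_period: str = "1min",
                   valid_range: tuple = (0.80, 1.20)) -> dict:
    report = {}
    vals = df[value_col]
    # Completeness: share of missing values
    report["missing_ratio"] = float(vals.isna().mean())
    # Range validation: share of present values outside the plausible band
    low, high = valid_range
    out_of_range = vals.notna() & ~vals.between(low, high)
    report["out_of_range_ratio"] = float(out_of_range.mean())
    # Timestamp gaps: intervals longer than the expected reporting period
    gaps = df.index.to_series().diff().dropna()
    report["n_gaps"] = int((gaps > pd.Timedelta(expected_period)).sum())
    return report

# Hypothetical one-minute voltage measurements with a missing sample and an outlier
idx = pd.date_range("2018-06-01 00:00", periods=10, freq="1min").delete(4)
data = pd.DataFrame({"voltage_pu": [1.01, 1.00, 0.99, None, 1.02, 1.55, 1.00, 0.98, 1.01]},
                    index=idx)
print(assess_quality(data))
```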
This technical brochure is designed to provide the reader with an informed and comprehensive
starting point to understand these issues.
Each section aims to discuss and incorporate the latest advances in the relevant area in terms of
technology and approach to this particular challenge. The structure of this technical brochure is as
follows:
Section 1 – Introduction and Background: The second part of this introductory section
introduces the concept of situational awareness in relation to electricity power networks and discusses
why it is important to implement advanced data-analytics applications in the control room and
departments that support system operations.
Section 2 – Data Sources in Electric Power Systems: This section provides a description of the
many data sources that can be found in an electric power system. It covers both traditional sources
commonly used for monitoring, protection, and control and new or non-conventional data sources that
emerge from smart grid technologies. It also describes data sources that are external to the electric
system but can be accessed and used for power system applications and decision-making. It also
describes the communication requirements of each dataset type to ensure that data reaches the
different data-analytics applications with the required quality, velocity, and availability.
Section 3 – Data-Analytics Techniques: The main advanced data-analytics techniques that can be
used for a variety of operation-support tools are described in this section. The description of each of
these techniques includes a definition, technical description with some mathematical details, common
application domains, and potential applications in a smart grid.
Section 4 – Applications of Data Analytics in System Operations: This section describes an
extensive array of applications in power systems and the various tools and techniques identified in the
previous section. A survey of existing practices, tools, and techniques using various sources of data to
improve situational awareness and provide operation decision-making support is presented.
Section 5 – Data Integration and Modeling: This section examines typical data modeling
processes in electric utility transmission organizations to explain how data are assembled in the power
industry for secure and reliable grid operation. To illustrate the concepts, it presents an example of an
actual data-integration project in a large utility in the U.S.
Section 6 – Data Quality and Validation: The importance of good data quality and the methods
of validating this data are presented in this section.
Section 7 – Conclusions: This section summarizes the main findings and conclusions. It identifies
the future states, gaps, and research needs required to move the utility industry toward a more
extensive use of data-analytics technologies to support the operation of a power system. Due to the
nature of the challenge, the brochure does not provide a single solution that fits all; instead, it aims
to leave the reader in a more informed position and with a valuable source of reference and further
reading.

1.4 SITUATIONAL AWARENESS


Situational awareness: “The perception of the elements in the environment within a volume of time
and space, the comprehension of their meaning and the projection of their status in the near future” –
Endsley.
Endsley's definition is considered the classical definition of situational awareness. Although it is a
very high-level definition, it is highly relevant to electrical power systems.
Aspects of situational awareness in power systems are:
 Perception and meaning: What is going on?
 Comprehension: How does this all relate to each other and the system?
 Projection: What does this mean for the near future (and what can I do about it)?


As grid operations become more complex—due to increasing variability in demand and supply
balancing through new (and often “intelligent”) types of loads, renewable integration, and cross-
border integration of systems—situational awareness becomes more challenging. To cope with this,
automation and automated decision-making have become essential for grid operations. However, this
creates a new level of complexity and makes the system less intuitive and transparent. To increase
situational awareness, or at a minimum keep it on par, new tools for analysis and decision support for
grid operators are essential.
Figure 1-2 describes the levels of situational awareness required for grid operation under the new
conditions. Moving upwards through these levels is a significant challenge: it requires an
understanding of the various elements within this problem space, hence the need to break the
problem down into the different sections addressed in this brochure.
Therefore, situational awareness from the perspective of electrical power systems can be interpreted
as the continual assessment of the current and future state of the system in order to be able to
respond with the correct measures to reach a desired goal, such as keeping the operating conditions
within the appropriate boundaries, as well as reducing risks and increasing efficiency.

Figure 1-2: Levels of situational awareness
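
To make these levels concrete, the following minimal Python sketch applies perception, comprehension, and projection to a single monitored quantity (the loading of one line). The limit, the naive trend extrapolation, and the sample values are illustrative assumptions only.

def perceive(samples):                      # Level 1: what is going on right now?
    return samples[-1]

def comprehend(value, limit):               # Level 2: how does it relate to the system limit?
    return "within limits" if value <= limit else "limit violated"

def project(samples, steps):                # Level 3: what does this mean for the near future?
    trend = samples[-1] - samples[-2]       # naive linear extrapolation of the last change
    return samples[-1] + steps * trend

line_loading_pct = [78.0, 81.0, 84.0]       # assumed recent loading of one line (% of rating)
print(perceive(line_loading_pct))                                   # 84.0
print(comprehend(perceive(line_loading_pct), limit=100.0))          # within limits
print(comprehend(project(line_loading_pct, steps=6), limit=100.0))  # limit violated if the trend continues
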

This is not limited to the awareness within the central control room but also incorporates “awareness”
and response from local equipment, often referred to as “edge processing.” More and more localized
control systems are implemented in the grid or at its perimeters, such as in smart inverters from
renewable energy sources feeding into the grid.
Situational awareness thus includes the awareness of how different active control mechanisms work
together. While local active control generally helps to reach the goals of the central control, it can
sometimes work against them, leading to undesired or even dangerous situations.
Within the operation of transmission systems, there has been a focus on situational awareness
because maintaining system reliability has always been crucial. This operational situational
awareness is now becoming increasingly important for distribution operation as well, down to the
low-voltage levels of the grid.
The growing amount and variability of data now available to operators within all levels of control
centers is changing the control centers beyond recognition. Operators in control centers routinely
receive system-related information, such as voltage, frequency, current, power flows, network
topology, etc. However, the knowledge derived from asset data that is accessible to asset managers,
equipment subject-matter experts (SMEs), and field staff is now finding its way to operators in
control centers and is taken into consideration in operating the grid. A common element across all
levels of control centers is the use of human operators. Even though technology has moved on, there
is still a requirement to keep a human in the loop.
Over time, operators build up a mental model of how the network works and behaves under certain
conditions, based on their training and experience. This will not change in the future, because this
mental model holds the overall picture and still includes much that cannot be taken over by (self-
learning) algorithms. A mental model therefore remains far more flexible than algorithms, even
though operators will be increasingly supported by them. It is thus very important that operators
have learned the correct models, because it is possible to present the correct information to an
operator who still makes an incorrect decision.

Figure 1-3: Representation of operator mental model based on training and experience

It is therefore very important to understand that situational awareness is in the mind of the operator
(see Figure 1-3), so while addressing all the highly complex analytics and technological challenges, we
must not forget that there are also many traditional things that can be done to improve and maintain
the situational awareness of the operator, such as:
 Focused training
 Increased experience
 What-if simulations
These elements are beyond the scope of this brochure but are addressed by other CIGRE working
groups.
It is also important to ensure that advanced analytics information is presented in the best possible
way. This can be addressed with appropriate HMI standards applied consistently across the
visualizations of different systems, so that when operators switch between systems they understand
what the key information presented means (e.g. using reserved colors: red means an operator needs
to take action now, and yellow means that a system is moving toward an unsafe state), as sketched
below.
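
The following is a minimal sketch of such a reserved-color convention shared across HMIs; the severity names and the neutral third state are assumptions added for illustration.

ALARM_COLORS = {
    "act_now": "red",             # operator must take action immediately
    "trending_unsafe": "yellow",  # system is moving toward an unsafe state
    "normal": "grey",             # assumed neutral color for non-alarm states
}

def color_for(severity):
    return ALARM_COLORS.get(severity, "grey")

print(color_for("act_now"))       # "red", regardless of which system renders the alarm
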
Why is it important?
Situational awareness is becoming increasingly important because of the growing complexity of
power systems. It is becoming more and more difficult to fully grasp the dynamics of the grid
because of the increasing interaction between technologies in the power system and the risks
associated with it.


As the number of external data feeds increases, there is a growing need to focus on the inputs into
power system algorithms. As Figure 1-4 suggests, the increasing number of inputs into the system,
together with the complexity of these inputs and the interactions between them, requires new
analytics and visualization techniques to be developed and integrated into electric utilities to create
and enhance situational awareness. The number of outputs in terms of measures, by contrast, has
not really increased (e.g. we still measure voltage and frequency), and visualizing these measures
does not necessarily require complex techniques.

Figure 1-4: Relationship between analytics and visualization complexity

Examples of trends that make grid operation more challenging and dynamic are:
 Variability of demand, both because of new demand (electric vehicles, heat pumps) and moving
demand (demand-side management).
 Changing generation mix (increasingly reliant on weather, which also impacts demand), which
requires a more active role of the grid operator.
 Market vs. physical – the merging of markets across larger geographical/geopolitical zones, with
unclear impact on the different physical power systems (island systems versus larger interconnected
systems).
 Increased cost awareness in the regulated environment: who will pay for decarbonization and the
increased system costs (reserves, response to weather variability)?
 Tools to actively influence the grid become more easily available (e.g. demand response, grid-
connected storage, active switching in the grid, dynamic line rating, grid capacity management,
voltage management).
Therefore, as situational awareness becomes more prominent, not only do operators need to become
more aware of the situation in their grid, but equipment itself also needs to be aware of the situation
outside its direct environment. An example of this is given below:
“Some years ago a smart MV/LV distribution transformer with automatic tap changing, based on power
electronics with embedded controls, was developed, built, and fully tested at KEMA (now DNV GL)
laboratories, including short-circuit capabilities. It was installed in a greenhouse area in the western
part of the Netherlands to improve the voltage stability and power quality of the local distribution grid.
The smart transformer functioned according to expectation, and it was decided to install a second one,
electrically close, that is, on the same MV string. As the smart transformers did not communicate with
each other and no situational awareness and/or damping control loop was envisioned or implemented,
they started to react to each other, resulting in unstable and oscillating behaviour. The end of the
story was that they were removed from the grid.” (Quoted from the DNV GL white paper “Power
Cybernetics”: https://www.dnvgl.com/energy/publications/download/power-cybernetics.html)
While operators will remain at the core of grid operations, it becomes more and more important that
they are supported by advanced data analytics and analytics visualization in order to grasp the
increased complexity, higher time pressure, and interlocking mechanisms, as indicated in the example
below:
In September 2011, the loss of Arizona Public Service's (APS) Hassayampa-North Gila 500 kV
transmission line initiated a sequence of events that led to a blackout affecting over 2.7 million
customers. The line loss itself did not cause the blackout, but it exposed the grid operators' lack of
adequate real-time situational awareness of conditions throughout the Western Interconnection. More
effective review and use of information would have helped operators avoid the cascading blackout.
For example, had operators reviewed and heeded their real-time contingency analysis results prior to
the loss of the APS line, they could have taken corrective actions, such as dispatching additional
generation or shedding load, to prevent a cascading outage. The evaluation report recommends that
bulk power system operators improve their situational awareness through improved communication,
data sharing, and the use of real-time tools (NERC report, 2012).
Other sectors have experienced the effects of increased system complexity in combination with human
control. For example, in 88% of aviation accidents human error was indicated as the cause, 50% of
which were air traffic control operational errors [Measurement of Situation Awareness in Dynamic
Systems, Human Factors, 37(1): 65–84, 1995c]. Like grid operations, these systems have grown in
complexity and time pressure, with an increase in the amount of automation to assist the operators.
However, there is still a human in the loop, with the potential for human error.
A major risk of (the necessary) increased automation in the grid is that operators actually become
(relatively) less situationally aware and that automated systems and operators will work against each
other, especially in crises and when under high pressure.
Conclusion
Situational awareness always was, and always will be, a major element in maintaining the integrity of
the electricity system. However, the importance of situational awareness is growing. As operational
margins become increasingly variable, due to the growing share of renewables in the energy mix as
well as growing amounts of inverter-based "intelligent" demand, the system becomes more
decentralized and complex. Retaining even the current level of situational awareness is therefore
challenging. The changes in the power system require an increase of situational awareness at all
control levels, so that the quality of operational decision-making needed to maintain system integrity
is preserved.
This brochure covers state-of-the-art tools and new data sources that enable operators to be aware
of the situation in the power system and help them make optimal operating decisions. Because the
complexity of the power system will continue to increase, future needs for increasing situational
awareness are also addressed.
Situational awareness is about an integrated picture of the electricity system, including:
 Situationally aware automation at a higher level of decision-making in grid operations under high
time pressure, taking into account larger parts of the grid instead of only the direct environment.
 Increasing the situational awareness of operators by visualizing the current situation as well as
future situations and scenarios.
 Shifting the main focus of the operators to prepare (“prime”) the system for (near) future critical
situations, using simulations, (short-term) scenarios, and models.
The latter has the additional benefit that operators will gain a much faster and more thorough
understanding of the system dynamics than they would get based on experience of (hopefully rare)
real-life events alone.


The following chapters address all major elements of situational awareness: data and information
sources, data-analytics techniques to interpret these data, applications of these analytics in system
operations, data integration and modeling to bring data into operations, and finally data quality and
validation.

References:
[1]. Endsley, M.R. (1995b). "Toward a theory of situation awareness in dynamic systems". Human
Factors. 37 (1): 32–64


2. DATA SOURCES IN ELECTRIC POWER SYSTEMS


In order to examine the advanced data-analytics applications that support system operations, it is
necessary to understand the sources and characteristics of the large variety of data that is available in
power systems. There is a wide range of measurement equipment, sensors, and recorders installed in
the power network, capturing a rich amount of data that can be used in analytics applications to
extract valuable, actionable information. Each recording device has its own
characteristics in collecting, processing, and reporting captured data. These devices have specific
built-in purposes, but the data that they provide may be used in data-analytics applications for
additional objectives. In addition to the data captured by measurement equipment, there is data from
non-electrical equipment and data from sources external to the power system that can also be
leveraged for power system analytics.
This section provides an overview of the many data sources that can be found in power systems. For
description purposes, the data sources are grouped into the following classes [1][2]:
 Data from monitoring and protection equipment
 Data from equipment sensors
 Non-electrical data sources (external data)
Further, this section provides a description of the communication requirements to make this data
available for its use in analytics applications.
2.1 DATA FROM MONITORING AND PROTECTION DEVICES
Modern digital and microprocessor-based devices used for various protection and monitoring
applications are commonly referred to as intelligent electronic devices (IEDs) [1][3]. Most of these devices were
designed with a very specific, often limited, data-collection function in mind. However, with
technological progress, IEDs evolved into more sophisticated devices with new capabilities, including
new functionalities and higher quality of data recording. Data from many IEDs could be integrated
and used for a variety of analytical applications, provided that standardization, data-recording, and
communication issues are properly addressed.
The different pieces of equipment are briefly described in the following subsections, including a
discussion on the characteristics of recorded data, potential applications, and examples.
2.1.1 Digital protective relays
The purpose of the protection function is to continually detect the abnormal/fault condition on a
power system and provide a high-speed tripping mechanism to isolate the fault from the rest of the
power system. Because the protection function is necessary for safe and normal operation of a power
system, protection relays and equipment are considered critical and require high sampling frequency,
high accuracy, and low latency data transmission. Hence, the data collection and processing are
performed locally and very close to the equipment being protected. The data taken at a high sampling
frequency is generally not needed for data-analytics applications. However, the post-event
disturbance data may have data-analytics applications for analyzing the behavior of equipment and
determining the statistics of events.
The equipment that processes the signal information in real time is called a protection relay. Although
old electromechanical relays are still widely used in power systems, they are being replaced by
modern microprocessor-based protection relays in the vast majority of applications; because data in
digital format is necessary for data-analytics applications, only the capabilities of microprocessor-
based relays are highlighted in this report. Protection relays generally measure the voltage and
current information on a section of the power system. They are also wired with additional alarm and
indication signals from the power system equipment that they are meant to protect. Many types of
protection relays exist within a substation, overseeing every section of the power system. In
addition to providing high-speed protection, microprocessor-based protection relays also record the
signals and status information at the time of disturbance. These are:
 Fault/trip information such as voltage and current magnitudes, angles, circuit breaker status,
etc.


 Equipment indications/alarms at the time of disturbance.


 Operating status of protection functions at the time of disturbance.
 Health of protection relay.

2.1.2 Recorders
If a substation had only microprocessor relays, it would be possible to know the condition of the power
system at the time of a fault/disturbance event by collecting and analyzing the information triggered
in the various protection relays. However, because electromechanical relays do not have any capability
to record disturbance information, separate standalone recorders are used to record the disturbance
data. Similar to microprocessor-based protection relays, the recorders measure voltage and current
information from the substation along with alarm/indication signals from power system equipment.
One of the primary advantages of standalone disturbance recording equipment is that it can be set
sensitively enough to trigger on any abnormal condition, whereas protection relays trigger only during
fault conditions.
The two types of standalone recorders widely used in power utilities are sequence-of-event recorders
and digital transient/fault recorders.
2.1.2.1 SER: Sequence-of-event recorder
Large power system equipment such as generators, circuit breakers, and motors have complex
operating mechanisms, which operate through a sequence of steps. In such equipment, several
actuators, sensors, and control elements are connected in a complex configuration. Each of these
elements often provides an operating status (0/1) indicating whether a measured quantity has
exceeded a threshold or a piece of equipment has operated.
SERs connect several of these signals and record the status changes with time stamps. Analysis of
SER data helps to identify the operation time and performance of each of the control elements and
sub-systems. Data-analytics applications can utilize this data to locate a sluggish-performing device
and warn about a potential failure event, as sketched below. Proactive steps can then be taken to
replace the device and help prevent a catastrophic failure event and the resulting motor damage.
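
The following minimal Python sketch illustrates the idea: time-stamped SER status changes are compared against assumed baseline operating times, and devices that respond noticeably slower than their baseline are flagged. The record format, device names, baselines, and tolerance are illustrative assumptions.

RECORDS = [                       # (time in ms, device, new status) from one SER capture
    (0, "close_coil", 1),
    (45, "aux_contact_52a", 1),
    (135, "spring_charge_motor", 1),
]
BASELINE_MS = {"aux_contact_52a": 40.0, "spring_charge_motor": 100.0}   # assumed healthy times
TOLERANCE = 1.25                  # assumed: flag devices 25 % slower than their baseline

def sluggish_devices(records, command_time_ms):
    flagged = []
    for t_ms, device, _status in records:
        baseline = BASELINE_MS.get(device)
        if baseline is None:
            continue
        measured = t_ms - command_time_ms
        if measured > TOLERANCE * baseline:
            flagged.append((device, measured, baseline))
    return flagged

print(sluggish_devices(RECORDS, command_time_ms=0))
# -> [('spring_charge_motor', 135, 100.0)], a candidate for proactive maintenance
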
2.1.2.2 DTR/DFR: Digital transient/fault recorder
Digital fault recorders connect continuous time-varying signals such as voltage, current, pressure, and
temperature, and provide triggering functionality to record a disturbance. DFRs continuously monitor
these signals and record the transient waveforms when an event occurs. DFRs may also record a few
binary signals (0/1) to indicate the status of equipment.
Analysis of the disturbance snapshots recorded by DFRs provides insight into the transient performance
and operational characteristics of the power system. The data can be used to assess the behavior and
response of the many connected pieces of power system equipment. The results of such analysis help
identify the root cause of a disturbance and enable corrective actions. Data-analytics algorithms can
use DFR data to model and assess system-wide health and performance.
2.1.2.3 Dynamic swing recorders
Dynamic swing recorders (DSRs) are specifically aimed at capturing the dynamic response of the power
system as a result of a fault or sudden changes. DSRs exist both as standalone devices and integrated
with digital fault recorders. Data is usually stored as RMS or phasor values, sampled at rates from twice
a cycle to once every ten cycles. DSRs are able to capture swing record lengths from one minute to 30
minutes, or pre-/post-trigger swing data, and they can be used for several purposes, such as
analysis of disturbances, the quantification of power system parameter changes, the investigation of
system oscillations, and validation of stability models [1].
2.1.3 Revenue meters
Real-time revenue metering and economic dispatch of generation are two of the most important
functions in power systems, enabling smooth and efficient operation. Revenue meters are located at
points of interconnection, segregating generators, transmission/distribution owners, and load centers.
Metering consists of capturing highly accurate data at the frequency of the power system,
representing magnitudes of voltages, currents, real power, reactive power, and system frequency.
The difference between regular meters and revenue meters is accuracy: regular meters are used
for visualization, whereas revenue meters are connected to highly accurate revenue-grade
instrumentation transformers and contain filters to selectively extract the power-frequency
components.
The real-time meter information is utilized in advanced data-analytics algorithms to detect abnormal
conditions in parts of the power system. Timely analysis can help system operators take appropriate
actions to mitigate them.
2.1.4 Synchrophasors
A phasor is the mathematical representation of a continuously time-varying signal in terms of
magnitude and angle. A synchrophasor is digitized phasor data with a UTC (Coordinated Universal
Time) timestamp on each packet. Phasor measurement units (PMUs) are devices that measure the
voltage and current quantities in a substation and compute synchrophasor data of voltages, currents,
and real/reactive power flow at a much higher rate than remote terminal units (10 to 120
samples/sec). Because synchrophasor data uses a common time reference, it enables comparing
power system state information across a wide geographical area in a consistent manner. Hence,
mathematical operations such as addition, subtraction, multiplication, and division can be performed
directly on synchrophasor data collected from different sources. This enables assessing the state
of an entire power grid at a much higher granularity and accuracy than previously possible. With
higher penetration of PMUs, complete real-time automated closed-loop control from a centralized EMS
becomes viable.
Highly accurate synchrophasor data can reveal the condition of a power system in greater detail: it is
possible to view power system oscillations and generator dynamic responses in real time.
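
The sketch below illustrates, with assumed per-unit values, how time-aligned synchrophasor data from two PMUs allows direct phasor arithmetic, here to compute an angle difference and a complex power flow. The measurement values are illustrative assumptions only.

import cmath, math

def phasor(magnitude, angle_deg):
    return cmath.rect(magnitude, math.radians(angle_deg))

# Assumed per-unit measurements reported for the same UTC-stamped instant.
v_sending   = phasor(1.02, 12.0)   # bus voltage phasor from PMU A
v_receiving = phasor(0.99, 7.5)    # bus voltage phasor from PMU B
i_line      = phasor(0.80, 4.0)    # line current phasor from PMU A

angle_diff_deg = math.degrees(cmath.phase(v_sending) - cmath.phase(v_receiving))
s_sending = v_sending * i_line.conjugate()          # complex power at the sending end (p.u.)

print(f"angle difference: {angle_diff_deg:.2f} deg")
print(f"P = {s_sending.real:.3f} p.u., Q = {s_sending.imag:.3f} p.u.")
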
2.1.5 Remote terminal unit (RTU)
To achieve centralized control of a power system, real-time values of voltage, current, real power,
reactive power, system frequency, and circuit breaker status information are needed. RTUs connect
the circuit breaker status signals and continuous time-varying voltage and current signals and
calculate the magnitudes of these quantities. RTUs can be integrated into SCADA systems and
connect to wide-area communication networks where the real-time information of these quantities is
transmitted to a control center. Further, RTUs are connected to trip and control circuits of generators
and circuit breakers to regulate the operation of generators and enable remote connection/isolation of
sections of a power system. Hence, RTUs and SCADA systems are included in the critical equipment
list, as they provide the centralized control of power needed for efficient and stable operation of a
power system.
RTU data is utilized in state-estimation algorithms to determine an accurate state of the power system
at a given moment (a minimal sketch is given below). This gives complete visibility of the power
system, depicting its real-time health. Advanced control algorithms are further used to achieve manual
and automated closed-loop operation. Data-analytics applications can combine RTU data with other
types of data to provide enhanced foresight and situational awareness to the system operator.
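
As a minimal illustration of how RTU measurements feed a state estimator, the sketch below solves a weighted least-squares problem for a linear (DC) three-bus model. The network data, measurement values, and weights are illustrative assumptions; real estimators use the full nonlinear AC model together with bad-data detection.

import numpy as np

# Three-bus system, bus 1 is the angle reference, so the state is x = [theta2, theta3] (rad).
# Assumed line susceptances (p.u.): b12 = 10, b13 = 5, b23 = 8.
# Measurements: line flows P12, P13, P23 and the injection P2 (all p.u., with noise).
H = np.array([
    [-10.0,  0.0],   # P12 = b12*(theta1 - theta2)
    [  0.0, -5.0],   # P13 = b13*(theta1 - theta3)
    [  8.0, -8.0],   # P23 = b23*(theta2 - theta3)
    [ 18.0, -8.0],   # P2  = -P12 + P23
])
z = np.array([0.51, 0.29, 0.08, -0.43])                       # assumed RTU measurements
W = np.diag([1/0.02**2, 1/0.02**2, 1/0.02**2, 1/0.04**2])     # weights = 1/sigma^2

x_hat = np.linalg.solve(H.T @ W @ H, H.T @ W @ z)             # estimated angles (rad)
residuals = z - H @ x_hat                                     # large residuals can flag bad data
print("theta2, theta3 (rad):", x_hat)
print("measurement residuals:", residuals)
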
2.1.6 Power quality meters
Power quality (PQ) meters are designed to record different power quality variations such as impulsive
and oscillatory transients, sags/swells, interruptions, under/overvoltage, harmonic distortion, and
voltage fluctuations. Usually, the sampling rate of PQ meters can be configured according to specific
application requirements. The newest generation of PQ meters can sample at rates of 1024 samples
per cycle for normal conditions and up to 100,000 samples per cycle for transients [7][8].
Traditionally, PQ meter data has been used by power quality engineers for specific PQ monitoring and
assurance purposes. However, the alternative usage of such data has recently been considered and
investigated, including condition monitoring of equipment, fault identification, and fault analysis.
There is an IEEE working group that specifically focuses on this type of data analytics [5][6][7][8].
Various software applications have been developed to process and analyze power quality databases
and to automatically combine that data with other power system data from SCADA, GIS, and network
topologies for the detection and analysis of events in the grid [9].
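
The sampling rates above imply substantial data volumes, which is one reason PQ meters rely on triggering and local analysis rather than continuously streaming raw waveforms. The back-of-the-envelope estimate below assumes six channels and 16-bit samples, neither of which is stated in the text.

SYSTEM_FREQ_HZ    = 60
SAMPLES_PER_CYCLE = 1024      # steady-state rate quoted above
CHANNELS          = 6         # assumed: three voltages and three currents
BYTES_PER_SAMPLE  = 2         # assumed: 16-bit samples

bytes_per_second = SYSTEM_FREQ_HZ * SAMPLES_PER_CYCLE * CHANNELS * BYTES_PER_SAMPLE
print(f"{bytes_per_second / 1e6:.2f} MB/s per meter "
      f"(~{bytes_per_second * 8 / 1e6:.1f} Mbit/s, ~{bytes_per_second * 86400 / 1e9:.0f} GB/day)")
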
2.1.7 SCADA (supervisory control and data acquisition)
A power grid is a highly interconnected system between generators and loads, which are spread
across wide geographical locations. For efficient and reliable operation of a power system, it is
necessary to monitor its state from both a local and central location. SCADA systems connect to RTUs
in substations to monitor the voltage and current quantities and control the operation of circuit
breakers. They also communicate with a central energy-management system for control actions.
SCADA provides the following control capabilities:
 Generators: Control the voltage, frequency, and real/reactive power set points.
 Transformers: Adjust the tap changers where tap changes are available.
 Capacitor/reactor banks: Remotely open/close the banks.
 FACTS (Flexible AC Transmission Systems): Control the set points to regulate power flow or
system voltage on a section of the transmission system.
 Loads: Non-essential loads are operated dynamically to shed at peak load and stressed times
as part of demand-response programs.
A SCADA system uses information collected from both binary (0/1) and analog continuous time-
varying signals for decision-making. SCADA systems form an important source for data-analytics
applications because the data collection and communication infrastructure is already established and
readily available at energy-control centers. Operational data collected from SCADA systems from
various parts of the power system is continually streamed to a central location.
2.2 DATA FROM EQUIPMENT SENSORS
The urgent need to diagnose aging equipment and asset health has led to the development of a
variety of sophisticated equipment-based sensors, which enable one to assess the health and
performance of different pieces of equipment. Devices that monitor the condition of assets contain
equipment-specific intelligence to identify normal and abnormal responses. A condition-monitoring
system can be standalone with advanced analytics about specific equipment or it can be part of a
multifunction protection relay wherein general health and statistics information is provided. Examples
of a standalone condition-monitoring system include vibration-monitoring systems for turbines, partial
discharge monitoring systems for generator stators, and dissolved gas analysis (DGA) systems for
transformer oil. Common functions embedded in modern multifunction protection relays include circuit
breaker monitoring systems, as well as temperature and overload monitoring for transformers,
motors, generators, and transmission lines. Data-analytics algorithms can use the information from
various condition-monitoring systems to determine the health of major power system equipment in
real time and deliver information about equipment health to system operators for situational
awareness.
In what follows, different types of sensors and condition- and operation-monitoring devices installed
in substations and on transmission lines are briefly described. The list is not intended to be exhaustive
but rather to exemplify the characteristics and possible uses of sensor data. The descriptions include
conventional sensors commonly used in substation equipment, as well as emerging sensors and
systems.

2.2.1 Circuit breaker


IEDs in circuit breaker (CB) architecture provide precise indications of the CB’s operating condition
with an efficient data-logging system. The CB-monitoring (CBM) system encompasses two categories
of data collection: real-time and event-based. Breaker relays and CBMs supervise the following
parameters in order to provide continuous evaluation of asset health (a detailed list of the monitored
parameters is provided in Table 2-1).
 The breaker and trip coil statuses
 Charging motor conditions
 SF6 gas quality and heater integrity
During operations, the relays also record concurrent event data that the asset health center (AHC)
monitoring system uses to assess breaker performance and maintenance needs. Such event-based
data includes:
 Transient recordings of breaker interrupt currents
 Breaker operation times
 Trip coil currents


 Battery voltages
 Mechanism charging currents
 Mechanism charging times

Table 2-1: Monitored parameters of circuit breakers

Categories | Parameters
Electric Wear | Contact wear (switch operations); main nozzle wear; auxiliary nozzle wear; contact resistance; interrupter wear
Accessories | Function of cabinet, mechanism, and tank heaters; number of hydraulic pump starts; total accumulated run hours of the air compressor; total accumulated run hours of the SF6 compressor
Dielectric | Insulating oil dielectric strength; rated voltage vs. applied voltage; rated current vs. applied current; SF6 moisture content, density, temperature, pressure, and purity; high-pressure SF6 moisture content, density, temperature, pressure, and purity
Mechanical | Close time and velocity; trip time and velocity; interpole close time and trip time deltas; resistor pre-insertion time; total interrupter travel; mechanical supervision/monitoring (travel curve, times); energy supervision/monitoring (spring/hydraulic); motor and coil supervision/monitoring; sensor, heater, and self-supervision/monitoring; remaining energy detection for spring mechanism

The CBM system can support client/server architecture. It consists of the CBM devices attached to the
CBs and software running on a central control unit. The main functions of the control unit are:
 Supervise the operating conditions of the circuit breaker.
 Prevent operation if the circuit breaker is outside its operational capabilities.
 Execute operating commands when it is safe to do so.
 Perform data acquisition of signals from the CB control circuit and record sequences of
tripping and closing.

When a breaker operates, recorded files are transmitted to the central control unit using wired or
wireless technologies. The bandwidth required for real-time data transfer of 15 signals, sampled at 2
kHz, is determined as 576 Kbps.
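
One plausible reconstruction of the 576 kbps figure, assuming 16-bit samples and roughly 20 % framing/protocol overhead (neither assumption is stated above), is shown below.

SIGNALS         = 15
SAMPLE_RATE_HZ  = 2_000
BITS_PER_SAMPLE = 16      # assumption, not stated in the text
OVERHEAD_PCT    = 20      # assumption: ~20 % framing/protocol overhead

payload_kbps = SIGNALS * SAMPLE_RATE_HZ * BITS_PER_SAMPLE / 1_000
total_kbps = payload_kbps * (100 + OVERHEAD_PCT) / 100
print(payload_kbps)   # 480.0 kbps of raw samples
print(total_kbps)     # 576.0 kbps, matching the figure quoted above
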
The CBM IED monitors 15 electrical signals from the circuit breaker control circuit. The signals are
generated during either tripping or closing of the breaker. Of these 15 signals, 11 are analog and 4
are binary signals. Analog signals include measurement of electrical variables such as phase current,
while binary signals indicate the statuses of different components.
2.2.2 Transformer
Online monitoring is used continuously during operation and offers possibilities to record the relevant
stresses that can affect the lifetime of a transformer. The evaluation of these data offers the
possibility of detecting incipient faults early. The addition of an embedded web-server, equipped with
powerful data-analysis tools, means that users can manage and interpret information. Table 2-2
illustrates the status of different condition-assessment techniques.


Table 2-2: Status of different condition assessment techniques for power transformers
Method | Offline | Online | Monitoring | Offsite
Ageing of oil (e.g., color, moisture, and tan δ) | 1 | 1 | 3 | 1
Furan in oil analysis | 2 | 2 | N/A | 2
Gas-in-oil analysis (DGA) | 1 | 1 | 1 | 1
PD (IEC 60270) | 1 | 2 | 3 | 1
Unconventional PD measurement (e.g., UHF PD measurement) | 2 | 2 | 3 | 2
Transfer function (FRA) | 1 | 3 | N/A | 1
Dielectric diagnostic (PDC and FDS) | 2 | N/A | N/A | 2
Thermal monitoring | N/A | N/A | 2 | N/A
Degree of polymerization (DP-value) | N/A | N/A | N/A | 1
1: Generally accepted or standardized; 2: accepted by different users; 3: under investigation or consideration; moisture measurement.

The control units of modern transformers offer a complete set of communication infrastructure based
on IEC 61850-8, including GOOSE messaging, IEC 61850-9-2 Process bus, IEC 60870-5-103 serial
communication, and DNP 3.0 slave protocol. The control, embedded webserver, and web-based
software units of the transformer work as a SCADA that is used for:
 Incorporation of DGA, PD, and bushing monitoring (BM) in one unit.
 On-site and online display of DGA, PD, and BM key parameters.
 Control the operating conditions of the transformer and execute operating commands.
 Correlation of data from external inputs.
 Full control and communications via secure, flexible web access.
 Extensive analysis tools.
 Full compatibility with asset-management systems.
Dissolved Gas Analysis (DGA): Online DGA represents a vastly improved monitoring process. With
online DGA, devices are installed on substation transformers that are capable of:
 Sampling and evaluating dissolved gasses and sending DGA data to back office systems.
 Integration of online DGA data into operations and maintenance processes.
 Capturing of data at least once per day and, in some cases, as often as once per hour.
 Capability of analyzing a larger number of data points, which improves trending analyses.
 Transmitting online DGA data to an energy-management system (EMS).
 EMS alarm triggering using a rule engine with preconfigured asset-specific parameters, as sketched below.
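
The following minimal sketch shows what such a rule could look like; the gas list and ppm thresholds are placeholders only, and real asset-specific limits would be taken from IEEE C57.104 / IEC 60599 and utility practice.

THRESHOLDS_PPM = {                 # assumed per-asset configuration, placeholders only
    "hydrogen":  {"warn": 100, "alarm": 700},
    "methane":   {"warn": 120, "alarm": 400},
    "acetylene": {"warn": 2,   "alarm": 10},
}

def dga_alarms(asset, reading_ppm):
    messages = []
    for gas, limits in THRESHOLDS_PPM.items():
        value = reading_ppm.get(gas, 0.0)
        if value >= limits["alarm"]:
            messages.append(f"{asset}: {gas} {value} ppm >= alarm limit {limits['alarm']}")
        elif value >= limits["warn"]:
            messages.append(f"{asset}: {gas} {value} ppm >= warning limit {limits['warn']}")
    return messages

print(dga_alarms("T-401", {"hydrogen": 150, "methane": 90, "acetylene": 3}))
# -> warnings for hydrogen and acetylene; methane stays silent
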
Partial Discharge (PD): Electrical discharges appear as various forms of voltage and current
impulses of very short duration (nanoseconds) that lead to PD. These events radiate
electromagnetic energy with a specific spectral signature for which UHF detection is well suited,
enabling high refresh rates and accuracy. The online PD indicator and its control unit work as
a SCADA system that is used for:
 Radiated electromagnetic energy with UHF detection process, enabling high accuracy.
 Phase-resolved analysis and UHF detection method based on IEC 60270 rules.
 Simultaneous operation of PD indicator and SAW temperature monitoring system.
 Real-time separation of PD events and ambient noise using high-performance algorithms.
 Sample rate: 100 Mbps; Bandwidth: 16 kHz – 100 MHz or 1 MHz – 35 MHz.
Table 2-3 shows different monitoring techniques, sensor types, possible output data, and the purpose
of monitoring.


Table 2-3: Different sensors and output data

Monitoring Method and Sensor | Output Data | Purpose of Monitoring
DGA – combustible gas sensor | N/A | Analysis of the oil samples (based on IEEE C57.104 and IEC 60599); insulation, overheated oil
DGA – spectroscope | Digital | Insulation, overheated oil, system leaks, over-pressurization, or changes in pressure or temperature
PD – UHF sensor, acoustic wave sensor, fiber optic sensor | Digital | Insulation: if partial discharge is detected, it is possible to locate the fault accurately by using multiple sensors
Thermal analysis – PT100 | Resistance | Oil temperature (heat can indicate multiple faults)
Thermal analysis – thermal camera | Digital | Surface temperatures
Thermal analysis – fiber optic | Digital | Temperature directly from the windings
Vibration – SKF acceleration sensor | Voltage | Loose core clamping or bonding bolts
Moisture – Vaisala Humicap MMT318 | Current | Insulation

2.2.3 Distributed generation (solar and wind)


The increasing number of renewable energy sources such as solar photovoltaic, wind, and micro-
hydro is leading to a substantial generation of electric energy in the form of distributed generation
(DG) units within the electric networks. Table 2.4 and Table 2.5 provide some brief descriptions of
sensors and measurements for solar and wind power, respectively [16][17].
DGs need a fast and accurate data-transmission system to transfer the measured data and command
signals to the relevant central controllers. Data monitoring therefore provides fundamental operational
support for solar and wind power and other DGs. In addition, a proper ICT infrastructure needs to be
developed, which facilitates the control and monitoring of electricity generation and consumption as
well as remote network operation.
Table 2.4: Solar measurement and description

Monitoring Method and Sensor | Description
Pyranometer | Measuring irradiance; if there is more than one orientation of the PV array, a separate pyranometer is required for each orientation
Satellite-based irradiance measurement | Taking data from satellites and processing them with models to create an estimate of ground-level irradiance at a site
Back-of-module temperature sensor | Temperature sensor with thermal conduction
Ambient temperature sensor | Ambient temperature
Current transformers (CTs) | Measuring current from combiner-box home runs, measured at the inverter
Inverter-direct monitoring | Measuring the production of each inverter
Inverter temperature sensor | Identify overheating
Energy meter | Reporting, at a minimum, cumulative energy delivery


Table 2.5: Wind turbine sensors and applications

Sensor Application on Wind Turbine


Accelerometer sensor Gearbox monitoring
Position sensor Prop feathering monitoring
Pressure sensor Dynamic pressure measurement on turbine blade
Temperature sensor Bearing monitoring
Fluid property sensor Gearbox oil monitoring
Level sensor Gearbox oil level monitoring
Accelerometer sensor Turbine shroud monitoring
Transformer sensor Windings temperature monitoring
Temperature sensor Stator winding monitoring
Vibration sensor Tower sway monitoring
Position sensor Tower leveling monitoring

The sensors/meters/actuators in one DG can be connected directly to the local controller through
either ADC, GPIO, or serial communication. The received data and any processed outputs can then be
transmitted by the reduced function device (RFD), which is connected to the local controller. The
transmitted data by the local controller of the DG will be received by the central controller through the
full function device (FFD). Alternatively, the sensors/meters/actuators can be connected directly to an
RFD, which transmits data to the central controller. This is applicable for measurements from the CB
and power distribution lines where no significant computation and control process is required.
2.2.4 BESS: Battery energy storage system
Power generation is shifting from large-scale to a highly complex, distributed generation in which
cost-efficient integration of renewables is paramount, and the demand for energy is continuing to rise.
Therefore, a BESS has to provide energy for a large range of applications to optimize asset
performance by stabilizing frequency and voltage and balancing variations in supply and in demand.
The typical applications include, but are not limited to:
Generation
 Frequency regulation
 Renewable integration
 Spinning reserve
 Power plant hybridization
 Ramp rate management

Transmission
 Voltage support
 Dynamic line rating support
 Renewable integration
 Dynamic stability support
 Loss reduction
 Constraint relief

Distribution
 Residential and industrial backup power
 Microgrid and island grid support
 Distribution upgrade support
 Peak load reduction
The data exchange of a BESS can vary because of differing manufacturer architectures. Figure 2-1
shows the basic BESS elements. Usually, a BESS is directed by the EMS shown at the top.


Figure 2-1: Basic structure of a battery energy storage system (EMS – energy management system; BESS – battery energy storage system; SMS – storage management system; BMS – battery management system; SCU – storage control unit)

Storage Management System (SMS)


The SMS of the BESS works as a SCADA system that is used to:
 Provide interfaces to external EMS/SCADAs, along with the appropriate control and
communication hardware, to conduct energy-storage applications. Therefore, supported
protocols like IEC 61850, IEC 60870-5-104, IEC 60870-5-101, Profibus DP, and Modbus TCP
are standard.
 Control the connected inverters according to the operation mode and its activated control
mode, such as:
o U/f-Mode: battery as voltage source for reliability, grid improvement, island grid
operation, and so on.
o P/Q-Mode: battery as current source for energy shifting, energy optimization, and so
on.
 Simultaneously measure, record, and analyze numerous signals such as:
o AC voltage and power at POI (point of interconnection)
o DC voltage of battery string
o Different temperatures of the system
o Positions of switching devices
o Numerous information about the battery provided by the BMS

Battery Management System (BMS)


The BMS is composed of several controllers coordinated to command, protect, and monitor the
battery, ensuring maximum longevity and performance of the battery cells.
 Supervision of cell voltage
 Supervision of module temperatures
 Calculation of SOC value (state of charge)
 Calculation of SOH value (state of health)
 Balancing between the modules
 Assignment of warning and alarm messages in fault cases (fire, overcurrent, and so on)


 Disconnection of batteries from inverters in fault cases

The BMS also tracks and flags the safety sensors and reports the state of charge (SoC) and the state
of health (SoH) of the storage battery cells. In case of a fault, the BMS sends corresponding warning
and/or alarm messages to the BESS control unit, which reacts accordingly, for instance by
disconnecting the battery racks.
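
As an illustration of one of the BMS functions listed above, the sketch below estimates state of charge by simple coulomb counting. The capacity, efficiency, and current profile are assumed values; production BMS algorithms additionally correct SoC with voltage-based and model-based methods.

CAPACITY_AH = 120.0          # assumed rack capacity
EFFICIENCY  = 0.98           # assumed charging efficiency

def update_soc(soc, current_a, dt_h):
    """current_a > 0 means charging; SoC is kept within [0, 1]."""
    delta_ah = current_a * dt_h * (EFFICIENCY if current_a > 0 else 1.0)
    return min(1.0, max(0.0, soc + delta_ah / CAPACITY_AH))

soc = 0.50
for current_a in (60.0, 60.0, -30.0):    # two hours charging, one hour discharging (A)
    soc = update_soc(soc, current_a, dt_h=1.0)
print(f"SoC = {soc:.0%}")                # 75% after the assumed profile
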

Storage Control Unit (SCU)


The SCU usually controls and coordinates the different inverters according to the operation mode and
its activated control mode. Additionally, the system controls various parameters. Start conditions,
availability, and operating parameters of each inverter are reported to the SMS.

Table 2-6: Example of the minimum required BESS signals for an EMS (SICAM microgrid control)

Battery Name | Signal Type | Description | Value | Unit
BAT1 | DPI (DoublePointIndication) | Status | on / off | N/A
BAT1 | DPI | Status | ready / failure | N/A
BAT1 | SPI (SinglePointIndication) | Operating Mode | Grid forming / Grid supporting | N/A
BAT1 | CO (Command) | Operating Mode | Grid forming / Grid supporting | N/A
BAT1 | CO | Status | on / off | N/A
BAT1 | AI (AnalogInput) | Active Power | – | kW
BAT1 | AI | Reactive Power | – | kvar
BAT1 | AI | State of Charge | – | %
BAT1 | AO (AnalogOutput) | Active Power Setpoint | – | kW
BAT1 | AO | Reactive Power Setpoint | – | kvar

2.2.5 New sensors for equipment monitoring


A new suite of sensors has been developed to aid utilities in addressing issues with aging transmission
infrastructure, increased utilization of existing assets, and optimized maintenance and asset
management. Data from those sensors, along with the associated communication and data-integration
infrastructure, opens the possibility to enrich and expand the scope and use of a large variety of
analytics applications. This new sensor suite includes [10]:
 Conductor – Temperature and Current Sensor: This sensor is used to capture the temperature
and current magnitude of overhead transmission conductors. It uses wireless communication to
transmit the data for rating applications.
 Overhead Insulator Leakage Current Sensor: This sensor measures the level of leakage current
of insulators. The main purposes are to aid in determining the right time to wash insulators and
to detect insulators at high risk of flashover.
 Shield Wire – Fault Current Magnitude and Location: These sensors measure the time and
magnitude of the fault current that flows through shield wires. The main use is fault
identification.
 Shield Wire – Lightning Sensor: This sensor measures the peak magnitude and time of lightning
current flowing in the shield wires. It can be employed to validate lightning location.
 Transmission Line Surge Arrester (TLSA) RF Sensor: This sensor captures the total number of
events and charges seen by an arrester. This data can be used to estimate life expectancy.


 Overhead Transmission Structure Sensor System: This system fuses RF sensors with image
processing and environmental data. The data is wirelessly communicated in real time with built-
in alarming functions. The system is used to address outages.

2.3 NON-ELECTRICAL DATA SOURCES (EXTERNAL DATA)


One of the areas where a variety of non-electrical data, or data external to the electric system, is used
for power system analytics is electric load forecasting. Data sources used for this purpose
include electricity demand data from SCADA, weather history, forecasts from weather service vendors,
economy history, forecasts from economic analysis firms, end-use information from surveys, industry
codes, equipment locations, land-use information from GIS, and urban-development plans from local
governments [1]. Data from other sources such as outage information from OMS, logs of demand-
response activities, and records of past and ongoing energy efficiency programs has also been used to
increase forecasting accuracy. Other nonconventional data sources have been also used for load
forecasting. For instance, satellite images are used for spatial load forecasting to track historical
development of cities. High-density weather stations are being used for forecasting rooftop solar
generation and electricity demand in micro-climate zones. Cameras are also being installed around
local farms to capture the local cloud movement [11].
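
A minimal sketch of combining load history with an external weather feed for load forecasting is given below; the data values and the simple linear regression are illustrative assumptions and do not represent a recommended forecasting method.

import numpy as np

# Assumed hourly history: ambient temperature (deg C) and measured system load (MW).
temperature = np.array([18, 21, 25, 29, 31, 27, 22, 19], dtype=float)
load_mw     = np.array([540, 560, 610, 680, 720, 650, 580, 550], dtype=float)

# Fit a simple linear model load ~ a * temperature + b on the history.
A = np.vstack([temperature, np.ones_like(temperature)]).T
coef, *_ = np.linalg.lstsq(A, load_mw, rcond=None)

forecast_temp_c = 33.0                    # assumed temperature from a weather service vendor
forecast_load = coef[0] * forecast_temp_c + coef[1]
print(f"expected load at {forecast_temp_c} degC: {forecast_load:.0f} MW")
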
In recent years, the use of non-electrical data in applications that analyze power system data has seen
increased popularity. One of the areas where external data provides very
valuable information is on the forecasting and mitigation of extreme events. For instance, the
resiliency of the power system can be improved by predicting the impacts of weather-related outages
by utilizing a variety of weather data and public data sources.
Different non-electrical data from various sources are briefly described below:
Weather Data [11][12]: Depending on the characteristics and capacity, weather stations provide
some or all of the following data:
 Lightning characteristics
 Air and soil temperature
 Wind speed and direction
 Precipitation
 Fog, frost, ice, snow, sleet, and blizzard
 Snow-water equivalent
 Air relative humidity
 Solar radiation
 Pan evaporation
Weather data, both historical and in real time, can feed analytics to provide useful insight for various
types of nature-induced disturbances. For instance, lightning is an important cause of interruptions or
damage in almost every electrical system exposed to thunderstorms. The problem is severe mainly for
electric utilities that have exposed assets covering large areas. In areas with a high probability of
lightning, cloud-to-ground (CG) lightning is the single largest cause of transients, faults, and outages
in power transmission systems. Different systems that are in use to detect lightning are:
• Gated wide-band magnetic direction finders (DFs)
• Time-of-arrival (ToA) sensors
• ToA methods operating at higher frequencies
• Interferometric methods
In the U.S., the National Lightning Detection Network (NLDN) gives utilities lightning warnings in real
time and information on whether CG strokes are the root of faults, documenting the response of fixed
assets exposed to lightning and quantifying the effectiveness of lightning protection systems [14]. The
NLDN uses different kinds of lightning sensors located sparsely over the area of about four million
square miles of the U.S. territory. Reference [11] provides a comprehensive list of weather event data
sources, with the name, a short description, and the URL where each database can be accessed.


Geospatial Data: Geospatial information systems (GIS) have been used by utilities for years;
nevertheless, the types of data available have increased, which leads to new applications of geospatial
data visualization. Weather event data integrated with geospatial information can be applied to
advanced power system analytics such as predictive modeling, real-time forecasting, and post-event
analysis [10].
Customer Data: Social media data can help the utilities to improve engagement with customers as
well as manage outages during major storm events. This data is a prime example of how a
combination of utility operation data, weather data, and customer data could be integrated to provide
better preparedness in outage management. Moreover, customers can see outage information, as well
as receive news (educational and operational) from their utility oftentimes via a mobile application
[10].
Geographic Information System (GIS): The GIS data includes two types of data: spatial and
attribute. The spatial data presents the absolute and relative location of geographic features, for
instance the coordinates of the location where a substation is situated. The key to the effective use of
this data in
power system applications is the combination of GIS and GPS with the model of a power system. GPS
provides time references that can be applied to synchronize all events. Most digital measurement
devices such as PMUs, traveling wave fault locators, and lightning detectors and locators have
integrated GPS units to send precise time stamps with measured data. The GIS model of a system can
be correlated with an electrical model, providing a more enhanced geographical characterization of a
system [1][15].
2.4 COMMUNICATION REQUIREMENTS FOR SMART GRID DATA
Network reliability and coverage, bandwidth, packet jitter, and latency requirements are the most
critical issues when developing the technical requirements for the power system. For example, the
communications network needs to provide real-time, low latency capabilities for applications such as
centralized remedial action schemes (CRAS), tele-protection (less than 10 ms), transmission,
substation SCADA and VoIP applications (100 to 200 ms), phasor measurement (about 20 ms), and
load-control signaling. These requirements drive the need for high-speed fiber optic and/or microwave
communications to support those capabilities. On the other hand, applications such as automatic
meter reading (up to a few seconds) and data beyond SCADA, which are more latency-tolerant, could
use communications technologies such as unlicensed wireless mesh, broadband wireless, licensed
wireless, and satellite. Future trends and applications in generation, transmission, and distribution
systems present different classes of requirements and challenges (general communication requirements
in power system applications are provided in Table 2-7).
Table 2-7: General requirements of communication in power systems

Requirement | Description | Example
Performance (bandwidth, latency, and payload) | Data transmission in the power system needs different network performance and data requirements. | Substation automation GOOSE applications require low-latency communications, with latency budgets in the order of milliseconds, while a conservation voltage reduction (CVR) application has a latency expectation of seconds.
Coverage | For a wide-area power system, selecting one or more communication technologies must be done after thorough analysis of their characteristics, cost, and other associated operational challenges. | Rural areas have poor cellular coverage, while metropolitan areas are deploying high-speed mobile 4G/LTE technologies.
Cost | Different communication technologies have different cost structures. | Private communication networks are CapEx-intensive with low OpEx, while a service-provider-based public solution such as cellular or satellite requires higher OpEx with lower upfront CapEx.
Life Time | A layered networking architecture ensures integration of innovations over the expected lifetime of the deployment. Over the next few years, newer protocols such as IEC 61850 and beyond are expected to be prevalent. | Field infrastructures are deployed with an average lifetime of 15 to 20 years, which may appear incompatible with the pace of evolution in data communications.
Data Gathering | The number of devices, the amount of data, and the frequency of communications with the devices must be considered, along with the acceptable latency and required bandwidth for every type of data. | Data collection from many sources is necessary: on the power grid (such as sensors, meters, and voltage detection), in the customer premises (such as sensors for high-consuming appliances), and from external sources (such as weather).
Security | The power system is vulnerable to cyber-attacks. | Additional traffic on the network and bandwidth consumption.

In summary, power system communications require the following:


 Security
 Bandwidth
 Reliability
 Coverage
 Latency
 Backup
However, each area of communications has a different level of requirements. Applications such as
tele-protection circuits and C-RAS fault detection and fast switching to avert transmission grid failure
do not need high bandwidth, but they do require extremely low-latency communication, from 3 to 8
milliseconds. A field area network (FAN), on the other hand, requires wide geographical coverage and
low to medium bandwidth. Table 2-8 presents a more detailed mapping of networks and associated
communication requirements in power systems.
Table 2-8: Networks and associated communication requirements

Categories | Throughput | Latency | Burstiness*
Inter-Utility Network | 10–100 Mbps | < 50 ms now; < 8 ms in the future | High
High-Speed Backbone Network | ~3.3 Gbps | < 150 ms | High
Tele-Protection and Other Low-Latency Network | < 1 Mbps | < 8 ms | High
Substation Bus Network | 10–20 Mbps | < 8 ms | High
Field-Area Communication | 1 Mbps downstream / 384 Kbps upstream; total > 384 Mbps | < 150 ms | Medium
Premise Area Network | 4 Gbps | < 50 ms | Medium
*Burstiness is a measure of the variability of traffic (i.e. the peaks and lows).

Other important applications and their related communication requirements in modern power systems
are provided in Table 2-9.
Table 2-9: Communication requirements in terms of latency and data time window
State Estimation: data from all substations to the control center; latency requirement 1 sec; data time window instant.
Transient Stability: data from generating substations to the application server; latency requirement 100 ms; data time window 10–50 cycles (167 ms – 830 ms).
Small Signal Stability: data from some key locations to the application server; latency requirement 1 sec; data time window minutes.
Voltage Stability: data from some key locations to the application server; latency requirement 1–5 sec; data time window minutes.
Post-Mortem Analysis: all PMU and digital fault recorder data to the historian; no latency requirement (NA); data time window instant and event data.

Several smart grid applications have already been developed, and some are in the process of
development as a future trend in power systems. To understand their communication needs, a brief


and qualitative survey of some of the most important applications in terms of their data requirements
and latency is presented in Figure 2-2 and Table 2-10.

Figure 2-2: Requirements of a smart grid network

Table 2-10: Network requirements for smart grid applications
Smart Metering: data rate/volume low/very low; latency allowance (one-way) high; reliability medium.
Inter-Site Rapid Response: data rate/volume high/low; latency allowance very low; reliability very high.
SCADA: data rate/volume medium/low; latency allowance low; reliability high.
Operations Data: data rate/volume medium/low; latency allowance low; reliability high.
Distribution Automation: data rate/volume low/low; latency allowance low; reliability high.
Distributed Energy Management & Control: data rate/volume medium/low; latency allowance low; reliability high.
Video Surveillance: data rate/volume high/medium; latency allowance medium; reliability high.
Mobile Workforce: data rate/volume low/low; latency allowance low; reliability high.
Corporate Data: data rate/volume medium/low; latency allowance medium; reliability medium.
Corporate Voice: data rate/volume low/very low; latency allowance low; reliability high.

To meet all the performance, coverage, cost, and lifecycle requirements of the network, utilities
require a combination of multiple communication technologies, because no single communication
technology can meet all of their requirements. The dynamic nature and wide range of communication
technologies available today provide power systems with numerous options. However, this also
creates the challenge of choosing the appropriate technology and networking architecture. The
specific technology supporting each particular application varies based on factors such as bandwidth,


latency, and reliability. Table 2-11 lists some of the modern power system applications and the
associated communication technologies that may be employed for each application.
Table 2-11: Technology supporting each particular application (L – Low, M – Medium, H – High)
High-speed Backbone: bandwidth H; latency milliseconds to seconds; reliability H; technology options: optical transport (DWDM, SONET), MPLS and IP-based fabric.
Inter-utility Area Network: bandwidth H; latency milliseconds to seconds; reliability M; technology options: wired and wireless carrier/utility-company-owned wireless networks, satellite, microwave.
Phasor Measurements: bandwidth H; latency milliseconds; reliability H; technology options: fiber optic, microwave, broadband wireless.
Tele-Protection Network: bandwidth L; latency milliseconds; reliability H; technology options: IEC 61850, hardened routers/switches.
Remedial Action Scheme: bandwidth L; latency milliseconds; reliability H; technology options: fiber optic, microwave.
Centralized Remedial Action Scheme: bandwidth H; latency milliseconds; reliability H; technology options: fiber optic, microwave.
Protective Relaying: bandwidth L; latency milliseconds; reliability H; technology options: fiber optic, microwave, low-latency wireless, copper.
Substation LAN: bandwidth L; latency milliseconds; reliability H; technology options: IEC 61850, hardened routers/switches.
Transmission and Substation SCADA: bandwidth M; latency milliseconds to seconds; reliability H; technology options: IP-based fiber optic, microwave, copper lines, satellite.
Field Area Network: bandwidth M; latency seconds to hours; reliability M; technology options: wired and wireless carrier/utility-company-owned wireless networks, satellite, microwave.
T&D Crew of the Future: bandwidth H; latency seconds; reliability H; technology options: broadband wireless.
Outage Detection: bandwidth L; latency minutes; reliability H; technology options: fiber optic, microwave, broadband wireless, unlicensed wireless mesh.
Distribution Automation (routine monitoring): bandwidth L; latency minutes; reliability M; technology options: microwave, satellite, unlicensed wireless mesh.
Distribution Automation (critical monitoring and control): bandwidth L; latency seconds; reliability H; technology options: microwave, satellite, unlicensed wireless mesh.
Distributed Generation monitoring: bandwidth L; latency seconds; reliability H; technology options: microwave, satellite.
Distributed Generation control: bandwidth L; latency seconds; reliability H; technology options: microwave, satellite.
Advanced Metering (meter reading, disconnect, communication to HAN): bandwidth M; latency seconds to minutes; reliability M; technology options: unlicensed wireless mesh, PLC, Zigbee.
Data Beyond SCADA: bandwidth M; latency minutes to hours; reliability M; technology options: microwave, broadband wireless, satellite.
Outage Detection (through fault indicators, protection systems, or advanced meters): bandwidth L; latency minutes; reliability H; technology options: fiber optic, microwave, broadband wireless, unlicensed wireless mesh.
Premise Area Network: bandwidth H; latency seconds to minutes; reliability M; technology options: wired and carrier-owned/utility-company-owned wireless networks, satellite, microwave.
Dynamic Pricing: bandwidth L; latency minutes; reliability M; technology options: Internet, ZigBee.
Plug-in Electric Vehicle: bandwidth L; latency minutes; reliability M; technology options: Zigbee, PLC.
Demand Response: bandwidth L; latency minutes; reliability H; technology options: Zigbee, PLC, paging systems.
Home Area Network Interface: bandwidth L; latency minutes; reliability M; technology options: wired or wireless broadband, Zigbee.
* Source: Southern California Edison.


Different modern technologies can be used to improve the functionality of power systems and to
address the associated problems and challenges. There is not yet a finalized architecture for the future
power system communication infrastructure. However, the following technology options are the most
promising at this point:

Table 2-12: Communication technology options
High-speed Backbone: multi-protocol label switching (MPLS); dense wave division multiplexing (DWDM); end-to-end IP-based fabric; continued use of advanced fiber optic, microwave, and satellite networks.
Migration from IPv4 to IPv6: to address the need for connecting millions of end points.
Substation LAN: the IEC 61850 protocol to transform substation communications networks from serial (i.e. SCADA RTU) to IP-based communications using IEC 61850-compliant IEDs and utility-grade rugged IP routers; hardened and advanced routers and other networking equipment with scalable architectures to enable reliable and secure two-way communication between substation SCADA equipment and the EMS.
2.5 REFERENCES
[1]. Advanced Data Analytics Techniques: Analysis and Applications for Power System Operation and
Planning Support. EPRI, Palo Alto, CA: 2015. 3002007076
[2]. M. Kezunovic, L. Xie, S. Grijalva, P. Chau, and et al, Systematic Integration of Large Datasets for
Improved Decision-Making, PSERC 2015.
[3]. Substation Data Integration and Analysis: Study Report. EPRI, Palo Alto, CA: 2011. 1019916
[4]. J. Perez, “A guide to digital fault recording event analysis,” in 63rd Annual Conference for
Protective Relay Engineers, 2010, pp. 1-17.
[5]. S. Santoso, and D. D. Sabin, “Power quality data analytics: Tracking, interpreting, and predicting
performance,” in IEEE Power and Energy Society General Meeting, 2012, pp. 1-7.
[6]. W. Strang et al., “Considerations for Use of Disturbance Recorders,” System Protection
Subcommittee of the Power System Relaying Committee of the IEEE Power Engineering Society,
2006.
[7]. "Next-generation power quality meters," 2015; Available online
[8]. W. Xu. "Working Group on Power Quality Data Analytics Objective & Scope," 2015;
http://grouper.ieee.org/groups/td/pq/data/downloads/PQDA-Objective-and-Scope.pdf.
[9]. "PQView," 2015; http://www.pqview.com/.
[10]. Sensor Technologies for a Smart Transmission System, EPRI, 2009.
[11]. Integration of Internal and External Data Sources to Support Transmission Operations, Planning,
and Maintenance, EPRI, 2014.
[12]. M. Kezunovic, L. Xie, S. Grijalva, P. Chau, and et al, Systematic Integration of Large Datasets for
Improved Decision-Making, PSERC 2015.
[13]. P.-C. Chen, T. Dokic, and M. Kezunovic, “The Use of Big Data for Outage Management in
Distribution Systems,” in Int. Conf. on Electricity Distrib. (CIRED) Workshop, Rome, 2014.
[14]. K. L. Cummins, E. P. Krider, and M. D. Malone, “The US National Lightning Detection
Network™ and applications of cloud-to-ground lightning data by electric power utilities,” IEEE
Trans. Electromagnetic Compatibility, vol. 40, no. 4, pp. 465-480, 1998.
[15]. P.-C. Chen, T. Dokic, and M. Kezunovic, “The Use of Big Data for Outage Management in
Distribution Systems,” in Int. Conf. on Electricity Distrib. (CIRED) Workshop, Rome, 2014
[16]. https://www.nrel.gov/docs/fy17osti/67553.pdf
[17]. http://www.te.com/content/dam/te-com/documents/sensors/global/TE_SensorSolutions_Wind-
Turbines.pdf


3. DATA-ANALYTICS TECHNIQUES
Information management in companies is becoming a highly relevant process. The goal is to
discover knowledge from the raw data generated during the operation of processes. Traditionally, data
was used for process control; sometimes it was processed into graphs showing what was going on
with the process (situational awareness). Now decision-makers want the data to be transformed
into useful information for decision-making (decision support).
In the 1970s and 1980s, decision-support applications such as administrative information systems,
predictive analytics, and online analytical processing (OLAP) emerged and expanded the
decision-support system domain. In the early 1990s, business intelligence (BI) played a pivotal role in
increasing the value and performance of the enterprise. As a technology-driven process, BI helped
corporate users make critical decisions by analyzing data and presenting information. BI involves a
variety of tools, applications, and methodologies that collect data, prepare it for analysis, and run
queries; the analytical results, such as reports, dashboards, and data visualizations, are then made
available to decision-makers.
The natural evolution of BI is data analytics (DA), or advanced analytics. There is no unique definition
of the term “advanced analytics,” but it usually refers to tools based on predictive analytics,
data mining, statistical analysis, digital signal processing, artificial intelligence, natural language
processing, and other mathematical processes that attempt to recognize and validate data patterns
and trends and draw conclusions from them. Data-analysis techniques can be combined with other
analytical disciplines, such as descriptive modelling and decision modelling or optimization, with the
main objective of supporting better decisions. Many of these analytics techniques appeared
in the 1990s. Today, the data sets are significantly larger than before, and most of these techniques
adapt well even with minimal data preparation.
By using advanced analytics, utilities can study electricity usage data to understand and learn the
state of the load and operations, and customer behavior. The advanced analytics can help to discover
knowledge and facts that benefit business. By examining large volumes of data with details, useful
information from hidden patterns and unknown correlations can be extracted to make better
enterprise decisions.
Data-analytics techniques have been applied across many industries, but the energy and utility sector
lags behind other industries in terms of actual implementation. However, some implementations of
analytics techniques used in the utility industry (EPRI, Jan 28, 2016) already show promising results.
In order to enable secure, reliable, and interoperable operation of the power grid, an information-based
framework is to be integrated into the electrical transmission grid. A large and heterogeneous
collection of data from a multitude of measurements, statuses, and third-party sources in various
formats is used in constructing this framework. Data analytics is able to identify hidden patterns in
these data, predict prospective outcomes, and recommend appropriate decisions. Visualizing the
current situation as well as future situations and scenarios can help increase the situational
awareness of operators; such visualization has to work in concert with data analytics. There is no
unique classification of advanced analytics techniques, but each technique can contribute to data
analytics for modern power system operation, especially for situational awareness. In this brochure,
the advanced data-analytics techniques are divided into six categories:
1. Data mining and Association Rules
2. k-Nearest Neighbor
3. Supervised Learning and Unsupervised Learning
4. Probabilistic Networks
5. Deep Learning
6. Visual Analytics
These six well-known categories are described in this section. These data- and visual-analytics
techniques can be applied to both real-time data and online and offline simulation data of electrical
transmission grids to prepare short- or long-term scenarios and models. Therefore, operators can
increase their situational awareness by visualizing the resulting important information.


3.1 DATA MINING AND ASSOCIATION RULES


3.1.1 Brief definition
Data mining is a process of analyzing data from various perspectives and transforming it into useful
information. Alternatively, it can be defined as the process of selecting and exploring data and
building models using vast data stores to uncover previously unknown patterns, and then using that
information to build predictive models.
Data are any facts, numbers, or text that can be processed by a computer. Today, organizations are
accumulating vast and growing amounts of data in different formats and different databases. Such
data includes:
 Operational or transactional data, such as sales, cost, inventory, payroll, and accounting.
 Nonoperational data, such as industry sales, forecast data, and macro-economic data.
 Metadata, which is data about the data itself, such as logical database design or data
dictionary definitions.
The patterns, associations, or relationships among all this data can provide useful information.
Information can be converted into knowledge about historical patterns and future trends.
3.1.2 Technical description
Generally, any of four types of relationships are sought in data mining:
 Classification (Classes): Stored data is used to locate data in predetermined groups. For
example, a restaurant chain could mine customer purchase data to determine when
customers visit and what they typically order. This information could be used to increase
traffic by having daily specials.
 Clustering (Clusters): Data items are grouped according to logical relationships or consumer
preferences. For example, data can be mined to identify market segments or consumer
affinities.
 Associations: Data can be mined to identify associations.
 Sequential patterns: Data is mined to anticipate behavior patterns and trends and detect
deviations (find anomalies). For example, an outdoor equipment retailer could predict the
likelihood of a backpack being purchased based on a consumer’s purchase of sleeping bags
and hiking shoes.
Data mining consists of five major elements:
 Extract, transform and load transaction data into the data warehouse system.
 Store and manage the data in a multidimensional database system.
 Provide data access to business analysts and information technology professionals.
 Analyze the data by application software.
 Present the data in a useful format, such as a graph or table.
3.1.3 Application domains
Data mining is primarily used today by companies with a strong focus on the consumer, such as in
retail, financial, communication, and marketing organizations. It enables these companies to
determine relationships among “internal” factors such as price, product positioning, or staff skills, and
“external” factors such as economic indicators, competition, and customer demographics. And, it
enables them to determine the impact on sales, customer satisfaction, and corporate profits. Finally, it
allows them to drill down into summary information to view detailed transactional data. With data
mining, a retailer could use point-of-sale records of customer purchases to send targeted promotions
based on an individual’s purchase history. By mining demographic data from comments or warranty
cards, the retailer could develop products and promotions to appeal to specific customer segments.
For example, Netflix mines its database of video rental history to recommend rentals to individual
customers. American Express can suggest products to its cardholders based on analysis of their
monthly expenditures. WalMart is pioneering massive data mining to transform its supplier


relationships. It uses this information to manage local store inventory and identify new merchandizing
opportunities.
3.1.4 Potential applications
Some examples of data-mining applications for electric power utilities are customer relationship
management (CRM) to track behavior; power plant maintenance; electrical transmission grid planning
(Chen, Onwuachumba, Musavi, & Lerley, 2017) and operation; human resource management; fraud
detection; and finding anomalies. See Section 4 for a more detailed description of applications of data
mining in the power industry.
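To make the association-rule idea from this section concrete, the following minimal Python sketch computes support and confidence for simple candidate rules over a small, entirely hypothetical set of substation event "transactions"; the event names and thresholds are invented for illustration only and do not come from any utility dataset.

from itertools import combinations

# Hypothetical event "transactions": sets of events observed in the same time window.
transactions = [
    {"breaker_trip", "voltage_sag", "lightning"},
    {"breaker_trip", "voltage_sag"},
    {"voltage_sag", "capacitor_switching"},
    {"breaker_trip", "voltage_sag", "lightning"},
    {"capacitor_switching"},
]

def support(itemset, data):
    """Fraction of transactions containing every item in the itemset."""
    return sum(itemset <= t for t in data) / len(data)

min_support, min_confidence = 0.4, 0.7
items = set().union(*transactions)

# Enumerate simple one-antecedent/one-consequent rules (A -> B) and keep the strong ones.
for a, b in combinations(sorted(items), 2):
    for antecedent, consequent in [({a}, {b}), ({b}, {a})]:
        supp = support(antecedent | consequent, transactions)
        conf = supp / support(antecedent, transactions)
        if supp >= min_support and conf >= min_confidence:
            print(f"{antecedent} -> {consequent}: support={supp:.2f}, confidence={conf:.2f}")

In practice, dedicated association-rule miners (e.g. Apriori-style algorithms) handle much larger transaction sets, but the support/confidence logic is the same as in this toy sketch.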
3.2 K-NEAREST NEIGHBOR
3.2.1 Brief definition
The k-nearest neighbor (k-NN), which is also referred to as lazy learning, case-based reasoning, and
instance-based learning, is a well-established classification method that is based on closest training
sets in the feature space. The main idea of k-NN is that a sample's category is decided by its k most
similar samples: the sample falls into the category that contains the largest number of its k most
similar samples.
The k-NN algorithm is among the simplest of all machine-learning algorithms. The k-NN is a
nonparametric learning algorithm because it does not make any assumptions on the underlying data
distribution. This feature is very advantageous because most of the practical data do not obey the
common theoretical assumptions in the real world. Another feature of k-NN is that it is highly adaptive
to local information. A k-NN algorithm utilizes the closest data points for estimation; it is capable of
taking full advantage of local information and forming highly nonlinear and adaptive decision boundaries
for each data point.
3.2.2 Technical description
k-NN compares a test object with the group of k training objects that are closest to it and assigns the
most frequent class in that neighborhood. Three essential elements are included in this process:
 A set of labeled objects (e.g., a set of stored records (data)).
 A distance measurement or a similarity metric.
 The number of nearest neighbors, the value of k.
Once an unlabeled object is provided, the distance of this object to the labeled objects is computed.
Based on the data, k-nearest neighbors are identified, and the class labels of the nearest neighbors
are utilized to determine the class for this unlabeled object. Multiple training and testing sets with
random data from different sets could mitigate bias presented by noise or irrelevant data and thus
improve the performance of k-NN.
3.2.3 Application domains
k-NN is commonly applied to solving classification problems. Offline analysis helps to generate rules
for different data classes, and online analysis could initiate decision trees for classification purposes.
3.2.4 Potential applications in smart grid
One form of such classification is used for classifying historical load consumption data into three
different classes (iTesla (Innovative Tools for Electrical System Security within Large Area), July 29,
2013). The classification is based on training and testing load consumption data, and the training class
is prepared based on the cumulative distribution of the target load.
Another application of the k-NN algorithm is the classification of abnormal data from a PMU. The example
in Figure 3-1 shows that k-NN is trained with phase-angle-difference data, which defines abnormality.
If the test data contains an abnormal phenomenon, k-NN can detect this phenomenon based on the
training provided. This is one of the online data-mining applications for PMUs; such
classification can be used to validate the PMU data.


Figure 3-1: k-NN classification of abnormal PMU data
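As a minimal sketch of the PMU-data classification idea illustrated in Figure 3-1 (the phase-angle values, rate-of-change feature, and class boundaries below are synthetic and purely illustrative; scikit-learn is used here only as one possible tool), a k-NN classifier could be trained and applied as follows:

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical training data: each row is [phase-angle difference (deg), rate of change (deg/s)],
# labeled 0 = normal, 1 = abnormal. A real application would use validated historical PMU records.
X_train = np.array([[0.5, 0.01], [0.7, 0.02], [0.6, 0.00],   # normal
                    [5.2, 0.90], [6.1, 1.10], [4.8, 0.75]])  # abnormal
y_train = np.array([0, 0, 0, 1, 1, 1])

# Train a k-NN classifier with k = 3 neighbors.
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)

# Classify new streaming PMU samples.
X_new = np.array([[0.4, 0.01], [5.5, 0.95]])
print(knn.predict(X_new))  # expected for this synthetic data: [0 1] (normal, abnormal)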

3.3 MACHINE LEARNING


3.3.1 Supervised and unsupervised learning
Although the machine learning taxonomy is extensive, the most classical setup establishes two types
of machine learning: supervised (SL) and unsupervised (UL). The former infers a model by relating an
output y (also called labels) with one or more inputs (i.e. features) x, such that y = f(x). The feature
vector is denoted in bold to specify that it is composed of several features (i.e. n-dimensional). Labels
can be categories (e.g., a failure report on electric infrastructure is true or false), in which case the
solved problem is called classification, or they can be continuous values (e.g., the daily power demand
on a specific location), in which case the problem is called regression. Further, it is called supervised
because we want to model a specific relation, the one that is given explicitly between y and x.
On the other hand, unsupervised learning tackles problems related to building probabilistic models
from unlabeled data. The goal is to discover hidden patterns within data in the form of hierarchies or
groups. These patterns are obtained by making use of the statistical structure of the provided data.
However, assessing unsupervised learning performance is difficult because the true patterns and
probability distributions of the data sources are unknown.
The most basic steps to train a learning model are:
1) Information gathering and pre-processing
2) Model training
3) Model testing
In the first step, information is obtained and processed to reduce noise and to support the model
performance and the assumptions made about the modelled process. The next step consists of iteratively
estimating models by reducing the fitting error. The final step is different for SL and UL: in the
former, the model is tested on held-out (unseen) data; in the latter, the discovered patterns are subject to
additional analysis to extract information. Figure 3-2 succinctly presents a diagram of supervised and
unsupervised learning.


Figure 3-2: Supervised learning (upper rectangle) and unsupervised learning (lower rectangle)

Observe that while both SL and UL fit a model to some data according to a parameter vector, the
final objective is different. The former uses a model to relate a dependent variable with its
explanatory features, as detailed by the labeled data; then, the model is used to predict unknown
data. The latter uses unlabeled data; consequently, the true relation between the data is unknown
beforehand. Thus, the data is grouped, and then the resulting patterns are evaluated.
The remainder of this section presents machine-learning techniques and applications used for power
systems. Such applications are neither an extensive list nor necessarily the best solutions. Rather, this
list aims to show how ML techniques can be used to obtain robust regression models or to extract
valuable information about the application problems. First, supervised algorithms as linear regression
(LR), decision trees (DTs), artificial neural networks (ANN), and support vector machines (SVM) are
presented. Then, an unsupervised learning technique called K-Means is discussed. Additionally, other
clustering methods are mentioned, and references are provided where needed.
Formally, a generic SL problem can be stated as follows:
Given a dataset of the form {(x_i, y_i)}_{i=1}^{n}, where X ⊆ ℝ^N is an N-dimensional space of
features and Y is the corresponding response space, we are asked to estimate the relation
y_i = f(x_i | θ), where θ are the function's parameters. In classification, the response variable is binary,
Y ∈ {±1}, whereas in regression the response is continuous, Y ∈ ℝ. For instance, the problem of
forecasting a generator's failure (given measurements of humidity, vibrations, thermal energy, gases,
and aging) is a classification problem (i.e. it fails or not), whereas the prediction of the daily wind power
generation of a wind farm is a regression problem. It is worth mentioning that the relation Y ~ X is
estimated by minimizing an error criterion that ensures that the inferred function generalizes as
accurately as possible to the true underlying process.
3.3.2 Linear regression
3.3.2.1 Brief definition and technical description
The linear regression model is one of the oldest, most renowned, and most used models for statistical
and ML applications (James, Witten, Hastie, & Tibshirani, 2013; Hastie, Tibshirani, & Friedman, 2009).
This model is simple and leads to robust solutions. It is readily interpretable by non-expert users and is
easy to code. Nonetheless, linear regression makes some naive assumptions about the modelled
process (e.g. that the process can be approximated by a linear combination of variables, that deviations
from the model obey a normal distribution, and so on), assumptions that are rarely fully met by real-world problems.


However, even though a large list of newer and more robust algorithms exists, LR still remains the
workhorse of several industries. Among the most important characteristics of LR is its high
interpretability (i.e. we know how much the dependent variable will change with respect to each feature),
while additional analysis can be performed using the trained model itself (e.g. feature ranking). For
instance, by using LR in a power generation forecasting application, we can know how each of the
measured variables (e.g. humidity, rain, mean transformer losses) impacts the power generation output
in electrical power units.
Further, LR has been subject to several extensions to enhance its robustness and precision (Rao,
Toutenburg, Shalabh, & Heumann, 2008). Even more, with the advent of big data technologies, LR has
regained popularity and predictive power (Ma & Sun, 2015; Ma & Cheng, 2016). It is worth noting that,
in literature, LR usually refers to a model with only one explanatory variable, whereas multiple linear
regression (MLR) refers to a model with two or more explanatory variables. In this work, we refer to
both as LR. Colloquially, an LR model can be understood as RESPONSE = FIT + RESIDUAL. In this
expression, RESPONSE stands for the variable of interest; the FIT term represents a linear combination
(a summation) of measured features related to the response; the RESIDUAL term represents an
unpredictable error/noise of the observed values with respect to the model’s prediction. For illustrative
purposes, we will first introduce a one-variable LR model:
Y = f(X) → y_i = β_0 + β_1 x_i + ε.
The former is a line equation where β_0 stands for the intercept (i.e. the expected value of Y when X =
0), β_1 for the line's slope (the average increase in Y with a one-unit increase in X), and ε is the
irreducible error or noise in the model (James, Witten, Hastie, & Tibshirani, 2013; Hastie,
Tibshirani, & Friedman, 2009; Shalev-Shwartz & Ben-David, 2014). The β coefficients correspond to the
weights assigned to each variable and are the parameters of the model. Then, the problem is reduced to
finding β_0 and β_1 such that the difference between the sample data labels Y and the predicted labels Ŷ is minimized.
For two or more variables, the LR model is simply defined as:
y_i = β_0 + β_1 x_{i,1} + ⋯ + β_N x_{i,N} + ε = β_0 + Σ_{j=1}^{N} β_j x_{i,j} + ε,   (1)
where the vector β stands for all the weights of the LR model. Consequently, the training of Equation
(1) is reduced to finding β such that the resulting hyperplane (i.e. the generalization of a line to more
than two dimensions) is as close as possible to the data points. So far, we have neglected how to find the parameters of the
model. The former requires that LR minimize an error criterion, in its most basic setup the residual
sum of squares (RSS). Let ŷ_i be the prediction of the model for the sample point x_i, and e_i = y_i − ŷ_i
the error between the true and the forecasted values. Then, the RSS is defined as:
RSS = Σ_{i=1}^{n} e_i².

3.3.2.2 Application domains


LR applied to building energy consumption

Building energy consumption is the main component of worldwide consumption and carbon dioxide
emissions. Nowadays, LR-based models have been successfully proposed for predicting how much and
when energy will be consumed for single buildings (Asadi, Shams, & Mohammad, 2014) and building
blocks (Ma & Cheng, 2016). On the other hand, understanding the relation between building energy
consumption and its components is essential for developing adequate energy-management policies
(Hsu, 2015; Walter & Sohn, 2016; Chung, 2012). For instance, indoor comfort systems such as heating,
ventilation, and air-conditioning (HVAC) account for 65% of a building's energy consumption (Lam,
Wan, Liu, & Tsang, 2010). In this regard, a penalized LR has been used for automatic identification of
energy system components such as operational schedule, number of customers, lighting control,
employee behavior, and maintenance in commercial buildings (Hsu, 2015).
In another instance (Braun, Altan, & Beck, 2014), an LR model was proposed to predict a U.K.
supermarket's electricity and gas consumption. Given the particular conditions of a supermarket building
(i.e. large refrigerated shelves), it was found that climate-driven changes in relative humidity and temperature
are expected to increase the electricity consumption by 2.1%, whereas gas consumption will decrease by 13% (Braun,
Altan, & Beck, 2014).
LR applied to energy policies


Energy policies constitute laws and actions to address energy infrastructure development, production,
distribution, and consumption. One of such policies with more significant impact on energy consumption
is retrofitting, which improves energy consumption (e.g. lighting, indoor comfort) by replacing older
electrical components with newer ones. LR was employed to assess the cost-saving benefits of
retrofitting commercial and residential buildings (Walter & Sohn, 2016).
Similar results were obtained in the work by Huebner, Hamilton, Chalabi, Shipworth, & Oreszczyn in
2015. Using an LR model, they assessed energy consumption of a U.K. housing stock by categories of
predictors such as building variables, socio-demographics, heating behavior, and psychological factors.
They found that a building’s electrical components explain by far most of the variability in energy
consumption, thus supporting retrofitting policies (Huebner, Hamilton, Chalabi, Shipworth, & Oreszczyn,
2015). Furthermore, the construction of smart buildings and smart energy policies require building
energy consumption benchmarks. In this sense, expert knowledge and non-technical regulations need
to be integrated into benchmarks. Thus, in (Chung, 2012), a fuzzy-LR model was developed for
benchmarking building energy consumption, including expert knowledge.
LR applied to utility companies

Predicting energy load (Hong, Gui, Baran, & Lee, 2010), demand (Kandananond, 2011), and
consumption (Tso & Yau, 2007) plays an important role in decision-making and planning for utility
companies. For instance, the long-term load forecasting can be employed for transmission and
distribution (T&D) planning, whereas short-term forecasting can be used for demand-side management
(DSM). DSM is particularly important to reduce peak electricity demand while maximizing utility
generation capacity (Hong, Gui, Baran, & Lee, 2010). Another LR application for utility companies is
assessing the reliability and security of a power system (Halilcevic, Gubina, Strmcnik, & Gubina,
2006). In this sense, LR can be employed to identify the critical components of energy transmission in
power supply networks. By knowing this, utilities can perform better managerial actions such as power
reserving and transmission network reinforcement planning (Halilcevic, Gubina, Strmcnik, & Gubina,
2006).
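As a minimal illustrative sketch of fitting an LR model such as those discussed above (the building-energy data below is entirely synthetic, and the variable names are hypothetical), ordinary least squares, which minimizes the RSS criterion defined in Section 3.3.2.1, can be applied as follows:

import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical daily observations: [mean outdoor temperature (deg C), occupancy (persons)].
X = np.array([[10, 50], [15, 60], [20, 80], [25, 90], [30, 100], [35, 110]])
# Hypothetical building energy consumption (kWh) for each observation.
y = np.array([310, 340, 400, 430, 480, 520])

# Fit y = beta_0 + beta_1*temperature + beta_2*occupancy by minimizing the RSS.
model = LinearRegression().fit(X, y)
print("Intercept (beta_0):", model.intercept_)
print("Weights (beta_1, beta_2):", model.coef_)  # interpretable feature weights

# Predict consumption for a new day and compute the residual sum of squares on the training set.
print("Prediction for [28 deg C, 95 persons]:", model.predict([[28, 95]]))
rss = np.sum((y - model.predict(X)) ** 2)
print("Training RSS:", rss)

The fitted weights illustrate the interpretability argument made above: each coefficient states how much the predicted consumption changes per unit change of the corresponding feature.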
3.3.3 Decision and regression trees
3.3.3.1 Brief definition
Classification and regression trees (DTs) were introduced to the AI area in the mid-1980s. Even though
classification trees and regression trees perform different tasks, they are both referenced here as DTs.
DTs can be used for supervised and unsupervised tasks. However, the latter applications are beyond the
scope of this document; further, even though DTs can perform regression, binary classification, and
multi-class classification, for pedagogical purposes we constrain the explanation of the DT algorithm to binary classification.
In such a setup, DTs build a rule model for separating two classes (e.g. 𝑦 = {±1}) graphically presented
in the form of a tree, thus their name. More precisely, DTs perform a partition of the feature space into
subspaces where a simple model (e.g. the most common class) is fitted (Hastie, Tibshirani, & Friedman,
2009).
DTs have positive and negative characteristics: on one hand, they are interpretable as rules providing
an explanation between x measurements and target value y, they can handle different types of data
(e.g. numerical, categorical, nominal) and missing data at the same time, and they are computationally
cheap (Rokach & Maimon, 2015). On the other hand, DTs suffer from high variance (i.e. they tend to
overfit the model to the training data, performing poorly with new data), reducing their performance against
more robust classifiers (James, Witten, Hastie, & Tibshirani, 2013). Nonetheless, DT performance can
be enhanced by constraining the tree parameters, such as the depth of the tree, or by using combinations
of trees to reduce variance. Using statistical methods like bagging, a forest of DTs can be grown and used
as a single classifier/regressor. Such statistical methods, and how to combine the forest into a single
function, are documented elsewhere.
3.3.3.2 Technical description
DTs models are composed of branches and nodes. Branches connect each node in a directed way (i.e.
from A to B). Except for the root node, all other nodes have an incoming branch from a previous
node, and except for terminal nodes, each node has a pair of outgoing branches. Each node
corresponds to a decision or split of the feature space. Nodes can be of three types: root, internal, or
terminal (i.e. leaf node). The root is the starting node, and it performs the best dichotomic partition of
the feature space between two given classes and connects to a pair of internal/terminal nodes. Internal


nodes correspond to intermediate steps where feature space is further split into more specific sub-
spaces in accordance with some criterion. Terminal nodes correspond to a final decision on the analyzed
point.
It is worth mentioning that terminal nodes can be interpreted as giving the probability of each class
rather than a hard class allocation. Moreover, DTs display an explicit hierarchy between features: the root node
(the first variable to perform a split) is the most important feature for the problem, the internal nodes
are the second most important variables, and so on.
Thus, a binary classification DT is a function f(x) = y, which predicts the class or probability of any
instance x by taking decisions following the binary rules described by the nodes. A binary tree is built as follows:
1. Identify the feature that performs the best separation between classes.
a. Find the best split-point of the feature (the value in which the best separation is
obtained).
b. Divide the feature space into two distinct and non-overlapping regions R_i and R_j.
2. If the maximum tree depth is reached or the stopping criterion is met, assign to every
observation in the region 𝑅𝑗 the most common class. Else, identify a new feature and its split-
point to separate region 𝑅𝑗 into two new sub-partitions.
3. Repeat step 2.

As an example, a simple DT for detecting faults in a transmission line during a storm is shown in
Figure 3-3. On the right side, the tree classifier is depicted; on the left, the partition performed in the
feature space is shown. Sampled transmission lines under storm conditions are shown in the feature
space (B part of the figure). Orange dots correspond to lines, which suffered a failure, whereas gray
dots are the non-interrupted transmission lines. The features in this example are precipitation, which is
a continuous variable, and thunderbolts, which is a categorical one.

Figure 3-3: A simple DT model for detecting faults in a transmission line

On the A side, the tree constructed for precipitation and thunderbolts is shown; split-points for each
feature are shown on the edges, while the labels under each terminal node correspond to the regions defined
in the feature space. Further, each node also shows the frequency of fault/no-fault cases and the
corresponding probability. On the B side, the feature space, which is divided into regions R1, R2, and
R3, is presented. In this example, if precipitation is less than the 10 cm threshold (R1), transmission
lines will be classified as no-fault with a 91% probability, whereas a failure during a storm with precipitation
below that threshold has a very low probability (0.09).
Furthermore, so far we have neglected some important concepts, such as the criterion for selecting partition
features, how to select split-points, and how to determine a DT's depth. Thus, readers are referred to
(James, Witten, Hastie, & Tibshirani, 2013; Hastie, Tibshirani, & Friedman, 2009; Rokach & Maimon,
2015) for more details on DTs.
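As a minimal sketch of the fault-detection example of Figure 3-3 (the precipitation and thunderbolt values, labels, and depth limit below are entirely hypothetical and chosen only for illustration), a DT classifier could be trained as follows:

import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical storm observations per line: [precipitation (cm), thunderbolts (count)].
X = np.array([[2, 0], [5, 1], [8, 0], [12, 3], [15, 6], [20, 8], [11, 1], [18, 2]])
# Labels: 0 = no fault, 1 = fault.
y = np.array([0, 0, 0, 1, 1, 1, 0, 1])

# Limit the depth to reduce variance (overfitting), as discussed above.
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# Inspect the learned rules (the feature hierarchy) and classify a new storm observation.
print(export_text(tree, feature_names=["precipitation_cm", "thunderbolts"]))
print(tree.predict([[14, 5]]))         # predicted class for a new observation
print(tree.predict_proba([[14, 5]]))   # class probabilities at the terminal node

Printing the tree rules makes the interpretability argument explicit: the first split corresponds to the most informative feature for this (synthetic) sample.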


3.3.3.3 Application domains


DTs applied to building energy consumption

As was mentioned in the LR section, understanding and forecasting energy consumption patterns in a
building allows for reducing CO2 emissions and managing the energy load by regulating demand. Properly
managed energy consumption also requires understanding how the electrical components impact the
overall building consumption. For instance, building designers and architects require tools that can allow
them to predict a new building’s energy usage patterns based on atmospheric data, building architecture
and household characteristics, and energy sources (Yu, Haghighat, Fung, & Yoshino, 2010).
Recently, DTs have been employed to forecast Energy Use Intensity levels (i.e. the ratio of annual total
energy use to the building's floor area) for buildings across Japan (Yu, Haghighat, Fung, &
Yoshino, 2010). One of the most important contributions of this work is the analysis of rules obtained
from the classification tree. By analyzing the hierarchy of features on the DT, they found that different
sets of features impact a building’s consumption in accordance with district temperatures. In this regard,
we could test other data sources like sun movement and clear-sky solar irradiance on a building’s
surface. Thus, by re-training the DT model and analyzing where such variables are located in the
hierarchy of the tree, we may conclude whether such variables are significant or not in characterizing a
building’s energy consumption.
3.3.4 Artificial neural network
3.3.4.1 Brief definition
Artificial neural networks (ANNs) are computational networks that try to simulate the decision process
that occurs in biological networks of neurons in a central nervous system (Graupe, Sep 2013) (Kalogirou,
Dec 2001) (Russell & Norvig, Dec 13, 1994). Similar to biological neurons, an ANN can be described as
a massively parallel-distributed processor that stores knowledge and makes it available for use (Haykin,
1999). According to (Kalogirou, Dec 2001), ANNs “are good for tasks involving incomplete data sets,
fuzzy or incomplete information and for highly complex and ill-defined problems, where humans usually
decide on an intuitional basis. They can learn from examples and are able to deal with non-linear
problems.” An ANN is a group of interconnected artificial neurons, interacting with one another in a
concerted manner. In such a way, excitation is applied to the input of the network. It resembles the
human brain in two respects:
1) Knowledge is acquired by the NN by means of a learning process.
2) Inter-neuron connection strengths known as synaptic weights are used to store the knowledge.
They learn the relationship between the input parameters and the controlled and uncontrolled variables
by studying previously recorded data, similar to the way a nonlinear regression might perform
(Kalogirou, Dec 2001).
3.3.4.2 Technical description
The network consists of three elements:
1) Input layer
2) Hidden layers
3) Output layer (see Figure 3-4 [22]).


Figure 3-4: Schematic diagram of a feed-forward NN


In its simple form, each neuron is connected to other neurons of a previous layer employing adaptable
synaptic weights, and knowledge is stored as a set of connection weights. An ANN is composed of many
nodes connected by links, where each link has a numeric weight (Russell & Norvig, Dec 13, 1994) (see
Figure 3-5).
Training is the process of modifying the connection weights using a learning method that usually
proceeds by iteratively updating the weights (Russell & Norvig, Dec 13, 1994). In this learning
mode, an input is presented to the network along with the desired output, and the weights are adjusted
so that the network attempts to produce the desired output; the final values of the weights are obtained
after training (see Figure 3-5 [22]):

Figure 3-5: Information processing in an ANN


The basic idea shown in Figure 3-5 is that each node or neuron makes a local computation based on
inputs from its neighbors but without the need for any global control over the set of units as a whole
(Russell & Norvig, Dec 13, 1994). The node receives weighted information from other nodes. First, they
are added and then passed to the activation function. For each of the outgoing connections, this value
is multiplied by the specific weight and transferred to the next node (Kalogirou, Dec 2001). In practice,
ANNs are implemented in software like Matlab, SPSS, Weka, Rapidminer, and Java, among others.
According to (Kalogirou, Dec 2001), training an ANN requires a training set of matched
input and output patterns. Each output produced by the network is compared to the desired
output. To reduce the error to the desired tolerance, the ANN often needs to run repeatedly while altering
the connection weights. When the training achieves a desirable level, the network holds the weights
constant, and this trained network is used to make decisions, identify patterns, or define associations in
new input datasets.


The most popular and powerful ANN training algorithm is backpropagation (BP). One training pass over all
patterns of a dataset is called an epoch. BP tries to improve the performance of the NN by reducing the total
error, changing the weights along the gradient of the error. The error is expressed by the RMS (root-mean-square)
value; a zero-error value indicates that all the computed output patterns match the expected values and therefore
that the network is well trained (Kalogirou, Dec 2001).
3.3.4.3 Application domains
According to (Kalogirou, Dec 2001), ANNs are able to learn the key information patterns within a multi-
dimensional information domain in a way that is fault-tolerant, robust, and noise-immune (Rumelhart, Hinton,
& Williams, 1986). Data from energy systems are noisy, making them a good candidate to be
analyzed with neural networks. ANN has been applied to predict and optimize energy use in commercial
buildings—particularly in HVAC in commercial buildings—without sacrificing comfort (Kreider, Wang,
Anderson, & Dow, December 1992) (Curtiss, Brandemuehl, & Kreider, January 1994).
ANNs have been applied to the diagnosis of line faults of power systems and load forecasting in power
systems. ANNs were used to model the combustion process of incineration plants with the purpose to
optimize the reduction of toxic emissions (Muller & Keller, 1996). In (Milanic & Karba, 1996), ANNs were
used for predictive control of a thermal plant, by using the steam flow as input and a simple network
structure because on-line predictions of plants are faster. In (Mandal, Sinha, & Parthasarathy, 1995),
ANNs were applied for short-term load forecasting in electric power systems. The output of the ANN
was the next hour load, and no weather variables were considered. In (Khotanzad, Abaye, &
Maratukulam, 1995), a recurrent neural network (RNN) load forecaster was used for hourly prediction
of power system loads. In (Datta & Tassou, 1997), ANNs were used for prediction of the
electrical load in supermarkets.
ANNs are used in wind energy systems and can be grouped into three major categories: forecasting
and prediction, prediction and control, and identification and evaluation (Keles, Scelle, Paraschiv, &
Fichtner, 2016).
Forecast methods for day-ahead electricity prices are essential for energy traders and supply companies.
ANNs have been used to successfully forecast day-ahead electricity prices, providing even better results
than ARIMA (Keles, Scelle, Paraschiv, & Fichtner, 2016).
Finally, ANNs are used for the implementation of a wide variety of anomaly-detection systems,
including intrusion detection systems (IDS) for network computers in the electric energy sector as well
as advanced IDS for the smart grid in an ensemble with other algorithms (Aburomman & Reaz, March
2017).
3.3.5 Support vector machine (SVM)
3.3.5.1 Brief definition
The support vector machine (SVM) was developed by Vapnik and others during the 1990s (Scholkopf &
Smola, 2002). SVM was initially developed as a linear classifier, although it is best known for its
capacity to handle noisy nonlinear data. SVM has also been extended to the problems of regression,
probability estimation, clustering, and so on (Scholkopf & Smola, 2002). However, because the main
features of SVM are shared among all of its distinct applications, we limit the description of the algorithm to
the classification problem.
3.3.5.2 Technical description
To introduce SVM, we first need to introduce the empirical risk minimization (ERM) principle. It is the
most widely used criterion to train any ML model: it only requires that the model achieve the lowest possible
error on the training sample. Achieving the lowest error rate on a given sample only requires modelling
every possible case; however, such a model will be so particular that it will perform poorly on out-of-sample
points. In contrast, SVM was designed based on the structural risk minimization (SRM)
principle (Scholkopf & Smola, 2002), which establishes a bound relating generalization (i.e.
how well the model explains unseen samples) to the simplicity of the model (i.e. if the model is too
complex, it is expected to perform poorly on unseen data). Thus, SVM uses the simplest family of functions,
hyperplanes, to approximate a given sample. Further, to constrain the number of valid hyperplanes
and their complexity, a margin around the hyperplane is added.


Such a margin guarantees that the problem is convex (i.e. it has an optimal solution(s)), improving on
the computational burden of estimating the hyperplane. Another important feature of the SVM
formulation is that the hyperplane is described using a reduced set of the sample points known as the
support vectors (SVs). Thus, in a classification problem, learning the optimal hyperplane with a given
sample is reduced to find the support vectors. Because only the SVs are required to build the hyperplane,
all remaining training points are disregarded. An SVM toy example of two dimensions is shown in Figure
3-6.

Figure 3-6: A toy example of a linearly separable problem


The SVM hyperplane separates the blue and gray dots. SVs, circled in red, lie exactly on the
margin. Note that in two dimensions the hyperplane function is a line. Moreover, the margin is nothing more
than two lines parallel to the hyperplane, separated from it by 1/‖w‖, where w is the weight (normal) vector of the
hyperplane and b is the bias parameter. The w vector was drawn not parallel to attribute 2 for illustrative purposes.
In its original formulation, the margin is considered to be hard, meaning that all SVs must lie on the
margin. Such a formulation is heavily restrictive because it only works with non-noisy linear data.
Consequently, SVM was extended to model nonlinear relations by adding the kernel trick. Kernel
functions map a point from the original space to the feature space. This mapping has several
advantages:
1) The feature space has more dimensions. Thus, it is presumably easier to find a separating
hyperplane.
2) The feature space is a nonlinear map. Thus, it can handle non-linear data.
3) Comparing two vectors in the feature space only requires evaluating the kernel function on the
original vectors, without the computational burden of first performing the mapping for each
point.
Later, SVM was extended to allow for noisy data. The soft-margin formulation permits SVs to violate
the restrictions imposed by the margin (SVs can be found within or beyond the margin).
Although SVM is formulated as an n-dimensional line, it is rather convenient to find it by solving its dual
formulation. Further details of the optimization problem and of kernel functions are presented in (Scholkopf
& Smola, 2002). Formally, given a data set of the form (x_i, y_i), i = 1, …, m, with x_i ∈ ℝ^N and y_i ∈ {±1},
the optimal hyperplane is found by solving the problem:
maximize  W(α) = Σ_{i=1}^{m} α_i − (1/2) Σ_{i,j=1}^{m} α_i α_j y_i y_j k(x_i, x_j)
subject to  0 ≤ α_i ≤ C/m  for all i = 1, …, m,
and  Σ_{i=1}^{m} α_i y_i = 0.
In this equation, α_i corresponds to the weight of the i-th sample point, y_i is its class, and x_i is its feature
vector; C is a penalization parameter that controls the complexity of the model (i.e. larger C values
correspond to a simpler model, whereas a zero value produces a very complex one). Once the weights (the


support vectors α_i) are found, a new instance can be classified using the hyperplane equation (Scholkopf
& Smola, 2002).
3.3.5.3 Potential application in smart grid
SVM applied to transmission line fault detection

Transmission line faults entail 85% to 87% of power system faults (Singh, Panigrahi, & Maheshwari,
2011). Power systems and electrical grids require reliable transmission lines; detection tools designed
to find early faults may decrease the time that a circuit is interrupted. Protective relays are in charge of
detecting energy or hardware faults and ameliorating their impact. Initially, these protections were
electromagnetic; nowadays, however, they are digital and may transmit their measurements over the
internet. The main problem with the detection of any fault resides in the characterization of the current
and voltage signals. Once a proper characterization is chosen (Singh, Panigrahi, & Maheshwari, 2011;
Ray & Mishra, 2016), machine-learning techniques such as DTs or SVM can be used to classify fault/non-
fault signals. However, DTs can be heavily biased given that they tend to overfit the training data, and such
data is often gathered from simulators rather than real measurements. On the other hand, SVM produces
more robust models that are less sensitive to particular simulated conditions. Moreover, with the kernel
trick, complex relations between faults and their characteristic signals can be captured. Furthermore, by adding
data gathered by the protective relays, more solid results may be expected from SVM than from DTs.
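As a minimal sketch of the fault/non-fault classification described above (the waveform-derived features and values below are synthetic and purely illustrative; the feature names are hypothetical), a soft-margin SVM with an RBF kernel could be trained as follows:

import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# Hypothetical per-event features extracted from current/voltage signals:
# [RMS current (A), total harmonic distortion (%)]; labels: 0 = no fault, 1 = fault.
X = np.array([[100, 2.0], [110, 2.5], [105, 1.8], [400, 9.0], [520, 12.0], [450, 10.5],
              [120, 3.0], [480, 11.0]])
y = np.array([0, 0, 0, 1, 1, 1, 0, 1])

# Scale the features and fit a soft-margin SVM with an RBF kernel (the kernel trick for nonlinear data).
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
clf.fit(X, y)

# Classify a newly recorded event.
print(clf.predict([[430, 9.5]]))  # likely [1] (fault) for this synthetic data

In a real setting, the feature extraction step (signal characterization) and a much larger labeled event set would dominate the effort; the classifier itself remains a few lines.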
3.3.6 K-means and clustering
3.3.6.1 Unsupervised learning
This introduction is necessarily incomplete given the enormous range of topics under the rubric of
“unsupervised learning.” For instance, the goal may be to discover groups of similar data points
(clustering), to determine the distribution of data within the input space (probability density estimation),
or to project the data from a high-dimensional space down to two or three dimensions for visualization
purposes. This document focuses on the first objective.
Clustering can be understood as gathering data into groups of similar individuals known as clusters.
Using similarity/dissimilarity measurements, points are assigned to one or more clusters with other data
that share common features. Further, groups may be ordered by hierarchy or be linked to other groups.
Clustering algorithms are preferred for exploratory purposes, such as when there is no a priori
knowledge about relations existing within data. The most iconic clustering algorithm is called K-means.
3.3.6.2 Brief definition and technical description
K-means is a hard-clustering (Gan, Ma, & Wu, 2007; Wu, 2012), partitional (Kaufman & Rousseeuw,
1990) method. On one hand, it is hard because any point belongs to only one cluster. On the other hand, it
is partitional because it divides the feature space into non-overlapping regions. Further, K-means proposes
a single point to represent each region of the feature space. These points are called centroids or
means, and they geometrically correspond to the center (mean) of the cluster (Wu, 2012). These k
means are refined iteratively by minimizing/maximizing some similarity/dissimilarity function among all
the members of each cluster.
K-means is fast, scalable, and has a linear computational cost with respect to the dataset size (Gan, Ma,
& Wu, 2007; Wu, 2012). Nevertheless, K-means requires clusters to be convex (e.g. spherical) and
tends to perform poorly on groups of different sizes (Gan, Ma, & Wu, 2007; Wu, 2012). Thus, it is
sensitive to outliers and not well-suited for modelling skewed distributions or noisy overlapping groups.
The K-means algorithm works as follows: First, randomly select k centroids and assign each remaining
point to the closest centroid. Second, using all the points assigned to the i-th cluster, recalculate its centroid.
Third, if the centroids do not change (or change little) or other stopping criteria are satisfied, the algorithm
ends; otherwise, points are reassigned to the new centroids and the algorithm returns to the second step.
Formally, given a dataset X = {x_i}, x_i ∈ ℝ^N, i = 1, …, m, K-means assigns each x_i to a particular cluster
c_k ∈ C, k = 1, …, K, by minimizing some objective function. As originally formulated, each partition of the
feature space is determined by minimizing the Euclidean distance among the members of each cluster,
whose mean is μ_k ∈ ℝ^N. Given that the Euclidean norm for a cluster c_k is defined as:
D_eucl(c_k) = √( Σ_{x_i ∈ c_k} (x_i − μ_k)² ),


Then the K-means objective function is defined as:
minimize_{μ_k, 1 ≤ k ≤ K}   Σ_{k=1}^{K} Σ_{i=1}^{m} w_{i,k} D_eucl(x_i, μ_k),
where μ_k is the centroid of the k-th cluster, D_eucl is the Euclidean distance, and W is an m × K matrix that
satisfies:
1. w_{i,k} ∈ {0,1} for i = 1, 2, …, m, and k = 1, 2, …, K
2. Σ_{k=1}^{K} w_{i,k} = 1, for i = 1, 2, …, m

K-means performance is determined by several parameters: centroid initialization, an adequate number
of clusters, and the similarity/dissimilarity measure. First, given that the initial centroids are assigned
randomly, only local convergence is guaranteed (Gan, Ma, & Wu, 2007). However, this shortcoming is
easily overcome by repeating the clustering procedure several times and choosing the partition for which
the K-means objective function achieves the smallest value. Second, the determination of the number of
clusters for a sample is key to the performance of K-means. Although the literature on this endeavor is large
and vast, the k parameter is typically defined by expert criteria (Wu, 2012). Lastly, the
similarity/dissimilarity metric must take into consideration the features' domain (e.g. numerical,
categorical, strings, and so on) to ensure that a proper distance is measured for the data. Although
the Euclidean distance is typically employed, other measures may be used instead. By doing the latter,
K-means improves its effectiveness and its speed when applied to high-dimensional data (Wu, 2012).
Figure 3-7 presents a hypothetical example of the usage of K-means. Data from faults on transmission
lines during storms is gathered. The measured features are the number of lightning strikes during a storm,
which is a discrete variable, and the precipitation, which is a continuous one. In this example, data is gathered
from non-faulted lines shown as gray dots, line-to-line (L2L) faults (a short circuit caused by two
energized lines) are shown in blue, and single line-to-ground (SL2G) faults (a short circuit due to a line
touching the ground or a neutral conductor) are shown in orange. However, to demonstrate how K-means
works, the fault-status labels of the lines are omitted. Moreover, for this example, the number of
centroids was fixed arbitrarily to 3.

Figure 3-7: A toy example of clustering transmission lines during a storm using K-means
Dots represent different types of lines: Gray corresponds to no fault at all, blue represents L2L
faults, and SL2G faults are shown in orange. Centroids are displayed in red: The initial centroids are shown in
the lightest red, whereas final centroids are shown in vivid red. Dotted red lines display the partitions
corresponding to each centroid. As can be observed, as the optimization procedure unfolds, the centroids
together with their partitions are refined (the lightest red displays the first iteration, whereas vivid red
shows the final centroid/partition). The corresponding iteration numbers are shown on the left
side of the figure.
3.3.6.3 Potential applications in smart grid
K-means applied to building energy consumption

As has been stressed, understanding and predicting energy consumption patterns in buildings is key
to addressing climate change and to better defining energy management policies, utility planning, and
so on. However, the categorization of buildings is a rather challenging task: They are multidimensional
and heterogeneous. On one hand, the number of components and interactions in a building electrical
system is vast. On the other, the building population itself is heterogeneous and composed of many
sub-groups in different locations, with distinct legislations, energy requirements, and so on.
Thus, ways to group buildings into clusters with similar energy consumption patterns are highly valuable.
For instance, given a dataset of buildings and their energy consumption (characterized by active power,
reactive power, voltage, and so on), the most straightforward approach would consist in applying K-means to
the data for exploratory purposes. Because no relation between the data is known beforehand, K-means allows
us to explore possible relations among the data. In this example, the Euclidean similarity measure is
employed. However, readers must be aware that such a distance measure requires continuous,
independent features. Afterwards, the number of clusters to be tested is defined, and the results are
displayed. Although numeric performance measures for the clusters exist, visualization of the clusters
may provide more explicit hints on the relations between groups of building energy consumption patterns.
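A minimal sketch of this exploratory workflow is shown below, assuming a hypothetical table of per-building features (active power, reactive power, voltage) and using the scikit-learn implementation of K-means; the synthetic data, the feature names, and the choice of three clusters are illustrative assumptions. Standardizing the features first keeps the Euclidean distance from being dominated by the feature with the largest numeric range.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

# Hypothetical dataset: one row per building with average measurements.
rng = np.random.default_rng(0)
buildings = pd.DataFrame({
    "active_power_kw":     rng.gamma(5.0, 20.0, size=300),
    "reactive_power_kvar": rng.gamma(3.0, 10.0, size=300),
    "voltage_pu":          rng.normal(1.0, 0.02, size=300),
})

# Standardize so that the Euclidean distance treats all features comparably.
X = StandardScaler().fit_transform(buildings)

# Exploratory clustering: the number of clusters (3) is an arbitrary choice
# that would normally be tested against several candidate values.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
buildings["cluster"] = km.labels_
print(buildings.groupby("cluster").mean())  # consumption profile per cluster
```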
3.4 PROBABILISTIC NETWORKS
Probabilistic networks are representations based on graph theory and probability theory, for modeling
domains with uncertainty and for making inferences with uncertain or incomplete information. They
model a domain through a set of random variables and their dependency relationships, represented
using a graph. This structure allows the joint probability distribution to be represented by a set
of local probabilities, which significantly reduces the computational complexity in space and time.
Probabilistic networks include, among others:
 Bayesian networks
 Bayesian classifiers
 Decision networks
These types of models are suitable for representing problems involving uncertainty; applications include
medical and industrial diagnostic systems, user and student modeling, tutoring strategies, planning under
uncertainty, voice and gesture recognition, prediction, image analysis, and robotics. Reference
(Kang, S. B., Advances in Computer Vision and Pattern Recognition, 2015) provides detailed discussions of
Bayesian networks, Bayesian classifiers, and decision networks.
3.4.1 Bayesian networks
3.4.1.1 Brief definition
A Bayesian network (BN) is specified by a set of local parameters. These parameters are the
conditional probabilities of each variable given its parents in the network structure. Therefore, based
on these local parameters, the joint probability distribution can be represented. Figure 3-8 depicts
an example of a simple BN; the structure of the graph implies a set of conditional independence
assertions for this set of variables.

Figure 3-8: Example of a simple Bayesian network

3.4.1.2 Technical description


Representation
The joint distribution of a set of n (discrete) variables, X1, X2, …, Xn, can be represented
by a Bayesian network (BN). In Figure 3-9, the nodes A, B, C, D, and E correspond to variables, each
associated with a conditional probability table (CPT) such as P(E|B,C). The structure of the network
implies a set of conditional independence assertions, which give power to this representation. The
joint distribution of this set of variables can be represented by combining the structure of the network with
these conditional independence assertions. For the network shown, the joint distribution factorizes as
P(A,B,C,D,E) = P(A) P(C) P(B|A) P(D|B) P(E|B,C).


Figure 3-9: Examples of conditional probability tables

Inference
Inference uses a Bayesian network to compute probabilities. The general scenario is to
compute P(X|E = e), where X is the query variable, E = e is the evidence (observed) variable, and the joint
distribution P(X, E, Y) is known, where Y denotes the unobserved variables. There are two types of inference,
single-query inference and conjunctive-query inference, which propagate the effects of the observed
variables in a Bayesian network to estimate their effect on the unknown variables.
Pearl's algorithm, variable elimination, conditioning, the junction tree, and stochastic simulation are among the
algorithms used for inference.
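To make the single-query scenario concrete, the sketch below performs inference by enumeration on the five-variable example network of Figure 3-9 (structure A -> B, B -> D, B -> E, C -> E); all CPT values are hypothetical placeholders, and brute-force enumeration is used only because the network is tiny (the algorithms listed above scale far better).

```python
from itertools import product

# Hypothetical CPTs for binary variables in the network of Figure 3-9.
P_A = {True: 0.3, False: 0.7}
P_C = {True: 0.6, False: 0.4}
P_B_given_A = {True: {True: 0.8, False: 0.2}, False: {True: 0.1, False: 0.9}}    # P(B|A)
P_D_given_B = {True: {True: 0.7, False: 0.3}, False: {True: 0.05, False: 0.95}}  # P(D|B)
P_E_given_BC = {(True, True): 0.9, (True, False): 0.6,
                (False, True): 0.3, (False, False): 0.05}                        # P(E=True|B,C)

def joint(a, b, c, d, e):
    """Joint probability from the factorization P(A)P(C)P(B|A)P(D|B)P(E|B,C)."""
    pe = P_E_given_BC[(b, c)]
    return (P_A[a] * P_C[c] * P_B_given_A[a][b] * P_D_given_B[b][d]
            * (pe if e else 1.0 - pe))

def query(target, evidence):
    """Single-query inference by enumeration: returns P(target = True | evidence)."""
    names = ["A", "B", "C", "D", "E"]
    num = den = 0.0
    for values in product([True, False], repeat=5):
        assignment = dict(zip(names, values))
        if any(assignment[k] != v for k, v in evidence.items()):
            continue  # inconsistent with the observed evidence
        p = joint(*values)
        den += p
        if assignment[target]:
            num += p
    return num / den

print(query("B", {"E": True, "D": True}))  # P(B = true | E = true, D = true)
```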
Structure and parameter learning
The learning problem in Bayesian networks includes structure learning and parameter learning.
When the structure or topology of the BN is known and sufficient data are available for all the
variables, parameter learning is straightforward: the CPTs of the variables can be estimated directly
from the data. If there are not sufficient data, the uncertainty of the parameters can be modeled and
estimated with a second-order probability distribution, such as a Beta distribution.
There are two main types of methods for structure learning: search-and-score (global) methods and
conditional-independence-test (local) methods. Obtaining the topology of the BN through structure
learning is a complex process that requires good estimates of the underlying statistical measures.
Depending on the type of structure, techniques for trees, polytrees, or general DAGs can be used.
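As a small illustration of the scarce-data case, the following sketch contrasts the maximum-likelihood estimate of a CPT entry with the posterior-mean estimate obtained from a Beta prior; the observation counts and the Beta(2, 2) hyperparameters are hypothetical.

```python
# Minimal sketch of CPT parameter learning for a binary variable with a
# Beta prior, assuming hypothetical counts observed in the data.
alpha, beta = 2.0, 2.0          # Beta(2, 2) prior: weak belief centered on 0.5
n_true, n_false = 7, 3          # observed counts of X = true / X = false

# With ample data the CPT entry is simply the empirical frequency ...
p_mle = n_true / (n_true + n_false)
# ... with scarce data the Beta posterior mean smooths the estimate.
p_bayes = (n_true + alpha) / (n_true + n_false + alpha + beta)

print(f"Maximum-likelihood estimate: {p_mle:.3f}")
print(f"Posterior-mean estimate:     {p_bayes:.3f}")
```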
3.4.1.3 Application domains
Bayesian networks have the advantage of expressing a compact representation of the joint probability
distribution of the nodes and of fitting data; they are an efficient way to represent complex probabilistic systems.
Bayesian network modeling has been applied in several real-world application domains, such as systems
biology, gene regulatory networks, medicine, biomonitoring, document classification, information
retrieval, semantic search, image processing, turbo codes, and spam filtering.

3.4.1.4 Potential applications in smart grid


Probabilistic networks can be used in power systems for diagnosis of a fault in different equipment,
detection of inconsistent values in databases or sensors, detection of causes of technical and non-
technical losses of electricity, models for the prediction of energy demand, and models for the
prediction of energy generation.
3.4.2 Bayesian classifiers
3.4.2.1 Brief definition
Bayesian classifiers are statistical classifiers based on Bayes' theorem. They can predict the probability
that a particular sample is a member of a particular class. Bayesian classifiers can be
supervised or unsupervised: in the unsupervised case the classes are unknown and must be found by
clustering, whereas in the supervised case the classes are known a priori.
The naïve Bayes classifier, tree-augmented Bayesian classifier (TAN), Bayesian network augmented
Bayesian classifier (BAN), semi-naïve Bayesian classifier, multidimensional Bayesian classifier, and
Bayesian chain classifier are among the major Bayesian classifiers.
3.4.2.2 Technical description
The formulation of the Bayesian classifier is based on Bayes' theorem, which estimates the probability of
each class given the evidence:

$$P(Class \mid evidence) = \frac{P(evidence \mid Class)\, P(Class)}{P(evidence)}$$
The evidence normally consists of a set of observations E = (e1, e2, …, en). If a single most likely class
is to be selected, the Bayesian classifier needs to find the class that maximizes
P(evidence|Class)P(Class). In general, the hard part is estimating P(evidence|Class), and some assumptions
have to be made for this estimation.
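The sketch below applies a Gaussian naïve Bayes classifier, whose conditional-independence assumption is one way of making P(evidence|Class) tractable, to a hypothetical fault-classification problem; the two evidence features, the class labels, and the synthetic data are illustrative assumptions, and scikit-learn's GaussianNB is used for brevity.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Hypothetical training data: two continuous evidence features per event
# (e.g. a current magnitude and a voltage-dip depth) and a class label
# (0 = no fault, 1 = line-to-line fault, 2 = single line-to-ground fault).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([1.0, 0.02], 0.05, size=(40, 2)),
               rng.normal([3.0, 0.30], 0.20, size=(40, 2)),
               rng.normal([2.0, 0.50], 0.20, size=(40, 2))])
y = np.repeat([0, 1, 2], 40)

# Naive Bayes assumes the evidence features are conditionally independent
# given the class, which makes P(evidence | Class) easy to estimate.
clf = GaussianNB().fit(X, y)
print(clf.predict([[2.8, 0.28]]))        # most likely class for a new event
print(clf.predict_proba([[2.8, 0.28]]))  # P(Class | evidence) for each class
```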
3.4.2.3 Application domains
Bayesian classifiers have been used for skin detection, obtaining an approximate
classification of the pixels in an image as skin or non-skin based on the color attributes of each pixel.
Other applications can be found in the health field, such as drug selection for patients and the optimization of
treatment decisions.
3.4.2.4 Potential applications in smart grid
Examples of Bayesian classifier applications in smart grids include among others: Diagnosis of fault in
different equipment, detection of inconsistent values in databases or sensors, detection of causes of
technical and non-technical losses of electricity, models for the prediction of energy demand, and
models for the prediction of energy generation.
3.4.3 Decision networks
3.4.3.1 Brief definition
A decision network, often called an influence diagram, extends a Bayesian network with decision
(action) nodes and utility nodes to enable rational decision making. Decision networks have a compact
graphical and mathematical representation and can be evaluated efficiently to support a
decision-making situation. The decision model should help the decision-maker select the optimal
choice under uncertainty by maximizing the expected utility.
3.4.3.2 Technical description
An influence diagram is a directed acyclic graph, 𝐺, that can be viewed as an extension of a Bayesian network,
incorporating decision and utility nodes in addition to random nodes. It contains three types of nodes:
random nodes (𝑋), decision nodes (𝐷), and utility nodes (𝑈).
 Random nodes (𝑋) are chance variables associated with a CPT; they are represented as ovals.
 Decision nodes (𝐷) are variables under the control of the decision-maker; they are represented as rectangles.
 Utility nodes (𝑈) are measures of the possible outcomes; usually, decision-makers try to
maximize the utility. They are represented as diamonds.

An influence is denoted by an arrow, which connects the nodes described above and expresses the
relevant knowledge flowing from one node to another.
A decision tree is a graphical representation of a decision problem and is complementary to the
influence diagram. It consists of three types of nodes that represent decisions, uncertain events, and
results. Usually, an influence diagram has a much more compact representation than a decision tree.
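As a minimal numeric illustration of selecting the decision that maximizes expected utility, the sketch below evaluates a hypothetical two-action problem (dispatch a maintenance crew to a transformer, or wait, given an inferred probability that the transformer is degraded); all probabilities and utilities are invented for the example.

```python
# Minimal sketch of evaluating a decision node by maximum expected utility.
p_degraded = 0.2  # chance node: P(degraded | observed evidence), hypothetical

# Utility table U(decision, state) in arbitrary monetary units (hypothetical).
utility = {
    ("dispatch", "degraded"): -10,   # repair cost, forced outage avoided
    ("dispatch", "healthy"):  -10,   # unnecessary crew dispatch
    ("wait",     "degraded"): -100,  # forced outage occurs later
    ("wait",     "healthy"):    0,
}

def expected_utility(decision):
    return (p_degraded * utility[(decision, "degraded")]
            + (1.0 - p_degraded) * utility[(decision, "healthy")])

best = max(["dispatch", "wait"], key=expected_utility)
print(best, {d: expected_utility(d) for d in ["dispatch", "wait"]})
```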
3.4.3.3 Application domains
These types of models are adequate for representing problems in which decisions have to be made with
uncertain information. Some applications are educational, medical, and industrial diagnostic systems,
such as student and tutor models used to select tutorial actions in intelligent tutoring systems given the
current, incomplete information about the context.
3.4.3.4 Potential applications in smart grid
Decision networks can be used to model intelligent power grids, which can be seen as complex and
uncertain systems where decisions must be made (for example, intelligent assistants in operation and
maintenance diagnostic systems). Another potential application is the energy market, supporting and
permitting both suppliers and consumers to be more flexible and sophisticated in their
operational strategies.

3.5 DEEP LEARNING


3.5.1 Brief definition
Deep learning has a long history and many aspirations in solving practical applications. Modern
practice involves deep networks, which comprise most of the successful methods in the field (Goodfellow,
Bengio, & Courville, Deep Learning, 2016). These methods are usually used to train a model, that is, to
find the parameters of the model that correspond to some desired function. With enough training
data, this approach is very compelling.
Modern deep learning provides an essential supporting structure for supervised learning. A deep
network can represent functions of increasing complexity by adding more layers and more features
within a layer. Given sufficiently large models and large datasets of labeled training examples, the
mapping from input features to outputs can be accomplished by deep learning. A shortcoming of the current state of
the art for industrial applications is that our learning algorithms require large amounts of supervised
data to achieve reasonable accuracy. That is also the reason that many active research projects try to
solve the shortcoming by using unsupervised deep-learning algorithms.
Many deep-learning algorithms are also designed to tackle unsupervised learning problems, but none
has truly solved the problem in the same way that deep learning has primarily solved supervised
learning problems for a wide variety of tasks (Goodfellow, Bengio, & Courville, Deep Learning, 2016).
The high dimensionality of the random variables is the main problem for unsupervised learning. This
brings two recognized challenges: a statistical challenge and a computational challenge.
3.5.2 Technical description
Deep learning is a branch of machine learning that uses multiple processing layers, composed of various
linear and nonlinear transformations. In Figure 3-10, the shaded boxes indicate components that are
able to learn from data. The depth of a model is given by the number of such components, which
distinguishes deep learning from rule-based systems, classic machine learning, and simple representation
learning (Goodfellow, Bengio, & Courville, Deep learning, 2016). Deep learning adds layers of more
abstract features, and these additional layers can refine a sophisticated and accurate computational model. The
shaded boxes (layers) can be implemented with, but are not limited to, the following:
1. Linear factor models
2. Autoencoders
3. Representation learning
4. Structured probabilistic models for deep learning
5. Monte Carlo methods
6. Confronting the partition function
7. Approximate inference

8. Deep generative models

Figure 3-10: Deep learning components

3.5.3 Potential applications in smart grid


Two recent deep learning applications on energy demand are described here. One implementation
uses deep learning methods to increase the accuracy of the estimated building energy demands and
user behavior. Reference (Mocanu, Nguyen, Gibescu, & Kling, June 2016) investigates two newly
developed stochastic models for time series prediction of energy consumption, namely Conditional
Restricted Boltzmann Machine (CRBM) and Factored Conditional Restricted Boltzmann Machine
(FCRBM). By using layer-wise unsupervised learning, the ability to train deep architectures could help
achieve an accurate energy prediction.
Another application is energy disaggregation, which estimates appliance-by-appliance electricity
consumption from a single meter measuring the whole home’s electricity demand. Reference (Kelly &
Knottenbelt, November 4-5, 2015) adapts three deep neural network architectures to energy
disaggregation:
1) A form of the recurrent neural network called “long short-term memory.”
2) Denoising autoencoders.
3) A network that regresses the start time, end time, and average power demand of each
appliance activation.
With the unsupervised pre-training, unlabeled data from each house can be disaggregated.
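The sketch below is not the architecture of either cited reference, but a minimal tf.keras illustration of the denoising-autoencoder idea applied to windows of an aggregate load signal; the window length, layer sizes, and synthetic training data are assumptions made for the example.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

window = 64  # length of each aggregate-load window (hypothetical)

# Synthetic stand-in data: a clean appliance signal plus noise from other loads.
rng = np.random.default_rng(0)
clean = np.abs(np.sin(np.linspace(0, 8 * np.pi, window))) * rng.uniform(0.5, 1.5, (2000, 1))
noisy = clean + rng.normal(0.0, 0.3, clean.shape)

# Denoising autoencoder: learns to map the noisy aggregate window back to
# the clean target-appliance window.
model = tf.keras.Sequential([
    layers.Input(shape=(window,)),
    layers.Dense(128, activation="relu"),
    layers.Dense(32, activation="relu"),   # compressed representation
    layers.Dense(128, activation="relu"),
    layers.Dense(window, activation="linear"),
])
model.compile(optimizer="adam", loss="mse")
model.fit(noisy, clean, epochs=5, batch_size=64, validation_split=0.1)

estimate = model.predict(noisy[:1])  # disaggregated estimate for one window
```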

3.6 VISUAL ANALYTICS


3.6.1 Brief definition
According to (Thomas & Cook, 2005), visual analytics is the science of analytical reasoning supported
by interactive visual interfaces. Data and visual analytics are becoming necessary for data
users to better understand the relevant information contained in the data and to make effective
decisions. The data-analytics techniques discussed in this section could be applied to the enormous
amounts of data that are being recorded in the electrical transmission grid. The combination of human
knowledge and data analytics is key to achieving knowledge discovery and data mining in planning
and operating power transmission systems. In addition, providing interactive visual interfaces can
help analysts and system operators form a better impression of possible symptoms and suspicious
behavior and understand power system performance, thereby increasing situational awareness. Humans
can interact directly with the data analysis and be well informed by using advanced visual interfaces.
3.6.2 Related research areas and challenges
The list of research areas related to visual analytics includes information analytics, spatial-temporal
data analytics, scientific analytics, statistical analytics, knowledge discovery, data management,
knowledge representation, cognitive and perceptual science, and interaction. Visual analytics can be
seen as an integrated approach combining visualization, human factors, and data analysis. Cognition
and perception are the human factors that play an important role in the communication between the
human and the computer, as well as in the decision-making process. Information visualization and
computer graphics often relate to the areas of visualization. Data management, knowledge
representation, and data mining benefit from methodologies developed in the field of data analysis.
To make use of visual analytics for understanding the behavior of electrical transmission grids,
visualization systems for power system analysis can build on these areas and can reveal

future research directions in this emerging field. Reference (VISUAL-ANALYTICS.EU, n.d.) lists more
related research topics and projects, both completed and ongoing.
The visual-analytics research challenges can be categorized into the following areas: data, users,
design, and technology. They include the challenges of:
 Dealing with and integrating huge, heterogeneous, variable-quality datasets.
 Meeting the needs of the users.
 Assisting designers of visual analytic systems.
 Providing the necessary infrastructure technology.
3.6.3 The visual-analytics process
In order to gain knowledge from data, the visual analytics process described below combines
computational and visual models tied together with human interaction. Figure 3-11 shows
a general overview of the different stages (represented by square blocks) and their transitions
(arrows) in the visual-analytics process (Keim, Kohlhammer, Ellis, & Mansmann, 2010).

Figure 3-11: The visual analytics process


First, data needs to be preprocessed and transformed to derive different representations for further
exploration. The preprocessing tasks may include data cleansing, normalization, grouping, or
integration of heterogeneous data sources. In many application scenarios, heterogeneous data
sources will need to be integrated before computational or visual model analysis methods can be
applied.
Once the data preprocessing task has been completed, the decision whether to apply computational
or visual model analysis methods is made. If a computational model analysis is used first, data-mining
methods are applied to generate models of the original data. Once a model is created, this model
must be evaluated and refined by interacting with the data. The loop between computational and
visual model methods through model visualization and model building could lead to continuous
refinement of models and verification of preliminary results.
Thus, model visualization can be used to evaluate the findings of the generated models. Interpreting
and verifying results at an early stage leads to better results and higher confidence. If a visual data
exploration is performed first, data mapping methods are applied to visual models of the data. The
interaction with a computational model analysis should also involve modifying parameters or selecting
other analysis algorithms. Findings in the visualizations can be used to direct model building in the
computational model analysis. The interaction with the visualization is intended to reveal a deep
understanding of information, for instance by considering data from different perspectives or zooming
in and out on different data areas. In conclusion, in the visual analytics process, knowledge can be
gained from visualization and computational analysis, as well as from the interactions between visualizations,
models, and humans. This knowledge helps to increase situational awareness in many aspects
of operating the electrical transmission grid. Section 4, on the applications of data analytics, explains
visualization applications in more detail.

3.7 REFERENCES
[1] EPRI, "Advanced Data Analytics Techniques: Analysis and Applications for Power System
Operation and Planning Support," Power Delivery & Utilization - Transmission, Jan 28, 2016.

[2] S. Chen, A. Onwuachumba, M. Musavi and P. Lerley, "A Quantification Index for Power Systems
Transient Stability," Energies 2017, 10, 984.

[3] iTesla (Innovative Tools for Electrical System Security within Large Area), "Deliverable D2.4
Data mining methods - Uncertainties modeling for offline and online security assessment," July
29, 2013.

[4] G. James, D. Witten, T. Hastie and R. Tibshirani, An Introduction to Statistical Learning: with
Applications in R., Springer, 2013.

[5] T. Hastie, R. Tibshirani and J. Friedman, The Elements of Statistical Learning, Springer, 2009.

[6] C. Rao, H. Toutenburg, Shalabh and C. Heumann, Linear Models and Generalizations - Least
Squares and Alternatives, Springer, 2008.

[7] P. Ma and X. Sun, "Leveraging for big data regression," WIREs Comput Stat, vol. 7, no. 1, p.
70–76, 2015.

[8] J. Ma and J. Cheng, "Estimation of the building energy use intensity in the urban scale by
integrating GIS and big data technology," Applied Energy, vol. 183, pp. 182-192, 2016.

[9] S. Shalev-Shwartz and S. Ben-David, Understanding Machine Learning: From Theory to
Algorithms, Cambridge University Press, 2014.

[10] W. Chung, "Using the fuzzy linear regression method to benchmark the energy efficiency of
commercial buildings," Applied Energy, vol. 95, pp. 45-49, 2012.

[11] J. Lam, K. Wan, D. Liu and C. Tsang, "Multiple regression models for energy use in air-
conditioned office buildings in different climates," Energy Conversion and Management, vol. 51,
pp. 2692-2697, 2010.

[12] D. Hsu, "Identifying key variables and interactions in statistical models of building energy
consumption using regularization," Energy, vol. 83, pp. 144-155, 2015.

[13] S. Asadi, S. Shams and M. Mohammad, "On the development of multi-linear regression analysis
to assess energy consumption in the early stages of building design," Energy and Buildings, vol.
85, pp. 246-255, 2014.

[14] T. Walter and M. Sohn, "A regression-based approach to estimating retrofit savings using the
Building Performance Database," Applied Energy, vol. 179, pp. 996-1005, 2016.

[15] M. Braun, H. Altan and S. Beck, "Using regression analysis to predict the future energy
consumption of a supermarket in the UK," Applied Energy, vol. 130, pp. 305-313, 2014.

[16] G. Huebner, I. Hamilton, Z. Chalabi, D. Shipworth and T. Oreszczyn, "Explaining domestic
energy consumption - The comparative contribution of building factors, socio-demographics,
behaviours and attitudes," Applied Energy, vol. 159, pp. 589-600, 2015.

[17] T. Hong, M. Gui, M. Baran and H. Lee, "Modeling and Forecasting Hourly Electric Load by
Multiple Linear Regression with Interactions," in Power and Energy Society General Meeting,
2010.

[18] K. Kandananond, "Forecasting Electricity Demand in Thailand with an Artificial Neural Network
Approach," Energies, vol. 4, pp. 1246-1257, 2011.

[19] G. Tso and K. Yau, "Predicting electricity energy consumption: A comparison of regression
analysis, decision tree and neural networks," Energy, vol. 32, pp. 1761-1768, 2007.

[20] S. Halilcevic, A. Gubina, B. Strmcnik and F. Gubina, "Multiple regression models as identifiers of
power system weak points," Generation Transmission and Distribution, vol. 153, no. 2, pp. 211-
216, 2006.

[21] L. Rokach and O. Maimon, Data Mining with Decision Trees, Link, Singapore: World Scientific
Publishing Co., 2015.

[22] Z. Yu, F. Haghighat, B. Fung and H. Yoshino, "A decision tree method for building energy
demand modeling," Energy and Buildings, vol. 42, pp. 1637-1646, 2010.

[23] D. Graupe, Principles of Artificial Neural Network, World Scientific, Sep 2013.

[24] S. A. Kalogirou, "Artificial neural networks in renewable energy systems applications: a review,"
Renewable and Sustainable Energy Reviews, vol. 5, no. 4, pp. 373-401, Dec 2001.
[25] S. J. Russell and P. Norvig, Artificial Intelligence: A Modern Approach, Prentice Hall, Dec 13,
1994.

[26] S. S. Haykin, Neural Networks: A Comprehensive Foundation, 2nd Edition, Prentice Hall, 1999.

[27] D. E. Rumelhart, G. E. Hinton and R. J. Williams, "Learning internal representations by error
propagation," in Parallel distributed processing: explorations in the microstructure of cognition,
vol. 1, MIT Press, 1986, pp. 318-362.
[28] J. F. Kreider, X. A. Wang, D. Anderson and J. Dow, "Expert systems, neural networks and
artificial intelligence applications in commercial building HVAC operations," Automation in
Construction, vol. 1, no. 3, pp. 225-238, December 1992.
[29] P. S. Curtiss, M. J. Brandemuehl and J. F. Kreider, "Energy management in central HVAC plants
using neural networks," ASHRAE Transactions, vol. 100, no. 1, pp. 476-493, January 1994.

[30] B. Muller and H. Keller, "Neural networks for combustion process modelling," in Proc of the Int
Conf EANN '96, London, UK, 1996.
[31] S. Milanic and R. Karba, "Neural network models for predictive control of a thermal plant," in
Proc of the Int Conf EANN '96, London, UK, 1996.
[32] J. K. Mandal, A. K. Sinha and G. Parthasarathy, "Application of recurrent neural network for
short term load forecasting in electric power system," in Proc of the IEEE Int Conf ICNN '95,
Perth, Western Australia, 1995.

[33] A. Khotanzad, A. Abaye and D. Maratukulam, "An adaptive and modular recurrent neural
network based power system load forecaster," in Proc of the IEEE Int Conf ICNN '95, Perth,
Western Australia, 1995.

[34] D. Datta and S. A. Tassou, "Energy management in supermarkets through electrical load
prediction," in Proc of the First Int Conf on Energy and Environment, Limassol Cyprus, 1997.

[35] D. Keles, J. Scelle, F. Paraschiv and W. Fichtner, "Extended forecast methods for day-ahead
electricity spot prices applying artificial neural networks," Applied Energy, vol. 162, pp. 218-230,
2016.

[36] A. A. Aburomman and M. B. I. Reaz, "A survey of intrusion detection systems based on
ensemble and hybrid classifiers," Computers & Security, vol. 65, pp. 135-152, March 2017.

[37] B. Scholkopf and A. Smola, Learning with Kernels, The MIT Press, 2002.

[38] M. Singh, B. Panigrahi and R. Maheshwari, "Transmission Line Fault Detection and
Classification," in International Conference on Emerging Trends in Electrical and Computer
Technology (ICETECT), 2011.

[39] P. Ray and D. Mishra, "Support vector machine based fault classification and location of a long
transmission line," Engineering Science and Technology, an International Journal, vol. 19, pp.
1368-1380, 2016.

[40] G. Gan, C. Ma and J. Wu, Data Clustering: Theory, Algorithms, and Applications, SIAM, 2007.

[41] J. Wu, Advances in K-means Clustering, Springer Berlin Heidelberg, 2012.

[42] L. Kaufman and P. Rousseeuw, Finding Groups in Data, John Wiley & Sons, Inc., 1990.

[43] I. Goodfellow, Y. Bengio and A. Courville, Deep Learning, MIT Press, 2016.

[44] I. Goodfellow, Y. Bengio and A. Courville, Deep learning, MIT Press, 2016.

[45] E. Mocanu, P. Nguyen, M. Gibescu and W. Kling, "Deep learning for estimating building energy
consumption," Sustainable Energy, Grids and Networks, vol. 6, pp. 91-99, June 2016.

[46] J. Kelly and W. Knottenbelt, "Neural NILM: Deep neural networks applied to energy
disaggregation," in ACM BuildSys '15, Seoul, November 4-5, 2015.

[47] J. J. Thomas and K. A. Cook, Illuminating the Path: Research and Development Agenda for
Visual Analytics, IEEE-Press, 2005.

[48] "VISUAL-ANALYTICS.EU," [Online]. Available: http://www.visual-analytics.eu/related/] .


[Accessed 31 August 2017].

[49] D. Keim, J. Kohlhammer, G. Ellis and F. Mansmann, Mastering the Information Age Solving
Problems with Visual Analytics, Eurographics Association, 2010.

4. APPLICATIONS OF DATA ANALYTICS IN SYSTEM OPERATIONS
4.1 INTRODUCTION
In general terms, it can be stated that adoption of data analytics to support power system operations
has been getting a late start as compared with other industries, or even other business areas in the
electricity industry. However, as digital innovations grow in the electricity sector, companies are
beginning to adopt and adapt. Because the availability of large volumes of data from sensors and
devices in the power grid is growing exponentially, big data analytics techniques that are already
applied in other industries now find their way to power systems. Over the past few years, major
companies have started their big data projects and are competing to bring a set of IT tools to the
market that are largely new to the utility industry.
This section describes applications in power systems of the various analytics methodologies described
in Section 3, with the focus on tools and techniques that use various sources of data to improve
situational awareness and provide operation decision support. The description includes not only fully
mature technologies being used in production mode in control rooms but also technologies in early
stages of development, in the following main areas:
 Visualization analytics and technologies
 Tools for system events detection, faults identification, and analysis
 Wide-area monitoring
 Equipment-health monitoring
 Trending and forecast
 Operational decision support
4.2 DATA VISUALIZATIONS IN REAL-TIME SYSTEM OPERATION
The way data and information have been displayed and exposed to operators has evolved to a great
extent over the years, as the technology has evolved. Energy management systems (EMS) were
created to manage the physical flow of electricity in the grid following the 1965 blackout. During
the 1960s and 1970s, system visualization was based on analog computers and hardwired systems. For
these systems, sources of visual information were local, and displays of visual information were typically
quite static. Circuits had to be switched to present different system information.
Starting from the 1980s, with the birth of affordable digital computers, visualization in control rooms
became networked and software-driven. Within the network, sources of visualization information
could be anywhere and shared elsewhere—only communication packets had to be switched to get a
different set of measurements. However, the visual information displayed was still typically quite
static.
Presently, as discussed before, large volumes of data from smart grid devices and distributed
generation in the power grid are growing exponentially, and the need to further advance situational-
awareness tools is greater now than ever before. Old-fashioned static visualization tools (using logs
and tables) could hardly harvest the fruit of this new technology trend to improve situational
awareness of the operating system. Emerging platforms include geographic-based dynamic
visualization with user-friendly interfaces and real-time measurements and analytical results from
measurement-based and model-based tools that populate the system map.
This subsection focuses on how data is typically visualized in control centers of grid operators. Given
the vast amount of work in this area, a detailed description of all currently available visualization tools
and platforms for system operations is beyond the scope of this work. Therefore, the intention is to
present a brief overview of main visualization approaches used in control centers, as well as a
description of new trends and emerging visualization technologies. The interested reader can find in
reference [1] a comprehensive survey of the state of the art of visualization products offered by
numerous vendors and developers, and assessment of the effectiveness of the various approaches
with recommendations for developing a visualization strategy.

4.2.1 Visualization technologies in control centers


The traditional control center has a graphical representation of the present state of the network and
the generation that is directly connected to the transmission grid. Such representation usually includes
some kind of general view available to all operators and more detailed views in each workstation, as
shown in Figure 4-1.

Figure 4-1: Overview of a control center monitor display

The various displays are intended to expose real-time conditions of the grid, as well as trends of
relevant system variables, to help power system operators maintain adequate situational awareness
and respond to conditions potentially threatening the system stability in an expedited manner.
The way in which system data is presented to the operator can support the strengths and reduce the
effects of limitations of human perception and performance, thereby enhancing operator situational
awareness. As explained in Section 3, there are several principles of display design that help to
understand how humans detect, process, interpret, and act on information [1]. Some ground rules are:
 A display should look like the variable that it represents.
 Processing a large set of information can be facilitated by dividing this information across several
resources (e.g. using both visual and auditory information) and minimizing the cost in time or
effort to “move” selective attention from one display location to another to access information.
These principles lay the foundations for how to design a human-machine interface that satisfies the
needs of human abilities to process information and prevents the negative consequences of cognitive
biases. A description of the main components of the human information-processing system, and how
they apply for display design in system operations, can be found in [1]. Generally, the key driver for
the selection of the appropriate visualization display depends on the task at hand. For example, if one
wants to understand the overall voltage variation across a region, then contours can be quite
effective, but if the exact voltage to three decimal points is needed, then a numerical display is more
appropriate.
Many techniques have already been applied to the field of power system visualization, with some of
them described in the following subsections.
4.2.1.1 Schematic network diagrams (one-line/single diagram)
A schematic network diagram is a simplified notation for visualizing an electrical power system.
Elements on the diagram do not represent the physical size or location of the electrical equipment.
The display is optimized to provide the user a good overview of the network topology.

Figure 4-2: Examples of schematic network diagrams


Both types of network diagrams provide:
 A consistent visualization and interaction possibilities, such as for alarms, outage areas, zoom,
pan, switching, and adding notes.
 The ability to toggle between schematic and geospatial views.
 The possibility to open a new network diagram from an existing one (example: open schematic
view from geospatial view).
4.2.1.2 Contouring
The use of color contouring on one-line diagrams is a common highlighting technique that attracts
attention to a particular area within a display, thus reducing the size of the search space and
facilitating target detection. A wide variety of different color maps are possible, utilizing either a
continuous or discrete scaling. Color contours take advantage of the fact that, as humans, we perceive
the world in patterns. Hence, color codes can often be interpreted and compared faster than numeric
values can be processed.
The use of discrete symbols for visualization can be quite helpful, provided that the number of
individual symbols is relatively low (less than several hundred). However, as the number of values
grows, the displays eventually become too cluttered, making it difficult to detect any underlying
patterns.
To avoid that problem, color mapping can be designed not to cover the entire data range but rather
to highlight values within a particular range of interest. As an example, Figure 4-3 shows the use of
contouring to just highlight voltages that are below 0.98 per unit for a case in the Northeast region of
the U.S. Eastern Interconnection [3].

Figure 4-3: Contour showing voltage magnitudes with values below 0.98 per unit
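The following matplotlib sketch illustrates the underlying principle of colouring only the values within the range of interest (here, buses below 0.98 per unit) while leaving the rest neutral; the bus coordinates and voltage values are synthetic and purely illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic bus locations and per-unit voltage magnitudes (hypothetical data).
rng = np.random.default_rng(0)
x, y = rng.uniform(0, 100, 200), rng.uniform(0, 100, 200)
v_pu = rng.normal(1.0, 0.015, 200)

low = v_pu < 0.98  # range of interest: only these buses are colour-coded

fig, ax = plt.subplots()
ax.scatter(x[~low], y[~low], c="lightgray", s=15, label="V >= 0.98 pu")
sc = ax.scatter(x[low], y[low], c=v_pu[low], cmap="Reds_r", s=40,
                vmin=0.93, vmax=0.98, label="V < 0.98 pu")
fig.colorbar(sc, ax=ax, label="Voltage magnitude (pu)")
ax.legend()
ax.set_title("Highlighting only buses below 0.98 pu")
plt.show()
```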

It is important to carefully select the colors used to represent the different elements or
conditions so as to avoid covering or camouflaging other important information [2].
Transparency is also used in some cases for this purpose.
Contour gradients are another variation of contouring used to represent and compare classes of
values. Thus, the operator can identify significant deviations in the network at a single glance, as well
as their location and severity. In the example shown in Figure 4-4, generation infeed from renewables
is visualized such that red and yellow spots represent higher generation, while green spots indicate
lower generation compared with the current (in the cited case, today's) schedule [28].

Figure 4-4: Examples of contour gradients for continuous values


2D bubbles
A new visualization strategy focuses on highlighting significant deviations from normal states. In
contrast to contours that melt into one another, the 2D view shows only single-colored bubbles with a
fixed radius around a bus. By only coloring the specific area around the bus, misunderstandings can
be avoided. Examples are illustrated in Figure 4-5.
Another key principle is that bubbles are only shown when there is a deviation that needs the user’s
attention—similar to putting the spotlight on. The colors indicate the type of deviation (e.g. high
voltage = yellow, low voltage = orange). The severity is shown by the color density, with low density

indicating upcoming problems and high density representing severe problems. Violated limits are
coded additionally by showing a small black ring. With increasing excess of the limit value, the bubble
grows beyond the initial radius, creating a halo effect around the black ring.
Feeders that are connected to a violated bus bar are highlighted in the same color as the bubble to
indicate the impact of the deviation on the network. The operator receives a first-level indication of a
potential problem on a feeder, even if the feeder itself does not have real-time measurements [27].

Figure 4-5: Situational awareness by 2D bubbles

4.2.1.3 Three-dimensional visualization


The key advantage of 3D is its ability to show the relationships between multiple variables. Usually in
3D visualizations, the third dimension is used with some abstract objects, such as a cylinder, in which
attributes of the objects such as size and coloring correspond to the value of an underlying variable.
For example, to provide situational awareness related to voltage security, one is often interested in
knowing both the location and magnitude of any low system voltages, as well as the current reactive
power output and the reactive reserves of the generators and capacitors, which can be shown as cylinders
with different colors. Such a situation is illustrated as an example in Figure 4-6.
As stated in reference [1], the potential advantages of 3D graphic displays over 2D numeric displays
are significant. The added dimension and pictorial enhancements may provide, among others, the
following benefits: increase the amount of information that can be presented on standard display
screens, assist in navigation and search activities, and facilitate more accurate mental models of the
systems being manipulated.

Figure 4-6: 3D display showing bus voltages and generator reserves

3D cones
This is an extension of the 2D bubbles visualization presented above, where 3D transparent cones are
used to display variables of interest with less obfuscation of other parameters, as illustrated in Figure
4-7 and Figure 4-8. The height of the cone indicates the severity of the violation, so non-critical deviations
(limit not yet violated) stay flat, indicating potentially upcoming but not yet critical problems. In
addition, the pointing direction shows the type of violation: low-voltage violations appear as cones
pointing downwards, and high-voltage violations as cones pointing upwards. Each cone matches a
bubble with a circle in the 2D view, supporting the user's orientation in the system. The described
principle can be applied to multiple other scenarios, for example for representing areas with very low
versus very high demand, or outage indices such as the customer average interruption duration index
(CAIDI) [27].

Figure 4-7: Example of situation awareness by 3D cones

Figure 4-8: Example of situation awareness by 3D cones

4.2.1.4 Animation
Some visualization tools provide the option to display animated vectors to visualize power system
dynamics. Figure 4-9 shows an example of animation of power flows, where the direction of the
transmission line animation corresponds to the direction of flow in the physical system [28].
In this figure, animated power flow arrows display profiles for active and reactive power flows. On
demand, the user can turn the animation for a window off or on. Further, some tools offer the ability
to define thresholds based on percent of thermal limits or other parameters, which can be set so that
flows get animated only when they approach alarming levels. Animated arrows are used to visualize
MW and MVAR flows to identify loop flows or other abnormal patterns.
According to the results of human factors experiments conducted by D. A. Wiegmann and other
researchers [4], animation in power system displays can be very effective in helping operators interpret
displays by directing their attention to the most important information for a particular task or
situation. It also enhances an operator's understanding of system behavior. If properly configured, it
also assists an operator in better assessing current system states and the causal factors that underlie
those states, deciding on mitigation measures if a violation of system resources occurs, and providing
immediate feedback regarding the effectiveness of implemented measures.

Figure 4-9: Example of animated power flow arrows in distribution feeders

4.2.1.5 Renewables and dispersed generation


Most countries have seen a mass introduction of renewables (wind and solar) in their systems,
sometimes as big wind and solar parks directly connected to the transmission grid, but mostly as
dispersed generation installed in the distribution networks. This widespread installation of dispersed
generation has an impact on the operation of the transmission network. The challenge is threefold:
 How to gather the information from DSOs and from weather forecasts.
 How to predict the behavior of this kind of generation and its impact on the system.

 How to present this information to the operators in a practical way.


Some examples of visualizations of renewables are given in Figure 4-10 and Figure 4-11.

Figure 4-10: Visualization of dispersed generation in operator workstation at RTE

Figure 4-11: Visualization of dispersed generation in the general panel at Red Eléctrica de España

4.2.1.6 Geospatial representation


One of the most important recent developments in the visualization field has been the integration of
power system information with GIS-based displays, in which the power system is drawn superimposed
over other geographic information. These systems may integrate various displays of data information
such as satellite images, weather, road maps, and other infrastructures into a common display, to
facilitate interpretation of the correlation between the different elements involved in the problem
being assessed. Indeed, the coordinates of graphical elements have a relation to the real world. The
user can view the geographic position of equipment, outages, and so on. Furthermore, the user can
view streets, customers, land usage, or other images. Figure 4-12 shows an example of geospatial
network diagrams.

Figure 4-12: Examples of geospatial network diagrams


Geospatial representations are quite useful for particular use cases. One such example is fault location. In that case, the ability
to couple the fault location, expressed as a distance from the terminals of the line, with road maps
and/or satellite imagery can be quite helpful in dispatching repair crews [1]. Another example is when
the GIS integration is used to show the relationship between different infrastructures such as is done
with the Oak Ridge National Laboratory VERDE (Visualizing Energy Resources Dynamically on Earth)
platform. VERDE is a software application that uses the Google Earth platform to provide real-time
visualization of the electric power grid. Its capabilities include line descriptions and status of outage
lines; geospatial-temporal information and impacts on population, transportation, and infrastructure;
analysis and predictions results; and weather impacts and overlays [5].
The assessment of visualization technology presented in reference [1] concludes that even though
GIS integrated tools produce impressive graphic representations, they are not necessarily the best
alternative to help an operator understand the state of the electric power system, especially in control
of a transmission system.
The reason is that, for example, in wide-area power system operating condition visualizations, the
locations of elements of greatest interest electrically, such as substations and power plants, usually
have a very small geographic footprint as compared to the entire area. Hence, their representation in
exact geographical coordinates may not be useful. Also, the urban areas that contain much of the
electric infrastructure are likewise relatively small geographically. Transmission lines sharing a
common corridor or even transposed on a single tower would likewise be difficult to differentiate in a
pure GIS representation because they are essentially in the same location. Finally, some traditional
one-line elements, such as aggregate loads, are often spread over a large geographic area and hence
would be difficult to define precisely. Traditional control center map boards that focus on displaying a
topological representation of the network, with pseudo-geographical coordinates in some cases, seem
to be the preferred option.
4.2.1.7 Integrated system view
Dynamic icons assigned to specific incidents are displayed on the network diagrams. The icon pops
up at the position in the topological diagram where the incident occurred. The user can access further
details on the individual elements on demand. By clicking an icon, a small info box pops up,
showing the most relevant information as well as a link to further information such as the complete
outage record. The operator can open multiple info boxes at the same time to compare
information. For example, for the task "reviewing outage details and location," the operator needs to
get a comprehensive overview on the network diagram regarding:
 The outage location (as determined by outage management prediction engine)
 Trouble calls from customers
 Crews and their availability
 Critical customers such as hospitals and police stations

 Indicators for fault locations (derived by network analysis)


 Planned maintenance work in the same area
Figure 4-13 shows three screenshots of a tool that produces an integrated system view [28].

Figure 4-13: Integrated system view with Icons and Info boxes

4.2.1.8 Display profiles


Display profiles determine what type of information is shown as an additional layer on the network
diagram, as shown in Figure 4-14 and Figure 4-15 [28]. The system administrator configures
display profiles for the specific operator workflows, and an operator can switch between the display
profiles on the fly.
Display profiles could contain:
 One or multiple types of visualizations
 Show/hide topological coloring
 Level of background opacity

Figure 4-14: Distribution network visualization (panels: selection of profiles, outage information, network analysis)

Figure 4-15: Distribution network visualization (panels: selection of profiles, voltage violations, network analysis)

4.2.2 Example of control room visualization at ISO – ERCOT case


This section presents an example of modern visualization technology implemented at the control room
of the Electric Reliability Council of Texas (ERCOT). ERCOT manages the flow of electric power to 24
million Texas customers, representing about 90 percent of the state’s electric load. As the
independent system operator for the region, ERCOT schedules power on an electric grid that connects
more than 46,500 miles of transmission lines and 570+ generation units.
The ERCOT control room sees an enormous amount of data flow each day. In order to manage the
data, extract useful information from it, and properly present it to the operators, ERCOT uses
standardized display building software. This brings the following benefits [6]:
 Enhances the ability to customize displays that are dynamically updated by the underlying model.
 Adopts an industry standard display building process.
 Is more secure for the production environment while allowing easier access for users.
Display Principles
To ensure that the data being presented to the operators is relevant to the issues of real-time
operations, the information to be displayed must be carefully selected.
1. Indicators of system state/health:
Good indicators of system performance need to be developed and critical functions identified. The
displays should use these indicators and functions to summarize the state of the system, with the
ability to show detailed information on-demand.
2. Alerts and alarms:
Logical and consistent displays should be developed to show alerts and alarms for violations of
metrics associated with indicators of system health.
3. Sources of data:
Multiple systems provide data to the control room, and related data from these varied sources
should be compiled together to ensure a holistic view of the system.
4. Division of responsibility:
The ERCOT control room has eight “desks,” each administered by an operator. Each of the
functions carried out by the eight desks requires its own set of displays leading to an enormous
inflow of information to the control room. The information needs to be organized in terms of the
function(s) it helps address. The desks are:
a. Real-time:
 Ensures that frequency within the ERCOT system remains within the tolerances specified
by the protocols and NERC.
 Monitors the health of the security-constrained economic dispatch (SCED) application and
validates the reasonableness of the solution.
 Verifies the quality of load forecast data and switches sources when necessary.
b. Transmission and Security:
 Analyzes base case and post-contingency constraints and takes actions to maintain system
reliability.
 Responsible for ensuring that the ERCOT system is operated so that instability,
uncontrolled separation, or cascading outages will not occur.
 Updates stability limits for all ERCOT generic transmission constraints (GTCs) every 10
minutes.
c. Resource Operations:
 Monitors ancillary service levels and executes a supplementary ancillary services market
(SASM) when necessary.
 Deploys and recalls reserves as system requirements change.
 Manages planned, maintenance, and forced outages for generation resources.
d. Shift Engineer:

 Works closely with the ERCOT control room system operators, providing round-the-clock
support for analysis and system applications.
 Develops and authors congestion management plans for mitigation of temporary and
ongoing grid vulnerabilities.
 Gathers relevant and accurate information about grid events and communicates that
information in a timely manner to the shift supervisor and engineering support groups.
e. Shift Supervisor:
 Monitors the operation of all desks in the control room.
 Continually reviews and analyzes system security.
 Provides the primary point of communication with ERCOT management and market
participants.
f. DC Ties:
 Schedules and monitors energy transactions into and out of the ERCOT control area across
the asynchronous DC ties.
 Coordinates the import of emergency energy across the DC ties into the ERCOT control
area during emergency operations.
g. Reliability Unit Commitment (RUC):
 Oversees the weekly reliability unit commitment (WRUC), day-ahead reliability commitment
(DRUC), and hourly reliability unit commitment (HRUC) processes.
 Performs hourly studies to identify potential voltage problems on the ERCOT system.
 Responds to inquiries about RUC commitments.
h. Reliability Risk:
 Coordinates with the RUC, real-time, transmission and security, resource operations, shift
supervisor, and other ERCOT operators as necessary to maintain grid reliability.
 Responsible for the safe and efficient operation of all intermittent renewable resource
(IRR) generation assets.
 Responds to inquiries about intermittent generation dispatch, wind and solar forecast,
operations, curtailments, and other related tasks.
Figure 4-16 is an overview of the control room main visualization board and various displays. Figure
4-17 shows displays related to generation and load information. The graphic on the right presents
non-spin and quick-start units in graphical form, to facilitate interpretation by system operators.
Figure 4-16: ERCOT control room - 2016
Figure 4-17: ERCOT control room - load and generation details display and quick start/non-spin
graphs
Figure 4-18 is an overview of the visualization tool that is used to display wind generation details. It is
intended to provide the operator with a breakdown of wind generation by zone, allowing the operator to
visualize current wind generation trends in one screen. Figure 4-19 is a snapshot of the real-time
sequence monitor, whose main objective is to summarize results from real-time security assessment
tools, such as state estimation and contingency analysis, with a timer indicating the last execution. It
provides the shift engineers and operators with alarms when real-time applications have not
successfully run.
Figure 4-20 is the system voltage overview display. It gives an overview of voltage levels at some
345-kV and 138-kV buses around the ERCOT system. It also alerts operators when voltage levels are
too high or too low and indicates what reactive devices can be put in service to help control voltage.
Figure 4-18: ERCOT control room – wind generation
Figure 4-19: ERCOT control room - real-time sequence monitor
Figure 4-20: ERCOT control room - system voltage overview display
4.2.3 Emerging trends in control room visualization
4.2.3.1 Space and time visualization
In most cases, control center visualizations represent the network topology and generation state
(space). However, a system operator also needs to analyze and prepare for the near future (time). A
new paradigm of visualization is needed to integrate space and time in a practical way.
Because of the increased variability and volatility of system operation conditions caused by the growth
of renewable generation, increased power transfer through interconnections, demand changes, and
other factors, operators need to be informed in advance of potential risks. They need to be guided on
possible mitigation actions that can be taken and the time the actions should be executed to be
effective. Hence, visualizations should be integrated with the various data processing and analytics
tools built into modern energy management systems (EMSs) to provide operators with a comprehensive
assessment of system vulnerability and control actions, considering the following aspects:
 Situational awareness at first sight:
o Workflows of a specific user group have to be mapped in the visualization.
o Seamless interplay of applications needs to reduce navigation effort during the workflow.
The visualization tool should integrate results from other operation support tools and allow the
operator to navigate among them easily.
 Projection of future status:
o Integrated study environment (analysis of possible future events through simulation grid
models).
o Fully integrated day-ahead and intraday congestion forecast.
o Contingency analysis (N-n) to prevent thermal and voltage problems.
 Recommendation for operation:
o Provide incident patterns based on analysis of previous incidents or via simulations with
the grid model. The interaction of these incident patterns with the current grid image
helps an operator to understand the potential risk to the current operating condition.
4.2.3.2 Visualization for inter-area coordination
As power system interconnections expand, there is a significant need to improve coordination
among the many utilities, system operators, and other agents involved in system operation
and security. For example, in Europe several initiatives are underway to improve the coordination of
system operation among TSOs and across countries. Two main Regional System Coordination
Initiatives (RSCIs) exist today, namely: CORESO [7] and the Transmission System Operator Security
Cooperation (TSC) [8]. They operate a control room that gathers information from each TSO
participating in the initiative, and they carry out security analysis for the whole area, 24 hours a day, 7
days a week.
These kinds of initiatives are generating new visualization needs that go beyond the physical
representation of the network. They need to aggregate a huge amount of data to present it in a
practical way to operators. Figure 4-21 provides an overview of the CORESO control room, where the
representation of the participant regions can be seen on the main wall map.
Figure 4-21: CORESO control room (www.coreso.eu)
In the United States, the situation is different. The North American system is split into three distinct
grids: the Eastern, Western, and Texas Interconnections. These grids are operated independently and
are only electrically tied together by several DC links. There is no single party responsible for
coordinating the operation of all these areas.
4.2.3.3 Time-driven situational awareness
Time-driven situational awareness is a new concept being developed by RTE in France. RTE calls this
system Apogeé. The objective is to create an application that will provide the operator a single user
interface based on the hyper-vision concept. The application is intended to help a system operator
focus on the actions to be taken by presenting, at the right time, the relevant information needed to
make the right decisions. The system interfaces with the operator support tools available in
the control room. It filters and processes the data from these tools to generate key information and
schedules the time when that information will be presented to the operator through a dedicated
graphical interface.
To illustrate this idea, this section describes a forecast security tool, which will be part of the hyper-
vision system. On a rolling basis, the system gathers data from forecasting tools such as load,
renewable generation, and market results forecasts, as well as outage planning, and combines that data
with the last SCADA snapshot to feed grid models that have been updated to represent
foreseen system conditions in the near-term future (a few minutes to 24 hours). The system performs
security analysis through data analytics and modelling tools. If a constraint or reliability issue is
detected, it assesses the effectiveness of possible remedial actions that have been used in the past in
similar situations, or that have been considered in grid studies to solve the type of problem being
analyzed. If no solution is found in the remedial action library for the foreseen constraint, the system
alerts the operator that a detailed evaluation needs to be performed to assess whether and when a
preventive action is to be taken to mitigate the security risk. In that case, the operator performs
studies to design a proper solution to the constraints in question and adds the solution to the remedial
actions library for future use. The system will alert the operator in a timely manner, so that the
mitigation actions can be effectively implemented.
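The rolling logic just described can be summarized by the minimal sketch below. It is only an illustration of the workflow, not RTE's implementation; the constraint structure, the content of the remedial action library, and the forecast cases are assumed placeholders.

from dataclasses import dataclass

@dataclass
class Constraint:
    element: str          # monitored grid element
    hours_ahead: int      # forecast instant at which the constraint appears
    loading_pct: float    # forecasted loading

# Remedial actions proven effective in past studies (illustrative content)
remedial_library = {"LINE-A": ["switch bus coupler S3", "redispatch units G7/G9"]}

def security_scan(forecast_cases):
    """Return (constraint, proposed actions or a request for a detailed study) pairs."""
    alerts = []
    for case in forecast_cases:              # one case per forecasted instant
        for c in case:                       # constraints detected for that instant
            actions = remedial_library.get(c.element)
            if actions:
                alerts.append((c, actions))                    # shown on the timeline
            else:
                alerts.append((c, "detailed study required"))  # operator adds a new action
    return alerts

cases = [[Constraint("LINE-A", 3, 104.0)], [Constraint("LINE-B", 7, 101.5)]]
for c, outcome in security_scan(cases):
    print(f"t+{c.hours_ahead}h {c.element} at {c.loading_pct}% -> {outcome}")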
The hyper-vision user interface remains empty as long as no potentially insecure conditions are
detected within the time horizon of the analysis. If a constraint is identified for any future operating
condition considered, it is displayed in the upper timeline along with the proposed remedial actions, as
shown in Figure 4-22. The operator has access to detailed information about the constraint and the
results of the analysis.
Figure 4-22: Example of control actions displayed in the main interface of Apogeé
To monitor the process, a time-based supervision display that synthesizes the results of the forecasted
security analysis is also proposed. Figure 4-23 illustrates the concept (labels and names have been
obfuscated to protect sensitive information). The first column is an expandable tree representation.
The first level of the tree displays contingencies that result in constraints. For each of those
contingencies, the field can be expanded to show the second level, which contains further description
of the constraint and recommended remedial actions. The color code is as follows:
Type             Color   Meaning
Contingency      Green   Constraints are detected; there is at least one effective remedial action.
                 Red     Constraints are detected; no effective remedial action found.
Constraint       Black   The constraint is detected.
Remedial action  Green   The remedial action is effective.
                 Red     The remedial action is not effective.
Figure 4-23: Example of the time-based constraint display in Apogeé
4.3 DATA ANALYTICS IN SYSTEM OPERATION SUPPORT PROCESSES
The following use cases are among the most recognized ones in relation to energy data analytics to
support system operation:
 Real-time situational awareness with PMU data
 System event detection (detection of equipment failure or malfunction)
 Fault location and root cause analysis
 Real-time stability monitoring
 Alarm processing and filtering
 Renewable energy generation forecasting and storage analytics
 Damage prediction (weather related or due to other causes)
 Outage restoration analytics
 Grid optimization and power quality analytics (including voltage control)
 Peak load management (via demand-side management analytics)
 Load research analytics and energy portfolio management analytics
 Non-technical loss analytics
 Physical and cyber security assessment analytics
In the following sections, we present a brief description of these use cases, with references where
interested readers can find more details.
4.3.1 Real-time situational awareness with PMU data
Outstanding characteristics of Synchrophasor data, namely high resolution and time synchronization,
make it possible to monitor power system dynamic performance as well as grid stresses over a large
geographical area. Synchrophasor applications for on-line or near real-time operations enhance
situational awareness and help detect situations that can threaten reliability of the grid. On-line
applications include, among others: detection of system electromechanical oscillations and evaluation of
their damping; voltage and angular stability assessment; voltage sensitivities with respect to
real and reactive power changes; display and analysis of voltage angle differences over wide
geographical areas; improved state estimation; islanding detection and monitoring; and event
detection [12].
Actionable information from these applications is useful when there is sufficient time for an operator
to take action to mitigate the threat. In cases where there is not enough time for operator
action, automatic corrective control should be designed and implemented. Even though there is a
great potential to use Synchrophasor data for automatic control, not many applications have been
successfully implemented. One of the main hurdles that prevents extended use of control applications
is data quality and availability. Apart from those main applications, analytics that use Synchrophasor
data have been developed to identify and diagnose a wide range of grid events, such as failing
potential transformers, capacitor bank switching issues, open phases on breakers, negative sequence
concerns, issues created by variable loads (such as arc furnaces), as well as generation and
transmission equipment misoperations [13].
Because of the advantages of Synchrophasor technology to enhance monitoring and situational
awareness of the grid, many electric systems have deployed a large number of PMUs across their
footprints. In the USA, for example, the Smart Grid program led by the Department of Energy in the
recent past resulted in the installation of PMUs at about 1000 substations and an extensive
communication infrastructure to collect and archive the data. Other countries, in particular China and
India, have also implemented plans for large deployment of PMUs and the corresponding
communication and computation infrastructure.
There is an abundant technical bibliography on Synchrophasor technology and applications.
Reference [14] gives a thorough update of Synchrophasor projects in North America, with detailed
explanation of the applications currently implemented. Reference [15] provides a methodology for
identifying and estimating the benefits of using Synchrophasor technology to enhance grid operations
and planning. Also, a wealth of technical information can be found at the North American
Synchrophasor Initiative (www.naspi.org). Several vendors provide platforms and software solutions
for various on-line applications of Synchrophasor data [12][16].
The following visualization examples represent standard applications of typical wide area monitoring
systems based on Synchrophasor technology:
4.3.1.1 Power swing recognition (PSR)
Wide area monitoring is receiving increasing attention in control center visualization. Such applications
will be integrated into the standard displays to cover the whole loop from monitoring through to
automatic control.
The “power swing recognition,” also called “oscillation monitoring system” (OMS), can recognize,
evaluate, and display active power swings in the energy supply network. This ensures that power
swings that can be dangerous for network operation are recognized and reported automatically. A
power swing can be observed between two locations by evaluating the phase angle difference of the
PMUs involved, or at a single location by evaluating the active power determined there. If a power
swing measured in terms of phase-angle difference is present, the locations of both PMUs involved are
circled and the associated connection line of the same color is inserted between them (see connection
Paris – Rome in Figure 4-24 [29]). If a power swing measured in terms of active power at an
individual PMU is present, then the PMU where the measurement was made is marked with a circle
(see Copenhagen in the following figure). The assigned color represents the damping ratio and
amplitude quantities, which are required for a meaningful estimate of the actual degree of danger.
These quantities, coupled with the associated limiting values, give a degree of danger that forms the
basis for assessing the potential consequences of the detected power-swing event.
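As a rough illustration of how such a degree of danger could be derived from the raw measurements, the following sketch estimates the frequency and damping ratio of the dominant swing from a window of PMU active-power samples. It is a generic estimation approach under simplifying assumptions, not the algorithm of the product referenced above; the sampling rate, alarm threshold, and test signal are illustrative.

import numpy as np
from scipy.signal import detrend, hilbert

def swing_mode(p_mw, fs):
    """Return (frequency in Hz, damping ratio) of the dominant oscillation in p_mw."""
    x = detrend(np.asarray(p_mw, dtype=float))          # remove the load trend
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    f_osc = freqs[np.argmax(spectrum[1:]) + 1]          # dominant non-DC component
    envelope = np.abs(hilbert(x))                       # amplitude envelope
    t = np.arange(len(x)) / fs
    sigma = np.polyfit(t, np.log(envelope + 1e-9), 1)[0]   # exponential decay rate
    omega = 2 * np.pi * f_osc
    zeta = -sigma / np.sqrt(sigma**2 + omega**2)        # damping ratio
    return f_osc, zeta

# Synthetic, slowly decaying 0.3 Hz inter-area swing on a 100 MW flow
time_s = np.arange(0, 12, 1 / 50.0)
p = 100 + 5 * np.exp(-0.05 * time_s) * np.sin(2 * np.pi * 0.3 * time_s)
f, z = swing_mode(p, fs=50.0)
print(f"mode {f:.2f} Hz, damping ratio {z:.3f}, alarm: {z < 0.03}")   # assumed alarm threshold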
Figure 4-24: Swings in the map
4.3.1.2 Power system stability curve (PSS)
This curve displays the state of the complete power system. This kind of “fever curve” is calculated
from all available measured values for which limiting values are defined. The user can configure which
measured values are included in the calculation. The curve is calculated from the weighted distances
between the measured values and their limiting values. The curve can be displayed by defining the
time range. It is divided into defined time steps (hours, for example). In the on-line mode, the right
end of the diagram shows the current value, as shown in Figure 4-25 [29].
Figure 4-25: Power system status, change time range
If any of the actual limits is violated by any measurement, the PSS curve changes color to red.
Trends toward instability can be easily recognized by a rising level of the PSS curve. A customer can
optimize the settings of the limiting values such that the PSS curve shows the appropriate sensitivity
for the power system.
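A minimal sketch of such a weighted-distance index is given below. The exact formulation, limits, and weights are illustrative assumptions, not the vendor's calculation; for a single quantity, a distance of 1.0 means that its limit has been reached.

def pss_index(measurements, limits, weights):
    """Weighted average distance of measured values from the middle of their
    admissible band; a per-quantity distance of 1.0 means the limit is reached."""
    total, wsum = 0.0, 0.0
    for name, value in measurements.items():
        lo, hi = limits[name]
        mid, half = (lo + hi) / 2.0, (hi - lo) / 2.0
        total += weights[name] * abs(value - mid) / half
        wsum += weights[name]
    return total / wsum

limits = {"f_hz": (49.8, 50.2), "v_pu": (0.95, 1.05)}     # illustrative limiting values
weights = {"f_hz": 2.0, "v_pu": 1.0}                      # user-assigned weights
print(pss_index({"f_hz": 50.05, "v_pu": 1.06}, limits, weights))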
4.3.1.3 Island state detection (ISD)
The objective of island state detection is to use the measured values of the frequency (f) and the rate
of change of frequency (df/dt) available in each PMU to determine whether separated networks have
formed. If there is islanding between two or more substations, then the detected islands are displayed
in the schematic display as colored areas. If only one substation is in the island, the area around the
substation is displayed as a square. For several substations, the area is displayed as a polygon, with
the substations as corners (see Figure 4-26 [29]).
In this example, four islands have been detected:
1. Island (orange): Copenhagen
2. Island (blue): Paris - Nuremberg - Rome - Munich
3. Island (green): Muelheim
4. Island (violet): Vienna
Figure 4-26: Schematic display with recognized islands
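A minimal sketch of the grouping logic is shown below. It clusters PMU locations by frequency similarity only; the station names, frequency values, and tolerance are illustrative, and a production implementation would also use df/dt and topology information, as described above.

def detect_islands(freq_by_station, tol_hz=0.05):
    """Group stations whose measured frequencies differ by no more than tol_hz."""
    islands = []
    for station, f in sorted(freq_by_station.items(), key=lambda kv: kv[1]):
        for island in islands:
            if abs(f - island["f_ref"]) <= tol_hz:
                island["stations"].append(station)
                break
        else:                                   # no existing island matches
            islands.append({"f_ref": f, "stations": [station]})
    return [i["stations"] for i in islands]

frequencies = {"Copenhagen": 50.21, "Paris": 49.93, "Nuremberg": 49.95, "Rome": 49.94,
               "Munich": 49.92, "Muelheim": 50.02, "Vienna": 49.71}
print(detect_islands(frequencies))
# -> [['Vienna'], ['Munich', 'Paris', 'Rome', 'Nuremberg'], ['Muelheim'], ['Copenhagen']]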
4.3.1.4 Visualization of angle differences
The phase angle difference of the voltages between different PMUs can be displayed in graphical
form. The locations form triangles, such as Copenhagen – Paris – Rome, which are shown in color
(see Figure 4-27 [29]). In the figure below, Nuremberg is defined as the reference PMU and shown in
a white frame. If a PMU has a positive phase angle difference in comparison to the reference PMU,
the phase angle leads. If a PMU has a negative phase angle difference in comparison to the reference
PMU, the phase angle lags. Color deviations indicate angle differences.
Figure 4-27: Phase angle difference of the voltages between different PMUs
4.3.2 Fault identification, location, and analysis
In case of power system faults, protection and fault analysis engineers are often presented with event
recordings coming from various substations and also from multiple devices. There are several
challenges in processing these files efficiently in order to perform analysis and make decisions based
on the event data. Once this event data has been classified and prioritized, fault location analytics will
allow engineers to correctly select the affected transmission line, determine the fault type, and
perform the fault location calculation.
The use of data-analytics techniques and tools for identification and classification of power system
events has been the subject of extensive research. It is probably one of the most studied areas for
the use of data analytics to support system operations. Reference [9] provides a survey of data-
analytics applications to support system operations, with emphasis on fault and event detection and
analysis.
Techniques used for fault detection, classification, and analysis include: Artificial neural network
(ANN), wavelet transform, support vector machine, k-Nearest Neighbor, decision tree, and association
rules. See section 3 in this document for an explanation of these techniques. Hybrid methods that
combine various techniques have also been developed. For example, reference [10] presents a new
method based on combined wavelet transform-extreme learning machine (WT-ELM) technique to
identify, classify, and locate a fault in a series-compensated transmission line.
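As a concrete illustration of the kind of hybrid method mentioned above (and not the WT-ELM scheme of [10]), the sketch below extracts wavelet-energy features from the three phase currents and feeds them to a decision-tree classifier that labels the fault type. The synthetic training records, wavelet choice, and fault labels are illustrative assumptions.

import numpy as np
import pywt                                          # PyWavelets
from sklearn.tree import DecisionTreeClassifier

def wavelet_energy_features(ia, ib, ic, wavelet="db4", level=3):
    """Energy of the detail coefficients of each phase current -> feature vector."""
    feats = []
    for phase in (ia, ib, ic):
        coeffs = pywt.wavedec(phase, wavelet, level=level)
        feats.extend(float(np.sum(c ** 2)) for c in coeffs[1:])
    return feats

rng = np.random.default_rng(0)
t = np.arange(0, 0.2, 1 / 3200.0)
def synth_record(faulted_phases):
    """Synthetic 50 Hz phase currents; the faulted phases carry a much larger amplitude."""
    return [(8.0 if p in faulted_phases else 1.0) * np.sin(2 * np.pi * 50 * t)
            + 0.05 * rng.standard_normal(t.size) for p in "ABC"]

fault_types = {"AG": "A", "BG": "B", "CG": "C", "AB": "AB", "BC": "BC", "ABCG": "ABC"}
X = [wavelet_energy_features(*synth_record(ph)) for ph in fault_types.values() for _ in range(20)]
y = [label for label in fault_types for _ in range(20)]
clf = DecisionTreeClassifier(max_depth=5, random_state=0).fit(X, y)

print(clf.predict([wavelet_energy_features(*synth_record("BC"))]))   # expected: ['BC']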
Fault-diagnosis methods based on association rules can take both spatial and temporal characteristics
into account. The resulting set of rules can yield a real-time model that helps to create a list of
preventive actions to be taken. A second suggested advantage of this approach is that the data is
provided by protection equipment such as relays.
Sequences of events like voltage sags can be analyzed using pattern sequence discovery algorithms,
which are association-rule-based methods. The events measured at a measuring point have
an associated time of occurrence, and temporal patterns that occur with a sufficient frequency are
identified. The time spans between expected related events that result from this model can be
used for prediction and prevention of successive events.
Still within the field of fault analysis, but with a different specific objective, an application using
advanced data-analytics techniques has been proposed for fault-direction detection for protection
purposes. It is based on a Multilayer Feedforward Neural Network (MFNN). It is claimed that the
proposed discriminator is fast, robust, and accurate, and that it is suitable for realizing an ultrafast
directional comparison protection of transmission lines [11].
As concluded in the study presented in [9], only a few of the developed data-analytics algorithms
have been implemented in production mode in electric utilities. One of the difficulties to fully
implement them is the need for appropriate communication and data integration infrastructure.
Certainly, a system intended to perform fault detection, classification, and location has to operate in
on-line mode with minimal manual interventions in place. This requires a communication system to
retrieve the appropriate data from relays, digital fault recorders, and any other necessary devices and
securely send this data to a centralized location where the algorithms are run. If the methodology
uses data from other sources as well (such as weather data), such data also needs to be available in a
timely manner and properly integrated with the electrical data for the analysis.
Example: Lightning correlation detection
One such use case is the real-time outage and lightning correlation described in [21]. In the event of
a feeder outage, the correlator process combines data on current network topology and geography,
circuit breaker operation, and lightning data from a lightning location service to show the affected
feeder area and the lightning strike that caused the outage of the feeder. This provides valuable
information to the dispatcher to coordinate the field crew and accelerate remedial activities.
Figure 4-28: Correlation of a feeder outage and lightning strike
Input data on lightning activity is provided by a lightning detection system (LDS), such as Euclid in
Europe or NLDN in the USA/Canada. Electrical grid data is provided from GIS (geographical
information system), where powerlines and switching elements, such as circuit breakers, are modeled.
The network topology processor (NTP) builds network topologies from GIS data. When it receives
changes in states of switching elements from SCADA, it rebuilds the topology and calculates the
supply areas for every switching element. It then checks whether any lightning events have
occurred in the vicinity of the powerlines of the supply areas. If the time of the lightning event and
the change of switching state, taking into account the relay protection settings, correlate temporally and
spatially, the specific lightning event is declared as the cause of the failure [21].
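A minimal sketch of this temporal and spatial correlation is given below. It illustrates the principle, not the implementation of [21]; the tolerances, the feeder geometry, and the distance test against the route vertices are simplifying assumptions.

from dataclasses import dataclass
from datetime import datetime, timedelta
from math import radians, sin, cos, asin, sqrt

@dataclass
class Stroke:
    time: datetime
    lat: float
    lon: float

def haversine_km(lat1, lon1, lat2, lon2):
    dlat, dlon = radians(lat2 - lat1), radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))

def correlate(trip_time, feeder_route, strokes, max_s=2.0, max_km=1.0):
    """Return strokes within max_s seconds of the breaker trip and max_km of the feeder route."""
    hits = []
    for s in strokes:
        close_in_time = abs((s.time - trip_time).total_seconds()) <= max_s
        close_in_space = min(haversine_km(s.lat, s.lon, lat, lon)
                             for lat, lon in feeder_route) <= max_km
        if close_in_time and close_in_space:
            hits.append(s)            # candidate cause of the feeder outage
    return hits

trip = datetime(2018, 6, 1, 14, 3, 22)
route = [(46.05, 14.50), (46.07, 14.55)]                  # vertices of the feeder polyline
strokes = [Stroke(trip + timedelta(seconds=0.4), 46.055, 14.505)]
print(correlate(trip, route, strokes))                    # the stroke is flagged as the cause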
Example: Smart Cable Guard
DNV GL’s Smart Cable Guard [39] is an analytics-based advanced tool that locates faults in MV cables
while also providing information on developing weaknesses in those cables. This is done by detecting
partial discharges that are developing in the cables. Knowing the location of a defect within an
accuracy of 1% of the cable length enables a network owner to replace the weak spot before it results
in a breakdown. This helps to reduce both SAIDI and SAIFI, enables network owners to plan repair
work (at optimal cost), and in most cases also enables them to identify the root causes of defects
better and faster. The system uses two time-synchronized sensors that capture and measure
data from the MV cable.
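The location principle can be illustrated with a minimal sketch based on the arrival-time difference of the partial-discharge pulse at the two cable ends. This is the generic time-difference-of-arrival idea under an assumed propagation speed, not DNV GL's implementation.

def pd_location_m(cable_length_m, t1_s, t2_s, v_mps=1.7e8):
    """Distance of the partial-discharge source from the sensor-1 end of the cable.
    t1_s, t2_s: pulse arrival times at sensors 1 and 2 on a common time base;
    v_mps: assumed pulse propagation speed in the cable (roughly half the speed of light)."""
    return 0.5 * (cable_length_m - v_mps * (t2_s - t1_s))

# The pulse reaches sensor 2 later, so the defect lies closer to the sensor-1 end.
print(pd_location_m(2000.0, t1_s=3.0e-6, t2_s=8.0e-6))   # -> 575.0 m from end 1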
Figure 4-29: Smart Cable Guard system and web interface, showing the location of increasing
partial discharge activity over time
Example: OpenXDA platform
An open-source platform that integrates several applications has been developed by Grid Protection
Alliance (GPA) in the United States, under the auspices of several sponsors, including EPRI, electric
utilities, government agencies, and other research organizations. One of the applications is the
openXDA software, which is an extensible platform for processing events and trending records from
disturbance-monitoring equipment such as digital fault recorders (DFRs), relays, power quality meters,
and other power system intelligent electronic devices (IEDs). Open PQ Dashboard, another application
of the suite, provides visual displays to quickly convey the status and location of power quality
anomalies and other events throughout the electrical power system (see Figure 4-30 [22]). It is also
used to display results from openXDA in the geo-referenced visualization panel. Summary displays
start with the choice of a geospatial map-view or annunciator panel, both with visualizations for
across-the-room viewing, fit for an operations support center [22].
Figure 4-30: Example of open PQ Dashboard display
Root cause identification of faults
As described above, the classification of power system events has been the subject of extensive
research, with the identification of a fault's presence and of the faulted phases as the main focus. A
variety of methodologies and software tools have been developed and implemented to assist
operators and protection engineers to locate the fault in an accurate and fast manner. However,
methods to identify the possible root cause of a fault in on-line mode have not received the same
level of attention; consequently, algorithms and tools are not readily available. Adding information
about the underlying cause of a fault to the fault location process can be very beneficial to expedite
repairs and restore the faulted line to service, as well as optimize preparation work of crews. It also
can give an operator valuable information to decide and implement control actions. Thus far, there
has not been much work on identifying the underlying causes of events, but ongoing research is
expected to fill this gap.
Reference [23] proposes a methodology to automatically identify the underlying cause of a fault on
transmission lines based on the analysis of fault data recorded by different IEDs, as well as other non-
electrical data. Reference [24] proposes a similar approach that uses machine learning to classify
faults as they occur in the system into preselected fault-cause groups.
4.3.3 Real-time stability assessment
Voltage stability is always an operational reliability concern for modern power systems. Voltage
monitoring and control have been traditionally based on SCADA and EMS. Due to the inherent
limitations of these systems and applications, such as slow data sampling rate, slow data
communication rates, time-consuming computation, and model inaccuracies, a complete assessment
of system voltage stability conditions may take several minutes to perform. Motivated by the
advantages of Synchrophasor technology and its wide deployment in power systems around the
world, PMU-based voltage stability monitoring applications are now appearing, which can improve
power system voltage stability and security.
Figure 4-31: Example of Synchrophasor-based frequency stability monitoring
Monitoring methodologies based on the use of high-resolution PMU data have proven effective in
tracking the dynamic performance of a power system in real time and in providing an understanding of
the current state of the system, including potential operating margins under varying system conditions.
However, these approaches cannot assess performance under contingency conditions or for changes in
operating scenarios.
As previously described, for improved situational awareness, system operators need a succinct view of
current operating conditions, in addition to an assessment of potential risks associated with expected
changes in load, topology, or generation, as well as unexpected events in the system (faults, changes
in renewables output). Therefore, it is clear that a security assessment tool based only on PMU data is
not sufficient. Various so-called hybrid approaches that combine traditional simulation methods with
PMU analytics have been proposed.
An R&D project performed under the auspices of the U.S. Department of Energy proposes an
integrated platform that combines high-performance dynamic simulation analysis tools and
Synchrophasor-based stability assessment algorithms. The project integrates the results to provide
real-time situational awareness, including available operating margins against major stability problems
[17][18]. Figure 4-32 depicts a high-level overview of the proposed framework for real-time dynamic
security assessment. Reference [18] provides the application of the framework through selected
illustrative examples.
Figure 4-32: Framework proposed in [17][18] for real-time dynamic security assessment combining
PMU data analytics and high performance dynamic simulation
4.3.4 Alarm processing and filtering
Many conventional alarm management systems that are part of SCADA/EMS lack the ability to analyze
complex events efficiently under time constraints. As a result, operators are overloaded with alarms in
the control room and start ignoring them, with the risk that serious alarms hidden in the list are
overlooked. Analytics-based intelligent alarm processing can mitigate this problem.
One example that has appeared is an advanced alarm processor that combines alarm processing
techniques at both the substation automation system and the energy management system level. In
addition, fuzzy-reasoning Petri-net diagnosis models can take advantage of both expert systems and
fuzzy logic [37][38].
4.3.5 Renewable energy generation forecasting and storage analytics
With a growing installed capacity of renewable energy plants comes a growing number of remote
monitoring solutions to track the performance of these plants. Enormous amounts of data are being
generated by renewable energy plants, and it is becoming ever important to create valuable insights
from this data. Big data analytics performed on the data collected from these plants enables owners
and O&M crews to operate the renewable plants at the plants’ maximum potential. Among all the
types of big data analytics that could be performed on the plant data, predictive analytics holds the
most promising of providing insights by leveraging performance data to create correlations and
outcomes.
Regression models can be applied to study impact on weather on electricity demand. The results can
be used to forecast demand. Variables like historic electricity demands, temperatures, humidity, GDP,
and population growth numbers can be taken into account.
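A minimal sketch of such a regression model is shown below, using synthetic data; the chosen drivers, coefficients, and units are illustrative assumptions.

import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
n = 365
temperature = rng.uniform(-5, 35, n)            # daily mean temperature, degC
humidity = rng.uniform(30, 95, n)               # relative humidity, %
gdp_index = np.linspace(100, 103, n)            # simple economic growth proxy

# Synthetic daily demand: heating/cooling need plus an economic trend plus noise
demand_mw = (900 + 8 * np.abs(temperature - 18) + 0.4 * humidity
             + 5 * (gdp_index - 100) + rng.normal(0, 20, n))

X = np.column_stack([np.abs(temperature - 18), humidity, gdp_index])
model = LinearRegression().fit(X, demand_mw)

# Forecast for an assumed hot, humid day
print(model.predict([[abs(32 - 18), 85, 103]]))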
Figure 4-33: Wind forecasting and optimization tools
Monitoring and optimization tools are becoming available for all types of renewable energy sources,
from PV modules, PV inverters, and wind farms/turbines to the whole portfolio. This gives owners and
operators uniform access to, and analysis of, their operational data. It facilitates efficient and
intelligent operational decisions and maximizes availability, efficiency, production, and financial return.
4.3.6 Damage prediction (weather related or due to other causes)
The impact of storms and other extreme weather events on utility services can be devastating.
Hurricane Sandy is a recent example of the enormous damages that storms can inflict on electrical
infrastructure (and on society and the economy). Quick response to these emergencies represents a
big challenge to electric power utilities. With the advent of the Smart Grid technology, utilities are
incorporating automation and sensing technologies in their grids and operation systems. This greatly
increases the amount of data collected during normal and storm conditions. These data, when
complemented with data from weather stations, storm forecasting systems, and online social media,
can be analyzed to enhance storm preparedness for utilities.
For example, a flash alarm service produces alarms on lightning activity approaching the area of a
utility’s interest. The alarms are categorized into levels according to the distance of the lightning
activity from the area of interest. Areas of interest can be of any shape to suit different critical electrical assets, such as
substations, power line corridors, and communication facilities. The alarms are also used to warn the
crews working on the power lines or in the substations of the approaching danger of lightning strikes.
An illustrative implementation of a weather forecasting system was developed in the U.S. by Ameren
Missouri and Saint Louis University (SLU). The system called Quantum Weather®, which became fully
operational in 2008, is a storm-prediction system designed to improve the ability of an electric utility
to anticipate and respond to weather-related damage. It harnesses the data in Ameren Missouri’s
service territory from more than 100 strategically located weather stations and integrates the station
data with data from other sensors and measurement devices to run a numerical weather-
prediction model on a high-performance computing platform [19]. The system has proven to be
effective in forewarning emergency response planners, preparing staff, and dispatching resources to
where they will be needed most.
Another example of a damage prediction system is the model implemented by San Diego Gas & Electric
(SDG&E), which forecasts when strong winds will occur and determines how severe they will be and
how much risk they will pose. The main risk associated with those strong winds, the so-called Santa
Ana winds, is fire. Those winds, which blow from the desert to the coast, are dry and hot and may
convert any type of ignition source into wildfires. The prediction system, which uses data from 170
weather stations on SDG&E’s transmission and distribution systems, helps the utility to forecast where
the winds will be the strongest and to subsequently warn customers and position people in the
field, as well as determine the staffing levels that it will need for the duration of the winds. The utility
also uses the real-time data during windy conditions to determine which circuits have lines at greatest
risk so that it can shut them off [20].
Damage prediction systems are not only applied to weather-related events. A tool has been developed
to predict and assess the risk of damage to cables as a result of digging by construction
companies. Digging damage accounts for a large share of low-voltage and medium-voltage power failures
in any distribution network, so prevention of cable digging damage is important. Based on various
data sources (location, soil, cable types, subcontractor track record, etc.), a predictive model is a very
useful tool to manage the risk related to digging damage.
Figure 4-34: Digging damage prediction model
4.3.7 Outage restoration analytics
Outage restoration analytics helps utility distribution managers, outage managers, community liaisons,
and regulatory affairs managers in applying analytics to predict, prevent, detect, assess, and respond
to outages. It provides real-time situational awareness of unfolding outages, operational deployments,
and restoration progress during major events. Self-service means users get up-to-the-minute
information without distracting outage management operators from their tasks.
Outage restoration analytics tools offer insight into historical performance, trends, and possible
root causes, helping companies to proactively reduce the number of outage events. They also enable
monitoring of KPIs to identify emerging issues before they become problems and simplify reporting
on industry-standard indices such as system reliability, IEEE reports, outage analysis, and crew history
reports.
For example, if a smart meter sends a “last gasp” to say it has lost power, the utility can determine
the meter-to-transformer relationship and assess the scale of the problem. Information can be passed
to the control center for visual display and creation of outage alerts that can be turned automatically
into work orders for field crews.
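A minimal sketch of this meter-to-transformer roll-up is shown below. The connectivity data, alert threshold, and meter identifiers are illustrative assumptions.

from collections import Counter

# Meter-to-transformer relationship, normally taken from GIS/connectivity data (illustrative)
meter_to_transformer = {"M1": "TX-12", "M2": "TX-12", "M3": "TX-12", "M4": "TX-07"}
meters_per_transformer = Counter(meter_to_transformer.values())

def assess_outage(last_gasp_meters, threshold=0.6):
    """Flag transformers where the share of meters reporting loss of power is high."""
    gasps = Counter(meter_to_transformer[m] for m in last_gasp_meters
                    if m in meter_to_transformer)
    alerts = []
    for tx, n in gasps.items():
        share = n / meters_per_transformer[tx]
        if share >= threshold:
            alerts.append((tx, n, round(share, 2)))   # candidate for an outage alert / work order
    return alerts

print(assess_outage(["M1", "M2", "M3"]))   # -> [('TX-12', 3, 1.0)]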
Figure 4-35: Smart meter based outage management
4.3.8 Power quality analytics (including voltage control)
Good power quality and uninterrupted power are extremely important goals at all utilities.
Compromised power quality can cause damage to costly electrical equipment, reduce productivity,
and—if severe enough—disrupt daily operations. Variations in power quality can result from voltage
spikes, swells, and sags; harmonic disturbances; and short and long interruptions of power lasting
from a few milliseconds to over two seconds. And any of these events can occur at any time.
Power quality data analytics is about collecting waveform-based power system data, extracting
information from it, and applying the findings to solve a wide variety of power system problems in
areas such as power quality, power system protection, equipment condition monitoring, and network
performance enhancement.
Power quality data analytics tools will combine electric power system recordings with data from
SCADA, GIS, and network models to provide estimated fault location and to send an alarm to the
operations personnel, reducing time to locate faults by hours.
Figure 4-36: Power quality analytics tool
4.3.9 Peak load management (via demand-side management analytics)
Peak load management, also known as demand side management, is the process of balancing the
supply of electricity on the network with the electrical load by adjusting or controlling the load rather
than the power station output. This can be achieved by direct intervention of the utility in real time,
by the use of frequency sensitive relays triggering the circuit breakers (ripple control), by time clocks,
or by using special tariffs to influence consumer behavior.
Peak load management enables utilities to reduce demand for electricity during peak usage times
(“peak shaving”), which can, in turn, reduce costs by eliminating the need for peaking power plants.
Analytics can help in identifying demand patterns and determining the factors that drive energy load.
4.3.10 Load research analytics and energy portfolio management analytics
Load research enables utilities to study the ways their customers use electricity, either in total or by
individual end uses. Load research analytics tools allow for aggregation of profiles created by domain
analysis to get an overall load shape of the territory and to calculate cost allocator statistics such as
peak demands and their dates.
Figure 4-37: Analyzing system load
4.3.11 Non-technical loss analytics
Non-technical losses (NTLs) include electricity theft, faulty meters, and billing errors. They can cause
significant harm to the economy; in some countries, NTLs may reach up to 40% of the total electricity
distributed. To detect NTLs, inspections of customers are carried out based on predictions.
Traditionally, these predictions are based on calculations of the energy balance, which require
topological information of the network; this does not always work accurately, as the network topology
undergoes continuous changes. Analytics tools enable analysis of customer profiles, their data, and known
irregular behavior in order to trigger a possible inspection of a customer.
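One simple way to implement such profile-based screening is sketched below, using an Isolation Forest to rank customers for inspection. The features, synthetic data, and contamination rate are illustrative assumptions rather than a recommended production design.

import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(2)
# Feature per customer: [mean monthly kWh, ratio of last 3 months to previous 9 months]
normal = np.column_stack([rng.normal(300, 60, 500), rng.normal(1.0, 0.1, 500)])
suspect = np.array([[310, 0.35], [280, 0.40]])     # sudden consumption drop -> possible theft or faulty meter
X = np.vstack([normal, suspect])

model = IsolationForest(contamination=0.01, random_state=0).fit(X)
scores = model.decision_function(X)                # lower score = more anomalous
to_inspect = np.argsort(scores)[:5]                # top candidates for field inspection
print(to_inspect)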
4.3.12 Physical and cyber security assessment analytics
While the electric power industry has developed mandatory reliability standards that help provide a
basis for grid reliability and resilience (e.g. NERC-CIP), grid modernization is introducing new
technologies that do not have well-defined standards. Advanced information and communication
technologies are being developed and deployed at a rapid pace to enable new system capabilities and
to support the integration of variable and distributed energy resources.
Technologies and capabilities to assess the “state of security” for the grid will be needed as cyber and
physical threats evolve. Cyber-physical models, analytical tools, and performance metrics can help
enable this capability to increase the security posture. Moving to real-time analytics and the ability to
co-simulate cyber and physical systems can help perform non-traditional contingency planning, such
as managing grid impacts of interruption to heating oil and propane deliveries.
While the energy sector has a well-established capability to plan for and survive physical
contingencies, it should also be able to survive physical contingencies that result from cyber incidents.
4.3.13 Dynamic assessment of transmission line capacity (dynamic line rating)
Transmission systems are constrained by the capacities of their transmission lines. Normally, utilities
use a static rating, which may vary by season or shorter periods, as the transmission line thermal
capacity limit. This rating is determined based on assumed weather conditions, which in most cases
are conservative. If instead the thermal rating of a transmission line is determined based on the
environmental parameters that the power line is operating in at any moment, additional capacity can
be available as compared to the static capacity. This is the concept of dynamic line rating (DLR) of
transmission lines. Applying DLR can enhance reliability, security, and economical operation by
enabling less constrained operation and timely mitigation action to avoid dangerous system security
conditions. The additional capacity allowed by DLR can be very useful to transfer sudden and short-
lived increases in power flows from distributed energy resources, such as power from wind farms or
solar PV plants, thus improving the integration of renewable generation. Dynamic ratings are often, but
not always, greater than static ratings. Demonstration projects conducted in the U.S. under the
auspices of the U.S. Department of Energy confirmed the presence of real-time capacity above the
static rating, in most instances with up to 25% additional usable capacity made available for system
operations [30].
DLR is a very well understood concept. Indeed, the science and technology of DLR has been in
development and deployment for over 35 years, and today several different DLR technologies are
commercially available [31]. The technology has evolved significantly since the first development in
the late 1970s. A new generation of DLR provides effective solutions for the main shortcomings of
early generations, one of the most important being the ability to forecast line ratings over various
timeframes. Certainly, additional transmission capacity from existing assets can provide major benefits
when such capacity is known in advance, not only in real time.
The demonstration projects mentioned earlier revealed opportunities to enhance future DLR
deployments by ensuring the reliability of DLR data, addressing cybersecurity concerns, integrating
dynamic ratings into system operations, and verifying the financial benefits of DLR systems.
A wealth of reference material about DLR technology, case studies, and practical applications is
available in the literature, including several Cigre reports [32] [33]. Therefore, the intention of the
following subsections is not to duplicate readily available information but rather to provide a brief
overview of some of the existing DLR technologies, with the main emphasis on the data analytics and
visualization aspects. References are provided for those seeking additional details.
4.3.13.1 SUMO system
An example of such a system is SUMO, a system for dynamic assessment of powerline capacity,
utilized at the Slovenian TSO ELES [26].
Figure 4-38: Dynamic powerline capacity assessment
The SUMO system combines different subsystems into a meaningful and helpful power grid operating
tool. It comprises the following functions (Figure 4-39):
 Measurements: currents from SCADA, measured data from weather stations, gridded weather
data applied to micro locations using weather model and terrain data.
 Reliability analyses: N-1 analyses, line outage distribution factors (LODF) for power flow
calculations.
 Forecasts: short-term load flow forecasts and short-term weather forecasts for corridors of
power lines.
 Dynamic thermal ratings (DTR): calculations based on current weather and forecasted
weather (t0 ... t0+3h).
 Exceptional weather events.
 Visualization.
 Integration platform and data exchange: SUMO BUS.
(Figure 4-39 content: the ODIN VIS visualization platform and ODIN server; SCADA; the SUMO BUS and SUMO DB;
physical conductor data, power line spatial (GIS) data, and system configuration; and the ZM, NOV, ONAP, LF,
LODF, DTR, and OIAP subsystems. The SUMO BUS data structures keep measured and calculated data from the
different SUMO subsystems.)
Figure 4-39: SUMO architecture
Dynamic thermal ratings (DTR) module
The upper limit of a line’s capacity to transmit power is set by the weakest link in its path from the
source to the destination node. When calculating DLR, the power line is split into sections. Each
section has its own weather data, its own conductor physical properties, and its own geographical
orientation. The thermal rating is therefore calculated for each section, and the minimum thermal
rating obtained is declared as the thermal rating of the entire power line.
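A minimal sketch of this per-section calculation is given below. It uses a deliberately simplified IEEE-738-style steady-state heat balance rather than the SUMO algorithm itself; the conductor constants, convective model, and weather inputs are illustrative assumptions.

import math

def section_ampacity(wind_mps, t_air_c, t_max_c=80.0, r_ohm_per_m=7e-5,
                     diameter_m=0.028, solar_wm2=900.0, absorptivity=0.8, emissivity=0.8):
    """Very simplified steady-state balance: I^2 * R = convection + radiation - solar gain."""
    h = 10.0 + 10.0 * math.sqrt(max(wind_mps, 0.1))          # crude convective coefficient
    area_per_m = math.pi * diameter_m                        # conductor surface per metre
    q_conv = h * area_per_m * (t_max_c - t_air_c)
    q_rad = 5.67e-8 * emissivity * area_per_m * ((t_max_c + 273.15) ** 4 - (t_air_c + 273.15) ** 4)
    q_sun = absorptivity * solar_wm2 * diameter_m
    heat_margin = max(q_conv + q_rad - q_sun, 0.0)
    return math.sqrt(heat_margin / r_ohm_per_m)

sections = [  # (wind speed m/s, ambient temperature degC) along the corridor
    (0.6, 32.0), (2.5, 30.0), (4.0, 28.0),
]
line_rating_a = min(section_ampacity(w, t) for w, t in sections)
print(round(line_rating_a), "A")   # the weakest (hottest, calmest) section governs the line rating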
Exceptional weather events
In the case of weather events that could potentially lead to power line outages directly
(thunderstorms, high wind speeds) or indirectly (high air temperatures and consequently low ratings),
the operator in charge is presented with warnings (Figure 4-40) on the live weather situation of the
power grid. Through this, the operator is warned of weather situations that can cause a line
outage or impact its capacity. In the case of a local thunderstorm, the operator can focus on the line in
question and re-assess its possible outage in detail through other tools as well (e.g. SCADA, other load-flow
tools), thus confirming the outage’s influence on the rest of the system.
The warning symbols shown in Figure 4-40 cover: thunderstorm (lightning activity), high wind speeds
(gale or storm), high air temperatures, low air temperatures, and extreme rainfall.
Figure 4-40: Exceptional weather events
Figure 4-41: Thunderstorm – lightning activity and rainfall event notification
Visualization
The visualization provides the means to aggregate the vast amount of data in a convenient and easy-
to-understand manner. The results are presented in real time to dispatchers in the network control
center (NCC) via the advanced visualization platform ODIN-VIS (Figure 4-42).
Figure 4-42: Visualization platform ODIN-VIS screenshot
In the center of the screen, a part of the transmission grid is shown. The power lines are colored
according to the ratio of the actual current to the actual rating. On the right side of the screen is the
SUMO panel, which shows the following for each power line:
 “Four quadrant” view of the relative line load:
o Upper left: actual line current versus actual line rating for actual network topology.
o Upper right: forecasted line current versus forecasted line rating for actual network
topology.
o Lower left: actual line current versus actual line rating for N-1 network topology.
o Lower right: forecasted line current versus forecasted line rating for N-1 network
topology.
 Exceptional weather events.
 N-1 power line – the power line in the transmission grid, when tripped, that causes the largest
rise of load on the power line of interest.
The quadrants are colored green if the ratio of the line current versus the rating is less than 90%. If
the ratio is between 90 and 100%, the quadrants are colored orange. If the ratio is 100% or more,
the quadrants are colored red and additionally show the safe remaining operating time.
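The coloring rule just described can be expressed directly, as in the short sketch below; the current and rating values are illustrative, and the calculation of the safe remaining operating time is outside the scope of the snippet.

def quadrant_color(current_a, rating_a):
    """Map the current-to-rating ratio of one quadrant to a display color."""
    ratio = current_a / rating_a
    if ratio < 0.90:
        return "green"
    if ratio < 1.00:
        return "orange"
    return "red"          # overload: the display additionally shows the safe remaining time

print(quadrant_color(450.0, 520.0), quadrant_color(505.0, 520.0), quadrant_color(560.0, 520.0))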
4.3.13.2 Other dynamic line rating technologies
Lindsey Manufacturing Co. offers a transmission line dynamic rating and forecasting system using
real-time measured conductor data combined with reliability-based methods [34]. The ratings are
developed by actively learning how the conductor behaves with regard to conductor temperature,
weather, current, and the conductor’s exact clearance-to-ground. The system uses measurement data
continuously collected by self-powered, line-mounted monitors that can measure critical data directly.
This can include conductor current, conductor temperature, ground temperature, conductor vibration,
and the actual conductor-to-ground distance measurement via built-in LiDAR. The latter eliminates the
need for sag estimations.
Genscape’s LineVision Transmission Line Monitoring and Dynamic Line Rating system uses
electromagnetic field (EMF) sensing to measure critical line variables, including conductor clearance/sag,
line loading (current), conductor temperature, thermal rating, VARs, voltage excursions, and
conductor horizontal displacement (blowout). The main advantage is that the sensors are not installed
on the conductor but on the ground underneath the line. Therefore, they can be easily deployed under
critical lines without the need for line outages or installation crews [35]. Forecasting capabilities are
not yet available with this technology.
A different approach to DLR has been developed by LineAmps Systems. The system does not use
direct measurements from sensors to calculate the rating of a line; rather, it uses an expert system to
estimate line ampacity under steady-state, dynamic, and transient conditions. It was developed by
applying artificial intelligence, using an object-oriented knowledge base design of the power
line environment. The expert system provides hourly values of line ampacity up to seven days in
advance. The main advantage of this system is that it does not require conductor temperature
sensors, meteorological sensors, or a telecommunication system. It can be used for monitoring
overhead line conductors, underground cables, and substation equipment for overtemperature [36].
4.3.14 Cable thermal monitoring
The monitoring system shown in Figure 4-43 provides the ability to see up-to-the-minute calculations
of the capacity of circuits on the grid, identification of hotspots, and an indication of the time available
to resolve potential overloading. The circuit thermal monitor performs continuous calculations; should
an unexpected failure occur, the system automatically updates the control room with new capacity
and time estimates.
Figure 4-43: National Grid (U.K.) cable thermal monitor
4.4 SUMMARY OF INDUSTRY SURVEY
A survey was prepared and distributed among both Cigre members and non-members across a
diverse geographical area with the intention to collect information about existing practices, research
and development efforts, and respondent opinions on the use of innovative data-analytics
methodologies and software tools for improving system operator decision support. The survey was
structured in five sections.
Section 1 - Basic Information: requests respondent and system information.
Section 2 - Opinion on the use of data-analytics techniques for transmission operation improvement.
Section 3 – Description of data-intensive applications for transmission operations.
Section 4 – Description on the use of asset/equipment health and condition information in the control
center to facilitate decision-making.
Section 5 – Description of visualization to support system operation.
Even though the questionnaire was sent out to a significantly large number of prospective responders,
only a small number of responses were received. Because of that, responses to the survey do not
allow for robust, generalized statistical assessment of the collected information. There is, however,
valuable information that can be extracted from the responses. Because of the way the survey was
set up, only Section 1 contains statistical-type information. Responses on the rest of the questionnaire
are descriptive and hence not intended for comparison or for statistical analysis.
In this section, we present insights gained from questions of Section 1 and a summary of responses to
other sections.
In Section 1, responders were asked for their opinions and visions regarding the use of data analytics
for system operations support. Two questions were presented for that purpose, which are reproduced
below for clarity.
Question 1: Responders were asked to indicate the extent to which they agree or disagree with each
of the following statements, by using the following 5-point rating scale: Strongly Agree – Agree –
Neutral – Disagree – Strongly Disagree.

Q1.1 Analytics practice in electric utilities is lagging other industries such as transportation,
healthcare and financial services, in terms of actual implementation

Q1.2 Data analytics technologies that use multiple data sources can play a significant role in
improving situational awareness tools

Q1.3 The value and accuracy of data analytics solutions that integrate various data sources is
not well understood, and that affect implementation and adoption of the data analytics
technology

Q1.4 There is a need to develop standardized data structures and data models for effective
deployment of enterprise’s data analytics capability

Q1.5 My organization is planning to introduce new data-analytics techniques and tools for
transmission operation improvement within the next 2 to 4 years

Q1.6 Data quality issues is a major barrier for wide spread use of Synchrophasor data to
improve system operations

The responses to these questions are shown in Figure 4-44. It can be observed in this figure that for
the first two statements, responses are equally divided between those who agree or strongly agree
with the statement and those who don’t have a strong opinion about it (neutral). None of the
responders seems to disagree. Responses for Q1.4 indicate that there is strong consensus among the
responders about the need for standardized data structures and data models for effective deployment
of an enterprise’s data-analytics capability. Surprisingly, opinion is quite divided about the importance
of data quality for the widespread use of Synchrophasors to improve system operations.
Figure 4-44: Responses to survey – Section 1, Question 1
Question 2: Responders were asked to prioritize a provided list of data analytics use cases, by using
the following scale: Very Important – Important – Neutral – Not Important – Not at all Important
Results are presented in Figure 4-45. It can be seen that most of the data analytics use cases rank
high in terms of importance, with system event detection being the one that seems most relevant to
responders.
Figure 4-45: Responses to survey – Section 1, Question 2
Data analytics applications
In what follows, we summarize information provided by the responders related to data analytics tools
implemented in their systems.
Terna SpA Italy:
Terna’s generation system has installed capacity of 114 GW, with 27 GW of renewable variable
generation nameplate capacity. The peak load is around 59 GW. Terna’s transmission system is
interconnected with five other systems and has about 800 substations.
Terna has developed in-house a system to access and analyze a wide-area monitoring database that
enables users to promptly identify faults, oscillations, and other perturbations in the system.
Measurement data is provided by PMUs. The system has graphic representation of status of
frequency, voltage, oscillations, and other parameters.
Terna has also developed and implemented in operation a real-time dynamic stability assessment
(DSA) tool. It includes functions for voltage and angle stability, N-k dynamic assessment, and
evaluation of the possible effects in case of islanding of part of the grid.
Regarding equipment health monitoring, Terna has developed an application called MBI, which is
intended to inform the maintenance department about possible asset risk of failure. Within the
trending and forecast category, they have developed an advanced dispatching tool for forecasting load
close to real time. The tool takes renewables into account but does not forecast them explicitly. It
forecasts the total load and the net load (the load seen from the 400- and 220-kV grid), and thus
implicitly "forecasts" renewables, because all of them are connected at 150 kV or below.
Finally, according to the expert from Terna, the new visualization tools for control rooms should be a
“smart approach” that includes representation of measurement trending, cockpit customization, and
alarm customization.
Dominion Virginia Power – USA
Dominion Virginia Power is an electric utility in the Eastern Interconnection of the U.S. The main
characteristics of Dominion’s power system are as follows:
- System peak load: 21,651 MW
- Installed generation capacity: 24,300 MW
- Voltage levels: 500, 230, 138, 115, and 69 kV
- Number of interconnections with other systems, control areas, regions: 30 at transmission level
- Number of measuring points: Roughly 130k + SCADA measurements
Dominion has implemented tools for on-line dynamic security assessment, on-line voltage stability
analysis, and reactive power and voltage control. These tools are used by control room operators as
well as study engineers. They reside in the EMS and use field measurement and simulation based on
system models.
For geographical visualization of transmission assets, Dominion uses Alstom/GE EMS applications
provided by the eTerra suite of products. The system provides tabular displays of various
information provided by different systems and applications. One-line diagrams of complete systems
are available via static tile map-board and also digitally at an operator console. The upgrade plan
includes the implementation of a digital version that will be displayed on a video wall at new control
centers.
As part of the visualization features, a weather map for the state of Virginia is shown on a large wall
board for all operators.
Regarding the availability of visual information for other parts of the business besides control, the
responder indicated that system operators have access to all enterprise level information. Also, they
utilize substation security cameras when needed.
The responder described the following as the main challenges in visualization: acceptance of
operations personnel, constraints imposed by cyber security rules, and maturity of visualization tools
available through the EMS platform. Regarding data analytics, he indicated that standardized data
structures and data models are only one part of the picture, and that there is a strong need for
production-ready, reliable analytics.


REN - Portuguese TSO


System characteristics:
- Installed Generation Capacity: as of 2015 (including wind, solar, and CHP) – 18,533 MW
- Installed VG Capacity (include wind and other variable generation resources): total wind,
solar, and CHP - 5868 MW
- Voltage levels: 400, 220, and 150 kV
- Number of interconnections with other systems, control areas, regions: Interconnections with
Spain - 9 (6 x 400 kV and 3 x 220 kV)
- Total line length per voltage level: 400 kV - 2632 km, 220 kV - 3611 km, 150 kV - 2562 km
REN is implementing an analytic system for fault location on overhead lines (OHL) and incident
post-mortem analysis. It is intended to be used by control room operators and operation engineers.
The system primarily uses data from digital protection relays. It had been tested in the lab and was in
the prototype phase at the time this survey was conducted.
Regarding forecasting and trending tools, REN buys four wind forecasts from vendors and combines
and upscales them as needed by using forecasting tools developed in-house.
REN uses asset health information to define dynamic operating limits in real time and for outage
scheduling. The responder indicated that asset health that influences operating limits should be
considered in system operations, at least for OHL, power transformers, and breakers (as a minimum
set). He also indicated that the consequence of not considering such information is reduced efficiency,
because worst-case scenarios must then be used to make deterministic decisions.
TELETRANS / Transelectrica – Romania
System characteristics:
- System peak load: 9,479 MW
- Installed generation capacity: 24617 MW
- Installed VG capacity (include wind and other variable generation resources): 4331 MW
- Voltage levels: 400, 220 kV
- Number of interconnections with other systems, control areas, regions: 8
- Number of measuring points: about 8,000
- Number of substations: 82
TELETRANS/Transelectrica has implemented a synchronized phasor measurement system (SPMS)
developed by Schweitzer Engineering Laboratories. The SPMS covers 14 substations at the upper,
400-kV voltage level, where the phasor measurement units (PMUs) are installed. The PMUs are
connected to the protection secondary cores of the CTs and VTs. Additionally, local phasor data
concentrators (PDCs) are installed in the substations with cross-border lines to archive data for longer
time frames.
The main features of the application interface are:
 Expanded power system observability. Power system online data visualization offered by the
SPMS improves power system monitoring by the operators on shift. Further, by using
accurate online measurements of voltage magnitude and angle and of the active power flow
on certain transmission lines, it is possible to estimate the steady-state stability of the power
system, or a section of it, more accurately and to allow for preventive actions.
 Line parameter calculation and model validation. Gathering synchronized voltages and
currents at the two bus-bars of adjacent substations connected by a transmission line enables
line parameter calculation, thus improving power system model validation.
 Detection of power system oscillations. By use of the SPMS, inter-area oscillations can be
detected and properly analyzed in terms of their damping.
 Accurate post-event analysis based on voltage, current, and active and reactive power flow
data obtained from the SPMS.
The system is used by control room operators, operational engineers, and protection engineers. It
was deployed in 2009. The company plans to extend the SPMS in the short term to collect data from
more substations and, in the medium-term time frame, to integrate it into the EMS-SCADA system at
the NDC level.


Regarding visualization, the company uses a classical GIS system for OHL only and schematic diagrams
in the specific SCADA/EMS. They share system data and visualization with the EAS (ENTSO-E
Awareness System), which is dedicated to interconnection security.
Tohoku Electric Power Co., Inc. – Japan
System characteristics:
- System peak load: about 14,000 MW
- Installed generation capacity: 17,810 MW
- Installed VG capacity (include wind and other variable generation resources): 3,230 MW
- Voltage levels: max 500 kV
- Number of interconnections with other systems, control areas, regions: 2
- Number of substations: 624
The company has implemented a tool for assessing and monitoring reliability in on-line mode. It has
several functions, the most important being the dynamic stability assessment module, which runs
every 30 minutes.
For trending and forecasting analysis, they have developed and implemented a system called
Photovoltaics Output Estimating and Forecasting System, which is intended for control room operators
and operation engineers. It estimates the amount of solar radiation from numerical weather forecasts
and calculates photovoltaics output based on that.
For visualization in the control room, measurement data, such as transmission line power flow,
generator output, and bus voltages, are displayed on the system diagram every 10 seconds. Results
from security analysis tools are also displayed in the monitor screen using a color code to highlight if a
reliability violation occurs. Information about weather conditions and forecasts are also displayed.
Relative to distributed generation, total current wind power output and photovoltaics output in the
entire area are displayed on a dedicated screen, along with generation forecasts for a short- and
medium-term period selected by the user.
For the question about the biggest challenges and needs in visualization, the responder indicated that
it is critical to improve the accuracy of renewable energy output prediction and to use those results to
estimate voltage variations and display them on a monitor screen alongside critical visualizations.
Kyushu Electric Power Company – Japan
System characteristics:
- System peak load: about 15,082 MW
- Installed generation capacity: 28,036 MW
- Installed VG capacity (include wind and other variable generation resources): 5,793 MW
- Voltage levels: 500, 220, 110, 66, 22, 6.6kV
- Number of interconnections with other systems, control areas, regions: 1
- Number of substations: 596
For wide-area monitoring, Kyushu Electric has implemented a system to evaluate the current state of
the power system as well as expected conditions in the near future (30 to 60 minutes ahead). The
application assesses power system reliability in terms of steady-state and dynamic security, including
frequency variations, overload, unscheduled power flows, voltage deviations, and dynamic and voltage
stability.
A separate tool is used for voltage and reactive power control. The tool determines optimal control
actions in response to predicted demand and system operating conditions, with the objective of
preventing voltages at critical buses from deviating outside operating margins.
The company has a renewable energy forecast system that forecasts output of solar photovoltaic
every 30 minutes. The system uses radiation forecast data purchased from a weather information
provider.
Another application is used for short-term demand forecasts. Future demand is estimated based on
accumulated historical demand, historical weather data, and weather forecast data purchased also
from a weather service company.
Visualizations to support system operation include:


Weather and weather forecast: Weather information (cloud radar images, forecast information, etc.)
and images from weather video cameras are displayed on a system panel. Temperature, other
weather information, and lightning strike status are displayed on the operator monitors.
Distributed generation: Battery charge/discharge status and area renewable output and status are
shown graphically on a system panel and operator monitors.
Anticipated state of the network: Results from power system reliability assessment tools are displayed
on operator monitors.
Other types of visualizations specific to that control center: Dam level information is displayed on
operator monitors, and earthquake occurrence information is presented on the overall system panel.
Visual information for other sectors of the company: Some power system information and
demand/supply information are made available throughout the company. In addition, power
generation forecast and lightning strike information are available in the company website.
Consistent with other responses, the responder indicated that the biggest challenge in visualization
and other related analytics tools is the need for better renewable energy power output forecasting.
National Grid - UK
System characteristics:
- System peak load: about 60,000 MW
- Installed generation capacity: 120,000 MW
- Installed VG capacity (include wind and other variable generation resources): 20,000 MW
- Voltage levels: 400/275 kV
- Number of interconnections with other systems, control areas, regions: 4
- Number of substations: 340
- Number of measurement points: 1,000,000
National Grid has a tool called VISOR for wide-area monitoring, developed by Psymetrix and the
University of Manchester. The tool performs real-time monitoring and alarming of sub-synchronous
oscillations.
Related to equipment health monitoring in the control room, they have developed and implemented a
tool for monitoring underground cables, which uses data from temperature sensors around cables to
evaluate cable conditions in terms of loading and capacity.
For trending and forecast analysis, the company has developed an energy forecast system to predict
national demand based on weather forecast data plus the contribution of both solar and wind
generation to the generation mix.

4.5 REFERENCES
[1]. Technology Assessment of Power System Visualization. EPRI, Palo Alto, CA: 2009. 1017795.


[2]. T. J. Overbye, D. A. Wiegmann, A. M. Rich, and Y. Sun, “Human factors aspects of power system
voltage contour visualizations,” IEEE Transactions on Power Systems, pp. 76-82, February 2003
[3]. T. J Overbye, D. Wiegmann and R. J. Thomas, “Visualization of Power Systems”, PSERC Final
Project Report, Publication 02-36, Nov. 2002.
[4]. D. A. Wiegmann, G. R. Essenberg, T. J. Overbye, Y. Sun, “Human Factor Aspects of Power System
Flow Animation,” IEEE Trans. on Power Systems, vol. 20, August 2005, pp. 1233-1240.
[5]. ORNL VERDE: Visualizing Energy Resources Dynamically on Earth
http://techportal.eere.energy.gov/technology.do/techID=17
[6]. Woody Rickerson, “A Control Room View of the ERCOT Grid”, ERCOT Public April 19, 2016
http://www.ercot.com/content/wcm/key_documents_lists/81724/5_A_Control_Room_View_of_the
_ERCOT_Grid.pdf
[7]. https://www.coreso.eu/mission/
[8]. http://www.tscnet.eu/
[9]. Advanced Data Analytics Techniques: Analysis and Applications for Power System Operation and
Planning Support. EPRI, Palo Alto, CA: 2015. 3002007076.
[10]. V. Malathi, N. S. Marimuthu, S. Baskar, and K. Ramar, “Application of extreme learning
machine for series compensated transmission line protection,” Engineering Applications of Artificial
Intelligence, vol. 24, no. 5, pp. 880-887, 2011.
[11]. T. S. Sidhu, H. Singh, and M. S. Sachdev, “Design, implementation and testing of an artificial
neural network based fault direction discriminator for protecting transmission lines,” IEEE Trans.
Power Delivery, vol. 10, no. 2, pp. 697-706, 1995.
[12]. Review of Synchrophasor Applications, EPRI, Palo Alto, CA: 2014. 3002002870
[13]. Alison Silverstein, Kyle Thomas, and Jim Kleitsch, “Using Synchrophasor Data to Diagnose
Equipment Mis-operations and Health”, NASPI Work Group Meeting October 22, 2014
[14]. U.S. Department of Energy, "Advancement of Synchrophasor Technology in ARRA Projects",
March 2016. https://www.smartgrid.gov/files/20160320_Synchrophasor_Report.pdf
[15]. The Value Proposition for Synchrophasor Technology, North American Synchrophasor Initiative
(NASPI) Technical Report, October 2015.
https://www.naspi.org/sites/default/files/reference_documents/5.pdf?fileID=1571
[16]. Catalog of Data-Intensive Applications for Transmission Systems. EPRI, Palo Alto, CA: 2015.
3002005231.
[17]. High-Performance Hybrid Simulation/Measurement-Based Tools for Proactive Operator
Decision-Support – U.S. Department of Energy (DOE) DE-OE0000628 - Final Report, September
2014.
[18]. E. Farantatos, A. Del Rosso, N. Bhatt, K. Sun, Y. Liu, L. Min, C. Jing, J. Ning, M. Parashar, “A
Hybrid Framework for Online Dynamic Security Assessment Combining High Performance
Computing and Synchrophasor Measurements”, 2015 IEEE PES General Meeting
[19]. Case Study: Demonstration of the Quantum Weather Storm-Prediction Model and Application,
EPRI, Palo Alto, CA: 2016. 3002004268
[20]. Situational Awareness – Opportunities for the Electric Power Industry, EPRI, Palo Alto, CA:
2016. 3002007606
[21]. Djurica, V., Milev, G., An Application to Display Lightning Data Using SCALAR Information
System,(2014), 23rd International Lightning Detection Conference, Tucson, USA
[22]. https://www.gridprotectionalliance.org/products.asp#XDA
[23]. U. Minnaar, “The Characterisation and Automatic Classification of Transmission Line Faults”,
Ph.D. Thesis, University of Cape Town, September 2013.


[24]. Mahfuz Ali Shuvra and Alberto Del Rosso, “Root Cause Identification of Power System Faults
using Waveform Analytics”, accepted for the CIGRE US National Committee 2017 Grid of the
Future Symposium
[25]. Lakota, G., et al., “Real-Time and Short-Term Forecast Assessment Of Power Grid Operating
Limits – SUMO”, 5th International Scientific and Technical Conference - CIGRE B5, Sochi, Russia,
2015.
[26]. Djurica, V., dr. Kosmač, J., Milev, G., “A Multiple Power Line Corridor and Lightning Error-
Ellipse Spatial Processor for Real-Time Correlator”, (2008) 20th International Lightning Detection
Conference, Tucson, USA.
[27]. S. Sander and R. Eichler (Siemens AG), "2D and 3D Visualization Strategies for Distribution
Management", CIRED Paper 0406, June 2015.
[28]. User Interface: Spectrum PowerTM 7 / Siemens AG
[29]. User Interface: SIGUARD PDP / Siemens AG
[30]. U.S. Department of Energy – Electricity Delivery & Energy Reliability, “Dynamic Line Rating
Systems for Transmission Lines”, Topical Report, Smart Grid Demonstration Program, April 25,
2014 (available online at www.smartgrid.gov)
[31]. Integrating Dynamic Thermal Circuit Rating into System Operations: Utility Experiences and
Technology Roadmap. EPRI, Palo Alto, CA: 2011. 1021751.
[32]. Increased Power Flow: Overhead Transmission Line Rating Research Advancements. EPRI,
Palo Alto, CA: 2015. 3002005709.
[33]. Cigre Working Group B2.36 Technical Brochure, “Guide for Application of Direct Real-Time
Monitoring Systems”, June 2012.
[34]. http://lindsey-usa.com/dynamic-line-rating/
[35]. http://info.genscape.com/physical-grid-monitoring
[36]. http://www.lineamps.net/about.shtml
[37]. Alarm Grouping and Event Root Cause Analysis for Transmission Control Centers. EPRI, Palo
Alto, CA: 2016. 3002008275.
[38]. Alarm Management Philosophy for Transmission Operations Control Centers, EPRI, Palo Alto,
CA: 2016. 3002008274.
[39]. DNV GL Smart Cable Guard www.dnvgl.com.


5. DATA INTEGRATION AND MODELING


Success of advanced data management and analytics greatly relies on the accessibility, flexibility,
scalability, comprehensiveness, and efficiency of data modeling for system operations; as the old
saying goes, data is only as good as the way it is packaged. This section examines typical data
modeling processes in some utility companies and regional transmission organizations to explain how
data are assembled in the power industry for secure and reliable real-time grid operations in the EMS,
including the model information and its usage and the model update procedure and lifecycle. In order
to exchange operational data between control centers and throughout the industry, common data
exchange formats and protocols need to be in place.
The most important international industry standards, such as IEC 61850, CIM (Common Information
Model), and COSEM, are presented in the second part of this section. The intentions of the standards
are briefly discussed, followed by specific introduction of each standard and the harmonization efforts
to unify them. With the information and technology explosion in the modern era, the power industry is
also being profoundly affected, and the utility companies have been making substantial efforts to
adapt. This section touches upon the impact of new technologies and new data sources on the data
modeling approaches in this changing technological and regulatory macroscopic environment.
The new technologies and new data sources include synchrophasors, renewable energy, and
equipment health condition monitoring, and their impacts on operations data modeling will be
explored individually. Based on the extensive analysis of real-time operational data modeling, an
example of an actual data integration project is presented.
5.1 DATA MODELING PROCESSES FOR SYSTEM OPERATIONS
In order to ensure reliable and economic operation of the electric transmission network, the real-time
monitoring and control system and the energy management system (EMS) have to establish the
operating schedule and remote control of voltage, power flow, and power system equipment. The EMS
provides information about the past states of the network, and it is capable of exporting snapshots of
its network data models. Thus, accurate and up-to-date models used by the EMS are very important
for reliable system operation. The following two aspects of data modeling processes for system
operations are discussed in detail:
 Model data information and its usage
 Model update procedure and lifecycle
Depending on the interconnection of the regional transmission organization (RTO) or the transmission
system operator (TSO), the EMS models might be described in different layers, such as a detailed
model of the entity's own footprint and less detailed models of neighboring RTOs or TSOs. Modeling a
large system and keeping the model up to date can be very complex, therefore requiring cooperation
between control centers and member companies, as well as neighboring utilities, RTOs, TSOs, etc.
5.1.1 Model information and its usage
The operating entities are required to create and maintain an accurate model of their electric systems.
The computer representation of the power system facilities model requires input data from various
sources such as generator owners (GOs), transmission owners (TOs), load-serving entities, and other
reliability coordinators; these data need to be timely and accurate, because they may impact the
reliable operation of the system.
In general, telemetry data are required to include, but are not limited to, the following:
 Voltages at locations of interest above a certain voltage level (e.g., 69 kV).
 MW and MVAR values for all generating units, transmission facilities, and injections at or above a
certain voltage level and with power flow greater than a certain threshold (e.g., 1 MW).
 MVAR values for synchronous condensers and static VAR compensators.
 Transformer phase angle regulator (PAR) and load tap changer (LTC or TCUL) tap positions
for modeled and controlled transformers.
 Circuit breaker status for each modeled facility at certain voltage level and above (e.g. 69 kV).
 Frequencies at selected stations.


TOs, GOs, and other electric utility companies are responsible for providing the information and data
for accurate modeling of their electrical system. Usually, the data and information need to include, but
are not limited to, the following (PJM Operation Support Division, August 25, 2016):
 Substation topology (including generator substations), facility connectivity, and physical
location upon request (state and Global Positioning System (GPS) coordinates)

 Equipment names or designations 

 Facility physical characteristics including impedances, transformer taps, transformer tap
range, transformer nominal voltages, etc.
 Facility limits and ratings
 Voltage control information and recommended set-points
 Recommended contingencies to be studied
 Protective device clearing times, as appropriate, to support real-time transient stability
analysis
 Buses, breakers, switches, and injections or shunts such as loads, capacitors, SVCs, etc.
 Lines and series devices (reactors or series capacitors) 

 Transformers and phase shifters
 Generator auxiliary, station service, or common service loads (MW & MVAR) 

 Generator step-ups to be modeled for Bulk Electric System (BES) generators 

 Generator “D” curve limits 

 Real-time analog and equipment status telemetry for transmission elements, including, but
not limited to:
o Breaker, switch, or other equipment status required to determine connectivity to real
(MW) and reactive (MVAR) power flow for lines, transformers (high or low-side), and
phase shifters
o Real (MW) and reactive (MVAR) for loads and/or other injections as appropriate
o Reactive (MVAR) power flow for capacitors and SVCs
Figure 5-1 shows typical data involved in EMS modeling for power system operations. The EMS
modeling includes telemetry data, connectivity data, and electrical parameter data. Note that basic
connectivity information is necessary to include external system models. In order to collect the EMS
data, communication, construction design, transmission planning, and operations modeling engineers,
as well as the RTO, need to be involved in this cooperative process, as the figure demonstrates.


Figure 5-1: Dominion Virginia Power EMS modeling data


In this example diagram, RTO receives the input data. Based on the input data and real-time telemetry
data, the real-time model (EMS), steady-state model, and real-time transient stability model are created
and maintained. The EMS model together with real-time locational marginal price (LMP) and security
constrained economic dispatch (SCED) provide secure and economic operating points. By using a state
estimator (SE), the EMS model calculates the real-time state of the electric system. The established
system operating limits can be assessed with the EMS model as well. The steady-state model is usually
maintained in seasonal builds; thus, updated line impedances and connectivity information are
essential. The steady-state estimation requires voltage telemetry information, branch flows, and
breaker status. The real-time transient stability analysis (TSA) is an optional tool for some control
centers. TSA depends on the EMS data and model, and it also depends on the SE solution as the initial
condition to perform transient stability analysis. It should be made clear that Figure 5-1 is an example to
illustrate the common utility EMS data modeling processes, and that there may be reasonable deviations
from the processes shown in the figure in a utility company to serve a specific company organizational
structure or due to historical maintenance practices.
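To illustrate the role of the state estimator mentioned above, the following minimal Python sketch solves a
linear (DC) weighted least-squares estimation for a hypothetical 3-bus system. The line susceptances,
measurements, and weights are made-up illustrative values; a production EMS state estimator works on the
full AC model with observability analysis and bad-data processing.

import numpy as np

# Hypothetical 3-bus DC network: line susceptances (p.u.) for lines 1-2, 1-3, 2-3.
# Bus 1 is the angle reference (theta1 = 0), so the state is x = [theta2, theta3].
b12, b13, b23 = 10.0, 10.0, 10.0

# Telemetered active power flows P12, P13, P23 (p.u.) with measurement noise,
# and their assumed standard deviations.
z = np.array([0.52, 0.98, 0.51])
sigma = np.array([0.02, 0.02, 0.02])
W = np.diag(1.0 / sigma**2)

# Linear measurement model z = H x for the DC approximation.
H = np.array([[-b12,  0.0],   # P12 = b12 * (theta1 - theta2)
              [ 0.0, -b13],   # P13 = b13 * (theta1 - theta3)
              [ b23, -b23]])  # P23 = b23 * (theta2 - theta3)

# Weighted least-squares solution: x = (H^T W H)^-1 H^T W z
x_hat = np.linalg.solve(H.T @ W @ H, H.T @ W @ z)
print("Estimated bus angles (rad):", x_hat)   # roughly [-0.05, -0.10]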
There are many other types of data and information that could be utilized to improve the accuracy of
the EMS model. For instance, the dynamic line rating (DLR) for EMS could maximize the use of the
transmission system while ensuring reliable and efficient market operations. An accurate real-time
ampacity monitoring system will allow the operators to exploit the full capabilities of existing lines.
CIGRE/IEEE dynamic thermal models (IEEE Standard for Calculating the Current-Temperature
Relationship of Bare Overhead Conductors, 2013) require the following weather-related data for
ampacity and sag calculation:
 Ambient temperature
 Solar intensity
 Wind speed
 Wind direction
 Rain rate
Cybersecurity concerns, integration of DLR into system operations, and verification of the financial
benefits [3] are very important issues to address before applying DLR in the EMS. DLR can provide
better knowledge of the actual line rating than that provided by static ratings.
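As a simplified illustration of how the weather inputs listed above feed an ampacity calculation, the Python
sketch below solves a steady-state conductor heat balance of the form I^2 R = qc + qr - qs. The convection
term and all conductor parameters are rough placeholders chosen for illustration; they are not the
coefficients of the IEEE 738 or CIGRE thermal models, which should be used for any real rating.

import math

def ampacity(ambient_c, wind_speed_ms, wind_angle_deg, solar_wm2,
             t_max_c=75.0, diameter_m=0.0281, r_ac_ohm_per_m=8.7e-5,
             emissivity=0.8, absorptivity=0.8):
    """Solve the steady-state heat balance I^2*R = qc + qr - qs for I (amperes)."""
    # Solar heat gain per metre of conductor (projected area times absorptivity)
    qs = absorptivity * solar_wm2 * diameter_m
    # Radiative cooling (grey-body Stefan-Boltzmann approximation)
    sigma = 5.67e-8
    qr = (emissivity * sigma * math.pi * diameter_m *
          ((t_max_c + 273.15) ** 4 - (ambient_c + 273.15) ** 4))
    # Convective cooling: crude forced-convection placeholder; the standards use
    # Reynolds/Nusselt correlations and a proper wind-angle factor instead.
    angle_factor = max(0.4, abs(math.sin(math.radians(wind_angle_deg))))
    h = 4.0 + 4.0 * math.sqrt(max(wind_speed_ms, 0.1))     # W/(m^2*K), placeholder
    qc = h * angle_factor * math.pi * diameter_m * (t_max_c - ambient_c)
    net_cooling = qc + qr - qs
    return 0.0 if net_cooling <= 0 else math.sqrt(net_cooling / r_ac_ohm_per_m)

# Cool, breezy weather yields a much higher rating than hot, still, sunny weather
print(round(ampacity(10, 3.0, 90, 200)), round(ampacity(35, 0.5, 30, 1000)))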
PMUs will play a more important role in the next-generation EMS [4]. Some PMU data are used for
model validation of generator models and line impedances. Other advanced PMU applications [5]
include angular separation, oscillation monitoring, disturbance location identification, and islanding
and resynchronization.
5.1.2 Model update procedure and lifecycle
This EMS model requires significant coordination between system operators and stakeholders.
Summer and winter builds are two updates commonly known as the regularly-scheduled builds in
North America, and the other regions of the world may have similar model update processes. A new
build usually includes two essential types of changes:
 Topology changes
 Parameter changes
TSOs and power utilities are responsible for providing data about all construction projects that will
impact the RTO model. They are typically required to notify the RTO from six months to one year in
advance of system topology changes. The EMS network model is updated accordingly a few times each
year (e.g., four times) to reflect the topology changes. Thus, to ensure that an EMS update includes a
facility addition, revision, or deletion, all model information must be submitted to the RTO or the TSO
accurately and timely. An example of an EMS model build lifecycle is shown in Figure 5-2.

[Pie chart showing the share of winter-build lifecycle activity by period: Jan-Jun 38%, Jun-Sept 25%,
Jul-Sept 19%, Oct-Nov 12%, Dec 6%]

Figure 5-2: EMS winter build lifecycle


 June–September, TOs are required to submit data.
 July–Sept, RTOs package the submitted data.
 October–November, RTOs test new model.
 December, RTOs implement the model build.
 December–January, TOs check implementation changes.
 January–June, cut-ins, outage request, telemetry, and ratings are required in model build.
The summer build has a similar lifecycle but is shifted by six months, starting in December. Interim
builds between the summer and winter builds are often implemented if significant topology and
parameter changes are required in the system [1].
Figure 5-3 shows a common utility EMS modeling update process diagram, detailing a typical lifecycle
of an EMS model update. The TSO or power utility not only has the obligation to meet the model data
update deadlines that the RTO/TSO requires, but it also needs to coordinate its own energization
targets with the update request, typically three to six months prior to the energization target date.


Figure 5-3: Common utility EMS modeling update process


The TSO/electric utility keeps the network change information updated as construction proceeds, and
the RTO/TSO tests and implements the updated data model in its EMS. Typically, one to two weeks
prior to the energization target date, the TSO or power utility will check the implementation against
the outage schedule.
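As a simple illustration of these lead times, the short Python sketch below back-calculates hypothetical
model-update milestones from an energization target date, using the approximate three-to-six-month data
submission window and one-to-two-week implementation check described above. The exact durations are
assumptions for illustration only.

from datetime import date, timedelta

def model_update_milestones(energization_target: date) -> dict:
    """Derive indicative deadlines from an energization target date."""
    return {
        "earliest typical data submission": energization_target - timedelta(days=180),
        "latest data submission": energization_target - timedelta(days=90),
        "start implementation check": energization_target - timedelta(days=14),
        "energization target": energization_target,
    }

for step, when in model_update_milestones(date(2018, 12, 1)).items():
    print(step, "->", when.isoformat())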
5.2 DATA MODELS AND OPEN STANDARDS
5.2.1 Why do we need a common data model?
Evolutions of electrical grids induced by smart grids are accelerating changes in transmission and
distribution. Data exchanges are increasing, market deregulation has led to a proliferation of actors, and
applications are becoming more complex. These are all reasons why actors in the energy markets decide to
use international
standards. Due to the growing need of smart grid stakeholders to deploy solutions offering a semantic
level of interoperability, data modeling appears to be the key element and the foundation of the smart
grid framework. Furthermore, data modeling seems much more stable than communication
technologies, which makes this foundation even more important.
5.2.2 IEC standardized data models
Currently, the IEC framework relies on three main standards for the field of data modeling,
represented in Figure 5-4.


Figure 5-4: Data modeling on smart grid architecture model framework


 The CIM (IEC 61 970, IEC 61 968, IEC 62 325) provides the information model containing
equipment and functions and their properties for power system management, analysis, and
related use cases (generation, market, and grid).
 IEC 61850 provides the information model containing equipment and functions and their
properties for power utility automation use cases.
 The COSEM Companion Specification for Energy Metering provides the information model
containing equipment and functions and their properties for metering and related use cases.
Figure 5-4 also shows the three ongoing harmonization efforts in progress (i.e. the definition of
unified shared semantic sub-areas or formal transformation rules), which are needed to allow an easy
bridging of these semantic domains:
 Harmonization between CIM and IEC 61 850, mostly to seamlessly connect the field to operation
and enterprise level (cf. § 5.2.5).
 Harmonization between CIM and COSEM, mostly to seamlessly interconnect electricity supply and
grid operation.
 Harmonization between COSEM and IEC 61 850, where smart metering may co-habit with power
utility automation systems.
The following subsections discuss the three main standardized data models in more detail.
5.2.2.1 CIM
The Common Information Model (CIM), developed by the IEC (International Electrotechnical
Commission), is an abstract information model that can be used to model an electrical grid and the
variety of equipment used on the grid. By using a common model, utilities and vendors can reduce their
integration costs, which should allow more resources to be applied toward increased functionality for
managing and optimizing the electrical system [6].
The model covers all the data necessary for the study and operation of electrical systems, including
market transactions between companies or between producers and consumers. Operations include grid
control, network composition, maintenance, and evolution, covering the whole business of the energy
sector from generation to distribution and from consumption to marketing. The following standard
series arise from this work:
 IEC 61 968 series, which defines the interfaces for the main elements of a distribution
management system (DMS). A DMS consists of various distributed application components for
the utility to manage electrical distribution networks. These capabilities include monitoring
and control of equipment for power delivery, management processes to ensure system
reliability, voltage management, demand-side management, outage management, work
management, automated mapping, and facilities management. All these applications put


together constitute the Interface Reference Model (IRM). Communication between application
components of the IRM requires compatibility on two levels:
o Message formats and protocols.
o Message contents must be mutually understood, including application-level issues of
message layout and semantics.
 IEC 61970 series, which provides among others a set of general guidelines and infrastructure
necessary for the implementation of the EMS-API (Energy Management System - Application
Program Interface) interface standards. ENTSO-E in Europe, for example, uses the Common Grid
Model Exchange Standard (CGMES), which is a superset of the IEC CIM standard. It was developed
to meet the requirements for TSO data exchanges in the areas of system development and system
operation.
 IEC 62325 series, which specifies the CIM for communications for deregulated energy
markets. The IEC developed these standards as a framework for energy market
communications encompassing two market styles: European style and North American style
markets.
The foundation of the IEC 62325 series is:
 IEC 62325-301 “CIM extensions for markets” standard, which is an abstract model that caters
for the introduction of the objects required for the operation of electricity markets.
 IEC 62325-450 “Profile and context modeling rules,” the international standard for the
generation of profiles.
For each standard, there are degrees of freedom that must be defined. The CIM standard must be
adapted within the energy companies according to their needs. For example, the energy company EDF
described the M-SITE model. This is a UML model derived from the CIM model for network domain
requirements. It defines CIM UML classes as well as specific M-SITE additions to describe networks
and extensions used to support a number of study functions. It is the reference (data dictionary) for
defining classes, associations, and UML attributes used to construct exchange interfaces based on the
M-SITE model. Industrial users must adopt and adapt the standards to their needs while respecting the
basic rules in order to remain CIM compliant.
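To make the abstract model concrete, the short Python sketch below represents a piece of switchgear with
CIM-style classes. The class and attribute names (IdentifiedObject, BaseVoltage, Breaker, mRID, normalOpen)
follow the CIM vocabulary, but the sketch is only illustrative: it is neither the normative IEC 61970 UML nor
the CIM RDF/XML serialization used for actual exchanges.

from dataclasses import dataclass, field
import uuid

@dataclass
class IdentifiedObject:
    name: str
    mRID: str = field(default_factory=lambda: str(uuid.uuid4()))  # master resource ID

@dataclass
class BaseVoltage(IdentifiedObject):
    nominalVoltage: float = 0.0          # kV

@dataclass
class Breaker(IdentifiedObject):
    baseVoltage: BaseVoltage = None      # association to a BaseVoltage instance
    normalOpen: bool = False             # inherited from Switch in the real UML

# Two applications that both understand these class names (e.g. EMS and asset
# management) can exchange the breaker record by mRID instead of maintaining
# point-to-point proprietary mappings.
bv400 = BaseVoltage(name="400 kV", nominalVoltage=400.0)
cb = Breaker(name="Substation A CB01", baseVoltage=bv400, normalOpen=False)
print(cb.mRID, cb.name, cb.baseVoltage.nominalVoltage)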
5.2.2.2 IEC 61850
IEC 61850 is a standard established by TC 57 of the IEC. This standard defines a common
communication architecture for systems inside the substation (process level, bay level, and station
level). Historically, IEC 61850 is based on IEC 60870 and IEEE UCA.
The information exchange mechanisms rely primarily on well-defined information models. These
information models and the modeling methods are at the core of the IEC 61850 series. The IEC 61850
series takes the approach of modeling the common information found in real devices, as depicted in
Figure 5-5 [7]. All information made available to be exchanged with other devices is defined in the
standard. The model provides power utility automation systems with an image of the analog world
(power system processes, switchgear).


Figure 5-5: IEC 61 850 modeling approach


Implementations to reach interoperability have to be based on a common understanding of definitions.
The approach of the standard is to decompose the application functions into the smallest entities, which
are used to exchange information. This is described in the IEC 61850-5.
The granularity is given by a reasonable distributed allocation of these entities to dedicated IEDs.
These entities are called logical nodes (LNs). A logical node corresponds to a function of the electrical
system (for example, overcurrent protection). IEC 61850-7-4 describes the structure of 128 logical
nodes, grouping them into 19 categories; for example, the virtual representation of a circuit breaker is
the class with the standardized name XCBR. The logical nodes are modeled and defined from the
conceptual application view in IEC 61850-5.
The standard does not mandate where the logical nodes are implemented in the substation structure;
this depends on how the functions are implemented in the substation. Logical nodes are themselves
composed of data objects (DataObjects), some of which are mandatory.
Each DataObject has a particular type. The different types of the standard are described in IEC
61850-7-3. The type decomposes DataObjects into DataAttributes. This is the lowest-level description
of the standard.
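The naming hierarchy just described (logical device / logical node / data object / data attribute) can be
illustrated with the small Python sketch below, which decomposes an object reference into its modeling
layers. The example reference and device names are hypothetical; the hierarchy follows IEC 61850-7-2, and
the four-letter logical-node class (here XCBR, the circuit breaker) follows IEC 61850-7-4.

import re

def parse_object_reference(ref):
    """Split an IEC 61850-style reference LDName/LNName.DataObject.DataAttribute."""
    ld, rest = ref.split("/", 1)
    ln, do, da = rest.split(".", 2)
    # A logical-node name is an optional prefix, a 4-letter class, and an instance number
    prefix, ln_class, instance = re.match(r"([A-Za-z0-9]*?)([A-Z]{4})(\d+)$", ln).groups()
    return {"logical_device": ld, "ln_prefix": prefix, "ln_class": ln_class,
            "ln_instance": instance, "data_object": do, "data_attribute": da}

# "Pos" (switch position) of circuit-breaker logical node XCBR, instance 1
print(parse_object_reference("BayQ1Ctrl/Q1XCBR1.Pos.stVal"))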
5.2.2.3 COSEM
DLMS/COSEM is standardized internationally via the IEC and CENELEC technical committees TC13. IEC
TC13's working group 14 on meter data exchange defines the standards issued under the IEC 62056-x
series: the standardization framework (which requires profiles to be created based on existing lower
layers, mainly from ITU, IEC, and IETF), OBIS codes, interface classes, the COSEM application layer,
and other lower layers (see Figure 5-6 [8]).


Figure 5-6: Sources and actors


DLMS/COSEM is the protocol of choice for communication between multi-energy smart meters,
gateways, and backhaul systems, and it underpins the guarantee of interoperable systems. The DLMS
User Association is responsible for editing the DLMS/COSEM specifications and transmitting them via a
D-liaison to the standardization organizations (IEC, CENELEC), as well as for the edition of the
Conformity Test Tool (CTT), control
of product conformity, member support, training, and promotion. The association currently has more
than 300 members from more than 50 countries, and over 700 products from more than 100
manufacturers have been certified to date.
COSEM uses an object modelling technique to represent all functions of the meter, without making any
assumptions about which functions need to be supported, how those functions are implemented, and
how the data are transported. DLMS/COSEM has been designed for separation between the COSEM
object model and the DLMS communication protocol. The formal specification of COSEM interface
classes forms a major part of COSEM. The COSEM object model represents the product’s interface
description.
The definition of OBIS, the Object Identification System, is another essential part of COSEM that
organizes data and methods according to the object model. The communication protocol allows
transmission of coded messages between a server (the product) and a client (a gateway or more
generally a remote IT system) through a scope of services. The various physical media transport layers
are out of scope of the specification but are supported via a set of profiles. The standardized COSEM
interface classes form an extensible library. Manufacturers use elements of this library to design their
products.
Objects are made up of attributes and methods. Similar objects are grouped into interface classes. Some
major categories are storage, access control and management, time and event management,
prepayment, and communication configuration.
The COSEM data model is independent of the underlying media communications layer, and all profiles
use the DLMS/COSEM application layer. The connection manager is independent of the media, such as
TCP or CIASE S-FSK. An adaptation or convergence layer is introduced whenever required. Profiles are
developed according to market requirements, such as CENELEC A band OFDM G3-PLC and S-FSK PLC.
The security approach is end-to-end multilevel, protecting both COSEM payload and xDLMS messaging.
Authentication is supported on both server and client side with configurable security policy, as well as
a fine-grained access control depending on the client role. Specific alarms and alerts for security are
also supported. DLMS/COSEM specifies high-level security algorithms based on NSA Suite B, NIST, and
FIPS, with cryptography based on Diffie-Hellman elliptic curves (ECDSA/ECDH) and variable symmetric
or asymmetric key sizes. Finally, optional compression can be implemented according to ITU V.44.
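As an illustration of how OBIS organizes metering data, the Python sketch below splits an OBIS code into its
six value groups (A through F). The example 1-0:1.8.0*255 identifies total active energy import on an
electricity meter; the sketch handles only the common A-B:C.D.E*F notation and is not a complete OBIS
parser.

def parse_obis(code):
    """Decompose an OBIS code of the form A-B:C.D.E*F into its value groups."""
    a_b, rest = code.split(":")
    a, b = a_b.split("-")
    cde, _, f = rest.partition("*")
    c, d, e = cde.split(".")
    return {"A_medium": int(a),            # 1 = electricity
            "B_channel": int(b),
            "C_quantity": int(c),          # 1 = active energy import (+A)
            "D_processing": int(d),        # 8 = time integral (energy)
            "E_tariff": int(e),            # 0 = total over all rates
            "F_billing_period": int(f) if f else 255}

print(parse_obis("1-0:1.8.0*255"))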


5.2.3 Example of harmonization between CIM and IEC 61850


Every interface between systems that are not covered by the same standard requires a mapping or
transformation from one standard’s format to another in addition to the mapping from proprietary
formats to standard formats when a system interface does not support any standard at all.
This situation has prompted the development of a common semantic model for the IEC 61968/61970
standards (CIM) and the IEC 61850 substation automation standards. The goals of such work are to:
o Enable the entry and update of substation configuration data once.
o Enable access to real-time data from IEC 61850 devices to directly feed SCADA and back-office
systems based on the CIM standards.
Without the harmonization of these standards, the development and implementation of systems and
applications will result in a significant amount of engineering and design that applies to only one
implementation. The harmonization can be done by combining the equipment/topology approach of
CIM with the functional approach of the substation configuration description language (SCL). SCL is
the language and representation format specified by IEC 61850 for the configuration of electrical
substation devices. IEC TC57 is involved in the harmonization of CIM and SCL.
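The sketch below gives a flavor of what such a harmonization layer maintains: an illustrative, non-normative
correspondence between a few IEC 61850 logical-node classes and CIM classes. A real CIM/SCL
harmonization covers far more than class names, including data objects, measurements, naming, and
identifiers.

# Illustrative (non-normative) logical-node-to-CIM class correspondences
LN_TO_CIM = {
    "XCBR": "Breaker",           # circuit breaker
    "XSWI": "Disconnector",      # switch / disconnector
    "YPTR": "PowerTransformer",  # power transformer
    "YLTC": "TapChanger",        # on-load tap changer
    "MMXU": "Analog",            # measured values feeding CIM measurements
}

def to_cim_class(ln_instance_name):
    """Suggest a CIM class for a 61850 logical-node instance name such as Q1XCBR1."""
    for ln_class, cim_class in LN_TO_CIM.items():
        if ln_class in ln_instance_name:
            return cim_class
    return None

print(to_cim_class("Q1XCBR1"))   # -> Breaker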
5.3 IMPACT OF NEW TECHNOLOGIES AND NEW DATA SOURCES ON DATA
MODELING
In the past few decades, innovative technologies have been burgeoning on an unprecedented scale
thanks to both the pull from fast-growing electricity demand, especially the demand for green energy,
and the push on the other side from the quickened advancements in supporting industries such as IT,
telecommunications, and data processing. Among the major technology advancements in the power
industry, renewable energy (solar, energy storage), synchrophasors, and equipment health condition
monitoring are the ones that are changing the paradigm of real-time grid operations and are requiring
significant process improvements in operations data modeling.
5.3.1 Impact of Synchrophasors on operations data modeling
The PMU is widely accepted as the most important measuring device of the future power system, one
that will revolutionize the way power systems are monitored and controlled, and the migration towards
full PMU deployment is underway with accelerating momentum in almost all major utilities and RTOs.
With the exponential growth of synchrophasor data in the control centers of a wide range of utility
companies, there is an ever-increasing emphasis on successfully integrating existing model-based EMS
applications with the PMU measurement-based applications. This is necessary to gain the confidence of
EMS operators, who will continue to use the existing systems and procedures they are accustomed to
while embracing these new measurement-based tools and techniques.
For example, when grid oscillations or sudden rate of change events are detected by the fast
measurement-based applications, these notifications are presented in the traditional EMS alarms
display and will also trigger traditional EMS tasks to allow the operator to drill down using EMS displays
to discover specific details of these events, as well as launching “what if” analyses to determine the
severity of the event. One of the “what-if” analyses is to study potential contingencies and to simulate
transmission stress to determine the most limiting operational limit (OL) for a particular transmission
corridor. This OL can then be used in the faster measurement-based analytic to quickly alert the operator
when an OL is being reached [9].
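As a small illustration of the oscillation detection performed by such fast measurement-based applications,
the Python sketch below estimates the dominant frequency and damping ratio of a synthetic ringdown such
as a PMU might capture after a disturbance, using an FFT peak and the log decrement of successive
oscillation peaks. Production oscillation-monitoring tools use more robust multi-channel methods (e.g.,
Prony or matrix-pencil analysis); this is only a conceptual sketch.

import numpy as np

def ringdown_mode(signal, fs):
    """Estimate (frequency_hz, damping_ratio) of the dominant mode in a ringdown."""
    x = np.asarray(signal, dtype=float) - np.mean(signal)
    # Dominant frequency from the FFT magnitude peak (skipping the DC bin)
    spec = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    f_hz = freqs[np.argmax(spec[1:]) + 1]
    # Damping from the decay of successive positive peaks (log decrement)
    peaks = [i for i in range(1, len(x) - 1)
             if x[i] > x[i - 1] and x[i] > x[i + 1] and x[i] > 0]
    if len(peaks) < 2:
        raise ValueError("not enough oscillation peaks to estimate damping")
    delta = np.log(x[peaks[0]] / x[peaks[-1]]) / (len(peaks) - 1)
    zeta = delta / np.sqrt(4 * np.pi**2 + delta**2)
    return f_hz, zeta

# Synthetic 0.3 Hz inter-area mode with 5% damping, sampled at 30 frames/s for 20 s
fs = 30.0
t = np.arange(0, 20, 1 / fs)
f0, zeta0 = 0.3, 0.05
sig = np.exp(-zeta0 * 2 * np.pi * f0 * t) * np.cos(2 * np.pi * f0 * np.sqrt(1 - zeta0**2) * t)
print(ringdown_mode(sig, fs))    # roughly (0.3, 0.05)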
The growth of new synchrophasor applications for reliable and secure operation of power systems is
already making progress in the electric utility industry. A good number of these new applications have
moved beyond the conceptual and development stages to the proof-of-concept (PoC) stage. A few such
new applications include:
 Situational awareness, visualization, and alarming
 Abnormal angles alarm
 Dynamic oscillation (small-signal oscillation) monitoring
 Line overloads monitoring with considerations for per phase analysis
 Abnormal voltages alarm


 Enhancements of alarming, cognitive task analysis


 Voltage stability indicators
 Enhanced EMS state estimation (SE)
 Down-sampling the synchrophasor measurements and adding the data stream to existing SE
measurements
 Observing dynamic state changes of the grid during disturbances
 Linear state estimation
 Islanding, resynchronization, and blackstart (IRB) simulator
 High-resolution data driven equipment condition/health assessment
 Short-term equipment failure precursor
 Long-term equipment cumulative age calculation
 On-line apparatus electric parameter validation and PT/CT calibration
 GIS integrated enhanced fault location

The new application runs using PMU data from a phasor data concentrator (PDC) and can conceivably
run for every PMU sample set (i.e., 30 to 120 times per second). If this complete data stream is not
available in the EMS, which is likely because down-sampling to 1 sample per second is typically used,
the natural choice is for the application to reside on the same system as the PDC. In such a scenario,
there may be a need for some EMS information to be periodically available to the application and for
the results of the application to be transferred to the EMS. This will necessitate a reliable, secure data
transfer mechanism, ideally using a web service [10].
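A minimal sketch of the down-sampling just mentioned is shown below: a synchrophasor magnitude stream
at 60 frames per second is condensed to one value per second by averaging each one-second window before
it is passed to the state estimator. This is illustrative only; a real PDC-to-EMS interface must also handle time
alignment, data quality flags, and phase-angle wrapping, and the function and variable names are
hypothetical.

import numpy as np

def downsample_to_scada_rate(timestamps_s, values, window_s=1.0):
    """Average a high-rate synchrophasor stream over fixed windows (default 1 s)."""
    timestamps_s = np.asarray(timestamps_s)
    values = np.asarray(values, dtype=float)
    bins = np.floor(timestamps_s / window_s).astype(int)
    out_t, out_v = [], []
    for b in np.unique(bins):
        mask = bins == b
        out_t.append((b + 0.5) * window_s)        # centre of the averaging window
        out_v.append(values[mask].mean())
    return np.array(out_t), np.array(out_v)

# 60 frames/s voltage-magnitude stream condensed to 1 sample/s
t = np.arange(0, 5, 1 / 60.0)
v = 1.0 + 0.01 * np.sin(2 * np.pi * 0.5 * t)
t1, v1 = downsample_to_scada_rate(t, v)
print(len(t), "->", len(t1))                      # 300 -> 5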

With the integration of fast synchrophasor measurements (at rates of 50 to 60 measurements per
second) into the control center data model, the EMS now has real-time visibility of the dynamics of the
power system. This complements the visibility of the steady-state behavior of the grid provided by
traditional SCADA measurements. Many of the new synchrophasor analytics complement and corroborate
traditional EMS analytics and can therefore be used together to jointly validate and fine-tune the
analytics for improved precision and accuracy. For example, the oscillation monitoring analytic using a
network dynamic model can be “married” with its counterpart measurement-based analytic to compare
results and to gradually improve the network dynamic model parameters.
5.3.2 Impact of renewable energy on operations data modeling
In many parts of the world, government-mandated Renewable Portfolio Standards (RPS) require
electricity suppliers to obtain a minimum percentage of their power from renewable energy resources
by a certain date, in response to the recent emphasis on environmental issues and concerns about
global warming. A wide variety of financial incentives are being put in place by governments around
the globe to boost the economy and employment and to mitigate the impacts of the looming climate
crisis. These incentives are expected to spur investments and growth in the wind and
solar industries. All those factors are causing wind and solar energy to expand at an ever-quickening
pace, leading to high levels of penetration in a relatively short time. Utilities and power system
operators must prepare to integrate and manage more of these variable renewable electricity sources
on a much larger scale [11].
Apart from the many benefits of the ever-increasing amount of variable resources, most renewable
resources required under the RPS are characterized by a high level of variability and uncertainty, and
this variability remains a major concern for utilities in terms of grid operations. First of all, the task of
controlling the power system and balancing supply and demand becomes more of a challenge for the
grid operators. In addition to the inherent variability and unpredictability associated with these
resources, the fast ramping associated with wind and solar photovoltaic resources will further
challenge the utility companies.
The task of balancing and controlling the power system is further complicated by the fact that, in
current practice, in most balancing areas, renewable resources are treated as “must take” resources,
requiring the grid operators to look for additional fast responding resources to compensate for the


variability, uncertainty, and the fast ramping of variable resources. In order to accommodate the
increasing penetration levels of variable resources, balancing areas will need to adopt strategies and
implement new tools to provide better visibility into variable resource operations, to better forecast
their expected generation levels on a short-term basis, and to dispatch and control these resources.
The operator of these resources, on the other hand, will require tools with adequate datasets and
advanced data models to interface with the balancing area operators, and to facilitate and automate
the participation of variable resources in various energy and ancillary services markets.
Integrating data from large utility-scale variable generation presents unique challenges. These
challenges call into question the long-standing set of assumptions that determined how utilities
operated the power systems for decades. Power systems are designed to handle significant amounts
of load variations and other uncertainties. Thus, managing risks is not new for grid operators. The
expected increase in wind and solar generation, however, introduces new operational paradigms: how
to ensure system controllability and observability and how to manage new kinds of variability and
uncertainty.
Operational integration deals with how operating characteristics of wind and solar plants are combined
with existing operating policies (e.g. system balancing, ancillary services, ramping resources up/down)
and decision-support tools deployed to support the utility control-room operators who run the power
grid. Operating policies include different heuristics that are used to ensure balance between load and
generation. With increased variable generation, policies on how this balance is maintained can be
expected to change.
Wind and solar energy generation are intermittent resources and, as such, can make it difficult to
operate the power grids to which they are connected. The primary requirement for integrating these
variable generations with utility operations is having access to forecast information about the quantity
and availability of the power output from wind or solar plants. Thus, reliable forecasting systems are
necessary to achieving increased wind and solar energy penetration. The use of forecasting in control
rooms is the key to managing variability and reducing uncertainty, operational impacts, and costs.
Forecasting allows operators to anticipate generation levels from wind and solar plants and adjust the
remaining generation units accordingly. Accurate short-term wind production forecasts enable grid
operators to make better day-ahead operational decisions, including scheduling the mix of generation
resources to be dispatched. What constitutes a challenge is how to integrate wind forecast data with
existing tools used in control centers. The goal of the data modeling and integration must be to
enhance operators’ local and global situational awareness in light of increased variability and
uncertainty. Toward this end, existing EMS, GMS (generation management system), and MMS (market
management system) applications must be enhanced by incorporating wind forecast information and
by making changes to different applications such as unit commitment, automatic generation control,
and special protection schemes. Figure 5-7 below shows the high-level information flow and datasets
of variable renewable energy integration for grid operations.


[Diagram: EMS applications (state estimation, generation/load balance, look-ahead analysis) connected
through an applications interface and an enterprise data bus to renewable-related datasets such as
turbine parameters, facility configuration, wind areas, telemetry types, state estimator setup, ownership
data, generation schedules (conventional, wind, solar, etc.), wind/solar forecasts, and day-ahead
congestion forecasts.]

Figure 5-7: RES data integration/modeling diagram


Wind and solar forecasts are developed using weather models that contain abundant geographically
distributed data. This data provides information needed to support decisions, as well as input for other
forecasting tools such as load and transmission line thermal limit forecasting. Thus, the way this data
is presented to operators will affect their situational awareness (SA). Advanced forecasting systems can be
used to develop early-warning systems that alert grid operators of the likelihood of extreme weather
events so that the operator can take necessary actions.
Equipped with the appropriate data, information, and tools that are fully integrated with renewable
energy forecasts, operators will achieve a higher level of situational awareness and become more
confident about managing variable resources. As a result, they are more likely to run the grid less
conservatively, allowing a greater percentage of the renewable energy to actually be dispatched. The
term “duck curve” was coined by the electric power industry to refer to its system’s load net of
renewable generation resources (i.e. wind and solar) with the belly of the duck being primarily the
effect of penetration of utility-scale solar. The net load that must be supplied by an electric system
from dispatchable resources, including imports (i.e., system load minus load served by utility-scale
variable generation: wind, solar PV, and solar thermal), has been falling steadily; in some regions of
the U.S. and around the world, the "duck belly" already exceeds 50% of total load during the hours of
peak renewable output, much sooner than originally projected.
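To make the net-load concept concrete, the short Python sketch below computes a stylized duck curve from purely illustrative hourly values (none of the figures are measured data): the net load is simply the system load minus utility-scale solar and wind output, the midday minimum corresponds to the belly of the duck, and the steepest hourly increase approximates the evening ramp that dispatchable resources must cover.

# Stylized "duck curve": net load = system load minus utility-scale wind and
# solar output. All values are illustrative (GW), not measured data.
load_gw  = [22, 21, 20, 20, 21, 23, 26, 28, 29, 30, 30, 31,
            31, 31, 30, 30, 31, 33, 35, 36, 34, 30, 26, 24]
solar_gw = [0, 0, 0, 0, 0, 0, 1, 3, 6, 9, 11, 12,
            12, 11, 9, 6, 3, 1, 0, 0, 0, 0, 0, 0]
wind_gw  = [4, 4, 4, 4, 3, 3, 3, 2, 2, 2, 2, 2,
            2, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4, 4]

net_load = [l - s - w for l, s, w in zip(load_gw, solar_gw, wind_gw)]

# The "belly of the duck" is the midday minimum of the net load; the evening
# ramp is the steepest hour-to-hour increase the dispatchable fleet must cover.
belly = min(net_load)
max_ramp = max(b - a for a, b in zip(net_load, net_load[1:]))
print(f"midday minimum net load: {belly} GW, steepest hourly ramp: {max_ramp} GW/h")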
5.3.3 Impact of equipment health condition monitoring on operations data modeling
In recent years, transmission operators have actively pursued having power grid asset/equipment
information available in control centers to facilitate their decision-making. Several types of asset or
equipment data are currently received by operators in control rooms, and operators also desire specific
equipment information, whether direct or derived, to support decision-making aimed at maintaining a
more reliable and efficient power grid. The common business drivers and project justifications shared
across the utility space are discussed in the following:
Equipment health diagnosis technologies, also known as sensor technologies, have made tremendous
progress in recent years, and it makes practical sense to leverage the emerging technologies for the
benefit of grid operations where practicable and cost effective.
Situational awareness has become more and more important in control centers, and critical asset
information will further improve the situational awareness of the grid operators.
Asset condition or health information can provide the operator with “look ahead” capability to
proactively plan ahead for potential security or emergency situations. Operator awareness of the
present condition of equipment has great potential to avoid catastrophic equipment failures, which is
of great benefit to the power system and public through improved overall system reliability.


Asset/equipment information facilitates the operator's decision-making and operational risk management
abilities. For example, if a piece of equipment is shown to be in poor or cautionary condition, the grid
operator can weigh this information, along with other relevant information (e.g., approaching
storms), to decide whether to take the equipment out of service or reduce its loading to maintain
system reliability. The desired asset condition information can also be integrated into the real-time
contingency analysis to develop mitigation strategies. Thus, the asset/equipment information
empowers the grid operator to proactively manage potential reliability risk and enhance overall system
reliability.
On the power grid asset management side, the benefits of asset information being available to grid
operators also include improved system reliability through reduced risk of equipment failures,
improved efficiency through dynamic equipment ratings, and potentially prolonged equipment life
through avoidance of costly failures.
Finally, asset information provides the grid operator with another tool to comply with mandatory
reliability standards in an increasingly complex and challenging operating environment that involves
wider geographic areas, integration of renewables, and other novel supply- and demand-side
technologies, as well as growing cyber security concerns [12].
It has been widely witnessed in the power industry that tremendous progress has been made in
developing and demonstrating technologies that can diagnose the health of power transformers,
circuit breakers, and other power system equipment. The useful information that these technologies
can yield to improve operational awareness has been identified as: a) current equipment condition; b)
sudden change in condition, if any; c) life expectancy, or at the least, likelihood of failure in the short
term (i.e. days); d) loading margin (MW, Mvar); and e) prediction of system operational risk.
A caveat is that grid operators must be protected from information overload caused by receiving large
volumes of raw asset/equipment data. Rather, the asset-related data needs to be filtered, processed, and
analyzed before being sent to the grid operators so that the actionable information derived from such
data can effectively support their decision-making. In a nutshell, grid operators need to see succinct,
actionable information displayed clearly on EMS dashboards or other often-used EMS visualizations,
which will minimize the need for additional training and operating procedures [13]. Figure 5-8 shows
the aggregated responses to an EPRI-conducted survey question, "What specific data/information do
you currently receive in the control center regarding equipment health condition?” which summarizes
the most relevant data the grid operators are interested in obtaining for their improved operation
performance.


Figure 5-8: Equipment data/information currently received in control centers


To concisely present the most relevant information for the critical mission that grid operators execute,
there has also been an initiative to use probabilistic reliability assessment (PRA) to filter and analyze
the asset condition information and make it more concise and actionable for the operators. The PRA
methodology provides a technical approach to assess the risk posed by undesirable
system events during contingencies. Specifically, PRA combines probabilistic measure of the
likelihood of undesirable events with potential consequences of these events to arrive at a reliability
index—probabilistic risk index (PRI). There has been substantial research and development in the
area of calculating, improving, and implementing the PRI algorithms and processes. Among those
efforts, EPRI developed the PRA program as a tool for system planners and operators to perform risk-
based reliability assessment as diagrammed in Figure 5-9, which shows an integrated methodology
condition index calculation integrated into the calculation of the probability of the outage situation.
The PRA starts with the equipment condition indices that are comprised of normal degradation and
abnormal degradation condition components. When these indices are summarized and used to rank a
transformer fleet, they will be referred to as the condition ranking indices (CRI) with a CRI-derived
value used as a simple modifier to the unavailability. In the traditional, deterministic power system
study, the contingencies are ranked according to the severity of their consequences (number of
violations, sum of violations, average violation, margin to voltage collapse, etc.). This approach does
not take into account the likelihood of the system to experience operation limit violations in
contingency situations. The probabilistic approach weights the severity by a probability to yield the
PRIs. Ranking the contingencies according to their probabilities and consequences gives an account
of the risk posed by the contingencies to power system reliability.
The PRA program uses the contingency analysis results as well as the equipment outage information
as the inputs to compute an overall risk, PRI. The PRA program performs a detailed assessment of
undesirable consequences of a contingency such as thermal and voltage violations, voltage collapse,
and load loss. The PRA program can help system planners and operators to identify the most critical
potential grid contingencies and compare their adverse impacts to other contingencies [12].
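The sketch below illustrates, in simplified form, the PRI idea described above: each contingency's severity is weighted by an outage probability, and a condition-ranking-index-derived modifier scales the assumed unavailability of degraded equipment. The names, probabilities, and severity scores are invented for illustration and do not reproduce the actual EPRI PRA algorithms.

# Hypothetical sketch of a probabilistic risk index (PRI): the severity of each
# contingency is weighted by its likelihood, with the likelihood scaled by a
# condition-ranking-index (CRI) modifier for degraded equipment.
contingencies = [
    # (name, baseline outage probability, severity score from contingency
    #  analysis, CRI-derived unavailability modifier >= 1.0)
    ("XFMR-1 outage",  0.002, 80.0, 2.5),   # transformer in poor condition
    ("LINE-12 outage", 0.010, 40.0, 1.0),   # healthy line
    ("BRKR-7 failure", 0.001, 95.0, 1.8),   # breaker with cautionary condition
]

def pri(prob, severity, cri_modifier):
    """Risk contribution = (condition-adjusted probability) x severity."""
    return prob * cri_modifier * severity

ranked = sorted(contingencies, key=lambda c: pri(c[1], c[2], c[3]), reverse=True)
for name, p, sev, m in ranked:
    print(f"{name}: PRI = {pri(p, sev, m):.3f}")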


Figure 5-9: Proposed concept to incorporate equipment condition information indices into PRA
calculations
The grid operators used to receive the asset/equipment information through asset management or
field personnel, and this may still be the case in many utilities where operators sometimes have to
scramble to gather adequate information about the equipment under duress and to perform quick
assessments or detailed analyses, such as the real-time contingency analysis, to arrive at mitigation
measures. This communication method inherently introduces some delay and the possibility of
miscommunication. Now that most of the real-time equipment health information is available in the
control center, this firsthand knowledge will help the grid operators in assessing the situation and the
associated risk. However, organizing and integrating the asset-health-related data into the current grid
operation data models, and merging this information into the real-time SCADA data stream and the
connected network model, remains a widely recognized challenge.
In the past decade, there have been some recommendations within the industry for the network
architecture, communication protocol, and information model needed to integrate and transmit this
equipment health data to grid operators efficiently. These recommendations represent an initial effort
to identify functional specifications for an integrated "equipment health information system for grid
operators," including conceptual visual displays. The use of the CIM for asset health information sharing addresses how the CIM
can be leveraged in defining both the shared semantic data model and the actual data exchanges
required by the integration layers of the framework. There have been some preliminary results proposed
by various research institutes and consulting firms on this forefront endeavor. Figure 5-10 illustrates a
proposal from EPRI for integrating asset health information into the CIM-based EMS data structure for
grid operators and reliability engineers to consume.


Figure 5-10: Overview CIM class model for breaker health integration environment
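As a purely illustrative sketch of the integration idea, the fragment below models a hypothetical asset-health record associated with a CIM-modelled breaker through its mRID, so that condition indices can be joined with network model objects and real-time data. The class and attribute names are assumptions made for illustration; they are not the actual IEC CIM classes or the EPRI extension.

# Illustrative only: a hypothetical asset-health record linked to a CIM-modelled
# breaker via its mRID. Class/attribute names are assumed for illustration and
# do not reproduce the actual IEC CIM or EPRI extension classes.
from dataclasses import dataclass

@dataclass
class Breaker:               # stand-in for the CIM Breaker class
    mRID: str
    name: str
    normalOpen: bool

@dataclass
class AssetHealthInfo:       # hypothetical extension carrying health indices
    asset_mRID: str          # association back to the CIM object
    condition_index: float   # 0 (failed) .. 100 (as new)
    condition_state: str     # e.g. "normal", "cautionary", "poor"
    last_assessment: str     # ISO 8601 timestamp

breaker = Breaker(mRID="_brk-7f3a", name="BRKR-7", normalOpen=False)
health = AssetHealthInfo(asset_mRID=breaker.mRID, condition_index=62.0,
                         condition_state="cautionary",
                         last_assessment="2018-03-14T09:00:00Z")

# An EMS display or contingency-analysis pre-processor could join the two
# datasets on mRID to show the health state next to the switching device.
by_mrid = {health.asset_mRID: health}
print(breaker.name, by_mrid[breaker.mRID].condition_state)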
Power transformers and circuit breakers are two of the most important transmission components due
to their high costs and large impacts on system operations if a failure occurs. The diagram below
presents the CIM extensions for circuit breaker data modeling for grid operations in the EPRI proposal
to include asset health information in the overall network model standard formed by IEC 61970 and
IEC 61968. Figure 5-11 shows the extensions in relation to the framework set up in the CIM for
aggregating and contextualizing the data needed for power system operations [14]. With the asset
health information residing in the same data structure alongside the SCADA real-time data and the
parameters and connections of the major grid components, operators will be better equipped with
the information they need to operate the grid in a more effective, efficient, secure, and reliable way.


Figure 5-11: Location of UML diagrams and modifications for the breaker health integration

5.4 ADVANCED DATA INTEGRATION MODELING CASE STUDY


The big data challenges in power grids discussed above, high level and detailed alike, demand a
comprehensive and state-of-the-art data modeling and integration strategy to meet the challenges and
offer solutions to the issues.
Striving to meet the challenges identified previously, Dominion Virginia Power, one of the largest energy
producers and transporters in the U.S., with an asset portfolio of 28,000 megawatts of power generation
and 6,500 miles (10,400 km) of electric transmission lines, is in the process of establishing a data
integration, modelling, and analytics platform that will integrate the operational data with asset
information to operate its grid more reliably and better manage its assets across electric power
networks. Dominion has acquired an off-the-shelf commercial data historian, in conjunction with a
network model management (NMM) application, as the integrated solution to enable these objectives.
The proposed data platform being implemented in DVP for grid and field operations includes all major
datasets that can improve the situational awareness of the operators, engineers, and technicians for
better decision making to operate the grid with greater reliability, stability, and efficiency. Time-series
data from a wide range of sensors and sources are integrated and brought into the data collector (the
new Dominion data historian) by various interfaces and adaptors, including real-time dynamic data such
as SCADA, PMU, and on-line monitors, off-line data such as field testing and lab testing results, file-
based data such as oscillography COMTRADE files, and static data such as equipment ratings. Non-
temporal data such as SAP that stores work orders, asset information, and maintenance records and
geographical information such as ArcGIS are also being accounted for in the data platform. It should be
mentioned that integration and modeling of other data sources will also be considered if deemed
instrumental to the overall operational standard.
Once all the data are consolidated into the new data historian, this central data repository can function
as a strong engine to drive a great number of applications that enhance data utilization and situational
awareness of the system. The data model can be derived from the integrated data; analytics can be
developed to improve grid reliability and operational economy; event detection and notification can be
configured; visualization can be set up; dynamic asset management can be optimized by data-driven
algorithms; and even switching work orders can be automatically generated for review if an abnormality
is found somewhere in the grid.
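As a minimal illustration of what such a consolidated repository enables, the sketch below joins a hypothetical SCADA loading time series with a static rating and an asset condition flag to raise a simple notification; the tag names and values are invented.

# Minimal sketch (hypothetical tags and values): once SCADA time series, static
# ratings and asset condition data live in one repository, a simple join is
# enough to drive an event notification.
import pandas as pd

scada = pd.DataFrame({
    "timestamp": pd.date_range("2018-06-01 12:00", periods=4, freq="5min"),
    "equipment": ["XFMR-1"] * 4,
    "loading_mva": [180.0, 192.0, 205.0, 214.0],
})
ratings = pd.DataFrame({"equipment": ["XFMR-1"], "rating_mva": [200.0]})
condition = pd.DataFrame({"equipment": ["XFMR-1"], "condition": ["cautionary"]})

merged = scada.merge(ratings, on="equipment").merge(condition, on="equipment")
alerts = merged[(merged["loading_mva"] > merged["rating_mva"])
                & (merged["condition"] != "normal")]
print(alerts[["timestamp", "equipment", "loading_mva", "rating_mva", "condition"]])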
Key to the success of data modelling and NMM implementation in this platform is the power network
model management and the integration of asset-related data into the contextualized business
intelligence and situational awareness data hierarchy for grid operations. The goal is to correlate asset
information with the connectivity nodes in the network models for both planning and operations.
Consequently, Dominion is looking to implement some initial use cases to leverage network model and
connectivity information in the business intelligence data structure, such as dynamic equipment health
assessment for strategic asset management and an optimal VAr advisory for the optimization of reactive
power flow.
Traditionally, utilities gather asset health data from online and offline capabilities, as well as historical
SCADA information. The advent of centralized network model management (NMM) capability has
allowed utilities to not only streamline their network model usage across operation, planning, and
engineering but also to make the network connectivity information available for asset management
purposes. Given the business challenges that Dominion and more broadly the industry faces today and
some of the technology investments it has committed to, the asset and network model integration
solution architecture was developed to meet the business challenges. This architecture reflects the
industry best practices around the technology, standards, and requirements of utility asset and network
model management. The following diagram provides an overview of the architecture. It is believed that,
implemented correctly, it will provide the right foundation for Dominion’s short- and long-term asset
management and operational applications needs as business requirements grow and change over time.

Figure 5-12: Asset and network model integrated solution architecture


Specifically, this data flow diagram and solution architecture in Figure 5-12 propose the following key
concepts:
 The data infrastructure being implemented in Dominion will be the main repository for
operational, offline, and field test data with histories. This would allow Dominion to have
access to a variety of operational data from a single source. Sources for this include but are
not limited to: PMU, DFR, OLCM, field tests, and SCADA.
 The integrated data model will provide the asset hierarchies so that users will be able to
navigate to the desired operational data through a visual and searchable asset structure (a
minimal illustrative sketch of such a hierarchy follows this list). This asset hierarchy model will
be developed using Dominion’s asset data structure and be supplemented by IEC CIM asset-related
model elements.
 The data historian visualization suite: This is where asset-management-related analytics can
be developed and used by end users, operators, engineers, technicians, and managers alike.
A self-service data access model will be the theme behind this concept.


 Network Model Management: This is a standalone application that provides a centralized
environment for network model management and maintenance, leveraging the IEC CIM
connectivity model structure. Vendor applications in this area can import and export a variety
of model file formats including the IEC CIM-based connectivity model. Applications to interact
with this environment will be those of EMS/DMS, planning, and protection.
 Asset Repository: The main purpose of this asset repository (AR) is to store the historical
information about the electrical connectivity information and their relationship to assets. This
information is critical to support the desired use cases for situational awareness and
operational excellence within and beyond the control rooms. There are several options to
implement this asset repository. It can be implemented in the new business intelligence
database server environment as a set of relational tables. It can be implemented as a
standalone database. Or it can be implemented as part of the NMM application server
environment. The choice also depends on Dominion’s desire for the long-term use of this asset
repository. Asset data from SAP and GIS will be integrated into the AR and then made available to
the business intelligence data structure.
 BI Tools: For analytics that go beyond the data in the data historian and the asset repository,
business intelligence (BI) tools can be deployed to support reporting, query, and analysis of
real-time and asset data to establish a picture of the prevailing system operating conditions.
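The following minimal sketch, with invented station and tag names, illustrates the searchable asset hierarchy referred to in the list above: a station/voltage-level/equipment tree whose leaves reference historian tags, so that a user (or an application) can drill down from a station to a piece of equipment and retrieve its operational data.

# Minimal sketch (invented names): a navigable asset hierarchy whose leaves
# reference historian tags, so users can drill from station to equipment to data.
asset_hierarchy = {
    "Station A": {
        "500 kV Yard": {
            "XFMR-1": ["StationA.XFMR1.TopOilTemp", "StationA.XFMR1.LoadMVA"],
            "BRKR-7": ["StationA.BRKR7.SF6Pressure"],
        },
        "230 kV Yard": {
            "LINE-12 Terminal": ["StationA.LINE12.AmpsPhA"],
        },
    },
}

def find_tags(tree, equipment_name):
    """Depth-first search returning the historian tags under a named node."""
    for key, value in tree.items():
        if key == equipment_name and isinstance(value, list):
            return value
        if isinstance(value, dict):
            found = find_tags(value, equipment_name)
            if found:
                return found
    return []

print(find_tags(asset_hierarchy, "XFMR-1"))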
Mixed data from multiple sources, validated network model with mapped asset data, and across-the-
system data semantics: all of those complex data properties are what the current electric utility data
systems typically do not possess to conduct advanced analytics for grid situational awareness. Within
this future-oriented data platform, however, the data integration process addresses those issues and
makes those features available for the advanced analytics, which opens the gate for a great many
leading-edge analytics for a more adaptable, more secure, and more reliable power grid, among which
are 1) advanced/predictive restoration systems, 2) adaptive topology planning, 3) system dynamics
and transients model validation, and 4) wide-area profiling and system management, to mention
but a few. All of these easy wins and advanced grid analytics are examples of the opportunities
that this new data platform makes possible.
5.5 CONCLUSIONS
Currently, the primary sources of data for grid operations in utilities are SCADA data gathered
from various sensing devices, phasor measurement units (PMUs) distributed over transmission and
distribution networks, consumption data collected by smart meters deployed at customer premises,
and intelligent electronic device data collected from individual grid
components. In addition to the data directly obtained from the electricity infrastructure, utilities may
collect data from other resources to facilitate system studies such as weather data, geographic
information system (GIS) data, manufacturer data, electricity market data, and others.

Due to the complexity of power grids, the high-volume, large-variety, and high-velocity characteristics
of utility big data, and the lack of a strategic methodology and plan to alleviate or eliminate the
issues arising from them, electric utilities are consistently confronted with the day-to-day difficulties
listed below in the operation of their grid and the equipment within it.
1) Data silos
2) No semantics layer on top of the data
3) Lack of cross system integration
4) Not all relevant data is shared
5) Difficult to share data and models
6) Excessive time spent validating data/models rather than running studies
7) Data accuracy and inconsistency
8) Common data not in sync and up to date
9) Inability to propagate data changes to all pertinent data destinations

To cope with these challenges, it is first and foremost important for electric utility companies to
properly integrate and model the data from the variety of data sources that they rely on to carry
out the daily operation of the grid in a stable, dependable, reliable, and more efficient fashion. As
described in Section 5.2, utilities are moving toward data descriptions based on international standards
such as IEC 61850, CIM, or COSEM. Standards are likely to evolve and others may emerge over the
next few years, but the important thing is to continue these efforts. Indeed, the normative effort is crucial
because it makes it possible to improve the opportunities for interoperability of electrical systems, which
are increasingly interconnected both between countries and between upstream and downstream with
the arrival of new uses.
5.6 REFERENCES
[1]. PJM Operation Support Division, "PJM Manual 3A: Energy Management System (EMS) Model
Updates and Quality Assurance (QA)," August 25, 2016.
[2]. "IEEE Standard for Calculating the Current-Temperature Relationship of Bare Overhead
Conductors," IEEE Std 738-2012 (Revision of IEEE Std 738-2006 - Incorporates IEEE Std 738-
2012 Cor 1-2013), pp. 1-72, 23 Dec. 2013.
[3]. U.S. Department of Energy, Electricity Delivery & Energy Reliability, "Dynamic Line Rating
Systems for Transmission Lines," Smart Grid Demonstration Program, April 25, 2014.
[4]. Power Systems Engineering Research Center, "The Next Generation Energy Management
System Design," Sept 2013.
[5]. J. Giri, M. Parashar, J. Trehern and V. Madani, "The Situation Room: Control Center Analytics
for Enhanced Situational Awareness," IEEE Power and Energy Magazine, vol. 10, no. 5, pp. 24-
39, Sept.-Oct. 2012.
[6]. EPRI CIM Primer 3rd edition Technical Report – 2015
[7]. IEC 61850-7-1 : Basic communication structure – Principles and models – 2011
[8]. DLMS WEBSITE : http://dlms.com/index2.php
[9]. V. Madani, et al., "Advanced EMS Applications Using Synchrophasor Systems for Grid
Operation," T&D Conference and Exposition, 2014 IEEE PES, Pages: 1 – 5 DOL
10.1109/TDC.2014.6863246.
[10]. Jampala, et al., "Practical Challenges of Integrating Synchrophasor Applications into an EMS,"
2013 IEEE PES Innovative Smart Grid Technologies Conference (ISGT), Pages: 1 - 6, DOI:
10.1109/ISGT.2013.6497847.
[11]. F. Albuyeh, "Integrating Variable Renewable Generation in Utility Operations," Power and
Energy Society General Meeting, 2010 IEEE, DOI: 10.1109/PES.2010.5590118.
[12]. Integration of Asset Information into Control Centers: Prioritization of Asset Information and
Concept Development. EPRI, Palo Alto, CA: 2012. 1024257.
[13]. Integration of Equipment Condition Information into Control Center Operations: Survey on
Equipment Condition Information for Transmission Operators. EPRI, Palo Alto, CA: 2014.
3002004614.
[14]. “Standard Based Integration Specification, Common Information Model Framework for Asset
Health Data Exchange”, EPRI 2014 Technical Update


6. DATA QUALITY AND VALIDATION


6.1 INTRODUCTION
In the modern digitized world, most advanced industrial operations depend on information systems for
control and analysis. Data is increasingly being considered a valuable asset, of equal worth to physical
assets, and considerable costs are involved in collecting, storing, and acting upon the data. As with
physical assets, the quality of the data is a prerequisite for ensuring reliable operations. Additionally,
high data quality must be ensured to enable reuse of data and to enable analytics on historical data.
Information and analytics-driven organizations, with no traditional physical operational commitment,
rely solely on high-quality data to stay competitive.
Data value chains are common in industry, production, and business operations. Data is born, follows
a value chain, and is then refined and prepared for several different tasks. Thus, the user or system
utilizing the data does not necessarily have knowledge of the data origin, quality level, weaknesses,
legal or contractual obligations, semantics, changes in the system capturing data, and the context in
which the data was born. In order to ensure both reliable operations and valid analytics, it is
important that the data quality is assessed and continuously monitored for all critical systems and
services. Organizations should define data quality policies, and processes should be in place to
support these policies. The requirements for data quality and dataset definitions must be clearly
stated, and measurement points should be implemented in order to verify compliance with
requirements. Ideally, the measures should be in effect across the entire organization to ensure
optimization and to avoid data quality assessment being performed in silos.
The analytics applications and associated visualizations described in previous sections will provide
reliable and useful actionable information to system operators only as long as the input data they are
fed with maintains a high level of quality. Hence, it is critical that the quality of all types of data used
in operator support applications is assessed and continuously monitored.
With the exceptional growth of data from sensors, intelligent electronic devices, and other sources used
in these analytics applications, data quality has become a prevalent concern. Indeed, issues such as empty
values, redundancy, inconsistency, and inaccuracy have been increasingly detected in data from those
sources. These data problems hinder successful implementation and deployment of situational
awareness applications, as they have a direct impact on the accuracy and validity of the analysis.
There is not a universal definition of data quality as applied to power system applications. One
commonly used data quality concept derived from the ISO 9000:2015 standard states that data
quality measures the degree to which a set of characteristics of data fulfils requirements. Examples of
characteristics are: completeness, validity, accuracy, consistency, availability, and timeliness.
Requirements are defined as the need or expectation that is stated, generally implied, or obligatory.
Therefore, per this broad concept, essentially any aspect of data that bears on its ability to satisfy a
given purpose falls under the umbrella of data quality [1, 2].
Establishing good practices to maintain data quality is an institutional issue. Organizations need to
define and put in place data quality policies and processes to ensure data quality requirements are
complied with at various levels. Ideally, the measures should be in effect across the entire organization to
ensure optimization and to avoid data quality assessment being performed in silos. Different
frameworks, methodologies, and approaches for data quality assessment and improvement have been
developed.
This chapter presents a general discussion of data quality issues in power system operations and
describes methodologies for data quality assessment and improvement. Consistent with the
overall approach of this technical brochure, references for detailed information are provided for the
interested readers.
6.2 DATA QUALITY PROBLEMS

Power system data is characterized by large volumes and many types. It is mainly derived from
control, production, and management systems, and includes monitoring data, smart meter collections,
device maintenance information, SCADA, Internet of Things (IoT), and energy management system
data.


This data is used to support different kinds of applications, such as status sensing, load forecasting,
and user behavior analysis. In addition, many enterprise management systems (e.g. ERP systems) are
established to record and produce enterprise data such as financial and human resources data.
However, all of the data mentioned above exhibit a variety of data quality problems that may impact
data analysis. The main manifestations are incompleteness, inaccuracy, non-normative data,
inconsistency, and other aspects [5-7]. For example, the following are some common
data quality problems found in smart meters:
 Incompleteness: The smart meter needs to collect multi-point data every day, including
positive and negative active data, reactive power data, three-phase voltage data, etc.
However, some data collection points are missing.
 Inaccuracy: The equipment operation time is not accurate. In particular, the operation time
information is not updated after a transmission line is replaced or fails. The customer
contact information (telephone number and address) is not accurate, and it is not updated
immediately after changes occur.
 Non-normative: Equipment manufacturer data is not standardized; multiple names may exist
for the same manufacturer.
 Inconsistency: The inherent connectivity association from substation to line, to transformer,
to substation area, to key users is not consistent across data levels.
 Non-uniqueness: The equipment master data is maintained by multiple sources in the
infrastructure, materials, production, and dispatch systems. For example, one piece of
equipment may have different names and encodings.
Another example of data quality problems relates to a materials management system; these problems
mainly concern non-standard data entry, incompleteness, and duplicate data entry:
 Incompleteness: Fields such as material number, start time, production time, etc. are
empty.
 Inaccuracy: The contract amount is less than 0, or more than 10 billion.
 Non-normative: The encoding method does not conform to the specification. The data type is
not standardized.
 Inconsistency: The same information such as line loss rates may be different in the statistics
of the financial, planning, and operational system.
 Non-uniqueness: The same information may be entered in multiple systems.
Because power systems are interconnected and networked, data inconsistency problems are bound to
occur. Due to personnel negligence, database failures, communication interruptions, and other causes,
data associations may be missing or mismatched, resulting in data quality problems such as data loss
and data errors. Differences in enterprise information levels, data models, and other factors also cause
data interoperability problems. These data quality problems have direct impacts on the results of
subsequent data-analytics applications in production operation. Therefore, it is necessary to focus on
improving power data quality.
6.3 DATA QUALITY ASSESSMENT

Data quality assessments are typically performed both bottom-up and top-down. The bottom-up
approach utilizes profiling tools and schema inspections to perform a generic and usage-agnostic
assessment. The bottom-up approach will reveal indicators of potential areas of data inconsistency.
However, due to the generic nature of the method, this approach is also prone to detect false
positives. The top-down approach will involve domain experts and actual usage scenarios to detect
inconsistencies. Although this method does not typically result in false positives, it will not lend itself
easily to automation and hence might not be as conclusive as the bottom-up approach. Normally, the
bottom-up assessment provides valuable input to the top-down assessment, and hence both are
required to perform an exhaustive evaluation.
Performing an initial iteration is recommended in order to validate input data flow, map data paths
and all transformations, identify enhancements and refinements on data, collect and use metadata
and schemas involved, and document this as part of the data quality assessment. The next iteration
will detect any relationships between data feeds. These could include: (i) a sensor measuring power
supply to a pump can be correlated to sensor data measuring performance of a pump, or (ii) starting
an engine, resulting in a rise in engine oil temperature being detected by sensors. Subsequent
iterations can assess whether the data quality is sufficient for the algorithms using them.
If simulations and/or digital twins are used, these models should also be quality assured. For this
purpose a digital twin can be viewed as a dataset in the context of the data quality method. The first
assessment provides a baseline for measurements and the improvement cycle. ISO 8000-8 defines
three categories for data quality measurements: syntactic, semantic, and pragmatic. Information and
data quality are defined and measured according to these categories.
For organizations with well-defined requirements, the assessment will tend towards that of the
assessment model for ISO 8000-8. In this case, the data quality dimensions are categorized according
to ISO 8000-8, and the appropriate methods are employed:
 Automatic syntax and integrity checks for syntactic quality.
 Correlation with reference models and sampling techniques for semantic quality.
 Algorithm sensitivity to data quality issues, user feedback, and focus groups for pragmatic
quality.
As described above, data quality can generally be understood as “the extent to which a set of intrinsic
properties of data meets the requirements.” The intrinsic properties can be decomposed into five
dimensions:
 Timeliness: the extent to which data generation and transfer meet the requirements of
management and usage.
 Integrality: the extent to which the data has or maintains its intrinsic information.
 Compliance: the extent to which the type, format, dimension, and accuracy of the data meet
the normative design.
 Accuracy: the degree to which the data truly reflects the actual information.
 Consistency: the extent to which data associations obtained by different approaches agree with each other.
Different methods are applied to assess data quality problems at the various levels of the logical
hierarchy (control, operation, management, and analysis layers). For example, an information matching
method is usually adopted to discover data quality problems in the control layer, while data analysis
and rule checking methods can be used for the operation, management, and analysis layers. Some of
these methods are briefly described below.
6.3.1 DATA INTERPOLATION

In power transmission and distribution systems, measurement of electrical magnitudes is generally
redundant due to the coexistence of various monitoring systems, such as the SCADA system, PMU
systems, power quality monitoring systems, energy metering systems, and so on. The SCADA system
has relatively complete monitoring coverage and a relatively high measurement frequency, while
PMU-based systems have a relatively small number of monitoring points but high measurement
accuracy. When the same magnitude (e.g. voltage) is measured by both SCADA and PMU at the same
time, the SCADA voltage data (mainly voltage amplitude) can be compared to the voltage data measured
by the PMU. When data from the two systems do not match, data interpolation can be used to reconcile
the SCADA data and PMU data.
For measuring and checking voltage data, the following two methods can be used.
First, during power system operation, if the load does not change abruptly, the voltage amplitude at a
given SCADA sampling instant is usually closely related to the amplitudes at the preceding and
following instants. Therefore, a suspect voltage amplitude at a certain time can be detected by
comparing it with the amplitudes at adjacent times.
Second, the voltage amplitude of a node can be computed from the measurement information and line
parameters of multiple topologically related nodes (i.e. at different locations or over different areas). If
this calculation of the voltage amplitude is accurate, it can be used to detect and repair the
corresponding SCADA voltage data.
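A minimal sketch of these checks, using synthetic per-unit values, is given below: a SCADA voltage sample that jumps relative to both of its neighbours is flagged as suspect and repaired with the co-located PMU measurement when one is available, or with linear interpolation of the adjacent SCADA samples otherwise. The threshold and all values are illustrative assumptions.

# Minimal sketch (synthetic data): flag SCADA voltage samples that deviate from
# both neighbours, then repair them with a co-located PMU value when available,
# falling back to linear interpolation of the adjacent SCADA samples.
scada_v = [1.02, 1.01, 0.82, 1.02, 1.03]       # per-unit; one bad sample at index 2
pmu_v   = [1.021, 1.012, None, 1.019, 1.028]   # PMU reference, missing at index 2

THRESHOLD = 0.05  # allowed change between adjacent SCADA samples (p.u.), assumed

repaired = list(scada_v)
for i in range(1, len(scada_v) - 1):
    jump_prev = abs(scada_v[i] - scada_v[i - 1])
    jump_next = abs(scada_v[i] - scada_v[i + 1])
    if jump_prev > THRESHOLD and jump_next > THRESHOLD:          # suspect sample
        if pmu_v[i] is not None:
            repaired[i] = pmu_v[i]                               # use PMU reference
        else:
            repaired[i] = (scada_v[i - 1] + scada_v[i + 1]) / 2  # interpolate neighbours

print(repaired)   # the bad SCADA sample has been replaced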
6.3.2 DATA PROFILING

Data profiling is the systematic analysis of data structure, data content, and data relationships
[11-13]. Through this method, it is possible to empirically examine potential data problems.
Figure 6-1 shows the implementation of this analysis.

[Figure: data profiling decomposed into a data structure profile (table profile and description; column profile with minimum/maximum length), a data content profile (data value distribution, data pattern distribution, placeholder format; assessing integrality, effectiveness, and consistency), and a data relationship profile (foreign key analysis, cross-table correlation, dependency, and conflicts).]

Figure 6-1: Implementation of data analysis

Through data profiling methods, the following abnormal values can be identified:
 High frequency value: its frequency is greater than the expected value.
 Rare value: its frequency is lower than the expected value.
 Missing values: the number or percentage of empty values is higher than expected.
 Frequent pattern: its frequency is higher than that of the expected pattern.
 Rare pattern: its frequency is lower than that of the expected pattern.
 Value cardinality problem: the number of distinct values in a column is higher or lower than
expected.
 Accident value: a value that does not conform to the defined range constraint.
 Defaults: a high-frequency value or the empty value used as the default value.
 Orphan record: a record that has a foreign key value that does not match any primary key.
 Mapping problems: the consistency of values between columns, within a single table or across
tables, does not conform to expectations.
 Duplicate records.
 Association relation: the association does not follow the defined mapping expectation (for
example, a primary key record is mapped to more than one foreign key record although the
association requires a one-to-one mapping).


Based on data profiling, validation rules can be set up while locating data problems, and the quality
assessment model can be used to conduct comprehensive quality diagnosis for the analysis dataset.
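The short pandas sketch below, built on invented records, illustrates a few of the profiling checks listed above: null counts, value-frequency distribution, out-of-range ("accident") values, and orphan foreign-key records. Field names and thresholds are assumptions for illustration.

# Minimal data-profiling sketch (invented records): null counts, value
# frequencies, range violations and orphan foreign-key records.
import pandas as pd

meters = pd.DataFrame({
    "meter_id": ["M1", "M2", "M3", "M4"],
    "manufacturer": ["AcmeCo", "ACME Co.", None, "AcmeCo"],   # non-normative / null
    "contract_amount": [1200.0, -50.0, 800.0, 3.0e10],        # out-of-range values
    "customer_id": ["C1", "C2", "C9", "C1"],                  # C9 has no parent record
})
customers = pd.DataFrame({"customer_id": ["C1", "C2", "C3"]})

print(meters.isna().sum())                                # completeness profile
print(meters["manufacturer"].value_counts(dropna=False))  # value distribution
out_of_range = meters[(meters["contract_amount"] < 0) |
                      (meters["contract_amount"] > 1e10)]  # accident values
orphans = meters[~meters["customer_id"].isin(customers["customer_id"])]
print(out_of_range[["meter_id", "contract_amount"]])
print(orphans[["meter_id", "customer_id"]])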
6.3.3 DATA QUALITY ASSESSMENT FRAMEWORK

In addition to data profiling results, assessing data quality requires electrical business logic knowledge
(such as electricity billing requirements and the relation between meters and measuring points) to
determine whether the results are correct. Based on the data profiling results and this business logic
knowledge, data quality can be analyzed and evaluated with tools or program code to provide a sounder
basis for data quality analysis. A data quality assessment framework can be developed along the
timeliness, integrality, compliance, accuracy, and consistency dimensions [6, 13, 15]. Each dimension
can be described by several rules, as shown in the figure below. Some of the rules are given in detail
after the figure, and a minimal illustrative sketch of rule checking follows that list.

[Figure: the data quality assessment framework organizes rules by dimension: Timeliness (data timeliness rule); Integrality (record integrity, non-blank, primary key, and foreign key rules); Compliance (type, format, dimensional, and precision rules); Accuracy (range, equivalent function dependence, logic function dependence, and code rules); Consistency (equivalent consistency and logical consistency rules).]

Figure 6-2: Framework of Data Quality Assessment

 Data timeliness rule: the generation and circulation of data should meet (or meet certain
conditions) timeliness requirements of management and use.
 Record integrity rule: the number of records in the tested dataset should (or meet certain
conditions) meet the business expectations.
 Non-blank input rule: the tested data in the dataset should (or meet certain conditions) not
be null.
 Primary key rule: when a field of the tested dataset is the primary key, the value of the data
should uniquely identify a record.
 Foreign key rule: when a field of the tested dataset is the foreign key, that field should (or
meet certain conditions) reference the primary key of another data table.


 Type rule: the data type of the tested dataset should (or meet certain conditions) meet the
field type requirements predefined by the business system.
 Format rule: the format of the tested data in the dataset should (or meet certain conditions)
meet the field format requirements predefined by the business system.
 Dimension rule: the dimension of data should (or meet certain conditions) meet the
dimensional requirements predefined by the business system.
 Precision rule: the precision of numerical data should (or meet certain conditions) meet the
precision requirements predefined by the business system.
 Value-domain rule: data values of the tested dataset should (or meet certain conditions) occur
within a certain range, and the range can be determined by one or more means such as the
data dictionary, business knowledge, distribution, and variation of historical data.
 Equivalent function rule: in the same data table, one data item should (or meet certain conditions)
be calculable from one or more other data items, and this equivalence calculation relationship
must be in line with the business characteristics.
 Logical function dependence rule: in the same data table, one data item should (or meet certain
conditions) satisfy some logical relationship (greater than, less than, earlier than, later than, etc.)
with one or more other data items, and this logical relationship must be in line with the business
characteristics.
 Code rule: the value of the data from the tested dataset should (or meet certain conditions)
conform to the constraints of the source business system’s design.
 Equivalency consistency dependence rule: in different data tables, one data item should (or meet
certain conditions) be calculable from one or more data items in other data tables, and this
equivalence calculation relationship must be in line with the business characteristics.
 Logical consistency dependence rule: in different data tables, one data item should (or meet
certain conditions) satisfy some logical relationship (greater than, less than, earlier than, later
than, etc.) with one or more data items from other data tables, and this logical relationship needs
to be in line with the business characteristics.
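The minimal sketch below, referred to earlier, illustrates rule-based checking in the spirit of this framework: a few invented rules (non-blank, range, and logical dependence) are applied to a small record set and a compliance ratio is reported per rule. A production implementation would organize the rules per dimension and feed the results into the overall assessment.

# Minimal rule-checking sketch (invented rules and data): each rule returns the
# records violating it; a compliance ratio per rule approximates the dimension
# scores of the assessment framework.
records = [
    {"asset_id": "T1", "install_date": "2001-05-10", "retire_date": "2031-05-10", "rating_mva": 100},
    {"asset_id": None, "install_date": "2010-01-01", "retire_date": "2005-01-01", "rating_mva": -5},
]

rules = {
    "non_blank(asset_id)":     lambda r: r["asset_id"] is not None,
    "range(rating_mva > 0)":   lambda r: r["rating_mva"] > 0,
    "logic(install < retire)": lambda r: r["install_date"] < r["retire_date"],
}

for name, check in rules.items():
    violations = [r for r in records if not check(r)]
    compliance = 1 - len(violations) / len(records)
    print(f"{name}: compliance {compliance:.0%}, violations {len(violations)}")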

6.4 DATA QUALITY PROBLEM CORRECTION

During the data analysis and inspection process, many data quality problems are revealed. Having
identified the data quality problems that may adversely impact the reliability and validity of analytics
applications, the next step is to design and implement measures to solve the issues found [16, 17].
Figure 6-3 shows the main steps to correct data quality problems. The tasks involved in each of these
steps are described next.

[Figure: a cycle of four steps: impact assessment; correction and cleaning; scavenging of essential causes; monitoring and prevention.]

Figure 6-3: Concept to correct data quality problems


6.4.1 IMPACT ASSESSMENT

First, consider the following characteristics of the data quality problem [15, 18]:

 Scope of the influence, i.e., the extent to which business processes are impaired by the
problem.
 Feasibility of correction, i.e., the possibility of correcting the data quality problem in question.
 Feasibility of prevention, i.e., the possibility of eliminating the root cause of the problem or of
identifying occurrences through continuous monitoring.
6.4.2 CORRECTION AND CLEANING

The correct-and-clean process exists in almost all stages of data collection and storage, integration,
analysis, and application. Figure 6-4 shows the entire flow of power system data from collection to
application. In the different stages of data correction and cleaning, the methods and emphases
differ [6, 15].

Figure 6-4: Data flow and problem correction control points

a. Data collection process

Abnormality judgment and deviation correction at this stage are usually based on multi-source
information from different sampling times and different monitoring sources at topologically related
nodes, as well as on business common sense [18].
b. ETL process

Data is not perfect; there is a gap between the raw data and the final result. Data usually needs to be
cleaned, converted, and sorted by the ETL (extract, transform, load) process. ETL includes three main
steps. The first is data extraction, which implies reading data from the original business systems, which
may run on different operating platforms or different databases. The second is data conversion, which
entails converting data under a pre-defined set of rules, including field merge and split operations,
sorting, default value assignment, data aggregation, and so on. The third step is data load, during which
the converted data is loaded into the data warehouse [15, 19].
In the data conversion process, operations for resolving data quality problems include:
 Data integrality check and incomplete data filling
 Incorrect data check and repair
 Duplicate data inspection and handling


 Inconsistent data conversion


 Data granularity transformation
 Calculation based on business rules
 Data desensitization
The data quality problem correction operation is an iterative process. Whether to perform data
extraction from the source business system to the ODS (operational data store) or the data warehouse,
and whether to perform repair operations on data that do not meet the data quality rules, must be
confirmed by the managers of the original business system and of the data center.
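The following minimal sketch, with invented field names, walks through the three ETL steps described above: rows are extracted from a raw source, converted under simple rules (default-value assignment, field merge, duplicate handling), and loaded into a list standing in for the data warehouse table.

# Minimal ETL sketch (invented fields): extract raw rows, apply conversion rules
# (default assignment, field merge, de-duplication), then load into a target
# list standing in for the data warehouse table.
raw_rows = [
    {"id": "A1", "kwh": "120.5", "region": "North", "city": "Oslo"},
    {"id": "A1", "kwh": "120.5", "region": "North", "city": "Oslo"},    # duplicate
    {"id": "A2", "kwh": None,    "region": "South", "city": "Bergen"},  # missing value
]

def transform(row):
    kwh = float(row["kwh"]) if row["kwh"] is not None else 0.0   # default assignment
    return {"id": row["id"],
            "kwh": kwh,
            "location": f'{row["region"]}/{row["city"]}'}         # field merge

warehouse, seen = [], set()
for row in raw_rows:                                              # extract step
    clean = transform(row)                                        # conversion step
    key = (clean["id"], clean["kwh"], clean["location"])
    if key not in seen:                                           # duplicate handling
        seen.add(key)
        warehouse.append(clean)                                   # load step

print(warehouse)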
c. Data analysis and application

In the ETL process, the data is cleaned, converted, and collated, and is then stored in the data
warehouse. However, the ETL operations alone are not enough to meet the data quality requirements of
the subsequent data analysis or BI (business intelligence) analysis topics. There may still be data quality
problems in the different application topics, such as data mismatches and logical errors. These problems
should be corrected to satisfy the power business requirements. Thus, data quality problems should be
considered differently during the different phases of the power data lifecycle, as shown in
Figure 6-5 [6, 20-22].

Figure 6-5: Data cleaning in data analysis and application phase


Beyond the ETL process, the cleaning modes of the data analysis/BI analysis shown in the figure above
include, but are not limited to, the following:

 Replacement with standard content
 Uniform field format and content
 Null field assignment
 Special character substitution
 Multi-column logic operation or splicing
 Duplicate record removal
 Regular information extraction


 Data information enrichment


 Replacement with regular expression
 Parse special format data
 Address standardization
 Model-based continuous numerical value filling
When solving data quality problems in data warehouses and big data platforms, many challenges will
be faced, such as the huge amount of data and the diversity of data types. It is recommended to run
batch, schedulable, and fast cleaning algorithms on a distributed computing platform.
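As a small illustration of two of the cleaning modes listed above, the pandas sketch below applies a regular-expression replacement to normalize a manufacturer field and fills a missing continuous value by interpolation; the field names and values are invented.

# Minimal cleaning sketch (invented fields): regular-expression normalisation of
# a manufacturer field and interpolation-based filling of a continuous value.
import pandas as pd

df = pd.DataFrame({
    "manufacturer": ["ACME Co.", "Acme  Co", "acme co."],
    "oil_temp_c": [55.0, None, 61.0],
})

# Regular-expression replacement: strip punctuation and extra spaces, uppercase.
df["manufacturer"] = (df["manufacturer"]
                      .str.replace(r"[^A-Za-z0-9 ]", "", regex=True)
                      .str.replace(r"\s+", " ", regex=True)
                      .str.strip()
                      .str.upper())

# Continuous value filling (here: simple linear interpolation of the gap).
df["oil_temp_c"] = df["oil_temp_c"].interpolate()

print(df)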
6.4.3 SCAVENGING OF ESSENTIAL CAUSES

In order to substantially solve a quality problem, it is necessary to analyze where the problem
comes from and to find the best place to repair it and eliminate its root causes. If the sources and the
best place can be identified, it is possible to assess and correct the process so as to eliminate the
essential causes of the quality problem.
The possible essential causes include:
 Poor communication channel conditions
 Parameter setting errors
 System running failures
 Extraction and conversion errors
 Terminal failures
 Human factors
 Lack of verification in software systems
The sources of data are diverse, the data models are large, and the conversion processes are
complex. To identify the data sources and the best place for repair, metadata-management-based
data source tracking technology can effectively improve performance.
Assessing and eliminating the quality problem causes can be considered from the following points:
 Assess the workload of every candidate program.
 Choose one as the repair program.
 Determine the repair time.
 Design the development plan.
 Design the test plan.
6.4.4 MONITORING AND PREVENTION

If the workload of eliminating the above root causes goes beyond the organization’s capabilities,
resources, or requirements, then monitoring procedures should be established for known data quality
issues. When an error is detected by the monitoring procedures, the appropriate person can be notified
to take appropriate action to contain or eliminate the error so that data processing can continue normally.
6.5 CONCLUSIONS
Data quality has often not been fully considered in the initial design phase of system functions in
power control systems and enterprise management systems. In the process of data integration,
analysis, and application, a series of problems such as data inconsistency, inaccuracy, and
incompleteness must therefore be faced and handled. Addressing these problems is extremely important
if data-analytics applications in system operations are to achieve the desired effect.
For data quality measurement, various methods, including multi-source information matching, data
analysis, and rule testing, can be used. For data quality correction, a set of techniques, including
impact assessment, correction and cleaning, scavenging of essential causes, and monitoring and
prevention, has proven feasible for improving data quality so that it can satisfy the needs of data
analysis and application.
DNV GL proposed that the data quality assessment and improvement process include defining the
scope, data exploration and profiling, data quality assessment, organizational maturity assessment,
data quality risk assessment, and risk-based data quality improvement [6]. These measures have a
positive effect on improving the enterprise’s data quality, maturity, and risk management.
6.6 REFERENCES
[1]. ISO 8000-8 Information and data quality: Concepts and measuring
[2]. ISO 9000 Quality management
[3]. Data quality assessment framework, DNVGL-RP-0497, Jan. 2017
[4]. Yang et al. Journey to data quality, MIT Press 2006
[5]. X. Chen, et al., Integration of IoT with smart grid, IET ICCTA 2011, Beijing, China, 2011.
[6]. T. Zhao, et al. Data quality assessment and improvement techniques for power system, SGCC
Technical Report, 2017.
[7]. G. Liu, et al., "Evolving graph based power system EMS real time analysis framework," IEEE
ISCAS 2018, Italy, 2018
[8]. H. Hang, Q.-L. Zhu, Development analysis and prospect of data quality control in smart grid,
Science & Technology Information, pp. 92-93, Jul. 2012
[9]. K.-Y. Liu, et al, Detection and evaluation of SCADA voltage data quality in distribution network
based on multi temporal and spatial information of multi data sources, Power System
Technology, pp. 3169-3175, Nov. 2015
[10]. NASPI, PMU Data Quality: A framework for the attributes of PMU data quality and a
methodology for examining data quality impacts to Synchrophasor applications, Mar. 2017
[11]. DAMA United Kingdom, The six primary dimensions for data quality assessment, Report, Oct.
2013
[12]. Sadiq, Shazia, Handbook of data quality: research and practice, Springer, 2013
[13]. C. Batini and M. Scannapieco, Data and information quality: Dimensions, principles and
techniques, Springer, 2016
[14]. S. Keller, et al., The evolution of data quality: understanding the transdisciplinary origins of data
quality concepts and approaches, Annual Review of Statistics and Its Application, vol. 4, pp 85-
108, 2017
[15]. Q/GDW 11570-2016, The common criteria of data quality evaluation based on power grid
operation data, Enterprise Standard of SGCC, 2016
[16]. D. Loshin, The practitioner's guide to data quality improvement, 2011.
[17]. H. Liu, et al, Research on the advanced computing method for supporting large data quality
assessment and improvement, Advances in Computer Science Research, Jan. 2017
[18]. Y.-W. Cheah, and B. Plale, Provenance quality assessment methodology and framework,
Journal of Data and Information Quality, vol. 5 (3), Feb. 2015
[19]. X. Chen, N. Li, F. Wu, and X. Li, Research on hierarchical information aggregation technology in
the smart grid Internet of Things, Telecommunications for Electric Power System, vol.32 (230),
pp.73-77, Dec. 2011
[20]. ISO 8000-110-2009-Data quality - part 110: Master data: exchange of characteristic data:
syntax, semantic encoding, and conformance to data specification.
[21]. K. Xing, et al, Mutual privacy-preserving regression modeling in participatory sensing, IEEE
INFCOM 2013, Turin, Italy, Apr. 2013
[22]. W.H. Inmon and D. Linstedt, Data Architecture: A Primer for the Data Scientist: Big Data, Data
Warehouse and Data Vault, Morgan Kaufmann, Nov. 2014


7. CONCLUSION
With increasing complexity and interconnectivity of the grid, the scope and complexity of maintaining
and increasing situational awareness have grown. As a consequence, there is a need to furnish
system operators and operation engineers with better tools and visualizations for assessing system
conditions, and for providing effective and timely decision-making and remedial reactions to an
incident. It is not enough to just understand the current state. Situational awareness also implies the
ability to foresee and anticipate system changes and their impact on system security.
The large variety of internal and external data sources that are available today to electric utilities
makes it possible to implement advanced data-analytics and visualization technologies to improve the
way the system is operated and controlled. Analytics algorithms capable of synthesizing actionable
information from the raw data can be used to provide tools that use real-time data streams to support
fast, accurate, and adaptable decisions for solving critical problems at the right moment, as well as to
plan in advance mitigation actions against anticipated system security issues.
Even though the use of data analytics for power system operation support is not new, its widespread
use remains low. Hence, there is a need to examine how advanced data-analytics technologies can be
further used to solve the emerging critical challenges in electric system operations.
This technical brochure provides an insight into how advanced data-analytics techniques and tools
that integrate various data sources can be used to improve situational awareness of power system
operators and support various operation functions. The content of this technical brochure is broken
down into the major areas involved in developing and implementing data-analytics tools: data and
information sources, data-analytics techniques to interpret these data, applications of these analytics
in system operations, data integration and modelling to bring data into operations, and data quality
and validation.
Some relevant observations and takeaways from this work are as follows:
 Utilities have started to realize the value and benefits of data analytics tools that integrate data
from various data sources. Several software tools have been developed to serve a variety of
functions supporting system operation, including tools for system event detection, fault
identification and analysis, wide-area monitoring, equipment health monitoring and analysis,
trending and forecasting of load, renewables, and system conditions, and recommendations for
operation. There is a recognized and growing need to improve situational awareness for system
operators, as is also revealed in our industry survey. Advanced data management and analytics
can help fill this need. One of the challenges for the successful implementation of such tools is the
difficulty of integrating data that is collected and resides in different enterprise systems. Hence,
effective implementation of advanced analytics tools greatly depends on operational data-
management policies and technologies. Proper data models allow the definitions and
characteristics of the data to be clearly understood. Even though significant advances have been
made to improve data interoperability, more effective and accurate data models and procedures
are needed to ensure data integrity and the availability of the right data in the right format (a
minimal sketch of cross-system data alignment is given below).
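As an illustration of the cross-system data-alignment challenge mentioned in the previous item, the following Python sketch merges a hypothetical 2-second SCADA feed with a hypothetical 20 ms PMU stream into one time-indexed table. The column names, reporting rates, and values are assumptions chosen purely for illustration and are not taken from any particular utility system.

```python
import numpy as np
import pandas as pd

# Hypothetical raw extracts: a 2-second SCADA feed and a 20 ms PMU feed for the same bus.
scada = pd.DataFrame({
    "timestamp": pd.date_range("2018-06-01 12:00:00", periods=5, freq="2s"),
    "bus_id": "BUS_1",
    "voltage_kv": [230.1, 229.8, 230.4, 230.0, 229.6],
})
pmu = pd.DataFrame({
    "timestamp": pd.date_range("2018-06-01 12:00:00", periods=500, freq="20ms"),
    "bus_id": "BUS_1",
    "voltage_kv": 230.0 + 0.3 * np.random.randn(500),
    "angle_deg": 12.0 + 0.05 * np.random.randn(500),
})

# Align both feeds on one common schema: resample the fast PMU stream to the SCADA
# cadence and merge on (timestamp, bus_id) so downstream analytics see a single table.
pmu_2s = (pmu.set_index("timestamp")
             .groupby("bus_id")
             .resample("2s")[["voltage_kv", "angle_deg"]]
             .mean()
             .reset_index())
merged = pd.merge(scada, pmu_2s, on=["timestamp", "bus_id"],
                  suffixes=("_scada", "_pmu"))
print(merged)
```

In a real deployment, the join keys would come from a shared data model (for example, CIM-based equipment identifiers) rather than from ad-hoc column names.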
 Another aspect that may hinder the implementation of data analytics solutions in system
operation is a lack of understanding of the value and accuracy of these technologies.
Traditionally, most of the tools used in control centers and operation engineering are based on
system models and simulations. Engineers understand the capabilities and limitations of those
tools, as well as the considerations that need to be observed to develop and validate the simulation
models. Data-analytics techniques that attempt to recognize and validate data patterns and trends,
and to draw conclusions from them, may be less understood. The advantages of both approaches,
model-based and data-based methods, can be combined in hybrid methodologies to develop superior
technical approaches and software tools for use in system control rooms and to support various
operation functions. Such tools would combine conventional analytics techniques based on physical
models with heuristic data-analytics and decision-making methodologies. For instance, simulation
engines would perform contingency analysis across a number of scenarios, which in turn would be
built with the help of data collected and integrated from a variety of sources. Data-analytics
techniques would then be used to extract relevant patterns from the simulation results, assess
vulnerability and risk, and classify critical conditions against given risk criteria (a minimal sketch
of this hybrid workflow is given below).
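The following Python sketch illustrates, under stated assumptions, how such a hybrid workflow could be organized: a toy stand-in for the physics-based contingency engine produces a severity value per scenario, and a generic decision-tree classifier then learns to flag critical operating conditions. The function simulate_contingency, the scenario features, and the 1.0 p.u. loading threshold are all hypothetical and are not part of any tool described in this brochure.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical stand-in for a physics-based engine: in practice this would be a power-flow /
# contingency simulation; here a toy formula maps a scenario to a post-contingency loading.
def simulate_contingency(scenario):
    load_factor, wind_share, line_out = scenario
    return 0.7 * load_factor + 0.2 * (1 - wind_share) + 0.3 * line_out  # worst loading (p.u.)

rng = np.random.default_rng(1)
scenarios = np.column_stack([
    rng.uniform(0.6, 1.2, 500),   # system load factor
    rng.uniform(0.0, 0.5, 500),   # wind generation share
    rng.integers(0, 2, 500),      # critical line outage (0/1)
])

# Model-based step: run the "simulation" for every scenario built from integrated data.
loadings = np.array([simulate_contingency(s) for s in scenarios])

# Data-based step: label each scenario against a risk criterion and learn a classifier
# that can flag critical operating conditions without rerunning every simulation.
labels = (loadings > 1.0).astype(int)   # 1 = post-contingency overload
clf = DecisionTreeClassifier(max_depth=3).fit(scenarios, labels)
print("Training accuracy:", clf.score(scenarios, labels))
```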


 Effective use of these data sources in operation support tools relies upon the real-time exchange
of this data through a high-performance, reliable, secure, and scalable communication network
infrastructure. The emerging trend towards integrated network architectures, driven by expanding
smart-grid investments, will enable the effective and reliable use of analytics tools that integrate
various real-time data streams.
 It is widely recognized that visual analytics is key to improving operators' ability to understand the
system situation and make effective decisions. Visualization technologies and techniques have
advanced significantly since the first developments in the early 1980s. Best practices from the
visualization industry, and lessons learned from other data-intensive fields, should be applied. Newer
visualization platforms include many advanced features, such as geographic-based dynamic
visualization with user-friendly interfaces, in which real-time measurements and analytical results from
measurement-based and model-based tools populate the system map. Visual-aid strategies
such as color contouring, 2D and 3D bubbles and cones, animation, geospatial representation,
display profiles, and integrated system views are widely used in newer visualization tools (a
minimal sketch of geographic color contouring is given below). The most significant trend in new
visualization is the integrated space-time concept, which is intended to help operators assess the
current situation in a static fashion and also understand and visualize the conditions the system is
evolving into, so that they are better prepared to implement effective control actions.
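As a simple, self-contained illustration of one of the visual-aid strategies listed above, the Python sketch below draws a geographic color contour of bus voltages with matplotlib. The bus coordinates and voltage values are synthetic; a production tool would render such contours over an actual network map rather than random points.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic example: bus locations (longitude, latitude) and per-unit voltages.
rng = np.random.default_rng(0)
lon = rng.uniform(-1.0, 1.0, 60)
lat = rng.uniform(-1.0, 1.0, 60)
voltage_pu = 1.0 + 0.05 * np.sin(3 * lon) * np.cos(3 * lat) + 0.01 * rng.standard_normal(60)

fig, ax = plt.subplots()
contour = ax.tricontourf(lon, lat, voltage_pu, levels=20, cmap="RdYlGn")  # color contour
ax.scatter(lon, lat, s=10, c="k")                                         # bus locations
fig.colorbar(contour, ax=ax, label="Voltage (p.u.)")
ax.set_xlabel("Longitude")
ax.set_ylabel("Latitude")
ax.set_title("Colour contour of bus voltages (synthetic data)")
plt.show()
```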
 There is a fundamental need to understand the challenges and requirements that system
operators are experiencing in terms of the main goals that drive their actions. Operators may
need to reconcile multiple objectives and answer questions that may sound conflicting, such as:
o Is my goal purely economic?
o Is my goal purely focused on maintaining system security at any cost?
It is not possible to deliver effective data-analysis tools and associated visualizations
without challenging these two key drivers.
In addition, a paradigm shift is required in the way data and information are presented to the
operator. The common current approach is to "show the user lots of data," but the rationale
behind such a simplistic strategy has never been clear. Perhaps the idea is that by providing the
user with all the available data there is less risk of filtering out important information, or perhaps
the analytics tools used so far to process raw data have not delivered convincing results.
Regardless of the cause, it is clear that the current approach will not improve the situational
awareness of system operators. The required shift in thinking needs to be based mainly on timescales:
o What can we tell from historic data? (Have we been here before?)
o Now and near now
o Future (and how far into the future does the operator need to understand?)
 Real time operations need to move away from a simplistic deterministic approach towards a
decision-making process based on probabilistic scenario/contingency models, harnessing the
insights provided by effective data analytics combined with advanced visualization (a minimal
sketch of a probabilistic risk measure is given below). Currently, there is no single data-analytics
solution for supporting operator decision-making that will fit all possible scenarios and requirements
of modern power systems, but there is potential for significant improvement as data-analytics
technologies continue to evolve.
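The short Python sketch below shows, with purely hypothetical numbers, how a probabilistic decision process could weight contingency severity by scenario probability to obtain an expected-risk measure, instead of checking a single deterministic case; both the probabilities and the severities are assumptions made only for this example.

```python
import numpy as np

# Hypothetical scenario set: probability of each scenario and its simulated severity
# (e.g., unserved energy in MWh) for the contingency under study.
scenario_probability = np.array([0.60, 0.25, 0.10, 0.05])
scenario_severity = np.array([0.0, 12.0, 45.0, 300.0])

expected_risk = float(np.dot(scenario_probability, scenario_severity))
worst_case = float(scenario_severity.max())
print(f"Expected risk: {expected_risk:.1f} MWh, worst case: {worst_case:.1f} MWh")

# A probabilistic decision process would compare such risk measures across candidate
# remedial actions rather than relying on a single deterministic N-1 check.
```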
We expect that this technical brochure will be a positive step towards enabling readers to
understand this potential and the complexities involved in the development and
implementation process.