Alarm Management

By Nick Sands

Topic Highlights
Alarm System Practices
Alarm Philosophy
Rationalization
Design
Training
Monitoring
Management of Change
Alarm System Problems
Nuisance Alarms
Stale Alarms
Alarm Floods
Alarm Clarity
Alarms for Safety

18.1 Introduction
The term alarm management refers to processes and practices for determining, documenting,
designing, monitoring, and maintaining alarm messages from process automation and safety systems.
Alarm system performance issues have contributed to many significant incidents in the process
industries, with an estimated cost of over $13 billion each year in the U.S. alone [Ref. 1].

The issues with alarm systems are well known, as are the practices to address those issues. Practices
will be discussed first, followed by the main issues of alarm management and the application of the
practices to those issues. The last section mentions the limitations of alarms for risk reduction.

18.2 Alarm System Practices


The following practices are often cited as essential steps to improve alarm system performance: alarm
philosophy, rationalization, design, training, monitoring, and management of change.

18.2.1 Alarm Philosophy


The foundation of an alarm management system is the development of an alarm philosophy, a document
that establishes the principles and procedures to consistently manage an alarm system over time.
The philosophy does not specify the details of any one alarm, but defines each of the key processes
used to manage alarm systems: rationalization, design, training, monitoring, and management of
change. Alarm system improvement projects can be implemented without a philosophy, but the systems
tend to drift back toward the previous performance. Maintaining an effective alarm system
requires the discipline to follow these practices.

230 RELIABILITY, SAFETY AND ELECTRICAL IV

The philosophy begins with the basic definitions and extends those to operational definitions with the
principles of the alarm system. The philosophy should define such things as the number of levels of
alarm, the types of alarms allowed, and the assigned alarm priorities.

Alarm: An audible or visible means of indicating to the plant operator an equipment or process
malfunction or abnormal condition [Ref. 2].

The following are examples of principles:

Each alarm must have a defined operator action.

Each alarm must be rationalized prior to installation.

Each alarm will be designed in accordance with site guidelines.

Operator training is required for each alarm prior to installation.

Each safety related alarm must be tested prior to start-up and, thereafter, at an explicitly
documented frequency.

Alarm system performance must be monitored on a daily basis and corrective action taken
when performance limits are not met.

All additions, modifications, and deletions of alarms must follow a management of change
procedure.

Principles like these are critical to an alarm philosophy. They provide the standards against which all
potential alarms are tested. A well-defined set of principles will yield a consistent and useful set of
alarms.

18.2.2 Rationalization
Rationalization is the process of examining one alarm at a time against the principles and criteria
defined in the alarm philosophy. The product of rationalization is a set of consistent, well-documented
alarms. The documentation supports both the design process and operator training.

Rationalization begins with identifying the signal, the rationale for the alarm and the associated
action. If the alarm is consistent with the philosophy, it is prioritized based on consequences and
response time. Any further requirements for the alarm design are captured as well.

Rationalization captures information for each alarm, such as the basic control system information:

Tag
Alarm type
Description
Units/states
Setting/alarm state

The tag is the tag number of the alarm in the database. The alarm type describes the alarm as high,
low, or a discrete state. The description is for the tag, from the same tag database. The units are the
engineering units for an analog type value, and the states are the discrete states of a digital value. The
setting is the analog alarm limit or the discrete state that generates the alarm.
Chapter 18: Alarm Management 231

Some information is necessary to document the alarm for procedures and training:

Consequence of deviation
Corrective action
Time for response
Consequence category
Basis

This information is required to train operators to respond to the alarm: specifically, what action is
necessary and how fast it must be completed before the consequence results. Documenting the basis for
the alarm allows re-evaluation of the consequences, especially with process changes.

Other information is required to complete the requirement specifications:

Priority
Retention period
Report requirements
Notification requirements

This information specifies properties of the alarm. The priority in the operator interface is a critical way
to designate the importance of the alarm. The alarm record may need to be kept for a certain period of
time, included in certain reports, or the alarm may be set up to trigger e-mail, pager, or voice mail
messages. These functions are defined in the philosophy, and the rationalization identifies individual
alarms that require these functions.

Example
A new tank containing flammable materials has the following alarms identified:

                            Alarm 1                 Alarm 2                 Alarm 3                 Alarm 4
Tag                         LIG502                  LIG502                  PIG502                  PIG502
Alarm type                  LL                      HH                      LL                      HH
Description                 T502 Level              T502 Level              T502 Pressure           T502 Pressure
Units/states                %                       %                       INWC                    INWC
Setting/alarm state         10                      90                      1                       10
Consequence of deviation    Cavitate pump           Overflow tank           Air intrusion           Excess venting
Corrective action           Stop pump               Close inlet valve       Stop pump               Close inlet valve
Response time               2 min                   2 min                   10 min                  10 min
Consequence category        Equipment               Safety                  Safety                  Environmental
Basis                       Pump cavitation at 2%   Tank overflow at 107%   Vacuum breaker setting  Conservation vent setting
Priority                    Low                     Emergency               High                    High
Retention period            1 year                  5 years                 5 years                 5 years
Report requirements         Pump report             Safety report           Safety report           Environmental report
Notification requirements   None                    None                    None                    Environmental coordinator
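
The rationalization record for one alarm can be held in a small data structure and checked against a philosophy principle such as "each alarm must have a defined operator action." The following is a minimal sketch, not a prescribed format: the class name `RationalizedAlarm`, its field names, and the helper `missing_fields` are hypothetical, and the values are those of Alarm 2 in the example above.

```python
from dataclasses import dataclass


@dataclass
class RationalizedAlarm:
    """One rationalized alarm (hypothetical field names)."""
    tag: str
    alarm_type: str
    description: str
    units: str
    setting: float
    consequence: str
    corrective_action: str
    response_time_min: float
    consequence_category: str
    basis: str
    priority: str
    retention_years: int
    report: str
    notification: str


def missing_fields(alarm: RationalizedAlarm) -> list[str]:
    """Return the names of required documentation fields that are blank."""
    required = ("consequence", "corrective_action", "basis", "priority")
    return [name for name in required if not getattr(alarm, name).strip()]


# Alarm 2 from the tank example: high-high level on T502.
alarm_2 = RationalizedAlarm(
    tag="LIG502", alarm_type="HH", description="T502 Level", units="%",
    setting=90, consequence="Overflow tank",
    corrective_action="Close inlet valve", response_time_min=2,
    consequence_category="Safety", basis="Tank overflow at 107%",
    priority="Emergency", retention_years=5, report="Safety report",
    notification="None",
)
print(missing_fields(alarm_2))  # prints []
```

A record with a blank corrective action would be reported by `missing_fields` and could be sent back for rationalization before the alarm is installed.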

18.2.3 Design
The design phase utilizes the rationalized alarms and design guidance. Design practices are often
documented in a separate guidance document specific to the type and generation of the control system. As
systems change, the guidance should be updated to reflect features and limitations of the control
system. Design practices fall into three areas: the basic configuration of alarms, the human-machine
interface (HMI), and advanced techniques for managing alarms.

The guidance on basic configuration may include default settings for alarm deadbands, alarm practices
for redundant transmitters, timing periods for discrete valves, alarm practices for motor control logic,
and the methods for handling alarms on bad signal values. Many alarm system problems can be
eliminated with good basic configuration practices.

Deadband: the change in process value from the alarm point in the reverse direction of the alarm
necessary to clear the alarm state.
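
The deadband definition can be illustrated with a short sketch of a high alarm. The class name `HighAlarm`, the 90% setpoint, and the 2% deadband are assumptions chosen for illustration, not recommended defaults.

```python
class HighAlarm:
    """High alarm with a deadband (illustrative sketch)."""

    def __init__(self, setpoint: float, deadband: float):
        self.setpoint = setpoint
        self.deadband = deadband
        self.active = False

    def update(self, value: float) -> bool:
        """Process one measurement and return the resulting alarm state."""
        if not self.active and value >= self.setpoint:
            # The alarm activates at the alarm point.
            self.active = True
        elif self.active and value <= self.setpoint - self.deadband:
            # The alarm clears only after the value has moved back by the deadband.
            self.active = False
        return self.active


alarm = HighAlarm(setpoint=90.0, deadband=2.0)
readings = [89.5, 90.2, 89.7, 90.1, 89.0, 87.9]
print([alarm.update(v) for v in readings])
# prints [False, True, True, True, True, False]
```

With no deadband, the same readings would raise the alarm twice (at 90.2 and again at 90.1). The deadband turns a noisy signal near the alarm point into a single alarm.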

The guidance on the HMI may include alarm priority definitions, alarm color codes, alarm tones,
alarm groups, alarm summary configuration, and graphic symbols for alarm states. Alarm functions
are only one part of the HMI, so it is important that these requirements fit into the overall HMI design
philosophy. The consistent use of color for alarms is often listed as a principle.

A common component of the HMI design guide is a table of alarm priorities, alarm colors, and alarm
tones. Some systems have the capability to show shapes or letters next to alarms. This is a useful
technique for assisting color-blind operators in recognizing alarm priorities.

Example of alarm priority features:

Priority     Color     Tone      Shape
Emergency    Red       Tone 1    Red triangle, point up
High         Yellow    Tone 2    Yellow diamond
Low          Orange    Tone 3    Orange triangle, point right

Beyond the basic configuration and HMI design, there are many techniques to reduce the alarm load
on the operator and improve the clarity of the alarm messages. These techniques range from first-out
alarming to state-based alarming to expert systems for fault diagnosis. The techniques allowed should
be defined in the alarm philosophy, along with the implementation practices in the design guide.

First-out (First-up): A sequence feature that indicates which of a group of alarm points operated first
[Ref. 3].

Alarm suppression: Use of condition-based logic to determine that an alarm should not occur when
the base alarm condition is present.

State-based alarming: Use of measurements or models of the equipment or plant operating state to
suppress alarms when they are not needed and activate alarms in the operating states to which they
are relevant.

Dynamic prioritization: Use of measurements or models of the equipment or plant operating state to
change alarm priority based on the current operating state.
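
Two of the techniques defined above, first-out alarming and state-based alarming, can be sketched in a few lines. This is a simplified illustration only: the tag names, the operating states, and the `RELEVANT_STATES` map are hypothetical, and a real control system would implement these functions in its own configuration tools.

```python
from dataclasses import dataclass


@dataclass
class AlarmEvent:
    tag: str
    timestamp: float  # seconds from an arbitrary reference


def first_out(events: list[AlarmEvent]) -> str:
    """First-out: return the tag of the alarm in the group that operated first."""
    return min(events, key=lambda e: e.timestamp).tag


# Hypothetical map: alarm tag -> operating states in which the alarm is relevant.
RELEVANT_STATES = {
    "LOW_FLOW": {"running"},
    "HIGH_LEVEL": {"running", "shutdown"},
}


def is_suppressed(tag: str, plant_state: str) -> bool:
    """State-based alarming: suppress an alarm outside its relevant states.

    Tags that are not in the map are never suppressed.
    """
    return plant_state not in RELEVANT_STATES.get(tag, {plant_state})


events = [
    AlarmEvent("LOW_FLOW", 12.9),
    AlarmEvent("PUMP_TRIP", 12.4),
    AlarmEvent("LOW_PRESSURE", 13.1),
]
print(first_out(events))                      # prints PUMP_TRIP
print(is_suppressed("LOW_FLOW", "shutdown"))  # prints True
```

In this sketch the low-flow alarm is expected when the unit is shut down, so it is suppressed in that state, while the high-level alarm stays active in both states.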

Testing is a common requirement when the design is implemented. Testing requirements vary with
the type of alarms. Initial and periodic testing requirements should be documented in the
rationalization so the accommodations for testing can be made in the design step.

18.2.4 Training
Training is an essential step in developing an alarm system. Since an alarm exists only to notify the
operator to take an action, the operator must know the corresponding action for each alarm, as
defined in the alarm rationalization. A program should be in place to train operators on these actions.
Documentation on all alarms should be easily accessible to the operator. Beyond the alarm-specific
training, the operator should be trained on the alarm philosophy and the HMI design. A complete
training program includes initial training and periodic refresher training.

18.2.5 Monitoring
Monitoring alarm systems is a critical step in alarm management. Since each alarm requires operator
action for success, overloading the operator reduces the effectiveness of the alarm system. Instrument
problems, controller performance issues, and changing operating conditions will cause the
performance of the alarm system to degrade over time. Monitoring and taking action to address bad actors
can maintain a system at the desired level of performance.

The alarm philosophy should define report frequencies, metrics, and thresholds for action. Common
measurements include:

Frequency of alarms, such as total number of alarms per day.


Frequency of alarm by tag, such as the number of times a tag alarms per day.

Time in alarm by tag, such as the number of minutes a tag is in alarm.

Rate of alarms, such as alarms per ten-minute interval.


The number of alarm floods (more than 10 alarms per 10 minutes) per day.
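
The measurements listed above can be computed from an alarm log. The sketch below assumes a log for one operator position covering one day, with each record holding a tag, an activation time, and a clear time. The function name `alarm_metrics`, the record layout, and the use of fixed (rather than sliding) 10-minute windows are assumptions made for illustration.

```python
from collections import Counter, defaultdict
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)  # reference point for fixed 10-minute windows


def alarm_metrics(records, flood_limit=10, window=timedelta(minutes=10)):
    """Summarize an alarm log.

    records: iterable of (tag, activated, cleared) tuples with naive datetimes.
    """
    per_tag = Counter()                    # frequency of alarm by tag
    minutes_in_alarm = defaultdict(float)  # time in alarm by tag
    per_window = Counter()                 # alarms per 10-minute window
    for tag, activated, cleared in records:
        per_tag[tag] += 1
        minutes_in_alarm[tag] += (cleared - activated).total_seconds() / 60
        per_window[(activated - EPOCH) // window] += 1
    return {
        "total": sum(per_tag.values()),
        "per_tag": dict(per_tag),
        "minutes_in_alarm": dict(minutes_in_alarm),
        "peak_per_window": max(per_window.values(), default=0),
        # A flood is counted as a window with more than flood_limit alarms.
        "flood_windows": sum(1 for n in per_window.values() if n > flood_limit),
    }


start = datetime(2024, 1, 1, 8, 0)
records = [
    ("FI101", start + timedelta(seconds=30 * i), start + timedelta(seconds=30 * i + 10))
    for i in range(12)
]
records.append(("LI502", start + timedelta(hours=2), start + timedelta(hours=3)))

metrics = alarm_metrics(records)
print(metrics["total"], metrics["flood_windows"])  # prints 13 1
```

Here the twelve FI101 alarms fall inside a single 10-minute window and are counted as one flood, while LI502 contributes one alarm and 60 minutes of time in alarm.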

Measurement tools allow reporting of the metrics at different frequencies. Typically, there are daily
reports to personnel responsible to take action, and weekly or monthly reports to management. The
type of data reported varies, depending on the control system or safety system and the measurement
tool.

Distinct limits to trigger action should be set on the measurements. These limits are dependent on the
type of process and the resources to take corrective action. If the action limits are too relaxed, they will
not be effective. If they are too aggressive, they will be ignored. The performance metrics are usually
calculated per operator position or operator console.

Example
Alarm measurement trigger points for action:

Frequency of alarm by tag greater than 10 alarms/day.

Time in alarm by tag greater than 24 hours.

Rate of alarms greater than 10/minute.

Rate of alarms greater than 300/day.
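
Trigger points like these can be applied to the daily measurements to produce a list of items for follow-up. The sketch below uses three of the example limits above (the per-minute rate check is omitted for brevity). The function name `find_actions`, the `LIMITS` dictionary, and the sample data are hypothetical.

```python
# Example limits taken from the trigger points above.
LIMITS = {
    "alarms_per_tag_per_day": 10,
    "hours_in_alarm": 24,
    "alarms_per_day": 300,
}


def find_actions(per_tag_counts, hours_in_alarm, total_alarms):
    """Return findings for every measurement that exceeds its limit."""
    findings = []
    for tag, count in sorted(per_tag_counts.items()):
        if count > LIMITS["alarms_per_tag_per_day"]:
            findings.append(f"{tag}: {count} alarms/day exceeds the per-tag limit")
    for tag, hours in sorted(hours_in_alarm.items()):
        if hours > LIMITS["hours_in_alarm"]:
            findings.append(f"{tag}: in alarm for {hours:.0f} h exceeds the time limit")
    if total_alarms > LIMITS["alarms_per_day"]:
        findings.append(f"console: {total_alarms} alarms/day exceeds the daily limit")
    return findings


findings = find_actions(
    per_tag_counts={"FI101": 42, "LI502": 3},
    hours_in_alarm={"TI330": 36.0, "LI502": 0.5},
    total_alarms=120,
)
for line in findings:
    print(line)
# prints:
# FI101: 42 alarms/day exceeds the per-tag limit
# TI330: in alarm for 36 h exceeds the time limit
```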

The Engineering Equipment and Materials Users Association (EEMUA) Publication 191, Alarm Systems:
A Guide to Design, Management and Procurement, provides guidance on metrics for performance
classification. As above, these metrics are calculated per operator since they are related to the
operator's ability to process alarms.

Table 18-1: Benchmark for Assessing Average Alarm Rates [Ref. 4]

Long term average alarm rate
in steady operation              Acceptability
More than 1 per minute           Very likely to be unacceptable
One per 2 minutes                Likely to be over-demanding
One per 5 minutes                Manageable
Less than one per 10 minutes     Very likely to be acceptable
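
A measured average rate can be compared against the benchmark in Table 18-1. The table gives benchmark points rather than continuous bands, so the band edges in the sketch below are an assumption: rates between the listed points are assigned to the more demanding neighboring category. The function name `classify_average_rate` is hypothetical.

```python
def classify_average_rate(alarms_per_minute: float) -> str:
    """Map a long-term average alarm rate to a Table 18-1 category.

    Band edges between the benchmark points are assumed, not from the table.
    """
    if alarms_per_minute > 1.0:
        return "Very likely to be unacceptable"
    if alarms_per_minute > 1 / 5:
        return "Likely to be over-demanding"
    if alarms_per_minute >= 1 / 10:
        return "Manageable"
    return "Very likely to be acceptable"


# 300 alarms in a 24-hour day is about 0.21 alarms per minute.
print(classify_average_rate(300 / (24 * 60)))  # prints Likely to be over-demanding
```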

Table 18-2: Guidance on Alarm Rate Following an Upset [Ref. 5]

Number of alarms displayed in 10 minutes
following a major plant upset               Acceptability
More than 100                               Definitely excessive and very likely to lead to the
                                            operator abandoning use of the system
20-100                                      Hard to cope with
Under 10                                    Should be manageable but may be difficult if several
                                            of the alarms require a complex operator response

18.2.6 Management of Change


Another key procedure for maintaining an alarm system is management of change. Usually there are
one or more management of change processes already established for Process Safety Management
(PSM) or current Good Manufacturing Practices (cGMP) which would encompass changes for alarms.
The alarm philosophy will define the change processes and the steps necessary to change alarms.
These steps are usually the same as those for a project, though the scope may be smaller.

18.3 Alarm System Problems


The main problems in alarm management are nuisance alarms, stale alarms, alarm floods, and clarity
of the alarm to the operator. The processes defined in the alarm philosophy, implemented with
operational discipline, can address these problems.

18.3.1 Nuisance Alarms


Nuisance alarms are alarms that indicate an abnormal condition when none exists, or when no change
in process condition has occurred. Nuisance alarms desensitize the operator, reducing the response to
all alarms. Instrument problems or alarms set within the normal operating range often cause nuisance
alarms. Measurement of the alarm frequency by tag is used to detect nuisance alarms at a threshold
defined in the alarm philosophy, for example, 10 alarms per day. Once detected, nuisance alarms
should be investigated and corrected as soon as possible. Typical alarm reports show that a very small
percentage of tags is responsible for the majority of alarms. Without monitoring and prompt follow-up,
nuisance alarms can quickly degrade the performance of an alarm system to the point where tens
of thousands of alarms are recorded per day.
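
The observation that a few tags account for most alarms suggests ranking tags by alarm count to find the bad actors. The sketch below is one simple way to do that. The function name `top_contributors`, the 80% share, and the sample log are assumptions for illustration.

```python
from collections import Counter


def top_contributors(alarm_tags, share=0.8):
    """Return the most frequent tags that together cover `share` of all alarms.

    alarm_tags: one entry per alarm occurrence, holding the tag that alarmed.
    """
    counts = Counter(alarm_tags)
    total = sum(counts.values())
    selected, covered = [], 0
    for tag, count in counts.most_common():
        if covered >= share * total:
            break
        selected.append((tag, count))
        covered += count
    return selected


# Hypothetical day of alarms: 100 occurrences across 13 tags.
log = ["FI101"] * 70 + ["PI205"] * 15 + ["LI502"] * 5 + [f"T{i}" for i in range(10)]
print(top_contributors(log))  # prints [('FI101', 70), ('PI205', 15)]
```

In this made-up log, two of the thirteen tags produce 85 of the 100 alarms, so correcting those two would remove most of the alarm load.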

18.3.2 Stale Alarms


Stale alarms are alarms that remain in the alarm state when no abnormal condition exists or no
operator action is required. Stale alarms form a baseline of alarms that require no action and train the
operator to ignore certain alarms. These alarms are often caused by alarm configuration problems or
alarms set within the normal operating range. Measurement of the time in alarm by tag is used to
detect stale alarms at a threshold defined in the alarm philosophy, for example, 24 hours. Without
monitoring and follow-up, the number of stale alarms slowly increases, decreasing the effectiveness of
the alarm system.

18.3.3 Alarm Floods


Alarm floods are a temporary high rate of alarms, usually associated with an event like a process upset.
Alarm floods overwhelm the operator, masking the important alarms and reducing the operator's
ability to correctly respond to the upset. Alarm floods are often caused by configuring multiple alarms for
a given event. Alarm floods are detected by measuring the rate of alarms in a given time interval with
a threshold defined in the alarm philosophy, for example, 10 alarms per 10 minutes. Alarm floods are
one of the more difficult problems to solve, and one closely linked with plant disasters.
Monitoring can detect and report alarm floods, but reducing floods takes detailed process understanding and
good alarm practices. Rationalization can help reduce duplicate alarms. Advanced alarming techniques
can reduce the number of alarms during an upset.

18.3.4 Alarm Clarity


Clarity of alarms is an issue related both to configuring the alarms and to training the operator to
respond to the alarm. Alarm documentation generated during rationalization provides the information
for training. Alarm clarity problems are difficult to measure, but operator training can provide the
opportunity to identify them. Sometimes the problems can be resolved with changes to the
basic alarm configuration. Advanced alarm techniques are also often employed to produce fewer
alarms that have clear meaning.

18.4 Alarms for Safety


Alarms mark the boundary between normal and abnormal conditions in the process. They alert the
operator to take action to return the process to normal conditions. Because alarms are linked to
operator intervention, they are sometimes used as a layer of protection for hazardous events. Care should
be used when evaluating an alarm in the process automation system as a safety layer.

While some alarms provide safety warnings, there is a key difference between an alarm system and a
safety system. The alarm function always requires an operator to take action. The safety function is
almost always designed to function without the operator. One consequence of this difference is that
the alarm system's effectiveness is limited by the operator's ability to respond correctly to each alarm.
An operator can be overwhelmed as the rate of alarms or the complexity of the response increases.
When the process control system is used for safety-related alarms, monitoring can maintain the alarm
system performance. Even with monitoring, the risk reduction factor for the basic process control
system (BPCS), including the process alarms, is limited to 10 unless the system is treated as a safety
instrumented system [Ref. 6].
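
The limit of 10 can be read through the usual relation between risk reduction factor (RRF) and average probability of failure on demand (PFD). The short worked example below is only an illustration: the initiating event frequency of 0.1 per year is a hypothetical value, not a figure from the standard.

```latex
\mathrm{RRF} = \frac{1}{\mathrm{PFD}_{\mathrm{avg}}},
\qquad
\mathrm{RRF}_{\mathrm{BPCS}} \le 10
\;\Longleftrightarrow\;
\mathrm{PFD}_{\mathrm{avg}} \ge 0.1

f_{\mathrm{mitigated}}
  = \frac{f_{\mathrm{initiating}}}{\mathrm{RRF}}
  \ge \frac{0.1\ \mathrm{yr}^{-1}}{10}
  = 0.01\ \mathrm{yr}^{-1}
```

In other words, an operator response to a BPCS alarm is assumed to fail at least one time in ten, so on its own it cannot bring this hypothetical event below 0.01 per year.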

18.5 References
1. Nimmo, Ian. Abnormal Situation Management.

2. ISA-RP77.60.02-2000, Fossil Fuel Power Plant Human-Machine Interface: Alarms, p. 9.

3. ANSI/ISA-18.01-1979 (R2004), Annunciator Sequences and Specifications, p. 9.

4. Alarm Systems: A Guide to Design, Management and Procurement. EEMUA, p. 105.

5. Alarm Systems: A Guide to Design, Management and Procurement. EEMUA, p. 107.

6. ANSI/ISA-84.00.01-2004 Part 2 (IEC 61511-2 Mod), Functional Safety: Safety Instrumented
Systems for the Process Industry Sector - Part 2: Guidelines for the Application of ANSI/ISA-84.00.01-2004
Part 1 (IEC 61511-1 Mod) - Informative. Sections 9.4.2 and 9.4.3.

About the Author


Nick Sands has worked in various process control assignments at DuPont for the past 15 years, after
graduating from Virginia Tech. He is a Chemical Solutions Process Technology Manager at DuPont. He
is an active ISA member, serving as a section, division, and standards committee volunteer, and a
contributor to the new ISA Certified Automation Professional program. Nick is a Certified Automation
Professional.