
Engineering Failure Analysis 36 (2014) 121–133


Texas City refinery accident: Case study in breakdown of defense-in-depth and violation of the safety–diagnosability principle in design ☆

Joseph H. Saleh a,*, Rachel A. Haga a, Francesca M. Favarò a, Efstathios Bakolas b

a School of Aerospace Engineering, Georgia Institute of Technology, Atlanta, USA
b Department of Aerospace Engineering and Engineering Mechanics, The University of Texas at Austin, Austin, USA

a r t i c l e   i n f o

Article history:
Received 15 June 2013
Received in revised form 5 September 2013
Accepted 20 September 2013
Available online 2 October 2013

Keywords:
Defense-in-depth
Refinery explosion
Safety–diagnosability principle
Accident pathogens

a b s t r a c t

In 2005 an explosion rocked the BP Texas City refinery, killing 15 people and injuring 180. The company incurred direct and indirect financial losses on the order of billions of dollars for victims' compensation as well as significant property damage and loss of production. The internal BP accident investigation and the Chemical Safety Board investigation identified a number of factors that contributed to the accident. In this work, we first examine the accident pathogens or lurking adverse conditions at the refinery prior to the accident. We then analyze the sequence of events that led to the explosion, and we highlight some of the provisions for the implementation of defense-in-depth and their failures. Next we identify a fundamental failure mechanism in this accident, namely the absence of observability or ability to diagnose hazardous states in the operation of the refinery, in particular within the raffinate splitter tower and the blowdown drum of the isomerization unit. We propose a general safety–diagnosability principle for supporting accident prevention, which requires that all safety-degrading events or states that defense-in-depth is meant to protect against be diagnosable, and that breaches of safety barriers be unambiguously monitored and reported. The safety–diagnosability principle supports the development of a "living" or online quantitative risk assessment, which in turn can help re-order risk priorities in real time based on emerging hazards, and re-allocate defensive resources. We argue that the safety–diagnosability principle is an essential ingredient for improving operators' situation awareness. Violation of the safety–diagnosability principle translates into a shrinking of the time window available for operators to understand an unfolding hazardous situation and intervene to abate it. Compliance with this new safety principle provides one way to improve operators' sensemaking and situation awareness and decrease the conditional probability that an accident will occur following an adverse initiating event. We suggest that defense-in-depth be augmented with this principle, without which it can degenerate into an ineffective defense-blind safety strategy.

© 2013 Elsevier Ltd. All rights reserved.

☆ This research builds on and extends previous work presented at the joint ESREL–PSAM conference of 2012.
* Corresponding author. Tel.: +1 404 385 6711.
E-mail address: jsaleh@gatech.edu (J.H. Saleh).

1350-6307/$ - see front matter © 2013 Elsevier Ltd. All rights reserved.
http://dx.doi.org/10.1016/j.engfailanal.2013.09.014

1. Introduction

On March 23, 2005 an explosion rocked the BP Texas City refinery,1 killing 15 people and injuring 180 after the blowdown
drum of the isomerization unit overflowed (loss of containment of hydrocarbons), and a heat source ignited the ensuing vapors
resulting in an explosion and subsequent pool fire. Three different investigation panels were convened and generally agreed
that the accident resulted from a combination of factors, including design and operational flaws, technical and organizational
factors, and more broadly a weak safety culture.
In this work, we reexamine the BP Texas City refinery accident as a case study in breakdown of defense-in-depth, and we
identify a fundamental failure mechanism in this accident, namely the absence of observability or ability to diagnose
hazardous states in the operation of the isomerization unit. We first examine the accident pathogens or lurking adverse
conditions at the refinery prior to the accident. We then analyze the sequence of events that led to the explosion, and
we highlight some of the provisions for the implementation of defense-in-depth and their failures. The failure mechanism thus identified leads us to propose a new principle for supporting accident prevention, the safety–diagnosability principle, which
requires that all safety-degrading events or states that defense-in-depth is meant to protect against be diagnosable. We
propose that defense-in-depth be augmented or complemented with this principle, without which it can degenerate into
an ineffective defense-blind safety strategy [2].
A brief discussion of defense-in-depth is in order, after which we examine its implications for the observability of a sys-
tem. Defense-in-depth is a fundamental principle/strategy for achieving system safety. First conceptualized within the nu-
clear industry and rooted in elements of military strategy, defense-in-depth is the basis for risk-informed decisions by the US
Nuclear Regulatory Commission [16,23], and is recognized under various names in other industries (e.g., layers of protection in the chemical industry [1,12,24]). Accidents typically result from the absence or breach of defenses or violation of safety
constraints [15,17,25]. The principle of defense-in-depth embodies the idea of multiple lines of defense and safety barriers
along accident scenarios, and this principle shuns the reliance of safety on a single element (hence the ‘‘depth’’ qualifier).
Defense-in-depth, typically realized by successive and diverse safety barriers, technical and procedural, is designed to: (1)
prevent incidents or accident initiating events from occurring, (2) prevent these incidents or accident sequences from esca-
lating should the first barriers fail, and (3) mitigate or contain the consequences of accidents should they occur because of
the breach or absence of the previous ‘‘prevention’’ barriers [21].
Defense-in-depth however is not without its critics. For example, Reason [18] noted that ‘‘defences-in-depth’’ are a mixed
blessing. One of their unfortunate consequences is that they "make systems more [...] opaque to the people who manage and operate them." He further explained that "the main problems that defences-in-depth pose [...] is that they can conceal
both the occurrence of their errors and their longer term consequences. A characteristic of such defences is that they do not
always respond to individual failures. These can be either countered or concealed, and in neither case need the individuals
directly concerned be aware of their existence. This allows for the insidious build-up of the latent condition.’’ In other words,
by placing multiple defenses along a postulated accident sequence, the signals that may be triggered by these ‘‘individual
failures’’ indicating that a safety intervention is warranted are no longer available. As a result, system operators may be left
blind to the possibility that hazard escalation is occurring, thus decreasing their situational awareness and shortening
the time they have to intervene before an accident is released. Several accident reports identified hidden failures or
unobservable accident pathogens as important contributing factors to the accidents—the Three Mile Island accident being
a well known such case [10]—and they lend credence to Reason’s statements [17,18]. In this work, we address this particular
problem, and propose that defense-in-depth ought to be augmented in such a way as to ensure observability and avoid
potential ‘‘blind spots’’ for hazardous states.
The ability to observe and diagnose a hazardous state of a system or the occurrence of a safety-degrading event is crucial
in maintaining system safety. Roughly speaking, operators make decisions during system operation that are both based on
and affect the internal conditions/states of the system [13]. If process monitoring fails to provide information regarding the
actual conditions/states of a system, there is a distinct possibility that operators will make flawed decisions (omission or
commission), which in turn can compromise the safe operation of the system or fail to check the escalation of an accident
sequence. The absence of observability, or inability to diagnose hazardous states, during the operation of the BP Texas City
refinery was a fundamental failure mechanism that contributed to the accident, as we will show in this work.
The remainder of this work is organized as follows. In Section 2, we review the functioning of the raffinate splitter section
at the refinery (where the accident occurred), and we examine the accident pathogens and sequence of events that led to the
explosion. In Section 3, we introduce the safety–diagnosability principle, and examine its violation within the splitter sec-
tion. We conclude this work in Section 4.

2. Texas City refinery explosion: anatomy of a system accident

The isomerization unit’s function is to separate and refine oil to provide higher-octane components for unleaded gasoline.
The unit comprised multiple sections; we focus only on the raffinate splitter section (RSS), which is where the acci-
dent occurred. The purpose of the RSS is to separate incoming raffinate feed into light and heavy components. In this section, we first provide an overview of the raffinate splitter section. We then expand on the important accident pathogens and lurking adverse conditions that led to or contributed to the accident and aggravated its consequences. Finally, we analyze how different factors, combined with poor decisions and a lack of communication among the operators and supervisors, culminated in the explosion. This section is based to a large extent on the accident investigation report by the US Chemical Safety Board [6], and to a lesser extent on the internal BP investigation report [3].

1 The refinery was previously owned by Amoco prior to the merger of BP and Amoco in 1998. In October 2012, BP sold the refinery to Marathon Petroleum.

Fig. 1. Simplified schematic of the raffinate splitter section (adapted from [6]).

2.1. Overview of the raffinate splitter section

A simplified schematic of the RSS layout is shown in Fig. 1. During startup, heavy liquid raffinate was pumped into the
170 ft (52 m) tall raffinate splitter tower. The tower had a diameter of 12.5 ft (3.8 m) and a volume of 586,100 L. Heavy liquid
raffinate was pumped into the tower at a feed rate set by the operator and regulated by an automatic control. For example,
during most of the accident sequence, the feed rate was set at 20,000 barrels per day or 132,500 liters per hour. The tower
operated as a distillation column with two outputs: heavy raffinate routed out of the bottom of the tower, and light raffinate
routed out of the top of the tower. The heavy output flowed through two heat exchangers, the first one to pre-heat the
raffinate feed into the tower, and the second one to cool down before being sent to its designated storage tanks. The light
raffinate exited the overhead of the tower as vapors and was routed down a 45 m pipe along the side of the splitter tower, after which it passed through a condenser and was sent to the light raffinate storage tank.2 The light raffinate pipe had three safety valves (labeled I in Fig. 1), which opened in the event that pressure in the tower exceeded an unsafe limit. If these safety valves released, the liquid or vapors would overflow into a blowdown drum, described next.

2 Or back to the splitter tower for a repeat of the process.
The blowdown drum was designed to ‘‘receive, quench, and dispose of hydrocarbon vapors and associated liquid’’ from
the splitter tower and through the safety valves in case of operational upsets or shutdowns [3]. The blowdown drum was
installed in the 1950s, had a volume of 86,200 L, about seven times smaller than the splitter tower, and functioned as a safety
buffer, disposing of its content as follows: hydrocarbon vapors were dispersed out of its open top into the atmosphere;
liquids were discharged from the base of the drum into the sewer [6].

2.1.1. Instrumentation
The splitter tower was equipped with a level sight glass and a level transmitter that only measured liquid heights between 4 ft (1.2 m) and 9 ft (2.7 m) in the tower. Recall that the tower was 170 ft (52 m) tall; outside the small range of the level transmitter, no other sensor was available to measure the height of the liquid in the tower and provide this information to the control room operator. During startup, the liquid was not to exceed 50% of the level transmitter range, that is, 6.5 ft (2.0 m) from the bottom of the tower. Two high-level alarms were set at 72% and 78% of the level transmitter range, that is, at 7.6 ft (2.3 m) and 7.9 ft (2.4 m) respectively (see Fig. 2). A high-level alarm was also present in the blowdown drum and was "set to activate when the drum was close to flowing over the top" [6].3

3 No quantitative information was available regarding the height at which the alarm was to go off in the drum.

Fig. 2. Instrumentation of the splitter tower: level transmitter range 4–9 ft; high-level alarms at 7.6 ft and 7.9 ft (the latter a redundant high-level alarm); tower height 170 ft. Not drawn to scale.
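Because the report quotes liquid levels both as percentages of the transmitter range and as heights, a small conversion helps in reading the accident sequence. The sketch below is our own illustration (not plant software), assuming a linear transmitter over the 4–9 ft span:

def transmitter_pct_to_height(pct, h_min_ft=4.0, h_max_ft=9.0):
    """Convert a level-transmitter reading (% of range) to feet above the
    tower bottom; valid only within the instrumented 4-9 ft span."""
    return h_min_ft + (pct / 100.0) * (h_max_ft - h_min_ft)

for pct in (50, 72, 78, 93, 99):
    print(f"{pct:>3}% of range = {transmitter_pct_to_height(pct):.2f} ft")
# 50% = 6.50 ft (startup target); 72% = 7.60 ft and 78% = 7.90 ft (alarms);
# 93% = 8.65 ft and 99% = 8.95 ft (readings during the accident sequence).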
We next discuss the accident pathogens and lurking adverse conditions at the refinery, following which we examine the
details of the accident sequence that culminated in the explosion. Sections 2.2 and 2.3 can be read in reverse order as well.
We chose the present order to reflect in a sense the nature of an accident pathogen: a lurking adverse condition that is present
in a system prior to the unfolding of an accident sequence. As such, we discuss the pathogens at the Texas City refinery prior
to the accident sequence. We leave the discussion of the safety–diagnosability principle and its violation for the last subsection.

2.2. Accident pathogens at the Texas City refinery

System accidents are a distinct class of adverse events, initially termed ‘‘man-made disasters’’ [26] and ‘‘organizational
accidents’’ [18]. These two qualifiers, ‘‘organizational’’ and ‘‘system’’, are used to indicate on the one hand an organizational
contribution to accident causation beyond the traditional technical and human error factors, and on the other hand a recog-
nition that accidents can occur due to the interactions between the elements of a system, rather than failures of the elements
themselves [15]. Two distinctive features of a system accident are its temporal depth of causality and its diversity of agency
[5]:

i. Temporal depth of causality: The chain of causality, or chain of influence, leading to the accident extends beyond the
temporal vicinity of the moment the accident occurred, with build-up of accident pathogens occurring over different
time-scales before an initiating event triggers an accident sequence.
ii. Diversity of agency: The safety value chain, that is, groups and individuals who influence or contribute to the accident
occurrence/prevention, extends far beyond the immediate victims, who may or may not have contributed to the acci-
dent (see [21] for a discussion of the safety value chain).

In addition, system accidents are typically but not exclusively associated with large-scale (uncontrolled) releases of
energy, as was the case with the BP Texas City refinery accident.
Before discussing the accident sequence, that is, the chain of events that led to the explosion, we examine in this
subsection the pre-existing or latent adverse conditions at the refinery. These lurking conditions are sometimes referred
to in the safety literature as ‘‘accident pathogens’’, and after an accident occurs, they are identified as causal elements of

3
No quantitative information was available regarding the height at which the alarm was to go off in the drum.
J.H. Saleh et al. / Engineering Failure Analysis 36 (2014) 121–133 125

or contributing factors to the accident.4 An accident pathogen is an adverse latent or pre-existing condition, which when com-
pounded with other factors or occurrence of adverse events, can further advance an accident sequence, precipitate an accident,
or aggravate its consequences [2]. In the following, we identify some of the important accident pathogens at the Texas City
refinery, including design flaws within the raffinate splitter section, poor safety practices, and more generally a weak safety
culture. The accident pathogens we examine next are of different nature and on different levels of abstractions: some are
technical and pertain to engineering design flaws for example—these and others will turn into causal factors once the accident
sequence is triggered; others are operational and maintenance related, and others are behavioral in nature, both at the individ-
ual and organizational levels. The diverse nature of causal and influencing factors in system accidents should be recognized, and
the causal basis of such an accident ought not to be reduced to a solely technical, managerial, or behavioral factors. It is often a
complex web of causal factors diverse in nature and interacting in a variety of ways that lead to an accident, as was the case at
the Texas City refinery.

2.2.1. Design flaws and maintenance shortcomings


Design flaws at the refinery ranged from mechanical failures to poorly thought-out systems and obsolete technology. On
the day of the accident, most of the raffinate tower instruments were not functioning properly: the second high-level alarm
did not sound (it is not clear how long it had been broken and left unchecked); the sight glass was cloudy and consequently
could not provide visual information regarding the level of liquid in the tower; and the level transmitter was erroneously
calibrated. More importantly, the level transmitter was a limited tool, as it only measured liquid levels over a 5 ft span of
the 170 ft tall tower (Fig. 2). In other words, the tower was not instrumented in a way that would reflect its internal state,
or the extent of its filling, which can be considered a proxy for the hazardousness of its state (the more it is filled, the closer
the tower is to overflowing). The operators were thus blind to the liquid level in the tower above a few feet from the bottom,
which decreased their ability to ‘‘see’’ and comprehend when a hazardous situation was developing and make timely deci-
sions to de-escalate the accident sequence. We will later revisit this lack of proper instrumentation of the tower. Addition-
ally, the raffinate tower had no automatic shutdowns or triggers when the liquid reached dangerously high levels, even in
the blowdown drum.
There were also design flaws with the blowdown drum. In the event that the drum overflowed, its vent stack dispersed
hydrocarbon vapors into the atmosphere. This was an outdated technology and should have been phased out years prior to
the refinery explosion. In addition, liquid hydrocarbon in the blowdown drum was discharged into the sewer system, an ob-
solete and hazardous design choice—industry guidelines recommend against it [6]. Modern systems include flares or are
closed system configurations. In addition, as noted previously, there was only one level alarm in the blowdown drum and
no monitoring of the raffinate liquid level in it. As a result, the operators were also blind to the internal state of the blow-
down drum.
One final design flaw worth mentioning is the poorly designed computer display in the control room: the feed (input) and
output flow of the raffinate tower were not displayed on the same screen, and no display of an estimated net flow was pro-
vided. Consequently, in order for the operator to see the measurements of feed into the tower and outflow from the tower,
two separate computer screens had to be consulted, and some mental calculations done to assess the net flow. This
contributed to a degradation of situational awareness of the operator, and the fact that a critical state variable of the tower,
namely the net flow, was not monitored and reported, was an important design flaw that contributed to the operators’
failure to comprehend the unfolding situation and de-escalate the accident sequence once it was triggered, as we will see
shortly.
To summarize, two critical state variables of the system, (1) the height of the liquid in the raffinate tower or the ex-
tent to which the tower is filled, and (2) the net raffinate flow into the tower, were by design not observed and mon-
itored. Both these variables are key determinants of the hazardousness of the state of the tower. The more the tower is
filled, the smaller the remaining safety margin and the closer it is to overflowing; similarly, the larger the net flow, the faster
the tower will fill, and the faster the safety margin will disappear. An increase in both these variables will result in a shorter
amount of time available for the operators to comprehend an unfolding hazardous situation and intervene effectively. The
operators at the Texas City refinery were blind to both these state variables.
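To illustrate how little computation the missing display entailed, the following sketch derives the net flow and a crude time-to-overflow from the feed and outflow measurements the control system already reported (on separate screens). This is a hypothetical illustration of the absent functionality, not the refinery's actual control software: the function and variable names are our own, the flows are treated volumetrically, and boiling and thermal expansion in the tower are neglected.

import math

TOWER_RADIUS_FT = 12.5 / 2      # tower diameter was 12.5 ft (Section 2.1)
TOWER_HEIGHT_FT = 170.0
LITERS_PER_FT3 = 28.316846592

def net_flow_display(feed_l_per_h, outflow_l_per_h, level_ft):
    """Summarize the net flow into the tower and a crude time-to-overflow,
    assuming trusted volumetric flow measurements and level estimate."""
    net = feed_l_per_h - outflow_l_per_h                 # net inflow [L/h]
    area_ft2 = math.pi * TOWER_RADIUS_FT ** 2            # cross-section [ft^2]
    rise_ft_per_h = (net / LITERS_PER_FT3) / area_ft2    # level rise [ft/h]
    if rise_ft_per_h <= 0:
        return f"net flow {net:+,.0f} L/h: level steady or falling"
    hours_left = (TOWER_HEIGHT_FT - level_ft) / rise_ft_per_h
    return (f"net flow {net:+,.0f} L/h, level rising {rise_ft_per_h:.1f} ft/h, "
            f"overflow in ~{hours_left:.1f} h")

# Morning of the accident: feed on, both output valves shut, ~13 ft of liquid.
print(net_flow_display(132_500, 0, 13.0))   # level rising ~38 ft/h, overflow in ~4 h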

2.2.2. Trailer siting


An accident pathogen, as noted previously, can precipitate an accident or aggravate its consequences. The trailer siting
issue at the Texas City refinery, while not part of the causal chain leading to the explosion, is one example of an accident
pathogen that dramatically aggravated the consequences of the accident. In any safety strategy, the first and preferred course
of action is to prevent accidents from happening [23]. Should the first lines of defense fail in their prevention function, a
number of safety features should be in place to contain the accident or mitigate its consequences (e.g., lifeboats onboard
a ship). The major accident pathogen at the Texas City refinery that not only failed to mitigate the consequences of the explo-
sion, but also dramatically aggravated them was the siting of trailers in the vicinity of the blowdown drum [11]. Temporary
workers and contractors were housed in trailers placed within the immediate vicinity of the blowdown drum; recall the
drum is the final line of defense against loss of containment through overflowing; its atmospheric vent stack would ensure that a loss of containment would dramatically affect the trailers. Sadly, all 15 fatalities and many of the injured were in or
around these trailers. The company had a history of placing trailers next to processing units for convenience purposes, despite the inherent danger in the distillation process. Although temporary structures, the trailers were not removed after they had served their designated purpose, and they became semi-permanent fixtures. The consequences of the accident in loss of life and limb
would not have been this tragic had it not been for this major accident pathogen of trailer siting in the vicinity of
the most dangerous part of the raffinate splitter section (and that the trailers were occupied/not evacuated during the
dangerous startup phase of the unit).

2.2.3. Poor safety practices, and more generally a weak safety culture
Safety culture is an important concept for hazardous industries. It was introduced by the International Nuclear Safety
Advisory Group (INSAG, 1986) following the Chernobyl accident, and later became a staple of safety studies in academia
and within a host of industries. The scope and meaning of the concept have evolved since it was first introduced, and
although many definitions for it exist, it is generally agreed that the safety culture of an organization is "the product of individual and group values, attitudes, perceptions, and patterns of behavior that determine the commitment to and the proficiency of" an organization's management of health and safety issues [4,22].
The safety culture at the Texas City refinery was decidedly weak, and this was manifest in a variety of ways, from poor safety practices, to inadequate procedures, to a repeated pattern of safety violations. We briefly describe in the next para-
graphs specific aspects of this weak safety culture.
Poor safety practices, from inadequate communication between personnel to routine violations of operational procedures,
were prevalent throughout the raffinate unit. According to the CSB report, formal procedure regarding communication between
shifts was deficient, and what little formal policy existed was not enforced (e.g., turnover procedures and records in logbooks).
Moreover, shutting down and restarting the RSS was not covered in a formal, agreed-upon, and enforced procedure. One existing
procedure was considered outdated and routinely ignored; according to this procedure, startup should be conducted in automatic
mode, set to reach and maintain 50% of the level transmitter (6.5 ft from the bottom of the tower). However, because it was thought
that the level of liquid in the tower fluctuated widely, and low liquid level was thought to potentially damage equipment, the tower
was routinely filled to greater than 90% (8.5 ft) of the level transmitter. As a consequence, both high-level alarms were frequently
ignored. Furthermore, in order to fill the tower beyond the 50% of the level transmitter, the automatic mode was routinely disen-
gaged and it was common practice for the start-up to be conducted manually, which was also a violation of operational procedure
(and fairly risky, even for an experienced operator, given the inadequate instrumentation of the tower). Procedural steps in the pre-
startup safety review were seldom undertaken or carried out effectively—they were not on the morning of the accident—and crit-
ical instrumentation on the raffinate tower was identified as malfunctioning but not repaired [6]. The CSB investigation reported
that as technicians started checking the alarms, the supervisor informed them that ‘‘the unit was starting and there was no time for
additional checks". A proper safety culture would have the roles reversed: the technicians would inform the supervisor when all the checks had been completed and give their "go" for unit startup. On the morning of the accident, the supervisor signed off on the startup procedure, attesting that all checks had been completed despite the fact that they were not.
Many of the previously mentioned deficiencies were common occurrences, rather than isolated events, and they
collectively reflected a weak safety culture at the refinery. The poor safety practices noted previously are one manifestation
of this weak safety culture. In addition, as noted in the CSB report, in 14 of the 19 startups prior to this accident, the pressure in the tower exceeded the limits set in procedure, but none of these incidents was investigated. Furthermore, since 2003, two startups had exceeded the pressure relief points, most likely resulting in liquid being dumped into the blowdown drum. Similarly, these two near misses were not investigated. As a result, many opportunities for learning from these precursor events or close calls and fixing the hazardous startup operations were missed. In the 30 years prior to the accident, there were 23 fatalities at the Texas City refinery; three of those deaths occurred in 2004, the year prior to the RSS explosion. Several
additional manifestations of the weak safety culture at the plant will be noted in discussing the accident sequence next.

2.3. Accident sequence

Having reviewed the context and the lurking adverse conditions at the refinery, we examine in this subsection the se-
quence of events that led to the explosion, starting about 10 h before the accident. As noted previously, many steps in
the pre-startup safety review were not undertaken or not carried out fully (equipment and alarm checks). In addition, the
supervisor ‘‘did not distribute or review the applicable startup procedure’’ with the operators, as required, another manifes-
tation of the weak safety culture.
The RSS was started on the night shift. During the early morning of March 23, 2005, the night operator began pumping liquid raffinate into the splitter tower. At 3:09 am the first high-level alarm sounded and was ignored. The level of raffinate in the tower reached and then surpassed the level of the second high-level alarm. That alarm did not go off. The night operator continued to fill up the raffinate tower until the level transmitter read 99% of its range (9 ft from the bottom of the tower). The post-accident analysis revealed that the liquid level was probably closer to 13 ft by this time.
The night operator left at 5:00 am, an hour before his shift ended, and left no detailed communication for the next shift regard-
ing the actions he had taken during the startup,5 another manifestation of the weak safety culture (and training) at the plant.

5 His note to the next shift read as follows: "ISOM: brought in some raff to unit, to pack raff with" [6].

Around 9:50 am, the raffinate unit was restarted and the feed rate into the tower was set and maintained at 132,500 L per
hour.
The critical initiating event in this accident sequence was the restart of the unit with raffinate flowing into the split-
ter tower but none flowing out: the operators had shut both output valves of the tower.
The contribution of this situation to the accident cannot be overstated, and it resulted from poor communication and
coordination between operators and supervisors, and a general lack of leadership of the startup process. The RSS had inside
and outside operators, the two groups received mixed/incomplete information for output routing instructions. The inside
operators believed that the heavy raffinate storage was full and closed its corresponding output valve; the outside operators
believed the light raffinate storage was full and closed its corresponding output valve as well. The CSB report indicates that
there was no direct communication or coordination between the inside and outside operators, and that neither group
documented their routing instructions in the logbook. As a result, when startup resumed around 9:50 am, both output
valves were shut and raffinate feed accumulated in the tower. With the lack of proper tower instrumentation and no
observability/monitoring of the two critical state variables noted previously, (1) the height of the liquid in the tower, and
(2) the net flow into the tower, the operators would remain blind to the fact that the tower was filling up and a hazardous
situation was unfolding. We will revisit this discussion in Section 3.
As a side note, we question whether the startup should have been initiated if any one of the storage tanks was unavailable
for either the light or the heavy raffinate (not addressed in the accident reports). If this were the case, both the outside and
inside operators could have stopped the startup process despite their lack of coordination instead of shutting one of the out-
put valves, provided they had received the proper safety training. Reviewing, auditing, and delivering proper safety training
is a management responsibility and should be intrinsic to a company’s Safety Management System. This also deserves more
careful attention in accident investigation reports.
Going back to the re-startup, around 9:50 am, the feed rate into the tower was set and maintained at 132,500 L per hour, no
raffinate was flowing out of the tower, and the automatic level control of raffinate in the tower was disengaged (see discussion
in Section 2.2.3). At this feed rate, should no change to the net flow occur and neglecting other physical phenomena in the tower
(boiling raffinate and vaporization at the bottom of the tower), the tower would fill in roughly 4 h. This is a crude estimate of the
time window available for the operators to understand the unfolding situation and attempt to de-escalate the accident sequence.
As we will see shortly, the lack of proper instrumentation, which we will refer to as a violation of the safety–diagnosability prin-
ciple, gnawed at this time window and left the operators with very little time to intervene effectively (too little too late).
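A back-of-the-envelope check of this estimate, using the tower volume and feed rate from Section 2.1 and neglecting the liquid already present as well as boiling and vaporization:

t_{fill} \approx \frac{V_{tower}}{\dot{V}_{feed}} = \frac{586{,}100\ \text{L}}{132{,}500\ \text{L/h}} \approx 4.4\ \text{h}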
By 11:16 am, the operators had lit additional burners in the reboiler furnace, and the temperature of the tower feed rose from approximately 200 °F to 307 °F by 12:40 pm; the temperature increase rate was about 75 °F/h. Both of these measures were in violation of the startup procedure, which limited the feed temperature to 275 °F and the heat-up rate to 50 °F/h to avoid excessive pressure
buildup in the tower and boiling raffinate. This situation is likely to have shortened the 4-h time window mentioned in
the previous paragraph. At this time, 11:16 am, the tower level transmitter displayed a liquid level of 93% (8.7 ft from the
bottom of the tower), although post-accident analysis revealed that the liquid was probably around 67 ft. The discrepancy
between the actual level of the liquid and the level indicated by the level transmitter was due to incorrect calibration and the
fact that the transmitter was operating outside its range.
At 12:41 pm, the pressure in the tower rose dramatically to 33 psig, exceeding the limits set in the startup procedure.
Unknown to the operators, the liquid level had reached 140 ft in the tower.
In response to the pressure spike, the operators vented the vapors in the tower, reduced the heat to the furnace, and rou-
ted some heavy raffinate out of the tower (at 15% of the valve output). This was the first outflow from the tower since it had
been restarted during the morning shift. Unfortunately, the routing of the heavy raffinate circulated it through a heat ex-
changer, heating the input flow, causing the liquid in the tower to expand. At 1:14 pm, the tower overflowed, the pressure
spiked to 63 psig, and the safety relief valves opened, routing liquid into the blowdown drum. The control room operator
noticed the pressure spike, decreased the furnace heat and increased the output flow to the maximum setting. These actions
did have some positive effect but they came too late. The blowdown drum filled up very quickly in about 6 min. Its high-level
alarm did not go off. Around 1:20 pm, two witnesses reported seeing ‘‘vapors and liquid emerging approximately 20 ft above
the stack like a geyser and running down and pooling around the base of the blowdown drum’’ [3]. The operators ‘‘stated
they had insufficient time to sound the emergency alarm before the explosion’’ [6]. The vapor cloud ignited, presumably
by a backfiring pickup truck parked in the vicinity of the blowdown drum, and resulted in the explosion and fire. A visual
recap of the accident pathogens and select events in this accident sequence is provided in Fig. A1 in the appendix.

3. Breakdown in defense-in-depth and violation of the safety–diagnosability principle

As the splitter section spiraled into disaster, the operators were unaware of the growing hazardousness of the situation
until about 12:41 pm, and they probably did not realize the extreme danger until about 1:14 pm, just 6 min before the
explosion. This means there was roughly a three-and-a-half-hour window starting from the day-shift restart of the process,
during which had the operators been aware of the state of the splitter tower and blowdown drum, a number of decisions
could have been made and courses of action taken to block the accident sequence from unfolding and abate it. The height
of the tower and the blowdown drum served among other things as safety buffers, and they are the primary reason that this
window of opportunity existed. But these safety barriers were ineffective in large part because of the violation of the
safety–diagnosability principle in their design, as we discuss next.

In general terms, the hazard level of a situation can be conceived of, loosely speaking, as the closeness of an accident to
being released. As such, it depends on the specific system and accident scenario considered. For example, for the accident
‘‘loss of containment through blowdown drum overflow’’, the level of hazardousness can be approximated by the height
of the raffinate in the tower and the blowdown drum. A more detailed definition of the situation’s hazardousness would
account for the tower’s pressure, temperature, and net flow. As a first order approximation, which is sufficient for our
purposes, we can write a dimensionless hazard level H for this accident sequence as:
H(t) \equiv \frac{h(t)}{h_{\max}} \qquad (1)

Eq. (1) can be specialized for the tower, the blowdown drum, or both, with H = 1 corresponding to overflow. Given mass flow rates u(t) and v(t) in and out of the tower respectively, and assuming an average raffinate density \bar{\rho}, the height of raffinate in the tower can be expressed as:

dh = \frac{u(t) - v(t)}{\bar{\rho}\,\pi R^2}\,dt
\quad \text{or} \quad
\Delta h(t_0, t_i) = \frac{1}{\bar{\rho}\,\pi R^2} \int_{t_0}^{t_i} [u(t) - v(t)]\,dt \qquad (2)

Using the local form of Eq. (2), the BP investigation estimated a height increase per unit time of dh/dt \approx 38\ \text{ft/h}. This constitutes
a proxy for the rate of hazard escalation in the tower, and it is likely to have further accelerated nonlinearly starting at
11:16 am when the operators lit additional burners in the reboiler furnace.
Eqs. (1) and (2) can be used to express the level of hazardousness at a time t_i before the accident occurred and following the restart of the raffinate unit at t_0:

H(t_i) \equiv \frac{h(t_i)}{h_{\max}} = \frac{h(t_0) + \frac{1}{\bar{\rho}\,\pi R^2} \int_{t_0}^{t_i} [u(t) - v(t)]\,dt}{h_{\max}} \qquad (3)
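As a numerical check of Eqs. (1)–(3), the sketch below integrates the net flow for the morning of the accident. It is a minimal illustration under stated assumptions, not the investigators' model: geometry and feed rate are taken from Section 2.1, flows are treated volumetrically (i.e., u and v divided by \bar{\rho}), v(t) = 0 since both output valves were shut, and boiling, vaporization, and thermal expansion are neglected.

import math

R_FT = 12.5 / 2           # tower radius [ft]
H_MAX_FT = 170.0          # tower height, h_max in Eq. (1) [ft]
FEED_L_PER_H = 132_500    # volumetric feed rate, u / rho_bar [L/h]
LITERS_PER_FT3 = 28.316846592

area_ft2 = math.pi * R_FT ** 2
# Local form of Eq. (2) with v(t) = 0 (both output valves shut):
dh_dt = (FEED_L_PER_H / LITERS_PER_FT3) / area_ft2
print(f"dh/dt = {dh_dt:.1f} ft/h")   # ~38 ft/h, recovering the BP estimate

# Eq. (3): hazard level after the 9:50 am restart, with h(t0) = 13 ft
# (the post-accident estimate of the level at the end of the night shift).
h0_ft = 13.0
for hours in (1.0, 2.0, 3.0, 3.5):
    h = h0_ft + dh_dt * hours
    print(f"t0 + {hours} h: h = {h:6.1f} ft, H = {h / H_MAX_FT:.2f}")

Under these simplifying assumptions the tower reaches H = 1 in roughly four hours, consistent with the time window discussed in Section 2.3; the heat added at 11:16 am and the resulting expansion of the liquid would only have shortened it.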

The splitter section had several features that served as safety barriers in a defense-in-depth like strategy with respect to
H. Recall liquid raffinate in the tower was not to exceed 50% of the level transmitter (see Section 2.1). The difference between
this nominal height and that of the first high-level alarm, 72% of the level transmitter, can be conceived of as a first safety
barrier or buffer. The incremental height between the first and the second high-level alarms, between 72% and 78% of the level transmitter range, was a second barrier. The remaining height of the tower, from 7.9 ft to 170 ft, was the third safety buffer. And
the blowdown drum was the final safety barrier before loss of containment occurred through overflowing.6 Had the liquid
level in the raffinate tower been observable and monitored across all these safety barriers (beyond the short span covered
by the level indicator), the operators would have been aware of the growing hazard and had more time to take action. To further
examine the effects of observability or lack thereof on the operation of a system, consider the notional example in Fig. 3.
It is worth pointing out again that the hazard level examined in this section (Eqs. (1)–(3)) is specific to the particular acci-
dent scenario here considered; it constitutes one important dimension of the hazard at the raffinate splitter section but not
the only one. Other context and accident scenarios would require different definitions of quantitative hazard levels. As noted
previously, the hazard level of a situation can be conceived of, loosely speaking, as the closeness of an accident to being re-
leased. This ‘‘closeness’’ can be expressed in a temporal or probabilistic manner, and this provides a guide for tailoring the
definition of hazard levels to a variety of situations and accident scenarios. The safety–diagnosability principle is indepen-
dent of the particular definition of the hazard level, as we discuss next.
The solid line in Fig. 3 represents the actual hazardousness of the system's state at a given time, H(t), while the dashed line represents the operators' assumed hazardousness7 of the state at that time, \hat{H}(t). The distance between these two curves, hereafter denoted as \|\hat{H}(t) - H(t)\|, can result from the absence of knowledge or from misinformation regarding the state of the system, and as such, it represents a degraded situational awareness of the operators. We refer to this lack of observability of critical state variables (at the barriers) as a violation of the safety–diagnosability principle, which we further discuss in the next subsection. The gap between the operators' assumed hazard level and the actual hazard level in the plant is one measure of degraded situation awareness, a connection we will further examine in this section.

3.1. The safety–diagnosability principle

We define the safety–diagnosability principle as the requirement that all safety-degrading events or states that defense-
in-depth is meant to protect against be observable/diagnosable. This principle requires that various features be put in place
to observe and monitor for breaches of any safety barrier, and reliably provide this feedback to the operators.8

6 This list is not meant to be exhaustive. Additional features, such as safety valves and the discharging of the drum's content into the sewer or slop tank, also served as safety barriers, but they are tangential to our purposes.
7 Assumed hazardousness can have two parts: an estimate of the state in question (in our case the liquid level in the tower), and a perception of the "danger" associated with that state (see the first paragraph in Section 3.2).
8 A more stringent version of this principle would require the complete state of the safety barrier to be observed and monitored, not just its breach.

Fig. 3. Illustration of the violation of the safety–diagnosability principle (degraded situational awareness).

The safety–diagnosability principle addresses the potential for concealment of hazard escalation, one of the limitations of defense-in-depth discussed in the introduction. More importantly, compliance with this principle ensures that the system will be operated in a closed-loop mode, instead of open-loop, with respect to risk, and that at the various levels of hazardousness (previously defined, and for which safety barriers have been put in place), the proper feedback is provided to the operators, which ensures that \hat{H}(t) converges to H(t), hereafter noted as \hat{H}(t) \to H(t). In other words, the safety–diagnosability principle ensures that if situational awareness is degraded during system operation, it is adjusted appropriately if or when safety barriers (sb_i) are breached, that is, \|\hat{H}(t) - H(t)\|_{sb_i} \to 0.
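As a notional sketch of what compliance could look like in software, the fragment below keeps an assumed hazard estimate \hat{H} and corrects it whenever a barrier sensor reports a breach, so that \|\hat{H}(t) - H(t)\| is driven toward zero at each barrier. The barrier heights follow the splitter-tower buffers discussed in this section; the sensor interface and names are hypothetical, not an existing system.

H_MAX_FT = 170.0   # tower height, h_max in Eq. (1)

# Splitter-tower safety barriers (height at which each barrier is breached).
BARRIERS_FT = {
    "first high-level alarm (7.6 ft)": 7.6,
    "second high-level alarm (7.9 ft)": 7.9,
    "top of tower (170 ft)": 170.0,
}

def update_assumed_hazard(h_hat, breached_barriers):
    """Correct the assumed hazard level H_hat given reported barrier breaches:
    a breach implies the actual hazard is at least the barrier's level."""
    for name in breached_barriers:
        barrier_level = BARRIERS_FT[name] / H_MAX_FT   # Eq. (1) at the barrier
        if h_hat < barrier_level:
            print(f"breach of {name}: H_hat {h_hat:.3f} -> {barrier_level:.3f}")
            h_hat = barrier_level
    return h_hat

# Example: operators assume nominal operation (liquid at 6.5 ft) while both
# level alarms have in fact been passed.
h_hat = 6.5 / H_MAX_FT
h_hat = update_assumed_hazard(h_hat, ["first high-level alarm (7.6 ft)",
                                      "second high-level alarm (7.9 ft)"])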
Going back to Fig. 3, assume H_0 is the hazardousness of nominal operations, and at time t_0, an adverse initiating event occurs, which translates into H > H_0. Violation of the safety–diagnosability principle can result, as illustrated in Fig. 3, in the operators remaining oblivious to the hazard escalation, that is, \hat{H}(t_1) = H_0.

One effect of this violation is the shortening of the amount of time available for the operators to intervene, following an initiating event and before the accident is released, from \Delta t to \Delta t - \delta t_1, as illustrated in Fig. 3. The situation is further aggravated by the time H_2 is reached, and if further escalation continues without adjustment of \hat{H}(t), as shown in the figure, the accident will occur, from the operators' perspective, "out of the blue" and without prior warning. Although Fig. 3 represents an extreme illustrative case, it serves to establish the link between the violation of the safety–diagnosability principle and the shrinking of the time window available for operators to understand an unfolding hazardous situation and intervene to abate it. Similarly, the violation of the safety–diagnosability principle can be translated into an increased conditional probability that an accident will occur following an adverse initiating event (IE):

P_{\overline{SDP}}(Acc \mid IE) \ge P_{SDP}(Acc \mid IE) \qquad (4)

where \overline{SDP} and SDP represent violation of and compliance with the safety–diagnosability principle, respectively.
Having introduced the safety–diagnosability principle, we examine it in the next subsection in the case of the Texas City
refinery accident, and we highlight the specific violations of this principle in the raffinate tower and the blowdown drum
(illustrated in Fig. 4).

3.2. Violation of the safety–diagnosability principle at the Texas City refinery

As discussed in Section 2.3, the night shift operator started the RSS by pumping raffinate into the tower above the first
high-level alarm, which sounded but was ignored. Since this was a common violation of the startup procedure, it can be said
that the first safety barrier was in effect broken or useless. The operators continued pumping raffinate past the second high-
level alarm, thus breaching the second safety barrier as well. In this case, the second alarm did not go off; with the level
transmitter erroneously calibrated, the operator was unaware of this second breach. This would qualify as a violation of
the safety–diagnosability principle (in this case due to poor maintenance practices).
The raffinate feed into the tower continued, and the liquid height reached 13 ft, surpassing the limit of the level transmit-
ter (9 ft) by the time the night shift operator stopped the process and left. The absence of an alarm on the upper limit of the
level transmitter was an unfortunate design flaw. More serious was the situation that followed. Going back to Eq. (3), as the
day-shift operators restarted the splitter section, h(t0) was not observable/diagnosable, the net flow [u(t)  v(t)] was not
monitored, and the remaining height of the tower was devoid of instrumentations, liquid level sensors and/or alarms. As
the result, raffinate liquid level would rise, hazard escalation would continue, and the operators would remain oblivious
to this situation. The fact that the tower was not instrumented to monitor liquid levels between 9 ft and 170 ft was an
egregious violation of the safety–diagnosability principle. During the several hours it took for the tower to fill up, any liquid
level sensor above the level transmitter could have adjusted the operators' \hat{H}(t) to H(t), and would have likely prompted an intervention to block the accident sequence from unfolding.9 A multitude of actions could have been taken to stop the rising liquid level; however, since the operators were unaware of the internal state of the tower and of the increasing hazard, \hat{H}(t) \approx H_0, no remedial action was considered.

9 The problem was compounded by the fact that the level transmitter was mis-calibrated, was allowed to operate outside its range, and was providing false information during the accident sequence.

Fig. 4. Accident sequence and violation of the safety–diagnosability principle in the splitter section, showing events E0–E10 (see Table 1); the break in the assumed state hazardousness represents the change in shift (not drawn to scale).
It is worth noting that compliance with the safety–diagnosability principle can help detect and assess a hazardous situ-
ation, akin to traditional fault detection and diagnosis in dynamical systems [27], even though in our case no ‘‘fault’’ per se
has occurred. But the diagnosis part, while important, is not always a necessary condition for blocking an accident sequence.
For example, knowing that the liquid level in the tower had reached, say, 150 ft might prompt the operator to shut down the splitter process, without knowing why this has happened, that is, before diagnosing the situation (that both output valves of the tower were shut).
At 11:16 am, the operators lit additional burners in the reboiler furnace, further increasing the hazardousness of the situation, or accelerating hazard escalation, all the while assuming that the situation was roughly nominal, \hat{H}(t = 11{:}16\ \text{am}) \approx H_0.
The splitter tower contained 70 distillation trays. The BP investigation noted that when 57 trays were flooded (around
137 ft), the tower could no longer perform its primary function [3]. The fact that this particular height and loss of primary
function was not observable/diagnosable—no instrumentation was in place to monitor and report this loss of primary func-
tion—was a major design flaw in the tower.
The accident sequence continued unabated, and the raffinate level further increased in the tower. The event of the tower overflowing with liquid raffinate, the breach of another safety barrier, was not observable, and as such it constitutes another gross violation of the safety–diagnosability principle. At 1:14 pm, as the pressure spiked and the safety valves released, the operators' estimate of the hazardousness of the situation increased. However, the signal was ambiguous: safety valves can release high-pressure hydrocarbon vapors into the blowdown drum, a less hazardous situation than releasing raffinate liquid. In other words, the release of the safety valves provided partial observability into the actual unfolding event, but not enough to set \hat{H}(t) \to H(t). The blowdown drum constituted the final safety barrier before loss of containment of liquid
raffinate occurred (the drum could be broken down into two barriers, one before and one after its outlet gooseneck). The
blowdown drum was also not instrumented to measure raffinate liquid level and monitor the state of this safety
barrier. As a result, the operators were blind to the liquid level, if any, in the blowdown drum. As the drum filled (in about
6 min), its high-level alarm did not sound, which in the absence of liquid level instrumentation, meant that the operators
were unaware of the imminent danger. This was a major violation of the safety–diagnosability principle in the design of
the blowdown drum.
The accident was not inevitable even as the blowdown drum began to fill and very limited time was left to intervene: an
emergency shutdown of the process could have been initiated had the design of this final safety barrier complied with the
safety–diagnosability principle. Alternatively, a less ambitious statement can be made: compliance with this principle may have provided a few precious minutes to sound the ISOM unit evacuation alarm (it was not sounded), and the accident consequences in loss of life and limb could have been reduced.
In short, the implementation of the safety–diagnosability principle in the design of the splitter tower and blowdown
drum could have supported the prevention of the Texas City refinery accident, or at least it would have mitigated its
consequences.
Table 1 summarizes these key events in the accident sequence, and Fig. 4 provides an illustration of the discrepancy be-
tween the actual hazardousness of the splitter section state and the operators’ assumed hazardousness.
In Figs. 3 and 4, we refer to the gap between the operator’s assumed hazard level and the actual hazard level in the plant
at any given time as one measure or dimension of degraded situation awareness. The concept of situation awareness involves

Table 1
Key events in the accident sequence.

Ei    Event
E0    Unit start-up
E1    First high-level alarm sounds
E2    Liquid reaches the level of the second alarm; the alarm fails
E3    Liquid reaches the limit of the level transmitter
E4    Restart of the splitter section; two output valves shut
E5    Additional burners lit in the reboiler furnace
E6    Loss of splitter tower primary function (liquid floods 57 distillation trays)
E7    Splitter tower overflows
E8    Pressure safety relief valves open; liquid raffinate flows into the blowdown drum
E9    Blowdown drum overflows
E10   Vapor cloud ignites

an operator’s comprehension of the dynamic situation that he/she is monitoring or controlling [8,7]. It is an important con-
struct in cognitive engineering and is meant to capture, among other things, the operator’s ‘‘understanding of the state of the
environment, including relevant parameters of the system’’ [9]. Hazard level in a system, in our case the non-dimensional
raffinate height parameter defined in Eq. (1), is a critical parameter of the system. The safety–diagnosability principle is thus
one important ingredient, among several others, for improving operators’ situation awareness. Violation of the safety–
diagnosability principle fails to provide critical parameters or states of the system, and as such, it directly leads to degraded
situation awareness. It is important to note that "situation awareness involves more than being aware of numerous pieces of data" [8]; what we contend is that knowledge of an important state variable of the system, its hazard level, is a necessary, although not sufficient, ingredient for an operator's situation awareness.

4. Conclusion

The risk analysis and safety literature seems to have drifted toward the organizational and social sciences on the one
hand, and the refinement of probabilistic modeling tools on the other hand. Important contributions have been made in both
areas, in understanding the human and organizational contributions to accident causation, and in better modeling and
assessing various risk scenarios (and their perceptions) in a multi-stakeholder context.
One important aspect that has faded from this literature is the engineering and design side of system safety. The pendu-
lum may have swung too much in favor of the organizational and soft aspects of system safety, which prompted Rollenhagen
[19] to make the argument that too much of a focus on ‘‘safety culture’’ can become an excuse for not thinking through the
engineering and technical drivers of safety. Comments such as the following lend credence to Rollenhagen’s argument and
are significantly flawed and unhelpful in their implications: ‘‘considerable progress has been made in engineering out the
physical causes of accidents. It is now generally acknowledged that individual human frailty and organizational defects
lie behind the majority of remaining accidents’’ [14].
The safety–diagnosability principle here proposed brings back some emphasis on engineering and system design issues in
support of accident prevention. We argued that the recognized defense-in-depth safety strategy should be complemented with this
principle, without which it can degenerate into an ineffective defense-blind strategy, as seen in the case of the Texas City
refinery accident.
Our discussion of the violation of the safety–diagnosability principle highlighted not the causal chain of the accident sequence (why the accident happened) but some causal factors that failed to support accident prevention (why blocking of the accident sequence did not happen). Compliance with the safety–diagnosability principle provides one way to improve operators' sensemaking and situational awareness after an adverse initiating event has occurred, and, as argued in this work, it provides them with more time to understand an unfolding accident sequence and to intervene and abate it. As such, the safety–diagnosability principle is
synergetic with organizational factors in support of accident prevention, in particular safety training, which can be shaped by
and include off-nominal conditions or states flagged by features implementing said principle (and courses of action to follow).
We believe the safety–diagnosability principle (SDP) provides a rich basis for further research and safety innovations. For
example, we recommend:

1. That SDP be integrated into the traditional Probabilistic Risk Assessment (PRA) technique for an improved risk analysis methodology, and that the impact (cost–benefit) of its implementation be quantified.
2. That various industries investigate ways for implementing the SDP in support of their current safety strategies (in par-
ticular the nuclear and chemical industries, where defense-in-depth and layers-of-protection are adopted).
3. That regulatory agencies overseeing hazardous industries, and/or professional societies, consider developing guidelines in
support of the SDP.
4. That accident investigation boards (e.g., NTSB, CSB) consider examining the reasons for failure to abate an accident
sequence (violation of the SDP) in addition to their standard causal analysis of an accident under investigation. This would
help them formulate a richer set of recommendations in support of accident prevention.

[Figure A1 maps the accident pathogens (weak safety culture; uninvestigated near-misses; routine violation of procedures; poor safety practices; maintenance shortcomings; inadequate communication; design flaws, including poor instrumentation, poorly designed computer displays, and obsolete and hazardous design; trailer siting) onto the event timeline of March 23, 2005, from unit start-up in the early morning to the explosion hitting the occupied trailers shortly after 1:20 pm, together with the safety barriers B1 (Δh = first level alarm − nominal level), B2 (Δh = second level alarm − first level alarm), B3 (Δh = total tower height − second level alarm), and B4 (Δh = height of the blowdown drum). Dotted pathogen lines indicate two or more contributing factors; timeline not to scale.]

Fig. A1. Visual recap of some of the accident pathogens and select events in the accident sequence at the Texas City refinery.

Finally, we recognize the strong connection between the safety–diagnosability principle and Human Reliability Analysis
(HRA), and we propose to carefully examine this relationship in future work.

Appendix A

See Fig. A1.

References

[1] AICHE. Layers of protection analysis: simplified process risk assessment. New York: American Institute of Chemical Engineers, Center for Chemical
Process Safety; 2001.
[2] Bakolas E, Saleh JH. Augmenting defense-in-depth with the concepts of observability and diagnosability from control theory and discrete event
systems. Reliab Eng Syst Saf 2011;96(1):184–93.
[3] BP. Fatal accident investigation report: isomerization unit explosion (final report). December 2005. <www.bp.com/liveassets/bp_internet/us/bp_us/final_report.pdf> [accessed 10/30/2012].
[4] Choudhry RM, Fang D, Mohamed S. The nature of safety culture: a survey of the state-of-the-art. Saf Sci 2007;45(10):993–1012.
[5] Cowlagi RV, Saleh JH. Coordinability and consistency in accident causation and prevention: formal system-theoretic concepts for safety in multilevel
systems. Risk Anal 2013;33(3):420–33.
[6] CSB. Investigation report: refinery explosion and fire. US Chemical Safety and Hazard Investigation Board; 2007.
[7] Durso FT, Sethumadhavan A. Situation awareness: understanding dynamic environments. Hum Factors 2008;50(3):442–8.

[8] Endsley MR. Toward a theory of situation awareness in dynamic systems. Hum Factors 1995;37(1):32–64.
[9] Endsley MR. Measurement of situation awareness in dynamic systems. Hum Factors 1995;37(1):65–84.
[10] Hopkins A. Was Three Mile Island a normal accident? J Contingencies Crisis Manage 2001;9(2):65–72.
[11] Kaszniak M, Holmstrom D. Trailer siting issues: BP Texas City. J Hazard Mater 2008;159:105–11.
[12] Kletz TA. Hazop and Hazan: identifying and assessing process industry hazards. 4th ed. Philadelphia: Taylor & Francis; 1999.
[13] Le Bot P. Human reliability data, human error and accident models – illustration through the Three Mile Island accident analysis. Reliab Eng Syst Saf
2004;83(2):153–67.
[14] Lee T, Harrison K. Assessing safety culture in nuclear power stations. Saf Sci 2000;34(1–3):61–97.
[15] Leveson NG. A new accident model for engineering safer systems. Saf Sci 2004;42(4):237–70.
[16] US Nuclear Regulatory Commission. Causes and significance of design-basis issues at US nuclear power plants. Draft report. Washington (DC): Office of Nuclear Regulatory Research; 2000.
[17] Rasmussen J. Risk management in a dynamic society: a modeling problem. Saf Sci 1997;27:183–213.
[18] Reason J. Managing the risks of organizational accidents. Vermont: Ashgate; 1997.
[19] Rollenhagen C. Can focus on safety culture become an excuse for not rethinking design of technology? Saf Sci 2010;48(2):268–78.
[20] Saleh JH, Saltmarsh E, Favarò FM, Brevault L. Accident precursors near misses and warning signs: critical review and formal definitions within the
framework of Discrete Event Systems. Reliab Eng Syst Saf 2013;114:148–54.
[21] Saleh JH, Marais KB, Bakolas E, Cowlagi RV. Highlights from the literature on accident causation and system safety: review of major ideas, recent
contributions, and challenges. Reliab Eng Syst Saf 2010;95(11):1105–16.
[22] Sorenson JN. Safety culture: a survey of the state of the art. Reliab Eng Syst Saf 2002;76(2):189–204.
[23] Sorensen JN, Apostolakis GE, Kress TS, Powers DA. On the role of defense in depth in risk-informed regulation, In: Proceedings of PSA ’99, international
topical meeting on probabilistic safety assessment, Washington (DC), August 22–26, 1999, American Nuclear Society, La Grange Park, Illinois; 1999. p.
408–13.
[24] Summers AE. Introduction to layers of protection analysis. J Hazard Mater 2003;104(1–3):163–8.
[25] Svedung I, Rasmussen J. Graphic representation of accident scenarios: mapping system structure and the causation of accidents. Saf Sci
2002;40(5):397–417.
[26] Turner BA. Man-made disasters. London: Wykeham Publications; 1978.
[27] Venkatasubramanian V, Rengaswamy R, Kavuri SN, Yin K. A review of process fault detection and diagnosis: Part III: Process history based methods.
Comput Chem Eng 2003;27(3):327–46.
