Académique Documents
Professionnel Documents
Culture Documents
44
generally been discredited for process safety incidents, especially complex incidents resulting in, or having the potential to result in, serious or catastrophic consequences. The tool is still often used for investigating low severity incidents, including minor occupational injuries. The focus of informal, one-on-one investigations was often limited to determining the immediate remedies that would prevent an exact repeat of the incident circumstances. For example, a common finding may have been that an operator failed to follow an established procedure. Based on the finding, the investigator might proceed to evaluate how best to get this specific operator to follow the procedure as a recommendation to prevent recurrence. This informal type of investigation required little time or training, but the disadvantages of the approach were grave. The resulting investigations did not provide a comprehensive understanding of: Why the incident really occurred What was needed to prevent recurrence of similar incidents elsewhere How the management systems failed and should be corrected The investigation committee method is another unsuccessful approach. This unstructured approach is historically significant and was judged inadequate for investigating process safety incidents because it produced incomplete and inconsistent results. It often did not find the root cause level or all the root causes.
45
The overall investigation approach within the process safety field is similar across many of the available methodologies. However, differences arise in the particular emphasis. Some methodologies focus on management and organizational oversights and omissions, while others consider human performance issues in more depth. Busy personnel working within the organization where the incident occurred often perform investigations. Investigative tools, therefore, need to be practical and relatively easy to use. Sometimes investigators may use judgment to make adaptations to selected tools based on the size and complexity of the investigation effort. Judgment based on knowledge and experience is important in determining how and why an incident occurred.
46
47
48
perspective and experience than one-on-one investigations. The group will typically have an understanding of the sequence of occurrences that led up to the incident through a timeline or sequence diagram. The group may also have identified causal factors, and typically focus on establishing barriers to reduce the risk (probability or consequences) of recurrence. The disadvantage to unstructured group brainstorming is that the discussion may be dominated by certain vocal individuals, who are not shy about stating their opinions, and who may or may not be experts but believe that they are. Each person may enter the discussion with his or her own bias, thereby directing the thinking toward incorrect conclusions. The results of group brainstorming are very dependent on the collective experiences of the group, which may be incomplete, lacking critical knowledge or a competency skill set. Two different groups may reach two different conclusion on the cause of an incident. A slightly more structured approach uses What-If Analysis,(1) which involves the team asking What if? questions that usually concern equipment failures, human errors, or external occurrences. Some examples are: What if the procedure was wrong? What if the steps were performed out of order? The questions can be generic in nature or highly specific to the process or activity where the incident occurred. Sometimes these questions are preprepared by one or two individuals, which may also potentially bias the discussion. The 5-Whys tool (2) is another approach used to add some structure to group brainstorming. The tool utilizes a logic tree approach without actually drawing the logic tree diagram. The group questions why unplanned, unintended, or adverse occurrences occurred or conditions existed. Typically the group needs to ask why? five times to reach root causes; hence the name. Judgment and experience are required to use the 5-Whys tool effectively to reach management system failures. The level of analysis is up to the group and does not always ensure reaching root causes. 4.4.2. Timelines The first phase of incident investigation involves developing a preliminary chronological description of the sequence of occurrences that led to the failure. This requires collecting evidence through interviewing key witnesses and examining all relevant evidence (equipment and documents) in order to piece together the circumstances of the incident in chronological order. This timeline development can range from a simple list of occurrences in sequence to diagrams showing occurrences and conditions along a straight axis. Development of the timeline should start as soon as facts emerge about the incident. By starting early, the investigator will become aware of
49
gaps in the sequence of occurrences that may be used as a focus for subsequent stages of the investigation in order to resolve the gaps. Construction of the timeline is an iterative activity. The timeline is refined and adjusted as the team gains a more complete and accurate understanding of the actual incident scenario and sequence of occurrences. Timelines alone do not identify the root causes of an incident. They should be used in conjunction with other tools, described in the following sections. For a detailed description of how to develop timelines along with several examples, refer to Chapter 9.
4.4.3. Sequence Diagrams Although investigators had previously used diagrams and charts, The Bureau of Surface Transportation Safety of the National Transportation Safety Board (NTSB) introduced Multilinear Event Sequencing (MES) concepts in the early 1970s to analyze and describe accidents. Ludwig Benner, Jr., and his colleagues at NTSB pioneered the development of MES concepts and many implementing procedures such as Sequentially Timed Events Plot (STEP).(3,4) W. G. Johnson adapted the basic MES concepts to create a form of sequence diagrams known as Events & Causal Factors Charting (E&CF)(5) first introduced as part of the MORT program. The term causal factor chart is commonly used for a sequence diagram in this industry. Techniques for developing sequence diagrams encompass a number of fundamental principles. Before developing a sequence diagram, it is important to define the end of the incident sequence. The starting point of the incident must also be defined but may not be immediately apparent until partway through the investigation. Typically, sequence diagrams start at the end and work backward, identifying the immediate contributing events first. Like timelines, construction of the diagram may start as soon as facts emerge about the incident in order to identify gaps for resolution. It is important to choose a format that may be easily updated and revised as new evidence is gathered, such as Post-It notes on which a single event or condition is written. Like timelines, sequence diagrams do not identify root causes, and therefore they should be used in conjunction with other tools. The mechanics of these tools are relatively easy to learn, but the investigator must exercise care to avoid locking into a preconceived scenario. For more information on sequence diagram tools refer to Chapter 9.
50
4.4.4. Causal Factor Identification Once the evidence has been collected and a timeline or sequence diagram developed, the next phase of the investigation involves identifying the causal factors. These causal factors are the negative occurrences and actions that made a major contribution to the incident. Causal factors involve human errors and equipment failures that led to the incident, but can also be undesirable conditions, failed barriers (layers of protection, such as process controls or operating procedures), and energy flows. Causal factors point to the key areas that need to be examined to determine what caused that factor to exist. There are a number of tools, such as Barrier Analysis (6, 7) and Change Analysis,(8) that can assist with the identification of causal factors. The concepts of incident causation encompassed in these tools are fundamental to the majority of investigation methodologies. (See Chapter 3 for information about the Domino Theory, System Theory, and HBT Theory.) The simplest approach involves reviewing each unplanned, unintended, or adverse item (negative event or undesirable condition) on the timeline and asking, Would the incident have been prevented or mitigated if the item had not existed? If the answer is yes, then the item is a causal factor. Generally, process safety incidents involve multiple causal factors. Causal factor identification tools are relatively easy to learn and easy to apply to simple incidents. For more complex incidents with complicated timelines, one or more causal factors can be overlooked, ultimately leading to missed root causes. Another disadvantage is that an inexperienced investigator could potentially assume that suppositions are causal factors, when in reality the supposed event or condition did not occur. Once the causal factors have been identified, the factors are analyzed using a root cause analysis tool, such as 5-Whys or predefined trees. See Chapter 9 for a more detailed discussion of Barrier Analysis (sometimes called hazardbarriertarget analysis or HBTA) and Change Analysis (also referred to as Change Evaluation/Analysis or CE/A). In essence, these tools act as a filter to limit the number of factors, which are subjected to further analysis to determine root causes. 4.4.5. Checklists Checklist analysis tools can be a user-friendly means to assist investigation teams as they conduct root cause analysis.(1) Each causal factor is reviewed against the checklist to determine why that factor existed at the time of the incident. The Systematic Cause Analysis Technique (SCAT)(9, 10) is an example of a proprietary checklist tool.
51
The comprehensiveness of the various checklists varies greatly. Some are very detailed with numerous categories and subcategories, whereas others do not reach down to the level of root causes. The advantage of checklists is that they are simple to use and the investigation team does not require a lot of training. The checklist essentially directs the investigation team and keeps them on track. Another advantage is consistency; the standard categories (and subcategories) on the checklist can be easily trended to identify recurring problems at a facility. A disadvantage is that a checklist may allow an investigation team to jump to conclusions, and does not provide the opportunity to think outside the box. This is especially important if the checklist is one of the less comprehensive types. It is also tempting to use the checklist too early, before all causal factors have been identified. Be sure to determine what happened and how it happened before determining why it happened. Otherwise, the team will think it has identified the right root causes, when in reality not all of the root causes have been determined. Some checklists must be used carefully because to the casual observer they can imply blame, contrary to the intended policy of discouraging blame seeking. Checklists may also be used to supplement other tools; for example, checklists on human factors may be used in conjunction with logic trees. Similarly, checklists may be used in combination with structured brainstorming tools such as What If/Checklist and Hazard and Operability (HAZOP) Analysis.(1) It is also a good practice to apply a tool like the 5-Whys to the root causes identified from the checklist to verify whether they are truly root causes. The use of checklists, such as SCAT, is discussed in Chapter 9. See Chapter 6 for an example of a human factors checklist. 4.4.6. Predefined Trees Owing largely to a scarcity of investigation methods available 30 years ago, attempts were made to apply engineering fault tree analysis (FTA) logic to incident reports. This approach appeared to demonstrate value. It was observed that many of the incidents displayed similar patterns of causal factors. This commonality led to the development of the Management Oversight Risk Tree (MORT)(11, 12, 13) that was based loosely on fault tree conventions. MORT was modeled on safety management system concepts that were considered best practice at that time. A simpler Mini-MORT variation has been developed to reduce complexity. (14) Today there are several predefined tree tools available from public and proprietary sources, including TapRooT,(15, 16) Seeking Out the
52
Underlying Root Causes of Events (SOURCE),(17) and Comprehensive List of Causes (CLC).(18) The majority of the predefined tree tools are prescriptive and list potential root causes for consideration among their branches. This offers the investigator a systematic method of considering the possible root causes associated with an incident. This approach encourages investigators to contemplate a wide range of causal factors and not just those that come to mind through brainstorming. The investigator does not have to build the tree, but rather apply the causal factors to each branch in turn, and discard those branches that are not relevant to the specific incident. Like checklists, the comprehensiveness of the various predefined trees varies. Some are very detailed with numerous categories and subcategories, whereas others may not fully reach root causes. This is hardly surprising, as the predefined trees are essentially a graphical representation of numerous checklists, organized by subject matter, such as human error, equipment failure, or other topics. The more comprehensive techniques were developed from many years of incident experience and management system experience across the chemical and allied industries. The advantages of predefined trees are that they may bring expertise into the investigation that the team does not have, and by presenting all investigators with the same classification system, greater consistency is encouraged among investigators. Largely, the technique ensures a comprehensive analysis and simplifies statistical trend analysis of the collected data. A disadvantage of predefined trees, as with a checklist, may be a tendency to discourage lateral thinking if the incident involves novel factors not previously experienced by those who developed the original tree. The use of predefined trees, overall, requires less resources and prior training than the nonprescriptive techniques involving team-developed trees discussed below. Some organizations have taken a generic, predefined tree and structured it along the lines of the companys management system. The effectiveness of a predefined tree is dependent on how well the tree models the data and system of dealing with the incident. When choosing a predefined tree, the user should confirm that the tree models the technology and system of the user. The use of predefined trees is also discussed in detail in Chapter 9. See the accompanying CD-ROM for examples of predefined trees, including Comprehensive List of Causes (CLC) and SOURCE/Root Cause Map. 4.4.7. Team-Developed Logic Trees A number of deductive techniques require that the investigation team develop a tree. This is accomplished by reasoning to organize causal factors into a diagram (tree) and define their interrelationship. These logic
53
trees can vary over a wide range from simple trees to complex fault trees. Most start at the end occurrence and work backward until a point is reached at which the team agrees it would be unproductive to go further. The earliest logic trees were based on engineering fault tree analysis methods. Today, companies use a number of variations or combinations of logic trees and call them by different names, such as Why Tree,(19) Causal Tree,(20, 21) Cause and Effect Logic Diagram (CELD),(22) and Multiple-Cause, Systems-Oriented Incident Investigation (MCSOII).(23, 24) The tools have more similarities than differences. Logic trees are best developed using a multidiscipline team. Starting at the end event, the discussion is guided by asking, why? and recording the results in a tree format. The general approach encourages investigators to contemplate a wide range of causal factors, but relies on group discussion. This makes its success dependent on the experience and knowledge of the team. These tools recognize that incidents have multiple, underlying causes, and the investigations attempt to identify and implement system changes that will eliminate recurrence not only of the exact incident, but of similar occurrences as well. There are five main strengths to a logic tree approach. 1. It provides the ability to separate a complex incident into discrete smaller events (segments) and then to examine each piece individually. 2. It allows the investigation team to understand how the causes worked together to allow the incident to occur. 3. It improves the quality of investigations by directing the focus past the immediate surface causes to the underlying root causes and management system failures, and mandating a search for multiple causes. 4. It offers a clear record that the investigation team understands the incident through the logic diagram/tree. 5. It provides an opportunity to include human factors in the incident investigation process. The disadvantages of logic trees center on their dependency on the cumulative expertise within the assembled investigation team and their ability to compile sound fact finding related to the incident. No technique can be a substitute if the team does not have the requisite knowledge and experience. Logic trees can also be somewhat time consuming to develop and may not use a consistent set of categories and subcategories, making trend analysis of recurring problems difficult. Examples of logic treesfault, event, causal, and whyare discussed below in order of increasing rigor. Chapter 9 contains detailed information on developing logic trees.
54 Why Tree
The Why Tree provides a simple method for depicting the logical relationship between causes and effects of an incident.(19) The process starts by displaying all direct losses and associated consequences in separate boxes. A drill down question that asks why? challenges each box. Plausible explanations are entered into new boxes attached by straight lines to the subject or receptor box above. Ultimately, the page will fill up with several boxes attached by straight lines. Unlike a formal fault tree, this method is empirical and does not require logic gates to be established. All boxes are scrutinized to determine their validity. If the content of a box is refuted by hard evidence, it is crossed off with an appropriate explanation. Otherwise, the boxes are left connected to show the logical progression upward toward an incident. At the base of the why tree where fundamental management systems are implicated, the process ceases. The investigation team should then focus its efforts on the rigor and quality of the management systems that could have prevented the incident. Recommendations are developed to address system deficiencies and these are tested against the why tree. An example is included on the accompanying CD ROM. Causal Tree Causal Trees were developed in an effort to use the principles of deductive logic found in Fault Tree but make it more user-friendly. Originally, private companies developed the Causal Tree Method (CTM) for safety, process safety, and environmental incident investigations applications. Rhone-Poulenc, for example, was an early user.(20, 21) Multiple-Cause Systems Oriented Incident Investigation (MCSOII) is another name for the CTM. At this time, most companies use simplified versions of fault trees for complex incident investigations. Causal tree methods rely on group discussion among experts from different fields, including workers, witnesses, supervisors, and process safety specialists. Starting at the end event, and working one level of the tree at a time, the group asks three questions: 1. What was the cause of this result? 2. What was directly necessary to cause the end result? 3. Are these factors (identified from question 2 above) sufficient to have caused the result? In recognition that most incidents have multiple root causes, the team is generally required to identify a minimum of three factors; one from each of the following categories: organizational, human, and material factors.
55
Causal trees may be drawn from top to bottom, left to right, or right to left. Connectors such as AND- and OR-gates are often omitted. Some methods use only AND-gates. Event Tree Another type of logic tree, the event tree, is an inductive technique. Event Tree Analysis (ETA) also provides a structured method to aid in understanding and determining the causes of an incident.(1) While the fault tree starts at the undesired event and works backward to identify root causes, the event tree looks forward to display the progression of various combinations of equipment failures and human errors that result in the incident graphically. Each event, such as equipment failure, process deviation, control function, or administrative control, is considered in turn by asking a simple yes/no question. Each is then illustrated by a node where the tree branches into parallel paths. Each relevant event is addressed on each parallel path until all combinations are exhausted. This can result in a number of paths that lead to no adverse consequences and some that lead to the incident as the consequence. The investigator then needs to determine which path represents the actual scenario. Generally, a qualitative event tree is developed when used for incident investigation purposes. Chapter 3 contains an example of an event tree in Figure 3-1 on page 34. Fault Tree Fault Tree Analysis (FTA) provides a structured method for determining the causes of an incident.(1, 25, 26, 27) The fault tree itself is a graphic model that displays the various combinations of equipment failures and human errors that can result in an incident. The undesired event appears as the top event and the trees are drawn from top to bottom. Two basic logic gates connect event blocks: the AND-gate and the OR-gate. The facts dictate the structure of the incident diagram and limit the influence of presupposed conclusions invariably drawn by team members before all of the facts are identified and logically matched. Logic rules are used to test the tree structure. The term fault tree means different things to different people. Some people use the term to describe trees that have frequency terms included. These quantitative trees can be solved mathematically to provide a frequency of the incident. However, for incident investigation, the term commonly refers to a qualitative tree.
56
57
Endnotes
1. Center for Chemical Process Safety. Guidelines for Hazard Evaluation Procedures, Second Edition with Worked Examples. New York: American Institute of Chemical Engineers, 1992. 2. ABS Consulting. Introduction to Reliability Management: An Overview. Knoxville, TN: ABS Group Inc. 2000. 3. Benner, L., Jr. 10 MES Investigation Guides. Oakton, VA: Starline Software Ltd. 2000.
58
4. Hendrick, K. and Benner, L., Jr. Investigating Accidents with S-T-E-P. New York: Marcel Dekker, 1987. 5. Buys, R. J., and Clark, J. L. Events and Causal Factors Charting. Revision 1, Idaho Falls, ID: System Safety Development Center, Idaho National Engineering Laboratory, 1978. (DOE 76-45/14 SSDC-14) 6. Dew, J. R. In Search of the Root Cause. Quality Progress. 23(3): 97102, 1991. 7. Trost, W. A., and Nertney, R. J. Barrier Analysis. Idaho Falls, ID: System Safety Development Center. Idaho National Engineering Laboratory 1985. (DOE/SSDC 76-45/29) 8. Kepner, C. H., and Tregoe, B. B. The Rational Manager, 2nd ed. Princeton, NJ: Kepner-Tregoe, Inc., 1976. 9. Bird, F. E., Jr., and Germain, G .L. Practical Loss Control Leadership. Loganville, GA: International Loss Control Institute (ILCI), 1985. 10. International Loss Control Institute. SCATSystematic Cause Analysis Technique. Loganville, GA: Det Norske Veritas, 1990. 11. Johnson, W. G. MORT, Safety Assurances Systems. New York: Marcel Dekker, 1980. 12. Department of Energy, Accident/Incident Investigation Manual. 2nd ed., Idaho Falls, ID: System Safety Development Center. Idaho National Engineering Laboratory, 1985. (DOE/SSDC 76-45/27) 13. Buys, R. J. Standardization Guide for Construction and Use of MORT-Type Analytical Trees. Idaho Falls, ID: System Safety Development Center. Idaho National Engineering Laboratory 1977. (ERDA 76-45/8) 14. Ferry, T.S. Modern Accident Investigation, and Analysis 2nd ed. New York: Wiley, 1988. 15. Unger, L.. and Paradise, M. TapRooTA Systematic Approach for Investigating Incidents. Paper, Process Plant Safety Symposium. Houston, TX, 1992. 16. Paradise, M. Root Cause Analysis and Human Factors. Human Factors Bulletin. 34(8): 15, 1991. 17. ABS Consulting. Root Cause Analysis Handbook: A guide to effective Incident Investigation. Knoxville, TN: ABS Group Inc., 1999. 18. BP (formerly BP Amoco). Incident Investigation. Root Cause Analysis Training. Comprehensive List of Causes. London, 1999. 19. Nelms, Robert, C. The Go Book. C. Robert Nelms publisher. 20. Leplat, J. Accident Analyses and Work Analyses. Journal of Occupational Accidents. 1: 331340, 1978. 21. Boissieras, J. Causal Tree, Description of the Method. Princeton, NJ: RhonePoulenc, 1983. 22. Mosleh, A. et al. Procedures for Treating Common Cause Failures in Safety and Reliability Studies. Palo Alto, CA: Electric Power Research Institute. 1988. (EPRI NP-5613) 23. Dowell, A. M. Guidelines for Systems Oriented Multiple Cause Incident Investigations. Deer Park, TX: Rohm and Haas Texas Inc. Risk Analysis Department, 1990.
59
24. Anderson, S. E., and Skloss, R. W. More Bang for the Buck: Getting the Most from an Accident Investigation. Paper, Loss Prevention Symposium. New York: AIChE, 1991. 25. Browning, R. L. Analyze Losses by Diagram. Hydrocarbon Processing, 54:253257,1975. 26. Arendt, J. S. A Chemical Plant Accident Investigation Using Fault Tree Analysis. Proceedings of 17th Annual Loss Prevention Symposium. Paper 11a, New York: AIChE, 1983. 27. Vesely, W.E. et al. Fault Tree Handbook. Washington, DC: US Government Printing Office, 1981. (NUREG-0492)
Additional Reference
Nelms, C. Robert. What You Can Learn from Things that Go Wrong. C. Robert Nelms Publisher, 1996.