
Reliability Engineering and System Safety 53 (1996) 73-83

© 1996 Elsevier Science Limited


Printed in Northern Ireland. All rights reserved
ELSEVIER PII: S0951-8320(96)00010-5 0951-8320/96/$15.00

The basic concepts of failure analysis


Marvin Rausand^a & Knut Øien^b
^a Department of Production and Quality Engineering, Norwegian University of Science and Technology, N-7034 Trondheim, Norway
^b SINTEF Safety and Reliability, N-7034 Trondheim, Norway

(Received 20 April 1995; revised 16 October 1995; accepted 8 January 1996)

This paper discusses basic concepts of failure analysis and gives advice on how
to interpret, e.g., function, failure, failure mode, failure cause, and failure
effect, using a gate valve as an example. A general approach to identification
and classification of functions and failure modes is presented and the various
steps of failure analysis are discussed. The OREDA (Offshore Reliability
Data) database is used to illustrate the interpretation of the basic concepts.
Limitations in the use of OREDA are pointed out and proposals for
enhancements of this database are given. © 1996 Elsevier Science Limited.

1 INTRODUCTION

Failure is a fundamental concept of any reliability analysis. According to accepted standards,[1] failure is defined as 'the termination of the ability of an item to perform a required function.'

The quality of a reliability analysis strongly depends on the analyst's ability to identify all the required functions (and hence all the failures) of the item that is subject to analysis.

Most reliability analysts do not apply any formal procedure to identify functions. As a result, only some of the required functions may be identified and hence analyzed.

Formal procedures to identify functions and failures are also seldom used when establishing reliability data sources like OREDA.[2] Shortcomings of data sources will often lead to similar shortcomings of reliability analyses, since many analysts use data sources as a guide when identifying failure modes. They may tend not to include specific failure modes in their analysis if the failure modes are not covered in the data sources.

Failures are often classified into failure modes. The failure mode concept is generally recognized as a useful and necessary concept in failure analysis. According to British Standard BS 5760, Part 5,[3] failure mode is defined as 'the effect by which a failure is observed on a failed item'.

The failure mode concept does not, however, have a well-defined interpretation. There are several reasons for this:

1. The definition of a failure mode is not the same in the various standards.[1,3,4]
2. In practical applications the failure mode description tends to slide towards a failure cause description or towards a failure effect description.[5] This is also related to the level of indenture being analyzed: a failure mode on one level will be a failure cause on the next higher level.
3. The failure mode concept is often unfamiliar to laymen. A maintenance engineer may carry out maintenance and repair failures all his life without knowing the failure mode concept, and without feeling any need for it. In the workshop manual for the Ford Sierra,[6] for example, the concept failure mode is not mentioned at all, even though failure modes are treated. Instead of classifying failure modes, the manual simply speaks of 'grouping the difficulties' and uses concepts like symptoms and reasons instead of failure modes and failure causes.

The objectives of this paper are: (1) to clarify the interpretation of the basic concepts function, failure, failure mode, failure cause, and failure effect; (2) to discuss problems connected to the identification and classification of failures; and (3) to point out possible solutions to these problems.

2 REQUIRED FUNCTIONS

The functions of an item may be defined as:[5] 'the normal or characteristic actions of an item, sometimes defined in terms of performance capabilities.'

In many guidelines and textbooks,[7] it is recommended that the various functions are expressed in the same way, as a statement comprising a verb plus a noun, for example 'close flow', 'contain fluid', 'transmit signal'.

In the following, the term item is used to denote any component, subsystem, or system that can be considered as an entity.

2.1 Classification of functions

A complex item may have a large number of required functions. Not all functions are, however, equally important, and a classification may therefore be an aid for analysis purposes. One way of classifying functions is:

1. Essential functions: These are the functions required to fulfil the intended purpose of the item. The essential functions are simply the reasons for installing the item. Often an essential function is reflected in the name of the item. An essential function of a pump is, for example, to pump a fluid.
2. Auxiliary functions: These are the functions that are required to support the essential functions. The auxiliary functions are usually less obvious than the essential functions, but may in many cases be as important as the essential functions. Failure of an auxiliary function may in many cases be more safety critical than a failure of an essential function. An auxiliary function of a pump is, for example, containment of the fluid.
3. Protective functions: The functions intended to protect people, equipment and the environment from damage and injury. The protective functions may be classified as:
   (a) safety functions (i.e., to prevent accidental events and/or to reduce consequences to people, material assets and the environment)
   (b) environment functions (e.g., anti-pollution functions during normal operation)
   (c) hygiene functions
   Safety protective functions are further discussed, e.g., by Moubray,[8] pp. 40-42.
4. Information functions: These functions comprise condition monitoring, various gauges and alarms, etc.
5. Interface functions: These functions apply to the interfaces between the item in question and other items. The interfaces may be active or passive. A passive interface is, for example, present when the item is a support or a base for another item.
6. Superfluous functions: According to Moubray:[8] 'Items or components are sometimes encountered which are completely superfluous. This usually happens when equipment has been modified frequently over a period of years, or when new equipment has been overspecified'. Superfluous functions are sometimes present when the item has been designed for an operational context that is different from the actual operational context. In some cases failure of a superfluous function may cause failure of other functions.

These classes are not necessarily disjoint. Some functions may be classified in more than one class.

To establish maintenance strategies, and especially function testing strategies, it is important to distinguish between so-called evident and hidden (dormant) failures. The following classification of functions may therefore prove necessary:[9]

1. On-line functions: These are functions operated either continuously or so often that the user has current knowledge about their state. The termination of an on-line function is called an evident failure.
2. Off-line functions: These are functions that are used intermittently or so infrequently that their availability is not known by the user without some special check or test. An example of an off-line function is the essential function of an emergency shutdown (ESD) system. Many of the protective functions are off-line functions. The termination of the ability to perform an off-line function is called a hidden failure.

Items may, in general, have several operational modes, and several functions for each operational mode.

In this paper we will, as an example, consider a process shutdown gate valve with a spring-loaded hydraulic fail-safe actuator. The valve is held open by hydraulic pressure. When the pressure is bled off, the valve closes by spring force. A sketch of a typical gate valve is shown in Fig. 1.

The process shutdown valve in Fig. 1 has four different operational modes: (1) close flow, (2) keep flow closed, (3) open flow, and (4) keep flow open. The modes (2) and (4) are stable states, whilst the modes (1) and (3) represent 'transitions' between the stable states.

The essential function of this valve is to 'close flow', i.e., to shut down the process. Establishing the different operational modes of the valve is recommended for two reasons:

1. It reveals other functions that might be overlooked when focusing too much on the essential function.
2. It provides a structured basis for the identification of failure modes that are completely connected to, and dependent on, the given
   operational mode. A valve cannot, for example, close spuriously in closed position, i.e., in operational mode 'keep flow closed'.

Operational modes are therefore an aid in identifying both functions and failure modes.

Fig. 1. Hydraulically operated fail-safe gate valve. [Figure: labelled parts include the hydraulic operating pressure, cap, actuator body, stem, spring, stem seal, seat, gate, and working pressure.]

2.2 Identification of functions

The functions of an item may be split into subfunctions on an increasing level of detail. A function $F_i$ on top level may, for example, be split into $n_i$ subfunctions $F_{i,1},\ldots,F_{i,n_i}$ on the next lower level of indenture. The functions on this level, for example $F_{i,j}$, may be split further into $n_{ij}$ subfunctions $F_{i,j,1},\ldots,F_{i,j,n_{ij}}$, and so on. The number of functions may increase significantly as this development proceeds, depending on the complexity of the item.

2.2.1 FAST diagrams

The functional relationships may be illustrated by so-called functional analysis system technique (FAST) diagrams as described, for example, by Fox.[10] The FAST technique may assist in identifying and establishing the required functions.

The FAST diagram is created by asking 'how' an already established function is accomplished. This is repeated until functions on the lowest level are reached.

The diagram can also be developed in the opposite direction by asking 'why' a function is necessary. This is repeated until functions on the system level are reached.

The FAST diagram is then established, displaying a graphical picture of all the system's functions at different levels and linking the individual functions together in a network.

For a new design the question will be: what hardware is best suited to fulfil the functions? For existing systems, superfluous functions can be revealed and/or better solutions may be detected. For value engineering[11] the key question is: are there any other ways of achieving the same functions while at the same time reducing the life cycle costs (LCC)?

2.2.2 Functional block diagrams

To retain the understanding of the functional interactions in the functional hierarchy, and to clarify the required input and output interfaces, it is often useful to establish so-called functional block diagrams. A detailed description of this type of diagram is given by, e.g., Pahl & Beitz.[12]

Functional block diagrams are used to portray the design requirements of the item in a pictorial manner, illustrating series-parallel relationships, possible feedbacks, the hierarchy of functions, and functional interfaces.[11] The required control signals are also specified in the functional block diagram, together with the environmental stresses affecting the various
functions. Functional block diagrams are designated as top level, first level, second level, and so on.

A simple top level functional block diagram of the process shutdown valve in the operational mode (4) 'keep flow open' is shown in Fig. 2.

Fig. 2. Top level functional block diagram for a valve in the operational mode 'keep flow open'. [Figure: input 'Fluid in', function 'Keep flow open', output 'Fluid out', with influences from the 'Control system' and the 'Environment'.]

Functional block diagrams are recommended by IEC 812[13] and MIL-STD 1629A[4] as a basis for failure modes and effects analysis (FMEA), and also by Smith[14] as a basis for reliability centred maintenance (RCM).

3 FAILURES AND FAILURE MODES

Even if we are able to identify all the required functions of an item, we may not be able to identify all the failure modes. This is because each function may have several failure modes. No formal procedure seems to exist that may be used to identify and classify the possible failure modes.

3.1 Failures, faults, and errors

The term failure is often confused with the terms fault and error. Various (conflicting) definitions exist. The relationship between these terms as defined in IEC 50(191)[1] is illustrated in Fig. 3.

According to IEC 50(191),[1] an error is a 'discrepancy between a computed, observed or measured value or condition and the true, specified or theoretically correct value or condition.' An error is not (yet) a failure, because it is within the acceptable limits of deviation from the desired performance (target value). An error is sometimes referred to as an incipient failure.[2,15,16]

According to IEC 50(191),[1] failure is the event when a required function is terminated (the acceptable limits are exceeded), while fault is 'the state of an item characterized by inability to perform a required function, excluding the inability during preventive maintenance or other planned actions, or due to lack of external resources'. A fault is hence a state resulting from a failure.

The distinction between failure (or fault) and error is essential in failure analysis, because it defines the borderline between what is a failure and what is not.

Fig. 3. Illustration of the difference between failure, fault, and error. [Figure: performance plotted against time, with labels 'target value', 'error', 'failure (event)' and 'fault (state)'.]

3.2 Failure modes

A failure mode is a description of a fault, i.e., how we can observe the fault. Fault mode would therefore be a more appropriate term than failure mode. IEC 50(191)[1] deprecates the use of the term 'failure mode', and thus denotes FMEA as fault modes and effects analysis, while BS 5760[3] argues that the older term failure modes and effects analysis has been retained in order to align it with the current version of IEC 812,[13] which is widely accepted.

3.2.1 Identification of failure modes

To identify the failure modes we have to study the outputs of the various functions. Some functions may have several outputs. Some outputs may be given a very strict definition, such that it is easy to determine in an actual case whether the output requirements are fulfilled or not. In other cases the output may be specified as a target value with some tolerance limits. The consequence of a deviation from the target value may be a function of the size of the deviation (see Fig. 4), comparable to Taguchi's quality loss function.[17]

If we again consider the process shutdown valve, it should be designed with a specified closing time, for example 10 seconds. If the valve closes too slowly, it will not function as a safety barrier. On the other hand, if the valve closes too fast, we may get a pressure shock destroying the valve or the valve flanges. Closing times between 6 and 14 seconds may, for example, be acceptable, and we state that the valve is functioning (with respect to this particular function) as long as the closing time is within this interval.

The criticality of the failure will obviously increase with the deviation from the target value. This situation is illustrated in Fig. 4.
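The closing-time requirement can be written as a small classification routine. The sketch below is illustrative only. It assumes the example limits from the text (10 s target, with an acceptable interval of 6 to 14 s taken here as inclusive). The function name and the convention of passing `None` for a valve that never closes are hypothetical. Observations inside the interval are reported as errors rather than failures, following the IEC 50(191) distinction in Section 3.1.

```python
from typing import Optional

# Example values from the text; real limits depend on the operational context.
TARGET_S = 10.0       # nominal (target) closing time
LOWER_LIMIT_S = 6.0   # closing faster than this risks a pressure shock
UPPER_LIMIT_S = 14.0  # closing slower than this, the valve is no safety barrier


def classify_closing_time(closing_time_s: Optional[float]) -> tuple[str, float]:
    """Return (status, deviation from target) for one observed closure.

    closing_time_s is None when the valve did not close at all
    (a hypothetical convention for this sketch).
    """
    if closing_time_s is None:
        return "failed to close", float("inf")
    deviation = closing_time_s - TARGET_S
    if closing_time_s < LOWER_LIMIT_S:
        return "closing too fast", deviation
    if closing_time_s > UPPER_LIMIT_S:
        return "closing too slowly", deviation
    # Inside the acceptable interval: a nonzero deviation is an error,
    # but the function has not terminated, so it is not a failure.
    if deviation == 0:
        return "functioning", deviation
    return "error (within limits)", deviation


for t in (10.0, 12.5, 4.0, 17.0, None):
    print(t, classify_closing_time(t))
```

The returned deviation can serve as a crude criticality measure, since criticality grows with the distance from the target value. A real analysis would need limits tied to the actual operational context.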
Fig. 4. Desired output and tolerances. [Figure: the function 'Close flow' with inputs (momentum, tolerances, lubrication, etc.), influences from the control system and the environment (temperature, humidity, etc.), and output closing time. The nominal target value is 10 s, with an acceptable deviation between 6 s and 14 s. Closing times below 6 s (towards 0 s, shock) give the failure mode 'closing too fast', times above 14 s give 'closing too slowly', and no closure gives 'failed to close'.]

Examples of failure modes derived from the function output deviations of closing the valve may be (1) closing too slowly, (2) closing too fast, or (3) not closing at all. Several further failure modes may be identified in a similar manner by studying the response of the various required functions. Failure modes related to the various operational modes of the process shutdown valve are listed in Table 1.

3.2.2 Failure mode categories

It is important to realize that a failure mode is a manifestation of the failure as seen from the outside, i.e., the termination of one or more functions. 'Internal leakage' is thus a failure mode of the valve, since the valve loses its required function to shut in the fluid. Wear of the valve seal, however, represents a cause of failure and is hence not a failure mode of the valve.

Failure modes may be classified in three main groups related to the function of the item:

1. Total loss of function: In this case a function is not achieved at all, or the quality of the function is far beyond what is considered acceptable.
2. Partial loss of function: This group may be very wide, and may range from the nuisance category almost to the total loss of function.
3. Erroneous function: This means that the item performs an action that was not intended, often the opposite of the intended action.

For the gate valve, the failure modes 'not opening at all' and 'not closing at all' obviously belong to the category 'total loss of function', since it is not possible to open (close) the valve on command. A partial loss of function will occur, for example, if the valve opens (closes) too slowly or in a jerking mode (improper operation). The failure mode 'internal leakage', i.e., leakage through the valve in closed position, may be either a total loss of function or a partial loss of function depending on the operational context. If the leakage is strictly prohibited, the failure mode will represent a total loss of function even for a tiny leakage.
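The three categories can be recorded alongside each failure mode. The sketch below assumes the valve failure modes discussed in the text and makes the context dependence of 'internal leakage' explicit through a flag. The names are hypothetical. Two further choices are our own illustrative assumptions, not classifications given in the paper: mapping spurious operation to 'erroneous function', and treating non-prohibited leakage as partial loss.

```python
from enum import Enum


class Category(Enum):
    TOTAL_LOSS = "total loss of function"
    PARTIAL_LOSS = "partial loss of function"
    ERRONEOUS = "erroneous function"


# Context-independent categories for gate valve failure modes.
FIXED = {
    "not opening at all": Category.TOTAL_LOSS,
    "not closing at all": Category.TOTAL_LOSS,
    "opening too slowly": Category.PARTIAL_LOSS,
    "closing too slowly": Category.PARTIAL_LOSS,
    "improper operation": Category.PARTIAL_LOSS,
    # Assumption for illustration: a spurious action is an unintended
    # action, i.e. an erroneous function.
    "opening spuriously": Category.ERRONEOUS,
    "closing spuriously": Category.ERRONEOUS,
}


def categorize(failure_mode: str, leakage_prohibited: bool = False) -> Category:
    """Return the category of a valve failure mode.

    'internal leakage' depends on the operational context: if any leakage
    is strictly prohibited, even a tiny leak is a total loss of function.
    Otherwise it is treated as partial loss here (a simplification).
    """
    if failure_mode == "internal leakage":
        return Category.TOTAL_LOSS if leakage_prohibited else Category.PARTIAL_LOSS
    try:
        return FIXED[failure_mode]
    except KeyError:
        raise ValueError(f"not a known failure mode: {failure_mode!r}") from None


print(categorize("internal leakage").value)
print(categorize("internal leakage", leakage_prohibited=True).value)
```

Passing "wear of valve seal" raises `ValueError`. This mirrors the point above that seal wear is a failure cause, not a failure mode of the valve.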

Table 1. Failure modes related to the various operational modes of a process shutdown valve

Operational mode        Failure modes
Close flow (1)          Not closing at all
                        Not closing completely
                        Closing too slowly
                        Closing too fast
                        Improper operation
Keep flow closed (2)    Opening spuriously
                        Internal leakage
                        External leakage
Open flow (3)           Not opening at all
                        Not opening completely
                        Opening too slowly
                        Opening too fast
                        Improper operation
Keep flow open (4)      Closing spuriously
                        External leakage
                        Plugged

3.3 A general classification scheme for failure modes

A variety of classification schemes for failure modes have been published. In our opinion, one of the most suitable classifications is:[14]

1. Intermittent failures: Failures that result in a lack of some function only for a very short period of time. The item will revert to its full operational standard immediately after the failure.
2. Extended failures: Failures which result in a lack of some function that will continue until some part of the item is replaced or repaired. Extended failures may be further divided into:
   (a) Complete failures: Failures that cause complete lack of a required function.
   (b) Partial failures: Failures that lead to a lack of
   some function but do not cause a complete lack of a required function.
   Both the complete failures and the partial failures may be further classified as:
      (i) Sudden failures: Failures that could not be forecast by prior testing or examination.
      (ii) Gradual failures: Failures that could be forecast by testing or examination. A gradual failure will represent a gradual 'drifting out' of the specified range of performance values. The recognition of gradual failures requires comparison of actual device performance with a performance specification, and may in some cases be a difficult task.

The extended failures are split into four categories; two of these are given specific names:

Catastrophic failure: A failure that is both sudden and complete.
Degraded failure: A failure that is both partial and gradual (such as the wear of the tires on a car).

The failure classification described above is illustrated in Fig. 5, which is adapted from Blache & Shrivastava.[18]

4 FAILURE CAUSES AND FAILURE EFFECTS

The functions of a system may usually be split into subfunctions as discussed in Section 2.2. Failure modes at one level in the hierarchy will often be caused by failure modes on the next lower level. It is important to link failure modes on lower levels to the main top level responses, in order to provide traceability to the essential system responses as the functional structure is refined. This is illustrated in Fig. 7 for a hardware structure breakdown.

4.1 Failure causes, mechanisms and root causes

According to IEC 50(191),[1] failure cause is 'the circumstances during design, manufacture or use which have led to a failure'.

The failure cause is a necessary piece of information in order to avoid failures or reoccurrence of failures. Failure causes may be classified in relation to the life cycle of an item as illustrated in Fig. 6, where the various failure causes are defined as:[1]

1. Design failure: A failure due to inadequate design of an item.
2. Weakness failure: A failure due to a weakness in the item itself when subjected to stresses within the stated capabilities of the item. (A weakness may be either inherent or induced.)
3. Manufacturing failure: A failure due to nonconformity during manufacture to the design of an item or to specified manufacturing processes.
4. Ageing failure: A failure whose probability of occurrence increases with the passage of time, as a result of processes inherent in the item.
5. Misuse failure: A failure due to the application of stresses during use which exceed the stated capabilities of the item.
6. Mishandling failure: A failure caused by incorrect handling or lack of care of the item.

The various failure causes in Fig. 6 are not necessarily disjoint. There is, e.g., an obvious overlap between 'weakness' failures on the one hand and 'design' and 'manufacturing' failures on the other.

Failure mechanisms are defined as the 'physical, chemical or other processes which have led to a failure'.[1] A common interpretation of this term is the immediate causes at the lowest level of indenture, such as wear, corrosion, hardening, pitting, oxidation, etc.
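IEC 50(191) defines a failure cause by 'the circumstances during design, manufacture or use'. The six cause classes can therefore be indexed by life-cycle phase. The mapping below is a sketch of one possible reading of Fig. 6, not a reproduction of it. Assigning ageing failures to the use phase, and letting weakness failures span design and manufacture, are our assumptions. The second choice follows the overlap noted above. The names `Phase`, `CAUSE_PHASES` and the helper functions are hypothetical.

```python
from enum import Enum


class Phase(Enum):
    DESIGN = "design"
    MANUFACTURE = "manufacture"
    USE = "use"


# Hypothetical mapping of the six cause classes to the life-cycle phase(s)
# in which the circumstances arise. A frozenset lets one class belong to
# several phases, reflecting that the classes are not disjoint. (Induced
# weaknesses could arguably also belong to the use phase; left out here.)
CAUSE_PHASES: dict[str, frozenset[Phase]] = {
    "design failure": frozenset({Phase.DESIGN}),
    "weakness failure": frozenset({Phase.DESIGN, Phase.MANUFACTURE}),
    "manufacturing failure": frozenset({Phase.MANUFACTURE}),
    "ageing failure": frozenset({Phase.USE}),
    "misuse failure": frozenset({Phase.USE}),
    "mishandling failure": frozenset({Phase.USE}),
}


def causes_in_phase(phase: Phase) -> list[str]:
    """List (alphabetically) the cause classes that may originate in a phase."""
    return sorted(c for c, phases in CAUSE_PHASES.items() if phase in phases)


def overlapping_classes() -> list[str]:
    """List the cause classes assigned to more than one phase."""
    return sorted(c for c, phases in CAUSE_PHASES.items() if len(phases) > 1)


print(causes_in_phase(Phase.DESIGN))
print(overlapping_classes())
```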
Describing failure causes at the level of failure mechanisms is, however, not sufficient to evaluate possible remedies. Wear can, for instance, be a result of wrong material specification (design failure), usage outside specification limits (misuse failure), poor maintenance such as inadequate lubrication (mishandling failure), etc. These fundamental causes are sometimes referred to as root causes (see Fig. 7): the causes for which remedial actions can be decided upon.

4.2 Failure effects and severity

A general picture of the relationship between cause and effect is that each failure mode can be caused by several different failure causes, leading to several different failure effects. To get a broader understanding of the relationship between these terms, the level of indenture being analyzed should be taken into account. This is illustrated in Fig. 7.

Fig. 5. Failure classification. [Figure: a tree splitting failure into intermittent failure and extended failure; extended failure into complete failure and partial failure; and each of these into sudden failure and gradual failure. Sudden complete failure is marked as catastrophic failure, and gradual partial failure as degraded failure.]
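The classification in Fig. 5 amounts to a small decision procedure over three yes/no attributes. The sketch below assumes a failure is described by the hypothetical fields `extended`, `complete` and `sudden`. Only the last two matter for extended failures, and only the two combinations named in the text get a specific label.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObservedFailure:
    extended: bool  # lack of function persists until repair/replacement
    complete: bool  # complete lack of a required function (else partial)
    sudden: bool    # could not be forecast by prior testing (else gradual)


def classify(f: ObservedFailure) -> str:
    """Classify a failure following the scheme of Fig. 5."""
    if not f.extended:
        return "intermittent failure"
    # Two of the four extended-failure categories have specific names.
    if f.complete and f.sudden:
        return "catastrophic failure"
    if not f.complete and not f.sudden:
        return "degraded failure"
    onset = "sudden" if f.sudden else "gradual"
    extent = "complete" if f.complete else "partial"
    return f"{onset} {extent} failure"


# Tire wear on a car: partial and gradual.
print(classify(ObservedFailure(extended=True, complete=False, sudden=False)))
```

The two unnamed extended categories are returned as descriptive strings ("gradual complete failure", "sudden partial failure"), so every leaf of the tree receives a label.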
Fig. 6. Failure cause classification. [Figure: failure causes arranged by life-cycle phase (design, manufacturing, use), covering design failure, weakness failure, manufacturing failure, ageing failure, misuse failure and mishandling failure.]
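Two rules link the levels of indenture. A failure mode at one level is a failure cause at the next higher level. A failure effect at one level equals the failure mode at the next higher level. Given these rules, a bottom-up chain of failure modes is enough to derive the cause and effect entries for every level. The sketch below assumes the seal, valve and process line example of Fig. 7. The `Level` type and the `derive_analysis` helper are hypothetical names. The root causes and mechanisms at the lowest level are outside the chain and are left as `None`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Level:
    name: str          # level of indenture (chain is ordered lowest first)
    failure_mode: str  # how the failure manifests at this level


def derive_analysis(chain: list[Level]) -> list[dict[str, Optional[str]]]:
    """Derive cause/mode/effect entries per level from a bottom-up chain.

    The cause at level k is the failure mode at level k-1, and the effect
    at level k is the failure mode at level k+1.
    """
    rows = []
    for k, level in enumerate(chain):
        rows.append({
            "level": level.name,
            "cause": chain[k - 1].failure_mode if k > 0 else None,
            "mode": level.failure_mode,
            "effect": chain[k + 1].failure_mode if k + 1 < len(chain) else None,
        })
    return rows


chain = [
    Level("component (seal)", "leakage from sealing"),
    Level("item (valve)", "internal leakage"),
    Level("system (one process shutdown line)", "no total shutdown"),
]
for row in derive_analysis(chain):
    print(row)
```

Storing only the chain avoids entering the same phrase twice, once as an effect and once as a mode. Any mismatch between levels therefore cannot arise.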

Fig. 7. Relationship between failure cause, failure mode and failure effect. [Figure: three levels of indenture. Component level (seal): root causes (poor lubrication, usage outside specifications, wrong material specifications) lead to failure mechanisms (corrosion, wear/erosion, hardening, etc.), the failure mode 'leakage from sealing' and the failure effect 'internal leakage'. Item level (valve): failure cause 'leakage from sealing', failure mode 'internal leakage', failure effect 'no total shutdown'. System level (one process shutdown line): failure cause 'internal leakage', failure mode 'no total shutdown'.]

Figure 7 shows that a failure mode on the lowest level of indenture is one of the failure causes on the next higher level of indenture, and the failure effect on the lowest level equals the failure mode on the next higher level. For example, the failure mode 'leakage from sealing' for the seal component is one of the possible failure causes for the failure mode 'internal leakage' for the valve. The failure effect (on the next higher level) 'internal leakage' resulting from 'leakage from sealing' is the same as the failure mode 'internal leakage' of the valve.

By the severity of a failure mode we understand the impact of the failure mode on the system level. In the process shutdown valve example, the severity is the failure effects on the complete process line.

A severity ranking is often required as part of FMEA and similar techniques in order to set priorities. A severity ranking of failure modes is also an essential part of the RCM procedure. Many of the standards present various classification schemes for the severity. In MIL-STD 882[19] the following classification is used:

(a) Catastrophic: Any failure that could result in death or system loss.
(b) Critical: Any failure that could result in severe injury, severe occupational illness, or major system damage.
(c) Marginal: Any failure that could result in minor injury, minor occupational illness, or minor system damage.
(d) Negligible: Any failure that results in less than minor injury, occupational illness, or system damage.

In the RCM technique[5,8,14] the evident failures are
classified according to the following severity classes, usually in descending order of importance:

1. Failures with safety consequences
2. Failures with environmental consequences
3. Failures with operational consequences
4. Failures with non-operational consequences.

4.3 Multiple failures

A special problem is connected to so-called dependent failures (see, e.g., Høyland & Rausand[20]). Two types of dependent failures are of special interest: (1) common cause failures, and (2) cascading failures. Common cause failures are multiple failures that are a direct result of a common or shared root cause. Cascading failures are multiple failures initiated by the failure of one component in the system that results in a chain reaction or 'domino effect'. A number of defensive tactics to avoid dependent failures have been developed. Many of these are based on a modified FMEA (see, e.g., Høyland & Rausand[20] for further discussion).

5 APPLICATIONS

All quantitative safety, reliability or maintenance analyses require failure data. The quality of the analyses is highly dependent upon the quality of the data being used. There are, however, a variety of analyses requiring somewhat different input data. A particular failure database might be suitable for some analyses but not necessarily for all.

The interpretation of the basic concepts discussed on a general basis above is exemplified in this section using the Offshore Reliability Data (OREDA) handbook[2] as an example. The OREDA handbook contains data from a wide range of components and systems used on offshore installations for oil and gas production. The present version of the OREDA handbook is based on actual field data from the North Sea and the Adriatic Sea collected in the period 1981-1991. Field data are, however, collected on a more or less continuous basis and stored in a computerized database that is available only to the oil companies participating in the OREDA project. The data in the computerized OREDA database are more detailed than in the OREDA handbook. We have, however, chosen to discuss only the OREDA handbook,[2] since it contains the publicly available data from the OREDA project. It should be noted that some of our critical comments on the OREDA handbook do not apply to the OREDA database.

5.1 OREDA failure mode identification

OREDA is 'intended to permit data utilization in the interests of evaluating and improving safety and reliability in offshore platform operations and design'.[2] This means that OREDA shall support safety and reliability analyses carried out either during the design phase or the operational phase with the failure data necessary for making evaluations and improvements.

OREDA provides failure rates and repair times for failure modes within three categories. These categories are based on the local failure effects on the item, not on the system. The OREDA classification is hence not comparable with the severity classification of failure modes presented above.

1. Critical failure: A failure which is both sudden and causes cessation of one or more fundamental functions. Note: The failure requires immediate corrective action in order to return the item to a satisfactory condition.
2. Degraded failure: A failure which is gradual, partial, or both. Note: Such a failure does not cease the fundamental functions, but compromises one or several functions. The function may be compromised by any combination of reduced, increased, or erratic outputs. In time, such a failure may develop into a critical failure.
3. Incipient failure: An imperfection in the state or condition of an item so that a degraded or critical failure can be expected to result if corrective action is not taken.

The same classification is also used in IEEE Std. 500,[15] where the critical failures are called catastrophic failures. The OREDA failure modes of a valve are shown in Table 2.

Table 2. OREDA valve failure modes

Failure effect category    Failure modes
Critical                   Failed to open
                           Failed to close
                           Significant internal leakage
                           Plugged
                           Unknown
Degraded                   Improper operation
                           Internal leakage
                           External leakage
                           Unknown
Incipient                  Faulty indication
                           Unknown
Unknown                    Failed

These failure modes have not been established through a structured identification of all required functions, followed by a systematic identification of possible failure modes. Future phases of OREDA may therefore benefit from:
Basic concepts of failure analysis 81

(a) establishing the required functions by the use of functional block diagrams, FAST, or a similar technique,
(b) analysing each operational mode of the item separately,
(c) systematically generating failure modes by using 'guide words' similar to the procedure used in HAZOP.²¹

This will provide a better guarantee that all relevant failure modes are taken into account, or at least that each failure mode is evaluated before it is rejected. It also documents how the failure modes were established.

5.2 OREDA failure mode classification

Figure 8 illustrates the relationship between the OREDA failure mode classification and the basic terms used previously in this paper. Compared to the failure classification in Fig. 5, the main differences are:

• Critical failure (OREDA term) covers somewhat more than catastrophic failure.
• Degraded failure (OREDA term) covers a lot more than degraded failure in Fig. 5.
• Incipient failure (OREDA term) is not included at all in Fig. 5 because it is actually not a failure, only an error (containing 'error modes').

The first two differences should not be controversial, but the last one may lead to the suggestion that the term 'error' be used instead of 'incipient failure'.

The arrows in Fig. 8 indicate that an incipient failure (which is actually an error) may develop into a degraded failure, and a degraded failure may further develop into a critical failure, if no action is taken.

When reporting failures in OREDA, many subjective judgments have to be made. One of the most difficult is the distinction between a permissible deviation (incipient failure) and a degraded failure. In most cases acceptable limits are not defined and/or it is not possible to measure the degree of failure. The same problem exists for the distinction between degraded and critical failures, as for internal leakage of the valve (see Table 2): when is the internal leakage 'significant', and can it be measured?

It is of vital importance for the quality of the data that acceptable limits are defined, and that the degree of failure can be measured, to the greatest possible extent. A problem that is difficult to overcome, however, arises when the database covers equipment used in quite different operational contexts. The acceptable limits will then without doubt be different.

5.3 OREDA failure causes and failure effects

The OREDA handbook² 'stops' at the failure mode level, i.e., no failure causes, failure mechanisms or root causes are recorded. (In the current version of the OREDA database there is, however, a record called 'failure descriptor', which is almost equal to failure cause.) This is illustrated in Fig. 9, where the shaded squares represent the OREDA level of failure data treatment.

[Figure: OREDA terms related to the basic failure terms used in this paper.]
Fig. 8. OREDA failure mode classification.
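Step (c) of the procedure listed before Section 5.2, which generates failure modes systematically with HAZOP-style guide words, can be sketched as a simple cross product. For each operational mode (step (b)), every function is combined with every guide word. Each combination then becomes a candidate failure mode that must be explicitly accepted or rejected with a recorded rationale. This is a sketch under assumptions: the guide-word subset, the class and function names, and the gate-valve functions in the usage lines are all illustrative and are not taken from the paper.

```python
from dataclasses import dataclass
from itertools import product
from typing import Dict, List, Optional

# A subset of HAZOP-style guide words, applied here to item functions.
GUIDE_WORDS = ("NO", "LESS", "MORE", "REVERSE", "OTHER THAN", "LATE")


@dataclass
class CandidateFailureMode:
    operational_mode: str
    function: str
    guide_word: str
    accepted: Optional[bool] = None   # None until the analyst has decided
    rationale: str = ""               # documents why it was kept or rejected

    def describe(self) -> str:
        return f"[{self.operational_mode}] {self.guide_word}: {self.function}"


def generate_candidates(functions_by_mode: Dict[str, List[str]],
                        guide_words=GUIDE_WORDS) -> List[CandidateFailureMode]:
    """Combine every function of every operational mode with every guide word."""
    return [CandidateFailureMode(mode, function, word)
            for mode, functions in functions_by_mode.items()
            for function, word in product(functions, guide_words)]


def pending(candidates: List[CandidateFailureMode]) -> List[CandidateFailureMode]:
    """Candidates not yet evaluated; should be empty when the analysis ends."""
    return [c for c in candidates if c.accepted is None]


# Hypothetical gate-valve functions for two operational modes.
valve = {
    "closing": ["close the flow path"],
    "closed": ["prevent internal leakage", "contain the fluid"],
}
candidates = generate_candidates(valve)
first = candidates[0]                      # "[closing] NO: close the flow path"
first.accepted, first.rationale = True, "'Failed to close' in Table 2"
print(len(candidates), len(pending(candidates)))  # prints 18 17
```

Keeping rejected candidates together with their rationale, rather than simply deleting them, is what provides the documentation trail the procedure aims for.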



[Figure: root causes, failure mechanisms, failure mode and failure effects at system level (one process line), item level (valve) and component level (seal). Component-level example: root causes (poor lubrication, usage outside specifications, wrong material specifications); failure mechanisms (corrosion, wear/erosion, hardening, etc.); failure mode (leakage from sealing); failure effect (internal leakage).]
Fig. 9. The level of failure data provided by OREDA (shaded area).
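The hierarchy of Fig. 9 can be represented as an ordered list of indenture levels. In this structure, the failure cause on one level is the failure mode on the level below. A separate projection keeps only what the OREDA handbook records: the item failure mode and its local effect category. This is a sketch based on our reading of the figure. The class and function names are hypothetical, and the field values simply restate the figure's seal and valve example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class IndentureLevel:
    name: str
    failure_mode: str
    root_causes: Tuple[str, ...] = ()
    mechanisms: Tuple[str, ...] = ()


# Lowest level first; entries follow our reading of Fig. 9.
HIERARCHY: List[IndentureLevel] = [
    IndentureLevel("component (seal)", "leakage from sealing",
                   root_causes=("poor lubrication",
                                "usage outside specifications",
                                "wrong material specifications"),
                   mechanisms=("corrosion", "wear/erosion", "hardening")),
    IndentureLevel("item (valve)", "internal leakage"),
]


def failure_cause(levels: List[IndentureLevel], index: int) -> Optional[str]:
    """The failure cause on one level is the failure mode on the level below."""
    return levels[index - 1].failure_mode if index > 0 else None


def oreda_record(levels: List[IndentureLevel], item_index: int,
                 effect_category: str) -> dict:
    """Keep only what the OREDA handbook records: the item failure mode and
    the local effect category. Causes, mechanisms and root causes are dropped."""
    return {"item": levels[item_index].name,
            "failure_mode": levels[item_index].failure_mode,
            "effect_category": effect_category}


record = oreda_record(HIERARCHY, 1, "critical")
print(failure_cause(HIERARCHY, 1), "->", record["failure_mode"])
```

Everything that `oreda_record` discards (the seal-level mechanisms and root causes) is exactly the information that FMEA and RCM would need.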

The failure mode square in Fig. 9 represents the level at which the OREDA handbook provides failure rates and repair times. The failure effect square represents the level at which the OREDA failure effect categories (critical, degraded, and incipient) are determined. In practice, however, OREDA uses the effect on the local level (item level), not the effect on the next higher level that the illustration in Fig. 9 suggests.

The effect on the local level may be the failure mode itself, i.e., 'internal leakage', and the failure may hence be judged as critical if the leakage is significant. The consequence on the next higher level may, however, be 'no total shutdown', depending on the system configuration (redundant valves). OREDA does not take the configuration of the system (or subsystem) into account; it only looks at the item itself. This is reasonable, since the configuration may change. Reliability analyses may be carried out to assess the failure rates on system level. (The point here is that even though OREDA defines an item failure to be critical, the failure may not have any critical effect on the system due to, for instance, redundancy.)

However, OREDA (like many other failure databases) presents only failure modes and, to some degree, failure causes ('failure descriptors'), but not failure mechanisms or root causes. This places some limitations on the use of OREDA for safety and reliability analysis purposes. Evaluation and improvement of safety and reliability, based on OREDA, can only be achieved through optimization of the system configuration. By using OREDA it is possible to identify the weak links, i.e., the items contributing most to the unreliability. Without evaluating the failure causes or mechanisms, however, it is not possible to improve the inherent reliability of the item itself. OREDA data are therefore not sufficient as input data for methods like FMEA and RCM, which require knowledge about the failure mechanisms.

OREDA does not identify 'whom to blame' for the failures: the manufacturer or the user.¹⁸ This is essential, for instance, when failure reports are used to prove conformity with contractually guaranteed life cycle costs (LCC) for a product.

OREDA is well suited for many purposes. Both this failure database and others could, however, be used more widely and more effectively if they handled failure causes and failure mechanisms (and perhaps root causes) as systematically as failure modes.
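Two system-level points made above can be made concrete with standard constant-failure-rate arithmetic. The first is that an item failure classed as critical may have no critical system effect when the item is redundant. The second is that OREDA-type item failure rates can still reveal the weak links of a configuration. The sketch below assumes constant failure rates, independent failures, no repair during the period considered, and a purely hypothetical rate of 2 × 10⁻⁵ critical failures per hour. The function names are ours.

```python
import math
from typing import Dict, Iterable, List, Tuple


def unreliability(rate: float, hours: float) -> float:
    """P(an item with constant failure rate `rate` per hour fails within `hours`)."""
    return 1.0 - math.exp(-rate * hours)


def parallel_unreliability(rates: Iterable[float], hours: float) -> float:
    """Non-repaired redundant items that fail independently: the function is
    lost only if every item has failed."""
    q = 1.0
    for rate in rates:
        q *= unreliability(rate, hours)
    return q


def weak_links(rates: Dict[str, float]) -> List[Tuple[str, float]]:
    """Series system with constant rates: the system rate is the sum of the
    item rates. Return each item's share of it, largest first."""
    total = sum(rates.values())
    return sorted(((name, rate / total) for name, rate in rates.items()),
                  key=lambda pair: pair[1], reverse=True)


# Hypothetical critical failure rate for one valve (per hour), over one year.
lam, year = 2e-5, 8760.0
single = unreliability(lam, year)
redundant = parallel_unreliability([lam, lam], year)
print(f"single valve {single:.3f}, two redundant valves {redundant:.4f}")
print(weak_links({"valve": 3e-5, "pump": 1e-5}))
```

With these assumed numbers, one valve fails critically within a year with probability of roughly 0.16, whereas the redundant pair loses its function with probability of roughly 0.026. This is the configuration effect that OREDA's item-level classification does not capture. Note that neither calculation says anything about why the valve fails, which is the limitation discussed above.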
ACKNOWLEDGEMENT

The authors would like to thank the anonymous reviewers for constructive suggestions to improve the quality and readability of this paper.

REFERENCES

1. IEC 50(191), International Electrotechnical Vocabulary (IEV), Chapter 191: Dependability and quality of service, International Electrotechnical Commission, Geneva, 1990.
2. OREDA-1992, Offshore Reliability Data, DNV Technica, Høvik, Norway, 1992.
3. BS 5760-5, Reliability of systems, equipments and components; Part 5: Guide to failure modes, effects and criticality analysis (FMEA and FMECA), British Standards Institution, London, 1991.
4. MIL-STD 1629A, Procedures for Performing a Failure Mode, Effects and Criticality Analysis, US Department of Defense, Washington DC, USA, 1980.
5. Nowlan, F. S. & Heap, H. F., Reliability-centred maintenance. Tech. Rep. AD/A066-579, National Technical Information Service, US Department of Commerce, Springfield, Virginia, 1978.
6. Rendle, S., Ford Sierra Owners Workshop Manual, Haynes Publishing, Sparkford, UK, 1994.
7. Cross, N., Engineering Design Methods: Strategies for Product Design, John Wiley & Sons, Chichester, 1994.
8. Moubray, J., Reliability-centred Maintenance, Butterworth-Heinemann, Oxford, 1991.
9. Catola, S., Reliability Centred Maintenance Handbook, Naval Sea Systems Command, S9081-AB-GIB-010/MAINT, US Navy, 1983.
10. Fox, J., Quality Through Design. The Key To Successful Product Delivery, McGraw-Hill, London, 1993.
11. Blanchard, B. S. & Fabrycky, W. J., System Engineering and Analysis, Prentice-Hall, Inc., Englewood Cliffs, NJ, USA, 1981.
12. Pahl, G. & Beitz, W., Engineering Design, The Design Council, London, 1984.
13. IEC 812, Analysis Techniques for System Reliability: Procedures for Failure Modes and Effects Analysis (FMEA), International Electrotechnical Commission, Geneva, 1985.
14. Smith, A. M., Reliability-Centred Maintenance, McGraw-Hill Inc., New York, 1993.
15. IEEE Std. 500, IEEE Guide to the Collection and Presentation of Electrical, Electronic, Sensing Component, and Mechanical Equipment Reliability Data for Nuclear Generating Stations, John Wiley & Sons, New York, 1984.
16. Simola, K. & Laakso, K., Analysis of failure and maintenance experience of motor operated valves in a Finnish nuclear power plant. Tech. Rep. VTT-RN-1322, VTT Technical Research Centre of Finland, Espoo, 1992.
17. Bergman, B. & Klefsjö, B., Quality from Customer Needs to Customer Satisfaction, Studentlitteratur, Lund, Sweden, 1994.
18. Blache, K. M. & Shrivastava, A. B., Defining failure of manufacturing machinery & equipment. In Proc. Annual Reliability and Maintainability Symp., 1994, pp. 69-75.
19. MIL-STD 882B, System Safety Program Requirement, US Department of Defense, Washington DC, 1984.
20. Høyland, A. & Rausand, M., System Reliability Theory; Models and Statistical Methods, John Wiley & Sons, New York, 1994.
21. Kletz, T. A., Hazop & Hazan: Identifying and Assessing Process Industry Hazards, 3rd ed., The Institution of Chemical Engineers, Rugby, UK, 1992.