JQME
7,4
252
The current issue and full text archive of this journal is available at
http://www.emerald-library.com/ft
Practical implications
Pareto histograms of equipment failure codes ranked according to downtime or
repair costs do not enable the influence of the failure frequencies or the mean
downtime or repair cost to be clearly identified. Logarithmic scatterplots enable
failures to be classified according to acute or chronic characteristics, and
provide a better means of establishing maintenance priorities. In addition,
logarithmic plots can be used to graph trends in maintenance performance.
Introduction
In the late nineteenth century, the Italian engineer Vilfredo Pareto (1848-1923)
constructed histograms of the distribution of wealth in Italy and concluded that
80 percent of the country's wealth was owned by 20 percent of the nation's
population. This trend was later found to be representative of the distribution
of other data populations. The 80:20 rule, or a variation known as ABC analysis
that uses an 80:15:5 classification rule, is now routinely used in many fields of
study. As applied to the field of maintenance engineering, Pareto analysis is
commonly used for identifying those failure codes responsible for the majority
of equipment maintenance cost or downtime (see Hall et al., 2000). Based on the
failure codes identified, action plans can be developed to lower maintenance
costs or improve equipment availability.
However, Pareto analysis suffers from several deficiencies:
• First, maintenance costs and downtime are the product of two factors:
the number of failures that occurred in a particular time frame and the
average associated repair cost, or mean downtime. A Pareto histogram
based on downtime (or cost) alone cannot determine which factor, or
factors, are dominant in contributing to the downtime or cost associated
with individual failure codes.
• Second, Pareto analysis may fail to identify individual events having
high associated repair costs or downtime, or frequently occurring
failures that consume relatively little repair cost or downtime yet cause
frequent operational disturbances. An example of the former is the
failure of the transmission in a mechanical mining truck. An example of
the latter is a repair to the truck's driving lights. Whilst the high cost of
the former is immediately evident, failures that frequently recur often
have significant hidden costs. For example, if the truck has to return to
the workshop to have a light replaced, the time lost travelling to and
from the workshop may dramatically increase the opportunity costs
associated with lost production.
• Third, Pareto histograms are not generally useful for trending
comparisons. It can be difficult to directly compare ranked histograms
of costs or downtime for two different time periods since the relative
position of failure codes can change from one period to the other.
This paper outlines a simple but powerful way of analysing data in order to
overcome these shortcomings.
Journal of Quality in Maintenance Engineering, Vol. 7 No. 4, 2001, pp. 252-263. © MCB University Press, 1355-2511
The author would like to thank Komatsu Mining Systems Chile and Modular Mining Systems
Chile for supporting the development of the work outlined in this paper. The paper has also
benefited from the thesis work of Cristian Aranguiz and Carlos Turina, final year students of
the Mining Centre of the Catholic University of Chile.
Logarithmic scatterplots
The most convenient way of presenting the theory behind the new
methodology is via an example. Table I presents unplanned downtime data for
electrical failures in a fleet of 13 cable shovels at an open pit copper mine,
located in northern Chile. The data was collected over a one-month period.
Figure 1 shows the frequency histogram for the unplanned electrical failures,
with failure codes ranked in descending order according to the downtime
corresponding to each code. Applying the 80:20 rule, it is evident that priority
should be given to failure codes 1, 2, 11, 3, 10, 7, 12, 8 and 5. Of these,
maintenance can do little to reduce the downtime associated with failure codes
3 (substation changes or shovel moves) and 5 (substation power cuts).
Maintenance costs and downtime can be represented by two equations:
$$\mathrm{Cost}_i = n_i \times \mathrm{MRC}_i \tag{1}$$
and
$$\mathrm{Downtime}_i = n_i \times \mathrm{MDT}_i \tag{2}$$
Rethinking
Pareto analysis
253
Table I. Unplanned shovel electrical downtime

Code  Description                          Quantity  Duration (min)  Time (%)  Cum. (%)
1     Electrical inspections                     30           1,015      13.0      13.0
2     Damaged feeder cable                       15             785      10.1      23.1
11    Motor overtemperature                      36             745       9.6      32.6
3     Change of substation or shovel move        27             690       8.8      41.5
10    Overload relay                             23             685       8.8      50.3
7     Auxiliary motors                           13             600       7.7      58.0
12    Earth faults                                7             575       7.4      65.3
8     Main motors                                12             555       7.1      72.5
5     Power cuts to substations                  21             395       5.1      77.5
15    Air compressor                              8             355       4.6      82.1
6     Rope limit protection                      10             277       3.6      85.6
9     Lighting system                            26             240       3.1      88.7
4     Coupling repairs or checks                 15             225       2.9      91.6
17    Overcurrent faults                          6             220       2.8      94.4
14    Control system                              7             165       2.1      96.5
16    Operator controls                           5             155       2.0      98.5
13    Miscellaneous                               9             115       1.5     100
Total                                           270           7,797     100
Figure 1. Pareto histogram of unplanned shovel electrical downtime
where $\mathrm{Cost}_i$ and $\mathrm{Downtime}_i$ are the cost and downtime associated with the ith
failure code and $n_i$, $\mathrm{MRC}_i$ and $\mathrm{MDT}_i$ represent the number of failures, the mean
repair cost and mean downtime respectively.
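As an illustration, the Pareto ranking of Table I and the mean downtime implied by equation (2) can be recomputed in a few lines. This is a minimal sketch and not part of the original study: the names `TABLE_I` and `pareto_rows` are hypothetical, and the figures are transcribed from Table I.

```python
# Table I data: failure code -> (number of failures n_i, total downtime in minutes)
TABLE_I = {
    1: (30, 1015), 2: (15, 785), 11: (36, 745), 3: (27, 690), 10: (23, 685),
    7: (13, 600), 12: (7, 575), 8: (12, 555), 5: (21, 395), 15: (8, 355),
    6: (10, 277), 9: (26, 240), 4: (15, 225), 17: (6, 220), 14: (7, 165),
    16: (5, 155), 13: (9, 115),
}


def pareto_rows(data):
    """Rank codes by downtime; return (code, n, downtime, MDT, pct, cum_pct) tuples."""
    total = sum(downtime for _, downtime in data.values())
    rows, cumulative = [], 0.0
    for code, (n, downtime) in sorted(data.items(), key=lambda kv: -kv[1][1]):
        pct = 100.0 * downtime / total
        cumulative += pct
        # Equation (2) rearranged: MDT_i = Downtime_i / n_i
        rows.append((code, n, downtime, downtime / n, pct, cumulative))
    return rows


rows = pareto_rows(TABLE_I)
for code, n, downtime, mdt, pct, cum in rows:
    print(f"{code:>4} {n:>4} {downtime:>6} {mdt:>6.1f} {pct:>5.1f} {cum:>6.1f}")

# Codes that fall within the first 80 percent of cumulative downtime
priority_codes = [row[0] for row in rows if row[5] <= 80.0]
```

The histogram ranks codes by total downtime only; the mean downtime column shows how differently that total can arise (for example, code 12 has few but long stoppages, code 11 many short ones).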
Figure 2 shows an alternative means of presenting the failure data listed in
Table I. An x-y scatterplot is used to plot mean downtime against the number of
unplanned failures for each failure code. Curves of constant downtime are
represented by a family of hyperbolae as shown. It can be seen that the failures
that consume most downtime are those associated with failure codes 1, 2 and
11. Thus the order of priority observed in the Pareto analysis is preserved;
however, a clearer picture is available as to which factor (failure frequency or
mean downtime) is dominant.
Figure 2. x-y dispersion plot of mean repair times versus number of failures
A disadvantage of Figure 2 is that the curves of constant downtime are
hyperbolae and can be difficult to plot. A solution to this is to take the
logarithm of equations (1) and (2). Thus:
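$$\log(\mathrm{Cost}_i) = \log(n_i) + \log(\mathrm{MRC}_i) \tag{3}$$
and
$$\log(\mathrm{Downtime}_i) = \log(n_i) + \log(\mathrm{MDT}_i) \tag{4}$$
where log refers to $\log_{10}$. If an x-y graph is prepared of $\log(n_i)$ against
$\log(\mathrm{MDT}_i)$, the curves of constant downtime now appear as straight lines with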
uniform negative gradient (see Figure 3). Logarithmic scatterplots simplify the
identification of those failures which contribute most to total equipment
downtime or cost, whilst continuing to permit the visualisation of the influence
of failure frequency and mean downtime.
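The straight-line property can be checked numerically. The sketch below is an illustration under stated assumptions: the helper names `log_point` and `iso_downtime_mdt` and the 1,000-minute downtime level are hypothetical. It takes two points on the same constant-downtime curve and measures the gradient between them in log-log coordinates, which should be -1 up to floating-point error.

```python
import math


def log_point(n, mdt):
    """Map (number of failures, mean downtime) to log10 coordinates."""
    return math.log10(n), math.log10(mdt)


def iso_downtime_mdt(downtime, n):
    """Mean downtime needed for n failures to give the stated total downtime."""
    return downtime / n


# Two points on a hypothetical 1,000-minute constant-downtime curve
x1, y1 = log_point(10, iso_downtime_mdt(1000, 10))
x2, y2 = log_point(40, iso_downtime_mdt(1000, 40))

# log(MDT) = log(Downtime) - log(n), so the gradient is -1 for any downtime level
gradient = (y2 - y1) / (x2 - x1)
print(f"gradient = {gradient:.6f}")
```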
Repairs that require lengthy downtime can be considered acute problems.
Those failures that frequently reoccur (i.e. high n) can be considered chronic
problems. By determining threshold limits, the log scatterplot can be divided
into four quadrants, as shown in Figure 4. The upper quadrants denote acute
failures, whilst the right-hand quadrants denote chronic failures. The upper
right-hand quadrant is a region of acute and chronic failures.
Limit determination
Thresholds can either be absolute values determined by company policy, or
relative values that depend on the relative magnitudes and quantity of data.
One approach for determining relative values is to use average values as
follows.
Figure 3. Log dispersion plot of mean repair times versus number of failures
Figure 4. Log scatterplot showing limit values
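The total cost and total downtime of the unplanned failures are:
$$C = \sum_i \mathrm{Cost}_i \tag{5}$$
and
$$D = \sum_i \mathrm{Downtime}_i \tag{6}$$
The total number of failures is:
$$N = \sum_i n_i \tag{7}$$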
Letting Q be the number of distinct failure codes used to categorise the repair
data, the threshold limit for acute failures can be defined as:
$$\mathrm{Limit}_{MRC} = \frac{C}{N} \tag{8}$$
or
$$\mathrm{Limit}_{MDT} = \frac{D}{N} \tag{9}$$
and the threshold limit for chronic failures can be determined as:
$$\mathrm{Limit}_{n} = \frac{N}{Q} \tag{10}$$
In the case of the unplanned electrical failures for the fleet of shovels,
D = 7,797 minutes, N = 270 and Q = 17. Therefore, the limit value for acute
failures is 7,797/270 = 28.9 minutes and the limit value for chronic failures is
270/17 = 15.9 repairs.
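The limit calculation above can be reproduced directly from the Table I data. This is a minimal sketch: the names `TABLE_I` and `threshold_limits` are hypothetical, and the strict "greater than" comparison used to flag acute and chronic codes is an assumption, since the text does not say how a point lying exactly on a limit is treated.

```python
# Table I data: failure code -> (number of failures n_i, total downtime in minutes)
TABLE_I = {
    1: (30, 1015), 2: (15, 785), 11: (36, 745), 3: (27, 690), 10: (23, 685),
    7: (13, 600), 12: (7, 575), 8: (12, 555), 5: (21, 395), 15: (8, 355),
    6: (10, 277), 9: (26, 240), 4: (15, 225), 17: (6, 220), 14: (7, 165),
    16: (5, 155), 13: (9, 115),
}


def threshold_limits(data):
    """Return (Limit_MDT, Limit_n) computed as D/N and N/Q."""
    total_failures = sum(n for n, _ in data.values())        # N
    total_downtime = sum(d for _, d in data.values())        # D
    code_count = len(data)                                    # Q
    return total_downtime / total_failures, total_failures / code_count


limit_mdt, limit_n = threshold_limits(TABLE_I)
print(f"Limit_MDT = {limit_mdt:.1f} min, Limit_n = {limit_n:.1f} repairs")

# Assumption: a code is acute or chronic only if it strictly exceeds the limit
acute = [code for code, (n, d) in TABLE_I.items() if d / n > limit_mdt]
chronic = [code for code, (n, d) in TABLE_I.items() if n > limit_n]
print("acute:", acute)
print("chronic:", chronic)
```

Codes appearing in both lists (1 and 10) fall in the upper right-hand quadrant of Figure 4.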
Jack-knife diagrams
When dealing with large data sets, it may be desirable to focus on only those
chronic failures having highest direct cost or downtime impact. To this effect,
the right-hand lower quadrant can be divided into two regions, A and B as
illustrated in Figure 4. The dividing limit is a line of constant cost or downtime,
defined by the product of the two limits shown in equations (8) and (10) or (9)
and (10) according to which parameter is of interest. The expression for this
line is:
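$$\mathrm{Cost} = \frac{C}{Q}, \quad \text{where } 0 < \mathrm{MRC} \le \frac{C}{N} \tag{11}$$
or
$$\mathrm{Downtime} = \frac{D}{Q}, \quad \text{where } 0 < \mathrm{MDT} \le \frac{D}{N} \tag{12}$$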
In a similar manner, the acute failures in the left-hand upper quadrant could be
divided according to direct cost or downtime impact. However, proportionally
greater benefit can be obtained by preventing the reoccurrence of a single acute
failure than preventing the reoccurrence of a chronic failure. For this reason, all
of the acute failures remain within the priority area defined by the limit shown
in Figure 4. The resulting graphs have been christened "jack-knife" diagrams
after the inverted V shape of the limit. In Table II the unplanned electrical
breakdowns for the shovel fleet have been classified according to jack-knife
principles.
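The jack-knife classification can be sketched as follows, using the Table I data and the limits D/N, N/Q and D/Q. The function name `jack_knife_region`, the region labels and the strict comparisons are assumptions made for illustration; region A is taken to be the part of the chronic quadrant above the constant-downtime line, in keeping with its inclusion in the priority area.

```python
# Table I data: failure code -> (number of failures n_i, total downtime in minutes)
TABLE_I = {
    1: (30, 1015), 2: (15, 785), 11: (36, 745), 3: (27, 690), 10: (23, 685),
    7: (13, 600), 12: (7, 575), 8: (12, 555), 5: (21, 395), 15: (8, 355),
    6: (10, 277), 9: (26, 240), 4: (15, 225), 17: (6, 220), 14: (7, 165),
    16: (5, 155), 13: (9, 115),
}


def jack_knife_region(n, downtime, limit_mdt, limit_n, limit_downtime):
    """Classify one failure code against the jack-knife limits."""
    acute = downtime / n > limit_mdt
    chronic = n > limit_n
    if acute and chronic:
        return "acute and chronic"
    if acute:
        return "acute"
    if chronic:
        # Chronic codes are split by the constant-downtime line D/Q
        return "chronic A" if downtime > limit_downtime else "chronic B"
    return "none"


N = sum(n for n, _ in TABLE_I.values())
D = sum(d for _, d in TABLE_I.values())
Q = len(TABLE_I)

regions = {}
for code, (n, downtime) in TABLE_I.items():
    region = jack_knife_region(n, downtime, D / N, N / Q, D / Q)
    regions.setdefault(region, []).append(code)

for region, codes in regions.items():
    print(f"{region}: {codes}")
```

With these assumptions the grouping agrees with the code lists in Table II; codes 6, 4, 14 and 13 fall outside every region.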
Root cause failure analysis and remedial action
To improve equipment availability, attention should be focussed on either
reducing or eliminating the number of unplanned failures, or reducing the time
necessary to diagnose and repair failures.
Table II. Unplanned shovel downtime: electrical maintenance problems prioritised according to jack-knife principles

Code  Description                          Quantity  Duration (min)  Time (%)  Average time (min)
Acute and chronic failures
1     Electrical inspections                     30           1,015      13.0                33.8
10    Overload relay                             23             685       8.8                29.8
Sub total                                                                21.8                63.3
Acute failures
2     Damaged feeder cable                       15             785      10.1                52.3
7     Auxiliary motors                           13             600       7.7                46.2
12    Earth faults                                7             575       7.4                82.1
8     Main motors                                12             555       7.1                46.3
15    Air compressor                              8             355       4.6                44.4
17    Overcurrent faults                          6             220       2.8                36.7
16    Operator controls                           5             155       2.0                31.0
Sub total                                                                41.7               339.0
Chronic failures (type A)
11    Motor overtemperature                      36             745       9.6                20.7
3     Change of substation or shovel move        27             690       8.8                25.6
Sub total                                                                18.4                46.3
Chronic failures (type B)
5     Power cuts to substations                  21             395       5.1                18.8
9     Lighting system                            26             240       3.1                 9.2
Sub total                                                                 8.2                28.0
Once a prioritised list of failure codes has been identified, hypotheses can be
made about the possible cause (or causes) of each problem. Experienced
maintenance and operating personnel are indispensable to this process, since
familiarity with the machine, the operating environment and with maintenance
and operating practices is required. A list of possible root causes is given
in Table III. Although not necessarily exhaustive, these root causes
can be grouped according to whether they are inspection, maintenance,
operational, design, material quality or maintenance resource problems.
Chronic repairs are often associated with component quality defects, equipment
design problems, inappropriate operator practices or poor quality control in
upstream processes.
Two good examples of chronic repairs are provided by the data. Motor
overtemperature alarms (failure code 11) are more often than not a result of poor
blast fragmentation or shovel abuse. In both cases, corrective action should be
directed at mine operations. Outages to the shovel lighting system (failure code
9) typically result from wiring damage due to structural vibration or poor filament
reliability. Redesign of the wiring harness may be one way of tackling this
problem. Another chronic problem, power cuts to the feeder substation (failure
code 5), could be due to operational planning problems or the electrical company
supplying power to the mine. The maintenance department can do little other
than draw mine management's attention to the problem.
Table III. Root causes and possible actions

Root cause
1. Inspection
   A. Insufficient inspection frequency
   B. Inadequate inspection procedures
   C. Poor quality inspection
   D. Difficulty in accessing/diagnosing component
2. Maintenance
   A. Insufficient PM frequency
   B. Inadequate work procedures
   C. Poor quality PM
   D. Poor quality component installation
3. Operation
   A. Incorrect operation or operator abuse
   B. Poor quality control in upstream process
4. Design
   A. Original component or design inadequate for conditions
   B. Modified component or design inadequate for conditions
5. Materials
   A. Variation in component quality (one supplier)
   B. Variation in component quality (many suppliers)
6. Resources
   A. Wait on spares
   B. Wait on personnel
   C. Wait on shop space
   D. Wait on tools

Action
   S. Purchase/lease additional tools
testing the machine to verify that it has been returned to its normal operating
state. Possible repair delays include difficulty in accessing and/or diagnosing a
faulty component, and waiting on maintenance resources (spares, personnel,
tools or workshop space; see Table III). A good example of a problem subject to
extended repair delays is the earth fault (failure code 12). Earth faults in
electrical circuits can be difficult to diagnose and isolate. In order to reduce the
time necessary to isolate earth faults, the mine could consider installing or
modifying shovel indicator panels so that they display more detailed
information concerning the electrical status of various points in a circuit.
Following the assignment of root causes to each failure code, a set of actions
should be formulated to eliminate or mitigate the factors causing unplanned
downtime. A list of possible actions for eliminating or reducing unplanned
downtime is shown in Table III. Table IV illustrates the application of these
principles to the maintenance priorities previously identified for the electric
shovel fleet.
Some maintenance actions may necessitate investment on the part of the
mine. An estimation of the expected reduction in downtime allows the
maintenance department to undertake a cost/benefit evaluation of
implementing the maintenance action plan. If the cost savings are projected
over say, a five-year period, an NPV can be calculated for the maintenance
project. The advantage of this approach is that it permits senior management
to evaluate maintenance projects alongside competing project alternatives.
Maintenance need no longer be perceived as a costly overhead, but as a
strategic tool to maximise asset utilisation.
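The cost/benefit evaluation can be illustrated with a simple NPV calculation. All figures below are hypothetical and chosen only to show the arithmetic: the initial investment, the annual saving and the 10 percent discount rate do not come from the study, and the function name `npv` is an assumption.

```python
def npv(rate, initial_investment, annual_savings):
    """Net present value of an up-front investment followed by yearly savings.

    Savings are assumed to arrive at the end of each year (years 1, 2, ...).
    """
    discounted = sum(
        saving / (1.0 + rate) ** year
        for year, saving in enumerate(annual_savings, start=1)
    )
    return discounted - initial_investment


# Hypothetical project: 100,000 invested now, 30,000 saved per year for five years
project_npv = npv(rate=0.10, initial_investment=100_000, annual_savings=[30_000] * 5)
print(f"NPV = {project_npv:,.0f}")
```

A positive NPV under the chosen discount rate indicates that the projected downtime savings outweigh the investment, which lets the maintenance project be compared with other projects on the same basis.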
Jack-knife trend plots
A further benefit of logarithmic scatterplots is that they provide a useful means
of visualising trends in maintenance performance. For example, Figure 5 shows
the evolution of four failure codes from a BE 495-B cable shovel working at an
open pit copper mine in Chile. Unplanned failures were analysed for a period of
three years, 1997 to 1999 inclusive. The threshold limits used in the graph were
calculated relative to the total unplanned failure data set for the three-year
period.

Table IV. Proposed actions for reducing unplanned shovel electrical downtime

Code  Description                          Root cause(s)  Action
Acute and chronic failures
1     Electrical inspections               2A             B, F
10    Overload relay                       3A, 3B         J, K
Acute failures
2     Damaged feeder cable                 3A             J
7     Auxiliary motors                     2A             B, F
12    Earth faults                         1B, 1D         B
8     Main motors                          2A             B, F
15    Air compressor                       1B, 2C         B, C, F
17    Overcurrent faults                   3A             J
16    Operator controls                    4A             D
Chronic failures
11    Motor overtemperature                3A, 3B         J, K
3     Change of substation or shovel move
5     Power cuts to substation
9     Shovel lights                        1A, 5A         A, N

Figure 5. Trends in unplanned failures for BE 495-B cable shovel
It can be seen that significant improvement has been made with respect to
two of the failure codes over the period of the study. Unplanned failures to the
shovel lubrication system were chronic in 1997 and 1998, and not classified in
1999. Similarly, the total downtime due to failures of controls in the operator
cabin has decreased. However, unplanned failures to the swing system
(comprising the two swing motors, spur gears and main ring gear) are
obviously an area of concern, increasing from acute in 1997 to chronic and
acute in 1999. Likewise, unplanned stoppages due to motor over-temperature
alarms (alarms) also increased in both frequency and duration (data was not
available for the 1999 period to confirm this tendency).
Another potential application of jack-knife trend diagrams is to the
preparation of maintenance budgets. A log scatterplot of the repair costs
incurred during the most recent time period could assist a maintenance
manager to fix performance targets for forthcoming periods. It is postulated
that Windows-based software could be developed to help automate this
procedure. Using a mouse, the points representing failure codes in the log
scatterplot could be selected and dragged to desired target positions. The
software could then automatically calculate the resulting cost and downtime
reductions, as well as display the corresponding operating budget for the
maintenance department.
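The core calculation such software would perform is small. As a sketch only, the downtime saved by moving one point of the log scatterplot to a target position follows from equation (2). The function name `downtime_saving` and the target of 20 failures are hypothetical; the current position is that of failure code 11 in Table I (36 failures, 745 minutes).

```python
def downtime_saving(current, target):
    """Downtime saved by moving a failure code from current to target.

    Both arguments are (number of failures, mean downtime in minutes) pairs,
    so each total downtime is the product n * MDT from equation (2).
    """
    n_now, mdt_now = current
    n_target, mdt_target = target
    return n_now * mdt_now - n_target * mdt_target


# Failure code 11 in Table I: 36 failures totalling 745 minutes
current = (36, 745 / 36)
# Hypothetical target: reduce the count to 20 failures at the same mean downtime
target = (20, 745 / 36)

saving = downtime_saving(current, target)
print(f"Downtime saved: {saving:.1f} min")
```

The same function applies to repair cost if mean repair cost is substituted for mean downtime.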
Establishing failure priorities from trend data
When data is available for two consecutive time periods, maintenance priorities
should be established by not only considering the chronic or acute
classification of the most recent data points, but also the trend in movement of
those points. As Table V shows, six possible combinations exist for the
movement of failure codes between two consecutive time periods. Here $\Delta n_i$,
$\Delta\mathrm{MDT}_i$ and $\Delta\mathrm{Downtime}_i$ refer to changes in the mean number of failures, mean
downtime and total downtime experienced by the ith failure code over
successive time periods (in the case that repair cost is the parameter of interest,
the parameters $\Delta n_i$, $\Delta\mathrm{MRC}_i$ and $\Delta\mathrm{Cost}_i$ should be used).
Using these six possible combinations, failure priorities can be established
as shown in Table VI. Three priority classifications are suggested: high,
medium and low. High priority is assigned to those unplanned failures
determined to have increased in total downtime or cost and currently
positioned in the priority area defined by the jack-knife limit (comprising the
acute, chronic and acute, and chronic type A quadrants). Medium priority is
assigned to those failure codes that have:
• experienced a reduction in total downtime or cost (i.e. some progress has
been made) yet are still currently located in the priority area defined by
the jack-knife limit; and
• increased in total downtime or cost and are currently classified as
chronic type B failures.
Remaining failure codes are classified as low priority.
This classification scheme assumes that if the total downtime or cost
contribution of a failure code has reduced over two successive time periods,
then the maintenance department must be taking positive steps and need not
necessarily modify their policies regarding the corresponding component or
subsystem. (In practice, other factors may also influence failure frequencies and
associated repair times: for example, seasonal variations. If failure audits are
carried out on a regular basis, then such variations will eventually come to the
attention of maintenance personnel.) In the short term, the maintenance
department should focus its efforts on redefining maintenance, inspection or
operating policies for those failure codes that show adverse trends and are
associated with most machine downtime or repair cost.

Table V. Classes of possible failure code trends

Class  Δn_i      ΔMDT_i    ΔDowntime_i
I      Decrease  Decrease  Decrease
II     Decrease  Increase  Decrease
III    Increase  Decrease  Decrease
IV     Decrease  Increase  Increase
V      Increase  Decrease  Increase
VI     Increase  Increase  Increase

Table VI. Matrix to establish failure priorities (1 = high, 2 = medium, 3 = low)

Class  None  Chronic B  Chronic A  Acute  Acute and chronic
I      3     3          2          2      2
II     3     3          2          2      2
III    3     3          2          2      2
IV     3     2          1          1      1
V      3     2          1          1      1
VI     3     2          1          1      1
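The trend classes and priority rules described in this section can be sketched as two small functions. This is an illustration under stated assumptions: the names `trend_class` and `priority` and the region labels are hypothetical, and a value that does not change between periods is treated as a decrease, a case the text does not address.

```python
PRIORITY_AREA = {"acute", "acute and chronic", "chronic A"}


def trend_class(n_prev, mdt_prev, n_curr, mdt_curr):
    """Return the trend class (I to VI) for one failure code over two periods."""
    n_up = n_curr > n_prev
    mdt_up = mdt_curr > mdt_prev
    downtime_up = n_curr * mdt_curr > n_prev * mdt_prev
    if n_up and mdt_up:
        return "VI"                      # both factors rise, so downtime rises
    if not n_up and not mdt_up:
        return "I"                       # both factors fall, so downtime falls
    if downtime_up:
        return "V" if n_up else "IV"     # one factor rises and dominates
    return "III" if n_up else "II"       # one factor rises but downtime falls


def priority(region, trend):
    """Return 'high', 'medium' or 'low' for a jack-knife region and trend class."""
    increased = trend in {"IV", "V", "VI"}
    if region in PRIORITY_AREA:
        return "high" if increased else "medium"
    if region == "chronic B" and increased:
        return "medium"
    return "low"


# Hypothetical code: 10 failures of 20 min last period, 8 failures of 30 min now
example_class = trend_class(10, 20.0, 8, 30.0)
print(example_class, priority("acute", example_class))
```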
Once failure priorities have been assigned, root cause failure analysis can be
undertaken and an action plan established. The methodology outlined here was
applied with considerable success to determine availability improvements for
the BE 495-B shovel. As an additional benefit, it was found that the
maintenance personnel at the mine were quick to come to terms with the
methodology and, as a result, more willing to accept the results and
recommendations of the failure analysis study.
Conclusions
This paper has identified important deficiencies in Pareto analysis methods
commonly used to determine failure priorities. An alternative means of
establishing failure priorities is proposed using logarithmic (log) scatterplots.
Log scatterplots preserve the basic information content of a Pareto histogram,
but enable the identification of the dominant factors influencing the failures,
namely the failure frequency and mean downtime or cost. By applying limit
values, log scatterplots can be divided into four quadrants in order to classify
failures according to acute or chronic characteristics. This classification
facilitates root cause failure analysis, and allows the identification of chronic
failures often associated with considerable hidden lost production costs. By
trending failure data over successive time periods, log scatterplots provide a
useful graphical means of analysing the performance of maintenance
improvement initiatives. The methodology described in this paper has been
applied and adopted by a number of mining companies and equipment
providers in Chile.
References and further reading
Aranguiz, C.P. (2000), "Analisis de reparaciones imprevistas de equipos mineros en faenas a rajo
abierto", final year thesis, Faculty of Engineering, Catholic University of Chile, Santiago.
Hall, R., Knights, P. and Daneshmend, L.K. (2000), "Pareto analysis and condition-based
maintenance of underground mining equipment", Trans. IMM, Section A: Mining
Industry, Vol. 109, pp. A14-A22.
Knights, P. (1999), "Analysing breakdowns", Mining Magazine, September, Vol. 181 No. 3,
pp. 165-71.
Turina, C. (2001), "Estudio comparativo de las intervenciones imprevistas en flotas de camiones
electricas operando en distintas faenas mineras en Chile", final year thesis, Faculty of
Engineering, Catholic University of Chile, Santiago.