Article info
Article history:
Received 2 October 2012
Received in revised form 7 May 2013
Accepted 13 May 2013
Available online 28 May 2013
Abstract
One class of maintenance optimization problems considers the notion of general repair maintenance
policies where systems are repaired or replaced on failure. In each case the optimality is based on
minimizing the total maintenance cost of the system. These cost-centric optimizations ignore the value
dimension of maintenance and can lead to maintenance strategies that do not maximize system value.
This paper applies these ideas to the general repair optimization problem using a semi-Markov decision
process, discounted cash flow techniques, and dynamic programming to identify the value-optimal
actions for any given time and system condition. The impact of several parameters on maintenance
strategy, such as operating cost and revenue, system failure characteristics, repair and replacement costs,
and the planning time horizon, is explored.
This approach provides a quantitative basis on which to base maintenance strategy decisions that
contribute to system value. These decisions are different from those suggested by traditional cost-based
approaches. The results show (1) how the optimal action for a given time and condition changes as
replacement and repair costs change, and identify the point at which these costs become too high for
profitable system operation; (2) that for shorter planning horizons it is better to repair, since there is no
time to reap the benefits of increased operating profit and reliability; (3) how the value-optimal
maintenance policy is affected by the system's failure characteristics, and hence whether it is worthwhile
to invest in higher reliability; and (4) the impact of the repair level on the optimal maintenance policy.
© 2013 Elsevier Ltd. All rights reserved.
Keywords:
Cost benefit analysis
Dynamic programming
Maintenance
Markov processes
Reliability
Replacement
1. Introduction
Significant material and personnel resources are allocated to
maintenance activities in companies; for example, over a quarter
of the total workforce in the process industry is said to deal with
maintenance work [22]. The importance of maintenance to industry is reflected by the extensive and growing literature on optimal
maintenance, devoted to developing methods to ensure that these
considerable maintenance resources are allocated and used efficiently, as they can be significant drivers of competitiveness, or
lack thereof if mismanaged (see the reviews by Pham and Wang
[19] and Wang [23]).
1.1. General repair maintenance policies
One class of problems considers the notion of general repair
maintenance policies, where, perhaps in conjunction with a
preventive maintenance program, systems are repaired or
replaced on failure. The question investigated in these studies
under various assumptions is, if the system has failed, when is it
Tel.: +1 7654940063.
E-mail address: kmarais@purdue.edu
0951-8320/$ - see front matter © 2013 Elsevier Ltd. All rights reserved.
http://dx.doi.org/10.1016/j.ress.2013.05.015
K.B. Marais / Reliability Engineering and System Safety 119 (2013) 76–87
cost rate limit, that is, the system is replaced when the repair cost
per unit time exceeds a fixed value. The repair history can also be
incorporated into the problem by considering the number of
failures as well as the system age; see Kapur et al. [8], Makis and
Jardine [13], and Love et al. [12].
The optimizations are usually carried out assuming that repair
or replacement is instantaneous. Another set of policies is developed by setting a repair time limit rather than a repair cost limit. In
Nakagawa and Osaka's (1974) approach, a repair is abandoned if it
cannot be completed within a predetermined time. Nguyen and
Murthy [18] motivate the consideration of repair time by positing
a situation where basic (and imperfect) repairs can be completed
locally but more extensive (and perfect) repairs require central
repair. This situation is readily seen in industry, where, for
example, airlines have small maintenance facilities at most
airports but only a few large maintenance facilities (see also [15]).
While generally not considered as being repair limit policies,
policies based on replacing once the system exceeds a certain
number of failures have been suggested, as well as policies based
on replacing the system once it exceeds some reference operating
time, for example flight hours or vehicle miles (see Wang [23] for a
review).
This paper builds on Kijima [10], Makis and Jardine [13], and
Love et al. [12] to develop a stochastic deterioration model of a
system under a general repair policy. Kijima proposed that the
effect of repair could be modeled as reducing the system's virtual
age and then used a g-renewal function to determine the optimal
time between replacements [9]. He let $V_n$ be the system's virtual
age after the $n$th repair, $X_n$ the additional age incurred between the
$(n-1)$th and $n$th repair, and $\theta_n$ the level of repair. In his Type I
model, the $n$th repair cannot remove the damages incurred before
the $(n-1)$th repair. Thus, after the $n$th repair the virtual age of the
system becomes:
$$V_n = V_{n-1} + \theta_n X_n$$
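As a rough numerical illustration of the Type I virtual age update described above, the following Python sketch accumulates virtual age over a sequence of failures. The function name, the use of a single constant repair level `theta`, and the sample inter-failure times are assumptions made for this example only.

```python
def virtual_ages_type1(intervals, theta):
    """Virtual age after each repair under a Type I style update.

    intervals: additional age accumulated between successive repairs.
    theta:     repair level, assumed constant here (0 leaves no accumulated
               age, 1 leaves the system as old as it was at failure).
    """
    ages = []
    v = 0.0  # a new system starts with zero virtual age
    for x in intervals:
        # Only the age added since the previous repair is scaled by theta;
        # the age carried over from earlier repairs is left untouched.
        v = v + theta * x
        ages.append(v)
    return ages


# Hypothetical inter-failure times, in years.
print(virtual_ages_type1([1.0, 2.0, 3.0], theta=0.5))
```

With `theta = 1` the virtual age simply tracks the accumulated operating time, while with `theta = 0` the virtual age stays at zero after every repair.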
time for Type I systems exist. By formulating their problem as a g-renewal function they were able to find such solutions; however,
this approach did not allow them to consider the effect of failure
history on failure densities.
Accordingly, Love et al. [12] developed a semi-Markov decision
structure using the $(n, t_n)$ state-space and proposed a numerical
search procedure that could be used to identify repair-cost minimizing general repair policies for Type I systems where both the
repair cost and the failure rate may depend on the state. Their
policy takes the form of a control limit $s_n$ that defines the
maximum virtual age for a given accumulated number of failures
beyond which the system should be replaced rather than repaired.
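A control limit policy of this kind reduces to a simple lookup at each failure. The sketch below is only an illustration of that decision rule: the limit values, the fallback for failure counts outside the table, and the function name are all hypothetical and are not taken from Love et al. [12].

```python
# Hypothetical control limits s_n: the maximum virtual age (in years) at which
# a system with n accumulated failures is still repaired. Values are made up.
CONTROL_LIMITS = {0: 4.0, 1: 3.5, 2: 3.0, 3: 2.0}


def action_on_failure(n_failures, virtual_age, limits=CONTROL_LIMITS):
    """Return 'replace' if the virtual age exceeds s_n, otherwise 'repair'."""
    # Assumption: beyond the tabulated failure counts, reuse the limit for the
    # largest tabulated count.
    s_n = limits.get(n_failures, limits[max(limits)])
    return "replace" if virtual_age > s_n else "repair"


print(action_on_failure(1, 3.0))  # below the limit for one failure
print(action_on_failure(1, 3.6))  # above the limit for one failure
```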
1.2. The value of maintenance
The question of whether the reliability gained through maintenance is worth the cost of maintenance, however, is usually not
addressed, due, in part, to the difficulty in doing so. Dekker [2], for
example, notes "the main question faced by maintenance management, whether maintenance output is produced effectively, in
terms of contribution to company profits, […] is very difficult to
answer". Therefore maintenance planning is usually shifted from a
value maximization problem formulation to a cost minimization
problem. In short, as noted by Rosqvist et al. [20], a cost-centric
mindset prevails in the maintenance literature, for which maintenance has no intrinsic value.
In previous work we have proposed an alternative approach
using an objective function related to the value of maintenance
[16]. Using a simple preventive maintenance example, we showed
how a maintenance strategy could be developed based on both an
assessment of the value of maintenance (how much is it worth to
the system's stakeholders) and an assessment of the costs of
maintenance.
The purpose of this paper is to show how general repair
policies that maximize system value can be developed for stochastically deteriorating systems. Section 2 qualitatively discusses
how the existing literature on general repair maintenance optimization can be leveraged to take a value perspective and then
develops the quantitative analytical basis, using a semi-Markov
decision process and discounted cash flow techniques. Section 3
explores the results and practical implications of the value
perspective and introduces a simple visualization for selecting
the optimal action at each system failure. Finally, Section 4
discusses the advantages and limitations of the proposed
framework.
Maintenance is often a significant component of an organization's operating costs. This work offers a way of quantifying the
return on this investment, or what I term the value of maintenance. The analytics developed here allow the identification of
value-optimal, or at least value-informed, maintenance policies.
2. Theory
This section first provides a qualitative discussion of the
approach, and then develops the model and optimization.
2.1. The value perspective on general repair policies: a qualitative
discussion
My specific purpose in this paper is to develop a discrete semi-Markov decision structure for a finite horizon problem and then to
identify general repair policies that maximize the net present
value generated by the system. While the semi-Markov decision
structure is based on that proposed by Makis and Jardine [13] and
and
1Fv=
Fv=
where f(x) is the pdf of the first time to failure. Here the time index
is not necessary because the transition probabilities do not depend
on clock time.
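To make the dependence on virtual age rather than clock time concrete, the sketch below computes the probability that a system which has survived to virtual age `v` fails within the next time step, using the standard conditional probability (F(v + step) - F(v)) / (1 - F(v)). Table 1 lists two values of 3 for the gamma distribution; treating them as an integer shape and a scale is an assumption of this example, as are the function names and the closed-form (Erlang) CDF used to avoid external libraries.

```python
import math


def erlang_cdf(x, shape, scale):
    """CDF of a gamma distribution with integer shape (Erlang closed form)."""
    if x <= 0:
        return 0.0
    z = x / scale
    tail = sum(z ** k / math.factorial(k) for k in range(shape))
    return 1.0 - math.exp(-z) * tail


def conditional_failure_prob(v, step, shape=3, scale=3.0):
    """P(failure within the next `step` | survived to virtual age v)."""
    survived = 1.0 - erlang_cdf(v, shape, scale)
    failed_in_step = erlang_cdf(v + step, shape, scale) - erlang_cdf(v, shape, scale)
    return failed_in_step / survived


# Assumed step of one thirtieth of a year, as in the time slicing of Table 1.
for age in (0.0, 1.0, 2.0, 4.0):
    print(age, conditional_failure_prob(age, 1.0 / 30.0))
```

Only the virtual age enters the calculation, which is why no clock-time index is needed in the transition probabilities.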
Now the transition probabilities can be determined. First,
consider the transition from a new system to the first failure at
time x1:
$$P_{0,0 \to 1,x_1} = \int_{x_1}^{x_1+1} f(x)\,dx \qquad (3)$$
$$w(0,0,0) = -C_0 - C_2(0) + C_3(0)$$
$$w(1,\theta,1) = -C_1 - C_2(\theta) + C_3(\theta)$$
where $\theta$ is the virtual age of the system after repair and $C_2$ and $C_3$
are the operating cost and revenues of a system of virtual age $\theta$.
For simplicity the repair cost here is shown as independent of the
state; this dependency can easily be incorporated into the analysis.
Alternatively, if the system fails, it can be replaced, as indicated
by the diamond and the dotted line, resulting in revenue in time
step 1 of
$$w(0,0,1) = -C_0 - C_2(0) + C_3(0)$$
[Fig. 1: state diagram with axes Time and Virtual Age and transitions labeled Repair and Replace.]
the system and market assumptions affect the optimal maintenance policy.
C 2 v= a2 b2
12
v==c3
C 3 v= a3 b3
Fig. 2 shows the costs and revenues using the nominal values
assumed in the simulation.
Table 1 summarizes the remaining nominal values.
9
8
R i i
=
<
C 0 PVC 3 C 2 ; n; 0; xmax
0 P 0;0;1;x W1; x; i xdx
Wn; v; i max
R imax i
0
0
a 0;1: C PV C C ; n; v ;
f
P n;vn1;v0 x Wn 1; v x; i xdj ;
1
3
2
x0
x0
10
11
Now, step back one more time step (in the example shown in
Fig. 1, to time 2). Repeat the previous calculations for the failed
and functioning cases, using the next-step values of W just calculated.
Repeat the process until the first time step is reached.
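The backward stepping described above can be sketched as a small dynamic program. This is a deliberately simplified illustration and not the paper's formulation: the per-step profit and failure probability functions are invented, the number of failures is not tracked, the repair effect scales the whole virtual age by the repair level (rather than applying the Type I update), and the per-step discount factor is an assumed value. Only the replacement cost, repair cost, and repair level are taken from the nominal values in Table 1.

```python
from functools import lru_cache

T = 20               # number of time steps in the planning horizon (assumed)
C0, C1 = 10.0, 5.0   # replacement and repair costs (nominal values, Table 1)
THETA = 0.8          # repair level (nominal value, Table 1)
DELTA = 0.99         # per-step discount factor (assumed)


def profit(v):
    """Assumed per-step revenue minus operating cost at virtual age v."""
    return 2.0 * 0.95 ** v - 0.1 * 1.05 ** v


def fail_prob(v):
    """Assumed probability of failing during one step at virtual age v."""
    return min(0.9, 0.05 + 0.03 * v)


def age_after_repair(v):
    """Simplified repair effect: scale the virtual age and snap to the grid."""
    return round(THETA * v)


@lru_cache(maxsize=None)
def W(t, v):
    """Expected discounted net value from step t on, working system of age v."""
    if t >= T:
        return 0.0  # nothing can be earned beyond the horizon
    p = fail_prob(v)
    value_if_survives = W(t + 1, v + 1)
    value_if_fails = max(failure_options(t + 1, v + 1).values())
    return profit(v) + DELTA * ((1 - p) * value_if_survives + p * value_if_fails)


def failure_options(t, v):
    """Value of each action for a system that has just failed at step t."""
    return {
        "replace": -C0 + W(t, 0),
        "repair": -C1 + W(t, age_after_repair(v)),
    }


def best_action(t, v):
    """Value-maximizing action on failure at step t and virtual age v."""
    options = failure_options(t, v)
    return max(options, key=options.get)


print(best_action(T, 5))   # at the horizon the cheaper action wins
print(best_action(0, 15))  # early failure of an old system
```

Memoizing `W` plays the role of storing the next-step values while stepping back from the horizon. At the final step neither action can earn anything back, so the cheaper repair is chosen.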
3. Results and discussion
This section illustrates the concepts introduced in the previous
section using a hypothetical system, and shows how changes in
the system and market assumptions affect the optimal maintenance policy.
where $v'$ is the virtual age after the repair action and PV is the
present value of the cost and revenue stream for the expected
mean duration and discount factor, scaled to the time
interval size.
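One way to scale a discount factor to the time interval size is sketched below, using the nominal annual interest rate and number of time slices per year from Table 1. The compounding convention (the per-step factor compounds back to the annual factor) and the function names are assumptions of this example.

```python
ANNUAL_RATE = 0.05     # annual interest rate (nominal value, Table 1)
SLICES_PER_YEAR = 30   # time slices per year (nominal value, Table 1)


def step_discount_factor(annual_rate=ANNUAL_RATE, slices=SLICES_PER_YEAR):
    """Per-step discount factor whose `slices`-fold product is 1 / (1 + rate)."""
    return (1.0 + annual_rate) ** (-1.0 / slices)


def present_value(cash_flows, discount):
    """Present value of per-step cash flows, the first one undiscounted."""
    return sum(cf * discount ** k for k, cf in enumerate(cash_flows))


d = step_discount_factor()
# Hypothetical stream: a net profit of 0.5 per step for one year.
print(d, present_value([0.5] * SLICES_PER_YEAR, d))
```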
Since for simulation purposes W is needed at each time step,
when no failure has occurred W is updated according to
[Fig. 2: plot of operating cost and revenue; legend: Operating Cost, Revenue.]
Table 1
Nominal parameters.
Parameter     Nominal value   Remarks
-, -          3, 3            Gamma distribution
C0            10              Replacement cost
C1            5               Repair cost
a2, b2        1, 1.15         Operating cost parameters
a3, b3, c3    20, 1.2, 4      Revenue parameters
tmax          5 years         Time horizon
-             30              Time slices per year
-             0.8             Repair level
-             5%              Annual interest rate
[Figure: repair/replace decision plot; x-axis: Time [years]; legend: Gray = Replace, Black = Repair.]
virtual age. Each matrix element is then set to zero (repair), or to one
(replace), or to undefined (impossible time and virtual age combinations). The matrix is then plotted such that the value one corresponds
to gray, zero to black, and undefined to no color.
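The decision matrix described above can be sketched in a few lines. Here the matrix is rendered as text instead of a colored plot, the decision rule is a made-up stand-in for the optimization result, and the choice of which combinations count as impossible (virtual age greater than elapsed time) is an assumption of this example.

```python
REPAIR, REPLACE, UNDEFINED = 0, 1, None


def build_decision_matrix(n_steps, decide):
    """matrix[v][t] holds the action at virtual age index v and time index t."""
    matrix = []
    for v in range(n_steps):
        row = []
        for t in range(n_steps):
            # Assumption: a system cannot be older than the elapsed time.
            row.append(decide(t, v) if v <= t else UNDEFINED)
        matrix.append(row)
    return matrix


def render(matrix):
    """Text rendering: '#' for repair, '.' for replace, blank for undefined."""
    symbols = {REPAIR: "#", REPLACE: ".", UNDEFINED: " "}
    # Highest virtual age first, so the layout matches a plot's vertical axis.
    return "\n".join("".join(symbols[c] for c in row) for row in reversed(matrix))


def toy_rule(t, v, horizon=6):
    """Made-up rule: replace older systems unless the horizon is close."""
    return REPLACE if v >= 2 and horizon - t > 2 else REPAIR


print(render(build_decision_matrix(6, toy_rule)))
```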
The lack of definition at the border between the repair and
replacement areas in the graph can be addressed by decreasing the
time step size, which however results in very long run times. For
this paper the time step is therefore kept at one thirtieth of a year,
which yields sufficiently clear results while keeping run time
reasonable.
It is best to repair the system in two situations: (1) when the
virtual age is low and the failure probability is low and operating
profit high; or (2) when the system is close to the time horizon
and there is not enough time to recoup the investment in a new
system. Contrary to the cost-centric viewpoint, the system does
not have a maximum virtual age beyond which it is always better
to replace. This finding arises because the finite time horizon
means that late investments in new systems cannot be recouped.
While incorporating a salvage value may shift the curve somewhat
in favor of replacement, it is unlikely to result in a maximum age
because older systems will have lower salvage values.
Conversely, it is best to replace the system when the failure
probability is high, the operating profit is low (high virtual age),
and there is sufficient remaining time horizon to recoup the
investment in a new system.
[Figure panel: Revenue = 0; x-axis: Time [years]; legend: Black = Repair, Gray = Replace.]
Fig. 4. Optimal repair/replace decisions for the nominal problem when revenue is
not considered.
[Figure: Repair Cost curves for p = 0.05, p = 0.2, and p = 0.5.]
[Figure panels: C0 = C1 = 5; C0 = 10C1 = 50; x-axis: Time [years]; legend: Gray = Replace, Black = Repair.]
Fig. 6. Repair/replace decision when replacement costs are varied relative to repair costs.
[Figure panels: p = 0; p = 0.05; p = 0.2; p = 0.5; axes: Time [years], Virtual age [years]; legend: Black = Repair, Gray = Replace.]
Fig. 8. Repair/replace decision when repair cost increases with virtual age.
[Figure: plot of operating cost and revenue; legend: Operating Cost, Revenue.]
[Figure panels: Constant Profit; More Rapid; axes: Time [years], Virtual age [years]; legend: Black = Repair, Gray = Replace.]
Fig. 10. Repair/replace decision when operating profit is (a) constant, and (b) decreases rapidly.
[Figure panels: C3 = 0.5 Nominal; C3 = 2*Nominal; x-axis: Time [years]; legend: Gray = Replace, Black = Repair.]
Fig. 11. Repair/replace decision when operating revenue is varied relative to operating cost.
[Figure panels: x-axis: Time [years]; legend: Black = Repair, Gray = Replace.]
Fig. 12. Repair/replace decision when the time horizon is (a) decreased, and (b) increased.
[Figure panels for two parameter settings, 10 and 1 (parameter symbols lost in extraction); axes: Time [years], Virtual age [years]; legend: Black = Repair, Gray = Replace.]
[Figure panels: Zero revenue, for the same two parameter settings; axes: Time [years], Virtual age [years]; legend: Black = Repair, Gray = Replace.]
Fig. 14. Repair/replace decision for changing failure characteristics when revenue is not considered.
, are increased, the probability that the system will fail increases
as the system ages (as does its variance). In this case, replacement
becomes more valuable since it returns the system to a lower
virtual age and hence lower probability of failure.
The optimal decision determined using the value approach is
significantly different from that obtained when revenue is not
considered, that is, when cost is minimized, as shown in Fig. 14.
For both the low and high failure probability scenarios, ignoring
revenues results in fewer replacements. When revenue is not
considered, only the reduced failure probability and operating cost
benefits offered by newer systems are considered. Therefore the
benefits of new systems are underestimated, resulting in a value
sub-optimal strategy, as discussed earlier.
[Figure panels for repair levels 0.98, 0.8, 0.4, and 0.2; axes: Time [years], Virtual age [years]; legend: Black = Repair, Gray = Replace.]
[Figure panels: repair level 0.98 with C1 = 4.75; repair level 0.2 with C1 = 7; x-axis: Time [years]; legend: Black = Repair, Gray = Replace.]
Fig. 16. Repair/replace decision for decreasing repair levels with repair cost adjustment.
Acknowledgments
4. Conclusion
In previous work we have argued that while maintenance is
traditionally seen as a cost-driver, this view is limited and ignores
the contribution of maintenance to the value of a system. This
paper shows how the value view of maintenance can be applied to
the familiar general repair problem. I used a semi-Markov decision
process coupled with discounted cash flow techniques to
estimate the net present value of a system under different
responses to failure, and then used a dynamic programming
approach to identify the optimal actions for any given time and
system condition, represented here by the system's virtual age.
The analysis showed that the value perspective results in
different decisions and that ignoring system revenue results in
value sub-optimal strategies that decrease the net value of the
system.
This approach provides a quantitative basis on which to base
maintenance decisions and thus ensure maximum expected
value. In particular, the results show:
1. The optimal action for a given time and condition changes as
replacement and repair costs change, and identifies the point at
which these costs become too high for profitable system
operation. This approach can therefore be used to identify
"lemon" designs that cannot be rescued through careful
maintenance.
2. The impact of planning horizon on the optimal action. For
shorter planning horizons it is better to repair, since there is no
time to reap the benefits of increased operating profit and
reliability. As the planning horizon grows, replacement
becomes more attractive.
3. The impact on the optimal maintenance policy of the system's
failure characteristics. In particular, it is better to replace
systems where the probability of failure increases rapidly with
deterioration. This approach can therefore be used to assess the
value of investing in higher reliability, either through inherently more reliable systems, or through preventive
maintenance.
4. The impact on the optimal maintenance policy of the repair
level. As the repair level is decreased, the relative value of
replacement increases, because lower repair results in lower
reliability gains. This approach can therefore be used to
determine the optimal repair level.
This work opens several interesting avenues for future work.
While virtual age and number of failures are useful proxies for the
condition of many systems (e.g., mileage and failure count for a
vehicle), it would also be useful to consider more direct measures
of system state such as those offered by condition monitoring
systems. Another extension here is to allow a range of failures as
defined by repair cost and time to occur in each state.