
Reliability Engineering and System Safety 119 (2013) 76–87


Value maximizing maintenance policies under general repair


Karen B. Marais *
School of Aeronautics and Astronautics, Purdue University, USA

Article info

Article history:
Received 2 October 2012
Received in revised form 7 May 2013
Accepted 13 May 2013
Available online 28 May 2013

Abstract

One class of maintenance optimization problems considers the notion of general repair maintenance
policies where systems are repaired or replaced on failure. In each case the optimality is based on
minimizing the total maintenance cost of the system. These cost-centric optimizations ignore the value
dimension of maintenance and can lead to maintenance strategies that do not maximize system value.
This paper applies these ideas to the general repair optimization problem using a semi-Markov decision
process, discounted cash flow techniques, and dynamic programming to identify the value-optimal
actions for any given time and system condition. The impact of several parameters on maintenance
strategy, such as operating cost and revenue, system failure characteristics, repair and replacement costs,
and the planning time horizon, is explored.
This approach provides a quantitative basis on which to base maintenance strategy decisions that
contribute to system value. These decisions are different from those suggested by traditional cost-based
approaches. The results show (1) how the optimal action for a given time and condition changes as
replacement and repair costs change, and identify the point at which these costs become too high for
profitable system operation; (2) that for shorter planning horizons it is better to repair, since there is no
time to reap the benefits of increased operating profit and reliability; (3) how the value-optimal
maintenance policy is affected by the system's failure characteristics, and hence whether it is worthwhile
to invest in higher reliability; and (4) the impact of the repair level on the optimal maintenance policy.
© 2013 Elsevier Ltd. All rights reserved.

Keywords:
Cost benefit analysis
Dynamic programming
Maintenance
Markov processes
Reliability
Replacement

1. Introduction
Significant material and personnel resources are allocated to
maintenance activities in companies; for example, over a quarter
of the total workforce in the process industry is said to deal with
maintenance work [22]. The importance of maintenance to industry is reflected by the extensive and growing literature on optimal
maintenance, devoted to developing methods to ensure that these
considerable maintenance resources are allocated and used efficiently, as they can be significant drivers of competitiveness, or
lack thereof if mismanaged (see the reviews by Pham and Wang
[19] and Wang [23]).
1.1. General repair maintenance policies
One class of problems considers the notion of general repair
maintenance policies, where, perhaps in conjunction with a
preventive maintenance program, systems are repaired or
replaced on failure. The question investigated in these studies
under various assumptions is: if the system has failed, when is it
better to replace, when is it better to repair, and to what level?

* Tel.: +1 7654940063. E-mail address: kmarais@purdue.edu
0951-8320/$ - see front matter © 2013 Elsevier Ltd. All rights reserved.
http://dx.doi.org/10.1016/j.ress.2013.05.015


Minimal repair returns the system to the condition it was in
immediately prior to failure, for example, patching a flat tire. In
contrast, perfect repair returns the system to an "as good as new"
state. An engine overhaul may be seen as near-perfect repair.
(Worse repair, where the system is in a worse condition after
repair, for example if an engine suffers foreign-object damage due
to a lost tool, is also possible, but I do not consider it here.) Many
optimal policies have been proposed, generally in the form of a
cost, age, or number-of-failures limit. Hence these policies are
typically referred to as repair limit policies.
Wang [23] suggests that repair limit policies were first introduced by Gardent and Nonant [4] and Drinkwater and Hastings
[3]. Drinkwater and Hastings noted that while many organizations
used repair limits based on the type, age or location of a system,
there were no tools available to guide optimal actions. They
showed both analytically and through simulation how these limits
could be set to minimize the average repair cost per year of a fleet
of vehicles. One problem of their approach is that the decision to
repair or replace only depends on that repair, and not on the
history of repairs of the system. A system could therefore "limp
along" through a series of repairs that fell just below the repair
cost limit, even though a replacement was justified when taking a
longer view. Beichelt [1] addressed this problem by using a repair
cost rate limit, that is, the system is replaced when the repair cost
per unit time exceeds a fixed value. The repair history can also be
incorporated into the problem by considering the number of
failures as well as the system age; see Kapur et al. [8], Makis and
Jardine [13], and Love et al. [12].
The optimizations are usually carried out assuming that repair
or replacement is instantaneous; another set of policies is developed by setting a repair time limit rather than a repair cost limit. In
Nakagawa and Osaka's (1974) approach, a repair is abandoned if it
cannot be completed within a predetermined time. Nguyen and
Murthy [18] motivate the consideration of repair time by positing
a situation where basic (and imperfect) repairs can be completed
locally but more extensive (and perfect) repairs require central
repair. This situation is readily seen in industry, where, for
example, airlines have small maintenance facilities at most
airports but only a few large maintenance facilities (see also [15]).
While generally not considered as being repair limit policies,
policies based on replacing once the system exceeds a certain
number of failures have been suggested, as well as policies based
on replacing the system once it exceeds some reference operating
time, for example flight hours or vehicle miles (see Wang [23] for a
review).
This paper builds on Kijima [10], Makis and Jardine [13], and
Love et al. [12] to develop a stochastic deterioration model of a
system under a general repair policy. Kijima proposed that the
effect of repair could be modeled as reducing the system's virtual
age and then used a g-renewal function to determine the optimal
time between replacements [9]. He let $V_n$ be the system's virtual
age after the nth repair, $X_n$ the additional age incurred between the
(n−1)th and nth repair, and $\theta_n$ the level of repair. In his Type I
model, the nth repair cannot remove the damage incurred before
the (n−1)th repair. Thus, after the nth repair the virtual age of the
system becomes:

$$V_n = V_{n-1} + \theta_n X_n$$

Note that if we start with a new system (and any replacement
systems are also new) at t = 0, the system virtual age will therefore
always be less than or equal to the clock time.
The Type II model allows repair to remove damage caused by
prior failures too.
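
The two Kijima models can be compared with a minimal Python sketch. The Type I update follows the equation above; the Type II update, in which repair scales the whole accumulated virtual age, is Kijima's standard formulation and is not written out in this paper. The function names and the use of one constant repair level `theta` for every repair are assumptions made for illustration only.

```python
def kijima_type1(v_prev, x, theta):
    """Type I: repair only reduces the age gained during the last sojourn x."""
    return v_prev + theta * x


def kijima_type2(v_prev, x, theta):
    """Type II (standard Kijima form): repair scales the whole accumulated age."""
    return theta * (v_prev + x)


def virtual_ages(sojourns, theta, update):
    """Virtual age after each repair for a system that starts new (age 0)."""
    v = 0.0
    ages = []
    for x in sojourns:
        v = update(v, x, theta)
        ages.append(v)
    return ages


# Two sojourns of 2 and 1 time units with repair level 0.5
type1_ages = virtual_ages([2.0, 1.0], 0.5, kijima_type1)
type2_ages = virtual_ages([2.0, 1.0], 0.5, kijima_type2)
```

Under Type I the virtual age can only grow between replacements, whereas under Type II repeated repairs can hold it at a steady level.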
Several other authors have also used a similar simplification of
modeling repair as being able to only reduce wear since the last
repair, or being able to reduce wear from the beginning of system
use. Martorell et al. [17], coming from a proportional hazards
viewpoint, propose the proportional age setback (PAS) and proportional age reduction (PAR) models. The PAR model assumes
maintenance proportionally reduces the age gained since the
previous maintenance event, and is therefore similar to the Type
I model. The PAS model shifts the origin of time from which the
component age is measured, where the shift is proportional to the
degree of maintenance. It is therefore similar to the Type II model.
Doyen and Gaudoin [11], coming from a failure intensity viewpoint, model the impact of repair on failure intensity. They
propose two extremes: repair can at most reduce the increase in
failure intensity since the last repair, or repair can reduce the total
increase in failure intensity from the beginning of system use.
Their models are therefore similar to Type I and Type II respectively. The authors add an interesting dimension by then allowing
a third model to vary between these two extremes; that is, repair
can reduce aging since the last m repairs, where m is set by the
modeler. In a Markov process sense, in other words, the memory of
the process can be one step, m steps, or infinite.
Makis and Jardine [13] used a semi-Markov approach and a
two-dimensional state space defined by the number of failures n
and the real age of the system $t_n$ to demonstrate that stationary
optimal policies that minimize the expected average cost per unit
time for Type I systems exist. By formulating their problem as a g-renewal function they were able to find such solutions; however,
this approach did not allow them to consider the effect of failure
history on failure densities.
Accordingly, Love et al. [12] developed a semi-Markov decision
structure using the $(n, t_n)$ state space and proposed a numerical
search procedure that could be used to identify repair-cost-minimizing general repair policies for Type I systems where both the
repair cost and the failure rate may depend on the state. Their
policy takes the form of a control limit $s_n$ that defines the
maximum virtual age for a given accumulated number of failures
beyond which the system should be replaced rather than repaired.
1.2. The value of maintenance
The question of whether the reliability gained through maintenance is worth the cost of maintenance, however, is usually not
addressed, due in part to the difficulty in doing so. Dekker [2], for
example, notes that "the main question faced by maintenance management, whether maintenance output is produced effectively, in
terms of contribution to company profits, [...] is very difficult to
answer". Therefore maintenance planning is usually shifted from a
value maximization problem formulation to a cost minimization
problem. In short, as noted by Rosqvist et al. [20], a cost-centric
mindset prevails in the maintenance literature, for which maintenance has no intrinsic value.
In previous work we have proposed an alternative approach
using an objective function related to the value of maintenance
[16]. Using a simple preventive maintenance example, we showed
how a maintenance strategy could be developed based on both an
assessment of the value of maintenance (how much it is worth to
the system's stakeholders) and an assessment of the costs of
maintenance.
The purpose of this paper is to show how general repair
policies that maximize system value can be developed for stochastically deteriorating systems. Section 2 qualitatively discusses
how the existing literature on general repair maintenance optimization can be leveraged to take a value perspective and then
develops the quantitative analytical basis, using a semi-Markov
decision process and discounted cash flow techniques. Section 3
explores the results and practical implications of the value
perspective and introduces a simple visualization for selecting
the optimal action at each system failure. Finally, Section 4
discusses the advantages and limitations of the proposed
framework.
Maintenance is often a significant component of an organization's operating costs. This work offers a way of quantifying the
return on this investment, or what I term the value of maintenance. The analytics developed here allow the identification of
value-optimal, or at least value-informed, maintenance policies.

2. Theory
This section first provides a qualitative discussion of the
approach, and then develops the model and optimization.
2.1. The value perspective on general repair policies: a qualitative discussion
My specific purpose in this paper is to develop a discrete semi-Markov decision structure for a finite horizon problem and then to
identify general repair policies that maximize the net present
value generated by the system. While the semi-Markov decision
structure is based on that proposed by Makis and Jardine [13] and
Love et al. [12], this approach differs from theirs in two
significant ways.
First, maintenance is seen as a value-driver, rather than as a
cost-driver. The present work builds on the premise that engineering systems are value-delivery artifacts that provide a flow of
services (or products) to stakeholders. When this flow of services
is priced in a market, this pricing or "rent" of the system's
services allows the assessment of the system's value, as will be
discussed shortly. In other words, the value of an engineering
system is determined by the market assessment of the flow of
services the system provides over its lifetime. For example, the
value of a commercial passenger aircraft is related to the revenue
seat miles provided by the aircraft, while the value of a telecommunications satellite is related to the bandwidth offered by its
transmitters. This perspective has been developed in several
previous publications; see for example, Saleh and Marais [21]
and Marais and Saleh [16].
Second, the problem is explicitly defined with a finite time
horizon, in contrast to most maintenance optimizations, which
minimize average cost per unit time while explicitly or implicitly
assuming an infinite time horizon. Investment decisions in general
are made with the assumption that the desired return will be
generated within some finite period. For example, an investor in a
wind farm will typically estimate the value of the investment over
15 years. The finite time horizon introduces important dimensions
that are not visible using an infinite horizon, as will be discussed
shortly. See also Mamer [14] and Huang and Guo [7] for treatments of the finite horizon maintenance problem and
Hartman and Murphy [6] for a treatment of the finite horizon
replacement problem.
My argument is based on three key components:
First, I consider systems that deteriorate stochastically, and I
model their state evolution as a semi-Markov process using a
three-dimensional state space defined by the accumulated number of failures, the system's virtual age, and the clock time. The
probability of failure depends on the system's virtual age. Repairs
are Type I and reduce the system's virtual age acquired since the
last failure.
Second, I consider that the system provides a flow of service
per unit time, and this flow depends on the state of the system.
This flow in turn is priced. I consider both constant revenue with
deterioration, and decreasing revenue using a family of revenue
curves. Similarly, the operating cost of the system can be either
constant, or increasing using a family of cost curves. I then
calculate a discounted cash flow incorporating maintenance costs,
resulting in a Present Value (PV) for each possible evolution of the
system, or value trajectory of the system. In the semi-Markov
environment, the possible trajectories are defined by the stochastic failures and the general repair decisions.
Third, I formulate a dynamic programming approach and use it
to assess the optimal expected value of the system for each state
and point in time. These values can then be used to determine the
optimal action (repair/replace) for a failure occurring in a given
state and point in time. Several interesting findings result, for
example, a system that is worthwhile replacing towards the
beginning of the time period may not even be worth repairing
towards the end of the time period.
In the following section, I set up the analytical framework that
corresponds to this qualitative discussion.

2.2. An analytical model of the value of systems under general repair policies
In developing the value model of maintenance, I make a
number of assumptions to keep the focus on the main argument
of this work. These assumptions will be progressively relaxed in
future work.
The assumptions are the following:
1. The system is modeled as a semi-Markov decision process.
2. Repairs are Type I [10] and the repair level is constant. When
a system is replaced, its virtual age resets to zero.
3. The failure intensity of the system is solely defined by its
virtual age.
4. The failure instants k are decision epochs, and at each failure
instant there are two possible actions: repair (a = 1) and
replace (a = 0). That is, the system state is observed only at
failure events.
5. Time at failure instant k, denoted by $t_k$, is discretized into
slices $i_k$ using a scaling parameter $\Delta$ such that $i_k/\Delta \le t_k < (i_k+1)/\Delta$.
Thus failures are assumed to occur precisely at $i_k$ [12]. For
example, if the time unit is hours, and the first failure occurs
after 3.25 h, $i_1 = 3$. The time slices are assumed to be much
smaller than the intervals between failures.
6. The state of the system is described by $(n_k, v_k, i_k)$, where $n_k$ is
the number of failures, $v_k$ is the discretized virtual age, and $i_k$ is
the discretized time at the kth decision epoch. The index k on
the failure count is necessary to account for systems that are
replaced. Where possible without causing ambiguity, I will
suppress the subscript k and refer to the state as (n, v, i).
7. The one-time replacement cost is fixed at $C_0$ and the system
has no salvage value.
8. One-time repair costs, $C_1$, are a stationary bounded non-decreasing function of the number of failures that have
occurred and the system's virtual age ($C_1(n, v) \le K_1$, $n \ge 1$, $v \ge 0$).
9. Operating costs, $C_2$, are also a stationary bounded non-decreasing function of the number of failures that have
occurred and the system's virtual age ($C_2(n, v) \le K_2$, $n \ge 1$, $v \ge 0$).
10. Operating revenues, $C_3$, are a stationary bounded non-increasing function of the number of failures that have
occurred and the system's virtual age ($C_3(n, v) \le K_3$, $n \ge 1$, $v \ge 0$).
Both operating costs and revenues are associated with specific
time slices and may change to reflect, for example, increasing
fuel costs or decreased demand for the service.
By definition the problem time horizon is finite, so $i \le i_{max}$, where
$i_{max}$ is defined by the user. Since the virtual age cannot exceed the
total time that has passed, we also have $v \le i_{max}$. In the semi-Markov formulation, the maximum number of failures in the finite
time horizon is bounded by the time discretization. Thus, the state
space is finite and discrete, $I = \{(n, v, i) \mid 0 \le n \le N,\ 0 \le v \le i_{max},\ 0 \le i \le i_{max}\}$.
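
As a small illustration of assumption 5 and of the bounded state space, the following Python sketch maps a continuous failure time to its slice index and lists the feasible combinations of virtual age and clock time. The function names and the `slices_per_unit` argument (the number of slices per time unit, standing in for the scaling parameter) are hypothetical, and the failure count n is left out for brevity.

```python
import math


def discretize(t, slices_per_unit):
    """Slice index i such that i/slices_per_unit <= t < (i + 1)/slices_per_unit."""
    return math.floor(t * slices_per_unit)


def feasible_states(i_max):
    """All (v, i) pairs in which the virtual age v does not exceed the clock time i."""
    return [(v, i) for i in range(i_max + 1) for v in range(i + 1)]


first_failure_slice = discretize(3.25, 1)   # time unit of hours, one slice per hour
states = feasible_states(4)                 # a four-step problem
```

The first call reproduces the example in assumption 5: a failure after 3.25 h falls in slice 3.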
2.2.1. Stochastic deterioration model
There are two possible actions: repair (a = 1), and replace
(a = 0). A system with virtual age $v_k$ has a discretized time to
failure $x_k$. After failure $n_k + 1$, but before repair, the virtual age is:

$$v^-_{k+1} = v_k + x_k$$

where the superscript − denotes the virtual age prior to repair.
The Type I repair reduces the virtual age gained since the last
sojourn, so that after repair the virtual age is:

$$v_{k+1} = v_k + \theta x_k$$

where $\theta$ is the repair level.
Thus repair on failure $n_k + 1$ takes the system from $(n_k, v_k, i_k)$ to
$(n_k + 1, v_k + x_k, i_k + x_k)$ at the next decision epoch, where $x_k$ is the
discretized time to failure given the virtual age after repair $v_k$.
Replacement takes the system from $(n_k, v_k, i_k)$ to $(1, x_0, i_k + x_0)$ at the
next decision epoch, where $x_0$ is the discretized time to failure of a
new system.
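
The effect of the two actions on the state can be sketched in a few lines of Python. This is an illustrative reading of the model, not the author's implementation: the state here stores the virtual age immediately after the last action, `theta` is a hypothetical name for the constant repair level, and the function returns the state immediately after the chosen action rather than at the next decision epoch.

```python
from typing import NamedTuple

REPLACE, REPAIR = 0, 1  # action codes a = 0 and a = 1, as in the text


class State(NamedTuple):
    n: int      # failures accumulated by the current system
    v: float    # virtual age right after the last action, in slices
    i: int      # clock time, in slices


def act_on_failure(state, x, action, theta):
    """State right after acting on a failure that occurs x slices later."""
    clock = state.i + x
    if action == REPAIR:
        # Type I repair: only the age gained during the last sojourn is reduced
        return State(state.n + 1, state.v + theta * x, clock)
    if action == REPLACE:
        # A new system starts with no failures and zero virtual age
        return State(0, 0.0, clock)
    raise ValueError(f"unknown action: {action}")


start = State(n=2, v=10.0, i=40)
after_repair = act_on_failure(start, 5, REPAIR, theta=0.5)
after_replace = act_on_failure(start, 5, REPLACE, theta=0.5)
```

Clock time advances by the sojourn in both cases; only the failure count and the virtual age depend on the action.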
The discrete sojourn time $x_k$ between the nth and (n+1)th
failure is assumed to depend only on the failure rate function and
the virtual age, though it can be extended to include dependence
on the accumulated number of failures. In Type I imperfect repair,
the repair can at best reduce the effective aging incurred since the
last failure, so the probability density function (pdf) of $x_k$ is the
time to first failure conditioned on the virtual age $v_k$ at the kth
failure.
Assuming a repair was performed at (n, v, i), the discretized
conditional pdf of x is

$$f_v(x) = \frac{f(x + v/\Delta)}{1 - F(v/\Delta)} = \frac{f(x + v/\Delta)}{\bar{F}(v/\Delta)}$$

where f(x) is the pdf of the first time to failure, F is the corresponding
cumulative distribution function, and $\bar{F} = 1 - F$. Here the time index
is not necessary because the transition probabilities do not depend
on clock time.
Now the transition probabilities can be determined. First,
consider the transition from a new system to the first failure at
time $x_1$:

$$P_{(0,0)(1,x_1)} = \int_{x_1}^{x_1+1} f(x)\,dx \qquad (3)$$

The integration begins at $x_1$ since by definition the failure could
not occur before $x_1$ and ends at $(x_1 + 1)$ to account for the
discretization (see assumption 5).
Alternatively, after a repair after the kth failure, the system
transitions to the next failure at time $(v_k + x_k)$:

$$P_{(n,v_k)(n+1,v_k+x_k)} = \int_{x_k}^{x_k+1} f_v(x)\,dx \qquad (4)$$

The expected mean durations for repair and replacement are:

$$\tau_{(n,v)}(a = 1) = \int_0^{\infty} x f_v(x)\,dx$$

and

$$\tau_{(n,0)}(a = 0) = \int_0^{\infty} x f(x)\,dx$$

2.2.2. Cost and revenue flow
Fig. 1 shows the possible state transitions for a four-step
problem. The problem is formulated as a semi-Markov decision
process, but since it must be discretized to be solved on a
computer, I consider the possible events at each time step. Time
is shown on the horizontal axis while the vertical axis shows the
virtual age. Define $w(n, v, i)$ as the net value (revenue − costs)
generated over one time step by a system with n accumulated
failures and virtual age v, at time i. At time i = 0 the system is new
and the associated value of the revenue flow to the next time step
is

$$w(0, 0, 0) = -C_0 - C_2(0) + C_3(0)$$

where $C_0$ is the cost of acquiring the new system.
The first possible failure opportunity is at i = 1 (see assumption
4). If the system does not fail, its virtual age increases by one time
step and the value of the revenue in time step 1 is w(0, 1, 1). If the
system fails it can either be repaired, as indicated by the arrow and
the dotted line, resulting in revenue in time step 1 of

$$w(1, \theta, 1) = -C_1 - C_2(\theta) + C_3(\theta)$$

where $\theta$ is the virtual age of the system after repair and $C_2$ and $C_3$
are the operating cost and revenues of a system of virtual age $\theta$.
For simplicity the repair cost here is shown as independent of the
state; this dependency can easily be incorporated into the analysis.
Alternatively, if the system fails, it can be replaced, as indicated
by the diamond and the dotted line, resulting in revenue in time
step 1 of

$$w(0, 0, 1) = -C_0 - C_2(0) + C_3(0)$$

where the virtual age has now reset to zero.
In a similar manner the net revenue at each time step can be
assessed depending on whether or not a failure has occurred and
on which action (repair/replace) was taken:

$$w(n, v, i) = \begin{cases} -C_2(v) + C_3(v) & \text{no failure} \\ -C_1 - C_2(v) + C_3(v) & \text{repair} \\ -C_0 - C_2(0) + C_3(0) & \text{replace} \end{cases} \qquad (8)$$

where the time intervals are assumed to be sufficiently small that
any changes in cost and revenues are negligible.

Fig. 1. State transitions and net revenues for a four-step problem. (Horizontal axis: time; vertical axis: virtual age; branches marked Repair and Replace.)
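
The discretized transition probabilities of Eqs. (3) and (4) can be checked numerically with a short Python sketch. It rests on assumptions that go beyond the text: the lifetime is taken to be Erlang (a Gamma distribution with integer shape, so that the distribution function has a closed form), the default shape and rate of 3 are illustrative, and both the sojourn x and the virtual age v are counted in slices. The function names are hypothetical.

```python
import math


def erlang_cdf(t, shape=3, rate=3.0):
    """CDF of a Gamma lifetime with integer shape (Erlang); t in years."""
    if t <= 0.0:
        return 0.0
    lt = rate * t
    return 1.0 - math.exp(-lt) * sum(lt ** k / math.factorial(k) for k in range(shape))


def failure_prob(x, v=0, slices=30, cdf=erlang_cdf):
    """P(next failure falls in slice [x, x + 1) | virtual age v), x and v in slices."""
    survive = 1.0 - cdf(v / slices)  # probability of having survived to age v
    return (cdf((v + x + 1) / slices) - cdf((v + x) / slices)) / survive


# Distribution of the next failure slice for a new system and for a one-year-old one
new_system = [failure_prob(x) for x in range(2000)]
aged_system = [failure_prob(x, v=30) for x in range(2000)]
```

Each list sums to one up to rounding, and with an increasing failure rate the aged system is more likely than the new one to fail in the very next slice.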

2.2.3. A dynamic programming formulation
The optimal action for failures occurring for each possible
combination of state and time can be determined using a backwards recursion from the time horizon as follows. Define W(n, v, i)
as the optimal expected net present value looking forward from
time step i to time step $i_{max}$ for a system with n failures and virtual
age v. At each failure instance we then seek a general repair policy
a such that (cf. [12])

$$W(n, v, i \mid f) = \max_{a \in \{0,1\}} \begin{cases} -C_0 + PV(C_3 - C_2; n, 0, \tau) + \int_{x=0}^{i_{max}-i} P_{(0,0)(1,x)}\, W(1, x, i + x)\,dx, & a = 0 \\ -C_1 + PV(C_3 - C_2; n, v', \tau) + \int_{x=0}^{i_{max}-i} P_{(n,v')(n+1,v'+x)}\, W(n + 1, v' + x, i + x)\,dx, & a = 1 \end{cases} \qquad (9)$$

where $v'$ is the virtual age after the repair action and PV is the
present value of the cost and revenue stream for the expected
mean duration $\tau$ and the discount factor, scaled to the time
interval size.
Since for simulation purposes W is needed at each time step,
when no failure has occurred W is updated according to

$$W(n, v, i \mid \bar{f}) = PV(C_3 - C_2; n, v, \tau) + \int_{x=0}^{i_{max}-i} P_{(n,v)(n+1,v+x)}\, W(n + 1, v + x, i + x)\,dx \qquad (10)$$

The optimal policy is found by setting W to zero for $i \ge i_{max}$ and
then working backwards to i = 0 as follows. Begin at time step
$i = i_{max} - 1$; in the example shown in Fig. 1 it would be time 3. For
each virtual age, calculate the value looking forward, assuming
that the system has failed, for the case where maintenance is
performed (triangle node), the case where the system is replaced
(square node), and the case where nothing is done (circular node).
The virtual ages depend on the chosen repair level. Select the
option that gives the maximum value. This calculation is shown by
Eq. (9), where the next step W is set to zero. Also calculate the
value looking forward assuming that the system has not failed;
this calculation is shown by Eq. (10), where the next step W is set
to zero. The probability of the system failing for each time and
virtual age combination is given by Eqs. (3) and (4). Thus the
expected value of each node is given by

$$W(n, v, i) = p_{failed}\, W(n, v, i \mid f) + (1 - p_{failed})\, W(n, v, i \mid \bar{f}) \qquad (11)$$

Now, step back one more time step (in the example shown in
Fig. 1, to time 2). Repeat the previous calculations for the failed
and functioning cases, using the next step W's just calculated.
Repeat the process until the first time step is reached.

3. Results and discussion
This section illustrates the concepts introduced in the previous
section using a hypothetical system, and shows how changes in
the system and market assumptions affect the optimal maintenance policy.

3.1. The nominal case
First, the results are discussed for a set of nominal parameters.
For clarity in the graphical presentation of results, the parameters
vary only with virtual age and not with the accumulated number
of failures. Also, for comparison with Makis and Jardine [13],
Kijima [10], and Love et al. [12], I model the lifetime of a new
system using a Gamma distribution with density
$f(t) = \beta^{\alpha} t^{\alpha-1} \exp(-\beta t)/\Gamma(\alpha)$ and first mean passage time (or expected time to
failure) $\alpha/\beta$, where the shape parameter $\alpha$ is set equal to the scale
parameter $\beta$. Operating costs and revenues increase and decrease
with system virtual age as follows:

$$C_2(v/\Delta) = a_2\, b_2^{\,v/\Delta}, \qquad C_3(v/\Delta) = a_3\, b_3^{-(v/\Delta)/c_3} \qquad (12)$$

Fig. 2 shows the costs and revenues using the nominal values
assumed in the simulation.
Table 1 summarizes the remaining nominal values.

Fig. 2. Nominal cost and revenue. (Operating cost and revenue plotted against virtual age [years].)

Table 1
Nominal parameters.

Parameter      Nominal value   Remarks
α, β           3, 3            Gamma distribution
C0             10              Replacement cost
C1             5               Repair cost
a2, b2         1, 1.15         Operating cost parameters
a3, b3, c3     20, 1.2, 4      Revenue parameters
tmax           5 years         Time horizon
Δ              30              Time slices per year
θ              0.8             Repair level
-              5%              Annual interest rate

Fig. 3 shows the repair/replace decision that maximizes the net
present value of the system for each possible virtual age and
calendar time combination. The figure reads as follows. The x-axis
shows calendar time and the y-axis shows the virtual age of the
system. The virtual age of the system cannot exceed the calendar
time, but may be as low as zero if the system has been replaced. In
the black region of the plot the system should be repaired if a
failure occurs, while in the gray region the system should be
replaced. For example, if a failure occurs at time four years to a
system with virtual age 1 year (as indicated by the black ellipse on
the figure), it is best to replace the system. However, 6 months
later at time 4.5 years (the white rectangle on the figure), it is
better to repair the system. Thus the decision to repair or replace
the system depends on both the system's condition as represented
by its virtual age (i.e., failure probability and operating profit) and
on the time remaining.
The figure can be constructed by tracking the repair/replace
decision the optimization makes for each time and virtual age
combination. In this case, I used Matlab and created a two-dimensional matrix where the rows represent time and the columns
virtual age. Each matrix element is then set to zero (repair), or to one
(replace), or to undefined (impossible time and virtual age combinations). The matrix is then plotted such that the value one corresponds
to gray, zero to black, and undefined to no color.
The lack of definition at the border between the repair and
replacement areas in the graph can be addressed by decreasing the
time step size, which results however in very long run times. For
this paper the time step is therefore kept at one thirtieth of a year,
which yields sufficiently clear results while keeping the run time
reasonable.
It is best to repair the system in two situations: (1) when the
virtual age is low and the failure probability is low and operating
profit high; or (2) when the system is close to the time horizon
and there is not enough time to recoup the investment in a new
system. Contrary to the cost-centric viewpoint, the system does
not have a maximum virtual age beyond which it is always better
to replace; this finding arises because the finite time horizon
means that late investments in new systems cannot be recouped.
While incorporating a salvage value may shift the curve somewhat
in favor of replacement, it is unlikely to result in a maximum age
because older systems will have lower salvage values.
Conversely, it is best to replace the system when the failure
probability is high, the operating profit is low (high virtual age),
and there is sufficient remaining time horizon to recoup the
investment in a new system.

Fig. 3. Optimal repair/replace decisions as a function of virtual age and calendar time for the nominal problem. (Axes: time [years] and virtual age [years]; gray = replace, black = repair.)

Fig. 4 shows the optimal decisions when revenue is set to zero
and the other parameters are kept at the nominal values. In this case,
the value maximization becomes the standard cost minimization,
since revenues are zero. Because the revenue benefits of replaced
systems with lower virtual ages are not considered, and because
replacement is more costly, replacement becomes less attractive.
Figs. 3 and 4 show graphically how bringing system revenue into
consideration alters the optimal policy. For the nominal case, applying the maintenance policy based on minimizing cost rather than the
policy based on maximizing value results in a 20% penalty on the
expected net present system value. While the specific numbers are
for illustrative purposes only, it is important to note the significant
penalty that results from using a cost minimization approach.
The representation of the optimal repair/replace decisions
shown in Fig. 3 offers a convenient graphical summary of the best
action to take in response to a failure at any time and virtual age. It
can be prepared in advance for a particular system and market
conditions. Where there is uncertainty about future market conditions (e.g., future operating revenue), several scenarios can be
examined, as shown in the next section.

3.2. Variation in optimal maintenance strategy
This section explores the impact of varying the following
parameters on the optimal maintenance policy relative to the
nominal values given in Table 1: replacement and repair costs;
operating costs and revenues; time horizon; failure characteristics;
and the repair level.

3.2.1. Replacement and repair costs


Replacement, while more expensive, offers benefits in terms of
both reduced probability of failure and increased operating profit. I
consider here how the relative value of repair and replacement
changes as their costs are varied. Fig. 5 shows the change in the
expected maximum net present value (NPV) as the replacement
cost is increased relative to the nominal repair cost. As the ratio r
of replacement to repair cost increases, the present value
decreases, to the point r0 where it becomes negative. The variation
in the crossover value r0 as a function of the problem parameters is
left as a subject for future work. For a given set of operating cost

[Figure: repair/replace decision plot for the case Revenue = 0. Axes: Time [years], Virtual age [years]. Black = Repair, Gray = Replace.]
Fig. 4. Optimal repair/replace decisions for the nominal problem when revenue is
not considered.

K.B. Marais / Reliability Engineering and System Safety 119 (2013) 76-87

replacements, and, in the extreme case where repair costs


approach replacement costs, replacement will always be the
optimal decision.

and revenues, such a system is always unprofitable, regardless of
maintenance policy. Similarly, increasing both repair and replacement costs will also eventually result in a system that is always
unprofitable to operate, regardless of maintenance policy.
Fig. 6 shows the repair/replace decision plots when the repair
and replacement costs are equal, and when replacement is much
more costly than repair, relative to the nominal case shown in
Fig. 3. When the replacement cost is equal to (or less than) the
repair cost, there is nothing gained by not replacing and the
decision is always to replace. As the replacement cost increases
relative to the repair cost, the net value of replacement decreases.
Finally, once the replacement cost becomes too high, it is always
better to repair.
In the nominal case the repair costs are constant with virtual
age, limiting the cost associated with an older system. However, if
repair costs increase with virtual age, repair becomes less attractive because the immediate benefit of lower repair/replacement
costs is smaller and the reliability and operating profit of an older
repaired system are lower. Consider for example variable costs of
the following form, as also shown in Fig. 7 (cf. [12]):
C1v= pv=1:5 1

3.2.2. Operating costs and revenues


The value-centric model proposed here allows the impact of
operating revenues on maintenance decisions to be captured, in
contrast to cost-centric models, which do not consider this aspect.
In this model the decision to repair or replace a system is driven
both by the maintenance and operating costs and by the system
revenues. The impact of varying repair and replacement costs has
been explored above; here, I consider the impact of varying
operating costs and revenues. System operating profit can be
increased in two ways: by decreasing operating costs, or by
increasing operating revenue. While the numerical impact of these
changes may be the same, they are achieved in different ways.
Operating costs depend primarily on the system design and the
structure of the organization. Operating revenues, in contrast,
depend primarily on the market conditions (e.g., total market size,
market share). Stated simplistically, operators have two levers to
increase profit: a design lever to decrease operating cost, and a
marketing lever to increase revenues. In turn, the operating profit
affects the optimal maintenance strategy, and the maintenance
strategy can be used to increase the total value under a given set of
market conditions.
To make the importance of operating cost and profit salient,
consider first the somewhat stylized situation where operating
cost and revenue are independent of system condition. Here we
expect that since the profit does not decrease with system
deterioration, investments in replacement do not yield much

13

Setting p = 0 results in a constant repair cost of 1.


Fig. 8 shows the resulting optimal repair/replace decisions for
these three curves, together with a constant cost curve (p = 0,
C1 = 1). As repair cost increases more rapidly with virtual age
(larger p), the relative value of replacement increases, because
(1) repair approaches the cost of replacement and (2) the expected
cost of future repairs on a repaired system is higher. Thus systems
with rapidly increasing repair costs will result in more frequent
Fig. 5. Impact of replacement and repair costs.

[Figure: two decision plots, for C0 = C1 = 5 and for C0 = 10C1 = 50. Axes: Time [years], Virtual age [years]. Black = Repair, Gray = Replace.]
Fig. 6. Repair/replace decision when replacement costs are varied relative to repair costs.

[Figure: repair cost curves for p = 0.05, p = 0.2, and p = 0.5. Axes: Virtual Age [years], Repair Cost.]
Fig. 7. Variable repair cost curves.


[Figure: four decision plots, for p = 0, p = 0.05, p = 0.2, and p = 0.5. Axes: Time [years], Virtual age [years]. Black = Repair, Gray = Replace.]
Fig. 8. Repair/replace decision when repair cost increases with virtual age.


value. The only benefit of replacement lies in the reduced failure
probability; the additional benefit of increased profit at low virtual
ages is lost.
In contrast, if operating profit decreases rapidly as the system
deteriorates, because for example the system uses more fuel or is
less attractive to customers, replacement becomes more attractive.
In this case, replacement offers both reduced failure probability
and increased operating profit. To investigate this case, I considered the more rapidly changing cost and revenue curves shown in
gray in Fig. 9.
Fig. 10 shows the resulting optimal repair/replace decisions
when profit is constant, and when profit decreases rapidly with
system condition. When profits are constant, replacement is less
attractive, as expected. However, when profits decrease more
rapidly, replacement becomes more attractive. Replacement
allows the higher operating profits at lower virtual ages to be
realized more often, and offers the added benefit of lower probability of failure in the future.
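The profit argument above can be checked with a short calculation. The sketch below compares the operating profit of a new system with that of an aged one over the same period; the two profit curves are hypothetical, and failures and discounting are deliberately ignored so that only the profit effect is visible.

```python
def profit_gain_from_reset(profit, age, years, dt=1 / 30):
    """Extra operating profit from running a new system instead of one of
    virtual age `age` for `years` (undiscounted, failures ignored)."""
    steps = int(round(years / dt))
    return sum((profit(k * dt) - profit(age + k * dt)) * dt
               for k in range(steps))


def constant_profit(virtual_age):
    # Hypothetical: profit rate independent of system condition.
    return 4.0


def decaying_profit(virtual_age):
    # Hypothetical: profit rate that falls quickly with virtual age.
    return max(0.0, 6.0 - 1.5 * virtual_age)


print(profit_gain_from_reset(constant_profit, age=2.0, years=1.0))
print(profit_gain_from_reset(decaying_profit, age=2.0, years=1.0))
```

With a constant profit curve the gain is zero, so replacement can only pay for itself through reduced failure probability; with a decaying curve the gain is positive and adds to the case for replacement.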
Next, consider the case where the operating profit is lower or
higher than the nominal case across all system states. A higher
profit may for example correspond to a market in which the
system's revenues have increased (e.g., increased demand for
airline tickets over the summer), or to a system or organizational
design with lower operating costs (e.g., a more fuel efficient
aircraft, or a new labor agreement). The optimal strategy is not
obvious: repair may be more attractive because profits are

[Figure: operating cost and revenue curves. Axes: Virtual Age [years], Operating Cost and Revenue.]
Fig. 9. Operating cost and revenue variation (b2 = 1.5; c3 = 1). The black curves
show the nominal case; the gray curves show the more rapidly changing case.

increased in all system conditions, or replacement may be more
attractive because the investment is easily recouped.
Fig. 11 shows the impact on the optimal maintenance decision
of varying the operating profit by increasing the operating revenue
relative to the operating cost. When the operating profit is low,
replacement is less attractive because the return on investment in
a new system is low. As the profit increases, the investment in
replacement becomes more valuable because of the benefits of
increased revenue and reduced probability of failure. Therefore the


[Figure: two decision plots, "Constant Profit" and "More Rapid". Axes: Time [years], Virtual age [years]. Black = Repair, Gray = Replace.]
Fig. 10. Repair/replace decision when operating profit is (a) constant, and (b) decreases rapidly.

[Figure: two decision plots, for C3 = 0.5*Nominal and C3 = 2*Nominal. Axes: Time [years], Virtual age [years]. Black = Repair, Gray = Replace.]
Fig. 11. Repair/replace decision when operating revenue is varied relative to operating cost.

[Figure: two decision plots, for a shorter and a longer time horizon. Axes: Time [years], Virtual age [years]. Black = Repair, Gray = Replace.]
Fig. 12. Repair/replace decision when the time horizon is (a) decreased, and (b) increased.

maintenance strategy should depend not only on the system


characteristics as represented by operating and maintenance costs,
but also on the market conditions as represented by the operating
revenue.

3.2.3. Time horizon


Because the optimal decision also depends on the time remaining, the maintenance strategy depends on the planning
horizon. Fig. 12 shows the optimal repair/replace decisions for


[Figure: two decision plots, with both gamma parameters equal to 1 and with both equal to 10. Axes: Time [years], Virtual age [years]. Black = Repair, Gray = Replace.]
Fig. 13. Repair/replace decision for changing failure characteristics.

[Figure: the same two cases with zero revenue. Axes: Time [years], Virtual age [years]. Black = Repair, Gray = Replace.]
Fig. 14. Repair/replace decision for changing failure characteristics when revenue is not considered.

the nominal system under two different planning horizons. When


the planning horizon is short, it is better to repair, because
(1) there is little time to recoup the investment in a new system,
and (2) the increase in reliability is small since the system's virtual
age is always low. In contrast, as the planning horizon increases,
replacement becomes more valuable. It is never advisable to
replace near the end of the planning horizon, unless this action
extends the planning horizon. Thus for example an organization
that has been maintaining a wind turbine under a maintenance
contract or warranty would choose to repair the system as the
contract or warranty approaches expiration, whereas the turbine
owner might choose to replace the system in the hope of
extending the wind farm's lifetime.
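The "time to recoup" argument can be made concrete with a simple payback check. This is a sketch under stated assumptions, not the paper's calculation: the extra cost of replacing rather than repairing, the extra profit rate of a newer system, and the discount rate are all hypothetical, and the extra profit is treated as a constant stream.

```python
def present_value_of_stream(rate_per_year, years, discount_rate, dt=1 / 30):
    """Present value of a constant cash-flow rate received over `years`,
    discounted at the end of each time step."""
    steps = int(round(years / dt))
    return sum(rate_per_year * dt / (1.0 + discount_rate) ** ((k + 1) * dt)
               for k in range(steps))


def replacement_pays_off(extra_cost, extra_profit_rate, years_left,
                         discount_rate=0.10):
    """True if the discounted extra profit of a new system over the
    remaining horizon exceeds the extra cost of replacing over repairing."""
    gain = present_value_of_stream(extra_profit_rate, years_left, discount_rate)
    return gain > extra_cost


# Hypothetical numbers: replacing costs 4 more than repairing and earns
# 2 more per year for as long as the horizon lasts.
for years_left in (0.5, 4.0):
    print(years_left, replacement_pays_off(4.0, 2.0, years_left))
```

With half a year left the extra profit cannot exceed 1, so the investment is not recovered; with four years left it is, which is the pattern described for short and long planning horizons.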

3.2.4. Failure characteristics

If costs and revenues are kept constant, increasing the system
reliability increases the system expected value (see Saleh and
Marais [21] for a discussion of why the cost of additional reliability
may not always be recouped). The optimal maintenance strategy
also changes in interesting ways, as shown in Fig. 13. When the
probability of failure is low and does not increase significantly as
the system ages (i.e., low values of the two gamma parameters), repair is
relatively more attractive, since the benefits of increased reliability
offered by replacement are small. As the gamma parameters
are increased, the probability that the system will fail increases
as the system ages (as does its variance). In this case, replacement
becomes more valuable since it returns the system to a lower
virtual age and hence lower probability of failure.
The optimal decision determined using the value approach is
significantly different from that obtained when revenue is not
considered, that is, when cost is minimized, as shown in Fig. 14.
For both the low and high failure probability scenarios, ignoring
revenues results in fewer replacements. When revenue is not
considered, only the reduced failure probability and operating cost
benefits offered by newer systems are considered. Therefore the
benefits of new systems are underestimated, resulting in a value
sub-optimal strategy, as discussed earlier.
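How failure characteristics shape the age dependence of failure probability can be sketched numerically. The example below assumes, purely for illustration, that the two gamma parameters are the shape and rate of a gamma-distributed lifetime, and it restricts the shape to integers (the Erlang case) so that the survival function has a closed form; the paper's exact parameterization may differ.

```python
import math


def erlang_survival(t, shape, rate):
    """P(lifetime > t) for a gamma lifetime with integer shape (Erlang)."""
    x = rate * t
    return math.exp(-x) * sum(x ** n / math.factorial(n) for n in range(shape))


def step_failure_probability(age, shape, rate, dt=1 / 30):
    """Probability of failing within the next time step, given that the
    system has survived to virtual age `age`."""
    return 1.0 - (erlang_survival(age + dt, shape, rate)
                  / erlang_survival(age, shape, rate))


for shape, rate in ((1, 1.0), (10, 10.0)):
    probs = [step_failure_probability(age, shape, rate) for age in (0.5, 1.0, 2.0)]
    print(shape, rate, [round(p, 4) for p in probs])
```

Under this reading, setting both parameters to 1 gives a failure probability that does not change with age, so resetting the virtual age buys no reliability, while setting both to 10 gives a failure probability that climbs with age, which favors replacement.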

3.2.5. Repair level


The repair level indicates how much the condition of the
system is improved by a repair. How does the optimal maintenance strategy change as repair is made more or less extensive?
Recall that by convention low values of the repair level parameter mean
that the repair level is high (reduction in virtual age is large), and
high values mean that the repair level is low. As the parameter is decreased, the
repair approaches perfect maintenance. When it equals 0, repair and
replacement are equivalent.
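The convention above can be sketched as a virtual age update. The multiplicative form below, in which a repair scales the virtual age at failure by the repair level parameter, is an assumption consistent with the two limiting cases described in the text; the function names and the upper bound of 1 (no improvement) are illustrative choices.

```python
def virtual_age_after_repair(age_at_failure, repair_level):
    """Virtual age after a repair: 0 restores the system to new, 1 leaves
    the virtual age unchanged (assumed multiplicative form)."""
    if not 0.0 <= repair_level <= 1.0:
        raise ValueError("repair_level must lie between 0 and 1")
    return repair_level * age_at_failure


def age_history(failure_gaps, repair_level):
    """Virtual age just after each repair, given the operating time
    between successive failures."""
    age, history = 0.0, []
    for gap in failure_gaps:
        age = virtual_age_after_repair(age + gap, repair_level)
        history.append(age)
    return history


for level in (0.0, 0.5, 1.0):
    print(level, age_history([1.0, 1.0, 1.0], level))
```

With a repair level of 0 every repair returns the system to virtual age zero, exactly as a replacement would, whereas with a level of 1 the virtual age simply accumulates.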


[Figure: four decision plots, for repair level parameter values of 0.98, 0.8, 0.4, and 0.2. Axes: Time [years], Virtual age [years]. Black = Repair, Gray = Replace.]
Fig. 15. Repair/replace decision for decreasing repair levels.

[Figure: two decision plots, for a repair level parameter of 0.98 with C1 = 4.75 and of 0.2 with C1 = 7. Axes: Time [years], Virtual age [years]. Black = Repair, Gray = Replace.]
Fig. 16. Repair/replace decision for decreasing repair levels with repair cost adjustment.

Fig. 15 shows the optimal repair/replace decisions for the
nominal system as the repair goes from near-minimal (repair level parameter of 0.98)
to near-perfect (repair level parameter of 0.2). Here the nominal constant repair costs are
assumed. When repair is near-minimal, it offers little benefit in
terms of reduced failure probability or operating profit.
Replacement is therefore more attractive and there is a clear
division between repair and replacement, as shown in the first
part of the figure. As repair becomes more extensive, it becomes
more like replacement and offers greater benefits in terms of
reduced failure probability or operating profit. Repair therefore


becomes more attractive, and the division between repair and
replacement becomes less defined. Stated differently, for low values of the repair level parameter the
selection of repair/replace is quite sensitive to time and virtual age;
this sensitivity occurs because high repair levels have similar
effects on reliability and operating profit to replacement. Finally,
when repair is near-perfect, there is little distinction between
repair and replacement.
Note that this trend persists even when the repair cost is
increased as the repair becomes more extensive, as shown in
Fig. 16 for repair level parameters of 0.98 and 0.2.
This section has explored the impact on the optimal maintenance policy of varying the replacement and repair costs; the
operating costs and revenues; the time horizon; the failure
characteristics; and the repair level.

4. Conclusion
In previous work we have argued that while maintenance is
traditionally seen as a cost-driver, this view is limited and ignores
the contribution of maintenance to the value of a system. This
paper shows how the value view of maintenance can be applied to
the familiar general repair problem. I used a semi-Markov decision
process coupled with discounted cash flow techniques to
estimate the net present value of a system under different
responses to failure, and then used a dynamic programming
approach to identify the optimal actions for any given time and
system condition, represented here by the system's virtual age.
The analysis showed that the value perspective results in
different decisions and that ignoring system revenue results in
value sub-optimal strategies that decrease the net value of the
system.
This approach provides a quantitative basis on which to base
maintenance decisions and thus ensure maximum expected
value. In particular, the results show:
1. How the optimal action for a given time and condition changes as
replacement and repair costs change, and the point at
which these costs become too high for profitable system
operation. This approach can therefore be used to identify
"lemon" designs that cannot be rescued through careful
maintenance.
2. The impact of planning horizon on the optimal action. For
shorter planning horizons it is better to repair, since there is no
time to reap the benefits of increased operating profit and
reliability. As the planning horizon grows, replacement
becomes more attractive.
3. The impact on the optimal maintenance policy of the system's
failure characteristics. In particular, it is better to replace
systems where the probability of failure increases rapidly with
deterioration. This approach can therefore be used to assess the
value of investing in higher reliability, either through inherently more reliable systems, or through preventive
maintenance.
4. The impact on the optimal maintenance policy of the repair
level. As the repair level is decreased, the relative value of
replacement increases, because lower repair results in lower
reliability gains. This approach can therefore be used to
determine the optimal repair level.
This work opens several interesting avenues for future work.
While virtual age and number of failures are useful proxies for the
condition of many systems (e.g., mileage and failure count for a
vehicle), it would also be useful to consider more direct measures
of system state such as those offered by condition monitoring
systems. Another extension here is to allow a range of failures as
defined by repair cost and time to occur in each state.
A second area for future work revolves around more detailed
modeling of the market and operating environment. Earlier I
alluded to the idea of using different scenarios to identify optimal
maintenance policies; another approach is to model the market
stochastically (cf. [5]). The effect of relaxing the assumption that
repair and replacement times are negligible should also be
investigated.
Finally, while this approach allows the optimal maintenance
policy to be determined ahead of time, it is computationally
intensive, and the problem dimension increases rapidly as the
planning horizon extends. Computationally efficient approaches to
solving the problem would be a fruitful avenue for future work.

Acknowledgments
This work was partially funded through a Purdue Research
Foundation Summer Faculty Fellowship.

References
[1] Beichelt F. A replacement policy based on limits for the repair cost rate. IEEE
Transactions on Reliability 1982;31(4):401-2.
[2] Dekker R. Applications of maintenance optimization models: a review
and analysis. Reliability Engineering and System Safety 1996;51:229-40.
[3] Drinkwater RW, Hastings NVJ. An economic replacement model. Operational
Research Quarterly 1967;18:121-38.
[4] Gardent P, Nonant L. Entretien et renouvellement d'un parc de machines.
Revue Française de Recherche Operationelle 1963;7:5-19.
[5] Hartman JC. An economic replacement model with probabilistic asset utilization.
IIE Transactions 2001;33(9):717-27.
[6] Hartman JC, Murphy A. Finite-horizon equipment replacement analysis. IIE
Transactions 2006;38(5):409-19.
[7] Huang Y, Guo X. First passage models for denumerable semi-Markov decision
processes with nonnegative discounted costs. Acta Mathematicae Applicatae
Sinica 2011;27:263-76.
[8] Kapur PK, Garg RB, Butani NL. Some replacement policies with minimal
repairs and repair cost limit. International Journal of Systems Science
1989;20(2):267-79.
[9] Kijima M, Morimura H, Suzuki Y. Periodical replacement-problem without
assuming minimal repair. European Journal of Operational Research
1988;37(2):194-203.
[10] Kijima M. Some results for repairable systems with general repair. Journal of
Applied Probability 1989;26(1):89-102.
[11] Doyen L, Gaudoin O. Classes of imperfect repair models based on
reduction of failure intensity or virtual age. Reliability Engineering and System
Safety 2004;84(1):45-56.
[12] Love CE, Zhang ZG, Zitron MA, Guo R. A discrete semi-Markov decision model
to determine the optimal repair/replacement policy under general repairs.
European Journal of Operational Research 2000;125(2):398-409.
[13] Makis V, Jardine AKS. A note on optimal replacement policy under general
repair. European Journal of Operational Research 1993;69:75-82.
[14] Mamer JW. Successive approximations for finite horizon semi-Markov decision
processes with application to asset liquidation. Operational Research
1986;34:638-644.
[15] Mane M, Crossley W. Probabilistic approach for selection of maintenance
facilities for air taxi operations, AIAA-2007-7786. In: Proceedings of the 7th
AIAA aviation technology, integration and operations conference. Belfast,
Northern Ireland, UK; September 18-20, 2007.
[16] Marais KB, Saleh JH. Beyond its cost, the value of maintenance: an analytical
framework for capturing its net present value. Reliability Engineering and
System Safety 2009;94(2):644-57.
[17] Martorell S, Sanchez A, Serradell V. Age-dependent reliability
model considering effects of maintenance and working conditions. Reliability
Engineering and System Safety 1999;64(1):19-31.
[18] Nguyen DG, Murthy DNP. A note of the repair limit replacement policy. Journal
of Operational Research Society 1980;31:1103-4.
[19] Pham H, Wang H. Imperfect maintenance. European Journal of
Operational Research 1996;94:425-38.
[20] Rosqvist T, Laakso K, Reunanen M. Value-driven maintenance planning for a
production plant. Reliability Engineering and System Safety
2009;94(1):97-110.
[21] Saleh JH, Marais KB. Reliability: how much is it worth? Beyond its estimation
or prediction, the (net) present value of reliability. Reliability Engineering and
System Safety 2006;91(6):665-73.
[22] Waeyenbergh G, Pintelon L. A framework for maintenance concept development.
International Journal of Production Economics 2002;77(3):299-313.
[23] Wang H. A survey of maintenance policies of deteriorating systems.
European Journal of Operational Research 2002;139:469-89.
