Article info

Article history: Received 17 July 2009; received in revised form 15 January 2010; accepted 19 January 2010; available online 29 January 2010.

Keywords: Safety critical; Failure on demand; Occupant safety; Petri nets; System availability; Fault detection; Airbag electronics; IEC 61508; ISO 26262

Abstract

This paper presents an application of stochastic Petri nets (SPN) to calculate the availability of safety critical on-demand systems. Traditional methods of estimating system reliability include standards-based or field return-based reliability prediction methods. These methods do not take into account the effect of fault-detection capability and penalize the addition of detection circuitry due to the higher parts count. Therefore, calculating system availability, which can be linked to the system's probability of failure on demand ($P_{fd}$), can be a better alternative to reliability prediction. The process of estimating the $P_{fd}$ of a safety system can be further complicated by the presence of system imperfections such as partial-fault detection by users and untimely or uncompleted repairs. Additionally, most system failures cannot be represented by Poisson process Markov chain methods, which are commonly utilized for the purposes of estimating $P_{fd}$, as these methods are not well-suited for the analysis of non-Poisson failures. This paper suggests a methodology and presents a case study of SPN modeling adequately handling most of the above problems. The model will be illustrated with a case study of an automotive airbag controller as an example of a safety critical on-demand system.

0951-8320/$ - see front matter © 2010 Elsevier Ltd. All rights reserved.
doi:10.1016/j.ress.2010.01.008
ARTICLE IN PRESS
A. Kleyner, V. Volovoi / Reliability Engineering and System Safety 95 (2010) 606–613
[…] provide increased protection to vehicle occupants under any condition.

In order to account for the fault detection and consequent repair of the system, the availability, or $(1 - P_{fd})$, should be evaluated instead of simple reliability. In addition, for safety-related systems, reliability requirements in product specifications are typically very high (0.9999 and higher), which would correspond to SIL 3–4 categories of IEC 61508 [1] or ASIL C–D of ISO 26262. Therefore, traditional reliability demonstration testing would be cost prohibitive due to the extremely large number of test samples required to demonstrate those kinds of numbers [5]. The only reasonable option to meet the specification would be to conduct a comprehensive modeling of the system availability. To this end, the use of stochastic Petri nets (SPN) can be suggested, as described next.

1.2. Reliability modeling using stochastic Petri nets

A graphical framework called Petri nets was introduced by C.A. Petri [6]. This framework focuses on modeling component states that comprise the system, so that the state of the system can be inferred from the states of its components. Possible states (called places) are denoted with circles, with the objects called tokens (denoted by small filled circles) occupying one of the places at a time. The combined position of all the tokens in a Petri net is referred to as the marking. Possible paths of token movements among places are modeled using so-called transitions, depicted as filled rectangles. Movements of the tokens correspond to the firing of transitions, where the tokens from all input places are removed, and tokens are deposited to all output places for this transition. Importantly, the firing of a transition can only occur when it is enabled, i.e., certain conditions are satisfied. For example, an inhibitor arc that connects a place anywhere in the model and a transition (the arc is depicted by using a hollow circle at the transition's end) can disable the transition if there is a sufficient number of tokens in the place (this place does not have to be either an input or an output place for this transition).

The original Petri net did not include the concept of time, so that enabled transitions fire immediately. Such Petri nets can be particularly useful in safety assessments, as formal methods are available to analyze the so-called reachability of undesirable (unsafe) states and identify non-trivial scenarios that can lead to unsafe states [7]. These scenarios are of great importance for safety and reliability, as they are analogous to cut sets in fault trees in the dynamic context where the order of events is taken into account. However, the likelihood of those scenarios cannot be quantitatively evaluated without an explicit account of timing. To this end, an extension called stochastic Petri nets (SPN) was developed some years later [8]; it is a subset of so-called non-autonomous Petri nets [9] and is of particular relevance to the modeling of time-dependent system reliability (see, for example, [10,11]). SPN introduces delays between the enabling and firing of a transition; these delays are attributes of the transition and can be either absent, deterministic, or sampled from a given distribution (stochastic). With exponential distributions for the firing delays, an SPN provides a model equivalent to the Markov representation. SPN is often used as a modeling preprocessor: the model is internally converted to a Markov state space and solved using standard Markov methods [12]. Alternatively, a discrete event (e.g., Monte Carlo) simulation can be used to solve the SPN directly [13]; unlike the Markov method, this allows the use of non-exponential statistical distributions. Depending on the configuration of the system, the error due to the use of an exponential delay with the same mean value in place of a non-exponential distribution can be quite significant [14,15].

If, as in colored Petri nets [16], tokens can have unique identities (labels), an alternative interpretation of firing facilitates the preservation of the information about the system's past states: rather than considering removing a token from the transition's input place and depositing a different token to the output place as two disjoint actions, one can unite these two actions into a single action of moving the same token from an input place to the output place. Memory can be assigned to tokens, which results in aging tokens [17]. Such tokens can move freely throughout the Petri net without losing their memory.

While the proliferation of a great variety of versions and modeling styles used in SPN modeling can be construed as a testimony to its popularity and flexibility, it also creates confusion among reliability practitioners who are used to relatively rigid and standardized frameworks such as reliability block diagrams and fault trees. In this context, clarity and simplicity of modeling are of great importance [18], and the reader is invited to compare the models presented in this paper to the previously published models that address a similar application [19].

2. Model formulation

This section will present several modeling scenarios for the reliability/availability of an automotive occupant safety system modeled as an on-demand emergency system.

2.1. Modeling procedure

The function of the automotive occupant safety system is to provide an emergency function (e.g., airbag deployment) in response to an event such as a vehicle crash. The simplified version of an emergency on-demand system in Fig. 1 consists of the fault-detection system, power supply and user warning system, which can be as simple as a warning light.

The system's failure to perform its functions (i.e., to deploy an airbag in the case of a vehicle crash) can occur when the emergency system failure is combined with one of the following conditions:

1. System failure is not detected by the fault-detection system.
2. System detected the problem, but failed to notify the user.
3. System notified the user, but the user failed to take reparative action.
4. The repair was scheduled or initiated, but was not completed before the vehicle crash.

Fig. 1. Simplified diagram of an automotive occupant safety system with fault detection.
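The Petri net vocabulary from Section 1.2 (places, tokens, marking, enabling, firing, inhibitor arcs) can be made concrete with a minimal untimed sketch. Everything in it is illustrative: the place and transition names are hypothetical, every arc carries a single token, and an inhibitor arc is assumed to block its transition as soon as the inhibiting place holds one token.

```python
from dataclasses import dataclass, field


@dataclass
class Transition:
    name: str
    inputs: list      # places that each lose one token when the transition fires
    outputs: list     # places that each gain one token when the transition fires
    inhibitors: list = field(default_factory=list)  # marked places that block firing


def is_enabled(marking, t):
    """Enabled when every input place holds a token and no inhibitor place does."""
    has_inputs = all(marking.get(p, 0) >= 1 for p in t.inputs)
    blocked = any(marking.get(p, 0) >= 1 for p in t.inhibitors)
    return has_inputs and not blocked


def fire(marking, t):
    """Return the marking after firing t: input tokens removed, output tokens added."""
    if not is_enabled(marking, t):
        raise ValueError(f"transition {t.name!r} is not enabled")
    new = dict(marking)
    for p in t.inputs:
        new[p] -= 1
    for p in t.outputs:
        new[p] = new.get(p, 0) + 1
    return new


# Hypothetical net: one token for the main system, one for the detector.
marking = {"system_ok": 1, "detector_ok": 1}
fail_system = Transition("fail_system", ["system_ok"], ["system_failed"])
fail_detector = Transition("fail_detector", ["detector_ok"], ["detector_failed"])
start_repair = Transition("start_repair", ["system_failed"], ["ready_to_repair"],
                          inhibitors=["detector_failed"])

after_failure = fire(marking, fail_system)
repair_possible = is_enabled(after_failure, start_repair)    # detector still works
undetected = fire(after_failure, fail_detector)
repair_blocked = not is_enabled(undetected, start_repair)    # inhibitor arc disables it
```

The marking is a plain dictionary from place name to token count, so the state of the whole system is never enumerated explicitly. Adding firing delays to each transition would turn this untimed net into a stochastic one.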
Condition 4 may occur because it takes a certain amount of time for the system to be repaired. In exponential form this feature is expressed through the repair rate $\mu$, the reciprocal of the mean time to repair (MTTR) [5].

It is important to note that most vehicles remain in use for extended time periods, often exceeding 10–15 years. It is also noteworthy that the human factor is involved in the key decision-making and repair processes. Therefore, the modeling of the safety system is further complicated by the list of factors classified as imperfections below:

1. Detectability of the system failure is less than 100%.
2. Emergency system power supply (e.g., vehicle battery) may fail during the crash.
3. Warning light can go unnoticed after the fault is detected (the human factor).
4. Repairs may not be initiated due to financial, timing, or other considerations. As an example, vehicle age and market value become important factors in the repair decision, where the percent of repaired vehicles diminishes as vehicle age increases and market value decreases.

Accounting for the factors above makes the modeling more challenging, but also more realistic.

2.2. System availability

If no repair of the safety system is considered, the reliability of the system by the end of the design life is $R(T)$. When system failures follow an exponential distribution, they can be represented by the constant failure rate $\lambda$. If all failures are detected and repaired with a mean time to repair MTTR $= 1/\mu$, where $\mu$ is the repair rate [20], the unavailability (the probability that the system will not work on demand) can be well approximated by a steady-state solution, providing the following estimate:

$$P_{fd}^{s} = \frac{\lambda}{\lambda + \mu} \qquad (1)$$

where $P_{fd}^{s}$ is the probability of failure on demand (fd), the steady-state solution (s).

However, taking into account the considerations listed in Section 2.1, Eq. (1) might represent the system unavailability inaccurately. Because both detection and repairs take place less than 100% of the time, the real time-dependent probability of failure on demand $P_{fd}(t)$ will lie somewhere between this lower boundary and the unreliability, at the end of its life, of a system that never undergoes repairs:

$$P_{fd}^{s} \le P_{fd}(t) \le P_{fd}^{nr}(t) = 1 - R(t) \qquad (2)$$

where $P_{fd}^{nr}$ is the probability of failure on demand for the system which does not undergo repairs, and $R(t)$ is the reliability of the system under the no-repair condition (conventional reliability function). Importantly, the bounds in (2) are quite wide, which provides motivation for a more refined analysis.

Due to the factors listed as imperfections in Section 2.1, a certain percent of the vehicle population will not be subject to repair after the fault has been detected. In the simplest, static scenario, the total population of the system can be separated into two subpopulations based on whether a detected failure will be repaired or not. Let us define $\theta$ as the percent of the population of the vehicles subject to repair. This percent would include the drivers responding to the warning light as opposed to those who would ignore or not notice it. Next, we can consider the combined effects of failure of the detection system and the presence of a subpopulation that does not notice the warning or fails to act upon it. Consequently, a $(1 - \theta)$ fraction of systems will not be repaired after fault detection. Therefore, the overall probability of failure on demand under a perfect detection scenario $P_{fd}^{pd}$ is easily calculated as

$$P_{fd}^{pd}(t) = (1 - \theta)\,(1 - R(t)) + \theta\, P_{fd}^{s}(t) \qquad (3)$$

where $P_{fd}^{pd}$ is the probability of failure on demand when all the failures are detected (perfect detection), and $1 - \theta$ is the portion of the population which would not repair the failed system due to economic reasons or a failure to notice the warning light.

It is important to note that in certain cases, the function $R(t)$ can be represented by a mixture of statistical distributions to reflect the change in failure rate, for example in accordance with the bathtub curve (see for example [21]). In those cases $R(t)$ should be addressed accordingly in the modeling process.

2.3. Dynamic modeling

Rather than separating the whole population into two subgroups, let us assume that the decision as to whether to repair a detected failure is made every time the warning signal appears, with the probabilities $\theta$ and $1 - \theta$, respectively. This decision is considered to be independent of previous repair decisions for this system (e.g., the system that has been repaired the first time might not be repaired the second time). In addition, for the moment we consider that $\theta$ does not depend on time (this assumption will be relaxed later). While the difference is subtle, it results in a need for dynamic (i.e., state-space based) reliability modeling.

To this end, Markov analysis is widely used in modeling electronics reliability [12], but it has two well-recognized deficiencies. The first deficiency is related to the large number of possible system states (on the order of $k^n$, where $k$ is the number of possible states for each component and $n$ is the number of such components) that are needed to represent all possible permutations. Although this issue can be mitigated by the use of symmetry and hierarchical (nested) calculations, it remains an important limitation. The second limitation is the natural use of constant transition rates (following an exponential distribution) due to the Markovian property.

To illustrate the dynamic solution, let us present the system described in Section 2.2 as a state-space solution (Fig. 2), i.e., a Markov chain.

Fig. 2. Markov chain for a system with imperfect detection ($\theta$: portion of the vehicles subject to repair).

Initially the system is in state A, which corresponds to a fully operational safety system with the detection system functioning as intended. Transitioning from state A to state B indicates that
the main function has failed with the corresponding transition rate $\lambda$, but the detection system is still operational, and hence the driver receives a warning. On the other hand, if the detection sub-system fails first (with the corresponding transition rate $\nu$), the system transitions to state D (note that the detection sub-system is considered to be non-repairable; this assumption can be relaxed, e.g., if periodic inspections are introduced). The transition from D to E corresponds to the failure of the main function of the safety system after the detection system has failed, so this failure cannot be detected, and therefore is not repaired; hence there is no reverse transition from E to D. Once the system transitions into state B, a decision is made whether to repair the system or not with the probabilities $\theta$ and $1 - \theta$, respectively. This decision is modeled using a fictitious transition rate $\psi$ that is very large (a specific value of $\psi$ is immaterial as long as it is several orders of magnitude larger than the other transition rates in the model). Assigning a transition rate $\psi\theta$ from state B to state C (ready to repair) and the rate $\psi(1 - \theta)$ from state B to state E (non-repairable system failure) ensures that B is a vanishing (transitional) state. A choice between repair (state C) and non-repair (state E) occurs with the desired probabilities.

Stochastic Petri nets are capable of addressing the main shortcomings of Markov chains. As mentioned in the Introduction, SPN focuses on modeling component states that comprise the system, so that the state of the system can be inferred from the states of its components rather than defined explicitly as required by a Markov state space. Places in SPN are similar to Markov states, but SPN tokens can represent individual components of the system and therefore allow differentiation among the state spaces for those components. As a result, the marking (i.e., the combined position of all the tokens) provides a means to describe the system as a whole implicitly, without the need to explicitly depict the corresponding system state, thus potentially mitigating the state-space explosion. Effective system modeling using SPN involves its decomposition into a set of relevant entities, where each entity does not necessarily represent a physical component of the system, but describes a phase of operation, or environ[…]

[…] system is operating normally (the token is in the "Det system OK" place) when the main system fails (the token moves to the "System failed" place), then two transitions are enabled at the same time (to "Ready to repair" and "No repair"). Just like in the Markov model (see Fig. 2), the decision is modeled by assigning those two transitions exponential rates $\psi\theta$ and $\psi(1 - \theta)$, respectively. However, when the detection system fails first, the corresponding token moves to "Det Sys Failed" and the inhibitor arc originating in this place prevents the transition of the system token to the "Ready to repair" place (the transition becomes disabled).

Fig. 3. SPN model for a system with imperfect detection and with a portion of the vehicles subject to repair.

2.4. Modeling time-dependent parameters of the problem related to vehicle aging

In the real world, owners' repair priorities often change with the vehicle age. With declining vehicle market value, the number of repairs considered by owners as non-essential increases. Since the cost of repair remains virtually the same while the vehicle value declines, the number of owners who choose to ignore warning lights steadily increases with vehicle age when the problem is considered non-critical to vehicle performance.

In order to model that phenomenon we will introduce here the renewal attrition function as a ratio of the number of repairs to the number of failures:

$$r(t) = \frac{\#\,\text{Parts Repaired}\,(t)}{\#\,\text{Parts Failed}\,(t)} \qquad (4)$$

The typical renewal attrition function will have the shape presented in Fig. 4. Here $T_{Life}$ is the expected vehicle life (e.g., 10 years, 15 years, etc.) and $T_W$ is the warranty term duration (e.g., 3 years, 5 years, etc.). The assumption is made that while the vehicle is under warranty all the required repairs will be performed:

$$r(t) = \begin{cases} 1.0 & \text{when } t \le T_W \\ f(t) & \text{when } T_W < t \le T_{Life} \end{cases} \qquad (5)$$

Fig. 4. Renewal attrition function.
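A piecewise function of the form of Eq. (5) is straightforward to encode. The sketch below is only an illustration: the post-warranty branch $f(t)$ is assumed here to be an exponential decay that starts at 1 at the end of the warranty, and the decay constant, warranty term and vehicle life are hypothetical values, not data from the case study.

```python
import math


def renewal_attrition(t, t_warranty=3.0, t_life=15.0, decay=0.1):
    """Fraction of detected failures that get repaired at vehicle age t (years).

    r(t) = 1 while the vehicle is under warranty. Afterwards an assumed
    exponential decay f(t) = exp(-decay * (t - t_warranty)) is used, which
    keeps r continuous at t = t_warranty. The decay constant is hypothetical.
    """
    if not 0.0 <= t <= t_life:
        raise ValueError("t must lie within the vehicle life")
    if t <= t_warranty:
        return 1.0
    return math.exp(-decay * (t - t_warranty))


ages = [0, 3, 5, 10, 15]
profile = [round(renewal_attrition(a), 3) for a in ages]
```

In a simulation-based SPN, a function like this could supply the repair probability at the moment the warning appears, in place of the constant $\theta$ used in Section 2.3.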
3. Case study: automotive occupant safety system

The concept of an emergency, on-demand system is utilized in automotive safety systems and particularly in the design of an airbag controller unit. The original data in this case study has been modified to protect the proprietary nature of this information.

The modern airbag controller is a complicated electronic system containing crash sensors capable of detecting various types of crashes (e.g., side impact vs. front collision) and 4–24 firing loops. The number of firing loops depends on the occupant safety options of the vehicle such as driver and passenger airbags, side curtains, rear passenger protection, belt pretensioners and dual-phase deployment. On-time deployment triggered by the vehicle crash is a safety-critical feature of a controller [22]; therefore system reliability requirements are high and, depending on a specific automotive customer, could range from 0.9999 to 0.999999. Each modern airbag controller is equipped with a fault-detection circuit that detects a system failure and triggers a warning such as a light indicator to alert the driver. The subsequent action may be either to repair or replace the faulty component [19]. Conversely, the vehicle owner may not act on the warning due to either inability to heed the warning in a timely manner or a conscious decision not to repair the system for financial or other reasons.

In order to obtain a renewal attrition function for an airbag controller, an analysis of warehouse shipping history for this product was conducted (the details of this method are outside the scope of this paper). The following function was obtained:

$$r(t) = \begin{cases} 1.0 & \text{when } t \le T_W \\ A\,e^{-Bt} & \text{when } T_W < t \le T_{Life} \end{cases} \qquad (9)$$

[…]

$$P_{fd}^{s} = \frac{\lambda}{\lambda + \mu} = 1.31 \times 10^{-4} \qquad (10)$$

In reality both detection and repairs take place less than 100% of the time, so this probability will be somewhere between this lower boundary and the unreliability, at the end of its life, of a system that never undergoes repairs, per (2):

$$P_{fd}^{s} = 1.31 \times 10^{-4} < P_{fd} < 1 - R(15\ \text{years}) = 0.05 \qquad (11)$$

In order to model the imperfections listed in Section 2.1, let us assume $\theta = 0.98$, meaning that 98% of the population will decide to repair the faulty system. Fig. 7 shows a comparison of dynamic and static scenarios for this value. One can observe that the dynamic scenario shows a slightly higher probability of failure. Qualitatively this can be explained by the fact that two populations are separated at the beginning (static scenario); the sub-group that makes the repairs is less likely to fail and therefore the effective, dynamic fraction of this population will be slightly higher than $\theta = 0.98$. The effect is minimal for the presented values and therefore can be neglected.

Please note that the results shown in Fig. 7 for the dynamic model are presented using both Markov chains and SPN (obtained using 100 million Monte Carlo runs). The result for the scenario where all detected failures are repaired is also provided for reference purposes.

In some instances, the difference between the static and dynamic models can be more significant. For the case where demonstrated reliability $R(15\ \text{years}) = 0.5$ and $\theta = 0.5$ (a hypothetical scenario), the impact will be quite noticeable (see Fig. 8).

Please note that while the Markov model provides the description of the system as a whole, SPN focuses on system component behavior. If constant transition rates are used, the results should be identical (see for example Fig. 7, where the results by SPN and Markov chains are practically indistinguishable for dynamic modeling). However, it is quite difficult to directly implement in the Markov chain model the static subdivision into two populations to provide a model analogous to the one given in Fig. 5.

Another important advantage of SPN is its ability to model transitions that have variable rates when simulation is used to […]

Fig. 7. Comparison of several approaches to account for the fact that only a $\theta = 0.98$ fraction of all detected failures are repaired: static (when two populations are separated in the beginning) and dynamic (when the decision is made upon demand).
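The static and dynamic scenarios compared in Fig. 7 can be explored with a short numerical sketch. It is not the authors' model: it assumes exponential failures with $R(15\ \text{years}) = 0.95$ (the value implied by Eq. (11)), a hypothetical MTTR of 14 days (with this choice Eq. (1) evaluates to about $1.31 \times 10^{-4}$), perfect detection ($\nu = 0$), and the vanishing state B of Fig. 2 eliminated by letting the fictitious rate $\psi$ go to infinity. The static value follows Eq. (3), and the dynamic value comes from integrating the reduced Markov chain.

```python
import math

T_LIFE = 15.0                       # years, design life used in Eq. (11)
R_LIFE = 0.95                       # R(15 years) implied by Eq. (11)
THETA = 0.98                        # fraction of detected failures that get repaired
MTTR = 14.0 / 365.0                 # years; hypothetical repair time, not from the source

lam = -math.log(R_LIFE) / T_LIFE    # constant failure rate (exponential assumption)
mu = 1.0 / MTTR                     # repair rate

p_steady = lam / (lam + mu)                                   # Eq. (1)
p_static = (1 - THETA) * (1 - R_LIFE) + THETA * p_steady      # Eq. (3) at t = T_LIFE


def derivatives(state):
    """Fig. 2 chain with perfect detection and the vanishing state B eliminated."""
    a, c, e = state                 # A: operational, C: awaiting repair, E: failed for good
    return (-lam * a + mu * c,
            lam * THETA * a - mu * c,
            lam * (1 - THETA) * a)


def integrate(t_end, dt=0.001):
    """Classical fourth-order Runge-Kutta integration, starting in state A at t = 0."""
    state = (1.0, 0.0, 0.0)
    for _ in range(round(t_end / dt)):
        k1 = derivatives(state)
        k2 = derivatives(tuple(s + 0.5 * dt * k for s, k in zip(state, k1)))
        k3 = derivatives(tuple(s + 0.5 * dt * k for s, k in zip(state, k2)))
        k4 = derivatives(tuple(s + dt * k for s, k in zip(state, k3)))
        state = tuple(s + dt / 6.0 * (p + 2 * q + 2 * r + w)
                      for s, p, q, r, w in zip(state, k1, k2, k3, k4))
    return state


a, c, e = integrate(T_LIFE)
p_dynamic = c + e                   # the system is unavailable in states C and E
```

Under these assumptions the dynamic value comes out slightly above the static one, and both sit between the bounds of Eq. (11), which agrees qualitatively with the discussion of Fig. 7. Rerunning with `R_LIFE = 0.5` and `THETA = 0.5` gives a rough feel for the hypothetical scenario of Fig. 8.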
[…] values are: $\bar{P}_{fd} = 8.12 \times 10^{-4}$, $6.33 \times 10^{-4}$, and $5.28 \times 10^{-4}$, for $\beta = 1$ (exponential), 2, and 3, respectively.

Next, let us consider the effect of attrition and focus first on the exponential failures (see Fig. 10).

Note that for the first 3 years there is no difference in the results, since $r(t) = 1$ within the warranty period, per Eq. (9). Finally, let us observe how changing the assumptions about the failures impacts the probability of failure on demand. Using the same assumptions as above for the models without attrition, we can observe (see Fig. 11) that the negative impact of attrition increases with the value of the shape parameter $\beta$. Indeed, even the average value of the probability of failure on demand will not always decrease; the corresponding values are: $\bar{P}_{fd} = 2.54 \times 10^{-3}$, $2.68 \times 10^{-3}$, and $2.51 \times 10^{-3}$, for $\beta = 1$ (exponential), 2, and 3, respectively.
In the cases where power source survival (vehicle battery) is a design concern (see Fig. 1), its effect on the model can be easily accounted for by multiplying the probability of successful airbag deployment $(1 - P_{fd})$ by the probability of battery survival during the crash.
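Written out, with $R_{batt}$ denoting the probability that the battery survives the crash (a symbol introduced here only for illustration) and assuming that battery survival is independent of the controller state:

$$P_{deploy} = (1 - P_{fd})\, R_{batt}, \qquad P_{fd}^{total} = 1 - (1 - P_{fd})\, R_{batt}$$

For example, taking $P_{fd} = 2.54 \times 10^{-3}$ and a purely hypothetical $R_{batt} = 0.999$ gives $P_{deploy} = 0.99746 \times 0.999 \approx 0.99646$, i.e., $P_{fd}^{total} \approx 3.54 \times 10^{-3}$.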
Fig. 9. Comparing probability of failure on demand as a function of time for different failure models that provide the same reliability if no repairs take place. Weibull with shape parameters $\beta = 1$ (exponential), 2, and 3.
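Failure models that "provide the same reliability if no repairs take place" can be constructed by fixing the end-of-life reliability and solving for the Weibull scale parameter at each shape value. The sketch below assumes a two-parameter Weibull, $R(t) = \exp(-(t/\eta)^{\beta})$, with $R(15\ \text{years}) = 0.95$ as implied by Eq. (11); the function names are illustrative.

```python
import math

T_LIFE = 15.0      # years
R_LIFE = 0.95      # assumed no-repair reliability at end of life, as implied by Eq. (11)


def weibull_scale(beta, t=T_LIFE, r=R_LIFE):
    """Scale eta such that exp(-(t / eta) ** beta) equals r."""
    return t / (-math.log(r)) ** (1.0 / beta)


def weibull_reliability(t, beta, eta):
    """Two-parameter Weibull reliability function R(t) = exp(-(t / eta) ** beta)."""
    return math.exp(-((t / eta) ** beta))


scales = {beta: weibull_scale(beta) for beta in (1, 2, 3)}
mid_life = {beta: weibull_reliability(7.5, beta, eta) for beta, eta in scales.items()}
```

All three models reach the same reliability at 15 years, but a larger $\beta$ pushes failures toward the end of life, so the mid-life reliability in `mid_life` increases with $\beta$.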
4. Conclusions
This method can also easily accommodate time-dependent input variables, such as system age, which in turn may affect the renewal rate of the system. To add flexibility, the SPN method can be effectively combined with traditional reliability analysis techniques, such as Markov chains, standards-based reliability prediction, block diagrams, Weibull analysis, Monte Carlo simulation, etc. In summary, this method provides an efficient synthesis of a practical engineering approach with the academic rigor of modern stochastic simulation techniques.

References

[1] IEC 61508: Functional safety of electrical/electronic/programmable electronic safety related systems, 1998–2000.
[2] Foucher B, Boullie J, Meslet B, Das D. A review of reliability prediction methods for electronic devices. Microelectronics Reliability 2002;42:1155–62.
[3] Kleyner A, Volovoi V. Reliability prediction using Petri nets for on-demand safety systems with fault detection. In: Martorell S, Guedes Soares C, Barnett J, editors. Safety and reliability and risk analysis. Taylor and Francis; 2008. p. 1961–8.
[4] Product Information CG989 8-Loop Firing IC CG989 by Bosch (2006) <http://www.semiconductors.bosch.de/pdf/CG989_Product_Info.pdf>.
[5] Kleyner A. Reliability demonstration: theory and application. In: Reliability and maintainability symposium (RAMS) Tutorials CD, January 2008.
[6] Petri CA. Kommunikation mit Automaten. PhD thesis, Institut für Instrumentelle Mathematik, Schriften des IIM, 1962.
[7] Sadou N, Demmou H. Reliability analysis of discrete event dynamic systems with Petri nets. Reliability Engineering and System Safety 2009;94:1848–61.
[8] Symons FJW. Modelling and analysis of communication protocols using numerical Petri nets. PhD thesis, Department of Electrical Engineering Science, University of Essex, Essex, England, 1978.
[9] David R, Alla H. Discrete, continuous, and hybrid Petri nets. Berlin, Heidelberg: Springer; 2005.
[10] Chew SP, Dunnett SJ, Andrews JD. Phased mission modeling of systems with maintenance-free operating periods using simulated Petri nets. Reliability Engineering and System Safety 2008;93:980–94.
[11] Clavereau J, Labeau P-E. A Petri net-based modelling of replacement strategies under technological obsolescence. Reliability Engineering and System Safety 2009;94:357–69.
[12] Trivedi SK. Probability and statistics with reliability, queuing and computer science applications, 2nd ed. John Wiley and Sons; 2002.
[13] Dutuit Y, Chatelet E, Signoret J-P, Thomas P. Dependability modeling and evaluation by using stochastic Petri nets: application to two test cases. Reliability Engineering and System Safety 1997;55:117–24.
[14] Faria JA, Matos MA. An analytical methodology for the dependability evaluation of non-Markovian systems with multiple components. Reliability Engineering and System Safety 2001;74(2):193–210.
[15] Khouas A, Derieux A. FDP: fault detection probability function for analog circuits. In: The 2001 IEEE international symposium on circuits and systems, ISCAS 2001, 6–9 May 2001, vol. 4. p. 17–20.
[16] Jensen K. Coloured Petri nets. Basic concepts, analysis methods and practical use, vol. 1. Berlin: Springer; 1993.
[17] Volovoi VV. Modeling of system reliability using Petri nets with aging tokens. Reliability Engineering and System Safety 2004;84(2):149–61.
[18] Schneeweiss WG. Tutorial: Petri nets as a graphical description medium for many reliability scenarios. IEEE Transactions on Reliability 2001;50(2):159–64.
[19] Yang SK, Liu TS. Failure analysis for an airbag inflator by Petri nets. Quality and Reliability Engineering International 1997;13:139–51.
[20] O'Connor P. Practical reliability engineering, 4th ed. Wiley; 2003.
[21] Kleyner A, Sandborn P. A warranty forecasting model based on piecewise statistical distributions and stochastic simulation. Reliability Engineering and System Safety 2005;88:207–14.
[22] Teng S-H, Ho S-Y. Reliability analysis for the design of an inflator. Quality and Reliability Engineering International 1995;11:203–14.