
1

The next 2 slides cover some of the terminology you will encounter in this topic.
Stochastic means involving or subject to probabilistic behaviour; that is, the
behaviour cannot be predicted exactly. A stochastic model contrasts with a
deterministic model, where the relationships between the variables affecting the
system behaviour are known and defined.
Component failures represented as a point process tell us when on a time line the
failure events occurred; for a single component, the times between the failures
give us the times to failure for the component. To model these times with a
probability distribution the failures must be independent and identically distributed.

System failures represented as a point process tell us when on the time line the
failure events occurred, but tell us nothing about the times to failure of the individual
components causing the system failures.
If a system with a large number of components is in a steady state, it can be modelled
using a homogeneous Poisson process. Homogeneous in this context means that the
expected number of failures in any interval of a given length is constant, and the
distribution of the time between failures is exponential.
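As a minimal sketch of what "homogeneous" implies (the rate below is assumed, not from the notes), the snippet simulates such a process by drawing exponentially distributed times between failures and checks that a fixed-width interval sees about the same number of failures wherever it is placed:

```python
# A homogeneous Poisson process has exponentially distributed times between
# failures, so the expected number of failures in any interval of width w
# is rate * w, wherever the interval sits on the time line.
import numpy as np

rng = np.random.default_rng(seed=1)   # fixed seed for repeatability
rate = 0.02                           # assumed: 0.02 failures per day
n_failures = 1000

tbf = rng.exponential(scale=1.0 / rate, size=n_failures)  # times between failures
arrivals = np.cumsum(tbf)                                 # failure times on the time line

w = 100.0  # interval width in days
counts = [int(np.sum((arrivals >= t0) & (arrivals < t0 + w)))
          for t0 in (0.0, 10_000.0, 40_000.0)]
print(counts, "expected about", rate * w, "failures per interval")
```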

The close conjunction of failures might be a chance phenomenon, but the
independence of failures occurring in quick succession needs to be ascertained to gain a
better appreciation of the system or component reliability.
Independent means the failures are not related: one failure does not contribute to
the other, and they do not share a common cause.
I have given some examples here of situations where failures may not be
independent.


Identically distributed means that all components drawn from the population have
the same probability of surviving any given period. This must hold for components
concurrently in operation or components placed in operation sequentially over a
period of time.
I have listed some situations in which component lives may not be identically
distributed.
Note that although a trend is not helpful for the analysis we are doing, in most
situations we would like components to become increasingly reliable. In fact, as a
maintenance engineer you probably spend a lot of your time trying to establish
positive trends in component and system reliability.

This slide shows a diagrammatic representation of a stochastic point process for a
system that comprises 3 components. The failure of any of the components causes a
system failure.
If the component failures are independent renewal processes the point process for
the system is termed a superimposed renewal process.
The average rate of occurrence of failure (ROCOF) for each component is the inverse
of that component's mean life (MTTF).
The average rate of occurrence of failure for the system is the sum of the average
ROCOF of the components.
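A small illustration of those two statements, using made-up MTTFs for three components (the slide itself does not give numeric values):

```python
# Component average ROCOF = 1/MTTF; for a superimposed renewal process the
# system average ROCOF is the sum of the component rates.
mttf = [120.0, 200.0, 350.0]   # assumed mean lives in days

component_rocof = [1.0 / m for m in mttf]   # average ROCOF per component
system_rocof = sum(component_rocof)         # superimposed process: rates add

print(component_rocof)     # ~[0.0083, 0.0050, 0.0029] failures/day
print(system_rocof)        # ~0.0162 failures/day for the system
print(1.0 / system_rocof)  # ~62 days mean time between system failures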

The data in the plot were derived from a simulation of a system with 1000
components in series. The components fail by wear-out mechanisms, with the
individual component lives normally distributed.
The system is not preventively maintained. Components are replaced on failure. The
system commences operation with all components in new condition.
The plot shows that the ROCOF increases from zero to reach a constant average rate
after about 150 days.
The apparent initial deteriorating trend is characteristic of a superimposed renewal
system operated from new and is not indicative of reliability problems with the
system components.
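The original simulation parameters are not given in these notes, so the sketch below uses assumed values (normally distributed lives with a mean of 100 days and standard deviation of 25 days) to reproduce the qualitative behaviour described: a ROCOF that rises from zero and settles near a constant rate once renewals desynchronise.

```python
# A hedged reconstruction of the simulation described above: 1000 components
# in series, normally distributed lives, replacement on failure, all
# components new at time zero. Parameters are assumed.
import numpy as np

rng = np.random.default_rng(seed=2)
n_components, horizon = 1000, 600     # 600 days of operation (assumed)
mean_life, sd_life = 100.0, 25.0      # assumed wear-out life parameters

next_failure = rng.normal(mean_life, sd_life, n_components).clip(min=1.0)
failures_per_day = np.zeros(horizon)

for day in range(horizon):
    due = next_failure < day + 1      # components failing during this day
    failures_per_day[day] = due.sum()
    # replace on failure: schedule each replacement one fresh life later
    next_failure[due] += rng.normal(mean_life, sd_life, int(due.sum())).clip(min=1.0)

# Early on the ROCOF rises from ~0; once renewals desynchronise it settles
# near the asymptotic rate n_components / mean_life = 10 failures per day.
print(failures_per_day[:30].mean(), failures_per_day[300:].mean())
```

Comparing the early and late averages printed at the end reproduces the apparent initial deteriorating trend followed by a constant average rate.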

10

This plot magnifies the apparent initial deteriorating trend from the previous plot by
showing the first 300 failures only.
The regression line of best fit shown is a power function of the operating time.
It can be seen more clearly here that the average ROCOF in this time period is not
constant (if it were, the line of best fit would be a straight line).
The function for the ROCOF is obtained by differentiating the power function for the
cumulative number of failures.
The failure arrivals form a non-homogeneous Poisson process and with this form of
ROCOF (intensity) function the process is often referred to as a power law process.
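In the standard power-law (Crow/AMSAA) notation, with a and b the fitted scale and shape parameters:

```latex
\[
  N(t) = a\,t^{b},
  \qquad
  \lambda(t) = \frac{dN(t)}{dt} = a\,b\,t^{\,b-1}
\]
```

Here N(t) is the expected cumulative number of failures by time t and lambda(t) is the ROCOF: b > 1 indicates an increasing ROCOF (deterioration), b = 1 a constant ROCOF (the homogeneous case), and b < 1 an improving trend.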
If there were no trend we would expect the average ROCOF to be constant, i.e. the
power index would be 1.
Clearly in fitting a power function to the plotted data we will never get an index of
exactly 1.
To test whether the non-homogeneous Poisson process is in fact the most
appropriate model we can use the AMSAA model.
The application of this trend test is described in Ebeling (Ch 16.5)

11

12

Note: using the MLE for b gives a value of b = 3.29. This does not change the
conclusion.
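As a sketch of how that estimate is obtained, the standard Crow/AMSAA maximum-likelihood estimator of the shape parameter is b-hat = n / sum(ln(T/t_i)). The failure times below are hypothetical; the slide's data set is not reproduced in these notes.

```python
# Crow/AMSAA MLE of the power-law shape parameter b.
import math

def amsaa_b_mle(failure_times, T=None):
    """b_hat = n / sum(ln(T / t_i)). T is the end of observation for a
    time-truncated test; if omitted, observation is taken to end at the
    last failure (failure truncation, where the final term ln(1) = 0)."""
    t = sorted(failure_times)
    end = T if T is not None else t[-1]
    return len(t) / sum(math.log(end / ti) for ti in t)

print(amsaa_b_mle([55, 95, 130, 155, 175, 190, 201, 210]))  # hypothetical data
```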

13

14

This is similar to the diagram given in O'Connor.
It shows a point process for a single component.
The variable capital Xi gives the individual times to failure.
The variable lower case xi gives the cumulative time to each failure from the start of
observation.
In this example observation continues beyond the last observed failure. The
observation interval is designated x0.
The test compares the mean of the cumulative times to failure for the sample data
with the mean that would be obtained if the times to failure were drawn from the
same population (i.e. identically distributed).


15

If there is no trend, the average of the cumulative times to failure will equal half the
period of observation, and the standard deviation will be the period of observation
multiplied by the square root of 1 upon 12n, where n is the number of failures.
The statistic U is a standard normal variable.*
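Restating that description in symbols, with x_i the cumulative times to failure, x_0 the period of observation and n the number of failures:

```latex
\[
  \bar{x} \;=\; \frac{1}{n}\sum_{i=1}^{n} x_i,
  \qquad
  U \;=\; \frac{\bar{x} - x_0/2}{x_0\sqrt{1/(12n)}}
\]
```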
If U is positive, more failures were observed in the second half of the period of
observation and the times to failure are decreasing. This would be a bad trend: it
could indicate the component is becoming less reliable.
If U is negative, more failures were observed in the first half of the period of
observation and the times to failure are increasing, which would be a good trend.
However the indicated trend may not be statistically significant.
Even in the situation where there is no trend the average of the cumulative times to
failure for any sample will show variation from the population mean.
We know from the normal distribution that there is a 15.87% chance that the variable
(in our case the average of the cumulative times to failure) will have a value more
than 1 standard deviation above the mean (U > 1). This means that if we concluded
from U = 1 that the data was trending, we would be making an error 15.87% of the
time.
*We know from the Central Limit Theorem that if Sn is the sum of n mutually
independent random variables, then the distribution function of Sn is well
approximated by the normal distribution.


16

The probability we are making an error in rejecting the null hypothesis is termed the
level of significance.
In our case the null hypothesis is that the data is not trending.
15.87% would normally be considered too large for a level of significance.
A value of U of 1.645 or -1.645 gives a 5% level of significance. In most cases this
would be an acceptable level at which to reject the null hypothesis.


17

If the period of observation is stopped at a failure, the statistic is calculated in the
manner shown.
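In symbols, this failure-truncated form averages only the first n-1 cumulative failure times and takes x_0 as the time of the final failure (consistent with the worked example later in these notes):

```latex
\[
  U \;=\; \frac{\dfrac{1}{\,n-1\,}\displaystyle\sum_{i=1}^{n-1} x_i \;-\; x_0/2}
               {x_0\sqrt{1/\bigl(12(n-1)\bigr)}}
\]
```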


18

The test was applied at the 5% significance level (U > 1.645). The probability of
getting a value of the test statistic of 15.89 under the null hypothesis is, however,
effectively zero.

19

In a fleet situation we may record the times between failures for each individual
system, but if our approach to managing and maintaining each system is the same,
we will be interested in the overall picture of trends for the fleet.
In this situation we can use a Laplace test for the pooled data.
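A sketch of one common pooled form of the Laplace statistic, in which each system k contributes its cumulative failure times and its own observation period T_k (time-truncated form); the fleet data below are hypothetical:

```python
# Pooled Laplace statistic: sum the centred cumulative failure times across
# systems and normalise by the pooled variance.
import math

def pooled_laplace(samples):
    """samples: list of (failure_times, T) pairs, one per system."""
    num = sum(sum(times) - len(times) * T / 2 for times, T in samples)
    var = sum(len(times) * T ** 2 / 12 for times, T in samples)
    return num / math.sqrt(var)

fleet = [([210, 540, 790], 1000),       # 3 failures in 1000 h of observation
         ([150, 420, 610, 980], 1000),  # 4 failures in 1000 h
         ([330, 880], 900)]             # 2 failures in 900 h
print(pooled_laplace(fleet))            # positive U: times to failure decreasing
```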

20

21

This method uses the data from all the samples to calculate the average number of
repairs that could be expected to have occurred by a given mileage. This plot is
described as the Nelson-Aalen plot in most references and the mean cumulative
function is also referred to as the cumulative intensity function. If the slope of the
plot is increasing with age the population rate of occurrence of failure is increasing.
(Sad trend)
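A minimal sketch of that estimate: at each failure age the mean cumulative function steps up by one divided by the number of systems still under observation at that age. The fleet data below are hypothetical.

```python
# Nelson-Aalen estimate of the mean cumulative function (MCF).
def mcf(systems):
    """systems: list of (failure_ages, censoring_age) pairs, ages in miles.
    Returns (age, MCF) step points."""
    ends = [end for _, end in systems]
    events = sorted(age for ages, _ in systems for age in ages)
    points, total = [], 0.0
    for age in events:
        # systems whose observation window still covers this age
        at_risk = sum(1 for end in ends if end >= age)
        total += 1.0 / at_risk
        points.append((age, total))
    return points

# Hypothetical fleet: repair mileages and current (censoring) mileage per vehicle
fleet = [([30_000, 78_000, 95_000], 120_000),
         ([52_000], 90_000),
         ([15_000, 60_000, 85_000], 100_000)]
for age, m in mcf(fleet):
    print(age, round(m, 3))
```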

22

O'Connor explains that exploratory data analysis is a simple graphical technique for
searching for connections between time series data and explanatory factors. In the
reliability context, the failure data are plotted on a time line, along with other
information. For example, overhaul intervals, seasonal changes, or different operating
patterns can be shown on the chart. This diagram from O'Connor depicts point
processes for 6 sub-systems and a superimposed point process for the overall system.
The system is overhauled every 1000 hours, and from visual examination you can see
failures clustered after each overhaul, indicating the overhaul is actually adversely
affecting reliability. You can also see failures paired in a number of places for the
sub-systems, which could indicate that the failures are not independent of each
other. A pattern like this would warrant further investigation of the nature and cause
of the failures.

23

This diagram is reproduced from O'Connor's text. O'Connor shows the value of the
Laplace statistic for each of the parts comprising the system and also for the
superimposed system at the bottom.
You are invited to determine whether the data sets are trending from visual
examination of the point process for each component and for the superimposed
process for the system. Compare this with O'Connor's assessment based on the U
value.

24

25

26

This is a diagrammatic representation of a stochastic point process for a pump that is
replaced on failure. The pump has been replaced 5 times since first installation. The
crosses indicate the times the failures occurred, measured from the time of the
original installation. The times to failure are shown in the table in chronological
sequence.
In this type of situation it could take many years to gather failure data for a
component.

27

28

29

The Laplace test is applicable where the times between failures are exponentially
distributed. The Lewis-Robinson test is more generally applicable and is calculated by
dividing the Laplace statistic by the coefficient of variation for the sample. The
coefficient of variation is the standard deviation divided by the mean. For the
exponential distribution the coefficient of variation is equal to one. For the normal
distribution, or other distributions associated with wear-out failure mechanisms, the
coefficient of variation will be less than 1; these distributions are said to be
under-dispersed with respect to the exponential. The Laplace test applied to data
from a wear-out failure mechanism will give a result biased towards accepting the
null hypothesis.
The coefficient of variation of a sample gives an indication of the type of failure
mechanism causing the component to fail. If it is clearly less than 1, the data is
under-dispersed with respect to the exponential distribution and we are dealing with
a wear-out failure mechanism.
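A sketch of the Lewis-Robinson calculation as just described, combining the failure-truncated Laplace statistic from earlier in these notes with the coefficient of variation of the times between failures; the data are hypothetical, tightly banded wear-out lives:

```python
# Lewis-Robinson statistic: Laplace U divided by the coefficient of
# variation (Cv) of the times between failures.
import math
import statistics

def lewis_robinson(cum_times):
    """cum_times: cumulative failure times, ascending; observation is
    taken to end at the final failure (failure-truncated form)."""
    n, x0 = len(cum_times), cum_times[-1]
    xbar = sum(cum_times[:-1]) / (n - 1)
    u = (xbar - x0 / 2) / (x0 * math.sqrt(1 / (12 * (n - 1))))
    tbf = [b - a for a, b in zip([0] + list(cum_times[:-1]), cum_times)]
    cv = statistics.stdev(tbf) / statistics.mean(tbf)
    return u / cv

# Hypothetical data: steadily shortening lives, so Cv is well below 1 and
# |LR| exceeds |U|, making the test more sensitive to the trend.
print(lewis_robinson([95, 185, 270, 350, 425, 495]))
```

For these tightly banded lives the coefficient of variation is about 0.11, so the Lewis-Robinson statistic is roughly nine times the Laplace statistic, which illustrates the bias the notes describe.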

30

Although you have the answer, I suggest you attempt this calculation yourself.
Sample average of the cumulative times to failure: (sum of x_i) / (n-1) = 358
Null-hypothesis mean: x0/2 = 798/2 = 399
SD: x0 × √(1/(12(n-1))) = 798 × √(1/(12×11)) = 69.46
U = (358 - 399)/69.46 = -0.59
Cv (standard deviation of the times between failures divided by their mean) = 0.216
LR = U/Cv = -0.59/0.216 ≈ -2.7
We would reject the null hypothesis and conclude the data was trending. The level of
significance is 0.29%. This means that if we conclude the data is trending we will be
wrong 0.29% of the time. This is well within the 5% level of significance normally
used to test such an hypothesis.

31

32

This is a time series depiction of the data for our example. A time series depiction of
the times to failure provides another way to check for trend. With no trend the linear
regression line would be horizontal. A statistical test can be applied to test the
significance of the slope of the regression line.
This test is described in the E-reading reference (Walpole and Myers).
It gives a t-test value of 4.18 with (n-2) degrees of freedom. There is a 0.09%
probability of getting this value if the population regression slope were 0, suggesting
strong evidence that the data is trending. This result is in line with that from the
Lewis-Robinson test (0.29%).
Note that the MS Excel data analysis function Regression calculates the t-test value.
A section of the Excel output table showing the t Stat for the slope is pasted over the
chart on this slide. The P-value given by Excel is for a 2-tail t-test.
The P-value for this test is half the P-value shown in the Excel table (i.e. 0.0018 / 2).
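For readers not using the Excel Regression tool, the same slope test can be sketched in Python with scipy; the times to failure below are hypothetical, regressed against the order in which the failures occurred:

```python
# Regression trend test: regress times to failure on failure order and
# test whether the slope differs from zero.
from scipy import stats

tbf = [95, 90, 85, 80, 75, 70, 72, 65, 60, 58, 55, 53]  # hypothetical data
order = range(1, len(tbf) + 1)

res = stats.linregress(order, tbf)
t_stat = res.slope / res.stderr  # the "t Stat" Excel reports for the slope
p_one_tail = res.pvalue / 2      # scipy's p-value is two-tailed, like Excel's

print(t_stat, p_one_tail)
```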

33

34

Another situation that you may encounter quite frequently is where a number of
identical components are in operation concurrently.
In this situation more renewal data will be available but we must be confident that
the loads on the components are similar.

35

A fleet situation is similar to the previous case of many components in one system.
In a fleet we are looking at one or more identical components in concurrent
operation in many systems.
This type of situation could generate lots of renewal records.

36

37

When we test for trend we are testing to see whether the times to failure are
dependent on the order in which they occur.
We test for trend when the data is in chronological sequence.
If we have a point process for a single component the sequence defined by
installation date is the same as that defined by replacement date.
If however we have identical components sourced from the same supplier operating
concurrently, the chronological sequence defined by the installation dates differs
from that defined by the replacement dates.
These are shown on the next slide.

38

If we sequence the ages at renewal based on the installation date we are in effect
looking to see whether the times to failure are dependent on the date the
components were supplied or installed. There are a number of things that occur in
supply, repair or installation of a component that could affect the life that it achieves.
If these factors are random the mean life will not be affected. Variability in repair and
installation processes is reflected in the variance of the times to failure. If over a
period of time however, there are systemic changes to methods or standards related
to supply, repair or installation the mean life is likely to show an increasing or
decreasing trend.
It is not so obvious how the times to failure could be dependent on the date the
components were replaced as this is after the event. In the example above the times
to failure are reasonably tightly banded and consequently the sequences are very
nearly the same.
In grouping data in this manner for multiple single component renewal processes we
are essentially analysing the data as a single sequence originating from a supplier or a
repairer. The linear regression trend test can be used to test for trend in the sequence
and also the Lewis-Robinson test can be applied to the combined sequence as if it
were a single process.

39

The P-value is half the value shown in the Excel table for a 2-tailed t-test (i.e. 0.37/2 = 0.185).

40

41

The LR test confirms that the apparent deteriorating trend is not statistically
significant. This result is very close to that from the linear regression trend test
(P-value = 18.5%).
Note the P-value is the lowest level of significance at which the observed value of the
test statistic is significant.

42

43

From the statistical test, the evidence of trend is borderline, with a significance level
in rejecting the null hypothesis of 4.99%.

44

The Lewis-Robinson test provides marginal evidence of trend. Note that had we
applied the Laplace test we would have accepted the null hypothesis of no trend.
The reliability is showing a deteriorating trend but the statistical significance of the
trend is borderline and would warrant continued observation of the system.

45

46

47

48

49

50
