Proceedings of the 2012 9th International Pipeline Conference
IPC2012
September 24-28, 2012, Calgary, Alberta, Canada

IPC2012-90139

A BAYESIAN APPROACH TO ASSESSMENT OF IN-LINE-INSPECTION TOOL PERFORMANCE


Roger McCann
Applus RTD
Houston, Texas, USA
Richard McNealy
Applus RTD
Houston, Texas, USA


Harvey Haines
Kiefner and Associates, Inc.
Worthington, Ohio, USA



ABSTRACT
This paper discusses a method based on Bayes' Theorem to estimate the probability that the performance of an In-Line-Inspection tool satisfies stated sizing accuracy specifications. This leads to a new method for accepting or rejecting tool performance that is entirely different from methods based on confidence intervals.

INTRODUCTION
Typical sizing accuracy specifications for an In-Line-Inspection (ILI) tool state that 80% of a tool's measurements (certainty) for wall loss are within 10% of nominal wall thickness (tolerance). When ILI tool performance is validated by direct examination and the ILI tool predictions are compared with actual conditions, there is an accepted level of confidence associated with the validation samples, e.g., a confidence level of 95%.

There are four basic reasons why an ILI tool's performance on a particular tool run may not meet specifications:
1. The tool malfunctions.
2. The tool run in question is one of the 5% of all tool runs for which the stated sizing accuracy specification is not satisfied.
3. The shape or morphology of the corrosion is one for which the ILI vendor's specification does not hold.
4. Operational conditions, e.g., tool speed.

A tool malfunction (1) can usually be detected by absurd data, data dropout, or post-run inspection of the tool. In this paper we assume the tool has functioned properly, and the desire is to determine whether the particular tool run satisfies the stated sizing accuracy specifications (2) or not. If the corrosion is of an unusual morphology (3), then the tolerance may need to be adjusted to make the sizing accuracy specifications acceptable for the corrosion morphology encountered; examples are pitting or pinhole corrosion. Our experience is that tools usually fail to perform because the shape of the corrosion is different than expected, but we have also encountered corrosion that was not sized properly and required regrading by the ILI vendor.

It is impossible to verify whether the tool satisfies the
stated sizing accuracy specifications on any specific tool
run without direct verification of at least 80%, and
possibly 100%, of the tool's measurements. Such
verification in real-world situations is impossible due to
cost and time constraints. This paper presents a method
based on Bayes' Theorem to estimate the probability that
the performance of the ILI tool satisfies the stated sizing
accuracy specifications. This leads to a new method for
accepting or rejecting tool performance that is entirely
different from methods based on confidence intervals.


Sizing accuracy specifications
Sizing accuracy specifications for an ILI tool have three aspects (Ref. [5, 7.2.4]):
1. Tolerance, e.g., measurements are accurate to within 10% of nominal wall thickness.
2. Certainty, i.e., the proportion, p_0, of measurements within the specified tolerance. Typically, p_0 = 0.8.
3. Confidence level, (1 - α)100%, indicating the confidence, typically 95% (α = 0.05), at which the certainty level is satisfied for a given tolerance.

The second tool specification should actually be a lower bound on the certainty, not an absolute value for the certainty. It is virtually impossible to determine whether certainty is precisely a predetermined value. For example, it is impractical to distinguish between 0.8 and 0.81, let alone between 0.8 and 0.8001. That is, the second tool specification should be

2'. Certainty ≥ p_0

Using Specification 2' instead of Specification 2 is
essential for the probabilistic method described below.

This brings us to an ambiguity in the term "certainty". It
can refer to overall tool performance or to tool
performance on the specific tool run under consideration.
In this paper "certainty" always refers to tool performance
on the specific tool run being assessed.

The (1 - α)100% confidence level means that the tool satisfies the stated sizing accuracy specifications on (1 - α)100% of tool runs, but does not satisfy the sizing accuracy specifications on α·100% of tool runs. It is impossible to verify whether on any specific tool run the tool satisfies the stated sizing accuracy specifications without direct verification of at least (1 - α)100%, and possibly 100%, of the tool's measurements. Such verification in real-world situations is impossible due to cost and time constraints. This paper presents a method based on Bayes' Theorem to estimate the probability that, given a random sample of verified ILI measurements, the performance of the ILI tool satisfies the stated sizing accuracy specifications. Specifically, this paper presents a method based on Bayes' Theorem to estimate the probability that certainty ≥ p_0 for given values of tolerance and confidence level.

Assumptions
We assume we have:
- Data from an ILI tool run.
- A stated value for tolerance (necessary to determine the number of measurements within tolerance).
- A random sample of size n of verified ILI tool run data with exactly m measurements within tolerance (usually direct examination measurements of corrosion depth).
- A stated confidence level (necessary to evaluate a term in Bayes' Theorem).
- Certainty is a random variable with unknown distribution (to be estimated).

The assumption that the data sample is random is essential for the validity of this method, as it is for the validity of many statistical methods. The purpose of a random sample is to obtain a data sample that is (hopefully) representative of the entire data set. Unfortunately, this does not always happen. Obtaining a random sample is a difficulty in many statistical analyses, especially in assessing ILI performance. A discussion of this problem is beyond the scope of this paper and is deferred to a future time.

Notation and Terminology
- Measurements within tolerance are called successes or successful.
- P(x) denotes the probability of x.
- Given events A and B, P(A|B) denotes the probability of event A given that event B has occurred.

Consider the following events H and E_m:

H: certainty ≥ p_0
E_m: exactly m of n measurements are successful (i.e., are within tolerance)

The event H can be viewed as a hypothesis we want to confirm, and the event E_m can be viewed as supporting evidence for that hypothesis. We use "the probability that certainty ≥ p_0 given the results of the sample of tool run data" as an estimate of "the probability that certainty ≥ p_0". That is, we use P(H|E_m) as an estimate of P(H). It needs to be emphasized that if we took a different sample of tool run data, it is possible (in fact likely) that we would get a different value for P(H|E_m), while P(H) remains the same. Thus, this method is like all other statistical methods in that if you change the data sample you are likely to change the result.

To apply Bayes' Theorem we will also need the complement H^c of event H:

H^c: certainty < p_0

Notice that H and H^c are mutually exclusive (the occurrence of either event precludes the occurrence of the other) and exhaustive (they cover all possibilities for p).

Bayesian method
We have

P(H|E_m) = probability that certainty ≥ p_0 given exactly m of the n measurements are successes
P(E_m|H) = probability that exactly m of the n measurements are successes given certainty ≥ p_0
P(E_m|H^c) = probability that exactly m of the n measurements are successes given certainty < p_0

Bayes' Theorem (Ref. [1], page 26) gives

$$P(H|E_m) = \frac{P(E_m|H)\,P(H)}{P(E_m|H)\,P(H) + P(E_m|H^c)\,P(H^c)}$$

P(H|E_m) can be interpreted as follows: If we hypothesize certainty ≥ p_0 (event H) and take a very large number of tool runs, each of which has a random sample of measurements with exactly m of n measurements in tolerance (event E_m), then P(H|E_m) is an estimate for the proportion of those tool runs on which the hypothesis is valid. In other words, P(H|E_m) estimates the probability of selecting at random a tool run with certainty ≥ p_0 from those tool runs. This is NOT the same as saying P(H|E_m) estimates the probability that certainty ≥ p_0 on a specific tool run. Appendix 1 discusses how to evaluate the terms on the right side of the equation for P(H|E_m).
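The calculation can be scripted directly from the evaluations derived in Appendix 1. The following is a minimal sketch, assuming SciPy is available; the function name prob_H_given_Em is ours, not from the paper or any standard library.

# Minimal sketch of the Bayesian calculation of P(H|E_m), assuming SciPy;
# the evaluations of the right-hand-side terms follow Appendix 1.
from scipy.stats import beta

def prob_H_given_Em(m, n, p0=0.8, alpha=0.05):
    """P(certainty >= p0 | exactly m of n verified measurements in tolerance)."""
    # Appendix 1: P(E_m|H^c) is the beta distribution evaluated at p0,
    # i.e., the Excel worksheet function BETADIST(p0, m + 1, n - m + 1).
    p_Em_Hc = beta.cdf(p0, m + 1, n - m + 1)
    p_Em_H = 1.0 - p_Em_Hc            # since P(E_m|H) + P(E_m|H^c) = 1
    p_H, p_Hc = 1.0 - alpha, alpha    # from the (1 - alpha)100% confidence level
    # Bayes' Theorem
    return p_Em_H * p_H / (p_Em_H * p_H + p_Em_Hc * p_Hc)

# Reproduces Table 1 below (n = 25): m = 20 gives about 0.93, m = 21 about 0.97.
for m in range(16, 23):
    print(m, round(prob_H_given_Em(m, 25), 2))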

Acceptance and rejection thresholds
In order to accept or reject tool performance we need threshold probabilities above which we accept tool performance and below which we reject tool performance. If we accept tool performance when A ≤ P(H|E_m), for some number A, then roughly speaking the tool run has an A·100% chance of being acceptable (given m successful measurements). A (1 - α)100% confidence level roughly means that a random tool run has a (1 - α)100% chance of being acceptable. These two conditions are not the same, but they suggest that if there is a (1 - α)100% confidence level, then an appropriate acceptance threshold is (1 - α). For example, for a 95% confidence level, we accept tool performance if 0.95 ≤ P(H|E_m). This acceptance criterion is justified at the end of this discussion when we discuss the optimality of the method.

There does not appear to be an analogous rejection threshold based on tool specifications. However, it seems reasonable to reject tool performance if certainty < p_0 is more likely than certainty ≥ p_0 (e.g., p_0 = 0.8, 95% confidence level, rejection and acceptance thresholds of 0.5 and 0.95, respectively). That is, we reject tool performance if P(H|E_m) ≤ 0.5. A smaller rejection threshold than 0.5 seems unreasonable, and a case could certainly be made for a larger value. For example, if we decide to reject tool performance unless certainty ≥ p_0 is more than twice as likely as certainty < p_0, then the rejection threshold would be 0.67.

Example
Suppose the certainty on a tool run is p_0 = 0.8 and we have a random sample of verified ILI measurements of size n = 25. If we use the Bayesian method to calculate P(H|E_m) for various values of the number m of successful measurements, we get the probabilities in Table 1. Probabilities are smaller than 0.31 when m < 16 and greater than 0.99 when m > 22.

Table 1: P(H|E_m) for n = 25, p_0 = 0.8, 95% confidence level

   m    P(H|E_m) (probability that certainty ≥ 0.8)
  16    0.31
  17    0.55
  18    0.74
  19    0.87
  20    0.93
  21    0.97
  22    0.99

If we take the acceptance threshold to be 0.95, then we accept tool performance if 21 ≤ m. If we lower p_0 slightly to 0.78, then P(H|E_m) reaches 0.95 when m = 20, so that we can also accept tool performance at this slightly smaller certainty. If we increase p_0 to 0.82, then P(H|E_m) reaches 0.95 when m = 21. This allows us to accept tool performance at a higher minimum certainty of 0.82 when m = 21. These examples show the versatility of the method in assessing individual cases. (Notice that p_0 rounds to 0.8 in both cases.)

Since the actual certainty on this tool run is 0.8 (by assumption), the binomial distribution can be used to calculate the probability of randomly obtaining a given number m of successful measurements. In particular, the probabilities of obtaining 20 and 21 successful measurements are 0.20 and 0.19, respectively.
Thus, we are almost equally likely to obtain 21 successful measurements as the expected number of 20 (= 0.8·25). Yet, according to Table 1, we accept tool performance if m = 21 and do not accept tool performance if m = 20. This shows that we must not be overly dogmatic in applying this method, or, in fact, the confidence interval method, to which similar examples apply. Judicious reasoning, not application of hard and fast rules, must accompany calculations to assure proper assessment of ILI tool performance. As will be discussed in the next section, even the interpretation of p_0 can greatly influence the acceptance or rejection of a tool's performance.

If we take the rejection threshold to be 0.5, then we reject tool performance if P(H|E_m) < 0.5. According to Table 1, we reject tool performance if m ≤ 16. Just as with confidence interval methods, there are values of m (in this example 17, 18, 19, and 20) for which we can neither accept nor reject tool performance. This is only to be expected. It would be unreasonable to reject tool performance if the probability were 0.01 less than the acceptance threshold, or to accept tool performance if the probability were 0.01 greater than the rejection threshold. If m = 17, then P(H|E_m) = 0.55 > 0.5, so that we cannot reject the tool. However, if p_0 is increased to 0.81, then P(H|E_m) < 0.50 and we can reject tool performance. (Notice that p_0 rounds to 0.8.) Of course, if the rejection threshold were greater than 0.55, then we would have rejected tool performance when p_0 = 0.8.

Given n, m, p_0, and confidence level, it is possible to calculate the probability density function of H|E_m and the cumulative probability density function. These are shown in Figure 1 for this example.

Figure 1a: probability density function of H|E_m (n = 25, m = 22, p_0 = 0.8, 95% confidence level)

Figure 1b: cumulative probability density function of H|E_m (n = 25, m = 22, p_0 = 0.8, 95% confidence level)
Comment on the previous example
The previous example shows that the meaning of 0.8 for the value of p_0 is important. Does 0.8 mean that the value is rounded to 0.8 or that the value is precisely 4/5? The above example shows that there is considerable leeway in accepting or rejecting tool performance when p_0 is a rounded value compared to when it is a precise value.

Comment on judicious reasoning
An extreme example of judicious reasoning, not application of hard and fast rules, to assure proper assessment of ILI tool performance occurs in the typical setting (p_0 = 0.8, confidence level = 95%) when n = 27 and the acceptance threshold is 0.95. If there are m = 22 successful measurements, then P(H|E_m) = 0.9499. If we rigorously apply the acceptance threshold, we reach the questionable conclusion that we do not accept tool performance.

Comparison with confidence interval methods
API 1163 (Ref. [5], page 36) gives confidence intervals as a possible method for assessing certainty. Paradoxically, many (1 - 2α)100% confidence intervals for binomial distributions are not truly (1 - 2α)100% confidence intervals. They are only approximate (1 - 2α)100% confidence intervals. Their coverage (the true proportion of (1 - 2α)100% confidence intervals that actually contain the certainty) varies depending on the values of m, n, and p. This is due to the binomial distribution not being continuous like a normal distribution. Published papers, including Refs. [3] and [4], show that it is possible for (1 - 2α)100% confidence intervals to have coverage that is considerably less than (1 - 2α)100%. Since assessment of ILI data is related to safety issues, the authors believe that the coverage should be at least the nominal coverage of (1 - α)100% for assessment of ILI data.
The Clopper-Pearson confidence interval is the only confidence interval known to the authors with this property. Fortunately, it is easy to calculate the endpoints of a (1 - 2α)100% Clopper-Pearson confidence interval (p_L, p_U) in Excel using the worksheet function for the inverse beta distribution:

p_L = BETAINV(α, m, n - m + 1) for 0 < m
    = 0 for m = 0

p_U = BETAINV(1 - α, m + 1, n - m) for m < n
    = 1 for m = n
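The same endpoints can be computed outside Excel. A minimal sketch assuming SciPy, where beta.ppf is the inverse beta distribution (the analogue of BETAINV); the function name is ours.

# Clopper-Pearson endpoints via the inverse beta distribution in SciPy;
# mirrors the Excel BETAINV formulas above.
from scipy.stats import beta

def clopper_pearson(m, n, alpha=0.05):
    """(1 - 2*alpha)*100% Clopper-Pearson interval for a proportion."""
    p_lo = 0.0 if m == 0 else beta.ppf(alpha, m, n - m + 1)
    p_up = 1.0 if m == n else beta.ppf(1.0 - alpha, m + 1, n - m)
    return p_lo, p_up

# 90% interval used to accept or reject at the 95% confidence level:
print(clopper_pearson(22, 25))   # reject if p_U < 0.8; accept if 0.8 < p_L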

If (p_L, p_U) is a Clopper-Pearson (1 - 2α)100% confidence interval for an ILI tool's certainty p, then p_L < p with (1 - α)100% confidence level and p < p_U with (1 - α)100% confidence level. A tool's performance is accepted with (1 - α)100% confidence level if p_0 < p_L and rejected with (1 - α)100% confidence level if p_U < p_0. In particular, we use the end points of a 90% (2α = 0.10) confidence interval to accept or reject ILI tool performance at a 95% confidence level. This means that acceptance results for a 90% confidence interval are roughly comparable to an acceptance threshold of 0.95 for the Bayesian method. Straightforward calculations with 2α = 0.10 show that p_U < 0.8 when m ≤ 16 and 0.8 < p_L when 23 ≤ m. Thus we reject tool performance if m ≤ 16 and accept tool performance if 23 ≤ m. For this example, rejection of tool performance by this confidence interval method and the Bayesian method are the same, but acceptance of tool performance by the Bayesian method occurs for two more values of m than for the confidence interval method.

Thoughtless application of either confidence intervals or the Bayesian method can lead to acceptance of tool performance based on very small sample sizes. When applying either technique, the probability of having all measurements within tolerance needs to be considered. For example, suppose the certainty is 0.8. If the confidence level is 95%, then 5% of all runs do not meet the certainty specification. The probability of a sample of size n having all of its measurements within tolerance is (0.8)^n. The sample size n needs to be so large that the probability of having all measurements within tolerance is less than the probability of the certainty specification not being met. That is, n needs to be so large that (0.8)^n is less than 0.05 (the equivalent of 5%). This means 14 ≤ n. Consequently, for p_0 = 0.8 and confidence level = 95% we need a sample size greater than or equal to 14.

Table 2 compares the number of successful measurements for rejection and acceptance of tool performance by the Bayesian method (rejection and acceptance thresholds of 0.5 and 0.95, respectively) and a method based on the Clopper-Pearson 90% confidence interval for the standard tool specifications of p_0 = 0.8 and 95% confidence level.

Table 2: Number of successful measurements for rejection and acceptance of tool performance (rejection and acceptance thresholds of 0.5 and 0.95, respectively, for the Bayesian method; p_0 = 0.8, 95% confidence level)

        Bayesian      Clopper-Pearson             Bayesian      Clopper-Pearson
        method        90% CI method               method        90% CI method
  n   Reject Accept   Reject Accept        n    Reject Accept   Reject Accept
      if m ≤ if m ≥   if m ≤ if m ≥             if m ≤ if m ≥   if m ≤ if m ≥
 14     8     12        8     14          33     22     27       22     30
 15     9     13        9     15          34     23     28       23     31
 16    10     14       10     16          35     24     29       24     32
 17    10     15       10     17          36     24     30       25     33
 18    11     15       11     17          37     25     31       25     34
 19    12     16       12     18          38     26     31       26     34
 20    13     17       13     19          39     27     32       27     35
 21    13     18       13     20          40     27     33       28     36
 22    14     19       14     21          41     28     34       28     37
 23    15     19       15     22          42     29     35       29     38
 24    16     20       16     23          43     30     35       30     39
 25    16     21       16     23          44     30     36       31     40
 26    17     22       17     24          45     31     37       32     40
 27    18     23       18     25          46     32     38       32     41
 28    19     23       19     26          47     33     39       33     42
 29    19     24       19     27          48     33     39       34     43
 30    20     25       20     28          49     34     40       35     44
 31    21     26       21     29          50     35     41       35     45
 32    21     27       22     29          51     36     42       36     45

The minimum number of successful measurements in Table 2 for acceptance of tool performance by the Bayesian method ranges from 1 to 4 fewer than for the confidence interval method, and the difference roughly increases with sample size. The maximum number of successful measurements for rejection by the confidence interval method is the same or one smaller for the Bayesian method with a rejection threshold of 0.5. Thus, rejection criteria are roughly the same for the two methods. The primary difference comes with the minimum number of successful measurements for acceptance of tool performance. The condition for acceptance of tool performance by the Bayesian method in this example is always less demanding than for the confidence interval method. In fact, if the acceptance threshold for the Bayesian method is increased to 0.99 in this example, the number of required successful measurements for acceptance is still the same or one fewer than the number for the confidence interval method.

Confidence intervals in API 1163
API 1163 gives no details as to how to construct confidence intervals as a method for assessing certainty. API 1163 only gives a table (Table 9 on page 37) of 95% confidence intervals for sample size 25.
Fortunately, it is possible to determine how most of this table could have been calculated. Given a random sample of size n with x successes, the standard textbook (1 - α)100% confidence interval (p_L, p_U) for a population proportion has endpoints

$$p_L = \hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}, \qquad p_U = \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \qquad (1)$$

where p̂ = x/n and z_{α/2} is the (1 - α/2)100-th percentile of the standard normal distribution. In particular, z_{0.025} = 1.96 when α = 0.05. In order to prevent absurdities, p_L and p_U are restricted to be non-negative and at most 1. It is often overlooked that eq. (1) contains an assumption that n is large relative to the true proportion p being estimated and (1 - p). A common requirement is

$$5 \le np \quad \text{and} \quad 5 \le n(1-p) \qquad (2)$$

In practice p is often replaced with p̂ in eq. (2) because p is usually unknown. Also, some authors replace 5 with 10. Eq. (2) is merely a rule of thumb, not an analytically determined condition. Ref. [3] cites five other conditions that have also appeared in textbooks to indicate that n must be large.

Eq. (1) gives the endpoints of the confidence intervals in Table 9 of API 1163, except when p̂ = 0 and p̂ = 1. Eq. (1) clearly has a problem when p̂ = 0 or 1 because the interval degenerates to a single point. Table 9 of API 1163 avoids this problem by some unmentioned procedure.

Refs. [3] and [4] show that the coverage of the confidence intervals whose end points are given in eq. (1), and by implication those in API 1163, can be significantly less than the nominal coverage of (1 - α)100%. In particular, the coverage of the 95% confidence interval given by eq. (1) when n = 25 is only 88%, not the nominal 95%, for p = 0.8. Details of this calculation are given in Appendix 2.

Probabilities of false acceptance and false rejection
The probability of false rejection (Type 1 error) of tool performance, P_FalseRejection, is the probability that we reject tool performance when in fact we should accept tool performance. That is, P_FalseRejection is the probability that there are m_r or fewer successes and p ≥ p_0.

The probability of false acceptance (Type 2 error) of tool performance, P_FalseAcceptance, is the probability that we accept tool performance when in fact we should reject tool performance. That is, P_FalseAcceptance is the probability that there are m_a or more successes and p < p_0. Appendix 3 shows

$$P_{FalseRejection} = \frac{\sum_{m=0}^{m_r} P(E_m|H)}{(1-p_0)(n+1)}$$

$$P_{FalseAcceptance} = \frac{\sum_{m=m_a}^{n} P(E_m|H^c)}{p_0(n+1)}$$
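These formulas are easy to script. A sketch under the same SciPy assumption as the earlier sketches; p_em_given_hc follows the BETADIST evaluation of Appendix 1, m_r and m_a are the rejection and acceptance counts of Table 2, and the helper names are ours.

# Sketch of the Appendix 3 error-probability formulas, assuming SciPy.
from scipy.stats import beta

def p_em_given_hc(m, n, p0=0.8):
    return beta.cdf(p0, m + 1, n - m + 1)      # P(E_m|H^c), per Appendix 1

def p_false_rejection(n, m_r, p0=0.8):
    num = sum(1.0 - p_em_given_hc(m, n, p0) for m in range(m_r + 1))
    return num / ((1.0 - p0) * (n + 1))

def p_false_acceptance(n, m_a, p0=0.8):
    num = sum(p_em_given_hc(m, n, p0) for m in range(m_a, n + 1))
    return num / (p0 * (n + 1))

# n = 25 with the Table 2 thresholds (reject if m <= 16, accept if m >= 21):
print(p_false_rejection(25, 16), p_false_acceptance(25, 21))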
Table 3 gives values of P_FalseRejection and P_FalseAcceptance for various values of n in the typical setting p_0 = 0.8 with 95% confidence level and rejection and acceptance thresholds of 0.5 and 0.95, respectively. P_FalseRejection does not decrease monotonically because the values are very small and m_r does not decrease monotonically. The important observation is that P_FalseRejection is less than 0.004. This means that for practical purposes the likelihood of rejecting tool performance when it should be accepted is negligible. The case for P_FalseAcceptance is slightly different. Values of P_FalseAcceptance are small, but not necessarily negligible.



Table 3: Probabilities of False Acceptance and False Rejection (p_0 = 0.8, 95% confidence level, rejection and acceptance thresholds of 0.5 and 0.95, respectively)

Why is sample size important?
Table 3 seems to imply that sample size is not an important consideration in assessing tool performance. This is a completely incorrect conclusion. We will justify this statement by considering the typical setting p_0 = 0.8 with 95% confidence level and rejection and acceptance thresholds of 0.5 and 0.95, respectively. Appendix 5 gives a formula for the probability of m_a or more successful measurements when p_0 ≤ p. Table 4 gives these probabilities for various values of n. That is, Table 4 gives the probability that we accept a tool's performance when the performance is acceptable.

Table 4: Probability that the number of successful measurements is m_a or more when p_0 ≤ p (p_0 = 0.8, 95% confidence level, rejection and acceptance thresholds of 0.5 and 0.95, respectively)

Thus, when n = 15 we will fail to accept about 16% (roughly 1 in 6) of acceptable performances, while when n = 50 we fail to accept only 10% (1 in 10). Thus, acceptance is considerably more likely with the larger sample size; in general, the larger the sample, the more accurate the results. Moreover, as discussed in the section below on optimality, the Bayesian method is optimal for acceptance.

Advantages of the Bayesian method
The Bayesian method
1. is less restrictive in accepting tool performance than a comparable confidence interval method based on the Clopper-Pearson confidence interval, at least for the situations depicted in Table 2;
2. allows the calculation of the probability density function for H|E_m, which could be used for probabilistic assessments of failure;
3. allows the calculation of the probabilities of false acceptance and false rejection;
4. gives an optimal acceptance criterion, at least for the situation with p_0 = 0.8, 95% confidence level, and acceptance threshold of 0.95.

Disadvantage of the Bayesian method
The primary disadvantage of the Bayesian method is that it relies heavily on the certainty and confidence level in the sizing accuracy specification. If these are erroneous, then the Bayesian method is also likely to be erroneous. This amounts to restating the proverb "Garbage in, garbage out."

Minimum Sample Size
Generalizing the argument in the second paragraph before Table 2 to arbitrary certainties and confidence levels requires that the sample size n satisfy (p_0)^n < α. That is, n > ln(α)/ln(p_0).
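For example, the typical setting p_0 = 0.8 and α = 0.05 reproduces the bound obtained before Table 2:

$$n > \frac{\ln(0.05)}{\ln(0.8)} \approx 13.4, \qquad \text{so } 14 \le n.$$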

The Bayesian method is optimal for acceptance
The following discussion applies only to the typical situation with p_0 = 0.8, 95% confidence level, and acceptance threshold of 0.95; however, something similar may hold in general. Intuitively, it is not possible to conclude that 0.8 ≤ certainty from a sample unless more than 80% of its measurements are successes. That is, we cannot expect to be able to accept tool performance using a sample of size n with fewer than 0.8n + 1 successful measurements. It is easily verified that the acceptance numbers of successes, m_a, in Table 2 equal 0.8*n + 1 rounded to the nearest integer. In Excel,

m_a = ROUND(0.8*n + 1, 0)

for the data in Table 2. This formula has also been verified for values of n up to 100 and for the larger values 500, 1000, and 10,000. Thus, the acceptance number of successes for the typical situation appears to be optimal in the sense that no smaller value will suffice.

It appears that in general m_a is very close to p_0*n + 1 rounded to the nearest integer. Actually, rounding p_0*n + 0.99999 works better in Excel due to the way fractional parts of 0.5 are rounded. There are still inaccuracies when the fractional part of p_0*n + 1 is close to 0.5, but these are not excessive for the cases considered below. We compared m_a with ROUND(p_0*n + 0.99999, 0) in Excel for cases with

- p_0: 0.750, 0.775, 0.800, 0.825, 0.850
- confidence levels: 90%, 92.5%, 95%, 97.5%
- sample sizes ranging from the minimal acceptable value to 100
- acceptance thresholds determined by the confidence level (a (1 - α)100% confidence level determines an acceptance threshold of (1 - α))

Table 5 describes the cases for which m_a and ROUND(p_0*n + 0.99999, 0) are not equal. For a fixed p_0 the comparisons of m_a with ROUND(p_0*n + 0.99999, 0) are independent of the confidence level, even though the minimum sample size changes. For the 95% confidence level only 15 of 433 cases (roughly 3%) had ROUND(p_0*n + 1, 0) > m_a, and in each case ROUND(p_0*n + 1, 0) is only one larger than m_a. ROUND(p_0*n + 1, 0) is also at most one larger than m_a for all other confidence levels considered. In short, m_a is as small as can be reasonably expected, at least for the cases considered.



Table 5: Comparison of m_a with ROUND(p_0*n + 0.99999, 0) (confidence levels, sample sizes, and acceptance thresholds as described in the text)

  p_0                                             0.750  0.775  0.800  0.825  0.850
  Number of samples (95% confidence level)           90     89     87     85     82
  Samples with ROUND(p_0*n + 0.99999, 0) > m_a        0     11      0      4      0
  Samples with ROUND(p_0*n + 0.99999, 0) < m_a        0      0      0      0      0

The optimality of m_a in these calculations justifies using the confidence level to determine the acceptance threshold.
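The optimality claim is easy to check numerically. A sketch, reusing prob_H_given_Em from the earlier sketch (so again assuming SciPy); the helper name is ours.

# Check: the smallest m with P(H|E_m) >= 0.95 should match ROUND(0.8*n + 1, 0)
# in the typical setting. Reuses prob_H_given_Em defined earlier.
def min_accept(n, p0=0.8, alpha=0.05, threshold=0.95):
    """Smallest m for which the Bayesian method accepts tool performance."""
    for m in range(n + 1):
        if prob_H_given_Em(m, n, p0, alpha) >= threshold:
            return m
    return None

for n in (14, 25, 51):
    print(n, min_accept(n), round(0.8 * n + 1))   # the pairs should agree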

REFERENCES

1. Haldar, A. and Mahadevan, S., Probability, Reliability, and Statistical Methods in Engineering Design, John Wiley & Sons, New York, 2000, p. 26.
2. McCann, R., McNealy, R., and Gao, M., "In-Line-Inspection Performance Verification, II, Validation Sampling," NACE, Corrosion 2008, Paper No. 08151.
3. Brown, L.D., Cai, T.T., and DasGupta, A., "Interval Estimation for a Binomial Proportion," Statistical Science, Vol. 16, No. 2, 2001, pp. 101-133.
(http://correio.cc.fc.ul.pt/~mcg/aulas/dinpop/Mod7/Brown_et_al.pdf)
4. Brown, L.D., Cai, T.T., and DasGupta, A., "Confidence Intervals for a Binomial Proportion and Asymptotic Expansions," Annals of Statistics, Vol. 30, No. 1, 2002, pp. 160-201.
(http://wwwstat.wharton.upenn.edu/~tcai/paper/Binomial-Annals.pdf)
5. API Standard 1163, In-Line Inspection Systems Qualification Standard, First Edition, August 2005, American Petroleum Institute, Washington, D.C.

APPENDIX 1
Evaluation of P(E_m|H^c), P(E_m|H), P(H^c), and P(H)

The probability of exactly m successes in n measurements for a given proportion p of successes is given by the binomial distribution:

$$B(m, n, p) = \binom{n}{m} p^m (1-p)^{n-m}$$

P(E_m|H^c) is the "sum" of all these probabilities for p < p_0 divided by the "sum" of all possible probabilities (0 < p < 1):

$$P(E_m|H^c) = \frac{\int_0^{p_0} B(m,n,p)\,dp}{\int_0^1 B(m,n,p)\,dp}$$

The right side of this identity is the beta distribution, which is easily evaluated in Excel using a worksheet function:

P(E_m|H^c) = BETADIST(p_0, m + 1, n - m + 1)

The beta distribution is commonly encountered in applications of Bayes' Theorem. Since H and H^c are mutually exclusive and exhaustive, we have P(E_m|H) + P(E_m|H^c) = 1, so that

P(E_m|H) = 1 - P(E_m|H^c).

P(H) and P(H^c)
A (1 - α)100% confidence level implies that (1 - α)100% of the ILI runs satisfy H, so that the probability of a random ILI run satisfying H is 1 - α. That is, P(H) = 1 - α. Since P(H) + P(H^c) = 1, we have P(H^c) = 1 - P(H) = α.

We are now able to evaluate all the terms in the formula for P(H|E_m), the probability that certainty ≥ p_0 given a sample of tool run data.











APPENDIX 2
Calculation of Coverage of API 1163 95% Confidence Interval when n = 25, p = 0.8

The probability P(m) of exactly m successes in a random sample of size n for population proportion p is given by the binomial distribution, which was discussed at the beginning of Appendix 1.

Table A1 gives P(m) for all possible choices of m when n = 25 and p = 0.8, along with the endpoints (converted from percentages to decimals) of the corresponding confidence intervals in Table 9 of API 1163 for each possible number of successes. Note that p_L and p_U are independent of the population proportion. The chosen population proportion (0.8) is only used to calculate the probabilities in Table A1.

Table A1: Probabilities and 95% confidence interval endpoints from API 1163, Table 9 (changed from percent to decimal)

   m   Prob(m)    p_L    p_U
   0   3.36E-18   0.00   0.11
   1   3.36E-16   0.00   0.12
   2   1.61E-14   0.00   0.19
   3   4.94E-13   0.00   0.25
   4   1.09E-11   0.02   0.30
   5   1.83E-10   0.04   0.36
   6   2.43E-09   0.07   0.41
   7   2.64E-08   0.10   0.46
   8   2.38E-07   0.14   0.50
   9   1.80E-06   0.17   0.55
  10   1.15E-05   0.21   0.59
  11   6.27E-05   0.25   0.63
  12   2.93E-04   0.28   0.68
  13   1.17E-03   0.32   0.72
  14   4.01E-03   0.37   0.75
  15   1.18E-02   0.41   0.79
  16   2.94E-02   0.45   0.83
  17   6.23E-02   0.50   0.86
  18   1.11E-01   0.54   0.90
  19   1.63E-01   0.59   0.93
  20   1.96E-01   0.64   0.96
  21   1.87E-01   0.70   0.98
  22   1.36E-01   0.75   1.00
  23   7.08E-02   0.81   1.00
  24   2.36E-02   0.88   1.00
  25   3.78E-03   0.89   1.00

Suppose we take a random sample of size 25, determine the number of successes, and construct a confidence interval according to Table A1. Notice that there are only 26 possible confidence intervals. If we repeat this method for all possible samples of size 25, the proportion of times any given confidence interval is constructed is the same as the proportion of times its corresponding number of successes occurs. For example, the proportion of times we construct the confidence interval (0.64, 0.96) equals the proportion of times there are 20 successes. Since we constructed all possible confidence intervals by this method, the proportion of times there are m successes is exactly P(m). Then, the proportion of confidence intervals that contain 0.8 equals the sum of all P(m) with 0.8 in the confidence interval corresponding to m successes. Since 0.8 is only in the confidence intervals corresponding to m from 16 through 22, the proportion of confidence intervals that contain 0.8 is given by

$$\sum_{m=16}^{22} P(m) = 0.88$$

Thus, the true coverage of this method for obtaining confidence intervals is 0.88, not the nominal value of 0.95. This means that if the true certainty of an ILI tool were 0.8 and we used Table 9 to determine a confidence interval for the certainty, the true confidence level would be 88%, not 95%. Inadequacies of eq. (1) in determining confidence intervals with the nominal coverage, even with large sample sizes, are well-documented in the literature. The interested reader is directed to Ref. [3] as a starting point.
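The coverage calculation can be scripted. A sketch assuming SciPy, with the interval recomputed from eq. (1) rather than read from Table 9 (the two agree up to the rounding in the published table); the function name is ours.

# Coverage of the eq. (1) (Wald) interval for n = 25, p = 0.8, assuming SciPy.
from math import sqrt
from scipy.stats import binom, norm

def wald_interval(m, n, alpha=0.05):
    p_hat = m / n
    half = norm.ppf(1.0 - alpha / 2.0) * sqrt(p_hat * (1.0 - p_hat) / n)
    return max(p_hat - half, 0.0), min(p_hat + half, 1.0)

n, p = 25, 0.8
coverage = sum(binom.pmf(m, n, p) for m in range(n + 1)
               if wald_interval(m, n)[0] <= p <= wald_interval(m, n)[1])
print(coverage)   # about 0.88, well below the nominal 0.95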











APPENDIX 3
Probabilities of false acceptance and false rejection (Type 1 and Type 2 errors)

Let A and B denote two events. The event in which both A and B occur is denoted by AB. The multiplication rule for probability can be stated as

P(AB) = P(A|B)*P(B)

Let the sample size n be fixed and consider the maximum number of successes m_r for which we reject tool performance and the minimum number of successes m_a for which we accept tool performance. Values for m_r and m_a are given in Table 2 for 14 ≤ n ≤ 51.

The probability of false rejection (Type 1 error) of tool performance, P_FalseRejection, is the probability that we reject tool performance when in fact we should accept tool performance. That is, P_FalseRejection is the probability that there are m_r or fewer successes and p ≥ p_0. That is,

$$P_{FalseRejection} = P(m_r \text{ or fewer successes and } p \ge p_0)$$

$$= \frac{\sum_{m=0}^{m_r} P(\text{exactly } m \text{ successes and } p \ge p_0)}{\sum_{m=0}^{n} P(\text{exactly } m \text{ successes and } p \ge p_0)} = \frac{\sum_{m=0}^{m_r} P(E_m H)}{\sum_{m=0}^{n} P(E_m H)} = \frac{\sum_{m=0}^{m_r} P(E_m|H)}{\sum_{m=0}^{n} P(E_m|H)}$$

where the last step uses the multiplication rule P(E_m H) = P(E_m|H)*P(H), with the common factor P(H) canceling from numerator and denominator.

The probability of false acceptance (Type 2 error) of tool performance, P_FalseAcceptance, is the probability that we accept tool performance when in fact we should reject tool performance. That is, P_FalseAcceptance is the probability that there are m_a or more successes and p < p_0. That is,

$$P_{FalseAcceptance} = P(m_a \text{ or more successes and } p < p_0)$$

$$= \frac{\sum_{m=m_a}^{n} P(\text{exactly } m \text{ successes and } p < p_0)}{\sum_{m=0}^{n} P(\text{exactly } m \text{ successes and } p < p_0)} = \frac{\sum_{m=m_a}^{n} P(E_m H^c)}{\sum_{m=0}^{n} P(E_m H^c)} = \frac{\sum_{m=m_a}^{n} P(E_m|H^c)}{\sum_{m=0}^{n} P(E_m|H^c)}$$

Appendix 1 describes how to calculate P(E_m|H^c) and P(H^c). Appendix 4 shows

$$\sum_{m=0}^{n} P(E_m|H) = (1-p_0)(n+1) \quad \text{and} \quad \sum_{m=0}^{n} P(E_m|H^c) = p_0(n+1)$$

Consequently,

$$P_{FalseRejection} = \frac{\sum_{m=0}^{m_r} P(E_m|H)}{(1-p_0)(n+1)}$$

$$P_{FalseAcceptance} = \frac{\sum_{m=m_a}^{n} P(E_m|H^c)}{p_0(n+1)}$$
APPENDIX 4
Evaluation of $\sum_{m=0}^{n} P(E_m|H^c)$ and $\sum_{m=0}^{n} P(E_m|H)$

The probability of exactly m successes in n measurements for a given proportion p of successes is given by the binomial distribution:

$$B(m, n, p) = \binom{n}{m} p^m (1-p)^{n-m}$$

We have

$$\sum_{m=0}^{n} B(m, n, p) = 1$$

Consequently,

$$\sum_{m=0}^{n} \int_0^{p_0} B(m, n, p)\,dp = \int_0^{p_0} \sum_{m=0}^{n} B(m, n, p)\,dp = \int_0^{p_0} dp = p_0$$

According to eq. 1 in Section 8.384 of Ref. [6] we have

$$\int_0^1 B(m, n, p)\,dp = \binom{n}{m} \int_0^1 p^m (1-p)^{n-m}\,dp = \binom{n}{m} \frac{m!\,(n-m)!}{(n+1)!} = \frac{n!}{m!\,(n-m)!} \cdot \frac{m!\,(n-m)!}{(n+1)!} = \frac{1}{n+1}$$

so that $\int_0^1 B(m, n, p)\,dp$ is independent of m and equals 1/(n + 1). Consequently,

$$\sum_{m=0}^{n} P(E_m|H^c) = \sum_{m=0}^{n} \frac{\int_0^{p_0} B(m, n, p)\,dp}{\int_0^1 B(m, n, p)\,dp} = (n+1) \sum_{m=0}^{n} \int_0^{p_0} B(m, n, p)\,dp = p_0 (n+1)$$

Similarly,

$$\sum_{m=0}^{n} P(E_m|H) = \sum_{m=0}^{n} \frac{\int_{p_0}^1 B(m, n, p)\,dp}{\int_0^1 B(m, n, p)\,dp} = (n+1) \sum_{m=0}^{n} \int_{p_0}^1 B(m, n, p)\,dp = (1-p_0)(n+1)$$
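These identities admit a quick numerical sanity check. A sketch assuming SciPy, using the BETADIST evaluation of P(E_m|H^c) from Appendix 1.

# Numerical check of the Appendix 4 identities for n = 25, p0 = 0.8.
from scipy.stats import beta

n, p0 = 25, 0.8
sum_hc = sum(beta.cdf(p0, m + 1, n - m + 1) for m in range(n + 1))
print(sum_hc, p0 * (n + 1))                  # both 20.8
print((n + 1) - sum_hc, (1 - p0) * (n + 1))  # both 5.2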













APPENDIX 5
Probability of m_a or more successful measurements when p_0 ≤ p

$$P(m_a \text{ or more successes and } p_0 \le p) = \frac{\sum_{m=m_a}^{n} P(\text{exactly } m \text{ successes and } p_0 \le p)}{\sum_{m=0}^{n} P(\text{exactly } m \text{ successes and } p_0 \le p)}$$

$$= \frac{\sum_{m=m_a}^{n} P(E_m H)}{\sum_{m=0}^{n} P(E_m H)} = \frac{\sum_{m=m_a}^{n} P(E_m|H)}{\sum_{m=0}^{n} P(E_m|H)} = \frac{\sum_{m=m_a}^{n} P(E_m|H)}{(1-p_0)(n+1)}$$