
From Materials Evaluation, Vol. 73, No. 1, pp. 44–54.


Copyright 2015 The American Society for Nondestructive Testing, Inc.

What is Missing in Nondestructive Testing Capability Evaluation?

by Charles Annis, John C. Aldrin, and Harold A. Sabbagh

What is missing in nondestructive testing (NDT) capability evaluation is what is missing in many engineering evaluations of risk: understanding of the statistical premises governing their calculation. Apparently, it is easy to forget that clever reasoning, however valid, cannot rescue a faulty premise. And if NDT practitioners do not even know what that premise is, they are in trouble at the outset. It is the authors' objective here to begin to remedy that.

The Most Common Mistake Engineers Make in Statistics


Two of the authors have been practicing engineers for nearly five decades, each. In their experience, the most common mistake that engineers make in their statistical analysis is beginning with a valid mathematical statement that is conditionally true, then proceeding with a series of valid mathematical operations to arrive at an answer, which may or may not be true, depending on the long-forgotten (or ignored) conditions.

Consider two questions about the following mathematical statement: 2 + 2 = 5.
● Question 1: Is this a valid mathematical statement?
● Answer 1: Yes. Addition is defined as a binary operation. It requires two addends, a sign indicating the operation, an equal sign, and the sum. The statement meets these criteria; thus, it is a valid mathematical statement.
● Question 2: Is the statement true?
● Answer 2: No. It is false. A valid statement can be false and still be a valid mathematical construct.


Engineers would never work from something as obviously false as this statement; but just because something is not obviously false does not mean it is not false. Consider the commonplace statistical example of computing upper and lower bounds expected to include 95% of the population given the following observations. (Making inferences based on small samples can be dangerous, but the purpose here is not to discuss that trap, only to illustrate that not all data are normal, and assuming that they are, without checking, is irresponsible.)

X: 0.10, 0.16, 0.23, 0.32, 0.43, 0.62, 1.0.

Answer: Everyone knows 2 standard deviations from the mean enclose 95% of the sample, so the lower bound is X̄ − 2ŝ and the upper bound is X̄ + 2ŝ. The mean is X̄ = ΣX/n = 0.409, and the estimate of the standard deviation, ŝ, is 0.314, so the bounds are −0.219 and 1.037, respectively. Simple. And in this case, wrong. Why? Because the data do not have a normal distribution. Rather, the data are lognormal, which means that −∞ < log(X) < ∞, so the lower bound can never be negative.
It is easy to check the normal assumption: make a quantile-quantile (Q-Q) plot. A Q-Q plot displays the quantiles (percentiles) of the data against the quantiles expected from the probability model, in this case the normal model. If the data are (approximately) normal they plot as a straight line, which they do not in Figure 1a, but do when plotted as log(X) in Figure 1b.
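A minimal sketch of this check in R (the package used for the analyses in this article); the seven observations and the naive 2ŝ bounds match those quoted above, and the final line is simply one way to obtain bounds that respect the lognormal model:

```r
# Check the normal assumption before quoting mean +/- 2 standard deviations
x <- c(0.10, 0.16, 0.23, 0.32, 0.43, 0.62, 1.0)

mean(x) + c(-2, 2) * sd(x)        # naive normal bounds: -0.219 and 1.037

qqnorm(x);      qqline(x)         # curved: the normal model does not fit
qqnorm(log(x)); qqline(log(x))    # straight: the lognormal model fits

exp(mean(log(x)) + c(-2, 2) * sd(log(x)))   # bounds on the original scale stay positive
```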
The lesson here is not that engineers do not check
if their data have a normal distribution; it is that they
seldom check the validity of any of their statistical
assumptions.
For example, ask engineers about their finite element analysis and they will say what software was used. Pressed for details, they might add that the material was assumed to be isotropic and linear elastic and tell about its ultimate strength, yield strength, elastic properties, loading conditions, and temperature profile. However, ask engineers about their statistical analysis and they will say what statistical software was used. Quoting from an older work, "Simply not understanding the nature of the assumptions being made does not mean that they do not exist" (Frank et al., 1993).
Figure 1. Normal quantile-quantile (Q-Q) plot, on which normal data plot as a straight line: (a) normal Q-Q plot; and (b) log-normal Q-Q plot.



Implicit Statistical Assumptions in Regression Analysis

Figure 2. Regression of signal strength (Y) on target size (X): (a) ordinary least-squares requires all responses to be observable; and (b) replacing censored values with the censoring value skews the result anticonservatively. POD = probability of detection.


Since most engineers are familiar with ordinary least-squares (OLS) linear regression, the authors will use it to investigate those under-appreciated assumptions in regression analysis. OLS chooses parameter estimates that minimize the summed squared difference between the model and the observations. Figure 2a shows the relationship between OLS regression of NDT response and target size, as well as illustrates the relationship between probability of detection (POD) and signal strength. OLS has been a fundamental part of engineering practice for 200 years.

All analysis relies on assumptions concerning the relationship between reality and the process being modeled. Perhaps the most obvious is this:
● The model must look like the data (Harrell, 2001; Venables and Ripley, 2010).
While this may be self-evident, checking to see if the assumption holds is less so, given how often it is ignored. There are five other implicit assumptions that must be satisfied for the resulting parameter estimates to be useful:
● The response must be continuous and observable.
● The model must be linear in the parameters.
● The variance must be homoscedastic (have uniform variance).
● The observations must be uncorrelated with respect to time, space, or both (Chatfield, 1989; Cressie, 1993; Cressie and Wikle, 2011).
● The errors must be normal (Sakia, 1992).
If any of these assumed conditions are not met, the resulting analysis will be wrong, even though that fact may be far from obvious. The assumptions also hold for the technique of maximum-likelihood estimation (MLE), frequently used in POD evaluation. MLE will be discussed later in this section. Now the authors consider each requirement more closely.

The Response Must Be Continuous and Observable (Part 1)
In Figure 2a, all responses are observed. This is not always the case; sometimes the response is below some noise threshold or above some saturation value. In that case it is censored. Since it is unknown (other than being below some noise or above some saturation), it is obviously not possible to compute the difference (error) between the observation and the model, so finding a summed squared error is not possible.

So, what to do with censored observations? They can either be ignored (which means throwing away useful information about the sought-after model parameters) or replaced with their censoring value. Both choices are bad. Figure 2b illustrates that OLS parameter estimates based on replacing an observation with its censoring value result in an erroneous, anticonservative, POD versus size model. Since neither option is acceptable, what can be done?

With censored data, OLS is untenable: the errors cannot be computed so they cannot be minimized. Rather than minimizing the summed error, the likelihood is maximized. When the data are not censored, MLEs are exactly equal to OLS estimators, so there is no need to jettison 200 years of OLS experience to use the MLE criterion. OLS is powerless to deal with censoring, but likelihood handles censored data easily.

Before leaving the topic, the authors illustrate in Figure 3 how well MLEs perform with the regression data in Figure 2b, showing the correct censored regression fit as compared with the OLS fit of all the data. It is not perfect, but it is far superior to its alternatives.
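A minimal sketch of such a censored fit in R, using survreg() from the survival package; the data here are simulated purely for illustration, with responses below an assumed noise floor recorded as left-censored rather than discarded or replaced:

```r
# Censored regression by maximum likelihood with survreg()
library(survival)

set.seed(1)
a           <- exp(runif(60, log(0.1), log(2)))         # target sizes
ahat        <- exp(0.8 * log(a) + rnorm(60, sd = 0.4))  # true responses
noise_floor <- 0.25                                     # noise threshold
y           <- pmax(ahat, noise_floor)                  # censored responses carry the floor value
observed    <- ahat > noise_floor                       # FALSE = left-censored

fit <- survreg(Surv(log(y), observed, type = "left") ~ log(a), dist = "gaussian")
summary(fit)   # coefficient estimates match lm() when nothing is censored
```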

Probability and Likelihood
For some unfathomable reason, engineers are not introduced to likelihood in their first statistics course. Rather, they are given a collection of formulas to memorize. They are not taught how statistical stuff works but spend tedious hours discussing red and black balls in urns. That cannot be fixed now, but the authors aim to provide a very brief introduction to likelihood, the foundation of modern statistics.

Probability and likelihood are two sides of the same coin: probability provides the chances that outcome X will occur, given a model with stated parameters. Likelihood is the chances that these parameters are the best possible for the given probability model, given this collection of observations. Their mathematical formulations are identical. The only difference is what is known. With probability the parameters (for example, mean and standard deviation of normal data) are known, and the probability is desired of the next observation falling in a given range. With likelihood there is a collection of observations, and the most likely mean and standard deviation are desired.

Likelihood is defined, then, as the ordinate of the given probability density, the Gaussian, for example. To find the MLE of the mean, an optimization problem must be solved. This involves differentiating the log of the product of likelihoods (one likelihood for each observation), setting the derivative equal to zero, and solving. In this example, the maximum likelihood occurs at X̄ = ΣX/n. Look familiar? More involved problems may require more sophisticated optimization algorithms, but the idea is the same: the best parameter estimates are those that maximize the likelihood of observing what has occurred.

Looking at censored observations, X is unknown, except that it is greater or less than some censoring value. The ordinate is unknown. How can the likelihood be maximized if the likelihood is unknown? Simply, since X could be anything in the censored region, the likelihood of a censored observation is defined as all of them, that is, the integral of the probability density below (or above) the censoring value. Then the optimization problem is solved.
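A minimal sketch of that optimization in R; the data, the noise floor, and the starting values are illustrative assumptions. dnorm() supplies the density ordinate for an observed value, and pnorm() supplies the integral of the density below the censoring value:

```r
# Maximum-likelihood estimation of a normal mean and standard deviation
# when some observations are left-censored at a noise floor.
negloglik <- function(par, x, censored, limit) {
  mu <- par[1]; sigma <- exp(par[2])                    # log-sigma keeps sigma positive
  ll <- ifelse(censored,
               pnorm(limit, mu, sigma, log.p = TRUE),   # censored: integral below the limit
               dnorm(x,     mu, sigma, log   = TRUE))   # observed: density ordinate
  -sum(ll)
}

x        <- c(NA, NA, 0.23, 0.32, 0.43, 0.62, 1.00)     # first two lost below the noise floor
censored <- is.na(x)
fit <- optim(c(0.4, log(0.3)), negloglik, x = x, censored = censored, limit = 0.2)
c(mean = fit$par[1], sd = exp(fit$par[2]))              # the MLEs
```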

Figure 3. Censored regression using maximum-likelihood estimation (blue dashed line) correctly accounts for observations with actual responses obscured by background noise and thus censored. POD = probability of detection.


The Response Must Be Continuous and Observable (Part 2)
Many NDT techniques provide only a binary, hit/miss outcome. Signal strength is often misleading because a small crack can weep penetrant and appear larger, and a large crack can still be so tight as to prevent penetrant entry. OLS cannot be used with binary data. (More specifically, it can be used, but it is wrongly often used. Just because software can be coerced to provide an answer does not mean the answer is meaningful.)

Figure 4. Inefficient versus efficient model building: (a) technique 1, estimate the mean behavior by connecting the group means, and estimate the bounds by connecting the points at the group mean ± 2 sample standard deviations; and (b) technique 2, a parametric model can use all the data collectively, not locally, to provide a better overall description of the data.


One (creative) technique to describe binary response is to group the data and analyze the grouped averages. To illustrate how ill-advised this idea is, the authors will use it with continuous data since its deficiencies are more obvious there.

In Figure 4a, the data are in groups of five observations each. The six sample means and six standard deviations were calculated, assuming that they had a normal distribution. To estimate the underlying Y = f(X) relationship, the sample means were connected. To estimate the lower (and upper) bound, normal distributions were drawn, centered at the sample means and based on the sample standard deviations. All of the mathematical manipulations in technique 1 are valid, as in Figure 4b, but this approach begs several questions:
● Is the true underlying relationship really as crooked as it appears?
● Are the six standard deviations really different, or do they result entirely by chance, and would another random sample of 30 look rather different?
● Is it the best that can be done? It requires estimating 12 parameters (6 sample means and 6 standard deviations) and tacitly assumes that the observed behavior is the actual behavior. Might there be sufficient reason to suspect the true relation is much simpler, like a simple straight line?

A parametric model, Figure 4b, produces a more believable description of the underlying reality and does not tempt the unwary into trying to explain group differences that are only illusory. The parametric model is more efficient, requiring only three parameters, as compared with 12. Thus it provides more (30 − 3 = 27) degrees of freedom for estimating the standard deviation of the underlying variability (error).
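A minimal sketch of technique 2 in R, using simulated data purely for illustration (30 observations in six groups of five, matching the setup described above):

```r
# Technique 2: one straight-line fit to all the data, three parameters in all
# (intercept, slope, residual standard deviation), with confidence bounds on
# the mean line and prediction bounds on the next single observation.
set.seed(2)
dat   <- data.frame(x = rep(seq(15, 65, by = 10), each = 5))
dat$y <- 5 + 0.9 * dat$x + rnorm(nrow(dat), sd = 5)

fit <- lm(y ~ x, data = dat)
new <- data.frame(x = seq(min(dat$x), max(dat$x), length.out = 100))
conf <- predict(fit, new, interval = "confidence")   # inner bounds: the mean line
pred <- predict(fit, new, interval = "prediction")   # outer bounds: the next observation
```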
There are two sets of bounds on the regression plot. The innermost bounds are the confidence bounds on the mean line. The confidence bounds are expected to contain the true relationship (red line) in 95 of 100 nominally identical experiments. The outer bounds are prediction bounds on the individuals. The next future single observation is expected to fall within the prediction bounds 95% of the time. It is important to note that this does not mean that 95 of the next 100 observations will fall within the prediction bounds. It means that of 100 similar, nominally identical, experiments, the next single observation in 95 of the experiments would likely be contained within that experiment's prediction bounds. If the probability that the next single observation will be within the prediction bounds is 0.95, then the probability that the next two observations will be within the bounds is 0.95 × 0.95 = 0.9025, so the 95% prediction bounds for a single future observation are also the approximate 90% bounds for the next two observations. Confidence bounds describe how well the model captures the true (X,Y) relationship. The prediction bounds describe the anticipated behavior of the next single observation.

Attempts at grouping binary data suffer from all these shortcomings, especially treating random behavior as if it were meaningful. What is needed is a binary analog of the parametric model technique, that is, a generalized linear model (GLM).

A linear model links Y with X directly, f(X) = g(Y) = Y, where g(Y) is the identity function. This idea can be generalized, where the link is based on the probability of observing Y rather than observing Y directly, for example, f(X) = g(Y) = log(p / [1 − p]). The model parameters are again chosen to maximize the probability of observing what was actually observed. A large number of grouped means is no longer needed; only two model parameters need to be estimated, as shown in Equation 1:

(1)    POD(a) = exp(f(X)) / (1 + exp(f(X)))

where f(X) = b0 + b1X.

As before, the authors' purpose is not to provide a précis on mathematical statistics, but rather to call the reader's attention to standard statistical methods that are well suited for solving many NDT engineering problems. Consider real data, found as example 3 hm.csv in MIL-HDBK-1823A (DOD, 2009), which focuses on techniques to produce POD versus size curves based on experimental data. The completed analysis is shown in Figure 5.

Figure 5. Probability of detection (POD) as a function of target size, supported by hit/miss data. The confidence bounds are constructed using extremes of feasible likelihood. Details are in MIL-HDBK-1823A, Appendix G.
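A minimal sketch of that GLM fit in R; the file name follows the MIL-HDBK-1823A example, while the column names used here ("size" and "hit", with hit = 1 for found and 0 for missed) are assumptions about the data layout:

```r
# Logistic-link GLM for hit/miss data: Equation 1 with f = b0 + b1*log(a)
hm  <- read.csv("hm.csv")
fit <- glm(hit ~ log(size), family = binomial(link = "logit"), data = hm)

a   <- seq(min(hm$size), max(hm$size), length.out = 200)
pod <- predict(fit, newdata = data.frame(size = a), type = "response")   # POD(a)
plot(a, pod, type = "l", xlab = "Size, a (mm)", ylab = "POD(a)")
```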
The Model Must be Linear in the Parameters
Statisticians and engineers use the term "linear model" differently. In statistics, a model is linear if it is linear in the model parameters. In engineering, a system is linear if the output is a linear function of the input. So, y = b0 + b1x² + b2sin(x) is a linear statistical model, but y = b0 + b1exp(b2x) is not. Not meeting the requirement for being linear in the model parameters means that OLS cannot be used, but as with the GLM, that means only that some other technique is required.





There is an entire area of applied statistics dedicated to nonlinear regression (Bates and Watts, 1988). In fact, Equation 1 is a non-linear model.
The Variance Must be Homoscedastic
The variance (data scatter about the line or model) must be approximately constant because the fitting criterion is minimized summed squared differences between the model and each observation. If the variance were, say, small at one extreme and large at the other, the smaller observations would be slighted so as to do a better job with the larger ones, where the deviations are proportionally larger. The resulting parameter estimates would not be useful. Sometimes it is possible to transform the observations (log[y], or a Box-Cox transformation, for example) to stabilize the variance so OLS is viable. Sometimes the transform does stabilize the variance but also destroys the simplicity of an underlying (X,Y) relationship. Sometimes, as with errors that are both additive and multiplicative, no transformation will be effective.
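A minimal sketch in R of checking for non-constant variance and trying a variance-stabilizing transformation; the data are simulated for illustration, and boxcox() comes from the MASS package:

```r
library(MASS)   # for boxcox()

set.seed(3)
dat   <- data.frame(x = runif(50, 1, 10))
dat$y <- exp(0.3 * dat$x + rnorm(50, sd = 0.3))   # scatter grows with the mean

fit <- lm(y ~ x, data = dat)
plot(fitted(fit), resid(fit))          # fan shape: heteroscedastic
bc  <- boxcox(fit, plotit = FALSE)
bc$x[which.max(bc$y)]                  # Box-Cox exponent; near 0 points to log(y)
logfit <- lm(log(y) ~ x, data = dat)   # refit on the stabilized scale
```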
The Observations Must be Uncorrelated
Most observations are uncorrelated, but not all. Some are autocorrelated, related with their neighbors in time, so more recent observations are likely to be similar. An obvious example is weather data: tomorrow is more likely to look like today than it will look like last month. But what about NDT data collected hourly? Are early morning data somehow different from data collected in late afternoon? How is that known?

Spatial autocorrelation should not be overlooked either. Consider a component's random surface topography, for example. Deviations from print may be random, but are more likely to be self-similar when they are proximal rather than distal. There are statistical techniques for separating the random component of these deviations from the systemic, techniques that are useful for product improvement.

Here again, the authors' purpose is not a discussion of mathematical statistics, only to suggest through references areas worth further engineering study.
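A minimal sketch of a quick check for serial correlation in responses collected over time; the series here is simulated for illustration:

```r
# Autocorrelation check: spikes well outside the dashed bands indicate
# observations that are correlated with their neighbors in time.
set.seed(4)
signal <- as.numeric(arima.sim(model = list(ar = 0.7), n = 200))
acf(signal)
```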

Figure 6. Preliminary analysis of Nondestructive Testing Information Analysis Center 1997 A9002 3-L dataset showing some problems that arise from data plots. POD = probability of detection.


The Errors Must be Normal
This requirement means binary data are not suited for OLS analysis because, not only are the errors not normal, the variance is not constant. This is because the error variance depends on the mean: Var(p) = p(1 − p), which means it is greatest when p = 0.5 and diminishes as p approaches either zero or one. Software may be coerced to produce an answer, but it will be wrong.
The Model Must Look Like the Data
Perhaps the first rule of statistical data analysis should be to plot the data. It would have prevented using a normal distribution erroneously in the earlier example, and it can avoid considerable wasted time and misdirection in most circumstances. The plot is necessary, but not sufficient, because the plot itself requires scrutiny, sometimes very close scrutiny. Consider the following real example.

Look carefully at Figure 6. The model says that POD is nearly 100% for sizes greater than 12.7 mm (0.5 in.), but seven cracks larger than 12.7 mm (0.5 in.) were missed. The model says the POD is nearly perfect, but the data say otherwise. These seven misses would be exceedingly unlikely if the true POD were greater than 95%, as the model indicates. Something is clearly wrong.

It is a fact that if the data disagree with the model, then the model is wrong. What about the model could be wrong? This model, as with many POD models, assumes that POD approaches zero for very small cracks and approaches one for large cracks. This is a reasonable assumption in nearly every NDT situation, but not all. In some situations the POD may never reach zero because of excessive noise and false positives. In other situations, like this one, the POD never reaches one, perhaps due to inaccessibility of the inspection site, malfunction of the inspection apparatus, or lack of attentiveness by the inspector. Again, the purpose is not a discussion of mathematical statistics, only to suggest through references areas worth further engineering study.

Categorical Data
So far, only the requirements on the response, Y, have been considered, but there are other, under-appreciated, ignored, or unknown assumptions being made about the independent observations, X. Remember, just because the assumptions being made are unknown does not mean they are not being made. One requirement of X for OLS regression to be meaningful is that it be continuous and observable. However, many things cannot be described by a position on the number line, and attempts to do it anyway are part of the ad hoc analysis problem. Consider, for example, different eddy current probe manufacturers. How could they be placed on a real number line? They cannot.

Suppose the data involved eddy current probes from different manufacturers and needed to be included in your regression analysis to evaluate manufacturer performance. A parameter could be created called Mfg and assigned values of 1, 2, and 3. The not altogether obvious problem is that manufacturer 3 has been defined to have three times the influence of manufacturer 1. It has been further stipulated that the difference between manufacturer 1 and manufacturer 2 is the same as between manufacturer 2 and manufacturer 3. As a result, any analysis would be hopelessly wrong because the purpose of the analysis is to determine the relative performances of each manufacturer, and that has been made impossible. Such data are categorical and cannot be analyzed simply by assigning values.

One technique (there are others) is to define two Mfg parameters, Mfg1 and Mfg2, for example, so that each appears in the model only when the response is from that manufacturer's probe, as in Table 1.
TABLE 1
Example binary categorical data

Manufacturer    Mfg1    Mfg2
1               0       0
2               0       1
3               1       0

With four probe manufacturers this would require three Mfg parameters, and so on. The regression would determine the coefficients associated with each Mfg, with manufacturer 1 providing the baseline, so the coefficients would be the differences between manufacturer 1 and the others. If R is used, this coding is handled automatically when a parameter is defined as being categorical. Refer to a recent work for a great place to start to learn about analyzing categorical data (Agresti, 2002).
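A minimal sketch of how R constructs that coding once the variable is declared categorical with factor(); the first level, manufacturer 1, becomes the baseline, and fitted coefficients for the other levels are differences from it:

```r
mfg <- factor(c(1, 1, 2, 2, 3, 3))
model.matrix(~ mfg)   # an intercept plus one 0/1 indicator column per non-baseline manufacturer
```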

The Most Common Mistake Engineers Make with Probability
The most common mistake engineers make with probability is multiplying probabilities. Here is an example: POD improvement through redundant inspection. The logic of the purported improvement is something like this:
● If POD(single inspection) = 0.9, then the probability of miss (POM) = 1 − POD = 0.1.
● The probability of missing something twice is POM × POM. Therefore, POD(double inspection) = 1 − POM² = 1 − 0.01 = 0.99.
Thus, the redundant inspection has 99% POD compared with the original, single-inspection POD of 90%.
To see why looking at the same thing twice does not change much, consider determining the fraction of red apples and green apples in a barrel by inspecting a sample of them.

A selected apple is examined and pronounced to be red. The same apple is given to Thomas, who concurs that it is red and passes the apple to Richard, who also agrees that the apple is red. Now there are three opinions on the color of one apple. How much more is known of the fraction of red apples in the barrel than after the first examination? Nothing. Now, Richard gives the apple to Harold, who says that it is green, not red. Now, how much more is known of the fraction of red apples in the barrel than after the first examination? Still nothing. However, more is known about the quality of the inspection process. Though nothing has been learned about the fraction of red and green apples in the barrel, the inspection repeatability leaves something to be desired.

What does this mean? Repeated inspections of the same thing are not very informative, and the information they do provide concerns the inspection itself, not what is being inspected. This does not mean that repeated inspections should not be carried out. It means that repeated inspections will help understand the inspection process better, but they will not improve the probability of detecting what is being looked for.
As another example, consider two inspections, A and B. The probability of finding a crack with either inspection A or B is P(A or B) = P(A) + P(B) − P(A and B). To see this, look at Figure 7. In both figures, the area representing found by inspection A and found by inspection B is counted twice, and thus must be subtracted from the sum of P(A) + P(B). So how then is P(A and B) calculated?

Two events, A and B, are independent if the outcome of one has no influence on the outcome of the other. In that case, and only in that case, can one multiply probabilities so that P(A and B) = P(A) × P(B).

Figure 7. Venn diagrams showing: (a) independent; and (b) non-independent inspections. In (a), the Venn diagram shows cracks found by inspection A, found by inspection B, found by both A and B, and missed by both. Note that the area "A and B" is counted twice. In (b), inspections A and B are not independent: what was found (or missed) by A was found (or missed) by B. Inspecting twice does not change probability of detection.

What is wrong with multiplying probabilities? Nothing, if the events are independent. However, when the events are not independent, finding P(A and B) can be inconvenient, even tedious, and may require counting the fraction of times A and B occur together.
Figure 7b shows complete correlation between inspections; however, any non-zero correlation means the inspections are not independent, and the multiplying-probabilities calculation is wrong. Computational convenience is no substitute for veracity.

By chance, events are often independent, so chance protects the ignorant from the consequences of the indiscriminate multiplying of probabilities, but in many situations events are not independent.
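A minimal simulation in R of the point above, with a single-inspection POD of 0.9 and purely illustrative numbers:

```r
# When the two inspections are independent, the combined POD approaches 0.99;
# when they are perfectly correlated (the same cracks are missed both times),
# inspecting twice leaves the POD at 0.9.
set.seed(5)
n <- 1e5
a       <- runif(n) < 0.9     # inspection A: hit with probability 0.9
b_indep <- runif(n) < 0.9     # inspection B, independent of A
b_corr  <- a                  # inspection B, perfectly correlated with A

mean(a | b_indep)             # about 0.99
mean(a | b_corr)              # the single-inspection rate, about 0.90
```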

Common Errors in Evaluating Nondestructive Testing Sizing Capability
There have been some recent efforts to define and demonstrate a complete process for evaluating sizing capability, specifically addressing discontinuities in welds and corrosion in aircraft structures (Ducharme et al., 2012; Forsyth and Lepine, 2002; McCullagh and Nelder, 1989; Nordtest, 1998). However, there are some outstanding issues with the current practice for the quantitative evaluation of sizing capability with respect to NDT technique evaluation. One metric frequently cited is the calculation of the 95% safety limit against undersizing (LUS) bound for quantifying sizing performance for discontinuities in welds (Ducharme et al., 2012; McCullagh and Nelder, 1989; Nordtest, 1998).

However, there are some important assumptions, like linearity in the response and constant variance in sizing error with changes in discontinuity size, that should be addressed before using this measure. In addition, the simplistic character of the bound from an OLS fit does not adequately address the true variation of the bound with the varying distribution of discontinuities and limited sample numbers.
A sizing example from eddy current inspection for surface cracks in metallic components is presented in Figure 8a. Clearly, the results are dependent on discontinuity size. A scatter plot of the sizing error as a function of crack depth is shown in Figure 8b. Characterization error for depth sizing was evaluated with a linear model using MLE. The characterization error results shown in Figure 8b include a linear model fit with corresponding 95% prediction bounds. In general, the linear model fit appears to be adequate for the censored set of depths presented here and addresses the clear dependency as a function of varying depth. The common practice of fitting an OLS model with the assumption that sizing performance does not vary with varying crack depth is also included in the plot (red dashed lines). Since there is a significant change in the lower bound for error in the crack depth estimates as a function of varying size, this simple OLS fit is not appropriate. This demonstrates the need for care when attempting to report a single value that defines the entire lower bound for the safety LUS. Operators should never mandate generating such numbers for sizing capability evaluation when they are often not appropriate for the data, as shown here.

Figure 8. Characterization error in depth sizing: (a) eddy current sizing results for crack depth (left censoring was applied for small cracks with weak signals below the detection threshold); and (b) characterization error for censored inversion results for crack depth. Plots include a linear model fit (black solid line) with corresponding prediction bounds (blue dash-dot lines). Red dashed lines are additional bounds based on an ordinary least-squares fit and the assumption that sizing performance does not vary with varying crack depth.


Conclusion and Recommended Practice
The purpose of statistical modeling is not simply to produce a description of the data (a French curve can do that), but to produce a model with known statistical properties. That is the reason why so many ad hoc statistical models are worthless: they may appear to describe the immediate data, but they cannot be relied on to predict anything, nor can their purported confidence bounds be believed, since their statistical properties are completely unknown. Cloaking a dubious calculation in impressive-sounding statistical raiment changes nothing but the potential that it will impress the gullible and mislead the uninformed.
The current situation of sloppy engineering statistics is a consequence of statistical ignorance, and the only remedy for ignorance is study. The authors do not advocate that every engineering project have a statistician as a team member because, as often as not, the statistician is as ignorant of physics and engineering as the engineer is of statistics, not a situation conducive to effective communication and collaboration. What they do recommend is study, and this is not a quick fix. Study is hard work, not simply finding a useful-looking equation and blindly using it.

The place to begin is with an understanding of mathematical statistics, which is to applied statistics what physics is to engineering. Statistics is not mathematics. Mathematics is the language of statistics as it is for engineering, but mathematics is neither engineering nor statistics. An engineer can be rather accomplished as a mathematician and yet be completely ignorant of mathematical statistics.

Where to begin? If having a statistician as a team member is not the answer, and understanding mathematical statistics is necessary for engineering practice, then the engineer must learn statistics. (The authors are not suggesting the engineer shun the statistician, because frequent statistical consultation can greatly facilitate learning. What they do recommend is that the engineer learn what the statistician has learned, since only the engineer understands the engineering problem.) The authors recommend two texts, each with a different purpose. Taken together, and with honest effort, they can provide the requisite knowledge. Begin with Meeker and Escobar, the single best statistical reference for an engineer practicing in the field (Meeker and Escobar, 1998). Especially useful are the appendices summarizing the salient results from mathematical statistics. For greater understanding of Meeker and Escobar's summaries, the authors recommend Casella and Berger (Casella and Berger, 2001). Please remember that it is impossible to learn physics in a month, and likewise statistics cannot be learned any faster. Understanding physics required serious study, and no less is required to understand statistics.


One final comment: the authors are all engineers. Their purpose here is to call attention to the statistical ignorance that is a pall on the practice of engineering, with the hope of suggesting resources to remedy it.

ACKNOWLEDGMENTS
The authors would like to acknowledge ancillary support from the Air Force Research Laboratory (AFRL) under SBIR Phase II Contract FA8650-13-C-5180 with Victor Technologies, LLC. They are especially grateful to Jeremy Knopp and Eric Lindgren, of AFRL/RXCA, Wright-Patterson AFB, for their insightful comments and suggestions. R, an open-source package for statistical computing, was used for the statistical analyses in the paper.

AUTHORS
Charles Annis: Statistical Engineering, Palm Beach Gardens, Florida 33418.
John C. Aldrin: Computational Tools, Gurnee, Illinois 60031.
Harold A. Sabbagh: Victor Technologies, LLC, Bloomington, Indiana 47401.
REFERENCES
Agresti, A., Categorical Data Analysis, 2nd ed., Wiley, Hoboken, New Jersey, 2002.
Annis, C., "Probabilistic Life Prediction Isn't as Easy as It Looks," ASTM STP1450, ASTM Symposium on Probabilistic Aspects of Life Prediction, ASTM International, West Conshohocken, Pennsylvania, 2003.
Bates, D., and D. Watts, Nonlinear Regression Analysis and Its Applications, Wiley, Hoboken, New Jersey, 1988.
Casella, G., and R. Berger, Statistical Inference, 2nd ed., Cengage Learning, Independence, Kentucky, 2001.
Chatfield, C., The Analysis of Time Series, 4th ed., Chapman and Hall, London, England, 1989.
Cressie, N.A.C., Statistics for Spatial Data, Wiley, Hoboken, New Jersey, 1993.
Cressie, N., and C.K. Wikle, Statistics for Spatio-temporal Data, Wiley, Hoboken, New Jersey, 2011.
DOD, MIL-HDBK-1823A, Nondestructive Evaluation System Reliability Assessment, Department of Defense Handbook, Philadelphia, Pennsylvania, 7 April 2009.
Ducharme, P., S. Rigault, I. Strijdonk, N. Feuilly, O. Diligent, P. Piché, and F. Jacques, "Automated Ultrasonic Phased Array Inspection of Fatigue Sensitive Riser Girth Welds with a Weld Overlay Layer of Corrosive Resistant Alloy (CRA)," NDT.net, September 2012.
Forsyth, D.S., and B.A. Lepine, "Development and Verification of NDI for Corrosion Detection and Quantification in Airframe Structures," AIP Conference Proceedings, Vol. 615, No. 1, 2002, pp. 1787–1791.
Frank, I., and J. Friedman, "A Statistical View of Some Chemometrics Regression Tools," Technometrics, Vol. 35, No. 2, 1993, p. 110.
Harrell, F.E., Regression Modeling Strategies: With Applications to Linear Models, Logistic Regression, and Survival Analysis, Springer, New York, New York, 2001.
McCullagh, P., and J.A. Nelder, Generalized Linear Models, 2nd ed., Chapman and Hall/CRC Press, London, England, 1989.
Meeker, W.Q., and L.A. Escobar, Statistical Methods for Reliability Data, Wiley, Hoboken, New Jersey, 1998.
Nordtest, Guidelines for NDE Reliability Determination and Description, Nordtest Technical Report 394, 1998.
Sakia, R.M., "The Box-Cox Transformation Technique: A Review," The Statistician, Vol. 41, 1992, pp. 169–178.
Venables, W.N., and B. Ripley, Modern Applied Statistics with S (Statistics and Computing), 4th ed., Springer, New York, New York, 2010.
