CHECKING MODELS IN STRUCTURAL DESIGN
By Mark G. Stewart and Robert E. Melchers
ABSTRACT: A large proportion of structural failures are due to human error in the
design stage of a structural engineering project, and many of these failures could
have been averted if there had been adequate design checking. Results are reported
Downloaded from ascelibrary.org by University of Newcastle on 10/12/15. Copyright ASCE. For personal use only; all rights reserved.
INTRODUCTION
SELF-CHECKING
Survey Methodology
There is evidence (Rabbitt 1978) that self-checking efficiency for so-called
"omission" errors (i.e., failure to perform a task) is substantially lower (by
more than an order of magnitude) than self-checking efficiency for errors of
"commission" (incorrect performance of a task). For this reason, the part of
the study considering self-checking was limited largely to the study of self-
checking for errors of commission.
For practical reasons the data set was limited to first-year undergraduate
student examination scripts. For these, intermediate calculations and corrections
could be examined for each set task. The number of individual responses
totalled just over 800. Each individual response was carefully
examined for calculation errors, and, where self-correction was evident, the
original and amended responses were recorded.
For the purposes of the present study, a self-correction was considered to
have occurred when there was evidence both of an original task response
and an amended response. In particular, self-checking was deemed to have
occurred if an incorrect result was amended in any manner. Typically this
would be by crossing out the original value and replacing it with a corrected
value.
Corrections for round-off error were excluded from consideration (Melch-
ers 1988), giving a sample size of 86 responses for which self-checking
resulted in a correction. There was no evidence of corrections leading to
further error, nor of a correct result changed to an error.
Mathematical Model
For each sample, the incorrect value (x) and the correctly self-checked
response (x_m) were used to evaluate the logarithmic error factor

e_log = log_10 (x / x_m)   (1)
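As a worked illustration, Eq. 1 can be evaluated directly; the function name below is ours, not the authors':

```python
import math

def log_error_factor(x: float, x_m: float) -> float:
    """Logarithmic error factor of Eq. 1: e_log = log10(x / x_m), where x is
    the original (incorrect) value and x_m is the correctly self-checked
    response."""
    return math.log10(x / x_m)

# An error a full order of magnitude too large gives e_log = 1:
print(log_error_factor(250.0, 25.0))   # 1.0
# A near-miss (x close to x_m) gives e_log near 0:
print(round(log_error_factor(26.0, 25.0), 3))
```

A symmetric histogram of e_log about zero would indicate under- and over-estimates of similar relative size, which is why the logarithmic (rather than absolute) error measure is convenient here.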
FIG. 1. Error Factor "Self-Checking" Model Fitted to Self-Checking Efficiency (p_s) Histogram
For those errors which were correctly self-checked, the error factor (x/xm)
is a measure of the magnitude of the "initial" error. Note that "initial" error
refers to an error before its possible correction.
Previous research has shown that the occurrence rate of error varies with
type of calculation, and that shorter calculations occur more frequently than
larger ones (Melchers 1989). These factors would influence the rate of error
detection and accordingly the survey data obtained were appropriately cor-
rected (Stewart 1987). Fig. 1 shows the histogram obtained for checking
efficiency as a function of error factor. A checking efficiency of unity in-
dicates that all errors were detected, a value of zero that none were detected.
Also shown in Fig. 1 is an empirical model based on fitting to the data a
modified Type I extreme value probability distribution (Stewart and Melch-
ers 1987a).
Comment
The process of self-checking is a complex one which appears to operate
within the subconscious level of thought. At best, the data can provide only
an indication of the underlying trends of the self-checking process.
The results support Grill's (1984) proposition, made in relation to prac-
ticing structural designers, that self-checking detects only the small or minor
errors that may occur in calculations, and that self-checking cannot adequately
safeguard against errors due to misconceptions, oversights, or misunderstandings.
The latter errors are the results of deliberate and conscious
decisions that, once taken, appear seldom to be doubted by the designer
himself. At present little understanding of these types of errors appears to
be available; it would appear that there can be no effective self-checking
effort for this type of error.
The present survey demonstrates that the detection rate for self-checking
for small, or minor, initial error magnitudes is much greater than for larger
initial error magnitudes. It might be concluded that (as a group) designers
INDEPENDENT DETAILED CHECKING
Survey Methodology
In the study of the effectiveness of independent detailed design checking,
two factors influencing checking effectiveness were isolated as being of par-
ticular interest and also obtainable from survey results. These were total time
taken for checking [and therefore, indirectly, checking effort (Lind 1983)]
and error magnitude.
To obtain this information, a mailed survey technique was adopted. It was
recognized that while this approach might alert prospective respondents about
the real nature of the exercise (and thus lead to excessively good error de-
tection rates), no other viable alternative existed. At least indicative data
would be obtained. The survey was mailed to 150 civil engineering orga-
nizations and individuals in the state of Victoria, Australia. The total number
of responses was 47.
Prospective respondents were asked to detect and correct any errors or
mistakes in three pages of computations of design loadings for a steel portal
frame structure, and to record the checking time taken.
The types of errors which might occur in a design task have previously
been reviewed (Stewart and Melchers 1986). The most significant errors ap-
peared to be errors of commission and errors of omission. Because of the
difficulty in data analysis associated with errors of omission, only errors of
commission were included in the task to be checked.
As is shown in Table 1, the design consisted of 93 individual microtasks
that required checking. Of these, deliberate errors were incorporated in nine
microtasks. Successful error detection was defined to occur when the re-
spondent indicated clearly any error and corrected it in some way.
Mathematical Models
Fig. 2 shows the data points for checking efficiency (p_ind), defined as the
ratio of errors detected to errors present, plotted against checking time (t)
FIG. 2. Comparison of Checking Efficiency Models as a Function of Checking
Time (A = 376.4, B = 1.4365, t_0 = 10, a_1 = 0.05, a_2 = 0.095; Integers Refer to
Number of Concurrent Data Values)
as obtained from the survey. The data points show considerable scatter, which
is attributed to the unavoidable lack of control over test conditions.
Nevertheless, the data do suggest a trend.
It has been suggested (e.g., Kupfer and Rackwitz 1980) that error detec-
tion is related to search theory and can be expressed as a negative expo-
nential curve:
p_ind(t) = 1 - exp (-a_1 t)   (2)

where p_ind(t) = the average checking efficiency as a function of checking
time t. The constant a_1 may be assumed to be proportional to the level of
detail of the examination and to the characteristics of the checker, and
inversely proportional to the task size. Statistical tests indicated that this
model does not provide a reasonable fit to the data for any value of a_1 (see Fig. 2).
A better description of design checking as a function of checking time is
possible by using an S-shaped "learning curve" as used in the field of psy-
chology of education (Estes 1959; Hull 1952). In terms of design checking,
the use of an S-curve has some appeal. The initial increase in checking ef-
ficiency may be attributed to the designer attempting to understand the de-
sign concept and procedure. This is followed by a period of checking each
microtask for any errors and in which many of the errors would be detected.
Finally the designer would reach the stage of diminishing returns for his
effort, resulting in a reduced rate of checking efficiency. An appropriate S-
curve is
p_ind(t) = 1 / [1 + A exp (-B t^(1/2))]   (3)

where the constant A must be sufficiently large to ensure p_ind(0) ≈ 0; and
constant B is inversely proportional to the task complexity and proportional
[FIG. 3. Checking efficiency plotted against error magnitude m_e, with fitted curves of negative exponential and S-curve form; plot data not recoverable]
and this is also seen to provide a reasonable fit to the data when t_0 = 10
and a_2 = 0.095 (see Fig. 2). Evidence from education psychology indicates
that "learning curves" progressively change from S-shaped to negative ex-
ponential as the subject's training increases (Harlow 1959). This observation
is of relevance since a superior checking efficiency would be expected from
an engineer with relevant expertise or experience in similar designs. This
could result in t0 reducing with experience. It is unlikely, however, that t0
would reduce to zero, since each design is unique and even expert design
checkers would require some effort to become familiar with the design. A
"learning curve" of the form given by Eqs. 3 or 4 therefore appears more
appropriate than the negative exponential curve, Eq. 2.
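The three candidate models can be compared numerically using the parameter values quoted in Fig. 2. This is a sketch only; it assumes the shifted negative exponential (the Eq. 4 form) is taken as zero for t ≤ t_0:

```python
import math

# Parameter values quoted in the Fig. 2 caption of the source.
A, B = 376.4, 1.4365
a1, a2, t0 = 0.05, 0.095, 10.0

def p_neg_exp(t):
    """Eq. 2: negative exponential checking-efficiency model."""
    return 1.0 - math.exp(-a1 * t)

def p_s_curve(t):
    """Eq. 3: S-shaped 'learning curve' model."""
    return 1.0 / (1.0 + A * math.exp(-B * math.sqrt(t)))

def p_shifted_exp(t):
    """Shifted negative exponential (Eq. 4 form), assumed zero up to t_0."""
    return 0.0 if t <= t0 else 1.0 - math.exp(-a2 * (t - t0))

# Tabulate the three models over the 0-40 min range shown in Fig. 2.
for t in (0, 10, 20, 30, 40):
    print(t, round(p_neg_exp(t), 3), round(p_s_curve(t), 3),
          round(p_shifted_exp(t), 3))
```

Note that p_s_curve(0) = 1/(1 + A) ≈ 0.003, illustrating the requirement that A be large enough to make p_ind(0) ≈ 0.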
When the error magnitude (m_e) relative to the correct value x_m, defined as

m_e = |x - x_m| / x_m   (5)

is plotted (Fig. 3) against checking efficiency for each error and for
responses with a similar checking time (20 ± 1 min), it is seen that larger
errors are more easily detected than smaller ones. Such an observation seems
reasonable, and may be incorporated in the negative exponential model
(Eq. 2) proposed by Lind (1983) or the shifted negative exponential model
(Eq. 4).
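A minimal sketch of this relationship follows. The printed Eq. 5 is illegible in the scan, so |x - x_m| / x_m is our assumed reading, and the detection-rate constant c below is hypothetical rather than a fitted value from the source:

```python
import math

def error_magnitude(x: float, x_m: float) -> float:
    """Error magnitude m_e of the incorrect value x relative to the
    correct value x_m (assumed reading of Eq. 5)."""
    return abs(x - x_m) / x_m

def p_detect(m_e: float, c: float = 1.0) -> float:
    """Negative exponential detection model in error magnitude (the Eq. 2
    form with m_e in place of t); c is a hypothetical rate constant."""
    return 1.0 - math.exp(-c * m_e)

print(error_magnitude(30.0, 20.0))  # 0.5
# Larger errors are detected with higher probability:
print(round(p_detect(0.1), 3), round(p_detect(2.0), 3))
```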
OVERVIEW CHECKING
Survey Methodology
In this study practicing engineers were again used as subjects, with a ques-
tionnaire mailed to 210 civil engineering organizations and individuals
throughout Australia. A total of 105 survey responses were obtained.
Decisions as to the adequacy of 11 simple structural designs, all simply
supported beam members, each with a different loading configuration, were
required. Nine designs used steel universal beams, two used reinforced con-
crete sections. The possible response options were preselected as "under-
sized," "correct," "oversized," and "unsure." It was clearly stated that the
decision was to be based on personal judgement, without the aid of engi-
neering design aids or detailed calculations, and based on previous experi-
ence with Australian codes and practice. The respondents were also re-
quested to record both their response time and the extent of relevant professional
engineering experience (in years).
In the following, the member sizes shown in the survey sheet will be
termed the "suggested" design for each case, and the theoretically correct
member size as simply the "correct" design.
Responses marked as "unsure" were ignored in further
analysis. The reinforced concrete designs led to the highest proportion of
"unsure" responses (5.2%). By comparison, only 0.4% of steel member de-
signs were recorded as "unsure."
The relative degree of adequacy of a "suggested" member design was
measured through the percentage resistance error (R_e), defined as the
percentage difference between the "correct" design (R_CD) and the "suggested"
design (R_SD), using bending moment resistance as a comparative measure

R_e = [(R_SD - R_CD) / R_CD] x 100%   (8)
Member resistance was considered to be adequately described by working
stress methods as specified in Australian Standard design codes (AS 1250
and AS 1480). The appropriate "correct" design, the percentage resistance
error, and the appropriate response to the survey question are given for each
case in Table 2. It is evident that the "suggested" design is in seven cases
overdesigned, in three underdesigned, and in one correctly designed.
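Eq. 8 can be sketched directly; the sign convention (positive R_e for an oversized member) is inferred from the discussion of over- and underdesigned cases:

```python
def resistance_error(R_SD: float, R_CD: float) -> float:
    """Percentage resistance error R_e per Eq. 8: the percentage difference
    of the 'suggested' bending resistance R_SD from the 'correct' bending
    resistance R_CD. Positive values are taken to indicate oversizing."""
    return (R_SD - R_CD) / R_CD * 100.0

# A suggested beam with twice the required bending resistance:
print(resistance_error(200.0, 100.0))  # 100.0 -> oversized
# An undersized member gives a negative R_e:
print(resistance_error(80.0, 100.0))   # -20.0 -> undersized
```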
Mathematical Models
A number of factors are known to influence the effectiveness of control
measures (Ingles 1986). Of these, the following were singled out for atten-
tion in the present study: percentage resistance error, experience, and check-
ing time (and therefore, indirectly, checking effort). To consider these fac-
tors systematically, a model of the overview checking process is required.
It was found difficult to devise a measure of overview checking effectiveness
in terms of error detection efficiency. Overview checking tends to
be concerned with the outcome of a number of design steps and processes,
and as such is unlikely to detect an error in any one of them individually.
Design reviewers are more concerned with the functionality of the result.
Accordingly, a simple overview checking model consisting of two decisions
was formulated:
[FIG. 4. Probability p_safe plotted against Percentage Resistance Error R_e; plot data not recoverable]

z = (R_e - χ) / σ   (9)

p_safe(R_e) = 1 - ∫ f(z, ν) dz;   R_e < χ   (10)

where f(z, ν) = probability density function for the t distribution; and χ,
σ, ν, and δ = constants. The model is compared with survey data in Fig. 4.
The parameters χ, σ, ν, and δ are given in Table 3.
Fig. 4 shows that the proposed model intercepts all except one of the 95%
confidence intervals. This suggests that the present model is reasonably ap-
propriate; however, other models could also be postulated.
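A self-contained sketch of such a "safe" decision model follows. The source fits a t distribution with ν degrees of freedom; for the sketch we substitute the normal CDF (the ν → ∞ limit, available in the standard library), and the location/scale values used here are illustrative, not the fitted Table 3 values:

```python
from statistics import NormalDist

def p_safe(R_e: float, chi: float, sigma: float) -> float:
    """Probability of judging a 'suggested' member safe, modeled as a smooth
    increasing function of the resistance error R_e. chi and sigma play the
    role of the location and scale constants; a normal CDF stands in for the
    t-distribution integral of the source's Eqs. 9-10."""
    return NormalDist(mu=chi, sigma=sigma).cdf(R_e)

# Illustrative parameters only:
for R_e in (-100.0, -20.0, 0.0, 100.0, 300.0):
    print(R_e, round(p_safe(R_e, chi=-20.0, sigma=40.0), 3))
```

The monotonic increase with R_e reflects the expected behavior: the more oversized a member, the more likely a reviewer is to judge it safe.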
The second part of the decision process is concerned with the probability
of selecting a member as "oversized" given that the "suggested" member
has previously been deemed "safe." This is a conditional probability p_oversized|safe.
The general trend of p_oversized|safe against R_e shown in Fig. 5 is not unexpected;
as the percentage resistance error increases, so does the degree of
oversizing, resulting in a higher proportion of "oversized" responses.
It is possible to develop an oversized|safe decision model also based on
[FIG. 5. Probability p_oversized|safe plotted against Percentage Resistance Error R_e; plot data not recoverable]
where p_safe(R_e) is defined by Eq. 10. The parameters for this model are given
in Table 3. This model is shown with the survey data in Fig. 5 and is seen
to provide a reasonable fit.
The probability of judging a designed member as "correct" is evidently

p_correct = p_safe · p_correct|safe = p_safe · (1 - p_oversized|safe)   (12)
Fig. 6 shows the general model for judging the designed member as "cor-
rect," and its comparison to the survey data.
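Eq. 12 composes the two decisions directly, as this minimal sketch shows:

```python
def p_correct(p_safe: float, p_oversized_given_safe: float) -> float:
    """Eq. 12: a member is judged 'correct' when it is first judged safe
    and then not judged oversized."""
    return p_safe * (1.0 - p_oversized_given_safe)

# E.g. a 90% chance of 'safe' with a 30% conditional chance of 'oversized':
print(round(p_correct(0.9, 0.3), 2))  # 0.63
```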
Effect of Experience
Experience is a term that is widely used in the profession, but one which
lacks precise definition in terms which can be quantitatively interpreted. Due
to the relatively small sample size obtained from the survey, the use of a
continuous variable to represent "experience" for mathematical models was
not possible. Hence a binary variable was adopted: "inexperienced" and
"experienced." These two terms were defined arbitrarily, in three alternative
ways:
E1. Lower and upper 20th percentiles of experience (remaining 60% ignored).
E2. Lower and upper 50th percentiles of experience.
E3. Less than four years and greater than four years experience.
Both graphical and statistical methods were used to examine the effects
of experience for decision-making effectiveness. Nonparametric statistical
tests of significance were employed because the survey data were in a di-
chotomized format. The most powerful of these tests is the Randomization
Test for Matched Pairs. A more general but less powerful test is the Cochran
Test, which was used to confirm results from the Randomization Test (Siegal
1956).
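The Randomization Test for Matched Pairs can be sketched as an exact sign-flip permutation test: under the null hypothesis, each within-pair difference is equally likely to take either sign. The data below are hypothetical, not the survey's:

```python
from itertools import product

def randomization_test_matched_pairs(diffs):
    """Exact two-sided randomization test for matched pairs: the p-value is
    the fraction of the 2**n sign assignments whose summed difference is at
    least as extreme as the observed sum."""
    observed = abs(sum(diffs))
    n = len(diffs)
    extreme = 0
    for signs in product((1, -1), repeat=n):
        if abs(sum(s * d for s, d in zip(signs, diffs))) >= observed:
            extreme += 1
    return extreme / 2 ** n

# Hypothetical paired detection scores ("experienced" minus "inexperienced"):
diffs = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1]
print(round(randomization_test_matched_pairs(diffs), 4))  # 0.0078
```

Full enumeration is feasible only for small n (here 2^10 = 1,024 assignments); for larger samples the assignments would be randomly sampled instead.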
The significance tests for p_safe concluded that it is highly likely that the
probability of selecting a designed member as safe is not related to the
experience level of the overview checker. Further support for this conclusion
is found in the observation that the proposed model for p_safe given by Eq.
10 fits within the 95% confidence intervals for all subsamples, irrespective
of experience categories (E1, E2, and E3).
For the second decision, however, the null hypothesis of no difference
between p_oversized|safe for "inexperienced" and "experienced" conditions was
rejected at the 5% level. It can therefore be stated with considerable
certainty that p_oversized|safe increases for positive R_e values when the
experience level is high.
The effect of experience on p_oversized|safe may be modeled using Eq. 11, with
changed parameters for the "inexperienced" and "experienced" conditions. The
relevant parameters for the model, as determined by experience level, are
given in Table 3. The two resulting models were compared with the relevant
survey data for each of the three subsamples of "experienced" and
"inexperienced" respondents. In the main, the two proposed models plotted
within the 95% confidence intervals of the respective survey data, lending
credibility to the models.
If, as has been suggested, p_oversized|safe is experience-dependent, then p_correct
evaluated from Eq. 12 must also be experience-dependent. A comparison of
the two proposed models for experience (i.e., "inexperienced" and "experienced")
with the average experience model for p_correct is shown in Fig. 7.
It appears that for R_e ≈ 0 the p_correct values are somewhat contradictory; p_correct
for inexperienced engineers is slightly higher than that for experienced
engineers. This observation is also supported by the survey data for the
"correct" response. The reason for this remains unclear, but is most likely
due to variability in the survey data.
Effect of Time
The effects of response time were evaluated statistically in a manner rather
similar to that applied to the study of experience. For both decisions, the
effects of response time were considered to be negligible, as indicated by
the nonrejection (at the 5% level) of the null hypothesis of no differences
due to time. It was also shown that there was no statistical evidence of a
relationship between experience level and response time.
[FIG. 7. Probability p_correct plotted against Percentage Resistance Error R_e; plot data not recoverable]
REVIEW
FIG. 8. One-Step Calculation Error Distribution and Effect of Checking: (a) Initial
Error Distribution; (b) Error Distribution After Self-Checking; (c) Error Distribution
After "Independent Checking" According to Eq. 7; (d) Error Distribution After
"Independent Checking" According to Eq. 8
procedures, it is clear that the present work is only a first step in this di-
rection.
Finally, it is readily acknowledged that the methods and techniques em-
ployed may be criticized on a number of grounds (Stewart and Melchers
1985, 1986, 1987). However, the deficiencies in technique reflect the newness
of this research area. The only work comparable to the present investigations
is in "human factors engineering," or "ergonomics." These fields deal mainly
with man-machine interfaces and hence mainly with psychomotor tasks, whereas
the tasks involved in design checking are mainly of a cognitive nature.
CONCLUSION
APPENDIX I. REFERENCES
Norman, D. A. (1981). "Categorization of action slips." Psychological Review, 88(1),
1-15.
Rabbitt, P. (1978). "Detection of errors by skilled typists." Ergonomics, 21(11),
945-958.
Siegal, S. (1956). Non-parametric Statistics for Behavioral Sciences. McGraw-Hill,
New York, N.Y.
Sneath, N. (1979). "Discussion paper on liability and indemnity under conditions of
finite risk." Third Int. Conf. on Statistics and Probability in Soil and Struct. Engrg.,
Sydney, Australia, 419-422.
Standards Association of Australia, (n.d.). SAA Steel Structures Code, AS 1250,
Sydney, Australia.
Standards Association of Australia, (n.d.). SAA Concrete Structures Code, AS 1480,
Sydney, Australia.
Stewart, M. G. (1987). "Control of human errors in structural design." Thesis pre-
sented to the Department of Civil Engineering and Surveying, University of New-
castle, at New South Wales, Australia, in partial fulfillment of the requirements
for the degree of Doctor of Philosophy.
Stewart, M. G., and Melchers, R. E. (1985). "Human error in structural reliability—
IV: Efficiency in design checking." Res. Rept. 3/1985, Dept. of Civ. Engrg.,
Monash Univ., Melbourne, Australia.
Stewart, M. G., and Melchers, R. E. (1986). "Human error in structural reliability—
V: Efficiency in self-checking." Res. Rept. 018.12.86, Dept. of Civ. Engrg. and
Surveying, Univ. of Newcastle, Newcastle, Australia.
Stewart, M. G., and Melchers, R. E. (1987a). "Human error in structural reliabil-
ity—VI: Overview checking." Res. Rept. 019.01.87, Dept. of Civ. Engrg. and
Surveying, Univ. of Newcastle, Newcastle, Australia.
Stewart, M. G., and Melchers, R. E. (1987b). "Structural design and design checking."
Proc., First Nat. Struct. Engrg. Conf., Melbourne, I.E. Australia, 700-705.
Voth, R. T. (1974). "An experimental study comparing the effectiveness of three
training methods in human relations attitudes and decision making skills."
Dissertation Abstracts Int. (A), University Microfilms International, Michigan,
6817-6818.
Walker, A. C. (1980). "Study and analysis of the first 120 failure cases." Symp.,
Struct. Failures in Bldgs., Inst, of Struct. Engrs., London, U.K., 15-40.
Zakay, D., and Wooler, S. (1984). "Time pressure, training and decision effective-
ness." Ergonomics, 27(3), 273-284.
APPENDIX II. NOTATION

The following symbols are used in this paper:

A = constant;
B = constant;
c_0 = constant;
E1 = lower and upper 20th percentiles of experience;
E2 = lower and upper 50th percentiles of experience;
E3 = less than 4 years and greater than 4 years experience;
e_log = logarithmic error factor;
f(z, ν) = t distribution probability density function;
p_s = self-checking efficiency;
p_safe = probability of judging a design as "safe";
R_e = percentage resistance error;
R_CD = bending moment resistance for "correct" design;
R_SD = bending moment resistance for "suggested" design;
t = checking time;
t_0 = time to become familiar with design to be checked;
x = incorrect value;
x_m = correctly self-checked response;
χ = constant;
z = standard variable;
a_1 = constant;
a_2 = constant;
δ = constant;
ν = constant; and
σ = constant.