
Cleanroom Software Engineering for Zero-Defect Software

Richard C. Linger

IBM Cleanroom Software Technology Center


100 Lakeforest Blvd.
Gaithersburg, MD 20877

0270-5257/93 $03.00 © 1993 IEEE

Abstract

Cleanroom software engineering is a theory-based, team-oriented process for developing very high quality software under statistical quality control. Cleanroom combines formal methods of object-based box structure specification and design, function-theoretic correctness verification, and statistical usage testing for quality certification, to produce software that is zero defects with high probability. Cleanroom management is based on a life cycle of incremental development of user-function software increments that accumulate into the final product. Cleanroom teams in IBM and other organizations are achieving remarkable quality results in both new system development and modifications and extensions to existing systems.

Keywords: Cleanroom software engineering, formal specification, box structures, correctness verification, statistical usage testing, software quality certification, incremental development.

Zero-defect software

On first thought, zero-defect software may seem an impossible goal. After all, the experience of the first human generation in software development has reinforced the seeming inevitability of errors and the persistence of human fallibility. Today, however, a new reality in software development belies this first-generation experience [1]. Although it is theoretically impossible to ever know for certain that a software product has zero defects, it is possible to know that it has zero defects with high probability. Cleanroom software engineering teams are developing software that is zero defects with high probability, and doing so with high productivity. Such performance depends on mathematical foundations in program specification, design, correctness verification, and statistical quality control, as well as on engineering discipline in their application.

In traditional software development, errors were regarded as inevitable. Programmers were urged to get software into execution quickly, and techniques for error removal were widely encouraged. The sooner the software could be written, the sooner debugging could begin. Programs were subjected to private unit testing and debugging, then integrated into components with more debugging, and finally into subsystems and systems with still more debugging. At each step, new interface and design errors were found, many the result of debugging in earlier steps. Product use by customers was simply another step in debugging, to correct errors discovered in the field. The most virulent errors were usually the result of fixes to other errors in development and maintenance [2]. It was not unusual for software products to reach a steady-state error population, with new errors introduced as fast as old ones were fixed. Today, debugging is understood to be the most error-prone process in software development, leading to "right in the small, wrong in the large" programs, and nightmares of integration where all parts are complete but do not work together because of deep interface and design errors.

In the Cleanroom process, correctness is built in by the development team through formal specification, design, and verification [3]. Team correctness verification takes the place of unit testing and debugging, and software enters system testing directly, with no execution by the development team. All errors are accounted for from first execution on, with no private debugging permitted. Experience shows that Cleanroom software typically enters system testing near zero defects and occasionally at zero defects.

The certification (test) team is not responsible for testing in quality, an impossible task, but rather for certifying the quality of software with respect to its specification. Certification is carried out by statistical usage testing that produces objective assessments of product quality. Errors, if any, found in testing are returned to the development team for correction. If quality is not acceptable, the software is removed from testing and returned to the development team for rework and reverification.

The process of Cleanroom development and certification is carried out incrementally. Integration is continuous, and system functionality grows with the addition of successive increments. When the final increment is complete, the system is complete. Because at each stage the harmonious operation of future increments at the next level of refinement is predefined by increments already in execution, interface and design errors are rare.

The Cleanroom process is being successfully applied in IBM and other organizations. The technology requires some training and practice, but builds on existing skills and software engineering practices. It is readily applied to both new system development and re-engineering and extension of existing systems. The IBM Cleanroom Software Technology Center (CSTC) [4] provides technology transfer support to Cleanroom teams through education and consultation.

Cleanroom quality results

Table 1 summarizes quality results from Cleanroom projects. Earlier results are reported in [5]. The projects report a "certification testing failure rate"; for example, the rate for the IBM Flight Control project was 2.3 errors per KLOC, and for the IBM COBOL Structuring Facility project, 3.4 errors per KLOC. These numbers represent all errors found in all testing, measured from first-ever execution through test completion. That is, the rates represent residual errors present in the software following correctness verification by development teams.

The projects in Table 1 produced over a half million lines of Cleanroom code with a range of 0 to 5.1 errors per KLOC for an average of 3.3 errors per KLOC found in all testing, a remarkable quality achievement indeed.

Traditionally developed software does not undergo correctness verification. It goes from development to unit testing and debugging, then more debugging in function and system testing. At entry to unit testing, traditional software typically exhibits 30-50 errors/KLOC. Traditional projects often report errors beginning with function testing (or later), omitting errors found in private unit testing. A traditional project experiencing, say, five errors/KLOC in function testing may have encountered 25 or more errors per KLOC when measured from first execution in unit testing. Quality comparisons between traditional and Cleanroom software are meaningful when measured from first execution.

Experience has shown that there is a qualitative difference in the complexity of errors found in Cleanroom and traditional code. Errors left behind by Cleanroom correctness verification, if any, tend to be simple mistakes easily found and fixed by statistical testing, not deep design or interface errors. Cleanroom errors are not only infrequent, but usually simple as well.

Highlights of Cleanroom projects reported in Table 1 are described below:

IBM Flight Control. A HH60 helicopter avionics component was developed on schedule in three increments comprising 33 KLOC of JOVIAL [6]. A total of 79 corrections were required during statistical certification for an error rate of 2.3 errors per KLOC for verified software with no prior execution or debugging.

IBM COBOL Structuring Facility (COBOL/SF). COBOL/SF, IBM's first commercial Cleanroom product, was developed by a six-person team. The product automatically transforms unstructured COBOL programs into functionally equivalent structured form for improved understandability and maintenance. It makes use of proprietary graph-theoretic algorithms, and exhibits a level of complexity on the order of a COBOL compiler.

The current version of the 85 KLOC PL/I product required 52 KLOC of new code and 179 corrections during statistical certification of five increments, for a rate of 3.4 errors per KLOC [7]. Several major components completed certification with no errors found. In an early support program at a major aerospace corporation, six months of intensive use resulted in no functional equivalence errors ever found [8]. Productivity, including all specification, design, verification, certification, user publications, and management, averaged 740 LOC per person-month. Challenging schedules defined for competitive reasons were all met. A major benefit of Cleanroom products is dramatically reduced maintenance costs. COBOL/SF has required less than one person-year per year for all maintenance and customer support.

Table 1. Cleanroom Quality Results

1987  Cleanroom Software Engineering
      IBM Flight Control: Helicopter Avionics System Component
      33 KLOC (JOVIAL)
      ● Certification testing failure rate: 2.3 errors/KLOC
      ● Error-fix reduced 5x
      ● Completed ahead of schedule

1988  Cleanroom Software Engineering
      IBM COBOL Structuring Facility: Product for automatically restructuring COBOL programs
      85 KLOC (PL/I)
      ● IBM's first Cleanroom product
      ● Certification testing failure rate: 3.4 errors/KLOC
      ● Productivity: 740 LOC/PM
      ● Deployment failures: 0.2 errors/KLOC, all simple fixes

1989  Partial Cleanroom Software Engineering
      NASA Satellite Control Project 1
      40 KLOC (FORTRAN)
      ● Certification testing failure rate: 4.5 errors/KLOC
      ● 50-percent improvement in quality
      ● Productivity: 780 LOC/PM
      ● 80-percent improvement in productivity

1990  Cleanroom Software Engineering
      University of Tennessee: Cleanroom tool
      12 KLOC (Ada)
      ● Certification testing failure rate: 3.0 errors/KLOC

1990  Cleanroom Software Engineering
      Martin Marietta: Automated documentation system
      1.8 KLOC (FOXBASE)
      ● First compilation: no errors found
      ● Certification testing failure rate: 0.0 errors/KLOC (no errors found)

1991  Cleanroom Software Engineering
      IBM System Software
      First increment, 0.6 KLOC (C)
      ● First compilation: no errors found
      ● Certification testing failure rate: 0.0 errors/KLOC (no errors found)

1991  Partial Cleanroom Software Engineering
      IBM System Product
      Three increments, total 107 KLOC (mixed languages)
      ● Testing failure rate: 2.6 errors/KLOC
      ● Productivity: 486 LOC/PM

1991  Cleanroom Software Engineering
      IBM Language Product
      First increment, 21.9 KLOC (PL/X)
      ● Testing failure rate: 2.1 errors/KLOC

      IBM Image Product Component
      3.5 KLOC (C)
      ● First compilation: 5 syntax errors
      ● Certification testing failure rate: 0.9 errors/KLOC

      IBM Printer Application
      First increment, 6.7 KLOC (C)
      ● Certification testing failure rate: 5.1 errors/KLOC

1992  Partial Cleanroom Software Engineering
      IBM Knowledge Based System Application
      17.8 KLOC (TIRS)
      ● Testing failure rate: 3.5 errors/KLOC

1992  Cleanroom Software Engineering
      NASA Satellite Control Projects 2 and 3
      170 KLOC (FORTRAN)
      ● Testing failure rate: 4.2 errors/KLOC

      IBM Device Controller
      First increment, 39.9 KLOC (C)
      ● Certification testing failure rate: 1.8 errors/KLOC

      IBM Database Transaction Processor
      First increment, 8.5 KLOC (JOVIAL)
      ● Testing failure rate: 1.8 errors/KLOC
      ● No design errors, all simple fixes

1993  Partial Cleanroom Software Engineering
      IBM LAN Software
      First increment, 4.8 KLOC (C)
      ● Testing failure rate: 0.8 errors/KLOC

Note: All testing failure rates are measured from first-ever execution, where correctness verification has taken the place of unit testing and debugging. KLOC = thousand lines of code; PM = person-month.

NASA Satellite Control Project 1. The Coarse/Fine Attitude Determination System (CFADS) of the NASA Attitude Ground Support System (AGSS) was the first Cleanroom project carried out by the Software Engineering Laboratory (SEL) of the NASA Goddard Space Flight Center [9]. The system, comprised of 40 KLOC of FORTRAN, exhibited a certification failure rate of 4.5 errors per KLOC. Productivity was 780 LOC per person-month, an 80% improvement over previous SEL averages. Some 60% of the programs compiled correctly on the first attempt.

Martin Marietta Automated Documentation System. A four-person Cleanroom team developed the prototype of the Automated Production Control Documentation System, a relational data base application of 1820 lines programmed in FOXBASE. No compilation errors were found, and no failures were encountered in statistical testing and quality certification. The software was certified at target levels of reliability and confidence. Team members attributed error-free compilation and failure-free testing to the rigor of the Cleanroom methodology [10].

IBM System Software. A four-person Cleanroom team developed the first increment of a system software product in C. The increment of 0.6 KLOC compiled with no errors, and underwent certification through 130 statistical tests with no errors found. Subsequent use in another environment resulted in one specification change.

IBM System Product. A Cleanroom organization of 50 people developed a complex system software product. The system, written in PL/I, C, REXX, and TIRS, was developed in three increments totaling 107 KLOC, with an average of 2.6 errors/KLOC found in testing [11]. Causal analysis of errors in the first increment revealed that five of its eight components experienced no errors whatsoever in testing. The project reported development team productivity of 486 LOC per person-month.

IBM Language Product. A seven-person Cleanroom team developed an extension to a language product. The first increment of 21.9 KLOC was up and cycling in less than half the time normally required, and exhibited a certification error rate of 2.1 errors/KLOC in testing.

IBM Image Product Component. A 3.5 KLOC image product component was developed to compress and decompress data from a Joint Photographic Expert Group (JPEG) data stream. The component exhibited three errors in testing, all simple mistakes. No additional errors have been found in subsequent use.

IBM Printer Application. An eleven-member team developed the first increment of a graphics layout editor in C under OS/2 Presentation Manager. The editor operates in a complex environment of vendor-developed code that exports more than 1000 functions, and uses many of the 800 functions of OS/2 PM. The first increment of 6.7 KLOC exhibited a rate of 5.1 errors/KLOC in testing [12]. All but 1.9 errors/KLOC were attributed to the vendor code interface and PM and C misunderstandings.

IBM Knowledge Based System Application. A five-person team developed a prototype knowledge-based system for the FAA Air Traffic Control System. The team reported a total of 63 errors for the 17.8 KLOC application, for a rate of 3.5 errors/KLOC. The fact that Cleanroom errors tend to be simple mistakes was borne out by project experience; only two of the 63 errors were classified as severe, and only five required design changes. The team developed a special design language for knowledge-based applications, together with proof rules for correctness verification.

NASA Satellite Control Projects 2 and 3. A 20 KLOC attitude determination subsystem of the Solar, Anomalous, and Magnetospheric Particle Explorer satellite flight dynamics system was the second Cleanroom project carried out by the Software Engineering Laboratory of the NASA Goddard Space Flight Center. The third project was a 150 KLOC flight dynamics system for the ISTP Wind/Polar satellite. These projects reported a combined error rate of 4.2 errors/KLOC in testing [13].

IBM Device Controller. A five-person team developed two increments of device controller design and microcode in 40 KLOC of C, including 30.5 KLOC of function definitions. Box structure specification of chip set semantics revealed a number of hardware errors prior to any execution. The multiple processor, bus architecture device processes multiple real-time input and output data streams. The project reported a failure rate of 1.8 errors/KLOC in testing.

IBM Database Transaction Processor. A five-person team developed the first increment of a host-based database transaction processor in 8.5 KLOC of JOVIAL. Rigorous use of correctness verification resulted in a failure rate of 1.8 errors/KLOC in testing, with no design errors encountered. The team reported that correctness verification reviews were far more effective in detecting errors than were traditional inspections.

IBM LAN Software. A four-person team developed the first increment of a LAN-based object server in 4.8 KLOC of C, resulting in a failure rate of 0.8 errors/KLOC in testing. The team utilized a popular CASE tool for recording specifications and designs.

Cleanroom management by incremental development

Management planning and control in Cleanroom is based on developing and certifying a pipeline of software increments that accumulate to the final product. The increments are developed and certified by small, independent teams, with teams of teams for large projects. Determining the number and functional content of increments is an important task driven by requirements, schedule, and resources. Functional content should be defined such that increments accumulate to the final product for continual integration, execute in the system environment for statistical usage testing, and represent end-to-end user function for quality certification.

An incremental development of a miniature interactive application is shown in Figure 1, together with corresponding development and certification pipelines. Each increment is handed off from development to certification pipelines in turn, and results in a new quality measurement in MTTF. Early increments that implement system architecture receive more cumulative testing than later increments that implement localized functions. In this way, major architectural and design decisions are validated prior to their elaboration at lower levels.

The time required for design and verification of increments varies with their size and complexity, and careful planning and allocation of resources is required to deliver successive increments to certification on schedule. Long-lead-time increments may require parallel development.

[Figure 1. A Miniature Incremental Development. Increment 1 provides installation, sign on, and sign off, with stubbed panel navigation and primary functions; Increment 2 provides panel navigation; Increment 3 provides primary functions with stubbed secondary functions; Increment 4 provides secondary functions. Increments flow through development and certification pipelines, with MTTF measured for increment 1; increments 1,2; increments 1,2,3; and the complete system.]

Figure 2 illustrates the Cleanroom life cycle of incremental development and certification. The functional specification is created by the development team, or by a separate specification team for large projects, and the usage specification is created by the certification team. Based on these specifications, a joint planning process defines the initial incremental development and certification plan for the product. The development team then carries out a design and verification cycle for each increment, based on the functional specification, with corresponding statistical test case preparation by the certification team based on the usage specification. Completed increments are periodically delivered to certification for statistical testing and computation of MTTF estimates and other statistical measures. Errors are returned to the development team for correction. If the quality is low, improvements in the development process are initiated. As with any process, a good deal of iteration and feedback is always present to accommodate problems and solutions.

The Cleanroom incremental development life cycle is intended to be "quick and clean," not "quick and dirty" [14]. The idea is to quickly develop the right product with high quality for the user, then go on to the next version to incorporate new requirements arising from user experience.

Experienced Cleanroom teams with sufficient knowledge of subject matter and processing environment can achieve substantially reduced product development cycles. The precision of Cleanroom development eliminates rework and results in dramatically reduced time for certification testing compared to traditional methods. And Cleanroom teams are not hostage to error correction following product release.

Cleanroom affords a new level of manageability and control in adapting to changing requirements. Because formally engineered software is well-documented and under good intellectual control throughout development, the impact of new requirements can be accurately assessed, and changes can be planned and accommodated in a systematic manner.
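Certification computes MTTF (mean time to failure) estimates from statistical testing of each accumulated set of increments. As a minimal numerical illustration only (Cleanroom certification applies statistical reliability models, not a plain average), the sketch below treats an MTTF estimate as total test time divided by failures observed; the increment labels and interfail times are hypothetical.

```python
# Minimal illustration (not the certification models used by Cleanroom
# teams): estimate MTTF as the mean of interfail times observed during
# statistical usage testing. All numbers below are hypothetical.

def mttf_estimate(interfail_times):
    """Total test time divided by number of failures observed."""
    if not interfail_times:
        raise ValueError("no failures observed; this simple estimate is undefined")
    return sum(interfail_times) / len(interfail_times)

# Hypothetical interfail times (hours of statistical testing between
# successive failures) as increments accumulate in the pipeline.
interfail_by_accumulation = [
    ("Incr 1",     [2.0, 5.0, 11.0]),
    ("Incr 1,2",   [4.0, 9.0, 23.0]),
    ("Incr 1,2,3", [8.0, 19.0, 45.0]),
]

for label, times in interfail_by_accumulation:
    print(f"{label}: estimated MTTF = {mttf_estimate(times):.1f} hours")
```

A rising estimate across accumulations is the kind of objective quality signal management can use to accept an increment or return it for rework and reverification.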

[Figure 2. The Cleanroom Life Cycle. Customer requirements drive function and usage specification; incremental development planning defines the increment plan; increments proceed through design and verification by the development team and statistical test case generation by the certification team; statistical testing yields interfail times from which MTTF estimates are computed.]

Incremental development provides a framework for replanning schedules, resources, and functional content, and permits changes to be incorporated and packaged in a stepwise fashion.

Cleanroom software specification

Cleanroom development begins with a specification of required system behavior and architecture. The object-based technology of box structures is an effective specification technique for Cleanroom development [15, 16]. Box structures provide a stepwise refinement and verification process in black box, state box, and clear box forms for defining required system behavior and deriving and connecting objects comprising a system architecture [17, 18].

Without a rigorous specification technology, there was little incentive in the past to devote much effort to the specification process. Specifications were frequently written in natural language, with inevitable ambiguities and omissions, and often regarded as throwaway stepping stones to the code. Box structures, however, provide an economic incentive for precise specification. Initial box structure specifications often reveal gaps and misunderstandings in user requirements that would ordinarily be discovered later in development at high cost and risk to the project.

There are two engineering problems associated with system specification, namely, defining the right function for users, and defining the right structure for the specification itself. Box structures address the first problem by precisely defining current understandings of required function at each stage of development for informed review and modification.

The second problem deals with scale-up in complex specifications, namely, how to organize myriad details of behavior and processing into coherent abstractions for human understanding. Box structures incorporate the crucial mathematical property of referential transparency, such that the information content of an abstraction, say a black box, is sufficient to define its refinement to state box and clear box forms without reference to other specification parts. This property permits specifications of large systems to be hierarchically organized, with no loss of precision at high levels or of details at low levels.

Three fundamental principles underlie the box structure design process [17]:

1. All data to be defined and retained in a design are encapsulated in boxes (objects, data abstractions).

2. All processing is defined by sequential and concurrent uses of boxes.

3. Each use of a box in a system occupies a distinct place in the usage hierarchy of the system.

Each box can be defined in the three forms of black, state, and clear box, with identical external behavior but increasing internal detail. These forms isolate and focus on successive creative definitions of external behavior, retained data, and processing, respectively, as follows.

The black box of an object is a precise specification of external, user-visible behavior in all possible circumstances of use. The object may be an entire system or system part of any size. The user may be a person or another object. A black box accepts a stimulus (S) from a user and produces a response (R) before the next stimulus is processed. Each response of a black box is determined by its current stimulus history (SH), with black box transition function

(S, SH) → R.

Any software system or system part exhibits black box behavior in that its next response is determined by the history of stimuli it has received. In simple illustration, imagine a hand calculator and two stimulus histories

Clear 7 1 3    and    Clear 7 1 3 +

Given a next stimulus of 6, the two histories produce responses of

7136    and    6

respectively. That is, a given stimulus will produce different responses based on history of use, not just on current stimulus.

The objective of a black box specification is to define required behavior in all possible circumstances of use, that is, the responses produced for any possible stimulus and stimulus history. Such specifications include erroneous and unexpected stimuli, as well as correct usage scenarios. By defining behavior solely in terms of stimulus histories, black box specifications do not depend on, or prematurely define, design internals.

Black box specifications are often recorded in tabular form; in each row, the stimulus and condition on stimulus history are sufficient to define the required response. Scale up to large specifications is achieved by identifying classes of behavior for nesting tables, and through use of specification functions [19] to encapsulate conditions on stimulus histories.

The state box of an object is derived from its black box by identifying those elements of stimulus history that must be retained as state data between transitions to achieve required black box behavior. The transition function of a state box is

(S, OS) → (R, NS),

where OS and NS represent old state and new state, respectively. While the external behavior of a state box is identical to its corresponding black box, the stimulus history is replaced by reference to old state and generation of new state as required by each transition.

State boxes correspond closely to the traditional view of objects as encapsulations of state data and services, or methods, on that data. In this view, stimuli and responses are inputs and outputs, respectively, of specific service invocations.

The clear box of an object is derived from its state box by defining a procedure to carry out the state box transition function. The transition function of a clear box is thus

(S, OS) → (R, NS) by procedure.

A clear box is simply a program that implements the corresponding state box. Clear box forms include sequence, alternation, iteration, and concurrent structures [15]. A clear box may invoke black boxes at the next level for independent refinement. That is, the process is recursive, with each clear box possibly introducing opportunities for definition of new, or extensions to existing, objects in black box, state box, and clear box forms.

Through this stepwise refinement process, box structure specifications evolve as usage hierarchies of objects wherein the services of a given object may be used and reused in many places at many levels as required. Clear boxes play a crucial role in the hierarchy by ensuring the harmonious cooperation of objects at the next level of refinement. Appropriate objects and their clear box connections are derived out of immediate processing needs at each stage of refinement, not invented a priori with connections left to later invention.

Box structures bring correctness verification to object architectures. State boxes can be verified with respect to their black boxes, and clear boxes verified with respect to their state boxes [15].
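The three box forms can be made concrete with a small sketch. The following is an illustrative Python rendering (my construction, not the paper's notation) of a toy calculator display: the black box computes each response from the full stimulus history, the state box replaces the history with retained state, and the clear box is a procedure implementing the state box transition. The final check mirrors verifying a state box against its black box.

```python
# Illustrative sketch: three box forms for a toy calculator display that
# accepts digit stimuli, "+", and "Clear". The stimulus semantics here
# are assumptions chosen to reproduce the text's example histories.

# Black box: each response is a function of (stimulus, stimulus history).
def black_box(stimulus, history):
    display, start_new = "", False
    for s in history + [stimulus]:
        if s == "Clear":
            display, start_new = "", False
        elif s == "+":
            start_new = True              # next digit begins a new entry
        else:                             # a digit
            display = s if start_new else display + s
            start_new = False
    return display

# State box: stimulus history is replaced by retained state (old -> new).
def state_box(stimulus, old_state):
    display, start_new = old_state
    if stimulus == "Clear":
        display, start_new = "", False
    elif stimulus == "+":
        start_new = True
    else:
        display = stimulus if start_new else display + stimulus
        start_new = False
    return display, (display, start_new)  # (response, new state)

# Clear box: a procedure implementing the state box transition function.
class CalculatorClearBox:
    def __init__(self):
        self.state = ("", False)
    def stimulate(self, stimulus):
        response, self.state = state_box(stimulus, self.state)
        return response

# Verifying the state box against its black box: replaying any stimulus
# history through the clear box must produce the black box response.
def replay(stimuli):
    box = CalculatorClearBox()
    response = ""
    for s in stimuli:
        response = box.stimulate(s)
    return response

for history in (["Clear", "7", "1", "3"], ["Clear", "7", "1", "3", "+"]):
    assert replay(history + ["6"]) == black_box("6", history)
    print(history, "then 6 ->", black_box("6", history))
```

The two histories produce the responses 7136 and 6 from the same next stimulus, exactly the history-dependence the black box definition captures.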
Cleanroom software design and verification
Design and verification of clear box procedures is based on functional and algebraic properties of their constituent control structures. The control structures of structured programming used in clear box design, namely, sequence, ifthenelse, whiledo, etc., are single-entry, single-exit structures with no side effects in control flow possible. In execution, a control structure simply transforms data from an input state to an output state. This transformation, known as a program function, corresponds to a mathematical function, that is, it defines a mapping from a domain to a range by a particular rule.

Program functions can be derived from control structures. For example, for integers x, y, and z, the program function of the sequence

DO
    z := abs(y)
    w := max(x, z)
OD

is, in concurrent assignment form,

w, z := max(x, abs(y)), abs(y)

and for integer x >= 0, the program function of the iteration

WHILE x > 1
DO
    x := x - 2
OD

is, in English, "set odd x to 1, even x to 0."

In stepwise refinement of clear box procedures, an intended function is defined and then refined into a control structure and new intended functions for refinement, as illustrated in the miniature example of Figure 3. Intended functions are recorded in the design, delimited by square brackets and attached to their refinements. Design simplification is an important objective in refinement, to arrive at compact and straightforward designs for verification. The correctness of each refinement is determined by deriving its program function, that is, the function it actually computes, and comparing it to the intended function.

[Figure 3. Stepwise Refinement of a Design Fragment with Intended Functions for Verification. The intended function [set w to minimum of z and absolute value of x] is refined into a sequence of [set y to absolute value of x] followed by [set w to minimum of z and y]; these are in turn refined into the alternations IF x < 0 THEN y := -x ELSE y := x FI and IF y < z THEN w := y ELSE w := z FI.]

A Correctness Theorem [20] defines how to make the comparison of intended functions and program functions in terms of correctness conditions to be verified for each control structure. The correctness conditions make use of function composition for sequence, case analysis for alternation, and function composition and case analysis in a recursive equation for iteration.

Control Structures:          Correctness Conditions:

Sequence
    [f]                      For all arguments:
    DO                       does g followed by h do f?
        g;
        h
    OD

Ifthenelse
    [f]                      Whenever p is true,
    IF p                     does g do f, and
    THEN                     whenever p is false,
        g                    does h do f?
    ELSE
        h
    FI

Whiledo
    [f]                      Is termination guaranteed, and
    WHILE p                  whenever p is true,
    DO                       does g followed by f do f, and
        g                    whenever p is false,
    OD                       does doing nothing do f?

Figure 4. Correctness Theorem Correctness Conditions in Question Form
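A correctness condition such as "does g followed by h do f?" is a claim about functions. In team reviews it is settled by reasoning, but over a small finite domain it can even be checked mechanically. The sketch below (my illustration of the idea, not Cleanroom practice) brute-force checks the sequence and iteration examples from the text.

```python
# Illustrative sketch: checking correctness conditions by exhaustive
# comparison of program function and intended function over a small
# integer domain. Cleanroom teams establish these conditions by proof.

# Sequence from the text: DO z := abs(y); w := max(x, z) OD
def sequence_program(x, y, z):
    z = abs(y)
    w = max(x, z)
    return w, z

# Its claimed program function, in concurrent assignment form:
# w, z := max(x, abs(y)), abs(y)
def sequence_intended(x, y, z):
    return max(x, abs(y)), abs(y)

# "Does g followed by h do f?" -- checked for all arguments in the domain.
for x in range(-5, 6):
    for y in range(-5, 6):
        for z in range(-5, 6):
            assert sequence_program(x, y, z) == sequence_intended(x, y, z)

# Iteration from the text, for integer x >= 0:
# WHILE x > 1 DO x := x - 2 OD computes "set odd x to 1, even x to 0"
def iteration_program(x):
    while x > 1:
        x = x - 2
    return x

for x in range(0, 50):
    assert iteration_program(x) == (1 if x % 2 == 1 else 0)

print("correctness conditions hold on the sampled domain")
```

Exhaustive checking works only on finite toy domains; the point of the function-theoretic conditions is that the same comparison can be settled by proof for all arguments at once.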

For sequence, one condition must be checked; for alternation, two conditions; and for iteration, three conditions, as shown in Figure 4. The conditions are language and subject matter independent.

The nested and sequenced control structures of a clear box define a natural decomposition hierarchy that enumerates the independent subproofs required, one for each control structure. An Axiom of Replacement permits algebraic substitution of intended functions and their control structures in the hierarchy of subproofs. This substitution permits proof arguments to be localized to the control structure at hand, and, in fact, the proofs for each control structure can be carried out in any order. A miniature program and its required subproofs are shown in Figure 5.

Program:                        Subproofs:

[f1]
DO
    g1                          f1 = [DO g1; g2; [f2] OD] ?
    g2
    [f2]
    WHILE p1                    f2 = [WHILE p1 DO [f3] OD] ?
    DO
        [f3]                    f3 = [DO g3; [f4]; g8 OD] ?
        g3
        [f4]
        IF p2                   f4 = [IF p2 THEN [f5] ELSE [f6] FI] ?
        THEN
            [f5]                f5 = [DO g4; g5 OD] ?
            g4
            g5
        ELSE
            [f6]                f6 = [DO g6; g7 OD] ?
            g6
            g7
        FI
        g8
    OD
OD

Figure 5. A Program and its Constituent Subproofs

In essence, clear boxes are composed of a finite number of control structures, each of which is verified by checking a finite number of correctness conditions. Even though all but the most trivial programs exhibit an essentially infinite number of execution paths, their verification can be carried out in a finite number of steps. For example, the clear box of Figure 6 requires verification of exactly 15 correctness conditions.

The value to software quality of the reduction of verification to a finite process cannot be overemphasized. It permits Cleanroom development teams to verify every line of design and code through mental proofs of correctness in team reviews. Written proofs are also possible for extra confidence, for example, in verification of life- or mission-critical software.

In team reviews, every correctness condition of every control structure is verified in turn. Every team member must agree that each condition is correct. An error is possible only if every team member incorrectly verifies a particular correctness condition. The requirement for unanimous agreement based on individual verifications results in software at or near zero defects prior to first execution.

Function-theoretic verification scales up to large systems. Every structured system, no matter how large, has top level programs composed of familiar control structures, whose verification at high levels may take, and well be worth, more time, but it does not take more theory.

Correctness verification produces quality results superior to unit testing and debugging. For each program part, function-theoretic correctness conditions permit verification of all possible effects on data. Unit testing, however, checks only effects of particular test paths selected out of many possible paths. A program or program part may have many paths to test, but only one function to verify.

In addition, verification is more efficient than unit testing. Most verification conditions can be checked in a few seconds in team reviews, but unit tests take substantial time to prepare, execute, and check.

Cleanroom software quality certification

Techniques and benefits of statistical quality control in hardware development are well known. In cases where populations of items are too large to permit exhaustive testing, statistical sampling and analysis methods are employed to obtain scientific assessments of quality.

In simple illustration, the process of statistical
sequence, alternation, and iteration structures, which quality control in manufacturing is to 1) sample the
typically invoke large-scale subsystems at the next population of items on a production line, 2) measure
level involving thousands of lines of code (each of the quality of the sample with respect to a design
which has its own top level programs). The correct- assumed to be perfect, 3) extrapolate the sample
ness conditions for these structures are scale-free, quality to the population of items and 4) if the
that is, they are invariant with respect to the size and quaSity is inadequate, identfi and correct flaws in
complexity of the operations involved. Vefilcation production. In applying statistical quality control to
hardware products, the statistics lie in the variation
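As a minimal sketch of this sample-measure-extrapolate loop (the
population, sample size, defect rate, and function names below are
illustrative assumptions, not figures from the paper):

```python
import random

def meets_design(item):
    """Hypothetical quality check: does a sampled item match the design,
    which statistical quality control assumes to be perfect?"""
    return item["within_tolerance"]

def estimate_defect_fraction(population, sample_size, rng):
    """Sample the production line, measure each sampled item, and
    extrapolate the sample's defect fraction to the whole population."""
    sample = rng.sample(population, sample_size)
    defects = sum(1 for item in sample if not meets_design(item))
    return defects / sample_size

# Illustrative production line: about 2 percent of items out of tolerance.
rng = random.Random(7)
line = [{"within_tolerance": rng.random() >= 0.02} for _ in range(10_000)]
estimate = estimate_defect_fraction(line, 500, rng)
print(f"estimated defect fraction: {estimate:.3f}")
```

If the extrapolated defect fraction is unacceptable, step 4 applies:
the production process, not the individual items, is corrected.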

But in the case of software products, all copies are identical, bit
for bit, so where are the statistics?

It turns out that software has a statistical property of great
interest to developers and users, namely, its execution behavior. That
is, how long on average will a software product execute before it
fails, say by abending, producing incorrect output, etc.? Thus, the
process of statistical quality control in software is to 1) sample the
essentially infinite population of possible user executions of a
product based on the frequency of expected usage, 2) measure the
quality of the sample by determining if the executions are correct,
3) extrapolate the quality of the sample to the population of possible
executions, and 4) if the quality is inadequate, identify and correct
flaws in the development process, for example, improvements to
inadequate correctness verification.

    PROC Odd_Before_Even (ALT Q)
        [Q := odd_members(Q) || even_members(Q)]
        DATA
            odds  : queue of integer [initializes to empty]
            evens : queue of integer [initializes to empty]
            x     : integer
        ATAD
        WHILE Q <> empty
        DO
            x := end(Q)
            [x is odd -> odds := odds || x | true -> evens := evens || x]
            IF odd(x)
            THEN
                end(odds) := x
            ELSE
                end(evens) := x
            FI
        OD
        [Q := Q || odds, odds := empty]
        WHILE odds <> empty
        DO
            x := end(odds)
            end(Q) := x
        OD
        [Q := Q || evens, evens := empty]
        WHILE evens <> empty
        DO
            x := end(evens)
            end(Q) := x
        OD
    CORP

    (SEQ = sequence: 1 correctness condition each; ITE = ifthenelse: 2;
     WDO = whiledo: 3)

Figure 6. A Procedure with 15 Correctness Conditions

This process, known as statistical usage testing [6], amounts to
testing software the way users intend to use it. The entire focus of
statistical testing is on external system behavior, not internals of
design and implementation as in conventional coverage testing.
Cleanroom certification teams have deep knowledge of expected usage,
but no knowledge of design internals.

As noted, the role of a Cleanroom certification team is not to debug
software, but rather to certify its quality through statistical
testing techniques. The certification may show adequate quality, but
if not, the software will be returned to the development team for
rework.

In practice, Cleanroom quality certification is carried out in three
steps, as follows:

Step 1: Specify usage probability distributions. Usage probability
distributions are models of intended usage of a software product. They
define all possible usage patterns and scenarios, including erroneous
and unexpected usage, together with their probabilities of occurrence.
Usage probability distributions represent the virtually infinite
population of possible executions of a software product, together with
their expected frequencies of use.

Distributions are defined by the certification team based on box
structure specifications of system function, plus information on
system usage probabilities obtained from prospective users, actual
usage of prior versions, etc. Formal grammars permit compact
representations of distributions for analysis and review.

Step 2: Randomize test cases against usage probability distributions.
Test cases are derived from the distributions, such that every test
represents actual usage and will effectively rehearse user experience
with the product. Because the test cases are completely prescribed by
the distributions, test case production is a mechanical, and
automatable, process.

In miniature illustration, Figure 7 depicts a usage specification and
corresponding test case generation for a program with four user
stimuli, simplified for illustration, to Update (U), Delete (D), Query
(Q), and Print (P). A usage distribution shows projected probabilities
of use of 32, 14, 46, and 8 percent for the four stimuli, respectively
(omitting scenarios of use, etc., for simplicity). These probabilities
are mapped onto an interval of 0 to 99, dividing it into four
partitions proportional to the probabilities. Assuming a test case
contains six stimuli, each test is generated by obtaining six
two-digit random numbers, determining the partitions within which they
reside, and appending the corresponding stimuli (U, D, Q, or P) to the
test case.
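The interval mapping and random-number draw described in Step 2 can be
sketched as follows; the partition boundaries come from the Figure 7
illustration, while the function names are my own:

```python
import random

# Stimuli and usage probabilities from the Figure 7 illustration:
# Update 32%, Delete 14%, Query 46%, Print 8%, mapped onto 0-99.
PARTITIONS = [
    ("U", range(0, 32)),    # Update: 0-31
    ("D", range(32, 46)),   # Delete: 32-45
    ("Q", range(46, 92)),   # Query:  46-91
    ("P", range(92, 100)),  # Print:  92-99
]

def stimulus_for(n):
    """Map a two-digit random number to the stimulus whose partition contains it."""
    for stimulus, interval in PARTITIONS:
        if n in interval:
            return stimulus
    raise ValueError(f"number out of range: {n}")

def generate_test_case(rng, length=6):
    """Draw `length` two-digit random numbers and append the corresponding
    stimuli, so each test case is faithful to the usage distribution."""
    return "".join(stimulus_for(rng.randrange(100)) for _ in range(length))

rng = random.Random(1)
print(generate_test_case(rng))  # one six-stimulus test case
```

Because the partitions are proportional to the usage probabilities,
frequently used stimuli appear in generated test cases in proportion
to their expected frequency of use.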

In this way, each test case is faithful to the distribution and
represents a possible user execution of the system.

    Usage Probabilities:

    Program        Usage Probability    Distribution
    Stimuli        Distribution         Interval
    -----------    -----------------    ------------
    U (update)           32%               0 - 31
    D (delete)           14%              32 - 45
    Q (query)            46%              46 - 91
    P (print)             8%              92 - 99

    Test Case Generation:

    Test Number    Random Numbers       Test Case
    -----------    ------------------   ---------
        1          29 11 47 52 26 94    UUQQUP
        2          62 08 39 79 82 65    QUDQQQ
        3          03 32 58 41 86 17    UDQDQU
        4          36 43 86 12 28 77    DDQUUQ
       ...         ...                  ...

Figure 7. Simplified Usage Probability Distribution and Statistical
Test Case Generation for a Program with Four User Stimuli

Step 3: Execute random test cases, assess success or failure, and
compute quality measures. Each test case is executed and its results
are verified against system specifications. Time in execution up to
correct completion or failure is recorded in appropriate units, for
example, CPU time, wall clock time, number of transactions, etc. In
effect, these times, known as interfail times, represent the quality
of the sample of possible user executions. Interfail times accumulated
in testing are processed by a quality certification model [6] that
computes product Mean Time To Failure (MTTF) and other measures of
quality. Figure 8 depicts graphs produced by the certification model.
The X and Y axes plot errors fixed and computed MTTF, respectively.
The curve for high-quality software shows exponential improvement,
such that the MTTF quickly exceeds the total test time, whereas the
curve for low-quality software shows little MTTF growth.

[Figure 8 plots the MTTF estimate against errors fixed for two sample
cases: high-quality code, whose MTTF grows rapidly with each
correction, and low-quality code, which shows essentially no
reliability growth.]

Figure 8. Two Sample MTTF Graphs Produced by the Cleanroom
Certification Model

Because statistical usage testing embeds the software development
process in a formal statistical design, MTTF measures provide a
scientific basis for management action, unlike the anecdotal evidence
characteristic of coverage testing (if many, or few, errors are found,
is that good or bad?).

In incremental development, a usage probability distribution can be
stratified into subsets that exercise increasing functional content as
increments are added, with the full distribution in effect once the
final increment is in place. In addition, alternate distributions can
be defined to permit independent certification of infrequently used
system functions (with low probability in primary distributions) that
carry high consequences of failure, for example, code for emergency
shutdown of a nuclear reactor.

But there is more to the story of statistical usage testing. Extensive
analysis of errors in large-scale software systems reveals a spread in
the failure rates of errors of some four orders of magnitude [2].
Virulent, high-rate errors can literally occur every few hours for
some user, but low-rate errors may show up only after decades of use.
High-rate errors have a profound effect on product quality, but they
comprise only a small fraction of total errors. In fact, this small
fraction (under 3%) is responsible for nearly two-thirds of the
software failures reported [5].

Because statistical usage testing amounts to testing software the way
users will use it, errors tend to be found in failure-rate order on
average, that is, any remaining virulent, high-rate errors tend to be
found first. As a result, errors left behind, if any, at completion of
testing tend to be low-rate errors that are infrequently encountered
by users.

Traditional coverage testing does not find errors in failure-rate
order, but rather, in random order. On any given coverage path, an
error will either be found or not.

If found, an error may be low rate, high rate, or in between. That is,
coverage testing is not biased to find errors in any particular rate
order. Finding and fixing low-rate errors has little effect on MTTF
and the user perception of quality. But finding and fixing errors in
failure-rate order has dramatic effect, with each correction resulting
in substantial improvement in MTTF. In fact, statistical usage testing
is more than 20 times more effective at extending MTTF than is
coverage testing [5].

Acknowledgements

The author wishes to thank Kim Hathaway for her contributions and
assistance in developing this paper. Suggestions by Michael Deck,
Philip Hausler, Harlan Mills, Mark Pleszkoch, and Alan Spangler were
appreciated. Special acknowledgement is due to the members of the
Cleanroom teams whose quality results are reported in this paper, and
who are setting new standards of professional excellence in software
development.

References

1. Mills, H. D., "Certifying the Correctness of Software," Proc. 25th
   Hawaii International Conference on System Sciences, IEEE Computer
   Society Press, January 1992, pp. 373-381.

2. Adams, E. N., "Optimizing Preventive Service of Software Products,"
   IBM Journal of Research and Development, January 1984.

3. Mills, H. D., M. Dyer, and R. C. Linger, "Cleanroom Software
   Engineering," IEEE Software, September 1987, pp. 19-25.

4. Linger, R. C. and R. A. Spangler, "The IBM Cleanroom Software
   Engineering Technology Transfer Program," Proc. SEI Software
   Engineering Education Conference, IEEE Computer Society Press, San
   Diego, CA, October 5-7, 1992.

5. Cobb, R. H. and H. D. Mills, "Engineering Software Under
   Statistical Quality Control," IEEE Software, November 1990,
   pp. 44-54.

6. Currit, P. A., M. Dyer, and H. D. Mills, "Certifying the
   Reliability of Software," IEEE Trans. on Software Engineering,
   Vol. SE-12, No. 1, January 1986, pp. 3-11.

7. Linger, R. C. and H. D. Mills, "A Case Study in Cleanroom Software
   Engineering: The IBM COBOL Structuring Facility," Proc. 12th
   International Computer Science and Applications Conference, IEEE
   Computer Society Press, October 1988.

8. A Success Story at Pratt & Whitney: On Track for the Future with
   IBM's VS COBOL II and COBOL Structuring Facility, publication
   GK20-2326, IBM Corporation, White Plains, NY.

9. Kouchakdjian, A., S. Green, and V. R. Basili, "Evaluation of the
   Cleanroom Methodology in the Software Engineering Laboratory,"
   Proc. Fourteenth Annual Software Engineering Workshop, NASA Goddard
   Space Flight Center, Greenbelt, MD, November 1989.

10. Trammell, C. J., L. H. Binder, and C. E. Snyder, "The Automated
    Production Control System: A Case Study in Cleanroom Software
    Engineering," ACM Transactions on Software Engineering and
    Methodology, Vol. 1, No. 1, January 1992, pp. 81-94.

11. Hausler, P. A., "A Recent Cleanroom Success Story: The Redwing
    Project," Proc. Seventeenth Annual Software Engineering Workshop,
    NASA Goddard Space Flight Center, Greenbelt, MD, December 1992.

12. Deck, M. D., P. Hausler, and R. C. Linger, "Recent Experiences
    with Cleanroom Software Engineering," Proc. 1992 IBM Software
    Development Conference, Toronto, Canada, 1992.

13. Green, S. E. and R. Pajerski, "Cleanroom Process Evolution in the
    SEL," Proc. Sixteenth Annual Software Engineering Workshop, NASA
    Goddard Space Flight Center, Greenbelt, MD, December 1991.

14. Mills, H. D., Private communication.

15. Mills, H. D., R. C. Linger, and A. R. Hevner, Principles of
    Information Systems Analysis and Design, Academic Press, San
    Diego, CA, 1986.

16. Mills, H. D., R. C. Linger, and A. R. Hevner, "Box Structured
    Information Systems," IBM Systems Journal, Vol. 26, No. 4, 1987,
    pp. 393-413.

17. Mills, H. D., "Stepwise Refinement and Verification in
    Box-Structured Systems," IEEE Computer, June 1988.

18. Hevner, A. R. and H. D. Mills, "Box Structured Methods for Systems
    Development with Objects," IBM Systems Journal (to appear).

19. Pleszkoch, M. G., P. A. Hausler, A. R. Hevner, and R. C. Linger,
    "Function-Theoretic Principles of Program Understanding," Proc.
    23rd Hawaii International Conference on System Sciences, IEEE
    Computer Society Press, January 1990, pp. 74-81.

20. Linger, R. C., H. D. Mills, and B. I. Witt, Structured
    Programming: Theory and Practice, Addison-Wesley, Reading, MA,
    1979.

21. Poore, J. H. and H. D. Mills, "Bringing Software Under Statistical
    Quality Control," Quality Progress, November 1988.
