Académique Documents
Professionnel Documents
Culture Documents
Yashwant K. Malaiya
Jason Denton
Computer Science Dept.
Colorado State University
Fort Collins, CO 80523
malaiya j denton@cs.colostate.edu
Defects Found s b
s
25 b bb s
b
bs
b s
b s
D 20
bs
e b
bs
f
s
b
bs
e s
c b
s
t 15 bbs
s
b
bs
bs
bbbs
s
10 s
s
s
s
s
5 s
closer to the actual number if we use a more strict test cov- The plots suggest that 100% block coverage would un-
erage measure. cover 40 defects, 100% branch coverage would uncover 47
The numbers across the top of the plot in Figure 2 give the defects, whereas 100% p-use coverage would reveal 51 de-
number of test cases applied for the top curve. They show fects. If we assume that all of the software components are
that despite the linear growth of defect coverage, more test- reachable, than we can expect that the actual total number of
ing effort is needed to obtain more coverage and thus find faults to be slightly more than 51. In actual practice, often
more faults. The second curve shows the plot that would be large programs contain some unreachable code, often called
obtained if some of faults were already removed in a previ- dead or obsolete code. For C programs it can be in the neigh-
ous test phase when current testing was initiated. The sec- borhood of 5%. For accurate estimates this needs to be taken
ond plot shows that when the initial defect density is lower, into account. For simplicity of illustration, here we will as-
testing starts finding defects at a higher coverage level. This sume that all of the code is reachable. Unreachable code can
corresponds to to shifting of the curve downwards. be minimized by using coverage tools and making sure that
all modules and sections are entered during testing.
3. Applying the New Approach The data sets used in this paper are for programs that are
not evolving and thus the actual defect density is constant.
We note that the fitted model becomes very linear after
the knee in the curve. Let us define Cki as the knee, where the
Applicability of this model is illustrated by the plots in
figures 3, 4, and 5. This data was collected experimentally
by Pasquini et al from a 6100 line C program by applying linear part intersects the x-axis. For block, branch and p-use
20,000 test cases [15] . The test coverage data was collected coverage it occurs at about 40%, 25% and 25% respectively.
using the ATAC tool. Figure 3 shows a screen from RO- Below we see the significance of this value.
BUST, a tool which implements our technqiue; and that has The coverage model in Equations 4 and 6 provides us a
been developed at CSU [4]. Further development of this tool new way to estimate the total number of defects. As we can
is underway to include additional capabilities. see in Figures 3, 4 and 5, which use the data obtained by
For the 20,000 tests, the coverage values obtained were: Pasquini et al., the data points follow the linear part of the
block coverage : 82.31% of 2970 blocks, decision cover- model rather closely. Both the experimental data and the
age : 70.71% of 1171 decisions, and p-use coverage 61.51% model suggest that the 100% coverage eventually achieved
of 2546 p-uses. This is to be expected since p-use cover- should uncover the number of faults as given in Table 1 be-
age is the most rigorous coverage measure and block cover- low. Numbers in the last column have been rounded to near-
age is the least. Complete branch coverage guarantees com- est integer.
plete block coverage, and complete p-use coverage guaran- It should be noted that for this project 1240 tests revealed
tees complete decision coverage. 28 faults. Another 18,760 tests did not find any additional
Data Set: Pasquini
50
45
Coverage Data
40 Model
35
D 30
e
f 25
e
c
t 20
s
15
10
5
0
0 0.2 0.4 0.6 0.8 1
Branch Coverage
Figure 4. Defects vs. % Branch Coverage
Table 1. Projected number of total defects considering the fact that often even obtaining 100% branch
with 100% coverage coverage is infeasible, we are unlikely to detect more faults
than what the P-use coverage data provides even with fairly
rigorous testing. It should be noted that C-use coverage does
Defects Coverage Defects not fit in the subsumption hierarchy and therefore it is hard to
Coverage Measure found achieved expected interpret the values obtained by using C-use coverage data.
Block Coverage 28 82% 40 Further application of this new method is illustrated by
Branch Coverage 28 70% 47 examining the data provided by Vouk [17]. These three data
P-uses Coverage 28 67% 51 sets were obtained by testing three separate implementations
C-uses Coverage 28 74% 43 of a sensor management program for an inertial navigation
system. Each program is about five thousand lines of code.
In the first program, 1196 tests found 9 defects. For the other
faults, even though at least 5 more faults were known. The
two programs 796 test revealed 7 and 9 defects respectively.
data suggests that the enumerables (blocks, branches etc.)
Figure 6 shows the plots of P-use coverage achieved versus
not covered by the first 1240 tests were very hard to reach.
the number of defects found. Table 2 shows the estimates
They perhaps belong to sections of the code intended for
for the total number of faults that would be found with 100%
handling special situations.
coverage.
There is a subsumption relationship among blocks,
branches and P-uses. Covering all P-uses assures covering
all blocks and branches. Covering all branches assures cov-
Table 2. Projected number of total defects
erage of all blocks. Thus among the three measures the P-
with 100% coverage (Vouk’s data sets)
use coverage measure is most strict. There is no coverage
measure such that 100% coverage will assure detection of
all the defects. Thus in the above table, the entry in the last Faults Expected Faults
column is a low estimate of the total number of defects actu- Data Set Found Block Branch P-use
ally present. Using a more strict coverage measure raises the Vouk1 9 11 11 18
low estimate closer to the actual value. Thus the estimate of Vouk2 7 8 8 8
51 faults using P-use coverage should be closer to the actual Vouk3 10 10 11 13
number than the estimates provided by block coverage. Two
coverage measures, DU-path coverage and all-path cover-
age are more strict than P-use coverage, and may be suitable Table 2 again shows that the estimates obtained are con-
for cases where ultra-high reliability is required. However sistent with the subsumption hierarchy.
Data Set: Pasquini
60
50
Coverage Data
Model
D 40
e
f 30
e
c
t
s 20
10
0
0 0.2 0.4 0.6 0.8 1
P-use Coverage
Figure 5. Defects vs. % P-use Coverage
4. Significance of Parameters i
i = , A0
Cknee
Figure 2, which is a direct plot of the model, and figures Ai 1
(10)
3, 4, and 5 which plot actual d ata, suggest that at higher cov- Here we can make a useful approximation. For a strict
erage values, a linear model can be a very good approxima- coverage measure, for C i = 1, C D 1. Hence from equa-
tion. This leads us to an analytical interpretation of the be- tion 7, we have
havior. This interpretation can be used for obtaining a sim-
ple preliminary model for planning purposes. Ai0 + Ai1 1: (11)
We will obtain a linear model from equation 4. Let us as- Replacing A0 by 1 , A1 in equation 10, and using 8, we
i i
sume that at higher values of C i equation 4 can be simplified can write,
as
i =1, 1
Cknee ai ai (12)
ai ln[ai eai2 C i
0 1
C D (C i ) = 0 1 This can be written as ??,
= ai0 ln(ai1 ) + ai0 ai2 C i i
= Ai0 + Ai1 C i where C i > Cni i = 1 , ( Dmin )D0
Cknee
(7)
D0 Di 0 min 0
(13)
i , D0 , Di are parameters and D0 is the ini-
Where Dmin
D D min 0 0
many quality assurance engineers and managers. Existing [2] R. W. Butler and G. B. Finelli. The infeasibility of quanti-
methods for estimating the number of defects can underes- fying the reliability of life-critical real-time software. IEEE
timate the number of defects because of a bias towards easily Transactions on Software Engineering, 19(1):3–12, Jan.
testable faults. The exponential model tends to generate un- 1993.
stable projections and often yield a number equal to defects [3] M. Chen, M. R. Lyu, and W. E. Wong. An empirical study
already found. of the correlation between coverage and reliability estima-
tion. In IEEE Third International Symposium on Software
We have presented a model for defects found based on Metrics, Berlin, Germany, Mar. 1996.
coverage, and shown that this model provides a very good [4] J. A. Denton. ROBUST, An Integrated Software Reliability
description of actual data. Our method provides stable pro- Tool. Colorado State University, 1997.
jections early in the development processes. Experimental [5] L. Hatton. N-version design versus one good design. IEEE
data suggests that our method can provide more accurate es- Software, pages 71–76, Nov./Dec. 1997.
timates and provide developers with the information they re- [6] M. Hutchings, T. Goradia, and T. Ostrand. Experiments
quire to make an accurate assessment of software reliability. on the effectiveness of data-flow and control-flow based test
The choice of coverage measure does has an effect on the data adequacy criteria. In International Conference on Soft-
projections made, and we suggest that a strict coverage mea- ware Engineering, pages 191–200, 1994.
sure such as P-uses should be used for very high reliability [7] P. Lakey and A. Neufelder. System and Software Reliability
programs. Assurance Notebook. Rome Laboratory, Rome, New York,
Here we have assume that no new defects are being in- 1997.
troduced in the program. Further investigations are needed [8] M. R. Lyu, J. R. Horgan, and S. London. A coverage analysis
tool for the effectiveness of software testing. In Proceedings
to extend this method for evolving programs.
of the IEEE International Symposium on Software Reliability
Engineering, pages 25–34, 1993.
6. Acknowledgement [9] Y. K. Malaiya and J. A. Denton. What do software reliabil-
ity parameters represent? In 8th International Symposium
on Software Reliability Engineering, pages 124–135, Albu-
This work was supported in part by a BMDO funded querque, NM, Nov. 1997.
project monitored by ONR. [10] Y. K. Malaiya, N. Karunanithi, and P. Verma. Predictability
of software reliability models. IEEE Transactions on Relia-
bility, pages 539–546, Dec. 1992.
References [11] Y. K. .Malaiya, N. Li, J. Bieman, R. Karcich, and B. Skibbe.
The relationship between test coverage and reliability. In
[1] R. V. Binder. Six sigma: Hardware si, software no! Proceedings of the International Symposium on Software Re-
http://www.rbsc.com/pages/sixsig.html, 1997. liability Engineering, pages 186–195, Nov. 1994.
[12] Y. K. Malaiya, A. von Mayrhauser, and P. Srimani. An exam-
ination of fault exposure ratio. IEEE Transactions on Soft-
ware Engineering, pages 1087–1094, Nov. 1993.
[13] Y. K. Malaiya and S. Yang. The coverage problem for ran-
dom testing. In Proceedings of the International Test Con-
ference, pages 237–245, Oct. 1984.
[14] S. McConnell. Gauging software readiness with defect track-
ing. IEEE Software, 14(3), May/June 1997.
[15] A. Pasquini, A. N. Crespo, and P. Matrella. Sensitivity of
reliability growth models to operational profile errors. IEEE
Transactions on Reliability, pages 531–540, Dec. 1996.
[16] P. Piwowarski, M. Ohba, and J. Caruso. Coverage mea-
surement experince during function test. In Proceedings of
the 15th International Conference on Software Engineering,
pages 287–300, May 1993.
[17] M. A. Vouk. Using reliability models during testing with
non-operational profiles. In Proceedings of the 2nd Bell-
core/Purdue Workshop on Issues in Software Reliability Es-
timation, pages 103–111, Oct. 1992.