Fault Localization Based on Failure-Inducing Combinations
Laleh Sh. Ghandehari, Yu Lei, David Kung
Keywords—Combinatorial Testing; Fault Localization

I. INTRODUCTION
II. A MOTIVATING EXAMPLE
[Table: motivating example — tests over input parameters P1 and P2 (true/false values) with the actual and expected output of each test]
hit(s, f) ∈ {true, false}, where hit(s, f) = true if s ∈ trace(f), and hit(s, f) = false if s ∉ trace(f).
III. PRELIMINARIES
A. Basic concepts
Assume that the system under test (SUT) has k input parameters, denoted by the set P = {p1, p2, ..., pk}. Let di be the domain of parameter pi; that is, di contains all possible values that pi could take. Let D = d1 × d2 × ... × dk.
Let S be the set of all statements in the source code of the SUT.
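The input-space notation above can be made concrete with a small sketch. The parameter names and domains below are illustrative, not taken from the paper:

```python
from itertools import product

# Hypothetical input model of a small SUT with k = 3 parameters.
domains = {
    "p1": [True, False],   # d1
    "p2": [0, 1, 2],       # d2
    "p3": ["a", "b"],      # d3
}

# D = d1 x d2 x ... x dk: the full input space of the SUT.
D = [dict(zip(domains, values)) for values in product(*domains.values())]

print(len(D))  # |D| = 2 * 3 * 2 = 12 possible tests
```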
core[p] = c[p], for each parameter p involved in c;
core[p] = argmin over v ∈ dp of ρ(v), for each parameter p not involved in c,   (1)

where ρ(v) denotes the suspiciousness of value v. The group g of tests consists of the core member and its derived members:

g = {core, derived1, derived2, ..., derivedt}.   (2)
IV. APPROACH
A. Test generation
Let c be the top-ranked suspicious t-way combination taken as input by our approach. In this step, a group of tests is generated which contains a core member and at most t derived members. The core member f is created such that it contains c and the suspiciousness of the environment of c in f is minimized. That is, for each parameter p involved in c, f has the same value for p as c, i.e., c ⊆ f; and for each parameter p that does not appear in c, f takes the value that has the minimum suspiciousness among all the values of p.
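The construction of the core member, and one plausible construction of the derived members, can be sketched as follows. The data structures (`c` as a dict of the combination's parameter-value pairs, `rho` as per-value suspiciousness scores) and the rule used to pick each derived member's new value are assumptions of this sketch, not prescribed by the text:

```python
# Assumed inputs: domains of each parameter, the top-ranked suspicious
# combination c, and a suspiciousness score rho for each (param, value).
domains = {"p1": [0, 1], "p2": [0, 1], "p3": [0, 1, 2]}
c = {"p1": 1}  # top-ranked suspicious 1-way combination
rho = {("p1", 0): 0.2, ("p1", 1): 0.9,
       ("p2", 0): 0.1, ("p2", 1): 0.6,
       ("p3", 0): 0.3, ("p3", 1): 0.5, ("p3", 2): 0.0}

def core_member(c, domains, rho):
    """Keep c's values; for every parameter outside c, pick the value
    with minimum suspiciousness, so the environment of c is minimized."""
    core = dict(c)
    for p, values in domains.items():
        if p not in c:
            core[p] = min(values, key=lambda v: rho[(p, v)])
    return core

def derived_members(core, c, domains, rho):
    """For each parameter in c, replace its value (here: with its least
    suspicious alternative), yielding at most t derived members."""
    members = []
    for p in c:
        alternatives = [v for v in domains[p] if v != core[p]]
        best = min(alternatives, key=lambda v: rho[(p, v)])
        members.append({**core, p: best})
    return members

core = core_member(c, domains, rho)
print(core)  # {'p1': 1, 'p2': 0, 'p3': 2}
print(derived_members(core, c, domains, rho))  # [{'p1': 0, 'p2': 0, 'p3': 2}]
```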
B. Rank generation
In this section, we produce a ranking of statements in terms of their likelihood of being faulty by analyzing the spectra of the core member and the derived members. The suspiciousness of statement s is computed by the following formula:
ρ(s) = Σ over f ∈ g of w(s, f) / |g|,   (3)

where

w(s, f) = 1 if hit(s, core) = true and hit(s, f) = false;
w(s, f) = 0.5 if hit(s, core) = true and hit(s, f) = true;
w(s, f) = 0 if hit(s, core) = false.
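A minimal sketch of computing formula (3) on set-based spectra follows. Representing each trace as a set of statement ids, and summing the weights over the derived members only, are assumptions of this sketch:

```python
# Assumed spectra: traces are sets of statement ids. The core member
# fails; the derived members are the other tests in the group g.
core_trace = {1, 2, 3, 5}
derived_traces = [{1, 2, 4}, {2, 3, 5}]
group_size = 1 + len(derived_traces)  # |g| = core + derived members

def weight(s, core_trace, derived_trace):
    # 1:   executed by the failing core but not by this derived member
    # 0.5: executed by both
    # 0:   not executed by the core at all
    if s in core_trace and s not in derived_trace:
        return 1.0
    if s in core_trace and s in derived_trace:
        return 0.5
    return 0.0

def suspiciousness(s):
    # rho(s): average weight of s across the group, per formula (3)
    return sum(weight(s, core_trace, d) for d in derived_traces) / group_size

print(suspiciousness(3))  # 0.5  (1.0 + 0.5, divided by |g| = 3)
print(suspiciousness(4))  # 0.0  (not in the core's trace)
```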
[Table: example suspiciousness values computed for each parameter value]
D. Complexity analysis
Let k be the number of parameters, t the strength of the initial test set, and n the number of statements in the subject program. To generate the core member, it is necessary to find the component with the minimum suspiciousness value for each parameter that is not involved in the inducing combination (so that the suspiciousness of the environment is minimized). This takes (k − t) × O(d) time, where d is the largest domain size.
C. Discussion
The effectiveness of our approach depends to some extent on the quality of the top-ranked suspicious combination identified by BEN. If the top-ranked
V. EXPERIMENT
C. Trace Collection
We used Gcov to collect execution traces. Gcov reports the number of times each statement is executed by a given test. A statement is included in the execution trace of a given test if and only if it is executed by the test one or more times.
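This trace-collection rule can be sketched as a small parser over a `.gcov` report, in which each line has the form `count:line_no:source`, with `-` marking non-executable lines and `#####` marking lines never executed. The sample report below is made up for illustration, and real gcov output has further count variants this sketch ignores:

```python
# Hypothetical excerpt of a .gcov report for an example.c.
sample_gcov = """\
        -:    0:Source:example.c
        3:    1:int main(void) {
        3:    2:  int x = 0;
    #####:    3:  x = 1;
        -:    4:  /* comment */
        3:    5:  return x;
"""

def trace(gcov_text):
    """Return the set of line numbers executed one or more times."""
    executed = set()
    for line in gcov_text.splitlines():
        count, _, rest = line.partition(":")
        lineno, _, _ = rest.partition(":")
        count = count.strip()
        if count not in ("-", "#####") and int(count) > 0:
            executed.add(int(lineno.strip()))
    return executed

print(trace(sample_gcov))  # {1, 2, 5}
```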
[Table: subject programs (printtokens, printtokens2, replace, schedule, schedule2, tcas, totinfo) with their LOC, number of faulty versions, number of compatible faults, input model, and number of constraints]
[Table: for each subject program, the total number of faults and the number of faults killed, broken down by fault type — extra code, missing code, and incorrect code]
D. Metrics
Recall that the output of our approach is a ranking of statements in terms of their likelihood of being faulty. To find the faulty statement, we inspect the statements in the first rank, then the statements in the second rank, and continue until we find the faulty statement. To measure the effectiveness of our approach, we record the number of statements that must be inspected to find the faulty statement in each program.
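The inspection metric described above can be sketched directly; the ranked groups and faulty statement id below are hypothetical:

```python
# Statements grouped by rank (rank 1 first); within a rank, statements
# are inspected in the order our approach outputs them.
ranking = [[7, 3], [12], [1, 9, 4]]
faulty = 9

def statements_inspected(ranking, faulty):
    """Count statements inspected, rank by rank, until the fault is found."""
    inspected = 0
    for group in ranking:
        for s in group:
            inspected += 1
            if s == faulty:
                return inspected
    return inspected  # fault not in the ranking

print(statements_inspected(ranking, faulty))  # 5
```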
The last two columns show the average number and percentage of code that must be inspected to locate the fault. To compute this number, we started from the statements that were ranked at the top; for statements at the same rank, we started from the first statement as output by our approach. We did not perform any dependency analysis. As discussed later, dependency analysis could further reduce the percentage of code that needs to be inspected. As shown in the table, in 7 versions of schedule less than 6% of the code must be inspected to locate the fault, and only about 28 tests are generated after testing. In the worst case, 43% of the code in schedule2 must be inspected.
[Table: for each subject program — test strength t, number of tests in the t-way test set, average number of statements inspected to find faults, and average percentage of statements inspected to locate faults]
H. Threats to validity
Threats to internal validity are factors that may be responsible for the experimental results without our knowledge. One of the key steps in our experiments is modeling the input parameters, which may affect the correctness of the results. To reduce this threat, we performed this step using the program specifications and error-free versions, without any knowledge of the faults. Further, we carefully checked the consistency of the results to detect potential mistakes made in the experiments.
[Table: for each subject program — number of tests in the t-way test set, number of tests for identifying the inducing combination, number of times the core member does not fail, number of tests for ranking suspicious statements, number of statements inspected to find faults, and percentage of statements inspected to locate faults]
found, the remaining statements are examined in a non-increasing order of their suspiciousness scores. Other approaches, such as Pinpoint, AMPLE, and Ochiai [1], adopt the general framework of Tarantula but use different metrics to compute the suspiciousness of statements. Experiments reported in [13] show that no other spectrum-based approach statistically significantly outperforms Tarantula.
Renieris and Reiss [16] proposed three spectrum-based approaches: set union, set intersection, and nearest neighbor. They assume the existence of one failed run and a large number of passed runs. The input of the approach is a group of program runs instead of a test set. The set union method computes f \ ∪S, where f is the program spectrum of a failing run and ∪S is the union of the spectra of a set of passed runs. The set intersection method computes ∩S \ f, the statements that appear in the spectrum of every passed run but not in that of the failing run. The statements in these spectra are then checked to find the actual faults.
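The two set models translate directly into set operations; the spectra below are made up for illustration:

```python
# Spectra as sets of statement ids: one failing run, several passed runs.
failing = {1, 2, 3, 6}
passing = [{1, 2, 4}, {1, 2, 5}]

# Union model: statements in the failing run but in no passed run.
union_model = failing - set().union(*passing)

# Intersection model: statements in every passed run but not the failing run.
intersection_model = set.intersection(*passing) - failing

print(sorted(union_model))         # [3, 6]
print(sorted(intersection_model))  # []
```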
VI. RELATED WORK
VII. CONCLUSION
ACKNOWLEDGMENT