
How Do You Know When You Are Done Testing?

By Richard Bender

When you ask testers how they know they are done testing, the most common responses are:

- We test until we are out of time and resources;
- We test until all of the test cases we created ran successfully at least once and there are no outstanding severe defects.

I admire the honesty of the first answer, which comes from the "clean conscience" school of testing – "I did all the testing I could under the constraints management gave me and my conscience is clear". The obvious question that follows the second answer is how much function and code were actually tested? In the vast majority of cases the team has no quantitative measure of their level of testing.

Stepping back, testing is divided into the following eight activities:

Define Test Completion Criteria: The test effort has specific, quantifiable goals. Testing is completed only when the goals have been reached (e.g., testing is complete when the tests that address 100% functional coverage of the system all have executed successfully).

Design Test Cases: Logical test cases are defined by five characteristics: the initial state of the system prior to executing the test; the data in the system (e.g., database values); the inputs; the expected results; and the system state after the test executes. (A sketch of such a record appears after this list.)

Build Test Cases: There are two parts needed to build test cases from logical test cases: creating the necessary data, and building the components to support testing (e.g., building the navigation to get to the portion of the program being tested).

Execute Tests: Execute the test case steps against the system being tested and document the results.

Verify Test Results: Verify that the expected test results match the observed results. [Note: This presupposes that the specifications are clear enough and detailed enough to actually calculate the expected answer ahead of time. Testing specifications to ensure that they are correct, unambiguous, logically consistent, and written in sufficient detail is a non-trivial issue and the subject of another paper.]

Verify Test Coverage: Track the amount of coverage achieved by the successful execution of each test.

Manage the Test Library: Maintain the relationships between the test cases and the programs being tested. Keep track of which tests have and have not been executed, and whether the executed tests have passed or failed.

Manage the Resolution of Identified Defects: Track the status of defects and retest as needed.
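What such a logical test case looks like can be sketched in a few lines of Python. This is my illustration, not the author's; the field names and the banking values are assumptions chosen only to mirror the five characteristics named above.

```python
# A hypothetical record for a logical test case; the five fields mirror the
# five characteristics listed above.  Field names and sample values are
# illustrative only.
from dataclasses import dataclass, field

@dataclass
class LogicalTestCase:
    initial_state: str                                    # system state before the test runs
    data: dict = field(default_factory=dict)              # data in the system (e.g., database values)
    inputs: dict = field(default_factory=dict)            # the inputs applied by the test
    expected_results: dict = field(default_factory=dict)  # pre-calculated expected answers
    post_state: str = ""                                   # system state after the test executes

tc = LogicalTestCase(
    initial_state="teller signed on",
    data={"account_123.balance": 100},
    inputs={"transaction": "withdraw", "amount": 40},
    expected_results={"account_123.balance": 60},
    post_state="withdrawal journaled",
)
```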
The first two steps are totally intertwined. Testing, by definition, is comparing an expected answer to the observed answer. You need to define quantitatively and qualitatively how much testing is enough and then design tests that will ensure that criterion is met. You must do this for each type of testing: functional, performance, usability, security, etc. Given the space constraints, we will only address functional testing in this paper.

The first thing we need to understand is that you cannot exhaustively test any software system. The upper limit to the total number of tests for a program is:

[2^n * (L1 * L2 * ... * Lx) * (V1 * V2 * ... * Vy)]!

where "n" is the number of decisions, "Li" is the number of times a given decision can loop, x is the number of decisions which cause loops (x ≤ n), "Vi" is the number of all of the possible values that each input variable can have, and y is the number of input variables. The factorial ("!") is there because the order in which the set of tests is executed does make a difference to the results. This number is actually absolutely meaningless mathematically, as well as being practically impossible to achieve. In many programs this number exceeds the number of molecules in the universe [10^80 according to Stephen Hawking].
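To see how fast this bound explodes, the following sketch (mine, not the author's) evaluates the formula for a deliberately tiny hypothetical program; every count in it is made up.

```python
# Evaluate the upper-bound formula [2^n * (L1*...*Lx) * (V1*...*Vy)]! for a
# deliberately tiny hypothetical program.  All counts below are invented.
import math

n = 3                  # number of decisions
loop_limits = [10]     # Li: one looping decision that can iterate up to 10 times
value_counts = [4, 4]  # Vi: two input variables, four possible values each

tests = (2 ** n) * math.prod(loop_limits) * math.prod(value_counts)
# The bound is tests!, whose size we report as a digit count rather than
# printing the number itself.
digits = int(math.lgamma(tests + 1) / math.log(10)) + 1
print(f"{tests} distinct tests -> the ordering-sensitive bound {tests}! "
      f"has about {digits} digits")
# Even this toy program yields a factorial with thousands of digits, vastly
# more than 10^80 -- exhaustive testing is out of the question.
```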
The goal of test case design is to identify an extremely small subset of the possible combinations of data that will give you …
[Figure 3] [Figure 4]

… B is always true. There is no Geneva Convention for software which limits us to one defect per function.

Figure 3 shows the results of running the tests. When we run test variation 1 the software says A is not true, it is false. However, it also says B is not false, it is true. The result is we get the right answer for the wrong reason. When we run the second test variation we enter B true, which the software always thinks is the case – we get the right answer. When we enter the third variation with just C true, the software thinks both B and C are true. Since this is an inclusive "or" we still get the right answer. We are by now reporting to management that we are three quarters done with our testing and everything is looking great. Only one more test to run and we are ready for production. However, when we enter the fourth test with all inputs false and still get D true, we know we have a problem.
There are two key things about this example so far. The first is that software, even when it is riddled with defects, will still produce correct results for many of the tests. The second thing is that if you do not pre-calculate the answer you were expecting and compare it to the answer you got, you are not really testing. Sadly, the majority of what purports to be testing in our industry does not meet this criterion. People look at the test results and just see if they look "reasonable". Part of the problem is that the specifications are not in sufficient detail to meet the most basic definition of testing.
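A short sketch makes the point concrete. The article does not give code, so the structure below is assumed: the specified function is an inclusive OR of A, B and C producing D, and the implementation carries both an "A stuck false" and a "B stuck true" defect. Only because the expected answer is pre-calculated does the comparison qualify as testing.

```python
# A sketch of the Figure 3 discussion: an inclusive OR with two defects that
# mask each other.  The structure of the function is an assumption on my part.

def expected_d(a, b, c):
    return a or b or c            # pre-calculated answer from the specification

def observed_d(a, b, c):
    a = False                     # defect: the code treats A as always false
    b = True                      # defect: the code treats B as always true
    return a or b or c

variations = [                    # the four test variations discussed above
    (True,  False, False),
    (False, True,  False),
    (False, False, True),
    (False, False, False),
]

for i, (a, b, c) in enumerate(variations, start=1):
    exp, obs = expected_d(a, b, c), observed_d(a, b, c)
    verdict = "PASS" if exp == obs else "FAIL"
    print(f"variation {i}: expected={exp} observed={obs} {verdict}")
# Variations 1-3 pass -- the right answers for the wrong reasons -- and only
# variation 4 fails.  Without the expected_d comparison, every result would
# simply "look reasonable".
```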

When test variation four failed it led to identifying the B stuck true defect. The code is fixed and test variation four, the only one that failed, is rerun. It now gives the correct results. This meets the common test completion criteria that every test has run correctly at least once and no severe defects are unresolved. The code is shipped into production. However, if you rerun test variation one, it now fails (see Figure 4). The "A stuck false" defect was not caused by fixing the B defect. When the B defect is fixed you can now see the A defect. When any defect is detected all of the related tests must be rerun.
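One way to honor that rule is to keep the test library's test-to-code relationships queryable. The sketch below is hypothetical, not a mechanism the article prescribes: it selects every test related to the unit touched by the defect fix, rather than only the test that originally failed.

```python
# A hypothetical test-library lookup: after a defect fix, rerun every related
# test, not just the one that failed.  The mapping below is illustrative.

test_library = {                     # test id -> code units the test exercises
    "variation_1": {"eval_A", "eval_B", "eval_C"},
    "variation_2": {"eval_B"},
    "variation_3": {"eval_C"},
    "variation_4": {"eval_A", "eval_B", "eval_C"},
}

def tests_to_rerun(changed_units):
    """All tests whose exercised units overlap the units touched by the fix."""
    changed = set(changed_units)
    return sorted(t for t, units in test_library.items() if units & changed)

print(tests_to_rerun({"eval_B"}))
# ['variation_1', 'variation_2', 'variation_4'] -- and rerunning variation_1
# is exactly what exposes the remaining A defect.
```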
The above example addresses the issue that two or more defects can sometimes cancel each other out, giving the right answers for the wrong reasons. The problem is worse than that. The issue of observability must be taken into account. When you run a test, how do you know it worked? You look at the outputs. For most systems these are updates to the databases, data on screens, data on reports, and data in communications packets. These are all externally observable.

[Figure 5]

In Figure 5 let us assume that node G is the observable output. C and F are not externally observable. We will indirectly deduce that the A, B, C function worked by looking at G. We will indirectly deduce that the D, E, F function worked by looking at G. Let us further assume there is a defect at A where the code always assumes that A is false no matter what the input is. A fairly obvious test case would be to have all of the inputs set to true. This should result in C, F, and G being set to true. When this test is entered the software says A is not true, it is false. Therefore, C is not set to the expected true value but is set to false. However, when we get to G it is still true, as we expected, because the D, E, F leg worked. In this case we did not see the defect at C because it was hidden by the F leg working correctly.

Therefore, the test case design algorithms must factor in:

- the relations between the variables (e.g., and, or, not);
- the constraints between the data attributes (e.g., it is physically impossible for variables one and two to be true at the same time);
- the functional variations to test (i.e., the primitives to test for each logical relationship); and
- node observability.

The design of the set of tests must be such that if one or more defects are present, you are mathematically guaranteed that at least one test case will fail at an observable point. When that defect is fixed, if any additional defects are present, then one or more tests will fail at an observable point.
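The masking effect at an unobservable node can be sketched as well. The article gives only the node names, so the internal logic assumed below (C as the AND of A and B, F as the AND of D and E, G as the OR of C and F) is my own choice, picked to match the behavior described.

```python
# A sketch of the Figure 5 discussion under assumed logic:
#   C = A and B, F = D and E, G = C or F, and only G is observable.

def g_expected(a, b, d, e):
    return (a and b) or (d and e)

def g_observed(a, b, d, e):
    a = False                 # defect at A: the code treats A as false regardless
    c = a and b               # C is wrong, but C is not externally observable
    f = d and e
    return c or f             # only G can be checked

# The "obvious" test: every input true.  G is still true, so the defect hides.
print(g_expected(True, True, True, True), g_observed(True, True, True, True))     # True True

# An observability-aware test: force the D,E,F leg false so that only the
# A,B,C leg can drive G.  Now the defect must surface at the observable node.
print(g_expected(True, True, False, False), g_observed(True, True, False, False))  # True False
```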
A by-product of these algorithms is that some variations get flagged as "untestable". That means there is no way to design a set of tests which includes this variation and still guarantees that all defect scenarios will be observable. This is caused by a combination of constraints …

… year Earl Pottorff and I took on the problem and discovered the need for a higher level of test coverage – data flow based testing.

Data flow coverage adds a higher level of rigor to the testing process. For each variable used as input to a statement, it determines whether each possible source of the data has been tested. For example, let's say we have a statement that adds A to B to get C. There might be three different places in the program that modify A where there is a path from that place to the statement being tested along which A is not overridden. Each of these three data flow relationships must be tested. [Note: data flows are sometimes called "set-use pairs" in the literature.]
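A small sketch of that A-plus-B example may help. The control flow is my own construction, not the article's, but it gives the statement c = a + b exactly three reaching definitions of A.

```python
# Three definitions of A can reach the use of A in "c = a + b" without being
# overridden in between, so there are three def-use ("set-use") pairs to test.
# The control flow here is an illustrative assumption.

def compute(flag1, flag2, b):
    a = 1            # definition d1 of A
    if flag1:
        a = 10       # definition d2 of A
    if flag2:
        a = 20       # definition d3 of A
    c = a + b        # use of A: the pairs are (d1, use), (d2, use), (d3, use)
    return c

# compute(True, True, 0) and compute(False, False, 0) already execute every
# branch, yet only the pairs (d3, use) and (d1, use) have been exercised;
# (d2, use) needs a test with flag1 true and flag2 false, e.g.
# compute(True, False, 0).
```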

Testing to the C1 level does not guarantee that each of these data flows will be executed. In fact, when you reach C1 coverage you usually still have 20% to 40% of these data flows not tested. This is a significant amount of function untested. The basic data flow coverage is called D1 and includes the C1 coverage.
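Measured this way, the gap is easy to express. The numbers in the sketch below are invented purely to show the bookkeeping: D1 coverage is computed against the def-use pairs found by static analysis, not against branches.

```python
# Illustrative bookkeeping only: D1 coverage is the fraction of statically
# identified def-use pairs that the executed tests actually exercised.

all_pairs = {                       # pairs reported by static data flow analysis
    ("d1", "use_A"), ("d2", "use_A"), ("d3", "use_A"),
    ("d4", "use_B"), ("d5", "use_B"),
}
exercised = {                       # pairs covered by a branch-adequate (C1) suite
    ("d1", "use_A"), ("d3", "use_A"), ("d4", "use_B"),
}

d1_coverage = len(exercised & all_pairs) / len(all_pairs)
print(f"D1 coverage: {d1_coverage:.0%}")    # 60% -- 40% of the data flows untested
```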

[Figure 8] [Figure 9]

Looking at Figure 6 again, we see that segment "8" uses variable X to determine how many times to loop. The logic is that it loops, subtracts "1" from X, and checks to see if X is now "0". If it is not "0" it loops again; if it is "0" it terminates the loop and continues on to the next statement. X is modified by segments "1", "3", and "7". When we executed Test 1 above, X was last set at segment "3". This sets X to "10"; segment "8" therefore loops 10 times. When we executed Test 2, X was last set by segment "7". This sets X to "20"; segment "8" will loop 20 times. Remember that these two tests satisfied 100% C1 coverage. However, we never executed a path where segment "1" was the last place to set X prior to executing segment "8". We need to add Test 3, which follows the path 1, 2 (false), 4, 5 (true), 6, 8. However, let us assume that segment "1" sets X to minus 1. After we loop and subtract 1, the loop control variable is now minus 2, and so on. This path causes a nearly infinite loop (actually 2^32 - 1 iterations if X is a four-byte field).
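A sketch of that loop makes the danger plain. The flags and statement layout below are assumptions standing in for the article's segments, but the arithmetic is the same: a count-down loop whose only exit test is equality with zero.

```python
# Assumed stand-in for Figure 6: segment 1 leaves X at -1, segments 3 and 7
# overwrite it, and segment 8 counts X down and exits only when X == 0.

def run(takes_segment_3, takes_segment_7):
    x = -1                          # segment 1
    if takes_segment_3:
        x = 10                      # segment 3
    if takes_segment_7:
        x = 20                      # segment 7
    iterations = 0
    while True:                     # segment 8: loop, decrement, test for zero
        iterations += 1
        x -= 1
        if x == 0:
            return iterations

print(run(True, False))    # Test 1: X last set by segment 3 -> 10 iterations
print(run(False, True))    # Test 2: X last set by segment 7 -> 20 iterations
# run(False, False) would be Test 3: X enters the loop as -1 and never reaches
# 0 -- roughly 2^32 iterations for a 4-byte counter, and unbounded for Python
# integers -- which is why the call is left commented out.  Tests 1 and 2
# already satisfy 100% C1 coverage, yet only Test 3 exercises the
# segment-1-to-segment-8 data flow.
```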
When we increased the test criteria from C1 to D1 we did find 25% more code-based defects. These also tended to be types that would have been difficult to debug. For example, most non-reproducible defects have spurious data flows at their root. Static data flow analysis actually is able to predict where many of these will occur before even running the tests.

There is another interesting insight from data flow analysis about test suite design – i.e., packaging the test cases into sets for execution. In order to test a given data flow, the test case which includes it might have to be in a certain position in the test suite. The most common requirement is for the test to be the first one executed. It is not unusual to have multiple data flows each requiring their test to be first. This requires the test suite to be broken into smaller execution packets, each with the right tests first. I have even seen data flows which would be tested only if included in a test which happened to be in just the sixteenth through the nineteenth position in the test suite. It would work fine if placed anywhere from the first to the fifteenth position, or in the twentieth position or later.
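A naive way to satisfy the most common of those constraints can be sketched as follows. This scheduler is a toy of my own, not something the article describes, and it handles only the "must run first" case, not positional windows such as sixteenth through nineteenth.

```python
# Toy packet builder: every test that some data flow requires to run first
# gets its own execution packet, and the unconstrained tests are spread
# across the packets behind those leaders.

def build_packets(suite, must_run_first):
    leaders = [t for t in suite if t in must_run_first]
    others = [t for t in suite if t not in must_run_first]
    if not leaders:
        return [others]
    packets = [[leader] for leader in leaders]
    for i, test in enumerate(others):
        packets[i % len(packets)].append(test)
    return packets

suite = ["T1", "T2", "T3", "T4", "T5", "T6"]
print(build_packets(suite, must_run_first={"T2", "T5"}))
# [['T2', 'T1', 'T4'], ['T5', 'T3', 'T6']] -- two smaller packets, each with
# the right test first.
```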
Yet another interesting by-product is that in order to execute certain data flows, multiple transactions must be executed in a particular sequence. What happens is …
