Figure 5. Leader Election diagram

3.3 Leader Election
A pictorial representation of the leader election algorithm is shown in Figure 6. As the name suggests, the leader election algorithm is used to elect a leader from a set of processes arranged in a ring. The algorithm that we consider in this paper is the Chang-Roberts algorithm. “The algorithm ensures that the process with the maximum identifier gets elected as the leader. Every process sends messages only to its left neighbor and receives messages from its right neighbor. A process can send an election message along with its identifier to its left, if it has not seen any message with a higher identifier than its own identifier. It also forwards any message that has an identifier greater than its own; otherwise, it swallows that message. If a process receives its own message, then it declares itself as the leader by sending a leader message.
In this algorithm, one or more processes may spontaneously wake up and initiate the election. When a process wakes up, either spontaneously or on receiving a message, it first executes the wake-up procedure before receiving the message. A process knows that the election is done when the leaderid is not null” [4]. The algorithm is shown in Figure 7.

4. Implementation Details
To give the model checker errors to find, we deliberately introduced bugs into each of the algorithms and then tested the algorithms using JPF to detect deadlocks and assertion violations. We briefly describe below how the bugs were introduced into each algorithm.

4.1 Dining Philosophers
We modified the algorithm so that a deadlock of the system is possible. We implemented this as follows. We did not use the concepts of dirty and clean forks or request tokens. Instead, whenever a philosopher becomes active, she enters the hungry state and tries to grab first the left fork and then the right fork. She then enters the eating state. Eating is simulated by sleeping for one hundred milliseconds. After that, the philosopher enters the thinking state, modeled by sleeping for one hundred milliseconds. The philosopher then tries to eat again, and this sequence repeats in a cycle. This implementation can deadlock whenever there are two or more philosophers: if every philosopher grabs her left fork at the same time, each of them waits forever for her right fork. The invariant in this case is “There is no deadlock in the system”. If the invariant is violated, JPF provides a counterexample.
We tried to introduce different kinds of bugs into the algorithm and explored the applicability of our approach. For example, in one of the modifications we tried, each odd philosopher takes the left fork first and then the right fork, while each even philosopher takes the right fork first and then the left. Another example is to
change the behavior of one philosopher so that it differs from the behavior of the rest.

4.2 Lamport Mutex
We introduced a defect in Lamport Mutex in three ways: (i) every odd- or even-numbered process could enter the critical section without explicitly requesting it and checking whether entering the critical section is allowed; (ii) a process could enter the critical section when its request timestamp is the lowest in the request queue OR when its request timestamp is the lowest in the direct dependency clock queue, but NOT both; and (iii) the process with id equal to two could enter the critical section without explicitly requesting it and checking whether entering the critical section is allowed.
The purpose of these defects was to make it possible for more than one process to be in the critical section at the same time. The assertion that we stated in the program was “Number of processes in the critical section is no more than one”. If this assertion is violated in any of the above scenarios, JPF reports a counterexample.

4.3 Leader Election
For leader election, we altered the code so that each process declares itself the leader as soon as it receives a leader message. The assertion that we have is “There is only one leader in the system”.
Other modifications of the algorithm that we tried include: the process with id equal to two announces itself as the leader without an election; when a process receives a message with id greater than or equal to its id minus one. If this assertion is violated in any of the above scenarios, JPF detects an assertion violation.

4.4 Testing Methodology
Once the three algorithms have been made defective using the modifications described above, we test the code using JPF. The testing proceeds as follows: (i) we test the programs for a large number of processes (N) without a partial trace and record the number of states (A) explored to get a counterexample; (ii) we enable trace capturing, run JPF for a small number of processes (n << N) and record the number of states (B) explored to get a counterexample; (iii) we run the program for N processes, using the trace from the previous step to guide the search, and record the number of states (C) explored to get a counterexample. We compute the percent improvement R due to using the partial trace with the formula:
$$R = \frac{A - (B + C)}{A} \times 100$$
We explored different values of n and N, thus experimenting with various scaling factors between two consecutive steps. During this iterative process we observed the improvement in terms of the number of states explored by the model checker.

5. Results
In this section we present the empirical results obtained during our research and provide some interpretations of them. We report the results for each modified algorithm in a separate section.

5.1 Dining Philosophers
When we ran the algorithm with the defective logic described in Section 4.1, we obtained the following results.

From     To       Without trace   With trace   Improvement
threads  threads  (states)        (states)     (%)
2        4        402             207          50
2        6        3650            3248         12
3        6        3650            1677         54
2        5        1186            1131         5
Table 1. Results for Dining Philosophers

We clearly see that using the trace of the previous run helps JPF find the violation after visiting fewer states for a larger number of threads/processes. The table also shows the percent improvement. In some cases (for example, scaling from 2 to 5 and from 2 to 6) the improvement is small. The improvement depends on how symmetric the old and the new problems are: the more symmetry, the greater the improvement.

5.2 Leader Election
For the Leader Election algorithm, we observed an even larger improvement than for the Dining Philosophers problem. The following are the results of the Leader Election algorithm when we run it with the defective code described in Section 4.3.

From     To       Without trace   With trace   Improvement
threads  threads  (states)        (states)     (%)
3        6        238             209          13
6        9        490             293          40
9        12       832             302          64
12       14       1109            371          66.5
14       17       1602            441          73
Table 2. Results for Leader Election

As we can see, there is a substantial improvement once we guide JPF: it finds assertion violations much more easily than when it runs unguided.

5.3 Lamport Mutex
For the Lamport Mutex algorithm we observed no improvement in the number of explored states, even after replaying the trace of smaller runs. In most cases, replaying the trace was not even possible. After some analysis we concluded that the reason for these results is that the state space of the program changes as the number of processes changes. This explains why we cannot replay a trace from a smaller scope, or, in the few cases where replay was feasible, why the results were unsatisfactory. However, how could we recognize that an algorithm has such a property? A logical explanation is that the behavior of a process depends on its interaction with all the other processes in the system (i.e., the behavior depends on the number of processes in the system). In this algorithm a process needs acks from all the other (N – 1) processes to enter the critical section. For
example, if we have a stored trace for four processes, every time before one of them enters the critical section it needs three acknowledgements. Now, if we try to use this trace for eight processes, during the replay no process will be able to enter the critical section (because each needs seven acks), even though with a scope of four it was able to do so. This means that the state of the system for the first four processes will not replicate the one from the previous (smaller-scope) run, which makes the subsequent choices in the trace irrelevant.

5.4 Bug Fix in Java PathFinder
Initially, when we tried to scale a counterexample by replaying a trace file, we got ‘No error found’ every time we increased the scope. We then tried to write our own code for replaying the choices from the trace file. JPF makes it possible to create listener classes and register objects of these classes with the search and the JVM, which lets us execute particular event handlers when specific events occur. During our implementation we added several control prints, in particular in the event just before a new choice is made. These prints showed that, while replaying the trace, JPF starts to backtrack until it reaches the initial state. Note that when a trace is used, JPF marks the choice generators as done, so no backtracking on them is possible. Then, while replaying a trace, if JPF reaches an already explored state, it backtracks to the initial state and reports that no error is detected. To understand the cause of this problem, we compared the choices from the trace file with the ones printed during the replay phase. We noticed that the choices start to repeat in a loop once JPF reaches the end of the file. Instead, once JPF has finished executing the trace file, it should start searching from that point. After manually inspecting the source code of JPF, we localized and fixed the problem. We also sent a patch to Peter Mehlitz of NASA. Note that to solve this problem we had to study the architecture of JPF and understand how it works.

6. Related Work
An earlier work related to model checking was done by Tatsuhiro Tsuchiya and Andre Schiper [9]. Their goal was to verify asynchronous ‘heard of’ (HO) model based algorithms for solving consensus problems in distributed systems. Their idea was to fix the total number of processes but explore every possible state with this fixed set of processes. They use symbolic model checking, in which Boolean functions represented by Binary Decision Diagrams (BDDs) are used to represent the state space. The model checker they used was NuSMV. Their methodology took about 30 minutes to explore 108 states. Another work in this area was done by Garg and Tomlinson [10], in which a user can specify an unwanted behavior of a distributed process as a sequence of relevant events. A computation is then verified by checking whether a causal path of the computation matches one of these sequences. A similar concept of replaying the messages from a trace was used by Garg and Tarafdar [5]: the execution of a distributed process was traced and then a synchronization strategy was developed to prevent the failure in the future; that is, during re-execution after a failure, information from the previous trace was used to prevent the failure from recurring. Chakraborty and Garg [6] used another approach to combat state space explosion. They used the distributive lattice properties of the global state graph to reduce the number of states. They concluded that, by using a monitor process that formed the global state lattice, the number of states was indeed reduced.
The approach taken by Mittal and Garg [7] used a debugger to re-execute a traced computation. They use predicate control to debug a distributed program. They developed an algorithm that maintains a predicate by adding synchronizations and showed that the concept of an ‘admissible sequence of events’ helps transform the synchronization problem into finding a path in a graph. Godefroid and Khurshid [11] used genetic algorithms to guide the state search toward error states. The tool they used was VeriSoft. They concluded that genetic algorithms outperformed traditional systematic and random state space searches.
We found no prior work on testing algorithms with model checking by storing traces and then replaying them. Our work shows that it is indeed possible to do so.

7. Conclusion
We presented a novel approach for testing distributed programs using Java PathFinder that addresses the combinatorial explosion problem by reducing the state space to be explored by the model checker. We introduced the idea of incremental model checking. Our approach uses counterexamples for a smaller scope (number of threads) as partial traces to guide subsequent searches on an increased scope (larger number of threads). This can be repeated iteratively until the scope of the original program is reached.
We evaluated our approach on three distributed algorithms: Dining Philosophers, Lamport Mutual Exclusion and Chang-Roberts Leader Election. Our results show that this technique leads to a decrease in the total number of states that need to be explored if the given distributed program has certain properties. Namely, the behavior of all threads must be deterministic and identical. Further, the behavior of each thread must not depend on the number of threads in the system. We plan to explore the applicability of our approach to other distributed algorithms and also to explore how its effectiveness is affected by different heuristic search algorithms (such as A* and Iterative Deepening).

8. References
[1] Guillaume Brat, Klaus Havelund, SeungJoon Park and Willem Visser, ‘Java PathFinder, Second Generation of a Java Model Checker’, Research Institute for Advanced Computer Science, 2000.
[2] Tatsuhiro Tsuchiya, Shin’ichi Nagano, Rohayu Bt Paidi and Tohru Kikuno, ‘Symbolic Model Checking for Self-Stabilizing Algorithms’, Vol. 12, No. 1, IEEE, Jan. 2001.
[3] Willem Visser, Corina S. Pasareanu and Sarfraz Khurshid, ‘Test Input Generation with Java PathFinder’, ISSTA’04, July 2004.
[4] Vijay K. Garg, ‘Elements of Distributed Computing’, Wiley and Sons, IEEE Press, 2002.
[5] Ashis Tarafdar and Vijay Garg, ‘Software Fault Tolerance of Concurrent Programs using Controlled Re-execution’, DISC’99, Slovakia, pp. 210-224.
[6] Arindam Chakraborty and Vijay Garg, ‘On Reducing the Global State Graph for Verification of Distributed Computations’, 7th International Workshop on Microprocessor Test and Verification (MTV'06): Common Challenges and Solutions, Austin, Texas, USA, December 2006.
[7] Neeraj Mittal and Vijay Garg, ‘Debugging Distributed Programs Using Controlled Re-execution’, ACM Symposium on Principles of Distributed Computing (PODC'00), Portland, Oregon, July 2000, pp. 239-248.
[8] W. Visser, K. Havelund, G. Brat, S. Park and F. Lerda, ‘Model Checking Programs’, Automated Software Engineering Journal, Volume 10, Number 2, April 2003.
[9] Tatsuhiro Tsuchiya and Andre Schiper, ‘Model Checking of Consensus Algorithms’, Technical Report, Osaka University.
[10] Eddy Fromentin, Michel Raynal, Vijay Garg and Alex Tomlinson, ‘On the Fly Testing of Regular Patterns in Distributed Computations’.
[11] Patrice Godefroid and Sarfraz Khurshid, ‘Exploring Very Large State Spaces Using Genetic Algorithms’, TACAS 2002, pp. 266-280.
[12] http://javapathfinder.sourceforge.net. Date accessed: April 23rd, 2007.
[13] http://en.wikipedia.org/wiki. Date accessed: April 23rd, 2007.