
Parallel Discrete Event Simulation Algorithm for Manufacturing Supply Chains

Author(s): R. Roy and R. Arunachalam

Source: The Journal of the Operational Research Society, Vol. 55, No. 6 (Jun., 2004), pp. 622-629
Published by: Palgrave Macmillan Journals on behalf of the Operational Research Society
Stable URL: http://www.jstor.org/stable/4101966
Accessed: 23/05/2011 00:07


Journal of the Operational Research Society (2004) 55, 622-629

© 2004 Operational Research Society Ltd. All rights reserved. 0160-5682/04 $30.00









R Roy* and R Arunachalam

University of Warwick, UK
Parallel discrete event simulation (PDES) is concerned with the distributed execution of large-scale system models on
multiple processors. It is an enabler in the implementation of the virtual enterprise concept, integrating semi-autonomous
models of production cells, factories, or units of a supply chain. The key issue in PDES is to maintain
causality relationships between system events, while maximizing parallelism in their execution. Events can be executed
conservatively only when it is safe to do so, sacrificing the extent to which potential parallelism of the system can be
exploited. Alternatively, they can be processed optimistically without guarantee of correctness, but incurring the
overhead of a rollback to an earlier saved state when a causality error is detected. The paper proposes a modified
optimistic scheme for distributed simulation of constituent models of a supply chain in manufacturing, which exploits
the inherent operating characteristics of its domain.
Journal of the Operational Research Society (2004) 55, 622-629. doi:10.1057/palgrave.jors.2601688
Keywords: simulation; distributed models; manufacturing; supply chain

Parallel discrete event simulation (PDES) has received
attention in many applications with large, complex systems,
for example, telecommunications. However, its use in
manufacturing has been limited.1 Early studies (eg, reference 2) of distributed
execution used fine-grained decomposition to map system entities (eg, machines, transportation
devices) on to different processors. Later work also
considered system architectures with loosely coupled submodels,3 in which the relative
independence of the processes
reduces communication overhead and makes them particularly suitable for efficient PDES. Much of the research on
parallel and distributed simulation has for many years
concentrated on algorithms for parallel processing, but more
recent studies also include work on web-based cooperative
model development4 and shared processing,5 decomposition
methods for large-scale models based on Discrete Event
Specification (DEVS) formalism,6 and results from the
implementation of parallel processing of a fine-grained
virtual factory model.7
Manufacturing supply chains are usually large, complex
systems consisting of semi-autonomous cells, factories, etc
that are interconnected by material and information flow.
They are asynchronous in nature and, hence, the use of a
global clock for the simulation to proceed in a lock-step
manner is inefficient. Modular development and distributed
processing of the models of the constituent units have the
potential to make the modelling process more manageable
*Correspondence: R Roy, Warwick Manufacturing Group, International
Manufacturing Centre, University of Warwick, Coventry CV4 7AL, UK.

and improve execution time significantly, thus making the
simulation of such systems more feasible in practice. The
main focus of research into PDES has been on algorithms
for maintaining causality relationships between system
events. Conservative approaches allow the simulation to
proceed only up to safe time limits that avoid causality
errors, but can limit the extent to which inherent parallelism
can be exploited. Optimistic protocols, on the other hand,
allow a logical process (LP) to proceed without regard to the
future events it will receive, but it rolls back to an earlier
saved state when causality error is detected; the procedure
can be computationally expensive. The relative efficiency of
the two approaches is application dependent.8 This paper
proposes an algorithm based on the optimistic protocol that
is modified to improve performance by taking advantage of
the operating characteristics of supply chain systems.

PDES algorithms
The physical system in PDES is viewed as a number of
interacting physical processes (PPs) and is modelled by
constructing a simulator consisting of Logical Processes
(LPs), one for every PP. Interactions between PPs are
modelled by corresponding LPs sending and receiving
timestamped messages. Simulation proceeds by each LP
processing the events in its input queue in timestamp order.
A causality error, however, can occur if an LP finds a
message in its queue with a timestamp less than its own clock.
In the conservative approach,9"1 a clock is associated with
each incoming link of an LP, which is set to the timestamp of


the message at the top of the queue or, if it is empty, to that
of the last received message. Each LP repeatedly selects the
link with the smallest clock time; if the associated queue
contains a message, it is processed or else the LP blocks
(waits). The mechanism guarantees avoidance of causality
errors, but deadlocks can occur (ie, each LP in a cycle
waiting for a message). Deadlocks are usually avoided
through the use of null messages, sent by an LP on all its
output links after the processing of each event and that are
used to provide lower bounds for timestamps of future
messages that it will send. Empirical evidence suggests that
performance is in many instances affected by a large
proportion of null messages.11 An alternative approach uses
a detection and recovery algorithm;12 when a deadlock is
detected, it resorts to sequential processing of events to
advance simulation time and resolve the deadlock. The basic
conservative algorithm is inefficient in exploiting the
inherent parallelism of events, particularly in the case of
loosely coupled systems, and mostly relies on look-ahead
information to improve performance.13 A global synchronization
function is used to choose, among all LPs, a set of
events that are safe to be processed, usually based on the
notion of distances between LPs. In practice, a large
proportion of time may be wasted in searching for a safe
event.14 The use of look-ahead information requires the
simulation modeller to be involved in the details of the
synchronization mechanism, and verify that any modifications
to the model will not affect the look-ahead properties.
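The link-clock rule described above can be illustrated with a minimal Python sketch. It assumes that messages on each link arrive in timestamp order and it omits null messages and deadlock recovery; the class and method names (`ConservativeLP`, `receive`, `step`) are hypothetical and are not taken from the paper.

```python
from collections import deque


class ConservativeLP:
    """Toy conservative LP: one FIFO queue and one clock per incoming link.

    All names here are illustrative assumptions, not the paper's notation.
    """

    def __init__(self, links):
        self.queues = {name: deque() for name in links}
        self.last_received = {name: 0.0 for name in links}

    def receive(self, link, timestamp):
        # Assumption: messages on a link arrive in non-decreasing timestamp order.
        self.queues[link].append(timestamp)
        self.last_received[link] = timestamp

    def link_clock(self, link):
        # Timestamp at the head of the queue, or, if the queue is empty,
        # the timestamp of the last message received on that link.
        queue = self.queues[link]
        return queue[0] if queue else self.last_received[link]

    def step(self):
        """Process one message; return its timestamp, or None if the LP blocks."""
        link = min(self.queues, key=self.link_clock)
        if self.queues[link]:
            return self.queues[link].popleft()
        return None  # the link with the smallest clock is empty, so the LP waits
```

With two links `a` and `b`, a message at time 5 on `a` cannot be processed until something arrives on `b`, which is the blocking behaviour that makes the scheme costly for loosely coupled systems.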
In optimistic approaches, each LP has a single input
queue, and the only constraint on its execution is that it must
follow the local causality principle. In the commonly
employed Time Warp paradigm,15 when a message arrives
with a timestamp smaller than the local clock, the LP rolls
back to an earlier time; this may result in a cascading series
of actions to undo the effects of messages sent by it to other
LPs. Antimessages are used to provide the event trail for
the cascaded rollbacks.15 Whenever an LP sends a message
to another, an antimessage is also created and stored in
the corresponding output queue; when it rolls back, all the
antimessages up to the point of the rollback are sent to the
destination LPs to cancel previously sent messages. A global
virtual time (GVT), the minimum of the virtual clocks of all
LPs and the timestamps of all messages in transit, provides a
lower bound of the furthest an LP will need to roll back and,
hence, the time for which state variables need to be stored.
Optimistic schemes try to exploit as much parallelism as
possible. The drawback can be a significant overhead of
memory management16 and 'thrashing' behaviour where
most of the time is spent in executing incorrect events and
undoing the effects with long, cascaded rollbacks.17 The
executive of the protocol is more complex to develop than
for a conservative scheme, but the issues of synchronization
are more transparent to the modeller; however, selection of a
suitable interval for state-saving operations is a problem.18
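As a rough illustration of the GVT definition above, the following Python sketch computes GVT as the minimum over LP clocks and in-transit timestamps, and then discards saved states that can no longer be a rollback target. The function names and the pruning policy (keep the latest snapshot at or before GVT, plus everything later) are assumptions made for this example.

```python
def global_virtual_time(lp_clocks, in_transit_timestamps):
    """GVT: minimum of all LP virtual clocks and all in-transit message timestamps."""
    return min(list(lp_clocks) + list(in_transit_timestamps))


def prune_saved_states(saved_times, gvt):
    """Return the snapshot times that must still be kept (illustrative policy).

    No LP rolls back past GVT, so only the latest snapshot at or before GVT
    and the snapshots after it are retained.
    """
    if not saved_times:
        return []
    not_after_gvt = [t for t in saved_times if t <= gvt]
    keep_from = max(not_after_gvt) if not_after_gvt else min(saved_times)
    return [t for t in saved_times if t >= keep_from]


gvt = global_virtual_time([12.0, 9.0, 15.0], [7.0, 20.0])
kept = prune_saved_states([0.0, 5.0, 10.0, 15.0], gvt)
```

Here the in-transit message with timestamp 7 holds GVT back even though every LP clock is already later, which is why messages in transit have to be included in the minimum.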
Table 1 summarizes the relative merits of the two approaches.
The choice of protocol still remains a problem.8 Much of
recent research has been directed at throttling optimistic
behaviour by limiting event computations beyond GVT to a
simulation time window.19 Others have proposed limiting
speculative execution and rollbacks to a local level, while
remote LPs are sent messages only when it is safe to do so;
hence, antimessages are not needed.20 Adaptive protocols to
combine conservative and optimistic schemes have also been

The ratio of the number of external events scheduled by
one LP on another to the number of internal events
scheduled on itself could be regarded as a measure of the
degree of coupling between the LPs in a distributed system,
and affects the extent of parallelism present. In the
distributed modelling of a manufacturing supply chain, each
of its constituent units (cells, factories, etc) is modelled as
an LP. The external events (eg, placement of orders) are
relatively small in number compared to that of internal
events (eg, start of a machining cycle). The use of a
conservative algorithm for such a loosely coupled system will
lead to significant blockages due to the infrequent flow of
messages. The occurrence of deadlocks can also be high due
to multiple loops in the customer/supplier relationships that
exist between the LPs. Optimistic approaches do not suffer
from these consequences, but performance is affected by
rollbacks and the potential flooding of messages that can
Table 1 Comparison of conservative versus optimistic schemes

                     Conservative                               Optimistic

                     Limited by worst-case scenario             Not limited

                     Depends on the quality of 'look-ahead'     Can exhibit 'thrashing' behaviour; significant
                     information present in the simulation      overhead of memory management

                     Simple to develop                          Complex; harder to verify robustness

Model development    Complicated; requires the modeller to      More transparent and robust to changes in model,
                     be aware of                                but selection of state-saving interval is a problem


follow. Jefferson15 argues that for most applications, each
input to an LP (eg of a part) results in a few internal events
(processing of the part) but only a single external event
(output of the processed part), which would limit antimessages.
When an LP receives an order in a manufacturing
supply chain, however, it would normally need to place
orders for a number of different components from its
suppliers. Hence, the assumption does not hold and
significant degradation of performance may result.

Proposed algorithm
A modified algorithm is proposed here based on the assumed
feature of a manufacturing supply chain that external events
(eg shipments) are typically batched together for action. The
use of weekly MRP planning buckets is an extreme example
in relation to order processing. Even in a lean production
environment, the consequences are not as instantaneous or
rapid when compared to that of internal events and, as such,
at least for the purposes of modelling, the external events
could be batched with little or no loss of integrity in the
analysis of supply chains. A rollback then needs to occur
only when the timestamp of a message received is less than
the simulation clock at which the LP began to process the
last batch of external messages; any further rollbacks would
be wasted.
In Time Warp, the input and output queues are part of the
simulator. A modified LP architecture is proposed here with
the aim of avoiding wasted rollbacks (Figure 1). It has three
functional units: message controller, simulator, and state-saving
mechanism. The simulator and the message controller
share a client-server relationship. The message controller
incorporates the input and output queues and acts as a
message server to the simulator, which performs the actual
simulation of the PP. At appropriate intervals, the simulator
requests messages from the message controller, which
responds by serving messages from the input queue of the
LP. When the simulator outputs a message, it is stored in the
appropriate output queue and transmitted by the message
controller. The state-saving mechanism is used to enable the
LP to roll back.
Each message controller has a (queue) clock associated
with it that stores the simulation time at which the last
request for messages was made by the simulator. When the
message controller receives a request for messages, all that
have timestamps greater than its own queue clock value
(initially set to zero) and less than or equal to the current
simulation time of the LP are sent. The message controller's
queue clock is then incremented to the value of the
simulation clock. Rollbacks occur when an incoming
message from another LP has a timestamp less than or
equal to its queue clock value, that is, only if the message
would have been processed with a previous batch if it had
arrived earlier. A sequential list of all previous clock values is
maintained to determine the point of rollback, which would
be the first value in the list that is greater than or equal to the
timestamp of the message that caused the rollback. The
procedure is summarized below.

P1 Request for messages
  Wait until request by simulator
  send all messages messg in input queue that satisfy
    tqueueclk[i] < messg.t ≤ sim.t; where tqueueclk[i] = last entry in
    queue clock list tqueueclk, messg.t = timestamp of message,
    sim.t = clock time of simulator
  tqueueclk[i + 1] = sim.t
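The request-serving step above might be sketched in Python as follows. The names `tqueueclk` and `sim_t` follow the procedure; the class name `QueueClockServer`, the tuple layout of messages, and the decision to leave served messages in the input queue (so they can be served again after a rollback) are assumptions made for this example.

```python
class QueueClockServer:
    """Serves batches of messages to the simulator on request (illustrative sketch)."""

    def __init__(self):
        self.tqueueclk = [0.0]   # queue clock list; the initial clock value is zero
        self.input_queue = []    # (timestamp, payload) tuples, in any order

    def serve(self, sim_t):
        """Return messages with tqueueclk[i] < timestamp <= sim_t, oldest first."""
        last = self.tqueueclk[-1]
        batch = [m for m in self.input_queue if last < m[0] <= sim_t]
        batch.sort(key=lambda m: m[0])
        # Record the time of this request as the new last queue clock entry.
        self.tqueueclk.append(sim_t)
        return batch


server = QueueClockServer()
server.input_queue = [(8.0, "order B"), (3.0, "order A"), (12.0, "order C")]
first_batch = server.serve(10.0)    # orders A and B
second_batch = server.serve(20.0)   # order C only
```

Because each request covers a half-open window that starts at the previous queue clock, no message is served twice unless a rollback shortens the clock list.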

An alternative technique to antimessages is also proposed
using rollback counters, which is the number of times an LP
has rolled back since the start. The count is added to every

[Figure 1: block diagram of the message controller (input queue, message to output queue), the simulator, and the state-saving mechanism (save state, load state, request rollback).]

Figure 1 Modified LP architecture.


outgoing message. When an LP rolls back, it is incremented
and a control message with the current value is sent on all its
output links. The rollback counts and the control messages
are used by the message controller to detect invalid
messages. Two types of messages can be received from
another LP: normal message and rollback control message.
For a normal message, its timestamp is used to determine the
position in the time-sequenced input queue where it is to be
inserted. The queue is checked for any messages from the
same LP with a smaller timestamp and a higher rollback
count than that of the message received; this would indicate
that the LP that sent the message has since performed a
rollback and the new (invalid) message is not inserted.
Otherwise, the next step is to check if the simulator needs to
roll back. If the timestamp of the message is less than or
equal to the current input clock value, the message should
have been received earlier to satisfy a previous request for
messages and a rollback is initiated. The point of rollback is
determined by finding the latest in the queue clock list to
have a value greater than or equal to the timestamp of the
message.
If the incoming message from another LP is, instead, a
rollback control message, it is inserted in the position
determined as before based on timestamp values. All
messages from the same LP lying ahead of the inserted
control message (ie, with greater than or equal timestamps)
are checked for any with a lower rollback count than its
own, which would indicate that the message was sent before
that LP rolled back; every such (normal or control) message
is deleted. Finally, as before, if the timestamp of the control
message is less than or equal to the input queue clock value,
a rollback of the simulator is initiated and the point of
rollback is similarly determined. The procedure is summarized below.
P2 For every normal message m received from an LP l
  determine i such that
    i = position in the input queue Q at which the message is to be inserted
  if for any j = i-1, ..., 1
    message Q[j] is from l and Q[j].rbc > m.rbc, delete m; where
    rbc = rollback count
  else
    insert m at position i in Q
    if m.ts ≤ clk, issue rollback; where ts = timestamp and clk = input
    queue clock value

P3 For every control message cm received from LP l
  determine i such that
    i = position in the input queue Q at which the message is to be inserted
  insert cm at position i in Q
  for all messages Q[j] such that j > i and Q[j] is from l
    if Q[j].rbc < cm.rbc, delete Q[j]; rbc = rollback count
  if cm.ts ≤ clk, issue a rollback; where ts = timestamp and
    clk = input queue clock value

P4 When rollback required as a result of (normal or control) message m
  find the last value of i such that
    tqueueclk[i] < m.ts; where tqueueclk is the queue clock list
  rollback to t = tqueueclk[i]
  rbc++; increment rollback count rbc of LP by one
  broadcast rollback control message (t, rbc) on all output links;
    t = timestamp of control message
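The handling of normal messages, control messages and rollbacks summarized above can be put together in one hedged Python sketch. Several details are assumptions made for illustration rather than statements from the paper: the `Msg` record and the `MessageController` class are hypothetical, a control message is placed before queued messages of equal timestamp and a normal message after them, the queue clock list is truncated on rollback, and the broadcast on output links is only recorded in a list.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass


@dataclass
class Msg:
    ts: float              # timestamp
    src: str               # name of the sending LP
    rbc: int               # sender's rollback count when the message was sent
    control: bool = False  # True for a rollback control message


class MessageController:
    def __init__(self):
        self.q = []             # input queue Q, ordered by timestamp
        self.tqueueclk = [0.0]  # queue clock list
        self.rbc = 0            # rollback count of this LP
        self.broadcasts = []    # (t, rbc) control messages sent on output links

    def receive_normal(self, m):
        """Return False if m is discarded as invalid, True if it is queued."""
        i = bisect_right([x.ts for x in self.q], m.ts)
        # An earlier queued message from the same LP with a higher rollback
        # count means the sender has rolled back since it sent m.
        if any(x.src == m.src and x.rbc > m.rbc for x in self.q[:i]):
            return False
        self.q.insert(i, m)
        if m.ts <= self.tqueueclk[-1]:
            self.rollback(m.ts)
        return True

    def receive_control(self, cm):
        i = bisect_left([x.ts for x in self.q], cm.ts)
        self.q.insert(i, cm)
        # Delete later messages from the same LP that predate its rollback.
        self.q[i + 1:] = [x for x in self.q[i + 1:]
                          if not (x.src == cm.src and x.rbc < cm.rbc)]
        if cm.ts <= self.tqueueclk[-1]:
            self.rollback(cm.ts)

    def rollback(self, ts):
        """Roll back to the last queue clock value strictly below ts."""
        while len(self.tqueueclk) > 1 and self.tqueueclk[-1] >= ts:
            self.tqueueclk.pop()  # assumption: later clock entries are dropped
        t = self.tqueueclk[-1]
        self.rbc += 1
        self.broadcasts.append((t, self.rbc))  # stand-in for the real broadcast
        return t
```

For example, with queue clock list `[0, 10, 20]`, a straggler with timestamp 15 causes a rollback to 10 and a single control message `(10, 1)` per output link, in place of one antimessage per affected message.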

Proof of correctness
Consider a true representation PS of a physical system
composed of N processes that communicate exclusively
through message passing. PS is represented by a directed
graph consisting of N vertices $\{P_1, P_2, \ldots, P_N\}$, each of which
represents a process in PS. A PDES model MS is composed
such that for every process $P_i$, there exists an LP, $LP_i$, in MS
that represents the behaviour of that process. If a message-passing
link exists between $P_i$ and $P_j$, a corresponding link
exists between $LP_i$ and $LP_j$, and message delivery is
guaranteed. PS could be viewed as the equivalent model
based on a global synchronization clock.
For a process $P_k$ ($k = 1, 2, \ldots, N$) in PS, define $t^k_0 = 0$ and
$t^k_{n(k)} = Z$, the start and end of the simulation period. Further,
$T^k = \{t^k_0, t^k_1, \ldots, t^k_{n(k)}\}$ represents the sequence of times at
which $P_k$ processes incoming messages and the sequence is
monotonically increasing (the assumption of batch processing
of messages, and the times are not necessarily the same
for all LPs). Let $I^k_j$, termed input set, contain all messages
that arrive at process $P_k$ during time interval $[t^k_{j-1}, t^k_j]$, for
$j = 1, \ldots, n(k)$ and $I^k_0 = \{\,\}$. Similarly define $O^k_j$, termed output
set, to contain all messages that $P_k$ generates in the time
interval $[t^k_{j-1}, t^k_j]$ and $O^k_0 = \{\,\}$. Let $I^k(j)$ and $O^k(j)$ define the
message input and output histories of process $P_k$, that is, the
sequence of messages received and transmitted by it until
time $t^k_j$:

$$I^k(j) = I^k_0 + I^k_1 + \cdots + I^k_j, \qquad O^k(j) = O^k_0 + O^k_1 + \cdots + O^k_j$$

A function $F^k$ exists for the process such that $O^k(j) = F^k(I^k(j-1))$.
The state of process $P_k$ at time $t^k_j$ is represented
by $S^k_j$, which depends on the input received by it; hence,
there exists a state transition function $G^k$ such that

$$S^k_j = G^k(S^k_{j-1}, I^k_j)$$

Define the similar terms in MS by using lower case
alphabets, that is, for corresponding LP, $LP_k$, define input
set $i^k_j$, output set $o^k_j$, input history $i^k(j)$, output history $o^k(j)$,
output function $f^k$, state $s^k_j$, and state transition function $g^k$.

As it is modelled on an optimistic algorithm, input and
output histories may be incomplete and invalid messages
may exist.
Define input history $i^k(j)$ of $LP_k$ to be correct, that is,
$i^k(j) = I^k(j)$, if:
1. All valid messages with timestamp $t$, $t^k_0 < t \le t^k_j$, are
present in the input message stream for $LP_k$, but not
necessarily all delivered to it yet.
2. A control message from another LP with a lower
timestamp value and a higher rollback count than any
normal message from it does not exist anywhere in the
input message stream of $LP_k$.
3. If an invalid message $m$ from another LP exists in $i^k(j)$
with timestamp $t_m$ and rollback count $r_m$, then a control
message $c$ from that LP (as yet undelivered) also exists in
the input message stream of $LP_k$, with timestamp $t_c$ and
rollback count $r_c$ such that $t_c \le t_m$ and $r_c > r_m$.
Condition CI1 ensures $i^k(j)$ will become complete
(message delivery is guaranteed). Conditions CI2 and CI3
ensure that current invalid messages in $i^k(j)$ will be deleted
through the application of procedure P3.
Define output history $o^k(j)$ of $LP_k$ to be correct, that is,
$o^k(j) = O^k(j)$, if:
1. All valid messages with timestamp $t$, $t^k_0 < t \le t^k_j$, are
present in the history.
2. A control message with a lower timestamp value and a
higher rollback count than any normal message does not
exist in the history.
3. If an invalid message $m$ to another LP exists in $o^k(j)$ with
timestamp $t_m$ and rollback count $r_m$, then a control
message $c$ to it also exists in $o^k(j)$ with timestamp $t_c$
and rollback count $r_c$ such that $t_c \le t_m$ and $r_c > r_m$.
The three conditions (CO1, CO2, and CO3) are necessary to ensure that the
messages it sends out to another LP do not violate
conditions CI1, CI2, and CI3, respectively, for correct input
history of that LP.
A state $s^k_j$ of $LP_k$ is defined recursively as valid:

* $s^k_0$ is valid (ie, the initial state of an LP is valid)
* $s^k_j$ is valid if $s^k_j = g^k(s^k_{j-1}, i^k_j)$, where $i^k_j = I^k_j$ and $s^k_{j-1}$ is a
valid state.

This implies that the LP has no outstanding or invalid
messages with timestamp less than or equal to $t^k_j$. Hence, it
cannot roll back to a state earlier than $s^k_j$ and will produce a
valid output $o^k(j) = O^k(j)$. If, however, $i^k_{j+1}$ is incomplete or
contains invalid messages, the LP will next transform to a
state that is invalid.
The proof is structured as follows. It is first proved that if
an LP is at some valid state $s^k_j$ then, given correct input
history $i^k(j)$, it will transform to the next valid state $s^k_{j+1}$.
This property is then used to prove that any LP, given
correct input history $i^k(j)$, will generate correct output
history $o^k(j)$. Finally, using these results, the correctness of
MS is proved.
Theorem 1 Assuming correct input history $i^k(j)$, if an LP is
at some valid state $s^k_j$, then at some time in the simulation the
LP can be guaranteed to be at the next valid state $s^k_{j+1}$.

Proof If $i^k_{j+1}$ is complete and contains no invalid messages,
the LP will process $i^k_{j+1}$ and transform to the next valid state
$s^k_{j+1}$ by the definition of valid states (note: if $i^k_{j+1}$ contains any
message with timestamp greater than $t^k_{j+1}$, it will not be
processed (from P1)). However, if $i^k_{j+1}$ is incomplete and/or
incorrect, the LP will transform to some invalid state.
Let us first consider the case of $i^k_{j+1}$ being incomplete, that is,
one or more messages in $I^k_{j+1}$ (true process input set) are not
included in $i^k_{j+1}$. Since by the definition of correct history, all
valid messages are in the input stream, at some point in the
future $i^k_{j+1}$ will be complete (message delivery is guaranteed).
From the definition of valid state, the input history up to
time $t^k_j$ is complete; hence, the last message to complete $i^k_{j+1}$
will cause the LP to roll back to time $t^k_j$ (from P2 and P4)
and state $s^k_j$. The LP now processes $i^k_{j+1}$ and transforms to state
$s^k_{j+1}$ since all messages with timestamps in the interval $[t^k_j, t^k_{j+1}]$
are now in $i^k_{j+1}$. (NB: the LP before the roll back will have
been in some simulation time $t$, $t^k_{j+1} < t \le Z$.)
Next, consider the case where $i^k_{j+1}$ contains invalid messages.
From the definition of correct input history, a control
message with timestamp less than an invalid message and a
higher rollback count must exist in the input stream which,
when it arrives, will delete it and make the LP roll back to
time $t^k_j$ and state $s^k_j$ (from P3 and P4); note that from the
definition of valid states, the LP cannot roll back to a state
earlier than $s^k_j$. Combining the two cases, the last (normal or
control) message to complete $i^k_{j+1}$ and make it void of invalid
messages will cause the LP to roll back to $s^k_j$, which now
processes $i^k_{j+1}$ and transforms to state $s^k_{j+1}$.

Theorem 2 If an LP receives correct input during the course
of a simulation, it can be guaranteed to generate correct output.

Proof For the output generated by an LP to be correct, it
must not violate conditions CO1, CO2, and CO3.
Condition CO1 From Theorem 1, given correct input the
LP will go through all valid states and, as valid states
generate valid output, Condition CO1 is satisfied.
Condition CO2 There are two ways an LP can send a
control message with timestamp less than that of a valid
message: (a) the LP sends the control message first and then
sends the normal message; (b) the LP sends the normal
message first and at some point in the simulation rolls back
to a time less than its timestamp and transmits the control
message.
Consider the first case. In the proposed algorithm, the
rollback count of an LP is initialised to zero, and then the
only operation performed on it is to increment its value
every time the LP rolls back. Hence, the rollback count
cannot decrease during the course of the simulation and, if a
control message is sent before the normal message, it cannot
have a greater rollback count. For the second case, let us
assume that the LP is at a valid state $s^k_j$ at time $t^k_j$. As $s^k_j$ is
valid, all inputs in the time interval $0 < t \le t^k_j$ have arrived.
Hence, the LP cannot roll back to a time earlier than $t^k_j$ and a
control message with timestamp less than $t^k_j$ cannot be sent.
Taking the two cases together, CO2 is satisfied, that is, a
control message with a lower timestamp value and a higher
rollback count than any normal message does not exist in the
output history.
Condition CO3 Let us assume that the LP outputs an
invalid message with timestamp $t_m$ and rollback count $r$ from
an invalid state. Then some $i^k_j$, where $t^k_j < t_m$, must exist that
is not complete. As correct input history is assumed (and
message delivery is guaranteed), at some point $i^k_j$ will be
complete and the LP will roll back to $t^k_{j-1}$. This will cause the
rollback count to be incremented and the transmission of a
control message with timestamp $t^k_{j-1}$ and rollback count $r + 1$
(from P4), thus satisfying CO3.
Since all three conditions are satisfied, we can conclude
that given correct input history, an LP is guaranteed to
generate valid output. □
Theorems 1 and 2 show that given correct input, any LP
in MS will produce (in time) correct output that, in turn, is
(correct) input to all the other LPs with which it has an
output link. Next we consider the behaviour of MS as a
whole. Let us define a simulation run in the interval $[0, Z]$ to
be described by a set $R = \{[LP_1], [LP_2], \ldots, [LP_N]\}$, where
$[LP_k]$ denotes the output history of $LP_k$ for the entire
simulation run, that is, $o^k(n(k))$ at time $t^k_{n(k)} = Z$. Since
messages are processed at discrete intervals, states of
LPs are defined at these intervals (state transition points)
only, that is, at $T^k = \{t^k_0, t^k_1, t^k_2, \ldots, t^k_{n(k)}\}$ for $LP_k$, $k = 1, \ldots, N$.
If such message processing times for all the LPs are the
same, PS and any correct implementation of MS will
also go through $M + 1 = (n + 1)$ valid states at these times.
If they are not all the same, the state transition points will
be defined by $T = \{t_0, t_1, \ldots, t_M\} = \{T^1 \cup T^2 \cup \cdots \cup T^N\}$. Let
$R_e$ be the output history at the $e$th state transition point at
time $t_e$.
Theorem 3 Every simulation run will produce correct output
histories, $R_e$, $0 \le e \le M$, at the state transition points
$\{t_0, t_1, \ldots, t_M\}$ in $[0, Z]$.
Proof Proof is by induction on $e$. For $e = 0$, $R_e = \{\,\}$ and,
hence the theorem holds trivially. Let us assume that for
some $e > 0$, $R_{e-1}$ is correct and, hence, the output histories of
all LPs at time $t_{e-1}$ are correct. Consider $LP_k$, an LP in MS.
Since the input history of $LP_k$ is made up of the output
histories of all LPs with which it has an input link, it will be
correct at time $t_{e-1}$ and, hence, $LP_k$ will be in a valid state
$s^k_{j-1}$ at time $t^k_{j-1}$, where $t_{e-2} < t^k_{j-1} \le t_{e-1}$. Then from
Theorems 1 and 2, it will reach the valid state $s^k_j$ with a
correct output history $o^k(j)$ at time $t^k_j$, where $t_{e-1} < t^k_j \le t_{e+1}$.
Continuing the argument for all LPs, we can conclude that
the simulation run will reach the next valid state with a
correct output history $R_e$. By induction, the simulation will
produce a correct output history $R$ at time $Z$.

Performance of the algorithm

The performance of a PDES algorithm in any application is
dependent on a number of factors, for example, the extent of
parallelism that is present and the 'thrashing' behaviour that
may follow. Formulation of general rules on the relative
performance of different algorithms is hard, even for the
debate on conservative versus optimistic protocols, and any
objective assessment of a particular technique has to deal
with substantive issues.8,21 The objective of the paper has
been to present an alternative to the standard Time Warp
protocol, which addresses some weaknesses of the latter with
benefits that could be significant, particularly in applications
with the potential for high levels of rollbacks, but a true
assessment will require further research.
The algorithm has the potential to reduce significantly the
number of rollbacks. However, it uses an interrogative
approach of batch delivery of messages when an LP requests
it, rather than an imperative scheme of delivery/processing
based simply on timestamp. The message controller polls for
messages ('wait until...'), which requires additional computation,
but this is expected to be small compared to the
savings in expensive operations for the much larger number
of rollbacks that will often result from a standard Time
Warp implementation. Bagrodia and Liao22 drew the same
conclusion in their investigation of wasted rollbacks in the
context of priority servers, and a similar interrogative
approach was implemented in Maisie, a distributed simulation language.
The algorithm requires an LP to send only one message
per output link to delete all erroneous messages instead of
one per message that may be affected by the rollback, and
thus has the potential to reduce greatly the message
overhead. Cascaded rollbacks may still occur, but again at
each stage only one (control) message needs to be sent to
each output link. The maximum number of messages sent by
an LP network to undo the effects of a rollback is equal to
the largest distance that can be traversed on a graph without
traversing a vertex more than once, where the vertices
represent the LPs and the arcs the message passing links.
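The bound stated above is the length of the longest simple path in the LP graph. A small Python sketch follows, under the assumptions that the graph is given as a dictionary of output links, that distance is counted in arcs, and that the network is small enough for exhaustive search (the longest simple path problem is hard in general); the function name and the example network are hypothetical.

```python
def longest_simple_path(graph):
    """Number of arcs on the longest path that visits no LP more than once.

    graph maps each LP name to the list of LPs on its output links.
    """
    def dfs(node, visited):
        best = 0
        for nxt in graph.get(node, []):
            if nxt not in visited:
                best = max(best, 1 + dfs(nxt, visited | {nxt}))
        return best

    return max((dfs(v, {v}) for v in graph), default=0)


# Hypothetical supply chain with one customer/supplier loop.
chain = {
    "retailer": ["assembler"],
    "assembler": ["supplier1", "supplier2"],
    "supplier1": ["retailer"],
    "supplier2": [],
}
bound = longest_simple_path(chain)
```

In this example the longest simple path is supplier1, retailer, assembler, supplier2, which has three arcs.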
The algorithm also provides the system developer with a
well-defined and efficient mechanism to implement state-saving
operations that are transparent to the modeller.
Saving the state of the simulation at every clock update is
computationally expensive. In practice, periodic saving or
check pointing is usually employed in the implementation of
Time Warp. Infrequent saving of state, however, could mean
excessive rollback distance and inefficient execution, while
too frequent an interval would undo the benefits of check
pointing. The frequency is thus an important parameter that
determines performance and it is difficult to select.18 In the
modified algorithm, the simulator always rolls back to a
point in the queue clock list and, hence, it is sufficient to
perform state-saving operations only when this is updated.
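A minimal Python sketch of state saving tied to queue clock updates is given below. It assumes that a snapshot is taken once the batch requested at a given queue clock value has been applied, and that snapshots later than the rollback point are discarded; the `StateSaver` class and its method names are hypothetical.

```python
import copy


class StateSaver:
    """Keeps one snapshot per queue clock entry (illustrative sketch)."""

    def __init__(self, initial_state):
        # The initial state corresponds to the initial queue clock value of zero.
        self.snapshots = {0.0: copy.deepcopy(initial_state)}

    def save(self, queue_clock, state):
        # Called only when the queue clock list is updated.
        self.snapshots[queue_clock] = copy.deepcopy(state)

    def load(self, rollback_time):
        # Snapshots taken after the rollback point are no longer valid.
        for t in [t for t in self.snapshots if t > rollback_time]:
            del self.snapshots[t]
        return copy.deepcopy(self.snapshots[rollback_time])


saver = StateSaver({"stock": 5})
state = {"stock": 3}
saver.save(10.0, state)
state["stock"] = 1
saver.save(20.0, state)
restored = saver.load(10.0)   # state as it was at queue clock 10
```

Since every rollback target is an entry in the queue clock list, no checkpoint interval has to be tuned separately.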

A manufacturing supply chain is typically a loosely coupled
system, which makes it particularly suitable for parallel
processing.23 The use of a conservative algorithm for such a
system is expected to lead to significant blockages, but there
are also concerns related to performance in applying
optimistic schemes to large models. However, rationalization
of the model by introducing batch processing of messages at
discrete intervals allows a clear definition of rollback points
and the standard version of the Time Warp algorithm to be
modified to address three key issues: reduce rollbacks,
control the extent of message passing required to undo the
effect of invalid operations, and make the state-saving
mechanism more efficient. The proposed algorithm does this
without, in the terminology of Reynolds,24 affecting the
'aggressiveness' and 'risk' inherent in Time Warp.
Most physical systems exhibit a delay in processing
messages. Hence, the requirement for their batching is in
itself not very restrictive. However, the efficiency of the
algorithm and, hence the benefit, will depend on the batch
frequency. What is an appropriate interval is clearly
dependent on the system environment and will be a trade-off
decision between computing performance and model
integrity, but the processing of messages only a few times in
a simulated day would be typically considered sufficient for
manufacturing supply chains. The times when this is done
could be parameters defined by the modellers and, since they
need not be the same for each LP, set at the local level, which
is particularly useful for modelling of supply chains
operating in different time zones. This is the only decision
related to the algorithm that needs to be taken by the
modeller. The transparency of the algorithm is an important
feature since the level of sophistication needed on the part of
the modeller to exploit the technology successfully has often
been cited as a reason for the limited use of PDES by the
Batch processing, however, does not allow for immediate
handling of urgent orders. Since even in such cases there is a
processing delay, any loss of model integrity may not be
significant if the batching interval is not too infrequent, but a
useful enhancement would be a mechanism for the message
controller to interrupt the LP to handle such orders. The
interrogative approach used in the algorithm should help to
1 Peng C and Chen FF (1996). Parallel discrete event simulation of manufacturing systems: a technology survey. Comput 31: 327-330.
2 Hon KKB and Ismail HS (1991). Application of transputers simulation of manufacturing systems - a preliminary study. Proc Inst Mech Eng-Part B, J Eng Manuf 205: 19-23.

3 Fujii S, Tsunoda H, Ogita A and Kidani Y (1994). Distributed simulation model for Computer Integrated Manufacturing. In: Tew JD and Manivannan S (eds). Proceedings of the 1994 Winter Simulation Conference. IEEE: USA, pp 946-953.
4 Ferscha A and Richter M (1997). Java based co-operative distributed simulation. In: Andradóttir S, Healy KJ, Withers DH and Nelson BL (eds). Proceedings of the 1997 Winter Simulation Conference. IEEE: USA, pp 381-388.
5 Pidd M and Cassel RA (2000). Using Java to develop discrete event simulations. J Opl Res Soc 51: 405-412.
6 Lutz R (1998). High Level Architecture object model development and supporting tools. Simulation 71: 401-409.
7 Gan B-P and Turner SJ (2000). An asynchronous protocol for virtual factory simulation on shared memory multi-processor systems. J Opl Res Soc 51: 413-422.
8 Ferscha A (1995). Parallel and distributed simulation of discrete event systems. In: Zomaya AYH (ed). Parallel and Distributed Computing Handbook. McGraw-Hill: New York, pp 1003-1041.
9 Bryant RE (1977). Simulation of Packet Communications Architecture Computer Systems, MIT-LCS-TR-188, Massachusetts Institute of Technology.
10 Chandy KM and Misra J (1979). Distributed simulation: a case study in design and verification of distributed programs. IEEE Trans Software Eng SE-5: 440-452.
11 Seethalaksmi M (1978). Performance Analysis of Distributed Simulation, MS Thesis, University of Texas, Austin.
12 Chandy KM and Misra J (1981). Asynchronous distributed simulation via a sequence of parallel computations. Commun ACM 24: 198-205.
13 Fujimoto R (1990). Parallel discrete event simulation. Commun
14 Lubachevsky BD (1989). Efficient distributed event-driven simulations of multiple-loop networks. Commun ACM 32:
15 Jefferson DR (1985). Virtual time. ACM Trans Program Lang Sys 7: 404-425.
16 Das SR and Fujimoto RM (1997). An empirical evaluation of performance trade-offs in Time Warp. IEEE Trans Parallel Distrib Sys 8: 210-224.
17 Lubachevsky BD, Shwartz A and Weiss A (1991). An analysis of rollback-based simulation. ACM Trans Model Comp Simul 1:
18 Fleischmann J and Wilsey PA (1995). Comparative analysis of periodic state saving techniques in Time Warp simulators. In: Corporate IEEE (ed). Proceedings of the 9th Workshop on Parallel and Distributed Simulation. IEEE: USA, pp 50-58.

19 Turner SJ and Xu MQ (1992). Performance evaluation of the bounded Time Warp algorithm. In: Abrams MA (ed). Proceedings of the 6th Workshop on Parallel and Distributed Simulation. Society for Computer Simulation: USA, pp 117-126.
20 Dickens PM and Reynolds Jr PF (1990). SRADS with local rollback. In: Nicol D and Fujimoto R (eds). Proceedings of the SCS Multiconference on Distributed Simulation. Society for Computer Simulation: USA, pp 161-164.
21 Das SR (2000). Adaptive protocols for parallel discrete event simulation. J Opl Res Soc 51: 385-394.
22 Bagrodia RL and Liao WT (1990). Maisie: a language and optimising environment for distributed simulation. In: Nicol D and Fujimoto R (eds). Proceedings of the SCS Multiconference on Distributed Simulation. Society for Computer Simulation: USA, pp 205-210.
23 Arunachalam R (2000). An agent based computational for supply chain simulation. PhD thesis, University of Warwick.
24 Reynolds Jr PF (1988). A spectrum of options for parallel simulation. In: Abrams MA (ed). Proceedings of the 1988 Winter Simulation Conference. IEEE: USA, pp 325-332.
25 Fujimoto RM (1993). Parallel discrete event simulation: will the field survive? ORSA J Comp 5: 213-230.

Received February 2003;
2003 after one revision