Abstract
~~~~~~~~
Background
~~~~~~~~~~
1. Atomic instructions

[Basic Dekker]

Thread1:
    // Ingress
    Store (ST) 1 into memory variable T1  -- publish intention to enter the critical section
    Load (LD) from memory variable T2     -- verify no collision with Thread2
    if the value fetched from T2 is non-zero, clear T1 and retry
    <CRITICALSECTION>
    // Egress
    ST T1 = 0                             -- release lock

Thread2:
    // Ingress
    ST 1 into memory variable T2          -- announce intention to enter the critical section
    LD from memory variable T1            -- verify no collision with Thread1
    if the value fetched from T1 is non-zero, clear T2 and retry
    <CRITICALSECTION>
    // Egress
    ST T2 = 0                             -- release lock
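The two-thread handshake above can be sketched in C11 atomics. This is an illustrative sketch, not code from the original text; the intent-flag names are invented, and memory_order_seq_cst on the store/load pair supplies exactly the store-load ordering that, as discussed below, a relaxed memory model would otherwise not guarantee.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Illustrative sketch of Thread1's side of the Dekker handshake.
 * t1_intent/t2_intent stand in for memory variables T1/T2 above. */
static atomic_int t1_intent;
static atomic_int t2_intent;

static bool thread1_try_enter(void) {
    atomic_store_explicit(&t1_intent, 1, memory_order_seq_cst);   /* ST */
    if (atomic_load_explicit(&t2_intent, memory_order_seq_cst)) { /* LD */
        /* collision: withdraw intent; caller backs off and retries */
        atomic_store_explicit(&t1_intent, 0, memory_order_seq_cst);
        return false;
    }
    return true; /* exclusive: the critical section may be entered */
}

static void thread1_exit(void) {
    atomic_store_explicit(&t1_intent, 0, memory_order_seq_cst); /* release */
}
```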
Our proposed mutual exclusion mechanism is implemented with loads and stores
and avoids using atomic instructions. As noted above, the protocol to enter
a critical section involves a store followed by a load. Complicating the issue,
modern computer architectures often implement "weakened" or "relaxed" memory
models.
See [9], Section J, "Programming with Memory Models". In a weakened memory model
the system is permitted to make the ST operation visible to other processors
after the LD completes. That is, load and store operations executed on one
processor may be observed out-of-order by other processors. This relaxation
of the memory model may allow certain optimizations in the design of the processor
or its interconnects, potentially improving the overall performance of the system.
A common memory model is Total-Store-Order, or TSO, which is implemented in all
SPARC processors. The IA32 architecture implements the "Processor Ordering" (PO)
memory model, which is almost identical to TSO. See [3]. In TSO, stores must
complete and become visible in the order executed by the processor, but if the
processor executes a store followed by a load the system may reorder the accesses,
allowing the store to complete and become visible after the load finishes.
If we use a Dekker-like store-load mutual exclusion algorithm and the system
reorders the stores to be visible after the load, the Dekker algorithm can fail
and permit two threads into a critical section at one time. (This is referred
to as an exclusion failure and is extremely undesirable as the data protected by
the critical section can become inconsistent). To avoid reordering the programmer
must insert a "memory barrier" or MEMBAR instruction between the ST and the LD.
The MEMBAR instruction directs the processor to make the effect of the ST visible
before executing the LD. Unfortunately the MEMBAR instruction is costly on SPARC
and IA32 processors, often with a latency similar to that of atomic instructions
such as CAS.
Note that atomic instructions, such as CAS, and MEMBAR typically have
long latencies and greatly degrade the throughput of the processor that
executes such instructions.
Using the augmented Dekker form instead of the Dekker form with MEMBARs is
profitable if the following inequality holds:
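The inequality itself has not survived in this copy of the text. A plausible reconstruction of the intended cost comparison, using purely hypothetical symbols (C_MEMBAR = latency of the fast-path MEMBAR, F_fast = frequency of fast-path lock/unlock operations, C_SER = cost of one slow-path SERIALIZE, F_slow = frequency of revocations), would be:

```latex
C_{\mathrm{MEMBAR}} \cdot F_{\mathrm{fast}} \;>\; C_{\mathrm{SER}} \cdot F_{\mathrm{slow}}
```

That is, the MEMBAR cycles saved on the frequent path must outweigh the serialization expense added to the infrequent path.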
We have found we can improve the performance of a Java Virtual Machine (JVM) by
applying the augmented Dekker scheme to certain synchronization problems found
within the JVM. We describe two applications -- Java monitors and JNI execution
barriers:
Java Monitors - Quickly Reacquirable Locks (AKA "Biased Locking")
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Each Java object has a "lock word" which indicates if the object is
locked or unlocked. At any given time an object can be locked by
at most one thread.
If some other thread R tries to lock the object while the object is biased
toward T, R needs to revoke or rescind the bias from T. Once the bias is
revoked subsequent synchronization operations on the object revert to the
traditional monitor synchronization mechanism which employs CAS. It is
relatively rare for more than one thread to lock an object during the object's
lifespan, so revocation, while expensive, will be relatively rare. Simple
uncontended locking -- which QRL accelerates by removing the CAS -- is extremely
frequent in comparison.
To safely revoke bias we need to make sure that the revokee and revoker
threads don't interfere with each other. The revokee (AKA the bias-holding
thread) can lock and unlock a biased object by mutating the object's lock word
with a simple LD-UPDATE-ST sequence. To avoid interference during revocation,
the revoking thread arranges that (a) the revokee is not currently in the
midst of updating the object's lock word, and (b) that during revocation the
revokee can not enter the "critical section" that updates the object's lock word.
Note that QRL locking is *asymmetric*. Biased lock and unlock operations
are typically quite frequent while revocation will be infrequent.
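As a concrete illustration of the BIASEDLOCKED/BIASEDUNLOCKED/BIASEDTOWARD operators used in the pseudocode below, a lock word might pack a thread identity next to a small state field. This encoding is invented for illustration and is not specified anywhere in this text (nor is it the actual HotSpot layout):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical lock-word encoding.
 * Low 2 bits: state (01 = biased+unlocked, 11 = biased+locked).
 * Remaining bits: identity of the bias-holding thread. */
typedef uintptr_t lockword_t;

enum { LW_BIASED_UNLOCKED = 0x1, LW_BIASED_LOCKED = 0x3 };

static inline lockword_t biased_unlocked(uintptr_t tid) {  /* BIASEDUNLOCKED */
    return (tid << 2) | LW_BIASED_UNLOCKED;
}
static inline lockword_t biased_locked(uintptr_t tid) {    /* BIASEDLOCKED */
    return (tid << 2) | LW_BIASED_LOCKED;
}
static inline bool is_biased(lockword_t w) {               /* ISBIASED */
    return (w & 0x1) != 0;  /* both biased states set the low bit */
}
static inline uintptr_t biased_toward(lockword_t w) {      /* BIASEDTOWARD */
    return w >> 2;
}
```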
In the QRL filing [4] one of the embodiments used explicit memory barrier
instructions to synchronize the revoker and the revokee. Briefly, the
MEMBAR-based QRL lock operates as follows:
- The Unlock() method is not shown, but it utilizes the same Dekker critical
section construct as the Lock() code.
- For the purpose of brevity the code below uses spinning, but in
practice a real implementation would avoid spinning and instead use
POSIX pthreads condvars. Also, the code may be vulnerable to livelock.
The back-off mechanisms and other methods used to avoid livelock
are not shown.
Lock (Object)
Retry:
    // Enter critical section ...
    Self->InCrit = Object                   // ST
    MEMBAR()                                // MEMBAR
    if Self->RevokeInProgress == Object     // LD
        Self->InCrit = NULL
        Delay
        goto Retry
    // critical section ... protects accesses to Object->LockWord
    // The critical section performs a CAS-like operation on the
    // lockword, except that the operation isn't atomic.
    // Instead of CAS, we use {LD;test;br;modify;ST}
    tmp = Object->LockWord
    if tmp == BIASEDUNLOCKED(Self)
        Object->LockWord = BIASEDLOCKED(Self)
        Self->InCrit = NULL                 // Exit the critical section
        return success
    Self->InCrit = NULL
    if ISBIASEDBYOTHERTHREAD(tmp)
        Revoke (Object)
    if ISFIRSTLOCK(tmp)
        // Attempt to bias the object toward Self.
        if CAS(&Object->LockWord, tmp, BIASEDLOCKED(Self)) == tmp
            return success
    ... continue into traditional Lock code ...
Revoke (Object)
    LockWord = Object->LockWord
    if !ISBIASED(LockWord) return ;
    BiasHolder = BIASEDTOWARD(LockWord)
    pthread_mutex_lock (BiasHolder->RevokeMutex)
    Verify = Object->LockWord
    // Re-sample the lockword. It could have changed from BIASED to
    // non-biased state. Only the first revoker performs revocation.
    // When the 1st revoker resamples the lockword it will still see
    // the object in BIASED state. Subsequent revoke attempts (revoking
    // threads) will notice that the lockword changed to non-BIASED
    // and return quickly.
    if ISBIASED(Verify) AND BIASEDTOWARD(Verify) == BiasHolder
        BiasHolder->RevokeInProgress = Object          // ST
        MEMBAR()                                       // MEMBAR
        while (BiasHolder->InCrit == Object) SPIN() ;  // LD
        Object->LockWord = UNBIAS (Object->LockWord)
    pthread_mutex_unlock (BiasHolder->RevokeMutex)
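The {LD;test;br;modify;ST} sequence in the critical section above can be written out as a plain, deliberately non-atomic helper. It is safe only because the surrounding InCrit/RevokeInProgress handshake guarantees the bias-holding thread executes it without interference; this is a sketch with invented names, not code from the original text:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Non-atomic CAS-like lockword transition: {LD; test; br; modify; ST}.
 * Unlike a real CAS, this is NOT safe under concurrent access; the QRL
 * protocol's InCrit/RevokeInProgress handshake supplies the exclusion. */
static bool lockword_transition(uintptr_t *lockword,
                                uintptr_t expect, uintptr_t update) {
    if (*lockword != expect)   /* LD; test; br */
        return false;
    *lockword = update;        /* modify; ST */
    return true;
}
```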
Lock (Object)
Retry:
    // Enter critical section ...
    Self->InCrit = Object                   // ST
    if Self->RevokeInProgress == Object     // LD
        Self->InCrit = NULL
        Delay
        goto Retry
    // critical section ... protects accesses to Object->LockWord
    // The critical section performs a CAS-like operation on the
    // lockword, except that the operation isn't atomic.
    // Instead of CAS, we use {LD;test;br;modify;ST}
    tmp = Object->LockWord
    if tmp == BIASEDUNLOCKED(Self)
        Object->LockWord = BIASEDLOCKED(Self)
        Self->InCrit = NULL                 // Exit the critical section
        return success
    Self->InCrit = NULL
    if ISBIASEDBYOTHERTHREAD(tmp)
        Revoke (Object)
    if ISFIRSTLOCK(tmp)
        // Attempt to bias the object toward Self.
        if CAS(&Object->LockWord, tmp, BIASEDLOCKED(Self)) == tmp
            return success
    ... continue into traditional Lock code ...
Revoke (Object)
    LockWord = Object->LockWord
    if !ISBIASED(LockWord) return ;
    BiasHolder = BIASEDTOWARD(LockWord)
    pthread_mutex_lock (BiasHolder->RevokeMutex)
    Verify = Object->LockWord
    // Re-sample the lockword. It could have changed from BIASED to
    // non-biased state. Only the first revoker performs revocation.
    // When the 1st revoker resamples the lockword it will still see
    // the object in BIASED state. Subsequent revoke attempts (revoking
    // threads) will notice that the lockword changed to non-BIASED
    // and return quickly.
    if ISBIASED(Verify) AND BIASEDTOWARD(Verify) == BiasHolder
        BiasHolder->RevokeInProgress = Object          // ST
        SERIALIZE(BiasHolder)                          // SERIALIZE
        while (BiasHolder->InCrit == Object) SPIN() ;  // LD
        Object->LockWord = UNBIAS (Object->LockWord)
    pthread_mutex_unlock (BiasHolder->RevokeMutex)
Mutator threads can also call out of java code and into native C code by way of
the Java Native Interface, or JNI. When a thread calls out it becomes a
non-mutator. To increase potential parallelism between the collector and
threads that have called out, the collector does not stop such threads. The JVM
does, however, erect a logical execution barrier that prevents such threads from
reentering the JVM (and becoming a mutator) while a collection is in progress.
The JNI reentry barrier prevents "inscape" - the barrier prevents threads that
are "outside" the JVM on JNI calls from reentering java code while a collection
is in-progress. As a thread returns from a JNI call into java code, the
thread passes through the JNI reentry barrier. If collection is in-progress
the barrier halts the thread until the collection completes. This prevents the
thread from mutating the heap concurrently with the collector. The JNI reentry
barrier is commonly implemented with a CAS or a Dekker-like "ST;MEMBAR;LD"
sequence to mark the thread as a mutator (the ST) and check for a collection
in-progress (the LD).
The JNI reentry path is a special case of synchronization where the mutator
and collector must coordinate their activities. We will show below how an
augmented Dekker mechanism can be used to accelerate the JNI reentry path,
allowing removal of the MEMBAR. Mutator-collector synchronization for the JNI
reentry mechanism is asymmetric in that call-outs via JNI occur frequently but
collections are relatively infrequent. We speed up the JNI reentry path by
removing the MEMBAR, but at the cost of adding code and expense to the path
used to initiate a collection.
Modern "on the fly" collectors, [12], [13], stop threads individually instead
of all threads simultaneously. The JNI reentry barrier should provide support
for both stop-the-world (STW) and stop-the-thread (STT) collectors. The
ability to stop individual threads at the JNI reentry barrier is also useful
for java-level thread suspension and java debuggers.
The collector stops threads that are executing in java code with a mechanism
distinct from the JNI reentry barriers. Threads in java code must periodically
poll or check a garbage-collection-pending flag. If the flag is set the thread
stops itself and notifies the collector. The collection can proceed only after
all threads executing in java have stopped themselves. To assist the collector
the JNI reentry barrier must also track the state of mutator threads so the
collector can determine if a particular thread is out on a JNI call or executing
within java code. If the thread is executing in java code the collector will
rendezvous with the thread in a cooperative fashion (setting a flag "asking" for
the thread to stop and then waiting for the thread to acknowledge the request).
If the thread is executing outside java code (out on a JNI call) the collector
erects the JNI barrier to ensure the thread can not resume executing java code
while the collection is in-progress.
- The HotSpot 1.4X and 1.5.0 JVMs use MEMBAR-based JNI barriers.
JNIReentry ()
    ST 1, Self->InJava
    MEMBAR
    LD Self->Halt, t
    bnz t, StopSelf
    ...

GarbageCollector ()
    for each mutator thread t
        t->Halt = 1
    MEMBAR
    for each mutator thread t
        if t->InJava
            CooperateWaitForRendezvous (t)
    CollectJavaHeap()
    for each mutator thread t
        t->Halt = 0
        if t->InJava
            ResumeFromSafepoint (t)
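The MEMBAR-based reentry fast path can be sketched with C11 atomics, where seq_cst ordering on the store/load pair plays the role of the MEMBAR. This is illustrative only; the field names follow the pseudocode, not any real HotSpot layout:

```c
#include <assert.h>
#include <stdatomic.h>

/* Sketch of the ST;MEMBAR;LD JNI reentry fast path. */
struct thread_state {
    atomic_int in_java;  /* Self->InJava */
    atomic_int halt;     /* Self->Halt   */
};

/* Returns 1 if the thread may reenter Java code, 0 if it must stop. */
static int jni_reentry(struct thread_state *self) {
    atomic_store_explicit(&self->in_java, 1, memory_order_seq_cst); /* ST */
    /* seq_cst on both accesses gives the MEMBAR effect between them */
    if (atomic_load_explicit(&self->halt, memory_order_seq_cst))    /* LD */
        return 0;  /* collection pending: go to StopSelf */
    return 1;
}
```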
JNIReentry ()
    ST 1, Self->InJava
    LD Self->Halt, t
    bnz t, StopSelf
    ...

GarbageCollector ()
    for each mutator thread t
        t->Halt = 1
    MEMBAR
    for each mutator thread t
        SERIALIZE(t)             // [TAG1]
        if t->InJava
            CooperateWaitForRendezvous (t)
    CollectJavaHeap()
    for each mutator thread t
        t->Halt = 0
        if t->InJava
            ResumeFromSafepoint (t)
SERIALIZE(t)
~~~~~~~~~~~~
(A) If F has completed the {ST A} operation, then the value STed by
F into A will be visible to S by the time SERIALIZE(t) returns.
That is, the {LD A} executed by S will observe the value STed
into A by F.
The slow thread can post a UNIX signal or Windows ACB to the
fast thread and then wait for the fast thread to respond.
The signal handler or ACB executes a MEMBAR and then sends
an acknowledgement or reply to the slow thread, indicating that
the MEMBAR was complete. After sending the signal the slow
thread -- the thread that called SERIALIZE -- waits for acknowledgement
from the fast thread. Upon receipt of the reply the slow thread
knows that a MEMBAR has recently been completed by the fast thread.
In this way the revoker can force the revokee to execute a MEMBAR.
The slow thread literally forces the fast thread to transiently
stop executing its normal instruction stream and instead execute a
MEMBAR.
As applied to QRL, the revoker can send messages to all the martyr
threads -- or, as a refinement, to just the specific processor on
which the revokee was last dispatched. The revoker then waits for
acknowledgement of all the messages it sent. Upon receipt of all
acknowledgements the revoker knows with certainty that context
switching has occurred on the processor where the revokee last ran.
Since context switching implies that the store buffers were flushed,
the revoker then knows that any STs by the revokee to its InCrit field
are visible.
push displacement
-----------------
pull displacement
-----------------
Alternately, the slow thread can bind the fast thread to the
processor on which the slow thread is currently running.
When the bind operation, which is synchronous, returns,
the slow thread is guaranteed that the fast thread went OFFPROC.
This is the so-called "come-here" or "pull" model, where the
fast thread visits the processor associated with the slow thread.
Pull displacement is less efficient for STW operations, requiring
O(Threads) binding operations as compared to O(Processors) binding
operations for push displacement.
(3) Passive
If our mechanism uses martyr threads and the revoking thread can determine the
last processor on which the revokee was dispatched, then the revoker can send a
signal to just the martyr thread associated with that processor. Absent
schedctl, the revoker would need to send signals to *all* martyr threads.
Note that the revokee might migrate to another processor between the time
the revoker sampled sc_cpu and the time the revoker sends the signal to the
martyr thread. This is benign, as migration or context switching activities
also flush a processor's store buffer. In any case -- if the revokee is still
running on the CPU, if the revokee has migrated to another CPU, or
if the revokee has been preempted -- we know that all latent STs
by the revokee are flushed and visible to the revoker.
Note that the revoker might load the revokee's sc_cpu value and then
the revokee might (a) be preempted by another thread, or (b) migrate
to some other processor before the revoker can send a signal to the
appropriate martyr thread. That is, the sc_cpu value fetched by the
revoker can become stale and out-of-date almost immediately. This
condition is benign as both context switches (being descheduled from
a processor -- called "going OFFPROC" in Solaris terminology)
and migration cause a thread to serialize execution. In any case --
if the revokee is still running on the same CPU, if the revokee has
migrated to another CPU, or if the revokee has been preempted and is
not running -- then at the time the martyr thread acknowledges the
signal we know that if the revokee was in the midst of the critical
ST-LD sequence, either (a) the latent ST to InCrit, if any, will
be visible to the revoker, or (b) the LD by the revokee will observe
the value recently stored by the revoker.
If a thread is not ONPROC then all its user-mode STs are visible
and all speculative execution past the interrupted user-mode IP is cancelled.
More precisely, there is a memory barrier between the transition
out of user-mode and the ST into the sc_state field. If T2
observes that T1's sc_state field is not ONPROC then T2 is guaranteed
that all T1's prior user-mode STs are visible to T2.
Consider the following trivial example where FastThread and SlowThread both
use mutual exclusion to access a common critical section.
[BASICMPROTECT]

FastThread:
    ST InCrit = 1            // mark as in guarded critical section
    <CriticalSection>
    ST InCrit = 0

TrapHandler:
    // Return and retry the offending store.
    // This constitutes a spin.
    return ;

SlowThread:
    for (;;) {
        while (InCrit == 1) Delay() ;
        mprotect PageOf(InCrit) RO     // [SETRO]
        if (InCrit == 0) break ;       // [LDINCRIT]
        mprotect PageOf(InCrit) RW
        BackOffDelay () ;
    }
    <CriticalSection>
    mprotect PageOf(InCrit) RW
- The fast thread first STs into InCrit. The processor implements
the ST into InCrit as an atomic operation with the following checks:
TrapHandler()
    // On Unix or Linux the TrapHandler() would be implemented with signals.
    // On Windows the TrapHandler() would be implemented with either
    // structured exception handling (SEH) or vectored exception handling (VEH).
    // We return immediately, simply retrying the offending ST.
    MEMBAR()
    return ;
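The trap-and-retry idea in TrapHandler() can be demonstrated end-to-end on Unix-like systems. This is a sketch under stated assumptions: Linux-style SIGSEGV delivery, and the conventional (though not formally async-signal-safe) use of mprotect inside the handler:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

static volatile int *guard_page;      /* page holding the InCrit-style flag */
static volatile sig_atomic_t trapped;
static long page_size;

/* TrapHandler: reopen the page; returning retries the faulting ST. */
static void trap_handler(int sig) {
    (void)sig;
    trapped = 1;
    mprotect((void *)guard_page, (size_t)page_size, PROT_READ | PROT_WRITE);
}

static int mprotect_demo(void) {
    page_size = sysconf(_SC_PAGESIZE);
    guard_page = mmap(NULL, (size_t)page_size, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    struct sigaction sa;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sa.sa_handler = trap_handler;
    sigaction(SIGSEGV, &sa, NULL);
    mprotect((void *)guard_page, (size_t)page_size, PROT_READ); /* [SETRO] */
    guard_page[0] = 42;  /* traps; handler restores RW; the ST retries */
    return guard_page[0];
}
```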
* [V1] QRL operators constructed with per-thread poll safepoints.
Lock (Object)
Retry:
    // critical section ... protects accesses to Object->LockWord
    // Invariant: no safepoints inside the critical section.
    tmp = Object->LockWord
    if tmp == BIASEDUNLOCKED(Self)
        Object->LockWord = BIASEDLOCKED(Self)
        // Exit the critical section
        return success
    // Exit the critical section
    if ISBIASEDBYOTHERTHREAD(tmp)
        Revoke (Object)
    if ISFIRSTLOCK(tmp)
        // Attempt to bias the object toward Self.
        if CAS(&Object->LockWord, tmp, BIASEDLOCKED(Self)) == tmp
            return success
    ... continue into traditional Lock code ...

Revoke (Object)
    LockWord = Object->LockWord
    if !ISBIASED(LockWord) return ;
    BiasHolder = BIASEDTOWARD(LockWord)
    pthread_mutex_lock (BiasHolder->RevokeMutex)
    Verify = Object->LockWord
    // Re-sample the lockword. It could have changed from BIASED to
    // non-biased state. Only the first revoker performs revocation.
    // When the 1st revoker resamples the lockword it will still see
    // the object in BIASED state. Subsequent revoke attempts (revoking
    // threads) will notice that the lockword changed to non-BIASED
    // and return quickly.
    if ISBIASED(Verify) AND BIASEDTOWARD(Verify) == BiasHolder
        ForceSafePoint (BiasHolder) ;
        Object->LockWord = UNBIAS (Object->LockWord) ;
        ResumeSafepoint (BiasHolder) ;
    pthread_mutex_unlock (BiasHolder->RevokeMutex)
Lock (Object)
    // Enter QRL critical section
    // The ST to InCrit can trap into TrapHandler
    Self->VPage->InCrit = Object
    tmp = Object->LockWord
    if tmp == BIASEDUNLOCKED(Self)
        Object->LockWord = BIASEDLOCKED(Self)
        // Exit the QRL critical section
        // The ST to InCrit can trap into TrapHandler
        Self->VPage->InCrit = NULL
        return success
    Self->VPage->InCrit = NULL
    if ISBIASEDBYOTHERTHREAD(tmp)
        Revoke (Object)
    if ISFIRSTLOCK(tmp)
        // Attempt to bias the object toward Self.
        if CAS(&Object->LockWord, tmp, BIASEDLOCKED(Self)) == tmp
            return success
    ... continue into traditional Lock code ...

Unlock (Object)
    Self->VPage->InCrit = Object       // Enter QRL critical section
    tmp = Object->LockWord
    if tmp == BIASEDLOCKED(Self)
        Object->LockWord = BIASEDUNLOCKED(Self)
        Self->VPage->InCrit = NULL     // Exit QRL critical section
        return success
    Self->VPage->InCrit = NULL
    ... continue into traditional Unlock code ...

Revoke (Object)
    LockWord = Object->LockWord
    if !ISBIASED(LockWord) return ;
    BiasHolder = BIASEDTOWARD(LockWord)
    pthread_mutex_lock (BiasHolder->RevokeMutex)
    Verify = Object->LockWord
    // [TAG:QRLMPROTECT:V2]
    // Re-sample the lockword. It could have changed from BIASED to
    // non-biased state. Only the first revoker performs revocation.
    // When the 1st revoker resamples the lockword it will still see
    // the object in BIASED state. Subsequent revoke attempts (revoking
    // threads) will notice that the lockword changed to non-BIASED
    // and return quickly.
    if ISBIASED(Verify) AND BIASEDTOWARD(Verify) == BiasHolder
        BiasHolder->RevokeInProgress = Object
        mprotect BiasHolder->VPage READONLY
        MEMBAR()
        while (BiasHolder->VPage->InCrit == Object) SPIN();
        Object->LockWord = UNBIAS (Object->LockWord)
        mprotect BiasHolder->VPage READWRITE
        BiasHolder->RevokeInProgress = NULL
    pthread_mutex_unlock (BiasHolder->RevokeMutex)

TrapHandler()
    // On Unix or Linux the TrapHandler() would be implemented with signals.
    // On Windows the TrapHandler() would be implemented with either
    // structured exception handling (SEH) or vectored exception handling (VEH).
    MEMBAR()
    if Self->VPage->InCrit != NULL
        // In this case "Self" must be exiting a critical region and
        // the ST of NULL into Self->VPage->InCrit trapped.
        // Note that when we return from the trap handler we will
        // restart the offending ST. The redundant ST is benign.
        // Storing and re-storing a NULL into Self->VPage->InCrit
        // is an idempotent operation. (It's OK if we do it twice).
        mprotect Self->VPage READWRITE
        Self->VPage->InCrit = NULL
    while (Self->RevokeInProgress != NULL) SPIN();
    return // retry trapping ST
Remarks:
-------
Lock (Object)
    // Enter QRL critical section
    // The ST to InCrit can trap into TrapHandler
    // Note that we ST into InCrit and then LD from RevokeInProgress
    // with no intervening MEMBAR.
Retry:
    Self->VPage->InCrit = Object
    if Self->RevokeInProgress == Object
        Self->VPage->InCrit = NULL
        goto Retry
    tmp = Object->LockWord
    if tmp == BIASEDUNLOCKED(Self)
        Object->LockWord = BIASEDLOCKED(Self)
        // Exit the QRL critical section
        // The ST to InCrit can trap into TrapHandler
        Self->VPage->InCrit = NULL
        return success
    Self->VPage->InCrit = NULL
    if ISBIASEDBYOTHERTHREAD(tmp)
        Revoke (Object)
    if ISFIRSTLOCK(tmp)
        // Attempt to bias the object toward Self.
        if CAS(&Object->LockWord, tmp, BIASEDLOCKED(Self)) == tmp
            return success
    ... continue into traditional Lock code ...

Revoke (Object)
    LockWord = Object->LockWord
    if !ISBIASED(LockWord) return
    BiasHolder = BIASEDTOWARD(LockWord)
    pthread_mutex_lock (BiasHolder->RevokeMutex)
    Verify = Object->LockWord
    // [TAG:QRLMPROTECT:V3-BTB]
    // Re-sample the lockword. It could have changed from BIASED to
    // non-biased state. Only the first revoker performs revocation.
    // When the 1st revoker resamples the lockword it will still see
    // the object in BIASED state. Subsequent revoke attempts (revoking
    // threads) will notice that the lockword changed to non-BIASED
    // and return quickly.
    if ISBIASED(Verify) AND BIASEDTOWARD(Verify) == BiasHolder
        BiasHolder->RevokeInProgress = Object
        MEMBAR()
        // Serialize (BiasHolder) ...
        mprotect BiasHolder->VPage READONLY
        mprotect BiasHolder->VPage READWRITE
        while (BiasHolder->VPage->InCrit == Object) SPIN();
        Object->LockWord = UNBIAS (Object->LockWord)
        BiasHolder->RevokeInProgress = NULL
    pthread_mutex_unlock (BiasHolder->RevokeMutex)

TrapHandler()
    // On Unix or Linux the TrapHandler() would be implemented with signals.
    // On Windows the TrapHandler() would be implemented with either
    // structured exception handling (SEH) or vectored exception handling (VEH).
    // We return immediately, simply retrying the offending ST.
    MEMBAR()
    return ;
Remarks
-------
    tmp = Object->LockWord
    if tmp == BIASEDUNLOCKED(Self)
        Object->LockWord = BIASEDLOCKED(Self)
        // Exit the QRL critical section
        Self->InCrit = NULL                 // Exit Critical
        return success
    Self->InCrit = NULL                     // Exit Critical
    if ISBIASEDBYOTHERTHREAD(tmp)
        Revoke (Object)
    if ISFIRSTLOCK(tmp)
        // Attempt to bias the object toward Self.
        if CAS(&Object->LockWord, tmp, BIASEDLOCKED(Self)) == tmp
            return success
    ... continue into traditional Lock code ...

Revoke (Object)
    LockWord = Object->LockWord
    if !ISBIASED(LockWord) return
    BiasHolder = BIASEDTOWARD(LockWord)
    pthread_mutex_lock (BiasHolder->RevokeMutex)
    Verify = Object->LockWord
    // [TAG:QRLMPROTECT:V3B]
    // Re-sample the lockword. It could have changed from BIASED to
    // non-biased state. Only the first revoker performs revocation.
    // When the 1st revoker resamples the lockword it will still see
    // the object in BIASED state. Subsequent revoke attempts (revoking
    // threads) will notice that the lockword changed to non-BIASED
    // and return quickly.
    if ISBIASED(Verify) AND BIASEDTOWARD(Verify) == BiasHolder
        BiasHolder->RevokeInProgress = Object
        pthread_mutex_lock (ProtLock) ;
        if ((Revoking++) == 0) {
            IsRW = 0 ;
            mprotect RevPage READONLY
        }
        ASSERT (IsRW == 0 && Revoking > 0) ;
        pthread_mutex_unlock (ProtLock) ;
        MEMBAR()
        while (BiasHolder->VPage->InCrit == Object) SPIN();
        Object->LockWord = UNBIAS (Object->LockWord) ;
        pthread_mutex_lock (ProtLock)
        if ((--Revoking) == 0) {
            mprotect RevPage READWRITE
            IsRW = 1 ;
        }
        pthread_mutex_unlock (ProtLock)
        BiasHolder->RevokeInProgress = NULL
    pthread_mutex_unlock (BiasHolder->RevokeMutex)

TrapHandler()
    // On Unix or Linux the TrapHandler() would be implemented with signals.
    // On Windows the TrapHandler() would be implemented with either
    // structured exception handling (SEH) or vectored exception handling (VEH).
    if (FaultingAddress == RevPage) {
        registers->UserIP = Retry ;     // Deflect control
    }
    return ;
Lock (obj)
    // Lockword mutator -- guarded critical section
    Self->InCrit = obj
    tmp = obj->LockWord
    if (tmp == BIASUNLOCKED(Self)) {
        obj->LockWord = BIASLOCKED(Self) ;
        Self->InCrit = NULL ;
        return ;
    }
    ...

Trap()
    if FaultingAddress == &Self->InCrit
        if (Self->InCrit != null) {
            mprotect PageOf(Self) RW
            Self->InCrit = null
        }
        while (Self->RevObj != null) SPIN() ;
    return

Revoke(obj)
    for
        v = obj->LockWord
        if !BIASED(v) return
        t = BIASHOLDINGTHREAD(v)
        // Exclude other revokers so the only possible concurrency
        // is this thread vs the BHT
        lock t->RevLock
        v = obj->LockWord
        // The following test is an optional optimization ...
        if !BIASED(v) || BIASHOLDINGTHREAD(v) != t
            unlock t->RevLock
            continue ;
        t->RevObj = obj
        MEMBAR
        mprotect PageOf(t) RO
        v = obj->LockWord
        if !BIASED(v) || BIASHOLDINGTHREAD(v) != t
            mprotect PageOf(t) RW
            t->RevObj = null
            unlock t->RevLock
            continue ;
        // If the BHT is within the critical section,
        // wait for it to vacate.
        if t->InCrit == obj
            while (t->InCrit != null)
                v = obj->LockWord
                if !BIASED(v) || BIASHOLDINGTHREAD(v) != t
                    mprotect PageOf(t) RW
                    t->RevObj = null
                    unlock t->RevLock
                    continue ;
        // The BHT is outside the critical section and can't reenter.
        // There are no other sources of concurrency (lockword mutators).
        // This thread (the revoker) has exclusive access to the object's
        // lockword.
        obj->LockWord = UNBIAS(v)
        mprotect PageOf(t) RW
        t->RevObj = null
        unlock t->RevLock
        return
* [V3D]
-- [TAG:QRLMPROTECT:QD-BTB]
[TAG:QRLMPROTECT:V3D]
Variation on QRLMPROTECT:QD
See also: QRLMPROTECT:SMP040109B
Lock (obj)
    // Lockword mutator
    Self->InCrit = obj
    tmp = obj->LockWord
    if (tmp == BIASUNLOCKED(Self)) {
        obj->LockWord = BIASLOCKED(Self) ;   // [SQUASH]
        Self->InCrit = null ;
        return ;
    }
    if (ISREVOKING(tmp)) {
        Self->InCrit = null ;
        // Either spin waiting for obj->LockWord != REVOKING
        // or simply retry the Lock() operation. Retrying
        // is tantamount to spinning.
        retry ;
    }
    ...

Trap()
    return

Revoke(obj)
    // Replace the classic Dekker (ST InCrit;VMEMBAR;LD RevObj) with a
    // distinguished REVOKING lockword encoding. We eliminate RevObj
    // and replace it with a distinguished value in obj->MarkWord.
    // The new entry/Lock protocol becomes (ST InCrit;VMEMBAR;LD LockWord)
    for
        v = obj->LockWord
        if ISREVOKING(v) continue    // spin
        t = BHT(v)
        if t == null return
        if CAS (&obj->LockWord, v, REVOKING(Self)) != v continue ;
        MEMBAR
        // Ensure that REVOKING is visible to the BHT.
        if (opto) {
            // Equivalently: Serialize(t)
            // See RFE 5079829
            mprotect PageOf(t) RO
            mprotect PageOf(t) RW
        } else {
            // Grab the lock to protect native C page mutators
            // from encountering spurious traps.
            pthread_mutex_lock (t->RevMux) ;
            mprotect PageOf(t) RO
            mprotect PageOf(t) RW
            pthread_mutex_unlock (t->RevMux) ;
        }
        // Wait for the BHT to vacate the guarded critical section.
        // The BHT won't reenter the critical section as REVOKING
        // is now visible.
        while (t->InCrit == obj) ;
        MEMBAR #loadload
        // Beware: a BHT-mutator could have stomped or overwritten
        // the REVOKING value at [SQUASH], above. In that case we
        // just retry the revocation operation.
        v = obj->LockWord
        if v != REVOKING (Self)
            continue ;
        if CAS (&obj->LockWord, REVOKING(Self), UNBIAS(v)) != v
            continue
        return
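The V3D entry step, atomically swinging the lockword from a biased encoding to a distinguished REVOKING encoding, maps directly onto a C11 compare-and-swap. The bit encodings below are invented for illustration and are not specified in this text:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical encodings: low 2 bits = tag, 0x2 marks REVOKING. */
#define REVOKING_TAG 0x2u
#define MAKE_REVOKING(self_tid) (((uintptr_t)(self_tid) << 2) | REVOKING_TAG)

static _Atomic uintptr_t lockword;

/* CAS(&obj->LockWord, v, REVOKING(Self)): succeeds only if the word
 * still holds the previously observed biased value `observed`. */
static bool try_install_revoking(uintptr_t observed, uintptr_t self_tid) {
    uintptr_t expect = observed;
    return atomic_compare_exchange_strong(&lockword, &expect,
                                          MAKE_REVOKING(self_tid));
}
```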
* [V3E]
-- [TAG:QRLMPROTECT:QD-BTB]
[TAG:QRLMPROTECT:V3E]
Variation on QRLMPROTECT:QD
See also: QRLMPROTECT:SMP040109B
-- Possible optimizations:
We elide the mprotect() or Serialize() operations
in Revoke() if the system is a uniprocessor.
Lock (obj)
    // Lockword mutator
    Self->InCrit = obj
    tmp = obj->LockWord
    if (tmp == BIASUNLOCKED(Self)) {
        obj->LockWord = BIASLOCKED(Self) ;   // [SQUASH]
        Self->InCrit = null ;
        return ;
    }
    if (tmp == REVOKING) {
        Self->InCrit = null ;
        // Either spin waiting for obj->LockWord != REVOKING
        // or simply retry the Lock() operation. Retrying
        // is tantamount to spinning.
        retry ;
    }
    ...

Trap()
    return

Revoke(obj)
    // Replace the classic Dekker (ST InCrit;VMEMBAR;LD RevObj) with a
    // distinguished REVOKING lockword encoding. We eliminate RevObj
    // and replace it with a distinguished value in obj->MarkWord.
    // The new entry/Lock protocol becomes (ST InCrit;VMEMBAR;LD LockWord)
    for
        v = obj->LockWord
        if v == REVOKING continue    // spin
        t = BHT(v)
        if t == null return
The following variant of QRL avoids MEMBARs and memory protection operations.

At time (1) the ST languishes in the store buffer and is not yet
visible to other processors.
At time (2) the LD returns NULL/false. The Lock()ing thread
enters the QRL critical section.
At time (5) the LD returns NULL/false as the ST at time (1) is
not yet visible to the revoker. The fetched value is stale.
Both the revoking thread and the Lock()ing thread have entered
the QRL critical section at the same time. Such loss of exclusion
must be avoided.

Note that if the race occurs then the LD at time (2) completed and
fetched NULL and the ST at time (1) has retired, but is not yet visible
to the revoker. Since the LD at time (2) completed we know that all prior
instructions, including the ST at (1), are "committed". The ST will
eventually become visible to a revoker. By inserting a sufficient delay
between (4) and (5) we can ensure that any ST from time (1) will be visible
at (5). Cf. [15] "Timed Consistency".

Put another way, a precondition for the race is that the LD at (2)
fetches NULL. If the LD fetches NULL then we know that the ST at (1)
is committed and will eventually become visible.
Delay-based QRL
~~~~~~~~~~~~~~~
Lock (Object)
Retry:
    // ST InCrit then LD RevokeInProgress with no intervening MEMBAR.
    Self->InCrit = Object
    if Self->RevokeInProgress == Object
        Self->InCrit = NULL
        Delay()
        goto Retry

Revoke (Object)
    LockWord = Object->LockWord
    if !ISBIASED(LockWord) return
    BiasHolder = BIASEDTOWARD(LockWord)
    pthread_mutex_lock (BiasHolder->RevokeMutex)
    Verify = Object->LockWord
    if ISBIASED(Verify) AND BIASEDTOWARD(Verify) == BiasHolder
        BiasHolder->RevokeInProgress = Object
        MEMBAR()
        DelayForMaximumStoreBufferLatency()
        while (BiasHolder->InCrit == Object) SPIN();
        Object->LockWord = UNBIAS (Object->LockWord)
        BiasHolder->RevokeInProgress = NULL
    pthread_mutex_unlock (BiasHolder->RevokeMutex)
Remarks
-------
- The thread executing Revoke() does not have to be idle during the
  "DelayForMaximumStoreBufferLatency" interval -- it can
  accomplish other useful work.
- Each thread has a private dedicated virtual page which contains its
  InJava flag. The sole variable in the page is the InJava field.
  Permissions on the page can be changed via mprotect().
  The thread's "Slot" variable points to the thread's page.
  In this case the "Halt" flag is encoded in the permissions
  of the thread's page.
JNIReentry()
// If the ST traps, control vectors into TrapHandler().
ST 1, Self->Slot->InJava
TrapHandler()
Wait for any concurrent collection to complete
return ; // retry the ST
Collector()
// Stop-the-world
for each mutator thread t
mprotect pageof(t->Slot) READONLY
if t->Slot->InJava then StopMutatorCooperatively(t)
CollectJavaHeap()
for each mutator thread t
if !t->Slot->InJava then
mprotect pageof (t->Slot) READWRITE
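The trap-and-retry mechanics of the ST into the protected InJava page can be demonstrated with a minimal, Linux-only C sketch. Everything here is illustrative (jni_reentry_demo, trap_handler, and the single-field page are assumptions): the handler, which stands in for "wait for any concurrent collection to complete", simply re-enables the page and returns, which retries the faulting ST.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

static volatile int *injava;   /* the sole field on the private page */
static long pagesz;

/* Fires when the ST hits the READONLY page. A real handler would block
 * here until the GC finished; we just disarm the page. Returning from
 * the handler retries the faulting ST. */
static void trap_handler(int sig, siginfo_t *si, void *ctx) {
    (void)sig; (void)ctx;
    void *page = (void *)((uintptr_t)si->si_addr & ~(uintptr_t)(pagesz - 1));
    mprotect(page, pagesz, PROT_READ | PROT_WRITE);
}

int jni_reentry_demo(void) {
    pagesz = sysconf(_SC_PAGESIZE);
    injava = (volatile int *)mmap(NULL, pagesz, PROT_READ | PROT_WRITE,
                                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    struct sigaction sa = {0};
    sa.sa_sigaction = trap_handler;
    sa.sa_flags = SA_SIGINFO;
    sigaction(SIGSEGV, &sa, NULL);

    mprotect((void *)injava, pagesz, PROT_READ);  /* Collector arms the page */
    *injava = 1;       /* ST traps, handler disarms the page, ST retries */
    return *injava;
}
```

Error handling (mmap/mprotect failures) is elided for brevity; a production handler would also verify that the faulting address really lies in an InJava page before acting.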
Remarks:
Collector()
mprotect all mutator InJava pages READONLY
for each mutator thread t
if t->Slot->InJava then StopMutatorCooperatively(t)
CollectJavaHeap ()
mprotect all mutator InJava pages READWRITE
JNIReentry()
ST 1, Self->Slot->InJava
TrapHandler()
Wait for any concurrent collection to complete
return ; // retry the ST
Collector()
// stop-the-world
mprotect all InJava container pages READONLY
CollectJavaHeap ()
mprotect all InJava container pages READWRITE
Remarks:
JNIReentry()
ST 1, Self->Slot->InJava
LD Self->Halt
if non-zero goto StopSelf
TrapHandler()
Wait for any concurrent collection to complete
return ; // retry the ST
Collector()
// stop-the-world
for each mutator thread t
t->Halt = 1
MEMBAR()
mprotect pageof(t->Slot) READONLY
mprotect pageof(t->Slot) READWRITE
if t->Slot->InJava then StopMutatorCooperatively(t)
CollectJavaHeap ()
for each mutator thread t
t->Halt = 0
Wake the mutator if it is blocked in TrapHandler()
Remarks
-------
JNIReentry()
LD Self->Halt // optimization ...
if non-zero goto StopSelf // optimization ...
ST 1, Self->Slot->InJava
LD Self->Halt
if non-zero goto StopSelf
- This mechanism is analogous to QRL [V3] described above.
JNIReentry()
ST 1, Self->Slot->InJava
LD Self->Halt
if non-zero goto StopSelf
TrapHandler()
Wait for any concurrent collection to complete
return ; // retry the ST
Collector()
// stop-the-world
for each mutator thread t
t->Halt = 1
MEMBAR()
mprotect all mutator InJava pages READONLY
mprotect all mutator InJava pages READWRITE
for each mutator thread t
if t->Slot->InJava then StopMutatorCooperatively(t)
CollectJavaHeap ()
for each mutator thread t
t->Halt = 0
Wake the mutator if it is blocked in TrapHandler()
JNIReentry()
LD Self->Halt // optimization ...
if non-zero goto StopSelf // optimization ...
ST 1, Self->Slot->InJava
LD Self->Halt
if non-zero goto StopSelf
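An aside, not from the paper: the back-to-back mprotect READONLY/READWRITE of the mutator pages in Collector() is used purely as a remote serialization device -- the TLB-shootdown cross-calls force every CPU running a mutator to serialize, making the mutators' STs to InJava visible. Later Linux kernels (4.3 and up) expose the same effect directly through the membarrier(2) syscall: when it returns, every concurrently running thread has executed a full memory barrier. A sketch of that substitution (global_serialize is an illustrative name):

```c
#include <linux/membarrier.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef MEMBARRIER_CMD_GLOBAL
#define MEMBARRIER_CMD_GLOBAL (1 << 0)  /* older headers name this ..._SHARED */
#endif

/* Force a full memory barrier on every running thread in the system,
 * replacing the mprotect READONLY/READWRITE pair. Returns 0 on success,
 * -1 (e.g. ENOSYS) on kernels without membarrier support. */
int global_serialize(void) {
    return (int)syscall(__NR_membarrier, MEMBARRIER_CMD_GLOBAL, 0, 0);
}
```

On kernels without membarrier the mprotect trick remains the portable fallback.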
* [V9] Context-switching
JNIReentry()
ST 1, Self->InJava
LD Self->Halt
if non-zero goto StopSelf
Collector()
for each cpuid
Flushed[cpuid] = 0
for each mutator thread t
t->Halt = 1
MEMBAR()
for each mutator thread t
if SCHEDCTL_ISEXECUTING(t) && !t->InJava then
cpuid = SCHEDCTL_CURRENTCPU(t) // access schedctl sc_cpu
if (!Flushed[cpuid])
Flushed[cpuid] = 1 ;
ContextSwitchTo (cpuid) ;
else
if t->InJava
StopMutatorCooperatively (t)
CollectJavaHeap()
for each mutator thread t
t->Halt = 0
CST1:
CST2:
CST3:
JNIReentry()
// The ST into BarPage will trap into TrapHandler() if the page is
// READONLY.
ST 1, Self->InJava
ST 0, BarPage[(Self->ThreadID * CACHELINESIZE) & (PageSize-1)]
LD Self->Halt
if non-zero goto StopSelf
JNIReentry_SLOW()
ST 1, Self->InJava
MEMBAR
LD Self->Halt
if non-zero goto StopSelf
TrapHandler()
Deflect control to JNIReentry_SLOW
return ; // retry the ST
Collector()
// stop-the-world
for each mutator thread t
t->Halt = 1
PageArmed = TRUE
MEMBAR()
mprotect the single BarPage READONLY
mprotect the single BarPage READWRITE
PageArmed = FALSE
for each mutator thread t
if t->InJava then StopMutatorCooperatively(t)
CollectJavaHeap()
for each mutator thread t
t->Halt = 0
Remarks:
JNIReentry()
LD PageArmed // set by Collector() while the page is armed
if non-zero goto JNIReentry_SLOW()
// The ST into BarPage will trap into TrapHandler()
// if the page is READONLY.
ST 1, Self->InJava
ST 0, BarPage[(Self->ThreadID * CACHELINESIZE) & (PageSize-1)]
LD Self->Halt
if non-zero goto StopSelf
* [V11] JNI
JNIReentry()
// The ST into BarPage will trap into TrapHandler() if the page is
// READONLY.
ST 1, Self->InJava
LD NeedToStop // optimization to decrease trap rate
if non-zero goto StopSelf // ""
nop // ""
ST 0, BarPage[(Self->ThreadID * CACHELINESIZE) & (PageSize-1)]
JNIReentry_SLOW()
ASSERT Self->InJava == 1
MEMBAR
LD NeedToStop
if non-zero goto StopSelf
TrapHandler()
Deflect control to JNIReentry_SLOW
return ; // retry the ST
Collector()
// stop-the-world
NeedToStop = 1
MEMBAR()
mprotect the single BarPage READONLY
for each mutator thread t
if t->InJava then StopMutatorCooperatively(t)
CollectJavaHeap()
mprotect BarPage READWRITE
NeedToStop = 0
FastThread:
ST 1, A
<CriticalSection>
ST 0, A
TrapHandler:
if (FaultingAddress == &A && A == 1) {
mprotect PageOf(A) READWRITE
A = 2
MEMBAR()
Adjust interrupted IP to skip ST
return
}
SlowThread:
for (;;) {
while (A == 1) Delay() ;
if (A == 2) CAS (&A, 2, 0) ;
mprotect PageOf(A) READONLY
if (A == 0) break ;
mprotect PageOf(A) READWRITE
BackOffDelay () ;
}
<CriticalSection>
mprotect PageOf(A) READWRITE
FastThread:
ST 1, A
<CriticalSection>
ST 0, A
TrapHandler:
if (FaultingAddress == &A && A == 1) {
mprotect PageOf(A) READWRITE
Trp = 1 ;
MEMBAR() ;
return ; // retry trapping ST
}
SlowThread:
for (;;) {
while (A) Delay() ;
if (VARIATION) Trp = 0 ; // optional optimization
mprotect PageOf(A) READONLY
if (A == 0) {
if (Trp) { Trp = 0 ; continue ; }
break ;
}
mprotect PageOf(A) READWRITE
BackOffDelay () ;
}
<CriticalSection>
mprotect PageOf(A) READWRITE
FastThread:
ST 1, A
<CriticalSection>
ST 0, A
TrapHandler:
if (FaultingAddress == &A && A == 1) {
mprotect PageOf(A) READWRITE
return ; // retry trapping ST
}
SlowThread:
for (;;) {
while (A) Delay() ;
if CAS (&A, 0, 2) != 0 continue ;
mprotect PageOf(A) READONLY
if (A == 2) break ;
mprotect PageOf(A) READWRITE
BackOffDelay () ;
}
<CriticalSection>
mprotect PageOf(A) READWRITE
CAS (&A, 2, 0) ;
- Similar to [V3B]
A = Fast-thread InCrit
B = Slow-thread InCrit
The use of "B" is optional, but avoids excessive trap rates.
This form works with either strong- or weak-mprotect semantics.
FastThread:
ST 1, A
LD B // optional optimization to avoid excessive traps
bnz ... // ""
ST 0, Page // dummy ST -- traps while SlowThread holds Page READONLY
<CriticalSection>
ST 0, A
TrapHandler:
MEMBAR
return to FastThread
SlowThread:
ST 1, B
MEMBAR
mprotect Page READONLY
LD A
bnz ...
<CriticalSection>
mprotect Page READWRITE
ST 0, B
Miscellany
~~~~~~~~~~
1. BIASLOCKED(A)
In this case B's Lock attempt came before A's unlock or
before A's unlock became visible. In either case, B will
attempt to revoke A's bias. The revoke action will ensure that
any latent STs performed by A will be visible to B.
2. BIASUNLOCKED(A)
In this case B will attempt to revoke the bias from A and convert the
object's lockword from BIASUNLOCKED(A) to NORMALUNLOCKED.
Recall that A executed {ST X; ST BIASUNLOCKED(A) into lockword;}.
Since B observed that the lockword is BIASUNLOCKED(A), we know by TSO
that A's ST into X is also visible to B.
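Outside TSO, the same argument is expressed with release/acquire ordering. The sketch below is illustrative (the names thread_A_unlock, thread_B_observe, and the constants are assumptions): because A's ST to X precedes its release-store of the lockword, any thread whose acquire-load observes BIASUNLOCKED(A) is guaranteed to also observe X.

```c
#include <stdatomic.h>

enum { NORMALUNLOCKED = 0, BIASUNLOCKED_A = 1 };  /* illustrative encodings */

static int X;                                     /* data protected by the lock */
static _Atomic int LockWord = NORMALUNLOCKED;

/* Thread A: ST X, then ST BIASUNLOCKED(A) into the lockword. */
void thread_A_unlock(void) {
    X = 42;
    atomic_store_explicit(&LockWord, BIASUNLOCKED_A, memory_order_release);
}

/* Thread B: if the lockword value is visible, X must be too.
 * Returns X when the unlock was observed, -1 otherwise. */
int thread_B_observe(void) {
    if (atomic_load_explicit(&LockWord, memory_order_acquire) == BIASUNLOCKED_A)
        return X;    /* guaranteed to read 42 */
    return -1;
}
```

On TSO hardware both operations compile to plain ST/LD; the release/acquire annotations only matter on weaker machines.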
* SerializeOneCPU (c)
SerializeAll ()
SerializeOneThread (t)
SerializeCPUs (CpuList[])
CPUOF(t) -> c or (-1)
Let's say our hypothetical processor has a store buffer, but the
store buffer contains (virtual address, data) pairs. The address
hasn't been translated or checked for protections. We'll call these
STs "in-flight". The store buffer is a strict FIFO. Translation and
validation happen as elements are removed from the store buffer.
If a ST generates an exception then the processor discards the contents
of the FIFO and rolls back execution accordingly. (Clearly the
micro-architectural storage required in the CPU to roll back state
over a long sequence of instructions could be extremely large.)
Lock (Object)
// Enter QRL critical section
Retry:
if (Self->PageIsRO) {
// Slow-path
Self->PageIsRO = 0 ;
MEMBAR() ;
mprotect Self->VPage READWRITE
// Revert to traditional ST:MEMBAR:LD
Self->VPage->InCrit = Object
MEMBAR() ;
if (Self->RevokeInProgress) { // Excluded ?
Self->VPage->InCrit = NULL
Delay ()
// back-off. Consider using a Dekker-like "turn"
// variable to arbitrate and avoid live-lock.
goto Retry ;
}
} else {
// Fast-path ...
// Note that we ST into InCrit and then LD from
// RevokeInProgress with no intervening MEMBAR.
// The ST to InCrit can trap into TrapHandler
Self->VPage->InCrit = Object
if Self->RevokeInProgress == Object
Self->VPage->InCrit = NULL
goto Retry
}
tmp = Object->LockWord
if tmp == BIASEDUNLOCKED(Self)
Object->LockWord = BIASEDLOCKED(Self)
// Exit the QRL critical section
// The ST to InCrit can trap into TrapHandler
Self->VPage->InCrit = NULL
return success
Self->VPage->InCrit = NULL
if ISBIASEDBYOTHERTHREAD(tmp)
Revoke (Object)
if ISFIRSTLOCK(tmp)
// Attempt to bias the object toward Self.
if CAS(&Object->LockWord, tmp, BIASEDLOCKED(Self)) == tmp
return success
... continue into traditional Lock code ...
Revoke (Object)
LockWord = Object->LockWord
if !ISBIASED(LockWord) return
BiasHolder = BIASEDTOWARD(LockWord)
pthread_mutex_lock (BiasHolder->RevokeMutex)
Verify = Object->LockWord
// Re-sample the lockword. It could have changed from BIASED to
// non-biased state. Only the first revoker performs revocation.
// When the 1st revoker resamples the lockword it will still see the
// object in BIASED state. Subsequent revoke attempts (revoking
// threads) will notice that the lockword changed to non-BIASED
// and return quickly.
if ISBIASED(Verify) AND BIASEDTOWARD(Verify) == BiasHolder
BiasHolder->RevokeInProgress = Object
BiasHolder->PageIsRO = TRUE
MEMBAR()
mprotect BiasHolder->VPage READONLY
while (BiasHolder->VPage->InCrit == Object) SPIN();
Object->LockWord = UNBIAS (Object->LockWord)
BiasHolder->RevokeInProgress = NULL
pthread_mutex_unlock (BiasHolder->RevokeMutex)
TrapHandler()
// On Unix or Linux the TrapHandler() would be implemented with signals.
// On Windows the TrapHandler() would be implemented with either
// structured exception handling (SEH) or vectored exception handling (VEH).
// Returning from the handler retries the offending ST.
mprotect Self->VPage READWRITE
Self->PageIsRO = FALSE ;
MEMBAR()
if (Self->VPage->InCrit != NULL) {
Self->VPage->InCrit = NULL ;
}
MEMBAR()
while (Self->RevokeInProgress != NULL) SPIN() ;
return ;
* Passive waiting:
= Have a blocking thread spin for at least MAXSTLATENCY nsecs after having
STed into "m.Queue".
The blocking thread knows that a latent ST of NULL to Owner,
if such a ST exists, will eventually become visible -- in finite time.
IDEA: merge the ST-visibility spin with the normal spin-to-acquire.
= The blocking thread uses a timed wait/park.
At most one blocking thread uses a timed wait, and then
for at most MAXSTLATENCY nsecs.
= Arrange for each CPU to periodically serialize and increment a
ThisCPU->Serialize counter.
The contending thread waits/polls until it observes the
CPUOF(Owner)->Serialize counter change.
CPUTICK() { MEMBAR; SelfCPU->SerializeCounter++; }
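The serialize-counter scheme can be sketched in C11 atomics. This is illustrative only (NCPUS, cpu_tick, and wait_for_cpu_serialize are assumed names): each CPU periodically runs CPUTICK(); a contending thread polls the owner's CPU's counter, and once it changes, any ST the owner had buffered before going quiescent must be visible.

```c
#include <stdatomic.h>

enum { NCPUS = 64 };   /* illustrative; size from the real CPU count */
static _Atomic unsigned long SerializeCounter[NCPUS];

/* CPUTICK(): full fence, then bump this CPU's counter. Run
 * periodically on each CPU, e.g. from the clock-tick handler. */
void cpu_tick(int cpu) {
    atomic_thread_fence(memory_order_seq_cst);   /* MEMBAR */
    atomic_fetch_add_explicit(&SerializeCounter[cpu], 1,
                              memory_order_release);
}

/* Contending thread: poll until CPUOF(Owner)'s counter changes,
 * proving that CPU has serialized since we started waiting. */
void wait_for_cpu_serialize(int owner_cpu) {
    unsigned long seen = atomic_load_explicit(&SerializeCounter[owner_cpu],
                                              memory_order_acquire);
    while (atomic_load_explicit(&SerializeCounter[owner_cpu],
                                memory_order_acquire) == seen)
        ;   /* a real implementation would Delay() or park here */
}
```

The appeal of this variant is that the waiting side pays the cost; the fast-path owner executes no extra fences at all.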
* Safepoints:
permit the blocking thread to force the owner to a safepoint.
Use STW or STT safepoints. STT are preferred.
= STT per-thread safepoints (not currently available in hotspot)
= STW global JVM safepoints (not viable)
* Context-switching.
Depends on the "OFFPROC implies serialized" property.
= IPC round-trip message to a martyr thread bound to CPUOF(Owner).
= Push displacement: transiently bind the blocking thread to
CPUOF(putative Owner).
Displace the putative owner OFFPROC by forcibly migrating the
contending thread onto the owner's last-known CPU.
= Pull displacement: transiently bind the putative owner to
CPUOF(blocking thread).
Alternately, forcibly but transiently bind the putative owner off its
last-known CPU.
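On Linux, push displacement can be sketched with sched_setaffinity(2): the contending thread binds itself onto the owner's last-known CPU, which (if the owner was running there) forces the owner OFFPROC and, by the "OFFPROC implies serialized" property, drains its store buffer. This is a sketch (push_displace is an illustrative name; error handling and restoring the original affinity are elided):

```c
#define _GNU_SOURCE
#include <sched.h>
#include <unistd.h>

/* Migrate the calling thread onto owner_cpu, displacing whatever was
 * running there. Returns the CPU we now run on, or -1 on failure. */
int push_displace(int owner_cpu) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(owner_cpu, &set);
    if (sched_setaffinity(0, sizeof set, &set) != 0)  /* 0 = this thread */
        return -1;
    return sched_getcpu();   /* now executing on owner_cpu */
}
```

A real implementation would record and restore the thread's previous affinity mask once the displaced owner's STs are known to be visible.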
http://home.comcast.net/~pjbishop/Dave/QRL-OpLocks-BiasedLocking.pdf