
Dealing with deadlock (review)

• The Ostrich approach: stick your head in the sand and ignore the problem
• Deadlock avoidance: consider resources and requests, and only fulfill requests that will not lead to deadlock
    (-) Too hard for centralized systems, even harder in distributed systems!!
• Deadlock prevention: eliminate one of the 4 deadlock conditions
• Deadlock detection and recovery: detect, then break the deadlock
    (-) More difficult when state is distributed
    • Must avoid reporting false deadlock
    → In distributed systems, we typically assume single resource instances

Deadlock detection in distributed environment (review)

• Centralized algorithms
    • Coordinator maintains global WFG and searches it for cycles
    • Ho and Ramamoorthy’s two-phase and one-phase algorithms
• Distributed algorithms
    • Global WFG, with responsibility for detection spread over many sites
    • Obermarck’s path-pushing
    • Chandy, Misra, and Haas’s edge-chasing
• Hierarchical algorithms
    • Hierarchical organization, site detects deadlocks involving only its descendants
    • Menasce and Muntz’s algorithm
    • Ho and Ramamoorthy’s algorithm
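
The centralized approach above comes down to a cycle search over the global wait-for graph (WFG). The following is a minimal sketch of that search, assuming the coordinator already holds the WFG as a dictionary mapping each process to the processes it waits for. The function name `find_cycle` and the process names are illustrative, not taken from any of the cited algorithms.

```python
from typing import Dict, List, Optional

def find_cycle(wfg: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one cycle in the wait-for graph as a list of processes, or None."""
    WHITE, GRAY, BLACK = 0, 1, 2          # unvisited, on current path, finished
    nodes = set(wfg) | {q for targets in wfg.values() for q in targets}
    color = {p: WHITE for p in nodes}
    path: List[str] = []

    def visit(p: str) -> Optional[List[str]]:
        color[p] = GRAY
        path.append(p)
        for q in wfg.get(p, []):
            if color[q] == GRAY:          # back edge: q is on the current path
                return path[path.index(q):]
            if color[q] == WHITE:
                found = visit(q)
                if found:
                    return found
        path.pop()
        color[p] = BLACK
        return None

    for p in sorted(nodes):
        if color[p] == WHITE:
            found = visit(p)
            if found:
                return found
    return None

# Hypothetical global WFG: P1 waits for P2, P2 for P3, P3 for P1; P4 waits for P1.
global_wfg = {"P1": ["P2"], "P2": ["P3"], "P3": ["P1"], "P4": ["P1"]}
deadlocked = find_cycle(global_wfg)
```

Here `deadlocked` holds the processes on the cycle. P4 is blocked behind the cycle but is not part of it, so it is not reported.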

Distributed deadlock detection

• Path-pushing
    • WFG is disseminated as paths: sequences of edges
    • Deadlock if process detects local cycle
• Edge-chasing
    • Probe messages circulate
    • Blocked processes forward probe to processes holding requested resources
    • Deadlock if initiator receives own probe
• Diffusion
    • Query messages sent to dependent set
    • Active processes discard query, blocked processes forward query under certain conditions, reply under other conditions
    • Deadlock if initiator receives replies to all its queries

Distributed deadlock detection (Obermarck’s Path-Pushing, 1982)

• Individual sites maintain local WFGs
    • Nodes for local processes
    • Node “Pex” represents external processes
• Deadlock detection:
    • If a site Si finds a cycle that does not involve Pex, it has found a deadlock
    • If a site Si finds a cycle that does involve Pex, there is the possibility of a deadlock
        • It sends a message containing its detected cycle to any sites involved in Pex
        • If site Sj receives such a message, it updates its local WFG, and searches it for a cycle
            • If Sj finds a cycle that does not involve its Pex, it has found a deadlock
            • If Sj finds a cycle that does involve its Pex, it sends out a message…
    (-) Can report false deadlock
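
The Pex rule above can be sketched as a check each site runs on its own view of the WFG. This is a simplified illustration, not Obermarck's published algorithm: it assumes each site stores wait-for edges under real process names plus a set of processes it knows about, and it collapses everything else into a single "Pex" node. The helper names `collapse` and `check_site` are hypothetical, and message routing and the published algorithm's optimizations are omitted.

```python
def find_cycle(graph):
    """Depth-first search; returns one cycle as a list of nodes, or None."""
    done, path = set(), []

    def visit(node):
        if node in path:                    # reached a node on the current path
            return path[path.index(node):]
        if node in done:
            return None
        path.append(node)
        for nxt in sorted(graph.get(node, ())):
            cycle = visit(nxt)
            if cycle:
                return cycle
        path.pop()
        done.add(node)
        return None

    for start in sorted(graph):
        cycle = visit(start)
        if cycle:
            return cycle
    return None

def collapse(edges, known):
    """Build the site's WFG, replacing every process outside `known` with 'Pex'."""
    graph = {}
    for src, dst in edges:
        s = src if src in known else "Pex"
        d = dst if dst in known else "Pex"
        if s == d == "Pex":
            continue                        # an edge between two externals says nothing locally
        graph.setdefault(s, set()).add(d)
        graph.setdefault(d, set())
    return graph

def check_site(edges, known):
    """Return ('deadlock' | 'possible' | 'none', cycle) for one site's view."""
    graph = collapse(edges, known)
    without_pex = {p: qs - {"Pex"} for p, qs in graph.items() if p != "Pex"}
    cycle = find_cycle(without_pex)
    if cycle:
        return "deadlock", cycle            # cycle that does not involve Pex
    cycle = find_cycle(graph)
    if cycle:
        return "possible", cycle            # cycle through Pex: push the path onward
    return "none", None

# Hypothetical scenario: P1 and P2 run at S1, P3 runs at S2, and P1 -> P2 -> P3 -> P1.
s1_known, s1_edges = {"P1", "P2"}, {("P3", "P1"), ("P1", "P2"), ("P2", "P3")}
s2_known, s2_edges = {"P3"}, {("P2", "P3"), ("P3", "P1")}

status_s1, cycle_s1 = check_site(s1_edges, s1_known)
# S1 pushes its path to S2, which merges it into its own view and searches again.
status_s2, cycle_s2 = check_site(s2_edges | s1_edges, s2_known | s1_known)
```

In this scenario S1 on its own only sees a cycle through Pex, so it reports a possible deadlock. Once S2 merges the pushed path, the cycle no longer involves Pex and S2 declares the deadlock.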

Distributed deadlock detection (Chandy, Misra, and Haas’s Edge-Chasing, 1983)

• When a process has to wait for a resource (blocks), it sends a probe message to the process holding the resource
    • Process can request (and can have to wait for) multiple resources at once
    • Probe message contains 3 values:
        • ID of process that blocked
        • ID of process sending message
        • ID of process message was sent to
• When a blocked process receives a probe, it propagates the probe to the process(es) holding resources that it has requested
    • ID of blocked process stays the same, other two values updated as appropriate
    • If the blocked process receives its own probe, there is a deadlock

Distributed deadlock detection (evaluation of algorithms)

• Distributed deadlock detection
    • Sites share responsibility for WFG and deadlock detection
        (+) No single point of failure
        (+) Robust: multiple sites can detect the same deadlock
        (-) Avoiding false deadlock is hard
• Obermarck’s path-pushing
    • n(n-1)/2 messages to detect deadlock
        • n sites
    • size of a message is O(n)
• Chandy, Misra, and Haas’s edge-chasing
    • m(n-1)/2 messages to detect deadlock
        • m processes, n sites
    • size of a message is 3 integers
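
The probe scheme of Chandy, Misra, and Haas described above can be illustrated with a small single-machine simulation. This is a sketch under assumptions: the `Probe` tuple, the `detect` function, and the `waits_for` dictionary are illustrative names, real probes would travel between sites as messages, and the rule that each process forwards a given initiator's probe at most once is a simplification added here to keep the simulation finite.

```python
from collections import deque
from typing import Dict, List, NamedTuple, Tuple

class Probe(NamedTuple):
    blocked: str    # ID of the process that blocked (the initiator); never changes
    sender: str     # ID of the process sending this message
    receiver: str   # ID of the process the message is sent to

def detect(waits_for: Dict[str, List[str]], initiator: str) -> Tuple[bool, List[Probe]]:
    """Simulate edge-chasing; return (deadlock found?, probes delivered)."""
    queue = deque(Probe(initiator, initiator, holder)
                  for holder in waits_for.get(initiator, []))
    forwarded = set()            # processes that already forwarded this initiator's probe
    delivered: List[Probe] = []
    while queue:
        probe = queue.popleft()
        delivered.append(probe)
        if probe.receiver == probe.blocked:
            return True, delivered          # initiator received its own probe
        if probe.receiver in forwarded:
            continue
        forwarded.add(probe.receiver)
        # A blocked receiver forwards the probe to every process it waits for;
        # an active receiver has no entry in waits_for and so discards it.
        for holder in waits_for.get(probe.receiver, []):
            queue.append(Probe(probe.blocked, probe.receiver, holder))
    return False, delivered

# Hypothetical wait-for relation: P1 -> P2 -> P3 -> P1.
waits_for = {"P1": ["P2"], "P2": ["P3"], "P3": ["P1"]}
found, probes = detect(waits_for, "P1")
```

In this run the probe passes through P2 and P3 and comes back to P1 as `Probe("P1", "P3", "P1")`, so `found` is true. Each probe carries only the three IDs listed above, which is why the message size stays constant.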
Hierarchical deadlock detection

• Sites are organized hierarchically
    • A site is only responsible for detecting deadlocks involving its child sites
• Menasce and Muntz, 1979
    • Sites (called controllers) are organized as a tree
        • Leaf controllers manage resources
            • Each maintains a local WFG concerned only with its own resources
        • Interior controllers are responsible for deadlock detection
            • Each maintains a global WFG that is the union of the WFGs of its children
            • Detects deadlock among its children
    • Whenever a controller changes its WFG due to a resource request, it propagates that change to its parent
        • Parent updates its WFG, searches it for cycles, and propagates changes upward

Hierarchical deadlock detection (cont.)

• Ho and Ramamoorthy, 1982
    • Sites are grouped into disjoint clusters
    • Periodically, a site is chosen as a central control site
        • Central control site chooses a control site for each cluster
    • Control site collects status tables from its cluster, and uses the Ho and Ramamoorthy one-phase centralized deadlock detection algorithm to detect deadlock in that cluster
    • All control sites then forward their status information and WFGs to the central control site, which combines that information into a global WFG and searches it for cycles
    • Control sites detect deadlock in clusters
        • Central control site detects deadlock between clusters
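
The Menasce and Muntz idea that an interior controller holds the union of its children's WFGs can be sketched as follows. This is an illustration only: the names `union_wfg` and `has_cycle` and the three-leaf tree are assumptions, and change propagation is modelled by recomputing the union rather than by sending incremental updates.

```python
def union_wfg(*graphs):
    """WFG of an interior controller: the union of its children's WFGs."""
    merged = {}
    for graph in graphs:
        for process, targets in graph.items():
            merged.setdefault(process, set()).update(targets)
    return merged

def has_cycle(wfg):
    """True if the WFG contains a cycle.

    Repeatedly removes processes that wait for nobody; whatever cannot be
    removed is on a cycle or waits on one.
    """
    graph = {p: set(qs) for p, qs in wfg.items()}
    for targets in wfg.values():
        for q in targets:
            graph.setdefault(q, set())
    while True:
        free = [p for p, qs in graph.items() if not qs]
        if not free:
            break
        for p in free:
            del graph[p]
        for qs in graph.values():
            qs.difference_update(free)
    return bool(graph)

# Hypothetical tree: root -> (interior -> (leaf_a, leaf_b), leaf_c).
leaf_a = {"P1": {"P2"}}      # P1 waits for P2
leaf_b = {"P2": {"P3"}}      # P2 waits for P3
leaf_c = {"P3": {"P1"}}      # P3 waits for P1

interior = union_wfg(leaf_a, leaf_b)
root = union_wfg(interior, leaf_c)

detected_at = [name for name, wfg in
               [("leaf_a", leaf_a), ("leaf_b", leaf_b), ("leaf_c", leaf_c),
                ("interior", interior), ("root", root)]
               if has_cycle(wfg)]
```

In this example the cycle spans all three leaves, so only the root, their lowest common ancestor, sees it; `detected_at` contains just `"root"`.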

Perspective

• Correctness of algorithms
    • There are few formal methods to prove the correctness of deadlock detection algorithms; we usually use informal or intuitive arguments
• Performance
    • Usually measured as the number of messages exchanged to detect deadlock
        • Deceptive, since messages are also exchanged when there is no deadlock
        • Doesn’t account for size of the message
    • Should also measure:
        • Deadlock persistence time (measure of how long resources are wasted)
            • Tradeoff with communication overhead
        • Storage overhead (graphs, tables, etc.)
        • Processing overhead to search for cycles
        • Time to optimally recover from deadlock

Deadlock recovery

• How often does deadlock detection run?
    • After every resource request?
    • Less often (e.g., every hour or so, or whenever resource utilization gets low)?
• What if OS detects a deadlock?
    • Terminate a process
        • All deadlocked processes
        • One process at a time until no deadlock
            • Which one? One with most resources? One with lowest cost?
                • CPU time used, needed in future
                • Resources used, needed
            • That’s a choice similar to CPU scheduling
        • Is it acceptable to terminate process(es)?
            • May have performed a long computation
                • Not ideal, but OK to terminate it
            • May have updated a file or done I/O
                • Can’t just start it over again!

Deadlock recovery (cont.)

• Any less drastic alternatives? Preempt resources
    • One at a time until no deadlock
    • Which “victim”?
        • Again, based on cost, similar to CPU scheduling
    • Is rollback possible?
        • Preempt resources: take them away
        • Rollback: “roll” the process back to some safe state, and restart it from there
            • OS must checkpoint the process frequently: write its state to a file
        • Could roll back to beginning, or just enough to break the deadlock
            • This second time through, it has to wait for the resource
            • Has to keep multiple checkpoint files, which adds a lot of overhead
    • Avoid starvation
        • May happen if decision is based on same cost factors each time
        • Don’t keep preempting same process (i.e., set some limit)
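
Victim selection with a starvation limit, as outlined above, might look like the following sketch. Everything specific here is an assumption made for illustration: the `Proc` fields, the cost weights, and the limit of 3 preemptions are hypothetical, and a real OS would weigh more factors (such as CPU time and resources still needed).

```python
from dataclasses import dataclass
from typing import List

MAX_PREEMPTIONS = 3          # hypothetical limit on how often one process may be chosen

@dataclass
class Proc:
    pid: str
    cpu_used: float          # CPU seconds consumed so far (work lost if preempted)
    resources_held: int      # number of resources currently held
    preemptions: int = 0     # how many times this process has been the victim

def cost(p: Proc) -> float:
    """Hypothetical cost of preempting p; the weights are arbitrary."""
    return p.cpu_used + 10.0 * p.resources_held

def choose_victim(deadlocked: List[Proc]) -> Proc:
    """Pick the cheapest process that has not yet hit the preemption limit."""
    eligible = [p for p in deadlocked if p.preemptions < MAX_PREEMPTIONS]
    pool = eligible or deadlocked        # if everyone hit the limit, fall back to all
    victim = min(pool, key=cost)
    victim.preemptions += 1
    return victim

# Two hypothetical processes that keep deadlocking with each other.
procs = [Proc("A", cpu_used=5.0, resources_held=1),
         Proc("B", cpu_used=50.0, resources_held=2)]
victims = [choose_victim(procs).pid for _ in range(4)]
```

With these numbers A is always the cheaper victim, so a purely cost-based choice would preempt it every time. The limit forces the fourth choice onto B, giving `["A", "A", "A", "B"]`.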
