8.1 Multiprocessors
8.2 Multicomputers
8.3 Distributed systems
Definition: A computer system in which two or more CPUs share full access to a common RAM.
Multiprocessor Hardware
• All multiprocessors have the property that every CPU can address all of memory, but:
– UMA (Uniform Memory Access) multiprocessors have the additional property that every memory word can be read as fast as every other word.
– NUMA (Nonuniform Memory Access) multiprocessors do not have this property.
Multiprocessor Hardware
• UMA multiprocessors using multistage switching networks can be built from 2x2 switches.
Omega Switching Networks
• As the message moves through the switching network, the bits at the left-hand end of the module number are no longer needed. They can be put to good use by recording the incoming line number there, so the reply can find its way back.
• The omega network is a blocking network: not every set of requests can be processed simultaneously (illustrated below).
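A minimal sketch of this destination-tag routing in C; hedged: the 8x8 size and the bit convention (one destination bit consumed per stage, most significant bit first) are assumptions for illustration.

#include <stdio.h>

#define LOGN 3            /* 3 stages of 2x2 switches -> 8 lines (assumed) */
#define N (1 << LOGN)

/* Follow one message from src to dst, printing the line it occupies
 * after each stage. Each stage does a perfect shuffle (rotate the
 * LOGN-bit line number left), then the 2x2 switch sets the low bit
 * to the next bit of the destination module number, MSB first. */
static void route(unsigned src, unsigned dst)
{
    unsigned line = src;
    printf("route %u -> %u:", src, dst);
    for (int stage = 0; stage < LOGN; stage++) {
        line = ((line << 1) | (line >> (LOGN - 1))) & (N - 1); /* shuffle */
        unsigned bit = (dst >> (LOGN - 1 - stage)) & 1;
        line = (line & ~1u) | bit;                  /* switch setting */
        printf(" stage%d=line%u", stage, line);
    }
    printf("\n");
}

int main(void)
{
    route(0, 1);   /* occupies line 0 after the first stage */
    route(4, 3);   /* also needs line 0 after the first stage: the two
                      requests conflict, so the network is blocking */
    return 0;
}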
NUMA Multiprocessors
• The most popular approach for building large CC-NUMA multiprocessors is the directory-based multiprocessor.
– The idea is to maintain a database telling where each cache line is and what its status is (a sketch of an entry follows).
– When a cache line is referenced, the database is queried to find out where it is and whether it is clean or dirty.
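A minimal sketch of what one directory entry might hold, assuming a presence bitmap plus an owner field; the field names and the 64-node limit are illustrative, not from any particular machine.

#include <stdint.h>

enum line_state { UNCACHED, CLEAN, DIRTY };

/* One entry per memory line: which nodes cache it and whether the
 * cached copy is clean or dirty. On a reference, the directory is
 * consulted; a DIRTY line must first be fetched back from 'owner'
 * before the request can be satisfied. */
struct dir_entry {
    enum line_state state;
    uint64_t sharers;   /* bit i set if node i holds a copy */
    uint8_t  owner;     /* node holding the dirty copy, if state == DIRTY */
};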
Multiprocessor OS Types
• Master-slave multiprocessors
Master-Slave Multiprocessors
• All system calls are redirected to CPU 1.
• There is a single data structure that keeps track of ready processes, so it can never happen that one CPU is idle while another is overloaded.
• Pages can be allocated among all the processes dynamically.
• There is only one buffer cache, so inconsistencies never occur.
• The disadvantage is that, with many CPUs, the master becomes a bottleneck.
Multiprocessor OS Types
• Symmetric multiprocessors (SMP)
SMP Model
• Balances processes and memory dynamically, since there is only one set of OS tables.
• If two or more CPUs run OS code at the same time, disaster will result.
• Mutexes may be associated with the critical regions of the OS.
• Each table that may be used by multiple critical regions needs its own mutex.
• Great care must be taken to avoid deadlocks: all the tables can be assigned integer values, and all critical regions acquire tables in increasing order, as in the sketch below.
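A hedged sketch of this lock-ordering rule with POSIX mutexes; the three tables and the example path are illustrative.

#include <pthread.h>

pthread_mutex_t table_lock[3] = {   /* one mutex per numbered table */
    PTHREAD_MUTEX_INITIALIZER,      /* 0: process table */
    PTHREAD_MUTEX_INITIALIZER,      /* 1: page tables   */
    PTHREAD_MUTEX_INITIALIZER,      /* 2: buffer cache  */
};

/* A critical region needing the process table and the buffer cache
 * takes lock 0 before lock 2. Since no path ever takes 2 before 0,
 * a cycle of CPUs waiting on each other cannot form. */
void example_region(void)
{
    pthread_mutex_lock(&table_lock[0]);
    pthread_mutex_lock(&table_lock[2]);
    /* ... update both tables ... */
    pthread_mutex_unlock(&table_lock[2]);
    pthread_mutex_unlock(&table_lock[0]);
}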
Multiprocessor Synchronization
• If we could get rid of all the TSL-induced writes on the requesting side, cache thrashing could be reduced.
– The requesting CPU first does a pure read to see if the lock is free; only if the lock appears to be free does it do a TSL to acquire it. Most of the polls are now reads instead of writes (see the sketch below).
– If the CPU holding the lock only reads the variables in the same cache block, each CPU can have a copy of the cache block in shared read-only mode, eliminating all the cache block transfers.
– When the lock is freed, the owner does a write, invalidating all the other copies.
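A sketch of this read-before-TSL idea in C11 atomics, with atomic_exchange standing in for TSL.

#include <stdatomic.h>
#include <stdbool.h>

atomic_bool lock_word = false;

void acquire(void)
{
    for (;;) {
        /* pure read: spins on the locally cached copy, no bus traffic */
        while (atomic_load_explicit(&lock_word, memory_order_relaxed))
            ;
        /* lock looks free: now do the atomic read-modify-write (TSL) */
        if (!atomic_exchange_explicit(&lock_word, true,
                                      memory_order_acquire))
            return;                      /* got it */
    }
}

void release(void)
{
    /* this write invalidates the other CPUs' shared copies */
    atomic_store_explicit(&lock_word, false, memory_order_release);
}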
Multiprocessor Synchronization
• Another way to reduce bus traffic is to use the Ethernet binary exponential backoff algorithm.
• An even better idea is to give each CPU wishing to acquire the mutex its own private lock variable to test (see the sketch below).
– A CPU that fails to acquire the lock allocates a lock variable and attaches itself to the end of a list of CPUs waiting for the lock.
– When the current lock holder exits the critical region, it frees the private lock that the first CPU on the list is testing. This scheme is starvation-free.
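A sketch of such a queue lock in C11 atomics (an MCS-style lock is assumed here; the slides describe the scheme generically). Each CPU spins only on its own node, and release wakes exactly the first waiter, giving FIFO order.

#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct qnode {
    _Atomic(struct qnode *) next;
    atomic_bool locked;
};

_Atomic(struct qnode *) tail = NULL;

void q_acquire(struct qnode *me)
{
    atomic_store(&me->next, NULL);
    atomic_store(&me->locked, true);
    struct qnode *prev = atomic_exchange(&tail, me); /* join the list */
    if (prev) {
        atomic_store(&prev->next, me);
        while (atomic_load(&me->locked))   /* spin on OUR OWN variable */
            ;
    }
}

void q_release(struct qnode *me)
{
    struct qnode *succ = atomic_load(&me->next);
    if (!succ) {
        struct qnode *expect = me;        /* no successor visible yet */
        if (atomic_compare_exchange_strong(&tail, &expect, NULL))
            return;                       /* list is now empty */
        while (!(succ = atomic_load(&me->next)))
            ;                             /* wait for it to link in */
    }
    atomic_store(&succ->locked, false);   /* free the first waiter only */
}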
Multiprocessor Scheduling
• Timesharing
– note use of single data structure for scheduling
Multiprocessor Scheduling
• Timesharing
– Provides automatic load balancing.
– A disadvantage is the potential contention for the scheduling data structure.
– Also, a context switch may happen while a process holds a spin lock; the other CPUs waiting on the spin lock then waste their time spinning until that process is scheduled again and releases the lock.
• Some systems use smart scheduling, in which a process acquiring a spin lock sets a process-wide flag to show that it currently holds a spin lock (sketched below). The scheduler sees the flag and gives the process more time to complete its critical region.
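A minimal sketch of that flag; the name and the scheduler's check are assumptions, not a specific system's API.

#include <stdatomic.h>
#include <stdbool.h>

atomic_bool holding_spinlock = false;  /* the process-wide hint flag */

void smart_acquire(atomic_bool *lk)
{
    atomic_store(&holding_spinlock, true);    /* set hint first */
    while (atomic_exchange(lk, true))
        ;
}

void smart_release(atomic_bool *lk)
{
    atomic_store(lk, false);
    atomic_store(&holding_spinlock, false);   /* safe to preempt again */
}

/* On a timer tick, the scheduler reads holding_spinlock and, if set,
 * grants the process extra time to finish its critical region. */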
Multiprocessor Scheduling
• Timesharing
– When process A has run for a long time on CPU k, CPU k's cache will be full of A's blocks, so if A gets to run again it may perform better on CPU k.
– Some multiprocessors therefore use affinity scheduling.
– To achieve this, two-level scheduling is used:
• When a process is created, it is assigned to a CPU.
• Each CPU uses its own scheduling algorithm and tries to maximize affinity.
Multiprocessor Scheduling
• Space sharing
– scheduling multiple related processes or threads at the same time across multiple CPUs
Multiprocessor Scheduling
• Space Sharing
– The scheduler checks whether there are as many free CPUs as there are related threads. If not, none of the threads is started until enough CPUs are available (see the sketch below).
– Each thread then holds onto its CPU until it terminates, even if it blocks on I/O.
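A toy admission check capturing this all-or-nothing rule; the counter and function names are illustrative.

#include <stdbool.h>

int free_cpus = 8;                /* illustrative CPU count */

bool try_start_gang(int nthreads)
{
    if (nthreads > free_cpus)
        return false;             /* start none; wait for enough CPUs */
    free_cpus -= nthreads;        /* each thread keeps its CPU until exit */
    return true;
}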
Multiprocessor Scheduling
• Space Sharing
– A different approach is for processes to actively manage the degree of parallelism.
– A central server keeps track of which processes are running and want to run, and what their minimum and maximum CPU requirements are.
– Periodically, each process polls the central server to ask how many CPUs it may use.
– It then adjusts the number of processes or threads up or down to match what is available.
Multiprocessor Scheduling
Gang Scheduling
Multicomputers
• Definition:
Tightly-coupled CPUs that do not share
memory
• Also known as
– cluster computers
– clusters of workstations (COWs)
• The basic node consists of a CPU, memory,
a network interface and sometimes a disk.
Multicomputer Hardware
• Interconnection topologies:
(a) single switch
(b) ring
(c) grid (mesh)
(d) double torus
(e) cube
(f) hypercube
Multicomputer Hardware
• Diameter is the distance between the two nodes that are farthest apart (the longest of the shortest paths).
– On a grid, it increases only as the square root of the number of nodes N; on a hypercube, it is log2 N. For example, with N = 64 nodes, an 8 x 8 grid has a diameter of 14 hops, while a hypercube has a diameter of only 6.
– But the fanout, and thus the number of links (and the cost), is much larger for the hypercube.
Multicomputer Hardware
• Switching
– Store-and-forward packet switching
• Packets must be copied many times.
– Circuit switching
• Bits flow in the circuit with no intermediate
buffering after the circuit is set up.
Multicomputer Hardware
• Switching scheme
– store-and-forward packet switching
Low-Level Communication Software
• Kernel copying may also occur.
– If the interface board is mapped into kernel virtual address space, the kernel may have to copy the packets to its own memory both on input and on output.
– To avoid this, the interface boards can be mapped directly into user space, but then a mechanism is needed to avoid race conditions between processes sharing the board, and a process holding the protecting mutex might never release it.
– Hence either there should be just one user process on each node, or special precautions must be taken.
Low-Level Communication Software
• If several processes running on a node need network access to send packets:
– map the interface board into all processes that need it, taking precautions against race conditions.
• If the kernel also needs access to the network:
– suppose a kernel packet arrives while the board is mapped into user space?
• Then use two network boards:
– one mapped into user space, one into the kernel.
Low-Level Communication Software
• How to get packets onto the interface board?
– Use the DMA chip to copy them in from RAM.
• The problem is that DMA uses physical, not virtual, addresses.
• Also, if the OS decides to replace a page while the DMA chip is copying a packet from it, the wrong data will be transmitted.
• Using system calls to pin and unpin pages, marking them as temporarily unpageable, is a solution (sketched below).
– But this is expensive for small packets.
– So using programmed I/O to and from the interface board is usually the safest course, since page faults are then handled in the usual way.
– Alternatively, programmed I/O can be used for small packets and DMA with pinning and unpinning for large ones.
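A hedged sketch of the pin-then-DMA sequence using the POSIX mlock()/munlock() calls; dma_start() is a placeholder, not a real API.

#include <stdio.h>
#include <sys/mman.h>

#define PKT_SIZE 4096   /* assumed large-packet buffer size */

int send_large_packet(void *buf)
{
    if (mlock(buf, PKT_SIZE) != 0) {  /* pin: pages become unpageable */
        perror("mlock");
        return -1;
    }
    /* dma_start(buf, PKT_SIZE);  program the DMA chip and wait for
       completion; the pages cannot be replaced during the transfer */
    munlock(buf, PKT_SIZE);           /* unpin when the DMA is done */
    return 0;
}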
User-Level Communication Software
• “Receive” can also be blocking or nonblocking.
• A blocking call is simple when the receiver has multiple threads.
• For a nonblocking call, an interrupt can be used to signal message arrival, but interrupts are difficult to program.
• Alternatively, the receiver can poll for incoming messages.
• Or the arrival of a message causes a “pop-up thread” to be created in the receiving process' address space.
• Or the receiver code runs directly in the interrupt handler; this is called active messages.
Remote Procedure Call
Send and receive are fundamentally engaged in I/O, and many believe that I/O is the wrong programming model. Hence RPC (Remote Procedure Call) was developed; a sketch of a client stub follows.
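A toy client stub, assuming a hypothetical message layer msg_send()/msg_recv(), to show how the I/O disappears from the caller's view.

struct req  { int opcode; int a, b; };
struct resp { int result; };

extern void msg_send(int server, const void *buf, int len); /* assumed */
extern void msg_recv(int server, void *buf, int len);       /* assumed */

/* The caller just writes result = remote_add(srv, 3, 4); the stub
 * marshals the parameters, does the send/receive, and unmarshals. */
int remote_add(int server, int a, int b)
{
    struct req r = { .opcode = 1 /* ADD */, .a = a, .b = b };
    struct resp ans;
    msg_send(server, &r, sizeof r);
    msg_recv(server, &ans, sizeof ans);   /* block until the reply */
    return ans.result;
}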
Distributed Shared Memory
• One improvement is to replicate read-only pages; replicating read-write pages requires special handling.
Replication: (a) pages distributed on 4 machines
• False sharing
– Process 1 makes heavy use of A while process 2 makes heavy use of B. The page containing both variables will travel back and forth between the two machines (illustrated below).
• The system must also achieve sequential consistency.
– Before a shared page can be written, a message is sent to all other CPUs holding a copy of the page, telling them to unmap and discard the page.
– Another way is to allow a process to acquire a lock on a portion of the virtual address space and then perform multiple reads and writes on the locked memory.
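A small pthreads illustration of false sharing: A and B are logically unrelated, yet they end up on the same page (or cache line), so a DSM system would ping-pong that page between the two machines. The padding (an assumed 4096-byte page) can be uncommented to separate them.

#include <pthread.h>
#include <stdio.h>

struct shared {
    long A;
    /* char pad[4096 - sizeof(long)];  uncomment to give B its own page */
    long B;
} data;

static void *worker_a(void *arg) { (void)arg;
    for (long i = 0; i < 100000000; i++) data.A++;
    return NULL;
}
static void *worker_b(void *arg) { (void)arg;
    for (long i = 0; i < 100000000; i++) data.B++;
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker_a, NULL);
    pthread_create(&t2, NULL, worker_b, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("A=%ld B=%ld\n", data.A, data.B);
    return 0;
}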
Multicomputer Scheduling
• Each node has its own memory and its own set of processes.
• However, when a new process is created, a choice can be made about where to place it, so as to balance the load.
• Each node can use any local scheduling algorithm.
• It is also possible to use gang scheduling.
– Some way to coordinate the start of the time slots is required.
Load Balancing
• Graph-theoretic deterministic algorithm
– Each vertex is a process; each arc represents the flow of messages between two processes. Arcs that go from one subgraph to another represent network traffic. The goal is to find the partitioning that minimizes the network traffic while meeting the constraints (a sketch of the traffic computation follows). In (a) above, the total network traffic is 30 units; in partitioning (b) it is 28 units.
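A toy version of the traffic computation: sum the weights of arcs whose endpoints are assigned to different machines. The 4-process traffic matrix and the two assignments below are made up for illustration, not the slides' example.

#include <stdio.h>

#define NPROC 4

int weight[NPROC][NPROC] = {   /* message traffic between process pairs */
    {0, 3, 2, 0},
    {3, 0, 0, 4},
    {2, 0, 0, 1},
    {0, 4, 1, 0},
};

int cut_traffic(const int machine[NPROC])
{
    int total = 0;
    for (int i = 0; i < NPROC; i++)
        for (int j = i + 1; j < NPROC; j++)
            if (machine[i] != machine[j])  /* arc crosses the network */
                total += weight[i][j];
    return total;
}

int main(void)
{
    int p1[NPROC] = {0, 0, 1, 1};   /* processes 0,1 vs 2,3: prints 6 */
    int p2[NPROC] = {0, 1, 0, 1};   /* processes 0,2 vs 1,3: prints 4 */
    printf("partitioning 1: %d units\n", cut_traffic(p1));
    printf("partitioning 2: %d units\n", cut_traffic(p2));
    return 0;
}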
Network Hardware
• Ethernet
(a) classic Ethernet
(b) switched Ethernet
• Collisions may result; Ethernet resolves them with the binary exponential backoff algorithm.
• Bridges connect multiple Ethernets
Network Hardware
• The Internet
– Routers extract the destination address of a packet and look it up in a table to find which outgoing line to send it on.
Document-Based Middleware
• The Web: a big directed graph of documents
• How the browser gets a page (a socket-level sketch follows the steps):
1. Asks DNS for IP address
2. DNS replies with IP address
3. Browser makes connection
4. Sends request for specified page
5. Server sends file
6. TCP connection released
7. Browser displays text
8. Browser fetches, displays images
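A condensed sketch of steps 1-6 with the BSD socket API; example.com is an illustrative host and error handling is abbreviated.

#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    struct addrinfo hints = { .ai_socktype = SOCK_STREAM }, *res;

    /* steps 1-2: ask DNS for the IP address, get the reply */
    if (getaddrinfo("example.com", "80", &hints, &res) != 0)
        return 1;

    /* step 3: browser makes the TCP connection */
    int fd = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
    if (fd < 0 || connect(fd, res->ai_addr, res->ai_addrlen) < 0)
        return 1;

    /* step 4: send the request for the specified page */
    const char *req = "GET / HTTP/1.1\r\nHost: example.com\r\n"
                      "Connection: close\r\n\r\n";
    write(fd, req, strlen(req));

    /* step 5: server sends the file; read and display it */
    char buf[4096];
    ssize_t n;
    while ((n = read(fd, buf, sizeof buf)) > 0)
        fwrite(buf, 1, n, stdout);

    close(fd);            /* step 6: TCP connection released */
    freeaddrinfo(res);
    return 0;
}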
File System-Based Middleware
• Transfer models:
(a) upload/download model
(b) remote access model
File System-Based Middleware
Naming Transparency
(b) Clients have the same view of the file system
(c) Alternatively, clients have different views
File System-Based Middleware
• Remote file systems can be mounted onto
the local file hierarchy
• Naming Transparency
– Location transparency means that the path name gives no hint as to where the file is located. A path like /server1/dir1/dir2/x tells us that x is located on server1, but does not tell where server1 is located.
– A system in which files can be moved without their names changing is said to have location independence.
File System-Based Middleware
Client's view
• AFS – Andrew File System
– workstations grouped into cells
– note position of venus and vice
File System-Based Middleware
• AFS
– The /cmu directory contains the names of the shared remote cells, below which are their respective file systems.
– AFS comes close to session semantics: when a file is opened, it is fetched from the appropriate server and placed in /cache on the workstation's local disk; when the file is closed, it is uploaded back.
– However, when venus downloads a file into its cache, it tells vice whether or not it cares about subsequent opens. If it does, vice records the location of the cached file. If another process opens the file, vice sends a message to venus telling it to mark its cache entry as invalid and return the modified copy.
Shared Object-Based Middleware
Publish-Subscribe architecture