
Unit 3 : TLP AND MULTIPROCESSORS

Symmetric and Distributed Shared Memory Architectures


Cache Coherence Issues
Performance Issues
Synchronization Issues
Models of Memory Consistency
Interconnection Networks
Buses
Crossbar and Multi-stage Interconnection Networks

IFETCE/ME/CSE/B.V.R.Raju/Iyear/Isem/CP7103/MCA/Unit-3/PPt/Ver1.0

Symmetric and Distributed Shared Memory Architectures


Thread-level Parallelism (TLP)


This is parallelism on a coarser scale than instruction-level parallelism
A server can serve each client in a separate thread (Web server, database server)
A computer game can do AI, graphics, and physics in three separate threads
Single-core superscalar processors cannot fully exploit TLP
Multi-core architectures are the next step in processor evolution: explicitly exploiting TLP
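
The server example above can be sketched in a few lines. This is a minimal illustration, not a real server: `handle_client` and `serve` are hypothetical names, and the "request" is just an integer id. Note also that in CPython the global interpreter lock keeps CPU-bound threads from running truly in parallel, so the sketch shows the structure of thread-level parallelism rather than a speedup.

```python
import threading

def handle_client(client_id, results):
    # Hypothetical handler: a real server would read from a socket here.
    results[client_id] = f"response for client {client_id}"

def serve(client_ids):
    # One thread per client, as on the slide; each thread writes its own key.
    results = {}
    threads = [
        threading.Thread(target=handle_client, args=(cid, results))
        for cid in client_ids
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait until every client has been served
    return results

if __name__ == "__main__":
    print(serve(range(3)))
```

Each client is independent of the others, which is what makes this workload a natural fit for TLP.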

Simultaneous MultiThreading (SMT)


Permits multiple independent threads to execute
SIMULTANEOUSLY on the SAME core
Weaving together multiple threads on the same
core
Example: if one thread is waiting for a floating-point operation to complete, another thread can use the integer units


Simultaneous MultiThreading (SMT)


Without SMT, only a single thread can
run at any given time


SMT Processor:
both threads can run
concurrently


But: two threads can't simultaneously use the same functional unit
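
A toy cycle count makes the SMT trade-off concrete. The model below is a sketch under strong assumptions that are not in the slides: the core has exactly one unit of each kind, every operation takes one cycle, each thread issues at most one operation per cycle, and the lower-numbered thread wins a conflict. The function names are hypothetical.

```python
def run_without_smt(threads):
    # Only one thread occupies the core at a time, one operation per cycle.
    return sum(len(ops) for ops in threads)

def run_with_smt(threads):
    # threads: list of operation lists, each operation names the unit it needs.
    next_op = [0] * len(threads)
    cycles = 0
    while any(pc < len(ops) for pc, ops in zip(next_op, threads)):
        busy = set()  # functional units already claimed in this cycle
        for i, ops in enumerate(threads):
            if next_op[i] < len(ops) and ops[next_op[i]] not in busy:
                busy.add(ops[next_op[i]])  # claim the unit for this cycle
                next_op[i] += 1
            # otherwise the thread stalls: its unit is taken by another thread
        cycles += 1
    return cycles

if __name__ == "__main__":
    mixed = [["FP", "FP", "INT"], ["INT", "INT", "FP"]]
    clash = [["INT", "INT"], ["INT", "INT"]]
    print(run_without_smt(mixed), run_with_smt(mixed))
    print(run_without_smt(clash), run_with_smt(clash))
```

When the two threads need different units in each cycle, SMT overlaps them fully. When both need the integer unit all the time, SMT gains nothing, which is the restriction stated above.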


Cache Coherence Issues


Fish Machines
Dual-core Intel Xeon
processors
Each core is hyperthreaded
Private L1 caches
Shared L2 caches


Private vs Shared Caches?
Advantages/disadvantages?
Advantages of private:
They are closer to core, so faster access
Reduces contention

Advantages of shared:
Threads on different cores can share the
same cache data
More cache space available if a single (or a
few) high-performance thread runs on the
system

The Cache Coherence Problem


Since we have private caches: how do we keep the data consistent across caches?
Each core should perceive the memory as a monolithic array, shared by all the cores


The Cache Coherence Problem


Suppose variable x initially contains 15213


The Cache Coherence Problem


Core 1 reads x


The Cache Coherence Problem


Core 2 reads x


The Cache Coherence Problem


Core 1 writes to x, setting it to 21660


The Cache Coherence Problem


Core 2 attempts to read x and gets a stale copy (15213) from its own cache
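
The sequence on these slides can be replayed with a toy model. This is a sketch, assuming write-through private caches with no coherence mechanism at all; the `Core` class and `stale_read_demo` are hypothetical names used only for illustration.

```python
class Core:
    """A core with a private cache and no coherence protocol (assumed)."""

    def __init__(self, memory):
        self.memory = memory  # shared main memory
        self.cache = {}       # private cache: address -> value

    def read(self, addr):
        if addr not in self.cache:          # miss: fetch from memory
            self.cache[addr] = self.memory[addr]
        return self.cache[addr]             # hit: memory is not consulted

    def write(self, addr, value):
        self.cache[addr] = value
        self.memory[addr] = value           # write-through (assumption)

def stale_read_demo():
    memory = {"x": 15213}
    core1, core2 = Core(memory), Core(memory)
    core1.read("x")            # Core 1 reads x
    core2.read("x")            # Core 2 reads x
    core1.write("x", 21660)    # Core 1 writes to x
    return core2.read("x"), memory["x"]

if __name__ == "__main__":
    print(stale_read_demo())
```

Core 2 hits in its own cache, so it never sees that memory already holds the new value.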


Performance Issues
Cache coherence is a general problem with multiprocessors, not limited to multicore
Many solutions exist: coherence protocols and related algorithms
Two simple solutions:
Invalidation-based protocol with snooping
Update protocol

Invalidation Protocol with Snooping
Invalidation:
If a core writes to a data item, all other copies
of this data item in other caches are
invalidated

Snooping:
All cores continuously snoop (monitor) the
bus connecting the cores.


Invalidation Protocol
Core 2 reads x. Its copy was invalidated, so the access misses in the cache and loads the new copy (21660)
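
A write-invalidate protocol with snooping can be sketched as follows. The model assumes write-through caches and a single shared bus; `Bus`, `SnoopingCache` and `invalidation_demo` are hypothetical names, and real protocols track per-line states that this sketch leaves out.

```python
class Bus:
    def __init__(self):
        self.caches = []

    def broadcast_invalidate(self, sender, addr):
        # Every other cache snoops the bus and drops its copy of addr.
        for cache in self.caches:
            if cache is not sender:
                cache.lines.pop(addr, None)

class SnoopingCache:
    def __init__(self, bus, memory):
        self.bus, self.memory, self.lines = bus, memory, {}
        bus.caches.append(self)

    def read(self, addr):
        if addr not in self.lines:               # miss: load from memory
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def write(self, addr, value):
        self.bus.broadcast_invalidate(self, addr)
        self.lines[addr] = value
        self.memory[addr] = value                # write-through (assumption)

def invalidation_demo():
    memory = {"x": 15213}
    bus = Bus()
    core1, core2 = SnoopingCache(bus, memory), SnoopingCache(bus, memory)
    core1.read("x")
    core2.read("x")
    core1.write("x", 21660)
    was_invalidated = "x" not in core2.lines
    return was_invalidated, core2.read("x")

if __name__ == "__main__":
    print(invalidation_demo())
```

After Core 1's write, Core 2 no longer holds x, so its next read misses and fetches the current value.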


Update Protocol
Core 1 writes x=21660: the new value is broadcast on the bus, and every cache holding x updates its copy
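
For comparison, a write-update protocol can be sketched the same way. Again this assumes write-through caches on one shared bus; `UpdateBus`, `UpdatingCache` and `update_demo` are hypothetical names.

```python
class UpdateBus:
    def __init__(self):
        self.caches = []

    def broadcast_update(self, sender, addr, value):
        # Every other cache that holds addr overwrites its copy in place.
        for cache in self.caches:
            if cache is not sender and addr in cache.lines:
                cache.lines[addr] = value

class UpdatingCache:
    def __init__(self, bus, memory):
        self.bus, self.memory, self.lines = bus, memory, {}
        bus.caches.append(self)

    def read(self, addr):
        if addr not in self.lines:               # miss: load from memory
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def write(self, addr, value):
        self.lines[addr] = value
        self.memory[addr] = value                # write-through (assumption)
        self.bus.broadcast_update(self, addr, value)

def update_demo():
    memory = {"x": 15213}
    bus = UpdateBus()
    core1, core2 = UpdatingCache(bus, memory), UpdatingCache(bus, memory)
    core1.read("x")
    core2.read("x")
    core1.write("x", 21660)
    # Core 2's copy was refreshed by the broadcast, so its read is a hit.
    return core2.lines["x"], core2.read("x")

if __name__ == "__main__":
    print(update_demo())
```

Here Core 2 keeps a valid copy throughout, at the cost of one broadcast for every write.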


Invalidation vs Update Protocols


Multiple writes to the same location
invalidation: only the first write invalidates the other copies
update: must broadcast each write

Writing to adjacent words in the same cache block:
invalidation: only invalidate block once
update: must update block on each write

Invalidation generally performs better: it generates less bus traffic
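
The traffic argument can be checked with a rough message count. The sketch below is a simplification under stated assumptions: each miss costs one bus message, each invalidate or update broadcast costs one bus message, and caches never evict. `count_bus_messages` is a hypothetical helper, and a trace is a list of `(core, op, block)` tuples.

```python
def count_bus_messages(trace, protocol):
    if protocol not in ("invalidate", "update"):
        raise ValueError("protocol must be 'invalidate' or 'update'")
    sharers = {}   # block -> set of cores holding a valid copy
    messages = 0
    for core, op, block in trace:
        holders = sharers.setdefault(block, set())
        if core not in holders:
            messages += 1              # miss: fetch the block over the bus
            holders.add(core)
        if op == "write" and len(holders) > 1:
            messages += 1              # one invalidate or one update broadcast
            if protocol == "invalidate":
                holders.intersection_update({core})  # others lose their copy
    return messages

if __name__ == "__main__":
    # Core 1 and core 2 read x, core 1 writes x four times, core 2 reads x.
    trace = [(1, "read", "x"), (2, "read", "x")]
    trace += [(1, "write", "x")] * 4
    trace += [(2, "read", "x")]
    print(count_bus_messages(trace, "invalidate"))
    print(count_bus_messages(trace, "update"))
```

With repeated writes by one core, invalidation pays once and then writes locally, while update pays on every write. A trace in which the other core reads after every single write would favor update instead, which is why the claim above is "generally".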

Synchronization Issues


Models of Memory Consistency


Shared memory is a memory space that any processor in
the system can access. Accessing memory is costly in
terms of time, so memory may be physically distributed
among the processors, and each processor keeps a cache
(a small record of parts of the previously accessed
memory). When a processor needs a data item, it first
checks its cache. If it does not find the data there,
the processor then checks either another processor's
cache or the memory. The problem addressed in the
lecture is how a system keeps the individual caches and
the shared memory coherent. A number of protocols deal
with this problem; they are categorized as either
snooping or directory based.


Snooping protocols are based on monitoring bus activity and
carrying out commands depending on that activity. There are
four cases that can occur when accessing memory: Read-Hit,
Read-Miss, Write-Hit and Write-Miss. A Read-Hit occurs when
the value to be read is found in the processor's cache. A
Read-Miss occurs when the value to be read is not found in the
processor's cache. A Write-Hit occurs when the location to be
written is listed in the cache as the latest copy. A Write-Miss
occurs when the location to be written is not listed in the
cache, or is listed only as an older copy rather than the
latest one.
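
The four cases can be written down as a small classifier. This is a sketch that follows the definitions in the paragraph above literally; `classify_access` is a hypothetical helper, and the cache is modeled as a dictionary from address to a flag saying whether the cached copy is the latest one.

```python
def classify_access(op, addr, cache):
    """cache maps address -> True if the copy is the latest, False if older."""
    if op == "read":
        # Read: hit if the location is in the cache, miss otherwise.
        return "Read-Hit" if addr in cache else "Read-Miss"
    if op == "write":
        # Write: hit only if the location is cached as the latest copy.
        return "Write-Hit" if cache.get(addr, False) else "Write-Miss"
    raise ValueError("op must be 'read' or 'write'")

if __name__ == "__main__":
    cache = {"a": True, "b": False}   # "b" is present but out of date
    for op, addr in [("read", "a"), ("read", "c"), ("write", "a"),
                     ("write", "b"), ("write", "c")]:
        print(op, addr, classify_access(op, addr, cache))
```

A snooping controller would use the outcome to decide which bus command, if any, to issue.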

Interconnection Networks


Buses


Inter-Core Bus


Crossbar and Multi-stage Interconnection Networks
Shared-memory systems connected by these networks fall into
three memory-access types:
1. Uniform memory access (UMA)
2. Non-uniform memory access (NUMA)
3. Cache-only memory architecture (COMA)
