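$$
\begin{bmatrix}
a_{11} & a_{12} & a_{13} \\
a_{21} & a_{22} & a_{23} \\
a_{31} & a_{32} & a_{33}
\end{bmatrix}
\times
\begin{bmatrix}
b_{11} & b_{12} & b_{13} \\
b_{21} & b_{22} & b_{23} \\
b_{31} & b_{32} & b_{33}
\end{bmatrix}
=
\begin{bmatrix}
c_{11} & c_{12} & c_{13} \\
c_{21} & c_{22} & c_{23} \\
c_{31} & c_{32} & c_{33}
\end{bmatrix}
$$

$$c_{11} = c_{11} + a_{11}b_{11} + a_{12}b_{21} + a_{13}b_{31}, \qquad \text{initially } c_{11} = 0$$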
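As a minimal sketch of the relation above, each element of the product matrix can be computed as an inner product that starts from 0 and accumulates one product term at a time. The function names `inner_product` and `matmul` and the sample matrices are illustrative assumptions, not part of the slides.

```python
def inner_product(row, col):
    """Accumulate a sum of products, starting from an initial value of 0."""
    c = 0
    for a, b in zip(row, col):
        c = c + a * b  # one multiply followed by one add per term
    return c


def matmul(a, b):
    """Each c[i][j] is the inner product of row i of a with column j of b."""
    cols = [[b[k][j] for k in range(len(b))] for j in range(len(b[0]))]
    return [[inner_product(row, col) for col in cols] for row in a]


# Sample 3 x 3 matrices (made-up values for illustration).
A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
B = [[9, 8, 7], [6, 5, 4], [3, 2, 1]]
C = matmul(A, B)
print(C[0][0])  # c11 = 1*9 + 2*6 + 3*3
```

A 3 x 3 product needs nine such inner products, each with three multiplications and three additions.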
Pipeline for calculating an inner product:
Inner product consists of a sum of k product terms:

$$C = A_1B_1 + A_2B_2 + A_3B_3 + \cdots + A_kB_k$$

Floating-point multiplier pipeline: 4 segments
Floating-point adder pipeline: 4 segments

[Figure: Source A and Source B feed the multiplier pipeline, whose output feeds the adder pipeline. Four panels show the pipeline contents:
after 1st clock input: A1, B1 have entered the multiplier pipeline
after 4th clock input: A1B1 through A4B4 are in the pipeline
after 8th clock input: A1B1 through A8B8 are in the pipeline
after 9th, 10th, 11th, ... clock inputs: the adder starts forming A1B1 + A5B5, A2B2 + A6B6, ...]

Four-section summation:

$$
\begin{aligned}
C ={} & A_1B_1 + A_5B_5 + A_9B_9 + A_{13}B_{13} + \cdots \\
 {}+{} & A_2B_2 + A_6B_6 + A_{10}B_{10} + A_{14}B_{14} + \cdots \\
 {}+{} & A_3B_3 + A_7B_7 + A_{11}B_{11} + A_{15}B_{15} + \cdots \\
 {}+{} & A_4B_4 + A_8B_8 + A_{12}B_{12} + A_{16}B_{16} + \cdots
\end{aligned}
$$
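The four-section summation can be sketched in a few lines: because the adder pipeline has four segments, the products fall into four interleaved running sums, which are combined at the end. This is only a functional model under that assumption, not a cycle-accurate simulation of the pipeline; the name `four_section_inner_product` and the sample vectors are illustrative.

```python
def four_section_inner_product(a, b, sections=4):
    """Accumulate products into `sections` interleaved partial sums.

    Product i (0-based) is added to partial sum i mod sections, so with
    four sections the first sum collects A1B1, A5B5, A9B9, ... and so on.
    """
    partial = [0] * sections
    for i, (x, y) in enumerate(zip(a, b)):
        partial[i % sections] += x * y
    # The final result adds the partial sums together.
    return partial, sum(partial)


# Sample vectors with k = 16 terms (made-up values for illustration).
a = list(range(1, 17))
b = [1] * 16
partial, c = four_section_inner_product(a, b)
print(partial, c)
```

With these inputs the first partial sum collects terms 1, 5, 9 and 13, matching the first row of the four-section summation.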
Memory interleaving:
Simultaneous access to memory from two or more sources over a single memory
bus causes conflicts.
To avoid this, partition the memory into modules
connected to a common memory address bus and data bus.
A module is a memory array with its own address register (AR) and data register (DR).
Assign different sets of addresses to different modules.
Useful for pipeline and vector processing.
[Figure: Multiple-module memory organization. Four modules, each with an address register (AR), a memory array, and a data register (DR), connected to a common address bus and data bus.]
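One common way to assign different address sets to different modules is low-order interleaving, where consecutive addresses go to consecutive modules. The sketch below assumes that scheme and assumes four modules; the slides do not say which assignment is used, and the name `locate` is illustrative.

```python
NUM_MODULES = 4  # assumption: four memory modules


def locate(address, num_modules=NUM_MODULES):
    """Map a memory address to (module number, word offset inside the module).

    Assumes low-order interleaving: the low-order part of the address
    selects the module, the remaining part selects the word.
    """
    module = address % num_modules
    offset = address // num_modules
    return module, offset


# Consecutive addresses fall in different modules, so a vector stored at
# consecutive addresses can be fetched from all modules in overlapped fashion.
placement = [locate(addr) for addr in range(8)]
print(placement)
```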
Supercomputer
Supercomputer = Vector Instruction + Pipelined floating-point arithmetic
Has multiple functional units & each unit has its own pipeline configuration
Performance Evaluation Index
MIPS: Millions of Instructions Per Second
FLOPS: Floating-point Operations Per Second
megaflops: $10^6$ FLOPS, gigaflops: $10^9$ FLOPS
Cray supercomputer : Cray Research, vector processing
Cray-1: 12 functional units, 80 megaflops, 4 million 64-bit words of memory
Cray-2: 12 times more powerful than the Cray-1
VP supercomputer: Fujitsu, vector and scalar processing
VP-200: 300 megaflops, 32 million memory, 83 vector instructions, 195 scalar instructions
VP-2600 : 5 gigaflops
4-7 Array Processors
- Performs computations on large arrays of data
- Array processing
Attached array processor
- Auxiliary processor attached to a general-purpose computer
- Enhances performance by providing vector processing for scientific applications
- Achieves this by parallel processing with multiple functional units
- Has an arithmetic unit with one or two pipelined floating-point adders and multipliers
Vector processing: adder/multiplier pipelines
Array processing: array processors
[Figure: Attached array processor. The general-purpose computer connects to the attached array processor through an input-output interface; main memory connects to the processor's local memory over a high-speed memory-to-memory bus.]
SIMD array processor:
- Computer with multiple processing units operating in parallel
Components:
Master control unit: controls the operation of the PEs and decodes instructions
Main memory: stores the program
Processing elements (PEs), each with a local memory
- Each PE has an ALU and a floating-point arithmetic unit
- Vector addition: first store the $i$th components $a_i$ and $b_i$ in local memory $M_i$ for $i = 1, 2, \ldots, n$
- The add instruction $c_i = a_i + b_i$ is then broadcast to all PEs, so the additions take place simultaneously
Uses a masking scheme to control which PEs take part in an operation
Example: ILLIAC IV
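[Figure: SIMD array processor organization. The master control unit, connected to main memory, controls processing elements PE1, PE2, PE3, ..., PEn, each with its own local memory M1, M2, M3, ..., Mn.]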
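The broadcast add with masking can be modelled in software as follows. This is a sequential sketch of what the PEs would do in parallel, assuming a Boolean mask with one entry per PE; the name `simd_add` and the sample data are illustrative and do not describe the ILLIAC IV instruction set.

```python
def simd_add(a, b, mask):
    """Model one broadcast add: every unmasked PE i computes a[i] + b[i].

    a[i] and b[i] stand for the operands held in local memory M_i.
    Masked-off PEs leave their result slot untouched (None here).
    """
    c = [None] * len(a)
    for i in range(len(a)):  # conceptually, all PEs execute this step at once
        if mask[i]:
            c[i] = a[i] + b[i]
    return c


# Four PEs; the third one is disabled by the mask.
a = [1, 2, 3, 4]
b = [10, 20, 30, 40]
mask = [True, True, False, True]
c = simd_add(a, b, mask)
print(c)
```

Masking is what lets a vector shorter than the number of PEs, or a conditional operation, run without disturbing the PEs that should stay idle.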
Class work (5%), time allowed: 15 min
1. A nonpipeline system takes 40 ns to process a task. The
same task can be processed in a seven-segment pipeline
with a clock cycle of 10 ns. Determine the speedup ratio of
the pipeline for 100 tasks. What is the maximum speedup
that can be achieved?
2. The time delays of the four segments of a pipeline are as follows:
t1 = 50 ns, t2 = 30 ns, t3 = 85 ns, and t4 = 45 ns. The
interface registers have a delay time tr = 5 ns.
a. How long would it take to add 100 pairs of numbers in the
pipeline?
b. How can we reduce the total time to about one-half of the time
calculated in part (a)?
3. Draw a space-time diagram for a six-segment pipeline
showing the time it takes to process nine tasks.