Ch 15
ARM Organization and Implementation
Summary
ARM architecture
5 stage pipeline
CMOS technology: feature size reduced to ~1/10
Performance of cores improved dramatically
1995- : Separate instruction and data memories
In this chapter
Arithmetic/logic operations
Select and hold memory address.
Sequential addressing
Data register
[Figure: ARM organization: address register and incrementer, register bank (including PC), instruction decode and control, multiply register, barrel shifter, ALU, connected by the A bus, B bus and ALU bus, with external A[31:0] and D[31:0]]
[Figure: pipelined operation of three instructions, each passing through fetch, decode and execute (axes: instruction, time)]
[Figure: pipelined operation of a sequence containing ADD and STR instructions, showing their fetch, decode and execute cycles (axes: instruction, time)]
Embedded Systems, KAIST
All instructions occupy the datapath for one or more adjacent cycles
For each cycle that an instruction occupies the datapath, it occupies
the decode logic in the immediately preceding cycle
During the first datapath cycle, each instruction issues a fetch for the
next instruction but one
Branch instructions flush and refill the instruction pipeline.
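The occupancy rules above can be sketched as a small scheduling model. This is a minimal illustration, not a description of the actual control logic: the names `schedule` and `total_cycles` are hypothetical, cycle numbering starts at 0 with the first instruction fetched in cycle 0, and branch flushes are not modelled.

```python
def schedule(instructions):
    """Assign decode and datapath (execute) cycles to each instruction.

    instructions: list of (name, datapath_cycles) pairs, datapath_cycles >= 1.
    Assumes the first instruction is fetched in cycle 0, decoded in cycle 1
    and enters the datapath in cycle 2. Branch flushes are not modelled.
    """
    timeline = []
    start = 2  # first datapath cycle of the first instruction
    for name, cycles in instructions:
        if cycles < 1:
            raise ValueError("an instruction needs at least one datapath cycle")
        # Datapath cycles are adjacent.
        execute = list(range(start, start + cycles))
        # Decode logic is occupied in the cycle before each datapath cycle.
        decode = [c - 1 for c in execute]
        timeline.append({"name": name, "decode": decode, "execute": execute})
        start += cycles
    return timeline


def total_cycles(instructions):
    """Number of cycles until the last instruction leaves the datapath."""
    timeline = schedule(instructions)
    return timeline[-1]["execute"][-1] + 1 if timeline else 0


if __name__ == "__main__":
    # Hypothetical sequence: STR is given two datapath cycles here.
    for entry in schedule([("ADD", 1), ("STR", 2), ("ADD", 1)]):
        print(entry)
```

In this model a multi-cycle instruction simply delays the first datapath cycle of every instruction behind it.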
$T_{prog} = \frac{N_{inst} \times CPI}{f_{clk}}$
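As a quick numeric check of the formula, the sketch below evaluates it for two made-up configurations. The function name `t_prog` and all numbers are illustrative assumptions, not measurements of any ARM core.

```python
def t_prog(n_inst, cpi, f_clk):
    """Execution time in seconds: N_inst * CPI / f_clk."""
    if f_clk <= 0:
        raise ValueError("clock frequency must be positive")
    return n_inst * cpi / f_clk


if __name__ == "__main__":
    # Hypothetical figures chosen only to show the effect of each term.
    baseline = t_prog(n_inst=1_000_000, cpi=1.9, f_clk=50e6)
    improved = t_prog(n_inst=1_000_000, cpi=1.5, f_clk=100e6)
    print(f"baseline: {baseline * 1e3:.1f} ms, improved: {improved * 1e3:.1f} ms")
```

For a fixed instruction count, the time falls when either CPI is reduced or the clock frequency is raised.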
Instructions which occupy more than one pipeline slot: re-implement to occupy fewer slots
Reduce pipeline stalls caused by dependencies between instructions.
Memory bottleneck
Solution
5-stage pipeline
Reduced CPI
[Figure: 5-stage pipeline organization. 1. Fetch (next pc, +4, I-cache); 2. Decode (instruction decode, register read, immediate fields); 3. Execute (shift, ALU, mul, forwarding paths); 4. Buffer/data (load/store address, D-cache, byte replication, rotate/sign extend); 5. Write-back (write-back to register or memory)]
Data forwarding
Exception
[Figure: datapath activity, two panels, each showing the address register, incrementer, PC, register bank (Rd, Rn, Rm), multiplier, data in, data out and instruction pipe; one panel uses the immediate field [7:0]]
[Figure: datapath activity, two panels, each showing the address register, incrementer, PC, register bank (Rn, Rd), multiplier, data in, data out and instruction pipe; labels: offset field [11:0], shifter lsl #0, ALU = A / A+B / A-B, ALU = A+B / A-B, byte? on data out]
[Figure: datapath activity, two panels, each showing the address register, incrementer, PC, register bank (R14), multiplier, data in, data out and instruction pipe; labels: offset field [23:0], shifter lsl #2, ALU = A, ALU = A+B]
4. ARM Implementation
Design
Clocking scheme
Datapath timing
[Figure: datapath timing: register read time, shift time, ALU time, ALU out, register write time; the phase 2 precharge invalidates the buses]
ALU delay
Dominant
Highly variable
Adder design
CMOS AND-OR-INVERT gates
Worst-case carry path: 32 gates long
[Figure: ripple-carry adder bit cell with inputs A, B, Cin and outputs sum, Cout]
G: carry generate
P: carry propagate
Worst-case carry path:
8 gate delays
AND-OR-INVERT gates
& AND/OR logic
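The generate/propagate idea can be illustrated with a bit-level sketch. The names `cla4` and `add32` are hypothetical, and chaining eight 4-bit look-ahead blocks is an assumption about how the 8-gate-delay figure arises rather than a description of any specific ARM adder.

```python
def cla4(a, b, cin):
    """4-bit carry look-ahead block: returns (sum[3:0], carry out)."""
    g = [(a >> i) & (b >> i) & 1 for i in range(4)]    # G: carry generate
    p = [((a >> i) | (b >> i)) & 1 for i in range(4)]  # P: carry propagate
    c0 = cin & 1
    # Each carry is written directly in terms of G, P and the block carry-in,
    # so no carry waits for the one before it inside the block.
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0))
    carries = [c0, c1, c2, c3]
    total = 0
    for i in range(4):
        bit = ((a >> i) ^ (b >> i) ^ carries[i]) & 1
        total |= bit << i
    return total, c4


def add32(a, b, cin=0):
    """32-bit add built from eight 4-bit blocks; the carry ripples between blocks."""
    result, carry = 0, cin
    for block in range(8):
        shift = 4 * block
        s, carry = cla4((a >> shift) & 0xF, (b >> shift) & 0xF, carry)
        result |= s << shift
    return result, carry


if __name__ == "__main__":
    print(add32(0xFFFFFFFF, 1))  # carry travels through all eight blocks
```

The worst case is still a carry that crosses every block, but it now passes through 8 block-level steps instead of 32 bit-level ones.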
[Figure: 4-bit adder logic with inputs A[3:0], B[3:0], Cin[0] and outputs sum[3:0], Cout[3]]
[Figure: ALU logic: NA bus and NB bus inputs, function-select (fs) lines, carry logic with G and P, output to the ALU bus]
Computes the sums of various fields of the word for a carry-in of both
zero and one
The final result is selected by using the carry-in value to control a
multiplexer
Critical path O(log2[word width])
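A recursive sketch of the carry-select idea follows. The names `carry_select` and `add`, the 4-bit leaf width and the halving scheme are illustrative assumptions; a real adder fixes the field sizes in hardware.

```python
def carry_select(a, b, width=32, leaf=4):
    """Return ((sum, cout) for carry-in 0, (sum, cout) for carry-in 1)."""
    if width <= leaf:
        # Leaf field: compute both candidate sums directly.
        mask = (1 << width) - 1
        results = []
        for cin in (0, 1):
            total = (a & mask) + (b & mask) + cin
            results.append((total & mask, total >> width))
        return tuple(results)
    half = width // 2
    low_mask = (1 << half) - 1
    low = carry_select(a & low_mask, b & low_mask, half, leaf)
    high = carry_select(a >> half, b >> half, width - half, leaf)
    combined = []
    for cin in (0, 1):
        low_sum, low_carry = low[cin]
        # Multiplexer: the low field's carry-out selects the high field result.
        high_sum, high_carry = high[low_carry]
        combined.append(((high_sum << half) | low_sum, high_carry))
    return tuple(combined)


def add(a, b, cin=0, width=32):
    """Add two width-bit values; returns (sum, carry out)."""
    mask = (1 << width) - 1
    return carry_select(a & mask, b & mask, width)[cin]


if __name__ == "__main__":
    print(add(0x0000FFFF, 1))
```

Because each level only adds one multiplexer stage and the field width doubles per level, the selection depth grows with log2 of the word width, which matches the critical-path claim above.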
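[Figure: carry-select adder: field adders from a,b[3:0] up to a,b[31:28] produce s and s+1 (+ and +1); multiplexers controlled by the carry c select the result, e.g. sum[31:16]]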
[Figure: ALU organization: operand latch, XOR gates (invert A, invert B), logic functions, adder with C in, result mux selected by logic/arithmetic function, zero detect; outputs: result and flags C, V, N]
[Figure: shifter switch matrix: inputs in[2], in[1], in[0]; control lines no shift, left 1, left 2, left 3]
Multiplier design
High-speed multiplier
Carry-save adder
[Figure: adder structures, panels (a) and (b), built from cells with B and Cin inputs and Cout outputs]
Speed up multiplication
by a factor of ~3
Added functionality
of the 64-bit result.
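The carry-save principle behind the high-speed multiplier can be sketched as follows. The names `csa` and `multiply` are hypothetical, and the loop handles one multiplier bit per step for clarity; it does not model the 8 bits/cycle array or any signed operation.

```python
MASK64 = (1 << 64) - 1


def csa(a, b, c):
    """Carry-save adder: reduce three operands to a (sum, carry) pair.

    No carry propagates across bit positions; sum + carry == a + b + c
    (modulo 2**64).
    """
    partial_sum = a ^ b ^ c
    partial_carry = ((a & b) | (a & c) | (b & c)) << 1
    return partial_sum & MASK64, partial_carry & MASK64


def multiply(rm, rs):
    """Unsigned 32 x 32 -> 64-bit multiply using carry-save accumulation."""
    partial_sum, partial_carry = 0, 0
    for i in range(32):
        if (rs >> i) & 1:
            partial_product = (rm << i) & MASK64
            partial_sum, partial_carry = csa(partial_sum, partial_carry,
                                             partial_product)
    # A single carry-propagate addition at the end resolves the pair.
    return (partial_sum + partial_carry) & MASK64


if __name__ == "__main__":
    print(hex(multiply(0xFFFFFFFF, 0xFFFFFFFF)))
```

Only the final addition needs a full carry-propagate adder; every intermediate step costs one full-adder delay regardless of word width.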
[Figure: high-speed multiplier organization: Rs shifted right 8 bits/cycle, Rm, carry-save adders, partial sum and partial carry, with sum and carry rotated 8 bits/cycle]
[Figure: register cell circuit: write from the ALU bus, read A and read B onto the A bus and B bus, Vdd, Vss]
[Figure: register bank floorplan: ALU bus, PC bus, INC bus, PC, register cells, A bus, B bus]
Datapath layout
[Figure: datapath layout: address register, incrementer, register bank (PC, A and B ports), multiplier, shifter, ALU, data in, instruction pipe, data out]
Control structures
[Figure: control logic structure: instruction decode PLA with coprocessor interface, cycle count, multiply control and load/store multiple, driving address control, register control, ALU control and shifter control]
ARM supports
Coprocessor architecture
Tells the ARM the coprocessor cannot begin executing the instruction yet.