CONTENT
I. Sections 1-6
II. Sections 7-10
III. Microprogramming
IV. Pipelining
11. Pipelining Concepts
12. Pipeline Stalls or Bubbles
13. Pipeline Timing and Performance
14. Pipelined Data Path Design
15. Pipelined Control
16. Optimal Pipelining
V. Pipeline Performance
17. Data Dependencies and Hazards
18. Data Forwarding
19. Pipeline Branch Hazards
20. Delayed Branch and Branch Prediction
21. Advanced Pipelining
CMageshKumar_AP_AIHT
CS2071_Computer Architecture
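[Figure: MicroMIPS instruction formats. Every instruction (inst) is 32 bits wide.]
R format: op (bits 31-26), rs (25-21), rt (20-16), rd (15-11), sh (10-6), fn (5-0)
I format: op (bits 31-26), rs (25-21), rt (20-16), imm (15-0)
J format: op (bits 31-26), jta (25-0)

Field  Width    Meaning
op     6 bits   Opcode
rs     5 bits   Source 1 or base
rt     5 bits   Source 2 or destination
rd     5 bits   Destination
sh     5 bits   Unused
fn     6 bits   Opcode extension
imm    16 bits  Operand / offset
jta    26 bits  Jump target address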
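The field layout above can be sketched in code. The following is a minimal illustration, not part of the original notes: the function name decode_fields and the example word are hypothetical, and the 26-bit width of jta is assumed from the standard MIPS J format.

```python
def decode_fields(inst: int) -> dict:
    """Split a 32-bit instruction word into named fields."""
    return {
        "op": (inst >> 26) & 0x3F,   # bits 31-26, opcode
        "rs": (inst >> 21) & 0x1F,   # bits 25-21, source 1 or base
        "rt": (inst >> 16) & 0x1F,   # bits 20-16, source 2 or destination
        "rd": (inst >> 11) & 0x1F,   # bits 15-11, destination (R format)
        "sh": (inst >> 6) & 0x1F,    # bits 10-6, unused here
        "fn": inst & 0x3F,           # bits 5-0, opcode extension (R format)
        "imm": inst & 0xFFFF,        # bits 15-0, operand / offset (I format)
        "jta": inst & 0x03FFFFFF,    # bits 25-0, jump target (assumed 26 bits)
    }


# Hypothetical example word: R-format add with rs = 6, rt = 7, rd = 5, fn = 32.
word = (0 << 26) | (6 << 21) | (7 << 16) | (5 << 11) | (0 << 6) | 32
fields = decode_fields(word)
print(fields["op"], fields["rs"], fields["rt"], fields["rd"], fields["fn"])
```

All fields are extracted from every word; which of them are meaningful depends on the format selected by op.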
The six I-format ALU instructions (lui, addi, slti, andi, ori, xori) share the following execution sequence:
1. Read out the content of the source register rs and the immediate value, and forward them to the ALU as inputs.
2. Tell the ALU which operation to perform by means of the appropriate control signal.
3. Write the ALU output into the destination register rt.
The two I-format memory access instructions (lw, sw) share the following execution sequence:
1. Read out the content of rs.
2. Add the number read out from rs to the immediate value in the instruction to form a memory address.
3. Read from or write into memory at the specified address.
4. For the lw instruction, place the word read out from memory into rt.
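The address computation in step 2 can be sketched as follows. This is a minimal illustration under stated assumptions: the 16-bit immediate is sign-extended before the addition, addresses wrap at 32 bits, and the function names are hypothetical.

```python
def sign_extend16(imm: int) -> int:
    """Interpret the low 16 bits of imm as a signed two's-complement value."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def effective_address(rs_value: int, imm: int) -> int:
    """Memory address for lw / sw: (rs) plus the sign-extended immediate."""
    return (rs_value + sign_extend16(imm)) & 0xFFFFFFFF


print(hex(effective_address(0x1000, 4)))       # positive offset
print(hex(effective_address(0x1000, 0xFFFC)))  # offset of -4
```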
The three I-format conditional branch instructions (bltz, beq, bne) and the four unconditional jump instructions (j, jr, jal, syscall) share the following execution sequence:
1. Read out the content of the source register rs and the immediate value, and forward them to the ALU as inputs.
2. Tell the ALU which operation to perform by means of the appropriate control signal.
3. The branch target address is specified by an offset relative to the incremented program counter value ((PC)+4).
4. To branch back to the previous instruction, the offset supplied in the immediate field of the instruction is -2, which gives the branch target address (PC)+4-(2*4) = (PC)-4.
5. For the beq and bne instructions, the contents of rs and rt are compared to determine whether the branch condition is satisfied.
6. For bltz, the branch decision is based on the sign bit of the content of rs.
7. For the 4 jump instructions (j, jr, jal, syscall):
The PC is unconditionally modified so that the next instruction is fetched from the jump target address.
The jump target address comes from the instruction itself (j, jal), is read out from register rs (jr), or is a known constant associated with the location of an operating system routine (syscall).
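The branch target arithmetic and the branch conditions described above can be sketched as follows. This is a minimal illustration, assuming 32-bit register values and a word offset that is sign-extended and multiplied by 4; the function names and the example PC value are hypothetical.

```python
def sign_extend16(imm: int) -> int:
    """Interpret the low 16 bits of imm as a signed two's-complement value."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def branch_target(pc: int, imm: int) -> int:
    """Target = (PC) + 4 + 4 * offset, with the offset counted in words."""
    return (pc + 4 + 4 * sign_extend16(imm)) & 0xFFFFFFFF


def branch_taken(kind: str, rs_value: int, rt_value: int = 0) -> bool:
    """Branch decision for beq, bne (comparison) and bltz (sign bit of rs)."""
    if kind == "beq":
        return rs_value == rt_value
    if kind == "bne":
        return rs_value != rt_value
    if kind == "bltz":
        return bool(rs_value & 0x80000000)
    raise ValueError(f"not a conditional branch: {kind}")


pc = 0x00400010                      # hypothetical address of the branch
target = branch_target(pc, 0xFFFE)   # offset -2 points at the previous instruction
print(hex(target))
```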
2. THE INSTRUCTION EXECUTION UNIT
The step-by-step execution of all 22 MicroMIPS instructions can be followed on the block diagram below:
1. Beginning at the left end, the content of the program counter (PC) is supplied to the instruction cache, and an instruction word is read out from the specified location.
2. With every clock tick, a new address is loaded into the program counter, causing a new instruction to appear at the output of the instruction cache after a short access delay.
3. The contents of the various fields of the instruction are sent to the relevant blocks, including the control unit, which decides the operation to be performed.
4. Once an instruction has been read out from the instruction cache, its fields are separated and dispatched to the appropriate places.
Example: the op and fn fields go to the control unit; rs, rt, and rd go to the register file.
5. The upper input of the ALU always comes from register rs; the lower input comes from rt or from the immediate value of the instruction.
6. As the data from the register file pass through the ALU, the specified operation is performed and the result appears at the ALU output.
7. For arithmetic and logic instructions, the ALU output bypasses the data cache and runs along the feedback line to be stored in the destination register of the register file.
8. For memory access instructions, the ALU output is treated as the data address for writing into or reading from the data cache.
9. Data cache: for many instructions the ALU output is stored in a register, so the data cache is bypassed. For the lw and sw instructions the data cache is accessed: the content of rt is written into the data cache for sw, and the data cache output is sent to the register file for lw.
10. In one clock cycle, the contents of any 2 of the 32 registers (usually rs and rt) are read out through the read ports. At the same time, the ALU output is stored in a register through the write port.
11. The flip-flops that make up the registers are edge-triggered, so reading and writing the same register in a single clock cycle causes no problem.
12. For the beq and bne instructions, the contents of rs and rt are compared to determine whether the branch condition is satisfied. The comparison is performed in the next-address block.
13. For bltz, the branch decision is based on the sign bit of the content of rs rather than on a comparison of two register contents. This is also performed by the next-address block.
14. The next-address block also chooses the jump target address under the guidance of the control unit.
15. The jump target address comes from the instruction itself (j, jal) or is read out from register rs (jr).
16. The middle part, comprising the program counter, instruction cache, register file, ALU, and data cache, is known as the data path.
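[Figure: block diagram with the PC, next-address block, instruction cache, register file, ALU, data cache, and control unit. The op and fn fields feed the control unit; rs, rt, rd feed the register file; (rs), (rt), and imm feed the ALU. The next-address block handles beq, bne, bltz, j, jal, jr, and syscall; the 12 arithmetic/logic instructions, lui, lw, and sw use the ALU path. Separate instruction and data caches make this a Harvard architecture.]
Fig.2. Abstract view of the instruction execution unit for MicroMIPS.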
7. The instruction cache block also does not receive any control signal for reading instructions, since an instruction is read out in every cycle.
8. Multiplexer 2 (at the lower input of the ALU):
i. By asserting or deasserting the ALUSrc control signal, the control unit uses this multiplexer to choose either the content of rt or the 32-bit sign-extended version of the 16-bit immediate operand as the second ALU input.
1. If ALUSrc = 0 (deasserted), the content of rt is used as the lower ALU input.
2. If ALUSrc = 1 (asserted), the 32-bit sign-extended version of the 16-bit immediate operand is used as the lower ALU input.
ii. Sign extension of the immediate operand is performed by the SE block.
9. Multiplexer 3 (at the outputs of the ALU and data cache): the control signal used here is RegInSrc.
S.no  Control signal  Selection
1     00              Data cache output
2     01              ALU output
3     10              Incremented PC value coming from the next-address block
10. With every clock tick, a new address is loaded into the program counter, causing a new instruction to appear at the output of the instruction cache after a short access delay.
11. The contents of the various fields of the instruction are sent to the relevant blocks, including the control unit, which decides the operation to be performed.
12. As the data from the register file pass through the ALU, the operation specified by the ALUFunc signal is performed and the result appears at the ALU output.
13. For arithmetic and logic instructions, the ALU output bypasses the data cache and runs along the feedback line to be stored in the destination register of the register file.
14. For memory access instructions, the ALU output is treated as the data address for writing into (DataWrite signal) or reading from (DataRead signal) the data cache.
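The two multiplexers described in items 8 and 9 can be modeled as simple selection functions. This is a minimal sketch, not the actual hardware description: the function names are hypothetical, and values are treated as 32-bit unsigned bit patterns.

```python
def sign_extend_to_32(imm16: int) -> int:
    """SE block: widen a 16-bit immediate to a 32-bit pattern, copying the sign bit."""
    imm16 &= 0xFFFF
    return imm16 | 0xFFFF0000 if imm16 & 0x8000 else imm16


def alu_lower_input(alu_src: int, rt_value: int, imm16: int) -> int:
    """Multiplexer 2: ALUSrc = 0 selects (rt), ALUSrc = 1 selects the extended immediate."""
    return sign_extend_to_32(imm16) if alu_src else rt_value


def register_input(reg_in_src: int, data_cache_out: int, alu_out: int, incr_pc: int) -> int:
    """Multiplexer 3: RegInSrc = 00, 01, 10 selects data cache, ALU, or incremented PC."""
    return (data_cache_out, alu_out, incr_pc)[reg_in_src]


print(hex(alu_lower_input(1, 7, 0xFFFF)))
print(register_input(0b01, 11, 22, 33))
```

Indexing a tuple with RegInSrc mirrors the table above; the unused code 11 raises an IndexError in this sketch.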
[Figure: single-cycle MicroMIPS data path, showing the PC, next-address block (inputs jta and incremented PC, output next PC), instruction cache, register file, SE block (16 to 32 bits), ALU (with Ovfl output), and data cache, together with the control signals Br&Jump, RegDst, RegWrite, ALUSrc, ALUFunc, DataRead, DataWrite, and RegInSrc.]
7. MULTICYCLE IMPLEMENTATION:
[Figure: clock timing for four instructions (Instr 1 to Instr 4) in the single-cycle and multicycle designs, comparing time needed with time allotted. In the multicycle design the instructions take 3, 5, 3, and 4 cycles, and the difference is shown as time saved.]
With multicycle design, a subset of actions required for an instruction is performed in one clock cycle.
Hence the clock cycle can be made much shorter, with several cycles needed to execute a single instruction.
Advantages of multicycle implementation are greater speed and economy
[Figure: abstract view of the multicycle MicroMIPS data path, showing the PC, cache (address and data), Inst Reg, Data Reg, register file, x Reg, y Reg, ALU, z Reg, and control unit driven by op and fn.]
The control unit must distinguish between the 5 cycles of the multicycle design and must also be able to perform different operations depending on the instruction.
The above diagram depicts the control states and state transitions.
The control state machine carries the required information along by moving from state to state. The control state machine is set to state 0 when program execution begins.
It then moves from state to state until one instruction has been completed, at which point it returns to state 0 to begin the execution of another instruction.
The control state sequences for various MicroMIPS instruction classes are as follows:
ALU type 0,1,7,8
Load word 0,1,2,3,4
Store word 0,1,2,6
Jump / branch 0,1,5
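The state sequences above also give the cycle count of each instruction class. The sketch below assumes each control state takes exactly one clock cycle; the instruction mix is hypothetical and serves only to show how an average CPI would be computed.

```python
# Control state sequences taken from the list above.
STATE_SEQUENCES = {
    "ALU type": [0, 1, 7, 8],
    "Load word": [0, 1, 2, 3, 4],
    "Store word": [0, 1, 2, 6],
    "Jump / branch": [0, 1, 5],
}


def cycles(instruction_class: str) -> int:
    """Clock cycles for a class, assuming one cycle per control state."""
    return len(STATE_SEQUENCES[instruction_class])


def average_cpi(mix: dict) -> float:
    """Weighted average of cycles over an instruction mix (fractions sum to 1)."""
    return sum(fraction * cycles(cls) for cls, fraction in mix.items())


# Hypothetical instruction mix, for illustration only.
hypothetical_mix = {
    "ALU type": 0.5,
    "Load word": 0.2,
    "Store word": 0.1,
    "Jump / branch": 0.2,
}
print(round(average_cpi(hypothetical_mix), 2))
```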
In each state except states 5 and 7, the control signals are uniquely determined.
Information about the current control state and the instruction being executed is supplied by decoders.
The control signals can be determined easily from the control state machine diagram and the decoder diagram.
Examples of control signals and the logic expressions that determine them:
Certain control signals depend only on the control state:
ALUSrcX = ControlSt2 ∨ ControlSt5 ∨ ControlSt7
RegWrite = ControlSt4 ∨ ControlSt8
Auxiliary signals identifying instruction classes:
addsubInst = addInst ∨ subInst ∨ addiInst
logicInst = andInst ∨ orInst ∨ xorInst ∨ norInst ∨ andiInst ∨ oriInst ∨ xoriInst
Logic expressions for ALU control signals:
AddSub = ControlSt5 ∨ (ControlSt7 ∧ subInst)
FnClass1 = ControlSt7′ ∨ addsubInst ∨ logicInst
FnClass0 = ControlSt7 ∧ (logicInst ∨ sltInst ∨ sltiInst)
LogicFn1 = ControlSt7 ∧ (xorInst ∨ xoriInst ∨ norInst)
LogicFn0 = ControlSt7 ∧ (orInst ∨ oriInst ∨ norInst)
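A subset of the expressions above can be evaluated in code. This is a sketch under an explicit assumption: terms listed side by side are combined with OR, and a control state in front of a parenthesized group gates that group with AND. The function name and the use of mnemonic strings in place of decoder outputs are hypothetical.

```python
def control_signals(state: int, inst: str) -> dict:
    """Evaluate some control signals for a control state and instruction mnemonic."""
    def st(n: int) -> bool:
        return state == n          # stands in for the decoder output ControlSt<n>

    logic_inst = inst in {"and", "or", "xor", "nor", "andi", "ori", "xori"}
    return {
        "ALUSrcX": st(2) or st(5) or st(7),
        "RegWrite": st(4) or st(8),
        "AddSub": st(5) or (st(7) and inst == "sub"),
        "FnClass0": st(7) and (logic_inst or inst in {"slt", "slti"}),
        "LogicFn1": st(7) and inst in {"xor", "xori", "nor"},
        "LogicFn0": st(7) and inst in {"or", "ori", "nor"},
    }


signals = control_signals(7, "nor")
print(signals["LogicFn1"], signals["LogicFn0"])
```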
[Figure: Decoders. The op decoder (6-bit input) produces RtypeInst (0), bltzInst (1), jInst (2), jalInst (3), beqInst (4), bneInst (5), addiInst (8), sltiInst (10), andiInst (12), oriInst (13), xoriInst (14), luiInst (15), lwInst (35), and swInst (43). The fn decoder (6-bit input) produces jrInst (8), syscallInst (12), addInst (32), subInst (34), andInst (36), orInst (37), xorInst (38), norInst (39), and sltInst (42). The st decoder (4-bit input) produces ControlSt0 through ControlSt8.]
10.
III. MICROPROGRAMMING
The control state machine resembles a program that has instructions (states), branching, and loops. Such a hardware program is called a microprogram, and its basic steps are microinstructions.
A microinstruction is a single instruction in microcode. It is the most elementary instruction in the computer, such as moving the contents of a register to the arithmetic logic unit (ALU).
It takes several microinstructions to carry out one complex machine instruction (CISC).
Also called a "micro-op" or "µop", microinstructions differ within the same computer family and even within the same vendor.
Microprogrammed control is a control mechanism that generates control signals by using a memory called control storage (CS), which contains the control signals.
Although microprogrammed control seems advantageous for CISC machines, since CISC requires systematic development of sophisticated control signals, there is no intrinsic difference between these 2 control mechanisms.
Microprogramming is a method of control unit design in which the control signal selection and sequencing information is stored in a ROM or RAM called the control store or control memory.
The microprogrammed control unit is a general approach to implementing a control unit. Here the control signals are generated by a program similar to a machine language program.
Instead of implementing the control state machine in custom hardware, we can store microinstructions in locations of a control ROM, fetching and executing a sequence of microinstructions for each machine language instruction.
Each microinstruction defines a step in the execution of a machine language instruction.
Advantages of ROM-based implementation of control:
o Simpler hardware
o More regular
o Less dependent on instruction-set architecture details
o The same hardware can be used for a different purpose by modifying the ROM contents
Microprogramming: Designing a suitable sequence of microinstructions to realize a particular instruction set architecture is called microprogramming.
Microprogrammable machine: If the microprogram is easily modifiable, even by the user, the machine is called a microprogrammable machine.
Microinstruction format:
o The microinstruction is 23 bits wide. Except for the sequence control bits, each bit corresponds one-to-one to a control signal in the multicycle data path.
o The 2-bit sequence control field controls microinstruction sequencing in the same way that PC control affects the sequencing of machine language instructions.
Microprogrammed control unit: The diagram of the microprogrammed control unit for MicroMIPS shows 4 options (multiplexer inputs) for choosing the next microinstruction.
o Option 0: advance to the next microinstruction in sequence by incrementing the microprogram counter.
o Options 1 & 2: allow branching to occur depending on the opcode field of the machine instruction being executed.
o Option 3: go to microinstruction 0, corresponding to state 0 (refer to the control state machine). This initiates the fetch phase for the next machine instruction.
Dispatch table 1: corresponds to the multiway branch in going from cycle 2 to cycle 3.
Dispatch table 2: implements the branch between cycles 3 and 4 (refer to the control state machine).
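The four sequencing options can be sketched as a next-address function. Everything concrete here is an assumption made for illustration: microinstruction addresses are taken to equal the control state numbers, option 1 is taken to use dispatch table 1 and option 2 dispatch table 2, the tables are keyed by instruction class rather than by opcode, and the table contents are derived from the control state sequences listed earlier.

```python
# Hypothetical dispatch tables derived from the control state sequences.
DISPATCH_1 = {"lw": 2, "sw": 2, "alu": 7, "branch": 5}   # cycle 2 -> cycle 3
DISPATCH_2 = {"lw": 3, "sw": 6}                          # cycle 3 -> cycle 4

# Hypothetical sequence control field stored with each microinstruction.
SEQ_CONTROL = {0: 0, 1: 1, 2: 2, 3: 0, 4: 3, 5: 3, 6: 3, 7: 0, 8: 3}


def next_micro_pc(micro_pc: int, seq_control: int, inst_class: str) -> int:
    """Select the next microinstruction address from the four options."""
    if seq_control == 0:
        return micro_pc + 1              # option 0: increment MicroPC
    if seq_control == 1:
        return DISPATCH_1[inst_class]    # option 1: dispatch table 1
    if seq_control == 2:
        return DISPATCH_2[inst_class]    # option 2: dispatch table 2
    return 0                             # option 3: back to microinstruction 0


def trace(inst_class: str) -> list:
    """Addresses visited while executing one machine instruction."""
    micro_pc, visited = 0, []
    while True:
        visited.append(micro_pc)
        micro_pc = next_micro_pc(micro_pc, SEQ_CONTROL[micro_pc], inst_class)
        if micro_pc == 0:
            return visited


print(trace("lw"))
```

Under these assumptions the traces reproduce the control state sequences of the state machine, which is the sense in which the microprogram replaces it.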
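[Figure: microinstruction format with fields for PC control, cache control, register control, ALU inputs, ALU function, and sequence control. The signals shown are JumpAddr, PCSrc, PCWrite, InstData, MemRead, MemWrite, IRWrite, RegInSrc, RegDst, RegWrite, ALUSrcX, ALUSrcY, AddSub, LogicFn, and FnType.]
[Figure: microprogrammed control unit. A multiplexer with inputs 0 to 3 selects the next MicroPC value from the incremented MicroPC, dispatch table 1, dispatch table 2, or address 0. The MicroPC addresses the microprogram memory or PLA, whose data output is held in the microinstruction register. The dispatch tables are indexed by op from the instruction register, and the sequence control field drives the multiplexer.]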
(For a detailed explanation with a microprogram example, refer to pages 269-271 of the textbook by B. Parhami.)
IV. PIPELINING
11. PIPELINING CONCEPTS
Two strategies for achieving greater performance:
Strategy 1: Multiple-instruction-issue or superscalar organization: use multiple independent data paths that can accept several instructions read out at once.
Strategy 2: Pipelined or super-pipelined organization: overlap the execution of several instructions in the single-cycle design, starting the next instruction before the previous instruction has finished.
Pipelining:
Pipelining is an implementation technique in which multiple instructions are overlapped in execution. The computer pipeline is divided into stages.
Each stage completes a part of an instruction in parallel with the other stages. The stages are connected one to the next to form a pipe: instructions enter at one end, progress through the stages, and exit at the other end.
Pipelining does not decrease the time for individual instruction execution. Instead, it increases instruction throughput.
The throughput of the instruction pipeline is determined by how often an instruction exits the pipeline.
[Figure: pipelined execution of five instructions (Instr 1 to Instr 5) over cycles 1 to 9. Each instruction passes through the instruction cache, register file, ALU, data cache, and register file, starting one cycle after its predecessor. The horizontal axis is the time dimension and the vertical axis is the task dimension.]
In a task-time diagram, the stages of each task are horizontally aligned, and their positions along the horizontal axis represent the timing of their execution.
In a space-time diagram, the vertical axis represents the stages in the pipeline (the space dimension), and the boxes representing the various stages of a task are diagonally aligned.
Ideally, a q-stage pipeline can increase instruction execution throughput by a factor of q. In practice the gain is smaller because of the following:
o Effects of pipeline start-up and drainage
o Wastage due to unequal stage delays
o Time overhead of saving stage results in registers
o Safety margin in the clock period necessitated by clock skew
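The effect of start-up and drainage alone can be quantified with a short calculation. The sketch below is an idealized model, assuming equal stage delays, no stalls, no register overhead, and an unpipelined baseline that spends q cycles on each instruction; the function names are hypothetical.

```python
def pipeline_cycles(n_tasks: int, q_stages: int) -> int:
    """Cycles for n tasks on a q-stage pipeline: q to fill, then one per extra task."""
    return q_stages + n_tasks - 1


def ideal_speedup(n_tasks: int, q_stages: int) -> float:
    """Speedup over running each task alone for q cycles, ignoring all other losses."""
    return (n_tasks * q_stages) / pipeline_cycles(n_tasks, q_stages)


print(pipeline_cycles(7, 5))              # 7 tasks on a 5-stage pipeline
print(round(ideal_speedup(7, 5), 2))
print(round(ideal_speedup(1000, 5), 2))   # approaches q as the task count grows
```

For 7 tasks on a 5-stage pipeline the model gives 11 cycles, and the speedup only approaches the factor q for long instruction streams.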
[Figure: (a) Task-time diagram and (b) space-time diagram of 7 instructions on a 5-stage pipeline over cycles 1 to 11, with the start-up region and drainage region marked. Stage legend: f = Fetch, r = Reg read, a = ALU op, d = Data access, w = Writeback.]
Fig. Two abstract graphical representations of a 5-stage pipeline executing 7 tasks (instructions).
12. PIPELINE STALLS OR BUBBLES
BUBBLE INSERTION:
First detect the type of data dependency.
Bubble insertion: Inserting a redundant and harmless instruction (such as adding 0 to a register or shifting a register by 0 bits) before the next instruction. Such an instruction is called a no-op (no-operation) instruction. Because no-ops perform no useful task but still occupy memory, they resemble bubbles in a water pipe; hence the name bubble insertion.
Insertion of bubbles in a pipeline implies
o reduced throughput
o hurt performance when more than 2 bubbles are inserted.
So bubble insertion should be minimized. This can be done by moving a useful instruction of the program between the data-dependent instructions instead of inserting bubbles.
DATA FORWARDING:
Data forwarding is the bypassing of the ALU output of the 1st instruction directly to the ALU input where the 2nd instruction needs it, without first storing the output value of the 1st instruction. The diagrams below illustrate this.
Control dependency:
When a conditional branch is executed, the location of the next instruction depends on whether the branch condition is satisfied. Since branch instructions are based on testing register contents, the branch condition is resolved at the end of the 2nd pipeline stage. Therefore a bubble is required after every conditional branch instruction.
[Figure: pipelined execution of $5 = $6 + $7, $8 = $8 + $6, $9 = $8 + $2, and sw $9, 0($3) over cycles 1 to 8, with data forwarding paths marked.]
Read-after-write data dependency and its possible resolution through data forwarding.
[Figure: pipelined execution of Instr 1 to Instr 5 over cycles 1 to 9. One instruction writes into $8 and a later one reads from $8; bubbles are inserted to delay the reading instruction.]
[Figure: pipelined execution of sw $6, ..., lw $8, ..., and $9 = $8 + $2 over cycles 1 to 8, with the options "Reorder?" and "Insert bubble?" marked. Note in the figure: without data forwarding, three (two) bubbles are needed to resolve a read-after-load data dependency.]
Read-after-load data dependency and its possible resolution through bubble insertion and data forwarding.
[Figure: pipelined execution of $6 = $3 + $5, a branch, and $9 = $8 + $2 over cycles 1 to 8, with the point where the branch is assumed resolved and the options "Insert bubble?" and "Reorder? (delayed branch)" marked.]
13. PIPELINE TIMING AND PERFORMANCE (Refer to page 284 of the textbook by B. Parhami)
14. PIPELINED DATA PATH DESIGN (Refer to pages 285-286 of the textbook by B. Parhami for a detailed description of each stage)
The pipelined data path for MicroMIPS is obtained by inserting latches or registers into the single-cycle data path.
The 5 pipeline stages are
1. Instruction fetch
2. Instruction decode and register access
3. ALU operation
4. Data memory access
5. Register writeback
[Figure: pipelined MicroMIPS data path divided into Stage 1 to Stage 5, showing the PC, instruction cache, register file, SE block, ALU, data cache, and next-address block, with the signals NextPC, IncrPC, SeqInst, Br&Jump, RegDst, RegWrite, ALUSrc, ALUFunc, DataRead, DataWrite, RetAddr, RegInSrc, and ALUOvfl.]
15. PIPELINED CONTROL
16. OPTIMAL PIPELINING
V. PIPELINE PERFORMANCE
17. DATA DEPENDENCIES AND HAZARDS
Data dependency in a pipeline: When the execution of one instruction depends on the completion of a previous instruction, that is, when one instruction requires data generated by a previous instruction, the situation is called a data dependency.
The generated data may reside in a register or memory location where the subsequent instruction expects to find the value.
In the diagram below, each of the 2nd through 5th instructions reads a register written by the 1st instruction.
o The 5th instruction needs the content of register $2 after the 1st instruction has completed its register writeback, which causes no problem.
o The 4th instruction needs the new content of register $2 in the same cycle in which the 1st instruction produces it, which results in a minor problem.
o The 2nd and 3rd instructions need the new content of register $2 before the 1st instruction has written it back. This is the major data dependency problem.
Data dependencies in a pipeline can cause pipeline stalls, which diminish performance.
Types of data dependency:
o Read-after-compute: register access after the register has been updated with a computed value. This dependency exists when one instruction updates a register with a computed value and a subsequent instruction uses the content of that register as an operand.
o Read-after-load: register access after the register has been updated with data from memory. This dependency arises when one instruction loads a new value from memory into a register and a subsequent instruction uses the content of that register as an operand.
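The two dependency types can be told apart mechanically from the producing instruction. The sketch below is illustrative only: the Instr record, its field names, and the example instructions are hypothetical, and lw is taken as the only load instruction.

```python
from typing import NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    op: str                    # mnemonic, e.g. "sub" or "lw"
    dest: Optional[str]        # register written, or None
    srcs: Tuple[str, ...]      # registers read as operands


def classify_dependency(producer: Instr, consumer: Instr) -> Optional[str]:
    """Name the data dependency of consumer on producer, or None if there is none."""
    if producer.dest is None or producer.dest not in consumer.srcs:
        return None
    # A value loaded from memory gives read-after-load; anything else is computed.
    return "read-after-load" if producer.op == "lw" else "read-after-compute"


compute = Instr("sub", "$2", ("$1", "$3"))      # $2 = $1 - $3
load = Instr("lw", "$2", ("$12",))              # $2 loaded from memory
reader = Instr("and", "$12", ("$2", "$5"))      # reads $2

print(classify_dependency(compute, reader))
print(classify_dependency(load, reader))
```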
[Figure: pipelined execution over cycles 1 to 9 of the instruction $2 = $1 - $3 followed by four instructions that read register $2.]
(The topics below are clear and readable in the book; refer to pages 298-308 of the textbook by B. Parhami.)
18. DATA FORWARDING:
Resolving data dependencies via forwarding: When a previous instruction writes back a value computed by the ALU into a register, the data dependency can always be resolved through forwarding.
[Figure: pipelined execution over cycles 1 to 9 of the instruction $2 = $1 - $3 followed by instructions that read register $2, with the dependencies resolved by forwarding.]
Certain Data Dependencies Lead to Bubbles: When the immediately preceding instruction writes a value
read out from the data memory into a register, the data dependency cannot be resolved through forwarding
(i.e., we cannot go back in time) and a bubble must be inserted in the pipeline.
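The two rules above (forwarding resolves dependencies on ALU results, but a load followed immediately by a reader of the loaded register costs a bubble) can be turned into a small counter. This is a simplified sketch: instructions are hypothetical (op, dest, srcs) tuples, lw is the only load, and only adjacent instruction pairs are examined.

```python
def count_load_use_bubbles(program: list) -> int:
    """Bubbles needed when forwarding is available: one per load-then-use pair."""
    bubbles = 0
    for producer, consumer in zip(program, program[1:]):
        op, dest, _ = producer
        _, _, srcs = consumer
        # Only a load feeding the very next instruction cannot be forwarded in time.
        if op == "lw" and dest in srcs:
            bubbles += 1
    return bubbles


# Hypothetical two-instruction programs.
alu_chain = [("sub", "$2", ("$1", "$3")), ("and", "$12", ("$2", "$5"))]
load_use = [("lw", "$2", ("$12",)), ("and", "$4", ("$2", "$5"))]

print(count_load_use_bubbles(alu_chain))
print(count_load_use_bubbles(load_use))
```

Placing an independent instruction between the load and its reader removes the pair and with it the bubble, which is the reordering alternative to bubble insertion.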
[Figure: pipelined execution over cycles 1 to 9 of the instruction lw $2,4($12) followed by instructions that read register $2.]
[Figure: branch prediction state diagrams with the states "Predict taken", "Predict taken again", "Predict not taken", and "Predict not taken again", and transitions labeled "Taken" and "Not taken".]
[Figure: branch prediction hardware. Low-order bits of the PC are used as an index into a table holding the addresses of recent branch instructions, their target addresses, and history bit(s). A comparison with the PC and the prediction logic select the next PC from the incremented PC (input 0) or the stored target address (input 1).]
21. ADVANCED PIPELINING
[Figure: the three MicroMIPS data path implementations side by side with their control signals. Single-cycle: 125 MHz, CPI = 1. Multicycle: 500 MHz, CPI 4. Pipelined (Stage 1 to Stage 5): 500 MHz, CPI 1.1.]
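The clock rates and CPI values labeled in the diagram can be combined into an instruction rate for each design. The sketch below assumes the label "500 MHz, CPI 1.1" belongs to the pipelined data path and treats the CPI values as approximate; the function name is hypothetical.

```python
def mips_rating(clock_mhz: float, cpi: float) -> float:
    """Millions of instructions per second = clock rate in MHz / cycles per instruction."""
    return clock_mhz / cpi


# (clock in MHz, CPI) as labeled in the diagram; the pipelined pairing is an assumption.
designs = {
    "single-cycle": (125, 1.0),
    "multicycle": (500, 4.0),
    "pipelined": (500, 1.1),
}

for name, (clock_mhz, cpi) in designs.items():
    print(f"{name}: {mips_rating(clock_mhz, cpi):.1f} MIPS")
```

With these figures the single-cycle and multicycle designs come out equal, while the pipelined design is roughly 3.6 times faster than either.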