CSE 490/590 Homework 2, Spring 2014

Section 1:
1. This exercise is intended to help you understand the relationship between
delay slots, control hazards, and branch execution in a pipelined processor. In this
exercise, we assume that the following MIPS code is executed on a pipelined
processor with a five-stage pipeline, full forwarding, and a predict-taken branch
predictor:

a.
Label1: lw  $1, 40($6)
        beq $2, $3, Label2 ; Taken
        add $1, $6, $4
Label2: beq $1, $2, Label1 ; Not taken
        sw  $2, 20($4)
        and $1, $1, $4

b.
        add $1, $5, $3
Label1: sw  $1, 0($2)
        add $2, $2, $3
        beq $2, $4, Label1 ; Not taken
        add $5, $5, $1
        sw  $1, 0($2)

Draw the pipeline execution diagram for this code, assuming there are no delay slots and
that branches execute in the EX stage.



2. Assume that we have a multiple-issue pipelined processor with the following number of
pipeline stages, instructions issued per cycle, stage in which branch outcomes are resolved,
and branch predictor accuracy:

     Pipeline   Issue   Branches execute   Branch predictor   Branches as a %
     depth      width   in stage           accuracy           of instructions
a.   10         4       7                  80%                20%
b.   25         2       17                 92%                25%


Control hazards can be eliminated by adding branch delay slots. How many delay slots must
follow each branch if we want to eliminate all control hazards in this processor?
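One hedged way to reason about the count: a branch fetched in stage 1 and resolved in stage k lets fetch run for k - 1 more cycles before it can be redirected, and each of those cycles brings in a full issue group. The sketch below assumes that the branch occupies the last slot of its own fetch group and that fetch is redirected in the cycle after the branch resolves. The helper name `delay_slots_needed` is hypothetical and not part of the assignment. If the branch may sit in an earlier slot of its group, the remaining slots of that group would have to be counted as well.

```python
def delay_slots_needed(branch_stage: int, issue_width: int) -> int:
    """Instructions fetched after the branch and before its outcome is known.

    Assumptions (not stated in the assignment): the branch is the last
    instruction in its fetch group, and the redirected fetch happens in the
    cycle after the branch reaches `branch_stage`.
    """
    if branch_stage < 1 or issue_width < 1:
        raise ValueError("branch_stage and issue_width must be positive")
    # (branch_stage - 1) further fetch cycles, each bringing in a full group.
    return (branch_stage - 1) * issue_width


if __name__ == "__main__":
    # Configurations a and b from the table above: (resolve stage, issue width).
    for name, stage, width in (("a", 7, 4), ("b", 17, 2)):
        print(f"{name}. {delay_slots_needed(stage, width)} delay slots")
```

The pipeline depth does not enter the calculation under these assumptions; only the stage in which the branch resolves and the issue width matter.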

3. This exercise examines how exception handling interacts with branch and load/store
instructions. Problems in this exercise refer to the following branch instruction and the
corresponding delay slot instruction:

Branch and delay slot
a. beq $1, $0, Label
   sw  $6, 50($1)
b. beq $5, $0, Label
   nor $5, $4, $3

a. Assume that this branch is correctly predicted as taken, but then the instruction at
Label is an undefined instruction. Describe what is done in each pipeline stage for
each cycle, starting with the cycle in which the branch is decoded up to the cycle in
which the first instruction of the exception handler is fetched.

b. Repeat part a, but this time assume that the instruction in the delay slot also
causes a hardware error exception when it is in the MEM stage.

c. What is the value in the EPC if the branch is taken but the delay slot causes an
exception? What happens after the execution of the exception handler is completed?

Section 2:

1. We have a program core consisting of five conditional branches. The program core will be
executed thousands of times. Below are the outcomes of each branch for one execution of the
program core (T for taken, N for not taken).

Branch 1: T T T

Branch 2: N N N N

Branch 3: T N T N T N

Branch 4: T T T N T

Branch 5: T T N T T N T

Assume the behavior of each branch remains the same for each program core execution. For
dynamic schemes, assume each branch has its own prediction buffer and each buffer is initialized
to the same state before each execution. List the predictions for the following branch prediction
schemes:

A. Always taken
B. Always not taken
C. 1-bit predictor, initialized to predict taken
D. 2-bit predictor, initialized to weakly predict taken

What are the prediction accuracies? [15 points]
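The four schemes can be checked mechanically. The sketch below is one possible way to do so, under stated assumptions: the dashes in the outcome lists are read as separators, the 1-bit predictor simply repeats the last outcome, and the 2-bit predictor is modeled as a saturating counter (0 and 1 predict not taken, 2 and 3 predict taken, starting at 2 for "weakly taken"). Other 2-bit state machines exist and can give different predictions. The names `OUTCOMES`, `predictions`, and `accuracy` are illustrative only.

```python
# Outcomes of each branch for one execution of the program core.
OUTCOMES = {
    "Branch 1": "TTT",
    "Branch 2": "NNNN",
    "Branch 3": "TNTNTN",
    "Branch 4": "TTTNT",
    "Branch 5": "TTNTTNT",
}

SCHEMES = ("always_taken", "always_not_taken", "1bit", "2bit")


def predictions(outcomes: str, scheme: str) -> str:
    """Return the prediction made before each outcome, as a string of T/N."""
    last = "T"    # 1-bit state: initialized to predict taken
    counter = 2   # 2-bit saturating counter: initialized to weakly taken
    preds = []
    for actual in outcomes:
        if scheme == "always_taken":
            guess = "T"
        elif scheme == "always_not_taken":
            guess = "N"
        elif scheme == "1bit":
            guess = last
            last = actual  # remember the most recent outcome
        elif scheme == "2bit":
            guess = "T" if counter >= 2 else "N"
            # Move toward 3 on taken, toward 0 on not taken, saturating.
            counter = min(counter + 1, 3) if actual == "T" else max(counter - 1, 0)
        else:
            raise ValueError(f"unknown scheme: {scheme}")
        preds.append(guess)
    return "".join(preds)


def accuracy(scheme: str) -> float:
    """Fraction of correct predictions over all five branches."""
    correct = total = 0
    for outcomes in OUTCOMES.values():
        preds = predictions(outcomes, scheme)
        correct += sum(p == a for p, a in zip(preds, outcomes))
        total += len(outcomes)
    return correct / total


if __name__ == "__main__":
    for scheme in SCHEMES:
        for name, outcomes in OUTCOMES.items():
            print(f"{scheme:17s} {name}: {predictions(outcomes, scheme)}")
        print(f"{scheme:17s} accuracy: {accuracy(scheme):.0%}")
```

Because every buffer is reset before each execution of the program core, simulating a single execution is enough to obtain the steady-state accuracy.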

2. In this exercise, we make several assumptions. First, we assume that an N-issue superscalar
processor can execute any N instructions in the same cycle, regardless of their types. Second, we
assume that every instruction is independently chosen, without regard for the instruction that
precedes or follows it. Third, we assume that there are no stalls due to data dependences, that
no delay slots are used, and that branches execute in the EX stage of the pipeline. Finally, we
assume that instructions executed in the program are distributed as follows:

     ALU    Correctly predicted beq    Incorrectly predicted beq    lw     sw
a.   50%    18%                        2%                           20%    10%
b.   40%    10%                        5%                           35%    15%

a. What is the CPI achieved by a 2-issue static superscalar processor on this program?
b. In a 2-issue static superscalar processor that only has one register write port, what speedup is
achieved by adding a second register write port?
c. For a 2-issue static superscalar processor with a classic five-stage pipeline, what speedup is
achieved by making the branch prediction perfect?
d. Repeat part c, but for a 4-issue processor. What conclusion can you draw about the importance
of good branch prediction when the issue width of the processor is increased?

3. Here is a series of address references given as word addresses: 2, 3, 11, 16, 21, 13, 64, 48, 19,
11, 3, 22, 4, 27, 6, and 11. Using this series of references, show the hits and misses and the final
cache contents for a direct-mapped cache with four-word blocks and a total size of 16 words.
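A small simulation can be used to check a hand-drawn hit/miss table. The sketch below assumes the cache starts empty and that each word address maps to block number address // 4 and to cache index (block number) mod 4, since 16 words at 4 words per block gives 4 blocks. The names `REFERENCES` and `simulate_direct_mapped` are illustrative, not part of the assignment.

```python
BLOCK_WORDS = 4                        # words per block
CACHE_WORDS = 16                       # total cache size in words
NUM_BLOCKS = CACHE_WORDS // BLOCK_WORDS

REFERENCES = [2, 3, 11, 16, 21, 13, 64, 48, 19, 11, 3, 22, 4, 27, 6, 11]


def simulate_direct_mapped(addresses):
    """Return (trace, lines) for a direct-mapped cache that starts empty.

    trace is a list of (address, index, "hit" or "miss") tuples.
    lines[i] is the block number held at index i at the end, or None if empty.
    """
    lines = [None] * NUM_BLOCKS
    trace = []
    for addr in addresses:
        block = addr // BLOCK_WORDS    # which memory block holds this word
        index = block % NUM_BLOCKS     # which cache line that block maps to
        hit = lines[index] == block
        trace.append((addr, index, "hit" if hit else "miss"))
        lines[index] = block           # on a miss the block replaces the old one
    return trace, lines


if __name__ == "__main__":
    trace, lines = simulate_direct_mapped(REFERENCES)
    for addr, index, result in trace:
        print(f"address {addr:2d} -> index {index}: {result}")
    for index, block in enumerate(lines):
        if block is None:
            print(f"index {index}: empty")
        else:
            first = block * BLOCK_WORDS
            print(f"index {index}: words {first}..{first + BLOCK_WORDS - 1}")
```

Storing the full block number in each line is equivalent to storing a tag plus the implied index, and keeps the sketch short.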

4. The following problems refer to this instruction sequence:

        lw  $1, 40($6)
        beq $2, $0, Label ; Assume $2 = $0
        sw  $6, 50($2)
Label:  add $2, $3, $4
        sw  $3, 50($4)


a. For this problem, assume that all branches are perfectly predicted, eliminating all control hazards,
and that no delay slots are used. If we change load/store instructions to use a register without an
offset as the address, these instructions no longer need to use the ALU. As a result, MEM and EX
can be overlapped and the pipeline has only 4 stages. Change the code to accommodate this
changed ISA. Assuming this change does not affect clock cycle time, what speedup is achieved in
this instruction sequence?

b. Assuming stall on branch and no delay slots, what speedup is achieved on this code if branch
outcomes are determined in the ID stage, relative to the execution where branch outcomes are
determined in the EX stage?
