
# EE457 Quiz (~10%)

## Closed-book Closed-notes Exam; No cheat sheets; No cell phones or computers

Calculators and Verilog Guides are not needed and hence not allowed.

Fall 2015
Friday, 9/25/2015 (A 2H 50M exam)
11:00 AM - 01:50 PM in THH201; 02:00 PM - 04:50 PM in THH101

Student's Last Name: _______________________________________

Student's First Name: _______________________________________

Student's DEN Bb username: ______________________________@usc.edu

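| Ques# | Topic | Page# | Time | Points | Score |
|---|---|---|---|---|---|
| 1 | | 2-4 | 30 min. | 40 | |
| 2 | | 4-5 | 30 min. | 40 | |
| 3 | CPU Performance | | 25 min. | 30 | |
| 4 | | | 25 min. | 34 | |
| 5 | | | 20 min. | 36 | |
| 6 | Single-Cycle CPU | 9-10 | 30 min. | 42 | |
| Total | | | 160 min. | 222 | |
| Perfect Score | | | | 210 | |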

## University of Southern California

September 25, 2015 1:59 am


## 1 (40 points) 30 min. State Diagram and RTL design:

1.1

In preparation for the next part of the question, we have given below a solved question which is
similar to, but simpler than, the question in the next part.
This is similar to your HW#1A problem of finding the largest number divisible by 7 among 16
8-bit unsigned numbers. But here, we do not need the largest number divisible by 7. We need
to copy numbers from array A to array B if they are divisible by 5. Four-bit counters I and J are
indexes into A and B. The solution below is complete. Please go through it.
Figure: solved state diagram for 1.1.

Example contents shown: A = 15, 35, 20, 14, 70, 0; B = 15, 35, 20, 70, 16, 16. B has all 16s initially, which are invalid!

INI: I <= 0; J <= 0;

LOAD_A: X <= A[I]; if (A[I] == 5) {B[J] <= A[I]; J <= J + 1;} if (A[I] <= 5) I <= I + 1;

DIV_5: X <= X - 5; if (X == 5) {B[J] <= A[I]; J <= J + 1;} if (X <= 5) I <= I + 1;

DONE

Transition conditions shown: (A[I] > 5), (A[I] <= 5)(I != 15), (X > 5), (X <= 5)(I == 15).
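
The solved design of 1.1 can also be checked in software. The following is a minimal Python sketch of the INI / LOAD_A / DIV_5 / DONE machine; it is not part of the original quiz. It models the nonblocking assignments by computing every next value from the current values only. The function name `copy_divisible_by_5`, the unconditional INI to LOAD_A transition, and the two transitions that mirror the ones printed in the diagram, namely (A[I] <= 5)(I == 15) to DONE and (X <= 5)(I != 15) back to LOAD_A, are assumptions.

```python
def copy_divisible_by_5(a):
    """Cycle-level model of the 1.1 state machine (hypothetical helper).

    a: list of 16 unsigned 8-bit values. Returns array B once DONE is reached.
    """
    assert len(a) == 16
    b = [16] * 16            # B starts filled with the invalid value 16
    i = j = x = 0            # I and J are 4-bit counters, X is an 8-bit register
    state = "INI"
    while state != "DONE":
        # Nonblocking semantics: next values depend only on current values.
        ni, nj, nx, nstate = i, j, x, state
        if state == "INI":
            ni, nj = 0, 0
            nstate = "LOAD_A"                 # assumed transition out of INI
        elif state == "LOAD_A":
            nx = a[i]
            if a[i] == 5:
                b[j] = a[i]
                nj = (j + 1) % 16
            if a[i] <= 5:
                ni = (i + 1) % 16
            if a[i] > 5:
                nstate = "DIV_5"
            elif i == 15:
                nstate = "DONE"               # assumed by symmetry with DIV_5
        else:  # DIV_5
            nx = (x - 5) % 256
            if x == 5:
                b[j] = a[i]
                nj = (j + 1) % 16
            if x <= 5:
                ni = (i + 1) % 16
                # Leaving to LOAD_A when I != 15 is assumed by symmetry.
                nstate = "DONE" if i == 15 else "LOAD_A"
        i, j, x, state = ni, nj, nx, nstate
    return b


sample = [15, 35, 20, 14, 70] + [1] * 10 + [0]
print(copy_divisible_by_5(sample)[:6])
```

With the sample array, whose first five entries match the figure, the first six entries of B come out as 15, 35, 20, 70, 16, 16.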


Figure: state diagram for Question 1.2 narrated on the next page. Please complete it. (40 pts)

INI: I <= 0; J <= 0; K <= 0;

LOAD_A: X <= A[I]; if (A[I] == 5) {B[J] <= A[I]; J <= J + 1;} if (A[I] <= 5) I <= I + 1;

DIV_5: X <= X - 5; if (X == 5) {B[J] <= A[I]; J <= J + 1;} if (X <= 5) I <= I + 1;

LOAD_B: X <= B[J]; if (B[J] == 7) {C[K] <= B[J]; K <= K + 1;} if (B[J] <= 7) J <= J + 1;

DIV_7: X <= X - 7; if (X == 7) {C[K] <= B[J]; K <= K + 1;} if (X <= 7) J <= J + 1;

DONE

Transition conditions shown: (A[I] > 5), (X > 5), (B[J] > 7), (X > 7), (B[J] <= 7)(J != Jmax).

1.2

Now we have 3 arrays, A, B, and C, each 16x8 in size. B and C are
initially filled with 16s, which are invalid numbers for
B and C. Copy a number from A to B if it is divisible by 5. Then,
from this subset of numbers in B, copy the numbers that are also
divisible by 7 into array C. A number like 14 in A does not get
copied to B and hence does not go into C, as it did not make it to B.
Since 5 and 7 are prime numbers, numbers copied to C are
necessarily divisible by 35. It is possible that all 16 numbers get
copied to B as well as C, or some to B and a subset of them (or
nothing) to C. Besides the 4-bit counters I and J, we have a 4-bit
counter K, which is an index into C.

Example contents shown: A = 15, 35, 20, 14, 70, 0; B = 15, 35, 20, 70, 16, 16; C = 35, 70, 16, 16, 16, 16. B and C are initially filled with 16s, which are invalid.

Mr. Bruin took the completed state diagram of Q 1.1 and made
the (incomplete) state diagram on the previous page. He added
two states LOAD_B and DIV_7 which are similar to the
LOAD_A and DIV_5 states. He assumed a Jmax register and used it in the LOAD_B and DIV_7
states but did not know how and when to set it to the right value in the LOAD_A and DIV_5
states, nor how and when to reinitialize J to zero.
That is when you, Mr. Trojan, were called in to help. Please complete the state diagram. Here,
instead of Verilog's begin .. end, we are using curly braces {}. Also, as in
Verilog, we assume that in our state diagram a later assignment to a reg variable in a
procedure overrides an earlier assignment to the same reg variable.

## 2 (40 points) 30 min.

2.1

Variation of the min/max lab: In Part 3 Method 1, you tried to hold on to control in the CMx or the
CMn state because you expected the data to be made up of either ascending or descending chunks.
Now we are told that most of the time the data oscillates, and it is best to give control to the
other party whether or not the current M[I] is useful to you. So there is no loop at all on any of
the 4 significant states. Of course, if it is not useful to you (i.e. if M[I] < Max in the CMx state or
M[I] > Min in the CMn state), you should not increment the I counter, so that the other party can
take a look at the same data in the CMn_Second or CMx_Second state respectively. The
"_Second" suffix tells us that this state is the second to look at the data; hence the I counter is
incremented unconditionally in the _Second states.
All needed states, state transition arrows, and RTL within each state are already completed in
the incomplete state diagram on the next page. Please complete the state transition conditions
on the next page.

2.2 (6 pts)

Mr. ___________ (Bruin/Trojan) says that in the Min/Max lab, if all 16 data items are the same
(identical, say 40H = 0100_0000B), it takes 16 clocks to process them in any of the 6 parts.
How many clocks does such data take in the above design? ___________.
Note: We do not count the INI and the DONE states in counting the clocks.

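Figure: incomplete state diagram for Question 2.1 (30+4 pts). The state transition conditions are to be filled in.

INI (entered on Reset, left on Start): I <= 0;

RTL block shown without a state label: Max <= M[I]; I <= I + 1;

CMx (Compare with Max): {Max <= M[I]; I <= I + 1;}

CMn (Compare with Min): {Min <= M[I]; I <= I + 1;}

CMx_Second (Compare with Max): I <= I + 1; if (M[I] >= Max) Max <= M[I];

CMn_Second (Compare with Min): I <= I + 1; if (M[I] <= Min) Min <= M[I];

DONE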

## 3 (30 points) 25 min. Performance

A multi-cycle CPU has three types of instructions: Q_type (Quick_type), M_type (Medium_type),
and S_type (Slow_type). They are so named because the Quick type instruction takes the
fewest clocks to execute, whereas the Slow type instruction takes the most
clocks to execute. Find X and Y based on the information provided in the table below.

| | Q_type | M_type | S_type |
|---|---|---|---|
| Number of clocks | X | Y | 25 |
| Percentage of execution time of the Benchmark | 40 | 10 | 50 |
| Frequency of occurrence in the dynamic execution trace of the Benchmark | 50 | 10 | 40 |
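
The relationship behind such a table can be illustrated with a short Python sketch, which is not part of the original quiz. For each instruction type, its share of execution time is its frequency times its clock count, divided by the sum of those products over all types. The function name `execution_time_shares` and the two-type example numbers are hypothetical and deliberately differ from the quiz data.

```python
def execution_time_shares(freq, cpi):
    """Return each instruction type's fraction of total execution time.

    freq: dict mapping type name to frequency of occurrence (any common unit).
    cpi:  dict mapping type name to clocks per instruction.
    """
    cycles = {name: freq[name] * cpi[name] for name in freq}
    total = sum(cycles.values())
    return {name: cycles[name] / total for name in cycles}


# Hypothetical two-type mix: 60% "fast" at 2 clocks, 40% "slow" at 3 clocks.
freq = {"fast": 60, "slow": 40}
cpi = {"fast": 2, "slow": 3}
shares = execution_time_shares(freq, cpi)
print(shares)
```

In this made-up mix, both types contribute 120 cycle units, so each accounts for half of the execution time even though their frequencies differ.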

## Calculation for X and Y: (14 pts)

If you were to improve the CPI of one instruction by one clock, you would choose _______ (Q / M / S)
because of its ________ (A / B / C / D). Note: CPI improvement is different from speeding up by a factor.
A. highest number of clocks taken
B. highest Percentage of Execution time
C. highest Frequency of occurrence
D. other (state here if you chose this) __________________________________________________

Your colleague offered to reduce the CPI of one of the three instructions by 2 clocks, provided you agree
to increasing the CPI of the other two instructions by 1 clock each. Your detailed response? (12 pts)


## 4 (34 points) 25 min.

4.1

X (X2X1X0) and Y (Y2Y1Y0) are 3-bit unsigned numbers. You need to compute S = X + Y - 8 and
produce the 4-bit signed result S (S3S2S1S0) represented in the 2's complement system. Can the
result fit in S for all possible X and Y? Explain. ______________________________________
____________________________________________________________________________

4.1.1

Produce the above S using the 3-bit RCA on the side. From C3, you can cleverly
produce S3.

[Figure: 3-bit RCA built from three full adders (ports a, b, cin, cout, s) with carries C0 to C3. Signals to connect: X2, X1, X0, Y2, Y1, Y0, S3, S2, S1, S0.]

4.2 (2+6 pts)

Using the 4-bit RCA on the side as a subtracter, produce D = A - B, where D is
a 5-bit signed number and A and B are 4-bit signed numbers, all three represented in the
2's complement system. Produce the 4-bit SOV (signed overflow) signal and, using
this and D3 (and your experience in the Lab #3 ALU with SLT), produce D4 based on the
observation that if the 4-bit signed overflow did not occur, D4 and D3 are
___________ (the same / different), and if the 4-bit signed overflow did occur, D4
and D3 are _________ (the same / different).

[Figure: 4-bit RCA built from four full adders (ports a, b, cin, cout, s) with carries C0 to C4. Signals to connect: A3, A2, A1, A0, B3, B2, B1, B0, D4, D3, D2, D1, D0.]

4.3 (15 pts)

You are given the first five bits of two 16-bit numbers below.
A = 1 1 0 0_1 X X X_X X X X_X X X X
B = 1 1 0 1_1 X X X_X X X X_X X X X
The highlighted bit (bit 12) is the only one of the upper five bits that differs. The lower 11 bits can be any bits
and may or may not match between A and B.
A. A and B are unsigned numbers. The higher is ________________ (A / B / can't tell).
B.1. A and B are signed numbers represented in the 2's complement system.
The higher is ________________ (A / B / can't tell).
B.2. A and B are signed numbers represented in the 1's complement system.
The higher is ________________ (A / B / can't tell).
B.3. A and B are signed numbers represented in the Sign-Magnitude system.
The higher is ________________ (A / B / can't tell).
C.1. The Sum S produced by adding A and B is _______________ (right / wrong / can't tell) if A, B, and
S are all unsigned numbers.
C.2. The Sum S produced by adding A and B is _______________ (right / wrong / can't tell) if A, B, and
S are all signed numbers represented in the 2's complement system.
D. If we use the Adder/Subtracter design of our Ch #4 class-notes to produce the difference D (D=A-B),
the Raw Carry C32 will be __________ (a 0 / a 1 / can't tell) and the V bit will be __________ (a 0 / a 1
/ can't tell). The difference D is __________ (right / wrong / can't tell) if A, B, D are all unsigned. The
difference D is __________ (right / wrong / can't tell) if A, B, D are all signed numbers in 2's complement.


## 5 (36 points) 20 min.

Part point values shown in the margin: 9 pts, 11 pts.

5.1

Common CISC practice is to have ________________________________________________
(one ADD instruction / two separate ADD and ADDU instructions). _______ (Like / Unlike) the
CISC processors, the MIPS processor _____________________ (has / does not have) a BCS
(Branch if Carry is Set) instruction ________________________________________________
____________________________________________________________________________
The result deposited into the destination register \$3 by ADD \$3, \$1, \$2 and ADDU \$3, \$1, \$2
_______ (is / isn't) identical. A trap is caused by ____________ (ADD / ADDU / both / none)
when _______________________________________________________________________

5.2

The ______ (1/2/3) instruction(s) _______________ (preceding/following) the JAL instruction
in execution, together with the JAL instruction in MIPS, make up the CISC CALL instruction.
Similarly, the ______ (1/2/3) instruction(s) _______________ (preceding/following) the JR \$31
instruction in execution, together with the JR \$31 instruction in MIPS, make up the CISC RTN
(Return) instruction. There _______ (is a / is no) PUSH instruction in MIPS because the
hardware _________ (does / doesn't) support SP (the stack pointer). _____________ (Hence /
However) the programmer ______________________________________________________
____________________________________________________________________________
____________________________________________________________________________

5.3

Intel follows the ___________ (Little Endian / Big Endian) system. In the Intel 80486 processor
system address space, byte 0000_400FH is the ____________ (most / least) significant byte of the
32-bit word with system address ______________ (state in hexadecimal).
The 32-bit word 4000 consists of the four bytes 4000, 4001, 4002, and 4003 in ____________
____________________ (Little-Endian / Big-Endian / both kinds of / neither kind of) processor.

5.4

If a 32-bit processor is not a Byte-Addressable processor, the programmer ________________
(can still / cannot) manipulate bytes _______ (even / just) by using more instructions.

5.5

Shown on the side is the memory interface to a byte-wide
SRAM memory chip in a memory system based on the minimum
number of byte-wide banks for a USC128 processor (128-bit
data, 32-bit logical address, byte-addressable processor).

Fill in the 3 blanks (marked by the 3 arrows) in the figure on
the side. Also find the system addresses corresponding to the
lowest-addressed two bytes of this memory chip.
The lowest-addressed two bytes of this chip map to the system
byte addresses (in hex) _______________________________
_________________________________________________.

[Figure: memory interface of the byte-wide SRAM chip. Labels shown: chip size ____KB; chip address pins A[ ____ ]; system address lines A[19:4] and A31 down to A20; chip data pins D[7:0] and system data lines D[ ____ ]; control signals WE, RD, CS, and BE15.]

The system addresses mapping to any location in this memory
chip will have the same upper ________ (state a number) bits,
namely ______________ (state their labels in the form X[13:2]).

## 6 (18 + 24 = 42 points) 25 min. Single-cycle CPU:

You are familiar with the ordinary jump instruction J (Jump with the 26-bit jump address field),
Jal (Jump and Link), and Beq (Branch if Equal). In class we discussed a made-up instruction,
Beqal (Branch if equal and link, a conditional call instruction). You have also reviewed the
implementation of a made-up instruction, Bzim, as narrated below.

Bzim rt, offset (rs); //Bzim = branch if zero indirectly through memory

if (rt) == 0, then (PC) <= M [offset + rs];

Branch if rt is zero to the special branch target address, which
is the content of the location in memory whose address is
calculated by adding the sign-extended offset to the contents of rs.

Here we are replacing Beqal with Bzimal. This is also a conditional call instruction, but the
condition is as stated in the Bzim instruction above.
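
The Bzim semantics narrated above can be expressed as a small next-PC function. This is a minimal Python sketch, not part of the original quiz and not a description of the datapath to be completed. The function names, the dictionary used as word-addressed memory, and the PC + 4 fall-through when rt is nonzero (the usual MIPS sequential fetch) are assumptions. The link behaviour of Bzimal is intentionally left out.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed 2's complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def bzim_next_pc(pc, regs, mem, rs, rt, offset):
    """Return the next PC for 'Bzim rt, offset(rs)' (hypothetical helper).

    regs: list of 32 register values; mem: dict from byte address to 32-bit word.
    """
    if regs[rt] == 0:
        # Branch taken: the target is read from memory at offset + (rs).
        address = (regs[rs] + sign_extend16(offset)) & 0xFFFFFFFF
        return mem[address]
    # Branch not taken: fall through to the next sequential instruction.
    return (pc + 4) & 0xFFFFFFFF


regs = [0] * 32
regs[8] = 0x1000                      # base address held in rs
mem = {0x0FFC: 0x00400100}            # branch target stored in memory
print(hex(bzim_next_pc(0x00400000, regs, mem, rs=8, rt=9, offset=0xFFFC)))
```

Here the 16-bit offset 0xFFFC sign-extends to -4, so the target is fetched from address 0x0FFC because register 9 holds zero.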

6.1

The data path on the next page is nearly complete. Complete the connections to the loose ends
marked with 9 arrows.

6.2 (24 pts)

Control Signal Table: Complete the four rows and four columns. Whenever possible, use don't cares.

Instruction rows: R-format, lw, sw, beq, Bzim, Bzimal, J, Jal

Control signal columns: RegDst, ALUSrc, Memtoreg, RegWrite, MemWrite, Branch, ALUOp1, ALUop0, JUMP, BZIM, BZIMal

## Blank area for rough work:

It is not difficult to get an A in EE457. You need to work for it and seek help from the 457 teaching team on whatever you do not understand. We are eager to help you.
The next three topics, pipelined CPU, cache and virtual memory are interesting and challenging too. They are the focus of the midterm exam. Then we cover advanced topics. Best!
Gandhi, TAs: Sanmukh, Jizhe, Fangzhou, Pezhman, Mentors: Heqing, Madhusudhan, Ninad, Srikar,
HW Graders: Sukruthi, Shunqing, Sailesh, Jingyu, Tushar, Lab graders: Goutam, Feng, Shuyuan, Yanjingtian, Fengle

[Figure: single-cycle datapath for Question 6.1 (18 pts), to be completed. Blocks shown: PC, Instruction memory, register file, Sign extend (16 to 32), two Shift left 2 units, ALU with Zero and ALU result outputs, ALU control (driven by the function field, Instruction [5:0]), Data memory, Control (driven by Instruction [31:26]), and several 2-to-1 muxes. Signals labelled: RegDst, Branch, MemtoReg, ALUOp, MemWrite, ALUSrc, RegWrite, RegUpdate, JUMP, BZIM, BZIMAL, PCSrc, BZIM_SUCCESS, Zero, and Rt ZVC. Instruction fields labelled: Instruction [31:0], Instruction [31:26], Instruction [15:0], Instruction [5:0].]