
Sample Exam Questions

1) Explain the following terms:


• Carry select
• Reduction of summands (multiplication)
• Multiplicative division
• Non-restoring division
• Size gap
• CISC

2)
a) Name major characteristics of the RISC philosophy:
b) Name major characteristics of the CISC philosophy:

3)
A program contains the following instruction mix:
• 60% load/store instructions with an execution time of 1.2 microseconds each
• 10% ALU instructions with an execution time of 0.8 microseconds each
• 30% branch instructions with an execution time of 1.0 microsecond each
a) If the clock period is 0.2 microseconds, calculate the average clock cycles
per instruction (CPI) for the program.
b) What is the average million-instructions-per-second (MIPS) rate of the
program?
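
A minimal sketch of the arithmetic behind question 3, assuming each instruction class takes (execution time / clock period) cycles and that the MIPS rate is the reciprocal of the average instruction time in microseconds. The names `mix`, `average_cpi`, and `mips_rate` are illustrative and not part of the exam.

```python
# Instruction mix from question 3: (fraction of instructions, execution time in microseconds).
mix = {
    "load/store": (0.60, 1.2),
    "alu": (0.10, 0.8),
    "branch": (0.30, 1.0),
}
clock_period_us = 0.2


def average_cpi(mix, clock_period_us):
    """Weighted average of cycles per instruction over the instruction mix."""
    return sum(frac * (time_us / clock_period_us) for frac, time_us in mix.values())


def mips_rate(cpi, clock_period_us):
    """Instructions per microsecond, which equals millions of instructions per second."""
    return 1.0 / (cpi * clock_period_us)


cpi = average_cpi(mix, clock_period_us)
mips = mips_rate(cpi, clock_period_us)
print(f"CPI = {cpi:.2f}, MIPS = {mips:.3f}")
```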

4)
Assume floating point square root (FPSQR) is responsible for 25% of the execution
time of a benchmark on a machine. One proposal is to add FPSQR hardware that
will speed up this operation by a factor of 10. The other alternative is just to make
all floating point (FP) instructions run two times faster. FP instructions are
responsible for a total of 40% of the execution time. Compare the performance of these
two design alternatives.
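
One way to set up question 4 is with Amdahl's law. The sketch below assumes the stated percentages are fractions of the original execution time; `amdahl_speedup` is an illustrative helper name.

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup when only a fraction of the execution time is accelerated."""
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


# Alternative 1: FPSQR hardware, 25% of the time, 10 times faster.
speedup_fpsqr = amdahl_speedup(0.25, 10.0)
# Alternative 2: all FP instructions, 40% of the time, 2 times faster.
speedup_all_fp = amdahl_speedup(0.40, 2.0)

print(f"FPSQR hardware: {speedup_fpsqr:.3f}")
print(f"Faster FP:      {speedup_all_fp:.3f}")
```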

5)
Assume we have a machine where the CPI is 2.0 when all memory accesses
(instruction fetches and data fetches) hit in the cache. The only data accesses are
loads and stores (note, these are one address type instructions), and these total 40%
of the instructions (the rest of instructions are dealing with registers). If the miss
penalty is 25 clock cycles and the miss rate is 2%, how much faster would the
machine be if all accesses were cache hits?
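
A possible setup for question 5, assuming every instruction makes one instruction fetch plus, for 40% of instructions, one data access, and that the 2% miss rate applies to both kinds of access. The function name is illustrative.

```python
def cpi_with_misses(base_cpi, data_access_fraction, miss_rate, miss_penalty):
    """CPI including memory stall cycles (assumes one instruction fetch per instruction)."""
    accesses_per_instruction = 1.0 + data_access_fraction
    stall_cycles = accesses_per_instruction * miss_rate * miss_penalty
    return base_cpi + stall_cycles


cpi_all_hits = 2.0
cpi_real = cpi_with_misses(cpi_all_hits, 0.40, 0.02, 25)
ratio = cpi_real / cpi_all_hits

print(f"CPI with misses = {cpi_real:.2f}")
print(f"All-hit machine is {ratio:.2f} times faster")
```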

6)
A RISC type workstation uses a 15-MHz processor with a claimed 10-MIPS rating
to execute a given program mix. Assume a one-cycle delay for each memory
access:
a) What is the effective CPI of this computer?
b) Suppose the processor is being upgraded with a 30-MHz clock. However, the
speed of the memory subsystem remains unchanged, and consequently two
clock cycles are needed per memory access. If 30% of the instructions require
one memory access and another 5% require two memory accesses per
instruction, what is the performance of the upgraded processor with a
compatible instruction set and equal instruction counts in the given program
mix?
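
A sketch of one common reading of question 6: the effective CPI is the clock rate divided by the MIPS rating, and after the upgrade each memory access costs one additional cycle (two instead of one). Instruction fetches are ignored here, which is an assumption of this sketch rather than something the question states.

```python
def effective_cpi(clock_mhz, mips):
    """CPI = clock rate (MHz) / MIPS rating."""
    return clock_mhz / mips


old_cpi = effective_cpi(15.0, 10.0)

# Extra cycles per instruction after the upgrade: one more cycle per memory access.
extra_cycles_per_access = 2 - 1
extra_cpi = (0.30 * 1 + 0.05 * 2) * extra_cycles_per_access
new_cpi = old_cpi + extra_cpi
new_mips = 30.0 / new_cpi

print(f"old CPI = {old_cpi:.2f}, new CPI = {new_cpi:.2f}, new MIPS = {new_mips:.2f}")
```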

7)
Three enhancements with the following speedups are proposed:
Speedup1 = 30
Speedup2 = 20
Speedup3 = 10
Only one enhancement is usable at a time:
a) If enhancements 1 and 2 are each usable for 30% of the time, what
fraction of the time must enhancement 3 be used to achieve an overall speedup of 10?
b) Assume the distribution of enhancement usage is 30%, 30%, and 20% for
enhancements 1, 2, and 3, respectively. Assuming all three enhancements are in use, for
what fraction of the reduced execution time is no enhancement in use?
c) Assume for some benchmark, the fraction of use is 15% for each of the
enhancements 1 and 2 and 70% for enhancement 3. We want to maximize performance.
If only one enhancement can be implemented, which should be chosen?
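
Question 7 can be approached with the multi-enhancement form of Amdahl's law. The sketch below assumes each usage percentage is a fraction of the original execution time; all helper names are illustrative.

```python
SPEEDUPS = {1: 30.0, 2: 20.0, 3: 10.0}


def new_time(fractions):
    """Execution time relative to the original, given usable fractions per enhancement."""
    unenhanced = 1.0 - sum(fractions.values())
    return unenhanced + sum(f / SPEEDUPS[k] for k, f in fractions.items())


def overall_speedup(fractions):
    return 1.0 / new_time(fractions)


# a) Solve 1/target = (1 - f1 - f2 - f3) + f1/S1 + f2/S2 + f3/S3 for f3.
def required_fraction_3(f1, f2, target):
    fixed = 1.0 - f1 - f2 + f1 / SPEEDUPS[1] + f2 / SPEEDUPS[2]
    return (fixed - 1.0 / target) / (1.0 - 1.0 / SPEEDUPS[3])


f3 = required_fraction_3(0.30, 0.30, 10.0)

# b) Share of the reduced execution time in which no enhancement is in use.
usage_b = {1: 0.30, 2: 0.30, 3: 0.20}
unenhanced_share = (1.0 - sum(usage_b.values())) / new_time(usage_b)

# c) Only one enhancement may be implemented: compare them one at a time.
usage_c = {1: 0.15, 2: 0.15, 3: 0.70}
single = {k: overall_speedup({k: f}) for k, f in usage_c.items()}
best = max(single, key=single.get)

print(f"a) f3 = {f3:.4f}")
print(f"b) unenhanced share = {unenhanced_share:.4f}")
print(f"c) best single enhancement = {best}, speedup = {single[best]:.3f}")
```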

8)
a) Define MIPS as a performance metric.
b) It has been argued that MIPS is not an accurate measure for comparing
performance among computers. Justify this claim with a discussion that is
straightforward and clear.

9)
a) Cray Y-MP/8 (a vector processor) has a cycle time of 6ns. During a cycle, the
results of both an addition and a multiplication can be completed. Furthermore,
there are eight processors operating simultaneously without interference in the
best case. Calculate the peak performance of the Cray Y-MP/8 (in MIPS).
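
A short sketch of the peak-rate arithmetic for question 9, assuming every processor completes two results (one addition and one multiplication) per cycle and counting each result as one operation.

```python
cycle_time_ns = 6.0
results_per_cycle = 2  # one addition and one multiplication
processors = 8

cycles_per_second = 1.0 / (cycle_time_ns * 1e-9)
peak_millions_per_second = processors * results_per_cycle * cycles_per_second / 1e6

print(f"Peak rate = {peak_millions_per_second:.1f} million results per second")
```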

10)
Consider the time needed to transfer a block of data from the main memory to the cache
when a read miss occurs. Cache block size is 8 words, it takes one clock cycle to send an
address to the main memory, it takes 8 clock cycles to read the first word and subsequent
words are read in 4 clock cycles per word, finally one clock cycle is needed to send one
word to the cache.
a) If a single memory module is used, what is the time needed to load a block from
main memory into the cache?
b) Now assume that the memory is 4-way interleaved. What is the time needed to
load a block from main memory into the cache?
c) Now assume the computer has L1 and L2 caches, each with a block size of 8 words.
The hit rate is the same for both caches: 0.95 for instructions and 0.90 for data.
The times needed to access an 8-word block in these caches are C1 = 1 and
C2 = 10 cycles, respectively:
a. What is the average access time experienced by the processor during an
instruction cycle, if the main memory uses interleaving (30% of instructions are
load/store) (use parameters defined earlier)?
b. What is the average access time during an instruction cycle, if the main memory
is not interleaved?
c. What is the improvement obtained with interleaving?
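
A hedged sketch of parts (a) and (b) of question 10 only. It assumes that, with a single module, sending a word to the cache overlaps with reading the next word, and that with interleaving all modules are accessed in parallel and their words are then sent one per cycle while the next group is being read. Other overlap assumptions give different totals. The constant and function names are illustrative.

```python
ADDRESS = 1      # cycles to send the address to main memory
FIRST_WORD = 8   # cycles to read the first word
NEXT_WORD = 4    # cycles to read each subsequent word
TRANSFER = 1     # cycles to send one word to the cache


def single_module_cycles(block_words=8):
    # Only the transfer of the last word is not hidden behind a memory read.
    return ADDRESS + FIRST_WORD + (block_words - 1) * NEXT_WORD + TRANSFER


def interleaved_cycles(block_words=8, modules=4):
    rounds = block_words // modules
    transfer_per_round = modules * TRANSFER
    # Each later round is read while the previous round is being transferred.
    later_rounds = (rounds - 1) * max(NEXT_WORD, transfer_per_round)
    return ADDRESS + FIRST_WORD + later_rounds + transfer_per_round


print("single module:", single_module_cycles(), "cycles")
print("4-way interleaved:", interleaved_cycles(), "cycles")
```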

11)
Calculate the execution time of a parallel adder augmented by the carry look-ahead
scheme, where the operands are 64 bits long. Note: the basic building blocks
are collections of eight full adders, and within each basic building block it takes
1d delay to generate the ps and gs, 2d extra delay to generate the carries, and 1d extra
delay to generate the sums (show your work in detail).

12)
Calculate the execution time of a 30-bit parallel adder augmented by the carry-select
scheme. Note: each basic building block is a cascade of six full adders (show your
work step-by-step).

13)
Calculate the execution time of a 64-bit parallel adder augmented by the carry-select scheme.
Note: each basic building block is a cascade of eight full adders (show your work
step-by-step).

14)
Calculate the execution time of a parallel adder augmented by the carry look-ahead
scheme, where the operands are 32 bits long. Note: the basic building blocks are
collections of six full adders (show your work in detail).

15)
Figure 1 shows the logic of the ith stage of a parallel ALU, where Ai, Bi, and Ci are the
operands and the carry-in, respectively, and S2, S1, S0, and M are the control
signals. Determine the values of S2, S1, S0, M, and C1 (the carry-in to the
rightmost stage) for which the ALU performs the following operations:

a) F ← A - B (Why?)
b) F ← B (Why?).
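[Figure 1: ith stage of the parallel ALU, with inputs Ai, Bi, and Ci, control signals S2, S1, S0, and M, signals labeled Xi, Yi, and Zi, and outputs Fi and Ci+1.]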

16)
Booth algorithm is a technique that allows multiplication of two 2s complement numbers:

a) True or false: on average, the Booth algorithm is faster than the add-and-shift
algorithm (justify your answer).
b) The Booth algorithm can be extended by checking three bits of the multiplier in one
loop iteration. Compare and contrast the traditional Booth algorithm with the extended
Booth algorithm.
c) The extended Booth algorithm can be further modified by checking groups of more
than three bits in each iteration (say 4, 5, …). In practice, however, a Booth
algorithm based on groups of more than three bits has rarely been implemented. Why
(give a clear explanation)?
d) Apply extended Booth algorithm (group of 3 bits) to perform the following
multiplication:
a. Multiplier 1000111
b. Multiplicand 1111000
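
The 3-bit grouping in part (d) can be sketched as radix-4 Booth recoding. The sketch assumes the 7-bit multiplier is sign-extended to an even number of bits and that a 0 is appended to the right of the least significant bit. Each overlapping group of three bits then yields a digit in {-2, -1, 0, 1, 2}. The function names are illustrative, and the hardware shift/add sequence is not modeled.

```python
def to_signed(bits):
    """Interpret a bit string as a 2s complement integer."""
    value = int(bits, 2)
    if bits[0] == "1":
        value -= 1 << len(bits)
    return value


def booth_radix4_digits(bits):
    """Recode a 2s complement multiplier into radix-4 Booth digits, least significant first."""
    if len(bits) % 2:
        bits = bits[0] + bits      # sign-extend to an even number of bits
    padded = bits + "0"            # implicit 0 to the right of the LSB
    digits = []
    for i in range(len(padded) - 1, 1, -2):
        hi, mid, lo = (int(b) for b in padded[i - 2 : i + 1])
        digits.append(-2 * hi + mid + lo)
    return digits


def booth_multiply(multiplier_bits, multiplicand_bits):
    multiplicand = to_signed(multiplicand_bits)
    digits = booth_radix4_digits(multiplier_bits)
    # Each digit selects 0, +/-1 or +/-2 times the multiplicand, weighted by 4**k.
    return sum(d * multiplicand * 4**k for k, d in enumerate(digits))


digits = booth_radix4_digits("1000111")
product = booth_multiply("1000111", "1111000")
print("digits (LSD first):", digits)
print("product:", product)
```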

17)
a) The "add and shift" algorithm can be used to multiply two signed numbers
(say A and B) in 1s complement format. Calculate the correction term, where
A is negative and B is positive.
b) Apply your conclusion from part (a) to perform the following operation using
"add and shift" algorithm (show step-by-step operation).
101001
* 010011
Note: numbers are in 1s complement format.

18)
a) The "add and shift" algorithm can be used to multiply two negative numbers
(say A and B) in 2s complement format. Calculate the correction term.
b) Apply your conclusion from part (a) to perform the following operation using
"add and shift" algorithm (show step-by-step operation).
101001
* 110011
Note: numbers are in 2s complement format.

19)
Apply the Column Compression technique to perform the following operation:
111001
* 111101
Note: numbers are in 2s complement format.

20)
a) Apply the Reduction of Summands technique to perform the following
operation (show your work step-by-step):
101011
* 110101
b) Calculate the execution time of a Full Adder Tree when performing a
16*16 multiplication (show your work).

21)
Apply the Column Compression technique to perform the following operation:
101011
* 110101
Note: numbers are in 2s complement format (show your work in detail).

22)
Apply the Column Compression technique to perform the following operation:
111011
* 110111
Note: numbers are in 2s complement format (show your work in detail).

23)
Apply Hurson's scheme to perform the following operation in which numbers are in
2s complement format:
1101101 * 0101101
Show your work and explain each action step-by-step.

24)
Apply the Column Compression technique to perform the following operation (show your
work):
101001
* 111101

25)
a) Draw the block diagram of a "full adder tree" for multiplication of two n-bit
numbers.
b) Discuss the sequence of the operations in a "full adder tree".
c) Calculate the execution time of an 8*8 "full adder tree" (show your work and
explain why).

26)
Apply the reduction of summands scheme (using half and full adders) to perform the
following operation:
1100011 * 0110010
Note: operands are unsigned numbers.

27)
Apply the reduction of summands scheme (using half and full adders) to perform the
following operation:
1101111 * 0110011
Calculate the execution time of the operation (show the work). Note: operands are
in 2s complement format.

28)
Apply the column compression scheme to perform the following operation:
1100011 * 0110010
Calculate the execution time of the operation (show the work).

29)
Apply the Column Compression technique to perform the following operation:
1110111
* 1101011
Note: numbers are in 2s complement format.

30)
Use the SRT division method to perform the following operation:
AQ/B where AQ = .00100011
B = .0111
Show step-by-step operation.

31)
Use the SRT division method to perform the following operation:
AQ/B where AQ = .00100000
B = .0110
Show step-by-step operation.

32)
Use the SRT method to perform
AQ/B where AQ = .0010100100 and
B = .01111 (Show step by step operations.)

33)
Use the SRT division method to perform the following operation:
AQ/B where AQ = .11001100
B = .0111
Show step-by-step operation.

34)
As a computer architect designing an ALU, what initial issues does one
have to keep in mind? Name them and discuss their importance.

35)
a) Explain access gap as clearly as possible.
b) Discuss, in detail, three distinct directions that reduce the access gap.

36)
a) Compare and contrast low-order interleaving against high-order interleaving (I
need a clear discussion).
b) A 16-way interleaved memory is used for program storage. It is found that the
branching probability λ of the memory-request queue is 0.25. What is the
average number of instruction words accessed per memory cycle?
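
For part (b), a commonly used model assumes that each instruction word independently causes a branch with probability λ, which ends the run of sequential words fetched in that memory cycle. Under that assumption the expected number of words per cycle in an m-way interleaved memory is (1 - (1 - λ)^m) / λ. The sketch below computes both the direct expectation and the closed form. The model and the function names are assumptions of this sketch, so check them against the course notes.

```python
def words_per_cycle_direct(m, lam):
    """Expected run length: k words with probability lam*(1-lam)**(k-1), capped at m."""
    expected = sum(k * lam * (1.0 - lam) ** (k - 1) for k in range(1, m))
    expected += m * (1.0 - lam) ** (m - 1)  # no branch among the first m-1 words
    return expected


def words_per_cycle_closed_form(m, lam):
    return (1.0 - (1.0 - lam) ** m) / lam


direct = words_per_cycle_direct(16, 0.25)
closed = words_per_cycle_closed_form(16, 0.25)
print(f"direct = {direct:.4f}, closed form = {closed:.4f}")
```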

37)
Explain what factors degrade the performance of an interleaved memory. Why?

38)
Calculate the average duration of an instruction cycle for a Harvard-machine organization,
where:
instructions are in the form of
R ← R <op> <operand>, and
there are two types of instructions:
Type1: te ≤ ts
Type2: te > ts
te and ts are the instruction execution time and the main memory regeneration time,
respectively.

39)
Within the scope of interleaved memory:
a) What factors affect efficiency the most?
b) Prove part (a).

40)
a) Define the term “interleaved memory”,
b) Define high-order interleaving,
c) Define low-order interleaving,
d) Compare and contrast high-order interleaving with low-order interleaving,
e) Two issues affect the performance of an interleaved memory:
a. What are they,
b. Show (prove) how they affect the effectiveness of the interleaved
memory,
f) With respect to part (e), discuss solutions (one for each case).

41)
a) A memory is n-way interleaved if:
1)
2)
3)
b) Define high-order interleaving,
c) Define low-order interleaving,
d) Address accessible memory can be classified as:
1)
2)
3)
e) Explain access gap as clearly as possible.

42)
Assume we are utilizing a parallel disk (RAID) composed of 6 and 8 disks (# of
data disks available). Calculate space utilization of each configuration for various
redundancy schemes (show your work):
Redundancy          Space utilization (%)
configuration       6 disks      8 disks
Level 0
Level 1
Level 0+1
Level 3
Level 4
Level 5
Level 6
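
A sketch of how the table could be filled in, assuming N is the number of data disks and the redundancy disks are added on top: mirroring doubles the disk count, levels 3 to 5 add the equivalent of one check disk, and level 6 adds two. If N is instead read as the total number of disks, the parity levels become (N - 1)/N and (N - 2)/N. The function name is illustrative.

```python
def space_utilization(data_disks):
    """Fraction of total capacity that holds user data, per RAID level."""
    n = data_disks
    return {
        "Level 0": n / n,          # striping only, no redundancy
        "Level 1": n / (2 * n),    # every disk is mirrored
        "Level 0+1": n / (2 * n),  # striped set, fully mirrored
        "Level 3": n / (n + 1),    # one dedicated parity disk
        "Level 4": n / (n + 1),    # one dedicated parity disk
        "Level 5": n / (n + 1),    # one disk's worth of distributed parity
        "Level 6": n / (n + 2),    # two disks' worth of check information
    }


table6 = space_utilization(6)
table8 = space_utilization(8)
for level in table6:
    print(f"{level:<10} {100 * table6[level]:6.1f} {100 * table8[level]:6.1f}")
```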

43)
Assume we are utilizing a parallel disk (RAID) composed of 6 and 8 disks (# of
data disks available). Calculate space utilization of each configuration for various
redundancy schemes (show your work):
Redundancy          Space utilization (%)
configuration       6 disks      8 disks
Level 0
Level 1
Level 0+1
Level 3
Level 4
Level 5
Level 6

44)
“A programmer should avoid the application of branch instructions in a program.”
Clearly name and discuss three architectural concepts that support it.

45) All homework problems and quizzes