Name:________________________________

Computer Architecture ELE 475 Midterm Exam Fall 2013 David Wentzlaff (Total Time = 80 Minutes)
This exam is closed-book, closed-notes. Calculators, Laptops, Computers, the Internet, and Cellphones are not allowed. Show your work clearly in the spaces provided in order to get full or partial credit. Excessively long and/or vague answers are subject to point deductions. If you are unclear on the wording/assumptions of a problem, please state your assumptions explicitly and work it through. This exam has a total of 100 points.

EXTRA Directions: This is a timed exam. To take this exam, it is recommended that the student print out the exam and take it on paper, timed for 80 minutes. If printing is not available, it is recommended that the student use paper to record the answers while reading the exam on the screen. The computer should not be used for any other purpose during the timed exam except to read the exam. This is a closed-book exam. You are not allowed to use a calculator, computer, the Internet, cellphones, or other calculating devices while taking the exam. After the exam has been taken on paper, unlimited time can be used to type in or copy-and-paste the results into a machine-readable format on Coursera. No answers should be changed, embellished, or added to during the typing-in or copy-and-pasting stage. Be sure to type in or copy-and-paste your work (show how you developed the solution), including how you derived the answer, but clearly state what your final answer is. If you are unclear on the wording/assumptions of a problem, please state your assumptions explicitly and work it through. Each problem and sub-problem should be entered separately.

For pipeline diagrams, it is recommended that when inputting the results, a fixed-width font is used in a basic text editor that understands fixed columns. Most word processors do not do this by default, but by using a font like Courier or Consolas, fixed-width columns can be ensured. This will guarantee that columns line up in the results. If in doubt, use a single underscore, _ , to indicate a blank space. For instance:

0: ADD R1, R2, R3   F D X M W
1: SUB R4, R5, R5   _ F D X M W
2: ADD R6, R7, R8   _ _ F D X M W
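For the typing-in stage only (after the timed exam), the underscore convention for pipeline diagrams can be produced mechanically. The following is a minimal sketch, not part of the exam: the helper name `format_row`, the 20-character label column, and the idea of giving each instruction a start cycle are all assumptions made for illustration.

```python
def format_row(label, start_cycle, stages, label_width=20):
    """Return one pipeline-diagram row with '_' marking empty leading cycles.

    label       -- instruction text, e.g. "1: SUB R4, R5, R5"
    start_cycle -- cycle (0-based) in which the first stage occurs
    stages      -- sequence of stage names, one per cycle, e.g. "FDXMW"
    label_width -- hypothetical fixed width of the instruction column
    """
    cells = ["_"] * start_cycle + list(stages)
    return label.ljust(label_width) + " ".join(cells)


# Rows for a simple five-stage pipeline with no stalls.
rows = [
    format_row("0: ADD R1, R2, R3", 0, "FDXMW"),
    format_row("1: SUB R4, R5, R5", 1, "FDXMW"),
    format_row("2: ADD R6, R7, R8", 2, "FDXMW"),
]
print("\n".join(rows))
```

Because every cell is one character followed by one space, the columns line up in any fixed-width font.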

Problem 1) (5 points) What out-of-order processor hardware structure can be used to enforce that instructions commit in order?

Problem 2) (5 points) Register renaming is able to overcome which of the three data hazards?

Problem 3) (15 points) How many SRAM bits are needed to implement an 8KB two-way set-associative cache with a 64B block size? Assume that each line (entry) has a single valid bit and no dirty bits. There is one bit per set for true LRU. Assume that the address size of the machine is 32 bits and that the machine allows for byte addressing.
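One way to organize the arithmetic for a cache-sizing question of this kind is sketched below. It is a sketch under the assumptions stated in the problem (one valid bit per line, no dirty bits, one LRU bit per set, 32-bit byte addresses); the function name `cache_sram_bits` and its parameter names are hypothetical, and all sizes are assumed to be powers of two.

```python
def cache_sram_bits(capacity_bytes, ways, block_bytes, addr_bits,
                    valid_bits_per_line=1, lru_bits_per_set=1):
    """Count SRAM bits (data + tag + valid + LRU) for a set-associative cache."""
    lines = capacity_bytes // block_bytes          # total cache lines
    sets = lines // ways                           # lines are grouped into sets
    offset_bits = block_bytes.bit_length() - 1     # log2(block size in bytes)
    index_bits = sets.bit_length() - 1             # log2(number of sets)
    tag_bits = addr_bits - index_bits - offset_bits
    bits_per_line = block_bytes * 8 + tag_bits + valid_bits_per_line
    total = lines * bits_per_line + sets * lru_bits_per_set
    return {
        "lines": lines,
        "sets": sets,
        "offset_bits": offset_bits,
        "index_bits": index_bits,
        "tag_bits": tag_bits,
        "bits_per_line": bits_per_line,
        "total_bits": total,
    }


# Parameters taken from the problem statement.
result = cache_sram_bits(capacity_bytes=8 * 1024, ways=2,
                         block_bytes=64, addr_bits=32)
print(result)
```

With these parameters the breakdown is 128 lines in 64 sets, 6 offset bits, 6 index bits, and 20 tag bits, so each line holds 512 + 20 + 1 = 533 bits and the total is 128 × 533 + 64 = 68288 bits.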

Problem 4) (12 points) Which of the following two processors will execute a program with the given instruction mix faster?

Name          Frequency   CPI for ALU     CPI for Branch   CPI for Memory
                          Instructions    Instructions     Instructions
Processor A   1 GHz       1               2                1
Processor B   2 GHz       1.5             3                2

Instruction Mix:
50% ALU Instructions
10% Branch Instructions
40% Memory Instructions
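A comparison like this comes down to average CPI weighted by the instruction mix, divided by clock frequency. The sketch below sets that up with exact fractions so no rounding is involved; the names `MIX`, `PROCESSORS`, `average_cpi`, and `ns_per_instruction` are hypothetical, and it assumes both processors execute the same number of instructions.

```python
from fractions import Fraction

# Instruction mix from the problem statement, as exact fractions.
MIX = {
    "alu": Fraction(50, 100),
    "branch": Fraction(10, 100),
    "memory": Fraction(40, 100),
}

# Frequency in GHz and per-class CPI for each processor.
PROCESSORS = {
    "A": {"freq_ghz": 1,
          "cpi": {"alu": Fraction(1), "branch": Fraction(2), "memory": Fraction(1)}},
    "B": {"freq_ghz": 2,
          "cpi": {"alu": Fraction(3, 2), "branch": Fraction(3), "memory": Fraction(2)}},
}


def average_cpi(cpi, mix):
    """Mix-weighted average cycles per instruction."""
    return sum(mix[kind] * cpi[kind] for kind in mix)


def ns_per_instruction(proc, mix):
    """Average time per instruction in nanoseconds (cycles / GHz)."""
    return average_cpi(proc["cpi"], mix) / proc["freq_ghz"]


for name, proc in PROCESSORS.items():
    print(name, float(average_cpi(proc["cpi"], MIX)),
          float(ns_per_instruction(proc, MIX)), "ns/instruction")
```

Under these assumptions Processor A averages 1.1 cycles (1.1 ns) per instruction and Processor B averages 1.85 cycles (0.925 ns) per instruction, so the higher clock rate of B more than offsets its higher CPI.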

Problem 5) (15 points) Given a 3-wide in-order processor, draw the optimal pipeline diagram, showing, for each instruction, what stage of the pipeline it is in for each cycle of the execution of the code sequence below. Assume full bypassing of values from the respective instruction completion stage to the Decode stage. Assume that pipeline X can execute branches and ALU operations, pipeline Y can execute loads, stores, and ALU operations, and pipeline Z can execute loads, stores, and ALU operations. Loads have a latency of two cycles and ALU operations have a latency of one cycle. Branches are resolved in X0, and the machine has no branch delay slots and always predicts the fall-through path. The machine can fetch three instructions per cycle, decode three instructions per cycle, execute three instructions per cycle, and write back three instructions per cycle, but maintains data dependencies. The operand steering logic can steer any operand to any ALU to enable any instruction to reach any pipeline, but the pipelines have restrictions on what instructions each can execute, as described above. Assume that there are no alignment restrictions on instructions which can be simultaneously fetched from the instruction memory. Also, assume that instructions stall in the Decode stage if there are structural or data hazards, and that stalling one pipeline does not inhibit the fetching of future instructions. The figure below shows the pipeline with pipeline stage names underlined.

[Figure of Three-Wide In-order Processor Pipeline]

Problem 5 Code Sequence:

0: ADDIU R6, R7, 1
1: SUBIU R9, R10, 2
2: LW R11, 0(R12)
3: LW R13, 0(R14)
4: ADD R14, R11, R15
5: SUB R16, R17, R18
6: AND R19, R20, R21
7: LW R22, 4(R19)
8: LW R24, 8(R19)
9: LW R25, 12(R19)
10: LW R26, 16(R19)
11: OR R11, R26, R18
12: AND R13, R17, R29
13: ADDIU R16, R17, 3

Problem 6)

Part 6a) (15 points) Draw the optimal pipeline diagram for the following code executing on the IO3 processor from lecture as shown below. The IO3 processor fetches instructions in-order, issues instructions out-of-order, writes back results out-of-order, and commits instructions out-of-order. Assume the processor can fetch one instruction per cycle, decode one instruction per cycle, issue one instruction per cycle, and write back one result per cycle. Assume full bypassing of values from the respective instruction completion stage to the Decode stage. Assume that pipeline X can execute branches and ALU operations, pipeline M can execute loads and stores, and pipeline Y can execute multiply operations. Loads have a latency of two cycles and ALU operations have a latency of one cycle. Branches are resolved in X0, and the machine has no branch delay slots and always predicts the fall-through path. Multiply instructions have a latency of four cycles. Use the named pipeline stages in the figure for your pipeline diagram. The register file has only one write port. Use a lower-case i to denote that an instruction enters the issue queue but does not immediately issue. Assume that the issue queue can hold 16 instructions and begins empty.

[Figure of IO3 Processor Pipeline]

Part 6a Code Sequence:

0: ADD R15, R2, R3
1: SUB R1, R15, R16
2: ADDIU R11, R10, 1
3: MUL R5, R1, R4
4: MUL R7, R5, R6
5: ADDIU R18, R11, 1
6: ADDIU R14, R18, 1
7: ADDIU R13, R18, 2

Part 6b) (10 points) Draw the optimal pipeline diagram for the following code executing on the IO2I processor from lecture as shown below. The IO2I processor fetches instructions in-order, issues instructions out-of-order, writes back results out-of-order, and commits instructions in-order. Assume the processor can fetch one instruction per cycle, decode one instruction per cycle, issue one instruction per cycle, write back one result per cycle, and commit one instruction per cycle. Assume full bypassing of values from the respective instruction completion stage to the Decode stage. Assume that pipeline X can execute branches and ALU operations, pipeline L executes loads, pipeline S executes stores, and pipeline Y can execute multiply operations. Loads have a latency of two cycles and ALU operations have a latency of one cycle. Branches are resolved in X0, and the machine has no branch delay slots and always predicts the fall-through path. Multiply instructions have a latency of four cycles. Use the named pipeline stages in the figure for your pipeline diagram. The register file has only one write port. Use a lower-case i to denote that an instruction enters the issue queue but does not immediately issue. Use a lower-case r to denote that an instruction enters the reorder buffer but does not immediately commit. Assume that the issue queue can hold 16 instructions and begins empty.

[Figure of IO2I Processor Pipeline]

Part 6b Code Sequence:

0: ADD R15, R2, R3
1: SUB R1, R15, R16
2: ADDIU R11, R10, 1
3: MUL R5, R1, R4
4: MUL R7, R5, R6
5: ADDIU R18, R11, 1
6: ADDIU R14, R18, 1
7: ADDIU R13, R18, 2

Part 6c) (5 points) Would adding register renaming logic enable faster completion of the code sequence used in Part 6a or Part 6b on architectures IO2I or IO3? Explain why or why not.

Problem 7) (10 points) The following code is to be executed on a processor with 32 architectural registers. The processor is able to issue instructions out-of-order. The processor is a single-issue machine. The processor has different functional unit latencies, with multiply instructions having a latency of 4 cycles, ALU operations having a latency of 1 cycle, and loads and stores having a latency of 2 cycles. The processor stalls on WAW and WAR dependencies. Pretend that you are the compiler and make changes to the following code to increase its performance when executing on this out-of-order processor. Assume that all registers not used are free to be used by the compiler. Modify the code in place or rewrite it if needed.

Problem 7 Code Sequence:

MUL R5, R6, R7
ADD R8, R5, R6
MUL R10, R13, R8
SW R12, 0(R10)
SUB R10, R6, R4
MUL R17, R10, R15
ADDIU R15, R5, 1
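Since the processor stalls on WAW and WAR dependencies, a useful first step is to list where register names are reused. The sketch below does a simple pairwise scan; it is an illustration only, not the intended solution method. The names `parse` and `name_dependencies` are hypothetical, instructions are numbered from 0 in listing order, and it assumes that SW is the only opcode without a destination register and that every other instruction writes its first register operand.

```python
import re

STORES = {"SW"}  # assumption: the only opcode here with no destination register

PROGRAM = [
    "MUL R5, R6, R7",
    "ADD R8, R5, R6",
    "MUL R10, R13, R8",
    "SW R12, 0(R10)",
    "SUB R10, R6, R4",
    "MUL R17, R10, R15",
    "ADDIU R15, R5, 1",
]


def parse(instr):
    """Split an instruction into (destination register or None, source registers)."""
    opcode, operands = instr.split(None, 1)
    regs = re.findall(r"R\d+", operands)  # immediates are ignored
    if opcode in STORES:
        return None, regs                 # a store only reads registers
    return regs[0], regs[1:]


def name_dependencies(program):
    """List (kind, earlier, later, register) for every pairwise WAW and WAR."""
    parsed = [parse(instr) for instr in program]
    found = []
    for j, (dst_j, _) in enumerate(parsed):
        if dst_j is None:
            continue
        for i in range(j):
            dst_i, srcs_i = parsed[i]
            if dst_i == dst_j:
                found.append(("WAW", i, j, dst_j))  # both write the same register
            if dst_j in srcs_i:
                found.append(("WAR", i, j, dst_j))  # later write, earlier read
    return found


for dep in name_dependencies(PROGRAM):
    print(dep)
```

For this sequence the scan reports a WAW on R10 (instructions 2 and 4), a WAR on R10 (instructions 3 and 4), and a WAR on R15 (instructions 5 and 6). These are name dependencies rather than true data dependencies, so a compiler with free registers has room to remove them.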

Problem 8) (8 points) In a pipelined processor, a single instruction takes the following synchronous exceptions (interrupts): Divide-by-Zero fault and Invalid Opcode. What should the interrupt cause be loaded with and why?

END OF EXAM