
Computer Architecture

Ph.D. Qualifiers Examination - Sample Questions


1. Assuming a direct-mapped cache with 16 four-word blocks, label each of the following references as a hit or a miss and show the contents of the cache after each reference. Assume that the cache is initially empty. The referenced addresses are 20, 17, 1, 4, 8, 5, 56, 89, 100, 120, 100, 104, 108, 8, 4.
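A hand-worked trace for this kind of question can be checked with a few lines of code. The sketch below is only an illustration and rests on one assumption the question does not state: the referenced addresses are word addresses, so the block number is the address divided by 4. The names `simulate_direct_mapped` and `REFS` are hypothetical.

```python
REFS = [20, 17, 1, 4, 8, 5, 56, 89, 100, 120, 100, 104, 108, 8, 4]


def simulate_direct_mapped(addresses, num_blocks=16, words_per_block=4):
    """Trace a direct-mapped cache, assuming word addresses (an assumption)."""
    # Each cache slot remembers the memory block number it holds, or None.
    cache = [None] * num_blocks
    trace = []
    for addr in addresses:
        block = addr // words_per_block   # memory block containing the word
        index = block % num_blocks        # the one slot this block can occupy
        hit = cache[index] == block
        if not hit:
            cache[index] = block          # a miss loads the block, evicting the old one
        trace.append((addr, index, "hit" if hit else "miss"))
    return trace, cache


if __name__ == "__main__":
    trace, cache = simulate_direct_mapped(REFS)
    for addr, index, outcome in trace:
        print(f"address {addr:3d} -> slot {index:2d}: {outcome}")
```

Under the word-address assumption, the references to 5, the second 100, and the final 8 and 4 hit; every other reference misses. If the addresses are instead read as byte addresses, the block number changes and so does the trace.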

2.

(a) Give the names of two RISC and two CISC processors. What are the main characteristics of RISC processors?
(b) Define the (i) superscalar and (ii) superpipeline concepts. Derive the equation for the ideal speedup of a superpipelined processor compared to the sequential processor. Assume N instructions, a k-stage scalar base pipeline, and superpipeline degree n.
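One common way to set up the derivation in part (b) is sketched below. It assumes that each stage of the base pipeline takes one base cycle, that a superpipeline of degree n issues one instruction every 1/n of a base cycle, and that there are no stalls. Because "sequential processor" can be read either as a non-pipelined machine or as the scalar base pipeline, both ratios are shown.

```latex
\begin{align*}
T_{\text{seq}} &= N k
  && \text{(non-pipelined: $k$ cycles per instruction)} \\
T(1,1) &= k + N - 1
  && \text{(scalar base pipeline)} \\
T(1,n) &= k + \frac{N-1}{n}
  && \text{(superpipeline of degree $n$)} \\[4pt]
S_{\text{seq}} &= \frac{T_{\text{seq}}}{T(1,n)}
  = \frac{n N k}{n k + N - 1}
  \;\longrightarrow\; n k \quad (N \to \infty) \\
S(1,n) &= \frac{T(1,1)}{T(1,n)}
  = \frac{n\,(k + N - 1)}{n k + N - 1}
  \;\longrightarrow\; n \quad (N \to \infty)
\end{align*}
```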

3.

Describe FIFO (first-in-first-out) and LRU (least-recently-used) algorithms for memory replacement. Consider real memory of 3 page frames and show intermediate memory states for the following page reference string 2 3 5 4 1 3 5 6 3 3 using both algorithms. Show all page faults and calculate the page hit ratio. Assume that memory is empty when starting.
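The two policies can be compared with a short simulation. This is a minimal sketch for checking a hand-worked answer; the function names `fifo` and `lru` and the constant `PAGE_REFS` are hypothetical.

```python
from collections import OrderedDict, deque

PAGE_REFS = [2, 3, 5, 4, 1, 3, 5, 6, 3, 3]


def fifo(refs, frames):
    """Return (page_faults, memory_state_after_each_reference) under FIFO."""
    memory = deque()              # oldest-loaded page sits at the left
    faults, states = 0, []
    for page in refs:
        if page not in memory:
            faults += 1
            if len(memory) == frames:
                memory.popleft()  # evict the page that was loaded first
            memory.append(page)
        states.append(list(memory))
    return faults, states


def lru(refs, frames):
    """Return (page_faults, memory_state_after_each_reference) under LRU."""
    memory = OrderedDict()        # least recently used page sits first
    faults, states = 0, []
    for page in refs:
        if page in memory:
            memory.move_to_end(page)        # a hit refreshes recency
        else:
            faults += 1
            if len(memory) == frames:
                memory.popitem(last=False)  # evict the least recently used page
            memory[page] = True
        states.append(list(memory))
    return faults, states


if __name__ == "__main__":
    for name, policy in (("FIFO", fifo), ("LRU", lru)):
        faults, states = policy(PAGE_REFS, 3)
        hits = len(PAGE_REFS) - faults
        print(name, "faults:", faults, "hit ratio:", hits / len(PAGE_REFS))
        for page, state in zip(PAGE_REFS, states):
            print("  ref", page, "->", state)
```

For this particular reference string both policies happen to produce the same count: 8 page faults and 2 hits out of 10 references, a hit ratio of 0.2.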

4.

(a) Give the routing functions for (i) a hypercube network and (ii) a shuffle-exchange network.
(b) Show the essential components of a vector processor architecture with a diagram.
(c) Show the hardware required to generate an 8-phase clock.
(d) List the important issues in instruction set design.
(e) Define and differentiate between a multiprocessor and a multicomputer.
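For part (a), the routing functions are usually stated as operations on the n-bit binary address of a node. The sketch below expresses the textbook definitions as bit manipulations; the function names `cube`, `shuffle`, and `exchange` are hypothetical, and node addresses are assumed to be n-bit unsigned integers.

```python
def cube(node, i):
    """Hypercube routing function C_i: complement bit i of the node address."""
    return node ^ (1 << i)


def shuffle(node, n):
    """Perfect shuffle: rotate the n-bit node address left by one position."""
    msb = (node >> (n - 1)) & 1          # bit that wraps around to position 0
    return ((node << 1) & ((1 << n) - 1)) | msb


def exchange(node):
    """Exchange: complement the least significant bit of the node address."""
    return node ^ 1


if __name__ == "__main__":
    n = 3  # 8-node network
    for node in range(1 << n):
        neighbors = [cube(node, i) for i in range(n)]
        print(f"{node:03b}: cube neighbors {[f'{x:03b}' for x in neighbors]}, "
              f"shuffle {shuffle(node, n):03b}, exchange {exchange(node):03b}")
```

An n-dimensional hypercube has n such cube functions, one per address bit, while a shuffle-exchange network needs only the two functions regardless of n.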

5.

Describe the steps involved in floating point addition. Briefly show how you would design a pipelined floating point adder.
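The usual steps (compare exponents, align the smaller operand, add significands, normalize) map naturally onto pipeline stages. The following is a deliberately simplified sketch on a toy format, not IEEE 754: operands are positive, a number is a pair (significand, exponent) with value significand * 2**exponent, significands are normalized to `precision` bits, and shifted-out bits are truncated rather than rounded. Signs, guard/round/sticky bits, and special values are omitted. The name `fp_add` is hypothetical.

```python
def fp_add(a, b, precision=8):
    """Add two positive toy floats given as (significand, exponent) pairs."""
    (ma, ea), (mb, eb) = a, b

    # Stage 1: compare exponents so that `a` holds the larger one.
    if ea < eb:
        (ma, ea), (mb, eb) = (mb, eb), (ma, ea)

    # Stage 2: align the smaller operand by shifting its significand right.
    # Bits shifted out are simply dropped (truncation) in this sketch.
    mb >>= ea - eb

    # Stage 3: add the aligned significands.
    m, e = ma + mb, ea

    # Stage 4: normalize. A carry out of the top bit means shifting right
    # by one and incrementing the exponent.
    while m >= (1 << precision):
        m >>= 1
        e += 1
    return m, e


if __name__ == "__main__":
    # 4-bit significands: 24 = 0b1100 * 2**1, 4 = 0b1000 * 2**-1
    print(fp_add((0b1100, 1), (0b1000, -1), precision=4))
```

In a pipelined adder each commented stage would be a separate hardware stage separated by latches, so a new pair of operands can enter every cycle; a full design would add a rounding stage and handle subtraction, where normalization may require a left shift.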

6.

A DO loop such as

      DO 100 i = 1, 1000
         Z[i] := X[i] * Y[i]
100   end DO

gets transformed into the following set of instructions on a processor:

      i = 0
L1:   load X[i] into R1
      load Y[i] into R2
      R3 <-- R1 * R2
      store R3 into memory
      increment i
      loop back to L1 if i < 1000

Let X and Y be arrays in IEEE 754 double-precision format. Each instruction on the processor is 32 bits long. The processor has a 1K-byte instruction cache and a 1K-byte data cache, both direct mapped with 32-byte blocks. The code is located at address 2048, and the arrays X and Y start at locations 4096 and 8192, respectively. Z is stored starting at location 16384. The main memory access time is 10 cycles and the cache access time is 1 cycle. The main memory is 4-way interleaved, with each module being 32 bits wide. The system bus is also 32 bits wide. Explain how the program can benefit from caching with respect to the instruction and data caches, and give the cache hit ratios and the effective memory access time.

7.

Consider k couples of vectors. The ith couple consists of a row vector Ri and a column vector Ci, each of dimension N = 2^n. To compute the pairwise inner product for the ith couple, we perform the following:

      IP[i] = \sum_{j=1}^{N} Ri[j] * Ci[j]

Below is the algorithm to compute IP[i] for all i = 1, 2, ..., k.

      For i = 1 to k do
      begin
         IP[i] := 0;
         For j = 1 to N do
            IP[i] := IP[i] + Ri[j] * Ci[j];
      end

(i) Neglecting the initialization, index updating and testing, find the total compute time on a uniprocessor as a function of k and N. Assume that multiplication and addition each take the same unit time to complete.
(ii) Find the compute time if the algorithm is executed on a SIMD machine for each of the following two cases:
Case 1: Use P = N processing elements to compute IP[i] successively for each couple of vectors Ri, Ci.
Case 2: One couple of vectors is allocated to each PE, which computes one inner product. The number of PEs is P = k in this case.
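For question 7, one way to organize the operation counts is sketched below. It assumes unit-time multiplication and addition as stated, assumes N is a power of two so that a tree reduction over N partial products takes log2 N addition steps, and ignores the time to route data between processing elements. If routing time is counted, the Case 1 figure grows accordingly.

```latex
\begin{align*}
\text{Uniprocessor:}\quad
  T_1 &= k \,(N \text{ multiplications} + N \text{ additions}) = 2kN \\[4pt]
\text{Case 1 } (P = N):\quad
  T_N &= k \,\bigl(\underbrace{1}_{\text{parallel multiply}}
        + \underbrace{\log_2 N}_{\text{tree of additions}}\bigr)
       = k\,(1 + \log_2 N) \\[4pt]
\text{Case 2 } (P = k):\quad
  T_k &= 2N
  \qquad \text{(each PE runs one full inner product serially)}
\end{align*}
```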
