CH 4 ALU and Floating Point Arithmetic

CH 4
ARITHMETIC and LOGIC UNIT

In this lecture, we will examine how to construct an ALU, and will look at the hardware and function of the various ALU components. We will also spend a little time on floating-point representation and functions.

ALU structure is simple: a large MUX (multiplexer), driven by the control bits, is used to select a function.

[Figure: ALU block diagram. Inputs A and B feed a 32-bit adder (Adder * 32), 32 AND gates (AND gate * 32) and other function units; the control bits drive the MUX that selects which result becomes ALU out.]

In terms of VLSI layout, the circuits are fabricated as bit-slice processors: the circuitry for one bit is designed so that the circuit block can be repeated in an array 32 times, with the lines that run across bits (carry, shift bits) connecting properly. This is called TILING.

[Figure: ALU bit slice. Inputs: bit A, bit B, carry in. Outputs: bit out, carry out.]

As you might imagine, the design of the ALU is therefore quite simple. We will look at how some of the circuitry can be made fast, and at how to do multiply, divide and float.
Adder and subtractor.

Key point: the subtraction operation uses the adder, since 6 - 5 = 6 + (-5). Remember: to negate a number, just complement it and add 1. So +5 => 0101 and -5 => 1010 + 1 = 1011 (in 4-bit notation).

      0110        6
    + 1011     +(-5)
    ------     -----
    1 0001        1
Carry out of the MSB position is OK (see below)
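
The complement-and-add-1 trick can be sketched in Python as a small model of a 4-bit adder/subtractor. This is only an illustration: the function name add_sub and its return layout are assumptions, not part of the lecture. It also reports the carry out and a signed-overflow flag separately, to show that a carry out of the MSB is not by itself an error.

```python
def add_sub(a, b, subtract=False, bits=4):
    """Model an n-bit adder/subtractor built around a single adder.

    a and b are unsigned bit patterns (0 .. 2**bits - 1).
    Returns (result_pattern, carry_out, overflow).
    """
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    # To subtract, complement B and supply the "+1" as the carry in of bit 0.
    operand = (~b & mask) if subtract else (b & mask)
    carry_in = 1 if subtract else 0
    total = (a & mask) + operand + carry_in
    result = total & mask
    carry_out = total >> bits
    # Signed overflow: both values entering the adder share a sign,
    # but the result has the opposite sign.
    overflow = ((a ^ result) & (operand ^ result) & sign) != 0
    return result, carry_out, overflow


print(add_sub(0b0110, 0b0101, subtract=True))  # 6 - 5 -> (1, 1, False)
print(add_sub(0b0111, 0b0001))                 # 7 + 1 -> (8, 0, True)
```

In the first call the carry out is 1 but there is no overflow; in the second there is no carry out, yet the 4-bit signed result is wrong.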
Hardware implementation of the adder/subtractor requires little more complexity than the adder alone:

[Figure: adder/subtractor bit slice. B passes through a complementer controlled by the Subtract/!add line, then enters the full adder together with A to produce Sum. Subtract/!add also drives Cin for bit 0 ONLY (the "add 1").]

Two issues:

1) add, addi and sub all cause exceptions when the result does not fit in 32 bits, so we need additional circuitry to test for overflow. The rules are simple.

AN OVERFLOW EXCEPTION IS GENERATED WHEN:
   If adding:
      Both numbers are positive and the result is negative
      Both numbers are negative and the result is positive
   If subtracting (A - B):
      A >= 0, B < 0 and the result is negative
      A < 0, B >= 0 and the result is positive
Note: Zero is considered a positive number (MS bit is 0).

2) You may remember that a 32-bit trickle-carry (ripple-carry) adder is VERY SLOW! The carry has to propagate from bit 0 through all the full adders up to bit 31. Thus, more complexity is added via a carry-lookahead scheme. If the delay of a trickle-carry adder is O(n) (which means order of n for n bits), carry-lookahead circuitry can bring the delay down to O(log2 n). Thus, the 32-bit adder is only about 5 times slower than the single-bit adder. This speed comes at substantial cost in additional circuitry and layout size. Plus, the adder can no longer be tiled.

CARRY-LOOKAHEAD

Consider the generate and propagate signals as defined in an adder bit slice:

   gi = ai * bi    Thus, if gi is high, a carry out is GENERATED (A = B = 1).
   pi = ai + bi    Else, if pi is high, a carry in is PROPAGATED as carry out (Cout = Cin if ai or bi is high, but not both).

In this example:

      0101
    + 1100
    ------
     10001

the bit 2 inputs GENERATE a carry, while the input pattern for bit 3 PROPAGATES the carry from bit 2. We use these signals to speed up carry generation:

   c1 = g0 + (p0 * c0)                                                                        2 gate levels
   c2 = g1 + (p1 * g0) + (p1 * p0 * c0)                                                       3 gate levels
   c3 = g2 + (p2 * g1) + (p2 * p1 * g0) + (p2 * p1 * p0 * c0)                                 3 gate levels
   c4 = g3 + (p3 * g2) + (p3 * p2 * g1) + (p3 * p2 * p1 * g0) + (p3 * p2 * p1 * p0 * c0)      3 gate levels

More than 4 inputs used to make CMOS gates rather slow. Nowadays the number is 8 or 16, but the idea still holds, so we will stick with 4 inputs.

So, we can now do groups of 4 bits at a time, which reduces add time to a fraction of the ripple-carry time if we still chain the carries between blocks. To get the log n performance, we have to do carry lookahead at a higher level of abstraction! Consider the higher-level propagates, which propagate a carry only if EVERY bit in the group propagates a carry:
   P0 = p3 * p2 * p1 * p0
   P1 = p7 * p6 * p5 * p4
   P2 = p11 * p10 * p9 * p8
   P3 = p15 * p14 * p13 * p12

The higher-level generates are true if some bit in the group generates a carry and every higher bit in the group propagates it:

   G0 = g3 + (p3 * g2) + (p3 * p2 * g1) + (p3 * p2 * p1 * g0)
   G1 = g7 + (p7 * g6) + (p7 * p6 * g5) + (p7 * p6 * p5 * g4)
   G2 = g11 + ...
   G3 = g15 + ...

So at this level, we can combine the group generates and propagates in the same way as before (C1 is the carry into bit 4, C2 into bit 8, C3 into bit 12, and C4 is the carry out of bit 15, i.e. c16):

   C1 = G0 + (P0 * c0)
   C2 = G1 + (P1 * G0) + (P1 * P0 * c0)
   C3 = G2 + (P2 * G1) + (P2 * P1 * G0) + (P2 * P1 * P0 * c0)
   C4 = G3 + (P3 * G2) + (P3 * P2 * G1) + (P3 * P2 * P1 * G0) + (P3 * P2 * P1 * P0 * c0)

This gives us a 16-bit adder; how do we get a 32-bit unit?
1) If fast enough, use two 16-bit units with the carry propagated from one to the other.
2) Use a third unit: calculate the high 16 bits both with and without a carry in, then use c16 to drive a MUX that picks the right one! Expensive space-wise, time efficient.
3) Use a third level of abstraction, simplified because it's only 2 units.
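
The two levels of lookahead can be checked with a small behavioural model. The sketch below is an assumption-laden illustration, not a gate-level design: the names cla4 and cla16 are hypothetical, and Python integers stand in for wires. Each 4-bit block computes its internal carries and its group P and G from the equations above, and the second level computes the carries into each block from the group signals.

```python
def cla4(a, b, c0):
    """4-bit carry-lookahead block. Returns (sum, group_P, group_G, c4)."""
    g = [(a >> i) & (b >> i) & 1 for i in range(4)]      # gi = ai * bi
    p = [((a >> i) | (b >> i)) & 1 for i in range(4)]    # pi = ai + bi
    c = [c0, 0, 0, 0]
    c[1] = g[0] | (p[0] & c0)
    c[2] = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c[3] = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    group_p = p[3] & p[2] & p[1] & p[0]
    group_g = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1]) | (p[3] & p[2] & p[1] & g[0])
    c4 = group_g | (group_p & c0)
    s = 0
    for i in range(4):
        s |= (((a >> i) ^ (b >> i) ^ c[i]) & 1) << i     # sum bit = a xor b xor carry in
    return s, group_p, group_g, c4


def cla16(a, b, c0=0):
    """16-bit adder: four 4-bit blocks plus second-level lookahead."""
    nibbles = [((a >> (4 * k)) & 0xF, (b >> (4 * k)) & 0xF) for k in range(4)]
    # Group P and G do not depend on the carry in, so any c0 works here.
    P = [cla4(x, y, 0)[1] for x, y in nibbles]
    G = [cla4(x, y, 0)[2] for x, y in nibbles]
    C = [c0, 0, 0, 0, 0]
    C[1] = G[0] | (P[0] & c0)
    C[2] = G[1] | (P[1] & G[0]) | (P[1] & P[0] & c0)
    C[3] = G[2] | (P[2] & G[1]) | (P[2] & P[1] & G[0]) | (P[2] & P[1] & P[0] & c0)
    C[4] = (G[3] | (P[3] & G[2]) | (P[3] & P[2] & G[1]) | (P[3] & P[2] & P[1] & G[0])
            | (P[3] & P[2] & P[1] & P[0] & c0))
    total = 0
    for k, (x, y) in enumerate(nibbles):
        total |= cla4(x, y, C[k])[0] << (4 * k)
    return total, C[4]


print(cla4(0b0101, 0b1100, 0))   # the 0101 + 1100 example: (1, 0, 1, 1)
print(cla16(40000, 30000))       # (4464, 1), i.e. 70000 = 65536 + 4464
```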
MULTIPLICATION

A 32x32 single-cycle integer multiply is another large VLSI circuit! How is it usually implemented? By ITERATIVELY calculating a product, somewhat as you do on paper:
         1010     Multiplicand (10)
       x 0110     Multiplier (6)
       ------
         0000
        1010
       1010
      0000
      -------
      0111100     Product_register (60)

So the algorithm is:

   Clear Product_register
   for (i = 0; i < number_bits; i++) {
       if (LSB of Multiplier is 1)
           Product_register += Multiplicand;
       Shift Multiplicand one bit to the left;
       Shift Multiplier one bit to the right;
   }

Notes:
The product is twice as wide as the multiplier and multiplicand.
Bit shifts are cheap and easy.
Hardware block diagram in text.
Takes n clocks to do this, where n is the number of bits in the Multiplicand (assuming add etc. can be done in one clock).

Can share the Multiplier and Product_register IF
1) Multiplier is put in the LOWER half of Product_register
2) Adder adds Multiplicand to the UPPER half of Product_register
3) Product_register is shifted to the right.
As the result is shifted right, the Multiplier disappears off the edge. Also note that shifting the Multiplicand left is similar to shifting Product_register right. See page 257 for the block diagram.

The problem with this is that it only works for positive numbers.

BOOTH'S ALGORITHM

Booth's algorithm is trickier, but works in the same number of cycles and works for negative numbers.
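
As a baseline for comparison with Booth's method, the unsigned shift-and-add loop and its shared-register variant can be sketched in Python. The function names are hypothetical, and Python's unbounded integers stand in for the registers, so the extra carry bit that real hardware needs on the upper half is handled implicitly.

```python
def shift_add_multiply(multiplicand, multiplier, bits=4):
    """Unsigned multiply: one conditional add and two shifts per multiplier bit."""
    product = 0
    for _ in range(bits):
        if multiplier & 1:            # LSB of multiplier is 1
            product += multiplicand
        multiplicand <<= 1            # shift multiplicand left
        multiplier >>= 1              # shift multiplier right
    return product


def shared_register_multiply(multiplicand, multiplier, bits=4):
    """Same result, with the multiplier held in the lower half of the product register."""
    reg = multiplier & ((1 << bits) - 1)
    for _ in range(bits):
        if reg & 1:                   # LSB of the register is the current multiplier bit
            reg += multiplicand << bits   # add into the UPPER half
        reg >>= 1                     # shift right; the used multiplier bit falls off
    return reg


print(shift_add_multiply(0b1010, 0b0110))        # 60
print(shared_register_multiply(0b1010, 0b0110))  # 60
```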
Suppose we have the number 00111. This number, 7, can be represented as 8 - 1, in the following way. Looking at the run of 1s in the number, from LSB to MSB, we can SUBTRACT the value of the LSB 1 of the run (here 1) from a sum, and then ADD the value of the MSB in the run plus one bit position (i.e. 8). Consider 001100 (12): we would subtract 4 from 16 in this case to get 12.

We can use this scheme to do multiplication:

         0011      (3)
       x 0110      (6)
       ------
         0000      zero in multiplier
     -  0011       first one bit in multiplier
     + 0000        another one bit in multiplier
     +0011         first zero bit in multiplier
     --------
      0011000
    + 1111010      2's complement subtraction by addition of the 2nd term
    ---------
      0010010      (18)
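
The run-of-ones idea can be expressed as a recoding of the multiplier into digits -1, 0 and +1: -1 where a run of 1s starts, +1 just past where it ends. The sketch below is a minimal illustration under that reading; booth_digits and booth_multiply are hypothetical names, and the product is accumulated as an ordinary Python integer rather than in a shifting register.

```python
def booth_digits(multiplier, bits):
    """Recode a two's complement multiplier into digits -1, 0, +1 (LSB first)."""
    digits = []
    prev = 0                          # the bit to the right of the LSB starts as 0
    for i in range(bits):
        cur = (multiplier >> i) & 1
        # cur,prev = 1,0 starts a run (-1); cur,prev = 0,1 ends a run (+1)
        digits.append(prev - cur)
        prev = cur
    return digits


def booth_multiply(multiplicand, multiplier, bits=4):
    """Signed multiply; multiplier must fit in `bits` as a two's complement value."""
    product = 0
    for i, d in enumerate(booth_digits(multiplier, bits)):
        product += d * (multiplicand << i)   # add, subtract, or do nothing
    return product


print(booth_digits(0b0110, 4))   # [0, -1, 0, 1]: 6 = 8 - 2
print(booth_multiply(3, 6))      # 18
print(booth_multiply(2, -3))     # -6
```

Because the top bit of a negative multiplier starts or continues a run that never ends, the recoding gives the correct signed value without any special case.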
So for 2 * -3: we will use 0010 as the multiplicand and 1101 as our multiplier. When starting out we use 0 as the last shifted-out bit:

   Iteration   Step                                   Product
   0           Initial values                         0000 1101 0
   1           10 so Prod = Prod - Multiplicand       1110 1101 0
               Shift right                            1111 0110 1
   2           01 so Prod = Prod + Multiplicand       0001 0110 1
               Shift right                            0000 1011 0
   3           10 so Prod = Prod - Multiplicand       1110 1011 0
               Shift right                            1111 0101 1
   4           11 = do nothing                        1111 0101 1
               Shift right                            1111 1010 1
Ans: 1111 1010 = -6

For Booth's, the current multiplier bit and the last shifted-out bit select the operation:
   00   do nothing
   01   add b
   10   sub b
   11   do nothing

To finish, there are a number of architectures that do multiplication in one or four clock cycles. How? By doing all the adds in one very substantial circuit. Digital Signal Processor (DSP) chips, for example, must be able to do one multiply per clock cycle. It is also possible to pipeline a multiply, but each stage must then have its own adder! Note that multiplying by a power of two is better accomplished by shifting.

DIVISION

Division is usually a fairly slow beast on most CPUs, in part since designers have realized that integer divides are rare. Float divides can often be more efficiently accomplished by multiplying by a fractional value.

                  10011     Quotient (19)
Divisor 0110 ) 01110100     Dividend (116)
               0110
               ----
                  1010
                  0110
                  ----
                   1000
                   0110
                   ----
                     10     Remainder (2)

116/6 = 19 2/6

So, what's required here? We need to obtain the quotient and the remainder (in MIPS they land in Lo and Hi and are read with mflo = quotient and mfhi = remainder). The algorithm is similar in form to multiply:

   Clear quotient
   Put divisor in left half of 64-bit divisor_register
   Initialize remainder register with dividend value
   for (i = 0; i < n_bits + 1; i++) {      -- n+1 passes: the divisor starts n bits to the left
       Remainder_register -= divisor;
       if (remainder >= 0)
           Shift quotient register to left, setting new bit to 1;
       else {
           Shift quotient register to left, setting new bit to 0;
           Remainder_register += divisor;  -- reverse operation above
       }
       Shift divisor right one bit;
   }

Note that the reversal add can be avoided by saving and restoring the old value.

Can play the same games as in multiplication to save register space and shifts:
   Shift remainder to the left rather than divisor to the right
   Shift quotient bits into the remainder register!
After termination, the remainder will be in the upper half and the quotient in the lower half, matching Hi and Lo. See page 271 for a simple block diagram.

FLOATING POINT

Floating point is represented on almost all architectures using IEEE standard 754. The representations for a 32-bit float and a 64-bit double are somewhat different. Representations are in binary, of the form +/- 1.xxxxxxxx x 2^yyyyyy, where binary values are saved for the significand (xxxxxxxx) and exponent (yyyyyy). Note that the leading 1 is implied; it's not explicitly represented by IEEE 754.

Float representation:
   Field:   +/-    exponent yyyyyyyy    significand xxxxxxxxxxx
   Bit:     31     23-30 (8 bits)       0-22 (23 bits)
Double representation:
   Field:   +/-    exponent yyyyyyyyyyy    significand xxxxxxxxxxx
   Bit:     63     52-62 (11 bits)         0-51 (52 bits)
The significand is represented as fractional binary, so a value of 0110 would represent 0*2^-1 + 1*2^-2 + 1*2^-3 + 0*2^-4 = 1/4 + 1/8 = 3/8.
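
The fractional-binary weighting can be checked with a few lines of Python. This is only a sketch; fraction_value is a hypothetical helper, and it returns an exact Fraction so no rounding hides the result.

```python
from fractions import Fraction


def fraction_value(bits):
    """Value of a string of significand bits read as digits after the binary point."""
    # Bit i (counting from 0 at the left) has weight 2**-(i + 1).
    return sum(Fraction(int(bit), 2 ** (i + 1)) for i, bit in enumerate(bits))


print(fraction_value("0110"))        # 3/8
print(1 + fraction_value("0110"))    # 11/8, the value once the implied leading 1 is added
```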
The exponent is coded in biased notation, to simplify exponent calculations: the most negative exponent is coded as 00000000, the most positive as 11111111 (for floats). (IEEE 754 reserves these two extreme codes for zero/denormals and for infinity/NaN.) Thus,

   Float:  yyyyyyyy = exp + 127 (bias is 127)          2^0 => 0 + 127 = 0111 1111
   Double: yyyyyyyyyyy = exp + 1023 (bias is 1023)     2^4 => 4 + 1023 = 10000000011

Problem: What value does
   1 10000111 01110000000000000000000
represent in float format? In decimal?

Negative number; 10000111 = 135, so the exponent is 135 - 127 = 8; 0111 = 1/4 + 1/8 + 1/16 = 0.4375. The value is -1.4375 * 2^8 = -368.0.

FLOATING-POINT ADDITION

In order to add (or subtract) numbers, it is necessary to make the exponents match, so the number with the smaller exponent is adjusted (if they differ):

   1.001000 * 2^5 + 1.101100 * 2^3 = ?

Change the second number to 0.01101100 * 2^5 by shifting the binary point over 2. Then add the significands, and renormalize if necessary:

     1.001000
   + 0.011011
   ----------
     1.100011 * 2^5     (no renormalization needed)

Also consider the sign of the numbers. A subtractor may be necessary if the signs differ, since we are not dealing with 2's complement! In decimal, we must also consider roundoff! In binary, truncation is common. Dedicated hardware is used to do this stuff effectively, with separate logic for exponent and significand (p. 285).
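
The align, add, renormalize sequence can be sketched with integer significands. The model below makes several simplifying assumptions that are not in the lecture: both operands are positive, the significand 1.xxxxxx is stored as an integer with frac_bits fraction bits (so 1.001000 is 0b1001000), exponents are unbiased, and bits shifted out are truncated. The name fp_add is hypothetical.

```python
def fp_add(sig_a, exp_a, sig_b, exp_b, frac_bits=6):
    """Add two positive normalized values sig * 2**exp; returns (sig, exp)."""
    # Step 1: make the exponents match by shifting the smaller number right.
    if exp_a < exp_b:
        sig_a, exp_a, sig_b, exp_b = sig_b, exp_b, sig_a, exp_a
    sig_b >>= exp_a - exp_b           # truncates the bits that fall off
    # Step 2: add the significands.
    total, exp = sig_a + sig_b, exp_a
    # Step 3: renormalize if the sum reached 10.xxxxxx.
    if total >> (frac_bits + 1):
        total >>= 1
        exp += 1
    return total, exp


sig, exp = fp_add(0b1001000, 5, 0b1101100, 3)
print(format(sig, "07b"), exp)        # 1100011 5, i.e. 1.100011 * 2^5
```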
FLOATING-POINT MULTIPLICATION

Add exponents, multiply significands, truncate and normalize! To add biased exponents, we must subtract the bias once, otherwise we get two biases:

   (-5 + 127) + (2 + 127) = (-3 + 127) + 127

Example: 1.5 * .75 = 1.100000 x 2^0 * 1.1000 x 2^-1

       1.1
     x 1.1
     -----
        11
       11
     -----
      1001   ->  10.01 = 1.001 x 2^1

So the overall exponent is 0 - 1 + 1 = 0 (add the two above, plus 1 for the normalization), and the result is 1.001 x 2^0 = 1.125.

DIVIDE is difficult and time consuming! Not covered here!

MIPS uses a separate set of 32 registers for a dedicated floating-point instruction set; adjacent pairs are used for double precision (64 bit).

Add, subtract, multiply, divide, each in SINGLE and DOUBLE precision:
   add.s, add.d, sub.s, sub.d, mul.s, mul.d, div.s, div.d
Also negate, convert between single and double, and a few more.

The floating-point unit is coprocessor 1; some of the instructions are structured to be used for other coprocessors as well (future expansion):
   Transfer data from/to memory: lwc1 $f1, 100($s2), swc1
   Conditional branches use the result of the condition flag: bc1t branch_target, bc1f branch_target
See A-62, A-70++ for the actual floating-point MIPS instruction set.
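
Returning to the floating-point multiply rule above, the exponent bookkeeping can be sketched as follows. The same simplifying assumptions apply as for any toy model: positive operands only, integer significands with frac_bits fraction bits, truncation instead of rounding, and single-precision bias. The name fp_multiply is hypothetical.

```python
BIAS = 127  # single-precision exponent bias


def fp_multiply(sig_a, biased_exp_a, sig_b, biased_exp_b, frac_bits):
    """Multiply two positive normalized values; exponents are in biased form."""
    # Adding two biased exponents counts the bias twice, so remove one copy.
    biased_exp = biased_exp_a + biased_exp_b - BIAS
    # The raw product has 2 * frac_bits fraction bits; truncate back to frac_bits.
    product = (sig_a * sig_b) >> frac_bits
    # The product of two values in [1, 2) lies in [1, 4); renormalize 10.xxx or 11.xxx.
    if product >> (frac_bits + 1):
        product >>= 1
        biased_exp += 1
    return product, biased_exp


# 1.5 * 0.75: 1.100 x 2^0 times 1.100 x 2^-1, with 3 fraction bits
sig, e = fp_multiply(0b1100, 0 + BIAS, 0b1100, -1 + BIAS, frac_bits=3)
print(format(sig, "04b"), e - BIAS)   # 1001 0, i.e. 1.001 x 2^0 = 1.125
```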
Examples:

1) (4.5) What decimal number does 1111 1111 1111 1111 1111 1111 1111 1111 represent (as a two's complement integer)?
ANS: -1

2) (4.9) Why doesn't MIPS have a subtract immediate instruction?
ANS: Since MIPS includes add immediate, and since immediates can be positive or negative, subtract immediate would be redundant.

3) (4.15) Given the bit pattern:
   0000 0000 0000 0000 0000 0000 0000 0000
What does it represent assuming it is:
   a) a two's complement integer?
   b) an unsigned integer?
   c) a single-precision floating-point number?
   d) a MIPS instruction?
ANS: a) 0   b) 0   c) 0.0   d) sll $0, $0, 0

4) (4.28) Show the IEEE 754 binary single and double precision representations of the floating-point number -2/3.
ANS: -2/3 = -1.01010101... * 2^-1
   Sign: 1
   Single exponent = -1 + 127 = 126
   Double exponent = -1 + 1023 = 1022
   Single = 1 01111110 01010101010101010101010
   Double = 1 01111111110 01010101010101010101010101010101 01010101010101010101
(The significands are truncated. With round-to-nearest, the last bit of the single-precision significand would be 1.)
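
Answers like these can be cross-checked on any machine with IEEE 754 floats. The sketch below uses Python's standard struct module; the helper names are hypothetical. Note that struct.pack rounds to nearest when narrowing to single precision, so for -2/3 its last single-precision bit differs from a truncated answer.

```python
import struct


def float_bits(x):
    """(sign, exponent, fraction) bit strings of x as an IEEE 754 single."""
    (word,) = struct.unpack(">I", struct.pack(">f", x))
    s = format(word, "032b")
    return s[0], s[1:9], s[9:]


def double_bits(x):
    """(sign, exponent, fraction) bit strings of x as an IEEE 754 double."""
    (word,) = struct.unpack(">Q", struct.pack(">d", x))
    s = format(word, "064b")
    return s[0], s[1:12], s[12:]


def bits_to_float(sign, exponent, fraction):
    """Decode single-precision fields given as bit strings (1 + 8 + 23 bits)."""
    (x,) = struct.unpack(">f", struct.pack(">I", int(sign + exponent + fraction, 2)))
    return x


print(float_bits(-2 / 3))
print(double_bits(-2 / 3))
print(bits_to_float("1", "10000111", "0111" + "0" * 19))   # -368.0
```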
Homework:

1) (4.2) Convert -1023 into a 32-bit two's complement binary number.

2) (4.6) What decimal number does this binary number represent: 0111 1111 1111 1111 1111 1111 1111 1111?

3) (4.10) Find the shortest sequence of MIPS instructions to determine the absolute value of a two's complement integer. Convert this instruction:
   abs $t2, $t3
Afterwards $t2 has a copy of $t3 if $t3 is positive, and the two's complement of $t3 if $t3 is negative.

4) (4.14) Given the bit pattern:
   1000 1111 1110 1111 1100 0000 0000 0000
What does it represent assuming that it is:
   a) a two's complement integer?
   b) an unsigned integer?
   c) a single-precision floating-point number?
   d) a MIPS instruction?

5) (4.26) Show the IEEE 754 binary representation for the floating-point number 10.5 in single and double precision.
