
Floating Point Representation
DCS111 Computer Architecture

Outline

  Fractional numbers
  Floating point scientific notation
  Floating point in binary
  IEEE Floating Point Standard
  Behaviour of Floating Point Numbers

Fractional Numbers
… not whole numbers

Recap: fractions
  Decimal 5.67_10 is
    5 x 10^0 plus
    6 x 10^-1 plus
    7 x 10^-2
  Binary 11.011_2 is
    1 x 2^1 plus
    1 x 2^0 plus
    0 x 2^-1 plus
    1 x 2^-2 plus
    1 x 2^-3

Quiz: what is 11.011_2 in decimal?
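
The positional sum above can be checked mechanically. The following is a minimal sketch, not part of the original slides; the function name `binary_to_decimal` is a hypothetical helper chosen for illustration.

```python
def binary_to_decimal(text: str) -> float:
    """Evaluate a binary numeral such as '11.011' by summing digit x 2^position."""
    whole, _, frac = text.partition(".")
    value = 0.0
    # Digits left of the point have weights 2^0, 2^1, ... counted from the point.
    for position, digit in enumerate(reversed(whole)):
        value += int(digit) * 2.0 ** position
    # Digits right of the point have weights 2^-1, 2^-2, ...
    for position, digit in enumerate(frac, start=1):
        value += int(digit) * 2.0 ** -position
    return value

print(binary_to_decimal("11.011"))
```

Each digit contributes its weight only when it is 1, which is exactly the term-by-term expansion written out in the recap.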

Recap: fractions

Quiz: what is a third as a decimal: N.NNNNN?

  Third is 0.33333…
  Not all numbers can be represented exactly (with limited digits)
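
To make the "limited digits" point concrete, here is a small sketch using Python's standard `fractions` and `decimal` modules. The five-digit cut-off is an assumption chosen to match the N.NNNNN pattern in the quiz.

```python
from decimal import Decimal
from fractions import Fraction

third = Fraction(1, 3)                 # the exact value
five_digits = Decimal("0.33333")       # a third cut off after five decimal digits

# Multiplying back by 3 does not recover 1: the dropped digits are lost.
print(five_digits * 3)

# The error left by the cut-off, as an exact fraction.
error = third - Fraction(five_digits)
print(error)
```

However many digits are kept, the error shrinks but never reaches zero, because a third has no finite decimal expansion.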

Problem
  How to hold fractions in computers?

Solution 1 – Fixed Point
  Divide bits between whole and fractional parts

    0 0 1 1 1 1 0 1
    [integer bits] [fractional bits]

  Point always in the same place

Quiz: what is this in decimal?

Solution 1 – Fixed Point
  Divide bits between whole and fractional parts

    [integer bits] [fractional bits]

Quiz:
•  What is maximum number? (range)
•  What is difference between successive numbers? (accuracy)

Evaluation of Fixed Point
  Range versus Accuracy
    High accuracy means low range
    High range means low accuracy
  Has uses
    Really just scaled integers
    Software library for fixed point numbers
    No need for special hardware
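
The "scaled integers" view can be sketched as follows. The slides do not state where the point sits in the 8-bit pattern, so the 4 integer bits / 4 fractional bits split below is an assumption, as are the names `INT_BITS`, `FRAC_BITS` and `fixed_to_decimal`.

```python
INT_BITS = 4    # assumed split, not given in the slides
FRAC_BITS = 4

def fixed_to_decimal(bits: str) -> float:
    """Read an 8-bit pattern as an unsigned integer scaled by 2^-FRAC_BITS."""
    assert len(bits) == INT_BITS + FRAC_BITS
    return int(bits, 2) / 2 ** FRAC_BITS

step = 1 / 2 ** FRAC_BITS                                 # gap between successive numbers
largest = fixed_to_decimal("1" * (INT_BITS + FRAC_BITS))  # all bits set

print(fixed_to_decimal("00111101"))
print(largest, step)
```

Moving one bit from the integer part to the fractional part halves `step` (better accuracy) and roughly halves `largest` (smaller range), which is the trade-off the evaluation slide describes.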

Scientific (Exponent) Notation

    3.21 x 10^5      6.54 x 10^-5

  Mantissa: 3.21 and 6.54
  Exponent: 5 and -5
  Same accuracy
  Different magnitude

Quiz: Write these numbers as decimals, without exponents

  321,000 and 0.0000654
  Mantissa is a fraction
  Exponent is an integer
  Both mantissa and exponent can be negative

Advantage of Scientific Notation
  Large range
  Constant proportional accuracy (… with exceptions)

Normalisation
  0.002 x 10^0   }
  0.2 x 10^-2    }  all the same value
  2.0 x 10^-3    }
  20 x 10^-4     }

  Normalised number has exactly 1 non-zero digit before the point
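
Normalisation can be sketched with Python's standard `decimal` module, which keeps the digits exact. The helper name `normalise` and the convention used for zero are assumptions made for this illustration.

```python
from decimal import Decimal

def normalise(text: str):
    """Return (mantissa, exponent) with one non-zero digit before the point."""
    d = Decimal(text)
    if d == 0:
        return Decimal(0), 0          # zero has no normalised form; convention assumed here
    exponent = d.adjusted()           # power of ten of the leading digit
    mantissa = d.scaleb(-exponent)    # shift the point next to that digit
    return mantissa, exponent

for text in ["0.002E0", "0.2E-2", "2.0E-3", "20E-4"]:
    print(text, "->", normalise(text))
```

All four spellings reduce to the same mantissa and exponent, which is why a normalised form gives each value a single representation.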

Floating Point in Binary

Binary Floating Point
  1.01 x 2^2
  1.1 x 2^-2

  Exponent: positive or negative
  Mantissa: positive or negative

Quiz:
•  Effect of negative mantissa?
•  Effect of negative exponent?
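
The two binary examples can be evaluated with `math.ldexp(m, e)` from the Python standard library, which computes m x 2^e. This is only a sketch; the binary mantissas are converted to decimal by hand in the comments.

```python
import math

# 1.01 in binary is 1 + 1/4 = 1.25; 1.1 in binary is 1 + 1/2 = 1.5.
a = math.ldexp(1.25, 2)      # 1.01 x 2^2
b = math.ldexp(1.5, -2)      # 1.1 x 2^-2
c = math.ldexp(-1.25, 2)     # negative mantissa: same magnitude, opposite sign

print(a, b, c)
```

A negative exponent moves the value towards zero without changing its sign, while a negative mantissa flips the sign without changing the magnitude.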

Normalised Binary FP
  In normalised binary scientific notation
    1.mmmm…mmm x 2^E
    unless the number is 0
  1.mmm…mmm is the mantissa
  E is the exponent
  First digit always 1

Representation (32 bits)
  Sign bit S
  Exponent E
  Mantissa M

    sign | exponent | fraction (mantissa)

Representation (32 bits)
  Sign bit S – 1 bit
  Exponent E – 8 bits
  Mantissa M – 23 bits, BUT first digit always 1, so not included

    sign | exponent | fraction (mantissa)

  (-1)^S x 1.M x 2^E

Negative exponents - how?
  Aim: ALU (Arithmetic Logic Unit) can reuse integer machinery
  E.g. comparison with zero: x > 0
    Easy because of sign bit
    Floating point numbers can be easily classified as negative or positive
  Comparison of two floating point numbers x < y
    not so straightforward...
    choose exponent representation to help

Exponent in 2's Comp ??
  Consider: 1/2 < 1
    half: 0.1_2 = 1.0 x 2^-1 (normalised)
    one: 1.0_2 = 1.0 x 2^0 (normalised)

    half: 0 11111111 000 …
    one:  0 00000000 000 …

  Bad Design: read as unsigned bit patterns, half looks larger than one

Representation of Exponents
  We want:
    FP number order to follow (unsigned) bit order
    11111111 to represent the highest positive exponent
  Use biased representation
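
The ordering problem can be sketched by packing the two exponents both ways and comparing the results as unsigned integers. The helper names are hypothetical, and the bias of 127 is the one IEEE 32-bit uses.

```python
def pattern(exponent_field: int, fraction: int = 0, sign: int = 0) -> int:
    """Pack sign (1 bit), exponent field (8 bits) and fraction (23 bits)."""
    return (sign << 31) | (exponent_field << 23) | fraction

def twos_complement(e: int) -> int:
    return e & 0xFF               # 8-bit two's complement encoding

def biased(e: int, bias: int = 127) -> int:
    return e + bias               # excess-127 encoding

half_exp, one_exp = -1, 0         # 1/2 = 1.0 x 2^-1, 1 = 1.0 x 2^0

# Two's complement: the pattern for 1/2 is LARGER than the pattern for 1.
wrong_order = pattern(twos_complement(half_exp)) > pattern(twos_complement(one_exp))
# Biased: the pattern order matches the numeric order.
right_order = pattern(biased(half_exp)) < pattern(biased(one_exp))

print(wrong_order, right_order)
```

With a biased exponent, comparing two positive floating point numbers reduces to an unsigned integer comparison of their bit patterns, which is the reuse of integer machinery the slides aim for.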

Bias by N (Excess N)
  Representation of negative numbers used in floating point numbers
  Numbers in ‘correct’ order

    excess-N-rep(X) = unsigned-rep(X + N)

  Excess 7
    excess-7-rep(-3) = unsigned-rep(-3 + 7) = 0100
    excess-7-rep(-7) = 0000
    excess-7-rep(4) = unsigned-rep(4 + 7) = 1011

    Bits  Value    Bits  Value
    0000   -7      1000    1
    0001   -6      1001    2
    0010   -5      1010    3
    0011   -4      1011    4
    0100   -3      1100    5
    0101   -2      1101    6
    0110   -1      1110    7
    0111    0      1111    8

  E.g. –2 is represented as unsigned(7 - 2) = unsigned(5) = 0101
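
The excess-N rule translates directly into code. This is a minimal sketch; `excess_rep` and `excess_value` are hypothetical names, and the 4-bit width matches the excess-7 example.

```python
def excess_rep(x: int, n: int = 7, bits: int = 4) -> str:
    """excess-N-rep(X) = unsigned-rep(X + N), as a fixed-width bit string."""
    stored = x + n
    if not 0 <= stored < 2 ** bits:
        raise ValueError(f"{x} is out of range for excess-{n} in {bits} bits")
    return format(stored, f"0{bits}b")

def excess_value(pattern: str, n: int = 7) -> int:
    """Inverse mapping: read the bits as unsigned, then subtract N."""
    return int(pattern, 2) - n

# Regenerate the excess-7 table from -7 to 8.
for x in range(-7, 9):
    print(excess_rep(x), x)
```

Because the stored value is just X + N, larger numbers always get larger unsigned patterns, which is the 'correct' order property.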

IEEE Standard

IEEE 754-1985
  What is IEEE?
  Standard important for
    exchange of data
    portability of code
  Representation for FP numbers in
    32-bit (single precision)
    64-bit (double precision)

IEEE 32-bit FP
  Sign bit S – 1 bit
  Exponent E – 8 bits
  Mantissa M – 23 bits

    S    | E        | M
    sign | exponent | fraction (mantissa)

  (-1)^S x (1.M) x 2^(E-127)

  Exponent E – 8 bits
    Bias is 127
    Exponents –126 (00000001) to +127 (11111110)
    Exponents 00000000 and 11111111 special
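
The layout and formula can be checked against a real 32-bit pattern with Python's standard `struct` module. This is a sketch only: `fields` and `value` are hypothetical helpers, and -6.5 is an arbitrary test value not taken from the slides.

```python
import struct

def fields(x: float):
    """Split the IEEE 32-bit pattern of x into (S, E, M) as integers."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF

def value(s: int, e: int, m: int) -> float:
    """(-1)^S x (1.M) x 2^(E-127), valid only for 1 <= E <= 254."""
    assert 1 <= e <= 254, "exponent fields 0 and 255 are special"
    return (-1) ** s * (1 + m / 2 ** 23) * 2.0 ** (e - 127)

s, e, m = fields(-6.5)
print(s, format(e, "08b"), format(m, "023b"))
print(value(s, e, m))
```

Dividing M by 2^23 turns the 23 stored bits into the fraction after the point, and adding 1 restores the leading digit that is not stored.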

Example 1 – Convert to FP
  Represent 0.3125_10 = 5/16
  5/16 = 1/4 + 1/16 = 0.0101_2 = 1.01 x 2^-2
    S = 0
    E = -2 + bias = -2 + 127 = 125_10 = 01111101
    M = 010....000

    0 01111101 010000 ... 000

Example 2 – Convert from FP
  What number is represented by:

    0 01111101 010000 ... 000

    S = 0
    E = 0111 1101 = 125_10
    Real exponent = E - bias = 125 - 127 = -2
    M = 1/4
  (-1)^S x (1 + M) x 2^(E-bias) = (1 + 1/4) x (1/4) = 5/16
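
The steps of Example 1 can be followed in code and compared with the pattern the machine produces. This is a sketch under a stated assumption: the input is a normalised value that fits exactly in 32 bits, so no rounding is needed. The name `encode` is hypothetical.

```python
import math
import struct

def encode(x: float) -> str:
    """Build 'S EEEEEEEE MMM...M' for a normalised value exactly representable in 32 bits."""
    s = 1 if x < 0 else 0
    frac, exp = math.frexp(abs(x))           # abs(x) = frac * 2**exp with 0.5 <= frac < 1
    mantissa, exponent = frac * 2, exp - 1   # rescale to 1.M x 2^exponent
    e = exponent + 127                       # add the bias
    m = int((mantissa - 1) * 2 ** 23)        # the 23 bits after the point
    return f"{s} {e:08b} {m:023b}"

hand = encode(0.3125)
(bits,) = struct.unpack(">I", struct.pack(">f", 0.3125))
print(hand)
print(f"{bits:032b}")
```

The two printed lines hold the same 32 bits, the first one with spaces separating the S, E and M fields.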

Quiz
  What are

    0 10000001 111000 ... 000

    1 01111001 011000 ... 000

  Convert to 32-bit FP using IEEE
    4.125
    -7.625

IEEE FP Extras
  Zero
    Both E and M = zero
    Can be positive or negative
  +/- Infinity (exponent all 1's, M = zero)
  De-normalised numbers
    E = 0
    close to zero, exponent is -126

Behaviour of Floating Point Numbers

Overflow and Underflow
  Overflow
    Results too large (positive or negative) to be represented
  Underflow
    Result too close to zero (positive or negative) to be represented
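
Both effects are easy to provoke. Note the assumption in this sketch: Python floats are IEEE 64-bit rather than the 32-bit format of the slides, so the limits are different but the behaviour is the same.

```python
import math
import sys

largest = sys.float_info.max        # largest finite 64-bit float
smallest = sys.float_info.min       # smallest positive normalised 64-bit float

overflowed = largest * 2            # too large: result is +infinity
underflowed = smallest / 2 ** 60    # too close to zero: result is 0.0

print(overflowed, underflowed)
```

Overflow is visible as an infinity, whereas underflow silently produces zero, so the second is easier to miss in a long calculation.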

Range – 32 bit FP

  [Number line: negative | zero | positive, marking the largest and smallest representable magnitudes on each side of zero]

  Quiz: find the largest and smallest FP in IEEE 32-bit

  Largest positive / most negative: +/- (2 - 2^-23) x 2^127 ≈ 3.4 x 10^38
  Near zero (normalised numbers)
    +/- 1.0 x 2^-126
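
The range limits can be confirmed by building the extreme normalised patterns directly. This is a sketch; `from_bits` is a hypothetical helper built on the standard `struct` module.

```python
import struct

def from_bits(b: int) -> float:
    """The value whose IEEE 32-bit pattern is the unsigned integer b."""
    (x,) = struct.unpack(">f", struct.pack(">I", b))
    return x

largest = from_bits(0x7F7FFFFF)              # E = 11111110, M all 1's
smallest_normalised = from_bits(0x00800000)  # E = 00000001, M all 0's

print(largest == (2 - 2.0 ** -23) * 2.0 ** 127)
print(smallest_normalised == 2.0 ** -126)
print(largest)
```

With M all 1's the mantissa is 1.111…1 = 2 - 2^-23, and the largest non-special exponent field 11111110 gives 254 - 127 = 127.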

How do they behave?
  If x, y are positive, is:
    x + y > x ?
  If x and y are different, can:
    x - y = 0 ?
  Do these rules hold:
    (x + y) + z = x + (y + z) ?
    (x * y) * z = x * (y * z) ?
    x * (y + z) = x*y + x*z ?

  Different evaluation orders have different rounding errors

Summary
  FP scientific notation
  Normalised representation in binary
  Bias to represent -ve to +ve range in exponent
  Notice how a 32-bit binary number can represent many different entities in memory
  Underflow as well as overflow
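
Some of these questions can be tried directly. The sketch below uses Python's 64-bit floats (an assumption, since the slides discuss 32 bits), and the particular values are chosen for illustration only.

```python
# x + y > x can fail for positive x, y when y is too small to change x.
x, y = 1e16, 1.0
absorbed = (x + y == x)

# Addition is not associative: the two groupings round differently.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)

# Multiplication does not distribute exactly over addition.
dist_left = 100 * (0.1 + 0.2)
dist_right = 100 * 0.1 + 100 * 0.2

print(absorbed, left == right, dist_left == dist_right)
```

In each case the two sides are mathematically equal; they differ only because a rounding step happens at a different point in the evaluation.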
