Floating Point 6up

DCS111 Computer Architecture
Floating Point Representation

Outline
  Fractional numbers
  Floating point scientific notation
  Floating point in binary
  IEEE Floating Point Standard
  Behaviour of Floating Point Numbers
Fractional Numbers
… not whole numbers

Recap: fractions
  Decimal $5.67_{10}$ is
    $5 \times 10^{0}$ plus
    $6 \times 10^{-1}$ plus
    $7 \times 10^{-2}$
  Binary $11.011_{2}$ is
    $1 \times 2^{1}$ plus
    $1 \times 2^{0}$ plus
    $0 \times 2^{-1}$ plus
    $1 \times 2^{-2}$ plus
    $1 \times 2^{-3}$
  Quiz: what is $11.011_{2}$ in decimal?
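The digit-by-digit expansion in the fractions recap can be sketched in Python. The helper name `positional_value` is an assumption made for this illustration, and exact fractions are used so that no rounding hides the result.

```python
from fractions import Fraction

def positional_value(digits: str, base: int) -> Fraction:
    """Evaluate a digit string such as '11.011' in the given base, exactly."""
    whole, _, frac = digits.partition(".")
    value = Fraction(0)
    # Digits before the point carry weights base^0, base^1, ... (right to left).
    for power, digit in enumerate(reversed(whole)):
        value += int(digit, base) * Fraction(base) ** power
    # Digits after the point carry weights base^-1, base^-2, ... (left to right).
    for power, digit in enumerate(frac, start=1):
        value += int(digit, base) * Fraction(1, base ** power)
    return value

print(positional_value("5.67", 10))          # 567/100
print(float(positional_value("11.011", 2)))  # run this to check the quiz answer
```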
Recap: fractions
  Quiz: what is a third as a decimal: N.NNNNN?
  Third is 0.33333…
  Not all numbers can be represented exactly (with limited digits)
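A small sketch of why a third cannot be written exactly with a limited number of decimal digits. The function name `truncation_error` is hypothetical. It measures the exact gap between 1/3 and its truncated decimal form.

```python
from fractions import Fraction

def truncation_error(digits: int) -> Fraction:
    """Exact gap between 1/3 and 0.33...3 written with `digits` decimal places."""
    truncated = Fraction(10 ** digits // 3, 10 ** digits)  # e.g. 33333/100000
    return Fraction(1, 3) - truncated

for n in (1, 5, 10):
    print(n, truncation_error(n))  # the gap shrinks but never reaches zero
```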
Problem
  How to hold fractions in computers?

Solution 1 – Fixed Point
  Divide bits between whole and fractional parts
    0 0 1 1 1 1 0 1
    integer bits | fractional bits
  Point always in the same place
  Quiz: what is this in decimal?
Solution 1 – Fixed Point
  Divide bits between whole and fractional parts
    integer bits | fractional bits
  Quiz:
  •  What is the maximum number? (range)
  •  What is the difference between successive numbers? (accuracy)

Evaluation of Fixed Point
  Range versus Accuracy
    High accuracy means low range
    High range means low accuracy
  Has uses
    Really just scaled integers
    Software library for fixed point numbers
    No need for special hardware
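The "really just scaled integers" point can be illustrated with a minimal sketch. The slides do not state where the point sits, so the 4 integer bits / 4 fractional bits split below is an assumption, as are the names `to_fixed` and `from_fixed`.

```python
FRACTION_BITS = 4   # assumed split: 4 integer bits, 4 fractional bits
TOTAL_BITS = 8
SCALE = 1 << FRACTION_BITS  # 2^4 = 16

def to_fixed(value: float) -> int:
    """Store a non-negative value as an unsigned integer scaled by 2^FRACTION_BITS."""
    raw = round(value * SCALE)
    if not 0 <= raw < (1 << TOTAL_BITS):
        raise OverflowError(f"{value} does not fit in {TOTAL_BITS} bits")
    return raw

def from_fixed(raw: int) -> float:
    """Recover the value by undoing the scaling."""
    return raw / SCALE

largest = from_fixed((1 << TOTAL_BITS) - 1)  # all bits set: the range
step = from_fixed(1)                          # lowest bit only: the accuracy
print(largest, step)
print(from_fixed(0b00111101))  # the slide's bit pattern under the assumed 4/4 split
```

Moving the point changes `FRACTION_BITS` only: more fractional bits give a smaller step but a smaller maximum, which is the range versus accuracy trade-off.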
Scientific (Exponent) Notation
  $3.21 \times 10^{5}$ and $6.54 \times 10^{-5}$
    Mantissa: 3.21 and 6.54
    Exponent: 5 and -5
  Same accuracy
  Different magnitude
  Quiz: Write these numbers as decimals, without exponents
    321,000 and 0.0000654
  Mantissa is a fraction
  Exponent is an integer
  Both mantissa and exponent can be negative
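As a quick check of the quiz, Python's standard `decimal` module can expand exponent notation into plain digits. The helper name `without_exponent` is an assumption for this sketch.

```python
from decimal import Decimal

def without_exponent(text: str) -> str:
    """Write a number given in exponent notation as plain decimal digits."""
    return format(Decimal(text), "f")

for text in ("3.21E5", "6.54E-5", "-3.21E5", "-6.54E-5"):
    print(text, "->", without_exponent(text))
```

The last two inputs show that a negative mantissa only changes the sign, while a negative exponent makes the magnitude small.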
Advantage of Scientific Notation
  Large range
  Constant proportional accuracy (… with exceptions)

Normalisation
  $0.002 \times 10^{0}$
  $0.2 \times 10^{-2}$
  $2.0 \times 10^{-3}$
  $20 \times 10^{-4}$
  … all the same value
  Normalised number has 1 non-zero digit before the point
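A minimal sketch of decimal normalisation, using the standard `decimal` module. The function name `normalise` is hypothetical. It shifts the point until exactly one non-zero digit sits in front of it and adjusts the exponent to compensate.

```python
from decimal import Decimal

def normalise(mantissa: str, exponent: int) -> tuple[Decimal, int]:
    """Rewrite mantissa x 10^exponent with one non-zero digit before the point."""
    value = Decimal(mantissa).scaleb(exponent)
    if value == 0:
        return Decimal(0), 0          # zero has no normalised form
    shift = value.adjusted()          # power of ten of the leading digit
    return value.scaleb(-shift), shift

for m, e in (("0.002", 0), ("0.2", -2), ("2.0", -3), ("20", -4)):
    print(m, e, "->", normalise(m, e))
```

All four inputs describe the same value, so they should all normalise to a mantissa of 2 with exponent -3.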
Floating Point in Binary

Binary Floating Point
  $1.01 \times 2^{2}$
  $1.1 \times 2^{-2}$
  Exponent: positive or negative
  Mantissa: positive or negative
  Quiz:
  •  Effect of negative mantissa?
  •  Effect of negative exponent?
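The two binary examples can be evaluated with `math.ldexp(m, e)`, which computes m x 2^e. This is only a sketch: the mantissas are written in decimal (binary 1.01 is 1.25, binary 1.1 is 1.5), and the third value is an extra case added here to show a negative mantissa.

```python
import math

# 1.01 (binary) is 1 + 1/4 = 1.25 and 1.1 (binary) is 1 + 1/2 = 1.5.
a = math.ldexp(1.25, 2)    # 1.01 x 2^2
b = math.ldexp(1.5, -2)    # 1.1 x 2^-2
c = math.ldexp(-1.5, -2)   # negative mantissa, negative exponent
print(a, b, c)
```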
Normalised Binary
  In normalised binary scientific notation
    $1.mmmm{\ldots}mmm \times 2^{E}$
    unless the number is 0
  1.mmm…mmm is the mantissa
  E is the exponent
  First digit always 1

FP Representation (32 bits)
  Sign bit S
  Exponent E
  Mantissa M
  Field layout: sign | exponent | fraction (mantissa)
Representation (32 bits)
  Sign bit S – 1 bit
  Exponent E – 8 bits
  Mantissa M – 23 bits
  $(-1)^{S} \times 1.M \times 2^{E}$
  First digit always 1, so not included

Negative exponents - how?
  Aim: ALU (Arithmetic Logic Unit) can reuse integer machinery
  Eg, comparison with zero: x > 0
    Easy because of sign bit
    Floating point numbers can be easily classified as negative or positive
  BUT comparison of two floating point numbers x < y is not so straightforward...
    choose exponent representation to help
Exponent in 2's Comp ??
  Consider: 1/2 < 1
  half: $0.1_{2} = 1.0 \times 2^{-1}$ (normalised)
    0 11111111 000 …
  one: $1.0_{2} = 1.0 \times 2^{0}$ (normalised)
    0 00000000 000 …
  Bad Design: the bit pattern for half is larger than the bit pattern for one

Representation of Exponents
  We want:
    FP number order to follow (unsigned) bit order
    11111111 to represent the highest positive exponent
  Use biased representation
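The ordering goal can be checked on a real machine. The sketch below looks ahead to the IEEE single-precision format, which uses a biased exponent, and relies on Python's standard `struct` module. The helper name `bits32` is an assumption.

```python
import struct

def bits32(x: float) -> int:
    """Return the IEEE 754 single-precision bit pattern of x as an unsigned integer."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

values = [0.5, 1.0, 2.0, 1000.0]
patterns = [bits32(v) for v in values]
for v, p in zip(values, patterns):
    print(f"{v:>7} {p:032b}")
# For non-negative floats the unsigned patterns sort the same way as the values.
print(patterns == sorted(patterns))
```

With a 2's complement exponent the pattern for 0.5 would sort above the pattern for 1.0. With the bias it sorts below, so an integer comparison gives the right answer for non-negative numbers.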
Bias by N (Excess N)
  Representation of negative numbers used in floating point numbers
  Numbers in ‘correct’ order
    excess-N-rep(X) = unsigned-rep(X + N)
  Excess 7
    excess-7-rep(-3) = unsigned-rep(-3 + 7) = 0100
    excess-7-rep(-7) = 0000
    excess-7-rep(4) = unsigned-rep(4 + 7) = 1011

Bias by N (Excess N)
  Excess 7
    0000  -7    1000  1
    0001  -6    1001  2
    0010  -5    1010  3
    0011  -4    1011  4
    0100  -3    1100  5
    0101  -2    1101  6
    0110  -1    1110  7
    0111   0    1111  8
  E.g. –2 is represented as unsigned(7 - 2) = unsigned(5) = 0101
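The excess-N rule translates directly into code. This is a minimal sketch; the function names `excess_rep` and `excess_value` are chosen for this illustration.

```python
def excess_rep(x: int, bias: int, width: int) -> str:
    """excess-N-rep(X) = unsigned-rep(X + N), written as `width` binary digits."""
    unsigned = x + bias
    if not 0 <= unsigned < (1 << width):
        raise ValueError(f"{x} is outside the excess-{bias} range for {width} bits")
    return format(unsigned, f"0{width}b")

def excess_value(pattern: str, bias: int) -> int:
    """Inverse mapping: read the pattern as unsigned, then subtract the bias."""
    return int(pattern, 2) - bias

for x in (-7, -3, -2, 0, 4, 8):
    print(f"{x:>3} -> {excess_rep(x, 7, 4)}")
```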
IEEE Standard

IEEE 754-1985
  What is IEEE?
  Standard important for
    exchange of data
    portability of code
  Representation for FP numbers in
    32-bit (single precision)
    64-bit (double precision)
IEEE 32-bit FP
  Sign bit S – 1 bit
  Mantissa M – 23 bits
  Exponent E – 8 bits
  Field layout: S | E | M
  $(-1)^{S} \times (1.M) \times 2^{E-127}$
  Exponent E – 8 bits
    Bias is 127
    Exponents –126 (00000001) to +127 (11111110)
    Exponents 00000000 and 11111111 special
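The S, E and M fields can be pulled out of a real single-precision value with shifts and masks. The sketch below uses Python's standard `struct` module. The function name `ieee_fields` and the sample value -6.0 are assumptions for illustration.

```python
import struct

def ieee_fields(x: float) -> tuple[int, int, int]:
    """Split the single-precision encoding of x into (S, E, M)."""
    (word,) = struct.unpack(">I", struct.pack(">f", x))
    sign = word >> 31                 # 1 bit
    exponent = (word >> 23) & 0xFF    # 8 bits, stored with bias 127
    mantissa = word & 0x7FFFFF        # 23 bits, leading 1 not stored
    return sign, exponent, mantissa

s, e, m = ieee_fields(-6.0)           # -6 = -1.1 (binary) x 2^2
print(s, format(e, "08b"), format(m, "023b"))
```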
Example 1 – Convert to FP
  Represent $0.3125_{10} = 5/16$
  $5/16 = 1/4 + 1/16 = 0.0101_{2} = 1.01 \times 2^{-2}$
    S = 0
    E = -2 + bias = -2 + 127 = $125_{10}$ = 01111101
    M = 010....000
  0 01111101 010000 ... 000

Example 2 – Convert from FP
  What number is represented by:
  0 01111101 010000 ... 000
    S = 0
    E = 0111 1101 = $125_{10}$
    Real exponent = E - bias = 125 - 127 = -2
    M = 1/4
  $(-1)^{S} \times (1+M) \times 2^{E-\text{bias}} = (1 + 1/4) \times (1/4) = 5/16$
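The conversion from fields back to a value can be sketched with exact fractions. The function name `decode_normalised` is hypothetical, and it covers normalised numbers only (E from 1 to 254).

```python
from fractions import Fraction

BIAS = 127

def decode_normalised(s: int, e: int, m: int) -> Fraction:
    """Apply (-1)^S x (1 + M) x 2^(E - bias) to a normalised single-precision pattern."""
    if not 1 <= e <= 254:
        raise ValueError("E = 0 and E = 255 are special cases, not handled here")
    fraction = Fraction(m, 1 << 23)           # M read as a 23-bit binary fraction
    return (-1) ** s * (1 + fraction) * Fraction(2) ** (e - BIAS)

# Example 2 from the slides: 0 01111101 010000...000
m_bits = "01" + "0" * 21
print(decode_normalised(0, int("01111101", 2), int(m_bits, 2)))  # 5/16
```

The same function can be used to check answers for other normalised bit patterns by passing their S, E and M fields.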
Quiz
  What are
    0 10000001 111000 ... 000
    1 01111001 011000 ... 000
  Convert to 32-bit FP using IEEE
    4.125
    -7.625

IEEE FP Extras
  Zero
    Both E and M = zero
    Can be positive or negative
  +/- Infinity (exponent all 1's, M = 0)
  De-normalised numbers
    E = 0, M not zero
    close to zero, exponent is -126
Behaviour of Floating Point Numbers

Overflow and Underflow
  Overflow
    Results too large (positive or negative) to be represented
  Underflow
    Result too close to zero (positive or negative) to be represented
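Both effects are easy to provoke. Note the assumption in this sketch: Python floats are 64-bit doubles, so the limits are far wider than for the 32-bit format in these slides, but the behaviour is the same in kind.

```python
import math
import sys

largest = sys.float_info.max          # largest finite 64-bit double
overflowed = largest * 2              # too large: becomes infinity
smallest = 5e-324                     # smallest positive (de-normalised) double
underflowed = smallest / 2            # too close to zero: becomes 0.0
print(overflowed, underflowed)
```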
Range – 32 bit FP
  [Number line from negative through zero to positive, marked: smallest, largest negative, smallest positive (>0), largest]
  Quiz: find the largest and smallest FP in IEEE 32-bit
  Largest/smallest: $\pm (2 - 2^{-23}) \times 2^{127} \approx \pm 3.4 \times 10^{38}$
  Near zero (normalised numbers): $\pm 1.0 \times 2^{-126}$
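The range limits can be confirmed from the bit patterns themselves. This is a sketch using Python's standard `struct` module. The helper name `from_bits` is an assumption.

```python
import struct

def from_bits(word: int) -> float:
    """Interpret a 32-bit unsigned integer as an IEEE 754 single-precision number."""
    return struct.unpack(">f", struct.pack(">I", word))[0]

largest = from_bits(0x7F7FFFFF)              # S=0, E=11111110, M all 1's
smallest_normalised = from_bits(0x00800000)  # S=0, E=00000001, M=0
print(largest, smallest_normalised)
print(largest == (2 - 2.0 ** -23) * 2.0 ** 127)
print(smallest_normalised == 2.0 ** -126)
```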
How do they behave?
  If x, y are positive is:
    x + y > x ?
  If x and y are different can:
    x – y = 0 ?
  Do these rules hold:
    (x + y) + z = x + (y + z) ?
    (x * y) * z = x * (y * z) ?
    x * (y + z) = x*y + x*z ?
  Different evaluation orders have different rounding errors

Summary
  FP scientific notation
  Normalised representation in binary
  Bias to represent -ve to +ve range in exponent
  Notice how a 32-bit binary number can represent many different entities in memory
  Underflow as well as overflow
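Some of the questions on the "How do they behave?" slide can be tried directly. The values below are chosen for this sketch and use Python's 64-bit doubles rather than the 32-bit format. They show one case each where x + y > x fails and where addition and multiplication are not associative.

```python
import math

big, one = 1e20, 1.0
print(big + one > big)                         # False: the 1.0 is lost in rounding

left = (big + -big) + one                      # 1.0
right = big + (-big + one)                     # 0.0
print(left, right)

product_left = (1e200 * 1e200) * 1e-200        # overflows to infinity first
product_right = 1e200 * (1e200 * 1e-200)       # stays finite
print(math.isinf(product_left), math.isinf(product_right))
```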
