March 9, 2017
1 Introduction
2 Binary numbers
3 Error Analysis
(a) Model the probable evolution of a pathology
(b) Model and simulate the growth of a tumor
1 Introduction
2 Binary numbers
Base 2 numbers
Base 2 representation of the integer N
Sequences and Series
Binary Fractions
Binary shifting
Scientific Notation
Machine Numbers
3 Error Analysis
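Base 10 expansion
Let N denote a positive integer; then the digits $a_0, a_1, \ldots, a_k$ exist so that N has the base 10 expansion
$$N = (a_k \times 10^k) + (a_{k-1} \times 10^{k-1}) + \cdots + (a_1 \times 10^1) + (a_0 \times 10^0), \quad (1)$$
where each digit $a_j$ is one of $0, 1, \ldots, 9$.
Base 2 expansion
Similarly, digits $b_0, b_1, \ldots, b_J$, each equal to 0 or 1, exist so that
$$N = (b_J \times 2^J) + (b_{J-1} \times 2^{J-1}) + \cdots + (b_1 \times 2^1) + (b_0 \times 2^0). \quad (2)$$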
Base 2 representation of the integer N
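Example: the bits of $11000011011_2 = 1563$.

b10  b9  b8  b7  b6  b5  b4  b3  b2  b1  b0
1    1   0   0   0   0   1   1   0   1   1

Example: convert N = 697 to base 2 by repeated division by 2, where $Q_k$ is the quotient, $R_k$ is the remainder, and $b_k = R_k$.

k    Division    Q_k    R_k
0    697/2       348    1
1    348/2       174    0
2    174/2       87     0
3    87/2        43     1
4    43/2        21     1
5    21/2        10     1
6    10/2        5      0
7    5/2         2      1
8    2/2         1      0
9    1/2         0      1

Reading the remainders from last to first gives $697 = 1010111001_2$:

b9  b8  b7  b6  b5  b4  b3  b2  b1  b0
1   0   1   0   1   1   1   0   0   1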
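The repeated-division table can be reproduced with a few lines of Python. This is only a sketch of the procedure shown in the table; the function name `to_binary` is an illustrative choice and does not come from the slides.

```python
def to_binary(n: int) -> str:
    """Return the base 2 digits of a nonnegative integer by repeated division by 2."""
    if n < 0:
        raise ValueError("n must be nonnegative")
    if n == 0:
        return "0"
    remainders = []
    q = n
    while q > 0:
        q, r = divmod(q, 2)        # q plays the role of Q_k, r the role of R_k
        remainders.append(str(r))  # b_k = R_k, produced from b_0 upward
    # The last remainder is the most significant bit, so reverse the list.
    return "".join(reversed(remainders))


print(to_binary(697))   # bits b9 ... b0 of 697
print(to_binary(1563))  # bits b10 ... b0 of 1563
```

Python's built-in `bin` does the same job; the explicit loop is kept here only because it mirrors the columns of the table.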
Sequences and Series
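For example, in $\frac{1}{3} = 0.\overline{3}$, the symbol $\overline{3}$ means that the digit 3 is repeated forever to form an infinite repeating decimal.
The number $\frac{1}{3}$ is therefore the shorthand notation for the infinite series
$$S = (3 \times 10^{-1}) + (3 \times 10^{-2}) + \cdots + (3 \times 10^{-n}) + \cdots = \sum_{k=1}^{\infty} 3 (10)^{-k} = \frac{1}{3}. \quad (3)$$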
Definition 1.
The infinite series
$$S = \sum_{n=0}^{\infty} c r^n = c + cr + cr^2 + \cdots + cr^n + \cdots, \quad (4)$$
where $c \neq 0$ and $r \neq 0$, is called a geometric series with ratio r.
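Example: consider the series $S = \sum_{n=1}^{\infty} 7 \left(\frac{1}{7}\right)^n$, which is equal to $-7 + \sum_{n=0}^{\infty} 7 \left(\frac{1}{7}\right)^n$. According to (5), the sum formula $c/(1 - r)$ for a geometric series with $|r| < 1$,
$$S = -7 + \frac{7}{1 - \frac{1}{7}} = \frac{7}{6} = 1.1\overline{6}.$$
Then $\frac{7}{6}$ is the shorthand notation for the infinite series S.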
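As a quick numerical check of this example, the sketch below compares the closed form with a partial sum, using exact rational arithmetic from Python's `fractions` module. The helper name `geometric_partial_sum` and the choice of 20 terms are illustrative assumptions.

```python
from fractions import Fraction


def geometric_partial_sum(c, r, terms):
    """Sum c * r**n for n = 0, ..., terms - 1."""
    return sum(c * r**n for n in range(terms))


c, r = Fraction(7), Fraction(1, 7)

# Closed form: -7 + c / (1 - r), valid because |r| < 1.
closed_form = -7 + c / (1 - r)

# Partial sum of the same series, with the n = 0 term removed by the -7.
partial = -7 + geometric_partial_sum(c, r, 20)

print(closed_form)                   # exact rational value of S
print(float(closed_form - partial))  # remainder left after 20 terms
```

The remainder shrinks by a factor of 7 with every extra term, which is why a short partial sum already agrees with the closed form to many digits.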
Binary fractions
$$R = (d_1 \times 2^{-1}) + (d_2 \times 2^{-2}) + \cdots + (d_n \times 2^{-n}) + \cdots, \quad (6)$$
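Solution

Starting from $F_0 = 0.6$, each step doubles the previous fractional part; $d_j$ is the integer part of the product and $F_j$ is its fractional part.

j    2 F_{j-1}          d_j    F_j
1    (0.6)(2) = 1.2     1      0.2
2    (0.2)(2) = 0.4     0      0.4
3    (0.4)(2) = 0.8     0      0.8
4    (0.8)(2) = 1.6     1      0.6
5    (0.6)(2) = 1.2     1      0.2
6    (0.2)(2) = 0.4     0      0.4
7    (0.4)(2) = 0.8     0      0.8
8    (0.8)(2) = 1.6     1      0.6
9    (0.6)(2) = 1.2     1      0.2
...

The digits $d_1 d_2 d_3 \ldots$ repeat with period four, so $0.6 = 0.\overline{1001}_2$.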
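The doubling procedure in the table can be sketched in Python as follows. Exact rationals (`fractions.Fraction`) are used on purpose, since 0.6 itself has no finite binary representation and a float would already carry round-off. The function name `binary_fraction_digits` is an illustrative assumption.

```python
from fractions import Fraction


def binary_fraction_digits(r, count):
    """Return the first `count` binary digits d_1, d_2, ... of a number 0 <= r < 1."""
    if not 0 <= r < 1:
        raise ValueError("r must satisfy 0 <= r < 1")
    digits = []
    frac = Fraction(r)
    for _ in range(count):
        doubled = 2 * frac
        d = int(doubled)    # d_j is the integer part of 2 * F_{j-1}
        frac = doubled - d  # F_j is the fractional part of 2 * F_{j-1}
        digits.append(d)
    return digits


print(binary_fraction_digits(Fraction(3, 5), 9))  # digits d_1 ... d_9 of 0.6
```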
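The base 10 rational number associated with $0.\overline{01}_2$ follows from the geometric series:
$$0.\overline{01}_2 = \sum_{k=1}^{\infty} (2^{-2})^k = -1 + \sum_{k=0}^{\infty} (2^{-2})^k = -1 + \frac{1}{1 - \frac{1}{4}} = -1 + \frac{4}{3} = \frac{1}{3}.$$
Then $\frac{1}{3}$ is the base 10 rational number associated with $0.\overline{01}_2$.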
Professor PhD Henry Arguello Fuentes Numerical methods March 9, 2017 21 / 78
Binary shifting
Let R be
$$R = 0.0000011000_2. \quad (7)$$
Machine Numbers
The sign is always one bit, where S = 0 if x > 0 and S = 1 if x < 0.
The number of bits used for the exponent and the mantissa depends on the precision of the machine.
Possible cases:
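$111011.0011_2$

Example: convert $132.28125_{10}$ to base 2.

Integer part, $132_{10}$, by repeated division by 2:

Division    Quotient    Remainder
132/2       66          0  (LSB)
66/2        33          0
33/2        16          1
16/2        8           0
8/2         4           0
4/2         2           0
2/2         1           0
1/2         0           1  (MSB)

Fractional part, $0.28125_{10}$, by repeated multiplication by 2:

Product                 Integer part
0.28125 × 2 = 0.5625    0  (MSB)
0.5625 × 2 = 1.125      1
0.125 × 2 = 0.25        0
0.25 × 2 = 0.5          0
0.5 × 2 = 1.0           1  (LSB)

Result: $132.28125_{10} = 10000100.01001_2$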
Where,
S = The sign
E = Exponent
127 = Bias
dj = Bits of the mantissa
S    E           M
0    01010010    01101000000100100000000

$$\text{value} = (-1)^S \left(1 + \sum_{i=1}^{23} d_{(23-i)}\, 2^{-i}\right) \times 2^{(E-127)}$$

In this example:
S = 0
$1 + \sum_{i=1}^{23} d_{(23-i)}\, 2^{-i} = 1 + 2^{-2} + 2^{-3} + 2^{-5} + 2^{-12} + 2^{-15} = 1.4065246582$
$2^{(E-127)} = 2^{((2^1 + 2^4 + 2^6) - 127)} = 2^{82-127} = 2^{-45}$
Thus
$\text{value} = 1.4065246582 \times 2^{-45}$
Floating-point format
S (bit 31)    E (bits 30-23)    M (bits 22-0)
1             10000100          01000000000000000000000

In this example:
S = 1
$1 + \sum_{i=1}^{23} d_{(23-i)}\, 2^{-i} = 1 + 2^{-2} = 1.25$
$2^{(E-127)} = 2^{(132-127)} = 2^5$
Thus
$\text{value} = (-1)^1 \times 1.25 \times 2^5 = -40.$
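The two worked examples can be checked with a short Python sketch. `decode_single` evaluates the formula term by term for normalized numbers only (it ignores zeros, subnormals, infinities, and NaN), and `decode_with_struct` asks the standard `struct` module to interpret the same 32 bits as a big-endian single-precision float. Both function names are illustrative assumptions.

```python
import struct


def decode_single(sign, exponent, mantissa):
    """Evaluate (-1)^S * (1 + sum of d * 2^-i) * 2^(E - 127) from bit strings."""
    s = int(sign, 2)
    e = int(exponent, 2)
    # The leftmost mantissa bit has weight 2^-1, the next 2^-2, and so on.
    frac = sum(int(bit) * 2.0 ** -(i + 1) for i, bit in enumerate(mantissa))
    return (-1) ** s * (1.0 + frac) * 2.0 ** (e - 127)


def decode_with_struct(sign, exponent, mantissa):
    """Let the struct module interpret the same 32 bits as a single-precision float."""
    bits = int(sign + exponent + mantissa, 2)
    return struct.unpack(">f", bits.to_bytes(4, "big"))[0]


second = ("1", "10000100", "01000000000000000000000")
first = ("0", "01010010", "01101000000100100000000")

print(decode_single(*second), decode_with_struct(*second))
print(decode_single(*first), decode_with_struct(*first))
```

Agreement between the two functions is a useful sanity check that the bit positions and the bias of 127 have been read correctly.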
1 Introduction
2 Binary numbers
3 Error Analysis
Absolute and relative error
Truncation Error
Round-off Error
Loss of Significance
Order of Approximation
Propagation of Error
Definition 2.
Suppose that $\hat{p}$ is an approximation to p. The absolute error is $E_p = |p - \hat{p}|$, and the relative error is $R_p = |p - \hat{p}| / |p|$, provided that $p \neq 0$.
The absolute error is the magnitude of the difference between the true value and the approximate value.
The relative error expresses the error as a fraction of the true value (often quoted as a percentage).
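A minimal Python sketch of Definition 2 is given below. The function names are illustrative, and the sample values w = 2.1645 and its approximation 2.16 are used only to exercise the two formulas.

```python
def absolute_error(p, p_hat):
    """E_p = |p - p_hat|."""
    return abs(p - p_hat)


def relative_error(p, p_hat):
    """R_p = |p - p_hat| / |p|, defined only for p != 0."""
    if p == 0:
        raise ValueError("relative error is undefined for p = 0")
    return abs(p - p_hat) / abs(p)


w, w_hat = 2.1645, 2.16
print(absolute_error(w, w_hat))
print(relative_error(w, w_hat))
```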
Example: Find the absolute and relative error in the following three
cases:
Observe that as |p| moves away from 1 (greater than or less than), the relative error $R_p$ is a better indicator than $E_p$ of the accuracy of the approximation.
Definition 3.
The number $\hat{p}$ is said to approximate p to d significant digits if d is the largest nonnegative integer for which
$$\frac{|p - \hat{p}|}{|p|} < \frac{10^{1-d}}{2}.$$
Example:
Let $\hat{w} = 2.16$ be the approximation for w = 2.1645, then
$$\frac{|2.1645 - 2.16|}{|2.1645|} = 2.07900 \times 10^{-3}$$
if d = 0: $2.07900 \times 10^{-3} < \frac{10^{1-0}}{2} = 5$ (satisfied). However, since we need the largest integer d, we continue.
if d = 1: $2.07900 \times 10^{-3} < \frac{10^{1-1}}{2} = 0.5$ (satisfied)
if d = 2: $2.07900 \times 10^{-3} < \frac{10^{1-2}}{2} = 0.05$ (satisfied)
if d = 3: $2.07900 \times 10^{-3} < \frac{10^{1-3}}{2} = 0.005$ (satisfied)
if d = 4: $2.07900 \times 10^{-3} < \frac{10^{1-4}}{2} = 0.0005$ (not satisfied)
Then $\hat{w}$ approximates w to 3 significant digits.
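The search for the largest d can be automated. The sketch below tests d = 0, 1, 2, ... in turn and stops at the first failure, which is safe because the bound $10^{1-d}/2$ decreases as d grows. The function name and the cap `max_digits` are illustrative assumptions; the cap only keeps the loop finite when the approximation is exact.

```python
def significant_digits(p, p_hat, max_digits=17):
    """Largest d >= 0 with |p - p_hat| / |p| < 10**(1 - d) / 2, or None if no d works."""
    if p == 0:
        raise ValueError("the definition requires p != 0")
    rel = abs(p - p_hat) / abs(p)
    best = None
    for d in range(max_digits + 1):
        if rel < 10.0 ** (1 - d) / 2:
            best = d   # the inequality still holds, so try a larger d
        else:
            break      # the bound only shrinks from here on
    return best


print(significant_digits(2.1645, 2.16))
```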
Other examples:
Truncation Error
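For example, the infinite series
$$e^{x^2} = 1 + x^2 + \frac{x^4}{2!} + \frac{x^6}{3!} + \frac{x^8}{4!} + \cdots + \frac{x^{2n}}{n!} + \cdots$$
might be replaced with just the first five terms $1 + x^2 + \frac{x^4}{2!} + \frac{x^6}{3!} + \frac{x^8}{4!}$.
Then a truncation error appears.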
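To see the size of this truncation error, the sketch below sums the first five terms and compares the result with `math.exp`. The evaluation point x = 0.5 and the function name are assumptions chosen only for illustration.

```python
import math


def exp_x2_truncated(x, terms=5):
    """Sum the first `terms` terms x**(2n) / n! of the series for exp(x**2)."""
    return sum(x ** (2 * n) / math.factorial(n) for n in range(terms))


x = 0.5
approx = exp_x2_truncated(x)
exact = math.exp(x * x)
truncation_error = exact - approx

print(approx, exact, truncation_error)
```

For this x the error is close to the first omitted term, $x^{10}/5!$, which is what one expects when the terms decay quickly.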
Round-off Error
Example:
Consider p expressed in normalized decimal form:
Example:
The real number $p = \frac{22}{7} = 3.142857142857142857\ldots$ has the following six-digit representations:
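One way to obtain six-digit chopped and rounded representations of 22/7 is Python's `decimal` module with a precision of six significant digits. This is a sketch under that assumption; the helper name `six_digit` is illustrative, and `ROUND_DOWN` is used here to model chopping.

```python
from decimal import Context, Decimal, ROUND_DOWN, ROUND_HALF_UP


def six_digit(numerator, denominator, rounding):
    """Divide two integers keeping six significant digits with the given rounding mode."""
    ctx = Context(prec=6, rounding=rounding)
    return ctx.divide(Decimal(numerator), Decimal(denominator))


chopped = six_digit(22, 7, ROUND_DOWN)     # extra digits are discarded
rounded = six_digit(22, 7, ROUND_HALF_UP)  # last kept digit is rounded

print(chopped, rounded)
```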
Loss of Significance
For g(x):
$$g(500) = \frac{500}{\sqrt{501} + \sqrt{500}} = \frac{500}{22.3830 + 22.3607} = \frac{500}{44.7437} = 11.1748.$$
The second function, g(x), is algebraically equivalent to f(x), but the answer, g(500) = 11.1748, involves less error and is the same as that obtained by rounding the true answer 11.174755300747198... to six digits.
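The effect can be simulated with six-digit decimal arithmetic. The definition of f did not survive in the text above, so the sketch assumes $f(x) = x(\sqrt{x+1} - \sqrt{x})$, which is the form algebraically equivalent to the g(x) shown; treat that as an assumption. Every intermediate result is rounded to six significant digits through a `decimal.Context`.

```python
from decimal import Context, Decimal

SIX = Context(prec=6)  # six significant digits, default round-half-even


def f_six(x):
    # ASSUMED form of f: x * (sqrt(x + 1) - sqrt(x)); subtracts two nearly equal numbers.
    a = SIX.sqrt(SIX.add(x, Decimal(1)))
    b = SIX.sqrt(x)
    return SIX.multiply(x, SIX.subtract(a, b))


def g_six(x):
    # g(x) = x / (sqrt(x + 1) + sqrt(x)); no subtraction, so no cancellation.
    a = SIX.sqrt(SIX.add(x, Decimal(1)))
    b = SIX.sqrt(x)
    return SIX.divide(x, SIX.add(a, b))


x = Decimal(500)
print(f_six(x), g_six(x))
```

The subtraction in `f_six` leaves only three significant digits before the multiplication by 500, which is where the accuracy is lost.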
Loss of Significance
Example: Compare the results of calculating f(0.01) and P(0.01) using six digits and rounding, where
$$f(x) = \frac{e^x - 1 - x}{x^2} \quad \text{and} \quad P(x) = \frac{1}{2} + \frac{x}{6} + \frac{x^2}{24}.$$
The function P(x) is the Taylor polynomial of degree n = 2 for f(x) expanded about x = 0.
For the first function
$$f(0.01) = \frac{e^{0.01} - 1 - 0.01}{(0.01)^2} = \frac{1.010050 - 1 - 0.01}{0.0001} = 0.5.$$
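A six-digit simulation of both formulas is sketched below using `decimal.Context(prec=6)`, so that every intermediate operation is rounded to six significant digits. The function names are illustrative, and the rounding mode (round-half-even, the module default) is an assumption about what "rounding" means here.

```python
from decimal import Context, Decimal

SIX = Context(prec=6)  # six significant digits, default round-half-even


def f_six(x):
    """(e^x - 1 - x) / x^2 with every operation rounded to six digits."""
    numerator = SIX.subtract(SIX.subtract(SIX.exp(x), Decimal(1)), x)
    return SIX.divide(numerator, SIX.multiply(x, x))


def p_six(x):
    """1/2 + x/6 + x^2/24 with every operation rounded to six digits."""
    total = SIX.add(Decimal("0.5"), SIX.divide(x, Decimal(6)))
    return SIX.add(total, SIX.divide(SIX.multiply(x, x), Decimal(24)))


x = Decimal("0.01")
print(f_six(x), p_six(x))
```

The numerator of f loses almost all of its digits to cancellation, while the polynomial P adds terms of the same sign and keeps six meaningful digits.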
Order of Approximation
For sequences
Definition 5.
Let $\{x_n\}_{n=1}^{\infty}$ and $\{y_n\}_{n=1}^{\infty}$ be two sequences. The sequence $\{x_n\}$ is said to be of order big Oh of $\{y_n\}$, denoted $x_n = O(y_n)$, if there exist constants C and N such that
$$|x_n| \leq C |y_n| \quad \text{whenever } n \geq N.$$
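Example:
$$\frac{n^2 - 1}{n^3} = O\left(\frac{1}{n}\right), \quad \text{since} \quad \frac{n^2 - 1}{n^3} \leq \frac{n^2}{n^3} = \frac{1}{n} \quad \text{whenever } n \geq 1.$$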
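The inequality in this example can be spot-checked numerically. A finite check is not a proof, but it makes the roles of the constants concrete: here C = 1 and N = 1. The sketch uses exact rationals so that no rounding can blur the comparison; the names `x_n` and `y_n` follow the definition, and the upper limit of 2000 is arbitrary.

```python
from fractions import Fraction


def x_n(n):
    return Fraction(n * n - 1, n ** 3)


def y_n(n):
    return Fraction(1, n)


C, N = 1, 1
holds = all(abs(x_n(n)) <= C * abs(y_n(n)) for n in range(N, 2001))
print(holds)
```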
Definition 6.
Assume that f(h) is approximated by the function p(h) and that there exist a real constant M > 0 and a positive integer n so that
$$\frac{|f(h) - p(h)|}{|h^n|} \leq M \quad \text{for sufficiently small } h. \quad (13)$$
We say that p(h) approximates f(h) with order of approximation $O(h^n)$ and write
$$f(h) = p(h) + O(h^n). \quad (14)$$
When relation (13) is rewritten in the form $|f(h) - p(h)| \leq M |h^n|$, we see that the notation $O(h^n)$ stands in place of the error bound $M |h^n|$.
Additional properties:
(i) $O(h^p) + O(h^p) = O(h^p)$,
(ii) $O(h^p) + O(h^q) = O(h^r)$, where $r = \min(p, q)$, and
(iii) $O(h^p) O(h^q) = O(h^s)$, where $s = p + q$.
Example:
Consider the Taylor polynomial expansions
$$e^h = 1 + h + \frac{h^2}{2!} + \frac{h^3}{3!} + O(h^4) \quad \text{and} \quad \cos(h) = 1 - \frac{h^2}{2!} + \frac{h^4}{4!} + O(h^6).$$
Their sum is
$$e^h + \cos(h) = 1 + h + \frac{h^2}{2!} + \frac{h^3}{3!} + O(h^4) + 1 - \frac{h^2}{2!} + \frac{h^4}{4!} + O(h^6) = 2 + h + \frac{h^3}{3!} + O(h^4) + \frac{h^4}{4!} + O(h^6).$$
Since $O(h^4) + \frac{h^4}{4!} = O(h^4)$ and $O(h^4) + O(h^6) = O(h^4)$, this reduces to
$$e^h + \cos(h) = 2 + h + \frac{h^3}{3!} + O(h^4),$$
and the order of approximation is $O(h^4)$.
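Similarly, multiplying the two expansions gives
$$e^h \cos(h) = \left(1 + h + \frac{h^2}{2!} + \frac{h^3}{3!} + O(h^4)\right)\left(1 - \frac{h^2}{2!} + \frac{h^4}{4!} + O(h^6)\right)$$
$$= 1 + h - \frac{h^3}{3} - \frac{5h^4}{24} - \frac{h^5}{24} + \frac{h^6}{48} + \frac{h^7}{144} + O(h^6) + O(h^4) + O(h^{10}).$$
Since $O(h^6) + O(h^4) + O(h^{10}) = O(h^4)$, and the explicit terms of degree four and higher are also $O(h^4)$, the preceding equation is simplified to yield
$$e^h \cos(h) = 1 + h - \frac{h^3}{3} + O(h^4),$$
and the order of approximation is $O(h^4)$.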
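Definition 6 suggests a simple numerical check: divide the error by $h^4$ and watch whether the ratio stays bounded as h shrinks. The sketch below does this for the sum, with $p(h) = 2 + h + h^3/3!$, and for the product, with $p(h) = 1 + h - h^3/3$. The step sizes are arbitrary illustrative choices, and very small h is avoided because round-off would then dominate the ratio.

```python
import math


def ratio_sum(h):
    """|e^h + cos(h) - (2 + h + h^3/6)| / h^4."""
    p = 2 + h + h ** 3 / 6
    return abs(math.exp(h) + math.cos(h) - p) / h ** 4


def ratio_product(h):
    """|e^h cos(h) - (1 + h - h^3/3)| / h^4."""
    p = 1 + h - h ** 3 / 3
    return abs(math.exp(h) * math.cos(h) - p) / h ** 4


for h in (0.1, 0.05, 0.025):
    print(h, ratio_sum(h), ratio_product(h))
```

Bounded ratios are consistent with an $O(h^4)$ error; a ratio that grew without bound as h decreased would indicate a lower order.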
Convergence of a sequence
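Definition 7.
Suppose that $\lim_{n \to \infty} x_n = x$ and $\{r_n\}_{n=1}^{\infty}$ is a sequence with $\lim_{n \to \infty} r_n = 0$. We say that $\{x_n\}_{n=1}^{\infty}$ converges to x with the order of convergence $O(r_n)$ if there exists a constant $K \geq 0$ such that
$$\frac{|x_n - x|}{|r_n|} \leq K \quad \text{for n sufficiently large.} \quad (19)$$
Example:
Let $x_n = \cos(n)/n^2$ and $r_n = 1/n^2$; then $\lim_{n \to \infty} x_n = 0$. Since $|x_n - 0| / |r_n| = |\cos(n)| \leq 1$ for all n, the sequence converges to 0 with order of convergence $O(1/n^2)$.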
Propagation of Error
Addition. Consider two numbers p and q (the true values) with the approximate values $\hat{p}$ and $\hat{q}$, which contain errors $\epsilon_p$ and $\epsilon_q$, respectively. Starting with $p = \hat{p} + \epsilon_p$ and $q = \hat{q} + \epsilon_q$, the sum is
$$p + q = (\hat{p} + \epsilon_p) + (\hat{q} + \epsilon_q) = (\hat{p} + \hat{q}) + (\epsilon_p + \epsilon_q). \quad (20)$$
Hence, for addition, the error in the sum is the sum of the errors in the addends:
$$\epsilon_s = \epsilon_p + \epsilon_q.$$
For multiplication, the product is
$$pq = (\hat{p} + \epsilon_p)(\hat{q} + \epsilon_q) = \hat{p}\hat{q} + \hat{p}\epsilon_q + \hat{q}\epsilon_p + \epsilon_p \epsilon_q. \quad (21)$$
Hence, if $\hat{p}$ and $\hat{q}$ are larger than 1 in absolute value, the terms $\hat{p}\epsilon_q$ and $\hat{q}\epsilon_p$ show that there is a possibility of magnification of the original errors $\epsilon_p$ and $\epsilon_q$. Insights are gained if we look at the relative error. Rearrange the terms in (21) to get
$$pq - \hat{p}\hat{q} = \hat{p}\epsilon_q + \hat{q}\epsilon_p + \epsilon_p \epsilon_q. \quad (22)$$
Suppose that $p \neq 0$ and $q \neq 0$; then we can divide (22) by pq to obtain the relative error in the product pq:
$$R_{pq} = \frac{pq - \hat{p}\hat{q}}{pq} = \frac{\hat{p}\epsilon_q + \hat{q}\epsilon_p + \epsilon_p \epsilon_q}{pq} = \frac{\hat{p}\epsilon_q}{pq} + \frac{\hat{q}\epsilon_p}{pq} + \frac{\epsilon_p \epsilon_q}{pq}. \quad (23)$$
If, in addition, $\hat{p}/p \approx 1$, $\hat{q}/q \approx 1$, and $(\epsilon_p/p)(\epsilon_q/q) \approx 0$, then (23) simplifies to
$$R_{pq} = \frac{pq - \hat{p}\hat{q}}{pq} \approx \frac{\epsilon_q}{q} + \frac{\epsilon_p}{p} + 0 = R_q + R_p. \quad (24)$$
This shows that the relative error in the product pq is approximately the sum of the relative errors in the approximations $\hat{p}$ and $\hat{q}$.
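Both propagation rules can be illustrated with a few lines of Python. The true values and the errors below are hypothetical numbers chosen only for the demonstration; they do not come from the slides. Signed errors are used, as in (20) to (24).

```python
# Hypothetical true values and (signed) errors, chosen only for illustration.
p, q = 3.0, 7.0
eps_p, eps_q = 1e-4, -2e-4
p_hat, q_hat = p - eps_p, q - eps_q  # so that p = p_hat + eps_p and q = q_hat + eps_q

# Addition: the error in the sum is the sum of the errors.
sum_error = (p + q) - (p_hat + q_hat)

# Multiplication: the relative error is approximately R_p + R_q.
R_p = eps_p / p
R_q = eps_q / q
R_pq = (p * q - p_hat * q_hat) / (p * q)

print(sum_error, eps_p + eps_q)
print(R_pq, R_p + R_q)
```

The small gap between `R_pq` and `R_p + R_q` is the second-order term that the approximation in (24) drops.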
A quality that is desirable for any numerical process is that a small error
in the initial conditions will produce small changes in the final result.
An algorithm with this feature is called stable; otherwise, it is called
unstable.
Definition 8.
Suppose that $\epsilon$ represents an initial error and $\epsilon(n)$ represents the growth of the error after n steps. If $|\epsilon(n)| \approx n\epsilon$, the growth of error is said to be linear. If $|\epsilon(n)| \approx K^n \epsilon$, the growth of error is called exponential. If K > 1, the exponential error grows without bound as $n \to \infty$, and if 0 < K < 1, the exponential error diminishes to zero as $n \to \infty$.
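Example: Show that the following three schemes can be used with finite-precision arithmetic to recursively generate the terms in the sequence $\{1/3^n\}_{n=0}^{\infty}$.
$$r_0 = 1 \quad \text{and} \quad r_n = \frac{1}{3} r_{n-1} \quad \text{for } n = 1, 2, \ldots, \quad (25)$$
$$p_0 = 1, \quad p_1 = \frac{1}{3}, \quad \text{and} \quad p_n = \frac{4}{3} p_{n-1} - \frac{1}{3} p_{n-2} \quad \text{for } n = 2, 3, \ldots, \quad (26)$$
$$q_0 = 1, \quad q_1 = \frac{1}{3}, \quad \text{and} \quad q_n = \frac{10}{3} q_{n-1} - q_{n-2} \quad \text{for } n = 2, 3, \ldots. \quad (27)$$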
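Formula (25) is obvious. In (26) the difference equation has the general solution $p_n = A(1/3^n) + B$. This can be verified by direct substitution:
$$\frac{4}{3} p_{n-1} - \frac{1}{3} p_{n-2} = \frac{4}{3}\left(\frac{A}{3^{n-1}} + B\right) - \frac{1}{3}\left(\frac{A}{3^{n-2}} + B\right)$$
$$= \left(\frac{4}{3^n} - \frac{3}{3^n}\right) A + \left(\frac{4}{3} - \frac{1}{3}\right) B = A \frac{1}{3^n} + B = p_n.$$
Setting A = 1 and B = 0 will generate the desired sequence. In (27) the difference equation has the general solution $q_n = A(1/3^n) + B 3^n$. This too is verified by substitution:
$$\frac{10}{3} q_{n-1} - q_{n-2} = \frac{10}{3}\left(\frac{A}{3^{n-1}} + B 3^{n-1}\right) - \left(\frac{A}{3^{n-2}} + B 3^{n-2}\right)$$
$$= \left(\frac{10}{3^n} - \frac{9}{3^n}\right) A + (10 - 1) 3^{n-2} B = A \frac{1}{3^n} + B 3^n = q_n.$$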
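$$r_0 = 0.99996 \quad \text{and} \quad r_n = \frac{1}{3} r_{n-1} \quad \text{for } n = 1, 2, \ldots, \quad (28)$$
$$p_0 = 1, \quad p_1 = 0.33332, \quad \text{and} \quad p_n = \frac{4}{3} p_{n-1} - \frac{1}{3} p_{n-2} \quad \text{for } n = 2, 3, \ldots, \quad (29)$$
$$q_0 = 1, \quad q_1 = 0.33332, \quad \text{and} \quad q_n = \frac{10}{3} q_{n-1} - q_{n-2} \quad \text{for } n = 2, 3, \ldots. \quad (30)$$
In (28) the initial error in $r_0$ is 0.00004, and in (29) and (30) the initial errors in $p_1$ and $q_1$ are 0.000013. Investigate the propagation of error for each scheme.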
Propagation of error
n xn rn pn qn
0 1.0000000000 0.9999600000 1.0000000000 1.0000000000
1 0.3333333333 0.3333200000 0.3333200000 0.3333200000
2 0.1111111111 0.1111066667 0.1110933333 0.1110666667
3 0.0370370370 0.0370355556 0.0370177778 0.0369022222
4 0.0123456790 0.0123451852 0.0123259259 0.0119407407
5 0.0041152263 0.0041150617 0.0040953086 0.0029002469
6 0.0013717421 0.0013716872 0.0013517695 -0.0022732510
7 0.0004572474 0.0004572291 0.0004372565 -0.0104777503
8 0.0001524158 0.0001524097 0.0001324188 -0.0326525834
9 0.0000508053 0.0000508032 0.0000308063 -0.0983641945
10 0.0000169351 0.0000169344 -0.0000030646 -0.2952280648
n xn − rn xn − pn xn − qn
0 0.0000400000 0.0000000000 0.0000000000
1 0.0000133333 0.0000133333 0.0000133333
2 0.0000044444 0.0000177778 0.0000444444
3 0.0000014815 0.0000192593 0.0001348148
4 0.0000004938 0.0000197531 0.0004049383
5 0.0000001646 0.0000199177 0.0012149794
6 0.0000000549 0.0000199726 0.0036449931
7 0.0000000183 0.0000199909 0.0109349977
8 0.0000000061 0.0000199970 0.0328049992
9 0.0000000020 0.0000199990 0.0984149997
10 0.0000000007 0.0000199997 0.2952449999
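The two tables can be regenerated with a short Python sketch that runs schemes (28), (29), and (30) in double precision for n = 0, ..., 10. The helper names `scheme_r` and `scheme_two_term` are illustrative assumptions; double-precision round-off is far smaller than the perturbations in the initial values, so it does not change the printed digits.

```python
def scheme_r(r0, steps):
    """r_n = r_{n-1} / 3, starting from r_0."""
    seq = [r0]
    for _ in range(steps):
        seq.append(seq[-1] / 3)
    return seq


def scheme_two_term(a, b, v0, v1, steps):
    """v_n = a * v_{n-1} + b * v_{n-2}, starting from v_0 and v_1."""
    seq = [v0, v1]
    for _ in range(2, steps + 1):
        seq.append(a * seq[-1] + b * seq[-2])
    return seq


steps = 10
x = [1 / 3 ** n for n in range(steps + 1)]              # exact sequence 1/3^n
r = scheme_r(0.99996, steps)                            # scheme (28)
p = scheme_two_term(4 / 3, -1 / 3, 1.0, 0.33332, steps) # scheme (29)
q = scheme_two_term(10 / 3, -1.0, 1.0, 0.33332, steps)  # scheme (30)

for n in range(steps + 1):
    print(f"{n:2d} {x[n] - r[n]:.10f} {x[n] - p[n]:.10f} {x[n] - q[n]:.10f}")
```

The error column for q roughly triples at every step, which matches the $B 3^n$ term in the general solution of (27).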
[Figure: three panels showing the errors $x_n - r_n$, $x_n - p_n$, and $x_n - q_n$ plotted against n for n = 0 to 10.]
The error for $\{r_n\}$ is stable and decreases in an exponential manner. The error for $\{p_n\}$ is stable. The error for $\{q_n\}$ is unstable and grows at an exponential rate. Although the error for $\{p_n\}$ is stable, the terms $p_n \to 0$ as $n \to \infty$, so that the error eventually dominates and the terms past $p_8$ have no significant digits.