Académique Documents
Professionnel Documents
Culture Documents
Havard Berland
Department of Mathematical Sciences, NTNU
September 14, 2006
1 / 21
Abstract
Automatic dierentiation is introduced to an audience with
basic mathematical prerequisites. Numerical examples show
the deency of divided dierence, and dual numbers serve to
introduce the algebra being one example of how to derive
automatic dierentiation. An example with forward mode is
given rst, and source transformation and operator overloading
is illustrated. Then reverse mode is briey sketched, followed
by some discussion.
(45 minute talk)
2 / 21
What is automatic dierentiation?
= f
(x)
f(x) {...}; df(x) {...};
human
programmer
symbolic dierentiation
(human/computer)
Automatic
dierentiation
3 / 21
Why automatic dierentiation?
Scientic code often uses both functions and their derivatives, for
example Newtons method for solving (nonlinear) equations;
nd x such that f (x) = 0
The Newton iteration is
x
n+1
= x
n
f (x
n
)
f
(x
n
)
But how to compute f
(x
n
) when we only know f (x)?
Symbolic dierentiation?
Divided dierence?
(x) = lim
h0
f (x + h) f (x)
h
so why not use
f
(x)
f (x + h) f (x)
h
for some appropriately small h?
x
x + h
f (x)
f (x + h)
approximate
exact
5 / 21
Accuracy for divided dierences on f (x) = x
3
desired accuracy
useless accuracy
10
16
10
12
10
8
10
4
10
0
10
16
10
12
10
8
10
4
10
0
10
4
x = 0.001
x = 0.1
x = 10
n
i
t
e
p
r
e
c
i
s
i
o
n
f
o
r
m
u
l
a
e
r
r
o
r
x = 0.1
c
e
n
t
e
r
e
d
d
i
e
r
e
n
c
e
h
error =
f (x+h)f (x)
h
3x
2
1.
But, let d
2
= 0, as opposed to i
2
= 1.
Arithmetic on dual numbers:
(x + xd) + (y + yd) = x + y + ( x + y)d
(x + xd) (y + yd) = xy + x yd + xyd +
=0
x yd
2
(x + xd) (y + yd) = xy + x yd + xyd +
=0
x yd
2
= xy + (x y + xy)d
(x + xd) = x xd,
1
x + xd
=
1
x
x
x
2
d (x = 0)
7 / 21
Polynomials over dual numbers
Let
P(x) = p
0
+ p
1
x + p
2
x
2
+ + p
n
x
n
and extend x to a dual number x + xd.
Then,
P(x + xd) = p
0
+ p
1
(x + xd) + + p
n
(x + xd)
n
= p
0
+ p
1
x + p
2
x
2
+ + p
n
x
n
+p
1
xd + 2p
2
x xd + + np
n
x
n1
xd
= P(x) + P
(x) xd
x + xd =
x +
x
2
x
d x = 0
9 / 21
Conclusion from dual numbers
Derived from dual numbers:
A function applied on a dual number will return its derivative in
the second/dual component.
We can extend to functions of many variables by introducing more
dual components:
f (x
1
, x
2
) = x
1
x
2
+ sin(x
1
)
extends to
f (x
1
+ x
1
d
1
, x
2
+ x
2
d
2
) =
(x
1
+ x
1
d
1
)(x
2
+ x
2
d
2
) + sin(x
1
+ x
1
d
1
) =
x
1
x
2
+ (x
2
+ cos(x
1
)) x
1
d
1
+ x
1
x
2
d
2
where d
i
d
j
= 0.
10 / 21
Decomposition of functions, the chain rule
Computer code for f (x
1
, x
2
) = x
1
x
2
+ sin(x
1
) might read
Original program
w
1
= x
1
w
2
= x
2
w
3
= w
1
w
2
w
4
= sin(w
1
)
w
5
= w
3
+ w
4
Dual program
w
1
= 0
w
2
= 1
w
3
= w
1
w
2
+w
1
w
2
= 0 x
2
+ x
1
1 = x
1
w
4
= cos(w
1
) w
1
= cos(x
1
) 0 = 0
w
5
= w
3
+ w
4
= x
1
+ 0 = x
1
and
f
x
2
= x
1
The chain rule
f
x
2
=
f
w
5
w
5
w
3
w
3
w
2
w
2
x
2
ensures that we can propagate the dual components throughout
the computation.
11 / 21
Realization of automatic dierentiation
Our current procedure:
1. Decompose original code into intrinsic functions
2. Dierentiate the intrinsic functions, eectively symbolically
3. Multiply together according to the chain rule
How to automatically transform the original program into the
dual program?
Two approaches,
f = w
5
= 1 (seed)
w
5
w
4
w
3
B
a
c
k
w
a
r
d
p
r
o
p
a
g
a
t
i
o
n
o
f
d
e
r
i
v
a
t
i
v
e
v
a
l
u
e
s
17 / 21
Jacobian computation
Given F : R
n
R
m
and the Jacobian J = DF(x) R
mn
.
J = DF(x) =
f
1
x
1
f
1
x
n
f
m
x
1
f
m
x
n
Neural networks
www.autodiff.org
Recommended literature: