STEVEN HEILMAN
Contents
1. Review
2. Introduction
3. Differentiation in multiple variables
4. Partial and Directional Derivatives
5. The Chain Rule in Several Variables
6. Iterated Derivatives and Clairaut's Theorem
7. Appendix: Notation
1. Review
Definition 1.1 (Derivative on the real line). Let $E$ be a subset of $\mathbb{R}$, let $x_0$ be a limit point of $E$, and let $f\colon E\to\mathbb{R}$. If the limit
$$\lim_{x\to x_0;\ x\in E\setminus\{x_0\}}\frac{f(x)-f(x_0)}{x-x_0}$$
exists and converges to a real number $L\in\mathbb{R}$, then we write $f'(x_0)=L$ and we say that $f$ is differentiable at $x_0$. If this limit does not exist, then we say that $f$ is not differentiable at $x_0$.
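A minimal numerical sketch of Definition 1.1: it evaluates the difference quotient for the example $f(x)=x^2$ at $x_0=3$ (an illustrative choice, not from the notes) and shows it approaching $f'(3)=6$. A computer can only sample finitely many $h=x-x_0$, so this illustrates the limit rather than computing it.

```python
def difference_quotient(f, x0, h):
    """Return (f(x0 + h) - f(x0)) / h, the quotient inside the limit of Definition 1.1.

    Here x0 + h plays the role of x, so h = x - x0 must be nonzero.
    """
    if h == 0:
        raise ValueError("h must be nonzero")
    return (f(x0 + h) - f(x0)) / h


def square(x):
    """Example function f(x) = x^2, whose derivative at x0 = 3 is 6."""
    return x * x


# Algebraically the quotient equals 6 + h, so it approaches f'(3) = 6 as h -> 0
# from either side.
for h in [1e-1, 1e-2, 1e-3, -1e-3]:
    print(h, difference_quotient(square, 3.0, h))
```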
Lemma 1.2. Let $E$ be a subset of $\mathbb{R}$, let $f\colon E\to\mathbb{R}$, let $x_0\in E$, and let $L\in\mathbb{R}$. Then the following two statements are equivalent.
$f$ is differentiable at $x_0$ and $f'(x_0)=L$.
We have $\displaystyle\lim_{x\to x_0;\ x\in E\setminus\{x_0\}}\frac{|f(x)-(f(x_0)+L(x-x_0))|}{|x-x_0|}=0$.
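Definition 1.3. Let $n$ be a positive integer. Let $x=(x_1,\dots,x_n)\in\mathbb{R}^n$. We define the $\ell_2$ norm $\|x\|$ of $x$ by
$$\|x\| = \|(x_1,\dots,x_n)\| := \Big(\sum_{i=1}^n x_i^2\Big)^{1/2}.$$
For $x=(x_1,\dots,x_n)$ and $y=(y_1,\dots,y_n)$ in $\mathbb{R}^n$, we define the standard inner product
$$\langle x,y\rangle := \sum_{i=1}^n x_iy_i.$$
So, $\|x\|=\sqrt{\langle x,x\rangle}$. We also denote the standard basis vectors $e_1,\dots,e_n$ so that
$$e_1=(1,0,\dots,0),\quad e_2=(0,1,0,\dots,0),\quad\dots,\quad e_n=(0,\dots,0,1).$$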
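The following is a minimal Python sketch of the norm, the inner product, and the standard basis vectors of Definition 1.3. Representing vectors as tuples of floats, and the helper names `inner`, `norm`, `basis_vector`, are assumptions made only for illustration.

```python
import math


def inner(x, y):
    """Standard inner product <x, y> = x_1 y_1 + ... + x_n y_n on R^n."""
    if len(x) != len(y):
        raise ValueError("x and y must lie in the same R^n")
    return sum(xi * yi for xi, yi in zip(x, y))


def norm(x):
    """The l2 norm ||x|| = <x, x>^(1/2)."""
    return math.sqrt(inner(x, x))


def basis_vector(n, j):
    """Standard basis vector e_j of R^n; j is 1-indexed to match the text."""
    if not 1 <= j <= n:
        raise ValueError("need 1 <= j <= n")
    return tuple(1.0 if i == j else 0.0 for i in range(1, n + 1))


print(inner((1.0, 2.0, 3.0), (4.0, 5.0, 6.0)))  # 32.0
print(norm((3.0, 4.0)))                          # 5.0
print(basis_vector(3, 2))                        # (0.0, 1.0, 0.0)
```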
The following lemma shows that a function can have at most one derivative at an interior
point of E.
Lemma 3.3. Let $E$ be a subset of $\mathbb{R}^n$, let $f\colon E\to\mathbb{R}^m$ be a function, and let $x_0$ be an interior point of $E$. Let $L_a\colon\mathbb{R}^n\to\mathbb{R}^m$ and let $L_b\colon\mathbb{R}^n\to\mathbb{R}^m$ be linear transformations. Suppose $f$ is differentiable at $x_0$ with derivative $L_a$, and $f$ is differentiable at $x_0$ with derivative $L_b$. Then $L_a=L_b$.
Exercise 3.4. Prove Lemma 3.3. (Hint: argue by contradiction. Assume that $L_a\neq L_b$. Then there exists a nonzero vector $v\in\mathbb{R}^n$ such that $L_av\neq L_bv$. Then, apply the definition of the derivative, and try to specialize to the case where $x=x_0+tv$ for some scalar $t$, in order to obtain a contradiction.)
Using Lemma 3.3, we can now talk about the derivative of $f$ at interior points $x_0$, and we will label this derivative as $f'(x_0)$. That is, if $x_0$ is an interior point of $E$, then $f'(x_0)$ is the unique linear transformation from $\mathbb{R}^n$ to $\mathbb{R}^m$ such that
$$\lim_{x\to x_0;\ x\in E\setminus\{x_0\}}\frac{\|f(x)-(f(x_0)+f'(x_0)(x-x_0))\|}{\|x-x_0\|}=0.$$
Informally, we therefore have Newton's approximation:
$$f(x)\approx f(x_0)+f'(x_0)(x-x_0).$$
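A minimal numerical sketch of Newton's approximation, assuming the map $f(x_1,x_2)=(x_1^2,x_2^2)$ (the map of Example 4.2), whose total derivative at $x_0$ is the linear map $h\mapsto(2x_{0,1}h_1,\,2x_{0,2}h_2)$. The function computes the ratio inside the limit above; the helper names are hypothetical.

```python
import math


def f(x):
    """f(x1, x2) = (x1^2, x2^2), the map from Example 4.2."""
    return (x[0] ** 2, x[1] ** 2)


def total_derivative(x0):
    """The linear map f'(x0): h -> (2 x0_1 h_1, 2 x0_2 h_2)."""
    return lambda h: (2 * x0[0] * h[0], 2 * x0[1] * h[1])


def norm(x):
    return math.sqrt(sum(t * t for t in x))


def newton_error_ratio(x0, h):
    """||f(x0 + h) - (f(x0) + f'(x0) h)|| / ||h|| for a nonzero step h = x - x0."""
    x = tuple(a + b for a, b in zip(x0, h))
    approx = tuple(a + b for a, b in zip(f(x0), total_derivative(x0)(h)))
    error = tuple(a - b for a, b in zip(f(x), approx))
    return norm(error) / norm(h)


x0 = (1.0, 2.0)
for t in [1e-1, 1e-2, 1e-3]:
    # Along h = t (3, 4) the ratio is |t| * sqrt(337) / 5 in exact arithmetic,
    # so it tends to 0, as the definition of f'(x0) requires.
    print(t, newton_error_ratio(x0, (3 * t, 4 * t)))
```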
Remark 3.5. We sometimes refer to $f'(x_0)$ as the total derivative of $f$, to distinguish $f'(x_0)$ from the related directional and partial derivatives.
4. Partial and Directional Derivatives
We now relate the total derivative to the partial and directional derivatives. Let n, m be
positive integers.
Definition 4.1. Let $E$ be a subset of $\mathbb{R}^n$, let $f\colon E\to\mathbb{R}^m$ be a function, let $x_0$ be an interior point of $E$, let $v\in\mathbb{R}^n$, and let $t$ be a real number. If the limit
$$\lim_{t\to0;\ t\neq0,\ x_0+tv\in E}\frac{f(x_0+tv)-f(x_0)}{t}$$
exists, we say that $f$ is differentiable in the direction $v$ at $x_0$, and we denote this limit by $D_vf(x_0)$:
$$D_vf(x_0) := \lim_{t\to0;\ t\neq0,\ x_0+tv\in E}\frac{f(x_0+tv)-f(x_0)}{t}.$$
Equivalently, we have
$$D_vf(x_0) = \frac{d}{dt}f(x_0+tv)\Big|_{t=0}.$$
Note that in this definition we are dividing by the scalar $t$, so this division is okay, and $D_vf(x_0)\in\mathbb{R}^m$.
Example 4.2. Let $f\colon\mathbb{R}^2\to\mathbb{R}^2$ be defined by $f(x_1,x_2)=(x_1^2,x_2^2)$. Let $x_0:=(1,2)$ and let $v:=(3,4)$. We then compute
$$\frac{f(x_0+tv)-f(x_0)}{t} = \frac{((1+3t)^2,(2+4t)^2)-(1,4)}{t} = \frac{(1+6t+9t^2,\ 4+16t+16t^2)-(1,4)}{t} = (6+9t,\ 16+16t).$$
Therefore,
$$D_vf(x_0) = \lim_{t\to0;\ t\neq0}(6+9t,\ 16+16t) = (6,16).$$
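A short Python sketch that evaluates the quotient from Definition 4.1 for Example 4.2 at several values of $t$. The helper name `directional_quotient` is hypothetical; the printed pairs should follow the closed form $(6+9t,\,16+16t)$ computed above.

```python
def f(x):
    """f(x1, x2) = (x1^2, x2^2) from Example 4.2."""
    return (x[0] ** 2, x[1] ** 2)


def directional_quotient(f, x0, v, t):
    """(f(x0 + t v) - f(x0)) / t, the quantity inside the limit defining D_v f(x0)."""
    xt = tuple(a + t * b for a, b in zip(x0, v))
    return tuple((a - b) / t for a, b in zip(f(xt), f(x0)))


x0, v = (1.0, 2.0), (3.0, 4.0)
for t in [0.5, 1e-1, 1e-2, 1e-3]:
    print(t, directional_quotient(f, x0, v, t))  # approximately (6 + 9t, 16 + 16t)
```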
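For $j\in\{1,\dots,n\}$, if $D_{e_j}f(x_0)$ exists, we call it the partial derivative of $f$ with respect to $x_j$ at $x_0$, and we denote it by
$$\frac{\partial f}{\partial x_j}(x_0)\quad\text{or}\quad\frac{\partial}{\partial x_j}f(x_0).$$
Note that if $f\colon E\to\mathbb{R}^m$, then $\frac{\partial f}{\partial x_j}(x_0)\in\mathbb{R}^m$. And if we write $f$ in its components as $f=(f_1,\dots,f_m)$, then
$$\frac{\partial f}{\partial x_j}(x_0) = \Big(\frac{\partial f_1}{\partial x_j}(x_0),\dots,\frac{\partial f_m}{\partial x_j}(x_0)\Big).$$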
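The partial derivative $\frac{\partial f}{\partial x_j}(x_0)$ is a vector in $\mathbb{R}^m$ whose $i$-th entry is $\frac{\partial f_i}{\partial x_j}(x_0)$. The sketch below estimates it with a symmetric (central) difference, a standard numerical choice that is not taken from these notes; the example map `g` is hypothetical.

```python
def partial(f, x0, j, h=1e-6):
    """Central-difference estimate of df/dx_j(x0) for f: R^n -> R^m.

    j is 1-indexed, matching e_j in the text; the result is a tuple in R^m
    whose i-th entry approximates df_i/dx_j(x0).
    """
    ej = [0.0] * len(x0)
    ej[j - 1] = 1.0
    plus = f(tuple(a + h * e for a, e in zip(x0, ej)))
    minus = f(tuple(a - h * e for a, e in zip(x0, ej)))
    return tuple((p - m) / (2 * h) for p, m in zip(plus, minus))


def g(x):
    # Hypothetical example map g(x1, x2) = (x1 * x2, x1 + 3 * x2).
    return (x[0] * x[1], x[0] + 3 * x[1])


print(partial(g, (2.0, 5.0), 1))  # close to (5.0, 1.0) = (x2, 1)
print(partial(g, (2.0, 5.0), 2))  # close to (2.0, 3.0) = (x1, 3)
```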
The total derivative and directional derivative are related in the following way.
$$f'(x_0)v = f'(x_0)\Big(\sum_{j=1}^n v_je_j\Big) = \sum_{j=1}^n v_jf'(x_0)e_j = \sum_{j=1}^n v_j\frac{\partial f}{\partial x_j}(x_0).\qquad(\ast)$$
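As a worked check (added for illustration, not part of the original argument), take the map $f(x_1,x_2)=(x_1^2,x_2^2)$ of Example 4.2, whose partial derivatives $\frac{\partial f}{\partial x_1}(x)=(2x_1,0)$ and $\frac{\partial f}{\partial x_2}(x)=(0,2x_2)$ are continuous. With $x_0=(1,2)$ and $v=(3,4)$, the sum on the right of the formula above reproduces the value $D_vf(x_0)=(6,16)$ computed directly in Example 4.2:

```latex
\sum_{j=1}^{2} v_j \frac{\partial f}{\partial x_j}(x_0)
  = 3\,(2,0) + 4\,(0,4)
  = (6,16)
  = D_{(3,4)} f(1,2).
```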
From Exercise 4.5, partial differentiability does not imply differentiability. However, if the partial derivatives of a function are continuous, then partial differentiability does imply differentiability. We will use equation $(\ast)$ to prove this assertion.
$$L(v_1,\dots,v_n) := \sum_{j=1}^n v_j\frac{\partial f}{\partial x_j}(x_0).$$
Let $\varepsilon>0$. We will find $\delta>0$ such that, if $x$ satisfies $0<\|x-x_0\|<\delta$, then
$$\frac{\|f(x)-(f(x_0)+L(x-x_0))\|}{\|x-x_0\|}<\varepsilon.$$
That is, we will show, if $x$ satisfies $0<\|x-x_0\|<\delta$, then
$$\|f(x)-(f(x_0)+L(x-x_0))\| < \varepsilon\|x-x_0\|.$$
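Since $x_0$ is an interior point of $F$, there exists $r>0$ such that $B(x_0,r)\subseteq F$. Since the partial derivative $\frac{\partial f}{\partial x_j}$ is continuous on $F$ for each $j\in\{1,\dots,n\}$, there exists $0<\delta_j<r$ such that $\big\|\frac{\partial f}{\partial x_j}(x)-\frac{\partial f}{\partial x_j}(x_0)\big\|<\varepsilon/(nm)$ for every $x\in B(x_0,\delta_j)$, for every $j\in\{1,\dots,n\}$.
Define $\delta:=\min_{j=1,\dots,n}\delta_j$. Then $\big\|\frac{\partial f}{\partial x_j}(x)-\frac{\partial f}{\partial x_j}(x_0)\big\|<\varepsilon/(nm)$ for every $x\in B(x_0,\delta)$, for every $j\in\{1,\dots,n\}$.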
Let $x\in B(x_0,\delta)$, and write $x=x_0+v_1e_1+\cdots+v_ne_n$ for some scalars $v_1,\dots,v_n$. Note that
$$\|x-x_0\| = \sqrt{v_1^2+\cdots+v_n^2}.$$
In particular, we have $|v_j|\le\|x-x_0\|$ for all $j\in\{1,\dots,n\}$. Recall that we need to show
$$\Big\|f(x_0+v_1e_1+\cdots+v_ne_n)-f(x_0)-\sum_{j=1}^n v_j\frac{\partial f}{\partial x_j}(x_0)\Big\| < \varepsilon\|x-x_0\|.$$
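For each $i\in\{1,\dots,m\}$, the Mean Value Theorem applied to $t\mapsto f_i(x_0+te_1)$ gives some $t_i$ between $0$ and $v_1$ such that $f_i(x_0+v_1e_1)-f_i(x_0)=v_1\frac{\partial f_i}{\partial x_1}(x_0+t_ie_1)$. Since $|t_i|\le|v_1|<\delta$, the point $x_0+t_ie_1$ lies in $B(x_0,\delta)$, so
$$\Big|\frac{\partial f_i}{\partial x_1}(x_0+t_ie_1)-\frac{\partial f_i}{\partial x_1}(x_0)\Big| \le \Big\|\frac{\partial f}{\partial x_1}(x_0+t_ie_1)-\frac{\partial f}{\partial x_1}(x_0)\Big\| \le \varepsilon/(nm).$$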
Therefore,
$$\Big|f_i(x_0+v_1e_1)-f_i(x_0)-\frac{\partial f_i}{\partial x_1}(x_0)v_1\Big| \le |v_1|\,\varepsilon/(nm).$$
Summing this inequality over $i\in\{1,\dots,m\}$ and using $\|(y_1,\dots,y_m)\|\le|y_1|+\cdots+|y_m|$, we have
$$\Big\|f(x_0+v_1e_1)-f(x_0)-\frac{\partial f}{\partial x_1}(x_0)v_1\Big\| \le |v_1|\,\varepsilon/n \le \|x-x_0\|\,\varepsilon/n.$$
In the last inequality, we used $|v_1|\le\|x-x_0\|$.
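Using a similar argument, we conclude that
$$\Big\|f(x_0+v_1e_1+v_2e_2)-f(x_0+v_1e_1)-\frac{\partial f}{\partial x_2}(x_0)v_2\Big\| \le \|x-x_0\|\,\varepsilon/n.$$
And so on, until we get
$$\Big\|f(x_0+v_1e_1+\cdots+v_ne_n)-f(x_0+v_1e_1+\cdots+v_{n-1}e_{n-1})-\frac{\partial f}{\partial x_n}(x_0)v_n\Big\| \le \|x-x_0\|\,\varepsilon/n.$$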
Summing these $n$ inequalities and using the triangle inequality $\|x+y\|\le\|x\|+\|y\|$, we get a telescoping sum which finally gives
$$\Big\|f(x_0+v_1e_1+\cdots+v_ne_n)-f(x_0)-\sum_{j=1}^n v_j\frac{\partial f}{\partial x_j}(x_0)\Big\| < \varepsilon\|x-x_0\|.$$
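The proof walks from $x_0$ to $x=x_0+v$ one coordinate at a time and bounds each step separately. A minimal Python sketch of that bookkeeping, for a hypothetical real-valued example ($m=1$) $f(x_1,x_2)=\sin(x_1)e^{x_2}$ with continuous partials: it confirms that the increments telescope to $f(x_0+v)-f(x_0)$, and that the total step error divided by $\|v\|$ shrinks as $v\to0$.

```python
import math


def f(x):
    # Hypothetical real-valued example (m = 1) with continuous partials:
    # f(x1, x2) = sin(x1) * exp(x2).
    return math.sin(x[0]) * math.exp(x[1])


def partials_f(x):
    # Exact partial derivatives: (cos(x1) e^{x2}, sin(x1) e^{x2}).
    return (math.cos(x[0]) * math.exp(x[1]), math.sin(x[0]) * math.exp(x[1]))


def coordinate_path(x0, v):
    """Points x0, x0 + v1 e1, x0 + v1 e1 + v2 e2, ..., x0 + v used in the proof."""
    points = [tuple(x0)]
    current = list(x0)
    for j, vj in enumerate(v):
        current[j] += vj
        points.append(tuple(current))
    return points


def proof_quantities(x0, v):
    """Return (telescoping gap, total step error / ||v||) for the steps in the proof."""
    pts = coordinate_path(x0, v)
    grad = partials_f(x0)
    increments = [f(pts[j + 1]) - f(pts[j]) for j in range(len(v))]
    # The increments telescope: their sum equals f(x0 + v) - f(x0).
    gap = sum(increments) - (f(pts[-1]) - f(pts[0]))
    # Step j error |f(p_j) - f(p_{j-1}) - v_j df/dx_j(x0)| is what the proof bounds.
    step_errors = [abs(increments[j] - v[j] * grad[j]) for j in range(len(v))]
    return gap, sum(step_errors) / math.sqrt(sum(vj * vj for vj in v))


x0 = (0.3, -0.2)
for s in [1e-1, 1e-2, 1e-3]:
    print(s, proof_quantities(x0, (s, 2 * s)))  # gap near 0, ratio shrinks with s
```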
From Theorem 4.7 and Lemma 4.3, if the partial derivatives of a function $f\colon E\to\mathbb{R}^m$ exist and are continuous on a set $F$, then all directional derivatives of $f$ exist at every interior point $x_0$ of $F$, and
$$D_{(v_1,\dots,v_n)}f(x_0) = \sum_{j=1}^n v_j\frac{\partial f}{\partial x_j}(x_0).$$
In particular, if $f\colon E\to\mathbb{R}$ is a real-valued function, and if we define the gradient $\nabla f(x_0)$ of $f$ at $x_0$ to be the $n$-dimensional row vector
$$\nabla f(x_0) := \Big(\frac{\partial f}{\partial x_1}(x_0),\dots,\frac{\partial f}{\partial x_n}(x_0)\Big),$$
then we have the formula
$$D_vf(x_0) = \langle\nabla f(x_0),v\rangle.$$
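A minimal Python check of the formula $D_vf(x_0)=\langle\nabla f(x_0),v\rangle$ for a hypothetical example $f(x_1,x_2,x_3)=x_1^2x_2+\sin x_3$. The directional derivative is estimated with a symmetric difference in $t$ (a numerical choice, not from the notes), and the gradient is computed by hand.

```python
import math


def f(x):
    # Hypothetical example: f(x1, x2, x3) = x1^2 * x2 + sin(x3).
    return x[0] ** 2 * x[1] + math.sin(x[2])


def gradient(x):
    # Partials computed by hand: (2 x1 x2, x1^2, cos x3).
    return (2 * x[0] * x[1], x[0] ** 2, math.cos(x[2]))


def directional_derivative(f, x0, v, t=1e-6):
    """Symmetric-difference estimate of D_v f(x0) = d/dt f(x0 + t v) at t = 0."""
    plus = f(tuple(a + t * b for a, b in zip(x0, v)))
    minus = f(tuple(a - t * b for a, b in zip(x0, v)))
    return (plus - minus) / (2 * t)


x0, v = (1.0, 2.0, 0.5), (0.3, -1.0, 2.0)
lhs = directional_derivative(f, x0, v)
rhs = sum(g * vi for g, vi in zip(gradient(x0), v))  # <grad f(x0), v>
print(lhs, rhs)  # the two numbers agree to many decimal places
```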
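More generally, if $f\colon E\to\mathbb{R}^m$ is a function with $f=(f_1,\dots,f_m)$, and $x_0$ is in the interior of the region where the partial derivatives of $f$ exist and are continuous, then Theorem 4.7 says
$$f'(x_0)(v_1,\dots,v_n) = \sum_{j=1}^n v_j\frac{\partial f}{\partial x_j}(x_0) = \Big(\sum_{j=1}^n v_j\frac{\partial f_i}{\partial x_j}(x_0)\Big)_{i=1}^m.$$
So, if we define the $m\times n$ matrix
$$Df(x_0) := \Big(\frac{\partial f_i}{\partial x_j}(x_0)\Big)_{1\le i\le m,\ 1\le j\le n} = \begin{pmatrix}
\frac{\partial f_1}{\partial x_1}(x_0) & \frac{\partial f_1}{\partial x_2}(x_0) & \cdots & \frac{\partial f_1}{\partial x_n}(x_0)\\
\frac{\partial f_2}{\partial x_1}(x_0) & \frac{\partial f_2}{\partial x_2}(x_0) & \cdots & \frac{\partial f_2}{\partial x_n}(x_0)\\
\vdots & \vdots & \ddots & \vdots\\
\frac{\partial f_m}{\partial x_1}(x_0) & \frac{\partial f_m}{\partial x_2}(x_0) & \cdots & \frac{\partial f_m}{\partial x_n}(x_0)
\end{pmatrix},$$
then we have
$$D_vf(x_0) = f'(x_0)v = Df(x_0)v.$$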
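A short Python sketch that assembles the matrix $Df(x_0)$ column by column (column $j$ is $\frac{\partial f}{\partial x_j}(x_0)$) using central differences, then multiplies by $v$. With the map of Example 4.2 it should recover $Df(1,2)\approx\begin{pmatrix}2&0\\0&4\end{pmatrix}$ and $Df(x_0)v\approx(6,16)$. The helper names and the finite-difference step are illustrative assumptions.

```python
def jacobian(F, x0, h=1e-6):
    """Central-difference estimate of the m x n matrix DF(x0) = (dF_i/dx_j(x0))."""
    columns = []
    for j in range(len(x0)):
        plus = list(x0)
        minus = list(x0)
        plus[j] += h
        minus[j] -= h
        # Column j estimates the partial derivative dF/dx_j(x0), a vector in R^m.
        columns.append([(a - b) / (2 * h) for a, b in zip(F(tuple(plus)), F(tuple(minus)))])
    # Transpose so that row i lists the partial derivatives of the component F_i.
    return [[col[i] for col in columns] for i in range(len(columns[0]))]


def mat_vec(A, v):
    """Matrix-vector product Av for A stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def f(x):
    """f(x1, x2) = (x1^2, x2^2), the map from Example 4.2."""
    return (x[0] ** 2, x[1] ** 2)


J = jacobian(f, (1.0, 2.0))
print(J)                       # approximately [[2, 0], [0, 4]]
print(mat_vec(J, (3.0, 4.0)))  # approximately [6, 16], matching Example 4.2
```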
The matrix $Df(x_0)$ is sometimes called the derivative or the differential of $f$ at $x_0$. We still wish to distinguish the matrix $Df(x_0)$ from the linear transformation $f'(x_0)$, since the latter is defined in a way which does not depend on the chosen basis of Euclidean space.
5. The Chain Rule in Several Variables
Let $n,m,p$ be positive integers. Recall that if $f\colon X\to Y$ and $g\colon Y\to Z$ are functions, then the composition $g\circ f\colon X\to Z$ is defined by $g\circ f(x):=g(f(x))$, for all $x\in X$.
Theorem 5.1 (The Chain Rule in Multiple Variables). Let $E$ be a subset of $\mathbb{R}^n$, let $F$ be a subset of $\mathbb{R}^m$, let $f\colon E\to F$ be a function, and let $g\colon F\to\mathbb{R}^p$. Let $x_0$ be a point in the interior of $E$. Assume that $f$ is differentiable at $x_0$ and that $f(x_0)$ is in the interior of $F$. Assume also that $g$ is differentiable at $f(x_0)$. Then $g\circ f\colon E\to\mathbb{R}^p$ is also differentiable at $x_0$, and
$$(g\circ f)'(x_0) = g'(f(x_0))f'(x_0).$$
Remark 5.2. We can intuitively think of the chain rule as follows. From Newton's approximation, we have
$$f(x)-f(x_0)\approx f'(x_0)(x-x_0).$$
Also, using Newton's approximation again,
$$g(f(x))-g(f(x_0))\approx g'(f(x_0))(f(x)-f(x_0)).$$
So, combining these two approximations, we have
$$g(f(x))-g(f(x_0))\approx g'(f(x_0))f'(x_0)(x-x_0).$$
That is, $(g\circ f)'(x_0)=g'(f(x_0))f'(x_0)$. The rigorous version of this argument irons out the details inherent in Newton's approximation.
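A minimal numerical check of Theorem 5.1 for two hypothetical maps, $f(x_1,x_2)=(x_1x_2,\,x_1+x_2)$ and $g(y_1,y_2)=y_1^2+3y_2$, whose derivative matrices are written out by hand. It compares the matrix product $Dg(f(x_0))\,Df(x_0)$ with a central-difference estimate of the derivative matrix of $g\circ f$ at $x_0=(1,2)$.

```python
def mat_mul(A, B):
    """Product of an (m x k) matrix A and a (k x n) matrix B, both stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def numeric_jacobian(F, x0, h=1e-6):
    """Central-difference estimate of the derivative matrix of F: R^n -> R^p at x0."""
    cols = []
    for j in range(len(x0)):
        plus = list(x0)
        minus = list(x0)
        plus[j] += h
        minus[j] -= h
        cols.append([(a - b) / (2 * h) for a, b in zip(F(tuple(plus)), F(tuple(minus)))])
    return [[col[i] for col in cols] for i in range(len(cols[0]))]


def f(x):  # hypothetical f: R^2 -> R^2
    return (x[0] * x[1], x[0] + x[1])


def Df(x):  # its derivative matrix, computed by hand
    return [[x[1], x[0]], [1.0, 1.0]]


def g(y):  # hypothetical g: R^2 -> R, returned as a 1-tuple so it is vector-valued
    return (y[0] ** 2 + 3 * y[1],)


def Dg(y):  # its 1 x 2 derivative matrix
    return [[2 * y[0], 3.0]]


x0 = (1.0, 2.0)
chain = mat_mul(Dg(f(x0)), Df(x0))                # g'(f(x0)) f'(x0)
direct = numeric_jacobian(lambda x: g(f(x)), x0)  # derivative of g o f, estimated directly
print(chain)   # [[11.0, 7.0]]
print(direct)  # approximately [[11.0, 7.0]]
```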
Exercise 5.3.
Let $L\colon\mathbb{R}^n\to\mathbb{R}^m$ be a linear transformation. Show that there exists a real number $M>0$ such that $\|Lx\|\le M\|x\|$ for all $x\in\mathbb{R}^n$. (Hint: first, using Remark 1.5, write $L$ in terms of a matrix $A$. Then, set $M$ to be equal to the sum of the absolute values of the entries of $A$. Use the triangle inequality a lot. There are many different ways to do this exercise, some of which use a different value of $M$. For example, you could try using the Cauchy-Schwarz inequality.) In particular, conclude that any linear transformation $L\colon\mathbb{R}^n\to\mathbb{R}^m$ is continuous.
Let $E$ be a subset of $\mathbb{R}^n$. Assume that $f\colon E\to\mathbb{R}^m$ is differentiable at an interior point $x_0$ of $E$. Show that $f$ is also continuous at $x_0$.
Prove Theorem 5.1. (Hint: it may be helpful to review the proof of the single variable chain rule. It is probably easiest to use the sequence definition of a limit.)
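The bound suggested in the hint to the first part of Exercise 5.3 can be sanity-checked numerically. The sketch below uses a hypothetical $2\times3$ matrix and random test vectors; passing the check is evidence for the inequality $\|Ax\|\le M\|x\|$, not a proof of it.

```python
import math
import random


def entry_sum_bound(A):
    """M = sum of |a_ij| over all entries, the constant suggested in the hint."""
    return sum(abs(a) for row in A for a in row)


def mat_vec(A, x):
    """The linear map x -> Ax for A stored as a list of rows."""
    return [sum(a * t for a, t in zip(row, x)) for row in A]


def norm(x):
    return math.sqrt(sum(t * t for t in x))


A = [[1.0, -2.0, 0.5], [3.0, 0.0, -1.0]]  # hypothetical 2 x 3 matrix, so L: R^3 -> R^2
M = entry_sum_bound(A)                     # 7.5
rng = random.Random(0)
for _ in range(1000):
    x = [rng.uniform(-10.0, 10.0) for _ in range(3)]
    # Checks ||Lx|| <= M ||x|| on random samples (evidence, not a proof).
    assert norm(mat_vec(A, x)) <= M * norm(x)
print("bound held on all samples with M =", M)
```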
Example 5.4. Suppose $f\colon\mathbb{R}^n\to\mathbb{R}^m$ is a differentiable function, and $x_j\colon\mathbb{R}\to\mathbb{R}$ are differentiable functions for all $j\in\{1,\dots,n\}$. Then
$$\frac{d}{dt}f(x_1(t),\dots,x_n(t)) = \sum_{j=1}^n x_j'(t)\frac{\partial f}{\partial x_j}(x_1(t),\dots,x_n(t)).$$
This follows from the chain rule.
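A small Python check of Example 5.4 under hypothetical choices $f(x_1,x_2)=x_1x_2^2$, $x_1(t)=\cos t$, $x_2(t)=t^3$: the right-hand side of the formula is compared with a central-difference estimate of $\frac{d}{dt}f(x_1(t),x_2(t))$. For these choices both should equal $-t^6\sin t+6t^5\cos t$.

```python
import math


def f(x):
    # Hypothetical example f(x1, x2) = x1 * x2^2 (so n = 2, m = 1).
    return x[0] * x[1] ** 2


def partials_f(x):
    # df/dx1 = x2^2, df/dx2 = 2 x1 x2.
    return (x[1] ** 2, 2 * x[0] * x[1])


def curve(t):
    # Hypothetical differentiable coordinates x1(t) = cos t, x2(t) = t^3.
    return (math.cos(t), t ** 3)


def curve_velocity(t):
    # (x1'(t), x2'(t)) = (-sin t, 3 t^2).
    return (-math.sin(t), 3 * t ** 2)


def chain_rule_rate(t):
    """Right-hand side of Example 5.4: sum_j x_j'(t) * df/dx_j(x(t))."""
    return sum(dx * p for dx, p in zip(curve_velocity(t), partials_f(curve(t))))


def finite_difference_rate(t, h=1e-6):
    """Central-difference estimate of d/dt f(x1(t), x2(t))."""
    return (f(curve(t + h)) - f(curve(t - h))) / (2 * h)


t = 0.7
print(chain_rule_rate(t), finite_difference_rate(t))  # the two values agree closely
```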
a
f
(x)
a
xi x j
xj xi
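Define
$$M := f(\delta e_i+\delta e_j)-f(\delta e_i)-f(\delta e_j)+f(0).$$
Applying the Fundamental Theorem of Calculus in the $x_i$ variable, we have
$$f(\delta e_i+\delta e_j)-f(\delta e_j) = \int_0^\delta\frac{\partial f}{\partial x_i}(x_ie_i+\delta e_j)\,dx_i.$$
And
$$f(\delta e_i)-f(0) = \int_0^\delta\frac{\partial f}{\partial x_i}(x_ie_i)\,dx_i.$$
Therefore,
$$M = \int_0^\delta\Big(\frac{\partial f}{\partial x_i}(x_ie_i+\delta e_j)-\frac{\partial f}{\partial x_i}(x_ie_i)\Big)\,dx_i.$$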
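For each $x_i\in(0,\delta)$, there exists $x_j\in[0,\delta]$ such that, by the Mean Value Theorem, we have
$$\frac{\partial f}{\partial x_i}(x_ie_i+\delta e_j)-\frac{\partial f}{\partial x_i}(x_ie_i) = \delta\,\frac{\partial^2 f}{\partial x_j\,\partial x_i}(x_ie_i+x_je_j).$$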
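By our choice of $\delta$ (noting that $\|x_ie_i+x_je_j\|<2\delta$), we therefore have
$$\Big|\frac{\partial f}{\partial x_i}(x_ie_i+\delta e_j)-\frac{\partial f}{\partial x_i}(x_ie_i)-\delta a'\Big| < \varepsilon\delta.$$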
So, integrating this inequality over $x_i\in[0,\delta]$, we get
$$|M-\delta^2a'| < \varepsilon\delta^2.$$
We can run this same argument with the roles of $i$ and $j$ reversed (noting that $M$ is symmetric in $i,j$) to get
$$|M-\delta^2a| < \varepsilon\delta^2.$$
So, from the triangle inequality, $\delta^2|a-a'|\le|M-\delta^2a|+|M-\delta^2a'|<2\varepsilon\delta^2$, and we conclude that
$$|a-a'| < 2\varepsilon.$$
Since this inequality holds for all $\varepsilon>0$, we conclude that $a=a'$, as desired.
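The quantity $M/\delta^2$ from this proof is a second-order mixed difference quotient that approximates both mixed partial derivatives at once. A minimal sketch, assuming $x_0=0$, $i=1$, $j=2$, and the hypothetical $C^2$ example $f(x_1,x_2)=x_1^3x_2^2+\sin(x_1x_2)$, whose two mixed partials at $0$ both equal $1$ (here $M/\delta^2=\delta^3+\sin(\delta^2)/\delta^2$ exactly).

```python
import math


def f(x):
    # Hypothetical C^2 example: f(x1, x2) = x1^3 * x2^2 + sin(x1 * x2).
    return x[0] ** 3 * x[1] ** 2 + math.sin(x[0] * x[1])


def mixed_difference(f, delta):
    """M / delta^2 with M = f(d e1 + d e2) - f(d e1) - f(d e2) + f(0), as in the proof."""
    M = f((delta, delta)) - f((delta, 0.0)) - f((0.0, delta)) + f((0.0, 0.0))
    return M / delta ** 2


for delta in [1e-1, 1e-2, 1e-3]:
    # Tends to the common value of both mixed partials at 0, which is 1 here.
    print(delta, mixed_difference(f, delta))
```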
7. Appendix: Notation
Let $A,B$ be sets in a space $X$. Let $m,n$ be nonnegative integers.
Rn
AB
ArB
Ac
AB
AB
Let $(X,d)$ be a metric space, let $x_0\in X$, let $r>0$ be a real number, and let $E$ be a subset of $X$. Let $(x_1,\dots,x_n)$ be an element of $\mathbb{R}^n$, and let $p\ge1$ be a real number.
Let $f,g\colon(X,d_X)\to(Y,d_Y)$ be maps between metric spaces. Let $V\subseteq X$, and let $W\subseteq Y$.
$f(V) := \{f(v)\in Y : v\in V\}$.
$f^{-1}(W) := \{x\in X : f(x)\in W\}$.
$d_\infty(f,g) := \sup_{x\in X}d_Y(f(x),g(x))$.
$\langle f,g\rangle := \int_0^1 f(x)g(x)\,dx$.
$\|f\|_2 := \sqrt{\langle f,f\rangle} = \big(\int_0^1|f(x)|^2\,dx\big)^{1/2}$.
$d_{L^2}(f,g) := \|f-g\|_2 = \big(\int_0^1|f(x)-g(x)|^2\,dx\big)^{1/2}$.
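A rough numerical illustration of the $L^2$ quantities and the sup metric for real-valued functions on $[0,1]$ (taking $X=[0,1]$ and $Y=\mathbb{R}$ with $d_Y(s,t)=|s-t|$, an assumption for this example). The integrals are approximated with the midpoint rule and the supremum with a finite grid, so the grid value is only a lower estimate of $d_\infty$ in general. For $f(x)=x$, $g(x)=x^2$ the exact values are $\langle f,g\rangle=1/4$, $\|f\|_2=(1/3)^{1/2}$, $d_{L^2}(f,g)=(1/30)^{1/2}$, and $d_\infty(f,g)=1/4$.

```python
def midpoint_integral(h, num_points=10000):
    """Midpoint-rule approximation of the integral of h over [0, 1]."""
    step = 1.0 / num_points
    return step * sum(h((k + 0.5) * step) for k in range(num_points))


def l2_inner(f, g):
    """Approximates <f, g> = integral_0^1 f(x) g(x) dx."""
    return midpoint_integral(lambda x: f(x) * g(x))


def l2_norm(f):
    """Approximates ||f||_2 = <f, f>^(1/2)."""
    return l2_inner(f, f) ** 0.5


def l2_distance(f, g):
    """Approximates d_{L^2}(f, g) = ||f - g||_2."""
    return l2_norm(lambda x: f(x) - g(x))


def sup_distance(f, g, num_points=1000):
    """Grid estimate (from below) of d_inf(f, g) = sup_x |f(x) - g(x)| on [0, 1]."""
    return max(abs(f(k / num_points) - g(k / num_points)) for k in range(num_points + 1))


def identity(x):
    return x


def square(x):
    return x * x


print(l2_inner(identity, square))      # close to 1/4
print(l2_norm(identity))               # close to (1/3)^(1/2)
print(l2_distance(identity, square))   # close to (1/30)^(1/2)
print(sup_distance(identity, square))  # 0.25, attained at x = 1/2
```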
Let $n,m$ be positive integers, let $(e_1,\dots,e_n)$ denote the standard basis of $\mathbb{R}^n$, let $E$ be a subset of $\mathbb{R}^n$, let $f\colon E\to\mathbb{R}^m$ be a function, let $x_0\in E$ be an interior point of $E$, let $v\in\mathbb{R}^n$, and let $j\in\{1,\dots,n\}$.
$f'(x_0)$ denotes the total derivative of $f$ at $x_0$.
$D_vf(x_0)$ denotes the derivative of $f$ in the direction $v$ at $x_0$.
$\frac{\partial f}{\partial x_j}(x_0) = \frac{\partial}{\partial x_j}f(x_0) = D_{e_j}f(x_0)$.
Let $E$ be a subset of $\mathbb{R}^n$, let $f\colon E\to\mathbb{R}$ be a function, and let $x_0$ be an interior point of $E$.
$\nabla f(x_0) = \Big(\frac{\partial f}{\partial x_1}(x_0),\dots,\frac{\partial f}{\partial x_n}(x_0)\Big)$.
7.1. Set Theory. Let $X,Y$ be sets, and let $f\colon X\to Y$ be a function. The function $f\colon X\to Y$ is said to be injective (or one-to-one) if and only if: for every $x,x'\in X$, if $f(x)=f(x')$, then $x=x'$.
The function $f\colon X\to Y$ is said to be surjective (or onto) if and only if: for every $y\in Y$, there exists $x\in X$ such that $f(x)=y$.
The function $f\colon X\to Y$ is said to be bijective (or a one-to-one correspondence) if and only if: for every $y\in Y$, there exists exactly one $x\in X$ such that $f(x)=y$. A function $f\colon X\to Y$ is bijective if and only if it is both injective and surjective.
Two sets X, Y are said to have the same cardinality if and only if there exists a bijection
from X onto Y .
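For finite sets these definitions can be checked directly. A minimal sketch, assuming a function $f\colon X\to Y$ is represented as a Python dict from elements of $X$ to elements of $Y$ (a representation chosen only for illustration); the example maps are hypothetical.

```python
def is_injective(f, X):
    """f: X -> Y given as a dict; injective iff distinct inputs have distinct images."""
    images = [f[x] for x in X]
    return len(set(images)) == len(images)


def is_surjective(f, X, Y):
    """Surjective iff every y in Y equals f(x) for some x in X (assumes f maps X into Y)."""
    return {f[x] for x in X} == set(Y)


def is_bijective(f, X, Y):
    """Bijective iff both injective and surjective, as stated in the text."""
    return is_injective(f, X) and is_surjective(f, X, Y)


X, Y = {1, 2, 3}, {"a", "b", "c"}
f = {1: "a", 2: "b", 3: "c"}  # a bijection, so X and Y have the same cardinality
g = {1: "a", 2: "a", 3: "b"}  # neither injective nor surjective
print(is_bijective(f, X, Y))                       # True
print(is_injective(g, X), is_surjective(g, X, Y))  # False False
```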
UCLA Department of Mathematics, Los Angeles, CA 90095-1555
E-mail address: heilman@math.ucla.edu