Académique Documents
Professionnel Documents
Culture Documents
PAUL S CHRIMPF
N OVEMBER 2, 2016
U NIVERSITY OF B RITISH C OLUMBIA
E CONOMICS 526
In this lecture, we will define derivatives for functions on vector spaces. We will show
that all the familiar properties of derivatives the mean value theorem, chain rule, etc
hold in any vector space. We will primarily focus on Rn , but we also discuss infinite
dimensional spaces (will we need to differentiate in them to study optimal control later).
All of this material is also covered in chapter 4 of Carter. Chapter 14 of Simon and Blume
and chapter 9 of Rudins Principles of Mathematical Analysis cover differentiation on
Rn . Simon and Blume is better for general understanding and applications, but Rudin is
better for proofs and rigor.
1. D ERIVATIVES
1.1. Partial derivatives. We have discussed limits of sequences, but perhaps not limits of
functions. To be complete, we define limits as follows.
Definition 1.1. Let X and Y be metric spaces and f : X Y.
lim f ( x ) = c
x x0
where x and x0 X and c Y, means that > 0 > 0 such that d( x, x0 ) < implies
d( f ( x ), c) < .
Equivalently, we could say limx x0 f ( x ) = c means that for any sequence { xn } with
xn x, f ( xn )c.
You are probably already familiar with the derivative of a function of one variable. Let
f : RR. f is differentiable at x0 if
f ( x0 + h ) f ( x0 ) df
lim = ( x0 )
h 0 h dx
exists. Similiarly, if f : V W we define its ith partial derivative as follows.
Definition 1.2. Let f : Rn R. The ith partial derivative of f is
f f ( x01 , ..., x0i + h, ...x0n ) f ( x0 )
( x0 ) = lim .
xi h 0 h
The ith partial derivative tells you how much the function changes as its ith argument
changes.
1
DIFFERENTIAL CALCULUS
f
Example 1.1. Let f : Rn R be a production function. Then we call x the marginal
i
product of xi . If f is Cobb-Douglas, f (k, l ) = Ak l , where k is capital and l is labor,
then the marginal products of capital and labor are
f
(k, l ) = Ak1 l
k
f
(k, l ) = Ak l 1 .
l
1.2. Examples.
u
Example 1.2. If u : Rn R is a utility function, then we call xi the marginal utility of
xi . If u is CRRA,
T 1
c
u(c1 , ..., c T ) = t t
t =1
1
then the marginal utility of consumption in period t is
u
= t c
t .
ct
Example 1.3. The price elasticity of demand is the percentage change in demand di-
vided by the percentage change in its price. If q1 : R3 R is a demand function with
three arguments: own price p1 , the price of another good, p2 , and consumer income,
y. The own price elasticity is
q1 p1
q1 ,p1 = .
p1 q1 ( p1 , p2 , y)
The cross price elasticity is the percentage change in demand divided by the percent-
age change in the other goods price, i.e.
q1 p2
q1 ,p2 = .
p2 q1 ( p1 , p2 , y)
Similarly, the income elasticity of demand is
q1 y
q1 ,y = .
y q1 ( p1 , p2 , y)
1.3. Total derivatives. Derivatives of univariate functions have a number of useful prop-
erties that partial derivatives do not always share. Examples of useful properties include
univariate derivatives giving the slope of a tangent line, the implicit function theorem,
and Taylor series approximations. We would like the derivatives of multivariate func-
tions to have these properties, but partial derivatives are not enough for this.
2
DIFFERENTIAL CALCULUS
(x 2 + y 2) (x y < 0) + (x + y) (x y >= 0)
100
50
-50
-100
-150
-20010
5 10
5
0
0
y
-5
-5 x
-10-10
f f
The partial derivatives of this function at 0 are x (0, 0) = 1 and y (0, 0) = 1. How-
f
ever, there are points arbitrarily close to zero with x ( x, y)
= 2x + 2y. If we were to
try to draw a tangent plane to the function at zero, we would find that we cannot.
Although the partial derivatives of this function exist everywhere, it is in some sense
not differentialable at zero (or anywhere with xy = 0).
Partially motivated by the preceding example, we define the total derivative (or just the
derivative; were saying total to emphasize the difference between partial derivatives
and the derivative).
approaches 0 along the axes. This allows partial derivatives to exist for strange functions
like the one in example 1.4. We can see that the function from the example is not differen-
tiable by letting h approach 0 along a path that switches from xy < 0 to xy 0 infinitely
many times close to 0. The limit in the definition of the derivative does not exist along
such a path, so the derivative does not exist.
Proof. Since f is differentiable at x0 , we can make h = ei t for ei the ith standard basis
vector, and t a scalar. The definition of derivative says that
| f ( x0 + ei t) f ( x0 ) D f x0 (ei t)|
lim = 0.
t 0 ei t
Let
ri ( x0 , t) = f ( x0 + ei t) f ( x0 ) tD f x0 ei
|ri ( x0 ,t)|
and note that limt0 |t|
= 0. Rearranging and dividing by t,
f ( x0 + ei t ) f ( x0 ) r ( x , t)
= D f x0 ei + i 0
t t
and taking the limit
f ( x0 + ei t ) f ( x0 )
lim = D f x0 ei
t 0 t
we get the exact same expression as in the definition of the partial derivative. Therefore,
f
xi = D f x0 ei . Finally, as when we first introduced matrices, we know that linear transfor-
mation D f x0 must be represented by
( )
f f
D f x0 h = x
1
( x0 ) x n
( x 0 ) h.
We know from example 1.4 that the converse of this theorem is false. The existence of
partial derivatives is not enough for a function to be differentiable. However, if the partial
derivatives exist and are continuous in a neighborhood, then the function is differentiable.
Theorem 1.2. Let f : Rn R and suppose its partial derivatives exist and are continuous in
N ( x0 ) for some > 0. Then f is differentiable at x0 with
( )
f f
D f x0 = x ( x0 ) x
1 n
( x 0 ) .
4
DIFFERENTIAL CALCULUS
for some h j between 0 and h j . The partial derivatives are continuous by assumption, so
by making r small enough, we can make
f j 1
f
x j 0 i
( x + h e
i i + h e
j j ) ( x 0 < /n,
)
=1
x j
1.4. Mean value theorem. The mean value theorem in R1 says that f ( x + h) f ( x ) =
f ( x )h for some x between x + h and x. The same theorem holds for multivariate func-
tions. To prove it, we will need a couple of intermediate results.
Definition 1.4. Let f : Rn R. we say that f has a local maximum at x if > 0 such that
f (y) f ( x ) for all y N ( x ).
Next, we need a result that relates derivatives to maxima.
Theorem 1.4. Let f : Rn R and suppose f has a local maximum at x and is differentiable at x.
Then D f x = 0.
Proof. Choose as in the definition of a local maximum. Since f is differentiable, we can
write
f ( x + h) f ( x ) D f x h + r ( x, h)
=
h h
|r ( x,h)|
where limh0 h
= 0. Let h = tv for some v Rn with v = 1 and t R. If D f x v >
f ( x +tv) f ( x ) r ( x,tv)
0, then for t > 0 small enough, we would have |t|
= D f x v + |t| > D f x v/2 >
0 and f ( x + tv) > f ( x ) in contradiction to x being a local maximum. Similary, if D f v v < 0
f ( x +tv) f ( x )
then for t < 0 and small, we would have |t|
= D f x v + r(x,tv
|t|
)
> D f x v/2 > 0
and f ( x + tv) > f ( x ). Thus, it must be that D f x v = 0 for all v, i.e. D f x = 0.
Now we can prove the mean value theorem.
Theorem 1.5 (mean value). Let f : Rn R1 be in continuously differentiable on some open set
U (i.e. f C1 (U ). Let x, y U be such that the line connecting x and y, ( x, y) = {z Rn :
z = x + (1 )y, [0, 1]}, is also in U. Then there is some x ( x, y) such that
f ( x ) f (y) = D f x ( x y).
Proof. Let z(t) = y + t( x y) for t [0, 1] (i.e. t = ). Define
g(t) = f (y) f (z(t)) + ( f ( x ) f (y)) t
Note that g(0) = g(1) = 0. The set [0, 1] is closed and bounded, so it is compact. It is easy
to verify that g(t) is continuously differentiable since f is continuously differentiable .
Hence, g must attain its maximum on [0, 1], say at t. If t = 0 or 1, then either g is constant,
in which case any t (0, 1) is also a maximum, or g must have an interior minimum,
and we can look at the maximum of g instead. When t is not 0 or 1, then the previous
theorem shows that g (t) = 0. Simple calculation shows that
g (t) = D f z(t) ( x y) + f ( x ) f (y) = 0
so
D f x ( x y) = f ( x ) f (y)
where x = z(t).
1.5. Functions from Rn Rm . So far we have only looked at functions from Rn to R.
Functions to Rm work essentially the same way.
Definition 1.5. Let f : Rn Rm . The derivative (or total derivative or differential) of f at
x0 is a linear mapping, D f x0 : Rn Rm such that
f ( x 0 + h ) f ( x 0 ) D f x0 h
lim = 0.
h 0 h
6
DIFFERENTIAL CALCULUS
Theorems 1.1 and 2.1 sill hold with no modification. The total derivative of f can be
represented by the m by n matrix of partial derivatives,
f f1
1
( x 0 ) ( x 0 )
x1 . xn
D f x0 =
.. ..
. .
fm fm
x ( x
1
0 ) xn ( x 0 )
This matrix of partial derivatives is often called the Jacobian of f .
The mean value theorem 1.5 holds for each of the component functions of f : Rn Rm .
( )T
Meaning, that f can be written as f ( x ) = f 1 ( x ) f m ( x ) where each f j : Rn R.
The mean value theorem is true for each f j , but the xs will typically differ with j.
Corollary 1.1 (mean value for Rn Rm ). Let f : Rn Rm be in C1 (U ) for some open U. Let
x, y U be such that the line connecting x and y, ( x, y) = {z Rn : z = x + (1 )y,
[0, 1]}, is also in U. Then there are x j ( x, y) such that
f j ( x ) f j (y) = D f j x ( x y)
j
and
D f 1 x1
f ( x ) f (y) = ... ( x y).
D f m xm
( )T
Slightly abusing notation, we might at times write D f x instead of D f 1 x1 D f m xm
with the understanding that we mean the later.
1.6. Chain rule. For univariate functions, the chain rule says that the derivative of f ( g( x ))
is f ( g( x )) g ( x ). The same is true for multivariate functions.
Theorem 1.6. Let f : Rn Rm and g : Rk Rn . Let g be continuously differentiable on some
open set U and f be continuously differentiable on g(U ). Then h : Rk Rm , h( x ) = f ( g( x )) is
continuously differentiable on U with
Dh x = D f g( x) Dgx
Proof. Let x U. Consider
f ( g( x + d)) f ( g( x ))
.
d
Since g is differentiable by the mean value theorem, g( x + d) = g( x ) + Dgx(d) d, so
f ( g( x + d)) f ( g( x )) =
f ( g( x ) + Dgx(d) d) f ( g( x ))
f ( g( x ) + Dgx d) f ( g( x )) +
where the inequality follows from the the continuity of Dgx and f , and holds for any
> 0. f is differentiable, so
f ( g( x ) + Dgx d) f ( g( x )) D f g( x) Dgx d
lim =0
Dgx d0 Dgx d
7
DIFFERENTIAL CALCULUS
1.7. Higher order derivatives. We can take higher order derivatives of multivariate func-
tions just like of univariate functions. If f : Rn Rm , then is has nm partial first deriva-
tives. Each of these has n partial derivatives, so f has n2 m partial second derivatives,
2 f k
written xi x j .
Theorem 1.7. Let f : Rn Rm be twice continuously differentiable on some open set U. Then
2 f k 2 f k
(x) = (x)
xi x j x j xi
2 f 1
from which it is apparent that we get the same expression for x j xi .
The same argument shows that in general the order of partial derivatives does not
matter.
Corollary 1.2. Let f : Rn Rm be k times continuously differentiable on some open set U. Then
k f k f
j j
= j p (1) j p(n)
x11 xnn x p(1) x p(n)
where in=1 ji = k and p : {1, .., n}{1, ..., n} is any permutation (i.e. reordering).
1This proof is not completely correct. We should carefully show that we can interchange the order of
taking limits. Interchanging limits is not always possible, but the assumed continuity makes it possible
here.
8
DIFFERENTIAL CALCULUS
1.8. Taylor series. You have probably seen Taylor series for univariate functions before.
A function can be approximated by a polynomial whose coefficients are the functions
derivatives.
Theorem 1.8. Let f : RR be k + 1 times continuously differentiable on some open set U, and
let a, a + h U. Then
f 2 ( a) 2 f k ( a) k f k+1 ( a) k+1
f ( a + h) = f ( a) + f ( a)h + h + ... + h + h
2 k! ( k + 1) !
where a is between a and h.
The same theorem is true for multivariate functions.
Theorem 1.9. Let f : Rn Rm be k times continuously differentiable on some open set U and
a, a + h U. Then there exists a k times continuously differentiable function rk ( a, h) such that
k
1 ji f
f ( a + h) = f ( a) +
j j j
j j
( a)h11 h22 hnn + rk ( a, h)
n k! x xn
1 n
i =1
j =1 i 1
k
and limh0 rk ( a, h) h = 0.
Proof. Follows from the mean value theorem. For k = 1, the mean value theorem says
that
f ( a + h) f ( a) = D f a h
f ( a + h) = f ( a) + D f a h
= f ( a) + D f a h + ( D f a D f a )h
| {z }
r1 ( a,h)
Example 1.5. The mean value theorem is used often in econometrics to show asymp-
totic normality. Many estimators can be written as
n arg min Qn ( )
where Qn ( ) is some objective function that depends on the sampled data. Exam-
ples include least squares, maximum likelihood and the generalized method of mo-
ments. Suppose there is also a population version of the objective function, Q0 ( ) and
p
Qn ( ) Q0 ( ) as n. There is a true value of the parameter, 0 , that satisfies
0 arg min Q0 ( ).
For example for OLS,
1 n
n i
Qn ( ) = ( y i x i )2
=1
and [ ]
Q0 ( ) = E (Y X )2 .
If Qn is continuously differentiablea on and n int(), then from theorem 1.4,
DQn = 0
n
aEssentially the same argument works if you expand Q0 instead of Qn . This is sometimes necessary
because there are some models, like quantile regression, where Qn is not differentiable, but Q0 is dif-
ferentiable.
10
DIFFERENTIAL CALCULUS
Lemma 2.2. If f : V W has Gateaux derivatives that are linear in v and continuous in x in
the sense that > 0 > 0 such that if x1 x < , then
d f ( x1 ; v) d f ( x; v)
sup <
v V v
then f is Frechet differentiable with D f x0 v = d f ( x; v).
Example (Example 2.1 continued). Motivated by lemmas 2.1 and 2.2, we can find the
Frechet derivative of f by computing its Gateaux derivatives. Let v V. Remember
12
DIFFERENTIAL CALCULUS
13
DIFFERENTIAL CALCULUS
To show that first order Taylor expansions have small approximation error, we need to
show that ( D f x D f x )h is uniformly small for all possible h. When W is R1 , there is a
single x. When W = Rm , then x is different for each dimension (as in theorem 1.1). How-
ever, this is okay because we just take the maximum over the finitely many different x to
get an upper bound on ( D f x D f x )h. When W is infinite dimensional, this argument
does not work. Instead, we must use a result called the Hahn-Banach extension theorem,
which is closely related to the separating hyperplane theorem.
Theorem 2.2 (Hahn-Banach theorem ). Let V be a vector space and g : V R be convex. Let
S V be a linear subspace. Suppose f 0 : SR is linear and
f 0 ( x ) g( x )
for all x S. Then f 0 can be extended to f : V R such that
f0 (x) = f (x)
14
DIFFERENTIAL CALCULUS
Proof. We will apply the separating hyperplane theorem to two sets in V R. Let
A = {( x, a) : a > g( x ), x V, a R}.
B = {( x, a) : a = f 0 ( x ), x S, a R}
Since
f 0 is linear, B is convex. Clearly A B = . Also, A = because A is open, so
A = A and A = since e.g. ( x, g( x ) + 1) A.
Therefore, by the separating hyperplane theorem, (V R) and c R such that
z A > c z B
( x, y) = ( x, 0) + y (0, 1)
( x, y)
= f (x) + y
( x, 0)
If x S, then ( x, f 0 ( x )) = 0, so f ( x ) = f 0 ( x ) for all x S. Similarly for any y > g( x ),
( x,y)
then ( x, y) A, so ( x,0) = f ( x ) + y > 0, and y > f ( x ). Therefore, f ( x ) g( x ).
With this result in hand, we now can now prove the mean value theorem for arbitrary
Banach spaces.
Proof. Step 1: use the Hahn Banach theorem to show that W with = 1 and
f ( x ) f (y) = ( f ( x ) f (y)). The relevant f 0 is f 0 (( f ( x ) f (y))) = f ( x ) f (y).
Define g : [0, 1]R by g(t) = ( f (tx ) f ((1 t)y)) and apply the mean value theorem
on R to get
f ( x ) f (y) = D f t x+(1t )y) ( x y)
.
15
DIFFERENTIAL CALCULUS
There are others as well. Again, the right choice of vector space depends on the problem
being considered, and we will not worry about it too much.
Given two Banach spaces (complete normed vector spaces), V and W, let BL(V, W )
denote the set of all linear transformations from V to W. The derivative of f : V W will
be in BL(V, W ). We can show that BL(V, W ) is a vector space, and we can define a norm
on BL(V, W ) by
D BL(V,W ) = sup DvW
v V =1
where V is the norm on V and W is the norm on W. Moreover, we could show that
BL(V, W ) is complete. Thus, BL(V, W ) is also a Banach space. Viewed as function of x,
D f x is a function from V to BL(V, W ). As we just said, there are both Banach spaces, so
can differentiate D f x with respect to x. In this way, we can define the second and higher
derivatives of f : V W.
In the previous section we saw that differentiation for functions on Banach spaces is
the same as for functions on finite dimensional vector spaces. All of our proofs of first
and second order conditions only relied on Taylor expansions and some properties of
linear transformations. Taylor expansions and linear transformations are the same on
Banach spaces as on finite dimensional vector spaces, so our results for optimization will
still hold. Lets just state for the first order condition for equality constraints. The other
results are similar, but stating them gets to be slightly cumbersome.
Theorem 3.1 (First order condition for maximization with equality constraints). Let f :
U R and h : U W be continuously differentiable on U V, where V and W are Banach
spaces. Suppose x interior(U ) is a local maximizer of f on U subject to h( x ) = 0. Suppose
that Dh x : V W is onto. . Then, there exists BL(W, R) such that for
L( x, ) = f ( x ) h( x ).
we have
Dx L( x , ) = D f x Dh x = 0BL(V,R)
D L( x , ) =h( x ) = 0W
16
DIFFERENTIAL CALCULUS
There are a few differences compared to the finite dimensional case that are worth com-
menting on. First, in the finite dimensional case, we had h : U Rm , and the condition
that rankDh x = m. This is the same as saying that the Dh x : Rn Rm is onto. Rank is
not well-defined in infinite dimension, so we now state this condition as Dh x being onto
instead of being rank m.
Secondly, previously Rm , and the Lagrangian was
L( x, ) = f ( x ) T h( x ).
Viewed as a 1 by m matrix, T is a linear transformation from Rm to R. Thus, in the
abstract case, we just say BL(W, R), which as when we defined transposes, is called
the dual space of W and is denoted W .
Finally, we have subscripted the zeroes in the first order condition with BL(V, R) and
W to emphasize that the first equation is for linear transformations from V to R, and the
second equation is in W. D f x is a linear transformation from V to R. Dh x goes from V
to W. goes from W to R, so composed with Dh x , which we just denoted by Dh x is
a linear transformation from V to R.
4. I NVERSE FUNCTIONS
Suppose f : Rn Rm . If we know f ( x ) = y, when can we solve for x in terms of y? In
other words, when is f invertible? Well, suppose we know that f ( a) = b for some a Rn
and b Rm . Then we can expand f around a,
f ( x ) = f ( a) + D f a ( x a) + r1 ( a, x a)
where r1 ( a, x a) is small. Since r1 is small, we can hopefully ignore it then y = f ( x ) can
be rewritten as a linear equation:
f ( a) + D f a ( x a) = y
D f a x = y f ( a) + D f a a
( )
we know that this equation has a solution if rankD f a = rank D f a y f ( a) + D f a a . It
has a solution for any y if rankD f a = m. Moreoever, this solution is unique if rankD f a =
n. This discussion is not entirely rigorous because we have not been very careful about
what r1 being small means. The following theorem makes it more precise.
Theorem 4.1 (Inverse function). Let f : Rn Rn be continuously differentiable on an open set
E. Let a E, f ( a) = b, and D f a be invertible . Then
(1) there exist open sets U and V such that a U, b V, f is one-to-one on U and f (U ) =
V, and
( ) 1
(2) the inverse of f exists and is continuously differentiable on V with derivative D f f 1 ( x) .
The open sets U and V are the areas where r1 is small enough. The continuity of f and
its derivative are also needed to ensure that r1 is small enough. The proof of this theorem
is a bit long, but the main idea is the same as the discussion preceding the theorem.
17
DIFFERENTIAL CALCULUS
Comment 4.1. The proof uses the fact that the space of all continuous linear transfor-
mations between two normed vector spaces is itself a vector space. I do not think we
have talked about this before. Anyway, it is a useful fact that already came up in the
proof that continuous Gateaux differentiable implies Frechet differentiable last lec-
ture. Let V and W be normed vector spaces with norms V and W . Let BL(V, W )
denote the set of all continuous (or equivalently bounded) linear transformations
from V to W. Then BL(V, W ) is a normed vector space with norm
AvW
A BL sup .
v V v V
This is sometimes called the operator norm on BL(V, W ). Last lecture, the proof that
Gateaux differentiable implies Frechet differentiable required that the mapping from
V to BL(V, W ) defined by D f x as a function of x V had to be continuous with
respect to the above norm.
We will often use the inequality,
AvW A BL vV ,
which follows from the definition of BL . We will also use the fact that if V is finite
dimensional and f ( x, v) : V V W, is continuous in x and v and linear in v for each
x, then f ( x, ) : V BL(V, W ) is continuous in x with respect to BL .
= D f a1 ( D f a D f x ).
Since D f x is continuous (as a function of x) if we make U small enough, then D f a D f x
will be near 0. Let = 1
. Choose U small enough that D f a D f x < for all
2 D f a1 BL
x U. From above, we know that
y ( x1 ) y ( x2 ) =
D f a1 ( D f a D f x )( x1 x2 )
y
D x
BL D f a D f x BL x1 x2
1
x1 x2 (7)
2
For any y f (U ) we can start with an arbitrary x1 U, then create a sequence by setting
x i +1 = y ( x i ).
From (7), this sequence satisfies
1
x i +1 x i x x i 1 .
2 i
18
DIFFERENTIAL CALCULUS
Using this it is easy to verify that xi form a Cauchy sequence, so it converges. The limit
satisfy y ( x ) = x, i.e. f ( x ) = y. Moreover, this x is unique because if y ( x1 ) = x1 and
y ( x2 ) = x2 , then we have x1 x2 12 x1 x2 , which is only possible if x1 = x2 . 3
Thus for each y f (U ), there is exactly one x such that f ( x ) = y. That is, f is one-to-one
on U. This proves the first part of the theorem and that f 1 exists.
We now show that f 1 is continuously differentiable with the stated derivative. Let
y, y + k V = f (U ). Then x, x + h U such that y = f ( x ) and y + k = f ( x + h). With
y as defined above, we have
y ( x + h) y ( x ) =h + D f a1 ( f ( x ) f ( x + h))
=h D f a1 k
By 7,
h D f a1 k
12 h. It follows that
D f a1 k
12 h and
h 2
D f a1
k = 1 k .
BL
Importantly as k 0, we also have h0. Now,
1
f (y + k ) f 1 (y) D f x1 k
D f x1 ( f ( x + h) f ( x ) D f x h)
=
k k
f ( x + h) f ( x ) D f x h
D f x 1
h
1
f (y + k) f (y) D f x k
1 f ( x + h) f ( x ) D f x h
1 1
lim lim D f x =0
k 0 k k 0
BL
h
Finally, since D f x is continuous, so is ( D f f 1 (y) )1 , which is the derivative of f 1 .
The proof of the inverse function theorem might be a bit confusing. The important idea
is that if the derivative of a function is nonsingular at a point, then you can invert the
function around that point because inverting the system of linear equations given by the
mean value expansion around that point nearly gives the inverse of the function.
5. I MPLICIT FUNCTIONS
The implicit function theorem is a generalization of the inverse function theorem. In
economics, we usually have some variables, say x, that we want to solve for in terms of
some parameters, say . For example, x could be a persons consumption of a bundle of
goods, and b could be the prices of each good and the parameters of the utility function.
Sometimes, we might be able to separate x and so that we can write the conditions of
our model as f ( x ) = b( ). Then we can use the inverse function theorem to compute
xi
j and other quantities of interest. However, it is not always easy and sometimes not
possible to separate x and onto opposite sides of the equation. In this case our model
3Functions like y that have d(( x ), (y)) cd( x, y) for c < 1 are called contraction mappings. The x
with x = ( x ) is called a fixed point of the contraction mapping. The argument in the proof shows that
contraction mappings have at most one fixed point. It is not hard to show that contraction mappings always
have exactly one fixed point.
19
DIFFERENTIAL CALCULUS
gives us equations of the form f ( x, ) = c. The implicit function theorem tells us when
xi
we can solve for x in terms of and what will be.
j
The basic idea of the implicit function theorem is the same as that for the inverse func-
tion theorem. We will take a first order expansion of f and look at a linear system whose
coefficients are the first derivatives of f . Let f : Rn Rm . Suppose f can be written as
f ( x, y) with x Rk and y Rnk . x are endogenous variables that we want to solve
for, and y are exogenous parameters. We have a model that requires f ( x, y) = c, and we
know that some particular x0 and y0 satisfy f ( x0 , y0 ) = c. To solve for x in terms of y, we
can expand f around x0 and y0 .
f ( x, y) = f ( x0 , y0 ) + Dx f ( x0 ,y0 ) ( x x0 ) + Dy f ( x0 ,y0 ) (y y0 ) + r ( x, y) = c
In this equation, Dx f ( x0 ,y0 ) is the m by k matrix of first partial derivatives of f with respect
to x evaluated at ( x0 , y0 ). Similary, Dy f ( x0 ,y0 ) is the m by n k matrix of first partial
derivatives of f with respect to y evaluated at ( x0 , y0 ). Then, if r ( x, y) is small enough, we
have
f ( x0 , y0 ) + Dx f ( x0 ,y0 ) ( x x0 ) + Dy f ( x0 ,y0 ) (y y0 ) c
( )
Dx f ( x0 ,y0 ) ( x x0 ) c f ( x0 , y0 ) Dy f ( x0 ,y0 ) (y y0 )
which gives x approximately as function of y. The implicit function says that you can
make this approximation exact and get x = g(y). The theorem also tells you what the
derivative of g(y) is in terms of the derivative of f .
Proof. We will show the first part by applying the inverse function theorem. Define F :
Rn+m Rn+m by F ( x, y) = ( f ( x, y), y). To apply the inverse function theorem we must
show that F is continuously differentiable and DF( x0 ,y0 ) is invertible. To show that F is
continuously differentiable, note that
F ( x + h, y + k ) F ( x, y) =( f ( x + h, y + k) f ( x, y), k )
=( D f (x,y) (h k), k)
20
DIFFERENTIAL CALCULUS
where the second line follows from the mean value theorem. It is then apparent that
( ) ( )
h
F ( x + h, y + k) F ( x, y) Dx f ( x,y) Dy f ( x,y)
0 Im k
lim = 0.
(h,k)0 (h, k)
( )
Dx f ( x,y) Dy f ( x,y)
So, DF( x,y) = , which is continuous sinve D f ( x,y) is continuous. Also,
0 Im
DF( x0 ,y0 ) can be shown to be invertible by using the partitioned inverse formula because
Dx f ( x0 ,y0 ) is invertiable by assumption. Therefore, by the inverse function theorem, there
exists open sets U and V such that ( x0 , y0 ) U and (c, y0 ) V, and F is one-to-one on U.
Let W be the set of y Rm such that (c, y) V. By definition, y0 W. Also, W is open
in Rm because V is open in Rn+m .
We can now complete the proof of 1. If y W then (c, y) = F ( x, y) for some ( x, y) U.
If there is another ( x , y) such that f ( x , y) = c, then F ( x , y) = (c, y) = F ( x, y). We just
showed that F is one-to-one on U, so x = x.
We now prove 2. Define g(y) for y W such that ( g(y), y) U and f ( g(y), y) = c, and
G (c, y) = ( g(y), y)
On the other hand, from the inverse function theorem, the derivative of G at ( x0 , y0 ) is
( ) 1
DG( x0 ,y0 ) = DF( x0 ,y0 )
( ) 1
Dx f ( x0 ,y0 ) Dy f ( x0 ,y0 )
=
0 Im
( )
Dx f (x1,y ) Dx f (x1,y ) Dy f ( x0 ,y0 )
= 0 0 0 0
0 Im
In particular,
( ) ( )
Dx f (x1,y ) Dy f (x0 ,y0 ) Dgy0
Dy G(c,y0 ) = 0 0 =
Im Im
6. C ONTRACTION MAPPINGS
One step of the proof the of the inverse function theorem was to show that
1
x1 x2 .
y ( x1 ) y ( x2 )
2
This property ensures that ( x ) = x has a unique solution. Functions like y appear quite
often, so they have name.
Definition 6.1. Let f : Rn Rn . f is a contraction mapping on U Rn if for all x, y U,
f ( x ) f (y) c x y
for some 0 c < 1.
If f is a contraction mapping, then an x such that f ( x ) = x is called a fixed point of the
contraction mapping. Any contraction mapping has at most one fixed point.
Lemma 6.1. Let f : Rn Rn be a contraction mapping on U Rn . If x1 = f ( x1 ) and
x2 = f ( x2 ) for some x1 , x2 U, then x1 = x2 .
Proof. Since f is a contraction mapping,
f ( x1 ) f ( x2 ) c x1 x2 .
f ( xi ) = xi , so
x1 x2 c x1 x2 .
Since 0 c < 1, the previous inequality can only be true if x1 x2 = 0. Thus, x1 =
x2 .
Starting from any x0 , we can construct a sequence, x1 = f ( x0 ), x2 = f ( x1 ), etc. When
f is a contraction, xn xn+1 cn x1 x0 , which approaches 0 as n. Thus, { xn }
is a Cauchy sequence and converges to a limit. Moreover, this limit will be such that
x = f ( x ), i.e. it will be a fixed point.
Lemma 6.2. Let f : Rn Rn be a contraction mapping on U Rn , and suppose that f (U ) U.
Then f has a unique fixed point.
Proof. Pick x0 U. As in the discussion before the lemma, construct the sequence defined
by xn = f ( xn1 ). Each xn U because xn = f ( xn1 ) f (U ) and f (U ) U by assump-
tion. Since f is a contraction on U, xn+1 xn cn x1 x0 , so limn xn+1 xn =
0, and { xn } is a Cauchy sequence. Let x = limn xn . Then
x f ( x ) x xn + f ( x ) f ( xn1 )
x x n + c x x n 1
xn x, so for any > 0 N, such that if n N, then x xn < 1+ c . Then,
x f ( x ) <
for any > 0. Therefore, x = f ( x ).
22