
Economics 381: Foundations of Economic Analysis

2013 Semester Two


Table of Contents

1. Course outline
2. Additional Course Information
3. Course Notes



Course Outline 2013
ECON 381: FOUNDATIONS OF ECONOMIC ANALYSIS (15 POINTS)

Semester 2 (1135)


Course Prescription
A grounding in the quantitative methods of economic analysis with application to
commonly used formal models in microeconomics, macroeconomics and econometrics.
The emphasis will be on the unifying structure of the theory with a systematic treatment
of the mathematical techniques involved. Preparation for continuing study in economic
theory and econometrics.


Programme and Course Advice
Prerequisite: ECON 201 Microeconomics

ECON 381 is a prerequisite for entry into the Honours and Master's programmes in
Economics.


Goals of the Course
The goal of the course is to familiarise students with the most fundamental theoretical
models and methods employed in economic analysis. It is intended that students emerge
with a solid preparation for further study in economic theory and econometrics.


Learning Outcomes
By the end of this course it is expected that the student will:
1. be familiar with the most commonly used quantitative methods of economic analysis;
2. know the definition of a number of mathematical concepts central to economic
analysis;
3. understand the ideas behind these concepts;
4. be able to apply these methods and concepts in a number of standard economic
settings;
5. be well prepared to undertake graduate study in the core areas of economics.


Content Outline
The course will develop and apply basic techniques for economic analysis. Applications
will be in the context of a variety of economic models, including models from single agent
microeconomic theory, partial and general equilibrium theory, econometrics and
macroeconomics.

The following topics will be covered.
Week 1: Logic and Set Theory, Functions and Binary Relations
Week 2: Introduction to Mathematical Analysis: Sets, Spaces, and Topological Structure
Week 3: Linear Algebra
Week 4: Linear Algebra (cont)
Week 5: Linear Algebra (cont)
Week 6: Convexity
Week 7: Constrained optimisation: Basic Theory and Lagrangians
Week 8: Macroeconomic Applications
Week 9: Constrained optimisation: Caveats and Extensions
Week 10: Constrained optimisation: The Implicit Function Theorem
Week 11: Constrained optimisation: The Envelope Theorem
Week 12: Microeconomic Applications


Learning and Teaching
This course will be taught in the second semester. There will be 3 hours of lectures per
week (Monday 3-5pm and Friday between 8 and 10am) plus a one-hour tutorial which
students are expected to attend. Some weeks there will be an optional full-class tutorial
in the unused lecture hour. There will be regular homework which students are expected
to complete.


Teaching Staff
Associate Professor John Hillas, Course Coordinator, Room 6111, 6th floor, Owen G.
Glenn building, Telephone: 923 7349, email: j.hillas@auckland.ac.nz

Dr Ping Yu, Room 6103, 6th floor, Owen G. Glenn building, Telephone: 923 8312,
email: p.yu@auckland.ac.nz


Learning Resources
There is no prescribed text for this course but there is a Coursebook containing notes on
the material. This coursebook will be available for purchase and online. For students who
would like a text to accompany the course, we recommend the following:
C.P. Simon and L. Blume, Mathematics for Economists, 1994, W.W. Norton and Co.
This book covers everything that we do in ECON 381, and lots more besides. It would be a
good investment for students planning to continue to postgraduate study in Economics. It
also contains many useful exercises on the ECON 381 material. A copy is available from
the General Library's Short Loan collection.

The following supplementary references will also be useful, and all are available on Short
Loan.
A.K. Dixit, Optimization in Economic Theory, 2nd edition, 1990, Oxford University Press.
G.A. Jehle and P.J. Reny, Advanced Microeconomic Theory, 2nd edition, 2001, Addison-Wesley. (The 1st edition, 1998, is also suitable.)
R. Garnier and J. Taylor, 100% Mathematical Proof, 1996, John Wiley and Sons.


Assessment
Assessment will be based on two components: Coursework worth 40% of the total mark
(one Test worth 30% and two Assignments each worth 5%), and a Final Examination
worth 60%. Plussage does NOT apply.

Learning Outcome   Assignment 1   Assignment 2   Test   Final Examination
1                  X              X              X      X
2                  X              X              X      X
3                  X              X              X      X
4                  X              X              X      X
5                  X              X              X      X
Economics 381: Foundations of Economic Analysis
2013 Semester Two
Additional Course Information
INCLUSIVE LEARNING
Students are urged to discuss privately any impairment-related requirements face-to-face
and/or in written form with the course convenor/lecturer and/or tutor.

STUDENT FEEDBACK
Student feedback is encouraged in this course. During the semester, students may
directly submit their feedback to the lecturer through a face-to-face appointment, or they
may wish to submit feedback through the class representative.
Class representatives
At the beginning of each semester, you will elect a class representative for the
paper.[1] The role of the class representative is to gather feedback from students in the
course and bring this to the lecturer and/or the Department. Class representatives'
email addresses are posted on Cecil and you are encouraged to contact them with feedback
relating to the course. You are also welcome to talk to the class representatives in
person.
Staff-Student Consultative Committee
Class representatives also submit feedback to the Department of Economics Staff Student
Consultative Committee (SSCC), which meets up to three times per semester to gain
feedback regarding the course. Only class representatives may attend the SSCC
meetings, and they will ask the class for feedback before the SSCC meeting.
Course and teaching evaluations
At the end of the semester, you will have the opportunity to submit an evaluation of the
course in a formative feedback questionnaire.

[1] An election will not take place if the number of applicants for the class representative positions equals the number of positions available.
ECON 381 SC Foundations Of Economic Analysis
2012
John Hillas
University of Auckland
Dmitriy Kvasov
University of Adelaide
Contents
Chapter 1. Logic, Sets, Functions, and Spaces
1. Logic
2. Proofs
3. Sets
4. Binary Relations
5. Functions
6. Spaces
7. Metric Spaces and Continuous Functions
8. Open Sets, Compact Sets, and the Weierstrass Theorem
9. Sequences and Subsequences
10. Linear Spaces
Chapter 2. Linear Algebra
1. The Space R^n
2. Linear Functions from R^n to R^m
3. Matrices and Matrix Algebra
4. Matrices as Representations of Linear Functions
5. Linear Functions from R^n to R^n and Square Matrices
6. Inverse Functions and Inverse Matrices
7. Changes of Basis
8. The Trace and the Determinant
9. Calculating and Using Determinants
10. Eigenvalues and Eigenvectors
Chapter 3. Convex Sets
1. Definition and Basic Properties
2. Support and Separation
Chapter 4. Constrained Optimisation
1. Constrained Maximisation
2. Applications to Macroeconomic Theory
3. Nonlinear Programming
4. The Implicit Function Theorem
5. The Theorem of the Maximum
6. The Envelope Theorem
7. Applications to Microeconomic Theory
CHAPTER 1
Logic, Sets, Functions, and Spaces
1. Logic
All the aspects of logic that we describe in this section are part of what is called
propositional logic.
We start by supposing that we have a number of atomic statements, which we
denote by lower case letters, p, q, r. Examples of such statements might be
"Consumer 1 is a utility maximiser."
"The apple is green."
"The price of good 3 is 17."
We assume that each atomic statement is either true or false.
Given these atomic statements we can form other statements using logical connectives.
If p is a statement then ¬p, read "not p", is the statement that is true precisely
when p is false. If both p and q are statements then (p ∧ q), read "p and q", is the
statement that is true when both p and q are true and false otherwise. If both p
and q are statements then (p ∨ q), read "p or q", is the statement that is true when
either p or q is true, that is, the statement that is false only if both p and q are
false.
We could make do with these three symbols, together with brackets to group
symbols and tell us what to do first. For example we could have the complicated
statement (((p ∧ q) ∨ (p ∧ r)) ∨ ¬s). This means that at least one of two statements
is true. The first is that either both p and q are true or both p and r are true. The
second is that s is not true.
Exercise 1. Think about the meaning of the statement we have just considered.
Can you see a more straightforward statement that would mean the same thing?
While we don't strictly need any more symbols it is certainly convenient to
have at least a couple more. (In fact we don't actually need all the ones we have
defined. If we have ¬ and ∧ we can define ∨ in terms of them. Similarly if we have
¬ and ∨ we can define ∧ in terms of them.) If both p and q are statements then
(p ⇒ q), read "if p then q" or "p implies q" or "p is sufficient for q" or "q is necessary for
p", is the statement that is false when p is true and q is false and is true otherwise.
Many people find this a bit nonintuitive. In particular, one might wonder about
the truth of this statement when p is false and q is true. A simple (and correct)
answer is that this is a definition. It is simply what we mean by the symbol and
there isn't any point in arguing about definitions. However there is a sense in which
the definition is what is implied by the informal statements. When we say "if p
then q" we are saying that in any situation or state in which p is true then q is also
true. We are not making any claim about what might or might not be the case
when p is not true. So, in states in which p is not true we make no claim about q
and so our statement is true whether q is true or false. Instead of (p ⇒ q) we can
write (q ⇐ p). In this case we are most likely to read the statement as "q if p"
or "q is necessary for p".
Exercise 2. We claimed above that if we have ¬ and ∧ we can define ∨ in
terms of them and if we have ¬ and ∨ we can define ∧ in terms of them. Show how
we would do this.
If (p ⇒ q) and (p ⇐ q), that is (q ⇒ p), then we say that "p if and only if q" or
"p is necessary and sufficient for q" and write (p ⇔ q).
One powerful method of analysing logical relationships is by means of truth
tables. A truth table lists all possible combinations of the truth values of the
atomic statements and the associated truth values of the compound statements.
If we have two atomic statements then the following table gives the four possible
combinations of truth values.

p q
T T
F T
T F
F F

Now, we can add a column that would, for each combination of truth values of
p and q, give the truth value of p ⇒ q, just as described above.

p q p ⇒ q
T T   T
F T   T
T F   F
F F   T
Such truth tables allow us to see the logical relationship between various statements.
Suppose we have two compound statements A and B and we form a truth
table showing the truth values of A and B for each possible profile of truth values
of the atomic statements that constitute A and B. If in each row in which A is true
B is also true then statement A implies statement B. If statements A and B have
the same truth value in each row then statements A and B are logically equivalent.
For example I claim that the statement p ⇒ q we have just considered is logically
equivalent to ¬p ∨ q. We can see this by adding columns to the truth table we have
just considered. Let me add a column for ¬p and then one for ¬p ∨ q. (We only add
the column for ¬p to make it easier.)

p q p ⇒ q ¬p ¬p ∨ q
T T   T    F    T
F T   T    T    T
T F   F    F    F
F F   T    T    T

Since the third column and the fifth column contain exactly the same truth values
we see that the two statements, p ⇒ q and ¬p ∨ q, are indeed logically equivalent.
Exercise 3. Construct the truth table for the statement (¬p ⇒ q). Is it
possible to write this statement using fewer logical connectives? Hint: why not
start with just one?
Exercise 4. Prove that the following statements are equivalent:
(i) (p ⇒ q), ((¬p) ∨ q), and (¬q ⇒ ¬p),
(ii) ¬p ⇒ ¬q and q ⇒ p.
In part (ii) the second statement is called the contrapositive of the first statement.
Often if you are asked to prove that p implies q it will be easier to show the
contrapositive, that is, that not q implies not p.
Exercise 5. Prove that the following statements are equivalent:
(i) ¬(p ∨ q) and ¬p ∧ ¬q,
(ii) ¬(p ∧ q) and ¬p ∨ ¬q.
These two equivalences are known as De Morgan's Laws.
A tautology is a statement that is necessarily true. For example if the statements
A and B are logically equivalent then the statement A ⇔ B is a tautology.
If A logically implies B then A ⇒ B is a tautology. We can check whether a compound
statement is a tautology by writing a truth table for this statement. If the
statement is a tautology then its truth value should be T in each row of its truth
table.
A contradiction is a statement that is necessarily false, that is, a statement A
such that ¬A is a tautology. Again, we can see whether a statement is a contradiction
by writing a truth table for the statement.
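Because a truth table is just an exhaustive enumeration, it is easy to check this kind of claim mechanically. The following small sketch (ours, not part of the notes; the function name implies is our own) verifies one of De Morgan's laws and the equivalence of p ⇒ q with ¬p ∨ q over all rows of the truth table.

```python
from itertools import product

def implies(p, q):
    # truth-table definition of p => q: false only when p is true and q is false
    return (not p) or q

# enumerate every row of the truth table for two atomic statements
for p, q in product([True, False], repeat=2):
    assert (not (p and q)) == ((not p) or (not q))   # De Morgan's law
    assert implies(p, q) == ((not p) or q)           # p => q  <=>  not p or q

print("Both equivalences hold in every row of the truth table.")
```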
2. Proofs
We shall not give a systematic development of this topic. Rather we shall just
collect a number of practical points about reading proofs and writing your own
simple proofs.
When we are asked to prove something we have a set of assumptions or premises
and are asked to prove a conclusion. Often the division between premises and
conclusion is not obviously given. For example, if we have say two premises P_1
and P_2 and want to prove a conclusion C it would be essentially the same result if
we had premise P_1 and conclusion P_2 ⇒ C, or indeed no premise (or, say, take as
premise only the basic axioms of mathematics, which are usually left implicit) and
the conclusion (P_1 ∧ P_2) ⇒ C.
Having said that, let us consider the problem which asks us: from the premise
P prove the conclusion C. We could start with the premise P, draw some intermediate
claims, and eventually conclude that C is true. This is called a direct proof.
Alternatively, we could start by assuming ¬C, draw some intermediate claims, and
eventually conclude ¬P. This is called proving the contrapositive. Finally we could
start by assuming P ∧ ¬C and prove that this leads to a logical contradiction. This
is called a proof by contradiction.
2.1. Proof by Mathematical Induction. The previous comments, while
informal, properly belong in a treatment of logic. We now turn to another common
method of proof that essentially depends on the definition of the natural or
counting numbers, that is, the numbers 1, 2, 3, . . . . Often, in mathematics, and
even reasonably often in economics, we want to prove, not a single proposition but
some general class of propositions. For example, rather than wanting to prove that
a particular specification of an economy has a competitive equilibrium we might
want to prove that all economies that satisfy some conditions have equilibria. Often
in such cases some details of size or dimension will not be specified. For example
we may not specify the number of consumers, or the number of commodities. In
these circumstances what we want to prove is a class of propositions P_1, P_2, P_3, . . . .
And it might be quite difficult to directly prove the general case P_n where n can be
any natural number 1, 2, 3, . . . . It is often easier, sometimes much easier, to prove,
say, P_1. And knowing P_1 it might be quite easy to then prove P_2. And knowing
P_2 to prove P_3, and so on. Thus, if we wanted to prove P_17, for example, we could
do so, though it might be a bit tedious. We still would not have proved the general
case that we wanted to, though it would seem to be obviously true.
The Principle of Mathematical Induction says that if one has a list of propositions
P_1, P_2, P_3, . . . such that P_1 is true and P_n implies P_{n+1} for any n = 1, 2, 3, . . . , then
P_n is true for any n = 1, 2, 3, . . . .
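To see the Principle in action, here is a standard worked example (ours, not from the notes): a proof by induction that the sum of the first n natural numbers is n(n + 1)/2. Take P_n to be the statement

\[
1 + 2 + \cdots + n = \frac{n(n+1)}{2}.
\]

The base step P_1 holds, since 1 = 1(1 + 1)/2. For the inductive step, assume P_n; then

\[
1 + 2 + \cdots + n + (n+1) = \frac{n(n+1)}{2} + (n+1) = \frac{(n+1)(n+2)}{2},
\]

which is exactly P_{n+1}. By the Principle of Mathematical Induction, P_n is true for every natural number n.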
Exercise 6. To use the Principle of Mathematical Induction we will need to
first prove that P_1 is true. This is often quite easy. We then need to assume that
P_n is true and prove that P_{n+1} is true. This is often much harder. Sometimes
it is easier if we assume not only that P_n is true but also that P_1, . . . , P_n are all
true. That is, we may sometimes want to use a more general principle that if P_1
is true and P_1, . . . , P_n together imply P_{n+1} for any n = 1, 2, 3, . . . , then P_n is true for
any n = 1, 2, 3, . . . . Show that this more general principle follows from the stated
Principle of Mathematical Induction. Hint: Let Q_n = P_1 ∧ P_2 ∧ · · · ∧ P_n and apply
the Principle of Mathematical Induction to the propositions Q_1, Q_2, Q_3, . . . .
As we said in the previous exercise the difficulty in using the Principle of Mathematical
Induction is often in proving that P_{n+1} is true. As noted in that exercise it
sometimes helps if in addition to P_n we also assume P_1, . . . , P_{n-1}, and this is
quite legitimate. If even this does not help it sometimes helps to strengthen what
we are trying to prove. In most proofs this will only make the problem harder since
we are trying to prove more. However in using the Principle of Mathematical Induction
when we strengthen what we are trying to prove we not only strengthen the
conclusion, we also strengthen, in the place where we usually have to work hardest,
the premises. Suppose that we are trying to prove P_1, P_2, P_3, . . . and that for each
n the proposition Q_n is stronger than P_n. It might well be that it is easier to prove,
using the Principle of Mathematical Induction, Q_1, Q_2, Q_3, . . . than P_1, P_2, P_3, . . . .
We would first need to prove Q_1. This might be a little harder, but this is, as we
have said, often not where the difficulty lies. Of course, Q_1 should be true. We are
not claiming that any strengthening of the Ps will make things easier but just that
it is sometimes useful to consider strengthening the propositions if you are having
difficulty with the original propositions.
The next step is proving that Q_n implies Q_{n+1} for any n = 1, 2, 3, . . . . Now it's
certainly easier to prove P_{n+1} than Q_{n+1} from the same set of premises. But we
don't have the same set of premises. In our original attempt we had the premise
P_n. Now we have the stronger premise Q_n and this might possibly make our life
easier. In order to see what the right form for Q is it is often useful to prove P_1
and then use P_1 to prove P_2 and use P_2 to prove P_3. If you examine exactly what
you are doing you may find that there are things that are true in the case of n = 1
and n = 2 that are not explicitly part of P_1 and P_2 and that you are using.
We use the Principle of Mathematical Induction in at least one place in what
follows, and you will find many uses of it in articles in economics.
3. Sets
Set theory was developed in the second half of the 19th century and is at the
very foundation of modern mathematics. But we shall not be concerned here with
the development of the theory. Rather we shall only give the basic language of set
theory and outline some of the very basic operations on sets.
We start by defining a set to be a collection of objects or elements. We will
usually denote sets by capital letters and their elements by lower case letters. If
the element a is in the set A we write a ∈ A. If every element of the set B is also
in the set A we call B a subset of the set A and write B ⊆ A. We shall also say
that A contains B. If A and B have exactly the same elements then we say they
are equal or identical. Alternatively we could say A = B if and only if A ⊆ B and
B ⊆ A. If B ⊆ A and B ≠ A then we say that B is a proper subset of A or that A
strictly contains B.
Exercise 7. How many subsets does a set with N elements have?
In order to avoid certain well-known paradoxes of set theory
we shall always assume that in whatever situation we are discussing there is some
given set U called the universal set which contains all of the sets with which we
shall deal.
We customarily enclose our specification of a set by braces. In order to specify
a set one may simply list the elements. For example to specify the set D which
contains the numbers 1, 2, and 3 we may write D = {1, 2, 3}. Alternatively we may
define the set by specifying a property that identifies the elements. For example
we may specify the same set D by D = {x | x is an integer and 0 < x < 4}. Notice
that this second method is more powerful. We could not, for example, list all
the integers. (Since there are an infinite number of them we would die before we
finished.)
For any two sets A and B we define the union of A and B to be the set which
contains exactly all of the elements of A and all the elements of B. We denote the
union of A and B by A ∪ B. Similarly we define the intersection of A and B to
be that set which contains exactly those elements which are in both A and B. We
denote the intersection of A and B by A ∩ B. Thus we have

A ∪ B = {x | x ∈ A or x ∈ B}
A ∩ B = {x | x ∈ A and x ∈ B}.
Exercise 8. The oldest mathematician among chess players and the oldest
chess player among mathematicians: are they the same person or (possibly) different
people?
Exercise 9. The best mathematician among chess players and the best chess
player among mathematicians: are they the same person or (possibly) different people?
Exercise 10. Every tenth mathematician is a chess player and every fourth
chess player is a mathematician. Are there more mathematicians or chess players,
and by how many times?
Exercise 11. Prove the distributive laws for the operations of union and intersection.
(i) (A ∪ B) ∩ C = (A ∩ C) ∪ (B ∩ C)
(ii) (A ∩ B) ∪ C = (A ∪ C) ∩ (B ∪ C)
Just as the number zero is extremely useful, so the concept of a set that has
no elements is extremely useful also. This set we call the empty set or the null set
and denote by ∅. To see one use of the empty set notice that having such a concept
allows the intersection of two sets to be well defined whether or not the sets have any
elements in common.
We also introduce the concept of a Cartesian product. If we have two sets, say
A and B, the Cartesian product, A × B, is the set of all ordered pairs (a, b) such
that a is an element of A and b is an element of B. Symbolically we write

A × B = {(a, b) | a ∈ A and b ∈ B}.
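These operations map directly onto Python's built-in set type; here is a small sketch (ours, not part of the notes) with two example sets of our own choosing.

```python
from itertools import product

A = {1, 2, 3}
B = {3, 4}

print(A | B)    # union: {1, 2, 3, 4}
print(A & B)    # intersection: {3}
print(B <= A)   # subset test B ⊆ A: False

# Cartesian product A x B as a set of ordered pairs
print(set(product(A, B)))   # {(1, 3), (1, 4), (2, 3), (2, 4), (3, 3), (3, 4)}
```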
4. Binary Relations
There are a number of ways of formulating the notion of a binary relation. We
shall pursue one, defining a binary relation on a set X simply as a subset of X × X,
the Cartesian product of X with itself.
Definition 1. A binary relation R on the set X is a subset of X × X. If the
point (x, y) ∈ R we shall often write xRy instead of (x, y) ∈ R.
Since we have already defined the notions of Cartesian product and subset,
there is really nothing new here. However the structure and properties of binary
relations that we shall now study are motivated by the informal notion of a relation
between the elements of X.
Example 1. Suppose that X is a set of boys and girls and the relation S is given by xSy if
"x is a sister of y".
Example 2. Suppose that X is the set of natural numbers X = {1, 2, 3, . . . }.
There are binary relations >, ≥, and =.
Example 3. Suppose that X is the set of natural numbers X = {1, 2, 3, . . . }.
The relations R, P, and I are defined by
xRy if and only if x + 1 ≥ y,
xPy if and only if x > y + 1, and
xIy if and only if −1 ≤ x − y ≤ 1.
Definition 2. The following properties of binary relations have been defined
and found to be useful.
(BR1) Reflexivity: For all x in X, xRx.
(BR2) Irreflexivity: For all x in X, not xRx.
(BR3) Completeness: For all x and y in X, either xRy or yRx (or both).¹
(BR4) Transitivity: For all x, y, and z in X, if xRy and yRz then xRz.
(BR5) Negative Transitivity: For all x, y, and z in X, if xRy then either
xRz or zRy (or both).
(BR6) Symmetry: For all x and y in X, if xRy then yRx.
(BR7) Anti-Symmetry: For all x and y in X, if xRy and yRx then x = y.
(BR8) Asymmetry: For all x and y in X, if xRy then not yRx.
Exercise 12. Show that completeness implies reflexivity, that asymmetry implies
anti-symmetry, and that asymmetry implies irreflexivity.
Exercise 13. Which properties does the relation described in Example 1 satisfy?
Exercise 14. Which properties do the relations described in Example 2 satisfy?
Exercise 15. Which properties do the relations described in Example 3 satisfy?
We now define a few particularly important classes of binary relations.
Definition 3. A weak order is a binary relation that satisfies transitivity and
completeness.
Definition 4. A strict partial order is a binary relation that satisfies transitivity
and asymmetry.
¹ We shall always implicitly include "or both" when we say "either . . . or".
Definition 5. An equivalence is a binary relation that satisfies transitivity
and symmetry.
You have almost certainly already met examples of such binary relations in
your study of Economics. We normally assume that weak preference, strict preference,
and indifference of a consumer are weak orders, strict partial orders, and
equivalences, though we actually typically assume a little more about the strict
preference.
The following construction is also motivated by the idea of preference. Let
us consider some binary relation R which we shall informally think of as a weak
preference relation, though we shall not, for the moment, make any assumptions
about the properties of R. Consider the relations P defined by xPy if and only if
xRy and not yRx, and I defined by xIy if and only if xRy and yRx.
Exercise 16. Show that if R is a weak order then P is a strict partial order
and I is an equivalence.
We could also think of starting with a strict preference P and defining the weak
preference R in terms of P. We could do so either by defining R as xRy if and only
if not yPx, or by defining R as xRy if and only if either xPy or not yPx.
Exercise 17. Show that these two definitions of R coincide if P is asymmetric.
Exercise 18. Show by example that P may be a strict partial order (so, by
the previous result, the two definitions of R coincide) but R not a weak order.
[Hint: If you cannot think of another example consider the binary relations defined
in Example 3.]
Exercise 19. Show that if P is asymmetric and negatively transitive then
(i) P is transitive (and hence a strict partial order), and
(ii) R is a weak order.
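For a finite set, the properties in Definition 2 can be checked by brute force. The sketch below (ours, not from the notes) checks a few of them for the relation R of Example 3, restricted to the first ten natural numbers.

```python
from itertools import product

# The relation R of Example 3 (xRy iff x + 1 >= y) on X = {1, ..., 10},
# represented as a set of ordered pairs, i.e. a subset of X x X.
X = range(1, 11)
R = {(x, y) for x, y in product(X, repeat=2) if x + 1 >= y}

reflexive  = all((x, x) in R for x in X)
complete   = all((x, y) in R or (y, x) in R
                 for x, y in product(X, repeat=2))
transitive = all((x, z) in R
                 for x, y, z in product(X, repeat=3)
                 if (x, y) in R and (y, z) in R)

print(reflexive, complete, transitive)   # True True False
```

The failure of transitivity (1R2 and 2R3 but not 1R3) is exactly what Exercise 18 is pointing at.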
5. Functions
Let X and Y be two sets. A function (or a mapping) f from the set X to the
set Y is a rule that assigns to each x in X a unique element in Y, denoted by f(x).
The notation

f : X → Y

is standard. The set X is called the domain of f and the set Y is called the
codomain of f. The set of all values taken by f, i.e. the set

{y ∈ Y | there exists x in X such that y = f(x)}

is called the range of f. The range of a function need not coincide with its codomain Y.
There are several useful ways of visualising functions. A function can be thought
of as a machine that operates on elements of the set X and transforms an input
x into a unique output f(x). Note that the machine is not required to produce
different outputs from different inputs. This analogy helps to distinguish between
the function itself, f, and its particular value, f(x). The former is the machine,
the latter is the output.² One of the reasons for this confusion is that in practice,
to avoid being verbose, people often say things like "consider a function U(x, y) =
xy" instead of saying "consider a function defined for every pair (x, y) in R² by
the equation U(x, y) = xy".
² Mathematician Robert Bartle put it as follows: "Only a fool would confuse a sausage-grinder
with a sausage; however, enough people have confused functions with their values . . . "
A function can also be thought of as a transformation, or a mapping, of the set
X into the set Y. In line with this interpretation is the common terminology: it is
said that f(x) is the image of x under the function f. Again, it is important to
remember that there may be points of Y which are the images of no point of X and
that there may be different points of X which have the same images in Y. What is
absolutely prohibited, however, is for a point from X to have several images in Y.
Part of the definition of a function is the specification of its domain. However,
in applications, functions are quite often defined by an algebraic formula, without
explicit specification of the domain. For example, a function may be defined as

f(x) = sin x + 145x².

The function f is then the rule that assigns the value sin x + 145x² to each value of
x. The convention in such cases is that the domain of f is the set of all values of x
for which the formula gives a unique value. Thus, if you come, for instance, across
the function f(x) = 1/x you should assume that its domain is (−∞, 0) ∪ (0, ∞),
unless specified otherwise.
For any subset A of X, the subset f(A) of Y consisting of those y such that y = f(x) for some x in
A is called the image of A under f, that is,

f(A) = {y ∈ Y | there exists x in A such that y = f(x)}.

Thus, the range of f can be written as f(X). Similarly, one can define the
inverse image. For any subset B of Y, the inverse image f⁻¹(B) of B is the set of
x in X such that f(x) is in B, that is,

f⁻¹(B) = {x ∈ X | f(x) ∈ B}.
A function f is called a function onto Y (or a surjection) if the range of f is Y,
i.e., if for every y ∈ Y there is (at least) one x ∈ X such that y = f(x). In other
words, each element of Y is the image of (at least) one element of X. A function f is
called one-to-one (or an injection) if f(x_1) = f(x_2) implies x_1 = x_2, that is, for every
element y of f(X) there is a unique element x of X such that y = f(x). In other
words, a one-to-one function maps different elements of X into different elements of
Y. When a function f : X → Y is both onto and one-to-one it is called a bijection.
Exercise 20. Suppose that a set X has m elements and a set Y has n ≤ m
elements. How many different functions are there from X to Y? From Y to X?
How many of them are surjective? How many of them are injective? How many of them
bijective?
Exercise 21. Find a function f : N → N which is
(i) surjective but not injective,
(ii) injective but not surjective,
(iii) neither surjective nor injective,
(iv) bijective.
If the function f is a bijection then it is possible to define a function g : Y → X
such that g(y) = x where y = f(x). Thus, to each element y of Y is assigned the
element x in X whose image under f is y. Since f is onto, g is defined for every y
of Y, and since f is one-to-one, g(y) is unique. The function g is called the inverse of
f and is usually written as f⁻¹. In that case, however, it's not immediately clear
what f⁻¹(y) means. Is it the inverse image of y under f or the image of y under
f⁻¹? Happily enough they are the same if f⁻¹ exists.
Exercise 22. Prove that when a function f⁻¹ exists it is both onto and one-to-one
and that the inverse of f⁻¹ is the function f itself.
If f : X → Y and g : Y → Z, then the function h : X → Z, defined as
h(x) = g(f(x)), is called the composition of g with f and denoted by g ∘ f. Note
that even if f ∘ g is well-defined it is, usually, different from g ∘ f.
Exercise 23. Let f : X → Y. Prove that there exists a surjection g : X → A
where A ⊆ X and an injection h : A → Y such that f = h ∘ g. In other words, prove
that any function can be written as a composition of a surjection and an injection.
The set G ⊆ X × Y of ordered pairs (x, f(x)) is called the graph of the function
f.³ Of course, the fact that something is called a graph does not necessarily mean
that it can be drawn.
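On finite sets these definitions can be tested directly. A small sketch (ours; the dict representation and function name are our own choices, not the notes'):

```python
def classify(f, X, Y):
    """Return (injective, surjective) for a function f: X -> Y given as a dict."""
    image = {f[x] for x in X}            # the range f(X)
    injective = len(image) == len(X)     # distinct inputs give distinct outputs
    surjective = image == set(Y)         # every element of Y is hit
    return injective, surjective

X, Y = {1, 2, 3}, {"a", "b", "c"}
f = {1: "a", 2: "b", 3: "c"}   # a bijection
g = {1: "a", 2: "a", 3: "b"}   # neither injective nor onto Y

print(classify(f, X, Y))   # (True, True)
print(classify(g, X, Y))   # (False, False)
```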
6. Spaces
Sets are reasonably interesting mathematical objects to study. But to make
them even more interesting (and useful for applications) sets are usually endowed
with some additional properties, or structures. These new objects are called spaces.
The structures are often modeled after the familiar properties of the space we live in and
reflect (in axiomatic form) such notions as order, distance, addition, multiplication,
and so on.
Probably one of the most intuitive spaces is the space of the real numbers, R.
We will briefly look at the axiomatic way of describing some of its properties.
Given the set of real numbers R, the operation of addition is the function
+ : R × R → R that maps any two elements x and y in R to an element denoted by
x + y and called the sum of x and y. This addition operation satisfies the following
axioms. For all real numbers x, y, and z:
A1: x + y = y + x.
A2: (x + y) + z = x + (y + z).
A3: There exists an element, denoted by 0, such that x + 0 = x.
A4: For each x there exists an element, denoted by −x, such that
x + (−x) = 0.
All the remaining properties of the addition can be proven using these axioms.
Note also that we can define another operation x − y as x + (−y) and call it
subtraction.
Exercise 24. Prove that the axioms for addition imply the following statements.
(i) The element 0 is unique.
(ii) If x + y = x + z then y = z (a cancellation law).
(iii) −(−x) = x.
The operation of multiplication can be axiomatised in a similar way. Given the
set of real numbers, R, the operation of multiplication is the function · : R × R → R
that maps any two elements x and y in R to an element denoted by x · y and called
the product of x and y. The multiplication satisfies the following axioms for all real
numbers x, y, and z.
A5: x · y = y · x.
A6: (x · y) · z = x · (y · z).
A7: There exists an element, denoted by 1, such that x · 1 = x.
A8: For each x ≠ 0 there exists an element, denoted by x⁻¹, such that
x · x⁻¹ = 1.
³ Some people like the idea of the graph of a function so much that they define a function to
be its graph.
One more axiom (a distributive law) brings these two operations, addition and
multiplication⁴, together.
A9: x(y + z) = xy + xz for all x, y, and z in R.
Another structure possessed by the real numbers has to do with the fact that
the real numbers are ordered. The notion of "x less than y" can be axiomatised as
follows. For any two distinct elements x and y either x < y or y < x and, in
addition, if x < y and y < z then x < z.
Another example of a space (a very important and useful one) is n-dimensional
real space⁵. Given the natural number n, define R^n to be the set of all possible
ordered n-tuples of n real numbers, with generic element denoted by x =
(x_1, . . . , x_n). Thus, the space R^n is the n-fold Cartesian product of the set R with
itself. The real numbers x_1, . . . , x_n are called the coordinates of the vector x. Two vectors
x and y are equal if and only if x_1 = y_1, . . . , x_n = y_n. The operation of addition of
two vectors is defined as

x + y = (x_1 + y_1, . . . , x_n + y_n).

Exercise 25. Prove that the addition of vectors in R^n satisfies the axioms of
addition.
The role of multiplication in this space is played by the operation of multiplication
by a real number, defined for all x in R^n and all λ in R by

λx = (λx_1, . . . , λx_n).

Exercise 26. Prove that multiplication by a real number satisfies a distributive
law.
7. Metric Spaces and Continuous Functions
The notion of a metric is the generalisation of the notion of the distance between two
real numbers.
Let X be a set and d : X × X → R a function. The function d is called a metric
if it satisfies the following properties for all x, y, and z in X.
1. d(x, y) ≥ 0, and d(x, y) = 0 if and only if x = y,
2. d(x, y) = d(y, x),
3. d(x, y) ≤ d(x, z) + d(z, y).
The set X together with the function d is called a metric space, elements of X
are usually called points, and the number d(x, y) is called the distance between x
and y. The last property of a metric is called the triangle inequality.
Exercise 27. Let X be a non-empty set and d : X × X → R be a function
that satisfies the following two properties for all x, y, and z in X.
(i) d(x, y) = 0 if and only if x = y,
(ii) d(x, y) ≤ d(x, z) + d(y, z).
Prove that d is a metric.
Exercise 28. Prove that d(x, y) + d(w, z) ≤ d(x, w) + d(x, z) + d(y, w) + d(y, z)
for all x, y, w, and z in X, where d is some metric on X.
An obvious example of a metric space is the set of real numbers, R, together
with the usual distance, d(x, y) = |x − y|. Another example is the n-dimensional
Euclidean space R^n with metric

d(x, y) = √((x_1 − y_1)² + · · · + (x_n − y_n)²).

⁴ From now on, to go easy on notation, we will follow the standard convention of not writing
the symbol for multiplication, that is, writing xy instead of x · y, etc.
⁵ We haven't defined what the word dimension means yet, so just treat it as a (fancy) name.
Note that the same set can be endowed with different metrics, thus resulting
in different metric spaces. For example, the set of all n-tuples of real numbers
can be made into a metric space by use of the (non-Euclidean) metric

d_T(x, y) = |x_1 − y_1| + · · · + |x_n − y_n|,

which gives a metric space different from Euclidean R^n. This metric is sometimes called the Manhattan
(or taxicab) metric. Another curious metric is the so-called French railroad
metric, defined by

d_F(x, y) = 0 if x = y, and d_F(x, y) = d(x, P) + d(y, P) if x ≠ y,

where P is a particular point of R^n (called Paris) and the function d is the Euclidean
distance.
Exercise 29. Prove that the French railroad metric d_F is a metric.
Exercise 30. Let X be a non-empty set and d : X × X → R be the function
defined by

d(x, y) = 1 if x ≠ y, and d(x, y) = 0 if x = y.

Prove that d is a metric. (This metric is called the discrete metric.)
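The metrics above are easy to compute; here is a small sketch (ours, not part of the notes; the choice of Paris at the origin is our own assumption) comparing them on a pair of points in R².

```python
import math

def d_euclid(x, y):
    # the usual Euclidean metric on R^n
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

def d_taxicab(x, y):
    # the Manhattan (taxicab) metric
    return sum(abs(xi - yi) for xi, yi in zip(x, y))

def d_french(x, y, paris=(0.0, 0.0)):
    # French railroad metric: every journey passes through Paris
    return 0.0 if x == y else d_euclid(x, paris) + d_euclid(y, paris)

x, y = (1.0, 0.0), (0.0, 1.0)
print(d_euclid(x, y))    # 1.414...
print(d_taxicab(x, y))   # 2.0
print(d_french(x, y))    # 2.0 (one unit to Paris, one unit onward)
```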
Using the notion of a metric it is possible to generalise the idea of a continuous
function.
Suppose (X, d_X) and (Y, d_Y) are metric spaces, x_0 ∈ X, and f : X → Y is a
function. Then f is continuous at x_0 if for every ε > 0 there exists a δ > 0 such
that

d_Y(f(x_0), f(x)) < ε

for all points x ∈ X for which d_X(x_0, x) < δ.
The function f is continuous on X if f is continuous at every point of X.
Let's prove that the function f(x) = x is continuous on R using the above definition.
For all x_0 ∈ R, we have |f(x_0) − f(x)| = |x_0 − x| < ε as long as |x_0 − x| < δ = ε.
That is, given any ε > 0 we are always able to find a δ, namely δ = ε, such that
all points which are closer to x_0 than δ will have images which are closer to f(x_0)
than ε.
Exercise 31. Let f : R → R be the function defined by

f(x) = 1/x if x ≠ 0, and f(0) = 0.

Prove that f is continuous at every point of R, with the exception of 0.
8. Open sets, Compact Sets, and the Weierstrass Theorem
Let x be a point in a metric space and r > 0. The open ball B(x, r) of radius
r centred at x is the set of all y ∈ X such that d(x, y) < r. Thus, the open ball is
the set of all points whose distance from the centre is strictly less than r. The ball
is closed if the inequality is weak, d(x, y) ≤ r.
A set S in a metric space is open if for all x ∈ S there exists r ∈ R, r > 0, such
that B(x, r) ⊆ S. A set S is closed if its complement

S^C = {x ∈ X | x ∉ S}

is open.
Exercise 32. Prove that an open ball is an open set.
Exercise 33. Prove that the intersection of any finite number of open sets is
an open set.
A set S is bounded if there exists a closed ball of finite radius that contains it.
Formally, S is bounded if there exists a closed ball B(x, r) such that S ⊆ B(x, r).
Exercise 34. Prove that the set S is bounded if and only if there exists a
real number p > 0 such that d(x, x′) ≤ p for all x and x′ in S.
Exercise 35. Prove that the union of two bounded sets is a bounded set.
A collection (possibly infinite) of open sets U_1, U_2, . . . in a metric space is an
open cover of the set S if S is contained in its union.
A set S is compact if every open cover of S has a finite subcover. That is, from
any open cover we can select a finite number of sets U_i that still cover S.
Note that the definition does not say that a set is compact if there is a finite
open cover. That wouldn't be a good definition, as you can cover any set with the
whole space, which is just one open set.
Let's see how to use this definition to show that something is not compact.
Consider the set (0, 1) ⊆ R. To prove that it is not compact we need to find an
open cover of (0, 1) from which we cannot select a finite cover. The collection of
open intervals (1/n, 1) for all integers n ≥ 2 is an open cover of (0, 1), because for
any point x ∈ (0, 1) we are always able to find an integer n such that n > 1/x, thus
x ∈ (1/n, 1). But no finite subcover will do: if (1/N, 1) is the largest interval
in a candidate subcover, then it is always possible to find a point x ∈ (0, 1) such
that N < 1/x, and such a point is not covered.
While this definition of compactness is quite useful for showing that a set
is not compact, it is less useful for verifying that a set is indeed compact.
A much more convenient characterisation of compact sets in finite-dimensional
Euclidean space, R^n, is given by the following theorem.
Theorem 1. Any closed and bounded subset of R^n is compact.
But why are we interested in compactness at all? Because of the following
extremely important theorem, the first version of which was proved by Karl Weierstrass
around 1860.
Theorem 2. Let S be a compact set in a metric space and f : S → R be a
continuous function. Then the function f attains its maximum and minimum on S.
And why is this theorem important for us? Because many economic problems
are concerned with finding a maximal (or a minimal) value of a function on some set.
The Weierstrass theorem provides conditions under which such a search is meaningful.
This theorem and its implications will be much dwelt upon later in the notes, so
we just give here one example. The consumer utility maximisation problem is the
problem of finding the maximum of a utility function subject to the budget constraint.
According to the Weierstrass theorem, this problem has a solution if the utility function is
continuous and the budget set is compact.
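A numerical illustration (ours, not from the notes) of why compactness matters: on the compact interval [0, 1] the continuous function f(x) = x attains its maximum at x = 1, but on the non-compact interval (0, 1) the same function has supremum 1 and no maximiser, since every candidate point can be beaten.

```python
# On (0, 1) the function f(x) = x has no maximum: for any x in (0, 1),
# the point (x + 1) / 2 is still in (0, 1) and strictly larger.
def better_point(x):
    return (x + 1) / 2

x = 0.9
for _ in range(5):
    print(x)                # 0.9, 0.95, 0.975, ... approaching, never reaching, 1
    x = better_point(x)
```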
9. Sequences and Subsequences
Let us consider again some metric space (X, d). An infinite sequence of points
in (X, d) is simply a list

x_1, x_2, x_3, . . . ,

where ". . ." indicates that the list continues forever.
We can be a bit more formal about this. We first consider the set of natural
numbers (or counting numbers) 1, 2, 3, . . . , which we denote N. We can now define
an infinite sequence in the following way.
Definition 6. An infinite sequence of elements of X is a function from N to X.
Notation. If we look at the previous definition we see that we might have
a sequence s : N → X which would define s(1), s(2), s(3), . . . or in other words
would define s(n) for any natural number n. Typically when we are referring to
sequences we use subscripts (or sometimes superscripts) instead of parentheses and
write s_1, s_2, s_3, . . . and s_n instead of s(1), s(2), s(3), . . . and s(n). Also rather than
saying that s : N → X is a sequence we say that {s_n} is a sequence or even that
{s_n}_{n=1}^∞ is a sequence.
Let's now examine a few examples.
Example 4. Suppose that (X, d) is R, the real numbers with the usual metric
d(x, y) = |x − y|. Then {n}, {√n}, and {1/n} are sequences.
Example 5. Again, suppose that (X, d) is R, the real numbers with the usual
metric d(x, y) = |x − y|. Consider the sequence {x_n} where

x_n = 1 if n is odd, and x_n = 0 if n is even.

We see that {n} and {√n} get arbitrarily large as n gets larger, while in the last
example x_n bounces back and forth between 0 and 1 as n gets larger. However for
{1/n} the elements of the sequence get closer and closer to 0 (and indeed arbitrarily
close to 0). We say, in this case, that the sequence converges to zero or that the
sequence has limit 0. This is a particularly important concept and so we shall give
a formal definition.
Definition 7. Let {x_n} be a sequence of points in (X, d). We say that the
sequence converges to x_0 ∈ X if for any ε > 0 there is N ∈ N such that if n > N
then d(x_n, x_0) < ε.
Informally we can describe this by saying that if n is large then the distance
from x_n to x_0 is small.
If the sequence {x_n} converges to x_0, then we often write x_n → x_0 as n → ∞,
or lim_{n→∞} x_n = x_0.
Exercise 36. Show that if the sequence {x_n} converges to x_0 then it does not
converge to any other value unequal to x_0. Another way of saying this is that if
the sequence converges then its limit is unique.
We have now seen a number of examples of sequences. In some the sequence
runs off to infinity; in others it bounces around; while in others it converges to
a limit. Could a sequence do anything else? Could a sequence, for example, settle
down, each element getting closer and closer to all future elements in the sequence,
but not converging to any particular limit? In fact, depending on what the space
X is, this is indeed possible.
First let us recall the notion of a rational number. A rational number is a
number that can be expressed as the ratio of two integers, that is, r is rational if
r = a/b with a and b integers and b ≠ 0. We usually denote the set of all rational
numbers Q (since we have already used R for the real numbers). We now consider
an example in which the underlying space X is Q. Consider the sequence of
rational numbers defined in the following way:

x_1 = 1,  x_{n+1} = (x_n + 2)/(x_n + 1).

This kind of definition is called a recursive definition. Rather than writing, as a
function of n, what x_n is, we write what x_1 is and then what x_{n+1} is as a function
of what x_n is. We can obviously find any element of the sequence that we need, as
long as we sequentially calculate each previous element. In our case we'd have

x_1 = 1
x_2 = (1 + 2)/(1 + 1) = 3/2 = 1.5
x_3 = (3/2 + 2)/(3/2 + 1) = 7/5 = 1.4
x_4 = (7/5 + 2)/(7/5 + 1) = 17/12 ≈ 1.416667
x_5 = (17/12 + 2)/(17/12 + 1) = 41/29 ≈ 1.413793
x_6 = (41/29 + 2)/(41/29 + 1) = 99/70 ≈ 1.414286
. . .
We see that the sequence goes up and down but that it seems to be converging.
What is it converging to? Let's suppose that it's converging to some value x_0.
Recall that

x_{n+1} = (x_n + 2)/(x_n + 1).

We'll see later that if f is a continuous function then lim_{n→∞} f(x_n) = f(lim_{n→∞} x_n).
In this case that means that

x_0 = lim_{n→∞} x_{n+1} = lim_{n→∞} (x_n + 2)/(x_n + 1) = (x_0 + 2)/(x_0 + 1).

Thus we have

x_0 = (x_0 + 2)/(x_0 + 1),

and if we solve this we obtain x_0 = ±√2. Clearly if x_n > 0 then x_{n+1} > 0, so
our sequence can't be converging to −√2, so we must have x_0 = √2. But √2 is
not in Q. Thus we have a sequence of elements in Q that are getting very close to
each other but are not converging to any element of Q. (Of course the sequence is
converging to a point in R. In fact one construction of the real number system is
in terms of such sequences in Q.)
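The recursion is easy to run in exact rational arithmetic, which makes the point vivid: every term is in Q, yet the terms crowd together around √2, which is not. A small sketch (ours, not part of the notes):

```python
from fractions import Fraction

# Iterate x_{n+1} = (x_n + 2) / (x_n + 1) exactly, staying inside Q.
x = Fraction(1)
for n in range(1, 8):
    print(n, x, float(x))
    x = (x + 2) / (x + 1)

# Successive differences shrink: a Cauchy sequence in Q whose limit,
# sqrt(2) = 1.41421356..., lies outside Q.
```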
Definition 8. Let {x_n} be a sequence of points in (X, d). We say that the
sequence is a Cauchy sequence if for any ε > 0 there is N ∈ N such that if n, m > N
then d(x_n, x_m) < ε.
Exercise 37. Show that if {x_n} converges then {x_n} is a Cauchy sequence.
A metric space (X, d) in which every Cauchy sequence converges to a limit in
X is called a complete metric space. The space of real numbers R is a complete
metric space, while the space of rationals Q is not.
Exercise 38. Is N, the space of natural or counting numbers with metric d
given by d(x, y) = |x − y|, a complete metric space?
In Section 7 we defined the notion of a function being continuous at a point.
It is possible to give that definition in terms of sequences.
Definition 9. Suppose (X, d_X) and (Y, d_Y) are metric spaces, x_0 ∈ X, and
f : X → Y is a function. Then f is continuous at x_0 if for every sequence {x_n} that
converges to x_0 in (X, d_X) the sequence {f(x_n)} converges to f(x_0) in (Y, d_Y).
Exercise 39. Show that the function f(x) = (x + 2)/(x + 1) is continuous at
any point x ≠ −1. Show that this means that if x_n → x_0 as n → ∞ then

lim_{n→∞} (x_n + 2)/(x_n + 1) = (x_0 + 2)/(x_0 + 1).
We can also define the concept of a closed set (and hence the concepts of open
sets and compact sets) in terms of sequences.
Definition 10. Let (X, d) be a metric space. A set S ⊆ X is closed if for any
convergent sequence {x_n} such that x_n ∈ S for all n we have lim_{n→∞} x_n ∈ S. A set is
open if its complement is closed.
Given a sequence {x_n} we can define a new sequence by taking only some of
the elements of the original sequence. In the example we considered earlier, in which
x_n was 1 if n was odd and 0 if n was even, we could take only the odd n and thus
obtain a sequence that did converge. The new sequence is called a subsequence of
the old sequence.
Definition 11. Let {x_n} be some sequence in (X, d). Let {n_j}_{j=1}^∞ be a
sequence of natural numbers such that for each j we have n_j < n_{j+1}, that is,
n_1 < n_2 < n_3 < . . . . The sequence {x_{n_j}}_{j=1}^∞ is called a subsequence of the original
sequence.
The notion of a subsequence is often useful. We often use it in the way that
we briefly referred to above. We initially have a sequence that may not converge,
but we are able to take a subsequence that does converge. Such a subsequence is
called a convergent subsequence.
Definition 12. A subset of a metric space with the property that every sequence
in the subset has a subsequence converging to a point of the subset is called sequentially compact.
Theorem 3. In any metric space any compact set is sequentially compact.
If we restrict attention to finite-dimensional Euclidean spaces the situation is
even better behaved.
Theorem 4. A subset of R^n is sequentially compact if and only if it is
compact.
Exercise 40. Verify the following limits.
(i) lim_{n→∞} n/(n + 1) = 1
(ii) lim_{n→∞} (n + 3)/(n² + 1) = 0
(iii) lim_{n→∞} (√(n + 1) − √n) = 0
(iv) lim_{n→∞} (a^n + b^n)^(1/n) = max{a, b}
Exercise 41. Consider a sequence {x_n} in R. What can you say about the
sequence if it converges and, for each n, x_n is an integer?
Exercise 42. Consider the sequence

1/2, 1/3, 2/3, 1/4, 2/4, 3/4, 1/5, 2/5, 3/5, 4/5, 1/6, . . . .

For which values z ∈ R is there a subsequence converging to z?
Exercise 43. Prove that if a subsequence of a Cauchy sequence converges to
a limit z then so does the original Cauchy sequence.
Exercise 44. Prove that any subsequence of a convergent sequence converges.
Finally, one somewhat less trivial exercise.
Exercise 45. Prove that if lim_{n→∞} x_n = z then

lim_{n→∞} (x_1 + · · · + x_n)/n = z.
10. Linear Spaces
The notion of a linear space is the axiomatic way of looking at the familiar linear
operations: addition and multiplication. A trivial example of a linear space is the
set of real numbers, R.
What is the operation of addition? One way of answering the question is
to say that the operation of addition is just the list of its properties. So, we will
define the addition of elements from some set X as the operation that satisfies the
following four axioms.
A1: x + y = y + x for all x and y in X.
A2: x + (y + z) = (x + y) + z for all x, y, and z in X.
A3: There exists an element, denoted by 0, such that x + 0 = x for all x in X.
A4: For every x in X there exists an element y in X, called the inverse of x, such
that x + y = 0.
And, to make things more interesting, we will also introduce the operation of
multiplication by a number by adding two more axioms.
A5: 1x = x for all x in X.
A6: α(βx) = (αβ)x for all x in X and for all α and β in R.
Finally, two more axioms relating addition and multiplication.
A7: α(x + y) = αx + αy for all x and y in X and for all α in R.
A8: (α + β)x = αx + βx for all x in X and for all α and β in R.
Elements x, y, . . . , w are linearly dependent if there exist real numbers α, β, . . . , ω,
not all of them equal to zero, such that

αx + βy + · · · + ωw = 0.

Otherwise, the elements x, y, . . . , w are linearly independent.
If in a space L it is possible to find n linearly independent elements, but any
n + 1 elements are linearly dependent, then we say that the space L has dimension n.
A nonempty subset L′ of a linear space L is called a linear subspace if L′ forms
a linear space in itself. In other words, L′ is a linear subspace of L if for any x and
y in L′ and all α and β in R,

αx + βy ∈ L′.
CHAPTER 2
Linear Algebra
1. The Space R^n
In the previous chapter we introduced the concept of a linear space or a vector
space. We shall now examine in some detail one example of such a space. This is
the space of all ordered n-tuples (x_1, x_2, . . . , x_n) where each x_i is a real number.
We call this space n-dimensional real space and denote it R^n.
Remember from the previous chapter that to define a vector space we not only
need to define the points in that space but also to define how we add such points
and how we multiply such points by scalars. In the case of R^n we do this element
by element in the n-tuple or vector. That is,

(x_1, x_2, . . . , x_n) + (y_1, y_2, . . . , y_n) = (x_1 + y_1, x_2 + y_2, . . . , x_n + y_n)

and

α(x_1, x_2, . . . , x_n) = (αx_1, αx_2, . . . , αx_n).
Let us consider the case that n = 2, that is, the case of R². In this case we can
visualise the space as in the following diagram. The vector (x_1, x_2) is represented
by the point that is x_1 units along from the point (0, 0) in the horizontal direction
and x_2 units up from (0, 0) in the vertical direction.

[Figure 1: the point (1, 2) plotted in the (x_1, x_2) plane.]
Let us for the moment continue our discussion in R². Notice that we are
implicitly writing a vector (x_1, x_2) as a sum x_1 · v_1 + x_2 · v_2 where v_1 is the
unit vector in the first direction and v_2 is the unit vector in the second direction.
Suppose that instead we considered the vectors u_1 = (2, 1) = 2 · v_1 + 1 · v_2 and
u_2 = (1, 2) = 1 · v_1 + 2 · v_2. We could have written any vector (x_1, x_2) instead
as z_1 · u_1 + z_2 · u_2 where z_1 = (2x_1 − x_2)/3 and z_2 = (2x_2 − x_1)/3. That is, for
any vector in R² we can uniquely write that vector in terms of u_1 and u_2. Is there
anything that is special about u_1 and u_2 that allows us to make this claim? There
must be, since we can easily find other vectors for which this would not have been
true. (For example, (1, 2) and (2, 4).)
The property of the pair of vectors u_1 and u_2 is that they are independent. That
is, we cannot write either as a multiple of the other. More generally in n dimensions
we would say that we cannot write any of the vectors as a linear combination of
the others, or equivalently we have the following definition.
Definition 13. The vectors x_1, . . . , x_k, all in R^n, are linearly independent if it
is not possible to find scalars λ_1, . . . , λ_k, not all zero, such that

λ_1 x_1 + · · · + λ_k x_k = 0.

Notice that we do not as a matter of definition require that k = n or even that
k ≤ n. We state as a result that if k > n then the collection x_1, . . . , x_k cannot
be linearly independent. (In a real maths course we would, of course, have proved
this.)
Comment 1. If you examine the definition above you will notice that there
is nowhere that we actually need to assume that our vectors are in R^n. We can
in fact apply the same definition of linear independence to any vector space. This
allows us to define the concept of the dimension of an arbitrary vector space as the
maximal number of linearly independent vectors in that space. In the case of R^n
we obtain that the dimension is in fact n.
Exercise 46. Suppose that x_1, . . . , x_k, all in R^n, are linearly independent and
that the vector y in R^n is equal to λ_1 x_1 + · · · + λ_k x_k. Show that this is the only
way that y can be expressed as a linear combination of the x_i's. (That is, show that
if y = μ_1 x_1 + · · · + μ_k x_k then μ_1 = λ_1, . . . , μ_k = λ_k.)
The set of all vectors that can be written as a linear combination of the vectors
x_1, . . . , x_k is called the span of those vectors. If x_1, . . . , x_k are linearly independent
and if the span of x_1, . . . , x_k is all of R^n then the collection {x_1, . . . , x_k} is called
a basis for R^n. (Of course, in this case we must have k = n.) Any vector in R^n
can be uniquely represented as a linear combination of the vectors x_1, . . . , x_k. We
shall later see that it can sometimes be useful to choose a particular basis in which
to represent the vectors with which we deal.
It may be that we have a collection of vectors {x_1, . . . , x_k} whose span is not
all of R^n. In this case we call the span of {x_1, . . . , x_k} a linear subspace of R^n.
Alternatively we say that X ⊆ R^n is a linear subspace of R^n if X is closed under
vector addition and scalar multiplication. That is, if for all x, y ∈ X the vector
x + y is also in X, and for all x ∈ X and λ ∈ R the vector λx is in X. If the span
of x_1, . . . , x_k is X and if x_1, . . . , x_k are linearly independent then we say that these
vectors are a basis for the linear subspace X. In this case the dimension of the
linear subspace X is k. In general the dimension of the span of x_1, . . . , x_k is equal
to the maximum number of linearly independent vectors among x_1, . . . , x_k.
Finally, we comment that R^n is a metric space with the metric d : R^n × R^n → R_+
defined by

d((x_1, . . . , x_n), (y_1, . . . , y_n)) = √((x_1 − y_1)² + · · · + (x_n − y_n)²).

There are many other metrics we could define on this space but this is the standard
one.
2. Linear Functions from R^n to R^m
In the previous section we introduced the space R^n. Here we shall discuss
functions from one such space to another (possibly of different dimension). The
concept of continuity that we introduced for metric spaces is immediately applicable
here. We shall be mainly concerned with an even narrower class of functions,
namely, the linear functions.
Definition 14. A function f : R
n
R
m
is said to be a linear function if it
satises the following two properties.
(1) f(x +y) = f(x) +f(y) for all x, y R
n
, and
(2) f(x) = f(x) for all x R
n
and R.
Comment 2. When considering functions of a single real variable, that is,
functions from R to R functions of the form f(x) = ax + b where a and b are
xed constants are sometimes called linear functions. It is easy to see that if b = 0
then such functions do not satisfy the conditions given above. We shall call such
functions ane functions. More generally we shall call a function g : R
n
R
m
an
ane function if it is the sum of a linear function f : R
n
R
m
and a constant
b R
m
. That is, if for any x R
n
g(x) = f(x) +b.
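The distinction is easy to see numerically. The following Python sketch (illustrative only; the matrix $A$ and constant $b$ are hypothetical) checks the two defining properties at randomly drawn points and shows that an affine map with $b \neq 0$ fails them.

```python
import numpy as np

def looks_linear(f, n, trials=100, tol=1e-9):
    """Heuristically test f: R^n -> R^m for additivity and homogeneity
    at random points. Passing is necessary for linearity, not a proof."""
    rng = np.random.default_rng(0)
    for _ in range(trials):
        x, y = rng.normal(size=n), rng.normal(size=n)
        lam = rng.normal()
        if not np.allclose(f(x + y), f(x) + f(y), atol=tol):
            return False
        if not np.allclose(f(lam * x), lam * f(x), atol=tol):
            return False
    return True

A = np.array([[1.0, 2.0], [0.0, 1.0]])
b = np.array([1.0, -1.0])
print(looks_linear(lambda x: A @ x, 2))      # True: x -> Ax is linear
print(looks_linear(lambda x: A @ x + b, 2))  # False: x -> Ax + b is affine only
```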
Let us now suppose that we have two linear functions $f : \mathbb{R}^n \to \mathbb{R}^m$ and $g : \mathbb{R}^n \to \mathbb{R}^m$. It is straightforward to show that the function $(f + g) : \mathbb{R}^n \to \mathbb{R}^m$ defined by $(f + g)(x) = f(x) + g(x)$ is also a linear function. Similarly, if we have a linear function $f : \mathbb{R}^n \to \mathbb{R}^m$ and a constant $\lambda \in \mathbb{R}$, the function $(\lambda f) : \mathbb{R}^n \to \mathbb{R}^m$ defined by $(\lambda f)(x) = \lambda f(x)$ is a linear function. If $f : \mathbb{R}^n \to \mathbb{R}^m$ and $g : \mathbb{R}^m \to \mathbb{R}^k$ are linear functions then the composite function $g \circ f : \mathbb{R}^n \to \mathbb{R}^k$ defined by $g \circ f(x) = g(f(x))$ is again a linear function. Finally, if $f : \mathbb{R}^n \to \mathbb{R}^n$ is not only linear, but also one-to-one and onto so that it has an inverse $f^{-1} : \mathbb{R}^n \to \mathbb{R}^n$, then the inverse function is also a linear function.

Exercise 47. Prove the facts stated in the previous paragraph.

Recall that in the previous section we defined the notion of a linear subspace. A linear function $f : \mathbb{R}^n \to \mathbb{R}^m$ defines two important subspaces, the image of $f$, denoted $\mathrm{Im}(f) \subset \mathbb{R}^m$, and the kernel of $f$, denoted $\mathrm{Ker}(f) \subset \mathbb{R}^n$. The image of $f$ is the set of all vectors in $\mathbb{R}^m$ such that $f$ maps some vector in $\mathbb{R}^n$ to that vector, that is,
$$\mathrm{Im}(f) = \{\, y \in \mathbb{R}^m \mid \exists x \in \mathbb{R}^n \text{ such that } y = f(x) \,\}.$$
The kernel of $f$ is the set of all vectors in $\mathbb{R}^n$ that are mapped by the function $f$ to the zero vector in $\mathbb{R}^m$, that is,
$$\mathrm{Ker}(f) = \{\, x \in \mathbb{R}^n \mid f(x) = 0 \,\}.$$
The kernel of $f$ is sometimes called the null space of $f$.

It is intuitively clear that the dimension of $\mathrm{Im}(f)$ is no more than $n$. (It is of course no more than $m$ since it is contained in $\mathbb{R}^m$.) Of course, in general it may be less than $n$, for example if $m < n$, or if $f$ mapped all points in $\mathbb{R}^n$ to the zero vector in $\mathbb{R}^m$. (You should satisfy yourself that this function is indeed a linear function.) However if the dimension of $\mathrm{Im}(f)$ is indeed less than $n$ it means that the function has mapped the $n$-dimensional space $\mathbb{R}^n$ into a linear space of lower dimension, and that in the process some dimensions have been "lost". The linearity of $f$ means that a linear subspace of dimension equal to the number of dimensions that have been lost must have been collapsed to the zero vector (and that translates of this linear subspace have been collapsed to single points). Thus we can say that
$$\dim(\mathrm{Im}(f)) + \dim(\mathrm{Ker}(f)) = n.$$
In the following section we shall introduce the notion of a matrix and define various operations on matrices. If you are like me when I first came across matrices, these definitions may seem somewhat arbitrary and mysterious. However, we shall see that matrices may be viewed as representations of linear functions and that when viewed in this way the operations we define on matrices are completely natural.

3. Matrices and Matrix Algebra

A matrix is defined as a rectangular array of numbers. If the matrix contains $m$ rows and $n$ columns it is called an $m \times n$ matrix (read "$m$ by $n$ matrix"). The element in the $i$th row and the $j$th column is called the $ij$th element. We typically enclose a matrix in square brackets and write it as
$$\begin{bmatrix} a_{11} & \dots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{m1} & \dots & a_{mn} \end{bmatrix}.$$
In the case that $m = n$ we call the matrix a square matrix. If $m = 1$ the matrix contains a single row and we call it a row vector. If $n = 1$ the matrix contains a single column and we call it a column vector. For most purposes we do not distinguish between a $1 \times 1$ matrix $[a]$ and the scalar $a$.

Just as we defined the operation of vector addition and the multiplication of a vector by a scalar, we define similar operations for matrices. In order to be able to add two matrices we require that the matrices be of the same dimension. That is, if matrix $A$ is of dimension $m \times n$ we shall be able to add the matrix $B$ to it if and only if $B$ is also of dimension $m \times n$. If this condition is met then we add matrices simply by adding the corresponding elements of each matrix to obtain the new $m \times n$ matrix $A + B$. That is,
$$\begin{bmatrix} a_{11} & \dots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{m1} & \dots & a_{mn} \end{bmatrix} + \begin{bmatrix} b_{11} & \dots & b_{1n} \\ \vdots & \ddots & \vdots \\ b_{m1} & \dots & b_{mn} \end{bmatrix} = \begin{bmatrix} a_{11}+b_{11} & \dots & a_{1n}+b_{1n} \\ \vdots & \ddots & \vdots \\ a_{m1}+b_{m1} & \dots & a_{mn}+b_{mn} \end{bmatrix}.$$
We can see that this definition of matrix addition satisfies many of the same properties as the addition of scalars. If $A$, $B$, and $C$ are all $m \times n$ matrices then
(1) $A + B = B + A$,
(2) $(A + B) + C = A + (B + C)$,
(3) there is a zero matrix $0$ such that for any $m \times n$ matrix $A$ we have $A + 0 = 0 + A = A$, and
(4) there is a matrix $-A$ such that $A + (-A) = (-A) + A = 0$.

Of course, the zero matrix referred to in (3) is simply the $m \times n$ matrix consisting of all zeros (this is called a null matrix) and the matrix $-A$ referred to in (4) is the matrix obtained from $A$ by replacing each element of $A$ by its negative, that is,
$$-\begin{bmatrix} a_{11} & \dots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{m1} & \dots & a_{mn} \end{bmatrix} = \begin{bmatrix} -a_{11} & \dots & -a_{1n} \\ \vdots & \ddots & \vdots \\ -a_{m1} & \dots & -a_{mn} \end{bmatrix}.$$

Now, given a scalar $\lambda$ in $\mathbb{R}$ and an $m \times n$ matrix $A$, we define the product of $\lambda$ and $A$, which we write $\lambda A$, to be the matrix in which each element of $A$ is replaced by $\lambda$ times that element, that is,
$$\lambda \begin{bmatrix} a_{11} & \dots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{m1} & \dots & a_{mn} \end{bmatrix} = \begin{bmatrix} \lambda a_{11} & \dots & \lambda a_{1n} \\ \vdots & \ddots & \vdots \\ \lambda a_{m1} & \dots & \lambda a_{mn} \end{bmatrix}.$$
So far the definitions of matrix operations have all seemed the most natural ones. We now come to defining matrix multiplication. Perhaps here the definition seems somewhat less natural. However in the next section we shall see that the definition we shall give is in fact very natural when we view matrices as representations of linear functions.

We define matrix multiplication of $A$ times $B$, written $AB$, where $A$ is an $m \times n$ matrix and $B$ is a $p \times q$ matrix, only when $n = p$. In this case the product $AB$ is defined to be an $m \times q$ matrix in which the element in the $i$th row and $j$th column is $\sum_{k=1}^{n} a_{ik} b_{kj}$. That is, to find the term to go in the $i$th row and the $j$th column of the product matrix $AB$ we take the $i$th row of the matrix $A$, which will be a row vector with $n$ elements, and the $j$th column of the matrix $B$, which will be a column vector with $n$ elements. We then multiply each element of the first vector by the corresponding element of the second and add all these products. Thus
$$\begin{bmatrix} a_{11} & \dots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{m1} & \dots & a_{mn} \end{bmatrix} \begin{bmatrix} b_{11} & \dots & b_{1q} \\ \vdots & \ddots & \vdots \\ b_{n1} & \dots & b_{nq} \end{bmatrix} = \begin{bmatrix} \sum_{k=1}^{n} a_{1k} b_{k1} & \dots & \sum_{k=1}^{n} a_{1k} b_{kq} \\ \vdots & \ddots & \vdots \\ \sum_{k=1}^{n} a_{mk} b_{k1} & \dots & \sum_{k=1}^{n} a_{mk} b_{kq} \end{bmatrix}.$$
For example,
$$\begin{bmatrix} a & b & c \\ d & e & f \end{bmatrix} \begin{bmatrix} p & q \\ r & s \\ t & v \end{bmatrix} = \begin{bmatrix} ap + br + ct & aq + bs + cv \\ dp + er + ft & dq + es + fv \end{bmatrix}.$$
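As a sanity check on the formula (a Python sketch for illustration, with arbitrarily chosen matrices), we can compare an explicit loop implementing $\sum_k a_{ik} b_{kj}$ against numpy's built-in product.

```python
import numpy as np

def matmul_explicit(A, B):
    """Multiply A (m x n) by B (n x q) using the definition
    (AB)_ij = sum_k A_ik * B_kj."""
    m, n = A.shape
    n2, q = B.shape
    assert n == n2, "inner dimensions must agree"
    C = np.zeros((m, q))
    for i in range(m):
        for j in range(q):
            C[i, j] = sum(A[i, k] * B[k, j] for k in range(n))
    return C

A = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])        # 2 x 3
B = np.array([[7.0, 8.0], [9.0, 10.0], [11.0, 12.0]])   # 3 x 2
print(np.allclose(matmul_explicit(A, B), A @ B))        # True
```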
We define the identity matrix of order $n$ to be the $n \times n$ matrix that has 1s on its main diagonal and zeros elsewhere, that is, whose $ij$th element is 1 if $i = j$ and zero if $i \neq j$. We denote this matrix by $I_n$ or, if the order is clear from the context, simply $I$. That is,
$$I = \begin{bmatrix} 1 & 0 & \dots & 0 \\ 0 & 1 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & 1 \end{bmatrix}.$$
It is easy to see that if $A$ is an $m \times n$ matrix then $A I_n = A$ and $I_m A = A$. In fact, we could equally well define the identity matrix to be the matrix that satisfies these properties for all such matrices $A$, in which case it would be easy to show that there is a unique matrix satisfying this property, namely, the matrix we defined above.

Consider an $m \times n$ matrix $A$. The columns of $A$ are $m$-dimensional vectors, that is, elements of $\mathbb{R}^m$, and the rows of $A$ are elements of $\mathbb{R}^n$. Thus we can ask if the $n$ columns are linearly independent and similarly if the $m$ rows are linearly independent. In fact we ask: What is the maximum number of linearly independent columns of $A$? It turns out that this is the same as the maximum number of linearly independent rows of $A$. We call this number the rank of the matrix $A$.

4. Matrices as Representations of Linear Functions

Let us suppose that we have a particular linear function $f : \mathbb{R}^n \to \mathbb{R}^m$. We have suggested in the previous section that such a function can necessarily be represented as multiplication by some matrix. We shall now show that this is true. Moreover we shall do so by explicitly constructing the appropriate matrix.
Let us write the $n$-dimensional vector $x$ as a column vector
$$x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}.$$
Now, notice that we can write the vector $x$ as a sum $\sum_{i=1}^{n} x_i e_i$, where $e_i$ is the $i$th unit vector, that is, the vector with 1 in the $i$th place and zeros elsewhere. That is,
$$\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = x_1 \begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix} + x_2 \begin{bmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{bmatrix} + \dots + x_n \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 1 \end{bmatrix}.$$
Now from the linearity of the function $f$ we can write
$$f(x) = f\Big(\sum_{i=1}^{n} x_i e_i\Big) = \sum_{i=1}^{n} f(x_i e_i) = \sum_{i=1}^{n} x_i f(e_i).$$
But what is $f(e_i)$? Remember that $e_i$ is a unit vector in $\mathbb{R}^n$ and that $f$ maps vectors in $\mathbb{R}^n$ to vectors in $\mathbb{R}^m$. Thus $f(e_i)$ is the image in $\mathbb{R}^m$ of the vector $e_i$. Let us write $f(e_i)$ as
$$f(e_i) = \begin{bmatrix} a_{1i} \\ a_{2i} \\ \vdots \\ a_{mi} \end{bmatrix}.$$
Thus
$$f(x) = \sum_{i=1}^{n} x_i f(e_i) = x_1 \begin{bmatrix} a_{11} \\ a_{21} \\ \vdots \\ a_{m1} \end{bmatrix} + x_2 \begin{bmatrix} a_{12} \\ a_{22} \\ \vdots \\ a_{m2} \end{bmatrix} + \dots + x_n \begin{bmatrix} a_{1n} \\ a_{2n} \\ \vdots \\ a_{mn} \end{bmatrix} = \begin{bmatrix} \sum_{i=1}^{n} a_{1i} x_i \\ \sum_{i=1}^{n} a_{2i} x_i \\ \vdots \\ \sum_{i=1}^{n} a_{mi} x_i \end{bmatrix},$$
and this is exactly what we would have obtained had we multiplied the matrices
$$\begin{bmatrix} a_{11} & a_{12} & \dots & a_{1n} \\ a_{21} & a_{22} & \dots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \dots & a_{mn} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}.$$
Thus we have not only shown that a linear function is necessarily represented by multiplication by a matrix, we have also shown how to find the appropriate matrix. It is precisely the matrix whose $n$ columns are the images under the function of the $n$ unit vectors in $\mathbb{R}^n$.
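This construction is easy to carry out numerically. The sketch below (Python, illustrative; the map $f$ is a hypothetical example) builds the representing matrix column by column from the images of the unit vectors and checks that multiplying by it reproduces $f$.

```python
import numpy as np

def matrix_of(f, n):
    """Build the m x n matrix representing the linear map f: R^n -> R^m.
    Column i is f(e_i), the image of the i-th unit vector."""
    columns = [f(np.eye(n)[:, i]) for i in range(n)]
    return np.column_stack(columns)

# A hypothetical linear map from R^3 to R^2.
def f(x):
    return np.array([2 * x[0] - x[1], x[1] + 3 * x[2]])

A = matrix_of(f, 3)
x = np.array([1.0, 2.0, 3.0])
print(A)                         # [[ 2. -1.  0.]  [ 0.  1.  3.]]
print(np.allclose(A @ x, f(x)))  # True
```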
Exercise 48. Find the matrices that represent the following linear functions from $\mathbb{R}^2$ to $\mathbb{R}^2$.
(1) a clockwise rotation of $\pi/2$ ($90°$),
(2) a reflection in the $x_1$ axis,
(3) a reflection in the line $x_2 = x_1$ (that is, the $45°$ line),
(4) a counter-clockwise rotation of $\pi/4$ ($45°$), and
(5) a reflection in the line $x_2 = x_1$ followed by a counter-clockwise rotation of $\pi/4$.

Recall that in Section 2 we defined, for any $f, g : \mathbb{R}^n \to \mathbb{R}^m$ and $\lambda \in \mathbb{R}$, the functions $(f + g)$ and $(\lambda f)$. In Section 3 we defined the sum of two $m \times n$ matrices $A$ and $B$, and the product of a scalar $\lambda$ with the matrix $A$. Let us instead define the sum of $A$ and $B$ as follows.

Let $f : \mathbb{R}^n \to \mathbb{R}^m$ be the linear function represented by the matrix $A$ and $g : \mathbb{R}^n \to \mathbb{R}^m$ be the linear function represented by the matrix $B$. Now define the matrix $(A + B)$ to be the matrix that represents the linear function $(f + g)$. Similarly let the matrix $\lambda A$ be the matrix that represents the linear function $(\lambda f)$.

Exercise 49. Prove that the matrices $(A + B)$ and $\lambda A$ defined in the previous paragraph coincide with the matrices defined in Section 3.

We can also see that the definition we gave of matrix multiplication is precisely the right definition if we take multiplication of matrices to mean the composition of the linear functions that the matrices represent. To be more precise, let $f : \mathbb{R}^n \to \mathbb{R}^m$ and $g : \mathbb{R}^m \to \mathbb{R}^k$ be linear functions and let $A$ and $B$ be the $m \times n$ and $k \times m$ matrices that represent them. Let $(g \circ f) : \mathbb{R}^n \to \mathbb{R}^k$ be the composite function defined in Section 2. Now let us define the product $BA$ to be the matrix that represents the linear function $(g \circ f)$.

Now since the matrix $A$ represents the function $f$ and $B$ represents $g$ we have
$$(g \circ f)(x) = g(f(x)) = g\left(\begin{bmatrix} a_{11} & a_{12} & \dots & a_{1n} \\ a_{21} & a_{22} & \dots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \dots & a_{mn} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}\right) = g\left(\begin{bmatrix} \sum_{i=1}^{n} a_{1i} x_i \\ \sum_{i=1}^{n} a_{2i} x_i \\ \vdots \\ \sum_{i=1}^{n} a_{mi} x_i \end{bmatrix}\right)$$
$$= \begin{bmatrix} b_{11} & b_{12} & \dots & b_{1m} \\ b_{21} & b_{22} & \dots & b_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ b_{k1} & b_{k2} & \dots & b_{km} \end{bmatrix} \begin{bmatrix} \sum_{i=1}^{n} a_{1i} x_i \\ \sum_{i=1}^{n} a_{2i} x_i \\ \vdots \\ \sum_{i=1}^{n} a_{mi} x_i \end{bmatrix} = \begin{bmatrix} \sum_{j=1}^{m} b_{1j} \sum_{i=1}^{n} a_{ji} x_i \\ \sum_{j=1}^{m} b_{2j} \sum_{i=1}^{n} a_{ji} x_i \\ \vdots \\ \sum_{j=1}^{m} b_{kj} \sum_{i=1}^{n} a_{ji} x_i \end{bmatrix}$$
$$= \begin{bmatrix} \sum_{i=1}^{n} \sum_{j=1}^{m} b_{1j} a_{ji} x_i \\ \sum_{i=1}^{n} \sum_{j=1}^{m} b_{2j} a_{ji} x_i \\ \vdots \\ \sum_{i=1}^{n} \sum_{j=1}^{m} b_{kj} a_{ji} x_i \end{bmatrix} = \begin{bmatrix} \sum_{j=1}^{m} b_{1j} a_{j1} & \sum_{j=1}^{m} b_{1j} a_{j2} & \dots & \sum_{j=1}^{m} b_{1j} a_{jn} \\ \sum_{j=1}^{m} b_{2j} a_{j1} & \sum_{j=1}^{m} b_{2j} a_{j2} & \dots & \sum_{j=1}^{m} b_{2j} a_{jn} \\ \vdots & \vdots & \ddots & \vdots \\ \sum_{j=1}^{m} b_{kj} a_{j1} & \sum_{j=1}^{m} b_{kj} a_{j2} & \dots & \sum_{j=1}^{m} b_{kj} a_{jn} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}.$$
And this last is the product of the matrix we defined in Section 3 to be $BA$ with the column vector $x$. As we have claimed, the definition of matrix multiplication we gave in Section 3 was not arbitrary but rather was forced on us by our decision to regard the multiplication of two matrices as corresponding to the composition of the linear functions the matrices represent.
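A quick numerical confirmation (again an illustrative Python sketch, with hypothetical matrices): the matrix built from the composite map $g \circ f$ via the unit-vector construction coincides with the product $BA$.

```python
import numpy as np

A = np.array([[1.0, 2.0], [0.0, 1.0], [3.0, -1.0]])  # represents f: R^2 -> R^3
B = np.array([[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]])     # represents g: R^3 -> R^2

f = lambda x: A @ x
g = lambda y: B @ y

# Matrix of g o f, built column by column from images of the unit vectors:
C = np.column_stack([g(f(np.eye(2)[:, i])) for i in range(2)])
print(np.allclose(C, B @ A))  # True: composition corresponds to the product BA
```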
Recall that the columns of the matrix $A$ that represents the linear function $f : \mathbb{R}^n \to \mathbb{R}^m$ were precisely the images of the unit vectors in $\mathbb{R}^n$ under $f$. The linearity of $f$ means that the image of any point in $\mathbb{R}^n$ is in the span of the images of these unit vectors, and similarly that any point in the span of the images is the image of some point in $\mathbb{R}^n$. Thus $\mathrm{Im}(f)$ is equal to the span of the columns of $A$. Now, the dimension of the span of the columns of $A$ is equal to the maximum number of linearly independent columns in $A$, that is, to the rank of $A$.

5. Linear Functions from $\mathbb{R}^n$ to $\mathbb{R}^n$ and Square Matrices

In the remainder of this chapter we look more closely at an important subclass of linear functions and the matrices that represent them, viz the functions that map $\mathbb{R}^n$ to itself. From what we have already said we see immediately that the matrix representing such a linear function will have the same number of rows as it has columns. We call such a matrix a square matrix.

If the linear function $f : \mathbb{R}^n \to \mathbb{R}^n$ is one-to-one and onto then the function $f$ has an inverse $f^{-1}$. In Exercise 47 you showed that this function too is linear. A matrix that represents a linear function that is one-to-one and onto is called a nonsingular matrix. Alternatively we can say that an $n \times n$ matrix is nonsingular if the rank of the matrix is $n$. To see that these two statements are equivalent, note first that if $f$ is one-to-one then $\mathrm{Ker}(f) = \{0\}$. (This is the trivial direction of Exercise 50.) But this means that $\dim(\mathrm{Ker}(f)) = 0$ and so $\dim(\mathrm{Im}(f)) = n$. And, as we argued at the end of the previous section, this is the same as the rank of the matrix that represents $f$.

Exercise 50. Show that the linear function $f : \mathbb{R}^n \to \mathbb{R}^m$ is one-to-one if and only if $\mathrm{Ker}(f) = \{0\}$.

Exercise 51. Show that the linear function $f : \mathbb{R}^n \to \mathbb{R}^n$ is one-to-one if and only if it is onto.

6. Inverse Functions and Inverse Matrices

In the previous section we discussed briefly the idea of the inverse of a linear function $f : \mathbb{R}^n \to \mathbb{R}^n$. This allows us a very easy definition of the inverse of a square matrix $A$. The inverse of $A$ is the matrix that represents the linear function that is the inverse of the linear function that $A$ represents. We write the inverse of the matrix $A$ as $A^{-1}$. Thus a matrix will have an inverse if and only if the linear function that the matrix represents has an inverse, that is, if and only if the linear function is one-to-one and onto. We saw in the previous section that this will occur if and only if the kernel of the function is $\{0\}$, which in turn occurs if and only if the image of $f$ is of full dimension, that is, is all of $\mathbb{R}^n$. This is the same as the matrix being of full rank, that is, of rank $n$.

As with the ideas we have discussed earlier, we can express the idea of a matrix inverse purely in terms of matrices, without reference to the linear functions that they represent. Given an $n \times n$ matrix $A$ we define the inverse of $A$ to be a matrix $B$ such that $BA = I_n$, where $I_n$ is the $n \times n$ identity matrix discussed in Section 3. Such a matrix $B$ will exist if and only if the matrix $A$ is nonsingular. Moreover, if such a matrix $B$ exists then it is also true that $AB = I_n$, that is, $(A^{-1})^{-1} = A$.

In Section 9 we shall see one method for calculating inverses of general $n \times n$ matrices. Here we shall simply describe how to calculate the inverse of a $2 \times 2$ matrix. Suppose that we have the matrix
$$A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}.$$
The inverse of this matrix is
$$A^{-1} = \frac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}.$$
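A small check of the formula (Python sketch, illustrative matrix):

```python
import numpy as np

def inverse_2x2(M):
    """Invert a 2x2 matrix via the formula above; requires ad - bc != 0."""
    a, b = M[0]
    c, d = M[1]
    det = a * d - b * c
    assert det != 0, "matrix is singular"
    return np.array([[d, -b], [-c, a]]) / det

A = np.array([[3.0, 1.0], [1.0, 2.0]])
print(np.allclose(inverse_2x2(A) @ A, np.eye(2)))     # True
print(np.allclose(inverse_2x2(A), np.linalg.inv(A)))  # True
```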
Exercise 52. Show that the matrix $A$ is of full rank if and only if $ad - bc \neq 0$.

Exercise 53. Check that the matrix given is, in fact, the inverse of $A$.

7. Changes of Basis

We have until now implicitly assumed that there is no ambiguity when we speak of the vector $(x_1, x_2, \dots, x_n)$. Sometimes there may indeed be an obvious meaning to such a vector. However when we define a linear space all that are really specified are what "straight lines" are and where "zero" is. In particular, we do not necessarily have defined in an unambiguous way where the axes are or what a unit length along each axis is. In other words we may not have a set of basis vectors specified.

Even when we do have, or have decided on, a set of basis vectors we may wish to redefine our description of the linear space with which we are dealing so as to use a different set of basis vectors. Let us suppose that we have an $n$-dimensional space, $\mathbb{R}^n$ say, with a given set of basis vectors $v_1, v_2, \dots, v_n$, and that we wish instead to describe the space in terms of the linearly independent vectors $b_1, b_2, \dots, b_n$ where
$$b_i = b_{1i} v_1 + b_{2i} v_2 + \dots + b_{ni} v_n.$$
Now, if we had the description of a point in terms of the new coordinate vectors, e.g., as
$$z_1 b_1 + z_2 b_2 + \dots + z_n b_n,$$
then we can easily convert this to a description in terms of the original basis vectors. We would simply substitute the formula for $b_i$ in terms of the $v_j$'s into the previous formula, giving
$$\Big(\sum_{i=1}^{n} b_{1i} z_i\Big) v_1 + \Big(\sum_{i=1}^{n} b_{2i} z_i\Big) v_2 + \dots + \Big(\sum_{i=1}^{n} b_{ni} z_i\Big) v_n,$$
or, in our previous notation,
$$\begin{bmatrix} \sum_{i=1}^{n} b_{1i} z_i \\ \sum_{i=1}^{n} b_{2i} z_i \\ \vdots \\ \sum_{i=1}^{n} b_{ni} z_i \end{bmatrix}.$$
But this is simply the product
$$\begin{bmatrix} b_{11} & b_{12} & \dots & b_{1n} \\ b_{21} & b_{22} & \dots & b_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ b_{n1} & b_{n2} & \dots & b_{nn} \end{bmatrix} \begin{bmatrix} z_1 \\ z_2 \\ \vdots \\ z_n \end{bmatrix}.$$
That is, if we are given an $n$-tuple of real numbers that describes a vector in terms of the new basis vectors $b_1, b_2, \dots, b_n$ and we wish to find the $n$-tuple that describes the vector in terms of the original basis vectors, we simply multiply the $n$-tuple we are given, written as a column vector, by the matrix whose columns are the new basis vectors $b_1, b_2, \dots, b_n$. We shall call this matrix $B$. We see, among other things, that changing the basis is a linear operation.

Now, if we were given the information in terms of the original basis vectors and wanted to write it in terms of the new basis vectors, what should we do? Since we don't have the original basis vectors written in terms of the new basis vectors this is not immediately obvious. However we do know that if we were to do it and then were to carry out the operation described in the previous paragraph we would be back with what we started. Further we know that the operation is a linear operation that maps $n$-tuples to $n$-tuples and so is represented by multiplication by an $n \times n$ matrix. That is, we multiply the $n$-tuple, written as a column vector, by the matrix that when multiplied by $B$ gives the identity matrix, that is, the matrix $B^{-1}$. If we are given a vector of the form
$$x_1 v_1 + x_2 v_2 + \dots + x_n v_n$$
and we wish to express it in terms of the vectors $b_1, b_2, \dots, b_n$ we calculate
$$\begin{bmatrix} b_{11} & b_{12} & \dots & b_{1n} \\ b_{21} & b_{22} & \dots & b_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ b_{n1} & b_{n2} & \dots & b_{nn} \end{bmatrix}^{-1} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}.$$
Suppose now that we consider a linear function $f : \mathbb{R}^n \to \mathbb{R}^n$ and that we have originally described $\mathbb{R}^n$ in terms of the standard basis vectors $e_1, e_2, \dots, e_n$, where $e_i$ is the vector with 1 in the $i$th place and zeros elsewhere. Suppose that with these basis vectors $f$ is represented by the matrix
$$A = \begin{bmatrix} a_{11} & a_{12} & \dots & a_{1n} \\ a_{21} & a_{22} & \dots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \dots & a_{nn} \end{bmatrix}.$$
If we now describe $\mathbb{R}^n$ in terms of the vectors $b_1, b_2, \dots, b_n$, how will the linear function $f$ be represented? Let us think of what we want. We shall be given a vector described in terms of the basis vectors $b_1, b_2, \dots, b_n$ and we shall want to know what the image of this vector under the linear function $f$ is, where we shall again want our answer in terms of the basis vectors $b_1, b_2, \dots, b_n$. We know how to do this when we are given the description in terms of the vectors $e_1, e_2, \dots, e_n$. Thus the first thing we shall do with our vector is to convert it from a description in terms of $b_1, b_2, \dots, b_n$ to a description in terms of $e_1, e_2, \dots, e_n$. We do this by multiplying the $n$-tuple by the matrix $B$. Thus if we call our original $n$-tuple $z$ we shall now have a description of the vector in terms of $e_1, e_2, \dots, e_n$, viz $Bz$. Given this description we can find the image of the vector in question under $f$ by multiplying by the matrix $A$. Thus we shall have $A(Bz) = (AB)z$. Remember however that this will have given us the image vector in terms of the basis vectors $e_1, e_2, \dots, e_n$. In order to convert this to a description in terms of the vectors $b_1, b_2, \dots, b_n$ we must multiply by the matrix $B^{-1}$. Thus our final $n$-tuple will be $(B^{-1}AB)z$.

Recapitulating, suppose that we know that the linear function $f : \mathbb{R}^n \to \mathbb{R}^n$ is represented by the matrix $A$ when we describe $\mathbb{R}^n$ in terms of the standard basis vectors $e_1, e_2, \dots, e_n$, and that we have a new set of basis vectors $b_1, b_2, \dots, b_n$. Then when $\mathbb{R}^n$ is described in terms of these new basis vectors the linear function $f$ will be represented by the matrix $B^{-1}AB$.
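The following Python sketch (illustrative, with arbitrarily chosen $A$ and basis) verifies the recipe: applying $f$ in the new coordinates via $B^{-1}AB$ agrees with converting to standard coordinates, applying $A$, and converting back.

```python
import numpy as np

A = np.array([[2.0, 0.0], [1.0, 1.0]])   # f in the standard basis
B = np.array([[1.0, 1.0], [0.0, 1.0]])   # columns are the new basis vectors

A_new = np.linalg.inv(B) @ A @ B          # f in the new basis

z = np.array([1.0, -2.0])                 # coordinates relative to the new basis
# Convert to standard coordinates, apply f, then convert back:
direct = np.linalg.inv(B) @ (A @ (B @ z))
print(np.allclose(A_new @ z, direct))     # True
```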
Exercise 54. Let $f : \mathbb{R}^n \to \mathbb{R}^m$ be a linear function. Suppose that with the standard bases for $\mathbb{R}^n$ and $\mathbb{R}^m$ the function $f$ is represented by the matrix $A$. Let $b_1, b_2, \dots, b_n$ be a new set of basis vectors for $\mathbb{R}^n$ and $c_1, c_2, \dots, c_m$ be a new set of basis vectors for $\mathbb{R}^m$. What is the matrix that represents $f$ when the linear spaces are described in terms of the new basis vectors?
Exercise 55. Let $f : \mathbb{R}^2 \to \mathbb{R}^2$ be a linear function. Suppose that with the standard basis for $\mathbb{R}^2$ the function $f$ is represented by the matrix
$$\begin{bmatrix} 3 & 1 \\ 1 & 2 \end{bmatrix}.$$
Let
$$\begin{bmatrix} 3 \\ 2 \end{bmatrix} \quad \text{and} \quad \begin{bmatrix} 1 \\ 1 \end{bmatrix}$$
be a new set of basis vectors for $\mathbb{R}^2$. What is the matrix that represents $f$ when $\mathbb{R}^2$ is described in terms of the new basis vectors?
Properties of a square matrix that depend only on the linear function that the matrix represents, and not on the particular choice of basis vectors for the linear space, are called invariant properties. We have already seen one example of an invariant property, the rank of a matrix. The rank of a matrix is equal to the dimension of the image space of the function that the matrix represents, which clearly depends only on the function and not on the choice of basis vectors for the linear space.

The idea of a property being invariant can also be expressed in terms only of matrices, without reference to the idea of linear functions. A property is invariant if whenever an $n \times n$ matrix $A$ has the property then for any nonsingular $n \times n$ matrix $B$ the matrix $B^{-1}AB$ also has the property. We might think of rank as a function that associates to any square matrix a nonnegative integer. We shall say that such a function is an invariant if the property of having the function take a particular value is invariant for all particular values we may choose.

Two particularly important invariants are the trace of a square matrix and the determinant of a square matrix. We examine these in more detail in the following section.

8. The Trace and the Determinant

In this section we define two important real valued functions on the space of $n \times n$ matrices, the trace and the determinant. Both of these concepts have geometric interpretations. However, while the trace is easy to calculate (much easier than the determinant) its geometric interpretation is rather hard to see. Thus we shall not go into it. On the other hand the determinant, while being somewhat harder to calculate, has a very clear geometric interpretation. In Section 9 we shall examine in some detail how to calculate determinants. In this section we shall be content to discuss one definition and the geometric intuition of the determinant.
Given an $n \times n$ matrix $A$, the trace of $A$, written $\mathrm{tr}(A)$, is the sum of the elements on the main diagonal, that is,
$$\mathrm{tr}\begin{bmatrix} a_{11} & a_{12} & \dots & a_{1n} \\ a_{21} & a_{22} & \dots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \dots & a_{nn} \end{bmatrix} = \sum_{i=1}^{n} a_{ii}.$$

Exercise 56. For the matrices given in Exercise 55 confirm that $\mathrm{tr}(A) = \mathrm{tr}(B^{-1}AB)$.

It is easy to see that the trace is a linear function on the space of all $n \times n$ matrices, that is, that for all $n \times n$ matrices $A$ and $B$ and for all $\lambda \in \mathbb{R}$
(1) $\mathrm{tr}(A + B) = \mathrm{tr}(A) + \mathrm{tr}(B)$, and
(2) $\mathrm{tr}(\lambda A) = \lambda \, \mathrm{tr}(A)$.

We can also see that if $A$ and $B$ are both $n \times n$ matrices then $\mathrm{tr}(AB) = \mathrm{tr}(BA)$. In fact, if $A$ is an $m \times n$ matrix and $B$ is an $n \times m$ matrix this is still true. This will often be extremely useful in calculating the trace of a product.
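A quick numerical illustration of $\mathrm{tr}(AB) = \mathrm{tr}(BA)$ for non-square factors (Python sketch with randomly drawn matrices):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(2, 5))   # 2 x 5
B = rng.normal(size=(5, 2))   # 5 x 2

# AB is 2 x 2 while BA is 5 x 5, yet their traces agree.
print(np.isclose(np.trace(A @ B), np.trace(B @ A)))  # True
```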
Exercise 57. From the definition of matrix multiplication show that if $A$ is an $m \times n$ matrix and $B$ is an $n \times m$ matrix then $\mathrm{tr}(AB) = \mathrm{tr}(BA)$. [Hint: Look at the definition of matrix multiplication in Section 3. Then write the trace of the product matrix using summation notation. Finally change the order of summation.]
The determinant, unlike the trace, is not a linear function of the matrix. It does however have some linear structure. If we fix all columns of the matrix except one and look at the determinant as a function of only this column then the determinant is linear in this single column. Moreover this is true whichever column we choose. Let us write the determinant of the $n \times n$ matrix $A$ as $\det(A)$. Let us also write the matrix $A$ as $[a_1, a_2, \dots, a_n]$ where $a_i$ is the $i$th column of the matrix $A$. Thus our claim is that for all $n \times n$ matrices $A$, for all $i = 1, 2, \dots, n$, for all $n$-vectors $b$, and for all $\lambda \in \mathbb{R}$,
$$\det([a_1, \dots, a_{i-1}, a_i + b, a_{i+1}, \dots, a_n]) = \det([a_1, \dots, a_{i-1}, a_i, a_{i+1}, \dots, a_n]) + \det([a_1, \dots, a_{i-1}, b, a_{i+1}, \dots, a_n]) \tag{3}$$
and
$$\det([a_1, \dots, a_{i-1}, \lambda a_i, a_{i+1}, \dots, a_n]) = \lambda \det([a_1, \dots, a_{i-1}, a_i, a_{i+1}, \dots, a_n]). \tag{4}$$
We express this by saying that the determinant is a multilinear function.

Also the determinant is such that any $n \times n$ matrix that is not of full rank, that is, of rank $n$, has a zero determinant. In fact, given that the determinant is a multilinear function, if we simply say that any matrix in which one column is the same as one of its neighbours has a zero determinant, this implies the stronger statement that we made. We already see one use of calculating determinants: a matrix is nonsingular if and only if its determinant is nonzero.

The two properties of being multilinear and zero whenever two neighbouring columns are the same already almost uniquely identify the determinant. Notice however that if the determinant satisfies these two properties then so does any constant times the determinant. To uniquely define the determinant we tie down this constant by assuming that $\det(I) = 1$.

Though we haven't proved that it is so, these three properties uniquely define the determinant. That is, there is one and only one function with these three properties. We call this function the determinant. In Section 9 we shall discuss a number of other useful properties of the determinant. Remember that these additional properties are not really additional facts about the determinant. They can all be derived from the three properties we have given here.

Let us now look to the geometric interpretation of the determinant. Let us first think about what linear transformations can do to the space $\mathbb{R}^n$. Since we have already said that a linear transformation that is not onto is represented by a matrix with a zero determinant, let us think about linear transformations that are onto, that is, that do not map $\mathbb{R}^n$ into a linear space of lower dimension. Such transformations can rotate the space around zero. They can stretch the space in different directions. And they can flip the space over. In the latter case all objects will become mirror images of themselves. We call linear transformations that make such a mirror image orientation reversing and those that don't orientation preserving. A matrix that represents an orientation preserving linear function has a positive determinant while a matrix that represents an orientation reversing linear function has a negative determinant. Thus we have a geometric interpretation of the sign of the determinant.

The absolute size of the determinant represents how much bigger or smaller the linear function makes objects. More precisely it gives the "volume" of the image of the unit hypercube under the transformation. The word volume is in quotes because it is the volume with which we are familiar only when $n = 3$. If $n = 2$ then it is area, while if $n > 3$ then it is the full dimensional analog in $\mathbb{R}^n$ of volume in $\mathbb{R}^3$.
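The area interpretation is easy to check in $\mathbb{R}^2$ (Python sketch, with an illustrative matrix): the unit square maps to the parallelogram spanned by the columns of $A$, and the cross-product formula for that parallelogram's area returns exactly $|\det(A)|$.

```python
import numpy as np

def parallelogram_area(u, v):
    """Area of the parallelogram spanned by u and v in R^2
    (absolute value of the 2-D cross product)."""
    return abs(u[0] * v[1] - u[1] * v[0])

A = np.array([[2.0, 1.0], [0.0, 3.0]])
# The unit square maps to the parallelogram spanned by the columns of A.
area = parallelogram_area(A[:, 0], A[:, 1])
print(area, abs(np.linalg.det(A)))  # 6.0 6.0
```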
Exercise 58. Consider the matrix
$$\begin{bmatrix} 3 & 1 \\ 1 & 2 \end{bmatrix}.$$
In a diagram show the image under the linear function that this matrix represents of the unit square, that is, the square whose corners are the points $(0,0)$, $(1,0)$, $(0,1)$, and $(1,1)$. Calculate the area of that image. Do the same for the matrix
$$\begin{bmatrix} 4 & 1 \\ 1 & 1 \end{bmatrix}.$$
In the light of Exercise 55, comment on the answers you calculated.

9. Calculating and Using Determinants

We have already used the concepts of the inverse of a matrix and the determinant of a matrix. The purpose of this section is to cover some of the "cookbook" aspects of calculating inverses and determinants.

Suppose that we have an $n \times n$ matrix
$$A = \begin{bmatrix} a_{11} & \dots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \dots & a_{nn} \end{bmatrix}.$$
Then we shall use $|A|$ or
$$\begin{vmatrix} a_{11} & \dots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \dots & a_{nn} \end{vmatrix}$$
as an alternative notation for $\det(A)$. Always remember that
$$\begin{vmatrix} a_{11} & \dots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \dots & a_{nn} \end{vmatrix}$$
is not a matrix but rather a real number. For the case $n = 2$ we define
$$\det(A) = \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} = a_{11} a_{22} - a_{21} a_{12}.$$
It is possible to also give a convenient formula for the determinant of a $3 \times 3$ matrix. However, rather than doing this, we shall immediately consider the case of an $n \times n$ matrix.

By the minor of an element of the matrix $A$ we mean the determinant (remember, a real number) of the matrix obtained from the matrix $A$ by deleting the row and column containing the element in question. We denote the minor of the element $a_{ij}$ by the symbol $|M_{ij}|$. Thus, for example,
$$|M_{11}| = \begin{vmatrix} a_{22} & \dots & a_{2n} \\ \vdots & \ddots & \vdots \\ a_{n2} & \dots & a_{nn} \end{vmatrix}.$$

Exercise 59. Write out the minors of a general $3 \times 3$ matrix.

We now define the cofactor of an element to be either plus or minus the minor of the element, being plus if the sum of the indices of the element is even and minus if it is odd. We denote the cofactor of the element $a_{ij}$ by the symbol $|C_{ij}|$. Thus $|C_{ij}| = |M_{ij}|$ if $i + j$ is even and $|C_{ij}| = -|M_{ij}|$ if $i + j$ is odd. Or,
$$|C_{ij}| = (-1)^{i+j} |M_{ij}|.$$

We now define the determinant of an $n \times n$ matrix $A$,
$$\det(A) = |A| = \begin{vmatrix} a_{11} & \dots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \dots & a_{nn} \end{vmatrix},$$
to be $\sum_{j=1}^{n} a_{1j} |C_{1j}|$. This is the sum of $n$ terms, each one of which is the product of an element of the first row of the matrix and the cofactor of that element.
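The definition translates directly into a recursive procedure. Here is a short Python sketch (for illustration only; it is computationally wasteful compared with elimination methods, and the test matrix is arbitrary):

```python
import numpy as np

def det_cofactor(A):
    """Determinant by cofactor expansion along the first row:
    det(A) = sum_j a_1j * (-1)^(1+j) * det(minor of a_1j)."""
    n = A.shape[0]
    if n == 1:
        return A[0, 0]
    total = 0.0
    for j in range(n):
        minor = np.delete(np.delete(A, 0, axis=0), j, axis=1)
        total += A[0, j] * (-1) ** j * det_cofactor(minor)
    return total

A = np.array([[2.0, 0.0, 1.0], [1.0, 3.0, -1.0], [0.0, 1.0, 4.0]])
print(det_cofactor(A), np.linalg.det(A))  # both give 27
```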
Exercise 60. Define the determinant of the $1 \times 1$ matrix $[a]$ to be $a$. (What else could we define it to be?) Show that the definition given above corresponds with the definition we gave earlier for $2 \times 2$ matrices.
Exercise 61. Calculate the determinants of the following $3 \times 3$ matrices.
$$\text{(a)} \begin{bmatrix} 1 & 2 & 3 \\ 3 & 6 & 9 \\ 4 & 5 & 7 \end{bmatrix} \quad \text{(b)} \begin{bmatrix} 1 & 5 & 2 \\ 1 & 4 & 3 \\ 0 & 1 & 2 \end{bmatrix} \quad \text{(c)} \begin{bmatrix} 1 & 1 & 0 \\ 5 & 4 & 1 \\ 2 & 3 & 2 \end{bmatrix} \quad \text{(d)} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \quad \text{(e)} \begin{bmatrix} 2 & 5 & 2 \\ 1 & 5 & 3 \\ 0 & 1 & 3 \end{bmatrix}$$
Exercise 62. Show that the determinant of the identity matrix, $\det(I_n)$, is 1 for all values of $n$. [Hint: Show that it is true for $I_2$. Then show that if it is true for $I_{n-1}$ then it is true for $I_n$.]

One might ask what was special about the first row, that we took the elements of that row, multiplied them by their cofactors, and added them up. Why not the second row, or the first column? It will follow from a number of properties of determinants we list below that in fact we could have used any row or column and we would have arrived at the same answer.
Exercise 63. Expand the matrix given in Exercise 61(b) in terms of the 2nd and 3rd rows and in terms of each column and check that the resulting answer agrees with the answer you obtained originally.

We now have a way of calculating the determinant of any matrix. To find the determinant of an $n \times n$ matrix we have to calculate $n$ determinants of size $(n-1) \times (n-1)$. This is clearly a fairly computationally costly procedure. However there are often ways to economise on the computation.
Exercise 64. Evaluate the determinants of the following matrices
$$\text{(a)} \begin{bmatrix} 1 & 8 & 0 & 7 \\ 2 & 3 & 4 & 6 \\ 1 & 6 & 0 & 1 \\ 0 & 5 & 0 & 8 \end{bmatrix} \quad \text{(b)} \begin{bmatrix} 4 & 7 & 0 & 4 \\ 5 & 6 & 1 & 8 \\ 0 & 0 & 9 & 0 \\ 1 & 3 & 1 & 4 \end{bmatrix}$$
[Hint: Think carefully about which column or row to use in the expansion.]
We shall now list a number of properties of determinants. These properties imply that, as we stated above, it does not matter which row or column we use to expand the determinant. Further, these properties will give us a series of transformations we may perform on a matrix without altering its determinant. This will allow us to calculate a determinant by first transforming the matrix to one whose determinant is easier to calculate and then calculating the determinant of the easier matrix.

Property 1. The determinant of a matrix equals the determinant of its transpose:
$$|A| = |A^{\top}|.$$

Property 2. Interchanging two rows (or two columns) of a matrix changes its sign but not its absolute value. For example,
$$\begin{vmatrix} c & d \\ a & b \end{vmatrix} = cb - ad = -(ad - cb) = -\begin{vmatrix} a & b \\ c & d \end{vmatrix}.$$

Property 3. Multiplying one row (or column) of a matrix by a constant $\lambda$ will change the value of the determinant $\lambda$-fold. For example,
$$\begin{vmatrix} \lambda a_{11} & \dots & \lambda a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \dots & a_{nn} \end{vmatrix} = \lambda \begin{vmatrix} a_{11} & \dots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \dots & a_{nn} \end{vmatrix}.$$

Exercise 65. Check Property 3 for the cases $n = 2$ and $n = 3$.

Corollary 1. $|\lambda A| = \lambda^n |A|$ (where $A$ is an $n \times n$ matrix).

Corollary 2. $|-A| = |A|$ if $n$ is even; $|-A| = -|A|$ if $n$ is odd.
Property 4. Adding a multiple of any row (column) to any other row (column) does not alter the value of the determinant.

Exercise 66. Check that
$$\begin{vmatrix} 1 & 5 & 2 \\ 1 & 4 & 3 \\ 0 & 1 & 2 \end{vmatrix} = \begin{vmatrix} 1 & 5 + 3 \cdot 2 & 2 \\ 1 & 4 + 3 \cdot 3 & 3 \\ 0 & 1 + 3 \cdot 2 & 2 \end{vmatrix} = \begin{vmatrix} 1 + (-2) \cdot 1 & 5 + (-2) \cdot 4 & 2 + (-2) \cdot 3 \\ 1 & 4 & 3 \\ 0 & 1 & 2 \end{vmatrix}.$$
Property 5. If one row (or column) is a constant times another row (or column) then the determinant of the matrix is zero.

Exercise 67. Show that Property 5 follows from Properties 3 and 4.

We can strengthen Property 5 to obtain the following.

Property 5′. The determinant of a matrix is zero if and only if the matrix is not of full rank.

Exercise 68. Explain why Property 5′ is a strengthening of Property 5, that is, why 5′ implies 5.

These properties allow us to calculate determinants more easily. Given an $n \times n$ matrix $A$ the basic strategy one follows is to use the above properties, particularly Property 4, to find a matrix with the same determinant as $A$ in which one row (or column) has only one non-zero element. Then, rather than calculating $n$ determinants of size $(n-1) \times (n-1)$ one only needs to calculate one. One then does the same thing for the $(n-1) \times (n-1)$ determinant that needs to be calculated, and so on.

There are a number of reasons we are interested in determinants. One is that they give us one method of calculating the inverse of a nonsingular matrix. (Recall that there is no inverse of a singular matrix.) They also give us a method, known as Cramer's Rule, for solving systems of linear equations. Before proceeding with this it is useful to state one further property of determinants.

Property 6. If one expands a matrix in terms of one row (or column) and the cofactors of a different row (or column) then the answer is always zero. That is,
$$\sum_{j=1}^{n} a_{ij} |C_{kj}| = 0$$
whenever $i \neq k$. Also
$$\sum_{i=1}^{n} a_{ij} |C_{ik}| = 0$$
whenever $j \neq k$.

Exercise 69. Verify Property 6 for the matrix
$$\begin{bmatrix} 4 & 1 & 2 \\ 5 & 2 & 1 \\ 1 & 0 & 3 \end{bmatrix}.$$

Let us define the matrix of cofactors $C$ to be the matrix $[|C_{ij}|]$ whose $ij$th element is the cofactor of the $ij$th element of $A$. Now we define the adjoint matrix of $A$ to be the transpose of the matrix of cofactors of $A$. That is,
$$\mathrm{adj}(A) = C^{\top}.$$
It is straightforward to see (using Property 6) that $A \, \mathrm{adj}(A) = |A| I_n = \mathrm{adj}(A) \, A$. That is,
$$A^{-1} = \frac{1}{|A|} \mathrm{adj}(A).$$
Notice that this is well defined if and only if $|A| \neq 0$. We now have a method of finding the inverse of any nonsingular square matrix.
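Here is an illustrative Python sketch of the adjoint method (fine for small matrices; in numerical practice one would use factorisations instead; the test matrix is arbitrary):

```python
import numpy as np

def cofactor(A, i, j):
    """Cofactor |C_ij| = (-1)^(i+j) times the minor of a_ij."""
    minor = np.delete(np.delete(A, i, axis=0), j, axis=1)
    return (-1) ** (i + j) * np.linalg.det(minor)

def inverse_adjoint(A):
    """A^{-1} = adj(A) / det(A), where adj(A) is the transpose
    of the matrix of cofactors."""
    n = A.shape[0]
    C = np.array([[cofactor(A, i, j) for j in range(n)] for i in range(n)])
    d = np.linalg.det(A)
    assert abs(d) > 1e-12, "matrix is singular"
    return C.T / d

A = np.array([[2.0, 0.0, 1.0], [1.0, 3.0, -1.0], [0.0, 1.0, 4.0]])
print(np.allclose(inverse_adjoint(A) @ A, np.eye(3)))  # True
```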
Exercise 70. Use this method to find the inverses of the following matrices.
$$\text{(a)} \begin{bmatrix} 3 & 1 & 2 \\ 1 & 0 & 3 \\ 4 & 0 & 2 \end{bmatrix} \quad \text{(b)} \begin{bmatrix} 4 & 2 & 1 \\ 7 & 3 & 3 \\ 2 & 0 & 1 \end{bmatrix} \quad \text{(c)} \begin{bmatrix} 1 & 5 & 2 \\ 1 & 4 & 3 \\ 0 & 1 & 2 \end{bmatrix}$$
Knowing how to invert matrices we thus know how to solve a system of $n$ linear equations in $n$ unknowns. For we can express the $n$ equations in matrix notation as $Ax = b$ where $A$ is an $n \times n$ matrix of coefficients, $x$ is an $n \times 1$ vector of unknowns, and $b$ is an $n \times 1$ vector of constants. Thus we can solve the system of equations as $x = A^{-1}Ax = A^{-1}b$.

Sometimes, particularly if we are not interested in all of the $x$'s, it is convenient to use another method of solving the equations. This method is known as Cramer's Rule. Let us suppose that we wish to solve the above system of equations, that is, $Ax = b$. Let us define the matrix $A_i$ to be the matrix obtained from $A$ by replacing the $i$th column of $A$ by the vector $b$. Then the solution is given by
$$x_i = \frac{|A_i|}{|A|}.$$
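A direct implementation of the rule (Python sketch; the system solved here is an arbitrary illustration, not one of the exercises):

```python
import numpy as np

def cramer_solve(A, b):
    """Solve Ax = b by Cramer's rule: x_i = det(A_i) / det(A),
    where A_i is A with column i replaced by b."""
    d = np.linalg.det(A)
    assert abs(d) > 1e-12, "coefficient matrix is singular"
    x = np.empty(len(b))
    for i in range(len(b)):
        Ai = A.copy()
        Ai[:, i] = b
        x[i] = np.linalg.det(Ai) / d
    return x

A = np.array([[1.0, 2.0], [3.0, -1.0]])
b = np.array([5.0, 1.0])
print(cramer_solve(A, b), np.linalg.solve(A, b))  # both [1. 2.]
```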
Exercise 71. Derive Cramer's Rule. [Hint: We know that the system of equations is solved by $x = (1/|A|)\,\mathrm{adj}(A)\,b$. This gives a formula for $x_i$. Show that this formula is the same as that given by $x_i = |A_i|/|A|$.]
Exercise 72. Solve the following systems of equations (i) by matrix inversion and (ii) by Cramer's Rule.
$$\text{(a)} \quad \begin{aligned} 2x_1 - x_2 &= 2 \\ 3x_2 + 2x_3 &= 16 \\ 5x_1 + 3x_3 &= 21 \end{aligned}$$
$$\text{(b)} \quad \begin{aligned} x_1 + x_2 + x_3 &= 1 \\ x_1 - x_2 + x_3 &= 1 \\ x_1 + x_2 - x_3 &= 1 \end{aligned}$$
Exercise 73. Recall that we claimed that the determinant is an invariant. Confirm this by calculating (directly) $\det(A)$ and $\det(B^{-1}AB)$ where
$$B = \begin{bmatrix} 1 & 0 & 1 \\ 1 & 1 & 2 \\ 2 & 1 & 1 \end{bmatrix} \quad \text{and} \quad A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{bmatrix}.$$

Exercise 74. An $n$th order determinant of the form
$$\begin{vmatrix} a_{11} & 0 & 0 & \dots & 0 \\ a_{21} & a_{22} & 0 & \dots & 0 \\ a_{31} & a_{32} & a_{33} & \dots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & a_{n3} & \dots & a_{nn} \end{vmatrix}$$
is called triangular. Evaluate this determinant. [Hint: Expand the determinant in terms of its first row. Expand the resulting $(n-1) \times (n-1)$ determinant in terms of its first row, and so on.]

10. Eigenvalues and Eigenvectors

Suppose that we have a linear function $f : \mathbb{R}^n \to \mathbb{R}^n$. When we look at how $f$ deforms $\mathbb{R}^n$ one natural question to look at is: Where does $f$ send some linear subspace? In particular we might ask if there are any linear subspaces that $f$ maps to themselves. We call such linear subspaces invariant linear subspaces. Of course the space $\mathbb{R}^n$ itself and the zero dimensional space $\{0\}$ are invariant linear subspaces. The real question is whether there are any others. Clearly, for some linear transformations there are no other invariant subspaces. For example, a clockwise rotation of $\pi/4$ in $\mathbb{R}^2$ has no invariant subspaces other than $\mathbb{R}^2$ itself and $\{0\}$.

A particularly important class of invariant linear subspaces are the one dimensional ones. A one dimensional linear subspace is specified by one nonzero vector, say $\bar{x}$. Then the subspace is $\{\, \lambda \bar{x} \mid \lambda \in \mathbb{R} \,\}$. Let us call this subspace $L(\bar{x})$. If $L(\bar{x})$ is an invariant linear subspace of $f$ and if $x \in L(\bar{x})$ then there is some value $\lambda$ such that $f(x) = \lambda x$. Moreover the value of $\lambda$ for which this is true will be the same whatever value of $x$ we choose in $L(\bar{x})$.

Now if we fix the set of basis vectors and thus the matrix $A$ that represents $f$, we have that if $x$ is in a one dimensional invariant linear subspace of $f$ then there is some $\lambda \in \mathbb{R}$ such that
$$Ax = \lambda x.$$
Again we can define this notion without reference to linear functions. Given a matrix $A$, if we can find a pair $x, \lambda$ with $x \neq 0$ that satisfy the above equation we call $x$ an eigenvector of the matrix $A$ and $\lambda$ the associated eigenvalue. (Sometimes these are called characteristic vectors and values.)

Exercise 75. Show that the eigenvalues of a matrix are an invariant, that is, that they depend only on the linear function the matrix represents and not on the choice of basis vectors. Show also that the eigenvectors of a matrix are not an invariant. Explain why the dependence of the eigenvectors on the particular basis is exactly what we would expect, and argue that in some sense they are indeed invariant.
Now we can rewrite the equation $Ax = \lambda x$ as
$$(A - \lambda I_n)x = 0.$$
If $x, \lambda$ solve this equation and $x \neq 0$ then we have a nonzero linear combination of the columns of $A - \lambda I_n$ equal to zero. This means that the columns of $A - \lambda I_n$ are not linearly independent and so $\det(A - \lambda I_n) = 0$, that is,
$$\det \begin{bmatrix} a_{11} - \lambda & a_{12} & \dots & a_{1n} \\ a_{21} & a_{22} - \lambda & \dots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \dots & a_{nn} - \lambda \end{bmatrix} = 0.$$

Now, the left hand side of this last equation is a polynomial of degree $n$ in $\lambda$, that is, a polynomial in $\lambda$ in which $\lambda^n$ is the highest power of $\lambda$ that appears with nonzero coefficient. It is called the characteristic polynomial and the equation is called the characteristic equation. Now this equation may, or may not, have a solution in real numbers. In general, by the fundamental theorem of algebra the equation has $n$ solutions, perhaps not all distinct, in the complex numbers. If the matrix $A$ happens to be symmetric (that is, if $a_{ij} = a_{ji}$ for all $i$ and $j$) then all of its eigenvalues are real. If the eigenvalues are all distinct (that is, different from each other) then we are in a particularly well behaved situation. As a prelude we state the following result.

Theorem 5. Given an $n \times n$ matrix $A$, suppose that we have $m$ eigenvectors of $A$, $x_1, x_2, \dots, x_m$, with corresponding eigenvalues $\lambda_1, \lambda_2, \dots, \lambda_m$. If $\lambda_i \neq \lambda_j$ whenever $i \neq j$ then $x_1, x_2, \dots, x_m$ are linearly independent.

An implication of this theorem is that an $n \times n$ matrix cannot have more than $n$ eigenvectors with distinct eigenvalues. Further, this theorem allows us to see that if an $n \times n$ matrix has $n$ distinct eigenvalues then it is possible to find a basis for $\mathbb{R}^n$ in which the linear function that the matrix represents is represented by a diagonal matrix. Equivalently, we can find a matrix $B$ such that $B^{-1}AB$ is a diagonal matrix.

To see this let $b_1, b_2, \dots, b_n$ be $n$ linearly independent eigenvectors with associated eigenvalues $\lambda_1, \lambda_2, \dots, \lambda_n$. Let $B$ be the matrix whose columns are the vectors $b_1, b_2, \dots, b_n$. Since these vectors are linearly independent the matrix $B$ has an inverse. Now
$$B^{-1}AB = B^{-1}[Ab_1 \;\; Ab_2 \;\; \dots \;\; Ab_n] = B^{-1}[\lambda_1 b_1 \;\; \lambda_2 b_2 \;\; \dots \;\; \lambda_n b_n] = [\lambda_1 B^{-1}b_1 \;\; \lambda_2 B^{-1}b_2 \;\; \dots \;\; \lambda_n B^{-1}b_n] = \begin{bmatrix} \lambda_1 & 0 & \dots & 0 \\ 0 & \lambda_2 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & \lambda_n \end{bmatrix}.$$
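The diagonalisation can be checked numerically; numpy's eig returns the eigenvalues together with a matrix whose columns are eigenvectors (Python sketch, illustrative matrix):

```python
import numpy as np

A = np.array([[3.0, 1.0], [1.0, 2.0]])   # symmetric, so the eigenvalues are real
eigenvalues, B = np.linalg.eig(A)        # columns of B are eigenvectors

D = np.linalg.inv(B) @ A @ B             # change of basis to the eigenbasis
print(np.allclose(D, np.diag(eigenvalues)))  # True: D is diagonal
```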
CHAPTER 3
Convex Sets
1. Definition and Basic Properties
Convexity is one of the most important mathematical properties in economics. For example, without convexity of preferences, demand and supply functions are not continuous, and so competitive markets generally do not have equilibrium points. The economic interpretation of convex preference sets in consumer theory is diminishing marginal rates of substitution; the interpretation of convex production sets is constant or decreasing returns to scale. Considerably less is known about general equilibrium models that allow non-convex production sets (e.g., economies of scale) or non-convex preferences (e.g., the consumer prefers a pint of beer or a shot of vodka alone to any mixture of the two).

Another set of mathematical results closely connected to the notion of convexity is the so-called separation and support theorems. These theorems are frequently used in economics to obtain a price system that leads consumers and producers to choose a Pareto-efficient allocation. That is, given the prices, producers are maximising profits, and given those profits as income, consumers are maximising utility subject to their budget constraints.

1.1. Convex Sets. Given two points $x, y \in \mathbb{R}^n$, a point $z = ax + (1-a)y$, where $0 \le a \le 1$, is called a convex combination of $x$ and $y$.

The set of all possible convex combinations of $x$ and $y$, denoted by $[x, y]$, is called the interval with endpoints $x$ and $y$ (or, the line segment connecting $x$ and $y$), that is,
$$[x, y] = \{\, ax + (1-a)y \mid 0 \le a \le 1 \,\}.$$

Definition 15. A set $S \subset \mathbb{R}^n$ is convex if and only if for any points $x$ and $y$ in $S$ the interval $[x, y] \subset S$.

In words: a set is convex if it contains the line segment connecting any two of its points; or, more loosely speaking, a set is convex if for any two points in the set it also contains all points between them.

Convex sets in $\mathbb{R}^2$ include interiors of triangles, squares, circles, ellipses, and hosts of other sets. Note also that, for example in $\mathbb{R}^3$, while the interior of a cube is a convex set, its boundary is not. (Of course, the same is true of the square in $\mathbb{R}^2$.) The quintessential convex set in Euclidean space $\mathbb{R}^n$ for any $n > 1$ is the $n$-dimensional sphere $S_r(a)$ of radius $r > 0$ about the point $a \in \mathbb{R}^n$, given by
$$S_r(a) = \{\, x \mid x \in \mathbb{R}^n, \; |x - a| < r \,\}.$$

Exercise 76. Is the empty set convex? Is a singleton convex? Is $\mathbb{R}^n$ convex? In each case prove that the set is convex or prove that it is not.

There are also several standard ways of forming convex sets from convex sets. We now examine a number of such ways.

Let $A, B \subset \mathbb{R}^n$ be sets. The Minkowski sum $A + B \subset \mathbb{R}^n$ is defined as
$$A + B = \{\, x + y \mid x \in A, \; y \in B \,\}.$$
When $B = \{b\}$ is a singleton, the set $A + \{b\}$, often written $A + b$, is called a translation of $A$.

Exercise 77. Prove that if $A$ and $B$ are convex then $A + B$ is convex.

Let $A \subset \mathbb{R}^n$ be a set and $\lambda \in \mathbb{R}$ be a number. The scaling $\lambda A \subset \mathbb{R}^n$ is defined as
$$\lambda A = \{\, \lambda x \mid x \in A \,\}.$$
When $\lambda > 0$, the set $\lambda A$ is called a dilation of $A$.

Exercise 78. Prove that if $A$ is convex then for any $\lambda \in \mathbb{R}$ the set $\lambda A$ is convex.

Exercise 79. Prove that for any finite number of convex sets $S_1, \dots, S_K$ the intersection $\bigcap_{i=1}^{K} S_i$ is convex. In fact the result does not depend on the number of sets being finite. Prove that the intersection $\bigcap_{i \in I} S_i$ of any number of convex sets is convex.

Exercise 80. Show by example that the union of a number of convex sets need not be convex.

It is also possible to define the convex combination of an arbitrary (but finite) number of points.

Definition 16. Let $x_1, \dots, x_k$ be a finite set of points from $\mathbb{R}^n$. A point
$$x = \sum_{i=1}^{k} \lambda_i x_i,$$
where $\lambda_i \ge 0$ for $i = 1, \dots, k$ and $\sum_{i=1}^{k} \lambda_i = 1$, is called a convex combination of $x_1, \dots, x_k$.

Note that the definition of a convex combination of two points is a special case of this definition.
Can we generate "superconvex" sets using Definition 16? That is, if we start with a convex set can we make it even more convex by adding all the convex combinations of points in the set? No, as the following lemma shows.

Lemma 1. A set $S \subset \mathbb{R}^n$ is convex if and only if every convex combination of points of $S$ is in $S$.

Proof. If a set contains all convex combinations of its points it is obviously convex, because it contains in particular the convex combinations of all pairs of its points. Thus, we need to show that a convex set contains every convex combination of its points. The proof is by induction on the number of points of $S$ in a convex combination. By definition, a convex set contains all convex combinations of any two of its points. Suppose that $S$ contains any convex combination of $n$ or fewer points and consider a combination of $n + 1$ points, $x = \sum_{i=1}^{n+1} \lambda_i x_i$. Since not all $\lambda_i = 1$, we can relabel them so that $\lambda_{n+1} < 1$. Then
$$x = (1 - \lambda_{n+1}) \sum_{i=1}^{n} \frac{\lambda_i}{1 - \lambda_{n+1}} x_i + \lambda_{n+1} x_{n+1} = (1 - \lambda_{n+1})\, y + \lambda_{n+1} x_{n+1}.$$
Note that $y \in S$ by the induction hypothesis (as a convex combination of $n$ points of $S$) and, as a result, so is $x$, being a convex combination of two points in $S$.
But, using Definition 16, we can generate convex sets from non-convex sets. This operation is very useful, so the resulting set deserves a special name.

Definition 17. Given a set $S \subset \mathbb{R}^n$, the set of all convex combinations of points from $S$, denoted $\mathrm{conv}\,S$, is called the convex hull of $S$.

Exercise 81. Prove that for any set $S \subset \mathbb{R}^n$ the convex hull of $S$ is a convex set. (This is not difficult. It just involves careful attention to the details of the definition. Or, put another way, the only difficulty is seeing that there is something to prove.)

In light of the previous exercise, Lemma 1 can be written more succinctly as: $S = \mathrm{conv}\,S$ if and only if $S$ is convex.
1.2. Convex Hulls. The next theorem deals with the following interesting property of convex hulls: the convex hull of a set $S$ is the intersection of all convex sets containing $S$. Thus, in a natural sense, the convex hull of a set $S$ is the smallest convex set containing $S$. In fact, many authors define convex hulls in that way and then prove our Definition 17 as a theorem.

Theorem 6. Let $S \subset \mathbb{R}^n$ be a set. Then any convex set containing $S$ also contains $\mathrm{conv}\,S$.

Proof. Let $A$ be a convex set such that $S \subset A$. By Lemma 1, $A$ contains all convex combinations of its points and, in particular, all convex combinations of points of its subset $S$, which is $\mathrm{conv}\,S$.
The next exercise is again quite obvious. It again frustrates attempts to generate "superconvex" sets, this time by showing that we do not make a set more convex by taking the convex hull of a convex hull.

Exercise 82. Prove that $\mathrm{conv}(\mathrm{conv}\,S) = \mathrm{conv}\,S$ for any $S$.

Exercise 83. Prove that if $A \subset B$ then $\mathrm{conv}\,A \subset \mathrm{conv}\,B$.

The next exercise asks you to show that when taking convex hulls and taking direct sums, it does not matter in which order you use these operations.

Exercise 84. Prove that $\mathrm{conv}(A + B) = (\mathrm{conv}\,A) + (\mathrm{conv}\,B)$.

On the other hand, when taking unions or intersections and convex hulls the order may matter.

Exercise 85. Prove that $\mathrm{conv}(A \cap B) \subset (\mathrm{conv}\,A) \cap (\mathrm{conv}\,B)$. Give an example to show that the inclusion may be strict.

Exercise 86. Prove that $(\mathrm{conv}\,A) \cup (\mathrm{conv}\,B) \subset \mathrm{conv}(A \cup B)$. Again, give an example to show that the inclusion may be strict.

1.3. Carathéodory's Theorem. Definition 17 implies that any point $x$ in the convex hull of $S$ is representable as a convex combination of (finitely) many points of $S$, but it places no restrictions on the number of points of $S$ required to make the combination. Carathéodory's Theorem puts an upper bound on the number of points required: in $\mathbb{R}^n$ the number of points never has to be more than $n + 1$.

Theorem 7 (Carathéodory, 1907). Let $S \subset \mathbb{R}^n$ be a non-empty set. Then every $x \in \mathrm{conv}\,S$ can be represented as a convex combination of (at most) $n + 1$ points from $S$.

Note that the theorem does not identify the points used in the representation; their choice would depend on $x$.

Exercise 87. Show by example that the constant $n + 1$ in Carathéodory's theorem cannot be improved. That is, exhibit a set $S \subset \mathbb{R}^n$ and a point $x \in \mathrm{conv}\,S$ that cannot be represented as a convex combination of fewer than $n + 1$ points from $S$.
1.4. Polytopes. The simplest convex sets are those which are convex hulls of a finite set of points, that is, sets of the form $S = \mathrm{conv}\{x_1, x_2, \dots, x_m\}$. The convex hull of a finite set of points in $\mathbb{R}^n$ is called a polytope.

Exercise 88. Prove that the set
$$\Delta = \{\, x \in \mathbb{R}^{n+1} \mid \textstyle\sum_{i=1}^{n+1} x_i = 1 \text{ and } x_i \ge 0 \text{ for any } i \,\}$$
is a polytope. This polytope is called the standard $n$-dimensional simplex.
Exercise 89. Prove that the set
$$C = \{\, x \in \mathbb{R}^n \mid 0 \le x_i \le 1 \text{ for any } i \,\}$$
is a polytope. This polytope is called an $n$-dimensional cube.

Exercise 90. Prove that the set
$$O = \{\, x \in \mathbb{R}^n \mid \textstyle\sum_{i=1}^{n} |x_i| \le 1 \,\}$$
is a polytope. This polytope is called a (hyper)octahedron.
1.5. The Topology of Convex Sets. We have now looked at another structure on $\mathbb{R}^n$, along with the topological and algebraic structures we looked at earlier. The following result tells us something about how these structures are related.

Proposition 1. The closure of a convex set is a convex set. The interior of a convex set (possibly empty) is convex.

Recall that you showed earlier that the empty set is convex.
2. Support and Separation

2.1. Hyperplanes. The concept of a hyperplane in $\mathbb{R}^n$ is a straightforward generalisation of the notion of a line in $\mathbb{R}^2$ and of a plane in $\mathbb{R}^3$. A line in $\mathbb{R}^2$ can be described by an equation
$$p_1 x_1 + p_2 x_2 = \alpha$$
where $p = (p_1, p_2)$ is some non-zero vector and $\alpha$ is some scalar. A plane in $\mathbb{R}^3$ can be described by an equation
$$p_1 x_1 + p_2 x_2 + p_3 x_3 = \alpha$$
where $p = (p_1, p_2, p_3)$ is some non-zero vector and $\alpha$ is some scalar. Similarly, a hyperplane in $\mathbb{R}^n$ can be described by an equation
$$\sum_{i=1}^{n} p_i x_i = \alpha$$
where $p = (p_1, p_2, \dots, p_n)$ is some non-zero vector in $\mathbb{R}^n$ and $\alpha$ is some scalar. This can be written in a more concise way using scalar (aka inner, dot) product notation.

Definition 18. A hyperplane is the set
$$H(p, \alpha) = \{\, x \in \mathbb{R}^n \mid p \cdot x = \alpha \,\}$$
where $p \in \mathbb{R}^n$ is a non-zero vector and $\alpha$ is a scalar. The vector $p$ is called the normal to the hyperplane $H$.
Suppose that there are two points $x', y' \in H(p, \alpha)$. Then by definition $p \cdot x' = \alpha$ and $p \cdot y' = \alpha$. Hence $p \cdot (x' - y') = 0$. In other words, the vector $p$ is orthogonal to the line segment $[x', y']$. Since we started by picking arbitrary points in $H(p, \alpha)$ we have that $p$ is orthogonal to any line segment in $H(p, \alpha)$, or that $p$ is orthogonal to $H(p, \alpha)$.

Given a hyperplane $H \subset \mathbb{R}^n$, points in $\mathbb{R}^n$ can be classified according to their positions relative to the hyperplane. The (closed) half-space determined by the hyperplane $H(p, \alpha)$ is either the set of points "below" $H$ or the set of points "above" $H$, i.e., either the set $\{\, x \in \mathbb{R}^n \mid p \cdot x \le \alpha \,\}$ or the set $\{\, x \in \mathbb{R}^n \mid p \cdot x \ge \alpha \,\}$. Open half-spaces are defined by strict inequalities.

Exercise 91. Prove that a closed half-space is closed and an open half-space is open.
The straightforward economic example of a half-space is a budget set $\{\, x \in \mathbb{R}^n \mid p \cdot x \le \alpha \,\}$ of a consumer with income $\alpha$ facing the vector of prices $p$. (It was rather neat to call the normal vector $p$, wasn't it?) By the way, hyperplanes and half-spaces are convex sets.

Exercise 92. Prove that any half-space, open or closed, and any hyperplane in $\mathbb{R}^n$ is convex.
2.2. Support Functions. In this section we give a description of what is called a dual structure. Consider the set of all closed convex subsets of $\mathbb{R}^n$. We will show that to each such set $S$ we can associate an extended-real valued function $\mu_S : \mathbb{R}^n \to \mathbb{R} \cup \{-\infty\}$, that is, a function that maps each vector in $\mathbb{R}^n$ to either a real number or to $-\infty$. Not all such functions can be arrived at in this way. In fact we shall show that any such function must be concave and homogeneous of degree 1. But once we restrict attention to functions that can be arrived at as a support function for some such closed convex set we have another set of objects that we can analyse, and perhaps make useful arguments about the original sets in which we were interested.

In fact, we shall define the function $\mu_S$ for any subset of $\mathbb{R}^n$, not just the closed and convex ones. However, if the original set $S$ is not a closed convex one we shall lose some information about $S$ in going to $\mu_S$. In particular, $\mu_S$ only depends on the closed convex hull of $S$, that is, if two sets have the same closed convex hull they will lead to the same function $\mu_S$.
We define $\mu_S : \mathbb{R}^n \to \mathbb{R} \cup \{-\infty\}$ as
$$\mu_S(p) = \inf\{\, p \cdot x \mid x \in S \,\},$$
where $\inf$ denotes the infimum or greatest lower bound. It is a property of the real numbers that any set of real numbers has an infimum. Thus $\mu_S(p)$ is well defined for any set $S$. If the minimum exists, for example if the set $S$ is compact, then the infimum is the minimum. In other cases the minimum may not exist. To take a simple one dimensional example, suppose that the set $S$ is the subset of $\mathbb{R}$ consisting of the numbers $1/n$ for $n = 1, 2, \dots$ and that $p = 2$. Then clearly $p \cdot x = px$ does not have a minimum on the set $S$. However 0 is less than $px = 2x$ for any value of $x$ in $S$, but for any number $a$ greater than 0 there is a value of $x$ in $S$ such that $px < a$. Thus 0 is in this case the infimum of the set $\{\, p \cdot x \mid x \in S \,\}$.
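For a finite set of points the infimum is attained, and the support function is a simple minimum, as in this Python sketch (with illustrative points):

```python
import numpy as np

def support_mu(points, p):
    """mu_S(p) = inf{p . x : x in S} for a finite set S, where the
    infimum is just the minimum over the points."""
    return min(np.dot(p, x) for x in points)

S = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([1.0, 1.0])]
print(support_mu(S, np.array([1.0, 1.0])))   # 1.0, attained at (1,0) and (0,1)
print(support_mu(S, np.array([-1.0, 0.0])))  # -1.0, attained at (1,0) and (1,1)
```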
Recall that we have not assumed that $S$ is convex. However, if we do assume that $S$ is both convex and closed then the function $\mu_S$ contains all the information needed to reconstruct $S$.

Given any extended-real valued function $\mu : \mathbb{R}^n \to \mathbb{R} \cup \{-\infty\}$ let us define the set $S_\mu$ as
$$S_\mu = \{x \in \mathbb{R}^n \mid p \cdot x \ge \mu(p) \text{ for every } p \in \mathbb{R}^n\}.$$
That is, for each $p$ such that $\mu(p) > -\infty$ we define the closed half space $\{x \in \mathbb{R}^n \mid p \cdot x \ge \mu(p)\}$. Notice that if $\mu(p) = -\infty$ then $p \cdot x \ge \mu(p)$ for any $x$, and so the above set will be $\mathbb{R}^n$ rather than a half space; for this $p$ the requirement that $p \cdot x \ge \mu(p)$ puts no restrictions on the set $S_\mu$. The set $S_\mu$ is the intersection of all these closed half spaces. Since the intersection of convex sets is convex and the intersection of closed sets is closed, the set $S_\mu$ is, for any function $\mu$, a closed convex set.

Suppose that we start with a set $S$, define $\mu_S$ as above, and then use $\mu_S$ to define the set $S_{\mu_S}$. If the set $S$ was a closed convex set then $S_{\mu_S}$ will be exactly equal to $S$. Since we have seen that $S_{\mu_S}$ is a closed convex set, it must be that if $S$ is not a closed convex set it will not be equal to $S_{\mu_S}$. However $S$ will always be a subset of $S_{\mu_S}$, and indeed $S_{\mu_S}$ will be the smallest closed convex set of which $S$ is a subset; that is, $S_{\mu_S}$ is the closed convex hull of $S$.

So we see that the properties of $\mu_S$ do not depend on the set $S$ being closed or convex. Whether or not $S$ is closed or convex, $\mu_S$ will depend only on the closed convex hull of $S$. We obtain a similar result about the function $\mu$ and the process of going from $\mu$ to $S_\mu$ and then to $\mu_{S_\mu}$.
2.3. Separation. We now consider the notion of separating two sets by a hyperplane.

Definition 19. A hyperplane $H$ separates sets $A$ and $B$ if $A$ is contained in one closed half-space and $B$ is contained in the other. A hyperplane $H$ strictly separates sets $A$ and $B$ if $A$ is contained in one open half-space and $B$ is contained in the other.

It is clear that strict separation requires the two sets to be disjoint. For example, consider two (externally) tangent circles in a plane. Their common tangent line separates them but does not separate them strictly. On the other hand, although it is necessary for two sets to be disjoint in order to strictly separate them, this condition is not sufficient, even for closed convex sets. Let $A = \{x \in \mathbb{R}^2 \mid x_1 > 0 \text{ and } x_1 x_2 \ge 1\}$ and $B = \{x \in \mathbb{R}^2 \mid x_1 \ge 0 \text{ and } x_2 = 0\}$. Then $A$ and $B$ are disjoint closed convex sets but they cannot be strictly separated by a hyperplane (a line in $\mathbb{R}^2$). Thus the problem of the existence of a separating hyperplane is more involved than it may appear to be at first.
We start with separation of a set and a point.

Theorem 8. Let $S \subseteq \mathbb{R}^n$ be a convex set and let $x_0 \notin S$ be a point. Then $S$ and $x_0$ can be separated. If $S$ is closed then $S$ and $x_0$ can be strictly separated.

Idea of proof. The proof proceeds in two steps. The first step establishes the existence of a point $a$ in the closure of $S$ which is the closest to $x_0$. The second step constructs the separating hyperplane using the point $a$.

STEP 1. There exists a point $a \in \bar{S}$ (the closure of $S$) such that $d(x_0, a) \le d(x_0, x)$ for all $x \in \bar{S}$, and $d(x_0, a) > 0$.
Let $\bar{B}(x_0)$ be a closed ball with centre at $x_0$ that intersects the closure of $S$, and let $A = \bar{B}(x_0) \cap \bar{S} \neq \emptyset$. The set $A$ is nonempty, closed and bounded (hence compact). According to Weierstrass's theorem, the continuous distance function $d(x_0, x)$ achieves its minimum on $A$. That is, there exists $a \in A$ such that $d(x_0, a) \le d(x_0, x)$ for all $x \in \bar{S}$. Note that $d(x_0, a) > 0$.

STEP 2. There exists a hyperplane $H(p, \alpha) = \{x \in \mathbb{R}^n \mid p \cdot x = \alpha\}$ such that $p \cdot x \ge \alpha$ for all $x \in \bar{S}$ and $p \cdot x_0 < \alpha$.

Construct the hyperplane which goes through the point $a \in \bar{S}$ and has normal $p = a - x_0$. The proof that this hyperplane is the separating one is done by contradiction. Suppose there exists a point $y \in \bar{S}$ which is strictly on the same side of $H$ as $x_0$. Consider the point $y' \in [a, y]$ such that the vector $y' - x_0$ is orthogonal to $y - a$. Since $d(x_0, y) \ge d(x_0, a)$, the point $y'$ is between $a$ and $y$. Thus $y' \in \bar{S}$ and $d(x_0, y') < d(x_0, a)$, which contradicts the choice of $a$. When $S = \bar{S}$, that is, when $S$ is closed, the separation can be made strict by choosing the hyperplane to pass through a point strictly in between $a$ and $x_0$ instead of through $a$. This is always possible because $d(x_0, a) > 0$.
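The two steps of the proof are easy to mimic numerically. In the sketch below (ours, not part of the original proof) $S$ is the closed unit disc, the closest point $a$ to $x_0$ is the radial projection, and the hyperplane with normal $p = a - x_0$ through $a$ does separate: $p \cdot x \ge \alpha$ on $S$ while $p \cdot x_0 < \alpha$.

```python
import numpy as np

# Sketch of Theorem 8 for S = the closed unit disc in R^2.
rng = np.random.default_rng(0)
x0 = np.array([2.0, 1.0])                # a point outside S
a = x0 / np.linalg.norm(x0)              # Step 1: closest point of S to x0
p = a - x0                               # Step 2: normal to the hyperplane
alpha = p @ a                            # hyperplane level, p.x = alpha

theta = rng.uniform(0, 2 * np.pi, 1000)  # sample points of the disc
r = np.sqrt(rng.uniform(0, 1, 1000))
X = np.column_stack([r * np.cos(theta), r * np.sin(theta)])
print("min p.x over S :", (X @ p).min(), ">= alpha =", alpha)
print("p.x0           :", p @ x0, "< alpha")
```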
Theorem 8 is very useful because separation of a pair of sets can always be reduced to separation of a set and a point.

Lemma 2. Let $A$ and $B$ be non-empty sets. $A$ and $B$ can be separated (strictly separated) if and only if $A - B$ and $0$ can be separated (strictly separated).

Proof. If $A$ and $B$ are convex then $A - B$ is convex. If $A$ is compact and $B$ is closed then $A - B$ is closed. And $0 \notin A - B$ if and only if $A \cap B = \emptyset$.

Theorem 9 (Minkowski, 1911). Let $A$ and $B$ be non-empty convex sets with $A \cap B = \emptyset$. Then $A$ and $B$ can be separated. If $A$ is compact and $B$ is closed then $A$ and $B$ can be strictly separated.
2.4. Support. Closely (though not in the topological sense) related to the notion of a separating hyperplane is the notion of a supporting hyperplane.

Definition 20. The hyperplane $H$ supports the set $S$ at the point $x_0 \in S$ if $x_0 \in H$ and $S$ is a subset of one of the half-spaces determined by $H$.

A convex set can be supported at any of its boundary points; this is an immediate consequence of Theorem 9. To prove it, consider the sets $A$ and $B = \{x_0\}$, where $x_0$ is a boundary point of $A$.

Theorem 10. Let $S \subseteq \mathbb{R}^n$ be a convex set with nonempty interior and let $x_0$ be a boundary point of $S$. Then there exists a supporting hyperplane for $S$ at $x_0$.

Note that if the boundary of a convex set is smooth (differentiable) at the given point $x_0$ then the supporting hyperplane is unique and is just the tangent hyperplane. If, however, the boundary is not smooth then there can be many supporting hyperplanes passing through the given point. It is important to note that conceptually the supporting theorems are connected to calculus. But the supporting theorems are more powerful (they don't require smoothness), more direct, and more set-theoretic.

Certain points on the boundary of a convex set carry a lot of information about the set.

Definition 21. A point $x$ of a convex set $S$ is an extreme point of $S$ if $x$ is not an interior point of any line segment in $S$.

The extreme points of a closed ball in $\mathbb{R}^3$ are its boundary points; the extreme points of a closed cube in $\mathbb{R}^3$ are its eight vertices. A half-space has no extreme points even if it is closed.

An interesting property of extreme points is that an extreme point can be deleted from the set without destroying the convexity of the set. That is, a point $x$ in a convex set $S$ is an extreme point if and only if the set $S \setminus \{x\}$ is convex.

The next theorem is a finite-dimensional version of a quite general and powerful result by M.G. Krein and D.P. Milman.

Theorem 11 (Krein & Milman, 1940). Let $S \subseteq \mathbb{R}^n$ be convex and compact. Then $S$ is the convex hull of its extreme points.
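For a finite set of points the convex hull is a compact convex polytope, and its extreme points are exactly its vertices. The sketch below (ours, assuming scipy is available) recovers them, discarding the points that are convex combinations of the others.

```python
import numpy as np
from scipy.spatial import ConvexHull

# Sketch: the extreme points of conv(points) are the hull's vertices.
points = np.array([
    [0, 0], [2, 0], [2, 2], [0, 2],    # corners of a square: extreme
    [1, 1], [0.5, 0.5], [1.5, 0.3],    # interior points: not extreme
])
hull = ConvexHull(points)
print(points[hull.vertices])           # only the four corners appear
```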
CHAPTER 4
Constrained Optimisation
1. Constrained Maximisation
1.1. Lagrange Multipliers. Consider the problem of a consumer who seeks to distribute his income across the purchase of the two goods that he consumes, subject to the constraint that he spends no more than his total income. Let us denote the amount of the first good that he buys $x_1$ and the amount of the second good $x_2$, the prices of the two goods $p_1$ and $p_2$, and the consumer's income $y$. The utility that the consumer obtains from consuming $x_1$ units of good 1 and $x_2$ units of good 2 is denoted $u(x_1, x_2)$. Thus the consumer's problem is to maximise $u(x_1, x_2)$ subject to the constraint that $p_1 x_1 + p_2 x_2 \le y$. (We shall soon write $p_1 x_1 + p_2 x_2 = y$, i.e., we shall assume that the consumer must spend all of his income.) Before discussing the solution of this problem let's write it in a more mathematical way:

(5) $\max_{x_1, x_2} u(x_1, x_2)$ subject to $p_1 x_1 + p_2 x_2 = y$.

We read this "Choose $x_1$ and $x_2$ to maximise $u(x_1, x_2)$ subject to the constraint that $p_1 x_1 + p_2 x_2 = y$."
Let us assume, as usual, that the indifference curves (i.e., the sets of points $(x_1, x_2)$ for which $u(x_1, x_2)$ is a constant) are convex to the origin. Let us also assume that the indifference curves are nice and smooth. Then the point $(x_1^*, x_2^*)$ that solves the maximisation problem (5) is the point at which the indifference curve is tangent to the budget line, as given in Figure 1.
One thing we can say about the solution is that at the point $(x_1^*, x_2^*)$ it must be true that the marginal utility with respect to good 1 divided by the price of good 1 must equal the marginal utility with respect to good 2 divided by the price of good 2. For if this were not true then the consumer could, by decreasing the consumption of the good for which this ratio was lower and increasing the consumption of the other good, increase his utility. Marginal utilities are, of course, just the partial derivatives of the utility function. Thus we have

(6) $\dfrac{(\partial u/\partial x_1)(x_1^*, x_2^*)}{p_1} = \dfrac{(\partial u/\partial x_2)(x_1^*, x_2^*)}{p_2}.$

The argument we have just made seems very economic. It is easy to give an alternate argument that does not explicitly refer to the economic intuition. Let $x_2^u$ be the function that defines the indifference curve through the point $(x_1^*, x_2^*)$, i.e.,
$$u(x_1, x_2^u(x_1)) \equiv \bar{u} \equiv u(x_1^*, x_2^*).$$
Now, totally differentiating this identity gives
$$\frac{\partial u}{\partial x_1}(x_1, x_2^u(x_1)) + \frac{\partial u}{\partial x_2}(x_1, x_2^u(x_1)) \, \frac{d x_2^u}{d x_1}(x_1) = 0.$$
[Figure 1: the budget line $p_1 x_1 + p_2 x_2 = y$ and the indifference curve $u(x_1, x_2) = \bar{u}$, tangent at the optimal bundle $(x_1^*, x_2^*)$.]
That is,
$$\frac{d x_2^u}{d x_1}(x_1) = -\frac{(\partial u/\partial x_1)(x_1, x_2^u(x_1))}{(\partial u/\partial x_2)(x_1, x_2^u(x_1))}.$$
Now $x_2^u(x_1^*) = x_2^*$. Thus the slope of the indifference curve at the point $(x_1^*, x_2^*)$ is
$$\frac{d x_2^u}{d x_1}(x_1^*) = -\frac{(\partial u/\partial x_1)(x_1^*, x_2^*)}{(\partial u/\partial x_2)(x_1^*, x_2^*)}.$$
Also, the slope of the budget line is $-p_1/p_2$. Combining these two results again gives result (6).
Since we also have another equation that $(x_1^*, x_2^*)$ must satisfy, viz.

(7) $p_1 x_1^* + p_2 x_2^* = y$,

we have two equations in two unknowns and we can (if we know what the utility function is and what $p_1$, $p_2$, and $y$ are) go happily away and solve the problem. (This isn't quite true, but we shall not go into that at this point.) What we shall develop is a systematic and useful way to obtain the conditions (6) and (7). Let us first denote the common value of the ratios in (6) by $\lambda$. That is,
$$\frac{(\partial u/\partial x_1)(x_1^*, x_2^*)}{p_1} = \lambda = \frac{(\partial u/\partial x_2)(x_1^*, x_2^*)}{p_2},$$
and we can rewrite this and (7) as

(8)
$$\frac{\partial u}{\partial x_1}(x_1^*, x_2^*) - \lambda p_1 = 0, \qquad \frac{\partial u}{\partial x_2}(x_1^*, x_2^*) - \lambda p_2 = 0, \qquad y - p_1 x_1^* - p_2 x_2^* = 0.$$
Now we have three equations in $x_1^*$, $x_2^*$, and the new artificial or auxiliary variable $\lambda$. Again we can, perhaps, solve these equations for $x_1^*$, $x_2^*$, and $\lambda$. Consider the following function:

(9) $L(x_1, x_2, \lambda) = u(x_1, x_2) + \lambda(y - p_1 x_1 - p_2 x_2).$

This function is known as the Lagrangian. Now, if we calculate $\partial L/\partial x_1$, $\partial L/\partial x_2$, and $\partial L/\partial \lambda$, and set the results equal to zero, we obtain exactly the equations given in (8). We now describe this technique in a somewhat more general way.
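Before doing so, here is a small computational sketch (ours, not part of the original notes) that carries out exactly this recipe for the particular utility function $u(x_1, x_2) = x_1 x_2$, a choice made purely for illustration: form the Lagrangian (9), differentiate, and solve the system (8).

```python
import sympy as sp

# Sketch: the Lagrangian method for u(x1, x2) = x1*x2 (our illustrative choice).
x1, x2, lam = sp.symbols('x1 x2 lam', positive=True)
p1, p2, y = sp.symbols('p1 p2 y', positive=True)

L = x1 * x2 + lam * (y - p1 * x1 - p2 * x2)      # the Lagrangian (9)
foc = [sp.diff(L, v) for v in (x1, x2, lam)]     # the three equations (8)
sol = sp.solve(foc, [x1, x2, lam], dict=True)[0]
print(sol)   # {x1: y/(2*p1), x2: y/(2*p2), lam: y/(2*p1*p2)}
```

The solution spends half of income on each good, as one expects for this symmetric utility function.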
Suppose that we have the following maximisation problem

(10) $\max_{x_1, \ldots, x_n} f(x_1, \ldots, x_n)$ subject to $g(x_1, \ldots, x_n) = c$,

and we let

(11) $L(x_1, \ldots, x_n, \lambda) = f(x_1, \ldots, x_n) + \lambda(c - g(x_1, \ldots, x_n)).$

Then if $(x_1^*, \ldots, x_n^*)$ solves (10) there is a value of $\lambda$, say $\lambda^*$, such that

(12) $\dfrac{\partial L}{\partial x_i}(x_1^*, \ldots, x_n^*, \lambda^*) = 0, \quad i = 1, \ldots, n$

(13) $\dfrac{\partial L}{\partial \lambda}(x_1^*, \ldots, x_n^*, \lambda^*) = 0.$
Notice that the conditions (12) are precisely the first order conditions for choosing $x_1, \ldots, x_n$ to maximise $L$, once $\lambda^*$ has been chosen. This provides an intuition into this method of solving the constrained maximisation problem. In the constrained problem we have told the decision maker that he must satisfy $g(x_1, \ldots, x_n) = c$ and that he should choose among all points that satisfy this constraint the point at which $f(x_1, \ldots, x_n)$ is greatest. We arrive at the same answer if we tell the decision maker to choose any point he wishes but that for each unit by which he violates the constraint $g(x_1, \ldots, x_n) = c$ we shall take away $\lambda$ units from his payoff. Of course we must be careful to choose $\lambda$ to be the correct value. If we choose $\lambda$ too small the decision maker may choose to violate his constraint, e.g., if we made the penalty for spending more than the consumer's income very small the consumer would choose to consume more goods than he could afford and to pay the penalty in utility terms. On the other hand if we choose $\lambda$ too large the decision maker may violate his constraint in the other direction, e.g., the consumer would choose not to spend any of his income and just receive $\lambda$ units of utility for each unit of his income.
It is possible to give a more general statement of this technique, allowing for multiple constraints. (Of course, we should always have fewer constraints than we have variables.) Suppose we have more than one constraint. Consider the problem
$$\max_{x_1, \ldots, x_n} f(x_1, \ldots, x_n) \quad \text{subject to} \quad g_1(x_1, \ldots, x_n) = c_1, \; \ldots, \; g_m(x_1, \ldots, x_n) = c_m.$$
Again we construct the Lagrangian

(14) $L(x_1, \ldots, x_n, \lambda_1, \ldots, \lambda_m) = f(x_1, \ldots, x_n) + \lambda_1(c_1 - g_1(x_1, \ldots, x_n)) + \cdots + \lambda_m(c_m - g_m(x_1, \ldots, x_n)),$

and again if $(x_1^*, \ldots, x_n^*)$ solves the problem there are values of $\lambda$, say $\lambda_1^*, \ldots, \lambda_m^*$, such that

(15)
$$\frac{\partial L}{\partial x_i}(x_1^*, \ldots, x_n^*, \lambda_1^*, \ldots, \lambda_m^*) = 0, \quad i = 1, \ldots, n$$
$$\frac{\partial L}{\partial \lambda_j}(x_1^*, \ldots, x_n^*, \lambda_1^*, \ldots, \lambda_m^*) = 0, \quad j = 1, \ldots, m.$$
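As a numerical illustration of the several-constraint case (a sketch of ours; the objective and constraints are invented for the example), an off-the-shelf solver such as scipy's SLSQP, which is itself Lagrangian-based, can be given the problem directly:

```python
import numpy as np
from scipy.optimize import minimize

# Sketch: max f(x) = -(x1-1)^2 - (x2-2)^2 - (x3-3)^2 subject to the two
# equality constraints below. scipy minimises, so we pass -f.
def neg_f(x):
    return (x[0] - 1)**2 + (x[1] - 2)**2 + (x[2] - 3)**2

constraints = [
    {'type': 'eq', 'fun': lambda x: x[0] + x[1] + x[2] - 3},  # g1(x) = c1
    {'type': 'eq', 'fun': lambda x: x[0] - x[1]},             # g2(x) = c2
]
res = minimize(neg_f, x0=np.zeros(3), method='SLSQP', constraints=constraints)
print(res.x)   # approximately (0.5, 0.5, 2.0)
```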
1.2. Caveats and Extensions. Notice that we have been referring to the set of conditions which a solution to the maximisation problem must satisfy. (We call such conditions necessary conditions.) So far we have not even claimed that there necessarily is a solution to the maximisation problem. There are many examples of maximisation problems which have no solution. One example of an unconstrained problem with no solution is

(16) $\max_x 2x$,

maximise over the choice of $x$ the function $2x$. Clearly the greater we make $x$ the greater is $2x$, and so, since there is no upper bound on $x$, there is no maximum. Thus we might want to restrict maximisation problems to those in which we choose $x$ from some bounded set. Again, this is not enough. Consider the problem

(17) $\max_{0 \le x \le 1} 1/x$.

The smaller we make $x$ the greater is $1/x$, and yet at zero $1/x$ is not even defined. We could define the function to take on some value at zero, say 7. But then the function would not be continuous. Or we could leave zero out of the feasible set for $x$, say $0 < x \le 1$. Then the set of feasible $x$ is not closed. Since there would obviously still be no solution to the maximisation problem in these cases, we shall want to restrict maximisation problems to those in which we choose $x$ to maximise some continuous function from some closed and (because of the previous example) bounded set. (We call a set of numbers, or more generally a set of vectors, that is both closed and bounded a compact set.) Is there anything else that could go wrong? No. The following result says that if the function to be maximised is continuous and the set over which we are choosing is both closed and bounded, i.e., is compact, then there is a solution to the maximisation problem.

Theorem 12 (The Weierstrass Theorem). Let $S$ be a compact set. Let $f$ be a continuous function that takes each point in $S$ to a real number. (We usually write: let $f : S \to \mathbb{R}$ be continuous.) Then there is some $x^*$ in $S$ at which the function is maximised. More precisely, there is some $x^*$ in $S$ such that $f(x^*) \ge f(x)$ for any $x$ in $S$.
Notice that in defining such compact sets we typically use inequalities, such as $x \ge 0$. However in Section 1 we did not consider such constraints, but rather considered only equality constraints. Even in the example of utility maximisation at the beginning of Section 1.1 there were implicitly constraints on $x_1$ and $x_2$ of the form
$$x_1 \ge 0, \qquad x_2 \ge 0.$$
A truly satisfactory treatment would make such constraints explicit. It is possible to explicitly treat the maximisation problem with inequality constraints, at the price of a little additional complexity. We shall return to this question later in the book.

Also, notice that had we wished to solve a minimisation problem we could have transformed the problem into a maximisation problem by simply multiplying the objective function by $-1$. That is, if we wish to minimise $f(x)$ we could do so by maximising $-f(x)$. As an exercise write out the conditions analogous to the conditions (8) for the case that we wanted to minimise $u(x)$. Notice that if $x_1^*$, $x_2^*$, and $\lambda$ satisfy the original equations then $x_1^*$, $x_2^*$, and $-\lambda$ satisfy the new equations. Thus we cannot tell from these conditions alone whether there is a maximum at $(x_1^*, x_2^*)$ or a minimum. This corresponds to the fact that in the case of a function of a single variable over an unconstrained domain, at a maximum we require the first derivative to be zero, but to know for sure that we have a maximum we must look at the second derivative. We shall not develop the analogous conditions for the constrained problem with many variables here. However, again, we shall return to it later in the book.
2. Applications to Macroeconomic Theory

In this section we examine one part of one of the central models used in macroeconomics, the overlapping generations model. I wrote these notes some time back for a different course so they are currently in a somewhat different style than the rest of the notes. Over time they will become more consistent. This section covers the material in Alan Auerbach and Laurence Kotlikoff, Macroeconomics: An Integrated Approach, South-Western College Publishing, 1995, Chapters 2 and 3.

2.1. Overview of The Basic Overlapping Generations Model of the Economy. The model we consider in this section is one of the basic ones used in modern macroeconomics. The idea of using such a framework was introduced by Maurice Allais¹ (1947) and Paul Samuelson² (1958). It was developed in more or less the form we shall give by Peter Diamond (1965). This model forms an important part of Kydland and Prescott's 1982 paper "Time to Build and Aggregate Fluctuations", one of the seminal works in the real business cycle theories.³

The basic idea of the model is that in each period a new generation of economic agents is born. Each individual lives for two periods, inelastically⁴ supplying one unit of labour in his first period of life and consuming in both periods of his life. Each agent from generation $t$ supplies one unit of labour to the production process at the beginning of his first period of life in order to produce period $t$'s output of the single good of this economy. For this he receives whatever the wage rate is in period $t$. Part of this he consumes and part he saves. The part he saves becomes the capital used in the production process in period $t+1$. He provides the part of his wage that he didn't consume in period $t$ at the beginning of period $t+1$ in return for a greater amount of the good for him to consume at the end of period $t+1$. The extra bit that he gets we call the interest rate for period $t+1$.

¹ Allais received the Nobel Prize in Economics in 1988.
² Samuelson received the Nobel Prize in Economics in 1970.
³ Kydland and Prescott received the Nobel Prize in Economics in 2004.
⁴ This means that his supply of labour is fixed; it does not depend on the price system, and in particular does not depend on the price of labour, that is, the wage rate.
2.2. The Consumption-Saving Decision.

2.2.1. The intertemporal budget constraint. Recall from your microeconomics the idea of a budget constraint. This is the set of consumption bundles that are affordable for the consumer. This set typically depends on the consumer's wealth (sometimes we say income, but wealth is more accurate) and the prices of the various consumption goods. A similar situation arises here.

Though it isn't really very complicated, we'll develop the intertemporal budget constraint a step at a time. Let us consider the generation born in period $t$. We have said that when the individuals in this generation are young they supply one unit of labour for which they receive the wage of that period, $w_t$. This amount they can either consume in period $t$ or take over into period $t+1$ as assets. We'll call the consumption in period $t$ of such an individual $c_{1t}$ (1 for the first period of this individual's life and $t$ for the time period), and the amount of assets this individual takes over into period $t+1$ we'll call $s_t$. Both $c_{1t}$ and $s_t$ are measured in units of the one good produced in period $t$. We'll assume that the wages are also measured in units of this good, so we have

(18) $s_t = w_t - c_{1t}$.

In period $t+1$ this individual will be old and will not supply any labour. He will supply his assets $s_t$ to the production process and will receive, in return, at the end of the period $s_t(1 + r_{t+1})$, which he will consume. Thus there is no depreciation of capital in this model. For every unit of the (capital) good the individual supplies to the production process, he is paid $r_{t+1}$, and he also receives back the full unit of the good after the production process is completed. Sadly, this individual will then die. We'll call the amount he consumes in period $t+1$ as an old person $c_{2t+1}$ (2 for the second period of this individual's life and $t+1$ for the period in which he consumes). Again, we can write this as a budget constraint, namely

(19) $c_{2t+1} = s_t(1 + r_{t+1})$.

Of course, what the consumer actually cares about is the amount he consumes in each period. He doesn't directly care about the amount of assets he takes over from period $t$ to period $t+1$. If we combine equations (18) and (19) by substituting equation (18)'s formula for $s_t$ into equation (19) we obtain a budget constraint that relates the two quantities that the individual does directly care about, his consumption in each period, namely

(20) $c_{1t} + \dfrac{c_{2t+1}}{1 + r_{t+1}} = w_t$.

The term $1/(1 + r_{t+1})$ is the price of period $t+1$ consumption in terms of period $t$ consumption; it is the number of units of period $t$ consumption that the individual would have to give up in order to obtain an additional unit of period $t+1$ consumption, just as a price of a good in dollars represents the number of dollars one would give up in order to obtain an additional unit of the good.
2.2.2. Intertemporal Preferences. So we have one half of the information we need to determine how the individuals of this generation will behave. We have said what choices they have available to them. We also need to say what they like or what they want, that is, we have to describe their preferences. As usual we do this by specifying a utility function. In particular we assume that the individuals of generation $t$ have the utility function

(21) $U_t(c_{1t}, c_{2t+1}) = u(c_{1t}) + \left(\dfrac{1}{1+\delta}\right) u(c_{2t+1})$,

with $\delta \ge 0$. The parameter $\delta$ is called the discount rate (the associated discount factor is $1/(1+\delta)$) and indicates how the individual trades off future consumption for present consumption. If $\delta = 0$ the individual values future consumption as much as present consumption. As $\delta$ increases the individual values future consumption less. Notice that we have labelled the utility function with the subscript $t$ to indicate that it is generation $t$'s preferences we are describing. However we have assumed that each generation has the same preferences: there are no time subscripts on the single period utility functions $u$ on the right hand side of the equation. The individuals of generation $t$ may have a different level of utility than another generation, but only because their levels of consumption may differ. We assume that the functional relationship describing how the level of the individual's utility depends on $c_{1t}$ and $c_{2t+1}$ is the same for each generation; that is, we assume that each generation has the same intertemporal preferences.

In fact, we shall make a much stronger assumption about the individuals' preferences. In this course we shall always assume that the individuals' utility functions come from the family of CES (constant elasticity of substitution) utility functions in which $u(c) = c^{1-\sigma}/(1-\sigma)$. That is, we assume that

(22) $U_t(c_{1t}, c_{2t+1}) = \dfrac{c_{1t}^{1-\sigma}}{1-\sigma} + \left(\dfrac{1}{1+\delta}\right)\dfrac{c_{2t+1}^{1-\sigma}}{1-\sigma}$.

A special case of these preferences occurs when $\sigma = 1$. In fact as $\sigma$ approaches 1 this utility function diverges to infinity. However, adding a constant to the utility that does not depend on $c_{1t}$ or $c_{2t+1}$ does not change the preferences. So consider the utility function

(23) $U_t(c_{1t}, c_{2t+1}) = \dfrac{c_{1t}^{1-\sigma} - 1}{1-\sigma} + \left(\dfrac{1}{1+\delta}\right)\dfrac{c_{2t+1}^{1-\sigma} - 1}{1-\sigma}$.

This represents the same preferences as the utility function given in equation (22). If we consider the behaviour of this function as $\sigma \to 1$ we can show that

(24) $\lim_{\sigma \to 1} \left[\dfrac{c_{1t}^{1-\sigma} - 1}{1-\sigma} + \left(\dfrac{1}{1+\delta}\right)\dfrac{c_{2t+1}^{1-\sigma} - 1}{1-\sigma}\right] = \log c_{1t} + \left(\dfrac{1}{1+\delta}\right)\log c_{2t+1}$.

It is also worth noting that this represents the same preferences as the utility function

(25) $U_t(c_{1t}, c_{2t+1}) = c_{1t}^{\frac{1+\delta}{2+\delta}} \, c_{2t+1}^{\frac{1}{2+\delta}}$.

If we take the log of this utility function we obtain a constant times the utility function given in equation (24). This function is of the same form as the production function we considered earlier. We shall refer to an individual whose utility function is of the form of either equation (25) or (24) as having Cobb-Douglas preferences.

For most of the remainder of the first part of the course we shall assume that the individuals have Cobb-Douglas preferences. This is a very strong assumption, and for many purposes we would not want to make it. However, as we shall see, it does make our model much easier to solve, and so we shall, at least as a first cut, be happy to make this assumption.
2.2.3. The solution to the intertemporal decision problem. We can state the decision problem of an individual in the generation born in period $t$ as the following constrained maximisation problem:

(26) $\max_{c_{1t}, c_{2t+1}} \log c_{1t} + \left(\dfrac{1}{1+\delta}\right)\log c_{2t+1}$ subject to $c_{1t} + \dfrac{c_{2t+1}}{1+r_{t+1}} = w_t$.

In the next subsection we shall examine in detail how to solve this maximisation problem. Here we simply note a generally useful fact about Cobb-Douglas preferences, namely that in a standard utility maximisation problem with a linear budget constraint, if the preferences are Cobb-Douglas then, whatever the relative prices, the consumer will spend a fraction of his income on a particular good equal to the relative size of the exponent on that good in the Cobb-Douglas utility function. More precisely, the fraction is the exponent divided by the sum of the exponents. In our case this means that the individual will spend a fraction $(1+\delta)/(2+\delta)$ of $w_t$ on his consumption in his first period of life; that is, since $w_t$ and $c_{1t}$ are measured in the same units (or equivalently the price of $c_{1t}$ is 1),

(27) $c_{1t} = \left(\dfrac{1+\delta}{2+\delta}\right) w_t$.
Similarly he will spend the fraction $1/(2+\delta)$ of $w_t$ on $c_{2t+1}$. Now, if we look at the budget constraint we see that the price of $c_{2t+1}$ is not 1, but $1/(1+r_{t+1})$. Thus
$$\frac{c_{2t+1}}{1+r_{t+1}} = \left(\frac{1}{2+\delta}\right) w_t$$
or

(28) $c_{2t+1} = \left(\dfrac{1+r_{t+1}}{2+\delta}\right) w_t$.

We can also think about what this means in terms of the two step constraints we originally used to describe the individual's maximisation problem. In the first period the individual supplies 1 unit of labour and receives $w_t$. Of this he consumes in that period $c_{1t} = ((1+\delta)/(2+\delta))\,w_t$, and saves the rest, that amount becoming the assets he takes into period $t+1$,

(29) $s_t = \left(\dfrac{1}{2+\delta}\right) w_t$.

He provides this amount to the production process in period $t+1$ and receives in return $(1+r_{t+1})\,s_t = (1+r_{t+1})\,w_t/(2+\delta)$, which he consumes at the end of period $t+1$.
2.2.4. A detailed look at the maximisation problem. Here we shall look in detail at how to solve the maximisation problem. You may think of this as an application of the methods described in Section 1 of this chapter.

Recall that the problem is
$$\max_{c_{1t}, c_{2t+1}} \log c_{1t} + \left(\frac{1}{1+\delta}\right)\log c_{2t+1} \quad \text{subject to} \quad c_{1t} + \frac{c_{2t+1}}{1+r_{t+1}} = w_t.$$
The Lagrangian function is

(30) $L(c_{1t}, c_{2t+1}, \lambda) = \log c_{1t} + \left(\dfrac{1}{1+\delta}\right)\log c_{2t+1} + \lambda\left(w_t - c_{1t} - \dfrac{c_{2t+1}}{1+r_{t+1}}\right)$

and the first order conditions are

(31) $\dfrac{\partial L}{\partial c_{1t}}(c_{1t}, c_{2t+1}, \lambda) = \dfrac{1}{c_{1t}} - \lambda = 0$

(32) $\dfrac{\partial L}{\partial c_{2t+1}}(c_{1t}, c_{2t+1}, \lambda) = \left(\dfrac{1}{1+\delta}\right)\dfrac{1}{c_{2t+1}} - \dfrac{\lambda}{1+r_{t+1}} = 0$

(33) $\dfrac{\partial L}{\partial \lambda}(c_{1t}, c_{2t+1}, \lambda) = w_t - c_{1t} - \dfrac{c_{2t+1}}{1+r_{t+1}} = 0.$

If we solve equation (31) for $c_{1t}$ we obtain

(34) $c_{1t} = \dfrac{1}{\lambda}$.

And if we solve equation (32) for $c_{2t+1}$ we obtain

(35) $c_{2t+1} = \dfrac{1+r_{t+1}}{\lambda(1+\delta)}$.

Substituting (34) and (35) into equation (33) (and simplifying a bit) gives

(36) $\dfrac{1}{\lambda} = \left(\dfrac{1+\delta}{2+\delta}\right) w_t$

and substituting this back into equations (34) and (35) gives

(37) $c_{1t} = \left(\dfrac{1+\delta}{2+\delta}\right) w_t$

(38) $c_{2t+1} = \left(\dfrac{1+r_{t+1}}{2+\delta}\right) w_t.$
Exercise 93. Another way of solving this problem is to solve the budget constraint to give $c_{2t+1}$ as a function of $c_{1t}$ (and $w_t$ and $r_{t+1}$) and to substitute this into the utility function, giving a function that depends on $c_{1t}$ but not on $c_{2t+1}$. We can then find the solution by finding the unconstrained maximum of this function. Solve the problem in this way and confirm that the solution is the same as the one found above.

Exercise 94. Solve the maximisation problem using the utility function
$$u_t(c_{1t}, c_{2t+1}) = c_{1t}^{\left(\frac{1+\delta}{2+\delta}\right)} \, c_{2t+1}^{\left(\frac{1}{2+\delta}\right)},$$
first by using the Lagrangian method, and then by substituting the formula for $c_{2t+1}$ into the utility function. Confirm that the solutions for $c_{1t}$ and $c_{2t+1}$ are the same as those found above.

Exercise 95. Solve the maximisation problem using the general CES utility function we discussed above,
$$U_t(c_{1t}, c_{2t+1}) = \frac{c_{1t}^{1-\sigma}}{1-\sigma} + \left(\frac{1}{1+\delta}\right)\frac{c_{2t+1}^{1-\sigma}}{1-\sigma},$$
first by using the Lagrangian method, and then by substituting the formula for $c_{2t+1}$ into the utility function. Comment on the difference between the solutions for $c_{1t}$ and $c_{2t+1}$ here and those we derived above (and that you found in your answer to the previous exercise), noting, in particular, on which of the parameters of the problem the solutions depend. This does not change in a fundamental way the solution to this problem, but it does make the solution of the complete model substantially harder when we do not assume Cobb-Douglas preferences.
3. Nonlinear Programming

Up until now we have been concerned with solving optimisation problems where the variable could be any real number and where the constraints were of the form that some function should take on a particular value. However for many optimisation problems, and particularly in economics, the more natural formulation of the problem has some inequality restriction on the variables, such as the requirement that the amount consumed of any good should be non-negative. And it is often the case that the constraints on the problem are more naturally thought of as inequality constraints. For example, rather than thinking of the budget constraint as requiring that a consumer consume a bundle exactly equal to what he can afford, the requirement perhaps should be that he consume no more than he can afford. The theory of such optimisation problems is called nonlinear programming. We give here an introduction to the most basic elements of this theory.

We start by considering one of the very simplest optimisation problems, namely the maximisation of a continuously differentiable function of a single real variable:

(39) $\max_x f(x)$.

We read this as "Choose $x$ to maximise $f(x)$."

We perhaps should have examined this problem when we introduced the traditional theory of constrained optimisation in Section 1. (And perhaps in some future version of these notes we shall.) In any case, you should know that the first order necessary condition for $x^*$ to be a solution to this maximisation problem is that $\frac{df}{dx}(x^*) = 0$.
Suppose now we add the constraint that $x \ge 0$. How does this change the problem and its solution? We can write the problem as follows:

(40) $\max_x f(x)$ subject to $x \ge 0$.

We read this as "Choose $x$ to maximise $f(x)$ subject to the constraint that $x \ge 0$."

How does it change the conditions for $x^*$ to be a solution? Well, if there is a solution to the maximisation problem, there are two possibilities. Either the solution could occur when $x^* > 0$ or it could occur when $x^* = 0$. In the first case $x^*$ will also be (at least a local) maximum of the unconstrained problem, and a necessary condition for this is that $\frac{df}{dx}(x^*) = 0$. In the second case it is not necessary that $\frac{df}{dx}(x^*) = 0$, since we are in any case not permitted to decrease $x^*$. (It's already 0, which is as low as it can go.) However, it should not be the case that we increase the value of $f(x)$ when we increase $x$ from $x^*$. A necessary condition for this is that $\frac{df}{dx}(x^*) \le 0$.

So, our first order (that is, conditions involving the first derivatives of the functions involved) necessary (that is, conditions that must be satisfied for any solution to the problem) conditions for a solution to this maximisation problem are that $\frac{df}{dx}(x^*) \le 0$, that $x^* \ge 0$, and that either $x^* = 0$ or $\frac{df}{dx}(x^*) = 0$. We can express this last by the equivalent requirement that $x^* \frac{df}{dx}(x^*) = 0$, that is, that the product of the two is equal to zero. We state this as a (very small and trivial) theorem.
Theorem 13. Suppose that $f : \mathbb{R} \to \mathbb{R}$ is continuously differentiable. Then, if $x^*$ maximises $f(x)$ over all $x \ge 0$, $x^*$ satisfies

(41) $\dfrac{df}{dx}(x^*) \le 0, \qquad x^* \dfrac{df}{dx}(x^*) = 0, \qquad x^* \ge 0.$

Exercise 96. Give a version of Theorem 13 for minimisation problems.

Our discussion prior to Theorem 13 almost constitutes a proof of the theorem. We shall not be as complete as we state progressively more general results. More complete notes will be made available electronically during the semester.
For now I simply present the most general problem we shall consider and state some results. Consider the problem

(42) $\max_{x_1, \ldots, x_n} f(x_1, \ldots, x_n)$ subject to $x_i \ge 0$, $i = 1, \ldots, n$, and $g_j(x_1, \ldots, x_n) \le 0$, $j = 1, \ldots, m$.

Here we have $n$ variables and $m$ inequality constraints. We may rewrite this using vector notation with $x = (x_1, \ldots, x_n)$ and $g(x) = (g_1(x), \ldots, g_m(x))$:

(43) $\max_x f(x)$ subject to $x \ge 0$ and $g(x) \le 0$.

We can write the Lagrangian function for problem (42) as

(44) $L(x_1, \ldots, x_n, \lambda_1, \ldots, \lambda_m) = f(x_1, \ldots, x_n) - \sum_{j=1}^m \lambda_j g_j(x_1, \ldots, x_n).$

Or, if we are happy to use vector and matrix notation, we may write the Lagrangian more simply as

(45) $L(x, \lambda) = f(x) - \lambda g(x).$
The Kuhn-Tucker conditions for $x^*$ to solve the maximisation problem are, using vector and matrix notation,

(46)
$$\frac{\partial L}{\partial x}(x^*, \lambda) \le 0, \qquad x^* \frac{\partial L}{\partial x}(x^*, \lambda) = 0, \qquad x^* \ge 0,$$
$$\frac{\partial L}{\partial \lambda}(x^*, \lambda) \ge 0, \qquad \lambda \frac{\partial L}{\partial \lambda}(x^*, \lambda) = 0, \qquad \lambda \ge 0.$$

Exercise 97. Write out the Kuhn-Tucker conditions given in (46) in the long form without using vector and matrix notation.
We are now in a position to state a result about the first order necessary conditions for $x^*$ to solve the maximisation problem.

Theorem 14. Suppose that $f : \mathbb{R}^n \to \mathbb{R}$ and $g : \mathbb{R}^n \to \mathbb{R}^m$ are continuously differentiable. Then, if $x^*$ maximises $f(x)$ over all $x$ satisfying the constraints $x \ge 0$ and $g(x) \le 0$, and if $x^*$ satisfies the constraint qualification that the gradient vectors $\partial g_j/\partial x$ associated with all constraints satisfied with equality are linearly independent, then there is a unique vector $\lambda$ such that $(x^*, \lambda)$ satisfies the Kuhn-Tucker conditions given in (46).

Exercise 98. Give a version of the Kuhn-Tucker conditions such as in (46) and state a theorem analogous to Theorem 14 for a constrained minimisation problem.
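To make the conditions (46) concrete, the following sketch (ours; the problem instance is invented) solves a small inequality-constrained problem with scipy and then checks the Kuhn-Tucker conditions at the reported solution, including the complementary slackness at the corner $x_2^* = 0$.

```python
import numpy as np
from scipy.optimize import minimize

# Sketch (invented instance): max f(x) = log(1+x1) + log(1+x2)
# subject to x >= 0 and g(x) = x1 + 3*x2 - 1 <= 0.
# scipy minimises, and its 'ineq' constraints are written as fun(x) >= 0.
f = lambda x: np.log(1 + x[0]) + np.log(1 + x[1])
g = lambda x: x[0] + 3 * x[1] - 1

res = minimize(lambda x: -f(x), x0=[0.1, 0.1], method='SLSQP',
               bounds=[(0, None), (0, None)],
               constraints=[{'type': 'ineq', 'fun': lambda x: -g(x)}])
x = res.x                              # expect x* = (1, 0)
lam = 1 / (1 + x[0])                   # from dL/dx1 = 0, since x1* > 0
print("x* =", x)
print("lam =", lam, ">= 0 and g(x*) =", g(x), "(binding)")
print("dL/dx2 =", 1 / (1 + x[1]) - 3 * lam, "<= 0, consistent with x2* = 0")
```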
3.1. Exercises.

Exercise 99. Exercises will be made available during the semester. Here we'll just put in some blanks so that I can add some exercises without messing up the numbering in the rest of the document.

Exercise 100.

Exercise 101.

Exercise 102.
4. The Implicit Function Theorem

In the previous sections we said things like: "Now we have three equations in $x_1^*$, $x_2^*$, and the new artificial or auxiliary variable $\lambda$. Again we can, perhaps, solve these equations for $x_1^*$, $x_2^*$, and $\lambda$." In this section we examine the question of when we can solve a system of $n$ equations to give $n$ of the variables in terms of the others. Let us suppose that we have $n$ endogenous variables $x_1, \ldots, x_n$, $m$ exogenous variables or parameters $b_1, \ldots, b_m$, and $n$ equations or equilibrium conditions

(47)
$$f_1(x_1, \ldots, x_n, b_1, \ldots, b_m) = 0$$
$$f_2(x_1, \ldots, x_n, b_1, \ldots, b_m) = 0$$
$$\vdots$$
$$f_n(x_1, \ldots, x_n, b_1, \ldots, b_m) = 0,$$
or, using vector notation,
$$f(x, b) = 0,$$
where $f : \mathbb{R}^{n+m} \to \mathbb{R}^n$, $x \in \mathbb{R}^n$ (that is, $x$ is an $n$-vector), $b \in \mathbb{R}^m$, and $0 \in \mathbb{R}^n$.

When can we solve this system to obtain functions giving each $x_i$ as a function of $b_1, \ldots, b_m$? As we'll see below, we only give an incomplete answer to this question, but first let's look at the case in which the function $f$ is a linear function.
Suppose that our equations are
$$a_{11} x_1 + \cdots + a_{1n} x_n + c_{11} b_1 + \cdots + c_{1m} b_m = 0$$
$$a_{21} x_1 + \cdots + a_{2n} x_n + c_{21} b_1 + \cdots + c_{2m} b_m = 0$$
$$\vdots$$
$$a_{n1} x_1 + \cdots + a_{nn} x_n + c_{n1} b_1 + \cdots + c_{nm} b_m = 0.$$
We can write this, in matrix notation, as
$$[A \mid C] \begin{pmatrix} x \\ b \end{pmatrix} = 0,$$
where $A$ is an $n \times n$ matrix, $C$ is an $n \times m$ matrix, $x$ is an $n \times 1$ (column) vector, and $b$ is an $m \times 1$ vector. This we can rewrite as
$$Ax + Cb = 0,$$
and solve this to give
$$x = -A^{-1} C b.$$
And we can do this as long as the matrix $A$ can be inverted, that is, as long as the matrix $A$ is of full rank.
Our answer to the general question, in which the function $f$ may not be linear, is as follows: if there are some values $(\bar{x}, \bar{b})$ for which $f(\bar{x}, \bar{b}) = 0$, and if, when we take a linear approximation to $f$, we can solve the approximate linear system as we did above, then we can solve the true nonlinear system, at least in a neighbourhood of $(\bar{x}, \bar{b})$. By this last phrase we mean that if $b$ is not close to $\bar{b}$ we may not be able to solve the system, and that for a particular value of $b$ there may be many values of $x$ that solve the system, but there is only one close to $\bar{x}$.

To see why we can't, in general, do better than this, consider the equation $f : \mathbb{R}^2 \to \mathbb{R}$ given by $f(x, b) = g(x) - b$, where the function $g$ is graphed in Figure 2. Notice that the values $(\bar{x}, \bar{b})$ satisfy the equation $f(x, b) = 0$. For all values of $b$ close to $\bar{b}$ we can find a unique value of $x$ close to $\bar{x}$ such that $f(x, b) = 0$. However, (1) for each value of $b$ there are other values of $x$ far away from $\bar{x}$ that also satisfy $f(x, b) = 0$, and (2) there are values of $b$, such as $\tilde{b}$, for which there are no values of $x$ that satisfy $f(x, b) = 0$.
[Figure 2: the graph of a function $g$ with $g(\bar{x}) = \bar{b}$; for $b$ close to $\bar{b}$ there is a unique solution of $g(x) = b$ close to $\bar{x}$, but there are other solutions far from $\bar{x}$, and for some values of $b$, such as $\tilde{b}$, there is no solution at all.]
Let us consider again the system of equations (47). We say that the function $f$ is $C^1$ on some open set $A \subseteq \mathbb{R}^{n+m}$ if $f$ has partial derivatives everywhere in $A$ and these partial derivatives are continuous on $A$.

Theorem 15. Suppose that $f : \mathbb{R}^{n+m} \to \mathbb{R}^n$ is a $C^1$ function on an open set $A \subseteq \mathbb{R}^{n+m}$ and that $(\bar{x}, \bar{b})$ in $A$ is such that $f(\bar{x}, \bar{b}) = 0$. Suppose also that
$$\frac{\partial f}{\partial x}(\bar{x}, \bar{b}) = \begin{pmatrix} \dfrac{\partial f_1(\bar{x}, \bar{b})}{\partial x_1} & \cdots & \dfrac{\partial f_1(\bar{x}, \bar{b})}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial f_n(\bar{x}, \bar{b})}{\partial x_1} & \cdots & \dfrac{\partial f_n(\bar{x}, \bar{b})}{\partial x_n} \end{pmatrix}$$
is of full rank. Then there are open sets $A_1 \subseteq \mathbb{R}^n$ and $A_2 \subseteq \mathbb{R}^m$ with $\bar{x}$ in $A_1$ and $\bar{b}$ in $A_2$ and $A_1 \times A_2 \subseteq A$ such that for each $b$ in $A_2$ there is exactly one $g(b)$ in $A_1$ such that $f(g(b), b) = 0$. Moreover, $g : A_2 \to A_1$ is a $C^1$ function and
$$\frac{\partial g}{\partial b}(b) = -\left(\frac{\partial f}{\partial x}(g(b), b)\right)^{-1} \left(\frac{\partial f}{\partial b}(g(b), b)\right).$$
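The formula at the end of Theorem 15 is easy to check numerically. The sketch below (ours, for an invented two-equation system with a known solution) compares the implicit derivative $-(\partial f/\partial x)^{-1}(\partial f/\partial b)$ with a finite-difference estimate of $\partial g/\partial b$.

```python
import numpy as np
from scipy.optimize import fsolve

# Sketch (invented system with n = 2, m = 1):
#   f1(x, b) = x1^2 + x2^2 - b = 0,   f2(x, b) = x1 - x2 = 0,
# whose solution is x1 = x2 = sqrt(b/2), so dx_i/db = 1/(2*sqrt(2b)).
def f(x, b):
    return np.array([x[0]**2 + x[1]**2 - b, x[0] - x[1]])

b0 = 2.0
x0 = fsolve(lambda x: f(x, b0), np.array([1.5, 0.5]))   # -> (1, 1)

J_x = np.array([[2 * x0[0], 2 * x0[1]],   # df/dx at (x0, b0)
                [1.0, -1.0]])
J_b = np.array([[-1.0], [0.0]])           # df/db at (x0, b0)
dg_db = -np.linalg.solve(J_x, J_b)        # the theorem's formula

h = 1e-6                                  # finite-difference check
x1 = fsolve(lambda x: f(x, b0 + h), x0)
print(dg_db.ravel(), (x1 - x0) / h)       # both approximately [0.25, 0.25]
```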
Exercise 103. Consider the general utility maximisation problem

(48) $\max_{x_1, x_2, \ldots, x_n} u(x_1, x_2, \ldots, x_n)$ subject to $p_1 x_1 + p_2 x_2 + \cdots + p_n x_n = w$.

Suppose that for some price vector $\bar{p}$ the maximisation problem has a utility maximising bundle $\bar{x}$. Find conditions on the utility function such that in a neighbourhood of $(\bar{x}, \bar{p})$ we can solve for the demand functions $x(p)$. Find the derivatives of the demand functions, $\partial x/\partial p$.
Exercise 104. Now suppose that there are only two goods and the utility function is given by
$$u(x_1, x_2) = (x_1)^{1/3} (x_2)^{2/3}.$$
Solve this utility maximisation problem, as you learned to do in Section 1 of this chapter, and then differentiate the demand functions that you find to obtain the partial derivatives with respect to $p_1$, $p_2$, and $w$ of each demand function. Also find the same derivatives using the method of the previous exercise.
5. The Theorem of the Maximum

Often in economics we are not so much interested in what the solution to a particular maximisation problem is, but rather wish to know how the solution to a parameterised problem depends on the parameters. Thus in our first example of utility maximisation we might be interested not so much in what the solution to the maximisation problem is when $p_1 = 2$, $p_2 = 7$, and $y = 25$, but rather in how the solution depends on $p_1$, $p_2$, and $y$. (That is, we might be interested in the demand function.) Sometimes we shall also be interested in how the maximised function depends on the parameters; in the example, how the maximised utility depends on $p_1$, $p_2$, and $y$.

This raises a number of questions. In order for us to speak meaningfully of a demand function it should be the case that the maximisation problem has a unique solution. Further, we would like to know if the demand function is continuous, or even if it is differentiable. Consider again the problem (14), but this time let us explicitly add some parameters:

(49) $\max_{x_1, \ldots, x_n} f(x_1, \ldots, x_n, a_1, \ldots, a_k)$ subject to $g_j(x_1, \ldots, x_n, a_1, \ldots, a_k) = c_j$, $j = 1, \ldots, m$.

In order to be able to say whether or not the problem has a unique solution it is useful to know something about the shape or curvature of the functions $f$ and $g$. We say a function is concave if for any two points in the domain of the function the value of the function at a weighted average of the two points is at least as great as the weighted average of the values of the function at the two points. We say the function is convex if the value of the function at the average is no greater than the average of the values. The following definition makes this a little more explicit. (In both definitions $x = (x_1, \ldots, x_n)$ is a vector.)
Definition 22. A function $f$ is concave if for any $x$ and $x'$ with $x \neq x'$ and for any $t$ such that $0 < t < 1$ we have $f(tx + (1-t)x') \ge t f(x) + (1-t) f(x')$. The function is strictly concave if $f(tx + (1-t)x') > t f(x) + (1-t) f(x')$.

A function $f$ is convex if for any $x$ and $x'$ with $x \neq x'$ and for any $t$ such that $0 < t < 1$ we have $f(tx + (1-t)x') \le t f(x) + (1-t) f(x')$. The function is strictly convex if $f(tx + (1-t)x') < t f(x) + (1-t) f(x')$.
The result we are about to give is most conveniently stated when our statement of the problem is in terms of inequality constraints rather than equality constraints. As mentioned earlier, we shall examine this kind of problem later in this course. However, for the moment, in order to proceed with our discussion of the problem involving equality constraints we shall assume that all of the functions with which we are dealing are increasing in the $x$ variables. (See Exercise 105 for a formal definition of what it means for a function to be increasing.) In this case if $f$ is strictly concave and $g_j$ is convex for each $j$ then the problem has a unique solution. In fact the concepts of concavity and convexity are somewhat stronger than is required. We shall see later in the course that they can be replaced by the concepts of quasi-concavity and quasi-convexity. In some sense these latter concepts are the right concepts for this result.

Theorem 16. Suppose that $f$ and $g_j$ are increasing in $(x_1, \ldots, x_n)$. If $f$ is strictly concave in $(x_1, \ldots, x_n)$ and $g_j$ is convex in $(x_1, \ldots, x_n)$ for $j = 1, \ldots, m$, then for each value of the parameters $(a_1, \ldots, a_k)$, if problem (49) has a solution $(x_1^*, \ldots, x_n^*)$, that solution is unique.
Now let $v(a_1, \ldots, a_k)$ be the maximised value of $f$ when the parameters are $(a_1, \ldots, a_k)$. Let us suppose that the problem is such that the solution is unique and that $(x_1^*(a_1, \ldots, a_k), \ldots, x_n^*(a_1, \ldots, a_k))$ are the values that maximise the function $f$ when the parameters are $(a_1, \ldots, a_k)$. Then

(50) $v(a_1, \ldots, a_k) = f(x_1^*(a_1, \ldots, a_k), \ldots, x_n^*(a_1, \ldots, a_k), a_1, \ldots, a_k).$

(Notice however that the function $v$ is uniquely defined even if there is not a unique maximiser.)

The Theorem of the Maximum gives conditions on the problem under which the function $v$ and the functions $x_1^*, \ldots, x_n^*$ are continuous. The constraints in the problem (49) define a set of feasible vectors $x$ over which the function $f$ is to be maximised. Let us call this set $G(a_1, \ldots, a_k)$, i.e.,

(51) $G(a_1, \ldots, a_k) = \{(x_1, \ldots, x_n) \mid g_j(x_1, \ldots, x_n, a_1, \ldots, a_k) = c_j \text{ for all } j\}.$
Now we can restate the problem as

(52) $\max_{x_1, \ldots, x_n} f(x_1, \ldots, x_n, a_1, \ldots, a_k)$ subject to $(x_1, \ldots, x_n) \in G(a_1, \ldots, a_k)$.

Notice that both the function $f$ and the feasible set $G$ depend on the parameters $a$, i.e., both may change as $a$ changes. The Theorem of the Maximum requires both that the function $f$ be continuous as a function of $x$ and $a$ and that the feasible set $G(a_1, \ldots, a_k)$ change continuously as $a$ changes. We already know, or should know, what it means for $f$ to be continuous, but the notion of what it means for a set to change continuously is less elementary. We call $G$ a set valued function or a correspondence. $G$ associates with any vector $(a_1, \ldots, a_k)$ a subset of the vectors $(x_1, \ldots, x_n)$. The following two definitions define what we mean by a correspondence being continuous. First we define what it means for two sets to be close.

Definition 23. Two sets of vectors $A$ and $B$ are within $\varepsilon$ of each other if for any vector $x$ in one set there is a vector $x'$ in the other set such that $x'$ is within $\varepsilon$ of $x$.

We can now define the continuity of the correspondence $G$ in essentially the same way that we define the continuity of a single valued function.

Definition 24. The correspondence $G$ is continuous at $(a_1, \ldots, a_k)$ if for any $\varepsilon > 0$ there is $\delta > 0$ such that if $(a'_1, \ldots, a'_k)$ is within $\delta$ of $(a_1, \ldots, a_k)$ then $G(a'_1, \ldots, a'_k)$ is within $\varepsilon$ of $G(a_1, \ldots, a_k)$.
It is, unfortunately, not the case that the continuity of the functions $g_j$ necessarily implies the continuity of the feasible set. (Exercise 106 asks you to construct a counterexample.)

Remark 1. It is possible to define two weaker notions of continuity, which we call upper hemicontinuity and lower hemicontinuity. A correspondence is in fact continuous in the way we have defined it if it is both upper hemicontinuous and lower hemicontinuous.
We are now in a position to state the Theorem of the Maximum. We assume that $f$ is a continuous function, that $G$ is a continuous correspondence, and that for any $(a_1, \ldots, a_k)$ the set $G(a_1, \ldots, a_k)$ is compact. The Weierstrass Theorem thus guarantees that there is a solution to the maximisation problem (52) for any $(a_1, \ldots, a_k)$.

Theorem 17 (Theorem of the Maximum). Suppose that $f(x_1, \ldots, x_n, a_1, \ldots, a_k)$ is continuous (in $(x_1, \ldots, x_n, a_1, \ldots, a_k)$), that $G(a_1, \ldots, a_k)$ is a continuous correspondence, and that for any $(a_1, \ldots, a_k)$ the set $G(a_1, \ldots, a_k)$ is compact. Then
(1) $v(a_1, \ldots, a_k)$ is continuous, and
(2) if $(x_1^*(a_1, \ldots, a_k), \ldots, x_n^*(a_1, \ldots, a_k))$ are (single valued) functions then they are also continuous.

Later in the course we shall see how the Implicit Function Theorem allows us to identify conditions under which the functions $v$ and $x^*$ are differentiable.
Exercises.

Exercise 105. We say that the function $f(x_1, \ldots, x_n)$ is nondecreasing if $x'_i \ge x_i$ for each $i$ implies that $f(x'_1, \ldots, x'_n) \ge f(x_1, \ldots, x_n)$; is increasing if $x'_i > x_i$ for each $i$ implies that $f(x'_1, \ldots, x'_n) > f(x_1, \ldots, x_n)$; and is strictly increasing if $x'_i \ge x_i$ for each $i$ and $x'_j > x_j$ for at least one $j$ implies that $f(x'_1, \ldots, x'_n) > f(x_1, \ldots, x_n)$. Show that if $f$ is nondecreasing and strictly concave then it must be strictly increasing. [Hint: This is very easy.]

Exercise 106. Show by example that even if the functions $g_j$ are continuous the correspondence $G$ may not be continuous. [Hint: Use the case $n = m = k = 1$.]
6. The Envelope Theorem

In this section we examine a theorem that is particularly useful in the study of consumer and producer theory. There is in fact nothing mysterious about this theorem. You will see that the proof of this theorem is simply calculation and a number of substitutions. Moreover the theorem has a very clear intuition. It is this: Suppose we are at a maximum (in an unconstrained problem) and we change the data of the problem by a very small amount. Now both the solution of the problem and the value at the maximum will change. However, at a maximum the function is flat (the first derivative is zero). Thus when we want to know by how much the maximised value has changed, it does not matter (very much) whether or not we take account of how the maximiser changes. See Figure 2. The intuition for a constrained problem is similar and only a little more complicated.

To motivate our discussion of the Envelope Theorem we will first consider a particular case, viz. the relation between short and long run average cost curves. Recall that, in general, we assume that the average cost of producing some good is a function of the amount of the good to be produced. The short run average cost function is defined to be the function which for any quantity, $Q$, gives the average cost of producing that quantity, taking as given the scale of operation, i.e., the size and number of plants and other fixed capital which we assume cannot be changed in the short run (whatever that is). The long run average cost function on the other hand gives, as a function of $Q$, the average cost of producing $Q$ units of the good, with the scale of operation selected to be the optimal scale for that level of production.
[Figure 2: the functions $f(\cdot, a)$ and $f(\cdot, a')$ and their maximisers $x^*(a)$ and $x^*(a')$; because $f(\cdot, a')$ is flat at its maximum, $f(x^*(a), a')$ is close to $f(x^*(a'), a')$.]
That is, if we let the scale of operation be measured by a single variable $k$, say, and we let the short run average cost of producing $Q$ units when the scale is $k$ be given by $SRAC(Q, k)$ and the long run average cost of producing $Q$ units by $LRAC(Q)$, then we have
$$LRAC(Q) = \min_k SRAC(Q, k).$$
Let us denote, for a given value $Q$, the optimal level of $k$ by $k(Q)$. That is, $k(Q)$ is the value of $k$ that minimises the right hand side of the above equation.

Graphically, for any fixed level of $k$ the short run average cost function can be represented by a curve (normally assumed to be U-shaped) drawn in two dimensions with quantity on the horizontal axis and cost on the vertical axis. Now think about drawing one short run average cost curve for each of the (infinitely many) possible values of $k$. One way of thinking about the long run average cost curve is as the bottom or envelope of these short run average cost curves. Suppose that we consider a point on this long run or envelope curve. What can be said about the slope of the long run average cost curve at this point? A little thought should convince you that it should be the same as the slope of the short run curve through the same point. (If it were not then that short run curve would come below the long run curve, a contradiction.) That is,
$$\frac{d\,LRAC(Q)}{dQ} = \frac{\partial\,SRAC(Q, k(Q))}{\partial Q}.$$
See Figure 3.

The envelope theorem is a general statement of the result of which this is a special case. We will consider not only cases in which $Q$ and $k$ are vectors, but also cases in which the maximisation or minimisation problem includes some constraints.
[Figure 3: short run and long run average cost curves; the LRAC curve is the envelope of the SRAC curves, and at $\bar{Q}$ the two curves are tangent, with $LRAC(\bar{Q}) = SRAC(\bar{Q}, k(\bar{Q}))$.]
Let us consider again the maximisation problem (49). Recall:
$$\max_{x_1, \ldots, x_n} f(x_1, \ldots, x_n, a_1, \ldots, a_k) \quad \text{subject to} \quad g_j(x_1, \ldots, x_n, a_1, \ldots, a_k) = c_j, \; j = 1, \ldots, m.$$
Again let $L(x_1, \ldots, x_n, \lambda_1, \ldots, \lambda_m; a_1, \ldots, a_k)$ be the Lagrangian function:

(53) $L(x_1, \ldots, x_n, \lambda_1, \ldots, \lambda_m; a_1, \ldots, a_k) = f(x_1, \ldots, x_n, a_1, \ldots, a_k) + \sum_{j=1}^m \lambda_j \left(c_j - g_j(x_1, \ldots, x_n, a_1, \ldots, a_k)\right).$

Let $(x_1^*(a_1, \ldots, a_k), \ldots, x_n^*(a_1, \ldots, a_k))$ and $(\lambda_1(a_1, \ldots, a_k), \ldots, \lambda_m(a_1, \ldots, a_k))$ be the values of $x$ and $\lambda$ that solve this problem. Now let

(54) $v(a_1, \ldots, a_k) = f(x_1^*(a_1, \ldots, a_k), \ldots, x_n^*(a_1, \ldots, a_k), a_1, \ldots, a_k).$

That is, $v(a_1, \ldots, a_k)$ is the maximised value of the function $f$ when the parameters are $(a_1, \ldots, a_k)$. The envelope theorem says that the derivative of $v$ is equal to the derivative of $L$ at the maximising values of $x$ and $\lambda$. Or, more precisely:
Theorem 18 (The Envelope Theorem). If all functions are defined as above and the problem is such that the functions $x^*$ and $\lambda$ are well defined, then
$$\frac{\partial v}{\partial a_h}(a_1, \ldots, a_k) = \frac{\partial L}{\partial a_h}(x_1^*(a), \ldots, x_n^*(a), \lambda_1(a), \ldots, \lambda_m(a), a_1, \ldots, a_k)$$
$$= \frac{\partial f}{\partial a_h}(x_1^*(a), \ldots, x_n^*(a), a_1, \ldots, a_k) - \sum_{j=1}^m \lambda_j(a) \frac{\partial g_j}{\partial a_h}(x_1^*(a), \ldots, x_n^*(a), a_1, \ldots, a_k)$$
for all $h$.

In order to show the advantages of using matrix and vector notation we shall restate the theorem in that notation before returning to give a proof of the theorem. (In proving the theorem we shall return to using mainly scalar notation.)

Theorem 18 (The Envelope Theorem). Under the same conditions as above,
$$\frac{\partial v}{\partial a}(a) = \frac{\partial L}{\partial a}(x^*(a), \lambda(a), a) = \frac{\partial f}{\partial a}(x^*(a), a) - \lambda(a) \frac{\partial g}{\partial a}(x^*(a), a).$$
Proof. From the definition of the function $v$ we have

(55) $v(a_1, \ldots, a_k) = f(x_1^*(a_1, \ldots, a_k), \ldots, x_n^*(a_1, \ldots, a_k), a_1, \ldots, a_k).$

Thus

(56) $\dfrac{\partial v}{\partial a_h}(a) = \dfrac{\partial f}{\partial a_h}(x^*(a), a) + \sum_{i=1}^n \dfrac{\partial f}{\partial x_i}(x^*(a), a) \, \dfrac{\partial x_i^*}{\partial a_h}(a).$

Now, from the first order conditions (12) we have
$$\frac{\partial f}{\partial x_i}(x^*(a), a) - \sum_{j=1}^m \lambda_j(a) \frac{\partial g_j}{\partial x_i}(x^*(a), a) = 0,$$
or

(57) $\dfrac{\partial f}{\partial x_i}(x^*(a), a) = \sum_{j=1}^m \lambda_j(a) \dfrac{\partial g_j}{\partial x_i}(x^*(a), a).$

Also, since $x^*(a)$ satisfies the constraints we have, for each $j$,
$$g_j(x_1^*(a), \ldots, x_n^*(a), a_1, \ldots, a_k) \equiv c_j.$$
And, since this holds as an identity, we may differentiate both sides with respect to $a_h$, giving
$$\sum_{i=1}^n \frac{\partial g_j}{\partial x_i}(x^*(a), a) \, \frac{\partial x_i^*}{\partial a_h}(a) + \frac{\partial g_j}{\partial a_h}(x^*(a), a) = 0,$$
or

(58) $\sum_{i=1}^n \dfrac{\partial g_j}{\partial x_i}(x^*(a), a) \, \dfrac{\partial x_i^*}{\partial a_h}(a) = -\dfrac{\partial g_j}{\partial a_h}(x^*(a), a).$

Substituting (57) into (56) gives
$$\frac{\partial v}{\partial a_h}(a) = \frac{\partial f}{\partial a_h}(x^*(a), a) + \sum_{i=1}^n \left[\sum_{j=1}^m \lambda_j(a) \frac{\partial g_j}{\partial x_i}(x^*(a), a)\right] \frac{\partial x_i^*}{\partial a_h}(a).$$
Changing the order of summation gives

(59) $\dfrac{\partial v}{\partial a_h}(a) = \dfrac{\partial f}{\partial a_h}(x^*(a), a) + \sum_{j=1}^m \lambda_j(a) \left[\sum_{i=1}^n \dfrac{\partial g_j}{\partial x_i}(x^*(a), a) \, \dfrac{\partial x_i^*}{\partial a_h}(a)\right].$

And now substituting (58) into (59) gives
$$\frac{\partial v}{\partial a_h}(a) = \frac{\partial f}{\partial a_h}(x^*(a), a) - \sum_{j=1}^m \lambda_j(a) \frac{\partial g_j}{\partial a_h}(x^*(a), a),$$
which is the required result.
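The proof can also be mirrored numerically. The sketch below (ours, for the utility maximisation problem (5) with the illustrative utility $u(x_1, x_2) = x_1 x_2$) compares finite-difference derivatives of $v$ with the derivative of the Lagrangian at the optimum.

```python
import numpy as np

# Sketch: envelope theorem check for max x1*x2 s.t. p1*x1 + p2*x2 = y.
# Closed forms: x1 = y/(2p1), x2 = y/(2p2), lam = y/(2*p1*p2).
def solution(p1, p2, y):
    return y / (2 * p1), y / (2 * p2), y / (2 * p1 * p2)

def v(p1, p2, y):
    x1, x2, _ = solution(p1, p2, y)
    return x1 * x2

p1, p2, y, h = 2.0, 3.0, 10.0, 1e-6
x1, x2, lam = solution(p1, p2, y)

# dv/dp1 by finite differences vs dL/dp1 = -lam*x1 at the optimum.
dv_dp1 = (v(p1 + h, p2, y) - v(p1 - h, p2, y)) / (2 * h)
print(dv_dp1, -lam * x1)

# dv/dy vs dL/dy = lam, the marginal utility of income.
dv_dy = (v(p1, p2, y + h) - v(p1, p2, y - h)) / (2 * h)
print(dv_dy, lam)
```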
Exercises.

Exercise 107. Rewrite this proof using matrix notation. Go through your proof and identify the dimension of each of the vectors or matrices you use. For example, $\partial f/\partial x$ is a $1 \times n$ vector and $\partial g/\partial x$ is an $m \times n$ matrix.
7. Applications to Microeconomic Theory

7.1. Utility Maximisation. Let us again consider the problem given in (5):
$$\max_{x_1, x_2} u(x_1, x_2) \quad \text{subject to} \quad p_1 x_1 + p_2 x_2 - y = 0.$$
Let $v(p_1, p_2, y)$ be the maximised value of $u$ when prices and income are $p_1$, $p_2$, and $y$. Let us consider the effect of a change in $y$ with $p_1$ and $p_2$ remaining constant. By the Envelope Theorem,
$$\frac{\partial v}{\partial y} = \frac{\partial}{\partial y}\left\{u(x_1, x_2) + \lambda(y - p_1 x_1 - p_2 x_2)\right\} = 0 + \lambda \cdot 1 = \lambda.$$
This is the familiar result that $\lambda$ is the marginal utility of income.
7.2. Expenditure Minimisation. Let us consider the problem of minimising expenditure subject to attaining a given level of utility, i.e.,
$$\min_{x_1, \ldots, x_n} \sum_{i=1}^n p_i x_i \quad \text{subject to} \quad u(x_1, \ldots, x_n) - u_0 = 0.$$
Let the minimised value of expenditure be denoted by $e(p_1, \ldots, p_n, u_0)$. Then by the Envelope Theorem we obtain
$$\frac{\partial e}{\partial p_i} = \frac{\partial}{\partial p_i}\left\{\sum_{i=1}^n p_i x_i + \lambda\left(u_0 - u(x_1, \ldots, x_n)\right)\right\} = x_i + 0 = x_i$$
when evaluated at the point which solves the minimisation problem, which we write as $h_i(p_1, \ldots, p_n, u_0)$ to distinguish this (compensated) value of the demand for good $i$ as a function of prices and utility from the (uncompensated) value of the demand for good $i$ as a function of prices and income. This result is known as Hotelling's Theorem.
7.3. The Hicks-Slutsky Equations. It can be shown that the compensated demand at utility $u_0$, i.e., $h_i(p_1, \ldots, p_n, u_0)$, is equal to the uncompensated demand at income $e(p_1, \ldots, p_n, u_0)$, i.e., $x_i(p_1, \ldots, p_n, e(p_1, \ldots, p_n, u_0))$. (This result is known as the duality theorem.) Thus totally differentiating the identity
$$x_i(p_1, \ldots, p_n, e(p_1, \ldots, p_n, u_0)) \equiv h_i(p_1, \ldots, p_n, u_0)$$
with respect to $p_k$ we obtain
$$\frac{\partial x_i}{\partial p_k} + \frac{\partial x_i}{\partial y}\frac{\partial e}{\partial p_k} = \frac{\partial h_i}{\partial p_k},$$
which by Hotelling's Theorem gives
$$\frac{\partial x_i}{\partial p_k} + \frac{\partial x_i}{\partial y} h_k = \frac{\partial h_i}{\partial p_k}.$$
So
$$\frac{\partial x_i}{\partial p_k} = \frac{\partial h_i}{\partial p_k} - h_k \frac{\partial x_i}{\partial y}$$
for all $i, k = 1, \ldots, n$. These are the Hicks-Slutsky equations.
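These equations can be verified symbolically for a concrete utility function. The sketch below (ours) uses the demand functions for the utility $u(x_1, x_2) = \sqrt{x_1}\sqrt{x_2}$, which are derived in the example of Section 7.6 below, and confirms the Hicks-Slutsky equation for $i = k = 1$.

```python
import sympy as sp

# Sketch: check the Hicks-Slutsky equation for i = k = 1 with the
# Cobb-Douglas demands derived in the example of Section 7.6.
p1, p2, w, u = sp.symbols('p1 p2 w u', positive=True)

x1 = w / (2 * p1)                    # Marshallian demand
h1 = u * sp.sqrt(p2 / p1)            # Hicksian demand
v = w / (2 * sp.sqrt(p1 * p2))       # indirect utility: u at the optimum

lhs = sp.diff(x1, p1)
rhs = sp.diff(h1, p1) - h1 * sp.diff(x1, w)
print(sp.simplify(lhs - rhs.subs(u, v)))   # 0, so the equation holds
```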
7.4. The Indirect Utility Function. Again let $v(p_1, \ldots, p_n, y)$ be the indirect utility function, that is, the maximised value of utility as described in Section 7.1. Then by the Envelope Theorem
$$\frac{\partial v}{\partial p_i} = \frac{\partial u}{\partial p_i} - \lambda x_i(p_1, \ldots, p_n, y) = -\lambda x_i(p_1, \ldots, p_n, y),$$
since $\partial u/\partial p_i = 0$. Now, since we have already shown that $\lambda = \partial v/\partial y$ (in Section 7.1), we have
$$x_i(p_1, \ldots, p_n, y) = -\frac{\partial v/\partial p_i}{\partial v/\partial y}.$$
This is known as Roy's Theorem.
7.5. Profit functions. Now consider the problem of a firm that maximises profits subject to technology constraints. Let $x = (x_1, \dots, x_n)$ be a vector of netputs, i.e., $x_i$ is positive if the firm is a net supplier of good $i$ and negative if the firm is a net user of that good. Let us assume that we can write the technology constraints as $F(x) = 0$. Thus the firm's problem is
\[
\max_{x_1, \dots, x_n} \sum_{i=1}^{n} p_i x_i \quad \text{subject to} \quad F(x_1, \dots, x_n) = 0.
\]
Let $\xi_i(p)$ be the value of $x_i$ that solves this problem, i.e., the net supply of commodity $i$ when prices are $p$. (Here $p$ is a vector.) We call the maximised value the profit function, which is given by
\[
\pi(p) = \sum_{i=1}^{n} p_i \xi_i(p).
\]
And so by the Envelope Theorem
\[
\frac{\partial \pi}{\partial p_i} = \xi_i(p).
\]
This result is known as Hotelling's lemma.
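The following sketch illustrates Hotelling's lemma for an assumed one-input, one-output technology $y = \sqrt{z}$; this concrete technology is a choice made for the illustration, not the general netput formulation above.
\begin{verbatim}
# A sketch of Hotelling's lemma for the assumed technology y = sqrt(z):
# maximise p*y - w*z over the input z.
import sympy as sp

p, w, z = sp.symbols('p w z', positive=True)
profit = p * sp.sqrt(z) - w * z
zstar = sp.solve(sp.diff(profit, z), z)[0]   # optimal input: p**2/(4*w**2)
pi = sp.simplify(profit.subs(z, zstar))      # profit function: p**2/(4*w)

assert sp.simplify(sp.diff(pi, p) - sp.sqrt(zstar)) == 0  # d(pi)/dp = output supplied
assert sp.simplify(sp.diff(pi, w) + zstar) == 0           # d(pi)/dw = -(input used)
\end{verbatim}
The second derivative is negative, as it should be: the input is a netput that the firm uses rather than supplies.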
7.6. Cobb-Douglas Example. We consider a particular Cobb-Douglas example of the utility maximisation problem
\[
\max_{x_1, x_2} \sqrt{x_1}\sqrt{x_2} \quad \text{subject to} \quad p_1 x_1 + p_2 x_2 = w. \tag{60}
\]
The Lagrangian is
\[
L(x_1, x_2, \lambda) = \sqrt{x_1}\sqrt{x_2} + \lambda(w - p_1 x_1 - p_2 x_2) \tag{61}
\]
and the first order conditions are
\begin{align}
\frac{\partial L}{\partial x_1} &= \tfrac{1}{2} x_1^{-1/2} x_2^{1/2} - \lambda p_1 = 0 \tag{62} \\
\frac{\partial L}{\partial x_2} &= \tfrac{1}{2} x_1^{1/2} x_2^{-1/2} - \lambda p_2 = 0 \tag{63} \\
\frac{\partial L}{\partial \lambda} &= w - p_1 x_1 - p_2 x_2 = 0. \tag{64}
\end{align}
If we divide equation (62) by equation (63) we obtain
\[
\frac{x_2}{x_1} = \frac{p_1}{p_2} \quad \text{or} \quad p_1 x_1 = p_2 x_2,
\]
and if we substitute this into equation (64) we obtain
\[
w - p_1 x_1 - p_1 x_1 = 0
\]
or
\[
x_1 = \frac{w}{2p_1}. \tag{65}
\]
Similarly,
\[
x_2 = \frac{w}{2p_2}. \tag{66}
\]
Substituting equations (65) and (66) into the utility function gives
\[
v(p_1, p_2, w) = \sqrt{\frac{w^2}{4 p_1 p_2}} = \frac{w}{2\sqrt{p_1 p_2}}. \tag{67}
\]
As a check, we can verify some known properties of the indirect utility function. For example, it is homogeneous of degree zero: if we multiply $p_1$, $p_2$, and $w$ by the same positive constant, say $\mu$, we do not change the value of $v$. You should confirm that this is the case.
We now calculate the optimal value of $\lambda$ from the first order conditions by substituting equations (65) and (66) into (62), giving
\[
\frac{1}{2} \left( \frac{w}{2p_1} \right)^{-1/2} \left( \frac{w}{2p_2} \right)^{1/2} - \lambda p_1 = 0
\]
or
\[
\frac{1}{2} \sqrt{\frac{2p_1}{w} \cdot \frac{w}{2p_2}} = p_1 \lambda
\]
or
\[
\frac{1}{2} \sqrt{\frac{p_1}{p_2}} \cdot \frac{1}{p_1} = \lambda
\]
or
\[
\lambda = \frac{1}{2\sqrt{p_1 p_2}}.
\]
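Since the algebra above is easy to mis-step, the following sketch (an illustration, not part of the notes) has sympy confirm that (65), (66), and this value of $\lambda$ jointly satisfy the first order conditions (62)-(64) and reproduce (67).
\begin{verbatim}
# A sketch verifying that (65), (66) and lambda = 1/(2*sqrt(p1*p2))
# satisfy the first order conditions (62)-(64) and give (67).
import sympy as sp

x1, x2, lam, p1, p2, w = sp.symbols('x1 x2 lam p1 p2 w', positive=True)
L = sp.sqrt(x1) * sp.sqrt(x2) + lam * (w - p1 * x1 - p2 * x2)
sol = {x1: w / (2 * p1), x2: w / (2 * p2), lam: 1 / (2 * sp.sqrt(p1 * p2))}

for s in (x1, x2, lam):
    assert sp.simplify(sp.diff(L, s).subs(sol)) == 0   # (62)-(64) hold

v = sp.simplify(sp.sqrt(x1 * x2).subs(sol))            # indirect utility
assert sp.simplify(v - w / (2 * sp.sqrt(p1 * p2))) == 0  # matches (67)
\end{verbatim}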
Our first application of the Envelope Theorem told us that this value of $\lambda$ could be found as the derivative of the indirect utility function with respect to $w$. We confirm this by differentiating the function we found above with respect to $w$:
\[
\frac{\partial v}{\partial w} = \frac{\partial}{\partial w} \frac{w}{2\sqrt{p_1 p_2}} = \frac{1}{2\sqrt{p_1 p_2}},
\]
as we had found directly above.
Now let us, for the same utility function, consider the expenditure minimisation problem
\[
\min_{x_1, x_2} p_1 x_1 + p_2 x_2 \quad \text{subject to} \quad \sqrt{x_1}\sqrt{x_2} = u.
\]
The Lagrangian is
\[
L(x_1, x_2, \lambda) = p_1 x_1 + p_2 x_2 + \lambda(u - \sqrt{x_1}\sqrt{x_2}) \tag{68}
\]
and the first order conditions are
\begin{align}
\frac{\partial L}{\partial x_1} &= p_1 - \lambda \tfrac{1}{2} x_1^{-1/2} x_2^{1/2} = 0 \tag{69} \\
\frac{\partial L}{\partial x_2} &= p_2 - \lambda \tfrac{1}{2} x_1^{1/2} x_2^{-1/2} = 0 \tag{70} \\
\frac{\partial L}{\partial \lambda} &= u - \sqrt{x_1}\sqrt{x_2} = 0. \tag{71}
\end{align}
Dividing equation (69) by equation (70) gives
\[
\frac{p_1}{p_2} = \frac{x_2}{x_1}
\]
or
\[
x_2 = \frac{p_1 x_1}{p_2}. \tag{72}
\]
And, if we substitute equation (72) into equation (71) we obtain
\[
u = x_1 \sqrt{\frac{p_1}{p_2}}
\]
or
\[
x_1 = u \sqrt{\frac{p_2}{p_1}}.
\]
Similarly,
\[
x_2 = u \sqrt{\frac{p_1}{p_2}},
\]
and if we substitute these values back into the objective function we obtain the expenditure function
\[
e(p_1, p_2, u) = p_1 u \sqrt{\frac{p_2}{p_1}} + p_2 u \sqrt{\frac{p_1}{p_2}} = 2u\sqrt{p_1 p_2}.
\]
Hotelling's Theorem tells us that if we differentiate this expenditure function with respect to $p_i$ we should obtain the Hicksian demand function $h_i$:
\[
\frac{\partial e(p_1, p_2, u)}{\partial p_1} = \frac{\partial}{\partial p_1} 2u\sqrt{p_1 p_2} = 2u \cdot \frac{1}{2} \sqrt{\frac{p_2}{p_1}} = u \sqrt{\frac{p_2}{p_1}},
\]
as we had already found. And similarly for $h_2$.
Let us summarise what we have found so far. The Marshallian demand functions are
\[
x_1(p_1, p_2, w) = \frac{w}{2p_1}, \qquad x_2(p_1, p_2, w) = \frac{w}{2p_2}.
\]
The indirect utility function is
\[
v(p_1, p_2, w) = \frac{w}{2\sqrt{p_1 p_2}}.
\]
The Hicksian demand functions are
\[
h_1(p_1, p_2, u) = u \sqrt{\frac{p_2}{p_1}}, \qquad h_2(p_1, p_2, u) = u \sqrt{\frac{p_1}{p_2}},
\]
and the expenditure function is
\[
e(p_1, p_2, u) = 2u\sqrt{p_1 p_2}.
\]
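Before checking the remaining applications by hand, note that all four objects, and the duality relations between them, can be cross-checked mechanically. The sketch below is an illustration, not part of the notes.
\begin{verbatim}
# A sketch cross-checking the four objects just summarised, including
# the duality relations between Marshallian and Hicksian demand.
import sympy as sp

p1, p2, w, u = sp.symbols('p1 p2 w u', positive=True)

x1m, x2m = w / (2 * p1), w / (2 * p2)                # Marshallian demands
v = sp.simplify(sp.sqrt(x1m * x2m))                  # indirect utility
h1, h2 = u * sp.sqrt(p2 / p1), u * sp.sqrt(p1 / p2)  # Hicksian demands
e = sp.simplify(p1 * h1 + p2 * h2)                   # expenditure function

assert sp.simplify(v - w / (2 * sp.sqrt(p1 * p2))) == 0
assert sp.simplify(e - 2 * u * sp.sqrt(p1 * p2)) == 0
# Duality: Marshallian demand at income e(p, u) is Hicksian demand, and
# Hicksian demand at utility v(p, w) is Marshallian demand.
assert sp.simplify(x1m.subs(w, e) - h1) == 0
assert sp.simplify(h1.subs(u, v) - x1m) == 0
\end{verbatim}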
We now look at the third application, concerning the Hicks-Slutsky decomposition. First let us confirm that if we substitute the expenditure function for $w$ in the Marshallian demand function we do obtain the Hicksian demand function:
\[
x_1(p_1, p_2, e(p_1, p_2, u)) = \frac{e(p_1, p_2, u)}{2p_1} = \frac{2u\sqrt{p_1 p_2}}{2p_1} = u \sqrt{\frac{p_2}{p_1}},
\]
as required.
Similarly, if we plug the indirect utility function $v$ into the Hicksian demand function $h_i$ we obtain the Marshallian demand function $x_i$. Confirmation of this is left as an exercise. [You should do this exercise. If you understand the material properly it is very easy. If you understand it only partly then doing the exercise will solidify your understanding. If you can't do it then that is a message to get some further explanation.]
Let us now check the Hicks-Slutsky decomposition for the effect of a change in the price of good 2 on the demand for good 1. The Hicks-Slutsky decomposition tells us that
\[
\frac{\partial x_1}{\partial p_2} = \frac{\partial h_1}{\partial p_2} - h_2 \frac{\partial x_1}{\partial w}.
\]
Calculating these partial derivatives we have
\[
\frac{\partial x_1}{\partial p_2} = 0, \qquad \frac{\partial x_1}{\partial w} = \frac{1}{2p_1}, \qquad \frac{\partial h_1}{\partial p_2} = \frac{u}{\sqrt{p_1}} \cdot \frac{1}{2} \cdot \frac{1}{\sqrt{p_2}} = \frac{u}{2\sqrt{p_1 p_2}},
\]
and
\[
h_2 = u \sqrt{\frac{p_1}{p_2}}.
\]
Substituting into the right hand side of the Hicks-Slutsky equation above gives
\[
\text{RHS} = \frac{u}{2\sqrt{p_1 p_2}} - u \sqrt{\frac{p_1}{p_2}} \cdot \frac{1}{2p_1} = 0,
\]
which is exactly what we had found for the left hand side of the Hicks-Slutsky equation.
Finally we check Roy's Theorem, which tells us that the Marshallian demand for good 1 can be found as
\[
x_1(p_1, p_2, w) = -\frac{\partial v/\partial p_1}{\partial v/\partial w}.
\]
In this case we obtain
\[
x_1(p_1, p_2, w) = -\frac{\dfrac{w}{2} \cdot \dfrac{1}{\sqrt{p_2}} \cdot \left(-\dfrac{1}{2}\right) p_1^{-3/2}}{\dfrac{1}{2}\sqrt{\dfrac{1}{p_1 p_2}}} = \frac{w}{2p_1},
\]
as required.
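Both of these last checks can also be run mechanically; the following self-contained sketch is again an illustration rather than part of the notes.
\begin{verbatim}
# A sketch repeating the two checks above: the Hicks-Slutsky
# decomposition for dx1/dp2 and Roy's Theorem for good 1.
import sympy as sp

p1, p2, w, u = sp.symbols('p1 p2 w u', positive=True)
x1m = w / (2 * p1)                                   # Marshallian demand, good 1
h1, h2 = u * sp.sqrt(p2 / p1), u * sp.sqrt(p1 / p2)  # Hicksian demands
v = w / (2 * sp.sqrt(p1 * p2))                       # indirect utility

# Hicks-Slutsky: dx1/dp2 = dh1/dp2 - h2 * dx1/dw (both sides are zero here).
rhs = sp.diff(h1, p2) - h2 * sp.diff(x1m, w)
assert sp.simplify(sp.diff(x1m, p2) - rhs) == 0

# Roy's Theorem: x1 = -(dv/dp1)/(dv/dw).
assert sp.simplify(-sp.diff(v, p1) / sp.diff(v, w) - x1m) == 0
\end{verbatim}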
Exercises.
Exercise 108. Consider the direct utility function
\[
u(x) = \sum_{i=1}^{n} \alpha_i \log(x_i - \gamma_i),
\]
where $\alpha_i$ and $\gamma_i$, $i = 1, \dots, n$ are, respectively, positive and nonpositive parameters.
(1) Derive the indirect utility function and show that it is decreasing in its arguments.
(2) Verify Roy's Theorem.
(3) Derive the expenditure function and show that it is homogeneous of degree one and nondecreasing in prices.
(4) Verify Hotelling's Theorem.
Exercise 109. For the utility function defined in Exercise 108,
(1) Derive the Slutsky equation.
(2) Let $d_i(p, y)$ be the demand for good $i$ derived from the above utility function. Goods $i$ and $j$ are said to be gross substitutes if $\partial d_i(p, y)/\partial p_j > 0$ and gross complements if $\partial d_i(p, y)/\partial p_j < 0$. For this utility function, are the various goods gross substitutes, gross complements, or can we not say?
(The two previous exercises are taken from R. Robert Russell and Maurice
Wilkinson, Microeconomics: A Synthesis of Modern and Neoclassical Theory, New
York, John Wiley & Sons, 1979.)
Exercise 110. An electric utility has two generating plants in which total costs per hour are $c_1$ and $c_2$ respectively, where
\[
c_1 = 80 + 2x_1 + 0.001 b x_1^2, \quad b > 0,
\]
\[
c_2 = 90 + 1.5 x_2 + 0.002 x_2^2,
\]
where $x_i$ is the quantity generated in the $i$-th plant. If the utility is required to produce 2000 megawatts in a particular hour, how should it allocate this load between the plants so as to minimise costs? Use the Lagrangian method and interpret the multiplier. How do total costs vary as $b$ changes? (That is, what is the derivative of the minimised cost with respect to $b$?)