Econ 101A Notes

Lecture Notes for Econ 101A
David Card∗
Dept. of Economics
UC Berkeley
∗ The manuscript was typeset by Daniel Nolan in LaTeX. The figures were created in Asymptote, Inkscape, R,
and Excel (the majority in Inkscape). Please address comments/corrections to daniel nolan@msn.com, with “Card
Lecture Notes” in the subject line.
Contents
1 Optimization
  1.1 Unconstrained Optimization
  1.2 Constrained Optimization
  1.3 Appendix
    1.3.1 Convexity
    1.3.2 SOC in Higher Dimensions
2 Consumer Choice
  2.1 Budget Constraint
  2.2 Consumer’s Objective
  2.3 Consumer’s Optimum
  2.4 Special Problems
3 Two Applications of Indifference Curve Analysis
  3.1 Analysis of a Subsidy
  3.2 The Consumer Price Index
4 Indirect Utility and the Expenditure Function
  4.1 Indirect Utility
  4.2 Expenditure Function
5 Comparative Statics of Consumer Choice
  5.1 Change in Demand with Respect to Income, Engel Curves
  5.2 Change in Demand with Respect to Price
  5.3 Graphical Decomposition of a Change in Demand
  5.4 Substitution Effect
  5.5 Income Effect
6 Slutsky’s Equation
  6.1 Review
  6.2 Slutsky Decomposition
7 Using Market Level Demand Curves
  7.1 An Increase in Income
  7.2 Tax Incidence
8 Labor Supply
9 Intertemporal Consumption
10 Production and Cost I
  10.1 One-Factor Production and Cost Functions
    10.1.1 Production Functions
    10.1.2 Cost Functions
    10.1.3 Connection between MC and MP
    10.1.4 Geometry of c, AC, and MC
11 Production and Cost II
  11.1 Derivation of the Cost Function
  11.2 Marginal Cost
12 Cost Functions and IRFs
  12.1 Shephard’s Lemma
13 Supply
  13.1 Supply Determination
  13.2 The Law of Supply
  13.3 Changes in Input Prices
14 Input Demand for a Competitive Firm
15 Industry Supply
16 Monopoly I
  16.1 Monopolist’s Objective
  16.2 Comparative Statics
  16.3 Monopoly in Two or More Markets
17 Monopoly II
18 Consumer’s Surplus
19 Duopoly
  19.1 Monopolization
  19.2 Duopoly Equilibrium
  19.3 Price Setting vs. Quantity Setting
20 Symmetric Cournot Equilibria
  20.1 n-Firm Symmetric Cournot Equilibria
  20.2 Alternatives to the Cournot Assumption
21 Game Theory I
22 Game Theory II
  22.1 Tree Diagrams
  22.2 Interpretation
23 Uncertainty I: Income Lotteries
  23.1 Review of Basic Statistical Concepts
  23.2 Choices Over Uncertain Incomes
24 Uncertainty II: Expected Utility
  24.1 Expected Utility
  24.2 The Demand for Insurance
25 Uncertainty III: Moral Hazard
  25.1 Solution with No Moral Hazard
  25.2 A Partial Solution
26 Uncertainty IV: The State-preference Approach and Adverse Selection
  26.1 Setup
  26.2 Adverse Selection
27 Auctions I: Types of Auctions
  27.1 Basic Types of Auction
  27.2 Important Results Concerning the Private Values Case
  27.3 Bidding in a First-price Auction
28 Auctions II: Winner’s Curse
  28.1 Appendix: Order Statistics
    28.1.1 Uniform Distribution
    28.1.2 Normal Distribution
29 Finance I: Capital Asset Pricing Model
  29.1 Assumptions
  29.2 Conclusion
  29.3 Summary
30 Finance II: Efficient Market Hypothesis
  30.1 Review
  30.2 Efficient Market Hypothesis
31 Public and Near-public Goods
  31.1 Optimal Provision of Goods with No-rivalry Characteristics
    31.1.1 Case 1: one consumer; $x = t_1/p$
    31.1.2 Case 2: two consumers; $x = (t_1 + t_2)/p$
    31.1.3 Case 3: n consumers; $x = \tau/p$, where $\tau = \sum_{i=1}^{n} t_i$
  31.2 Appendix: Social Optimum with Ordinary Goods
32 Externalities
  32.1 Consumption Externalities
    32.1.1 Market Equilibrium
    32.1.2 Social Optimum
    32.1.3 Market Equilibrium versus Social Optimum
    32.1.4 Other Examples
  32.2 Production Externalities
33 Empirical Methods in Microeconomics
  33.1 Experiments and Counterfactuals
    33.1.1 The Self Sufficiency Project (SSP)
  33.2 Research Designs Based on Natural Experiments
    33.2.1 The Mariel Boatlift
  33.3 Natural Experiments with Several Control Groups
    33.3.1 The New Jersey Minimum Wage
  33.4 The Discontinuity Research Design
Course Description
This is a course in intermediate microeconomics, emphasizing the applications of calculus and linear
algebra to the problems of consumer choice, firm behavior, and market interactions. Students are
presumed to be familiar with multivariate calculus (including e.g. limits, derivatives, integrals) and
with basic statistics (random variables, moments, etc.). The course material will be presented in a
fairly mathematical way and the problem sets and examinations will require you to apply models
and derive results. Students who are concerned about their mathematical ability should consider
Econ 100A.
The basic text is Microeconomic Theory: Basic Principles and Extensions, by Nicholson & Snyder,
which should be available at the campus book store. An alternative, slightly more theoretical
treatment of the same material is Varian’s Intermediate Microeconomics: A Modern Approach.
Another, slightly more application-oriented alternative is Perloff’s Microeconomics: Theory and
Applications with Calculus. Any of these is a good supplement to the lectures, but the lectures
will be at a somewhat higher level, and will not follow the texts closely.
Problem sets and practice exams will be made available on the course website.
The GSIs will present some additional material in section (for which all students will be responsible)
and also will review the solutions to problem sets, practice exams, and problems from the lectures,
etc.
Problem sets will be assigned most weeks throughout the course. Completed problem sets
are due at the end of the last lecture each week. We will not accept late problem sets. Instead, we
drop your two worst scores. Thus, you can miss up to two problem sets without any penalty. You
are encouraged to work in groups but every student must hand in his or her own version of the
solutions.
Course grades will be determined by a combination of weekly problem sets (20 percent), two
midterm exams (15 percent each), and a final exam (50 percent). The midterm exams will be
held in class.
Lecture Topics
1 Methods of Optimization
2 Consumer Choice
3 Applications of Indifference Curve Analysis, Expenditure Function
4 Comparative Statics, Slutsky’s Equation
5 Market Level Demand and Supply
6 Labor Supply
7 Intertemporal Consumption & Savings
8–9 Production & Cost, Shephard’s Lemma
10–11 Supply Determination
12 Monopoly and Price Discrimination
13 Consumer/Producer Surplus & Applications
14–15 Duopoly
16–17 Game Theory
18–21 Uncertainty and Insurance Markets
22–23 Auctions
24–25 Finance: CAPM and Efficient Markets
26–27 Public Goods, Externalities
28 Empirical Methods in Microeconomics
1 Optimization
1.1 Unconstrained Optimization
Consider a smooth function $y = f(x)$. How do we go about finding a point $x^0$ such that
$y^0 = f(x^0) \ge f(x)$ for any $x$ in $[a, b]$?
Figure 1.1: In this picture $f(x^0) = \max_{a \le x \le b} f(x)$. (Read: “$f(x^0)$ is the maximum value of $f(x)$ when $x$
is selected from the interval $[a, b]$.”)
What can we say generally? If $x^0$ is a candidate for a maximizer, then it must be the case that
we can’t move around $x^0$ and reach a higher value of $f$. But this means $f'(x^0) = 0$.
Why? Let $0 < h \ll 1$.
If $f'(x) > 0$, then $f(x + h) \approx f(x) + h f'(x) > f(x)$.
If $f'(x) < 0$, then $f(x - h) \approx f(x) - h f'(x) > f(x)$.
This leads us to Rule 1:
If $f(x^0) = \max_{a \le x \le b} f(x)$ and $x^0$ lies strictly between $a$ and $b$, then $f'(x^0) = 0$.
This is called the first order necessary condition (FONC) for an interior maximum.
Does $f'(x^0) = 0$ always mean that $x^0$ is a maximizer? Are there maximizers with $f'(x^0) \ne 0$?
Consider the examples illustrated in Figure 1.3.
How can we be certain that we have located a maximum (not a minimum, nor an inflection point)?
We examine the properties of $f'(x)$, which is itself a function of $x$. Take a look at Figure 1.4. As
the function $f'$ crosses $x^0$ from left to right, it goes from positive to negative, i.e. it’s decreasing.
On the other hand, as $f'$ crosses $x^1$ from left to right, it goes from negative to positive, i.e. it’s
increasing. In general, at a local maximum $f'(x)$ has negative slope, or in other words $f''(x) < 0$,
while at a local minimum $f'(x)$ has positive slope, that is $f''(x) > 0$.
These considerations lead us to Rule 2:
If $f'(x^0) = 0$ and $f''(x^0) < 0$, then $f(x^0)$ is a local maximum.
If $f'(x^0) = 0$ and $f''(x^0) > 0$, then $f(x^0)$ is a local minimum.
Figure 1.2: Notice that Rule 1 also holds for a function of several variables.
Figure 1.3: Exceptions to the converse of Rule 1: (a) $f(x) = x$. Thus $f(b) = \max_{a \le x \le b} f(x)$ even though
$f'(b) = 1 \ne 0$. The maximum occurs on the boundary. (b) $f'(x) = 0$ has two solutions, $x'$
and $x''$, but neither one is a maximizer. $f(x')$ is a local maximum while $f(x'')$ is a minimum.
(c) $f(x) = x^3$. Solving $f'(x) = 0$ gives $x = 0$, which is an inflection point.
Figure 1.4: Properties of $f'(x)$: at a local max $f'$ is decreasing since the slopes of the tangent lines go from positive to
negative. The reverse is true for a local min.
This generalizes to two or more dimensions.
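Rules 1 and 2 can be checked numerically at a candidate point. The following is a minimal sketch rather than part of the original notes: the test function $f(x) = 3x - x^3$, the step sizes, and the tolerance are illustrative assumptions, and the derivatives are approximated by central differences.

```python
def derivative(f, x, h=1e-5):
    """Central-difference approximation to f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)


def second_derivative(f, x, h=1e-4):
    """Central-difference approximation to f''(x)."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / (h * h)


def classify(f, x0, tol=1e-6):
    """Apply Rule 1 (FONC) and then Rule 2 (SOC) at the candidate point x0."""
    if abs(derivative(f, x0)) > tol:
        return "not a critical point"
    curvature = second_derivative(f, x0)
    if curvature < -tol:
        return "local maximum"
    if curvature > tol:
        return "local minimum"
    return "inconclusive"  # e.g. an inflection point such as x = 0 for x**3


def f(x):
    # Hypothetical example: f'(x) = 3 - 3x^2 vanishes at x = -1 and x = 1.
    return 3 * x - x ** 3


def cube(x):
    # Case (c) of Figure 1.3: f'(0) = 0 but x = 0 is an inflection point.
    return x ** 3


for name, func, point in [("f", f, 1.0), ("f", f, -1.0), ("cube", cube, 0.0)]:
    print(name, point, classify(func, point))
```

The "inconclusive" branch is deliberate: when $f'(x^0) = 0$ and $f''(x^0) = 0$, Rule 2 gives no verdict.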

How do we determine whether a local maximum is a global maximum? If $f''(x) < 0$ for all $x$ and
$f'(x^0) = 0$, then $x^0$ is a global maximum. A function $f$ such that $f''(x) < 0$ for all $x$ is
concave (see Appendix 1.3).
Figure 1.5: A concave function always lies below any line tangent to its graph.
1.2 Constrained Optimization
Now we consider maximizing a function $f(x_1, x_2)$ subject to (“s.t.”) some constraint on $x_1$ and $x_2$,
which we denote by $g(x_1, x_2) = g^0$. The two important examples of this in economics are:
• In the study of consumer behavior, maximize utility $u(x_1, x_2)$ s.t. the budget constraint
$p_1 x_1 + p_2 x_2 = I$.
• In the study of firm behavior, maximize profit $py - wx$ s.t. the production function $y = f(x)$.
How do we go about a graphical analysis of the problem of maximizing $f(x_1, x_2)$ s.t. $g(x_1, x_2) = g^0$?
Figure 1.6: Illustration of the two-step approach described below.
A two-step approach:
1. Plot the contours of the function $g$. E.g. $g(x_1, x_2) = x_1^2 + x_2^2$; $g(x_1, x_2) = k$ is the equation of
a circle with radius $\sqrt{k}$ and center $O = (0, 0)$.
2. Plot the contours of the function $f$. E.g. $f(x_1, x_2) = x_1 x_2$; $f(x_1, x_2) = m$ is the equation of
a hyperbola.
The constrained maximum of the function $f$ occurs where a contour of $f$ is tangent to the contour
of $g$ corresponding to $g^0$. Why? Suppose we add a small amount $dx_1$ to $x_1$ in such a way as to
keep $g(x_1, x_2)$ constant. If so, then we must have a corresponding change in $x_2$ such that the
total differential of $g$ is zero, i.e.
$$dg = g_1(x_1, x_2)\,dx_1 + g_2(x_1, x_2)\,dx_2 = 0$$
(where $g_i$ denotes $\partial g/\partial x_i$), which implies
$$\frac{dx_2}{dx_1} = -\frac{g_1(x_1, x_2)}{g_2(x_1, x_2)}$$
If we increase $x_1$ by one unit, we must increase $x_2$ by $-g_1(x_1, x_2)/g_2(x_1, x_2)$ (or, equivalently,
decrease $x_2$ by $g_1(x_1, x_2)/g_2(x_1, x_2)$) in order to keep the value of $g$ constant. The net effect of
such a change in $x_1$ on the value of $f$ is
$$\begin{aligned}
df &= f_1(x_1, x_2)\,dx_1 + f_2(x_1, x_2)\,dx_2 \\
   &= f_1(x_1, x_2)\,dx_1 + f_2(x_1, x_2) \times \frac{dx_2}{dx_1}\,dx_1 \\
   &= \left[ f_1(x_1, x_2) - f_2(x_1, x_2) \times \frac{g_1(x_1, x_2)}{g_2(x_1, x_2)} \right] dx_1
\end{aligned}$$
Now in order for $(x_1^0, x_2^0)$ to be a constrained maximum, it must be the case that we cannot increase
$f$ by adding or subtracting a small amount to $x_1$ while keeping the value of $g$ constant. But this
means the above expression is 0 for all $dx_1$, or in other words
$$\frac{f_1(x_1, x_2)}{f_2(x_1, x_2)} = \frac{g_1(x_1, x_2)}{g_2(x_1, x_2)}$$
But this expression says that at $(x_1^0, x_2^0)$, the contours of $f$ and $g$ are tangent, i.e. have the same
slope. Note that this argument applies only if $(x_1^0, x_2^0)$ lies in the interior of the domain, for if $(x_1^0, x_2^0)$
lies on the boundary then we cannot increase or decrease one of $x_1$ or $x_2$.
How do we convert a constrained maximization problem into an unconstrained one? A French
mathematician named Lagrange noted that one gets the right answer by setting up an artificial,
unconstrained maximization problem with an additional variable, $\lambda$:
$$L(x_1, x_2, \lambda) = f(x_1, x_2) - \lambda[g(x_1, x_2) - g^0]$$
The FONC for $L$ with respect to $x_1$, $x_2$, and $\lambda$ are:
$$L_1 = f_1(x_1, x_2) - \lambda g_1(x_1, x_2) = 0$$
$$L_2 = f_2(x_1, x_2) - \lambda g_2(x_1, x_2) = 0$$
$$L_\lambda = -[g(x_1, x_2) - g^0] = 0$$
Moving the $\lambda$ terms to the right-hand side and dividing the first of these by the second gives
$$\frac{f_1(x_1, x_2)}{f_2(x_1, x_2)} = \frac{g_1(x_1, x_2)}{g_2(x_1, x_2)}$$
while the third simply restates the constraint! Thus by writing down the Lagrangian $L$ and setting
its first derivatives equal to zero we get the necessary conditions for a constrained maximum.
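To make the Lagrangian conditions concrete, here is a small sketch using the two example functions from the two-step approach, $f(x_1, x_2) = x_1 x_2$ and $g(x_1, x_2) = x_1^2 + x_2^2 = k$. The value $k = 8$, the restriction to the first quadrant, and the function names are assumptions made for illustration. The FONC give $x_1 = x_2 = \sqrt{k/2}$ and $\lambda = 1/2$, and a brute-force search along the constraint is used as an independent check.

```python
import math


def solve_by_lagrange(k):
    """Closed-form solution of the FONC for f = x1*x2, g = x1^2 + x2^2 = k.

    x2 - 2*lam*x1 = 0 and x1 - 2*lam*x2 = 0 give x1 = x2 and lam = 1/2
    (first quadrant); the constraint then gives x1 = x2 = sqrt(k/2).
    """
    x = math.sqrt(k / 2)
    return x, x, 0.5


def fonc_residuals(x1, x2, lam, k):
    """Left-hand sides of the three first order conditions (all should be ~0)."""
    return (x2 - lam * 2 * x1, x1 - lam * 2 * x2, x1 * x1 + x2 * x2 - k)


def solve_by_search(k, steps=100_000):
    """Brute force: walk along the circle x1^2 + x2^2 = k in the first quadrant."""
    r = math.sqrt(k)
    best = (-1.0, 0.0, 0.0)  # (f, x1, x2)
    for i in range(steps + 1):
        theta = (math.pi / 2) * i / steps
        x1, x2 = r * math.cos(theta), r * math.sin(theta)
        best = max(best, (x1 * x2, x1, x2))
    return best


k = 8.0  # hypothetical constraint level
x1, x2, lam = solve_by_lagrange(k)
print("Lagrange:", x1, x2, lam, fonc_residuals(x1, x2, lam, k))
print("Search:  ", solve_by_search(k))
```

Both approaches should agree up to the resolution of the grid, which is the point of the tangency condition: no movement along the constraint raises $f$.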
We also get a new variable, $\lambda$, called the Lagrange multiplier. How do we interpret $\lambda$? It turns out
that the value of $\lambda$ tells us how much the maximum value of $f$ changes if we relax the constraint by a
small amount. Specifically, suppose we are to maximize $f(x_1, x_2)$ s.t. the constraint $g(x_1, x_2) = g^0$.
Call the solution $(x_1^0, x_2^0)$. Now suppose we relax the constraint and instead maximize $f(x_1, x_2)$ s.t.
$g(x_1, x_2) = g^0 + dg^0$. How do we change our optimal choices of $x_1$ and $x_2$? Suppose we decide to
use more $x_1$, enough to use up the added constraint. Since the total differential of $g$ is
$$dg = g_1(x_1, x_2)\,dx_1 + g_2(x_1, x_2)\,dx_2$$
if we change only $x_1$ (that is, if $dx_2 = 0$), the amount we can change $x_1$ while satisfying the new
constraint is
$$dx_1 = \frac{1}{g_1(x_1, x_2)}\,dg^0$$
The increase in $f$ that accompanies this increase in $x_1$ is
$$df = f_1(x_1, x_2)\,dx_1 = \frac{f_1(x_1, x_2)}{g_1(x_1, x_2)}\,dg^0 = \lambda\,dg^0$$
so $f$ rises by $\lambda$ per unit of added constraint. You are encouraged to check for yourself that if you were
to use up the added constraint on $x_2$, $df$ would again be $\lambda\,dg^0$. This suggests another interpretation
of the tangency condition: at a maximum, if we had a bit more constraint, then we would be indifferent
as to whether to use it on $x_1$ or $x_2$.
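The interpretation of $\lambda$ as the gain in the maximum value per unit of relaxed constraint can be checked numerically. The sketch below again assumes the illustrative problem $f = x_1 x_2$ s.t. $x_1^2 + x_2^2 = k$ (not a problem solved in the notes), finds the constrained maximum by a ternary search over the angle that parametrises the circle, and compares $\lambda = f_1/g_1$ with a finite-difference estimate of the change in the maximum when $k$ is increased slightly.

```python
import math


def max_value(k, iters=200):
    """Maximise f = x1*x2 on x1^2 + x2^2 = k (first quadrant) by ternary search.

    The constraint is parametrised as x1 = r*cos(t), x2 = r*sin(t), along
    which f is single-peaked, so ternary search applies.
    Returns (maximum of f, x1, x2).
    """
    r = math.sqrt(k)

    def f(theta):
        return (r * math.cos(theta)) * (r * math.sin(theta))

    lo, hi = 0.0, math.pi / 2
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1) < f(m2):
            lo = m1
        else:
            hi = m2
    theta = (lo + hi) / 2
    return f(theta), r * math.cos(theta), r * math.sin(theta)


k, dk = 8.0, 1e-3  # hypothetical constraint level and small relaxation
value, x1, x2 = max_value(k)

# Multiplier from the FONC: lambda = f1 / g1 = x2 / (2 * x1).
lam = x2 / (2 * x1)

# Change in the maximised value per unit of relaxed constraint.
slope = (max_value(k + dk)[0] - value) / dk

print("lambda from FONC:         ", lam)
print("d(max f)/dk (finite diff):", slope)
```

The two printed numbers should agree closely; computing `x1 / (2 * x2)` instead (using up the extra constraint on $x_2$) gives the same multiplier at the optimum.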
As with unconstrained optimization, there are also second order conditions. These can be expressed
algebraically; however, they amount to the condition that the objective function has contours that
are “more convex” than the constraint (see Appendix 1.3).
Figure 1.7: (a) Contours of $f$ are more convex than $g(x_1, x_2) = g^0$: SOC satisfied. (b) Contours of $f$ are
linear, less convex than $g(x_1, x_2) = g^0$: SOC not satisfied.
1.3 Appendix
1.3.1 Convexity
A set $S \subseteq \mathbb{R}^2$ is convex if, for every pair of points $u = (u_1, u_2)$ and $v = (v_1, v_2)$ in $S$,
$$\alpha \in [0, 1] \implies \alpha u + (1 - \alpha)v \in S$$
i.e. the line segment joining $u$ and $v$ lies entirely in $S$. A set that is not convex is called concave.
A function $f : [a, b] \to \mathbb{R}$ is called convex if, for every $x_1$ and $x_2$ in $[a, b]$,
$$\alpha \in [0, 1] \implies f(\alpha x_1 + (1 - \alpha)x_2) \le \alpha f(x_1) + (1 - \alpha)f(x_2)$$
Or, equivalently, $f : [a, b] \to \mathbb{R}$ is convex if the set $S = \{(x, y) \in [a, b] \times \mathbb{R} : y \ge f(x)\}$ is convex. A
function $g : [a, b] \to \mathbb{R}$ is called concave if $-g$ is convex. Let $f$ be twice differentiable. Then
$$f \text{ is convex} \iff f''(x) \ge 0 \text{ for all } x$$
$$f \text{ is concave} \iff f''(x) \le 0 \text{ for all } x$$
If the inequality is strict for all $x$, then $f$ is strictly convex (respectively strictly concave).
Throughout these notes, if $f''(x) > g''(x) > 0$ on some interval, then we shall think of $f$ as
being “more convex” than $g$ there; likewise, if $f''(x) < g''(x) < 0$, we shall think of $f$ as being “more concave” than $g$.
A function $f : \mathbb{R}^2 \to \mathbb{R}$ is quasi-concave if $S_k = \{(x, y) \in \mathbb{R}^2 : f(x, y) \ge k\}$ is convex for all $k$. (The
sets $S_k$ are called upper contour sets.)
1.3.2 SOC in Higher Dimensions
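Let $f : \mathbb{R}^n \to \mathbb{R}$, i.e. let $z = f(x_1, \dots, x_n)$, and define the Hessian $H(f)$ to be the matrix
$$H(f) = \begin{pmatrix}
\frac{\partial^2 f}{\partial x_1^2} & \frac{\partial^2 f}{\partial x_1 \partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_1 \partial x_n} \\
\frac{\partial^2 f}{\partial x_2 \partial x_1} & \frac{\partial^2 f}{\partial x_2^2} & \cdots & \frac{\partial^2 f}{\partial x_2 \partial x_n} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial^2 f}{\partial x_n \partial x_1} & \frac{\partial^2 f}{\partial x_n \partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_n^2}
\end{pmatrix}$$
Next, define $H_i(f)$ to be the $i$th leading principal submatrix of $H(f)$, i.e. the submatrix comprised of the
first $i$ rows and the first $i$ columns of $H(f)$. Its determinant $|H_i(f)|$ is the $i$th leading principal minor. For example
$$H_2(f) = \begin{pmatrix}
\frac{\partial^2 f}{\partial x_1^2} & \frac{\partial^2 f}{\partial x_1 \partial x_2} \\
\frac{\partial^2 f}{\partial x_2 \partial x_1} & \frac{\partial^2 f}{\partial x_2^2}
\end{pmatrix}$$
If, at $z^0 = f(x_1^0, \dots, x_n^0)$, $|H_i(f)| > 0$ for all $i$, then $z^0$ satisfies the SOC for a local minimum. On
the other hand, if $\mathrm{sgn}(|H_i(f)|) = (-1)^i$ for all $i$, then $z^0$ satisfies the SOC for a local maximum.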
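The sign test on the determinants $|H_i(f)|$ is mechanical enough to sketch in code. The example below is illustrative only: the three Hessians are hand-computed for hypothetical functions named in the comments, and the determinant routine uses a simple cofactor expansion that is adequate for the small matrices considered here.

```python
def det(m):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    n = len(m)
    if n == 1:
        return m[0][0]
    total = 0.0
    for j in range(n):
        sub = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * m[0][j] * det(sub)
    return total


def leading_minors(h):
    """|H_1|, |H_2|, ..., |H_n|: determinants of the top-left i-by-i blocks."""
    return [det([row[:i] for row in h[:i]]) for i in range(1, len(h) + 1)]


def classify_soc(h):
    """Apply the sign conditions on the leading principal minors."""
    minors = leading_minors(h)
    if all(d > 0 for d in minors):
        return "SOC for a local minimum"
    # sgn|H_i| = (-1)^i: negative for odd i, positive for even i.
    if all((d < 0) if i % 2 == 1 else (d > 0)
           for i, d in enumerate(minors, start=1)):
        return "SOC for a local maximum"
    return "neither SOC satisfied"


# Hand-computed Hessians of three hypothetical functions:
bowl = [[2.0, 0.0], [0.0, 2.0]]      # f = x1^2 + x2^2
hill = [[-2.0, 1.0], [1.0, -2.0]]    # f = -x1^2 - x2^2 + x1*x2
saddle = [[0.0, 1.0], [1.0, 0.0]]    # f = x1*x2

for name, h in [("bowl", bowl), ("hill", hill), ("saddle", saddle)]:
    print(name, leading_minors(h), classify_soc(h))
```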
2 Consumer Choice
In this section we apply the methods of optimization of Section 1 to the analysis of consumer choice
subject to a budget constraint. The problem has three elements:
1. Describe the budget constraint.
2. Describe the consumer’s objective, i.e. his or her utility.
3. Set up and solve the constrained optimization.
2.1 Budget Constraint
We assume that a consumer must choose among bundles $(x_1, \dots, x_n)$ of commodities 1 through $n$
that fall within his or her budget. In the case of just two goods $x_1$ and $x_2$, let their prices be $p_1$
and $p_2$, respectively. Let the consumer have income $I$. Then the bundle $(x_1, x_2)$ is affordable iff
$p_1 x_1 + p_2 x_2 \le I$.
Figure 2.1: Graphically, the set of affordable bundles (the budget set) is the triangular region bounded
by the coordinate axes and the line $x_2 = (-p_1/p_2)x_1 + I/p_2$.
Note the following:

• if all income is spent on x1 , the total amount available is I/p1 (and likewise for x2 )
• we are implicitly assuming that you cannot buy negative amounts of x1 or x2
• the slope of the “budget line” (the outer boundary of the budget set) is −p1 /p2
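The description of the budget set translates directly into a small check. This is only a sketch; the prices, income, and function names are hypothetical.

```python
def affordable(bundle, prices, income):
    """True if the bundle is non-negative and costs no more than income."""
    if any(x < 0 for x in bundle):
        return False  # negative quantities are ruled out by assumption
    cost = sum(p * x for p, x in zip(prices, bundle))
    return cost <= income


def budget_line(p1, p2, income):
    """Intercepts and slope of the budget line p1*x1 + p2*x2 = income."""
    return {
        "x1_intercept": income / p1,  # all income spent on x1
        "x2_intercept": income / p2,  # all income spent on x2
        "slope": -p1 / p2,            # rate at which the market trades x2 for x1
    }


# Hypothetical numbers: p1 = 2, p2 = 4, I = 40.
print(budget_line(2.0, 4.0, 40.0))
print(affordable((10.0, 5.0), (2.0, 4.0), 40.0))  # exactly on the budget line
print(affordable((10.0, 6.0), (2.0, 4.0), 40.0))  # outside the budget set
```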
2.2 Consumer’s Objective
We seek a simple way of summarizing how the consumer evaluates alternative bundles, say $(x_1^0, x_2^0)$
and $(x_1^*, x_2^*)$.
Figure 2.2: If we give up one unit of x1 , we save p1 , which can be used to purchase p1 /p2 units of x2 .
The market trades x1 for x2 at the rate p1 /p2 . This ratio represents the relative price of x1
and x2 .
Graphically, the device we use is the indifference curve: a curve connecting bundles that are equally
good. Consider the indifference curve through $(x_1^0, x_2^0)$, i.e. the set of bundles that are “as good”
as $(x_1^0, x_2^0)$.
Now take a look at Figure 2.4. If both $x_1$ and $x_2$ are desirable, then bundles with more $x_1$ and
more $x_2$ must be preferred to $(x_1^0, x_2^0)$. By the same token, $(x_1^0, x_2^0)$ must be preferred to bundles
with less $x_1$ and less $x_2$. This means that indifference curves must have negative slope.
In more advanced treatments of economic theory, indifference curves are derived from a set of
assumptions about how consumers evaluate alternative bundles. Some types of preferences cannot
be represented by indifference curves. The classic example is “lexicographic preferences”: the
consumer evaluates a bundle $(x_1, x_2)$ first by the amount of $x_1$, then by the amount of $x_2$. If
$x_1^0 > x_1'$, then $(x_1^0, x_2^0)$ is strictly preferred to $(x_1', x_2')$ regardless of $x_2^0$ and $x_2'$. However, if $x_1^0 = x_1'$,
then the consumer compares $x_2^0$ and $x_2'$. (This is the same way alphabetical order works.) As an
exercise, try to graph the “indifference curves” of a consumer with lexicographic preferences.
Analytically, we represent preferences by a utility function u(x1 , x2 ) with domain equal to the set
of possible consumption bundles. We construct u such that higher values are preferred.
Examples:
• u(x1 , x2 ) = x1 x2
• u(x1 , x2 ) = x1 + x2
• u(x1 , x2 ) = min {x1 , x2 }
Facts:
• The contours of u are the indifference curves.
• The bundles $(x_1^0, x_2^0)$ and $(x_1', x_2')$ lie on the same indifference curve iff $u(x_1^0, x_2^0) = u(x_1', x_2')$.
Figure 2.3: How does a consumer decide between $(x_1^0, x_2^0)$ and $(x_1^*, x_2^*)$?
Figure 2.4: If both $x_1$ and $x_2$ are desirable, then it follows that indifference curves are downward-sloping.
• Let $h > 0$. If more of $x_1$ is always preferred, then $u(x_1 + h, x_2) > u(x_1, x_2)$, which implies
$u_1(x_1, x_2) > 0$ for every bundle $(x_1, x_2)$. (Likewise for $x_2$.) You are encouraged to verify this
for each of the above examples.
• The slope of the indifference curve through $(x_1, x_2)$, at $(x_1, x_2)$, is $-u_1(x_1, x_2)/u_2(x_1, x_2)$.
We call the absolute value of this ratio the marginal rate of substitution (MRS) because it
is the amount of $x_2$ the consumer would need to compensate for the loss of one unit of $x_1$,
or in other words the amount of $x_2$ needed, per unit of $x_1$ given up, in order to keep utility
constant.
Figure 2.5: The absolute value of the slope of the indifference curve through $(x_1^0, x_2^0)$ is $MRS = u_1(x_1^0, x_2^0)/u_2(x_1^0, x_2^0)$.
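Examples:
• $u(x_1, x_2) = x_1^\alpha x_2^\beta$ (Cobb-Douglas)
$$u_1(x_1, x_2) = \alpha x_1^{\alpha - 1} x_2^\beta$$
$$u_2(x_1, x_2) = \beta x_1^\alpha x_2^{\beta - 1}$$
$$MRS = \frac{u_1(x_1, x_2)}{u_2(x_1, x_2)} = \frac{\alpha}{\beta} \times \frac{x_2}{x_1}$$
• $u(x_1, x_2) = x_1 + x_2$
$$MRS = \frac{u_1}{u_2} = 1, \text{ a constant for every bundle } (x_1, x_2)$$
• $u(x_1, x_2) = 2 \log x_1 + x_2$
$$MRS = \frac{u_1}{u_2} = \frac{2/x_1}{1} = \frac{2}{x_1}, \text{ independent of } x_2$$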
As an exercise, graph the indifference curves for these three examples.
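The three MRS formulas above can be checked numerically by approximating the marginal utilities with central differences. The sketch below assumes the natural logarithm in the third example and picks $\alpha = 1/2$, $\beta = 1/4$ and the evaluation points arbitrarily; none of these values come from the notes.

```python
import math


def mrs(u, x1, x2, h=1e-6):
    """MRS = u1/u2, with the partial derivatives approximated by central differences."""
    u1 = (u(x1 + h, x2) - u(x1 - h, x2)) / (2 * h)
    u2 = (u(x1, x2 + h) - u(x1, x2 - h)) / (2 * h)
    return u1 / u2


ALPHA, BETA = 0.5, 0.25  # hypothetical Cobb-Douglas exponents


def cobb_douglas(x1, x2):
    return x1 ** ALPHA * x2 ** BETA


def linear(x1, x2):
    return x1 + x2


def quasi_linear(x1, x2):
    return 2 * math.log(x1) + x2  # natural log assumed


x1, x2 = 2.0, 3.0  # hypothetical bundle
print("Cobb-Douglas:", mrs(cobb_douglas, x1, x2), "formula:", (ALPHA / BETA) * (x2 / x1))
print("Linear:      ", mrs(linear, x1, x2), "formula:", 1.0)
print("Quasi-linear:", mrs(quasi_linear, x1, x2), "formula:", 2 / x1)
```

Evaluating `mrs(quasi_linear, x1, x2)` at several values of `x2` with `x1` held fixed is a quick way to see the "independent of $x_2$" property.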

Note: If your utility function is $u(x_1, x_2)$ and mine is $v(x_1, x_2) = a\,u(x_1, x_2) + b$, where $a > 0$, then
we have the same preferences. Why? It can be shown that we have the same indifference curves,
only with different labels. The result holds for $v = f(u)$, where $f$ is a monotonically increasing
function.
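One way to see that a monotonically increasing transformation leaves preferences unchanged is to compare how $u$ and $v = f(u)$ rank a set of bundles. The following sketch uses $u = x_1 x_2$ and a handful of hypothetical bundles; the decreasing transformation $-u$ is included only as a contrast.

```python
import math
from itertools import combinations


def same_ranking(u, v, bundles):
    """True if u and v order every pair of bundles the same way (ties included)."""
    for a, b in combinations(bundles, 2):
        du = u(*a) - u(*b)
        dv = v(*a) - v(*b)
        if (du > 0) != (dv > 0) or (du < 0) != (dv < 0):
            return False
    return True


def u(x1, x2):
    return x1 * x2


def affine(x1, x2):
    return 3 * u(x1, x2) + 7      # v = a*u + b with a > 0


def log_u(x1, x2):
    return math.log(u(x1, x2))    # another increasing transformation


def negated(x1, x2):
    return -u(x1, x2)             # decreasing: reverses the ranking


# Hypothetical bundles; (2, 3) and (3, 2) lie on the same indifference curve.
bundles = [(1, 1), (2, 3), (3, 2), (4, 1), (5, 5)]

print(same_ranking(u, affine, bundles))
print(same_ranking(u, log_u, bundles))
print(same_ranking(u, negated, bundles))
```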
You may be familiar with the concept of diminishing marginal rate of substitution (DMRS). Unless
stated otherwise, we shall assume DMRS in most of the examples throughout these notes.
Figure 2.6: (a) DMRS; (b) constant MRS; (c) increasing MRS.
Along an indifference curve (holding utility constant), the MRS decreases with $x_1$: the more $x_1$ one
obtains, the less one values an additional unit of $x_1$ in terms of $x_2$. DMRS implies that consumers
always prefer averages. Suppose we have two distinct bundles $(x_1^0, x_2^0)$ and $(x_1', x_2')$ on the same indifference
curve. Then a bundle that is a weighted average of $(x_1^0, x_2^0)$ and $(x_1', x_2')$, e.g.
$\alpha(x_1^0, x_2^0) + (1 - \alpha)(x_1', x_2')$, where $0 < \alpha < 1$, is strictly preferred to either of the original bundles.
Figure 2.7: The dashed line represents the set of all weighted averages of $x^0$ and $x^*$, that is, the set
$S = \{\alpha x^0 + (1 - \alpha)x^* : 0 < \alpha < 1\}$. Clearly these are strictly preferred to both $x^0$ and $x^*$.
Equivalently, the set $\{x \in \mathbb{R}^2 : u(x) > u(x^0)\}$ is convex. (One can see this by noting the
shape of the region above the indifference curve.)
It is important to understand that DMRS is not the same as diminishing marginal utility, nor does
either one imply the other. Given a utility function $u$, the marginal utility of $x_1$ is $u_1$. We say that $u$
exhibits diminishing marginal utility if $u_{11} = (u_1)_1 < 0$. However, the sign of $u_{11}$ says nothing
about the MRS, as the following examples show:
• $u(x_1, x_2) = (x_1^2 + x_2^2)^{1/4}$
$u_1(x_1, x_2) = (1/2)\,x_1 (x_1^2 + x_2^2)^{-3/4}$
$u_{11}(x_1, x_2) = (1/4)(2x_2^2 - x_1^2)(x_1^2 + x_2^2)^{-7/4} < 0$ whenever $x_1^2 > 2x_2^2$ $\implies$ decreasing marginal
utility of $x_1$ at such bundles, but the indifference curves are circles, which exhibit increasing MRS.
• $u(x_1, x_2) = x_1^3 x_2^3$
$u_1(x_1, x_2) = 3x_1^2 x_2^3$
$u_{11}(x_1, x_2) = 6x_1 x_2^3 > 0$ $\implies$ increasing marginal utility but the indifference curves are
hyperbolas, which exhibit DMRS.
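The two examples can be spot-checked numerically. The sketch below evaluates $u_{11}$ by a second difference at the sample bundle $(2, 1)$ and compares the MRS at two bundles on the same indifference curve, $(1, 2)$ and $(2, 1)$; the choice of bundles and step sizes is an assumption made purely for illustration.

```python
def u11(u, x1, x2, h=1e-4):
    """Second-difference approximation to the own second partial of u in x1."""
    return (u(x1 + h, x2) - 2 * u(x1, x2) + u(x1 - h, x2)) / (h * h)


def mrs(u, x1, x2, h=1e-6):
    """MRS = u1/u2 via central differences."""
    u1 = (u(x1 + h, x2) - u(x1 - h, x2)) / (2 * h)
    u2 = (u(x1, x2 + h) - u(x1, x2 - h)) / (2 * h)
    return u1 / u2


def circle(x1, x2):
    return (x1 ** 2 + x2 ** 2) ** 0.25


def cubic(x1, x2):
    return x1 ** 3 * x2 ** 3


# (1, 2) and (2, 1) lie on the same indifference curve for both utility functions.
for name, u in [("circle", circle), ("cubic", cubic)]:
    print(name,
          "u11 at (2,1):", u11(u, 2.0, 1.0),
          "MRS at (1,2):", mrs(u, 1.0, 2.0),
          "MRS at (2,1):", mrs(u, 2.0, 1.0))
```

For the first function the MRS rises as $x_1$ increases along the curve even though $u_{11}$ is negative at $(2, 1)$; for the second the MRS falls even though $u_{11}$ is positive.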
2.3 Consumer’s Optimum
Analytically, the consumer’s problem is to solve
$$\max_{x_1, x_2} u(x_1, x_2) \quad \text{s.t.} \quad p_1 x_1 + p_2 x_2 = I$$
Have a look at Figure 2.8. Clearly, a bundle $(x_1^0, x_2^0)$ is optimal if two things are true:
1. $p_1 x_1^0 + p_2 x_2^0 = I$,
2. $MRS(x_1^0, x_2^0) = p_1/p_2$.
Figure 2.8: The consumer chooses the bundle that lands her on the highest indifference curve while still
lying on the budget line.
Condition (2), the tangency condition, expresses the simple fact that if $(x_1^0, x_2^0)$ is optimal, then
there are no gains to be made by trading in the market any further. If $MRS > p_1/p_2$, then the
consumer values $x_1$ more than the market does, in terms of $x_2$, so it would benefit the consumer
to sell $x_2$ and buy more $x_1$, as you can see in Figure 2.9.
Figure 2.9: $MRS > p_1/p_2$. On the margin, the consumer values $x_1$ more than the market does, in terms
of $x_2$, and there is room for a profitable trade! What happens if $MRS < p_1/p_2$?
To proceed analytically, let’s use the Lagrangian method:
$$L(x_1, x_2, \lambda) = u(x_1, x_2) - \lambda(p_1 x_1 + p_2 x_2 - I)$$
$$L_1 = u_1(x_1, x_2) - \lambda p_1 = 0 \tag{2.1}$$
$$L_2 = u_2(x_1, x_2) - \lambda p_2 = 0 \tag{2.2}$$
$$L_\lambda = -p_1 x_1 - p_2 x_2 + I = 0 \tag{2.3}$$
Dividing (2.1) by (2.2) gives the tangency condition
$$\frac{u_1(x_1, x_2)}{u_2(x_1, x_2)} = \frac{p_1}{p_2}$$
Also,
$$\lambda = \frac{u_1(x_1, x_2)}{p_1} = \frac{u_2(x_1, x_2)}{p_2}$$
With an extra dollar to spend one could either
(a) buy $1/p_1$ units of $x_1$ and increase utility by $u_1(x_1, x_2)/p_1 = \lambda$, or
(b) buy $1/p_2$ units of $x_2$ and increase utility by $u_2(x_1, x_2)/p_2 = \lambda$.
For this reason, $\lambda$ is sometimes called the marginal utility of income.
For example, if $u(x_1, x_2) = x_1 x_2$, then $L = x_1 x_2 - \lambda(p_1 x_1 + p_2 x_2 - I)$, and the FONC are:
$$L_1 = x_2 - \lambda p_1 = 0$$
$$L_2 = x_1 - \lambda p_2 = 0$$
$$L_\lambda = -p_1 x_1 - p_2 x_2 + I = 0$$
Therefore, $x_1 = \lambda p_2$ and $x_2 = \lambda p_1$. Plugging these results back into (2.3):
$$p_1(\lambda p_2) + p_2(\lambda p_1) = I
\implies 2p_1 p_2 \lambda = I
\implies \lambda = \frac{I}{2p_1 p_2}
\implies \begin{cases} x_1 = x_1(p_1, p_2, I) = I/(2p_1), \\ x_2 = x_2(p_1, p_2, I) = I/(2p_2) \end{cases}$$
The functions $x_1(p_1, p_2, I)$ and $x_2(p_1, p_2, I)$ are called the demand functions. Notice that
$p_1 x_1 = p_2 x_2 = I/2$, so the consumer spends half his or her income on each good! As an exercise, re-do the
analysis for $u(x_1, x_2) = x_1^\alpha x_2^\beta$ with different values of $\alpha$ and $\beta$.
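The demand functions derived above can be verified by maximizing utility directly along the budget line. The sketch below does this with a ternary search over $x_1$ (utility is single-peaked along the budget line for these examples). The prices and income are hypothetical, and the general Cobb-Douglas formula in `cobb_douglas_demand` (expenditure shares $\alpha/(\alpha+\beta)$ and $\beta/(\alpha+\beta)$) is the standard result for the exercise, stated here without derivation.

```python
def demand_numeric(u, p1, p2, income, iters=200):
    """Maximise u along the budget line p1*x1 + p2*x2 = income by ternary search."""

    def on_line(x1):
        return u(x1, (income - p1 * x1) / p2)

    lo, hi = 0.0, income / p1
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if on_line(m1) < on_line(m2):
            lo = m1
        else:
            hi = m2
    x1 = (lo + hi) / 2
    return x1, (income - p1 * x1) / p2


def cobb_douglas_demand(alpha, beta, p1, p2, income):
    """Standard closed form: spend the share alpha/(alpha+beta) of income on x1."""
    share1 = alpha / (alpha + beta)
    return share1 * income / p1, (1 - share1) * income / p2


p1, p2, income = 2.0, 5.0, 100.0  # hypothetical prices and income

# u = x1*x2: the notes derive x1 = I/(2*p1), x2 = I/(2*p2).
print(demand_numeric(lambda a, b: a * b, p1, p2, income),
      (income / (2 * p1), income / (2 * p2)))

# u = x1 * x2^3, i.e. alpha = 1, beta = 3.
print(demand_numeric(lambda a, b: a * b ** 3, p1, p2, income),
      cobb_douglas_demand(1, 3, p1, p2, income))
```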
2.4 Special Problems
• Preferences do not satisfy DMRS (Figure 2.10). Often, we restrict preferences by requiring the
indifference curves to be convex to the origin. (Utility functions with this property are called
quasi-concave. A function $u : \mathbb{R}^2 \to \mathbb{R}$ is quasi-concave if the upper contour sets
$S_k = \{(x_1, x_2) \in \mathbb{R}^2 : u(x_1, x_2) \ge k\}$ are convex for all $k$.)
• Even with quasi-concave preferences, i.e. with convex indifference curves, we still can run into
problems (Figure 2.11). Most consumers consume zero units of most goods, so the endpoint
problem is potentially one that economists must deal with. The problem is much worse the
more narrowly goods are defined (e.g. Coke versus Pepsi) and becomes less serious the
more broadly they are defined (e.g. beverages in general). A considerable amount of applied
research regarding consumer demand involves the so-called discrete choice approach, focusing
on whether consumers buy some or none of a given commodity. Daniel McFadden won the
Nobel Prize for his research showing how to link the “buy, don’t buy” decision to underlying
utility functions.
Figure 2.10: (a) Indifference curves exhibit constant MRS (CMRS), and there is no bundle with $MRS = p_1/p_2$. (b)
$MRS = p_1/p_2$ but this point is not a maximum. What’s wrong?
Figure 2.11: Endpoint optima: (a) $MRS < p_1/p_2$, $(x_1, x_2) = (0, I/p_2)$; (b) $MRS > p_1/p_2$, $(x_1, x_2) = (I/p_1, 0)$.
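An endpoint optimum is easy to reproduce with the linear utility function $u = x_1 + x_2$ from Section 2.2, whose MRS equals 1 at every bundle. The sketch below searches a grid along the budget line, endpoints included; the prices and income are hypothetical.

```python
def best_on_budget_line(u, p1, p2, income, steps=1000):
    """Grid search for the utility-maximising bundle on the budget line."""
    best_bundle, best_u = None, float("-inf")
    for i in range(steps + 1):
        x1 = (income / p1) * i / steps
        x2 = (income - p1 * x1) / p2
        if u(x1, x2) > best_u:
            best_bundle, best_u = (x1, x2), u(x1, x2)
    return best_bundle


def linear(x1, x2):
    return x1 + x2  # MRS = 1 everywhere


# MRS = 1 > p1/p2 = 0.5: the consumer buys only x1, as in panel (b).
print(best_on_budget_line(linear, 1.0, 2.0, 10.0))

# MRS = 1 < p1/p2 = 1.5: the consumer buys only x2, as in panel (a).
print(best_on_budget_line(linear, 3.0, 2.0, 10.0))
```

In neither case does the tangency condition hold at the chosen bundle, which is why endpoint optima have to be checked separately from the FONC.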
3 Two Applications of Indifference Curve Analysis
We have seen that the consumer’s optimum is represented by a tangency between an indifference
curve and the budget constraint. This condition expresses the simple economic idea that the
consumer, on the margin, cannot adjust her consumption bundle to spend the same amount of
money and simultaneously achieve higher utility. Recall that the tangency condition holds only
when the indifference curves exhibit DMRS and we don’t have an endpoint optimum.
3.1 Analysis of a Subsidy
In many economies, certain commodities are subsidized by the government. A subsidy is a negative
tax that is usually introduced to aid low income consumers. Economists generally argue that
subsidies are inefficient. Why?
Let there be two commodities: food f and “other stuff” x. The price of other stuff is px , and
the price of food is pf . A typical consumer has income I and normal preferences (quasi-concave
utility, so that indifference curves exhibit DMRS). The budget constraint is px x + pf f = I. See Figure 3.1.
Figure 3.1: Budget constraints with and without food subsidy. (x∗ , f ∗ ) denotes the optimal choice under
the subsidy arrangement.
Suppose now that a subsidy of $s per unit is introduced on food. The budget constraint becomes
px x + (pf − s)f = I. If the consumer chooses the bundle (x∗ , f ∗ ), then the cost of the subsidy to
the government (for this consumer alone) is $sf ∗ . Most economists would argue that you should
instead give the consumer $sf ∗ directly and leave the price of food alone. To see this, suppose the
lump sum is given to the consumer directly, but she is forced to pay the market, unsubsidized price
for food. In this case her budget constraint is
px x + pf f = I + sf ∗ (3.1)
Notice that the bundle (x∗ , f ∗ ) satisfies this budget constraint, since under the subsidy
px x∗ + (pf − s)f ∗ = I
which rearranges to px x∗ + pf f ∗ = I + sf ∗ .
In other words, if I give the consumer $sf ∗ she still can afford (x∗ , f ∗ ). But she can do even better,
as shown in Figure 3.2.
Figure 3.2: The unsubsidized budget constraint corresponding to I + sf ∗ cuts the original indifference
curve and therefore enables the consumer to achieve higher utility.
The reason is that the budget line (3.1), with the lump sum, is flatter than the budget line with
the subsidy. They both pass through (x∗ , f ∗ ), so the budget line (3.1) cuts through an indifference
curve and therefore enables the consumer to choose a bundle with higher utility.
Figure 3.3 illustrates the same point.
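The comparison between the per-unit subsidy and the equal-cost lump sum can be checked numerically. The following is a minimal sketch, not part of the original notes: it assumes Cobb-Douglas preferences u = x^a f^(1−a), and every number and function name in it is hypothetical.

```python
# Illustrative sketch only. ASSUMPTION: Cobb-Douglas utility u = x^a * f^(1-a).

def cobb_douglas_choice(px, pf, income, a):
    """Optimal (x, f): spend share a of income on x and share 1 - a on food."""
    return a * income / px, (1 - a) * income / pf


def utility(x, f, a):
    return x ** a * f ** (1 - a)


px, pf, income, s, a = 1.0, 4.0, 100.0, 1.0, 0.6  # hypothetical numbers

# Per-unit subsidy: the consumer faces the food price pf - s.
x_sub, f_sub = cobb_douglas_choice(px, pf - s, income, a)
subsidy_cost = s * f_sub  # cost to the government, s * f*

# Equal-cost lump sum: unsubsidized prices, income I + s * f*.
x_lump, f_lump = cobb_douglas_choice(px, pf, income + subsidy_cost, a)

u_sub = utility(x_sub, f_sub, a)
u_lump = utility(x_lump, f_lump, a)

# The subsidized bundle lies exactly on the lump-sum budget line (3.1).
still_affordable = abs(px * x_sub + pf * f_sub - (income + subsidy_cost)) < 1e-9

print(f"subsidy : x={x_sub:.2f}, f={f_sub:.2f}, u={u_sub:.3f}")
print(f"lump sum: x={x_lump:.2f}, f={f_lump:.2f}, u={u_lump:.3f}")
```

Under these assumed preferences the lump sum leaves the consumer with less food, more of the other good, and higher utility, at the same cost to the government.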
3.2 The Consumer Price Index
The CPI is a measure of how much it costs today (in today’s dollars) to buy a fixed bundle of
commodities. We currently use 1982-84 as our reference period, which means the CPI is calculated
by finding the cost of the bundle relative to its cost in 1982-84, which is normalized to 100.
Suppose the CPI is 177.5, (which it was in July 2001). That means it now costs 1.775 times as
much to purchase the “standard bundle” as it did on average in 1982-84. If someone earns 1.78
times as much as he did in the early 80s, then he is at least as well off as he was then.
Does your nominal income necessarily have to rise in proportion with the CPI? Suppose that in
1983 you purchased $(x^0, y^0)$ at prices $(p_x^0, p_y^0)$. Your income was $I^0$, and
\[ p_x^0 x^0 + p_y^0 y^0 = I^0 \]
Now suppose that in 2001 prices are $(p_x^0(1+\pi),\, p_y^0(1+\pi))$. In this case both prices increased at the
rate of $\pi$. How much would your income have to increase in order to offset the increase in prices?
See Figure 3.4.
Figure 3.3: Note that ∆ = sf ∗ /px , or the subsidy at initial optimum, in terms of x.
On the other hand, suppose $p_x$ rises by $3\pi/2$ and $p_y$ rises by $\pi/2$, i.e.
\[ p_x = p_x^0\left(1 + \tfrac{3}{2}\pi\right), \qquad p_y = p_y^0\left(1 + \tfrac{1}{2}\pi\right). \]
The increase in the cost of living is represented by the increase in the cost of the reference bundle
(x0 , y 0 ):
\[ p_x^0\left(1 + \tfrac{3}{2}\pi\right)x^0 + p_y^0\left(1 + \tfrac{1}{2}\pi\right)y^0 - p_x^0 x^0 - p_y^0 y^0 = \tfrac{3}{2}\pi\, p_x^0 x^0 + \tfrac{1}{2}\pi\, p_y^0 y^0. \]
If you initially spent half your income on each of x and y, then $p_x^0 x^0 = p_y^0 y^0 = I^0/2$, and the
increase in the cost of living is
\[ \frac{3\pi}{2}\cdot\frac{I^0}{2} + \frac{\pi}{2}\cdot\frac{I^0}{2} = \pi I^0, \]
a proportional increase of $\pi$. But if your income increases by the proportion $\pi$, you are better off!
The reasoning is as follows: If your income increases by enough to allow you to buy (x0 , y 0 ) your
budget is represented by the dashed line. But with that budget, you will not consume (x0 , y 0 ); you
will consume a bundle with more y, less x, and higher utility. You respond to the change in relative
prices by altering your consumption. See Figure 3.5.
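The gap between the rise in the cost of the fixed bundle and the rise in income actually needed to hold utility constant can be computed for a specific case. The sketch below is an illustration under an assumed Cobb-Douglas utility with equal expenditure shares (the notes do not specify preferences here); the function names are hypothetical. For Cobb-Douglas preferences the minimum cost of a given utility level is proportional to px^a · py^(1−a), which gives the "true" index used below.

```python
# Illustrative sketch only. ASSUMPTION: Cobb-Douglas utility u = x^a * y^(1-a),
# so the base-period expenditure share of x equals a.

def fixed_bundle_index(share_x, rise_x, rise_y):
    """Proportional cost of the base-period bundle at the new prices (CPI-style)."""
    return share_x * (1 + rise_x) + (1 - share_x) * (1 + rise_y)


def true_cost_of_living_index(share_x, rise_x, rise_y):
    """Ratio of minimum expenditures needed to reach base-period utility.

    For Cobb-Douglas, e(px, py, u) is proportional to px^a * py^(1-a).
    """
    return (1 + rise_x) ** share_x * (1 + rise_y) ** (1 - share_x)


pi = 0.10   # hypothetical average inflation rate
a = 0.5     # half of income spent on each good, as in the text

cpi = fixed_bundle_index(a, 1.5 * pi, 0.5 * pi)
col = true_cost_of_living_index(a, 1.5 * pi, 0.5 * pi)
bias = cpi - col

print(f"fixed-bundle index: {cpi:.5f}")
print(f"true index:         {col:.5f}")
print(f"difference:         {bias:.5f}")
```

When both prices rise by the same proportion the two indexes coincide, so the difference arises only from the change in relative prices.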
The CPI is really a weighted average of prices for a fixed set of purchases. See Table 1 for an
example of some of the major categories and their weights. Note the slow growth of apparel prices
(usually attributed to the rapid rise in cheap imports) and the very rapid growth in medical prices.
Figure 3.4: If all prices rise by the same factor, the consumer is in fact worse off.
Figure 3.5: If some prices rise more than others, the new budget line, (assuming income rises in proportion
to CPI), cuts the original indifference curve.
Table 1: Major Purchase Categories in CPI and Corresponding Weights

Category           Weight    Price Index (Dec. 2000)
All                100.0     174.1
Food & Beverage     16.3     169.5
Housing             39.6     171.6
Apparel              4.7     131.8
Transportation      17.5     155.2
Medical              5.8     264.1
Recreation           6.0     103.7∗
Education            2.7     115.4∗
Communication        2.7      92.3∗
Other Items          4.7     276.2

∗ Reference period is Dec. 1997, not 1982-84.
The difference between the rate of increase in the average price of the reference bundle and the
minimum increase in income necessary in order to maintain the original level of utility is called the
substitution bias in the CPI. Note that it depends on two things: how disproportionately prices
for different goods are rising, and how convex one’s indifference curves are. The more convex the
indifference curves, and the more dispersion in relative price increases, the bigger the substitution
bias. The Boskin Commission estimates that on average substitution bias was about 0.5% per year
in the U.S. over the past couple of decades.
There are lots of other, bigger sources of bias in the CPI. One that is hard to measure is quality bias:
consumer goods change over time, which makes it hard to hold the reference bundle constant. Some
new inventions since the early 80s: CD/DVD players, airbags and anti-lock brakes, the internet,
laser printers, portable PCs, cell phones, The X-Files. Roughly speaking, quality changes are
handled in the CPI by attempting to subtract the part of any price change that is due to quality,
measured at the time the higher quality product is introduced. So, for example, when airbags first
became available manufacturers charged about $500 extra for them. Thus, when we compare the
price of a new car in 2001 that is equipped with airbags, to a similar model in 1990 without airbags,
we subtract $500 from the 2001 price before computing the price ratio.
4 Indirect Utility and the Expenditure Function
4.1 Indirect Utility
We characterized the solution to the problem
\[ \max_{x_1, x_2}\; u(x_1, x_2) \quad \text{s.t.} \quad p_1 x_1 + p_2 x_2 = I \]
as an optimal pair $(x_1^0, x_2^0)$ that satisfies the first order conditions (tangency, budget constraint).
Note that $(x_1^0, x_2^0)$ varies with $(p_1, p_2, I)$. We call the optimal choices at a given level of prices and
income the “demand functions” and write:
\[ x_1 = x_1^0(p_1, p_2, I), \qquad x_2 = x_2^0(p_1, p_2, I) \]
Note that $p_1 x_1^0(p_1, p_2, I) + p_2 x_2^0(p_1, p_2, I) = I$, so the demand functions satisfy the budget constraint
by definition, even as prices vary. This gives rise to restrictions on the demand functions.
The highest level of utility that can be achieved under $(p_1, p_2, I)$ is $u(x_1^0(p_1, p_2, I), x_2^0(p_1, p_2, I))$,
which is the utility of the optimal choices under the budget parameters. We define the indirect
utility function to be
\[
\begin{aligned}
v(p_1, p_2, I) &= \max_{x_1, x_2}\; u(x_1, x_2) \quad \text{s.t.} \quad p_1 x_1 + p_2 x_2 = I \\
&= u(x_1^0(p_1, p_2, I),\, x_2^0(p_1, p_2, I))
\end{aligned}
\]
It should be clear to the reader that v is decreasing in p1 and p2 , and increasing in I.

Example: $u(x_1, x_2) = x_1^{\alpha} x_2^{\beta}$, where $\alpha + \beta = 1$. We saw in Section 2.3 that $x_1^0(p_1, p_2, I) = \alpha I/p_1$
and $x_2^0(p_1, p_2, I) = \beta I/p_2$. Note that $x_1^0$ does not depend on $p_2$, and $x_2^0$ does not depend on $p_1$. The
indirect utility function is given by
\[ v(p_1, p_2, I) = \alpha^{\alpha} \beta^{\beta}\, p_1^{-\alpha} p_2^{-\beta}\, I \]
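The closed form for v can be checked against a direct evaluation of utility at the demand functions. This is a minimal sketch with hypothetical parameter values; the function names are illustrative and not part of the notes.

```python
# Sketch: Cobb-Douglas demands and indirect utility, with alpha + beta = 1.

def demands(p1, p2, income, alpha):
    """Demand functions x1 = alpha*I/p1, x2 = beta*I/p2."""
    beta = 1 - alpha
    return alpha * income / p1, beta * income / p2


def utility(x1, x2, alpha):
    return x1 ** alpha * x2 ** (1 - alpha)


def indirect_utility(p1, p2, income, alpha):
    """Closed form v = alpha^alpha * beta^beta * p1^-alpha * p2^-beta * I."""
    beta = 1 - alpha
    return alpha ** alpha * beta ** beta * p1 ** (-alpha) * p2 ** (-beta) * income


p1, p2, income, alpha = 2.0, 5.0, 100.0, 0.3  # hypothetical values

x1, x2 = demands(p1, p2, income, alpha)
v_direct = utility(x1, x2, alpha)
v_formula = indirect_utility(p1, p2, income, alpha)

# Crude check of optimality: no bundle on a grid along the budget line does better.
# t is the share of income spent on good 1.
best_on_grid = max(
    utility(t * income / p1, (1 - t) * income / p2, alpha)
    for t in (k / 1000 for k in range(1, 1000))
)

print(f"u at demands = {v_direct:.6f}, closed form = {v_formula:.6f}")
```

The same function can be used to confirm that v falls when either price rises and rises with income.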
4.2 Expenditure Function
Instead of maximizing utility subject to a budget constraint, one could minimize spending, subject
to a utility constraint:
\[ \min_{x_1, x_2}\; p_1 x_1 + p_2 x_2 \quad \text{s.t.} \quad u(x_1, x_2) = u^0 \]
The Lagrangian is
L(x1 , x2 , µ) = p1 x1 + p2 x2 − µ[u(x1 , x2 ) − u0 ]
The FONC are:
p1 − µu1 (x1 , x2 ) = 0
p2 − µu2 (x1 , x2 ) = 0
u(x1 , x2 ) = u0
Note that the first two conditions are equivalent to the tangency condition p1 /p2 = u1 /u2 . Take a
look at Figure 4.1. The parallel lines represent “iso-cost lines”: combinations such that p1 x1 + p2 x2
is constant. These can be thought of as the contours of the objective function. Their slope is
−p1 /p2 . (Why?)
Figure 4.1: How does the consumer reach u0 with as little income as possible?
The utility maximization (u-max) and expenditure minimization (e-min) problems are called “dual”
problems, since they reverse the objective and the constraint.
What are the solutions to the e-min problem? The choices (x1 , x2 ) that minimize spending subject
to a utility constraint are like demand functions, with the exception that they take utility, rather
than income, as given. We call these compensated demand functions, and denote them as follows:
x1 = xc1 (p1 , p2 , u0 )
x2 = xc2 (p1 , p2 , u0 )
Sometimes these are called Hicksian demand functions, after John Hicks, the English economist
who discovered them (and who shared the 1972 Nobel prize in economics).
At prices $(p_1, p_2)$ and target utility $u^0$, and having chosen $x_1^c, x_2^c$, one spends a total of
\[ p_1 x_1^c(p_1, p_2, u^0) + p_2 x_2^c(p_1, p_2, u^0) \]
We define the expenditure function (analogous to the indirect utility function, for it gives the
amount spent assuming one has solved the e-min problem) to be
\[
\begin{aligned}
e(p_1, p_2, u^0) &= \min_{x_1, x_2}\; p_1 x_1 + p_2 x_2 \quad \text{s.t.} \quad u(x_1, x_2) = u^0 \\
&= p_1 x_1^c(p_1, p_2, u^0) + p_2 x_2^c(p_1, p_2, u^0)
\end{aligned}
\]
Note that e(p1 , p2 , u0 ) tells you the minimum amount of money necessary to achieve utility u0 under
prices (p1 , p2 ).
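Example: $u(x_1, x_2) = x_1^{\alpha} x_2^{\beta}$, where $\alpha + \beta = 1$. The Lagrangian is
\[ L = p_1 x_1 + p_2 x_2 - \mu\,(x_1^{\alpha} x_2^{\beta} - u^0) \]
FONC:
\[
\left.
\begin{aligned}
L_1 &= p_1 - \mu \alpha x_1^{\alpha - 1} x_2^{\beta} = 0 \\
L_2 &= p_2 - \mu \beta x_1^{\alpha} x_2^{\beta - 1} = 0
\end{aligned}
\right\}
\implies \frac{p_1}{p_2} = \frac{\alpha}{\beta} \times \frac{x_2}{x_1}
\implies x_2 = \frac{\beta}{\alpha} \times \frac{p_1}{p_2} \times x_1
\]
Substituting this into the utility constraint,
\[ x_1^{\alpha} \left( \frac{\beta}{\alpha} \times \frac{p_1}{p_2} \times x_1 \right)^{\beta} = u^0 \]
which implies
\[
x_1 = u^0 \left( \frac{p_2}{p_1} \times \frac{\alpha}{\beta} \right)^{\beta}, \qquad
x_2 = u^0 \left( \frac{p_1}{p_2} \times \frac{\beta}{\alpha} \right)^{\alpha}
\]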
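The compensated demands above can be checked numerically, along with the duality between the two problems: spending the minimum needed to reach v(p1, p2, I) should cost exactly I. This is a sketch with hypothetical parameter values and illustrative function names.

```python
# Sketch: Cobb-Douglas compensated (Hicksian) demands and the expenditure
# function, with alpha + beta = 1.

def compensated_demands(p1, p2, u0, alpha):
    beta = 1 - alpha
    x1 = u0 * ((p2 / p1) * (alpha / beta)) ** beta
    x2 = u0 * ((p1 / p2) * (beta / alpha)) ** alpha
    return x1, x2


def expenditure(p1, p2, u0, alpha):
    """e(p1, p2, u0): cost of the compensated bundle."""
    x1, x2 = compensated_demands(p1, p2, u0, alpha)
    return p1 * x1 + p2 * x2


def indirect_utility(p1, p2, income, alpha):
    beta = 1 - alpha
    return alpha ** alpha * beta ** beta * p1 ** (-alpha) * p2 ** (-beta) * income


p1, p2, income, alpha = 2.0, 5.0, 100.0, 0.3  # hypothetical values
u0 = indirect_utility(p1, p2, income, alpha)

xc1, xc2 = compensated_demands(p1, p2, u0, alpha)
achieved_utility = xc1 ** alpha * xc2 ** (1 - alpha)
min_cost = expenditure(p1, p2, u0, alpha)

print(f"utility reached: {achieved_utility:.6f} (target {u0:.6f})")
print(f"minimum cost:    {min_cost:.6f} (income {income})")
```

At u0 = v(p1, p2, I) the compensated bundle coincides with the regular demands (αI/p1, βI/p2).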
5 Comparative Statics of Consumer Choice
In this section we characterize the changes in consumer demands that occur as income and prices
vary. Our goal is to describe the consumer’s demand functions. Analytically, the demand functions
for the goods x and y are a pair of functions
x = x(px , py , I)
y = y(px , py , I)
that describe the consumer’s optimal choices of x and y, given prices and income. As you can
imagine, the nature of these functions is important in a wide variety of applications.
5.1 Change in Demand with Respect to Income, Engel Curves
As income changes, the budget constraint shifts in a parallel fashion: inward if I decreases, outward
if I increases.
In commodity space, (xy-space, or in our case the plane), the tangencies of the budget constraints
with higher and higher indifference curves trace out the income expansion path shown in Figure 5.1.
For a good x, if the quantity of x demanded increases with income, then x is said to be a normal
good. For some goods, the quantity demanded falls with income—such goods are called inferior.
Analytically, ∂x/∂I > 0 =⇒ x normal, while ∂x/∂I < 0 =⇒ x inferior.
Figure 5.1: Fix prices. Then x(px , py , I) = x(I), and y(px , py , I) = y(I). The income expansion path is
{(x(I), y(I)) : I ≥ 0}.
A couple of interesting implications of the budget constraint for changes in x and y with respect to
income:
Figure 5.2: (a) x, y normal (b) x normal, y borderline inferior (c) x inferior, y normal
• Using the fact that income is always exhausted,
\[
\begin{aligned}
I &= p_x x + p_y y \\
\implies\; dI &= p_x\, dx + p_y\, dy \\
\implies\; 1 &= p_x \frac{dx}{dI} + p_y \frac{dy}{dI}
\end{aligned}
\]
so clearly both goods cannot be inferior for in that case the RHS would be negative.
• Starting from the previous equation,
\[ \frac{x p_x}{I} \times \frac{I}{x} \frac{dx}{dI} + \frac{y p_y}{I} \times \frac{I}{y} \frac{dy}{dI} = 1 \]
which is equivalent to
\[ s_x e_x + s_y e_y = 1 \]
where sx and sy are the expenditure shares (the fraction of income spent on each good),
and ex and ey are the income elasticities (the percent change in demand ∆x/x divided by the
percent change in income ∆I/I, or, in the limit as ∆I → 0, (dx/x)/(dI/I)). This equation
can be summarized as follows: the expenditure-weighted sum of income elasticities is unity.
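The adding-up result can be illustrated with a demand system in which the two income elasticities differ. The sketch below assumes a linear expenditure system (Stone-Geary preferences), which is not used in the notes; the parameter names gx, gy (subsistence quantities) and all numbers are hypothetical.

```python
# Illustrative sketch only. ASSUMPTION: linear expenditure system demands,
#   x = gx + a * (I - px*gx - py*gy) / px,   y = gy + (1-a) * (I - px*gx - py*gy) / py.

def les_demands(px, py, income, a, gx, gy):
    discretionary = income - px * gx - py * gy
    x = gx + a * discretionary / px
    y = gy + (1 - a) * discretionary / py
    return x, y


def income_elasticities(px, py, income, a, gx, gy, h=1e-4):
    """Income elasticities (I/x)(dx/dI), (I/y)(dy/dI) by central differences."""
    x, y = les_demands(px, py, income, a, gx, gy)
    x_hi, y_hi = les_demands(px, py, income + h, a, gx, gy)
    x_lo, y_lo = les_demands(px, py, income - h, a, gx, gy)
    ex = (x_hi - x_lo) / (2 * h) * income / x
    ey = (y_hi - y_lo) / (2 * h) * income / y
    return ex, ey


px, py, income = 2.0, 3.0, 120.0   # hypothetical prices and income
a, gx, gy = 0.3, 10.0, 5.0         # hypothetical preference parameters

x, y = les_demands(px, py, income, a, gx, gy)
sx, sy = px * x / income, py * y / income
ex, ey = income_elasticities(px, py, income, a, gx, gy)
weighted_sum = sx * ex + sy * ey

print(f"shares: {sx:.4f}, {sy:.4f}; elasticities: {ex:.4f}, {ey:.4f}")
print(f"s_x e_x + s_y e_y = {weighted_sum:.6f}")
```

With these numbers one elasticity is below one and the other above one, yet the share-weighted sum is still one.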
The relation between x and I, holding prices constant, is called the Engel curve, and is shown in
Figure 5.3.
The data in Table 2 confirm Engel’s Law, that as income increases, the expenditure share of food
decreases. The implication is that income elasticity of food is less than unity. Why? Let x be food.
Then sx = xpx /I is the expenditure share of food, and
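\[
\frac{ds_x}{dI} = \frac{p_x}{I} \frac{dx}{dI} - \frac{1}{I^2}\, x p_x
= \frac{1}{I} \cdot \frac{x p_x}{I} \cdot \frac{I}{x} \frac{dx}{dI} - \frac{1}{I} \cdot \frac{x p_x}{I}
= \frac{s_x}{I}\,(e_x - 1)
\]
or
\[ \frac{I}{s_x} \frac{ds_x}{dI} = e_x - 1 \]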
Figure 5.3: The Engel curve starts from the origin if x = 0 when I = 0, (which is a reasonable assumption).
The Engel curve has positive slope if x is a normal good.
Figure 5.4: (a) Linear Engel curves: dx/dI = x/I =⇒ ex = 1. (b) Convex Engel curves: dx/dI >
x/I =⇒ ex > 1. (c) Concave Engel curves: dx/dI < x/I =⇒ ex < 1.
So, if ex < 1, then food share is declining with income. An alternative proof employs a favorite
trick of economists, taking natural logs:
\[ \log s_x = \log x + \log p_x - \log I \]
\[ \frac{d \log s_x}{d \log I} = \frac{d \log x}{d \log I} - 1 \]
or
\[ \frac{I}{s_x} \frac{ds_x}{dI} = e_x - 1 \]
In some contexts, the food share is used as an indicator of welfare. It has been proposed that
families in different countries with the same food share are equally well off.
5.2 Change in Demand with Respect to Price
A change in one of the prices causes the budget line to rotate; as it does so, the tangencies with
higher and higher indifference curves trace out the price consumption path.
You should be familiar with the demand curve, which is the graph of the demand function x(px ) =
x(px , p0y , I 0 ), where p0y and I 0 are fixed. See Figure 5.6.
Table 2: Food Share of Std. Budget in Various Years

Year       Food Share in Std. Budget∗ (%)
1935-39    35.4
1952       32.2
1963       25.2
1992       19.6
2000       16.3

∗ Budget used in calculation of CPI.
Figure 5.5: A rise in px is accompanied by a reduction in x.
Note that we traditionally plot demand (the dependent variable) on the horizontal axis and the
price (the independent variable) on the vertical axis.³ The negative slope of the demand curve
reflects the idea that consumption of a commodity falls as its price increases. However, demand
curves are not necessarily downward sloping! We turn now to a decomposition of the change in
demand due to a change in price. We show that there are two factors:
1. the curvature of the indifference curves
2. the nature of the income effect on demand
³ We owe this convention to Alfred Marshall. As a result of this, steep demand curves are “inelastic,” whereas flat
demand curves are “elastic.”
Figure 5.6: The reader is presumed to be familiar with the demand curve.
5.3 Graphical Decomposition of a Change in Demand
Figure 5.7: The movement from (x0 , y 0 ) to (x∗ , y ∗ ) takes place along the indifference curve.
Suppose px increases from p0x to p1x ; demand changes from (x0 , y 0 ) to (x1 , y 1 ). We can decompose
the change from x0 to x1 as follows:
1. First, think of the change in x that arises purely due to the fact that x now costs more.
Draw a budget line with slope −p1x /py that still allows the consumer to reach the indifference
curve through (x0 , y 0 ) (call this indifference curve u0 ). Note that, since it’s steeper than the
old budget line, it has a tangency with u0 to the left of (x0 , y 0 ).⁴ This “artificial” budget
constraint is represented by the dashed line in Figure 5.7.
2. Second, move from this intermediate point to the final optimum. Observe that this movement
is a movement along an income expansion path, since the intermediate optimum occurs where
u0 has a tangency with a budget line with slope p1x /py .
Analytically,
\[ \Delta x = x^1 - x^0 = (x^1 - x^*) + (x^* - x^0) \]
where $x^*$ denotes the aforementioned intermediate optimum. We refer to the change $(x^* - x^0)$,
which holds utility constant, as the substitution effect. We refer to the remaining change $(x^1 - x^*)$ as the
income effect. Thus we write
\[ \Delta x = \Delta x^S + \Delta x^I \]
⁴ Assuming DMRS.
Figure 5.8: (a) Step 1: move to new tangency on old indifference curve. (b) Step 2: Move along IEP to
new optimum.
5.4 Substitution Effect
The substitution effect represents movement along an indifference curve. It tells you how far to
move in order for the indifference curve to be parallel to the new budget line, i.e. in order for the
MRS to equal the new price ratio. Obviously, then, if the indifference curves are relatively flat,
you have to go a long way before the MRS equals the new price ratio, and the substitution effect is
substantial. If the indifference curves are highly convex, the MRS changes rapidly and you do not
need to go far: the substitution effect is small. See Figure 5.9.
Figure 5.9: (a) u0 flat =⇒ more substantial substitution effect (b) u0 highly curved =⇒ lesser substi-
tution effect
Note that if ∆px > 0, the substitution effect is negative. (Why?) What about the substitution
effect of ∆px on y?
5.5 Income Effect
Intuitively, one might think the income effect is larger the greater x0 , i.e. the greater x was in
the first place. If, initially, you consumed very little x, the income effect would be relatively small.
Take a look at Figure 5.10:
• Notice that the intermediate budget constraint almost passes through (x0 , y 0 ). (It always
cuts below, if not by much.)
• So, the income effect is approximately proportional to the change in income from the budget
line through (x0 , y 0 ) to the final budget line.
Figure 5.10: The income effect is approximately proportional to the perpendicular distance between the
budget lines.
What is the change in income? The final budget constraint limits the consumer to I, just as the
initial constraint does. Therefore I = p0x x0 + py y 0 . In order to be able to afford (x0 , y 0 ) under the
new prices, you would need p1x x0 + py y 0 , or ∆I = ∆px x0 more than before. For a small change
in px , the intermediate optimum is close to the initial one, so the difference in income from the
intermediate constraint to the final one is approximately ∆px x0 . (The approximation is exact in
the limit ∆px → 0.)
This confirms our intuition: the movement along the income expansion path from the intermediate
optimum to the final optimum—the income effect—will be larger, the larger was x0 , our initial level
of consumption of x.
6 Slutsky’s Equation
6.1 Review
Expenditure function:
\[
\begin{aligned}
e(p_1, p_2, u^0) &= \min_{x_1, x_2}\; p_1 x_1 + p_2 x_2 \quad \text{s.t.} \quad u(x_1, x_2) = u^0 \\
&= p_1 x_1^c(p_1, p_2, u^0) + p_2 x_2^c(p_1, p_2, u^0)
\end{aligned}
\]
where xc1 and xc2 are the compensated demands, the cheapest choices that enable one to achieve
utility level u0 at prices (p1 , p2 ).
The Lagrangian for the e-min problem is
L(x1 , x2 , µ) = p1 x1 + p2 x2 − µ[u(x1 , x2 ) − u0 ]
The FONC are:
p1 − µu1 (x1 , x2 ) = 0
p2 − µu2 (x1 , x2 ) = 0
u(x1 , x2 ) = u0
As for the derivatives of the expenditure function with respect to prices,

\[
\frac{\partial e(p_1, p_2, u^0)}{\partial p_1} = x_1^c(p_1, p_2, u^0)
+ p_1 \frac{\partial x_1^c(p_1, p_2, u^0)}{\partial p_1}
+ p_2 \frac{\partial x_2^c(p_1, p_2, u^0)}{\partial p_1}. \tag{6.1}
\]
The reader is presumed to be familiar with the Envelope Theorem, which says the second and third
terms on the RHS cancel.
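Proof: Recall that $u(x_1^c(p_1, p_2, u^0), x_2^c(p_1, p_2, u^0)) = u^0$. Differentiate both sides with respect to $p_1$:
\[ u_1 \frac{\partial x_1^c}{\partial p_1} + u_2 \frac{\partial x_2^c}{\partial p_1} = 0 \]
But $u_1 = p_1/\mu$ and $u_2 = p_2/\mu$ by the FONC. It follows by substitution that
\[ \frac{p_1}{\mu} \cdot \frac{\partial x_1^c}{\partial p_1} + \frac{p_2}{\mu} \cdot \frac{\partial x_2^c}{\partial p_1} = 0 \]
which means
\[ p_1 \frac{\partial x_1^c}{\partial p_1} + p_2 \frac{\partial x_2^c}{\partial p_1} = 0 \]
Thus we have
\[ \frac{\partial e(p_1, p_2, u^0)}{\partial p_1} = x_1^c(p_1, p_2, u^0) \]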
There is a story we tell to go along with this. If you initially are minimizing expenditure, and the
price of good 1 rises, what do you do? Your first order response is simply to continue buying the
old bundle—this increases your spending by xc1 × ∆p1 . That is the first term on the RHS of (6.1).
But then you would like to adjust your choices of goods 1 and 2 to reflect the new prices. The
adjustments are the second and third terms on the RHS of (6.1). But because your initial choices
were optimal—they satisfied the FONC—when you attempt to adjust x1 and x2 you don’t save any
more.
6.2 Slutsky Decomposition
Now we are ready to analyze what happens to the uncompensated, or regular demand functions
when prices rise/fall. Suppose we start with prices (p01 , p02 ) and income I 0 . Initially the optimal
choices are x01 = x1 (p01 , p02 , I 0 ) and x02 = x2 (p01 , p02 , I 0 ), where x1 (·) and x2 (·) are the regular demand
functions.
We decompose the effect of a change in price ∆p1 = p11 − p01 as follows:
(a) Starting from (x01 , x02 ), imagine the adjustment you would make if you could remain on the
old indifference curve. This would lead you to a new bundle (x∗1 , x∗2 ). Since prices have risen
this bundle costs more than you were spending before. This move is called the substitution
effect of the price increase.
(b) Then, from (x∗1 , x∗2 ), imagine the adjustment you would make to get back to the original
income level. This would be a move inward along an income expansion path (IEP), and
would lead you to (x11 , x12 ). This move is called the income effect of a price increase.
Figure 6.1: A decomposition of the change in demand into its constituent parts: movement along the
indifference curve followed by movement inward along an IEP.
Note that the total change in x1 is
\[ \Delta x_1 = x_1^1 - x_1^0 = (x_1^1 - x_1^*) + (x_1^* - x_1^0) = \Delta x_1^I + \Delta x_1^S \]
What are the relative magnitudes of the constituent parts? To begin, observe that (x01 , x02 ) and
(x∗1 , x∗2 ) are on u0 . Now,
\[ x_1^0 = x_1(p_1^0, p_2^0, I^0) = x_1^c(p_1^0, p_2^0, u^0) \tag{6.2} \]
Also,
\[ x_1^* = x_1^c(p_1^1, p_2^0, u^0) \]
so
\[
\Delta x_1^S = x_1^* - x_1^0 = x_1^c(p_1^1, p_2^0, u^0) - x_1^c(p_1^0, p_2^0, u^0)
\approx \frac{\partial x_1^c(p_1^0, p_2^0, u^0)}{\partial p_1} \times \Delta p_1
\]
The substitution effect depends on the rate at which compensated demands change: this is purely a
function of the curvature of the indifference curves.
How about the income effect?
\[ \Delta x_1^I = x_1^1 - x_1^* \]
First note that $x_1^1 = x_1(p_1^1, p_2^0, I^0)$: it is the regular demand given $(p_1^1, p_2^0, I^0)$. But what is $x_1^*$? It
is the choice one would make with enough income to remain on $u^0$ even at the new prices. How much
money would it take? The answer is $e(p_1^1, p_2^0, u^0)$! So,
\[ x_1^* = x_1(p_1^1, p_2^0, e(p_1^1, p_2^0, u^0)) \]
Thus
\[
\begin{aligned}
\Delta x_1^I &= x_1(p_1^1, p_2^0, I^0) - x_1(p_1^1, p_2^0, e(p_1^1, p_2^0, u^0)) \\
&\approx \frac{\partial x_1(p_1^0, p_2^0, I^0)}{\partial I}\,\bigl(I^0 - e(p_1^1, p_2^0, u^0)\bigr)
\end{aligned}
\]
So the income effect depends on the income derivative of demand times the change in income
∆I = I 0 − e(p11 , p02 , u0 ). Note that ∆I < 0 since one would need more than I 0 to achieve U = u0
at prices (p11 , p02 ).
But how big is ∆I? We need one last trick. We know that I 0 = e(p01 , p02 , u0 ), so we can write
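\[
\begin{aligned}
\Delta I &= I^0 - e(p_1^1, p_2^0, u^0) \\
&= e(p_1^0, p_2^0, u^0) - e(p_1^1, p_2^0, u^0) \\
&\approx \frac{\partial e(p_1^0, p_2^0, u^0)}{\partial p_1}\,(p_1^0 - p_1^1) \\
&= \frac{\partial e(p_1^0, p_2^0, u^0)}{\partial p_1} \times (-\Delta p_1) \\
&= -\frac{\partial e(p_1^0, p_2^0, u^0)}{\partial p_1} \times \Delta p_1
\end{aligned}
\]
(which is negative for an increase in $p_1$). Finally we have
\[
\begin{aligned}
\frac{\partial e(p_1^0, p_2^0, u^0)}{\partial p_1} &= x_1^c(p_1^0, p_2^0, u^0) && \text{by (6.1)} \\
&= x_1^0 && \text{by (6.2)}
\end{aligned}
\]
and combining the last few results,
\[ \Delta I \approx -x_1^0\, \Delta p_1 \]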
Note that the size of the income effect depends on the original level of consumption of x1 .
Putting it all together,
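\[
\Delta x_1^I = \frac{\partial x_1(p_1^0, p_2^0, I^0)}{\partial I} \times \Delta I
= -\frac{\partial x_1(p_1^0, p_2^0, I^0)}{\partial I} \times x_1^0\, \Delta p_1
\]
Thus
\[
\begin{aligned}
\Delta x_1 &= \Delta x_1^I + \Delta x_1^S \\
&= -\frac{\partial x_1(p_1^0, p_2^0, I^0)}{\partial I} \times x_1^0\, \Delta p_1
+ \frac{\partial x_1^c(p_1^0, p_2^0, u^0)}{\partial p_1} \times \Delta p_1
\end{aligned}
\]
or
\[
\frac{\Delta x_1}{\Delta p_1} = -x_1^0\, \frac{\partial x_1(p_1^0, p_2^0, I^0)}{\partial I}
+ \frac{\partial x_1^c(p_1^0, p_2^0, u^0)}{\partial p_1}
\]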
Now in the limit ∆p1 → 0 the ratio ∆x1 /∆p1 equals the derivative of the regular demand function
with respect to p1 . We have established:
\[
\frac{\partial x_1(p_1^0, p_2^0, I^0)}{\partial p_1} = -x_1^0\, \frac{\partial x_1(p_1^0, p_2^0, I^0)}{\partial I}
+ \frac{\partial x_1^c(p_1^0, p_2^0, u^0)}{\partial p_1}
\]
This is called Slutsky’s equation, after the Russian economist who proved it over 100 years ago.
Slutsky’s equation says the derivative of the regular demand function with respect to p1 is a com-
bination of the income and substitution effects. The income effect depends on the derivative of
demand with respect to income, times the original level of consumption of x1 . The substitution
effect depends on the derivative of the compensated demand function.
A useful feature of Slutsky’s equation is that it provides a way to recover information about indif-
ference curves from the derivatives of the demand functions with respect to prices and incomes. In
principle, we can observe ∂x1 /∂p1 and ∂x1 /∂I, which would enable us to infer
\[
\frac{\partial x_1^c(p_1^0, p_2^0, u^0)}{\partial p_1} = \frac{\partial x_1(p_1^0, p_2^0, I^0)}{\partial p_1}
+ x_1^0\, \frac{\partial x_1(p_1^0, p_2^0, I^0)}{\partial I}
\]
Suppose we get an estimate of ∂xc1 /∂p1 that is nearly zero. The indifference curves must therefore
be almost Leontief (“right angles”).
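Slutsky's equation can be verified numerically for a specific utility function. The sketch below assumes the Cobb-Douglas example used earlier in these notes, approximates each derivative by a central finite difference, and uses hypothetical parameter values and function names.

```python
# Sketch: numerical check of Slutsky's equation for Cobb-Douglas preferences
# u = x1^alpha * x2^beta with alpha + beta = 1.

def regular_x1(p1, p2, income, alpha):
    """Regular (uncompensated) demand for good 1."""
    return alpha * income / p1


def compensated_x1(p1, p2, u0, alpha):
    """Compensated (Hicksian) demand for good 1."""
    beta = 1 - alpha
    return u0 * ((p2 / p1) * (alpha / beta)) ** beta


def indirect_utility(p1, p2, income, alpha):
    beta = 1 - alpha
    return alpha ** alpha * beta ** beta * p1 ** (-alpha) * p2 ** (-beta) * income


def central_diff(f, z, h=1e-6):
    return (f(z + h) - f(z - h)) / (2 * h)


p1, p2, income, alpha = 2.0, 5.0, 100.0, 0.3  # hypothetical values
u0 = indirect_utility(p1, p2, income, alpha)
x1_0 = regular_x1(p1, p2, income, alpha)

total_effect = central_diff(lambda p: regular_x1(p, p2, income, alpha), p1)
income_derivative = central_diff(lambda m: regular_x1(p1, p2, m, alpha), income)
substitution_effect = central_diff(lambda p: compensated_x1(p, p2, u0, alpha), p1)

slutsky_rhs = -x1_0 * income_derivative + substitution_effect

print(f"dx1/dp1                 = {total_effect:.6f}")
print(f"-x1*dx1/dI + dx1c/dp1   = {slutsky_rhs:.6f}")
```

Rearranging the same quantities, as in the text, recovers the compensated price derivative from the two observable derivatives of the regular demand function.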
7 Using Market Level Demand Curves
Since the demand curve graphs x = f (px , py , I), if py or I changes, the demand curve shifts. For
example, if income were to increase by dI > 0, then at a given price, demand would increase by
dx = (∂x/∂I)dI. For a normal good ∂x/∂I > 0, so the demand curve would shift to the right as in
Figure 7.1.
Figure 7.1: A shift in the demand curve due to an increase in I, assuming x is a normal good.
If the elasticities of demand are approximately constant, then

\[
d(\log x) = \frac{dx}{x} = \frac{\partial x}{\partial I} \cdot \frac{I}{x} \cdot \frac{dI}{I} = e_x \frac{dI}{I} = e_x\, d(\log I)
\]
where ex is the income elasticity of demand for x.⁵ Similarly, if py changes, the demand curve shifts
unless ∂x/∂py = 0 (as in the case of Cobb-Douglas preferences). If ∂x/∂py > 0 (x and y are substitutes), an increase in the
price of y causes the demand curve to shift to the right.
For the purposes of evaluating the effect of relatively small changes in prices and income, we often
assume the demand function has constant elasticities:
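\[
\begin{aligned}
\frac{\partial x}{\partial p_x} \times \frac{p_x}{x} &= \frac{\partial \log x}{\partial \log p_x} = \eta_{xx} && \text{(constant)} \\
\frac{\partial x}{\partial p_y} \times \frac{p_y}{x} &= \frac{\partial \log x}{\partial \log p_y} = \eta_{xy} && \text{(constant)} \\
\frac{\partial x}{\partial I} \times \frac{I}{x} &= \frac{\partial \log x}{\partial \log I} = e_x && \text{(constant)}
\end{aligned}
\]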
This is equivalent to assuming that the demand function is log-linear:
\[ \log x = \eta_{xx} \log p_x + \eta_{xy} \log p_y + e_x \log I + c \]
where c is a constant. Note that homogeneity implies ηxx + ηxy + ex = 0. Put differently, if prices
and income all rise by one percent, then x remains constant.⁶
⁵ You should be familiar with the concept of elasticity from Econ 1. In particular, you should be able to verify
that elasticity is a unitless quantity.
As you recall from introductory economics, the market is constructed by introducing a supply curve
of the form x = S(px ). (See Figure 7.2.) It is usually assumed that supply is upward sloping. (We
defer the derivation of market supply curves until later.) For now, we shall assume that elasticity
of supply is constant:
\[ \frac{dS(p_x)}{dp_x} \cdot \frac{p_x}{S(p_x)} = \sigma_x \]
where σx denotes elasticity of supply. We now can combine supply and demand curves to analyze
the effects of exogenous shocks to income or other prices. We have
x = S(px ) = f (px , py , I)
a system of two equations in two unknowns, px and x (unit price of x and quantity of x, respectively),
given income and other prices. This is pictured in Figure 7.3.
Figure 7.2: The reader is presumed to be familiar with the upward sloping supply curve.
7.1 An Increase in Income
Obviously, both x and px increase with I. But by how much? Take a look at Figure 7.4. Starting
at equilibrium, with x = x0 and px = p0x , the changes in demand and supply are:
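\[
\begin{aligned}
\frac{\Delta x}{x} &= \eta_{xx} \frac{\Delta p_x}{p_x} + e_x \frac{\Delta I}{I} && \text{(demand)} \\
\frac{\Delta x}{x} &= \sigma_x \frac{\Delta p_x}{p_x} && \text{(supply)}
\end{aligned}
\]
⁶ A proof would involve recognizing that if x remains constant, then so does log x, and therefore setting the total
differential of log x equal to zero. The details are left to the reader.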
Figure 7.3: The market is in equilibrium when the price is such that supply and demand are balanced.
Figure 7.4: How much does px increase due to an outward shift in the demand curve?
The proportional changes in supply and demand have to be the same in order to restore equilibrium.
Therefore
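\[ \eta_{xx} \frac{\Delta p_x}{p_x} + e_x \frac{\Delta I}{I} = \sigma_x \frac{\Delta p_x}{p_x} \]
which implies
\[ \frac{\Delta p_x}{p_x} = \frac{e_x}{\sigma_x - \eta_{xx}} \cdot \frac{\Delta I}{I} \]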
Note that σx > 0 and ηxx < 0, so σx − ηxx is strictly positive. Furthermore,

\[ \frac{\Delta x}{x} = \sigma_x \frac{\Delta p_x}{p_x} = \frac{\sigma_x e_x}{\sigma_x - \eta_{xx}} \cdot \frac{\Delta I}{I} \]
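For example, suppose the following:
\[ \sigma_x = 0.60 \ \text{(short run)}, \qquad \eta_{xx} = -1.40, \qquad e_x = 0.40 \]
If ∆I/I = 0.10 (10% increase), then
\[ \frac{\Delta p_x}{p_x} = \frac{0.40}{0.60 + 1.40} \times 0.10 = 0.02 \]
\[ \frac{\Delta x}{x} = 0.60 \times 0.02 = 0.012 \]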
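The two formulas for the effect of an income change can be wrapped in a small helper, which makes it easy to try other elasticity values. This is a minimal sketch; the function name is hypothetical and the inputs are the numbers from the example above.

```python
# Sketch: proportional changes in equilibrium price and quantity after an
# income change, assuming constant elasticities.

def income_shock(sigma_x, eta_xx, e_x, dI_over_I):
    """Return (dp/p, dx/x) from equating the demand and supply responses."""
    dp_over_p = e_x / (sigma_x - eta_xx) * dI_over_I
    dx_over_x = sigma_x * dp_over_p
    return dp_over_p, dx_over_x


dp, dx = income_shock(sigma_x=0.60, eta_xx=-1.40, e_x=0.40, dI_over_I=0.10)

# Consistency check: the demand side must give the same quantity change.
dx_from_demand = -1.40 * dp + 0.40 * 0.10

print(f"price change:    {dp:.4f}")
print(f"quantity change: {dx:.4f}")
```

A more elastic supply curve (larger σx) pushes more of the adjustment onto quantity and less onto price.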
As an exercise, calculate the effect of a 10% drop in the price of a substitute good (good y) on the
market for x. Use an estimate for the cross-price elasticity between x and y of 0.67 (ηxy = 0.67).
7.2 Tax Incidence
If a tax of t dollars per unit is imposed on x, it creates a gap between the price that consumers pay
and the price that producers receive, of t dollars per unit. You are presumed to be familiar with
the diagram shown in Figure 7.5.
Starting from an equilibrium at (p0x , x0 ), the price received by producers falls to p1x , the price paid
by consumers rises to p1x + t, and the quantity falls to x1 . Consider the two markets shown in
Figure 7.6, each with the same tax. Obviously, the effect of the tax on the prices paid/received by
the two sides depends on the relative elasticities of supply and demand. To see this more formally,
we proceed based on the assumption that elasticities are roughly constant. Letting px denote the
price received by producers, the change in supply is
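\[ \frac{\Delta x}{x} = \sigma_x \frac{\Delta p_x}{p_x} \]
The change in prices for consumers is ∆px + t. Therefore, the change in quantity demanded is
\[ \frac{\Delta x}{x} = \eta_{xx} \frac{\Delta p_x + t}{p_x} \]
Market equilibrium requires that change in demand equals change in supply:
\[ \eta_{xx} \frac{\Delta p_x + t}{p_x} = \sigma_x \frac{\Delta p_x}{p_x} \]
Solving for the equilibrium change in prices, we have
\[ \eta_{xx} \frac{t}{p_x} = (\sigma_x - \eta_{xx}) \frac{\Delta p_x}{p_x} \]
and
\[ \frac{\Delta p_x}{p_x} = \frac{\eta_{xx}}{\sigma_x - \eta_{xx}} \cdot \frac{t}{p_x} \]
where t/px is the proportional tax rate. Since σx > 0 and ηxx < 0, σx − ηxx is strictly positive,
and therefore ∆px < 0. With regard to quantity,
\[ \frac{\Delta x}{x} = \sigma_x \frac{\Delta p_x}{p_x} = \frac{\sigma_x \eta_{xx}}{\sigma_x - \eta_{xx}} \cdot \frac{t}{p_x} < 0 \]
For producers, the change in price is
\[ \frac{\Delta p_x}{p_x} = \frac{\eta_{xx}}{\sigma_x - \eta_{xx}} \cdot \frac{t}{p_x} \]
and for consumers it is
\[
\frac{\Delta p_x + t}{p_x} = \frac{\eta_{xx}}{\sigma_x - \eta_{xx}} \cdot \frac{t}{p_x} + \frac{t}{p_x}
= \frac{\sigma_x}{\sigma_x - \eta_{xx}} \cdot \frac{t}{p_x} > 0
\]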
Notice that the ratio of the changes in prices for producers versus consumers is ηxx /σx . So, if
demand is highly inelastic, i.e. |ηxx | is small (e.g. ηxx = −0.1), and supply is moderately elastic
(e.g. σx = 1.0), then producer prices don’t fall by much relative to the rise in consumer prices. On the other
hand, if demand is highly elastic, i.e. if |ηxx | is big (e.g. ηxx = −3.0), then producer prices are more
affected.
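The incidence formulas can be evaluated for the two cases just mentioned. The following sketch uses the elasticities from the text; the function name and the 10% tax rate are hypothetical.

```python
# Sketch: incidence of a per-unit tax under constant elasticities.

def tax_incidence(sigma_x, eta_xx, t_over_p):
    """Return proportional changes in (producer price, consumer price, quantity)."""
    producer = eta_xx / (sigma_x - eta_xx) * t_over_p
    consumer = sigma_x / (sigma_x - eta_xx) * t_over_p
    quantity = sigma_x * producer
    return producer, consumer, quantity


t_over_p = 0.10  # hypothetical tax equal to 10% of the initial price

inelastic = tax_incidence(sigma_x=1.0, eta_xx=-0.1, t_over_p=t_over_p)
elastic = tax_incidence(sigma_x=1.0, eta_xx=-3.0, t_over_p=t_over_p)

for label, (producer, consumer, quantity) in [("inelastic demand", inelastic),
                                              ("elastic demand", elastic)]:
    print(f"{label}: producer {producer:+.4f}, consumer {consumer:+.4f}, "
          f"quantity {quantity:+.4f}")
```

In both cases the gap between the consumer and producer price changes equals the tax rate; what changes is how that gap is split. Passing a negative `t_over_p` gives the corresponding results for a per-unit subsidy.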
Last we consider the effect of a per unit subsidy of s on the price of x. (For example, prior to
the recent rise in electricity rates, electricity prices were subsidized throughout most of California.)
The change in price received by producers is ∆px , whereas the change in price paid by consumers
is ∆px − s. The proportional changes in quantity are:

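\[
\begin{aligned}
\frac{\Delta x}{x} &= \eta_{xx} \frac{\Delta p_x - s}{p_x} && \text{(demand)} \\
\frac{\Delta x}{x} &= \sigma_x \frac{\Delta p_x}{p_x} && \text{(supply)}
\end{aligned}
\]
Setting the two equal, we have
\[ \frac{\Delta p_x}{p_x} = \frac{-\eta_{xx}}{\sigma_x - \eta_{xx}} \cdot \frac{s}{p_x} > 0 \]
which implies that part of the effect of the subsidy is mitigated by a rise in prices. In fact, the
change in price paid by consumers is
\[
\frac{\Delta p_x - s}{p_x} = \frac{-\eta_{xx}}{\sigma_x - \eta_{xx}} \cdot \frac{s}{p_x} - \frac{s}{p_x}
= \frac{-\sigma_x}{\sigma_x - \eta_{xx}} \cdot \frac{s}{p_x} < 0
\]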
Note that −σx /(σx − ηxx ) is less than one in absolute value.
Figure 7.5: The new price p1x is such that when consumers pay p1x + t and suppliers receive p1x , equilibrium
is restored.
Figure 7.6: (a) Demand inelastic, supply elastic. (b) Demand elastic, supply inelastic.
8 Labor Supply
In this section we consider the choice of how many hours to work by an individual who faces an
hourly wage $w > 0$, and also has non-labor income $y$. The individual is assumed to value leisure $\ell$
and consumption of goods $x$, using a utility function $u(x, \ell)$. We assume there is an upper bound
$T$ on leisure, and that the sum of leisure $\ell$ and hours of work $h$ is $T$:
\[ \ell + h = T, \quad \text{or} \quad h = T - \ell \]
The graph looks a little unusual since preferences are only defined up to the point where $\ell = T$, as
the reader can see in Figure 8.1.
Figure 8.1: The budget constraint for an agent who works for w/h and consumes a numeraire good x.
The budget constraint is px = wh + y but we shall assume p = 1. The consumer’s objective is

max u(x, `) s.t. x = w(T − `) + y, or x + w` = y + wT
x,`
Note that if you think of the consumption bundle as (x, `), then the budget constraint says the
total cost of the bundle has to be y + wT for this is all the income you would have if you “bought”
no leisure. This “full income” depends on w, and therein lies the key difference between labor
supply and other consumer choice problems: as the price of one good (leisure) rises, the consumer
is actually richer. Intuitively this is because a worker is a net seller of leisure: he or she starts at
an “endowment point” (x, ℓ) = (y, T ). From there he or she can trade with the market by giving
up leisure in return for cash, which is then used to purchase goods.
We proceed by the method of Lagrange:
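\[
\begin{aligned}
L(x, \ell, \lambda) &= u(x, \ell) - \lambda\,(x + w\ell - y - wT) \\
L_x &= u_x(x, \ell) - \lambda = 0 \\
L_\ell &= u_\ell(x, \ell) - \lambda w = 0 \\
L_\lambda &= -x - w\ell + y + wT = 0
\end{aligned}
\]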
The first two FONC imply the usual tangency condition: u_ℓ(x, ℓ)/u_x(x, ℓ) = w. The solutions are:
x = x(w, y)
ℓ = ℓ(w, y)
h(w, y) = T − ℓ(w, y)
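To make the solution functions concrete, the sketch below assumes a Cobb-Douglas utility u(x, ℓ) = x^a ℓ^(1−a). This functional form, the function name, and the parameter values are illustrative assumptions, not part of the notes. With this utility the consumer spends the share 1 − a of full income y + wT on leisure.

```python
def labor_supply(w, y, T=16.0, a=0.5):
    """Optimal (x, leisure, hours) for the assumed utility u = x**a * l**(1 - a)."""
    full_income = y + w * T
    # Interior solution: leisure expenditure w*l equals (1 - a) * full income.
    # Leisure cannot exceed the time endowment T, hence the min().
    leisure = min(T, (1 - a) * full_income / w)
    hours = T - leisure
    x = w * hours + y            # budget constraint with p = 1
    return x, leisure, hours


x, leisure, hours = labor_supply(w=10.0, y=40.0)
# Tangency check: u_l / u_x = ((1 - a) / a) * (x / leisure) should equal w
mrs = ((1 - 0.5) / 0.5) * (x / leisure)
print(x, leisure, hours, mrs)
```

At an interior solution the computed marginal rate of substitution equals the wage, which is the tangency condition stated above.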
Now consider the rise in w (from w^0 to w^1) shown in Figure 8.2. As you can see, the substitution
effect causes a drop in ℓ, or equivalently a rise in h. But the income effect works in the opposite
direction: as a net seller of leisure the agent is better off and uses some of her extra income to buy
more leisure.
Figure 8.2: For this individual the income and substitution effects have opposite signs.
To formally analyze the income and substitution effects we rely on the expenditure function for the
labor supply case: this is the amount of non-labor income needed to achieve utility u0 , given w:
\[ e(w, u^0) = \min_{x,\ell}\; x - w(T - \ell) \quad \text{s.t.} \quad u(x, \ell) = u^0 \]
\[
\begin{aligned}
L(x, \ell, \mu) &= x - w(T - \ell) - \mu\,[u(x, \ell) - u^0] \\
L_x &= 1 - \mu u_x(x, \ell) = 0 \\
L_\ell &= w - \mu u_\ell(x, \ell) = 0 \\
L_\mu &= -u(x, \ell) + u^0 = 0
\end{aligned}
\]
The first two FONC imply the tangency condition: u_ℓ(x, ℓ)/u_x(x, ℓ) = w. The solutions are:
x = x^c(w, u^0)
ℓ = ℓ^c(w, u^0)
h^c(w, u^0) = T − ℓ^c(w, u^0)
The expenditure function is thus
\[ e(w, u^0) = x^c(w, u^0) - w\,[T - \ell^c(w, u^0)] = x^c(w, u^0) - w\,h^c(w, u^0) \]
and
\[ \frac{\partial e}{\partial w} = \underbrace{\frac{\partial x^c}{\partial w} - w\,\frac{\partial h^c}{\partial w}}_{0} - h^c = -h^c \]
To see that ∂x^c/∂w − w ∂h^c/∂w = 0, we use the same trick as we did in Section 6 when dealing
with the usual expenditure function. So, recalling that (x^c(w, u^0), ℓ^c(w, u^0)) yields utility u^0,
\[ u(x^c(w, u^0), \ell^c(w, u^0)) = u^0 \]
and therefore differentiating both sides with respect to w,
\[ u_x(x^c, \ell^c)\,\frac{\partial x^c}{\partial w} + u_\ell(x^c, \ell^c)\,\frac{\partial \ell^c}{\partial w} = 0 \]
But w u_x = u_ℓ by the tangency condition, so dividing by u_x gives ∂x^c/∂w + w ∂ℓ^c/∂w = 0.
Since ∂h^c/∂w = −∂ℓ^c/∂w, this is the desired result.
(Again, this is an example of the Envelope Theorem.)
To summarize, we have shown that ∂e/∂w = −hc (w, u0 ). To understand this, think of your mom
when she finds out you got a raise at your summer job: she reduces your allowance by an amount
proportional to how much you were working.
Now let’s see how leisure choice depends on wages. Assume we start with (w^0, y^0), and that w rises
from w^0 to w^1. The rise in w causes a substitution effect and an income effect:
\[ \Delta \ell = \Delta \ell^S + \Delta \ell^I \]
As usual, we can write
\[ \Delta \ell^S = \frac{\partial \ell^c}{\partial w}\,\Delta w \]
representing the compensated adjustment to the higher cost of leisure on the indifference curve
corresponding to level u^0. Also,
\[ \Delta \ell^I = \ell(w^1, y^0) - \ell(w^1, y^1) \]
where y^0 = original non-labor income, and y^1 = e(w^1, u^0). We use our standard trick of taking
first order approximations, based on the expenditure function. First, we can approximate
\[ \ell(w^1, y^0) - \ell(w^1, y^1) \approx \frac{\partial \ell(w^1, y^1)}{\partial y} \times (y^0 - y^1) \]
and recognizing that y^0 = e(w^0, u^0),
\[
\begin{aligned}
y^0 - y^1 &= e(w^0, u^0) - e(w^1, u^0) \\
&\approx \frac{\partial e(w^0, u^0)}{\partial w}\,(-\Delta w) \\
&= -h^c(w^0, u^0)\,(-\Delta w) \\
&= h^0\,\Delta w
\end{aligned}
\]
where h^0 = h^c(w^0, u^0) is the original level of hours. So,
\[ \Delta \ell^I \approx \frac{\partial \ell(w^1, y^1)}{\partial y} \times h^0\,\Delta w \]
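The income effect is proportional to h^0 ∆w: if you had been working more, there would be a bigger
positive income effect. Finally, then, we have
\[ \Delta \ell = \Delta \ell^S + \Delta \ell^I = \frac{\partial \ell^c(w^0, u^0)}{\partial w}\,\Delta w + \frac{\partial \ell(w^1, y^1)}{\partial y} \times h^0\,\Delta w \]
Dividing both sides by ∆w, and taking the limit ∆w → 0,
\[ \frac{\partial \ell}{\partial w} = \lim_{\Delta w \to 0} \frac{\Delta \ell}{\Delta w} = \frac{\partial \ell^c(w^0, u^0)}{\partial w} + h^0\,\frac{\partial \ell(w^0, y^0)}{\partial y} \]
This is Slutsky’s equation for leisure demand. In terms of hours, recall that h = T − ℓ, so
\[ \frac{\partial h}{\partial w} = -\frac{\partial \ell}{\partial w} \quad \text{and} \quad \frac{\partial h}{\partial y} = -\frac{\partial \ell}{\partial y} \]
and therefore
\[ \frac{\partial h}{\partial w} = \frac{\partial h^c(w^0, u^0)}{\partial w} + h^0\,\frac{\partial h(w^0, y^0)}{\partial y} \]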
When the wage rises there is a positive substitution effect and a negative income effect on labor
supply. Note in particular that when a person gets a raise, he won’t necessarily work more.
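The Slutsky equation for leisure can be checked numerically. The sketch below again assumes a Cobb-Douglas utility u = x^A ℓ^(1−A) at an interior solution. The closed forms for the uncompensated and compensated leisure demands follow from that assumption, and all names and parameter values are hypothetical. Derivatives are approximated by central finite differences.

```python
T, A = 16.0, 0.5   # time endowment and Cobb-Douglas share (illustrative values)


def leisure(w, y):
    """Uncompensated leisure demand for u = x**A * l**(1 - A), interior solution."""
    return (1 - A) * (y + w * T) / w


def leisure_c(w, u0):
    """Compensated leisure demand: cheapest way to reach utility u0 at wage w."""
    return u0 * ((1 - A) / (A * w)) ** A


def utility(w, y):
    l = leisure(w, y)
    x = w * (T - l) + y
    return x ** A * l ** (1 - A)


def slutsky_terms(w, y, d=1e-5):
    """Return (total effect, substitution effect, income effect) of w on leisure."""
    u0 = utility(w, y)
    h0 = T - leisure(w, y)                     # initial hours of work
    total = (leisure(w + d, y) - leisure(w - d, y)) / (2 * d)
    subst = (leisure_c(w + d, u0) - leisure_c(w - d, u0)) / (2 * d)
    income = h0 * (leisure(w, y + d) - leisure(w, y - d)) / (2 * d)
    return total, subst, income


total, subst, income = slutsky_terms(w=10.0, y=40.0)
print(total, subst + income)   # the two numbers should agree closely
```

In this example the substitution effect on leisure is negative and the income effect is positive, which matches the opposite-signs case described above.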
9 Intertemporal Consumption
The two-period consumption model concerns a consumer whose lifetime spans two periods. In
period one the consumer has income y1 and spends c1 ; in period two the consumer has income y2
and spends c2 . The consumer can borrow or lend at a rate of interest equal to r.
We express the consumer’s budget constraint in terms of period-two dollars. The choice is arbitrary,
but it simplifies the algebra: we then have two goods with prices 1 + r
and 1, respectively (rather than 1 and 1/(1 + r), which would be the case in period-one dollars).
Having 1 + r in the numerator, not the denominator, is a big help. Total consumption is limited
by total income, so the budget constraint is given by
(1 + r)c1 + c2 = (1 + r)y1 + y2
The consumer’s objective is to solve
max u(c1 , c2 ) s.t. (1 + r)c1 + c2 = (1 + r)y1 + y2
The Lagrangian is
L(c1 , c2 , λ) = u(c1 , c2 ) − λ[(1 + r)c1 + c2 − (1 + r)y1 − y2 ]
and the FONC are
L1 = u1 (c1 , c2 ) − λ(1 + r) = 0
L2 = u2 (c1 , c2 ) − λ = 0
Lλ = −(1 + r)c1 − c2 + (1 + r)y1 + y2 = 0
These give rise to the tangency condition u1 /u2 = 1 + r and the budget constraint, as usual. The
solutions are functions of r, y1 , and y2 :
c1 = c1 (r, y1 , y2 )
c2 = c2 (r, y1 , y2 )
These demand functions are a little unusual because they specify not just total available resources,
or “wealth” w = (1 + r)y1 + y2 , but also the composition of w. To clarify the effects of a change
in r on c1 it is helpful to define two other consumption functions, that depend on the interest rate
and total wealth (measured in period-two dollars):
\[ c_1 = c_1^w(r, w), \qquad c_2 = c_2^w(r, w) \]
These optimal choice functions are related by:
\[ c_1(r, y_1, y_2) = c_1^w(r, (1 + r)y_1 + y_2) \]
\[ c_2(r, y_1, y_2) = c_2^w(r, (1 + r)y_1 + y_2) \]
You can see that as we change r, the effect on c1 (r, y1 , y2 ) depends on both ∂c1^w/∂r and ∂c1^w/∂w.
Now let’s define the expenditure function as the minimum cost to reach a given level of utility
(again, measured in period-two dollars). Specifically, define e as follows:
e(r, u0 ) = min(1 + r)c1 + c2 s.t. u(c1 , c2 ) = u0
The Lagrangian is
L(c1 , c2 , µ) = (1 + r)c1 + c2 − µ[u(c1 , c2 ) − u0 ]
and the FONC are
L1 = 1 + r − µu1 (c1 , c2 ) = 0
L2 = 1 − µu2 (c1 , c2 ) = 0
Lµ = −u(c1 , c2 ) + u0 = 0
The solutions are the compensated demand functions c1^c(r, u^0) and c2^c(r, u^0). As usual
\[ e(r, u^0) = (1 + r)\,c_1^c(r, u^0) + c_2^c(r, u^0) \]
Differentiating,
\[ \frac{\partial e(r, u^0)}{\partial r} = c_1^c(r, u^0) + (1 + r)\,\frac{\partial c_1^c}{\partial r} + \frac{\partial c_2^c}{\partial r} \]
and (as usual) it is easy to show that (1 + r) ∂c1^c/∂r + ∂c2^c/∂r = 0, so
\[ \frac{\partial e(r, u^0)}{\partial r} = c_1^c(r, u^0) \]
Thus we have three optimal consumption functions for first period consumption:
• c1 (r, y1 , y2 ), which depends on y1 and y2
• c1^w(r, w), which depends only on w
• c1^c(r, u^0), which depends on utility

We also have two relations connecting the three:
\[ c_1(r, y_1, y_2) = c_1^w(r, (1 + r)y_1 + y_2) \tag{9.1} \]
\[ c_1^c(r, u^0) = c_1^w(r, e(r, u^0)) \tag{9.2} \]
Now it may be clear why we defined c1^w: it’s the function that links the compensated demand and
the demand we ultimately are interested in, c1 (r, y1 , y2 ). We can differentiate these two equations
with respect to r. Starting with (9.1),
\[ \frac{\partial c_1(r, y_1, y_2)}{\partial r} = \frac{\partial c_1^w(r, (1 + r)y_1 + y_2)}{\partial r} + y_1\,\frac{\partial c_1^w(r, (1 + r)y_1 + y_2)}{\partial w} \tag{9.3} \]
This means that when you change r, the response of the demand for c1 as a function of (r, y1 , y2 )
has an income effect, reflecting the fact that as r rises, so does the value of wealth.
From (9.2) we get an expression like we’ve seen before:
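\[
\begin{aligned}
\frac{\partial c_1^c(r, u^0)}{\partial r} &= \frac{\partial c_1^w(r, e(r, u^0))}{\partial r} + \frac{\partial c_1^w(r, e(r, u^0))}{\partial w} \times \frac{\partial e(r, u^0)}{\partial r} \\
&= \frac{\partial c_1^w(r, e(r, u^0))}{\partial r} + \frac{\partial c_1^w(r, e(r, u^0))}{\partial w}\,c_1^c(r, u^0)
\end{aligned}
\]
Rearranging, we get a Slutsky equation for c1^w:
\[
\begin{aligned}
\frac{\partial c_1^w(r, e(r, u^0))}{\partial r} &= \frac{\partial c_1^c(r, u^0)}{\partial r} - \frac{\partial c_1^w(r, e(r, u^0))}{\partial w}\,c_1^c(r, u^0) \\
&= \frac{\partial c_1^c(r, u^0)}{\partial r} - \frac{\partial c_1^w(r, e(r, u^0))}{\partial w}\,c_1(r, y_1, y_2)
\end{aligned}
\tag{9.4}
\]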
assuming u0 is the level of utility one can achieve with income (y1 , y2 ) and interest rate r.
Finally, plugging (9.4) into (9.3),
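\[
\begin{aligned}
\frac{\partial c_1(r, y_1, y_2)}{\partial r} &= \frac{\partial c_1^w(r, (1 + r)y_1 + y_2)}{\partial r} + y_1\,\frac{\partial c_1^w(r, (1 + r)y_1 + y_2)}{\partial w} \\
&= \frac{\partial c_1^c(r, u^0)}{\partial r} + \frac{\partial c_1^w(r, e(r, u^0))}{\partial w}\,[y_1 - c_1(r, y_1, y_2)] \\
&= \frac{\partial c_1^c(r, u^0)}{\partial r} + \frac{\partial c_1^w(r, e(r, u^0))}{\partial w}\,s_1(r, y_1, y_2)
\end{aligned}
\]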
where s1 (r, y1 , y2 ) = y1 − c1 (r, y1 , y2 ) is the optimal level of period-one savings.
The income effect of a rise in r on optimal consumption c1 (r, y1 , y2 ) is positive or negative, depending
whether s1 is positive or negative. For a saver, s1 > 0 and a rise in r has a positive income effect
(because the consumer is a net supplier of funds to the market, as in the case of labor supply). But
for a borrower, s1 < 0 and a rise in r has a negative income effect (because the consumer is a net
demander of funds, as in the case of basic commodity demand).
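The decomposition can be illustrated with a small numerical sketch. It assumes log utility u = ln(c1) + BETA·ln(c2), which is not specified in the notes. The closed forms below follow from that assumption, and the names and parameter values are hypothetical. Derivatives are taken by central finite differences.

```python
import math

BETA = 0.9  # weight on period-two utility in u = ln(c1) + BETA*ln(c2); an assumption


def c1_w(r, w):
    """First-period consumption as a function of r and wealth w (period-two dollars)."""
    return w / ((1 + BETA) * (1 + r))


def c1(r, y1, y2):
    """First-period consumption as a function of r and the income stream (y1, y2)."""
    return c1_w(r, (1 + r) * y1 + y2)


def c1_c(r, u0):
    """Compensated first-period consumption at utility level u0."""
    return math.exp((u0 - BETA * math.log(BETA * (1 + r))) / (1 + BETA))


def decomposition(r, y1, y2, d=1e-6):
    """Return (total, substitution, income) effects of a change in r on c1."""
    c1_0 = c1(r, y1, y2)
    c2_0 = (1 + r) * (y1 - c1_0) + y2          # period-two consumption from the budget
    u0 = math.log(c1_0) + BETA * math.log(c2_0)
    w0 = (1 + r) * y1 + y2
    total = (c1(r + d, y1, y2) - c1(r - d, y1, y2)) / (2 * d)
    subst = (c1_c(r + d, u0) - c1_c(r - d, u0)) / (2 * d)
    dc_dw = (c1_w(r, w0 + d) - c1_w(r, w0 - d)) / (2 * d)
    income = dc_dw * (y1 - c1_0)               # wealth effect times savings s1
    return total, subst, income


saver = decomposition(r=0.05, y1=100.0, y2=0.0)      # all income arrives in period one
borrower = decomposition(r=0.05, y1=0.0, y2=100.0)   # all income arrives in period two
print(saver, borrower)
```

In both cases the total effect should equal the sum of the other two terms, with a positive income effect for the saver and a negative one for the borrower.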
10 Production and Cost I
The technology available to a given firm is summarized by its production function. This function
gives the quantities of output produced by various combinations of inputs. For example, an airline
uses labor inputs, fuel, and machinery (airplanes, loading equipment, etc.) to produce the output
“passenger seats.” We write y = f (a, b) to signify that with inputs a and b, it is possible to produce
y units of output.
Examples:
One Input
• y = a^γ
• y = 0 if a < ā, and y = 1 if a > ā
Two Inputs
• y = a^α b^β (Cobb-Douglas)
• y = min{a, b} (Leontief, CRS)
• y = a + b (Additive, CRS)
For two or more inputs, production functions are a lot like utility functions. The important dif-
ference is that output is measurable and has natural units (e.g. passenger seats). It’s as if the
“indifference curves” have numbers attached to them that matter.
A second, less obvious, way to summarize technology is to compute the cost associated with pro-
ducing a given output level y, at fixed prices for the inputs. In principle, if you know the production
function, it is easy to find the cost function in two steps:
1. enumerate all possible ways of producing y
2. determine the cheapest one, and evaluate its cost
Most of the economic behavior of firms is studied via the cost function. In the next few sections,
we demonstrate how to derive the cost function and illustrate the connection between its properties
and those of the production function.
10.1 One-Factor Production and Cost Functions
10.1.1 Production Functions
Suppose there is only one input (apart from, perhaps, a “set-up cost”). Then we have a picture
along the lines of Figure 10.1. Note that f (0) = 0 by convention.
Definitions and Facts:
Figure 10.1: A representative production function. Note the “S” shape.
• The marginal product of factor a is the increase in y that accompanies a unit increase in a:
\[ MP_a = \frac{\partial f(a)}{\partial a} = f'(a) \]
Factor a is said to be useful if f′(a) > 0.
• The average product of factor a is the ratio of total output to total input of a:
\[ AP_a = \frac{f(a)}{a} \]
• If the MP of factor a is increasing, then f″(a) > 0 and we say that there are increasing
marginal returns: as the scale of output is expanded, each additional unit of input contributes
more. If the MP is decreasing, then f″(a) < 0 and we say there are diminishing marginal
returns. See Figure 10.2.
(a) (b)
Figure 10.2: (a) Increasing marginal returns. (b) Decreasing marginal returns.
• If MP_a > AP_a, then AP_a is increasing; if MP_a < AP_a, then AP_a is decreasing.
Think baseball, with AP = career batting average and MP = season batting average. A
hitter who has a better-than-average season raises his career average. See Figure 10.3. In
general,
\[ \frac{dAP_a}{da} = \frac{a f'(a) - f(a)}{a^2} = \frac{1}{a}\left( f'(a) - \frac{f(a)}{a} \right) = \frac{1}{a}\,(MP_a - AP_a) \]
Figure 10.3: At a = a1, AP = f(a1)/a1 < f′(a1) = MP, so AP is increasing. At a = a2, the opposite is true.
Examples:
• f (a) = ka, where k > 0 (linear). APa = M Pa = k.
• f(a) = a^β, where 0 < β < 1 (concave). See Figure 10.4.
Figure 10.4: The greater β, the less concave the production function, up to β = 1.
• f(a) = 9a² − a³, a < 6. See Figure 10.5. For this function we have the following:
\[ f'(a) = 18a - 3a^2 \implies [\,f'(a) \ge 0 \iff a \le 6\,] \]
\[ f''(a) = 18 - 6a \implies \begin{cases} f''(a) > 0 \iff a < 3 \\ f''(a) < 0 \iff a > 3 \end{cases} \]
Figure 10.5: The production function f(a) = 9a² − a³ from the preceding example.
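The relationship between marginal and average product can be verified for the cubic example f(a) = 9a² − a³. The sketch below is only an illustration and the helper names are hypothetical. It evaluates dAP/da = (MP − AP)/a at a few input levels.

```python
def f(a):
    """Production function from the example: f(a) = 9a^2 - a^3."""
    return 9 * a**2 - a**3


def marginal_product(a):
    """MP = f'(a) = 18a - 3a^2."""
    return 18 * a - 3 * a**2


def average_product(a):
    """AP = f(a) / a."""
    return f(a) / a


def ap_slope(a):
    """dAP/da = (MP - AP) / a, as derived in the text."""
    return (marginal_product(a) - average_product(a)) / a


for a in (2.0, 4.5, 5.0):
    print(a, marginal_product(a), average_product(a), ap_slope(a))
```

For this function AP = 9a − a², so AP rises while MP exceeds AP (a below 4.5), peaks where MP = AP, and falls afterwards.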
10.1.2 Cost Functions
What is the cost function for a one-factor production function? Let w denote the price per unit of
factor a. Then
\[ c(y, w) = \min_a\; wa \quad \text{s.t.} \quad y = f(a) \]
But y = f(a) implies a = f⁻¹(y).⁷ Therefore c(y, w) = w f⁻¹(y). See Figure 10.6 for an illustration
of this process. If w is fixed, then we often write the cost function as a function of y only: c(y).
Define marginal cost MC(y) = c′(y), and average cost AC(y) = c(y)/y.
Examples:
• y = f(a) = ka (linear) =⇒ a = y/k (linear input requirement function)
\[ c(y, w) = w\,\frac{y}{k} = \frac{1}{k}\,wy \quad \text{(linear in both } y \text{ and } w\text{)} \]
• y = f(a) = √a =⇒ a = y² (convex input requirement function)
\[ c(y, w) = wy^2 \quad \text{(linear in } w \text{ but convex in } y\text{; see Figure 10.7)} \]
10.1.3 Connection between M C and M P
Marginal cost is the amount it would cost, at the current level of output, to produce an additional
unit. By definition of MP_a, one unit of input adds MP_a = f′(a) units of output. It follows that
• 1/MP_a = 1/f′(a) units of a are needed to produce one unit of y
• the marginal cost of an additional unit is MC(y) = w/f′(a), when the production function
is given by y = f(a)
7 Assume, for the moment, that f is one-to-one.
(a)
(b)
Figure 10.6: The graph in (b) is obtained by rotating quadrant II in (a) 90 degrees clockwise.
Alternatively, c(y) = w f⁻¹(y), using as input requirement function a = f⁻¹(y). Thus⁸
\[ c'(y) = w\,\frac{df^{-1}(y)}{dy} = \frac{w}{f'(a)} \]
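The two-step recipe (invert the production function, then price the input) can be sketched numerically. The code below inverts an increasing production function by bisection and checks MC = w/f′(a) for the square-root example. The function names, the search interval, and the tolerances are assumptions made for this illustration.

```python
import math


def cost(y, w, f, a_max=1e6, tol=1e-12):
    """c(y, w) = w * f^{-1}(y); f is inverted by bisection, assuming f is increasing."""
    lo, hi = 0.0, a_max
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if f(mid) < y:
            lo = mid      # too little input: search to the right
        else:
            hi = mid      # enough input: search to the left
    return w * 0.5 * (lo + hi)


def marginal_cost(y, w, f, dy=1e-4):
    """Central-difference approximation to c'(y)."""
    return (cost(y + dy, w, f) - cost(y - dy, w, f)) / (2 * dy)


# Example from the text: f(a) = sqrt(a), so c(y, w) = w * y**2
w, y = 2.0, 3.0
a = y ** 2                        # input requirement f^{-1}(y)
mp = 1 / (2 * math.sqrt(a))       # f'(a) for f(a) = sqrt(a)
print(cost(y, w, math.sqrt), marginal_cost(y, w, math.sqrt), w / mp)
```

The numerical marginal cost should match w/f′(a) up to the finite-difference error.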
10.1.4 Geometry of c, AC, and M C
Take a look at Figure 10.8a. Note the following:

• when M C < AC, AC is falling
• when M C > AC, AC is rising
• when AC is at a minimum, AC = M C
8 Recall that if f′(x0) ≠ 0, then
\[ \left. \frac{df^{-1}(y)}{dy} \right|_{y = f(x_0)} = \frac{1}{f'(x_0)}. \]
(a) (b)
Figure 10.7: The production function y = √a and the corresponding cost function c = wy², where w is
the per-unit cost of a.
We sometimes add a “set-up” cost F (also called a fixed cost). The total cost is then
c(y) = fixed cost + variable cost = F + V C(y)
The implications of this model are illustrated in Figure 10.8b.
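As a small worked case, take the cost function c(y) = F + wy², which combines a fixed cost with the variable cost from the square-root technology. The values of F and w below are illustrative assumptions. The sketch confirms that MC crosses AC at the minimum of AC.

```python
import math

F, W = 8.0, 2.0   # fixed cost and input price (illustrative values)


def total_cost(y):
    return F + W * y**2


def marginal_cost(y):
    return 2 * W * y


def average_cost(y):
    return total_cost(y) / y


def average_variable_cost(y):
    return W * y


# AC = F/y + W*y is minimized where its derivative -F/y^2 + W equals zero
y_min_ac = math.sqrt(F / W)
print(y_min_ac, average_cost(y_min_ac), marginal_cost(y_min_ac))
```

To the left of this output level MC lies below AC and AC is falling. To the right the opposite holds. AVC is increasing everywhere in this example, so its minimum is at the left edge, to the left of the minimum of AC.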
(a) (b)
Figure 10.8: Compare (b) to (a) and note the following: 1. min AC occurs to the right of min AVC.
Why? 2. MC intersects both AC and AVC at their respective minimums. Why?
11 Production and Cost II
The analysis of production and cost is more interesting when it involves combinations of two or
more inputs to produce y. The production function is y = f (a, b). As in consumer theory, we begin
by thinking about combinations of inputs that produce the same level of output. In the firm case
these are called isoquants.
We define the marginal rate of technical substitution (MRTS) as the slope of an isoquant. It indicates
how many units of b one would need to add, per unit of a given up, to keep output constant. See
Figure 11.1.
Figure 11.1: The marginal rate of technical substitution is analogous to the consumer’s MRS. This bears
comparison to Figure 2.5.
Formally, suppose y^0 = f(a^0, b^0), and consider varying a and b in such a way that output remains
fixed at y^0:
\[ dy = f_a\,da + f_b\,db = 0 \]
which implies
\[ \left. \frac{db}{da} \right|_{y^0} = -\frac{f_a(a^0, b^0)}{f_b(a^0, b^0)} = -\frac{MP_a}{MP_b} \]
The MRTS is analogous to the marginal rate of substitution (MRS) in consumer theory. When
there are two or more inputs, the production function is characterized by both the degree of sub-
stitutability between inputs (curvature of isoquants) and the extent to which output expands as
inputs are expanded proportionately. The latter gives rise to the idea of returns to scale. Recall
that for a production function y = f (a, b), we say f has constant returns to scale (CRS) if
f (γa, γb) = γf (a, b), γ > 0
We say that f has decreasing returns to scale (DRS) if
f (γa, γb) < γf (a, b), γ > 1
With DRS, if you double both inputs, you get less than twice the output. On the other hand, the
same inequality implies that if you reduce inputs by some proportion, your output falls by a smaller
proportion. So DRS suggests that smaller firms are necessarily more efficient. Conversely we say
that f has increasing returns to scale (IRS) if
f (γa, γb) > γf (a, b), γ > 1
(a) (b)
Figure 11.2: (a) CRS and (b) DRS. This can be seen by noting the shape of the intersection of the surface
with the plane a = b for example.
Examples:
• One Input: f(a) = a^α
– CRS if α = 1
– DRS if α < 1
– IRS if α > 1
• Cobb-Douglas: f(a, b) = a^α b^β
– CRS if α + β = 1
– DRS if α + β < 1
– IRS if α + β > 1
As a check, suppose α + β = 1. Then
\[ f(\gamma a, \gamma b) = (\gamma a)^\alpha (\gamma b)^\beta = \gamma^{\alpha+\beta} a^\alpha b^\beta = \gamma f(a, b) \]
Geometrically, returns to scale indicates whether f is concave or convex over the top of a ray
emanating from the origin. (See Figure 11.2.)
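The definitions of CRS, DRS, and IRS translate directly into a numerical test. The sketch below compares f(γa, γb) with γf(a, b) at a single point and a single γ > 1, so it is only local evidence rather than a proof. The function names and the tolerance are hypothetical.

```python
def returns_to_scale(f, a, b, gamma=2.0, tol=1e-9):
    """Classify f at (a, b) by comparing f(gamma*a, gamma*b) with gamma*f(a, b)."""
    scaled = f(gamma * a, gamma * b)
    target = gamma * f(a, b)
    if abs(scaled - target) <= tol * max(1.0, abs(target)):
        return "CRS"
    return "IRS" if scaled > target else "DRS"


def cobb_douglas(alpha, beta):
    """Return the production function f(a, b) = a**alpha * b**beta."""
    return lambda a, b: a**alpha * b**beta


print(returns_to_scale(cobb_douglas(0.3, 0.7), 2.0, 5.0))
print(returns_to_scale(cobb_douglas(0.3, 0.5), 2.0, 5.0))
print(returns_to_scale(cobb_douglas(0.6, 0.7), 2.0, 5.0))
```

The same helper can be applied to the Leontief and additive technologies, both of which scale proportionally.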
11.1 Derivation of the Cost Function
Given a production function f (a, b) and prices wa , wb , we can write

c(wa , wb , y) = min wa a + wb b s.t. f (a, b) ≥ y
Define L = wa a + wb b − µ[f (a, b) − y], and proceed by the method of Lagrange:
La = wa − µfa (a, b) = 0
Lb = wb − µfb (a, b) = 0
Lµ = −f (a, b) + y = 0
The ratio of the first two FONC gives
\[ \frac{w_a}{w_b} = \frac{f_a(a, b)}{f_b(a, b)} = MRTS \]
Geometrically, we find the point of tangency of the constraint f (a, b) = y with the “iso-cost” lines
wa a + wb b = const.
See Figure 11.3. Notice the problem is reversed relative to that of a consumer. In the cost problem,
you are constrained to an isoquant and have to find the lowest budget, or iso-cost line. In the
consumer problem, you are constrained to a budget line and have to find the highest indifference
curve.
Figure 11.3: The firm’s objective is to minimize cost subject to a given level of output. This is done by
moving along an isoquant until the tangency condition is satisfied.
If we consider finding the most inexpensive way to achieve different levels of output given wa and
wb , we trace out the scale expansion path (SEP) shown in Figure 11.4. Note the similarity between
a firm’s SEP and a consumer’s IEP. Geometrically, the shape of the cost function (as a function of
y) depends on the shape of the production function “over the top” of the SEP. See Figure 11.5 for
an illustration. If the curve over the SEP is S-shaped as in Figure 11.5b we get cost functions of
the usual shape.
Figure 11.4: The scale expansion path traces out the optimal input demands as production varies.
(a) (b)
Figure 11.5: The shape of the cost function depends on the shape of the production function over the
top of the SEP. In other words, if the SEP is given by g(a, b) = g 0 , then the cost function is
shaped like the intersection of y = f (a, b) with g(a, b) = g 0 , where the latter is promoted to
three dimensions.
11.2 Marginal Cost
If we were to produce an additional unit of y, we could use input a, or input b, or both. If we used
a only, it would take 1/M Pa units of a for a single unit of y. The marginal cost is wa /M Pa (just as
in the one-factor case). By symmetry, we could also use b only, at marginal cost of wb /M Pb . But
from the FONC
\[ \frac{w_a}{w_b} = \frac{MP_a}{MP_b} \implies \frac{w_a}{MP_a} = \frac{w_b}{MP_b} \]
So, on the margin, one should be indifferent to expanding output via increases in a or increases in
b. This reflects the fact that a and b were optimally chosen to begin with. Note also that
\[ \mu = \frac{w_a}{f_a(a, b)} = \frac{w_a}{MP_a} = \frac{w_b}{MP_b} \]
Thus the Lagrange multiplier in the cost-minimization problem gives marginal cost.
Examples:
• f (a, b) = min{a, b/k}. At a cost minimum we must have a = b/k = y, which implies
c(wa , wb , y) = y(wa + kwb )
Note that this production function exhibits CRS.

• f (a, b) = a + kb. These are linear isoquants, with fa /fb = 1/k. If wa /wb > 1/k, use only b,
in which case y = kb =⇒ b = y/k, and c(wa , wb , y) = wb y/k. But if wa /wb < 1/k, use only
a, in which case y = a, and c(wa , wb , y) = wa y. Combining these results, for any wa , wb , we
have c(wa , wb , y) = y × min{wa , wb /k}.
The previous two examples illustrate what is called the dual relationship between cost and production
functions. Leontief production functions imply linear cost functions; linear production functions
imply Leontief-like cost functions.
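• f(a, b) = a^α b^β. (You may have seen this in a problem set!) The Lagrangian is L(a, b, µ) =
wa a + wb b − µ(a^α b^β − y).
\[
\begin{aligned}
L_a &= w_a - \mu \alpha a^{\alpha-1} b^{\beta} = 0 \\
L_b &= w_b - \mu \beta a^{\alpha} b^{\beta-1} = 0 \\
L_\mu &= -a^{\alpha} b^{\beta} + y = 0
\end{aligned}
\]
Taking the ratio of the first two FONC, we have
\[ \frac{w_a}{w_b} = \frac{\alpha a^{\alpha-1} b^{\beta}}{\beta a^{\alpha} b^{\beta-1}} = \frac{\alpha b}{\beta a} \]
or
\[ b = \frac{\beta a w_a}{\alpha w_b} \]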
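By substitution,
\[ a^{\alpha} b^{\beta} = a^{\alpha} \left( \frac{\beta a w_a}{\alpha w_b} \right)^{\beta} = a^{\alpha+\beta} \beta^{\beta} w_a^{\beta} \alpha^{-\beta} w_b^{-\beta} = y \]
from which we can easily retrieve the input requirement function (IRF) for a:
\[ a = y^{\frac{1}{\alpha+\beta}} \left( \frac{\alpha}{\beta} \right)^{\frac{\beta}{\alpha+\beta}} w_a^{-\frac{\beta}{\alpha+\beta}} w_b^{\frac{\beta}{\alpha+\beta}} \]
The IRF for b can be found by substitution, or by symmetry:
\[ b = y^{\frac{1}{\alpha+\beta}} \left( \frac{\beta}{\alpha} \right)^{\frac{\alpha}{\alpha+\beta}} w_a^{\frac{\alpha}{\alpha+\beta}} w_b^{-\frac{\alpha}{\alpha+\beta}} \]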
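Finally c(wa , wb , y) = wa a + wb b when a and b are set to their respective cost-minimizing
values, so
\[
\begin{aligned}
c(w_a, w_b, y) &= y^{\frac{1}{\alpha+\beta}} \left( \frac{\alpha}{\beta} \right)^{\frac{\beta}{\alpha+\beta}} w_a^{\frac{\alpha}{\alpha+\beta}} w_b^{\frac{\beta}{\alpha+\beta}} + y^{\frac{1}{\alpha+\beta}} \left( \frac{\beta}{\alpha} \right)^{\frac{\alpha}{\alpha+\beta}} w_a^{\frac{\alpha}{\alpha+\beta}} w_b^{\frac{\beta}{\alpha+\beta}} \\
&= y^{\frac{1}{\alpha+\beta}} w_a^{\frac{\alpha}{\alpha+\beta}} w_b^{\frac{\beta}{\alpha+\beta}} \left[ \left( \frac{\alpha}{\beta} \right)^{\frac{\beta}{\alpha+\beta}} + \left( \frac{\beta}{\alpha} \right)^{\frac{\alpha}{\alpha+\beta}} \right]
\end{aligned}
\]
If α + β = 1 (CRS), this simplifies considerably:
\[ c(w_a, w_b, y) = y\,w_a^{\alpha} w_b^{\beta} \left[ \left( \frac{\alpha}{\beta} \right)^{\beta} + \left( \frac{\beta}{\alpha} \right)^{\alpha} \right] = y\,w_a^{\alpha} w_b^{\beta} \left( \alpha^{-\alpha} \beta^{-\beta} \right) \]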
So with CRS, cost is linear in output. In general the exponent of y in the cost function is
(α + β)^{−1}, so if α + β > 1, cost is concave in output (IRS), whereas if α + β < 1, cost is
convex in output (DRS).
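The Cobb-Douglas input requirement functions and cost function can be checked against a brute-force search along the isoquant. The sketch below is an illustration only: the function names, the grid search, and the parameter values are assumptions, and the grid gives an approximate minimum.

```python
def irf_a(y, wa, wb, alpha, beta):
    """Cost-minimizing a for f(a, b) = a**alpha * b**beta."""
    s = alpha + beta
    return y**(1 / s) * (alpha / beta)**(beta / s) * wa**(-beta / s) * wb**(beta / s)


def irf_b(y, wa, wb, alpha, beta):
    """Cost-minimizing b, by symmetry with irf_a."""
    s = alpha + beta
    return y**(1 / s) * (beta / alpha)**(alpha / s) * wa**(alpha / s) * wb**(-alpha / s)


def cost(y, wa, wb, alpha, beta):
    """c(wa, wb, y) = wa*a + wb*b at the cost-minimizing inputs."""
    return wa * irf_a(y, wa, wb, alpha, beta) + wb * irf_b(y, wa, wb, alpha, beta)


def brute_force_cost(y, wa, wb, alpha, beta, n=100_000, a_max=50.0):
    """Cheapest point on the isoquant a**alpha * b**beta = y, found on a grid of a."""
    best = float("inf")
    for i in range(1, n + 1):
        a = a_max * i / n
        b = (y / a**alpha) ** (1 / beta)   # b that keeps output equal to y
        best = min(best, wa * a + wb * b)
    return best


y, wa, wb = 4.0, 2.0, 3.0
print(cost(y, wa, wb, 0.5, 0.5), brute_force_cost(y, wa, wb, 0.5, 0.5))
```

With α + β = 1 the closed form should also agree with the simplified CRS expression y·wa^α·wb^β·α^(−α)·β^(−β).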
12 Cost Functions and IRFs
Suppose we are given a production function f (x1 , x2 ), and the associated cost function c(y, w1 , w2 ).
We determine c by solving the cost minimization problem:
min w1 x1 + w2 x2 s.t. f (x1 , x2 ) = y
We define the Lagrangian L = w1 x1 + w2 x2 − µ[f (x1 , x2 ) − y]. The FONC are:
L1 = w1 − µf1 (x1 , x2 ) = 0
L2 = w2 − µf2 (x1 , x2 ) = 0
Lµ = −f (x1 , x2 ) + y = 0
The first two of these imply the tangency condition w1 /w2 = f1 /f2 , while the third is equivalent
to the constraint. Solving these two equations in two unknowns we get the IRFs:
\[ x_1 = x_1^*(y, w_1, w_2) \]
\[ x_2 = x_2^*(y, w_1, w_2) \]
The IRFs are analogous to the consumer’s demand functions: they represent the optimal
(cost-minimizing) input choices to produce y when input prices are (w1 , w2 ). With these we obtain the
cost function
\[ c(y, w_1, w_2) = w_1 x_1^*(y, w_1, w_2) + w_2 x_2^*(y, w_1, w_2) \tag{12.1} \]
which is simply the cost of the cost-minimizing combination of inputs.
12.1 Shephard’s Lemma
It turns out that given c, one can recover the IRFs by simple differentiation:
\[ x_1^*(y, w_1, w_2) = \frac{\partial c(y, w_1, w_2)}{\partial w_1} \]
At a glance, this appears to be inconsistent with (12.1). Indeed, differentiating (12.1) with respect
to w1 gives three terms:
\[ \frac{\partial c(y, w_1, w_2)}{\partial w_1} = x_1^*(y, w_1, w_2) + w_1\,\frac{\partial x_1^*(y, w_1, w_2)}{\partial w_1} + w_2\,\frac{\partial x_2^*(y, w_1, w_2)}{\partial w_1} \tag{12.2} \]
However, when an input price changes, x1*(y, w1, w2) and x2*(y, w1, w2) are constrained to move
along an isoquant as in Figure 12.1. In other words, we have
\[ f(x_1^*(y, w_1, w_2), x_2^*(y, w_1, w_2)) = y \]
and this holds even as w1 varies, so, differentiating w.r.t. w1 :
\[ f_1\,\frac{\partial x_1^*}{\partial w_1} + f_2\,\frac{\partial x_2^*}{\partial w_1} = 0 \]
This means
\[ \frac{\partial x_2^*}{\partial w_1} = -\frac{f_1}{f_2} \times \frac{\partial x_1^*}{\partial w_1} \]
So, since x1* falls in response to a rise in w1, x2* has to rise, and the rates of change are in the ratio
f1/f2. (Note that x1* responds to a change in w1 just as a demand function does in consumer
theory; the response is like a substitution effect. Since the isoquant exhibits diminishing MRTS
(DMRTS), a rise in w1 lowers x1*.) Substituting this expression for ∂x2*/∂w1 into (12.2),
\[ \frac{\partial c}{\partial w_1} = x_1^* + \frac{\partial x_1^*}{\partial w_1} \left( w_1 - w_2\,\frac{f_1}{f_2} \right) \]
But w1 − w2 (f1/f2) = 0 by the tangency condition, so the second and third terms on the RHS of
(12.2) always cancel, leaving us with ∂c/∂w1 = x1*, the result stated at the start of this section.
This result says that if w1 rises, the first order effect on cost is proportional to the amount of
x1 the firm originally was using. Although the optimal choices of x1 and x2 also change, they do so
in such a way that y remains constant, and because of the initial tangency condition the movements
in the inputs leave cost unchanged.
Figure 12.1: The price of x1 changes, and the firm adjusts x∗1 and x∗2 without affecting production.
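The lemma can be checked numerically for a CRS Cobb-Douglas technology, using the cost function and input requirement function derived in Section 11. The exponents and prices below are illustrative assumptions, and the derivative is a central finite difference.

```python
ALPHA, BETA = 0.3, 0.7   # Cobb-Douglas exponents with ALPHA + BETA = 1 (assumed)


def cost(y, w1, w2):
    """CRS Cobb-Douglas cost function: y * w1^ALPHA * w2^BETA * ALPHA^-ALPHA * BETA^-BETA."""
    return y * w1**ALPHA * w2**BETA * ALPHA**(-ALPHA) * BETA**(-BETA)


def irf_1(y, w1, w2):
    """Input requirement function for input 1 when ALPHA + BETA = 1."""
    return y * (ALPHA / BETA)**BETA * w1**(-BETA) * w2**BETA


def dcost_dw1(y, w1, w2, d=1e-6):
    """Central-difference approximation to the derivative of cost w.r.t. w1."""
    return (cost(y, w1 + d, w2) - cost(y, w1 - d, w2)) / (2 * d)


y, w1, w2 = 5.0, 2.0, 3.0
print(dcost_dw1(y, w1, w2), irf_1(y, w1, w2))   # the two should agree closely
```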
13 Supply
13.1 Supply Determination
So far we have studied cost, taking output as given. In this lecture, we consider the output or
supply decision of individual competitive firms. By competitive, we mean the firm takes the prices
of inputs and outputs as exogenous (i.e. beyond the firm’s control). For any firm, profit is defined
as revenue minus cost. For a competitive firm that uses two inputs, 1 and 2, to produce a single
output y with unit price p, profit is given by
π(y) = py − c(y, w1 , w2 )
Note that revenue py is linear in output, whereas the cost function is potentially non-linear. Assume
the firm selects y so as to maximize profit:
\[ \max_y\; py - c(y, w_1, w_2) \]
FONC:
\[ \frac{d\pi}{dy} = p - c_y(y^*, w_1, w_2) = 0 \]
or, equivalently, price = marginal cost at y = y*. The SOC for a maximum is
\[ \frac{d^2\pi}{dy^2} < 0 \implies -c_{yy}(y^*, w_1, w_2) < 0 \implies c_{yy}(y^*, w_1, w_2) > 0 \implies MC \text{ is increasing at } y = y^* \]
The diagram is shown in Figure 13.1a. Note that y ∗ is a function of p and w = (w1 , w2 ). We define
the supply function to be y = y ∗ (p, w1 , w2 ). What if π < 0 at y ∗ (p, w)? See Figure 13.1b.
(a) (b)
Figure 13.1: (a) The firm selects y ∗ such that M C = p. (b) p < AV C =⇒ y ∗ = 0 and AV C < p <
AC =⇒ the firm is not turning a profit but it’s covering its operating costs, so it may be
advised to stay in business and hope for better times.
• If p < AV C then y ∗ = 0. The firm is losing on both fixed and variable inputs: the best choice
is to shut down.
• If p > AC, the firm is turning a profit, so y ∗ is such that p = M C(y ∗ ).
• If AV C < p < AC , the firm is incurring a loss, but it’s covering its operating costs, failing
only to cover its fixed costs. The firm may well stay in business and hope for better times.
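The three cases can be illustrated with a hypothetical cubic variable cost, VC(y) = y³ − 4y² + 6y, and a fixed cost F = 5. These numbers are not from the notes. For this choice AVC = y² − 4y + 6 is minimized at y = 2, where AVC = 2, and MC = 3y² − 8y + 6.

```python
import math

F = 5.0  # fixed cost (illustrative value)


def variable_cost(y):
    return y**3 - 4 * y**2 + 6 * y


def supply(p):
    """Profit-maximizing output: p = MC on the rising part of MC, if p >= min AVC."""
    min_avc = 2.0                 # AVC = y**2 - 4*y + 6 is minimized at y = 2
    if p < min_avc:
        return 0.0                # shut down: revenue would not cover variable cost
    # Larger root of 3y^2 - 8y + 6 = p, which lies where MC is increasing
    return (8 + math.sqrt(12 * p - 8)) / 6


def profit(p):
    """Short-run profit; the fixed cost is paid even when output is zero."""
    y = supply(p)
    return p * y - variable_cost(y) - F


for p in (1.5, 3.0, 6.0):
    print(p, supply(p), profit(p))
```

At a price between min AVC and min AC the firm produces and loses less than F. Below min AVC it shuts down and loses exactly F.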
Figure 13.2 is a useful representation of the firm’s optimal choice.
Figure 13.2: The rectangle represents revenue py* while the area underneath MC represents costs (not
including fixed costs). Thus the shaded area represents profits (not including fixed cost
payments). Here we are using the fact that c(y) = ∫_0^y MC(s) ds + F.
Observations
• If MC is constant (e.g. Cobb-Douglas with α + β = 1), then, assuming no fixed costs,
p < MC =⇒ loss =⇒ y* = 0, and p > MC =⇒ π ∝ y =⇒ y* = ∞ (infinite profit). If
p = MC, profit is zero at every output level, so supply is indeterminate.
• If M C is always decreasing, then supply is undefined, if not zero.
Figure 13.3: At y ∗ defined by p = M C(y ∗ ), profit is not maximized. Why? Consider a reduction in
output. Cost falls by M C and revenue falls by p, so π actually increases. The SOC are not
satisfied since cyy < 0.
Examples:
• y = x^a, 0 < a < 1 (one input, DRS)
The input requirement function is x*(y) = y^{1/a}, which does not depend on prices. Thus
\[ c(w, y) = w x^*(y) + F = w y^{1/a} + F \]
where F = fixed costs, and
\[ MC(y) = \frac{w}{a}\,y^{\frac{1-a}{a}} \]
\[ AC(y) = \frac{F}{y} + w y^{\frac{1-a}{a}} \]
The optimal output supply choice y* solves p = MC(y), which implies
\[ p = \frac{w}{a}\,(y^*)^{\frac{1-a}{a}} \]
or
\[ y^*(p, w) = \left( \frac{ap}{w} \right)^{\frac{a}{1-a}} \]
Note the following:
– y* is homogeneous of degree zero in (p, w)
– y* increases with p, decreases with w
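• y = x1^α x2^β, α + β < 1 (Cobb-Douglas with DRS)
Recall that
\[ c(y, w_1, w_2) = k_1\,w_1^{\frac{\alpha}{\alpha+\beta}} w_2^{\frac{\beta}{\alpha+\beta}} y^{\frac{1}{\alpha+\beta}} \]
for some k1 > 0. Therefore
\[ MC(y) = k_2\,y^{\frac{1-\alpha-\beta}{\alpha+\beta}} w_1^{\frac{\alpha}{\alpha+\beta}} w_2^{\frac{\beta}{\alpha+\beta}} \]
for some constant k2. Setting p = MC and solving for y gives
\[ y^* = k_3\,p^{\frac{\alpha+\beta}{1-\alpha-\beta}} w_1^{-\frac{\alpha}{1-\alpha-\beta}} w_2^{-\frac{\beta}{1-\alpha-\beta}} \]
for some constant k3. Or, equivalently,
\[ \log y^* = \text{constant} + \frac{\alpha+\beta}{1-\alpha-\beta} \log p - \frac{\alpha}{1-\alpha-\beta} \log w_1 - \frac{\beta}{1-\alpha-\beta} \log w_2 \]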
Again y ∗ is homogeneous of degree zero in (p, w), increasing in p, and decreasing in w1 and
w2 .
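The one-input supply function from the first example can be checked directly. The sketch below uses a = 0.5 and no fixed cost as illustrative assumptions, and confirms that the formula maximizes profit locally, is homogeneous of degree zero in (p, w), and responds to prices in the stated directions.

```python
def supply(p, w, a=0.5):
    """y*(p, w) = (a*p/w)**(a/(1-a)) for the technology y = x**a, 0 < a < 1."""
    return (a * p / w) ** (a / (1 - a))


def profit(y, p, w, a=0.5, F=0.0):
    """Profit p*y - c(w, y), with c(w, y) = w * y**(1/a) + F."""
    return p * y - w * y ** (1 / a) - F


p, w = 4.0, 1.0
y_star = supply(p, w)
# Profit at y* compared with nearby output levels
print(y_star, profit(y_star, p, w), profit(y_star - 0.1, p, w), profit(y_star + 0.1, p, w))
# Doubling all prices leaves supply unchanged (homogeneity of degree zero)
print(supply(2 * p, 2 * w))
```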
As an exercise, prove that for a general cost function, the competitive supply response is
homogeneous of degree zero in all prices (input and output). Hint: The cost function is homogeneous of
degree one in all input prices.
13.2 The Law of Supply
The Law of Supply states that competitive supply functions are always upward sloping:
\[ \frac{\partial y^*}{\partial p} > 0 \]
Why? At the optimal level of supply, p = M C. But M C is increasing by the SOC, so if p increases,
the new optimal level of supply increases, too: we simply move along the M C schedule as in
Figure 13.4.
Figure 13.4: Assuming the SOC is satisfied, an increase in p is accompanied by an increase in y ∗ since
the intersection moves upward and to the right.
Formally, y ∗ is defined as the solution to

p − cy (y ∗ (p, w1 , w2 ), w1 , w2 ) = 0. (13.1)
This FONC holds even if we move p (or either of w1 or w2 for that matter). Therefore, differentiating
both sides of (13.1) w.r.t. p,
\[ 1 - c_{yy}(y^*(p, w_1, w_2), w_1, w_2)\,\frac{\partial y^*}{\partial p} = 0 \]
hence
\[ \frac{\partial y^*}{\partial p} = \frac{1}{c_{yy}(y^*, w_1, w_2)}. \]
But cyy (y ∗ (p, w1 , w2 ), w1 , w2 ) > 0 by the SOC, so ∂y ∗ /∂p > 0!
13.3 Changes in Input Prices
What is the effect of an increase in input prices on the firm’s output decisions? An increase in
input prices (say w1 ) is associated with a shift in M C. See Figure 13.5.
In the case where M C rises with w1 , we have ∂y ∗ /∂w1 < 0. Is this always the case? We shall see
in the next section!
Figure 13.5: An increase in w1 causes the M C curve to shift, usually upward, which causes the intersection
of p and M C to move inward.
14 Input Demand for a Competitive Firm
In this lecture we describe the determination of input demands for a competitive firm that sells
output y at price p. Its production function is y = f (x1 , x2 ). Inputs 1 and 2 have prices w1 and
w2 .
The firm’s optimal choice of (x1 , x2 ) is determined in two steps. First, the firm constructs its cost
function c(y, w1 , w2 ). This implicitly defines the optimal input demands x1 and x2 for each level of
y, given input prices.
\[
\begin{aligned}
c(y, w_1, w_2) &= \min_{x_1, x_2}\; w_1 x_1 + w_2 x_2 \quad \text{s.t.} \quad y = f(x_1, x_2) \\
&= w_1 x_1^c(y, w_1, w_2) + w_2 x_2^c(y, w_1, w_2)
\end{aligned}
\]

where x1^c(y, w1, w2) and x2^c(y, w1, w2) are the conditional factor demands. The word conditional
signifies that these input demands depend on the output choice. Note that x1^c and x2^c are very
much like the compensated demand functions for the consumer. In particular, setting L = w1 x1 +
w2 x2 − µ[f (x1 , x2 ) − y], we have the following FONC:
\[
\begin{aligned}
L_1 &= w_1 - \mu f_1(x_1, x_2) = 0 \\
L_2 &= w_2 - \mu f_2(x_1, x_2) = 0 \\
L_\mu &= -f(x_1, x_2) + y = 0
\end{aligned}
\]
The ratio of the first two FONC implies that w1 /w2 = f1 /f2 . Recall that f1 is the marginal product
of input 1. The ratio f1 /f2 is called the marginal rate of technical substitution (MRTS). This is
the firm’s equivalent of the consumer’s MRS; it gives the (absolute value of the) slope of an isoquant
at (x1 , x2 ). So, the
first order conditions for the cost-min problem are illustrated in Figure 14.1.
Figure 14.1: Illustration of FOC for cost-min problem.
Recall from Section 12.1 that

\[ x_i^c(y, w_1, w_2) = \frac{\partial c(y, w_1, w_2)}{\partial w_i}, \quad i = 1, 2 \]
Having determined the cost of producing a given level of output, the next step for the firm is to
choose what level of output to produce. It does so by maximizing profit π = py − c(y, w1 , w2 ):
\[ p - c_y = 0 \implies p = MC \tag{14.1} \]
\[ -c_{yy} < 0 \implies \frac{\partial MC}{\partial y} > 0 \tag{14.2} \]
Equation (14.2) means that marginal cost must be rising. See Figure 13.1a. The optimal choice of
y, given (p, w1 , w2 ), is the value y ∗ such that
p = M C(y ∗ , w1 , w2 )
i.e. output is chosen so that price equals marginal cost. Now we are ready to define the firm’s
unconditional input choices. The firm’s unconditional input demands are simply:
\[ x_i(p, w_1, w_2) = x_i^c(y^*(p, w_1, w_2), w_1, w_2), \quad i = 1, 2 \]
In other words, the unconditional input demands are the conditional demands, for the optimal
choice of y. We can think of the problem of finding optimal input demand choices as one of solving
two problems simultaneously: cost-min and p = M C.
Figure 14.2: The level of production plays the role of utility in the consumer choice analogy: w1 rises,
conditional input demand falls.
What happens when w1 rises? Since
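\[ x_1(p, w_1, w_2) = x_1^c(y^*(p, w_1, w_2), w_1, w_2) \]
we have
\[ \frac{\partial x_1}{\partial w_1} = \frac{\partial x_1^c}{\partial w_1} + \frac{\partial x_1^c}{\partial y^*} \times \frac{\partial y^*}{\partial w_1} \]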
The first term is the response of optimal input demand, holding constant y. This is called the
substitution effect. It is just like the consumer’s substitution effect, which is defined as the change
in demand, holding constant u. Instead of being constrained to move along an indifference curve,
the firm is constrained to move along an isoquant as one can see in Figure 14.2.
The second term is called the scale effect. It is somewhat similar to the consumer’s income effect,
except the analogy can be misleading. It reflects the fact that when w1 rises, the firm’s M C curve shifts,
so the optimal choice of y shifts. See Figure 14.3.
Figure 14.3: The optimal choice of y shifts due to a change in w1 . Assuming input 1 is non-inferior, the
shift is upward.
Recall that if input 1 is non-inferior, then M C shifts upward when w1 rises. Why?

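$$ \begin{aligned}
\frac{\partial MC}{\partial w_1} &= \frac{\partial}{\partial w_1}\left(\frac{\partial c}{\partial y}\right) \\
&= \frac{\partial^2 c}{\partial y\,\partial w_1} \\
&= \frac{\partial^2 c}{\partial w_1\,\partial y} \\
&= \frac{\partial}{\partial y}\left(\frac{\partial c}{\partial w_1}\right) \\
&= \frac{\partial x_1^c}{\partial y}
\end{aligned} $$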
Thus the derivative of M C w.r.t. w1 is the same quantity as the derivative of the conditional
input demand function w.r.t. y. If input 1 is non-inferior, then ∂xc1 /∂y > 0, so M C shifts upward
whenever w1 rises.
In this case we have the pictures shown in Figure 14.4. When w1 rises, the substitution effect and
scale effect both cause a reduction in demand for input 1.
With an inferior input, when w1 rises, M C shifts downward. (E.g. when shovels rise in price the
marginal cost of holes goes down.) But the scale effect is also negative because although the rise
in w1 causes the firm to want to increase output, input 1 is inferior, so the expansion in output
reduces demand! See Figure 14.5.
There is another way to look at the problem of input demands—a so-called “direct approach.”
Suppose the firm simply chose x1 and x2 to maximize
π = pf (x1 , x2 ) − w1 x1 − w2 x2
This is an unconstrained optimization problem, so the FONC are:
pf1 (x1 , x2 ) − w1 = 0 (14.3)

pf2 (x1 , x2 ) − w2 = 0 (14.4)
Note that dividing (14.3) by (14.4) returns the tangency condition f1 /f2 = w1 /w2 . Also, the firm
sets w1 /f1 = w2 /f2 = p. What do these equations mean? If the firm had to increase output, it
could do so by increasing input 1 or input 2. If it used input 1, it would require 1/f1 (x1 , x2 ) units
of input 1 to produce an additional unit of output. The marginal cost would be w1 /f1 (x1 , x2 ) = p. If instead the
firm used input 2, the marginal cost would again be w2 /f2 (x1 , x2 ) = p.
Looking back at the Lagrangian for the cost-min problem, notice that the FONC are
w1 = µf1 =⇒ µ = w1 /f1
and
w2 = µf2 =⇒ µ = w2 /f2
Remember that µ is marginal cost. So, when the firm solves the cost-min problem and sets p =
M C = µ, it achieves the same result as if it had carried out the direct approach. Sometimes one
method is more convenient than the other, that’s all.
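As a numerical check on this equivalence, here is a minimal sketch that solves the direct problem in closed form. The Cobb-Douglas technology f(x1, x2) = x1^a x2^b with a + b < 1, the parameter values, and the function names are all assumptions chosen for illustration; none of them come from the notes.

```python
def direct_profit_max(p, w1, w2, a=0.3, b=0.5):
    """Solve p*f1 = w1 and p*f2 = w2 for f(x1, x2) = x1**a * x2**b (assumed form, a + b < 1)."""
    # The two FONC imply x1 = a*p*y/w1 and x2 = b*p*y/w2; substituting these
    # into y = x1**a * x2**b gives a closed form for optimal output y.
    y = ((a * p / w1) ** a * (b * p / w2) ** b) ** (1.0 / (1.0 - a - b))
    x1 = a * p * y / w1
    x2 = b * p * y / w2
    return x1, x2, y


def marginal_products(x1, x2, a=0.3, b=0.5):
    """Return (f1, f2) for the same assumed production function."""
    y = x1 ** a * x2 ** b
    return a * y / x1, b * y / x2


x1, x2, y = direct_profit_max(p=10.0, w1=2.0, w2=4.0)
f1, f2 = marginal_products(x1, x2)
# Both ratios w_i / f_i should come out (approximately) equal to the price.
print(2.0 / f1, 4.0 / f2)
```

At the solution, w1/f1 and w2/f2 both equal p, which is the cost-min multiplier µ evaluated at the profit-maximizing output.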
Figure 14.4: SE causes x1 to decrease, scale effect does too.
Figure 14.5: Once again despite x1 being an inferior input, SE causes x1 to decrease, and so does scale
effect.
15 Industry Supply
The supply curve for an industry consists of the “horizontal sum” of the supply curves of each
individual firm as shown in Figure 15.1.
Notice that if firms vary with respect to their costs, at any market price some firms are profiting,
some are on the margin, and others are out of business. A good example of this is the case of oil
wells. Some wells have low variable costs and always are profitable to operate. Others are high-
cost, and are activated only when crude prices are high. We usually call the profits earned by the
infra-marginal suppliers rents. Presumably the lower costs of these firms arise from their control
over a scarce resource.
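The horizontal sum can be sketched in a few lines. The example assumes each firm has a linear marginal cost schedule MC(y) = m + ky and no fixed cost, so a firm supplies (p − m)/k once the price exceeds m and nothing otherwise; the function names and the two hypothetical firms are illustrative only. (With fixed costs, entry would occur at min AC and supply would jump, as in Figure 15.1.)

```python
def firm_supply(price, intercept, slope):
    """Supply of one firm with MC(y) = intercept + slope * y (assumed linear form)."""
    if price <= intercept:
        return 0.0  # price does not cover the marginal cost of the first unit
    return (price - intercept) / slope  # output at which p = MC


def industry_supply(price, firms):
    """Horizontal sum: add the quantities every firm supplies at a common price."""
    return sum(firm_supply(price, m, k) for m, k in firms)


firms = [(1.0, 2.0), (3.0, 1.0)]  # hypothetical (intercept, slope) pairs
for p in (0.5, 2.0, 5.0):
    print(p, industry_supply(p, firms))
```

At a price of 2 only the low-cost firm is active; at a price of 5 both are, and the low-cost firm earns a rent on its infra-marginal units.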
A competitive market is in equilibrium if the following conditions hold:
1. Each existing firm has p = M C and π ≥ 0.
2. No remaining firm can afford to enter the market.
These ideas are applicable to the case of a single firm with multiple facilities, or plants. For
example if a firm owns two plants, with MC schedules M C1 (y1 ) and M C2 (y2 ), then the firm
operates efficiently by viewing the plants as separate suppliers. See Figure 15.2 for an example of
this principle, called the principle of decentralization.
Figure 15.1: When prices reach p1 , Firm 1 enters the market, when prices then reach p2 , Firm 2 enters
the market (causing the discontinuity in the supply curve), and so on.
Figure 15.2: At prices below p1 , the firm is completely inactive; at prices between p1 and p2 , only plant
1 is active, while at prices above p2 , both plants are active.
16 Monopoly I
16.1 Monopolist’s Objective
A monopolist is the sole supplier in a given market. The critical feature of monopolistic behavior
is the fact that a monopolist sets the price, or quantity. Monopolies arise
(a) through exclusive control over resources, e.g. DeBeers monopoly of diamond marketing
(b) through exclusive legal rights, e.g. public utilities, drug companies with patents, etc.
Suppose the demand for output is represented by the function y = D(p). Then we can invert this
to p = p(y), where p = D^{−1} is usually referred to as the inverse demand function. A monopolist’s
profit is
π(w1 , w2 , y) = yp(y) − c(w1 , w2 , y)
The FONC for profit maximization is
$$ p(y) + y p'(y) - c_y(w_1, w_2, y) = 0 \implies p(y) + y p'(y) = c_y(w_1, w_2, y) $$
The LHS represents marginal revenue M R(y) = p(y) + yp′(y). If demand is downward sloping, as
usual, then p′(y) < 0, so M R(y) < p. This is the key point about a monopoly. Since a monopolist
controls the market, it cannot treat price as exogenous. Rather, it has to take into account the fact
that a rise in sales will necessarily come at the expense of a reduction in price. Note that there may
be close substitutes for a product. But as long as a firm is the sole supplier of a given product, it
has monopoly power.
Define the elasticity of demand
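$$ \eta = \frac{\partial y}{\partial p} \cdot \frac{p}{y} = \frac{1}{p'(y)} \cdot \frac{p}{y} \implies p'(y) = \frac{1}{\eta} \cdot \frac{p(y)}{y} $$
We then have
$$ MR(y) = p(y) + y p'(y) = p(y) + y \cdot \frac{1}{\eta} \cdot \frac{p(y)}{y} = p(y)\left(1 + \frac{1}{\eta}\right) $$
So, for a monopolist,
$$ p(y)\left(1 + \frac{1}{\eta}\right) = MC $$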
As the market demand becomes closer and closer to a horizontal line, η → −∞, demand becomes
perfectly elastic, and p = M C. In other words, in the limiting case, monopoly becomes perfect
competition.
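The pricing rule p(1 + 1/η) = M C can be turned into a small calculator. This is only a sketch: it treats marginal cost and the elasticity as constants at the optimum (an assumption; in general both vary with y), and the function name is hypothetical.

```python
def monopoly_price(marginal_cost, eta):
    """Price implied by p * (1 + 1/eta) = MC, treating MC and eta as constants (assumption)."""
    if eta >= -1.0:
        # With |eta| <= 1, marginal revenue is non-positive, so MR = MC > 0 has no solution.
        raise ValueError("requires elastic demand: eta < -1")
    return marginal_cost / (1.0 + 1.0 / eta)


for eta in (-2.0, -5.0, -50.0, -1e6):
    print(eta, monopoly_price(10.0, eta))
```

As η becomes more negative the price falls toward marginal cost, which is the limiting case of perfect competition described above.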
The picture associated with monopoly is shown in Figure 16.1.
Observations:
• A monopolist always sets M R = M C. Since M R = p(1 + 1/η) and η < 0, M R < p. If
|η| < 1, then 1/η < −1 and MR is negative. It follows that a monopolist never operates in a
market in which demand is inelastic. Intuitively, if demand were inelastic, one could increase
revenue by raising the price! This is a very powerful result. It says that some markets cannot
be considered monopolies, namely those with measured elasticities of demand less than 1 in
absolute value.
Figure 16.1: The monopolist selects y ∗ such that M C = M R.
• If the monopolist’s MC schedule were the MC schedule of a price taker—or a set of price takers,
i.e. a competitive industry—then equilibrium would occur at p = M C. This would entail
higher output and lower price, but lower profit to the industry as a whole. See Figure 16.2.
Figure 16.2: The area of the region bounded by p = pM , y = D(pM ), and M C, is greater than the area
of the region bounded by p = pC , y = D(pC ), and M C.
• A monopolist does not have a supply schedule per se. First, the monopolist examines the
demand function. Then she establishes the price. There is no schedule of price/quantity
combinations.
• The SOC for profit maximization is ∂(M R − M C)/∂y < 0, or slope of MR < slope of MC. Even
if MC is downward sloping, there may still exist an equilibrium for the monopolist.
16.2 Comparative Statics
See Figure 16.3. Note the following:
Figure 16.3: If MC increases, output falls, assuming MR is negatively sloped, which is usually the case.
• If MC increases, (say because an input becomes more expensive), output will fall, assuming
MR is negatively sloped.
• Factors that shift MR will cause output to increase or decrease along the MC schedule. A
constant elasticity of demand function gives
$$ y = A p^{\eta} p_z^{\gamma} I^{e} $$
where z is another good, I is income, p = price of y, and pz = price of z. Inverse demand is
given by
$$ p = A^{-1/\eta} y^{1/\eta} p_z^{-\gamma/\eta} I^{-e/\eta} $$
and
$$ MR = p\,(1 + 1/\eta) = (1 + 1/\eta)\, A^{-1/\eta} y^{1/\eta} p_z^{-\gamma/\eta} I^{-e/\eta} $$
Thus increases in I or pz shift MR.
Examples:
• Linear: y = a − bp, and c(y) = α + βy. To find inverse demand, note that p = a/b − y/b. Let
a′ = a/b and b′ = 1/b. Inverse demand may be written p = a′ − b′y. Revenue is given by
yp(y) = a′y − b′y^2 =⇒ M R(y) = a′ − 2b′y. See Figure 16.4. Equating MC and MR, we
obtain a′ − 2b′y = β, or
$$ y^* = \frac{a' - \beta}{2b'} $$
and
$$ p^* = a' - b' y^* = \frac{a' + \beta}{2} $$
• Exponential: y = ap^η , η < −1. Inverse demand is given by p = a′y^{1/η} , where a′ = a^{−1/η} , and
revenue equals yp(y) = a′y^{1+1/η} , hence we have
$$ MR = (1 + 1/\eta)\, a' y^{1/\eta} $$
Suppose cost also is exponential, i.e. c(y) = αy^β , β > 0. This implies M C = αβy^{β−1} . Profit
is thus a′y^{1+1/η} − αy^β , and the FONC is
$$ (1 + 1/\eta)\, a' y^{1/\eta} = \alpha\beta y^{\beta - 1} $$
The SOC is
$$ \frac{1 + 1/\eta}{\eta}\, a' y^{1/\eta - 1} - \alpha\beta(\beta - 1) y^{\beta - 2} < 0 $$
which is automatically satisfied whenever β > 1, since the first term is negative for η < −1. Solving the FONC,
$$ y = \left[\left(1 + \frac{1}{\eta}\right) \frac{a'}{\alpha\beta}\right]^{\frac{\eta}{\eta(\beta - 1) - 1}} $$
Note that the optimal choice depends on the parameters of the demand and cost functions.
A change in elasticity of demand causes a shift in the optimal choice of output.
Figure 16.4: Marginal revenue in the case of linear demand.
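The two examples can be checked numerically. The sketch below assumes the demand and cost forms used in the examples (linear demand with constant marginal cost β, and constant-elasticity demand with cost αy^β); the function names and the parameter values are hypothetical.

```python
def linear_monopoly(a, b, beta):
    """Monopoly output and price for demand y = a - b*p and constant marginal cost beta."""
    a_inv = a / b        # intercept of inverse demand
    b_inv = 1.0 / b      # slope (in absolute value) of inverse demand
    y_star = (a_inv - beta) / (2.0 * b_inv)  # from MR = a_inv - 2*b_inv*y = beta
    p_star = a_inv - b_inv * y_star
    return y_star, p_star


def exponential_monopoly(a, eta, alpha, beta):
    """Output solving the FONC for demand y = a*p**eta (eta < -1) and cost alpha*y**beta."""
    a_inv = a ** (-1.0 / eta)
    base = (1.0 + 1.0 / eta) * a_inv / (alpha * beta)
    y_star = base ** (eta / (eta * (beta - 1.0) - 1.0))
    # Evaluate both sides of the FONC at y_star so the caller can compare them.
    mr = (1.0 + 1.0 / eta) * a_inv * y_star ** (1.0 / eta)
    mc = alpha * beta * y_star ** (beta - 1.0)
    return y_star, mr, mc


print(linear_monopoly(a=10.0, b=1.0, beta=2.0))
print(exponential_monopoly(a=1.0, eta=-2.0, alpha=1.0, beta=2.0))
```

In the linear case the price lands halfway between the demand intercept and marginal cost; in the exponential case the returned MR and MC should agree up to rounding.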
16.3 Monopoly in Two or More Markets
Suppose a monopolist has access to two markets.

Market 1: p1 = p1 (y1 )
Market 2: p2 = p2 (y2 )
If trade is restricted between the two markets, then p1 and p2 can differ. The firm’s profits are
π = p1 y1 + p2 y2 − c(y1 + y2 )
The FONC are

$$ p_1 + y_1 \frac{\partial p_1}{\partial y_1} - c'(y_1 + y_2) = 0, \quad \text{or } MR_1 = MC $$
and
$$ p_2 + y_2 \frac{\partial p_2}{\partial y_2} - c'(y_1 + y_2) = 0, \quad \text{or } MR_2 = MC $$
Since M R1 = p1 (1 + 1/η1 ) and M R2 = p2 (1 + 1/η2 ), and in light of the FONC, our model predicts
that
$$ p_1 (1 + 1/\eta_1) = p_2 (1 + 1/\eta_2) \implies \frac{p_1}{p_2} = \frac{1 + 1/\eta_2}{1 + 1/\eta_1} = \frac{\eta_1 (1 + \eta_2)}{\eta_2 (1 + \eta_1)} $$
For example, if η1 = −1.5 and η2 = −2.5, then p1 /p2 = 0.6/(1/3) = 1.8. The monopolist charges more in the
more inelastic market. This is known as price discrimination.
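The price ratio is easy to tabulate for other elasticities. This is a minimal sketch; the function name is hypothetical, and both elasticities are assumed constant and below −1 so that each market has positive marginal revenue.

```python
def price_ratio(eta1, eta2):
    """Return p1/p2 implied by MR1 = MR2 with constant elasticities eta1 and eta2."""
    for eta in (eta1, eta2):
        if eta >= -1.0:
            raise ValueError("each market must have elastic demand: eta < -1")
    return (1.0 + 1.0 / eta2) / (1.0 + 1.0 / eta1)


print(price_ratio(-1.5, -2.5))  # the example from the text
print(price_ratio(-2.5, -1.5))  # swapping the markets inverts the ratio
```

The ratio exceeds one exactly when market 1 is the less elastic of the two.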
17 Monopoly II
We have shown that a monopolist prefers to distinguish between markets and charge more to
customers with more inelastic demand. This phenomenon is called price discrimination. Sellers
have a strong incentive to attempt to separate customers according to their demand elasticities,
and charge discriminatory prices. Consumers, on the other hand, have a strong incentive to imitate
high-elasticity consumers. There are many devices to separate consumers according to elasticity:
• Advanced purchase versus regular coach fares on airlines. Here, the airlines discriminate against
customers who book at the last minute (typically business travelers) and charge lower prices
to consumers who are willing to shop around.
• Single tokens versus monthly passes on public transit. Presumably, commuters have more
elastic demand for public transit than out-of-town or occasional passengers.
• Discount coupons. Here, retailers are willing to charge lower prices to consumers who are
better informed while continuing to charge high prices to consumers who “can’t be bothered”
with coupons (and therefore reveal themselves as having inelastic demands).
• After-season sales, special Monday-and-Tuesday only sales. Again, retailers are attempting
to separate high-elasticity consumers from those who want only up-to-date items at the peak
of their popularity.
In each case, the key to price discrimination is to impose a cost on the low-price consumers (those
with more elastic demands), in order to prevent high-price consumers from masquerading as low-
price consumers. The cost must be too high for low-elasticity consumers, yet not high enough to
discourage others from buying altogether.
As an example, suppose that, across a population, individual demand elasticities are negatively
correlated with wages. Those with the highest wages have the most inelastic demand; those with
the lowest wages have the most elastic demand. A firm can use a queue, or “line-up,” as follows:
charge a high price with no waiting time, and a low price to those who are willing to line up in a
queue for a while, (e.g. price difference between buying a ticket at a box office versus buying over
the phone.) For a consumer with wage w, the full price is given by

$$ p' = \begin{cases} p & \text{if she decides not to wait} \\ p + wt - d & \text{if she waits, where } d = \text{price discount and } t = \text{waiting time} \end{cases} $$
For this individual, if wt > d, she bypasses the line and pays p, whereas if wt < d, she waits in
line and pays p − d. The firm has successfully charged two prices!
Another way to implement price discrimination is by charging less to those who buy more. Suppose, for example, that there are two
kinds of buyers, (1) low-volume buyers with inelastic demand, and (2) high-volume buyers with
elastic demand. See Figure 17.1. The monopolist can choose y′ between y1∗ and y2∗ and offer a
two-tiered price system: p1 /unit for those who buy less than y′ , and p2 /unit for those who buy
at least y′ . Note that we must have p1 y1∗ < p2 y2∗ , or else the low-volume customers would buy y′
units and discard what they don’t need.
Figure 17.1: On the left, low-volume buyers with inelastic demand, and on the right, high-volume buyers
with elastic demand. The monopolist can price discriminate in this case.
Figure 17.2: Ideally, the monopolist would charge a separate price for each unit sold.
The ultimate price discrimination strategy would involve
charging a separate price for each unit sold, as in Figure 17.2 (for the first unit sold, charge p1 , for
the 20th unit, p20 , etc.). Notice that in this case the MR of the next unit sold is equal to its price,
since the seller doesn’t have to lower prices on the infra-marginal (previously sold) units to sell
an additional unit, which means the MR curve is identical to the demand function.9 Thus under
perfect price discrimination:
• quantity is equal to its level under perfect competition
• monopolist revenue = area underneath demand curve
Relative to a perfect price discrimination scheme, consumers benefit when all consumers pay the
same price. The savings to all consumers is the shaded area underneath the demand curve, above
the price line in Figure 17.3. This area is sometimes called consumer’s surplus (CS). By analogy,
the area over the MC curve and up to price line is called producer’s surplus (PS). We have noted
previously that this area equals revenue less total variable cost, so P S = π + F . Also, we saw that
in a competitive industry, the supply schedule is simply the combined MC schedule of the firms that
comprise the industry. The area between the supply and demand curves, (or MC and demand),
represents the total surplus CS + P S pictured in Figure 17.4. This is consistent because CS and
PS both are measured in dollars. Applied economists often evaluate the effect of a government
intervention in a given market by computing ∆CS + ∆P S + GC, where GC denotes the cost to the
government.
9 Actually, as you shall see if you continue reading, this is not quite true. The demand function represents the
number of units demanded when all units sell for a given price.
Figure 17.3: The shaded region is sometimes called consumer’s surplus.
Figure 17.4: The dark region is referred to as producer’s surplus.
The appeal of this exercise is obvious: it assigns a dollar value to the inefficiency that
arises due to monopolization or the imposition of a tax/subsidy. Nonetheless, there is a problem.
Recall that if y = D(p) is the demand function, D indicates how much is purchased when y costs
p/unit. On the other hand, D says nothing regarding demand when the price of the next unit is
p but prices for all previous units are higher. In general, having paid more for the inframarginal
units, the consumers who purchased them have less income with which to purchase additional units.
Higher prices on the inframarginal units have an income effect that is not captured by the ordinary
demand function. In fact, the only case in which CS and PS analysis is completely legitimate is
the one in which demand does not depend on income (the slope of the indifference curve through
x^1 = (x_1 , x_2^1 ), at x^1 , equals the slope of the indifference curve through x^2 = (x_1 , x_2^2 ), at x^2 ), or
each consumer buys at most one unit of the commodity (so that higher prices for the first—and
only—unit purchased don’t lower subsequent demands).
In spite of this problem, CS and PS analysis is a good starting point for evaluating the merits of a
market intervention. For example, suppose that a market is in equilibrium at p = p0 , x = x0 , when
a per-unit tax of t is imposed as in Figure 17.5. Demand falls to x1 , and the amount received by
suppliers falls to p1 . Tax revenue is tx1 . The combined loss in CS and PS, however, exceeds the tax
revenue by an amount equal to the area of the shaded triangle. This excess loss is referred to as
the deadweight loss due to the tax. It provides a rough estimate of the inefficiency brought on by
the tax.
Figure 17.5: The tax scenario pictured is inefficient because the shaded triangle can be thought of as lost
since it is neither revenue nor savings to any of the parties involved in this market.
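To make the triangle concrete, here is a minimal sketch for a market with linear demand and supply. The functional forms (demand x = a − b·p_d, supply x = c + d·p_s, with the buyer's price p_d = p_s + t), the parameter values, and the function name are assumptions made for illustration; the notes do not specify them.

```python
def tax_outcome(a, b, c, d, t):
    """Per-unit tax t in a linear market (assumed forms): demand a - b*p_d, supply c + d*p_s."""
    p0 = (a - c) / (b + d)             # pre-tax equilibrium price
    x0 = a - b * p0                    # pre-tax equilibrium quantity
    p1 = (a - c - b * t) / (b + d)     # price received by suppliers after the tax
    x1 = c + d * p1                    # post-tax quantity
    tax_revenue = t * x1
    deadweight_loss = 0.5 * t * (x0 - x1)  # area of the triangle between x1 and x0
    return {"p0": p0, "x0": x0, "p1": p1, "x1": x1,
            "tax_revenue": tax_revenue, "deadweight_loss": deadweight_loss}


print(tax_outcome(a=10.0, b=1.0, c=0.0, d=1.0, t=2.0))
```

In this example the combined loss in CS and PS is 9, tax revenue is 8, and the difference of 1 is the deadweight loss.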
Exercises:
1. Calculate deadweight loss in terms of elasticities of supply and demand.
2. Prove that CS + P S is a maximum when D intersects M C.
3. Calculate ∆CS + ∆P S when a competitive industry is monopolized.
18 Consumer’s Surplus
In Econ 1 you probably were introduced to the concept of consumer’s surplus (CS). Consider a
consumer who is choosing between two goods, x and y. Denote by x(px , py , I) the consumer’s
demand for good x, given prices px and py , and income I. Now suppose the price of good x rises
from p^0_x to p^1_x . The change in consumer’s surplus is the shaded region in Figure 18.1, which can be
written as
$$ \Delta CS = \int_{p_x^0}^{p_x^1} x(p_x, p_y, I)\, dp_x $$
As we noted previously, there is a problem with CS: although the vertical height of the inverse
demand function appears to be the most you would be willing to pay for each additional unit,
if someone actually charged you different prices for each unit, your demand would not be given
by the conventional demand curve, (since that is derived under the assumption that you pay the
same price for every unit you purchase). There is, however, a measure of welfare that does make
sense; in fact, there are two. Let u^0 represent utility at (p^0_x , py , I), and let u^1 represent utility at
(p^1_x , py , I). Note that u^0 > u^1 since a rise in prices makes a consumer worse off.
Figure 18.1: The shaded region represents the change in consumer’s surplus due to an increase in px .
Also note that I = e(p^0_x , py , u^0 ). In other words I is the minimum amount of money needed to
achieve u^0 at prices p^0_x and py . This follows from the fact that the consumer wasn’t wasting money
initially.
Likewise, I = e(p^1_x , py , u^1 ). (Make sure you understand why this must be true.)
Consider the quantity
$$ EV = e(p_x^1, p_y, u^1) - e(p_x^0, p_y, u^1) = I - e(p_x^0, p_y, u^1) $$
This is the amount of money one would have to take away from our consumer initially, leaving
prices alone, so that he would be indifferent regarding a rise in prices. This is called equivalent
variation. It can be thought of as the income equivalent of a rise in prices or, more specifically, a
natural means of measuring the effect on welfare of a rise in prices.
Alternatively, consider the quantity
$$ CV = e(p_x^1, p_y, u^0) - e(p_x^0, p_y, u^0) = e(p_x^1, p_y, u^0) - I $$
This is the amount of money one would have to provide our consumer in order for him to be as
well off under the new prices as he was initially. This is called the compensating variation. It also
appears to be a plausible measure of the effect on welfare of a rise in prices.
Now we shall use Shephard’s Lemma to connect these two quantities to the area underneath compensated
demand curves. Specifically, start from the fact that
$$ \frac{\partial}{\partial p_x} e(p_x, p_y, u^0) = x^c(p_x, p_y, u^0) $$
By the Fundamental Theorem of Calculus,
$$ e(p_x^1, p_y, u^0) = e(p_x^0, p_y, u^0) + \int_{p_x^0}^{p_x^1} \frac{\partial}{\partial p_x} e(p_x, p_y, u^0)\, dp_x $$
hence we have
$$ CV = \int_{p_x^0}^{p_x^1} x^c(p_x, p_y, u^0)\, dp_x $$
which is the area “underneath” the compensated demand curve from p0x to p1x . See Figure 18.2.
Figure 18.2: The area of the shaded region is the Compensating Variation.
Note that x^c(p^0_x , py , u^0 ) = x(p^0_x , py , I), so the regular demand curve, with I, and the compensated
demand curve, with u^0 , intersect at (x^0 , p^0_x ). But the regular demand curve is flatter. Why? Recall
that by Slutsky:
$$ \frac{\partial x}{\partial p_x} = \frac{\partial x^c}{\partial p_x} - x \frac{\partial x}{\partial I} $$
If x is a normal good, then ∂x/∂I > 0, and a rise in prices causes the regular demand to decrease
faster than the compensated demand because of the income effect, hence it appears flatter. All of
this implies CV > ∆CS for a normal good.
For the EV ,
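$$ \begin{aligned}
e(p_x^1, p_y, u^1) &= e(p_x^0, p_y, u^1) + \int_{p_x^0}^{p_x^1} \frac{\partial}{\partial p_x} e(p_x, p_y, u^1)\, dp_x \\
&= e(p_x^0, p_y, u^1) + \int_{p_x^0}^{p_x^1} x^c(p_x, p_y, u^1)\, dp_x
\end{aligned} $$
so
$$ EV = \int_{p_x^0}^{p_x^1} x^c(p_x, p_y, u^1)\, dp_x $$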
which is the area “underneath” the compensated demand curve between p0x and p1x , with u = u1 ,
which intersects the regular demand curve at (x1 , p1x ). So we have Figure 18.3 for a normal good x.
We have shown that CV > ∆CS > EV , so you can think of ∆CS as approximating either one of
these.
Figure 18.3: The area of the shaded region is the Equivalent Variation.
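The ranking CV > ∆CS > EV can be verified in a concrete case. The sketch below assumes Cobb-Douglas utility u = x^α y^(1−α), which is not specified in this section; for that utility the ordinary demand is x = αI/px and the expenditure function is proportional to px^α, which gives the closed forms in the code. The function name and parameter values are hypothetical.

```python
import math


def cobb_douglas_welfare(alpha, income, p0, p1):
    """CV, change in CS, and EV when the price of x rises from p0 to p1 (Cobb-Douglas assumed)."""
    # Ordinary demand is x = alpha * I / p_x, so the CS integral is alpha * I * ln(p1 / p0).
    delta_cs = alpha * income * math.log(p1 / p0)
    # e(p_x, p_y, u) is proportional to p_x**alpha, and e(p0, p_y, u0) = e(p1, p_y, u1) = I.
    cv = income * ((p1 / p0) ** alpha - 1.0)   # e(p1, u0) - I
    ev = income * (1.0 - (p0 / p1) ** alpha)   # I - e(p0, u1)
    return cv, delta_cs, ev


cv, delta_cs, ev = cobb_douglas_welfare(alpha=0.5, income=100.0, p0=1.0, p1=4.0)
print(cv, delta_cs, ev)
print(cv > delta_cs > ev)
```

Good x is normal under Cobb-Douglas preferences, so the ordering matches the one derived above.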
19 Duopoly
The simplest market to analyze, in between the two extremes of perfect competition and monopoly,
is one with two suppliers. In particular, suppose there are two suppliers of a homogeneous good,
one that cannot be differentiated by consumers. Let y1 denote the amount supplied by Firm 1,
and y2 the amount supplied by Firm 2 so that the inverse demand function is given by p(y1 + y2 ).
Note that inverse demand is a function of the sum y1 + y2 , reflecting the assumption that the two
outputs are perfect substitutes. We shall assume the following, for simplicity
• p(y1 + y2 ) = a − b(y1 + y2 ), i.e. linear demand
• M C = c/unit, a constant
The problem facing these firms is simple:
Firm 1: choose y1 so as to maximize π1 (y1 , y2 ) = y1 p(y1 + y2 ) − cy1
Firm 2: choose y2 so as to maximize π2 (y1 , y2 ) = y2 p(y1 + y2 ) − cy2
Note that Firm 1’s objective function depends on Firm 2’s choice, and vice versa.
19.1 Monopolization
What would a monopolist do? Suppose a monopolist owned both firms. Then she would choose y1
and y2 as follows:
$$ \begin{aligned}
\max_{y_1, y_2}\; y_1 p(y_1 + y_2) + y_2 p(y_1 + y_2) - cy_1 - cy_2 &= \max_{y_1 + y_2}\; (y_1 + y_2)\, p(y_1 + y_2) - c(y_1 + y_2) \\
&= \max_{y}\; y p(y) - cy \\
&= \max_{y}\; (a - c) y - by^2
\end{aligned} $$
The FONC is (a − c) − 2by = 0, or

$$ y = y^M = \frac{a - c}{2b} $$
where M signifies monopoly. This implies that
$$ p = p^M = \frac{a + c}{2} $$
Now suppose p = pM but the firms have separate ownership groups, each producing y M /2. (This
would constitute a perfect cartel.) Is this an equilibrium? Probably not. For Firm 1,
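$$ MR_1 = \frac{\partial}{\partial y_1} \left[ y_1 p(y_1 + y_2) \right] $$
If Firm 1 could increase output with the assurance that Firm 2 would not follow suit, then
$$ \begin{aligned}
MR_1 &= p(y_1 + y_2) + y_1 \frac{\partial p}{\partial y_1} \\
&= p^M - \frac{b}{2} y^M \\
&= \frac{a + c}{2} - \frac{a - c}{4} \\
&= \frac{3c}{4} + \frac{a}{4} > c = MC
\end{aligned} $$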
Thus under a joint monopoly (both firms producing y M /2), each firm has an incentive to cheat.
For the industry as a whole,
(y1 + y2 )p(y1 + y2 ) = a(y1 + y2 ) − b(y1 + y2 )2 =⇒ M R = a − 2b(y1 + y2 )
For an individual firm, however,
y1 p(y1 + y2 ) = y1 [a − b(y1 + y2 )] =⇒ M R1 = a − 2by1 − by2 > M R
This is a fundamental problem with a cartel; each firm has an incentive to cheat and produce more
if it has any reason to believe the other firm will hold constant its production. The reason is that
when Firm 1 increases output, it considers only how this affects the price of the units it sells; Firm
1 ignores the fact that prices fall for Firm 2 as well. A monopolist, by contrast, takes account of the full
effect of a price change on all units sold.
19.2 Duopoly Equilibrium
How does the duopoly market equilibrate? The answer depends on how much each firm believes the
other will react to a change in the level of output. The simplest assumption is the one we made
above, that Firm 1 does not believe Firm 2 will adjust its output and vice versa. This assumption
was suggested by Cournot, a 19th century French economist. Let’s consider Firm 1’s optimal choice
in this case. Fix y2 . Then Firm 1’s objective is
$$ \max_{y_1}\; y_1 p(y_1 + y_2) - cy_1 = \max_{y_1}\; y_1 [a - c - b(y_1 + y_2)] $$
The FONC is
$$ a - c - by_2 - 2by_1 = 0 $$
The SOC is not a concern. (Why?) This leads to
$$ y_1 = y_1^*(y_2) = \frac{a - c - by_2}{2b} $$
The function y1∗ is called Firm 1’s reaction function. It represents the optimal choice by Firm 1, as
a function of Firm 2’s level of output, under the Cournot assumption that Firm 2 will not respond
further.
Observations:
• If y2 = 0, then Firm 1 acts as a monopolist: y1∗ (0) = (a − c)/2b = y M .
• If y2 ≥ (a − c)/b = 2y M , then y1∗ (y2 ) = 0, that is, Firm 1 is driven out of the market.
• The slope of the reaction function is −1/2. Every two additional units produced by Firm 2
cause Firm 1 to reduce output by one unit.
Figure 19.1: Firm 1’s reaction function, assuming linear demand.
By the same token, there is a reaction function for Firm 2, taking Firm 1’s output as given.
Following the same procedure as above,
$$ y_2^*(y_1) = \frac{a - c}{2b} - \frac{y_1}{2} $$
If Firm 1 decides upon its output, given Firm 2’s output, and Firm 2 does the same, then where
does this process end? Presumably, it ends when Firm 1’s choice, taking Firm 2’s output as given,
is such that given this level of output, Firm 2 produces the same level of output as Firm 1 thought
it would. Formally,
y1 = y1∗ (y2∗ (y1 ))
If y1 is an equilibrium choice, then it has the property that when Firm 1 chooses y1 , Firm 2
chooses y2∗ (y1 ), and the optimal response by Firm 1 is y1∗ (y2∗ (y1 )), which leads us back to y1 . In
mathematical terms, y1 is called a fixed point of the composition of functions y1∗ ◦ y2∗ .
Fortunately for us, there is a convenient way of visualizing a Cournot equilibrium. We simply plot
the reaction functions (remembering which is which!) as in Figure 19.2. Equilibrium occurs when

$$ y_1 = y_1^*(y_2) = y_1^*(y_2^*(y_1)) = \frac{a - c}{2b} - \frac{1}{2} \left( \frac{a - c}{2b} - \frac{y_1}{2} \right) $$
Solving,
$$ y_1 = y_2 = \frac{2}{3} y^M $$
and therefore
$$ p = \frac{a + 2c}{3} $$
The details are left to the reader.
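One way to see the fixed point is to iterate the two reaction functions until outputs stop changing. This is a sketch under the linear-demand, constant-MC assumptions of this section; the function names, the starting point, and the tolerance are arbitrary choices, not part of the notes.

```python
def best_response(y_other, a, b, c):
    """Cournot reaction function; output is truncated at zero when the rival floods the market."""
    return max(0.0, (a - c - b * y_other) / (2.0 * b))


def cournot_equilibrium(a, b, c, tol=1e-12, max_iter=1000):
    """Iterate best responses from (0, 0) until both outputs settle down."""
    y1 = y2 = 0.0
    for _ in range(max_iter):
        y1_new = best_response(y2, a, b, c)
        y2_new = best_response(y1_new, a, b, c)
        converged = abs(y1_new - y1) < tol and abs(y2_new - y2) < tol
        y1, y2 = y1_new, y2_new
        if converged:
            break
    price = a - b * (y1 + y2)
    return y1, y2, price


y1, y2, price = cournot_equilibrium(a=10.0, b=1.0, c=1.0)
print(y1, y2, price)
```

The iteration converges because each pass through the composition y1∗ ◦ y2∗ shrinks the distance to the fixed point by a factor of four (the product of the two slopes of −1/2).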
Figure 19.2: Cournot equilibrium supply y^0 = (2/3) y^M .
19.3 Price Setting vs. Quantity Setting
The previous section was an analysis of the outcome when two duopolists take each other’s output
as given. A similar analysis can be carried out when duopolists set prices. For example, consider
end-to-end railroads that wish to set the rates for freight. Railroad 1 hauls from point A to point B,
at p1 /ton, and Railroad 2 hauls from point B to point C, at p2 /ton. Demand for transport services
from A to C depends on p1 + p2 . Assume for the sake of simplicity, that demand is linear:
x = a − b(p1 + p2 )
Note that this means the two segments are perfect complements. (They are consumed together, so
demand is a function of p1 + p2 only.) Let’s assume, too, that cost per ton for Railroad 1 is c1 , and
cost per ton for Railroad 2 is c2 . Suppose a single firm owned both railroads. Then it would choose
a total price p so as to maximize
π(p) = (a − bp)(p − c1 − c2 )
The FONC implies
$$ p = \frac{a + b(c_1 + c_2)}{2b} = p^M $$
where pM denotes the monopolist’s price.
Now suppose the two railroads act as duopolists, each taking the other’s price as given. For the
first railroad,
π1 (p1 , p2 ) = [a − b(p1 + p2 )](p1 − c1 )
The FONC implies
$$ p_1^*(p_2) = \frac{a - bp_2 + bc_1}{2b} $$
which looks a lot like the reaction function in the quantity-setting scenario. In particular, the slope
is again −1/2. By symmetry, Railroad 2’s reaction function is
$$ p_2^*(p_1) = \frac{a - bp_1 + bc_2}{2b} $$
In equilibrium, p1 = p∗1 (p∗2 (p1 )). Solving,
$$ p_1^* = \frac{a}{3b} + \frac{2c_1 - c_2}{3} $$
and
$$ p_2^* = \frac{a}{3b} + \frac{2c_2 - c_1}{3} $$
For price-setting duopolists who sell perfectly complementary products in a market with linear
demand,
$$ p_1^* + p_2^* = \frac{2a}{3b} + \frac{c_1 + c_2}{3} $$
Note that
$$ p_1^* + p_2^* - p^M = \frac{a - b(c_1 + c_2)}{6b} $$
If the railroads charged a combined c1 + c2 , demand would be x = a − b(c1 + c2 ) > 0. Thus the
duopolists actually charge an even higher price than a monopolist. This special result is due to the
perfect complementarity.
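The comparison can be reproduced with a few lines of arithmetic. This sketch simply evaluates the formulas derived above; the function name and the parameter values are hypothetical.

```python
def railroad_prices(a, b, c1, c2):
    """Monopoly price and duopoly prices for demand x = a - b*(p1 + p2)."""
    p_monopoly = (a + b * (c1 + c2)) / (2.0 * b)
    p1 = a / (3.0 * b) + (2.0 * c1 - c2) / 3.0
    p2 = a / (3.0 * b) + (2.0 * c2 - c1) / 3.0
    return p_monopoly, p1, p2


a, b, c1, c2 = 12.0, 1.0, 2.0, 1.0
p_m, p1, p2 = railroad_prices(a, b, c1, c2)
# Each duopoly price should be a best response to the other.
print(p1, (a - b * p2 + b * c1) / (2.0 * b))
print(p2, (a - b * p1 + b * c2) / (2.0 * b))
# The combined duopoly price exceeds the monopoly price by (a - b*(c1 + c2)) / (6*b).
print(p1 + p2 - p_m, (a - b * (c1 + c2)) / (6.0 * b))
```

Each railroad ignores the fact that raising its own rate reduces the other railroad's traffic, so the two markups are stacked on top of each other.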
20 Symmetric Cournot Equilibria
20.1 n-Firm Symmetric Cournot Equilibria
Duopoly is simple when the two firms are identical and equilibrium is symmetric, with each firm
producing an equal share of the industry supply. Let us continue to assume linear demand. Recall
that for Firm 1,
$$ \pi_1 = y_1 p(y_1 + y_2) - cy_1 = ay_1 - by_1(y_1 + y_2) - cy_1 = (a - by_2 - c) y_1 - by_1^2 $$
The FONC is
a − by2 − c = 2by1
Let y1 = y2 = y 0 , since we are in a symmetric equilibrium. Now solving the FONC,
$$ y^0 = \frac{a - c}{3b} = \frac{2}{3} y^M $$
The same appeal to symmetry enables us to solve for equilibrium output in a market with n
suppliers. In this case
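$$ \pi_1 = y_1\, p\!\left( \sum_{i=1}^{n} y_i \right) - cy_1 = ay_1 - by_1 \sum_{i=1}^{n} y_i - cy_1 $$
The FONC is
$$ a - b \sum_{i=2}^{n} y_i - 2by_1 - c = 0 $$
As before, yi = y^0 for all i, so
$$ y^0 = \frac{a - c}{b(n + 1)} $$
and
$$ p = p^0 = a - b(ny^0) = \frac{1}{n + 1}\, a + \frac{n}{n + 1}\, c $$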
As the number of firms increases, (relative to the “size” of the market), the symmetric Cournot
equilibrium has each firm supplying less and less, and price converging to the competitive price c.
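The convergence to the competitive price is easy to tabulate. This sketch evaluates the symmetric-equilibrium formulas above for several values of n; the function name and the parameter values are hypothetical.

```python
def symmetric_cournot(n, a, b, c):
    """Per-firm output and price in the symmetric n-firm Cournot equilibrium."""
    y_each = (a - c) / (b * (n + 1))
    price = a - b * n * y_each
    return y_each, price


for n in (1, 2, 5, 50, 1000):
    y_each, price = symmetric_cournot(n, a=10.0, b=1.0, c=1.0)
    print(n, y_each, price)
```

With n = 1 the formulas reproduce the monopoly outcome, and as n grows the price approaches c while each firm's output shrinks toward zero.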
As a practical matter, the presence of fixed costs often prevents us from having a large number of
firms in a given industry. With fixed costs, there is a social cost to more firms—namely, that total
fixed costs associated with the industry rise—as well as a benefit due to less monopolistic behavior.
In our example, since costs are constant, there is no inefficiency as output per firm falls.
The problem is illustrated by Figure 20.1. In each case, the firm has c(y) = ky^2 /2 + F , and therefore
M C = ky and AC = ky/2 + F/y, which is U-shaped. Minimum AC is achieved by choosing y such
that M C = AC, or y^E = √(2F/k). (E signifies efficient scale.) In Figure 20.1b, in order to have
three or more firms, p must exceed p0 ; otherwise, firms would fail to recover their fixed costs. In
some cases p exceeds even the price that a monopolist would charge.
Figure 20.1: (a) Competitive paradigm. min AC is achieved at small scale relative to the size of the
market. (b) Non-competitive paradigm. min AC is achieved at a level of output that is large
relative to the size of the market.
20.2 Alternatives to the Cournot Assumption
1. Return for a moment to the duopoly model of Section 19, with linear demand and constant
marginal cost. Recall that Firm 1’s objective is to maximize
π1 = y1 p(y1 + y2 ) − cy1
Under the Cournot assumption Firm 1 selects y1∗ (y2 ) under the assumption that y2 is fixed.
Suppose, however, that Firm 1 has reason to believe Firm 2 will respond to Firm 1’s choice
by setting y2 = ψ(y1 ). What does Firm 1 do in this case? The FONC is
$$ p(y_1 + y_2) + y_1 p'(y_1 + y_2) [1 + \psi'(y_1)] - c = 0 $$
For example, Firm 2 might announce: “We plan to increase our output in (constant) proportion
to yours.” Then
$$ \frac{dy_2}{y_2} = \frac{dy_1}{y_1} $$
which implies
$$ \psi'(y_1) = \frac{dy_2}{dy_1} = \frac{y_2}{y_1} $$
and the FONC becomes
$$ p(y_1 + y_2) + (y_1 + y_2)\, p'(y_1 + y_2) - c = 0 $$
But this should remind you of the FONC for joint profit maximization that we saw in Sec-
tion 19.1. Therefore, if each firm announced to the other the rule that
$$ \frac{dy_i}{dy_j} = \frac{y_i}{y_j}, \quad i, j = 1, 2 $$
the two firms maintain the same level of output as a joint-monopoly, (provided each believes
the other).
2. A second class of alternatives to the Cournot assumption involves a duopoly in which one
firm is “savvy” and the other one is “naive.” Suppose for example that Firm 2 always takes
y1 as given, i.e. Firm 2 adopts the Cournot assumption. Firm 1 on the other hand is savvy,
and recognizes Firm 2 is employing the Cournot reaction function y2∗ (y1 ). Firm 1 is said to
be a Stackelberg leader while Firm 2 is a Stackelberg follower. (Stackelberg was an early 20th
century German economist.) It can be shown that (1) the leader does better than the follower,
(2) the leader does better than either firm would in a symmetric Cournot model, and (3) the
follower does worse than either firm would in a symmetric Cournot model.10
10 Condition (3) is redundant since the symmetric Cournot equilibrium is Pareto optimal.
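The three claims can be checked numerically in the linear case. The sketch below assumes the leader maximizes y1 [a − c − b(y1 + y2∗ (y1 ))], which after substituting the follower's reaction function reduces to y1 (a − c − by1 )/2 and gives y1 = (a − c)/2b; this closed form is derived here for illustration and is not stated in the notes. Function names and parameter values are hypothetical.

```python
def stackelberg(a, b, c):
    """Leader/follower outputs and profits with linear demand and constant MC (assumed setup)."""
    y_leader = (a - c) / (2.0 * b)                    # maximizes y1 * (a - c - b*y1) / 2
    y_follower = (a - c - b * y_leader) / (2.0 * b)   # follower's Cournot reaction
    margin = a - b * (y_leader + y_follower) - c
    return y_leader, y_follower, margin * y_leader, margin * y_follower


def cournot_profit(a, b, c):
    """Per-firm profit in the symmetric Cournot duopoly."""
    return (a - c) ** 2 / (9.0 * b)


y_l, y_f, pi_l, pi_f = stackelberg(a=10.0, b=1.0, c=2.0)
print(y_l, y_f, pi_l, pi_f, cournot_profit(10.0, 1.0, 2.0))
print(pi_l > cournot_profit(10.0, 1.0, 2.0) > pi_f)
```

In this example the leader produces the monopoly output, and the follower is left with half of that.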
21 Game Theory I
In Sections 19 and 20 we considered a duopoly with linear demand
p(y1 + y2 ) = a − b(y1 + y2 )
and constant marginal cost

M C1 = M C2 = c
We identified three possible strategies:
1. Cooperation. Each firm produces y M /2.
$$ y_i = \frac{y^M}{2} = \frac{a - c}{4b}, \quad i = 1, 2 $$
$$ p = p^M = \frac{a + c}{2} $$
$$ \pi_i = \frac{\pi^M}{2} = \frac{(a - c)^2}{8b}, \quad i = 1, 2 $$
2. Joint non-cooperation. Each firm produces $y^0 = 2y^M/3$.
\[ y_i = \frac{2}{3}\, y^M = \frac{a - c}{3b}, \qquad i = 1, 2 \]
\[ p = p^0 = \frac{a + 2c}{3} \]
\[ \pi_i = \pi^0 = \frac{(a - c)^2}{9b}, \qquad i = 1, 2 \]
The situation is jointly non-cooperative in the sense that each firm is acting in its own,
narrowly defined best interest, given what the other firm is producing. Given that Firm 1
produces $y^0$, Firm 2 is advised to produce $y^0$ as well.
3. Cheating given that your competitor is cooperating. For example, if Firm 1 sets $y_1 = y^M/2$,
Firm 2’s best response is
\[ y_2 = y_2^*\!\left(\frac{y^M}{2}\right) = y^M - \frac{1}{2} \cdot \frac{y^M}{2} = \frac{3}{4}\, y^M \]
which means
\[ p = p^C = \frac{3a + 5c}{8} \]
We have also
\[ \pi_1 = \pi^L = (p^C - c)\,\frac{y^M}{2} = \frac{3(a - c)}{8} \cdot \frac{a - c}{4b} = \frac{3(a - c)^2}{32b} \]
\[ \pi_2 = \pi^W = (p^C - c)\,\frac{3y^M}{4} = \frac{3(a - c)}{8} \cdot \frac{3(a - c)}{8b} = \frac{9(a - c)^2}{64b} \]
where W stands for “winner” (in this case cheater!), and L of course stands for loser. Notice
that
\[ \pi^W > \frac{1}{2}\pi^M > \pi^0 > \pi^L \]
Cooperation is better than joint non-cooperation but, given that your competitor is cooperating,
your best response is to cheat.
We can illustrate the dilemma in a box such as Figure 21.1, showing each firm’s actions and the
resulting payoffs as ordered pairs.

                                      Firm 2
                            Cooperate              Don't Cooperate
Firm 1   Cooperate          $\pi^M/2,\ \pi^M/2$    $\pi^L,\ \pi^W$
         Don't Cooperate    $\pi^W,\ \pi^L$        $\pi^0,\ \pi^0$

Figure 21.1: The strategies are listed along the edges of the box. The payoffs are listed in order with
Player 1 first.
E.g. if Firm 1 cooperates and Firm 2 does not, the payoffs are $(\pi^L, \pi^W)$, where the first coordinate
corresponds to Firm 1 and the second to Firm 2. Set $(a - c)^2/b = 1$. Then our “game box” looks
like Figure 21.2.

                      Player 2
                 C              ¬C
Player 1   C     1/8, 1/8       3/32, 9/64
           ¬C    9/64, 3/32     1/9, 1/9

Figure 21.2: “C” stands for Cooperate, and ¬C for not C, or Don’t Cooperate.
Given a box like this, we can figure out which strategy each player will adopt.
• Suppose Firm 2 believes Firm 1 will play C (cooperate). Firm 2 then checks the second
coordinate of each entry in row 1: 9/64 > 1/8, so Firm 2 plays ¬C (don’t cooperate). If instead
Firm 2 believes Firm 1 will play ¬C, then Firm 2 checks the second coordinate of each entry
in row 2: 1/9 > 3/32, so again Firm 2 plays ¬C.
• We can evaluate Firm 1’s choices the same way, only this time we check columns rather than
rows, and we compare first coordinates. The result is the same: Firm 1 is better off playing
¬C, regardless of Firm 2’s choice.
Notice that in this game there is always an incentive for each player to choose ¬C, regardless of what
the other player does. An action that is always the best response is called a dominant strategy. The
game pictured in Figure 21.3 has no dominant strategy: each player’s best response depends on what
the other player does. In this game, we say that
(C,C) and (¬C,¬C) are Nash equilibria. A Nash equilibrium in a 2-player game is a combination
of strategies (S, T) such that
1. Given that Player 1 has chosen S, Player 2’s best response is T.
2. Given that Player 2 has chosen T, Player 1’s best response is S.

                      Player 2
                 C              ¬C
Player 1   C     2, 2           1/2, 3/2      ← 2 chooses C if 1 plays C
           ¬C    3/2, 1/2       1, 1          ← 2 chooses ¬C if 1 plays ¬C
                 ↑              ↑
                 1 chooses C    1 chooses ¬C
                 if 2 plays C   if 2 plays ¬C

Figure 21.3: (C,C) and (¬C,¬C) are Nash equilibria since, given that either player plays C, his
opponent’s best response is to play C, and likewise for ¬C.
The duopoly game has a unique Nash equilibrium in (¬C,¬C). The game in Figure 21.3 has two
Nash equilibria, although one is superior to the other.
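The row-and-column check described above can be mechanized. The sketch below is illustrative only: the function names are hypothetical, the duopoly payoffs are computed from the linear model with $a = 1$, $b = 1$, $c = 0$ (so that $(a - c)^2/b = 1$), and, as in the text, ¬C means the cheating output against a cooperator and the Cournot output against a non-cooperator.

```python
from fractions import Fraction as Fr
from itertools import product

def pure_nash(payoffs):
    """Return all pure-strategy Nash equilibria of a 2-player game.

    payoffs[(s, t)] = (payoff to Player 1, payoff to Player 2), where
    s is Player 1's action and t is Player 2's action.
    """
    rows = sorted({s for s, _ in payoffs})
    cols = sorted({t for _, t in payoffs})
    equilibria = []
    for s, t in product(rows, cols):
        best_for_1 = all(payoffs[(s, t)][0] >= payoffs[(r, t)][0] for r in rows)
        best_for_2 = all(payoffs[(s, t)][1] >= payoffs[(s, q)][1] for q in cols)
        if best_for_1 and best_for_2:
            equilibria.append((s, t))
    return equilibria

def profit(y_own, y_other, a=Fr(1), b=Fr(1), c=Fr(0)):
    return (a - b * (y_own + y_other) - c) * y_own

y_m = Fr(1, 2)                 # monopoly output (a - c) / (2b)
coop = y_m / 2                 # cooperative output per firm
cournot = 2 * y_m / 3          # Cournot output per firm
cheat = y_m - coop / 2         # best response to a cooperating rival

duopoly = {
    ("C", "C"): (profit(coop, coop), profit(coop, coop)),
    ("C", "notC"): (profit(coop, cheat), profit(cheat, coop)),
    ("notC", "C"): (profit(cheat, coop), profit(coop, cheat)),
    ("notC", "notC"): (profit(cournot, cournot), profit(cournot, cournot)),
}

# The game of Figure 21.3.
figure_21_3 = {
    ("C", "C"): (Fr(2), Fr(2)),
    ("C", "notC"): (Fr(1, 2), Fr(3, 2)),
    ("notC", "C"): (Fr(3, 2), Fr(1, 2)),
    ("notC", "notC"): (Fr(1), Fr(1)),
}

print(pure_nash(duopoly))
print(pure_nash(figure_21_3))
```

The same function handles both games because a Nash equilibrium only requires that neither player can gain by deviating unilaterally.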
You may have seen the duopoly game in disguise. One common version is known as the Prisoner’s
Dilemma. Suppose you and a former friend are involved in a legal dispute. You and he will appear
before a judge who will determine who takes custody of the cat you bought together. You can hire
a lawyer, or not. Suppose further that you estimate the probability of winning is 1/2 if neither of
you hires a lawyer, or if you both hire a lawyer. But, if one of you hires a lawyer and the other
one does not, the one who is represented by a lawyer wins with probability 3/4. As we can see by
looking at the box in Figure 21.4, hiring a lawyer is a dominant strategy.

                            ex-Friend
                     No Lawyer      Lawyer
You    No Lawyer     1/2, 1/2       1/4, 3/4
       Lawyer        3/4, 1/4       1/2, 1/2

Figure 21.4: Hiring a lawyer is a dominant strategy in this game.
The problem is lawyers
cost money, so your true “payoff” with a lawyer is lower than the box suggests. In fact both parties
are better off agreeing not to hire lawyers. But this is not a Nash equilibrium. Figure 21.5 displays
real data pertaining to child custody cases in California in the early 1980s.
It may be possible to induce cooperation in a game that is played repeatedly. For example, consider
the following long-term strategy by a participant in a duopoly game:
• If Player 1 sees that the price last time was $p^M$, then she produces $y^M/2$.
• If Player 1 sees that the price last time was $p^C$, she infers that Player 2 cheated, and “punishes”
Player 2 by producing $y^0$ the next $k$ times, after which she reverts to $y^M/2$.

                            Mother
                     No Lawyer      Lawyer
Father  No Lawyer    75%            86%
        Lawyer       49%            65%

Figure 21.5: Percentage of Mothers Awarded Child Physical Custody in San Mateo and Santa Clara
Counties, California, 1984. Source: Mnookin, Maccoby, Depha, and Albiston (1989)
Questions to consider:
1. Does the threat of punishment stop Player 2 from cheating?
2. Is the threat credible?
To answer Question 1, consider the costs and benefits of cheating in the current period, ignoring
the time value of money:
\[ \text{Benefit} = \pi^W - \pi^M/2 = 9/64 - 1/8 = 1/64 \]
\[ \text{Cost} = k(\pi^M/2 - \pi^0) = k(1/8 - 1/9) = k/72 \]
The cost exceeds the benefit when $k/72 > 1/64$, i.e. when $k > 9/8$. So if $k \geq 2$, the threat
will deter cheating! As for Question 2, the punishment is to produce
$y^0$. This is not too crazy, but is it credible? Given that Player 2 has cheated, he could simply
claim that it was an honest mistake and promise not to do it again. Player 1 could then bypass the
punishment (does this sound familiar?) and save herself $k(\pi^M/2 - \pi^0)$ too! So she has a strong
incentive not to follow through on her threat. This is an example of a dynamic inconsistency.
Player 1 would like to commit herself to carrying out the punishment in return for a deviation from
cooperative play but, given that Player 2 has cheated, she hurts herself by doing so and therefore
has an incentive to bail out early.
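The deterrence arithmetic can be reproduced in a few lines. This is a minimal sketch with $(a - c)^2/b$ normalized to 1 and no discounting, as in the text; the function name is hypothetical.

```python
from fractions import Fraction as Fr

def min_punishment_length(gain_from_cheating, loss_per_punishment_period):
    """Smallest integer k such that k * loss strictly exceeds the gain."""
    k = 1
    while k * loss_per_punishment_period <= gain_from_cheating:
        k += 1
    return k

half_monopoly, cournot, cheater = Fr(1, 8), Fr(1, 9), Fr(9, 64)
gain = cheater - half_monopoly      # one-period benefit of cheating
loss = half_monopoly - cournot      # per-period cost while being punished
k_min = min_punishment_length(gain, loss)
print(gain, loss, k_min)
```

With discounting, future punishment periods would count for less, so a longer punishment phase could be needed; that case is not covered by this sketch.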
22 Game Theory II
22.1 Tree Diagrams
In Section 21 we described a punishment strategy for the repeated Cournot game in which a player
chooses his current level of output based on last period’s price: if $p_{t-1} < p^M$, he decides to produce
$y^0$ for the next $k$ periods. Afterwards he reverts to cooperative play, producing $y^M/2$. We showed
that this strategy is effective, provided his competitor believes he actually will execute it. But should
his competitor believe him? The same issue arises in numerous contexts:
• The Cold War. The U.S. threatened to start nuclear war with the USSR if the USSR invaded
Western Europe. Many Europeans themselves believed that even if the USSR invaded, the
U.S. simply would cut its losses.
• Flood relief. The government would like to discourage homeowners from living in flood prone
areas such as the New Jersey shore. But when a flood strikes, the government inevitably
offers disaster relief.
• Entry deterrence. A grocery store currently is a monopolist in a certain town. Another chain
is considering building a new store to compete with the existing one. The incumbent threatens
to reduce prices if the chain enters the market.
Figure 22.1: The first coordinate is the payoff to Player 1, the incumbent, and the second coordinate is
the payoff to Player 2, the potential entrant. Note that we could replace $(0, 0)$ and $(\pi^0, \pi^0)$
with $(0, -F)$ and $(\pi^0, \pi^0 - F)$, where $F$ denotes the fixed cost of entry.
We can analyze simple dynamic games with the aid of a tree diagram such as the one in Figure 22.1,
which shows each party’s possible moves. Consider the entry deterrence game. First, the potential
entrant decides whether to enter. Then, the incumbent decides whether to engage in a price war.
As before,
\[ \pi^M = \text{profit per year for the incumbent without any competitors} \]
\[ \pi^0 = \text{profit per year for each firm if entry is followed by Cournot duopoly} \]
In a price war, the incumbent charges $p = c$ and earns no profit. Notice that once the potential
entrant (Player 2) has acted, it’s up to the incumbent to decide where to go from there. Suppose
Player 2 has entered. The incumbent (Player 1) has to choose between the top two nodes: $\pi^0 > 0$,
so clearly it doesn’t make sense to fight once the competitor has entered. Thus we can conclude
that Player 1 will choose “don’t fight.”
Player 2 on the other hand looks at the ultimate payoffs to entry. If she enters, she gets $\pi^0$ since
she knows that Player 1 will choose “don’t fight,” so she always enters.
The method we used to analyze this game is called backward induction. At the last stage, depending
whose turn it is, we deduce this player’s action by comparing his payoffs. Then we back up to the
previous move.
Notice that (enter, don’t fight) is the dynamically consistent equilibrium here. (enter, fight) is not
dynamically consistent even though Player 1 threatens to fight, because given that Player 2 has
entered, Player 1 seeks to maximize his payoff and thus doesn’t fight.
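Backward induction on a small tree is easy to express recursively. The sketch below is an illustration under stated assumptions: the tree encoding and function name are hypothetical, the numerical values of $\pi^M$ and $\pi^0$ are made up, and the payoff pair for "stay out" is assumed to be $(\pi^M, 0)$ since the figure itself is not reproduced here.

```python
def solve(node):
    """Backward induction on a finite game tree.

    A leaf is a tuple of payoffs (Player 1, Player 2).
    A decision node is a dict {"player": 1 or 2, "moves": {name: subtree}}.
    Returns (payoffs, list of moves along the predicted path).
    Ties are broken in favor of the move listed first.
    """
    if isinstance(node, tuple):
        return node, []
    idx = node["player"] - 1
    best = None
    for move, subtree in node["moves"].items():
        payoffs, path = solve(subtree)
        if best is None or payoffs[idx] > best[0][idx]:
            best = (payoffs, [move] + path)
    return best

pi_m, pi_0 = 4.0, 1.0   # hypothetical values with pi_m > pi_0 > 0

entry_game = {
    "player": 2,                      # the potential entrant moves first
    "moves": {
        "stay out": (pi_m, 0.0),      # assumed payoffs when there is no entry
        "enter": {
            "player": 1,              # the incumbent responds
            "moves": {
                "fight": (0.0, 0.0),
                "don't fight": (pi_0, pi_0),
            },
        },
    },
}

payoffs, path = solve(entry_game)
print(path, payoffs)
```

The recursion solves the incumbent's node first and only then evaluates the entrant's choice, which is exactly the order of reasoning used in the text.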
Implications:
• In the Cold War, it was not a credible threat to promise all-out nuclear war if the USSR
invaded Western Europe.
• In hostage situations, it is not a credible threat to claim that you “don’t negotiate” with
terrorists.
• In the entry game above, it is not a credible threat to claim that you will wage a price war if
another supplier enters the market.
• The punishment strategy outlined in Section 21 is not a credible threat.
22.2 Interpretation
The preceding analysis is predicated on players behaving rationally; despite threatening to do
something, once the time comes to make good on the threat, they always do what is in their best
interest, regardless of the events leading up to that time. This idea is encountered quite often in economics
and finance, e.g. “bygones are bygones” and “sunk costs don’t count.”
Notice that in our entry game, the incumbent (Player 1) would like to be able to commit herself to
behaving irrationally. If the entrant (Player 2) knows that the incumbent will in fact fight, then he
won’t enter, especially if there is a fixed cost of entry.
Suppose there is an earlier decision that Player 1 can make to alter the payoffs should Player 2
enter. Here the decision might involve investing in overhead that increases the operating costs for
the incumbent so that the payoffs are
$\pi^M - C$, if Player 2 doesn’t enter, and
$\pi^0 - A$, if Player 2 enters and Player 1 doesn’t fight
Figure 22.2: Here the investment decision reduces Player 1’s payoffs by $A$ if he doesn’t fight in the event
of entry and $C$ if there is no entry.
This is illustrated in Figure 22.2. Will Player 1 invest in this strategy? Again, the answer is found
by backward induction:
• Path 1 (Player 1 doesn’t invest). We know that if Player 2 enters, Player 1 won’t fight. Player
2 gets $\pi^0$ if he enters and 0 otherwise, so he will enter, which means Player 1 gets $\pi^0$.
• Path 2 (Player 1 does invest). We know that if Player 2 enters, Player 1 will fight if $A > \pi^0$.
Assuming this condition holds, Player 2 knows Player 1 will fight, so Player 2 won’t enter,
which means Player 1 gets $\pi^M - C$. This is worthwhile if $C < \pi^M - \pi^0$.
• Thus Player 1’s payoffs boil down to the following:
– don’t invest in entry deterrence, earn $\pi^0$
– invest in entry deterrence, earn $\pi^M - C$
Conclusion: Player 1 may make an investment in overhead provided:
• it reduces the profit from not fighting when Player 2 enters ($A > \pi^0$)
• it isn’t too costly when Player 2 doesn’t enter ($C < \pi^M - \pi^0$)
The key to entry deterrence is that once the incumbent decides to invest, the decision must affect
his payoffs. He is committing to fighting by changing his payoffs in the latter stage of the game.
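The two conditions can be checked by walking through the stages in reverse. This is a rough sketch, assuming (as in the basic entry game) that a price war pays both firms 0, that the entrant's duopoly profit is still $\pi^0$ after the investment, and that the entrant enters only if it expects a strictly positive payoff; all names and numbers are hypothetical.

```python
def incumbent_payoff(invest, pi_m, pi_0, A, C):
    """Incumbent's payoff by backward induction, with or without the investment."""
    no_fight = pi_0 - A if invest else pi_0   # payoff from accommodating entry
    no_entry = pi_m - C if invest else pi_m   # payoff if the entrant stays out
    fights = 0 > no_fight                     # last stage: fight only if it pays more
    entrant_payoff = 0 if fights else pi_0
    enters = entrant_payoff > 0               # entrant's decision
    if not enters:
        return no_entry
    return 0 if fights else no_fight

def should_invest(pi_m, pi_0, A, C):
    return incumbent_payoff(True, pi_m, pi_0, A, C) > incumbent_payoff(False, pi_m, pi_0, A, C)

pi_m, pi_0 = 4.0, 1.0                         # hypothetical profits
print(should_invest(pi_m, pi_0, A=2.0, C=1.0))   # A > pi_0 and C < pi_m - pi_0
print(should_invest(pi_m, pi_0, A=0.5, C=1.0))   # A too small to make fighting credible
print(should_invest(pi_m, pi_0, A=2.0, C=3.5))   # C too large to be worthwhile
```

Only the first case satisfies both conditions, so only there does the investment pay.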
There is an extension of this model to the case in which potential entrants don’t know whom they’re
dealing with. Suppose there are two types of incumbents:
• rational incumbents with payoffs 0 in a price war and $\pi^0$ in a duopoly
• “mad dog” incumbents with payoffs $\pi^*$ in a price war and $\pi^0 - S$ in a duopoly
The possibility that $\pi^* > 0$ reflects the idea that the mad dog likes to fight. $S > 0$ can thus be
thought of as the shame that a mad dog feels for backing down. If $\pi^* > \pi^0 - S$, the mad dog
will fight, and this is the case if the mad dog really enjoys fighting or feels a substantial amount of
shame for backing down.
Suppose there is a fixed cost of entry $F$. The game looks like Figure 22.3. If Player 2 enters and
the incumbent is a mad dog, then a fight ensues and the entrant gets $-F$. If Player 2 enters and
the incumbent is rational, the incumbent doesn’t put up a fight and the entrant gets $\pi^0 - F$. Player
2’s expected profit$^{11}$ is
\[ E[\pi_2] = P(\text{mad dog}) \times (-F) + P(\text{rational}) \times (\pi^0 - F) \]
As the incumbent, you want to raise the entrant’s belief that you are crazy!
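To see how strong the entrant's belief must be, one can solve $E[\pi_2] = 0$ for $P(\text{mad dog})$. The sketch below uses hypothetical values of $\pi^0$ and $F$ and hypothetical function names; it assumes $0 < F < \pi^0$ so that entry against a rational incumbent is profitable.

```python
def entrant_expected_profit(prob_mad_dog, pi_0, fixed_cost):
    """E[pi_2] when a mad dog fights (entrant gets -F) and a rational incumbent does not."""
    return prob_mad_dog * (-fixed_cost) + (1 - prob_mad_dog) * (pi_0 - fixed_cost)

def deterrence_threshold(pi_0, fixed_cost):
    """Belief P(mad dog) at which the expected profit from entry is exactly zero."""
    return 1 - fixed_cost / pi_0

pi_0, F = 1.0, 0.25     # hypothetical values with 0 < F < pi_0
q_star = deterrence_threshold(pi_0, F)
print(q_star, entrant_expected_profit(q_star, pi_0, F))
```

Any belief above the threshold makes expected profit from entry negative, which is why the incumbent gains from appearing crazy.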
Figure 22.3: The potential entrant has no idea which type of incumbent he’s dealing with. It behooves
the incumbent to signal that he’s crazy!
11 See Section 23 for a definition of expected value.
23 Uncertainty I: Income Lotteries
In the next four sections we extend the theory of consumer choice to the context of choice under
uncertainty. For simplicity, we deal mainly with uncertainty regarding income. Assuming that
prices are fixed, alternative realizations of random income translate directly into alternative utility
levels. We begin with a brief review of statistics.
23.1 Review of Basic Statistical Concepts
We define the mean, or expected value, of a random variable $X$, denoted by $E[X]$ (or sometimes by
$\bar{X}$), to be
\[ E[X] = \sum_{i=1}^{n} p_i x_i \]
where $X$ takes the value $x_i$ with probability $p_i$. The mean is just a weighted average of the
alternative realizations of $X$, with the weights being the probabilities associated with the respective
realizations.
Consider the two random variables X1 and X2 with probability distributions as shown in Figure 23.1.
Note that
E[X1 ] = 10 × .1 + 20 × .2 + 30 × .4 + 40 × .2 + 50 × .1 = 30,
E[X2 ] = 10 × .5 + 50 × .5 = 30,
so while these distributions have the same mean, X2 is more dispersed (X1 on the other hand is
more concentrated near its mean).
Figure 23.1: Two different distributions with identical means.
One way to describe the level of dispersion of a random variable is by its variance, denoted $V[X]$:
\[ V[X] = \sum_{i=1}^{n} p_i (x_i - \bar{X})^2. \]
The variance of $X$ is the mean squared difference between $X$ and $\bar{X}$. As an exercise, calculate $V[X_1]$
and $V[X_2]$ above. We say that a random variable $X$ is degenerate if $X = E[X]$ with probability
one, in which case $V[X] = 0$.
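The definitions of the mean and the variance translate directly into code. The following sketch represents each distribution as a mapping from realizations to probabilities (a hypothetical encoding) and can be used to check a hand calculation for $X_1$ and $X_2$.

```python
from fractions import Fraction as Fr

def mean(dist):
    """dist maps each realization x_i to its probability p_i."""
    return sum(p * x for x, p in dist.items())

def variance(dist):
    """Probability-weighted mean squared deviation from the mean."""
    m = mean(dist)
    return sum(p * (x - m) ** 2 for x, p in dist.items())

X1 = {10: Fr(1, 10), 20: Fr(2, 10), 30: Fr(4, 10), 40: Fr(2, 10), 50: Fr(1, 10)}
X2 = {10: Fr(1, 2), 50: Fr(1, 2)}

for name, dist in (("X1", X1), ("X2", X2)):
    print(name, "mean:", mean(dist), "variance:", variance(dist))
```

Exact fractions are used so that the probabilities sum to one without rounding error.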
We can also consider functions of random variables. If $g$ is a function defined on $\mathbb{R}$, then $Y = g(X)$
is a random variable. We define the mean $E[Y]$ as follows:
\[ E[Y] = E[g(X)] = \sum_{i=1}^{n} p_i\, g(x_i). \]
If $g$ is linear, i.e. if $g(x) = ax + b$ for some choice of $a$ and $b$, then
\begin{align*}
E[Y] &= \sum_{i=1}^{n} p_i (a x_i + b) \\
     &= a \sum_{i=1}^{n} p_i x_i + b \underbrace{\sum_{i=1}^{n} p_i}_{1} \\
     &= aE[X] + b.
\end{align*}
As an exercise, show that $V[aX + b] = a^2 V[X]$ for any choice of $a$, $b$.
23.2 Choices Over Uncertain Incomes
We now suppose that individuals are asked to make choices between alternative income lotteries.
Each lottery is essentially a probability distribution of income. In ranking two alternative lotteries,
we hold constant income in the absence of either lottery (which in reality could be random).
Let $y$ denote income. In a world without uncertainty individuals always prefer more income to
less, so the following utility functions are all equivalent in the sense that they give rise to the same
indifference curves:
\[ u(y) = ay + b, \quad a > 0 \]
\[ u(y) = e^{y} \]
\[ u(y) = y^{3} \]
Since each function is increasing, it indicates a preference for more income. This is all we need, if
all we want to know is how to rank incomes.
On the other hand, suppose we wish to rank income lotteries. For example, consider:

              Payoff    Probability
Lottery 1:    $100      0.5
              $0        0.5

              Payoff    Probability
Lottery 2:    $70       0.5
              $30       0.5

In the 1940s John von Neumann and Oskar Morgenstern asked: is there some way of assigning
a utility number to each possible outcome in such a way that we can compare these lotteries by
comparing the expected utilities:
\[ 0.5 \times u(100) + 0.5 \times u(0) \quad \text{in the case of Lottery 1} \]
\[ 0.5 \times u(70) + 0.5 \times u(30) \quad \text{in the case of Lottery 2} \]
The answer is yes (under some assumptions), although we won’t prove it. Thus, if preferences
satisfy certain conditions, then there is a utility function, called an expected utility function,
defined on the set of all possible incomes, that we can use to compare both certain incomes (which
is trivially easy anyway) and income lotteries. The idea is that if we get the utility differences
between different incomes just right, then we can use the expected utility criterion to compare
lotteries.
NOTE: Normally we don’t care about the gauge of a given utility function. That is, if $u$ is a utility
function, then we regard $v = g(u)$ as equivalent, provided $g$ is a strictly increasing function.
How do you feel about Lottery 1 versus Lottery 2? Chances are, you would take Lottery 2. This
reveals something about the shape of your expected utility function.
Figure 23.2: Concave expected utility function. u(50) > 0.5×u(30)+0.5×u(70) > 0.5×u(0)+0.5×u(100).
An expected utility function $u$ is always increasing since more money is always better than less
(for an economist anyway). If $u$ is linear, e.g. $u(y) = ay + b$, then
\[ u(0) = b, \]
\[ u(30) = 30a + b, \]
\[ u(70) = 70a + b, \text{ and} \]
\[ u(100) = 100a + b, \]
so clearly $0.5 \times u(70) + 0.5 \times u(30) = 0.5 \times u(0) + 0.5 \times u(100) = 50a + b$. This leads to our first result:
If the expected utility function is linear, then lotteries with equal expected values are
considered equally good.
On the other hand, if you prefer Lottery 2, then your expected utility function must be concave as
in Figure 23.2. If you prefer Lottery 1, this reveals that your expected utility function is convex.
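A quick numerical comparison makes the link between curvature and ranking concrete. The three utility functions below are hypothetical examples chosen for illustration (a linear one with $a = 2$, $b = 1$, a square root, and a square); only their shapes matter.

```python
import math

def expected_utility(lottery, u):
    """lottery is a list of (payoff, probability) pairs."""
    return sum(prob * u(payoff) for payoff, prob in lottery)

lottery_1 = [(100, 0.5), (0, 0.5)]
lottery_2 = [(70, 0.5), (30, 0.5)]

utilities = {
    "linear": lambda y: 2 * y + 1,   # hypothetical a = 2, b = 1
    "concave": math.sqrt,
    "convex": lambda y: y ** 2,
}

results = {}
for name, u in utilities.items():
    results[name] = (expected_utility(lottery_1, u), expected_utility(lottery_2, u))
    print(name, results[name])
```

The linear function is indifferent between the two lotteries, the concave one ranks Lottery 2 higher, and the convex one ranks Lottery 1 higher.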
In general, it is useful to assume that people are risk-averse (gambling is an exception). We say
that a person is risk-averse if he prefers $x$ for sure rather than $x + \varepsilon$, where $\varepsilon$ is a random variable
with $E[\varepsilon] = 0$:
\[ E[u(x)] \geq E[u(x + \varepsilon)] \]
If $u$ is concave, this inequality holds. Why? For any realization of $\varepsilon$, say $\varepsilon = \varepsilon_i$,
\[ u \text{ concave} \implies u(x + \varepsilon_i) \leq u(x) + \varepsilon_i\, u'(x) \]
So, taking expectations over all realizations of $\varepsilon$,
\[ E[u(x + \varepsilon)] \leq E[u(x)] + E[\varepsilon\, u'(x)] = u(x) + u'(x)E[\varepsilon] = u(x) \]
since $E[\varepsilon] = 0$ by assumption.
24 Uncertainty II: Expected Utility
24.1 Expected Utility
In Section 23 we introduced the idea of a special utility function, defined over nonrandom incomes,
with curvature such that a consumer can use it to rank income lotteries. In particular, if an income
lottery is available that pays $y_i$ with probability $p_i$, then it can be compared with any other lottery
based on the expected utility criterion:
\[ E[u(y)] = \sum_i p_i\, u(y_i) \]
This function is called a von Neumann-Morgenstern utility function (vN-M), or sometimes simply
an expected utility function.
Examples:
• Linear. $u(y) = ay + b$, gives rise to an expected value ranking.
• Power function. $u(y) = y^{\alpha}$, where $0 < \alpha < 1$. This function is concave, so people with
preferences such as these are risk-averse.
• Exponential. $u(y) = -\exp(-ry)$, where $r > 0$. This function is increasing and concave, and
ranges from $-\infty$ to 0. This particular function is often used in finance because if all income
lotteries are normally distributed, we get a nice ranking: for $y \sim N(\mu, \sigma^2)$, it can be shown
that$^{12}$
\[ E[-\exp(-ry)] = -\exp(-r\mu + r^2\sigma^2/2) = -\exp[-r(\mu - r\sigma^2/2)] \]
Therefore, a lottery with mean $\mu$ and variance $\sigma^2$ is assigned a value based on $\mu - r\sigma^2/2$.
This is nice because, given $\mu$, individuals with higher values of $r$ assign a greater discount to
a lottery with higher risk (variance).
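The closed form for exponential utility under normality can be sanity-checked by numerical integration. This is only a sketch: the values of $r$, $\mu$ and $\sigma$ are hypothetical, and the integral is approximated with a plain trapezoid rule over $\mu \pm 12\sigma$ rather than derived analytically.

```python
import math

def expected_exp_utility(r, mu, sigma, steps=20000, width=12.0):
    """Approximate E[-exp(-r*y)] for y ~ N(mu, sigma^2) with the trapezoid rule."""
    lo, hi = mu - width * sigma, mu + width * sigma
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        y = lo + i * h
        density = math.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))
        weight = 0.5 if i in (0, steps) else 1.0   # endpoints get half weight
        total += weight * (-math.exp(-r * y)) * density
    return total * h

def closed_form(r, mu, sigma):
    """-exp[-r (mu - r sigma^2 / 2)], the expression stated in the text."""
    return -math.exp(-r * (mu - r * sigma ** 2 / 2))

r, mu, sigma = 0.5, 10.0, 2.0     # hypothetical values
numeric = expected_exp_utility(r, mu, sigma)
exact = closed_form(r, mu, sigma)
print(numeric, exact)
```

The two numbers should agree to many digits, which supports ranking normal lotteries by $\mu - r\sigma^2/2$.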
12 This can be shown by manipulating the moment generating function (MGF) for the normal distribution. The
reader is advised to consult a book on mathematical statistics.
We know that vN-M utility functions are not invariant under arbitrary transformations. If your
vN-M utility function is $u(y) = \alpha y$, then you are risk-neutral, and care only about expected values.
If my vN-M utility function is
\[ v(y) = \sqrt{u(y)} \]
then mine is concave ($v \propto \sqrt{y}$), and therefore I am risk-averse. Thus you and I evaluate lotteries
differently. Expected utility functions are, however, invariant under increasing linear transformations.
In other words, if your vN-M utility function is $u(y)$ and mine is $v(y) = au(y) + b$, where
$a > 0$, then we evaluate lotteries the same way. To see this, consider a pair of lotteries $y_1$ and $y_2$.
Suppose you prefer $y_1$, i.e.
\[ E[u(y_1)] > E[u(y_2)] \]
Then it also is true that
\[ aE[u(y_1)] + b > aE[u(y_2)] + b \iff E[au(y_1) + b] > E[au(y_2) + b] \]
so I too prefer $y_1$. This fact is very useful because it means we can rescale a vN-M utility function
so that the worst income realization (among a given set of lotteries) is assigned the value 0 and
the best one is assigned the value 1. To see this, imagine that we are comparing several lotteries:
the worst outcome is −10,000 and the best outcome is 250,000. Suppose $u(-10{,}000) = u_0$ and
$u(250{,}000) = u_1$. Then $v(y) = au(y) + b$, where
\[ a = \frac{1}{u_1 - u_0} \]
\[ b = -\frac{u_0}{u_1 - u_0} \]
has $v(-10{,}000) = 0$ and $v(250{,}000) = 1$. We have seen already that $v$ evaluates lotteries in the
same way as $u$, so we are better off using $v$ instead.
Figure 24.1: Risk-neutral individual has $p(1{,}000) + (1 - p)(-100) = 250$, or $p = 0.318$, whereas risk-averse
individual has $p_{250} > 0.318$.
We now are in a position to describe how to derive one’s own vN-M utility function. Assume the
best possible outcome among lotteries under consideration is 1,000, and the worst is −100. We wish
to assign utilities to all possible incomes ranging from −100 to 1,000. Begin by setting $u(-100) = 0$
and $u(1{,}000) = 1$. For any intermediate income level, e.g. 250, ask yourself:
If I had to choose between 250 for certain and a lottery in which I receive 1,000 with probability
$p$, and −100 with probability $1 - p$, what value of $p$ would make me indifferent?
Call this quantity $p_{250}$. Clearly $0 < p_{250} < 1$. Also, $p_{251} > p_{250}$ (although not by much, probably).
Now simply set $u(250) = p_{250}$. Why does this work? By definition
\[ p_{250}\, u(1{,}000) + (1 - p_{250})\, u(-100) = u(250) \]
and we’ve normalized $u$ so that $u(1{,}000) = 1$ and $u(-100) = 0$, hence $u(250) = p_{250}$. Experimental
economists use this idea in the lab to figure out whether a subject is more or less risk-averse. As
Figure 24.1 shows, the more concave one’s utility function (i.e. the more risk-averse one is), the bigger
is $p_{250}$: the better the chances of winning 1,000 have to be in order to give up a certain 250.
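The normalization and the indifference probability can be computed for any candidate utility function. The sketch below is illustrative: the helper name is hypothetical, and the concave utility (a shifted square root) is an arbitrary choice made so that the function is defined at −100.

```python
import math

def normalize(u, worst, best):
    """Rescale u linearly so that v(worst) = 0 and v(best) = 1."""
    u0, u1 = u(worst), u(best)
    return lambda y: (u(y) - u0) / (u1 - u0)

worst, best, sure_thing = -100, 1000, 250

risk_neutral = normalize(lambda y: y, worst, best)
# Hypothetical concave utility; the shift keeps the square root defined at -100.
risk_averse = normalize(lambda y: math.sqrt(y + 100), worst, best)

# After normalization, v(250) is the indifference probability p_250.
p_neutral = risk_neutral(sure_thing)
p_averse = risk_averse(sure_thing)
print(round(p_neutral, 3), round(p_averse, 3))
```

Because the normalized utility of the sure amount equals the indifference probability, the risk-averse function yields a larger value than the risk-neutral one.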
24.2 The Demand for Insurance
We now use the expected utility function to show that if you are risk-averse, and you have access to
actuarially fair insurance, then you will insure yourself fully against any risk. For example, suppose
your income is 30,000, and the probability that you will have an accident is $p = 0.05$. In the event of
an accident your medical bills will be 10,000. Your vN-M utility function is $u$. Without insurance
your expected utility of income is
\[ (1 - p)\,u(30{,}000) + p\,u(20{,}000) \]
How does insurance work in a simple world? An insurance contract for 1 worth of coverage is a
promise by the insurance company to pay you 1 if you have an accident, and nothing otherwise. If
the premium, i.e. the cost to you, is π, then the expected value of the contract to the insurance
company is
(1 − p)π + p(π − 1)
With probability 1 − p, you pay the premium and nothing happens. With probability p, you pay
the premium but there is a claim and therefore a benefit payment of 1. If insurance companies were
risk-neutral, they would compete for business by reducing π to the point that
(1 − p)π + p(π − 1) = 0
This is so-called actuarially fair insurance: coverage of 1 is available for a premium equal to the
probability of a claim.
Suppose you buy $c$ units of coverage at a premium of $\pi$ per unit. Your expected utility is
\[ \varphi(c) = (1 - p)\,u(30{,}000 - \pi c) + p\,u(20{,}000 - \pi c + c) \]
where the function $\varphi$ captures the value of different levels of coverage. If you choose $c$ so as to
maximize $\varphi$, the FONC is
\[ \varphi'(c) = -\pi(1 - p)\,u'(30{,}000 - \pi c) + p(1 - \pi)\,u'(20{,}000 - \pi c + c) = 0 \]
The SOC is not a concern since
\[ \varphi''(c) = \pi^2(1 - p)\,u''(30{,}000 - \pi c) + p(1 - \pi)^2\,u''(20{,}000 - \pi c + c) \]
is always negative under the assumption that you are risk-averse. (Why? $u$ concave $\implies u'' < 0$.)
Consider the FONC carefully for $\pi = p$. In this case
\[ u'(30{,}000 - pc) = u'(20{,}000 + c(1 - p)) \]
If $u'' < 0$ as usual, then $u'$ is strictly decreasing and therefore one-to-one, so
\[ u'(x) = u'(y) \iff x = y \]
hence
\[ 30{,}000 - pc = 20{,}000 + c(1 - p) \]
or $c = 10{,}000$!
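The full-insurance result can be reproduced numerically for a specific utility function. The sketch below assumes logarithmic utility, which is an illustrative choice and not part of the argument above, and finds the optimal coverage by bisection on $\varphi'(c)$; the function name and the search bracket are hypothetical.

```python
def optimal_coverage(p, premium, income=30_000.0, loss=10_000.0, tol=1e-9):
    """Maximize (1-p)*u(income - premium*c) + p*u(income - loss - premium*c + c) for u = log.

    Uses bisection on the derivative, which is decreasing in c because u is concave.
    """
    def marginal(c):
        no_accident = income - premium * c
        accident = income - loss - premium * c + c
        # u'(z) = 1/z for logarithmic utility
        return -premium * (1 - p) / no_accident + p * (1 - premium) / accident

    lo, hi = 0.0, 2 * loss                # search bracket [0, 2 * loss]
    if marginal(lo) <= 0:
        return lo                         # corner solution: no coverage
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if marginal(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

c_fair = optimal_coverage(p=0.05, premium=0.05)      # actuarially fair premium
c_unfair = optimal_coverage(p=0.05, premium=0.06)    # premium above the accident probability
print(c_fair, c_unfair)
```

With the fair premium the optimum coincides with the size of the loss, while a premium above $p$ leads to less than full coverage in this example.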
Exercises:
1. Redo the analysis of Section 24.2 assuming that if you buy insurance at all, you have to pay
an underwriting fee of $f$. The price per unit of coverage remains $p$ (total cost of $c$ units of
coverage is $pc + f$). Show that there is a number $F$ such that
\[ f \leq F \implies \text{you insure yourself fully} \]
\[ f > F \implies \text{you don't buy insurance at all} \]
2. Redo the analysis of Section 24.2 assuming $\pi > p$, i.e. the insurance is not actuarially fair.
25 Uncertainty III: Moral Hazard
One of the most interesting problems in markets with uncertainty is that of moral hazard, the
tendency of economic agents to change their behavior inefficiently upon having entered a contract
of some sort. We owe this term to the insurance industry: a policy holder who fails to exercise due
caution because he is insured is known as a moral hazard. A good example of this is a driver who
rents a car and purchases the “full insurance” option. Moral hazard can arise in other contexts as
well. For example, it often is argued that welfare systems discourage those who are in the system
from seeking employment. In this section we analyze the demand for insurance when policyholders,
through their own efforts, are capable of influencing the likelihood of an accident. We show that
1. With full insurance, policyholders have no incentive to avoid accidents.
2. A solution to the moral hazard problem involves a deductibility clause.
In particular, a high deductible generally will induce a greater level of preventive care at the cost of
inducing variability in the policy holder’s income. Thus there is a tradeoff between insurance and
efficiency.
The model is simple. In each state of the world (accident/no accident) the insured has $y$ initially.
In the event of an accident he loses $\ell$. The insurance company offers to pay $c$ in this event, in return
for a per-unit charge of $\pi$ that is paid regardless of the outcome. Expected utility depends on both
ultimate wealth and effort $x$
expended on accident prevention. Assume that consumers evaluate income-effort bundles according
to
\[ u(\text{ultimate income}) - d(\text{effort}) \]
where $u$ is an expected utility function and $d$ represents the cost of a concerted effort to avoid an
accident. Assume $d$ is convex, with $d(0) = d'(0) = 0$ as in Figure 25.1.
Figure 25.1: A representative cost-of-effort function.
The probability of an accident is $p(x)$, where $p$ is a decreasing function with $p(0) = 0.5$. We must
have $p(x) > 0$ and $p'(x) < 0$ for all $x > 0$. A consumer who buys $c$ units of coverage and who
expends $x$ units of effort has expected utility
\begin{align*}
\varphi(c, x) &= p(x)[u(y - \pi c - \ell + c) - d(x)] + (1 - p(x))[u(y - \pi c) - d(x)] \\
              &= p(x)\,u(y - \ell + c(1 - \pi)) + (1 - p(x))\,u(y - \pi c) - d(x)
\end{align*}
Notice that since equal effort is expended whether or not there is an accident, we end up subtracting
$d(x)$ from expected utility of income. Suppose the insurance company, through vast experience,
knows $p(x)$, i.e. knows how much effort the insured will expend. If they break even, then
\[ (1 - p(x))\pi - p(x)(1 - \pi) = 0 \tag{25.1} \]
by the same line of reasoning as in Section 24.2, so they charge $\pi = p(x)$.

The consumer views $\pi$ as exogenous and chooses $c$ and $x$ so as to maximize expected utility. The FONC
are$^{13}$
\[ \varphi_c = p(x)(1 - \pi)\,u'(y - \ell + c(1 - \pi)) - \pi(1 - p(x))\,u'(y - \pi c) \geq 0 \tag{25.2} \]
\[ \varphi_x = p'(x)[u(y - \ell + c(1 - \pi)) - u(y - \pi c)] - d'(x) = 0 \tag{25.3} \]
Since the insurance company sets $\pi$ such that (25.1) holds, $p(x)(1 - \pi) = \pi(1 - p(x))$, and (25.2)
may be rewritten as follows:
\[ u'(y - \ell + c(1 - \pi)) - u'(y - \pi c) \geq 0 \tag{25.4} \]
Suppose that equality holds in (25.4), i.e. the insured gets all the coverage he wants. Then, as in
Section 24.2,
\[ y - \ell + c^*(1 - \pi) = y - \pi c^* \iff c^* = \ell \]
But with full coverage, is there any incentive to be cautious? If the insured goes out of his way to
be careful, $p$ falls, and for each unit by which $p$ falls he gains
\[ u(y - \pi c) - u(y - \ell + c(1 - \pi)) \]
in utility. With full coverage the gain is nil: he doesn’t reap the benefit of his actions because
the insurance company bears all of the risk. Therefore, if $d'(0) = 0$, the FONC are satisfied with
$x^* = 0$ and $c^* = \ell$, i.e. the insured takes minimal care. Insurance companies expect this and set
premiums accordingly.
This level of care is socially inefficient because the marginal cost of care is 0 when x = 0. If the
insured were just a little bit more careful, it would cost next to nothing, yet it would result in fewer
accidents and lower premiums. There is a breakdown in the usual argument about markets leading
to socially efficient outcomes because each consumer views the premium as exogenous even though
ultimately π = p(x) since, in the long run, the insurance company understands what is going on.
25.1 Solution with No Moral Hazard
Suppose the consumer recognizes that $\pi = p(x)$ (this would be true if the insurance company could
monitor her behavior). In this case her objective is to maximize
\[ \varphi(c, x) = p(x)\,u(y - \ell + c(1 - p(x))) + (1 - p(x))\,u(y - c\,p(x)) - d(x) \]
13 The “≥” in (25.2) reflects the idea that the consumer is “rationed.”
The FONC are
\[ \varphi_c = p(x)[1 - p(x)]\,u'(y - \ell + c(1 - p(x))) - p(x)[1 - p(x)]\,u'(y - c\,p(x)) = 0 \iff c = \ell \]
\begin{align*}
\varphi_x ={}& p'(x)[u(y - \ell + c(1 - p(x))) - u(y - c\,p(x))] - d'(x) \\
             & - c\,p(x)\,p'(x)\,u'(y - \ell + c(1 - p(x))) \\
             & - c\,p'(x)[1 - p(x)]\,u'(y - c\,p(x)) \\
          ={}& 0 \tag{25.5}
\end{align*}
Compare this to (25.3) and note that allowing premiums to vary according to effort gives rise to
extra terms. Now use $c = \ell$, so that $y - \ell + c[1 - p(x)] = y - c\,p(x)$, in (25.5):
\[ -p'(x)\,u'(y - c\,p(x))\,\ell = d'(x) \tag{25.6} \]
which has the following interpretation: if the insured expends more effort, the marginal cost is $d'(x)$, the
RHS. On the other hand this reduces the likelihood of an accident by $-p'(x)$, saving $\ell$ times the marginal
utility of income $u'(y - c\,p(x))$, the LHS. The optimal level of caution is such that the marginal costs
and marginal benefits are perfectly balanced. Note that (25.6) usually implies a level of caution
greater than zero, unless $p'(x) = 0$, in which case an increase in effort doesn’t reduce the likelihood
of an accident. Notice too that the optimal solution has marginal benefit of accident prevention
equal to $u'(y - c\,p(x)) \times \ell$.
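Condition (25.6) can be solved numerically once functional forms are chosen. The sketch below is purely illustrative: it assumes $u = \log$, a linear accident probability $p(x) = p_0 - \alpha x$ and a quadratic effort cost $d(x) = \beta x^2$, with hypothetical parameter values, and it imposes full coverage $c = \ell$.

```python
def efficient_effort(y, loss, p0, alpha, beta, tol=1e-12):
    """Solve -p'(x) * u'(y - loss * p(x)) * loss = d'(x) by bisection.

    Hypothetical functional forms: u = log, p(x) = p0 - alpha * x, d(x) = beta * x**2.
    """
    def gap(x):
        marginal_benefit = alpha * loss / (y - loss * (p0 - alpha * x))
        marginal_cost = 2 * beta * x
        return marginal_benefit - marginal_cost

    lo, hi = 0.0, p0 / alpha      # effort range on which p(x) stays non-negative
    if gap(hi) >= 0:
        return hi                 # corner: benefit exceeds cost everywhere
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if gap(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

y, loss, p0, alpha, beta = 30_000.0, 10_000.0, 0.5, 0.1, 1.0   # hypothetical values
x_star = efficient_effort(y, loss, p0, alpha, beta)
print(x_star)
```

In this example the solution is strictly positive, in contrast with the zero effort chosen when the premium is taken as given.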
25.2 A Partial Solution
How can we incentivize the insured to expend effort avoiding accidents when her efforts aren’t
rewarded with a lower premium? Look at equations (25.4) and (25.3). Assuming $\pi = p$,
\[ u'(y - \ell + c(1 - \pi)) - u'(y - \pi c) \geq 0 \tag{25.7} \]
\[ -p'(x)[u(y - \pi c) - u(y - \ell + c(1 - \pi))] = d'(x) \tag{25.8} \]
Suppose the insurance company refuses to sell full coverage ($c < \ell$). Utility in the accident state is
less than it is otherwise,
\[ u(y - \pi c) > u(y - \ell + c(1 - \pi)) \]
and there is in fact an incentive to avoid accidents. The insured prefers more coverage, because
the LHS of (25.7) is positive, but the insurance company refuses to sell any more. The insurance
company is instituting a deductible that the insured must pay in the event of an accident. The
amount of the deductible, $\ell - c$, influences the amount of care taken by the insured.
Let $a = \ell - c$. Then (25.8) becomes
\[ -p'(x)[u(y - \pi\ell + \pi a) - u(y - \pi\ell - a(1 - \pi))] = d'(x) \]
or
\[ \Delta u = u(y - \pi\ell + \pi a) - u(y - \pi\ell - a(1 - \pi)) = -\frac{d'(x)}{p'(x)} \]
See what the deductible does? It provides the insured with less income in the accident state. Now
we can show that a higher deductible causes the insured to try even harder to avoid an accident.
For example, if
\[ p(x) = p - \alpha x \]
and
\[ d(x) = \beta x^2 \]
then optimal effort $x^*$ is such that
\[ \Delta u = \frac{2\beta x^*}{\alpha} \]
or
\[ x^* = \frac{\alpha}{2\beta}\, \Delta u \]
i.e. optimal effort increases with $\Delta u$, which in turn increases with $a$.
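The effect of the deductible on effort can be tabulated for the linear-quadratic example. This is a sketch under stated assumptions: utility is taken to be logarithmic, the premium is held fixed at a hypothetical value (the consumer treats it as exogenous), and the other numbers are made up for illustration.

```python
import math

def effort_under_deductible(deductible, y, loss, premium, alpha, beta, u=math.log):
    """x* = alpha * delta_u / (2 * beta), with the premium taken as given."""
    no_accident = y - premium * loss + premium * deductible
    accident = y - premium * loss - deductible * (1 - premium)
    delta_u = u(no_accident) - u(accident)
    return alpha * delta_u / (2 * beta)

y, loss, premium = 30_000.0, 10_000.0, 0.5    # hypothetical values
alpha, beta = 0.1, 1.0
deductibles = (0, 1_000, 2_000, 5_000)
efforts = [effort_under_deductible(a, y, loss, premium, alpha, beta) for a in deductibles]
for a, x in zip(deductibles, efforts):
    print(a, x)
```

With a zero deductible the two states give the same income and effort is zero; each increase in the deductible widens the utility gap and raises effort.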
26 Uncertainty IV: The State-preference Approach and Adverse Selection
In this section we continue to consider insurance problems with the following characteristics:
• An individual with income $y$ is at risk of losing $\ell$.
• The loss occurs with probability $p$.
• The individual has a vN-M utility function $u(z)$, where $z$ denotes (net, or ultimate) income.
• The individual has access to actuarially fair insurance with a per-unit premium of $\pi = p$.
• The individual’s expected utility is
\[ \varphi(c) = p\,u(y - \ell + c(1 - \pi)) + (1 - p)\,u(y - \pi c) \]
With $c$ units of coverage the problem is summarized in Table 3. We now introduce a graphical way
of analyzing a problem such as this. The approach we take is called the state-preference approach,
and it applies only if $p$ is fixed. Therefore it is less useful in moral hazard style problems where $p$
varies according to the endogenous variable $x$.
Table 3: A summary of our assumptions regarding a policy holder.
State         Probability   Income                     Utility
accident      $p$           $y - \ell + c(1 - \pi)$    $u(y - \ell + c(1 - \pi))$
no accident   $1 - p$       $y - \pi c$                $u(y - \pi c)$
26.1 Setup
Figure 26.1: The slope of the indifference curves, on their way through the line $y^A = y^N$, is $-p/(1 - p)$.
Think of the consumer as choosing a bundle consisting of two goods: income in the accident state
$y^A$ and income in the no-accident state $y^N$. His expected utility is then
$$v(y^A, y^N) = p\,u(y^A) + (1 - p)\,u(y^N)$$
We can draw indifference curves in $y^A y^N$-space as in Figure 26.1. Note that in this case
$$MRS = \frac{v_1}{v_2} = \frac{p}{1 - p} \times \frac{u'(y^A)}{u'(y^N)}$$
Along the line $y^A = y^N$,
$$MRS = \frac{p}{1 - p}$$
The consumer of Table 3 is represented graphically by Figure 26.2. Note that every point on the
line through $(y - \ell, y)$ with slope $-p/(1 - p)$ has the same expected income. Why? Consider varying
$y^A$ and $y^N$ in such a way as to hold expected income constant:
$$p\,dy^A + (1 - p)\,dy^N = 0$$
or
$$\frac{dy^N}{dy^A} = -\frac{p}{1 - p}$$
Figure 26.2: E denotes the consumer’s endowment, her incomes in each state without insurance.
The consumer has access to insurance with a per-unit premium of $\pi = p$, so with $c$ units of coverage
his incomes are
$$y^A = y - \ell + c(1 - \pi)$$
$$y^N = y - \pi c$$
As coverage increases, the consumer moves along a line with slope $-\pi/(1 - \pi)$ since each unit of
coverage raises income in the accident state by $1 - \pi$ and reduces income in the no-accident state
by $\pi$. But $\pi = p$, so we have Figure 26.3. Recall that on $y^A = y^N$, $MRS = p/(1 - p)$, so, if the
consumer can buy as much insurance as he wants, then he will satisfy the tangency condition for a
constrained optimum by choosing $c = \ell$.
Figure 26.3: The tangency condition is satisfied by full coverage.
Figure 26.4: Here $\pi > p$, so the “budget line” is rotated about E.
What happens if $\pi > p$? Have a look at Figure 26.4. Starting from the endowment $(y - \ell, y)$, the
budget line has slope$^{14}$ $\pi/(1 - \pi) > p/(1 - p)$, hence the optimum lies above the line $y^A = y^N$.$^{15}$
This means that when insurance companies sell policies with a “load” factor, consumers buy less
than full coverage. Note the following:
• If π is too high as in Figure 26.5, the consumer won’t buy any coverage.
• If π < p, the consumer will over-insure as in Figure 26.6.
14 Steepness, actually.
15 All indifference curves have slope $-p/(1 - p)$ on the line $y^A = y^N$, and $u$ concave $\implies$ $v$ exhibits DMRS (verify this!).
Figure 26.5: Here π is prohibitively high, so the consumer does without insurance altogether.
Figure 26.6: Here π is so low that the consumer can afford to provide himself with more income in the
accident state than he has initially, so he tends to over-insure.
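The four cases above (fair premium, loaded premium, prohibitive premium, and a premium below $p$) can be reproduced with a small grid search. This is a sketch under assumptions that are not in the notes: log utility, $y = 100$, $\ell = 50$, $p = 0.1$, and a coverage cap `c_max` introduced so that over-insurance is allowed.

```python
import math

def expected_utility(c, y, loss, p, pi):
    """p*u(y - loss + c(1 - pi)) + (1 - p)*u(y - pi*c), with u = log (an assumption)."""
    return p * math.log(y - loss + c * (1 - pi)) + (1 - p) * math.log(y - pi * c)

def optimal_coverage(y, loss, p, pi, c_max, steps=20000):
    """Grid search for the coverage c in [0, c_max] that maximizes expected utility."""
    grid = [c_max * k / steps for k in range(steps + 1)]
    return max(grid, key=lambda c: expected_utility(c, y, loss, p, pi))

if __name__ == "__main__":
    # Hypothetical premium rates around the true accident probability p = 0.1.
    for pi in (0.08, 0.10, 0.15, 0.30):
        c_star = optimal_coverage(y=100.0, loss=50.0, p=0.1, pi=pi, c_max=100.0)
        print(f"pi = {pi:.2f}   optimal coverage = {c_star:.2f}")
```

With these numbers the search returns full coverage when $\pi = p$, partial coverage for a moderate load, no coverage when the premium is very high, and more than $\ell$ when $\pi < p$.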
26.2 Adverse Selection
We now are ready to consider the case of two types of consumers: high-risk consumers with $p = p_H$
and low-risk consumers with $p = p_L$. We assume that consumers know their own type but that the
insurance company cannot tell who’s who. Suppose the population is half high-risk, half low-risk,
and the average level of risk is therefore given by
$$\bar{p} = \frac{p_H + p_L}{2}$$
If the insurance company were to charge everyone $\pi = \bar{p}$, the low-risk consumers would buy less than
full coverage and the high-risk consumers would over-insure (or buy as much as they are allowed,
up to $c = \ell$).
How might equilibrium work in this case? One possibility is a signaling equilibrium in which the
insurance companies offer two types of policies: one with a low premium $\pi_L = p_L$ that requires a
deductible $a$, another with a high premium $\pi_H = p_H$ and no deductible. If $a$ is high enough, the
high-risk consumers will self-select and opt for the high-premium policy. The deductible has to be
such that, under the low-premium policy, they would be a little worse off than they are with a high
premium and no deductible; otherwise they would masquerade as low-risk consumers.$^{16}$
Figure 26.7: The low-risk consumer is in blue, and the high-risk consumer is in red.
In this model, which was described first by M. Rothschild and J. Stiglitz (QJE, 1976), the deductible
associated with the low-risk contract serves the same purpose as extra education acquired by job
seekers in M. Spence’s job-market signalling model. The point is that low-risk consumers have a
lesser cost of bearing risk (deductible) because they know an accident is relatively unlikely. The
presence of the high-risk consumers is a problem for the low-risk consumers: if they can identify
themselves credibly, they can achieve higher utility, but the only way to identify themselves credibly
is by purchasing a contract that a high-risk consumer would turn down.
Note that R-S equilibrium involves firms and consumers. Firms charge actuarially fair premiums
and therefore don’t profit in the long run. Consumers of all types are happy to make the appropriate
choices.
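To see how the separating deductible might be pinned down, the following sketch finds, by bisection, the smallest deductible at which a high-risk consumer no longer gains from buying the low-premium policy. Log utility and the numbers ($y = 100$, $\ell = 50$, $p_H = 0.3$, $p_L = 0.1$) are assumptions chosen for illustration, and the function names are hypothetical.

```python
import math

def eu_policy(p_true, premium, a, y=100.0, loss=50.0):
    """Expected utility (u = log, an assumption) of a consumer with accident
    probability p_true who buys coverage loss - a at per-unit premium `premium`."""
    c = loss - a
    return (p_true * math.log(y - loss + c * (1 - premium))
            + (1 - p_true) * math.log(y - premium * c))

def separating_deductible(p_h, p_l, y=100.0, loss=50.0, tol=1e-9):
    """Smallest deductible on the low-premium policy at which the high-risk type
    weakly prefers full coverage at premium p_h (bisection on a in [0, loss])."""
    target = eu_policy(p_h, p_h, 0.0, y, loss)
    lo, hi = 0.0, loss
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if eu_policy(p_h, p_l, mid, y, loss) > target:
            lo = mid   # high-risk type would still masquerade: raise the deductible
        else:
            hi = mid
    return hi

if __name__ == "__main__":
    a_star = separating_deductible(0.3, 0.1)
    print(f"separating deductible: {a_star:.2f}")
    # The low-risk type should still prefer its own contract at this deductible.
    print(eu_policy(0.1, 0.1, a_star) > eu_policy(0.1, 0.3, 0.0))
```

At the resulting deductible the high-risk consumer is indifferent between the two contracts, while the low-risk consumer strictly prefers the low-premium one.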
16 This should remind the reader of price discrimination.
27 Auctions I: Types of Auctions
Many items are sold by auction, including treasury bills, broadcasting rights, real estate, livestock,
fine art, and natural resources (e.g. timber lands and oil fields). Large companies and governments
also use procedures that are equivalent to auctions to determine who will supply goods or services
in some cases.
In this section and the next we examine how economists model auctions. Although auctions have
existed for centuries, the basic theory thereof is quite modern. One good, somewhat advanced
reference is Paul Klemperer, “Auction Theory: A Guide to the Literature,” Journal of Economic
Surveys Vol 13 (3), July 1999, which is available at http://www.nuff.ox.ac.uk/users/klemperer/survey.pdf.
27.1 Basic Types of Auction
There are four basic types of auction for a single good:

1. English Auction. Also known as an “ascending bid” auction, this probably is the one with
which you are most familiar. An auctioneer acts as moderator, and asks for bids from a group
of $n$ bidders. If a bidder bids $b_{(n)}$, and no one outbids him, then he wins the auction and pays
$b_{(n)}$ in return for the good. Note that the auctioneer may be a computer. (eBay essentially
is an English auction arena, although each of the auctions has a time limit, which is unusual
for an English auction.)
2. Dutch Auction. Also known as a “descending bid” auction. The auctioneer calls out a
descending sequence of prices, starting from a price that is clearly too high. The first bidder to
announce that she is willing to accept the current price, $b_{(n)}$, wins the auction and pays $b_{(n)}$
in return for the good.
3. First-Price Sealed-Bid Auction. Bidders submit written bids. At a certain point the bidding
is closed. The auctioneer then selects the highest bid $b_{(n)}$, which is declared the winner. The
winner pays $b_{(n)}$.
4. Second-Price Sealed-Bid Auction. Also known as a “Vickrey” auction. Bidders again submit
written bids and, at a certain point, the bidding is closed. The auctioneer then selects the
highest bid $b_{(n)}$, which is declared the winner; however, the winner pays the second highest
bid $b_{(n-1)}$.
Auction models differ in their assumptions regarding how the value of an item at auction varies
from one person to the next, and how much the bidders know about their own potential valuations
as well as those of other bidders. The value of the item to bidder $i$ will be denoted $v_i$.
We shall focus on three important cases:
1. Private Values. Each valuation is independent and known only to the bidder.
2. Common Value. $v_i = v$ for all $i$ but $v$ is unknown. (Examples might include an auction to
sell the rights to drill for oil in a certain tract of land.)
3. Affiliated Values. $v_i$ varies across bidders but bidders themselves do not know their own
valuations with certainty, and the valuations are positively correlated. (Examples might
include an auction for a house.)
27.2 Important Results Concerning the Private Values Case
1. A Dutch auction is equivalent to a first-price sealed-bid auction.

In a Dutch auction there is no dynamic choice: one must choose an opt-in price ex ante and,
if the price falls to that level, opt in and receive the good for that price. This is the same
problem as deciding what bid to submit in a first-price sealed-bid auction. (We defer for the
moment the optimal choice of bidding strategy in these auctions.)
2. In an English auction the optimal strategy is to keep bidding until the current highest bid $b$
exceeds your valuation $v_i$. Why?
(a) If $b > v_i$, you are advised to walk away, for otherwise, if you bid $b' > b$, then the eventual
winner will pay at least $b' > v_i$.
(b) If $v_i > b$, and you walk away, you leave a surplus $v_i - b > 0$ “on the table.”
(c) If $b = v_i$, then bid $b + \varepsilon$ and, if no one outbids you, break even.
3. In light of (2c), in an English auction the bidder with the highest valuation wins, and pays
the second highest valuation (plus a marginal amount needed to surpass the second highest
bidder).
4. In a second-price sealed-bid auction, the optimal strategy is to bid your valuation.
Suppose your true value is $v$ and you bid $v - x$, where $x \ge 0$. Suppose the highest bid among
all other bidders is $w$.
(a) If $v - x > w$, you win and pay $w$.
(b) If $v - x < w$, you lose and pay nothing.
Your expected surplus,
$$s = (v - w)\,P(v - x > w)$$
is maximized by setting $x = 0$!
Now suppose you bid $v + x$, where $x$ and $w$ are as before.
(a) If $v + x < w$, then you lose and pay nothing.
(b) If $v + x > w$, you win and pay $w$. Your surplus is $v - w$. There are two cases:
$v \ge w \implies$ you want to win
$v < w \implies$ you don’t want to win
If you set $x = 0$, you win iff $v > w$, so $x = 0$ is the best choice.
5. Based on (4), in a second-price sealed-bid auction the winner is the bidder with the highest
valuation, who then pays a price equal to the second highest valuation.
6. Results (3) and (5) imply that in the private values case an English auction is equivalent to
a second-price sealed-bid auction.
7. In the common value case English auctions may be different because a bidder can make
inferences based on the identity/size of the remaining pool of bidders.
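Result (4) can be illustrated by simulation. The sketch below assumes a bidder with valuation 0.6 facing a highest rival bid drawn from Uniform(0, 1); both the distribution and the numbers are illustrative choices, not part of the notes.

```python
import random

def surplus(bid, v, w):
    """Surplus in a second-price auction: you win iff bid > w, and then pay w."""
    return v - w if bid > w else 0.0

def mean_surplus(shift, v=0.6, draws=100_000, seed=0):
    """Average surplus from bidding v + shift when the highest rival bid is
    w ~ Uniform(0, 1). The same seed gives every shift the same rival bids."""
    rng = random.Random(seed)
    return sum(surplus(v + shift, v, rng.random()) for _ in range(draws)) / draws

if __name__ == "__main__":
    for shift in (-0.3, -0.1, 0.0, 0.1, 0.3):
        print(f"bid = v {shift:+.1f}   mean surplus = {mean_surplus(shift):.4f}")
```

Because truthful bidding is at least as good as any other bid for every realization of $w$, the average surplus at `shift = 0.0` is never lower than at the other shifts.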
27.3 Bidding in a First-price Auction
How does one bid in a first-price auction? Let us start by making a few assumptions:
• The valuations $v_1, \ldots, v_n$ are independent and identically distributed (IID), with $P(v_i \le x) = F(x)$
for all $i$. (This is sometimes written $v_1, \ldots, v_n \overset{\text{IID}}{\sim} F$, where $F$ is the cumulative
distribution function, or CDF.)
• Each bidder adopts the same strategy and bids $b_i = B(v_i)$, where $B$ is the bid function.
What does the bid function look like?
• $B$ is increasing, for otherwise the bidder with the highest valuation wouldn’t necessarily win.
• $B$ increasing $\implies$ $B$ invertible $\implies$ if $b = B(v)$, then $v = g(b)$, where $g$ denotes the inverse
bid function $B^{-1}$.
Assuming each bidder bids according to $B$,
$$P(\text{you win with bid } b) = P(v_j < g(b) \text{ for all other bidders } j) = [F(g(b))]^{n-1}$$
Let $s = s(b, v)$ = expected surplus given bid $b$ and valuation $v$. Then
$$s = \underbrace{(v - b)}_{\text{surplus if win}} \times \underbrace{[F(g(b))]^{n-1}}_{\text{prob of winning}}$$
What is the FONC for $b$?
$$\frac{\partial s}{\partial b} = -[F(g(b))]^{n-1} + (v - b)(n - 1)[F(g(b))]^{n-2} \times \frac{\partial}{\partial b} F(g(b))$$
$$\phantom{\frac{\partial s}{\partial b}} = -[F(g(b))]^{n-1} + (n - 1)(v - b)[F(g(b))]^{n-2} f(g(b))\,g'(b)$$
where $f = F'$ is the probability density function. Setting $\partial s/\partial b = 0$,
$$[F(g(b))]^{n-1} = (n - 1)(v - b)[F(g(b))]^{n-2} f(g(b))\,g'(b)$$
which implies
$$v - b = \frac{F(g(b))}{f(g(b))} \times \frac{1}{g'(b)} \times \frac{1}{n - 1} \tag{27.1}$$
Note that since $B' > 0$, so is $g' = 1/B'$, and therefore $v - b > 0$. This means one should always
“shade” his or her bid. Why? If you bid more, then although you win more often you also pay
more. Like a monopolist, you must take this into account.
As an example, suppose $v_1, \ldots, v_n \overset{\text{IID}}{\sim} \text{Uniform}(0, 1)$, that is, suppose the valuations are
independent and identically distributed with
$$F(v) = \begin{cases} 0, & v < 0 \\ v, & 0 \le v \le 1 \\ 1, & v > 1 \end{cases}$$
and therefore
$$f(v) = \begin{cases} 1, & 0 \le v \le 1 \\ 0, & \text{otherwise} \end{cases}$$
In this case (27.1) says
$$g(b) - b = \frac{1}{n - 1} \times \frac{g(b)}{g'(b)}$$
which is a differential equation with solution
$$g(b) = \frac{n}{n - 1}\,b$$
This is easily verifiable:
$$g(b) - b = \frac{n}{n-1}\,b - b = \frac{n - (n-1)}{n-1}\,b = \frac{b}{n-1} = \frac{1}{n-1} \times \frac{\frac{n}{n-1}\,b}{\frac{n}{n-1}} = \frac{1}{n-1} \times \frac{g(b)}{g'(b)}$$
The bid function is recovered by setting $v = g(b)$ and solving
$$v = \frac{n}{n-1}\,b$$
for $b = B(v)$,
$$B(v) = \frac{n-1}{n}\,v$$
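As a sanity check on the uniform example, one can search directly for the bid that maximizes expected surplus when every rival uses $B(v) = \frac{n-1}{n}v$. The grid search below is a sketch, with the valuation and number of bidders picked arbitrarily.

```python
def expected_surplus(b, v, n):
    """(v - b) * P(win) when the n - 1 rivals bid (n - 1) * v_j / n with v_j ~ Uniform(0, 1)."""
    g = min(n * b / (n - 1), 1.0)   # inverse bid function; capped because F(g) <= 1
    return (v - b) * g ** (n - 1)

def best_response(v, n, steps=100_000):
    """Grid search over bids in [0, v] for the surplus-maximizing bid."""
    grid = [v * k / steps for k in range(steps + 1)]
    return max(grid, key=lambda b: expected_surplus(b, v, n))

if __name__ == "__main__":
    v, n = 0.8, 5   # hypothetical valuation and number of bidders
    print(f"numerical best response: {best_response(v, n):.4f}")
    print(f"(n - 1) v / n          : {(n - 1) * v / n:.4f}")
```

The numerical best response coincides with $(n-1)v/n$ up to the grid spacing, so the candidate bid function is a best response to itself.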
28 Auctions II: Winner’s Curse
In this section we analyze the winner’s curse. The winner’s curse arises in common value and
affiliated values auctions, in which each bidder estimates the value of the item at auction. Bidders
with higher guesses have, on average, made a positive error. So bidders must shade their bids to
compensate for the fact that when they win, on average it is the result of being overly optimistic.
A very simple example of the winner’s curse is the following: a police car is to be sold at auction
using a second-price sealed-bid system. Each bidder inspects the car, then the bidding begins.
The true value of the car is $v$, a random variable with mean $\mu$ and variance $\sigma_v^2$. This reflects the
idea that over many auctions, the average value of an old police car is $\mu$. But there is variability
from one car to the next, captured by $\sigma_v^2$.
Each bidder hires a mechanic to estimate the value of the car. The mechanic reports his or her
estimated value $t_i = v + \varepsilon_i$, where $\varepsilon_i$ is the error in the mechanic’s assessment. A bidder doesn’t
know the values reported to his competitors by their respective mechanics.
We assume $\varepsilon_1, \ldots, \varepsilon_n \overset{\text{IID}}{\sim} F$, with mean zero and variance $\sigma_\varepsilon^2$. Note that the larger is $\sigma_\varepsilon^2$ relative
to $\sigma_v^2$, the “noisier” the mechanics’ reports.
Based on $t_i$, the $i$th bidder estimates the true value of the car. In particular, the bidder forms an
estimate
$$y_i = \lambda t_i + (1 - \lambda)\mu$$
The idea behind this is that if $t_i$ is significantly noisy, the bidder should downweight the mechanic’s
report and assume instead that the average value at auction is more credible. Note that
$$E[y_i] = \lambda E[t_i] + (1 - \lambda)\mu = \lambda E[v + \varepsilon_i] + (1 - \lambda)\mu = \lambda\mu + (1 - \lambda)\mu = \mu$$
What is the optimal value of $\lambda$? The forecast error for a given value of $\lambda$ is
$$\begin{aligned}
\delta_i &= y_i - v \\
&= \lambda t_i + (1 - \lambda)\mu - v \\
&= \lambda(v + \varepsilon_i) + (1 - \lambda)\mu - v \\
&= (1 - \lambda)(\mu - v) + \lambda\varepsilon_i
\end{aligned}$$
Taking $v$ and $\varepsilon_i$ to be uncorrelated, the variance of the forecast error is
$$\begin{aligned}
V[\delta_i] &= V[(1 - \lambda)(\mu - v) + \lambda\varepsilon_i] \\
&= (1 - \lambda)^2 V[v] + \lambda^2 V[\varepsilon_i] \\
&= (1 - \lambda)^2\sigma_v^2 + \lambda^2\sigma_\varepsilon^2
\end{aligned}$$
We choose $\lambda$ to minimize the variance of the forecast error. The FONC is
$$\frac{\partial V[\delta_i]}{\partial\lambda} = -2(1 - \lambda)\sigma_v^2 + 2\lambda\sigma_\varepsilon^2 = 0$$
which implies $(1 - \lambda)\sigma_v^2 = \lambda\sigma_\varepsilon^2$, or
$$\lambda = \lambda^* = \frac{\sigma_v^2}{\sigma_v^2 + \sigma_\varepsilon^2}$$
You may have seen this before: $\lambda^*$ is the signal-to-total-variance ratio. If $\sigma_\varepsilon^2$ is small, then $\lambda^*$ is
nearly one, and the result is a weighted average with more weight on the mechanic’s report.
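The formula for $\lambda^*$ is easy to confirm numerically. In this sketch the variances $\sigma_v^2 = 4$ and $\sigma_\varepsilon^2 = 1$ are hypothetical, and the closed form is compared with a brute-force search over a grid of weights.

```python
def forecast_error_variance(lam, var_v, var_eps):
    """(1 - lam)^2 * var_v + lam^2 * var_eps, the variance of the forecast error."""
    return (1 - lam) ** 2 * var_v + lam ** 2 * var_eps

def optimal_weight(var_v, var_eps):
    """lambda* = var_v / (var_v + var_eps)."""
    return var_v / (var_v + var_eps)

if __name__ == "__main__":
    var_v, var_eps = 4.0, 1.0   # hypothetical variances
    grid = [k / 1000 for k in range(1001)]
    brute = min(grid, key=lambda lam: forecast_error_variance(lam, var_v, var_eps))
    print(f"closed form: {optimal_weight(var_v, var_eps):.3f}   grid search: {brute:.3f}")
```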
Based on the mechanic’s report, plus the optimal choice of λ = λ∗ , each bidder now has a good
idea as to the value of the car in the current auction. Since it is a second-price auction, one might
think each bidder simply should bid
$$y_i^* = \lambda^* t_i + (1 - \lambda^*)\mu$$
But this will give rise to a winner’s curse! The highest bidder wins the auction—this is the bidder
whose mechanic made the biggest positive error. He pays the amount of the second highest bid,
which includes the second biggest positive error. So, even in a second-price auction, one must take
into account the fact that on average the second highest bidder also was overly optimistic, and
shade his or her bid.
In an auction with 10 bidders, if $\varepsilon_1, \ldots, \varepsilon_n \overset{\text{IID}}{\sim} N(0, 1)$, i.e. if the errors are normally distributed
with mean zero and variance one,$^{17}$ then the expected value of the second highest error $\varepsilon_{(n-1)}$ is
approximately 1.001. Table 4 lists a few other cases. As you can see, $E[\varepsilon_{(n-1)}]$ grows with $n$. This
might cause you to worry a little about eBay: with thousands of bidders on a given item, if you do
not know what the item is worth to you, then you ought to shade your bid quite a bit. But what
should you bid?
Table 4: Expected Second Highest Error for Various Numbers of Bidders
n      $E[\varepsilon_{(n-1)}]$
10     1.001
25     1.524
35     1.692
100    2.148
Suppose each bidder shades his or her bid by $k$:
$$\begin{aligned}
y_i &= \lambda^* t_i + (1 - \lambda^*)\mu - k \\
&= \lambda^* v + (1 - \lambda^*)\mu + \lambda^*\varepsilon_i - k
\end{aligned}$$
Let $\bar{\varepsilon} = E[\varepsilon_{(n-1)}]$. (This quantity usually is estimated by computer simulation.$^{18}$) The expected
bid of the second-highest bidder is
$$\lambda^* v + (1 - \lambda^*)\mu + \lambda^*\bar{\varepsilon} - k.$$
So, if everyone sets $k = \lambda^*\bar{\varepsilon}$, then each bidder should expect to pay
$$E[\lambda^* v + (1 - \lambda^*)\mu] = \mu$$
in the event he wins, which means that on average he pays what the item is worth.
Bottom line: in the second-price auction we have set up, each bidder bids what she believes the
item is worth based on her information, minus a discount that is equal to the expected second-highest
error in the observed signal. Note that in the real world bidders hire consultants to simulate the
auction (by making educated guesses as to $\sigma_\varepsilon^2$ and $\sigma_v^2$).
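The effect of the discount can be illustrated with a small Monte Carlo experiment. Everything numerical here is an assumption made for the sketch: normally distributed values and errors, $\mu = 10$, $\sigma_v = 2$, $\sigma_\varepsilon = 1$, ten bidders, and $\bar{\varepsilon} = 1.001$ taken from Table 4.

```python
import random

def average_price_paid(n=10, mu=10.0, sd_v=2.0, sd_eps=1.0, eps_bar=1.001,
                       auctions=20_000, seed=1):
    """Average second-highest bid when every bidder bids
    lam*t_i + (1 - lam)*mu - k with k = lam * sd_eps * eps_bar."""
    rng = random.Random(seed)
    lam = sd_v ** 2 / (sd_v ** 2 + sd_eps ** 2)
    k = lam * sd_eps * eps_bar
    total = 0.0
    for _ in range(auctions):
        v = rng.gauss(mu, sd_v)                       # true value in this auction
        bids = sorted(lam * (v + rng.gauss(0.0, sd_eps)) + (1 - lam) * mu - k
                      for _ in range(n))
        total += bids[-2]                             # the winner pays the second-highest bid
    return total / auctions

if __name__ == "__main__":
    print(f"with shading   : {average_price_paid():.3f}")
    print(f"without shading: {average_price_paid(eps_bar=0.0):.3f}")
```

With shading the average price paid is close to $\mu$; without it the winner pays noticeably more than the car is worth on average.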
17 See Figure 28.1.
18 See Appendix 28.1
Figure 28.1: Standard normal distribution with PDF in red and CDF in black.
28.1 Appendix: Order Statistics

Let $X_1, \ldots, X_n \overset{\text{IID}}{\sim} F_X$, and let $Y_k = X_{(n-k+1)}$ denote the $k$th biggest observation, $1 \le k \le n$.
Then
$$\begin{aligned}
F_{Y_2}(x) &= P(\text{exactly one observation is greater than } x) + P(\text{all observations are at most } x) \\
&= n[1 - F_X(x)] \times [F_X(x)]^{n-1} + [F_X(x)]^n \\
&= n[F_X(x)]^{n-1} - n[F_X(x)]^n + [F_X(x)]^n \\
&= n[F_X(x)]^{n-1} - (n - 1)[F_X(x)]^n
\end{aligned}$$
and therefore
$$\begin{aligned}
f_{Y_2}(x) &= F'_{Y_2}(x) \\
&= n(n - 1)[F_X(x)]^{n-2} f_X(x) - n(n - 1)[F_X(x)]^{n-1} f_X(x) \\
&= n(n - 1)[F_X(x)]^{n-2} f_X(x) S_X(x)
\end{aligned}$$
where $S_X(x)$ is defined to be $1 - F_X(x)$. Then
$$E[Y_2] = n(n - 1)\int_{\mathbb{R}} x\,[F_X(x)]^{n-2} f_X(x) S_X(x)\,dx$$
28.1.1 Uniform Distribution
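If $X_1, \ldots, X_n \overset{\text{IID}}{\sim} \text{Uniform}(0, 1)$, then
$$\begin{aligned}
E[Y_2] &= n(n - 1)\int_0^1 x^{n-1}(1 - x)\,dx \\
&= n(n - 1)\left[\frac{1}{n} - \frac{1}{n + 1}\right] \\
&= \frac{n - 1}{n + 1}
\end{aligned}$$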
28.1.2 Normal Distribution

If $X_1, \ldots, X_n \overset{\text{IID}}{\sim} N(0, 1)$, then
$$E[Y_2] = n(n - 1)\int_{\mathbb{R}} x\,[\Phi(x)]^{n-2}[1 - \Phi(x)]\varphi(x)\,dx$$
where $\Phi$ is the standard normal CDF and
$$\varphi(x) = \Phi'(x) = \frac{1}{\sqrt{2\pi}}\,e^{-x^2/2}$$
This can be computed by numerical quadrature (R’s integrate uses adaptive Gauss–Kronrod rules) with the following R$^{19}$ function:
ESB.gq = function(n, CDF, PDF){
  # - computes approx. EV of 2nd biggest observation among n IID draws from dist. "CDF"
  # - corresponding density is "PDF"
  # - "f" is x times the density of the 2nd biggest observation, i.e. the integrand of E[Y2]
  f = function(x){x*n*(n-1)*(CDF(x))^(n-2)*PDF(x)*(1 - CDF(x))}
  integrate(f, -Inf, Inf)$value
}
As a check, we may approximate E[Y2 ] for n = 100 by simulation. The following R script returns
E[Y2 ] ≈ 2.148444, which agrees fairly well with the previous result.
ESB.sim = function(n,B){
# - computes EV of 2nd biggest observation among n IID draws from standard normal
x = 0
for(i in 1:B) x = x + sort(rnorm(n))[n-1]
x/B
}
print(ESB.sim(100,1e06))
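For readers without R, the same integral can be approximated with the Python standard library. This sketch swaps R’s adaptive quadrature for a plain composite Simpson rule on a truncated interval; the function names, the interval $[-10, 10]$ and the number of steps are choices made here, not part of the notes.

```python
import math

def std_normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def std_normal_cdf(x):
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def expected_second_largest(n, cdf=std_normal_cdf, pdf=std_normal_pdf,
                            lo=-10.0, hi=10.0, steps=20_000):
    """Composite Simpson approximation of
    n(n-1) * integral of x * F(x)^(n-2) * (1 - F(x)) * f(x) dx over [lo, hi]."""
    def integrand(x):
        F = cdf(x)
        return x * n * (n - 1) * F ** (n - 2) * (1.0 - F) * pdf(x)
    h = (hi - lo) / steps            # steps must be even for Simpson's rule
    total = integrand(lo) + integrand(hi)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * integrand(lo + k * h)
    return total * h / 3.0

if __name__ == "__main__":
    for n in (10, 25, 35, 100):
        print(n, round(expected_second_largest(n), 3))
```

Passing the Uniform(0, 1) CDF and density with `lo=0.0, hi=1.0` gives a quick check against the closed form $(n-1)/(n+1)$.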
19 R is a programming language for statistical computing that is available free of charge at http://www.r-project.org.
29 Finance I: Capital Asset Pricing Model
In this section we consider the implications of the simple assumption that investors hold only port-
folios of assets that are mean-variance efficient. This turns out to have the surprising consequence
that the price of a stock, i.e. the price of a share in a publicly traded company, depends on the
covariance between the return on the stock and the return on the market as a whole. This result
was discovered in the early 1960s by William Sharpe, who shared the Nobel Prize in Economics on
the basis of his work. The theoretical model is called the Capital Asset Pricing Model (CAPM).
29.1 Assumptions
We shall assume that investors can choose among a set of assets, $i = 1, \ldots, n$. An investor with $x$
to invest selects a portfolio, or in other words a list of the amount invested in each of the possible
assets. Denote by $\alpha_i$ the share of $x$ in asset $i$. Note that $\alpha_i \ge 0$ for all $i$, and $\sum_{i=1}^n \alpha_i = 1$. You
can think of any vector $\alpha \in \mathbb{R}^n$ with non-negative elements that add up to one as a portfolio.
We shall assume for simplicity that our investor has a one-period horizon, or holding period. An
investment of \$1 in asset $i$ will be worth $1 + R_i$ at the end of the holding period. Thus $R_i$ is the
proportional return on asset $i$ over the holding period. Note that $R_i \ge -1$ if assets have limited
liability, for in that case the worst event would be to lose one’s entire investment. $R_1, \ldots, R_n$ are
random variables. The mean of $R_i$ is $r_i = E[R_i]$, and the variance is $\sigma_i^2$. Asset $i$ is said to be
riskier than asset $j$ if $\sigma_i^2 > \sigma_j^2$. We do not assume that the returns on the respective assets are
independent. Instead, we assume there to be potential covariances
$$\sigma_{ij}^2 = \mathrm{Cov}[R_i, R_j] = E[(R_i - r_i)(R_j - r_j)]$$
Note that the covariance of the return on asset $i$ with itself is
$$E[(R_i - r_i)^2] = V[R_i] = \sigma_i^2$$
so we shall occasionally write $\sigma_{ii}^2$ instead of $\sigma_i^2$.
Recall$^{20}$ the following: Given two random variables $X$ and $Y$, and observations $(X_1, Y_1), \ldots, (X_n, Y_n)$,
if you are interested in a line of best fit revealing the dependence of $Y$ upon $X$, then you carry out
a linear regression of $Y$ on $X$. This procedure returns the least squares estimates $\alpha$ and $\beta$ in the
linear model
$$Y_i = \alpha + \beta X_i + \varepsilon_i$$
where $\varepsilon_1, \ldots, \varepsilon_n$ are the residual errors. The optimal coefficients are found by minimizing the
residual sum of squares
$$\sum_{i=1}^n \varepsilon_i^2 = \sum_{i=1}^n (\alpha + \beta X_i - Y_i)^2$$
This can be done by the same method we used in Section 28 to analyze the winner’s curse, and
gives
$$\beta = \frac{\mathrm{Cov}[X, Y]}{V[X]} = \frac{\sigma_{XY}^2}{\sigma_X^2}$$
20 If the reader has not taken econometrics or statistics, then this may not look familiar.
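Now if the investor with $x$ selects a portfolio $\alpha = (\alpha_1, \ldots, \alpha_n)$, then how much will he have at the
end of the holding period?
• The total amount invested in asset $i$ is $\alpha_i x$.
• At the end of the holding period this investment is worth $\alpha_i x(1 + R_i)$.
• The investor now has
$$\sum_{i=1}^n \alpha_i x(1 + R_i) = x\left(\sum_{i=1}^n \alpha_i + \sum_{i=1}^n \alpha_i R_i\right) = x\left(1 + \sum_{i=1}^n \alpha_i R_i\right)$$
The return on the portfolio is
$$R = \frac{x\left(1 + \sum_{i=1}^n \alpha_i R_i\right) - x}{x} = \sum_{i=1}^n \alpha_i R_i$$
which is a weighted average of the returns on the respective assets, with each weight equal to the
corresponding share. The expected return is
$$E[R] = E\left[\sum_{i=1}^n \alpha_i R_i\right] = \sum_{i=1}^n \alpha_i r_i$$
What is the variance of $R$?
$$\begin{aligned}
V[R] &= E\left[\left(\sum_{i=1}^n \alpha_i R_i - \sum_{i=1}^n \alpha_i r_i\right)^2\right] \\
&= E\left[\left(\sum_{i=1}^n \alpha_i (R_i - r_i)\right)^2\right] \\
&= E\left[\sum_{i=1}^n \sum_{j=1}^n \alpha_i \alpha_j (R_i - r_i)(R_j - r_j)\right] \\
&= \sum_{i=1}^n \sum_{j=1}^n \alpha_i \alpha_j E[(R_i - r_i)(R_j - r_j)] \\
&= \sum_{i=1}^n \sum_{j=1}^n \alpha_i \alpha_j \sigma_{ij}^2 \\
&= \sum_{i=1}^n \alpha_i^2 \sigma_{ii}^2 + \sum_{1 \le i \le n} \sum_{\substack{1 \le j \le n \\ j \ne i}} \alpha_i \alpha_j \sigma_{ij}^2
\end{aligned}$$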
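The two formulas just derived translate directly into code. The sketch below uses a made-up two-asset example (the weights, mean returns and covariance matrix are hypothetical) and evaluates the expected return and the double-sum variance.

```python
def portfolio_mean(weights, means):
    """E[R] = sum_i alpha_i * r_i."""
    return sum(a * r for a, r in zip(weights, means))

def portfolio_variance(weights, cov):
    """V[R] = sum_i sum_j alpha_i * alpha_j * sigma_ij^2, where cov[i][j] holds sigma_ij^2."""
    n = len(weights)
    return sum(weights[i] * weights[j] * cov[i][j] for i in range(n) for j in range(n))

if __name__ == "__main__":
    weights = [0.5, 0.5]                      # hypothetical shares
    means = [0.05, 0.10]                      # hypothetical expected returns
    cov = [[0.04, 0.01],
           [0.01, 0.09]]                      # hypothetical covariance matrix
    print(f"E[R] = {portfolio_mean(weights, means):.4f}")
    print(f"V[R] = {portfolio_variance(weights, cov):.4f}")
```

Because the covariance between the two assets is small relative to their variances, the portfolio variance is lower than the average of the individual variances.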
The CAPM hinges on two assumptions:
1. There exists a risk-free asset—call it asset 1.
2. The market portfolio—the portfolio consisting of equal amounts of all shares—is mean-variance
efficient in the sense that no other portfolio realizes the same return but with lesser variance.
29.2 Conclusion
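Consider an efficient portfolio $\alpha^P = (\alpha_1^P, \ldots, \alpha_n^P)$, with return $R_P = \sum_{i=1}^n \alpha_i^P R_i$, and which has
the least possible variance subject to yielding an expected rate of return of $r_P = E[R_P]$. Such a
portfolio solves
$$\min_{\alpha} \sum_{i=1}^n \sum_{j=1}^n \alpha_i \alpha_j \sigma_{ij}^2 \quad \text{s.t.} \quad \sum_{i=1}^n \alpha_i r_i = r_P \ \text{ and } \ \sum_{i=1}^n \alpha_i = 1$$
The Lagrangian is
$$L(\alpha, \lambda, \mu) = \sum_{i=1}^n \sum_{j=1}^n \alpha_i \alpha_j \sigma_{ij}^2 - \lambda\left(\sum_{i=1}^n \alpha_i r_i - r_P\right) - \mu\left(\sum_{i=1}^n \alpha_i - 1\right)$$
FONC w.r.t. $\alpha$:
$$L_{\alpha_j^P} = 2\sum_{i=1}^n \alpha_i^P \sigma_{ij}^2 - \lambda r_j - \mu = 0, \quad 1 \le j \le n \tag{29.1}$$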
In particular, (29.1) holds for asset 1; however, asset 1 is risk-free by assumption, and therefore
$\sigma_{1i}^2 = 0$ for all $i$. Thus, for asset 1, (29.1) implies
$$\mu = -\lambda r_1 = -\lambda r,$$
where $r$ denotes the risk-free rate of return. Hence we can rewrite (29.1) as follows:
$$2\sum_{i=1}^n \alpha_i^P \sigma_{ij}^2 = \lambda(r_j - r), \quad 1 \le j \le n \tag{29.2}$$
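Multiplying by $\alpha_j^P$,
$$2\alpha_j^P \sum_{i=1}^n \alpha_i^P \sigma_{ij}^2 = \alpha_j^P \lambda(r_j - r), \quad 1 \le j \le n$$
and summing across all assets,
$$2\sum_{j=1}^n \alpha_j^P \sum_{i=1}^n \alpha_i^P \sigma_{ij}^2 = \sum_{j=1}^n \alpha_j^P \lambda(r_j - r)$$
which, because $\sum_{j=1}^n \alpha_j^P = 1$, is equivalent to
$$2\sum_{j=1}^n \sum_{i=1}^n \alpha_i^P \alpha_j^P \sigma_{ij}^2 = \lambda\left(\sum_{j=1}^n \alpha_j^P r_j - r\right)$$
or
$$2\sum_{j=1}^n \sum_{i=1}^n \alpha_i^P \alpha_j^P \sigma_{ij}^2 = \lambda(r_P - r) \tag{29.3}$$
since $\sum_{j=1}^n \alpha_j^P r_j = r_P$. Let $\sigma_P^2 = \sum_{j=1}^n \sum_{i=1}^n \alpha_i^P \alpha_j^P \sigma_{ij}^2$ denote the minimum variance of the
portfolio with expected return $r_P$. Then (29.3) says $2\sigma_P^2 = \lambda(r_P - r)$, or
$$\lambda = \frac{2\sigma_P^2}{r_P - r}$$
Plugging the result back into (29.2),
$$r_j - r = (r_P - r)\sum_{i=1}^n \alpha_i^P \frac{\sigma_{ij}^2}{\sigma_P^2} \tag{29.4}$$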
Finally, notice that
$$\begin{aligned}
\sum_{i=1}^n \alpha_i^P \sigma_{ij}^2 &= E\left[(R_j - r_j)\sum_{i=1}^n \alpha_i^P (R_i - r_i)\right] \\
&= \mathrm{Cov}\left[R_j, \sum_{i=1}^n \alpha_i^P R_i\right] \\
&= \mathrm{Cov}[R_j, R_P]
\end{aligned}$$
In other words, $\sum_{i=1}^n \alpha_i^P \sigma_{ij}^2$ is the covariance of the return on asset $j$ with the return on the efficient
portfolio that has expected return $r_P$. We can rewrite (29.4) as follows:
$$r_j - r = (r_P - r)\frac{\mathrm{Cov}[R_j, R_P]}{\sigma_P^2} \tag{29.5}$$
Suppose now that (29.5) holds for the return on the market portfolio, $R_M$. Let $\sigma_M^2$ denote $V[R_M]$.
Then it follows by (29.5) that
$$r_j = r + \beta(r_M - r)$$
where
$$\beta = \frac{\mathrm{Cov}[R_j, R_M]}{\sigma_M^2}$$
is the regression coefficient one would get by carrying out linear regression of the return on asset $j$,
on the return on the market portfolio. This regression coefficient is called the asset’s beta.
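Equation (29.5) can be verified on a toy example. The sketch below assumes one risk-free asset and two risky assets with made-up means and covariances; it solves the first-order conditions (29.2) for the risky weights by inverting the 2×2 covariance matrix by hand, ignores the non-negativity constraint just as the Lagrangian above does, and then reports the gap between the two sides of (29.5).

```python
def efficient_risky_weights(mu, cov, r, target):
    """Weights on two risky assets in the minimum-variance portfolio with expected
    return `target`; the remaining wealth sits in a risk-free asset paying r."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    ex = [mu[0] - r, mu[1] - r]                       # excess expected returns
    z = [(d * ex[0] - b * ex[1]) / det,               # z = cov^{-1} (mu - r)
         (a * ex[1] - c * ex[0]) / det]
    scale = (target - r) / (z[0] * ex[0] + z[1] * ex[1])
    return [scale * z[0], scale * z[1]]

def gaps_in_29_5(mu, cov, r, target):
    """(r_j - r) - (r_P - r) * Cov[R_j, R_P] / sigma_P^2 for each risky asset j."""
    w = efficient_risky_weights(mu, cov, r, target)
    cov_with_p = [sum(w[i] * cov[i][j] for i in range(2)) for j in range(2)]
    var_p = sum(w[j] * cov_with_p[j] for j in range(2))
    return [(mu[j] - r) - (target - r) * cov_with_p[j] / var_p for j in range(2)]

if __name__ == "__main__":
    mu = [0.08, 0.12]                                 # hypothetical expected returns
    cov = [[0.04, 0.01], [0.01, 0.09]]                # hypothetical covariances
    print(efficient_risky_weights(mu, cov, r=0.03, target=0.08))
    print(gaps_in_29_5(mu, cov, r=0.03, target=0.08))
```

Both gaps are zero up to rounding error, which is exactly what (29.5) asserts for an efficient portfolio.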
29.3 Summary
An asset’s beta measures the amount of systematic risk the asset carries. E.g. if β > 1, then the
asset is expected to outperform the market in good times and perform worse than the market in
bad times. This is risky! For an asset with β = 1.5, in a market with r = 0.05 and rM = 0.13, an
expected rate of return of
0.05 + 1.5 × (0.13 − 0.05) = 0.17
would be considered fair compensation for assuming the risk associated with this asset.
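The arithmetic in this example is the CAPM formula $r_j = r + \beta(r_M - r)$ applied once. A one-line helper (the function name is hypothetical) makes it easy to try other betas:

```python
def required_return(beta, risk_free, market_return):
    """CAPM expected return: r + beta * (r_M - r)."""
    return risk_free + beta * (market_return - risk_free)

if __name__ == "__main__":
    # The example from the text: beta = 1.5, r = 0.05, r_M = 0.13.
    print(round(required_return(1.5, 0.05, 0.13), 4))   # 0.17
```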
30 Finance II: Efficient Market Hypothesis
30.1 Review
In Section 29 we considered the implications of the CAPM assumptions with respect to the return
on a risky asset $i$. In particular, if asset $i$ has random return $R_i$ and beta $\beta$ (i.e. if, on average,
when $R_M$ is 1% higher/lower than usual, $R_i$ is $\beta$% higher/lower than usual), then $r_i = E[R_i]$
is required to satisfy
$$r_i = \pi_0 + \beta\pi_1 \tag{30.1}$$
where, in the CAPM, $\pi_0 = r$ is the risk-free rate of return and $\pi_1 = r_M - r$ is the excess return on
the market portfolio.
Researchers in the 1970s and 80s attempted to test the CAPM by estimating betas for classes of
assets and checking whether, on average, assets with bigger betas had higher expected returns. This
was not always successful. These days, economists interpret the model less literally. Often, they
augment the model with additional factors. The original CAPM says that all one needs to know
about an asset is its covariance with the market. A more agnostic view is that while beta matters,
there may be additional considerations. So it is common to see models such as the following:
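$$r_i = \pi_0 + \beta\pi_1 + \gamma\pi_2$$
where $\gamma$ measures the asset’s exposure to some other factor and $\pi_2$ is the return associated with that
factor. A typical factor is one that reflects how a given asset covaries with a
portfolio composed of small stocks, or with a portfolio made up of bonds as opposed to stocks.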
One key use of (30.1) is the determination of how to discount returns on different assets. For
example, if we are dealing with an asset that is priced at P0 in the current period, and will sell for
P1 in the next, then the expected return on the asset is (E[P1 ] − P0 )/P0 . If the asset has beta equal
to β, then under the CAPM the expected return is π0 + βπ1 . Thus we have
$$\frac{E[P_1]}{P_0} - 1 = \pi_0 + \beta\pi_1 \implies P_0 = \frac{E[P_1]}{1 + \pi_0 + \beta\pi_1}$$
This very simple equation has many immediate implications.
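One such implication is that, for a given expected future price, a higher beta means a lower price today. A minimal sketch, with hypothetical numbers and a hypothetical function name:

```python
def price_today(expected_p1, pi0, beta, pi1):
    """P0 = E[P1] / (1 + pi0 + beta * pi1), the CAPM-discounted expected price."""
    return expected_p1 / (1.0 + pi0 + beta * pi1)

if __name__ == "__main__":
    # Hypothetical asset expected to sell for 110 next period.
    for beta in (0.0, 1.0, 2.0):
        print(f"beta = {beta:.1f}   P0 = {price_today(110.0, 0.05, beta, 0.05):.2f}")
```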
30.2 Efficient Market Hypothesis
Consider a very short holding period, e.g. one week. The discounting is negligible, thereby giving
$$P_0 = E[P_1] \tag{30.2}$$
This means the price today has to be the expected value of the price next week. A stochastic process
is simply a sequence of random variables X1 , X2 , . . .
Examples:
• The height of the Nile River at a given location on June 1, some year onward.
• The closing price of the S&P 500 on Friday, some week onward.
Roughly speaking, a random walk is a stochastic process with the additional property that
$$E[X_t \mid X_{t-1}, \ldots, X_1] = X_{t-1}$$
i.e. the best forecast of the value in the next period is the current value. So people sometimes say
that asset prices constitute a random walk.
Equation (30.2) is sometimes called the efficient market hypothesis (EMH). The key insight in this
equation is that all the information we have to forecast the value of the asset tomorrow is factored
into the current value. As an example, suppose you think a share of Google stock will be worth x in
six weeks. Then you should be willing to pay almost x for the stock now, (subject to the discount
factor only).
Suppose (30.2) holds for a stock. The realized gain from buying the stock today and selling it in
the next period is
$$P_1 - P_0 = P_1 - E[P_1]$$
But the deviation of a random variable from its expected value is unpredictable. This means that
techniques such as drawing charts, etc., (so called “technical analysis”), cannot work!
Suppose new information is revealed about an asset. Take, for example, news concerning a drug
company such as Merck: the news could be regarding a prospective drug that is being evaluated
in a randomized clinical trial, or the discovery of side effects associated with an existing drug, or
a decision by the FDA, etc. Equation (30.2) says the news—whatever it may be—should cause an
instantaneous adjustment of the stock price, up or down. Likewise, news with implications for the
entire economy, e.g. the results of a Federal Reserve “Open Market” meeting, should cause the
market as a whole to adjust, up or down, instantaneously, as people adjust their expectations.
This leads to the idea of an event study. If one is trying to evaluate the effect of news on the value
of a firm, one looks at the excess returns on the firm’s stock:
$$XR_t = \frac{P_t - P_{t-1}}{P_{t-1}} - \beta\,\frac{M_t - M_{t-1}}{M_{t-1}}$$
where $P_t$ is the value of a share in the firm under consideration, at closing on day $t$, and $M_t$ is the
value of the market index at the same time.
The cumulative excess return is the sum of the excess returns over some horizon:
$$CXR_t = \sum_{i=1}^t XR_i$$
where period zero is some time prior to the breaking of the news, e.g. 7–14 days beforehand. One
then plots $CXR_t$, $1 \le t \le T$, where $T$ is several days after the breaking of the news. Ideally one
should observe random fluctuations before and after the news, with a “jump” on the day of the
news.$^{21}$
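The event-study calculation is short enough to sketch in full. The price and index series below are invented for illustration (the stock jumps 10% on day 2 while the market is flat), and the function names are hypothetical.

```python
def excess_returns(prices, market, beta):
    """XR_t for t = 1..T, given closing prices P_0..P_T and index values M_0..M_T."""
    return [
        (prices[t] - prices[t - 1]) / prices[t - 1]
        - beta * (market[t] - market[t - 1]) / market[t - 1]
        for t in range(1, len(prices))
    ]

def cumulative_excess_returns(xr):
    """CXR_t = XR_1 + ... + XR_t."""
    out, total = [], 0.0
    for x in xr:
        total += x
        out.append(total)
    return out

if __name__ == "__main__":
    prices = [100.0, 100.0, 110.0, 110.0]        # hypothetical: news arrives on day 2
    market = [1000.0, 1000.0, 1000.0, 1000.0]    # hypothetical flat market index
    xr = excess_returns(prices, market, beta=1.0)
    print(cumulative_excess_returns(xr))
```

The cumulative series is flat before the news, jumps on the day of the news, and is flat afterwards, which is the pattern an event study looks for.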
An implication of the EMH is that on average there is no advantage to following the suggestions
of advisors (at least, adjusting for the excess risk of the portfolios they recommend). It is widely
believed that the high returns reported by some funds in some periods are merely strings of luck.
Tables 5–7 (drawn from a paper by Burton Malkiel, “The Efficient Market Hypothesis and Its
Critics,” Journal of Economic Perspectives, Winter 2003) demonstrate this idea.
21 A good reference on the topic is A. Craig MacKinlay, “Event Studies in Economics and Finance,” Journal of
Economic Literature, March 1997.
Figure 30.1: Plot of cumulative abnormal return for earning announcements from event day –20 to event
day 20. The abnormal return is calculated using the market model as the normal return
measure. Source: MacKinlay (1997).
Table 5: Percentage of Large Capitalization Equity Funds Outperformed by Index Ending 6/30/2002
                                           1 year   3 years   5 years   10 years
S&P 500 vs. Large Cap Equity Funds         63%      56%       70%       79%
Wilshire 5000 vs. Large Cap Equity Funds   72%      64%       69%       74%
Note: All large capitalization mutual funds in existence are covered with the exception of “sector”
funds and funds investing in foreign securities.
Source: Lipper Analytic Services.
Table 6: Median Total Returns Ending 12/31/2001
                         10 years   15 years   20 years
Large Cap Equity Funds   10.98%     11.95%     13.42%
S&P 500 Index            12.94%     13.74%     15.24%
Source: Lipper Analytic Services, Wilshire Associates, Standard &
Poor’s and The Vanguard Group.
Table 7: Getting Burned by Hot Funds
Fund Name    Rank (1998–1999)    Average Annual Return (1998–1999)    Rank (2000–2001)    Average Annual Return (2000–2001)
Van Wagoner:Emrg Growth 1 105.52 1106 −43.54
Rydex:OTC Fund;Inv 2 93.43 1103 −36.31
TCW Galileo:AGr Eq;Instl 3 92.78 1098 −34.00
RS Inv:Emrg Growth 4 90.19 1055 −26.17
PBHG:Large Cap 20 5 84.56 1078 −29.03
Janus Olympus Fund 6 77.24 1061 −27.03
Van Kampen Aggr Gro;A 7 76.70 1067 −28.04
Janus Mercury 8 76.31 1057 −26.35
PBHG:Sel Equity 9 76.21 1097 −33.19
WM:Growth;A 10 74.77 1046 −25.82
Berger new Generation;Inv 11 73.31 1107 −45.96
Janus Enterprise 12 72.28 1101 −35.40
Janus Venture 13 72.22 1091 −30.89
Fidelity Aggr Growth 14 70.56 1105 −38.02
Janus Twenty 15 69.09 1090 −30.83
Amer Cent:New Oppty 16 67.64 1033 −24.11
Morg Stan Sm Cap Gro;B 17 66.59 1102 −35.96
Van Kampen Emrg Gro;A 18 65.67 1021 −22.70
TCW Galileo:SC Gro;Instl 19 64.87 1099 −34.77
Black Rock:Md Cap Gro;Instl 20 64.44 1009 −22.18
Average Fund Return 76.72 −31.52
S&P 500 Return 24.75 −10.50
Source: Analytic Services and Bogle Research Institute, Valley Forge, PA.
142
31 Public and Near-public Goods
A pure public good is one such as public radio, with two properties:
1. The amount of the good consumed by one person has no effect on its availability to others.
(This is called the no rivalry, or no congestion condition.)
2. A person cannot be prevented from consuming the good. (This is called the non-exclusionary
condition.)
Sometimes condition (1) is true while (2) is not. This is arguably the case with intellectual property
distributed via the internet. (If I download a song or a software program, my use does not affect
anyone else’s use.)
Additional examples of near public goods:
• parks and wildlife reserves (although in some cases these can become congested)
• national defense.
There are many goods/services that are widely thought of as public goods yet really aren’t, e.g.
schools, which are subject to congestion and also are excludable.
31.1 Optimal Provision of Goods with No-rivalry Characteristics
Consider a public good which comes in various amounts. Let x be the amount provided, at a cost
of p dollars per unit.
An economy has n consumers, $i = 1, \dots, n$. Consumer i has income $y_i$, and pays a tax $t_i$ toward the
purchase of the public good. Additionally, consumer i has utility given by $u^i(c_i, x) = u^i(y_i - t_i, x)$.
31.1.1 Case 1: one consumer; x = t1 /p.
The objective is
\[
\max_{t_1} \; u^1(y_1 - t_1,\, t_1/p)
\]
FONC:
\[
-u^1_c(y_1 - t_1, t_1/p) + \frac{1}{p}\, u^1_x(y_1 - t_1, t_1/p) = 0
\;\implies\;
\frac{u^1_x(y_1 - t_1, t_1/p)}{u^1_c(y_1 - t_1, t_1/p)} = p
\]
i.e. $MRS^1(y_1 - t_1, t_1/p) = p$. Recall that $MRS^1$ is consumer 1’s willingness to pay for the last
unit of the public good x, in units of consumption c, or dollars.
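As a quick numerical check of the one-consumer condition, the sketch below assumes a specific utility function, u(c, x) = ln(c) + a·ln(x), which is not taken from the notes; the parameter values are likewise hypothetical. Under that assumption the FONC has the closed-form solution t1 = a·y1/(1 + a), and the marginal rate of substitution at the optimum should equal p.

```python
def optimal_tax(y: float, a: float) -> float:
    """Solve the FONC -1/(y - t) + a/t = 0 for t.

    Assumes u(c, x) = ln(c) + a*ln(x) with c = y - t and x = t/p
    (an illustrative utility function, not one from the notes).
    """
    return a * y / (1.0 + a)


def mrs(c: float, x: float, a: float) -> float:
    """MRS = u_x / u_c for the assumed log utility."""
    u_x = a / x
    u_c = 1.0 / c
    return u_x / u_c


# Hypothetical parameter values chosen only for illustration.
y1, p, a = 100.0, 4.0, 0.25

t1 = optimal_tax(y1, a)      # tax paid toward the public good
c1 = y1 - t1                 # private consumption
x = t1 / p                   # amount of the public good
mrs_at_optimum = mrs(c1, x, a)

print(f"t1 = {t1:.2f}, x = {x:.2f}, MRS = {mrs_at_optimum:.2f}, p = {p:.2f}")
```

With these inputs the computed MRS coincides with p, which is what the FONC requires.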
31.1.2 Case 2: two consumers; x = (t1 + t2 )/p.
The objective is:
\[
\max_{t_1,\, t_2} \; u^1(y_1 - t_1,\, (t_1 + t_2)/p)
\quad \text{s.t.} \quad
u^2(y_2 - t_2,\, (t_1 + t_2)/p) \ge k_2
\]
Why? A social optimum must maximize consumer 1’s utility subject to consumer 2 receiving at least a
given level of utility, $k_2$. Such an outcome is called Pareto optimal. (If Pareto optimality fails to hold,
then we could reallocate resources with the end result that both consumers are better off.) Varying $k_2$
traces out a full range of potential social optima.
The Lagrangian is:
\[
L(t_1, t_2, \lambda; k_2) = u^1(y_1 - t_1,\, (t_1 + t_2)/p) + \lambda \left[ u^2(y_2 - t_2,\, (t_1 + t_2)/p) - k_2 \right]
\]
(In what follows we shall occasionally omit functional dependencies for the sake of notational
simplicity.) Let $v^1$ be the maximum value of $u^1$ s.t. $u^2 \ge k_2$. We know by the Envelope Theorem
that $\partial v^1/\partial k_2 = \partial L/\partial k_2 = -\lambda$. Raising $k_2$ tightens the constraint and therefore lowers $v^1$,
so $\lambda > 0$. A higher value of $\lambda$ assigns greater weight to consumer 2’s outcome.
FONC:
\[
L_{t_1} = -u^1_c + \frac{1}{p}\, u^1_x + \frac{\lambda}{p}\, u^2_x = 0
\]
\[
L_{t_2} = -\lambda u^2_c + \frac{1}{p}\, u^1_x + \frac{\lambda}{p}\, u^2_x = 0
\]
Note that the second and third terms in each of the above equations are the same, so we get
\[
u^1_c = \lambda u^2_c,
\]
or $\lambda = u^1_c/u^2_c$. The intuition behind this is that the social planner can rearrange taxes on consumers
1 and 2 while keeping x constant. If consumer 1 pays one less tax dollar, his utility increases by
$u^1_c$. Likewise, if consumer 2 pays one less tax dollar, her utility increases by $u^2_c$. At the optimum, a
gain of one unit in consumer 2’s utility corresponds to a gain of $\lambda$ in consumer 1’s utility.
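The first of the FONC can be rewritten as follows:
\[
\begin{aligned}
u^1_c &= \frac{1}{p}\, u^1_x + \frac{\lambda}{p}\, u^2_x \\
      &= \frac{1}{p}\, u^1_x + \frac{u^1_c/u^2_c}{p}\, u^2_x \\
      &= \frac{1}{p} \left[ u^1_x + u^1_c\, \frac{u^2_x}{u^2_c} \right] \\
\implies \frac{u^1_x}{u^1_c} + \frac{u^2_x}{u^2_c} &= p
\end{aligned}
\]
or
\[
MRS^1 + MRS^2 = p
\]
This means the optimal choice of x has the property that p equals the aggregate willingness to pay!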
31.1.3 Case 3: n consumers; $x = \tau/p$, where $\tau = \sum_{i=1}^{n} t_i$.
The objective is
\[
\max_{t_1, \dots, t_n} \; u^1(y_1 - t_1,\, \tau/p)
\quad \text{s.t.} \quad
\begin{cases}
u^2(y_2 - t_2,\, \tau/p) \ge k_2 \\
u^3(y_3 - t_3,\, \tau/p) \ge k_3 \\
\quad \vdots \\
u^n(y_n - t_n,\, \tau/p) \ge k_n
\end{cases}
\]
This is the n-consumer version of Pareto optimality. The optimal choice of taxes is the one that
maximizes consumer 1’s utility subject to minimum levels of utility for the other n − 1 consumers.
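The Lagrangian is
\[
L = u^1(y_1 - t_1,\, \tau/p) + \sum_{i=2}^{n} \lambda_i \left[ u^i(y_i - t_i,\, \tau/p) - k_i \right].
\]
For convenience define $\lambda_1 = 1$ and $k_1 = 0$. Then
\[
L = \sum_{i=1}^{n} \lambda_i \left[ u^i(y_i - t_i,\, \tau/p) - k_i \right]
\]
FONC:
\[
L_{t_i} = -\lambda_i u^i_c + \frac{1}{p} \sum_{j=1}^{n} \lambda_j u^j_x = 0, \qquad 1 \le i \le n
\]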
Note that the sum is constant with respect to i, so we must have
\[
\lambda_1 u^1_c = \lambda_2 u^2_c = \cdots = \lambda_n u^n_c
\]
In particular,
\[
u^1_c = \lambda_i u^i_c, \qquad 2 \le i \le n
\]
and thus
\[
\lambda_i = \frac{u^1_c}{u^i_c}, \qquad 2 \le i \le n
\]
Putting the last result back into the first of the FONC gives
\[
u^1_c = \frac{1}{p} \sum_{i=1}^{n} \frac{u^1_c}{u^i_c}\, u^i_x
\]
Dividing by $u^1_c$ and multiplying by p, we see that
\[
p = \sum_{i=1}^{n} \frac{u^i_x}{u^i_c} = \sum_{i=1}^{n} MRS^i
\]
As in the case of two consumers, p equals the aggregate willingness to pay.
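The condition that p equals the sum of the MRS's can be illustrated with a small numerical sketch. It assumes quasi-linear utility u^i(c_i, x) = c_i + a_i·ln(x), which is not taken from the notes; the taste parameters a_i and the price are hypothetical. With this utility MRS^i = a_i/x, so the condition gives x* = (a_1 + ... + a_n)/p.

```python
def samuelson_quantity(a: list[float], p: float) -> float:
    """Quantity x* at which the individual MRS's sum to p.

    Assumes u^i(c_i, x) = c_i + a_i*ln(x), so MRS^i = a_i / x
    (an illustrative utility function, not one from the notes).
    """
    return sum(a) / p


def aggregate_mrs(a: list[float], x: float) -> float:
    """Sum of the individual willingness to pay for the last unit of x."""
    return sum(a_i / x for a_i in a)


# Hypothetical taste parameters for three consumers and a hypothetical price.
a = [2.0, 3.0, 5.0]
p = 4.0

x_star = samuelson_quantity(a, p)
total_wtp = aggregate_mrs(a, x_star)

# For comparison: the quantity the keenest consumer would buy if she had to
# pay the full price p herself (her own MRS equal to p).
x_alone = max(a) / p

print(f"x* = {x_star:.3f}, aggregate MRS at x* = {total_wtp:.3f}, p = {p:.3f}")
print(f"quantity bought by the keenest consumer alone = {x_alone:.3f}")
```

The comparison quantity is smaller than x*, which is one way to see why individual purchases tend to under-provide a non-rivalrous good.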

Implications:
• For a non-rivalrous good, the optimal provision of the good has the property that the marginal
cost p equals the aggregate willingness to pay. This is called the Samuelson condition because
it was derived by the great American economist Paul Samuelson in 1954.
• A simple market mechanism will not necessarily achieve the optimality condition. With non-
excludable goods, in fact, it is hard to see why anyone is willing to contribute voluntarily,
(although people do). Thus, the provision of pure public goods usually is left to political
mechanisms.
• With excludable goods such as proprietary software, a per-user fee may be reasonable. Note
that the producer receives the sum of the user fees.
• For questions such as how much to invest in wilderness areas, some suggest polling the public
and asking how much people would be willing to pay to expand/protect the wilderness versus
selling it off. This practice is controversial because it’s unclear whether those polled under-
stand the questions, or tell the truth. Moreover, goods such as wilderness areas are valued in
a passive way since most people never will experience them first hand. Unlike ordinary con-
sumer goods, there is no observable behavior that can be traced back to a person’s willingness
to pay. Despite these issues, this method, known as contingent valuation, was used to value
the environmental damage—or lost passive use—caused by the Exxon Valdez oil spill.
31.2 Appendix: Social Optimum with Ordinary Goods
You may be wondering how the idea of a social optimum works with ordinary goods. Let’s consider
the decision of how to allocate an ordinary good x. The government collects a tax $t_i$ from the ith
consumer, and allocates to the consumer $x_i$ units of the good. The budget constraint for the
government in this case is $\tau = p\chi$, where $\tau = \sum_{i=1}^{n} t_i$, $\chi = \sum_{i=1}^{n} x_i$, and p is the price of x.
Assume, as before, that consumer i has income $y_i$ and uses his or her after-tax income to buy
$c_i = y_i - t_i$ units of the numeraire good.
The objective is
\[
\max_{\substack{t_1, \dots, t_n, \\ x_1, \dots, x_n}} \; u^1(y_1 - t_1,\, x_1)
\quad \text{s.t.} \quad
\begin{cases}
u^2(y_2 - t_2,\, x_2) \ge k_2 \\
u^3(y_3 - t_3,\, x_3) \ge k_3 \\
\quad \vdots \\
u^n(y_n - t_n,\, x_n) \ge k_n \\
\tau = p\chi
\end{cases}
\]
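The Lagrangian is
\[
L = u^1(y_1 - t_1,\, x_1) + \sum_{i=2}^{n} \lambda_i \left[ u^i(y_i - t_i,\, x_i) - k_i \right] + \mu(\tau - p\chi)
\]
Once again define $\lambda_1 = 1$ and $k_1 = 0$ so that
\[
L = \sum_{i=1}^{n} \lambda_i \left[ u^i(y_i - t_i,\, x_i) - k_i \right] + \mu(\tau - p\chi)
\]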
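FONC:
\[
L_{t_i} = -\lambda_i u^i_c + \mu = 0, \qquad 1 \le i \le n
\]
\[
L_{x_i} = \lambda_i u^i_x - \mu p = 0, \qquad 1 \le i \le n
\]
The first collection of FONC implies
\[
u^1_c = \lambda_i u^i_c, \qquad 2 \le i \le n
\]
or
\[
\lambda_i = \frac{u^1_c}{u^i_c}, \qquad 2 \le i \le n
\]
Combining these results with the second collection of FONC gives $u^1_x = \mu p$ for $i = 1$, or, equivalently
(since $\mu = u^1_c$), $p = u^1_x/u^1_c = MRS^1$, and
\[
\begin{aligned}
\lambda_i u^i_x &= \mu p \\
\implies \frac{u^1_c}{u^i_c}\, u^i_x &= u^1_c\, p \\
\implies p &= \frac{u^i_x}{u^i_c} = MRS^i, \qquad 1 \le i \le n
\end{aligned}
\]
Thus at a social optimum we have
\[
MRS^i = p, \qquad 1 \le i \le n \tag{31.1}
\]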
Note that this is the same condition that would result from opening a market in good x, and
charging p dollars per unit of x. However, in order to reach a particular social optimum we would
have to redistribute income via our choice of t1 , . . . , tn .
It is possible to show the following:
• Any particular social optimum can be achieved by opening a free market in good x, and
redistributing income via taxes.
• For any given distribution of income, setting all taxes equal to zero achieves one possible
Pareto optimum. This may not be the one that people particularly like—it will result in
highest utility for the person with highest income—but it is nonetheless efficient in the sense
that it satisfies (31.1).
32 Externalities
Externalities arise when the consumption or production of a good by one economic agent causes a
side effect for others. Examples include air pollution caused by burning fossil fuels, the playing of
loud music, etc. Externalities can be positive as well: a classic example is bees, which are needed
to pollinate fruit trees!
This section deals primarily with air pollution, which is like a public good to the extent that air
quality affects the entire population of an area.
32.1 Consumption Externalities
We shall use an extended version of the model used in our analysis of public goods. Assume that
consumers care about three things:
• consumption of a basic, numeraire good c
• consumption of a good x, with an externality
• the level z of the externality
Think of x as gasoline and z as the amount of smog in the air. Consider an economy with n
consumers, i = 1, . . . , n. Consumer i has income yi and with it consumes ci and xi . The level z of
the externality is determined by the total consumption of x:
\[
z = \alpha\chi
\]
where $\chi = \sum_{i=1}^{n} x_i$ and $\alpha$ is the amount of smog produced per gallon of gas used. Let p denote the
price (and the marginal cost) of x. The utility of consumer i is given by
\[
u^i(c_i, x_i, z) = u^i(y_i - p x_i,\, x_i,\, \alpha\chi)
\]
We are assuming that $u^i_c > 0$, $u^i_x > 0$, and $u^i_z < 0$, i.e. z is bad. Notice the similarity between z
and the public goods we studied previously: consumer i’s “consumption” of z has no effect on the
amount of z “available” to others.
32.1.1 Market Equilibrium

Consumer 1 takes p as given, and while he realizes $z = \alpha \sum_{i=1}^{n} x_i$, he also takes $x_2, \dots, x_n$ (gas
consumption of others) as given. His objective is
\[
\max_{x_1} \; u^1(y_1 - p x_1,\, x_1,\, \alpha\chi)
\]
FONC:
\[
-p u^1_c + u^1_x + \alpha u^1_z = 0
\;\implies\;
\underbrace{\frac{u^1_x}{u^1_c}}_{MRS^1(x,c)} = p - \alpha\, \frac{u^1_z}{u^1_c}
\]
In general, consumer i sets her MRS (for x relative to c) equal to $p - \alpha u^i_z/u^i_c$. If
$u^i_z < 0$, then $p - \alpha u^i_z/u^i_c > p$, so the consumer acts as if the price of x is actually higher. The price
difference $\alpha u^i_z/u^i_c$ is $\alpha$ (the rate of production of z per unit x) times the marginal willingness to
pay for clean air, $u^i_z/u^i_c$.
32.1.2 Social Optimum
A social planner has to allocate x and collect taxes $t_i$ ($i = 1, \dots, n$) that balance the government’s
costs: $\tau = p\chi$, where $\tau = \sum_{i=1}^{n} t_i$ and $\chi = \sum_{i=1}^{n} x_i$. As before, we look for Pareto optimal
outcomes. The social planner’s objective is:
\[
\max_{\substack{t_1, \dots, t_n \\ x_1, \dots, x_n}} \; u^1(y_1 - t_1,\, x_1,\, \alpha\chi)
\quad \text{s.t.} \quad
\begin{cases}
u^2(y_2 - t_2,\, x_2,\, \alpha\chi) \ge k_2 \\
\quad \vdots \\
u^n(y_n - t_n,\, x_n,\, \alpha\chi) \ge k_n \\
\tau = p\chi
\end{cases}
\]
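Define $\lambda_1 = 1$, $k_1 = 0$. The Lagrangian is
\[
L = \sum_{i=1}^{n} \lambda_i \left[ u^i(y_i - t_i,\, x_i,\, \alpha\chi) - k_i \right] + \mu(\tau - p\chi)
\]
FONC:
\[
L_{t_i} = -\lambda_i u^i_c + \mu = 0, \qquad 1 \le i \le n \tag{32.1}
\]
\[
L_{x_i} = \lambda_i u^i_x + \alpha \sum_{j=1}^{n} \lambda_j u^j_z - \mu p = 0, \qquad 1 \le i \le n \tag{32.2}
\]
Equations (32.1) imply
\[
\mu = \lambda_i u^i_c, \qquad 1 \le i \le n
\]
and in particular
\[
\mu = u^1_c \tag{32.3}
\]
As a consequence,
\[
\lambda_i = \frac{u^1_c}{u^i_c}, \qquad 1 \le i \le n \tag{32.4}
\]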
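Equations (32.2) imply
\[
\begin{aligned}
\lambda_i u^i_x &= \mu p - \alpha \sum_{j=1}^{n} \lambda_j u^j_z \\
\implies \frac{\lambda_i}{\mu}\, u^i_x &= p - \alpha \sum_{j=1}^{n} \frac{\lambda_j}{\mu}\, u^j_z \\
\implies \frac{\lambda_i}{u^1_c}\, u^i_x &= p - \alpha \sum_{j=1}^{n} \frac{\lambda_j}{u^1_c}\, u^j_z && \text{by (32.3)} \\
\implies \frac{u^1_c/u^i_c}{u^1_c}\, u^i_x &= p - \alpha \sum_{j=1}^{n} \frac{u^1_c/u^j_c}{u^1_c}\, u^j_z && \text{by (32.4)} \\
\implies \underbrace{\frac{u^i_x}{u^i_c}}_{MRS^i(x,c)} &= p - \alpha \sum_{j=1}^{n} \frac{u^j_z}{u^j_c}, \qquad 1 \le i \le n
\end{aligned}
\]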
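This means everyone has to set
\[
MRS^i(x,c) = p + \tau
\]
where $\tau = -\alpha\nu$, and $\nu = \sum_{i=1}^{n} u^i_z/u^i_c$ is the aggregate marginal willingness to pay for clean air.
(Here $\tau$ denotes a per-unit charge on x, not the total tax revenue $\sum_i t_i$ used above. Since $u^i_z < 0$,
$\nu$ is negative and $\tau$ is positive.)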
32.1.3 Market Equilibrium versus Social Optimum
Market Eq: $MRS^i(x,c) = p - \alpha\, u^i_z/u^i_c$
Social Opt: $MRS^i(x,c) = p - \alpha \sum_{j=1}^{n} u^j_z/u^j_c = p + \tau$
So, in the social optimum, consumer i takes account of the effect of her gas consumption on everyone
else whereas in the market equilibrium she cares only about herself.
The sum $p + \tau$ is the social marginal cost of consuming gas. It exceeds the private cost p if $\alpha$ is
positive, and if there is some value to clean air (which is the case if $u^i_z/u^i_c < 0$ for all
i). In the real world, $\alpha$ is very small but n is very big, so while $\alpha u^i_z/u^i_c$ is negligible, $\tau$ can be
significant.
In the 1920s the English economist Arthur C. Pigou figured out that one can “correct” an externality
by taxing the activity that creates it, with a tax τ . We have shown that the optimal Pigouvian tax
for a consumption externality that affects the entire population is
\[
\tau = \alpha \sum_{i} \{\text{consumer } i\text{’s willingness to pay for marginal reduction in externality}\}.
\]
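The gap between the market equilibrium and the social optimum can be made concrete with a small sketch. It assumes quasi-linear utility u^i = c_i + a·ln(x_i) − d_i·z, which is not taken from the notes, so that MRS^i(x, c) = a/x_i and each consumer's willingness to pay for a marginal reduction in z is d_i. All parameter values are hypothetical, chosen so that α is small and n is large.

```python
def market_demand(a: float, p: float, alpha: float, d_i: float) -> float:
    """Consumer i's gas demand when she counts only her own smog damage.

    Solves a / x_i = p + alpha * d_i under the assumed utility
    u^i = c_i + a*ln(x_i) - d_i*z (illustrative, not from the notes).
    """
    return a / (p + alpha * d_i)


def pigouvian_tax(alpha: float, d: list[float]) -> float:
    """tau = alpha times the sum over consumers of marginal damage d_i."""
    return alpha * sum(d)


def optimal_demand(a: float, p: float, alpha: float, d: list[float]) -> float:
    """Gas demand at the social optimum: a / x_i = p + tau."""
    return a / (p + pigouvian_tax(alpha, d))


# Hypothetical economy: 1000 identical consumers, small alpha.
a, p, alpha = 10.0, 2.0, 0.01
d = [0.5] * 1000

tau = pigouvian_tax(alpha, d)
x_market = market_demand(a, p, alpha, d[0])
x_optimal = optimal_demand(a, p, alpha, d)

print(f"private correction alpha*d_i = {alpha * d[0]:.4f}")
print(f"Pigouvian tax tau            = {tau:.4f}")
print(f"gas per person: market {x_market:.3f}, optimum {x_optimal:.3f}")
```

In this sketch the private correction α·d_i is tiny while τ is large relative to p, matching the remark that a small α combined with a large n can still produce a significant tax.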
32.1.4 Other Examples
• Taxes for “wear and tear” on the road. The usual justification for a gas tax, apart from
the air pollution effect, is that driving causes the roadways to deteriorate. If the wear and
tear caused by a given car is proportional to the amount of gas the car uses, a Pigouvian tax
on gas is sensible.
• Taxes on cigarettes are sometimes justified because they are a tax on second hand smoke.
• Some people have proposed a tax on foods that cause obesity. This is a more complicated
case but the basis of their argument is that health care costs for those over 65, (which is
when most costs are incurred), are heavily subsidized through Medicare. Thus, if someone
eats too much and as a result winds up with diabetes later in life, this person contributes to
the Medicare bill, which we all pay.
32.2 Production Externalities
We will restrict our attention to a very simple example of a production externality. The example
is motivated by the electric power industry, which in most places uses coal to create electricity.
Assume there are n plants, i = 1, . . . , n. Plant i has cost function ci (si , yi ), where yi is the amount
of electricity (kWh) produced, and si is a choice variable representing the choice of factors that
affect the amount of SO2 produced. For example, si could represent the choice of what type of coal
to use (more expensive coal from the Western US, which burns cleaner, versus cheaper coal from
the East), or the choice of what kind of scrubber to install. The amount of SO2 emitted by the
plant is
\[
z_i = y_i\, \alpha_i(s_i)
\]
where $\alpha_i'(s_i) < 0$ and $\alpha_i''(s_i) > 0$, i.e. $\alpha_i$ is decreasing and convex as in Figure 32.1.
Figure 32.1: $\alpha_i'(s_i) < 0$ and $\alpha_i''(s_i) > 0$.
Let $\nu$ be the aggregate willingness to pay to avoid SO2 (across the entire population, not only
the power industry) and let p be the value of a kWh of electricity. From the point of view of an
industry regulator, the objective is to maximize the industry surplus, valuing SO2 at $-\nu$ per ton:
\[
\max_{\substack{y_1, \dots, y_n \\ s_1, \dots, s_n}} \; \pi - \nu\zeta
\]
where
\[
\pi = \sum_{i=1}^{n} \pi_i = \sum_{i=1}^{n} \left[ p y_i - c^i(s_i, y_i) \right]
\]
is the total profit of the industry as a whole, and
\[
\zeta = \sum_{i=1}^{n} z_i = \sum_{i=1}^{n} y_i\, \alpha_i(s_i)
\]
is the total amount of SO2 emitted by the industry. (As an alternative, we could set up the problem
by having utility functions for all the local residents, who each use electricity and consume another,
numeraire good c, and wish to avoid having SO2 in the air. As an exercise, set up the problem this
way.)
FONC w.r.t. $y_i$:
\[
p - c^i_{y_i} - \nu\alpha_i(s_i) = 0, \qquad 1 \le i \le n \tag{32.5}
\]
This means the output of plant i should be chosen so that
\[
c^i_{y_i} + \nu\alpha_i(s_i) = p
\]
The LHS, $c^i_{y_i} + \nu\alpha_i(s_i)$, is called the marginal social cost of production at plant i. The regulator
wants to set this equal to p, the “social” value of a kWh of electricity.
FONC w.r.t. $s_i$:
\[
-c^i_{s_i} - \nu y_i\, \alpha_i'(s_i) = 0, \qquad 1 \le i \le n \tag{32.6}
\]
Dividing by $y_i$,
\[
\underbrace{\frac{1}{y_i}\, c^i_{s_i}}_{\partial AC^i/\partial s_i} = -\nu\alpha_i'(s_i)
\]
The optimal choice is the one for which the marginal increase in average cost offsets the marginal
value of the reduced pollution per unit of output. Assuming AC i (si , yi ) is convex in si (so that
with higher si , an additional increase in si has a bigger effect on AC i ) and that αi is decreasing
and convex, we have Figure 32.2.
Figure 32.2: How can a regulator get Plant i to choose s∗i ?
Method 1 (Pigouvian Tax):
• Tax each plant ν per ton of SO2 produced.
• Buy electricity at p/kWh.
The manager of plant i will then attempt to maximize
\[
\pi_i = p y_i - c^i(s_i, y_i) - \nu y_i\, \alpha_i(s_i)
\]
which has FONC equivalent to (32.5) and (32.6) above.

Method 2 (Cap & Trade):
• Distribute among the plants a fixed amount of SO2 emission rights, each of which entitles
the bearer to produce a ton of SO2 .
• Allow the plants to trade emission rights among themselves.
• Buy electricity at p/kWh.
Let q be the value of an emission right, where $q > 0$. A plant manager who owns k emission
rights will then attempt to maximize
\[
p y_i - c^i(s_i, y_i) + v
\]
where $v = kq - q y_i\, \alpha_i(s_i)$ is the value of the emission rights she can sell on the market (or will
have to buy). Notice that if $q = \nu$, the FONC for this plant is equivalent to (32.5) and (32.6).
This is how SO2 really is regulated.
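To see why the two methods line up, it may help to write out the plant's cap-and-trade problem explicitly. The following is a sketch that uses only the notation already introduced, with ν denoting the regulator's valuation of SO2 from the Pigouvian-tax method.

```latex
\[
\max_{y_i,\, s_i}\; p\,y_i - c^i(s_i, y_i) + kq - q\,y_i\,\alpha_i(s_i)
\]
\[
\text{FONC w.r.t. } y_i:\quad p - c^i_{y_i} - q\,\alpha_i(s_i) = 0
\]
\[
\text{FONC w.r.t. } s_i:\quad -c^i_{s_i} - q\,y_i\,\alpha_i'(s_i) = 0
\]
```

The endowment term kq does not depend on y_i or s_i, so it drops out of both conditions. Replacing q with ν reproduces (32.5) and (32.6), which is why the initial allocation of rights affects who pays but not the plant's choices at the margin.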
Why use Method 2?
• In reality, no one knows what $\nu$ to charge. So instead the regulator looks at the total amount
of SO2 emitted at some reference point in time, then issues a somewhat smaller number of
emission rights, e.g. 80% of that amount. This method ensures that SO2 is reduced by 20% “efficiently.”
• Firms prefer this method because they get the emission rights “free of charge.” (Emission
rights were distributed in the early 1990s, and plants were allowed to trade them, but the
rules forbidding them from exceeding the limits didn’t take effect until 1995.)
• It is claimed that enforcement is easier.
33 Empirical Methods in Microeconomics
This section provides the reader with an overview of how microeconomists use real data to test
alternative theories and (in some cases) estimate the relevant parameters of a particular model.
The examples are drawn from my own work in labor economics.
33.1 Experiments and Counterfactuals
Suppose one is interested in testing a prediction of microeconomic theory. To be concrete, we shall
consider four examples:
• If single mothers currently on welfare are offered an earnings subsidy, will they work more?
• If the supply of low-skilled workers in a local labor market is increased by an influx of immi-
grants, will wages of native, low-wage workers fall?
• If the minimum wage is increased, will low-wage employers hire fewer workers?
• If people without health insurance are provided insurance, will they use more health care
services? Will they become healthier?
The classical scientific approach to such questions would be to conduct a randomized experiment.
In such an experiment, a population whose behavior is to be studied would be randomly divided into
two groups: the treatment group, members of which receive the “treatment,” and the control group,
members of which do not. For the welfare question, the population would be single mothers currently
on welfare. For the immigrant question, the population would be cities (or other geographic entities
such as counties). For the minimum wage question, the population would be employers. For the
final question, the population would be the uninsured. Note that some of these experiments seem
harder to carry out than others.
Let’s assume that one could conduct a randomized experiment on welfare mothers. (In reality, such
an experiment was conducted in two Canadian provinces in the mid-90s. We will examine the data
shortly.) How would one do this? Presumably, one could tabulate the employment rates of the
treatment group $Y_T$ and the control group $Y_C$ some time after the subsidy was in place. One would
then calculate the treatment effect
\[
\Delta = Y_T - Y_C
\]
The idea of a randomized experiment is that in the absence of the treatment, the two groups would
have had equal outcomes. Randomization is key: if treatment status really is randomly assigned
to the general population, then it is reasonable to expect the two groups to exhibit the same
behavior in the absence of treatment. The impact of “statistical accidents” is minimized by using
big groups. The behavior of the control group represents a counterfactual for assessing whether or
not the treatment has an effect. If a theory predicts that a subsidy will increase work effort, for
example, then we want to test the null hypothesis H0 : ∆ = 0 versus the alternative hypothesis
H1 : ∆ > 0.
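The test described above can be sketched in a few lines. The example below is only an illustration: the group sizes and employment counts are hypothetical (they are not SSP data), and it uses a standard one-sided two-sample z-test for a difference in proportions, which is one common way to test H0: ∆ = 0 against H1: ∆ > 0 and is not necessarily the procedure used in the studies discussed here.

```python
import math


def treatment_effect_test(emp_t: int, n_t: int, emp_c: int, n_c: int):
    """Return (delta, z, one-sided p-value) for H0: delta = 0 vs H1: delta > 0.

    emp_t / n_t and emp_c / n_c are the employment rates Y_T and Y_C.
    Uses the pooled-proportion z-test (normal approximation).
    """
    y_t = emp_t / n_t
    y_c = emp_c / n_c
    delta = y_t - y_c

    # Under H0 both groups share one employment rate, estimated by pooling.
    pooled = (emp_t + emp_c) / (n_t + n_c)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_t + 1.0 / n_c))
    z = delta / se

    # Upper-tail probability of a standard normal: P(Z > z).
    p_value = 0.5 * math.erfc(z / math.sqrt(2.0))
    return delta, z, p_value


# Hypothetical counts, for illustration only.
delta, z, p_value = treatment_effect_test(emp_t=300, n_t=1000, emp_c=250, n_c=1000)
print(f"delta = {delta:.3f}, z = {z:.2f}, one-sided p-value = {p_value:.4f}")
```

Larger groups shrink the standard error, which is the sense in which big groups reduce the impact of statistical accidents.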
A randomized experiment is considered the gold standard for scientific evidence. The FDA, for
example, requires drug companies to evaluate the efficacy of a new drug by means of a randomized
experiment. The high status of randomized experiments is due to several features:
1. Randomization ensures that YC is a valid counterfactual. So, except for chance errors, ∆ is
truly attributable to the treatment, not to some inherent difference between the two groups.
2. Once the experimental design is determined, the researcher’s hands are tied. There is no room
for weaseling. (The experimental design is a full description of the population, the sample
size, the randomization procedure, the treatment, and the data collection process.)
3. Because of (1) and (2), randomized experiments are easy to understand and therefore have a
lot of credibility.
33.1.1 The Self Sufficiency Project (SSP)
SSP is the name of a randomized experiment conducted in Canada during the 90s. Half a random
sample of single mothers who had been on welfare for at least a year was assigned to the treatment
group. The other half was assigned to the control group. Members of the control group were eligible
to receive their regular welfare benefit, a fixed monthly sum based on the number of children in
the home as well as the province, (e.g. $712 per month for a mother of one in New Brunswick).
Welfare payments are reduced dollar-for-dollar for those who earn over $200 per month. Members
of the treatment group were allowed to remain on welfare but were offered an earnings subsidy
S = (M − E)/2, where M is a monthly earnings target ($2500/month) and E is actual earnings.
So, if a participant earned $650 in a month, she received a subsidy of $925. Participants qualified
for the subsidy only if they worked at least 30 hours per week, for up to three years. They also had
to receive their first subsidy payment within a year of entering the treatment group or they forfeited
all future eligibility. Figure 33.1 shows the monthly budget constraint, and Figure 33.2 shows the
fractions of each group on welfare as a function of time, in months, since random assignment, along
with a graph of the average employment rate for each group.
Figure 33.1: Monthly budget constraint for members of the treatment group in the SSP.
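The SSP subsidy rule S = (M − E)/2 described above can be written as a short function. This is a sketch: the function and constant names are hypothetical, the hours test is reduced to a single weekly figure, and the floor at zero for earnings above the target is an assumption that the text does not spell out.

```python
EARNINGS_TARGET = 2500.0   # M, the monthly earnings target from the text
MIN_WEEKLY_HOURS = 30.0    # participants had to work at least 30 hours per week


def ssp_subsidy(earnings: float, weekly_hours: float,
                target: float = EARNINGS_TARGET) -> float:
    """Monthly subsidy S = (M - E) / 2 for a qualifying participant.

    Returns 0 if the hours requirement is not met. The floor at zero for
    earnings above the target is an assumption, not stated in the notes.
    """
    if weekly_hours < MIN_WEEKLY_HOURS:
        return 0.0
    return max(0.0, (target - earnings) / 2.0)


# The worked example from the text: earnings of $650 in a month.
subsidy = ssp_subsidy(earnings=650.0, weekly_hours=30.0)
total_income = 650.0 + subsidy
print(f"subsidy = {subsidy:.2f}, earnings plus subsidy = {total_income:.2f}")
```

Because the subsidy falls by 50 cents for each extra dollar earned, a qualifying participant keeps half of every additional dollar of earnings below the target.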
[Figure 33.2. Top panel: fraction of group on IA against time (months), with series for the control
group, the treatment group, and their difference. Bottom panel: fraction of group employed against
time (months), with the same three series.]
Figure 33.2: Source: D. Card and D. Hyslop, “Estimating the Effects of a Time-limited Earnings Subsidy
for Welfare Leavers,” Econometrica 73, November 2005.
33.2 Research Designs Based on Natural Experiments
Often we cannot carry out an experiment, either because it would cost a lot and be quite invasive
(e.g. SSP), or because it would be impractical. How do we proceed in such cases?
One approach is to consider events that occur, and gauge whether an analysis of the event could be
interpreted as if the event were a randomized experiment. A very simple example is a paper I wrote
on the Mariel Boatlift. In that paper, I examined the movements in wages and unemployment rates
in Miami (where the Marielitos landed), and within a control group consisting of four other cities:
Tampa, Houston, Atlanta, and Los Angeles. A key difference between a true randomized experiment
and a natural experiment is that in the latter, treatment is not randomly assigned. So it is debatable
whether the control group provides a valid counterfactual. For my paper, I examined trends in employment
in Miami versus the average of the four other cities throughout the 70s: the two moved in close
parallel. (Ironically, the editor of the journal forced me to remove this graph from the published
paper!)
In a natural experiment, outcomes may not be exactly the same in both groups, even before the
treatment. Let
\[
\Delta_0 = Y_T^0 - Y_C^0
\]
represent the pre-existing gap in the outcome (or measurable quantity) of interest (e.g. average
wages), and let
\[
\Delta_1 = Y_T^1 - Y_C^1
\]
represent the gap at some time after the treatment has begun. Then we might want to look at the
“difference-in-differences”
\[
DD = \Delta_1 - \Delta_0 = (Y_T^1 - Y_T^0) - (Y_C^1 - Y_C^0)
\]
This is the change in the treatment group relative to the change in the control group. The implicit
assumption is that in the absence of treatment, the gap would have remained constant at $\Delta_0$.
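As a small worked example of the difference-in-differences calculation, the sketch below plugs in the Table 8 entries for blacks in Miami and in the comparison cities. Treating 1979 as the "before" year and 1981 as the "after" year is a choice made here purely for illustration, and the calculation ignores the standard errors reported in the table.

```python
def diff_in_diff(y_t0: float, y_t1: float, y_c0: float, y_c1: float) -> float:
    """DD = (Y_T^1 - Y_T^0) - (Y_C^1 - Y_C^0)."""
    return (y_t1 - y_t0) - (y_c1 - y_c0)


# Mean log real hourly earnings of blacks, from Table 8.
miami_1979, miami_1981 = 1.59, 1.61              # treatment group (Miami)
comparison_1979, comparison_1981 = 1.74, 1.72    # control group (four cities)

gap_before = miami_1979 - comparison_1979        # Delta_0
gap_after = miami_1981 - comparison_1981         # Delta_1
dd = diff_in_diff(miami_1979, miami_1981, comparison_1979, comparison_1981)

print(f"Delta_0 = {gap_before:.2f}, Delta_1 = {gap_after:.2f}, DD = {dd:.2f}")
```

Because the outcome is in logs, the DD can be read as an approximate proportional change in relative wages. Choosing other years from the table would give a different number, so this single comparison should not be over-interpreted.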
33.2.1 The Mariel Boatlift
In the Boatlift, about 125,000 Cuban immigrants were transported on a flotilla of small boats to
Miami, over the period from April 1980 to July of the same year. This represented an increase of
about 7% in the Miami labor force—mainly in the ranks of the unskilled. One simple hypothesis
is that such an influx would reduce wages for unskilled workers already in Miami. Table 8 shows
outcomes for blacks in Miami relative to the comparison cities.
Table 8: Logarithms of Real Hourly Earnings of Workers Age 16–61 in Miami and Four Comparison Cities,
1979–85.
Group           1979    1980    1981    1982    1983    1984    1985
Miami:
  Whites        1.85    1.83    1.85    1.82    1.82    1.82    1.82
               (.03)   (.03)   (.03)   (.03)   (.03)   (.03)   (.05)
  Blacks        1.59    1.55    1.61    1.48    1.48    1.57    1.60
               (.03)   (.02)   (.03)   (.03)   (.03)   (.03)   (.04)
  Cubans        1.58    1.54    1.51    1.49    1.49    1.53    1.49
               (.02)   (.02)   (.02)   (.02)   (.02)   (.03)   (.04)
  Hispanics     1.52    1.54    1.54    1.53    1.48    1.59    1.54
               (.04)   (.04)   (.05)   (.05)   (.04)   (.04)   (.06)
Comparison Cities:
  Whites        1.93    1.90    1.91    1.91    1.90    1.91    1.92
               (.01)   (.01)   (.01)   (.01)   (.01)   (.01)   (.01)
  Blacks        1.74    1.70    1.72    1.71    1.69    1.67    1.65
               (.01)   (.02)   (.02)   (.01)   (.02)   (.02)   (.03)
  Hispanics     1.65    1.63    1.61    1.61    1.58    1.60    1.58
               (.01)   (.01)   (.01)   (.01)   (.01)   (.01)   (.02)
Note: Entries represent means of log hourly earnings (deflated by the Consumer Price Index,
1980=100) for workers age 16–61 in Miami and four comparison cities: Atlanta, Houston, Los Angeles,
and Tampa–St. Petersburg.
Source: D. Card, “The Impact of the Mariel Boatlift on the Miami Labor Market,” Industrial and
Labor Relations Review, January 1990. Based on samples of employed workers in the outgoing rotation
groups of the Current Population Survey in 1979–85. Due to a change in SMSA coding procedures
in 1985, the 1985 sample is based on individuals in outgoing rotation groups for January–June of 1985
only.
33.3 Natural Experiments with Several Control Groups
In a natural experiment, one never can be sure the control group provides a valid counterfactual.
Sometimes it is possible to do additional checks by using two or more control groups. Then you
can construct
\[
DD_1 = (Y_T^1 - Y_T^0) - (Y_{C_1}^1 - Y_{C_1}^0)
\]
\[
DD_2 = (Y_T^1 - Y_T^0) - (Y_{C_2}^1 - Y_{C_2}^0)
\]
\[
DD_3 = (Y_{C_2}^1 - Y_{C_2}^0) - (Y_{C_1}^1 - Y_{C_1}^0)
\]
where $C_1$ refers to control group 1, and $C_2$ refers to control group 2. Ideally it will be the case that
$DD_1 = DD_2$, or equivalently, $DD_3 = 0$ (note that $DD_3 = DD_1 - DD_2$).
33.3.1 The New Jersey Minimum Wage
In April 1992, the minimum wage rose from $4.25 to $5.05 per hour in the state of NJ. Elsewhere,
it remained $4.25. The statute that raised the minimum wage had been passed in fall of the year
before, and, in anticipation, Alan Krueger and I developed a survey of fast food restaurants in NJ
and PA. We surveyed a set of about 400 restaurants first in February–March of 1992, (just before
the increase), and again in late fall. We were extremely careful to track down all the restaurants
that were surveyed in the first round. The treatment group consisted of restaurants in NJ whose
starting wages were less than $5.00 per hour prior to the increase. There were two control groups:
restaurants in PA, and restaurants in NJ that already were paying relatively high wages ($5.00
or more per hour prior to the increase). Table 9 shows the comparisons of employment growth
between groups.
Table 9: Average Employment Per Store Before and After the Rise in the NJ Minimum Wage
33.4 The Discontinuity Research Design
Sometimes one cannot find a good natural experiment; it is nonetheless possible to find a good
counterfactual by looking at treatments that affect some groups but not other, extremely similar
groups. A good example is Medicare. When individuals who have worked for at least 10 years turn
65, they become eligible for “free” health insurance. (One also is eligible if one’s spouse worked 10
years.) This age limit suggests that we compare individuals who are just a few months younger
than 65, with those who are a few months older. Figure 33.3 shows the fractions of people with
health insurance, by age (measured in quarters). The plots are for two groups: (relatively) more
educated whites (over 12 years of education), and less educated minorities (blacks and Hispanics
with less than 12 years of education). The idea of the discontinuity design is that the rule that
grants free insurance to those who reach their 65th birthday creates an experiment: we think of
those just over 65 as the treatment group, and those just under 65 as the control group. There
are some potential problems with this idea, depending on the application:
• It may be that other factors, apart from the primary treatment, also change at the same point
in time. So it is important to check very carefully that these factors are very similar between
groups.
• There may be an age trend in the outcome of interest, so that even without treatment,
individuals who are a little over 65 tend to be a little different from those under 65 in a
certain respect. This can be checked by looking at the age profile of the outcome of interest.
• If individuals know they soon will be eligible for Medicare, they may act differently when they
are just under 65 from the way they would if there were no such rule.
[Figure 33.3. Fraction of group insured against age (55 to 75), with actual and predicted series for
whites with high education, overall, and minorities with low education.]
Figure 33.3: Health insurance coverage rates by age, based on 1992–2001 data from NHIS.
[Figure 33.4. Top panel: “Percentage Who Did Not Get Medical Care Last Year for Cost Reasons,”
percentage against age (55 to 75), with series for whites with high education, overall, and minorities
with low education. Bottom panel: “Florida Outpatient Data, 1997−2002,” log of the number of
cataract surgeries against age (55 to 75), with series for White, Hispanic, and Black.]
Figure 33.4: Plots showing the fractions of individuals in three demographic groups who
report that they did not receive medical care in the last year because they could not afford
it, and the number of cataract surgeries by age in Florida. You can see the discontinuities
in the cataract data.
