
EconS 501: ADVANCED MICROECONOMIC THEORY I

LECTURE NOTES

Felix Munoz-Garcia [1]

School of Economic Sciences
Washington State University



This document contains a set of partial lecture notes that are intended to serve as a starting
point when coming to class, so every student can complement them with additional examples,
exercises and applications discussed in class. (Do not quote).


[1] 103G Hulbert Hall, School of Economic Sciences, Washington State University. Pullman, WA 99164-6210,
fmunoz@wsu.edu. Tel. 509-335-8402.

Chapter 1 Preferences and Utility
Preference and Choice
We begin our analysis of individual decision-making in an abstract setting. We first specify a set of
possible alternatives (denoted by X) for a particular decision maker. This set might include the
consumption bundles that an individual is considering, the career paths that a student is
considering, or any general list of alternatives. Given this set, we approach the decision-making
process in two different ways: first using the preference-based approach, and second using the
choice-based approach. The first approach analyzes how the individual would use his preferences to
choose an element (or elements) from the set of alternatives X. We will then impose some rationality
assumptions on the individual's preferences. The second approach analyzes, instead, the actual choices
the individual makes when he is called to choose an element (or elements) from the set of possible
alternatives. As we did for the preference-based approach, we will also impose some consistency
conditions on the choices that the individual makes. Each approach has its own advantages. For instance,
the choice-based approach is based on observables (the actual choices made by the individual
decision-maker), while the preference-based approach is based on unobservables (the individual's
preferences). [1] On the other hand, the preference-based approach is more tractable than the
choice-based approach, especially when the set of alternatives X contains many elements (which is usually
the case in individual decision-making problems). [2] After describing both approaches, and the
assumptions that we impose on each, we want to understand the relationship (and potential equivalence)
between them. Hence, we will examine under which conditions rational preferences imply consistent
choice behavior, and under which conditions the opposite relationship holds.

Preference-based approach
Let us start with the preference-based approach. [3] In this regard, we understand preferences as
attitudes of the decision-maker towards the set of alternatives X. Preferences hence should specify the
attitudes of the decision-maker towards each pair of alternatives. These attitudes are obtained by
presenting a questionnaire Q to the individual. In particular, for all elements x and y that belong to
the set of alternatives X, this questionnaire asks: "How do you compare elements x and y? Check one and
only one box."
I prefer x to y (which we write as $x \succ y$), or
I prefer y to x (which we write as $y \succ x$), or
I am indifferent (which we write as $x \sim y$).

[1] This approach could in principle allow for more general behavioral motives than the preference-based
approach. However, as we will see, this is only in principle, since the preference-based approach will
also allow for very general individual preferences.
[2] This reason explains why the preference-based approach is explained in more detail in most
intermediate microeconomics textbooks.
[3] We will be using Rubinstein (lecture one) and MWG (Ch. 1B).

Note that we are asking the individual decision-maker to check only one box. This is related to the
completeness assumption on individual preferences. [4] In particular, we say that a preference relation
is complete if, for any two alternatives x and y that belong to the set of alternatives X, either x is
at least as good as y, or y is at least as good as x, or both (which implies that the individual
decision-maker is indifferent between x and y). This implies that the individual is capable of
comparing any pair of alternatives that we present to him. This might be a relatively strong assumption
if we think about goods that we haven't consumed in the past or goods that we haven't even seen before.
Think, for instance, about the last time you were in a new ethnic restaurant in which the descriptions
in the menu did not help you decide what to order. This assumption hence considers that the individual
decision-maker has had enough time to compare all alternatives, and that he is ready to express his
preference for one of them (or indifference between two alternatives) when we ask him to compare any
two alternatives x and y.
Remark: note, however, that not all binary relations satisfy completeness. Indeed, the binary
relation "is the brother of" does not hold for all pairs of elements (persons) in the set of available
alternatives (set X in this case could be a given group of people). If we select John and Bob from
this group, we might observe that neither John is the brother of Bob nor Bob is the brother of
John; i.e., they are not related. That is, not all pairs of alternatives are comparable according to
this binary relation. Hence, this binary relation does not satisfy completeness. Similarly, the
binary relation "is the father of" doesn't satisfy completeness since, from a group of people,
we can select two persons that are not related.
Let us now turn to weak preferences. In order to learn the weak preferences of an individual we present
a questionnaire R to him as follows: for all alternatives x and y in the set of alternatives X (where x
and y are not necessarily distinct), [5] is alternative x at least as preferred as y?
Yes, which we write as $x \succsim y$.
No, which we write as $y \succsim x$.

The respondents therefore must answer yes, no, or both. [6] We are now ready to define what we mean by a
rational preference relation. We say that a preference relation $\succsim$ is rational if it possesses
the following two properties:

[4] Note also that we do not allow the individual to add a new box in which he writes "I love x and y". In
other words, we do not allow him to specify the intensity of his preferences over two alternatives.
[5] Note that we do not assume that alternatives x and y are different. In the case that they coincide, the
definition of completeness becomes the reflexivity assumption. We discuss this assumption below, but at
this stage, we can understand the reflexivity assumption as a condition on the preference relation
guaranteeing that every alternative x is weakly preferred to at least one alternative: itself.
[6] Note that this refers to the assumption of completeness again, since we ask the individual to be able
to compare any pair of alternatives, where now this comparison is done using the weak preference symbol
rather than the strict preference symbol.

Completeness: For any pair of alternatives x and y in the set of alternatives X, either $x \succsim y$,
or $y \succsim x$, or both ($x \sim y$).
Transitivity: For any three alternatives x, y and z in the set of alternatives X, if $x \succsim y$ and
$y \succsim z$, then it must be that $x \succsim z$.
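
On a finite set of alternatives, the two properties above can be checked mechanically. The following is a minimal sketch, assuming the preference relation is stored as a set of ordered pairs (x, y) meaning "x is at least as good as y". The function names, the fruit ranking and the group of people are hypothetical and only serve as illustration.

```python
from itertools import product

def is_complete(X, R):
    """True if for every pair (x, y) in X we have x R y, or y R x, or both.
    Because x == y is allowed, this also checks reflexivity."""
    return all((x, y) in R or (y, x) in R for x, y in product(X, repeat=2))

def is_transitive(X, R):
    """True if x R y and y R z always imply x R z."""
    return all((x, z) in R
               for x, y, z in product(X, repeat=3)
               if (x, y) in R and (y, z) in R)

def is_rational(X, R):
    return is_complete(X, R) and is_transitive(X, R)

# Hypothetical weak preference: apple over banana over orange.
fruits = {"apple", "banana", "orange"}
rank = {"apple": 3, "banana": 2, "orange": 1}
weak_pref = {(x, y) for x, y in product(fruits, repeat=2) if rank[x] >= rank[y]}

# "Is the brother of" in a hypothetical group where only John and Tom are brothers.
people = {"John", "Bob", "Tom"}
brother_of = {("John", "Tom"), ("Tom", "John")}

print(is_rational(fruits, weak_pref))    # True
print(is_complete(people, brother_of))   # False: John and Bob are not comparable
```

The second call fails completeness for the same reason given in the remark on binary relations: some pairs of persons are simply not comparable.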

The assumption of transitivity is often understood as the requirement that individual preferences should
not cycle. In order to understand this point, let us consider an example in which an individual's
preferences do not satisfy transitivity. James weakly prefers an apple to a banana, and he weakly prefers
a banana to an orange. However, he prefers an orange to an apple. (Note that according to transitivity,
he should have weakly preferred an apple to an orange.) What is the problem associated with this
intransitive preference relation? James would be wiped out of the market. Indeed, a businessman could
approach James (when James owns an orange) and offer to exchange it for a banana if James pays one
dollar. James will probably accept the deal since he prefers a banana to an orange. Then the businessman
could approach James again and offer him an apple in exchange for the banana and another dollar,
something James will also accept, since he prefers an apple to a banana. Finally, the businessman
could approach James again, offering him an orange for the apple he now owns. Since James's preferences
are intransitive (and therefore he prefers an orange to an apple), he would accept this deal, paying
another dollar. However, this makes James return to his original position, owning an orange, but having
spent three dollars in the process. Of course, this cycle could be repeated ad infinitum, extracting all
of James's wealth.
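
The money-pump argument can be traced step by step. The sketch below is only illustrative: it assumes that James accepts a swap (and pays the fee) whenever he strictly prefers the offered item to the one he holds, and the function name `money_pump` and the one-dollar fee parameter are our own choices.

```python
def money_pump(strict_pref, start, offers, fee=1, rounds=1):
    """Offer the items in `offers` in order, `rounds` times.
    James swaps and pays `fee` whenever (offered, holding) is in strict_pref."""
    holding, paid = start, 0
    for _ in range(rounds):
        for item in offers:
            if (item, holding) in strict_pref:
                holding = item
                paid += fee
    return holding, paid

offers = ["banana", "apple", "orange"]

# Cyclic preferences from the example: banana > orange, apple > banana, orange > apple.
cyclic = {("banana", "orange"), ("apple", "banana"), ("orange", "apple")}
# Transitive alternative: apple > banana > orange, hence apple > orange.
transitive = {("banana", "orange"), ("apple", "banana"), ("apple", "orange")}

print(money_pump(cyclic, "orange", offers, rounds=1))      # ('orange', 3)
print(money_pump(cyclic, "orange", offers, rounds=5))      # ('orange', 15)
print(money_pump(transitive, "orange", offers, rounds=5))  # ('apple', 2)
```

With cyclic preferences the amount paid grows with every round, while with transitive preferences James stops trading once he holds his best alternative.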
Despite the previous argument about why we shouldn't observe individual decision-makers with
intransitive preference relations, there are situations in which intransitivities might arise:
First example. Comparing elements that are too close to be distinguishable.
When two alternatives are extremely similar we are often unable to state which of them we prefer.
Consider the following example. Take the set of alternatives X to be the real numbers, e.g., the size of
a piece of pie. An individual states that he weakly prefers alternative x to y if x>=y-1 (that is,
x+1>=y), but he is indifferent between x and y if the two alternatives are very close together, i.e.,
|x-y|<1. Intuitively, he strictly prefers x to y only when alternative x is larger than y by at least
one unit. If the difference between the two alternatives is smaller than one, he cannot tell them apart,
and the individual is indifferent between them. Then,
Alternative 1.5 is indifferent to 0.8 since 1.5-0.8=0.7<1, and
Alternative 0.8 is indifferent to 0.3 since 0.8-0.3=0.5<1.
Therefore, by transitivity we would have that 1.5 is indifferent to 0.3, but in fact 1.5 is strictly
preferred to 0.3, since the former is larger than the latter by more than one unit (1.5-0.3=1.2>1). This
shows the presence of an intransitive preference relation. [7]
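
The three comparisons in this example can be reproduced numerically. This is a small sketch under the rule stated above (indifference when the gap is below one unit, strict preference when the gap is at least one unit); the function names are hypothetical.

```python
def indifferent(x, y):
    """The individual cannot tell x and y apart when they differ by less than one unit."""
    return abs(x - y) < 1

def strictly_prefers(x, y):
    """x is strictly preferred to y when x exceeds y by at least one unit."""
    return x - y >= 1

print(indifferent(1.5, 0.8))        # True
print(indifferent(0.8, 0.3))        # True
print(indifferent(1.5, 0.3))        # False: transitivity of indifference fails
print(strictly_prefers(1.5, 0.3))   # True
```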



[7] Note that this example could be applicable to milligrams of sugar in your coffee (very difficult to
distinguish) or to similar shades of gray paint on the wall in a room. You might not be able to
distinguish one milligram more of sugar in your coffee (a slightly darker gray color on your office
walls, respectively), but you can probably detect when your coffee is becoming too sweet (or your office
is almost black!).

Second example. Framing effects. In certain cases transitivity might be violated because of the way in
which alternatives are presented to the individual decision-maker, which is referred to as "framing
effects". Let us consider an example from Rubinstein, where he showed the following holiday packages to
his Master's students, asking each student: "Which holiday package do you prefer?"
a. A weekend in Paris for 574 at a four star hotel.
b. A weekend in Paris at the four star hotel for 574.
c. A weekend in Rome at the five star hotel for 612.
Alternatives a and b are, of course, the same. This was indeed detected by most of the students, since
they stated that they were indifferent between alternatives a and b. Moreover, they strictly preferred
alternative b to c. By transitivity, hence, we should expect that students who gave the previous
responses would strictly prefer alternative a to c when asked to compare options a and c. However, this
didn't happen. Indeed, more than 50% of the students responded that they strictly preferred alternative
c to a, showing an intransitive preference relation, merely induced by the way in which the options were
presented (framed) to the students.

Third example. Aggregation of considerations. In some cases several individual preferences must be
aggregated into only one. In these situations we might find that the resulting preference relation
violates transitivity. Let us consider the following example. The set of possible alternatives X contains
three universities where you were admitted: MIT, WSU, and your home university. When considering which
university to attend you might compare them according to different criteria. First, if you only consider
academic prestige, a reasonable comparison would be:
$\succ_1$: MIT $\succ_1$ WSU $\succ_1$ Home Univ
Second, considering the city size or congestion, your comparison could be:
$\succ_2$: WSU $\succ_2$ Home Univ $\succ_2$ MIT
Finally, considering the proximity of the university to your family and friends, a reasonable comparison
would be:
$\succ_3$: Home Univ $\succ_3$ MIT $\succ_3$ WSU
We must now aggregate all of these considerations (for example, using majority rule). We do so by
making all possible pair-wise comparisons and checking, for each pair, which university wins according
to most of the criteria described above. When comparing MIT versus WSU, the former wins according to
criteria 1 and 3. When comparing WSU versus your home university, the former beats the latter according
to criteria 1 and 2. Finally, comparing your home university with MIT, the former wins against the latter
according to criteria 2 and 3. However, this resulting preference relation violates transitivity. [8] A
similar argument can be used for the aggregation of individual preferences in group decision-making,
where every person in the group has a different (transitive) preference relation but the group
preferences (aggregated from these individual preferences in order to have a ranking of alternatives
that politicians can use to take decisions affecting the welfare of the entire group) are not
necessarily transitive. [9]
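
The pair-wise majority comparisons in this example can be computed directly from the three rankings. The sketch below is one possible way to organize the calculation; the criterion labels and function names are our own.

```python
from itertools import combinations

# Each list orders the universities from best to worst under one criterion.
rankings = {
    "prestige":  ["MIT", "WSU", "Home Univ"],
    "city size": ["WSU", "Home Univ", "MIT"],
    "proximity": ["Home Univ", "MIT", "WSU"],
}
universities = ["MIT", "WSU", "Home Univ"]

def majority_relation(alternatives, rankings):
    """Set of (winner, loser) pairs under pair-wise majority rule."""
    relation = set()
    for a, b in combinations(alternatives, 2):
        votes_for_a = sum(r.index(a) < r.index(b) for r in rankings.values())
        if votes_for_a > len(rankings) / 2:
            relation.add((a, b))
        else:
            relation.add((b, a))
    return relation

def is_transitive(relation):
    """True if a beats b and b beats c always imply a beats c."""
    return all((a, d) in relation
               for a, b in relation for c, d in relation if b == c)

aggregate = majority_relation(universities, rankings)
print(sorted(aggregate))
print(is_transitive(aggregate))   # False: the aggregate relation cycles
```

Each individual ranking is transitive, yet the aggregate relation contains MIT over WSU, WSU over Home Univ, and Home Univ over MIT.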


Fourth example. Intransitive preferences because there is a change in the underlying preferences. This is
very common in individuals' preferences over goods that create a strong dependency or that become
addictive. For instance, when an individual starts smoking, his preferences over cigarettes might be: one
cigarette is weakly preferred to no smoking, and no smoking is weakly preferred to smoking heavily.
Hence, according to transitivity, he should prefer one cigarette to smoking heavily. However, once this
individual has been smoking for several years, his preferences over cigarettes could have changed to:
smoking heavily is weakly preferred to one cigarette, and one cigarette is weakly preferred to no smoking
at all. According to this new preference relation, and using transitivity, we can conclude that this
individual now prefers smoking heavily to having only one cigarette. But this conclusion contradicts the
individual's past preferences when he started to smoke. [10]


Utility function
Once we have defined the main assumptions behind a rational preference relation, we are ready to define
a utility function. A function $u: X \to \mathbb{R}$ from the set of alternatives to the set of real
numbers is a utility function representing a preference relation $\succsim$ if, for every pair of
alternatives x and y that belong to X,
$x \succsim y$ is equivalent to $u(x) \geq u(y)$.

Let us emphasize two main points from this definition. First, only the ranking of alternatives matters.
Indeed, a utility function u with u(x)=14 and u(y)=10 provides the same ranking of alternatives x
and y as a second utility function u' with u'(x)=2000 and u'(y)=3, since both utility functions rank
alternative x above alternative y. Hence, the individual does not care about cardinality (the number that
the utility function associates with each alternative) but instead cares only about ordinality (the
ranking of utility values among alternatives). Second, we can apply any strictly increasing function
f(.) to the utility function u(x), obtaining v(x)=f(u(x)). Importantly, the values associated with this
new function keep the ranking of alternatives intact, and therefore the new function still represents the
same preference relation. [11]

[8] You should check this by noticing that we created a cycle, since MIT beats WSU (criteria 1 and 3), WSU
beats Home Univ (criteria 1 and 2), and Home Univ beats MIT (criteria 2 and 3):
$\text{MIT} \succ \text{WSU} \succ \text{Home Univ} \succ \text{MIT}$
[9] This is the so-called Condorcet paradox, extensively studied in social choice problems.
[10] This has been questioned as a real form of intransitivity, since the individual decision-maker could
be regarded as a different individual according to the period of time in which he states his preferences
over alternatives. Hence we will only refer to the first three types of intransitivities.
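
The ordinal nature of utility can be verified numerically. In the sketch below, u(x)=14 and u(y)=10 are the values used above, while the third alternative z with u(z)=12 and the helper names `ranking` and `transform` are hypothetical additions for illustration.

```python
import math

# u(x)=14 and u(y)=10 come from the text; u(z)=12 is a hypothetical third alternative.
u = {"x": 14, "y": 10, "z": 12}

def ranking(utility):
    """Alternatives ordered from highest to lowest utility."""
    return sorted(utility, key=utility.get, reverse=True)

def transform(utility, f):
    """Build v(a) = f(u(a)) for every alternative a."""
    return {a: f(value) for a, value in utility.items()}

increasing = {
    "3u": lambda t: 3 * t,
    "5u+8": lambda t: 5 * t + 8,
    "ln(u)": math.log,   # strictly increasing on positive utility values
}

for name, f in increasing.items():
    print(name, ranking(transform(u, f)) == ranking(u))   # True for all three

# A strictly decreasing transformation reverses the ranking instead.
print(ranking(transform(u, lambda t: -t)))   # ['y', 'z', 'x']
```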

Note on reflexivity. Note that we didn't define reflexivity in our previous discussion. In particular, a
preference relation satisfies reflexivity if, for any alternative x in X, we have that:
1. $x \sim x$, so that any bundle is indifferent to itself,
2. $x \succsim x$, so that any bundle is preferred or indifferent to itself, and
3. $x \not\succ x$, so that no bundle is strictly preferred to itself.
This assumption ensures that any bundle belongs to at least one indifference set [12], namely the set
containing itself if nothing else. Note, however, that reflexivity is implied by completeness. Indeed,
if we replace alternative y with x in the definition of completeness, we obtain the assumption of
reflexivity.

Choice based approach
In the choice-based approach we focus on the actual choices made by the individual, rather than on the
process of introspection by which the individual discovers his own preferences over different
alternatives. In the choice-based approach we use the so-called choice structure, which contains two
elements:
1. $\mathcal{B}$ is a family of nonempty subsets of X, so that every element of $\mathcal{B}$ is a set
$B \subset X$. Let us provide some examples of sets B.
a. In consumer theory, set B can be understood as the set of all the affordable
bundles for a consumer, given his wealth and the market prices. (We refer to this set of
affordable bundles as the consumer's budget set.) Note that the budget set can be defined
as a subset of the real numbers. [13]
b. B can also be a particular list of all the universities where you were admitted, among all
universities in the scope of your imagination X, i.e., $B \subset X$.
2. c(.) is a choice rule that selects, for each budget set B, a subset of elements of B, with the
interpretation that c(B) are the chosen elements from B.

[11] For instance, v(x)=3u(x), v(x)=5u(x)+8, etc. are all examples of strictly increasing functions
applied to the original utility function u(x) that represent the same preference relation as u(x), since
all of them maintain the same ranking of utility values associated with each alternative x in X.
[12] Below we provide a more detailed description of indifference sets, but note that they can be
understood as the set of alternatives over which the consumer is indifferent. Using an example from
consumer theory, recall that indifference sets are graphically represented using indifference curves,
reflecting the set of bundles for which the consumer reaches the same utility level.
[13] In the case of consuming only two goods, the set of affordable bundles B (budget set) becomes a subset
of $\mathbb{R}^2_+$, i.e., a subset of the positive quadrant, which represents all possible bundles.

a. Continuing with our example of consumer theory, c(B) would be the bundle or bundles that the
individual chooses to buy, among all bundles he can afford in the budget set B; and
b. In the example of the universities you were admitted to, c(B) would contain the
university that you choose to attend.
Note that c(B) might contain a single element, in which case the choice rule is a function, or it
might contain more than one element, in which case the choice rule is a correspondence. [14]


Examples. Let us now see two examples of choice structures. Define the set of alternatives as X={x,y,z},
and consider two different budget sets (both of them being subsets of the set of alternatives X), budget
set B1={x,y} and budget set B2={x,y,z}.
In choice structure one, the individual chooses element x, and only x, regardless of which budget set is
presented to him. That is, c1({x,y})={x} and c1({x,y,z})={x}. In choice structure two, the individual
still selects only alternative x when he is confronted with budget set B1, i.e., c2({x,y})={x}. However,
when the budget set is enlarged to contain alternative z as well, as it does in B2, his choice switches
to only alternative y, i.e., c2({x,y,z})={y}. (We will comment on the consistency of this choice rule
below.)

Consistency on choices: the Weak Axiom of Revealed Preference (WARP)
Paralleling the rationality assumption on the preference-based approach, we now impose a consistency
requirement on the choice-based approach. Specifically, we consider that the actual choices of an
individual are consistent if they satisfy the following weak axiom of revealed preference (WARP).
We say that the choice structure $(\mathcal{B}, c(\cdot))$ satisfies the WARP if the following holds:
If, for some budget set $B \in \mathcal{B}$ with $x, y \in B$, we have that element x is chosen,
$x \in c(B)$,
then, for any other budget set $B' \in \mathcal{B}$ where alternatives x and y are also available,
$x, y \in B'$, and where alternative y is chosen, $y \in c(B')$, we must have that alternative x is
chosen as well, $x \in c(B')$.

Example. When the individual decision-maker faces budget set B={x,y}, he chooses only alternative x.
When he faces an enlarged budget set B' (which contains the same alternatives as budget set B, x and y,
but also alternative z), his admissible choices according to the WARP are:
1. x, which can be rationalized because alternative x is still the best alternative even after including
z as an additional alternative.
2. z, which can be explained because the new option z is better than the previous alternatives x and
y.
3. x and z, which can be justified because new option z is very similar to alternative x and as a
consequence the decision-maker chooses both.
Note that the individual decision-maker cannot select alternative y alone if his choice rule satisfies
WARP. Indeed, this alternative was available under budget set B but the individual did not select it when
B was presented to him. Hence, the fact that new options are now available should not cause an
alternative that was affordable but not selected under the old budget set to be selected under the newly
enlarged budget set B'. For the same reason, the individual cannot choose {x,y} when facing budget set
B', since alternative y is contained in this choice. As suggested in the previous argument, y was not
selected under budget set B, and therefore cannot be part of the choices made by the individual under
budget set B'.

[14] Informally, we generally understand a function as a mathematical mapping that assigns a single
element in the range to each element in the domain, while a correspondence is understood as a mapping
that can assign more than one element in the range to each element in the domain.
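
The list of admissible choices in this example can be recovered by brute force. The following is a minimal sketch, assuming a choice structure is stored as a dictionary from budget sets to chosen subsets; the function names are hypothetical.

```python
from itertools import combinations

def satisfies_warp(choice):
    """choice maps each budget set (frozenset) to the chosen subset (frozenset).
    WARP: if x is chosen from B, and y is chosen from B' where x and y are both
    available in B and in B', then x must be chosen from B' as well."""
    for B, chosen_B in choice.items():
        for B_other, chosen_other in choice.items():
            for x in chosen_B & B_other:        # x chosen from B, available in B'
                for y in chosen_other & B:      # y chosen from B', available in B
                    if x not in chosen_other:
                        return False
    return True

def nonempty_subsets(s):
    items = sorted(s)
    return [frozenset(c) for r in range(1, len(items) + 1)
            for c in combinations(items, r)]

B = frozenset({"x", "y"})
B_prime = frozenset({"x", "y", "z"})

# Fix c(B) = {x} and try every possible choice from the enlarged budget set B'.
admissible = [set(C) for C in nonempty_subsets(B_prime)
              if satisfies_warp({B: frozenset({"x"}), B_prime: C})]
print(admissible)   # [{'x'}, {'z'}, {'x', 'z'}]
```

Every candidate choice from B' that contains y is rejected, which matches the argument above.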
The following figure illustrates a choice rule that satisfies the WARP. Indeed, the individual
decision-maker selects alternatives x and y, both when facing budget set B and when facing B'.

Figure #1.1
In contrast, the figure below represents a choice rule violating WARP, since the individual chooses only x
when facing budget set B, but switches to y when facing budget set B', despite the fact that both
alternatives x and y were available under both budget sets. (Note that this choice rule is similar to
choice rule 2 in the example of the previous section, where the individual decision-maker chooses only x
when confronted with budget set B, and only y when facing budget set B', where both x and y belong to
budget sets B and B'. For this reason, we can conclude that choice rule 2 above violates the WARP.)


Figure #1.2
We can now construct the preferences that the individual reveals in his actual choices when he is
asked to choose an element (or elements) from different budget sets.
A. First, if there is some budget set B for which the individual chooses x, where alternatives x and y
belong to B, then we can say that alternative x is revealed at least as good as alternative y, and
denote it as $x \succsim^* y$.
B. Second, if there is some budget set B for which the individual chooses x but he does not select y,
where alternatives x and y belong to B, then we can say that alternative x is revealed preferred to
alternative y, and denote it as $x \succ^* y$.
[Note that when $x \succsim^* y$ in the above point, the individual decision-maker is allowed to choose
both alternatives x and y. However, when $x \succ^* y$, the individual is only allowed to choose x.]
Let $C^*(B, \succsim)$ be the set of optimal choices generated by the preference relation $\succsim$ when
facing a budget set B. Using this notation, we can restate the WARP as follows:
If alternative x is revealed at least as good as y, then y cannot be revealed preferred to x, i.e., if
$x \succsim^* y$, then we cannot have $y \succ^* x$.
We finally examine the relationship between the preference-based approach and the choice-based
approach. In particular, we want to investigate in which cases a rational preference relation implies
that the choice structure satisfies the WARP, and under which conditions the opposite relationship holds.
Let us next check that a rational preference relation implies that the choice structure satisfies the
WARP.
Proof:
- First, suppose that for some budget set $B \in \mathcal{B}$, we have that $x, y \in B$ and
$x \in C^*(B, \succsim)$. Then
$x \in C^*(B, \succsim) \Rightarrow x \succsim y$, for all $y \in B$.
- In order to check WARP, assume some other budget set $B' \in \mathcal{B}$ with $x, y \in B'$, and
$y \in C^*(B', \succsim)$. Then
$y \in C^*(B', \succsim) \Rightarrow y \succsim z$, for all $z \in B'$.
- Combining the conclusions from the previous two points, $x \succsim y$ and $y \succsim z$, we can apply
transitivity (because the preference relation is rational), and we obtain $x \succsim z$ for all
$z \in B'$. Then $x \in C^*(B', \succsim)$, and we find that $x, y \in C^*(B', \succsim)$, which proves
that WARP is satisfied.
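
The result can also be illustrated on a small finite example. The sketch below assumes the rational preference relation is represented by a hypothetical utility assignment, builds the optimal choices for every nonempty budget set, and then checks the restated WARP through the revealed preference relations. All names and utility values are illustrative.

```python
from itertools import combinations

def optimal_choices(B, u):
    """Elements of B that are at least as good as every element of B,
    for the preference relation represented by the utility values in u."""
    best = max(u[a] for a in B)
    return frozenset(a for a in B if u[a] == best)

def revealed_weak(choice):
    """Pairs (x, y): x is chosen from some budget set that contains y."""
    return {(x, y) for B, C in choice.items() for x in C for y in B}

def revealed_strict(choice):
    """Pairs (x, y): x is chosen and y is not, from some budget set containing both."""
    return {(x, y) for B, C in choice.items() for x in C for y in B - C}

def satisfies_warp(choice):
    """If x is revealed at least as good as y, y cannot be revealed preferred to x."""
    strict = revealed_strict(choice)
    return not any((y, x) in strict for x, y in revealed_weak(choice))

X = ["x", "y", "z"]
u = {"x": 3, "y": 3, "z": 1}   # hypothetical: x indifferent to y, both better than z
budgets = [frozenset(c) for r in range(1, len(X) + 1) for c in combinations(X, r)]
rational_choice = {B: optimal_choices(B, u) for B in budgets}
print(satisfies_warp(rational_choice))   # True

# Choice structure two from the earlier example: {x} from {x,y}, {y} from {x,y,z}.
structure_two = {frozenset("xy"): frozenset("x"), frozenset("xyz"): frozenset("y")}
print(satisfies_warp(structure_two))     # False
```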

The opposite relationship (where the choice structure satisfying the WARP implies a rational preference
relation) holds if the family of budget sets $\mathcal{B}$ includes all subsets of X of up to three
elements. MWG describes a proof from Arrow about this result.

Consumption sets
In this lecture we start considering the set of feasible bundles for the consumer. One way to define a
consumption set is by a set of prices, one for each possible good, and a budget, in which case it
collects the affordable bundles. Alternatively, a consumption set could be defined in a model by some
other set of restrictions on the set of possible consumption bundles. More generally, the consumption
set is a subset of the commodity space $\mathbb{R}^L$, denoted by $X \subset \mathbb{R}^L$, whose elements
are the consumption bundles that the individual can conceivably consume given the physical constraints
imposed by his environment. For example, if consumer i can consume nonnegative quantities of all goods,
it is standard to define his consumption set as $X_i = \mathbb{R}^L_+$, where L is the number of goods.
Normally, if the agent is endowed with a set of goods, the endowment is in the consumption set.
Let us denote a commodity bundle x as a vector of L components. For generality, at this stage we allow
each component to be positive or negative.

We can impose physical or economic constraints on the consumption set. The physical constraints
considered here include legal constraints (such as the maximum amount of consumption of a particular
good, the maximum amount of working hours, etc.). In contrast, economic constraints on the consumption
set emerge from market prices and the individual's income, which determine the set of affordable bundles
for the individual.



Physical constraints
Let us first have a look at the effect of imposing physical constraints on a consumption set. For
simplicity, we assume that the consumption set is defined on the set of positive real numbers. The
following figure illustrates a particular physical constraint in the labor market. Specifically, a worker
cannot work more than 24 hours a day, and therefore his maximum amount of leisure is 24 hours. If a
law establishes a maximum working day of 16 hours, his consumption set would shrink, and would
be represented by the area from 8 to 24 hours of leisure per day.

Figure #1.3
The following figure indicates the presence of indivisibilities in the consumption of good two. Indeed,
this good can only be consumed in integer amounts, while good one can be consumed in arbitrarily small
amounts. Therefore, the consumption set is given by the union of different horizontal lines, each of
them representing a particular amount of the indivisible good.

Figure #1.4
The next figure represents the consumption of two goods that cannot be enjoyed simultaneously in the
same consumption bundle. In particular, good one denotes the consumption of bread in Seattle at noon,
while good two denotes the consumption of bread in New York City at noon on the same day. Therefore,
the consumption set coincides with the two axes. Indeed, the consumer can choose to consume any
amount of bread in Seattle, or any amount in New York City, but not a combination of the two.


Figure #1.5
Finally, the following figure illustrates the presence of a minimum amount of bread that guarantees
survival. Specifically, the consumer must eat at least four slices of either type of bread (white or brown)
in order to avoid starvation. He can do so by consuming four slices of one type of bread, or a combination
of the two types.

Figure #1.6
We can now define convexity in consumption sets. We say that a consumption set X is convex if, for any
two consumption bundles x and x' in X, the bundle
$x'' = \alpha x + (1-\alpha) x'$, for all $\alpha \in (0,1)$,
is also an element of the consumption set X. Intuitively, a consumption set is convex if for any two
bundles that belong to the set we can construct a straight line connecting them that lies completely in
the set. As practice, let us check if some of the previous consumption sets satisfy this definition of
convexity. First, note that the consumption set representing the presence of indivisibilities in
consumption, where the individual can only consume integer amounts of good two, is not convex since
the linear combination (straight line) between any two bundles that belong to the set does not
necessarily lie in the set. Similarly, the consumption set with consumption of bread in Seattle and New
York City at noon on the same day does not satisfy convexity either. Indeed, the linear combination of
two bundles located on different axes lies outside the consumption set. [15] Note that by aggregating
data, such as considering the consumption of bread in Seattle (or in New York City) during an entire
month, we would be able to convexify the consumption set, since individuals would be able to consume
both goods during that time span.

[15] As a remark, note that we are not considering the extremes of the straight line that connects bundles
x and x' in the definition of convexity, since we impose $\alpha > 0$ and $\alpha < 1$ strictly.
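
The two failures of convexity can be checked with a few numerical combinations. This is only a sketch: testing a handful of values of alpha can reveal a violation but cannot prove that a set is convex, and the membership functions and sample bundles below are hypothetical.

```python
def on_axes(bundle):
    """Bread in Seattle (good one) or in New York City (good two), never both."""
    x1, x2 = bundle
    return x1 >= 0 and x2 >= 0 and (x1 == 0 or x2 == 0)

def integer_good_two(bundle):
    """Good two can only be consumed in integer amounts."""
    x1, x2 = bundle
    return x1 >= 0 and x2 >= 0 and float(x2).is_integer()

def combination(x, x_prime, alpha):
    """The bundle alpha * x + (1 - alpha) * x'."""
    return tuple(alpha * a + (1 - alpha) * b for a, b in zip(x, x_prime))

def segment_stays_inside(member, x, x_prime, alphas=(0.25, 0.5, 0.75)):
    """Check a few interior points of the segment between x and x'."""
    return all(member(combination(x, x_prime, a)) for a in alphas)

print(segment_stays_inside(on_axes, (2, 0), (0, 2)))            # False: (1, 1) is not on an axis
print(segment_stays_inside(on_axes, (1, 0), (3, 0)))            # True: both bundles on the same axis
print(segment_stays_inside(integer_good_two, (1, 1), (1, 2)))   # False: good two would be 1.5
print(segment_stays_inside(integer_good_two, (0, 1), (4, 1)))   # True: same horizontal line
```

A single pair of bundles whose combination leaves the set is enough to conclude that the set is not convex.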

Economic constraints
Before defining the economic constraints on the individual's consumption set, let us first discuss some
of the assumptions we make on the price vector.
1. We assume that all commodities can be traded in a market, at prices that are publicly observable.
This is the so-called principle of completeness of markets (or universality of markets), since all
goods can be traded. Importantly, note that this assumption rules out the possibility that some
goods cannot be traded, such as pollution when no property rights are clearly defined.
2. Prices are assumed strictly positive for all L goods. We denote this by writing p>>0, i.e., pk>0 for
all goods k. Again, note that some prices could be negative in some circumstances, such as the price of
pollution, since individuals would be willing to pay in order to have less of it. We do not allow
for negative prices in the following chapters, but we return to the possibility of negative prices
when we discuss externalities.
3. Price-taking assumption: a consumer's demand for each good he consumes represents a small
fraction of the total demand for that good. Therefore, his decision on whether or not to buy the
good does not affect market prices. [16]

We are now ready to define the set of affordable bundles for the consumer. In particular bundle x,
describing the amounts purchased of L different goods, is affordable if
1 1 2 2
...
or in vector notation
L L
p x p x p x w
p x w
+ + + s
s

Note that px represents the total cost combined bundle X. at market prices p, while w represents the total
wealth of the consumer.
17
When we define the consumption set to coincide with the nonnegative orthant $\mathbb{R}^L_+$, the set of
feasible (affordable) consumption bundles consists of the elements in the following set:
$$B_{p,w} = \{ x \in \mathbb{R}^L_+ : p \cdot x \le w \}$$
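As a minimal sketch of the definition above, the following Python snippet checks whether a bundle belongs to the budget set. The function name `is_affordable` and the prices, wealth, and bundles are hypothetical values chosen for illustration; they do not come from the notes.

```python
def is_affordable(prices, bundle, wealth):
    """Return True if the bundle lies in the budget set B_{p,w}."""
    # The consumption set is the nonnegative orthant.
    if any(quantity < 0 for quantity in bundle):
        return False
    # Total cost p.x of the bundle at market prices.
    cost = sum(p * q for p, q in zip(prices, bundle))
    return cost <= wealth


# Hypothetical two-good example: p = (2, 4), w = 20.
prices = [2.0, 4.0]
wealth = 20.0
on_budget_line = is_affordable(prices, [4.0, 3.0], wealth)   # cost = 8 + 12 = 20
too_expensive = is_affordable(prices, [5.0, 3.0], wealth)    # cost = 10 + 12 = 22
print(on_budget_line, too_expensive)
```

A bundle on the budget line is affordable because the constraint is a weak inequality.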
Let us next see one example of a set of affordable consumption bundles where, for simplicity, we only
consider two goods.

16
Note that this assumption will not be valid if the consumer possesses monopsony power in his demand for a
particular good. This is the case, for instance, in labor markets where only one employer buys labor services in a
relatively small locality.
17
Note here the usual distinction between wealth and income: wealth refers to all of the resources of the
consumer during a certain time span (which can potentially include his entire lifetime), whereas income
refers to the individual's resources during a single time period.


Figure #1.7
Graphically, the upper boundary of the set of affordable consumption bundles represents the set of
bundles for which the individual entirely exhausts his wealth buying different combinations of goods one
and two, i.e., p1x1+p2x2=w, or in vector notation, px=w. We refer to this upper boundary as the budget
line. Intuitively, if the individual exhausts all his wealth buying only good two (one), the maximum
amount of this good he can afford is w/p2 (w/p1, respectively). Finally, note that the slope of the budget
line is -p1/p2, i.e., the negative of the price ratio.
18
In the case that the consumer can buy more than two
goods, the budget line is usually referred to as the budget hyperplane. The following figure illustrates the
budget hyperplane for the case in which the consumer buys three different goods. Graphically, the
budget hyperplane represents the surface of bundles for which the consumer exhausts his wealth.

Figure #1.8
One important characteristic of the price vector is that it is orthogonal to the budget line. In order to see
this, first note that on the budget line px=w for any x on the budget line. We can then take any other

18
Note that, solving for good two, the equation of the budget line is given by
$x_2 = \frac{w}{p_2} - \frac{p_1}{p_2} x_1$, where w/p2 represents the vertical intercept while
$-p_1/p_2$ is the (negative) slope.

bundle $\bar{x}$ which also lies on the budget line, so that $p \cdot \bar{x} = w$. We can now combine these
results, finding that $p \cdot \bar{x} = p \cdot x = w$, or $p \cdot (x - \bar{x}) = 0$, or simply
$$p \cdot \Delta x = 0, \qquad \text{where } \Delta x \equiv x - \bar{x}.$$
Since this result is valid for any two bundles on the budget line, the price vector must be perpendicular
to every displacement $\Delta x$ along the budget line. Hence, the price vector is perpendicular
(orthogonal) to the budget line, as depicted in the following figure.
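The orthogonality argument can be checked numerically. The sketch below takes two bundles that exhaust wealth and verifies that the price vector has a zero inner product with their difference. The prices, wealth, and bundles are hypothetical.

```python
def dot(a, b):
    """Inner product of two vectors given as lists."""
    return sum(ai * bi for ai, bi in zip(a, b))


# Hypothetical prices and wealth.
p = [2.0, 4.0]
w = 20.0

# Two bundles on the budget line p.x = w.
x = [10.0, 0.0]      # 2*10 + 4*0 = 20
x_bar = [4.0, 3.0]   # 2*4 + 4*3 = 20

# Displacement along the budget line.
delta_x = [a - b for a, b in zip(x, x_bar)]
inner_product = dot(p, delta_x)
print(inner_product)
```

Any other pair of bundles on the same budget line gives the same zero inner product, since both cost exactly w.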

Figure #1.9
Finally, we impose an assumption on the budget set which will become very convenient in later chapters,
when we analyze the optimal consumption bundle that the consumer selects among all the bundles he can
afford. In particular, we consider that the budget set is convex. That is, for any two bundles x and x'
in the budget set, the linear combination
$$x'' = \alpha x + (1-\alpha) x', \qquad \alpha \in (0,1),$$
also belongs to the budget set.
19

We know that $p \cdot x \le w$ and $p \cdot x' \le w$. Then,
$$p \cdot x'' = p \cdot [\alpha x + (1-\alpha) x'] = \alpha \, p \cdot x + (1-\alpha) \, p \cdot x' \le \alpha w + (1-\alpha) w = w.$$

Note that the budget sets described above for two and three goods satisfied this definition of convexity
since we could select any two bundles from the budget set, construct a linear combination between both
of them (straight-line), and check that all the bundles in this linear combination belong to the budget set
as well.
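The convexity argument can be illustrated with a short numerical check: take two affordable bundles and confirm that several convex combinations of them are affordable as well. The helper names and all numbers are hypothetical.

```python
def cost(prices, bundle):
    """Total expenditure p.x on a bundle."""
    return sum(p * q for p, q in zip(prices, bundle))


def convex_combination(x, x_prime, alpha):
    """Return alpha*x + (1 - alpha)*x_prime, component by component."""
    return [alpha * a + (1.0 - alpha) * b for a, b in zip(x, x_prime)]


# Hypothetical linear budget set: p = (2, 4), w = 20.
p = [2.0, 4.0]
w = 20.0
x = [10.0, 0.0]        # affordable: costs 20
x_prime = [0.0, 5.0]   # affordable: costs 20

alphas = [0.1, 0.25, 0.5, 0.75, 0.9]
all_affordable = all(
    cost(p, convex_combination(x, x_prime, a)) <= w + 1e-9 for a in alphas
)
print(all_affordable)
```

The small tolerance only guards against floating-point rounding; it is not part of the economic argument.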
Let us see next an example of a budget set that doesn't satisfy convexity. In particular, it describes the set
of affordable bundles for an individual working for a firm, with his consumption of leisure in the

19
As in our definition of convexity for the consumption set, note that here we only consider 0<alpha<1
strictly, since otherwise the extremes of the linear combination between bundles x and x' (that is,
bundles x and x' themselves) would be included in our definition of convexity.

horizontal axis and his consumption of all other goods on the vertical axis. Starting from the horizontal
intercept (where this individual enjoys 24 hours of leisure with no consumption of other goods), this
individual can start working and obtain a wage of s dollars per hour for his first eight hours of work. If he
works more than eight hours, he receives an overtime wage of s'>s dollars per hour, which allows him to
consume a larger amount of other goods. However, when his labor income exceeds M dollars, he must
pay a proportion t of his total income, reducing his real wage (after taxes) to s'(1-t). Graphically, this
implies that the budget line is relatively flat for the first eight hours of work, becomes steeper when the
worker starts to receive overtime pay, but becomes flatter again when the worker is taxed.
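The shape of this budget line can be sketched in a few lines of Python. The wage levels, the income threshold, and the tax rate below are hypothetical. The sketch also assumes that the tax applies only to income above M, which is one way to obtain the after-tax slope s'(1-t) described above.

```python
def max_consumption(leisure, s=10.0, s_prime=20.0, M=200.0, t=0.5):
    """Largest affordable consumption of other goods for a given leisure choice."""
    hours = 24.0 - leisure
    # Regular wage for the first eight hours, overtime wage afterwards.
    gross = s * min(hours, 8.0) + s_prime * max(hours - 8.0, 0.0)
    # Assumption: the tax rate t applies only to income above M.
    if gross > M:
        return M + (1.0 - t) * (gross - M)
    return gross


def affordable(leisure, consumption):
    """True if the (leisure, consumption) bundle lies in the budget set."""
    return 0.0 <= leisure <= 24.0 and 0.0 <= consumption <= max_consumption(leisure) + 1e-9


# Two bundles on the budget line, and their midpoint.
a = (24.0, 0.0)                    # no work at all
b = (8.0, max_consumption(8.0))    # sixteen hours of work
mid = ((a[0] + b[0]) / 2.0, (a[1] + b[1]) / 2.0)

print(affordable(*a), affordable(*b), affordable(*mid))
```

With these numbers both endpoints are affordable but their midpoint is not, which is what non-convexity of the budget set means. The failure comes from the kink at eight hours, where the budget line becomes steeper.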

Figure #1.10
Importantly, this budget set is not convex, since for some pairs of bundles, such as x and x' in the
figure, their linear combination does not lie in the budget set.
20

Quasilinear preference relations

20
In our initial discussion of convexity of the budget set, we suggested that a non-convex budget set could lead to
potential problems when solving for the optimal bundle that the consumer selects when solving his utility
maximization problem. Indeed, note that for several preference relations the above non-convex budget set could lead
to multiple solutions. Graphically, a given indifference curve could be tangent to the above budget line at several
points.


Intuitively, the first condition simply states that if two bundles lie on the same indifference curve, then
increasing the amount of the first good contained in both bundles by the same amount yields two new
bundles that also lie on the same indifference curve. The second condition, on the other hand, states that
if we increase the amount of the first good in bundle x, the newly created bundle must be strictly
preferred to the original bundle x. These conditions can be easily understood by looking at the following
figure:

Figure #1.11
Finally, note an implication of the above two conditions. In particular, if bundle x is strictly preferred to
bundle y, then if we increase the amount of good one in both bundles x and y by the same amount, the
enlarged bundle x must be preferred to the enlarged bundle y. This property is also illustrated in the figure.
After analyzing the definition of quasilinear preferences, we can discuss how to detect quasilinear utility
functions. In particular, a quasilinear utility function that you might have encountered in your
intermediate microeconomics classes looks as follows

An example from undergrad:
$$U(x,y) = v(x) + b\,y, \quad \text{where } b>0 \text{ and } v(x) \text{ is non-linear, e.g., } v(x)=\sqrt{x} \text{ or } v(x)=x^2.$$
Easily generalizable to N > 2 goods,
$$U(x,y,z) = v(x,y) + b\,z,$$
where $v(\cdot)$ is non-linear in all other goods and the good entering linearly is a desirable good ($b>0$).

The MRS of such functions is constant in the good that enters linearly in the utility function. In other
words, for a given level of good one, an increase in the amounts of good two does not affect the slope of
the indifference curve. Let us see that with an example.
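A quick numerical check of this property, as a sketch: for the hypothetical quasilinear function $U(x,y)=\sqrt{x}+b\,y$ with b = 2, the MRS computed by finite differences is the same at low and high levels of the good that enters linearly. The functional form, the value of b, and the evaluation points are assumptions made for illustration.

```python
import math

B = 2.0  # hypothetical coefficient on the linear good


def utility(x, y):
    """Quasilinear utility: non-linear in x, linear in y."""
    return math.sqrt(x) + B * y


def mrs(x, y, h=1e-6):
    """Marginal rate of substitution MU_x / MU_y by central differences."""
    mu_x = (utility(x + h, y) - utility(x - h, y)) / (2.0 * h)
    mu_y = (utility(x, y + h) - utility(x, y - h)) / (2.0 * h)
    return mu_x / mu_y


# Same amount of x, very different amounts of the linear good y.
mrs_low_y = mrs(4.0, 1.0)
mrs_high_y = mrs(4.0, 10.0)
print(mrs_low_y, mrs_high_y)
```

Analytically, the MRS here is $v'(x)/b = 1/(2b\sqrt{x})$, which does not depend on y, so indifference curves are parallel displacements of one another along the axis of the linear good.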

Figure #1.12
Note that another example is a linear preference relation (perfect substitutes), where both goods enter
linearly into the utility function. We can therefore conclude that preferences over perfectly substitutable
goods are a particular case of quasilinear preferences.

So far we have examined the assumptions behind preference relations, as well as particular types of
preference relations and utility functions. However, we have not analyzed under which conditions we can
guarantee that a preference relation can be represented with a utility function. Specifically, the
assumptions we have considered so far are not enough to guarantee that any preference relation can be
represented with a utility function. One example of a preference relation that cannot be represented by a
utility function is the so-called lexicographic preference relation that we discuss next.

Lexicographic preferences:
$$(x_1, x_2) \succsim (y_1, y_2) \iff \begin{cases} x_1 > y_1, \text{ or} \\ x_1 = y_1 \text{ and } x_2 \ge y_2 \end{cases}$$

Intuitively, note that this preference relation works like alphabetizing a dictionary: first, the individual
prefers bundle x if it contains more of good one than bundle y. If, however, both bundles contain the same
amount of good one, then the individual prefers the bundle which contains more of the second good. One
important characteristic of this preference relation is that its indifference set cannot be drawn as an
indifference curve: for a given bundle, there is no other bundle to which the consumer is indifferent.
Let us examine this property by identifying the upper contour set, lower contour set, and the indifference
set.

Figure #1.13
Figure labels: bundle $x^1 = (x^1_1, x^1_2)$, with regions UCS($x^1$), LCS($x^1$), and IND($x^1$): singletons.

First, note that the upper contour set (UCS) of bundle x is the set of bundles containing more of good one,
together with those bundles that contain the same amount of good one but have more of good two.
Similarly, the lower contour set (LCS) is defined by those bundles that contain less of good one and those
that, containing the same amount of good one, have less of good two. Hence, the UCS and LCS span the
entire positive quadrant, leaving no room for the indifference set of bundle x other than the bundle itself.
As a consequence, we say that the indifference set for bundle x is the bundle itself or, in other words,
that IND(x) is a singleton.
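The lexicographic ranking is easy to express in code. The sketch below implements the weak preference relation for two-good bundles and shows that two bundles are indifferent only when they coincide. The function names and the bundles are hypothetical.

```python
def weakly_prefers(x, y):
    """True if x is at least as good as y under lexicographic preferences."""
    # Good one decides first; good two only breaks ties in good one.
    return x[0] > y[0] or (x[0] == y[0] and x[1] >= y[1])


def indifferent(x, y):
    """Indifference means weak preference in both directions."""
    return weakly_prefers(x, y) and weakly_prefers(y, x)


# More of good one wins, no matter how much good two the other bundle has.
more_good_one_wins = weakly_prefers((2, 0), (1, 100))
# With equal amounts of good one, good two breaks the tie.
tie_broken_by_good_two = weakly_prefers((1, 3), (1, 2))
# Indifference only holds for identical bundles.
same_bundle = indifferent((1, 2), (1, 2))
different_bundle = indifferent((1, 2), (1, 3))
print(more_good_one_wins, tie_broken_by_good_two, same_bundle, different_bundle)
```

As a side note, Python compares tuples lexicographically, so `(2, 0) >= (1, 100)` produces the same ranking as `weakly_prefers`.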
Hence, the previous example suggests that we need to impose an additional condition on preference
relations in order to guarantee that they can be represented with a utility function. This property is
continuity as we define below.

Continuity. A preference relation $\succsim$ defined on X is continuous if it is preserved under limits.
That is, for any sequence of pairs $\{(x^n, y^n)\}_{n=1}^{\infty}$ with $x^n \succsim y^n$ for all n, and
$\lim_{n \to \infty} x^n = x$ and $\lim_{n \to \infty} y^n = y$, the preference relation is maintained at
the limiting points, i.e., $x \succsim y$.

Intuitively, this property states that there can be no sudden jumps in an individual's preferences over a
sequence of bundles, i.e., there are no sudden preference reversals. The following figure illustrates
preferences that satisfy continuity, where the individual decision-maker prefers bundle x1 to y1, x2 to y2,
and similarly at the limiting points of the sequence, where he still prefers bundle x to y.

Figure #1.14
Let us next show why a lexicographic preference relation doesn't satisfy continuity.

Figure #1.15
Notice the limits of the sequences. Intuitively, the individual prefers bundle x1 to y1 since the former
contains more of good one than the latter. Similarly, the individual prefers bundle x2 to y2, given that the
former still contains more of good one than the latter. However, at the limiting points of the sequence,
bundle x becomes (0,0) while bundle y is still (0,1). Therefore, both bundles contain the same amount of
good one, and the individual ranks them based on the content of good two, leading to bundle y being
strictly preferred to bundle x. This is a preference reversal and, as a result, a violation of continuity.
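The preference reversal can be reproduced numerically. The sketch below assumes the sequences $x^n = (1/n, 0)$ and $y^n = (0, 1)$. These are a common textbook choice consistent with the limits (0,0) and (0,1) mentioned above, but the notes do not state the sequences explicitly, so treat them as an assumption.

```python
def weakly_prefers(x, y):
    """Lexicographic weak preference over two-good bundles."""
    return x[0] > y[0] or (x[0] == y[0] and x[1] >= y[1])


# Assumed sequences: x^n = (1/n, 0) and y^n = (0, 1) for n = 1, ..., 1000.
x_seq = [(1.0 / n, 0.0) for n in range(1, 1001)]
y_seq = [(0.0, 1.0) for _ in range(1, 1001)]

# Along the sequence, x^n always has more of good one, so x^n is preferred.
holds_along_sequence = all(weakly_prefers(x, y) for x, y in zip(x_seq, y_seq))

# At the limits the ranking flips: both have zero of good one, y has more of good two.
x_limit = (0.0, 0.0)
y_limit = (0.0, 1.0)
holds_at_limit = weakly_prefers(x_limit, y_limit)

print(holds_along_sequence, holds_at_limit)
```

The relation holds for every term of the sequence but fails at the limit, which is exactly what continuity rules out.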
After describing continuity, we are ready to establish under which conditions any preference relation can
be represented using a utility function.


Figure #1.16



Note: a utility function can satisfy continuity but still be non-differentiable. For
instance, the Leontief utility function, min{ax1,bx2}, is continuous but cannot be differentiated at the
kink.


Chapter 2 Demand functions
The utility maximization problem
We are now ready to combine the tastes of the individual, embodied in his utility function, and the budget
set describing the bundles he can afford, in order to examine the set of optimal choices for the
individual. In particular, the consumer maximizes his utility level by selecting a bundle x (choice variable)
subject to the constraint that the cost of such bundle cannot exceed his wealth.
$$\max_{x \ge 0} \; u(x) \quad \text{s.t. } p \cdot x \le w \qquad \text{(UMP)}$$

One important point is whether the above maximization problem has a solution. The Weierstrass
theorem provides us with an answer: since the objective function we are maximizing (the utility function)
is continuous and the budget constraint defines a closed and bounded set (given that p>>0 and w>0),
the problem does have a solution. Regarding the number of solutions to the above maximization
problem, note that if preferences are strictly convex, then the solution is unique.
For simplicity, we denote the solution to the UMP as the argmax of the UMP, where argmax means the
argument, x, that solves the maximization problem. We denote this solution as x(p,w), the Walrasian demand.
We can conclude three main properties from the solution of the above maximization problem.

First, note that homogeneity of degree zero should come as no surprise. Specifically, an increase in both
the price vector and the wealth level in the same proportion doesn't change the consumer's budget set.
Since the budget set is unchanged, the optimal bundle selected by the individual shouldn't change either.
Second, note that Walras' law (WL) follows from local nonsatiation (LNS). Indeed, if the consumer were
selecting a bundle x that lies strictly inside the budget set (so that he is not exhausting all of his wealth),
we could find another bundle y at epsilon distance from bundle x that is still affordable and is strictly
preferred by the individual to bundle x. In this case, the initial bundle x cannot be utility maximizing,
because there are other bundles that are still affordable and which are strictly preferred by the consumer.
If bundle x, in contrast, lies on the budget line, we could identify bundles that are strictly preferred to x,
but these bundles would be unaffordable to the consumer.


Figure #2.1
Finally, note that if preferences are convex (but not strictly convex), the set of bundles that maximize the
individual's utility is a convex set, as the figure below illustrates. If, in contrast, the consumer's
preferences are strictly convex, he selects a unique bundle as his Walrasian demand.

Figure #2.2
After describing the UMP, we can now examine the first order conditions of this maximization
problem.


A natural question at this point is whether the above necessary conditions are also sufficient. In other
words, under which conditions can we guarantee that the Walrasian demand that we have found is a
maximum of the UMP and not a minimum? In particular, this is the case when the utility function is
quasiconcave and monotone, and the vector of first order derivatives is different from zero for all x. Let
us briefly analyze these conditions. First, the condition stating that the utility function should be
monotone only implies that if we increase the amounts of both goods simultaneously we reach a higher
utility level, which is expected in most applications. Second, the condition that the first order derivatives
are different from zero simply guarantees that there are no bliss points. Intuitively, if the vector of
first-order derivatives were zero, we would have reached the peak of utility. At this point, however, the
individual would not be able to find any other preferred bundle, thus violating LNS. Finally, the condition
that the utility function satisfies quasiconcavity is also easy to justify. The following figure represents an
indifference map of an individual whose preferences do not satisfy quasiconcavity.

Figure #2.3

Indeed, note that the UCS is not convex. This implies that the tangency condition between the
indifference curve and the budget line is not a sufficient condition for a utility-maximizing bundle.
Specifically, note that a point of tangency such as bundle C gives a lower utility level than a
point of non-tangency, such as bundle B. Therefore, if preferences do not satisfy quasiconcavity, the KT
conditions (graphically represented by the tangency condition) are not sufficient for a maximum.
1

Because the three requirements for the necessary conditions to become sufficient are relatively mild, we
can then expect KT conditions to be sufficient in most economic applications.
Note: why does the MRS represent the slope of the indifference curve? Answer: note that in order to find
the slope of the indifference curve we must modify both x1 and x2 without altering the utility level of the
individual. We do that by totally differentiating the individual's utility function,

Importantly, note that so far we have been analyzing interior solutions. If, however, the individual prefers
to consume zero amounts of some of the goods, the above tangency condition will not be satisfied. In
particular, at a corner solution we find that, after taking first order conditions,
$$MRS_{l,k} = \frac{\partial u(x^*)/\partial x_l}{\partial u(x^*)/\partial x_k} > \frac{p_l}{p_k}, \quad \text{or alternatively,} \quad \frac{\partial u(x^*)/\partial x_l}{p_l} > \frac{\partial u(x^*)/\partial x_k}{p_k},$$
because the consumer would like to consume even more of good l.
In the FOCs, this implies
$$\frac{\partial u(x^*)}{\partial x_k} \le \lambda p_k \quad \text{for those goods whose consumption is zero, } x_k^* = 0, \text{ and}$$
$$\frac{\partial u(x^*)}{\partial x_l} = \lambda p_l \quad \text{for the good for which consumption is positive, } x_l^* > 0,$$
so that
$$\underbrace{\frac{\partial u(x^*)/\partial x_l}{p_l}}_{\text{MU per dollar spent on good } l} = \lambda \ge \underbrace{\frac{\partial u(x^*)/\partial x_k}{p_k}}_{\text{MU per dollar spent on good } k}$$




1
Note that the true maximum in this case is bundle A.

Figure #2.4
A note on the Lagrange multiplier. The Lagrange multiplier is usually referred to as the marginal value of
relaxing the constraint in the UMP (or alternatively as the shadow price of wealth). Let us analyze why
this is the case. First, note that if we relax the budget constraint in the UMP, the consumer is capable of
reaching a higher indifference curve and, as a consequence, of obtaining a higher utility level. The
following figure illustrates this point.

Figure #2.5
Hence, we want to measure the increase in utility resulting from a marginal increase in wealth. In
order to do so, we differentiate the individual's utility level, evaluated at the bundle that
maximizes his utility (Walrasian demand), with respect to wealth.


As an example, note that if lambda=5, then a marginal increase in wealth induces an increase of five units
of utility.
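The shadow-price interpretation can be checked numerically. The sketch below assumes a Cobb-Douglas utility function $u = x^{\alpha} y^{\beta}$ with $\alpha=\beta=0.5$ and uses its standard closed-form demands. It compares the multiplier implied by the first order condition with a finite-difference derivative of maximal utility with respect to wealth. The functional form, parameter values, prices, and wealth are hypothetical.

```python
ALPHA = 0.5
BETA = 0.5


def demand(px, py, w):
    """Closed-form Cobb-Douglas demands (assumes ALPHA + BETA = 1)."""
    return ALPHA * w / px, BETA * w / py


def max_utility(px, py, w):
    """Utility evaluated at the utility-maximizing bundle."""
    x, y = demand(px, py, w)
    return x ** ALPHA * y ** BETA


def multiplier(px, py, w):
    """Lambda from the first order condition MU_x = lambda * px."""
    x, y = demand(px, py, w)
    return ALPHA * x ** (ALPHA - 1.0) * y ** BETA / px


px, py, w = 2.0, 4.0, 20.0
h = 1e-5
# Marginal utility of wealth by central differences.
dv_dw = (max_utility(px, py, w + h) - max_utility(px, py, w - h)) / (2.0 * h)
lam = multiplier(px, py, w)
print(dv_dw, lam)
```

The two numbers agree up to rounding error, which is the sense in which lambda measures the utility gain from one extra unit of wealth.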
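Example: Let us consider an example of the utility maximization problem. Take the Cobb-Douglas
utility function $U(X,Y) = X^{\alpha} Y^{\beta}$, which is maximized subject to the budget constraint
$I = P_x X + P_y Y$, where for convenience we assume $\alpha + \beta = 1$. We can now solve for the
utility-maximizing values of X and Y for any prices $(P_x, P_y)$ and income (I). Setting up the
Lagrangian expression
$$\mathcal{L} = X^{\alpha} Y^{\beta} + \lambda (I - P_x X - P_y Y)$$
yields the first order conditions:
$$\frac{\partial \mathcal{L}}{\partial X} = \alpha X^{\alpha-1} Y^{\beta} - \lambda P_x = 0$$
$$\frac{\partial \mathcal{L}}{\partial Y} = \beta X^{\alpha} Y^{\beta-1} - \lambda P_y = 0$$
$$\frac{\partial \mathcal{L}}{\partial \lambda} = I - P_x X - P_y Y = 0$$
Taking the ratio of the first two conditions shows that
$$\frac{\alpha Y}{\beta X} = \frac{P_x}{P_y},$$
or
$$P_y Y = \frac{\beta}{\alpha} P_x X = \frac{1-\alpha}{\alpha} P_x X,$$
where the final equality follows because $\alpha + \beta = 1$. Substituting this expression into the
budget constraint gives
$$I = P_x X + P_y Y = P_x X \left(1 + \frac{1-\alpha}{\alpha}\right) = \frac{1}{\alpha} P_x X.$$
Solving for X yields $X^* = \alpha I / P_x$, and a similar set of manipulations gives
$Y^* = \beta I / P_y$.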
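As a sanity check on the closed-form solution, the following sketch compares it with a brute-force search along the budget line. The parameter values (alpha = 0.3, prices 2 and 4, income 20) and the function names are hypothetical.

```python
def closed_form(alpha, px, py, income):
    """Cobb-Douglas demands X* = alpha*I/Px and Y* = (1 - alpha)*I/Py."""
    return alpha * income / px, (1.0 - alpha) * income / py


def grid_search(alpha, px, py, income, steps=10000):
    """Maximize X^alpha * Y^(1-alpha) over a grid of points on the budget line."""
    best_x, best_u = None, -1.0
    for i in range(1, steps):
        x = (income / px) * i / steps
        y = (income - px * x) / py  # spend the remaining income on Y
        u = x ** alpha * y ** (1.0 - alpha)
        if u > best_u:
            best_x, best_u = x, u
    return best_x, (income - px * best_x) / py


alpha, px, py, income = 0.3, 2.0, 4.0, 20.0
x_star, y_star = closed_form(alpha, px, py, income)
x_grid, y_grid = grid_search(alpha, px, py, income)
print(x_star, y_star, x_grid, y_grid)
```

The search is restricted to the budget line because, with a strictly increasing utility function, the consumer spends all of his income. Note also that the expenditure shares $P_x X^*/I = \alpha$ and $P_y Y^*/I = \beta$ do not depend on prices.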



Walrasian demand
We found the Walrasian demand function, x(p,w), as the solution to the UMP. This demand function
satisfies several properties:
Walras' Law: for every $p \gg 0$, $w > 0$ we have
$$p \cdot x = w \quad \text{for every } x \in x(p,w).$$
Generally, homogeneity of degree R of a function f(x,y):
$$f(ax, ay) = a^R f(x,y)$$
Example from production:
$$f(2L, 2K) = 2^R f(L,K)$$

Recall that homogeneity of degree zero can easily be understood by the fact that an increase in prices and
wealth in the same proportion does not modify the consumer's budget set.
2
Regarding Walras' law, note that
it only relies on LNS.
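Both properties can be verified on a concrete demand function. The sketch below uses a hypothetical Cobb-Douglas Walrasian demand with expenditure share alpha = 0.3 on good one, and checks that scaling prices and wealth by the same factor leaves demand unchanged, and that expenditure equals wealth.

```python
def walrasian_demand(p, w, alpha=0.3):
    """Hypothetical Cobb-Douglas demand for two goods."""
    return [alpha * w / p[0], (1.0 - alpha) * w / p[1]]


p = [2.0, 4.0]
w = 20.0
x = walrasian_demand(p, w)

# Homogeneity of degree zero: triple all prices and wealth.
scale = 3.0
x_scaled = walrasian_demand([scale * pk for pk in p], scale * w)
same_demand = all(abs(a - b) < 1e-9 for a, b in zip(x, x_scaled))

# Walras' law: the consumer spends all of his wealth.
spending = sum(pk * xk for pk, xk in zip(p, x))
exhausts_wealth = abs(spending - w) < 1e-9

print(same_demand, exhausts_wealth)
```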
Let us now analyze how the Walrasian demand is affected by changes in the individual's wealth level or
in the prices of some of the goods. When demand for a good increases in wealth we say that the good is
normal, while when it decreases in wealth we refer to the good as inferior. Examples of the former can be
computers, whereas examples of the latter are Two-Buck Chuck or Wal-Mart during the economic crisis.
3

Graphically an increase in the wealth level produces an outward shift in the budget line, as the following
figure illustrates.

Figure #2.6

2
Remember that we say that a function is homogeneous of degree R if increasing all the arguments of the
function by a factor alpha multiplies the value of the function by alpha to the power of R. Hence, when a
function is homogeneous of degree zero, a proportional increase in all its arguments does not modify the
initial value of the function.
3
Indeed, several reports suggest that a decrease in average wealth during the 2009 economic crisis produced
an increase in the sales of certain discount supermarkets such as Wal-Mart.

At a given price level, the consumer chooses an optimal consumption bundle, as described in the figure.
We can then connect all these optimal consumption bundles for different levels of wealth, forming what
we refer to as the wealth expansion path, or Engel curve. When the wealth expansion path indicates an
increase (decrease) in the consumption of good j as a consequence of further increments in the wealth
level, we say that good j is normal (inferior, respectively). The above figure
illustrates an example in which good one is initially normal but then becomes inferior, while good two is
normal for all levels of wealth.
We now move to the analysis of how demand reacts to price changes. When the demand for good k
decreases as a result of an increase in the price of good k, we simply regard that good as a usual good,
since its quantity demanded reacts negatively to its own price. If, in contrast, the quantity demanded of
good k increases as a result of an increase in the price of good k, we regard that good as Giffen.
4
We can
illustrate these negative and positive relationships in the following two figures, with demand for good k
on the horizontal axis and own price on the vertical axis.

Figure #2.7
Besides analyzing the effect of a good's own price, we are interested in examining the effect of a change
in the price of good l on the quantity demanded of good k (more compactly referred to as cross-price
effects). This relationship is positive for two goods regarded by the consumer as
substitutes (such as two brands of mineral water) and negative for two goods regarded as complements
in consumption (such as left and right shoes, cars and gasoline, etc.). We can use a graphical
representation similar to the one employed above in order to represent these cross-price effects.

4
One of the few examples of Giffen goods is that of potatoes in Ireland during the 19th century. However,
there is still strong controversy among economists on whether demand for potatoes actually moved in the
same direction as its own price.


Figure #2.8
In the figure on the left side, we can observe that an increase in the price of one brand of mineral water
increases the demand for the other brand of mineral water, which the consumer regards as a close
substitute. In the figure on the right, we observe how an increase in the price of gasoline reduces the
demand for cars, shifting it inwards.
We have discussed the set of properties of the optimal consumption bundle (Walrasian demand) as the
solution of the UMP. There are still, however, some important points about the UMP that we must stress.
First, if we insert the optimal consumption bundle into the individual's utility function, we obtain the
highest utility level that the individual can achieve by solving this UMP. More formally, we refer to the
utility function evaluated at the solution of the UMP as the indirect utility function, v(p,w). [More
generally, we will refer to the objective function of an optimization problem evaluated at the solution of
the optimization problem as the value function. Hence, the value function of the UMP is the indirect
utility function]. Function v(p,w) satisfies several properties:
1. Homogeneity of degree zero.
2. Strictly increasing in w and nonincreasing in $p_k$ for any k.
3. Quasiconvex: the set $\{(p,w) : v(p,w) \le \bar{v}\}$ is convex for any $\bar{v}$ (see the figures in
Rubinstein and MWG for examples).
4. Continuous in p and w.
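A quick numerical check of properties 1 and 2, as a sketch: the snippet below uses the indirect utility function of a hypothetical Cobb-Douglas consumer (alpha = 0.5), obtained by plugging the closed-form demands into the utility function. All numbers are illustrative.

```python
def indirect_utility(px, py, w, alpha=0.5):
    """v(p, w) for a hypothetical Cobb-Douglas consumer."""
    x = alpha * w / px            # Walrasian demand for good one
    y = (1.0 - alpha) * w / py    # Walrasian demand for good two
    return x ** alpha * y ** (1.0 - alpha)


base = indirect_utility(2.0, 4.0, 20.0)

# Property 1: scaling prices and wealth by 3 leaves v unchanged.
scaled = indirect_utility(6.0, 12.0, 60.0)

# Property 2: more wealth raises v; a higher price does not raise v.
richer = indirect_utility(2.0, 4.0, 25.0)
pricier = indirect_utility(3.0, 4.0, 20.0)

print(base, scaled, richer, pricier)
```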
First, note that homogeneity of degree zero should come as no surprise. In particular, it states that
increasing market prices and wealth by the same proportion does not modify the consumer's budget set.
As a consequence, such an increase does not modify the consumer's optimal consumption bundle, and
therefore it doesn't modify the maximal utility level that the individual can reach, as measured by v(p,w).
The second property states that if we increase the wealth level of the individual, we are enlarging the set
of feasible bundles he can afford and, as a consequence, raising the indifference curve he can reach when
selecting his optimal consumption bundle. Therefore, the maximal utility level that he can reach is strictly
increasing in his wealth level. In contrast, an increase in the price of any good shrinks the set of affordable
bundles and, as a consequence, the individual can only reach indifference curves associated with weakly
lower utility levels. Thus, an increase in the price of any good k cannot increase the maximal utility level
that the individual can obtain by solving this UMP. Regarding the third property, quasiconvexity, let us
provide an intuitive explanation by using the following figures.

Figure #2.9
First, note that the indirect utility function is depicted with prices on the horizontal axis and the wealth
level on the vertical axis. Hence, when prices increase from P11 to P12, wealth must also increase in order
to maintain the same utility level for this individual. In addition, note that lower prices and higher wealth
levels are associated with higher maximal utilities. Quasiconvexity tells us that, if the max utility
associated with a given pair of prices and wealth (A) is weakly higher than the max utility associated with
another pair of prices and wealth (B), then the max utility associated with any linear combination of
prices and wealth between A and B is weakly lower than that associated with A.
We can provide an alternative interpretation of quasiconvexity as follows. The indirect utility function
satisfies quasiconvexity if the set of pairs of prices and wealth for which the max utility that the consumer
can reach is lower than that under pair (p*,w*) is a convex set. More compactly, v(p,w) is quasiconvex
if the set of (p,w) pairs for which $v(p,w) < v(p^*,w^*)$ is convex, i.e.,
$$\{ (p,w) : v(p,w) < v(p^*,w^*) \} \text{ is convex.}$$



Figure #2.10
An alternative way to understand quasiconvexity plots goods one and two on the axes, as follows.

Figure #2.11
Let us construct this figure sequentially. First, when the individual decision-maker is facing budget set
$B_{p,w}$, his optimal consumption bundle is x(p,w). Second, when prices and wealth change to p' and w',
he faces budget set $B_{p',w'}$, and therefore selects bundle x(p',w'). Third, note that both bundles x(p,w)
and x(p',w') induce the same indirect utility, v(p,w)=v(p',w')=ubar. Fourth, we can now construct a linear
combination of prices and wealth,
$$\left. \begin{array}{l} p'' = \alpha p + (1-\alpha) p' \\ w'' = \alpha w + (1-\alpha) w' \end{array} \right\} \; B_{p'',w''}$$

This combination of prices and wealth provides us with budget set $B_{p'',w''}$. Finally, note that any
solution to the UMP facing budget set $B_{p'',w''}$ must provide an optimal consumption bundle that lies
on an indifference curve no higher than the one associated with utility level ubar.
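The construction above can be replicated numerically. The sketch below assumes a hypothetical Cobb-Douglas consumer (alpha = 0.5) and two price-wealth pairs chosen so that both yield the same maximal utility. It then checks that every convex combination of the two pairs yields a maximal utility no higher than that common level.

```python
def indirect_utility(p, w, alpha=0.5):
    """v(p, w) for a hypothetical Cobb-Douglas consumer with two goods."""
    x1 = alpha * w / p[0]
    x2 = (1.0 - alpha) * w / p[1]
    return x1 ** alpha * x2 ** (1.0 - alpha)


# Two hypothetical price-wealth pairs that reach the same utility level.
p, w = [1.0, 1.0], 10.0
p_prime, w_prime = [4.0, 1.0], 20.0
u_bar = indirect_utility(p, w)
u_bar_prime = indirect_utility(p_prime, w_prime)


def mix(a):
    """Convex combination (p'', w'') with weight a on (p, w)."""
    p_mix = [a * pk + (1.0 - a) * qk for pk, qk in zip(p, p_prime)]
    w_mix = a * w + (1.0 - a) * w_prime
    return p_mix, w_mix


weights = [0.1, 0.25, 0.5, 0.75, 0.9]
highest_mixed_utility = max(indirect_utility(*mix(a)) for a in weights)
print(u_bar, u_bar_prime, highest_mixed_utility)
```

With these numbers the mixed budget sets deliver strictly less than ubar, in line with the figure.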


WARP and demand
After presenting different properties of the UMP, its solution, and its value function, we are now ready
to relate the optimal consumption bundle obtained in the above UMP to the WARP. In particular, we want to
understand whether the consistency requirement imposed by the WARP limits the set of optimal consumption
bundles that the individual decision-maker can select when solving the UMP.

WARP and Demand: Take two different consumption bundles x(p,w) and x(p',w'), both being
affordable under (p,w), i.e.,
$$p \cdot x(p',w') \le w.$$
When prices and wealth are (p,w), the consumer chooses x(p,w) despite x(p',w') being also affordable.
Then he reveals a preference for x(p,w) over x(p',w') when both are affordable. Hence, we should
expect him to choose x(p,w) over x(p',w') whenever both are affordable (consistency). Therefore, bundle
x(p,w) must not be affordable at (p',w'), because the consumer chooses x(p',w'). That is,
$$p' \cdot x(p,w) > w'.$$
We can conclude that Walrasian demand satisfies WARP if, for two different
consumption bundles, $x(p,w) \ne x(p',w')$:
$$p \cdot x(p',w') \le w \;\Longrightarrow\; p' \cdot x(p,w) > w'$$
In words, if bundle x(p',w') is affordable under budget set $B_{p,w}$, then bundle x(p,w) cannot be
affordable under budget set $B_{p',w'}$.
Let us first present an example of optimal consumption bundles that satisfy WARP. In the following figure,
note that bundles x(p,w) and x(p',w') are both affordable under initial prices and wealth, since they both
lie on or below budget line $B_{p,w}$. However, bundle x(p,w) is not affordable under final prices and
wealth, since it lies above budget line $B_{p',w'}$. Therefore, WARP is satisfied.

Figure #2.12

Let us now examine an example in which the premise of WARP is not satisfied. In the following figure,
demand under final prices and wealth, represented by bundle x(p',w'), is not affordable
under initial prices and wealth, since it lies above budget line $B_{p,w}$.
5


Figure #2.13
Note the general procedure we have been using to test whether two particular bundles satisfy WARP.
First, we check if bundles x(p,w) and x(p',w') are both affordable under the initial prices and wealth.
Graphically, this implies that both bundles lie on or below budget line $B_{p,w}$. If this first step of the
procedure is satisfied, then we can move to step two. Otherwise, the premise of the WARP is not satisfied,
which doesn't allow us to continue checking whether it is violated or not. In these cases, we say that the
WARP is not violated.
Second, we check if bundle x(p,w) is affordable under final prices and wealth. Graphically, bundle x(p,w)
must lie on or below budget line $B_{p',w'}$. If this condition is satisfied, then this Walrasian demand
violates WARP. If, in contrast, this second step is not satisfied, then this Walrasian demand satisfies WARP.
6
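The two-step procedure can be written as a small function. This is a sketch under stated assumptions: the function name `warp_check`, its return labels, and the three numerical cases are hypothetical, and each demanded bundle is taken to lie on its own budget line.

```python
def dot(a, b):
    """Inner product of two vectors given as lists."""
    return sum(ai * bi for ai, bi in zip(a, b))


def warp_check(p, w, x, p_new, w_new, x_new):
    """Apply the two-step WARP test to x = x(p,w) and x_new = x(p',w')."""
    if x == x_new:
        return "premise not satisfied"  # WARP only restricts different bundles
    # Step 1: is x(p',w') affordable at the initial prices and wealth?
    if dot(p, x_new) > w:
        return "premise not satisfied"
    # Step 2: is x(p,w) affordable at the final prices and wealth?
    if dot(p_new, x) <= w_new:
        return "violated"
    return "satisfied"


# Hypothetical cases, all starting from p = (1, 1) and w = 10.
case_satisfied = warp_check([1.0, 1.0], 10.0, [2.0, 8.0], [1.0, 2.0], 10.0, [6.0, 2.0])
case_violated = warp_check([1.0, 1.0], 10.0, [10.0, 0.0], [1.0, 2.0], 10.0, [6.0, 2.0])
case_no_premise = warp_check([1.0, 1.0], 10.0, [5.0, 5.0], [1.0, 0.5], 10.0, [2.0, 16.0])
print(case_satisfied, case_violated, case_no_premise)
```

When the function reports that the premise is not satisfied, WARP is not violated by that pair of observations, as noted above.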

Let us next evaluate another example in which the premise of WARP is not satisfied. The
figure below represents another case in which demand under final prices and wealth, represented by
bundle x(p',w'), is not affordable under initial prices and wealth, since it lies above budget line $B_{p,w}$.

Figure #2.14


5
Importantly, note that here we cannot check the conclusion of the WARP, since the premise of WARP is not
satisfied.
6
For more examples and practice about Walrasian demand functions that satisfy or violate WARP, see homework
assignment #2.

The following figure represents a similar case.

Figure #2.15
In the following figure, the optimal consumption bundle under final prices and wealth, x(p',w'), is
affordable under initial prices and wealth, since it lies below budget line $B_{p,w}$. However, the optimal
consumption bundle x(p,w) under the initial prices and wealth is not affordable under the new prices and
wealth, given that it lies above budget line $B_{p',w'}$. Hence, WARP is satisfied.

Figure #2.16
In our last example below, the premise of WARP holds as well. Specifically, the
optimal consumption bundle under the final prices and wealth, x(p',w'), is affordable under initial prices
and wealth, since it lies below budget line $B_{p,w}$. However, the demand x(p,w) is also affordable under
the new prices and wealth, since it lies below budget line $B_{p',w'}$. Therefore, WARP is not satisfied.
7


Figure #2.17

7
On the course website you can find more applications of the WARP to taxes and subsidies, since this type of
policy modifies the set of affordable bundles for the individual in a similar fashion as in the above figures.

Implications of WARP
Interestingly, the WARP has important implications on the set of optimal consumption bundles that a
given consumer chooses before and after a price change. Let us analyze these implications by considering
a reduction in the price of good one which, as the following figure illustrates, pivots the budget line
outwards.

Figure #2.18
But, after the price change, we want to adjust the consumer's wealth so that he can consume his initial demand $x(p,w)$ at the new prices. In other words, we shift the final budget line inwards (reducing this consumer's wealth) until it reaches the initial consumption bundle $x(p,w)$. Importantly, the budget line after the shift (after the reduction in wealth) is parallel to budget line $B_{p',w}$, reflecting the final price ratio. But what, in particular, is the reduction in wealth that we must apply to this consumer in order for him to just afford bundle $x(p,w)$?
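The wealth adjustment asked for above follows from one line of algebra. This is a sketch, assuming that Walras' law holds at the initial bundle, $p\cdot x(p,w)=w$, and writing $p'$ for the final prices and $w'$ for the compensated wealth (notation assumed here):

```latex
\begin{align*}
w' &= p'\cdot x(p,w) \\
\Delta w &= w' - w = p'\cdot x(p,w) - p\cdot x(p,w) = (p'-p)\cdot x(p,w) = \Delta p\cdot x(p,w)
\end{align*}
```

When only the price of good one falls, $\Delta p_1<0$, the adjustment is $\Delta w=\Delta p_1\,x_1(p,w)\le 0$, i.e., a reduction in wealth.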


Hence, the Slutsky wealth compensation reflects that the consumer's wealth has been reduced so that he can just afford the consumption bundle he chose before the price change.
8
Given this definition of the Slutsky
wealth compensation, we are now ready to establish a relationship between the law of demand and the
WARP.

This is indeed an important result. It establishes that if, after the price change, the consumer's wealth is compensated à la Slutsky as described above, then WARP becomes equivalent to the law of demand, i.e., quantity demanded and price move in opposite directions.
Let us next see one example in which the WARP restricts behavior when we apply Slutsky wealth
compensation.

Figure #2.19

8
In contrast, the so-called Hicksian wealth compensation adjusts the wealth level of the individual after the price change so that he can still reach the same indifference curve he was reaching before the price change. We will comment on this type of wealth compensation later on in this chapter.

The figure depicts a price change similar to that represented above, in which the price of good one decreases while the price of good two is unaffected. After pivoting the budget line outwards, we apply a Slutsky wealth compensation so that the consumer can just afford his initial bundle $x(p,w)$. The consumer's budget line after the wealth compensation is hence $B_{p',w'}$.
A natural question at this point is where the optimal consumption bundle under $B_{p',w'}$, $x(p',w')$, can lie. Let us first examine whether such a bundle can lie to the left-hand side of bundle $x(p,w)$ (on segment A). First, note that the premise of WARP is satisfied, because both bundles $x(p,w)$ and $x(p',w')$ would be affordable under budget set $B_{p,w}$, since they both lie on or below $B_{p,w}$. However, bundle $x(p,w)$ is also affordable under final prices and wealth, given that it lies on budget line $B_{p',w'}$, implying a violation of WARP. Therefore, bundle $x(p',w')$ cannot lie on segment A. Let us next examine whether such a bundle can lie to the right-hand side of bundle $x(p,w)$ (on segment B). First, note that bundle $x(p,w)$ is affordable under initial prices and wealth, since it lies on budget line $B_{p,w}$, but bundle $x(p',w')$ would not be affordable, given that it would lie above $B_{p,w}$. Hence, the premise of WARP does not hold and, as a consequence, WARP would not be violated if bundle $x(p',w')$ lies on segment B. Thus, bundle $x(p',w')$ must contain more of good one than bundle $x(p,w)$. We can therefore conclude that a decrease in the price of good one (when we appropriately compensate wealth effects) leads to an increase in the quantity demanded of such good. This is what we refer to as the compensated law of demand.
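The compensated law of demand can be illustrated with a small numerical example. The sketch below assumes a Cobb-Douglas consumer, $u(x_1,x_2)=x_1^a x_2^{1-a}$, whose Walrasian demand has a known closed form. The utility function, the parameter `a` and all numbers are assumptions made only for illustration.

```python
def walrasian_demand(p, w, a=0.5):
    """Demand for the assumed utility u(x1, x2) = x1**a * x2**(1 - a)."""
    return (a * w / p[0], (1 - a) * w / p[1])


def dot(u, v):
    """Inner product of two vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


p, w = (2.0, 1.0), 100.0            # initial prices and wealth (hypothetical)
p_new = (1.0, 1.0)                  # the price of good one falls
x = walrasian_demand(p, w)          # initial bundle x(p, w)

# Slutsky compensation: the new wealth just affords the initial bundle at new prices.
w_comp = dot(p_new, x)
x_comp = walrasian_demand(p_new, w_comp)

dp = [b - a for a, b in zip(p, p_new)]
dx = [b - a for a, b in zip(x, x_comp)]
print(w_comp, x_comp, dot(dp, dx))
```

Wealth is reduced by the compensation, the demand for good one increases, and the product of the price change and the demand change is nonpositive, as the compensated law of demand requires.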
9

Note an important distinction between the uncompensated law of demand and the compensated law of demand we just described. Specifically, the demand for good one can fall as a consequence of a decrease in the price of that good, but only when wealth is not compensated, as illustrated in the following figure.

Figure #2.20
This figure depicts a reduction in the price of good one similar to the one that we analyzed before. The individual demand after the price change is given by $x(p',w)$, where the quantity demanded of good one

9
Interesting practice: can you repeat this analysis for the case of an increase in the price of good one? First, you will need to pivot the budget line inwards. Second, note that the wealth compensation must imply in this case an increase in the consumer's wealth. Finally, you will have a budget line after the wealth compensation with two segments, A and B. Determine which one is ruled out and which one is allowed by WARP.

goes down despite the fact that the good became cheaper. In this case, therefore, the uncompensated law of demand is not satisfied, since quantity demanded and price move in the same direction. This is the reason why we say that WARP is not a sufficient condition to yield the uncompensated law of demand, i.e., the law of demand for price changes that are not compensated. Hence, WARP and the compensated law of demand are equivalent, but WARP and the uncompensated law of demand are not necessarily related. We can examine this last point by checking whether WARP is satisfied in this example. In particular, bundle $x(p,w)$ was affordable under the initial budget set $B_{p,w}$, but the consumption bundle after the (uncompensated) price change, $x(p',w)$, was not affordable, since it lies above budget line $B_{p,w}$. Therefore, the premise of WARP is not satisfied and hence WARP is not violated. This example thus shows a case in which WARP is not violated but the uncompensated law of demand is.

The Walrasian demand function is differentiable in both prices and wealth under relatively general
conditions. Let us next examine the relationship between the compensated law of demand and the WARP.
In order to do so, let us first totally differentiate the Walrasian demand function, as follows:
$$dx = D_p x(p,w)\,dp + D_w x(p,w)\,dw$$
And since the consumer's wealth is compensated,
$$dw = x(p,w)\cdot dp$$
(this is the differential analog of $\Delta w=\Delta p\cdot x(p,w)$, obtained from the Slutsky wealth compensation). Substituting,
$$dx = D_p x(p,w)\,dp + D_w x(p,w)\,[x(p,w)\cdot dp]$$
or equivalently,
$$dx = \left[D_p x(p,w) + D_w x(p,w)\,x(p,w)^T\right]dp$$
Hence, the compensated law of demand, $dp\cdot dx\le 0$, can also be expressed as
$$dp\cdot\left[D_p x(p,w) + D_w x(p,w)\,x(p,w)^T\right]dp \le 0$$
where the term in brackets is the so-called Slutsky (or substitution) matrix,
$$S(p,w)=\begin{bmatrix} s_{11}(p,w) & \cdots & s_{1L}(p,w)\\ \vdots & \ddots & \vdots\\ s_{L1}(p,w) & \cdots & s_{LL}(p,w)\end{bmatrix}$$
where
$$s_{lk}(p,w)=\frac{\partial x_l(p,w)}{\partial p_k}+\frac{\partial x_l(p,w)}{\partial w}\,x_k(p,w)$$

The next proposition describes the conditions under which the substitution matrix is negative semidefinite.

Proposition: If $x(p,w)$ is differentiable, satisfies WL, Homog(0) and WARP, then $S(p,w)$ is negative semidefinite, i.e.,
$$v\cdot S(p,w)\,v\le 0 \quad\text{for any } v\in\mathbb{R}^L$$
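To see the proposition at work, we can build the substitution matrix numerically and evaluate the quadratic form $v\cdot S(p,w)\,v$ for a few vectors $v$. The sketch below assumes a Cobb-Douglas demand and approximates the derivatives by central finite differences. The demand function, the step size `h` and the numbers are assumptions for illustration, not part of the proposition.

```python
def demand(p, w, a=0.3):
    """Walrasian demand for the assumed utility x1**a * x2**(1 - a)."""
    return [a * w / p[0], (1 - a) * w / p[1]]


def slutsky_matrix(p, w, h=1e-6):
    """Finite-difference S with s_lk = dx_l/dp_k + (dx_l/dw) * x_k."""
    n = len(p)
    x = demand(p, w)
    dx_dw = [(up - dn) / (2 * h)
             for up, dn in zip(demand(p, w + h), demand(p, w - h))]
    S = [[0.0] * n for _ in range(n)]
    for k in range(n):
        p_up, p_dn = list(p), list(p)
        p_up[k] += h
        p_dn[k] -= h
        x_up, x_dn = demand(p_up, w), demand(p_dn, w)
        for l in range(n):
            S[l][k] = (x_up[l] - x_dn[l]) / (2 * h) + dx_dw[l] * x[k]
    return S


def quad_form(S, v):
    """The quadratic form v · S v."""
    n = len(v)
    return sum(v[l] * S[l][k] * v[k] for l in range(n) for k in range(n))


S = slutsky_matrix([2.0, 1.0], 100.0)
for v in ([1, 0], [0, 1], [1, 1], [1, -1], [2, 1]):
    print(v, round(quad_form(S, v), 4))
```

Every quadratic form is nonpositive (up to numerical error), and it is approximately zero when $v$ is proportional to the price vector, here $v=(2,1)$.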
The fact that the substitution matrix is negative semidefinite implies that all terms in the main diagonal of the matrix must be nonpositive. In particular, the terms in the main diagonal are $s_{ll}(p,w)$, representing the own-price substitution effect, i.e., how the quantity demanded of good $l$ is affected by a compensated change in the price of good $l$.
10

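As described above, the substitution effect $s_{ll}(p,w)$ embodies two effects:
$$\underbrace{s_{ll}(p,w)}_{\text{substitution effect }(-)} = \underbrace{\frac{\partial x_l(p,w)}{\partial p_l}}_{\substack{\text{total effect}\\ (-)\text{ for usual goods}\\ (+)\text{ for Giffen goods}}} + \underbrace{\frac{\partial x_l(p,w)}{\partial w}\,x_l(p,w)}_{\substack{\text{income effect}\\ (+)\text{ for normal goods}\\ (-)\text{ for inferior goods}}}$$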


We can rearrange this expression in order to state that the total effect is equal to the substitution effect minus the income effect. [Recall that when the quantity demanded of good $l$ decreases (increases) in the price of good $l$, we refer to that good as usual (Giffen, respectively). Similarly, when the quantity demanded of good $l$ increases (decreases) in wealth, we refer to that good as normal (inferior, respectively).] Let us describe each of the terms in the above expression. First, from our previous discussion, we note that the substitution effect is nonpositive for all types of goods, since $s_{ll}(p,w)\le 0$. Second, the total effect measures the change in the quantity demanded of good $l$ as a result of a change in the price of good $l$. Hence, it captures an uncompensated price effect, given that it reflects a change in the price of good $l$ without adjusting the wealth of the individual. Finally, the third term measures the wealth effect, since it measures the change in demand due to the adjustment in wealth implied by the Slutsky wealth compensation.
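The rearrangement mentioned above can be written out explicitly. The second part of the block is a worked example that assumes a Cobb-Douglas demand for good one, $x_1(p,w)=a\,w/p_1$ with $0<a<1$; this functional form is an assumption chosen only for illustration.

```latex
\begin{align*}
\frac{\partial x_l(p,w)}{\partial p_l}
  &= s_{ll}(p,w) - \frac{\partial x_l(p,w)}{\partial w}\,x_l(p,w) \\[4pt]
% Worked example under the assumed demand x_1(p,w) = a w / p_1
\frac{\partial x_1}{\partial p_1} &= -\frac{a\,w}{p_1^2}, \qquad
\frac{\partial x_1}{\partial w}\,x_1 = \frac{a}{p_1}\cdot\frac{a\,w}{p_1} = \frac{a^2 w}{p_1^2} \\[4pt]
s_{11}(p,w) &= -\frac{a\,w}{p_1^2} + \frac{a^2 w}{p_1^2} = -\frac{a(1-a)\,w}{p_1^2} \le 0
\end{align*}
```

In this example the good is both usual (negative total effect) and normal (positive income effect), and the substitution effect is smaller in absolute value than the total effect.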
11

Let us next provide a more graphical intuition of the previous discussion. The following figure represents a reduction of the price of good one, where good one is considered normal. First, the reduction in the price of good one enlarges the set of feasible bundles for this consumer, which allows him to reach a higher indifference curve, at which he increases the amount of good one consumed from $x_1^0$ to $x_1^2$. We also include a graphical representation of the Walrasian demand associated with the above figure. The demand indicates that a decrease in the price of good one leads to an increase in the quantity demanded, i.e., a negatively sloped Walrasian demand curve (so the good is usual).

10
Importantly, note that the fact that the substitution matrix is negative semidefinite doesn't imply that this matrix is symmetric. As a remark, in section 3G of MWG we will see that if the utility function is continuous and represents a strictly convex preference relation that satisfies LNS, then the substitution matrix is symmetric. In many applications, these conditions are assumed and therefore the substitution matrix will be symmetric. However, note that in the case of preferences for perfect substitutes the indifference curves will not satisfy strict convexity, and we will not be able to use the results in section 3G in order to guarantee that the substitution matrix is symmetric.
11
We describe the concept of the Slutsky wealth compensation below. Recall from the lectures the difference between a Slutsky and a Hicksian wealth compensation. The former adjusts the consumer's wealth so that he can still afford the initial bundle before the price change, whereas the latter adjusts wealth so that the consumer can still reach the same utility level as before the price change (graphically, reaching the same indifference curve).


Figure #2.21
The increase in the quantity demanded of good one as a result of a decrease in its price represents what we referred to above as the total effect. Nonetheless, it might be interesting to disentangle the total effect into the substitution and income effects. The following figures replicate the reduction in the price of good one and the total effect. The figures also include the Slutsky wealth compensation. In particular, after the price change we reduce this consumer's wealth so that he can just afford the same consumption bundle that he was buying before the price change. Graphically, we do so by shifting the budget line after the price change inwards until it touches the initial bundle. The consumer can afford any consumption bundle along the new budget line. In the top figure, the consumer's optimal point involves buying only $x_1^3$ units of good one. In the figure at the bottom we replicate the Walrasian demand we found above, but we also include the so-called constant purchasing power demand curve, resulting from applying the Slutsky wealth compensation. Specifically, note that a given reduction in the price of good one produces a relatively small increase in the quantity demanded of good one when we hold the consumer's purchasing power constant (applying the Slutsky wealth compensation), but implies a relatively large increase in the quantity demanded of good one when we do not hold the consumer's purchasing power constant (in the case of the Walrasian demand).

Figure #2.22
In the following figure we replicate our previous figure, adding the so-called Hicksian demand (also referred to as the constant utility demand curve). The graph shows the Hicksian wealth compensation after the price change. The consumer's wealth level is adjusted so that he can reach his initial utility level (graphically, reaching the same indifference curve). In this graphical example, this implies a more significant wealth reduction than when we apply the Slutsky wealth compensation. Therefore, the new budget line after the wealth compensation is tangent to the initial indifference curve at a point where the consumer demands $x_1^1$ units of good one. In the bottom figure, the Hicksian demand curve reflects that, for a given decrease in the price of good one, the increase in the consumer's consumption of good one is heavily reduced relative to the Walrasian demand.

Figure #2.23

Let us now summarize our analysis of how the different types of demand for good one are affected by a decrease in the price of that good. First, a decrease in the price of good one increases the quantity demanded of that good solely due to the price effect when wealth is compensated (measured either by the Hicksian demand curve or by the constant purchasing power (CPP) demand curve). This increase is smaller than the increase in the quantity demanded measured by the Walrasian demand $x(p,w)$, since $x(p,w)$ captures both price and wealth effects. Second, the wealth compensation (a wealth reduction in our example) that maintains the original level of utility of the individual is larger than the wealth compensation that maintains purchasing power unaltered.
12

Let us now briefly review the Slutsky equation, and the classification of goods as normal, inferior, or
Giffen. The following three figures depict a decrease in the price of the good in the horizontal axis (food).
In the first figure the substitution effect moves in the opposite direction as the price change (so a
reduction in the price of food implies a positive substitution effect), and the income effect is also positive
indicating that this good is normal.

Figure #2.24
In the second figure (below), the substitution effect is still moving in the opposite direction as price, but the income effect is now negative, partially offsetting the increase in the quantity demanded associated with the substitution effect. Nonetheless, the total effect is still positive. In this case, food is an inferior good since the income effect is negative.

12
Reality check: note that, as an undergrad, you have been using the Hicksian wealth compensation when examining income and substitution effects. However, the Slutsky equation (and the Slutsky matrix) is obtained from applying the Slutsky wealth compensation. An interesting practice is to redo the graphical representation of the income and substitution effects of your intermediate microeconomics textbook applying the Slutsky wealth compensation rather than the (probably used) Hicksian wealth compensation.


Figure #2.25
Finally, in the third figure the income effect is also negative and sufficiently large to completely offset the substitution effect. As a consequence, the total effect becomes negative. [Graphically, note that the first two figures generate a negatively sloped Walrasian demand curve, while the third figure produces a positively sloped Walrasian demand curve.]
13


Figure #2.26

13
For a readable first approach to income and substitution effects, see NS pp. 141-158.

Comparing the second and third figures we can conclude that an inferior good doesn't necessarily have to
be a Giffen good, as shown in the second figure.
In our preceding discussions of consumer theory we considered an individual decision-maker who solves his utility maximization problem (UMP) by choosing a bundle that maximizes his utility level subject to his budget constraint. Alternatively, we can understand the process by which the consumer chooses an optimal consumption bundle as an optimization problem in which the consumer minimizes his total expenditure on goods and services subject to the constraint that he wants to reach a given utility level. We refer to this minimization problem as the expenditure minimization problem (EMP). More formally,
Expenditure minimization problem:
$$\min_{x\ge 0}\; p\cdot x \quad\text{s.t. } u(x)\ge u$$

As the following figure indicates, the EMP can be understood as a problem in which the consumer wants
to reach a utility level associated with a particular indifference curve, while spending as little as possible
(shifting the budget line towards the origin). Intuitively, note that the budget line strictly above bundle X*
cannot be a solution to the EMP since, despite reaching the utility level U, it does not minimize total
expenditure of this consumer, given that there is another budget line for which total expenditure is lower
and utility level U is still reached. Finally, the budget line strictly below X* cannot be the solution to the
EMP problem since, despite being very cheap, it does not satisfy the constraint of reaching utility level
U.
14


Similarly to the UMP, we can set up the Lagrangian associated with this minimization problem, and then apply first-order (necessary) conditions with respect to each good and the Lagrange multiplier.
Setting up the Lagrangian:

Focusing on interior solutions, the above first order conditions are satisfied with equality, therefore:
Solving the EMP:

14
Assuming college students need a certain amount of groceries to survive the week, a student may choose a bundle above x*, in which case he is spending more than necessary (expensive, name-brand items). However, when choosing a bundle below x*, the student does not reach his required level of utility, because he is buying all the cheapest brands and not getting his favorite cereal.


At the optimal bundle, x*, the slope of indifference curve = slope of budget line.
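The steps above can be sketched as follows. This is one standard way of writing the problem, assuming that the utility function is differentiable and denoting the Lagrange multiplier by $\mu$ (a symbol chosen here for illustration):

```latex
\begin{align*}
\mathcal{L}(x,\mu) &= p\cdot x + \mu\,[\,u - u(x)\,] \\[4pt]
\frac{\partial\mathcal{L}}{\partial x_k} &= p_k - \mu\,\frac{\partial u(x^*)}{\partial x_k} \ge 0,
  \quad\text{with equality if } x_k^*>0 \\[4pt]
\frac{\partial\mathcal{L}}{\partial\mu} &= u - u(x^*) = 0 \\[4pt]
\text{Interior solution, goods } k \text{ and } l:\quad
\frac{p_k}{p_l} &= \frac{\partial u(x^*)/\partial x_k}{\partial u(x^*)/\partial x_l}
\end{align*}
```

The last line states that the price ratio (slope of the budget line) equals the marginal rate of substitution (slope of the indifference curve) at $x^*$.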
The bundle that solves the EMP is denoted as the Hicksian demand, h(p,u), which is a function of the price vector and the utility level that the individual wants to reach.
15

Hicksian demand satisfies several properties. In particular, when the utility function is continuous and represents a preference relation satisfying LNS, the Hicksian demand associated with the EMP satisfies:
1. Homogeneity of degree zero in prices:
$h(p,u)=h(\alpha p,u)$ for any $p$, $u$ and $\alpha>0$. This property hence states that if bundle x* is a solution to the EMP when facing a price vector $p$, then it must also be a solution to the problem when all prices have been scaled by the factor $\alpha$. Intuitively, a common change in all prices doesn't alter the slope of the consumer's budget line. The following figure provides a graphical representation. First, note that an increase in the prices of both good one and good two in the same proportion produces a downward shift in the budget line. However, the consumer must reach utility level $u$ in order to satisfy the constraint of the EMP. Hence, he will need to spend more in order to buy bundle x* at the new prices. As a consequence, he selects the same optimal bundle before and after the price change.
16 17

2. No excess utility:
For any optimal consumption bundle x* that solves the EMP, the utility level satisfies $u(x^*)=u$. The following figure provides a graphical intuition of this property. Suppose, by contradiction, that a bundle $x'$ solves the EMP and gives the consumer a utility level of $u(x')=u^1>u$. That is, he obtains a utility level higher than the one he must reach when solving his EMP. But then we can find another bundle $x''$ by scaling down $x'$, $x''=\alpha x'$, very close to $x'$ ($\alpha<1$ but close to 1), for which the utility level is still larger than $u$, $u(x'')>u$. [Graphically, this implies that the indifference curve passing through bundle $x''$ is associated with a utility level higher than $u$.] Therefore, bundle $x''$ exceeds the minimal utility level that the consumer must reach in his EMP and it is cheaper than the original bundle $x'$. Concluding, for a given utility level

15
As a formality, note that when the EMP has more than one solution we write that x* belongs to h(p,u), whereas in the case in which the EMP has only one solution, we write x*=h(p,u).
16
As a remark, note that we define homogeneity of degree zero for all alpha > 0. This implies that an argument similar to the one developed here for a price increase in both goods one and two (alpha > 1) can be extended to a price decrease in both goods as well (alpha < 1).
17
On a weekly basis, you must have 2 sodas and 1 candy bar in order to function. Each of those items costs $1 in the vending machine. Therefore, you spend $3 each week to satisfy your needs, x*. If prices were to increase by 50%, each would now cost $1.50. With only $3 in your pocket, your budget constraint must shift downward. In order to reach your original indifference curve, you must now spend $4.50 per week in order to consume your ideal bundle, x*.

U that the consumer has to reach in the EMP, bundle h(p,u) does not exceed U since otherwise he
would be able to find a cheaper bundle that exactly reaches utility level U.
18



Figure #2.27

3. Convexity:
If the preference relation is convex, then h(p,u) is a convex set, which may contain multiple bundles that solve the EMP. (This property is also related to the uniqueness property.)
4. Uniqueness:
If the preference relation is strictly convex (a more restrictive concept than convexity), then h(p,u) contains a single element. The following figures describe properties 3 and 4.
19

18
Example of soda and candy bar continued: If you fulfill your necessary soda/candy ratio, but still have
some loose change in your pocket, there is the opportunity to reach a higher utility by purchasing
additional items. As a rational consumer, your goal is to maximize your happiness each week, therefore,
you must spend all the money in your pockets.

19
Example 3E1 in MWG provides an explanation about how to find the Hicksian demand for a Cobb-Douglas
utility function. It is simple to check for homogeneity of degree zero in prices, no excess utility and uniqueness.


Figure #2.28

5. Compensated law of demand:
For any two price vectors $p$ and $p'$,
$$(p'-p)\cdot[h(p',u)-h(p,u)]\le 0$$
Implication: for every good $k$, when only the price of good $k$ changes,
$$(p'_k-p_k)\,[h_k(p',u)-h_k(p,u)]\le 0$$

Interestingly, note that this property always holds for the Hicksian demand, where we have already compensated the wealth level of the consumer so that he can still reach his initial utility level. However, this property is not necessarily true for the uncompensated Walrasian demand, as described in previous lectures.
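Some of the properties listed above can be checked numerically when the Hicksian demand has a closed form. The sketch below assumes a Cobb-Douglas utility function $u(x_1,x_2)=x_1^a x_2^{1-a}$ with $a=0.5$. The utility function, the function names and the price vectors are assumptions for illustration only.

```python
import math

A = 0.5  # Cobb-Douglas exponent (an assumption of this sketch)


def utility(x, a=A):
    return x[0] ** a * x[1] ** (1 - a)


def hicksian(p, u, a=A):
    """Closed-form Hicksian demand for u(x1, x2) = x1**a * x2**(1 - a)."""
    ratio = a * p[1] / ((1 - a) * p[0])
    return (u * ratio ** (1 - a), u * ratio ** (-a))


p, u = (4.0, 1.0), 10.0
h = hicksian(p, u)

# Property 1: doubling all prices leaves the Hicksian demand unchanged.
h_scaled = hicksian((8.0, 2.0), u)
homogeneous = all(math.isclose(x, y) for x, y in zip(h, h_scaled))

# Property 2: the expenditure-minimizing bundle reaches exactly utility u.
no_excess = math.isclose(utility(h), u)

# Property 5: (p' - p) · [h(p', u) - h(p, u)] <= 0.
p_new = (1.0, 1.0)
h_new = hicksian(p_new, u)
law = sum((pn - po) * (hn - ho) for pn, po, hn, ho in zip(p_new, p, h_new, h))

print(h, homogeneous, no_excess, law)
```

In this example the price of good one falls from 4 to 1, the compensated demand for good one rises, and the inner product in property 5 is negative.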
Plugging the result of the EMP into the objective function, we obtain the value function of this
optimization problem
$$p\cdot h(p,u)=e(p,u)$$
$e(p,u)$ represents the minimum expenditure that the consumer needs to make in order to reach utility level $u$ when prices are $p$. This expenditure function also satisfies a set of interesting properties when the utility function is continuous and represents a preference relation that satisfies LNS:
1. Homogeneity of degree one in prices.
Intuitively, we know that the optimal bundle of the EMP, $h(p,u)$, is not changed when all prices change in the same proportion (because the Hicksian demand satisfies homogeneity of degree zero in prices). Such a change in prices just makes it more or less expensive to buy the same bundle. That is, $e(\alpha p,u)=\alpha\,e(p,u)$.
2. Strictly increasing in $u$.
Intuitively, reaching a higher utility level for a given price vector requires an increase in your expenditure, as the following figure indicates.


Figure #2.29

3. Non-decreasing in prices, for any good $k$.
That is, higher prices mean higher expenditure in order to reach a given utility level. The following slide provides a more detailed explanation of this property:

Note that we only increase the price of good $k$ (from $p'_k$ to $p''_k$), leaving the prices of all other goods unaltered. Then the minimal expenditure that the consumer must make in order to reach utility $u$ at prices $p''$, $p''\cdot x''$, is weakly higher than what he would spend when buying the same expenditure-minimizing bundle $x''$ at the lower prices $p'$, $p'\cdot x''$. Similarly, this expenditure is weakly larger than his minimal expenditure at prices $p'$, $p'\cdot x'$. (Indeed, at prices $p'$, bundle $x'$ is weakly cheaper than any other bundle that reaches utility $u$, such as $x''$.) Hence, $e(p'',u)=p''\cdot x''\ge p'\cdot x''\ge p'\cdot x'=e(p',u)$.
4. Concave in prices.
We provide a graphical intuition of this property in the following figure.


Figure #2.30
Let $\bar{x}$ denote the expenditure-minimizing bundle at prices $\bar{p}=\alpha p'+(1-\alpha)p''$, where $\alpha\in[0,1]$. Starting from bundle $x'$, we know that the total expenditure at prices $p'$ is lower when buying bundle $x'$ than when buying any other bundle that reaches utility $u$. For instance, this is true for bundle $\bar{x}$, since $p'\cdot x'\le p'\cdot\bar{x}$. Similarly, at prices $p''$, the total expenditure when buying bundle $x''$ is lower than when buying any other such bundle, for instance $\bar{x}$, so that $p''\cdot x''\le p''\cdot\bar{x}$. Using the results found above for prices $p'$ and $p''$, we obtain
$$\begin{aligned}
\alpha\,p'\cdot x'+(1-\alpha)\,p''\cdot x'' &\le \alpha\,p'\cdot\bar{x}+(1-\alpha)\,p''\cdot\bar{x} = [\alpha p'+(1-\alpha)p'']\cdot\bar{x} = \bar{p}\cdot\bar{x}\\
\alpha\,e(p',u)+(1-\alpha)\,e(p'',u) &\le e(\bar{p},u)
\end{aligned}$$
so concavity is confirmed.




5. Continuous in prices and utility.
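The first four properties of the expenditure function can be verified numerically in a simple case. The sketch below assumes the Cobb-Douglas utility $u(x_1,x_2)=x_1^a x_2^{1-a}$, for which the expenditure function has the closed form $e(p,u)=u\,(p_1/a)^a\,(p_2/(1-a))^{1-a}$. The functional form and the price vectors are assumptions for illustration.

```python
import math


def expenditure(p, u, a=0.5):
    """e(p, u) for the assumed utility x1**a * x2**(1 - a)."""
    return u * (p[0] / a) ** a * (p[1] / (1 - a)) ** (1 - a)


u = 10.0
p1, p2 = (4.0, 1.0), (1.0, 4.0)   # hypothetical price vectors p' and p''
alpha = 0.5
p_bar = tuple(alpha * x + (1 - alpha) * y for x, y in zip(p1, p2))

checks = {
    "homogeneous of degree one": math.isclose(expenditure((8.0, 2.0), u),
                                               2 * expenditure(p1, u)),
    "strictly increasing in u": expenditure(p1, 11.0) > expenditure(p1, u),
    "non-decreasing in p_1": expenditure((5.0, 1.0), u) >= expenditure(p1, u),
    "concave in prices": (alpha * expenditure(p1, u)
                          + (1 - alpha) * expenditure(p2, u)
                          <= expenditure(p_bar, u)),
}
print(checks)
```

A numerical check at a few points does not prove the properties, but it is a quick way to detect an algebra mistake when deriving an expenditure function by hand.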






Duality
In previous lectures we have described both the utility maximization problem (UMP), the so-called primal problem in consumer theory, and its dual: the expenditure minimization problem (EMP). But when can we guarantee that the solutions to both problems coincide?
Rubinstein provides a very intuitive approach to this question. Consider a function M(t) that describes the
distance that a turtle travels in time t, as the following figure illustrates.

Figure #2.31
In this case the primal and dual problems would provide us with the same answer: the maximal distance that a turtle can travel in t* units of time is x* (in this case we are moving from the time axis to the distance axis), or alternatively, the minimal time that a turtle needs to travel distance x* is t* (here we are moving from the distance axis to the time axis). Note that in order to obtain the same answer from both statements we critically need the function M(t) to satisfy monotonicity. Otherwise, we would have a figure like the one below. In particular, the maximal distance traveled after both t1 and t2 is x*, so that time, t, is not a function of distance, x.
20

Figure #2.32

20
Note that this figure is extremely counterintuitive since the turtle increases the distance traveled until reaching a
point from which the turtle starts to travel backwards, inducing a decrease in the distance traveled.

Similarly, we also require that the function M(t) satisfies continuity. Otherwise, the function would show a pattern similar to that depicted below. Specifically, the minimum time required to travel distance x1 and x2 is t* for both distances.
21



Figure #2.33

Given the conditions of monotonicity and continuity, we are now ready to specify under which conditions we can guarantee that the solutions to the UMP and the EMP coincide: let the utility function be monotonic and continuous. Then, if bundle x* is the solution to the UMP, it must also be the solution to the EMP when the required utility level is u(x*).
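This duality result can be illustrated numerically. The sketch below assumes a Cobb-Douglas utility function (monotonic and continuous on the interior of the consumption set) and uses the closed forms of its Walrasian and Hicksian demands. The functional form, function names and numbers are assumptions for illustration.

```python
import math

A = 0.5  # Cobb-Douglas exponent (an assumption of this sketch)


def utility(x, a=A):
    return x[0] ** a * x[1] ** (1 - a)


def walrasian(p, w, a=A):
    """Solution to the UMP."""
    return (a * w / p[0], (1 - a) * w / p[1])


def hicksian(p, u, a=A):
    """Solution to the EMP."""
    ratio = a * p[1] / ((1 - a) * p[0])
    return (u * ratio ** (1 - a), u * ratio ** (-a))


p, w = (2.0, 1.0), 100.0
x_star = walrasian(p, w)          # UMP solution
u_star = utility(x_star)          # utility level reached in the UMP
h_star = hicksian(p, u_star)      # EMP solution when the target utility is u_star
spent = sum(pi * hi for pi, hi in zip(p, h_star))

same_bundle = all(math.isclose(a, b) for a, b in zip(x_star, h_star))
print(same_bundle, math.isclose(spent, w))
```

The bundle that maximizes utility given wealth $w$ is also the cheapest way of reaching the utility level it generates, and the minimal expenditure equals $w$.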
Let us now more formally approach the issue of duality in consumption. In this regard, we first need a
few definitions.
Hyperplane: for some $p\in\mathbb{R}^L_+$ and $c\in\mathbb{R}$, the set of points in $\mathbb{R}^L$ such that:
$$\{x\in\mathbb{R}^L : p\cdot x=c\}$$
Half-space: is the set of bundles $x$ for which $p\cdot x\ge c$. That is:
$$\{x\in\mathbb{R}^L : p\cdot x\ge c\}$$

The following figure depicts a hyperplane and a half-space in $\mathbb{R}^2$. First, note that the hyperplane is simply represented by the set of bundles whose cost is exactly $c$. (Note that this definition could be generalized to more dimensions, indicating the consumption of more than two goods. For instance, if the consumer is considering bundles with three different goods, the hyperplane would be a flat surface in $\mathbb{R}^3$.) On the other hand, the half-space represents the set of bundles whose cost is at least $c$.

21
Again, this is a very counterintuitive turtle since, after traveling a distance x1, the turtle makes a huge jump from x1 to x2 in a nanosecond.


Figure #2.34
We are now ready to state the separating hyperplane theorem.

For every convex and closed set $K$, there is a half-space containing $K$ and excluding any given point $\bar{x}\notin K$ outside of this set. That is, there exist $p\in\mathbb{R}^L_+$ and $c\in\mathbb{R}$ such that $p\cdot x\ge c$ for all elements in the set, $x\in K$, but $p\cdot\bar{x}<c$ for the element outside the set, $\bar{x}\notin K$.

Intuitively, this theorem states that every convex and closed set $K$ can be equivalently described as the intersection of the half-spaces that contain it. Indeed, as the following figure indicates, as we create more and more half-spaces, their intersection becomes set $K$.

Figure #2.35
A natural question at this point is what would happen if the set that we are trying to equivalently describe is not convex. As the following figure suggests, the intersection of half-spaces doesn't coincide with set $K$, and we cannot use several half-spaces to equivalently describe set $K$. Interestingly, the intersection of the half-spaces that contain set $K$ is the smallest closed, convex set that contains $K$. This set is known as the convex hull of set $K$, and denoted as $\bar{K}$. (Of course, the convex hull $\bar{K}$ is itself convex, unlike set $K$.)

Figure #2.36
Let us now introduce an additional definition that will be used frequently in our future expositions. We define the support function of the nonempty closed set $K$ as
$$\mu_K(p)=\inf_x\{p\cdot x\}\quad\text{for all } x\in K \text{ and } p\in\mathbb{R}^L$$

Intuitively, the support function selects, for a given price vector $p$, the bundle $x$ in $K$ that minimizes the total expenditure $p\cdot x$, and reports that minimal expenditure.
22
Using the support function as defined above, we can now reconstruct set $K$. In particular, for every price vector $p$, we can define a half-space whose boundary is given by the support function of set $K$. That is, we define the set of bundles for which $p\cdot x\ge\mu_K(p)$. [Note that every such half-space contains all elements of set $K$; it is their intersection that leaves out the elements outside set $K$.]
23

Therefore, the intersection of the half-spaces generated by all possible values of $p$ describes (reconstructs) the set $K$. That is, set $K$ can be described by all those bundles $x$ such that $p\cdot x\ge\mu_K(p)$ for every $p$:
$$K=\{x\in\mathbb{R}^L : p\cdot x\ge\mu_K(p)\text{ for every } p\}$$
By the same logic, if $K$ is not convex, then the set
$$\{x\in\mathbb{R}^L : p\cdot x\ge\mu_K(p)\text{ for every } p\}$$
defines the smallest closed, convex set containing $K$ (i.e., the convex hull of set $K$).

We describe the previous intuitions in the following figure. First, note that for a given price vector, the support function $\mu_K(p)$ selects the element in the set $K$ that minimizes total expenditure $p\cdot x$. This element is bundle $x^1$ in this graphical example. Second, we can now define the half-space of the previous hyperplane as follows: $p\cdot x\ge p\cdot x^1$ for all $x$ in $K$, where $p\cdot x^1=\mu_K(p)$. Graphically, this inequality identifies all bundles to the left of the hyperplane $p\cdot x^1$. Third, we can repeat the previous procedure for any other bundle (for example, a bundle $x^3$ on the north boundary of set $K$). Repeating this process enough times provides a full description of set $K$.

22
Remember that, from a mathematical point of view, we use the inf operator when we cannot guarantee that the min of a particular function is well defined. Generally, however, most of the functions we encounter in this course have a well-defined min and max.
23
Concavity of the support function is an interesting mathematical result. You will prove it in a homework assignment.

Figure #2.37
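The reconstruction of a set from its support function can be sketched numerically in $\mathbb{R}^2$. The example below assumes a finite (hence non-convex) set $K$ and tests the inequality $p\cdot x\ge\mu_K(p)$ on a grid of directions $p$, so it only approximates the intersection over all $p$. The set, the grid size and the function names are assumptions for illustration.

```python
import math


def support(K, p):
    """mu_K(p): the smallest value of p·x over the (finite) set K."""
    return min(p[0] * x[0] + p[1] * x[1] for x in K)


def in_all_half_spaces(x, K, n_dirs=360, tol=1e-9):
    """Check p·x >= mu_K(p) on a grid of directions p (an approximation)."""
    for i in range(n_dirs):
        theta = 2 * math.pi * i / n_dirs
        p = (math.cos(theta), math.sin(theta))
        if p[0] * x[0] + p[1] * x[1] < support(K, p) - tol:
            return False
    return True


K = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0)]       # a non-convex (finite) set
inside_hull = in_all_half_spaces((0.5, 0.5), K)   # inside the triangle spanned by K
outside_hull = in_all_half_spaces((2.0, 2.0), K)  # outside of it
print(inside_hull, outside_hull)
```

The point (0.5, 0.5) does not belong to $K$, yet it satisfies every inequality, because it belongs to the convex hull of $K$. The point (2, 2) is excluded by at least one half-space.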
The above definition of the support function provides us with a useful duality theorem that we will use in the future. Consider a nonempty, closed set $K$, and let $\mu_K(\cdot)$ be its support function. Then there is a unique element in set $K$, $\bar{x}$, such that
$$\bar{p}\cdot\bar{x}=\mu_K(\bar{p}) \iff \mu_K(\cdot)\text{ is differentiable at }\bar{p}$$
Moreover, in this case,
$$\nabla\mu_K(\bar{p})=\bar{x},\quad\text{i.e., } \frac{\partial\mu_K(\bar{p})}{\partial p_k}=\bar{x}_k$$

Intuitively, note that the above theorem simply states that, for a given price vector $\bar{p}$, the support function chooses the bundle $\bar{x}$ that minimizes total expenditure, and that such expenditure is therefore $\bar{p}\cdot\bar{x}$. In addition, the derivative of total expenditure with respect to price (when evaluated at the optimum) is $\bar{x}$. We will use this theorem in our discussion of the expenditure function below.

Relationships between the expenditure function and Hicksian demand
Let us assume that the utility function is continuous and represents a preference relation that satisfies LNS and that is strictly convex. Then for all $p$ and $u$, we have
$$\frac{\partial e(p,u)}{\partial p_k}=h_k(p,u)\quad\text{for every good } k$$

The identity tells us that, if we want to find the Hicksian demand for good k and we have information
about the expenditure function, we just need to differentiate e(p,u) with respect to the price of good k.
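As a quick numerical illustration of this identity, the sketch below uses a Cobb-Douglas utility $u(x,y) = x^{0.5} y^{0.5}$ (the functional form used in an example later in these notes), for which the Hicksian demands are known in closed form. The function names and the finite-difference step are assumptions made only for this example.

```python
import math


def hicksian(px, py, u):
    """Closed-form Hicksian demands for u(x, y) = x**0.5 * y**0.5."""
    return u * math.sqrt(py / px), u * math.sqrt(px / py)


def expenditure(px, py, u):
    """e(p, u) = p . h(p, u), the minimal cost of reaching utility level u."""
    hx, hy = hicksian(px, py, u)
    return px * hx + py * hy


def d_expenditure(px, py, u, good, step=1e-6):
    """Central finite-difference estimate of the derivative of e with respect to p_k."""
    if good == "x":
        return (expenditure(px + step, py, u) - expenditure(px - step, py, u)) / (2 * step)
    return (expenditure(px, py + step, u) - expenditure(px, py - step, u)) / (2 * step)


px, py, u = 2.0, 3.0, 5.0
hx, hy = hicksian(px, py, u)
print("de/dpx:", d_expenditure(px, py, u, "x"), " h_x:", hx)
print("de/dpy:", d_expenditure(px, py, u, "y"), " h_y:", hy)
```

The two printed pairs should agree up to the finite-difference error, so differentiating the expenditure function recovers the Hicksian demand of each good.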

Proof I (using the Duality Theorem)
First note that the expenditure function is the support function for the set of all bundles for which utility
reaches at least a level u. That is, it is the support function of the set
$$\{ x \in \mathbb{R}^L_+ : u(x) \geq u \},$$
or, alternatively, of the upper contour set of bundle x, which is convex and closed. If we use the duality theorem, we
can then state that there is a unique bundle in this set, h(p,u), such that
$$p \cdot h(p,u) = e(p,u),$$
where e(p,u) is the support function of this problem. Let us see this result in the following figure.

Figure #2.38
First, note that the upper contour set of bundle x is indeed closed and convex. We can then identify the
support function of this set as the hyperplane associated with the lowest cost among elements that still belong
to the upper contour set. In particular, this occurs with hyperplane $p \cdot x^* = e(p,u)$, which provides us with the
minimal expenditure that still reaches utility level u.
We can more formally see the extreme similarity between the duality theorem and the main result of the
EMP as follows.

Duality Theorem. If K is closed and $\mu_K(\cdot)$ is its support function, then there is a unique element $\bar{x} \in K$ such that
$$\bar{p} \cdot \bar{x} = \mu_K(\bar{p}) \iff \mu_K(\cdot) \text{ is differentiable at } \bar{p}.$$
In the EMP, set K is the upper contour set (UCS), $\bar{x}$ is $h(p,u)$, and $\mu_K(\cdot)$ is $e(p,u)$, so that
$$p \cdot h(p,u) = e(p,u) \iff e(\cdot,u) \text{ is differentiable at } p.$$
Moreover, $\nabla \mu_K(\bar{p}) = \bar{x}$ becomes
$$\frac{\partial e(p,u)}{\partial p_k} = h_k(p,u).$$

Moreover, note that from the duality theorem the gradient of the support function coincides with this
unique bundle (the expenditure-minimizing bundle). That is,
$$\frac{\partial e(p,u)}{\partial p_k} = h_k(p,u) \text{ for every good } k, \quad \text{i.e.,} \quad \nabla_p e(p,u) = h(p,u).$$

Proof II - using first order conditions

Proof III - using the envelope theorem [24]

Let us first understand the economic intuition behind the envelope theorem.

[24] For an intuitive description of the envelope theorem from an economic point of view, see NS pp. 32-36.


Figure #2.39
Using the envelope theorem in the expenditure function we easily obtain:
$$\frac{\partial e(p,u)}{\partial p_k} = \frac{\partial [p \cdot h(p,u)]}{\partial p_k},$$
where
$$\frac{\partial [p \cdot h(p,u)]}{\partial p_k} = h_k(p,u) + p \cdot \frac{\partial h(p,u)}{\partial p_k},$$
and since the Hicksian demand is already at the optimum, indirect effects are negligible, $p \cdot \frac{\partial h(p,u)}{\partial p_k} = 0$, implying
$$\frac{\partial e(p,u)}{\partial p_k} = h_k(p,u).$$
Note that this result is convenient from a practical point of view. In particular, if the researcher does not
know the actual expression of the expenditure function, or if its expression is relatively intractable, this
result states that the researcher can still measure the reaction of a consumer's minimal expenditure to
changes in prices by knowing the Hicksian demand.

Relationship between the Walrasian and Hicksian demand
Consider a continuous utility function, representing a strictly convex preference relation that satisfies
LNS. Then for all p, w and u=v(p,w), we have
$$\frac{\partial h_l(p,u)}{\partial p_k} = \frac{\partial x_l(p,w)}{\partial p_k} + \frac{\partial x_l(p,w)}{\partial w} x_k(p,w) \quad \text{for all goods } l \text{ and } k.$$
Importantly, its expression coincides with the $s_{lk}(p,w)$ that we discussed in our explanation of the Slutsky
equation. Therefore, the matrix of partial derivatives of the Hicksian demand, $D_p h(p,u)$, coincides with
the Slutsky matrix. [25] Let us further consider the relationship between these two demand curves. Consider
a consumer facing prices and wealth $(\bar{p},\bar{w})$ and attaining a utility level $\bar{u}$.
Note that $\bar{w} = e(\bar{p},\bar{u})$. In addition, we know that for any (p,u), $h_l(p,u) = x_l(p, e(p,u))$.
Differentiating this expression with respect to $p_k$, and evaluating it at $(\bar{p},\bar{u})$, we get
$$\frac{\partial h_l(\bar{p},\bar{u})}{\partial p_k} = \frac{\partial x_l(\bar{p},e(\bar{p},\bar{u}))}{\partial p_k} + \frac{\partial x_l(\bar{p},e(\bar{p},\bar{u}))}{\partial w} \frac{\partial e(\bar{p},\bar{u})}{\partial p_k}.$$
Since $\partial e(\bar{p},\bar{u})/\partial p_k = h_k(\bar{p},\bar{u}) = x_k(\bar{p},\bar{w})$, this expression yields the result stated above.

[25] Recall that both of these matrices have dimension L x L, since they reflect both own- and cross-price effects among
all goods.

Application of IE and SE: The consumer as a labor supplier
In this section we apply our analysis of the income and substitution effects to an individual's decision
about how many hours to work. In particular, this individual enjoys consumption of all other goods, x (a
vector of K different goods), and leisure hours, L. Thus, his UMP can be expressed as follows
$$\max_{x,L} \; u(x,L) \quad \text{s.t.} \quad \sum_{i=1}^{K} p_i x_i \leq M = wz + \bar{M} \quad \text{and} \quad T = z + L,$$
where M is the individual's total wealth coming from two sources: the z hours he dedicates to work (paid
at a wage rate of w per hour) and his non-labor income, $\bar{M}$, e.g., inheritance, government subsidies, etc.
(Note that his total time, T, must be dedicated to either work, z, or leisure, L.)
We can rewrite the above UMP using the Composite Commodity Theorem, as follows: if the prices of all
goods maintain a constant proportion with respect to the price of labor (wage), i.e., $p_1 = \alpha_1 w$,
$p_2 = \alpha_2 w$, ..., then we can represent these goods by a single composite commodity, y, with price p. This is useful
because, when we examine many goods, the relationships between the demands for all of them become very
complicated. Grouping goods into a composite commodity lets us examine one good against all the
others. We therefore collapse the above UMP to only two goods: the
composite commodity y and the number of hours dedicated to work z. That is, we can rewrite the above
UMP as follows:
$$\max_{y,z} \; v(y,z) \quad \text{s.t.} \quad py \leq wz + \bar{M}$$

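The Lagrangian associated to this UMP is
$$\mathcal{L} = v(y,z) + \lambda (\bar{M} + wz - py)$$
and the FOCs (for interior optimum) are
$$\frac{\partial \mathcal{L}}{\partial y} = v_y - \lambda p = 0 \implies \lambda = \frac{v_y}{p}$$
$$\frac{\partial \mathcal{L}}{\partial z} = v_z + \lambda w = 0 \implies \lambda = -\frac{v_z}{w}$$
and solving for $\lambda$ in both of them, we obtain
$$\underbrace{-\frac{v_z}{w}}_{\text{marginal disutility per dollar of labor}} = \underbrace{\frac{v_y}{p}}_{\text{marginal utility per dollar of good consumption}} \iff \underbrace{-\frac{v_z}{v_y}}_{MRS_{z,y}} = \frac{w}{p}$$
Using the constraint, we finally obtain the Walrasian demand for the composite commodity, $x_y(w,p,\bar{M})$, and the labor
supply function, $x_z(w,p,\bar{M})$.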
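To make the tangency condition concrete, here is a minimal sketch under an assumed utility function, $v(y,z) = \ln y + \ln(T - z)$, which is not taken from the notes; T, the parameter values and the function names are likewise illustrative. For this specification the interior solution has a closed form, and the code checks that the marginal rate of substitution equals w/p at the optimum.

```python
import math

T = 24.0  # total hours available (illustrative value)


def utility(y, z):
    """Assumed utility: increasing in consumption y, decreasing in hours worked z."""
    return math.log(y) + math.log(T - z)


def labor_choice(w, p, M):
    """Interior solution of max v(y, z) s.t. p*y = w*z + M (requires M < w*T)."""
    z = T / 2.0 - M / (2.0 * w)
    y = (M + w * z) / p
    return y, z


def mrs(y, z):
    """-v_z / v_y, using v_y = 1/y and v_z = -1/(T - z)."""
    return y / (T - z)


w, p, M = 10.0, 2.0, 40.0
y_star, z_star = labor_choice(w, p, M)
print("hours worked:", z_star, "composite good:", y_star)
print("MRS at the optimum:", mrs(y_star, z_star), " w/p:", w / p)
```

Moving along the budget line away from the computed bundle lowers utility, which is a simple way to confirm that the first order conditions identify a maximum for this utility function.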

The following figure illustrates the individual's optimal choice of an amount of the composite
commodity, y, and hours of work, z.


Figure #2.40
First, note that the budget line is represented by an upward sloping straight line. Intuitively, an increase in
the number of hours worked provides the individual with a larger amount of wealth to spend on
consumption goods, measured by the composite commodity on the vertical axis. [26] Second, note that this
individual's indifference curves show increasing utility levels as we move northwest, indicating that the
individual is better off when his consumption of the composite commodity increases and the number of
working hours decreases. For a starting wage rate w, this individual selects bundle A, where he works
$z_1$ hours. When the wage rate increases to $w_1$, the budget line becomes steeper (i.e., for every extra hour of
work the individual can afford a larger amount of the composite commodity). At this new wage rate $w_1$,
the individual chooses bundle B, spending $z_2$ hours working. The change from bundle A to B in the top
figure is also reflected in the bottom figure. Specifically, we represent working hours on the horizontal
axis and the wage rate on the vertical axis. For an increase in the wage rate from w to $w_1$, the number of
working hours increases from $z_1$ to $z_2$, indicating that a higher wage induces this worker to spend more
hours on the job. When the wage rate experiences a further increase, from $w_1$ to $w_2$, the budget line
becomes steeper in the top figure, and the individual selects bundle C, illustrating a reduction in the
number of working hours, from $z_2$ to $z_3$. This effect is also reflected in the bottom figure, where the
increase in the wage rate from $w_1$ to $w_2$ induces a reduction in the individual's labor supply.
Summarizing, labor supply initially increases as a result of higher wages but then decreases, acquiring a
backward bending pattern like that in the bottom figure. [27] This is due to the relative size of the substitution
and income effects associated with the increase in the wage rate (a change in the price of one of the goods
that the individual consumes), as we examine next. Intuitively, this also makes sense. For most goods, a
higher price induces a larger supply. In this case, however, the individual only has a fixed amount of time
to supply as hours of labor, so higher wages need not cause much of an increase in supply, and can quite
naturally cause supply to decrease.

[26] In particular, note that the budget line originates at $\bar{M}/p$, which indicates the amount of the composite commodity
he can afford when he does not obtain any resources from working. In addition, the positive slope of the individual's
budget line is given by the price ratio w/p. Also note that an increase in labor supplied must be accompanied by an
increase in the composite consumption commodity to keep utility constant in the top figure. Further, the indifference
curves reflect the fact that the utility function is quasiconcave. Review the GR handout on labor supply on the class
website for further details on this.

The following figure illustrates the substitution and income effects from a wage increase.

Figure #2.41

[27] This effect has been empirically confirmed in many occupations, such as nursing services in Massachusetts.
Experiencing a shortage in the number of nurses, managers of hospital facilities decided to increase the wage per
hour in order to attract more nurses. Unfortunately, the increase in wages was counterproductive, at least in the short
run, since it induced nurses currently working for those hospitals to reduce the number of hours they chose to work.

Using the same analysis as for two consumption goods, we make a wealth compensation after the price
change that leaves the consumer just as well off as before the price change. This is indicated in the figure
by the downward shift in the budget line after the price change, towards a new budget line that is tangent
to the indifference curve the individual reaches before the price change, $I_1$. Given this new budget line,
the individual selects bundle D. Thus, the substitution effect of an increase in the wage rate is measured
by an increase in the number of working hours, from $z_a$ to $z_d$, while the income effect is represented by
a decrease in the number of working hours, from $z_d$ to $z_b$. Intuitively, working hours become more
attractive (relative to the composite good) for the individual, leading him to offer more labor services, as
reflected in the substitution effect. This higher wage per hour, however, allows the individual to afford
more consumption without the need to work so many hours per day, which induces him to reduce his
working hours, as indicated by the negative income effect. Therefore, the income effect partially offsets
the substitution effect, leading to a relatively minor (but still positive) total effect. (Note that when the
income effect is significantly negative and, in absolute value, larger than the substitution effect, the total
effect of a higher wage rate becomes negative. In such a case, working hours decrease as a result of an
increase in the wage rate, behaving like a Giffen good. Another way of looking at this is that the worker is
now choosing to consume more of another good, free time: he has reached the point where his marginal
utility of free time outweighs his marginal utility of income.)

The figure also illustrates the compensating variation (CV) [28] associated with the wage increase. In
particular, after the wage increase the worker's wealth is compensated (reduced) so that the worker can
maintain his initial utility level (before the wage increase). Graphically, we do so by shifting the worker's
budget line downwards after the wage increase in a parallel fashion (maintaining the price ratio) until the
worker reaches his initial utility level. This wealth compensation represents the compensating variation.
Indeed, recall that the vertical intercept of the worker's initial budget line is $\bar{M}/p$, while the vertical
intercept of his final budget line is $M'/p$, where $M'$ represents the wealth level that the individual needs
in order to reach his initial utility level at the final price ratio, i.e., $M'/p < \bar{M}/p$. Therefore, the amount of
money that the individual is willing to give up in order to reach his initial utility level at the final price
ratio (the compensating variation) is $\bar{M}/p - M'/p$. Hence, the difference in the vertical intercepts, $\bar{M} - M'$,
represents the compensating variation if the price of all other goods is normalized to one.
After describing the income and substitution effects in the labor market from a graphical approach, we
next formalize these two effects using the Slutsky equation.

[28] Recall that the compensating variation is the amount of money which must be taken from the consumer in the new
situation to make him as well off as he was in the initial situation.

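First, let us state the previous problem as an EMP,
$$\min_{y,z} \; \bar{M} = py - wz \quad \text{s.t.} \quad v(y,z) = \bar{v}.$$
From this EMP we can find the optimal Hicksian demands, $h_y(w,p,v)$ and $h_z(w,p,v)$, and inserting them into the
objective function, we obtain the value function of this EMP (the expenditure function):
$$e(w,p,v) = p\, h_y(w,p,v) - w\, h_z(w,p,v),$$
where labor income enters with a negative sign because it reduces the non-labor wealth needed to reach utility level v.
We know that
$$x_z(w,p,e(w,p,v)) = h_z(w,p,v).$$
Differentiating both sides with respect to w and using the chain rule,
$$\frac{\partial x_z}{\partial w} + \frac{\partial x_z}{\partial e} \frac{\partial e}{\partial w} = \frac{\partial h_z}{\partial w} \implies \frac{\partial x_z}{\partial w} = \frac{\partial h_z}{\partial w} - \frac{\partial x_z}{\partial e} \frac{\partial e}{\partial w},$$
and since we know that $\frac{\partial e(w,p,v)}{\partial w} = -h_z(w,p,v)$, then
$$\frac{\partial x_z}{\partial w} = \frac{\partial h_z}{\partial w} + \frac{\partial x_z}{\partial e} h_z(w,p,v).$$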
c


Let us next interpret the above Slutsky equation in terms of the substitution and income effects in the labor
market. First, the term $\partial h_z / \partial w$ represents the substitution effect. It is always positive, indicating that an
increase in wages increases the worker's supply of labor (as long as we compensate the wealth of the
worker, so that his initial utility level is unaffected [29]). The second term, $(\partial x_z / \partial e)\, h_z(w,p,v)$, denotes
the income effect. When $\partial x_z / \partial e > 0$, an increase in wages makes the worker richer, and he decides
to work more. In this case, working hours are regarded as a normal good, and the income effect reinforces
the substitution effect, yielding an upward sloping labor supply curve. If, in contrast, $\partial x_z / \partial e < 0$, an
increase in wages makes the worker richer, but he decides to work less. In this case, working hours are
regarded as an inferior good, and the (negative) income effect moves in the opposite direction of the
substitution effect. When the income effect is relatively small, the total effect of an increase in wages is
still positive, producing a positively sloped labor supply curve. If, however, the income effect is
sufficiently negative, it more than offsets the (positive) substitution effect,
implying that the total effect of an increase in wages becomes negative, yielding a negatively sloped labor
supply curve.
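The decomposition can be illustrated numerically. The sketch below again assumes the utility function $v(y,z) = \ln y + \ln(T - z)$, which is a choice made for this example rather than one taken from the notes; the closed-form labor supply and compensated labor supply follow from that assumption, and all derivatives are approximated by finite differences.

```python
import math

T = 24.0  # total hours available (illustrative value)


def labor_supply(w, p, M):
    """Uncompensated labor supply x_z(w, p, M) for v = ln(y) + ln(T - z), interior case."""
    return T / 2.0 - M / (2.0 * w)


def indirect_utility(w, p, M):
    """Utility reached at the optimal bundle."""
    z = labor_supply(w, p, M)
    y = (M + w * z) / p
    return math.log(y) + math.log(T - z)


def hicksian_labor(w, p, v):
    """Compensated labor supply h_z(w, p, v): leisure equals sqrt(p * exp(v) / w)."""
    return T - math.sqrt(p * math.exp(v) / w)


def derivative(f, x, step=1e-6):
    """Central finite difference of a one-argument function."""
    return (f(x + step) - f(x - step)) / (2 * step)


def slutsky_terms(w, p, M):
    """Return (total effect, substitution effect, income effect) of a wage change."""
    v = indirect_utility(w, p, M)
    total = derivative(lambda a: labor_supply(a, p, M), w)
    substitution = derivative(lambda a: hicksian_labor(a, p, v), w)
    income = derivative(lambda m: labor_supply(w, p, m), M) * hicksian_labor(w, p, v)
    return total, substitution, income


total, se, ie = slutsky_terms(10.0, 2.0, 40.0)
print("total:", total, " SE:", se, " IE:", ie, " SE + IE:", se + ie)
```

For this particular utility function the income effect is negative (working hours fall with non-labor income) but smaller in absolute value than the substitution effect, so the uncompensated labor supply is still upward sloping at these parameter values.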
The following figure illustrates a case in which the income effect resulting from a wage increase is
positive, yielding a positively sloped labor supply curve. For completeness, we also include the
compensated labor supply curve, which reflects the substitution effect due to the wage increase, but not
the income effect. As a consequence, the compensated supply curve is always positively sloped. The
uncompensated labor supply curve represents both the substitution and income effects associated with the
wage increase.

[29] Note that this would imply reducing the worker's wealth in the case of a wage increase, or increasing the
worker's wealth in the case of a wage decrease.

Figure #2.42
Hence, the uncompensated labor supply curve is positively sloped when the income effect is positive, or
when, despite being negative, it is still smaller than the substitution effect (in absolute value). When the
income effect is negative, and sufficiently strong to more than offset the substitution effect, the
uncompensated supply curve becomes negatively sloped, as the figure below illustrates. [30]

[30] From an empirical point of view, there is substantial evidence showing that: (1) the labor supply curve for British,
American, Japanese and Dutch men is virtually vertical, indicating that the substitution and income effects
completely offset each other; (2) the labor supply curve for British and German married women is almost vertical;
(3) the labor supply curve for US and Canadian married women is slightly backward bending, indicating that the
income effect becomes significantly negative for relatively high wages; and (4) the labor supply curve for single women in
most developed countries is clearly positively sloped, indicating that the income effect is not significantly negative
(i.e., the elasticity of labor supply is larger than 4 in many studies).


Figure #2.43
An interesting application of the income and substitution effects in the labor market is related to the so-called
Laffer curve. The supporters of this curve argue that increasing the taxes on wages might initially
increase tax revenue but, after a certain tax rate, further increases reduce the incentives to work by so much
that tax revenue falls. This means that there is a revenue-maximizing tax rate, and any increase or decrease
from that point causes a reduction in tax revenue. Graphically, the Laffer curve resembles an inverted U,
with the tax rate on the horizontal axis and total tax revenue on the vertical axis. [31]

[31] Named after Arthur Laffer, the curve shows the relationship between tax rates and tax revenue.

32

Figure #2.44
[32] The Laffer curve is a large part of supply-side economics, which holds that changes in marginal tax rates have
a great effect on economic activity. www.econlib.org/library/Enc/SupplySideEconomics.html

In order to better understand the precise relationship between tax rate and tax revenue in this setting, let
us define tax revenue as
$$T = \tau\, w\, H(\omega),$$
where $H(\omega)$ represents the total number of hours supplied when the wage net of taxes is $\omega = (1-\tau)w$ (so w
is the nominal wage). Differentiating with respect to the tax rate, we obtain
$$\frac{\partial T}{\partial \tau} = \underbrace{w H(\omega)}_{\text{positive effect}} - \underbrace{\tau w^2 \frac{dH}{d\omega}}_{\text{negative effect}}$$
Intuitively, the first term represents an increase in total revenue because, for a given number of working
hours, a higher tax rate increases total revenue. The second term denotes, however, a negative effect from
increasing the tax rate. In particular, higher taxes induce workers to work fewer hours, decreasing the tax
revenue. Therefore, the negative effect dominates the positive effect (and as a consequence an increase in
the tax rate produces a reduction in tax revenue) if
$$\frac{1}{\tau} < \frac{dH}{d\omega} \frac{w}{H(\omega)}.$$
And multiplying both sides by $(1-\tau)$,
$$\frac{1-\tau}{\tau} < \frac{dH}{d\omega} \frac{(1-\tau) w}{H(\omega)} \implies \frac{1-\tau}{\tau} < E_{\text{supply},\omega},$$
where the term on the right-hand side is the elasticity of labor supply with respect to net wages. Hence,
for tax revenue to fall from a small increase in the tax rate, it must be that the elasticity of labor supply is
larger than $\frac{1-\tau}{\tau}$. For an elasticity below one, this would require $\tau$ to be greater than 0.50. This condition, however, is
relatively difficult to satisfy. Where taxes on labor were significant (as in Japan, Sweden, or the US during
the 1970s), the elasticity of labor supply was indeed larger than $\frac{1-\tau}{\tau}$. Nonetheless, an average US worker
nowadays (making around $35,000 per year) is subject to approximately a 25% wage tax. This would
imply that, in order for an increase in the tax rate to be counterproductive (actually reducing tax revenue),
the elasticity of labor supply should be larger than $(1-\tau)/\tau = (1-0.25)/0.25 = 3$, which is very unlikely to
hold among most workers. In addition, the outrage from a large tax increase would probably offset
any potential tax gains the country would earn.
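The condition on the elasticity can be explored with a small numerical sketch. It assumes a constant-elasticity labor supply, $H(\omega) = \omega^{E}$, which is an illustrative functional form and not one used in the notes; the wage level and function names are also made up for the example.

```python
def hours(net_wage, elasticity):
    """Assumed constant-elasticity labor supply H(omega) = omega ** elasticity."""
    return net_wage ** elasticity


def tax_revenue(tau, wage, elasticity):
    """T = tau * w * H(omega), with net wage omega = (1 - tau) * w."""
    return tau * wage * hours((1.0 - tau) * wage, elasticity)


def d_revenue(tau, wage, elasticity, step=1e-6):
    """Central finite-difference estimate of dT/dtau."""
    return (tax_revenue(tau + step, wage, elasticity)
            - tax_revenue(tau - step, wage, elasticity)) / (2 * step)


def revenue_falls(tau, elasticity):
    """Condition from the text: revenue falls when the elasticity exceeds (1 - tau) / tau."""
    return elasticity > (1.0 - tau) / tau


wage, tau = 20.0, 0.25
for elasticity in (0.5, 2.0, 4.0):
    slope = d_revenue(tau, wage, elasticity)
    print("elasticity", elasticity, "threshold", (1 - tau) / tau,
          "dT/dtau < 0:", slope < 0, "condition holds:", revenue_falls(tau, elasticity))
```

At a 25% tax rate the threshold is 3, so under this assumed labor supply only the elasticity of 4 makes revenue fall when the tax rate rises, in line with the condition derived above.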

Income and substitution effects among different goods
In previous chapters we focused on the substitution and income effects of varying the price of good k on
the demand for that same good k. In this chapter we emphasize the SE and IE of varying the price of good
k on the demand for other goods j. In terms of the Slutsky matrix, we paid close attention to the elements
along the main diagonal of the matrix. We briefly examined the elements away from the main diagonal,
but in this chapter we investigate these elements in more detail.
We start our analysis using only two goods. Of course, the type of relationships that can occur between
only two goods are relatively limited, but this will help us illustrate some intuitions using simple figures.
Later on we generalize our analysis to N>2 goods.
Let us start by defining goods that are gross complements or substitutes in consumption. In particular, when
the price of good y falls, the substitution effect (which, by definition, results exclusively from the change in
relative prices, so that the consumption bundle remains on the same indifference curve as before) can be so
small that the consumer purchases a larger amount of both goods x and y. In this case, we refer to goods x
and y as gross complements. The following figure illustrates this case. Specifically, the consumer starts
purchasing bundle A. Then the price of good y decreases, producing an upward pivoting effect on the
consumer's budget line. We can then apply a Hicksian wealth compensation, so that the consumer can
maintain his utility level intact after the price change. The reduction in consumption of good x from A to
B reflects the substitution effect, whereas the increase in consumption of good x from B to C illustrates
the income effect. Since the income effect dominates, the total effect on the consumption of good x is positive. Note that
the total effect on the consumption of good y is also positive, since the consumer increases his
consumption of good y from $y_0$ to $y_1$. We can therefore conclude that a reduction in the price of good y
produces an increase in the consumption of good x, and thus the goods can be regarded as gross
complements.


Because $\partial x / \partial p_y < 0$, they are gross complements
Figure #2.45

Let us now describe the opposite case. Specifically, when the price of good y falls, the substitution effect
may be so large that the consumer purchases less of good x and more of good y. In this case, we regard
goods x and y as gross substitutes in consumption. The following figure reflects the situation. As in
our previous figure, the price of good y decreases, rotating the consumer's budget line. The reduction in
consumption of good x from A to B illustrates the substitution effect, while the (small) increase in
consumption of good x from B to C reflects the income effect. The total effect is negative, implying that a
decrease in the price of good y leads to a reduction in the consumption of good x. Hence, goods x and y can
be regarded as gross substitutes.

Figure #2.46
Because $\partial x / \partial p_y > 0$, they are gross substitutes

After providing a graphical representation of the definition of gross substitutes and complements, let us
next introduce a more mathematical treatment of the relationship between these two goods. In particular,
the change in the consumption of good x caused by changes in $p_y$ is explained using the Slutsky equation,
as follows.
$$\underbrace{\frac{\partial x}{\partial p_y}}_{\text{combined effect (ambiguous)}} = \underbrace{\left. \frac{\partial x}{\partial p_y} \right|_{U=\text{constant}}}_{\text{substitution effect } (+)} - \underbrace{y \frac{\partial x}{\partial I}}_{\text{income effect } (-) \text{ if } x \text{ is normal}}$$
First, note that the substitution effect is positive. Intuitively, we are just saying that a decrease in the price
of good y induces the consumer to buy less of good x, if his utility level is kept constant, i.e., graphically,
the consumer moves along the same indifference curve. Indeed, good y became relatively cheaper and
good x relatively more expensive, inducing the consumer to modify his consumption patterns towards the
cheaper good. [33] Second, the derivative $\frac{\partial x}{\partial I}$ from the second term of the Slutsky equation is positive when
good x is a normal good but negative if x is an inferior good. Because of the minus sign in front of
the income effect, the income effect is therefore negative for normal goods and positive for inferior
goods. Intuitively, the income effect in this case represents that an increase in $p_y$ reduces the
consumer's real purchasing power (it makes him poorer), leading him to reduce his consumption of
good x. As a consequence, an increase in $p_y$ reduces the consumption of good x due to the income effect
(if good x is normal) or increases the consumption of good x due to the income effect (if good x is
inferior). Overall, the total effect of an increase in $p_y$ is therefore ambiguous, and depends on the relative
size of the substitution and income effects. The previous Slutsky equation can also be represented using
elasticity terms, as described in previous chapters, as follows.
$$E_{x,p_y} = E_{x^c,p_y} - s_y E_{x,I}$$
This expression just confirms our previous intuition: the combined effect of an increase in $p_y$ (via the SE
and IE) on the observable Walrasian demand, x(p,w), is ambiguous, i.e., elasticity $E_{x,p_y}$ can be positive,
negative, or zero. Also, the impact that a change in $p_y$ has on purchasing power depends on how
important good y is to the person, as measured by its budget share $s_y$.

[33] Alternatively, this positive substitution effect represents that an increase in $p_y$ implies an increase in the
consumption of good x. Intuitively, good x becomes relatively cheaper while good y becomes more expensive. As a
consequence, the consumer increases (decreases) his consumption of the former (latter).

Example. In the following example we use the Walrasian and Hicksian demands associated with a Cobb-Douglas
utility function, $u(x,y) = x^{0.5} y^{0.5}$, in order to show the substitution and income effects across
different goods. In particular, the Walrasian and Hicksian demands for good x are:
$$x(p_x,p_y,I) = \frac{1}{2} \frac{I}{p_x}$$
$$x^c(p_x,p_y,V) = V \sqrt{\frac{p_y}{p_x}}$$
First note that an increase in $p_y$ doesn't affect the Walrasian demand for good x but affects the Hicksian
demand for good x (an increase in $p_y$ increases the Hicksian demand for x). Indeed,
$$\frac{\partial x(p_x,p_y,I)}{\partial p_y} = 0$$
$$\frac{\partial x^c(p_x,p_y,V)}{\partial p_y} = \frac{1}{2} \frac{V}{\sqrt{p_x p_y}} > 0$$

We can now find the substitution effect of changing $p_y$. We do so by taking the derivative of the
Hicksian demand with respect to $p_y$,
$$\frac{\partial x^c}{\partial p_y} = \frac{1}{2} \frac{V}{\sqrt{p_x p_y}}, \quad \text{and plugging in } V = \frac{I}{2\sqrt{p_x p_y}} \text{ gives us the SE} = \frac{1}{4} \frac{I}{p_x p_y}.$$
In order to find the income effect associated with the price change, we operate as follows
$$-y \frac{\partial x}{\partial I} = -\left( \frac{1}{2} \frac{I}{p_y} \right) \left( \frac{1}{2} \frac{1}{p_x} \right) = -\frac{1}{4} \frac{I}{p_x p_y}.$$
Therefore, we can now express the total effect of a change in $p_y$ on the Walrasian demand of good x as
the combination of the substitution and income effects:
$$\underbrace{\frac{\partial x}{\partial p_y}}_{TE} = \underbrace{\frac{1}{4} \frac{I}{p_x p_y}}_{SE} - \underbrace{\frac{1}{4} \frac{I}{p_x p_y}}_{IE} = 0$$

Intuitively, this implies that the substitution and income effects completely offset each other. [34]

[34] A common mistake is to interpret this result as saying that goods x and y cannot be substituted in consumption,
that is, that they must be consumed in fixed proportions. That statement would only be true if the income effect were zero.

More generally, we can extend the Slutsky equation to the case of N>2 goods as follows: for any two goods i and j, a
change in the price of good j produces
$$\frac{\partial x_i}{\partial p_j} = \left. \frac{\partial x_i}{\partial p_j} \right|_{U=\text{constant}} - x_j \frac{\partial x_i}{\partial I}$$

Therefore, the concepts of gross substitutes [35] and complements [36] include both the substitution and income
effects. In particular, we say that two goods are gross substitutes if the total effect is positive, $\partial x / \partial p_y > 0$,
whereas we refer to two goods as gross complements if the total effect is negative, $\partial x / \partial p_y < 0$.
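The Cobb-Douglas example can be verified numerically. The sketch below uses the demand functions for $u(x,y) = x^{0.5} y^{0.5}$ and approximates each derivative by finite differences; the prices, income and function names are illustrative assumptions.

```python
import math


def walrasian_x(px, py, income):
    """Walrasian demand for x under u = x**0.5 * y**0.5."""
    return 0.5 * income / px


def walrasian_y(px, py, income):
    """Walrasian demand for y under the same utility function."""
    return 0.5 * income / py


def indirect_utility(px, py, income):
    """V = I / (2 * sqrt(px * py))."""
    return income / (2.0 * math.sqrt(px * py))


def hicksian_x(px, py, utility_level):
    """Hicksian demand for x: V * sqrt(py / px)."""
    return utility_level * math.sqrt(py / px)


def decomposition(px, py, income, step=1e-6):
    """Return (SE, IE, TE) of a change in p_y on the demand for x."""
    v = indirect_utility(px, py, income)
    se = (hicksian_x(px, py + step, v) - hicksian_x(px, py - step, v)) / (2 * step)
    dx_dI = (walrasian_x(px, py, income + step) - walrasian_x(px, py, income - step)) / (2 * step)
    ie = -walrasian_y(px, py, income) * dx_dI
    te = (walrasian_x(px, py + step, income) - walrasian_x(px, py - step, income)) / (2 * step)
    return se, ie, te


se, ie, te = decomposition(2.0, 5.0, 100.0)
print("SE:", se, " IE:", ie, " TE:", te)
```

With these numbers the substitution effect is close to $I/(4 p_x p_y) = 2.5$ and the income effect close to -2.5, so the total cross-price effect is zero: under this utility function x is neither a gross substitute nor a gross complement of y.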

Asymmetry of the gross definitions
Importantly, the definitions of gross substitutes and complements are not necessarily symmetric. In
particular, it is possible for good $x_1$ to be a substitute for good $x_2$, and simultaneously for good $x_2$ to be a
complement of good $x_1$. Let us next see this potential asymmetry with one example.
Example. Suppose that the utility function for two goods, x and y, is given by U(x,y) = ln x + y. Setting up
the Lagrangian,
$$\mathcal{L} = \ln x + y + \lambda (I - p_x x - p_y y)$$
we obtain the following first order conditions:
$$\frac{\partial \mathcal{L}}{\partial x} = \frac{1}{x} - \lambda p_x = 0$$
$$\frac{\partial \mathcal{L}}{\partial y} = 1 - \lambda p_y = 0$$
$$\frac{\partial \mathcal{L}}{\partial \lambda} = I - p_x x - p_y y = 0$$
Manipulating the first two equations we get $p_x x = p_y$. Inserting this information into the budget
constraint, we find the Walrasian demand for good y, $p_y y = I - p_y$. We can observe that an increase in
$p_y$ causes a decline in spending on y. Therefore, we can conclude that the spending on good x must rise,
since $p_x$ and I are unchanged. That is, $\partial x / \partial p_y > 0$. Hence, good y is a gross substitute of good x.
However, spending on good y is independent of $p_x$ (the demand for y, $y = (I - p_y)/p_y$, does not depend on
$p_x$). Therefore, $\partial y / \partial p_x = 0$, yielding that good x is neither a gross substitute nor a gross
complement of good y. This shows the asymmetry between $\partial x / \partial p_y > 0$ and $\partial y / \partial p_x = 0$.
This conclusion suggests that, depending on how we check for gross substitutability or
complementarity between two goods, we may obtain different results. A natural question at
this point is whether there is some other, more precise measure to check if two goods are complements or
substitutes in consumption. We next present such a measure.
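The asymmetry in this example can be confirmed with a short numerical sketch. It uses the interior demands implied by the first order conditions for U(x,y) = ln x + y (which require $I > p_y$); the parameter values, grid search and function names are assumptions made for illustration.

```python
import math


def demand_x(px, py, income):
    """From the first two FOCs: p_x * x = p_y."""
    return py / px


def demand_y(px, py, income):
    """From the budget constraint: p_y * y = I - p_y (interior case, I > p_y)."""
    return (income - py) / py


def best_x_on_budget_line(px, py, income, grid_step=0.001):
    """Grid search over x with y set by the budget line; returns the utility-maximizing x."""
    best_x, best_u = None, -math.inf
    n = int(income / px / grid_step)
    for i in range(1, n):
        x = i * grid_step
        y = (income - px * x) / py
        u = math.log(x) + y
        if u > best_u:
            best_x, best_u = x, u
    return best_x


def cross_effects(px, py, income, step=1e-6):
    """Finite-difference estimates of dx/dp_y and dy/dp_x."""
    dx_dpy = (demand_x(px, py + step, income) - demand_x(px, py - step, income)) / (2 * step)
    dy_dpx = (demand_y(px + step, py, income) - demand_y(px - step, py, income)) / (2 * step)
    return dx_dpy, dy_dpx


px, py, income = 2.0, 4.0, 20.0
dx_dpy, dy_dpx = cross_effects(px, py, income)
print("dx/dp_y:", dx_dpy, " dy/dp_x:", dy_dpx)
print("grid-search x:", best_x_on_budget_line(px, py, income), " closed form:", demand_x(px, py, income))
```

The first derivative is positive while the second is zero, which is the asymmetry between the two gross cross-price effects discussed in the example.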


[35] Two goods are substitutes if one good may replace the other in use. For example: tea & coffee, butter &
margarine.
[36] Two goods are complements if they are used together. For example: coffee & cream, fish & chips.

Net substitutes and Net complements
The concept of net substitutes and net complements focuses solely on the substitution effect. In particular,
two goods are regarded as net substitutes if [37]
$$\frac{\partial x_i^c}{\partial p_j} = \left. \frac{\partial x_i}{\partial p_j} \right|_{U=\text{constant}} > 0$$
while two goods are regarded as net complements if
$$\frac{\partial x_i^c}{\partial p_j} = \left. \frac{\partial x_i}{\partial p_j} \right|_{U=\text{constant}} < 0$$

Graphically, this condition looks only at the shape of the indifference curve. We are analyzing how an
increase in the price of one good affects the demand for another good, when the consumer remains on the
same indifference curve.
In contrast to our definition of gross substitutes and gross complements, this definition is symmetric
across two goods. This means that, once two goods are determined to be substitutes or complements, they stay that
way no matter in which direction the definition is applied. Specifically,
$$\left. \frac{\partial x_i}{\partial p_j} \right|_{U=\text{constant}} = \left. \frac{\partial x_j}{\partial p_i} \right|_{U=\text{constant}}$$

In terms of the substitution matrix, this condition states that every element above the main diagonal
equals the corresponding element below the main diagonal.

Note that the symmetry in the elements away from the main diagonal is easy to show: first recall that
$$\frac{\partial e(p,u)}{\partial p_k} = h_k(p,u)$$

[37] $\frac{\partial x_i^c}{\partial p_j}$ is just the $\frac{\partial h_i(p,u)}{\partial p_j}$ in MWG.

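Hence we can express the substitution effect as
$$\frac{\partial h_k(p,u)}{\partial p_j} = \frac{\partial^2 e(p,u)}{\partial p_k \partial p_j}$$
And using Young's theorem, we know that
$$\frac{\partial h_k(p,u)}{\partial p_j} = \frac{\partial^2 e(p,u)}{\partial p_k \partial p_j} = \frac{\partial^2 e(p,u)}{\partial p_j \partial p_k} = \frac{\partial h_j(p,u)}{\partial p_k}$$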
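The symmetry of the cross-substitution effects can also be checked numerically. The sketch below assumes a three-good Cobb-Douglas utility with illustrative exponents that sum to one, for which the expenditure function and Hicksian demands are known in closed form; the substitution matrix is then approximated by finite differences.

```python
SHARES = (0.2, 0.3, 0.5)  # illustrative Cobb-Douglas exponents, summing to one


def expenditure(p, u):
    """e(p, u) = u * prod((p_i / a_i) ** a_i) for u(x) = prod(x_i ** a_i)."""
    e = u
    for price, share in zip(p, SHARES):
        e *= (price / share) ** share
    return e


def hicksian(p, u):
    """h_k(p, u) = a_k * e(p, u) / p_k, the derivative of e with respect to p_k."""
    e = expenditure(p, u)
    return [share * e / price for price, share in zip(p, SHARES)]


def substitution_matrix(p, u, step=1e-6):
    """Finite-difference estimate of S[k][j] = dh_k / dp_j."""
    n = len(p)
    matrix = [[0.0] * n for _ in range(n)]
    for j in range(n):
        up, down = list(p), list(p)
        up[j] += step
        down[j] -= step
        h_up, h_down = hicksian(up, u), hicksian(down, u)
        for k in range(n):
            matrix[k][j] = (h_up[k] - h_down[k]) / (2 * step)
    return matrix


S = substitution_matrix((1.0, 2.0, 4.0), 10.0)
for row in S:
    print([round(value, 4) for value in row])
```

The printed matrix should be symmetric, with negative entries on the main diagonal and, for this utility function, positive off-diagonal entries, so every pair of goods is a pair of net substitutes.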

Since our definition of net complements and net substitutes focuses solely on the substitution effect, two
goods can be regarded as gross complements, even if they are net substitutes. Let us see an example. The
following figure illustrates a decrease in
y
p that induces an increase in the consumption of good x due to
the substitution effect (so goods x and y are regarded as net substitutes), but an overall reduction in the
consumption of good x due to the total effect (so goods x and y are regarded as gross complements).

Figure #2.47
More generally, the fact that the MRS between two goods is diminishing indicates that the substitution
effect must be negative. Indeed, if good y becomes cheaper, and the consumer remains at the same
indifference curve, the budget line becomes steeper and as a consequence the consumer reduces his
consumption of good x (since this good became relatively more expensive) but increases his purchases of
good y (since this good is now relatively cheaper). [38]





[38] The opposite explanation applies to the case in which the MRS is increasing (i.e., indifference curves are
bowed out from the origin). In particular, if good y becomes cheaper, and the consumer remains at the same
indifference curve, the budget line becomes steeper and, as a consequence, the consumer increases his consumption
of good x but reduces his consumption of good y.


A note on Euler's theorem (and its relationship with homogeneity of degree k).
Let us briefly recall the definition of homogeneity. We say that the function \(f(x_1,x_2)\) is homogeneous of
degree k if

\[ f(tx_1,tx_2) = t^k f(x_1,x_2) \qquad (1) \]

Note that differentiating this expression with respect to \(x_1\), we obtain

\[ f_1(tx_1,tx_2)\,t = t^k f_1(x_1,x_2) \]

where \(f_j\) denotes the partial derivative of f with respect to its j-th argument. And rearranging,

\[ f_1(tx_1,tx_2) = t^{k-1} f_1(x_1,x_2) \]
We can hence conclude that, if a function is homogeneous of degree k, its first-order derivative must be
homogeneous of degree k-1. This is a useful result that we use below.
Differentiating both sides of expression (1) with respect to the proportionality factor t, we obtain

\[ \frac{\partial f(tx_1,tx_2)}{\partial t} = f_1(tx_1,tx_2)\,x_1 + f_2(tx_1,tx_2)\,x_2 \qquad \text{and} \qquad \frac{\partial\left[t^k f(x_1,x_2)\right]}{\partial t} = k\,t^{k-1} f(x_1,x_2) \]

where \(f_j\) denotes the partial derivative of f with respect to its j-th argument. Therefore we have,

\[ f_1(tx_1,tx_2)\,x_1 + f_2(tx_1,tx_2)\,x_2 = k\,t^{k-1} f(x_1,x_2) \]

And making the proportionality factor t=1, we obtain

\[ f_1(x_1,x_2)\,x_1 + f_2(x_1,x_2)\,x_2 = k\,f(x_1,x_2) \]
where k is the degree of homogeneity of the original function \(f(x_1,x_2)\). First, note that if the original function
is homogeneous of degree zero, i.e., k=0, then we obtain that the left-hand side of the previous expression is zero.
This result is intuitive: it says that if a function is homogeneous of degree zero, increasing the proportionality
factor t will not affect its value. Second, note that if the original function is homogeneous of degree one, i.e., k=1,
then we obtain that the left-hand side of the previous expression is \(f(x_1,x_2)\). Intuitively, if we marginally
increase the proportionality factor t, the function increases by its entire initial value \(f(x_1,x_2)\).
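As a quick illustration of the two results above, the following sketch checks Euler's theorem numerically. The function f(x1, x2) = x1^0.5 x2^1.5, which is homogeneous of degree k = 2, and the helper names are assumptions chosen only for this example.

```python
# Sketch: numerical check of Euler's theorem, f_1*x1 + f_2*x2 = k*f,
# and of the fact that f_1 is homogeneous of degree k - 1.
# Assumption (not from the notes): f(x1, x2) = x1**0.5 * x2**1.5, so k = 2.

K = 2.0  # degree of homogeneity of the assumed function


def f(x1, x2):
    """Example function, homogeneous of degree K = 2."""
    return x1 ** 0.5 * x2 ** 1.5


def partial(func, x1, x2, index, eps=1e-6):
    """Central finite-difference partial derivative w.r.t. argument 1 or 2."""
    if index == 1:
        return (func(x1 + eps, x2) - func(x1 - eps, x2)) / (2 * eps)
    return (func(x1, x2 + eps) - func(x1, x2 - eps)) / (2 * eps)


def euler_lhs(func, x1, x2):
    """Left-hand side of Euler's theorem: f_1*x1 + f_2*x2."""
    return partial(func, x1, x2, 1) * x1 + partial(func, x1, x2, 2) * x2


if __name__ == "__main__":
    x1, x2, t = 2.0, 3.0, 3.0
    # Euler's theorem: both numbers should coincide.
    print(euler_lhs(f, x1, x2), K * f(x1, x2))
    # Homogeneity of degree k - 1 of the first derivative.
    print(partial(f, t * x1, t * x2, 1), t ** (K - 1) * partial(f, x1, x2, 1))
```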
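We can apply this result to the Hicksian demand function. We know that the Hicksian demand is homogeneous
of degree zero in prices, i.e., its degree of homogeneity is k=0. That is,
\(h_i(tp_1,tp_2,\ldots,tp_n,u) = h_i(p_1,p_2,\ldots,p_n,u)\). Hence, writing \(x_i^c\) for the compensated demand,

\[ \frac{\partial x_i^c}{\partial p_1}p_1 + \frac{\partial x_i^c}{\partial p_2}p_2 + \ldots + \frac{\partial x_i^c}{\partial p_n}p_n = 0 \]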

We can now continue with our previous discussion of net complementarity and substitutability between
different goods. In particular, we want to understand whether substitutability or complementarity is more
prevalent from an empirical point of view. This is of interest because whether two goods are net
complements or net substitutes ultimately depends on each individual's preferences. However, using the Hicksian
demand for good i, \(h_i(p_1,p_2,\ldots,p_n,u)\), we can apply Euler's theorem (as discussed above), yielding

\[ p_1\left.\frac{\partial x_i}{\partial p_1}\right|_{U=\text{constant}} + p_2\left.\frac{\partial x_i}{\partial p_2}\right|_{U=\text{constant}} + \ldots + p_n\left.\frac{\partial x_i}{\partial p_n}\right|_{U=\text{constant}} = 0 \]
Alternatively, dividing by \(x_i\), we can express the above expression using compensated elasticities, as follows

\[ E_{i1}^c + E_{i2}^c + \ldots + E_{in}^c = 0 \]
We know, however, that own-price substitution effects are negative (the elements in the main diagonal of
the Slutsky matrix are negative). This implies that \(E_{ii}^c \le 0\). Therefore, the sum of the compensated
cross-price elasticities for all other n-1 goods must be positive, \(\sum_{j\neq i} E_{ij}^c \ge 0\), if the sum of the
compensated elasticities for all n goods is to be exactly equal to zero. Intuitively, this result implies that
most goods must be net substitutes. This is usually referred to as Hicks' second law of demand. [39]
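A small numerical sketch may help to see this result at work. It assumes a three-good Cobb-Douglas utility function with exponents (0.2, 0.3, 0.5); the functional form, the parameter values and the function names are illustrative assumptions. The code computes the compensated price elasticities of good 1 and checks that they add up to zero, with a negative own-price term and a positive sum of cross-price terms.

```python
import math

# Sketch: compensated elasticities of one good sum to zero.
# Assumption (not from the notes): Cobb-Douglas utility with the exponents below.
SHARES = (0.2, 0.3, 0.5)  # assumed exponents, summing to one


def hicksian(prices, u, i, shares=SHARES):
    """Hicksian demand for good i under the assumed Cobb-Douglas utility."""
    e = u
    for p, a in zip(prices, shares):
        e *= (p / a) ** a  # expenditure function e(p, u)
    return shares[i] * e / prices[i]


def compensated_elasticity(prices, u, i, j, rel=1e-6):
    """Elasticity of h_i with respect to p_j, from a log-difference quotient."""
    up, down = list(prices), list(prices)
    up[j] *= 1 + rel
    down[j] *= 1 - rel
    dlog_h = math.log(hicksian(up, u, i)) - math.log(hicksian(down, u, i))
    dlog_p = math.log(up[j]) - math.log(down[j])
    return dlog_h / dlog_p


if __name__ == "__main__":
    prices = (2.0, 3.0, 5.0)
    row = [compensated_elasticity(prices, 10.0, 0, j) for j in range(3)]
    print(row)       # own-price term first, then the two cross-price terms
    print(sum(row))  # should be (numerically) zero
```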


Composite commodities
When analyzing the consumer's purchasing decision among n goods, we deal with potentially n different
demand functions, with \(\frac{n(n+1)}{2}\) [40] different substitution effects. It is therefore often convenient to group
goods into larger aggregates, for instance, food, clothing, or more generally, all other goods different
from the good that we are analyzing. In order to do that we make use of the so-called composite
commodity theorem.
Suppose that consumers choose among n goods, and that the demand for \(x_1\) depends on the prices of all
other n-1 goods. If all of these prices move together, it may make sense to group them into a single
composite commodity (y). Let \(p_2^0,\ldots,p_n^0\) represent the initial prices of these other commodities. Let's
assume that they all vary together (so that the relative prices of \(x_2,x_3,\ldots,x_n\) do not change). We can now
define the composite commodity y as the total expenditure on all other goods, \(x_2,x_3,\ldots,x_n\), at the initial
prices. That is,

\[ y = p_2^0x_2 + p_3^0x_3 + \ldots + p_n^0x_n \]

[39] Note that some textbooks use the term Hicksian substitutes to refer to two goods that are net substitutes in
consumption. Similarly, they refer to goods that are net complements as Hicksian complements.
[40] There are \(N^2\) elements but, because of symmetry, only \(\frac{N^2}{2}\) are unrepeated, plus \(\frac{N}{2}\) from half of the main
diagonal elements. Thus the number of different substitution effects is \(\frac{N^2}{2}+\frac{N}{2}=\frac{N(N+1)}{2}\).

The individual's budget constraint is therefore

\[ I = p_1x_1 + p_2^0x_2 + \ldots + p_n^0x_n = p_1x_1 + y \]

Moreover, if we assume that the prices of all other goods, \(p_2^0,\ldots,p_n^0\), change by the same factor (t>0), then
the above budget constraint becomes

\[ I = p_1x_1 + tp_2^0x_2 + \ldots + tp_n^0x_n = p_1x_1 + ty \]

And therefore we can analyze the substitution effect associated with price changes, where the only prices that can
change in this context are \(p_1\) and t. Hence, this theorem allows us to say that, as long as
\(p_2^0,\ldots,p_n^0\) move together, we can restrict our examination of demand choices to two types of goods: the
good we are analyzing, \(x_1\), and everything else. As a consequence, we can represent our results in
two-dimensional figures, with \(x_1\) on the horizontal axis and the composite commodity on the vertical axis. [41]

Example. Let us next examine an example of how to use the composite commodity theorem. Suppose that
an individual receives utility from three goods: food (x), housing services (y), measured in hundreds
of square feet, and household operations (z), measured by electricity use. Let us next assume a CES utility
function:

\[ U(x,y,z) = -\frac{1}{x} - \frac{1}{y} - \frac{1}{z} \]

We can find the Walrasian demand function for each of the three goods

\[ x = \frac{I}{p_x + \sqrt{p_xp_y} + \sqrt{p_xp_z}} \]
\[ y = \frac{I}{p_y + \sqrt{p_yp_x} + \sqrt{p_yp_z}} \]
\[ z = \frac{I}{p_z + \sqrt{p_zp_x} + \sqrt{p_zp_y}} \]

If initially the consumer's income is I=$100 and prices are \(p_x\)=$1, \(p_y\)=$4 and \(p_z\)=$1, we obtain that the
quantity demanded for the three goods is x*=25, y*=12.5 and z*=25 units. Hence, $25 is spent on food
and $75 is spent on housing-related needs.
If we assume that prices \(p_y\) and \(p_z\) move together, we can use the initial prices to find the composite
commodity housing (h), as follows
h=4y+1z

[41] As a disadvantage of the composite commodity theorem, however, note that the theorem makes no prediction about
how the choices among \(x_2,x_3,\ldots,x_n\) behave, since it only focuses on the total expenditure on these other goods.

The expenditure on housing goods implies a price of $4 for good y and $1 for good z. Therefore, the
initial quantity of good h is the total amount of money spent on housing ($75). Hence, \(p_h\)=$1 and
\(p_h = p_z = \frac{1}{4}p_y\). Plugging this information into the Walrasian demand for good x, we obtain

\[ x^* = \frac{I}{p_x + \sqrt{p_xp_y} + \sqrt{p_xp_z}} = \frac{I}{p_x + \sqrt{4p_xp_h} + \sqrt{p_xp_h}} \]
\[ x^* = \frac{I}{p_x + 3\sqrt{p_xp_h}} \]

And if the consumer's income is I=$100 and prices are \(p_x\)=$1 and \(p_h\)=$1, we obtain

\[ x^* = \frac{100}{1 + 3\sqrt{1\cdot 1}} = \frac{100}{4} = 25 \]

Finally, in order to find the optimal amount of housing demanded by the consumer, h*, we just need to use
the budget constraint

\[ p_xx^* + p_hh^* = I \]
\[ \$1\cdot 25 + \$1\cdot h^* = \$100 \]
\[ h^* = 75 \]

Therefore, the Walrasian demand for good x can be expressed as a function of income, \(p_x\) and \(p_h\),

\[ x = \frac{I}{p_x + 3\sqrt{p_xp_h}} \]

And if the income is I=$100, \(p_x\)=$1, \(p_y\)=$4 and \(p_h\)=$1, we obtain x*=25 and h*=75.
Note that if \(p_y\) rises from $4 to $16 and \(p_z\) rises from $1 to $4 (but \(p_x\) remains at $1), \(p_h\) would also
rise to \(p_h\)=$4. Indeed,

\[ p_h = \frac{1}{4}p_y = \frac{1}{4}16 = 4 \]
\[ p_h = p_z = 4 \]

Due to this price change, the Walrasian demand for good x would fall to

\[ x^* = \frac{100}{1 + 3\sqrt{4}} = \frac{100}{7} \]

while housing purchases would be given by

\[ p_hh^* = 100 - \frac{100}{7} = \frac{600}{7} \approx 85.7 \]

And since \(p_h\)=$4, then h*=85.7/4=21.43.
Finally, note that we could also find these results by plugging the new information about income and
prices, I=$100, \(p_x\)=$1, \(p_y\)=$16 and \(p_z\)=$4, into the expressions of Walrasian demand for all three
goods, obtaining
x*=100/7, y*=100/28 and z*=100/14
which implies that the amount of housing consumed is h*=4y*+1z*=21.43.
For more practice with this concept, problem 6.8 in Nicholson and Snyder provides a useful exercise.
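The arithmetic of this example can be reproduced with a short script. The sketch below simply codes the three Walrasian demand functions and the composite-commodity demand for x given above; the function names are chosen here for illustration only.

```python
import math

# Sketch: reproduces the numbers of the composite commodity example.


def demands(income, px, py, pz):
    """Walrasian demands (x, y, z) of the example."""
    x = income / (px + math.sqrt(px * py) + math.sqrt(px * pz))
    y = income / (py + math.sqrt(py * px) + math.sqrt(py * pz))
    z = income / (pz + math.sqrt(pz * px) + math.sqrt(pz * py))
    return x, y, z


def composite_demand_x(income, px, ph):
    """Demand for x as a function of p_x and the composite price p_h."""
    return income / (px + 3 * math.sqrt(px * ph))


if __name__ == "__main__":
    # Initial prices: p_x = 1, p_y = 4, p_z = 1 (so p_h = 1).
    x, y, z = demands(100, 1, 4, 1)
    print(x, y, z, 4 * y + 1 * z)          # x*, y*, z* and the composite h*
    # New prices: p_y = 16, p_z = 4 (so p_h = 4), p_x unchanged.
    x2, y2, z2 = demands(100, 1, 16, 4)
    print(x2, 4 * y2 + 1 * z2)             # same h* as (100 - x2) / 4
    print(composite_demand_x(100, 1, 4), (100 - x2) / 4)
```

Both routes (the three-good demands and the two-good problem with the composite commodity) yield the same demand for x and the same amount of housing.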



Chapter 3 Aggregate demand
Aggregate demand
In this chapter we move from individual demand, \(x_i(p,w_i)\), where \(w_i\) denotes individual i's wealth level, to
aggregate demand,

\[ \sum_{i=1}^{I} x_i(p,w_i) \]

In particular, this chapter focuses on answering 3 main questions:
1. We know that the individual demand depends on prices and individual wealth, \(x_i(p,w_i)\). But
when can we express aggregate demand as a function of prices and the aggregate wealth level?
That is,

\[ \sum_{i=1}^{I} x_i(p,w_i) = x\left(p,\sum_{i=1}^{I} w_i\right) \]

2. We know that individual demand satisfies the WARP as long as preferences are rational. But
when does aggregate demand satisfy the WARP?
3. Finally, we know how to measure welfare changes associated with a price change in the case of
individual demand (using, for instance, the CV, EV, and AV). But can we apply the same
measures of welfare change in the case of aggregate demand?

First question: Aggregate demand and aggregate wealth
First, we want to understand under which conditions we can guarantee that the aggregate demand, defined
as \(x(p,w_1,w_2,\ldots,w_I)=\sum_{i=1}^{I} x_i(p,w_i)\), satisfies

\[ \sum_{i=1}^{I} x_i(p,w_i) = x\left(p,\sum_{i=1}^{I} w_i\right) \]

That is, under which conditions aggregate demand depends only upon prices and the aggregate wealth
level in the economy. The above condition is satisfied if, for any two distributions of wealth,
\((w_1,w_2,\ldots,w_I)\) and \((w_1',w_2',\ldots,w_I')\), with the same aggregate wealth,
\(\sum_{i=1}^{I} w_i = \sum_{i=1}^{I} w_i'\), we have that

\[ \sum_{i=1}^{I} x_i(p,w_i) = \sum_{i=1}^{I} x_i(p,w_i') \]

Intuitively, a change in the wealth distribution across individuals that does not modify the aggregate
wealth in the economy might change individual demands but will not modify the aggregate demand for a
particular good. In order for the above condition to be satisfied, let us start with an initial wealth
distribution \((w_1,w_2,\ldots,w_I)\) and then apply a differential change in wealth \((dw_1,dw_2,\ldots,dw_I)\) such that the
aggregate wealth level is unchanged, i.e., \(\sum_{i=1}^{I} dw_i = 0\). Note that, if aggregate demand is just a function of
aggregate wealth, then we must have that

\[ \sum_{i=1}^{I} \frac{\partial x_{ki}(p,w_i)}{\partial w_i}\,dw_i = 0 \]

That is, the wealth effects of different individuals are compensated in the aggregate. Or, more compactly,

\[ \frac{\partial x_{ki}(p,w_i)}{\partial w_i} = \frac{\partial x_{kj}(p,w_j)}{\partial w_j} \]

for every good k, and for every two individuals i and j. Note that this result implies that the income effects
for individuals i and j are equal in absolute value. That is, any redistribution of wealth between i and j will
lead to

\[ \frac{\partial x_{ki}(p,w_i)}{\partial w_i}\,dw_i + \frac{\partial x_{kj}(p,w_j)}{\partial w_j}\,dw_j = 0 \]

For example, if we redistribute wealth from subject i to subject j, we have

\[ \frac{\partial x_{ki}(p,w_i)}{\partial w_i}\underbrace{dw_i}_{-} + \frac{\partial x_{kj}(p,w_j)}{\partial w_j}\underbrace{dw_j}_{+} = 0 \quad\Longrightarrow\quad \frac{\partial x_{ki}(p,w_i)}{\partial w_i} = \frac{\partial x_{kj}(p,w_j)}{\partial w_j} \]

This indicates not only that the change in demand is negative for subject i (who loses wealth) and positive for
subject j (who gains wealth), but also that the absolute values of these two income effects exactly coincide across
individuals. Summarizing, the above condition states that, for any fixed price vector p, for any good k, and for any
wealth level of any two individuals i and j, the wealth effect of a redistribution of wealth is the same across individuals.
In other words, the wealth effects arising from the redistribution of wealth across consumers cancel out.
Graphically, this condition says that all consumers exhibit parallel, straight wealth expansion paths.
First, note that straight wealth expansion paths imply that wealth effects do not depend upon the individual's
wealth level. That is, a given increase in wealth produces a change in the consumption of good k that is
independent of the individual's wealth level. The following figure illustrates a straight wealth expansion
path in which an increase in wealth produces an increase in the consumption of goods 1 and 2 of the
same size when the consumer's wealth increases from w to w', and from w' to w''. In contrast, a
non-straight (curvy) wealth expansion path, such as the one depicted in the figure below, implies that a given
increase in wealth might lead to changes in the consumption of good k that depend on the
individual's wealth level. This is illustrated in the figure, where the consumer regards good 1 as normal
when his wealth level increases from w to w', but considers good 1 an inferior good when his wealth is
further increased from w' to w''.


Figure #3.1
Second, note that parallel wealth expansion paths across individuals imply that wealth
effects must coincide across individuals. We illustrate this property in the figure below, where the wealth
expansion paths for consumers 1 and 2 are parallel to each other, indicating that both individuals'
demands for goods 1 and 2 change similarly as they become richer.

Figure #3.2
Recall that in previous lectures we have seen several examples of preference relations that imply straight
and parallel wealth expansion paths: homothetic preferences, quasilinear preferences, etc. Hence, if both
individuals exhibit either of these preference relations (identical homothetic preferences, or preferences that are
quasilinear with respect to the same good), we can guarantee that their wealth expansion paths
will be straight and parallel to each other and, as a consequence, aggregate demand can be expressed as a function
of market prices and aggregate wealth.

One interesting question at this stage is whether we can group all these types of preference relations
(homothetic, quasilinear, etc.) as special cases of a particular type of preference. Indeed, there is such a
general type of preference relation. In particular, a necessary and sufficient condition for consumers to
exhibit parallel, straight wealth expansion paths is that every consumer's indirect utility function can be
expressed as

\[ v_i(p,w_i) = a_i(p) + b(p)\,w_i \]

This indirect utility function is usually referred to as the Gorman form. [1]
Let us next show that an indirect utility function that can be represented in the Gorman form
satisfies the property that aggregate demand can be represented as a function of prices and
aggregate wealth. First, using Roy's identity on \(v_i(p,w_i)\), we obtain individual i's Walrasian demand for good j,

\[ x_i^j(p,w_i) = -\frac{\partial v_i(p,w_i)/\partial p_j}{\partial v_i(p,w_i)/\partial w_i} = -\frac{\frac{\partial a_i(p)}{\partial p_j} + \frac{\partial b(p)}{\partial p_j}w_i}{b(p)} \]

\[ x_i^j(p,w_i) = \underbrace{-\frac{\partial a_i(p)/\partial p_j}{b(p)}}_{A_i^j(p)} + \underbrace{\left(-\frac{\partial b(p)/\partial p_j}{b(p)}\right)}_{B^j(p)} w_i = A_i^j(p) + B^j(p)\,w_i \]

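And using the same approach in order to find the Walrasian demand of individual i for all goods, we have

\[ x_i(p,w_i) = -\frac{\nabla_p v_i(p,w_i)}{\partial v_i(p,w_i)/\partial w_i} = \underbrace{-\frac{\nabla_p a_i(p)}{b(p)}}_{A_i(p)} + \underbrace{\left(-\frac{\nabla_p b(p)}{b(p)}\right)}_{B(p)} w_i = A_i(p) + B(p)\,w_i \]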

Therefore, summing over all I individuals in the economy, we obtain

\[ \sum_{i=1}^{I} x_i(p,w_i) = \sum_{i=1}^{I}\left[A_i(p) + B(p)\,w_i\right] = \sum_{i=1}^{I} A_i(p) + B(p)\underbrace{\sum_{i=1}^{I} w_i}_{w} = x\left(p,\sum_{i=1}^{I} w_i\right) \]


[1] The Gorman form presents some interesting features. First, note that an increase in the individual's
wealth level produces the same increase in utility level, b(p), across all individuals. Nonetheless, this utility function
is not symmetric for all i, since it allows asymmetries in the first term, \(a_i(p)\). Finally, note that in the case of
quasilinear preferences, using \(b(p)=1/p_k\), we can represent the indirect utility function of an individual with
quasilinear preferences using the Gorman form as follows, \(v_i(p,w_i) = a_i(p) + \frac{1}{p_k}w_i\). Practice: take some of the
examples we have seen about quasilinear preferences, find the indirect utility function, and show that it can be
expressed in its Gorman form representation.

Hence, we can indeed express aggregate demand as a function of prices and aggregate wealth.
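The aggregation result can be illustrated with a short numerical sketch. It assumes Stone-Geary (linear expenditure system) preferences, whose indirect utility function has the Gorman form with \(a_i(p) = -p\cdot\gamma_i/P(p)\) and \(b(p)=1/P(p)\), where \(P(p)=p_1^{\beta}p_2^{1-\beta}\). The functional form, the parameter values and the function names are assumptions made only for this example. Demands are obtained from Roy's identity by finite differences, and aggregate demand is compared across two wealth distributions with the same total wealth.

```python
# Sketch: with Gorman-form indirect utilities, aggregate demand depends only
# on prices and aggregate wealth.
# Assumptions (not from the notes): Stone-Geary preferences with a common
# parameter BETA and consumer-specific subsistence bundles gamma_i.

BETA = 0.4
SUBSISTENCE = [(1.0, 2.0), (3.0, 0.5), (0.0, 1.0)]  # assumed gamma_i per consumer


def price_index(p):
    """P(p) = p1**BETA * p2**(1 - BETA), so that b(p) = 1 / P(p)."""
    return p[0] ** BETA * p[1] ** (1 - BETA)


def indirect_utility(p, w, gamma):
    """Gorman form v_i(p, w) = a_i(p) + b(p) * w."""
    a = -(p[0] * gamma[0] + p[1] * gamma[1]) / price_index(p)
    b = 1.0 / price_index(p)
    return a + b * w


def demand(p, w, gamma, good, eps=1e-6):
    """Walrasian demand for one good via Roy's identity (finite differences)."""
    up, down = list(p), list(p)
    up[good] += eps
    down[good] -= eps
    dv_dp = (indirect_utility(up, w, gamma) - indirect_utility(down, w, gamma)) / (2 * eps)
    dv_dw = (indirect_utility(p, w + eps, gamma) - indirect_utility(p, w - eps, gamma)) / (2 * eps)
    return -dv_dp / dv_dw


def aggregate_demand(p, wealths, good):
    """Sum of individual demands for one good."""
    return sum(demand(p, w, g, good) for w, g in zip(wealths, SUBSISTENCE))


if __name__ == "__main__":
    p = (2.0, 3.0)
    # Two wealth distributions with the same aggregate wealth (100).
    print(aggregate_demand(p, (50.0, 30.0, 20.0), 0))
    print(aggregate_demand(p, (10.0, 10.0, 80.0), 0))
```

The two printed numbers coincide even though individual demands differ across the two distributions, since all consumers share the same B(p).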

We conclude that we can represent aggregate demand as a function of prices and aggregate wealth when
preferences can be represented with a Gorman form indirect utility function. This condition, however,
might be somewhat restrictive. We wonder, hence, if we can obtain the same results using weaker
conditions. The literature has shown that we can indeed use weaker conditions by using two different
approaches. First, rather than assuming that aggregate demand depends only on total (aggregate) wealth,
we could allow aggregate demand to depend on a wider set of variables, e.g., the average wealth
level, the variance of the wealth distribution, etc., as shown in Deaton and Muellbauer (1980). The second
approach asks whether we can restrict the type of admissible wealth distributions. Indeed, in our previous
analysis we were allowing for any type of wealth distribution. However, the distribution of wealth among
individuals is usually a direct consequence of the labor market (wage distribution), stock ownership,
governmental programs, taxes, etc. [2,3]


Second question: aggregate demand and the WARP
In this section we seek to understand under which conditions aggregate demand satisfies the WARP. For
simplicity, let us use wealth distribution rules, \(w_i(p,w)\), i.e., functions that assign a wealth level to every
individual i, depending on the price vector p and the aggregate wealth in the economy w. In particular, we
consider only wealth distribution rules that are independent of prices and assign a constant fraction \(\alpha_i\) of the
aggregate wealth to every individual, [4]

\[ w_i(p,w) = \alpha_i w \]

We can then express the aggregate demand as a function of the wealth distribution rule. That is,

[2] One particular example of this approach uses the so-called wealth distribution rule, which considers a function
\(w_i(p,w)\) that assigns a wealth level to every individual i, depending on the price level and the aggregate wealth in the
economy w.
[3] Example: Having missed an opportunity with the recent vampire craze created by the Twilight series, Mattel has
offered a new product targeted at younger kids than their competitors: the Vampire Teddy Bear. The Vampire
Teddy Bear is a small, fluffy bear with two plastic fangs that (safely, according to Mattel) drill into the child's neck.
Mattel is planning a blitz marketing campaign emphasizing that the more bears you buy for your child the longer
they will stay silent (because of their extreme satisfaction with the bear). In fact, they provide an equation in the
commercial (the marketing director is on vacation and the chief economist has been pitching in): Minutes of Silence
= Number of Bears*20. Assuming this message penetrates the parenting market equally and there isn't a
diminishing return on silence from a child, should Mattel be concerned with income distribution? No, as long as the
total income does not change, Mattel will sell exactly the same number of bears. Why? Because we assume all the
parents want silence from their child, any demand lost from one parent with lower income is gained by a richer
parent who wants more minutes of silence.
[4] Note that this wealth distribution rule allows for different shares of wealth to be distributed to every individual,
i.e., \(\alpha_i\) being different from \(\alpha_j\) for any two subjects i and j, or for shares to coincide across all individuals,
i.e., \(\alpha_i=\alpha_j\).

\[ x(p,w) = \sum_{i=1}^{I} x_i\left(p,w_i(p,w)\right) = \sum_{i=1}^{I} x_i(p,\alpha_i w) \]

We can now describe under which conditions the aggregate demand function satisfies the WARP. In
particular, we extend the definition of WARP that we discussed in the chapter on Walrasian demand to
the aggregate demand function, as follows: aggregate demand x(p,w) satisfies the WARP if, whenever
1. the new bundle that consumers choose under p' and w', x(p',w'), is affordable under the old
prices and wealth, i.e., \(p\cdot x(p',w') \le w\), with \(x(p',w') \neq x(p,w)\), then
2. the old bundle that consumers choose under p and w, x(p,w), is NOT affordable under the new
prices and wealth, i.e., \(p'\cdot x(p,w) > w'\).
One interesting property of this definition of the WARP at the aggregate level is that individual Walrasian
demand might satisfy WARP at the individual level but the aggregate demand might violate WARP at the
aggregate level. Let us illustrate this possibility with an example. For simplicity, consider that the wealth
distribution rule assigns the same share of the wealth to both individuals 1 and 2, i.e., each receives w/2.
The following figure represents individual 1's Walrasian demand. It satisfies WARP since the new bundle
\(x_1(p',w/2)\) is affordable under the old budget set \(B_{p,w/2}\), and the old bundle \(x_1(p,w/2)\) is not affordable
under the new budget set \(B_{p',w/2}\).


Figure #3.3

The figure below illustrates individual 2's Walrasian demand. It also satisfies WARP, given that the new
bundle is unaffordable under the old budget set \(B_{p,w/2}\). [5]

Figure #3.4
We can now aggregate the Walrasian demand for individuals 1 and 2. For completeness, the following
figure illustrates individual and aggregate demands. First, note that, for the old budget set \(B_{p,w/2}\), the
average consumption across both individuals, \(\frac{1}{2}x(p,w)\), lies at the midpoint of the segment connecting
individual 1's and individual 2's demands on the old budget line. (A similar argument is applicable to the new
budget line \(B_{p',w/2}\) and the midpoint \(\frac{1}{2}x(p',w)\).) Using these midpoints, we obtain

\[ p\cdot\frac{1}{2}x(p',w) < \frac{w}{2} \]

since bundle B is below the old budget line \(B_{p,w/2}\), but

\[ p'\cdot\frac{1}{2}x(p,w) < \frac{w}{2} \]

given that bundle A is also below the new budget line \(B_{p',w/2}\). Multiplying both sides of these
expressions by 2, we obtain a violation of the WARP at the aggregate level:

\[ p\cdot x(p',w) < w \quad\text{and}\quad p'\cdot x(p,w) < w \]

But how can it be that the WARP is satisfied at the individual level and yet violated at the aggregate level? First,
note that the WARP at the individual level is equivalent to the compensated law of demand (CLD):

\[ (p'-p)\cdot\left[x_i(p',w') - x_i(p,w)\right] \le 0 \]

[5] Recall that if, when applying the definition of the WARP, the premise is false, then the WARP cannot be violated.

Where \(w' = p'\cdot x_i(p,w)\) is the wealth level that allows the consumer to still afford
his old bundle \(x_i(p,w)\) at the new prices p', i.e., the Slutsky wealth compensation. And, if the change in prices
is compensated for all the individuals, with \(w_i=\alpha_i w\), and thus \(\alpha_i w' = p'\cdot x_i(p,\alpha_i w)\), then we have

\[ (p'-p)\cdot\left[x_i(p',\alpha_i w') - x_i(p,\alpha_i w)\right] \le 0 \]

for every individual i. Adding over all individuals, we have that in the aggregate

\[ (p'-p)\cdot\left[x(p',w') - x(p,w)\right] \le 0 \]

which implies that the compensated law of demand is satisfied in the aggregate and, as a consequence, the
WARP is also satisfied in the aggregate.
Price changes, however, might not be accompanied by a wealth compensation for all individuals, i.e.,
\(\alpha_i w'\) might differ from \(p'\cdot x_i(p,\alpha_i w)\). In such a case, we would have that

\[ (p'-p)\cdot\left[x_i(p',\alpha_i w') - x_i(p,\alpha_i w)\right] \le 0 \]

doesn't hold for all individuals. As a result, the compensated law of demand

\[ (p'-p)\cdot\left[x(p',w') - x(p,w)\right] \le 0 \]

might not hold for aggregate demand, which implies that the WARP might not be necessarily satisfied
either. The following figure summarizes our results:

Figure #3.5

Remark about the uncompensated law of demand at the aggregate level and the WARP: note that when a
price change is not accompanied by a wealth compensation, we might have that, for some individual i,
a. If the income effect is negative, then it reinforces the substitution effect and the uncompensated law of
demand holds.
b. If the income effect is positive, then it goes in the opposite direction to the substitution effect. If the income
effect only partially offsets the substitution effect, the uncompensated law of demand still holds. However, if the
income effect more than offsets the substitution effect, then the uncompensated law of demand doesn't hold for
this individual i.
Importantly, when the uncompensated law of demand doesn't hold for some individuals we might have that the
uncompensated law of demand doesn't hold in the aggregate, and therefore, the WARP is not necessarily satisfied.

The possibility of having the WARP satisfied at the individual but not at the aggregate level raises the
question of whether we can impose some minimal conditions on the preference relations that guarantee
that the WARP is satisfied for the aggregate Walrasian demand. The following proposition shows that we
can.

Proposition. If every consumer's Walrasian demand function \(x_i(p,w_i)\) satisfies the uncompensated law of
demand (ULD), then aggregate demand \(x(p,w)=\sum_{i=1}^{I} x_i(p,\alpha_i w)\) satisfies the uncompensated law of demand and it
also satisfies the WARP.

Proof: For every individual i, the ULD states that
\((p'-p)\cdot[x_i(p',\alpha_i w) - x_i(p,\alpha_i w)] \le 0\), with strict inequality if \(x_i(p',\alpha_i w) \neq x_i(p,\alpha_i w)\).
Adding this expression over all i, we have

\[ (p'-p)\cdot\left[x(p',w) - x(p,w)\right] < 0 \]

whenever \(x(p',w) \neq x(p,w)\), i.e., the ULD holds at the aggregate level.
Let us now check WARP:
1) Take any (p,w) and (p',w'), with \(x(p,w) \neq x(p',w')\), such that \(p\cdot x(p',w') \le w\) (the new bundle is
affordable at the old (p,w)).
2) Define \(p'' = \frac{w}{w'}p'\).
3) By homogeneity of degree zero of x(p,w), we have
\(x(p'',w) = x\left(\frac{w}{w'}p',\frac{w}{w'}w'\right) = x(p',w')\). Hence \(x(p'',w) = x(p',w')\).
4) a) From the ULD at the aggregate level, we know that \((p''-p)\cdot[x(p'',w) - x(p,w)] < 0\).
b) From the equality in (3) and step (1) [affordability], we have \(p\cdot x(p'',w) \le w\).
c) From Walras' law, we know that \(p''\cdot x(p'',w) = w\) and \(p\cdot x(p,w) = w\).
5) From step 4(a), we can conclude:

\[ p''\cdot x(p'',w) - p''\cdot x(p,w) - p\cdot x(p'',w) + p\cdot x(p,w) < 0 \]

Using step 4(c), this becomes \(2w - p\cdot x(p'',w) < p''\cdot x(p,w)\), where \(p\cdot x(p'',w) \le w\) from step 4(b).
Therefore, \(w \le 2w - p\cdot x(p'',w) < p''\cdot x(p,w)\).
6) Hence, \(p''\cdot x(p,w) > w\), which implies

\[ \frac{w}{w'}\,p'\cdot x(p,w) > w \quad\Longrightarrow\quad p'\cdot x(p,w) > w' \]

That is, the bundle x(p,w) was unaffordable at the new prices and wealth (p',w'), and the WARP is satisfied.

Intuitively, this proposition states that if the uncompensated law of demand property is satisfied at the
individual level then everything will work out nicely at the aggregate level: uncompensated law of
demand will hold at the aggregate level and the WARP will be satisfied as well.

Remark on the uncompensated law of demand and NSD: recall that if the derivative of the Walrasian demand
with respect to prices, \(D_px_i(p,w_i)\), is negative semidefinite (NSD), then the elements of the main diagonal of
\(D_px_i(p,w_i)\) must be weakly negative. Intuitively, own-price effects must be weakly negative. Therefore, the
uncompensated law of demand holds. We can hence conclude that if \(D_px_i(p,w_i)\) is NSD then the uncompensated law
of demand holds for \(x_i(p,w_i)\). [6]


One question at this point is whether assuming that the uncompensated law of demand holds across all
consumers at the individual level, i.e.,

\[ (p'-p)\cdot\left[x_i(p',w_i) - x_i(p,w_i)\right] \le 0 \]

is a very restrictive assumption.

[6] In homework #4 you are asked to show that the converse relationship is not necessarily true.
Let us next see one example of individual preference relations for which the uncompensated law of
demand holds at the individual level (and, as a consequence, at the aggregate level as well).
If a preference relation is homothetic, then this individual's Walrasian demand satisfies the uncompensated
law of demand (while the converse is not necessarily true).
Proof:
The Slutsky equation is

\[ S_i(p,w_i) = D_px_i(p,w_i) + D_wx_i(p,w_i)\,x_i(p,w_i)^T \]

and for homothetic preference relations, \(x_i(p,w_i) = \alpha_i(p)\,w_i\) or, alternatively, \(\alpha_i(p) = \frac{x_i(p,w_i)}{w_i}\).
We then have that \(D_wx_i(p,w_i) = \alpha_i(p)\), which we can write as

\[ D_wx_i(p,w_i) = \frac{x_i(p,w_i)}{w_i} \]

Plugging and rearranging,

\[ D_px_i(p,w_i) = S_i(p,w_i) - \frac{1}{w_i}\,x_i(p,w_i)\,x_i(p,w_i)^T \]

Now we pre- and post-multiply all elements by dp,

\[ dp\cdot D_px_i(p,w_i)\,dp = \underbrace{dp\cdot S_i(p,w_i)\,dp}_{<0 \text{ if } dp\neq\lambda p,\ \ =0 \text{ if } dp=\lambda p} - \underbrace{\frac{1}{w_i}\left[dp\cdot x_i(p,w_i)\right]^2}_{>0 \text{ if } dp\cdot x_i\neq 0,\ \ =0 \text{ if } dp\cdot x_i=0} \]

Either way, \(dp\cdot D_px_i(p,w_i)\,dp < 0\), except when both terms vanish, i.e., when there is zero
consumption (\(x_i=0\)) and the change in prices is proportional to the initial price level, i.e., \(dp=\lambda p\).
\(D_px_i(p,w_i)\) is then negative semidefinite, and we already know that
the ULD holds if \(D_px_i(p,w_i)\) is negative semidefinite.
Hence, \(x_i(p,w_i)\) satisfies the ULD.
[Note: we just showed that homotheticity in preferences implies the ULD.]
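The following sketch illustrates this result, together with the previous proposition, for one particular homothetic case. It assumes Cobb-Douglas demands (the functional form, the wealth shares and the function names are illustrative assumptions) and evaluates \((p'-p)\cdot[x(p',w)-x(p,w)]\) for randomly drawn price pairs, both for a single consumer and for the aggregate of two consumers with fixed wealth shares.

```python
import random

# Sketch: the uncompensated law of demand (ULD) with homothetic preferences.
# Assumption (not from the notes): Cobb-Douglas demands x_k = a_k * w / p_k.


def demand(prices, wealth, shares):
    """Cobb-Douglas Walrasian demand for every good."""
    return [a * wealth / p for a, p in zip(shares, prices)]


def aggregate_demand(prices, total_wealth, consumers):
    """consumers is a list of (wealth_share alpha_i, Cobb-Douglas exponents)."""
    total = [0.0] * len(prices)
    for alpha, shares in consumers:
        for k, x in enumerate(demand(prices, alpha * total_wealth, shares)):
            total[k] += x
    return total


def uld_value(p_old, p_new, x_old, x_new):
    """(p' - p) . [x(p', w) - x(p, w)]; the ULD requires this to be <= 0."""
    return sum((pn - po) * (xn - xo)
               for po, pn, xo, xn in zip(p_old, p_new, x_old, x_new))


def max_uld_value(consumers, total_wealth=100.0, draws=1000, seed=0):
    """Largest value of the ULD expression over random price pairs."""
    rng = random.Random(seed)
    worst = float("-inf")
    for _ in range(draws):
        p_old = [rng.uniform(0.5, 5.0) for _ in range(2)]
        p_new = [rng.uniform(0.5, 5.0) for _ in range(2)]
        value = uld_value(p_old, p_new,
                          aggregate_demand(p_old, total_wealth, consumers),
                          aggregate_demand(p_new, total_wealth, consumers))
        worst = max(worst, value)
    return worst


if __name__ == "__main__":
    single = [(1.0, (0.5, 0.5))]
    two = [(0.3, (0.2, 0.8)), (0.7, (0.6, 0.4))]
    print(max_uld_value(single))  # never positive
    print(max_uld_value(two))     # never positive in the aggregate either
```

A random search of this kind cannot prove the property; it only fails to find a counterexample, which is what the proof above leads us to expect.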


Recall that the homothetic preferences we analyzed above are just one example of a preference relation
that satisfies the uncompensated law of demand at the individual level (and therefore it also satisfies
WARP at the aggregate level). Can we identify more general conditions under which the uncompensated
law of demand holds? First, recall that

\[ \frac{\partial x_{ki}(p,w_i)}{\partial p_k} = \underbrace{\frac{\partial h_{ki}(p,u)}{\partial p_k}}_{SE\,<\,0} - \underbrace{\frac{\partial x_{ki}(p,w_i)}{\partial w_i}\,x_{ki}(p,w_i)}_{IE} \]

For normal goods (which have a positive income effect, \(\partial x_{ki}/\partial w_i > 0\)), we have that the income effect
reinforces the substitution effect, and therefore the total effect associated with a price change is negative,
\(\partial x_{ki}(p,w_i)/\partial p_k < 0\). In other words,
the uncompensated law of demand holds. For inferior goods, the income effect is negative, which implies
that we can either have: (1) the absolute value of the substitution effect is still larger than that of the
income effect and, as a consequence, the total effect is still negative. In this case the uncompensated law
of demand still holds; (2) the absolute value of the substitution effect is smaller than that of the income
effect and, as a result, the total effect is positive, implying that the uncompensated law of demand is violated
(intuitively, this good is a so-called Giffen good). We can therefore conclude that the uncompensated law of
demand is satisfied at the individual level as long as consumer i doesn't regard good k as a Giffen good.
Hence, at the aggregate level, the uncompensated law of demand is satisfied as long as the positive
total effect, \(\partial x_{ki}(p,w_i)/\partial p_k > 0\) (TE>0, associated with those consumers who might regard good k as a
Giffen good), does not completely offset the negative total effect, \(\partial x_{ki}(p,w_i)/\partial p_k < 0\) (TE<0, associated
with usual goods), from the rest of consumers. Therefore, we can conclude that assuming the uncompensated law of
demand at the individual level doesn't seem a very restrictive assumption (since Giffen goods are rare), and
constitutes an even milder assumption at the aggregate level.
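The decomposition into substitution and income effects can be computed explicitly for a simple case. The sketch below assumes a Cobb-Douglas utility function with exponent 0.3 on good 1 (an illustrative assumption, as are the function names) and recovers the total effect, the substitution effect and the income effect by finite differences.

```python
# Sketch: Slutsky decomposition TE = SE + IE for the own-price effect of good 1.
# Assumption (not from the notes): Cobb-Douglas utility u = x1**A * x2**(1 - A).

A = 0.3  # assumed exponent on good 1


def walrasian_x1(p1, p2, w):
    """Walrasian demand for good 1."""
    return A * w / p1


def indirect_utility(p1, p2, w):
    """Utility reached at prices (p1, p2) and wealth w."""
    return (A * w / p1) ** A * ((1 - A) * w / p2) ** (1 - A)


def hicksian_x1(p1, p2, u):
    """Hicksian demand for good 1, h1 = de(p, u)/dp1."""
    e = u * (p1 / A) ** A * (p2 / (1 - A)) ** (1 - A)
    return A * e / p1


def slutsky_terms(p1, p2, w, eps=1e-5):
    """Return (total effect, substitution effect, income effect)."""
    u = indirect_utility(p1, p2, w)
    total = (walrasian_x1(p1 + eps, p2, w) - walrasian_x1(p1 - eps, p2, w)) / (2 * eps)
    substitution = (hicksian_x1(p1 + eps, p2, u) - hicksian_x1(p1 - eps, p2, u)) / (2 * eps)
    wealth_effect = (walrasian_x1(p1, p2, w + eps) - walrasian_x1(p1, p2, w - eps)) / (2 * eps)
    income = -wealth_effect * walrasian_x1(p1, p2, w)
    return total, substitution, income


if __name__ == "__main__":
    te, se, ie = slutsky_terms(2.0, 3.0, 100.0)
    print(te, se, ie)  # TE should equal SE + IE
```

Since good 1 is normal under this assumed utility function, the income effect has the same sign as the substitution effect and the total effect is negative.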
In all our previous discussion we showed that if the uncompensated law of demand is satisfied at the
individual level then it must be satisfied at the aggregate level as well. The converse, however, is not
necessarily true. Let's see one example.
Example. Suppose that all consumers have identical preferences, with individual demand functions \(\tilde{x}(p,w)\)
(where \(\tilde{x}(p,w)\) is denoted without a subscript because all individuals have the same individual demand
function), and that individual wealth is uniformly distributed on the interval \([0,\bar{w}]\) (with a continuum of
consumers). Then, the aggregate demand function

\[ x(p) = \int_0^{\bar{w}} \tilde{x}(p,w)\,dw \]

satisfies the uncompensated law of demand. [7,8]

[7] You are asked to show in homework #5 that this aggregate Walrasian demand satisfies the uncompensated law
of demand at the aggregate level even though it need not be satisfied at the individual level (you can also
read about this in MWG, page 113).
[8] Example: After a congressional investigation, it is revealed that the silence equation provided by Mattel for the
Vampire Teddy Bear is inaccurate. The equation only holds up to five bears. At that point, the effect on the child
diminishes and by ten bears, no additional minutes of silence are obtained. Other than the inevitable lawsuits, should
this concern Mattel? Yes. Mattel assumed that they could base pricing and production decisions on aggregate
demand. Aggregation depends on the uncompensated law of demand being satisfied which, in turn, depends on
homothetic preferences. Because of the diminishing nature of the minutes of silence, the preferences are not
homothetic. Intuitively this makes sense. If you already purchased ten bears for your child and someone shifted

Aggregate demand and the representative consumer (3rd question of the chapter)
In this section we analyze under which conditions we can use the welfare measures from previous chapters
(CV, EV, and AV) to evaluate aggregate welfare. In other words, we seek to answer the question: when
can we treat aggregate demand as if it were generated by a fictional representative consumer whose
preferences can be used as a measure of aggregate social welfare? We begin the section with two
definitions about the representative consumer: one from a positive (or behavioral) perspective and the
other from a normative perspective.
Positive or behavioral definition: a positive representative consumer exists if there is a rational preference
relation $\succsim$ on $\mathbb{R}^L_+$ such that the aggregate demand is precisely the Walrasian demand
function generated by this preference relation. That is, the bundle chosen by the aggregate demand function
x(p,w) is strictly preferred to any other affordable bundle, i.e., x(p,w) is strictly preferred to any bundle x
such that $p \cdot x \le w$, where x is different from x(p,w). Or, in other words, x(p,w) is the argmax of the
representative consumer's UMP when facing budget set $p \cdot x \le w$.

Normative definition:
In order to be able to assign welfare significance to this fictional individual's demand function we must
first define what we mean by social welfare. Let's first introduce the concept of the social welfare
function (SWF).
Bergson-Samuelson SWF: a social welfare function (SWF) is a function $W:\mathbb{R}^I \to \mathbb{R}$ that
assigns a utility value to each possible vector of individual utility levels $(u_1,u_2,\dots,u_I)$ for each of
the I consumers in the economy. We assume that the social welfare function $W(u_1,u_2,\dots,u_I)$ is
increasing, concave and differentiable in each argument. The figure below illustrates an example of a social
welfare function. Note that W(.) is increasing in $u_1$ and experiences an upward shift when $u_2$ increases.

Figure #3.6

income from some poorer individual to you, you would not buy an eleventh bear. However, the change in income
for the poorer individual would result in fewer bears being purchased.

Note that a utilitarian social welfare function such as $W(u_1,u_2)=u_1+u_2$ satisfies the above properties,
and so does a social welfare function such as $W(u_1,u_2)=au_1+bu_2$, where a,b>0 denote the weights that
society assigns to the welfare of individuals 1 and 2, respectively. More generally, note that social welfare
functions do not need to be additive. Indeed, a social welfare function such as
$W(u_1,u_2)=(au_1 \times bu_2)^{1/2}$ also satisfies the above assumptions.[9] Finally, recall the Rawlsian
social welfare function $W(u_1,u_2)=\min\{au_1,bu_2\}$, where a,b>0. Intuitively, this function represents
that society is only as well off as the individual in the worst position. We also assume that there is some
process (central authority, benevolent planner, IRS) that, for any prices and aggregate wealth (p,w), gives
the optimal wealth distribution among individuals by solving:
$$\max_{w_1,w_2,\dots,w_I} \; W\big(v_1(p,w_1),\dots,v_I(p,w_I)\big) \quad \text{s.t.} \quad \sum_{i=1}^{I} w_i \le w$$


Intuitively, the social planner first distributes a wealth level to every individual i, $w_i$, and afterwards
every individual chooses his optimal consumption bundle $x_i(p,w_i)$ by independently solving his UMP,
reaching an associated utility level $v_i(p,w_i)$. (For this reason, the social planner considers in his
maximization problem the maximum utility level that every individual will achieve when independently solving
his own UMP, $v_i(p,w_i)$.) The optimal value of the previous social planner's maximization problem defines a
value function v(p,w), which is referred to as the social indirect utility function.
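The planner's problem can be illustrated numerically. The following is only a minimal sketch under
assumptions that are not part of the notes: two consumers, fixed prices, a hypothetical indirect utility
$v_i(p,w_i)=\ln w_i$, and the weighted utilitarian SWF $W=au_1+bu_2$. The function name `social_optimum` and
the grid search are illustrative choices.

```python
import math

def social_optimum(a, b, w, steps=10_000):
    """Grid-search max a*v1(w1) + b*v2(w2) subject to w1 + w2 = w.

    Assumption (not from the notes): v_i(p, w_i) = ln(w_i) at fixed prices.
    The budget constraint is imposed with equality because W is increasing.
    """
    best_w1, best_value = None, -math.inf
    for n in range(1, steps):            # skip endpoints, where ln(0) is undefined
        w1 = w * n / steps
        w2 = w - w1
        value = a * math.log(w1) + b * math.log(w2)
        if value > best_value:
            best_w1, best_value = w1, value
    # best_value approximates the social indirect utility v(p, w) in this toy case
    return best_w1, w - best_w1, best_value

w1_star, w2_star, v_social = social_optimum(a=2.0, b=1.0, w=90.0)
print(round(w1_star, 1), round(w2_star, 1))
```

Under these assumptions the first-order conditions give $w_1 = \frac{a}{a+b}w$, so with a=2, b=1 and w=90 the
grid search should land close to the distribution (60, 30).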
We can relate the results from our previous maximization problem with the concept of a representative
consumer. Suppose that for every (p,w), the wealth distribution $(w_1(p,w), w_2(p,w), \dots, w_I(p,w))$ solves
the above maximization problem. Then the resulting social indirect utility function v(p,w) is an indirect
utility function of a positive representative consumer for the aggregate demand function
$$x(p,w) = \sum_{i=1}^{I} x_i(p, w_i(p,w))$$
That is, from the above maximization problem we obtain a positive representative consumer for the
aggregate demand function in which every consumer's wealth is the argmax of the above problem, i.e., a
wealth distribution among individuals that maximizes social welfare. We can now define the normative
representative consumer.[10]






[9] Let's consider functions that cannot be regarded as social welfare functions: $W(u_1,u_2)=au_1-bu_2$ or
$W(u_1,u_2)=u_1/u_2$, since they are both decreasing in $u_2$; and $W(u_1,u_2)=(u_1)^2+u_2$, since it is convex
in $u_1$. (Note that in the last function, individual 1 raises social welfare so much that it could be
convenient to make $u_2=0$.)
[10] For some examples, see MWG pp. 119-120.

Proposition 4.D.3. The positive representative consumer (with preferences $\succsim$) for the aggregate
demand function $x(p,w)=\sum_{i=1}^{I} x_i(p,w_i(p,w))$ is a normative representative consumer (relative to
the SWF $W(\cdot)$) if, for every pair (p,w), the distribution of wealth among consumers
$(w_1(p,w),\dots,w_I(p,w))$ solves problem 4.D.1, and therefore the value function v(p,w) from problem 4.D.1
is the social indirect utility function.


The regularizing effects of aggregation
The theory of aggregate demand presents two advantages: first, most of the data is in aggregate terms,
making the theory extremely applicable. Second, the use of aggregate (rather than individual) demand has
regularizing effects. In particular, the average (per consumer) demand tends to be more continuous, as a
function of prices, than the individual demands separately. For example, we know that if individual
preferences are strictly convex then individual demand functions are continuous. In this case, aggregate
demand is continuous as well. But, what if individual demands are not continuous? In this case, aggregate
demand can be (nearly) continuous. Let us elaborate on this property. The following figure illustrates the
case in which preferences are strictly convex and, as a consequence, individual demand is continuous.

Figure #3.7

If preferences are not convex (i.e., indifference curves are concave), however, discontinuities might arise,
as the next figure depicts. Specifically, at the initial price ratio the consumer chooses bundle A, spending
all his income on good 2. When the price of good 1 decreases sufficiently the consumer might find it
profitable to switch all his consumption towards good 1, since by doing so he can reach a higher indifference
curve $I_2$. The bottom figure illustrates the demand curve at different price levels: when the price of good
1 slightly decreases, the consumer still uses all his income to buy good 2 alone. When the price of good 1
decreases below a certain cutoff price, however, the consumer stops buying good 2 and starts using all his
income on purchases of good 1.[11]


Figure #3.8
In cases as the one analyzed above, the only requirement we will impose in order to guarantee that the
aggregate demand is continuous in prices is that the preferences of all individuals are not too concentrated
around the prices for which individual demands are discontinuous. Let us examine an example.
Example 4.AA.1. Consider two goods, and consumers with quasilinear preferences with respect to the first
good (we will treat the second good as the numeraire). For simplicity, we assume that the first good is
only available in integer amounts, and consumers have no wish for more than one unit of it (e.g.,
appliances, cars, etc.). We can hence normalize consumer i's utility to be zero when he consumes zero
amounts of the first good, $v_{1i}=0$ when $x_1=0$, and positive otherwise, $v_{1i}>0$. Thus, $v_{1i}$
represents the utility

[11] Note that a similar discussion is applicable for the case of linear preferences (when the consumer regards two
goods as perfectly substitutable).

of holding positive amounts of good 1 in terms of good 2 (the numeraire). Consumer i's demand for the
first good is hence
$$x_{1i}(p_1) = \begin{cases} 1 & \text{if } p_1 < v_{1i} \\ \{0,1\} & \text{if } p_1 = v_{1i} \\ 0 & \text{if } p_1 > v_{1i} \end{cases}$$

Graphically, this demand function can be represented as the figure below.

Figure #3.9
Intuitively, $v_{1i}$ can be interpreted as the reservation price of good 1 for consumer i, since it denotes the
maximum price at which he is willing to buy the good: for prices above that level he demands zero units,
while for prices below he demands one unit. When constructing the aggregate demand for good 1, we can
horizontally add the individual demands of all consumers. Note that some consumers might be in the
interval of zero demand (where $p_1>v_{1i}$), others might be in their interval of one-unit demand (because
$p_1<v_{1i}$), while other consumers might simply be indifferent between buying and not buying the good (i.e.,
the price is exactly at the discontinuity point of individual demand, $p_1=v_{1i}$). The only assumption we
need to impose is that most of these consumers are not at the discontinuity point $p_1=v_{1i}$.
Denote by $x_1(p_1)$ the demand of consumers with one-unit demand for the good (i.e., those with a
reservation price sufficiently high, $v_{1i}>p_1$). Therefore, if consumers' reservation prices are distributed
according to a cumulative distribution function $G(p_1)$, representing the share of consumers with reservation
prices below $p_1$, $v_{1i}<p_1$, we can express $x_1(p_1)$ more compactly as $x_1(p_1)=1-G(p_1)$. Hence, the
aggregate demand $x_1(p_1)$ is a continuous function (as long as $G(\cdot)$ is continuous) even though none of
the individual demand correspondences are so.[12] The
figure below represents a demand curve. Note that the vertical intercept reflects the highest reservation

[12] Note that if we are dealing with a small number of consumers, the above condition on not everybody having the
same $v_{1i}$ point might not be satisfied. For this reason, we assume a continuum of consumers.

price among all consumers, while the horizontal intercept represents a low enough price such that all
consumers (100%) decide to buy the good.

Figure #3.10
Finally, the following figure illustrates the role of $G(p_1)$ on the demand for good 1: $G(p_1)$ can be
understood as the share of consumers for which a particular price $p_1$ is higher than the reservation price
for the good, and who therefore demand zero units of it. In contrast, $1-G(p_1)$ reflects the share of
consumers for which the particular price $p_1$ is low enough to justify a one-unit demand.

Figure #3.11
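
The smoothing effect of aggregation in Example 4.AA.1 can be checked with a small simulation. This is a
hedged sketch: the uniform distribution of reservation prices on [0, 10], the sample size, and the function
name `aggregate_demand` are assumptions made purely for illustration.

```python
import random

def aggregate_demand(reservation_prices, price):
    """Share of consumers who buy one unit, i.e. those with reservation price above `price`."""
    buyers = sum(1 for v in reservation_prices if v > price)
    return buyers / len(reservation_prices)

random.seed(0)
# Assumption: reservation prices are uniform on [0, 10], so G(p) = p / 10.
consumers = [random.uniform(0.0, 10.0) for _ in range(100_000)]

for p in (2.0, 5.0, 8.0):
    simulated = aggregate_demand(consumers, p)
    theoretical = 1.0 - p / 10.0          # 1 - G(p)
    print(f"p={p}: simulated share {simulated:.3f}, 1-G(p) = {theoretical:.3f}")
```

Each individual demand jumps from one to zero at the consumer's own reservation price, but with many
consumers whose reservation prices are spread out, the simulated share of buyers should track $1-G(p_1)$
closely and change smoothly with the price.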

Chapter 4 Production Theory
Production theory
In this chapter we introduce the production set and production function of a firm and their related
properties. In addition, we will examine the firm's profit maximization problem, and its dual: the firm's
cost minimization problem. We will also describe the cost function, and the aggregate production decision
among several firms.[1] Finally, we will investigate under which conditions we can state that a firm's
production decision is efficient.[2]


Production sets
Let us define a production vector (or production plan) $y=(y_1,y_2,\dots,y_L)$ as a vector with L components.
If a particular component of the vector is positive, e.g., $y_2>0$, it denotes that the firm is producing
positive amounts of good 2. If, instead, a component of the vector is negative, $y_2<0$, it denotes that the
firm is using good 2 as an input in its production process.
We are especially interested in production plans that are technologically feasible. We represent all
technologically feasible production plans as part of the production set Y,
$$Y = \{\, y \in \mathbb{R}^L : F(y) \le 0 \,\}$$
where F(y) is the transformation function. This function can be intuitively understood as a production
function, as the following figure illustrates.

Figure #4.1

[1] Aggregation in production theory will prove easier than in consumer theory. In particular, no wealth effects arise in
production theory, making the aggregation among several firms easier.
[2] This chapter follows chapter 5 in MWG. For an intermediate microeconomics approach see chapters 6 to 8 in
Besanko and Braeutigam, and for a presentation combining sections of MWG and intermediate microeconomics, see
Varian (chapters 1-5).

In particular, the firm uses units of good 1 as an input in its production process in order to produce units
of good 2 as an output. For this reason, in the left-hand side of the figure $y_1<0$ (input) while $y_2>0$
(output). On the one hand, the boundary of the production set indicates production plans for which F(y)=0.
(We also refer to this boundary as the transformation frontier.) On the other hand, points below the
transformation frontier indicate feasible production plans for which F(y)<0.
Therefore, for any production plan $\bar{y}$ on the transformation frontier, i.e., such that $F(\bar{y})=0$, we
can totally differentiate the transformation function, as follows
$$\frac{\partial F(\bar{y})}{\partial y_k}\,dy_k + \frac{\partial F(\bar{y})}{\partial y_l}\,dy_l = 0$$
and solving for $dy_k/dy_l$, we obtain
$$\frac{dy_k}{dy_l} = -\frac{\partial F(\bar{y})/\partial y_l}{\partial F(\bar{y})/\partial y_k} = -MRT_{l,k}(\bar{y}), \quad \text{where } MRT_{l,k}(\bar{y}) = \frac{\partial F(\bar{y})/\partial y_l}{\partial F(\bar{y})/\partial y_k}$$
The marginal rate of transformation between goods l and k, evaluated at $\bar{y}$, $MRT_{l,k}(\bar{y})$,
measures how much the (net) output of good k can increase if the firm decreases the (net) output of good l by
one marginal unit.
Let us next denote inputs and outputs with different letters. In particular,
$$q=(q_1,q_2,\dots,q_M) \ge 0 \quad \text{(outputs)}$$
$$z=(z_1,z_2,\dots,z_{L-M}) \ge 0 \quad \text{(inputs)}$$
where the total number of goods, L, is larger than or equal to the number of outputs, M. In this case, hence,
inputs are transformed into outputs by the production function $f(z_1,z_2,\dots,z_{L-M})$, i.e.,
$f:\mathbb{R}^{L-M}_+ \to \mathbb{R}^M_+$. Let us consider an example. A firm producing one single output, M=1,
using L-1 inputs, has a production set Y that can be described as
$$Y = \{\, (-z_1,-z_2,\dots,-z_{L-1},q) : q - f(z_1,z_2,\dots,z_{L-1}) \le 0 \text{ and } (z_1,z_2,\dots,z_{L-1}) \ge 0 \,\}$$

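Totally differentiating this firm's production function f(z), similarly to what we did above for the
transformation function, and holding the output level fixed, we obtain
$$\frac{\partial f(\bar{z})}{\partial z_k}\,dz_k + \frac{\partial f(\bar{z})}{\partial z_l}\,dz_l = 0$$
and rearranging
$$\frac{dz_k}{dz_l} = -\frac{\partial f(\bar{z})/\partial z_l}{\partial f(\bar{z})/\partial z_k} = -MRTS_{l,k}(\bar{z}), \quad \text{where } MRTS_{l,k}(\bar{z}) = \frac{\partial f(\bar{z})/\partial z_l}{\partial f(\bar{z})/\partial z_k}$$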

Intuitively, the marginal rate of technical substitution between inputs l and k, evaluated at input vector
$\bar{z}$, $MRTS_{l,k}(\bar{z})$, measures the additional amount of input k that must be used when we decrease
the amount of input l marginally, and we want to keep the output level unchanged at $\bar{q}=f(\bar{z})$.[3]
The following figure depicts the combinations of inputs 1 and 2 (e.g., capital and labor) for which the firm
reaches a production level of 200 units. We refer to this level set of $(z_1,z_2)$ pairs reaching the same
total output as an isoquant curve for the firm.[4] As we have described above, the slope of the isoquant (in
absolute value) is the $MRTS_{1,2}$, since it depicts by how much we must increase the use of input 2 if we
are to marginally decrease the amount of input 1.

Figure #4.2
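Example. Let us next find the $MRTS_{1,2}(\bar{z})$ for a Cobb-Douglas production function,
$$f(z_1,z_2) = z_1^{\alpha} z_2^{\beta}, \quad \text{where } \alpha>0 \text{ and } \beta>0$$
$$\frac{\partial f(z_1,z_2)}{\partial z_1} = \alpha z_1^{\alpha-1} z_2^{\beta}$$
$$\frac{\partial f(z_1,z_2)}{\partial z_2} = \beta z_1^{\alpha} z_2^{\beta-1}$$
$$MRTS_{1,2}(z) = \frac{\alpha z_1^{\alpha-1} z_2^{\beta}}{\beta z_1^{\alpha} z_2^{\beta-1}} = \frac{\alpha z_2^{\beta-(\beta-1)}}{\beta z_1^{\alpha-(\alpha-1)}} = \frac{\alpha z_2}{\beta z_1}$$
If we were given a particular value of inputs, $\bar{z}=(\bar{z}_1,\bar{z}_2)=(2,3)$, then
$MRTS_{1,2}(\bar{z}) = \frac{3\alpha}{2\beta}$.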
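
The Cobb-Douglas result can be cross-checked numerically. The sketch below approximates the marginal products
by central finite differences and compares their ratio with the closed form $\alpha z_2/(\beta z_1)$ at the
input vector (2, 3). The parameter values $\alpha=0.4$, $\beta=0.6$, the step size, and the function names are
illustrative assumptions, not part of the notes.

```python
def cobb_douglas(z1, z2, alpha, beta):
    """Cobb-Douglas production function f(z1, z2) = z1^alpha * z2^beta."""
    return z1 ** alpha * z2 ** beta

def mrts_numeric(f, z1, z2, h=1e-6):
    """MRTS_{1,2} = f_1 / f_2, with marginal products from central differences."""
    f1 = (f(z1 + h, z2) - f(z1 - h, z2)) / (2 * h)
    f2 = (f(z1, z2 + h) - f(z1, z2 - h)) / (2 * h)
    return f1 / f2

alpha, beta = 0.4, 0.6                      # illustrative exponents
f = lambda z1, z2: cobb_douglas(z1, z2, alpha, beta)

numeric = mrts_numeric(f, 2.0, 3.0)
closed_form = (alpha * 3.0) / (beta * 2.0)  # 3*alpha / (2*beta)
print(numeric, closed_form)
```

The two numbers should agree up to the finite-difference error, which is one simple way to catch algebra
slips when deriving an MRTS by hand.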



[3] Note the close relationship between the $MRTS_{l,k}$ in production theory and the $MRS_{x,y}$ in consumer
theory. In particular, the latter measures the additional amount of good y that an individual must consume
when we decrease the amount of good x marginally and we want to keep the utility level of this individual
unchanged at $\bar{u}$.
[4] Note also the close relationship between isoquant curves for a firm (representing combinations of inputs for which
the firm reaches the same level of output) and indifference curves for a consumer (representing combinations of
goods for which the consumer reaches the same utility level).

Properties of production sets
Let us next describe different properties of production sets. Note that some of the following properties can
be mutually exclusive.
1. Y is nonempty: That is, we have inputs and/or outputs. If, using inputs we can only obtain zero
amounts of output, our production set would still be nonempty.
2. Y is closed: the production set includes its boundary points.
3. No free lunch: this property states that the firm must use inputs in order to produce output. Or, in
other words, the firm cannot be producing outputs 1 and 2 without using any inputs. The
following figure represents a production set that satisfies the no free lunch condition, since the
firm is using amounts of input 1 in order to produce positive amounts of good 2 as an output. In
contrast, the two figures at the bottom illustrate production plans that violate the no free lunch
condition given that the firm produces positive amounts of good 1 and 2 without the need to use
any inputs.

Figure #4.3
4. Possibility of inaction. This condition states that the firm can choose to use no inputs and obtain
no output as a consequence. In other words, the input-output vector 0 (in the origin of the figure)
is part of the production set.


Figure #4.4
Let us examine the relationship between this condition and the presence of fixed or sunk costs. If
the firm experiences fixed costs, as the figure below depicts, the firm must use an amount of input
1 before obtaining any output in return. Inaction, however, is still possible since the origin still
belongs to the production set. If the fixed costs that the firm must incur (e.g., setup costs) are
sunk, then the firm cannot move towards the origin 0. For instance, the firm already signed a contract
for the purchase of $\bar{y}_1$ units of input, and cannot renege on such a contract. In this case,
inaction is not possible.

Figure #4.4
5. Free disposal: if y is a production plan that belongs to the production set Y, and $y' \le y$, then $y'$
must also belong to production set Y. Intuitively, note that production plan $y'$ is less efficient than
production plan y: either it produces the same output using more inputs, or produces less output
using the same amount of inputs. Then, production plan $y'$ must also belong to the firm's
production set. Hence, the producer can use more inputs without the need to reduce his output: in
particular, the producer can dispose of (eliminate) the additional inputs he doesn't need at no cost
(for this reason this property is referred to as free disposal).
The following two figures illustrate the free disposal property for two very different production
sets: if production plan y belongs to the production set Y, then any $y' \le y$ must also be part of Y.


Figure #4.5

6. Irreversibility: suppose that production plan y belongs to production set Y (and that it does not
coincide with the origin). Then, production plan $-y$ cannot belong to Y. The following two figures
illustrate the irreversibility property. This property illustrates that there is no way back for
the firm. It is easy to construct a production set Y and a production plan y belonging to that set,
for which its mirror image in the right-hand side quadrant, $-y$, does not belong to the production set.
7. Nonincreasing returns to scale: if production plan y belongs to Y, then a scaling down of
production plan y, $\alpha y$ for $\alpha \in [0,1]$, is also part of the production set Y. The following
figures illustrate a production set meeting nonincreasing returns to scale (since scaling down any
production plan y yields a new production plan that also lies in the production set Y), and a
production set that violates nonincreasing returns to scale (where scaling down production plan y
creates a new production plan that does not belong to the production set Y, the right graph).

Figure #4.6
Nonincreasing returns to scale maintain an interesting relationship with the presence of fixed and
sunk costs. In particular, the presence of any of these costs implies that the firm's production set
violates nonincreasing returns to scale. The following two figures illustrate that scaling down a
given production plan y when the firm incurs fixed or sunk costs yields a new (scaled down)
production plan that doesn't necessarily lie within production set Y.


Figure #4.7
8. Nondecreasing returns to scale: if production plan y belongs to Y, then a scaling up of production
plan y, $\alpha y$ for $\alpha \ge 1$, is also part of the production set Y. The following figure (left)
depicts a production set satisfying nondecreasing returns to scale, since scaling up any production plan y
yields a new production plan that also belongs to the production set Y. In contrast, the figure on the
right shows a production set that violates nondecreasing returns to scale: scaling up production
plan y yields a new production plan that does not belong to production set Y.

Figure #4.8
Unlike our previous discussion about the relationship between nonincreasing returns to scale and
fixed and sunk costs, nondecreasing returns to scale can be satisfied even when firms incur fixed
and sunk costs. The next two figures illustrate this point: scaling up production plan y yields a
new production plan that belongs to production set Y, both when firms incur fixed costs (left
figure) and when they incur sunk costs (right figure).


Figure #4.9
9. Constant returns to scale: if production plan y belongs to Y, then production plan $\alpha y$ also
belongs to Y, for any $\alpha \ge 0$. Hence, both scaling down an original production plan y (when alpha
takes a value between zero and one, $\alpha \in [0,1]$) and scaling up an original production plan y
(when alpha takes a value larger than one, $\alpha>1$) yield new production plans that still belong to
the production set Y. This point is illustrated in the following figure, which emphasizes that in
order for constant returns to scale to be satisfied we need the transformation frontier to be
represented with a straight line.[5] Another way to interpret constant returns to scale is by noticing
that production set Y satisfies both non-increasing and non-decreasing returns to scale
simultaneously.[6]


[5] Note that in the case in which the firm uses two inputs in order to produce one output, constant returns to scale
implies that the cone representing the production set for the firm must have a straight surface. That is, making a
vertical slice of the cone we can obtain a production set in 2D as that in the above figure.
[6] One interesting exercise is to check whether constant returns to scale can be satisfied when the firm incurs fixed or
sunk costs. Another interesting exercise is to show that a production set Y satisfies constant returns to scale if and
only if the production function is homogeneous of degree one (see Exercise 5.B.2, and the review sessions).


Figure #4.10

In the following figures we include an alternative graphical representation of constant, increasing and
decreasing returns to scale using isoquants. The figure on the left shows constant returns to scale: an
increase in both labor and capital by a factor of 2 (doubling their initial amounts) increases output
proportionally (doubling it, allowing the firm to move from isoquant Q=100 to Q=200). The figure at the
center shows that the same increase in labor and capital increases output more than proportionally (the firm
moves from isoquant Q=100 to Q=300) when the production process exhibits increasing returns to scale.
Finally, the figure on the right hand side reflects that a similar increase in both inputs produces a less than
proportional increase in output when the firm exhibits decreasing returns to scale.

Figure #4.11

Importantly, note that the presence of increasing, decreasing or constant returns to scale can have
regulatory implications. In particular, if a firm exhibits significant increasing returns to scale, it will be
able to produce a given amount of output at a lower cost per unit than could two equal-size smaller firms,
each of them producing half as much output. In such a context, a market would be more efficiently served
by one large firm than by several small firms. If, in contrast, firms in an industry exhibit decreasing returns
to scale, the opposite argument applies, and a market would be more efficiently served by several firms
rather than by one large firm.[7]

Note that the presence of constant returns to scale does not necessarily imply constant marginal product.
Indeed, as the figure below illustrates, a firm can exhibit constant returns to scale (an increase in both
inputs by a factor of 2 produces a proportional increase in output) but diminishing marginal product of
labor, since an increase in labor from 10 to 20 workers produces an increase in output of 40 units (from
100 to 140), but a further increase in labor from 20 to 30 workers only induces an increase in output of 30
units (from 140 to 170), i.e., the marginal product of labor is positive but decreasing.

Figure #4.12

Example. Let us next check returns to scale in the Cobb-Douglas production function

[7] Another measure of returns to scale is the so-called scale elasticity, which measures the percent increase in
output due to a 1% increase in the amounts of all inputs. That is,
$E_{q,t} = \frac{\partial f(tk,tl)}{\partial t} \cdot \frac{t}{f(k,l)}$, evaluated at t=1. Exercise 9.9 in NS
provides additional practice on scale elasticities.

$$f(\lambda z_1,\lambda z_2) = (\lambda z_1)^{\alpha}(\lambda z_2)^{\beta} = \lambda^{\alpha+\beta} z_1^{\alpha} z_2^{\beta} = \lambda^{\alpha+\beta} f(z_1,z_2)$$
Therefore, when $\alpha+\beta=1$, we have constant returns to scale. Importantly, note the strong
relationship between returns to scale and homogeneity of the production function. The definition of
homogeneity of degree one states that if we increase all inputs in the same proportion we must see the
total output of the firm increase in the same proportion. This is exactly what occurs when constant returns
to scale are satisfied, i.e., when $\alpha+\beta=1$. When the sum $\alpha+\beta>1$, we then have increasing
returns to scale and the production function is homogeneous of degree larger than one. Finally, when
$\alpha+\beta<1$, the production set satisfies decreasing returns to scale and the production function is
homogeneous of degree less than one.[8]
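
The link between the sum of exponents and returns to scale can be verified numerically. The following sketch
estimates the degree of homogeneity k from $f(tz)=t^k f(z)$ and classifies returns to scale accordingly. The
helper names, the exponent pairs, the input vector (4, 9) and the scaling factor t=2 are illustrative
assumptions.

```python
import math

def homogeneity_degree(f, inputs, t=2.0):
    """Return k such that f(t*z) = t**k * f(z), evaluated at the given input vector."""
    scaled = [t * z for z in inputs]
    return math.log(f(*scaled) / f(*inputs)) / math.log(t)

def classify(degree, tol=1e-9):
    """Map a degree of homogeneity to the type of returns to scale."""
    if abs(degree - 1.0) <= tol:
        return "constant"
    return "increasing" if degree > 1.0 else "decreasing"

def cobb_douglas(alpha, beta):
    """Build f(z1, z2) = z1^alpha * z2^beta for given exponents."""
    return lambda z1, z2: z1 ** alpha * z2 ** beta

for alpha, beta in [(0.3, 0.7), (0.6, 0.8), (0.2, 0.3)]:
    k = homogeneity_degree(cobb_douglas(alpha, beta), (4.0, 9.0))
    print(f"alpha+beta={alpha + beta:.1f}  estimated degree={k:.6f}  returns: {classify(k)}")
```

For a Cobb-Douglas function the estimated degree should coincide with $\alpha+\beta$ (up to floating-point
error) at any positive input vector, since the function is homogeneous everywhere.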

Several empirical applications use the Cobb-Douglas production function to test for the presence of
increasing, decreasing or constant returns to scale. The table below reports the sum of the exponents,
$\alpha+\beta$, separating industries in three groups: those with increasing returns to scale
($\alpha+\beta>1$), those with constant returns to scale ($\alpha+\beta=1$), and those with decreasing returns
to scale ($\alpha+\beta<1$). Note that, for example, doubling all inputs in the tobacco industry implies that
output grows less than proportionally (by a factor of $2^{0.51}\approx 1.42$), while increasing inputs in a
similar fashion in the primary metal industry produces a more than proportional increase in output (by a
factor of $2^{1.24}\approx 2.36$).

Returns to scale      Industry               Alpha+Beta
Decreasing returns    Tobacco                0.51
                      Food                   0.91
Constant returns      Apparel and textile    1.01
                      Furniture              1.02
                      Electronics            1.02
Increasing returns    Paper products         1.09
                      Petroleum and coal     1.18
                      Primary metal          1.24
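
The output multipliers quoted for tobacco and primary metal follow directly from the table: with a
Cobb-Douglas technology, scaling all inputs by t scales output by $t^{\alpha+\beta}$. A short sketch (the
dictionary simply transcribes the table; the function name `output_multiplier` is an illustrative choice):

```python
# Sum of exponents (alpha + beta) by industry, transcribed from the table above.
exponent_sums = {
    "Tobacco": 0.51,
    "Food": 0.91,
    "Apparel and textile": 1.01,
    "Furniture": 1.02,
    "Electronics": 1.02,
    "Paper products": 1.09,
    "Petroleum and coal": 1.18,
    "Primary metal": 1.24,
}

def output_multiplier(exponent_sum, input_multiplier=2.0):
    """Factor by which output grows when all inputs are scaled by `input_multiplier`."""
    return input_multiplier ** exponent_sum

for industry, s in exponent_sums.items():
    print(f"{industry:20s} alpha+beta={s:.2f}  doubling inputs multiplies output by {output_multiplier(s):.2f}")
```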

Example. The linear production function exhibits constant returns to scale. Indeed,
$$f(K,L) = aK + bL$$
$$f(tK,tL) = atK + btL = t(aK+bL) = t\,f(K,L)$$


[8] For a more detailed discussion of the relationship between returns to scale and homogeneity of the production
function, see NS 302-304.


And similarly the fixed proportion production function exhibits constant returns to scale since
$$f(K,L) = \min\{aK,bL\}$$
$$f(tK,tL) = \min\{atK,btL\} = t\min\{aK,bL\} = t\,f(K,L)$$

One interesting property of a production function f(K,L) exhibiting constant returns to scale is that we can
incorporate increasing or decreasing returns to scale by simply using a transformation F(.),
$$F(K,L) = [f(K,L)]^{\gamma}, \quad \text{where } \gamma>0$$
Indeed,
$$F(tK,tL) = [f(tK,tL)]^{\gamma} = [t\,f(K,L)]^{\gamma} = t^{\gamma}[f(K,L)]^{\gamma} = t^{\gamma}F(K,L)$$
where the second equality holds by CRS of f(K,L). Then if $\gamma>1$, the transformed production function
F(K,L) exhibits increasing returns to scale, if $\gamma=1$ it exhibits constant returns to scale, and if
$\gamma<1$ it exhibits decreasing returns to scale.
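
The effect of the power transformation can be checked on the fixed proportion technology. This is a minimal
sketch: the coefficients a=2, b=3, the input levels and the scaling factor t=3 are arbitrary illustrative
values, and the function names are assumptions.

```python
def leontief(K, L, a=2.0, b=3.0):
    """Fixed proportion production function f(K, L) = min{aK, bL}, which is CRS."""
    return min(a * K, b * L)

def transformed(K, L, gamma):
    """Power transformation F(K, L) = f(K, L) ** gamma."""
    return leontief(K, L) ** gamma

K, L, t = 5.0, 4.0, 3.0
for gamma in (0.5, 1.0, 2.0):
    lhs = transformed(t * K, t * L, gamma)        # F(tK, tL)
    rhs = t ** gamma * transformed(K, L, gamma)   # t^gamma * F(K, L)
    print(f"gamma={gamma}: F(tK,tL)={lhs:.4f}, t^gamma*F(K,L)={rhs:.4f}")
```

In each row the two values should coincide, so F is homogeneous of degree $\gamma$: tripling both inputs
multiplies output by $3^{\gamma}$, which is more than proportional only when $\gamma>1$.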

10. Additivity (or free entry): if production plans y and $y'$ individually belong to production set Y,
then their sum $y+y'$ must also belong to Y. That is, if one plant produces y and another plant enters
producing $y'$, then the aggregate production $y+y'$ must be feasible.
11. Convexity: if two production plans y and $y'$ belong to the production set Y, then any convex
combination of them must also belong to Y,
$$y, y' \in Y \text{ and } \alpha \in [0,1] \;\Rightarrow\; \alpha y + (1-\alpha)y' \in Y$$

The left figure below illustrates a production set that satisfies convexity. In particular, the linear
combination between two production plans on the production frontier belongs to the production
set Y, as does the linear combination between two production plans that do not belong to the
production frontier. In contrast, the figure on the right reflects a production set that violates
convexity. Specifically, the linear combination between two production plans does not
necessarily lie within production set Y. Therefore, the interpretation of convexity is that balanced
input combinations (using a mixture of the inputs in the two production plans) are more
productive than unbalanced input combinations.[9]


[9] In addition, the convexity of the production set maintains a close relationship with the concavity of the production
function; see exercise 5.B.3 in MWG, and the review sessions.


Figure #4.13
The following two figures examine convexity for production sets in which the firm incurs fixed
costs (left figure) or sunk costs (right figure). In particular, note that when the firm incurs fixed
costs a linear combination of two production plans yields a new production plan that doesn't
necessarily lie within production set Y, violating convexity. In contrast, when the firm incurs sunk
costs, any linear combination between two production plans lies within production set Y, satisfying
convexity.

Figure #4.14
Let us now make a brief detour in order to discuss under which conditions the marginal rate of technical
substitution (MRTS), representing the slope of the firm's isoquants, is decreasing.
$$MRTS_{l,k} = \frac{f_l}{f_k} = -\frac{dk}{dl}$$
$$\frac{\partial MRTS_{l,k}}{\partial l} = \frac{f_k\left(f_{ll} + f_{lk}\frac{dk}{dl}\right) - f_l\left(f_{kl} + f_{kk}\frac{dk}{dl}\right)}{(f_k)^2}$$

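We hence want to check under which conditions this derivative is smaller than zero. Using the fact that
$\frac{dk}{dl}=-\frac{f_l}{f_k}$ along an isoquant, and Young's Theorem, $f_{lk}=f_{kl}$,
$$\frac{\partial MRTS_{l,k}}{\partial l} = \frac{f_k\left(f_{ll} - f_{lk}\frac{f_l}{f_k}\right) - f_l\left(f_{kl} - f_{kk}\frac{f_l}{f_k}\right)}{(f_k)^2} = \frac{f_k f_{ll} - f_{lk}f_l - f_l f_{lk} + f_{kk}\frac{f_l^2}{f_k}}{(f_k)^2}$$
Hence, multiplying numerator and denominator by $f_k$,
$$\frac{\partial MRTS_{l,k}}{\partial l} = \frac{f_k^2 f_{ll} + f_{kk} f_l^2 - 2 f_l f_k f_{lk}}{(f_k)^3}$$
and, given $f_k>0$, $\frac{\partial MRTS_{l,k}}{\partial l}<0$ if and only if
$f_k^2 f_{ll} + f_{kk} f_l^2 - 2 f_l f_k f_{lk} < 0$. With positive and diminishing marginal products
($f_l,f_k>0$ and $f_{ll},f_{kk}<0$), the first two terms of the numerator are negative, so the sign depends on
the cross-partial $f_{lk}$.
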

First, note that if $f_{lk}>0$ (i.e., if an increase in the amount of capital raises the marginal productivity
of workers), then MRTS is decreasing in labor, and the isoquants get flatter as we move to larger numbers
of workers. If, however, $f_{lk}<0$, then we can have two cases:
1) If $f_{lk}>0$ (i.e., $\uparrow k \Rightarrow \uparrow MP_l$), then $MRTS_{l,k}$ is decreasing in l.
2) If $f_{lk}<0$, then we can have:
a) $\left|f_k^2 f_{ll} + f_{kk} f_l^2\right| > \left|2 f_l f_k f_{lk}\right| \Rightarrow \frac{\partial MRTS_{l,k}}{\partial l} < 0$
b) $\left|f_k^2 f_{ll} + f_{kk} f_l^2\right| < \left|2 f_l f_k f_{lk}\right| \Rightarrow \frac{\partial MRTS_{l,k}}{\partial l} > 0$


We summarize our results in the following two figures. The first one illustrates isoquants where the MRTS
is decreasing in labor, and embodies both the case in which $f_{lk} > 0$ and the case in which $f_{lk} < 0$ (but an
increase in the amount of capital produces a relatively small decrease in the marginal productivity of
workers). The second figure (right) reflects, in contrast, isoquants where the MRTS is increasing in labor,
and represents the case in which $f_{lk} < 0$ and, in addition, an increase in the amount of capital produces a
large drop in the marginal productivity of workers.


Figure #4.15
Example. Let us next examine one example.

$$f(k,l) = 600 k^2 l^2 - k^3 l^3$$

1) Marginal Products:

$$MP_l = f_l = 1200 k^2 l - 3 k^3 l^2 > 0 \text{ iff } kl < 400$$
$$MP_k = f_k = 1200 k l^2 - 3 k^2 l^3 > 0 \text{ iff } kl < 400$$

2) Decreasing Marginal Productivity:

$$\frac{\partial MP_l}{\partial l} = f_{ll} = 1200 k^2 - 6 k^3 l < 0 \text{ iff } kl > 200$$
$$\frac{\partial MP_k}{\partial k} = f_{kk} = 1200 l^2 - 6 k l^3 < 0 \text{ iff } kl > 200$$

We can therefore summarize our results about the values of kl for which MPL and MPK are positive and
decreasing in the shaded area of the following figure.

Figure #4.16
But is the above condition (graphically represented in the area 200<kl<400) a sufficient condition in order
to guarantee that the MRTS is diminishing and we obtain the standard bowed-in isoquants? No. As we
described in our previous discussion, in order to guarantee that the MRTS is diminishing in labor we need
to check the sign of $f_{lk}$.

$$f_{lk} = f_{kl} = 2400 kl - 9 k^2 l^2 > 0 \text{ if and only if } kl < \frac{800}{3} \approx 266$$

Figure #4.17
We know that when $f_{kl} > 0$ we can guarantee that MRTS is diminishing. Within the area 200<kl<400, this
occurs in particular at values below kl=266, as depicted in the figure. For values above that cutoff,
however, $f_{kl}$ becomes negative, and we cannot guarantee that the MRTS is diminishing.
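
The sign checks in this example can be reproduced numerically. The following is a minimal sketch that hard-codes the derivatives of f(k,l) = 600k^2l^2 - k^3l^3 and evaluates the numerator of the derivative of MRTS with respect to labor. The function names and the two evaluation points (kl=250 and kl=350) are illustrative choices, not part of the original notes.

```python
# Production function from the example: f(k, l) = 600 k^2 l^2 - k^3 l^3.
# Function names and evaluation points are illustrative assumptions.

def f_l(k, l):
    return 1200 * k**2 * l - 3 * k**3 * l**2

def f_k(k, l):
    return 1200 * k * l**2 - 3 * k**2 * l**3

def f_ll(k, l):
    return 1200 * k**2 - 6 * k**3 * l

def f_kk(k, l):
    return 1200 * l**2 - 6 * k * l**3

def f_lk(k, l):
    return 2400 * k * l - 9 * k**2 * l**2

def mrts_slope_numerator(k, l):
    """Numerator of dMRTS/dl: f_k^2 f_ll + f_l^2 f_kk - 2 f_l f_k f_lk.
    When f_k > 0, MRTS is decreasing in l if and only if this is negative."""
    return (f_k(k, l)**2 * f_ll(k, l)
            + f_l(k, l)**2 * f_kk(k, l)
            - 2 * f_l(k, l) * f_k(k, l) * f_lk(k, l))

for k, l in [(25, 10), (35, 10)]:  # kl = 250 and kl = 350
    print(k * l, f_lk(k, l) > 0, mrts_slope_numerator(k, l) < 0)
```

At the point with kl=250 the cross partial is positive, so MRTS is necessarily diminishing. At the point with kl=350 the cross partial is negative, so the sign has to be checked term by term; at this particular point the numerator is still negative, which corresponds to case (a) in the discussion above.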

Remark on CRS: Recall that when a production function exhibits constant returns to scale, an
increase in all inputs by the same proportion produces an increase in the firm's output in the same
proportion. That is,
$$f(tk, tl) = t f(k, l)$$
But we know that if the production function exhibits constant returns to scale, then it must be
homogeneous of degree one. In addition, we know that if a function is homogeneous of degree one, its
derivatives must be homogeneous of degree zero. Hence, the marginal product of labor and of capital must
be homogeneous of degree zero. Therefore,
$$MP_l = \frac{\partial f(k,l)}{\partial l} = \frac{\partial f(tk,tl)}{\partial (tl)}, \qquad MP_k = \frac{\partial f(k,l)}{\partial k} = \frac{\partial f(tk,tl)}{\partial (tk)}$$
and, if we set $t = \frac{1}{l}$,
$$MP_l = f_l(k,l) = f_l\left(\frac{k}{l}, 1\right)$$

We can therefore conclude that, when the firm's production function exhibits constant returns to scale, the
marginal product of labor only depends on the ratio of capital to labor, but not on the absolute levels of
capital and labor used by the firm. A similar argument can be extended to the marginal product of capital.
Hence, the ratio of the marginal products MRTS=MPL/MPK only depends on the ratio of capital to labor,
but not on the absolute levels of capital and labor. Graphically, this implies that the slopes of a firm's
isoquants coincide at all points along a ray from the origin.
10
This occurs, in particular, when the firm's
production function is homothetic.
11
We illustrate this property in the figure below.

10
Note that a ray from the origin maintains the ratio of k/l constant.
11
Recall a similar property in consumer theory.


Figure #4.18
Elasticity of substitution. The elasticity of substitution measures the proportionate change in the ratio k/l
relative to the proportionate change in the MRTS along an isoquant. That is,
$$\sigma = \frac{\%\Delta (k/l)}{\%\Delta RTS} = \frac{d(k/l)}{dRTS} \cdot \frac{RTS}{k/l} = \frac{\partial \ln(k/l)}{\partial \ln RTS}$$


Note that the value of the elasticity of substitution is positive because when k/l decreases (increases) the
MRTS decreases as well (increases as well, respectively). Indeed, if we move along an isoquant towards
higher amounts of labor, then the ratio k/l decreases and the isoquant becomes flatter (reducing MRTS). A
similar argument applies when we move towards higher amounts of capital.
12
This is illustrated in the
following figure, whereby a movement from point A to B reduces the ratio k/l and also the slope of the
isoquant, measured by MRTS. The elasticity of substitution provides us with a measure of which of these
two magnitudes changes the most.

Figure #4.19

12
Note that this reasoning is only valid when the isoquants are bowed in towards the origin, i.e., when MRTS is
decreasing in labor. If, in contrast, MRTS increases in labor, a decrease in ratio k/l (moving towards higher amounts
of labor) might cause the isoquant to become steeper (higher MRTS).

First, note that if the elasticity of substitution is high, this implies that the MRTS is not substantially
changing relative to k/l. This occurs, in particular, when the isoquants are relatively flat, as the figure
below depicts.

Figure #4.20
Second, if the elasticity of substitution is low, then the MRTS is substantially changing relative to k/l.
This occurs when the slope of the isoquant changes significantly when we alter the input combination.
We provide a graphical illustration of an isoquant associated with a low elasticity of substitution below.

Figure #4.21

Let us briefly analyze the extreme cases described above. Let us start with the linear production function
q=f(k,l)=ak+bl. Importantly, this production function exhibits CRS since
$$f(tk, tl) = atk + btl = t(ak + bl) = t f(k,l)$$
In addition, all isoquants are straight lines (so their slope is constant for all amounts of labor, and
therefore, for all k/l ratios). The figure below illustrates a linear production function in which labor and
capital are perfect substitutes in the production process. Importantly, this implies the MRTS is constant in
k/l and hence the elasticity of substitution is infinitely large.
13


Figure #4.22
Let us next examine the other polar case in which both inputs must be used in fixed proportions, i.e.,
q=min{ak,bl} where a,b>0. In this case, the MRTS changes dramatically from infinite (for labor amounts
below the kink of the isoquant) to zero (for labor amounts beyond the kink). This implies that the change
in the MRTS is infinite, defining as a consequence a zero elasticity of substitution.

Figure #4.23
Let us now analyze the Cobb-Douglas production function.
$$q = f(k,l) = A k^a l^b, \quad A, a, b > 0$$

13
Another associated property of the linear production function is that it is homothetic, i.e., the slope of its isoquants
is constant along any ray from the origin.

As we described in previous classes, this production function can exhibit any returns to scale, depending
on the sum of its exponents.
14
Importantly, this production function can be linearized by applying
logarithms, as follows.
$$\ln q = \ln A + a \ln k + b \ln l$$
where a is the elasticity of output with respect to capital, i.e., $E_{q,k} = \frac{\partial \ln q}{\partial \ln k}$, and b is the elasticity of output
with respect to labor, i.e., $E_{q,l} = \frac{\partial \ln q}{\partial \ln l}$.
Note that the elasticity of substitution for the Cobb-Douglas production function can be shown to be
exactly one, for any parameter values. Indeed,
$$f(k,l) = A k^a l^b$$
$$MRTS_{l,k} = \frac{MP_l}{MP_k} = \frac{b A k^a l^{b-1}}{a A k^{a-1} l^b} = \frac{b}{a} \cdot \frac{k}{l}$$
Therefore,
$$\ln MRTS_{l,k} = \ln \frac{b}{a} + \ln \frac{k}{l}$$
or,
$$\ln \frac{k}{l} = \ln MRTS_{l,k} - \ln \frac{b}{a}$$
Hence,
$$\frac{\partial \ln (k/l)}{\partial \ln MRTS_{l,k}} = 1$$


Let us finally examine the CES production function:
$$q = f(k,l) = \left[ k^{\rho} + l^{\rho} \right]^{\gamma/\rho}, \quad \rho \le 1, \ \rho \ne 0, \ \gamma > 0$$
where parameter gamma determines whether this function exhibits increasing, decreasing or constant
returns to scale (when gamma>1, <1 or =1, respectively). On the other hand, note that for this production
function we can define the elasticity of substitution as follows:

14
It is easy to prove that the elasticity of substitution of the Cobb-Douglas production function is exactly one. The
proof is below. A good practice would be to prove it on your own.

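$$MRTS_{l,k} = \frac{\partial q / \partial l}{\partial q / \partial k} = \frac{\frac{\gamma}{\rho} \left[ k^{\rho} + l^{\rho} \right]^{\frac{\gamma}{\rho} - 1} \rho \, l^{\rho - 1}}{\frac{\gamma}{\rho} \left[ k^{\rho} + l^{\rho} \right]^{\frac{\gamma}{\rho} - 1} \rho \, k^{\rho - 1}} = \left( \frac{l}{k} \right)^{\rho - 1} = \left( \frac{k}{l} \right)^{1 - \rho}$$
Hence,
$$\ln MRTS_{l,k} = (1 - \rho) \ln \frac{k}{l}$$
and solving for $\ln \frac{k}{l}$, we have
$$\ln \frac{k}{l} = \frac{1}{1 - \rho} \ln MRTS_{l,k}$$
Therefore, the elasticity of substitution between capital and labor is
$$\frac{d \ln \frac{k}{l}}{d \ln MRTS_{l,k}} = \frac{1}{1 - \rho}$$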


And therefore the CES production function embodies all the production functions described above. First,
when $\rho = 1$ we obtain that the elasticity of substitution between two inputs becomes infinite (i.e., linear
production functions). Second, when $\rho \to -\infty$ we obtain that the elasticity of substitution
becomes zero, indicating that inputs cannot be substituted in the production process (i.e., fixed
proportions production function). Finally, when $\rho \to 0$ we find that the elasticity of substitution becomes
one, indicating that inputs can be somewhat easily substituted in the production process, as in the
Cobb-Douglas production function.
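
As a numerical cross-check of the CES result, the sketch below estimates the elasticity of substitution by finite differences and compares it with 1/(1-rho). It assumes the CES form q = (k^rho + l^rho)^(gamma/rho), for which the MRTS reduces to (l/k)^(rho-1). The function names, the step size and the evaluation point are illustrative assumptions.

```python
import math

def ces_mrts(k, l, rho):
    """MRTS = MP_l / MP_k for q = (k^rho + l^rho)^(gamma/rho); gamma cancels out."""
    return (l / k) ** (rho - 1)

def elasticity_of_substitution(k, l, rho, step=1e-6):
    """Finite-difference estimate of d ln(k/l) / d ln(MRTS).

    Because the MRTS depends only on the ratio k/l, it is enough to perturb k
    while holding l fixed."""
    k2 = k * (1 + step)
    d_ln_ratio = math.log(k2 / l) - math.log(k / l)
    d_ln_mrts = math.log(ces_mrts(k2, l, rho)) - math.log(ces_mrts(k, l, rho))
    return d_ln_ratio / d_ln_mrts

for rho in (0.5, -1.0, -9.0):
    # Numerical estimate next to the closed form 1 / (1 - rho).
    print(rho, elasticity_of_substitution(2.0, 3.0, rho), 1 / (1 - rho))
```

As rho falls toward large negative values the estimate approaches zero (fixed proportions), and as rho approaches one it grows without bound (perfect substitutes).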
We include below some empirical evidence about the elasticity of substitution for different German
industries. In particular, note that inputs cannot be easily substituted in the chemical industry (while
maintaining output constant), while they can be in the food industry.

Finally, note that the elasticity of substitution is defined as the ratio of the percentage change in the ratio
of two inputs to the percentage change in the MRTS, keeping the firm's output and the amounts of all other
inputs fixed. If we allow for variations in the output, we can then find a value of the elasticity of
substitution between two inputs associated with different levels of production. In particular, the elasticity of
substitution can take a given value for a particular production scale (i.e. a specific isoquant), but might
change when we increase or decrease the level of production. First, we illustrate one case in which the
elasticity of substitution decreases in scale: when the production level is low, at $q_0$ and $q_1$, the firm can
easily substitute between labor and capital, but when the production level increases to $q_5$ and $q_6$, the
substitution among inputs becomes more difficult.


Figure #4.24
The next figure reflects the opposite case. Now the firm can more easily substitute capital for labor as its
production level increases.

Figure #4.24




Profit maximization
Let us now analyze the profit maximization problem and its dual: the cost minimization problem. We will
henceforth assume that firms are price takers, i.e., the production plans of every individual firm do not
alter market price p. In addition we assume that the production set satisfies nonemptiness, closedness and
free-disposal. (Note that for completeness, we do not impose specific conditions about convexity or
returns to scale, but we comment on that later on.) The profit maximization problem can also be viewed
as maximizing the difference between total revenues and total economic costs.
The profit maximization problem (PMP) for the firm is
$$\max_{y} \ p \cdot y \quad \text{s.t. } y \in Y, \quad \text{or alternatively} \quad \text{s.t. } F(y) \le 0,$$
where F(·) is a transformation function describing Y.

The value function resulting from this maximization problem, $\pi(p)$, denoted as the profit function of
the firm, associates every p with the highest amount of profits, attained at the profit-maximizing
production plan y. More formally, we can define the profit function as
$$\pi(p) = \max_{y} \{ p \cdot y : y \in Y \}$$
And the supply correspondence y(p) associates to every price vector p the profit-maximizing production
plan, i.e., the supply correspondence y(p) is the argmax of the above PMP. That is,
$$y(p) = \{ y \in Y : p \cdot y = \pi(p) \}$$
Positive components in the supply correspondence reflect the firm's outputs supplied to the market, while
negative components are inputs in the production process demanded by the firm. The following figure
represents the profit maximization problem.


Figure #4.25
First, note that in addition to the firm's production set Y, we can depict the firm's isoprofit lines.
Intuitively, an isoprofit line represents the combinations of inputs and output for which the firm obtains a
given level of profits, e.g., 1 million dollars. Note that if the firm uses a larger amount of inputs, it should
also obtain a larger amount of output to be sold in the market in order to maintain its profit level.
Graphically, this implies that isoprofit lines increase in output when the firm uses more inputs, i.e., the
isoprofit lines have a negative slope. In particular, the slope of the isoprofit line is given by the price ratio
p1/p2, which is sometimes otherwise denoted as w/p. In addition, note that an increase in profits is
associated with higher isoprofit lines, which graphically shifts the isoprofit lines northeast. Intuitively, if the
firm is capable of producing more units of output, sold at a constant market price p, for a given input usage,
15

its profits increase. Therefore, the firm pushes its isoprofit line as far out as possible (maximizing profits),
and selects the production plan that is technologically feasible according to production set Y. In the above
example, this occurs at the tangency point between the production set Y and the isoprofit line associated

15
Note that market prices have not changed, since isoprofit lines are moving in a parallel fashion. Hence larger
profits can only be associated with a higher productivity of inputs.

to profit level $\pi(p)$. Hence, the firm chooses a production plan in the supply correspondence y(p), which
reaches a profit level of $\pi(p)$.
16

A natural question at this point is that of existence. In particular, one might wonder whether there are
PMPs with no supply correspondence, i.e., PMPs with no well defined profit-maximizing production plan
y(p). The following example illustrates such a case.
Example. Consider a firm with a production function q=f(z)=z, where every unit of input to the production
process is transformed into a unit of output. The following two figures illustrate this production function
(which exhibits constant returns to scale). In particular, the figure at the top reflects the case in which
input price $p_z$ is lower than the output price p, and as a consequence, isoprofit lines are relatively flat
compared with the transformation frontier. In this case, the firm can increase the amount of input used
(and output obtained) reaching higher profits (associated with higher isoprofit lines). If the firm is
unconstrained in the use of inputs, it can always increase the amount of inputs in order to reach a higher
isoprofit line. Hence, the supply correspondence is not well defined, since the firm could always increase
the input.
17
If, in contrast, the input price is larger than the output price, then isoprofit lines become
steeper than the transformation frontier. This case is illustrated in the figure at the bottom. In this
example, the firm tries to reach a higher isoprofit line by reducing the input in the production process. In
the extreme, the firm reduces z until reaching z=0, i.e., y(p)=0 with associated profit function $\pi(p)=0$.
In this example, if $p_z < p$, the isoprofit curves are flatter than the transformation frontier;
if $p_z > p$, the isoprofit curves are steeper than the transformation frontier.

16
Note that the firm doesn't want to choose production plans associated with lower profit levels (such as those on
isoprofit line $\pi_0$) since, despite being technologically feasible, they do not reach the highest possible profit.
Similarly, the firm cannot reach profits beyond $\pi(p)$ (i.e., to the northeast of y(p)), since they are not technologically
feasible.
17
However, note that if the firm is constrained in the use of inputs in the interval [0,zbar], then the firm can only
increase the amount of inputs up to zbar, making the firm's PMP well defined.


Figure #4.26
In other terms,
if $p_z \ge p$, then q=0 and $\pi(p)=0$, and
if $p_z < p$, then $q=+\infty$ and $\pi(p)=+\infty$,
clearly portraying a supply correspondence which is not well-defined.
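Let us now go back to the solution of the profit maximization problem, approaching it algebraically.
Taking first order conditions with respect to each component $y_k$ of the firm's production plan y, where
$\lambda$ denotes the Lagrange multiplier on the constraint $F(y) \le 0$, we obtain
$$p_k - \lambda \frac{\partial F(y^*)}{\partial y_k} \le 0$$
And for interior solutions, we have that
$$p_k = \lambda \frac{\partial F(y^*)}{\partial y_k}$$
or using matrix notation,
$$p = \lambda \nabla F(y^*).$$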
Intuitively, this condition says that, at the solution of the profit maximization problem, the firm selects a
production plan at which the price vector and the gradient vector are proportional (as depicted on the
above figure of the profit maximization problem for interior solutions). Therefore, from the first order
conditions of the PMP we have
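$$\lambda = \frac{p_k}{\partial F(y^*) / \partial y_k} = \frac{p_l}{\partial F(y^*) / \partial y_l}; \quad \text{and hence} \quad \frac{\partial F(y^*) / \partial y_k}{\partial F(y^*) / \partial y_l} = \frac{p_k}{p_l} = MRT_{k,l}(y^*)$$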


Graphically, this condition implies that the slope of the transformation frontier (at the profit-maximizing
production plan y*), i.e., $MRT_{k,l}(y^*)$, coincides with the price ratio, $p_k/p_l$. (This condition is also
graphically illustrated in the above figure representing interior solutions of the PMP.)
In order to grasp a good understanding of the above PMP, let us now focus on the case in which the firm
uses several inputs in order to produce a single output. In particular, consider a production function f(z)
that produces a single output using a vector z of inputs. We can then represent the profit maximization
problem as follows.
$$\max_{z \ge 0} \ p f(z) - w \cdot z$$


Note that, because of producing a single output, the only choice variable for the firm is the input vector z.
Taking first order conditions with respect to each input $z_k$, we obtain
$$p \frac{\partial f(z^*)}{\partial z_k} \le w_k$$


Therefore, for interior solutions, this states that the market value of the marginal product obtained from
using additional units of input k, $p \cdot MP_k$, must coincide with the price of this input, $w_k$. For the case of only
two inputs, this condition implies that
$$p = \frac{w_k}{\partial f(z^*) / \partial z_k} = \frac{w_l}{\partial f(z^*) / \partial z_l}; \quad \text{and hence} \quad \frac{w_k}{w_l} = \frac{\partial f(z^*) / \partial z_k}{\partial f(z^*) / \partial z_l} = MRTS_{k,l}(z^*)$$
From this interior solution of the PMP, we see the ratio of input prices must be equal to the ratio of
marginal products. Alternatively, we can express this condition by saying that the marginal productivity
per dollar spent on input k is equal to that spent on input l.
18
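
To make the first order condition concrete, the following sketch solves the single-input PMP for a hypothetical technology f(z) = sqrt(z), which is not taken from the notes. With this technology the interior condition p f'(z) = w gives z* = (p/(2w))^2. The prices used and the grid check are illustrative assumptions.

```python
import math

def profit(z, p, w):
    """Profit p*f(z) - w*z with the hypothetical technology f(z) = sqrt(z)."""
    return p * math.sqrt(z) - w * z

def optimal_input(p, w):
    """Interior FOC p * f'(z) = w with f'(z) = 1 / (2 sqrt(z)) gives z* = (p / (2w))^2."""
    return (p / (2 * w)) ** 2

p, w = 10.0, 2.0  # illustrative output and input prices
z_star = optimal_input(p, w)

# Value of the marginal product at z*, which should coincide with the input price w.
vmp = p / (2 * math.sqrt(z_star))

# Crude grid check on [0, 20] that no other input level earns more profit.
grid = [i / 100 for i in range(0, 2001)]
best_on_grid = max(grid, key=lambda z: profit(z, p, w))

print(z_star, vmp, best_on_grid)
```

Because sqrt(z) is concave, the production set in this sketch is convex, so the point found by the first order condition is also the global maximizer on the grid.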

When the production set is convex, these first order conditions are also sufficient. We illustrate this property
in the following two figures. Note that in addition to the isoquant, we also depict isocost lines for the
firm, where

18
This condition is analogous to the "bang for the buck" condition we described in consumer theory, but applied to
the marginal productivity per dollar spent on inputs, rather than the marginal utility per dollar spent on a particular
good.
$$w_1 z_1 + w_2 z_2 = \bar{c},$$
which reflects combinations of inputs $z_1$ and $z_2$ for which the firm incurs the same total cost, $\bar{c}$, for
given input prices $w_1$ and $w_2$.
With convex production sets, we have isoquants bowed in toward the origin, i.e., defining convex upper
contour sets, as the figure below reflects. In this case, first-order necessary conditions are also sufficient.

Figure #4.27
However, if production sets are not convex, we can have bowed-out isoquants, defining nonconvex upper
contour sets, as the figure below illustrates.

Figure #4.28

In this case, the tangency condition specified in the first-order necessary conditions of the PMP defines an
input vector $(z_1, z_2)$, denoted as A in the figure, which is clearly not profit-maximizing. Instead, the
profit-maximizing vector is at a corner solution, where the firm uses none of input 1 but only input 2.
Indeed, the firm reaches the farthest out isoquant for a given isocost line at that input combination, where
the slopes of isoquant and isocost do not coincide. In particular, the isoquant is flatter than the isocost,
reflecting that the marginal productivity per dollar spent on input 1 is lower than that spent on input 2,
which leads the firm to spend all its money on input 2 alone.

Profit function. Let us next describe some properties of the profit function, $\pi(p)$. Assume that the
production set Y is closed and satisfies the free disposal property. Then,
1. Homogeneity: The profit function $\pi(p)$ is homogeneous of degree one in prices. That is,
increasing the prices of all inputs and outputs by a common factor produces a proportional increase
in the firm's profits, i.e., $\pi(\lambda p) = \lambda \pi(p)$.
Note that the profit function can be expressed as
$$\pi = p q - w_1 z_1 - w_2 z_2 - \dots,$$
where inputs and outputs are evaluated at the profit-maximizing amount. Scaling all prices up by a
common factor $\lambda$ leaves relative prices, and hence the profit-maximizing plan, unchanged, so we obtain
$$\lambda p q - \lambda w z = \lambda (p q - w z) = \lambda \pi(p)$$
which shows that the firm's profits increase in the same proportion as input and output prices
were increased, i.e., homogeneity of degree one holds.

2. Convexity in prices: The profit function $\pi(p)$ is convex in p. Intuitively, the firm can adjust its
production plan to the prices it faces, so the profits it obtains under the average of two price
vectors are no larger than the average of the profits it obtains under each price vector separately.

3. If the production set Y is convex, then
$$Y = \{ y \in \mathbb{R}^L : p \cdot y \le \pi(p) \text{ for all } p \gg 0 \}$$
4. If y(·) is a differentiable function at pbar, then $Dy(\bar{p}) = D^2 \pi(\bar{p})$ is a symmetric and positive
semi-definite matrix with $Dy(\bar{p}) \bar{p} = 0$. Dy(p) here is the supply substitution matrix, whose
properties parallel those of substitution matrices in demand theory, although the sign is reversed.

Intuitively, property number 3 implies that the production set Y can be alternatively represented
by this dual set. It specifies that, for any given prices p, all production vectors y in Y generate
profits p·y no larger than the profit function $\pi(p)$. Let us provide next a graphical representation of
this property. The following figure represents a convex production set Y, the supply
correspondence y(p) that maximizes profits, and the associated isoprofit line $\pi = pq - wz$.



Figure #4.29
First, note that all input-output combinations below this isoprofit line yield a lower profit for the
firm. That is, $pq - wz \le \pi(p)$. Alternatively, the isoprofit line can be represented by
$$q = \frac{\pi}{p} + \frac{w}{p} z$$
Note that if the price ratio w/p is constant (i.e., different levels of input usage or different levels
of output sales do not affect input or output prices, respectively), then the slope of
the isoprofit lines is constant in z, and therefore the set of combinations on or below an isoprofit
line is convex. (The linear combination of any two points (z,q) and (z',q') on or below the isoprofit
line is also on or below it, i.e., it lies within the set.) If, however, input prices (w) and/or output
prices (p) are not constant, the price ratio is not constant. In this case, we might face isoprofit
curves that are not straight lines.
a. Let us first focus on the case in which input prices are a function of input usage, i.e., w=f(z)
where f'(z) is different from zero. Then either:
i. f'(z)<0, and the firm gets a price discount per unit of input from suppliers when
ordering large amounts of inputs, e.g., loans; or
ii. f'(z)>0, and the firm has to pay more per unit of input when ordering large amounts
of inputs, e.g., scarce qualified labor.
b. Now we analyze the case in which output prices are a function of production, i.e., p=g(q)
where g'(q) is different from zero. Then either:
i. g'(q)<0, and the firm offers price discounts to its customers; or
ii. g'(q)>0, and the firm applies price surcharges to its customers.
For the time being we ignore the possibility that a change in the firm's production
affects output prices. We will return to this topic in later chapters.
When we consider the possibility that w=f(z), we can then express the isoprofit curve as
$$q = \frac{\pi}{p} + \frac{f(z)}{p} z.$$
a. If f'(z)<0 (as described in point a.i. above), we then have strictly convex isoprofit curves, as
the following figure illustrates. Intuitively, the price ratio becomes lower as we increase z,
and therefore the isoprofit curve becomes flatter.


Figure #4.30
b. If f'(z)>0 (as described in point a.ii. above), we then have strictly concave isoprofit curves, as
the following figure illustrates. Intuitively, the price ratio increases as we increase z, and
therefore the isoprofit curve becomes steeper.

Figure #4.31

c. If f'(z)=0, we then have straight isoprofit lines as in our previous examples where input prices
are independent of input usage.
More comments about the profit function are in order. First, recall that it is a value function, measuring
firm profit only for the profit-maximizing vector y*. Second, the profit function can be understood as a
support function. In particular, let us first take the negative of the production set Y, i.e., -Y. Then we can
define the support function of this -Y set as
$$\mu_{-Y}(p) = \min_{y} \{ p \cdot (-y) : y \in Y \}$$
The support function first evaluates the profits resulting from all production vectors y in Y, p·y; second, it
takes the negative of all these profits, p·(-y); and finally, the support function chooses the smallest one. Of
course, this procedure is the same as maximizing the positive value of the profits resulting from all
production vectors y in Y, p·y. We provide below a simple example for comparison.
19

max p·y                          min p·(-y)
p·y_1   (highest ranking)        p·(-y_1)   (lowest ranking)
p·y_2                            p·(-y_2)
p·y_3                            p·(-y_3)
...                              ...
(lowest ranking)                 (highest ranking)

Therefore, the profit function $\pi(p)$ is (the negative of) the support function of the negative production
set -Y, i.e., $\pi(p) = -\mu_{-Y}(p)$. Note that the representation of the profit function as a support function
allows us to equivalently represent the production set using the support function. We do that in the
following figure. First, note that the production set Y, which we are trying to equivalently describe, is
convex. Then, for a given price vector p, we select the supply correspondence y(p) resulting from solving
the PMP at prices p. We obtain an associated profit function $\pi(p)$. We can then take all production plans y
for which profits are weakly lower, i.e.,
$$\{ y : p \cdot y \le \pi(p) \}$$
Graphically, this set considers all production plans on or below the isoprofit line associated with y(p) in the
figure. For a different price vector p', we can similarly select the supply correspondence y(p') resulting
from solving the PMP, which yields a profit function $\pi(p')$. We can now take all production plans y such
that
$$\{ y : p' \cdot y \le \pi(p') \}$$
Graphically, the set considers all production plans on or below the isoprofit line associated with y(p'),
which overlaps with the set described above for price p. If we repeat this process for any other price
vector, we can define infinitely many sets whose overlap exactly coincides with the area of production set
Y. The representation of the profit function as a support function, therefore, allows us to equivalently
describe production set Y.

Supply correspondence. Let us now describe the properties of the supply correspondence y(p) that result
from solving the profit maximization problem.
1. If the production set Y is weakly convex, then the supply correspondence y(p) is a convex set for
all p. Moreover, if the production set Y is strictly convex, then the supply correspondence y(p) is
single-valued (if nonempty).

19
Note that this is applicable to the argmax of any objective function. If $x^*_1$ is the argument that maximizes function
f(x), we can then claim that $x^*_1$ coincides with the argmin of the negative of this objective function. That is, if $x^*_2$ is
the argument that minimizes -f(x), then $x^*_1 = x^*_2$.

In the following figure the production set is weakly convex. In particular, it has a flat surface
along which the isoprofit line associated with the highest profit level is tangent. Therefore, we can
identify the set of production plans that generate the highest profit for the firm. Intuitively,
the firm manager is indifferent among any of the input-output combinations within the y(p)
region of tangency between the isoprofit line and the production set, since all these combinations
yield the same profit level. Such a set of production plans is, of course, convex, since a
linear combination of any two production plans in the y(p)-region also lies within that region.
We can therefore conclude that the supply correspondence is a convex set.

Figure #4.32

(These graphs could also include the price vector, orthogonal to the isoprofit line.)
If production set Y is strictly convex, as the following figure illustrates, then the tangency
condition between the isoprofit line and the production set occurs at a single point. Therefore, in
this case the supply correspondence y(p) is single-valued.

Figure #4.33

(These graphs could also include the price vector, orthogonal to the isoprofit line.)


2. Hotelling's lemma: If the supply correspondence y(pbar) consists of a single point, then the profit
function $\pi(p)$ is differentiable at pbar. Moreover, such derivative yields
$$\nabla_p \pi(\bar{p}) = y(\bar{p})$$
This lemma is an immediate application of the duality theorem that we described in consumer
theory.
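
Hotelling's lemma can be illustrated numerically. The sketch below uses a hypothetical single-input technology q = sqrt(z), not taken from the notes, whose profit function is pi(p, w) = p^2/(4w) and whose profit-maximizing net output vector is (q, -z) = (p/(2w), -(p/(2w))^2). The function names, prices and step size are illustrative assumptions.

```python
def profit_function(p, w):
    """pi(p, w) = p^2 / (4w) for the hypothetical technology q = sqrt(z)."""
    return p ** 2 / (4 * w)

def supply(p, w):
    """Profit-maximizing net output vector (q, -z): output positive, input negative."""
    z = (p / (2 * w)) ** 2
    return (z ** 0.5, -z)

def numerical_gradient(p, w, h=1e-6):
    """Central-difference gradient of the profit function with respect to (p, w)."""
    d_p = (profit_function(p + h, w) - profit_function(p - h, w)) / (2 * h)
    d_w = (profit_function(p, w + h) - profit_function(p, w - h)) / (2 * h)
    return (d_p, d_w)

# The two printed vectors should agree up to finite-difference error.
print(numerical_gradient(10.0, 2.0))
print(supply(10.0, 2.0))
```

The derivative with respect to the output price recovers the output supplied, while the derivative with respect to the input price recovers the negative of the input demanded, consistent with the sign convention for net outputs. The same profit function also doubles when both prices double, in line with homogeneity of degree one.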


The law of supply also applies here: quantities respond in the same direction as
price changes. Mathematically expressed,
$$(p - p') \cdot (y - y') \ge 0, \quad \text{where } y \in y(p) \text{ and } y' \in y(p').$$
Thus,
$$(p - p') \cdot (y - y') = (p \cdot y - p \cdot y') + (p' \cdot y' - p' \cdot y) \ge 0.$$


3. If the supply correspondence y(p) is differentiable at pbar, then its derivative $Dy(\bar{p}) = D^2 \pi(\bar{p})$
is a symmetric and positive semidefinite matrix with $Dy(\bar{p}) \bar{p} = 0$. This property has two
immediate consequences. First, it implies that the elements in the main diagonal of the matrix
Dy(pbar) are nonnegative. Recall that the elements in the main diagonal of this matrix describe
the own substitution effects. We therefore know that
$$\frac{\partial y_k(p)}{\partial p_k} \ge 0 \text{ for all } k$$
Moreover, since the matrix Dy(pbar) is symmetric, we can hence conclude that the cross
substitution effects are symmetric. That is,
$$\frac{\partial y_l(p)}{\partial p_k} = \frac{\partial y_k(p)}{\partial p_l} \text{ for all } l \text{ and } k$$


Importantly, nonnegative own substitution effects imply that quantities and prices move in the same
direction, that is
$$(p - p') \cdot (y - y') \ge 0$$
This implies that the supply function of the firm is positively sloped, as the following figure indicates.
That is, the law of supply holds.

Figure #4.34

Note that, since the firm faces no budget constraint, there is no wealth compensation requirement, unlike
in demand theory. This implies that there are no income effects, only substitution effects.
20
Alternatively, from a
revealed preference perspective, this implies that
when prices are p, I choose y, so $p \cdot y \ge p \cdot y'$; when prices are p', I choose y', so $p' \cdot y' \ge p' \cdot y$. Hence,
$$(p - p') \cdot (y - y') = (p \cdot y - p \cdot y') + (p' \cdot y' - p' \cdot y) \ge 0$$
= +



Cost minimization
Let us now analyze the combination of inputs the firm selects in order to minimize its total cost of
production, conditional on reaching a particular output level. For simplicity, we focus on the single output
case, where z is the input vector, f(z) reflects the production function, q is the number of units of the
single output, and w>>0 is the vector of input prices.
Therefore, the cost minimization problem (CMP) can be stated as follows (we assume free disposal of
output):

$$\min_{z \ge 0} \ w \cdot z \quad \text{s.t. } f(z) \ge q \quad \text{(productive feasibility)}$$

In words, the firm selects a vector of inputs (or factors of production), z, that minimizes total costs, w·z,
subject to productive feasibility, i.e., f(z)≥q. The optimal vector of inputs is denoted as z(w,q), and it is
usually referred to as the conditional factor demand correspondence
21
(or function if it is always single-valued). Intuitively, z(w,q) reflects the optimal demand for inputs of a
firm when input prices are w and the firm wants to reach a production level q. The following figure
provides a graphical representation of the above cost minimization problem for a firm producing output
using two inputs, z1 and z2.

20
We return to this issue when analyzing the cost-minimizing problem for the firm, where we describe it in more
detail.
21
The term conditional in this expression simply refers to the fact that z(w,q) represents the firm's demand for
inputs, conditional on the requirement that the output level q be produced.


Figure #4.35
First, note that the input combinations on or above the isoquant f(z)=q are technologically feasible, while
those below the isoquant are not. Therefore, the above CMP can be summarized as: among the input
combinations along a given isoquant f(z)=q, choose the one associated with the lowest cost,
w·z, i.e., the one on the isocost line closest to the origin. At input combination z(w,q) the firm cannot reduce its
costs any further and still produce output level q. At this input combination, the firm's costs are
w·z=c(w,q), as depicted in the figure.
22
Therefore, the input combination that minimizes costs is z(w,q), and
the isocost line associated with that combination of inputs is {z : w·z=c(w,q)}, where c(w,q) represents the
lowest cost of producing output level q when input prices are w, and it is usually referred to as the cost
function.
23
Graphically, note that at the cost-minimizing input combination z(w,q) the firm's isoquant curve
is tangent to the isocost line. Let us prove this result by using the first order conditions of the above CMP.
Letting λ≥0 denote the Lagrange multiplier on the constraint f(z)≥q, for every input k,
$$w_k - \lambda \frac{\partial f(z^*)}{\partial z_k} \geq 0 \quad (= 0 \text{ if interior solution, } z_k^* > 0)$$
or in matrix notation,
$$w - \lambda \nabla f(z^*) \geq 0$$
and solving for the Lagrange multiplier, we obtain that

22
Note that input combinations on isocost lines above c(w,q) still reach the isoquant f(z)=q, and thus satisfy the
constraint of this CMP. However, because they use more inputs, these input combinations are more costly than z(w,q)
and are hence not cost minimizing. Similarly, isocost lines below c(w,q), which use fewer inputs, cannot be optimal
either, since they do not reach output level q.
23
Note that, mathematically, the cost function c(w,q) is the value function of the CMP.

$$\frac{w_k}{w_l} = \frac{\partial f(z^*)/\partial z_k}{\partial f(z^*)/\partial z_l} = MRTS_{k,l}(z^*)$$
Note that, alternatively, this condition states that at the cost-minimizing input combination, the marginal
product per dollar spent on input k must be equal to the marginal product per dollar spent on input l.
Otherwise, if the marginal product per dollar is larger for one input, then the firm will not be at the optimum
since it would have incentives to spend more money on the input for which the marginal product per dollar is
larger. (Importantly, note that this tangency condition coincides with the one obtained for the profit
maximization problem some pages above, showing that the CMP is the dual problem of the PMP.)
24
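
As a minimal sketch of the tangency condition, the Python snippet below uses the production function Q=50(LK)^{1/2} that appears in footnote 24. The closed-form demands come from combining MRTS=K/L=w/r with the output constraint; the function names and the numbers w=4, r=9, q=300 are illustrative assumptions and are not taken from the notes.

```python
import math

def conditional_demands(w, r, q):
    """Cost-minimizing (L, K) for Q = 50*(L*K)**0.5, from K/L = w/r and 50*sqrt(L*K) = q."""
    scale = q / 50.0
    labor = scale * math.sqrt(r / w)
    capital = scale * math.sqrt(w / r)
    return labor, capital

def output(labor, capital):
    return 50.0 * math.sqrt(labor * capital)

def mrts(labor, capital):
    # MP_L / MP_K simplifies to K / L for this technology
    return capital / labor

# Illustrative prices and output target (assumptions, not from the notes)
w, r, q = 4.0, 9.0, 300.0
L_star, K_star = conditional_demands(w, r, q)
cost_star = w * L_star + r * K_star

# Another input mix on the same isoquant: 50*sqrt(12*3) = 300
L_alt, K_alt = 12.0, 3.0
cost_alt = w * L_alt + r * K_alt

print(L_star, K_star, cost_star, cost_alt)
```

At the computed input mix the MRTS equals the input price ratio w/r, and the alternative point on the same isoquant is more costly, which is what the tangency argument predicts.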

Sufficiency: similarly to the PMP, the above first-order necessary conditions become sufficient when the
production set is convex. The following figure illustrates a nonconvex production set, in which the input
combination satisfying the first-order conditions is not the cost-minimizing input combination z(w,q).
Instead, z(w,q) occurs at the corner, where the firm only uses input 1.

Figure #4.36
A similar argument can be extended to linear production functions, as we describe in the following
example.
Corner solutions: consider a firm with production function Q=10L+2K, where L and K denote amounts
of labor and capital respectively. It is easy to check that the isoquant is a straight line with slope
-MPL/MPK=-5. In the case that input prices are w=$5 and r=$2, the isocost lines have a slope of -w/r=-2.5.
If the firm wants to reach an output level of Q=200 units, the marginal product per dollar spent on labor (10/5=2) is
higher than that spent on capital (2/2=1), inducing the firm to choose the input combination L=20, K=0 (a corner), at
which the above tangency condition (first order condition) does not hold. The following figure illustrates
this case.

24
For a firm with production function Q=50(LK)^{1/2} (where L and K denote the amounts of labor and capital
respectively) that wants to reach a production level of Q units, and facing input prices w and r, find the conditional
factor demand correspondences for labor and capital.


Figure #4.37
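
The corner solution for the linear technology Q=10L+2K with w=$5 and r=$2 can be checked with a short Python sketch. The helper name `corner_solution` is hypothetical; it simply compares the marginal product per dollar of each input, which is the reasoning used in the example.

```python
def corner_solution(mp_l, mp_k, w, r, q):
    """Cost-minimizing (L, K) for the linear technology Q = mp_l*L + mp_k*K."""
    if mp_l / w > mp_k / r:
        return q / mp_l, 0.0      # labor is cheaper per unit of output
    if mp_l / w < mp_k / r:
        return 0.0, q / mp_k      # capital is cheaper per unit of output
    return None                   # tie: every point on the isoquant costs the same

w, r, q = 5.0, 2.0, 200.0
L, K = corner_solution(10.0, 2.0, w, r, q)
cost_corner = w * L + r * K

# For comparison: reaching Q=200 with capital only requires K=100
cost_capital_only = r * (q / 2.0)

print(L, K, cost_corner, cost_capital_only)
```

With these numbers the firm uses L=20 and K=0 at a cost of $100, whereas the all-capital alternative costs $200.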
Lagrange multiplier: Finally, note that the Lagrange multiplier can be interpreted as the cost increase
that the firm experiences when it needs to produce a marginally higher output level q.
25
Therefore, the Lagrange
multiplier is the marginal cost of production: the marginal increase in the firm's costs from producing
an additional unit.
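
The interpretation of the multiplier as marginal cost can be illustrated numerically. The sketch below assumes the technology Q=50(LK)^{1/2} from footnote 24, for which the cost function is c(w,r,q)=2(wr)^{1/2}q/50; the prices, output level and function names are illustrative assumptions.

```python
import math

def cost(w, r, q):
    """Cost function of Q = 50*(L*K)**0.5, i.e. c = 2*sqrt(w*r)*q/50."""
    return 2.0 * math.sqrt(w * r) * q / 50.0

def multiplier(w, r, q):
    """Lagrange multiplier from the interior first order condition: lambda = w / MP_L."""
    labor = (q / 50.0) * math.sqrt(r / w)
    capital = (q / 50.0) * math.sqrt(w / r)
    mp_labor = 25.0 * math.sqrt(capital / labor)   # derivative of 50*sqrt(L*K) w.r.t. L
    return w / mp_labor

w, r, q, dq = 4.0, 9.0, 300.0, 1e-6
marginal_cost = (cost(w, r, q + dq) - cost(w, r, q)) / dq   # finite-difference dc/dq
lam = multiplier(w, r, q)

print(marginal_cost, lam)
```

Both numbers are approximately 0.24, so relaxing the output constraint by one marginal unit raises minimal cost by the value of the multiplier.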
Comparative statics of z(w,q). Let us now continue with comparative statics analysis. We first describe
how the conditional demand correspondence is affected by changes in input prices. When the wage, w, falls, two
effects occur:
1. A substitution effect. If output is held constant, there will be a tendency for the firm to substitute
labor for capital in the production process.
2. An output effect. A fall in the price of labor, w, reduces the firm's costs, allowing it to produce
larger amounts.
Let us next provide a graphical intuition behind these two effects and later on describe them
mathematically. The following figure illustrates the substitution effect associated with a wage decrease.
Starting from an initial cost-minimizing input combination z0(w,q), a reduction in wage produces an
outward pivoting effect on the firm's isocost associated with cost level c(w,q).
26
However, the firm is not
cost minimizing if it selects a point along the new isocost. Indeed, it can reduce its total costs
(graphically, pushing the new isocost inwards in a parallel fashion) until it reaches a tangency point with
the isoquant. At the new cost-minimizing input combination z1(w,q) the firm is indeed selecting the input
combination that minimizes total costs (at the new input prices) and reaches output level q. That is,
z1(w,q) solves the new CMP for the firm after the change in input prices. Comparing the cost-minimizing

25
Recall that, generally, the Lagrange multiplier represents the change in the value function resulting from the
optimization problem if we relax the constraint, e.g. change of wealth level in the UMP, the utility level that must be
reached in the EMP, etc.
26
Note that the isocost associated to the cost minimizing input combination must incur a cost level
c(w,q)=wl(w,q)+rk(w,q), where l(w,q) denotes the cost minimizing amount of labor and k(w,q) that of capital. A
reduction in w therefore pivots the isocost line outwards, as depicted in the figure.

input combinations before and after the fall in w, z0(w,q) and z1(w,q), we can observe that the firm uses
more labor (the factor of production that became relatively cheaper) and less capital (the input that
became relatively more expensive).

Figure #4.38
We can therefore conclude that the substitution effect in production is negative: a decrease in the price of
one input increases the firm's demand for (use of) that input.
27
There is, however, another effect associated with a
decrease in the price of labor. In particular, the firm can now reach a higher output level while incurring the
same total costs as before the input price change. We refer to this effect as the output effect. The following
figure represents the output effect for our previous example.

Figure #4.39

27
Note that this is a consequence of the diminishing MRTS (isoquants becoming flatter as we increase labor in the
figure).

Starting from the cost minimizing input combination after the price change (denoted as B in the figure),
we can observe that the firm is able to reach a higher isoquant f(z)=q1 incurring the same total costs as
before the price change. In particular, note that the isocost passing through input combination A (at old
input prices) and that passing through C (at new input prices) are equally costly. We can hence
decompose the increase in labor demand associated with a reduction in the price of labor into two effects: a
substitution effect (measured by the increase in labor demand from LA to LB) where the firm still
produces the same amount as before the price change, and an output effect (measured by the increase in
labor demand from LB to LC) where the firm still incurs the same total costs as before the price change,
but is capable of reaching higher output levels. The sum of these two effects reflects the total effect of a
decrease in wages on labor demand.
A couple of comments are in order. First, note that the own substitution effect (the effect of a change in the price of
one input on the demand for that same input) is negative. The output effect is, perhaps surprisingly, also
negative, even when inputs are regarded as inferior in production (i.e., when an increase in output implies
using that input in lower amounts).
28
Second, the cross-price substitution effect is not necessarily
negative, i.e., a decrease in wages can potentially increase/decrease the firm's demand for capital. We
elaborate on these two points in our following mathematical treatment of the substitution and output
effect.
Let lc(r,w,q) denote the conditional demand for labor (where conditional refers to the fact that the firm
always produces output q)
29
, and let l(p,r,w) denote the unconditional demand for labor (which depends
on the market price of the output and input prices, but doesn't depend on a particular output level q). We
know that at the profit maximizing output level, q=q(p,r,w), both the conditional and unconditional demand
for labor must coincide. That is,
$$l(p,r,w) = l^c(r,w,q) = l^c(r,w,q(p,r,w))$$
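Differentiating with respect to w yields
$$\frac{\partial l(p,r,w)}{\partial w} = \underbrace{\frac{\partial l^c(r,w,q)}{\partial w}}_{\text{substitution effect}} + \underbrace{\frac{\partial l^c(r,w,q)}{\partial q}\frac{\partial q}{\partial w}}_{\text{output effect}}$$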



As indicated above, a reduction in wages produces an increase in the demand for labor when the firm
maintains its production level unmodified. This increase in labor demand is reflected in the substitution
effect. Nonetheless, a reduction in the price of labor allows the firm to increase production (reach a higher
isoquant), i.e., dq/dw<0, and an increase in production is associated to an increase in the demand for
labor, dlc(r,w,q)/dq>0. As a result the output effect is also negative, reinforcing the substitution effect.
Hence, the unconditional labor demand l(p,r,w) must be negatively sloped.
30
The following figure
illustrates the conditional and unconditional labor demands. The reduction in wages produces a relatively
small increase in labor demand if output is fixed at q1, i.e., moving from A to B along the conditional

28
For a longer discussion on why the output effect is always negative, see NS pp. 378-379 (especially footnote 15,
and the accompanying explanations).
29
Recall that this conditional demand is denoted as z(w,q) in MWG using vector notation, i.e., z(w,q) includes the
firm's demand for all inputs, and w is the vector of input prices. Otherwise, both expressions are equivalent.
30
That is, we cannot observe Giffen inputs.

labor demand lc(r,w,q1). This increase is reinforced by the output effect due to the fact that the firm is
now capable of reaching a higher production level q2. The total effect, moving from A to C, is reflected
by the unconditional labor demand l(p,r,w). Note that, because the total effect is larger than the
substitution effect for all types of inputs, the unconditional labor demand must be flatter than the
conditional labor demand.

Figure #4.40
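
The decomposition of the wage effect on labor demand can be sketched numerically. The Python example below assumes a hypothetical technology f(L,K)=L^{1/4}K^{1/4} (not taken from the notes), for which lc(r,w,q)=q^2(r/w)^{1/2}, c(w,r,q)=2q^2(wr)^{1/2} and the profit-maximizing output is q(p,r,w)=p/(4(wr)^{1/2}). All derivatives are approximated by central differences.

```python
import math

# Hypothetical technology (an assumption for illustration): f(L, K) = L**0.25 * K**0.25
def q_supply(p, r, w):
    """Profit-maximizing output, from p = marginal cost = 4*sqrt(w*r)*q."""
    return p / (4.0 * math.sqrt(w * r))

def l_cond(r, w, q):
    """Conditional labor demand l^c(r, w, q) = q**2 * sqrt(r / w)."""
    return q ** 2 * math.sqrt(r / w)

def l_uncond(p, r, w):
    """Unconditional labor demand: conditional demand at the optimal output."""
    return l_cond(r, w, q_supply(p, r, w))

def central_diff(func, x, h=1e-6):
    return (func(x + h) - func(x - h)) / (2.0 * h)

p, r, w = 8.0, 1.0, 1.0
q = q_supply(p, r, w)

substitution_effect = central_diff(lambda x: l_cond(r, x, q), w)
output_effect = (central_diff(lambda x: l_cond(r, w, x), q)
                 * central_diff(lambda x: q_supply(p, r, x), w))
total_effect = central_diff(lambda x: l_uncond(p, r, x), w)

print(substitution_effect, output_effect, total_effect)
```

In this example both effects are negative and they add up to the slope of the unconditional labor demand, which is therefore flatter (more responsive to w) than the conditional one.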
Let us now turn to the cross-price effects associated with a reduction in the price of one input. Importantly,
we cannot make a precise prediction about how capital usage responds to a wage change. On one hand,
after a fall in wages the firm will substitute away from capital (since it became relatively more
expensive). As a consequence, the cross-price substitution effect is positive, i.e.,
$$\frac{\partial K^c(r,w,q)}{\partial w} > 0.$$
On the other hand, the output effect we described above will cause more capital to be demanded by the firm
as it expands production. This implies that the cross-price output effect is negative, i.e.,
$$\frac{\partial K^c(r,w,q)}{\partial q}\frac{\partial q}{\partial w} < 0.$$
Therefore, we cannot conclude whether the cross-price substitution effect dominates
the output effect (implying that the cross-price total effect is positive) or whether, instead, the cross-price
output effect dominates the substitution effect (in which case the cross-price total effect is negative).
31
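
To see how the two cross-price effects can pull in opposite directions, the sketch below again assumes the hypothetical technology f(L,K)=L^{1/4}K^{1/4} (an illustration, not an example from the notes), for which the conditional capital demand is Kc(r,w,q)=q^2(w/r)^{1/2}.

```python
import math

# Hypothetical technology (an assumption for illustration): f(L, K) = L**0.25 * K**0.25
def q_supply(p, r, w):
    """Profit-maximizing output, from p = marginal cost = 4*sqrt(w*r)*q."""
    return p / (4.0 * math.sqrt(w * r))

def k_cond(r, w, q):
    """Conditional capital demand K^c(r, w, q) = q**2 * sqrt(w / r)."""
    return q ** 2 * math.sqrt(w / r)

def central_diff(func, x, h=1e-6):
    return (func(x + h) - func(x - h)) / (2.0 * h)

p, r, w = 8.0, 1.0, 1.0
q = q_supply(p, r, w)

cross_substitution = central_diff(lambda x: k_cond(r, x, q), w)
cross_output = (central_diff(lambda x: k_cond(r, w, x), q)
                * central_diff(lambda x: q_supply(p, r, x), w))
cross_total = central_diff(lambda x: k_cond(r, x, q_supply(p, r, x)), w)

print(cross_substitution, cross_output, cross_total)
```

Here the substitution effect is positive and the output effect is negative; for this particular technology the output effect happens to dominate, so a wage cut raises capital usage. Other technologies can deliver the opposite sign.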


Cost function. Let us next describe some properties of the cost function c(w,q) (i.e., the value
function associated with solving the CMP). If the production set Y is closed and satisfies the free disposal property,
then
(i) c(.) is homogeneous of degree one in w and nondecreasing in q.
(ii) c(.) is a concave function of w.

31
For an interesting example of the substitution and output effects, see NS pp. 379-380. (If you are revising
these lecture notes you should expand on this example).

(iii) If the sets {z>=0: f(z)>=q} are convex for every q, then Y = {(-z,q): w.z>=c(w,q) for all
w>>0}.
(iv) z(.) is homogeneous of degree 0 in w.
(v) If the set {z>=0: f(z)>=q} is convex, then z(w,q) is a convex set. Moreover, if {z>=0: f(z)>=q}
is a strictly convex set, then z(w,q) is single valued.
(vi) Shephard's lemma.

These properties are discussed in detail below:

1. The cost function c(w,q) is homogeneous of degree one in the input prices w, i.e.,
c(αw,q)=αc(w,q) for every α>0. That is, increasing all input prices by a common factor induces a
proportional increase in the minimal costs of production. As the following figure illustrates, an
increase in all input prices by the same proportion produces a parallel downward shift in the firm's
isocost line. If the firm needs to reach isoquant f(z)=q again, it needs to incur larger costs
(shifting its isocost upwards) until it reaches f(z)=q.

Figure #4.41

2. The cost function c(w,q) is nondecreasing in output level q. Intuitively, producing higher output
levels implies a weakly higher minimal cost of production. The following figure illustrates this
property.


Figure #4.42

3. If the sets {z : f(z)≥q} are convex for every output level q, then the production set can be
equivalently described as
$$Y = \{(-z,q) : w \cdot z \geq c(w,q) \text{ for every } w \gg 0\}$$
The following figure illustrates this property. First, take an isoquant f(z)=q. Next, for input prices
w=(w1,w2), find the cost function c(w,q) by solving the CMP. (We do that at input combination
z(w,q) in the figure, with associated cost c(w,q)). Note that only input combinations
on or above the corresponding isocost line, w·z≥c(w,q), can satisfy the constraint f(z)≥q of the
CMP. But these combinations are at least as costly as the cost-minimizing input vector z(w,q). We
can now repeat this process for other input prices w'=(w1',w2'), for which we can find the cost
minimizing input vector z(w',q) with associated cost c(w',q). If we repeat this process
for infinitely many input price vectors, the intersection of the more costly input combinations,
w·z≥c(w,q) for every input price vector w>>0, describes the set f(z)≥q.

Figure #4.43
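
The properties of the cost function discussed above (homogeneity of degree one in w, monotonicity in q, and concavity in w) can be spot-checked numerically. The sketch assumes the Cobb-Douglas technology Q=50(LK)^{1/2}, whose cost function is c(w,r,q)=2(wr)^{1/2}q/50; the specific price vectors are illustrative assumptions, and a spot check at a few points is of course not a proof.

```python
import math

def cost(w, r, q):
    """Cost function of Q = 50*(L*K)**0.5, i.e. c = 2*sqrt(w*r)*q/50."""
    return 2.0 * math.sqrt(w * r) * q / 50.0

q = 300.0
alpha = 3.0

# Homogeneity of degree one in input prices: c(alpha*w, q) = alpha*c(w, q)
scaled_cost = cost(alpha * 4.0, alpha * 9.0, q)
base_cost = cost(4.0, 9.0, q)

# Nondecreasing in output
higher_output_cost = cost(4.0, 9.0, q + 1.0)

# Concavity in w: cost at the average price vector is at least the average cost
cost_a = cost(4.0, 9.0, q)        # price vector (4, 9)
cost_b = cost(16.0, 1.0, q)       # price vector (16, 1)
cost_mid = cost(10.0, 5.0, q)     # midpoint price vector (10, 5)

print(scaled_cost, base_cost, higher_output_cost, cost_a, cost_b, cost_mid)
```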


Conditional factor demand correspondence, z(w,q). If the production set Y is closed and satisfies the
free disposal property, then
1. The conditional factor demand correspondence z(w,q) is homogeneous of degree zero in the input
prices, w, i.e., z(αw,q)=z(w,q) for every α>0. Intuitively, an increase in all input prices by the same proportion does
not alter the firm's demand for inputs. We provide a graphical example of this property below.
The firm is initially choosing the cost-minimizing input vector z(w,q). When all inputs become
more expensive, the firm's isocost line shifts downwards (in a parallel fashion, since the ratio of
input prices has not been modified). However, if the firm wants to reach output level q, it must
shift the isocost line upwards until reaching isoquant f(z)=q again. This, however, implies
incurring larger costs, as described in our discussion of the cost function. Importantly, since the
relative input prices have not changed, the tangency between the isoquant and isocost occurs at the
same input combination, and therefore z(w,q) is unaffected by a common change in all input
prices.

Figure #4.44

2. If the set {z : f(z)≥q} is strictly convex, then the firm's demand correspondence z(w,q) is single
valued. If, in contrast, the set {z : f(z)≥q} is weakly convex, then the demand correspondence
z(w,q) is a convex set. These two properties are illustrated in the following two figures,
respectively. When the set {z : f(z)≥q} is strictly convex, a unique combination of inputs is cost
minimizing, and therefore the demand correspondence z(w,q) is single valued. When, in contrast,
the set {z : f(z)≥q} is weakly convex (e.g., has a flat segment such as that in the figure) the firm can
identify a set of cost-minimizing input combinations where the isocost is tangent to the isoquant
curve. This set of cost-minimizing input combinations is itself convex since a convex combination
of any two points in the set yields an input combination that also lies in the set.


Figure #4.45

3. Shephard's lemma. If the demand correspondence z(wbar,q) consists of a single point,
then the cost function c(w,q) is differentiable with respect to input prices, w, at wbar, and
$\nabla_w c(\bar{w},q) = z(\bar{w},q)$. Note that this lemma is an application of the duality theorem
described in previous chapters.
32

4. If z(w,q) is differentiable at wbar, then $D_w^2 c(\bar{w},q) = D_w z(\bar{w},q)$ is a symmetric and negative
semidefinite (NSD) matrix, with $D_w z(\bar{w},q)\bar{w} = 0$.
a. First, note that $D_w z(\bar{w},q)$ is a matrix representing how the firm's demand for every
input responds to changes in the price of that input, or in the price of other inputs.
Therefore, the fact that this matrix is negative semidefinite implies that the elements
along the main diagonal must be negative (or zero). That is, own substitution effects are
weakly negative,
$$\frac{\partial z_k(w,q)}{\partial w_k} \leq 0 \text{ for every input } k$$



32
If you are revising these lecture notes you should expand on the connection between Shephard's lemma and the
duality theorem.

Intuitively, an increase in the price of input k implies a reduction in the demand for this
input.
b. Second, the fact that matrix $D_w z(\bar{w},q)$ is symmetric implies that cross substitution
effects are symmetric. That is,
$$\frac{\partial z_k(w,q)}{\partial w_l} = \frac{\partial z_l(w,q)}{\partial w_k} \text{ for all inputs } k \text{ and } l$$
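
The properties of z(w,q) listed above can be checked numerically for a specific technology. The sketch below assumes Q=50(LK)^{1/2}, with c(w,r,q)=2(wr)^{1/2}q/50 and conditional demands L=(q/50)(r/w)^{1/2}, K=(q/50)(w/r)^{1/2}; prices, output and helper names are illustrative assumptions. It approximates the price derivatives of the cost function (Shephard's lemma) and the entries of the substitution matrix by central differences.

```python
import math

def cost(w, r, q):
    return 2.0 * math.sqrt(w * r) * q / 50.0

def demands(w, r, q):
    """Conditional factor demands (L, K) for Q = 50*(L*K)**0.5."""
    return (q / 50.0) * math.sqrt(r / w), (q / 50.0) * math.sqrt(w / r)

w, r, q, h = 4.0, 9.0, 300.0, 1e-5
L, K = demands(w, r, q)

# Shephard's lemma: the gradient of c with respect to prices equals z(w, q)
dc_dw = (cost(w + h, r, q) - cost(w - h, r, q)) / (2.0 * h)
dc_dr = (cost(w, r + h, q) - cost(w, r - h, q)) / (2.0 * h)

# Substitution matrix D_w z(w, q), entry by entry
dL_dw = (demands(w + h, r, q)[0] - demands(w - h, r, q)[0]) / (2.0 * h)
dL_dr = (demands(w, r + h, q)[0] - demands(w, r - h, q)[0]) / (2.0 * h)
dK_dw = (demands(w + h, r, q)[1] - demands(w - h, r, q)[1]) / (2.0 * h)
dK_dr = (demands(w, r + h, q)[1] - demands(w, r - h, q)[1]) / (2.0 * h)

# Homogeneity of degree zero: doubling all prices leaves demands unchanged
L2, K2 = demands(2.0 * w, 2.0 * r, q)

# D_w z(w, q) * w should be the zero vector
row_L = dL_dw * w + dL_dr * r
row_K = dK_dw * w + dK_dr * r

print(dc_dw, dc_dr, dL_dw, dL_dr, dK_dw, dK_dr, row_L, row_K)
```

In this example the own-price entries are negative, the two cross-price entries coincide, and each row of the matrix multiplied by the price vector is approximately zero.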


Production function, f(z). If the production set Y is closed and satisfies the free disposal property, then
1. If the production function f(z) is homogeneous of degree one (i.e., if the production function
exhibits constant returns to scale), then the cost function c(w,q) and the conditional factor demand
correspondence z(w,q) are both homogeneous of degree one in output, i.e., c(w,αq)=αc(w,q) and
z(w,αq)=αz(w,q) for every α>0.
Intuitively, if the production function exhibits constant returns to scale, an increase in the output
level the firm wants to reach induces an increase by the same proportion in the firm's demand for
inputs and in the cost function. The following figure illustrates this property. In particular, an
increase in the output level that the firm wants to produce (from 10 to 20 units, for instance)
induces a proportional increase in the amount of inputs that the firm needs to use (because the firm's
production function exhibits constant returns to scale). This increase in input usage implies, in
turn, a proportional increase in the minimum cost that the firm must incur.

Figure #4.46
2. If the production function f(z) is concave, then the cost function c(w,q) is a convex function of
output, q. In particular, marginal costs are nondecreasing in output. That is,
33
$$\frac{\partial^2 c(w,q)}{\partial q^2} \geq 0, \text{ i.e., } \frac{\partial c(w,q)}{\partial q} \text{ weakly increases in } q$$
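
Both properties can be illustrated with closed-form cost functions. The sketch below assumes two technologies: Q=50(LK)^{1/2} (constant returns to scale, with c=2(wr)^{1/2}q/50) and a hypothetical concave technology Q=(LK)^{1/4} (with c=2(wr)^{1/2}q^2). Neither the prices nor the second technology come from the notes.

```python
import math

def cost_crs(w, r, q):
    """Cost function for Q = 50*(L*K)**0.5 (constant returns to scale)."""
    return 2.0 * math.sqrt(w * r) * q / 50.0

def cost_drs(w, r, q):
    """Cost function for the hypothetical concave technology Q = (L*K)**0.25."""
    return 2.0 * math.sqrt(w * r) * q ** 2

w, r = 4.0, 9.0

# Property 1: doubling output from 10 to 20 doubles the minimal cost under CRS
doubling_ratio_crs = cost_crs(w, r, 20.0) / cost_crs(w, r, 10.0)

# Property 2: with a concave technology, each extra unit costs more than the last
incremental_costs = [cost_drs(w, r, q + 1) - cost_drs(w, r, q) for q in range(0, 5)]

print(doubling_ratio_crs, incremental_costs)
```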




33
If you are revising these lecture notes you should expand on this property.

Alternative representation of the PMP. We can alternatively represent the PMP using the cost function
(i.e., the value function of the CMP). In particular,
$$\max_{q \geq 0} \; pq - c(w,q)$$


Note that in our previous discussion the firm chose an input combination yielding a particular output
level, i.e., the z vector was the choice variable in the version of the PMP analyzed above. In contrast, the
firm now chooses an output level, which yields a particular cost level, reflected in the cost function,
c(w,q). (Recall that, in particular, the cost function contains information about the minimum cost that the
firm must incur in order to produce output level q at given input prices w).
The first order conditions for q* to be profit maximizing in the above PMP are
$$p - \frac{\partial c(w,q^*)}{\partial q} \leq 0; \text{ and in interior solutions, } p - \frac{\partial c(w,q^*)}{\partial q} = 0$$


Intuitively, at an interior optimum q*, price equals marginal cost, dc(w,q*)/dq.
34
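
A small numerical sketch of this version of the PMP follows. It assumes the hypothetical cost function c(w,r,q)=2(wr)^{1/2}q^2 (which arises from the technology Q=(LK)^{1/4}; neither is taken from the notes) and compares the output level implied by p=MC with a brute-force search over a grid of output levels.

```python
import math

def cost(w, r, q):
    """Hypothetical convex cost function c = 2*sqrt(w*r)*q**2."""
    return 2.0 * math.sqrt(w * r) * q ** 2

def profit(p, w, r, q):
    return p * q - cost(w, r, q)

p, w, r = 8.0, 1.0, 1.0

# First order condition: p = dc/dq = 4*sqrt(w*r)*q
q_foc = p / (4.0 * math.sqrt(w * r))

# Brute-force check on a grid of output levels between 0 and 5
grid = [i / 1000 for i in range(0, 5001)]
q_grid = max(grid, key=lambda q: profit(p, w, r, q))

print(q_foc, q_grid, profit(p, w, r, q_foc))
```

Because this cost function is convex in q, the first order condition is also sufficient, and the grid search selects the same output level.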



Firm's expansion path
A firm's expansion path represents the locus of cost-minimizing tangencies as the firm reaches higher
production levels. We provide a graphical example of an expansion path below, in which the firm
increases its demand for both labor and capital when it raises its output from q0 to q1 and from q1 to q2.
35


34
MWG present a nice example of this problem. See example 5.C.1.
35
Note the analogy with wealth expansion paths in consumer theory: the wealth expansion path is the locus of
utility-maximizing bundles for the consumer, i.e., it shows how the consumer's demand for goods 1 and 2 changes
as wealth increases.


Figure #4.47
Intuitively, this figure shows that in order to produce more output, the firm needs more of all inputs.
Graphically, this implies that the firm's expansion path is positively sloped. Hence, both inputs are
regarded as normal inputs (as opposed to inferior inputs) since
$$\frac{\partial K^c(w,q)}{\partial q} \geq 0 \text{ and } \frac{\partial l^c(w,q)}{\partial q} \geq 0$$




If, instead, the firm uses fewer units of one input as output increases, we denote that input as inferior. The
following figure illustrates one example in which both inputs are normal when the firm increases
production from q0 to q1, but labor becomes inferior when output is further increased from q1 to q2.

Figure #4.48

The intuition behind inferior inputs is noteworthy: a firm uses fewer units of an input as it increases
production. Indeed, most inputs are normal and few can be regarded as inferior. Note that even in the
presence of an inferior input, the isoquants may keep their usual convex shape. However, we can identify
inferior inputs when the list of inputs used by a firm is relatively disaggregated. For example, among the
labor inputs within a company we might have CEOs, executives, managers, accountants, secretaries,
janitors, etc. First, note that these inputs do not necessarily increase in the same proportion as the firm
increases output, i.e., expansion paths do not need to be straight lines. Moreover, after reaching a certain
scale of output, the firm might buy, for instance, a powerful computer with which accounting can be done
with fewer accountants, making the specific input "labor from accountants" an inferior input for
the firm.
Remark: Note that if the firm's expansion paths are straight lines from the origin, then all inputs increase in the same
proportion as one another when output is increased, i.e., the firm's production function is homothetic (constant
returns to scale being a special case). (Recall the figure of constant returns to scale from previous chapters).

Cost and Supply in the single output case
In this section we analyze cost functions and their relationship with the firm's production function analyzed
in previous sections of this chapter. Let us assume a given vector of input prices wbar>>0. Then the cost
function c(wbar,q) can be reduced to C(q), where we consider that the vector of input prices remains
constant. Therefore, the expressions for average and marginal costs are
$$AC(q) = \frac{C(q)}{q} \quad \text{and} \quad C'(q) = \frac{\partial C(q)}{\partial q},$$
where C'(q) = MC.

Recall also from our discussion of the PMP in the previous section that p≤C'(q) (where p=C'(q) at interior
solutions
36
).
Remark: In previous classes we showed that the cost function is homogeneous of degree one in input
prices. Let us now demonstrate that we can extend this property to the average and marginal cost
expressions. First, if we increase all input prices by a common factor t, average cost becomes
$$AC(tw,q) = \frac{C(tw,q)}{q} = \frac{t\,C(w,q)}{q} = t \cdot AC(w,q)$$
And similarly for marginal costs,
$$MC(tw,q) = \frac{\partial C(tw,q)}{\partial q} = \frac{t\,\partial C(w,q)}{\partial q} = t \cdot MC(w,q)$$
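
The remark can be checked numerically for a cost function in which average and marginal costs differ. The sketch assumes the hypothetical cost function c(w,r,q)=2(wr)^{1/2}q^2 and the illustrative values t=3, w=1, r=4, q=5; marginal cost is approximated by a central difference.

```python
import math

def cost(w, r, q):
    """Hypothetical cost function c = 2*sqrt(w*r)*q**2 (an assumption for illustration)."""
    return 2.0 * math.sqrt(w * r) * q ** 2

def average_cost(w, r, q):
    return cost(w, r, q) / q

def marginal_cost(w, r, q, h=1e-6):
    # central-difference approximation of dc/dq
    return (cost(w, r, q + h) - cost(w, r, q - h)) / (2.0 * h)

t, w, r, q = 3.0, 1.0, 4.0, 5.0
ac_ratio = average_cost(t * w, t * r, q) / average_cost(w, r, q)
mc_ratio = marginal_cost(t * w, t * r, q) / marginal_cost(w, r, q)

print(ac_ratio, mc_ratio)
```

Both ratios are approximately equal to t, so scaling all input prices by t scales average and marginal cost by the same factor.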


At this point it is important to clarify a common confusion. Some students consider that our last result
about the marginal cost function violates Euler's theorem, since we show that both the cost function and
36
Recall that this expression states that all output levels for which the firm's marginal cost equals the market price of
the output belong to the firm's optimal supply correspondence, y(p).

its first order derivative (the marginal cost function) are homogeneous of degree one in input prices.
However, Euler's theorem wouldn't predict this result, but a different one: if the cost function is
homogeneous of degree one in input prices, then the derivative of the cost function with respect to input
prices, $\partial C(w,q)/\partial w = z(w,q)$, is homogeneous of degree zero in input prices. Marginal cost, in contrast,
is the derivative of the cost function with respect to output q, not with respect to input prices, so there is no
contradiction. That is, the conditional factor
demand correspondence z(w,q) is homogeneous of degree zero in input prices, which holds, as shown in
previous sections of this chapter.

Graphical analysis of total costs.
Let us examine next the relationship between returns to scale and total costs for different production
functions. The following figure represents the case of a constant returns to scale technology, such as a
Cobb-Douglas production function where Q=50(LK)^{1/2}. In this case, total costs maintain a constant
relation with output, i.e., TC=c*q for some constant c>0.
37


Figure #4.49
As a consequence, we can conclude that the average cost of this firm is constant, i.e., AC(Q)=TC/Q=c,
and so is the marginal cost, i.e., $MC = \partial TC/\partial q = c$, as the next figure depicts.



Figure #4.50

37
For a specific example, consider the case in which a firm with this production function faces input prices w=$5
and r=$100. Solving the CMP yields TC(Q)=2(wr)^{1/2}Q/50, which is approximately 0.89Q in this example.

In the case that total costs are not proportional to output (i.e., the production function does not exhibit
constant returns to scale), the analysis of average and marginal costs becomes more involved, as the
following figure illustrates.

Figure #4.51
In this figure total costs initially grow very rapidly, then become relatively flat, and for high production
levels increase rapidly again.
38
Graphically, note that average costs are represented by the slope of the ray
connecting any point along the total cost curve, such as A, with the origin.
39
Rays connecting the origin
with the total cost curve are initially very steep for low production levels (implying high average costs in
the bottom figure), become flatter as we increase production, reaching a minimum slope (where the AC
also reaches its minimum in the bottom figure), and finally when output is further increased rays from the
origin to the total cost curve become steeper again, leading to an increase in the corresponding AC curve.
Similarly, the firm's marginal costs of production are represented by the slope of the total cost curve at
any given point, such as A, where the slope of tangent line BAC is 10. Initially the slope of the total cost
function is high, but decreasing in the concave portion of the total cost curve (i.e., marginal costs are
initially decreasing in output), it becomes almost zero at the inflection point of the total cost curve (where
the corresponding marginal cost curve is close to zero), and grows again in the convex region of the total
cost curve (i.e., marginal costs increase in output).

38
This might occur, for instance, when a third factor of production is present in the production process, such as the
entrepreneurial skills of the founder of the firm: total costs grow fast initially, then they are almost unaffected by
increases in production, but when the firm's scale (output) becomes sufficiently large, the entrepreneur cannot
manage the firm by himself and needs to hire additional managers who do not have the specific skills that he
possesses, inducing a significant increase in costs.
39
This can be further understood by noticing that at point A, total costs are $1,500 and output is 50 units, implying
an average cost of $1,500/50=30, which coincides with the slope of the ray connecting point A with the origin.

Three elements of the above figures are especially noteworthy.
1. First, both the AC and MC curves originate at the same level for output q=0, as the following
figure illustrates.

Figure #4.52
In order to show this property, note that we cannot compute the average cost at q=0, given that
AC(0)=TC(0)/0=0/0. We can nonetheless apply l'Hôpital's rule, as follows
$$\lim_{q \to 0} AC(q) = \lim_{q \to 0} \frac{C(q)}{q} = \lim_{q \to 0} \frac{\partial C(q)/\partial q}{\partial q/\partial q} = \lim_{q \to 0} MC(q)$$
We can therefore conclude that AC=MC at q=0.
2. When MC<AC, the AC declines, and when MC>AC, the AC increases. The relationship between the
average of a variable (in this case costs) and the marginal of that same variable can be understood
using the example of grades. If, before revealing your result in a new exam, your instructor tells
you that your score in the new exam is helping you raise your average grade in the class, it must
be that such new grade is better than your average so far. In this case, the marginal effect is
higher than your previous average, inducing an increase in your average grade in the class. (For
the case of costs, MC>AC implies that producing an additional unit increases total costs so much
that the firm's average costs per unit increase). In contrast, if your instructor
informs you that your score in the new exam lowers your current average, it means that your
score in the exam was below your average in the class. (For the case of costs, MC<AC
implies that additional production induces only a slight increase in total costs, producing a decline in
the firm's average costs per unit).
3. Finally, note that the AC and MC curves cross at exactly the minimum of the AC curve. In order
to show that, let us find the minimum of the AC curve.

$$\frac{\partial AC(q)}{\partial q} = \frac{\partial \left( c(q)/q \right)}{\partial q} = \frac{q \cdot c'(q) - c(q) \cdot 1}{q^2} = \frac{q \cdot MC(q) - c(q)}{q^2} = 0$$
Hence,
$$q \cdot MC(q) - c(q) = 0 \;\Rightarrow\; q \cdot MC(q) = c(q) \;\Rightarrow\; MC(q) = \frac{c(q)}{q} = AC(q)$$

Hence, MC(q)=AC(q) at the value of q for which the AC(q) curve is minimized.
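
The three elements above can be illustrated with a concrete total cost curve. The sketch assumes the hypothetical cubic C(q)=q^3-6q^2+15q (no fixed costs), so that AC(q)=q^2-6q+15 and MC(q)=3q^2-12q+15; this functional form is an assumption chosen for illustration and does not come from the notes.

```python
def total_cost(q):
    """Hypothetical total cost curve (an assumption): C(q) = q**3 - 6*q**2 + 15*q."""
    return q ** 3 - 6 * q ** 2 + 15 * q

def average_cost(q):
    # C(q)/q simplified, which also defines AC at q = 0 by continuity
    return q ** 2 - 6 * q + 15

def marginal_cost(q):
    return 3 * q ** 2 - 12 * q + 15

# Element 1: AC and MC start from the same level at q = 0
ac_at_zero, mc_at_zero = average_cost(0), marginal_cost(0)

# Element 3: locate the minimum of AC on a grid and compare with MC there
grid = [i / 100 for i in range(1, 601)]
q_min_ac = min(grid, key=average_cost)

# Element 2: MC < AC where AC is falling, MC > AC where AC is rising
mc_below_ac_before = all(marginal_cost(q) < average_cost(q) for q in grid if q < q_min_ac)
mc_above_ac_after = all(marginal_cost(q) > average_cost(q) for q in grid if q > q_min_ac)

print(ac_at_zero, mc_at_zero, q_min_ac, average_cost(q_min_ac), marginal_cost(q_min_ac))
```

For this curve AC and MC both start at 15, AC reaches its minimum at q=3, and at that output level AC and MC both equal 6.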

Let us now continue with our analysis of cost and supply curves in the single output case. The following
figures depict a firm with a technology that exhibits strictly decreasing returns to scale. Indeed, an
increase in the use of inputs produces a less than proportional increase in output, i.e., the firm's
production set is strictly convex, as figure (a) illustrates. For simplicity, we normalize the price of input z
to $1. Thus, the use of zbar units of input implies a cost of zbar dollars. This normalization helps us
represent the firm's cost function as a 90-degree rotation of the production set, where the vertical axis in figure
(a) (representing output) becomes the horizontal axis in figure (b), and the horizontal axis in figure (a)
becomes the firm's total cost in figure (b) in the vertical axis. As a consequence, the firm's total cost
function (figure b) is convex. This, in turn, implies that marginal costs are increasing (as depicted in
figure c) since the slope of the firm's total cost function increases in q. Similarly, average costs are also
increasing in output (as represented in figure c) given that the slope of the ray connecting the origin with any
point along the total cost function increases as we raise output. Finally, note that the firm's supply
correspondence is identified by the locus of points for which the firm produces an output level q such that
market price equals marginal cost. We showed in previous sections of this chapter that, under this
condition, firms solve the PMP and that this is not only a necessary but also a sufficient condition for an
optimal production plan when production sets are convex, such as the one we are analyzing in this
case.
40


Figure #4.53

40
When production sets are non-convex, we might encounter cases in which this first-order necessary condition is
not sufficient for a profit-maximizing production plan. We expand on this result below.

The following figures provide a similar analysis for technologies exhibiting constant returns to scale.
First, note that an increase in input usage produces a proportional increase in output. Rotating the firm's
production set 90 degrees we obtain a linear total cost function, with constant average and
marginal cost curves, as described in previous sections. In this case, the firm's supply correspondence
consists of the output levels q that satisfy p=MC(q). Intuitively, if p<MC(q) for every q>0 the firm does not
supply positive amounts, while if p>MC(q) the firm would supply infinitely large amounts of output.

Figure #4.54
Finally, when the firm's production set is non-convex, such as that depicted in figure (a) below, we obtain a
total cost curve that first increases very rapidly, becomes almost flat for intermediate levels of output, and
increases rapidly again for large scales of output. In this case, as described above, AC and MC start from
the same origin, MC lies below AC for low levels of output but MC is above AC for output levels above
the minimum of the AC curve. Importantly, in this case the firm's supply curve does not exactly coincide
with the marginal cost curve, but rather, with the portion of the MC curve above AC. Indeed, note that for
market prices below AC(qbar), where qbar denotes the output level at which AC reaches its minimum, the firm
would sell units at a price below AC (incurring a loss of AC-MC
per unit). As a consequence, the firm sells no units for p<AC(qbar) (and we represent that by the vertical
spike at the vertical axis) but sells output levels at the locus of the MC curve for p>AC(qbar).

Figure #4.55
Although it is natural to take the assumption of preference maximization as a primitive concept in the
theory of the consumer, the same cannot be said for the assumption of profit maximization by the firm.
The objectives of the firm should emerge from the objectives of those individuals who control it. A firm
owned by a single individual has well-defined objectives: those of the owner. In this case, the only issue
is whether this objective coincides with profit maximization. Whenever there is more than one owner,
however, we have an added level of complexity (for a full analysis of the objectives of the firm, see
MWG pp. 152-154).





Cost and supply in the single output case
The following figures examine the presence of nonconvexities in the production set Y arising from the
existence of fixed setup costs, K, which are nonsunk. In particular, the firm's cost function can be
represented in this case as
$$C(q) = K + C_v(q)$$
where C_v(q) denotes variable costs. The figure below illustrates the case in which variable costs are linear
in output. Note that the firm's costs in figure (b) are zero for q=0, but K (or greater) for any strictly
positive amount of output, because the firm will choose not to incur the fixed cost if it does not want to
produce q>0. In addition, note that average costs are K/q+C_v(q)/q, where C_v(q)/q is a constant due to the
linearity of variable costs, i.e., C_v(q)=c*q implies that C_v(q)/q=c. Likewise, the marginal cost is
constant in output at c=C_v'(q). Average costs decline in q since K/q declines in q and C_v(q)/q is constant
in q. Note that since average costs are K/q+C_v(q)/q (or K/q+c), average costs, despite declining, lie above
the firm's marginal cost for all production levels, approaching a horizontal asymptote at the marginal
cost c as q goes to infinity. Finally, regarding the firm's supply curve, recall that the firm supplies positive
amounts only if market prices are high enough to recover both variable and fixed costs, i.e., if prices are
above average total costs. Since in this case average costs lie above marginal costs for all output levels,
the firm's supply curve is a vertical spike at q=0 for prices below pbar.

Figure #4.56: (a) Production function, (b) Cost curve, (c) Supply curve



When variable costs are nonlinear in output, the firm's total cost function also starts at K, but
increasing marginal costs imply that every additional unit is more costly for the firm, i.e., total costs are
convex in output, as depicted in figure (e). This is confirmed in figure (f), where marginal costs are
positive and increasing in output (indicating that the slope of the total cost curve increases in output).
41

Regarding the average cost curve, note that it initially decreases and then increases in q. Intuitively, in the
decreasing portion of the AC curve, the firm benefits from spreading its fixed costs over larger output
levels (while variable costs are still relatively low). In the increasing portion of the AC curve, in contrast,
the firm's larger average variable costs more than offset the lower average fixed costs from spreading its fixed
costs over larger output levels and, as a consequence, average total costs grow. Note that AC crosses the
MC curve at exactly the output level qbar for which the slope of the total cost curve in figure (b), i.e., the
MC, coincides with the slope of the ray connecting the total cost curve at that point with the origin, i.e.,
the AC. Finally, the firm produces positive amounts of output when the market price is above the minimum of
average total cost, AC(qbar). Otherwise the firm produces zero output, as depicted in the supply curve of figure (c).
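As a rough numerical sketch of this supply rule, the snippet below uses the cost function TC(q)=a+bq^2 listed as example (3) in the footnote on total cost curves, and treats the fixed cost a as nonsunk. The parameter values (a=4, b=1) and the function names average_cost, supply and profit are illustrative assumptions, not part of the original notes.

```python
import math

def average_cost(q, a, b):
    """AC(q) = a/q + b*q for TC(q) = a + b*q**2 and q > 0."""
    return a / q + b * q

def supply(p, a, b):
    """Output chosen by a price taker when the fixed cost a is nonsunk."""
    q_bar = math.sqrt(a / b)           # output at which AC is minimized (MC = AC)
    p_bar = average_cost(q_bar, a, b)  # minimum average cost, equal to 2*sqrt(a*b)
    if p < p_bar:
        return 0.0                     # price does not cover average cost: produce nothing
    return p / (2 * b)                 # otherwise choose q such that p = MC(q) = 2*b*q

def profit(p, q, a, b):
    """Profit at output q; producing nothing avoids the nonsunk fixed cost."""
    return 0.0 if q == 0 else p * q - (a + b * q ** 2)

# Hypothetical parameters: a = 4 and b = 1, so q_bar = 2 and min AC = 4.
for price in (3.0, 4.0, 6.0):
    q = supply(price, a=4.0, b=1.0)
    print(price, q, profit(price, q, a=4.0, b=1.0))
```

Under these assumed parameters the firm stays out of the market at a price of 3, is exactly indifferent at a price of 4 (zero profit at qbar=2), and moves along the MC curve for higher prices.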

Figure #4.57
In the following figure, we slightly modify our above description by assuming that the firm's fixed costs
are now sunk. First, note that a portion of the firm's production set is now not included, since the firm
cannot modify input levels within that interval (figure a), i.e., inaction is not possible in this case.
Similarly, the total cost curve now originates at K even for q=0, given that the firm must incur the sunk fixed costs K.
Finally, note that the supply locus in this case comprises the entire marginal cost curve and not only the output
levels for which MC>AC, as in the case in which the firm faces convex costs in the presence of fixed
(nonsunk) costs. Intuitively, since the firm's fixed costs are now sunk, it will not shut down even if it is
obtaining negative profits in the short run.
42


41
Examples of total cost curves: (1) TC(q)=a+bq, where a,b>0, is a linear cost function with fixed costs a>0; (2) TC(q)=bq^2
represents the presence of convex production costs, but without fixed costs (note that in this case, marginal costs lie above
average costs); and (3) TC(q)=a+bq^2 illustrates the presence of convex variable costs and fixed costs.
42
Note that this supply locus resembles that of a firm with a convex cost function but facing no fixed costs at all.


Figure #4.58

Short-run total costs
In this section we examine the firm's minimal cost of production when one of the inputs is fixed at a
certain level. Since the firm doesn't have full flexibility in its input choice in the short run, it will
generally incur higher costs than in the long run. In other words, the firm will generally not be able to choose an
input combination at which the isoquant and the isocost line are tangent to each other and, as a consequence, the
MRTS will generally not be equal to the ratio of input prices.
Let us first analyze an example, depicted in the following figure, where capital is fixed in the short run at
Kbar.
43
In the long run, if the firm were able to choose any cost-minimizing input combination, it would
select the input vector denoted by A in the figure, where isoquant Q0 and the isocost line are tangent. In the short
run, however, the firm cannot alter the amount of capital from Kbar and hence, if the firm must still reach
a production level of Q0, the firm manager will need to choose input combination F, associated with a
higher isocost line. Therefore, the firm's inability to modify the amount of capital being used induces the
firm to incur higher costs.

43
Capital can be fixed in the short run if the process of financing the acquisition of new equipment is relatively slow, or for other
technological reasons that make labor more flexible in the short run, e.g., having to build a new production plant vs. hiring more
workers to keep the factory open longer. Nonetheless, a similar analysis can be extended to production processes
in which labor is the fixed input in the short run while capital is variable. This might be the case in certain highly qualified
occupations where the scarce resource is the specific human capital of the job candidate, whereas the capital equipment that the
firm uses is so standardized that the firm can easily acquire it in 1-2 business days, e.g., computers, software packages, etc.


Figure #4.59
The following figure illustrates a similar situation where, for an output level of 1 million TVs per year,
the firm chooses input combination (k1,l1) both in the long run and in the short run when its capital
level is fixed at exactly k=k1. In this case, we can conclude that the firm's minimal cost of producing
1 million TVs per year is the same in the long run and in the short run when its capital level is fixed at
k=k1. This point is graphically illustrated in the figure below, where we represent the firm's long-run cost
function TC(q) and its short-run cost function when k=k1. When production requirements are increased to 2
million TVs per year, however, a capital level of k=k1 does not allow the firm to minimize costs. Indeed,
in the short run the firm selects input combination B, associated with a higher isocost line, while in the long
run the firm selects input combination C, associated with a lower isocost line. This difference between short-run
and long-run costs for a capital level of k=k1 is also illustrated in the bottom figure, where short-run total costs
when k=k1 are higher than long-run costs for a production level of q=2 million.

Figure #4.60


Figure #4.61
We can repeat this analysis for different capital levels, reaching similar conclusions, as the following
figure illustrates. Indeed, short-run total costs lie above long-run total costs, except when,
for a given output level, the firm chooses in the long run exactly the amount of capital that it
is obliged to use (fixed input) in the short run, i.e., the input is fixed at its long-run optimal level.

Figure #4.62
Let us next provide an example of our previous discussion. Consider a firm using two inputs in order
to produce one output. The firm's cost function in the long run is given by
$$C(q) = w_1 z_1 + w_2 z_2$$
where both inputs 1 and 2 are variable. In the short run, however, input 2 is fixed at a level z2bar, while input 1 is
variable. The firm's short-run cost function when input 2 is fixed at a level z2bar is therefore
44

$$C(q \mid \bar{z}_2) = w_1 z_1 + w_2 \bar{z}_2, \quad \text{where } \bar{z}_2 \text{ is fixed.}$$
The following figure compares the firm's long-run, C(q), and short-run cost curves, C(q|z2), for different
levels of the fixed input (input 2). As described above, C(q)≤C(q|z2) for any given level of z2, since in the
long run the firm is capable of selecting the exact value of input 2, z2, that minimizes the firm's cost of
producing q units of output. In contrast, in the short run the firm must take the value of z2 as given.

Figure #4.63
Note that the long-run and short-run cost functions coincide (the firm incurs the same
costs) at output levels, q, for which the firm's factor demand correspondence for input 2,
z2(w,q), exactly coincides with the level at which input 2 is fixed in the short run, z2bar. A similar
argument extends to the short-run cost function when input 2 is fixed at z21, which coincides with the
long-run cost function when the firm's (long-run) demand for input 2, z2(w,q), is exactly z21.
45


44
Note that this implies that the firm adjusts only input 1 in order to reach output level q, i.e., it chooses z1 such that f(z1,z2bar)=q.
(This explanation parallels our previous discussion about a firm increasing labor amounts, for a fixed capital level Kbar, in order
to reach a particular output level Q0.) Therefore, the only choice variable for the firm in the short run is the amount of input 1, z1.

45
This discussion parallels our above explanation about the case in which input combination A is cost-minimizing both in the
long run and in the short run (when the capital level is fixed at K1) since, in the long run, the firm's demand for capital when

We can therefore conclude that when input 2 is fixed at its long-run value, z2(w,q), the short-run
and long-run costs coincide,
C(q)=C(q|z2(w,q)) for all output levels q
From the above figure we can obtain an additional conclusion: when the short-run cost function is
evaluated at the long-run demand for input 2, not only do the levels of the long-run and short-run cost
functions coincide (i.e., their heights coincide in the figure), but their slopes coincide as well. That is,
C'(q)=C'(q|z2(w,q)) for all output levels q
where the derivative of the short-run cost function is taken holding input 2 fixed.
Geometrically, this means that the slope of the long-run total cost curve (the long-run marginal cost) coincides with that of the
short-run total cost curve (the short-run marginal cost) at every such output level q; in other words, the long-run and short-run cost curves are
tangent at that point. This result, together with our above result that C(q)≤C(q|z2), implies that the long-run
cost curve C(q) is the lower envelope of the short-run cost curves, C(q|z2).
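The envelope property can be checked numerically with a small sketch. The technology f(z1,z2)=sqrt(z1*z2) and the input prices w1=1, w2=4 below are assumptions chosen only for illustration (they do not come from the notes); for this technology the long-run cost function is C(q)=2q*sqrt(w1*w2) and the long-run demand for input 2 is z2(w,q)=q*sqrt(w1/w2).

```python
import math

W1, W2 = 1.0, 4.0  # hypothetical input prices

def long_run_cost(q):
    """C(q) = 2*q*sqrt(w1*w2) for the assumed technology f(z1, z2) = sqrt(z1*z2)."""
    return 2 * q * math.sqrt(W1 * W2)

def long_run_z2(q):
    """Long-run conditional demand for input 2, z2(w, q)."""
    return q * math.sqrt(W1 / W2)

def short_run_cost(q, z2_bar):
    """C(q | z2_bar): input 1 must satisfy sqrt(z1*z2_bar) = q."""
    z1 = q ** 2 / z2_bar
    return W1 * z1 + W2 * z2_bar

def short_run_marginal_cost(q, z2_bar):
    """Derivative of C(q | z2_bar) with respect to q, holding z2_bar fixed."""
    return 2 * W1 * q / z2_bar

LONG_RUN_MARGINAL_COST = 2 * math.sqrt(W1 * W2)

z2_bar = long_run_z2(2.0)  # fix input 2 at its long-run optimal level for q = 2
for q in (1.0, 2.0, 4.0):
    print(q, long_run_cost(q), short_run_cost(q, z2_bar))
```

In this sketch the two cost curves touch (and share the same slope) only at q=2, the output level for which the fixed amount of input 2 is also the long-run choice; at every other output level the short-run cost is strictly higher.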



Aggregation in production
In this section of our discussion on production theory we investigate under which conditions the "law of
supply" holds at the aggregate level, and under which conditions we can define a "representative
producer" that parallels the "representative consumer" in consumer theory.
As a side note, an aggregate production function is a function
that maps aggregate inputs into aggregate outputs. In other words, it describes the maximum level of
output that can be obtained if the inputs are used efficiently in the production process.
Consider J firms with production sets Y1, Y2, ..., YJ, where each production set Yj is nonempty, closed,
and satisfies the free-disposal property. In addition, assume that every supply correspondence yj(p) for
firm j is single valued
46
and differentiable in prices (where p>>0). Let us define the aggregate supply
correspondence for this economy as the sum of the individual supply correspondences
$$y(p) = \sum_{j=1}^{J} y_j(p) = \left\{ y \in \mathbb{R}^L : y = \sum_{j=1}^{J} y_j \right\}$$

producing Q=1 million TVs is exactly K=K1. For different capital levels, however, the short-run cost-minimizing input
combination does not coincide with that of the long-run, leading to higher costs in the short-run.

46
Note that this implies that production sets are strictly convex and hence the tangency condition between the firm's
isoprofit line and the production set holds at a single input-output point.

where each yj is firm j's profit-maximizing production plan, yj(p), for all firms j=1,2,...,J.

Law of supply
The law of supply is satisfied in the aggregate. We can show this in either of two ways:
1. Using the derivative of every firm's supply correspondence with respect to prices, D_p y_j(p). This
derivative defines a symmetric positive semidefinite matrix for every firm j. Since this property
is preserved under addition (when we aggregate across all firms in the economy), we can
conclude that the derivative of the aggregate supply correspondence with respect to prices,
D_p y(p), must also define a symmetric positive semidefinite matrix. Intuitively, an increase in
the price of a good (weakly) increases the aggregate supply of that good across all firms.
2. Using a revealed preference argument. In particular, recall that for every firm j and any two price vectors p and p' we have that
$$(p - p') \cdot [y_j(p) - y_j(p')] \geq 0 \quad \text{for every firm } j.$$
We can hence add over all J firms, obtaining
$$(p - p') \cdot [y(p) - y(p')] \geq 0,$$
which implies that market prices and aggregate supply move in the same direction, i.e., the law
of supply holds in the aggregate.
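The revealed preference inequality can be illustrated with a small numerical sketch. It assumes (hypothetically, for illustration only) three price-taking firms, each using one input z to produce one output q with technology q=a_j*sqrt(z); the netput vector of each firm is then (-z, q) and prices are (w, p). None of these functional forms or parameter values come from the notes.

```python
import random

PRODUCTIVITY = [1.0, 2.0, 3.5]  # hypothetical parameters a_j, one per firm

def firm_supply(a, w, p):
    """Profit-maximizing netput vector (-z, q) for the technology q = a*sqrt(z)."""
    z = (a * p / (2 * w)) ** 2   # input demand from the first-order condition
    q = a * a * p / (2 * w)      # output, equal to a*sqrt(z)
    return (-z, q)

def aggregate_supply(w, p):
    """Sum of the individual netput vectors at prices (w, p)."""
    plans = [firm_supply(a, w, p) for a in PRODUCTIVITY]
    return (sum(y[0] for y in plans), sum(y[1] for y in plans))

def law_of_supply_gap(prices, prices_alt):
    """(p - p') . [y(p) - y(p')], which the law of supply says is nonnegative."""
    y = aggregate_supply(*prices)
    y_alt = aggregate_supply(*prices_alt)
    return sum((a - b) * (c - d) for a, b, c, d in zip(prices, prices_alt, y, y_alt))

def random_prices():
    """Draw a strictly positive price vector (w, p)."""
    return (random.uniform(0.5, 5.0), random.uniform(0.5, 5.0))

random.seed(0)
gaps = [law_of_supply_gap(random_prices(), random_prices()) for _ in range(1000)]
print(min(gaps) >= -1e-9)  # small tolerance for floating-point rounding
```

The check compares aggregate netput vectors at randomly drawn pairs of price vectors; a tolerance is used because the inequality holds exactly only in exact arithmetic.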
Representative producer
Let us first define the aggregate production set as
$$Y = Y_1 + Y_2 + \dots + Y_J = \left\{ y \in \mathbb{R}^L : y = \sum_{j=1}^{J} y_j \right\}$$
for some production plan yj∈Yj of every firm j. Note that in the sum $y = \sum_{j=1}^{J} y_j$ every production plan yj
is just a feasible production plan for firm j, but not necessarily firm j's profit-maximizing production plan
(i.e., its supply correspondence, yj(p)). Let y*(p) be the supply correspondence for the aggregate
production set Y (i.e., the supply correspondence that maximizes aggregate profits), and let π*(p) denote
the associated profits from this supply correspondence y*(p).
We can now claim that there exists a representative producer producing an aggregate supply y*(p) that
exactly coincides with the sum of the individual firms' supply correspondences, i.e., $y^*(p) = \sum_{j=1}^{J} y_j(p)$,
and obtaining aggregate profits π*(p) that exactly coincide with the sum of the individual firms' profit
functions, i.e., $\pi^*(p) = \sum_{j=1}^{J} \pi_j(p)$. Intuitively, the aggregate profit obtained by each firm maximizing
profits separately (taking prices as given) is the same as that which would be obtained if all firms were to
coordinate their actions (their production plans yj) in a joint profit-maximizing decision. Importantly,
this is a decentralization result. Indeed, it suggests that in order to find the solution of the joint profit
maximization problem for given prices p, it is enough to "let each individual firm do what's best for
it" and add the solutions of their individual PMPs. This result is sometimes referred to as supporting
laissez-faire arguments, since it suggests that the social planner should let every firm j choose its own production
plan yj that maximizes its own profits (i.e., every firm independently selecting its own yj(p)), since these
production plans will maximize aggregate profits.
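A minimal numerical sketch of this decentralization result follows. It assumes two hypothetical firms with technologies q=a_j*sqrt(z), facing an output price p=2 and an input price w=1 (all of these are illustrative assumptions). Each firm's individual optimum is computed from its first-order condition, while the joint problem is solved by brute-force search over a grid of input pairs, so the comparison does not presuppose the result.

```python
import math

def profit(a, z, p, w):
    """Profit of a firm with technology q = a*sqrt(z) at input level z."""
    return p * a * math.sqrt(z) - w * z

def individual_optimum(a, p, w):
    """Input choice and profit from the firm's own first-order condition."""
    z = (a * p / (2 * w)) ** 2
    return z, profit(a, z, p, w)

def joint_optimum_by_grid(a1, a2, p, w, z_max=8.0, step=0.25):
    """Search over input pairs (z1, z2) for the highest joint profit."""
    n = int(round(z_max / step))
    grid = [i * step for i in range(n + 1)]
    return max(
        (profit(a1, z1, p, w) + profit(a2, z2, p, w), z1, z2)
        for z1 in grid
        for z2 in grid
    )

P, W = 2.0, 1.0  # hypothetical output and input prices
z1, pi1 = individual_optimum(1.0, P, W)
z2, pi2 = individual_optimum(2.0, P, W)
joint_profit, joint_z1, joint_z2 = joint_optimum_by_grid(1.0, 2.0, P, W)
print(pi1 + pi2, joint_profit, (z1, z2), (joint_z1, joint_z2))
```

The grid was chosen so that it contains the individual optima; with a coarser or misaligned grid the search would only approximate the joint maximum from below.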

This intuition is illustrated in the following figure, representing firm 1's and firm 2's production sets, Y1
and Y2. Firm 1 maximizes profits by choosing production plan y1, and firm 2 does so by selecting
y2.
47
If we add vectors y1 and y2, we obtain y1+y2 in the figure. Importantly, the aggregate supply
y1+y2 coincides with the production plan that a single firm manager would select
if the firm's production set were described by the aggregate production set Y=Y1+Y2 when facing the
same price vector as firms 1 and 2. Hence, jointly both firms would be selecting y1+y2 given p and
given aggregate production set Y. In addition, note that all isoprofit lines are parallel, since all firms face the same price vector.

Figure #4.64
Finally, note that one of the key assumptions in order to obtain the above decentralization result is that
firms take prices as given. If, in contrast, firms' decisions about how much to produce have an effect on
market prices, the above decentralization result is not necessarily satisfied.
48


Efficient production
Let us continue with our discussion of when individual firms choose profit-maximizing production plans
that maximize aggregate profits. In this regard, let us define efficient production vectors. We say that a
production vector y∈Y is efficient if there is no other production vector y'∈Y such that y'≥y and y'≠y.
That is, y is efficient if there is no other feasible production vector y' producing more output with the
same amount of inputs (or alternatively, producing the same amount of output with fewer inputs).

47
Note that the isoprofit line (that firms use to choose the point where the isoprofit line is tangent to the
production set) has the same slope for firms 1 and 2 since both firms face the same market prices. Nonetheless, firm
2's profits are higher than firm 1's, since firm 2's isoprofit line at y2 is further from the origin than firm 1's isoprofit
line when evaluated at y1.
48
A simple example is that of oligopoly markets where firms compete in quantities (à la Cournot). In particular,
when every firm independently selects a profit-maximizing output level it does not take into account the effect that
its additional production has on the price received for the units sold by its competitors. This leads every firm to overproduce, relative to
the output level that maximizes joint profits (i.e., the output level that every firm would produce if they coordinated
by forming a cartel).

Graphically, note that this definition of efficiency implies that if a production plan is efficient then it lies
on the boundary of the production set Y, as the following figure illustrates. In particular, y is efficient,
whereas y' and y'' are inefficient (y' is inefficient because it uses the same amount of inputs as y but
produces less output; y'' is inefficient because it produces the same output as y but uses more inputs).

Figure #4.65
The converse argument (that every production plan lying on the boundary of the production set must be
efficient) is not necessarily true, as the next figure shows. Specifically, production plan y', despite lying
on the boundary of production set Y, is inefficient since it produces the same amount of output as y but
uses more inputs.

Figure #4.66
After defining efficient production plans, we can now present the first and second fundamental theorems of
welfare economics (FTWE).
First FTWE: if a production plan y∈Y is profit maximizing for some price vector p>>0, then y must be
efficient.

Proof. Let us prove the first FTWE by contradiction. Hence, suppose that production plan y∈Y is profit
maximizing, i.e., p·y≥p·y' for all y'∈Y, but y is not efficient. Then, there is another production plan y'∈Y such that
y'≥y and y'≠y. Multiplying both sides by price vector p, we obtain p·y'>p·y, since p>>0. But then y cannot be profit
maximizing (as the premise of this proof established). We have then reached a contradiction, proving the
1st FTWE.
Importantly, note that for this result we do not need the production set Y to be convex. The following two
figures illustrate convex and non-convex production sets. In both cases production plan y is profit
maximizing, which implies that it must lie on the boundary of the production set, for both convex and
non-convex production sets.

Figure #4.67
Furthermore, note that when applied to the aggregate, the 1st FTWE says that if a collection of firms each
independently maximizes profits with respect to the same price vector p>>0, then the aggregate
production plan is socially efficient.
In addition, note that the assumption p>>0 on the price vector cannot be relaxed to p≥0. In order to see
why, take a production set Y with an upper flat surface, as in the following figure. Any
production plan in the flat segment of the production set can be profit maximizing if prices are p=(0,1).
Indeed, this price vector implies that the slope of the isoprofit line is zero. The firm can hence choose from a
region of profit-maximizing production plans (where the isoprofit line and the production set are tangent
to each other, as depicted in the figure). However, not all of these profit-maximizing production plans are
efficient. Indeed, only the production plan lying exactly on the kink of the production set is efficient,
i.e., all other profit-maximizing production plans to the left of the kink are inefficient since they use more inputs
in order to produce the same amount of output. Hence, in order to apply the 1st FTWE we need
p>>0, i.e., the price vector is positive in all components.

Figure #4.68
The 2nd FTWE states the converse of the 1st FTWE (i.e., if a production plan y is efficient, then it must be
profit-maximizing). Note that the converse of the 1st FTWE is not necessarily true in general. The following figures
illustrate that when the production set is convex, every efficient production plan (lying on the
boundary of the production set) must also be profit maximizing. When the production set is non-convex,
however, the fact that a production plan is efficient does not imply that such a plan maximizes the firm's
profits. This is evident in the figure, where one production plan lies on the boundary of the production set but is not
profit-maximizing, while a different production plan is the profit-maximizing vector.


Figure #4.69
The 2nd FTWE is therefore restricted to convex production sets. Specifically, the 2nd FTWE states that, if
the production set Y is convex, then every efficient production plan y in Y is a profit-maximizing
production plan for some non-zero price vector p≥0.
In order to prove the 2nd FTWE, let us use the following steps. First, take an efficient production
plan, such as y in the next figure. Let us now define the set of production plans that are strictly more
efficient than y, that is, $P_y = \{ y' \in \mathbb{R}^L : y' \gg y \}$. As the figure depicts, this set contains the production
plans producing more output than production plan y while using fewer inputs. Note that the boundaries of the set
(production plans producing more than y using the same inputs, and those producing the same output
using fewer inputs) are not included, since we only consider production plans that are strictly more efficient
than y in every component, i.e., set Py is an open set. Since y is efficient, there exists no
intersection point between set Py and the production set Y. In addition, note that
set Py is a convex set, since any convex combination of any two production plans in Py lies within the
set.


Figure #4.70
We can now apply the Separating Hyperplane Theorem. In particular, we can claim that there exists some
price vector p≠0 such that p·y'≥p·y'' for every production plan y' in Py and y'' in Y.
49
Since this is true for
all y'' in Y, it must also be true for the efficient production plan y, which lies on the boundary of Y. Therefore,
p·y'≥p·y for every production plan y' that is more efficient than y, i.e., y'>>y. We can now take any
production plan y'' in Y, to obtain p·y'≥p·y'' for all y' in the set of more efficient production plans Py.
Finally, since we can choose y' to be arbitrarily close to the efficient production plan y, we have
p·y≥p·y'' for every production plan y'' in Y. Therefore, production plan y must be profit-maximizing.
One interesting property of the 2nd FTWE is that we are not imposing that all prices must be strictly positive, i.e.,
p>>0, but only that all must be weakly positive, i.e., p≥0. Hence, we just assume that the price vector is
not zero in every single component, i.e., p≠(0,0,...,0). Note that this implies that the slope of the isoprofit
line can be zero (which occurs when the price of the input y1 is zero). The following figure illustrates this
case. In particular, note that there is a set of profit-maximizing production plans (where the isoprofit line is
tangent to the production set). However, there is only one efficient production plan, y, situated at the kink
of the production set Y. According to the 2nd FTWE, such an efficient production plan y must also be part of
the set of profit-maximizing production plans, which holds in this case. Hence, the 2nd FTWE can be
satisfied even if some input prices are zero.
50


49
Note that production plan y' is not technologically feasible since it lies outside production set Y.
50
Recall that, in contrast, the 1st FTWE does not necessarily hold if some input prices are zero, as described above.


Figure #4.71
Despite allowing for some input prices to be zero, the 2nd FTWE does not allow for input prices to be
negative. Let us examine whether the 2nd FTWE could still hold if the price of one input were negative. Let us
hence consider the case in which the price of input l is negative, pl<0. We would then have that p·y'<p·y
for some production plan y' that is more efficient than y, i.e., y'>>y, with y'l-yl being sufficiently large.
Let us show why we can have p·y'<p·y with the following example.
Example. Consider a price vector p=(p1,p2)=(3,-5), and assume that the efficient production plan is
y=(1,4) while a more efficient production plan (a production plan that is technologically unfeasible) is
y'=(6,25). Then
p·y=3*1+(-5)*4=3-20=-17, and
p·y'=3*6+(-5)*25=18-125=-107.
Hence, p·y'<p·y, i.e., the firm obtains a larger profit from production plan y than from the technologically
unfeasible production plan y'. This violates the condition p·y'≥p·y for every y'>>y on which the proof of the
2nd FTWE relies. Therefore, we need that the price vector
satisfies p≥0.
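The arithmetic of this example can be reproduced with a few lines of Python. The price vector and the two production plans are those of the example; the second price vector, (3,5), is a hypothetical nonnegative alternative added here only for contrast.

```python
def dot(p, y):
    """Inner product p . y, i.e., the profit of netput vector y at prices p."""
    return sum(pi * yi for pi, yi in zip(p, y))

y = (1, 4)          # efficient production plan from the example
y_prime = (6, 25)   # plan with more of every component (not feasible)

negative_price = (3, -5)      # price vector from the example
print(dot(negative_price, y), dot(negative_price, y_prime))

nonnegative_price = (3, 5)    # hypothetical alternative with no negative price
print(dot(nonnegative_price, y), dot(nonnegative_price, y_prime))
```

With the negative price the plan with more of every component yields the lower value, whereas with the nonnegative price vector the ordering is reversed.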






Chapter 5 Competitive Markets
Competitive markets
In this chapter we bring consumer and producer theory together. In particular, we analyze competitive
equilibrium and compare it with Pareto optimal allocations. We then discuss the two fundamental
theorems of welfare economics, measurements of welfare changes in partial equilibrium analysis, and
long-run considerations where firms are allowed to enter or exit the industry.
1


Pareto optimality and competitive equilibrium
In this section we analyze the allocation of goods and inputs across the economy when consumers and
firms interact in perfectly competitive markets, and compare such an allocation with that selected by a
benevolent planner maximizing social welfare (also referred to as a Pareto optimal allocation). Let's start by
describing Pareto optimal allocations. In this regard, we first need to define what we mean by an
economic allocation in a society involving I consumers and J firms. Specifically, an economic allocation
(x1, x2, ..., xI, y1, y2, ..., yJ) is a specification including:
1. A consumption vector xi∈Xi for every consumer i∈I, where xi=(x1i,x2i,...,xLi) describes the
amount that individual i consumes of every good l=1,2,...,L, and
2. A production vector yj∈Yj for every firm j∈J, where yj=(y1j,y2j,...,yLj) describes the amount that
firm j produces of every good l=1,2,...,L.
Given this definition of economic allocation, we say that an allocation is feasible if, for every good l, we have
$$\sum_{i=1}^{I} x_{li} \leq w_l + \sum_{j=1}^{J} y_{lj}$$
Intuitively, the allocation is feasible for good l if the total consumption of this good by all I consumers in
the economy is lower than (or equal to) the initial endowment of this good
2
plus the production of this
good by all J firms. We are now ready to define Pareto optimal allocations. Specifically, a feasible
allocation (x1, x2, ..., xI, y1, y2, ..., yJ) is Pareto optimal (or Pareto efficient) if there is no other feasible
allocation (x1', x2', ..., xI', y1', y2', ..., yJ') such that
ui(xi')≥ui(xi) for all subjects i=1,2,...,I; and ui(xi')>ui(xi) for some subject.
That is, there is no alternative way to arrange the distribution of goods among consumers and/or the
production of goods such that some individual is made strictly better off with the alternative allocation,
i.e., ui(xi')>ui(xi), and no consumer is made worse off, i.e., ui(xi')≥ui(xi). Importantly, the definition of
Pareto optimality implies a notion of efficiency in production (since there is no way to rearrange inputs in
order to produce more output) and in consumption (since there is no alternative way of achieving a Pareto

1
In this chapter we follow MWG Chapter 10. For a good discussion of these topics, see Varian Ch. 13 and NS Ch.
12 (although none of them is as complete as MWG).
2
Note that the endowment of good l, wl, is in fact a vector describing the amount of good l that every consumer
initially owns, i.e., wl=(wl1,wl2,...,wlI).

improvement). The above definition can be graphically represented using the utility possibility set (UPS). In
particular, this set can be defined as
$$U = \left\{ (u_1, u_2) : \text{there is a feasible allocation } (x_1, x_2, y_1, y_2, \dots, y_J) \text{ such that } u_i \leq u_i(x_i) \text{ for all subjects } i = 1, 2 \right\}$$


Figure #5.1
Intuitively, note that utility pairs on the frontier of the UPS are Pareto optimal. Indeed, for a utility pair
such as (u1hat, u2hat) we cannot improve individual 1's utility level without reducing that of individual 2.
Two remarks are noteworthy. First, we don't need convexity of the UPS in order for points on the
frontier to be Pareto optimal. Indeed, the above figure illustrates a nonconvex UPS, and yet utility pairs on
the frontier are Pareto optimal.
3
Second, it is important to distinguish efficiency (measured in the Pareto
optimal sense) from equity. Indeed, an allocation where all resources in the economy are assigned to
individual 2 (and none to individual 1), such as the one depicted on the vertical axis in the above figure, is
extremely unequal and yet is Pareto optimal, since we cannot increase the utility level of one individual
without decreasing that of other individuals.
Let us next describe a competitive equilibrium, CE (or Walrasian equilibrium) allocation. In particular, in
this context we consider that consumers and firms interact in markets where their relative size is

3
Importantly, note that we only need that the frontier of the UPS doesn't have increasing segments. If it did, we would be able to
increase both consumers' utility levels by moving to other utility pairs in the UPS.

negligible. As a consequence, their individual purchasing or selling decisions do not affect market prices
for outputs or inputs. Specifically, we say that an allocation (x*1, x*2, ..., x*I, y*1, y*2, ..., y*J) and a price
vector p*∈R^L constitute a CE if:
1. PMP: for each firm j, yj* solves
$$\max_{y_j \in Y_j} \; p^* \cdot y_j$$
2. UMP: for each consumer i, xi* solves
$$\max_{x_i \in X_i} \; u_i(x_i) \quad \text{s.t.} \quad p^* \cdot x_i \leq p^* \cdot w_i + \sum_{j=1}^{J} \theta_{ij} \, (p^* \cdot y_j^*)$$
3. Market clearing condition: for each good l,
$$\sum_{i=1}^{I} x_{li}^* = w_l + \sum_{j=1}^{J} y_{lj}^*$$


Intuitively, the first condition states that, when consumers and firms interact in a market, every firm
individually solves its own PMP (as described in the chapter on production theory). Similarly, the second
condition says that every consumer individually solves his UMP, given a budget constraint that differs
slightly from that considered in previous chapters. In particular, consumer $i$ must select a bundle $x_i$
whose cost, $p^* \cdot x_i$, does not exceed the value of this individual's initial endowment, $p^* \cdot w_i$, plus the
participation of this individual in the profits of all $J$ firms, $\sum_{j=1}^{J} \theta_{ij} (p^* \cdot y_j^*)$. Finally, the market clearing
condition states that, for every good $l$, the total consumption of this good by all consumers must be equal
to the initial endowment of this good in the economy plus the total production of this good by the $J$ firms.
Hence, this condition implies that in equilibrium there can be no excess demand for a good (since
otherwise some consumers would have incentives to offer higher prices for the good in order to obtain
more units of it) nor excess supply of the good (since otherwise some firms would have incentives to
offer the good at lower prices in order to sell more units of it). An interesting consequence is that, when
consumers' budget constraints hold with equality, if the market clearing condition is satisfied for all but
one good, then it must be satisfied for that good as well. Furthermore, note that if an allocation
$(x_1^*, x_2^*, \ldots, x_I^*, y_1^*, y_2^*, \ldots, y_J^*)$ and a price vector $p^* \in \mathbb{R}^L$ constitute a CE, then this allocation and the
price vector $\lambda p^*$ (for any $\lambda > 0$) must also be a CE. Hence we can normalize prices, keeping the same
equilibrium allocation. Finally, we will assume that market prices are all positive, since otherwise the
consumer would demand infinite amounts.

Partial equilibrium competitive analysis
In this section we analyze competitive allocations. Let us start by analyzing the behavior of firms. For a
given price $p^*$, every firm $j$'s equilibrium output level $q_j^*$ must solve the PMP
$$\max_{q_j \ge 0} \; p^* q_j - c_j(q_j)$$
which has the necessary and sufficient condition
$$p^* \le c_j'(q_j^*), \text{ with equality if } q_j^* > 0$$
which, in the case of interior solutions, states that every firm $j$ operating in a perfectly competitive
market increases output until the point at which the marginal cost of producing such output equals the
market price, as described in the previous chapter.
Let us now turn to the consumer. For simplicity, we consider that every consumer in the economy has a
quasilinear utility function $u_i(m_i, x_i) = m_i + \phi_i(x_i)$, where $m_i$ denotes the numeraire and $\phi_i'(x_i) > 0$ but
$\phi_i''(x_i) < 0$ for all $x_i \ge 0$, i.e., the consumer obtains a positive but diminishing marginal utility from
additional units of good $x_i$. In addition, we consider that every individual obtains zero utility from good
$x_i$ when consuming zero units of it, i.e., $\phi_i(0) = 0$. [4] Therefore, consumer $i$'s UMP is
$$\max_{m_i \in \mathbb{R},\; x_i \in \mathbb{R}_+} \; m_i + \phi_i(x_i) \quad \text{s.t.} \quad m_i + p^* x_i \le w_{m_i} + \sum_{j=1}^{J} \theta_{ij} \left( p^* q_j^* - c_j(q_j^*) \right)$$
Since the budget constraint must hold with equality (i.e., Walras' law holds), we have
$$m_i = -p^* x_i + \left[ w_{m_i} + \sum_{j=1}^{J} \theta_{ij} \left( p^* q_j^* - c_j(q_j^*) \right) \right]$$
and plugging the budget constraint into the objective function we can rewrite the UMP as
$$\max_{x_i \in \mathbb{R}_+} \; \phi_i(x_i) - p^* x_i + \left[ w_{m_i} + \sum_{j=1}^{J} \theta_{ij} \left( p^* q_j^* - c_j(q_j^*) \right) \right]$$
where now the only choice variable for consumer $i$ is good $x_i$. Taking first order conditions with respect
to $x_i$ we obtain
$$\phi_i'(x_i^*) \le p^*, \text{ with equality if } x_i^* > 0$$
which intuitively states that a consumer increases the amount bought of good $x_i$ until the point at which
the marginal utility he obtains from consuming further units of the good exactly coincides with the
market price he has to pay for them.
[4] Recall that with quasilinear utility functions, wealth effects for all non-numeraire commodities (such as $x_i$)
are zero. Our model examines, for instance, the consumption of a good $x_i$ that represents a small share of
consumers' monthly expenses, since in that case wealth effects are negligible.

Summarizing, an allocation $(x_1^*, x_2^*, \ldots, x_I^*, q_1^*, q_2^*, \ldots, q_J^*)$ and a price $p^*$ constitute a CE if:
$$p^* \le c_j'(q_j^*), \text{ with equality if } q_j^* > 0, \text{ for every firm } j$$
$$\phi_i'(x_i^*) \le p^*, \text{ with equality if } x_i^* > 0, \text{ for every consumer } i$$
$$\sum_{i=1}^{I} x_i^* = \sum_{j=1}^{J} q_j^*$$
Note that the previous conditions do not depend upon the consumer's initial endowment. [5] We next
provide a graphical illustration of the above conditions. The following figure represents consumer $i$'s
demand for good $x_i$. In particular, note that for prices above $\phi_i'(0)$, the consumer's marginal utility from
purchasing the first unit of the good is lower than the market price $p$, leading him to buy zero units of
good $x_i$. For prices below this cutoff, the consumer purchases a positive amount of the good, increasing $x_i$
until the point at which the marginal utility from the last unit coincides with the going market price. [6]


Figure #5.2
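As an illustration of the individual demand just described, consider a hypothetical functional form (an assumption made only for this example, not taken from these notes): $\phi_i(x_i) = a_i \ln(1 + x_i)$ with $a_i > 0$. It satisfies the assumptions on $\phi_i$, and its cutoff price is $\phi_i'(0) = a_i$:

```latex
\begin{align*}
\phi_i(x_i) &= a_i \ln(1 + x_i), \qquad \phi_i(0) = 0, \\
\phi_i'(x_i) &= \frac{a_i}{1 + x_i} > 0, \qquad
\phi_i''(x_i) = -\frac{a_i}{(1 + x_i)^2} < 0, \\
\phi_i'(x_i^*) &\le p \text{ (with equality if } x_i^* > 0)
\;\Longrightarrow\;
x_i(p) =
\begin{cases}
\dfrac{a_i}{p} - 1 & \text{if } p < a_i = \phi_i'(0), \\[4pt]
0 & \text{if } p \ge a_i .
\end{cases}
\end{align*}
```

For prices at or above $a_i$ the consumer is at the corner solution $x_i = 0$; below $a_i$ demand is strictly decreasing in $p$.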
We can now horizontally sum individual demands in order to obtain the aggregate demand for good x, as
the following figure illustrates. Interestingly, we can identify three segments of aggregate demand $x(p)$.
First, when the market price is above $\max_i \phi_i'(0)$, no individual demands a positive amount of good x,
implying that aggregate demand is also zero. Intuitively, in this range of (high) market prices the marginal
utility that every consumer obtains from buying the first unit of the good is still lower than the current
market price, and hence no positive units are demanded. For intermediate prices, however, individual 2 in
the figure obtains a marginal utility from the first unit that exceeds the market price, and thus buys
positive amounts of good x, but individual 1 does not. As a result, aggregate demand coincides with
individual 2's demand for this range of prices. Finally, when market prices are sufficiently low, aggregate
demand reflects the horizontal sum of all individuals' demand curves.

[5] Note that this result arises from quasilinearity. Indeed, an increase in the initial endowment raises consumer $i$'s
initial wealth. This allows him to increase the amount consumed of all other goods (here, the numeraire), but leaves
his demand for good $x_i$ unaffected, i.e., there are no wealth effects.
[6] Importantly, note that by inverting $\phi_i'(x_i)$ we can obtain this consumer's Walrasian demand $x_i(p)$.


Figure #5.3
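The horizontal summation of individual demands can be sketched numerically. The snippet below is only an illustration under assumed primitives: it takes $\phi_i(x) = a_i \ln(1+x)$, so that $\phi_i'(0) = a_i$ and $x_i(p) = \max\{0, a_i/p - 1\}$. The functional form, the parameter values and the function names are hypothetical and are not part of these notes.

```python
def individual_demand(p, a):
    """Demand of a consumer with phi(x) = a*ln(1+x), from phi'(x) = a/(1+x) = p."""
    if p <= 0:
        raise ValueError("price must be positive")
    # For p >= a = phi'(0) the first unit is not worth its price: corner solution x = 0.
    return max(0.0, a / p - 1.0)


def aggregate_demand(p, a_values):
    """Horizontal sum: add the quantities demanded by all consumers at a common price p."""
    return sum(individual_demand(p, a) for a in a_values)


a_values = [2.0, 4.0]  # hypothetical cutoffs phi_i'(0) for consumers 1 and 2

# High, intermediate and low prices correspond to the three segments of x(p).
for price in (5.0, 3.0, 1.0):
    print(price, aggregate_demand(price, a_values))
```

At the high price nobody buys, at the intermediate price only consumer 2 buys, and at the low price both consumers buy, which reproduces the three segments of aggregate demand.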
Let us now examine the firm's supply curve. The following figure represents the supply curve for an
individual firm $j$. Note that when market prices are sufficiently low, i.e., $p < c_j'(0)$, firm $j$'s marginal cost
of producing the first unit is higher than the current market price, leading the firm to supply zero units of
the good. When the market price is above that cutoff, however, the firm increases production until the
point at which the marginal cost of such level of output exactly coincides with the market price the firm
obtains from selling those units in the market, i.e., $p = c_j'(q_j)$, as described in previous chapters. [7]


Figure #5.4

[7] An interesting question at this point is whether the firm's supply curve could look like the one that we examined
in the chapter on production theory, where the firm produces positive amounts only for prices above the minimum of
the average cost curve. Note that in this chapter we assume convex total costs. When the firm incurs no fixed costs,
the corresponding marginal cost curve starts at the origin (and coincides with the firm's supply curve). If the firm
incurs fixed (nonsunk) setup costs, its marginal cost curve also starts at the origin, but the firm's supply curve runs
along the vertical axis (zero output) for prices below the minimum of the average cost curve, coinciding with the
firm's marginal cost curve otherwise.

Aggregate supply can be obtained by horizontally summing individual supply curves. As in the case of
aggregate demand, we can identify three regions in the aggregate supply curve $q(p)$, as the following
figure reflects. First, when the market price is below the marginal cost of producing the first unit for the
most efficient firm (the firm with the lowest marginal cost of production, firm 2 in the figure), no firm
chooses to supply positive units to the market, and aggregate supply is zero. When the market price is
intermediate, only the most efficient firm finds it profitable to supply positive units of good x, and
aggregate supply coincides with the individual supply of the most efficient firm (firm 2 in the figure).
Finally, when the market price is sufficiently high, both firms supply positive units and, as a consequence,
aggregate supply is the horizontal sum of the individual supplies of firms 1 and 2.

Figure #5.5
We can now combine aggregate demand and aggregate supply in a single figure in order to obtain the
competitive equilibrium allocation of good x. First, note that in order to guarantee that a competitive
equilibrium exists (i.e., aggregate demand crosses aggregate supply in the figure), we need that
$\max_i \phi_i'(0) \ge p^* \ge \min_j c_j'(0)$. Graphically, note that this condition states that the vertical intercept of the
aggregate demand curve lies above that of the aggregate supply curve.


Figure #5.6
Note that if, instead, $\max_i \phi_i'(0) < \min_j c_j'(0)$ holds, there is no positive production or consumption of
good x, as the following figure illustrates. Intuitively, this condition indicates that the willingness to pay
of the consumer most interested in the good is still lower than the marginal cost of production of the most
efficient firm. As a consequence, there is no room for a profitable exchange, and no units of the good are
produced or consumed.

Figure #5.7
Additionally, since the marginal utility $\phi_i'(x_i)$ is downward sloping for every consumer, i.e., $\phi_i''(x_i) < 0$ for
all $i$, and the marginal cost $c_j'(q_j)$ is upward sloping in output for every firm $j$, i.e., $c_j''(q_j) > 0$ for all $j$,
aggregate demand and supply cross at a unique point, and therefore the CE allocation is unique.

Finally, note that we can understand the inverse of the aggregate supply function as the industry's
marginal cost function. In particular, take any given output level $\bar{q}$ on the horizontal axis and map it
through the aggregate supply curve onto the vertical axis. Then, the inverse $q^{-1}(\bar{q})$ can be viewed as the
industry marginal cost of production.

Figure #5.8
We can similarly understand the inverse of the aggregate demand function as the marginal social benefit
function. Specifically, take any consumption level $\bar{x}$ on the horizontal axis and map it through the
aggregate demand curve onto the vertical axis. Then, the inverse of the aggregate demand curve, $x^{-1}(\bar{x})$,
which is also referred to as $P(\bar{x})$, represents the marginal social benefit of $\bar{x}$ units of consumption.

Figure #5.9
Therefore, at the CE output, the aggregate marginal cost of producing such level of output coincides with
the marginal social benefit that all consumers obtain from consuming it.
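A small numerical sketch can make the equilibrium conditions concrete. It assumes, purely for illustration, $\phi_i(x) = a_i \ln(1+x)$ (so $x_i(p) = \max\{0, a_i/p - 1\}$) and $c_j(q) = k_j q + q^2/2$ (so $q_j(p) = \max\{0, p - k_j\}$), and finds the price at which aggregate demand equals aggregate supply by bisection. The functional forms, parameter values and function names are hypothetical.

```python
def demand(p, a_values):
    """Aggregate demand when phi_i(x) = a_i*ln(1+x): x_i(p) = max(0, a_i/p - 1)."""
    return sum(max(0.0, a / p - 1.0) for a in a_values)


def supply(p, k_values):
    """Aggregate supply when c_j(q) = k_j*q + q**2/2: q_j(p) = max(0, p - k_j)."""
    return sum(max(0.0, p - k) for k in k_values)


def equilibrium_price(a_values, k_values, tol=1e-10):
    """Bisection on excess demand between min_j c_j'(0) and max_i phi_i'(0)."""
    lo, hi = min(k_values), max(a_values)
    if lo <= 0 or hi <= lo:
        raise ValueError("need 0 < min_j c_j'(0) < max_i phi_i'(0)")
    # Excess demand is positive at lo, negative at hi, and strictly decreasing in between,
    # so the crossing point is unique.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if demand(mid, a_values) > supply(mid, k_values):
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


a_values = [2.0, 4.0]  # hypothetical phi_i'(0) for consumers 1 and 2
k_values = [0.5, 1.0]  # hypothetical c_j'(0) for firms 1 and 2
p_star = equilibrium_price(a_values, k_values)
print(p_star, demand(p_star, a_values), supply(p_star, k_values))
```

With these numbers the equilibrium price is roughly 1.61, a level at which both consumers buy and both firms produce, so every active agent's marginal benefit or marginal cost equals that common price.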

Comparative statics

In this section we examine how the competitive equilibrium output and prices are affected by changes in
the parameters of the model. Specifically, let us assume that consumers' preferences are affected by a
vector of parameters $\alpha \in \mathbb{R}^M$, where $M \le L$. [8] Then, consumer $i$'s utility from good x becomes $\phi_i(x_i, \alpha)$.
Similarly, firms' technology is affected by a vector of parameters $\beta \in \mathbb{R}^S$, where $S \le L$. Then, firm $j$'s cost
function becomes $c_j(q_j, \beta)$. In the presence of a tax $t$, we will use $\hat{p}_i(p, t)$ to denote the effective price paid
by consumer $i$ and $\hat{p}_j(p, t)$ to denote the effective price received by firm $j$. [9] If consumption and
production are strictly positive in the CE, then the following conditions must hold
$$\phi_i'(x_i^*, \alpha) = \hat{p}_i(p^*, t) \text{ for every consumer } i$$
$$c_j'(q_j^*, \beta) = \hat{p}_j(p^*, t) \text{ for every firm } j$$
$$\sum_{i=1}^{I} x_i^* = \sum_{j=1}^{J} q_j^*$$
We then have $I + J + 1$ equations, which depend on parameter values $\alpha$, $\beta$ and $t$. In order to understand
how optimal consumption bundles $x_i^*$ and profit-maximizing production plans $q_j^*$ depend on parameters
$\alpha$ and $\beta$, we can use the Implicit Function Theorem as long as the functions are differentiable.
Remark (Implicit Function Theorem): Let $u(x, y)$ be a utility function, where $x$ and $y$ are amounts of two
goods, and consider the level curve of $u$ passing through a point $(\bar{x}, \bar{y})$. If $\partial u(x, y) / \partial y \neq 0$ when evaluated
at $(\bar{x}, \bar{y})$, then
$$\frac{dy}{dx}(\bar{x}) = -\frac{\partial u(\bar{x}, \bar{y}) / \partial x}{\partial u(\bar{x}, \bar{y}) / \partial y}$$
Similarly, if $\partial u(x, y) / \partial x \neq 0$ when evaluated at $(\bar{x}, \bar{y})$, then
$$\frac{dx}{dy}(\bar{y}) = -\frac{\partial u(\bar{x}, \bar{y}) / \partial y}{\partial u(\bar{x}, \bar{y}) / \partial x}$$
for all $(\bar{x}, \bar{y})$.
Similarly, if the utility function describes the consumption of a single good x, $u(x, \alpha)$, where $\alpha$ determines
the consumer's preference for x, and $\partial u(x, \alpha) / \partial x \neq 0$, then
$$\frac{dx}{d\alpha}(\alpha) = -\frac{\partial u(x, \alpha) / \partial \alpha}{\partial u(x, \alpha) / \partial x}$$
The left-hand side is unknown (to find it directly we would have to solve the entire UMP), whereas the
right-hand side is easy to compute.
For a more detailed description of the Implicit Function Theorem with applications to economics, see
Simon and Blume, pp. 339-341.

[8] This implies that there are no more parameters than goods. In most economic applications this is normally the
case, since only a few parameters are modified simultaneously.
[9] Hence, in order to denote a per unit tax (charged on every unit sold), we use $\hat{p}_i(p, t) = p + t$, where the consumer's
total expenditure on that good thus becomes $pq + tq$, whereas to denote an ad valorem tax (i.e., a sales tax) we use
$\hat{p}_i(p, t) = p + pt = p(1 + t)$, where the consumer's total expenditure on that good becomes $pq + tpq = p(1 + t)q$.
Example: Introducing a sales tax. The expression of aggregate demand now becomes $x(p + t)$, since the
effective price that the consumer pays is actually $p + t$, i.e., the sales tax is equivalent to an increase in the
price paid by consumers. In equilibrium, the market price after imposing the tax, $p^*(t)$, must satisfy
$$x(p^*(t) + t) = q(p^*(t))$$
Hence, if the sales tax is marginally increased (and functions are differentiable at $p = p^*(t)$), we obtain
$$x'(p^*(t) + t) \left( p^{*\prime}(t) + 1 \right) = q'(p^*(t)) \, p^{*\prime}(t)$$
rearranging,
$$p^{*\prime}(t) \left[ x'(p^*(t) + t) - q'(p^*(t)) \right] = -x'(p^*(t) + t)$$
hence,
$$p^{*\prime}(t) = -\frac{x'(p^*(t) + t)}{x'(p^*(t) + t) - q'(p^*(t))}$$
Since the aggregate demand function $x(p)$ is decreasing in prices and the aggregate supply function $q(p)$ is
increasing in prices, then $x'(p^*(t) + t) < 0 < q'(p^*(t))$, and
$$p^{*\prime}(t) = -\frac{x'(p^*(t) + t)}{x'(p^*(t) + t) - q'(p^*(t))} = -\frac{(-)}{(-) - (+)} = -\frac{(-)}{(-)} = (-)$$
Moreover, the above ratio is larger than $-1$, which implies that $p^{*\prime}(t)$ lies in the interval $(-1, 0)$. Therefore,
we can conclude that the equilibrium price $p^*(t)$ decreases in $t$, i.e., the price received by producers falls
in the tax. Additionally, since $p^*(t) + t$ is the price paid by consumers, $p^{*\prime}(t) + 1$ is the marginal
increase in the price paid by consumers when the tax is marginally increased. Since $p^{*\prime}(t) > -1$, then
$p^{*\prime}(t) + 1 > 0$, and consumers' cost of the product rises (by less than the tax, since $p^{*\prime}(t) + 1 < 1$). The
following figure summarizes the effect of imposing a tax on competitive equilibrium price and quantity.
Before the introduction of the tax, the CE occurs at $p^*(0)$ and $x^*(0)$, where aggregate demand $x(p)$ and
aggregate supply $q(p)$ cross each other. The imposition of the tax shifts the aggregate demand curve from
$x(p)$ to $x(p + t)$, without affecting the supply curve $q(p)$. (Note that the vertical distance between these two
demand curves is equal to the tax, $t$, at any output level.) This implies that the new CE after the
introduction of the tax occurs at a lower output level, decreasing output from $x^*(0)$ to $x^*(t)$. Regarding
prices, note that consumers pay $p^*(t) + t$ after the imposition of the tax, rather than $p^*(0)$ before the tax was
introduced, while producers receive a price $p^*(t)$ for each of the $x^*(t)$ units they sell after the tax, rather
than the $p^*(0)$ they received per unit before the tax.


Figure #5.10
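The pass-through expression $p^{*\prime}(t) = -x'/(x' - q')$ can be checked numerically. The sketch below assumes linear curves, $x(p) = a - bp$ and $q(p) = c + dp$ with $b, d > 0$, for which the after-tax equilibrium solves $a - b(p + t) = c + dp$. The linear forms, the parameter values and the function names are assumptions made for this illustration only.

```python
def producer_price(t, a=10.0, b=1.0, c=0.0, d=2.0):
    """Equilibrium price received by producers, p*(t), from a - b*(p + t) = c + d*p."""
    return (a - c - b * t) / (b + d)


def pass_through(b, d):
    """p*'(t) = -x'/(x' - q') with x'(p) = -b and q'(p) = d."""
    x_prime, q_prime = -b, d
    return -x_prime / (x_prime - q_prime)


# Central finite difference of p*(t) around t = 1, compared with the formula.
h = 1e-6
numeric = (producer_price(1.0 + h) - producer_price(1.0 - h)) / (2.0 * h)
print(numeric, pass_through(1.0, 2.0))

# Limiting cases: very elastic supply (large d) and very inelastic supply (small d).
print(pass_through(1.0, 1e6), pass_through(1.0, 1e-6))
```

In the linear case the formula reduces to $-b/(b + d)$, which always lies strictly between $-1$ and $0$. It approaches $0$ as supply becomes very responsive to price and $-1$ as supply becomes unresponsive.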
At this point, we can easily examine the case in which the supply curve is very responsive to price, i.e.,
$q'(p^*(t))$ is large. In this case,
$$p^{*\prime}(t) = -\frac{x'(p^*(t) + t)}{\underbrace{x'(p^*(t) + t) - q'(p^*(t))}_{\text{huge and negative}}} \approx 0$$
Therefore, $p^{*\prime}(t) \approx 0$, and the price received by producers barely falls after the introduction of the tax,
from $p^*(0)$ to $p^*(t)$, as depicted in the following figure. However, consumers still have to pay $p^*(t) + t$,
which increases with the tax at a rate $p^{*\prime}(t) + 1 \approx 0 + 1 = 1$. That is, the tax is mainly borne by consumers.
Indeed, as the figure illustrates, the price paid by consumers increases by approximately the full amount
of the tax.

Figure #5.11
If, in contrast, supply is not responsive to price changes, i.e., if $q'(p^*(t))$ is close to zero, then
$$p^{*\prime}(t) = -\frac{x'(p^*(t) + t)}{x'(p^*(t) + t) - \underbrace{q'(p^*(t))}_{\approx\, 0}} \approx -\frac{x'(p^*(t) + t)}{x'(p^*(t) + t)} = -1$$
Therefore, $p^{*\prime}(t) \approx -1$, and the price received by producers falls by $1 for every extra dollar in taxes, i.e.,
producers bear most of the tax burden. In contrast, consumers pay $p^*(t) + t$, which changes with the tax at
a rate $p^{*\prime}(t) + 1 \approx -1 + 1 = 0$. That is, consumers bear almost none of the tax burden. This is illustrated in the
following figure, where consumers' cost of the good does not increase, from $p^*(0)$ before the tax to
$p^*(t) + t$ after the tax, whereas the price received by producers falls by $1 for every extra dollar in taxes,
i.e., from $p^*(0)$ before the tax to $p^*(t)$ after the tax.

Figure #5.12
A remark on nonconvex cost functions. In all our previous discussion we assumed that firms' cost
functions are convex. Let us examine the effect of considering cost functions with concave segments, as
in the example in the following figure. In the figure, aggregate demand $x(p)$ is decreasing in prices, while
aggregate supply is not weakly increasing in prices, since the cost function is nonconvex. Then, aggregate
supply is represented by two intervals (shaded segments of $q(p)$ in the figure). Intuitively, for relatively
high prices, firms prefer to supply more units rather than fewer (and hence select the region of the $q(p)$
curve in which, for the same price level, firms produce the largest output). Because of the specific pattern
of the firms' nonconvex cost function, aggregate demand and supply may not cross, in which case no CE
exists. [10]

[10] Note that, alternatively, more than one crossing point can occur if firms' cost function is nonconvex. In that
case, the same equilibrium price level could be associated with different equilibrium output levels.


Figure #5.13

Tax incidence
In this subsection we use the N&S approach to tax incidence. In order to discuss the effects of a per-unit
tax, $t$, we need to distinguish between the price paid by consumers ($p_d$) and the price received by sellers
($p_s$), where $p_d = p_s + t$, or alternatively, $p_s = p_d - t$. As a result, note that the wedge between both prices is
$t = p_d - p_s$. When examining the effect of a small increase in the tax, we have
$$dt = dp_d - dp_s$$
and since we must maintain the market clearing condition in equilibrium, $dQ_d = dQ_s$, or $D_p \, dp_d = S_p \, dp_s$.
Substituting the above expression of a marginal change in the tax into this condition, we obtain
$$D_p \, dp_d = S_p \, dp_s = S_p (dp_d - dt)$$
where the last equality originates from the fact that $p_s = p_d - t$. Rearranging,
$$D_p \, dp_d = S_p \, dp_d - S_p \, dt, \quad \text{or} \quad S_p \, dt = (S_p - D_p) \, dp_d$$
We can now solve for the effect of the tax on the price paid by consumers, $p_d$, obtaining
$$\frac{dp_d}{dt} = \frac{S_p}{S_p - D_p} = \frac{e_S}{e_S - e_D} > 0$$
where the expression on the right-hand side is obtained by multiplying the numerator and denominator by
$p/Q$. Similarly, for the price received by suppliers, $p_s$, we obtain
$$\frac{dp_s}{dt} = \frac{D_p}{S_p - D_p} = \frac{e_D}{e_S - e_D} < 0$$
Since the price elasticity of demand is negative, $e_D \le 0$, but that of supply is positive, $e_S \ge 0$, we obtain that
$dp_d/dt > 0$ while $dp_s/dt < 0$. Intuitively, an increase in the per-unit tax increases the price that consumers
have to pay for the good and decreases what producers receive for the good, expanding the wedge between
both prices. In the extreme case in which demand is perfectly inelastic, i.e., $e_D = 0$, the per-unit tax is
completely borne by consumers (note that $e_D = 0$ implies $dp_d/dt = 1$, reflecting that a $1 increase in the tax
produces a $1 increase in the price paid by consumers). In contrast, when demand is perfectly elastic, i.e.,
$e_D = -\infty$, the per-unit tax is completely borne by producers. In particular, $e_D \to -\infty$ implies that $dp_d/dt$
approaches zero while $dp_s/dt$ approaches $-1$.
The above discussion illustrates that the actor (consumer or producer) with the less elastic response bears
most of the price change caused by the tax. Indeed, if we divide the above two expressions, we obtain
$$\frac{dp_s/dt}{dp_d/dt} = \frac{e_D}{e_S}$$
Finally, note that the introduction of a tax reduces consumer surplus by an amount $p_d F E p^*$ in the
following figure. (Note that, out of this area, region $p_d F H p^*$ represents the money transferred to the
government in the form of tax revenue.) Similarly, the tax reduces producer surplus by an amount
$p^* E G p_s$, where $p^* H G p_s$ denotes the money transferred by producers to the government in the form of tax
revenue. Hence, the net loss in CS and the net loss in PS illustrate the welfare that this economy
loses as a result of the tax, after taking into account the welfare that is merely transferred from either of
the agents to the government. [11] This is usually referred to as the deadweight loss of the tax, and is
represented in the figure by area FEG. [12]

[11] Total tax revenue is therefore represented by area $p_d F G p_s$.
[12] MWG present a more formal example of the deadweight loss of taxation (see example 10.E.1).


Figure #5.14
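The incidence formulas and the deadweight-loss triangle can be illustrated with a short sketch. The first function evaluates $dp_d/dt = e_S/(e_S - e_D)$ and $dp_s/dt = e_D/(e_S - e_D)$. The second assumes linear demand $Q = a - b\,p_d$ and linear supply $Q = c + d\,p_s$, for which the tax lowers quantity by $t\,bd/(b + d)$, so triangle FEG has base $t$ and that height. The linear forms, parameter values and function names are hypothetical.

```python
def tax_incidence(e_d, e_s):
    """Return (dp_d/dt, dp_s/dt) from the elasticity formulas.

    e_d: price elasticity of demand (<= 0); e_s: price elasticity of supply (>= 0).
    """
    if e_d > 0 or e_s < 0:
        raise ValueError("expected e_d <= 0 and e_s >= 0")
    if e_s == e_d:  # both are zero, so the formulas are undefined
        raise ValueError("e_s - e_d must be nonzero")
    dpd_dt = e_s / (e_s - e_d)
    dps_dt = e_d / (e_s - e_d)
    return dpd_dt, dps_dt


def deadweight_loss_linear(t, b, d):
    """Area of the deadweight-loss triangle for linear demand (slope -b) and supply (slope d)."""
    quantity_drop = t * b * d / (b + d)  # fall in equilibrium quantity caused by the tax
    return 0.5 * t * quantity_drop


print(tax_incidence(-0.5, 1.5))   # supply more elastic than demand: consumers bear most of the tax
print(tax_incidence(0.0, 1.0))    # perfectly inelastic demand: consumers bear all of it
print(deadweight_loss_linear(3.0, 1.0, 2.0))
```

Note that the two derivatives always differ by exactly one, since $dt = dp_d - dp_s$, and that in the linear case the deadweight loss grows with the square of the tax.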

Mathematical model of supply and demand
Suppose that the demand function is represented by a function $Q_D = D(p, \alpha)$ that depends upon market
prices (negatively) and on a parameter $\alpha$ that shifts the demand curve, i.e., $dD/d\alpha = D_\alpha$ can have any sign
(e.g., positive for a transfer, negative for a tax). Similarly, the supply relationship can be expressed by a
function $Q_S = S(p, \beta)$ that depends on market prices (positively) and on a parameter $\beta$ that shifts the supply
curve, i.e., $dS/d\beta = S_\beta$ can have any sign. Equilibrium requires that market demand equals market supply,
so $Q_D = Q_S$. In order to arrive at the comparative statics of this model, we need to totally differentiate the
supply and demand functions,
$$Q_D = D(p, \alpha) \;\Longrightarrow\; dQ_D = D_p \, dp + D_\alpha \, d\alpha$$
$$Q_S = S(p, \beta) \;\Longrightarrow\; dQ_S = S_p \, dp + S_\beta \, d\beta$$
Since the market must still be in equilibrium, we must have that the change in demand equals the
change in supply, or $dQ_D = dQ_S$. For simplicity, consider that the demand parameter $\alpha$ changes while the
supply parameter $\beta$ remains constant, $d\beta = 0$. The equilibrium condition hence requires that
$$D_p \, dp + D_\alpha \, d\alpha = S_p \, dp$$
Rearranging,
$$\frac{dp}{d\alpha} = \frac{D_\alpha}{S_p - D_p}$$
And since $S_p - D_p > 0$, the derivative $dp/d\alpha$ will have the same sign as $D_\alpha$. For instance, a fad making
certain clothing fashionable, i.e., $D_\alpha > 0$, implies that the equilibrium price increases in $\alpha$.

If, in contrast, the supply parameter $\beta$ changes while the demand parameter $\alpha$ remains constant, $d\alpha = 0$,
we have that the equilibrium condition requires
$$D_p \, dp = S_p \, dp + S_\beta \, d\beta$$
$$-S_\beta \, d\beta = \left[ S_p - D_p \right] dp$$
$$\frac{dp}{d\beta} = -\frac{S_\beta}{S_p - D_p}$$
Therefore, the derivative $dp/d\beta$ will have the opposite sign of $S_\beta$. For instance, the introduction of a new
technology that reduces firms' costs implies that $S_\beta > 0$, since aggregate supply is positively affected by this
technology. Hence, $dp/d\beta < 0$, implying that the introduction of this technology reduces market prices.
We can easily convert all our previous analysis into elasticities. Indeed, multiplying both sides of the
expression for $dp/d\alpha$ by $\alpha/p$, we obtain
$$e_{p,\alpha} = \frac{dp}{d\alpha} \cdot \frac{\alpha}{p} = \frac{D_\alpha}{S_p - D_p} \cdot \frac{\alpha}{p}$$
Dividing the numerator and denominator by $Q$, we have
$$e_{p,\alpha} = \frac{D_\alpha \frac{\alpha}{Q}}{(S_p - D_p) \frac{p}{Q}} = \frac{e_{D,\alpha}}{e_{S,p} - e_{D,p}}$$
which states that a 1% increase in the demand parameter $\alpha$ produces an $e_{p,\alpha}$ percent change in the
equilibrium price. A similar analysis can be extended to the supply parameter $\beta$.
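As a worked illustration of these comparative statics, consider a hypothetical linear specification in which each parameter shifts its curve one-for-one (an assumption made for this example only). The expressions for $dp/d\alpha$ and $dp/d\beta$ can then be verified against the explicit equilibrium price:

```latex
\begin{align*}
D(p,\alpha) &= a - b\,p + \alpha, \qquad S(p,\beta) = c + d\,p + \beta, \qquad b, d > 0, \\
D_p &= -b, \quad D_\alpha = 1, \quad S_p = d, \quad S_\beta = 1, \\
\frac{dp}{d\alpha} &= \frac{D_\alpha}{S_p - D_p} = \frac{1}{b + d} > 0, \qquad
\frac{dp}{d\beta} = -\frac{S_\beta}{S_p - D_p} = -\frac{1}{b + d} < 0, \\
p^* &= \frac{a - c + \alpha - \beta}{b + d}
\quad \text{(solving } D = S \text{ directly gives the same derivatives).}
\end{align*}
```

A positive demand shift raises the equilibrium price and a positive supply shift lowers it, and both effects are smaller the steeper the two curves are in quantity terms (the larger $b + d$).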
Fundamental Welfare Theorems
In this section we relate competitive equilibrium allocations to those chosen by a social planner as
being Pareto optimal. Before setting up a formal comparison, let us emphasize some interesting properties
of consumers' quasilinear utility for good x. Recall that when preferences are quasilinear,
$$u_i(m_i, x_i) = m_i + \phi_i(x_i)$$
Therefore, for a given allocation $(\bar{x}_1, \bar{x}_2, \bar{q}_1, \ldots, \bar{q}_J)$,
$$u_1(m_1, \bar{x}_1) + u_2(m_2, \bar{x}_2) = m_1 + m_2 + \phi_1(\bar{x}_1) + \phi_2(\bar{x}_2)$$
where the amount of numeraire left for consumption is $m_1 + m_2 = w_m - \sum_{j=1}^{J} c_j(\bar{q}_j)$.
We can therefore define the utility possibility set as the set of all those $(u_1, u_2)$ pairs for which
$u_1 + u_2 \le u_1(m_1, \bar{x}_1) + u_2(m_2, \bar{x}_2)$, or in other words
$$u_1 + u_2 \le w_m - \sum_{j=1}^{J} c_j(\bar{q}_j) + \phi_1(\bar{x}_1) + \phi_2(\bar{x}_2)$$
with the utility possibility frontier given by the pairs that satisfy this expression with equality.
Importantly, note that the right-hand side of the above inequality depends neither on $u_1$ nor on $u_2$. As a
consequence, the utility possibility frontier is a straight line (with slope $-1$), and changes in the endowment
$w_m$, in the output levels, or in the amounts of consumption $(\bar{x}_1, \bar{x}_2)$ shift the entire frontier upward or
downward, without altering its slope. The following figure depicts two different utility possibility
frontiers: one in which consumption and production are given by $\bar{x}_1$, $\bar{x}_2$ and $\bar{q}_j$, and another one
in which they are given by amounts $x_1^*$, $x_2^*$ and $q_j^*$.

Figure #5.15
Pareto optimal allocations. Let us now examine Pareto optimal allocations. In particular, a benevolent
planner chooses the consumption vector $(x_1, x_2, \ldots, x_I) \ge 0$ and production vector $(q_1, q_2, \ldots, q_J) \ge 0$
that solve
$$\max \;\; w_m - \sum_{j=1}^{J} c_j(q_j) + \sum_{i=1}^{I} \phi_i(x_i) \quad \text{s.t.} \quad \sum_{i=1}^{I} x_i = \sum_{j=1}^{J} q_j$$
Intuitively, the above maximization problem states that the social planner wants to maximize aggregate
surplus (i.e., the sum of all individuals' utility from the good less total production costs) subject to the
market clearing condition (stating, as usual, that aggregate consumption must be equal to aggregate
production). Letting $\mu$ denote the Lagrange multiplier on this constraint and taking first order conditions
with respect to $x_i$ and $q_j$, we obtain
$$\mu \le c_j'(q_j^*), \text{ with equality if } q_j^* > 0$$
$$\phi_i'(x_i^*) \le \mu, \text{ with equality if } x_i^* > 0$$
$$\sum_{i=1}^{I} x_i^* = \sum_{j=1}^{J} q_j^*$$
These first order conditions probably look familiar to you. Indeed, they coincide with the first order
conditions for competitive equilibrium allocations for the specific case in which the Lagrange multiplier
$\mu$ exactly coincides with the market price $p^*$. Intuitively, this implies that the equilibrium price is
equal to the shadow price of the good. [13] We can now state the first connection between competitive
equilibrium and Pareto optimal allocations.
1st FTWE: If price $p^*$ and allocation $(x_1^*, x_2^*, \ldots, x_I^*, y_1^*, y_2^*, \ldots, y_J^*)$ constitute a CE, then this
allocation is also PO.
This result, despite being applicable in many cases, crucially depends on some conditions. First, market
participants (consumers and firms) must be price takers. Otherwise, we would have monopsony or
monopoly (or other forms of market power). Second, we assume that markets are complete. That is, there
are markets for every relevant commodity. [14]

The 2nd FTWE examines under which conditions we can state the converse of the 1st FTWE, as follows.

2nd FTWE: For every vector of PO utility levels $(u_1^*, u_2^*, \ldots, u_I^*)$ there are transfers of the numeraire
commodity $(T_1, T_2, \ldots, T_I)$ satisfying $\sum_{i=1}^{I} T_i = 0$ (i.e., redistributing the fixed amount of the numeraire
commodity among all individuals) such that a competitive equilibrium reached from the endowments
$(w_{m_1} + T_1, \ldots, w_{m_I} + T_I)$ yields precisely the PO utility levels $(u_1^*, u_2^*, \ldots, u_I^*)$.
That is, the 2nd FTWE states that a particular PO allocation in which individuals achieve utility levels
$(u_1^*, u_2^*, \ldots, u_I^*)$ can be implemented by a central authority that transfers money among consumers
and then lets the market work, i.e., allows every individual to choose his/her optimal consumption bundle
given his/her new wealth level $w_{m_i} + T_i$. The CE resulting from such a new initial state will induce PO
utility levels $(u_1^*, u_2^*, \ldots, u_I^*)$. A natural question at this point is whether the 2nd FTWE tells us that
redistribution is always good. Importantly, this theorem holds only under relatively strong
assumptions. In particular, we consider that preferences and production sets are convex and, of course, we
are assuming agents have complete information, which might be very restrictive in certain cases. [15]

[13] That is, in the CE: (1) every firm, by producing until the point at which marginal cost equals the market price,
makes marginal cost equal to the marginal social value of output (the Lagrange multiplier of the planner's problem);
and (2) every consumer, by consuming until the point at which the marginal benefit from additional units equals the
market price, makes the marginal benefit from consumption equal to its marginal cost.
[14] Note that this assumption does not hold when there is incomplete information about the product being
exchanged in the market, as in the used-car market, where the presence of incomplete information might drive all
good cars out of the market. This is the standard argument of the market for lemons.
Note that an alternative way to set up the social planner problem is
$$\max_{\{x_i, m_i\}_{i=1}^{I},\; \{z_j, q_j\}_{j=1}^{J}} \;\; m_1 + \phi_1(x_1)$$
$$\text{s.t.} \quad m_i + \phi_i(x_i) \ge \bar{u}_i \text{ for all } i = 2, 3, \ldots, I$$
$$\sum_{i=1}^{I} x_i \le \sum_{j=1}^{J} q_j$$
$$\sum_{i=1}^{I} m_i + \sum_{j=1}^{J} z_j \le w_m$$
$$z_j \ge c_j(q_j) \text{ for all } j = 1, 2, \ldots, J$$
Intuitively, this problem states that the benevolent planner wants to maximize the utility level of individual 1
without reducing the utility level of any other individual in the society below a certain cutoff $\bar{u}_i$, while
satisfying two resource constraints and a technological constraint for every firm.
A note on the social welfare function. We consider that society measures the social welfare generated by a
given vector of utility levels among individuals $(u_1, u_2, \ldots, u_I)$ by using a social welfare function
$W(u_1, u_2, \ldots, u_I)$. The following figure depicts an example of this function. First, note that from our
previous discussion the utility possibility frontier is a straight line indicating the pairs of utility levels that
society can reach given its endowment and current technology. Intuitively, this set represents utility
pairs that are feasible for society. The social welfare function, in contrast, helps select one particular
pair among all those that are feasible. For the initial consumption and production levels $x_1^0$, $x_2^0$ and $q_j^0$,
society prefers utility pair $u^0$, since at this point society reaches the highest social welfare level. [16] When
consumption and production are increased to $x_1^1$, $x_2^1$ and $q_j^1$, the utility possibility set shifts outwards.
If, after the change in consumption and production levels, society is at a utility pair $u^1$, a policy of
transfers among consumers allows society to reach a higher social welfare level by moving along the
utility possibility frontier towards utility pair $u^{1*}$. [17]

[15] Standard presentations of general equilibrium theory show that the 2nd FTWE does not hold if these conditions
are not satisfied, while the 1st FTWE still holds. (For a reference, see section 16.D in MWG.)
[16] Graphically, the figure represents utility pairs for which society reaches the same social welfare level, i.e.,
iso-welfare curves.
[17] Note that, in the specific case in which the social welfare function is utilitarian, i.e., $W(u_1, u_2) = \sum_{i=1}^{2} u_i$, the
iso-welfare curves become straight lines, so that the tangency condition with the utility possibility frontier becomes a
complete overlap. In that particular case, any utility pair along the utility possibility frontier is both Pareto optimal
and welfare maximizing.


Figure #5.16
Welfare analysis
When evaluating how a change in consumption or production due to a change in some parameters (for
instance, after the introduction of a tax) modifies aggregate social welfare we use aggregate Marshallian
surplus, defined as the difference between the total benefit from consumption less the total cost of
production,
' '
1 1
( ) ( )
I J
i i j j
i j
S x c q
= =
=


and taking a differential change in the quantity of good k that individuals consume and that firms produce
such that
1 1
I J
i j
i j
dx dq
= =
=

. Then, the change in the aggregate Marshallian surplus is
' '
1 1
( ) ( )
I J
i i i j j j
i j
dS x dx c q dq
= =
=


and since the marginal benefit from additional units of consumption, $\phi_i'(x_i)$, coincides with the inverse demand function $P(x)$ for all consumers (i.e., every individual consumes until the marginal benefit from additional units is equal to the market price), and $c_j'(q_j)=C'(q)$ for all firms (i.e., every firm j's marginal cost at its equilibrium production coincides with the aggregate marginal cost), then
$$dS=\sum_{i=1}^{I}P(x)\,dx_i-\sum_{j=1}^{J}C'(q)\,dq_j=P(x)\sum_{i=1}^{I}dx_i-C'(q)\sum_{j=1}^{J}dq_j$$



But since $\sum_{i=1}^{I}dx_i=\sum_{j=1}^{J}dq_j=dx$, and $x=q$ by market feasibility, then
$$dS=[P(x)-C'(x)]\,dx$$


Therefore, the change in Marshallian surplus from a marginal increase in consumption (and production) is the difference between consumers' additional utility and firms' additional cost of production. This intuition is graphically represented in the following figure, where the differential change in Marshallian surplus produced by a marginal increase in x is depicted by the vertical distance between the marginal benefit that consumers obtain from additional units of the good and the marginal cost that firms incur in order to produce those additional units.

Figure #5.17
We can also integrate the above expression, eliminating the differentials, so we can obtain the total Marshallian surplus for an aggregate consumption level of x, as follows.
$$S(x)=S_0+\int_0^x[P(s)-C'(s)]\,ds$$
where $S_0=S(0)$ is the constant of integration, and represents aggregate surplus when aggregate consumption is zero, x=0. The next figure represents aggregate Marshallian surplus for a given aggregate consumption level x.


Figure #5.18
A natural question at this point is: for which consumption level is aggregate Marshallian surplus S(x) maximized? Differentiating S(x) with respect to x, we obtain the first-order necessary condition
$$S'(x^*)=P(x^*)-C'(x^*)\le 0,\quad\text{or, rearranging,}\quad P(x^*)\le C'(x^*),$$
and the second-order (sufficient) condition
$$S''(x^*)=P'(x^*)-C''(x^*),$$
and this expression is negative since $P'(x^*)<0$, given that the inverse demand function decreases in quantity, and $C''(x^*)\ge 0$, since firms' costs are convex in output (and therefore aggregate production costs are convex as well). Hence, function S(x) is concave and we can confirm that x* constitutes a maximum of S(x). In addition, in interior solutions, where x*>0, aggregate surplus S(x) is maximized at an output level where $P(x^*)=C'(x^*)$. This implies that aggregate surplus S(x) is maximized at the competitive equilibrium allocation. This could be anticipated by a visual examination of the above figure, where S(x) increases until x=x*. This result coincides with the 1st FTWE, namely, every CE allocation is also PO, i.e., the CE allocation maximizes aggregate welfare.
18
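The maximization of aggregate surplus can be illustrated numerically. The following is only a minimal sketch: the linear inverse demand P(x)=A-Bx, the linear marginal cost C'(x)=Kx, and the parameter values are hypothetical choices made for illustration and do not come from these notes.

```python
# Hypothetical primitives (illustrative assumptions, not taken from the notes):
# inverse demand P(x) = A - B*x and aggregate marginal cost C'(x) = K*x.
A, B, K = 10.0, 1.0, 1.0


def inverse_demand(x):
    return A - B * x


def marginal_cost(x):
    return K * x


def surplus(x, s0=0.0, steps=10_000):
    """S(x) = S_0 + integral from 0 to x of [P(s) - C'(s)] ds (midpoint rule)."""
    h = x / steps
    area = sum(
        (inverse_demand((i + 0.5) * h) - marginal_cost((i + 0.5) * h)) * h
        for i in range(steps)
    )
    return s0 + area


# Interior optimum: P(x*) = C'(x*), i.e. A - B*x* = K*x*.
x_star = A / (B + K)

for x in (4.0, x_star, 6.0):
    print(f"x = {x:.1f}   S(x) = {surplus(x):.4f}")
```

Under these assumed functional forms, surplus is lower both to the left and to the right of x*, which is consistent with S(x) being concave and peaking where price equals marginal cost.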

Concluding remarks. Let us briefly recall the assumptions in this chapter. First, all prices except for pk
are fixed. When is it valid to use this assumption? When studying groups of commodities, as long as
prices between the groups do not substantially change. Second, we were considering the absence of
wealth effects (i.e., we were using a quasilinear utility function). When wealth effects are present, our
supply and demand analysis, the definition of competitive equilibrium allocation, comparative statics,

18
For an interesting example related to the use of aggregate Marshallian surplus, see Example 10.E.1 in MWG.

etc. are still valid. However, the welfare analysis (evaluating Marshallian surplus) is not accurate when
wealth effects are present, since neither AV=CV nor AV=EV.

Chapter 6: Choice under Uncertainty

Expected Utility Theory
In contrast to our analysis in previous chapters, where the individual or firm selects among a set of certain outcomes, we now examine choices under uncertain outcomes. In this section we present the decision maker's preferences over uncertain outcomes, and how to represent this preference relation with an expected utility function.
In particular, consider a set of possible outcomes (or consequences) C. This set might include, for instance, simple monetary payoffs (either positive or negative), in which case $C=\mathbb{R}$, or instead represent consumption bundles, in which case C=X (where X is a subset of $\mathbb{R}^L$, as in previous chapters). For simplicity, outcomes are considered finite, and hence the set of possible outcomes C contains N elements. In addition, the probabilities associated with every possible outcome are objectively known,
1

being $p_1$ for outcome 1, $p_2$ for outcome 2, etc. In this chapter we use the concept of lotteries to represent uncertain outcomes. In particular, a simple lottery is a list
$$L=(p_1,p_2,\dots,p_N)$$
with $p_n\ge 0$ for every outcome n, and $\sum_{n=1}^{N}p_n=1$, where $p_n$ is the probability of outcome n occurring.


2
We can graphically represent a simple lottery with two possible outcomes as a point along the line connecting (0,1) and (1,0), as depicted below.

Figure 6.1

Intuitively, note that the horizontal (vertical) intercept represents a degenerate probability distribution, where outcome 1 (outcome 2, respectively) is certain. Strictly positive probability pairs $(p_1,p_2)$ on the line $p_1+p_2=1$, in contrast, describe a lottery where none of the outcomes is certain and therefore the individual faces some uncertainty. We can easily extend this graphical representation of lotteries to the case of a

1
In later sections of this chapter we consider that the decision maker does not perfectly know the probability associated with every outcome (e.g., he does not know how likely outcome 1 is).
2
Note that some textbooks describe lotteries as lists of not only probabilities, but also the outcome associated with every probability.

lottery of 3 possible outcomes with associated probabilities $(p_1,p_2,p_3)$, as the following figure illustrates. First, note that the intercepts also represent degenerate probability distributions where one outcome is certain. Second, note that points strictly inside the hyperplane connecting the three intercepts denote a lottery where the individual faces uncertainty, such as at the point depicted in the figure. This figure is usually referred to as the probability simplex of lotteries with N=3 outcomes.

Figure 6.2
In order to simplify our graphical analysis, we can do a 2-dimensional projection of the above hyperplane, as the following figure illustrates. First, note that the vertices represent the intercepts (where one outcome is certain). Second, a simple lottery where the individual faces uncertainty is an interior point in the triangle, where the distance between the point and a side of the triangle represents the probability that the outcome represented at the opposite vertex occurs.

Figure 6.3

We can now use our previous notation to define compound lotteries. Specifically, given a list of K simple lotteries, where
$$L_k=(p_1^k,p_2^k,\dots,p_N^k)\quad\text{for every lottery } k=1,2,\dots,K,$$
with associated probabilities $\alpha_k\ge 0$ for every lottery k, with $\sum_{k=1}^{K}\alpha_k=1$, then the compound lottery $(L_1,L_2,\dots,L_K;\alpha_1,\alpha_2,\dots,\alpha_K)$ is the risky alternative that yields the simple lottery $L_k$ with probability $\alpha_k$.
We can hence intuitively interpret a compound lottery as a "lottery of lotteries": first, we face a probability $\alpha_1$ of playing lottery $L_1$, and, if lottery 1 occurs, we then face a probability $p_1^1$ of outcome 1 occurring, probability $p_2^1$ of outcome 2 occurring, etc. Then, the probability of outcome 1 is in fact
$$p_1=\alpha_1p_1^1+\alpha_2p_1^2+\dots+\alpha_Kp_1^K$$
Therefore, for any compound lottery $(L_1,L_2,\dots,L_K;\alpha_1,\alpha_2,\dots,\alpha_K)$, we can calculate a corresponding reduced lottery as the simple lottery $L=(p_1,p_2,\dots,p_N)$ that generates the same ultimate distribution of outcomes. That is, the reduced lottery L of any compound lottery can be obtained by
$$L=\alpha_1L_1+\alpha_2L_2+\dots+\alpha_KL_K$$
Let us see two examples of reduced lotteries. In example 1 below, all lotteries are equally likely ($\alpha_i=1/3$ for i=1,2,3) but, if lottery 1 occurs, we are guaranteed outcome 1, while if lotteries 2 or 3 occur, we face a positive probability of obtaining any of the three possible outcomes. The probability of outcome 1 in this compound lottery is therefore $\tfrac13\cdot 1+\tfrac13\cdot\tfrac14+\tfrac13\cdot\tfrac14=\tfrac12$. Similarly, the probability of outcome 2 is $\tfrac13\cdot 0+\tfrac13\cdot\tfrac38+\tfrac13\cdot\tfrac38=\tfrac14$ (the probability of outcome 3 can be found in a similar manner, also being $\tfrac14$). The reduced lottery of the compound lottery represented in example 1 is therefore $\left(\tfrac12,\tfrac14,\tfrac14\right)$.

Figure 6.4
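The reduction of example 1 can be checked with a few lines of code. This is a minimal sketch: the helper name `reduce_lottery` is a hypothetical choice, and the probabilities of lotteries 2 and 3 are read off the computation in the text (1/4, 3/8, 3/8).

```python
from fractions import Fraction as F


def reduce_lottery(lotteries, alphas):
    """Return the reduced lottery sum_k alpha_k * L_k as a tuple of probabilities."""
    assert sum(alphas) == 1, "the weights alpha_k must sum to one"
    n_outcomes = len(lotteries[0])
    return tuple(
        sum(a * lottery[n] for a, lottery in zip(alphas, lotteries))
        for n in range(n_outcomes)
    )


# Example 1: three equally likely simple lotteries over three outcomes.
L1 = (F(1), F(0), F(0))          # outcome 1 is certain
L2 = (F(1, 4), F(3, 8), F(3, 8))
L3 = (F(1, 4), F(3, 8), F(3, 8))

reduced = reduce_lottery([L1, L2, L3], [F(1, 3)] * 3)
print(reduced)
```

Exact fractions are used instead of floating-point numbers so that the probabilities of the reduced lottery can be compared without rounding error.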
In example 2 below, lotteries 4 and 5 are equally likely. The probability of each outcome is
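outcome 1: $\tfrac13\cdot 1+\tfrac13\cdot\tfrac14+\tfrac13\cdot\tfrac14=\tfrac12$
outcome 2: $\tfrac13\cdot 0+\tfrac13\cdot\tfrac38+\tfrac13\cdot\tfrac38=\tfrac14$
outcome 3: $\tfrac13\cdot 0+\tfrac13\cdot\tfrac38+\tfrac13\cdot\tfrac38=\tfrac14$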


Figure 6.5
The reduced lottery of the compound lottery represented in example 2 is therefore $\left(\tfrac12,\tfrac14,\tfrac14\right)$. Interestingly, both compound lotteries induce the same reduced lottery, despite originating from a different set of simple lotteries. This reduced lottery (which assigns the same probability weight to lotteries $L_4$ and $L_5$) is graphically represented as the linear combination of these two lotteries in the probability simplex below.

Figure 6.6


Preferences over lotteries

Regarding the preferences of decision makers who face uncertain outcomes, we assume that individuals only care about the reduced lottery that a compound lottery induces, and are therefore indifferent between compound lotteries that induce the same reduced lottery, as in the previous example where two different compound lotteries induced the same reduced lottery. We refer to this assumption as "consequentialism", since only consequences (outcomes), and the probability associated with every consequence, matter for the decision maker.
In addition, we consider the set of all simple lotteries over outcomes C, $\mathcal{L}$. We assume that the decision maker has a complete and transitive preference relation $\succsim$ over lotteries in $\mathcal{L}$, allowing him to compare any pair of simple lotteries L and L'. That is,
1. Completeness: Either $L\succsim L'$ or $L'\succsim L$, or both, for all $L,L'\in\mathcal{L}$.
2. Transitivity: If $L\succsim L'$ and $L'\succsim L''$, then $L\succsim L''$, for all $L,L',L''\in\mathcal{L}$.



Examples. Let us now describe some examples of preference relations over lotteries. First, we consider examples of preferences over lotteries where the decision maker is only concerned about the probability distribution over outcomes.
1. Extreme preference for certainty: The decision maker prefers lottery L to L' if and only if
$$\max_{n\in N}p_n\ge\max_{n\in N}p_n'$$
Intuitively, this preference relation represents a decision maker who is only concerned about the probability associated with the most likely outcome. That is, he considers the most likely outcome in each of lotteries L and L' and chooses the lottery in which such outcome has the highest probability. (Note that such outcome might differ from lottery L to lottery L'.)
2. Smallest size of the support: The decision maker prefers lottery L to L' if and only if
$$|supp(L)|\le|supp(L')|$$
where supp(L) denotes the support of lottery L, i.e., the set of outcomes with a strictly positive probability, or more precisely $supp(L)=\{n\in N:p_n>0\}$, and $|supp(L)|$ is the number of such outcomes. Intuitively, this preference relation considers a decision maker who prefers the lottery whose probability distribution is concentrated over the smallest set of possible outcomes.
Let us next examine preference relations over lotteries for which the decision maker cares about not only probability distributions but also outcomes.
3. Lexicographic preferences: first, we order outcomes from most to least preferred. Then, the decision maker prefers lottery L to L' if and only if
$p_1>p_1'$, or
if $p_1=p_1'$ and $p_2>p_2'$, or
if $p_1=p_1'$ and $p_2=p_2'$ and $p_3>p_3'$, or ...
Intuitively, the decision maker prefers lottery L to L' if outcome 1 (the most preferred outcome) is more likely to occur in lottery L than in lottery L'. If such outcome is equally likely in both lotteries, i.e., $p_1=p_1'$, then the decision maker prefers lottery L to L' if outcome 2 (the second most preferred outcome) is more likely to occur in lottery L than in lottery L', etc.
4. The worst case scenario: First, the decision maker attaches a number v(z) to every outcome z. Then, he prefers lottery L to L' if and only if
$$\min\{v(z):p(z)>0\}>\min\{v(z):p'(z)>0\}$$
where p(z) and p'(z) denote the probability of outcome z in lotteries L and L', respectively. Intuitively, this implies that this decision maker prefers lottery L if the worst utility he can get from playing lottery L, min v(z), is higher than the worst utility he can get from playing lottery L'.
3
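The four preference relations in the examples above can be written as simple comparison rules. The sketch below is only illustrative: the function names, the two lotteries and the values v(z) are hypothetical, and lotteries are stored as tuples of probabilities with outcomes indexed from most to least preferred.

```python
def prefers_certainty(p, q):
    """Example 1: L is preferred to L' iff max_n p_n >= max_n p'_n."""
    return max(p) >= max(q)


def support(p):
    """Set of outcomes that receive strictly positive probability."""
    return {n for n, prob in enumerate(p) if prob > 0}


def prefers_small_support(p, q):
    """Example 2: L is preferred to L' iff |supp(L)| <= |supp(L')|."""
    return len(support(p)) <= len(support(q))


def prefers_lexicographic(p, q):
    """Example 3: compare p_1 first, then p_2, and so on.
    Python compares tuples lexicographically, which is exactly this rule."""
    return tuple(p) > tuple(q)


def prefers_worst_case(p, q, v):
    """Example 4: compare the lowest value v(z) over each lottery's support."""
    return min(v[n] for n in support(p)) > min(v[n] for n in support(q))


L = (0.7, 0.3, 0.0)           # hypothetical lottery L
L_prime = (0.5, 0.25, 0.25)   # hypothetical lottery L'
v = (10, 5, 0)                # hypothetical values v(z), best outcome first

print(prefers_certainty(L, L_prime))
print(prefers_small_support(L, L_prime))
print(prefers_lexicographic(L, L_prime))
print(prefers_worst_case(L, L_prime, v))
```

With these particular numbers all four rules rank L above L', but in general the rules need not agree with each other.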

Let us next define continuity of preferences in this context of preferences over lotteries. For completeness, we present two equivalent definitions.
Continuity 1. For any three lotteries L, L' and L'', the sets
$$\{\alpha\in[0,1]:\alpha L+(1-\alpha)L'\succsim L''\}\subset[0,1]\quad\text{and}\quad\{\alpha\in[0,1]:L''\succsim\alpha L+(1-\alpha)L'\}\subset[0,1]$$
are closed. The following definition of continuity is probably more intuitive. We therefore elaborate on the intuition behind continuity after presenting the following definition.
Continuity 2. If lottery L is strictly preferred to L', then there are small neighborhoods of L and L', B(L) and B(L'), such that for all $L_a\in B(L)$ and $L_b\in B(L')$, we have that $L_a$ is strictly preferred to $L_b$. The following figure illustrates the intuition behind this definition. In particular, small changes in the probability distribution of lotteries L and L' do not change the decision maker's preference over the two lotteries.

Figure 6.7
Using an example from MWG, if a decision maker prefers a car trip to staying at home (both events occurring with certainty), then he must still prefer the car trip (if we include a small probability of suffering a car accident) to staying at home, as the following figure illustrates. In particular, he slightly moves away from one of the vertices, but still prefers lottery $L_a$ (a car trip with a small probability of a car accident) to lottery $L_b$ (staying at home).

3
This preference ordering over uncertain outcomes is sometimes observed in computer science, where one algorithm is preferred to another if it functions better in the worst case scenario, independently of the probability that such worst case scenario occurs (as long as it is positive).


Figure 6.8
The above continuity assumption, as in consumer theory, implies the existence of a utility function from the set of all lotteries to the reals, i.e., $U:\mathcal{L}\to\mathbb{R}$, such that lottery L is weakly preferred to lottery L' if and only if $U(L)\ge U(L')$.
We must however impose an additional assumption on preferences over lotteries in order to guarantee that the decision maker's preferences satisfy "consequentialism" as suggested above. We do so by imposing the so-called independence axiom (IA).
A preference relation over lotteries satisfies the IA if, for any three lotteries L, L' and L'', and $\alpha\in(0,1)$, we have that L is weakly preferred to L' if and only if $\alpha L+(1-\alpha)L''$ is weakly preferred to $\alpha L'+(1-\alpha)L''$.
Intuitively, if we mix each of two lotteries, L and L', with a third one L'', the preference ordering of the two resulting compound lotteries does not depend on (is independent of) the particular third lottery L'' that we use. We provide a graphical illustration of the IA below. Specifically, in the figure on the left the individual prefers lottery L to L'. Hence, it must be that, when we construct a linear combination of the first two lotteries with any third lottery L'', the linear combination of L and L'' is still preferred to that of L' and L''.

Figure 6.9

The following example emphasizes the intuition behind the IA. Consider a decision maker who prefers lottery L to L'. We can construct a compound lottery where, after a coin toss, the decision maker plays lottery L when heads comes up and L'' when tails does, and another compound lottery where the decision maker plays lottery L' when heads comes up and L'' when tails does. The IA tells us that this decision maker must still prefer the first to the second compound lottery.
4
For examples of preferences that do not
satisfy the IA, see Rubinstein (pages 91-92).

Figure 6.10
Given the above assumptions, we can now state that the utility function over lotteries has the so-called expected utility form.
The utility function $U:\mathcal{L}\to\mathbb{R}$ has the expected utility form if there is an assignment of numbers $(u_1,u_2,\dots,u_N)$ to the N possible outcomes such that, for every simple lottery $L=(p_1,p_2,\dots,p_N)\in\mathcal{L}$, we have
$$U(L)=p_1u_1+p_2u_2+\dots+p_Nu_N$$
In addition, a utility function with the expected utility form is also referred to as a von Neumann-Morgenstern (vNM) expected utility function. Note that this function is linear in the probabilities, as the following result states.
A utility function $U:\mathcal{L}\to\mathbb{R}$ has the expected utility form if and only if it is linear. That is, if and only if
$$U\left(\sum_{k=1}^{K}\alpha_kL_k\right)=\sum_{k=1}^{K}\alpha_kU(L_k)$$
for any K lotteries $L_k\in\mathcal{L}$, k=1,2,...,K, and probabilities $(\alpha_1,\alpha_2,\dots,\alpha_K)\ge 0$ with $\sum_{k=1}^{K}\alpha_k=1$. Intuitively, the utility of the expected value of the K lotteries, $U\left(\sum_{k=1}^{K}\alpha_kL_k\right)$, coincides with the expected utility of the K lotteries, $\sum_{k=1}^{K}\alpha_kU(L_k)$. Indeed, note that the utility of the expected value of playing the K lotteries is

4
Although the IA seems a sensible assumption in the theory of choice under uncertainty, note that it did not necessarily hold in consumer theory (under certain outcomes). In particular, a consumer might prefer good A over good B, but the combination of A with a third good C does not need to be preferred to the combination of B with the third good C, i.e., the consumer might regard A and C as substitutes but B and C as complements in consumption.

$$U\left(\sum_{k=1}^{K}\alpha_kL_k\right)=\sum_{n=1}^{N}u_n\left(\sum_{k=1}^{K}\alpha_kp_n^k\right)$$
where, for a given outcome n, the decision maker finds the joint probability of outcome n occurring in lottery 1, $\alpha_1p_n^1$, plus the joint probability of outcome n occurring in lottery 2, $\alpha_2p_n^2$, and similarly for all K lotteries. Summing the joint probability of outcome n occurring along the K lotteries, we obtain the total joint probability of outcome n occurring, and we multiply it by the utility that the decision maker gets from outcome n, $u_n$. We can then repeat this process for every possible outcome n=1,2,...,N.
Similarly, the expected utility from playing the K lotteries is indeed represented by
$$\sum_{k=1}^{K}\alpha_kU(L_k)=\sum_{k=1}^{K}\alpha_k\left(\sum_{n=1}^{N}u_np_n^k\right)$$
where, for a given lottery k, we find the expected utility from outcome 1 occurring in lottery k, $u_1p_1^k$, plus the expected utility from outcome 2 occurring in lottery k, $u_2p_2^k$, etc. Summing over all possible outcomes, we obtain the expected utility from playing a given lottery k. We can then multiply this expected utility by the associated probability of lottery k occurring, $\alpha_k$, and then repeat this process for all lotteries k=1,2,...,K. Both expressions add up the same terms $\alpha_ku_np_n^k$, only in a different order, and therefore coincide.
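The linearity property can also be verified numerically. The sketch below is only an illustration: the utility numbers, the three simple lotteries and the weights are hypothetical, and the helper names `expected_utility` and `mix` are not part of the notes.

```python
def expected_utility(lottery, u):
    """U(L) = p_1*u_1 + ... + p_N*u_N."""
    return sum(p * u_n for p, u_n in zip(lottery, u))


def mix(lotteries, alphas):
    """Reduced lottery sum_k alpha_k * L_k, outcome by outcome."""
    n_outcomes = len(lotteries[0])
    return [
        sum(a * lottery[n] for a, lottery in zip(alphas, lotteries))
        for n in range(n_outcomes)
    ]


u = [4.0, 1.0, 0.0]                      # hypothetical utilities u_1, u_2, u_3
lotteries = [
    [1.0, 0.0, 0.0],
    [0.25, 0.375, 0.375],
    [0.5, 0.0, 0.5],
]
alphas = [0.2, 0.5, 0.3]                 # hypothetical weights, summing to one

# Utility of the reduced lottery versus weighted average of the utilities.
lhs = expected_utility(mix(lotteries, alphas), u)
rhs = sum(a * expected_utility(lottery, u) for a, lottery in zip(alphas, lotteries))
print(lhs, rhs)
```

The two numbers agree up to floating-point rounding, since both sides add up the same products of weight, probability and utility.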
Note that the above EU property is a cardinal property (not ordinal). That is, not only the ranking matters, but also the particular number resulting from the utility function $U:\mathcal{L}\to\mathbb{R}$. Hence, the EU form of an original utility function U(L) is preserved only under increasing linear transformations, such as $\beta U(L)+\gamma$, where $\beta>0$, as the following result confirms. A utility function $\tilde{U}:\mathcal{L}\to\mathbb{R}$ is another vNM utility function for the decision maker's preferences over lotteries if and only if $\tilde{U}(L)=\beta U(L)+\gamma$ for every lottery $L\in\mathcal{L}$, where $\beta>0$.
5





Using the above assumptions we can now state the following result. Suppose that the decision maker's preference relation over lotteries satisfies rationality (completeness and transitivity), continuity and the independence axiom. Then, this preference relation admits a utility representation of the expected utility form. That is, we can assign a number $u_n$ to every outcome n=1,2,...,N in such a manner that for any two lotteries $L=(p_1,p_2,\dots,p_N)$ and $L'=(p_1',p_2',\dots,p_N')$, lottery L is weakly preferred to lottery L' if and only if $U(L)\ge U(L')$, or
$$\sum_{n=1}^{N}p_nu_n\ge\sum_{n=1}^{N}p_n'u_n.$$
(Note that $u_n$ is the utility that the decision maker assigns to outcome n. It is usually referred to as the Bernoulli utility function.)
Up to this point the decision maker's preference over lotteries had not been graphically represented with indifference curves. Let us next analyze the effect of the IA on individuals' indifference curves over lotteries. In particular, the IA implies that indifference curves must be straight and parallel lines.

5
$\beta U(L)+\gamma$, with $\beta>0$, is also referred to as an affine transformation, i.e., an increasing linear transformation.

1. Indifference curves must be straight lines. Indeed, if a decision maker is indifferent between two lotteries L and L', then applying the IA he must be indifferent between $\alpha L+(1-\alpha)L$ and $\alpha L'+(1-\alpha)L$ for all $1>\alpha>0$ (where note that we only added $(1-\alpha)L$ on both sides of the indifference relation). This result is graphically illustrated in the following figure, where the decision maker is indifferent between L and L', and therefore he must also be indifferent between L and any linear combination of L and L', i.e., graphically represented by the line connecting lotteries L and L'.

Figure 6.11

Alternatively, note that if a decision maker is indifferent between lotteries L and L', then using the IA we obtain that he must be indifferent between $\tfrac12L'+\tfrac12L$ and $\tfrac12L+\tfrac12L$. This is graphically represented in the following figure, where the individual is indifferent between lotteries L and L' (so they both lie on the same indifference curve), and therefore the IA implies that the compound lottery $\tfrac12L'+\tfrac12L$ should also lie on the same indifference curve, which graphically implies that indifference curves must be straight. Note that if, in contrast, indifference curves are curved, as in the next figure, the compound lottery $\tfrac12L'+\tfrac12L$ does not lie on the same indifference curve as lotteries L and L', and hence the decision maker is not indifferent between such compound lottery $\tfrac12L'+\tfrac12L$ and the simple lotteries L and L'.

Figure 6.12


2. Indifference curves must be parallel. If a decision maker is indifferent between two lotteries L and L', then applying the IA he must be indifferent between $\tfrac13L+\tfrac23L''$ and $\tfrac13L'+\tfrac23L''$. In the figure below, this implies that, starting from two lotteries L and L' over which the decision maker is indifferent (and which therefore lie on the same indifference curve), the linear combination of each of these two lotteries with a third lottery L'' should also lie on the same indifference curve. If these two compound lotteries $\tfrac13L+\tfrac23L''$ and $\tfrac13L'+\tfrac23L''$ lie on different indifference curves, as they do in the figure below, then the IA is violated.
6


Figure 6.13
Violations of the IA
Despite its intuitive appeal, many individuals violate the IA in their choices among uncertain outcomes. Let us next present some examples.
Allais paradox. Consider a lottery over three possible monetary outcomes: a first prize of 2.5 million dollars, a second prize of half a million dollars, and a third prize of zero dollars. The decision maker is initially asked to choose between lotteries $L_1$ and $L_1'$, where
$$L_1=(0,1,0)\quad\text{and}\quad L_1'=\left(\tfrac{10}{100},\tfrac{89}{100},\tfrac{1}{100}\right)$$
and he/she is then asked to select one lottery between the following two:
$$L_2=\left(0,\tfrac{11}{100},\tfrac{89}{100}\right)\quad\text{and}\quad L_2'=\left(\tfrac{10}{100},0,\tfrac{90}{100}\right)$$
Interestingly, more than 50% of the students confronted with these two choices express preferring lottery $L_1$ to $L_1'$, but preferring $L_2'$ to $L_2$. (This result has been recurrently observed in different countries, and among subjects with different backgrounds.) Let us next show why this preference relation violates the IA. If the decision maker's preferences over lotteries satisfied all previous assumptions (and hence could be represented with an expected utility function), the fact that $L_1$ is strictly preferred to $L_1'$ implies that

6
Indeed, in the figure the decision maker is indifferent between lotteries L and L', but is not indifferent between $\tfrac13L+\tfrac23L''$ and $\tfrac13L'+\tfrac23L''$, violating the IA.

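$$u_5>\tfrac{10}{100}u_{25}+\tfrac{89}{100}u_5+\tfrac{1}{100}u_0$$
By the IA, we can add $\tfrac{89}{100}u_0-\tfrac{89}{100}u_5$ on both sides, and we obtain
$$u_5+\tfrac{89}{100}u_0-\tfrac{89}{100}u_5>\tfrac{10}{100}u_{25}+\tfrac{89}{100}u_5+\tfrac{1}{100}u_0+\tfrac{89}{100}u_0-\tfrac{89}{100}u_5$$
and simplifying
$$\tfrac{11}{100}u_5+\tfrac{89}{100}u_0>\tfrac{10}{100}u_{25}+\tfrac{90}{100}u_0\quad\Longleftrightarrow\quad L_2\succ L_2'$$
where $u_{25}$, $u_5$ and $u_0$ denote the utility of the first, second and third prize, respectively. Hence, if the decision maker prefers $L_1$ to $L_1'$, the IA implies that he must prefer $L_2$ to $L_2'$. The dissonance between theoretical predictions and people's actual choices over lotteries has produced several reactions.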
1. Approximation to rationality. One reaction to the Allais paradox considers that people might violate the IA the first time (or the first few times) they are confronted with choices among different lotteries. However, they are capable of adapting, and we shouldn't expect subjects to still violate the IA after a sufficient period of time.
2. Little economic significance. Another reaction to the Allais paradox says that the lotteries presented to subjects involve probabilities that are close to zero and one, which rarely occur in real economic settings.
3. Regret theory. Some subjects justify their choice of lottery $L_1$ over $L_1'$ saying that they did not want to regret passing up a sure win of half a million! These justifications led to the development of regret theory in the context of choice under uncertainty.
7

4. Use of weaker assumptions. Finally, another reaction to the Allais paradox is to give up the IA in favor of a weaker assumption, such as the betweenness axiom (as discussed in the review session).
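The inconsistency in the Allais paradox can be checked numerically: for any utility numbers, the expected utility gap between the first pair of lotteries equals the gap between the second pair, so an expected utility maximizer cannot rank the two pairs in opposite ways. The sketch below draws random utilities to illustrate this; the helper name `eu` and the random draws are illustrative assumptions.

```python
import random

# Outcomes are ordered as (2.5 million, 0.5 million, 0).
L1, L1_prime = (0.0, 1.0, 0.0), (0.10, 0.89, 0.01)
L2, L2_prime = (0.0, 0.11, 0.89), (0.10, 0.0, 0.90)


def eu(lottery, u):
    """Expected utility of a lottery for utility numbers u = (u25, u5, u0)."""
    return sum(p * u_n for p, u_n in zip(lottery, u))


random.seed(0)
consistent = True
for _ in range(1000):
    # Hypothetical utilities with u25 >= u5 >= u0.
    u = sorted((random.random() for _ in range(3)), reverse=True)
    gap_first_pair = eu(L1, u) - eu(L1_prime, u)
    gap_second_pair = eu(L2, u) - eu(L2_prime, u)
    if abs(gap_first_pair - gap_second_pair) > 1e-9:
        consistent = False

print(consistent)
```

Both gaps reduce to 0.11*u5 - 0.10*u25 - 0.01*u0, which is why the sign of the first comparison pins down the sign of the second.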
Machina's paradox. Consider a decision maker with the following preference over certain outcomes: he prefers a trip to Venice (Italy) to watching a movie about Venice, and he prefers a movie about Venice to staying at home without watching the movie. Let us now consider the following two lotteries over the above three outcomes.
$$L_1=\left(\tfrac{99}{100},\tfrac{1}{100},0\right)\quad\text{and}\quad L_2=\left(\tfrac{99}{100},0,\tfrac{1}{100}\right)$$
Intuitively, the first lottery involves a 99% probability of winning a trip to Venice and a 1% probability of winning the movie about Venice. The second lottery still maintains the same 99% probability of winning a trip to Venice but shifts the 1% probability towards the outcome in which the individual does not watch the movie about Venice. One interesting feature of the IA is that, from the previous preferences over certain outcomes, we can infer this decision maker's preference over the above two lotteries. Denote by T the trip to Venice, M the movie about Venice and H staying at home (without the movie). Since this decision maker prefers M to H, the IA implies that mixing each of these two outcomes with T (which receives probability 99/100) preserves the ranking,
$$\tfrac{99}{100}T+\tfrac{1}{100}M\succ\tfrac{99}{100}T+\tfrac{1}{100}H,\quad\text{that is,}\quad L_1\succ L_2$$

7
One of the exercises in your homework assignment explores a decision maker whose preferences over lotteries
reflect regret.

Hence, a decision maker whose preference over lotteries satisfies the IA should prefer the first to the second lottery. Interestingly, many subjects in experimental settings prefer $L_2$ to $L_1$, violating as a consequence the IA. As in the Allais paradox, many subjects explain choosing $L_2$ over $L_1$ because of the disappointment they would experience in the case of losing the trip to Venice and having to watch a movie about it instead.
The above two examples present situations in which subjects' actual behavior is inconsistent with the IA. Can we still rely on the IA as a sensible assumption about individuals' preferences among lotteries? A way to answer this question is by asking what would happen to individuals whose behavior violates the IA. In short, they would be weeded out of the market because they would be open to accepting so-called "Dutch books", leading them to a sure loss of money.
Example of Dutch Books: Consider an individual who prefers lottery L to L' and lottery L to L'', i.e., $L\succ L'$ and $L\succ L''$. However, in violation of the IA, $\alpha L'+(1-\alpha)L''\succ L$ for some $\alpha\in(0,1)$. So if we present the individual with the chance to trade lottery L for a compound lottery of L' with probability $\alpha$ and lottery L'' with probability $(1-\alpha)$, $\alpha L'+(1-\alpha)L''$, for a small fee (x dollars), he would accept the trade. But as soon as the first stage of the compound lottery is over the individual will have either L' or L''. Since he prefers lottery L to both of these lotteries L' and L'', we could present him with the chance to trade his lottery for lottery L for a small fee (y dollars) and he will accept the trade. Thus after both trades he will have paid two small fees (x+y dollars) and ended up exactly where he began. We could start the cycle again, extracting more money from this individual. Hence, individuals who systematically violate the IA in their choices among risky lotteries would be weeded out of the marketplace.

Money lotteries
In the following sections we restrict our attention to lotteries over monetary outcomes, i.e., $C=\mathbb{R}$. Since the monetary payoff is a continuous variable, $x\in\mathbb{R}$, this allows us to describe money lotteries by means of a cumulative distribution function (cdf)
$$F(x)=\text{prob}\{y\le x\}\quad\text{for all } x\in\mathbb{R}$$
That is, F(x) represents the probability that the realized payoff y is less than or equal to x. The following figure illustrates an example of a money lottery that assigns the same probability to every possible payoff and it can therefore be represented with a uniform cdf F(x)=x.


Figure 6.14
whereas the next figure depicts a money lottery that assigns a larger probability to the initial values
(approximately before $40) than to the last values (beyond $60).

Figure 6.15
The above examples consider continuous probability distributions. The decision maker can nonetheless
face a money lottery that is distributed according to a discrete probability distribution, as the following
figure illustrates.

Figure 6.16
$$F(x)=\begin{cases}0 & \text{if } x<1\\ \tfrac14 & \text{if } x\in[1,4)\\ \tfrac34 & \text{if } x\in[4,6)\\ 1 & \text{if } x\ge 6\end{cases}$$


In addition, if there is a density function f(x) associated with the cdf F(x), then
$$F(x)=\int_{-\infty}^{x}f(t)\,dt$$


The following figures illustrate the density function f(x) for the above continuous and discrete cdfs.

Figure 6.17

Figure 6.18
In the context of money lotteries, we can represent compound lotteries as follows. If the list of cdfs $F_1(x),F_2(x),\dots,F_K(x)$ represents K simple money lotteries, each occurring with probability $\alpha_1,\alpha_2,\dots,\alpha_K$, then the compound lottery can be represented as
$$F(x)=\sum_{k=1}^{K}\alpha_kF_k(x)$$
which intuitively represents the probability-weighted average of the K simple money lotteries.
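The discrete cdf and the compound money lottery discussed above can be sketched in code. This is only an illustration: the helper names `step_cdf` and `compound_cdf` are hypothetical, the first lottery uses the payoffs 1, 4 and 6 with probabilities 1/4, 1/2 and 1/4 implied by the step cdf in the text, and the second lottery is an invented example.

```python
from bisect import bisect_right


def step_cdf(payoffs, probs):
    """cdf of a discrete money lottery paying payoffs[i] with probability probs[i].
    The payoffs must be listed in increasing order."""
    cumulative = []
    total = 0.0
    for p in probs:
        total += p
        cumulative.append(total)

    def F(x):
        count = bisect_right(payoffs, x)   # number of payoffs <= x
        return cumulative[count - 1] if count > 0 else 0.0

    return F


def compound_cdf(cdfs, alphas):
    """F(x) = sum_k alpha_k * F_k(x)."""
    return lambda x: sum(a * F_k(x) for a, F_k in zip(alphas, cdfs))


F1 = step_cdf([1, 4, 6], [0.25, 0.5, 0.25])   # discrete lottery from the text
F2 = step_cdf([2, 5], [0.5, 0.5])             # hypothetical second lottery
F = compound_cdf([F1, F2], [0.5, 0.5])        # hypothetical equal weights

for x in (0.5, 1, 4, 6):
    print(x, F1(x), F(x))
```

Because each F_k is a cdf and the weights are non-negative and sum to one, the mixture F is itself a cdf.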

For simplicity, we thereafter consider that money lotteries are distributed over non-negative amounts of
money.
8
We can now express the expected utility that the decision maker obtains from playing a particular money lottery as follows
$$U(F)=\int u(x)f(x)\,dx,\quad\text{or}\quad U(F)=\int u(x)\,dF(x)$$
where u(x) denotes the utility value that the decision maker obtains when the lottery gives him a monetary amount of x dollars.
9
Note that U(F) is the mathematical expectation of the values of u(x), over all possible values of x. Furthermore, note that this expression is linear in the probabilities. Indeed, in the case that the cdf is a discrete probability distribution (as that described in the previous examples), we can find the EU from playing such a money lottery by writing $p_1u(x_1)+p_2u(x_2)+\dots$
Importantly, this expected utility representation is sensitive not only to the mean of the distribution, but
also to the variance, and higher moments of the distribution of monetary payoffs. We show this property
of the expected utility function in the following example.
Example. Let us show that if $u(x)=\beta x^2+\gamma x$ then EU is determined by mean and variance alone. Indeed,
$$EU=\int u(x)\,dF(x)=\int[\beta x^2+\gamma x]\,dF(x)=\beta\int x^2\,dF(x)+\gamma\int x\,dF(x)$$
and on the other hand, we know that
$$Var(x)=E(x^2)-[E(x)]^2.\quad\text{Hence, } E(x^2)=Var(x)+[E(x)]^2.$$
Substituting $E(x^2)$ in the above expression (where $\int x^2\,dF(x)=E(x^2)$ and $\int x\,dF(x)=E(x)$),
$$EU=\beta Var(x)+\beta[E(x)]^2+\gamma E(x)$$
And as a consequence, the EU is determined by the mean and variance alone.
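As a numerical check of this example, the sketch below writes the quadratic Bernoulli utility as u(x) = beta*x^2 + gamma*x (the coefficient names and values are assumptions made for illustration) and compares two hypothetical discrete lotteries that share the same mean and variance but have different shapes.

```python
from fractions import Fraction as Fr


def mean(lottery):
    return sum(p * x for x, p in lottery)


def variance(lottery):
    m = mean(lottery)
    return sum(p * (x - m) ** 2 for x, p in lottery)


def expected_utility(lottery, u):
    return sum(p * u(x) for x, p in lottery)


beta, gamma = Fr(-1, 10), Fr(1)          # hypothetical coefficients


def u(x):
    return beta * x ** 2 + gamma * x     # quadratic Bernoulli utility


# Each lottery is a list of (payoff, probability) pairs.
X = [(Fr(1), Fr(1, 2)), (Fr(3), Fr(1, 2))]      # symmetric lottery
Y = [(Fr(0), Fr(1, 5)), (Fr(5, 2), Fr(4, 5))]   # asymmetric lottery

for name, lottery in (("X", X), ("Y", Y)):
    m, var = mean(lottery), variance(lottery)
    formula = beta * var + beta * m ** 2 + gamma * m
    print(name, m, var, expected_utility(lottery, u), formula)
```

Both lotteries have mean 2 and variance 1, so under the quadratic utility they deliver the same expected utility even though their distributions differ.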

Importantly, note that we imposed a relatively limited set of assumptions on the decision maker's Bernoulli utility function, u(x) (the utility he obtains from a particular outcome or monetary outcome): that it is increasing in money and continuous. We must however impose an additional assumption: that u(x) is bounded. Otherwise, we can end up in relatively absurd situations, such as that illustrated in the so-called St. Petersburg-Menger paradox, which we present next.
St. Petersburg-Menger paradox. Consider an unbounded Bernoulli utility function, u(x). We can then find an amount of money $x_m$ such that $u(x_m)>2^m$, for any integer m. In particular, consider a lottery in

8
Note that this just implies a normalization that shifts all possible payoffs in the lottery, e.g., adding a constant P to all of them, where P represents the absolute value of the most negative payoff that the decision maker can obtain in the lottery. This normalization guarantees that all resulting payoffs are zero or positive.
9
Note that if there is a density function f(x) associated to the cdf F(x), we can use either of the above expressions.
Otherwise we can only use the latter. In addition, note that we did not write the intervals of integration. We
thereafter assume that the integral is defined over the full range of possible realizations of x, i.e., from zero to
infinity.
17

which we toss a coin repeatedly until tails comes up. We then give a monetary amount \(x_m\) if tails comes
up at the m-th toss. Since the probability that tails first comes up at the m-th toss is
\[ \frac{1}{2} \times \frac{1}{2} \times \cdots \times \frac{1}{2} \ (m \text{ times}) = \frac{1}{2^m}, \]
then the expected utility from playing this lottery is
\[ \sum_{m=1}^{\infty} \frac{1}{2^m}\, u(x_m). \]
But because \(u(x_m) > 2^m\), we then have that
\[ \sum_{m=1}^{\infty} \frac{1}{2^m}\, u(x_m) > \sum_{m=1}^{\infty} \frac{1}{2^m}\, 2^m = +\infty, \]
so the expected utility on the left-hand side is infinitely large. Hence, this individual would be willing to pay infinite
amounts of money to be able to play this lottery. It might therefore seem reasonable to assume that the
Bernoulli utility function, u(x), is bounded.
10
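To see the divergence at work, the following sketch computes partial sums of the expected utility in the paradox. As an assumption made only for illustration, it sets \(u(x_m) = 2^m + 1\), which is one of many values satisfying \(u(x_m) > 2^m\).

```python
def partial_expected_utility(num_tosses):
    """Expected utility truncated after num_tosses coin tosses.

    Assumes u(x_m) = 2**m + 1, an illustrative value above 2**m.
    """
    return sum((2**m + 1) / 2**m for m in range(1, num_tosses + 1))


# Each additional toss adds more than one unit of utility,
# so the partial sums grow without bound.
for n in (10, 100, 1000):
    print(n, partial_expected_utility(n))
```

Every term in the sum exceeds one, so the truncated expected utility is always larger than the number of tosses considered.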


Measuring risk preferences
In this section we evaluate the preferences towards risky lotteries of different individuals. We start
with the definition of risk aversion. In particular, we say that an individual's utility exhibits risk aversion if,
for any money lottery F(.),
\[ \int u(x)\,dF(x) \le u\left( \int x\,dF(x) \right). \]
If this relationship holds with equality, we say that this individual is risk neutral. If, instead, the sign of the
inequality is reversed, we say that he is a risk lover. Intuitively, the above expression says that the utility
that this individual obtains from receiving the expected value of the lottery with certainty is higher than the
expected utility from playing such lottery. The following figure illustrates this intuition. In particular, it
considers a lottery with two possible outcomes, $1 and $3, which are equally likely. First, we
depict the utility from outcomes $1 and $3, u(1) and u(3) respectively, by mapping $1 and $3 into the
utility function. We then find the expected value of the lottery ($2) and map it into the utility function,
obtaining u(2). We can then connect u(1) and u(3) with a straight line. The midpoint of this line represents
\(\frac{1}{2}u(1) + \frac{1}{2}u(3)\),
which is the expected utility of playing the lottery. Clearly, the utility from the expected value of the
lottery, u(2), is higher than the expected utility from playing the lottery,
\(\frac{1}{2}u(1) + \frac{1}{2}u(3)\). We can therefore
conclude that this individual's utility exhibits risk aversion.

10
Alternatively, we can avoid situations such as that described in the St. Petersburg-Menger paradox by checking
that the distribution function we are using does not allow for this type of paradox. (You can read more about this
paradox, and potential solutions, in NS pp. 203-205. I strongly recommend that you read the Query on page 205 and
check your answer at the back of the book.)


Figure 6.19
Note that the above definition of risk aversion is a direct application of Jensen's inequality. This suggests
a strong connection between the concavity of an individual's utility function and his degree of risk
aversion. We return to this topic below.
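The comparison between u(2) and the expected utility of the lottery can be reproduced numerically. The sketch below assumes the concave utility function u(x) = sqrt(x), chosen only for illustration, and the equally likely outcomes $1 and $3 from the running example.

```python
import math

outcomes = [1.0, 3.0]
probs = [0.5, 0.5]


def u(x):
    """Illustrative concave utility function (an assumption, not from the notes)."""
    return math.sqrt(x)


expected_value = sum(p * x for p, x in zip(probs, outcomes))
utility_of_expected_value = u(expected_value)
expected_utility = sum(p * u(x) for p, x in zip(probs, outcomes))

# Jensen's inequality for a concave u: u(E[x]) >= E[u(x)].
print(utility_of_expected_value, expected_utility)
```

With a concave utility function the first number is larger than the second, which is exactly the definition of risk aversion stated above.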
The next figure depicts an individual who is risk neutral. In this case the utility from the expected value of
the lottery, u(2), coincides with the expected utility of the lottery,
\(\frac{1}{2}u(1) + \frac{1}{2}u(3)\). Thus, this individual
exhibits risk neutrality.

Figure 6.20
Finally, if an individual is a risk lover, as the following figure illustrates, the utility from the expected
value of the lottery, u(2), is lower than the expected utility from playing the lottery,
\(\frac{1}{2}u(1) + \frac{1}{2}u(3)\).


Figure 6.21
An alternative way to measure risk aversion is by finding the certainty equivalent of a lottery. In
particular, the certainty equivalent of money lottery F(.) for an individual with utility function u(.), c(F,u),
is the amount of money for which the individual is indifferent between playing lottery F(.) and accepting
a certain (sure) amount c(F,u). More compactly, the certainty equivalent can be expressed as
\[ u(c(F,u)) = \int u(x)\,dF(x), \]
where the right-hand side denotes the expected utility that this individual obtains from playing lottery
F(.). The following figure illustrates the certainty equivalent for a risk-averse individual. Specifically,
note that c(F,u) is the amount of money that makes the individual reach the same utility as if he played the
lottery. Because he is risk averse, the certainty equivalent c(F,u) is below the expected value of the
lottery, $2. In particular, c(F,u) can be found by applying the above definition to this particular lottery,
\(u(c(F,u)) = \frac{1}{2}u(1) + \frac{1}{2}u(3)\).
11
The difference between the expected value of the lottery and the certainty
equivalent that a risk-averse individual would be willing to accept in order to avoid the risky lottery is
also used as a measure of how risk averse a given individual is. In particular, this measure is
commonly referred to as the risk premium of a lottery, RP, and is defined as RP = EV - c(F,u). The figure
below depicts the risk premium that this individual is willing to give up in order to avoid the lottery, i.e., he
is willing to accept the certainty equivalent, which lies below the expected value of the lottery.

11
If, for instance, \(u(x) = \sqrt{x}\), then
\(\sqrt{c(F,u)} = \frac{1}{2}\sqrt{1} + \frac{1}{2}\sqrt{3}\), or \(\sqrt{c(F,u)} \approx 1.37\). Squaring both sides of the equality,
we obtain a certainty equivalent of \(c(F,u) \approx 1.87\).


Figure 6.22
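The certainty equivalent and risk premium of the running example can be computed in a few lines. This is a sketch under the assumption that u(x) = sqrt(x), so that the utility function can be inverted by squaring.

```python
import math

# Lottery from the running example: $1 and $3, equally likely.
expected_value = 0.5 * 1.0 + 0.5 * 3.0
expected_utility = 0.5 * math.sqrt(1.0) + 0.5 * math.sqrt(3.0)

# Invert u(x) = sqrt(x): the certainty equivalent solves sqrt(c) = EU.
certainty_equivalent = expected_utility**2

# Risk premium: RP = EV - c(F, u).
risk_premium = expected_value - certainty_equivalent

print(round(certainty_equivalent, 3), round(risk_premium, 3))
```

The certainty equivalent is roughly 1.87, below the expected value of 2, so the risk premium is positive (about 0.13), as expected for a risk-averse individual.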
In the case that we examine a risk lover, the previous rankings are reversed, as the following figure
illustrates. Indeed, the certainty equivalent c(F,u) lies above the expected value of the lottery, $2. As a
consequence, the risk premium for this individual, RP = EV - c(F,u), is actually negative since EV < c(F,u).
Intuitively, this implies that this individual would have to be given an amount of money above the
expected value of the lottery in order to convince him to stop playing the lottery (he loves risk!).

Figure 6.23
Finally, note that in the case of a risk-neutral decision maker, the certainty equivalent c(F,u) coincides
with the expected value of the lottery, $2, and therefore the risk premium is zero, RP = EV - c(F,u) = 0,
reflecting that this individual is not willing to give up money to avoid playing the lottery (as risk-averse
individuals are), nor must we compensate him in order to stop playing the lottery (as for risk lovers).

Figure 6.24
All the measures of riskiness discussed above focus on money. The next measure, in contrast, focuses on
probabilities. The probability premium \(\pi(x,\varepsilon,u)\) measures the excess in winning probability over fair odds (equally
likely outcomes) that makes the individual indifferent between the certain outcome x and a gamble
between the two outcomes \(x+\varepsilon\) and \(x-\varepsilon\). That is,
\[ u(x) = \left[ \tfrac{1}{2} + \pi(x,\varepsilon,u) \right] u(x+\varepsilon) + \left[ \tfrac{1}{2} - \pi(x,\varepsilon,u) \right] u(x-\varepsilon). \]
Intuitively, this implies that a risk-averse individual, in order to be attracted to play a particular lottery,
must be given better than fair odds since otherwise he would not accept the risk associated to the lottery.
This intuition is graphically represented in the following figure for our on-going example of the lottery
between monetary amounts $1 and $3. First, we map $1 and $3 into the utility function, obtaining u(1)
and u(3). We then find the expected value of the lottery, $2, and map it into the utility function, u(2).
Then, the extra winning probability \(\pi\) (extra probability of outcome $3 occurring) that the risk-averse
decision maker needs in order to make the EU of the lottery rise until it coincides with the utility from
the expected value of the lottery, u(2), solves
\[ u(2) = \left( \tfrac{1}{2} + \pi \right) u(3) + \left( \tfrac{1}{2} - \pi \right) u(1). \]
Graphically, note that the EU from the lottery with fair odds (equally likely outcomes) lies below that of the
lottery in which the winning probability has been increased by \(\pi\).
12


12
The probability premium of a lottery is referred to as the insurance premium by NS. For examples on the
probability premium, see the exercises on the handout of the review session and example 7.2 in NS (pp. 209-210).


Figure 6.25
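Solving the indifference condition for the probability premium gives \(\pi = [u(x) - \frac{1}{2}u(x+\varepsilon) - \frac{1}{2}u(x-\varepsilon)] / [u(x+\varepsilon) - u(x-\varepsilon)]\). The sketch below applies this formula to the lottery between $1 and $3 (so x = 2 and the spread is 1), assuming u(x) = sqrt(x) for the risk-averse case and a linear utility for the risk-neutral case. The function name `probability_premium` is a hypothetical helper introduced here.

```python
import math


def probability_premium(u, x, eps):
    """Extra winning probability that makes the gamble (x + eps, x - eps)
    as good as receiving x for sure."""
    up, down = u(x + eps), u(x - eps)
    return (u(x) - 0.5 * (up + down)) / (up - down)


pi_risk_averse = probability_premium(math.sqrt, 2.0, 1.0)
pi_risk_neutral = probability_premium(lambda x: x, 2.0, 1.0)

# Check the indifference condition u(2) = (1/2 + pi) u(3) + (1/2 - pi) u(1).
lottery_utility = (0.5 + pi_risk_averse) * math.sqrt(3.0) + (0.5 - pi_risk_averse) * math.sqrt(1.0)

print(pi_risk_averse, pi_risk_neutral, lottery_utility)
```

The risk-averse individual requires a strictly positive premium, while the risk-neutral individual requires none.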
Given the above different measures of risk aversion, we next establish a connection between them. In
particular, the following properties are equivalent:
1. The decision maker is risk averse.
2. The utility function is concave, \(u''(x) \le 0\).
3. The certainty equivalent is lower than the expected value of the lottery, i.e., \(c(F,u) \le \int x\,dF(x)\),
where \(\int x\,dF(x)\) denotes the expected value of lottery F(x).
4. The risk premium is positive, i.e., \(RP = EV - c(F,u) \ge 0\), which simply implies \(EV \ge c(F,u)\).
5. The probability premium is positive for all x and \(\varepsilon\), i.e., \(\pi(x,\varepsilon,u) \ge 0\).

Example. The following example examines an individual's decision about how much insurance to
acquire. Consider a risk-averse individual with utility function u(.) and wealth w. In the case that no loss
occurs (which happens with probability \(1-\pi\)), his utility is given by \(u(w - \alpha q)\), where \(\alpha q\) denotes the amount
of money he spends on \(\alpha\) units of insurance at a price of q per unit. If a loss occurs (which happens with
probability \(\pi\)), his utility is now given by \(u(w - \alpha q - D + \alpha)\), where D denotes the dollar amount of the loss he
suffers and \(\alpha\) represents that the insurance company gives him $1 per unit of insurance bought. Hence,
this decision maker's expected utility maximization problem becomes
\[ \max_{\alpha \ge 0} \ (1-\pi)\,u(w - \alpha q) + \pi\,u(w - \alpha q - D + \alpha), \]
where \(\alpha\) is this individual's only choice variable (the number of units of insurance he buys). Taking first-order
conditions with respect to \(\alpha\) we obtain
\[ -q(1-\pi)\,u'(w - \alpha^* q) + \pi(1-q)\,u'(w - \alpha^* q - D + \alpha^*) \le 0. \]
When the FOC is satisfied with equality (at an interior optimum) we have
\[ -q(1-\pi)\,u'(w - \alpha^* q) = \pi(q-1)\,u'(w - \alpha^* q - D + \alpha^*) \]
\[ (-q + q\pi)\,u'(w - \alpha^* q) = (\pi q - \pi)\,u'(w - \alpha^* q - D + \alpha^*) \]
Now, assuming that \(q = \pi\) (and hence the insurance is actuarially fair, since the price of every unit of
insurance is equal to the probability of a loss), then
\[ (-\pi + \pi^2)\,u'(w - \alpha^* \pi) = (\pi^2 - \pi)\,u'(w - \alpha^* \pi - D + \alpha^*) \]
\[ u'(w - \alpha^* \pi) = u'(w - \alpha^* \pi - D + \alpha^*) \]
and since u'(.) is strictly decreasing (by strict concavity), we obtain
\[ w - \alpha^* \pi = w - \alpha^* \pi - D + \alpha^* \]
and, rearranging, \(\alpha^* = D\). Thus, if insurance is actuarially fair, the decision maker insures completely, i.e.,
he acquires a number of units of insurance exactly equal to the loss he can suffer.
13
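The full-insurance result can be verified numerically. The following sketch assumes a logarithmic utility function and hypothetical parameter values (wealth 10, loss 5, loss probability 0.2); the helper names and the search interval from zero to twice the loss are also assumptions. Because the objective is concave in the number of units of insurance, a simple ternary search finds the maximizer.

```python
import math


def expected_utility(alpha, wealth, loss, loss_prob, price):
    """Expected utility of buying alpha units of insurance at a given unit price,
    assuming u(x) = ln(x)."""
    no_loss = math.log(wealth - alpha * price)
    with_loss = math.log(wealth - alpha * price - loss + alpha)
    return (1 - loss_prob) * no_loss + loss_prob * with_loss


def optimal_coverage(wealth, loss, loss_prob, price, iterations=200):
    """Ternary search over alpha in [0, 2 * loss]; valid because EU is concave in alpha."""
    low, high = 0.0, 2.0 * loss
    for _ in range(iterations):
        left = low + (high - low) / 3
        right = high - (high - low) / 3
        if expected_utility(left, wealth, loss, loss_prob, price) < expected_utility(
            right, wealth, loss, loss_prob, price
        ):
            low = left
        else:
            high = right
    return (low + high) / 2


# Actuarially fair price (q equal to the loss probability): full insurance.
alpha_fair = optimal_coverage(wealth=10.0, loss=5.0, loss_prob=0.2, price=0.2)

# Price above the loss probability: coverage falls below the loss in this example.
alpha_unfair = optimal_coverage(wealth=10.0, loss=5.0, loss_prob=0.2, price=0.3)

print(alpha_fair, alpha_unfair)
```

With the fair price the optimal coverage equals the loss of 5, while in this particular parametrization a higher price leads the individual to buy less than full coverage.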


Arrow-Pratt coefficients of absolute and relative risk aversion. In this subsection we examine other forms
of measuring risk aversion. In particular, focusing on the connection between risk aversion and the
concavity of a decision maker's utility function, we next present the Arrow-Pratt coefficient of absolute
risk aversion, \(r_A(x)\),
\[ r_A(x) = -\frac{u''(x)}{u'(x)}. \]
Clearly, the greater the curvature of the utility function, u''(x), the larger the coefficient \(r_A(x)\). Despite
being interested in the curvature of the utility function as described by u''(x), we cannot simply use
u''(x) to measure an individual's risk aversion. In particular, such a measure is not invariant to positive
linear transformations of the utility function. For instance, if \(v(x) = \beta u(x)\) with \(\beta > 0\), then \(v''(x) = \beta u''(x)\) (which is not
invariant to the linear transformation), whereas the coefficient \(r_A(x)\) is invariant since
\[ r_A(x) = -\frac{\beta u''(x)}{\beta u'(x)} = -\frac{u''(x)}{u'(x)}. \]
Example. Take the utility function \(u(x) = -e^{-ax}\), where a>0. Then, the Arrow-Pratt coefficient of absolute
risk aversion, \(r_A(x)\), is
\[ r_A(x) = \frac{a^2 e^{-ax}}{a e^{-ax}} = a \quad \text{for all } x, \]
where \(r_A(x)\) is constant in the individual's wealth level, x. The literature refers to this utility function as
the Constant Absolute Risk Aversion (CARA) utility function.


13
If insurance is not actuarially fair, i.e., \(q > \pi\), then a different result follows. See homework assignment.

If, instead, coefficient \(r_A(x)\) decreases as we increase wealth x, we say that such utility function satisfies
decreasing absolute risk aversion, i.e., \(\frac{\partial r_A(x)}{\partial x} < 0\). Intuitively, this implies that wealthier individuals are
willing to bear more risk than poorer individuals. Note, however, that this is not due to different utility
functions between these two groups of people, but rather, because the same utility function is evaluated at
higher/lower wealth levels.
The following coefficient, in contrast, measures risk aversion relative to the wealth level at which it is evaluated. In
particular, the coefficient of relative risk aversion can be expressed as follows.
\[ r_R(x) = -x\,\frac{u''(x)}{u'(x)}, \quad \text{that is,} \quad r_R(x) = x\, r_A(x). \]
Hence,
\[ \frac{\partial r_R(x)}{\partial x} = r_A(x) + x\,\frac{\partial r_A(x)}{\partial x}. \]
The utility function for which the coefficient of relative risk aversion is constant is commonly
referred to as the Constant Relative Risk Aversion (CRRA) utility function, \(U(x) = x^b\). (It is easy to check that
\(r_R(x) = 1 - b\) for this utility function, since \(U'(x) = b x^{b-1}\) and \(U''(x) = b(b-1)x^{b-2}\).)
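Both coefficients can be approximated with finite differences, which is a convenient check on the CARA and CRRA examples. The sketch below is illustrative only: the parameter values a = 0.5 and b = 0.3, the step size, and the helper names are assumptions. For \(U(x) = x^b\) the computed relative coefficient equals \(-x U''(x)/U'(x) = 1 - b\).

```python
import math


def absolute_risk_aversion(u, x, h=1e-3):
    """Arrow-Pratt coefficient -u''(x)/u'(x), using central finite differences."""
    first = (u(x + h) - u(x - h)) / (2 * h)
    second = (u(x + h) - 2 * u(x) + u(x - h)) / h**2
    return -second / first


def relative_risk_aversion(u, x, h=1e-3):
    """Coefficient of relative risk aversion, x * r_A(x)."""
    return x * absolute_risk_aversion(u, x, h)


a, b = 0.5, 0.3  # hypothetical parameter values


def cara(x):
    return -math.exp(-a * x)


def crra(x):
    return x**b


for wealth in (1.0, 5.0):
    print(wealth, absolute_risk_aversion(cara, wealth), relative_risk_aversion(crra, wealth))
```

The absolute coefficient of the CARA function stays at a for every wealth level, and the relative coefficient of the power function stays at 1 - b.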
Finally, let us now establish equivalences between the above measures of risk aversion. For two utility
functions u1 and u2, where u2 is a concave transformation of u1 (i.e., u2 is more concave than u1), we
have that:
1. The coefficient of absolute risk aversion for the more concave utility function is higher, i.e.,
\(r_A(x,u_2) \ge r_A(x,u_1)\).
2. There exists an increasing concave function \(\psi(.)\) such that \(u_2(x) = \psi(u_1(x))\) at all x. That is, u2(.)
is a concave transformation of u1(.), i.e., u2(.) is a more concave function than u1(.).
3. The certainty equivalent that the decision maker with utility function u2(.) is willing to accept in
order to avoid the lottery is lower than that of the decision maker with utility function u1(.), i.e.,
\(c(F,u_2) \le c(F,u_1)\) for any lottery F(.).
4. The probability premium that the individual with utility function u2(.) needs in order to accept
playing lottery F(.) is higher than that of the individual with utility function u1(.), i.e.,
\(\pi(x,\varepsilon,u_2) \ge \pi(x,\varepsilon,u_1)\).
5. Whenever u2(.) finds a lottery F(.) at least as good as a riskless outcome \(\bar{x}\), then u1(.) also
finds such lottery F(.) at least as good as \(\bar{x}\). That is,
\[ \int u_2(x)\,dF(x) \ge u_2(\bar{x}) \quad \text{implies} \quad \int u_1(x)\,dF(x) \ge u_1(\bar{x}). \]
The following figure summarizes some of the above results. First, note that u1(.) and u2(.) are evaluated
at the same wealth level x. Then, we map outcomes $1 and $3 into u1(.) and into u2(.), separately.
Connecting u1(1) and u1(3) we obtain the expected utility of playing the lottery for individual 1, and
similarly for individual 2. Note that EU1>EU2. We then find the certainty equivalent for each individual,
i.e., the amount of money that provides each individual with the same utility as what he expects to obtain
if actually playing the lottery. As the figure depicts, the certainty equivalent that individual 2 is willing to
accept in order to avoid playing the lottery is lower than that of individual 1, reflecting that individual 2 is
more risk averse than individual 1.


Figure 6.26

Comparison of payoff functions
In previous sections we analyzed risk preferences for a given lottery and described different measures of
riskiness. In this section we examine different distributions of payoffs, and how some might be more
attractive than others. Specifically, we will use two main evaluation criteria:
1. If a lottery F(.) yields unambiguously higher returns than G(.), the first lottery seems more
attractive than the second lottery. We will explore this idea by the definition of first-order
stochastic dominance (FOSD). This concept is connected with the mean of the lottery: if F(.) FOSD G(.),
then F(.) has a mean at least as high as that of G(.), although the converse does not
necessarily hold.

2. If, however, two lotteries, F(.) and G(.), have the same mean, but lottery F(.) is unambiguously
less risky than G(.), e.g., it is distributed over a smaller support, then we can anticipate that lottery
F(.) would be preferred to lottery G(.). In this case, the concept developed to rank lotteries is
related with the variance of a lottery, and we will explore it in the definition of second-order
stochastic dominance (SOSD).
FOSD. The distribution of monetary payoffs in lottery F(.) first-order stochastically dominates (FOSD)
the distribution of monetary payoffs in lottery G(.) if and only if
\(1-F(x) \ge 1-G(x)\) for every payoff x, or alternatively \(F(x) \le G(x)\).

First, note that for a given lottery F(.), 1-F(x) intuitively represents the probability of obtaining prizes
above x. Hence, the above condition for FOSD implies that, at any given outcome x, the probability of
obtaining prizes above x is higher with lottery F(.) than with lottery G(.). This intuition is graphically
represented in the following figure, where for a given outcome \(\bar{x}\), \(F(\bar{x}) \le G(\bar{x})\), or alternatively
\(1-F(\bar{x}) \ge 1-G(\bar{x})\). Graphically, this implies that the cdf of lottery F(.) lies below that of G(.). Indeed,
the probability weight that lottery F(.) assigns to high monetary outcomes is larger than that of lottery
G(.).

Figure 6.27
Let us now examine an example with lotteries over discrete outcomes (the above examples of lotteries
F(.) and G(.) considered continuous cdfs). In the following figure, we consider lottery G(.), which assigns
probability 1/2 to the monetary outcome $1 and probability 1/2 to outcome $4. Lottery F(.), in contrast, shifts the
probability weight lying in outcome $1 towards outcomes $2 and $3 equally (so that each receives half of that
weight, i.e., an overall probability of 1/4), while the probability weight in outcome $4 is shifted, unaltered, to outcome $5.

Figure 6.28
The following figure illustrates these two lotteries and provides a visual comparison of their cdfs. In
particular, we can easily check that the cdf of F(.) lies below that of lottery G(.), and therefore F(.) FOSD G(.).


Figure 6.29
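The FOSD comparison of the two discrete lotteries can also be checked mechanically. The sketch below represents each lottery as a dictionary from outcomes to probabilities and compares the cdfs at every outcome in either support, which is sufficient for step functions. The function names are hypothetical helpers.

```python
def cdf(lottery, x):
    """Cumulative probability of obtaining a payoff lower than or equal to x."""
    return sum(prob for outcome, prob in lottery.items() if outcome <= x)


def first_order_dominates(f, g, tol=1e-12):
    """True if F(x) <= G(x) at every outcome in either support."""
    points = sorted(set(f) | set(g))
    return all(cdf(f, x) <= cdf(g, x) + tol for x in points)


# Lotteries from the discrete example above.
lottery_g = {1: 0.5, 4: 0.5}
lottery_f = {2: 0.25, 3: 0.25, 5: 0.5}

print(first_order_dominates(lottery_f, lottery_g))  # F FOSD G
print(first_order_dominates(lottery_g, lottery_f))  # G does not FOSD F
```

The first check holds and the second fails, matching the graphical comparison of the two cdfs.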
The previous example with discrete probability distributions suggests that an upward probabilistic shift
such as the one described from lottery G(.) to lottery F(.) produces a new cdf that FOSD the original
cdf. Generally, if we take any outcome x, and add an amount z, where z is distributed according to a cdf
\(H_x(.)\), with \(H_x(0)=0\), then, for any nondecreasing utility function,
\[ u(x) \le \int u(x+z)\,dH_x(z) \]
since the distribution generates a final return of
at least x with probability one. (Recall example). Hence,
\[ \int u(x)\,dF(x) = \int \left( \int u(x+z)\,dH_x(z) \right) dG(x) \ge \int u(x)\,dG(x). \]
Intuitively, note that the above condition simply states that lottery F(.) generates a higher expected utility
than lottery G(.), where F(.) is simply the upward probabilistic shift that function \(H_x(.)\) produces in the
original cdf G(.).
SOSD. We now focus on the dispersion of monetary outcomes in a lottery, as opposed to the higher/lower
returns that FOSD analyzes. To focus on the dispersion of the lottery only, we assume that lotteries F(.)
and G(.) both have the same mean (i.e., the same expected outcome). We then say that lottery F(.) SOSD
G(.) if, for every nondecreasing concave utility function u(x), \(u: \mathbb{R}_+ \to \mathbb{R}\) (mapping certain monetary outcomes into
utility levels), we have that
\[ \int u(x)\,dF(x) \ge \int u(x)\,dG(x), \quad \text{i.e.,} \quad EU_F \ge EU_G. \]
That is, lottery F(.) SOSD G(.) if the former generates a larger expected utility than the latter, where both
of them yield the same mean.
Example 1: Mean Preserving Spread. Let us first consider lottery F(.), which assigns an equal probability
to outcomes $2 and $3 occurring. Then we spread the probability weight of these two outcomes over the
probability of these and other outcomes, obtaining lottery G(.). In particular, we spread the probability weight of $2 (1/2) over
outcomes $1 and $2 equally (1/4 each). Similarly, we spread the probability weight of $3 (1/2) over
outcomes $3 and $4 equally (1/4 each). First, note that the expected value of both lotteries coincides, being
5/2 for both F(.) and G(.). Hence, the mean is preserved across lotteries. However, lottery G(.) spreads the
probability weight of lottery F(.) over a larger set of outcomes.

Figure 6.30
We can conclude that lottery F(.) SOSD G(.) since they both have the same mean, but the former
concentrates its probability weight over a smaller support, i.e., F(.) is less dispersed than G(.). Note,
however, that neither lottery FOSD the other. Indeed, as the following figure indicates, F(.) is not above
G(.) for all x, or below G(.) for all x.


Figure 6.31
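The mean preserving spread example can be verified with a short computation. The sketch below checks that both lotteries share the same mean, that neither first-order dominates the other, and that F(.) yields a higher expected utility for two illustrative concave utility functions (square root and logarithm, both assumptions chosen for this check). Testing two functions does not prove SOSD, which requires the inequality for every nondecreasing concave function, but it is consistent with it.

```python
import math

# Lotteries from the mean preserving spread example.
lottery_f = {2: 0.5, 3: 0.5}
lottery_g = {1: 0.25, 2: 0.25, 3: 0.25, 4: 0.25}


def mean(lottery):
    return sum(x * p for x, p in lottery.items())


def expected_utility(lottery, u):
    return sum(p * u(x) for x, p in lottery.items())


def cdf(lottery, x):
    return sum(p for outcome, p in lottery.items() if outcome <= x)


def first_order_dominates(f, g, tol=1e-12):
    points = sorted(set(f) | set(g))
    return all(cdf(f, x) <= cdf(g, x) + tol for x in points)


print(mean(lottery_f), mean(lottery_g))
print(first_order_dominates(lottery_f, lottery_g), first_order_dominates(lottery_g, lottery_f))
for u in (math.sqrt, math.log):
    print(expected_utility(lottery_f, u) >= expected_utility(lottery_g, u))
```

Both means equal 2.5, both FOSD checks fail, and the less dispersed lottery F(.) gives the higher expected utility under each concave function.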
Example 2: Elementary Increase in Risk. We say that a cdf G(.) is an Elementary Increase in Risk (EIR)
of another cdf F(.) if G(.) takes all the probability weight of an interval \([x', x'']\) and transfers it to the end
points of this interval, \(x'\) and \(x''\), such that the mean is preserved. Hence, both cdfs F(.) and G(.) maintain
the same mean but G(.) concentrates more probability at the end points of the interval \([x', x'']\) than the
original distribution F(.). The following figure illustrates an EIR.


Figure 6.32
Note that an EIR is a mean preserving spread (MPS), but the converse is not necessarily true.
14
In the
above example, F(.) and G(.) share the same mean but F(.) is less dispersed than G(.). As a consequence,
lottery F(.) SOSD G(.).
15

For exercises related to FOSD and SOSD, see MWG 6.D.2 and 6.D.3.

State-dependent utility
In all our previous discussions the decision maker only cared about the payoff arising from every outcome
of the lottery. In this section, we assume that the decision maker cares not only about his monetary
outcomes, but also about the state of nature that causes every outcome. Intuitively, this implies that, for a
given outcome x, the decision maker might experience a different utility if such outcome originates from
state of nature 1 occurring than from state of nature 2. In the following subsections, we will first discuss
how we can describe uncertainty using states of nature paralleling outcomes from our previous
discussions. Secondly, we will analyze how these state-dependent preferences can be used to obtain an
extended expected utility representation.
Using states of nature to represent utility
Let us now assume that each of the possible monetary payoffs in a lottery is generated by an underlying
cause (an underlying state of nature). Let's consider two different examples:
1. The monetary payoff of an insurance policy is generated by a car accident. In this case, state of
nature={car accident, no car accident}.
2. The monetary payoff of a corporate stock is generated by the state of the economy. In particular,
state of nature={economic growth, economic depression}.

14
This would be the case if the MPS shifts some probability weight towards points away from interval \([x', x'']\),
satisfying the definition of a MPS but not that of an EIR.
15
Note that, similarly to the above example, we cannot determine whether lottery F(.) FOSD G(.) since neither of
them lies above or below the other for all monetary outcomes x.

Generally, we denote every state of nature as \(s \in S\), where S is a finite set containing all states of nature.
Every state s has a well-defined probability of occurrence \(\pi_s \ge 0\). Finally, a random variable is a function
\(g: S \to \mathbb{R}_+\) that maps states of nature in S into monetary payoffs. Let us extend our previous examples.
1. Car accident: the random variable assigns a monetary value to the state of nature "car accident"
(e.g., -$1,000, with probability \(\pi_{acc}\)) and to the state of nature "no accident" (e.g., -$100, where
the driver only pays his insurance premium, with probability \(1-\pi_{acc}\)).

Probability        State of Nature     Monetary Payoff
\(\pi_{acc}\)      car accident        Deductible and premium (-)
\(\pi_{NOacc}\)    no car accident     Premium (-)

2. Corporate stock: the random variable assigns a monetary value to the state of nature "economic
growth" (e.g., $250 in increased value of the shares, with probability \(\pi_{growth}\)), and to the state of
nature "economic depression" (e.g., -$125 in decreased value of the shares, with probability
\(1-\pi_{growth}\)).

Probability            State of Nature        Monetary Payoff
\(\pi_{growth}\)       economic growth        Dividends, higher price of shares
\(\pi_{depression}\)   economic depression    No dividends, loss if we sell shares

Every random variable g(.) can be used to represent the monetary lottery F(.). In particular,
\[ F(x) = \sum_{\{s:\, g(s) \le x\}} \pi_s, \]
where \(\{s: g(s) \le x\}\) represents all those states of nature s for which the monetary payoff arising from them,
g(s), is lower than or equal to a particular monetary payoff x.
16
Hence, the random variable g(.) generates a monetary
payoff for every state of nature \(s \in S\), and since set S is finite, we can represent this list of monetary
payoffs as
\[ (x_1, x_2, \ldots, x_S) \in \mathbb{R}_+^S, \]
where \(x_s\) is the monetary payoff corresponding to state of nature s. The following figure provides an
example of a random variable g(.). Specifically, outcomes are ordered from lower to higher monetary
payoffs, i.e., \(x_1 \le x_2 \le x_3 \le x_4\). In addition, outcome 1 can occur with probability 50%, outcomes 2 and 4 can
occur with probability 25% each, while outcome 3 receives zero probability.

16
For an example, think of stocks: F($200) represents the cumulative probability of obtaining a payoff equal to or lower
than $200 from the stock.


Figure 6.33
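The construction of F(x) from the states of nature can be sketched in code. The probabilities below are those of the four-outcome example (50%, 25%, 0% and 25%), while the payoff values 1 to 4 and the state labels are hypothetical placeholders, since the notes only state that payoffs are ordered from lower to higher.

```python
# Each state maps to (monetary payoff, probability).
# Payoff values are illustrative; probabilities follow the example in the text.
states = {
    "s1": (1.0, 0.50),
    "s2": (2.0, 0.25),
    "s3": (3.0, 0.00),
    "s4": (4.0, 0.25),
}


def lottery_cdf(states, x):
    """F(x): total probability of the states whose payoff g(s) is at most x."""
    return sum(prob for payoff, prob in states.values() if payoff <= x)


for payoff, _ in states.values():
    print(payoff, lottery_cdf(states, payoff))
```

Note that the cdf only records cumulative probabilities by payoff level, so two different states generating the same payoff would be indistinguishable in F(x).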
We can hence express the cumulative probability of every outcome as follows
\[ F(x_1) = \pi_1 = \tfrac{1}{2}, \text{ since there are no states with } g(s) < x_1 \]
\[ F(x_2) = \pi_1 + \pi_2 = \tfrac{1}{2} + \tfrac{1}{4} = \tfrac{3}{4} \]
\[ F(x_3) = \pi_1 + \pi_2 + \pi_3 = \tfrac{1}{2} + \tfrac{1}{4} + 0 = \tfrac{3}{4} \]
\[ F(x_4) = \pi_1 + \pi_2 + \pi_3 + \pi_4 = 1 \]
This example reveals one disadvantage of using F(x). In particular, for a given outcome x, we cannot keep
track of which different states of nature generated x.
Extended expected utility representation
We can now express a preference relation \(\succsim\) over the list of monetary payoffs \((x_1, x_2, \ldots, x_S) \in \mathbb{R}_+^S\). It is
important to note the similarity of this setting with that in consumer theory. Indeed, in that context we
described preferences over bundles, while now we describe preferences over lists of monetary payoffs.
Since the list of monetary payoffs \((x_1, x_2, \ldots, x_S)\) specifies one payoff for each state of nature (one for
each contingency), this list is usually referred to as contingent commodities.
We now expand our previous EU representation to this state-dependent utility. In particular, we say that a
preference relation \(\succsim\) has an Extended EU representation if, for every state of nature s, there is a utility
function \(u_s: \mathbb{R}_+ \to \mathbb{R}\) (mapping the monetary outcome in state s, \(x_s\), into a utility value \(u_s(x_s)\) in \(\mathbb{R}\)), such
that for any two lists of monetary outcomes
\[ (x_1, x_2, \ldots, x_S) \in \mathbb{R}_+^S \quad \text{and} \quad (x_1', x_2', \ldots, x_S') \in \mathbb{R}_+^S, \]
\[ (x_1, x_2, \ldots, x_S) \succsim (x_1', x_2', \ldots, x_S') \quad \text{iff} \quad \sum_s \pi_s u_s(x_s) \ge \sum_s \pi_s u_s(x_s'). \]
Interestingly, note that the main difference with previous sections is that now the Bernoulli utility
function is state-dependent, \(u_s(.)\), whereas in the previous sections it was state-independent, u(.).

Let us next provide a graphical representation of a decision maker's state-dependent preferences. First, we
depict the monetary outcomes arising in states of nature 1 and 2 (x1 and x2, respectively) on the horizontal
and vertical axes. In addition, note that at the "certainty line" the decision maker receives the same
monetary amount regardless of the state of nature, i.e., x1=x2 (45-degree line). Second, all the (x1,x2)
pairs on a given indifference curve must satisfy
\[ \pi_1 u_1(x_1) + \pi_2 u_2(x_2) = \bar{U}. \]
Third, note that the upper contour set of an indifference curve that passes through point \((\bar{x}_1, \bar{x}_2)\) is
\[ \{(x_1, x_2): \pi_1 u_1(x_1) + \pi_2 u_2(x_2) \ge \pi_1 u_1(\bar{x}_1) + \pi_2 u_2(\bar{x}_2)\}, \]
or, more generally, \(\sum_s \pi_s u_s(x_s) \ge \sum_s \pi_s u_s(\bar{x}_s)\).

Figure 6.34
Furthermore, note that movements along a given indifference curve do not change the decision
maker's utility level. Hence, totally differentiating (as we did in order to find the MRS in consumer
theory), we obtain
\[ \pi_1 \frac{\partial u_1(\bar{x}_1)}{\partial x_1}\,dx_1 + \pi_2 \frac{\partial u_2(\bar{x}_2)}{\partial x_2}\,dx_2 = 0 \]
and rearranging,
\[ \frac{dx_2}{dx_1} = -\frac{\pi_1 u_1'(\bar{x}_1)}{\pi_2 u_2'(\bar{x}_2)}, \]
which represents the slope of the indifference curve, evaluated at point \((\bar{x}_1, \bar{x}_2)\). Finally, note that if
the Bernoulli utility function were state-independent, i.e., \(u_1(\cdot) = u_2(\cdot) = \ldots = u_S(\cdot)\), then the slope of the
indifference curve at any point of the certainty line (where x1=x2) would be
\[ \frac{dx_2}{dx_1} = -\frac{\pi_1}{\pi_2}. \]
Example. Insurance with state-dependent utility. Starting from an initial situation without insurance, the
pair of monetary outcomes for a particular individual with wealth level w is (w, w-D), where D represents
the loss he suffers from a certain accident. After purchasing insurance, the decision maker gets a payment
z1 in state 1 (no accident) and a payment z2 in state 2 (accident). That is, the pair of monetary payoffs
becomes (w+z1, w-D+z2). Moreover, if the policy is actuarially fair, its expected payoff is zero,
\[ \pi_1 z_1 + \pi_2 z_2 = 0. \]

Figure 6.35
First, note that insurance allows this individual to consume along any point of his budget line. In addition,
note that the slope of the budget line is \(-\pi_1/\pi_2\), which coincides with the slope of the decision maker's
indifference curve at the certainty line x1=x2 when his preferences are state-independent. Therefore, in
this case the indifference curve is tangent to the budget line at the certainty line. This implies that this
individual would insure completely since his consumption level is completely unaffected by the
possibility of suffering an accident: his consumption with/without accident coincides. In the case that the
decision maker's preferences are state-dependent, however, indifference curves are not tangent to the
budget line at the certainty line. Instead, the decision maker prefers a point such as (x1,x2) to the certain
outcome \((\bar{x}, \bar{x})\). That is, at \((\bar{x}, \bar{x})\) he prefers higher payoffs in state 1 than in state 2, since
\(u_1'(\bar{x}) > u_2'(\bar{x})\). Otherwise, he would prefer higher payoffs in state 2 than in state 1. In addition, note
that \(u_1'(\bar{x}) > u_2'(\bar{x})\) implies that \(u_1'(\bar{x})/u_2'(\bar{x}) > 1\), and
\[ -\frac{\pi_1 u_1'(\bar{x})}{\pi_2 u_2'(\bar{x})} < -\frac{\pi_1}{\pi_2}, \]
so that the indifference curve is steeper than the budget line at the certainty line.

Figure 6.36
(For more about the state-dependent approach, see NS pp. 216-220, and for extra practice see Examples
7.3 and 7.4 for the CARA and CRRA, respectively, and the Portfolio problem in pp. 214-215.)
Let us now allow for the possibility that the monetary payoff under state s, \(x_s\), is not a certain amount of
money, but a random amount with distribution function \(F_s(.)\). Hence, the monetary outcomes arising
from the S states of nature can be described as a lottery for every state of nature, \(L = (F_1, F_2, \ldots, F_S)\). Given
this extended definition of lotteries to account for state-dependence, we can then rewrite the IA as
the extended IA, as follows.

The preference relation \(\succsim\) over lotteries satisfies the extended IA if, for all L, L', and L'' and \(\alpha \in (0,1)\), we
have
\[ L \succsim L' \quad \text{if and only if} \quad \alpha L + (1-\alpha)L'' \succsim \alpha L' + (1-\alpha)L''. \]
Hence, this extended IA is a mere extension of the standard IA to the case of extended lotteries
\(L = (F_1, F_2, \ldots, F_S)\). We can now express the Extended EU Theorem.
Extended EU Theorem. Suppose that a decision maker's preferences over lotteries satisfy continuity and
the extended IA. Then, we can assign a utility function \(u_s(.)\) for money in state s, such that for any two
lotteries
\[ L = (F_1, F_2, F_3, \ldots, F_S) \quad \text{and} \quad L' = (F_1', F_2', F_3', \ldots, F_S') \]
we have
\[ L \succsim L' \quad \text{iff} \quad \sum_s \left( \int u_s(x_s)\,dF_s(x_s) \right) \ge \sum_s \left( \int u_s(x_s)\,dF_s'(x_s) \right). \]
The last inequality simply says that the decision maker prefers extended lottery L to L' if the expected
utility from L is higher than that from L'. In particular, note that the expected utility from extended lottery
L can be expressed as above, since for a given state of nature s, we have payoffs distributed according to
the cdf \(F_s(.)\), so that \(\int u_s(x_s)\,dF_s(x_s) = E[u_s(x_s)]\).
17


Subjective probability theory
Suggested additional reading: Varian, chapter 11, pp. 190-194
In previous sections we assumed that the probabilities of every possible outcome were objective and
observable by the decision maker. This might not be the case in certain settings where, instead, people
hold probabilistic beliefs about the likelihood of a certain event. We will refer to these probabilistic
beliefs as individuals' subjective probabilities. Because they are subjective, a natural question is whether
we can infer a decision maker's subjective probabilities from his/her actual behavior. The answer to this
question is yes. For instance, consider a decision maker who prefers
a gamble giving him $1 in state 1 and $0 in state 2 to another gamble in which he gets $0 in state 1 and $1
in state 2. If the value of money is the same across states, we can infer that this decision maker is
assigning a higher subjective probability to state 1 than to state 2.
In this section we want to extend the EU theorem we described in previous parts of this chapter for
objective probabilities to the case of subjective probabilities. Before stating this extension of the EU
theorem we must, however, start with some definitions.

17
You can find more exercises on lotteries in Rubinstein, Lecture 8.

Let us start by defining an individual's preferences over two lotteries in state $s$. In particular, we define state-$s$ preferences $\succsim_s$ on state-$s$ lotteries $F_s(\cdot)$ by saying that an individual prefers lottery $F_s(\cdot)$ to $F'_s(\cdot)$, both of them in state $s$, if and only if the expected utility from lottery $F_s(\cdot)$ is at least as large as that from lottery $F'_s(\cdot)$, or

$$F_s(\cdot) \succsim_s F'_s(\cdot) \iff \int u_s(x_s)\,dF_s(x_s) \ge \int u_s(x_s)\,dF'_s(x_s)$$

Hence, the state preferences $(\succsim_1,\succsim_2,\ldots,\succsim_S)$ on state lotteries $(F_1,F_2,\ldots,F_S)$ are state uniform if $\succsim_s\,=\,\succsim_{s'}$ for any two states $s$ and $s'$.
That is, if for any two states, $s$ and $s'$, the ranking of lotteries coincides: $F(\cdot)\succsim_s F'(\cdot)$ if and only if $F(\cdot)\succsim_{s'}F'(\cdot)$, for any two lotteries $F(\cdot)$ and $F'(\cdot)$.
Alternative interpretation: for any two states, $s$ and $s'$, the ranking of expected utilities from playing two lotteries $F(\cdot)$ and $F'(\cdot)$ coincides; for example, $\int u_s(x)\,dF(x)\ge\int u_s(x)\,dF'(x)$ if and only if $\int u_{s'}(x)\,dF(x)\ge\int u_{s'}(x)\,dF'(x)$.
With our above definition of state uniform preferences, the Bernoulli utility functions in states $s$ and $s'$, $u_s(\cdot)$ and $u_{s'}(\cdot)$, can differ only up to an increasing linear transformation. That is, there is a utility function $u(\cdot)$ such that

$$u_s(\cdot) = \pi_s\,u(\cdot) + \beta_s$$

for every state $s$, where $\pi_s>0$ and $\beta_s$ is a constant (and similarly for $u_{s'}(\cdot)$, so that the ranking between the expected utilities in states $s$ and $s'$ is unaffected).
We can now state the extension of the EU theorem to subjective probabilities. Suppose that a preference relation over lotteries satisfies continuity and the extended IA. Suppose, in addition, that the derived state preferences over lotteries are state uniform. Then, there are subjective probabilities $(\pi_1,\pi_2,\ldots,\pi_S)\gg 0$ and a utility function $u(\cdot)$ on certain amounts of money, such that for any two lists of monetary amounts $(x_1,x_2,\ldots,x_S)$ and $(x'_1,x'_2,\ldots,x'_S)$,

$$(x_1,x_2,\ldots,x_S) \succsim (x'_1,x'_2,\ldots,x'_S) \iff \sum_s \pi_s u(x_s) \ge \sum_s \pi_s u(x'_s)$$

Intuitively, the last expression says that a decision maker prefers the first list of monetary outcomes to the second if the subjective expected utility he obtains from the first is larger than that he obtains from the second. This result is interesting since it allows us to generalize much of our previous methodology to the case of subjective probabilities (i.e., beliefs). Nonetheless, the predictions of the subjective EU theorem are not necessarily satisfied in certain experimental settings. The following example (the so-called Ellsberg paradox) presents a behavioral pattern that violates the subjective EU theorem, paralleling the anomalies we described after presenting the IA, namely, the Allais paradox and Machina's paradox.
Ellsberg paradox: Consider the following game. Your instructor shows up in the classroom with an urn containing 300 balls. He/she informs you that, among the 300 balls, 100 are red, but the remaining 200 can be either blue or green. Then he/she presents you with the following two gambles, asking you to choose only one of them. (In all gambles you first insert your hand in the urn without being able to see which ball you extract from the urn.)
Gamble A: $1000 if the ball you extract is red.
Gamble B: $1000 if the ball you extract is blue.
Confronted with these two gambles, many subjects select gamble A, revealing that the objective probability that the ball is red (1/3) is higher than the subjective probability they assign to the ball being blue. Now your instructor offers you the choice between two other gambles.
Gamble C: $1000 if the ball you extract is not red.
Gamble D: $1000 if the ball you extract is not blue.
In this case, many subjects choose gamble C, revealing that the objective probability that the ball is not red (2/3) is higher than the subjective probability they assign to the ball not being blue. However, this choice contradicts the previous choice (of gamble A over B) if the decision maker uses the subjective EU theory described above. First, from gamble A being preferred to B we can infer that
1/3>p(Blue),
where p(Blue) denotes the subjective probability that the individual assigns to the ball being blue. Second, from gamble C being preferred to D, we infer that
2/3>p(Not Blue),
where p(Not Blue) represents the subjective probability that this individual assigns to the ball not being blue. However, from standard probability, we know that p(Blue)=1-p(Not Blue). Hence, the second inequality implies p(Blue)=1-p(Not Blue)>1-2/3=1/3, which contradicts the first inequality, 1/3>p(Blue).
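The contradiction can also be checked by brute force. The sketch below is only an illustration (the function name and the grid of candidate beliefs are assumptions, not part of the notes): it searches for a subjective probability p(Blue) that rationalizes both observed choices, A over B and C over D, and finds none.

```python
from fractions import Fraction

def consistent_blue_beliefs(grid_size=300):
    """Return every candidate p(Blue) on a grid that rationalizes both choices."""
    third = Fraction(1, 3)
    two_thirds = Fraction(2, 3)
    feasible = []
    for k in range(grid_size + 1):
        p_blue = Fraction(k, grid_size)          # candidate subjective belief
        prefers_a = third > p_blue               # gamble A chosen over B
        prefers_c = two_thirds > 1 - p_blue      # gamble C chosen over D
        if prefers_a and prefers_c:
            feasible.append(p_blue)
    return feasible

# No single belief is compatible with both choices.
print(consistent_blue_beliefs())  # prints []
```

Exact fractions are used so that the boundary case p(Blue)=1/3 is handled without rounding error.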

Chapter 7: Monopoly
In this chapter we examine the output and pricing decisions of a firm that holds market power when selling its product to a group of customers, i.e., a monopolist. In addition, we evaluate the welfare effects of monopolies, and describe price discrimination practices often used by monopolists to further increase their profits beyond those the monopolist obtains when setting a single price to all customers, usually referred to as uniform pricing.
Profit maximizing output under monopoly
Let us start by considering a general demand function x(p), which is continuous and strictly decreasing in p, i.e., $x'(p)<0$. As in our discussion of perfectly competitive markets, we assume that there is a price $\bar{p}<\infty$ such that $x(p)=0$ for all $p>\bar{p}$, as described in the figure below. Intuitively, this guarantees that if market prices are sufficiently high, no consumers buy positive amounts of the good.

Figure 7.1
In addition, consider a general cost function c(q) which is increasing and convex in q. (Recall that convexity guarantees that the first derivative is itself increasing in q, and hence marginal costs are increasing in q.) Hence, the monopolist's profit maximization problem can be expressed as a choice of a price p such that
$$\max_{p}\; p\,x(p) - c(x(p))$$
Alternatively, taking the inverse demand function $p(q)=x^{-1}(q)$, we can rewrite the monopolist's problem as follows, where the choice variable of the monopolist is now its output,
$$\max_{q\ge 0}\; p(q)\,q - c(q)$$

Differentiating with respect to q, we obtain
$$p'(q^m)\,q^m + p(q^m) - c'(q^m) \le 0$$
Rearranging,
$$\underbrace{p(q^m) + p'(q^m)\,q^m}_{MR=\frac{d[p(q)q]}{dq}} \le \underbrace{c'(q^m)}_{MC} \quad \text{with equality if } q^m>0$$

In addition, we assume that $p(0)>c'(0)$, which graphically implies that the vertical intercept of the demand curve lies above that of the marginal cost curve, as depicted in the figure. This guarantees that an interior optimum exists, and hence our above first-order condition holds with equality. That is,
$$p(q^m) + p'(q^m)\,q^m = c'(q^m)$$
This implies that the monopolist increases production until the point where the marginal revenue from
selling an additional unit equals the marginal cost from producing such unit. The following figure
illustrates this result.

Figure 7.2
The figure also shows that market demand lies above the marginal revenue curve. Indeed, since $p'(q^m)<0$ (i.e., market prices decrease in total output), marginal revenue must be weakly lower than demand, p(q).
1

This result calls for a more elaborate explanation of the economic intuition behind MR(q).
$$MR = p(q^m) + p'(q^m)\,q^m$$
Intuitively, an increase in output produces two effects on the monopolist's total revenue. First, it produces a direct (positive) effect, since the monopolist can sell a larger number of units at a price $p(q^m)$. Second, however, it produces an indirect (negative) effect. In particular, the increased production implies a movement along the demand curve, lowering prices. Hence, the monopolist must reduce the price it charges not only on the new additional unit it produces (the marginal unit) but also on the units it was already producing (the so-called inframarginal units, i.e., those below the margin). Specifically, this indirect effect is embodied in the second term of the MR(q) expression, which shows the reduction in market price due to the increased production, $p'(q^m)$, times all the units the monopolist produces, $q^m$. Since $p'(q^m)<0$, the indirect effect of larger output on total revenue is negative.

1
In addition, note that for q=0 both MR and p(q) coincide. This is also illustrated in the figure, where the vertical intercept of both curves coincides. You can easily check that $MR(q)\le p(q)$ for the case of linear demand p(q)=a-bq, where MR(q)=a-2bq.

Furthermore, since $p'(q^m)<0$ in the above expression of MR(q), and MR equals MC at the optimum, we have that $p(q^m)>c'(q^m)$. In words, the price the monopolist sets is higher than its marginal cost of producing the last unit. In addition, since in perfectly competitive markets $p(q^*)=c'(q^*)$, we can then conclude that pm>p* and, given that the demand curve is negatively sloped, qm<q*. The above figure compares market prices under monopoly and perfectly competitive industries, i.e., pm>p*, and total output, qm<q*.
The above first-order conditions lead us to conclude that the monopolist increases production until the point in which marginal revenue and marginal costs coincide. In order to show that this is indeed a production level that maximizes (and not minimizes) monopoly profits, we must next check the second-order conditions associated with the above profit maximization problem. Taking the FOCs and differentiating with respect to q again, we obtain

$$\underbrace{p'(q) + p'(q) + p''(q)\,q}_{\frac{dMR}{dq}} - \underbrace{c''(q)}_{\frac{dMC}{dq}} \le 0$$

Or more compactly, this states that the slope of the marginal revenue must be weakly smaller than the
slope of the marginal cost function at the profit-maximizing output qm. This is indeed satisfied at the
optimum, as the following figure illustrates. In particular, the slope of the marginal cost curve is positive
while that of the marginal revenue curve is negative, which implies that the above SOC is satisfied. (Note
that this condition holds even if the marginal costs are constant in output, where the slope of the marginal
cost is zero). Alternatively, this condition can be understood as that the MR(q) curve crosses the MC(q)
curve from above.

Figure 7.3

Let us finally check if this property holds when the marginal cost curve is decreasing in output, as the following figure indicates. In this case, both the marginal cost curve and the marginal revenue curve are decreasing in q (i.e., both slopes are negative). Thus, the SOC imposes the restriction that, at the optimum, the marginal revenue curve must be steeper than the marginal cost curve (or, alternatively, that the MR curve crosses the MC curve from above).

Figure 7.4
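The monopolist's profit-maximizing condition we found in the above FOCs can be alternatively expressed in terms of the mark-up that the monopolist charges over marginal costs. Indeed, taking the MR function MR(q) we have $MR = p(q) + p'(q)\,q = p + \frac{\partial p}{\partial q}\,q$. Multiplying the second term by $\frac{p}{p}$,

$$MR = p + p\,\frac{\partial p}{\partial q}\frac{q}{p} = p + p\,\frac{1}{\varepsilon_d}$$

where $\varepsilon_d \equiv \frac{\partial q}{\partial p}\frac{p}{q}$ denotes the price-elasticity of demand. Since at the profit-maximizing output the monopolist sets MR(q)=MC(q), we can set the above expression of MR(q) equal to MC(q) to obtain
$$\frac{p - MC}{p} = -\frac{1}{\varepsilon_d}$$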
Intuitively, this index (the so-called Lerner index of market power) says that the price mark-up over
marginal cost that a monopolist can charge (as percentage of price) is a function of the elasticity of
demand. In particular, markets with more elastic demand curves have low price mark-ups, while markets
with more inelastic demands (for instance, because there are no close substitutes to the product sold by
the monopolist) imply substantial price mark-ups. The Lerner index can also be written as
$$p = \frac{MC}{1+\frac{1}{\varepsilon_d}}$$


Perloff uses two examples of different elasticities of demand. First, in the case of the heartburn medicine Prilosec OTC, price-elasticity of demand has been estimated at approximately -1.2. Using the Lerner index, this implies that p=5.88MC (an elasticity of exactly -1.2 would give p=6MC), or that the price that this drug company can charge is roughly six times its marginal cost of production.
2
The second example considers designer jeans, with a more elastic demand of -2. In such a case, the Lerner index shows that p=2MC, a lower price mark-up than in the previous example.
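As a quick numerical companion to these two examples, the following sketch evaluates the pricing rule p = MC/(1 + 1/elasticity) and the Lerner index -1/elasticity. The function names are hypothetical, and the elasticities are simply the rounded values quoted above.

```python
def price_to_mc_ratio(elasticity):
    """Return p/MC implied by the pricing rule p = MC / (1 + 1/elasticity)."""
    if elasticity >= -1:
        raise ValueError("the rule requires elastic demand (elasticity < -1)")
    return 1.0 / (1.0 + 1.0 / elasticity)

def lerner_index(elasticity):
    """Return the mark-up (p - MC)/p, which equals -1/elasticity."""
    return -1.0 / elasticity

for eps in (-1.2, -2.0):
    print(eps, round(price_to_mc_ratio(eps), 2), round(lerner_index(eps), 3))
```

Small changes in an elasticity close to -1 move the implied ratio a lot, which is why a rounded estimate such as -1.2 can produce a slightly different multiple than a more precise one.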
Special case 1: Monopoly facing a linear demand curve
Consider a monopolist facing a market with linear inverse demand function p(q)=a-bq, where b>0 implies that the inverse demand curve is negatively sloped. The monopolist's cost function is c(q)=cq, where c>0. In addition, we usually assume that a>c. Note that this assumption just guarantees an interior solution, since it is the application of condition $p(0)>c'(0)$ to the current case, i.e., p(0)=a-b0=a, and c(q)=cq so that $c'(q)=c$.
In this case, note that the objective function for the monopolist (its profit function) becomes
$$\pi(q) = (a-bq)\,q - cq$$
Taking FOCs we obtain
$$a - 2bq - c = 0$$
which implies a maximum since the SOC (-2b) is indeed negative, implying concavity of the profit function.
3
Solving for the optimal qm in the above FOC we obtain
$$q^m = \frac{a-c}{2b}$$
And inserting qm into the inverse demand function p(q), we obtain the monopoly price pm,
$$p^m = a - b\,\frac{a-c}{2b} = \frac{a+c}{2}$$
Finally, inserting pm and qm into the monopolist's profit function we can find the monopolist's profits at the optimum,
$$\pi^m = \underbrace{p^m q^m}_{\text{revenue}} - \underbrace{c\,q^m}_{\text{costs}} = \frac{(a-c)^2}{4b}$$
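The closed-form results for the linear case can be checked numerically. The sketch below is only an illustration: the parameter values a=10, b=1, c=2 and the function names are assumptions, and the grid search is a crude substitute for solving the FOC.

```python
def linear_monopoly(a, b, c):
    """Closed-form monopoly output, price and profit for p(q)=a-bq, c(q)=cq."""
    if not (a > c > 0 and b > 0):
        raise ValueError("need a > c > 0 and b > 0")
    q_m = (a - c) / (2 * b)
    p_m = (a + c) / 2
    profit_m = (a - c) ** 2 / (4 * b)
    return q_m, p_m, profit_m

def profit(q, a, b, c):
    """Profit function (a - bq)q - cq."""
    return (a - b * q) * q - c * q

a, b, c = 10.0, 1.0, 2.0              # hypothetical parameter values
q_m, p_m, profit_m = linear_monopoly(a, b, c)

# Grid search over outputs in [0, 10] as an independent check of the FOC.
grid = [i / 1000 for i in range(10001)]
q_best = max(grid, key=lambda q: profit(q, a, b, c))
print(q_m, p_m, profit_m, q_best)
```

With these values the closed forms give qm=4, pm=6 and profits of 16, and the grid search selects the same output.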
We can graphically represent our previous results in the following figure. Interestingly, note that for linear demand curves, the MR curve has double the slope of the inverse demand function, i.e., it crosses the horizontal axis at the midpoint between the origin and the horizontal intercept of the inverse demand curve.

2
Note that these data correspond to the elasticity of demand before Prilosec OTC lost its patent and could also be sold as a generic drug by retailers such as RiteAid, Safeway, etc. As a consequence, we can anticipate that demand for this drug is now probably more elastic (i.e., its elasticity is lower, further below -1), reducing as a consequence the price mark-up.
3
Note that this result is due to the assumption of negatively sloped demand (i.e., b>0). If, instead, demand was positively sloped (as in the case of Giffen goods) and b<0, then the output decision of the monopolist represented in the FOC would not guarantee an output level that maximizes the firm's profits.

Figure 7.5
In addition, note that our previous analysis can be easily extended to the case in which marginal costs are not constant in q, but instead increasing, as the following figure illustrates. Indeed, demand is still linear in output, but the cost function is convex in output, e.g., $c(q)=cq^2$, implying a marginal cost curve $c'(q)=2cq$, with a positive slope of 2c as indicated in the figure.

Figure 7.6

Special case 2: Constant elasticity demand function
Another case of interest where we can readily apply the Lerner index is that in which the monopolist faces a constant elasticity demand function $q(p)=Ap^{-b}$, where b represents the price-elasticity of demand in absolute value, i.e., $\varepsilon_{q(p)}=-b$. (Applying the definition of price-elasticity of demand it is easy to show that

$$\varepsilon_{q(p)} = \frac{\partial q(p)}{\partial p}\,\frac{p}{q(p)} = (-b)Ap^{-(b+1)}\,\frac{p}{Ap^{-b}} = -b\,p^{-(b+1)}\,p^{b+1} = -b.)$$

Hence, using the Lerner index, we can find the monopoly price, as follows
$$p^m = \frac{c}{1+\frac{1}{\varepsilon_{q(p)}}} = \frac{c}{1-\frac{1}{b}}$$
which is well defined (positive and finite) only when b>1.
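The two results for the constant elasticity case can be illustrated numerically. In the sketch below, the values A=100, b=2, c=3, the evaluation price of 5 and the function names are all assumptions chosen for illustration; the elasticity is approximated with a central finite difference rather than computed analytically.

```python
def demand(p, A, b):
    """Constant elasticity demand q(p) = A * p**(-b)."""
    return A * p ** (-b)

def point_elasticity(p, A, b, h=1e-6):
    """Approximate (dq/dp)*(p/q) with a central finite difference."""
    dq_dp = (demand(p + h, A, b) - demand(p - h, A, b)) / (2 * h)
    return dq_dp * p / demand(p, A, b)

def monopoly_price(c, b):
    """Monopoly price c / (1 - 1/b); requires b > 1."""
    if b <= 1:
        raise ValueError("need b > 1 for a finite, positive monopoly price")
    return c / (1 - 1 / b)

A, b, c = 100.0, 2.0, 3.0             # hypothetical parameter values
print(point_elasticity(5.0, A, b))    # approximately -b at any price
print(monopoly_price(c, b))
```

Evaluating `point_elasticity` at other prices should return approximately the same value, which is the defining property of this demand function.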


Welfare loss of monopoly
The following figure illustrates the welfare loss associated with monopoly. In particular, note that consumer surplus decreases by areas B and C (i.e., -B-C) when moving from a perfectly competitive market to a monopoly, while producer surplus changes by B-E. Therefore, area B simply represents a transfer from consumers to producers, whereas areas C and E illustrate a net loss for the economy, since they are not transferred to any other economic agent.


Figure 7.7
Thus, the shaded area C+E denotes the deadweight loss of monopoly. More formally, the deadweight loss is identified by the area below the inverse demand curve, p(q), and above the marginal cost curve, $c'(q)$, lying between the monopoly and the perfectly competitive outputs qm and q*. That is, the deadweight loss is the area represented by the integral
$$DWL = \int_{q^m}^{q^*} \left[p(s) - c'(s)\right] ds$$

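The integral above can be evaluated numerically for any demand and marginal cost curve. The sketch below assumes, purely for illustration, linear inverse demand p(q)=a-bq and constant marginal cost c with the hypothetical values a=10, b=1, c=2, for which the integral has the closed form (a-c)^2/(8b). The function name and the midpoint rule are choices made here, not part of the notes.

```python
def deadweight_loss(inverse_demand, marginal_cost, q_m, q_star, steps=100_000):
    """Midpoint-rule approximation of the integral of p(s) - c'(s) on [q_m, q_star]."""
    width = (q_star - q_m) / steps
    total = 0.0
    for i in range(steps):
        s = q_m + (i + 0.5) * width          # midpoint of the i-th slice
        total += (inverse_demand(s) - marginal_cost(s)) * width
    return total

a, b, c = 10.0, 1.0, 2.0                     # hypothetical parameter values
q_m = (a - c) / (2 * b)                      # monopoly output, MR = MC
q_star = (a - c) / b                         # competitive output, p = MC
dwl = deadweight_loss(lambda q: a - b * q, lambda q: c, q_m, q_star)
print(dwl, (a - c) ** 2 / (8 * b))
```

Both printed numbers should agree (up to floating-point error), since the midpoint rule is exact for a linear integrand.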

Interestingly, note that this expression decreases as demand (and/or supply) becomes more elastic.
4

The following two graphs show how monopoly profits and deadweight loss vary with demand elasticity.

Figure 7.8 (price-axis labels in the original figure: $90, $60, $50, $35, $10)

D2 is more elastic than D1. With the less elastic demand (D1), the monopolist charges a higher price ($50) than with the more elastic demand (D2), where it charges only $35. For both demands, the profit-maximizing quantity is where MR = MC, with MC1=MC2=MC.
Monopoly profit and social deadweight loss increase as demand becomes less elastic. The triangle in dark red in the following figure shows the deadweight loss due to the difference in elasticity between D1 and D2, and the monopoly profit is $750 (=(50-35)(50)) higher under D1 than under D2 (shown in black).


4
This result resembles that we found when analyzing the size of the deadweight loss associated to the imposition of
a sales tax under perfectly competitive markets.
Figure 7.9 (curve labels in the original figure: MC1=MC2, D2, MR1, MR2; horizontal axis Q, with Qm marked)




If, for instance, demand becomes infinitely elastic, $p'(q)=0$. In that case, the inverse demand curve becomes totally flat, as the following figure represents. Since $p'(q)=0$, then $MR(q)=p(q)+0\cdot q=p(q)$. This explains why the figure does not show any marginal revenue curve: it essentially coincides with the inverse demand curve (see recitation #9 Ex. 1).

Figure 7.10
The monopolist's profit-maximizing rule, MR=MC, becomes therefore p(q)=MC(q), which exactly coincides with that of a firm operating in a perfectly competitive industry. As a consequence, qm=q* in the figure, and thus deadweight loss is measured as the area between the inverse demand curve and the marginal cost curve between two coincident output levels, qm and q*, leading to a null deadweight loss of monopoly.
$$DWL = \int_{q^m}^{q^*}\left[p(s)-c'(s)\right]ds = \int_{q^m}^{q^m}\left[p(s)-c'(s)\right]ds = 0 \quad \text{since } q^m=q^*$$
$$p(q)=p \text{ for all } q \;\Rightarrow\; TR = p\,q \;\Rightarrow\; MR = p$$
Demand and MR curves totally overlap.

Fig. Monopoly profits and deadweight loss vary with elasticity of demand (Carlton and Perloff, page 98, adjusted to fit the explanation here). Labels in the original figure: D1, Qm=50, Qc=100, Q.

Welfare losses and elasticity
Let us next examine more closely the connection between the welfare loss of a monopoly (DWL) and elasticity. For simplicity, let us consider a monopolist with constant marginal and average costs, c, who faces a market demand with constant elasticity, i.e., $q(p)=p^{e}$, where e denotes the price-elasticity of demand and e<-1.
5
In the competitive equilibrium, price equals marginal cost, i.e., $p_c=c$, whereas under monopoly, price $p_m$ can be found from the Lerner index (or by rearranging it into the so-called inverse elasticity pricing rule, IEPR), as follows
$$P_m = \frac{c}{1+\frac{1}{e}}$$
Given that demand is $q(p)=p^{e}$, the consumer surplus associated with any price $P_0$ can be computed as the area above $P_0$ and below that demand curve,
$$CS = \int_{P_0}^{\infty} Q(P)\,dP = \int_{P_0}^{\infty} P^{e}\,dP = \left.\frac{P^{e+1}}{e+1}\right|_{P_0}^{\infty} = -\frac{P_0^{e+1}}{e+1}$$
Therefore, under perfect competition, where $P_0=c$, consumer surplus becomes
$$CS_c = -\frac{c^{e+1}}{e+1}$$
whereas consumer surplus under monopoly, where $P_m=c/(1+1/e)$, becomes
$$CS_m = -\frac{\left(\frac{c}{1+\frac{1}{e}}\right)^{e+1}}{e+1}$$

5
Recall that, at the optimum, the monopolist operates in the elastic portion of the demand curve, i.e., price-elasticity is smaller than -1. When the monopolist faces a linear demand curve (where elasticity is different at different points along the curve) we don't need to impose any further assumptions. In this case, however, price-elasticity is the same at all points of the demand curve. We must hence impose the condition that e<-1.

Taking the ratio of these two surpluses, CSm/CSc, we obtain a measure of the difference in consumer surplus between monopoly and perfectly competitive markets. In particular, a ratio close to zero implies that the difference between CSm and CSc is extremely large, while a ratio close to one implies that consumers' welfare is approximately the same in both market structures. In particular, dividing CSm by CSc we obtain
$$\frac{CS_m}{CS_c} = \left(\frac{1}{1+\frac{1}{e}}\right)^{e+1}$$

For instance, if price-elasticity is -2, this ratio becomes 1/2, intuitively saying that CS under monopoly is half of that under perfectly competitive markets. The following figure illustrates how this ratio is affected by reductions in the price-elasticity of demand. In particular, the ratio becomes closer to zero as demand becomes more inelastic (movements to the left in the figure). Intuitively, as demand becomes more insensitive to price, the monopolist can exercise its market power charging higher mark-ups over marginal cost, ultimately reducing CSm.

Figure 7.11
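Let us now focus on the monopoly profits. Notice that, after the monopolist sets profit-maximizing output and prices, profits are given by
$$\pi_m = P_mQ_m - cQ_m = \left(\frac{c}{1+\frac{1}{e}} - c\right)Q_m = \left(\frac{c}{1+\frac{1}{e}} - c\right)\left(\frac{c}{1+\frac{1}{e}}\right)^{e} = -\left(\frac{c}{1+\frac{1}{e}}\right)^{e+1}\frac{1}{e}$$
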

Hence, the monopoly profits can be expressed as a function of only two parameters: marginal costs and price-elasticity of demand. In addition, in order to evaluate the transfer of welfare from CS into monopoly profits that consumers experience when moving from a perfectly competitive market to a monopolistic market, we can divide monopoly profits by CSc, as follows
$$\frac{\pi_m}{CS_c} = \frac{e+1}{e}\left(\frac{1}{1+\frac{1}{e}}\right)^{e+1} = \left(\frac{e}{1+e}\right)^{e}$$

Notice that, if price-elasticity is -2, then this ratio is 1/4. The following figure illustrates that a more inelastic demand (leftward movements) increases the percentage of monopoly profits over the consumer surplus under perfectly competitive markets, i.e., the transfer of CS towards monopoly profits increases.
A good exercise for additional practice: HW#9 Exercise #6.
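The two ratios can be cross-checked at e=-2 by computing consumer surplus and profits directly from the demand function. The sketch below assumes demand Q=P^e with e<-1 and, for the direct computation, a hypothetical marginal cost c=1; all function names are illustrative.

```python
def cs_ratio(e):
    """CS_m / CS_c = (1 / (1 + 1/e)) ** (e + 1), for demand Q = P**e with e < -1."""
    if e >= -1:
        raise ValueError("need e < -1")
    return (1.0 / (1.0 + 1.0 / e)) ** (e + 1)

def profit_to_cs_ratio(e):
    """pi_m / CS_c = (e / (1 + e)) ** e, for demand Q = P**e with e < -1."""
    if e >= -1:
        raise ValueError("need e < -1")
    return (e / (1.0 + e)) ** e

def from_primitives(e, c):
    """Recompute both ratios from prices, quantities and surplus formulas."""
    p_m = c / (1.0 + 1.0 / e)                  # inverse elasticity pricing rule
    cs_c = -(c ** (e + 1)) / (e + 1)           # consumer surplus at price c
    cs_m = -(p_m ** (e + 1)) / (e + 1)         # consumer surplus at price p_m
    profit_m = (p_m - c) * p_m ** e            # (P_m - c) * Q_m
    return cs_m / cs_c, profit_m / cs_c

print(cs_ratio(-2.0), profit_to_cs_ratio(-2.0))
print(from_primitives(-2.0, 1.0))
```

At e=-2 both approaches return one half for the consumer surplus ratio and one quarter for the profit ratio, matching the values stated in the text.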
Finally, note that the social costs of monopoly are not only evaluated by the DWL area defined in previous sections. Indeed, there are more social costs of monopoly. Here are some examples of social costs associated with monopolists:
1. R&D expenditure. In some cases, it might be excessive. For instance, in a patent race all (or most) of the R&D expenditure made by the firm that didn't get the patent is a social cost.
2. Persuasive (not informative) advertising.
3. Lobbying costs (different from bribes). Indeed, note that a bribe cannot be strictly considered a social cost since it simply implies the transfer of money from the monopolist to politicians. Other types of lobbying costs, such as the time spent by lobbyists trying to convince politicians about the benefits of certain policies, for instance guaranteeing a legal barrier to entry, can be considered a social cost associated with monopolies.
4. Resources to avoid entry of potential firms in the industry. This occurs, for instance, if the incumbent (monopoly) overinvests in capacity that sits idle afterwards just to guarantee that any entry will be severely fought by flooding the market with products for a few periods.
As the above discussion suggests, some expenditures cannot be strictly considered social costs associated with monopolies. For example, bribes are just a wealth transfer from the monopolist to politicians. Similarly, some forms of R&D (not directly related to patent races) might produce benefits for the firm in the long run, and cannot therefore be considered social costs.

Comparative statics
In this subsection we examine how qm varies as a function of marginal cost. As the following figure
indicates, we expect monopoly output to decrease in marginal cost.


Figure 7.12
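We know that at the optimum, qm(c),
$$\frac{\partial \pi(q^m(c),c)}{\partial q} = 0$$
Differentiating with respect to c and using the chain rule, we obtain
$$\frac{\partial^2 \pi(q^m(c),c)}{\partial q^2}\,\frac{dq^m(c)}{dc} + \frac{\partial^2 \pi(q^m(c),c)}{\partial q\,\partial c} = 0$$
And solving for dqm(c)/dc, we have
$$\frac{dq^m(c)}{dc} = -\frac{\dfrac{\partial^2 \pi(q^m(c),c)}{\partial q\,\partial c}}{\dfrac{\partial^2 \pi(q^m(c),c)}{\partial q^2}}$$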
Note that this expression could be immediately obtained by using the Implicit Function Theorem. In addition, since the monopolist's profit function is concave in q, the denominator of the above expression must be negative. This implies that the sign of dqm(c)/dc (intuitively representing how qm is affected by changes in c) only depends on the sign of the term in the numerator. This conclusion is valid for any demand function, and even more generally, for the case in which we don't perfectly observe the demand function, but we have information about the cross-derivative represented in the numerator. Let us next apply this rule to a linear demand curve p(q)=a-bq. In this case, the cross-derivative is
$$\frac{\partial^2 \pi(q^m(c),c)}{\partial q\,\partial c} = \frac{\partial}{\partial c}\left[\frac{\partial\left[(a-bq)q - cq\right]}{\partial q}\right] = \frac{\partial\left[a-2bq-c\right]}{\partial c} = -1$$

Inserting this result into the above expression for dqm(c)/dc we obtain that this derivative must be negative since
$$\frac{dq^m(c)}{dc} = -\frac{\dfrac{\partial^2 \pi(q^m(c),c)}{\partial q\,\partial c}}{\dfrac{\partial^2 \pi(q^m(c),c)}{\partial q^2}} = -\frac{(-1)}{\dfrac{\partial^2 \pi(q^m(c),c)}{\partial q^2}} < 0$$
because the denominator is negative (with linear demand and constant marginal cost it equals -2b, so that dqm(c)/dc=-1/(2b)). This intuitively implies that an increase in marginal costs, c, produces a decrease in monopoly output qm. The following two figures illustrate an increase in marginal costs when the monopolist faces a linear demand function, first for the case in which marginal costs are increasing in q (i.e., total costs are convex in output) and second, for the case in which marginal costs are constant in q. Of course, the above result is independent of the increasing or constant pattern of the marginal cost curve.
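The comparative statics result can be verified numerically for the linear demand case with constant marginal cost. The sketch below is an illustration under that assumption: it compares the Implicit Function Theorem expression with a finite-difference derivative of the closed-form output qm(c)=(a-c)/(2b). The parameter values and function names are hypothetical.

```python
def optimal_output(a, b, c):
    """Monopoly output for p(q)=a-bq and constant marginal cost c."""
    return (a - c) / (2 * b)

def dq_dc_ift(b):
    """dq/dc = -(cross-derivative)/(second derivative in q) for the linear case."""
    cross_derivative = -1.0        # d2(pi)/dq dc
    second_derivative = -2.0 * b   # d2(pi)/dq2, negative by concavity
    return -cross_derivative / second_derivative

def dq_dc_numeric(a, b, c, h=1e-6):
    """Central finite-difference approximation of dq/dc."""
    return (optimal_output(a, b, c + h) - optimal_output(a, b, c - h)) / (2 * h)

a, b, c = 10.0, 1.0, 2.0           # hypothetical parameter values
print(dq_dc_ift(b), dq_dc_numeric(a, b, c))
```

Both numbers should be approximately -1/(2b), confirming that output falls when marginal cost rises.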

Figure 7.13

Multiplant monopolist
In this subsection we briefly analyze the monopolist's production decision when it operates more than one plant. For instance, the monopolist produces in different countries (with potentially different efficiency levels, salaries, etc.) but sells its global production to an international market. The monopolist's decision about how much total output to produce resembles our previous analysis. Nonetheless, its decision about how to distribute total output among its different plants requires a more detailed discussion.
6
In particular, the monopolist produces output $q_1, q_2, \ldots, q_N$ across the N plants it operates, with total costs $TC_i(q_i)$ at each plant $i=\{1,2,\ldots,N\}$. Hence, with linear inverse demand $p(Q)=a-bQ$ and $Q=\sum_{i=1}^{N}q_i$, the multiplant monopolist's profit-maximization problem becomes
$$\max_{q_1,\ldots,q_N}\; \left[a - b\sum_{i=1}^{N} q_i\right]\sum_{i=1}^{N} q_i - \sum_{i=1}^{N} TC_i(q_i)$$

And taking FOCs with respect to the production level at any individual plant j, $q_j$, we have
$$a - 2b\sum_{i=1}^{N} q_i - MC_j(q_j) = 0 \iff MR(Q) = MC_j(q_j) \quad \text{for all } j$$

Note that the last condition states that the monopolist must increase its production in plant j, qj, until the
point in which the marginal cost of producing further units in that plant, MCj(qj) coincides with the
marginal revenue that the monopolist obtains by selling this additional unit in the international market,
i.e., MR(Q). The following figure illustrates this profit maximizing condition for a multiplant monopolist
operating two plants with marginal costs MC1 and MC2.

Figure 7.14

6
For introductory references, see Besanko and Braeutigam (section 11.4) and Shy (section 5.4).

First, note that the monopolist's total marginal cost curve, MCtotal, is given by the horizontal sum of the marginal cost curves of both plants, i.e., for every level of marginal cost we add the output that each plant produces at that marginal cost (rather than adding MC1 and MC2 vertically at a given output). Given MCtotal, we can easily find the total production for this monopolist by setting its total marginal costs, MCtotal, equal to its marginal revenue curve, MR, which occurs at point A. Hence, the monopolist produces Qtotal among both plants, and sells this total output at a price pm. At this point, we must distribute the total production Qtotal between the two plants. In order to do that, we first evaluate MR(Qtotal), graphically represented by the height of point A in the figure, and we find the point of the MC1 curve for plant 1 that reaches this same height. This determines the output q1 that plant 1 produces, i.e., q1 solves MR(Qtotal)=MC1(q1). Similarly for plant 2, starting from the height of point A, which depicts MR(Qtotal), we extend a dotted line to the left crossing MC2. This determines q2, i.e., q2 solves MR(Qtotal)=MC2(q2).
In order to find closed-form solutions in the above example, consider that all plants are symmetric and have the cost function $TC_i(q_i)=F+c\,q_i^2$. Hence, they all produce the same output level, $q_1=q_2=\ldots=q_N=q_j$, and the above FOCs become
$$a - 2bNq_j = 2cq_j \quad \text{which implies} \quad q_j = \frac{a}{2(bN+c)}$$
which implies that the total output produced by the monopolist, Q, is
$$Q = Nq_j = \frac{Na}{2(bN+c)}$$
and market price is
$$p = a - bQ = a - b\,\frac{Na}{2(bN+c)} = \frac{a(bN+2c)}{2(bN+c)}$$
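The symmetric closed forms can be checked against the first-order condition and the inverse demand directly. The sketch below uses hypothetical parameter values (a=12, b=1, c=1, N=2) and an illustrative function name.

```python
def symmetric_multiplant(a, b, c, n):
    """Per-plant output, total output and price with TC_i = F + c*q_i**2."""
    q_j = a / (2 * (b * n + c))
    total = n * q_j
    price = a * (b * n + 2 * c) / (2 * (b * n + c))
    return q_j, total, price

a, b, c, n = 12.0, 1.0, 1.0, 2          # hypothetical parameter values
q_j, total, price = symmetric_multiplant(a, b, c, n)

# Check the FOC MR(Q) = MC_j(q_j) and the inverse demand p = a - bQ.
marginal_revenue = a - 2 * b * total
marginal_cost = 2 * c * q_j
print(q_j, total, price, marginal_revenue, marginal_cost)
```

Raising `n` in this sketch lowers the per-plant output `q_j`, in line with the comparative statics in N discussed in the text.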

As special cases, note that if the monopolist operates a single plant, i.e., N=1, then total output and price
coincide with that under standard monopoly models, i.e., Q=qm and p=pm. In addition, note that an
increase in the number of plants N decreases the individual production for every plant qj. Furthermore, an
increase in N reduces the profits for every individual plant. Finally, we briefly examine an example of a
multiplant monopolist operating two asymmetric plants.
Example. Consider a monopolist facing linear demand p(Q)=120-3Q and operating two plants with marginal costs MC1(q1)=10+20q1 and MC2(q2)=60+5q2, respectively. First, we seek to determine total output Qtotal. In order to obtain Qtotal we need to find MCtotal. However, note that we cannot obtain MCtotal by summing (10+20q1)+(60+5q2). Indeed, this would be a vertical sum (not a horizontal sum) of the marginal cost functions. Hence, we first must invert the marginal cost functions for each plant, obtaining
$$MC_1 = 10+20q_1 \iff q_1 = \frac{MC_1}{20}-\frac{1}{2} \quad \text{and similarly} \quad MC_2 = 60+5q_2 \iff q_2 = \frac{MC_2}{5}-12$$
We can then sum q1+q2=Qtotal, evaluating both plants at the same marginal cost level MCtotal, to obtain Qtotal=0.25MCtotal-12.5, or inverting it, MCtotal=50+4Qtotal. Setting MR(Q)=MCtotal(Q), i.e., 120-6Q=50+4Q, we obtain Qtotal=7 and market price is p=$99. We can now evaluate the marginal revenue at Qtotal, i.e., MR(Qtotal), obtaining 120-6*7=$78. This allows us to set MR(Qtotal)=MC1(q1), or 78=10+20q1, which implies q1=3.4 units. Similarly for plant 2, setting MR(Qtotal)=MC2(q2), or 78=60+5q2, which implies q2=3.6 units. Clearly, Qtotal=q1+q2=3.4+3.6=7.
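The steps of this example can be reproduced with a few lines of code. The sketch below assumes linear demand and linear marginal costs MC_i(q_i)=alpha_i+beta_i*q_i, and that every plant is active at the optimum (otherwise the horizontal sum has a kink and the formula changes). The function name and argument layout are illustrative.

```python
def multiplant_linear(a, b, plants):
    """Solve MR(Q) = MC_total(Q) for demand p(Q) = a - b*Q.

    plants is a list of (alpha_i, beta_i) with MC_i(q_i) = alpha_i + beta_i*q_i.
    Assumes every plant is active at the optimum (checked below).
    """
    inv_slope_sum = sum(1.0 / beta for _, beta in plants)        # sum of 1/beta_i
    intercept_sum = sum(alpha / beta for alpha, beta in plants)  # sum of alpha_i/beta_i
    # Horizontal sum: Q = MC*inv_slope_sum - intercept_sum, hence
    # MC_total(Q) = (Q + intercept_sum) / inv_slope_sum.
    # Setting MR(Q) = a - 2bQ equal to MC_total(Q) and solving for Q:
    total_q = (a * inv_slope_sum - intercept_sum) / (1.0 + 2.0 * b * inv_slope_sum)
    mr = a - 2.0 * b * total_q
    outputs = [(mr - alpha) / beta for alpha, beta in plants]
    if min(outputs) < 0:
        raise ValueError("some plant would be idle; the horizontal sum changes")
    price = a - b * total_q
    return total_q, price, mr, outputs

total_q, price, mr, outputs = multiplant_linear(120.0, 3.0, [(10.0, 20.0), (60.0, 5.0)])
print(total_q, price, mr, outputs)
```

For the numbers in the example this reproduces Qtotal=7, p=99, MR=78, and plant outputs of 3.4 and 3.6 (up to floating-point rounding), which add up to total output.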





Price discrimination
In this section we analyze how monopoly profits can be further increased by setting different prices for purchases of different quantities of the good (or different prices to different customers).
7
Intuitively, the monopolist is making positive profits, but could still capture a larger surplus from two segments of customers, as the following figure illustrates. On one hand, the customers buying the product at pm would be willing to pay more for the good, paying prices p>pm. On the other hand, there is a segment of customers who didn't buy the good at pm, but whose willingness to pay for the good is still higher than the marginal cost of production, i.e., pm>p>c. When setting a uniform price pm for all units, however, the monopolist captures the surplus of neither of these segments. In fact, in order to capture these additional surpluses, the monopolist must abandon uniform pricing and use a form of price discrimination.

Figure 7.15
In particular, we will discuss two types of price discrimination: first-degree (or perfect) price discrimination, where the monopolist charges every customer his/her maximum willingness to pay for the object, and third-degree price discrimination, where the monopolist charges different prices to two or more groups of customers. We do not examine second-degree price discrimination (where the monopolist offers a menu, or plan, to customers so that every type of customer self-selects the most convenient menu), since that exposition involves elements of game theory that haven't been discussed yet.

7
In this section we follow some parts of NS (pp. 503-509) and of Varian (sections 14.5-14.8).

First-degree price discrimination. Under first-degree (or perfect) price discrimination, the monopolist charges a different price to every buyer (i.e., a personalized price). The first buyer pays p1 for the q1 units, the second pays p2 for q2-q1 units, and similarly for all other buyers, as the next figure illustrates. Specifically, the monopolist continues doing so until the last buyer is willing to pay just the marginal cost of production. (Increasing sales any further would imply losses for the monopolist.)

Figure 7.16
In the limit, the monopolist captures all the area below the demand curve (representing consumers'
willingness to pay) and above marginal cost, as depicted in the following figure.

Figure 7.17
Let us prove the above result in a more formal way. Suppose that the monopolist can offer a combination
of a fixed fee, r*, and an amount of the good, q*, that maximizes its profits. This implies choosing (r*,q*)
that solve the following PMP
$$\begin{aligned}
\max_{r,\,q}\;& r-cq\\
\text{s.t.}\;& u(q)\geq r
\end{aligned}$$


First, note that the monopolist wants to raise the fee r until u(q)=r. (Otherwise, the monopolist could still
increase its profits by further increasing the fee r.) Hence, we can reduce the set of choice variables (from
(r,q) to only q), as follows
$$\max_{q}\; u(q)-cq$$
Taking first order conditions with respect to q, we obtain u'(q*)-c=0, i.e., u'(q*)=c. Intuitively, the
monopolist practicing first-degree price discrimination increases output until the marginal utility that
consumers obtain from additional units (graphically represented by the inverse demand curve) coincides
with the marginal cost of production. Given this level of production q*, we can obtain the optimal fee,
r*=u(q*). This result states that the monopolist charges a fee r* that coincides with the utility that the
consumer obtains from consuming q* units of output. Both of these results are graphically represented in
the following figure where: (1) the fee r* is depicted by the whole area below p(q) up to q* units; and (2) the
monopoly profits are therefore r*-cq*, i.e., the area below the demand curve and above marginal cost.

Figure 7.18
Example. Let us next consider a simple example. A monopolist faces inverse demand curve p(q)=20-q
and constant marginal costs c=$2. When it practices uniform pricing, setting MR equal to MC, the
monopolist produces q=9 units at a price p=$11 with associated profits of $81. These profits are
represented by the shaded area in the following figure.


Figure 7.19
If, instead, the monopolist practices first-degree price discrimination, it expands output until p(q)=MC, producing q=18
units (the last of which is sold at a price of p=$2), with corresponding profits of $162, graphically represented by the area below the
demand curve and above marginal costs in the previous figure. As expected, the practice of first-degree
price discrimination increases the monopolist's profits.
8
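The numbers in this example can be checked with a short script. The following is only a sketch for a linear inverse demand p(q)=a-bq with constant marginal cost c; the function names are illustrative and do not come from the notes.

```python
# Uniform pricing vs. first-degree price discrimination
# for a linear inverse demand p(q) = a - b*q and constant marginal cost c.

def uniform_pricing(a, b, c):
    """Quantity, price and profit when MR = a - 2*b*q is set equal to c."""
    q = (a - c) / (2 * b)
    p = a - b * q
    return q, p, (p - c) * q

def first_degree(a, b, c):
    """Quantity and profit when output expands until p(q) = c.

    Profit is the triangle below the demand curve and above marginal cost.
    """
    q = (a - c) / b
    return q, 0.5 * (a - c) * q

q_u, p_u, profit_u = uniform_pricing(20, 1, 2)
q_fd, profit_fd = first_degree(20, 1, 2)
print("uniform:", q_u, p_u, profit_u)
print("first-degree:", q_fd, profit_fd)
```

With the parameters of the example (a=20, b=1, c=2) the script reproduces the quantities and profits reported in the text.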


Summarizing, under first-degree price discrimination, output coincides with that in perfectly competitive
markets, where p(q*)=c. Unlike perfectly competitive markets, however, the consumer does not capture
any surplus. Instead, the producer now captures all of it. Because this type of price
discrimination requires an enormous amount of information, we do not see many examples of it in real
applications. Nonetheless, some practices come close to it. For
instance, financial aid in undergraduate education is often cited as a form of tuition discrimination
practiced by many US colleges. In particular, application forms ask many details about the student's (and
his/her family's) finances in order to determine his/her willingness to pay for higher education. On a lighter
note, Coca-Cola tried to apply first-degree price discrimination by installing a thermometer in their
vending machines. Specifically, the vending machine increased soda prices according to the temperature,
since potential buyers' willingness to pay is higher on a hot day.
9

Third-degree price discrimination. In this type of price-discrimination, the monopolist sells its product
to two (or more) different types of customers that are easily identifiable by the monopolist, e.g., youth

8
For another example, see Example 14.4 in NS.
9
Coca-Cola's public image among many customers was damaged by these vending machines, and the company
finally decided to take the vending machines away.

versus adult customers at the movies (which can be identified by showing a valid ID).
10
The monopolist's
PMP hence becomes
$$\max_{x_1,\,x_2}\; p_1(x_1)x_1+p_2(x_2)x_2-cx_1-cx_2$$
Taking first order conditions with respect to x1 and x2 we obtain,
$$\begin{aligned}
p_1(x_1)+p_1'(x_1)x_1-c&=0 \iff MR_1=MC\\
p_2(x_2)+p_2'(x_2)x_2-c&=0 \iff MR_2=MC
\end{aligned}$$

Interestingly, these FOCs coincide with those of a regular monopolist who practices uniform pricing in each
of two separate markets, i.e., MR1=MC and MR2=MC. The following figure illustrates this
idea for the example of adults (market 1) and seniors (market 2) at the movies. In particular, p1(x1)=38-x1
for adults, p2(x2)=14-(1/4)x2 for seniors, and MC=$10 in both markets. Indeed, it is easy to check that
MR1(x1)=38-2x1, which crosses MC=10 at x1=14 units, implying a price for adults of p1=$24. Similarly
for seniors, MR2(x2)=14-0.5x2, which crosses MC=10 at x2=8 units, implying a price for seniors of only
p2=$12.
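The movie-theater example can be reproduced numerically. The sketch below assumes linear inverse demands p_i(x_i)=a_i-b_i*x_i in each market and applies MR_i=MC market by market; the helper name `market_optimum` is illustrative, not part of the notes.

```python
# Third-degree price discrimination with linear inverse demand in each market:
# p_i(x_i) = a_i - b_i * x_i, common marginal cost mc.

def market_optimum(a, b, mc):
    """Output and price in one market, from MR = a - 2*b*x = mc."""
    x = (a - mc) / (2 * b)
    p = a - b * x
    return x, p

mc = 10
x1, p1 = market_optimum(38, 1.0, mc)    # adults
x2, p2 = market_optimum(14, 0.25, mc)   # seniors
profit = (p1 - mc) * x1 + (p2 - mc) * x2

print("adults:", x1, p1)
print("seniors:", x2, p2)
print("total profit:", profit)
```

Each market is solved independently because marginal cost is constant, so output in one market does not change the cost of serving the other.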

Figure 7.20
Since MRi=MC for every type of customer i, we can rewrite this condition using the
IEPR, just as we did for the monopolist practicing uniform pricing in previous sections of this chapter. In
particular,

10
Recall that this distinguishes this type of price discrimination from second-degree price discrimination, where the
monopolist cannot easily identify different groups of customers, and must offer a menu in order to achieve
self-separation, i.e., every customer chooses the most convenient option in the menu, e.g., a calling plan from a phone company.

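$$p_1(x_1)=\frac{c}{1-\frac{1}{|\varepsilon_1|}},\qquad p_2(x_2)=\frac{c}{1-\frac{1}{|\varepsilon_2|}}$$
where $|\varepsilon_i|$ denotes the price-elasticity of demand in market i, in absolute value. Note that $p_1(x_1)>p_2(x_2)$ if and only if
$$p_1(x_1)=\frac{c}{1-\frac{1}{|\varepsilon_1|}}>\frac{c}{1-\frac{1}{|\varepsilon_2|}}=p_2(x_2),$$
which implies
$$1-\frac{1}{|\varepsilon_2|}>1-\frac{1}{|\varepsilon_1|}\iff\frac{1}{|\varepsilon_2|}<\frac{1}{|\varepsilon_1|}\iff|\varepsilon_2|>|\varepsilon_1|.$$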
Therefore, the market with the more elastic demand (the market that is more sensitive to price changes,
i.e., market 2 in our above example) is where the monopolist charges the lower price.
Example. A single airline operates the route Pullman-Seattle and considers charging different prices for
its business class seats and economy seats.
11
According to demand estimates, the price-elasticity of
demand for business class seats is -1.15 while that for economy seats is -1.52, showing a larger sensitivity
to price changes. From the first estimate, and using the IEPR, we can conclude that the price charged for
every business class seat must satisfy pB(1-1/1.15)=MC, i.e., 0.13pB=MC. Similarly, using the IEPR we obtain that the price
charged for every economy class seat must satisfy pV(1-1/1.52)=MC, i.e., 0.342pV=MC. Therefore, 0.13pB=0.342pV, or
pB=2.63pV. That is, the airline maximizes its profits by charging for business class seats a price about 2.63 times
that of economy class seats.
12,13
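The price ratio in the airline example follows directly from the IEPR and can be verified as below. This is a sketch under the assumption that the rule is written as p(1+1/e)=MC with a negative elasticity e; the function name `iepr_factor` is hypothetical.

```python
# Inverse-elasticity pricing: p * (1 + 1/e) = MC, with e < -1 the price elasticity.

def iepr_factor(elasticity):
    """Fraction of the price that equals marginal cost under the IEPR."""
    if elasticity >= -1:
        raise ValueError("the rule requires elastic demand (e < -1)")
    return 1 + 1 / elasticity

factor_business = iepr_factor(-1.15)
factor_economy = iepr_factor(-1.52)

# Both prices are tied to the same MC, so their ratio is the inverse ratio of factors.
price_ratio = factor_economy / factor_business

print(round(factor_business, 4), round(factor_economy, 4), round(price_ratio, 2))
```

Without rounding the intermediate factors, the ratio is close to 2.62; the 2.63 in the text comes from dividing the rounded coefficients.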


Regulation of Natural Monopolies
Some monopolies exhibit decreasing cost structures, with the MC curve lying below the AC curve, as the
following figure depicts. In this case, having a single firm serving the entire market might seem better
(more "natural") than having multiple firms, since average costs would be lower in the former than
in the latter case. For this reason, monopolies with decreasing costs are usually referred to as natural
monopolies. An unregulated natural monopoly, however, would maximize profits at the point where
MR=MC, producing Q1 units in the figure and selling them at a price p1. If a regulatory agency dislikes
this monopoly output and price and forces the monopoly to practice marginal cost pricing (as if the market
structure were perfectly competitive), the monopoly will have to charge p2 (where demand crosses MC)
and produce Q2 units. This production level, however, implies a loss of c2-p2 per unit in the figure, since marginal cost pricing sets the price p2 below the average cost of producing Q2 units (c2 in the figure).

11
If you have been on that plane, you know that the airline's marginal cost of offering business class and economy
class seats is exactly the same!
12
NS presents a similar example in Example 14.5.
13
Note that third-degree price discrimination might imply serving (not serving) some customers who might not be
served (be served, respectively) under uniform pricing. This implies that the practice of third-degree price
discrimination can be welfare improving (or welfare reducing) under certain conditions. For a detailed discussion of
this topic, see Varian pp. 250-253.


Figure 7.21
The above discussion illustrates a dilemma for regulatory agencies when dealing with natural monopolies:
either they abandon the policy of setting prices equal to marginal cost altogether, or they continue
applying marginal cost pricing but must subsidize the natural monopoly (providing c2-p2 per unit of
production) forever. One way in which regulatory agencies can avoid this dilemma is the implementation
of a multiprice system: charging some users a high price while maintaining a low price (e.g., marginal
cost pricing) for other users. For instance, the regulatory commission can allow charging a high price p1 to
some users while other users are offered a lower price p2, as the following figure illustrates. Specifically,
this produces a benefit of p1-c1 per unit of output from 0 to Q1 and a loss of c2-p2 per unit of output for the
additional units (Q2-Q1) sold to the second segment of customers. This approach is frequently used by
utility companies (electricity, water supply, etc.) that set different prices for different types of
customers (e.g., businesses, households, etc.).

Figure 7.22
An alternative approach to the regulation of natural monopolies is to allow the monopoly to charge a price
above marginal cost that is sufficient to earn a "fair" rate of return on capital investments. This approach,
however, presents two difficulties. First, it might be prone to different interpretations about what a
"fair" rate of return on capital investments is. Second, it leads to overcapitalization, as we show more
formally below.

Overcapitalization of natural monopolies (Averch-Johnson effect). Suppose a regulated utility company
has a production function of the form q=f(k,l). Suppose that the rate of return on capital investments, s, is
constrained by a regulatory agency to be equal to s0. Then, the firm's profit maximization problem is
represented by the following Lagrangian
$$\mathcal{L}=pf(k,l)-wl-vk+\lambda\left[wl+s_0k-pf(k,l)\right]$$
where the constraint states that the rate of return on capital investment dictated by the regulatory agency
is s0. Note that $\lambda$ cannot be zero. Otherwise, the above PMP would simply become pf(k,l)-wl-vk. Indeed,
in such case the regulation would be ineffective, and the monopolist would behave as any profit-maximizing
firm. Similarly, $\lambda$ cannot be equal to one. Otherwise, the above PMP reduces to (s0-v)k. In
addition, assuming that the rate of return dictated by the regulatory agency, s0, is higher than that currently
present in the market, v, i.e., s0>v, this would mean that the monopoly hires infinite amounts of capital. It
must therefore be that $0<\lambda<1$. In particular, the FOCs are
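$$\begin{aligned}
\frac{\partial\mathcal{L}}{\partial l}&=pf_l-w+\lambda(w-pf_l)=0\\
\frac{\partial\mathcal{L}}{\partial k}&=pf_k-v+\lambda(s_0-pf_k)=0\\
\frac{\partial\mathcal{L}}{\partial\lambda}&=wl+s_0k-pf(k,l)=0
\end{aligned}$$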

From the first FOC, we obtain that the regulated monopoly increases l (hiring more workers) until $pf_l=w$,
i.e., until the point in which the value of the marginal product of labor coincides with the marginal cost of
an additional worker. The result obtained in the second FOC, in contrast, implies that the monopolist
increases capital until
$$(1-\lambda)pf_k=v-\lambda s_0\;\Longrightarrow\;pf_k=\frac{v-\lambda s_0}{1-\lambda}=v-\frac{\lambda(s_0-v)}{1-\lambda}$$
and since s0>v and $0<\lambda<1$, the above condition implies
$$pf_k=v-\underbrace{\frac{\lambda(s_0-v)}{1-\lambda}}_{+}$$

Hence, $pf_k<v$. Therefore, the firm would hire more capital (achieving a lower marginal product of capital
$f_k$) than under unregulated conditions, where $pf_k=v$. This result suggests why some regulated natural
monopolies (such as electricity and water suppliers) might be overcapitalized after being regulated. The
following figure illustrates this overcapitalization result (also referred to as the Averch-Johnson effect). In
particular, before regulation the firm selects the input combination (LBR, KBR). After the regulation is
introduced, capital becomes cheaper in relative terms (flatter isocost), which leads the firm to choose an
input combination with a larger amount of capital (LAR, KAR).
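A small numerical illustration of the overcapitalization result may help. The sketch below is not taken from the notes: it assumes a Cobb-Douglas technology f(k,l)=k^A*l^B with decreasing returns, hypothetical parameter values, and an allowed rate of return S0 that lies above the rental rate V but below the return the unregulated firm would earn. Labor is chosen so that p*f_l=w, and the regulated firm picks the largest capital stock consistent with (pf-wl)/k=S0.

```python
# Averch-Johnson illustration under assumed functional forms (all values hypothetical).
A, B = 0.25, 0.25          # output elasticities of capital and labor
P, W, V = 1.0, 1.0, 0.1    # output price, wage, rental rate of capital
S0 = 0.2                   # allowed rate of return on capital, with S0 > V

def best_labor(k):
    """Labor satisfying P * f_l = W for a given capital stock k."""
    return (P * B * k**A / W) ** (1 / (1 - B))

def net_revenue(k):
    """Revenue minus the wage bill when labor is chosen optimally."""
    l = best_labor(k)
    return P * k**A * l**B - W * l

def value_mp_capital(k):
    """Value of the marginal product of capital, P * f_k, at the optimal labor."""
    l = best_labor(k)
    return P * A * k**(A - 1) * l**B

def bisect_root(fn, lo, hi, tol=1e-10):
    """Root of a decreasing function fn with fn(lo) > 0 > fn(hi)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if fn(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Unregulated firm: P * f_k = V.
k_unreg = bisect_root(lambda k: value_mp_capital(k) - V, 1e-6, 100.0)
# Regulated firm: return on capital (pf - wl)/k equal to S0.
k_reg = bisect_root(lambda k: net_revenue(k) / k - S0, 1e-6, 100.0)

print("capital before regulation:", round(k_unreg, 3))
print("capital after regulation: ", round(k_reg, 3))
print("P*f_k after regulation:   ", round(value_mp_capital(k_reg), 4), "vs V =", V)
```

Under these assumptions the regulated capital stock exceeds the unregulated one, and the value of the marginal product of capital falls below V, in line with the inequality derived above.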


Chapter 8 Externalities and Public Goods
Externalities
An externality is present when the well-being of a consumer (or the production possibilities of a firm) is
directly affected by the actions of another agent in the economy. Externalities can arise in
many ways but, however they arise, their effects are of the same kind: the actions of a consumer or a
producer benefit or harm other consumers or producers. Accordingly, we distinguish between
positive and negative externalities.

Negative externalities
Externalities are negative if they impose costs on, or reduce the benefits of, other producers and
consumers. A standard example is pollution generated by production, as when a
manufacturer of an industrial good causes environmental damage by polluting the air or water. Suppose
that a factory produces and sells tires. In the course of production, smoke is produced, and everybody
who lives in the neighborhood of the factory suffers because of it. The price consumers are willing to pay
for tires is given by the benefit derived from using the tires. Hence, at the market equilibrium, the marginal
(private) cost of producing a tire is equal to the marginal benefit of using the tire, but the market does not
incorporate the additional cost of pollution imposed on those who live near the factory. As a result, with a negative
externality, the marginal social cost exceeds the marginal private cost and, from the
social point of view, too many tires are produced by the market.

Positive externality
With a positive externality, the marginal social benefit from the good or service exceeds the marginal
private benefit. An example of a positive externality in production is the development of a new
technology, like the laser or the transistor, which benefits not only the inventor but also many other producers
and consumers in the economy. Another simple example is a local bakery producing bread.
People who walk by the bakery benefit from the pleasant smell of baking bread, and this benefit is not
incorporated into the price of bread. Therefore, the marginal social benefit of another
loaf of bread is equal to the benefit people get from eating the bread plus the benefit people get from
the pleasant smell of baking bread. However, since bread purchasers do not take into account the benefit
provided to people who do not purchase bread, at the equilibrium price the total marginal benefit of
additional bread is greater than the marginal cost. From a social perspective, too little bread is
produced.
1

In the next section, to illustrate these externality issues, we discuss bilateral
externalities and solutions to the externality problem.
Bilateral externalities
2
. In this section we focus on two consumers, $i\in\{1,2\}$, who belong to an economy
with several consumers and firms, and who consume L traded goods with price vector p. For simplicity, we
will assume price-taking behavior.
3
Every individual's wealth level is $w_i$, and his/her utility function is

1
For more examples of negative and positive externalities, please refer to Besanko and Braeutigam, 2005, pp.638-
651
2
An additional explanation on bilateral (interfirm) externalities can be found in Snyder and Nicholson, 2008, p.671
3
Note that this implies that the L commodities are traded in perfectly competitive markets. Modifying this
assumption to one in which commodities are traded in monopoly or oligopoly markets can have significant
consequences on our results. See, for instance, Kolstad.

$u_i(x_{1i},x_{2i},\ldots,x_{Li},h)$, where h is the measure of the externality that consumer 1's actions cause on consumer
2's well-being, e.g., tons of CO2 in the air. Note that negative externalities, such as the
pollution of a river, global warming, the loud music of your officemate, etc., imply that $\partial u_i/\partial h<0$. In contrast,
when externalities are positive (such as your neighbor's care of his garden, vaccination decisions in your
community, etc.) the individual's utility level increases in h, i.e., $\partial u_i/\partial h>0$.
4,5

Let us define, for a given level of the externality, h, individual i's Utility Maximization Problem (UMP)
$$\begin{aligned}
\max_{x_i\geq 0}\;& u_i(x_i,h)\\
\text{s.t.}\;& p\cdot x_i\leq w_i
\end{aligned}$$
and define $v_i(p,w_i,h)$ to be the value function associated to the above maximization problem. For
simplicity, consider the following quasilinear utility function
$$u_i(x_i,h)=x_{1i}+g(x_{-1i},h)$$
where $x_{-1i}$ denotes individual i's consumption of all goods other than good 1, i.e., $x_{-1i}=(x_{2i},x_{3i},\ldots,x_{Li})$.
Then, the Walrasian demand for these L-1 goods, $x_{-1i}(p,h)$, is independent of her wealth. Therefore,
$$v_i(p,w_i,h)=x_{1i}+g(x_{-1i}(p,h),h)$$
But we know that, with the price of good 1 normalized to one,
$$x_{1i}+p_{-1}\cdot x_{-1i}(p,h)=w_i\;\Longrightarrow\;x_{1i}=w_i-p_{-1}\cdot x_{-1i}(p,h)$$
Hence,
$$v_i(p,w_i,h)=\underbrace{w_i-p_{-1}\cdot x_{-1i}(p,h)}_{x_{1i}}+g(x_{-1i}(p,h),h)$$
And denoting
$$v_i(p,w_i,h)=w_i\underbrace{-p_{-1}\cdot x_{-1i}(p,h)+g(x_{-1i}(p,h),h)}_{\phi_i(p,h)}=\phi_i(p,h)+w_i$$
That is,
$$v_i(p,w_i,h)=\phi_i(p,h)+w_i$$
and since the prices of the L goods are unaffected by changes in the externality level h, we can simply
write $\phi_i(h)$. In particular, $\phi_i(h)$ reflects how individual i's utility is affected by the externality, where we

4
This property of positive externalities is very similar to public goods, where larger contributions of other
individuals to the public good increase every individual's utility level (because of the non-rivalry property). We
return to the connection between positive externalities and public goods below.
5
Externalities can also arise in production. For a worked-out example, see Example 19.1 in NS.

assume that $\phi_i'(h)\geq 0$ and that $\phi_i''(h)<0$ (indicating that the first derivative $\phi_i'(h)$ is decreasing in h), as the
following figure illustrates.

Figure 8.1
Hence, individual i obtains a positive and significant additional benefit from the first unit of the
externality-generating activity, but the additional benefit becomes lower as the amount of the activity
increases. This could be the case, for instance, of a firm generating pollution as a side-effect of its
production process.
Competitive equilibrium: In the competitive equilibrium, every individual independently chooses the level
of the externality-generating activity, h, that solves the UMP
$$\max_{h\geq 0}\;\phi_i(h)+w_i$$
Taking FOC with respect to h, we obtain
$$\phi_i'(h^*)\leq 0,\quad\text{with equality if } h^*>0\text{ (interior)}$$
This result is graphically represented (for an interior solution) in the following figure, where individual i
increases the externality-generating activity until the marginal benefits he would obtain from an
additional unit (net of marginal costs) are exactly zero, at h*.


Figure 8.2
Pareto optimal: In contrast, the social planner selects the level of h that maximizes social welfare, that is
$$\max_{h\geq 0}\;\phi_1(h)+\phi_2(h)$$
The first-order condition for a maximum is:
$$\phi_1'(h^o)+\phi_2'(h^o)\leq 0,\quad\text{with equality if } h^o>0$$
where $h^o$ is the Pareto optimal amount of h.
That is
$$\phi_1'(h^o)\leq-\phi_2'(h^o),\quad\text{or}\quad\phi_1'(h^o)=-\phi_2'(h^o)\text{ in the case of interior solutions.}$$
Intuitively, this condition states that, at an interior solution, the marginal benefit that consumer 1 obtains
from an additional unit of the externality-generating activity, $\phi_1'(h^o)$, must be equal to the marginal damage
that such a unit imposes on consumer 2, $-\phi_2'(h^o)$, as the following figure depicts.



Figure 8.3
Importantly, note that the above figure represents the case of a negative externality, i.e., $\phi_2'(h)<0$, so that h is
a bad for consumer 2 (e.g., loud music). In this case $h^*>h^o$, i.e., too much of the externality h
is produced.
If, in contrast, the activities of consumer 1 produce a positive externality on consumer 2's well-being
(the smell of baking bread or the beautification of a yard), i.e., $\phi_2'(h)>0$, then $h^*<h^o$ (i.e., there is an underproduction
of the externality-generating activity), as the following figure illustrates.


Figure 8.4
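To make the comparison between h* and the Pareto optimal level concrete, the sketch below solves both first-order conditions numerically. The functional forms are assumptions chosen only for illustration (they do not come from the notes): phi_1(h)=12h-h^2 for the generator of the externality and phi_2(h)=-h^2/2 for the affected consumer, so the externality is negative.

```python
# Competitive vs. Pareto optimal level of a negative externality
# under assumed quadratic benefit functions (hypothetical).

def bisect_root(fn, lo, hi, tol=1e-10):
    """Root of a decreasing function fn with fn(lo) > 0 > fn(hi)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if fn(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def phi1_prime(h):
    """Marginal benefit of consumer 1, from the assumed phi_1(h) = 12h - h^2."""
    return 12 - 2 * h

def phi2_prime(h):
    """Marginal benefit of consumer 2, from the assumed phi_2(h) = -h^2 / 2."""
    return -h

# Competitive equilibrium: phi_1'(h*) = 0.
h_star = bisect_root(phi1_prime, 0.0, 20.0)
# Pareto optimum: phi_1'(h) + phi_2'(h) = 0.
h_opt = bisect_root(lambda h: phi1_prime(h) + phi2_prime(h), 0.0, 20.0)

print("competitive level h*:", round(h_star, 4))
print("Pareto optimal level:", round(h_opt, 4))
```

Because consumer 1 ignores the damage imposed on consumer 2, the competitive level exceeds the Pareto optimal one in this example.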
A couple of points are in order. First, negative externalities are not necessarily eliminated at the Pareto
optimal solution. Indeed, this would only occur in the extreme case in which the externality-generating
activity produces a sufficiently high damage for consumer 2 such that $-\phi_2'(0)>\phi_1'(0)$, so that the Pareto
optimal solution occurs at the corner where $h^o=0$. Second, in this example the quasilinearity
assumption eliminated the wealth effects. A natural question is what would happen if we do not assume
that consumers have quasilinear utility functions. This question is explored in exercises given in the
homework assignment
6
.

Solutions to the externality problem
There are two traditional approaches for solving externality problems:
1. Setting quotas (emission standards). If the social planner is perfectly informed about the benefits and
damages of the externality for all consumers, he can choose to set an emission standard banning production levels
higher than the Pareto optimal level $h^o$.

2. Imposing a tax on the externality-generating activity (Pigouvian taxation).
7
This policy sets a tax $t_h$
per unit of the externality-generating activity h. But what is the level of the tax $t_h$ that restores efficiency?
In order to answer this question, let us start by rewriting consumer 1's UMP including the tax, as
follows
$$\max_{h\geq 0}\;\phi_1(h)-t_hh$$
Taking FOCs with respect to h, we obtain

6
Exercise 24.1 from Varian, Microeconomic analysis, and exercise 11.B.4 in MWG
7
For additional references about this policy measure, see pp. 355-56 in MWG and Ch. 7 in Kolstad.

$$\phi_1'(h)-t_h\leq 0\iff\phi_1'(h)\leq t_h$$
which, in the case of interior solutions, implies $\phi_1'(h)=t_h$. We know that, at the optimal level $h^o$,
$\phi_1'(h^o)=-\phi_2'(h^o)$. Thus, setting $t_h=-\phi_2'(h^o)$ (which is positive) will lead consumer 1 to choose $h_t=h^o$,
implementing the social optimum; see the figure below.

Figure 8.5

Importantly, note that the tax produces a downward shift in the curve representing consumer 1's
marginal benefit from additional units of the externality-generating activity, as the next figure depicts.
The new marginal benefit curve then crosses the horizontal axis exactly at $h^o$, indicating
that after the tax is imposed consumer 1 voluntarily chooses a level of h that coincides with the Pareto
optimal level $h^o$.

Figure 8.6
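The tax that restores efficiency can be computed in a short example. The sketch below again relies on assumed functional forms that are not part of the notes, phi_1(h)=12h-h^2 and phi_2(h)=-h^2/2: it finds the Pareto optimal level, sets the tax equal to the marginal damage at that level, and checks the level that consumer 1 chooses once taxed.

```python
# Pigouvian tax under assumed quadratic benefit functions (hypothetical).

def bisect_root(fn, lo, hi, tol=1e-10):
    """Root of a decreasing function fn with fn(lo) > 0 > fn(hi)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if fn(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def phi1_prime(h):
    """Assumed marginal benefit of the generator: phi_1(h) = 12h - h^2."""
    return 12 - 2 * h

def phi2_prime(h):
    """Assumed marginal benefit of the affected consumer: phi_2(h) = -h^2 / 2."""
    return -h

# Pareto optimal level: phi_1'(h) + phi_2'(h) = 0.
h_opt = bisect_root(lambda h: phi1_prime(h) + phi2_prime(h), 0.0, 20.0)

# Tax equal to the marginal externality evaluated at the optimum.
tax = -phi2_prime(h_opt)

# Consumer 1's choice under the tax: phi_1'(h) = tax.
h_taxed = bisect_root(lambda h: phi1_prime(h) - tax, 0.0, 20.0)

print("tax per unit:", round(tax, 4))
print("level chosen under the tax:", round(h_taxed, 4))
```

The tax shifts consumer 1's marginal benefit curve down by exactly the marginal damage at the optimum, so his privately chosen level coincides with the planner's.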

Intuitively, note that the optimality-restoring tax $t_h$ is equal to the marginal externality at the optimal
solution. That is, it is equal to the amount of money that consumer 2 would be willing to pay
consumer 1 in order to reduce h slightly from its optimal level $h^o$. As suggested above, the tax $t_h$
induces consumer 1 to internalize the externality that he is causing to consumer 2. These types of
optimality-restoring taxes are referred to as Pigouvian taxes. Finally, note that in the case that such
negative externality is very substantial (and $h^o=0$), we need to impose a tax $t_h=-\phi_2'(0)$ or higher, as
described in the following figure.

Figure 8.7

All our previous discussion can also be extended to positive externalities. In particular, we similarly
set a tax $t_h=-\phi_2'(h^o)$. However, now $\phi_2'(h^o)>0$ (since further units of h increase consumer 2's welfare),
which implies that the tax is negative, $t_h=-\phi_2'(h^o)<0$, i.e., it is a subsidy to consumer 1 for every
unit of the positive externality h that he generates. Graphically, this per-unit subsidy produces an
upward shift in the curve representing the marginal benefits that consumer 1 obtains from increasing
the amount of the externality-generating activity, as the following figure represents. This implies that
consumer 1 has incentives to increase h beyond the competitive equilibrium level h* until the Pareto
optimal level $h^o$.



Figure 8.8

Some important points about Pigouvian taxation
8
:
a. A tax $t_h$ on the negative externality is equivalent to a subsidy inducing agents to reduce the
externality until the Pareto optimal level $h^o$. In particular, consider that the social planner sets
a subsidy $s_h=-\phi_2'(h^o)>0$ for every unit by which consumer 1's choice of h is below the equilibrium
level h*. Hence, consumer 1's UMP becomes
$$\max_{h\geq 0}\;\phi_1(h)+\underbrace{s_h(h^*-h)}_{\text{subsidy}}=\phi_1(h)-\underbrace{s_hh}_{\text{per unit tax}}+s_hh^*$$
Taking FOCs with respect to h, we obtain $\phi_1'(h)-s_h\leq 0$, i.e., $\phi_1'(h)\leq s_h$, where $s_h=t_h$. Importantly, this FOC
coincides with that under the Pigouvian taxation described above (taxing the negative
externality at a rate $t_h$), since the subsidy scheme amounts to that tax plus a lump-sum transfer of $s_hh^*$. Hence, a subsidy for the reduction
of the externality (combined with a lump-sum transfer $s_hh^*$) can exactly replicate the outcome
of the Pigouvian tax.
9
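The equivalence between the tax and the subsidy can be illustrated numerically. The sketch below assumes phi_1(h)=12h-h^2 together with h*=6 and a rate of 4 per unit, hypothetical values that are consistent with each other but do not come from the notes. It compares consumer 1's payoff under both schemes over a grid of levels of h.

```python
# Tax vs. subsidy for reducing the externality, under assumed values (hypothetical).

def phi1(h):
    """Assumed benefit function of the generator of the externality."""
    return 12 * h - h**2

H_STAR = 6.0   # level chosen without intervention, where phi_1'(h) = 0
T_H = 4.0      # per-unit tax on h
S_H = T_H      # per-unit subsidy for every unit of h below H_STAR

def payoff_tax(h):
    return phi1(h) - T_H * h

def payoff_subsidy(h):
    return phi1(h) + S_H * (H_STAR - h)

grid = [i / 100 for i in range(0, 1001)]   # h from 0 to 10 in steps of 0.01

h_under_tax = max(grid, key=payoff_tax)
h_under_subsidy = max(grid, key=payoff_subsidy)
payoff_gaps = [payoff_subsidy(h) - payoff_tax(h) for h in grid]

print("choice under tax:", h_under_tax)
print("choice under subsidy:", h_under_subsidy)
print("payoff gap (min, max):", min(payoff_gaps), max(payoff_gaps))
```

The two payoffs differ only by the constant S_H*H_STAR, which plays the role of the lump-sum transfer, so both schemes lead to the same choice of h.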


b. The Pigouvian tax levies a tax on the externality-generating activity (e.g., pollution) but not
on the output that generated such pollution. In this sense, the externality-generating activity is
directly taxed. If, instead, output were taxed, the firm would reduce output, which isn't
guaranteed to reduce pollution emissions.
10


c. The quota and the Pigouvian tax are equally effective under complete information, i.e., when the
social planner has accurate information about all agents' benefit and cost functions. This
might not be the case if governments lack relevant information about the benefits and costs of
the externality for consumers and firms.
11






Fostering bargaining over externalities

8
For a worked-out example on Pigovian Tax on Newsprint, see Example 19.2 in NS; detailed graphical
illustrations are given in Nechyba, Microeconomics, pp. 746-751.
9
Kolstad (pp. 124-128) expands on the equivalence between Pigouvian taxes and subsidies.
10
There is, however, one exception: if emissions bear a fixed monotonic relationship to the level of output, then
every unit of output generates a constant proportion (e.g., ) of emissions. Indeed, emissions can be measured in
such case by simply observing output, and a tax on output induces the firm to reduce output (and as a consequence
emissions) to its optimal level. Therefore, in this case imposing a direct tax on emissions or an indirect tax on output
would yield the same results in terms of total pollution. (One exercise in the homework assignment, MWG 11.B.5,
explores this possibility.)
11
See Kolstad for regulation under contexts of incomplete information.

In this subsection we examine a less intrusive approach to solving the externality problem, namely,
allowing bargaining between the parties generating and affected by the externality. That is, this
approach relies on the parties to negotiate a solution. The success of this system depends
on a clear assignment of property rights. Does consumer 1 have the right to produce the externality h? If so,
how much? Can consumer 2 prevent consumer 1 from producing the externality? The result is that, as long as
property rights are clearly assigned, the two parties will negotiate in such a way that the optimal level of
the externality-producing activity is implemented (known as the Coase Theorem
12
). Unlike the previous
solutions (quotas, taxes or subsidies), note that bargaining does not imply government intervention.
Let us first assume that we assign property rights to consumer 2 (the individual suffering the negative
externality), so that at the initial state no externality is generated, i.e., h=0. We refer to this state as the
externality-free environment. In this context, consumer 1 (the polluter) must pay consumer 2 if he
wants to increase the externality above zero. In particular, let us assume that consumer 2 makes a take-it-or-leave-it
offer where consumer 1 pays T dollars in exchange for h units of pollution, i.e., in order to be
allowed by consumer 2 to produce h units of pollution. Specifically, consumer 1 agrees to pay $T to
consumer 2 (in order to pollute h units) if and only if

$$\phi_1(h)-T\geq\underbrace{\phi_1(0)}_{\text{current state}}$$
Given this constraint on the set of acceptable offers, consumer 2 will choose (h, T) in order to solve the
problem
$$\begin{aligned}
\max_{h\geq 0,\,T}\;&\phi_2(h)+T\\
\text{s.t.}\;&\phi_1(h)-T\geq\phi_1(0)
\end{aligned}$$
Note that the constraint of the UMP is binding (holding with equality) since consumer 2 will raise the fixed
fee $T he charges to consumer 1 until the point where consumer 1 is made indifferent between accepting
and rejecting such offer. That is,
$$\phi_1(h)-T=\phi_1(0)\;\Longrightarrow\;T=\phi_1(h)-\phi_1(0)$$
Plugging this result into consumer 2's UMP, we obtain
$$\max_{h\geq 0}\;\phi_2(h)+\underbrace{\phi_1(h)-\phi_1(0)}_{T}$$
and taking first order conditions with respect to h,
$$\phi_2'(h)+\phi_1'(h)\leq 0\iff\phi_1'(h)\leq-\phi_2'(h)$$
Importantly, this first order condition coincides with that solving the social planner's problem. Therefore,
the level of the externality h is set at the optimal level $h=h^o$. The following figure illustrates this result. In
particular, starting from an initial state where h=0 (externality-free environment), the above result shows

12
The Coase Theorem states that, "regardless of how property rights are assigned with an externality, the allocation
of resources will be efficient when the parties can costlessly bargain with each other," Besanko, 2005, p. 653.

that consumer 1 (the polluter) is willing to pay $T to consumer 2 in order to increase pollution until
$h=h^o$.
13


Figure 8.9
What happens if instead the property rights are assigned to the polluter? First, note that if there is no
bargaining between consumers 1 and 2, consumer 1 would pollute until his marginal benefits are zero, i.e.,
h=h*. However, consumer 2 can pay $T to consumer 1 in exchange for a lower level of pollution, h,
where h is reduced from h*. Note that consumer 1 is willing to take this offer if and only if
$$\phi_1(h)+T\geq\underbrace{\phi_1(h^*)}_{\text{current state}}$$
Hence, consumer 2's UMP becomes
$$\begin{aligned}
\max_{h\geq 0,\,T}\;&\phi_2(h)-T\\
\text{s.t.}\;&\phi_1(h)+T\geq\phi_1(h^*)
\end{aligned}$$
(Note that the fee $T now enters negatively into consumer 2's utility, but positively into consumer 1's,
unlike in the previous case, where property rights were assigned to consumer 2.) Similarly as in our
previous discussion, consumer 2 reduces the offer T until the point where consumer 1 is indifferent
between accepting and rejecting the offer T. That is,

13
Note that the polluter does not have incentives to raise pollution beyond $h^o$, since the payment he would have to
make to consumer 2 (in order to compensate him for his marginal costs) is above the marginal benefit the polluter
obtains from additional units of the externality.

$$\phi_1(h)+T=\phi_1(h^*)\;\Longrightarrow\;T=\phi_1(h^*)-\phi_1(h)$$
Inserting this result into consumer 2's UMP, we obtain
$$\max_{h\geq 0}\;\phi_2(h)-\underbrace{\left[\phi_1(h^*)-\phi_1(h)\right]}_{T}$$
Taking first order conditions with respect to h, we obtain
$$\phi_2'(h)+\phi_1'(h)\leq 0\iff\phi_1'(h)\leq-\phi_2'(h)$$
which again coincides with the first order conditions at the optimal level of the externality (social
planner's problem), where $h=h^o$. The following figure depicts the voluntary reduction of the externality
associated to the bargaining process. Specifically, starting from an initial situation where h=h*, consumer
2 pays $T to consumer 1 in order to reduce pollution until $h=h^o$.
14


Figure 8.10
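The two bargaining scenarios can be compared in a small numerical sketch. The functional forms are assumptions made only for illustration, phi_1(h)=12h-h^2 and phi_2(h)=-h^2/2, for which the first-order conditions give h*=6 and a Pareto optimal level of 4. Utilities are reported net of wealth.

```python
# Bargaining outcomes under both assignments of property rights,
# with assumed quadratic benefit functions (hypothetical).

def phi1(h):
    return 12 * h - h**2       # polluter's benefit

def phi2(h):
    return -h**2 / 2           # damage suffered by consumer 2 (negative benefit)

H_STAR = 6.0   # solves phi_1'(h) = 12 - 2h = 0
H_OPT = 4.0    # solves phi_1'(h) + phi_2'(h) = 12 - 3h = 0

# Rights assigned to consumer 2: consumer 1 pays T to move from h = 0 to H_OPT.
T_rights_2 = phi1(H_OPT) - phi1(0)
u1_rights_2 = phi1(H_OPT) - T_rights_2
u2_rights_2 = phi2(H_OPT) + T_rights_2

# Rights assigned to consumer 1: consumer 2 pays T to move from H_STAR to H_OPT.
T_rights_1 = phi1(H_STAR) - phi1(H_OPT)
u1_rights_1 = phi1(H_OPT) + T_rights_1
u2_rights_1 = phi2(H_OPT) - T_rights_1

print("rights to consumer 2:", T_rights_2, u1_rights_2, u2_rights_2)
print("rights to consumer 1:", T_rights_1, u1_rights_1, u2_rights_1)
```

In both scenarios the externality ends at the same level and aggregate welfare is identical; only the transfer, and therefore the distribution of utility between the two consumers, changes with the assignment of property rights.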
We have just shown that, regardless of the initial assignment of property rights over the externality-generating
activity, agents can negotiate the increase or reduction of the externality level until reaching the Pareto
optimal level. This result is usually referred to as the Coase Theorem, and we present it below.

14
Note that consumer 2 is not willing to reduce pollution below $h^o$, since he would have to compensate consumer 1
for his relatively high marginal benefits. Since consumer 2's marginal cost of additional units of pollution (for all
$h<h^o$) is lower than consumer 1's marginal benefits from such pollution, consumer 2 is not willing to further reduce
pollution below $h^o$. Note that this argument parallels our discussion of why agents do not agree to pollution levels
above $h^o$ when property rights were assigned to consumer 2.

Coase Theorem. If bargaining between the agents generating and affected by the externality is possible,
then the initial allocation of property rights does not affect the level of the externality. In particular, the
externality is finally set at the optimal level $h=h^o$.
15

Nonetheless, the allocation of property rights affects the final wealth of the two agents:
1. If property rights are assigned to consumer 2 (the individual affected by the externality),
consumer 1 must pay $T=\phi_1(h^o)-\phi_1(0)$ to consumer 2.
Indeed, if property rights are allocated to consumer 2, consumer 2's utility is
$$\phi_2(h^o)+T=\phi_2(h^o)+\underbrace{\phi_1(h^o)-\phi_1(0)}_{T}$$
while that of consumer 1 is
$$\phi_1(h^o)-T=\phi_1(h^o)-\left(\phi_1(h^o)-\phi_1(0)\right)=\phi_1(0)$$
Hence, consumer 2's utility is higher than that of consumer 1 if
$$\phi_2(h^o)+\phi_1(h^o)-\phi_1(0)>\phi_1(0)\iff\phi_1(h^o)+\phi_2(h^o)>2\phi_1(0)$$
2. If, instead, property rights are assigned to consumer 1 (the polluter), consumer 2 must pay
$T = \phi_1(h^*) - \phi_1(h^o)$ to consumer 1.
Indeed, if property rights are allocated to consumer 1, consumer 1's utility is
\[ \phi_1(h^o) + T = \phi_1(h^o) + \phi_1(h^*) - \phi_1(h^o) = \phi_1(h^*) \]
while that of consumer 2 is
\[ \phi_2(h^o) - T = \phi_2(h^o) - \left[ \phi_1(h^*) - \phi_1(h^o) \right] \]
Hence, consumer 1's utility is higher than that of consumer 2 if
\[ \phi_1(h^*) > \phi_2(h^o) - \phi_1(h^*) + \phi_1(h^o) \iff 2\phi_1(h^*) > \phi_1(h^o) + \phi_2(h^o) \]

Therefore, the agent holding the property rights has a total utility higher than the agent without them if
\[ 2\phi_1(h^*) > \underbrace{\phi_1(h^o) + \phi_2(h^o)}_{\text{Aggregate welfare at the Pareto optimum}} > 2\phi_1(0) \]
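To make the distributional point concrete, the following Python sketch evaluates both property-rights assignments under assumed quadratic forms, $\phi_1(h) = ah - h^2/2$ for the polluter and $\phi_2(h) = -ch^2/2$ for the affected consumer. These functional forms, the parameter values and the function names are illustrative assumptions and are not taken from the notes. As in the discussion above, consumer 2 is the one making the take-it-or-leave-it offer in both cases.

```python
# Illustrative sketch only: the quadratic forms below are assumptions.

def phi1(h, a):
    """Polluter's benefit from externality level h (assumed form)."""
    return a * h - 0.5 * h ** 2


def phi2(h, c):
    """Affected consumer's utility from externality level h (assumed form)."""
    return -0.5 * c * h ** 2


def coase_outcomes(a=10.0, c=1.0):
    """Return (h_star, h_opt, welfare, rights_to_2, rights_to_1)."""
    h_star = a                # solves phi1'(h) = a - h = 0
    h_opt = a / (1.0 + c)     # solves phi1'(h) = -phi2'(h), i.e. a - h = c*h
    welfare = phi1(h_opt, a) + phi2(h_opt, c)

    # Rights held by consumer 2: bargaining starts at h = 0 and
    # consumer 1 pays T = phi1(h_opt) - phi1(0).
    t2 = phi1(h_opt, a) - phi1(0.0, a)
    rights_to_2 = {"T": t2,
                   "u1": phi1(h_opt, a) - t2,
                   "u2": phi2(h_opt, c) + t2}

    # Rights held by consumer 1: bargaining starts at h = h_star and
    # consumer 2 pays T = phi1(h_star) - phi1(h_opt).
    t1 = phi1(h_star, a) - phi1(h_opt, a)
    rights_to_1 = {"T": t1,
                   "u1": phi1(h_opt, a) + t1,
                   "u2": phi2(h_opt, c) - t1}
    return h_star, h_opt, welfare, rights_to_2, rights_to_1


if __name__ == "__main__":
    print(coase_outcomes())
```

With $a = 10$ and $c = 1$, the sketch gives $h^* = 10$ and $h^o = 5$. Utilities $(u_1, u_2)$ are $(0, 25)$ when consumer 2 holds the rights and $(50, -25)$ when consumer 1 holds them, so aggregate welfare is 25 in both cases while its distribution differs.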


Let us examine the distribution of utility levels before/after bargaining using a utility possibility set,
representing the distribution of utility levels $(u_1, u_2)$ among the two parties.

15
For an excellent discussion of the Coase theorem, see Kolstad chapter 6.


Figure 8.11
Point a denotes the case in which we assign property rights to consumer 2 (and the externality is initially
$h = 0$, the externality-free environment). In contrast, point b represents the case in which we assign
property rights to consumer 1 (and the externality is initially $h^*$). The take-it-or-leave-it offer
leads to point f in the first case and point e in the second case. In both cases, individual 2 exploits his
bargaining power, since he is the one making a take-it-or-leave-it offer to individual 1. If, instead, the
bargaining procedure were the opposite, so that individual 1 makes a take-it-or-leave-it offer to individual 2,
then individual 1 would be exploiting individual 2: point d (point c) would be reached after bargaining when
property rights are assigned to consumer 2 (consumer 1, respectively), as the following figure depicts.

Figure 8.12
Finally, note that other more complex bargaining procedures (allowing for offers and counteroffers during
multiple periods, as in game-theoretic models) would yield a more intermediate allocation of utility
levels, graphically represented in points along segment [f,d] (segment [e,c]) when property rights are
allocated to consumer 2 (consumer 1, respectively).
Let us finally emphasize some of the advantages and disadvantages of bargaining as a solution to the
problem, i.e., the Coase theorem. The main disadvantage of the Coase theorem is its assumption that
property rights must be perfectly defined. Otherwise, the agents might not know who they should bargain
with, and as a consequence the externality problem might never be solved. In addition, property rights
must be perfectly enforced, i.e., the level of h must be perfectly observable and measurable by both
parties. This might be technologically feasible for some types of externalities, but not others, especially
when several polluters might be responsible for the externality. Indeed, the above two assumptions
(perfectly defined and enforced property rights) are not satisfied in many externalities, which hampers the
possibility of using negotiations in order to solve the externality problem.
Nonetheless, if property rights are well defined and enforceable, the Coase theorem presents an important
advantage over other solutions to the externality problem such as taxes, subsidies or quotas. In particular,
only the parties involved must know the marginal benefits and costs associated with the externality, i.e., the
regulator does not need to know anything! However, note that this assumption is also relatively strong,
since the polluter must know the cost of the externality for the affected consumers, and similarly,
consumers must know by how much the profits of the firm increase as a result of higher emissions, i.e.,
the polluter's profit function.
16

Externalities as missing markets. An alternative way to interpret externalities is to consider
that an externality is a commodity that lacks a market in which it can be traded. Let us show that, if
externalities were a traded commodity, the level of externality produced in the economy would exactly coincide
with the Pareto optimal level $h = h^o$. Let us start by assuming well-defined property rights and a
competitive market for the right to engage in the externality-generating activity. In addition, let $p_h$ denote
the price of engaging in one unit of this activity. In this setting, consumer 1 (the polluter) decides how
many polluting rights to purchase, say $h_1$, by solving
\[ \max_{h_1 \ge 0} \; \phi_1(h_1) - p_h h_1 \]
and taking first order conditions with respect to $h_1$, we obtain
17

\[ \phi_1'(h_1) \le p_h, \quad \text{with equality if } h_1 > 0 \]
Similarly, consumer 2 (the individual affected by pollution) decides how many polluting rights to sell,
say $h_2$, by solving
\[ \max_{h_2 \ge 0} \; \phi_2(h_2) + p_h h_2 \]
where now the revenues from selling polluting rights, $p_h h_2$, enter positively into consumer 2's utility
function. Taking first order conditions with respect to $h_2$, we obtain
18


16
Note that if the two parties are firms (such as a fishery and a refinery) a form of bargaining could be the sale of
one firm to the other. This would imply a Pareto efficient level of the externality, since the now merged firm would
internalize the effects of pollution on the production process of the fishery.
17
In addition, note that second-order conditions are also satisfied since $\phi_1''(h_1) < 0$ by definition.
18
Note that in this case second-order conditions are also satisfied since $\phi_2''(h_2) < 0$ by definition.

\[ \phi_2'(h_2) + p_h \le 0, \quad \text{with equality if } h_2 > 0 \]
\[ \iff \; p_h \le -\phi_2'(h_2), \quad \text{with equality if } h_2 > 0 \]

In addition, at the competitive equilibrium, the market for polluting rights must clear. Hence, $h_1 = h_2 = h^{**}$,
and we must therefore have
\[ \phi_1'(h^{**}) \le p_h \le -\phi_2'(h^{**}) \]
or simply
\[ \phi_1'(h^{**}) \le -\phi_2'(h^{**}), \quad \text{with equality if } h^{**} > 0 \]
Importantly, this condition coincides with the first order conditions under the Pareto optimal level of the
externality $h^o$. Thus, the amount of polluting rights exchanged in this market for the externality-generating
activity, $h^{**}$, coincides with the socially optimal level, $h^{**} = h^o$, and the market price for the externality is
then
\[ p_h^* = \phi_1'(h^o) = -\phi_2'(h^o) \]
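The market-clearing argument above can be illustrated numerically. The sketch below assumes the quadratic forms $\phi_1(h) = ah - h^2/2$ and $\phi_2(h) = -ch^2/2$ (an assumption made only for illustration, as are the function names), derives each side's behavior from the first order conditions, and finds the clearing price by bisection.

```python
# Illustrative sketch only: quadratic forms and names are assumptions.

def demand_for_rights(p, a):
    """Consumer 1: phi1'(h1) = a - h1 <= p, with equality if h1 > 0."""
    return max(a - p, 0.0)


def supply_of_rights(p, c):
    """Consumer 2: p <= -phi2'(h2) = c*h2, with equality if h2 > 0."""
    return max(p / c, 0.0)


def clearing_price(a, c, tol=1e-10):
    """Bisection on excess demand for polluting rights over [0, a]."""
    lo, hi = 0.0, a  # excess demand is positive at p = 0, negative at p = a
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if demand_for_rights(mid, a) > supply_of_rights(mid, c):
            lo = mid   # price too low: rights are over-demanded
        else:
            hi = mid   # price too high (or exactly clearing)
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    a, c = 12.0, 2.0
    p = clearing_price(a, c)
    print(p, demand_for_rights(p, a), a / (1.0 + c))
```

With $a = 12$ and $c = 2$, the clearing price is approximately 8 and the quantity of rights traded is approximately 4, which equals $h^o = a/(1+c)$ under these assumed forms.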

Multilateral Externalities
In this subsection we extend our previous discussion to externalities that are generated by multiple parties
and felt by multiple parties. In particular, we will differentiate between depletable and non-depletable
externalities. Specifically, a depletable externality is one in which the experience of the externality by
one agent reduces the amount that will be felt by other agents. For instance, dumping of garbage on
people's property constitutes a depletable externality. Indeed, if an additional unit of garbage is dumped
on one property, that same unit cannot be dumped on other properties. That is, the externality is rival in
consumption and therefore shares the features of private goods. In contrast, a non-depletable externality
is one in which the amount of the externality experienced by one agent does not reduce the amount felt by
other agents. Examples of non-depletable externalities are pollution, global warming, etc. In particular,
this type of externality shares the characteristics of a public good (or, more precisely, a public bad) since
it is non-rival in consumption. Let us start by showing that in the case of depletable externalities the
amount of the externality produced under the competitive equilibrium is Pareto optimal
19
.
Depletable externalities. Consider a group of $I$ consumers and $J$ firms, both of them sufficiently large so
that none of them has any market power. Let $p$ denote a price vector of $L$ traded goods. Every firm $j$
generates an externality $h_j \ge 0$ with associated profit of $\pi_j(h_j)$. Every consumer $i$ experiences utility
$\phi_i(\tilde{h}_i)$ when the amount of externality he suffers is $\tilde{h}_i$. Note that, since we are dealing with a depletable
externality, the amount of externality suffered by individual $i$, $\tilde{h}_i$, is not experienced by any other
individual (rivalry in consumption). We assume that the above profit and utility functions are twice

19
Related exercises are given in the homework

differentiable and strictly concave, i.e., $\pi_j''(h_j) < 0$ and $\phi_i''(\tilde{h}_i) < 0$.
20
For simplicity, we analyze a negative externality, so that $\phi_i'(\tilde{h}_i) \le 0$, but a similar analysis can be
extended to positive externalities. First, note that at the
competitive equilibrium, every firm $j$ (polluter) chooses the level of $h_j$ that solves its PMP
\[ \max_{h_j \ge 0} \; \pi_j(h_j) \]
Taking first order conditions with respect to $h_j$, we obtain
\[ \pi_j'(h_j^*) \le 0, \quad \text{with equality if } h_j^* > 0 \]
In contrast, the Pareto optimal allocation of the externality involves choosing a profile describing the
externality received by every consumer, $\tilde{h}_1^o, \tilde{h}_2^o, \ldots, \tilde{h}_I^o$, and the externality produced by every firm,
$h_1^o, h_2^o, \ldots, h_J^o$, which solves
\[ \max_{(h_1, \ldots, h_J) \ge 0, \; (\tilde{h}_1, \ldots, \tilde{h}_I) \ge 0} \; \sum_{i=1}^{I} \phi_i(\tilde{h}_i) + \sum_{j=1}^{J} \pi_j(h_j) \quad \text{s.t.} \quad \sum_{i=1}^{I} \tilde{h}_i = \sum_{j=1}^{J} h_j \]

Note that the previous constraint reflects the depletability of the externality. Intuitively, if consumer $i$
experiences one more unit of the externality, the total amount of externality to be experienced by all other
consumers decreases by exactly one unit. The Lagrangian of this maximization problem is
\[ L = \sum_{i=1}^{I} \phi_i(\tilde{h}_i) + \sum_{j=1}^{J} \pi_j(h_j) - \mu \left( \sum_{i=1}^{I} \tilde{h}_i - \sum_{j=1}^{J} h_j \right) \]
Taking first order conditions with respect to $\tilde{h}_i$, we obtain
\[ \phi_i'(\tilde{h}_i^o) - \mu \le 0, \quad \text{with equality if } \tilde{h}_i^o > 0 \]
taking first order conditions with respect to $h_j$, we have
\[ \pi_j'(h_j^o) + \mu \le 0, \quad \text{with equality if } h_j^o > 0 \]
and taking first order conditions with respect to $\mu$, we obtain

20
Recall that intuitively, this implies that the firm's profit function is concave in the level of the externality, and the
marginal cost that consumers suffer from additional units of the externality is increasing in h, as depicted in all our
previous figures.

\[ \sum_{i=1}^{I} \tilde{h}_i^o = \sum_{j=1}^{J} h_j^o \]


Importantly, the previous three conditions resemble those we obtained in competitive markets. In
particular, conditions 10.D.3 to 10.D.5 for perfectly competitive markets establish that
21
\[ \phi_i'(x_i^*) - p^* \le 0, \quad \text{with equality if } x_i^* > 0 \qquad \text{(10.D.4)} \]
\[ -c_j'(q_j^*) + p^* \le 0, \quad \text{with equality if } q_j^* > 0 \qquad \text{(10.D.3)} \]
and
\[ \sum_{i=1}^{I} x_i^* = \sum_{j=1}^{J} q_j^* \]

Hence, we can conclude that if well-defined and enforceable property rights can be specified over the
externality, if the externality is depletable, and if the number of consumers and firms, $I$ and $J$, are
sufficiently large so that price taking is a reasonable assumption, then a competitive market for the
externality leads to the Pareto optimal allocation.




Multilateral externalities: non-depletable externalities
When the externality is non-depletable, the market alone is typically unable to produce an efficient
outcome. Let us now assume that the externality is completely non-rival in consumption. Hence, if all $J$
firms in the economy generate an aggregate amount of externality $\sum_{j=1}^{J} h_j$, every consumer suffers an
externality $\sum_{j=1}^{J} h_j$. In the competitive equilibrium, each firm increases its level of $h_j^*$ until the point where
$\pi_j'(h_j^*) = 0$, i.e., marginal benefits from further increases in the externality-generating activity are zero. In
contrast, any Pareto optimal allocation involves externality generation levels $(h_1^o, h_2^o, \ldots, h_J^o)$ that solve the
social planner's problem
\[ \max_{(h_1, h_2, \ldots, h_J) \ge 0} \; \sum_{i=1}^{I} \phi_i \left( \sum_{j=1}^{J} h_j \right) + \sum_{j=1}^{J} \pi_j(h_j) \]
Taking FOCs with respect to every $h_j$, we obtain
22


21
Note that the negative of the profit function can be viewed as the firm's cost function of producing the externality.

\[ \sum_{i=1}^{I} \phi_i' \left( \sum_{j=1}^{J} h_j^o \right) + \pi_j'(h_j^o) \le 0, \quad \text{with equality if } h_j^o > 0 \]

which exactly coincides with the optimality conditions for a public good (as shown in condition 11.C.1 in
MWG):
\[ \sum_{i=1}^{I} \phi_i'(q^o) - c'(q^o) \le 0, \quad \text{with equality if } q^o > 0 \]
where $q^o$ represents the total amount of public good provided at the optimum.
Therefore, $h_j^*$ does not necessarily coincide with $h_j^o$, and unlike in the case of depletable externalities
analyzed in the previous section, the introduction of a market for the externality will not lead to an
optimal outcome. Intuitively, the free-rider problem (common in public good contexts) emerges in
non-depletable externalities and, as a consequence, the equilibrium level of the negative externality exceeds its
optimal level (overproduction of the negative externality)
23
.
If the regulator possesses adequate information about firms' profit functions and consumers' damage
from the externality, however, it can achieve optimality using quotas or taxes.
1. Setting quotas. First, if the regulator uses quotas, the optimal externality level can be obtained by
setting a quota of $h_1^o$ for firm 1, $h_2^o$ for firm 2, etc.
2. Taxes. If, instead, the regulator uses taxes, the tax $t_h$ that he must impose per unit of externality
generated by every firm $j$ must be
\[ t_h = -\sum_{i=1}^{I} \phi_i' \left( \sum_{j=1}^{J} h_j^o \right) \]

Intuitively, the tax must be equal to the marginal cost (disutility) that the externality generates to
all consumers in the economy. It is easy to show that this tax induces every firm $j$ to voluntarily
choose the optimal externality level $h_j^o$. In particular, firm $j$'s PMP after the tax is imposed
becomes
\[ \max_{h_j \ge 0} \; \pi_j(h_j) - t_h h_j \]
Taking FOCs with respect to $h_j$, we obtain $\pi_j'(h_j) - t_h \le 0$. Therefore, the value of $t_h$ that
makes this FOC coincide with that of the social planner is
\[ t_h = -\sum_{i=1}^{I} \phi_i' \left( \sum_{j=1}^{J} h_j^o \right) \]
Indeed, in that case the FOC from the firm's PMP becomes

22
Second-order conditions are also satisfied since $\sum_{i=1}^{I} \phi_i'' \left( \sum_{j=1}^{J} h_j^o \right) + \pi_j''(h_j^o) < 0$.
23
Worked-out example 19.3 in NS illustrates the free-rider problem

\[ \pi_j'(h_j^o) + \sum_{i=1}^{I} \phi_i' \left( \sum_{j=1}^{J} h_j^o \right) \le 0, \quad \text{with equality if } h_j^o > 0 \]
which exactly coincides with the FOCs at the optimal level of the externality, $h_j^o$, we found
above.
3. Tradable Externality Permits. Regulators might instead use externality permits to solve the
externality problem. Every externality permit grants the right to generate one unit of the
externality. Suppose that the regulator chooses a number of total permits equal to the socially
optimal aggregate externality, $h^o$, i.e., $h^o = \sum_j h_j^o$. In particular, every firm $j$ receives $\bar{h}_j$
permits.
24

In addition, assume that there is a sufficiently large number of firms, so that they regard the
market price of externality permits as given (i.e., price taking assumption). Specifically, let $p_h^*$
denote the equilibrium price of these permits. Therefore, every firm $j$'s PMP now becomes
\[ \max_{h_j \ge 0} \; \pi_j(h_j) + p_h^* (\bar{h}_j - h_j) \]
where firm $j$ must pay a price $p_h^*$ for every permit it needs to buy in excess of its initial endowment
$\bar{h}_j$.
25
Taking first order conditions with respect to $h_j$, we obtain
26

\[ \pi_j'(h_j) - p_h^* \le 0, \quad \text{with equality if } h_j > 0 \]
In addition, if all $J$ firms are solving this PMP, we need the market clearing condition
$h^o = \sum_j h_j$. Given the above first order conditions for the $J$ firms and the market clearing
condition, efficiency is restored when the permit price is $p_h^* = -\sum_{i=1}^{I} \phi_i'(h^o)$. Indeed,
at this price, firm $j$'s FOCs become
\[ \pi_j'(h_j) + \sum_{i=1}^{I} \phi_i' \left( \sum_{j=1}^{J} h_j^o \right) \le 0, \quad \text{with equality if } h_j > 0 \]
which exactly coincides with the FOC that solves the social planner's problem. Therefore, every
firm $j$ is induced to voluntarily choose the optimal externality level $h_j = h_j^o$.
Interestingly, the advantage of tradable externality permits, relative to other policy instruments
such as quotas or taxes, is that government officials do not need as much information. In
particular, they only need data about the optimal level of pollution, $h^o$. This simply implies having
information about aggregate firms' profits (industry profits) and on consumers' damage from the

24
The particular procedure by which externality permits are assigned to firms is not explicitly described here, but it
could be done according to every firm's history of emissions, using an auction, etc. For a discussion of different
assignments of permits, see Kolstad.
25
Note that if the firm sells permits (because the firm doesn't need to use its initial $\bar{h}_j$ permits), profits increase,
while if the firm needs to buy further permits (beyond $\bar{h}_j$), profits decrease.
26
Note that second-order conditions are also satisfied since $\pi_j''(h_j) < 0$ by definition.

externality in aggregate terms, but not necessarily about individual firms' profit functions or
individual consumers' damage functions.
27
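The tax and permit results above can be checked numerically. The following sketch assumes quadratic profits $\pi_j(h_j) = a_j h_j - h_j^2/2$ and quadratic damages $\phi_i(H) = -d_i H^2/2$, where $H = \sum_j h_j$ is the aggregate externality. These forms, the parameter values and the function names are illustrative assumptions and are not taken from the notes. Under them, the planner's FOC is $a_j - h_j - DH = 0$ with $D = \sum_i d_i$, and a firm facing a per-unit charge (either the tax $t_h$ or the permit price $p_h^*$) chooses $h_j = a_j$ minus that charge.

```python
# Illustrative sketch only: quadratic forms and names are assumptions.

def planner_optimum(a, d):
    """Interior planner solution: a_j - h_j - D*H = 0 for every firm j."""
    n_firms, total_d, total_a = len(a), sum(d), sum(a)
    agg = total_a / (1.0 + n_firms * total_d)   # H^o from summing the FOCs
    h = [a_j - total_d * agg for a_j in a]
    if min(h) < 0:
        raise ValueError("corner solution: this sketch assumes an interior optimum")
    return h, agg


def pigouvian_tax(d, agg):
    """t_h = -sum_i phi_i'(H^o) = (sum_i d_i) * H^o under the assumed damages."""
    return sum(d) * agg


def firm_choice(a_j, unit_charge):
    """Firm's FOC: pi_j'(h_j) = a_j - h_j <= unit_charge, equality if h_j > 0."""
    return max(a_j - unit_charge, 0.0)


if __name__ == "__main__":
    a, d = [10.0, 14.0], [0.5, 0.5]
    h_opt, agg_opt = planner_optimum(a, d)
    tax = pigouvian_tax(d, agg_opt)
    print("planner:", h_opt, agg_opt)
    print("taxed firms:", [firm_choice(a_j, tax) for a_j in a])
    print("unregulated:", [firm_choice(a_j, 0.0) for a_j in a])
```

With $a = (10, 14)$ and $d = (0.5, 0.5)$, the planner chooses $(h_1^o, h_2^o) = (2, 6)$ and the required charge is 8. Firms facing that charge choose the same levels, whereas unregulated firms choose $(10, 14)$. Because a permit's initial endowment does not enter the marginal condition, the same function describes a firm's choice at a permit price of 8.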


Public goods
A good is a (pure) public good if, once produced, no one can be excluded from benefiting from its
availability and if the marginal cost of an additional consumer is zero. Therefore,
public goods are characterized by two properties: non-rivalry and non-excludability. First, non-rivalry
implies that the consumption of the good by one individual does not reduce the quantity available for
consumption to other individuals; equivalently, a good is non-rival if consumption of additional units of the good
involves zero social marginal costs of production. Second, non-excludability means that if the good is
provided, no consumer can be excluded from consuming it (or, more precisely, the cost of excluding
consumers from enjoying the good is extremely high). For example, national defense, mosquito control,
public parks, television and radio signals, and artwork are public goods. The following matrix represents a
taxonomy of four different types of goods:
                  Rivalrous                   Non-rivalrous
Excludable        Private good                Club good
Non-excludable    Common property resource    Public good

1. Private goods, e.g., an apple. These goods are rival in consumption since the consumption of the
good by one individual reduces the amount available to other individuals and excludable in
consumption, given that it is easy to exclude an individual who did not pay for the good;
2. Club goods, e.g., golf course. These goods are non-rival in consumption since the consumption
of the good by one individual does not reduce the amount available to other individuals
28
but
excludable in consumption, given that it is easy to exclude an individual who did not pay for the
good (e.g., asking for an entry fee);
3. Common property resources, e.g., fishing grounds. These goods are rival in consumption given
that the consumption of the good by one individual (e.g., fishery) reduces the amount of the good
available to other individuals (to other fisheries in the same area) but non-excludable, since the
costs of excluding additional vessels would be extremely high.
4. Public goods, e.g., national defense. These goods are both non-rival and non-excludable, as
described in our previous discussion.
Consider $I$ consumers, one public good $x$ and $L$ traded private goods. Every consumer $i$'s utility from the
consumption of $x$ units of the public good is $\phi_i(x)$, where note that $x$ does not have a subscript because of
non-excludability, i.e., the total amount of public good in the economy, $x$, is enjoyed not only by

27
This is a very active area of research, with models analyzing, for instance, how to design the initial distribution of
permits, what are the consequences of having a dominant firm in the industry that holds monopolistic power in their
purchases of externality permits, etc.
28
This property, of course, assumes that the number of users is sufficiently low so that no congestion effects emerge,
reducing the utility of previous users.

individual $i$ but also by all other individuals. We consider the case of a public good, where $\phi_i'(x) > 0$ for
every individual $i$.
29
In addition, assume that $\phi_i''(x) < 0$, which intuitively implies a decreasing marginal
utility from additional units of the public good. The following figure illustrates the marginal benefit from
the public good for individual $i$.

Figure 8.13
On the other hand, the cost of supplying $q$ units of the public good is $c(q)$, where $c'(q) > 0$ and $c''(q) > 0$ for
all $q$, i.e., costs of providing the public good are increasing and convex in $q$. The following figure depicts the cost
function.
30


Figure 8.14

29
Note that a public bad would imply $\phi_i'(x) < 0$ for every $i$.
30
Note that if we were describing a public bad, such as pollution, we would need $c'(q) < 0$ since reducing $q$ is costly,
but increasing $q$ is not costly.

Let us first find the Pareto optimal allocation. In particular, the social planner maximizes aggregate
surplus, as follows
\[ \max_{q \ge 0} \; \sum_{i=1}^{I} \phi_i(q) - c(q) \]

Taking first order conditions with respect to $q$, we obtain
\[ \sum_{i=1}^{I} \phi_i'(q^o) - c'(q^o) \le 0, \quad \text{with equality if } q^o > 0 \]
and the second order conditions are also satisfied, since
\[ \sum_{i=1}^{I} \phi_i''(q^o) - c''(q^o) \le 0 \]
In the case of an interior solution, the above first order conditions establish that the optimal level of public
good is achieved for the level $q^o$ such that
\[ \sum_{i=1}^{I} \phi_i'(q^o) = c'(q^o) \]

Intuitively, this condition implies that the social planner should increase the provision of a public good
until the point at which the sum of the consumers' marginal benefits from increasing the public good by
one more unit (also referred to as the marginal social benefit) is equal to its marginal cost. This condition is
commonly referred to as the Samuelson rule. Importantly, the Pareto optimality condition for public goods does not
coincide with that of private goods where, for interior solutions, every individual $i$ increases his
consumption of the private good until his marginal benefit is equal to the marginal cost, that is
\[ \phi_i'(q_i^*) = c_j'(q_j^*) \]
Inefficiency of private provision of public goods
Let us next show that the creation of a market in which every individual purchases units of the public
good does not eliminate the divergence between the Pareto optimal and the equilibrium amount of public
good. In particular, let us consider the case in which a market exists for the public good and each
consumer $i$ chooses how much of the public good to buy, denoted as $x_i \ge 0$ units, taking as given a market
price of $p$. The total amount of the public good purchased by all $I$ individuals is hence $x = \sum_{i=1}^{I} x_i$.
31
Consider a single producer of the public good (e.g., the federal government) with a cost function $c(q)$.
32


31
At this point it is important to start to think intuitively about the incentives of every consumer in this model: if you
knew that the amounts of public goods purchased by all other individuals in the society are nonrival (i.e., you can
benefit from them), how many units of the public good would you buy?
32
We could change this assumption in order to consider $J$ firms producing the public good, with an aggregate cost
function for the entire industry that exactly coincides with $c(q)$. [Note that we can do this because of the price taking
assumption, as we did in perfectly competitive markets.]

Formally, at the competitive equilibrium price $p^*$, each consumer $i$'s purchase of the public good $x_i^*$ must
solve
\[ \max_{x_i \ge 0} \; \phi_i \left( x_i + \sum_{k \ne i} x_k^* \right) - p^* x_i \]
Note that, when determining his purchases of the public good, individual $i$ takes the purchases of all the
other individuals as given, $\sum_{k \ne i} x_k^*$, and these purchases enter into his utility function because of the
non-excludability assumption. In this regard, other individuals' purchases are a form of positive
externality. Finally, note that consumer $i$ pays $p^* x_i$ when acquiring $x_i$ units of the public good. Taking first
order conditions with respect to $x_i$, we obtain
\[ \phi_i' \left( x_i^* + \sum_{k \ne i} x_k^* \right) - p^* \le 0, \quad \text{with equality if } x_i^* > 0 \]
For compactness, let $x^*$ denote the total purchases of the public good, so that $x^* = x_i^* + \sum_{k \ne i} x_k^*$. Hence,
\[ \phi_i'(x^*) - p^* \le 0, \quad \text{with equality if } x_i^* > 0 \]
On the other hand, the firm producing the public good must solve the PMP,
\[ \max_{q \ge 0} \; p^* q - c(q) \]
and taking first order conditions with respect to $q$, we obtain
\[ p^* - c'(q^*) \le 0, \quad \text{with equality if } q^* > 0 \]
Finally, the market clearing condition implies that the total amount of the public good produced coincides
with the amount consumed by all individuals, $q^* = x^*$. Combining the first order conditions for consumers
and the firm, we obtain, for any consumer $i$ with $x_i^* > 0$,
\[ \phi_i'(q^*) = c'(q^*) \text{ if } q^* > 0, \quad \text{and} \quad \phi_i'(q^*) < c'(q^*) \text{ if } q^* = 0 \]
The following figure illustrates the above expression for the case of interior solutions. Intuitively,
individual $i$ increases his consumption of the public good until the point at which his marginal benefit
from the public good equals the marginal cost.


Figure 8.15
If, in contrast, only a corner solution exists, the marginal cost of providing the first unit of the public good
is higher than the marginal benefit that individual i would obtain from such unit, as the next figure
depicts.


Figure 8.16

Recall that at the Pareto optimal allocation we must have $\sum_{i=1}^{I} \phi_i'(q^o) = c'(q^o)$. Graphically, this implies a
vertical summation of the marginal benefits that all individuals obtain from the public good.
33
This result is
graphically represented in the following figure, which shows that there is an underprovision of the
public good relative to the optimal allocation.

33
This is unlike the case of private goods, where, in order to obtain aggregate demand, we conducted a horizontal sum of
individual demands. In that case we found, for a given price $p$, how many units were demanded by all consumers in the
economy. In the case of public goods, in contrast, we find, for a given amount of the public good $q$, what is the
marginal social benefit that all individuals in the economy obtain.


Figure 8.17

Intuitively, individual $i$'s purchases of the public good benefit not only him but also all other individuals. In
other words, no individual has sufficient incentives to purchase additional amounts of the
public good, leading to the standard free-rider problem.
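The gap between the Samuelson rule and private provision can be seen in a small numerical sketch. It assumes $\phi_i(x) = b_i x - x^2/2$ and $c(q) = kq^2/2$, so that $\phi_i'(x) = b_i - x$ and $c'(q) = kq$. These functional forms, the parameter values and the function names are illustrative assumptions and are not taken from the notes. Under them, the equilibrium conditions above imply that only the consumer with the largest $b_i$ buys a positive amount, while every other consumer free-rides.

```python
# Illustrative sketch only: quadratic forms and names are assumptions.

def samuelson_level(b, k):
    """Solve sum_i (b_i - q) = k*q, the Samuelson rule under the assumed forms."""
    return sum(b) / (len(b) + k)


def private_provision_level(b, k):
    """Solve max_i (b_i - q) = k*q: only the highest-valuation consumer buys."""
    return max(b) / (1.0 + k)


def marginal_benefits(b, q):
    """phi_i'(q) = b_i - q for every consumer i."""
    return [b_i - q for b_i in b]


if __name__ == "__main__":
    b, k = [10.0, 12.0, 14.0], 3.0
    q_opt = samuelson_level(b, k)
    q_eq = private_provision_level(b, k)
    print("optimal q:", q_opt, "equilibrium q:", q_eq)
    print("equilibrium price:", k * q_eq, "marginal benefits:", marginal_benefits(b, q_eq))
```

With $b = (10, 12, 14)$ and $k = 3$, the optimal level is $q^o = 6$ while the market provides only $q^* = 3.5$. At $q^*$ the price is 10.5, which equals the marginal benefit of the third consumer and exceeds that of the other two (6.5 and 8.5), so they purchase nothing.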

Not included in these lecture notes:
1. Environmental policy under incomplete information,
2. Groves-Clarke mechanism applied to environmental policy,
3. Oligopoly models (an introduction).