
CHAPTER 7

TACIT COLLUSION IN REPEATED GAMES

________________________________________________________________________

In a repeated game (also called a super-game), a stage game (i.e., a constituent game) is
repeated more than once. Specifically, a repeated game is a multi-stage sequential-move game
in which a specific single-stage simultaneous-move game (along with its associated payoffs)
recurs over time, where the stage-specific simultaneous decisions are made and implemented
within each stage.
Many oligopoly and social interactions are appropriately modeled as super-games because in
such interactions a basic strategic conflict (price-setting, volume-setting, offering community
services, etc.) arises repeatedly over time. The important point to note is that such repetitions
can, under specific circumstances, enable players to sustain cooperative outcomes in a self-enforcing manner, i.e., without using any formal sanctions mechanism. [That is why such a
scenario is also referred to as sustenance of tacit (or, implicit) collusion.] In fact, unending
repetition of a similar strategic interaction often provides the best opportunity for players to
mitigate cooperation dilemmas, especially when they care about long-term rather than
immediate payoffs.
In a super-game, over and above a full specification of the stage game, we need to specify each player i's one-period discount factor δit for every period t. The interpretation of δit is as follows: Sitting in period (t−1), how does player i value receiving a payoff vit in the next repetition of the stage game, given the likelihood of such repetition? Two factors determine the answer: the player's one-period patience factor di ∈ (0, 1), and the probability pt ∈ (0, 1) that the stage game will be repeated in period t. Then player i's expected payoff for the next period is {δit·vit}, where δit = di·pt. Generalizing this construct, player i's present value of expected future payoffs evaluated in period (t−1) is: δit·[vit + δi,t+1{vi,t+1 + δi,t+2(vi,t+2 + …)}].
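To make this construct concrete, here is a minimal Python sketch (an illustration, not part of the original notes) for the special case of a constant patience factor d and a constant continuation probability p:

```python
def present_value(payoffs, d, p):
    """Present value, evaluated one period before the first payoff, of a
    stream of stage payoffs v_1, v_2, ..., when the one-period patience
    factor is d and each next repetition occurs with probability p.
    The effective one-period discount factor is delta = d * p."""
    delta = d * p
    return sum(v * delta ** (k + 1) for k, v in enumerate(payoffs))

# A constant payoff v forever is worth [delta/(1 - delta)] * v; a long
# truncated sum approximates this closely since the tail is geometric:
d, p, v = 0.7, 0.7, 1.0
delta = d * p
approx = present_value([v] * 500, d, p)
exact = delta / (1 - delta) * v
```

With d = p = 0.7 (the values used in the Cola example below), delta = 0.49 and the truncated sum agrees with the closed form to many decimal places.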
A finitely repeated game is one where pt > 0 for all t < T, and pt = 0 for all t ≥ T, for some finite T. On the other hand, pt > 0 for all t in an infinitely repeated game. In what follows, we will discuss the possibility of sustaining Pareto-superior outcomes in repeated cooperation dilemmas that have unique (Pareto-inferior) stage-game Nash equilibria. In our analysis, we will maintain the following simplifying assumptions: (i) di = dj = d ∈ (0, 1) for all players i and j, and (ii) pt = p ∈ (0, 1) for all t whenever pt > 0.
The fact that a finitely repeated game has a commonly known last period of interaction T
implies the following result, which follows from backward induction: When the stage game has
a unique Nash equilibrium, the finitely repeated game has a unique subgame-perfect Nash
equilibrium that involves repeated play of the stage game equilibrium strategies. It is important
to note that this stark result is routinely violated in laboratory experiments of the finitely repeated Prisoner's Dilemma game, and that fact highlights the backward induction paradox.
GAME THEORY NOTES: CHAPTER 7

ARIJIT SEN

[In a Prisoner's Dilemma game repeated 100 times, what does a player infer when her rival plays "cooperate" in the first five rounds? There has been an attempt to explain initial cooperative behaviour in finitely repeated cooperation dilemmas by positing an information structure where each player believes ex ante that there is a tiny probability that her rival is
genetically predisposed towards cooperation. It is observed that such a small perturbation of players' belief structures can generate significant initial cooperation as an equilibrium
phenomenon.]
In contrast, the following result holds for infinitely repeated games: While infinite repetition of the stage-game equilibrium strategies is certainly one subgame-perfect Nash equilibrium (SPNE) of an infinitely repeated game, more cooperative/collusive outcomes can be sustained as SPNE outcomes whenever the players' discount factors δit = di·pt are close enough to unity for all i and for all t.
7.1 Sustaining Self-enforcing Cooperation with Nash-reversion Threats
One way to understand how self-enforcing cooperation might be achieved in infinitely repeated
cooperation dilemmas is to view each player's strategy as a two-part rule: one part of the rule prescribes the player's behaviour in the cooperative phase of the super-game, while the other part prescribes punishment paths for deviation/cheating by any player during the cooperative phase, or from an ongoing punishment path.
Given this structure of players' strategies, an obvious way to sustain self-enforcing cooperation in an infinite super-game is to use the perpetual Nash-reversion threat: each player's strategy
specifies that cheating in the cooperative phase by any player will lead to every player playing
her unique stage-game Nash strategies forever after.
Let us study the level of collusion that such strategies can sustain in an infinitely repeated
version of the following Bertrand Duopoly game that has been studied earlier (see Chapter 5,
Section 5.2): In the town of Happy-Jolly, two companies make carbonated soft-drinks: Happy-Cola and Jolly-Cola. The two firms have identical annual cost functions: total costs = 10,000 + 8·(crates of Cola produced). The symmetric demand structure for the two drinks in the town is: (i) annual demand for Happy-Cola = 4400 − 200·PHappy + 100·PJolly, and (ii) annual demand for Jolly-Cola = 4400 − 200·PJolly + 100·PHappy. The firms are required to set prices
simultaneously on January 1st, and cannot change these prices within the year (and they cannot
carry forward inventory across years).
Consider the following modifications to the Bertrand game: The two firms play their price-setting game repeatedly every year. They set annual prices at the beginning of the year, and cannot change them within the year. Every year, they know that there is a 70% chance that the game will be played next year. The annual patience factor for each firm is 0.7 [i.e., one dollar a year from now is valued the same as 70 cents today]. Consequently, the annual discount
factor for each firm is 0.49.
The firms realize that if they competed only for a year, they would charge the Nash equilibrium
prices {20, 20}. Given their objective is to sustain a more collusive price vector, let us start by
considering a modest level of collusion where the firms aim to sustain the outcome of each
charging $24 per crate of cola every year. Consider the following strategy adopted by each firm:
"I will start by setting my price at $24, and will continue to do so as long as my rival has charged $24 in the previous year; if my rival charges any other price in any year, I will revert to charging $20 thereafter and forever."


To determine whether such a two-part strategy vector sustains the desired level of collusion as
an SPNE outcome, we calculate the present value payoff to each player if collusion lasts forever.
In our example, that amount is {[1/(1−δ)]·22,000} for each firm. Next, we ask: Does any one
firm want to cheat in the collusive phase under the assumption that the other firm plays the
prescribed strategy after all contingencies?
As cheating can take one of many forms, answering the above question might seem a difficult
task. What makes the job easy is the one-period deviation principle that holds for every SPNE
in any sequential-move game: To check whether cheating is beneficial for a player, the only
deviation that needs to be considered is one where the player plays a non-equilibrium strategy in
any one period and then returns to playing her prescribed strategy thereafter and ever after.
Using this principle, we determine the present value payoff to any one Cola company from an
optimal one-period deviation; the optimality is with regard to finding the best way to cheat in
that one period. In our example, if Happy Cola wants to cheat in any particular year, it should
follow its static best-response function and set a price of $21. Then its present value payoff
from the one-period deviation will be: {23,800 + [δ/(1−δ)]·18,800}. An identical argument
holds for Jolly Cola.
Putting the two steps together, we conclude that the collusive outcome of charging $24 every
year can be achieved as an SPNE if and only if the following collusion constraint common to
both firms in our symmetric Bertrand game is satisfied:
[1/(1−δ)]·22,000 > 23,800 + [δ/(1−δ)]·18,800.
Solving this inequality, we determine that equilibrium self-enforcing collusion on {24, 24} can be sustained whenever δ is greater than the critical value (9/25). As we have assumed δ = 0.49 in our example, both firms charging the collusive price of $24 per crate of cola per year can be
sustained as an SPNE outcome.
Next, suppose that our two Cola companies get more ambitious. Realizing that they care
sufficiently about the future to be able to sustain the collusive price of $24, let them attempt to
sustain the optimally collusive price vector {26, 26} by each of them playing the appropriate
Nash reversion strategy.
Following the above logic, the collusion constraint for each firm now changes to:
[1/(1−δ)]·22,400 > 26,450 + [δ/(1−δ)]·18,800.
This inequality is satisfied if and only if δ is greater than (9/17). Consequently, we find that the
Cola companies do not care enough about the future in order to be able to sustain the maximal
collusion level by employing Nash-reversion threats.
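Both critical values are easy to check numerically. The sketch below (an illustration, not part of the original notes) recomputes the stage profits from the demand and cost data above, finds the optimal one-period deviation by a grid search over prices, and recovers the critical discount factors 9/25 and 9/17:

```python
def profit(p_own, p_rival):
    """Annual profit of one Cola firm: demand = 4400 - 200*(own price)
    + 100*(rival's price), unit cost 8, fixed cost 10,000."""
    q = 4400 - 200 * p_own + 100 * p_rival
    return (p_own - 8) * q - 10_000

def critical_delta(target):
    """Smallest discount factor sustaining the symmetric price vector
    {target, target} under perpetual reversion to the Nash prices {20, 20}."""
    collusive = profit(target, target)
    nash = profit(20, 20)
    # optimal one-period deviation against a rival holding the target price
    # (grid search in one-cent steps is just an illustrative device)
    deviation = max(profit(p / 100, target) for p in range(800, 4000))
    # collusion constraint: collusive/(1-d) >= deviation + [d/(1-d)]*nash,
    # which rearranges to d >= (deviation - collusive)/(deviation - nash)
    return (deviation - collusive) / (deviation - nash)

print(critical_delta(24))   # 0.36      (= 9/25)
print(critical_delta(26))   # 0.529...  (= 9/17)
```

For the $24 target the best deviation is the static best response of $21 earning 23,800, and for $26 it is $21.50 earning 26,450, matching the figures in the text.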
By studying diverse oligopoly interactions, we can establish the following results about the
possibility of achieving sustained cooperation/collusion in infinitely repeated cooperation
dilemmas:
A specific target collusive outcome can be supported as an SPNE if and only if the players' discount factors are greater than a critical value. Further, when a particular collusive level can be achieved for a given δ, all less collusive outcomes can also be achieved for that δ.
A higher discount factor enables attainment of a more cooperative outcome as equilibrium. Specifically, maximal cooperation can be achieved when δ is arbitrarily close to unity.

If the players are asymmetric in that for every symmetric strategy vector in the constituent
game one player gets a higher payoff than another, then the latter player generally has a greater
incentive to defect from a proposed collusive strategy than the former, and thus needs to care
more about the future in order for the collusive outcome to be sustained as an SPNE.
Studying alternative oligopoly super-games will also lead one to recognize the robustness of the
following results regarding sustenance of self-enforcing collusion:
As the number of players increases, self-enforcing collusion becomes harder to sustain for any given value of the common discount factor δ < 1. The simplest oligopoly super-game that exhibits this property is an infinitely repeated Bertrand oligopoly among N firms that sell a homogeneous product under constant unit costs. In this game, the present value payoff to each firm from colluding on the monopoly price is [1/(1−δ)]·[(1/N)th fraction of monopoly profits], while the return from one-period price undercutting under a perpetual Nash-reversion threat is [a bit less than monopoly profits] + [δ/(1−δ)]·[nothing]. Thus, optimal collusion can be sustained if and only if δ is at or above the critical value {1 − (1/N)}, which rises monotonically towards unity as N grows.
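The N-firm constraint can be verified directly. In the following sketch (illustrative, with monopoly profit normalized to 1), the constraint holds exactly when δ is at or above 1 − 1/N:

```python
def collusion_holds(delta, n, pi=1.0):
    """N-firm Bertrand collusion constraint under perpetual Nash reversion:
    the PV of a (1/n) share of monopoly profit pi forever must weakly
    exceed grabbing (essentially) all of pi once and earning zero after."""
    return (pi / n) / (1 - delta) >= pi

# The constraint binds exactly at the critical value 1 - 1/N:
for n in (2, 3, 5, 10):
    critical = 1 - 1 / n
    assert collusion_holds(critical, n)
    assert not collusion_holds(critical - 1e-6, n)
    print(n, critical)   # the critical value rises toward 1 as N grows
```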
For a given number of players and a given common discount factor δ < 1, consider the case where the players are involved in M cooperation dilemmas in every period. Then sustaining collusion in all M interactions might be easier when M is large than when it is small. Consider,
for instance, two duopolists who repeatedly sell a homogeneous product in two independent
markets. In that case, cheating on a collusive outcome in any market can be punished by
perpetual Nash reversion in both markets. Such an across-the-board punishment can help
sustain collusion in both markets in situations where collusion in an individual market cannot be
sustained independently.
Note: The discussion in Sections 7.1 and 7.2 regarding the sustenance of tacit collusion in infinitely repeated games ignores the issues of renegotiation-proofness and coalition-proofness of proposed SPNEs. These issues will be discussed in Sections 7.3 and 7.4, which will be included in the second part of these Notes.
7.2 A General Theory of Tacit Collusion in Infinite Cooperation Dilemmas
Recognize that perpetual Nash reversion is only one of many potential punishment strategies that might succeed in the self-enforcing sustenance of collusive outcomes for any given δ. Our previous discussions have left the following questions unanswered:
(i) When Nash-reversion threats are effective, can we identify other punishment schemes that also sustain cooperation? If so, are there specific criteria by which we can compare the desirability of alternative punishment schemes?
(ii) When Nash-reversion threats are ineffective, can we find more stringent punishment schemes that sustain cooperative outcomes for the given δ?
In what follows, we address these questions and provide some answers with varying levels of generality and precision. [As mentioned above, we defer discussion on renegotiation-proofness and coalition-proofness of proposed SPNEs to Sections 7.3 and 7.4.]
Consider the following High-Beam game between two east-bound and west-bound drivers on a
road.

                         East-bound
                    cooperate      not
West-    cooperate    7, 7        0, 10
bound    not         10, 0        2, 2
Suppose that the two drivers expect this game to be played repeatedly over time, with the probability of another interaction being 50%, and with the drivers' per-period patience factor being 80%, so that δ = 0.4.
In this game, an attempt to sustain cooperation using Nash-reversion threats implies the following collusion constraint for each driver: [1/(1−δ)]·7 > 10 + [δ/(1−δ)]·2. This inequality is satisfied if and only if δ > 3/8.
However, in many experiments of such Prisoner's Dilemma super-games, an alternative strategy has been commonly employed by players in an attempt to achieve cooperation, namely, the tit-for-tat strategy. The best-known example of this is the set of experiments conducted by Robert Axelrod; they are reported in his now-famous book The Evolution of Cooperation. The tit-for-tat strategy specifies that a player will begin by cooperating, and will then take the action which her rival took in the previous period. [Note that tit-for-tat does not have the two explicit phases of cooperation and punishment that a perpetual Nash-reversion strategy has.]
Following the one-period deviation principle to check for the sustainability of cooperation, the collusion constraint for each player under the tit-for-tat strategy is:
[1/(1−δ)]·7 > [1/(1−δ²)]·10.
Note that the above inequality is satisfied if and only if δ > 3/7.
So, given δ = 0.4 in our example, we conclude that while the Nash-reversion strategy will succeed in sustaining self-enforcing cooperation, the tit-for-tat strategy will not. Alternatively, if it were the case that the two drivers expected the High-Beam game to be played repeatedly with the probability of another interaction being 55% (so that δ = 0.44), then both the Nash-reversion strategy and the tit-for-tat strategy would achieve sustained cooperation as an SPNE outcome.
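The two thresholds can be compared numerically. In this sketch (illustrative, using the stage payoffs 7, 10, 2, 0 from the matrix above), each function reports whether the corresponding collusion constraint holds:

```python
def grim_ok(delta, coop=7, temptation=10, nash=2):
    """Perpetual Nash-reversion constraint in the High-Beam game:
    coop/(1-d) >= temptation + [d/(1-d)]*nash, i.e. d >= 3/8."""
    return coop / (1 - delta) >= temptation + delta / (1 - delta) * nash

def tit_for_tat_ok(delta, coop=7, temptation=10):
    """Tit-for-tat constraint: a one-shot defection triggers the
    alternating path 10, 0, 10, 0, ..., worth 10/(1-d^2), so we need
    coop/(1-d) >= temptation/(1-d^2), i.e. d >= 3/7."""
    return coop / (1 - delta) >= temptation / (1 - delta ** 2)

print(grim_ok(0.40), tit_for_tat_ok(0.40))   # True False: only grim works
print(grim_ok(0.44), tit_for_tat_ok(0.44))   # True True: both work
```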
When both tit-for-tat and perpetual Nash reversion strategies succeed in sustaining cooperative
outcomes, can a case be made that the former is a better strategy than the latter? Some analysts
have made this claim on the basis of the fact that after an instance of cheating, the tit-for-tat
strategy allows the players the possibility of re-attaining cooperation after just one period of
punishments. There is certainly some merit to this argument, especially in stochastic
environments where non-cheating might be mistakenly interpreted as cheating. [Think of pricecollusion among consultancy firms where one firm has to infer the extent of price discounts
offered by another by the number of clients lost. But clients might be lost for reasons other than
price discounts offered by rivals.]
Note that perpetual Nash reversion might be an overkill in sustaining cooperation when the players' discount factors are sufficiently high (e.g., if δ = 0.8 in the repeated High-Beam game). In that case, the players might consider the following temporary Nash-reversion strategy: "I will start by cooperating, and will continue to do so as long as my rival has cooperated in the previous play; if my rival does not cooperate in any period, I will not cooperate for the next τ periods, and then will restart the original profile." With a judicious choice of the length of the
punishment phase τ, it will be possible for the players to sustain cooperation. In this case, the Nash-reversion strategy also becomes eventually forgiving (as is the tit-for-tat strategy). In the parlance of Game Theory, the perpetual Nash-reversion strategy is an example of a grim strategy, while the temporary Nash-reversion strategy is an example of a trigger strategy.
Aside: When a particular punishment strategy fails to sustain collusion, our arguments do not imply that the best response to a rival's strategy is to cheat only once. To be specific, consider a trigger strategy that imposes Nash reversion for ten periods. There, if not cooperating once in period t and then playing according to the trigger strategy subsequently is profitable, then the following strategy is necessarily more profitable against the rival's trigger strategy: cheat in the current period t; survive the ten-period punishment; cheat again in period (t+11); and so on. This result in no way violates the one-period deviation principle, because that principle is used only for checking whether or not a proposed strategy vector is an SPNE.
Let us now turn to the second question posed at the beginning of this section: When the players aim to achieve the best collusive outcome that is possible given their common discount factor δ, how should they determine their most effective punishment strategies, or their "optimal penal codes" in Dilip Abreu's terminology? Abreu's doctoral research at Princeton University in the early 1980s provides a set of definitive answers to that question. He establishes that in an infinite super-game among N players with a common discount factor δ, there exist (under plausible conditions) a set of N subgame-perfect punishment paths {P1, …, PN}, one tailored for each player, such that if any player i deviates from a pre-specified cooperation path C, or a punishment path Pj, the punishment path Pi holds player i to her worst present value payoff among all feasible subgame-perfect punishment paths. It is in this sense that Pi is the optimal penal code for a deviant player i.
Abreu proves that to check whether a specific cooperation path C can be supported as an SPNE outcome for a given δ, all we need to do is to determine whether or not the simple strategy profile {C; P1, …, PN} constitutes an SPNE. This strategy profile has the following interpretation: The players start by playing according to the cooperation path C; any deviation by player i from any ongoing prescribed path (either the cooperation path, or one of the N punishment paths) is responded to by imposing Pi by all players from then on. The simplicity of the profile resides in the fact that the penal code for player i is the same irrespective of how and when she has deviated; the penal code itself can be non-stationary, as we will see below.
I do not attempt to present the formal arguments establishing the above results. Rather, I present one of Abreu's examples to clarify the logic of the results. Consider two symmetric firms
playing a Cournot super-game in supply volumes, given that the only permissible choices are
high volume, medium volume, and low volume. The stage-game payoff matrix is as follows:
                            Firm 2
                  low        medium        high
Firm 1   low    10, 10       3, 15         0, 7
         medium 15, 3        7, 7         −4, 5
         high    7, 0        5, −4       −15, −15
The stage game has the unique dominance-solved (Nash) equilibrium: {medium, medium}.
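This can be confirmed by brute force. In the sketch below (illustrative; the payoff dictionary transcribes the stage-game matrix, with the negative entries −4, −4 and −15, −15 that the penal-code arithmetic below requires), {medium, medium} emerges as the unique Nash equilibrium:

```python
ACTIONS = ("low", "medium", "high")

# (firm 1's payoff, firm 2's payoff) for each (row, column) profile
PAYOFFS = {
    ("low", "low"): (10, 10),    ("low", "medium"): (3, 15),   ("low", "high"): (0, 7),
    ("medium", "low"): (15, 3),  ("medium", "medium"): (7, 7), ("medium", "high"): (-4, 5),
    ("high", "low"): (7, 0),     ("high", "medium"): (5, -4),  ("high", "high"): (-15, -15),
}

def is_nash(a1, a2):
    """True if neither firm gains from a unilateral change of its volume."""
    u1, u2 = PAYOFFS[(a1, a2)]
    return (all(PAYOFFS[(d, a2)][0] <= u1 for d in ACTIONS)
            and all(PAYOFFS[(a1, d)][1] <= u2 for d in ACTIONS))

nash = [(a1, a2) for a1 in ACTIONS for a2 in ACTIONS if is_nash(a1, a2)]
print(nash)   # [('medium', 'medium')]
```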

Given that, consider the case where the firms aim to support the Pareto-dominant perpetual
outcome {low, low} in the infinite super-game. If the firms attempt to sustain this target
collusion by a strategy of perpetual Nash-reversion, the collusion constraint for each firm is:
[1/(1−δ)]·10 > 15 + [δ/(1−δ)]·7. This inequality is satisfied if and only if δ > 5/8.
So what is the best that the two firms can do if their common discount factor is δ = 4/7? To answer this question, we follow Abreu's logic and identify the optimal penal codes P1 and P2 for the two firms. In this symmetric game, these penal codes are also symmetric: they state that the deviant is required to play medium in the first round after deviation, and low thereafter and ever after; while the non-deviant is required to play high in the first round after deviation, and medium thereafter and ever after. [Note that this penal code is non-stationary.]
For δ = 4/7, let us verify the following properties of the simple strategy profile {C; P1, P2}, where C requires each firm to play low in every period, and the penal codes P1 and P2 are as described above. First, note that each penal code, at its period of initiation, gives the deviant a present value payoff of zero (which is her maximin payoff in the stage game), and generates a present value payoff of 25 for the non-deviant.
Next, suppose that firm i has deviated from C in some period τ, and so Pi is initiated from period τ+1. In any period t > τ+1: (a) Firm i gets a continuation payoff of 7 from an optimal one-period deviation. As that is identical to the continuation payoff from conforming to Pi from t onwards, its collusion constraint is (weakly) satisfied. (b) Firm j's collusion constraint is strictly satisfied, as it best-responds (period by period) to its rival's perpetual choice.
In period τ+1: (a) Firm i gets a continuation payoff of zero from an optimal one-period deviation. As that is the same as the present value payoff from conforming to Pi from τ+1 onwards, its collusion constraint is (weakly) satisfied. (b) Firm j gets a continuation payoff of 7 from an optimal one-period deviation. As that is smaller than the present value payoff of 25 from conforming to Pi from τ+1 onwards, its collusion constraint is (strictly) satisfied.
Thus, the specified penal codes are optimal and subgame-perfect. So, we conclude that when δ = 4/7, the proposed simple strategy profile sustains cooperation on {low, low} as an SPNE outcome.
Building on Abreu's analysis, Drew Fudenberg and Eric Maskin established the following Folk Theorem for infinitely repeated games (the result was a part of the folk wisdom in Game Theory long before it was proved in its full generality):
In an infinite super-game with N players, when the players' common discount factor δ is sufficiently close to unity, for every feasible and individually rational payoff vector v = {v1, …, vN} [i.e., a payoff vector that is in the convex hull of all achievable payoff vectors in the stage game, and that guarantees each player at least her maximin payoff], there is an SPNE in the super-game in which every player i gets a present value payoff of [1/(1−δ)]·vi.
7.3 Renegotiation & Coalitional Cheating in Repeated Games
In an infinitely repeated cooperation dilemma, to determine whether a proposed set of stick-and-carrot strategies (i.e., strategies that incorporate a proposed path of collusion, and punishment threats for cheating) rationally sustains the desired level of cooperation (as a subgame-perfect Nash equilibrium), one needs to check two things: (i) whether the proposed
punishment path is credible, i.e., an SPNE of the continuation game after cheating from collusion has occurred, and (ii) whether the proposed punishment path is effective in sustaining the desired cooperation level, i.e., whether it satisfies the collusion constraint.
It is important to recognize that even when a proposed set of stick-and-carrot strategies can
rationally sustain a specified level of cooperation (in the sense described above), there can be
two problems which can limit the use of such strategies. One problem has to do with a
deviating player offering to renegotiate away the proposed punishment path. The other
problem (in games with more than two players) has to do with a coalition of players cheating
from collusion in a coordinated way. In this section, we discuss these two issues.
7.3.1 Renegotiation in Infinitely Repeated Cooperation Dilemmas
Reconsider the following High-Beam game between two east-bound and west-bound drivers on
a road, where the two drivers expect this game to be played repeatedly over time.
                         East-bound
                    cooperate      not
West-    cooperate    7, 7        0, 10
bound    not         10, 0        2, 2

As we have discussed earlier, the following grim strategy can rationally sustain cooperation in this infinitely repeated game for all δ > 3/8: "I will start by cooperating and continue doing so as long as both have cooperated before; if someone does not cooperate in any period, I will not cooperate thereafter and ever after."
To appreciate the renegotiation problem with this strategy, consider the case where the east-bound driver high-beams once, and then approaches the other driver with the following renegotiation plea: "What is the point of carrying out our punishment strategies, even though they are subgame-perfect, forever after? The fact that I have cheated in past play cannot be undone. If we now embark on the punishment phase, both of us will get low payoffs of 2 in each period for ever. Looking ahead, the best thing we can do is to let bygones be bygones and restart cooperation immediately."
On hearing the renegotiation plea, the west-bound driver can think in one of two ways:
On the one hand, he can compare getting 2 in each period forever after (by enforcing the
punishment) to the scenario of forgiving immediately and getting 7 in each period forever
after (under the optimistic assumption that the east-bound driver will never cheat again). Of
course, if he thinks this way, he should forgive ... and then the original punishment threat will
turn out to be an empty threat.
On the other hand, the west-bound driver can think that if he forgives and reverts to cooperation, the east-bound driver will immediately high-beam him again and force a zero
payoff on him. Under this thought process, the west-bound driver will compare getting 2 in
each period forever after (by enforcing the punishment) to the case of forgiving every time
the renegotiation plea is made and then getting cheated against and thus getting zero payoff in
each period forever after. If he does think in this manner, he will not forgive and the original
punishment threat will be carried out.

These two possible ways of responding to a renegotiation plea lead to the following distinct definitions of renegotiation-proof subgame-perfect Nash equilibria:
An SPNE in an infinitely repeated cooperation dilemma is strongly renegotiation-proof if and only if, after a unilateral deviation from cooperation by any player in any period t, the specified post-deviation continuation SPNE payoffs of all the non-deviant players are no less than what their continuation SPNE payoffs would be if the deviation from cooperation did not occur in period t.
In contrast, an SPNE in an infinitely repeated cooperation dilemma is weakly renegotiation-proof if and only if, after a unilateral deviation by any player in any period t, the specified post-deviation continuation SPNE payoffs of all non-deviant players are no less than what their continuation SPNE payoffs would be if the deviation from cooperation occurred in every period after t.
It is straightforward to verify that in our repeated High-Beam game, the specified grim
strategy is weakly renegotiation-proof but not strongly renegotiation-proof. To put it
plainly, if it is common knowledge among the players that once a player deviates from
cooperation and makes a renegotiation plea for forgiveness, the other player(s) will assume that
the past-deviant will deviate again given the opportunity, then the punishment specified in the
original SPNE strategies need only to be weakly renegotiation-proof. [If, however, it is the case
that the other player(s) will assume more optimistically that the past-deviant will never
deviate from cooperation again, then the punishment specified in the original SPNE strategies
need to be strongly renegotiation-proof.]
Of course, in many repeated cooperation dilemmas (like our High-Beam game), no strongly renegotiation-proof SPNE strategies exist, while weakly renegotiation-proof SPNE strategies do exist. But there is no guarantee that in a repeated cooperation dilemma, the maximum level of cooperation can be achieved via a weakly renegotiation-proof SPNE even when δ is close to 1.
7.3.2 Coalitional Deviations in Infinitely Repeated Cooperation Dilemmas
Before discussing the problem of group cheating in a repeated cooperation dilemma, let us
consider the more general problem of coalitional deviation in any strategic game.
In any N > 2 player game, consider an informal deterministic agreement made among the
players in the pre-play communication stage of the game which satisfies the required property
that no individual player has an incentive to deviate from his/her recommended pure strategy.
Such an agreement is necessarily a (pure-strategy) Nash equilibrium. But given the overall
agreement, a sub-group of players might then be able to reach a different sub-agreement that
is self-enforcing for them and that leads to a higher payoff to each sub-group member (when
the non-members are presumed to play according to the original pre-agreement). While the Nash equilibrium concept only considers individual deviation incentives of players, the above argument calls for strengthening the Nash equilibrium concept to make it immune to coalitional deviations. Recognition of this issue has led to the development of the notion of coalition-proof Nash equilibria.
To appreciate the potential of coalitional deviations, consider the following game:


Choosing the Direction Game


Four players, Alice, Bob, Cathy, and Dave (who haven't met each other before), have each paid me Rs.1500 to participate in a "choosing the direction" game, which is a four-player coordination game.
I give each person an arrow, and lead him/her to a private cubicle. There, each player is required to place his/her arrow (pointing north or pointing south) on a table. Then, along with the players, I verify the placement of the four arrows in the four cubicles, and make payments to the players according to the following rules:
(a) if all four arrows are placed pointing north, I pay each player Rs.2000; (b) if all four arrows are placed pointing south, I pay each player Rs.1000; (c) if exactly two arrows are placed pointing south, I pay those players Rs.2500 each while the others receive nothing; and (d) for any other placement of the arrows, I pay each player Rs.750.
If Alice, Bob, Cathy, and Dave can chat with each other (in groups of two and/or three and/or
four) before they individually (and privately) make their directional choices, how will each of
them place his/her arrow once inside a cubicle?
Consider the following pure-strategy combinations of the four players:
[A] All four players place their arrows pointing north (and so each player gets Rs. 2000);
[B] All four players place their arrows pointing south (and so each player gets Rs. 1000);
[C] Two players place their arrows pointing south and two players place their arrows
pointing north (and so the former get Rs.2500 each while the latter get nothing); and
[D] Three players place their arrows pointing in one direction while the remaining player
places his/her arrow pointing in the other direction (and so each player gets Rs.750).
Of these, two strategy combinations are Nash equilibria: [A] and [B].
However, recognize that [A] is vulnerable to a coalitional deviation by any 2-player
sub-group (say, Alice and Cathy) who can jointly agree to deviate to south and profit from
the joint deviation. Moreover, the agreement between the two players (Alice and Cathy)
constitutes a credible coalitional deviation because neither Alice nor Cathy will want to
further deviate from their bilateral agreement.
A coalition-proof Nash equilibrium is a Nash equilibrium for which there exists no credible
coalitional deviation by any sub-group of players (including the whole group). This
definition is recursive in that a credible coalitional deviation is defined to be one that
constitutes a coalition-proof Nash equilibrium for the deviating coalition.
Following this definition, [A] is not a coalition-proof Nash equilibrium. In contrast, [B] is
a coalition-proof Nash equilibrium of the above game, since it is immune to every credible
coalitional deviation by any sub-group of players (including the whole group). Recognize
that if all four players agree to jointly deviate to north, any two of those four will have a
further incentive to credibly deviate back to south.
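These claims about the arrow game can be verified mechanically. The following is a minimal Python sketch (not part of the original notes; all function names are mine) that enumerates the sixteen pure-strategy profiles, confirms that [A] and [B] are the only Nash equilibria, and exhibits the profitable pairwise deviation from [A]:

```python
from itertools import product, combinations

def payoffs(profile):
    """Payoffs (in Rs.) for one placement; profile is a tuple of 'N'/'S' choices."""
    south = sum(1 for a in profile if a == 'S')
    if south == 0:                 # rule (a): all four point north
        return [2000] * 4
    if south == 4:                 # rule (b): all four point south
        return [1000] * 4
    if south == 2:                 # rule (c): exactly two point south
        return [2500 if a == 'S' else 0 for a in profile]
    return [750] * 4               # rule (d): any other placement

def is_nash(profile):
    """True if no single player gains by flipping his/her own arrow."""
    base = payoffs(profile)
    for i in range(4):
        dev = list(profile)
        dev[i] = 'N' if dev[i] == 'S' else 'S'
        if payoffs(tuple(dev))[i] > base[i]:
            return False
    return True

nash = [p for p in product('NS', repeat=4) if is_nash(p)]
print(nash)    # only all-north [A] and all-south [B] survive

# [A] is vulnerable to joint deviation: any pair flipping to south gains.
base = payoffs(('N',) * 4)
for i, j in combinations(range(4), 2):
    dev = ['N'] * 4
    dev[i] = dev[j] = 'S'
    new = payoffs(tuple(dev))
    assert new[i] > base[i] and new[j] > base[j]
```

The pairwise check at the end is exactly the Alice-and-Cathy argument above: any two players who jointly switch to south move the profile from rule (a) to rule (c) and each gain Rs.500.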
The central point to note from the above example is that while the Nash equilibrium concept
focuses exclusively on the possibility of unilateral deviation by individual players, in some
multi-player games, small groups of players can mutually gain by jointly and credibly
deviating from a Nash equilibrium. Coalition-proof Nash equilibria are those Nash equilibria


that are not susceptible to such group deviations. The only problem is that some multi-party
games might have no coalition-proof Nash equilibrium (even in mixed strategies).
Coming back to the realm of repeated games, the underlying problem of group cheating is
conceptually the same as described above. When checking for rational sustenance of
cooperation under a set of stick-and-carrot strategies, the concept of Subgame-perfect Nash
equilibrium only checks for the non-profitability of unilateral cheating but does not ensure that
no sub-coalition of players can credibly gain from joint coordinated cheating.
In contrast, an SPNE in an infinitely repeated game (with three or more players) will be defined
to be a coalition-proof SPNE if and only if there does not exist any sub-group of players who
would want to jointly and credibly deviate after any history of the game, i.e., while the game
is in the cooperation phase or while the game is in a specific punishment phase. [Here,
credibility refers to the fact that no sub-group of players within the sub-group of cheating
players would want to jointly and credibly cheat further from the parent group's cheating
strategies.]
There are no general results as to whether a class of infinitely repeated cooperation dilemmas
will or will not have coalition-proof Subgame-perfect Nash equilibria. Each repeated game
has to be individually studied to determine whether a specific collusion target can be supported
by a set of SPNE strategies that are coalition-proof.
Fortunately, in a large class of oligopoly repeated games (games with strategic substitutes,
as in Cournot competition, as well as games with strategic complements, as in Bertrand
competition), SPNEs using grim strategies (with the punishment threat of perpetual Nash
reversion) or trigger strategies (with the punishment threat of Nash reversion for a certain
number of time periods) are indeed coalition-proof.
To see this, consider the following Cournot triopoly example:
There are three Cournot competitors in an industry (i.e., three firms selling identical products
and competing in volume). These firms compete repeatedly in an infinitely repeated game,
given a common one-period discount factor δ. In each period, the market-clearing price is:
P = 140 − total supply volume. The constant marginal cost of each firm is 20.
Consider each firm playing the following grim strategy: "I will start by setting supply volume
= 20, and will continue to do so as long as all three have set volume = 20 in the past; if anyone
sets a different volume in any period, I will set supply volume = 30 thereafter."
When will it be the case that when each firm pursues this strategy, maximal collusion will be
rationally sustained in a coalition-proof manner?
To answer the above question, first focus on the one-stage Cournot game among the three
firms. Recognize the following: If the firms can collude perfectly, then to maximize the sum
of their profits (and thus achieve maximal collusion) each firm should produce 20 (one-third
of monopoly output); each will then make profits of 1,200 in one period. Alternatively, if the
firms compete non-cooperatively, the best response function of each firm f is: qf = (120 − Q−f)/2,
where Q−f denotes the aggregate output of all rivals of firm f. As a result, if the firms compete
non-cooperatively in a single period, each will produce output = 30 and will make profits of
900 in the unique Nash equilibrium of the one-stage game.
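The one-stage benchmarks just derived are easy to check numerically. Here is a short Python sketch (not from the original notes; the function names are mine) of the stage game's profit and best-response functions:

```python
# Stage game: inverse demand P = 140 - Q, constant marginal cost 20, three firms.
def profit(q_own, q_rivals):
    """One-period profit of a firm producing q_own against aggregate rival output q_rivals."""
    return (140 - q_own - q_rivals - 20) * q_own

def best_response(q_rivals):
    """Maximizer of profit(q, q_rivals), i.e. q = (120 - q_rivals) / 2."""
    return (120 - q_rivals) / 2

# Symmetric one-shot Nash equilibrium: q = best_response(2q) gives q = 30.
assert best_response(2 * 30) == 30
print(profit(30, 60))    # 900 per firm in the one-shot Nash equilibrium
print(profit(20, 40))    # 1200 per firm under maximal collusion (each produces 20)
```

The two printed numbers are exactly the punishment-phase and cooperation-phase per-period profits used in the collusion analysis that follows.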
Now consider the infinitely repeated Cournot game, and the proposed grim strategy (with its
threat of perpetual Nash reversion). Recognize that when two firms are each producing

output = 20, the optimal way for a single firm to cheat will be to produce 40 in one period
and get greater profits of 1,600 in that period. As a result, the collusion constraint to prevent
unilateral cheating by any one firm is: [1/(1−δ)]1,200 > 1,600 + [δ/(1−δ)]900. Recognize
that this constraint is satisfied if and only if δ > 4/7. We thus conclude that when δ > 4/7, the
proposed grim strategy will rationally sustain maximal collusion.
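The critical discount factor 4/7 can be verified with a few lines of Python (again, an illustrative sketch rather than part of the notes; the names are mine):

```python
# Per-period profits from the triopoly above: collusive = 1200, one-period
# optimal unilateral cheat = 1600, Nash-reversion punishment = 900.
V_COLLUDE, V_CHEAT, V_NASH = 1200, 1600, 900

def grim_sustains(delta):
    """Colluding forever beats cheating once and then earning Nash profits forever."""
    return V_COLLUDE / (1 - delta) >= V_CHEAT + delta * V_NASH / (1 - delta)

# Multiplying through by (1 - delta): 1200 = 1600(1 - d) + 900 d  =>  d = 400/700 = 4/7.
delta_star = (V_CHEAT - V_COLLUDE) / (V_CHEAT - V_NASH)
print(delta_star)                              # 0.5714..., i.e. 4/7
assert grim_sustains(0.6) and not grim_sustains(0.5)
```

The closed-form threshold (cheating gain divided by cheating gain plus punishment loss, here 400/700) is the standard algebraic rearrangement of the collusion constraint above.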
However, the previous statement says nothing about the possibility of credible and profitable
joint deviations by a set of two firms. Note that if two firms jointly deviate from the collusion
path, the best way to do so will be by each producing 25 and thus getting profits of 1,250.
[This is because the aggregate two-firm best response to the third firm producing 20 is to
produce a total output = 50, which generates total profits of 2,500.] But note that this joint
cheating will not be credible in the following sense: If firms 1 and 2 plan to cheat in this
manner (by producing 25 each, while firm 3 produces the collusive output 20), then each of
them will want to further cheat from this plan and produce a different output = 37.5 (as that is
the individual best response to aggregate rival output of 45).
Recognize that the only credible way for firms 1 and 2 to cheat against firm 3 is for each of
them to produce output = 33.33 (as 33.33 is each deviator's best response when the other
deviator produces 33.33 and firm 3, which is not a part of the cheating coalition, produces 20).
But then firm 1 and firm 2 will each get cheating profits of 10,000/9 ≈ 1,111.11, and thus for
δ > 4/7, such credible coalitional cheating will not be profitable.
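The credible-joint-deviation calculation can also be checked numerically; the following self-contained Python sketch (illustrative only, with my own function names) confirms the fixed point and the resulting profit:

```python
# Triopoly with P = 140 - Q and marginal cost 20; firm 3 sticks to the
# collusive output of 20 while firms 1 and 2 contemplate joint cheating.
def profit(q_own, q_rivals):
    return (140 - q_own - q_rivals - 20) * q_own

def best_response(q_rivals):
    return (120 - q_rivals) / 2

# Credible joint cheating: firms 1 and 2 play Cournot between themselves,
# i.e. the symmetric fixed point of q = (100 - q)/2, which is q = 100/3.
q_joint = 100 / 3
assert abs(best_response(q_joint + 20) - q_joint) < 1e-9
v_joint = profit(q_joint, q_joint + 20)
print(round(v_joint, 2))       # 1111.11, i.e. 10000/9 per deviating firm

# This falls short of the collusive per-period profit of 1200 even in the
# deviation period itself, so credible joint cheating is never profitable.
assert v_joint < 1200
```

Note that since the credible joint-cheating profit is below 1,200 in the deviation period itself, the coalition-proofness check here does not even bind on the discount factor.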
We thus conclude that for δ > 4/7, the proposed grim strategy vector is an SPNE that rationally
sustains the collusive outcome {20, 20, 20} perpetually. Further, this SPNE is coalition-proof.
And finally, the SPNE is also weakly renegotiation-proof; I leave it to you to verify that fact.

