
Sets of Models and Prices of Uncertainty

Lars Peter Hansen

Thomas J. Sargent

March 8, 2015

Abstract
A representative consumer expresses distrust of a baseline probability model by using
a convex set of martingales as likelihood ratios to represent other probability models.
The consumer constructs the set to include martingales that represent particular
parametric alternatives to the baseline model as well as others representing only
vaguely specified models statistically close to the baseline model. The representative
consumer's max-min expected utility over that set gives rise to equilibrium prices
of model uncertainty expressed as worst-case distortions to the drifts in his baseline
model. We calibrate a quantitative example to aggregate US consumption data.

Key words: Risk, uncertainty, uncertainty prices, Chernoff entropy, robustness, shock
price elasticities, affine stochastic discount factor

We thank Scott Lee, Botao Wu, and especially Lloyd Han and Paul Ho for carrying out the computations.

Introduction

Specifying a set of probability distributions is an essential part of applying the Gilboa and
Schmeidler (1989) max-min expected utility model. This paper proposes a new way to
imagine that a decision maker forms that set and provides an application to asset pricing.
When a representative investor describes risks with a set of probability models, uncertainty premia augment prices of exposures to those risks. We describe how our method for
specifying that set affects prices of model uncertainty.
Our experiences as applied econometricians attract us to robust control theory. We
always regard our own quantitative models as approximations to better models that we have
not formulated. This is also the attitude of the robust decision maker modeled in Hansen
and Sargent (2001) and Hansen et al. (2006). The decision maker has a single baseline
probability model with a finite number of parameters. He wants to evaluate outcomes
under alternative models that are statistically difficult to distinguish from his baseline
model. He expresses distrust of his baseline model by surrounding it with an uncountable
number of alternative models, many of which have uncountable numbers of parameters. He
represents these alternative models by multiplying the baseline probabilities with likelihood
ratios whose entropies relative to the baseline model are less than a bound that expresses
the idea that alternative models are statistically close to the baseline model.
The decision theory presented in this paper retains the starting point of a single baseline
model but differs from Hansen et al. (2006) in how it forms the set surrounding the baseline
model. A new object appears: a quadratic function of a Markov state that defines alternative parametric models to be included within a set of models surrounding the baseline
model. The decision maker wants valuations that are robust to these models in addition
to other vaguely specified models expressed as before by multiplying the baseline model
by likelihood ratios. The quadratic function can be specified to include alternatives to the
baseline model, among them ones with fixed parameters, time-varying parameters, and other
less structured forms of model uncertainty.
For asset pricing, a key object that emerges from the analysis in Hansen and Sargent
(2010) is a vector of worst-case drift distortions to the baseline model. The negative of
the drift distortion vector equals the vector of market prices of model uncertainty that
compensate the representative investor for bearing model uncertainty. The effects that our
new object, the quadratic function indexing particular alternative models, has on market prices of uncertainty are all intermediated through this drift distortion. We show how

the quadratic function can produce drift distortions that imply stochastic discount factors
resembling ones attained by earlier authors under different assumptions about the sources
of risks. For example, models that posit that a representative consumer's consumption
process has innovations with stochastic volatility introduce new risk exposures in the form
of the shocks to volatilities. Their presence induces time variation in equilibrium compensations for exposures both to the stochastic volatility shock and to
the original shocks whose volatilities now move. By way of contrast, we introduce no
stochastic volatility and no new risks. Instead, we amplify the prices of exposures to the
original shocks. We induce fluctuations in those prices by modeling how the representative consumer struggles to confront his doubts about the baseline model. We extend these
insights to the analysis of uncertainty prices over alternative investment horizons.
Section 2 describes a representative consumer's baseline probability model and martingale perturbations to it. Section 3 describes two convex sets of martingales that perturb
the baseline model. Section 4 uses one of these sets to form a robust planning problem that
generates a worst-case model that we use to calibrate key parameters measuring the size
of a convex set of models. Section 5 constructs a recursive representation of a competitive
equilibrium. Then it links the worst-case model that emerges the robust planning problem
to competitive equilibrium compensations that the representative consumer earns for bearing model uncertainty. This section also describes a term structure of these market prices
of uncertainty. By borrowing from Hansen and Sargent (2010), section 6 describes a quantitative version of a baseline model as well as a class of models that particularly concern the
robust consumer and the robust planner. Section 7 uses the quantitative model to compare
the set of models that concern both our robust planner and our representative consumer
with two other sets featured in Anderson et al. (2003) and Hansen and Sargent (2010),
one based on Chernoff entropy, the other on relative entropy. Section 8 offers concluding
remarks. Six appendices provide technical details.

The model

2.1 Mathematical framework

A representative consumer cares about a stochastic process Y = {Y_t : t ≥ 0} described by the baseline model^1

   d log Y_t = (.01)(α̂_y + X_t) dt + (.01) σ_y · dW_t
   dX_t = α̂_x dt − κ̂ X_t dt + σ_x · dW_t,                          (1)

where W is a multivariate Brownian motion, X is a scalar process initialized at a random variable X_0 governed by probability distribution Q̂, and α̂_y + X_t + (.01/2)|σ_y|² is the date t growth rate of Y expressed as a percent. The quintet (α̂_y, α̂_x, κ̂, σ, Q̂), where σ stacks the shock exposure vectors σ_y and σ_x, characterizes the baseline model.^2
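As a concrete illustration of the baseline model (1), the two stochastic differential equations can be simulated with an Euler discretization. This is our sketch, not the paper's code, and the parameter values below are hypothetical placeholders rather than the paper's calibration to US consumption data.

```python
import numpy as np

# Hypothetical parameter values (placeholders, not the paper's calibration).
alpha_y, alpha_x, kappa = 0.5, 0.0, 0.2   # drift parameters of baseline model (1)
sigma_y = np.array([0.5, 0.0])            # exposure of d log Y to the 2-d shock dW
sigma_x = np.array([0.0, 0.1])            # exposure of dX to the 2-d shock dW

def simulate_baseline(n=12_600, dt=1/252, x0=0.0, logy0=0.0, seed=0):
    """Euler discretization of baseline model (1): 50 years of daily steps."""
    rng = np.random.default_rng(seed)
    logy, x = np.empty(n + 1), np.empty(n + 1)
    logy[0], x[0] = logy0, x0
    for t in range(n):
        dW = rng.standard_normal(2) * np.sqrt(dt)
        logy[t + 1] = logy[t] + .01 * (alpha_y + x[t]) * dt + .01 * sigma_y @ dW
        x[t + 1] = x[t] + (alpha_x - kappa * x[t]) * dt + sigma_x @ dW
    return logy, x

logy, x = simulate_baseline()
```

Under these placeholder values, the growth state X mean-reverts toward α̂_x/κ̂ with stationary variance |σ_x|²/(2κ̂), the moments used later when the initial distribution Q̂ is taken to be the stationary distribution.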
Because he doesn't trust the baseline model, the consumer also cares about Y under probability models obtained by multiplying probabilities associated with the baseline model (1) by likelihood ratios. We represent a likelihood ratio by a stochastic process Z^h that is a positive martingale with respect to the baseline model and that satisfies^3

   dZ_t^h = Z_t^h h_t · dW_t,                                        (2)

or

   d log Z_t^h = h_t · dW_t − (1/2)|h_t|² dt,                        (3)

where h is adapted to the filtration F = {F_t : t ≥ 0} associated with the Brownian motion W and satisfies

   ∫_0^t |h_u|² du < ∞                                              (4)

with probability one. Imposing the initial condition Z_0^h = 1, we express the solution of stochastic differential equation (2) as:

   Z_t^h = exp( ∫_0^t h_u · dW_u − (1/2) ∫_0^t |h_u|² du ).          (5)

^1 We let X denote the stochastic process, X_t the process at date t, and x a realized value of the state.
^2 In earlier papers, we sometimes referred to what we now call the baseline model as the decision maker's approximating model or benchmark model.
^3 James (1992), Chen and Epstein (2002), and Hansen et al. (2006) used this representation.
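A minimal numerical check of the martingale property behind (5) — our illustration, not the paper's — is to simulate Z_t^h for a constant drift distortion h and verify that E[Z_t^h] ≈ 1 under the baseline measure. The value of h below is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
h = np.array([0.2, -0.1])        # constant drift distortion (hypothetical)
n, dt, n_paths = 100, 1 / 100, 100_000   # horizon T = n * dt = 1

logZ = np.zeros(n_paths)
for _ in range(n):
    dW = rng.standard_normal((n_paths, 2)) * np.sqrt(dt)
    # d log Z^h = h . dW - (1/2)|h|^2 dt, equation (3)
    logZ += dW @ h - 0.5 * (h @ h) * dt

Z = np.exp(logZ)
print(Z.mean())   # close to 1 by the martingale property
```

Because h is constant, log Z_T^h is exactly Gaussian here, so the sample mean of Z converges to 1 with no discretization bias; for state-dependent h the same check holds only up to Euler error.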

When we want to allow Z_0 to be drawn from an unknown probability distribution, we take a Borel measurable function g > 0 satisfying E[g(X_0)] = 1 and consider martingales equal to

   g(X_0) Z_t^h.

Let G denote the collection of all such functions g(x). A pair (g, h) represents a perturbation of the baseline model (1).

Definition 2.1. Z denotes the set of all martingales g(X_0)Z^h constructed via representation (5) with some process h adapted to F = {F_t : t ≥ 0} and satisfying (4) and g ∈ G.
We use a martingale g(X_0)Z^h to construct an alternative probability distribution as follows. Starting from the probability distribution associated with the baseline model (1), we use h to represent another probability distribution conditioned on F_0. To do this, think of taking any F_t-measurable random variable B_t and multiplying it by Z_t^h before computing expectations conditioned on X_0. Associated with h are probabilities defined implicitly by

   E^h[B_t | F_0] = E[Z_t^h B_t | F_0]

for any t ≥ 0 and any bounded F_t-measurable random variable B_t. Similarly, we write

   E^g[B_0] = E[g(X_0) B_0]

for any bounded random variable B_0 in the date zero information set F_0.

Here the positive random variable Z_t^h acts as a Radon-Nikodym derivative for the date t conditional expectation operator E^h[· | X_0]. The martingale property of the process Z^h ensures that the conditional expectation operators for different t's are compatible in the sense that they satisfy a Law of Iterated Expectations. The random variable g(X_0) acts as a Radon-Nikodym derivative for the date zero unconditional distribution vis-à-vis the baseline probability distribution Q̂ over the date zero state vector X_0.
While under the baseline model W is a standard Brownian motion, under the alternative h model this process has increments

   dW_t = h_t dt + dW_t^h,                                          (6)

where W^h is a standard Brownian motion. While (3) expresses the evolution of log Z^h in terms of the increment dW, the evolution in terms of dW^h is:

   d log Z_t^h = h_t · dW_t^h + (1/2)|h_t|² dt.                      (7)

In light of (6), we can write model (1) as:

   d log Y_t = (.01)(α̂_y + X_t) dt + (.01) σ_y · h_t dt + (.01) σ_y · dW_t^h
   dX_t = α̂_x dt − κ̂ X_t dt + σ_x · h_t dt + σ_x · dW_t^h,

which implies that Y has a (local) growth rate α̂_y + X_t + σ_y · h_t + (.01/2)|σ_y|² under the h model.^4

2.2 Quantifying Probability Distortions

Discounted relative entropy quantifies how a (g, h) pair distorts baseline model probabilities. We construct discounted relative entropy in two steps. First, we condition on X_0 = x and focus solely on h; second, we focus on misspecifications of Q̂.

i) Our first step is to compute

   Δ(Z^h; x) = δ ∫_0^∞ exp(−δt) E[ Z_t^h log Z_t^h | X_0 = x ] dt
             = ∫_0^∞ exp(−δt) E[ Z_t^h (1/2)|h_t|² | X_0 = x ] dt,   (8)

where the second equality follows from an application of integration by parts. We write Δ as a function of Z^h instead of h because Δ is convex in Z^h. The discounted entropy concept (8) quantifies how h distorts baseline probabilities.^5

ii) Our second step applies when we don't condition on X_0. Suppose that Q̂ is the stationary probability distribution for X under the baseline model and that g is the density used to alter Q̂. We average over the initial state via

   Δ̄(g; Z^h) = ∫ Δ(Z^h; x) g(x) Q̂(dx) + ∫ g(x) log g(x) Q̂(dx),     (9)

which includes a relative entropy penalty for the initial density g.

^4 The growth rate includes a multiplication by 100 that offsets one of the .01s.
^5 Hansen et al. (2006) used the representation of discounted relative entropy that appears on the right side of the first line of (8).
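For a constant drift distortion h, both lines of (8) can be evaluated explicitly: E[Z_t^h log Z_t^h] = |h|²t/2, so the first line integrates to |h|²/(2δ), and the second line gives (|h|²/2)∫e^{−δt}dt = |h|²/(2δ) as well. The sketch below (our own, with hypothetical values of δ and |h|²) confirms the agreement by numerical quadrature.

```python
import numpy as np

delta = 0.01   # subjective discount rate (hypothetical value)
h2 = 0.05      # |h|^2 for a constant drift distortion h

def trapezoid(y, x):
    """Simple trapezoid rule (avoids NumPy version differences)."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

t = np.linspace(0.0, 2000.0, 200_001)   # e^{-delta t} is ~2e-9 at t = 2000
# First line of (8): delta * Int e^{-delta t} E[Z_t log Z_t] dt with E[Z_t log Z_t] = |h|^2 t / 2.
line1 = trapezoid(delta * np.exp(-delta * t) * 0.5 * h2 * t, t)
# Second line of (8): Int e^{-delta t} E[Z_t (1/2)|h_t|^2] dt = |h|^2 / (2 delta).
line2 = trapezoid(np.exp(-delta * t) * 0.5 * h2, t)

print(line1, line2, h2 / (2 * delta))   # all three approximately equal
```

The closed form |h|²/(2δ) makes clear how a small discount rate δ magnifies the entropy attributed to even a modest constant distortion.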

Convex sets of models

Two convex sets that surround the baseline model are designed to include parametric
probability models that a decision maker cares about. One set can readily be used for
robust control problems, but the other cannot. Nevertheless, the second set is useful
because it generalizes Chernoff (1952) entropy to a Markov environment and thereby has
an explicit statistical interpretation. We are interested in how these two convex sets are
related.

3.1 Alternative parametric models

The following parametric model nests baseline model (1) within a bigger class:

   d log C_t = (.01)(α_y + X_t) dt + (.01) σ_y · dW_t^h
   dX_t = α_x dt − κ X_t dt + σ_x · dW_t^h,                          (10)

where W^h is a Brownian motion and (6) continues to describe the relationship between the processes W and W^h. Here (α̂_y, α̂_x, κ̂) are parameters of the baseline model (1), (α_y, α_x, κ) are parameters of model (10), and (σ_y, σ_x) are parameters common to both models. We want to use drift distortions h for W to represent models in a parametric class defined by (10). We can express model (10) in terms of our section 2.1 structure by setting

   h_t = η(X_t) = η₀ + η₁ X_t

and using (1), (6), and (10) to deduce the following restrictions on η₀ and η₁ (rows separated by semicolons):

   [σ_y′; σ_x′] η₀ = [α_y − α̂_y; α_x − α̂_x],   [σ_y′; σ_x′] η₁ = [0; κ̂ − κ].   (11)

By maintaining restrictions (11), we can find pairs (η₀, η₁) that represent members of a class of models having the parametric form (10). Among other possibilities, we can set Q̂ to assign all of the probability to an initial state, or we can set it equal to the stationary distribution implied by the baseline model, in which case we can construct g so that g(x)Q̂(dx) is the stationary distribution under the alternative model implied by (η₀, η₁).
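In the notation used here for (11) — σ_y and σ_x the shock-loading vectors, (α̂_y, α̂_x, κ̂) the baseline drift parameters, and (α_y, α_x, κ) an alternative — the restrictions are two small linear systems. The sketch below solves them with hypothetical numbers (ours, not the paper's); with a 2-dimensional shock the stacked volatility matrix is square.

```python
import numpy as np

# Hypothetical baseline and alternative parameters (illustration only).
alpha_y_hat, alpha_x_hat, kappa_hat = 0.5, 0.0, 0.2
alpha_y, alpha_x, kappa = 0.4, 0.0, 0.3
sigma_y = np.array([0.5, 0.0])
sigma_x = np.array([0.0, 0.1])

S = np.vstack([sigma_y, sigma_x])   # stacked volatility matrix in (11)
eta0 = np.linalg.solve(S, np.array([alpha_y - alpha_y_hat, alpha_x - alpha_x_hat]))
eta1 = np.linalg.solve(S, np.array([0.0, kappa_hat - kappa]))

# The implied drift distortion h_t = eta0 + eta1 * X_t reproduces model (10).
print(eta0, eta1)
```

With only κ perturbed (α's unchanged, as here except for α_y), the η₁ solution satisfies |η₁|² = (κ̂ − κ)²/|σ_x|², the quantity that reappears in the quadratic function used just below.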

We consider restrictions on the martingales Z^h pertinent for modeling probabilities conditioned on X_0.

Definition 3.1. Z⁺ is the set of martingales g(X_0)Z^h constructed by (i) selecting a pair (η₀, η₁) that satisfies (11), (ii) pinning down an associated h_t = η₀ + η₁ X_t, (iii) constructing an implied martingale Z^h via (2), and (iv) selecting g ∈ G.

We can further restrict a family of models by using a nonnegative quadratic function of x,

   φ(x) = φ₀ + 2φ₁ x + φ₂ x² ≥ 0,                                    (12)

to express a collection of alternatives to a baseline model. For example, to induce φ to capture a prespecified κ̄, form

   φ(x) = φ₀ + 2φ₁ x + φ₂ x² = (1/|σ_x|²)(κ̂ − κ̄)² x².

This choice of φ makes both κ = κ̄ and κ = 2κ̂ − κ̄ be alternative parameter configurations that are among the models to be included in a convex set of Z's. More generally, we can select (ᾱ_y, ᾱ_x, κ̄) and compute η̄₀ and η̄₁ by solving the counterpart to (11). Then

   φ(x) = |η̄₀ + η̄₁ x|².

We again pick up additional parametric models by casting our restrictions in terms of the quadratic function φ.

Definition 3.2.

   Z° = { Z^h ∈ Z⁺ : |h_t|² ≤ φ(X_t) for all t with probability one }.

Next we construct a larger set of martingales that contains Z° but allows departures from the parametric structure (10).

Definition 3.3.

   Z̃ = { Z^h ∈ Z : |h_t|² ≤ φ(X_t) for all t with probability one }.   (13)

Z^h's in Z° represent models with time-invariant parameters; Z^h's in Z̃ can represent models whose parameters vary over time.^6

3.2 Appropriate restrictions on the h process

We want to construct a set of models by using a measure of statistical discrepancy from a baseline model. For that purpose, we shall replace the instant-by-instant inequality in the definition (13) of Z̃ with restrictions cast in terms of probability weighted averages of |h_t|². These restrictions allow trade-offs among the intertemporal dimensions of model discrepancy that statistical model discrimination criteria comprehend.^7

3.3 Z̄

We construct our first convex set of martingales Z^h by starting with a drift distortion h̄ that represents a particular alternative parametric model created along lines described in section 3.1. We use the following functional of a process Z^h:

   Γ(Z^h; |h̄|², x) = δ ∫_0^∞ exp(−δu) E[ Z_u^h log Z_u^h | X_0 = x ] du
                     − ∫_0^∞ exp(−δu) E[ Z_u^h (1/2)|h̄_u|² | X_0 = x ] du
                   = (1/2) ∫_0^∞ exp(−δu) E^h[ |h_u|² − |h̄_u|² | X_0 = x ] du.   (14)

Three important features of Γ are:

i) Γ is convex in Z^h;

ii) Γ can readily be computed under the h model;

iii) Γ depends on h̄ only through the scalar process |h̄|².

^6 This is accomplished by requiring Z^h to belong to Z rather than Z⁺.
^7 In constructing Z° and Z̃, we impose the instant-by-instant constraint on h_t described in (13). But some models that don't satisfy such an instant-by-instant constraint are equally difficult to distinguish statistically from the baseline model. The ambiguity averse decision maker of Chen and Epstein (2002) considers a set of models characterized by martingales that are generated by h processes that satisfy instant-by-instant constraints on h. Anderson et al. (1998) explored consequences of this type of constraint without the state dependence.

Let

   |h̄_t|² = ℓ φ(X_t),

where ℓ is a positive number.

Definition 3.4.

   Z̄ = { g(X_0)Z^h ∈ Z : ∫ Γ[Z^h; ℓφ(X), x] g(x) Q̂(dx) + ∫ g(x) log g(x) Q̂(dx) ≤ 0 }.   (15)

Z̄ includes martingales in Z̃ that are associated with the parametric probability models of Section 3.1. In light of feature i), the set Z̄ is convex in g(X_0)Z^h and necessarily contains Z^h̄ and Z = 1. Feature ii) makes it tractable to use Z̄ to pose a recursive robust decision problem. Feature iii) provides a convenient way to use φ to include parametric models like those discussed in Section 3.1 within Z̄.

In section 5, we pose a robust decision problem in which Z̄ serves as a family of positive martingales. Evidently, both the baseline model (1) and the alternative models captured by the quadratic function φ(X) play important roles in shaping the set Z̄. These models also shape our next convex set, which considers the entropy of a Z ∈ Z relative to the baseline model (1).

3.4 Ẑ

While the drift distortions h̄ help shape Z̄, they don't influence a set of martingales Ẑ that emerges from studying how Brownian motions disguise probability distortions of a baseline model, making them difficult to distinguish statistically. To construct Ẑ we use Chernoff (1952) entropy, which differs from discounted relative entropy. Although Chernoff entropy's connection to a specific statistical decision problem makes it attractive, it has the disadvantage that it is less tractable than relative entropy for the types of robust decision problems that interest us.

In the spirit of Anderson et al. (2003), we use Chernoff (1952) entropy to measure a distortion Z to a baseline model. Think of a pairwise model selection problem that statistically compares the baseline model (1) with a model generated by the martingale Z^h. The logarithm of the martingale evolves according to

   d log Z_t^h = −(1/2)|h_t|² dt + h_t · dW_t.

Consider a statistical model selection rule based on a data history of length t that takes the form log Z_t^h ≥ log τ, where Z_t^h is the likelihood ratio associated with the alternative model for a sample size t. To construct a bound on the probability that this model selection rule incorrectly chooses the alternative model when the baseline model governs the data, we use an argument from large deviations theory that starts from the inequality

   1{log Z_t^h ≥ log τ} = 1{−r log τ + r log Z_t^h ≥ 0} = 1{exp(−r log τ)(Z_t^h)^r ≥ 1} ≤ exp(−r log τ)(Z_t^h)^r,

which holds for 0 ≤ r ≤ 1. The expectation of the term on the left side equals the probability of mistakenly selecting the alternative model when the data are a sample of size t generated by the baseline model. We bound this mistake probability for large t by following Donsker and Varadhan (1976) and Newman and Stuck (1979) and studying

   limsup_{t→∞} (1/t) log E[ exp(−r log τ)(Z_t^h)^r ] = limsup_{t→∞} (1/t) log E[ (Z_t^h)^r ]

for alternative choices of r. The threshold τ does not affect this limit. Furthermore, the limit is often independent of the initial state X_0 = x. To get the best bound, we compute

   inf_{0≤r≤1} limsup_{t→∞} (1/t) log E[ (Z_t^h)^r ],

a limit that is typically negative because mistake probabilities decay with sample size. A measure of Chernoff entropy is then

   χ(Z^h; x) = − inf_{0≤r≤1} limsup_{t→∞} (1/t) log E[ (Z_t^h)^r | X_0 = x ].   (16)

Appendix E describes how to compute Chernoff entropy.


To help interpret χ(Z^h; x), consider the following argument. If the actual decay rate of mistake probabilities were constant, then mistake probabilities for two sample sizes T_i, i = 1, 2, would be

   mistake probability_i = .5 exp(−χ T_i)

for χ = χ(Z^h; x). We define a half-life as an increase in the sample size T₂ − T₁ > 0 that multiplies the mistake probability by a factor of .5:

   .5 = mistake probability₂ / mistake probability₁ = exp(−χ T₂) / exp(−χ T₁).

So the half-life is approximately

   T₂ − T₁ = − (log .5) / χ.                                         (17)

The preceding back-of-the-envelope calculation justifies the detection error bound computed by Anderson et al. (2003). The bound on the decay rate should be interpreted cautiously because, while it is constant, the actual decay rate is not. Furthermore, the pairwise comparison oversimplifies the challenge truly facing a robust decision maker, which is statistically to discriminate among multiple models.

We could conduct a symmetrical calculation that reverses the roles of the two models, so that the h model with martingale Z^h becomes the model on which we condition. It is straightforward to show that the limiting rate remains the same. Thus, when we select a model by comparing a log likelihood ratio to a constant threshold, the two types of mistakes share the same asymptotic decay rate.
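For a constant drift distortion h the Chernoff calculation is available in closed form: under the baseline model E[(Z_t^h)^r] = exp(((r² − r)/2)|h|² t), so the rate (1/t) log E[(Z_t^h)^r] is minimized at r = 1/2, giving Chernoff entropy |h|²/8, and (17) then delivers the half-life. The sketch below (ours, with a hypothetical |h|²) reproduces this.

```python
import numpy as np

h2 = 0.05   # |h|^2 for a constant drift distortion (hypothetical value)

# For constant h, (1/t) log E[(Z_t^h)^r] = ((r^2 - r)/2)|h|^2 under the baseline model.
r = np.linspace(0.0, 1.0, 1001)          # grid contains r = 1/2 exactly
rate = 0.5 * (r**2 - r) * h2
chi = -rate.min()                        # Chernoff entropy (16); infimum at r = 1/2
half_life = np.log(2.0) / chi            # equation (17): T2 - T1 = -log(.5) / chi

print(chi, h2 / 8.0)                     # chi equals |h|^2 / 8 for constant h
print(half_life)
```

The |h|²/8 rate is one quarter of the relative entropy rate |h|²/2, a reminder that the Chernoff ball and the relative entropy ball around the baseline model are scaled differently.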
Our second convex set is a ball formed using Chernoff entropy (16).

Definition 3.5.

   Ẑ = { Z^h ∈ Z : χ(Z^h; x) ≤ χ̄ }.                                  (18)

The radius χ̄ of the ball can be adjusted to attain a specified half-life.

Calibrating θ and ℓ

In subsections 4.1 and 4.2, we formulate a robust planning problem for an economy with a representative consumer having an instantaneous utility function that is logarithmic in consumption. Associated with the worst-case probability from the robust planning problem is a greatest lower bound of expected discounted utility over the family Z̄ of alternative probability distributions. In subsections 4.3, 4.4, and 4.5, we represent the worst-case probability as a drift distortion to the multivariate Brownian motion in the baseline model (1). We then use that drift distortion to guide the calibration of the parameters θ and ℓ that pin down the size of the set Z̄. In section 5, we show how that same worst-case drift distortion appears in a recursive representation of competitive equilibrium prices for an economy with a representative investor. We deduce uncertainty prices and connect them to the worst-case drift distortion from our robust planning problem.

4.1 Robust planner's problem

Consider a consumption process C = Y. Guess a value function V(x, θ, ℓ) + log Y. We use a family of martingales in the set Z̄ to represent alternative probabilities. We take the quadratic bound on |h̄_t|² to be

   ℓ φ(x),

where for the moment ℓ is an arbitrary parameter and φ is pre-specified. Eventually, we will allow φ to be specified a priori up to a scale determined by a scalar ℓ that we'll calibrate by imposing a model detection probability half-life defined in terms of Chernoff entropy. Let θ be a multiplier on the constraint:

   ∫ Γ[Z^h; ℓφ(X), x] g(x) Q̂(dx) + ∫ g(x) log g(x) Q̂(dx) ≤ 0.        (19)

In subsection 4.2, for a given (θ, ℓ) we construct a recursive representation of a worst-case drift distortion.

4.2 Recursive representation of worst-case drift distortion

Given (θ, ℓ), we compute h* by solving the HJB equation

   0 = min_h −δ V(x, θ, ℓ) + (.01)(α̂_y + x) + (α̂_x − κ̂ x) V_x(x, θ, ℓ) + (|σ_x|²/2) V_xx(x, θ, ℓ)
            + (.01) σ_y · h + V_x(x, θ, ℓ) σ_x · h + (θ/2)[ |h|² − ℓ φ(x) ].   (20)

The solution of HJB equation (20) is quadratic in x:

   V(x, θ, ℓ) = −(1/2)[ λ₂(θ, ℓ) x² + 2 λ₁(θ, ℓ) x + λ₀(θ, ℓ) ],

which implies that the minimizing h is linear in x:

   h*(x, θ, ℓ) = −(1/θ)[ .01 σ_y − σ_x ( λ₂(θ, ℓ) x + λ₁(θ, ℓ) ) ].    (21)

4.3 Determining ζ

To set ζ, we must decide how to weight the initial state. Previous research by Petersen et al. (2000) and Hansen et al. (2006) set Q̂ to be a mass point over a single value of x and thereby effectively conditioned on the initial state. Here we suggest an alternative approach.

Under the baseline model, X has a stationary distribution Q̂ having density q with mean α̂_x/κ̂ and variance |σ_x|²/(2κ̂). Consider any g ∈ G. Introduce a nonnegative parameter ζ that we temporarily take as a fixed number. To make a conservative adjustment of the probability measure over the initial state, compute g* given (θ, ℓ) by solving

   min_{g∈G} ∫ V(x, θ, ℓ) g(x) Q̂(dx) + ζ ∫ log g(x) g(x) Q̂(dx).      (22)

The minimizing g is an exponentially tilted density

   g*(x, ζ, ℓ) ∝ exp( −(1/ζ) V(x, θ, ℓ) )                             (23)

that is evidently normal with precision

   τ(ζ, ℓ) = 2κ̂/|σ_x|² − (1/ζ) λ₂(θ, ℓ)

and mean

   μ(ζ, ℓ) = (1/τ(ζ, ℓ)) [ 2α̂_x/|σ_x|² + (1/ζ) λ₁(θ, ℓ) ].

To compute ζ, we substitute g* into (22) to obtain the maximand in the following problem:

   max_{ζ>0} −ζ log ∫ exp( −(1/ζ) V(x, θ, ℓ) ) Q̂(dx).

Let ζ̄(θ, ℓ) denote the maximizing ζ, which depends on the values of θ and ℓ. If we set θ and ℓ defining the set of alternative models, we know ζ̄ and through representation (21) have a complete characterization of the worst-case model. In subsection 4.5 we describe how to use a half-life for a model detection statistic to determine ℓ. This will tell us how to scale φ for a given value of θ. We could specify θ a priori. But in the next subsection, we suggest an alternative approach.
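The exponential tilting in (23) can be verified numerically. The sketch below (ours, with hypothetical parameter values) tilts a discretized normal Q̂ by exp(−V/ζ) and compares the resulting mean with the closed-form mean of the tilted normal; the constant λ₀ drops out once the density is normalized.

```python
import numpy as np

# Hypothetical inputs (illustration only, not the paper's calibration).
alpha_x_hat, kappa_hat = 0.0, 0.2
sig_x2 = 0.01                    # |sigma_x|^2
lam2, lam1 = 0.3, 0.05           # V(x) = -(lam2 x^2 + 2 lam1 x + lam0)/2
zeta = 1.0

m = alpha_x_hat / kappa_hat      # mean of Q-hat
v = sig_x2 / (2 * kappa_hat)     # variance of Q-hat

def trapezoid(y, x):
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

x = np.linspace(m - 10 * np.sqrt(v), m + 10 * np.sqrt(v), 200_001)
q = np.exp(-(x - m) ** 2 / (2 * v))                        # unnormalized baseline density
tilt = np.exp(0.5 * (lam2 * x**2 + 2 * lam1 * x) / zeta)   # exp(-V / zeta), lam0 omitted
g = q * tilt
g /= trapezoid(g, x)                                       # normalize the tilted density

mean_numeric = trapezoid(x * g, x)
tau = 2 * kappa_hat / sig_x2 - lam2 / zeta                 # closed-form precision
mean_closed = (2 * alpha_x_hat / sig_x2 + lam1 / zeta) / tau

print(mean_numeric, mean_closed)   # agree
```

The check also makes visible the requirement τ > 0: if λ₂/ζ exceeds the baseline precision 2κ̂/|σ_x|², the tilted "density" is no longer integrable.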

4.4 Specifying θ

Instead of setting θ a priori, we set it to solve:

   ∫ Γ[Z^{h*}; ℓφ(X), x] g*[x, ζ̄(θ, ℓ), ℓ] Q̂(dx) + ∫ g*[x, ζ̄(θ, ℓ), ℓ] log g*[x, ζ̄(θ, ℓ), ℓ] Q̂(dx) = 0.   (24)

Let θ̄ denote the resulting θ and let ζ̄(ℓ) = ζ̄(θ̄, ℓ). Recall the relative entropy concept Δ̄(g; Z^h) from equation (9):

   Δ̄(g; Z^h) = ∫ g(x) Δ(Z^h; x) Q̂(dx) + ∫ g(x) log g(x) Q̂(dx),

where

   Δ(Z^h; x) = (1/2) ∫_0^∞ exp(−δu) E^h[ |h_u|² | X_0 = x ] du.

The relative entropy measure includes an adjustment for distorting the initial distribution. By imposing (24), we have set θ exactly to offset the term ∫ g(x) log g(x) Q̂(dx) when evaluating the constraint (19) at the minimizing choice of g and Z.

4.5 Setting ℓ via Chernoff entropy

We refine a suggestion of Anderson et al. (1998). Compute h*[x, θ̄(ℓ), ℓ] and evaluate the associated Chernoff entropy. Then adjust ℓ to match a target half-life.^8 A larger value of ℓ should lead to a smaller half-life. Call the resulting value ℓ*, and let

   g*(x) = g*[x, ζ̄(ℓ*), ℓ*].

Robust Portfolio Choice and Pricing

In this section, we describe equilibrium prices that reconcile a representative consumer to bearing risks presented by the environment described by baseline model (1) in light of his concerns about model misspecification as expressed with the set Z̄. We construct equilibrium prices by appropriately extracting shadow prices from the robust planning problem of subsection 4.1. We decompose risk prices into separate equilibrium compensations for bearing risk and for bearing model uncertainty. We also describe an equilibrium term structure of compensations for bearing model uncertainty. We begin by posing the representative consumer's portfolio choice problem.

^8 We expect but haven't proved that the half-life is monotone in ℓ.

5.1 Robust representative consumer portfolio problem

A representative consumer faces a continuous-time Merton portfolio problem in which individual wealth K evolves as

   dK_t = −C_t dt + (.01) K_t ρ(X_t) dt + K_t A_t · dW_t + (.01) K_t π(X_t) · A_t dt,   (25)

where A_t = a is a vector of chosen risk exposures, ρ(x) is the instantaneous risk-free rate expressed as a percent, and π(x) is the vector of risk prices evaluated at state X_t = x. Initial wealth is K_0. The investor has discounted logarithmic preferences but distrusts his probability model.

Key inputs to a representative consumer's robust portfolio problem are the baseline model (1), the wealth evolution equation (25), the vector of risk prices π(x), and the quadratic function φ in (12) that defines the alternative explicit models that concern the representative consumer. As in the robust planner's problem analyzed in section 4.1, let θ be a penalty parameter or Lagrange multiplier on the constraint (19). For the recursive competitive equilibrium, we take (θ, ℓ) as given. We described how we calibrate these parameters in section 4.

Under the guess that the value function takes the form V(x, θ, ℓ) + log k + log δ, the HJB equation for the robust portfolio allocation problem is

   0 = max_{a,c} min_h −δ[ V(x, θ, ℓ) + log k + log δ ] + δ log c − c/k + (.01)ρ(x)
          + (.01)π(x) · a + a · h − |a|²/2 + ( α̂_x − κ̂ x + σ_x · h ) V_x(x, θ, ℓ)
          + (|σ_x|²/2) V_xx(x, θ, ℓ) + (θ/2)[ |h|² − ℓ( φ₂ x² + 2φ₁ x + φ₀ ) ].   (26)

First-order conditions for consumption are

   δ/c = 1/k,

which implies that c = δk, an implication that flows partly from the representative consumer's unitary elasticity of intertemporal substitution. First-order conditions for a and h are

   (.01)π(x) + h*(x, θ, ℓ) − a*(x, θ, ℓ) = 0                          (27a)
   a*(x, θ, ℓ) + σ_x V_x(x, θ, ℓ) + θ h*(x, θ, ℓ) = 0.               (27b)

Here we appeal to arguments like those in Hansen and Sargent (2008, ch. 7) to justify stacking first-order conditions and not worrying about who goes first in the two-person zero-sum game.^9

5.2 Competitive equilibrium prices

We show here that the drift distortion h* that emerges from the robust planner's problem of subsection 4.1 determines prices that a competitive equilibrium awards for bearing model uncertainty. To compute the vector π(x) of competitive equilibrium risk prices, we find a robust planner's marginal valuation of exposure to the W shocks. We decompose that price vector into separate compensations for bearing risk and for accepting model uncertainty.

Noting from the robust planner's problem that the shock exposure vectors for log K and log Y must coincide implies

   a* = (.01) σ_y.

Thus, from (27a), π = π*, where

   π*(x) = σ_y − 100 h*(x, ℓ*).                                       (28)

Similarly, in the problem for a representative consumer within a competitive equilibrium, the drifts for log K and log Y must coincide:

   −δ + (.01)ρ(x) + (.01)[ (.01)σ_y − h*(x) ] · σ_y − (.0001/2)|σ_y|² = (.01)(α̂_y + x),

so that ρ = ρ*, where

   ρ*(x) = 100 δ + (α̂_y + x) + σ_y · h*(x, ℓ*) − (.01/2)|σ_y|².       (29)

We can use these formulas for equilibrium prices to construct a solution to the HJB equation of a representative consumer in a competitive equilibrium by letting π = π*, ρ = ρ*, and a = (.01)σ_y.

^9 If we were to use a timing protocol that allows the maximizing player to take account of the impact of its decisions on the minimizing agent, we would obtain the same equilibrium decision rules as those described in the text.
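Formulas (28) and (29) are straightforward to evaluate once a worst-case drift distortion is in hand. The sketch below (ours, with hypothetical inputs; η₀*, η₁* stand in for the linear worst-case drift coefficients) computes the risk price vector and the risk-free rate at a given state.

```python
import numpy as np

# Hypothetical inputs (illustration only).
delta, alpha_y_hat = 0.002, 0.5
sigma_y = np.array([0.5, 0.0])
eta0_star = np.array([-0.2, -0.1])   # worst-case drift: h*(x) = eta0* + eta1* x
eta1_star = np.array([0.0, -0.4])

def h_star(x):
    return eta0_star + eta1_star * x

def price_of_risk(x):
    # Equation (28): pi*(x) = sigma_y - 100 h*(x)
    return sigma_y - 100.0 * h_star(x)

def riskfree_rate(x):
    # Equation (29): rho*(x) = 100 delta + (alpha_y_hat + x)
    #                + sigma_y . h*(x) - (.01/2)|sigma_y|^2
    return 100.0 * delta + alpha_y_hat + x + sigma_y @ h_star(x) - 0.005 * (sigma_y @ sigma_y)

print(price_of_risk(0.0), riskfree_rate(0.0))
```

Because h* enters (28) with a factor of −100, even a small negative worst-case drift translates into a sizable uncertainty component of the priced compensation, while the same h* lowers the risk-free rate in (29) through σ_y · h*.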

5.3 Reinterpreting the worst-case portfolio problem

As described by Hansen et al. (2006), there is an ordinary (i.e., non-robust) portfolio selection problem that confronts a representative consumer who has a completely trusted model for the exogenous state dynamics and whose decision rule matches the decision rule that attains the value function associated with the HJB equation (26) for the robust representative consumer portfolio problem. This consumer's completely trusted model of course differs from the baseline model (1). Hansen et al. (2006) call this an ex post problem because it comes from exchanging orders of maximization and minimization in the two-person zero-sum game that gives rise to the robust portfolio choice rule. This ex post portfolio problem is a special case of a Merton problem with a state evolution that is distorted relative to the baseline model. The distorted evolution imputes to the process W a drift h* so that

   dW_t = h*(X_t) dt + dW̄_t = ( η₀* + η₁* X_t ) dt + dW̄_t,

where W̄ is a multivariate standard Brownian motion under the h* probability distribution. Thus,

   d log Y_t = (.01)(α̂_y + X_t) dt + (.01) σ_y · h*(X_t) dt + (.01) σ_y · dW̄_t
   dX_t = α̂_x dt − κ̂ X_t dt + σ_x · h*(X_t) dt + σ_x · dW̄_t.

A value function Ṽ(x) + log k + log δ satisfies the HJB equation

   0 = max_{a,c} −δ[ Ṽ(x) + log k + log δ ] + δ log c − c/k + (.01)ρ(x) + (.01)π(x) · a
          + a · ( η₀* + η₁* x ) − |a|²/2 + [ α̂_x − κ̂ x + σ_x · ( η₀* + η₁* x ) ] Ṽ′(x)
          + (|σ_x|²/2) Ṽ″(x).

First-order conditions are

   δ/c − 1/k = 0
   (.01)π(x) + η₀* + η₁* x − a = 0,

which lead to decision rules c = δk and

   a = (.01)π(x) + η₀* + η₁* x.

Because the exposure and drift for log K and log Y should coincide in equilibrium, it follows that

   −δ + (.01)ρ(x) + (.0001) π(x) · σ_y + (.01) σ_y · h*(x) − (.0001/2)|σ_y|² = (.01)[ α̂_y + x + σ_y · h*(x) ].

Thus, the ordinary decision rules that solve the ex post portfolio problem imply the same equilibrium prices as the robust portfolio problem, so that π = π* and ρ = ρ*, as given by (28) and (29), respectively.

5.4

Term structure of uncertainty prices

We now study how competitive equilibrium uncertainty prices vary over an investment horizon by computing a pricing counterpart to an impulse response function. Our continuoustime formulation means that the pertinent shock that occurs during the next instant is
an incremental change that will have incremental effects on prices across all future time
periods. An asset exposed to these shocks earns compensations that depend on the horizon.
Shock-price elasticities are state dependent because they vary with the growth state. In
this section, we compute the elasticities and produce what we regard as a dynamic value
decomposition. We present a quantitative example in section 7.3.10
5.4.1

Local uncertainty prices

The equilibrium stochastic discount factor process for our robust representative consumer
economy is
$$d\log S_t = -\delta\,dt - .01\left(\hat\mu + X_t\right)dt - .01\,\sigma_c\cdot dW_t + h_t^*\cdot dW_t - \frac{1}{2}\,|h_t^*|^2\,dt. \tag{30}$$

The stochastic discount factor has a linear local mean and a quadratic local variance.
The exponential-quadratic formulation has been used extensively in empirical asset pricing
10 Hansen (2011) presents an overview of such shock elasticities.


applications. Duffie and Kan (1994) described term structures of interest rates implied by
models with affine stochastic discount factors. Ang and Piazzesi (2003) estimated a term
structure model with an affine stochastic discount factor process driven by macroeconomic
variables.
The entries of the vector $\varsigma(X_t)$ given by (28), which equal minus the local exposures to
the Brownian shocks, are usually interpreted as local risk prices, but we shall reinterpret
them. Motivated by the decomposition
$$\text{minus stochastic discount factor exposure} = \underbrace{.01\,\sigma_c}_{\text{risk price}} - \underbrace{h_t^*}_{\text{uncertainty price}},$$
we prefer to think of $.01\,\sigma_c$ as risk prices induced by the curvature of log utility and $-h_t^*$
as uncertainty prices induced by a representative investor's doubts about the baseline
model. Here $h_t^* = \tilde\eta_0 + \tilde\eta_1 X_t$, as described in equation (21). When $\tilde\eta_1 = 0$, $h_t^*$ is constant;
but when $\tilde\eta_1$ differs from zero, the uncertainty prices $h_t^* = h^*(X_t)$ are time varying and
depend linearly on the growth state $X_t$. When the dependence of $h^*$ on $x$ is positive, these
uncertainty prices are higher in bad times than in good times. Countercyclical uncertainty
prices emerge endogenously from a baseline model that excludes stochastic volatility in the
underlying consumption risk as an exogenous input. Stochastic volatility models introduce
new risks to be priced while also inducing fluctuations in the prices of the original risks.
The mechanism in this paper simultaneously enhances and induces fluctuations in the
uncertainty prices, but it introduces no new sources of risk. Instead, it focuses on the
impact of uncertainty about the implications of those risks.
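Given a worst-case drift distortion $h^*(x) = \tilde\eta_0 + \tilde\eta_1 x$, the decomposition above is mechanical to evaluate. A minimal Python sketch; the numerical $\tilde\eta$ values below are hypothetical placeholders, not the paper's calibrated worst case:

```python
import numpy as np

# Baseline consumption shock loading, from the calibration in (33)
sigma_c = np.array([0.468, 0.0])

# Hypothetical worst-case drift-distortion coefficients h*(x) = eta0 + eta1 * x
eta0 = np.array([-0.08, -0.17])
eta1 = np.array([0.0, 0.24])

def h_star(x):
    """Worst-case drift distortion at growth state x."""
    return eta0 + eta1 * x

def minus_sdf_exposure(x):
    """Minus the local SDF exposure: risk price .01*sigma_c plus
    uncertainty price -h*(x)."""
    return 0.01 * sigma_c - h_star(x)
```

With a positive second component of $\tilde\eta_1$, the uncertainty component of the price is larger when $x$ is low, that is, in bad times.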
Following Borovicka et al. (2011), we assign horizon-dependent uncertainty prices to risk
exposures. To represent shock price elasticities, we study the dependence of logarithms of
expected returns on an investment horizon. The logarithm of the expected return from a
consumption payoff at date t is
$$\log E\left[\frac{C_t}{C_0}\,\Big|\,X_0 = x\right] - \log E\left[S_t\,\frac{C_t}{C_0}\,\Big|\,X_0 = x\right]. \tag{31}$$

The first term captures the expected payoff and the second the cost of the payoff. A shock
in the next instant affects the consumption and the stochastic discount factor processes.


In continuous time, this leads formally to what is called a Malliavin derivative. There is
one such derivative for each Brownian increment. The date zero shock influences both the
expected asset payoff at date t (aggregate consumption in this case) and the cost of an asset
with this payoff. Its impact on the logarithm of the expected return is the price elasticity
and its impact on the logarithm of the expected payoff is the exposure elasticity.
Consider initially the expected payoff term on the left side of (31). Let D0 Ct denote
the derivative vector of Ct with respect to dW0 . The familiar formula for a derivative of a
logarithm applies so that
$$D_0 C_t = C_t\,D_0\log C_t.$$
A contribution to the elasticity vector for horizon t is
$$\frac{E\left[\frac{C_t}{C_0}\,D_0\log C_t\,\Big|\,X_0 = x\right]}{E\left[\frac{C_t}{C_0}\,\Big|\,X_0 = x\right]}.$$
There is a distinct elasticity for each shock. Since log C has linear dynamics, D0 log Ct can
be shown to be the same as the vector of impulse responses of log Ct to shocks at date zero,
which does not depend on the Markov state. We call this an exposure elasticity.
Consider next the cost term on the right-hand side of (31). For the product M = SC, a
calculation analogous to the preceding one confirms that the contribution of the cost term
to the elasticity is
$$\frac{E\left[\frac{M_t}{M_0}\,D_0\log M_t\,\Big|\,X_0 = x\right]}{E\left[\frac{M_t}{M_0}\,\Big|\,X_0 = x\right]}.$$

The dynamic evolution of the stochastic discount factor is not linear in the state variable,
and as a result
D0 log Mt = D0 log St + D0 log Ct
is no longer a deterministic function of time.
The shock price elasticity combines these calculations:
$$\frac{E\left[\frac{C_t}{C_0}\,D_0\log C_t\,\Big|\,X_0 = x\right]}{E\left[\frac{C_t}{C_0}\,\Big|\,X_0 = x\right]} - \frac{E\left[\frac{M_t}{M_0}\,D_0\log M_t\,\Big|\,X_0 = x\right]}{E\left[\frac{M_t}{M_0}\,\Big|\,X_0 = x\right]} = \frac{E\left[\frac{C_t}{C_0}\,D_0\log C_t\,\Big|\,X_0 = x\right]}{E\left[\frac{C_t}{C_0}\,\Big|\,X_0 = x\right]} - \frac{E\left[\frac{M_t}{M_0}\left(D_0\log C_t + D_0\log S_t\right)\Big|\,X_0 = x\right]}{E\left[\frac{M_t}{M_0}\,\Big|\,X_0 = x\right]}. \tag{32}$$

This is a valuation analogue of the impulse response function routinely estimated by empirical
macroeconomists. As we vary $t \ge 0$, we trace out a trajectory of uncertainty price
elasticities for each shock. We give nearly analytical formulas for these in Appendix F.
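Appendix F reduces both elasticities to nearly analytical expressions. A sketch of those closed forms in Python, using the baseline estimates from (33) together with hypothetical worst-case drift coefficients (placeholders, not the calibrated values):

```python
import numpy as np

kappa_hat = 0.185                    # baseline persistence
sigma_c = np.array([0.468, 0.0])     # consumption shock loading
sigma_x = np.array([0.0, 0.149])     # growth-state shock loading
eta0 = np.array([-0.08, -0.17])      # hypothetical worst-case coefficients
eta1 = np.array([0.0, 0.24])

def exposure_elasticity(t, u):
    """Impulse response of log C to a date-zero shock in direction u."""
    return (0.01 / kappa_hat) * (1.0 - np.exp(-kappa_hat * t)) * (sigma_x @ u) \
        + 0.01 * (sigma_c @ u)

def price_elasticity(t, u, x):
    """Exposure elasticity minus the worst-case drift distortion h*(x).u."""
    return exposure_elasticity(t, u) - (eta0 + eta1 * x) @ u
```

For the growth-rate shock direction the exposure trajectory rises smoothly toward its long-run level, and the price trajectory is shifted by $-h^*(x)\cdot u$, which is what makes it state dependent.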

6 A quantitative example

For a laboratory, we use our baseline model (1) evaluated at the following maximum likelihood estimates computed by Hansen and Sargent (2010):11

$$\hat\mu = .465, \qquad \sigma_c = \begin{bmatrix} .468 \\ 0 \end{bmatrix}, \qquad \hat\alpha = 0, \qquad \hat\kappa = .185, \qquad \sigma_x = \begin{bmatrix} 0 \\ .149 \end{bmatrix}. \tag{33}$$

We consider Chernoff entropy balls associated with half-lives of 40 quarters, 80 quarters,
and 120 quarters. For comparison, we include a model with no concern for robustness, which
is equivalent to a half-life equal to infinity. We consider two specifications of $\hat\xi$:
$$\hat\xi(x) = x^2, \qquad \hat\xi(x) = 1.$$
Tables 1 and 2 report worst-case models that emerge from the robust planner's HJB
equation (20) for the specifications $\hat\xi(x) = x^2$ and $\hat\xi(x) = 1$, respectively. The worst-case
models endow $X$ with a negative mean given by $\tilde\alpha/\tilde\kappa$. This in turn shifts the implied growth
in consumption. Since the worst-case model also includes a change in $\mu$, we report the
composite outcome $\tilde\mu + \tilde\alpha/\tilde\kappa$. As we reduce the half-life, the worst-case model makes the
constant parameter adjustment smaller. When $\hat\xi = x^2$, we also increase the persistence in
the growth-rate process. Notice that while the minimizing agent could choose to reduce
persistence even more, he instead chooses to allocate some of the entropy distortion to the
constant terms.

11 The estimates are for $Y$ being consumption of nondurables and services for aggregate U.S. data over
the period 1948:II to 2009:IV.
Consistent with findings of Anderson et al. (2003) and Hansen and Sargent (2010), when
$\hat\xi(x) = 1$, there is no change in the persistence parameter $\kappa$. The worst-case model targets
the constant terms in the consumption evolution and the state evolution. The worst-case
analysis reduces to a determination of how much to distort the respective constant terms
only.
Half-Life    $\tilde\mu$       $\tilde\alpha$       $\tilde\kappa$       $\tilde\mu + \tilde\alpha/\tilde\kappa$
$\infty$     0.4650    0          0.1850    0.4650
120          0.4270   -0.0255     0.1491    0.2562
80           0.4239   -0.0301     0.1361    0.2024
40           0.4228   -0.0391     0.1073    0.0579

Table 1: Worst-case parameter values for $\hat\xi(x) = x^2$ associated with HJB equation (20).

Half-Life    $\tilde\mu$       $\tilde\alpha$       $\tilde\kappa$       $\tilde\mu + \tilde\alpha/\tilde\kappa$
$\infty$     0.4650    0          0.1850    0.4650
120          0.4140   -0.0276     0.1850    0.2648
80           0.4026   -0.0338     0.1850    0.2198
40           0.3768   -0.0478     0.1850    0.1182

Table 2: Worst-case parameter values for $\hat\xi(x) = 1$ associated with HJB equation (20).
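Because $\sigma_c$ loads only on the first shock and $\sigma_x$ only on the second, each row of tables 1 and 2 can be inverted into the drift distortion $h^*(x) = \tilde\eta_0 + \tilde\eta_1 x$ that produces it, using $\tilde\mu = \hat\mu + \sigma_c\cdot\tilde\eta_0$, $\tilde\alpha = \sigma_x\cdot\tilde\eta_0$, and $\hat\kappa - \tilde\kappa = \sigma_x\cdot\tilde\eta_1$. A sketch of that inversion, treated as illustrative:

```python
import numpy as np

mu_hat, kappa_hat = 0.465, 0.185     # baseline estimates from (33)
sigma_c1, sigma_x2 = 0.468, 0.149    # nonzero entries of sigma_c and sigma_x

def drift_distortion(mu_tilde, alpha_tilde, kappa_tilde):
    """Invert worst-case (mu, alpha, kappa) into h*(x) = eta0 + eta1 * x."""
    eta0 = np.array([(mu_tilde - mu_hat) / sigma_c1,
                     alpha_tilde / sigma_x2])
    eta1 = np.array([0.0, (kappa_hat - kappa_tilde) / sigma_x2])
    return eta0, eta1

# Half-life 120 row of Table 1
eta0, eta1 = drift_distortion(0.4270, -0.0255, 0.1491)
```

Both components of $\tilde\eta_0$ come out negative (pessimistic constant shifts), and the second component of $\tilde\eta_1$ is positive, which is what makes the worst-case growth state more persistent.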

Corresponding to each of our two specifications of $\hat\xi(x)$, figures 1 and 2 plot the expected
consumption growth over horizon $t$. The expected growth rate at $t$ is:
$$\begin{bmatrix} 1 & 0 \end{bmatrix} \exp\left(\begin{bmatrix} 0 & 1 \\ 0 & -\tilde\kappa \end{bmatrix} t\right) \begin{bmatrix} \tilde\mu \\ \tilde\alpha \end{bmatrix} = \tilde\mu + \frac{\tilde\alpha}{\tilde\kappa} - \frac{\tilde\alpha}{\tilde\kappa}\exp\left(-\tilde\kappa t\right).$$
Integrating this growth rate over an interval $[0, t]$ gives a worst-case trend for log consumption:
$$\left(\tilde\mu + \frac{\tilde\alpha}{\tilde\kappa}\right) t + \frac{\tilde\alpha}{\tilde\kappa^2}\exp\left(-\tilde\kappa t\right) - \frac{\tilde\alpha}{\tilde\kappa^2}. \tag{34}$$
Notice that the initial growth rate is $\tilde\mu$ and that the eventual growth rate is $\tilde\mu + \tilde\alpha/\tilde\kappa$.
In this calculation, we impose the distorted model starting at date zero and consider
its implications going forward. The shift in the constant term for the evolution of $X$ has
no immediate impact on the growth of $\log C$. Its eventual impact is determined in part by
the persistence parameter $\tilde\kappa$. We applied formula (34) to alternative models, including the
worst-case models, to compute the long-run drifts reported in figures 1 and 2.
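Formula (34) is straightforward to evaluate directly. A small sketch that computes the worst-case trend of $\log C$ for the half-life-80 parameters in table 1:

```python
import numpy as np

def log_c_trend(t, mu, alpha, kappa):
    """Worst-case trend for log C over [0, t], formula (34)."""
    return (mu + alpha / kappa) * t + (alpha / kappa**2) * (np.exp(-kappa * t) - 1.0)

# Half-life 80 worst-case parameters from Table 1
mu, alpha, kappa = 0.4239, -0.0301, 0.1361
trend = log_c_trend(np.linspace(0.0, 40.0, 161), mu, alpha, kappa)
```

The slope starts at $\tilde\mu$ and settles at $\tilde\mu + \tilde\alpha/\tilde\kappa$, so the shift in the constant of the $X$ equation has no immediate effect on growth but lowers it eventually.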

Figure 1: Long-run drift of $\log C_t$ for the three target half-lives when $\hat\xi(x) = x^2$.


Figure 2: Long-run drift of $\log C_t$ for the three target half-lives when $\hat\xi(x) = 1$.

Next we consider the distributional impacts. The new information about $\log C_t - \log C_0$
(scaled by 100) is:
$$\begin{aligned}
\int_0^t\!\!\int_0^u \exp\left[-\kappa(u - r)\right]\sigma_x\cdot dB_r\,du + \int_0^t \sigma_c\cdot dB_u
&= \int_0^t \left(\int_r^t \exp\left[-\kappa(u - r)\right] du\right)\sigma_x\cdot dB_r + \int_0^t \sigma_c\cdot dB_u \\
&= \frac{1}{\kappa}\int_0^t \exp(\kappa r)\left[\exp(-\kappa r) - \exp(-\kappa t)\right]\sigma_x\cdot dB_r + \int_0^t \sigma_c\cdot dB_u \\
&= \frac{1}{\kappa}\int_0^t \left[1 - \exp\left[-\kappa(t - r)\right]\right]\sigma_x\cdot dB_r + \int_0^t \sigma_c\cdot dB_u.
\end{aligned}$$
The variance is .0001 times the following object:
$$\begin{aligned}
&\frac{1}{\kappa^2}\int_0^t \left[1 - 2\exp\left[-\kappa(t - r)\right] + \exp\left[-2\kappa(t - r)\right]\right]|\sigma_x|^2\,dr + |\sigma_c|^2\,t \\
&\qquad = \frac{1}{\kappa^2}\,|\sigma_x|^2\,t + |\sigma_c|^2\,t - \frac{2}{\kappa^3}\left[1 - \exp(-\kappa t)\right]|\sigma_x|^2 + \frac{1}{2\kappa^3}\left[1 - \exp(-2\kappa t)\right]|\sigma_x|^2,
\end{aligned}$$
where we have used the fact that $\sigma_c\cdot\sigma_x = 0$. Using this calculation, figures 3 and 4
display the interdecile ranges of the distribution for consumption growth over alternative
horizons. These figures depict deciles for both the baseline model and the worst-case models
associated with a half-life of 80. The region between the deciles illustrates a component
of risk in the consumption distribution. The variation across the baseline and worst-case
models reflects a broader notion of uncertainty driven by skepticism about the baseline
model. The upper decile of the worst-case model overlaps the lower decile of the baseline
model in both figures. The interdecile range is somewhat larger when $\hat\xi$ is quadratic in $x$
than when $\hat\xi$ is constant.
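The variance formula can be verified against direct numerical integration of the Itô isometry. A sketch using the baseline loadings; the persistence value $\kappa$ is set to the baseline estimate purely for illustration:

```python
import numpy as np

kappa = 0.185
sc2 = 0.468**2    # |sigma_c|^2
sx2 = 0.149**2    # |sigma_x|^2; sigma_c . sigma_x = 0

def var_closed(t):
    """Closed-form variance term (to be multiplied by .0001)."""
    return (sx2 / kappa**2) * t + sc2 * t \
        - (2.0 * sx2 / kappa**3) * (1.0 - np.exp(-kappa * t)) \
        + (sx2 / (2.0 * kappa**3)) * (1.0 - np.exp(-2.0 * kappa * t))

def var_numeric(t, n=100001):
    """Trapezoid-rule integration of the Ito isometry integrand."""
    r = np.linspace(0.0, t, n)
    g = (1.0 - np.exp(-kappa * (t - r)))**2 * sx2 / kappa**2 + sc2
    dr = r[1] - r[0]
    return dr * (g.sum() - 0.5 * (g[0] + g[-1]))
```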

Figure 3: Expected values of $\log C_t$ scaled by 100 for the baseline model and for a half-life
of 80 when $\hat\xi = x^2$. The shaded black and red areas show the .1 and .9 interdecile ranges
under the baseline model and the worst-case model for a half-life of 80. The black line is
the mean growth for the baseline model, and the red circle line is the mean growth for the
worst-case model.


Figure 4: Expected values of $\log C_t$ scaled by 100 for the baseline model and for a half-life
of 80 when $\hat\xi = 1$. The shaded black and red areas show the .1 and .9 interdecile ranges
under the baseline and the worst-case model for a half-life of 80. The black line is the
mean growth for the baseline model, and the red circle line is the mean growth for the
worst-case model.

7 Comparing sets of models

We have used the set of models
$$\mathcal{Z}^* = \left\{ Z^h : \Delta\!\left(Z^h; \hat\xi, x\right) \le \int \hat\xi(x)\,g(x)\,\widehat Q(dx) \right\}$$
to describe our robust representative consumer's concerns about misspecification of his
baseline model (1). In this section, we compare $\mathcal{Z}^*$ to a corresponding Chernoff entropy ball
$\widehat{\mathcal{Z}}$ and also to another set called an entropy ball that we now describe.


7.1 Entropy ball

Anderson et al. (2003) and Hansen and Sargent (2010) focused primarily on entropy balls.
Here we are interested in constructing a new set $\bar{\mathcal{Z}}$ that we define as the smallest entropy
ball that contains $\mathcal{Z}^*$. An entropy ball is a family of $Z^h$'s that satisfy12
$$\bar\Delta\left(Z^h\right) = \int\left[\delta\int_0^\infty \exp(-\delta t)\,E\left(Z_t^h \log Z_t^h\,\Big|\,X_0 = x\right)dt\right] g(x)\,\widehat Q(dx) \le \frac{1}{2}\,\ell \tag{35}$$
for some constant $\ell > 0$. By constructing an entropy ball that contains $\mathcal{Z}^*$, we compute
how large relative entropy can be for martingales in the set $\mathcal{Z}^*$.
To determine this magnitude, we take the quadratic function $\hat\xi(x)$ and pose a maximum
problem that starts from the observation that a martingale $Z^h$ that is biggest in terms of
its relative entropy satisfies the constraint in definition 3.4 at equality:
$$\Delta\!\left(Z^h; \hat\xi, x\right) = \int \hat\xi(x)\,g(x)\,\widehat Q(dx).$$
This leads us to maximize the discounted objective
$$E\left[\int_0^\infty \exp(-\delta t)\,Z_t^h\,\hat\xi(X_t)\,dt\,\Big|\,X_0 = x\right]$$
subject to the constraint:
$$\Delta\!\left(Z^h; \hat\xi, x\right) - \int \hat\xi(x)\,g(x)\,\widehat Q(dx) \le 0.$$
Associated with this optimization problem is the HJB equation:
$$0 = \max_h\; -\delta\,\omega(x,\eta) + \hat\xi(x) - \hat\kappa x\,\omega_x(x,\eta) + \frac{1}{2}\,|\sigma_x|^2\,\omega_{xx}(x,\eta) + \omega_x(x,\eta)\,\sigma_x\cdot h - \frac{\eta}{2}\,|h|^2 + \eta\,\hat\xi(x), \tag{36}$$

where we can regard $\eta$ as a Kuhn-Tucker multiplier on a relative entropy constraint. Because
the maximizing $h$ takes the form
$$\bar h = \bar\eta_0(\eta) + \bar\eta_1(\eta)\,X,$$
it is one of the models in the parametric class from section 3.1. To compute $\ell$ in constraint
(35), we solve
$$\ell = 2 \min_{\eta \ge 0} \int \omega(x,\eta)\,g(x)\,\widehat Q(dx).$$
Let
$$\bar{\mathcal{Z}} = \left\{ Z^h : \bar\Delta\left(Z^h\right) \le \frac{1}{2}\,\ell \right\}.$$
By construction $\mathcal{Z}^* \subset \bar{\mathcal{Z}}$.

12 We use the term ball loosely because typically a ball in mathematics is defined using a metric. Although
relative entropy quantifies statistical discrepancy, it is not a metric because it depends on which of two
models is taken as the baseline model.
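For intuition about the scale of (35): a constant drift distortion $h$ has $E\left[Z_t^h \log Z_t^h \mid X_0\right] = \frac{1}{2}|h|^2 t$, so its discounted relative entropy is $\delta\int_0^\infty e^{-\delta t}\,\frac{1}{2}|h|^2 t\,dt = |h|^2/(2\delta)$. A sketch that checks this by quadrature; the value of $\delta$ is hypothetical:

```python
import numpy as np

delta = 0.002      # hypothetical subjective discount rate
h_sq = 0.15**2     # |h|^2 for a constant drift distortion

# delta * int_0^inf exp(-delta*t) * 0.5*|h|^2*t dt  =  |h|^2 / (2*delta)
t = np.linspace(0.0, 2.0e4, 2_000_001)
g = delta * np.exp(-delta * t) * 0.5 * h_sq * t
dt = t[1] - t[0]
entropy_numeric = dt * (g.sum() - 0.5 * (g[0] + g[-1]))
entropy_closed = h_sq / (2.0 * delta)
```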

7.2 Comparing sets

We compare intersections of $\mathcal{Z}^o$ with each of the three sets $\mathcal{Z}^*$, $\bar{\mathcal{Z}}$, and $\widehat{\mathcal{Z}}$. While it is
tractable to use the sets $\mathcal{Z}^*$ and $\bar{\mathcal{Z}}$ to formulate robust decision problems, these sets are
not directly linked to statistical discrimination problems. The set $\widehat{\mathcal{Z}}$ is closely linked to
statistical discrimination, but for forming robust decision problems it is not as tractable
as the other two. It would be comforting if $\mathcal{Z}^*$ were to closely approximate $\widehat{\mathcal{Z}}$, at least in
regions near the worst-case model that emerges from the robust planner's HJB equation
(20).

We compute and report the projection $\widehat{\mathcal{Z}} \cap \mathcal{Z}^o$ of the Chernoff ball on $\mathcal{Z}^o$ for three
half-lives in figure 5. We represent these projections using the three parameter values that
characterize $\mathcal{Z}^o$. For comparison, we also report $(\mathcal{Z}^* \cap \mathcal{Z}^o)$. The sets are distinct, but
the big differences occur for parameter values at which the Chernoff ball contains points
not included in $\mathcal{Z}^*$. Such values turn out not to be the ones that the robust
planner most fears. Overall, the regions are closer for longer specifications of the half-life
of Chernoff entropy.

Figure 5: Projections of sets I and II onto three-parameter axes. From top to bottom:
target half-lives 120, 80, and 40, respectively. Left: $\hat\xi = x^2$. Right: $\hat\xi = 1$. $(\mathcal{Z}^* \cap \mathcal{Z}^o)$ shown
in blue mesh. $(\widehat{\mathcal{Z}} \cap \mathcal{Z}^o)$ shown in yellow. The solution to the robust planner's problem is
shown as the red point.


Half-Life    $\tilde\mu$       $\tilde\alpha$       $\tilde\kappa$       $\tilde\mu + \tilde\alpha/\tilde\kappa$
$\infty$     0.4650    0          0.1850    0.4650
69.78        0.3982   -0.0362     0.1850    0.2024
42.19        0.3791   -0.0466     0.1850    0.1273
16.65        0.3282   -0.0742     0.1850   -0.0726

Table 3: Worst-case parameter values for $\hat\xi(x) = x^2$ under the set $(\bar{\mathcal{Z}} \cap \mathcal{Z}^o)$.

In figure 6, we compare the entropy ball $(\bar{\mathcal{Z}} \cap \mathcal{Z}^o)$ projection to both $(\mathcal{Z}^* \cap \mathcal{Z}^o)$ and
$(\widehat{\mathcal{Z}} \cap \mathcal{Z}^o)$ when $\hat\xi(x) = x^2$. The left side of this plot shows how large an entropy ball
would have to be to contain the set used in the robust planning problem affiliated with
HJB equation (20). The right side compares the Chernoff ball to this entropy ball. As is
evident from this figure, the resulting entropy ball is much larger. When we solve the robust
planner's problem with this constructed ball, the implied half-lives, as reported in Table 3,
decline from 120, 80, and 40 to 70, 42, and 17, respectively. The constant terms for both
the consumption equation (the first equation of (10)) and the consumption growth
equation (the second equation of (10)) are reduced, while the autoregressive parameter is
not altered in comparison to Table 1.


Figure 6: Comparing entropy balls to sets I and II when $\hat\xi = x^2$. From top to bottom:
half-lives 120, 80, and 40, respectively. $(\bar{\mathcal{Z}} \cap \mathcal{Z}^o)$ shown in blue, $(\mathcal{Z}^* \cap \mathcal{Z}^o)$ shown in black
mesh. $(\widehat{\mathcal{Z}} \cap \mathcal{Z}^o)$ shown in yellow. The solution to the robust planner's problem is the red
point.


7.3 Shock price elasticities

Figure 7: Shock-price elasticity to a shock to $X$ for the three target half-lives when $\hat\xi = x^2$.
From top to bottom: the half-lives 120, 80, and 40, respectively. The shaded regions show
interquartile ranges of the shock-price elasticities under the stationary distribution for $X$.


The uncertainty price elasticities depend on the initial state x. In figure 7, we display
the shock elasticities evaluated at the median and the two quartiles of the stationary distribution for X. We shade in interquartile ranges. Figure 7 shows shock price elasticity
trajectories for the growth rate shock. They are nearly constant across horizons. Increasing
the concern for robustness, as reflected by the sizes of the associated Chernoff half-lives,
makes the elasticities larger and increases their variation across horizons.

8 Concluding remarks

We have applied our proposal for constructing a set of models surrounding a decision
maker's baseline probability model to an asset pricing model in which a representative
consumer's responses to model uncertainty make so-called prices of risk be countercyclical.
We say so-called because they are actually compensations for model uncertainty, not risk.
And their countercyclical components are entirely due to fears of model misspecification.
We have produced an affine model (30) of the log stochastic discount factor whose so-called
risk prices reflect a robust planner's worst-case drift distortions $h_t^*$. We describe how these
drift distortions should be interpreted as prices of model uncertainty. The dependency of
these uncertainty prices $h_t^*$ on the growth state $x$, and thus whether they are procyclical or
countercyclical, is shaped partly by a function $\hat\xi(x)$ that describes some particular models
that serve as alternatives to a baseline model. In this way, the theory of countercyclical
risk premia in this paper is all about how our robust consumer responds to the presence
of the particular alternative models among a huge set of more vaguely specified alternative
models that concern our representative consumer. We have demonstrated that this is a
simple way of generating countercyclical risk premia.
It is worthwhile comparing this paper's way of inducing countercyclical risk premia with
three other macro/finance models that also get them. Campbell and Cochrane (1999) proceed in the standard rational expectations single-known-probability-model tradition and so
exclude any fears of model misspecification from the mind of their representative consumer.
They construct a history-dependent utility function in which the history of consumption
expresses an externality. This history dependence makes the consumers local risk aversion
depend in a countercyclical way on the economys growth state. Ang and Piazzesi (2003)
use an affine stochastic discount factor in a no-arbitrage statistical model and explore links
between the term structure of interest rates and other macroeconomic variables. Their
approach allows movements in risk prices to be consistent with historical evidence without

specifying a general equilibrium model. A third approach introduces stochastic volatility into the macroeconomy by positing that the volatilities of shocks driving consumption
growth are themselves stochastic processes. A stochastic volatility model induces time
variation in risk prices via exogenous movements in the conditional volatilities impinging
on macroeconomic variables.
What drives countercyclical risk prices in Hansen and Sargent (2010) is a particular
kind of robust model averaging occurring inside the head of the representative consumer.
The consumer carries along two difficult-to-distinguish models of consumption growth, one
asserting i.i.d. log consumption growth, the other asserting that the growth in log consumption is a process with a slowly moving conditional mean.13 The consumer uses observations
on consumption growth to update a Bayesian prior over these two models, starting from
an initial prior probability of .5. The prior wanders over the post WWII sample for US
data, but ends up about where it started. Each period, the Hansen and Sargent representative consumer expresses his specification distrust by exponentially twisting a posterior over
the two baseline models in a pessimistic direction. That leads the consumer to interpret
good news as temporary and bad news as persistent, causing him to put countercyclical
uncertainty components into equilibrium risk prices.
In this paper, we propose a different way to induce variation in risk prices. We abstract
from learning and instead consider alternative models with parameters whose future variations
are not discernible from the past. These time-varying parameter models differ
from the decision maker's baseline model, a fixed parameter model whose parameters can
be well estimated from historical data. We ensure that among the class of alternative models
are ones that allow parameters persistently to deviate from those of the baseline
model in statistically subtle and time-varying ways. In addition to this class of alternative
models, the decision maker also includes other statistical specifications in the set of models
that concern him. The robust planner's worst-case model responds to these forms of model
ambiguity partly by having more persistence than in the baseline model. Our approach gains
tractability because the worst-case model turns out to be a time-invariant model in which
projections for long-term growth are more cautious and stochastic growth is more persistent
than in the baseline model. Worst-case shock distributions are shifted in an adverse
fashion and with additional persistence that gives rise to enduring effects on uncertainty
prices. Adverse shifts in the shock distribution that drive up the absolute magnitudes of
uncertainty prices were also present in some of our earlier work (for example, see
Hansen et al. (1999) and Anderson et al. (2003)). In this paper, we induce state dependence
in uncertainty prices in a different way, namely, by specifying the set of alternative models
to capture concerns about the baseline model's specification of persistence in consumption
growth.

13 Bansal and Yaron (2004) and Hansen and Sargent (2010) both start from the observation that two such
models are difficult to distinguish empirically, but they draw different conclusions from that observation.
Bansal and Yaron use the observation to justify a representative consumer who with complete confidence
embraces one of the models (the long-run risk model with persistent log consumption growth), while Hansen
and Sargent use the observation to justify a representative consumer who initially puts prior probability
.5 on both models and who continues to carry along both models when evaluating prospective outcomes.
Models of robustness and ambiguity aversion bring new parameters. In this paper,
we extend our earlier work in Anderson et al. (2003) on restricting these parameters by
exploiting connections between models of statistical model discrimination and our way of
formulating robustness. We build on mathematical formulations of Newman and Stuck
(1979), Petersen et al. (2000), and Hansen et al. (2006). We pose an ex ante robustness
problem that pins down a robustness penalty parameter by linking it to an asymptotic
measure of statistical discrimination between models. This asymptotic measure allows us to
quantify a half-life for reducing the mistakes in selecting between competing models based
on historical evidence. A large statistical discrimination rate implies a short half-life for
reducing discrimination mistake probabilities. Anderson et al. (2003) and Hansen (2007)
had studied the connection between conditional discrimination rates and uncertainty prices
that clear security markets. By following Newman and Stuck (1979) and studying asymptotic rates, we link statistical discrimination half-lives to calibrated equilibrium uncertainty
prices.


A Z Reconsidered

Let $\tilde h$ be a vector process adapted to the filtration $\mathcal{F} = \{\mathcal{F}_t : t \ge 0\}$. Replace $h$ with $h - \tilde h$
in formula (5) to construct a martingale $Z^{h - \tilde h}$ that determines what we shall call an $h - \tilde h$
model.
model.
Let $B_t$ be a bounded $\mathcal{F}_t$-measurable random variable. Define
$$E^h(B_t | X_0) \doteq E\left[Z_t^h\,B_t\,\middle|\,X_0\right]$$
$$E^{h - \tilde h}(B_t | X_0) \doteq E\left[Z_t^{h - \tilde h}\,B_t\,\middle|\,X_0\right].$$
Here $E^h$ denotes an expectation under the $h$ model and $E^{h - \tilde h}$ denotes an expectation under
the $h - \tilde h$ model.
By using $\frac{Z_t^h}{Z_t^{h - \tilde h}}$ as a Radon-Nikodym derivative at time $t$, we can represent the $h$ model
in terms of the $h - \tilde h$ model:
$$E^h(B_t | X_0) = E\left[Z_t^h\,B_t\,\middle|\,X_0\right] = E\left[\left(\frac{Z_t^h}{Z_t^{h - \tilde h}}\right) Z_t^{h - \tilde h}\,B_t\,\middle|\,X_0\right] = E^{h - \tilde h}\left[\left(\frac{Z_t^h}{Z_t^{h - \tilde h}}\right) B_t\,\middle|\,X_0\right].$$
Recall that under the $h$ probability distribution, $W^h$ is a multivariate standard Brownian
motion, where from (6), $dW_t = h_t\,dt + dW_t^h$. Thus,
$$\begin{aligned}
d\log Z_t^{h - \tilde h} &= (h_t - \tilde h_t)\cdot dW_t - \frac{1}{2}\,|h_t - \tilde h_t|^2\,dt \\
&= (h_t - \tilde h_t)\cdot dW_t^h - \frac{1}{2}\,|h_t - \tilde h_t|^2\,dt + (h_t - \tilde h_t)\cdot h_t\,dt \\
&= (h_t - \tilde h_t)\cdot dW_t^h + \frac{1}{2}\,|h_t|^2\,dt - \frac{1}{2}\,|\tilde h_t|^2\,dt.
\end{aligned} \tag{37}$$
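The completion-of-squares step in (37) is a pointwise vector identity, $-\frac{1}{2}|h - \tilde h|^2 + (h - \tilde h)\cdot h = \frac{1}{2}|h|^2 - \frac{1}{2}|\tilde h|^2$, which a two-line sketch can confirm for arbitrary vectors:

```python
import numpy as np

rng = np.random.default_rng(0)
h = rng.normal(size=3)
h_tilde = rng.normal(size=3)

# Drift rearrangement used in (37)
lhs = -0.5 * (h - h_tilde) @ (h - h_tilde) + (h - h_tilde) @ h
rhs = 0.5 * (h @ h) - 0.5 * (h_tilde @ h_tilde)
```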

Conditioned on date zero information, the discounted relative entropy of the $h$ model
with respect to the $h - \tilde h$ model is:
$$\delta\int_0^\infty \exp(-\delta t)\,E\left[Z_t^h\left(\log Z_t^h - \log Z_t^{h - \tilde h}\right)\middle|\,X_0 = x\right] dt = E^h\left[\frac{1}{2}\int_0^\infty \exp(-\delta t)\,|\tilde h_t|^2\,dt\,\middle|\,X_0 = x\right], \tag{38}$$
where we have used integration by parts and the evolution in (37). We are interested in the
discrepancy between: (i) the relative entropy (8) of $h$ with respect to the baseline model,
and (ii) the relative entropy (38) of the $h$ model with respect to the $h - \tilde h$ model:
$$\begin{aligned}
\Delta\!\left(Z^h; |\tilde h|^2, x\right) &= \delta\int_0^\infty \exp(-\delta t)\,E\left[Z_t^h \log Z_t^h\,\middle|\,X_0 = x\right] dt \\
&\quad - \delta\int_0^\infty \exp(-\delta t)\,E\left[Z_t^h\left(\log Z_t^h - \log Z_t^{h - \tilde h}\right)\middle|\,X_0 = x\right] dt \\
&= \frac{1}{2}\int_0^\infty \exp(-\delta t)\,E^h\left[|h_t|^2 - |\tilde h_t|^2\,\middle|\,X_0 = x\right] dt.
\end{aligned} \tag{39}$$

B Construction of entropy ball

For a given multiplier $\eta$, write the value function that solves HJB equation (36) in the form:
$$\omega(x,\eta) = \frac{1}{2}\left[\bar\lambda_2(\eta)\,x^2 + 2\,\bar\lambda_1(\eta)\,x + \bar\lambda_0(\eta)\right],$$
which gives us
$$\bar h(x,\eta) = \frac{1}{\eta}\left[\bar\lambda_2(\eta)\,x + \bar\lambda_1(\eta)\right]\sigma_x.$$
We can solve for $\bar\lambda_2$, $\bar\lambda_1$, and $\bar\lambda_0$ by comparing the coefficients for $x^2$, $x$ and the constant
terms, respectively. Solving first for $\bar\lambda_2$:
$$\bar\lambda_2(\eta) = \frac{\left(\delta\eta + 2\hat\kappa\eta\right) - \sqrt{\left(\delta\eta + 2\hat\kappa\eta\right)^2 - 4\,|\sigma_x|^2\,2\eta(\eta + 1)}}{2\,|\sigma_x|^2}.$$
Thus
$$-\delta\,\bar\lambda_1(\eta) + \hat\alpha\,\bar\lambda_2(\eta) - \hat\kappa\,\bar\lambda_1(\eta) + \frac{1}{\eta}\,|\sigma_x|^2\,\bar\lambda_2(\eta)\,\bar\lambda_1(\eta) = 0,$$
so that
$$\bar\lambda_1(\eta) = \frac{\hat\alpha\,\bar\lambda_2(\eta)}{\delta + \hat\kappa - \frac{1}{\eta}\,|\sigma_x|^2\,\bar\lambda_2(\eta)}.$$
Finally,
$$-\frac{\delta}{2}\,\bar\lambda_0(\eta) + \hat\alpha\,\bar\lambda_1(\eta) + \frac{1}{2}\,\bar\lambda_2(\eta)\,|\sigma_x|^2 + \frac{1}{2\eta}\,\bar\lambda_1(\eta)^2\,|\sigma_x|^2 = 0.$$
Thus
$$\bar\lambda_0(\eta) = \frac{2}{\delta}\left[\hat\alpha\,\bar\lambda_1(\eta) + \frac{1}{2}\,\bar\lambda_2(\eta)\,|\sigma_x|^2 + \frac{1}{2\eta}\,\bar\lambda_1(\eta)^2\,|\sigma_x|^2\right].$$
To build the associated entropy ball, we construct
$$\ell = \min_{\eta \ge 0}\left[\bar\lambda_2(\eta)\,x^2 + 2\,\bar\lambda_1(\eta)\,x + \bar\lambda_0(\eta)\right]$$
for $X_0 = x$.

C Asymptotic error rates

Consider an alternative model:
$$\begin{aligned}
d\log Y_t &= (.01)\left(\mu + X_t\right)dt + (.01)\,\sigma_c\cdot d\widetilde W_t \\
dX_t &= \alpha\,dt - \kappa X_t\,dt + \sigma_x\cdot d\widetilde W_t.
\end{aligned}$$

i) Input values of $\mu$, $\alpha$, and $\kappa$. Construct the implied $h(x) = \eta_1 x + \eta_0$ by solving
$$\begin{bmatrix} \sigma_c' \\ \sigma_x' \end{bmatrix} \eta_0 = \begin{bmatrix} \mu - \hat\mu \\ \alpha \end{bmatrix}, \qquad \begin{bmatrix} \sigma_c' \\ \sigma_x' \end{bmatrix} \eta_1 = \begin{bmatrix} 0 \\ \hat\kappa - \kappa \end{bmatrix}$$
for $\eta_1$ and $\eta_0$.

ii) For a given $r$, construct $\zeta_0$, $\zeta_1$, $\zeta_2$, $\bar\mu$, $\bar\alpha$, and $\bar\kappa$ from:
$$\frac{\left(r^2 - r\right)}{2}\,|h(x)|^2 = \frac{\left(r^2 - r\right)}{2}\,|\eta_0 + \eta_1 x|^2 = \zeta_0 + 2\zeta_1 x + \zeta_2 x^2$$
$$\bar\mu = (1 - r)\hat\mu + r\mu, \qquad \bar\alpha = (1 - r)\hat\alpha + r\alpha = r\alpha, \qquad \bar\kappa = (1 - r)\hat\kappa + r\kappa.$$

iii) Solve
$$-\rho = \zeta_0 + 2\zeta_1 x + \zeta_2 x^2 + \left(\bar\alpha - \bar\kappa x\right)(\log e)'(x) + \frac{(\log e)''(x)}{2}\,|\sigma_x|^2 + \frac{\left[(\log e)'(x)\right]^2}{2}\,|\sigma_x|^2,$$
where $\log e(x) = \bar\upsilon_1 x + \frac{1}{2}\bar\upsilon_2 x^2$. Thus,
$$\bar\upsilon_2 = \frac{\bar\kappa - \sqrt{\bar\kappa^2 - 2\,\zeta_2\,|\sigma_x|^2}}{|\sigma_x|^2}.$$
Given $\bar\upsilon_2$, $\bar\upsilon_1$ solves
$$2\zeta_1 + \bar\alpha\,\bar\upsilon_2 - \bar\kappa\,\bar\upsilon_1 + |\sigma_x|^2\,\bar\upsilon_1\,\bar\upsilon_2 = 0$$
or
$$\bar\upsilon_1 = \frac{2\zeta_1 + \bar\alpha\,\bar\upsilon_2}{\bar\kappa - |\sigma_x|^2\,\bar\upsilon_2} = \frac{2\zeta_1 + \bar\alpha\,\bar\upsilon_2}{\sqrt{\bar\kappa^2 - 2\,\zeta_2\,|\sigma_x|^2}}.$$
Finally,
$$-\rho = \zeta_0 + \bar\alpha\,\bar\upsilon_1 + \frac{|\sigma_x|^2}{2}\,\bar\upsilon_2 + \frac{|\sigma_x|^2}{2}\,\bar\upsilon_1^2.$$

iv) Repeat for alternative $r$'s and maximize $\rho$ as a function of $r$.

D Robust value function

We solve for the value function $V$ and the multiplier $\theta$.

D.1 Solving for $V$

Consider
$$0 = \min_h\; -\delta\,V(x,\theta) + (.01)\left(\hat\mu + x\right) + V_x(x,\theta)\left(\hat\alpha - \hat\kappa x\right) + \frac{1}{2}\,|\sigma_x|^2\,V_{xx}(x,\theta) + (.01)\,\sigma_c\cdot h + V_x(x,\theta)\,\sigma_x\cdot h + \frac{\theta}{2}\,|h|^2 - \frac{\theta}{2}\left(\bar\xi_2\,x^2 + 2\,\bar\xi_1\,x + \bar\xi_0\right).$$
Recall that the value function is quadratic
$$V(x,\theta) = \frac{1}{2}\left[\lambda_2(\theta)\,x^2 + 2\,\lambda_1(\theta)\,x + \lambda_0(\theta)\right],$$
which implies
$$h(x,\theta) = -\frac{1}{\theta}\left[.01\,\sigma_c + \sigma_x\left(\lambda_2(\theta)\,x + \lambda_1(\theta)\right)\right].$$
We can solve for $\lambda_2$, $\lambda_1$, and $\lambda_0$ by matching the coefficients for $x^2$, $x$ and the constant
terms, respectively. Solving first for $\lambda_2$:
$$\lambda_2(\theta) = \frac{-\left(\delta\theta + 2\hat\kappa\theta\right) + \sqrt{\left(\delta\theta + 2\hat\kappa\theta\right)^2 - 4\,|\sigma_x|^2\,\theta^2\,\bar\xi_2}}{2\,|\sigma_x|^2} \doteq \theta\,\tilde\lambda_2,$$
where the last equation essentially defines $\tilde\lambda_2$. Next,
$$\lambda_1(\theta) = 2\,\frac{.01\,\theta - .01\,\left(\sigma_c\cdot\sigma_x\right)\tilde\lambda_2\,\theta + \hat\alpha\,\theta\,\lambda_2(\theta) - \theta^2\,\bar\xi_1}{\delta\theta + \sqrt{\left(\delta\theta + 2\hat\kappa\theta\right)^2 - 4\,|\sigma_x|^2\,\theta^2\,\bar\xi_2}}.$$
Finally,
$$\lambda_0(\theta) = \frac{1}{\delta}\left[.02\,\hat\mu + 2\,\hat\alpha\,\lambda_1(\theta) + |\sigma_x|^2\,\lambda_2(\theta) - \frac{1}{\theta}\left|.01\,\sigma_c + \sigma_x\,\lambda_1(\theta)\right|^2 - \theta\,\bar\xi_0\right].$$
For convenience, we write
$$\lambda_1(\theta) = \theta\,\bar\lambda_{1,1} + \bar\lambda_{1,0}, \qquad \lambda_0(\theta) = \theta\,\bar\lambda_{0,1} + \bar\lambda_{0,0} + \bar\lambda_{0,-1}\,\theta^{-1}.$$

D.2 Solving for $\theta$

We want to solve
$$\max_{\theta \ge 0}\; \frac{\lambda_1(\theta)^2\,|\sigma_x|^2}{2\hat\kappa - \tilde\lambda_2\,|\sigma_x|^2} + \frac{1}{2}\log\left(2\hat\kappa - \tilde\lambda_2\,|\sigma_x|^2\right) - \frac{1}{2}\log\left(2\hat\kappa\right) - \lambda_0(\theta).$$
The maximizing $\theta$ satisfies
$$\theta = \frac{2\hat\kappa\,|\sigma_x|^2\,\bar\lambda_{1,0} + \left(2\hat\kappa - \tilde\lambda_2\,|\sigma_x|^2\right)\bar\lambda_{0,-1}}{2\hat\kappa\,|\sigma_x|^2\,\bar\lambda_{1,1} + \left(2\hat\kappa - \tilde\lambda_2\,|\sigma_x|^2\right)\left[-\log\left(2\hat\kappa - \tilde\lambda_2\,|\sigma_x|^2\right) + \log\left(2\hat\kappa\right) + \bar\lambda_{0,1} + 2\right]}.$$


E Operationalizing Chernoff entropy

Here is how to compute Chernoff entropies for parametric models of the form (10). Because
the $h$'s associated with them take the form
$$h_t = h(X_t),$$
these alternative models are Markovian. This allows us to compute Chernoff entropy by
using an eigenvalue approach of Donsker and Varadhan (1976) and Newman and Stuck
(1979). We start by computing the drift of $\left(Z_t^h\right)^r f(X_t)$ for $0 \le r \le 1$ at $t = 0$:
$$[\mathbb{G}f](x) \doteq \frac{\left(r^2 - r\right)}{2}\,|h(x)|^2\,f(x) + r\,f'(x)\,\sigma_x\cdot h(x) - f'(x)\,\hat\kappa x + \frac{f''(x)}{2}\,|\sigma_x|^2,$$
where $[\mathbb{G}f](x)$ is the drift given that $X_0 = x$. Next we solve the eigenvalue problem
$$[\mathbb{G}(r)]\,e(x,r) = -\rho(r)\,e(x,r),$$
whose eigenfunction $e(x,r)$ is the exponential of a quadratic function of $x$. We compute
Chernoff entropy numerically by solving:
$$\chi(Z^h, x) = \max_{r\in[0,1]} \rho(r).$$
To deduce a corresponding equation for $\log e$, notice that
$$(\log e)'(x) = \frac{e'(x)}{e(x)}$$
and
$$(\log e)''(x) = \frac{e''(x)}{e(x)} - \left[\frac{e'(x)}{e(x)}\right]^2.$$
For a positive $f$,
$$\frac{[\mathbb{G}f](x)}{f(x)} \doteq \frac{\left(r^2 - r\right)}{2}\,|h(x)|^2 + r\,(\log f)'(x)\,\sigma_x\cdot h(x) - (\log f)'(x)\,\hat\kappa x + \frac{(\log f)''(x)}{2}\,|\sigma_x|^2 + \frac{\left[(\log f)'(x)\right]^2}{2}\,|\sigma_x|^2. \tag{40}$$
Using formula (40), define
$$\left[\mathbb{G}(\log f)\right](x) \doteq \frac{[\mathbb{G}f](x)}{f(x)}.$$
Then we can solve
$$\left[\mathbb{G}(\log e)\right](x) = -\rho$$
for $\log e$ to compute the positive eigenfunction $e$.


These calculations allow us numerically to compute the largest and smallest Chernoff
entropies attained by members of the set $\mathcal{Z}^o$.
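As a simple special case, a constant drift distortion $h(x) = h$ admits a constant eigenfunction; the decay rate reduces to $\rho(r) = (r - r^2)\,|h|^2/2$, which is maximized at $r = 1/2$, giving Chernoff entropy $|h|^2/8$. A sketch, with a hypothetical value of $|h|$:

```python
import numpy as np

h_sq = 0.2**2    # |h|^2 for a hypothetical constant drift distortion

def decay_rate(r):
    """Mistake-probability decay rate for a constant drift distortion."""
    return (r - r**2) * h_sq / 2.0

r = np.linspace(0.0, 1.0, 10001)
chernoff_entropy = decay_rate(r).max()

# Half-life (in model time units) for halving mistake probabilities
half_life = np.log(2.0) / chernoff_entropy
```

A larger discrimination rate therefore maps into a shorter half-life for reducing mistake probabilities, which is the calibration device used in the text.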

F Computing shock elasticities

We compute shock price elasticities in four steps:


i) Stochastic impulse response for $\log S$. We solve the recursion:
$$\begin{aligned}
d\log S_t^1 &= -.01\,X_t^1\,dt + h_t^1\cdot dW_t - h_t\cdot h_t^1\,dt \\
dX_t^1 &= -\hat\kappa\,X_t^1\,dt \\
dX_t &= -\hat\kappa\,X_t\,dt + \sigma_x\cdot dW_t \\
h_t &= \tilde\eta_0 + \tilde\eta_1\,X_t \\
h_t^1 &= \tilde\eta_1\,X_t^1,
\end{aligned}$$
where $X_0^1 = \sigma_x\cdot u$ and $\log S_0^1 = -.01\,\sigma_c\cdot u + h^*(x)\cdot u$. The quadratic terms in the
evolution equation of $\log S$ make $\log S_t^1$ stochastic.

ii) Deterministic impulse response for $\log C$. We solve the recursion:
$$d\log C_t^1 = .01\,X_t^1\,dt, \qquad dX_t^1 = -\hat\kappa\,X_t^1\,dt,$$
where $\log C_0^1 = (.01)\,\sigma_c\cdot u$ and $X_0^1 = \sigma_x\cdot u$. Thus,
$$\log C_t^1 = \frac{.01}{\hat\kappa}\left[1 - \exp\left(-\hat\kappa t\right)\right]\sigma_x\cdot u + (.01)\,\sigma_c\cdot u.$$

iii) Compute
$$\frac{E\left(M_t \log S_t^1\,\middle|\,X_0 = x\right)}{E\left(M_t\,\middle|\,X_0 = x\right)},$$
where $M_t = S_t\left(\frac{C_t}{C_0}\right)$. Note that
$$d\log M_t = -\delta\,dt + h_t\cdot dW_t - \frac{1}{2}\,|h_t|^2\,dt.$$
Let $dW_t$ have drift $h_t$ and compute expectations conditioned on $X_0 = x$ recursively:
$$\begin{aligned}
d\log S_t^1 &= -.01\,X_t^1\,dt + h_t^1\cdot\left(h_t\,dt + d\widetilde W_t\right) - h_t\cdot h_t^1\,dt \\
&= -.01\,X_t^1\,dt + h_t^1\cdot d\widetilde W_t \\
dX_t^1 &= -\hat\kappa\,X_t^1\,dt \\
h_t &= \tilde\eta_0 + \tilde\eta_1\,X_t \\
h_t^1 &= \tilde\eta_1\,X_t^1,
\end{aligned}$$
where $X_0^1 = \sigma_x\cdot u$ and $\log S_0^1 = -(.01)\,\sigma_c\cdot u + h^*(x)\cdot u$. Thus,
$$\frac{E\left(M_t \log S_t^1\,\middle|\,X_0 = x\right)}{E\left(M_t\,\middle|\,X_0 = x\right)} = -\frac{.01}{\hat\kappa}\left[1 - \exp\left(-\hat\kappa t\right)\right]\sigma_x\cdot u - .01\,\sigma_c\cdot u + h^*(x)\cdot u.$$

iv) Construct elasticities:

(a) Shock-exposure elasticity for consumption:
$$\frac{E\left[\frac{C_t}{C_0}\,\log C_t^1\,\middle|\,X_0 = x\right]}{E\left[\frac{C_t}{C_0}\,\middle|\,X_0 = x\right]} = \frac{.01}{\hat\kappa}\left[1 - \exp\left(-\hat\kappa t\right)\right]\sigma_x\cdot u + .01\,\sigma_c\cdot u,$$
which is also the continuous time impulse response for $\log C$.

(b) Shock-price elasticity:
$$\frac{E\left[\frac{C_t}{C_0}\,\log C_t^1\,\middle|\,X_0 = x\right]}{E\left[\frac{C_t}{C_0}\,\middle|\,X_0 = x\right]} - \frac{E\left[M_t\left(\log S_t^1 + \log C_t^1\right)\middle|\,X_0 = x\right]}{E\left(M_t\,\middle|\,X_0 = x\right)} = \frac{.01}{\hat\kappa}\left[1 - \exp\left(-\hat\kappa t\right)\right]\sigma_x\cdot u + .01\,\sigma_c\cdot u - h^*(x)\cdot u,$$
which implements formula (32).
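Step ii) can be cross-checked by discretizing its recursion and comparing with the closed form used in step iv(a). A minimal sketch:

```python
import numpy as np

kappa_hat = 0.185
sigma_c = np.array([0.468, 0.0])
sigma_x = np.array([0.0, 0.149])
u = np.array([0.0, 1.0])       # shock direction

T, n = 40.0, 200000
dt = T / n

# Recursion from step ii): d log C^1 = .01 X^1 dt, dX^1 = -kappa X^1 dt,
# with X^1_0 = sigma_x . u and log C^1_0 = .01 sigma_c . u
x1 = sigma_x @ u
log_c1 = 0.01 * (sigma_c @ u)
for _ in range(n):
    log_c1 += 0.01 * x1 * dt
    x1 -= kappa_hat * x1 * dt

closed_form = (0.01 / kappa_hat) * (1.0 - np.exp(-kappa_hat * T)) * (sigma_x @ u) \
    + 0.01 * (sigma_c @ u)
```

Subtracting $h^*(x)\cdot u$ from this exposure elasticity then gives the shock-price elasticity of step iv(b).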


References

Anderson, Evan W., Lars Peter Hansen, and Thomas J. Sargent. 1998. Risk and Robustness
in Equilibrium. Available on webpages.

———. 2003. A Quartet of Semigroups for Model Specification, Robustness, Prices of Risk,
and Model Detection. Journal of the European Economic Association 1 (1):68–123.

Ang, Andrew and Monika Piazzesi. 2003. A No-Arbitrage Vector Autoregression of the
Term Structure Dynamics with Macroeconomic and Latent Variables. Journal of Monetary
Economics 50:745–787.

Bansal, Ravi and Amir Yaron. 2004. Risks for the Long Run: A Potential Resolution of
Asset Pricing Puzzles. Journal of Finance 59 (4):1481–1509.

Borovicka, Jaroslav, Lars Peter Hansen, Mark Hendricks, and Jose A. Scheinkman. 2011.
Risk-Price Dynamics. Journal of Financial Econometrics 9 (1):3–65.

Campbell, John Y. and John Cochrane. 1999. Force of Habit: A Consumption-Based
Explanation of Aggregate Stock Market Behavior. Journal of Political Economy
107 (2):205–251.

Chen, Zengjing and Larry Epstein. 2002. Ambiguity, Risk, and Asset Returns in Continuous
Time. Econometrica 70:1403–1443.

Chernoff, Herman. 1952. A Measure of Asymptotic Efficiency for Tests of a Hypothesis
Based on the Sum of Observations. Annals of Mathematical Statistics 23 (4):493–507.

Donsker, Monroe E. and S. R. Srinivasa Varadhan. 1976. On the Principal Eigenvalue
of Second-Order Elliptic Differential Equations. Communications in Pure and Applied
Mathematics 29:595–621.

Duffie, Darrell and Rui Kan. 1994. Multi-Factor Term Structure Models. Philosophical
Transactions: Physical Sciences and Engineering 347 (1684):577–586.

Gilboa, Itzhak and David Schmeidler. 1989. Maxmin Expected Utility with Non-Unique
Prior. Journal of Mathematical Economics 18 (2):141–153.

Hansen, Lars Peter. 2007. Beliefs, Doubts and Learning: Valuing Macroeconomic Risk.
American Economic Review 97 (2):1–30.

———. 2011. Dynamic Valuation Decomposition within Stochastic Economies. Econometrica
80 (3):911–967. Fisher-Schultz Lecture at the European Meetings of the Econometric
Society.

Hansen, Lars Peter and Thomas Sargent. 2010. Fragile Beliefs and the Price of Uncertainty.
Quantitative Economics 1 (1):129–162.

Hansen, Lars Peter and Thomas J. Sargent. 2001. Robust Control and Model Uncertainty.
American Economic Review 91 (2):60–66.

———. 2008. Robustness. Princeton, New Jersey: Princeton University Press.

Hansen, Lars Peter, Thomas J. Sargent, and Thomas D. Tallarini Jr. 1999. Robust
Permanent Income and Pricing. The Review of Economic Studies 66 (4):873–907.

Hansen, Lars Peter, Thomas J. Sargent, Gauhar A. Turmuhambetova, and Noah Williams.
2006. Robust Control and Model Misspecification. Journal of Economic Theory
128 (1):45–90.

James, Matthew R. 1992. Asymptotic Analysis of Nonlinear Stochastic Risk-Sensitive
Control and Differential Games. Mathematics of Control, Signals and Systems 5 (4):401–417.

Newman, C. M. and B. W. Stuck. 1979. Chernoff Bounds for Discriminating between Two
Markov Processes. Stochastics 2 (1-4):139–153.

Petersen, I. R., M. R. James, and P. Dupuis. 2000. Minimax Optimal Control of Stochastic
Uncertain Systems with Relative Entropy Constraints. IEEE Transactions on Automatic
Control 45 (3):398–412.
