

World Scientific

Published by
World Scientific Publishing Co. Pte. Ltd.
5 Toh Tuck Link, Singapore 596224
USA office: 27 Warren Street, Suite 401-402, Hackensack, NJ 07601
UK office: 57 Shelton Street, Covent Garden, London WC2H 9HE
Library of Congress Cataloging-in-Publication Data

Names: Poon, Ser-Huang, author.
Title: Advanced finance theories / Ser-Huang Poon (Manchester University, UK).
Description: New Jersey : World Scientific, [2018]
Identifiers: LCCN 2017044680 | ISBN 9789814460378 (hc : alk. paper)
Subjects: LCSH: Finance. | Finance--Mathematical models.
Classification: LCC HG101 .P66 2018 | DDC 332.01--dc23
LC record available at https://lccn.loc.gov/2017044680
British Library Cataloguing-in-Publication Data

A catalogue record for this book is available from the British Library.
Copyright © 2018 by World Scientific Publishing Co. Pte. Ltd.

All rights reserved. This book, or parts thereof, may not be reproduced in any form or by any means,
electronic or mechanical, including photocopying, recording or any information storage and retrieval
system now known or to be invented, without written permission from the publisher.
For photocopying of material in this volume, please pay a copying fee through the Copyright Clearance
Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, USA. In this case permission to photocopy
is not required from the publisher.
For any available supplementary material, please visit

http://www.worldscientific.com/worldscibooks/10.1142/8759#t=suppl
Desk Editors: Suraj Kumar/Philly Lim
Typeset by Stallion Press

Email: enquiries@stallionpress.com
Printed in Singapore

February 22, 2018 13:11 Advanced Finance Theories 9in x 6in b3091-fm page v
To my students

Preface
This book provides modern treatments of key areas of finance theory
in Merton’s collection of continuous-time work, namely optimum
consumption and intertemporal portfolio selection, option pricing
theory, corporate finance, complete market general equilibrium and
static analyses of capital market theories. It is a first-semester course
in PhD Finance training in business schools, where the emphasis is
placed equally on mathematical rigour and economic reasoning.
Where appropriate, the lecture notes are supplemented by other
classical texts such as Ingersoll (1987) and materials on stochastic
calculus.
The main features of the book are that it contains:
(i) Complete and explicit exposition of classical finance theories
core to theoretical finance research.
(ii) Modern treatments of some classical derivations.
(iii) Supplementary coverage on related and key publications and
update on more recent finance research questions.
(iv) Detailed proofs and explicit coverage to aid understanding of
first-year PhD students.
(v) List of exercises with suggested solutions.
This book is suitable for graduates, doctoral students,
researchers, academics and professionals in theoretical financial
modelling in mainstream finance, financial economics, corporate
finance, stochastic analysis and differential equations, mathematical
finance/economics, and derivative securities. The primary and
secondary markets for such an advanced textbook would include
graduate student reference texts, personal copies, and library
collections.
About the Author
Dr Ser-Huang Poon is a Professor of Finance at the Alliance
Manchester Business School and has held several visiting appointments
at universities in the U.S., Canada, the Netherlands, Australia and
Singapore. She is internationally renowned for her volatility research.
Her work with Nobel laureate Clive Granger was cited on the Nobel
web site as reference reading in volatility, and won the Financial
Analysts Journal Graham and Dodd Scroll Award for Excellence for
2005. She has published papers in international journals and has
written three books.
All royalty proceeds from this book are to be contributed to the
Nightline Association, United Kingdom, to provide emotional support
to students in distress, to reduce mental health stigma and to raise
awareness of maintaining emotional well-being.

Acknowledgements
I would like to thank the many generations of Finance PhD students
at Alliance Manchester Business School for their many hours of hard
work and interesting discussions in the advanced finance theory class.
This group was later joined by PhD students from Lancaster
University. Many students have helped solve the proofs in Merton’s
book, provided solutions to many new questions and improved some
existing solutions. I hope they have learned as much as I have from
this class.
I am grateful to Professor Richard Stapleton, who introduced me
to Asset Pricing when I was a PhD student. I would like to thank
my colleague, Professor Alexandros Kostakis, for many inspiring
discussions, especially those on Cochrane’s work.

Contents
Preface
About the Author
Acknowledgements
Note for PhD Students
1 Utility Theory
1.1 Risk Aversion and Certainty Equivalent
2 Pricing Kernel and Stochastic Discount Factor
2.1 Arrow–Debreu State Prices
2.1.1 The pricing kernel, $\phi_i$
2.1.2 Equilibrium model
2.2 Cochrane Two-period Consumption Problem
2.2.1 Stochastic discount factor
2.2.2 Further notation
2.2.3 Risk-free rate
2.2.4 Risk corrections
2.2.5 Idiosyncratic risk does not affect prices
2.3 Expected Return-Beta Representation
3 Risk Measures
3.1 One-period Portfolio Selection
3.2 Rothschild and Stiglitz “Strict” Risk Aversion
3.2.1 Efficient portfolio
3.2.2 Portfolio analysis
3.3 Merton’s Risk Measures
3.3.1 Properties of Merton’s risk measure $b_p$
3.3.2 Relationship between $b_p$ and conditional expected return $E[Z_p \mid Z_e]$
3.3.3 Discussion
Exercises: Capital Market Theory, Risk Measures
4 Consumption and Portfolio Selection
4.1 Basic Set-up
4.2 One Risky and One Risk-Free Asset
4.2.1 The Bellman equation
4.2.2 Infinite time horizon
4.3 Constant Relative Risk Aversion
4.3.1 Solution for J
4.3.2 Solution for C and w
4.3.3 Economic interpretation
4.4 Constant Absolute Risk Aversion
4.4.1 Solve for J
4.4.2 Solve for C* and w*
4.5 Hyperbolic Absolute Risk Aversion (HARA)
4.5.1 Relationship with CRRA and CARA
4.5.2 Portfolio choice
4.5.3 Solution for J
4.5.4 Solve for C* and w*
4.6 Optimal Rules Under Finite Horizon
4.6.1 CRRA with finite horizon
4.6.2 CARA with finite horizon
Exercises: Intertemporal Portfolio Selection
5 Optimum Demand and Mutual Fund Theorem
5.1 Asset Dynamics and the Budget Equation
5.2 The Equation of Optimality
5.3 Optimal Investment Weight and Special Cases
5.3.1 No risk-free asset
5.3.2 GBM and risk-free rate
5.4 Lognormality and Mutual Fund Theorem
5.4.1 “Separation” or “mutual-fund” theorem
5.4.2 Key assumptions and uniqueness
5.4.3 Tobin–Markowitz separation theorem
Exercises: Optimum Demand and Mutual Fund Separation
6 Mean–Variance Frontier
6.1 Mean–Variance Frontier
6.1.1 The Sharpe ratio
6.1.2 Calculating the mean–variance frontier
6.1.3 Decomposing the mean–variance frontier
6.1.4 Spanning the frontier
6.1.5 Hansen–Jagannathan bounds
7 Solving Black–Scholes with Fourier Transform
7.1 Option Pricing with Fourier Transform
7.1.1 Black–Scholes hedge portfolio
7.2 Black–Scholes Fundamental PDE
7.2.1 Fourier transform
7.2.2 Solution through transform method
8 Capital Structure Theory
8.1 Objective Function for the Firm
8.2 Partial Equilibrium One-period Model
8.2.1 Pricing kernel
8.2.2 Probability-cum-utility function
8.2.3 m assets
8.2.4 Introducing the concept of dQ
8.2.5 What is $e^{\eta\tau}$?
8.3 Payoff of Risky Debt
8.4 Pricing Risky Debt
8.4.1 Solving the FPDE
8.5 Price of a Warrant
8.6 Convertible Bond
8.6.1 Reverse convertible
8.6.2 Call option enhanced reverse convertible
8.6.3 Policy implications
8.7 Bankruptcy Cost and Tax Benefit
8.7.1 Solution under time invariant
8.7.2 Protected debt covenant
8.7.3 Optimal capital structure
8.8 Deposit Insurance
Exercises: Capital Structure Theory
9 General Equilibrium
9.1 Firms and Securities
9.2 Individuals
9.3 Aggregate Demand
9.4 Market Portfolio
9.5 Security Market Line
9.6 Three-fund Separation
9.7 Empirical Application of CAPM
Exercises: General Equilibrium
10 Discontinuity in Continuous Time
10.1 Counting and Marked Point Process
10.2 Poisson Process
10.3 Constant Jump Size
10.3.1 Fundamental PDE with constant jump size
10.3.2 Market price of jump risk
10.3.3 European call price
10.3.4 Immediate ruin
10.4 Random Jump Size
10.4.1 When J has a lognormal distribution
10.5 Intertemporal Portfolio Selection with Jumps
10.5.1 Portfolio selection
10.5.2 Stock markets systemic and idiosyncratic risk
Exercises: Discontinuity in Continuous Time
11 Spanning and Capital Market Theories
11.1 Necessary Conditions for Non-trivial Spanning
11.2 Efficient Portfolio and Spanning
11.3 Market Portfolio Spanning and CAPM
11.4 Arbitrage Pricing Theory (APT)
11.5 Modigliani–Miller Hypothesis
11.6 Comment on Spanning
11.7 HARA
Exercises: Spanning & Capital Market Theories
Bibliography
Calculus Notes
Index

Note for PhD Students
The core of the Advanced Finance Theory class covers three main
finance areas in Merton’s (1990) book (chapter references in parentheses):
• Asset pricing (Chapters 2, 4, 5)
• Option pricing (Chapters 7, 8, 9)
• Capital structure (Chapters 11, 12, 13)
If there is time, one could touch briefly on Intertemporal CAPM
(Chapter 15) and Complete Markets General Equilibrium Theory
(Chapter 16) which are extensions of Asset Pricing.
Chapter 3 is on Ito’s lemma. It is included in the supplementary
materials together with other mathematical tools such as stochastic
calculus. These are basic tools in continuous-time mathematics that
all graduate students should master in order to tackle the
problems in the core chapters.
Before we start, here are some health warnings from Merton:
“. . . the foundation of modern finance theory rests on the perfect-market
paradigm of rational behavior and frictionless, competitive, and
informationally efficient capital markets. With its further assumption of
continuous trading, the base of our theory should perhaps be labeled the
super perfect-market paradigm. The conditions of this paradigm are not,
of course, literally satisfied in the real world. Furthermore, its accuracy
as a useful approximation to that world varies considerably across time
and place. The practitioner should therefore apply the continuous-time
theory only tentatively, assessing its limitations in each application. Just
so, the researcher should treat it as a point of departure for both problem
finding and problem solving.”

Chapter 1
Utility Theory
This chapter derives asset prices in a one-period model. We derive
a version of the Capital Asset Pricing Model (CAPM) using a complete
market, state-contingent claims approach. We define the forward
pricing kernel and then use the assumption of joint normality
of the cash flows and Stein’s lemma to establish the CAPM. We then
derive the pricing kernel in an equilibrium representative investor
model. But first, we need to understand a few properties of utility
functions.
A common utility function we use in economics/finance is the
power utility. Its functional form is:
$$U(W) = \frac{W^{1-\gamma} - 1}{1-\gamma} \quad \text{or} \quad U(W) = \frac{W^{1-\gamma}}{1-\gamma}, \quad \text{with } \gamma \neq 1.$$
This may seem a strange choice for a utility functional form, but it
is actually a very clever one. The Arrow–Pratt measures of (absolute
and relative) risk aversion (RA) are
$$\mathrm{ARA} = -\frac{U''(W)}{U'(W)}$$
and
$$\mathrm{RRA} = -\frac{U''(W)}{U'(W)}\, W.$$
By the assumption of a risk-averse investor, $U(W)$ is increasing
and strictly concave:
$$U'(W) > 0, \quad U''(W) < 0 \quad \text{and} \quad A(W) > 0.$$
The inverse of RA, $-\frac{U'(W)}{U''(W)}$, is also known as risk tolerance.¹
Using the power utility function, we get $U'(W) = W^{-\gamma}$ and
$U''(W) = -\gamma W^{-\gamma-1}$. Therefore, the Arrow–Pratt measure of
Relative Risk Aversion (RRA) under power utility is $\mathrm{RRA} = \gamma$.
If $\gamma > 0$, then the agent is risk averse. If $\gamma < 0$, we would
call her a risk seeker (or lover). To satisfy the second common assumption
of concavity, we need $\gamma > 0$. In other words, the power utility function
with $\gamma > 0$ refers to an investor with RRA that is independent of her
level of wealth, which is why it is also called the constant RRA (CRRA)
utility function.
In the case where $\gamma = 1$ we get a special utility function, called the
logarithmic function, $U(W) = \log(W)$. You can see that by taking
the limit
$$\lim_{\gamma \to 1} U(W) = \lim_{\gamma \to 1} \frac{W^{1-\gamma} - 1}{1-\gamma} = \lim_{\gamma \to 1} \frac{(-1)W^{1-\gamma}\log(W)}{-1} = \log(W),$$
after applying l’Hôpital’s rule. Essentially, the log utility function is a
CRRA utility function with RRA = 1.
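The limit above can also be checked numerically. The following is a minimal sketch, assuming an arbitrary wealth level of 2.5 and a handful of values of γ approaching 1 (both are illustrative choices, not taken from the text); it measures how far the power utility is from log(W).

```python
import math

def power_utility(w, gamma):
    """U(W) = (W^(1-gamma) - 1) / (1 - gamma), valid for gamma != 1."""
    return (w ** (1.0 - gamma) - 1.0) / (1.0 - gamma)

wealth = 2.5                          # illustrative wealth level (assumed)
gammas = [1.5, 1.1, 1.01, 1.001]      # values approaching 1 (assumed)

# Absolute gap between power utility and log utility for each gamma.
gaps = [abs(power_utility(wealth, g) - math.log(wealth)) for g in gammas]

for g, gap in zip(gammas, gaps):
    print(f"gamma = {g:<6} |U(W) - log(W)| = {gap:.6f}")
```

The gap shrinks as γ approaches 1, in line with the l’Hôpital argument.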
Another commonly used utility function is the negative exponential
utility
$$U(W) = -\frac{\exp(-\eta W)}{\eta}, \quad \eta \neq 0,$$
$$U'(W) = \exp(-\eta W),$$
$$U''(W) = -\eta \exp(-\eta W),$$
so $\mathrm{ARA} = \eta$ and $\mathrm{RRA} = \eta W$. This is why this utility function is
called the Constant Absolute Risk Aversion (CARA) utility
function. For an investor to be risk averse, we would require
$\eta > 0$.
¹ Indeed, with the advances of research, we now know that these lower-order
risk preference measures are not sufficient to distinguish risks represented by
higher moments of the risky return distribution. But we will confine our scope
here to the classical analyses only, omitting e.g. skewness preference.
Finally, a linear utility function of the form $U(W) = a + bW$
corresponds to a risk-neutral investor. Why? Because $U'(W) = b$ and
$U''(W) = 0$. In other words, the function is not strictly concave (obviously,
since it is linear in $W$) and the Arrow–Pratt measures of risk aversion
are ARA = RRA = 0.
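The Arrow–Pratt measures for the three utility functions discussed so far can be approximated by finite differences. This is a sketch only: the parameter values (γ = 3, η = 0.5, a = 1, b = 2), the step size and the helper name `arrow_pratt` are assumptions made for illustration.

```python
import math

def arrow_pratt(u, w, h=1e-3):
    """Return (ARA, RRA) at wealth w using central finite differences."""
    u1 = (u(w + h) - u(w - h)) / (2.0 * h)             # approximates U'(W)
    u2 = (u(w + h) - 2.0 * u(w) + u(w - h)) / h ** 2   # approximates U''(W)
    ara = -u2 / u1
    return ara, ara * w

gamma, eta = 3.0, 0.5   # illustrative preference parameters (assumed)

def power_u(w):
    return w ** (1.0 - gamma) / (1.0 - gamma)

def neg_exp_u(w):
    return -math.exp(-eta * w) / eta

def linear_u(w):
    return 1.0 + 2.0 * w   # a = 1, b = 2 (assumed)

for w in (1.0, 2.0, 4.0):
    for name, u in (("power", power_u), ("neg-exp", neg_exp_u), ("linear", linear_u)):
        ara, rra = arrow_pratt(u, w)
        print(f"W = {w:3.1f}  {name:8s} ARA = {ara:8.4f}  RRA = {rra:8.4f}")
```

Under these assumptions, the power utility shows an RRA close to γ at every wealth level, the negative exponential utility shows an ARA close to η, and the linear utility shows values close to zero.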
1.1 Risk Aversion and Certainty Equivalent

For a given utility function $U(\cdot)$ and uncertain terminal wealth $W$,
we can write $W$ in terms of its certainty equivalent $W_c$ as follows:
$$W_c \equiv U^{-1}\{E[U(W)]\}. \tag{1.1}$$
The term “risk averse” as applied to investors with strictly concave
utility functions is descriptive in the sense that the certainty-equivalent
end-of-period wealth is always less than the expected value
$E(W)$ of the associated portfolio for all such investors. The proof
follows from Jensen’s inequality: if $U$ is strictly concave and $W$ is
not degenerate, then
$$E[U(W)] < U[E(W)],$$
and, applying the increasing function $U^{-1}$ to both sides,
$$W_c < E[W].$$
The smaller the $W_c$, the more risk averse is the investor.
[Figure: utility $U(W)$ against wealth $W$, showing $U(E[W])$, $E[U(W)]$, $W_c$ and $E[W]$.]
An investor is said to be more risk averse than a second investor
if, for every portfolio, the certainty-equivalent end-of-period wealth
for the first investor is less than or equal to the certainty-equivalent
end-of-period wealth associated with the same portfolio for the second
investor. This statement holds regardless of the shape
of the risky return distribution and the order of risk preference.
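Equation (1.1) and the ranking of investors by certainty equivalent can be illustrated with a small computation. The sketch below assumes power utility $U(W) = W^{1-\gamma}/(1-\gamma)$ and a hypothetical three-outcome portfolio; the outcomes, probabilities and γ values are illustrative only.

```python
def certainty_equivalent(outcomes, probs, gamma):
    """W_c = U^{-1}(E[U(W)]) for U(W) = W^(1-gamma)/(1-gamma), gamma != 1."""
    expected_utility = sum(p * w ** (1.0 - gamma) / (1.0 - gamma)
                           for w, p in zip(outcomes, probs))
    # Invert U: W = ((1 - gamma) * U)^(1 / (1 - gamma))
    return ((1.0 - gamma) * expected_utility) ** (1.0 / (1.0 - gamma))

outcomes = [50.0, 100.0, 150.0]   # terminal wealth in each outcome (assumed)
probs = [0.25, 0.50, 0.25]        # outcome probabilities (assumed)
expected_wealth = sum(p * w for w, p in zip(outcomes, probs))

# Certainty equivalent of the same portfolio for three levels of risk aversion.
ce_by_gamma = {g: certainty_equivalent(outcomes, probs, g) for g in (0.5, 2.0, 5.0)}

print(f"E[W] = {expected_wealth:.2f}")
for g, ce in ce_by_gamma.items():
    print(f"gamma = {g}: W_c = {ce:.2f}")
```

Each certainty equivalent lies below $E[W]$, and it falls as γ rises, which matches the definition of "more risk averse" given above.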
Chapter 2
Pricing Kernel and Stochastic Discount Factor
2.1 Arrow–Debreu State Prices

We assume that there are a finite number of states of the world at
time $t+T$, indexed by $i = 1, 2, \ldots, I$, each with a positive probability
of occurring. Let $p_i$ be the probability of state $i$ occurring. A
state-contingent claim on state $i$ is defined as a security which pays \$1 if
and only if state $i$ occurs.
We now assume that markets are complete. Specifically, we
assume that it is possible to buy a state-contingent claim with a
forward price $q_i$ for state $i$. In complete markets, the $q_i$ prices exist
for all states $i$. It follows that an asset $j$, which has a time $t+T$
payoff $x_{j,t+T,i}$ in state $i$, has a forward price
$$F_{j,t,t+T} = \sum_i q_i x_{j,t+T,i}. \tag{2.1}$$
For simplicity, when there is no ambiguity, we drop the time
subscripts and write $F_j = \sum_i (q_i x_{j,i})$.
Since $q_i$ represents the price of a claim which pays \$1 if a state
with positive probability occurs, it is a claim with positive utility and
thus must have a positive price, i.e. $q_i > 0$. Moreover, the state prices
sum to 1, i.e. $\sum_i q_i = 1$. To prove that $\sum_i q_i = 1$, we use the relation
in Eq. (2.1). If $x_j$ is a certain cash flow, for example the payoff on a
zero-coupon bond, $x_i = \$1$ for all $i$. In this case, the forward price
must be equal to \$1, which means from (2.1) that
$$F = \sum_i 1 \cdot q_i = 1.$$
A set $\{q_i\}$ which is positive and sums to unity is a “probability”
measure. Note that it is similar in many respects to the set of
probabilities $\{p_i\}$, which is also positive and sums to unity. Hence, $\{q_i\}$ is
often referred to as the risk-neutral measure.
So far, we have defined the state space as the product of the states
of all the individual firms in the economy. We now simplify the state
space, defining the states of the world by different outcomes of $x_m$,
the aggregate market cash flow, using the concept of a pricing kernel.
2.1.1 The pricing kernel, $\phi_i$

The pricing kernel, $\phi$, is defined by
$$\phi_i = \frac{q_i}{p_i},$$
i.e. it is the forward price of a state-contingent claim relative to the
probability of the state occurring. It is sometimes referred to as the
“probability deflated” state price. Note that the pricing kernel here
is more precisely described as the “forward pricing kernel”, since $q_i$
is the forward state price.
Since $p_i > 0$ and $q_i > 0$, the pricing kernel $\phi_i$ is
a positive function. Moreover, $E(\phi) = 1$. This follows immediately
from the fact that the sum of the state prices is 1. We have
$$E(\phi) = \sum_i p_i \cdot \phi_i = \sum_i p_i \cdot \frac{q_i}{p_i} = \sum_i q_i = 1.$$
The pricing kernel is often stated as a function of the aggregate
cash flow in the economy, i.e. $\phi = \phi(x_m)$ (see Fig. 2.1). Given our
definition of the pricing kernel, we find, rewriting Eq. (2.1), that the
forward price of asset $j$ is
$$F_j = \sum_i x_{j,i}\, q_i = \sum_i p_i\, [\phi(x_{m,i})\, x_{j,i}] = E[\phi(x_m)\, x_j]. \tag{2.2}$$
[Figure 2.1: The pricing kernel, $\phi(x_m)$ against $x_m$.]
It follows that the case where $\phi_i = 1$, for all $i$, is of particular
significance. In this case we would have
$$F_j = \sum_i q_i x_{j,i} = \sum_i p_i x_{j,i} = E(x_j).$$
Here, the forward price equals the expected value of the cash flow.
This occurs if the cash flow can be priced under the assumption of
risk neutrality. Hence the case where $\phi_i = 1$, for all $i$, equates to the
case of risk neutrality.
In order to appreciate the importance of the pricing kernel, consider
the following expansion of Eq. (2.2). Using the definition of
covariance, the forward price is
$$F_j = E[x_j\, \phi(x_m)] = E[\phi(x_m)]\, E(x_j) + \mathrm{Cov}[\phi(x_m), x_j]$$
and given that $E[\phi(x_m)] = 1$, we have
$$F_j = E(x_j) + \mathrm{Cov}[\phi(x_m), x_j].$$
It follows that the behaviour of $\phi$, in particular its covariance with
the cash flow $x_j$, determines the risk premium for the asset, which is
represented by the excess of the expected value of the cash flow over
its forward price, $E(x_j) - F_j = -\mathrm{Cov}[\phi(x_m), x_j]$. In most cases,
it turns out that $\phi(x_m)$ is negatively
correlated with $x_j$, in which case the risk premium is positive.
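The relations in this subsection can be verified on a small discrete example. The sketch below assumes three states with hypothetical probabilities, forward state prices and payoffs (all made-up numbers); it checks that $E(\phi) = 1$, that pricing with state prices and with the kernel agree, and that the forward price decomposes as in the covariance expansion.

```python
p = [0.2, 0.5, 0.3]        # true state probabilities (assumed)
q = [0.3, 0.5, 0.2]        # forward state prices, positive and summing to 1 (assumed)
x = [80.0, 100.0, 130.0]   # payoff of asset j in each state (assumed)

phi = [qi / pi for qi, pi in zip(q, p)]   # pricing kernel phi_i = q_i / p_i

def expect(values, probs):
    """Expectation of a state-by-state variable under the given probabilities."""
    return sum(pr * v for pr, v in zip(probs, values))

forward_from_state_prices = sum(qi * xi for qi, xi in zip(q, x))        # Eq. (2.1)
forward_from_kernel = expect([f * xi for f, xi in zip(phi, x)], p)      # Eq. (2.2)

mean_phi = expect(phi, p)
mean_x = expect(x, p)
cov_phi_x = expect([(f - mean_phi) * (xi - mean_x) for f, xi in zip(phi, x)], p)
risk_premium = mean_x - forward_from_state_prices

print(f"E(phi) = {mean_phi:.4f}")
print(f"F_j = {forward_from_state_prices:.2f}, E(x_j) = {mean_x:.2f}")
print(f"Cov(phi, x_j) = {cov_phi_x:.2f}, risk premium = {risk_premium:.2f}")
```

In this example the kernel is high in the low-payoff state, so the covariance is negative and the risk premium is positive.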
2.1.2 Equilibrium model

Assume that the market acts as if there is just one investor with
“average” characteristics. This is often referred to as the
“representative agent” assumption. Let $w_{t+T,i}$ be the total wealth in state $i$
at time $t+T$. Assume also that the initial wealth is $w_t$ at time $t$ in
the form of cash. The investor can purchase state-contingent claims
which pay \$1 if and only if state $i$ occurs at time $t+T$. The prices
of the claims are $q_i$ for $i = 1, 2, \ldots, I$. The investor’s problem is to
choose a set of state-contingent claims paying $w_{t+T,i}$, given $w_t$.
The investor maximises the expected value of a utility function
$u(w_{t+T})$:
$$\max_{w_{t+T,i}} E[u(w_{t+T})] = \sum_i p_i\, u(w_{t+T,i})$$
subject to
$$\sum_i w_{t+T,i}\, q_i\, B_{t,t+T} = w_t. \tag{2.3}$$
The utility function has the properties $u' > 0$ (non-satiation) and
$u'' < 0$ (risk aversion). The first property follows from more basic
assumptions of rational choice.¹ The second property guarantees that
satisfying the first-order condition leads to an optimal and unique
solution. Note that the discount factor, $B_{t,t+T}$, enters the budget
constraint because the $q_i$ are forward prices, whereas the given cash
wealth $w_t$ is a time-$t$ allocation.
¹ This follows from the von Neumann–Morgenstern expected utility theorem; see
Fama and Miller (1972). Basically, it states that if the investor behaves
according to five axioms of choice under uncertainty, then maximising expected utility
should always lead to maximising utility and hence an optimal investment choice.
The five axioms govern the comparability, transitivity, independence, certainty
equivalence and ranking of choices.
We solve the optimization problem by forming the Lagrangian:
$$L = \sum_i p_i\, u(w_{t+T,i}) - \lambda \left( \sum_i q_i w_{t+T,i} - w_t B_{t,t+T}^{-1} \right).$$
Then the first-order condition for a maximum is
$$\frac{\partial L}{\partial w_{t+T,i}} = p_i\, u'(w_{t+T,i}) - q_i \lambda = 0. \tag{2.4}$$
Summing Eq. (2.4) over the states $i$, we then find
$$\sum_i p_i\, u'(w_{t+T,i}) = \lambda \sum_i q_i$$
or
$$E[u'(w_{t+T})] = \lambda,$$
since $\sum_i q_i = 1$. Now, substituting for $\lambda$ in (2.4), the first-order
condition becomes
$$\frac{p_i\, u'(w_{t+T,i})}{E[u'(w_{t+T})]} = q_i,$$
or
$$\phi_i = \frac{q_i}{p_i} = \frac{u'(w_{t+T,i})}{E[u'(w_{t+T})]}.$$
In this model, a condition for the investor’s expected utility to
be maximised is that the pricing kernel equals the ratio of marginal
utility in a state to the expected marginal utility. To complete the
model, we need to determine the investor’s wealth at time $t+T$
in each state. However, in equilibrium the single investor’s demand
for state-contingent claims must equal the available supply. Hence,
$w_{t+T,i}$ must equal $x_{m,i}$, the aggregate market cash flow in state $i$.
Substituting in the expression for the pricing kernel, we conclude
that
$$\phi_i = \frac{u'(x_{m,i})}{E[u'(x_m)]},$$
for all $i$. Hence we have
$$\phi = \phi(x_m),$$
as assumed earlier in the chapter. Since utility is an increasing function
of $x_m$ (so that $u'(x_m) > 0$) and we may assume $u''(x_m) < 0$, it follows
that $\phi(x_m)$ is a declining function of $x_m$ as shown in Fig. 2.1.
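The equilibrium pricing kernel can be computed directly once a utility function is chosen. The following sketch assumes a representative investor with power utility, so that $u'(x) = x^{-\gamma}$, and three hypothetical states for the aggregate cash flow; all numbers are illustrative.

```python
gamma = 2.0                    # relative risk aversion of the representative investor (assumed)
p = [0.25, 0.50, 0.25]         # state probabilities (assumed)
x_m = [80.0, 100.0, 120.0]     # aggregate market cash flow per state (assumed)

# Marginal utility u'(x) = x^(-gamma) in each state.
marginal_utility = [xm ** (-gamma) for xm in x_m]
expected_mu = sum(pi * mu for pi, mu in zip(p, marginal_utility))

# Pricing kernel phi_i = u'(x_{m,i}) / E[u'(x_m)] and implied forward state prices.
phi = [mu / expected_mu for mu in marginal_utility]
q = [pi * f for pi, f in zip(p, phi)]

for xm, f, qi in zip(x_m, phi, q):
    print(f"x_m = {xm:6.1f}  phi = {f:.4f}  q = {qi:.4f}")
```

The kernel declines as the aggregate cash flow rises, and the implied state prices are positive and sum to one.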
We noted above that the set of forward state prices {qi } is a
probability measure. In the literature, it is often referred to as the
Equivalent Martingale Measure, or simply EMM.
Let $P = \{p_i\}$ and $Q = \{q_i\}$ be two probability measures. $P$ and
$Q$ are equivalent if $q_i > 0$ if and only if $p_i > 0$. Let $E^P(\cdot)$ and $E^Q(\cdot)$
be expectations under probability measures $P$ and $Q$, respectively.
From Eq. (2.1), dropping the $j$ subscript,
$$F = \sum_i q_i x_i = E^Q(x) = \sum_i p_i\, (\phi_i x_i) = E^P(\phi x).$$
Now rewrite the forward price, $F$, as $F_{t,t+T}$. Also, note that the
time-$(t+T)$ spot price, $x$, can be expressed as $F_{t+T,t+T}$. This is because the
forward price at $t+T$ for immediate delivery is simply the spot price
at $t+T$. Hence
$$F_{t,t+T} = E^Q(F_{t+T,t+T}). \tag{2.5}$$
If such a relationship holds, the variable is said to have the martingale
property, and $Q$ is therefore referred to as the equivalent martingale
measure (EMM). In the literature, $Q$ is often referred to loosely as the
risk-neutral measure since it has the same property that the true
measure would have under risk neutrality.
2.2 Cochrane Two-period Consumption Problem

The simple pricing kernel concept based on aggregate cash flows in
the previous section can also be presented in Cochrane’s framework
for the stochastic discount factor. In the two-period world, terminal
cash flows and terminal consumption are the same.
In Cochrane’s two-period problem, the endowments in the two
periods are $e_t$ and $e_{t+1}$. The investor consumes $c_t$ in the first period,
and saves the remainder in $\xi$ units of the risky asset at price $p_t$:
$$e_t = c_t + p_t \xi$$
or, by rearrangement,
$$c_t = e_t - p_t \xi.$$
In the second period, the investor consumes the endowment $e_{t+1}$ plus
the payoff of the risky asset purchased at time $t$:
$$c_{t+1} = e_{t+1} + x_{t+1} \xi.$$
In this setting, the agent maximises the following objective function
$$\max_{\xi}\; U(c_t) + \beta E_t[U(c_{t+1})],$$
with $\beta < 1$, i.e. utility from future consumption will be discounted
as it does not count as much as utility from current consumption.
Substituting the constraints into the objective function, we get
$$\max_{\xi}\; U(e_t - p_t \xi) + \beta E_t[U(e_{t+1} + x_{t+1} \xi)]$$
and under the first-order condition,
$$U'(e_t - p_t \xi)(-p_t) + \beta E_t[U'(e_{t+1} + x_{t+1} \xi)\, x_{t+1}] = 0$$
$$\beta E_t[U'(e_{t+1} + x_{t+1} \xi)\, x_{t+1}] = U'(e_t - p_t \xi)\, p_t$$
$$\beta E_t[U'(c_{t+1})\, x_{t+1}] = U'(c_t)\, p_t \tag{2.6}$$
$$p_t = E_t\left[ \beta\, \frac{U'(c_{t+1})}{U'(c_t)}\, x_{t+1} \right]. \tag{2.7}$$
Equation (2.7), which is the central asset pricing formula, states the
price $p_t$ of the risky asset given the payoff $x_{t+1}$ and the optimal
consumption levels $c_t$ and $c_{t+1}$.
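The first-order condition can be illustrated by solving a small instance of the two-period problem numerically. This is a sketch under stated assumptions: power utility with $U'(c) = c^{-\gamma}$, two equally likely states, and hypothetical endowments, payoffs and price. Bisection is used here only as a convenient root finder; it is not part of the original derivation.

```python
beta, gamma = 0.96, 2.0      # discount factor and risk aversion (assumed)
e_t, p_t = 1.0, 0.95         # first-period endowment and asset price (assumed)
probs = [0.5, 0.5]           # two equally likely states (assumed)
e_next = [0.9, 1.2]          # second-period endowment per state (assumed)
x_next = [0.8, 1.3]          # asset payoff per state (assumed)

def marginal_utility(c):
    """U'(c) = c^(-gamma) for power utility."""
    return c ** (-gamma)

def foc(xi):
    """Left-hand side of the first-order condition; zero at the optimum."""
    c_t = e_t - p_t * xi
    future = sum(pr * marginal_utility(e + x * xi) * x
                 for pr, e, x in zip(probs, e_next, x_next))
    return -p_t * marginal_utility(c_t) + beta * future

# The FOC is strictly decreasing in xi (U is concave), so bisection finds its root.
lo, hi = -0.9, 1.0           # bracket that keeps consumption positive in all states
for _ in range(200):
    mid = 0.5 * (lo + hi)
    if foc(mid) > 0.0:
        lo = mid
    else:
        hi = mid
xi_star = 0.5 * (lo + hi)

# Price implied by Eq. (2.7) at the optimal consumption levels.
c_t = e_t - p_t * xi_star
c_next = [e + x * xi_star for e, x in zip(e_next, x_next)]
implied_price = sum(pr * beta * marginal_utility(c1) / marginal_utility(c_t) * x
                    for pr, c1, x in zip(probs, c_next, x_next))

print(f"optimal holding xi = {xi_star:.6f}")
print(f"price from Eq. (2.7) = {implied_price:.6f} (market price {p_t})")
```

At the optimal holding, the right-hand side of (2.7) reproduces the market price the investor faced, which is what the first-order condition asserts.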
2.2.1 Stochastic discount factor

Cochrane defines the stochastic discount factor (SDF)²
$$m_{t+1} \equiv \beta\, \frac{U'(c_{t+1})}{U'(c_t)}$$
such that
$$p_t = E_t(m_{t+1} x_{t+1}). \tag{2.8}$$
Let $R^f \equiv 1 + r^f$ be the gross risk-free rate, and let $R > R^f$ be the
risk-adjusted (gross) discount rate of the risky asset cash flow $x_{t+1}$, so that
$$p_t = \frac{1}{R} E_t(x_{t+1}).$$
Instead of an asset-specific discount rate $R$, the stochastic discount factor
$m_{t+1}$ can be applied to all assets via the covariance between the
random components of $m_{t+1}$ and $x_{t+1}$. From (2.8),
$$p = \mathrm{Cov}(m, x) + E(m)E(x).$$
Risk correction to asset prices is driven by the covariance of asset
payoffs with marginal utility of consumption. The rationale is that,
other things equal, investors dislike any risky asset that does badly
in bad states of nature (e.g. recession) when consumption is low and
marginal utility is high (and hence $m$ is high). Such an asset with
negative covariance between $x$ and $m$ should sell for a lower price.
On the other hand, investors would not mind as much if a risky
asset does badly in good states of the economy (e.g. boom), where
consumption is high and marginal utility is low (and hence $m$ is low).
Hence, one would be ready to pay a high price (i.e. accept a lower
risk premium) for an asset that does well in bad states of the world,
because it would yield the extra payoff exactly when it is needed
most, i.e. in bad states when wages, endowments, etc. are low and
marginal utility is high (i.e. positive correlation between $m$ and $x$).
To sum up, according to asset pricing theory, the riskiness of an asset
does not depend on its variance, but on its covariance with marginal
utility (of consumption or wealth).
² The stochastic discount factor is also known as the marginal rate of
substitution, because it gives us the rate at which the investor is willing to substitute
consumption at time $t+1$ for consumption at time $t$. It is the same as the pricing
kernel or state-price density in the previous section.
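The point that covariance with $m$, rather than variance, drives the price can be shown with two payoffs that have the same mean and variance. The sketch below assumes a power-utility SDF, $m = \beta (c_{t+1}/c_t)^{-\gamma}$, with two equally likely consumption-growth states; the parameter values and the two payoff vectors are hypothetical.

```python
beta, gamma = 0.96, 3.0     # discount factor and risk aversion (assumed)
probs = [0.5, 0.5]          # bad state, good state (assumed)
growth = [0.97, 1.05]       # gross consumption growth in each state (assumed)

# Power-utility SDF: high in the bad (low-consumption) state.
m = [beta * g ** (-gamma) for g in growth]

def expect(values):
    return sum(p * v for p, v in zip(probs, values))

def price(payoff):
    """p = E(m x), Eq. (2.8)."""
    return expect([mi * xi for mi, xi in zip(m, payoff)])

def cov_with_m(payoff):
    em, ex = expect(m), expect(payoff)
    return expect([(mi - em) * (xi - ex) for mi, xi in zip(m, payoff)])

procyclical = [0.9, 1.1]    # pays badly in the bad state (assumed)
insurance = [1.1, 0.9]      # pays well in the bad state (assumed)

for name, payoff in (("procyclical", procyclical), ("insurance", insurance)):
    print(f"{name:12s} E(x) = {expect(payoff):.2f}  "
          f"Cov(m, x) = {cov_with_m(payoff):+.4f}  price = {price(payoff):.4f}")
```

Both payoffs have the same expected value and variance, yet the one that pays off in the bad state commands the higher price.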
2.2.2 Further notation

We use the capital letter $R_{t+1}$ to denote the gross return and
the small letter $r_{t+1}$ to denote the net return. So for an asset
with price $p_t$ and next-period payoff $x_{t+1} = p_{t+1} + d_{t+1}$, the gross
return is
$$R_{t+1} \equiv \frac{x_{t+1}}{p_t} = \frac{x_{t+1}}{p_t} - \frac{p_t}{p_t} + 1 = 1 + \frac{x_{t+1} - p_t}{p_t} = 1 + r_{t+1}.$$
For continuously compounded returns, $r = \ln(R)$.
Given this notation, an insightful alternative expression of the
central asset pricing expression $p_t = E(m_{t+1} x_{t+1})$, obtained by
dividing both sides by $p_t$, is
$$1 = E(m_{t+1} R_{t+1}).$$
The excess returns are $R^e \equiv R - R^f$, where $R^f \equiv 1 + r^f$.
2.2.3 Risk-free rate

A risk-free asset is an asset that has price 1 at $t$ and pays
$R^f = 1 + r^f$ next period. Substituting into our basic pricing equation, we get
$$1 = E(m R^f) = E(m) R^f,$$
$$R^f = \frac{1}{E(m)}. \tag{2.9}$$
Similarly, the price of excess returns is zero because
$$p(R^e) = E(m R^e) = E[m(R - R^f)] = E(mR) - R^f E(m) = 0.$$
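These three pricing identities can be checked on any discrete example. The sketch below assumes three states with a hypothetical SDF and payoff; none of the numbers come from the text.

```python
probs = [0.3, 0.4, 0.3]      # state probabilities (assumed)
m = [1.10, 0.95, 0.85]       # stochastic discount factor per state (assumed)
x = [0.8, 1.0, 1.3]          # risky payoff per state (assumed)

def expect(values):
    return sum(p * v for p, v in zip(probs, values))

price = expect([mi * xi for mi, xi in zip(m, x)])     # p = E(m x)
gross_return = [xi / price for xi in x]               # R = x / p
riskfree = 1.0 / expect(m)                            # Eq. (2.9)
excess_return = [r - riskfree for r in gross_return]  # R^e = R - R^f

price_of_gross_return = expect([mi * r for mi, r in zip(m, gross_return)])
price_of_excess_return = expect([mi * r for mi, r in zip(m, excess_return)])

print(f"R^f = {riskfree:.4f}")
print(f"E(m R) = {price_of_gross_return:.6f}")
print(f"E(m R^e) = {price_of_excess_return:.6f}")
```

A gross return has price one and an excess return has price zero (up to rounding), whatever payoff is chosen.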

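For the power utility function, writing $m_{t+1} \equiv \beta\, \frac{U'(c_{t+1})}{U'(c_t)}$ and
taking consumption as known with certainty so that the expectation can be
dropped, the last expression in (2.9) becomes
$$R^f = \frac{U'(c_t)}{\beta U'(c_{t+1})} = \frac{c_t^{-\gamma}}{\beta c_{t+1}^{-\gamma}} = \frac{1}{\beta} \left( \frac{c_{t+1}}{c_t} \right)^{\gamma}. \tag{2.10}$$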
The interest rate is high when people are impatient ($\beta$ low) or when
consumption growth ($c_{t+1}/c_t$) is high. A high interest rate motivates
investors to consume less today in order to save, earn the high
rate and consume more tomorrow. The interest rate is more sensitive
to consumption growth as the degree of investors’ risk aversion ($\gamma$)
increases. A risk-averse investor with a higher $\gamma$ wants to maintain a
smooth consumption path, i.e. he is less willing to rearrange his
consumption over time as a response to interest rate incentives. Hence,
large interest rate changes are necessary to convince him to consume
less today in order to have more tomorrow.
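From Eq. (2.9), we have for a power utility investor,
$$R^f = \frac{1}{E(m)} = \frac{1}{E_t\left[\beta \frac{U'(c_{t+1})}{U'(c_t)}\right]} = \frac{1}{\beta E_t\left[\left(\frac{c_{t+1}}{c_t}\right)^{-\gamma}\right]}. \tag{2.11}$$
Let us now assume that consumption growth ($c_{t+1}/c_t$) is lognormally distributed, and define $\Delta \ln c_{t+1} = \ln c_{t+1} - \ln c_t$. Then $\Delta \ln c_{t+1} = \ln(c_{t+1}/c_t)$ is normally distributed since $c_{t+1}/c_t$ is lognormally distributed.$^{3}$ In particular, we get
$$\ln\left(\frac{c_{t+1}}{c_t}\right)^{-\gamma} = -\gamma \ln\frac{c_{t+1}}{c_t} = -\gamma \Delta \ln c_{t+1} \sim N\left(-\gamma E_t(\Delta \ln c_{t+1}),\ \gamma^2 \sigma_t^2(\Delta \ln c_{t+1})\right),$$
$$E_t\left[\left(\frac{c_{t+1}}{c_t}\right)^{-\gamma}\right] = e^{-\gamma E_t(\Delta \ln c_{t+1}) + (\gamma^2/2)\sigma_t^2(\Delta \ln c_{t+1})}. \tag{2.12}$$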
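For $r^f = \ln R^f$ and $\beta \equiv e^{-\delta}$, the equality in (2.11) becomes
$$R^f = \frac{1}{e^{-\delta}\, e^{-\gamma E_t(\Delta \ln c_{t+1}) + (\gamma^2/2)\sigma_t^2(\Delta \ln c_{t+1})}},$$
$$r^f = -\ln\left[e^{-\delta}\, e^{-\gamma E_t(\Delta \ln c_{t+1}) + (\gamma^2/2)\sigma_t^2(\Delta \ln c_{t+1})}\right] = \delta + \gamma E_t(\Delta \ln c_{t+1}) - \frac{\gamma^2}{2}\sigma_t^2(\Delta \ln c_{t+1}). \tag{2.13}$$
$^{3}$ If $x$ is lognormally distributed, then $\log x$ is normally distributed with mean $\mu$ and variance $\sigma^2$, i.e. $\log x \sim N(\mu, \sigma^2)$. The expected value of $x$ is $E(x) = e^{\mu + \frac{1}{2}\sigma^2}$ and its variance is $\mathrm{Var}(x) = (e^{\sigma^2} - 1)e^{2\mu + \sigma^2}$.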
Equation (2.13) indicates that the more volatile consumption is, the lower the interest rate: risk-averse investors want to save more because they are afraid of bad consumption states tomorrow (which become more likely due to volatility), and hence they drive interest rates down. This feature captures the precautionary savings motive of the power utility investor.
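The closed form in (2.13) can be checked numerically. The sketch below is only an illustration: the function names and the parameter values are hypothetical, and the numerical version simply integrates $(c_{t+1}/c_t)^{-\gamma}$ against a normal density for log consumption growth, as assumed in the text.

```python
import math


def riskfree_log_rate(delta, gamma, mean_growth, std_growth):
    """Closed form (2.13): r_f = delta + gamma*E[dln c] - (gamma^2/2)*Var[dln c]."""
    return delta + gamma * mean_growth - 0.5 * gamma ** 2 * std_growth ** 2


def riskfree_log_rate_numeric(delta, gamma, mean_growth, std_growth,
                              n=20001, width=10.0):
    """Evaluate r_f = -ln(beta * E[(c_{t+1}/c_t)^(-gamma)]) by trapezoidal
    integration over the normal density of log consumption growth."""
    beta = math.exp(-delta)
    lo = mean_growth - width * std_growth
    hi = mean_growth + width * std_growth
    step = (hi - lo) / (n - 1)
    norm = std_growth * math.sqrt(2.0 * math.pi)
    total = 0.0
    for i in range(n):
        g = lo + i * step  # a possible value of log consumption growth
        density = math.exp(-0.5 * ((g - mean_growth) / std_growth) ** 2) / norm
        weight = 0.5 if i in (0, n - 1) else 1.0  # trapezoid end points
        total += weight * math.exp(-gamma * g) * density * step
    return -math.log(beta * total)


# Hypothetical parameters: delta = 2%, gamma = 2, mean growth 2%, volatility 3%.
closed = riskfree_log_rate(0.02, 2.0, 0.02, 0.03)
numeric = riskfree_log_rate_numeric(0.02, 2.0, 0.02, 0.03)
print(closed, numeric)

# Doubling consumption volatility lowers the rate (precautionary savings).
print(riskfree_log_rate(0.02, 2.0, 0.02, 0.06) < closed)
```

The second print statement compares two volatility levels with all other parameters held fixed, which isolates the precautionary savings term $-(\gamma^2/2)\sigma_t^2$.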
2.2.4 Risk corrections

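From $p = E(mx)$ and the definition of covariance, $\mathrm{Cov}(m, x) = E(mx) - E(m)E(x)$, we have
$$p = \mathrm{Cov}(m, x) + E(m)E(x) = \frac{E(x)}{R^f} + \mathrm{Cov}(m, x). \tag{2.14}$$
Using the definition of the SDF $m$, we get
$$p = \frac{E(x)}{R^f} + \frac{\mathrm{Cov}[\beta U'(c_{t+1}), x_{t+1}]}{U'(c_t)}. \tag{2.15}$$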
Since marginal utility $U'(c_{t+1})$ is high when consumption $c_{t+1}$ is low, if the payoffs of an asset $x$ comove positively with marginal utility (i.e. negatively with consumption), such an asset should have a higher price. Insurance products are good examples. Their expected value may even be negative, but they pay out in very bad states (fire, theft, death, natural disasters, etc.), exactly when marginal utility is extremely high, and hence their price according to (2.15) could be very high. On the other hand, assets whose payoffs covary negatively with marginal utility (i.e. positively with consumption) should be traded at a lower price.
The previous arguments underlie the basic concept that the risk-
averse investor cares about the variance of his consumption. As a
result, he takes into account the covariance of the asset’s payoff with
his consumption. The investor does not necessarily care about the
volatility of his assets or portfolio if he can keep a smooth consump-
tion stream.
We can also present the same concept in terms of returns. Starting from
$$1 = E(mR) = \mathrm{Cov}(m, R) + E(m)E(R) = \mathrm{Cov}(m, R) + \frac{1}{R^f}E(R),$$
we obtain
$$E(R) - R^f = -R^f\,\mathrm{Cov}(m, R).$$
Hence the expected return of an asset should be equal to the risk-free rate plus a risk adjustment $-R^f\,\mathrm{Cov}(m, R)$, which depends on the covariance of the returns with the SDF.
Since $R = 1 + r$, we can equivalently write
$$E(r) - r^f = -(1 + r^f)\,\mathrm{Cov}(m, r).$$
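The risk corrections in (2.14) and in the return form can be illustrated with a two-state economy. Everything in this sketch is an assumption made for illustration: the state probabilities, the power utility parameters and the two payoffs (an "insurance" asset paying in the bad state and a "procyclical" asset paying in the good state) are hypothetical.

```python
def expectation(probs, values):
    return sum(p * v for p, v in zip(probs, values))


def covariance(probs, a, b):
    mean_a, mean_b = expectation(probs, a), expectation(probs, b)
    return expectation(probs, [(x - mean_a) * (y - mean_b) for x, y in zip(a, b)])


# Two equally likely states: low consumption (bad) and high consumption (good).
probs = [0.5, 0.5]
beta, gamma = 0.95, 2.0
c_today, c_next = 1.0, [0.9, 1.1]

# SDF under power utility: m = beta * (c_{t+1} / c_t)^(-gamma).
m = [beta * (c / c_today) ** (-gamma) for c in c_next]
risk_free = 1.0 / expectation(probs, m)  # Eq. (2.9)


def price(payoff):
    """Basic pricing equation p = E(m x)."""
    return expectation(probs, [mi * xi for mi, xi in zip(m, payoff)])


def price_decomposed(payoff):
    """Eq. (2.14): p = E(x)/R^f + Cov(m, x)."""
    return expectation(probs, payoff) / risk_free + covariance(probs, m, payoff)


def excess_return_both_ways(payoff):
    """Return (E(R) - R^f, -R^f Cov(m, R)); the two should coincide."""
    p = price(payoff)
    gross = [x / p for x in payoff]
    return (expectation(probs, gross) - risk_free,
            -risk_free * covariance(probs, m, gross))


insurance = [1.0, 0.0]    # pays only in the bad state
procyclical = [0.0, 1.0]  # pays only in the good state
for name, payoff in (("insurance", insurance), ("procyclical", procyclical)):
    print(name, price(payoff), price_decomposed(payoff),
          excess_return_both_ways(payoff))
```

Both assets have the same expected payoff, so any difference in price comes entirely from the covariance term: the insurance-like payoff is priced above $E(x)/R^f$ and the procyclical payoff below it.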
2.2.5 Idiosyncratic risk does not affect prices

Equation (2.14) says that if $\mathrm{Cov}(m, x) = 0$ then the risk adjustment to the price of the asset is zero, so $p = E(x)/R^f$. Moreover, the expected
return of the asset is equal to the risk-free rate. This means that
even if the payoff of the asset is volatile, it still pays no extra return
and no risk adjustment is necessary if this payoff is uncorrelated with
the SDF.
More generally, a fundamental statement in asset pricing is that only exposure to systematic risk is compensated; no compensation is received for holding idiosyncratic risk. We can decompose any payoff $x$ into a part correlated with the discount factor $m$ and an idiosyncratic part uncorrelated with the SDF by running a regression of $x$ on $m$. The projection of $x$ on $m$,
$$\mathrm{proj}(x \mid m) = \frac{E(mx)}{E(m^2)}\,m,$$
is the part of $x$ which is perfectly correlated with $m$; it equals the fitted value $\hat{x} = \beta m$ of the linear regression $x = \beta m + \varepsilon$. So decomposing the payoff $x$, we get
$$x = \mathrm{proj}(x \mid m) + \varepsilon,$$
where $\varepsilon$ is the residual of the regression, i.e. the part of $x$ that is uncorrelated with (orthogonal to) the SDF $m$.
The price of the projection $\mathrm{proj}(x \mid m)$ is given by
$$p[\mathrm{proj}(x \mid m)] = p\left(\frac{E(mx)}{E(m^2)}\,m\right) = E\left(m\,\frac{E(mx)}{E(m^2)}\,m\right) = \frac{E(mx)}{E(m^2)}\,E(m^2) = E(mx) = p(x).$$
Therefore, the price of the idiosyncratic component of the payoff is zero, i.e. $p(\varepsilon) = p(x) - p[\mathrm{proj}(x \mid m)] = 0$.
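The decomposition above can be verified on a small discrete example. The three states, their probabilities, the SDF values and the payoff in this sketch are hypothetical numbers chosen only to illustrate that the residual of the projection has zero price even though it is far from zero state by state.

```python
# Hypothetical three-state economy.
probs = [0.3, 0.4, 0.3]
m = [1.20, 0.95, 0.80]   # stochastic discount factor in each state
x = [0.50, 1.00, 2.00]   # payoff in each state


def expectation(values):
    return sum(p * v for p, v in zip(probs, values))


def price(payoff):
    """Basic pricing equation p = E(m x)."""
    return expectation([mi * xi for mi, xi in zip(m, payoff)])


# Regression (without intercept) of x on m: coefficient E(mx) / E(m^2).
coefficient = price(x) / expectation([mi * mi for mi in m])
projection = [coefficient * mi for mi in m]
residual = [xi - pi for xi, pi in zip(x, projection)]

print("p(x)          =", price(x))
print("p(projection) =", price(projection))
print("p(residual)   =", price(residual))
print("residual      =", residual)
```

The residual takes sizeable values in individual states, yet its price is zero up to floating-point rounding, which is the sense in which idiosyncratic risk is not priced.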
2.3 Expected Return-Beta Representation

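From $1 = E(mR)$, the return form of $p = E(mx)$, we have
$$E(R) - R^f = -R^f\,\mathrm{Cov}(m, R) = -\frac{\mathrm{Cov}(m, R)}{E(m)}.$$
Multiplying and dividing the RHS by $\mathrm{Var}(m)$, we get
$$E(R) = R^f + \left(\frac{\mathrm{Cov}(m, R)}{\mathrm{Var}(m)}\right)\left(-\frac{\mathrm{Var}(m)}{E(m)}\right). \tag{2.16}$$
Defining $\beta \equiv \frac{\mathrm{Cov}(m, R)}{\mathrm{Var}(m)}$, which is the regression coefficient of $R$ on $m$, and $\lambda_m \equiv -\frac{\mathrm{Var}(m)}{E(m)}$, we can rewrite Eq. (2.16) as a beta pricing model:
$$E(R) = R^f + \beta\lambda_m,$$
where the expected excess return is proportional to the consumption beta, $\beta$.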
While β should be different for each asset, λm is the same for all
assets and depends mainly on the volatility of the stochastic discount
factor. λm is called the price of risk, while β is called the quantity of
risk.
Hence, an asset that has a higher consumption beta (higher
covariance with consumption growth) dictates a higher expected
return (premium).
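The beta representation can be checked asset by asset. In the sketch below the states, the SDF values and the two payoffs are hypothetical; the point is only that $\lambda_m$ is common to all assets while $\beta$ differs. Note that, with $\beta$ defined as the regression coefficient of $R$ on $m$, $\lambda_m$ is negative, so an asset whose return is high when $m$ is low (a procyclical asset) has a negative $\beta$ with respect to $m$ and a positive premium $\beta\lambda_m$.

```python
# Hypothetical three-state economy.
probs = [0.3, 0.4, 0.3]
m = [1.20, 0.95, 0.80]
payoffs = {
    "procyclical": [0.5, 1.0, 2.0],      # pays most when m is low
    "countercyclical": [2.0, 1.0, 0.5],  # pays most when m is high
}


def expectation(values):
    return sum(p * v for p, v in zip(probs, values))


def covariance(a, b):
    mean_a, mean_b = expectation(a), expectation(b)
    return expectation([(x - mean_a) * (y - mean_b) for x, y in zip(a, b)])


mean_m = expectation(m)
var_m = covariance(m, m)
risk_free = 1.0 / mean_m      # Eq. (2.9)
lambda_m = -var_m / mean_m    # price of risk, identical for every asset


def beta_representation(payoff):
    """Return (E(R), R^f + beta * lambda_m, beta) for one payoff."""
    p = expectation([mi * xi for mi, xi in zip(m, payoff)])
    gross = [xi / p for xi in payoff]
    beta = covariance(m, gross) / var_m
    return expectation(gross), risk_free + beta * lambda_m, beta


for name, payoff in payoffs.items():
    print(name, beta_representation(payoff))
```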

Chapter 3
Risk Measures
This chapter and Chapter 11 are based on Merton (1990, Chapter 2), which is supposed to be an introductory chapter. However, it contains many important concepts of risk, risk measures and mutual fund theorems that are key to finance theories and deserve careful and detailed coverage to facilitate the development of new finance theories. Hence, the material is separated here into two chapters. This chapter covers the concepts of risk and riskiness following Rothschild and Stiglitz (1970, 1971) and Merton (1990). Merton’s risk measure is defined for an individual utility function but has properties that closely resemble those of the CAPM beta in the general equilibrium setting. This whole area of work tends to focus only on the first two moments of the risky return distribution, which is rather restrictive in the modern context. Rothschild and Stiglitz’s risk concept is very loosely defined using utility and is always valid regardless of the shape of the risky return distribution. Though not discussed here, Rothschild and Stiglitz’s risk concept has now been extended to include higher orders of risk preference such as prudence, cautiousness and downside risk aversion. In this chapter and Chapter 11, investment and asset pricing are evaluated in a static one-period framework without consumption.
3.1 One-period Portfolio Selection

In the one-period setting, the consumption-saving decision can be
taken as given such that the portfolio choice can be analysed independently from the consumption decision. As such, the portfolio-selection problem can be formulated as
$$\max_{\{w_1,\ldots,w_n\}} E\left[U\left(\sum_{j=1}^{n} w_j Z_j W_0\right)\right] \quad \text{s.t.} \quad \sum_{j=1}^{n} w_j = 1,$$
where $w_j$ is the fraction of wealth invested in asset $j$, $Z_j$ is the return on asset $j$, and $W_0$ is the initial wealth.
Next, we set up the Lagrangian. For ease of exposition, we assume for now that $W_0 = 1$ (one dollar):
$$L = E\left[U\left(\sum_{j=1}^{n} w_j Z_j\right)\right] + \lambda\left[1 - \sum_{j=1}^{n} w_j\right],$$
where $\lambda$ is the Lagrange multiplier. From the first-order condition (f.o.c.), we get
$$\frac{\partial L}{\partial w_i} = E\left[U'\left(\sum_{j=1}^{n} w_j Z_j\right) \cdot Z_i\right] - \lambda = 0,$$
$$\lambda = E\left[U'\left(\sum_{j=1}^{n} w_j Z_j\right) \cdot Z_i\right]. \tag{3.1}$$
Let $w^* \equiv (w_1^*, w_2^*, \ldots, w_n^*)$ be the solution set satisfying the f.o.c.; the return on the optimal portfolio, $Z^*$, can then be written as
$$Z^* \equiv \sum_{j=1}^{n} w_j^* Z_j.$$
Then from (3.1),
$$E[U'(Z^*) \cdot Z_i] = \lambda \quad \text{for } i = 1, \ldots, n.$$
If a riskless security with return $R$ is added to the set of available securities, then the objective function is
$$\max_{\{w_1,\ldots,w_n\}} L = E\left\{U\left[\sum_{j=1}^{n} w_j (Z_j - R) + R\right]\right\}.$$
Here, $\sum_{j=1}^{n} w_j = 1$ is not binding. Instead we have $w_{n+1} + \sum_{j=1}^{n} w_j = 1$, assuming that there is no limit on borrowing and lending, where $w_{n+1}$ is the amount borrowed or lent. The problem reduces to solving for the portfolio of risky assets, i.e. solving for $w_j$ for $j = 1, \ldots, n$, with $w_{n+1}$ treated as a residual value, $w_{n+1} = 1 - \sum_{j=1}^{n} w_j$. This is done by optimizing the objective function, and the f.o.c. implies
$$\frac{\partial L}{\partial w_i} = E[U'(Z^*)(Z_i - R)] = 0, \tag{3.2}$$
where
$$Z^* = \sum_{j=1}^{n} w_j^* (Z_j - R) + R$$
is the return on the optimal portfolio for individuals with utility function $U$. Note that so far, we have not made any assumption on preferences or on the asset return distributions other than requiring that all the moments of the asset return distributions are well defined. This rules out stable distributions with characteristic exponent (stability index) smaller than 2, for which the variance is infinite; expected utility could be infinite if the characteristic exponent is less than 2.
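The first-order condition (3.2) rarely has a closed-form solution, but it is easy to solve numerically. The following sketch assumes one risky security with two equally likely outcomes and a power (CRRA) utility function; these choices, the parameter values and the function names are hypothetical and serve only to show how the condition $E[U'(Z^*)(Z - R)] = 0$ pins down the risky weight.

```python
R = 1.02                              # gross riskless return (assumed)
states = [(0.5, 0.80), (0.5, 1.30)]   # (probability, gross risky return)
GAMMA = 3.0                           # relative risk aversion (assumed)


def marginal_utility(wealth):
    """U'(W) for power utility U(W) = W^(1-gamma) / (1-gamma)."""
    return wealth ** (-GAMMA)


def foc(w):
    """Left-hand side of (3.2): E[U'(w (Z - R) + R) (Z - R)]."""
    return sum(p * marginal_utility(w * (z - R) + R) * (z - R)
               for p, z in states)


def solve_weight(lo=0.0, hi=4.0, tol=1e-12):
    """Bisection on the f.o.c., which is decreasing in w for concave U.
    The bracket keeps end-of-period wealth positive in both states."""
    if foc(lo) <= 0.0 or foc(hi) >= 0.0:
        raise ValueError("bracket does not contain the optimum")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if foc(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


w_star = solve_weight()
print("optimal risky weight:", w_star)
print("f.o.c. residual     :", foc(w_star))
```

Because the expected risky return (1.05) exceeds $R$ in this example, the f.o.c. is positive at $w = 0$ and the investor holds a strictly positive amount of the risky security.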
3.2 Rothschild and Stiglitz “Strict” Risk Aversion

Here, Rothschild and Stiglitz define “increasing risk” so that the “riskiness” of two securities or portfolios can be compared. For two portfolios with the same mean returns, i.e. $\bar{W}_1 = \bar{W}_2$, the first portfolio with random outcome denoted by $W_1$ is said to be less risky than the second portfolio with random outcome denoted by $W_2$ if, for all concave utility functions $U$,
$$E[U(W_1)] \geq E[U(W_2)]. \tag{3.3}$$
(i) Relationship (3.3) is true if $W_1 \overset{FSD}{>} W_2$, where FSD denotes first-order stochastic dominance.
(ii) If the random variable
$$W_2 = W_1 + \text{noise},$$
then Eq. (3.3) is again true, provided that the noise and $W_1$ are not (negatively) correlated.
[Figure 3.1 placeholder: two panels, each plotting A(y) against y over the interval [a, b] for W1 and W2.]
Figure 3.1: The left graph denotes second-order stochastic dominance, whereas the right graph denotes first-order stochastic dominance. A(y) denotes the cumulative distribution function of wealth (i.e. F(x) and G(x)).
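(iii) (Second-order stochastic dominance) If $W_2$ has more weight in its tails, then Eq. (3.3) is also true. That is, for the closed interval $[a, b]$, with $W_1(x)$ and $W_2(x)$ denoting the probability densities of the two outcomes,
$$F(y) \equiv \int_a^y W_1(x)\,dx,$$
$$G(y) \equiv \int_a^y W_2(x)\,dx,$$
$$T(y) \equiv \int_a^y [G(x) - F(x)]\,dx \geq 0, \quad \text{given } T(b) = 0.$$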
These relationships are presented in Fig. 3.1.
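Cases (ii) and (iii) can be illustrated with discrete distributions. The sketch below is built on assumptions: the outcomes of $W_1$, the independent zero-mean noise and the list of concave utility functions are all hypothetical. It uses the standard identity $\int_a^y F(x)\,dx = E[\max(y - W, 0)]$, valid when $a$ lies below the support, to evaluate $T(y)$ exactly for step-function distributions.

```python
import math

# Hypothetical discrete outcomes: W2 = W1 + independent zero-mean noise.
w1 = {0.9: 0.25, 1.0: 0.50, 1.1: 0.25}
noise = {-0.2: 0.5, 0.2: 0.5}

w2 = {}
for value, p_value in w1.items():
    for shock, p_shock in noise.items():
        outcome = round(value + shock, 10)
        w2[outcome] = w2.get(outcome, 0.0) + p_value * p_shock


def expect(dist, func):
    return sum(p * func(v) for v, p in dist.items())


# A few increasing concave utility functions (illustrative choices).
utilities = {
    "log": math.log,
    "sqrt": math.sqrt,
    "neg_exp": lambda w: -math.exp(-2.0 * w),
    "crra3": lambda w: -0.5 * w ** (-2.0),
}


def t_function(y):
    """T(y) = integral from a to y of [G(x) - F(x)] dx, using
    integral of a cdf = E[max(y - W, 0)]."""
    return (expect(w2, lambda v: max(y - v, 0.0))
            - expect(w1, lambda v: max(y - v, 0.0)))


a, b = 0.5, 1.5
grid = [a + 0.01 * i for i in range(101)]

print("means:", expect(w1, lambda v: v), expect(w2, lambda v: v))
for name, u in utilities.items():
    print(name, expect(w1, u), ">=", expect(w2, u))
print("min T(y):", min(t_function(y) for y in grid), " T(b):", t_function(b))
```

The two distributions have the same mean, every listed concave utility ranks $W_1$ at least as high as $W_2$, and $T(y)$ stays non-negative and returns to zero at $b$, as required in (iii).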
3.2.1 Efficient portfolio

A feasible$^{1}$ portfolio with random return $Z_e$ is an efficient portfolio if there exists an increasing strictly concave utility function $U$ such that
(i) $Z_e$ is optimal and the optimality condition below is satisfied:
$$E[U'(Z_e)(Z_j - R)] = 0.$$
1
A feasible portfolio has a set of portfolio weights that satisfy and, at the same
time, are constrained by the market supplies of securities.
(ii) There is no other feasible portfolio that is less risky than Ze in

Rothschild and Stiglitz’s sense according to Eq. (3.3).
Note that Rothschild and Stiglitz’s definition of an efficient portfolio above is utility dependent; a different concave utility function will lead to a different efficient portfolio. If the optimality condition is satisfied for a portfolio, then any other portfolio with the same mean return that produces lower expected utility cannot also be optimal. Hence, we can conclude that an optimal portfolio must be efficient. This can be considered the birth of the concept of the efficient frontier. However, it is worth noting that, even for a given utility function, optimal portfolios need not be unique. If we restrict our discussion to the first two moments of returns, then the solution is unique only if the entire variance–covariance matrix is non-singular and an interior solution exists. With higher moments, it is possible that different portfolios (with different values for at least some moments) produce the same amount of expected utility. Hence, we may conclude that, for a given utility function, there could be more than one optimal portfolio, all of which are efficient.
Merton argued that Rothschild and Stiglitz’s definition of “less risky” is conditioned on the mean returns being equal. If assets (or portfolios) have different mean returns, no statement about more or less risky can be made. To appreciate this statement, consider two different positions on what we now know as the efficient frontier, at two different expected return levels; it is not possible to choose between them. As we will note later, the only statement we can make about them is that the portfolio that is riskier must have a higher return. Hence, the efficient frontier, when plotted in risk-return space, must be upward sloping. Merton argued that his measure of risk can overcome this shortcoming. But Merton’s risk measure, as we will see later, works only if higher moments of risky returns are ignored or the third and higher-order derivatives of the utility function are all zero.
3.2.2 Portfolio analysis

Theorem 2.1 (Risk and returns of two risky portfolios). An efficient portfolio must have a higher return if it is riskier.
Let $Z_e$ denote the return on an efficient portfolio, and $Z$ denote the return on a feasible portfolio. If $(Z_e - \bar{Z}_e)$ is riskier than $(Z - \bar{Z})$, then $\bar{Z}_e > \bar{Z}$. This statement is proved by contradiction. From Rothschild and Stiglitz’s definition of riskiness in (3.3), $(Z_e - \bar{Z}_e)$ being riskier means
$$E\{U(Z - \bar{Z})\} > E\{U(Z_e - \bar{Z}_e)\}.$$
Suppose, contrary to the claim, that $\bar{Z} \geq \bar{Z}_e$. Since $U$ is increasing, it must then be true that
$$E[U(Z)] > E[U(Z_e)].$$
But by the definition of an efficient portfolio $Z_e$, Eq. (3.3) suggests
$$E[U(Z_e)] > E[U(Z)].$$
Hence, by contradiction, $\bar{Z}$ must be smaller than $\bar{Z}_e$.
Corollary 2.1 (Risky vs. risk free). The return on an efficient portfolio must be higher than the risk-free rate unless the efficient portfolio is risk-free.
If $Z_e$ is riskless, then $Z_e = \bar{Z}_e = R$. If $Z_e$ is not riskless, then $Z_e - \bar{Z}_e$ is riskier than $R - R$. From Theorem 2.1,
$$\bar{Z}_e > \bar{R} = R.$$
This corollary implies that the expected return on an efficient portfolio must be greater than the risk-free rate, unless it is risk free, in which case its expected return is equal to the risk-free rate.
Theorem 2.2 (Condition for risk-free asset as the optimal portfolio). $R$ is optimal iff $\bar{Z}_j = R$ for all $j$.
The iff (if and only if) statement is proved in two parts; in the first part we show that if $\bar{Z}_j = R$, then the return of the optimal portfolio must be $R$. In the second part, we show that if $R$ is optimal then all $\bar{Z}_j = R$. First, from the f.o.c.,
$$E\{U'(Z^*)(Z_j - R)\} = 0.$$

If $\bar{Z}_j = R$ for all $j$, then
$$Z^* = \sum_{1}^{n} w_j^* \bar{Z}_j = R$$
is a solution to the required condition, since then $E\{U'(R)(Z_j - R)\} = U'(R)(\bar{Z}_j - R) = 0$. This proves the first “if” part.
To prove the “only if” part, note that if $Z^* = R$ is an optimal solution, then
$$E[U'(R)(Z_j - R)] = 0,$$
$$U'(R)[E(Z_j - R)] = 0.$$
Since $U'(R) > 0$ from non-satiation, it must mean that
$$E(Z_j - R) = 0,$$
$$E(Z_j) = \bar{Z}_j = R.$$
This means that if $R$ is the solution, then $\bar{Z}_j$ must equal $R$ for all $j$. In other words, the investor will choose the riskless security as the optimal portfolio if and only if $\bar{Z}_j = R$ for all $j$. It follows from this theorem and the corollary above that a risk-averse investor will choose a risky portfolio if $\bar{Z}_e > R$ and $\bar{Z}_j \neq R$ for at least one $j$. This is quite a powerful insight which is not quite obvious from raw intuition.
Theorem 2.3 (Inefficiency and noise). If $Z_s$ is the efficient portfolio, $Z_p$, plus noise, then $s$ is not in any efficient portfolio.
Suppose security $s$ has return $Z_s = Z_p + \varepsilon_s$, or $\varepsilon_s = Z_s - Z_p$, where $\varepsilon_s$ is a zero-mean noise$^{2}$ uncorrelated with $Z_j$:
$$E\{\varepsilon_s\} = E\{\varepsilon_s \mid Z_j,\ j = 1, \ldots, n,\ j \neq s\} = 0. \tag{3.4}$$
Let $Z_e$ be the return on an efficient portfolio with fraction $\delta$ allocated to security $s$, and let $Z$ be the return on the same portfolio except that the fraction $\delta$ is held in $Z_p$ instead of $Z_s$. Then
$$Z_e = Z + \delta(Z_s - Z_p) = Z + \delta\varepsilon_s. \tag{3.5}$$
From the definition in (3.4),
$$\bar{Z}_e = \bar{Z}.$$
For $\delta \neq 0$, $Z_e$ is riskier than $Z$ in the Rothschild–Stiglitz sense. This contradicts the definition of an efficient portfolio $Z_e$. Hence, $\delta = 0$ in every efficient portfolio.
$^{2}$ In the risk and uncertainty literature, such a zero-mean noise, $\varepsilon_s$, is called pure noise.
Corollary 2.3 in Merton’s book extends Theorem 2.3 to $n$ securities. Theorem 2.3 and Corollary 2.3 together demonstrate that all risk-averse investors would want all “unnecessary” uncertainties to be eliminated. In particular, by this theorem, a lottery is noise (with a negative mean) that is not in any efficient set. Thus, the existence and popularity of lotteries seem to contradict strict risk aversion (in the Rothschild–Stiglitz sense) on the part of lottery buyers. One possible explanation for lotteries is offered by Friedman and Savage (1948), who argue that part of the utility function is concave (for the normal portion below current income) while another part is convex (that matches the lottery payoff). There are also other explanations based on prospect theory and behavioural economics, which are beyond the scope of this book.
3.3 Merton’s Risk Measures

Following Sec. 3.2.1, an investor with a concave utility function $V$ with
$$V_K(Z_e^K) \equiv V_K,$$
will select portfolio $K$ with return $Z_e^K$ as his optimal portfolio. From the discussion in Sec. 3.2.1, $K$ is an efficient portfolio. Merton comments that while $V_K$ will always exist, it will not be unique.$^{3}$ The fact that the investor has chosen $Z_e^K$ as his optimal portfolio must mean that
$$E[V_K'(Z_e^K)(Z_j - R)] = 0 \quad \text{for all } j = 1, \ldots, n. \tag{3.6}$$
Next define
$$Y_K \equiv \frac{V_K' - E[V_K']}{\mathrm{Cov}(V_K', Z_e^K)}. \tag{3.7}$$
Equation (3.7) is a key step in the derivation; the use of the covariance term immediately rules out all relationships that involve higher moments.
$^{3}$ That is, different investors with different utility functions may choose the same portfolio $Z_e^K$. Moreover, as in all Merton’s work, uniqueness here is defined in the context of a mean–variance world with a non-singular covariance matrix.
For a concave utility function, $V_K' > 0$ and $V_K'' < 0$, which means that (i) $V_K' \neq 0$, (ii) $\sigma(Z_e) > 0$ (strictly positive), and (iii) $\rho(V_K', Z_e) < 0$. The covariance (correlation) term is negative, suggesting that higher portfolio returns $Z_e$ are associated with lower levels of marginal utility. As shown below, $V_K'$ is a downward-sloping convex function of $Z_e$ (similar to the pricing kernel relationship with total wealth).
[Figure placeholder: $V_K'$ (vertical axis) plotted against $Z_e$ (horizontal axis).]
The variable $Y_K$ is like a “market price of risk” measured in terms of marginal utility. Given
$$Y_K = \frac{V_K' - E(V_K')}{\mathrm{Cov}(V_K', Z_e^K)} = \frac{V_K' - E(V_K')}{\rho\,\sigma(V_K')\,\sigma(Z_e)},$$
the more concave the utility function, the larger the standard deviation $\sigma(V_K')$ and $(V_K' - E(V_K'))$ (the latter is in fact a measure of deviation).
$Y_K$ can be separated into four components: $V_K' - E(V_K')$, $\sigma(V_K')$, $\rho$ and $\sigma(Z_e)$. If $\sigma(V_K')$ increases, $V_K' - E(V_K')$ also increases, so the impacts on the numerator and the denominator of $\frac{V_K' - E(V_K')}{\sigma(V_K')}$ roughly cancel out. Thus, $Y_K$ is not likely to be very sensitive to the standard deviation $\sigma(V_K')$. On the other hand, $Y_K$ is affected by $\rho$ and $\sigma(Z_e)$. In absolute terms, the larger the $\rho$ or $\sigma(Z_e)$, the smaller the “price of risk” as measured in terms of marginal utility.
Finally, Merton’s risk measure $b_p^K$ for portfolio $p$ is defined with respect to an efficient portfolio $K$ as follows:
$$b_p^K \equiv \mathrm{Cov}(Y_K, Z_p).$$
Portfolio $p$ is riskier than portfolio $p'$ if $b_p^K > b_{p'}^K$.
Theorem 2.4. Risk premium of $b_p^K$.
Merton’s risk measure $b_p^K$ has a return-risk prediction similar to the Security Market Line in the Capital Asset Pricing Model. If $Z_p$ is the return on a feasible portfolio $p$, then
$$\bar{Z}_p - R = b_p^K(\bar{Z}_e^K - R). \tag{3.8}$$
Proof. This risk-return prediction is an outcome of (3.6). To prove this, assume portfolio $p$ has weight $\delta_j$ invested in security $j$,
$$Z_p = \sum_{1}^{n} \delta_j (Z_j - R) + R.$$
From (3.6), multiply the $j$th equation by the investment weight $\delta_j$ and sum all $n$ equations:
$$\sum_{1}^{n} \delta_j E[V_K'(Z_e^K)(Z_j - R)] = 0,$$
$$E\left[V_K'(Z_e^K) \sum_{1}^{n} \delta_j (Z_j - R)\right] = 0,$$
$$E[V_K'(Z_e^K)(Z_p - R)] = 0.$$
This is simply the f.o.c. restated for portfolio $p$. Now applying the same argument but using the portfolio weights pertaining to $Z_e^K$, we have
$$E[V_K'(Z_e^K)(Z_e^K - R)] = 0.$$
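Then, given
$$E(XY) = E(X)E(Y) + \mathrm{Cov}(X, Y),$$
if $E(XY) = 0$ then
$$\mathrm{Cov}(X, Y) = -E(X)E(Y).$$
Applying this with $X = V_K'(Z_e^K)$, and with $Y = Z_p - R$ and $Y = Z_e^K - R$ in turn, we get
$$\mathrm{Cov}(V_K', Z_p) = (R - \bar{Z}_p)E[V_K'(Z_e^K)],$$
$$\mathrm{Cov}(V_K', Z_e^K) = (R - \bar{Z}_e^K)E[V_K'(Z_e^K)].$$
Since $\mathrm{Cov}(V_K', Z_e^K) < 0$, $Y_K$ is well defined and
$$\mathrm{Cov}(Y_K, Z_p) = \mathrm{Cov}\left(\frac{V_K'(Z_e^K) - E[V_K'(Z_e^K)]}{\mathrm{Cov}(V_K'(Z_e^K), Z_e^K)},\ Z_p\right) = \frac{\mathrm{Cov}(V_K'(Z_e^K), Z_p)}{\mathrm{Cov}(V_K'(Z_e^K), Z_e^K)} = \frac{R - \bar{Z}_p}{R - \bar{Z}_e^K},$$
which leads immediately to (3.8).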
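Theorem 2.4 can be checked numerically in a small discrete-state economy. The sketch below makes several assumptions that are not in the text: four equally likely states, two hypothetical risky securities, a riskless return of 1.02 and a power utility investor with relative risk aversion 3. It solves the f.o.c. (3.6) by Newton's method, builds $Y_K$ from (3.7) and then compares both sides of (3.8).

```python
R = 1.02
PROBS = [0.25, 0.25, 0.25, 0.25]
Z = [[0.85, 1.00, 1.10, 1.25],   # hypothetical security 1, return per state
     [1.10, 0.90, 1.15, 1.05]]   # hypothetical security 2, return per state
GAMMA = 3.0
N_ASSETS, N_STATES = 2, 4
EXCESS = [[Z[j][s] - R for s in range(N_STATES)] for j in range(N_ASSETS)]


def mean(values):
    return sum(p * v for p, v in zip(PROBS, values))


def cov(a, b):
    mean_a, mean_b = mean(a), mean(b)
    return mean([(x - mean_a) * (y - mean_b) for x, y in zip(a, b)])


def portfolio_return(weights):
    """Z_p = sum_j delta_j (Z_j - R) + R, state by state."""
    return [R + sum(weights[j] * EXCESS[j][s] for j in range(N_ASSETS))
            for s in range(N_STATES)]


def marginal_utility(z):
    return z ** (-GAMMA)


def solve_efficient_weights(max_iter=50):
    """Newton iterations on E[V'(Z_e)(Z_j - R)] = 0 for j = 1, 2."""
    w = [0.0, 0.0]
    for _ in range(max_iter):
        ze = portfolio_return(w)
        v1 = [marginal_utility(z) for z in ze]
        v2 = [-GAMMA * z ** (-GAMMA - 1.0) for z in ze]   # V''
        g = [mean([v1[s] * EXCESS[j][s] for s in range(N_STATES)])
             for j in range(N_ASSETS)]
        if max(abs(g[0]), abs(g[1])) < 1e-14:
            break
        h = [[mean([v2[s] * EXCESS[i][s] * EXCESS[j][s]
                    for s in range(N_STATES)])
              for j in range(N_ASSETS)] for i in range(N_ASSETS)]
        det = h[0][0] * h[1][1] - h[0][1] * h[1][0]
        step = [(-h[1][1] * g[0] + h[0][1] * g[1]) / det,
                (h[1][0] * g[0] - h[0][0] * g[1]) / det]
        scale = 1.0   # halve the step if wealth would turn non-positive
        while min(portfolio_return([w[j] + scale * step[j]
                                    for j in range(N_ASSETS)])) <= 0.0:
            scale *= 0.5
        w = [w[j] + scale * step[j] for j in range(N_ASSETS)]
    return w


w_star = solve_efficient_weights()
z_e = portfolio_return(w_star)
v_prime = [marginal_utility(z) for z in z_e]
cov_ve = cov(v_prime, z_e)
y_k = [(v - mean(v_prime)) / cov_ve for v in v_prime]   # Eq. (3.7)


def merton_b(returns):
    return cov(y_k, returns)


def risk_premium_gap(returns):
    """LHS minus RHS of (3.8); should be zero for any feasible portfolio."""
    return (mean(returns) - R) - merton_b(returns) * (mean(z_e) - R)


z_p = portfolio_return([0.3, -0.2])   # an arbitrary feasible portfolio
print("optimal weights:", w_star)
print("b_1, b_2, b_p  :", merton_b(Z[0]), merton_b(Z[1]), merton_b(z_p))
print("gaps in (3.8)  :", risk_premium_gap(Z[0]), risk_premium_gap(Z[1]),
      risk_premium_gap(z_p))
```

The same objects can be reused to confirm that the efficient portfolio has a beta of one with respect to itself and that the beta of a portfolio is the weighted sum of the betas of its components.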
3.3.1 Properties of Merton’s risk measure bp

Lemma 2.1 (If $b_p^K = 0$ then $b_p^L = 0$). If $Z_p$ has a zero beta with respect to $Z_e^K$, it will have a zero beta with respect to any other efficient portfolio $Z_e^L$.
Since $V_K'(Z_e^K)$ is a continuous monotonic function (for a strictly concave utility function), $V_K'$ and $Z_e^K$ have a one-to-one correspondence and
$$E[Z_p \mid V_K'] = E[Z_p \mid Z_e^K]$$
for efficient portfolio $K$. If $E[Z_p \mid Z_e^K] = \bar{Z}_p$, then $Z_p$ is mean-independent of $Z_e^K$ and
$$\begin{aligned}
\mathrm{Cov}(Z_p, V_K'(Z_e^K)) &= E\{(Z_p - \bar{Z}_p)[V_K' - E[V_K'(Z_e^K)]]\} \\
&= E\{[V_K' - E[V_K'(Z_e^K)]]\,E[(Z_p - \bar{Z}_p) \mid Z_e^K]\} \\
&= E\{[\cdot] \times 0\} \\
&= 0,
\end{aligned}$$
which means that $b_p^K = 0$ and, from Theorem 2.4, this means $\bar{Z}_p = R$. From Corollary 2.1, the average return of every efficient portfolio must be greater than the risk-free rate, $\bar{Z}_e^L > R$. Hence, from Theorem 2.4,
$$\bar{Z}_p - R = b_p^L(\bar{Z}_e^L - R),$$
and if $\bar{Z}_p = R$, $b_p^L$ must be zero for the RHS to equal zero. This leads to the conclusion that if $b_p^K = 0$ then $b_p^L = 0$ for efficient portfolios $Z_e^K$ and $Z_e^L$.
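Property 1 (Chain rule)
The chain rule applies to $b_p^K$ with respect to different efficient portfolios:
$$b_p^K = b_L^K \cdot b_p^L,$$
since, from Theorem 2.4,
$$b_L^K = \frac{\bar{Z}_e^L - R}{\bar{Z}_e^K - R}, \qquad b_p^K = \frac{\bar{Z}_p - R}{\bar{Z}_e^K - R}, \qquad b_p^L = \frac{\bar{Z}_p - R}{\bar{Z}_e^L - R}.$$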
Property 2 (Beta of risky asset)

If $L$ and $K$ are efficient portfolios, then $b_K^K = 1$ and $b_K^L > 0$. From Theorem 2.4, $b_K^K$ must be 1, because
$$\bar{Z}_K - R = b_K^K(\bar{Z}_e^K - R), \qquad b_K^K = \frac{\bar{Z}_K - R}{\bar{Z}_e^K - R},$$
and $\bar{Z}_K = \bar{Z}_e^K$ as $K$ is an efficient portfolio. In the case of $b_K^L$,
$$\bar{Z}_K - R = b_K^L(\bar{Z}_e^L - R).$$
Since $\bar{Z}_e^L > R$ and $\bar{Z}_K > R$,
$$b_K^L = \frac{\bar{Z}_K - R}{\bar{Z}_e^L - R} > 0.$$
The message here is that all efficient portfolios have positive systematic risk relative to any efficient portfolio; the beta of a risky asset is greater than zero.
Property 3 (Beta of risk-free asset)
$$\bar{Z}_p = R \quad \text{iff} \quad b_p^K = 0$$
for every efficient portfolio $K$.

This follows from Theorem 2.4,
$$\bar{Z}_p - R = b_p^K(\bar{Z}_e^K - R).$$
If $b_p^K$ is zero, then the RHS and hence the LHS must be zero, so $\bar{Z}_p = R$. This is the if part. If $\bar{Z}_p = R$, the LHS is zero, hence the RHS is also zero. Given $\bar{Z}_e^K > R$, $b_p^K$ must be zero. This is the only if part. This completes the proof. Moreover, the beta of the risk-free asset is zero with reference to any other efficient portfolio, since $b_p^K = 0$ leads to $b_p^L = 0$ for all efficient portfolios $L$ from Properties 1 and 2 above.
Property 4 (Unique ordering)
Let $p$ and $q$ be two feasible portfolios and $K$ and $L$ be two efficient portfolios. Then
$$b_p^K \gtreqless b_q^K \quad \text{iff} \quad b_p^L \gtreqless b_q^L.$$
(i) For equality: if
$$b_p^L = b_q^L = 0,$$
then from Property 3,
$$b_p^K = b_q^K = 0.$$
(ii) For $b_p^L \neq 0$:
$$\frac{b_q^L}{b_p^L} = \frac{b_K^L\, b_q^K}{b_K^L\, b_p^K} = \frac{b_q^K}{b_p^K}.$$
Hence, from Property 1 (with $b_K^L > 0$ from Property 2), $K$ and $L$ provide the same ordering of risk for any reference efficient portfolio.
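Property 5 (Idiosyncratic risk is not priced)
If
$$(Z_p - R) = b_p^K(Z_e^K - R) + \varepsilon_p, \quad \text{and} \quad E(\varepsilon_p) = 0,$$
then
$$E\{\varepsilon_p V_L'(Z_e^L)\} = 0$$
for every efficient portfolio $L$. That is, unsystematic risk is not priced.
Proof. Given $E(\varepsilon_p) = 0$, form a portfolio $q$ by
$$\begin{array}{ll}
\text{holding \$1 in } p & +R + b_p^K(Z_e^K - R) + \varepsilon_p \\
\text{long \$}b_p^K \text{ in the risk-free asset} & +b_p^K R \\
\text{short \$}b_p^K \text{ in efficient portfolio } K & -b_p^K Z_e^K.
\end{array}$$
Then the return of $q$ is
$$Z_q = R + \varepsilon_p, \qquad \bar{Z}_q = R.$$
From Property 3, this implies
$$b_q^K = 0 \quad \text{and} \quad b_q^L = 0,$$
$$b_q^L = 0 \Rightarrow \mathrm{Cov}(Z_q, V_L') = 0,$$
$$\mathrm{Cov}(R + \varepsilon_p, V_L') = 0,$$
$$\mathrm{Cov}(\varepsilon_p, V_L') = 0,$$
$$E(\varepsilon_p V_L') - \underbrace{E(\varepsilon_p)}_{0} E(V_L') = 0,$$
$$E(\varepsilon_p V_L') = 0 \quad \text{for all } L.$$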
Property 6 (Sum of betas)

If portfolio $p$ is
$$Z_p = \sum_{j=1}^{n} w_j Z_j,$$
then
$$b_p^K = \sum_{j=1}^{n} w_j b_j^K$$
from the linear property of the covariance operator. Hence, the systematic risk of a portfolio is the weighted sum of the systematic risk of its components.
3.3.2 Relationship between $b_p$ and conditional expected return $E[Z_p \mid Z_e]$

The risk of a security is measured by its marginal contribution to the risk of an optimal portfolio. Hence, there is a direct relationship between the risk measure $b_p$ and the expected return $\bar{Z}_p$. Now define the conditional expected return
$$G_p(Z_e) \equiv E[Z_p \mid Z_e],$$
where $Z_e$ is the return on an efficient portfolio.
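Theorem 2.5. If $p$ is riskier than $q$, then $\bar{Z}_p > \bar{Z}_q$.
Let $p$ and $q$ both be feasible portfolios and $Z_e$ the return on an efficient portfolio. If
$$\frac{dG_p(Z_e)}{dZ_e} \geq \frac{dG_q(Z_e)}{dZ_e},$$
then (i) $p$ is riskier than $q$, and (ii) $\bar{Z}_p > \bar{Z}_q$.
Proof.
$$Y(Z_e) = \frac{V' - E(V')}{\mathrm{Cov}(V', Z_e)}, \quad \text{and} \quad V' = \frac{dV(Z_e)}{dZ_e},$$
$$b_p = \mathrm{Cov}(Y(Z_e), Z_p),$$
$$b_p - b_q = \mathrm{Cov}(Y(Z_e), Z_p - Z_q) = \mathrm{Cov}[Y(Z_e), G_p(Z_e) - G_q(Z_e)],$$
where the last step conditions on $Z_e$. If $[G_p(Z_e) - G_q(Z_e)]$ is a non-decreasing (and non-constant) function of $Z_e$, then, because $Y(Z_e)$ is an increasing function of $Z_e$, $b_p - b_q > 0$ and $b_p > b_q$. From Theorem 2.4, this implies $\bar{Z}_p > \bar{Z}_q$.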

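Theorem 2.6. Risk premium $a$ for $b_p = b_q + a$.

If
$$\frac{dG_p(Z_e)}{dZ_e} - \frac{dG_q(Z_e)}{dZ_e} = a, \tag{3.9}$$
then
$$b_p - b_q = a,$$
$$\bar{Z}_p = \bar{Z}_q + a(\bar{Z}_e - R).$$
Proof. From (3.9),
$$G_p(Z_e) - G_q(Z_e) = aZ_e + h,$$
where $h$ is a constant. Then
$$b_p - b_q = \mathrm{Cov}(Y(Z_e), G_p(Z_e) - G_q(Z_e)) = \mathrm{Cov}(Y(Z_e), aZ_e + h).$$
Since
$$\mathrm{Cov}(Y(Z_e), Z_e) = 1, \quad \text{and} \quad \mathrm{Cov}(Y(Z_e), h) = 0,$$
we have
$$b_p - b_q = a, \qquad b_p = a + b_q,$$
and
$$\begin{aligned}
\bar{Z}_p &= R + b_p(\bar{Z}_e - R) \\
&= R + b_q(\bar{Z}_e - R) + a(\bar{Z}_e - R) \\
&= \bar{Z}_q + a(\bar{Z}_e - R).
\end{aligned}$$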
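Theorem 2.7. Relative change in conditional return.

Let
$$\beta_p = \frac{dG_p(Z_e)}{dZ_e}.$$
For all possible efficient portfolios $Z_e$:
(i) $\beta_p > 1$ then $\bar{Z}_p > \bar{Z}_e$;
(ii) $0 < \beta_p < 1$ then $R < \bar{Z}_p < \bar{Z}_e$;
(iii) $\beta_p < 0$ then $\bar{Z}_p < R$;
(iv) $\beta_p = a$ then $\bar{Z}_p = R + a(\bar{Z}_e - R)$.
Proof. If $q = e$, $Z_q = Z_e$,
$$\frac{dG_q(Z_e)}{dZ_e} = \frac{dG_e(Z_e)}{dZ_e} = 1.$$
If $Z_q = R$,
$$\frac{dG_q(Z_e)}{dZ_e} = \frac{dG_R(Z_e)}{dZ_e} = 0.$$
The four cases then follow by applying Theorems 2.5 and 2.6 with these two choices of $q$.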
3.3.3 Discussion
From Theorem 2.4, if we have $(\bar{Z}_e^K - R)$ as the risk premium, then the excess return $(\bar{Z}_p - R)$ is proportional to Merton’s risk measure $b_p^K$; the larger the $b_p^K$, the larger the expected return $\bar{Z}_p$. Merton gives three reasons why $b_p^K$ is a better risk measure, as follows:
(i) The expected excess return on portfolio $p$, $\bar{Z}_p - R$, is directly proportional to its risk $b_p^K$ (because $\bar{Z}_e^K > R$). Rothschild and Stiglitz compare the riskiness of portfolios but tell us nothing about the premium for risk.
(ii) Consider an investor with utility function $U$ and objective function
$$\max_{w} E[U(wZ_j + (1 - w)Z)],$$
i.e. the investor holds portfolio $Z$ and decides if he should switch investment to security $j$. Setting $L = E[\cdots]$, the f.o.c. leads to
$$\frac{\partial L}{\partial w} = E\{U'[wZ_j + (1 - w)Z](Z_j - Z)\} = 0,$$
which is solved for $w^*$. (Note that $Z$ might already contain some $Z_j$, so we can interpret $w^*$ as excess demand.)
If the original portfolio $Z$ is optimal, then $w^* = 0$ and $Z = Z^*$; the optimal portfolio must be an efficient portfolio. Then we can invoke Theorem 2.4,
$$\bar{Z}_j - R = b_j^*(\bar{Z}^* - R), \tag{3.10}$$
where $b_j^*$ measures the contribution of security $j$ to the optimal portfolio.
(a) If $\bar{Z}_j - R$ equals $b_j^*(\bar{Z}^* - R)$, the investor is indifferent to a marginal change in the holding of $j$.
(b) If $\bar{Z}_j - R > b_j^*(\bar{Z}^* - R)$, the investor will increase his holding in $j$, and vice versa.
(c) As risk $b_j^*$ increases, $\bar{Z}_j - R$ must increase accordingly if the portfolio holding is to remain unchanged.
As Merton notes, Eq. (3.10) is like the security market line representing the excess demand function and personal portfolio equilibrium.
(iii) Ordering of j by their “systematic risk” relative to a given effi-
cient portfolio is identical to the ordering relative to any other
efficient portfolio. That is “security j is riskier than security i”
is unambiguous. (This is proved in Property 4.)
It is clear that Merton’s risk measure closely resembles what we now know as the CAPM beta. However, the systematic risk measures used in the two models are completely different in terms of their definitions and interpretations. The CAPM beta is defined as $\beta_p = \mathrm{Cov}(Z_p, Z_M)/\mathrm{Var}(Z_M)$. On the other hand, Merton’s risk measure is defined in terms of marginal utilities. It can be interpreted as measuring how the feasible portfolio (or security) return covaries with the marginal utility derived from a change in the efficient portfolio’s return, relative to the covariance between the efficient portfolio’s return and the marginal utility derived from a change in it. Both covariance terms, in the numerator and the denominator, of the Merton risk measure are negative because higher portfolio returns are associated with lower levels of marginal utility and, as a result, $b_p^K$ will always be positive, whereas the CAPM definition theoretically allows for negative $\beta_p$.
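The resemblance to the CAPM beta can be made concrete under one extra assumption that is introduced here purely for illustration: suppose $V_K$ is quadratic, so that marginal utility is linear in the efficient portfolio’s return, $V_K'(Z) = a - cZ$ with $c > 0$ over the relevant range (the constants $a$ and $c$ are hypothetical). Substituting into (3.7) and the definition of $b_p^K$ gives:

```latex
\begin{aligned}
V_K'(Z_e^K) &= a - c\,Z_e^K, \qquad c > 0,\\
\mathrm{Cov}(V_K', Z_e^K) &= -c\,\mathrm{Var}(Z_e^K),\\
Y_K &= \frac{V_K' - E[V_K']}{\mathrm{Cov}(V_K', Z_e^K)}
     = \frac{-c\,(Z_e^K - \bar{Z}_e^K)}{-c\,\mathrm{Var}(Z_e^K)}
     = \frac{Z_e^K - \bar{Z}_e^K}{\mathrm{Var}(Z_e^K)},\\
b_p^K &= \mathrm{Cov}(Y_K, Z_p)
       = \frac{\mathrm{Cov}(Z_e^K, Z_p)}{\mathrm{Var}(Z_e^K)}.
\end{aligned}
```

Under this assumption $b_p^K$ has the same form as the CAPM beta with $Z_e^K$ in place of $Z_M$, which is consistent with the remark in Sec. 3.2.1 that Merton’s measure works when the third and higher-order derivatives of the utility function are zero.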
One main weakness of Merton’s risk measure lies in the fact that
it depends on marginal utilities and the efficient portfolio is defined
with respect to a particular utility function. As such, Merton’s risk
measure will be different for different investors with different utility
functions and different efficient portfolios. So, it is not possible to
compare which portfolio is more risky for different investors. That is,
Merton’s risk measure is not necessarily unique. In this regard, we
often assume there is a unique CAPM market portfolio, which also
may not be true in practice.
Merton’s systematic risk and RS’s definition of increasing risk can produce different outcomes. For example, if security $j$ is independent of the return on efficient portfolio $K$, then $b_j^K = 0$. Therefore, by the $b_j^K$ measure, security $j$ and $R$ are both riskless and $\bar{Z}_j = R$. However, if $\sigma_j > 0$, then by the RS measure, security $j$ is riskier than $R$.
Merton argued that the two risk measures are complementary. The RS definition measures the total risk of a security by comparing the expected utility from holding a security (or portfolio) alone. It is the appropriate definition for identifying optimal portfolios and determining the efficient frontier. It is not useful for defining the risk of a security that is part of a portfolio. Merton’s $b_j^K$ measures only the part of the security’s risk that contributes to the total risk of the investor’s optimal portfolio, i.e. it measures the systematic risk with respect to the efficient portfolio $K$. But to determine $b_j^K$, the efficient set must first be determined, and the RS measure does just that. Hence, the two measures are complementary.
Exercises: Capital Market Theory, Risk Measures

1. Merton argues that Theorem 2.3 and Corollary 2.3 together imply that the existence of lotteries is inconsistent with risk aversion.
(a) Explain.
(b) Use utility theory to explain why people buy lottery tickets. [Hint: you may like to refer to Friedman and Savage (1948).]
2. Relate $Y_K$ to the CAPM market portfolio and $b_p^K$ to the CAPM beta $\beta$. Compare and contrast the necessary and sufficient conditions that lead to $b_p^K$ and $\beta$ as measures for risk premium.
3. How should $Y_K$ and Merton’s risk measure $b_p^K$ be defined when the distribution of $Z_e^K$ has non-trivial higher moments?
Chapter 4
Consumption and Portfolio Selection
This chapter follows closely the materials in Merton (1990,

Chapter 4). Section 4.1 first presents a simple single-period portfolio
selection problem and shows how the solution is simplified when there
is a risk free rate for lending and borrowing. The context is then
extended to multiperiod where the focus is on the asset allocation
decision between risk free and risky assets and all individuals have
finite life span. In this case, the asset allocation decision has to be
solved via dynamic programming based on the Hamilton–Jacobi–
Bellman equation. If we relax the assumption to infinite horizon, for
example by taking the position of a pension fund that never liqui-
dates, then analytical solutions are possible. It is under this special
setting that we solve the optimal asset allocation decision and obtain
the optimal portfolio for individuals with three different types of util-
ity functions, viz. constant relative risk aversion (CRRA), constant
absolute risk aversion (CARA) and hyperbolic absolute risk aver-
sion (HARA). In this chapter, we omit the discussion on the bequest
valuation function.
4.1 Basic Set-up

Classical finance theory concerns the roles of firms, financial inter-
mediaries and capital markets in the efficient allocation of resources
across time and under uncertainty. Very often, households, their
tastes and their endowments are exogenous to the theory. On the
other hand, firms and financial organizations are endogenous to
the theory on the assumption that their existence is solely for the
function they serve. Similarly, the capital markets exist to provide
households with risk pooling and risk sharing opportunities and to
facilitate the efficient allocation of resources. In the analyses in the
subsequent sections, the following assumptions are made:
(i) Frictionless markets, i.e. no transaction costs or taxes, all secu-
rities are perfectly divisible.
(ii) All individuals are price takers.
(iii) There is no arbitrage opportunity and the market is in equilibrium.
The return (per dollar) on all riskless assets is R = ln(1 + r).
(iv) There are no institutional restrictions; short-sales are possible,
borrowing rate equals lending rate.
First, we start with discrete time and the budget equation below
\[ W_t = \sum_{i=1}^{m} w_{i,t_0} \frac{X_{i,t}}{X_{i,t_0}} \left( W_{t_0} - C_{t_0} h \right) \quad \text{s.t.} \quad \sum_{i=1}^{m} w_{i,t_0} \equiv 1, \]
where $W_t$ is the wealth at t, $X_{i,t}$ is a stochastic variable representing
the price of the ith asset at t, i = 1, . . . , m, $C_t$ is a decision variable
for the investor representing the consumption per unit time at t,
$w_{i,t}$, the portfolio weight invested in asset i, is the investment
decision that the investor has to make, and h is the time interval such
that $t \equiv t_0 + h$.
Assume that $X_{i,t}$ is a geometric Brownian motion (footnote 1) with constant
mean, $\alpha_i$, and variance, $\sigma_i^2$,
\[ dX_{i,t} = \alpha_i X_{i,t}\,dt + \sigma_i X_{i,t}\,dz_{it}. \]
Since wealth is being invested in m assets, through aggregation,
\[ dW_t = \left[ \sum_{i=1}^{m} w_{i,t}\alpha_i W_t - C_t \right] dt + \sum_{i=1}^{m} w_{i,t}\sigma_i W_t\,dz_{it}, \]
where the $dz_{it}$ may be correlated.
Footnote 1: As one would appreciate later, the geometric Brownian motion (GBM)
assumption is crucial to deriving the results in the following sections. This is
in fact a key assumption made throughout Merton's (1990) book.
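The aggregated wealth dynamics can be illustrated numerically. The following is a minimal sketch, not part of the original text, that discretises the wealth equation with an Euler-Maruyama step. The function name `simulate_wealth`, the constant consumption rate, and the assumption that the Brownian increments are independent across assets are all illustrative choices (the text allows correlated increments).

```python
import math
import random


def simulate_wealth(w0, weights, alphas, sigmas, consumption, horizon, steps, seed=0):
    """Euler-Maruyama path of
    dW = (sum_i w_i alpha_i W - C) dt + sum_i w_i sigma_i W dz_i.

    Assumptions (illustrative only): constant weights, a constant
    consumption rate C, and independent Brownian increments dz_i.
    """
    if abs(sum(weights) - 1.0) > 1e-12:
        raise ValueError("portfolio weights must sum to one")
    rng = random.Random(seed)
    dt = horizon / steps
    wealth = w0
    path = [wealth]
    for _ in range(steps):
        # Drift: portfolio expected return on wealth minus consumption.
        drift = sum(w * a for w, a in zip(weights, alphas)) * wealth - consumption
        # Diffusion: one independent N(0, dt) shock per asset.
        shock = sum(
            w * s * rng.gauss(0.0, math.sqrt(dt)) for w, s in zip(weights, sigmas)
        )
        wealth += drift * dt + shock * wealth
        path.append(wealth)
    return path
```

With all volatilities set to zero the path reduces to deterministic compounding at the weighted mean return, which gives a simple sanity check of the drift term.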
4.2 One Risky and One Risk-Free Asset

Assume now that there are only two assets, one of which is a risk-free
investment with the following price dynamic:
dX = rXdt.
Let w be the weight of the risky asset and (1 − w) be the weight of
the riskless asset. Then
dWt = {[wα + (1 − w) r] Wt − Ct } dt + wσWt dzt . (4.1)
Let U(W) denote the von Neumann–Morgenstern utility function (footnote 2)
of the end-of-period wealth. Throughout the analysis here, we
assume that U is an increasing, strictly concave function and that U
is twice differentiable. With this special two-asset portfolio, the
lifetime objective function of the individual is
\[ \max_{C,w} E\left[ \int_0^T e^{-\rho t} U[C_t]\,dt + B[W_T, T] \right], \tag{4.2} \]
where ρ is the investor's impatience factor (the rate of time preference),
and B[W_T, T] is the bequest valuation function at the time of death T,
usually a concave function of W_T with diminishing marginal utility as
wealth increases. Equation (4.2) aims to maximise the utility from
lifetime consumption and the bequest at death.
4.2.1 The Bellman equation

The life time objective function in (4.2) is restated in a dynamic
programming form so that the Bellman principle of optimality can
be applied. Next, define the conditional objective function
\[ I[W_t, t] \equiv \max_{C(s),w(s)} E_t\left[ \int_t^T e^{-\rho s} U[C_s]\,ds + B[W_T, T] \right] \tag{4.3} \]
with terminal condition at T given by I[W_T, T] = B[W_T, T].
Footnote 2: von Neumann–Morgenstern utility is the foundation of all expected
utility theories. Examples of non-expected utility theories include Allais
paradox, framing and behavioural finance.
In general, at time $t_0$, Eq. (4.3) is written as
\[ I[W_{t_0}, t_0] = \max_{C(s),w(s)} E_{t_0}\left[ \int_{t_0}^{t} e^{-\rho s} U[C_s]\,ds + I[W_t, t] \right]. \tag{4.4} \]
In particular, (4.4) shows that an optimum decision for $I[W_{t_0}, t_0]$
must also satisfy the constraint that $I[W_t, t]$ is optimum for all $t > t_0$.
Solving the Bellman equation is in general a complex dynamic control
problem that is evaluated iteratively, working backwards from the last period.
Here, we exploit the time homogeneity of the problem and simplify
the problem as follows.
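First, apply Taylor series expansion at $t \equiv t_0 + h$ and write
$I[W_{t_0}, t_0]$ as $I_{t_0}$. Then, Eq. (4.4) can be written as
\[ I[W_{t_0}, t_0] = \max_{C,w} E_{t_0}\left\{ e^{-\rho\tau} U[C_\tau]\,h + I_{t_0} + \frac{\partial I_{t_0}}{\partial t} h + \frac{\partial I_{t_0}}{\partial W}\left[W_t - W_{t_0}\right] + \frac{1}{2}\frac{\partial^2 I_{t_0}}{\partial W^2}\left[W_t - W_{t_0}\right]^2 + o(h) \right\}. \tag{4.5} \]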
The cumulative consumption utility $\int_{t_0}^{t} e^{-\rho s} U[C_s]\,ds$ is approximated,
by the mean value theorem for integrals (footnote 3), as $e^{-\rho\tau} U[C_\tau]h$ under
expectation with $\tau \in [t_0, t]$. In the continuous time method, $h \equiv dt$. As $h \to 0$,
we can write τ as t.
Next, take the expectation $E_{t_0}$ of each term on the RHS of (4.5) and
note that
\[ E_{t_0}[I_{t_0}] = E_{t_0}\{I[W_{t_0}, t_0]\} = I[W_{t_0}, t_0], \]
which cancels with the LHS of (4.5).
Footnote 3: The simplest form of the mean value theorem for integrals states that
if G is a continuous function, then there exists a number $x \in (a, b)$ such that
$\int_a^b G(t)\,dt = G(x)(b - a)$.
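$E_{t_0}[W_t - W_{t_0}]$ and $E_{t_0}[W_t - W_{t_0}]^2$ are, to first order in h, the drift
and the variance rates of (4.1) multiplied by h. Dividing by h and letting
$h \to 0$, so that the o(h) term drops out, we get (footnote 4)
\[ 0 = \max_{C,w}\left\{ e^{-\rho t} U[C_t] + \frac{\partial I_t}{\partial t} + \frac{\partial I_t}{\partial W}\left[(w_t(\alpha - r) + r) W_t - C_t\right] + \frac{1}{2}\frac{\partial^2 I_t}{\partial W^2} w_t^2 \sigma^2 W_t^2 \right\}, \tag{4.6} \]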
where It is short for I[Wt , t]. The subscript t0 is replaced by t because
(4.6) holds for any t. This is an important step in the solution as we
have just reduced the dynamic control problem in (4.4) into a single-
state partial differential equation (PDE) problem.
The optimal solution is obtained when
\[ \left\{ \begin{array}{l} \phi = 0, \\ \phi_C = 0, \\ \phi_w = 0, \\ I[W_T, T] = B[W_T, T] \end{array} \right\}. \]
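If we define the differential operator
\[ \phi \equiv e^{-\rho t} U[C_t] + \frac{\partial I_t}{\partial t} + \frac{\partial I_t}{\partial W}\left[(w_t(\alpha - r) + r) W_t - C_t\right] + \frac{1}{2}\frac{\partial^2 I_t}{\partial W^2} w_t^2 \sigma^2 W_t^2, \tag{4.7} \]
then (4.6) can be written as
\[ \max_{C,w} \phi(w, C; W; t) = 0 \]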
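and the partial derivatives of φ, together with their first-order
conditions (f.o.c.s), are used to solve for $w^*$ and $C^*$:
\[ \phi_C = \frac{\partial \phi}{\partial C} = 0, \]
\[ e^{-\rho t} U'[C_t^*] - \frac{\partial I_t}{\partial W} = 0, \]
\[ C_t^* = U_C^{-1}\left( e^{\rho t} \frac{\partial I_t}{\partial W} \right), \tag{4.8} \]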
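Footnote 4: In the multi-asset case, we just need to replace the scalar $w_t$ by the
vector of weights and $w_t^2\sigma^2$ by the quadratic form $w_t' \Omega w_t$, where Ω is the
variance–covariance matrix, and the proof will carry through.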
where $U_C$ is the partial derivative of U with respect to C, and $U_C^{-1}$
is the inverse function of $U_C$. Similarly,
\[ \phi_w = \frac{\partial \phi}{\partial w} = 0, \]
\[ \frac{\partial I_t}{\partial W}(\alpha - r) W_t + \frac{\partial^2 I_t}{\partial W^2} w_t^* \sigma^2 W_t^2 = 0, \]
\[ w_t^* = -\frac{I_W (\alpha - r)}{W_t I_{WW} \sigma^2}, \tag{4.9} \]
where the subscripts of φ and I denote the differential variable.
Equations (4.8) and (4.9) must be satisfied in all solutions.
The second-order condition requires that
\[ \phi_{CC} < 0, \quad \phi_{ww} < 0, \quad \begin{vmatrix} \phi_{CC} & \phi_{Cw} \\ \phi_{wC} & \phi_{ww} \end{vmatrix} > 0. \]
Since $\phi_{Cw} = \phi_{wC} = 0$, and given that U is strictly concave,
$\phi_{CC} = e^{-\rho t} U''(C) < 0$ and $\phi_{ww} = \sigma^2 W^2 I_{WW} < 0$ by strict concavity of $I_t$.
Hence, the second-order condition is satisfied.
4.2.2 Infinite time horizon

The problem presented in the previous section is easier to solve if
T → ∞, as B[W_∞, ∞] drops out and I[W_t, t] becomes independent
of explicit time and is a function of W only. The objective function in
Eq. (4.6) is then reduced to an ordinary differential equation in W.
Next define
\[ \nu \equiv s - t, \quad s = \nu + t, \quad ds = d\nu, \quad s \in [t, \infty) \;\Rightarrow\; \nu \in [0, \infty). \]
First, let
\[ J[W_t, t] = e^{\rho t} I[W_t, t] = \max_{C,w} E_t\left[ \int_t^\infty e^{-\rho(s-t)} U[C]\,ds \right] = \max_{C,w} E_0\left[ \int_0^\infty e^{-\rho\nu} U[C]\,d\nu \right] = J[W_t], \]
and the partial derivatives are
\[ \frac{\partial I}{\partial W} = e^{-\rho t}\frac{\partial J}{\partial W}, \quad \frac{\partial^2 I}{\partial W^2} = e^{-\rho t}\frac{\partial^2 J}{\partial W^2}, \quad \frac{\partial I}{\partial t} = -\rho e^{-\rho t} J, \tag{4.10} \]
with $\partial J / \partial t = 0$. Since the terminal wealth $W_T$ is now not relevant, we
will write $W_t$ as W from now on without the risk of confusion.
Substitute the partial derivatives of I in (4.10) into (4.6). Cancelling
all the $e^{-\rho t}$ terms and dropping the time subscript t (for
presentation only) gives (footnote 5)
\[ 0 = \max_{C,w}\left\{ U[C] - \rho J + J'\left[(w(\alpha - r) + r) W - C\right] + \frac{1}{2} J'' \sigma^2 w^2 W^2 \right\}. \tag{4.11} \]
The PDE in (4.6) is thus reduced to the ordinary differential equation
(ODE) in (4.11): there is no derivative with respect to t, so
Eq. (4.11) no longer depends on time.
Finally, substitute (4.10) into (4.8) and (4.9) to give
\[ U'[C_t^*] = J', \tag{4.12} \]
\[ w_t^* = -\frac{(\alpha - r) J'}{\sigma^2 W_t J''}. \tag{4.13} \]
4.3 Constant Relative Risk Aversion

Footnote 5: From here onwards, we will write I(W) as I, $\partial I/\partial W$ as $I'$,
$\partial^2 I/\partial W^2$ as $I''$, J(W) as J, $\partial J/\partial W$ as $J'$, and $\partial^2 J/\partial W^2$ as $J''$.

In this section, we continue to assume that time horizon is infinite
and use all the intermediate results in Sec. 4.2.2. Suppose we have a
power utility function as follows (footnote 6):
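\[ U(C) = \frac{1}{\gamma} C^{\gamma} \tag{4.14} \]
for γ < 1 and γ ≠ 0. When γ = 0, we have U(C) = log C. The derivatives are
\[ U'(C) = \frac{1}{\gamma}\gamma C^{\gamma-1} = C^{\gamma-1}, \tag{4.15} \]
\[ U''(C) = (\gamma - 1) C^{\gamma-2}, \]
and the relative risk aversion (RRA) measure is (footnote 7)
\[ RRA = -\frac{U''}{U'} C = -\frac{(\gamma - 1) C^{\gamma-2}}{C^{\gamma-1}} C = -(\gamma - 1) = 1 - \gamma = \delta, \]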
which is a constant, in accordance with the definition of CRRA. Investors
who are CRRA (constant RRA) will invest a fixed proportion of wealth
in risky asset(s). In other words, the absolute amount of money
invested in risky assets increases as the investor becomes wealthier.
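To solve (4.11), first substitute $U'$ in (4.15) into (4.12) to give
\[ C^* = (J')^{\frac{1}{\gamma-1}}, \tag{4.16} \]
which we use to replace all the terms involving C in (4.11) to give
\[ U(C^*) - J' C^* = \frac{1}{\gamma}(J')^{\frac{\gamma}{\gamma-1}} - J' (J')^{\frac{1}{\gamma-1}} = \left(\frac{1}{\gamma} - 1\right)(J')^{\frac{\gamma}{\gamma-1}} = \frac{1-\gamma}{\gamma}(J')^{\frac{\gamma}{\gamma-1}}. \]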
Footnote 6: Note that in Merton (1990), $U(C) = \frac{1}{\gamma}(C^{\gamma} - 1)$. This specification
of the utility function will not lead to the required result. This is corrected
in the subsequent reprint of the book.
Footnote 7: From here onwards, we will write U(C) as U, $U'(C)$ as $U'$, and $U''(C)$ as $U''$.
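Substitute this result and the solution for $w^*$ in (4.13) into (4.11).
Evaluating it at the optimum $(C^*, w^*)$, we have
\[ 0 = \frac{1-\gamma}{\gamma}(J')^{\frac{\gamma}{\gamma-1}} - \rho J + J'\left[ -\frac{(\alpha - r)^2}{\sigma^2 W}\frac{J'}{J''} W + rW \right] + \frac{1}{2} J'' \sigma^2 \frac{(\alpha - r)^2}{\sigma^2 \sigma^2 W^2}\left(\frac{J'}{J''}\right)^2 W^2, \]
\[ 0 = \frac{1-\gamma}{\gamma}(J')^{\frac{\gamma}{\gamma-1}} - \rho J - \frac{(\alpha - r)^2}{2\sigma^2}\frac{[J']^2}{J''} + rWJ'. \tag{4.17} \]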
In Sec. 4.3.1, we need to solve the functional form of J in terms
of W and then use it to solve for $C^*$ and $w^*$.
4.3.1 Solution for J

From (4.17), $J''$ appears in the denominator and hence $J'' \neq 0$. Moreover,
the second and fourth terms of (4.17) suggest that J and $WJ'$
have the same order in W. So, it is reasonable to assume that
\[ J = AW^B, \quad J' = ABW^{B-1}, \quad J'' = AB(B-1)W^{B-2}, \]
where A and B are some appropriate constants. Then, Eq. (4.17)
becomes
\[ 0 = \frac{1-\gamma}{\gamma}\left(ABW^{B-1}\right)^{\frac{\gamma}{\gamma-1}} - \rho AW^B - \frac{(\alpha - r)^2}{2\sigma^2}\frac{\left(ABW^{B-1}\right)^2}{AB(B-1)W^{B-2}} + rW\,ABW^{B-1}. \tag{4.18} \]
On the other hand, the first and second terms of (4.17) suggest that
$(J')^{\gamma/(\gamma-1)}$ and J also have the same order in W. Hence, we have, by equating
the power of W,
\[ (B - 1)\frac{\gamma}{\gamma - 1} = B, \]
\[ B = \gamma. \]
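So (4.18) becomes
\[ 0 = \frac{1-\gamma}{\gamma}(A\gamma)^{\frac{\gamma}{\gamma-1}} W^{\gamma} - \rho AW^{\gamma} - \frac{(\alpha - r)^2 A\gamma W^{\gamma}}{2\sigma^2(\gamma - 1)} + rA\gamma W^{\gamma}. \]
Dropping $W^{\gamma}$ from all terms and simplifying the last three terms, we
have
\[ 0 = \frac{1-\gamma}{\gamma}(A\gamma)^{\frac{\gamma}{\gamma-1}} - A\left[ \rho + \frac{(\alpha - r)^2\gamma}{2\sigma^2(\gamma - 1)} - r\gamma \right] = \frac{1-\gamma}{\gamma}(A\gamma)^{\frac{\gamma}{\gamma-1}} - A\mu, \]
where $\mu = \rho + \frac{(\alpha - r)^2\gamma}{2\sigma^2(\gamma - 1)} - r\gamma$. Solving for A, we have
\[ \frac{1-\gamma}{\gamma}(A\gamma)^{\frac{\gamma}{\gamma-1}} = A\mu, \]
\[ A^{\frac{\gamma}{\gamma-1} - 1}\,\gamma^{\frac{\gamma}{\gamma-1}} = \frac{\gamma}{1-\gamma}\mu, \]
\[ A^{\frac{1}{\gamma-1}} = \gamma^{\frac{\gamma}{1-\gamma} + 1}\,\frac{1}{1-\gamma}\mu, \]
\[ A = \left[ \gamma^{\frac{1}{1-\gamma}}\frac{\mu}{1-\gamma} \right]^{\gamma-1} = \frac{1}{\gamma}\left(\frac{\mu}{1-\gamma}\right)^{\gamma-1} = \frac{b}{\gamma}, \]
for $b = \left(\frac{\mu}{1-\gamma}\right)^{\gamma-1}$. Hence the solution for J is
\[ J = \frac{b}{\gamma} W^{\gamma}, \]
\[ J' = \frac{b}{\gamma}\gamma W^{\gamma-1} = bW^{\gamma-1}, \]
\[ J'' = b(\gamma - 1) W^{\gamma-2}. \]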
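Substituting J, $J'$ and $J''$ back into (4.17), we get
\[ 0 = \frac{1-\gamma}{\gamma}\left(bW^{\gamma-1}\right)^{\frac{\gamma}{\gamma-1}} - \rho\frac{b}{\gamma}W^{\gamma} - \frac{(\alpha - r)^2}{2\sigma^2}\frac{b^2 W^{2\gamma-2}}{b(\gamma - 1)W^{\gamma-2}} + rW\,bW^{\gamma-1} \]
\[ = \frac{1-\gamma}{\gamma} b^{\frac{\gamma}{\gamma-1}} W^{\gamma} - \rho\frac{b}{\gamma}W^{\gamma} - \frac{(\alpha - r)^2}{2\sigma^2}\frac{bW^{\gamma}}{\gamma - 1} + rbW^{\gamma}, \]
and, after dividing through by $bW^{\gamma}$,
\[ 0 = \frac{1-\gamma}{\gamma} b^{\frac{1}{\gamma-1}} - \frac{\rho}{\gamma} - \frac{(\alpha - r)^2}{2\sigma^2}\frac{1}{\gamma - 1} + r. \]
With rearrangement,
\[ \frac{1-\gamma}{\gamma} b^{\frac{1}{\gamma-1}} = \frac{\rho}{\gamma} - \frac{(\alpha - r)^2}{2\sigma^2}\frac{1}{1-\gamma} - r, \]
\[ b^{\frac{1}{\gamma-1}} = \frac{\rho}{1-\gamma} - \frac{\gamma(\alpha - r)^2}{2\sigma^2(1-\gamma)^2} - \frac{r\gamma}{1-\gamma}. \tag{4.19} \]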
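The claim that the power-form J solves (4.17) can be checked numerically. The sketch below, not part of the original text, plugs the closed-form J, its first derivative and its second derivative into the right-hand side of (4.17) and returns the residual, which should vanish up to rounding. The function names and the parameter values used for checking are illustrative assumptions; the code also assumes μ > 0 so that b is well defined.

```python
def crra_value_function(gamma, rho, r, alpha, sigma):
    """Return (J, J', J'') as functions of W for the trial solution J = (b/gamma) W^gamma."""
    mu = rho + (alpha - r) ** 2 * gamma / (2 * sigma ** 2 * (gamma - 1)) - r * gamma
    if mu <= 0:
        raise ValueError("mu must be positive for b to be well defined")
    b = (mu / (1 - gamma)) ** (gamma - 1)
    value = lambda W: b / gamma * W ** gamma
    first = lambda W: b * W ** (gamma - 1)
    second = lambda W: b * (gamma - 1) * W ** (gamma - 2)
    return value, first, second


def hjb_residual_crra(W, gamma, rho, r, alpha, sigma):
    """Right-hand side of (4.17) evaluated at the trial solution."""
    value, first, second = crra_value_function(gamma, rho, r, alpha, sigma)
    j, j1, j2 = value(W), first(W), second(W)
    return (
        (1 - gamma) / gamma * j1 ** (gamma / (gamma - 1))
        - rho * j
        - (alpha - r) ** 2 * j1 ** 2 / (2 * sigma ** 2 * j2)
        + r * W * j1
    )
```

A residual that is zero at several wealth levels, for both positive and negative γ, is consistent with the derivation above; it is a spot check, not a proof.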
4.3.2 Solution for C and w

The solution for J is only a stepping stone in finding the optimal
consumption and investment weights. To complete the task, substitute
J, $J'$ and $J''$ into (4.16) and (4.13) to give
\[ C_\infty^* = \left(bW^{\gamma-1}\right)^{\frac{1}{\gamma-1}} = b^{\frac{1}{\gamma-1}} W, \tag{4.20} \]
\[ w_\infty^* = -\frac{\alpha - r}{\sigma^2 W}\frac{bW^{\gamma-1}}{b(\gamma - 1)W^{\gamma-2}} = \frac{\alpha - r}{\sigma^2(1-\gamma)} = \frac{\alpha - r}{\sigma^2\delta}. \tag{4.21} \]
Given that 0 ≤ γ < 1, 0 < δ ≤ 1. As the constant relative risk
aversion δ increases, $w^*$ decreases. Substitute the result in (4.19)
into (4.20) and get
\[ C_{\infty,t}^* = \left[ \frac{\rho}{1-\gamma} - \gamma\left( \frac{(\alpha - r)^2}{2\sigma^2(1-\gamma)^2} + \frac{r}{1-\gamma} \right) \right] W = \frac{(\rho - \gamma v) W}{1-\gamma}, \tag{4.22} \]
where $v = r + \frac{(\alpha - r)^2}{2(1-\gamma)\sigma^2}$. The solution suggests that the investor
will consume more as she becomes more impatient (higher ρ) and
consume less if investment returns ((α − r) and r) are larger (footnote 8). For
positive consumption, there is an upper bound for γ.
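The decision rules (4.21) and (4.22) are simple enough to evaluate directly. The following sketch, not part of the original text, returns the risky weight and the consumption-to-wealth ratio for given parameters; the function name `crra_policy` and the sample parameter values are illustrative assumptions.

```python
def crra_policy(gamma, rho, r, alpha, sigma):
    """Infinite-horizon CRRA rules: risky weight from (4.21) and C*/W from (4.22)."""
    if gamma >= 1:
        raise ValueError("gamma must be below one")
    weight = (alpha - r) / ((1 - gamma) * sigma ** 2)
    v = r + (alpha - r) ** 2 / (2 * (1 - gamma) * sigma ** 2)
    consumption_ratio = (rho - gamma * v) / (1 - gamma)
    if consumption_ratio <= 0:
        # Corresponds to the upper bound on gamma needed for positive consumption.
        raise ValueError("parameters imply non-positive consumption")
    return weight, consumption_ratio
```

For example, with γ = 0.5, ρ = 0.1, r = 0.02, α = 0.08 and σ = 0.2, the weight is 0.06 / (0.5 × 0.04) = 3 and v = 0.11, so C*/W = (0.1 − 0.055) / 0.5 = 0.09. Setting γ = 0 gives C*/W = ρ, the log-utility rule.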
4.3.3 Economic interpretation

Equations (4.22) and (4.21) give us the solution for optimal con-
sumption and portfolio decision rules when time horizon is infinite
and investor’s utility function is CRRA. It is very important to note
that, in the case of CRRA, the portfolio-selection decision, wt , is
independent of the consumption decision, Ct . Moreover, the require-
ment of CRRA implies that investor’s attitude towards financial risk
(i.e. relationship between α and σ) is independent of her wealth level.
Moreover, given that (α, r and σ 2 ) are constant, asset price change
and the resulting wealth level change has no impact on portfolio deci-
sion wt . On the other hand, as the constant relative risk aversion δ
increases, she will put less proportion of her wealth into risky asset.
This fits in well with the description of a CRRA investor.
With log utility, where γ = 0 and the relative risk aversion is δ = 1,
the separation of investment and consumption decisions now goes
both ways:
\[ C_\infty^*(t) = \rho W_t, \]
\[ w_\infty^*(t) = \frac{\alpha - r}{\sigma^2}. \]
Now $w_\infty^*(t)$, independent of consumption, is related to the Sharpe
ratio of the asset. Consumption is independent of w and the financial
parameters (α and σ) and is a linear function of wealth (W).
In other words, total investment in (risky) assets is also a linear
function of wealth, everything else equal.
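Footnote 8: In the multi-asset case, with α the vector of expected returns, 1 a
vector of ones and Ω the variance–covariance matrix, we have
\[ C_{\infty,t}^* = \left[ \frac{\rho}{1-\gamma} - \gamma\left( \frac{(\alpha - r1)'\,\Omega^{-1}(\alpha - r1)}{2(1-\gamma)^2} + \frac{r}{1-\gamma} \right) \right] W, \]
\[ w_{\infty,t}^* = \frac{\Omega^{-1}(\alpha - r1)}{1-\gamma}. \]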
4.4 Constant Absolute Risk Aversion

An example of a CARA investor is one with exponential utility where
\[ U = -\frac{1}{\eta} e^{-\eta C}, \quad \eta > 0, \]
\[ U' = -\frac{1}{\eta} e^{-\eta C}(-\eta) = e^{-\eta C}, \]
\[ U'' = -\eta e^{-\eta C}, \]
with the measure for risk aversion (RA)
\[ RA = -\frac{U''}{U'} = -\frac{-\eta e^{-\eta C}}{e^{-\eta C}} = \eta. \]
As the name implies, RA itself is a constant. A CARA investor is known
to invest a fixed amount of money in the risky asset regardless of her
wealth level. Substituting $U' = e^{-\eta C}$ into Eq. (4.12), we have
\[ e^{-\eta C^*} = J', \]
\[ C^* = -\frac{1}{\eta}\ln J', \tag{4.23} \]
\[ U(C^*) - J' C^* = -\frac{J'}{\eta} + \frac{J'}{\eta}\ln J'. \tag{4.24} \]
Substituting $U(C^*) - J' C^*$ above and $w^*$ from (4.13) into (4.11), we
get
\[ 0 = -\frac{J'}{\eta} - \rho J + rJ'W + \frac{J'}{\eta}\ln J' - \frac{(\alpha - r)^2}{2\sigma^2}\frac{(J')^2}{J''}. \tag{4.25} \]
4.4.1 Solve for J

Now, the task is to find the functional form for J in Eq. (4.25) in
terms of W. Since $J''$ is in the denominator of the fifth term, it
suggests $J'' \neq 0$ and that J is twice differentiable. This rules out
a linear solution of the form J = A + BW. Given that there is only one
equation, it suggests there are at most two unknowns (say A and B).
For this class of financial models and knowing that cubic functions
are not allowed, one could try a solution of the form $J = Ae^{BW}$, which
gives rise to $J' = ABe^{BW}$ and $J'' = AB^2 e^{BW}$. Substituting these
results into (4.25), we have
\[ 0 = -\frac{ABe^{BW}}{\eta} - \rho Ae^{BW} + rABe^{BW}W + \frac{ABe^{BW}}{\eta}\ln\left(ABe^{BW}\right) - \frac{(\alpha - r)^2}{2\sigma^2}\frac{A^2B^2e^{2BW}}{AB^2e^{BW}} \]
\[ = -\frac{AB}{\eta} - \rho A + W\left( rAB + \frac{AB^2}{\eta} \right) + \frac{AB}{\eta}(\ln AB) - \frac{(\alpha - r)^2}{2\sigma^2}A \]
\[ = -\frac{B}{\eta} - \rho + W\left( rB + \frac{B^2}{\eta} \right) + \frac{B}{\eta}(\ln AB) - \frac{(\alpha - r)^2}{2\sigma^2}. \tag{4.26} \]
One important observation is that Eq. (4.26) holds for all
W. This suggests that the coefficient of W is
zero. Hence
\[ rB + \frac{B^2}{\eta} = 0, \]
\[ B = -\eta r = -q \quad \text{and} \quad \frac{B}{\eta} = -r. \]
Substituting $\frac{B}{\eta} = -r$ back into (4.26),
\[ r - \rho - r\ln AB - \frac{(\alpha - r)^2}{2\sigma^2} = 0, \]
\[ \ln AB = \frac{1}{r}\left[ r - \rho - \frac{(\alpha - r)^2}{2\sigma^2} \right], \]
\[ AB = \exp\left[ \frac{r - \rho - \frac{(\alpha - r)^2}{2\sigma^2}}{r} \right] = p, \]
\[ A = \frac{p}{B} = -\frac{p}{q}. \]
Now, we can write the suggested solution as
\[ J = Ae^{BW} = -\frac{p}{q}e^{-qW}, \]
\[ J' = pe^{-qW}, \]
\[ J'' = -pqe^{-qW}. \]
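As a numerical spot check on the exponential trial solution, the sketch below (not part of the original text) evaluates the right-hand side of (4.25) at the J, first derivative and second derivative just derived. The function names and the parameter values used for checking are illustrative assumptions.

```python
import math


def cara_value_function(eta, rho, r, alpha, sigma):
    """Return (J, J', J'') as functions of W for J = -(p/q) exp(-q W)."""
    q = eta * r
    p = math.exp((r - rho - (alpha - r) ** 2 / (2 * sigma ** 2)) / r)
    value = lambda W: -(p / q) * math.exp(-q * W)
    first = lambda W: p * math.exp(-q * W)
    second = lambda W: -p * q * math.exp(-q * W)
    return value, first, second


def hjb_residual_cara(W, eta, rho, r, alpha, sigma):
    """Right-hand side of (4.25) evaluated at the trial solution."""
    value, first, second = cara_value_function(eta, rho, r, alpha, sigma)
    j, j1, j2 = value(W), first(W), second(W)
    return (
        -j1 / eta
        - rho * j
        + r * j1 * W
        + j1 / eta * math.log(j1)
        - (alpha - r) ** 2 * j1 ** 2 / (2 * sigma ** 2 * j2)
    )
```

The residual should be zero up to rounding at any wealth level, since both the coefficient of W and the constant term in (4.26) were set to zero when solving for B and AB.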
4.4.2 Solve for C* and w*

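Given the functional form of J, we can now attempt to solve for $w^*$
and $C^*$. This is done by substituting J, $J'$, $J''$ and $q = \eta r$ into
Eqs. (4.13) and (4.23) to give
\[ w_\infty^*(t) = -\frac{(\alpha - r)}{\sigma^2 W}\frac{J'}{J''} = \frac{(\alpha - r)\,pe^{-qW}}{\sigma^2 W\,pqe^{-qW}} = \frac{(\alpha - r)}{\eta r\sigma^2 W}, \tag{4.27} \]
\[ C_\infty^*(t) = -\frac{1}{\eta}\ln J' = -\frac{1}{\eta}\ln\left(pe^{-qW}\right) = -\frac{1}{\eta}(\ln p - qW) = rW - \frac{r - \rho - \frac{(\alpha - r)^2}{2\sigma^2}}{\eta r}. \tag{4.28} \]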

In the CARA case, consumption is no longer a constant proportion
of wealth although it is still linear, of the form C = a + bW.
From (4.28), the intercept (consumption at W = 0) is
$\left[\rho - r + (\alpha - r)^2/(2\sigma^2)\right]/(\eta r)$, which is
larger the greater the excess return and the smaller the variance
of the risky asset. As wealth increases, consumption increases at the
rate of r. There is no restriction or cap on total consumption. This
model is less logical as consumption is nonzero even when there is
no wealth, whereas the CRRA model gives zero consumption when
there is no wealth.
The total amount invested in the risky asset is determined by
\[ w^* W = \frac{\alpha - r}{\eta r\sigma^2}, \]
where increases in risk aversion η and risk-free rate r have the effect of
reducing $w^*$. Note that the absolute amount invested in the risky asset
($w^* W$) is constant for different wealth levels; as W increases, $w^*$
decreases. This is in complete contrast to the CRRA model where
$w^*$ is constant and independent of W. This means that as wealth
increases, the CARA investor invests almost all his wealth in the risk-free
asset and consumes all his income, which corresponds to an old-age
pensioner who invests solely in fixed income and spends all the
(annuity) income till death.
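The CARA rules (4.27) and (4.28) can be tabulated in a few lines. This is a minimal sketch, not part of the original text; the function name `cara_policy` and the sample parameters are illustrative assumptions.

```python
def cara_policy(W, eta, rho, r, alpha, sigma):
    """Infinite-horizon CARA rules from (4.27) and (4.28).

    Returns (amount in risky asset, risky weight, consumption).
    """
    risky_amount = (alpha - r) / (eta * r * sigma ** 2)
    consumption = r * W - (r - rho - (alpha - r) ** 2 / (2 * sigma ** 2)) / (eta * r)
    return risky_amount, risky_amount / W, consumption
```

For example, with η = 2, ρ = 0.05, r = 0.02, α = 0.08 and σ = 0.2, the amount held in the risky asset is 0.06 / (2 × 0.02 × 0.04) = 37.5 at every wealth level, so the weight halves when wealth doubles, while consumption at W = 100 is 2 + 1.875 = 3.875.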
4.5 Hyperbolic Absolute Risk Aversion (HARA)

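Here, we solve the infinite horizon problem in (4.11) and derive the
solutions for the optimum consumption in (4.12) and the portfolio
investment in (4.13) using a generalised HARA utility function.
Given the HARA utility function
\[ U(C) = \frac{1-\gamma}{\gamma}\left( \frac{\beta C}{1-\gamma} + \eta \right)^{\gamma}, \]
\[ U'(C) = \frac{1-\gamma}{\gamma}\,\gamma\left( \frac{\beta C}{1-\gamma} + \eta \right)^{\gamma-1}\frac{\beta}{1-\gamma} = \beta\left( \frac{\beta C}{1-\gamma} + \eta \right)^{\gamma-1}, \]
\[ U''(C) = \beta(\gamma - 1)\frac{\beta}{1-\gamma}\left( \frac{\beta C}{1-\gamma} + \eta \right)^{\gamma-2} = -\beta^2\left( \frac{\beta C}{1-\gamma} + \eta \right)^{\gamma-2}, \]
for γ ≠ 1, β > 0, $\frac{\beta C}{1-\gamma} + \eta > 0$; η = 1 if γ = −∞. HARA utility is
hyperbolic in consumption with positive absolute RA. The absolute
(relative) RA can be increasing, decreasing or constant. The absolute
RA is mostly controlled by γ, whereas the RRA is controlled by η.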
4.5.1 Relationship with CRRA and CARA

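For HARA utility, the absolute RA measure is
\[ RA = -\frac{U''(C)}{U'(C)} = \beta\left( \frac{\beta C}{1-\gamma} + \eta \right)^{-1} = \left( \frac{C}{1-\gamma} + \frac{\eta}{\beta} \right)^{-1}. \]
When η = 0, the HARA utility becomes CRRA as the RRA is
constant:
\[ RRA = C \cdot RA = C\,\frac{1-\gamma}{C} = 1 - \gamma = \delta. \]
As γ → +∞, the HARA utility becomes CARA as the absolute
RA is now constant:
\[ RA = \frac{\beta}{\eta}. \]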
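The two limiting cases can be seen by evaluating the absolute risk aversion formula directly. The sketch below is not part of the original text; the function name and the specific parameter values are illustrative assumptions, and a very large γ is used only as a numerical stand-in for the limit γ → +∞.

```python
def hara_absolute_risk_aversion(C, gamma, beta, eta):
    """Absolute risk aversion RA = beta / (beta*C/(1-gamma) + eta) for HARA utility."""
    base = beta * C / (1 - gamma) + eta
    if base <= 0:
        raise ValueError("HARA utility requires beta*C/(1-gamma) + eta > 0")
    return beta / base
```

With η = 0 the product C × RA equals 1 − γ at every consumption level (the CRRA case), and with γ very large RA is close to β/η regardless of C (the CARA case).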
4.5.2 Portfolio choice

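From (4.12),
\[ U'(C^*) = J', \]
\[ \frac{\beta C^*}{1-\gamma} + \eta = \left( \frac{J'}{\beta} \right)^{\frac{1}{\gamma-1}}, \]
\[ C^* = \frac{1-\gamma}{\beta}\left[ \left( \frac{J'}{\beta} \right)^{\frac{1}{\gamma-1}} - \eta \right], \]
\[ U(C^*) = \frac{1-\gamma}{\gamma}\left( \frac{J'}{\beta} \right)^{\frac{\gamma}{\gamma-1}}. \tag{4.29} \]
For infinite time horizon, use (4.11) and (4.13):
\[ 0 = \frac{1-\gamma}{\gamma}\left( \frac{J'}{\beta} \right)^{\frac{\gamma}{\gamma-1}} - \rho J - J'(\alpha - r)\left( \frac{\alpha - r}{\sigma^2 W}\frac{J'}{J''} \right) W + J' rW - J'\frac{1-\gamma}{\beta}\left[ \left( \frac{J'}{\beta} \right)^{\frac{1}{\gamma-1}} - \eta \right] + \frac{1}{2} J''\sigma^2 W^2\left( \frac{\alpha - r}{\sigma^2 W}\frac{J'}{J''} \right)^2 \]
\[ = \frac{1-\gamma}{\gamma}\left( \frac{J'}{\beta} \right)^{\frac{\gamma}{\gamma-1}} - \rho J - \frac{1}{2}\frac{(\alpha - r)^2}{\sigma^2}\frac{(J')^2}{J''} + J' rW - \frac{1-\gamma}{\beta} J'\left( \frac{J'}{\beta} \right)^{\frac{1}{\gamma-1}} + \frac{1-\gamma}{\beta} J'\eta \]
\[ = \left[ \frac{1-\gamma}{\gamma} - (1-\gamma) \right]\left( \frac{J'}{\beta} \right)^{\frac{\gamma}{\gamma-1}} - \rho J - \frac{(\alpha - r)^2}{2\sigma^2}\frac{(J')^2}{J''} + J'\left( rW + \frac{1-\gamma}{\beta}\eta \right) \]
\[ = \frac{(1-\gamma)^2}{\gamma}\left( \frac{J'}{\beta} \right)^{\frac{\gamma}{\gamma-1}} - \rho J - \frac{(\alpha - r)^2}{2\sigma^2}\frac{(J')^2}{J''} + J'\left( rW + \frac{1-\gamma}{\beta}\eta \right), \tag{4.30} \]
which closely resembles (4.17).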
4.5.3 Solution for J

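Comparing (4.30) with (4.17), we note that we cannot use the same
solution as that for isoelastic marginal utility (or CRRA) unless we
change variable
\[ r\tilde{W} = rW + \frac{1-\gamma}{\beta}\eta, \qquad \tilde{W} = W + \frac{1-\gamma}{r\beta}\eta. \]
Then we have the trial solution
\[ J(W) = J(\tilde{W}) = A\tilde{W}^B \]
with
\[ J' = AB\tilde{W}^{B-1}, \quad J'' = AB(B-1)\tilde{W}^{B-2}, \quad \frac{J'}{J''} = \frac{\tilde{W}}{B-1}. \]
Then following the steps in Sec. 4.3, we have
\[ 0 = \frac{(1-\gamma)^2}{\gamma}\left( \frac{J'}{\beta} \right)^{\frac{\gamma}{\gamma-1}} - \rho J - \frac{(\alpha - r)^2}{2\sigma^2}\frac{(J')^2}{J''} + J' r\tilde{W} \]
\[ = \frac{(1-\gamma)^2}{\gamma}\left( \frac{AB}{\beta}\tilde{W}^{B-1} \right)^{\frac{\gamma}{\gamma-1}} - \rho A\tilde{W}^B - \frac{(\alpha - r)^2 AB}{2\sigma^2(B-1)}\tilde{W}^B + rAB\tilde{W}^B. \]
Following the same argument that the first and second terms, involving
$J'$ and J, are of the same order in $\tilde{W}$, we have
\[ \frac{(B-1)\gamma}{\gamma - 1} = B, \]
\[ B\gamma - \gamma = B\gamma - B, \]
\[ B = \gamma. \]
Then
\[ 0 = \frac{(1-\gamma)^2}{\gamma}\left( \frac{A\gamma}{\beta} \right)^{\frac{\gamma}{\gamma-1}}\tilde{W}^{\gamma} - \rho A\tilde{W}^{\gamma} - \frac{(\alpha - r)^2 A\gamma}{2\sigma^2(\gamma - 1)}\tilde{W}^{\gamma} + rA\gamma\tilde{W}^{\gamma} \]
and, dropping $\tilde{W}^{\gamma}$ from all terms,
\[ 0 = \frac{(1-\gamma)^2}{\gamma}\left( \frac{A\gamma}{\beta} \right)^{\frac{\gamma}{\gamma-1}} - A\mu, \]
where $\mu = \rho + \frac{(\alpha - r)^2\gamma}{2\sigma^2(\gamma - 1)} - r\gamma$, which means that
\[ A^{\frac{1}{\gamma-1}} = \left( \frac{\gamma}{\beta} \right)^{-\frac{\gamma}{\gamma-1}}\frac{\gamma}{(1-\gamma)^2}\mu, \]
\[ A = \frac{\beta^{\gamma}}{\gamma}\left[ \frac{(1-\gamma)^2}{\mu} \right]^{1-\gamma}. \]
Hence
\[ J' = \frac{\beta^{\gamma}}{\gamma}\left[ \frac{(1-\gamma)^2}{\mu} \right]^{1-\gamma}\gamma\tilde{W}^{\gamma-1} = \beta^{\gamma}\left[ \frac{(1-\gamma)^2}{\mu} \right]^{1-\gamma}\tilde{W}^{\gamma-1}, \]
\[ \frac{J'}{J''} = \frac{\tilde{W}}{\gamma - 1}. \]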
4.5.4 Solve for C* and w*

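Substituting the solution for $J'$ into the expression for $C^*$ preceding (4.29) gives
\[ C^* = \frac{1-\gamma}{\beta}\left[ \beta^{\frac{1}{1-\gamma}}\left( \beta^{\gamma}\left[ \frac{(1-\gamma)^2}{\mu} \right]^{1-\gamma}\tilde{W}^{\gamma-1} \right)^{\frac{1}{\gamma-1}} - \eta \right] \]
\[ = \frac{1-\gamma}{\beta}\left[ \frac{\beta\mu}{(1-\gamma)^2}\tilde{W} - \eta \right] \]
\[ = \frac{\mu}{1-\gamma}\tilde{W} - \frac{1-\gamma}{\beta}\eta \]
\[ = \frac{\mu}{1-\gamma}\left( W + \frac{1-\gamma}{r\beta}\eta \right) - \frac{1-\gamma}{\beta}\eta \]
and expanding the term μ, we have
\[ C_\infty^* = \frac{(\rho - \gamma\nu)\left( W + \frac{(1-\gamma)\eta}{\beta r} \right)}{1-\gamma} - \frac{(1-\gamma)\eta}{\beta}, \tag{4.31} \]
with $\nu \equiv r + \frac{(\alpha - r)^2}{2(1-\gamma)\sigma^2}$. So consumption is still linear in wealth.
Similarly, substituting the solution for J into (4.13), we have
\[ w_\infty^* = -\frac{\alpha - r}{\sigma^2 W}\frac{\tilde{W}}{\gamma - 1} = -\frac{\alpha - r}{\sigma^2 W}\frac{1}{\gamma - 1}\left( W + \frac{1-\gamma}{r\beta}\eta \right) = \frac{\alpha - r}{(1-\gamma)\sigma^2} + \frac{\eta(\alpha - r)}{\beta r\sigma^2 W}. \]
It is interesting to note that the solution for the investment decision
here has two components representing the solutions under CRRA
and CARA, respectively. As wealth increases, $w^*$ decreases as the
second component related to CARA decreases. By setting η = 0,
the HARA utility becomes CRRA and the solutions for $C^*$ and $w^*$
are identical to (4.22) and (4.21). By setting γ → ∞, the HARA
utility becomes CARA. The solution for $w^*$ is identical to (4.27).
To get $C^*$ for CARA note that as γ → ∞, $\frac{\gamma}{\gamma-1} \to 1$ and $\frac{1}{1-\gamma} \to 0$.
Then
\[ C_\infty^* = \left[ \rho - \gamma r - \frac{\gamma(\alpha - r)^2}{2(1-\gamma)\sigma^2} \right]\left( \frac{W}{1-\gamma} + \frac{\eta}{\beta r} \right) - (1-\gamma)\frac{\eta}{\beta} \]
\[ = \left[ \rho - \gamma r + \frac{(\alpha - r)^2}{2\sigma^2} \right]\left( \frac{W}{1-\gamma} + \frac{\eta}{\beta r} \right) - (1-\gamma)\frac{\eta}{\beta} \]
\[ = rW + \frac{\eta}{\beta}\left[ \frac{\rho}{r} - \gamma + \frac{(\alpha - r)^2}{2r\sigma^2} - 1 + \gamma \right] \]
\[ = rW + \frac{\eta}{\beta}\left[ \frac{\rho}{r} + \frac{(\alpha - r)^2}{2r\sigma^2} - 1 \right]. \]
Writing $\frac{\eta}{\beta}$ as $\frac{1}{\eta^*}$ we get
\[ C_\infty^* = rW - \frac{1}{\eta^* r}\left[ r - \rho - \frac{(\alpha - r)^2}{2\sigma^2} \right], \]
which is the same as (4.28).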
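The nesting of CRRA and CARA inside the HARA rules can be illustrated numerically. The sketch below is not part of the original text; `hara_policy` and the parameter values are illustrative assumptions, and a large finite γ is used only to approximate the limit γ → ∞ discussed above.

```python
def hara_policy(W, gamma, beta, eta, rho, r, alpha, sigma):
    """Infinite-horizon HARA rules: (risky weight, consumption) from (4.31) and the w* formula."""
    nu = r + (alpha - r) ** 2 / (2 * (1 - gamma) * sigma ** 2)
    shifted_wealth = W + (1 - gamma) * eta / (beta * r)
    consumption = (rho - gamma * nu) * shifted_wealth / (1 - gamma) - (1 - gamma) * eta / beta
    weight = (alpha - r) / ((1 - gamma) * sigma ** 2) + eta * (alpha - r) / (
        beta * r * sigma ** 2 * W
    )
    return weight, consumption
```

Setting η = 0 reproduces the CRRA rules (4.21) and (4.22). With β = 2 and η = 1, so that η* = β/η = 2, a large γ brings the output close to the CARA rules (4.27) and (4.28) with risk aversion 2.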
4.6 Optimal Rules Under Finite Horizon

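In the case of finite horizon, we revert back to solving Eq. (4.7) as the
individuals now maximise life-long utility within a finite period.
Following the notations in Merton's Section 5.6, we will use J (instead
of I) as the objective function in this section. As with Merton, we
assume that B ≡ 0 for simplicity. For HARA utility, we get
\[ U'(C^*) = \beta\left( \frac{\beta C^*}{1-\gamma} + \eta \right)^{\gamma-1} = e^{\rho t} J_W, \]
\[ C^* = \frac{1-\gamma}{\beta}\left( \frac{e^{\rho t} J_W}{\beta} \right)^{\frac{1}{\gamma-1}} - \frac{(1-\gamma)\eta}{\beta}, \]
\[ U(C^*) = \frac{1-\gamma}{\gamma}\left( \frac{e^{\rho t} J_W}{\beta} \right)^{\frac{\gamma}{\gamma-1}}. \]
Substituting this and the solution $w_t^* = -\frac{I_W}{W I_{WW}}\frac{(\alpha - r)}{\sigma^2}$ (with I written as J) into (4.7),
we get (5.44) in Merton:
\[ 0 = \frac{(1-\gamma)^2}{\gamma} e^{-\rho t}\left( \frac{e^{\rho t} J_W}{\beta} \right)^{\frac{\gamma}{\gamma-1}} + J_t + \left[ \frac{(1-\gamma)\eta}{\beta} + rW_t \right] J_W - \frac{J_W^2}{J_{WW}}\frac{(\alpha - r)^2}{2\sigma^2} \]
with boundary condition J(W, T) = 0. Merton claims a solution for
J is
\[ J(W, t) = \frac{(1-\gamma)\beta^{\gamma}}{\gamma} e^{-\rho t}\left( \frac{(1-\gamma)\left[ 1 - e^{-\frac{(\rho - \gamma v)(T-t)}{1-\gamma}} \right]}{\rho - \gamma v} \right)^{1-\gamma} \times \left( \frac{W}{1-\gamma} + \frac{\eta}{\beta r}\left\{ 1 - e^{-r(T-t)} \right\} \right)^{\gamma}, \]
where $v \equiv r + \frac{(\alpha - r)^2}{2(1-\gamma)\sigma^2}$. This solution is harder to verify as it involves
solving a two-dimensional PDE.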

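With the solution for J(W, t), Merton shows
\[ C_t^* = \frac{(\rho - \gamma\nu)\left( W_t + \frac{(1-\gamma)\eta}{\beta r}\left\{ 1 - e^{r(t-T)} \right\} \right)}{(1-\gamma)\left\{ 1 - e^{\frac{(\rho - \gamma\nu)(t-T)}{(1-\gamma)}} \right\}} - \frac{(1-\gamma)\eta}{\beta}, \tag{4.32} \]
\[ w_t^* = \frac{\alpha - r}{(1-\gamma)\sigma^2} + \frac{\eta(\alpha - r)}{\beta r\sigma^2 W_t}\left\{ 1 - e^{r(t-T)} \right\}. \tag{4.33} \]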
As there are two {} terms in (4.32), the time effect on consumption
is harder to predict. As mentioned before, the investment decision
under HARA is the sum of the solutions under CRRA and CARA.
However, the analyses in the following subsections show that as
(T − t) → 0, consumption $C_t^*$ increases under CRRA and under
CARA. This suggests that $C_t^*$ would also increase under HARA for
an older person as he approaches death. For $w_t^*$, finite horizon decreases
risky investment under CARA, and hence decreases risky investment
in the HARA case also.
4.6.1 CRRA with finite horizon

From Sec. 4.5.1, we note that HARA has CRRA and CARA as special
cases. This leads to the optimal solution under CRRA with finite
horizon when η = 0:
\[ C_t^* = \frac{(\rho - \gamma\nu) W_t}{(1-\gamma)\left\{ 1 - e^{\frac{(\rho - \gamma\nu)(t-T)}{(1-\gamma)}} \right\}}, \]
\[ w_t^* = \frac{\alpha - r}{(1-\gamma)\sigma^2}. \]
Comparing this set of solutions with (4.21) and (4.22) under infinite
horizon, we can see that finite horizon has no impact on the investment
decision, $w_t^*$, as the solution for $w_t^*$ does not depend on (T − t). Note
that $0 < \left\{ 1 - e^{\frac{(\rho - \gamma\nu)(t-T)}{(1-\gamma)}} \right\} < 1$. For a very young person, (T − t) → ∞
and {} → 1, so $C_t^* \to C_\infty^*$. For an old person, (T − t) → 0 and
{} → 0, so $C_t^*$ increases. Thus, finite horizon has the effect of increasing
consumption as the investor approaches death.
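The horizon effect on CRRA consumption can be made concrete by computing the consumption-to-wealth ratio as a function of the time left, T − t. This is a minimal sketch, not part of the original text; the function name and parameter values are illustrative assumptions, and the code assumes ρ − γν > 0 as required for positive consumption in the infinite-horizon case.

```python
import math


def crra_finite_horizon_consumption_ratio(time_left, gamma, rho, r, alpha, sigma):
    """C_t*/W_t under CRRA with T - t = time_left, from the eta = 0 case of (4.32)."""
    if time_left <= 0:
        raise ValueError("time_left must be positive")
    nu = r + (alpha - r) ** 2 / (2 * (1 - gamma) * sigma ** 2)
    rate = (rho - gamma * nu) / (1 - gamma)  # infinite-horizon ratio from (4.22)
    if rate <= 0:
        raise ValueError("this sketch assumes rho - gamma*nu > 0")
    # {1 - exp((rho - gamma*nu)(t - T)/(1 - gamma))} written in terms of T - t.
    return rate / (1 - math.exp(-rate * time_left))
```

The ratio falls towards the infinite-horizon value as the time left grows, and rises without bound as the time left shrinks to zero, matching the discussion above.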
4.6.2 CARA with finite horizon

The optimal solution under CARA is obtained when γ → +∞, so that
$\frac{\gamma}{1-\gamma} \to -1$ and $\frac{1}{1-\gamma} \to 0$, and we have
\[ w_t^* = \frac{\eta(\alpha - r)}{\beta r\sigma^2 W_t}\left\{ 1 - e^{-r(T-t)} \right\}. \]
The investment solution above is identical to the infinite horizon case
except for the {} term. For the old person, as (T − t) → 0 and {} → 0,
$w_t^*$ decreases. For a very young person, as (T − t) → ∞ and {} → 1,
we have $w_t^* \to w_\infty^*$. Hence, finite horizon will cause the investor to
invest less in the risky asset, and even less as death approaches.

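Note that
\[ 1 - e^{\frac{\rho(t-T)}{(1-\gamma)} - \frac{\gamma\nu(t-T)}{(1-\gamma)}} = 1 - e^{\frac{\rho(t-T)}{(1-\gamma)} - \frac{\gamma(t-T)}{(1-\gamma)}\left( r + \frac{(\alpha - r)^2}{2(1-\gamma)\sigma^2} \right)}. \]
As γ → +∞, $\frac{\gamma}{1-\gamma} \to -1$ and $\frac{1}{1-\gamma} \to 0$, so
\[ 1 - e^{\frac{\rho(t-T)}{(1-\gamma)} - \frac{\gamma\nu(t-T)}{(1-\gamma)}} = 1 - e^{r(t-T)}. \]
Then (4.32) becomes
\[ C_t^* = \left[ \rho - \gamma r - \frac{\gamma(\alpha - r)^2}{2(1-\gamma)\sigma^2} \right]\left[ \frac{W_t}{(1-\gamma)}\left\{ 1 - e^{r(t-T)} \right\}^{-1} + \frac{\eta}{\beta r} \right] - (1-\gamma)\frac{\eta}{\beta} \]
\[ = \frac{rW_t}{\left\{ 1 - e^{-r(T-t)} \right\}} + \frac{\eta}{\beta}\left[ \frac{\rho}{r} + \frac{(\alpha - r)^2}{2r\sigma^2} - 1 \right]. \]
For a young person, as (T − t) → ∞, {} → 1 and $C_t^* \to C_\infty^*$. For the
old person, as (T − t) → 0, {} → 0 and $C_t^*$ increases. It is clear that,
for an old person, consumption will increase under CARA, while
the investment weight decreases.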
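The finite-horizon CARA rules can be tabulated in the same way. The sketch below is not part of the original text; it writes η/β as 1/η*, as in Sec. 4.5.4, and the function name and parameter values are illustrative assumptions.

```python
import math


def cara_finite_horizon_policy(W, time_left, eta_star, rho, r, alpha, sigma):
    """Finite-horizon CARA rules with T - t = time_left and eta/beta = 1/eta_star.

    Returns (amount in risky asset, consumption).
    """
    if time_left <= 0:
        raise ValueError("time_left must be positive")
    horizon_factor = 1 - math.exp(-r * time_left)  # the {} term
    risky_amount = (alpha - r) / (eta_star * r * sigma ** 2) * horizon_factor
    consumption = r * W / horizon_factor + (
        rho / r + (alpha - r) ** 2 / (2 * r * sigma ** 2) - 1
    ) / eta_star
    return risky_amount, consumption
```

For a very long horizon the output approaches the infinite-horizon values from (4.27) and (4.28); for a short horizon the risky holding shrinks while consumption rises.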
In summary, the restriction of decision horizon from infinite
to finite has the effect of increasing consumption progressively as
death approaches. The investment on risky asset under CRRA is not
affected by investment horizon, but for CARA and for HARA in gen-
eral, finite horizon will reduce risky investment progressively as death
approaches. There are many other interesting properties that one can
obtain under HARA class of utilities. Merton shows that HARA is
the only utility class that will lead to investment-consumption solu-
tions that are linear in wealth. But this is left for future exposition.
Exercises: Intertemporal Portfolio Selection

1. Samuelson (1969) provides a solution for lifetime portfolio selec-
tion with more general probability distributions.
(a) Summarise the proof and findings in Samuelson (1969).
(b) Compare and contrast Samuelson’s (1969) and Merton’s
(1990) solutions.
2. Extend the solution for the special case T → ∞ to finite horizon
when utility is HARA. Use the HARA solution to provide the
solutions for the special cases of CRRA and CARA when horizon
is finite. Provide the economic interpretation for all three cases
and compare them with the infinite horizon case. [Hint: see Merton
equations (4.28) and (4.29) for the HARA case.]

Chapter 5
Optimum Demand
and Mutual Fund Theorem
This chapter is based on Merton's (1990) Chapter 5. It is an extension
of the previous chapter on "Consumption and Portfolio Selection" to
the multi-asset context with a general price process. This chapter is
no doubt the most important cornerstone in asset pricing theories. It
identifies the optimum portfolios with and without the risk-free interest
rate, which is the foundation of the mutual fund and separation
theorems, and of the beta factor models that followed.
Under the assumption of GBM for the price process, a gen-
eral separation or mutual-fund theorem is established such that the
Markowitz–Tobin mean–variance rule holds without the requirement
of a quadratic utility. Here we will omit the consideration of wage
income (Merton 1990, Section 5.7), uncertainty of life expectancy
and the possibility of default (both in Section 5.8), and other price
dynamics (Section 5.10). The alternative price dynamics considered
in Merton (1990, Section 5.10) include a normal distribution for the
price level, a mean reverting drift and a drift that is estimated with
error. These will not be discussed here.
5.1 Asset Dynamics and the Budget Equation

Assume that all income is generated by capital gains from investment,
i.e. there is no wage income. Let $P_{i,t}$ be the price per share
with the following diffusion process
\[ \frac{dP_i}{P_i} = \alpha_i(P, t)\,dt + \sigma_i(P, t)\,dz_i, \tag{5.1} \]
where $\alpha_i(P, t)$ and $\sigma_i(P, t)$ are functions of price and time (footnote 1). The price
function in (5.1) is conditionally lognormal. Since the drift and the
volatility are time varying and are both functions of price and time,
the unconditional price process will not be lognormal. The GBM
assumption is introduced in Sec. 5.4 later.

The dynamic of wealth with $\sum_{i=1}^{n} w_{i,t} = 1$ is
\[ dW = \sum_{i=1}^{n} w_{i,t} W_t\left[ \alpha_i\,dt + \sigma_i\,dz_i \right] - C_t\,dt = \sum_{i=1}^{n} w_{i,t} W_t\alpha_i\,dt - C_t\,dt + \sum_{i=1}^{n} w_{i,t} W_t\sigma_i\,dz_i, \tag{5.2} \]
where $C_t$ is the amount of consumption per unit time taking place
between t and t + h, and the Brownian motions $dz_i$ of the different
assets could be correlated. When a risk-free asset is added to the
portfolio choice, we have
\[ dW = \sum_{i=1}^{n} w_{i,t}(\alpha_i - r) W_t\,dt + (rW_t - C_t)\,dt + \sum_{i=1}^{n} w_{i,t} W_t\sigma_i\,dz_i \]
and the weight constraint becomes non-binding with $w_{n+1,t} + \sum_{i=1}^{n} w_{i,t} = 1$,
where $w_{n+1,t}$ is the portfolio weight invested in the riskless
asset.
5.2 The Equation of Optimality

As before, the objective function of the investor is to optimise lifetime
consumption and the function, B[·], of terminal bequest upon
death as follows:
\[ \max E_0\left[ \int_0^T U[C_t, t]\,dt + B[W_T, T] \right]. \]
Footnote 1: Note that under the assumption of geometric Brownian motion,
$\alpha_i(P, t) = \alpha_i$ and $\sigma_i(P, t) = \sigma_i$ are both constant and prices will be
lognormally distributed. This is not the case if $\alpha_i$ and $\sigma_i$ are functions of
price and time, in which case the price is only locally or conditionally lognormal.
Following the previous chapter, the conditional objective function of
the total utility from lifetime consumption and terminal wealth is
\[ J(W, P, t) \equiv \max_{C,w} E_t\left[ \int_t^T U[C_s, s]\,ds + B[W_T, T] \right]. \]
Define the differential operator
\[ \phi(C, w; W, P, t) \equiv U[C_t, t] + L^{C,w}_{W,P}(J), \]
where $L^{C,w}_{W,P}$ is the Dynkin operator (footnote 2) over the variables P and W for
a given set of w and C.
The price of individual asset $P_i$ now affects the solution via $\alpha_i$
and $\sigma_i$ as they are both functions of $P_t$ and t. So unlike the previous
chapter, J and φ are now functions of W, P, and t. Given (5.1) and
(5.2), $L^{C,w}_{W,P}$ now contains all the (cross-product) terms in the Taylor
series expansion:
\[ L \equiv \frac{\partial}{\partial t} + \left[ \sum_{i=1}^{n} w_i\alpha_i W - C \right]\frac{\partial}{\partial W} + \sum_{i=1}^{n} \alpha_i P_i\frac{\partial}{\partial P_i} + \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} \sigma_{ij} w_i w_j W^2\frac{\partial^2}{\partial W^2} + \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} P_i P_j\sigma_{ij}\frac{\partial^2}{\partial P_i\partial P_j} + \sum_{i=1}^{n}\sum_{j=1}^{n} P_i W w_j\sigma_{ij}\frac{\partial^2}{\partial P_i\partial W}. \]
Footnote 2: Define the differential generator
\[ \mathring{G}(P, t) = \lim_{h \to 0} E_t\left[ \frac{G(P_{t+h}, t+h) - G(P_t, t)}{h} \right] \]
conditional on knowing $P_t$. A heuristic method for finding the differential
generator is to take the conditional expectation of dG (found by Ito's lemma)
and "divide" by dt. The result of this operation will be $L_P(G)$, i.e. formally
$E_t(dG)/dt = \mathring{G} = L_P(G)$. The same applies to the two-variable case to get
$L_{W,P}$.
Then the optimal consumption-investment rule is obtained when
\[ 0 = \max_{C,w} \phi(C, w; W, P, t) \]
with Lagrangian
\[ L = \phi + \lambda\left[ 1 - \sum_{i=1}^{n} w_i \right] \]
and first-order conditions
\[ 0 = L_C(C^*, w^*) = U_C(C^*, t) - J_W, \]
\[ 0 = L_{w_k}(C^*, w^*) = -\lambda + J_W\alpha_k W + J_{WW}\sum_{j=1}^{n} \sigma_{kj} w_j^* W^2 + \sum_{j=1}^{n} J_{jW}\sigma_{kj} P_j W \quad \text{for } k = 1, \ldots, n, \tag{5.3} \]
\[ 0 = L_\lambda(C^*, w^*) = 1 - \sum_{i=1}^{n} w_i^*, \tag{5.4} \]
where subscripts denote partial derivatives, with the usual notations
for partial derivatives involving J as follows:
\[ J_W \equiv \partial J/\partial W, \quad J_{WW} \equiv \partial^2 J/\partial W^2, \quad J_i \equiv \partial J/\partial P_i, \quad J_{jW} \equiv \partial^2 J/\partial P_j\partial W, \]
subject to the boundary condition at T
\[ J(W, P, T) = B(W, T). \]
Because $L_{CC} = \phi_{CC} = U_{CC} < 0$, $L_{Cw_k} = \phi_{Cw_k} = 0$, $L_{w_k w_k} = \sigma_k^2 W^2 J_{WW}$,
$L_{w_k w_j} = \sigma_{kj} W^2 J_{WW}$ for k ≠ j and $[\sigma_{ij}]$ is a positive-definite
matrix, a sufficient condition for a unique interior maximum is that
$J_{WW} < 0$, that is, the utility function J be strictly concave in W.
5.3 Optimal Investment Weight and Special Cases

From here onwards, we will use underscore to denote vectors and
matrices.
5.3.1 No risk-free asset

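To solve for the optimum investment weights, first write (5.3) in matrix
form as follows:
\[ 0 = -\lambda\underline{1} + J_W W\underline{\alpha} + J_{WW} W^2\,\Omega\,\underline{w}^* + W\,\Omega\,\underline{d}, \tag{5.5} \]
where $\underline{1}$ is an n × 1 vector of ones, $\underline{\alpha}$ is an n × 1 vector of the $\alpha_i$, $\Omega \equiv [\sigma_{ij}]$
is the variance–covariance matrix, $\underline{w}^*$ is an n × 1 vector of the $w_i^*$ with
$\underline{1}'\underline{w}^* = 1$, and
\[ \underline{d} = \begin{bmatrix} J_{1W} P_1 \\ \vdots \\ J_{nW} P_n \end{bmatrix} \]
is an n × 1 vector.
If the portfolio does not include a risk-free component and $V \equiv
[\nu_{ij}] \equiv \Omega^{-1}$ exists, then from (5.5),
\[ \underline{w}^* = \frac{\lambda}{J_{WW} W^2}\Omega^{-1}\underline{1} + m\,\Omega^{-1}\underline{\alpha} - \frac{1}{J_{WW} W}\underline{d} = \frac{\lambda}{J_{WW} W^2} V\underline{1} + mV\underline{\alpha} - \frac{1}{J_{WW} W}\underline{d}, \tag{5.6} \]
\[ m = -\frac{J_W}{J_{WW} W}, \]
and m can be interpreted as the inverse of the RRA.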
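Multiply both sides of (5.6) by $\underline{1}'$. Since $\sum_k w_k^* = 1$ and
\[ \Gamma \equiv \underline{1}'\Omega^{-1}\underline{1} = \underline{1}' V\underline{1} = \sum_{i=1}^{n}\sum_{j=1}^{n} \nu_{ij} \]
is the sum of all the elements in the inverse of the variance–covariance
matrix, then
\[ \underline{1}'\underline{w}^* = \frac{\lambda}{J_{WW} W^2}\Gamma + m\,\underline{1}' V\underline{\alpha} - \frac{1}{J_{WW} W}\underline{1}'\underline{d} = 1, \]
\[ \frac{\lambda}{J_{WW} W^2}\Gamma = 1 - m\,\underline{1}' V\underline{\alpha} + \frac{1}{J_{WW} W}\underline{1}'\underline{d}. \tag{5.7} \]
Now multiply both sides of (5.7) by $\frac{V\underline{1}}{\Gamma}$ to give
\[ \frac{\lambda}{J_{WW} W^2} V\underline{1} = \frac{V\underline{1}}{\Gamma} - \frac{m}{\Gamma} V\underline{1}\,\underline{1}' V\underline{\alpha} + \frac{1}{J_{WW} W}\frac{1}{\Gamma} V\underline{1}\,\underline{1}'\underline{d}, \]
where $V\underline{1}$ is n × 1 and both $\underline{1}' V\underline{\alpha}$ and $\underline{1}'\underline{d}$ are 1 × 1.
Substituting this result into (5.6), we get
\[ \underline{w}^* = \frac{1}{\Gamma} V\underline{1} + \frac{m}{\Gamma}\left( \Gamma V\underline{\alpha} - V\underline{1}\,\underline{1}' V\underline{\alpha} \right) - \frac{1}{\Gamma J_{WW} W}\left( \Gamma\underline{d} - V\underline{1}\,\underline{1}'\underline{d} \right) = \underline{h} + m\underline{g} + \underline{f}, \tag{5.8} \]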
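where the sum of all the elements in $\underline{h}$ equals 1,
\[ \underline{1}'\underline{h} = \underline{1}'\frac{V\underline{1}}{\Gamma} = \frac{\underline{1}' V\underline{1}}{\underline{1}' V\underline{1}} = 1, \]
and the sum of all the elements in $\underline{g}$, and similarly in $\underline{f}$, equals 0:
\[ \underline{1}'\underline{g} = \underline{1}'\left( V\underline{\alpha} - \frac{V\underline{1}\,\underline{1}' V\underline{\alpha}}{\underline{1}' V\underline{1}} \right) = \underline{1}' V\underline{\alpha} - \frac{\underline{1}' V\underline{1}\,\underline{1}' V\underline{\alpha}}{\underline{1}' V\underline{1}} = 0, \tag{5.9} \]
\[ \underline{1}'\underline{f} = -\frac{1}{\Gamma J_{WW} W}\left( \Gamma\,\underline{1}'\underline{d} - \underline{1}' V\underline{1}\,\underline{1}'\underline{d} \right) = 0. \tag{5.10} \]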
For the individual asset weight, $w_k^*$, with $k = 1, \ldots, n$,
$$w_k^* = \frac{1}{\Gamma}\sum_{j=1}^{n}\nu_{kj} + \frac{m}{\Gamma}\left(\Gamma\sum_{l=1}^{n}\nu_{kl}\alpha_l - \sum_{l=1}^{n}\nu_{kl}\sum_{i=1}^{n}\sum_{j=1}^{n}\nu_{ij}\alpha_j\right) - \frac{1}{J_{WW}W\,\Gamma}\left(\Gamma J_{kW}P_k - \sum_{j=1}^{n}\nu_{kj}\sum_{i=1}^{n}J_{iW}P_i\right)$$
$$= h_k(P,t) + m(P,W,t)\,g_k(P,t) + f_k(P,W,t), \tag{5.11}$$
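where
$$h_k(P,t) \equiv \sum_{j=1}^{n}\frac{\nu_{kj}}{\Gamma},$$
$$g_k(P,t) \equiv \frac{1}{\Gamma}\sum_{l=1}^{n}\nu_{kl}\left(\Gamma\alpha_l - \sum_{i=1}^{n}\sum_{j=1}^{n}\nu_{ij}\alpha_j\right), \tag{5.12}$$
$$f_k(P,W,t) \equiv -\frac{\Gamma J_{kW}P_k - \sum_{i=1}^{n}J_{iW}P_i\sum_{j=1}^{n}\nu_{kj}}{\Gamma W J_{WW}} = -\frac{J_{kW}P_k}{J_{WW}W} + h_k(P,t)\sum_{i=1}^{n}\frac{J_{iW}P_i}{J_{WW}W}. \tag{5.13}$$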
5.3.2 GBM and risk-free rate

For the case when one of the assets is “risk-free”, the solution to Eq. (5.3) is simplified because the problem can be solved directly as an unconstrained maximum, leading to the f.o.c.
$$0 = J_W(\alpha_k - r)W + J_{WW}\sum_{j=1}^{n}\sigma_{kj}w_j^*W^2 + \sum_{j=1}^{n}J_{jW}\sigma_{kj}P_jW, \tag{5.14}$$
and the weight on the risk-free asset is
$$w_{n+1}^* = 1 - \sum_{k=1}^{n}w_k^*.$$
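Now, we can solve (5.14) in matrix form with linear algebra:
$$\mathbf{0} = J_W W(\boldsymbol{\alpha} - r\mathbf{1}) + J_{WW}W^2\,\Omega\,\mathbf{w}^* + W\,\Omega\,\mathbf{d},$$
$$\mathbf{w}^* = -\frac{J_W}{J_{WW}W}V(\boldsymbol{\alpha} - r\mathbf{1}) - \frac{1}{J_{WW}W}\mathbf{d},$$
and, for a single asset,
$$w_k^* = -\frac{J_W}{WJ_{WW}}\sum_{j=1}^{n}\nu_{kj}(\alpha_j - r) - \frac{1}{J_{WW}W}J_{kW}P_k = m(P,W,t)\,g_k(P,t) + f_k(P,W,t), \tag{5.15}$$
where $g_k(P,t)$ and $f_k(P,W,t)$, shown below, are now much simpler than their predecessors in (5.12) and (5.13):
$$g_k(P,t) = \sum_{j=1}^{n}\nu_{kj}(\alpha_j - r), \tag{5.16}$$
$$f_k(P,W,t) \equiv -\frac{J_{kW}P_k}{WJ_{WW}}. \tag{5.17}$$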
If $P_k$ is a GBM with constant mean and variance rate, then $J$ will be a function of $W$ and $t$ only and not $P$. Recall that $J_t = e^{\rho t}I_t$ is the life-long objective function at time $t$. If $J$ is independent of $P$, then $J_k = \partial J/\partial P_k = 0$ and (5.15) becomes
$$w_k^* = m(W,t)\cdot g_k = -\frac{J_W}{J_{WW}W}\sum_{j=1}^{n}\nu_{kj}(\alpha_j - r), \tag{5.18}$$
and when there is only one risky asset, we get, as in the previous chapter,
$$w^* = -\frac{1}{W}\underbrace{\frac{(\alpha - r)}{\sigma^2}}_{\text{market price of risk}}\frac{J_W}{J_{WW}}.$$
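As a rough numerical illustration of (5.18), the sketch below computes the risky-asset weights $m\,V(\boldsymbol{\alpha} - r\mathbf{1})$. It assumes, purely for illustration, a CRRA investor so that $m = 1/\gamma$; the function name `merton_weights` and all input numbers are hypothetical and not taken from the text.

```python
import numpy as np

def merton_weights(alpha, cov, r, gamma):
    """Return (risky weights, risk-free weight) from w* = m * V (alpha - r 1)."""
    alpha = np.asarray(alpha, dtype=float)
    cov = np.asarray(cov, dtype=float)
    # g = V (alpha - r 1) with V = cov^{-1}; solve() avoids forming the inverse.
    g = np.linalg.solve(cov, alpha - r)
    m = 1.0 / gamma  # assumption: CRRA, so -J_W / (W J_WW) = 1 / gamma
    w_risky = m * g
    return w_risky, 1.0 - w_risky.sum()

# Hypothetical two-asset inputs.
alpha = [0.08, 0.12]
cov = [[0.04, 0.01], [0.01, 0.09]]
w_risky, w_riskfree = merton_weights(alpha, cov, r=0.03, gamma=2.0)

# One risky asset: reduces to (alpha - r) / (gamma * sigma^2).
w_single, _ = merton_weights([0.08], [[0.04]], r=0.03, gamma=2.0)
print(w_risky, w_riskfree, w_single)
```

With one risky asset the function collapses to $(\alpha - r)/(\gamma\sigma^2)$, which matches the single-asset expression above once $-J_W/(WJ_{WW})$ is replaced by $1/\gamma$.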

The solutions obtained in the previous sections are very profound
indeed. We have seen in the previous section that the most general
form of demand function for individual asset k is (5.11). When there
is a risk-free interest rate, we have (5.15). How does each of the
variables, h, m, g and f , impact on the investment weight wk∗ ?

In Eq. (5.11), $h_k(P,t) = \frac{1}{\Gamma}\sum_{j=1}^{n}\nu_{kj}$ consists of elements of the inverse covariance matrix only and is independent of the utility function and the price process. It is scaled in such a manner that $\mathbf{1}'\mathbf{h} = 1$, while all the elements in $\mathbf{g}$ and $\mathbf{f}$ sum to 0, i.e. $\mathbf{1}'\mathbf{g} = \mathbf{1}'\mathbf{f} = 0$. Indeed, $\mathbf{h}$ ensures that the portfolio weights sum to 1 and, in the absence of (excess) returns, that the investment weights reflect each asset's relative risk contribution and diversification impact. The smaller the covariance contribution of asset $k$ (or the greater its covariance inverse), the bigger is the investment weight for $k$. In the presence of a risk-free interest rate, the need for the weights to sum to 1 disappears together with the term $h_k(P,t)$.

In the case when there is a risk-free asset, $g_k = \sum_{j=1}^{n}\nu_{kj}(\alpha_j - r)$. When there is only one risky asset, $g = (\alpha - r)/\sigma^2$. Clearly, $g$ in (5.12) is the multi-asset equivalent of the Sharpe ratio. It is the excess returns of all assets scaled by the respective covariance risk contribution due to $k$. The higher the excess return to risk ratio, $g$, the bigger is the investment weight for $k$. When there is no risk-free asset, the calculation is a bit more tedious; it involves calculating $\nu_{ij}\alpha_j$ for all $i$ and all $j$ to get a mean, and then calculating $\sum_{j=1}^{n}\nu_{kj}\alpha_j$ net of this mean scaled by the covariance contribution of $k$.

The multiplier of $g$, $m = -J_W/(J_{WW}W)$, is the equivalent of the inverse of the RRA. When multiplying $g$ by $m$, the (excess) returns are scaled by $J_W$ (or $U'$), and the covariances are scaled by $J_{WW}$ (or $U''$).
In the presence of a risk-free interest rate, $f_k(P,W,t) \equiv -\frac{1}{WJ_{WW}}J_{kW}P_k$. If asset price $k$ is driven by some state variables that are not related to $W$, then $J_{kW} = 0$, and $f_k(P,W,t) = 0$. When the price process is GBM with constant mean and variance, $f_k(P,W,t) = 0$ also. In the case of a stochastic interest rate, where the interest rate changes the investment opportunity set and hence $W$, the impact of the changes in the interest rate is captured by $f_k$. In this case, the measure for $f_k$ also has to be de-meaned in the same manner as $g$ above.
5.4 Lognormality and Mutual Fund Theorem

Section 5.4.1 shows that the GBM assumption is sufficient to lead to the
mutual fund separation theorem. Specifically, when the underlying
assets are all GBM, we need only two funds to span the asset space.
In the special case when the risk-free asset is available, the result in
Sec. 5.4.3 will lead us to the famous Capital Market Line.
5.4.1 “Separation” or “mutual-fund” theorem

The Separation Theorem states that there is a unique[3] pair of mutual funds constructed from linear combinations of $n$ assets such that, independent of preferences, wealth distribution, or time horizon, individuals will be indifferent between choosing from a linear combination of these two funds or a linear combination of the original $n$ assets.

[3] Here, unique means non-arbitrary and up to a non-singular transformation; it does not mean there is only one.
A sufficient condition for the separation theorem to hold is that
the asset prices are lognormally distributed. Under the GBM assump-
tion where αk (P, t) = αk and σk (P, t) = σk , i.e. the mean and the
variance rates of assets are both constant, we have the special case of
multivariate lognormal prices. Since $\alpha_k$ and $\sigma_k$ are not functions of $P_t$, the third term on the RHS of (5.11), which involves $J_i \equiv \partial J/\partial P_i$ and $J_k \equiv \partial J/\partial P_k$ (through the cross-derivatives $J_{iW}$ and $J_{kW}$), drops out. This is in fact a special case of Sec. 5.3.1 with the solution given in equation (5.8) but without the third term on the RHS. Hence, from (5.8),
$$\mathbf{w}^* = \mathbf{h} + m\,\mathbf{g}. \tag{5.19}$$
Equation (5.19) is a parametric representation of a straight line in the hyperplane defined by $\sum_{k=1}^{n}w_k^* = 1$ (see Cass and Stiglitz, 1970,
p. 15). Any position on this line can be identified by two orthogonal
components. This implies that there exist two linearly independent
vectors (namely, the vectors of asset proportions held by the two
mutual funds) which form a basis for all optimal portfolios chosen
by the individuals. Therefore, each individual would be indifferent
between choosing a linear combination of the mutual fund shares or
a linear combination of the original n assets.
Let us denote the two funds as $\delta$ and $\lambda$. If $\delta_k$ is the weight of one mutual fund's investment in the $k$th asset, and $\lambda_k$ is the weight of the other mutual fund's investment in the $k$th asset such that $\sum_k\delta_k = \sum_k\lambda_k = 1$, then
$$\delta_k = h_k + \frac{1-\eta}{\nu}g_k, \tag{5.20}$$
$$\lambda_k = h_k - \frac{\eta}{\nu}g_k, \tag{5.21}$$
where $\nu$, $\eta$ are arbitrary constants with $\nu \neq 0$. The separation is complete because the actual fund investment decisions, $\delta_k$ and $\lambda_k$, are functions of $h_k$ and $g_k$ (both distributional parameters) and two arbitrary constants ($\eta$ and $\nu$). The investors' preferences, wealth distribution or age do not influence $\delta_k$ and $\lambda_k$, for $k = 1, \ldots, n$.
Now we have an investor who holds an “$a$” position in mutual fund $\delta$ and a “$1 - a$” position in mutual fund $\lambda$. Instead of holding the two mutual funds, the investor could hold $w_k^*$ in asset $k$, with $\sum_k w_k^* = 1$. Given that the individual is indifferent between these mutual fund holdings and an optimal portfolio chosen from the original $n$ assets, it must be that
$$w_k^* = \underbrace{h_k + m(W,t)\,g_k}_{\text{as before}} = \underbrace{a\delta_k + (1-a)\lambda_k}_{\text{indifference}}. \tag{5.22}$$
Next solve for “$a$”, the investment weight in mutual fund $\delta$:
$$h_k + m(W,t)\,g_k = ah_k + \frac{a(1-\eta)}{\nu}g_k + (1-a)h_k - \frac{(1-a)\eta}{\nu}g_k,$$
$$m(W,t) = \frac{a(1-\eta)}{\nu} - \frac{(1-a)\eta}{\nu},$$
$$\nu\,m(W,t) = a - a\eta - \eta + a\eta,$$
$$a = \nu\,m(W,t) + \eta, \qquad \nu \neq 0. \tag{5.23}$$
Since $m(W,t) = -J_W/(J_{WW}W)$ is the inverse of RRA, the fraction of
wealth invested in the first mutual fund a (and hence the fraction
of wealth invested in the second mutual fund, 1 − a), is also a func-
tion of the individual utility preference U and wealth W at time t.
Hence, individual investor’s preference dictates how she will invest in
mutual funds, but such individual preference has no bearing on how
mutual funds should manage their funds. This is the key essence of
the separation theorem in finance.
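The algebra in (5.20) to (5.23) can be checked numerically. The sketch below builds $\mathbf{h}$ and $\mathbf{g}$ from (5.8), forms the two funds for arbitrary constants $\eta$ and $\nu$, and confirms that holding $a = \nu m + \eta$ in fund $\delta$ reproduces $\mathbf{h} + m\mathbf{g}$. All numbers (expected returns, covariances, $\eta$, $\nu$ and the value of $m$) are hypothetical choices made only for this illustration.

```python
import numpy as np

def h_and_g(alpha, cov):
    """h = V1/Gamma and g = V alpha - V1 (1'V alpha)/Gamma, as in (5.8)."""
    V = np.linalg.inv(cov)
    one = np.ones(len(alpha))
    Gamma = one @ V @ one
    h = V @ one / Gamma
    g = V @ alpha - (V @ one) * (one @ V @ alpha) / Gamma
    return h, g

# Hypothetical three-asset market.
alpha = np.array([0.06, 0.10, 0.14])
cov = np.array([[0.04, 0.01, 0.00],
                [0.01, 0.09, 0.02],
                [0.00, 0.02, 0.16]])
h, g = h_and_g(alpha, cov)

eta, nu = 0.3, 2.0                  # arbitrary constants, nu != 0
delta = h + (1.0 - eta) / nu * g    # fund weights from (5.20)
lam = h - eta / nu * g              # fund weights from (5.21)

m = 0.5                             # stands in for one investor's -J_W / (J_WW W)
a = nu * m + eta                    # share held in fund delta, from (5.23)
w_direct = h + m * g                # optimal weights chosen from the n assets
w_funds = a * delta + (1.0 - a) * lam
print(np.allclose(w_direct, w_funds))
```

Changing `m` changes only `a`; the fund compositions `delta` and `lam` stay fixed, which is the separation property described above.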
5.4.2 Key assumptions and uniqueness

The mutual fund separation theorem above provides a generaliza-
tion of the classical Tobin–Markowitz separation theorem. As it is
pointed out by Cass and Stiglitz (1970), Tobin and Markowitz’s
result establishes that the investor’s portfolio problem can be divided
in two stages: first, to choose the optimum mix of the risky assets,
and, second, to decide how his wealth is split between the portfolio
of risky assets and a riskless asset. However, their result relies on
the assumption of a quadratic utility function. Merton's two-fund separation theorem only requires the assumption of log-normality of asset prices; there is no requirement on the type of investors' preferences. Furthermore, Merton's two-fund separation theorem is more general than Tobin and Markowitz's results because it does not require one of the two funds to be a riskless asset, and it includes Tobin and Markowitz's result as a special case.
Uniqueness
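The pair of mutual funds in Sec. 5.4.1 is unique. Recall that “unique” here means non-arbitrary and up to a non-singular transformation. The proof below shows how the mutual funds' investment rules (5.20) and (5.21) are derived. First, we want to establish that $\mathbf{h}$ and $\mathbf{g}$ in (5.19) are orthogonal. By the Gram–Schmidt process, if $\mathbf{g}$ is orthogonal to $\mathbf{h}$, then the projection of $\mathbf{g}$ on $\mathbf{h}$ must be zero:
$$\operatorname{proj}_{\mathbf{h}}\mathbf{g} = \frac{\mathbf{h}'\mathbf{g}}{\mathbf{h}'\mathbf{h}}\mathbf{h} = \frac{\mathbf{h}'\mathbf{g}}{\|\mathbf{h}\|^2}\mathbf{h} = \mathbf{0}, \tag{5.24}$$
where $\|\mathbf{h}\|^2 = \mathbf{h}'\mathbf{h}$ is the squared norm of $\mathbf{h}$. To prove (5.24), we need to prove $\mathbf{h}'\mathbf{g} = 0$ since all the other elements are not equal to zero.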
The matrix $V$, if non-singular, can be written as
$$V = Q\Lambda Q^{-1},$$
where $\Lambda$ is a diagonal matrix whose diagonal elements are the eigenvalues of $V$, and the columns of $Q$ are the eigenvectors of $V$ with
$$Q^{-1} = Q' \quad\text{and}\quad QQ^{-1} = QQ' = I$$
by the orthonormality of $Q$. This means that
$$(V\mathbf{1})' = (Q\Lambda Q^{-1}\mathbf{1})' = \mathbf{1}'Q\Lambda Q^{-1} = \mathbf{1}'V$$
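and from Eq. (5.8)
$$\mathbf{h}'\mathbf{g} = \frac{\mathbf{1}'V}{\Gamma}\left(V\boldsymbol{\alpha} - \frac{V\mathbf{1}\,\mathbf{1}'V}{\Gamma}\boldsymbol{\alpha}\right) = \frac{\mathbf{1}'V}{\mathbf{1}'V\mathbf{1}}\left(V - \frac{V\mathbf{1}\,\mathbf{1}'V}{\mathbf{1}'V\mathbf{1}}\right)\boldsymbol{\alpha} = \frac{\mathbf{1}'V}{\mathbf{1}'V\mathbf{1}}\,V\left(I - \frac{\mathbf{1}\,\mathbf{1}'V}{\mathbf{1}'V\mathbf{1}}\right)\boldsymbol{\alpha}, \tag{5.25}$$
where $I$ is the identity matrix. From (5.9), we have $\mathbf{1}'\mathbf{g} = 0$ or
$$\mathbf{1}'V\left(I - \frac{\mathbf{1}\,\mathbf{1}'V}{\mathbf{1}'V\mathbf{1}}\right)\boldsymbol{\alpha} = 0.$$
Since $V \neq 0$ and $\boldsymbol{\alpha} \neq \mathbf{0}$,
$$I - \frac{\mathbf{1}\,\mathbf{1}'V}{\mathbf{1}'V\mathbf{1}} = 0.$$
Applying this result to (5.25), we get
$$\mathbf{h}'\mathbf{g} = 0$$
and hence Eq. (5.24) is true; $\mathbf{h}$ and $\mathbf{g}$ are orthogonal.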

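Since $\mathbf{h}$ and $\mathbf{g}$, which span the set $\sum_{k=1}^{n}w_k^* = 1$, are orthogonal, there must be another unique pair of orthogonal vectors that will span the same set. Here, we want to show that $\boldsymbol{\delta}$ and $\boldsymbol{\lambda}$ in Sec. 5.4.1 are the other unique pair. To prove this, first express (5.22) in matrix form,
$$\mathbf{w}^* = \begin{bmatrix}\boldsymbol{\delta} & \boldsymbol{\lambda}\end{bmatrix}\begin{bmatrix}a \\ 1-a\end{bmatrix}. \tag{5.26}$$
If $\boldsymbol{\delta}$ and $\boldsymbol{\lambda}$ are orthogonal,
$$\boldsymbol{\delta}'\boldsymbol{\lambda} = 0, \quad\text{or}\quad \sum_{k=1}^{n}\delta_k\lambda_k = 0.$$
From (5.26),
$$\begin{bmatrix}\boldsymbol{\delta}' \\ \boldsymbol{\lambda}'\end{bmatrix}\mathbf{w}^* = \begin{bmatrix}\boldsymbol{\delta}' \\ \boldsymbol{\lambda}'\end{bmatrix}\begin{bmatrix}\boldsymbol{\delta} & \boldsymbol{\lambda}\end{bmatrix}\begin{bmatrix}a \\ 1-a\end{bmatrix} = \begin{bmatrix}\boldsymbol{\delta}'\boldsymbol{\delta} & \boldsymbol{\delta}'\boldsymbol{\lambda} \\ \boldsymbol{\lambda}'\boldsymbol{\delta} & \boldsymbol{\lambda}'\boldsymbol{\lambda}\end{bmatrix}\begin{bmatrix}a \\ 1-a\end{bmatrix},$$
$$\begin{bmatrix}\boldsymbol{\delta}'\mathbf{w}^* \\ \boldsymbol{\lambda}'\mathbf{w}^*\end{bmatrix} = \begin{bmatrix}\boldsymbol{\delta}'\boldsymbol{\delta} & 0 \\ 0 & \boldsymbol{\lambda}'\boldsymbol{\lambda}\end{bmatrix}\begin{bmatrix}a \\ 1-a\end{bmatrix},$$
$$\begin{bmatrix}a \\ 1-a\end{bmatrix} = \begin{bmatrix}\boldsymbol{\delta}'\boldsymbol{\delta} & 0 \\ 0 & \boldsymbol{\lambda}'\boldsymbol{\lambda}\end{bmatrix}^{-1}\begin{bmatrix}\boldsymbol{\delta}'\mathbf{w}^* \\ \boldsymbol{\lambda}'\mathbf{w}^*\end{bmatrix} = \begin{bmatrix}\dfrac{1}{\boldsymbol{\delta}'\boldsymbol{\delta}} & 0 \\ 0 & \dfrac{1}{\boldsymbol{\lambda}'\boldsymbol{\lambda}}\end{bmatrix}\begin{bmatrix}\boldsymbol{\delta}'\mathbf{w}^* \\ \boldsymbol{\lambda}'\mathbf{w}^*\end{bmatrix} = \begin{bmatrix}\dfrac{\boldsymbol{\delta}'\mathbf{w}^*}{\boldsymbol{\delta}'\boldsymbol{\delta}} \\[2ex] \dfrac{\boldsymbol{\lambda}'\mathbf{w}^*}{\boldsymbol{\lambda}'\boldsymbol{\lambda}}\end{bmatrix}.$$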
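Taking the value of $a$ from above, the definition of $\mathbf{w}^*$ in (5.22), and the expression for $a$ in Eq. (5.23), we have
$$a = \frac{\boldsymbol{\delta}'\mathbf{h}}{\boldsymbol{\delta}'\boldsymbol{\delta}} + \frac{\boldsymbol{\delta}'\mathbf{g}}{\boldsymbol{\delta}'\boldsymbol{\delta}}m = \nu m + \eta,$$
where
$$\nu \equiv \frac{\boldsymbol{\delta}'\mathbf{g}}{\boldsymbol{\delta}'\boldsymbol{\delta}} \quad\text{and}\quad \eta \equiv \frac{\boldsymbol{\delta}'\mathbf{h}}{\boldsymbol{\delta}'\boldsymbol{\delta}}. \tag{5.27}$$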
Substitute this value of a back into (5.22),
wk∗ = (νm + η) δk + (1 − νm − η) λk
= (νδk − νλk ) m + (1 − η) λk + ηδk . (5.28)
Again from (5.22),
gk = νδk − νλk , (5.29)

hk = (1 − η) λk + ηδk . (5.30)
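Applying (5.30)$\times\nu$ $-$ (5.29)$\times\eta$, the $\delta_k$ terms cancel and we have
$$-\eta g_k + \nu h_k = \eta\nu\lambda_k + \nu(1-\eta)\lambda_k = \nu\lambda_k,$$
$$\lambda_k = h_k - \frac{\eta}{\nu}g_k. \tag{5.31}$$
Substituting this into (5.29), we have
$$\delta_k = \frac{1}{\nu}g_k + h_k - \frac{\eta}{\nu}g_k = h_k + \frac{(1-\eta)}{\nu}g_k. \tag{5.32}$$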
Hence, δk and λk can be fully determined by asset distribution param-
eters and are not affected by investor’s utility function. It is clear
that once δk (for k = 1, . . . , n) are determined, ν and η are fixed
from (5.27), and vice versa. The values for λk (for k = 1, . . . , n) are
fully determined once δk , ν and η are fixed. That is given δ (and ν,
η), there is a unique solution for λ. This completes the proof.
5.4.3 Tobin–Markowitz separation theorem

Here, we want to show that when the risk-free asset is available, the Tobin–Markowitz result, whereby one of the mutual funds contains only the risk-free asset and the other mutual fund contains only risky assets, can be obtained as a special case. When one of the assets is risk-free and asset prices are lognormal, $h_k = 0$, and we have from (5.20) and (5.21)
$$\delta_k = \frac{1-\eta}{\nu}g_k, \qquad \lambda_k = -\frac{\eta}{\nu}g_k.$$
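For the two mutual funds, $\sum_{k=1}^{n+1}\delta_k = 1$, so
$$\delta_{n+1} = 1 - \sum_{k=1}^{n}\delta_k = 1 - \sum_{k=1}^{n}\frac{1-\eta}{\nu}g_k = 1 - \frac{1-\eta}{\nu}\sum_{k=1}^{n}\left(\sum_{j=1}^{n}\nu_{kj}(\alpha_j - r)\right)$$
and similarly, since $\lambda_k = -\frac{\eta}{\nu}g_k$,
$$\lambda_{n+1} = 1 - \sum_{k=1}^{n}\lambda_k = 1 + \frac{\eta}{\nu}\sum_{k=1}^{n}\left(\sum_{j=1}^{n}\nu_{kj}(\alpha_j - r)\right).$$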
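Given $\mathbf{h} = \mathbf{0}$, from (5.27) $\eta = 0$ and $\nu = \sum_{i=1}^{n}\sum_{j=1}^{n}\nu_{ij}(\alpha_j - r)$, we get the two mutual funds $\lambda$ and $\delta$:
$$\lambda_k = -\frac{\eta}{\nu}g_k = 0,$$
$$\lambda_{n+1} = 1 - \sum_{k=1}^{n}\lambda_k = 1, \quad\text{i.e. only the risk-free asset,}$$
$$\delta_k = \frac{1}{\nu}g_k = \frac{\sum_{j=1}^{n}\nu_{kj}(\alpha_j - r)}{\sum_{i=1}^{n}\sum_{j=1}^{n}\nu_{ij}(\alpha_j - r)}, \tag{5.33}$$
$$\delta_{n+1} = 1 - \sum_{k=1}^{n}\delta_k = 1 - 1 = 0, \quad\text{i.e. no risk-free asset.}$$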
This means that mutual fund λ holds only the risk-free asset, while
mutual fund δ holds only the risky assets.
For the risky asset k, k = 1, . . . , n, it is possible to show that δk in
(5.33) is derived by finding the locus of points in the mean-standard
deviation space of composite returns which minimise variance for a
given mean, i.e. the risky efficient frontier, and then by finding the
point where a line drawn from the point (0, r) is tangential to the
locus.
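[Figure: locus of minimum variance for a given mean, with expected return on the vertical axis and standard deviation of return on the horizontal axis; the line drawn from $(0, r)$ touches the locus at $(\sigma^*, \alpha^*)$.]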
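From Ingersoll (1987, p. 89), the minimum variance portfolio tangential to the efficient frontier has investment weights
$$\mathbf{w} = \frac{\Omega^{-1}(\boldsymbol{\alpha} - r\mathbf{1})}{\mathbf{1}'\Omega^{-1}\boldsymbol{\alpha} - (\mathbf{1}'\Omega^{-1}\mathbf{1})\,r} = \frac{\Omega^{-1}(\boldsymbol{\alpha} - r\mathbf{1})}{\mathbf{1}'\Omega^{-1}(\boldsymbol{\alpha} - r\mathbf{1})}. \tag{5.34}$$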
We noted from above that when the risky assets follow GBM
and there is a risk-free asset, we obtain Tobin–Markowitz separation
theorem, where one of the funds (λ) is risk-free and the other (δ)
holds only risky assets. So we want to prove that δ has the same
form as (5.34). In particular, from (5.33), the weight on asset k is
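$$\delta_k = \frac{\sum_{j=1}^{n}\nu_{kj}(\alpha_j - r)}{\sum_{i=1}^{n}\sum_{j=1}^{n}\nu_{ij}(\alpha_j - r)}$$
and in matrix form
$$\boldsymbol{\delta} = \frac{\Omega^{-1}(\boldsymbol{\alpha} - r\mathbf{1})}{\mathbf{1}'\Omega^{-1}(\boldsymbol{\alpha} - r\mathbf{1})},$$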
which is identical to Eq. (5.34). This result shows that mutual fund
“δ” corresponds to the point where the line joining itself to mutual
fund “λ” (risk-free rate) is tangential to the efficient frontier. This
line is known as the Capital Market Line.
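To make the tangency result concrete, the following sketch evaluates the normalised weights $\Omega^{-1}(\boldsymbol{\alpha} - r\mathbf{1})/\mathbf{1}'\Omega^{-1}(\boldsymbol{\alpha} - r\mathbf{1})$ and compares their Sharpe ratio with that of randomly drawn fully invested portfolios. The inputs are hypothetical, and the random comparison is only an informal check under the assumption that $\mathbf{1}'\Omega^{-1}(\boldsymbol{\alpha} - r\mathbf{1}) > 0$.

```python
import numpy as np

def tangency_weights(alpha, cov, r):
    """Weights proportional to cov^{-1} (alpha - r 1), normalised to sum to 1."""
    z = np.linalg.solve(cov, alpha - r)
    return z / z.sum()

def sharpe(w, alpha, cov, r):
    """Excess return per unit of standard deviation for weights w."""
    return (w @ alpha - r) / np.sqrt(w @ cov @ w)

# Hypothetical inputs.
alpha = np.array([0.06, 0.10, 0.14])
cov = np.array([[0.04, 0.01, 0.00],
                [0.01, 0.09, 0.02],
                [0.00, 0.02, 0.16]])
r = 0.03

delta = tangency_weights(alpha, cov, r)
slope_cml = sharpe(delta, alpha, cov, r)   # slope of the Capital Market Line

# Informal check: no random fully invested portfolio has a steeper slope.
rng = np.random.default_rng(0)
trials = rng.dirichlet(np.ones(3), size=1000)
best_trial = max(sharpe(w, alpha, cov, r) for w in trials)
print(delta, slope_cml, best_trial)
```

The slope obtained this way equals $\sqrt{(\boldsymbol{\alpha} - r\mathbf{1})'\Omega^{-1}(\boldsymbol{\alpha} - r\mathbf{1})}$, a standard closed form for the maximum Sharpe ratio.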
Exercises: Optimum Demand and Mutual Fund Separation
1. Given the demand function for individual asset k in (5.11), and (5.15)
when there is a risk-free asset, interpret the variables h, m, g and
f . Explain how each of these variables impact on the investment
weight wk∗ . What is the implication of the fact that h and g are
orthogonal? [Hint: You may like to use the special case for n = 2
to illustrate your answers.]
2. Define Separation Theorem. What is the significance of this theo-
rem? What are the possible conditions that will lead to this sep-
aration theorem (GBM or otherwise)?
3. What are the assumptions and conditions needed to obtain the
Capital Market Line? What are the connections between mutual
fund theorem (with and without risk-free asset) and the Capital
Market Line?
Chapter 6
Mean–Variance Frontier
In this chapter, we revisit the derivation of the mean–variance frontier following Cochrane (2005, Chapter 5), and show how the theorems are used to derive the Hansen–Jagannathan bounds.
6.1 Mean–Variance Frontier

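Starting from the basic equation
$$1 = E(mR) = \mathrm{Cov}(m,R) + E(m)E(R) = \rho_{m,R}\,\sigma(R)\sigma(m) + E(m)E(R).$$
Since $E(m) = 1/R_f$, dividing by $E(m)$, we can write the previous equation as
$$\frac{1}{E(m)} = \rho_{m,R}\,\sigma(R)\frac{\sigma(m)}{E(m)} + E(R),$$
$$R_f = \rho_{m,R}\,\sigma(R)\frac{\sigma(m)}{E(m)} + E(R),$$
$$E(R) - R_f = -\rho_{m,R}\,\sigma(R)\frac{\sigma(m)}{E(m)},$$
and, since the correlation coefficient satisfies $|\rho_{m,R}| \le 1$,
$$\frac{E(R) - R_f}{\sigma(R)} \le \frac{\sigma(m)}{E(m)} \quad\text{if } \sigma(R) \neq 0.$$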
From these equations, we can see that means and variances of asset returns must lie in the wedge-shaped region illustrated in Cochrane (2005, Fig. 1.1, p. 18). The boundary of this region is called the mean–variance frontier; it corresponds to the question “What is the maximum level of expected return for a given level of standard deviation (risk)?” The frontier is generated by $|\rho_{m,R}| = 1$, i.e. by the assets for which $|E(R) - R_f| = \frac{\sigma(m)}{E(m)}\sigma(R)$. The slope of this frontier is $\frac{\sigma(m)}{E(m)}$ for the upper segment and $-\frac{\sigma(m)}{E(m)}$ for the lower segment. Moreover, the intercept of the frontier is found by setting $\sigma(R) = 0$ in the line equation, where $E(R) = R_f$.
The fact that |ρm,R | = 1 on the frontier means that all returns on
the frontier are perfectly correlated with the discount factor m. The
upper segment contains the assets with perfectly negatively corre-
lated returns with the SDF (i.e. perfectly positively correlated with
consumption). On the other hand, the lower segment contains the
assets with perfectly positively correlated returns with the SDF (neg-
atively correlated with consumption). These can provide the best
insurance against consumption fluctuation but obviously yield lower
expected returns for the same level of standard deviation relative to
the assets on the upper segment.
All frontier returns are also perfectly correlated with each other,
since they are perfectly correlated with the SDF m. As a result, we
can span or synthesise any other frontier return (asset that lies on the
frontier) from two such returns. For example, for any single frontier
return Rm , all other frontier returns Rmv become
Rmv = Rf + a(Rm − Rf ),
for some number a.

Since each point on the mean–variance frontier is perfectly cor-
related with the discount factor, we can write
m = a + bRmv
Rmv = d + em,
for some constants a, b, d and e.

Any mean–variance efficient return carries all pricing

information. Given a mean–variance efficient return and the risk-
free rate, we can find a discount factor that prices all assets and vice
versa.
We can plot the decomposition of a return R into a “priced” or
“systematic” component and a “residual” or “idiosyncratic” com-
ponent as in Cochrane (2005, Fig. 1.1, p. 18). The priced part is
perfectly correlated with the discount factor, and hence perfectly
correlated with any frontier return. The idiosyncratic part generates
no expected return, and it is uncorrelated with the discount factor
or any frontier return.
6.1.1 The Sharpe ratio

The ratio of the expected excess return to standard deviation,
$$\frac{E(R) - R_f}{\sigma(R)},$$
is called the Sharpe ratio. It measures the excess return of an asset per unit of risk (measured by standard deviation). Recall that this is equal to the slope of the upper segment of the frontier we saw before, $\frac{E(R) - R_f}{\sigma(R)} = \frac{\sigma(m)}{E(m)}$. Hence, the slope of the mean–standard deviation frontier is the largest available (feasible) Sharpe ratio. It corresponds to the question “how much mean return can I get for that level of risk?”
For any frontier (upper or lower) return $R^{mv}$,
$$\frac{|E(R^{mv}) - R_f|}{\sigma(R^{mv})} = \frac{\sigma(m)}{E(m)} = \sigma(m)R_f. \tag{6.1}$$
Thus, the slope of the frontier and hence the level of the Sharpe ratio is governed by the volatility (standard deviation) of the discount factor, $\sigma(m)$.
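For the power utility function, $m = \beta\left(\frac{c_{t+1}}{c_t}\right)^{-\gamma}$; substituting into (6.1) gives
$$\frac{|E(R^{mv}) - R_f|}{\sigma(R^{mv})} = \frac{\sigma\left[\left(\frac{c_{t+1}}{c_t}\right)^{-\gamma}\right]}{E\left[\left(\frac{c_{t+1}}{c_t}\right)^{-\gamma}\right]}.$$
Under the assumption of lognormality for consumption growth, we get
$$\frac{|E(R^{mv}) - R_f|}{\sigma(R^{mv})} = \sqrt{e^{\gamma^2\sigma^2(\Delta\ln c_{t+1})} - 1} \approx \gamma\,\sigma(\Delta\ln c). \tag{6.2}$$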
σ(Rmv )
Equation (6.2) shows the slope of the mean–standard deviation fron-
tier is higher if consumption is more volatile or if investors are more
risk averse.
In both cases, investors want a greater reward for taking on an
extra unit of risk, and hence the slope of the frontier is higher. If
the market portfolio yields a high Sharpe ratio, then this is either
because consumption growth is very volatile or because investors are
very risk averse.
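A quick way to get a feel for (6.2) is to evaluate the exact lognormal expression next to its linear approximation. The sketch below does this; the function names and the parameter values passed to them are illustrative assumptions, not figures from the text.

```python
import math

def max_sharpe_exact(gamma, sigma_c):
    """sqrt(exp(gamma^2 sigma_c^2) - 1), the lognormal expression in (6.2)."""
    return math.sqrt(math.expm1(gamma**2 * sigma_c**2))

def max_sharpe_approx(gamma, sigma_c):
    """First-order approximation gamma * sigma_c from (6.2)."""
    return gamma * sigma_c

# Hypothetical inputs: risk aversion 2 and 50, consumption growth volatility 2%.
for gamma in (2.0, 50.0):
    exact = max_sharpe_exact(gamma, 0.02)
    approx = max_sharpe_approx(gamma, 0.02)
    print(gamma, exact, approx)
```

The approximation is tight when $\gamma\sigma$ is small and understates the exact value as $\gamma\sigma$ grows, because $e^{x} - 1 > x$ for $x > 0$.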
6.1.2 Calculating the mean–variance frontier

Let $w$ denote the $n \times 1$ vector of weights on each of the $n$ risky assets, $\Sigma$ the $n \times n$ variance–covariance matrix of the $n$ assets, and $E(r) \equiv E$ the $n \times 1$ vector of expected returns of the $n$ assets. The portfolio optimisation problem has a solution as long as the covariance matrix $\Sigma$ is not singular (and hence it is invertible). In finance terms, this assumption means that there are no redundant assets. Moreover, $\sum_{i}^{n}w_i = w^T\mathbf{1} = 1$, where $\mathbf{1}$ is an $n \times 1$ vector of ones. Portfolio return and variance are denoted by
$$\sum_{i}^{n}w_iE(r_i) = w^TE = E(r_p) \equiv \mu,$$
$$\sigma_p^2 = w^T\Sigma w.$$
Formally, the risk-averse investor wants to minimise the variance of his portfolio returns for every level of expected return:
$$\min_{w}\; w^T\Sigma w \quad\text{s.t. } w^TE = \mu \text{ and } w^T\mathbf{1} = 1.$$
The Lagrangian function of this constrained minimisation problem is given by
$$L = w^T\Sigma w - 2\lambda\left(w^TE - \mu\right) - 2\delta\left(w^T\mathbf{1} - 1\right).$$
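Taking the FOC with respect to $w$ and the Lagrangian multipliers $2\lambda$ and $2\delta$, we get[1]
$$\frac{\partial L}{\partial w} = 2\Sigma w^* - 2\lambda E - 2\delta\mathbf{1} = 0, \qquad w^* = \Sigma^{-1}(\lambda E + \delta\mathbf{1}), \tag{6.3}$$
$$\frac{\partial L}{\partial(2\lambda)} = w^{*T}E - \mu = 0,$$
$$\frac{\partial L}{\partial(2\delta)} = w^{*T}\mathbf{1} - 1 = 0.$$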
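Substituting $w^*$ from (6.3) into the second and the third equations, we get
$$E^T\Sigma^{-1}(\lambda E + \delta\mathbf{1}) = \mu,$$
$$\mathbf{1}^T\Sigma^{-1}(\lambda E + \delta\mathbf{1}) = 1,$$
which can be written as
$$\begin{bmatrix}E^T\Sigma^{-1}E & E^T\Sigma^{-1}\mathbf{1} \\ \mathbf{1}^T\Sigma^{-1}E & \mathbf{1}^T\Sigma^{-1}\mathbf{1}\end{bmatrix}\begin{bmatrix}\lambda \\ \delta\end{bmatrix} = \begin{bmatrix}\mu \\ 1\end{bmatrix}.$$
Define $A \equiv E^T\Sigma^{-1}E$, $B \equiv E^T\Sigma^{-1}\mathbf{1} = \mathbf{1}^T\Sigma^{-1}E$, $C \equiv \mathbf{1}^T\Sigma^{-1}\mathbf{1}$. Then
$$\begin{bmatrix}A & B \\ B & C\end{bmatrix}\begin{bmatrix}\lambda \\ \delta\end{bmatrix} = \begin{bmatrix}\mu \\ 1\end{bmatrix}, \qquad \begin{bmatrix}\lambda \\ \delta\end{bmatrix} = \frac{1}{AC - B^2}\begin{bmatrix}C & -B \\ -B & A\end{bmatrix}\begin{bmatrix}\mu \\ 1\end{bmatrix},$$
and
$$\lambda = \frac{C\mu - B}{AC - B^2} \quad\text{and}\quad \delta = \frac{A - B\mu}{AC - B^2}.$$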
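[1] If $x$ and $b$ are $n \times 1$ vectors and $A$ is an $n \times n$ symmetric matrix, then the following are true: $\frac{\partial(x^TA)}{\partial x} = A$, $\frac{\partial(x^Tb)}{\partial x} = b$, $\frac{\partial(xb)}{\partial x} = b^T$ and $\frac{\partial(x^TAx)}{\partial x} = 2Ax$.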
Substituting this solution into (6.3), we get
$$w^* = \frac{1}{AC - B^2}\,\Sigma^{-1}\left[E\,(C\mu - B) + \mathbf{1}\,(A - B\mu)\right]. \tag{6.4}$$
So for a given level of portfolio return $\mu$, there is a unique vector of portfolio weights that gives us the portfolio with the minimised variance. After some tedious algebra, we get
$$\sigma_p^2 = w^{*T}\Sigma w^* = \left(\lambda E^T + \delta\mathbf{1}^T\right)\Sigma^{-1}\left(\lambda E + \delta\mathbf{1}\right) = \frac{C\mu^2 - 2B\mu + A}{AC - B^2}.$$
As a result, minimised portfolio variance is a quadratic function
of the mean return μ, i.e. it is a parabola. The square root of a
parabola is a hyperbola, that is why the minimum variance frontier
in the mean–standard deviation space is a hyperbolic region.
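A particularly interesting portfolio for asset management is the globally minimum variance (gmv) portfolio. To find the coordinates of this portfolio, differentiate the previous portfolio variance expression with respect to the mean return and set the derivative to zero (the f.o.c.):
$$\frac{\partial\sigma_p^2}{\partial\mu} = \frac{2C\mu - 2B}{AC - B^2} = 0, \qquad \mu^{gmv} = \frac{B}{C}.$$
The weights of this portfolio are given by
$$w^{gmv} = \frac{\Sigma^{-1}\left[E\left(C\frac{B}{C} - B\right) + \mathbf{1}\left(A - B\frac{B}{C}\right)\right]}{AC - B^2} = \frac{\Sigma^{-1}\mathbf{1}\,\frac{AC - B^2}{C}}{AC - B^2} = \frac{\Sigma^{-1}\mathbf{1}}{C} = \frac{\Sigma^{-1}\mathbf{1}}{\mathbf{1}^T\Sigma^{-1}\mathbf{1}}$$
and the variance itself is given by
$$w^T\Sigma w = \frac{\mathbf{1}^T\Sigma^{-1}\Sigma\Sigma^{-1}\mathbf{1}}{\left(\mathbf{1}^T\Sigma^{-1}\mathbf{1}\right)^2} = \frac{1}{\mathbf{1}^T\Sigma^{-1}\mathbf{1}}.$$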
Hence, we can find the weights of the globally minimum variance
portfolio by only knowing the covariance matrix Σ.
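The closed-form results above translate almost line by line into code. The sketch below computes $A$, $B$, $C$, the frontier weights (6.4), the frontier variance and the gmv portfolio for a hypothetical three-asset market; the helper names and the input numbers are assumptions made for the illustration.

```python
import numpy as np

def frontier_constants(E, Sigma):
    """Return A, B, C together with Sigma^{-1} E and Sigma^{-1} 1."""
    one = np.ones(len(E))
    Si_E = np.linalg.solve(Sigma, E)
    Si_1 = np.linalg.solve(Sigma, one)
    return E @ Si_E, E @ Si_1, one @ Si_1, Si_E, Si_1

def frontier_weights(mu, E, Sigma):
    """Minimum-variance weights for target mean mu, Eq. (6.4)."""
    A, B, C, Si_E, Si_1 = frontier_constants(E, Sigma)
    return (Si_E * (C * mu - B) + Si_1 * (A - B * mu)) / (A * C - B**2)

def frontier_variance(mu, E, Sigma):
    """Minimised variance (C mu^2 - 2 B mu + A) / (A C - B^2)."""
    A, B, C, _, _ = frontier_constants(E, Sigma)
    return (C * mu**2 - 2 * B * mu + A) / (A * C - B**2)

# Hypothetical inputs.
E = np.array([0.06, 0.10, 0.14])
Sigma = np.array([[0.04, 0.01, 0.00],
                  [0.01, 0.09, 0.02],
                  [0.00, 0.02, 0.16]])

mu = 0.09
w = frontier_weights(mu, E, Sigma)

A, B, C, _, Si_1 = frontier_constants(E, Sigma)
mu_gmv = B / C          # mean of the globally minimum variance portfolio
w_gmv = Si_1 / C        # its weights, Sigma^{-1} 1 / (1' Sigma^{-1} 1)
print(w, frontier_variance(mu, E, Sigma), mu_gmv, w_gmv)
```

Sweeping `mu` over a grid and plotting `sqrt(frontier_variance(mu, E, Sigma))` against `mu` traces the hyperbola described in the text.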
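[Figure 6.1: Minimum–variance frontier. Vertical axis: $E(R)$; horizontal axis: $\sigma(R)$. The plot shows the mean–variance frontier starting at $R_f$, the frontier of risky assets, the tangency portfolio of risky assets and the original assets.]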
The mean–variance frontier is essentially the boundary of the set of means and variances of the returns of all possible portfolios.
Any return to the left of the frontier is desirable but infeasible. Any
return to the right is feasible but undesirable (and inefficient). The
upper part of the frontier (see Fig. 6.1) is called the efficient fron-
tier, because it contains the portfolios that yield the highest level
of expected returns for a given level of standard deviation (risk).
The lower segment is sometimes also called “inefficient” because for
each portfolio there we can find its mirror in the upper segment
that has a higher level of expected return for the same level of
variance.
Where the risk-free asset is available, the efficient frontier becomes a line that starts from the expected-return axis at the risk-free rate and is tangential to the upper segment of the frontier of the risky assets at one point (we call it the tangency portfolio).
6.1.3 Decomposing the mean–variance frontier

In this section, we show that we can decompose any portfolio return
in three orthogonal components to characterise the returns on the
frontier. In order to perform this decomposition, we will make use of
two “special” returns. Let us define them as follows.
$R^*$ is the return corresponding to the payoff $x^*$, i.e. the payoff that can act as the discount factor. So the price of $x^*$ is $p(x^*) = E(x^*x^*) = E(x^{*2})$. Therefore, it holds that
$$R^* \equiv \frac{x^*}{p(x^*)} = \frac{x^*}{E(x^{*2})}.$$
$R^{e*}$ is the excess return defined as the projection of 1 onto the space of excess returns, $\underline{R}^e = \{x \in X \text{ s.t. } p(x) = 0\}$:
$$R^{e*} \equiv \operatorname{proj}\left(1 \mid \underline{R}^e\right) = \frac{E(1\cdot R^e)}{E\left(R^{e2}\right)}R^e = \frac{E(R^e)}{E\left(R^{e2}\right)}R^e.$$
$R^{e*}$ is an excess return that represents means (expectations) on $\underline{R}^e$ with an inner product. More specifically, it holds that
$$E\left(R^{e*}R^e\right) = E\left(\frac{E(R^e)}{E\left(R^{e2}\right)}R^eR^e\right) = E(R^e).$$
Let us now state the orthogonal decomposition.

Theorem. Every return $R^i$ can be expressed as
$$R^i = R^* + w^iR^{e*} + n^i, \tag{6.5}$$
where $w^i$ is a number and $n^i$ is an excess return with the property $E(n^i) = 0$.
The three components are orthogonal (in the vector sense, i.e. their inner product is zero), i.e. $E(R^*R^{e*}) = E(R^*n^i) = E(R^{e*}n^i) = 0$.
Most importantly, a return $R^{mv}$ is on the mean–variance frontier if and only if
$$R^{mv} = R^* + wR^{e*}, \tag{6.6}$$
for some real $w$.
Proof. Any excess return has zero price, i.e. it is orthogonal to the discount factor. So $E(R^*R^{e*}) = \frac{E(x^*R^{e*})}{E(x^{*2})} = \frac{p(R^{e*})}{E(x^{*2})} = 0$, because $R^{e*}$ is an excess return by definition, and hence the price of $R^{e*}$ is $p(R^{e*}) = 0$.
For the same reason, $n^i$, being an excess return too, is orthogonal to $R^*$, i.e. $E(R^*n^i) = 0$ because $p(n^i) = 0$. Furthermore, from the definition of $R^{e*}$, it holds that $E(R^{e*}n^i) = E(n^i)$.
Defining $w^i \equiv \frac{E(R^i) - E(R^*)}{E(R^{e*})}$, we can see by taking expectations of both sides of (6.5) that $E(n^i) = 0$, and hence $R^{e*}$ and $n^i$ are orthogonal too, i.e. $E(R^{e*}n^i) = E(n^i) = 0$.
Given these properties, we get the mean and the variance for each return $R^i$:
$$E\left(R^i\right) = E(R^*) + w^iE\left(R^{e*}\right)$$
and
$$\sigma^2\left(R^i\right) = \sigma^2\left(R^* + w^iR^{e*}\right) + \sigma^2\left(n^i\right).$$
We can see that for each level of expected return E(Ri ), variance
is minimised only if ni = 0. Intuitively, this is because the zero-mean
ni does not contribute to expected return but increases variance,
hence it is undesirable. Setting ni = 0 in (6.5), we verify that the
returns on the frontier are of the form (6.6). For each desired level
of expected return E(Ri ), there is a unique wi . Varying wi , we can
construct the entire frontier.
We can utilise this decomposition to see how we can construct the
frontier in the familiar mean–standard deviation space (see Fig. 6.2).
Note the second moment of return is
$$E\left(R^2\right) = E\left(R^{*2}\right) + w^2E\left(R^{e*2}\right) + E\left(n^2\right).$$
This expression is minimised when $n = 0$ (i.e. when we are on the frontier) and when $w = 0$. Hence, we can confirm that $R^*$ is the return on the frontier with the minimum second moment.
Lines of constant second moment returns are represented by circles in the mean–standard deviation space because from the definition of variance, we get
$$\sigma^2(R) = E\left(R^2\right) - [E(R)]^2,$$
$$[E(R)]^2 = E\left(R^2\right) - \sigma^2(R),$$
$$E(R) = \pm\sqrt{E\left(R^2\right) - \sigma^2(R)}.$$
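[Figure 6.2: Decomposition of the minimum variance frontier. Vertical axis: $E(R)$; horizontal axis: $\sigma(R)$. The plot marks $R^*$, a frontier return $R^* + w^iR^{e*}$, and a return $R^i$ displaced from the frontier by $n^i$.]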
Plotting this function for constant E(R2 ) produces a circle in

the mean–standard deviation space. As we previously showed, the
smallest such circle that intersects with the frontier is given by the
return R∗ (see Fig. 6.2).
Having located where R∗ is, if we add Re∗ according to weight w,
we move along the frontier (recall 6.6). If we then add any ni , this
will not affect the level of expected return, but only the standard
deviation. Hence, ni is the idiosyncratic return that moves an asset’s
return Ri to the right of the frontier.
6.1.4 Spanning the frontier

From (6.4), the weights w of the frontier portfolios are a linear func-
tion of μ. Therefore, we can get to any point on the minimum variance
frontier by starting with any two returns on the frontier and form-
ing portfolios. Technically, the frontier is spanned by any two frontier
returns. For example, if we have any two distinct mean returns μ1 and
μ2 , the weights on a third portfolio with mean μ3 = λμ1 + (1 − λ)μ2
are given by w3 = λw1 + (1 − λ)w2 .
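The spanning statement can be verified directly from (6.4), because the weights are linear in $\mu$. The sketch below compares the weights computed for $\mu_3$ with the combination $\lambda w_1 + (1-\lambda)w_2$; the expected returns, covariance matrix, target means and mixing weight are hypothetical values chosen for the example.

```python
import numpy as np

def frontier_weights(mu, E, Sigma):
    """Minimum-variance weights for target mean mu, following Eq. (6.4)."""
    one = np.ones(len(E))
    Si_E = np.linalg.solve(Sigma, E)
    Si_1 = np.linalg.solve(Sigma, one)
    A, B, C = E @ Si_E, E @ Si_1, one @ Si_1
    return (Si_E * (C * mu - B) + Si_1 * (A - B * mu)) / (A * C - B**2)

# Hypothetical inputs.
E = np.array([0.06, 0.10, 0.14])
Sigma = np.array([[0.04, 0.01, 0.00],
                  [0.01, 0.09, 0.02],
                  [0.00, 0.02, 0.16]])

mu1, mu2, lam = 0.07, 0.12, 0.25
w1 = frontier_weights(mu1, E, Sigma)
w2 = frontier_weights(mu2, E, Sigma)

mu3 = lam * mu1 + (1.0 - lam) * mu2
w3_direct = frontier_weights(mu3, E, Sigma)   # solved from scratch
w3_spanned = lam * w1 + (1.0 - lam) * w2      # built from the two funds
print(np.allclose(w3_direct, w3_spanned))
```

The mixing weight `lam` need not lie between 0 and 1; values outside that range reach frontier points beyond the two starting portfolios.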
The previous decomposition shows that this spanning can be
achieved, for example, using portfolios with returns R∗ and Re∗ .
Equation (6.6) is essentially a two-fund theorem that allows us to
express every frontier return as a portfolio of R∗ and Re∗ , with vary-
ing weights on the latter.
Equivalently, we can span the frontier with any two distinct linear
combinations of R∗ and Re∗ . Let us see more formally this property.
Take any return $R^\alpha$:
$$R^\alpha = R^* + \gamma R^{e*} \quad\text{with } \gamma \neq 0,$$
$$R^{e*} = \frac{R^\alpha - R^*}{\gamma}.$$
We can express the minimum variance frontier in terms of $R^\alpha$ and $R^*$:
$$R^{mv} = R^* + wR^{e*} = R^* + w\,\frac{R^\alpha - R^*}{\gamma} = R^* + y\left(R^\alpha - R^*\right) = (1-y)R^* + yR^\alpha,$$
where $y \equiv w/\gamma$. It is important to note that the corresponding portfolio variance is not a linear combination of the individual variances.
6.1.5 Hansen–Jagannathan bounds

Recall the fundamental property for excess returns
$$\frac{|E(R^e)|}{\sigma(R^e)} \le \frac{\sigma(m)}{E(m)}, \tag{6.7}$$
because $|\rho_{m,R^e}| \le 1$ by definition.
Equation (6.7) has been interpreted by Hansen and Jagannathan (1991) as a lower bound on the volatility of $m$. The higher the Sharpe ratio, the tighter is the lower bound on $\sigma(m)$. Indeed, the highest attainable Sharpe ratio will be equal to the ratio $\frac{\sigma(m)}{E(m)}$. Recall that the highest Sharpe ratio is achieved at the tangency portfolio, i.e. the point at which the line starting from the risk-free rate is tangential to the frontier of risky assets. Hence, the slope at the tangency portfolio is equal to $\frac{\sigma(m)}{E(m)}$, see Fig. 6.3.
As we increase $1/E(m)$ (i.e. the risk-free rate), the maximum Sharpe ratio becomes smaller because the excess return becomes smaller, and hence the slope at the tangency portfolio becomes lower and the Hansen–Jagannathan bound decreases. The slope of the tangency portfolio is at its minimum when $1/E(m)$ is equal to the expected return of the globally minimum variance portfolio. This is the minimum value of the Hansen–Jagannathan bound. If we increase $1/E(m)$ any further, then the tangency point is on the lower segment of the frontier (negative slope), but as we take the absolute value of the slope, this starts to increase again and so does the bound, as shown in Fig. 6.3.[2]

Figure 6.3: The correspondence between minimum variance frontiers and Hansen–Jagannathan bounds.
We conclude that there is an interesting duality relationship between discount factors and Sharpe ratios:
$$\min_{\{m \text{ that price } x \in X\}}\frac{\sigma(m)}{E(m)} = \max_{\{\text{all } R^e \text{ in } X\}}\frac{|E(R^e)|}{\sigma(R^e)}.$$
This duality means that, just as we have hyperbolic regions (frontiers) within which all asset returns must lie, all stochastic discount factors must similarly lie within such a hyperbolic region. Hence, we have a mean–standard deviation frontier for discount factors themselves.
[2] Note that if the risk-free rate is given, then $E(m)$ is known and the Hansen–Jagannathan bound is essentially a bound on the volatility of the discount factor $\sigma(m)$.
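When the risk-free rate is given, (6.7) can be turned into a number. The sketch below uses the standard closed form for the largest Sharpe ratio attainable from a set of excess returns, $\sqrt{\mu_e^T\Sigma_e^{-1}\mu_e}$, and multiplies it by $E(m) = 1/R_f$ to get a lower bound on $\sigma(m)$. The function names and the moments supplied are hypothetical.

```python
import numpy as np

def max_sharpe(mean_excess, cov_excess):
    """Largest |E(Re)| / sigma(Re) over portfolios of the given excess returns."""
    mu = np.asarray(mean_excess, dtype=float)
    cov = np.asarray(cov_excess, dtype=float)
    return float(np.sqrt(mu @ np.linalg.solve(cov, mu)))

def hj_lower_bound(mean_excess, cov_excess, Rf):
    """Lower bound on sigma(m) implied by (6.7), using E(m) = 1 / Rf."""
    return max_sharpe(mean_excess, cov_excess) / Rf

# Hypothetical moments of two excess returns and a gross risk-free rate.
mean_excess = [0.05, 0.07]
cov_excess = [[0.04, 0.01], [0.01, 0.09]]
Rf = 1.02

bound = hj_lower_bound(mean_excess, cov_excess, Rf)
single_asset_sharpe = max_sharpe([0.06], [[0.04]])   # |0.06| / 0.2
print(bound, single_asset_sharpe)
```

Adding assets can only raise `max_sharpe`, so the bound on $\sigma(m)$ tightens as more excess returns are included.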
Chapter 7
Solving Black–Scholes with Fourier Transform
7.1 Option Pricing with Fourier Transform

This part of the solution is for equity only and not for risky debt!
Please note that S is the underlying asset and V is the derivative. If
you want to compare the results in this chapter with Chapter 8, the
firm structure S here is equivalent to firm value V, whereas V here
is equivalent to equity or debt F in Chapter 8.
The objective here is to demonstrate the complete derivation of the Black–Scholes formula below for European calls and puts based on the Fourier transform method. Given
$$c = S_0N(d_1) - Ke^{-rT}N(d_2),$$
$$p = Ke^{-rT}N(-d_2) - S_0N(-d_1),$$
$$d_1 = \frac{\ln(S_0/K) + \left(r + \frac{1}{2}\sigma^2\right)T}{\sigma\sqrt{T}}, \tag{7.1}$$
$$d_2 = d_1 - \sigma\sqrt{T},$$
$$N(d_1) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{d_1}e^{-0.5z^2}\,dz,$$
with the stock price following a geometric Brownian motion (GBM)
$$dS_t = \mu S_t\,dt + \sigma S_t\,dZ_t, \tag{7.2}$$
which is basically a lognormal distribution for price and a normal distribution for returns.
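For reference, (7.1) can be evaluated with nothing beyond the standard library. The sketch below is a minimal implementation; the function names and the sample inputs are assumptions for illustration, and put–call parity is used as a sanity check.

```python
import math

def norm_cdf(x):
    """Standard normal CDF N(x), written with the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def black_scholes(S0, K, r, sigma, T):
    """Return (call, put) prices from Eq. (7.1)."""
    d1 = (math.log(S0 / K) + (r + 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    call = S0 * norm_cdf(d1) - K * math.exp(-r * T) * norm_cdf(d2)
    put = K * math.exp(-r * T) * norm_cdf(-d2) - S0 * norm_cdf(-d1)
    return call, put

# Hypothetical inputs.
S0, K, r, sigma, T = 100.0, 100.0, 0.05, 0.2, 1.0
call, put = black_scholes(S0, K, r, sigma, T)

# Put-call parity: c - p = S0 - K exp(-rT).
parity_gap = (call - put) - (S0 - K * math.exp(-r * T))
print(call, put, parity_gap)
```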
7.1.1 Black–Scholes hedge portfolio

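Given that the share price has the GBM dynamics in (7.2), form a portfolio of one option $V(S,t)$ and short $\Delta$ amount of stock. The dynamics of this portfolio are
$$d\Pi = dV - \Delta\,dS = \left(\frac{\partial V}{\partial t} + \frac{1}{2}\sigma^2S^2\frac{\partial^2V}{\partial S^2}\right)dt + \frac{\partial V}{\partial S}dS - \Delta\,dS.$$
The portfolio is fully hedged by setting $\Delta = \frac{\partial V}{\partial S}$; then
$$d\Pi = \left(\frac{\partial V}{\partial t} + \frac{1}{2}\sigma^2S^2\frac{\partial^2V}{\partial S^2}\right)dt.$$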
7.2 Black–Scholes Fundamental PDE

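Under risk neutrality and no arbitrage, a hedged portfolio earns risk-free return
$$
r\Pi\, dt = r\left( V - \frac{\partial V}{\partial S} \cdot S \right) dt.
$$
Therefore, we obtain
$$
\begin{aligned}
d\Pi &= r\Pi\, dt, \\
\frac{\partial V}{\partial t} + \frac{1}{2}\sigma^2 S^2 \frac{\partial^2 V}{\partial S^2} &= rV - r\frac{\partial V}{\partial S} S, \\
\frac{\partial V}{\partial t} + \frac{1}{2}\sigma^2 S^2 \frac{\partial^2 V}{\partial S^2} + r\frac{\partial V}{\partial S} S - rV &= 0.
\end{aligned}
\tag{7.3}
$$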
Equation (7.3) is called the Black–Scholes fundamental PDE, which has been solved in many different ways, including numerical methods and the use of transition probabilities.
7.2.1 Fourier transform

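Characteristic function $\phi_x(u)$ of variable $x$ is defined as
$$
\phi_x(u) = E[e^{iux}] = \int_{-\infty}^{\infty} e^{iux} f(x)\, dx,
$$
where $f(x)$ is the density function of $x$. Take the case of the normal density as an example where
$$
f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{x^2}{2\sigma^2}}, \tag{7.4}
$$
the corresponding characteristic function is
$$
\begin{aligned}
\phi_x(u) &= \int_{-\infty}^{\infty} e^{iux} \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{x^2}{2\sigma^2}}\, dx \\
&= \int_{-\infty}^{\infty} \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{1}{2\sigma^2}\left(x^2 - 2\sigma^2 iux\right)}\, dx \\
&= e^{-\frac{1}{2}\sigma^2 u^2} \int_{-\infty}^{\infty} \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{1}{2\sigma^2}\left[x - \sigma^2 iu\right]^2}\, dx \\
&= e^{-\frac{1}{2}\sigma^2 u^2}.
\end{aligned}
\tag{7.5}
$$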
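On the other hand, we have the inverse Fourier transform
$$
\begin{aligned}
f(x) &= \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-iux} \phi_x(u)\, du \\
&= \frac{1}{\pi} \int_{0}^{\infty} \operatorname{Re}\left[e^{-iux} \phi_x(u)\right] du.
\end{aligned}
$$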
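The use of characteristic function here is to exploit its special property
$$
\begin{aligned}
\frac{\partial \phi}{\partial x} &= \int e^{iux} \frac{\partial f(x)}{\partial x}\, dx = -\int \frac{\partial e^{iux}}{\partial x} f(x)\, dx \\
&= -iu \int e^{iux} f(x)\, dx = (-iu)\phi, \\
\frac{\partial^2 \phi}{\partial x^2} &= -iu \frac{\partial \phi}{\partial x} = (-iu)^2 \phi = -u^2 \phi.
\end{aligned}
$$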
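The transform pair for the normal density can be checked numerically. The sketch below is illustrative only: the helper names, the midpoint-rule integration, and the truncation limits are assumptions chosen for simplicity. It computes the characteristic function of (7.4) by direct integration and recovers the density by the half-line inversion formula, using the fact that the characteristic function in (7.5) is real.

```python
from math import cos, exp, pi, sqrt


def normal_pdf(x, sigma):
    """Zero-mean normal density, equation (7.4)."""
    return exp(-x * x / (2 * sigma ** 2)) / (sigma * sqrt(2 * pi))


def char_fn_numeric(u, sigma, half_width=10.0, n=4000):
    """E[exp(iux)] by the midpoint rule; the imaginary part vanishes by symmetry."""
    lo = -half_width * sigma
    dx = 2 * half_width * sigma / n
    total = 0.0
    for k in range(n):
        x = lo + (k + 0.5) * dx
        total += cos(u * x) * normal_pdf(x, sigma)
    return total * dx


def density_by_inversion(x, sigma, u_max=40.0, n=4000):
    """f(x) = (1/pi) * integral_0^inf Re[exp(-iux) phi(u)] du, with phi from (7.5)."""
    du = u_max / n
    total = 0.0
    for k in range(n):
        u = (k + 0.5) * du
        # phi(u) is real here, so Re[exp(-iux) phi(u)] = cos(ux) phi(u)
        total += cos(u * x) * exp(-0.5 * sigma ** 2 * u ** 2)
    return total * du / pi


sigma = 0.5  # assumed value for illustration
print(char_fn_numeric(1.3, sigma), exp(-0.5 * sigma ** 2 * 1.3 ** 2))
print(density_by_inversion(0.4, sigma), normal_pdf(0.4, sigma))
```

Each printed pair should agree closely, which illustrates that (7.4) and (7.5) carry the same information.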
7.2.2 Solution through transform method

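(i) Log transform with $x = \ln S + \left(r - \tfrac{1}{2}\sigma^2\right)\tau$
$$
\begin{aligned}
dS &= rS\, dt + \sigma S\, dZ_t, & S &\in [0, \infty], \\
d\ln S &= \left(r - \tfrac{1}{2}\sigma^2\right) dt + \sigma\, dZ_t, & \ln S &\in [-\infty, \infty], \\
\underbrace{\ln S_{t+1}}_{x_{t+1}} &= \underbrace{\ln S_t + \left(r - \tfrac{1}{2}\sigma^2\right) dt}_{x_t} + \sigma\, dZ_t, \\
dx_t &= \sigma\, dZ_t.
\end{aligned}
$$
Hence, $x_t$ is a martingale with zero drift.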

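(ii) Next let $V_t = e^{-r\tau} W_t(x, \tau)$, i.e. $W_t$ is the forward version of $V_t$. Then from Ito’s lemma,
$$
E^Q[dW] = \frac{\partial W}{\partial t} + \frac{1}{2}\sigma^2 \frac{\partial^2 W}{\partial x^2} = 0.
$$
(iii) Since $\tau = T - t$, $\frac{\partial}{\partial t} = -\frac{\partial}{\partial \tau}$,
$$
\frac{\partial W}{\partial \tau} = \frac{1}{2}\sigma^2 \frac{\partial^2 W}{\partial x^2},
$$
which will carry through for the characteristic function
$$
\frac{\partial \phi}{\partial \tau} = \frac{1}{2}\sigma^2 \frac{\partial^2 \phi}{\partial x^2} = -\frac{1}{2}\sigma^2 u^2 \phi.
$$
(iv) Let the guess solution be
$$
\begin{aligned}
\phi &= e^{A\tau}, \\
\frac{\partial \phi}{\partial \tau} &= A e^{A\tau} = A\phi, \\
A &= -\frac{1}{2}\sigma^2 u^2, \\
\phi &= e^{-\frac{1}{2}\sigma^2 u^2 \tau},
\end{aligned}
$$
which has the same form as (7.5), with the variance $\sigma^2$ replaced by $\sigma^2\tau$.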

(v) By invoking the Fourier inverse transform or by the equivalent expressions (7.5) and (7.4),
$$
f(x) = \frac{1}{\sigma\sqrt{2\pi\tau}} e^{-\frac{x^2}{2\sigma^2\tau}}
$$
and $x$ here is the log return over the $\tau$ period.
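(vi) Option pricing; note with the transformation we have done so far, at maturity date:
$$
\begin{aligned}
V(S, T) &= \text{Payoff}(S), \\
W(x, \tau = 0) &= \text{Payoff}(e^x).
\end{aligned}
$$
Hence
$$
\begin{aligned}
W(x, \tau) &= \int_{-\infty}^{\infty} f(x)\, \text{Payoff}(e^x)\, dx, \\
V(S, t) &= e^{-r\tau} W(x, \tau) = e^{-r\tau} \int_{-\infty}^{\infty} f(x)\, \text{Payoff}(S)\, d\ln S \\
&= \frac{e^{-r\tau}}{\sigma\sqrt{2\pi\tau}} \int_{0}^{\infty} e^{-\frac{\left[\ln S_T - \ln S_t - \left(r - \frac{1}{2}\sigma^2\right)\tau\right]^2}{2\sigma^2\tau}}\, \text{Payoff}(S)\, \frac{dS}{S}.
\end{aligned}
\tag{7.6}
$$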
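Given the option payoff function in (7.6), we can derive the Black–Scholes call and put formulae in (7.1). Take the case of a call option as an example. Writing $x$ for $\ln S_T$, which is normally distributed with mean $\ln S_t + \left(r - \tfrac{1}{2}\sigma^2\right)\tau$ and variance $\sigma^2\tau$, the integral in (7.6) can be written as
$$
\begin{aligned}
\int_{-\infty}^{\infty} (e^x - K)^+ f(x)\, dx
&= \int_{\ln K}^{\infty} e^x f(x)\, dx - K \int_{\ln K}^{\infty} f(x)\, dx \\
&= N\!\left( \frac{\ln\frac{S_t}{K} + \left(r - \tfrac{1}{2}\sigma^2\right)\tau + \sigma^2\tau}{\sigma\sqrt{\tau}} \right) e^{\ln S_t + r\tau - \frac{1}{2}\sigma^2\tau + \frac{1}{2}\sigma^2\tau}
 - K N\!\left( \frac{\ln\frac{S_t}{K} + \left(r - \tfrac{1}{2}\sigma^2\right)\tau}{\sigma\sqrt{\tau}} \right) \\
&= S_t e^{r\tau} N\!\left( \frac{\ln\frac{S_t}{K} + \left(r + \tfrac{1}{2}\sigma^2\right)\tau}{\sigma\sqrt{\tau}} \right)
 - K N\!\left( \frac{\ln\frac{S_t}{K} + \left(r - \tfrac{1}{2}\sigma^2\right)\tau}{\sigma\sqrt{\tau}} \right).
\end{aligned}
\tag{7.7}
$$
Substituting (7.7) into (7.6), we get the European call price formula in (7.1).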
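The step from (7.6) to (7.1) can also be verified numerically. The following is a minimal sketch, assuming a simple midpoint rule over the log of the terminal price and a truncation of the upper tail at ten standard deviations; the function names and parameter values are hypothetical and chosen only for illustration.

```python
from math import exp, log, pi, sqrt
from statistics import NormalDist


def call_by_integration(St, K, r, sigma, tau, n=20000, width=10.0):
    """Discounted risk-neutral expectation of (S_T - K)^+ over y = ln S_T, as in (7.6)."""
    mean = log(St) + (r - 0.5 * sigma ** 2) * tau
    sd = sigma * sqrt(tau)
    lo, hi = log(K), mean + width * sd  # the payoff is zero below ln K
    if hi <= lo:
        return 0.0
    dy = (hi - lo) / n
    total = 0.0
    for k in range(n):
        y = lo + (k + 0.5) * dy
        density = exp(-(y - mean) ** 2 / (2 * sd ** 2)) / (sd * sqrt(2 * pi))
        total += (exp(y) - K) * density
    return exp(-r * tau) * total * dy


def call_closed_form(St, K, r, sigma, tau):
    """Black-Scholes call from (7.1), with S0 = St and T = tau."""
    N = NormalDist().cdf
    d1 = (log(St / K) + (r + 0.5 * sigma ** 2) * tau) / (sigma * sqrt(tau))
    d2 = d1 - sigma * sqrt(tau)
    return St * N(d1) - K * exp(-r * tau) * N(d2)


args = (100.0, 100.0, 0.05, 0.2, 1.0)  # assumed inputs
print(call_by_integration(*args), call_closed_form(*args))
```

The two numbers should agree to several decimal places; the remaining gap reflects only the discretisation of the integral.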
Chapter 8
Capital Structure Theory
This chapter follows materials in Merton (1990, Chapters 11 and 12,

and part of the discussions in Chapter 13). These are combined with
some modern techniques for pricing contingent claims and some new
materials in Leland (1994) concerning bankruptcy cost and tax effect
on capital structure. Some new debt instruments proposed after the
2008 financial crisis to help solve the problem of debt overhang are
also discussed in this chapter.
8.1 Objective Function for the Firm

In the basic set up, we assume that there is only one firm held by
a representative agent whose objective is to maximise the utility
derived from terminal wealth,
max Et {U [Vt+τ ]} ,
where Vt+τ is the value of the firm at t + τ ; in the absence of sub-
script, V ≡ Vt . The probability distribution of Vt+τ is independent
of the capital structure of the firm. As Modigliani and Miller (1958)
suggest, except where there is a friction, the firm should concentrate
only on the business risk and not the capital structure. Capital struc-
ture per se will not change the value of the firm. We assume there is
no tax to begin with and include bankruptcy cost and tax later. Fur-
thermore, we assume there is no game involved between shareholders
and bondholders, and there is no agency complication between the
management and the capital providers.
Let $F_i$ denote a claim on the firm which is more senior than all the other claims $j > i$. It has a terminal (par) value $B_i$, which matures at $t + \tau$, where
$$
F_{i,t+\tau} = \min\left(B_i, V_{t+\tau}\right).
$$
Now consider a firm that is funded by $n$ types of securities, all with maturity $\tau$. $F_i(V_t, \tau)$ is the current value of the $i$th security. $F_i$ is a function of $V_t$ as well as the values of all $F_j$, $j < i$, that are more senior than $i$ (ignoring the case of specific charge for now). Since all $F_i$, $i = 1, \ldots, n$, are functions of $V_t$, we may use only $V_t$ to index $F_i$;
$$
V_t = \sum_{i=1}^{n} F_i(V_t, \tau) \quad \text{and} \quad V_{t+\tau} = \sum_{i=1}^{n} F_i(V_{t+\tau}, 0).
$$
Define $w_i \equiv \frac{F_i}{V}$ as the fraction of the firm’s assets financed by the $i$th security. Assume that the firm is also the only asset in the economy, $w_i$ is the portfolio weight of the representative agent invested in security $i$,
$$
V_t \sum_{i=1}^{n} w_i \frac{F_{i,t+\tau}}{F_{i,t}} = V_t \sum_{i=1}^{n} \frac{F_{i,t}}{V_t} \frac{F_{i,t+\tau}}{F_{i,t}} = \sum_{i=1}^{n} F_{i,t+\tau} = V_{t+\tau}.
$$
The objective function can now be stated as
$$
\max_{w} E_t\left\{U\left[V_{t+\tau}\right]\right\}
= \max_{w} E_t\left\{U\left[V_t \sum_{i=1}^{n} w_i \frac{F_{i,t+\tau}}{F_{i,t}}\right]\right\} \quad \text{s.t.} \quad \sum_{i=1}^{n} w_i = 1. \tag{8.1}
$$
The firm’s value, and hence the wealth of the representative agent, is not affected by capital structure $w_i$. It is in this setting that the prices of the securities within the capital structure are derived.
8.2 Partial Equilibrium One-period Model

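Continue with (8.1), set the objective function with the Lagrange multiplier as follows:
$$
\max_{w} L = E_t\left\{U\left[V_t \sum_{i=1}^{n} w_i \frac{F_{i,t+\tau}}{F_{i,t}}\right]\right\} + \lambda\left(1 - \sum_{i=1}^{n} w_i\right).
$$
From here onwards, we assume $V_t = 1$ dollar for ease of exposition. For the first-order condition, $\frac{\partial L}{\partial w_i} = 0$, we have
$$
\begin{aligned}
E_t\left\{U'\left[V_t \sum_{j=1}^{n} w_j \frac{F_{j,t+\tau}}{F_{j,t}}\right] \frac{\partial \left(V_t \sum_{j=1}^{n} w_j \frac{F_{j,t+\tau}}{F_{j,t}}\right)}{\partial w_i}\right\} - \lambda &= 0, \\
E_t\left\{U'\left[V_t \sum_{j=1}^{n} w_j \frac{F_{j,t+\tau}}{F_{j,t}}\right] \frac{F_{i,t+\tau}}{F_{i,t}}\right\} &= \lambda.
\end{aligned}
$$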
In the next two subsections, we will have a little digression to discuss the concept of pricing kernel, or what Merton called the “probability-cum-utility” function.
8.2.1 Pricing kernel

Let $V_t$ be the current value, $Z = \frac{V_{t+\tau}}{V_t}$ and $Z \sim$ iid GBM.$^{1}$
$$
E[Z] = \int_{0}^{\infty} Z\, dP(Z) = e^{\alpha\tau},
$$
where $\alpha$ is the mean expected rate of return on the asset per unit time, and $P(Z, \tau)$ is the probability distribution for the value of the firm at the end of the period. In the special two-asset case where one of the two assets is a risk-free investment with return $R = e^{r\tau}$, the objective function is to
$$
\max_{w} \int_{0}^{\infty} U\left[(1 - w)R + wZ\right] dP.
$$
$^{1}$ This means log return, $\ln Z = \ln\frac{V_{t+\tau}}{V_t}$, is normally distributed with mean $\alpha$ and standard deviation $\sigma$.
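First-order condition leads to, writing $[(1 - w)R + wZ]$ as $[\cdots]$,
$$
\begin{aligned}
0 &= \int_{0}^{\infty} U'[\cdots]\,(Z - R)\, dP, \\
\int_{0}^{\infty} U'[\cdots]\, Z\, dP &= \int_{0}^{\infty} U'[\cdots]\, R\, dP, \\
\int_{0}^{\infty} \frac{U'[\cdots]}{\int_{0}^{\infty} U'[\cdots]\, dP}\, Z\, dP &= R.
\end{aligned}
$$
Define the pricing kernel
$$
\phi = \frac{U'[\cdots]}{\int_{0}^{\infty} U'[\cdots]\, dP}.
$$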
∞
φZdP = E P [φZ] = R, (8.2)
0
where E P denotes expectation under the P -measure (also known as

the physical, or the real probability measure).
For the special case where Z = R for all states, then from (8.2)
∞ ∞
φZdP = φRdP = R.
0 0
This means that φ must have expectation equal to 1,

∞
φdP = E P [φ] = 1. (8.3)
0
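Properties (8.2) and (8.3) can be illustrated with a small discrete-state example. The sketch below is only an illustration under stated assumptions: a CRRA marginal utility U'(x) = x^(-gamma), four hypothetical states for Z with made-up probabilities, and a bisection search for the weight w that satisfies the first-order condition. None of these choices come from the text.

```python
def marginal_utility(wealth, gamma=3.0):
    """U'(x) for an assumed CRRA utility with risk aversion gamma."""
    return wealth ** (-gamma)


def first_order_condition(w, Z, prob, R):
    """Discrete analogue of the integral of U'[...](Z - R) dP."""
    return sum(p * marginal_utility((1 - w) * R + w * z) * (z - R)
               for z, p in zip(Z, prob))


def optimal_weight(Z, prob, R, lo=0.0, hi=1.0, iterations=100):
    """Bisection on the first-order condition, which is decreasing in w."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if first_order_condition(mid, Z, prob, R) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def pricing_kernel(w, Z, prob, R):
    """phi = U'[...] / E^P[U'[...]] evaluated state by state."""
    mu = [marginal_utility((1 - w) * R + w * z) for z in Z]
    norm = sum(p * m for p, m in zip(prob, mu))
    return [m / norm for m in mu]


# Hypothetical gross returns, physical probabilities and risk-free return.
Z = [0.8, 1.0, 1.2, 1.4]
prob = [0.2, 0.3, 0.3, 0.2]
R = 1.03

w = optimal_weight(Z, prob, R)
phi = pricing_kernel(w, Z, prob, R)
print("E[phi]   =", sum(p * f for p, f in zip(prob, phi)))
print("E[phi Z] =", sum(p * f * z for p, f, z in zip(prob, phi, Z)), "vs R =", R)
```

At the optimal weight the kernel averages to one and prices the risky asset at the risk-free return, and in this example it declines as Z rises.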
8.2.2 Probability-cum-utility function

Define
$$
dQ = \frac{U'[\cdots]}{\int U'[\cdots]\, dP}\, dP = \phi\, dP, \qquad \frac{dQ}{dP} = \phi.
$$
Merton (1990) and Samuelson (1969) called $dQ$ the probability-cum-utility function. In the case where $dP$ is the probability distribution of the state variable, $dQ$ is commonly known as the risk-neutral probability. Given that $\phi$ has expected value equal to 1 according to (8.3),
$$
\sum_{i} Q_i \quad \text{or} \quad \int_{0}^{\infty} dQ = 1,
$$
this means that $Q$ has all the characteristics of a probability distribution. In fact, it is the marginal utility-weighted physical probability for each state. Since the drift is typically higher under $P$ than under $Q$, $\phi$ is often represented as a downward sloping curve with respect to $Z$. However, in cases where there are many assets and $P$ is the probability distribution of a specific asset, the corresponding $dQ$ is the projection of the distribution of the state variable onto the specific asset distribution. The resulting $dQ$ is then called the asset-specific pricing kernel. Depending on how many times the $P$-measure crosses the $Q$-measure, the asset-specific pricing kernel can be upward or downward sloping over different ranges, as shown in Fig. 8.1. Such a non-monotonic asset-specific pricing kernel is a result of the asset distribution, and is still consistent with the investor being risk averse (see Vitiello and Poon, 2014).
8.2.3 m assets
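In general, for $m$ assets:
$$
\max_{w} \int_{0}^{\infty} U\left[\sum_{j=1}^{m} w_j Z_j\right] dP(Z_1, \ldots, Z_m)
$$
and writing it in Lagrangian form and denoting the joint distribution as $dP \equiv dP(Z_1, \ldots, Z_m)$
$$
\max_{w} L = \int_{0}^{\infty} U\left[\sum_{j=1}^{m} w_j Z_j\right] dP + \lambda\left(1 - \sum_{j=1}^{m} w_j\right).
$$
[Figure 8.1 (two panels, labelled $U'' > 0$ and $U'' < 0$, each plotting the densities pdf($z$) under $P$ and $Q$ together with $\phi = dQ/dP$ against $Z$): The shape of the asset specific pricing kernel and its relationship with the ratio of risk-neutral to real probability measures for specific asset $Z$.]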
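Then f.o.c. with $\frac{\partial L}{\partial w_k} = 0$ means
$$
\begin{aligned}
\int_{0}^{\infty} Z_k\, U'\left[\sum_{j=1}^{m} w_j Z_j\right] dP - \lambda &= 0, \\
\int_{0}^{\infty} Z_k\, U'\left[\sum_{j=1}^{m} w_j Z_j\right] dP &= \lambda, \\
\int_{0}^{\infty} Z_k\, \frac{U'\left[\sum_{j=1}^{m} w_j Z_j\right]}{\int_{0}^{\infty} U'\left[\sum_{j=1}^{m} w_j Z_j\right] dP}\, dP &= \frac{\lambda}{\int_{0}^{\infty} U'\left[\sum_{j=1}^{m} w_j Z_j\right] dP} = \lambda^*.
\end{aligned}
$$
So
$$
E^P[\phi Z_k] = E^Q[Z_k] = \lambda^* \quad \text{for } k = 1, \ldots, m.
$$
This means that the expected return on all assets $Z_1, \ldots, Z_m$ in this util-prob ($dQ$) space must be the same and equal to $\lambda^*$. Since these assets are funded by $F_i$, $i = 1, \ldots, n$, it implies that $F_i$ will also produce the same expected returns.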
8.2.4 Introducing the concept of dQ

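For all $i, j = 1, \ldots, n$,
$$
\int_{0}^{\infty} \frac{F_{i,t+\tau}}{F_{i,t}}\, dQ = \int_{0}^{\infty} \frac{F_{j,t+\tau}}{F_{j,t}}\, dQ = \frac{\lambda}{\int_{0}^{\infty} U'\left[\sum_{j=1}^{m} w_j Z_j\right] dP} = \lambda^*. \tag{8.4}
$$
Write
$$
\begin{aligned}
\lambda^* &= e^{\eta\tau}, \\
\int_{0}^{\infty} \frac{F_{i,t+\tau}}{F_{i,t}}\, dQ &= e^{\eta\tau}, \\
F_{i,t} &= e^{-\eta\tau} \int_{0}^{\infty} F_{i,t+\tau}\, dQ = e^{-\eta\tau} E^Q\left[F_{i,t+\tau}\right].
\end{aligned}
\tag{8.5}
$$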
Recall that $\left[V_t \sum_{i=1}^{n} w_i \frac{F_{i,t+\tau}}{F_{i,t}}\right] = V_{t+\tau}$. Given that $Q$ is related only to the probability distribution of firm value $V_{t+\tau}$ and utility preference function $U$, $Q$ is not affected by capital structure $w_i$.
8.2.5 What is $e^{\eta\tau}$?

Equation (8.4) holds for all capital structures. It must hold also for $n = 1$, i.e. where there is only one type of security,
$$
F_{t+\tau} = V_{t+\tau} \quad \text{and} \quad F_t = V_t,
$$
then
$$
V_t = e^{-\eta\tau} \int_{0}^{\infty} V_{t+\tau}\, dQ \quad \text{or} \quad V_t e^{\eta\tau} = \int_{0}^{\infty} V_{t+\tau}\, dQ,
$$
and
$$
e^{\eta\tau} = \int_{0}^{\infty} Z\, dQ(Z; \tau)
$$
is the “risk-neutral” or risk-adjusted return of the firm, i.e. it is the aggregate expected return on all the firm securities in the util-probability space. Equation (8.4) further states that expected return on all securities in the util-probability space must be the same. When there is more than one security,
$$
dQ \equiv dQ(Z_1, \ldots, Z_m).
$$
Finally, in the complete market setting where perfect hedging is possible, $\eta = r$, the risk-free interest rate.
8.3 Payoff of Risky Debt

Consider the simple case where there is only one debtholder and one
equity holder. Let F1 (V, τ ) denote the value of debt and F2 (V, τ )
denote the value of equity. Debt has seniority in fixed income and in
terms of capital protection during bankruptcy, but has no upside
potential on income and residual value of the firm. So at debt
maturity,
F1 (Vt+τ , 0) = min (B, Vt+τ ) .
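With $Z = \frac{V_{t+\tau}}{V_t}$, the integrating condition for $F_1$ is
$$
V_{t+\tau} \le B, \qquad \frac{V_{t+\tau}}{V_t} \le \frac{B}{V_t}, \qquad Z \le \frac{B}{V_t}.
$$
From (8.5), the current debt value is, writing $dQ \equiv dQ(Z; \tau)$,
$$
\begin{aligned}
F_1(V, \tau) &= e^{-\eta\tau}\left[\int_{0}^{B/V_t} V_{t+\tau}\, dQ + \int_{B/V_t}^{\infty} B\, dQ\right] \\
&= e^{-\eta\tau}\left[\int_{0}^{\infty} B\, dQ + \int_{0}^{B/V_t} \left(V_{t+\tau} - B\right) dQ\right] \\
&= \underbrace{e^{-\eta\tau} B}_{\text{risk-free bond}} - \underbrace{e^{-\eta\tau} \int_{0}^{B/V_t} \left(B - V_{t+\tau}\right) dQ}_{\text{put option}}.
\end{aligned}
\tag{8.6}
$$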
That is, a defaultable bond can be decomposed into a risk-free

bond plus a short put on the value of the firm at a strike price B;
the bondholder has given the shareholder the right to sell the firm
to the bondholder at a price B. It is clear that the shareholder
will exercise this right if V < B at bond maturity. This result is
graphically presented in Fig. 8.2.
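[Figure 8.2 (cash flow or payoff plotted against $V_{t+\tau}$, showing the risk-free bond, the risky bond and the short put $-\int_{0}^{B/V} (B - V_{t+\tau})\, dQ$): Risky bond as the combination of a risk-free bond and a short put.]
Alternatively, a risky debt can be viewed as a long position in the firm’s asset and a short call option at the strike price level $B$ as shown below
$$
\begin{aligned}
F_1(V, \tau) &= V_t - F_2(V, \tau) \\
&= e^{-\eta\tau}\left[\int_{0}^{\infty} V_{t+\tau}\, dQ - \int_{B/V_t}^{\infty} \left(V_{t+\tau} - B\right) dQ\right] \\
&= \underbrace{V_t}_{\text{long firm}} - \underbrace{e^{-\eta\tau} \int_{B/V_t}^{\infty} \left(V_{t+\tau} - B\right) dQ}_{\text{short call option}}.
\end{aligned}
\tag{8.7}
$$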
The debtholder owns the firm but has given the shareholder the
right to buy the firm at the strike price level B. It is clear that the
shareholder will exercise this right if V > B at debt maturity. This
result is graphically presented in Fig. 8.3.
As the value of the firm, $V_t$, increases, the debt ratio $\frac{B}{V_t} \to 0$, $F_1(V, \tau) \to e^{-\eta\tau} B$, the bond becomes risk free. In the limit, when $\frac{B}{V_t} \to 0$, the put option in (8.6) is deep-out-of-the-money and worth zero, while the call option in (8.7) is deep-in-the-money and worth $V_{t+\tau} - B$. In the complete market setting, $F_1(V, \tau) = e^{-\eta\tau} B = e^{-r\tau} B$, and $\eta$ can be replaced by the risk-free interest rate $r$.
[Figure 8.3 (cash flow or payoff plotted against $V_{t+\tau}$, showing the long firm value, the risky bond and the short call $-\int_{B/V}^{\infty} (V_{t+\tau} - B)\, dQ$): Risky bond as the combination of the firm value and a short call.]
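The two decompositions in (8.6) and (8.7) rest on simple payoff identities at maturity, which the following minimal sketch checks state by state. The function names and the grid of terminal firm values are illustrative assumptions.

```python
def risky_debt_payoff(V_T, B):
    """Payoff of the risky bond at maturity: min(B, V_T)."""
    return min(B, V_T)


def bond_minus_put(V_T, B):
    """Risk-free bond paying B, less a put on the firm struck at B, as in (8.6)."""
    return B - max(B - V_T, 0.0)


def firm_minus_call(V_T, B):
    """Long the firm, less a call on the firm struck at B, as in (8.7)."""
    return V_T - max(V_T - B, 0.0)


B = 80.0  # assumed face value
for V_T in (0.0, 40.0, 80.0, 120.0, 200.0):
    print(V_T, risky_debt_payoff(V_T, B), bond_minus_put(V_T, B), firm_minus_call(V_T, B))
```

All three columns coincide for every terminal value, so discounting their risk-neutral expectations must give the same debt price.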
8.4 Pricing Risky Debt

There are two ways to price risky debt. The first is through solving
the fundamental partial differential equation (FPDE) of debt price
process directly. Assume for simplicity that the term structure is flat
with bond price
$$
P(\tau) = e^{-r\tau}.
$$
The firm value is
$$
dV = (\alpha V - C_V)\, dt + \sigma V\, dZ, \tag{8.8}
$$
where $\alpha$ is the firm’s return and $C_V$ is the regular outflow such as interest and dividend payments.
The risky debt value $F$ has the price dynamic
$$
dF = (\alpha_F F - C_F)\, dt + \sigma_F F\, dZ_F. \tag{8.9}
$$
Note that $F$ is also a function of $V$ and $t$. Hence, Ito’s lemma of (8.8) gives
$$
dF = \left[(\alpha V - C_V) F_V + \frac{1}{2}\sigma^2 V^2 F_{VV} + F_t\right] dt + \sigma V F_V\, dZ, \tag{8.10}
$$
where $F_V = \frac{\partial F}{\partial V}$, $F_{VV} = \frac{\partial^2 F}{\partial V^2}$ and $F_t = \frac{\partial F}{\partial t}$.
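Comparing (8.9) and (8.10), we have
$$
\begin{aligned}
\alpha_F F &\equiv (\alpha V - C_V) F_V + \frac{1}{2}\sigma^2 V^2 F_{VV} + F_t + C_F, \\
\sigma_F F &\equiv \sigma V \frac{\partial F}{\partial V}, \\
dZ_F &\equiv dZ.
\end{aligned}
$$
Following Black–Scholes, the fully hedged portfolio argument with $\alpha = r$ leads to the following FPDE:
$$
rF = (rV - C_V) F_V + \frac{1}{2}\sigma^2 V^2 F_{VV} + F_t + C_F. \tag{8.11}
$$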
Assume that there is no interest or interim dividend payment, so that $C_V = C_F = 0$. The bond has a face value $B$ to be paid at time $T$, and time to maturity is $\tau = T - t$. Hence, replacing $+F_t$ with $-F_\tau$, the FPDE in (8.11) is reduced to
$$
\frac{1}{2}\sigma^2 V^2 F_{VV} + rV F_V - rF - F_\tau = 0, \tag{8.12}
$$
which can be solved subject to the boundary condition
$$
F(V, 0) = \min(V, B), \tag{8.13}
$$
in order to produce a preference-free pricing formula in which the actual value of $\alpha$ is not required.
Equation (8.12) looks identical to the Black–Scholes fundamental PDE except that we now have a different boundary condition (8.13). The boundary condition for the Black–Scholes call option is $\max(V - B, 0)$ and the boundary condition for the put is $\max(B - V, 0)$. In the following subsections, we present some specific solutions. In Sec. 8.4.1, Merton’s solution of (8.12) subject to (8.13) is derived explicitly. Section 8.7.1 shows a general functional form from Leland (1994) for the solution for $F$ in (8.12) when the debt maturity is infinite. We have previously shown in Chapter 7 how a PDE of the form (8.12) can be solved through Fourier transform and characteristic function.
8.4.1 Solving the FPDE

Assuming that there is no interim dividend nor interest payment, the risky bond in Eq. (8.14) can be priced as the residual value of the firm when the equity is valued as a call option on the firm’s value. Let $f(V, \tau)$ be the value of the equity, and $F(V, \tau)$ the value of the bond, so that the firm’s value is the sum of debt and equity
$$
V \equiv \underbrace{F(V, \tau)}_{\text{debt}} + \underbrace{f(V, \tau)}_{\text{equity}}, \tag{8.14}
$$
with $F \ge 0$, $f \ge 0$ and $\frac{F}{V} \le 1$, $\frac{f}{V} \le 1$. The equity can be priced according to Black–Scholes as a call option,
$$
\frac{1}{2}\sigma^2 V^2 f_{VV} + rV f_V - rf - f_\tau = 0
$$
with boundary condition at $t = T$,
$$
f(V, 0) = \max(V - B, 0).
$$
We have, from Black–Scholes,
$$
\begin{aligned}
f(V, \tau) &= V \Phi(x_1) - B e^{-r\tau} \Phi(x_2), \\
x_1 &= \frac{\ln\frac{V}{B} + \left(r + \tfrac{1}{2}\sigma^2\right)\tau}{\sigma\sqrt{\tau}}, \\
x_2 &= x_1 - \sigma\sqrt{\tau}.
\end{aligned}
$$
From (8.14), the debt value is
$$
\begin{aligned}
F(V, \tau) &= V - f(V, \tau) \\
&= V - V \Phi(x_1) + B e^{-r\tau} \Phi(x_2) \\
&= V\left[1 - \Phi(x_1)\right] + B e^{-r\tau} \Phi(x_2) \\
&= V\left[\Phi(-x_1)\right] + B e^{-r\tau} \Phi(x_2).
\end{aligned}
$$
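We could have stopped here; to obtain the expression in Merton, write
$$
F(V, \tau) = B e^{-r\tau} \Big\{ \underbrace{\frac{V}{B e^{-r\tau}}}_{1/d} \left[\Phi(-x_1)\right] + \Phi(x_2) \Big\}. \tag{8.15}
$$
Define leverage ratio
$$
d \equiv \frac{B}{V} e^{-r\tau}, \qquad \ln d = \ln\frac{B}{V} - r\tau.
$$
Substitute this into (8.15)
$$
\begin{aligned}
F(V, \tau) &= B e^{-r\tau} \left\{ \frac{1}{d}\left[\Phi(-x_1)\right] + \Phi(x_2) \right\} \\
&= B e^{-r\tau} \left\{ \frac{1}{d}\left[\Phi(h_1)\right] + \Phi(h_2) \right\},
\end{aligned}
$$
where
$$
\begin{aligned}
h_1 &= -\frac{\ln\frac{V}{B} + \left(r + \tfrac{1}{2}\sigma^2\right)\tau}{\sigma\sqrt{\tau}}
= -\frac{\tfrac{1}{2}\sigma^2\tau - \left(\ln\frac{B}{V} - r\tau\right)}{\sigma\sqrt{\tau}}
= -\frac{\tfrac{1}{2}\sigma^2\tau - \ln d}{\sigma\sqrt{\tau}}
\end{aligned}
$$
and
$$
\begin{aligned}
h_2 &= \frac{\ln\frac{V}{B} + \left(r + \tfrac{1}{2}\sigma^2\right)\tau}{\sigma\sqrt{\tau}} - \sigma\sqrt{\tau}
= -\frac{\tfrac{1}{2}\sigma^2\tau + \ln d}{\sigma\sqrt{\tau}}.
\end{aligned}
$$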
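The two expressions for the debt value, firm value minus a Black–Scholes call and the leverage-ratio form with h1 and h2, can be compared numerically. The sketch below is a minimal illustration; the function names and the input values are assumptions made for the example.

```python
from math import exp, log, sqrt
from statistics import NormalDist

Phi = NormalDist().cdf  # standard normal CDF


def merton_debt(V, B, r, sigma, tau):
    """Debt as firm value minus equity, with equity a call on V struck at B."""
    x1 = (log(V / B) + (r + 0.5 * sigma ** 2) * tau) / (sigma * sqrt(tau))
    x2 = x1 - sigma * sqrt(tau)
    equity = V * Phi(x1) - B * exp(-r * tau) * Phi(x2)
    return V - equity


def merton_debt_leverage_form(V, B, r, sigma, tau):
    """Debt written in terms of the leverage ratio d = B exp(-r tau) / V."""
    d = B * exp(-r * tau) / V
    s = sigma * sqrt(tau)
    h1 = -(0.5 * sigma ** 2 * tau - log(d)) / s
    h2 = -(0.5 * sigma ** 2 * tau + log(d)) / s
    return B * exp(-r * tau) * (Phi(h1) / d + Phi(h2))


# Assumed inputs: firm value 100, face value 80, five years to maturity.
V, B, r, sigma, tau = 100.0, 80.0, 0.05, 0.3, 5.0
F = merton_debt(V, B, r, sigma, tau)
print(F, merton_debt_leverage_form(V, B, r, sigma, tau))
print("promised yield:", -log(F / B) / tau, "vs risk-free rate:", r)
```

The two debt values should coincide, and the promised yield implied by the risky price should exceed r, reflecting the short put embedded in the bond.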
8.5 Price of a Warrant

Consider the case where warrants are issued by a firm that already
has debt with a terminal value $B and equity of which N shares are
outstanding with current price per share $S. The capital structure
of this firm is made up of three types of securities, viz. debt with
current value F1 (V, τ ), equity with current value F2 (V, τ ) = N S and
warrants with value F3 (V, τ ) = nW , where there are n warrants
outstanding with current market value per warrant of $W .
Since debt is a senior security to warrant, the current value of
the debt for the firm is unaffected by the issuance of warrants, and
remains the same as that when the firm has only debt and equity.
Thus,
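$$
\begin{aligned}
F_1(V, \tau) &= e^{-r\tau}\left[\int_{0}^{B/V} V_T\, dQ + \int_{B/V}^{\infty} B\, dQ\right] \\
&= e^{-r\tau}\left[\int_{0}^{\infty} V_T\, dQ - \int_{B/V}^{\infty} (V_T - B)\, dQ\right] \\
&= V - e^{-r\tau} \int_{B/V}^{\infty} (V_T - B)\, dQ,
\end{aligned}
$$
where $z \equiv \frac{V_{t+\tau}}{V_t}$ and $Q$ is a function of $(z, \tau)$. Here, we write $V_t$ as $V$, and $V_{t+\tau}$ as $V_T$. Moreover, from here onwards, we assume $dQ$ is in the util-prob space (i.e. risk neutral) and $\eta$ is replaced by $r$ in a complete market.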
Next, we consider the value of equity. Suppose that each warrant gives the holder the right to purchase one share of stock at $\bar{S}$ dollars per share. Let $\gamma$ denote the maximum value of $V_T$ such that the price per share of equity is less than or equal to $\bar{S}$. Thus, if $V_T \le \gamma$, the warrant holders will not exercise their warrants. The equity holders will receive the full residual value of the firm, $V_T - B$. If $V_T > \gamma$, then the warrant holders will pay $n\bar{S}$ dollars in return for $n$ shares of equity. The total value of the equity is then $V_T + n\bar{S} - B$. However, since the number of shares increases, the ownership as well as the value of shares for the existing shareholders will be diluted to a fraction $\frac{N}{n+N}$. Hence, the current value of equity can be expressed as
$$
\begin{aligned}
F_2(V, \tau) &= e^{-r\tau}\left[\int_{B/V}^{\gamma/V} (V_T - B)\, dQ + \frac{N}{n+N} \int_{\gamma/V}^{\infty} \left(V_T + n\bar{S} - B\right) dQ\right] \\
&= e^{-r\tau}\left[\int_{B/V}^{\infty} (V_T - B)\, dQ + \frac{N}{n+N} \int_{\gamma/V}^{\infty} \left(V_T + n\bar{S} - B - \frac{n+N}{N}(V_T - B)\right) dQ\right] \\
&= V - F_1(V, \tau) + \frac{N e^{-r\tau}}{n+N} \int_{\gamma/V}^{\infty} \left(n\bar{S} - \frac{n}{N}(V_T - B)\right) dQ \\
&= V - F_1(V, \tau) + \frac{n e^{-r\tau}}{n+N} \int_{\gamma/V}^{\infty} \left[N\bar{S} - (V_T - B)\right] dQ,
\end{aligned}
$$
where $\gamma = N\bar{S} + B$.$^{2}$
$^{2}$ To determine $\gamma$, let $S$ be the price per share when the warrants are not exercised, i.e. $S \le \bar{S}$. Then $V - B = NS$ or $V = NS + B$. Since $\gamma$ is defined as the maximum value of $V$ such that $S \le \bar{S}$, we have $\gamma = N\bar{S} + B$.
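Now we are ready to derive the value of the warrants
$$
\begin{aligned}
F_3(V, \tau) &= V - F_1(V, \tau) - F_2(V, \tau) \\
&= -\frac{n e^{-r\tau}}{n+N} \int_{\gamma/V}^{\infty} \left[N\bar{S} - (V_T - B)\right] dQ \\
&= \frac{n e^{-r\tau}}{n+N} \int_{\gamma/V}^{\infty} (V_T - \gamma)\, dQ.
\end{aligned}
$$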
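The warrant result says that the n warrants together behave like a fraction n/(n+N) of a call on the firm struck at the exercise threshold. A minimal sketch of the terminal payoffs is given below, assuming the threshold is N times the per-share exercise price plus B; the function name `claim_payoffs`, the argument name `strike` (the per-share exercise price) and the numbers are hypothetical.

```python
def claim_payoffs(V_T, B, N, n, strike):
    """Terminal payoffs to debt, existing equity and warrants for firm value V_T."""
    gamma = N * strike + B  # firm value above which warrants are exercised
    debt = min(B, V_T)
    if V_T <= gamma:
        equity = max(V_T - B, 0.0)  # no exercise: equity keeps the residual
        warrants = 0.0
    else:
        diluted_pool = V_T + n * strike - B  # residual after exercise money comes in
        equity = N / (n + N) * diluted_pool
        warrants = n / (n + N) * diluted_pool - n * strike  # net of the exercise cost
    return debt, equity, warrants


B, N, n, strike = 50.0, 10, 5, 4.0  # assumed capital structure
for V_T in (30.0, 70.0, 90.0, 120.0, 200.0):
    debt, equity, warrants = claim_payoffs(V_T, B, N, n, strike)
    print(V_T, debt, equity, warrants, debt + equity + warrants)
```

The three payoffs add up to the firm value in every state, and the warrant payoff equals n/(n+N) times max(V_T − γ, 0), which is the integrand of the final expression for the warrant value.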
8.6 Convertible Bond

A convertible bond with face value of $B can be exchanged into a
total of n shares of equity with current share price per share of $S.
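At debt maturity $\tau = 0$, if $V_T < B$, the equity value $F_2(V, 0) = 0$. On the other hand, if $V_T > B$, the bond may or may not be converted depending on $\frac{n}{n+N} V_T \gtrless B$. Therefore, the equity value is
$$
F_2(V, 0) = \max\left[0, \min\left(V_T - B, \frac{N}{n+N} V_T\right)\right].
$$
Again, define $\gamma$ as the maximum value of $V_T$ such that the bond will not be converted. Then, $\gamma = \frac{n+N}{n} B$, and the equity value becomes
$$
\begin{aligned}
F_2(V, \tau) &= e^{-r\tau}\left[\int_{B/V}^{\gamma/V} (V_T - B)\, dQ + \frac{N}{n+N} \int_{\gamma/V}^{\infty} V_T\, dQ\right] \\
&= e^{-r\tau}\left[\int_{B/V}^{\infty} (V_T - B)\, dQ + \frac{N}{n+N} \int_{\gamma/V}^{\infty} \left(V_T - \frac{n+N}{N}(V_T - B)\right) dQ\right] \\
&= e^{-r\tau}\left[\int_{B/V}^{\infty} (V_T - B)\, dQ - \frac{n}{n+N} \int_{\gamma/V}^{\infty} (V_T - \gamma)\, dQ\right]
\end{aligned}
$$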
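and the value of the convertible bond is
$$
\begin{aligned}
F_1(V, \tau) &= V - F_2(V, \tau) \\
&= e^{-r\tau}\left[\int_{0}^{\infty} V_T\, dQ - \int_{B/V}^{\infty} (V_T - B)\, dQ + \frac{n}{n+N} \int_{\gamma/V}^{\infty} (V_T - \gamma)\, dQ\right] \\
&= e^{-r\tau}\left[\int_{0}^{B/V} V_T\, dQ + \int_{B/V}^{\infty} B\, dQ + \frac{n}{n+N} \int_{\gamma/V}^{\infty} (V_T - \gamma)\, dQ\right].
\end{aligned}
$$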
8.6.1 Reverse convertible

A reverse convertible is a bond with debt face value $B$ which will be converted into $n$ equity shares when the leverage ratio hits a fixed threshold, or the equity value falls below a threshold (see Kashyap et al. 2008). As before, if $F_1(V, \tau)$ is the debt value and $F_2(V, \tau)$ is the equity value, then we have the following boundaries:
$$
\begin{aligned}
F_1(V, \tau) &= \min(B, yV_T), \qquad y = \frac{n}{n+N}, \\
F_2(V, \tau) &= \max\left(\frac{N}{n+N} V_T, V_T - B\right),
\end{aligned}
$$
where $yV_T$ is the value of the converted shares, and $S$ is the conversion price such that $nS = yV_T$. The shareholder’s value will never reach zero because whenever the value drops below the threshold, the bond will be exchanged into equity. Hence,
$$
\begin{aligned}
F_1(V, \tau) &= e^{-r\tau}\left[\int_{0}^{B/V} V_T\, dQ + \int_{B/V}^{\infty} B\, dQ - \frac{N}{n+N} \int_{0}^{y/V} V_T\, dQ\right] \\
&= V + e^{-r\tau}\left[-\int_{B/V_t}^{\infty} (V_T - B)\, dQ - \frac{N}{n+N} \int_{0}^{y/V} V_T\, dQ\right].
\end{aligned}
$$
This is really a long position in the firm’s asset value, plus a short call option on the firm’s asset with strike price $B$ and a short position in an asset-or-nothing binary put struck at $y$. Next, we can write the equity value as a residual of firm value minus debt
$$
\begin{aligned}
F_2(V, \tau) &= V - F_1(V, \tau) \\
&= e^{-r\tau}\left[\int_{y/V}^{\infty} (V_T - B)\, dQ + \frac{N}{n+N} \int_{0}^{y/V} V_T\, dQ\right].
\end{aligned}
$$
Flannery (2005) and Kashyap et al. (2008) propose the use of reverse convertible bonds as an alternative to capital regulation for banks. Investors might be willing to buy them as an investment in a “defaultable catastrophic bond” that will automatically provide capital to banks in a low-probability, huge-loss event. In return, the investors will receive a premium or a higher coupon payment.
8.6.2 Call option enhanced reverse convertible

Compared with the classical reverse convertible, the call option enhanced reverse convertible (COERC) (see Pennacchi et al. 2010) is different in that (i) the conversion price is significantly lower than the trigger price and (ii) the equity holder has an option to buy the converted equity back from the bondholder at the same significantly lower price than the trigger price. Here, we can view the bond as a risky bond (as before) which has an embedded short put as well as an embedded short call. As an extension of the case of reverse convertible above, the bond value is
$$
\begin{aligned}
F_1(V, \tau) &= e^{-r\tau}\left[\int_{0}^{B/V} V_T\, dQ + \int_{B/V}^{\infty} B\, dQ - \frac{N}{n+N} \int_{0}^{y/V} (\gamma - V_T)\, dQ - \frac{N}{n+N} \int_{y/V}^{\infty} (V_T - \gamma)\, dQ\right], \\
F_2(V, \tau) &= V - F_1(V, \tau).
\end{aligned}
$$
8.6.3 Policy implications

Both types of convertibles have been proposed for solving debt overhang in banks during a crisis. A debt overhang emerges if a firm has a positive NPV project but cannot capture the investment opportunity due to an existing debt position. When firms are in financial distress, debt overhang discourages firms from recapitalising. Since the firm is not able to raise capital by issuing new shares to fund a positive NPV project, the shareholders might take extra risk and shift the cost back to the debtholder. The reverse convertible is one way to resolve the debt overhang deadlock. If the conversion is exercised automatically when the threshold is reached, then such a mechanism will prevent bank failures and bank runs, as new capital will be raised when banks critically need liquidity after some big losses. Such a safeguard will prevent a further sharp drop in the equity price and present itself as a mechanism for automatic deleveraging (Flannery, 2005). The reduction of debt overhang during a reverse conversion not only restores the bank’s capital to at least the threshold level but also reduces the amount of outstanding debt by converting debt into new equity. In the case of COERC, the firm can buy back the shares, previously converted from debt, at the same low conversion price and hence avoid wealth transfer from the old shareholders to the new shareholders.
There are some practical issues, however. A classical reverse convertible whose trigger is based on an annual measure is not suitable when the bank’s capital structure deteriorates quickly (as in the case of the 2008 crisis). Since the reverse convertible has a forced conversion, the payoff structure may not be attractive enough to fixed income investors, which leads to a thin marketability problem.
Recently, improved types of reverse convertible such as COERC have resolved some of the problems listed above. For example, they now reference the market value instead of an accounting measure, and hence can respond readily to changes in the bank’s market value. By giving the shareholders the right to buy back the converted shares at the same significantly lower price than the trigger price, COERC provides a strong incentive for shareholders to buy back the shares and pay back the bond, and hence lowers the default risk. This helps to reduce the excessive risk-taking behaviour that is usually present in a levered firm. However, it is possible that, since the equity holder has the right to buy back the converted equity at the same price, the shareholder might take the riskier position and increase the risk of failure when the firm’s value is at or below the exercise price.
8.7 Bankruptcy Cost and Tax Benefit

8.7.1 Solution under time invariance
In Leland (1994), it is assumed that debt is perpetual because of very long time to maturity or the debt being constantly rolled over. In this case, all contingent claims on the firm’s value have no explicit time dependence, and the partial derivative $F_t(V, t) = 0$. Next, writing $C_V = 0$ and $C_F = C$ in order to match the notations in Leland (1994), Eq. (8.11) becomes:
$$
\frac{1}{2}\sigma^2 V^2 F_{VV}(V) + rV F_V(V) - rF(V) + C = 0. \tag{8.16}
$$
Leland (1994) then assumes that the ODE above has a general solution of the following form:
$$
F(V) = A_0 + A_1 V^{\lambda} + A_2 V^{\beta},
$$
where $\beta < 0 < \lambda$. Substituting this solution into the ODE in (8.16) gives
$$
\begin{aligned}
0 &= \frac{1}{2}\sigma^2 V^2 \left[\lambda(\lambda - 1) A_1 V^{\lambda - 2} + \beta(\beta - 1) A_2 V^{\beta - 2}\right] \\
&\quad + rV\left[\lambda A_1 V^{\lambda - 1} + \beta A_2 V^{\beta - 1}\right] - r\left[A_0 + A_1 V^{\lambda} + A_2 V^{\beta}\right] + C \\
&= \left[\frac{1}{2}\sigma^2 \lambda + r\right](\lambda - 1) A_1 V^{\lambda} + \left[\frac{1}{2}\sigma^2 \beta + r\right](\beta - 1) A_2 V^{\beta} + (C - rA_0).
\end{aligned}
\tag{8.17}
$$
The solution needs to be valid for all $V > V_B$, the default threshold.$^{3}$ One possible solution is when the coefficients of $V$ of each order in (8.17) are all zero. This leads to $\lambda = 1$, $\beta = -2r/\sigma^2$ and $A_0 = C/r$, hence
$$
F(V) = \frac{C}{r} + A_1 V + A_2 V^{-x},
$$
where $x = \frac{2r}{\sigma^2}$. In Leland’s (1994) time-independent setting, all claims with financial payout $C$ must have this functional form. The boundary conditions when firm defaults will depend on the payout rule of the securities.
$^{3}$ $V_B$, the default threshold, in Leland (1994) is equivalent to the debt’s par or principal value, $B$, in Merton (1990) book.
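That the functional form above solves the ODE (8.16) for any constants A1 and A2 can be checked directly by substituting its analytical derivatives. The sketch below is illustrative; the function name and the parameter values are assumptions.

```python
def ode_residual(V, C, r, sigma, A1, A2):
    """Left-hand side of (8.16) for F(V) = C/r + A1*V + A2*V**(-x), x = 2r/sigma^2."""
    x = 2 * r / sigma ** 2
    F = C / r + A1 * V + A2 * V ** (-x)
    F_V = A1 - x * A2 * V ** (-x - 1)          # first derivative in V
    F_VV = x * (x + 1) * A2 * V ** (-x - 2)    # second derivative in V
    return 0.5 * sigma ** 2 * V ** 2 * F_VV + r * V * F_V - r * F + C


# Arbitrary (assumed) constants: the residual should vanish for any A1, A2.
for V in (20.0, 50.0, 100.0, 250.0):
    print(V, ode_residual(V, C=5.0, r=0.06, sigma=0.2, A1=0.3, A2=-700.0))
```

The printed residuals are zero up to rounding, which is why the boundary conditions alone pin down A1 and A2 for each security.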
8.7.2 Protected debt covenant

Leland (1994) analyses the case when debt is protected by covenant such that default takes place as soon as the firm’s value drops below the default threshold $V < V_B$, at which case a fraction $0 \le \alpha \le 1$ of the firm value will be lost as bankruptcy costs leaving the debtholders with $(1 - \alpha)V_B$. Hence, the debt value can be determined with the following boundary conditions:
$$
D(V) =
\begin{cases}
\dfrac{C}{r} & \text{as } V \to \infty, \\[1ex]
(1 - \alpha) V_B & \text{as } V \le V_B.
\end{cases}
$$
From (8.17), we note that for risk-free debt, $F(V) = \frac{C}{r}$ as $V \to \infty$. Hence,
$$
D(V) = \frac{C}{r} + A_1 V + A_2 V^{-x} = \frac{C}{r},
$$
which means that $A_1 = 0$. Moreover, when default happens at $V = V_B$,
$$
\begin{aligned}
D(V) &= \frac{C}{r} + A_2 V_B^{-x} = (1 - \alpha) V_B, \\
A_2 &= \left[(1 - \alpha) V_B - \frac{C}{r}\right] V_B^{x}.
\end{aligned}
$$
Recall from Section 8.7.1 that $x = \frac{2r}{\sigma^2}$. So, the debt value can be written as
$$
D(V) = \frac{C}{r} + \left[(1 - \alpha) V_B - \frac{C}{r}\right] \left(\frac{V}{V_B}\right)^{-x},
$$
where the first term represents the risk-free component and the second term represents a (negative) default risk premium.
The debt value can also be reformulated as
$$
D(V) = \frac{C}{r}(1 - P_B) + P_B (1 - \alpha) V_B,
$$
where $P_B \equiv \left(\frac{V}{V_B}\right)^{-x}$ can be interpreted as the probability of bankruptcy.
In Leland’s (1994) framework, firm’s asset value, $V$, is not affected by capital structure but value of the levered firm is affected as follows:
$$
v(V) = V + TB(V) - BC(V),
$$
where $TB$ is the tax benefit and $BC$ is the bankruptcy cost, and both can be valued as time independent “securities” as follows:
$$
\begin{aligned}
BC(V) &= \alpha V_B P_B, \\
TB(V) &= \tau \frac{C}{r}(1 - P_B),
\end{aligned}
$$
where $\tau$ is the corporate tax rate. Hence, the value of the levered firm and equity are
$$
v(V) = V + \tau \frac{C}{r}(1 - P_B) - \alpha V_B P_B, \tag{8.18}
$$
$$
\begin{aligned}
E(V) &= v(V) - D(V) \\
&= V + \tau \frac{C}{r}(1 - P_B) - \alpha V_B P_B - \frac{C}{r}(1 - P_B) - P_B (1 - \alpha) V_B \\
&= V - \frac{C}{r}(1 - P_B)(1 - \tau) - P_B V_B.
\end{aligned}
\tag{8.19}
$$
The equity value is the value of the unlevered firm minus the after-tax debt value when there is no default, and minus the debt value at bankruptcy.
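The debt, levered-firm and equity values can be evaluated together to confirm that equity is the residual and that the boundary values at default are respected. This is a minimal sketch with assumed parameter values; the function name `leland_values` and the argument name `tax` (the corporate tax rate) are illustrative.

```python
def leland_values(V, V_B, C, r, sigma, alpha, tax):
    """Return (debt, levered firm value, equity) for a given default threshold V_B."""
    x = 2 * r / sigma ** 2
    P_B = (V / V_B) ** (-x)
    debt = C / r * (1 - P_B) + P_B * (1 - alpha) * V_B
    levered = V + tax * C / r * (1 - P_B) - alpha * V_B * P_B     # equation (8.18)
    equity = V - C / r * (1 - P_B) * (1 - tax) - P_B * V_B        # equation (8.19)
    return debt, levered, equity


# Assumed inputs for illustration.
params = dict(V_B=40.0, C=5.0, r=0.06, sigma=0.2, alpha=0.5, tax=0.35)
debt, levered, equity = leland_values(V=100.0, **params)
print(debt, levered, equity, levered - debt)

# At the default threshold, debtholders receive (1 - alpha) * V_B and equity is worthless.
print(leland_values(V=40.0, **params))
```

The equity value matches the levered firm value less debt, and at V = V_B the outputs reduce to the default boundary values.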
8.7.3 Optimal capital structure

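The second type of bankruptcy studied in Leland (1994) is where default threshold is endogenously determined by equity holders, who maximise the value of equity
$$
\frac{\partial E(V; V_B)}{\partial V_B} = 0
$$
giving
$$
V_B^* = \frac{(1 - \tau) C}{r + \frac{1}{2}\sigma^2}. \tag{8.20}
$$
To prove (8.20), substitute $P_B = \left(\frac{V}{V_B}\right)^{-x}$ into (8.19) and differentiate $E(V)$ with respect to $V_B$
$$
\begin{aligned}
E(V) &= V - \frac{C}{r}\left[1 - V^{-x} V_B^{x}\right](1 - \tau) - V^{-x} V_B^{1+x}, \\
\frac{\partial E(V)}{\partial V_B} &= \frac{C}{r} x V^{-x} V_B^{x-1} (1 - \tau) - (1 + x) V^{-x} V_B^{x} = 0, \\
V_B^* &= \frac{\frac{C}{r} x (1 - \tau)}{1 + x}.
\end{aligned}
$$
Next, substitute $x = \frac{2r}{\sigma^2}$, we get
$$
V_B^* = \frac{\frac{C}{r} \frac{2r}{\sigma^2} (1 - \tau)}{1 + \frac{2r}{\sigma^2}} = \frac{C (1 - \tau)}{r + \frac{1}{2}\sigma^2}.
$$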
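The first-order condition behind (8.20) can be checked numerically by evaluating the equity value (8.19) at the closed-form threshold and at nearby thresholds. The sketch below uses assumed parameter values; the function names and the argument name `tax` (the corporate tax rate) are illustrative.

```python
def equity_value(V, V_B, C, r, sigma, tax):
    """Equity value from (8.19) as a function of the default threshold V_B."""
    x = 2 * r / sigma ** 2
    P_B = (V / V_B) ** (-x)
    return V - C / r * (1 - P_B) * (1 - tax) - P_B * V_B


def optimal_default_threshold(C, r, sigma, tax):
    """Closed-form threshold from (8.20)."""
    return (1 - tax) * C / (r + 0.5 * sigma ** 2)


V, C, r, sigma, tax = 100.0, 5.0, 0.06, 0.2, 0.35  # assumed inputs
V_B_star = optimal_default_threshold(C, r, sigma, tax)
print("V_B* =", V_B_star)
for scale in (0.90, 0.95, 1.00, 1.05, 1.10):
    print(scale, equity_value(V, scale * V_B_star, C, r, sigma, tax))
```

Equity is highest at the closed-form threshold and falls off on either side, consistent with the first-order condition in the proof.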
It is interesting to note that VB∗ is independent of the current

asset value V , and bankruptcy cost α. Higher tax rate, lower coupon,
high interest rate and asset volatility will all lead to a lower default
threshold and hence higher default probability.
When $V$ is close to $V_B$, as in the case of a junk bond, a smaller $V_B$ will reduce the bankruptcy cost $\alpha V_B$, lower the probability of bankruptcy $P_B \equiv \left(\frac{V}{V_B}\right)^{-x}$ and increase the value of debt. When $V_B$ is sufficiently high, higher volatility $\sigma$ increases $P_B$ and decreases debt value.
Given asset value $V$, the debt value increases as coupon increases for small coupons. But as coupon increases, $V_B$ becomes higher and the effect of bankruptcy dominates, reducing the debt values (and the value of the levered firm). This means there exists an optimal coupon level $C_{\max}(V)$ and optimal debt capacity of the firm. According to Leland (1994),
$$
C^*_{\max}(V) = V\left[(1 + x) h\right]^{-\frac{1}{x}},
$$
where $h$ is a function of bankruptcy cost $\alpha$ and tax rate.

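To derive $C^*$, first write
$$
m = \frac{1}{(1 + x)} \left[\frac{x (1 - \tau)}{r (1 + x)}\right]^{x},
$$
then
$$
(V_B^*)^{x} = C^{x} \left[\frac{x (1 - \tau)}{r (1 + x)}\right]^{x} = C^{x} (1 + x) m.
$$
Now substitute $V_B^*$ into $v(V)$ in (8.18)
$$
\begin{aligned}
v(V) &= V + \frac{C\tau}{r} - \frac{C\tau}{r} \underbrace{V^{-x} V_B^{x}}_{P_B} - \alpha V^{-x} V_B^{1+x} \\
&= V + \frac{C\tau}{r} - \frac{C\tau}{r} V^{-x} C^{x} (1 + x) m - \alpha V^{-x} \frac{\frac{C}{r} x (1 - \tau)}{1 + x} C^{x} (1 + x) m \\
&= V + \frac{C\tau}{r} - \frac{\tau V^{-x} C^{1+x}}{r} \left[1 + x + \alpha x (1 - \tau)/\tau\right] m \\
&= V + \frac{C\tau}{r} - \frac{\tau V^{-x} C^{1+x}}{r} h,
\end{aligned}
$$
where $h = \left[1 + x + \alpha x (1 - \tau)/\tau\right] m$.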

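Now differentiate $v(V)$ with respect to $C$,
$$
\begin{aligned}
\frac{\partial v(V)}{\partial C} &= \frac{\tau}{r} - \frac{h \tau V^{-x} C^{x}}{r} (1 + x) = 0, \\
(C^*)^{x} &= \frac{V^{x}}{(1 + x) h}, \quad \text{or} \quad C^* = V\left[(1 + x) h\right]^{-1/x}.
\end{aligned}
$$
Substituting $C^*(V)$, one obtains $D^*(V)$ and $v^*(V)$, and optimal leverage $L^* = D^*/v^*$.
$$
\begin{aligned}
v^*(V) &= V + \frac{C^* \tau}{r} - \frac{\tau V^{-x}}{r} \frac{V^{x}}{(1 + x) h} C^* h \\
&= V + \frac{C^* \tau}{r} \left(\frac{x}{1 + x}\right) \\
&= V \left\{1 + \frac{\tau}{r} \left(\frac{x}{1 + x}\right) \left[(1 + x) h\right]^{-1/x}\right\}.
\end{aligned}
$$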
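Now substitute $V_B^*$ into $P_B$
$$
P_B = \left(\frac{V_B^*}{V}\right)^{x} = \left(\frac{C^*}{V}\right)^{x} (1 + x) m
$$
and use this to evaluate $D^*(V)$ below
$$
\begin{aligned}
D^*(V) &= \frac{C^*}{r}\left[1 - \left(\frac{C^*}{V}\right)^{x} (1 + x) m\right] + \left(\frac{C^*}{V}\right)^{x} (1 + x) m (1 - \alpha) C^* \frac{x (1 - \tau)}{r (1 + x)} \\
&= \frac{C^*}{r}\left\{1 - \left(\frac{C^*}{V}\right)^{x} m \left[(1 + x) - (1 - \alpha)(1 - \tau) x\right]\right\} \\
&= \frac{C^*}{r}\left[1 - \left(\frac{C^*}{V}\right)^{x} k\right],
\end{aligned}
$$
where $k = m\left[(1 + x) - (1 - \alpha)(1 - \tau) x\right]$. Next substitute the value of $C^*$
$$
D^*(V) = \frac{V\left[(1 + x) h\right]^{-1/x}}{r}\left[1 - \frac{k}{(1 + x) h}\right].
$$
Hence,
$$
L^* = \frac{D^*(V)}{v^*(V)} = \frac{\left[(1 + x) h\right]^{-1/x}\left[1 - \frac{k}{(1 + x) h}\right]}{r\left\{1 + \frac{\tau}{r}\left(\frac{x}{1 + x}\right)\left[(1 + x) h\right]^{-1/x}\right\}}.
$$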
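The closed-form optimum can be cross-checked against the levered firm value evaluated directly at the endogenous default threshold. The following sketch is illustrative only; the function names, the argument name `tax` (the corporate tax rate) and the parameter values are assumptions.

```python
def leland_optimum(V, r, sigma, alpha, tax):
    """Closed-form optimal coupon, firm value, debt value and leverage."""
    x = 2 * r / sigma ** 2
    m = ((1 - tax) * x / (r * (1 + x))) ** x / (1 + x)
    h = (1 + x + alpha * x * (1 - tax) / tax) * m
    k = (1 + x - (1 - alpha) * (1 - tax) * x) * m
    scale = ((1 + x) * h) ** (-1 / x)
    C_star = V * scale
    v_star = V * (1 + tax / r * x / (1 + x) * scale)
    D_star = C_star / r * (1 - k / ((1 + x) * h))
    return C_star, v_star, D_star, D_star / v_star


def levered_value(V, C, r, sigma, alpha, tax):
    """Equation (8.18) with the default threshold set to V_B* from (8.20)."""
    x = 2 * r / sigma ** 2
    V_B = (1 - tax) * C / (r + 0.5 * sigma ** 2)
    P_B = (V / V_B) ** (-x)
    return V + tax * C / r * (1 - P_B) - alpha * V_B * P_B


V, r, sigma, alpha, tax = 100.0, 0.06, 0.2, 0.5, 0.35  # assumed inputs
C_star, v_star, D_star, L_star = leland_optimum(V, r, sigma, alpha, tax)
print("C* =", C_star, "v* =", v_star, "D* =", D_star, "L* =", L_star)
print("direct v at C*:", levered_value(V, C_star, r, sigma, alpha, tax))
```

Evaluating `levered_value` at coupons slightly above or below the closed-form coupon gives a lower firm value, which is one way to confirm that the stationary point is a maximum.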
When the tax rate, $\tau$, increases, the tax benefit for shareholders increases, leading to a higher optimum leverage ratio. When the bankruptcy cost, $\alpha$, increases, the firm will take on less leverage to avoid costly bankruptcy. Hence, tax rate and bankruptcy cost exert a trade-off effect on capital structure. Optimal leverage $L^*$ also increases as asset volatility $\sigma$ decreases and as interest rate $r$ rises. When volatility increases, a firm becomes more risky and should therefore reduce debt to avoid bankruptcy cost. Even though an increased $r$ raises the cost of borrowing, such a cost is more than offset by the tax benefit of debt. Hence, the optimum leverage ratio increases as the interest rate increases.
8.8 Deposit Insurance

Deposit insurance is a contract that insures the deposits of a given institution against potential default in an effort to enhance financial stability. Merton (1977) shows that the deposit insurance contract can be linked to a put option. First assume that the bank’s asset value, $V_t$, follows a geometric Brownian motion with constant mean, μ, and volatility, σ. Denoting the face value of the interest-bearing debt by D and assuming that all debt is insured, Merton derives a model for the market value of deposit insurance per dollar of insured deposits at time t as
\[ g(d,\tau) = \Phi(h_2) - \frac{1}{d}\,\Phi(h_1), \tag{8.21} \]
\[ h_1 = \frac{\ln d - \tau/2}{\tau^{1/2}}, \qquad h_2 = h_1 + \tau^{1/2}, \]
where $d = D/V$ is the current deposit-to-asset value ratio, and $\tau = \sigma^2 T$ is the total variance of the logarithmic change in the value of the assets during the term of the deposits. Since most deposits are of the demand type, Merton assumes T is the length of time until the next audit of the bank by the guarantor. It is clear that delta, $\partial g/\partial d > 0$, and vega, $\partial g/\partial \tau > 0$; any increase in the deposit-to-asset value ratio, the volatility or the length of time the insurance is in force will increase the cost per dollar of deposits.
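As an illustration of (8.21), the sketch below evaluates g(d, τ) using the standard normal distribution function built from `math.erf`. The function names and the numerical inputs are hypothetical and chosen only to show the signs of delta and vega.

```python
from math import erf, log, sqrt


def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def deposit_insurance_cost(d, tau):
    """Cost of deposit insurance per dollar of insured deposits, eq. (8.21).

    d   : deposit-to-asset value ratio D / V
    tau : total variance sigma^2 * T over the term of the deposits
    """
    h1 = (log(d) - 0.5 * tau) / sqrt(tau)
    h2 = h1 + sqrt(tau)
    return norm_cdf(h2) - norm_cdf(h1) / d


# Illustrative values: the cost rises with d and with tau.
for d, tau in [(0.90, 0.04), (0.95, 0.04), (0.90, 0.09)]:
    print(d, tau, round(deposit_insurance_cost(d, tau), 4))
```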
In practice, difficulty arises because the values of the model parameters are unknown and the bank’s asset value cannot be observed. One can, nevertheless, view the equity value of the bank, F, which is directly observable, as a call option on the bank’s assets. By Ito’s lemma, we get
\[ \sigma_e F \equiv \sigma V F_V, \qquad \sigma = \sigma_e\,\frac{F}{V F_V}, \]
where $\sigma_e$ is the volatility of equity, and $F_V$ is the Black–Scholes delta of equity as a call option on V. The equity-to-firm-value ratio, $F/V$, could be proxied by one minus the leverage ratio, calculated as the ratio of total long-term debt to total asset value.
In contrast, the value of a loan guarantee is directly a function of the credit spread and the loan’s time to maturity, in addition to volatility and leverage. Let $B\exp[-R(T)T]$ be the market value of the (risky) debt when there is no guarantee, where $R(T)$ is the promised yield. On the other hand, the market value of the debt with a guarantee is $B\exp[-rT]$, and
\[ G(T) + B\exp[-R(T)T] = B\exp[-rT], \]
\[ \frac{G(T)}{B\exp[-rT]} = 1 - \exp[-(R(T) - r)T], \tag{8.22} \]
where $G(T)$ is the cost of the loan guarantee, so that the left-hand side of (8.22) is this cost as a fraction of the amount of money raised.
Exercises: Capital Structure Theory

1. In Merton (1990) Chapter 12, default can take place only at debt
maturity. Analyse the impact on the solution if default can take
place at any time when the firm’s value V drops below a threshold
V < B × R,
where R < 1 is the recovery ratio of the debt value B. Note that
the debt maturity or default time, τ , is now stochastic.
2. Assuming that VT has a lognormal distribution and Vt follows
a GBM, show the impact of volatility on the value of debt and
equity at the various critical threshold levels of a “Call Option
Enhanced Reverse Convertible” (see Pennacchi et al. 2010). [You
may like to choose another exotic debt instrument that will help
to reduce debt overhang during financial crisis.]
3. Solve for the optimum default threshold in Leland (1994)
\[ V_B^* = \frac{(1-\tau)C}{r + \tfrac12\sigma^2} \]
and the optimum coupon level
\[ C^*_{\max}(V) = V[(1+x)h]^{-1/x}. \]
Given these two solutions, derive $D^*(V)$, $v^*(V)$ and the optimal leverage $L^* = D^*/v^*$. Demonstrate how $L^*$ is affected by volatility, interest rate and tax rate.
Chapter 9
General Equilibrium
Merton (1990, Chapter 11) shows how, on the demand side given
the asset price and interest rate dynamics, the individuals in their
separate pursuance of maximising utility from wealth and consump-
tion, interact with the supply side of securities and firms to reach
market equilibrium. In a simplified setting, the Capital Market Line,
the Security Market Line and the Capital Asset Pricing Model are
the natural outcomes when markets clear.
In this basic set-up, there are K individuals and n securities, where the nth security is a risk-free asset and m = n − 1 is the number of risky assets. For the kth investor, the objective function is
\[ \max E_0\left\{\int_0^{T^k} U^k[C_t^k, t]\,dt + B^k[W^k(T^k), T^k]\right\}, \]
where $E_0$ is the expectation operator conditional on the investor’s current wealth $W_0^k$ and on the current values of the firms, $V_{i,0}$ for i = 1, . . . , n; $C_t^k$ is the investor’s instantaneous consumption at time t, and $B^k$ is the bequest function at the time of death, $T^k$.
9.1 Firms and Securities

The price per share of individual firm i, $P_{i,t}$, follows a GBM
\[ \frac{dP_i}{P_i} = \alpha_i\,dt + \sigma_i\,dZ_i. \tag{9.1} \]
Here, the mean $\alpha_i$ and variance rate $\sigma_i^2$ may change through time, but only in a way that is uncorrelated with price changes, such that
\[ d\alpha_i\,dZ_j = d\sigma_i\,dZ_j = 0 \quad\text{for } i, j = 1, \dots, n. \]
The interest rate, on the other hand, follows a Gaussian process
\[ dr = \alpha_r\,dt + \sigma_r\,dq, \tag{9.2} \]
where dq is a simple Gauss–Wiener process. [Note: dq is NOT a jump process. Also, unlike all the other chapters in Merton’s book, r is now stochastic.]
Since dZi dq will not be zero in general, to avoid complication, it
is further assumed that αi and σi are functions of the stochastic risk
free rate rt only. That is, investors revise their expectations about
risky asset returns only if interest rate changes.
9.2 Individuals
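For individual k, and omitting wage income, the wealth process is
\[ dW^k = \left[\sum_{1}^{m} w_i^k(\alpha_i - r) + r\right] W^k dt + \sum_{1}^{m} w_i^k W^k \sigma_i\,dz_i - C^k dt. \]
Define the “derived” utility-of-wealth function as
\[ J^k(W^k, r, t) \equiv E_t\left\{\int_t^{T^k} U^k[C_s^k, s]\,ds + B^k[W^k(T^k), T^k]\right\}. \]
Following the steps in the previous chapters and taking a Taylor series expansion of the Ito processes, we have
\[ 0 = \max_{C^k,\,\mathbf{w}^k}\{\phi\}, \]
\[
\begin{aligned}
\phi ={}& U^k(C^k, t) + J_t^k + J_r^k\alpha_r + J_W^k\left\{[\mathbf{w}^{k\top}(\boldsymbol{\alpha} - r) + r]W^k - C^k\right\} \\
&+ \tfrac12 J_{rr}^k\sigma_r^2 + \tfrac12 J_{WW}^k(\mathbf{w}^{k\top}\boldsymbol{\Omega}\,\mathbf{w}^k)(W^k)^2 + J_{Wr}^k\,\mathbf{w}^{k\top}\boldsymbol{\sigma}_r W^k,
\end{aligned}
\tag{9.3}
\]
where $J_W = \partial J/\partial W$, $J_{WW} = \partial^2 J/\partial W^2$, $J_{Wr} = \partial^2 J/\partial W\partial r$, and boldface denotes vectors and matrices. In particular, $\boldsymbol{\Omega}$ is the covariance matrix of the risky security returns and $\boldsymbol{\sigma}_r$ is the vector of covariations, $\sigma_{ir}$, between security returns and the interest rate.
The optimal decision is obtained when $\phi_C = 0$, which gives
\[ U_C^k(C^k, t) = J_W^k, \]
and with $\phi_{w_i} = 0$, we have
\[ J_W^k(\boldsymbol{\alpha} - r) + J_{WW}^k(\boldsymbol{\Omega}\,\mathbf{w}^k)W^k + J_{Wr}^k\boldsymbol{\sigma}_r = 0. \tag{9.4} \]
Then with $\mathbf{V} = \boldsymbol{\Omega}^{-1} \equiv [\nu_{ij}]$, we have the demand for security i,
\[ \mathbf{d}^k = \mathbf{w}^k W^k = -\frac{J_W^k}{J_{WW}^k}\,\boldsymbol{\Omega}^{-1}(\boldsymbol{\alpha} - r) - \frac{J_{Wr}^k}{J_{WW}^k}\,\boldsymbol{\Omega}^{-1}\boldsymbol{\sigma}_r, \]
\[ d_i^k = w_i^k W^k = -\frac{J_W^k}{J_{WW}^k}\sum_{j=1}^{m}\nu_{ij}(\alpha_j - r) - \frac{J_{Wr}^k}{J_{WW}^k}\sum_{j=1}^{m}\nu_{ij}\sigma_{jr}, \tag{9.5} \]
for i = 1, . . . , m.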
9.3 Aggregate Demand

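When an individual’s lifetime consumption is optimised, the demand for the ith stock by the kth investor is
\[ d_i^k = A^k\sum_{j=1}^{m}\nu_{ij}(\alpha_j - r) + H^k\sum_{j=1}^{m}\nu_{ij}\sigma_{jr}, \]
\[ A^k = -\frac{J_W^k}{J_{WW}^k}, \qquad H^k = -\frac{J_{Wr}^k}{J_{WW}^k}. \]
The aggregate demand $D_i$ for the ith security from all investors is
\[ D_i \equiv \sum_{k=1}^{K} d_i^k = A\sum_{j=1}^{m}\nu_{ij}(\alpha_j - r) + H\sum_{j=1}^{m}\nu_{ij}\sigma_{jr}, \tag{9.6} \]
\[ A \equiv \sum_{1}^{K} A^k \quad\text{and}\quad H \equiv \sum_{1}^{K} H^k. \]
In matrix form and for all risky securities i = 1, . . . , m,
\[ \mathbf{D} = A\boldsymbol{\Omega}^{-1}(\boldsymbol{\alpha} - r) + H\boldsymbol{\Omega}^{-1}\boldsymbol{\sigma}_r, \tag{9.7} \]
where $\boldsymbol{\sigma}_r$ denotes the m × 1 vector of $\sigma_{jr}$.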
9.4 Market Portfolio

At market equilibrium, the aggregate market portfolio of m stocks is equal to the aggregate of the individual demands in (9.6),
\[ M \equiv \sum_{i=1}^{m} D_i. \]
If the market portfolio M has price dynamics $dP_M$, then
\[ \frac{dP_M}{P_M} = \sum_{i=1}^{m} w_i\,\frac{dP_i}{P_i}, \tag{9.8} \]
where $w_i = D_i/M$ denotes the weight of stock i in the market portfolio. (Note: Take care not to mix up $w_i$ and $w_i^k$; the latter represents the individual optimal investment weight in (9.5).)
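From (9.8), substitute the price dynamics of the m risky assets plus the risk-free rate to give
\[ \frac{dP_M}{P_M} = \left[\sum_{1}^{m} w_j(\alpha_j - r) + r\right]dt + \sum_{j=1}^{m} w_j\sigma_j\,dz_j, \tag{9.9} \]
where $dz_i\,dz_j = \rho_{ij}\,dt$, $(dt)^2 = 0$, and $dz_i\,dt = 0$.

From (9.9), the mean return of the market portfolio per unit time is
\[ \alpha_M \equiv \frac{1}{dt}\,E\left[\frac{dP_M}{P_M}\right] = \sum_{1}^{m} w_j(\alpha_j - r) + r. \]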
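The covariance between the market return and the return on the ith asset, for i = 1, . . . , m, is
\[
\begin{aligned}
\sigma_{iM}\,dt &= \left\langle \frac{dP_M}{P_M},\, \frac{dP_i}{P_i}\right\rangle \\
&= \left\langle \left[\sum_{1}^{m} w_j(\alpha_j - r) + r\right]dt + \sum_{j=1}^{m} w_j\sigma_j\,dz_j,\;\; \alpha_i\,dt + \sigma_i\,dz_i \right\rangle.
\end{aligned}
\]
Omitting the dt dt, dt dz and dz dt terms, we have
\[ \sigma_{iM}\,dt = \sum_{j=1}^{m} w_j\rho_{ij}\sigma_i\sigma_j\,dt. \tag{9.10} \]
The variance of the market portfolio is
\[
\begin{aligned}
\sigma_M^2\,dt &= \left\langle \frac{dP_M}{P_M},\, \frac{dP_M}{P_M}\right\rangle = \left\langle \sum_{i=1}^{m} w_i\sigma_i\,dz_i,\; \sum_{j=1}^{m} w_j\sigma_j\,dz_j \right\rangle \\
&= \sum_{i=1}^{m}\sum_{j=1}^{m} w_i w_j\rho_{ij}\sigma_i\sigma_j\,dt \\
&= \sum_{i=1}^{m}\sum_{j=1}^{m} w_i w_j\sigma_{ij}\,dt.
\end{aligned}
\]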
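The covariance between the market return and interest rate changes is
\[ \sigma_{Mr}\,dt = \left\langle \frac{dP_M}{P_M},\, \frac{dP_r}{P_r}\right\rangle, \]
where $P_r$ is the price of the risk-free asset. But $P_r = 1$ and $dP_r/P_r = dr$, thus
\[ \sigma_{Mr}\,dt = \left\langle \frac{dP_M}{P_M},\, dr\right\rangle. \tag{9.11} \]
Substitute (9.9) and dr from (9.2) into (9.11), writing $dz_r$ for the Wiener increment dq in (9.2):
\[
\begin{aligned}
\sigma_{Mr}\,dt &= \left\langle \left[\sum_{1}^{m} w_j(\alpha_j - r) + r\right]dt + \sum_{1}^{m} w_j\sigma_j\,dz_j,\;\; \alpha_r\,dt + \sigma_r\,dz_r \right\rangle \\
&= \left\langle \sum_{1}^{m} w_j\sigma_j\,dz_j,\; \sigma_r\,dz_r \right\rangle \\
&= \sum_{1}^{m} w_j\sigma_{jr}\,dt,
\end{aligned}
\tag{9.12}
\]
given $dz_j\,dz_r = \rho_{jr}\,dt$ and $\sigma_{jr} \equiv \rho_{jr}\sigma_j\sigma_r$.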

9.5 Security Market Line

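Rearranging the demand equation in (9.7), we get
\[ (\boldsymbol{\alpha} - r) = \frac{1}{A}\,\boldsymbol{\Omega}\,\mathbf{D} - \frac{H}{A}\,\boldsymbol{\sigma}_r. \]
Then for individual security i = 1, . . . , m, use the definition $w_j = D_j/M$ or $D_j = M w_j$,
\[ \alpha_i - r = \frac{M}{A}\sum_{1}^{m} w_j\sigma_{ij} - \frac{H}{A}\,\sigma_{ir}. \]
Recall from (9.3) that $\sigma_{ir}$ is the covariation between security return and interest rate; then from (9.10),
\[ \alpha_i - r = \frac{M}{A}\,\sigma_{iM} - \frac{H}{A}\,\sigma_{ir}. \tag{9.13} \]
Multiply both sides of (9.13) by $w_i$ and sum over the m risky assets,
\[ \sum_{1}^{m} w_i\alpha_i - \sum_{1}^{m} w_i r = \frac{M}{A}\sum_{1}^{m} w_i\sigma_{iM} - \frac{H}{A}\sum_{1}^{m} w_i\sigma_{ir}. \]
From (9.10) and (9.12), we get the market excess return
\[ \alpha_M - r = \frac{M}{A}\,\sigma_M^2 - \frac{H}{A}\,\sigma_{Mr}. \]
When the interest rate is constant, $\sigma_{Mr} = \sigma_{ir} = 0$, so that
\[ \alpha_i - r = \frac{M}{A}\,\sigma_{iM}, \quad\text{and}\quad \alpha_M - r = \frac{M}{A}\,\sigma_M^2, \]
\[ \frac{\alpha_i - r}{\sigma_{iM}} = \frac{\alpha_M - r}{\sigma_M^2}, \]
\[ \alpha_i = (\alpha_M - r)\,\frac{\sigma_{iM}}{\sigma_M^2} + r = (\alpha_M - r)\beta_{iM} + r. \tag{9.14} \]
Equation (9.14) is the well-known Security Market Line, the foundation of the Capital Asset Pricing Model (CAPM).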
Although the CAPM has been widely used since its inception, several assumptions are critical to its validity:
(i) The model is developed by first having individuals optimise consumption and investment based on utility and bequest functions that are strictly concave.
(ii) Homogeneous expectations, i.e. all investors have the same expectations regarding the returns and risks of all assets as well as the risk-free interest rate.
(iii) The interest rate in the CAPM is constant. In the more general two-fund separation, this implies that changes in rates are not correlated with returns on other assets.
(iv) The market reaches equilibrium when all demands meet all supplies.
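The step from the aggregate demand equation to the Security Market Line can be verified with a small numerical example. The sketch below assumes a constant interest rate (σ_ir = 0), takes a hypothetical covariance matrix, market weights and a value for M/A, builds the expected returns from (9.13), and then checks that they satisfy (9.14). All names and numbers are illustrative assumptions.

```python
def sml_from_demand(cov, w, m_over_a, r):
    """Expected returns implied by (9.13) with sigma_ir = 0, plus market quantities.

    cov      : covariance matrix [sigma_ij] of the risky assets (list of lists)
    w        : market weights w_i = D_i / M, summing to one
    m_over_a : the ratio M / A (hypothetical input)
    r        : constant risk-free rate
    """
    n = len(w)
    sigma_iM = [sum(w[j] * cov[i][j] for j in range(n)) for i in range(n)]  # (9.10)
    sigma_M2 = sum(w[i] * sigma_iM[i] for i in range(n))                    # market variance
    alpha = [r + m_over_a * s for s in sigma_iM]                            # (9.13)
    alpha_M = sum(w[i] * alpha[i] for i in range(n))
    beta = [s / sigma_M2 for s in sigma_iM]
    return alpha, alpha_M, beta


cov = [[0.040, 0.010, 0.004],
       [0.010, 0.090, 0.012],
       [0.004, 0.012, 0.025]]
w = [0.5, 0.2, 0.3]
alpha, alpha_M, beta = sml_from_demand(cov, w, m_over_a=2.5, r=0.03)
for a_i, b_i in zip(alpha, beta):
    # Both columns agree: alpha_i and r + beta_i * (alpha_M - r)
    print(round(a_i, 6), round(0.03 + b_i * (alpha_M - 0.03), 6))
```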
9.6 Three-fund Separation

So far we have assumed that the investment opportunity set is constant, i.e. the efficient frontier does not change when the interest rate changes. If changes in the interest rate affect the yields of other assets, the efficient frontier will change. Hence, the interest rate now enters as a source of risk that investors want to avoid. The optimisation problem of portfolio choice now involves minimising interest rate risk alongside minimising the variance of asset returns. This leads to the three-fund separation theorem. The first two funds provide choices for investors to optimise their consumption, while the third provides a means of hedging against movements in the investment opportunity set. A natural way to hedge against a variable is to hold a portfolio that is perfectly correlated with it. Therefore, the third fund is a portfolio that is perfectly correlated with changes in the interest rate. If asset prices follow the diffusion process in (9.1), the optimal weights with a risk-free asset are
\[ w_k^* = m(P, W, t)\,g_k(P, t) + f_k(P, W, t) \]
(see the discussion in Chapter 5). The solution we obtained previously has a constant risk-free rate. If the interest rate is also stochastic, then $w_k^*$ will have another term, G, to reflect the dependence of the optimisation problem J on r,
\[ w_k^* = m(P, W, t)\,g_k(P, t) + f_k(P, W, t) + G(r, W, t). \]
The solution will lead to the three-fund separation theorem.
9.7 Empirical Application of CAPM

Merton commented that the SML in (9.14) is a relationship between asset returns. In empirical studies using the regression in (9.15) below with equity returns instead of the firm’s returns, a systematic bias would be introduced:
\[ R_{it} - r = \beta_i(R_{Mt} - r) + \gamma_i + \xi_{it}, \tag{9.15} \]
where $R_t = \ln(P_t/P_{t-1})$.
To analyse this observation, we write the firm’s value as
\[ \frac{dV}{V} = \alpha\,dt + \sigma\,dZ \tag{9.16} \]
and the equity value as $F(V, \tau) = V(t) - D(V, \tau)$ with dynamics
\[ \frac{dF}{F} = \alpha_e\,dt + \sigma_e\,dZ, \tag{9.17} \]
where $\alpha_e$ and $\sigma_e$ are functions of V and debt maturity τ. Moreover, like every security in the economy, the equity and the asset value of the firm must satisfy the CAPM in (9.14) in equilibrium,
\[ \alpha_e - r = \frac{\rho\,\sigma_e\sigma_M}{\sigma_M^2}(\alpha_M - r), \tag{9.18} \]
\[ \alpha - r = \frac{\rho\,\sigma\sigma_M}{\sigma_M^2}(\alpha_M - r). \tag{9.19} \]
Combining (9.18) and (9.19), we get
\[ \frac{\alpha_e - r}{\sigma_e} = \frac{\alpha - r}{\sigma}, \qquad \alpha_e - r = \frac{\sigma_e}{\sigma}(\alpha - r). \]
Applying Ito’s lemma to $F(V, \tau)$ with V following (9.16), we get
\[ dF = \left[\frac{\partial F}{\partial t} + \frac12\sigma^2 V^2\frac{\partial^2 F}{\partial V^2} + \alpha V\frac{\partial F}{\partial V}\right]dt + \sigma V\frac{\partial F}{\partial V}\,dZ. \]
Comparing this with (9.17), we get
\[ \sigma_e F \equiv \sigma V F_V, \qquad (\alpha_e - r) = \frac{V F_V}{F}(\alpha - r). \]
This means that the SML regression coefficient would now become $\beta\,V F_V/F$, and will vary with changes in V.
Hence, although the value of the firm follows a simple dynamic process with constant parameters as described in (9.16), the individual component securities follow more complex processes with changing expected returns and variances. Thus, in empirical examinations using a regression such as (9.15), if one were to use equity instead of firm values, systematic biases would be introduced. One may find cases where the equity of one firm is more comparable with the risky debt of another firm than with that firm’s equity.
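To see how the factor V F_V / F moves with firm value, the sketch below treats equity as a Black–Scholes call on V with strike equal to the face value of zero-coupon debt. This specific capital structure, the function names and the parameter values are assumptions made for illustration; the text only requires F = V − D(V, τ).

```python
from math import erf, exp, log, sqrt


def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def equity_elasticity(V, B, r, sigma, tau):
    """Return V * F_V / F when equity F is a Black-Scholes call on V with strike B."""
    d1 = (log(V / B) + (r + 0.5 * sigma**2) * tau) / (sigma * sqrt(tau))
    d2 = d1 - sigma * sqrt(tau)
    F = V * norm_cdf(d1) - B * exp(-r * tau) * norm_cdf(d2)  # equity value
    F_V = norm_cdf(d1)                                       # Black-Scholes delta
    return V * F_V / F


# The multiplier on the asset beta falls towards one as V rises relative to B.
for V in (80.0, 100.0, 120.0, 200.0):
    print(V, round(equity_elasticity(V, B=100.0, r=0.05, sigma=0.2, tau=1.0), 3))
```

Under these assumptions the equity beta of a highly levered firm is a large multiple of its asset beta, and the multiple changes whenever V changes, which is the source of the bias discussed above.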
Exercises: General Equilibrium

1. Suppose individual k has the wealth process
\[ dW^k = \left[\sum_{1}^{m} w_i^k(\alpha_i - r) + r\right] W^k dt + \sum_{1}^{m} w_i^k W^k\sigma_i\,dz_i - C^k dt, \]
where the asset prices $P_i$ follow geometric Brownian motion for i = 1, . . . , m,
\[ \frac{dP_i}{P_i} = \alpha_i\,dt + \sigma_i\,dZ_i, \]
and the interest rate follows the Gaussian process
\[ dr = \alpha_r\,dt + \sigma_r\,dq. \]
(a) Show that the individual demand function is
\[ d_i^k = w_i^k W^k = -\frac{J_W^k}{J_{WW}^k}\sum_{j=1}^{m}\nu_{ij}(\alpha_j - r) - \frac{J_{Wr}^k}{J_{WW}^k}\sum_{j=1}^{m}\nu_{ij}\sigma_{jr}. \]
(b) Explain in your own words the effect of the derived utility J and its partial derivatives on the FOC $\phi_{w_i} = 0$ and the optimal solution
\[ J_W^k(\boldsymbol{\alpha} - r) + J_{WW}^k(\boldsymbol{\Omega}\,\mathbf{w}^k)W^k + J_{Wr}^k\boldsymbol{\sigma}_r = 0. \]
2. Given the individual demand function
\[ d_i^k = w_i^k W^k = -\frac{J_W^k}{J_{WW}^k}\sum_{j=1}^{m}\nu_{ij}(\alpha_j - r) - \frac{J_{Wr}^k}{J_{WW}^k}\sum_{j=1}^{m}\nu_{ij}\sigma_{jr}: \]
(a) Derive the aggregate demand $\mathbf{D}$ below
\[ \mathbf{D} = A\boldsymbol{\Omega}^{-1}(\boldsymbol{\alpha} - r) + H\boldsymbol{\Omega}^{-1}\boldsymbol{\sigma}_r, \]
\[ A \equiv \sum_{1}^{K} A^k \quad\text{and}\quad H \equiv \sum_{1}^{K} H^k, \]
\[ A^k = -\frac{J_W^k}{J_{WW}^k}, \qquad H^k = -\frac{J_{Wr}^k}{J_{WW}^k}. \]
(b) Derive the individual asset excess returns
\[ \alpha_i - r = \frac{M}{A}\,\sigma_{iM} - \frac{H}{A}\,\sigma_{ir} \]
and interpret this solution in your own words. [Hint: see Merton (1990) Chapter 15, at the end of Sec. 15.5.]
3. Consider the general case
\[ \alpha_M - r = \frac{M}{A}\,\sigma_M^2 - \frac{H}{A}\,\sigma_{Mr}, \]
where the interest rate is stochastic.
(a) Explain how, in the case of a constant interest rate, it leads to the CAPM and the two-fund separation theorem.
(b) Explain how, in the case of a stochastic interest rate, it leads to the three-fund separation theorem. Discuss the connection with
\[ w_k^* = h_k(P, t) + m(P, W, t)\,g_k(P, t) + f_k(P, W, t) \]
in Eq. (5.8). [Hint: see Merton (1990) Chapter 15, at the end of Sec. 15.7, Theorem 15.2.]
4. Prove Breeden’s Theorem 15.7, consumption-based capital asset
pricing model (CCAPM). Explain in your own words why this is
a preference-free specification.
5. Under the condition that there exist traded securities with returns
that are instantaneously perfectly correlated with the changes in
all state variables in the economy, the continuous time model
is equivalent to Arrow–Debreu complete markets model and the
dynamic trading in securities can be a substitute for a full set
of market for pure securities. Discuss the necessary and sufficient
conditions under which these claims are true under
(a) Three-fund separation theorem (see Merton (1990), Sec. 15.7).
(b) m+2-fund separation theorem (see Merton (1990), Sec. 15.10,
Eqs. (15.48) and (15.50) in particular).

Chapter 10
Discontinuity in Continuous Time
It has been known for some time in the credit literature that many
empirical results cannot be explained by diffusion processes alone.
For an asset value that follows diffusion process, there is a mini-
mum time required for the asset value to drop below the default
threshold. Empirically observed credit spread and numerous finan-
cial crises show that default can take place at any time and instantly.
Separately, the implied volatility surface observed in the financial
markets also requires the possibility of a stock price jump in order to
explain the steep skewness of the implied volatilities of short matu-
rity options. In this chapter, we show how continuous time technique
can be expanded to analyse and model jump processes needed to
address many empirically observed phenomena.
10.1 Counting and Marked Point Process

Let $I_t$ be an indicator process that takes the value of 1 when a particular event occurs and 0 otherwise, as shown in Fig. 10.1. A point process, $N_t$, indicates how many times the event has occurred at and before time t. As shown in Fig. 10.2, $N_t$ is càdlàg, i.e. right continuous with left limits.

[Figure 10.1: Indicator function.]
[Figure 10.2: Point process.]

Let $X_t$ be a marked point process
\[ X_t = \sum_{i=0}^{N_t} Y_{\tau_i}, \]
i.e. $X_t$ is an indicator process $I_t$ coupled with a sequence of $Y_{\tau_i}$, where $\tau_i$ denotes the time when the event occurs and $Y_{\tau_i}$ follows some distribution. Figure 10.3 shows a typical process for $X_t$, and Fig. 10.4 shows a cumulated $X_t$ process starting at zero and with the mean drift adjusted to zero, making it a martingale process. Such a zero drift-adjusted process is also known as a compensated process.
10.2 Poisson Process

The Poisson process is normally characterised by the jump intensity, λ, i.e. the rate of occurrence of the jump event per unit time, so that the probability of a jump over a short interval Δt is λΔt. We assume here that the numbers of jump occurrences in non-overlapping time intervals are independent. We will show below how λ can be used to obtain the distribution of the number of jumps in a given time interval and the expected number of jumps in unit time.
[Figure 10.3: Simulation of the marks associated with a point process.]
[Figure 10.4: Trajectory of a compensated Poisson process.]
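First, divide [0, T] into n intervals such that $\Delta t = T/n$. Then,
\[ \Pr(N_{t+\Delta t} - N_t = 1) = \lambda\Delta t, \qquad \Pr(N_{t+\Delta t} - N_t = 0) = 1 - \lambda\Delta t. \]
Over consecutive periods, we have, due to the iid increments,
\[ \Pr(N_{t+2\Delta t} - N_t = 0) = (1 - \lambda\Delta t)^2 = \Pr(N_{2\Delta t} - N_0 = 0). \]
Hence, the probability of no jump over [0, T] is
\[ \Pr(N_{n\Delta t} - N_0 = 0) = (1 - \lambda\Delta t)^n = \left(1 - \lambda\frac{T}{n}\right)^n, \]
\[ \lim_{n\to\infty}\Pr(N_{n\Delta t} - N_0 = 0) = e^{-\lambda T}, \]
and similarly, the probability of one jump is
\[ \Pr(N_{n\Delta t} - N_0 = 1) = n\cdot\lambda\Delta t\cdot(1 - \lambda\Delta t)^{n-1} = \frac{\lambda T}{1 - \lambda\frac{T}{n}}\left(1 - \lambda\frac{T}{n}\right)^n. \]
Making use of the fact that $(1 - \lambda T/n)^n \to e^{-\lambda T}$ and $\lambda T/n \to 0$ as $n \to \infty$, we have
\[ \lim_{n\to\infty}\Pr(N_{n\Delta t} - N_0 = 1) = \lambda T e^{-\lambda T}. \]
In general, the probability of m jumps over the interval [0, T] is
\[ \Pr(N_{n\Delta t} - N_0 = m) = \binom{n}{m}(\lambda\Delta t)^m(1 - \lambda\Delta t)^{n-m} = \frac{n!}{(n-m)!\,m!}\left(\frac{\lambda T}{n - \lambda T}\right)^m\left(1 - \lambda\frac{T}{n}\right)^n. \]
For a very large n, $\dfrac{n!\,n^{-m}}{(n-m)!} \simeq 1$. Hence,
\[ \lim_{n\to\infty}\Pr(N_{n\Delta t} - N_0 = m) = \frac{(\lambda T)^m}{m!}\,e^{-\lambda T}. \]
For any selected time interval,
\[ \Pr(N_T - N_t = m) = \frac{[\lambda(T - t)]^m}{m!}\,e^{-\lambda(T - t)}. \]
For unit time, where T − t = 1 and $N_0 = 0$ at the start, the probability of m jumps in that period is
\[ \Pr(N = m) = \frac{\lambda^m}{m!}\,e^{-\lambda}. \tag{10.1} \]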
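The limiting argument can be checked numerically by comparing the binomial probability of m jumps in n sub-intervals with the Poisson probability. This is a small sketch with hypothetical values of λ, T and n; the function names are illustrative.

```python
from math import comb, exp, factorial


def binomial_jump_prob(m, lam, T, n):
    """Probability of m jumps when [0, T] is split into n intervals of length T / n."""
    p = lam * T / n                      # probability of one jump per sub-interval
    return comb(n, m) * p**m * (1.0 - p) ** (n - m)


def poisson_jump_prob(m, lam, T):
    """Limiting probability (lam*T)^m exp(-lam*T) / m!."""
    return (lam * T) ** m * exp(-lam * T) / factorial(m)


lam, T = 2.0, 1.5
for m in range(5):
    # The two columns converge as n grows.
    print(m, binomial_jump_prob(m, lam, T, n=1_000_000), poisson_jump_prob(m, lam, T))
```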
Lemma 10.2.1. For the special series below:
\[ a e^{a} = \sum_{m=0}^{\infty} m\,\frac{a^m}{m!}. \]
The expected number of jumps per unit time is
\[ E(N) = \sum_{m=0}^{\infty} m\Pr(N = m). \]
Substituting (10.1) into this expression, we get
\[ E(N) = \sum_{m=0}^{\infty} m\,e^{-\lambda}\frac{\lambda^m}{m!} = e^{-\lambda}\sum_{m=0}^{\infty} m\,\frac{\lambda^m}{m!}. \]
Using the result in Lemma 10.2.1 gives
\[ E(N) = e^{-\lambda}\cdot\lambda e^{\lambda} = \lambda. \]
Here, we show that given the jump intensity λ, the expected number of jumps per unit time turns out to also be λ. It is important to note that the actual number of jumps at time t, $N_t$, cannot be easily estimated from observed data using maximum likelihood. $N_t$ is known as an incidental or nuisance parameter (or variable).
10.3 Constant Jump Size

In this section, we first assume a simple case where the jump size is constant. We use this special case to provide the intuition for why the market price of jump risk is related to the jump intensity, λ. We will discuss, in the next section, the case when the jump size is not constant. First, consider the continuous part $dS^c$ and the jump part $dS^d$ separately for the price process
\[ dS^c = \mu S^c dt + \sigma S^c dW, \]
\[ dS^d = (J - 1)S^d dq, \tag{10.2} \]
\[ dq = \begin{cases} 0 & \text{with probability } 1 - \lambda dt, \\ 1 & \text{with probability } \lambda dt. \end{cases} \]
In the event of a jump, dq = 1 and
\[ dS^d = (J - 1)S^d \times 1, \qquad S^+ - S^- = (J - 1)S^-, \qquad J = \frac{S^+}{S^-}. \]
That is, (J − 1) is the percentage change in the stock price when the Poisson event occurs.
Ito’s lemma applied to only the jump part for a derivative V(S, t) gives
\[ dV = (V^+ - V^-)\,dq, \qquad V^+ = V(JS^-, t) = V(S^+, t), \qquad V^- = V(S^-, t), \]
and the total variation, including both the jump and the diffusion parts, is
\[ dV = \left[\mu S\frac{\partial V}{\partial S} + \frac12\sigma^2 S^2\frac{\partial^2 V}{\partial S^2} + \frac{\partial V}{\partial t}\right]dt + \sigma S\frac{\partial V}{\partial S}\,dW + (V^+ - V^-)\,dq, \]
with $\partial/\partial S = \partial/\partial S^c$ for the continuous part. In general, dS is written without specifically separating the diffusion and the jump parts as follows:
\[ dS = \mu S\,dt + \sigma S\,dW + (J - 1)S\,dq. \tag{10.3} \]
10.3.1 Fundamental PDE with constant jump size

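To achieve no arbitrage, create a portfolio Π with one derivative V, Δ amount of stock and $\Delta_1$ amount of a second derivative $V_1$ written on the same stock,
\[ \Pi = V - \Delta S - \Delta_1 V_1, \]
\[
\begin{aligned}
d\Pi ={}& dV - \Delta\,dS - \Delta_1 dV_1 \\
={}& \left[\frac12\sigma^2 S^2\frac{\partial^2 V}{\partial S^2} + \frac{\partial V}{\partial t}\right]dt - \Delta_1\left[\frac12\sigma^2 S^2\frac{\partial^2 V_1}{\partial S^2} + \frac{\partial V_1}{\partial t}\right]dt \\
&+ \left[\frac{\partial V}{\partial S} - \Delta - \Delta_1\frac{\partial V_1}{\partial S}\right](\mu S\,dt + \sigma S\,dW) \\
&+ \left[V^+ - V^- - \Delta(J - 1)S - \Delta_1(V_1^+ - V_1^-)\right]dq.
\end{aligned}
\tag{10.4}
\]
Assume for now that J is constant. Then there is a pair $(\Delta, \Delta_1)$ that eliminates the uncertainties caused by the diffusion and the jump, such that the portfolio is risk free and earns the risk-free return
\[ d\Pi = r\Pi\,dt. \]
This means, to eliminate the diffusion part,
\[ \frac{\partial V}{\partial S} - \Delta - \Delta_1\frac{\partial V_1}{\partial S} = 0, \qquad \Delta = \frac{\partial V}{\partial S} - \Delta_1\frac{\partial V_1}{\partial S}, \]
and to eliminate the jump part,
\[ (V^+ - V^-) - \Delta(J - 1)S - \Delta_1(V_1^+ - V_1^-) = 0, \]
\[ (V^+ - V^-) - (J - 1)S\frac{\partial V}{\partial S} + \Delta_1(J - 1)S\frac{\partial V_1}{\partial S} - \Delta_1(V_1^+ - V_1^-) = 0, \]
or
\[ \Delta_1 = \frac{(J - 1)S\dfrac{\partial V}{\partial S} - (V^+ - V^-)}{(J - 1)S\dfrac{\partial V_1}{\partial S} - (V_1^+ - V_1^-)}. \tag{10.5} \]
Note that this is a case of solving two equations with two unknowns.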
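Now return to (10.4). With the diffusion and jump terms eliminated,
\[
\begin{aligned}
d\Pi &= \left[\frac12\sigma^2 S^2\frac{\partial^2 V}{\partial S^2} + \frac{\partial V}{\partial t}\right]dt - \Delta_1\left[\frac12\sigma^2 S^2\frac{\partial^2 V_1}{\partial S^2} + \frac{\partial V_1}{\partial t}\right]dt \\
&= r\Pi\,dt \\
&= r(V - \Delta S - \Delta_1 V_1)\,dt.
\end{aligned}
\]
Rearranging, we get
\[ \left[\frac12\sigma^2 S^2\frac{\partial^2 V}{\partial S^2} + \frac{\partial V}{\partial t}\right] - \Delta_1\left[\frac12\sigma^2 S^2\frac{\partial^2 V_1}{\partial S^2} + \frac{\partial V_1}{\partial t}\right] = r\left[V - S\frac{\partial V}{\partial S} + \Delta_1 S\frac{\partial V_1}{\partial S} - \Delta_1 V_1\right], \]
\[ \frac{\partial V}{\partial t} + \frac12\sigma^2 S^2\frac{\partial^2 V}{\partial S^2} + rS\frac{\partial V}{\partial S} - rV = \left[\frac{\partial V_1}{\partial t} + \frac12\sigma^2 S^2\frac{\partial^2 V_1}{\partial S^2} + rS\frac{\partial V_1}{\partial S} - rV_1\right]\Delta_1. \]
Let us define the differential operator
\[ \mathcal{L} \equiv \frac{\partial}{\partial t} + \frac12\sigma^2 S^2\frac{\partial^2}{\partial S^2} + rS\frac{\partial}{\partial S} - r. \]
Then
\[ \mathcal{L}V = \Delta_1 \times \mathcal{L}V_1, \qquad \Delta_1 = \frac{\mathcal{L}V}{\mathcal{L}V_1}. \]
Substituting $\Delta_1$ from (10.5),
\[ \frac{\mathcal{L}V}{\mathcal{L}V_1} = \frac{(J - 1)S\dfrac{\partial V}{\partial S} - (V^+ - V^-)}{(J - 1)S\dfrac{\partial V_1}{\partial S} - (V_1^+ - V_1^-)}, \]
\[ \frac{\mathcal{L}V}{(J - 1)S\dfrac{\partial V}{\partial S} - (V^+ - V^-)} = \frac{\mathcal{L}V_1}{(J - 1)S\dfrac{\partial V_1}{\partial S} - (V_1^+ - V_1^-)}. \tag{10.6} \]
A closer examination of (10.6) reveals that the LHS involves functions of V only (not $V_1$) and the RHS is expressed in terms of $V_1$ only (not V). This means that either side must be applicable for any other derivative $V_i$, i = 1, 2, 3, . . . and for this reason, the LHS (or the RHS) must be a function of only S and t and not V. Let us denote this function as Ψ(S, t). Thus, we may write
\[ \frac{\mathcal{L}V}{(J - 1)S\dfrac{\partial V}{\partial S} - (V^+ - V^-)} = \Psi(S, t), \]
\[ \mathcal{L}V + \Psi(S, t)\left[V^+ - V^- - (J - 1)S\frac{\partial V}{\partial S}\right] = 0. \tag{10.7} \]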
10.3.2 Market price of jump risk

Now, we exploit the fact that Eq. (10.7) must apply to any derivative written on S, including all special cases. Consider a special case where the derivative V pays £1 at T if there is no jump and zero otherwise. Since V does not depend on the value of $S^c$, $\partial V/\partial S = 0$. Also, if there is a jump,
\[ V^+ = V(S^+, t) = 0, \qquad V^- = V(S^-, t) = V. \]
Hence, Ψ(S, t) = Ψ(t) is a function of t only and Eq. (10.7) becomes
\[ \frac{\partial V}{\partial t} - rV = \Psi(t)V, \qquad \frac{1}{V}\frac{\partial V}{\partial t} = \Psi(t) + r. \]
Integrating both sides,
\[ \int_0^T \frac{1}{V}\frac{\partial V}{\partial t}\,dt = \int_0^T \Psi(t)\,dt + \int_0^T r\,dt, \]
\[ \ln V_T - \ln V_0 = \int_0^T \Psi(t)\,dt + rT, \]
\[ V_T = V_0\,e^{\int_0^T \Psi(t)\,dt + rT}, \]
\[ V_0 = V_T\,e^{-\int_0^T \Psi(t)\,dt - rT} = V_T\,e^{-(\lambda + r)T}, \]
where the last equality takes Ψ(t) = λ. We can see that Ψ(t) is related to the hazard rate (or jump intensity) of the Poisson process. If Ψ(t) = λ, then
\[ \mathcal{L}V + \lambda\left[V^+ - V^- - (J - 1)S\frac{\partial V}{\partial S}\right] = 0. \tag{10.8} \]
Here, λ reflects the jump risk and the bracket [· · ·] measures the market risk premium for jump risk.
10.3.3 European call price

Let $H(S_T) = (S_T - K)^+$ be the payoff function of a European call option, and let τ = T − t be the time to maturity. The European call price is
\[ F(S, \tau) = e^{-r\tau}E^Q[H(S_T)]. \]
First, note that under the risk neutral measure, the drift of the return process in (10.3) must satisfy
\[ E\left[\frac{dS}{S}\right] = \mu\,dt + \lambda E[J - 1]\,dt = r\,dt, \]
\[ \mu = r - \lambda\kappa, \qquad \kappa = E[J - 1]. \]
Now, we evolve the process in (10.3) in logs:
\[ d\log S = \left(r - \lambda\kappa - \frac12\sigma^2\right)dt + \sigma\,dW + \log J\,dq. \]
Over time τ = T − t and conditional on $N_\tau = n$, $\log S_T$ has the following distribution:
\[
\begin{aligned}
&\log S_t + \left(r - \lambda\kappa - \frac12\sigma^2\right)\tau + \sigma\sqrt{\tau}\,N(0, 1) + n\log J \\
&\quad= \log\left(S_t J^n e^{-\lambda\kappa\tau}\right) + \left(r - \frac12\sigma^2\right)\tau + \sigma\sqrt{\tau}\,N(0, 1),
\end{aligned}
\]
which can be priced as a Black–Scholes option with stock price $S_t^* = S_t J^n e^{-\lambda\kappa\tau}$. Note that since we assume here that the jump size is not stochastic, the jump does not contribute to the total variance. The jump affects the mean drift only through the compensator and has a deterministic impact, $J^n$, on the stock price $S_t$ conditional on there being n jumps.
Define the random variable $X_n$ to have the same distribution as the product of n independently distributed random variables, each identically distributed to the random variable J in (10.2), and $X_0 \equiv 1$. Define $E_n$ to be the expectation operator over the distribution of $X_n$. Given that the jump size is fixed, $X_n = J^n$ is also fixed conditional on a given value of n.
The derivative price is then a weighted sum of Black–Scholes prices taking into account the different values of n,
\[ F(S, \tau) = e^{-r\tau}\sum_{n=0}^{\infty}\Pr(N_\tau = n)\,E_n^Q[H(S_T)]. \]
Recall that the probability of n jumps per unit time is $\lambda^n e^{-\lambda}/n!$. Then over a period of length τ,
\[
\begin{aligned}
F(S, \tau) &= \sum_{n=0}^{\infty}\frac{e^{-\lambda\tau}(\lambda\tau)^n}{n!}\,e^{-r\tau}E_n^Q[H(S_T)] \\
&= \sum_{n=0}^{\infty}\frac{e^{-\lambda\tau}(\lambda\tau)^n}{n!}\,E_n\left\{W(S^*, \tau; \sigma^2, r, K)\right\},
\end{aligned}
\tag{10.9}
\]
where $W \equiv W(S^*, \tau; \sigma^2, r, K)$ is the Black–Scholes option price, with $S^* = S X_n e^{-\lambda\kappa\tau}$.
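A minimal sketch of (10.9) for a fixed jump size follows: each term is a Black–Scholes price evaluated at the shifted stock price S J^n e^{−λκτ} and weighted by the Poisson probability of n jumps. The truncation point `n_max`, the function names and the parameter values are assumptions for illustration.

```python
from math import erf, exp, factorial, log, sqrt


def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def bs_call(S, K, r, sigma, tau):
    """Black-Scholes European call price W(S, tau; sigma^2, r, K)."""
    d1 = (log(S / K) + (r + 0.5 * sigma**2) * tau) / (sigma * sqrt(tau))
    d2 = d1 - sigma * sqrt(tau)
    return S * norm_cdf(d1) - K * exp(-r * tau) * norm_cdf(d2)


def constant_jump_call(S, K, r, sigma, tau, lam, J, n_max=60):
    """Poisson-weighted sum of Black-Scholes prices, eq. (10.9), with fixed jump size J > 0."""
    kappa = J - 1.0                                   # E[J - 1] for a constant J
    price = 0.0
    for n in range(n_max):                            # truncated infinite sum
        weight = exp(-lam * tau) * (lam * tau) ** n / factorial(n)
        S_star = S * J**n * exp(-lam * kappa * tau)   # S* = S X_n exp(-lam kappa tau)
        price += weight * bs_call(S_star, K, r, sigma, tau)
    return price


print(round(bs_call(100.0, 100.0, 0.05, 0.2, 1.0), 4))
print(round(constant_jump_call(100.0, 100.0, 0.05, 0.2, 1.0, lam=0.5, J=0.8), 4))
```

With J = 1 the sum collapses to the plain Black–Scholes price, which is a convenient sanity check on the weights.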
10.3.4 Immediate ruin

Samuelson (1972) specifies a special case where the stock price goes to zero if the Poisson event occurs. In this case, we need only consider the n = 0 term in (10.9), as $X_n = 0$ for $n \neq 0$. Moreover, given the definition of the payoff function, $X_0 = J^0 \equiv 1$, κ = −1 and $S^* = Se^{\lambda\tau}$, so that
\[ F(S, \tau) = \Pr(N_\tau = 0)\,e^{-r\tau}E_{n=0}^Q[H(S_T)] = e^{-\lambda\tau}\,W(Se^{\lambda\tau}, \tau; \sigma^2, r, K). \]
Since
\[ \log\left(S_t J^0 e^{\lambda\tau}\right) + \left(r - \frac12\sigma^2\right)\tau = \log S_t + \left(r + \lambda - \frac12\sigma^2\right)\tau, \]
λ can be absorbed into r to give
\[ F(S, \tau) = W(S, \tau; \sigma^2, (r + \lambda), K), \tag{10.10} \]
which is a Black–Scholes price with r replaced by r + λ.
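The equivalence between the survival-weighted price and the Black–Scholes price with rate r + λ can be confirmed numerically. The sketch below uses illustrative parameter values and hypothetical function names.

```python
from math import erf, exp, log, sqrt


def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def bs_call(S, K, r, sigma, tau):
    """Black-Scholes European call price W(S, tau; sigma^2, r, K)."""
    d1 = (log(S / K) + (r + 0.5 * sigma**2) * tau) / (sigma * sqrt(tau))
    d2 = d1 - sigma * sqrt(tau)
    return S * norm_cdf(d1) - K * exp(-r * tau) * norm_cdf(d2)


S, K, r, sigma, tau, lam = 100.0, 100.0, 0.05, 0.2, 1.0, 0.1

# exp(-lam*tau) * W(S*exp(lam*tau); r): only the no-jump term survives
ruin_price = exp(-lam * tau) * bs_call(S * exp(lam * tau), K, r, sigma, tau)
# Same quantity written as a Black-Scholes price with rate r + lam, eq. (10.10)
shifted_rate_price = bs_call(S, K, r + lam, sigma, tau)

print(round(ruin_price, 6), round(shifted_rate_price, 6))
```

The call is worth more than in the no-jump case because the stock must earn r + λ while it survives in order to compensate for the possibility of ruin.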
10.4 Random Jump Size

In the previous section, we assumed that J is fixed. If J takes two values, then we need one extra derivative to achieve a complete hedge. If the jump size J is a random variable following some distribution, then we will need an infinite number of hedging assets! So when the jump size is random, it is not possible to fully hedge away the jump risk. Under the risk neutral measure (i.e. one obtained through risk premium compensation rather than through hedging), the discounted price must still be a martingale, so the drift of the combined process must satisfy
\[ E\left[\frac{dS}{S}\right] = \mu\,dt + \lambda E(J - 1)\,dt = r\,dt, \tag{10.11} \]
which means μ = r − λκ for κ = E(J − 1), where J now denotes a random variable, in contrast to the fixed jump size of the previous section.
So, under the risk neutral measure,
\[ dS = (r - \lambda\kappa)S\,dt + \sigma S\,dW + (J - 1)S\,dq, \tag{10.12} \]
and we write (10.8) more generally as
\[ \mathcal{L}V + \lambda\left\{E\left[V^+ - V^-\right] - \kappa S\frac{\partial V}{\partial S}\right\} = 0, \tag{10.13} \]
where the expectation is taken over the jump size J.
Since a jump process with random size cannot be fully hedged, Merton assumes that “jump risk” is diversifiable, and therefore that no premium is attached to jumps. This is not true. In the physical “real-world” process in (10.11), the asset pricing process will adjust the stock price today such that the equilibrium μ reflects the magnitude of λ and κ. Under the risk neutral process in (10.12), the market price of a derivative will be adjusted for the jump risk. This means that the (risk neutral) λ and κ parameters, when calibrated to market derivative prices, are typically “inflated” for put options if the jump size is negative. The difference between the magnitudes of the risk neutral and real-world parameter values is an indication of the risk aversion of the investor and the cost of the indirect hedge.
Separately, Cont and Tankov (2004, Chapter 10) make several suggestions for pricing assets with jumps. The first is the superhedging approach, which leads to price bounds corresponding to a long hedge and a short hedge. The second approach is to find the optimal hedge by minimising the hedging error, usually formulated as the expectation of a convex loss function. This approach can then lead to a utility indifference price. Finally, quadratic hedging aims to minimise the hedging error in a mean square sense, assuming that one can trade the gains and losses systematically.
10.4.1 When J has a lognormal distribution

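[For a complete derivation, see Cont and Tankov (2004, p. 322); for a step-by-step derivation, see Joshi (2003).] Following Merton (1990, p. 321), J has a lognormal distribution: log J is normally distributed with variance $\delta^2$ and mean $\gamma - \delta^2/2$, so that $E[J] = e^{\gamma}$. Merton commented that γ ≡ log(1 + κ), which we can also write as $\kappa \equiv e^{\gamma} - 1$. Now, we evolve the process in (10.12) in logs and over the time period τ = T − t. Assuming that there are n jumps, $\log S_T$ has the following distribution:
\[
\begin{aligned}
&\log S_t + \left(r - \lambda\kappa - \frac12\sigma^2\right)\tau + \sigma\sqrt{\tau}\,N(0, 1) + \sum_{j=1}^{n}\log J_j \\
&\quad= \log S_t + \left(r_n - \frac12 v_n^2\right)\tau + \sigma\sqrt{\tau}\,N(0, 1) + \sum_{j=1}^{n}\delta\,N_j(0, 1),
\end{aligned}
\]
where
\[ v_n^2 \equiv \sigma^2 + \frac{n}{\tau}\,\delta^2, \qquad r_n \equiv r - \lambda\kappa + \frac{n}{\tau}\,\gamma. \]
Here, the jump is stochastic and has an impact on the drift and the variance rates of the stock price process. First, it enters the drift through the compensator −λκ. Next, depending on the number of actual jumps, n, it increases the drift by $\frac{n}{\tau}\gamma$ and increases the variance by $\frac{n}{\tau}\delta^2$.
Conditional on the number of jumps n, the price of the call option is
\[
\begin{aligned}
f_n(S, \tau) ={}& e^{-r\tau}S_t\,e^{r_n\tau - \frac12 v_n^2\tau + \frac12 v_n^2\tau}\,N\!\left(\frac{\ln S_t + \left(r_n - \frac12 v_n^2\right)\tau - \ln K}{v_n\sqrt{\tau}} + v_n\sqrt{\tau}\right) \\
&- e^{-r\tau}K\,N\!\left(\frac{\ln S_t + \left(r_n - \frac12 v_n^2\right)\tau - \ln K}{v_n\sqrt{\tau}}\right) \\
={}& S_t\,e^{-\lambda\kappa\tau}(1 + \kappa)^n\,N\!\left(\frac{\ln\frac{S_t}{K} + \left(r_n + \frac12 v_n^2\right)\tau}{v_n\sqrt{\tau}}\right) - e^{-r\tau}K\,N\!\left(\frac{\ln\frac{S_t}{K} + \left(r_n - \frac12 v_n^2\right)\tau}{v_n\sqrt{\tau}}\right).
\end{aligned}
\tag{10.14}
\]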
The actual call price F(S, τ) is the sum of these option prices (see footnote 2), each weighted by the probability that a Poisson random variable takes the value n:
\[ F(S, \tau) = \sum_{n=0}^{\infty}\frac{e^{-\lambda\tau}(\lambda\tau)^n}{n!}\,f_n(S, \tau). \]
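The weighted sum of the conditional prices in (10.14) is straightforward to evaluate numerically. The sketch below truncates the infinite sum at a hypothetical `n_max` and uses illustrative parameter values; it takes γ and δ as the lognormal jump parameters with κ = e^γ − 1, as stated in the text.

```python
from math import erf, exp, factorial, log, sqrt


def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def lognormal_jump_call(S, K, r, sigma, tau, lam, gamma, delta, n_max=60):
    """Poisson-weighted sum of the conditional call prices f_n in (10.14)."""
    kappa = exp(gamma) - 1.0
    price = 0.0
    for n in range(n_max):                                   # truncated infinite sum
        weight = exp(-lam * tau) * (lam * tau) ** n / factorial(n)
        v_n = sqrt(sigma**2 + n * delta**2 / tau)            # conditional volatility
        r_n = r - lam * kappa + n * gamma / tau              # conditional drift
        d1 = (log(S / K) + (r_n + 0.5 * v_n**2) * tau) / (v_n * sqrt(tau))
        d2 = d1 - v_n * sqrt(tau)
        f_n = (S * exp(-lam * kappa * tau) * (1.0 + kappa) ** n * norm_cdf(d1)
               - exp(-r * tau) * K * norm_cdf(d2))
        price += weight * f_n
    return price


# With lam = 0 the sum reduces to the Black-Scholes price; jumps add value.
print(round(lognormal_jump_call(100.0, 100.0, 0.05, 0.2, 1.0, 0.0, -0.1, 0.15), 4))
print(round(lognormal_jump_call(100.0, 100.0, 0.05, 0.2, 1.0, 0.5, -0.1, 0.15), 4))
```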
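[Footnote 2] In Merton, the following expression is given:
\[ F(S, \tau) = \sum_{n=0}^{\infty}\frac{\exp(-\lambda'\tau)(\lambda'\tau)^n}{n!}\,f_n(S, \tau), \tag{10.15} \]
where λ′ ≡ λ(1 + κ) and $f_n(S, \tau) \equiv W(S, \tau; v_n^2, r_n, K)$ is the Black–Scholes option price with interest rate $r_n$. Note that this $f_n$ differs from the $f_n$ in (10.14), which discounts the strike at r. The identity $\exp(-\lambda\tau)(\lambda\tau)^n \times \exp(-\lambda\kappa\tau)(1 + \kappa)^n = \exp(-\lambda'\tau)(\lambda'\tau)^n$ accounts for the $S_t$ component. The strike component agrees as well, because Merton’s W discounts K at $r_n$ and $\exp(-r_n\tau) = \exp(-r\tau)\exp(\lambda\kappa\tau)(1 + \kappa)^{-n}$. Applying the weights in (10.15) to a price that discounts the strike at r, as in (10.14), would not be correct.

10.5 Intertemporal Portfolio Selection with Jumps
In Das and Uppal (2004), the jump diffusion process
\[ \frac{dS_i}{S_i} = \alpha_i\,dt + \sigma_i\,dz_i + (J_i - 1)\,dQ(\lambda), \qquad i = 1, \dots, N, \tag{10.16} \]
has only systemic (common) jumps,
\[ dQ_i(\lambda_i) = dQ(\lambda), \quad i = 1, \dots, N, \qquad \ln J_i \sim N(\mu_i, \nu_i^2). \]
That is, jumps for all assets are assumed to arrive at the same time; conditional on there being a jump, the jump size is assumed to be perfectly correlated across assets; the values of all assets jump in the same direction. For i = 0, we have the risk-free asset
\[ dS_0 = rS_0\,dt. \]
In this model, the total covariance is
\[ E_t\left[\frac{dS_i}{S_i}\times\frac{dS_j}{S_j}\right] = \sigma_{ij}\,dt + \sigma_{ij}^J\,dt, \]
\[ \boldsymbol{\Sigma} = [\sigma_{ij}] \equiv [\sigma_i\sigma_j\rho_{ij}], \qquad \boldsymbol{\Sigma}^J = [\sigma_{ij}^J] \equiv f(\lambda, \mu_i, \nu_i^2). \]
Matching moments under the wrong assumption of no jumps produces the mean and total covariance
\[ \hat{\alpha}_i = \alpha_i + \alpha_i^J, \qquad \hat{\sigma}_{ij} = \sigma_{ij} + \sigma_{ij}^J. \]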
As Merton (1990, Section 9.4, Chapter 9) commented, when total
variance and covariance are correctly estimated but jumps omitted,
the valuation of option price may not be very different. The most
sensitive region will be the deep OTM (ITM) option, where there is
relatively little probability that stock price will exceed or fall below
the exercise price prior to expiration if the underlying process is con-
tinuous. However, the possibility of a large jump in price significantly
changes this probability and hence makes the option more valuable.
These differences will be magnified as one goes to short maturity
options, and the percentage difference could be substantial for OTM
options (i.e. put at low strike and call at high strike). If jumps have
directions (i.e. positive jump and negative jump), then the negative
jumps will have a direct impact on OTM put whereas the positive
jumps will have a direct impact on OTM calls.
10.5.1 Portfolio selection

With a power utility
\[ U(W_T) = \frac{W_T^{1-\gamma}}{1-\gamma}, \]
the objective is to maximise the value function
\[ V(W_t, t) = \max_{\{w_n\}} E\left[\frac{W_T^{1-\gamma}}{1-\gamma}\right] \]
with wealth dynamics
\[ \frac{dW_t}{W_t} = (w'R + r)\,dt + w'(\sigma \cdot dZ_t) + w'J_t\,dQ(\lambda), \quad W_0 = 1, \]
where $\lambda$ is the jump intensity, which is constant across time and for
all assets, $w$ is the $N \times 1$ vector of risky asset portfolio weights,
$R \equiv \{\alpha_1 - r, \ldots, \alpha_N - r\}$ is the vector of excess returns, $\sigma$ is the vector of volatilities,
$dZ_t$ is the vector of diffusion shocks, and $J_t \equiv \{J_1 - 1, \ldots, J_N - 1\}$
is the vector of random jump amplitudes for the $N$ assets at time $t$.
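The Bellman equation now becomes
\[ 0 = \max_{\{w\}} \left\{ \frac{\partial V}{\partial t} + \frac{\partial V}{\partial W} W_t (w'R + r) + \frac{1}{2} \frac{\partial^2 V}{\partial W^2} W_t^2\, w'\Sigma w + \lambda E\left[ V(W_t + W_t w'J_t, t) - V(W_t, t) \right] \right\}, \tag{10.17} \]
where $\Sigma \equiv [\sigma_{ij}]$ is the diffusion covariance matrix.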
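Das and Uppal’s Proposition 2 shows that if the value function has
the form
\[ V(W_t, t) = A(t) \frac{W_t^{1-\gamma}}{1-\gamma}, \]
\[ \frac{\partial V}{\partial t} = \frac{W_t^{1-\gamma}}{1-\gamma} \frac{\partial A(t)}{\partial t}, \]
\[ \frac{\partial V}{\partial W} = A(t) \frac{1}{W_t^{\gamma}}, \]
\[ \frac{\partial^2 V}{\partial W^2} = -\gamma A(t) W_t^{-\gamma-1}, \]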
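then
\[ \lambda E\left[ V(W_t + W_t w'J_t, t) - V(W_t, t) \right] = \lambda E\left[ V(W_t(1 + w'J_t), t) - V(W_t, t) \right] \]
\[ = \lambda \frac{A(t) W_t^{1-\gamma}}{1-\gamma} E\left[ (1 + w'J_t)^{1-\gamma} - 1 \right] \]
\[ = \lambda V(W_t, t)\, E\left[ (1 + w'J_t)^{1-\gamma} - 1 \right]. \]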
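Substituting this and the functional forms of $V$, $\partial V/\partial t$, $\partial V/\partial W$ and $\partial^2 V/\partial W^2$
into (10.17) gives
\[ 0 = \max_{\{w\}} \left\{ \frac{V(W_t, t)}{A(t)} \frac{\partial A(t)}{\partial t} + A(t) W_t^{1-\gamma} (w'R + r) - \frac{\gamma A(t) W_t^{1-\gamma}}{2}\, w'\Sigma w + \lambda V(W_t, t)\, E\left[ (1 + w'J_t)^{1-\gamma} - 1 \right] \right\}. \]
Dividing each term by $V(W_t, t) = A(t) W_t^{1-\gamma}/(1-\gamma)$ gives
\[ 0 = \max_{\{w\}} \left\{ \frac{1}{A(t)} \frac{\partial A(t)}{\partial t} + (1-\gamma)(w'R + r) - \frac{(1-\gamma)\gamma}{2}\, w'\Sigma w + \lambda E\left[ (1 + w'J_t)^{1-\gamma} - 1 \right] \right\} \tag{10.18} \]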
and differentiating with respect to $w$ gives
\[ 0 = R - \gamma \Sigma w + \lambda E\left[ J_t (1 + w'J_t)^{-\gamma} \right], \]
and $w$ has to be solved numerically. For pure diffusion,
\[ w = \frac{1}{\gamma} \Sigma^{-1} R. \]
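The numerical solution of the first-order condition can be sketched for a single risky asset. The example below is only an illustration under stated assumptions: one asset, a lognormal jump size $\ln J \sim N(\mu, \nu^2)$, a weight restricted to $[0, 1]$ so that $1 + wJ_t$ stays positive, and hypothetical parameter values. The function names (`expect_std_normal`, `foc`, `optimal_weight`) are not from the text; the expectation is approximated by a simple midpoint rule and the root is found by bisection.

```python
import math

def expect_std_normal(g, n=2000, zmax=8.0):
    """Midpoint-rule approximation of E[g(Z)] for Z ~ N(0, 1)."""
    h = 2.0 * zmax / n
    total = 0.0
    for k in range(n):
        z = -zmax + (k + 0.5) * h
        total += g(z) * math.exp(-0.5 * z * z)
    return total * h / math.sqrt(2.0 * math.pi)

def foc(w, excess, sigma, gamma, lam, mu, nu):
    """Single-asset first-order condition:
    excess - gamma * sigma^2 * w + lam * E[J_t (1 + w J_t)^(-gamma)]."""
    def integrand(z):
        jump = math.exp(mu + nu * z) - 1.0  # percentage jump, i.e. J - 1
        return jump * (1.0 + w * jump) ** (-gamma)
    jump_term = lam * expect_std_normal(integrand) if lam > 0.0 else 0.0
    return excess - gamma * sigma ** 2 * w + jump_term

def optimal_weight(excess, sigma, gamma, lam, mu, nu, lo=0.0, hi=1.0, iters=60):
    """Bisection on the first-order condition over [lo, hi]."""
    args = (excess, sigma, gamma, lam, mu, nu)
    f_lo, f_hi = foc(lo, *args), foc(hi, *args)
    if f_lo * f_hi > 0.0:
        raise ValueError("no sign change: root is not bracketed")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f_mid = foc(mid, *args)
        if f_lo * f_mid <= 0.0:
            hi = mid            # root lies in [lo, mid]
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)

# Hypothetical parameters: 6% excess return, 20% volatility, gamma = 3.
w_diffusion = optimal_weight(0.06, 0.2, 3.0, 0.0, -0.10, 0.15)
w_jump = optimal_weight(0.06, 0.2, 3.0, 0.5, -0.10, 0.15)
print(w_diffusion, w_jump)
```

With the jump intensity set to zero the routine recovers the pure-diffusion weight $R/(\gamma\sigma^2)$; with negative-mean jumps the optimal weight in the risky asset is smaller.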
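To identify $A(t)$, we start by evaluating (10.18) at the optimum
weights, which implies
\[ \frac{1}{A(t)} \frac{\partial A(t)}{\partial t} = -\kappa, \tag{10.19} \]
where
\[ \kappa \equiv (1-\gamma)(w'R + r) - \frac{1}{2}\gamma(1-\gamma)\, w'\Sigma w + \lambda E\left[ (1 + w'J_t)^{1-\gamma} - 1 \right]. \]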
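Integrating both sides of (10.19) gives
\[ \int \frac{dA}{A} = \ln A + C_1 = -\int \kappa\, dt = -\kappa t + C_2, \]
\[ A(t) = e^{-\kappa t + C_2 - C_1} = a e^{-\kappa t}, \]
where $a$ is an integrating constant. Using the terminal boundary
condition
\[ A(T) = a e^{-\kappa T} = 1, \tag{10.20} \]
\[ a = e^{\kappa T}. \]
Hence,
\[ A(t) = e^{\kappa T} e^{-\kappa t} = e^{\kappa(T-t)}, \]
\[ V(W_t, t) = e^{\kappa(T-t)} \frac{W_t^{1-\gamma}}{1-\gamma}. \]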
For the special case of $\gamma = 0$, the risk-neutral investor ignores the
higher moments:
\[ \kappa \equiv w'\alpha + \lambda E\left[ (1 + w'J_t) - 1 \right]. \]
10.5.2 Stock markets systemic and idiosyncratic risk

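Now extend (10.16) to include also idiosyncratic jumps, $I_i$, where
every stock $i$ has its own jump intensity $\delta_i$ in addition to the systemic
jump intensity $\lambda$ as follows:
\[ \frac{dS_i}{S_i} = \alpha_i\,dt + \sigma_i\,dz_i + (J_i - 1)\,dQ(\lambda) + (I_i - 1)\,dQ(\delta_i), \]
where $dQ(\delta_i)$ for $i = 1, \ldots, N$ are independent idiosyncratic Poisson jumps.
In this case, the Bellman equation (10.17) becomes
\[ 0 = \max_{\{w\}} \left\{ \frac{\partial V}{\partial t} + \frac{\partial V}{\partial W} W_t (w'R + r) + \frac{1}{2} \frac{\partial^2 V}{\partial W^2} W_t^2\, w'\Sigma w + \lambda E\left[ V(W_t + W_t w'J_t, t) - V(W_t, t) \right] + \sum_{i=1}^{N} \delta_i E\left[ V(W_t + W_t w_i I_{i,t}, t) - V(W_t, t) \right] \right\} \tag{10.21} \]
and the first-order condition is
\[ 0 = R - \gamma \Sigma w + \lambda E\left[ J_t (1 + w'J_t)^{-\gamma} \right] + \Lambda, \]
where $\Lambda = \left[ \delta_1 E[I_1 (1 + w_1 I_1)^{-\gamma}], \ldots, \delta_N E[I_N (1 + w_N I_N)^{-\gamma}] \right]$.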

As the names imply, a jump that affects all assets is classified as a
systemic jump. A jump that affects only one asset is classified as an
idiosyncratic jump. In practice, the distinction is less clear as there
will be many jumps that impact on most but not all assets. There
are also many jumps that are experienced by different subgroups of
assets at different times.
Exercises: Discontinuity in Continuous Time

1. Most of the discussion in this chapter depends heavily on the abil-
ity to trade the underlying asset and sometimes the derivative(s)
as well in order to construct the hedged portfolio. Consider how
one would solve the option pricing problem when the underlying asset
is not tradable. Ingersoll (1987, p. 381) provides some analyses for
the diffusion case. Suggest how you would extend this analysis to
jump-diffusion case. [Hint: With jumps, the main challenge is the
integral term in the PIDE as a consequence of jumps. As the
additional term is not a local term, it depends on the solution for
derivative price at the point S before and after the jump.]
2. Merton solved the option pricing problem by assuming that the
jump risk is not priced. Critically evaluate this assumption. Sug-
gest and illustrate (with at least one method) how you might solve
the pricing problem of jump processes. [Hint: There are some sug-
gestions in Cont et al. (2004).]
3. Reversed convertibles have been proposed, after the subprime cri-
sis, as a potential safeguard against bank failures. Kashyap et al.
(2008) analyse a reverse convertible bond with face value of $B
which will be turned into n number of equity shares when the
leverage ratio hits a fixed threshold or the equity value falls below
a threshold. Pennacchi et al. (2010) suggest the use of a call option
enhanced reverse convertible (see also Schoutens (2011)). Given the
following parameters for the jump-diffusion and diffusion models,
calculate the price of a reverse convertible under each of these
two dynamics: [Hint: The conversion threshold suggests conversion
will take place when share price drops to 50 = 100/2. Hence,
the reverse convertible can be considered as a portfolio of a short
put with strike price 50, and a cash or nothing digital call also
strike at 50. With appropriate assumption, e.g. normal or lognor-
mal jump size, the options can be valued using the solution in
Merton (1990) and in this chapter. Alternatively, you can solve
this question numerically by simulations.]
Diffusion        Jump-Diffusion        Common parameters
σ = 26.46%       σ = 20%               S = 100
                 λ = 1.5 per year      B = 100
                 m = −0.10             Threshold = B/S = 2
                 δ = +1.10             Conversion ratio 1 : 1
                                       T = 1 year
                                       r = 0%
(Kashyap et al. 2008; Pennacchi et al. 2010; Schoutens, 2011).

4. (Capital Structure Theory) Assuming that VT has a lognormal
distribution and Vt follows a GBM, show the impact of volatility
on the value of debt and equity at the various critical thresh-
old levels of a “Call Option Enhanced Reverse Convertible” (see
Pennacchi et al. 2010).
5. Kozhan et al. (2011) show skew risk premium is related to variance
risk premium in a general equilibrium model.
(a) Compare and contrast the Epstein–Zin utility in Kozhan et al.
(2011, Section 6.1) with HARA utility.
(b) Show how the P and Q dynamics of S&P 500 are connected
through this Epstein–Zin utility function.
(c) What is the consequence if you replace the Epstein–Zin utility
with Merton’s power utility which is in the HARA family.
(d) What is the relationship between variance risk premium and
skewness risk premium? What are their relationships with
jumps? (Kozhan et al. 2011)
6. Kostakis et al. (2011, Section 2) show that the marginal rate of substitution
of wealth can be approximated by a function of the first
three moments of asset returns and the agent’s relative risk aver-
sion (Arrow, 1964; Pratt, 1964), relative prudence (Kimball, 1990)
and relative temperance (Eeckhoudt et al. 1996).
(a) Define and study the three measures of preference function,
i.e. risk aversion, prudence and temperance.
(b) What are the relative risk aversion, relative prudence and
relative temperance of an investor who has a power utility of
the same form in Merton’s CRRA investor?
(c) What are the relative risk aversion, relative prudence and
relative temperance of an investor who has a HARA utility?
(d) What are the relationships between these three measures of
preference function and asset return risk premium?
(e) How do jumps impact the first three moments of asset
returns? How would the risk premium of jumps change
extending from your answers to part (d)? (Kostakis et al.
2011).
Chapter 11
Spanning and Capital Market Theories
This chapter is based on Merton (1990, Chapter 2, Section 2.4)

where investment and asset pricing are evaluated in a static one-
period framework without consumption. We have adopted all the
numbers for Definition, Theorems and Corollary from Merton (1990,
Chapter 2).
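Definition 2 (Spanning). The set of returns $(X_1, \ldots, X_M)$ spans
the entire N-securities portfolio space $\psi$ iff for any portfolio $Z_p$ in $\psi$
there exist weights $(\delta_1, \ldots, \delta_M)$ such that
\[ Z_p = \sum_{j=1}^{M} \delta_j X_j, \qquad \sum_{j=1}^{M} \delta_j = 1, \]
with $M^* \le N$, where $M^*$ is the smallest possible value of $M$. For spanning
to be non-trivial, $M^* < N$.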
Under spanning, investors are indifferent between selecting their
optimal portfolio from ψ and selecting combinations of funds from
M . If the investor knows that the funds span the space of optimal
portfolios, then he needs only to consider (X1 , . . . , XM ) to deter-
mine his optimal portfolio. Hence, if M ∗ funds can be constructed
without the information regarding the preferences, endowments and
probability beliefs of each investor, we have the separation theorem.
It is important to note that unlike the optimal portfolio of an
individual investor, the portfolio of securities held by a mutual fund
need not be an efficient portfolio. M ∗ has important implications
for the equilibrium properties of individual securities returns and

the optimal rules for firms making production and capital budgeting
decision. Moreover, for all important models of portfolio selection or
asset pricing, there exists some non-trivial form of spanning.
11.1 Necessary Conditions for Non-trivial Spanning
Non-trivial spanning refers to the case when M ∗ < N , i.e. M is
strictly less than N , and in practice M is significantly less than N .
The necessary and sufficient conditions for spanning in Cass and
Stiglitz (1970) and Ross (1978) are presented in Theorems 2.9 and
2.10. The assumptions needed are quite weak, viz.
(i) Investors need not be risk averse so long as they prefer more to
less.
(ii) Investors agree on the factor loadings {aij } (see below).
(iii) Information on the joint distributions of (Z1 , . . . , Zn ) and that
for (X1 , . . . , Xm ) is not needed given the factor loadings infor-
mation in (ii) above.
Let $\psi^f$ be the set of all feasible portfolios that can be constructed
from a riskless security with return $R$ and $n$ risky securities with returns
$(Z_1, \ldots, Z_n)$ and covariance matrix $\Omega$.
Theorem 2.9 (Necessary conditions for spanning). The set
$(X_1, \ldots, X_M)$ spans the set of all feasible portfolios $\psi^f$ only if (i) the rank
of $\Omega \le M$, and (ii) there exist $(\delta_1, \ldots, \delta_M)$, $\sum_{j=1}^{M} \delta_j = 1$, such that
$\sum_{j=1}^{M} \delta_j X_j$ has zero variance (i.e. is risk free). In other words, each
security in $\psi^f$ can be spanned by $(X_1, \ldots, X_M)$ only if the rank of
$\Omega \le M$. Moreover, since $\psi^f$ contains the risk-free asset, one of the
combinations of $(X_1, \ldots, X_M)$ must be a risk-free portfolio.

Proposition 2.1 (Risk-free portfolio). If $Z_p = \sum_{j=1}^{n} a_j Z_j + b$ is
the return on some security or portfolio and if there are no arbitrage
opportunities, then
\[ b = \left( 1 - \sum_{j=1}^{n} a_j \right) R, \]
\[ Z_p = \sum_{j=1}^{n} a_j (Z_j - R) + R. \]
Proof. Let $Z^\dagger$ be the return on a portfolio with fraction $\delta_j^\dagger$ allocated
to security $j$, $j = 1, \ldots, n$; $\delta_p$ allocated to the security with return
$Z_p$; and $(1 - \delta_p - \sum_{1}^{n} \delta_j^\dagger)$ allocated to the riskless security with return
$R$. If $\delta_j^\dagger$ is chosen such that
\[ \delta_j^\dagger = -\delta_p a_j, \]
then
\[ Z^\dagger = \sum_{1}^{n} \delta_j^\dagger Z_j + \delta_p Z_p + \left( 1 - \delta_p - \sum_{1}^{n} \delta_j^\dagger \right) R \]
\[ = \sum_{1}^{n} (-\delta_p a_j) Z_j + \delta_p Z_p + \left( 1 - \delta_p - \sum_{1}^{n} (-\delta_p a_j) \right) R \]
\[ = -\delta_p \sum_{1}^{n} a_j Z_j + \delta_p Z_p + \left( 1 - \delta_p + \delta_p \sum_{1}^{n} a_j \right) R. \]
Since
\[ Z_p = \sum_{j=1}^{n} a_j Z_j + b, \]
\[ Z^\dagger = -\delta_p \sum_{1}^{n} a_j Z_j + \delta_p \left( \sum_{j=1}^{n} a_j Z_j + b \right) + \left( 1 - \delta_p + \delta_p \sum_{1}^{n} a_j \right) R \]
\[ = \delta_p b + R - R\delta_p + R\delta_p \sum_{1}^{n} a_j \]
\[ = \delta_p \left[ b - R \left( 1 - \sum_{j=1}^{n} a_j \right) \right] + R. \]
According to Theorem 2.9, the necessary condition for spanning is
that there exists a set of $M$ feasible portfolios such that a combination
of this set is a riskless portfolio (with zero variance). Since $Z^\dagger$
is a riskless portfolio, we have $Z^\dagger = R$ under no arbitrage. But
$\delta_p$ can be chosen arbitrarily. Therefore
\[ b = R \left( 1 - \sum_{j=1}^{n} a_j \right). \]
Substituting for $b$, it follows directly that
\[ Z_p = \sum_{j=1}^{n} a_j Z_j + R \left( 1 - \sum_{j=1}^{n} a_j \right) = \sum_{j=1}^{n} a_j (Z_j - R) + R. \]
This proposition guarantees that one of the portfolios in any
candidate spanning set is the riskless security. With $m = M - 1$, the
spanning set can therefore be written as $(X_1, \ldots, X_m, X_M = R)$.
Theorem 2.10 (Security returns). A necessary and sufficient
condition for $(X_1, \ldots, X_m, R)$ to span $\psi^f$ is that there exist numbers $\{a_{ij}\}$ such that
\[ Z_j = R + \sum_{i=1}^{m} a_{ij} (X_i - R), \quad j = 1, \ldots, n. \]
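Proof. Necessary condition: If $(X_1, \ldots, X_m, R)$ span $\psi^f$, then there
exist portfolio weights $(\delta_{1j}, \ldots, \delta_{Mj})$, $\sum_{1}^{M} \delta_{ij} = 1$, such that
$Z_j = \sum_{1}^{M} \delta_{ij} X_i$. Noting that $X_M = R$ and substituting $\delta_{Mj} = 1 - \sum_{1}^{m} \delta_{ij}$,
we have that
\[ Z_j = \sum_{1}^{M} \delta_{ij} X_i = \sum_{1}^{m} \delta_{ij} X_i + \delta_{Mj} X_M \]
\[ = \sum_{1}^{m} \delta_{ij} X_i + \left( 1 - \sum_{1}^{m} \delta_{ij} \right) R \]
\[ = R + \sum_{1}^{m} \delta_{ij} (X_i - R). \]
Sufficient condition:
If there exist numbers $\{a_{ij}\}$ such that
\[ Z_j = R + \sum_{1}^{m} a_{ij} (X_i - R), \]
then pick the portfolio weights $\delta_{ij} = a_{ij}$ for $i = 1, \ldots, m$, and
$\delta_{Mj} = 1 - \sum_{1}^{m} \delta_{ij}$, from which it follows that
\[ Z_j = R + \sum_{1}^{m} \delta_{ij} X_i - \sum_{1}^{m} \delta_{ij} R = \sum_{1}^{M} \delta_{ij} X_i. \]
That is, $Z_j$ can be written as a portfolio combination of $(X_1, \ldots, X_m)$
and $R$. Hence, $(X_1, \ldots, X_m, R)$ span $\psi^f$ and this proves sufficiency.
If $M^* = m + 1$ is the smallest, then the covariance matrix $\Omega$ is of rank $m$ and $X_M = R$.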
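The sufficiency argument of Theorem 2.10 can be checked numerically. The sketch below assumes hypothetical values for the riskless return, the loadings $a_{ij}$ of one security and three scenarios for two fund returns; none of these numbers come from the text. It builds the replicating weights $\delta_{ij} = a_{ij}$, $\delta_{Mj} = 1 - \sum_i a_{ij}$ and confirms that the portfolio of funds reproduces the security return state by state.

```python
R = 1.02                      # riskless gross return (hypothetical)
a = [0.7, -0.4]               # loadings a_1j, a_2j of security j (hypothetical)
scenarios = [(1.10, 0.95), (0.98, 1.20), (1.05, 1.01)]  # (X_1, X_2) per state

def security_return(x, a, R):
    """Z_j = R + sum_i a_ij (X_i - R)."""
    return R + sum(ai * (xi - R) for ai, xi in zip(a, x))

def replicating_weights(a):
    """Weights on (X_1, ..., X_m, R): delta_ij = a_ij, remainder in the riskless fund."""
    return list(a) + [1.0 - sum(a)]

def portfolio_return(x, weights, R):
    """Return on the fund portfolio, with the last fund being the riskless security."""
    funds = list(x) + [R]
    return sum(w * f for w, f in zip(weights, funds))

weights = replicating_weights(a)
max_gap = max(
    abs(security_return(x, a, R) - portfolio_return(x, weights, R))
    for x in scenarios
)
print(weights, max_gap)
```

The weights sum to one by construction, and the replication gap is zero up to floating-point rounding in every scenario.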
11.2 Efficient Portfolio and Spanning

Now define ψ e as the set of all efficient portfolios contained in ψ f .
In the following theorems, we need stronger assumptions i.e. con-
cave utility where all investors are risk averse and have the same
probability beliefs.
Proposition 2.2 (Risk-free security in the efficient set). If

Ze is the return on a portfolio contained in ψ e , then any portfolio
that combines positive amounts of Ze with riskless security is also
contained in ψ e .
Proof. Let
\[ Z = \delta (Z_e - R) + R, \quad \delta > 0. \]
As $Z_e$ is an efficient portfolio, there exists a strictly concave increasing
function $V$ such that
\[ E\{V'(Z_e)(Z_j - R)\} = 0, \quad j = 1, \ldots, n. \]
Define
\[ U(W) \equiv V(aW + b), \]
where $a \equiv 1/\delta > 0$ and $b \equiv (\delta - 1)R/\delta$, so that $aZ + b = Z_e$. As $a > 0$, $U$ is a strictly concave
and increasing function and
\[ U'(Z) = aV'(Z_e). \]
Hence,
\[ E\{U'(Z)(Z_j - R)\} = 0, \quad j = 1, \ldots, n. \]
Therefore, there exists a utility function such that $Z$ is an optimal
portfolio, and thus $Z$ is an efficient portfolio.
Theorem 2.11 (Expected return of Z with respect to efficient
portfolio). Let $(X_1, \ldots, X_m)$ denote the returns on $m$ feasible
portfolios. If for security $j$, there exist numbers $\{a_{ij}\}$ such that
$Z_j = \bar{Z}_j + \sum_{i=1}^{m} a_{ij} (X_i - \bar{X}_i) + \varepsilon_j$, where $E\{\varepsilon_j V_K'(Z_e^K)\} = 0$ for some
efficient portfolio $K$, then
\[ \bar{Z}_j = R + \sum_{i=1}^{m} a_{ij} (\bar{X}_i - R). \]
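Proof. Let $Z_p$ be the return on a portfolio with fraction $\delta$ allocated
to security $j$; fraction $\delta_i = -\delta a_{ij}$ allocated to portfolio $X_i$,
$i = 1, \ldots, m$; and $1 - \delta - \sum_{1}^{m} \delta_i$ allocated to the riskless security.
Since
\[ Z_j = \bar{Z}_j + \sum_{i=1}^{m} a_{ij} (X_i - \bar{X}_i) + \varepsilon_j, \]
substituting it into $Z_p$, we have
\[ Z_p = \delta Z_j - \delta \sum_{i=1}^{m} a_{ij} X_i + \left( 1 - \delta + \delta \sum_{1}^{m} a_{ij} \right) R \]
\[ = \delta \left[ \bar{Z}_j + \sum_{i=1}^{m} a_{ij} (X_i - \bar{X}_i) + \varepsilon_j \right] - \delta \sum_{i=1}^{m} a_{ij} X_i + \left( 1 - \delta + \delta \sum_{1}^{m} a_{ij} \right) R \]
\[ = R + \delta \left[ \bar{Z}_j - R - \sum_{i=1}^{m} a_{ij} (\bar{X}_i - R) \right] + \delta \varepsilon_j. \]
Given that for the efficient portfolio
\[ E\{\varepsilon_j V_K'(Z_e^K)\} = 0, \]
\[ E\{\delta \varepsilon_j V_K'\} = \delta E\{\varepsilon_j V_K'\} = 0. \]
By construction, $E\{\varepsilon_j\} = 0$, and hence $\mathrm{Cov}(Z_p, V_K') = 0$. Therefore
the systematic risk of portfolio $p$, $b_p^K$, is zero. From Theorem 2.4, if
$Z_p$ is the return on a feasible portfolio $p$ and $Z_e^K$ is the return on
efficient portfolio $K$, then
\[ \bar{Z}_p - R = b_p^K (\bar{Z}_e^K - R). \]
Since $b_p^K = 0$, $\bar{Z}_p = R$ and
\[ \delta \left[ \bar{Z}_j - R - \sum_{i=1}^{m} a_{ij} (\bar{X}_i - R) \right] = 0. \]
But $\delta$ can be chosen arbitrarily. Therefore, $\bar{Z}_j = R + \sum_{i=1}^{m} a_{ij} (\bar{X}_i - R)$.
Hence, if the return on a security can be written in this linear
form of the portfolios $(X_1, \ldots, X_m)$, its expected return is completely
determined by the expected returns on these portfolios and
the weights $\{a_{ij}\}$.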
Theorem 2.12 (Premium of Z and spanning). If, for every
security $j$, there exist numbers $\{a_{ij}\}$ such that
\[ Z_j = R + \sum_{i=1}^{m} a_{ij} (X_i - R) + \varepsilon_j, \]
where $E\{\varepsilon_j | X_1, \ldots, X_m\} = 0$, then $(X_1, \ldots, X_m, R)$ span the set of
efficient portfolios $\psi^e$.
Proof. Let $w_j^K$ denote the fraction of efficient portfolio $K$ allocated
to security $j$, $j = 1, \ldots, n$. By hypothesis, we can write
\[ Z_e^K = R + \sum_{i=1}^{m} \delta_i^K (X_i - R) + \varepsilon^K, \]
where $\delta_i^K \equiv \sum_{1}^{n} w_j^K a_{ij}$ and $\varepsilon^K \equiv \sum_{1}^{n} w_j^K \varepsilon_j$ with
\[ E\{\varepsilon^K | X_1, \ldots, X_m\} = \sum_{j=1}^{n} w_j^K E\{\varepsilon_j | X_1, \ldots, X_m\} = 0. \]
Construct the portfolio with return $Z$ by allocating fraction $\delta_i^K$ to
portfolio $X_i$, $i = 1, \ldots, m$, and fraction $1 - \sum_{i=1}^{m} \delta_i^K$ to the riskless
security. Then $Z_e^K$ can be written as
\[ Z_e^K = \sum_{i=1}^{m} \delta_i^K X_i + R \left( 1 - \sum_{i=1}^{m} \delta_i^K \right) + \varepsilon^K = Z + \varepsilon^K, \]
where
\[ E\{\varepsilon^K | Z\} = E\left\{ \varepsilon^K \,\Big|\, \sum_{i=1}^{m} \delta_i^K X_i \right\} = 0 \]
because
\[ E\{\varepsilon_j | X_1, \ldots, X_m\} = 0. \]
Hence, for $\varepsilon^K \not\equiv 0$, $Z_e^K$ is riskier than $Z$ in the Rothschild–Stiglitz
sense, which contradicts that $Z_e^K$ is an efficient portfolio. Thus,
$\varepsilon^K \equiv 0$ for every efficient portfolio $K$, and all efficient portfolios can
be generated by a portfolio combination of $(X_1, \ldots, X_m, R)$.
Theorem 2.13 (Conditional expectation of $\varepsilon_j$ given the return
of the replicating portfolio). Let $w_j^K$ denote the fraction of efficient
portfolio $K$ allocated to security $j$, $j = 1, \ldots, n$. $(X_1, \ldots, X_m, R)$ span
$\psi^e$ iff
\[ Z_j = R + \sum_{i=1}^{m} a_{ij} (X_i - R) + \varepsilon_j, \]
where
\[ E\left\{ \varepsilon_j \,\Big|\, \sum_{i=1}^{m} \delta_i^K X_i \right\} = 0, \qquad \delta_i^K = \sum_{j=1}^{n} w_j^K a_{ij}, \]
for every efficient portfolio $K$.
Proof. The proof is in four specialised lemmas in Ross (1978,
appendix).
The implication is that if we can find a set $(X_1, \ldots, X_m)$ such
that every security's return can be expressed as a linear combination
of $(X_1, \ldots, X_m, R)$ plus zero-mean noise, then we have a set
of portfolios that span $\psi^e$. Note that, unlike the previous case, to
span the efficient set it is not necessary that the linear combinations
of the spanning portfolios exactly replicate the return on each
available security. Furthermore, we have the special case $m = 1$ in
Corollary 2.13.
Corollary 2.13 ($a_j$ and spanning). $(X, R)$ span $\psi^e$ iff there
exists a number $a_j$ for each security $j$, $j = 1, \ldots, n$, such that
\[ Z_j = R + a_j (X - R) + \varepsilon_j, \qquad E\{\varepsilon_j | X\} = 0. \]
Proof. The “if” part follows directly from Theorem 2.12. Let $w_j^K$
denote the fraction of efficient portfolio $K$ allocated to security $j$,
$j = 1, \ldots, n$. We have
\[ Z_e^K = R + \delta_K (X - R) + \varepsilon^K, \]
where $\delta_K = \sum_{1}^{n} w_j^K a_j$ and $\varepsilon^K = \sum_{1}^{n} w_j^K \varepsilon_j$ with
\[ E\{\varepsilon^K | X\} = \sum_{1}^{n} w_j^K E\{\varepsilon_j | X\} = 0. \]
Following the proof for Theorem 2.12, construct a portfolio $Z$ such
that $Z_e^K = Z + \varepsilon^K$ where $E\{\varepsilon^K | Z\} = E\{\varepsilon^K | \delta_K X\} = 0$. If
$\varepsilon^K \not\equiv 0$, $Z_e^K$ is riskier than $Z$ in the Rothschild–Stiglitz sense, which
contradicts that $Z_e^K$ is an efficient portfolio. Thus, $\varepsilon^K \equiv 0$ for every
efficient portfolio. Therefore, if there exists a number $a_j$ for each
security $j$, $j = 1, \ldots, n$, such that $Z_j = R + a_j (X - R) + \varepsilon_j$ where
$E\{\varepsilon_j | X\} = 0$, then $(X, R)$ span $\psi^e$.
The proof for the “only if” part is as follows. By hypothesis,
\[ Z_e^K = \delta_K (X - R) + R \]
for every efficient portfolio $K$. If $\bar{X} = R$, then from Corollary 2.1
(which states that if $R$ is the riskless return, then $\bar{Z}_e \ge R$ with
equality holding only if $Z_e$ is riskless), $\delta_K = 0$ for every efficient portfolio
$K$ and $R$ spans $\psi^e$. Otherwise, from Theorem 2.2 (which states
that a risk-averse investor will choose the riskless security if and only
if $\bar{Z}_j = R$ for all $j = 1, 2, \ldots, n$), we have $\delta_K \ne 0$ for every efficient
portfolio. By Theorem 2.13, $E\{\varepsilon_j | \delta_K X\} = 0$ for $j = 1, \ldots, n$ and
every efficient portfolio $K$. But, for $\delta_K \ne 0$, we can
get $E\{\varepsilon_j | \delta_K X\} = 0$ only if $E\{\varepsilon_j | X\} = 0$.
Proposition 2.3 (Formula for $a_{ij}$). If, for every security $j$,
$E\{\varepsilon_j | X_1, \ldots, X_m\} = 0$ with $(X_1, \ldots, X_m)$ linearly independent with
finite variances and if the return on security $j$, $Z_j$, has a finite variance,
then the $\{a_{ij}\}$, $i = 1, \ldots, m$, in Theorems 2.12 and 2.13 are
given by
\[ a_{ij} = \sum_{k=1}^{m} \nu_{ik}\, \mathrm{Cov}(X_k, Z_j), \]
where $\nu_{ik}$ is the $ik$th element of $\Omega_X^{-1}$, the inverse of the covariance matrix of $(X_1, \ldots, X_m)$.
Proof. We want to prove $a_{pq} = \sum_{k=1}^{m} \nu_{pk}\, \mathrm{Cov}(X_k, Z_q)$.
\[ \sum_{k=1}^{m} \nu_{pk}\, \mathrm{Cov}(X_k, Z_q) = \sum_{k=1}^{m} \nu_{pk}\, \mathrm{Cov}\left( X_k,\; R + \sum_{i=1}^{m} a_{iq} (X_i - R) + \varepsilon_q \right) \]
\[ = \sum_{k=1}^{m} \nu_{pk}\, \mathrm{Cov}\left( X_k,\; \sum_{i=1}^{m} a_{iq} X_i \right) \]
\[ = a_{1q} \sum_{k=1}^{m} \nu_{pk}\, \mathrm{Cov}(X_k, X_1) + \cdots + a_{mq} \sum_{k=1}^{m} \nu_{pk}\, \mathrm{Cov}(X_k, X_m). \]
The only nonzero term is
\[ a_{pq} \sum_{k=1}^{m} \nu_{pk}\, \mathrm{Cov}(X_k, X_p) = a_{pq}, \]
as the rest equal zero by the definition of $\Omega_X^{-1}$ (the sum $\sum_k \nu_{pk}\, \mathrm{Cov}(X_k, X_i)$ is one for $i = p$ and zero otherwise). Hence
\[ a_{pq} = \sum_{k=1}^{m} \nu_{pk}\, \mathrm{Cov}(X_k, Z_q). \]
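The formula of Proposition 2.3 can be illustrated with a small worked example for $m = 2$. The covariance matrix of the two spanning portfolios, the loadings and the expected returns below are hypothetical numbers chosen for illustration, and the helper `inverse_2x2` is not from the text. The sketch builds $\mathrm{Cov}(X_k, Z_j)$ from known loadings (using $\mathrm{Cov}(\varepsilon_j, X_k) = 0$), recovers the loadings through $\Omega_X^{-1}$, and then applies the expected-return formula of Theorem 2.11.

```python
def inverse_2x2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]

# Hypothetical covariance matrix of the spanning portfolios (X_1, X_2).
omega_x = [[0.04, 0.01],
           [0.01, 0.09]]
a_true = [0.6, 0.3]  # hypothetical loadings a_1j, a_2j

# Cov(X_k, Z_j) = sum_i a_ij Cov(X_k, X_i), because Cov(eps_j, X_k) = 0.
cov_xz = [sum(omega_x[k][i] * a_true[i] for i in range(2)) for k in range(2)]

# Proposition 2.3: a_ij = sum_k nu_ik Cov(X_k, Z_j).
nu = inverse_2x2(omega_x)
a_recovered = [sum(nu[i][k] * cov_xz[k] for k in range(2)) for i in range(2)]

# Theorem 2.11: expected return of security j from the fund expected returns.
R = 1.02
x_bar = [1.08, 1.05]
z_bar = R + sum(a_recovered[i] * (x_bar[i] - R) for i in range(2))
print(a_recovered, z_bar)
```

Only second moments of the funds and the covariances with $Z_j$ enter the calculation, which is the point made in the discussion that follows the proof.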
The proof of Proposition 2.3 follows directly from the condi-

tion that E{j |Xk } = 0, which implies that Cov(j , Xk ) = 0,
k = 1, . . . , m. The condition that (X1 , . . . , Xm ) be linearly inde-
pendent is trivial in the sense that knowing the joint distribution of
a spanning set one can always choose a linearly independent sub-
set. The only properties of the joint distributions required to com-
pute the aij are the variances and covariances of X1 , . . . , Xm and
the covariances between Zj and X1 , . . . , Xm . In particular, knowl-
edge of Z j is not required because Cov(Xk , Zj ) = Cov(Xk , Zj − Z j ).
Hence, for m < n (and especially so for m << n), there exists a
non-trivial information set which allows the aij to be determined

without knowledge of Z j . If X1 , . . . , Xm are known, then Z j can be
computed by the formula in Theorem 2.11. By comparison with the
example in Sec. 2.3, the information set required there to determine
Z j was a utility function and the joint distribution of its associated
optimal portfolio with Zj − Z j . Here, we must know a complete set
of portfolios that span Ψe . However, here only the second-moment
properties of the joint distribution need be known, and no utility
function information other than risk aversion is required.
A special case of no little interest is when a single risky portfolio
and the riskless security span the space of efficient portfolios and
Corollary 2.13 applies. Indeed, the classic mean–variance model of
Markowitz (1952, 1959) and Tobin (1958) exhibits this strong form
of separation. Moreover, most macroeconomic models have highly
aggregated financial sectors where investors’ portfolio choices are lim-
ited to simple combinations of two securities: “bonds” and “stocks.”
The rigorous microeconomic foundation for such aggregation is pre-
cisely that Ψe is spanned by a single risky portfolio and the riskless
security.
If X denotes the random variable return on a risky portfolio
such that (X, R) spans Ψe , then the return on any efficient port-
folio, Ze , can be written as if it had been chosen by combining
the risky portfolio with return X with the riskless security: namely,
Ze = δ(X − R) + R, where δ is the fraction allocated to the risky
portfolio and 1 − δ is the fraction allocated to the riskless security.
By Corollary 2.1, the sign of $\delta$ will be the same for every efficient
portfolio, and therefore all efficient portfolios will be perfectly positively
correlated. If $\bar{X} > R$, then by Proposition 2.2, $X$ will be an
efficient portfolio and $\delta > 0$ for every efficient portfolio.
Proposition 2.4 (Delta and proportional weight in the

optimal portfolio). (The composition of the single-factor model
and the identity of the “market portfolio”) If (Z1 , . . . , Zn ) contain
no redundant securities, δj denotes the fraction of portfolio X allo-
cated to security j, and wj∗ denotes the fraction of any risk-averse
investor’s optimal portfolio allocated to security j, j = 1, . . . , n, then
for every risk-averse investor

\[ \frac{w_j^*}{w_k^*} = \frac{\delta_j}{\delta_k}, \quad j, k = 1, \ldots, n. \]
Proof. The proof follows immediately because every optimal port-

folio is an efficient portfolio and the holdings of risky securities in
every efficient portfolio are proportional to the holdings in X. Hence,
the relative holdings of risky securities will be the same for all risk-averse
investors. Whenever Proposition 2.4 holds and if there exist
numbers $\delta_j^*$ where $\delta_j^*/\delta_k^* = \delta_j/\delta_k$, $j, k = 1, \ldots, n$, and $\sum_{1}^{n} \delta_j^* = 1$,
then the portfolio with proportions $(\delta_1^*, \ldots, \delta_n^*)$ is called the Optimal
Combination of Risky Assets. If such a portfolio exists, then without
loss of generality it can always be assumed that $X = \sum_{1}^{n} \delta_j^* Z_j$.
Proposition 2.5 (Spanning set is a convex set). If (X, R) span

ψ e , then ψ e is a convex set.1
Proof. Let $Z_e^1$ and $Z_e^2$ denote the returns on two distinct efficient
portfolios and
\[ Z \equiv \lambda Z_e^1 + (1-\lambda) Z_e^2, \quad 0 \le \lambda \le 1, \]
\[ Z_e^1 = \delta_1 (X - R) + R, \]
\[ Z_e^2 = \delta_2 (X - R) + R = \frac{\delta_2}{\delta_1} (Z_e^1 - R) + R. \]
Hence,
\[ Z = \lambda Z_e^1 + (1-\lambda) \left[ \frac{\delta_2}{\delta_1} (Z_e^1 - R) + R \right] \]
\[ = \lambda (Z_e^1 - R) + (1-\lambda) \frac{\delta_2}{\delta_1} (Z_e^1 - R) + R \]
\[ = \left[ \lambda + (1-\lambda) \frac{\delta_2}{\delta_1} \right] (Z_e^1 - R) + R \]
\[ = \delta (Z_e^1 - R) + R, \qquad \delta \equiv \lambda + (1-\lambda) \frac{\delta_2}{\delta_1}. \]
Since $Z_e^1$ and $Z_e^2$ are efficient portfolios, $\delta_1$ has the same sign as $\delta_2$ and
so $\delta > 0$. Therefore, by Proposition 2.2, $Z$ is an efficient portfolio.
By induction, for any integer $k$ and weights $\lambda_i \ge 0$ with
\[ \sum_{i=1}^{k} \lambda_i = 1, \qquad Z = \sum_{i=1}^{k} \lambda_i Z_e^i \]
is the return on an efficient portfolio. Hence, $\psi^e$ is a convex set.

1 In Euclidean space, an object is convex if for every pair of points within the
object, every point on the straight line segment that joins them is also within the
object. For example, a solid cube is convex, but anything that is hollow or has a
dent in it, for example, a crescent shape, is not convex.
11.3 Market Portfolio Spanning and CAPM

A market portfolio is defined as a portfolio that holds all available
securities proportional to their market values. If $\delta_j^M$ denotes the fraction
of security $j$ held in the market portfolio, then
\[ \delta_j^M = \frac{V_j}{\sum_{j=1}^{n} V_j + V_R}, \quad j = 1, \ldots, n, \]
where $V_j$ denotes the market value of security $j$ and $V_R$ denotes the
value of the riskless security.
Theorem 2.14 (Market portfolio as an efficient portfolio).
Let $W_0^k$ be the initial wealth of the $k$th investor with optimal portfolio
return
\[ Z^k \equiv R + \sum_{j=1}^{n} w_j^k (Z_j - R), \]
where $w_j^k$ is the fraction of his wealth allocated to security $j$. At
equilibrium,
\[ V_j = \sum_{k=1}^{K} w_j^k W_0^k, \]
\[ W_0 \equiv \sum_{k=1}^{K} W_0^k = \sum_{j=1}^{n} V_j + V_R, \]
i.e. the sum of all investors' holdings of security $j$ gives the market value of
security $j$. Similarly, the sum of all investors' initial wealth gives
initial aggregate wealth, which is separated into the sum of market
values of each security $j$ and the market value of the risk-free security
$V_R$. Define $\lambda_k$ as the ratio of the $k$th investor's wealth to total aggregate
wealth,
\[ \lambda_k = \frac{W_0^k}{W_0}, \quad k = 1, \ldots, K, \]
where $0 \le \lambda_k \le 1$ and $\sum_{k=1}^{K} \lambda_k = 1$. Therefore, the market portfolio
can also be expressed as
\[ \delta_j^M = \sum_{k=1}^{K} w_j^k \lambda_k. \tag{11.1} \]
Now, we are ready to prove the theorem. Multiply both sides of (11.1)
by $(Z_j - R)$ and sum over $j$:
\[ \sum_{j=1}^{n} \delta_j^M (Z_j - R) = \sum_{j=1}^{n} \sum_{k=1}^{K} w_j^k \lambda_k (Z_j - R) = \sum_{k=1}^{K} \lambda_k \sum_{j=1}^{n} w_j^k (Z_j - R), \]
\[ Z_M - R = \sum_{k=1}^{K} \lambda_k (Z^k - R), \]
\[ Z_M = \sum_{k=1}^{K} \lambda_k Z^k, \]
where $Z_M \equiv R + \sum_{j=1}^{n} \delta_j^M (Z_j - R)$ is the return on the market portfolio. Every optimal
portfolio is an efficient portfolio. Hence, $Z_M$ is a convex combination of
the returns on $K$ efficient portfolios.2 Therefore, if $\psi^e$ is convex, then
the market portfolio, $Z_M$, is contained in $\psi^e$, i.e. $Z_M$ is an efficient
portfolio.

2 A convex combination is a linear combination of points (which can be vectors,
scalars, or more generally points in an affine space) where all coefficients are
non-negative and sum up to 1.
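The aggregation identity (11.1) can be verified with a toy economy. The sketch below assumes two investors and two risky securities with hypothetical wealth levels and portfolio weights; the variable names are illustrative only. It computes the market weights directly from market values and again as the wealth-weighted average of the investors' weights.

```python
# Hypothetical economy: K = 2 investors, n = 2 risky securities.
wealth = [100.0, 300.0]              # W_0^k
weights = [[0.5, 0.2],               # weights[k][j] = w_j^k
           [0.3, 0.4]]
n_sec = 2

W0 = sum(wealth)                     # aggregate initial wealth
lam = [w / W0 for w in wealth]       # lambda_k = W_0^k / W_0

# Market value of each risky security: V_j = sum_k w_j^k W_0^k.
V = [sum(weights[k][j] * wealth[k] for k in range(len(wealth)))
     for j in range(n_sec)]
V_R = W0 - sum(V)                    # remainder is held in the riskless security

# Market weights, computed two ways.
delta_direct = [V[j] / (sum(V) + V_R) for j in range(n_sec)]
delta_aggregated = [sum(weights[k][j] * lam[k] for k in range(len(wealth)))
                    for j in range(n_sec)]
print(delta_direct, delta_aggregated)
```

Both routes give the same market weights, which is why the market return is a convex combination of the investors' optimal portfolio returns.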
Proposition 2.6 (Market portfolio expected return must be
greater than risk-free rate). From Theorem 2.14, subtracting
$R$ and taking expectations on both sides, we get
\[ \bar{Z}_M - R = \sum_{k=1}^{K} \lambda_k (\bar{Z}^k - R). \]
From Corollary 2.1, we know that the expected return of an efficient
portfolio is greater than $R$ unless the efficient portfolio is risk free.
Given that $\lambda_k > 0$, so long as $\bar{Z}^k - R > 0$ (instead of equal to
zero) for at least one investor $k$, the expected return on the market portfolio $\bar{Z}_M$ will be greater
than $R$.
Returning to the special case where $\psi^e$ is spanned by a single risky
portfolio and the riskless security, it follows from Proposition 2.5 and
Theorem 2.14 that the market portfolio is efficient. Because, in this special
case, all efficient portfolios are perfectly positively correlated, it follows
that the risky spanning portfolio can always be chosen to be the
market portfolio (i.e. $X = Z_M$).
It follows that every efficient portfolio (and hence every optimal
portfolio) can be represented as a simple portfolio combination of
the market portfolio and the riskless security with a positive fraction
allocated to the market portfolio.
Theorem 2.15 (Security Market Line and CAPM). If $(Z_M, R)$
span $\psi^e$, then the equilibrium expected return on security $j$ can be
written as
\[ \bar{Z}_j = R + \beta_j (\bar{Z}_M - R), \]
where
\[ \beta_j = \frac{\mathrm{Cov}(Z_j, Z_M)}{\mathrm{Var}(Z_M)}. \]
Proof. If $(Z_M, R)$ span $\psi^e$, then according to Corollary 2.13, there
exists a $\beta_j$ for each security $j$ such that
\[ Z_j = R + \beta_j (Z_M - R) + \varepsilon_j, \tag{11.2} \]
where
\[ E[\varepsilon_j | Z_M] = 0. \tag{11.3} \]
From (11.2) and (11.3), we get
\[ \bar{Z}_j = R + \beta_j (\bar{Z}_M - R). \tag{11.4} \]
Subtracting (11.4) from (11.2), we get
\[ \varepsilon_j = Z_j - \bar{Z}_j - \beta_j (Z_M - \bar{Z}_M). \tag{11.5} \]
The condition in (11.3) implies
\[ \mathrm{Cov}(\varepsilon_j, Z_M) = 0. \]
Substituting the value of $\varepsilon_j$ in (11.5), we get
\[ \mathrm{Cov}(Z_j - \bar{Z}_j - \beta_j (Z_M - \bar{Z}_M), Z_M) = 0, \]
\[ \mathrm{Cov}(Z_j - \beta_j Z_M, Z_M) = 0, \]
\[ \mathrm{Cov}(Z_j, Z_M) - \beta_j \mathrm{Var}(Z_M) = 0. \]
Hence
\[ \beta_j = \frac{\mathrm{Cov}(Z_j, Z_M)}{\mathrm{Var}(Z_M)}. \]
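The last steps of the proof can be mirrored with sample moments. The sketch below uses made-up return observations for the market portfolio and one security (not data from the text) and sample covariances in place of population moments. It computes $\beta_j$, forms the residual as in (11.5), and checks that the residual is uncorrelated with the market return by construction.

```python
def mean(xs):
    return sum(xs) / len(xs)

def cov(xs, ys):
    """Population-style sample covariance (divides by the number of observations)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

# Made-up gross returns for the market portfolio and security j.
z_m = [1.10, 0.95, 1.20, 1.02, 0.98]
z_j = [1.15, 0.90, 1.30, 1.00, 1.01]

beta_j = cov(z_j, z_m) / cov(z_m, z_m)

# Residual as in (11.5): eps_j = Z_j - mean(Z_j) - beta_j (Z_M - mean(Z_M)).
eps_j = [(zj - mean(z_j)) - beta_j * (zm - mean(z_m))
         for zj, zm in zip(z_j, z_m)]
residual_cov = cov(eps_j, z_m)

# Expected return implied by the security market line for a hypothetical R.
R = 1.02
sml_return = R + beta_j * (mean(z_m) - R)
print(beta_j, residual_cov, sml_return)
```

In a sample the zero covariance holds mechanically once $\beta_j$ is defined as the covariance ratio; the economic content of Theorem 2.15 is that, under spanning, the security's expected return lies on the security market line.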
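Theorem 2.16 (Min variance set). (a) Let $\sigma_{ij}$ denote the $ij$th
element of the covariance matrix $\Omega$, where $\Omega$ is non-singular. Hence, let $\upsilon_{ij}$ denote the $ij$th element
of $\Omega^{-1}$. All portfolios in $\psi_{\min}$ with expected return $\mu$ must have
portfolio weights that are solutions to the problem
\[ \min \sum_{i=1}^{n} \sum_{j=1}^{n} \delta_i \delta_j \sigma_{ij}, \]
subject to the constraint $\bar{Z}(\mu) = \mu$. If $\mu = R$, then $Z(R) = R$, so
all initial wealth is invested in the risk-free security. Therefore, $\delta_j^R = 0$,
$j = 1, \ldots, n$. Consider the case when $\mu \ne R$. Then the first-order
conditions are
\[ 0 = \sum_{j=1}^{n} \delta_j^\mu \sigma_{ij} - \lambda_\mu (\bar{Z}_i - R), \quad i = 1, \ldots, n, \]
where $\lambda_\mu$ is the Lagrangian multiplier for the constraint. Multiplying by $\delta_i^\mu$ and summing over $i$,
\[ \sum_{j=1}^{n} \delta_j^\mu \sigma_{ij} \delta_i^\mu - \lambda_\mu \delta_i^\mu (\bar{Z}_i - R) = 0, \]
\[ \sum_{i=1}^{n} \sum_{j=1}^{n} \delta_j^\mu \sigma_{ij} \delta_i^\mu - \lambda_\mu \sum_{i=1}^{n} \delta_i^\mu (\bar{Z}_i - R) = 0, \]
\[ \mathrm{Var}(Z(\mu)) - \lambda_\mu \sum_{i=1}^{n} \delta_i^\mu (\bar{Z}_i - R) = 0, \]
\[ \mathrm{Var}(Z(\mu)) - \lambda_\mu (\bar{Z}(\mu) - R) = 0. \]
Hence,
\[ \lambda_\mu = \frac{\mathrm{Var}(Z(\mu))}{\mu - R}. \]
Likewise, starting from
\[ \sum_{k=1}^{n} \delta_k^\mu \sigma_{ik} - \lambda_\mu (\bar{Z}_i - R) = 0, \quad i = 1, \ldots, n, \]
multiplying by $\upsilon_{ji}$ and summing over $i$ gives
\[ \sum_{k=1}^{n} \delta_k^\mu \sum_{i=1}^{n} \upsilon_{ji} \sigma_{ik} - \lambda_\mu \sum_{i=1}^{n} \upsilon_{ji} (\bar{Z}_i - R) = 0, \]
\[ \delta_j^\mu - \lambda_\mu \sum_{i=1}^{n} \upsilon_{ji} (\bar{Z}_i - R) = 0, \]
because $\sum_{i} \upsilon_{ji} \sigma_{ik}$ equals one for $k = j$ and zero otherwise.
Hence,
\[ \delta_j^\mu = \lambda_\mu \sum_{i=1}^{n} \upsilon_{ji} (\bar{Z}_i - R), \quad j = 1, \ldots, n. \]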
(b) $\psi_{\min}$ consists of portfolios in $\psi^f$ such that there exists no other
portfolio in $\psi^f$ with the same expected return and a smaller variance.
From (a), for each value of $\mu$, the weights $\delta_j^\mu$, $j = 1, \ldots, n$, are unique and,
across values of $\mu$, differ only by the scalar $\lambda_\mu$. Therefore,
all portfolios in $\psi_{\min}$ with $\mu \ne R$ are perfectly
correlated.
Pick any portfolio in $\psi_{\min}$ with $\mu \ne R$ and call its return $X$. Then
every $Z(\mu)$ can be written in the form
\[ Z(\mu) = \delta_\mu (X - R) + R. \]
Hence, $(X, R)$ span $\psi_{\min}$.
(c) From Corollary 2.13, we have
\[ Z_j = R + a_j (X - R) + \varepsilon_j, \]
where $E[\varepsilon_j | X] = 0$. Therefore,
\[ \bar{Z}_j = R + a_j (\bar{X} - R). \]
From Proposition 2.3, we have
\[ a_{ij} = \sum_{k=1}^{m} \upsilon_{ik}\, \mathrm{Cov}(X_k, Z_j). \]
With a single risky spanning portfolio ($m = 1$), the covariance matrix $\Omega_X$ reduces to the scalar $\mathrm{Var}(X)$, so that
\[ a_j \Omega_X = \mathrm{Cov}(X, Z_j), \]
\[ a_j = \frac{\mathrm{Cov}(X, Z_j)}{\Omega_X} = \frac{\mathrm{Cov}(X, Z_j)}{\mathrm{Var}(X)}. \]
Hence, we get
\[ \bar{Z}_j - R = a_j (\bar{X} - R), \]
where $a_j = \mathrm{Cov}(X, Z_j)/\mathrm{Var}(X)$, $j = 1, \ldots, n$.

CAPM and the security market line in Theorem 2.15 were first
derived by Sharpe (1964) as necessary conditions for equilibrium in
the mean–variance model of Markowitz and Tobin when investors
have homogeneous beliefs.
Whenever there exists a spanning set for $\psi^e$ with $m = 1$, the
means, variances and covariances of $(Z_1, \ldots, Z_n)$ are sufficient statistics
to determine all efficient portfolios. Such a strong set of conclusions
suggests that the class of joint probability distributions for
$(Z_1, \ldots, Z_n)$ which admit a two-fund separation theorem will be
highly specialised. Indeed, Merton's Theorem 2.18 gives the example of
a joint normal distribution whereas Theorem 2.19 is based on symmetric
density functions.
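Theorem 2.17 (Minimum variance set and efficient portfolio).
Let $Z_e = R + a_e (X - R)$ be the return on an efficient portfolio. Let
$Z_p$ be the return on any portfolio in $\psi^f$ such that $\bar{Z}_e = \bar{Z}_p$. Thus,
\[ Z_p = R + a_p (X - R) + \varepsilon_p, \]
where $E[\varepsilon_p] = E[\varepsilon_p | X] = 0$. Therefore, $a_p = a_e$ if $\bar{Z}_e = \bar{Z}_p$, and
\[ \mathrm{Var}(Z_p) = \mathrm{Var}(R + a_p (X - R) + \varepsilon_p) \]
\[ = \mathrm{Var}(a_p X + \varepsilon_p) \]
\[ = a_p^2 \mathrm{Var}(X) + \mathrm{Var}(\varepsilon_p) \]
\[ \ge a_p^2 \mathrm{Var}(X) = \mathrm{Var}(Z_e). \]
Hence, $Z_e$ is contained in $\psi_{\min}$.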
Theorem 2.18 (X-span efficient portfolio). Pick a portfolio in
$\psi_{\min}$ and call its return $X$:
\[ X = R + \sum_{j=1}^{n} a_j (Z_j - R). \]
We can write the expected return on security $j$ as
\[ \bar{Z}_j = (1 - a_j) R + a_j \bar{X} = R + a_j (\bar{X} - R). \]
So for $j = 1, \ldots, n$,
\[ Z_j = R + a_j (X - R) + \varepsilon_j. \]
By Theorem 2.16(c), $E(\varepsilon_j) = 0$ and, by construction, $\mathrm{Cov}(\varepsilon_j, X) = 0$.
Since $Z_1, \ldots, Z_n$ are normally distributed, $X$ will be normally distributed.
Hence $\varepsilon_j$ is normally distributed, and because $\mathrm{Cov}(\varepsilon_j, X) = 0$,
$\varepsilon_j$ and $X$ are independent. Therefore, $E[\varepsilon_j] = E[\varepsilon_j | X] = 0$.
So, there exists a number $a_j$ for each security $j = 1, \ldots, n$ such that
$Z_j = R + a_j (X - R) + \varepsilon_j$ and $E[\varepsilon_j | X] = 0$, which concludes the proof that
$(X, R)$ span $\psi^e$.
Theorem 2.19 (Symmetrical function and spanning). By
hypothesis $p(Z_1, \ldots, Z_i, \ldots, Z_n) = p(Z_i, \ldots, Z_1, \ldots, Z_n)$ for each set
of given values $(Z_1, \ldots, Z_n)$. The first-order conditions for the portfolio selection problem are
\[ E\{U'(Z^* W_0)(Z_j - R)\} = 0, \quad j = 1, \ldots, n, \]
\[ Z^* = \sum_{j=1}^{n} w_j^* (Z_j - R) + R. \]
Therefore, the fraction of the portfolio allocated to security $j$ is the same
for every risk-averse investor. Hence, all investors will hold all risky
securities in the same relative proportions. If $X$ is the return on a
portfolio with an equal investment in each risky security,
\[ X = R + \sum_{j=1}^{n} a_j (Z_j - R). \]
11.4 Arbitrage Pricing Theory (APT)

Ross's (1976) APT is an important class of linear-factor models
that leads to spanning without assuming joint normal probability
distributions.
Theorem (Arbitrage pricing theory). Let
$$Z_j = \bar{Z}_j + \sum_{i=1}^{m} a_{ij} Y_i + \varepsilon_j, \quad j = 1, \ldots, n,$$
where $E[\varepsilon_j] = E[\varepsilon_j \mid Y_1, \ldots, Y_m] = 0$, $E[Y_i] = 0$ and $\operatorname{Cov}(Y_i, Y_j) = 0$
for $i \neq j$. If it is possible to construct a set of $m$ portfolios with
returns $(X_1, \ldots, X_m)$ such that $X_i$ and $Y_i$ are perfectly correlated for
$i = 1, \ldots, m$, then $(X_1, \ldots, X_m, R)$ span $\psi^e$.
Proof. Let
$$Z_p = \bar{Z}_p + \sum_{i=1}^{m} a_{ip} Y_i + \varepsilon_p, \qquad Z_p = R + \sum_{j=1}^{n} \delta_j (Z_j - R),$$
where
$$a_{ip} \equiv \sum_{j=1}^{n} \delta_j a_{ij}, \qquad \varepsilon_p \equiv \sum_{j=1}^{n} \delta_j \varepsilon_j, \qquad \delta_j \equiv \frac{\mu_j}{n},$$
and $\mu_j$ is bounded. For sufficiently large $n \gg m$, it is possible to
construct a set of well-diversified portfolios $\{X_k\}$ such that $a_{ik} = 0$
for $i \neq k$ and $a_{kk} \neq 0$:
$$X_k = \bar{X}_k + a_{kk} Y_k + \frac{1}{n} \sum_{j=1}^{n} \mu_j \varepsilon_j, \quad k = 1, \ldots, m.$$
As $n \to \infty$, $X_k \to \bar{X}_k + a_{kk} Y_k$, so $X_k$ and $Y_k$ are perfectly correlated
and, by Theorem 2.12, $(X_1, \ldots, X_m, R)$ span $\psi^e$.
If $m = 1$, then two-fund separation obtains independently
of any other distributional characteristics of $Y_1$ or the $\{\varepsilon_j\}$.
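The diversification step in the proof can be illustrated numerically. The sketch below assumes, beyond what the theorem states, that the residuals $\varepsilon_j$ are mutually uncorrelated with a common variance; the bound on $\mu_j$ and the variance value are hypothetical. Under these assumptions the variance of the residual term $\frac{1}{n}\sum_j \mu_j \varepsilon_j$ shrinks at rate $1/n$.

```python
def residual_variance(mu, sigma2):
    """Variance of (1/n) * sum_j mu_j * eps_j, assuming uncorrelated eps_j."""
    n = len(mu)
    return sum(m * m * s for m, s in zip(mu, sigma2)) / (n * n)

def well_diversified_example(n, mu_bound=2.0, sigma2=0.04):
    # Hypothetical bounded weights: |mu_j| <= mu_bound for every j.
    mu = [mu_bound if j % 2 == 0 else 0.5 * mu_bound for j in range(n)]
    return residual_variance(mu, [sigma2] * n)

variances = {n: well_diversified_example(n) for n in (10, 100, 1000, 10000)}
for n, v in variances.items():
    print(n, v)
```

Because the weights are bounded, the residual variance is at most `mu_bound**2 * sigma2 / n`, which is why $X_k$ becomes perfectly correlated with $Y_k$ in the limit.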
11.5 Modigliani–Miller Hypothesis

Consider firm $j$ with end-of-period value $V_j$, financed by $q$ different
financial claims. Let $f_k(V_j)$ be the value function of security
$k$; it describes how the holders of this security will share the end-of-period
value of the firm. Let us assume that the production technology is
static and that the choice of investment intensity is not affected by
the financing decision. Then by definition,
$$\sum_{k=1}^{q} f_k \equiv V_j(\theta_j), \tag{11.6}$$
where $\theta_j$ is a random variable. Let $V_{j0}$ be the initial value of an
all-equity-financed firm. Theorem 2.21 sets out to prove that
$$\sum_{k=1}^{q} f_{k0} = V_{j0},$$
i.e. the value of the firm remains unchanged when it is financed by
$q$ securities.
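Theorem 2.20 (Security price at equilibrium). Let $V_j$ be the random
end-of-period aggregate value of security $j$. Then
$$Z_j \equiv \frac{V_j}{V_{j0}}, \qquad V_j = Z_j V_{j0},$$
$$Z_j = R + \sum_{i=1}^{n} a_{ij} (X_i - R) + \varepsilon_j,$$
$$V_j = Z_j V_{j0} = V_{j0} \left[ R + \sum_{i=1}^{n} a_{ij} (X_i - R) + \varepsilon_j \right],$$
$$\bar{V}_j = V_{j0} \left[ R + \sum_{i=1}^{n} a_{ij} (\bar{X}_i - R) \right], \tag{11.7}$$
where, using
$$\operatorname{Cov}(X_k, V_j) = \operatorname{Cov}(X_k, Z_j V_{j0}) = V_{j0} \operatorname{Cov}(X_k, Z_j),$$
we have
$$a_{ij} = \sum_{k=1}^{n} V_{ik} \operatorname{Cov}(X_k, Z_j),$$
$$V_{j0}\, a_{ij} = \sum_{k=1}^{n} V_{ik} \operatorname{Cov}(X_k, Z_j V_{j0}) = \sum_{k=1}^{n} V_{ik} \operatorname{Cov}(X_k, V_j),$$
$$a_{ij} = \frac{\sum_{k=1}^{n} V_{ik} \operatorname{Cov}(X_k, V_j)}{V_{j0}}.$$
Put this result back in (11.7) to give
$$\bar{V}_j = V_{j0} \left[ R + \frac{\sum_{i=1}^{n} \sum_{k=1}^{n} V_{ik} \operatorname{Cov}(X_k, V_j) (\bar{X}_i - R)}{V_{j0}} \right],$$
$$V_{j0} = \frac{\bar{V}_j - \sum_{i=1}^{n} \sum_{k=1}^{n} V_{ik} \operatorname{Cov}(X_k, V_j) (\bar{X}_i - R)}{R}, \quad j = 1, \ldots, n.$$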
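Corollary 2.20a (End of period value of security). From Theorem 2.20,
$$V_{j0} = \frac{\bar{V}_j - \sum_{i=1}^{n} \sum_{k=1}^{n} V_{ik} \operatorname{Cov}(X_k, V_j) (\bar{X}_i - R)}{R}.$$
Let
$$Z = \frac{V}{V_0} = \frac{\sum_{j=1}^{n} \lambda_j V_j}{\sum_{j=1}^{n} \lambda_j V_{j0}}.$$
Then
$$\lambda_j V_{j0} = \frac{\lambda_j \bar{V}_j - \lambda_j \sum_{i=1}^{n} \sum_{k=1}^{n} V_{ik} \operatorname{Cov}(X_k, V_j) (\bar{X}_i - R)}{R},$$
$$\sum_{j=1}^{n} \lambda_j V_{j0} = \sum_{j=1}^{n} \frac{\lambda_j \bar{V}_j - \lambda_j \sum_{i=1}^{n} \sum_{k=1}^{n} V_{ik} \operatorname{Cov}(X_k, V_j) (\bar{X}_i - R)}{R}.$$
Besides, we know that
$$\begin{aligned}
V_0 &= \frac{\sum_{j=1}^{n} \lambda_j \bar{V}_j - \sum_{i=1}^{n} \sum_{k=1}^{n} V_{ik} \operatorname{Cov}\left(X_k, \sum_{j=1}^{n} \lambda_j V_j\right) (\bar{X}_i - R)}{R} \\
&= \frac{\sum_{j=1}^{n} \lambda_j \bar{V}_j - \sum_{j=1}^{n} \lambda_j \sum_{i=1}^{n} \sum_{k=1}^{n} V_{ik} \operatorname{Cov}(X_k, V_j) (\bar{X}_i - R)}{R} \\
&= \sum_{j=1}^{n} \frac{\lambda_j \bar{V}_j - \lambda_j \sum_{i=1}^{n} \sum_{k=1}^{n} V_{ik} \operatorname{Cov}(X_k, V_j) (\bar{X}_i - R)}{R} \\
&= \sum_{j=1}^{n} \lambda_j V_{j0}.
\end{aligned}$$
Therefore, starting again from
$$V_{j0} = \frac{\bar{V}_j - \sum_{i=1}^{n} \sum_{k=1}^{n} V_{ik} \operatorname{Cov}(X_k, V_j) (\bar{X}_i - R)}{R},$$
multiplying by $q$ and adding $\mu / R$ gives
$$q V_{j0} + \frac{\mu}{R} = \frac{q \bar{V}_j + \mu - q \sum_{i=1}^{n} \sum_{k=1}^{n} V_{ik} \operatorname{Cov}(X_k, V_j) (\bar{X}_i - R)}{R},$$
so that a security with end-of-period value $q V_j + \mu$ has initial value
$$V_0 = \frac{E[q V_j + \mu] - q \sum_{i=1}^{n} \sum_{k=1}^{n} V_{ik} \operatorname{Cov}(X_k, V_j) (\bar{X}_i - R)}{R} = q V_{j0} + \frac{\mu}{R}.$$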
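As a rough numerical illustration of Theorem 2.20 and Corollary 2.20a, the sketch below prices payoffs with a single spanning portfolio, in which case the coefficient reduces to $\operatorname{Cov}(X, V_j)/\operatorname{Var}(X)$. The state probabilities, returns and payoffs are hypothetical, and `initial_value` is a helper name introduced here for illustration only.

```python
def expectation(values, probs):
    return sum(v * p for v, p in zip(values, probs))

def covariance(xs, ys, probs):
    mx, my = expectation(xs, probs), expectation(ys, probs)
    return sum(p * (x - mx) * (y - my) for x, y, p in zip(xs, ys, probs))

def initial_value(payoff, x, probs, R):
    """V_0 = (E[V] - Cov(X, V) / Var(X) * (E[X] - R)) / R, one spanning portfolio."""
    v11 = 1.0 / covariance(x, x, probs)   # single-portfolio case: 1 / Var(X)
    adjustment = v11 * covariance(x, payoff, probs) * (expectation(x, probs) - R)
    return (expectation(payoff, probs) - adjustment) / R

probs = [0.25, 0.25, 0.25, 0.25]      # hypothetical state probabilities
X = [0.80, 1.00, 1.20, 1.40]          # gross return on the spanning portfolio
R = 1.03                              # gross risk-free return
V_j = [60.0, 90.0, 130.0, 150.0]      # end-of-period value of security j
q, mu = 3.0, 20.0                     # scale factor and constant add-on

V0_j = initial_value(V_j, X, probs, R)
V0_scaled = initial_value([q * v + mu for v in V_j], X, probs, R)
print(V0_j, V0_scaled, q * V0_j + mu / R)
```

The last two printed numbers agree, matching $V_0 = qV_{j0} + \mu/R$. As a sanity check, one unit of the spanning portfolio and a riskless payoff of $R$ are both priced at 1.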
Theorem 2.21. If firm $j$ is financed by $q$ different claims defined
by the functions $f_k(V_j)$, $k = 1, \ldots, q$, and if there exists an equilibrium
such that the return distribution of the efficient portfolio set
remains unchanged from the equilibrium in which firm $j$ was all-equity
financed, then
$$\sum_{k=1}^{q} f_{k0} = V_{j0},$$
where $f_{k0}$ is the equilibrium initial value of financial claim $k$. ($I_j$ is
dropped since it is assumed that investment policy is not affected by
financing policy.)
Proof. The proof follows a three-step procedure, with assumptions
and necessary conditions clearly stated: (i) express the initial value
$V_{j0}$ in terms of equity $j$'s return; (ii) express the initial values of the
$q$ securities in terms of their respective returns; (iii) substitute the
result $\sum_{k=1}^{q} f_k = V_j$ and show that $\sum_{k=1}^{q} f_{k0} = V_{j0}$.
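(1) Let $Z_j$ denote the return on security $j$. Then
$$V_j = V_{j0} Z_j,$$
where
$$Z_j = R + \sum_{i=1}^{m} a_{ij} (X_i - R) + \varepsilon_j, \qquad a_{ij} = \sum_{l=1}^{m} v_{il} \operatorname{Cov}(X_l, Z_j).$$
Taking expectations, we have
$$\begin{aligned}
\bar{V}_j &= V_{j0} \bar{Z}_j \\
&= V_{j0} \left[ R + \sum_{i=1}^{m} a_{ij} (\bar{X}_i - R) \right] \\
&= V_{j0} \left[ R + \sum_{i=1}^{m} \sum_{l=1}^{m} v_{il} \operatorname{Cov}(X_l, Z_j) (\bar{X}_i - R) \right] \\
&= V_{j0} R + \sum_{i=1}^{m} \sum_{l=1}^{m} v_{il} \operatorname{Cov}(X_l, V_{j0} Z_j) (\bar{X}_i - R) \\
&= V_{j0} R + \sum_{i=1}^{m} \sum_{l=1}^{m} v_{il} \operatorname{Cov}(X_l, V_j) (\bar{X}_i - R).
\end{aligned}$$
Rearranging gives
$$V_{j0} = \frac{1}{R} \left[ \bar{V}_j - \sum_{i=1}^{m} \sum_{l=1}^{m} v_{il} \operatorname{Cov}(X_l, V_j) (\bar{X}_i - R) \right]. \tag{11.8}$$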
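(2) Writing each security $f_k$, $k = 1, \ldots, q$, in a similar fashion,
$$f_{k0} = \frac{1}{R} \left[ \bar{f}_k - \sum_{i=1}^{m} \sum_{l=1}^{m} v_{il}^{*} \operatorname{Cov}^{*}(X_l, f_k) (\bar{X}_i - R) \right],$$
and summing over all $k$,
$$\begin{aligned}
\sum_{k=1}^{q} f_{k0} &= \frac{1}{R} \left[ \sum_{k=1}^{q} \bar{f}_k - \sum_{i=1}^{m} \sum_{l=1}^{m} v_{il}^{*} \operatorname{Cov}^{*}\left(X_l, \sum_{k=1}^{q} f_k\right) (\bar{X}_i - R) \right] \\
&= \frac{1}{R} \left[ \sum_{k=1}^{q} \bar{f}_k - \sum_{i=1}^{m} \sum_{l=1}^{m} v_{il}^{*} \operatorname{Cov}^{*}(X_l, V_j) (\bar{X}_i - R) \right],
\end{aligned} \tag{11.9}$$
by the definition of $V_j$ in (11.6). It is important to note the
implication of "$*$": $v_{il}$ and $v_{il}^{*}$ in Eqs. (11.8) and (11.9) denote the
elements of the inverse of the variance–covariance matrix of $(X_1, \ldots, X_m)$ when
firm $j$ is financed by all equity and when firm $j$ is financed by $q$
different claims, respectively. Similarly, $\operatorname{Cov}$ and $\operatorname{Cov}^{*}$ are covariances
measured under these two settings.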
(3) Since $\sum_{k=1}^{q} \bar{f}_k = \bar{V}_j$ by (11.6), the right-hand side of (11.9) equals that of (11.8) if and only if
the change in capital structure has not affected the distributional
characteristics of $(X_1, \ldots, X_m)$ and their relationship with $V_j$, in
which case we can conclude that
$$\sum_{k=1}^{q} f_{k0} = V_{j0}.$$
The implication of Theorem 2.21 is that taking the production

technology and firm’s investment policy as given, the way in which
the firm finances its investment will not affect the market value of
the firm unless the choice of financial instruments changes the return
distribution characteristics of the efficient portfolio set.
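Step (3) of the proof can be sketched numerically by holding the distribution of the spanning portfolio fixed and splitting the firm's payoff into two claims. The example below is an illustration under stated assumptions: a single spanning portfolio, hypothetical state data, and a hypothetical face value `D` for a debt claim. It applies the pricing formula of (11.8) to debt, to equity and to the whole firm.

```python
def expectation(values, probs):
    return sum(v * p for v, p in zip(values, probs))

def covariance(xs, ys, probs):
    mx, my = expectation(xs, probs), expectation(ys, probs)
    return sum(p * (x - mx) * (y - my) for x, y, p in zip(xs, ys, probs))

def initial_value(payoff, x, probs, R):
    # (11.8) with one spanning portfolio, so that v_11 = 1 / Var(X).
    beta = covariance(x, payoff, probs) / covariance(x, x, probs)
    return (expectation(payoff, probs) - beta * (expectation(x, probs) - R)) / R

probs = [0.25, 0.25, 0.25, 0.25]      # hypothetical state probabilities
X = [0.80, 1.00, 1.20, 1.40]          # spanning portfolio return, held fixed
R = 1.03
V = [60.0, 90.0, 130.0, 150.0]        # end-of-period firm value
D = 100.0                             # hypothetical face value of debt

debt = [min(v, D) for v in V]             # f_1(V): nonlinear in V
equity = [max(v - D, 0.0) for v in V]     # f_2(V) = V - f_1(V)

V0 = initial_value(V, X, probs, R)
debt0 = initial_value(debt, X, probs, R)
equity0 = initial_value(equity, X, probs, R)
print(V0, debt0 + equity0)
```

The two printed values coincide because the formula is linear in the payoff once the distribution of $X$ is taken as unchanged; that unchanged distribution is the assumption doing the work, not the shape of the individual claims.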
One sufficient condition for Theorem 2.21 to hold is that the financial
claims issued by the firm are "redundant securities" whose payoffs
can be replicated by combining other existing securities. This condition
is satisfied by the subclass of corporate liabilities that provide
for "linear" sharing rules (i.e. $f_k(V) = a_k V + b_k$, where $\sum_k a_k = 1$
and $\sum_k b_k = 0$).³ Unfortunately, the most common types of financial
instruments issued by corporations have nonlinear payoff
structures, for example, due to the possibility of default.
³ Note that Stiglitz (1969, 1974) shows that linearity of the sharing rules is not a
necessary condition for Theorem 2.21 to hold. But the establishment of conditions
under which the hypothesis of Theorem 2.21 is valid under nonlinear payoffs is a
lot more complex.
11.6 Comment on Spanning

Does spanning necessarily mean some securities are redundant? The
simple answer to this question is no. If the question were more precise
and asked whether non-trivial spanning of the set of all feasible
portfolios necessarily means that some securities are redundant, then
the answer would be yes. From the general definition of
spanning in Merton (Definition 2.2), if $N$ is the number of securities
available to generate the portfolios in $\psi$ and if $M^*$ denotes the
smallest number of feasible portfolios that span the space of portfolios
contained in $\psi$, then $M^* \leq N$. Also, according to Theorem
2.9, the first necessary condition for the $M$ feasible portfolios with
returns $(X_1, \ldots, X_M)$ to span the portfolio set $\psi^f$ is that the rank
of $w$, the $n \times n$ variance–covariance matrix of the returns on the $n$
risky assets, be less than or equal to $M$ (rank $w \leq M$). So, as can be
seen from Definition 2.2 and Theorem 2.9, spanning in general allows
for the possibility of $M^*$ being equal to $N$ and the rank of $w$ being equal to $M$,
which means that it is possible for all securities to be
non-redundant.
On the other hand, non-trivial spanning requires strict inequality
between $M^*$ and $N$ ($M^* < N$). According to Merton's Corollary
2.10, a necessary and sufficient condition for $(X_1, \ldots, X_m, R)$, where
$M^* = m + 1$, to be the smallest number of feasible portfolios that
span is that the rank of $w$ equals the rank of $w_X$, which is $m$. It then follows
from Corollary 2.10 that, since rank $w = m < M^*$, a necessary and
sufficient condition for non-trivial spanning of $\psi^f$, assuming no arbitrage
opportunities, is that some of the risky securities are redundant
securities.
Finally, as Merton notes, since $\psi^e$ is contained in $\psi^f$, any properties
proved for portfolios that span $\psi^e$ must be properties of portfolios
that span $\psi^f$. The essential difference is that, to span the efficient
portfolio set, it is not necessary that linear combinations of the spanning
portfolios exactly replicate the return on each available security.
Hence, it is not necessary that there exist redundant securities for
non-trivial spanning of $\psi^e$ to obtain.
11.7 HARA
Theorem 2.22 (HARA). For the HARA utility function with marginal utility
$$U'(W) = (a + bW)^{-c},$$
suppose that there are $K$ investors and the $k$th investor invests
$(1 - \alpha_k)$ of his initial wealth in the risk-free security and $\alpha_k$ in a
portfolio of risky securities. Denote by $w_{jk}$ the unique solutions to
the following f.o.c.:
$$0 = E\left[ \left( a_k + b_k (1 - \alpha_k) R + b_k \alpha_k \sum_{i=0}^{n} w_{ik} Z_i \right)^{-c} (Z_j - R) \right], \quad k = 1, \ldots, K.$$
Since $\alpha_k$ is arbitrary, set
$$\alpha_k = \frac{a_k}{b_k R} + 1,$$
so that $a_k + b_k (1 - \alpha_k) R = 0$. Then the f.o.c. becomes
$$0 = E\left[ \left( b_k \alpha_k \sum_{i=0}^{n} w_{ik} Z_i \right)^{-c} (Z_j - R) \right],$$
$$0 = (\alpha_k b_k)^{-c}\, E\left[ \left( \sum_{i=0}^{n} w_{ik} Z_i \right)^{-c} (Z_j - R) \right],$$
so the solutions $w_{jk}^{*}$ are independent of $\alpha_k$ and $b_k$ and depend only on $c$. Thus
all investors hold the same $w_j^{*}$, since $c$ is the same for all investors.
Since all investors hold the same portfolio of risky securities, this
portfolio must be the market portfolio, which shows that there exists
a portfolio with return $X$ such that $(X, R)$ span $\psi^u$.
Exercises: Spanning & Capital Market Theories

1. Compare and contrast the role of the risk-free portfolio in Proposition 2.1
and Proposition 2.2 with $h_k(P, t)$ in the following security
demand equation in Chapter 5:
$$w_k^{*} = h_k(P, t) + m(P, W, t)\, g_k(P, t) + f_k(P, W, t)$$
2. If risky returns have nonzero skewness (i.e. 3rd moment), what are
the implications on: [You may choose to answer any one below.]
(a) The necessary and sufficient condition for non-trivial spanning
(b) Risk and return of an efficient portfolio
(c) Market portfolio and CAPM
(d) APT
(e) MM
(f) HARA solution
Bibliography
Arrow, K. (1964) The role of securities in the optimal allocation of risk bearing,
Review of Economic Studies, 31(2), 91–96.
Brennan, M.J. (1979) The pricing of contingent claims in discrete time models,
Journal of Finance, 34(1), 53–68.
Cass, D. and J. Stiglitz (1970) The structure of investor preferences and asset
returns, and separability in portfolio allocation: A contribution to
the pure theory of mutual funds, Journal of Economic Theory, 2(2), 122–160.
Cochrane, J.H. (2005) Asset Pricing, Princeton University Press, Revised edition,
23 January.
Cont, R. and P. Tankov (2004) Financial Modelling with Jump Processes
Chapman and Hall.
Das, S.R. and R. Uppal (2004) Systemic risk and international portfolio choice,
Eeckhoudt, L., C. Gollier and H. Schlesinger (1996) Changes in Background Risk
and Risk Taking Behavior, Econometrica, 64(3), 683–689.
Fama, E.F. and M. Miller (1972) Theory of Finance, New York: Holt, Rinehart
and Winston.
Flannery, M.J. (2005) No pain, no gain? Effective market discipline via reverse
convertible debentures, In Capital Adequacy Beyond Basel: Banking, Securi-
ties and Insurance, Chap. 5, Hal S. Scott ed. Oxford University Press.
Friedman, M. and L.J. Savage (1948) The utility analysis of choices involving
risk, Journal of Political Economy, 56(4), 279–304.
Hansen, L.P. and R. Jagannathan (1991) Implications of security market data for
models of dynamic economies, Journal of Political Economy, 99, 225–262.
Ingersoll, J.E. (1987) Theory of Financial Decision Making, Savage, Maryland:
Rowman & Littlefield.
Joshi, M.S. (2003) The Concepts and Practice of Mathematical Finance,
Cambridge University Press.
Kashyap A.K., R.G. Rajan and J.C. Stein (2008) Rethinking capital regulation,
Working Paper, University of Chicago and Harvard University.
Kimball, M.S. (1990) Precautionary saving in the small and in the large, Econo-
metrica, 58(1), 53–73.
Kostakis, A., K. Muhammad and A. Siganos (2011) Higher co-moments and asset
pricing on London Stock Exchange, SSRN, Journal of Banking and Finance,
Forthcoming.
Kozhan, R., A. Neuberger and P. Schneider (2011) The Skew Risk Premium in
Index Option Prices, SSRN, Warwick Business School.
Laffont, J.-J. and J. Tirole (1991) The politics of government decision-making: A
theory of regulatory capture, The Quarterly Journal of Economics, 106(4),
1089–1127.
Leland, H. (1994) Risky debt, bond covenants and optimal capital structure,
Markowitz, H.M. (1952) Portfolio selection, Journal of Finance, 7(1), 77–91.
Markowitz, H.M. (1959) Portfolio Selection: Efficient Diversification of Invest-
ment, Wiley.
Merton, R.C. (1977) An analytic derivation of the cost of deposit insurance and
loan guarantees: An application of modern option pricing theory, Journal of
Banking & Finance, 1(1), 3–11.
Merton, R.C. (1990) Continuous-Time Finance, Basil Blackwell Ltd.
Merton R.C. (1992) Continuous Time Finance. Wiley-Blackwell.
Pennacchi, G., T. Vermaelen and C.C.P. Wolff (2010) Contingent capital: The
case for COERCs, working paper, University of Illinois and Luxembourg
School of Finance.
Pratt, J.W. (1964) Risk aversion in the small and in the large, Econometrica,
32(1/2), 122–136.
Ross, S. A. (1976) The Arbitrage Theory of Capital Asset Pricing, Journal of
Economic Theory, 13(3), 343–362.
Rothschild, M. and J.E. Stiglitz (1970) Increasing Risk I: A Definition, Journal
of Economic Theory, 2(3), 225–243.
Rothschild, M. and J.E. Stiglitz (1971) Increasing Risk II: Its Economic Consequences,
Journal of Economic Theory, 3(1), 66–84.
Samuelson, P.A. (1969) Lifetime portfolio selection by dynamic stochastic pro-
gramming, Review of Economics and Statistics, 51(3), 239–246.
Samuelson, P.A. (1972) Mathematics of Speculative Prices, in R.H. Day and S.M.
Robinson eds. Mathematical Tools in Economic Theory and Computation,
reprinted in SIAM Review January 1973, 15, 1–42.
Schoutens, W. (2011) Pricing Coco Bonds: The Derivative Approach, Power Point
Presentation, Quant Congress.
Sharpe, W.F. (1964) Capital asset prices: A theory of market equilibrium under
conditions of risk, Journal of Finance, 19(3), 425–442.
Stiglitz, J.E. (1974) On the irrelevance of corporate financial policy, American
Economic Review, 64(5), 351–366, December.
Stiglitz, J.E. (1969) A re-examination of the Modigliani-Miller theorem, American
Economic Review, 59(5), 784–793.
Tobin, J. (1958) Liquidity preference as behavior towards risks, Review of Eco-
nomic Studies, 25, 68–85, February.
Vitiello, L. and S.-H. Poon (2014) Non-monotonic pricing kernel and an extended
class of mixture of distributions for option pricing, Review of Derivative
Research, 17(2), 241–259.
Calculus Notes
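Differentiation
Constant
$$\frac{d}{dx} k = 0.$$
Power
$$\frac{d}{dx} k x^{n} = k n x^{n-1}.$$
Sum/difference
$$\frac{d}{dx} \left[ f(x) \pm g(x) \right] = \frac{d}{dx} f(x) \pm \frac{d}{dx} g(x).$$
Product
$$\frac{d}{dx} \left[ f(x) \cdot g(x) \right] = f(x) \frac{d}{dx} g(x) + g(x) \frac{d}{dx} f(x).$$
Quotient
$$\frac{d}{dx} \left[ \frac{f(x)}{g(x)} \right] = \frac{1}{g(x)^{2}} \left[ g(x) \frac{d}{dx} f(x) - f(x) \frac{d}{dx} g(x) \right].$$
Chain rule: for $z = f(y)$ and $y = g(x)$,
$$\frac{dz}{dx} = \frac{dz}{dy} \cdot \frac{dy}{dx}.$$
Inverse
$$\frac{dy}{dx} = 1 \Big/ \frac{dx}{dy}.$$
Increments
$$\Delta y \equiv \frac{\Delta y}{\Delta x} \Delta x, \qquad dy \equiv \frac{dy}{dx} dx.$$
Total differential, for $z = f(x, y)$
$$dz = \frac{\partial f}{\partial x} dx + \frac{\partial f}{\partial y} dy.$$
Exponential and log
$$\frac{d}{dx} e^{x} = e^{x},$$
$$\frac{d}{dx} e^{f(x)} = e^{f(x)} \frac{d}{dx} f(x),$$
$$\frac{d}{dx} \ln x = \frac{1}{x},$$
$$\frac{d}{dx} \ln f(x) = \frac{1}{f(x)} \frac{d}{dx} f(x),$$
$$\frac{d}{dx} a^{x} = a^{x} \ln a.$$
Trigonometric function
$$\frac{d}{dx} \sin x = \cos x,$$
$$\frac{d}{dx} \cos x = -\sin x,$$
$$\frac{d}{dx} \tan x = \sec^{2} x,$$
$$\frac{d}{dx} \arcsin x = \frac{1}{\sqrt{1 - x^{2}}},$$
$$\frac{d}{dx} \arccos x = -\frac{1}{\sqrt{1 - x^{2}}},$$
$$\frac{d}{dx} \arctan x = \frac{1}{1 + x^{2}},$$
$$\cos^{2} x = \frac{1}{2} (1 + \cos 2x),$$
$$\sin^{2} x = \frac{1}{2} (1 - \cos 2x),$$
$$\tan x = \frac{\sin x}{\cos x}.$$
If $t = \tan \frac{\theta}{2}$, then
$$\tan \theta = \frac{2t}{1 - t^{2}}, \qquad \sin \theta = \frac{2t}{1 + t^{2}}, \qquad \cos \theta = \frac{1 - t^{2}}{1 + t^{2}}.$$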
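Integration
Power rule
$$\int x^{n}\, dx = \frac{1}{n+1} x^{n+1} + c, \quad n \neq -1,$$
$$\int 1\, dx = \int x^{0}\, dx = x + c.$$
Note that $\int 1\, dx$ is often written as $\int dx$.
Exponential
$$\int e^{x}\, dx = e^{x} + c,$$
$$\int f'(x)\, e^{f(x)}\, dx = e^{f(x)} + c, \qquad f'(x) = \frac{d}{dx} f(x),$$
$$\int a^{x}\, dx = \frac{1}{\ln a} a^{x} + c.$$
Logarithmic
$$\int \frac{1}{x}\, dx = \ln |x| + c,$$
$$\int \frac{1}{ax + b}\, dx = \frac{1}{a} \ln |ax + b| + c,$$
$$\int \frac{f'(x)}{f(x)}\, dx = \ln f(x) + c, \quad \text{for } f(x) > 0.$$
Sum
$$\int \left[ f(x) + g(x) \right] dx = \int f(x)\, dx + \int g(x)\, dx.$$
Multiple
$$\int K f(x)\, dx = K \int f(x)\, dx.$$
Substitution
$$\int f(u) \frac{du}{dx}\, dx = \int f(u)\, du.$$
By parts
$$\int v\, du = uv - \int u\, dv,$$
$$\int f(x)\, g'(x)\, dx = f(x)\, g(x) - \int f'(x)\, g(x)\, dx.$$
Definite integral
$$\int_{a}^{b} f(x)\, dx = F(x) \Big]_{a}^{b} = F(b) - F(a),$$
$$\int_{a}^{b} f(x)\, dx = -\int_{b}^{a} f(x)\, dx,$$
$$\int_{a}^{c} f(x)\, dx = \int_{b}^{c} f(x)\, dx + \int_{a}^{b} f(x)\, dx.$$
Trigonometric function
$$\int \cos x\, dx = \sin x + c,$$
$$\int \sin x\, dx = -\cos x + c,$$
$$\int \cos (ax + b)\, dx = \frac{1}{a} \sin (ax + b) + c,$$
$$\int \tan x\, dx = -\ln (\cos x) + c.$$
Solid of revolution: volume under a curve $y = f(x)$
$$V = \pi \int y^{2}\, dx.$$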
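Integral of Normally Distributed Variable

If $y \sim N(\mu, \hat{\sigma})$, with density $f(y)$, and $\hat{\sigma} = \sigma \sqrt{T - t}$, then
$$\int_{a}^{\infty} f(y)\, dy = N\left( \frac{\mu - a}{\hat{\sigma}} \right),$$
$$\int_{a}^{\infty} e^{y} f(y)\, dy = N\left( \frac{\mu - a}{\hat{\sigma}} + \hat{\sigma} \right) e^{\mu + \frac{1}{2} \hat{\sigma}^{2}},$$
and (the following is not strictly part of calculus)
$$E(e^{y}) = e^{\mu + \frac{1}{2} \hat{\sigma}^{2}}.$$
For more on the moment generating function of a normally distributed
variable, see Mood et al. (1974) Introduction to the Theory
of Statistics, McGraw-Hill.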
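The two truncated integrals can be sanity-checked against plain numerical integration. The following sketch uses only the standard library; the parameter values and the truncation of the upper limit at twelve standard deviations are arbitrary choices made for illustration.

```python
import math

def norm_cdf(x):
    """Standard normal distribution function N(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def normal_pdf(y, mu, s):
    return math.exp(-0.5 * ((y - mu) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))

def trapezoid(g, lo, hi, n=20000):
    """Composite trapezoid rule for the integral of g over [lo, hi]."""
    h = (hi - lo) / n
    total = 0.5 * (g(lo) + g(hi))
    for i in range(1, n):
        total += g(lo + i * h)
    return total * h

mu, s, a = 0.05, 0.30, 0.10     # illustrative mean, standard deviation, lower limit
upper = mu + 12.0 * s           # stands in for the infinite upper limit

closed_tail = norm_cdf((mu - a) / s)
closed_exp = norm_cdf((mu - a) / s + s) * math.exp(mu + 0.5 * s * s)

numeric_tail = trapezoid(lambda y: normal_pdf(y, mu, s), a, upper)
numeric_exp = trapezoid(lambda y: math.exp(y) * normal_pdf(y, mu, s), a, upper)

print(closed_tail, numeric_tail)
print(closed_exp, numeric_exp)
```

Letting the lower limit go far into the left tail recovers $E(e^y) = e^{\mu + \frac{1}{2}\hat{\sigma}^2}$ from the second formula.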
Ito lemma
Lemma C.1 (Ito's lemma). If a stochastic variable $X_t$ satisfies
the SDE
$$dX_t = \mu(X_t, t)\, dt + \sigma(X_t, t)\, dW_t,$$
then, given any function $f(X_t, t)$ of the stochastic variable $X_t$ which
is twice differentiable in its first argument and once in its second,
$$df(X_t, t) = \left( \frac{\partial}{\partial t} + \mu(X_t, t) \frac{\partial}{\partial X_t} + \frac{1}{2} \sigma^{2}(X_t, t) \frac{\partial^{2}}{\partial X_t^{2}} \right) f(X_t, t)\, dt + \sigma(X_t, t) \frac{\partial}{\partial X_t} f(X_t, t)\, dW_t.$$
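Example C.2.
$$dG(x, t) = \left( \frac{\partial G}{\partial x} a + \frac{\partial G}{\partial t} + \frac{1}{2} \frac{\partial^{2} G}{\partial x^{2}} b^{2} \right) dt + \frac{\partial G}{\partial x} b\, dz,$$
given
$$dx = a\, dt + b\, dz,$$
where $z$ is a standard Brownian motion. Applying Ito's lemma to the
log process, we get
$$G = \ln x,$$
$$\frac{\partial G}{\partial x} = \frac{1}{x}, \qquad \frac{\partial G}{\partial t} = 0, \qquad \frac{\partial^{2} G}{\partial x^{2}} = -\frac{1}{x^{2}},$$
$$dx = \mu x\, dt + \sigma x\, dz,$$
$$\begin{aligned}
d \ln x &= \left( \frac{1}{x} \mu x + 0 - \frac{1}{2 x^{2}} \sigma^{2} x^{2} \right) dt + \frac{1}{x} \sigma x\, dz \\
&= \left( \mu - \frac{1}{2} \sigma^{2} \right) dt + \sigma\, dz.
\end{aligned}$$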
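To see the $-\frac{1}{2}\sigma^2$ correction at work, the sketch below simulates one Brownian path and compares a direct Euler discretisation of $dx = \mu x\,dt + \sigma x\,dz$ with the path obtained by accumulating $d\ln x$ from Example C.2. The parameter values, step count and seed are arbitrary illustrative choices, and `simulate` is a helper name introduced here.

```python
import math
import random

def simulate(x0, mu, sigma, T, steps, seed=0):
    rng = random.Random(seed)
    dt = T / steps
    x_euler = x0              # Euler scheme on dx = mu*x*dt + sigma*x*dz
    log_ito = math.log(x0)    # accumulates d ln x = (mu - sigma^2/2) dt + sigma dz
    log_naive = math.log(x0)  # omits the Ito correction, for comparison
    for _ in range(steps):
        dz = rng.gauss(0.0, math.sqrt(dt))
        x_euler += mu * x_euler * dt + sigma * x_euler * dz
        log_ito += (mu - 0.5 * sigma ** 2) * dt + sigma * dz
        log_naive += mu * dt + sigma * dz
    return x_euler, math.exp(log_ito), math.exp(log_naive)

x_euler, x_ito, x_naive = simulate(100.0, 0.08, 0.2, 1.0, 20000)
gap_ito = abs(x_euler - x_ito) / x_ito
gap_naive = abs(x_euler - x_naive) / x_naive
print(gap_ito, gap_naive)
```

On the same path, the log process with the Ito correction tracks the Euler path closely, while dropping the correction leaves a gap of roughly $e^{\sigma^2 T/2} - 1$.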
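Lemma C.3 (Ito's lemma for two stochastic variables). Suppose
$x_1$ and $x_2$ are two stochastic processes characterised by the
generic SDEs
$$dx_1(t) = \mu_1\, dt + \sigma_1\, dz_1 \quad \text{and} \quad dx_2(t) = \mu_2\, dt + \sigma_2\, dz_2.$$
The standard Brownian motions $z_1$ and $z_2$ are correlated, with
$dz_1\, dz_2 = \rho_{1,2}\, dt$. Let $y(t) = f(x_1, x_2, t)$ be a nonlinear transformation of $x_1$ and
$x_2$. Then the total differential of $y$ is given by
$$dy = \frac{\partial f}{\partial x_1} dx_1 + \frac{\partial f}{\partial x_2} dx_2 + \frac{\partial f}{\partial t} dt + \frac{1}{2} \frac{\partial^{2} f}{(\partial x_1)^{2}} \sigma_1^{2}\, dt + \frac{1}{2} \frac{\partial^{2} f}{(\partial x_2)^{2}} \sigma_2^{2}\, dt + \frac{\partial^{2} f}{\partial x_1 \partial x_2} \sigma_1 \sigma_2 \rho_{1,2}\, dt.$$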
Lemma C.4 (Quadratic variation). For relative changes
$$\frac{dX_t}{X_t} = \mu_x(X_t, Y_t, t)\, dt + \sigma_x(X_t, Y_t, t)\, dW_t,$$
$$\frac{dY_t}{Y_t} = \mu_y(X_t, Y_t, t)\, dt + \sigma_y(X_t, Y_t, t)\, dW_t,$$
the quadratic covariation of the increments of $X$ and $Y$ can be computed
by calculating the expected value of the product
$$d[X, Y]_t = \operatorname{Cov}\left[ dX_t, dY_t \mid \mathcal{F}_t \right] = \sigma_x \sigma_y X_t Y_t\, dt$$
or, more rigorously,
$$\lim_{\Delta T \to 0} \sum_{j=0}^{n-1} \left( X(t_{j+1}) - X(t_j) \right) \left( Y(t_{j+1}) - Y(t_j) \right) = \int_{0}^{t} \sigma_x(s) X(s)\, \sigma_y(s) Y(s)\, ds \quad \text{a.s.}$$
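Lemma C.5 (Ito's product rule).
$$\begin{aligned}
\frac{d(X_t Y_t)}{X_t Y_t} &= \frac{dX_t}{X_t} + \frac{dY_t}{Y_t} + \frac{d[X, Y]_t}{X_t Y_t} \\
&= (\mu_x + \mu_y + \sigma_x \sigma_y)\, dt + (\sigma_x + \sigma_y)\, dW_t.
\end{aligned}$$
Lemma C.6 (Ito's quotient rule).
$$\begin{aligned}
\frac{d(X_t / Y_t)}{X_t / Y_t} &= \frac{dX_t}{X_t} - \frac{dY_t}{Y_t} + \frac{d[Y, Y]_t}{Y_t^{2}} - \frac{d[X, Y]_t}{X_t Y_t} \\
&= \left( \mu_x - \mu_y + \sigma_y^{2} - \sigma_x \sigma_y \right) dt + (\sigma_x - \sigma_y)\, dW_t.
\end{aligned}$$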
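Lemma C.7 (Ito's Isometry). The expectation of the square of a
stochastic integral is
$$E\left[ \left( \int_{0}^{t} h_s\, dW_s \right)^{2} \right] = E\left[ \int_{0}^{t} h_s^{2}\, ds \right],$$
where $h_t$ is any $\mathcal{F}_t$-adapted process. For example,
$$E\left[ \left( \int_{0}^{t} W_s\, dW_s \right)^{2} \right] = E\left[ \int_{0}^{t} W_s^{2}\, ds \right] = \int_{0}^{t} E\left[ W_s^{2} \right] ds = \int_{0}^{t} s\, ds = \frac{1}{2} t^{2}.$$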
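The example can be checked by Monte Carlo. The sketch below approximates $\int_0^t W_s\, dW_s$ by a left-endpoint (Ito) sum on a discrete grid and averages its square over simulated paths. The grid size, path count and seed are arbitrary illustrative choices, so the estimate is only expected to be close to $\frac{1}{2}t^2$, not equal to it.

```python
import math
import random

def ito_integral_second_moment(t=1.0, steps=50, paths=20000, seed=1):
    """Monte Carlo estimate of E[(int_0^t W dW)^2] using left-endpoint sums."""
    rng = random.Random(seed)
    dt = t / steps
    sd = math.sqrt(dt)
    total = 0.0
    for _ in range(paths):
        w = 0.0
        integral = 0.0
        for _ in range(steps):
            dw = rng.gauss(0.0, sd)
            integral += w * dw    # integrand taken at the left endpoint
            w += dw
        total += integral ** 2
    return total / paths

estimate = ito_integral_second_moment()
print(estimate, 0.5 * 1.0 ** 2)
```

With a finite grid the exact discrete value is $\frac{1}{2}t^2(1 - 1/n)$ for $n$ steps, so a small downward bias relative to $\frac{1}{2}t^2$ is expected on top of sampling noise.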

