
International Journal of Forecasting 34 (2018) 636–664


International Journal of Forecasting


journal homepage: www.elsevier.com/locate/ijforecast

Predictions of short-term rates and the expectations hypothesis
Massimo Guidolin a, Daniel L. Thornton b,*
a Bocconi University, IGIER, and Baffi-CAREFIN, Italy
b D.L. Thornton Economics, LLC, United States

Article info

Keywords: Expectations hypothesis; Random walk; Time-varying risk premium; Predictability

Abstract

This paper emphasizes that traditional tests of the EH are based on two assumptions: the expectations hypothesis (EH) per se and an assumption about the expectations generating process (EGP) for the short-term rate. Arguing that conventional tests of the EH need to assume EGPs that may be significantly at odds with the true EGP, we investigate this possibility by analyzing the out-of-sample predictive performances of several models for predicting interest rates, including a few models which assume that the EH holds in its functional form that relates long- to short-term yields. Using US riskless yield data for a 1970–2016 monthly sample and testing methods that take into account the parameter uncertainty, the null hypothesis of an equal predictive accuracy of each model relative to the random walk alternative is hardly ever rejected at intermediate and long horizons. This confirms that, at least at a practical level, the main difficulty with the EH is represented by the effective prediction of short-term rates. We discuss the relevance of these findings for central banks' use of forward guidance.
© 2018 International Institute of Forecasters. Published by Elsevier B.V. All rights reserved.

* Corresponding author. E-mail address: dan@dlthornton.com (D.L. Thornton).
https://doi.org/10.1016/j.ijforecast.2018.03.006
0169-2070/© 2018 International Institute of Forecasters. Published by Elsevier B.V. All rights reserved.
M. Guidolin, D.L. Thornton / International Journal of Forecasting 34 (2018) 636–664

“The forecasting of short term interest rates by long term interest is, in general, so bad that the student may well begin to wonder whether, in fact, there really is any attempt to forecast.”
Macaulay (1938, p. 33)

1. Introduction

The expectations hypothesis (EH) of the term structure of interest rates, i.e., the proposition that the long-term rate is determined by the market's expectations of short rates over the holding period of the long-term bond plus a (constant) risk premium, is a key paradigm that is at the core of the monetary policy transmission mechanism. Indeed, virtually every central bank conducts monetary policy by targeting a short-term rate. However, as has been noted by Woodford (2003) and others, the effectiveness of monetary policy depends critically on a central bank's ability to affect the longer-term rates that matter most for the aggregate demand. This observation has prompted at least five central banks (the Reserve Bank of New Zealand, the Norges Bank, the Riksbank, the Czech National Bank, and the Federal Reserve) to provide forward guidance (i.e., a detailed, state-contingent commitment to a certain path of future monetary policy actions) about the path of the relevant short-term interest rate in an attempt to have a larger effect on longer-term interest rates (e.g., see Andersson & Hoffman, 2010; Kool & Thornton, 2015).

This adoption of forward guidance has occurred despite the EH having been rejected using a wide range of interest rate series, over a variety of sample periods, alternative monetary policy regimes, and a range of other details of the typical research design (e.g., see Campbell & Shiller, 1991; Della Corte, Sarno, & Thornton, 2008; Mankiw & Miron, 1986; Roberds, Runkle, & Whiteman, 1996; Sarno, Thornton, & Valente, 2007; Thornton, 2005). The most common explanations for the failure of the EH are that the single-equation models that are used most often for testing it are subject to spurious rejections because of time-varying risk premia, non-rational expectations, peso problems, and measurement errors (e.g., Bekaert, Hodrick, & Marshall, 2001; Dai & Singleton, 2002; Driffill, Psaradakis, & Sola, 1997; Hess & Kamara, 2005; Roberds & Whiteman, 1999; Tzavalis & Wickens, 1997).

However, Froot (1989) and others have noted that tests of the EH are really tests of two hypotheses: (a) the functional form that relates long- and short-term rates and that is commonly called the EH, and (b) a hypothesis about the process that generates the market's expectations of future short rates, the expectations generating process (EGP). Hence, the EH can be rejected either because (a) is false, i.e., the linkages between long- and short-term rates implied by the EH are inconsistent with the data; or because (b) is false, with the assumed EGP being significantly at odds with the true, but unknown, EGP. It is important to know the source of the failure of the EH. If the empirical failure of the EH stems from (b) rather than (a), the recent forward guidance policies may be effective, but only if central banks can credibly commit to a path for the policy rate, as Woodford (2012) emphasized. However, if the rejection of the EH is due to a rejection of (a), such central bank forward guidance is unlikely to be successful.1

1 Note that finding that (b) is empirically plausible, i.e., that there is an EGP that supports the notion that future short rates are predictable, does not imply that the EH holds. In fact, it merely makes the logical standing of the EH more fragile, because any direct rejection of the EH restriction must then derive from a rejection of (a).

Others have addressed this problem using alternative EGPs. Froot (1989) used survey data in order to test the EH independently of conventional assumptions on the expectations generating mechanism. Fuhrer (1996) compared the observed long-term rate with that implied by the pure EH based on rational expectations of the federal funds rate obtained from a Taylor-style reaction function that allowed for shifts in the Fed's reaction function. He found that his EH-implied long-term rate matched the observed long-term rate more closely than that implied by a five-variable VAR. Kozicki and Tinsley (2001) performed a similar analysis allowing for historical shifts in the market perceptions of an estimated Fed's inflation target.2 Elsewhere, they concluded that “(...) empirical rejections might reflect incorrect assumptions about expectations formation rather than incorrect assumptions about the theoretical link between long rates and short rates,” i.e., a rejection of the EH (Kozicki & Tinsley, 2005, p. 444). Carriero, Favero, and Kaminska (2006) suggested that the common practice of using the actual, realized short-term rate as a proxy for the h-period-ahead expectation of the short-term rate may be grossly inappropriate, and report that the evidence against the EH is reduced by using an alternative EGP.

2 Kozicki and Tinsley (2005) performed a similar analysis but emphasized the fit of long-term yields based on conventional tests of the EH, rather than a comparison with the observed long-term yield as Fuhrer (1996) and Kozicki and Tinsley (2001) do.

Our research is also motivated by the fact that, while the validity of the EH may be independent of the market's ability to predict future short-term rates, its practical usefulness is not. For example, if the market were unable to predict changes in the short-term rate beyond its current level, the EH could still be valid but would be of little use, as the term spread would provide no valuable information about the future path of interest rates. Indeed, based on their prior failure, investors would be best advised to avoid any temptation to predict future short-term rates.

Testing the EH under alternative assumptions about the EGP is problematic, as Thornton (2006) showed that conventional tests of the EH can yield evidence favorable to the EH even when the EH is known to be false. Moreover, Bekaert, Hodrick, and Marshall (1997) showed that the coefficient estimates from single-equation tests of the EH are subject to a small-sample bias that is extremely slow to die out as the sample size increases. As a consequence, rather than proposing a specific alternative EGP as others have done, we investigate the possibility that the well-documented rejections of the EH may be due simply to an inability to forecast future short-term rates. We do this by investigating the forecasting powers of a rich set of models that have emerged in the literature (e.g., Bali, Heidari, & Wu, 2009; Dai & Singleton, 2002; Diebold & Li, 2006; Diebold, Rudebusch, & Aruoba, 2006; Duffee, 2002) and have been shown elsewhere to have predictive power. Specifically, using monthly data on US riskless pure discount bond yields for the period January 1970–December 2016, we use these models to produce real-time, out-of-sample forecasts of short-term rates. We also use a relatively simple identification procedure that can accommodate time variation in the risk premia (a common explanation for the empirical failure of the EH) to obtain estimates of the conditional expectation of the short-term rate under the assumption that the EH holds.

Hence, we consider both forecasting models that impose the EH but make no specific assumptions about the EGP, and models that implicitly encompass a fairly broad range of sensible EGPs. Some of the models considered impose little or no structure on the term structure of rates, while others impose a considerable structure. For instance, affine term structure models allow for variation in the risk premia and impose no-arbitrage. Finally, we generate forecasts from two naive benchmark models: the random walk model and a simple regression model that forecasts the short-term rate by using the slope of the yield curve, as suggested by Duffee (2002). The forecasts are computed for a range of maturities over the period April 1983–December 2016. However, because we are interested in providing a better understanding of the empirical failure of the EH prior to the financial crisis (when the results cannot be affected by the financial market instability that followed the Lehman Brothers' bankruptcy in September 2008), we place a special emphasis on results for the period up until August 2008, and treat the sub-sample September 2008–December 2016 as a robustness check. There is of course a fairly extensive body of literature on the forecasting of short-term riskless rates; however, we believe that both the breadth of the models that we investigate and our exhaustive set of recursive out-of-sample exercises are unique, and provide an important insight into the commonly-reported failure of the EH.
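The two naive benchmarks and the recursive, real-time design mentioned above can be illustrated with a small sketch. It is not the authors' code: the data are synthetic, and the function names (`ols_line`, `recursive_forecasts`, `rmse`), the 3-period horizon, and the first forecast origin are assumptions chosen for illustration. At each origin the slope regression is re-estimated using only observations available at that date, and both forecasts are scored by root mean squared error.

```python
import math
import random


def ols_line(x, y):
    """Return (intercept, slope) of a simple least-squares fit of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def recursive_forecasts(short, spread, h, first_origin):
    """Real-time h-step forecasts of the short rate from both benchmarks.

    For each origin t, the regression short[s+h] - short[s] = a + b*spread[s]
    is re-estimated on pairs fully observed by time t (s + h <= t).
    """
    rw, slope_fc, actual = [], [], []
    for t in range(first_origin, len(short) - h):
        xs = spread[: t - h + 1]
        ys = [short[s + h] - short[s] for s in range(t - h + 1)]
        a, b = ols_line(xs, ys)
        rw.append(short[t])                        # random walk: no change
        slope_fc.append(short[t] + a + b * spread[t])
        actual.append(short[t + h])
    return rw, slope_fc, actual


def rmse(forecast, actual):
    """Root mean squared forecast error."""
    return math.sqrt(sum((f - a) ** 2 for f, a in zip(forecast, actual)) / len(actual))


# Synthetic illustration only (assumed data, not the paper's yields):
# a random-walk short rate and a spread that is unrelated to it.
random.seed(0)
spread = [random.gauss(1.0, 0.5) for _ in range(301)]
short = [5.0]
for _ in range(300):
    short.append(short[-1] + random.gauss(0.0, 0.25))

rw, sl, act = recursive_forecasts(short, spread, h=3, first_origin=120)
print(f"RMSE random walk: {rmse(rw, act):.3f}, slope regression: {rmse(sl, act):.3f}")
```

With the synthetic random-walk short rate used here, the spread carries no information by construction, so any gap between the two RMSEs reflects estimation noise only.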

Our results on the abilities of all of the models examined to predict future short-term rates are negative. Specifically, at intermediate and long-term forecast horizons (i.e., exactly when the EH would require the forecasts for predicting long-term rates), none of these models is able to generate forecasts that are statistically superior to those obtained from the random walk model. Our EH-restricted models occasionally yield out-of-sample performances that are superior to those of models that require a considerable structure and that are much more difficult to estimate; however, there were only a few instances where the EH-type forecasts dominated no-arbitrage forecasts based on statistical tests of equal predictive accuracy. Indeed, there were very few instances beyond a horizon of three months when any model was significantly superior to the random walk model using tests with a standard size. Moreover, there were no instances of any model consistently dominating any other model, as is also confirmed by Hansen, Lunde, and Nason's (2011) model confidence set bootstrapped tests, which always include the random walk in the smallest set of preferred models for horizons in excess of 2–3 months, both before and after the financial crisis of 2008. Although the overall level of predictive accuracy of the short rate increases somewhat after 1994, when the Fed started announcing its federal funds target, in relative terms the random walk benchmark becomes increasingly difficult for other models to beat. Our evidence suggests that this violation of the assumption of the predictability of future short-term rates alone may be sufficient to account for the massive rejections of the EH that are found in the literature. As a consequence, our results support the idea that central banks may be able to influence yields further out on the term structure only if they can credibly commit to a path for the policy rate, thus making it rather predictable, which does not appear to have been a common empirical phenomenon from a historical perspective.

The outline of the paper is as follows. Section 2 briefly introduces the EH and presents a simple methodology for generating EH-restricted forecasts under the assumption that risk premia are either constant or smoothly time-varying. We also describe Diebold and Li's (2006) three-factor model, Diebold et al.'s (2006) macro factor-enhanced model, and Duffee's (2002) affine and essentially affine models. The time series properties and parameter estimates are discussed briefly in Section 3. Forecasts from all of the models are compared and analyzed in Section 4. Section 5 presents the results of tests of equal predictive accuracy of all of the models relative to each other. Both classical equal predictive accuracy tests based on typical loss functions and stochastic discount factor (SDF)-based results are presented. Section 6 concludes.

2. The EH and the predictability of the short-term rate

The EH asserts that for $k > 1$,

$$r_t^n = \frac{1}{k}\sum_{i=0}^{k-1} E_t\left[r^m_{t+mi}\right] + \pi^{n,m}, \qquad n = km > m, \tag{1}$$

where $r_t^n$ denotes the current n-period rate, $E_t[\cdot]$ is the time t conditional expectation operator, $r^m_{t+mi}$ is the $m < n$ period rate $mi \geq 0$ periods forward, and $\pi^{n,m}$ denotes a term-specific but constant premium.3 By construction, k is an integer and is defined as $k = n/m$. The most widely used test of the EH is obtained by subtracting $r_t^m$ from both sides of Eq. (1) and rearranging terms to yield

$$(r_t^n - r_t^m) - \pi^{n,m} = \frac{1}{k}\sum_{i=0}^{k-1} E_t\left[r^m_{t+mi}\right] - r_t^m = \frac{1}{k}\sum_{i=1}^{k-1} E_t\left[\Delta r^m_{t+mi}\right], \tag{2}$$

where $E_t[\Delta r^m_{t+mi}] \equiv E_t[r^m_{t+mi}] - r_t^m$ is the expected change in the m-period rate between times $t$ and $t + mi$. Eq. (2) states that, apart from a (constant) premium, the spread between the long- and short-term rates is equal to the scaled sum of expected future changes in the short rate. Single-equation tests of the EH have been derived routinely by assuming an EGP such that

$$r^m_{t+mi} = E_t\left[r^m_{t+mi}\right] + \epsilon^m_{t+mi}, \qquad i = 1, 2, \ldots, k-1, \tag{3}$$

where $\epsilon^m_{t+mi}$ is i.i.d. $(0, \sigma^2_{m,i})$ and orthogonal to $E_t[r^m_{t+mi}]$.4 However, the EH per se places no restrictions on the EGP for the short-term rate, i.e., on the time series of expectations $\{E_t[r^m_{t+mi}]\}_{i=0}^{k-1}$. Conventional tests of the EH are based on Eqs. (1) and (3), and hence, the EH will be rejected if either is false. However, Eq. (3) constitutes a strong assumption about the predictability of the future rates. If the actual EGP is significantly different from that assumed by Eq. (3), tests of the EH could reject even if the long-term rates were really determined in accordance with the EH.

2.1. Expected future short-term rates under the EH

We begin by generating EH-restricted forecasts of the short-term rate under the assumption that the EH is true, i.e., that Eq. (1) holds. Specifically, the EH is imposed on riskless yield data in order to retrieve (risk-adjusted, up to a Jensen's inequality term) conditional expectations of the future short-term rate. This procedure requires that no assumptions be made about the EGP. However, because the estimates of the conditional expectation of the short-term rate are functions of the realized long-term yields, they are consistent with whatever EGP the market actually used to forecast the short-term rate.

3 Shiller, Campbell, and Schoenholtz (1983) show that Eq. (1) is exact in special cases and otherwise can be derived as a linear approximation of a number of nonlinear models of the term structure.

4 Substituting Eq. (3) into Eq. (2) and parameterizing the resulting expression yields:

$$\frac{1}{k}\sum_{i=0}^{k-1} r^m_{t+mi} - r_t^m = \varsigma_0 + \varsigma_1 (r_t^n - r_t^m) + \eta_t,$$

where $\eta_t \equiv k^{-1}\sum_{i=1}^{k-1} \epsilon^m_{t+mi}$. Under the EH, $\varsigma_0 = -\pi^{n,m}$ and $\varsigma_1 = 1$. Usually, the EH has been investigated by testing the null hypothesis that $\varsigma_1 = 1$. Estimates of $\varsigma_1$ are frequently positive and statistically significantly different from zero; however, the null hypothesis $\varsigma_1 = 1$ is nearly always rejected with very low p-values.
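As an illustration of the single-equation test in footnote 4, the sketch below simulates a world in which both parts of the joint hypothesis hold and then runs the regression. It is a minimal example under stated assumptions, not the procedure used in the paper: m = 1, the one-period rate follows an AR(1) process (so that market expectations are known in closed form), and all parameter values and function names are hypothetical.

```python
import random


def simulate_ar1(T, mu, rho, sigma, seed=0):
    """Simulate a one-period rate following an AR(1) around mu (assumed EGP)."""
    rng = random.Random(seed)
    r = [mu]
    for _ in range(T - 1):
        r.append(mu + rho * (r[-1] - mu) + rng.gauss(0.0, sigma))
    return r


def eh_long_rate(r_t, k, mu, rho, premium):
    """k-period rate implied by Eq. (1) with m = 1 and AR(1) expectations,
    where E_t[r_{t+i}] = mu + rho**i * (r_t - mu)."""
    expected = [mu + rho ** i * (r_t - mu) for i in range(k)]
    return sum(expected) / k + premium


def eh_regression(short, long_rate, k):
    """OLS of (1/k) * sum_{i<k} r_{t+i} - r_t on the spread r^n_t - r_t.

    Returns (intercept, slope), the estimates of varsigma_0 and varsigma_1.
    """
    T = len(short) - k + 1
    x = [long_rate[t] - short[t] for t in range(T)]
    y = [sum(short[t:t + k]) / k - short[t] for t in range(T)]
    mx, my = sum(x) / T, sum(y) / T
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return my - slope * mx, slope


# Hypothetical parameter values, chosen only for the illustration.
MU, RHO, SIGMA, PREMIUM, K = 4.0, 0.9, 1.0, 0.3, 6
short = simulate_ar1(5000, MU, RHO, SIGMA)
long_rate = [eh_long_rate(r, K, MU, RHO, PREMIUM) for r in short]
intercept, slope = eh_regression(short, long_rate, K)
print(f"intercept = {intercept:.3f} (EH value {-PREMIUM}), "
      f"slope = {slope:.3f} (EH value 1)")
```

Because the simulated long rate is built from Eq. (1) with expectations that match the true process, the estimated slope should be close to one and the intercept close to minus the premium. The regression errors overlap across periods, so inference on real data would normally call for standard errors that are robust to serial correlation; the sketch reports point estimates only.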

To see how the conditional expectation of the short-term rate can be obtained under the assumption that the EH holds, it is convenient to consider the case where $n = 2$ and $m = 1$, so that Eq. (2) is rewritten as

$$2r_t^2 - 2r_t = E_t[r_{t+1}] - r_t + 2\pi^{2,1}, \tag{4}$$

where we have simplified the notation by setting $r_t \equiv r_t^1$. Because both $r_t^2$ and $r_t$ are observable, $E_t[r_{t+1}]$ can be estimated up to a constant premium under the EH:

$$E_t[r_{t+1}] = 2r_t^2 - r_t - 2\pi^{2,1}. \tag{5}$$

This procedure is used commonly to estimate the forward rates by assuming that the term premium is zero (i.e., $\pi^{2,1} = 0$). In fact, there is a long-standing tradition of using the (risk-neutral, i.e., under an assumption of a zero risk premium) forward rate to predict short-term rates (e.g., Cochrane & Piazzesi, 2005; Fama, 1976; Fama & Bliss, 1987). However, most of these tests have been of the in-sample type, which clearly limits their informativeness.

It is well known that in reality investors are not risk-neutral, and so the risk premia reflected in interest rates are usually positive. In fact, the failure of the EH is often attributed to the non-constancy of risk premia (see e.g. the simple proof provided by Engle & Ng, 1993). We reflect this fact by considering risk premia explicitly when calculating the expected future short-term rate. For instance, in the case above, we would first obtain an estimate of $\pi^{2,1}$, called $\hat{\pi}^{2,1}$, and then identify the expectation of the future rate as $E_t[r^1_{t+1}] = 2r_t^2 - r_t - 2\hat{\pi}^{2,1}$. In general, Eq. (5) can be generalized easily to a recursive set of the so-called Fisher-Hicks formulae:

$$E_t[r^1_{t+n-1}] = n r_t^n - (n-1) r_t^{n-1} - n\pi^{n,1} + (n-1)\pi^{(n-1),1}, \tag{6}$$

for all $n \geq 2$, where $\pi^{1,1} = 0$. However, in order to obtain the prediction $E_t[r_{t+n-1}]$, an identifying assumption is required for estimating the term premia. Note that the mean forecast error for $E_t[r_{t+n-1}]$ is

$$\frac{1}{T}\sum_{t=1}^{T}\left\{r_{t+n-1} - \left[n r_t^n - (n-1) r_t^{n-1}\right]\right\} = \frac{1}{T}\sum_{t=1}^{T}\left\{r_{t+n-1} - E_t[r^1_{t+n-1}] - n\pi^{n,1} + (n-1)\pi^{(n-1),1}\right\}. \tag{7}$$

If it is assumed that expectations are on average unbiased over the sample period, i.e., that $T^{-1}\sum_{t=1}^{T}\{r_{t+n-1} - E_t[r^1_{t+n-1}]\} = 0$, the constant risk premium $\hat{\pi}^{n,1}$ can be estimated recursively as5:

$$\hat{\pi}^{n,1} = -\frac{1}{Tn}\sum_{t=1}^{T}\left\{r_{t+n-1} - \left[n r_t^n - (n-1) r_t^{n-1}\right]\right\} + \frac{n-1}{n}\hat{\pi}^{(n-1),1}. \tag{8}$$

5 The starting condition is given by: $\hat{\pi}^{2,1} = -\frac{1}{2T}\sum_{t=1}^{T}\left\{r_{t+1} - \left[2r_t^2 - r_t\right]\right\}$.

Given estimates of the risk premia, the expected future one-month rates $E_t[r_{t+n-1}]$ are

$$\hat{E}_t[r_{t+n-1}] = n r_t^n - (n-1) r_t^{n-1} - n\hat{\pi}^{n,1} + (n-1)\hat{\pi}^{(n-1),1}. \tag{9}$$

We call these constant-risk-premium/EH-restricted forecasts. Eq. (1) assumes that the risk premia are constant. The empirical failure of the EH is often attributed to time-variation in risk premia. The effect of time variation in the risk premia on the forecasts can be investigated by modifying our identification procedure to assume that expectations are unbiased over any time horizon $Q < T$. As a consequence, we also compute forecasts of the future short rate using alternative values of $Q$; the smaller the value of $Q$, the more time variation in the risk premia.

2.2. Diebold and Li's model

As was noted in the introduction, in addition to EH-restricted predictions of future rates, we also consider several alternative term structure models for forecasting short rates. Our first framework is the affine model by Diebold and Li (2006), who use a modified version of the Nelson and Siegel (1987) three-factor forward rate model to approximate the yield curve:

$$r_t^n = \xi_{1t} + \xi_{2t}\left[\frac{1 - \exp(-\theta_t n)}{\theta_t n}\right] + \xi_{3t}\left[\frac{1 - \exp(-\theta_t n)}{\theta_t n} - \exp(-\theta_t n)\right]. \tag{10}$$

The parameter $\theta_t$ governs the exponential decay rate. Small values produce slow decay and a better fit at longer maturities, while large values tend to provide a better fit at short maturities. $\theta_t$ also governs where the loading on $\xi_{3t}$ achieves the maximum. Because the loading on $\xi_{1t}$ is one, meaning that its effect does not decay with the horizon n, Diebold and Li interpret it as the long-term factor that corresponds to the level of the term structure. Because the factor loading on $\xi_{2t}$ decays monotonically from one to zero as $n \to \infty$, $\xi_{2t}$ is viewed as a short-term factor, corresponding to the slope of the yield curve. In contrast, the factor loading on $\xi_{3t}$ rises from zero and then decays back to zero as $n \to \infty$. Hence, Diebold and Li suggest that this factor be interpreted as the curvature of the yield curve.

Rather than estimating Eq. (10) by nonlinear least squares, Diebold and Li (henceforth DL) fix the value of $\theta$. They argue that this simplifies the estimation greatly, and is also likely to yield more trustworthy estimates. DL set $\bar{\theta} = 0.0609$, so that the value for which the loading on the curvature factor reaches its maximum corresponds to a 30-month maturity. Eq. (10) and DL's two-step procedure are used to generate forecasts of rates at all maturities along the yield curve by first computing h-period-ahead predictions of the factors, i.e., $\hat{\xi}_{1,t+h}$, $\hat{\xi}_{2,t+h}$, and $\hat{\xi}_{3,t+h}$. This is done by estimating the factors in Eq. (10) from a cross-section of maturities for each of the first t monthly observations. Out-of-sample predictions of the factors are obtained from simple AR(1) processes

$$\hat{\xi}_{i,t} = c + d\,\hat{\xi}_{i,t-1} + \upsilon_{it}, \qquad i = 1, 2, 3, \tag{11}$$

and by updating the estimates of c and d recursively.
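The two-step procedure can be sketched in a few functions. This is an illustrative reading of the steps described above, not DL's or the authors' implementation: the helper names, the synthetic noise-free yield panel, and the choice of maturities are assumptions. Step 1 regresses each month's yields on the fixed-θ loadings of Eq. (10); step 2 fits an AR(1) as in Eq. (11) to each estimated factor, iterates it h steps ahead, and maps the forecast factors back to a yield through the same loadings.

```python
import math

THETA = 0.0609  # decay parameter held fixed, as in DL


def loadings(n, theta=THETA):
    """Level, slope and curvature loadings of Eq. (10) at maturity n (months)."""
    slope = (1.0 - math.exp(-theta * n)) / (theta * n)
    return [1.0, slope, slope - math.exp(-theta * n)]


def curve(factors, maturities):
    """Yields implied by a factor triple at the given maturities."""
    return [sum(w * f for w, f in zip(loadings(n), factors)) for n in maturities]


def solve3(a, b):
    """Solve the 3x3 linear system a x = b by Gauss-Jordan elimination."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for c in range(3):
        p = max(range(c, 3), key=lambda r: abs(m[r][c]))  # partial pivoting
        m[c], m[p] = m[p], m[c]
        for r in range(3):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [u - f * v for u, v in zip(m[r], m[c])]
    return [m[i][3] / m[i][i] for i in range(3)]


def fit_factors(maturities, yields):
    """Step 1: cross-sectional OLS of one month's yields on the loadings."""
    X = [loadings(n) for n in maturities]
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(3)] for i in range(3)]
    xty = [sum(row[i] * y for row, y in zip(X, yields)) for i in range(3)]
    return solve3(xtx, xty)


def ar1_forecast(series, h):
    """Step 2: fit x_t = c + d x_{t-1} by OLS and iterate h steps ahead."""
    x, y = series[:-1], series[1:]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    d = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    c = my - d * mx
    value = series[-1]
    for _ in range(h):
        value = c + d * value
    return value


def forecast_yield(maturities, yield_panel, n, h):
    """h-step forecast of the n-month yield from a panel of monthly curves."""
    factors = [fit_factors(maturities, row) for row in yield_panel]
    future = [ar1_forecast([f[i] for f in factors], h) for i in range(3)]
    return sum(w * x for w, x in zip(loadings(n), future))


def ar1_path(start, c, d, steps):
    """Deterministic AR(1) path, used only to build the synthetic panel."""
    path = [start]
    for _ in range(steps):
        path.append(c + d * path[-1])
    return path


# Synthetic, noise-free panel (assumed values): 24 months of yields whose
# factors follow exact AR(1) paths, so the forecast can be checked.
maturities = [3, 12, 24, 60, 120]
paths = [ar1_path(5.0, 0.6, 0.9, 26),
         ar1_path(-3.0, -0.2, 0.9, 26),
         ar1_path(0.0, 0.1, 0.9, 26)]
panel = [curve([p[t] for p in paths], maturities) for t in range(24)]
print(round(forecast_yield(maturities, panel, n=12, h=3), 4))
```

In a recursive exercise, `forecast_yield` would be called again at every forecast origin with the panel extended by one month, which re-estimates c and d each time.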

h-period-ahead forecasts of period n rates are obtained as:

$$\hat{r}^n_{t+h} = \hat{\xi}_{1,t+h} + \hat{\xi}_{2,t+h}\left[\frac{1 - \exp(-0.0609n)}{0.0609n}\right] + \hat{\xi}_{3,t+h}\left[\frac{1 - \exp(-0.0609n)}{0.0609n} - \exp(-0.0609n)\right]. \tag{12}$$

Diebold and Li (2006) report an improvement over random walk forecasts at long forecast horizons. In addition, Carriero et al. (2006) also report some (limited) outperformance of the random walk by Diebold and Li's model for short horizons, even though they provide no formal statistical analysis of the improvement. This mounting evidence of the accuracy of Diebold and Li's framework makes it an important alternative in our recursive forecasting exercise. Also, there is a growing awareness in the literature of the fact that $\hat{\theta}_t$ may vary over time. As a consequence, Section 4.3 reports a few robustness checks that allow $\hat{\theta}_t$ to vary.6

6 Diebold et al. (2006) integrated DL's original two-step approach into a single dynamic factor model by specifying the Nelson-Siegel weights as an unobserved vector autoregressive process, and estimated $\theta$ to be 0.077, which implies that the loading on the curvature factor is maximized at a maturity of 23.3 months. However, Yu and Zivot (2011) argue that the process of the loadings is not very sensitive to different values of $\theta$, and that fixing it may maximize the forecasting power. Moreover, Yu and Zivot (2011) examined the OOS performances of one-step state space models vs. two-step dynamic DL models and found, surprisingly, that the state space approach does not improve OOS predictions for Treasury yields.

2.3. Diebold, Rudebusch, and Aruoba's macro-enhanced model

Diebold, Rudebusch, and Aruoba (henceforth, DRA) extended DL's framework to include observable macro variables in a rather simple way, consistent with a well-developed body of literature that has attempted to map macroeconomic variables into the term structure of the riskless rates (see also Ang & Piazzesi, 2003; Favero, Niu, & Sala, 2012; Wu, 2006). They assume that (for $n = 1, 2, \ldots, N$)

$$\begin{aligned}
r_t^n &= \lambda_n' f_t + \epsilon_t^n \\
\lambda_n &\equiv \left[1 \quad \frac{1 - e^{-\theta n}}{\theta n} \quad \left(\frac{1 - e^{-\theta n}}{\theta n} - e^{-\theta n}\right) \quad 0 \quad 0 \quad 0\right]' \\
f_t &\equiv \left[\xi_{1t} \quad \xi_{2t} \quad \xi_{3t} \quad CU_t \quad FFR_t \quad INFL_t\right]' \\
(f_t - \mu) &= A(f_{t-1} - \mu) + \upsilon_t \\
\begin{bmatrix} \upsilon_t \\ \epsilon_t^1 \\ \vdots \\ \epsilon_t^N \end{bmatrix} &\sim \text{IID } N\left(0, \begin{bmatrix} \underset{(6,6)}{W} & \underset{(6,N)}{O} \\ \underset{(N,6)}{O} & \underset{(N,N)}{H} \end{bmatrix}\right),
\end{aligned} \tag{13}$$

where $CU_t$ is the monthly manufacturing capacity utilization, $FFR_t$ is the Federal funds rate (monthly average), and $INFL_t$ is the 12-month percentage change in the price deflator for personal consumption expenditures. In practice, the forecasting model for the cross-section of N interest rates is identical to Eq. (10) because the last three elements of each of the N row vectors $\lambda_n'$ are zero. However, the VAR(1) model $(f_t - \mu) = A(f_{t-1} - \mu) + \upsilon_t$ provides the past values of the macroeconomic variables, which can help forecast the three traditional factors ($\xi_{1t}$, $\xi_{2t}$, and $\xi_{3t}$).

Although DRA experiment with full-information ML methods based on the Kalman filter, this paper opts for a simpler, two-step implementation in which non-linear least squares is used to estimate the factors and the parameters. In fact, since De Pooter (2007) has shown that the implied loss in predictive accuracy tends to be modest on US data, as well as for the sake of continuity with the approach followed in Section 2.2, this paper simply sets $\bar{\theta} = 0.0609$ and estimates $\xi_{1t}$, $\xi_{2t}$, and $\xi_{3t}$ using OLS first on the cross-section of rates and then on the VAR(1) so as to obtain recursive predictions of these very factors for plugging into the first equation in Eq. (13).

2.4. Affine and essentially affine models

Duffee (2002) shows that some specific members of the class of “essentially affine” models can also beat random walk forecasts according to a mean square forecast error criterion, where the improvement generally increases with the length of the forecast horizon. Even though Duffee (2002) does not test whether the differences in forecasts are statistically significant, this is an important finding because it suggests that no-arbitrage asset pricing models of the yield curve may be able to pin down the dynamics of risk premia sufficiently to allow them to produce accurate predictions. An Appendix reviews the structure and properties of affine dynamic term structure models.

Given an $M \times 1$ vector $x_t$ that comprises all of the state variables (risk factors), an affine process for the yield curve is one under which the conditional mean and variance of bond yields are linear affine functions of $x_t$ and for which the short-term rate also follows the affine process $r(t) = \delta_0 + \delta' x_t$. Then, an affine term structure model is just a special diffusion Markov process7

$$dx_t = \underset{M \times M}{K}\left(\underset{M \times 1}{\phi} - \underset{M \times 1}{x_t}\right)dt + \underset{M \times M}{S_t^{1/2}}\,\underset{M \times 1}{dW_t}, \qquad \left[S_t^{1/2}\right]_{ii} = \sqrt{\alpha_i + \xi_i' x_t}, \quad i = 1, \ldots, M, \tag{14}$$

where $W_t$ is an $M \times 1$ vector of independent Brownian motions and $[S_t^{1/2}]_{ii}$ is the ith element on the main diagonal of $S_t^{1/2}$, the Cholesky decomposition of the covariance matrix of shocks. The state variables are mean reverting as long as the elements of $K$ are positive.

7 Eq. (14) represents the stochastic process for the state vector under the risk-neutral measure, i.e., without any correction for the price of risk factors. Implicitly, we assume all of the necessary restrictions to ensure that the linear affine dynamics is well defined, which requires that $\alpha_i + \xi_i' x_t$ is nonnegative for all i and for all possible values of $x_t$, see e.g. Dai and Singleton (2002).

We price bonds in this framework by assuming that the pricing kernel $M$ has the structure

$$dM_t = -r_t M_t\,dt - M_t \Lambda(x_t)'\,dW_t, \tag{15}$$

where $\Lambda_t \equiv \Lambda(x_t)$ is the $M \times 1$ vector of prices of the risk associated with each of the M risk factors. By Ito's lemma, $d\ln M_t = (-r_t - \tfrac{1}{2}\Lambda_t'\Lambda_t)\,dt - \Lambda_t'\,dW_t$, and it is easy to prove that $dW_t^Q = dW_t^P + \Lambda_t\,dt$, so that the physical representation of the process for the state vector is:

$$dx_t = K(\phi - x_t)\,dt - S_t^{1/2}\Lambda(x_t)\,dt + S_t^{1/2}\,dW_t^P, \qquad \left[S_t^{1/2}\right]_{ii} = \sqrt{\alpha_i + \xi_i' x_t}, \quad i = 1, \ldots, M. \tag{16}$$

Eq. (16) allows us to compute the moments implied by any parameter configuration, including the vector of risk premia $\Lambda(x_t)$, and plays a key role in a quasi-maximum likelihood estimation (QMLE) approach that relies on a multivariate normal set-up, in spite of the discretization that is applied to the model when estimation is performed. The derivation of the bond prices is then straightforward (see the Appendix and Duffee, 2002).

Within the general affine class, two important cases are obtained depending on whether $\Lambda(x_t)$ is parameterized as either

$$\Lambda_n(x_t) = \lambda_n(\alpha_n + B_n' x_t)^{1/2} \quad (n = 1, \ldots, M) \quad \text{or} \tag{17}$$

$$\Lambda_n(x_t) = \lambda_{1n}(\alpha_n + B_n' x_t)^{1/2} + \lambda_{2n}'\left[\begin{cases} \dfrac{1}{(\alpha_n + B_n' x_t)^{1/2}} & \text{if } \inf(\alpha_i + B_i' x_t) > 0 \\ 0 & \text{otherwise} \end{cases}\right] x_t, \tag{18}$$

where $B_n'$ is the nth row of a matrix $B$. The completely affine case occurs when $\alpha_n = 0$ ($n = 1, \ldots, M$) in Eq. (17), but $B_n$ is an estimable vector. The models by Vasicek (1977), Cox, Ingersoll, and Ross (1985, henceforth CIR), and Duffie and Kan (1996) are all completely affine models, with the Vasicek and CIR models being restricted versions of the completely affine family.8 The essentially affine case of Duffee (2002) consists of Eq. (18). An important limitation of completely affine specifications of $\Lambda_t$ is that the temporal variation in the instantaneous expected excess returns on n-period zero coupon bonds is determined entirely by the volatilities of the state variables. Moreover, the sign of each $\Lambda_n(x_t)$ is fixed over time and determined by the sign of the coefficients in $B_n$. The essentially affine setup in Eq. (18) allows for variation in the prices of the risk independent of the volatilities, which is the kind of flexibility that is needed to fit the empirical behavior of excess bond returns.

8 Vasicek's model is obtained when $B_n = 0$ ($n = 1, \ldots, M$), so that $\Lambda(x_t)$ becomes a vector of constant prices of risk. Likewise, the CIR framework is obtained when $\alpha_n = 0$ and $B_n = \iota_n$ ($n = 1, \ldots, M$), so that the prices of risk are time-varying and simply identical to the risk factors.

In terms of estimation, we adopt Duffee's QMLE, which is therefore based on only two moments. Assume that the yields on M bonds are measured without error at each month-end t, $t = 1, \ldots, T$. The yields on $N - M$ other bonds are assumed to be measured with serially uncorrelated, mean-zero measurement errors. As is common in the literature, we impose structure on the joint distribution of measurement errors and yields in order to derive the likelihood function of the data: the measurement errors collected in the $(N - M) \times 1$ vector $\epsilon_t$ are jointly normally distributed with a constant covariance matrix and density $\phi_\epsilon(\epsilon_t)$. At this point, we stack the yields observed without errors in the vector $Y_t$ and the yields observed with error in the vector $\check{Y}_t$. Denote the parameter vector by $\theta$. Because the distribution of $Y_{t+1}$ conditional on $Y_t$ is

$$f_Y(Y_{t+1} \mid Y_t) = \frac{1}{\left|\det(\ddot{B}_{ni}')\right|} f_X(\hat{x}_{t+1} \mid \hat{x}_t), \tag{19}$$

where $\ddot{B}_{ni}$ is a matrix defined in the Appendix with a structure that depends on the complete or essentially affine nature of the model, the log-likelihood of observation t for $\check{Y}_t$ is $\ell_t(\theta) = \ln f_Y(Y_t \mid Y_{t-1}) + \ln \phi_\epsilon(\epsilon_t)$. The estimated parameter vector $\hat{\theta}_T^{QMLE}$ is chosen to solve

$$\max_\theta \sum_{t=1}^{T} \ell_t(\theta) = \max_\theta \sum_{t=1}^{T}\left[\ln\left(\frac{1}{\left|\det(\ddot{B}_{ni}')\right|} f_X(\hat{x}_{t+1} \mid \hat{x}_t)\right) + \ln \phi_\epsilon(\epsilon_t)\right], \tag{20}$$

where $f_X(\hat{x}_{t+1} \mid \hat{x}_t)$ follows a multivariate Gaussian distribution for which it is tedious but possible to derive closed-form representations of the first and second conditional moments.9

9 Estimation has been performed by updating the Fortran code kindly made available by Greg Duffee. Additional details are reported in a technical Appendix.

2.5. Other benchmarks

We also implement two models that are used frequently by practitioners. The simplest benchmark is a random walk, where the month t yield on an n-maturity bond is used as the forecast of the month $t + h$ yield on an n-maturity bond. We also consider what Duffee (2002) calls “a more sophisticated benchmark,” where the forecast of the future short-term rate is based on the slope of the yield curve. We implement such a dependence of future short rates on the slope of the term structure through the standard regression

$$r^m_{t+h} - r^m_t = \varsigma_{0,h} + \varsigma_{1,h}(r_t^{5Y} - r_t^{3m}) + \zeta^m_{t+h}, \tag{21}$$

where $r_t^{5Y}$ is the 5-year yield and $r_t^{3m}$ is the 3-month rate. The parameters of Eq. (21) are estimated recursively with monthly updating in order to produce forecasts of the short-term rate.

3. Data and estimation results

The data used in this analysis are end-of-period monthly observations on continuously-compounded yields on U.S. riskless pure discount bonds. The raw data are from Bloomberg. The riskless pure discount bond yields are obtained using FORTRAN code provided by Robert Bliss and Dan Waggoner based on Bliss (1997) and Waggoner (1997). The yields are calculated for bonds with maturities
of 1, 2, 3, 6, 9, 12, 15, 18, 24, 30, 36, 48, 60, 72, 84, 96, 108, and 120 months for the period January 1970 to August 2008. Such data are then extended from September 2008 to December 2016 using end-of-month, constant-maturity rates from the H15 statistical table provided by the Federal Reserve as raw data for obtaining riskless yields with reference to the maturities of 1, 2, 3, 6, 12, 24, 36, 48, 60, 72, 84, 96, 108, and 120 months. Table A1 in the online appendix reports summary statistics for implicit zero coupon yields and the original pre-crisis sample. As would be expected, on average the term structure of US riskless rates has maintained a moderately positive slope, with average nominal yields ranging from 5.7% at the short end to 7.3% at the long end. This finding also holds with reference to median rates. All of the yield series are clearly non-Gaussian and, even after first-differencing, appear to contain strong patterns of heteroskedasticity (the squared changes in the interest rates are serially correlated) and considerable serial correlation, especially at the shortest end of the yield curve.10

The forecasts from the DL model are obtained by estimating the three factors using all of the available rates for each month over the in-sample estimation period (see below for details), after which h-period-ahead forecasts of each of the three factors are obtained from Eq. (11) using these estimates. These forecasts are then used to obtain rate predictions using Eq. (12): forecasts of the 1-month T-bill rate for horizons of 1 and 2 months, and of the 3-month bill rate at horizons of 3, 6, 9, 12, and 15 months. Although they are not shown here, the estimated, recursive factors correspond very closely to the estimates of the level, slope, and curvature factors that are obtained from the first three principal components of the yield data, and are similar to the results reported by DL (2006). Over the comparable period, the same can be said with respect to the estimated and predicted interest rate factors of DRA (2006), who specified observable macroeconomic variables for forecasting these very factors.

Similarly to Litterman and Scheinkman (1991) and Duffee (2002), and to encourage comparability across different frameworks, all of the affine models also assume three underlying factors (M = 3). We estimate four different three-factor models: a completely affine, mean-reverting purely Gaussian model (L = 0 < M = 3); a completely affine model with L = 2 and M = 3; an essentially affine model that is designed to capture the volatility dynamics with a high accuracy (L = 1 < M = 3); and an essentially affine Gaussian model that trades off the ability to fit volatility dynamics with the ability to induce a rich time variation in bond yields (L = 0 < M = 3).11 We assume that the bonds with no measurement errors are those with maturities of three months, two years, and five years. The remaining maturities fill the gaps in the term structure and are assumed to be measured with error. For each of the models investigated, we also entertain a more parsimonious, scaled-down specification based on the following algorithm:

• compute the (Wald) t-statistics for the unrestricted parameter estimates;
• set to zero all parameters for which the (robust) p-value exceeds 0.10; and
• re-estimate the model under these restrictions.

Finally, we perform a recursive pseudo out-of-sample exercise with a block structure, in the sense that parameter estimates are updated every two years, i.e., starting with 1970:01–1982:12, followed by 1970:01–1984:12, etc., up to 1970:01–2006:12. An appendix reports full-sample estimates for all of the completely and essentially affine models listed above, along with a few related comments. Our findings confirm Duffee's (2002) conclusions that models that are better able to produce time-varying volatilities lead to higher maximized log-likelihood (QML) values than models with time-invariant yield volatilities: QML values increase monotonically as the number of factors that affect the volatilities increases from zero to three. Based on the in-sample maximized log-likelihood values, the additional flexibility offered by essentially affine models over completely affine models turns out to be important.

4. Forecasting performance

We investigate the recursive (pseudo) out-of-sample forecasting performances of the models presented in Section 2. For each model, we predict the 1-month rate for the 1- and 2-month horizons, and the 3-month rate at the 3-, 6-, 9-, 12-, and 15-month horizons, similar to what is typical in the literature (see e.g. Exterkate, van Dijk, Heij, & Groenen, 2013). The DL, DRA, affine, and OLS slope-based forecasts are initialized using data for the period January 1970–December 1982. When h > 1, the pseudo out-of-sample evaluation period is 1983:04–2008:08 − h months, where the first date is selected for reasons of symmetry that are related to a few additional analyses that are performed in Section 6.1.12

4.1. EH-restricted forecasts

We first generate EH-restricted forecasts under the assumption that the term premium is constant over the entire sample period. The estimates of the constant premia are π̂ 2,1 = 0.147, π̂ 3,1 = 0.281, π̂ 6,3 = 0.241, π̂ 9,3 = 0.352, π̂ 12,3 = 0.470, and π̂ 15,3 = 0.603. These estimates

10 Unreported sample autocorrelations indicate that the yields could be integrated of order one. If such is the case, the underlying process is non-stationary, and it would be necessary to take the first difference of the yields to obtain valid sample inferences. However, in line with economic theory (yields must have a finite, non-negative expected value), yields cannot be integrated. Moreover, our interest lies in forecasting. Hence, following the approach taken in almost all of the literature, yields are modelled in levels throughout this paper.
11 The completely affine model with L = 2 and M = 3 is selected over the case of L = 3 because this model fails to be rejected (using a standard overidentifying test) when compared to the corresponding essentially affine model. The L = 0 < M = 3 essentially affine model is selected because of its good forecasting performance, as documented by Duffee (2002).
12 An earlier draft that entertained a slightly longer OOS period, 1982:01–2008:08 − h months, obtained qualitatively similar, if anything stronger, results.
are by no means excessive and increase at a decreasing rate as the term to maturity lengthens, as one might expect. An analysis of the forecast errors in Table 1 under the assumption that the premia are constant reveals that they are very similar to those obtained from the random walk model. However, forecast errors are occasionally large in absolute value and, not surprisingly, the forecast errors are largest in the early 1980s. Moreover, their absolute size and variability tend to increase monotonically as the forecast horizon lengthens. Of course, apart from a constant re-scaling, these forecasts are typical of those in the (implied) forward rate literature, where the risk premium is set to zero. Their similarity to the errors from the random walk model suggests that predicting future short rates in the US Treasury market may require considerably more effort than simply calculating implied forward rates.

4.2. Time-varying risk premia

We investigate the effect of time variation in risk premia on the forecast errors by computing EH-restricted risk premia under the assumption that the forecast errors average to zero over a rolling window of Q observations as in Eq. (8). Several alternative values of Q were considered. While the time variation in the estimated risk premia was sensitive to the choice of Q, the estimated forecast errors were not. As a consequence, the results are presented for Q equal to 60 months. Interestingly, the estimated premia decline below their full-sample average during the period of the so-called "great moderation". A comparison of the forecast errors under the random walk vs. the time-varying term premium EH-restricted case shows that the differences in the forecast errors are small at the 3-month horizon, but larger at the 15-month horizon. However, as we discuss below, there is relatively little difference in average forecast performances. Hence, allowing for a considerable variation in risk premia appears to have relatively little effect on the predictive power of imposing the functional form of EH.

Table 1 presents summary statistics for monthly forecast errors, $e^{n,k}_{t,h} \equiv r^{n}_{t+h} - \hat{r}^{n,k}_{t,t+h}$, where n is the maturity, k denotes a model, and h is the horizon, for all horizons and models. The table presents standard summary measures of the forecasting accuracy, i.e., the mean squared forecast error (MSFE), for the random walk benchmark only (panel A), its square root (which can be compared directly to the scale of the predicted series), and the ratio of the MSFE of each model to that of the random walk, which we define as the relative MSFE (RMSFE). Thus, a RMSFE value of less than (greater than) one indicates that a given model performs better (worse) than a random walk. To facilitate comparisons, values less than one are shown in italics. In the table, the best performing models that achieve the smallest MSFEs are also shown in bold. We have also computed and tabulated mean absolute forecast error (MAFE) measures, and these led to qualitatively similar results (which are available upon request).

Panels B and C report the performances of EH-restricted predictions with a constant premium and a time-varying premium relative to the random walk. Unsurprisingly, the implied forward-type forecasts have practically zero means, especially at short horizons. The medians are also small and positive, especially at short horizons, but occasionally turn negative at longer horizons in the case of constant risk premia. Allowing for time-varying premia shrinks the mean and median errors further at horizons in excess of three months. In panel A, the summary statistics of errors from the random walk show that the mean forecast errors are slightly negative at nearly all horizons, indicating a tendency of the random walk model to under-predict the corresponding short-term rates. Moreover, the under-prediction increases monotonically as the investment horizon lengthens. The similarity in the summary statistics suggests a high degree of correspondence between EH-restricted and random walk forecasts. A comparison of Panels A and B shows that the EH-restricted and random walk forecasts are similar. At short horizons, the EH-restricted model is favored slightly; however, at long horizons (when h exceeds 9 months) the results favor the random walk over the EH-restricted model. The performance difference is greatest at the 12-month horizon, at nearly 7%.

The EH-restricted forecasts improve at horizons of one to two months when the risk premium varies over time. Indeed, this specification outperforms both the random walk and constant-risk-premium models for the 1-month rate series at both the 1- and 2-month horizons. Moreover, as was the case with the constant-risk-premium specification, the forecast performance deteriorates markedly as the forecast horizon lengthens. Indeed, the model performs considerably worse than the constant-risk-premium model for horizons of six months or longer. For horizons beyond six months, the constant-risk-premium specification has a clear performance advantage that is driven by the lower variance of the resulting forecast errors. An additional analysis with smaller values of Q (not reported here) confirms this characterization: the out-of-sample performance improves for small values of h but gets progressively worse relative to the random walk benchmark as h increases towards 15. Hence, the net advantage of trying to capture slow movements in risk premia in this fashion is unclear. In any event, the results suggest that the 3-month rate is dominated by news that market participants are unable to forecast, especially for horizons in excess of three months.

4.3. Diebold and Li's forecasts

An analysis (unreported) of the forecast errors from the DL and random walk models shows that the errors computed from the latter track the DL errors closely at most horizons. Although the differences increase as the horizon lengthens, they appear to remain relatively modest even at the 15-month horizon. This impression is confirmed by the statistics on the performance of the DL models that are presented in panels D to G of Table 1. Panel D presents the results for the DL model with $\bar{\theta} = 0.0609$. This specification performs worse than both the random walk benchmark and the best EH-restricted models at all horizons. Indeed, the relative forecast performance of the DL model ranges from 1.04 at the 1-month horizon to 1.19
Table 1
Summary statistics for monthly forecast errors.
Statistics h = 1 h = 2 h = 3 h = 6 h = 9 h = 12 h = 15
(columns h = 1 and h = 2 refer to the 1-month rate; columns h = 3 to h = 15 refer to the 3-month rate)
Panel A - Random Walk
Mean −0.023 −0.043 −0.065 −0.131 −0.192 −0.277 −0.358
Median 0.004 0.000 0.002 −0.058 −0.146 −0.245 −0.295
Max. 1.337 1.509 1.201 1.747 2.476 3.305 3.467
Min −1.830 −2.593 −2.462 −2.957 −3.616 −4.380 −4.588
S.D. 0.373 0.525 0.580 0.928 1.252 1.553 1.805

MSFE 0.140 0.277 0.341 0.878 1.605 2.487 3.386


RMSFE 0.374 0.526 0.584 0.937 1.267 1.577 1.840
Panel B - EH-Restricted Model (constant risk premium)
Mean 0.028 0.084 0.070 −0.018 −0.265 −0.427 −0.441
Median 0.086 0.190 0.206 0.159 −0.060 −0.264 −0.200
Max. 0.922 1.069 1.440 1.531 1.911 2.657 2.676
Min −1.716 −2.586 −2.669 −4.287 −4.114 −5.559 −5.248
S.D. 0.362 0.504 0.558 0.923 1.254 1.622 1.819

MSFE 0.132 0.261 0.316 0.852 1.642 2.813 3.502


RMSFE 0.971 0.971 0.963 0.985 1.011 1.064 1.017
Panel C - EH-Rest. Model (time-varying risk premium, Q = 60)
Mean 0.044 0.100 0.053 0.061 −0.037 −0.078 −0.108
Median 0.083 0.158 0.117 0.205 0.181 0.221 0.236
Max. 1.066 1.301 1.494 2.252 2.748 4.357 4.638
Min −1.660 −2.370 −2.565 −4.419 −5.914 −7.464 −7.926
S.D. 0.343 0.486 0.589 1.006 1.520 2.117 2.449

MSFE 0.120 0.246 0.350 1.016 2.311 4.488 6.009


RMSFE 0.926 0.942 1.013 1.076 1.200 1.343 1.332
Panel D - Diebold and Li’s (fixed, non-estimated θ ) AR(1)
Mean −0.188 −0.288 −0.293 −0.567 −0.804 −1.039 −1.257
Median −0.166 −0.222 −0.216 −0.475 −0.614 −0.692 −0.955
Max. 0.559 1.110 1.217 1.532 1.715 2.023 2.054
Min −1.968 −2.887 −2.777 −3.078 −4.126 −4.971 −5.817
S.D. 0.339 0.509 0.642 1.016 1.318 1.578 1.786

MSFE 0.150 0.342 0.498 1.353 2.384 3.570 4.769


RMSFE 1.037 1.111 1.208 1.242 1.219 1.198 1.187
Panel E - Diebold and Li’s (fixed, non-estimated θ ) VAR(1)
Mean −0.133 −0.184 −0.141 −0.308 −0.462 −0.637 −0.806
Median −0.069 −0.103 −0.083 −0.281 −0.425 −0.599 −0.745
Max. 0.543 1.010 1.213 1.678 2.144 2.603 2.906
Min −1.865 −2.688 −2.475 −2.855 −3.475 −4.911 −5.565
S.D. 0.321 0.460 0.560 0.886 1.187 1.459 1.684

MSFE 0.121 0.246 0.333 0.879 1.622 2.535 3.485


RMSFE 0.929 0.941 0.989 1.001 1.005 1.010 1.015
Panel F - Diebold and Li’s (θ recursively estimated on OOS data)
Mean −0.139 −0.169 0.030 −0.194 −0.420 −0.651 −0.869
Median −0.125 −0.140 0.125 −0.102 −0.319 −0.532 −0.752
Max. 1.050 1.170 1.352 1.580 2.104 2.480 2.142
Min −1.959 −2.742 −2.379 −2.696 −3.624 −4.843 −5.271
S.D. 0.357 0.487 0.569 0.888 1.183 1.456 1.672

MSFE 0.147 0.266 0.324 0.826 1.576 2.545 3.551


RMSFE 1.025 0.980 0.975 0.970 0.991 1.011 1.024

Panel G - Diebold and Li’s (single-step, time-varying θ )


Mean −0.194 −0.285 −0.270 −0.532 −0.760 −0.988 −1.201
Median −0.178 −0.219 −0.187 −0.425 −0.560 −0.723 −0.892
Max. 0.737 1.257 1.251 1.562 1.748 2.083 2.134
Min −1.943 −2.877 −2.885 −3.384 −4.271 −5.271 −6.383
S.D. 0.364 0.539 0.649 1.029 1.333 1.605 1.830

MSFE 0.170 0.372 0.495 1.342 2.355 3.552 4.790
RMSFE 1.104 1.159 1.204 1.237 1.211 1.195 1.189
Panel H - Yield-macro model (fixed, non-estimated θ ) VAR(1)
Mean −0.080 −0.103 −0.030 −0.116 −0.201 −0.345 −0.512
Median −0.046 −0.039 0.025 −0.056 −0.202 −0.295 −0.521
Max. 1.950 1.696 1.667 2.544 4.348 5.070 6.003
Min −1.885 −2.734 −2.515 −2.634 −3.692 −4.207 −4.608
S.D. 0.365 0.479 0.561 0.907 1.260 1.546 1.780

MSFE 0.140 0.240 0.316 0.835 1.629 2.510 3.431


RMSFE 1.001 0.930 0.963 0.976 1.007 1.005 1.007
Panel I - Slope-Based Benchmark Errors
Mean −0.043 −0.148 −0.067 −0.064 −0.060 −0.056 −0.060
Median −0.021 −0.117 −0.084 −0.088 −0.073 −0.072 −0.068
Max. 1.504 1.206 1.613 1.622 1.612 1.616 1.616
Min −2.546 −1.950 −1.462 −1.461 −1.444 −1.422 −1.455
S.D. 0.521 0.374 0.405 0.406 0.411 0.412 0.409

MSFE 0.273 0.162 0.168 0.169 0.172 0.173 0.171


RMSFE 1.399 0.764 0.702 0.439 0.327 0.264 0.225
Panel L - Unrestricted Completely Affine Gaussian Model A_0(3)
Mean −0.019 −0.051 0.024 0.061 0.101 0.125 0.154
Median 0.015 0.029 0.081 0.054 0.144 0.188 0.216
Max. 0.701 1.316 1.454 2.260 2.988 3.840 3.964
Min −1.681 −2.381 −2.033 −2.792 −3.272 −3.767 −3.765
S.D. 0.315 0.569 0.580 0.915 1.201 1.422 1.585

MSFE 0.099 0.326 0.337 0.841 1.452 2.037 2.535


RMSFE 0.844 1.085 0.995 0.979 0.951 0.905 0.865
Panel M - Restricted Completely Affine Model A_2(3)
Mean −0.013 −0.053 0.072 0.258 0.519 0.786 1.056
Median 0.025 0.030 0.140 0.252 0.580 0.848 1.129
Max. 0.696 1.333 1.430 2.409 3.371 4.500 4.867
Min −1.676 −2.401 −2.116 −2.605 −2.870 −3.066 −2.831
S.D. 0.315 0.567 0.568 0.892 1.175 1.396 1.556

MSFE 0.099 0.325 0.328 0.863 1.650 2.567 3.534


RMSFE 0.843 1.082 0.980 0.991 1.014 1.016 1.022
Panel N - Unrestricted Essentially Affine Gaussian Model A_0(3)
Mean −0.016 −0.030 0.030 0.037 0.042 0.039 0.006
Median 0.021 0.032 0.085 0.043 0.075 0.053 0.030
Max. 0.664 1.628 1.153 1.492 1.768 2.046 2.025
Min −1.460 −2.082 −1.704 −2.073 −2.119 −2.150 −2.140
S.D. 0.294 0.546 0.473 0.650 0.770 0.830 0.853

MSFE 0.087 0.299 0.224 0.424 0.594 0.690 0.728


RMSFE 0.789 1.038 0.811 0.695 0.608 0.527 0.464
Panel O - Restricted Essentially Affine Model A_1(3)
Mean −0.010 −0.048 −0.214 −0.382 −0.445 −0.459 −0.436
Median 0.021 0.036 −0.163 −0.316 −0.333 −0.323 −0.302
Max. 0.692 1.355 1.272 1.806 2.320 2.835 2.911
Min −1.580 −2.372 −2.496 −3.752 −4.297 −4.439 −4.355
S.D. 0.307 0.565 0.595 0.938 1.161 1.278 1.334

MSFE 0.094 0.322 0.400 1.026 1.546 1.845 1.971


RMSFE 0.821 1.078 1.084 1.081 0.981 0.861 0.763
The table shows summary statistics for monthly forecast errors for a range of models and horizons. T-bills at the 1- and 3-month maturities are considered. For the sake of comparability, these statistics are calculated using forecast errors over the common out-of-sample period, 1983:04–2008:07. Panels B and C report the performance of the EH-implied forecasts with constant and time-varying risk premia (computed using a rolling window of Q = 60 months), respectively. MSFE is the mean squared forecast error. In Panel A, RMSFE is the root mean squared forecast error of the random walk; in all other panels, RMSFE is reported relative to the random walk model, and values less than one are shown in italics. The best performing models with the smallest MSFEs are shown in bold.
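
The accuracy measures in Table 1 can be computed from a series of forecast errors in a few lines. The sketch below is illustrative only: the function names and the error values are hypothetical, and it assumes that the relative measure is a ratio of root MSFEs, which is what the tabulated figures appear to imply (for example, Panel B at h = 1 gives sqrt(0.132/0.140) ≈ 0.971).

```python
import math


def msfe(errors):
    """Mean squared forecast error of a list of errors (actual minus forecast)."""
    return sum(e * e for e in errors) / len(errors)


def relative_rmsfe(model_errors, benchmark_errors):
    """Root MSFE of a model divided by the root MSFE of a benchmark.

    Assumption: the relative measure is a ratio of root MSFEs, so a value
    below one means that the model beats the benchmark.
    """
    return math.sqrt(msfe(model_errors) / msfe(benchmark_errors))


# Hypothetical forecast errors, in percentage points, for illustration only.
random_walk_errors = [0.30, -0.20, 0.10, -0.40]
model_errors = [0.15, -0.10, 0.05, -0.20]

rw_msfe = msfe(random_walk_errors)
rw_rmsfe = math.sqrt(rw_msfe)
ratio = relative_rmsfe(model_errors, random_walk_errors)
print(f"RW MSFE={rw_msfe:.4f}, RW root MSFE={rw_rmsfe:.4f}, relative RMSFE={ratio:.4f}")
```

Because the hypothetical model errors are exactly half the benchmark errors, the relative measure is one half, which would be read as the model outperforming the random walk.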
at the 15-month horizon, with a peak (1.24) at the 6-month horizon.13

We investigated whether the relatively poor performance of the DL model was due to $\theta$ being held constant over the forecast period by considering two alternative specifications. First, we allowed $\theta$ to change empirically over time, similarly to Exterkate et al. (2013), by estimating $\theta$ recursively using nonlinear least squares over our (pseudo) out-of-sample period. The estimates of $\theta$ were larger than $\bar{\theta} = 0.0609$ on average, were subject to wild gyrations, and occasionally took values that could not be meaningful, resulting in poor forecasts. As a consequence, we simply repeated the same exercise as in panel D, but using a larger value of $\theta$, 0.1125.14 Our second approach was to treat $\theta_t$ as a time-varying parameter. Specifically, we follow the state space approach of Koopman et al. (2010) and Koopman and van der Wel (2012) in which $\theta_t$ becomes a latent fourth factor to be estimated jointly through an application of the Kalman filter to a linearized version of an expression that extends Eq. (10) to include a fourth factor, together with $\xi_{1,t}$, $\xi_{2,t}$, and $\xi_{3,t}$.

The results based on $\bar{\theta} = 0.1125$ are reported in Panel E. Because the DL forecasts in panel D were computed under the assumption of simple AR(1) processes for all of the factors whereas the affine models implemented below correspond to various different types of homoskedastic and heteroskedastic restricted latent VAR(1) models, Panel E implements a richer, VAR(1) version of DL's framework, in which lags of $\xi_{1,t}$, $\xi_{2,t}$, and $\xi_{3,t}$ all possibly forecast $\xi_{1,t+1}$, $\xi_{2,t+1}$, and $\xi_{3,t+1}$, etc. There is a visible improvement in the forecasting accuracy of this specification relative to the previous ones at all horizons, but the performance is still not strong enough to be superior to the random walk. The results of allowing for time variation in $\theta$, shown in Panel F, indicate that, while the variability in $\theta_t$ is modest (which is consistent with the post-1982 findings of Koopman et al., 2010), the performance of this specification is considerably better than those of either of the previous DL specifications, especially at intermediate forecast horizons. Nevertheless, in overall terms, the DL model performs similarly to the random walk benchmark at most horizons but gets worse for 12- and 15-month predictions.

Panel H of Table 1 shows the predictive results obtained from the DRA's macro-augmented framework. The payoff from including macroeconomic information in DL's framework is considerable relative to the standard case, but fails to lead to sufficient improvements to also outperform the random walk, and hence establish the existence of predictability in Treasury rates. Interestingly, given an approximately identical variance of the prediction errors, the improvement over DL implementations that exclude macroeconomic information comes almost entirely from lower means and medians of the forecast errors.

4.4. Slope-based naive predictions

Panel I of Table 1 presents the forecasting performance of the naive slope-based model, a popular benchmark with fixed income practitioners. Consistent with Duffee's results, the naive benchmark tends to produce inaccurate forecasts; however, there are some exceptions. The model outperforms the random walk benchmark at horizons of 12 and 15 months, and also outperforms all three DL implementations at the 12- and 15-month horizons. The better performance of the slope-based model relative to standard, fixed $\theta$, DL models is surprising, in view of the fact that the latter framework also contains information about the shape of the yield curve and may cast doubt on the net predictive contribution of the factors that help in forecasting the level or convexity of the term structure in Eq. (12). However, the fact that none of these models (including simple term spread regressions) perform markedly better than a random walk also suggests that information about the yield curve may not be particularly useful for predicting the future short-term rate.

4.5. Completely affine models

Panels L and M of Table 1 report on the performances of completely affine models, where risk premia are simply linear functions of the variances of the risk factors. To save space, the results are reported only for the two best-performing cases of restricted and unrestricted affine models: the unrestricted completely affine, purely Gaussian model with L = 0 and the restricted (more parsimonious) completely affine model with L = 2.15 Again in this case, plots of the forecast errors (omitted to save space) display only modest differences in the errors either between the completely affine models or from the benchmark model at the 3-month horizon. The forecast errors of the unrestricted and random walk models remain relatively similar at the 15-month horizon, while those from the restricted model tend to be larger when the forecast errors are positive and smaller when the errors are negative. The overall impression that the random walk is not dominated by either specification is reflected in the forecast metrics presented in Table 1. There are now eight horizon/rate combinations (out of 14 possible) for which the completely affine models outperform the random walk; however, the margin of superiority in performance is generally modest. Interestingly, the unrestricted affine framework tends to perform better than the restricted one. The performances of the completely affine models are generally superior to those of the DL models, especially at horizons of six months or longer, and are superior to those of both of the EH-restricted models at horizons of nine months or longer.

4.6. Essentially affine models

The forecasting results for two essentially affine models are presented in panels N and O of Table 1. A visual analysis of the forecast errors shows that the two essentially

13 This is not completely surprising, given that Diebold and Li (2006) found that their model did not produce significantly better one-month-ahead forecasts than a random walk model using a much shorter sample period. However, their model produced better 1-year-ahead forecasts than ten competing models in terms of RMSFEs.
14 This value is also the approximate average of Koopman, Mallee, and Van der Wel's (2010) time-varying estimates of $\theta$ obtained through a Kalman filter for the period 1982–2002.
15 Complete results for both restricted and unrestricted models are available from the authors upon request.
affine models have nearly identical forecast errors until the mid-to-late 1990s, when they begin to diverge, with the restricted model's performance deteriorating relative to those of both the unrestricted and benchmark models. Moreover, both models tend to perform well relative to the benchmark for much of the out-of-sample period. In Table 1, the unrestricted essentially affine model forecasts are superior to those of the restricted essentially affine model. Indeed, there is no instance in which the restricted model's performance is superior according to MSFE. This means that imposing parametric restrictions based on in-sample t-statistics hurts the model's forecasting performance. It is also interesting that the unrestricted essentially affine model does not fit particularly well in-sample because the structure proposed for the dynamics in the second moments remains rather rudimentary. Nevertheless, as Duffee (2002) noted, this model is capable of fitting many different types of shapes in the term structure. This greater flexibility appears to be rewarded by competitive out-of-sample results. Unlike other models that utilize information about the shape of the term structure, this model performs relatively well compared to the random walk. Indeed, it outperforms the benchmark for the 3-month rate at horizons of six months or longer and for the 1-month rate for the 1-month horizon. The performance advantage is occasionally large, as essentially affine models also outperform the time-varying risk-premium EH-restricted model at several horizons.

5. Tests of out-of-sample equal predictive accuracy

The results presented in Table 1 suggest that it is very difficult to improve on the predictions from the random walk. This section tests whether any of the differences in performance that were commented on above are statistically significant. As was noted earlier, finding that none of the models provide statistically significant improvements relative to the random walk model limits the practical usefulness of the EH significantly. There are two problems associated with testing for differences in predictive accuracy. The first is sampling variation: in the presence of rather small differences between the predictive accuracies of the random walk and the other model, it is possible that our already occasional and generally weak findings in favor of either EH-restricted or affine-type, no-arbitrage frameworks may be due to pure chance. We therefore investigate the statistical significance of the differences between competing models using the Diebold–Mariano–West (DMW; see Diebold & Mariano, 1995; West, 1996) test. The second difficulty stems from the fact that, with the exception of the random walk and DL under a fixed $\theta$, the models all contain parameters that had to be estimated. A sensible procedure is to use a test that takes into account parameter uncertainty. Consequently, we also test for equal forecast accuracy using the West–McCracken (McCracken, 2004; West, 1996) nonparametric test for non-nested models, which incorporates any incremental sample variation in forecast errors due to parameter uncertainty.16

The DMW test for a pair of models indexed as $k_1$ and $k_2$ is based on the statistic

$$
DMW_{k_1,k_2} \equiv \frac{\bar{d}}{\sqrt{\widehat{\mathrm{Var}}(\bar{d})}}, \tag{22}
$$

where $\bar{d}$ is an average over P observations of the values taken by some differential in loss functions, $d_t \equiv \ell(e^{n,k_1}_{t,h}) - \ell(e^{n,k_2}_{t,h})$, where $\ell(\cdot)$ is a generic loss function and $\widehat{\mathrm{Var}}(\bar{d})$ is the sample variance of $\bar{d}$. The DMW statistic has an asymptotic standard normal distribution under the null hypothesis that $E[d] = 0$, which corresponds to a null of no differential predictive accuracy. Following standard practice, the variance of $\bar{d}$ is estimated using a heteroskedastic autocorrelation-consistent estimator,

$$
\widehat{\mathrm{Var}}(\bar{d}) = P^{-1}\left[\hat{\varphi}_0 + 2P^{-1}\sum_{j=1}^{K-1}(P - K)\,\hat{\varphi}_j\right], \tag{23}
$$

where $\hat{\varphi}_j \equiv (P - j)^{-1}\sum_{t=j+1}^{P}(d_t - \bar{d})(d_{t-j} - \bar{d})$. Based on the findings of Harvey, Leybourne, and Newbold (1997), the modified DMW test

$$
MDMW_{k_1,k_2} \equiv \left[\frac{P + 1 - 2K + P^{-1}K(K - 1)}{P}\right]^{-1/2} DMW_{k_1,k_2} \tag{24}
$$

is reported in what follows. The MDMW statistic corrects for size distortions that may be associated with the DMW statistic.17

West (1996) has shown that, when loss functions depend on estimated parameters, Eq. (23) generally provides a valid estimate of the asymptotic variance of d only in special circumstances, such as when the models are estimated consistently by OLS and the loss function is a squared function (i.e., under MSFE). In general, however, the structure of Var(d) is

$$
\mathrm{Var}(d) = \mathrm{Var}(\hat{d}) + 2\lambda_{dm}\,\bigl(F U\,\mathrm{Cov}'(d, m)\bigr) + \lambda_{mm}\,F U\,\mathrm{Var}(m)\,U' F', \tag{25}
$$

where, in the case of a recursive forecasting exercise, $\lambda_{dm} = 1 - (R/P)\ln(1 + P/R)$, $\lambda_{mm} = 2[1 - (R/P)\ln(1 + P/R)]$, P = 292 − h (the number of recursive pseudo out-of-sample forecasts), and R = 159 (the training sample used for estimation in this paper). F and U are matrices that depend on the data used in estimation, as well as on derivatives of the loss functions with respect to unknown parameters to be computed in correspondence to the true but unknown population parameters (see McCracken, 2004). Finally, m denotes the time series of the scores generated by each model, when estimation occurs by QML. McCracken (2004)

16 We remind ourselves here that the repeated use of DMW and West–McCracken tests for comparing pairs of models/forecast error loss series exposes us to the typical size distortions of multiple tests. However, the model confidence set tests in Section 5.3 control for such issues, at least in part. We thank an associate editor of this journal for emphasizing this limitation of the results in Sections 5.1 and 5.2.
17 Harvey et al. (1997) also recommend using the critical values from the Student's t-distribution rather than those from the normal distribution. However, the sample sizes used in our paper are large enough to make this further adjustment irrelevant.

Table 2
Tests of equal predictive accuracy: squared forecast error loss.

The table presents test statistics for two types of equal predictive accuracy tests. The numbers above the main diagonal report the standard modified
Diebold-Mariano-West test statistics, with the corresponding significance levels in parentheses. The numbers below the diagonal report the West-
McCracken test statistics, again with the corresponding significance levels in parentheses. The tests are performed for the 1-, 6-, and 15-month horizons
(the first exercise refers to 1-month T -bill rates, the latter two exercises to 3-month T -bill rates). All instances where the null hypothesis of equal predictive
accuracy is rejected with a p-value below 0.05 are shown in bold. In the table, a negative (positive) value of the test statistic implies that the model in the
row produces more (less) accurate predictions than the model in the column.
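The two ingredients behind the statistics in Table 2 can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' code: the function names are hypothetical, the long-run variance uses a simple rectangular kernel (an assumption, since the paper's exact estimator is not reproduced here), and the small-sample factor is the standard Harvey et al. (1997) adjustment. The sign convention follows the table note: a negative statistic means that the row model has the lower average loss.

```python
import math


def loss_differential(errors_row, errors_col, loss=lambda e: e * e):
    """d_t = loss(row-model error) - loss(column-model error); squared loss by default."""
    return [loss(a) - loss(b) for a, b in zip(errors_row, errors_col)]


def long_run_variance(d, h):
    """Rectangular-kernel long-run variance with autocovariances up to lag h - 1.

    Assumption: this truncation is a common choice for h-step forecasts; it is
    not guaranteed to be positive in small samples.
    """
    n = len(d)
    mean = sum(d) / n

    def autocov(lag):
        return sum((d[t] - mean) * (d[t - lag] - mean) for t in range(lag, n)) / n

    return autocov(0) + 2.0 * sum(autocov(lag) for lag in range(1, h))


def mdmw_statistic(d, h):
    """DMW statistic multiplied by the Harvey et al. (1997) small-sample factor."""
    n = len(d)
    dmw = (sum(d) / n) / math.sqrt(long_run_variance(d, h) / n)
    correction = math.sqrt((n + 1 - 2 * h + h * (h - 1) / n) / n)
    return correction * dmw


def west_lambdas(R, P):
    """lambda_dm and lambda_mm for a recursive scheme, as given below Eq. (25)."""
    lam_dm = 1.0 - (R / P) * math.log(1.0 + P / R)
    return lam_dm, 2.0 * lam_dm
```

For example, `west_lambdas(159, 292 - 1)` evaluates the two weights at the 1-month horizon using the values of R and P stated in the text; the matrices F and U in Eq. (25) are not covered by this sketch.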

proposes that F be estimated without deriving the functional form for the derivatives of the loss function or making strong assumptions about the joint distribution of the observables. The idea is that unknown derivatives can be approximated numerically by using the finite difference method.

As with Table 1, Table 2 presents test results for a squared error loss. We have also computed and tabulated results under an absolute forecast error loss, and obtained qualitatively identical insights that are not reported here to save space. In Table 2, the numbers above the diagonal report the standard MDMW test statistics, with the corresponding significance level in parentheses. The numbers below the diagonal report the West-McCracken test statistics, again with the corresponding significance level in parentheses. The tests are reported for the 1-, 6-, and 15-month horizons (the first exercise refers to the 1-month T-bill rate, the latter two exercises to the 3-month T-bill rate). Due to space limitations, we limit the exercises in this section to nine models: the random walk, the two EH-restricted forecast models, the best performing DL models (including DRA's, which includes macroeconomic information), and three representative affine models, including the best performing essentially affine Gaussian framework. To ease the visualization of the table, all instances in which the null hypothesis of equal predictive accuracy is rejected with a p-value below 0.05 are shown in bold. For each statistic reported, a negative value of the statistic implies that the model in the row produces more accurate predictions than the model in the column. For example, the value of the MDMW test for the EH time-varying-risk-premium row and the random-walk column of Table 2, Panel A, is −1.855, indicating that the random walk model produced a less accurate forecast. Correspondingly, the value of the West-McCracken test statistic in the random-walk row and the EH time-varying risk-premium cell/intersection in Panel A is 1.896, indicating that the EH time-varying premium model produced the superior forecast. However,

neither test statistic is significant at a conventional 5% size level. Hence, both tests indicate that the null of equal forecasting power cannot be rejected at the 1-month horizon using a squared error loss.

While the table reports the test statistics for all pairwise model comparisons, our discussion focuses mainly on comparisons of the random walk benchmark with the other models. The results in Table 2 indicate that, while none of the models produce significantly better forecasts than a random walk at all horizons (1, 6 and 15 months; i.e., across the board) under the mean squared error metric, the completely and essentially affine models do forecast more accurately than the random walk at the 1-month horizon, and also to some extent at the 15-month horizon. Although the results are not systematic, there are also occasions in which the DL model challenges the random walk's performance. Besides this evidence, there are indications, which are especially strong under the MDMW test, that all structural affine models tend to perform well relative to simpler benchmarks, including EH-restricted models. Not surprisingly, the corresponding West-McCracken test statistics are uniformly smaller.

Apart from the longest horizon forecasts, the EH-restricted predictions that allowed for time variation in the premia were generally not statistically significantly different from those based on a constant term premium, regardless of the metric or test used. This result suggests that the short-term predictions derived from models that impose the functional relationship of the EH are dominated by the response of rates to new information (i.e., "news"), which is unpredictable. From this perspective, the models' difficulty in generating predictions that dominate those from a random walk when we move past the one-month horizon suggests that the empirical failure of the EH may stem from the fact that short-term rates are largely unpredictable beyond their current level.

5.1. SDF-based loss functions

Even though a fraction of our models are based on specific assumptions regarding the pricing kernel M and the functional form linking the time-varying risk premia to the state of the economy in Eqs. (17)–(18), thus far our predictive accuracy tests have ignored our knowledge of the kernel acquired in estimation. However, a fair testing of the performances of the models in Section 2.3 would require either that the loss functions be adjusted to reflect covariance terms with the SDF (intuitively, an investor cares whether or not a model produces large forecast errors precisely in those states of the world in which his/her consumption is already low), or that tests of equal predictive accuracy be based on M directly (see Timmermann & Granger, 2004).

We address these issues in a number of ways. First, we assume the validity of the SDF implied by the best-fitting of the affine models, as measured by its implied Hannan-Quinn information criterion that trades off the in-sample fit, captured by the maximized log-likelihood, and parsimony, indicative of potential out-of-sample accuracy. The model that minimizes the (full-sample) Hannan-Quinn criterion is the unrestricted completely affine (stochastic volatility) A2(3) framework.18 In practice, because the implied SDF $\hat{M}_{t+1}/\hat{M}_t$ under the estimated A2(3) model follows the process

\[
\frac{\hat{M}_{t+1}}{\hat{M}_t} = \exp\Bigl( -\hat{r}_t - \frac{1}{2}\bigl[\hat{\lambda}(\hat{\alpha} + \hat{B}'x_t)^{1/2}\bigr]'\bigl[\hat{\lambda}(\hat{\alpha} + \hat{B}'x_t)^{1/2}\bigr] - \bigl[\hat{\lambda}(\hat{\alpha} + \hat{B}'x_t)^{1/2}\bigr]'\hat{\epsilon}_{t+1} \Bigr), \tag{26}
\]

we estimate the SDF time series $\{\hat{M}_{t+1}/\hat{M}_t\}_{t=1983:03}^{2008:07}$ in the following way. First, we initialize $\hat{M}_{1971:12}$ as of December 1971, to imply an average riskless rate that is sensible for our pre-estimation sample, $\bar{M}_{1971:12} = \exp\bigl(-\frac{1}{135}\sum_{\tau=1972:01}^{1983:03} r_\tau\bigr)$, then we compute $\hat{M}_{1983:03}$ by cumulating the transformed residuals $\{\hat{\epsilon}_t\}$ from the dynamic process of the latent factors (as inferred from market bond prices) according to Eq. (27) and using the law of motion in Eq. (26).19 Given $\hat{M}_{1983:03}$, the SDF is updated using

\[
\begin{aligned}
\frac{\hat{M}_{t+1}}{\hat{M}_t} &= \exp\Bigl( -\hat{r}_{t+1} - \frac{1}{2}\bigl[\hat{\lambda}_t(\hat{\alpha}_t + \hat{B}_t'x_t)^{1/2}\bigr]'\bigl[\hat{\lambda}_t(\hat{\alpha}_t + \hat{B}_t'x_t)^{1/2}\bigr] - \bigl[\hat{\lambda}_t(\hat{\alpha}_t + \hat{B}_t'x_t)^{1/2}\bigr]'\hat{\epsilon}_{t+1} \Bigr) \\
\hat{\epsilon}_{t+1} &= \bigl(\hat{S}(x_t)\bigr)^{-1/2}\left\{ x_{t+1} - \begin{bmatrix} \hat{K}^{VV} - I_2 & 0 \\ \hat{K}^{DV} & \hat{\kappa}^{DD} - 1 \end{bmatrix} \times \left( \begin{bmatrix} \hat{\phi}^{V} \\ 0 \end{bmatrix} - x_t \right) \right\},
\end{aligned} \tag{27}
\]

with $[\hat{S}_t]^{1/2}_{ii} = x^{1/2}_{i,t}$ for $i = 1, 2$, $[\hat{S}_t]^{1/2}_{33} = (1 + \hat{\xi}'_{3,t}\ddot{x}_t)^{1/2}$ (where $\ddot{x}_t$ collects the first two components of the state vector), and the estimated coefficients are indexed by t = 1983:03, 1983:04, . . . , 2008:08 to emphasize their dependence on the expanding estimation sample.

18 Interestingly, such a model is not one of the best forecasting models in Table 1. On the one hand, this is not a problem as we are only selecting time-varying estimates of the pricing kernel as a way to re-scale loss functions, i.e., we do not directly care about the performance of alternative models at predicting the pricing kernel. On the other hand, the fact that the performances of the models in Table 1 are assessed under a pricing kernel that has not been directly inferred (in-sample) from any of the models in the same table represents an additional source of robustness.
19 Because no recursive estimates are available for the initial 11-year sample January 1972–March 1983, the calculation of $\hat{M}_{1983:03}$ is based on parameter estimates that correspond to those obtained using the 1972:01–1983:03 sample.

At this point, the estimated SDF time series $\{\hat{M}_{t+1}/\hat{M}_t\}_{t=1983:03}^{2008:07}$ is used to implement equal predictive accuracy tests in a manner similar to that of van Dijk and Franses (2003), who developed a weighted test of equal prediction accuracy by modifying the DMW baseline test. We adapt the DF test in order to weight observations on the basis of the price of the underlying economic state. Because $\hat{M}_{t+1}/\hat{M}_t$ will be large (small) during bad (good) states, when wealth is low (high), any differences in loss functions will be over- (under-)weighted in bad (good) states, to

capture the fact that wealth-relevant forecast errors will be more (less) painful during bad (good) states. Define the weight series $\hat{\omega}_t$ as

\[
\hat{\omega}_t = \frac{\hat{M}_t}{\hat{M}_{t-1}} \cdot \frac{1}{\frac{1}{T}\sum_{t=1}^{T} \frac{\hat{M}_t}{\hat{M}_{t-1}}},
\]

where the scaling ensures that the weights average to one, $T^{-1}\sum_{t=1}^{T}\hat{\omega}_t = 1$; in the absence of arbitrage, $\hat{M}_{t+1}/\hat{M}_t > 0\ \forall t$, so the weights are positive. The DF statistic (referred to as the weighted DMW statistic, WDMW) is then given by the weighted average loss differential of two competing models, say k and l, divided by its standard deviation,

\[
\mathrm{WDMW}_h^{k,l} = \frac{\frac{1}{305-h}\sum_{t=1983:04}^{2008:08-h} \hat{\omega}_t \times \mathrm{diff}_{t,h}^{k,l}}{\hat{\sigma}\bigl(\hat{\omega}_t \times \mathrm{diff}_{t,h}^{k,l}\bigr)}, \tag{28}
\]

where $\mathrm{diff}_{t,h}^{k,l} \equiv \ell(e^k_{t,t+h}) - \ell(e^l_{t,t+h})$ and $\hat{\sigma}^2\bigl(\hat{\omega}_t \times \mathrm{diff}_{t,h}^{k,l}\bigr) = \sum_{i=-h}^{h} \widehat{\mathrm{Cov}}\bigl(\hat{\omega}_t \cdot \mathrm{diff}_{t,h}^{k,l},\ \hat{\omega}_{t+i} \cdot \mathrm{diff}_{t+i,h}^{k,l}\bigr)$. Similarly to DMW, the WDMW statistic has an asymptotic standard normal distribution under the usual assumptions. For consistency, we compute the WDMW statistic with Harvey et al.'s (1997) correction. The results in Table 3 are based on a square loss function selection, $\ell(e^k_{t,t+h}) = (e^k_{t,t+h})^2$.20

However, the approach that originated the test in Eq. (28) is spurious, because, while the SDF is used to compute the time series of weights $\hat{\omega}_t$, the loss function that measures the cost to an economic agent of an incorrect prediction, $\ell(e_{t,t+h})$, is still one of the standard loss functions. A second approach performs equal predictive accuracy tests without the intermediate step of assuming some loss function $\ell(\cdot)$, instead using the weighting provided by the SDF directly as a measure of the loss that an agent suffers because of her forecast errors. However, this approach is only possible after the series of forecast errors produced by any possible pair of models (say, k and l) has been converted into a series of portfolio returns. Of course, in practice, the way in which a trader may use two alternative prediction frameworks to trade Treasury bills and notes remains rather subjective. Because our paper employs only data implied by spot bond prices, we convert the time series of forecast errors into profits and losses using a plain vanilla zero net-investment strategy that, given some model k,21

• goes long in h-month bills/notes and rolls over a short position in m-month bills (h/m) − 1 times between times t and t + h − m, when the long rate on the h-month bill/note exceeds the predicted cost of rolling over the short positions, where the predictions are obtained from model k;
• shorts h-month bills/notes and rolls over a long position in m-month bills (h/m) − 1 times between time t and t + h − m, when the long rate on the h-month bill/note is inferior to the predicted total return of rolling over the long positions in bills, where the predictions are obtained from model k; and
• does nothing when the long rate on the h-month bill/note is identical to the predicted cost of rolling over the short positions, where the predictions are obtained from model k.

20 The results under an absolute value loss function were qualitatively similar and are available from the authors upon request.
21 Note that h and m are selected in such a way that h/m ≥ 2 and an integer. Our trading strategy mimics that of Xiang and Zhu (2013), who, for a problem with a 10-year horizon, assumed a simple asset menu consisting of two assets: a long-term h = 10-year bond and an m-period short-term bill/note rolled over to rebalance the portfolio according to alternative forecasts.

At each point of the out-of-sample period, we record the percentage profits and losses derived from such strategies applied to two different models, k and l, $\hat{x}_t^{m,h,k}$ and $\hat{x}_t^{m,h,l}$. In what follows, we have implemented three strategies, as defined by the choices of m and h: h = 2 and m = 1; h = 6 and m = 3; and h = 18 and m = 3. The last strategy implies rolling over 3-month T-bills (h/m) − 1 = 5 times, and, as such, involves all forecasts computed under both models k and l, at horizons of 3, 6, 12, and 15 months. Although these strategies are rather simple, their shortcomings are less pronounced when they are used, not to assess the economic value generated by any specific model k, but to compare the predictive accuracies of pairs of models (say, k ≠ l). This is because, if $\hat{M}_{t+h}/\hat{M}_t$ represents the SDF that prices all payoffs between times t and t + h, then, since the trading strategies yielding $\hat{x}_t^{m,h,k}$ and $\hat{x}_t^{m,h,l}$ are zero net investment strategies, in the absence of arbitrage we should have

\[
E\left[\frac{\hat{M}_{t+h}}{\hat{M}_t}\bigl(\hat{x}_t^{m,h,k} - \hat{x}_t^{m,h,l}\bigr)\right] = 0, \tag{29}
\]

which implies that the SDF-weighted expected differential between the returns from two alternative forecasts is zero. At this point, a finding that $E[(\hat{M}_{t+h}/\hat{M}_t)(\hat{x}_t^{m,h,k} - \hat{x}_t^{m,h,l})] > 0$ (< 0) reveals that strategy k (l) outperforms strategy l (k), revealing abnormal profits over l (k). In practice, the condition in Eq. (29) is similar to a standard equal predictive accuracy condition (see Cenesizoglu & Timmermann, 2012). If one sets

\[
\mathrm{diff}_{t,h}^{m,k,l} = \frac{\hat{M}_{t+h}}{\hat{M}_t}\bigl(\hat{x}_t^{m,h,k} - \hat{x}_t^{m,h,l}\bigr),
\]

then the usual DMW technology for testing the null of no differential profitability (economic value) can be adapted to this case. We implement an SDF-rooted equal predictive accuracy approach using the A2(3)-implied estimated SDF time series $\{\hat{M}_{t+h}/\hat{M}_t\}_{t=1983:03}^{2008:07}$ that was described above.

Table 3 shows the two sets of results that were obtained from the A2(3)-implied SDF. In the table, above the main diagonal, a negative (positive) value of the test statistic implies that the model in the row produces more (less) accurate predictions than the model in the column. Below the main diagonal, a positive value of the test statistic implies that the model in the column produces higher trading profits than the model in the row. At a 1-month horizon, we fail to find any qualitative changes relative to Table 2, which was also based on un-weighted squared loss functions. As before, the EH-restricted model often performs worse for short-horizon predictions than the no-arbitrage completely and essentially affine models. In fact,

Table 3
SDF-based tests of equal predictive accuracy/profitability: A2 (3)-implied SDF.

The table presents test statistics for two types of equal predictive accuracy/profitability tests. The numbers above the main diagonal report the modified van
Dijk-Franses-Diebold-Mariano-West test statistics when the weights are computed from a standardized stochastic discount factor estimated recursively
from the completely affine A2(3) model, with the corresponding significance level in parentheses. The numbers below the diagonal report t-statistics
concerning the mean of the risk-adjusted abnormal profitability differences across pairs of models, when the adjustment is performed using the stochastic
discount factor estimated recursively from the completely affine A2(3) model. The significance levels in parentheses refer to tests of the null hypothesis
of zero risk-adjusted mean profit differences. The tests above the main diagonal are performed for the 1-, 6-, and 15-month horizons (the first exercise
refers to 1-month T-bill rates, the latter two exercises to 3-month T-bill rates). The profit differences reported below the main diagonal concern strategies
concerning 1- and 3-month T-bills. All instances where the null hypothesis of equal predictive accuracy is rejected with a p-value below 0.05 are shown in
bold. Above the main diagonal, a negative (positive) value of the test statistic indicates that the model in the row produces more (less) accurate predictions
than the model in the column. Below the main diagonal, a positive value of the test statistic indicates that the model in the column produces higher trading
profits than the model in the row.
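The SDF-weighted statistics reported above the diagonal can be illustrated with a short Python sketch. It is a hedged illustration rather than the authors' implementation: the function names are hypothetical, the weights are the SDF growth rates divided by their sample mean, and the statistic divides the mean weighted loss differential by the square root of (long-run variance / n), which is the usual DMW scaling and is assumed here. The Harvey et al. (1997) correction mentioned in the text is omitted for brevity.

```python
import math


def sdf_weights(sdf_growth):
    """omega_t = (M_t / M_{t-1}) divided by its sample mean, so the weights average to one."""
    mean_growth = sum(sdf_growth) / len(sdf_growth)
    return [g / mean_growth for g in sdf_growth]


def wdmw_statistic(sdf_growth, loss_row, loss_col, h):
    """Weighted DMW statistic for two loss series (row model minus column model).

    The variance sums autocovariances of the weighted differential for lags
    -h..h, i.e. gamma_0 + 2 * (gamma_1 + ... + gamma_h). This estimate is not
    guaranteed to be positive in small samples.
    """
    weights = sdf_weights(sdf_growth)
    wd = [w * (a - b) for w, a, b in zip(weights, loss_row, loss_col)]
    n = len(wd)
    mean = sum(wd) / n

    def autocov(lag):
        return sum((wd[t] - mean) * (wd[t - lag] - mean) for t in range(lag, n)) / n

    variance = autocov(0) + 2.0 * sum(autocov(i) for i in range(1, h + 1))
    return mean / math.sqrt(variance / n)
```

Because the weights are larger when the SDF is high, a given loss differential counts for more in bad states, which is the intuition given in Section 5.1.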

both of the essentially affine models (A0(3) restricted and A1(3)) significantly outperform all remaining models, including the random walk. However, as has been noted, such evidence weakens considerably at longer horizons. At six months, only the restricted A0(3) framework performs better than the random walk; at 15 months, the evidence is similar, but now A0(3) jumps above the threshold of statistical significance in differential accuracy. At longer horizons, the results are very similar to those in Table 2. The indications that emerge from the lower triangular portions of the various panels in Table 3 point to even less medium- and long-horizon predictability: when the forecasts are converted into trading strategies and their (relative) SDF-adjusted returns are compared, the data have little power to distinguish among alternative models. However, there are indications of abnormal returns favoring the EH-based and affine structural forecasting frameworks over the random walk for the 1-month rate at the 1-month horizon.

5.2. A nonparametric SDF approach

A more robust approach follows the steps in Section 5.1 but replaces the SDF estimated from a completely affine A2(3) model with a nonparametric SDF ($M_t$) estimated following Almeida and Garcia (2012). In a first stage, the SDF is backed out from data on a set of underlying risk factors that, for the sake of internal consistency, are assumed to correspond to the three latent factors

$\{x_t\}_{t=1983:03}^{2008:07}$ that have been implied from the A2(3) affine model. The nonparametric estimates of the SDF are derived directly from the conventional asset pricing Euler condition $E[(M_t/M_{t-1})x_t] = \mathbf{1}_3$, where $x_t$ is by construction the return on a factor-mimicking portfolio that depends on yield data.22 Following Almeida and Garcia, the estimate of the SDF is obtained from

\[
\frac{\tilde{M}_{t+1}}{\tilde{M}_t} = T\,\overline{\left(\frac{M_{t+1}}{M_t}\right)} \times \frac{\Bigl[1 - \gamma\hat{\phi}'\Bigl(x_t - \frac{1}{\overline{(M_{t+1}/M_t)}}\mathbf{1}_3\Bigr)\Bigr]^{1/\gamma}}{\sum_{t=1}^{T}\Bigl[1 - \gamma\hat{\phi}'\Bigl(x_t - \frac{1}{\overline{(M_{t+1}/M_t)}}\mathbf{1}_3\Bigr)\Bigr]^{1/\gamma}}
\]

\[
\hat{\phi} = \sup_{\phi\in\Upsilon}\left\{ \frac{1}{T}\overline{\left(\frac{M_{t+1}}{M_t}\right)}^{\,1+\gamma} - \frac{1}{1+\gamma}\sum_{t=1}^{T}\Bigl[1 - \gamma\hat{\phi}'\Bigl(x_t - \frac{1}{\overline{(M_{t+1}/M_t)}}\mathbf{1}_3\Bigr)\Bigr]^{\frac{1+\gamma}{\gamma}} \right\},
\]

where $\overline{M_t/M_{t-1}}$ is the unconditional mean and γ is such that $1 - \gamma\hat{\phi}'\bigl(x_t - \frac{1}{\overline{(M_{t+1}/M_t)}}\mathbf{1}_3\bigr) > 0\ \forall t \geq 1$ (which ensures positivity of the SDF, and hence, no arbitrage). γ is the hyperparameter of a Cressie–Read discrepancy function, $[((M_{t+1}/M_t)^{1+\gamma} - 1)/((1+\gamma)\gamma)]$, that can be interpreted as the risk aversion coefficient of an investor with HARA preferences. We assume γ = 5, which is typical of the asset pricing literature.23

At this point, the time series of estimated SDF scores $\{\tilde{M}_{t+1}/\tilde{M}_t\}_{t=1981:11}^{2008:07}$ are used to implement equal predictive accuracy tests in a manner similar to van Dijk and Franses (2003) and based on a weight series $\hat{\omega}_t$ according to Almeida and Garcia's nonparametric SDF. We have also implemented the same trading strategies described in Section 5.1 and then tested whether $E[(\tilde{M}_{t+h}/\tilde{M}_t)(\hat{x}_t^{m,h,k} - \hat{x}_t^{m,h,l})] > 0$ (< 0), to assess whether strategy k (l) outperforms strategy l (k) in risk-, SDF-adjusted terms.

Table 4 shows the results obtained from this nonparametric SDF and has the same structure as Table 3. Although the specific values are heterogeneous, the upper triangular portion of Table 4 gives the same qualitative impressions as in Table 3, for all forecast horizons.24 At the 1-month horizon, the complete and essentially affine models significantly outperform all remaining models, including the random walk, for which the null of equal predictive accuracy can be rejected with p-values below 0.01. However, few or no significant differences in predictive accuracy survive at longer horizons. At a 15-month horizon, even the dominance of the essentially affine A0(3) model over the random walk weakens, and it becomes impossible in general to distinguish most models from the case of no predictability. The conclusions that emerge from the lower triangular portions of the various panels of Table 4 are virtually identical to those reported in Table 3: the relative abnormal returns from trading strategies that exploit any differential forecasting power against the no-trade random walk benchmark can hardly distinguish between alternative models in pairs, especially at longer horizons.

All of the results presented thus far overcome the limitations posed by the statistical nature of standard implementations of DMW tests at their roots, but suffer from a dependence on various special choices of either the loss functions or the strategies that convert forecasts into profits and losses. Our last attempt to link our forecasting results to the SDF is less refined, but possibly more robust. For both time series of SDF scores obtained above, we compute sample correlations ($\hat{\rho}_h^k$) between sign-free prediction errors and the SDF. A model k has increasingly "good" SDF-related properties if it implies $\hat{\rho}_h^k \leq 0$, and the smaller the value of $\hat{\rho}_h^k$, the better the model is. This is because $\hat{\rho}_h^k \leq 0$ means that forecast errors are "small" (when their sign is removed) when the SDF is large, i.e., corresponding to bad economic states. We compute eight alternative estimates of $\hat{\rho}_h^k$ for each of the seven models, corresponding to both squared and absolute values of the errors and to both ways of estimating the SDF. For the nonparametric SDF, we use three alternative values of γ, namely 2, 5, and 10.25

Table 5 reveals that most models are characterized by negative, large, and statistically significant correlations between error losses and the estimated SDF. Such correlations are larger and estimated more precisely when γ = 5 or 10. When the SDF is estimated from the completely affine A2(3) model, the correlations are more negative and more often significant. Nevertheless, the tabulated results reveal that it remains problematic to discriminate among alternative forecasts. Even under a squared loss function, there is little or no evidence of no-arbitrage models being in any way more negatively correlated with the SDF than either of the benchmarks. A simple count across alternative forecasting models and across the three forecast horizons shows that the number of times that the correlation is negative and statistically significant is nearly uniform across models: four models, namely the EH-restricted models, the restricted essentially affine A1(3) model, and the random walk, lead to 18 (out of 24) such instances.

22 Under an affine model, bond yields are $y_{t,T} = -(T-t)^{-1}\bigl[\gamma_0(T-t) + \gamma_X'(T-t)x_t\bigr]$, so that when $\gamma_0 = 0$ and $\gamma_X = I_N$, $y_{t,T} = x_t$ and the factors are identical to yields measured with no errors.
23 We have also experimented with γ = 2 and 10, and obtained qualitatively similar results that are available upon request.
24 The differences originate from heterogenous estimated SDF series. For instance, over the 1983–2008 monthly sample, the correlation between the nonparametric SDF estimated for γ = 5 and the A2(3) completely affine SDF is 0.63.
25 Absolute loss function results have been computed systematically too, using the same tests as those underlying Tables 2–4, but are not reported here to save space. However, the results are qualitatively similar to those commented on in the main text.

5.3. Model confidence set tests

This insight into the difficulty of telling alternative models apart takes us to the last set of tests implemented in this paper, namely Hansen et al.'s (2011; henceforth HLN) model confidence set (MCS) tests. The HLN method consists of a sequence of tests that allows an analyst to construct a set of "superior" models for which the null

Table 4
SDF-based tests of equal predictive accuracy/profitability: nonparametric factor-implied SDF.

The table presents test statistics for two types of equal predictive accuracy/profitability tests. The numbers above the main diagonal report the modified van
Dijk-Franses-Diebold-Mariano-West test statistics when the weights are computed from a nonparametric stochastic discount factor estimated recursively
following Almeida and Garcia (2012), with the corresponding significance levels in parentheses. The parameter γ in the Cressie-Read discrepancy function
is set to 5. The numbers below the diagonal report t-statistics concerning the mean of risk-adjusted abnormal profitability differences across pairs of models,
when the adjustment is performed using the stochastic discount factor estimated recursively following Almeida and Garcia (2012). The significance levels
in parentheses refer to tests of the null hypothesis of zero risk-adjusted mean profit differences. The tests above the main diagonal are performed for the 1-,
6-, and 15-month horizons (the first exercise refers to 1-month T-bill rates, the latter two exercises to 3-month T-bill rates). The profit differences reported
below the main diagonal refer to strategies concerning 1- and 3-month T-bills. All instances where the null hypothesis of equal predictive accuracy is
rejected with a p-value below 0.05 are shown in bold. Above the main diagonal, a negative (positive) value of the test statistic implies that the model in the
row produces more (less) accurate predictions than the model in the column. Below the main diagonal, a positive value of the test statistic implies that the
model in the column produces higher trading profits than the model in the row.
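The profit-based statistics below the diagonal rest on the zero net-investment rule of Section 5.1 and on the SDF-weighted differential of Eq. (29). The sketch below illustrates the mechanics under loudly stated assumptions: the rollover cost is approximated by a simple average of the m-month rates (compounding is ignored), the profit is the position times the gap between the long rate and the realized rollover rate, and all function names are hypothetical. The paper does not spell out these accounting details, so this is not the authors' computation.

```python
def position(long_rate, predicted_rollover_cost):
    """+1: long the h-month bill and short rolled m-month bills; -1: the reverse; 0: no trade."""
    if long_rate > predicted_rollover_cost:
        return 1
    if long_rate < predicted_rollover_cost:
        return -1
    return 0


def strategy_profit(long_rate, predicted_short_rates, realized_short_rates):
    """Assumed P&L: position times (long rate minus realized average rollover rate)."""
    predicted_cost = sum(predicted_short_rates) / len(predicted_short_rates)
    realized_cost = sum(realized_short_rates) / len(realized_short_rates)
    return position(long_rate, predicted_cost) * (long_rate - realized_cost)


def sdf_weighted_differential(sdf_growth, profits_k, profits_l):
    """Series (M_{t+h} / M_t) * (x_t^k - x_t^l), whose mean is zero under Eq. (29)."""
    return [m * (xk - xl) for m, xk, xl in zip(sdf_growth, profits_k, profits_l)]


def mean_t_statistic(series):
    """Plain t-statistic for a zero mean (no autocorrelation adjustment)."""
    n = len(series)
    mean = sum(series) / n
    variance = sum((v - mean) ** 2 for v in series) / (n - 1)
    return mean / (variance / n) ** 0.5
```

A positive mean of the weighted differential points to model k delivering higher risk-adjusted profits than model l; with overlapping h-month holding periods, an autocorrelation-robust variance would be the more careful choice.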

hypothesis of equal predictive ability (EPA) is not rejected at a certain confidence level α (set to 5% below). Given that alternative models are built to fit specific properties of the data and/or on the basis of peculiar asset pricing frameworks, we hardly expect a single model to dominate all of its competitors, either because they are statistically equivalent or because there is not enough information in the data to discriminate unequivocally among models (see also Martin, Reidy, & Wright, 2009 and the discussion therein). Hence, obtaining an answer that consists of a set is logically sensible and statistically rigorous. As with our earlier tests, the EPA statistic may be computed for any arbitrary loss functions. The methodology has been developed explicitly for dealing with situations in which standard, pairwise comparison procedures fail to deliver a unique or clear-cut result. Such seems to be the case here.

The MCS procedure starts from an initial set of competing models $L_0$ (call their number $\#L_0$) and results in a (potentially) smaller set of "superior" models at horizon h (henceforth, SSM(h)), denoted $\hat{L}^h_{1-\alpha}$. Of course, the best scenario is when the final set consists of a single model, but empirically this is hardly ever the case. The EPA hypothesis is tested at each step of the iterative procedure; if the null hypothesis is accepted, then the procedure stops and $\hat{L}^h_{1-\alpha}$ is returned; otherwise, EPA is tested again after the worst model has been eliminated. Formally, let $d_{t,h}^{k_1,k_2} \equiv \ell(e_{t,h}^{n,k_1}) - \ell(e_{t,h}^{n,k_2})$ denote the loss differential between models $k_1$,
Table 5
Correlation of losses from forecast errors with alternative estimates of the SDF.

                                                   Squared loss function                          Absolute value loss function
                                                   A2(3)-     Nonparametric SDF from A2(3)        A2(3)-     Nonparametric SDF from A2(3)
                                                   implied    estimated factors                   implied    estimated factors
Model                                    Horizon   SDF        γ = 2      γ = 5      γ = 10        SDF        γ = 2      γ = 5      γ = 10

EH Forecasts (constant term premia)      H = 1     −0.049     −0.048     −0.195     −0.436        −0.160     −0.042     −0.204     −0.547
                                                   (0.385)    (0.394)    (0.001)    (0.000)       (0.004)    (0.451)    (0.000)    (0.000)
                                         H = 6     −0.085     −0.123     −0.260     −0.686        −0.208     −0.068     −0.262     −0.754
                                                   (0.134)    (0.030)    (0.000)    (0.000)       (0.000)    (0.230)    (0.000)    (0.000)
                                         H = 15    −0.239     −0.136     −0.360     −0.567        −0.291     −0.109     −0.281     −0.541
                                                   (0.000)    (0.018)    (0.000)    (0.000)       (0.000)    (0.058)    (0.000)    (0.000)

EH Forecasts (variable term premia,      H = 1     −0.038     −0.068     −0.193     −0.447        −0.122     −0.133     −0.255     −0.610
60-month window)                                   (0.499)    (0.226)    (0.001)    (0.000)       (0.029)    (0.018)    (0.000)    (0.000)
                                         H = 6     −0.079     −0.117     −0.232     −0.685        −0.182     −0.082     −0.207     −0.733
                                                   (0.163)    (0.039)    (0.000)    (0.000)       (0.001)    (0.146)    (0.000)    (0.000)
                                         H = 15    −0.248     −0.048     −0.425     −0.635        −0.380     −0.066     −0.378     −0.614
                                                   (0.000)    (0.405)    (0.000)    (0.000)       (0.000)    (0.253)    (0.000)    (0.000)

Random Walk                              H = 1     −0.063     −0.042     −0.142     −0.471        −0.113      0.000     −0.119     −0.601
                                                   (0.266)    (0.453)    (0.011)    (0.000)       (0.044)    (0.998)    (0.033)    (0.000)
                                         H = 6     −0.090     −0.200     −0.112     −0.656        −0.223     −0.172     −0.136     −0.709
                                                   (0.112)    (0.000)    (0.047)    (0.000)       (0.000)    (0.002)    (0.016)    (0.000)
                                         H = 15    −0.271     −0.104     −0.492     −0.470        −0.378     −0.104     −0.432     −0.514
                                                   (0.000)    (0.071)    (0.000)    (0.000)       (0.000)    (0.069)    (0.000)    (0.000)

Diebold-Li                               H = 1     −0.048     −0.070     −0.138     −0.433        −0.117     −0.089     −0.106     −0.525
                                                   (0.395)    (0.216)    (0.013)    (0.000)       (0.037)    (0.111)    (0.058)    (0.000)
                                         H = 6     −0.077     −0.198     −0.161     −0.682        −0.202     −0.184     −0.190     −0.746
                                                   (0.176)    (0.000)    (0.004)    (0.000)       (0.000)    (0.001)    (0.001)    (0.000)
                                         H = 15    −0.218     −0.248     −0.082     −0.601        −0.282     −0.225     −0.078     −0.603
                                                   (0.000)    (0.000)    (0.151)    (0.000)       (0.000)    (0.000)    (0.175)    (0.000)

Unrestricted Affine A0(3)                H = 1     −0.025     −0.044     −0.149     −0.424        −0.094     −0.045     −0.174     −0.610
                                                   (0.660)    (0.434)    (0.008)    (0.000)       (0.095)    (0.427)    (0.002)    (0.000)
                                         H = 6     −0.092     −0.214     −0.104     −0.670        −0.226     −0.201     −0.126     −0.723
                                                   (0.102)    (0.000)    (0.065)    (0.000)       (0.000)    (0.000)    (0.025)    (0.000)
                                         H = 15    −0.286     −0.034     −0.481     −0.306        −0.396     −0.043     −0.393     −0.374
                                                   (0.000)    (0.553)    (0.000)    (0.000)       (0.000)    (0.454)    (0.000)    (0.000)

Unrestricted Essentially Affine A0(3)    H = 1     −0.026     −0.043     −0.154     −0.425        −0.098     −0.049     −0.184     −0.622
                                                   (0.647)    (0.448)    (0.006)    (0.000)       (0.081)    (0.380)    (0.001)    (0.000)
                                         H = 6     −0.085     −0.206     −0.120     −0.660        −0.207     −0.203     −0.146     −0.746
                                                   (0.135)    (0.000)    (0.034)    (0.000)       (0.000)    (0.000)    (0.010)    (0.000)
                                         H = 15    −0.258     −0.034     −0.452     −0.302        −0.369     −0.085     −0.476     −0.404
                                                   (0.000)    (0.550)    (0.000)    (0.000)       (0.000)    (0.140)    (0.000)    (0.000)

Restricted Essentially Affine A1(3)      H = 1     −0.026     −0.036     −0.157     −0.421        −0.092     −0.043     −0.172     −0.613
                                                   (0.645)    (0.522)    (0.005)    (0.000)       (0.099)    (0.443)    (0.002)    (0.000)
                                         H = 6     −0.187     −0.165     −0.467     −0.473        −0.290     −0.196     −0.630     −0.465
                                                   (0.001)    (0.003)    (0.000)    (0.000)       (0.000)    (0.001)    (0.000)    (0.000)
                                         H = 15    −0.247     −0.040     −0.773     −0.133        −0.307     −0.046     −0.761     −0.245
                                                   (0.000)    (0.488)    (0.000)    (0.015)       (0.000)    (0.427)    (0.000)    (0.015)

γ is the parameter in the Cressie-Read discrepancy function. All instances where the null hypothesis of a zero correlation is rejected with a p-value below 0.05 are shown in bold.

$k_2 \in L_0$ from forecasts at horizon $h$ and

$$
d_{t,h}^{k_1,\cdot} \equiv \frac{1}{\#L_0 - 1} \sum_{k_2 \in L_0} d_{t,h}^{k_1,k_2}, \qquad k_1 \in L_0,\ k_1 \neq k_2,
$$

the simple loss of model $k_1$ relative to any other model $k_2$ at time $t$. The EPA hypothesis for a given set of models $L$ can be formulated in two alternative ways:

$$
\begin{aligned}
H_{0,L} &: E[d_{t,h}^{k_1,k_2}] = 0 \quad \text{for all } k_1, k_2 \in L_0,\ k_1 \neq k_2 \\
H_{a,L} &: E[d_{t,h}^{k_1,k_2}] \neq 0 \quad \text{for some } k_1, k_2 \in L_0,\ k_1 \neq k_2 \\
H_{0,L} &: E[d_{t,h}^{k_1,\cdot}] = 0 \quad \text{for all } k_1, k_2 \in L_0,\ k_1 \neq k_2 \\
H_{a,L} &: E[d_{t,h}^{k_1,\cdot}] \neq 0 \quad \text{for some } k_1, k_2 \in L_0,\ k_1 \neq k_2.
\end{aligned}
\tag{30}
$$

HLN prove that the following two t-statistic-type tests can be computed for testing either of the two hypotheses above:

$$
HLN_{k_1,k_2} \equiv \frac{\bar{d}_h^{k_1,k_2}}{\sqrt{\widehat{Var}(\bar{d}_h^{k_1,k_2})}}
\quad \text{or} \quad
HLN_{k_1,\cdot} \equiv \frac{\bar{d}_h^{k_1,\cdot}}{\sqrt{\widehat{Var}(\bar{d}_h^{k_1,\cdot})}}
\quad \text{for } k_1, k_2 \in L_0,\ k_1 \neq k_2,
\tag{31}
$$

where the two variances in the denominators are bootstrapped estimates of the variances of the sample means in the numerators. In what follows we use a 10,000-trial block bootstrap scheme, where the block length is the largest number of significant parameters obtained by fitting an AR process on all of the available time series of $d_{t,h}^{k_1,k_2}$, as $k_1$, $k_2$, and $h$ vary. The two statistics in Eq. (31) naturally lead to the test statistics:

$$
T_{R,L} \equiv \max_{k_1,k_2 \in L} |HLN_{k_1,k_2}| \quad \text{and} \quad T_{\max,L} \equiv \max_{k_1 \in L} HLN_{k_1,\cdot}
\tag{32}
$$

The asymptotic distributions of the two test statistics under the null hypothesis are non-standard, and they are estimated using a bootstrap. The MCS procedure then consists of a sequential testing procedure that eliminates the worst model at each step, until the EPA hypothesis is no longer rejected for all models belonging to the SSM. The choice of the worst model is made using an elimination rule that is coherent with the statistics defined in Eq. (31), i.e., by looking for the model that maximizes the tests in Eq. (31).

Table 6 presents the key results for the three horizons that have been considered in Tables 2–5, as well as for a 2-month horizon (for reasons that are discussed below), using both the $T_{R,L}$ and $T_{\max,L}$ statistics. Three of the key results reported in Sections 5.1 and 5.2 are confirmed. First, there is a clear-cut difference between short- (1-month) and intermediate- to long-term prediction horizons, with the final SSM(1) being much smaller than SSM(6) and SSM(15). Second, while it is possible to outperform the random walk at a short horizon, such is not the case at the 6- and 15-month horizons, in the sense that the final SSM(h) obtained always includes the random walk. In fact, the random walk actually ranks high within the MCS at the 2- and 15-month horizons. Therefore, there is no significant evidence of models that are able to outperform the no-predictability benchmark, at least in the sense of the hypotheses specified in Eq. (30). Interestingly, it does not require a long horizon for this evidence to emerge: the empirical results for 2-month predictions are qualitatively similar to those obtained for the 6- and 15-month horizons. Third, there is a considerable degree of heterogeneity in the identity of the models included in the final SSM(h). Interestingly, the $T_{R,L}$- and $T_{\max,L}$-based SSMs are also different, with the latter uniformly being smaller than those yielded by the former statistic. However, in all cases and for h = 2, 6, and 15 months, the random walk, or no-predictability benchmark, is included in the 5%-sized MCS.

6. Additional robustness checks

6.1. Relative forecasting accuracy pre- and post-1994

A body of literature has observed that there is evidence that (short-term) interest rates became more predictable after the Fed began announcing its funds rate target in 1994 (e.g., Lange, Sack, & Whitesell, 2003). It is therefore interesting to investigate whether our pseudo out-of-sample forecasting results are consistent with this conjecture. Of course, if the predictability of short-term rates has become stronger in recent times, this would represent a problem for the main conclusion of our paper. Moreover, our results were produced using a recursive scheme where the parameters are estimated initially with just over 12 years of monthly data. There may be some concern that a large small-sample downward bias might appear in the AR(1) coefficient of the first latent state, for instance in DL, as well as in the affine models, given the well-established strong persistence of interest rates.²⁶ We have tackled these two issues in a simple way: by tabulating and analyzing results for the post-1994, pseudo out-of-sample period only, for a total of approximately 15 years. This guarantees implicitly that all estimations concerning AR models for level-related factors are performed using a minimum of 23 years of monthly data, almost double the length of the time series used in Section 5.

Table 7 shows the ratios of post- to pre-1994 MSFE, RMSFE and MAFE forecast accuracy measures, following a structure similar to that of Table 1.²⁷ The conjecture above turns out to be correct: the accuracy of most models (but not all, as the forecasting performance of the essentially affine models worsens relative to the pre-1994 sample at long horizons) increases after 1994 according to all measures. Such an increase is substantial and occurs across all horizons, even though it tends to be more pronounced for short-term predictions for most models. Moreover, we note two interesting facts, especially for horizons of three months or longer. First, the performance of the random walk improves considerably too: for instance, its MSFE ratio at h = 6 is 0.83 relative to pre-1994, while that at h = 15 is 0.75. Second, the biggest gains in predictive accuracy

²⁶ We are grateful to an anonymous referee for suggesting that we check both of these issues.
²⁷ As an additional robustness check, Table 7 reports much less detailed information than Table 1, but it includes MAFE values to give an idea of the qualitative results that one would get under an absolute loss function.

Table 6
Model confidence set tests.
Model Rank_max,m t_i. p-value Rank R,m t_ij p-value Average Loss
Horizon: 1 month
Unrestricted Essentially Affine Gaussian Model A_0(3) 1 −1.649 1.000 1 −0.942 1.000 0.087
Essentially Affine A_1(3) 2 −0.406 1.000 2 0.937 0.999 0.094
Restricted Essentially Affine Model A_1(3) 3 −0.313 1.000 3 0.990 0.992 0.094
Affine A_2(3) 4 −0.263 1.000 4 1.027 0.987 0.095
Restricted Completely Affine Model A_2(3) 5 0.725 0.999 5 1.537 0.002 0.099
Unrestricted Completely Affine Gaussian Model A_0(3) 6 0.762 0.999 6 1.557 0.000 0.099
Restricted Affine A_0(3) 7 1.239 0.292 7 1.779 0.000 0.102
Horizon: 2 months
Slope-Based Benchmark 1 −2.129 1.000 1 −1.076 1.000 0.162
Yield-macro model (fixed, non-estimated θ ) VAR(1) 2 −1.021 1.000 2 1.091 1.000 0.240
Diebold and Li’s (fixed, non-estimated θ ) VAR(1) 3 −0.955 1.000 3 1.128 1.000 0.246
EH-Rest. Model (time-varying risk premium, Q = 60) 4 −0.927 1.000 4 1.154 1.000 0.246
EH-Restricted Model (constant risk premium) 5 −0.641 1.000 6 1.321 1.000 0.261
Diebold and Li’s (θ recursively estimated on OOS data) 6 −0.554 1.000 5 1.317 1.000 0.266
Random Walk 7 −0.325 1.000 7 1.453 0.833 0.277
Unrestricted Essentially Affine Gaussian Model A_0(3) 8 0.119 1.000 8 1.724 0.259 0.299
Restricted Essentially Affine A_0(3) 9 0.307 1.000 9 1.755 0.197 0.308
EH-Rest. Model (time-varying risk premium, Q = 10) 10 0.373 1.000 12 1.924 0.016 0.312
Affine A_2(3) 11 0.400 1.000 10 1.824 0.041 0.312
Essentially Affine A_1(3) 12 0.573 1.000 11 1.843 0.018 0.321
Restricted Essentially Affine Model A_1(3) 13 0.597 1.000 13 1.930 0.016 0.322
Restricted Completely Affine Model A_2(3) 14 0.640 1.000 16 1.962 0.002 0.325
Unrestricted Completely Affine Gaussian Model A_0(3) 15 0.684 1.000 15 1.958 0.002 0.326
Restricted Affine A_0(3) 16 0.871 1.000 17 2.017 0.002 0.337
Diebold and Li’s (fixed, non-estimated θ ) AR(1) 17 0.937 0.999 14 1.936 0.002 0.342
Diebold and Li’s (single-step, time-varying θ ) 18 1.382 0.808 18 2.166 0.000 0.372
Horizon: 6 months
Slope-Based Benchmark 1 −1.969 1.000 1 −0.562 1.000 0.169
Unrestricted Essentially Affine Gaussian Model A_0(3) 2 −1.414 1.000 2 0.570 1.000 0.424
Diebold and Li’s (θ recursively estimated on OOS data) 3 −0.234 1.000 3 1.375 1.000 0.826
Restricted Affine A_0(3) 4 −0.228 1.000 5 1.388 1.000 0.828
Yield-macro model (fixed, non-estimated θ ) VAR(1) 5 −0.204 1.000 4 1.383 1.000 0.835
Unrestricted Completely Affine Gaussian Model A_0(3) 6 −0.185 1.000 7 1.396 1.000 0.841
EH-Restricted Model (constant risk premium) 7 −0.150 1.000 6 1.393 1.000 0.852
Restricted Completely Affine Model A_2(3) 8 −0.114 1.000 9 1.432 0.704 0.863
Random Walk 9 −0.067 1.000 8 1.427 0.704 0.878
Diebold and Li’s (fixed, non-estimated θ ) VAR(1) 10 −0.061 1.000 10 1.446 0.691 0.879
EH-Rest. Model (time-varying risk premium, Q = 60) 11 0.378 1.000 11 1.688 0.115 1.016
Restricted Essentially Affine Model A_1(3) 12 0.413 1.000 12 1.709 0.061 1.026
Diebold and Li’s (single-step, time-varying θ ) 13 1.269 0.912 13 1.993 0.000 1.342
Diebold and Li’s (fixed, non-estimated θ ) AR(1) 14 1.304 0.861 14 2.017 0.000 1.353
Affine A_2(3) 15 1.316 0.837 15 2.052 0.000 1.339
Horizon: 15 months
Slope-Based Benchmark 1 −1.870 1.000 1 −0.293 1.000 0.171
Unrestricted Essentially Affine Gaussian Model A_0(3) 2 −1.592 1.000 2 0.289 1.000 0.728
Restricted Essentially Affine Model A_1(3) 3 −0.801 1.000 3 0.920 1.000 1.971
Unrestricted Completely Affine Gaussian Model A_0(3) 4 −0.389 1.000 4 1.186 1.000 2.535
Restricted Affine A_0(3) 5 −0.295 1.000 5 1.211 1.000 2.665
Random Walk 6 0.256 1.000 9 1.556 0.223 3.386
Yield-macro model (fixed, non-estimated θ ) VAR(1) 7 0.291 1.000 7 1.536 1.000 3.431
Diebold and Li’s (fixed, non-estimated θ ) VAR(1) 8 0.332 1.000 10 1.564 0.202 3.485
EH-Restricted Model (constant risk premium) 9 0.345 1.000 6 1.533 1.000 3.502
Restricted Completely Affine Model A_2(3) 10 0.363 1.000 8 1.551 1.000 3.534
Diebold and Li’s (θ recursively estimated on OOS data) 11 0.383 1.000 11 1.588 0.191 3.551
Essentially Affine A_1(3) 12 0.824 1.000 12 1.764 0.011 4.201
Diebold and Li’s (fixed, non-estimated θ ) AR(1) 13 1.220 0.923 14 1.995 0.000 4.769
Diebold and Li’s (single-step, time-varying θ ) 14 1.240 0.877 13 1.985 0.000 4.790
The table reports the models that belong to the model confidence set at the 5% level, obtained through the Tmax test. The columns Rank_max,m
and Rank R,m report the rankings of the models. The two p-value columns report the MCS p-values based on the Tmax and TR statistics, respectively.
The p-value of the test statistic is equal to the minimum of the overall p-values. Finally, the last column is the average loss over the OOS period.
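
The sequential elimination behind a table of this kind can be sketched in a few lines. The following is a minimal illustration of a Tmax-type statistic with a moving-block bootstrap, not the authors' implementation: the function names, the block length of 5, the 2,000 bootstrap draws, and the simulated loss matrix are all assumptions made for the example (the paper reports 10,000 trials and a data-driven block length).

```python
import numpy as np

def block_bootstrap_means(d, block_len, n_boot, rng):
    """Moving-block bootstrap draws of the column means of d (T x K)."""
    T = d.shape[0]
    n_blocks = int(np.ceil(T / block_len))
    means = np.empty((n_boot, d.shape[1]))
    for b in range(n_boot):
        starts = rng.integers(0, T - block_len + 1, size=n_blocks)
        idx = (starts[:, None] + np.arange(block_len)[None, :]).ravel()[:T]
        means[b] = d[idx].mean(axis=0)
    return means

def tmax_test(losses, block_len=5, n_boot=2000, rng=None):
    """Return (t_stats, p_value) for a Tmax-type test on a T x K loss matrix."""
    rng = np.random.default_rng() if rng is None else rng
    K = losses.shape[1]
    # Loss of model k minus the average loss of the other K - 1 models
    d = (K * losses - losses.sum(axis=1, keepdims=True)) / (K - 1)
    d_bar = d.mean(axis=0)
    boot = block_bootstrap_means(d, block_len, n_boot, rng)
    se = boot.std(axis=0, ddof=1)          # bootstrap std. error of each mean
    t_stats = d_bar / se
    # Null distribution: bootstrap means recentred at the sample means
    t_boot = ((boot - d_bar) / se).max(axis=1)
    p_value = float(np.mean(t_boot >= t_stats.max()))
    return t_stats, p_value

def model_confidence_set(losses, alpha=0.05, **kwargs):
    """Drop the worst model until equal predictive accuracy is not rejected."""
    surviving = list(range(losses.shape[1]))
    while len(surviving) > 1:
        t_stats, p = tmax_test(losses[:, surviving], **kwargs)
        if p >= alpha:
            break
        surviving.pop(int(np.argmax(t_stats)))
    return surviving

# Simulated losses (hypothetical): two comparable models and one clearly worse
rng = np.random.default_rng(0)
base = rng.standard_normal((200, 2)) ** 2
losses = np.hstack([base, base[:, :1] + 5.0])
mcs = model_confidence_set(losses, rng=rng)
```

In this toy setting the third model has a uniformly larger loss, so it is the first to be eliminated; whether both remaining models survive depends on the bootstrap p-value at the second step.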

concern models that did not perform that well in Table 1, with the accuracy of the best models in Table 1 improving very little or not at all. For instance, while the ratios of the best-performing (according to Table 6) unrestricted essentially affine A0 (3) model are 0.84 and 1.07 at h = 6 and 15, respectively, both ratios are 0.63 for the slope-based model. Given the solid gains in accuracy recorded by the random walk, this implies that, while the distance

Table 7
Measures of forecasting accuracy pre- and post-1994.
Statistics h=1 h=2 h=3 h=6 h=9 h = 12 h = 15
1-month rate 3-month rate
Panel A - Random Walk
MSFE 0.576 0.528 0.624 0.831 0.879 0.822 0.751
RMSFE 0.759 0.726 0.790 0.911 0.938 0.907 0.867
MAFE 0.769 0.730 0.715 0.815 0.856 0.838 0.788
Panel B - EH-Restricted Model (constant risk premium)
MSFE 0.469 0.568 0.463 0.580 0.472 0.418 0.441
RMSFE 0.685 0.754 0.680 0.762 0.687 0.646 0.664
MAFE 0.757 0.927 0.748 0.779 0.681 0.683 0.688
Panel C - EH-Rest. Model (time-varying risk premium, Q = 60)
MSFE 0.425 0.419 0.401 0.592 0.469 0.373 0.338
RMSFE 0.652 0.647 0.633 0.770 0.685 0.611 0.581
MAFE 0.612 0.620 0.557 0.782 0.724 0.638 0.593
Panel D - Diebold and Li’s (fixed, non-estimated θ ) AR(1)
MSFE 0.460 0.394 0.469 0.487 0.459 0.407 0.368
RMSFE 0.678 0.627 0.685 0.698 0.678 0.638 0.607
MAFE 0.608 0.609 0.644 0.644 0.643 0.631 0.601
Panel E - Diebold and Li’s (fixed, non-estimated θ ) VAR(1)
MSFE 0.546 0.447 0.520 0.680 0.716 0.653 0.587
RMSFE 0.739 0.668 0.721 0.825 0.846 0.808 0.766
MAFE 0.675 0.623 0.638 0.722 0.763 0.756 0.723
Panel F - Diebold and Li’s (θ recursively estimated on OOS data)
MSFE 0.551 0.479 0.658 0.695 0.645 0.553 0.491
RMSFE 0.742 0.692 0.811 0.834 0.803 0.743 0.700
MAFE 0.751 0.693 0.787 0.747 0.742 0.707 0.670
Panel G - Diebold and Li’s (single-step, time-varying θ )
MSFE 0.487 0.405 0.401 0.415 0.399 0.357 0.327
RMSFE 0.698 0.636 0.634 0.645 0.631 0.597 0.571
MAFE 0.643 0.635 0.612 0.607 0.606 0.596 0.574
Panel H - Yield-macro model (fixed, non-estimated θ ) VAR(1)
MSFE 0.867 0.476 0.437 0.440 0.411 0.397 0.366
RMSFE 0.931 0.690 0.661 0.663 0.641 0.630 0.605
MAFE 0.781 0.658 0.628 0.627 0.617 0.623 0.621
Panel I - Slope-Based Benchmark Errors
MSFE 0.536 0.597 0.626 0.626 0.633 0.644 0.628
RMSFE 0.732 0.773 0.791 0.791 0.796 0.802 0.792
MAFE 0.730 0.763 0.789 0.788 0.790 0.800 0.793
Panel L - Unrestricted Completely Affine Gaussian Model A_0(3)
MSFE 0.541 0.494 0.638 0.817 0.857 0.877 0.870
RMSFE 0.735 0.703 0.799 0.904 0.926 0.936 0.933
MAFE 0.709 0.661 0.731 0.878 0.909 0.918 0.912
Panel M - Restricted Completely Affine Model A_2(3)
MSFE 0.535 0.493 0.618 0.766 0.766 0.769 0.753
RMSFE 0.732 0.702 0.786 0.875 0.875 0.877 0.868
MAFE 0.707 0.658 0.717 0.909 0.941 0.971 1.011
Panel N - Unrestricted Essentially Affine Gaussian Model A_0(3)
MSFE 0.551 0.593 0.631 0.840 0.927 1.024 1.065
RMSFE 0.742 0.770 0.794 0.916 0.963 1.012 1.032
MAFE 0.695 0.708 0.722 0.864 0.934 1.012 1.013
Panel O - Restricted Essentially Affine Model A_1(3)
MSFE 0.561 0.517 1.195 1.814 1.819 1.720 1.571
RMSFE 0.749 0.719 1.093 1.347 1.349 1.311 1.253
MAFE 0.724 0.678 0.996 1.199 1.197 1.149 1.079
The table shows the ratios of the post- to pre-1994 measures of forecasting accuracy. A value of less than one indicates that the model performs better in
the second OOS period.
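
As a rough illustration of how such ratios are formed, the sketch below computes post- to pre-break MSFE, RMSFE and MAFE ratios from a vector of forecast errors. The simulated errors, the split point, and the function names are hypothetical and are not taken from the paper. The last line checks one reported entry: because the RMSFE ratio is the square root of the MSFE ratio, the random-walk MSFE ratio of 0.576 at h = 1 should correspond to an RMSFE ratio of about 0.759, which matches Panel A.

```python
import numpy as np

def accuracy_measures(errors):
    """MSFE, RMSFE and MAFE for a vector of forecast errors."""
    errors = np.asarray(errors, dtype=float)
    msfe = np.mean(errors ** 2)
    return {"MSFE": msfe, "RMSFE": np.sqrt(msfe), "MAFE": np.mean(np.abs(errors))}

def post_to_pre_ratios(errors, split):
    """Ratio of each measure after the split to the same measure before it."""
    pre = accuracy_measures(errors[:split])
    post = accuracy_measures(errors[split:])
    return {name: post[name] / pre[name] for name in pre}

# Hypothetical forecast errors: the second sub-period is less volatile
rng = np.random.default_rng(0)
errors = np.concatenate([rng.normal(0, 1.0, 150), rng.normal(0, 0.7, 150)])
ratios = post_to_pre_ratios(errors, split=150)

# Consistency check on Table 7, Panel A, h = 1: sqrt(0.576) rounds to 0.759
rmsfe_from_msfe = round(float(np.sqrt(0.576)), 3)
```

A ratio below one means the measure is smaller, and the model therefore more accurate, in the second sub-period.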

between the benchmark and the best predicting models narrows, we record a sort of “converge to the middle” trend in the post-1994 period in the case of models that were not providing precise predictions previously. In fact, unreported MCS tests similar to those in Table 6 confirm the results commented on above, with the SSM(h) becoming even more crowded under the $T_{\max,L}$ statistic.²⁸ In summary, while pockets of predictability of short-term rates exist for short-term forecasts, such predictability fails at longer horizons; however, these are the very horizons for which predictability is required in order for the EH to hold, or at least to be a useful reference for policymakers.

6.2. Post-2008 tests and the effects of the financial crisis

Our earlier analysis was limited to the pre-September 2008 period, before the collapse of Lehman Brothers and the worsening of the turmoil that engulfed the world’s financial markets, at least until the summer of 2009. It is important to extend our results to the sample following the Lehman bankruptcy filing, because a high degree of (at least intended) forward guidance came into effect at that time, while the Fed funds rate also became extremely predictable, having been set at the zero lower bound since early 2009. Thus, we have collected data concerning Treasury bills, notes, and bonds from the Federal Reserve Board H15 publications and applied standard algorithms to extract riskless zero coupon yields. Next, such data have been added to the earlier series and the recursive pseudo out-of-sample tests have been applied using a 1970:01–2008:07 sample for initial estimation purposes, before proceeding according to the recursive block structure presented above.

Table 8 replicates Table 1, but with reference to the period 2008:08–2016:09 and the 1-, 2-, and 3-month horizons only. We ensure that the evaluation period is identical for each forecast horizon by stopping the recursive forecasting in September 2016, so that the last 1-, 2-, and 3-month forecasts are all evaluated with reference to the same data and state of the economy. The random walk again represents a hard benchmark to beat over the (post-) GFC sample. At h = 1, it ranks second in the MSFE ranking; at h = 2, it is the best model; and at h = 3 it is again the second-best model, but the distance to the unrestricted completely affine $A_0(3)$ model is modest. This means that while it was possible to forecast short rates over horizons of at least 1, 2, and often 3 months over the period 1983–2008, such was no longer the case after the GFC, with the random walk becoming very hard to match. Even though this may seem counter-intuitive, because a period of rates close to the zero lower bound and characterized by heavy doses of forward guidance may seem to favor interest rate predictability, readers should consider how forecasting from a random walk works mechanically: the most recent (zero or close to zero) rate is simply copied forward for predicting future rates. By and large, this has represented a good description of the post-GFC sample realizations, at least up to the end of 2015.²⁹

Table 9 replicates the analysis in Table 2, showing that there are now many more cells that yield statistically significant test results rejecting the null of equal predictive accuracy. However, consistent with the findings in Table 8, such significant tests point in a direction that differs from that of Table 2: the random walk yields many instances of average differentials in squared loss functions that favor it over other models, thus implying that the changes in the short-term rate simply became unpredictable after the GFC. There is no paradox in this case: if guidance goes in the direction of promising no future changes in rates, any changes that occur thus become unpredictable and without structure, as the IID shocks to a random walk process indeed are. However, in practice, such guidance makes the EH irrelevant and trains market participants not to form useful forecasts of short-term rates, but instead to stop forming them. Table 10 closes this portion of the analysis, applying MCS tests again, but this time with reference to the 2008–2016 subsample only. The resulting 5% SSM sets contain the random walk model at all horizons, irrespective of the specific test applied. In fact, the random walk always ranks quite highly.

7. Summary, conclusions and implications

This paper notes that conventional tests of the EH are based on two assumptions: a specific functional form of the EH and an assumption about market participants’ abilities to predict the future path of the short-term rate over long horizons, i.e., an assumption about the EGP. We investigate the possibility that the common empirical rejections of the EH found in the literature occur because the assumption about market participants’ abilities to predict the path of the short-term rate that is used in conventional tests of the EH may be grossly inconsistent with their actual abilities to predict short-term rates. We do so by comparing the forecasting performances of a variety of interest rate forecasting models used in the literature to both a random walk model and a model that assumes that the EH is true but makes no assumptions about the EGP. We test for statistical differences in the out-of-sample performances using both a modified Diebold–Mariano–West test and a West–McCracken test. The latter test allows for parameter uncertainty but is computationally more burdensome. We also apply model confidence set inference to test whether the random walk belongs to the set of non-dominated models in a forecasting sense, and at which horizons. Our results indicate that the differences in the predictive performances of alternative models at intermediate and long forecasting horizons are generally small. Importantly, we find that none of the models outperform a simple random walk under all prediction accuracy metrics

²⁸ Interestingly, the result gets even stronger in the post-1994 sample in cases where the predictability has already been shown to be strong over the full sample, i.e., at the 1-month horizon. For instance, for h = 1, the MSFE ratio is 0.58 for the random walk, 0.47 for the constant risk premium EH-restricted model, 0.46 for the DL model with fixed θ, and 0.55 for the unrestricted essentially affine $A_0(3)$ framework. In other words, the ability of the last three models to outperform the random walk gets even stronger after 1994.
²⁹ These results mean that it may be less of a problem that a lack of data does not allow us to propose results for h = 6 and 15 months. We cannot do this because the sparsity of the Federal Reserve data implies that EH-restricted forecasts could not be computed. The logic is that if the random walk is already the best performing model at h = 2, it becomes less relevant to check what happens at longer horizons.

Table 8
Summary statistics for monthly forecast errors for the post-2008 period.
Statistics h=1 h=2 h=3
1-month rate 3-month rate
Panel A - Random Walk
Mean −0.012 −0.026 −0.041
Median 0.000 −0.010 −0.010
Max. 0.203 0.203 0.254
Min −0.711 −1.412 −1.707
S.D. 0.113 0.189 0.252

MSFE 0.013 0.036 0.065


RMSFE 0.113 0.191 0.255
Panel B - EH-Restricted Model (constant risk premium)
Mean 0.259 0.545 0.195
Median 0.264 0.576 0.299
Max. 0.436 0.840 0.461
Min −0.328 −1.115 −1.732
S.D. 0.087 0.250 0.372

MSFE 0.075 0.359 0.176


RMSFE 2.416 3.139 1.645
Panel C - EH-Rest. Model (time-varying risk premium, Q = 60)
Mean 0.033 0.090 0.071
Median 0.030 0.151 0.166
Max. 0.245 0.496 0.481
Min −0.525 −1.392 −1.783
S.D. 0.090 0.283 0.396

MSFE 0.009 0.088 0.162


RMSFE 0.846 1.553 1.577
Panel D - Diebold and Li’s (fixed, non-estimated θ ) AR(1)
Mean −0.139 −0.214 −0.302
Median −0.130 −0.175 −0.223
Max. 0.046 0.110 0.246
Min −0.815 −1.583 −1.950
S.D. 0.138 0.225 0.305

MSFE 0.038 0.096 0.184


RMSFE 1.732 1.623 1.681
Panel E - Diebold and Li’s (fixed, non-estimated θ ) VAR(1)
Mean −0.070 −0.086 −0.103
Median −0.059 −0.049 −0.046
Max. 0.106 0.173 0.148
Min −0.770 −1.503 −1.821
S.D. 0.121 0.210 0.277

MSFE 0.020 0.051 0.087


RMSFE 1.234 1.187 1.159
Panel F - Diebold and Li’s (θ recursively estimated on OOS data)
Mean −0.236 −0.313 −0.272
Median −0.212 −0.288 −0.205
Max. 0.007 0.090 0.249
Min −0.854 −1.625 −1.925
S.D. 0.154 0.240 0.304

MSFE 0.080 0.156 0.167


RMSFE 2.493 2.066 1.601

Panel G - Diebold and Li’s (single-step, time-varying θ )
Mean −0.076 −0.126 −0.192
Median −0.045 −0.084 −0.114
Max. 0.162 0.244 0.311
Min −1.472 −1.704 −1.971
S.D. 0.200 0.311 0.370

MSFE 0.046 0.113 0.174

RMSFE 1.895 1.758 1.633
Panel H - Yield-macro model (fixed, non-estimated θ ) VAR(1)
Mean −0.066 −0.081 −0.101
Median −0.078 −0.129 −0.172
Max. 0.326 0.555 0.765
Min −0.662 −1.297 −1.521
S.D. 0.133 0.244 0.334

MSFE 0.022 0.066 0.122


RMSFE 1.310 1.345 1.368
Panel I - Slope-Based Benchmark Errors
Mean −0.010 −0.029 −0.040
Median 0.000 −0.009 −0.001
Max. 0.203 0.207 0.291
Min −0.745 −1.417 −1.712
S.D. 0.118 0.203 0.264

MSFE 0.014 0.042 0.071


RMSFE 1.044 1.072 1.046
Panel L - Unrestricted Completely Affine Gaussian Model A_0(3)
Mean −0.143 −0.205 −0.039
Median −0.117 −0.150 −0.008
Max. 0.117 0.149 0.239
Min −1.501 −1.754 −1.677
S.D. 0.187 0.294 0.247

MSFE 0.055 0.129 0.062


RMSFE 2.080 1.878 0.979
Panel M - Restricted Completely Affine Model A_2(3)
Mean −0.141 −0.203 −0.049
Median −0.113 −0.149 −0.013
Max. 0.136 0.161 0.240
Min −1.495 −1.748 −1.710
S.D. 0.186 0.293 0.251

MSFE 0.054 0.127 0.066


RMSFE 2.061 1.869 1.003
Panel N - Unrestricted Essentially Affine Gaussian Model A_0(3)
Mean −0.128 −0.202 −0.216
Median −0.123 −0.151 −0.186
Max. 0.054 0.123 0.008
Min −0.698 −1.698 −1.649
S.D. 0.104 0.286 0.226

MSFE 0.027 0.122 0.097


RMSFE 1.460 1.832 1.223
Panel O - Restricted Essentially Affine Model A_1(3)
Mean −0.162 −0.221 −0.299
Median −0.135 −0.162 −0.264
Max. 0.096 0.104 −0.044
Min −1.483 −1.747 −1.867
S.D. 0.183 0.298 0.241

MSFE 0.060 0.138 0.148


RMSFE 2.156 1.943 1.507
The table shows summary statistics for monthly forecast errors for a range of models and
horizons. Due to data availability, T-bills at the 1- and 3-month maturities are considered. For
comparability, these statistics are calculated using forecast errors over the common out-of-
sample period, 2008:08–2016:09. Panels B and C report the performances of the EH-implied
forecasts with constant and time-varying risk premia (computed using a rolling window of Q
= 60 months), respectively. MSFE and RMSFE are the mean and root-mean squared forecast
errors. The RMSFE values are relative to the random walk model; values less than one are in
italics. The best performing models with the smallest MSFEs are shown in bold.
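
The mechanics of the random walk benchmark and of the relative RMSFE can be sketched as follows. This is only an illustration: the rate path is made up, the error is taken to be the realized rate minus the forecast (an assumption, since the sign convention is not restated here), and the function names are hypothetical. The last line uses the identity MSFE = mean² + variance, which holds exactly with the population variance and approximately with a sample standard deviation, to check Panel A at h = 1: (−0.012)² + 0.113² ≈ 0.013, in line with the reported MSFE.

```python
import numpy as np

def random_walk_errors(rates, h):
    """h-step-ahead errors when the forecast is the last observed rate.

    Assumed convention: error = realized rate - forecast.
    """
    rates = np.asarray(rates, dtype=float)
    return rates[h:] - rates[:-h]

def relative_rmsfe(model_errors, benchmark_errors):
    """RMSFE of a model divided by the RMSFE of the benchmark."""
    def rmsfe(e):
        return np.sqrt(np.mean(np.asarray(e, dtype=float) ** 2))
    return rmsfe(model_errors) / rmsfe(benchmark_errors)

# Hypothetical short-rate path stuck close to the zero lower bound (in percent)
rates = np.array([0.10, 0.10, 0.05, 0.05, 0.05, 0.10, 0.10, 0.10])
e1 = random_walk_errors(rates, h=1)

# By construction the benchmark has a relative RMSFE of one against itself
rw_relative = relative_rmsfe(e1, e1)

# Check on Table 8, Panel A, h = 1: mean^2 + S.D.^2 should be close to the MSFE
msfe_check = round((-0.012) ** 2 + 0.113 ** 2, 3)
```

When rates barely move, most random-walk errors are exactly zero, which is why the benchmark is so hard to beat over this sample.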

Table 9
Tests of equal predictive accuracy: squared forecast error loss post-2008.

The three panels of the table perform the same tests as in Table 2. The numbers above the main diagonal report the standard modified Diebold-Mariano-West
test statistics, with the corresponding significance levels in parentheses. The numbers below the diagonal report the West–McCracken test statistics.
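
For readers who want to see what a statistic of this type looks like, the sketch below computes a Diebold-Mariano statistic under squared-error loss with a small-sample adjustment. It assumes that "modified" refers to the Harvey, Leybourne and Newbold correction; the exact variant used for the table is not shown in this section, so this should be read as an illustration only. The function name, the rectangular-kernel long-run variance, and the simulated errors are assumptions of the example.

```python
import numpy as np

def modified_dm_statistic(e_model, e_benchmark, h=1):
    """Diebold-Mariano statistic under squared-error loss with the
    Harvey-Leybourne-Newbold small-sample correction (assumed variant).

    Positive values mean that the model has the larger average loss.
    """
    e1 = np.asarray(e_model, dtype=float)
    e2 = np.asarray(e_benchmark, dtype=float)
    d = e1 ** 2 - e2 ** 2                      # loss differential
    T = d.size
    d_bar = d.mean()
    dc = d - d_bar
    # Long-run variance: autocovariances up to lag h - 1, rectangular kernel.
    # For h > 1 this estimate is not guaranteed to be positive.
    lrv = np.sum(dc ** 2) / T
    for k in range(1, h):
        lrv += 2.0 * np.sum(dc[k:] * dc[:-k]) / T
    dm = d_bar / np.sqrt(lrv / T)
    correction = np.sqrt((T + 1 - 2 * h + h * (h - 1) / T) / T)
    return correction * dm

# Hypothetical errors: the model's errors are twice those of the benchmark
rng = np.random.default_rng(0)
e_bench = rng.normal(0.0, 0.1, 100)
stat = modified_dm_statistic(2.0 * e_bench, e_bench, h=1)
stat_reversed = modified_dm_statistic(e_bench, 2.0 * e_bench, h=1)
```

The corrected statistic is usually compared with a Student-t distribution with T − 1 degrees of freedom. Swapping the two error series flips the sign of the statistic, which is a quick sanity check on the convention.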

and at all forecast horizons. We show that this conclusion is robust to the weighting of prediction error losses by the economic price of the underlying states, as estimated by various empirical definitions of the SDF. Finally, this conclusion grows even stronger after 1994, and especially in the aftermath of the financial crisis, when it is the random walk, if anything, that tends to outperform other models in terms of predictive accuracy.

Interestingly, the performance differences between the EH-restricted forecasts that assumed that the risk premium is constant and those which allowed for variation in the premia were small, with neither model outperforming the other consistently at all forecast horizons and none of the differences being statistically significant. This result supports Kozicki & Tinsley’s (2005) finding that the empirical failure of the EH may not be due to time-variation in risk premia. Rather, the ubiquitous empirical rejection of the EH appears to be due to the failure of market participants to forecast short-term rates in the manner assumed by the EGP that is used in conventional tests of the EH. Indeed, the results suggest that future short-term rates are determined by new information (i.e., news) that is essentially unpredictable. This explains not only why the spread between the long- and short-term rates is a relatively poor predictor of the future short-term rate, but also why conventional tests of the EH consistently reject it. Our findings are also broadly consistent with those of Thornton and Valente (2012), who find that there are no gains to an investor who exploits the predictability of the bond excess return models of Cochrane and Piazzesi (2005) or Fama and Bliss (1987) relative to the no-predictability alternative. However, we trust that none of these papers has performed the extensive tests (in terms of range of models, rates predicted, horizons investigated, and statistical methods deployed) that our work has featured.

The finding that the empirical failure of the EH is likely to be due to the auxiliary assumption about the predictability of the short-term rate rather than to the EH per se suggests that forward guidance policies might be successful if central banks can commit credibly to a path for the policy rate. Indeed, there is evidence that the federal funds rate has become more predictable since the Fed began announcing its funds rate target in 1994, at least at short horizons (e.g., Lange et al., 2003; Poole, Rasche, & Thornton, 2002). However, the evidence presented by Andersson and Hoffman (2010) and Kool and Thornton (2015) suggests that central banks’ forward guidance policies have had limited success. To some extent, our findings in Section 6 add to this evidence: beating the random walk benchmark has now become harder than ever, if it is possible at all, even at short horizons.

We acknowledge that, despite our best efforts to implement a robust research design, there are several extensions that could be explored in the attempt to find cases in which the predictability in short-term rates may be stronger than that which we have uncovered. For instance, although our models have employed information from the entire term structure of interest rates, we have not explored the

Table 10
Model confidence set tests for the post-2008 period.
Model Rank_max,m t_i. p-value Rank R,m t_ij p-value Average Loss
Horizon: 1 month
EH-Rest. Model (time-varying risk premium, Q = 10) 1 −1.413 1.000 1 −0.605 1.000 0.006
EH-Rest. Model (time-varying risk premium, Q = 60) 2 −0.431 1.000 2 0.606 1.000 0.009
Random Walk 3 0.743 1.000 3 1.322 0.972 0.013
Slope-Based Benchmark 4 1.113 1.000 4 1.548 0.259 0.014
Horizon: 2 months
EH-Rest. Model (time-varying risk premium, Q = 10) 1 −1.547 1.000 1 −0.080 1.000 0.032
Random Walk 2 −1.428 1.000 2 0.079 1.000 0.036
Slope-Based Benchmark 3 −1.295 1.000 3 0.170 1.000 0.042
Diebold and Li’s (fixed, non-estimated θ ) VAR(1) 4 −1.065 1.000 4 0.331 1.000 0.051
Yield-macro model (fixed, non-estimated θ ) VAR(1) 5 −0.707 1.000 5 0.579 1.000 0.066
EH-Rest. Model (time-varying risk premium, Q = 60) 6 −0.163 1.000 6 0.946 1.000 0.088
Diebold and Li’s (fixed, non-estimated θ ) AR(1) 7 0.038 1.000 7 1.082 1.000 0.096
Diebold and Li’s (single-step, time-varying θ ) 8 0.449 1.000 8 1.361 1.000 0.113
Unrestricted Essentially Affine Gaussian Model A_0(3) 9 0.692 1.000 9 1.524 1.000 0.122
Restricted Completely Affine Model A_2(3) 10 0.812 1.000 11 1.606 1.000 0.127
Restricted Affine A_0(3) 11 0.813 1.000 10 1.605 1.000 0.127
Unrestricted Completely Affine Gaussian Model A_0(3) 12 0.844 1.000 12 1.634 1.000 0.129
Restricted Essentially Affine Model A_1(3) 13 1.069 1.000 13 1.778 0.987 0.138
Diebold and Li’s (θ recursively estimated on OOS data) 14 1.489 0.583 14 2.055 0.000 0.156
Horizon: 3 months
Unrestricted Completely Affine Gaussian Model A_0(3) 1 −1.112 1.000 1 −0.080 1.000 0.062
Restricted Affine A_0(3) 2 −0.994 1.000 2 0.080 1.000 0.063
Random Walk 3 0.246 1.000 3 0.906 0.980 0.065
Restricted Completely Affine Model A_2(3) 4 0.459 1.000 4 1.045 0.452 0.066
EH-Rest. Model (time-varying risk premium, Q = 10) 5 0.870 0.508 5 1.143 0.140 0.067
The table reports the models that belong to the model confidence set at the 5% confidence level, obtained through the Tmax test. The columns Rank_max,m
and Rank_R,m report the rankings of the models. The third and seventh columns report the MCS p-values based on the Tmax and TR statistics, respectively;
these p-values equal the minimum of the overall p-values. The last column is the average loss over the period considered. The recursive OOS period starts
in September 2008 and runs through September 2016.
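
As a rough illustration of the quantities reported in Table 10, the sketch below computes each model's average loss and a t_i.-type statistic (the model's average loss relative to the cross-model average, divided by a standard error) from a matrix of per-period losses. The function name `relative_loss_t_stats` and the input array `losses` are hypothetical, and for brevity the sketch substitutes a naive i.i.d. standard error for the bootstrap variance estimate of Hansen, Lunde, and Nason (2011). It is not the procedure used to produce the table.

```python
import numpy as np


def relative_loss_t_stats(losses):
    """Average loss, t_i.-type statistic, and rank for each model.

    losses: array of shape (T, m) holding the loss (e.g. squared forecast
    error) of each of m models in each of T out-of-sample periods.
    """
    losses = np.asarray(losses, dtype=float)
    n_periods = losses.shape[0]

    # Average loss per model (the "Average loss" column).
    avg_loss = losses.mean(axis=0)

    # d_{i.,t}: loss of model i minus the cross-model average loss at time t.
    rel = losses - losses.mean(axis=1, keepdims=True)
    rel_mean = rel.mean(axis=0)

    # ASSUMPTION: naive i.i.d. standard error of the mean. The MCS
    # procedure estimates this variance with a bootstrap instead.
    std_err = rel.std(axis=0, ddof=1) / np.sqrt(n_periods)
    t_stat = rel_mean / std_err

    # Rank 1 = most negative statistic, i.e. lowest relative loss.
    ranks = np.argsort(np.argsort(t_stat)) + 1
    return avg_loss, t_stat, ranks


if __name__ == "__main__":
    # Made-up losses for three models over four periods (not from the paper).
    toy = np.array([[1.0, 2.0, 3.0],
                    [2.0, 3.0, 5.0],
                    [1.0, 3.0, 4.0],
                    [2.0, 2.0, 6.0]])
    avg, t, rk = relative_loss_t_stats(toy)
    print("average loss:", avg)
    print("t statistics:", t)
    print("ranks:", rk)
```

A negative statistic indicates a model whose loss is below the cross-model average, which matches the sign pattern of the top-ranked models in the table.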

possibility that estimating and exploiting the existence of cointegrating relationships may improve the forecast
accuracy (see e.g. Hall, Anderson, & Granger, 1992). In addition, our econometric models have linked forecasts to
(unobservable) features of the term structure. There is a voluminous body of work in finance on the presence of
non-linear dynamics in the latent factors that characterize the term structure (see e.g. Bansal & Zhou, 2002;
Engle & Ng, 1993; Hess & Kamara, 2005), with applications to forecasting (e.g., Guidolin & Timmermann, 2009).
Finally, as in most applied econometric studies of the term structure of interest rates, we have assumed that
investors’ historical predictions were identical to the recursive forecasts derived in real time from a range of
statistical models. However, some recent studies have reported important differences between survey forecasts and
the forecasts from standard models (see e.g. Kim & Orphanides, 2005; Piazzesi & Schneider, 2013). It would be
interesting to investigate how survey forecasts can supplement model forecasts. All of these extensions deserve
consideration in future work.

Acknowledgments

The views expressed here are the authors’ and do not necessarily reflect the views of the Board of Governors of
the Federal Reserve System or the Federal Reserve Bank of St. Louis. We would like to thank one associate editor at
the IJF, four anonymous referees, Michael Dueker, Victor Gaspar, Jeremy Piger, Lucio Sarno, Martin Sola, Jacky So,
Ken West, and Tao Wu for helpful comments on an earlier draft of this paper, and Beatrice Franzolini, John McAdams,
and John Zhu for valuable research assistance.

Appendix A. Supplementary data

Supplementary material related to this article can be found online at https://doi.org/10.1016/j.ijforecast.2018.03.006.

References

Almeida, C., & Garcia, R. (2012). Assessing misspecified asset pricing models with empirical likelihood estimators. Journal of Econometrics, 170, 519–537.
Andersson, M., & Hoffman, B. (2010). Gauging the effectiveness of central bank forward guidance. In D. Cobham, A. Eitrheim, S. Gerlach, & J. Qvigstad (Eds.), Inflation targeting twenty years on: past lessons and future prospects. Cambridge University Press.
Ang, A., & Piazzesi, M. (2003). A no-arbitrage vector autoregression of term structure dynamics with macroeconomic and latent variables. Journal of Monetary Economics, 50, 745–787.
Bali, T., Heidari, M., & Wu, L. (2009). Predictability of interest rates and interest-rate portfolios. Journal of Business & Economic Statistics, 27, 517–527.
Bansal, R., & Zhou, H. (2002). Term structure of interest rates with regime shifts. Journal of Finance, 57, 1997–2043.
Bekaert, G., Hodrick, R., & Marshall, D. (1997). On biases in tests of the expectations hypothesis of the term structure of interest rates. Journal of Financial Economics, 44, 309–348.

Bekaert, G., Hodrick, R., & Marshall, D. (2001). Peso problem explanations for term structure anomalies. Journal of Monetary Economics, 48, 241–270.
Bliss, R. (1997). Testing term structure estimation methods. Advances in Futures and Options Research, 9, 197–231.
Campbell, J., & Shiller, R. (1991). Yield spreads and interest rate movements: a bird’s eye view. Review of Economic Studies, 58, 495–514.
Carriero, A., Favero, C., & Kaminska, I. (2006). Financial factors, macroeconomic information and the expectations theory of the term structure of interest rates. Journal of Econometrics, 131, 339–358.
Cenesizoglu, T., & Timmermann, A. (2012). Do return prediction models add economic value? Journal of Banking and Finance, 36, 2974–2987.
Cochrane, J., & Piazzesi, M. (2005). Bond risk premia. American Economic Review, 95, 138–160.
Cox, J., Ingersoll, J., & Ross, S. (1985). A theory of the term structure of interest rates. Econometrica, 53, 385–408.
Dai, Q., & Singleton, K. (2002). Expectation puzzles, time-varying risk premia, and affine models of the term structure. Journal of Financial Economics, 63, 415–441.
De Pooter, M. (2007). Examining the Nelson-Siegel class of term structure models (No. 07-043/4). Tinbergen Institute Discussion Paper.
Della Corte, P., Sarno, L., & Thornton, D. (2008). The expectations hypothesis of the term structure of very short-term rates: statistical tests and economic value. Journal of Financial Economics.
Diebold, F., & Li, C. (2006). Forecasting the term structure of government bond yields. Journal of Econometrics, 130, 337–364.
Diebold, F., & Mariano, R. (1995). Comparing predictive accuracy. Journal of Business & Economic Statistics, 13, 253–263.
Diebold, F., Rudebusch, G., & Aruoba, S. (2006). The macroeconomy and the yield curve: a dynamic latent factor approach. Journal of Econometrics, 131, 309–338.
Driffill, J., Psaradakis, Z., & Sola, M. (1997). A reconciliation of some paradoxical empirical results on the expectations model of the term structure. Oxford Bulletin of Economics and Statistics, 59, 29–42.
Duffee, G. (2002). Term premia and interest rate forecasts in affine models. Journal of Finance, 57, 405–443.
Duffie, D., & Kan, R. (1996). A yield-factor model of interest rates. Mathematical Finance, 6, 379–406.
Engle, R., & Ng, V. (1993). Time varying volatility and the dynamic behavior of the term structure. Journal of Money, Credit and Banking, 25, 336–349.
Exterkate, P., van Dijk, D., Heij, C., & Groenen, P. (2013). Forecasting the yield curve in a data-rich environment using the factor-augmented Nelson-Siegel model. Journal of Forecasting, 32, 193–214.
Fama, E. (1976). Forward rates as predictors of future spot rates. Journal of Financial Economics, 3, 361–377.
Fama, E., & Bliss, R. (1987). The information in long maturity forward rates. American Economic Review, 77, 680–692.
Favero, C., Niu, L., & Sala, L. (2012). Term structure forecasting: no-arbitrage restrictions vs. large information sets. Journal of Forecasting, 31, 124–156.
Froot, K. (1989). New hope for the expectations hypothesis of the term structure of interest rates. Journal of Finance, 44, 283–305.
Fuhrer, J. (1996). Monetary policy shifts and long-term interest rates. Quarterly Journal of Economics, 111, 1183–1209.
Guidolin, M., & Timmermann, A. (2009). Forecasts of US short-term interest rates: a flexible forecast combination approach. Journal of Econometrics, 150, 297–311.
Hall, A., Anderson, H., & Granger, C. (1992). A cointegration analysis of treasury bill yields. The Review of Economics and Statistics, 74, 116–126.
Hansen, P. R., Lunde, A., & Nason, J. (2011). The model confidence set. Econometrica, 79, 453–497.
Harvey, D., Leybourne, S., & Newbold, P. (1997). Testing the equality of prediction mean squared errors. International Journal of Forecasting, 13, 281–291.
Hess, A., & Kamara, A. (2005). Conditional time-varying interest rate risk premium: evidence from the treasury bill futures market. Journal of Money, Credit and Banking, 37, 679–698.
Kim, D. H., & Orphanides, A. (2005). Term structure estimation with survey data on interest rate forecasts. In Finance and economics discussion series no. 48. Board of Governors of the Federal Reserve System.
Kool, C., & Thornton, D. (2015). How effective is central bank forward guidance? In Federal Reserve Bank of St. Louis Review, fourth quarter (pp. 303–322).
Koopman, S., Mallee, M., & Van der Wel, M. (2010). Analyzing the term structure of interest rates using the dynamic Nelson-Siegel model with time-varying parameters. Journal of Business & Economic Statistics, 28, 329–343.
Koopman, S., & van der Wel, M. (2012). Forecasting the U.S. term structure of interest rates using a macroeconomic smooth dynamic factor model. International Journal of Forecasting, 29, 676–694.
Kozicki, S., & Tinsley, P. (2001). Shifting endpoints in the term structure of interest rates. Journal of Monetary Economics, 47, 613–652.
Kozicki, S., & Tinsley, P. (2005). What do you expect? Imperfect policy credibility and tests of the expectations hypothesis. Journal of Monetary Economics, 52, 421–447.
Lange, J., Sack, B., & Whitesell, W. (2003). Anticipations of monetary policy in financial markets. Journal of Money, Credit and Banking, 35, 889–910.
Litterman, R., & Scheinkman, J. (1991). Common factors affecting bond returns. Journal of Fixed Income, 1, 54–61.
Macaulay, F. (1938). Some theoretical problems suggested by movements of interest rates, bond yields, and stock prices in the United States since 1856. New York: National Bureau of Economic Research.
Mankiw, N., & Miron, J. (1986). The changing behavior of the term structure of interest rates. Quarterly Journal of Economics, 101, 211–228.
Martin, G., Reidy, A., & Wright, J. (2009). Does the option market produce superior forecasts of noise-corrected volatility measures? Journal of Applied Econometrics, 24, 77–104.
McCracken, M. (2004). Parameter estimation and tests of equal forecast accuracy between non-nested models. International Journal of Forecasting, 20, 503–514.
Nelson, C., & Siegel, A. (1987). Parsimonious modeling of yield curves. Journal of Business, 60, 473–489.
Piazzesi, M., & Schneider, M. (2013). Trend and cycle in bond premia. Working paper, Stanford University.
Poole, W., Rasche, R., & Thornton, D. (2002). Market anticipations of monetary policy actions. Federal Reserve Bank of St. Louis Review, 84, 65–93.
Roberds, W., Runkle, D., & Whiteman, C. (1996). A daily view of yield spreads and short-term interest rate movements. Journal of Money, Credit and Banking, 28, 34–53.
Roberds, W., & Whiteman, C. (1999). Endogenous term premia and anomalies in the term structure of interest rates: explaining the predictability smile. Journal of Monetary Economics, 44, 555–580.
Sarno, L., Thornton, D., & Valente, G. (2007). The empirical failure of the expectations hypothesis of the term structure of bond yields. Journal of Financial and Quantitative Analysis, 42, 81–100.
Shiller, R., Campbell, J., & Schoenholtz, K. (1983). Forward rates and future policy: interpreting the term structure of interest rates. Brookings Papers on Economic Activity, 1, 173–217.
Thornton, D. (2005). Tests of the expectations hypothesis: resolving the anomalies when the short-term rate is the federal funds rate. Journal of Banking and Finance, 29, 2541–2556.
Thornton, D. (2006). Tests of the expectations hypothesis: resolving the Campbell-Shiller paradox. Journal of Money, Credit and Banking, 38, 2039–2071.
Thornton, D., & Valente, G. (2012). Out-of-sample predictions of bond excess returns and forward rates: an asset allocation perspective. Review of Financial Studies, 25, 3141–3168.
Timmermann, A., & Granger, C. W. (2004). Efficient market hypothesis and forecasting. International Journal of Forecasting, 20, 15–27.
Tzavalis, E., & Wickens, M. (1997). Explaining the failures of the term spread models of the rational expectations hypothesis of the term structure. Journal of Money, Credit and Banking, 29, 364–380.
van Dijk, D., & Franses, P. H. (2003). Selecting a nonlinear time series model using weighted tests of equal forecast accuracy. Oxford Bulletin of Economics and Statistics, 65, 727–744.
Vasicek, O. (1977). An equilibrium characterization of the term structure. Journal of Financial Economics, 5, 177–188.
Waggoner, D. (1997). Spline methods for extracting interest rate curves from coupon bond prices. Federal Reserve Bank of Atlanta Working Paper 97-10.

West, K. (1996). Asymptotic inference about predictive ability. Econometrica, 64, 1067–1084.
Woodford, M. (2003). Central bank communication and policy effectiveness. In The Greenspan era: lessons for the future (pp. 339–474). A symposium sponsored by the Federal Reserve Bank of Kansas City.
Woodford, M. (2012). Monetary policy in the information economy. In Economic policy and the information economy (pp. 297–370). A symposium sponsored by the Federal Reserve Bank of Kansas City.
Wu, T. (2006). Macro factors and the affine term structure of interest rates. Journal of Money, Credit and Banking, 38, 1848–1873.
Xiang, J., & Zhu, X. (2013). A regime-switching Nelson–Siegel term structure model and interest rate forecasts. Journal of Financial Econometrics, 11, 522–555.
Yu, W.-C., & Zivot, E. (2011). Forecasting the term structures of treasury and corporate yields using dynamic Nelson-Siegel models. International Journal of Forecasting, 27, 579–591.

Massimo Guidolin is a professor of Finance with Bocconi University and director of Bocconi’s FT-ranked MSc. Finance,
where he teaches courses in financial econometrics and asset pricing at the graduate level. His research has been
published in outlets such as the American Economic Review, the Journal of Financial Economics, the Review of
Financial Studies, the Journal of Financial and Quantitative Analysis, and the Journal of Econometrics. He serves on
the editorial board of a number of journals, such as the Journal of Banking and Finance, the Journal of Economic
Dynamics and Control, and the International Journal of Forecasting.

Daniel L. Thornton is president of D.L. Thornton Economics LLC. He was vice president and economic advisor at the
Federal Reserve Bank of St. Louis before retiring in August 2014. Prior to joining the Federal Reserve Bank of
St. Louis in 1981, Dr. Thornton was an associate professor of economics at Central Michigan University. Thornton
received his Ph.D. in economics from the University of Missouri-Columbia and an M.S. in economics from Arizona
State University. He has published widely in leading economics and finance journals, such as the Review of
Economics and Statistics, the Journal of Money, Credit, and Banking, the Journal of Financial Economics, and the
Review of Financial Studies. He is an Associate Editor of Applied Economics Letters and Applied Financial
Economics, a Research Fellow at the Centre for Finance and Credit Markets, a member of the Central Bank
Communication Network, and a member of the advisory board of the International Centre for Banking and Corporate
Governance. He is also a member of the Board of the St. Louis Council on Economic Education and a Trustee of the
Missouri Council on Economic Education.
