Académique Documents
Professionnel Documents
Culture Documents
Key Points
Most firms and portfolio managers rely on backtests (or historical
simulations of performance) to allocate capital to investment strategies.
After trying only 7 strategy configurations, a researcher is expected to
identify at least one 2-year long backtest with an annualized Sharpe ratio
of over 1, when the expected out of sample Sharpe ratio is 0.
If the researcher tries a large enough number of strategy configurations, a
backtest can always be fit to any desired performance for a fixed sample
length. Thus, there is a minimum backtest length (MinBTL) that should be
required for a given number of trials.
Standard statistical techniques designed to prevent regression overfitting,
such as hold-out, are inaccurate in the context of backtest evaluation.
The practical totality of published backtests do not report the number of
trials involved.
Under memory effects, overfitting leads to systematic losses, not noise.
SECTION I
Backtesting Investment Strategies
Hold-Out (1/2)
Perhaps the most common approach among
practitioners is to require the researcher to hold-out a
part of the available sample (also called test set
method).
This hold-out is used to estimate the OOS
performance, which is then compared with the IS
performance.
If they are congruent, the investor has no grounds to
reject the hypothesis that the backtest is overfit.
The main advantage of this procedure is its simplicity.
Hold-Out (2/2)
Training
Set
Unused
sample
Model
Configurations
Investment
strategy
No
IS Perform.
Evaluation
Is Perform.
Optimal?
Yes
Unused
sample
Testing
Set
Investment
strategy
Optimal Model
Configuration
OOS
Perform.
Evaluation
10
SECTION II
How Easy is to Overfit a Backtest?
1 Z1
2
1
1
1
+ Z1 1 1
2
Note: MinBTL assesses a backtests representativeness given N
trials, while MinTRL & PSR assess a track-records (single trial). See
Bailey and Lpez de Prado [2012] for further details.
14
10
1
1
+ Z 1 1 1
1 Z 1 1
0
0
100
200
300
400
500
600
700
800
900
1000
Therefore, a backtest that does not report the number of trials N used to identify
the selected configuration makes it impossible to assess the risk of overfitting.
Overfitting makes any Sharpe ratio achievable IS the researcher just needs to keep
trying alternative parameters for that strategy!!
15
SECTION III
The Consequences of Overfitting
The left figure shows the relation between SR IS (x-axis) and SR OOS (y-axis), for
= 0, = 1, = 1000, = 1000. Because the process follows a random walk, the
scatter plot has a circular shape centered in the point (0,0).
The right figure illustrates what happens once we add a model selection procedure.
Now the SR IS ranges from 1.2 to 2.6, and it is centered around 1.7. Although the
backtest for the selected model generates the expectation of a 1.7 SR, the expected
SR OOS in unchanged around 0.
18
In-Sample (IS)
Out-Of-Sample (OOS)
19
=1
>
<
Another way of introducing memory is through serialconditionality, like in a first-order autoregressive process.
= 1 + 1 1 +
= 1 + 1 +
where the random shocks are IID distributed as ~.
22
>
<
Proposition 4 reaches the same conclusion as Proposition 2 (a
compensation effect), without requiring a global constraint.
23
SECTION IV
Combinatorially-Symmetric Cross-Validation
<
=
==
2
2
2
1
2
=0
2
28
submatrices that
words, is the
matrix formed by all rows of M that are not
part of J.
c. Form a vector R of performance statistics of order N, where the nth item of R reports the performance associated with the n-th
column of J (the training set).
d. Determine the element such that , = 1, , . In
other words, = arg .
29
.
1
OOS
B
C
D
C
D
D
C
B
B
A
A
A
D
D
C
D
C
B
SECTION V
Assessing the Representativeness of a Backtest
SECTION VI
Features and Accuracy of CSCVs Estimates
SR_Case
0
0
0
0
0
0
0
0
0
0
0
0
1
1
1
1
1
1
1
1
1
1
1
1
2
2
2
2
2
2
2
2
2
2
2
2
3
3
3
3
3
3
3
3
3
3
3
3
T
500
1000
2500
500
1000
2500
500
1000
2500
500
1000
2500
500
1000
2500
500
1000
2500
500
1000
2500
500
1000
2500
500
1000
2500
500
1000
2500
500
1000
2500
500
1000
2500
500
1000
2500
500
1000
2500
500
1000
2500
500
1000
2500
N
500
500
500
100
100
100
50
50
50
10
10
10
500
500
500
100
100
100
50
50
50
10
10
10
500
500
500
100
100
100
50
50
50
10
10
10
500
500
500
100
100
100
50
50
50
10
10
10
Mean_CSCV
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
0.993
0.893
0.561
0.929
0.755
0.371
0.870
0.666
0.288
0.618
0.399
0.123
0.679
0.301
0.011
0.488
0.163
0.004
0.393
0.113
0.002
0.186
0.041
0.000
0.247
0.020
0.000
0.124
0.007
0.000
0.088
0.004
0.000
0.028
0.001
0.000
Std_CSCV
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.001
0.000
0.000
0.007
0.032
0.022
0.023
0.034
0.034
0.031
0.035
0.047
0.054
0.054
0.048
0.037
0.038
0.011
0.035
0.045
0.006
0.040
0.044
0.004
0.054
0.027
0.001
0.043
0.017
0.000
0.042
0.008
0.000
0.037
0.006
0.000
0.022
0.002
0.000
Prob_MC
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
0.991
0.872
0.487
0.924
0.743
0.296
0.878
0.628
0.199
0.650
0.354
0.093
0.614
0.213
0.000
0.413
0.098
0.002
0.300
0.068
0.000
0.146
0.011
0.000
0.174
0.005
0.000
0.075
0.001
0.000
0.048
0.002
0.000
0.010
0.000
0.000
CSCV-MC
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.002
0.021
0.074
0.005
0.012
0.075
-0.008
0.038
0.089
-0.032
0.045
0.030
0.065
0.088
0.011
0.075
0.065
0.002
0.093
0.045
0.002
0.040
0.030
0.000
0.073
0.015
0.000
0.049
0.006
0.000
0.040
0.002
0.000
0.018
0.001
0.000
39
, where
6
1 =
1 2
1 +
2
, ,
1 0, , ,
1
1 + 2
2
2 =
, ,
as a result of , <
. The integral has an upper boundary
SR_Case
0
0
0
0
0
0
0
0
0
0
0
0
1
1
1
1
1
1
1
1
1
1
1
1
2
2
2
2
2
2
2
2
2
2
2
2
3
3
3
3
3
3
3
3
3
3
3
3
T
500
1000
2500
500
1000
2500
500
1000
2500
500
1000
2500
500
1000
2500
500
1000
2500
500
1000
2500
500
1000
2500
500
1000
2500
500
1000
2500
500
1000
2500
500
1000
2500
500
1000
2500
500
1000
2500
500
1000
2500
500
1000
2500
N
500
500
500
100
100
100
50
50
50
10
10
10
500
500
500
100
100
100
50
50
50
10
10
10
500
500
500
100
100
100
50
50
50
10
10
10
500
500
500
100
100
100
50
50
50
10
10
10
Mean_CSCV
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
0.993
0.893
0.561
0.929
0.755
0.371
0.870
0.666
0.288
0.618
0.399
0.123
0.679
0.301
0.011
0.488
0.163
0.004
0.393
0.113
0.002
0.186
0.041
0.000
0.247
0.020
0.000
0.124
0.007
0.000
0.088
0.004
0.000
0.028
0.001
0.000
Std_CSCV
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.001
0.000
0.000
0.007
0.032
0.022
0.023
0.034
0.034
0.031
0.035
0.047
0.054
0.054
0.048
0.037
0.038
0.011
0.035
0.045
0.006
0.040
0.044
0.004
0.054
0.027
0.001
0.043
0.017
0.000
0.042
0.008
0.000
0.037
0.006
0.000
0.022
0.002
0.000
Prob_EVT
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
0.994
0.870
0.476
0.926
0.713
0.288
0.859
0.626
0.220
0.608
0.360
0.086
0.601
0.204
0.002
0.405
0.099
0.001
0.312
0.066
0.000
0.137
0.023
0.000
0.148
0.005
0.000
0.068
0.002
0.000
0.045
0.001
0.000
0.015
0.001
0.000
CSCV-EVT
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
-0.001
0.023
0.086
0.003
0.042
0.083
0.011
0.041
0.068
0.009
0.039
0.036
0.079
0.097
0.009
0.084
0.065
0.003
0.081
0.047
0.002
0.049
0.018
0.000
0.099
0.015
0.000
0.056
0.005
0.000
0.043
0.003
0.000
0.013
0.000
0.000
42
SECTION VI
A Practical Application
46
48
SECTION VII
Conclusions
Conclusions (1/2)
1. Backtest overfitting is difficult to avoid.
2. For a sufficiently large number of trials, it is trivial to achieve
any desired Sharpe ratio for a backtest.
3. Given that most published backtests do not report the
number of trials attempted, we must suppose that many of
them are overfit.
4. In that case, if an investor allocates capital to those
strategies, OOS performance will vary:
Conclusions (2/2)
6. Standard statistical techniques designed to detect overfitting
in the context of regression models are poorly equipped to
assess backtest overfitting.
7. Hold-outs in particular are unreliable and easy to manipulate.
8. The solution is not to stop backtesting. The answer to this
problem is to estimate accurately the risk of overfitting.
9. To address this concern, we have developed the CSCV
framework, which derives 5 metrics to assess overfitting:
52
SECTION VII
The stuff nobody reads
Bibliography (1/2)
Bailey, D. and M. Lpez de Prado (2012): The Sharpe Ratio Efficient Frontier,
Journal of Risk, 15(2), pp. 3-44. Available at http://ssrn.com/abstract=1821643
Embrechts, P., C. Klueppelberg and T. Mikosch (2003): Modelling Extremal
Events, Springer Verlag, New York.
Hadar, J. and W. Russell (1969): Rules for Ordering Uncertain Prospects,
American Economic Review, Vol. 59, pp. 25-34.
Hawkins, D. (2004): The problem of overfitting, Journal of Chemical Information
and Computer Science, Vol. 44, pp. 1-12.
Hirsch, Y. (1987): Dont Sell Stocks on Monday, Penguin Books, 1st Edition.
Leinweber, D. and K. Sisk (2011): Event Driven Trading and the New News,
Journal of Portfolio Management, Vol. 38(1), 110-124.
Lo, A. (2002): The Statistics of Sharpe Ratios, Financial Analysts Journal, (58)4,
July/August.
Lpez de Prado, M. and A. Peijan (2004): Measuring the Loss Potential of Hedge
Fund Strategies, Journal of Alternative Investments, Vol. 7(1), pp. 7-31. Available
at http://ssrn.com/abstract=641702
54
Bibliography (2/2)
55
Bio
Marcos Lpez de Prado is Senior Managing Director at Guggenheim Partners. He is also a Research
Affiliate at Lawrence Berkeley National Laboratory's Computational Research Division (U.S. Department
of Energys Office of Science).
Before that, Marcos was Head of Quantitative Trading & Research at Hess Energy Trading Company (the
trading arm of Hess Corporation, a Fortune 100 company) and Head of Global Quantitative Research at
Tudor Investment Corporation. In addition to his 15+ years of trading and investment management
experience at some of the largest corporations, he has received several academic appointments,
including Postdoctoral Research Fellow of RCC at Harvard University and Visiting Scholar at Cornell
University. Marcos earned a Ph.D. in Financial Economics (2003), a second Ph.D. in Mathematical
Finance (2011) from Complutense University, is a recipient of the National Award for Excellence in
Academic Performance by the Government of Spain (National Valedictorian, 1998) among other awards,
and was admitted into American Mensa with a perfect test score.
Marcos is the co-inventor of four international patent applications on High Frequency Trading. He has
collaborated with ~30 leading academics, resulting in some of the most read papers in Finance (SSRN),
three textbooks, publications in the top Mathematical Finance journals, etc. Marcos has an Erds #3 and
an Einstein #4 according to the American Mathematical Society.
56
Disclaimer
The views expressed in this document are the authors
and do not necessarily reflect those of the
organizations he is affiliated with.
No investment decision or particular course of action is
recommended by this presentation.
All Rights Reserved.
57
Notice:
The research contained in this presentation is the result of a
continuing collaboration with
David H. Bailey, Berkeley Lab
Jonathan M. Borwein, FRSC, FAAS
Jim Zhu, Western Michigan Univ.
The full papers are available at:
http://ssrn.com/abstract=2308659
http://ssrn.com/abstract=2326253
For additional details, please visit:
http://ssrn.com/author=434076
www.QuantResearch.info