MARK JOSHI
3. AREAS OF DERIVATIVES
FX
Equities
Fixed income
Credit derivatives
Commodities
Hybrids
Power/energy
Commodities: this is also a big growth area, with the general rally in
commodity prices in recent years. This is the area that seems to be
holding up best in the current job market.
Hybrids are derivatives that pay off according to behaviours in more
than one market; typically interest rates plus something else. The
main advantage of working on such products is the ability to learn
multiple areas.
Power and energy derivatives relate to the cost of buying and selling
electricity. The wholesale market for electricity has some unique features.
4. SORTS OF EMPLOYERS
Commercial banks ask less of you, and pay less. Better job security.
Investment banks tend to demand long hours but pay well. Not so
good job security.
Hedge funds tend to demand a lot of work. They are very volatile and
a big growth industry currently. There is the potential to make a huge
amount of money, but also the potential to be unemployed after a few
months.
In general, American banks pay better but demand longer hours than
European banks.
The big accountancy firms have quant teams for consulting. Some
places, particularly D-fine, send their employees on the Oxford Masters
course. The main disadvantage is that you are far from the action, and
high-quality individuals tend to work in banks, so it may be hard to find
someone to learn from. Related places are consultancies and insurance
companies.
5. STUDY
What should one learn? There are by now a huge number of books available.
A list of standard books is at
http://www.markjoshi.com/RecommendedBooks.html
6. FORUM
I am now running a book and careers forum, for discussing books and
getting a first job as a quant, which you can access from
http://www.markjoshi.com
Please ask me questions via this forum rather than by e-mail. (Please only
e-mail me if there is some confidential aspect to your query.)
It also now has an experimental job-wanted section. Post your profile
but not personal details and see if anyone's interested...
The amount you must study before getting a job varies a lot from place
to place. It goes up every year as it becomes more standard to do financial
mathematics degrees. At the time of writing, I would advise knowing the
contents of both my books well. A lot of candidates go wrong by reading
books instead of studying them. Pick a couple of books and pretend that
you have to do an exam on them (this is essentially what happens in an
interview); if you aren't confident that you'd get an A in that sort of
exam, don't apply for jobs.
Interviewers tend to care more about understanding the basics well
than about knowing a lot. It's also important to demonstrate genuine
interest in the field. Read the Economist and the FT or Wall Street
Journal comprehensively. It's not unusual to ask basic calculus or
analysis questions, e.g. what is the integral of log x. Asking for a
derivation of the Black-Scholes equation is very common too. They always
ask you to explain your thesis, so be prepared to do this. Have a
prepared 60-second speech on every phrase on your CV.
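For concreteness, here is what those two stock interview questions look like worked out (standard results, not taken from the guide): the integral follows from integration by parts, and the Black-Scholes equation referred to is the usual pricing PDE for a derivative value V(S, t).

```latex
% Integration by parts with u = \log x, \; dv = dx:
\int \log x \, dx = x \log x - \int x \cdot \frac{1}{x} \, dx
                  = x \log x - x + C.
% The Black-Scholes PDE for a derivative value V(S,t) on an asset S,
% with volatility \sigma and risk-free rate r:
\frac{\partial V}{\partial t}
  + \frac{1}{2}\sigma^2 S^2 \frac{\partial^2 V}{\partial S^2}
  + r S \frac{\partial V}{\partial S} - rV = 0.
```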
The interview is also a chance for you to judge them. What are they like
as people? (You will be spending most of your waking life with them so
this is important.) What do they care about, as evidenced by what they
ask you? If most of the questions are about the minutiae of C++ syntax
then be wary, unless that's the sort of job you want.
Generally, a PhD (or almost a PhD) is a necessity to get a quant job. I
would advise against starting before it's awarded, as it tends to be hard
to get it done whilst doing a busy job.
Having a master's degree in financial mathematics but no PhD tends
to lead into jobs in banking in risk or trading support but not straight
quant jobs. Banking is becoming progressively more mathematical so the
knowledge is useful in many areas in banks. Some people then manage
to move into quant later on.
In the US, it seems to be becoming more and more common to do a
master's after a PhD. This still seems to be less the case in the UK. There
is a general move towards more routine work and less research in banks
making the job less interesting. This seems to be particularly the case in
the US. One head quant told me that he regards research as something
to be contracted out to universities.
Post the global financial crisis, it has become much harder to find a job.
The problems are particularly acute at entry level. Each year it seems
to get worse, but some jobs do still exist. What does this translate into in
terms of behaviour as a job seeker?
First, you must really know your stuff. The days of hiring on the basis
of potential are gone; now make sure that you have done your preparation
and can cope with any reasonable question. This means learning
the books, being able to reproduce them and drilling interview questions
at great length. It also means spending a lot of time implementing the
models in C++ so you can demonstrate your ability to contribute from
day one.
Second, you can't assume that you'll have a lot of chances. In the past,
many candidates honed their skills by going to lots of interviews and hav-
ing their gaps discovered for them. This is not an option when only a few
places are hiring: they won't reinterview you just because you have done
a bit more preparation. Doing your preparation also means finding out
about the company and the area.
Third, don't be picky regarding area. You may well want to work with
exotic interest rate derivatives, but if all the jobs are in commodities then
accept that and plan a shift when the market does.
Fourth, don't get focussed on the salary. The money is down; the
important thing is to get some useful experience for when the market
turns around.
Fifth, do you really need to graduate this year? Spending a little longer
at university is not a bad way to sit out the crisis. You can always spend
the time broadening your knowledge and maybe even get a financial maths
research project going.
9. FOR PURE MATHEMATICIANS
10. CODING
All forms of quants spend a large amount (i.e. more than half) of their
time programming. However, implementing new models can be interesting
in itself. The standard programming approach is object-oriented
C++. A wannabe quant must learn C++.[1] Some places use MATLAB and
that is also a useful skill, but less important. VBA is also used a lot,
but there is a general attitude that you can pick it up on the job. If a
job is very VBA-focussed that's generally a bad sign.
All of the finance forums have their own jobs advertising boards.
Another useful site, containing a distilled version of this guide, is
https://www.financejobs.co/
Some adverts are from recruitment consultants rather than from banks.
It is important to realize that the job may not even exist; the
consultant wants to get decent candidates that he can then try to place
in banks. The consultant gets a commission from the bank if he can place
you. They tend to have short attention spans. If you do well at the first
couple of interviews then they will work hard to get you a good job, but
if you don't they will quickly lose interest. Also, be aware that their
agenda is to get a good commission rather than to help you, so they will
push you at jobs on that basis. (A typical cut is 25% of your first
year's package, so whether you say yes to a job makes a difference of ten
thousand pounds
to them.) If you want to understand them, think of estate agents.
In fact, going via a recruitment consultant is the standard way to get
a job. Quants are generally not hired as a part of the on-campus
recruitment process but instead hired as they are needed by the team.
That said, it is worthwhile to go to presentations and to meet the
people, and get their contact details for later. Because of this it is
not a great idea to start applying a long time before you want to
start. Banks tend not to be into
paying expenses for interviews. One therefore needs to go to London or
New York and attempt to get as many interviews as possible as quickly as
possible.
If you have personal contacts, you should use them. Employers prefer
not to use headhunters if they can avoid it. If you are finishing a maths or
physics PhD from a top university you will be a hot property. Employers
will be keen to get you before someone else grabs you, so make use of
this.
Recruitment agencies vary tremendously and are discussed at great
length on all the online forums. One which seems to know what it is
doing more than most, and which has its own much more extensive guides,
is paulanddominic.

[1] I have no opinion on whether this should be the correct language for
implementing; it is merely the correct language for getting a job.
If you get offered a job that is not in your ideal area, do not be too
worried. It is the first job that is hard to get. You can move on. The main
thing is not to spend more than a couple of years in an area where you do
not want to be. Quants are most employable with 18 months to 2 years
experience. With more than that they tend to be too well paid and get
pigeon-holed.
From time to time, I hear of someone being offered a job and being told
they must accept immediately or within 24 hours. This is unreasonable;
you should question why they are doing this, and ask yourself whether
you want to work with someone who treats you this way. Possible
responses are:
"Why?"
"Does that mean the offer will go away if I don't accept immediately?"
"Oh, I get it, you are testing my naivety" (and laugh).
"Mark Joshi's guide says never to accept an offer made under such
circumstances."
If you are interviewing with other places, call them first and tell them
the circumstances; they will find this less annoying than you telling
them you accepted a job under pressure.
I regularly get berated by recruitment consultants for my comments in
this section. Here is one rebuttal:
While it is true that the recruitment/consulting agency market has
become overly saturated and commoditized, as people working in the
industry realized they only really need a phone and a computer to start
off on their own in this business, I believe that our current state of
affairs has weeded out many of those non-reputable firms or one-man body
shops looking for a quick placement to make a few dollars. There are
definite advantages to working with a reputable recruitment firm, and I
do agree with Mark's assessment that care must be taken in learning
about who you are dealing with prior to just turning over your CV
blindly to them. A few quick tips: look at the firm's website; if they
do not have one, this should be your first red flag. When were they
established? Who are their clients? What types of firms are they
representing? Chances are that if they have been in business for a
number of years and have a good client list, they are more than likely
reputable. Typically, large investment banks and organizations go to
great lengths to establish their preferred vendors list of allowable
recruitment agencies. Please ensure that you are represented
directly to the client, as some agencies will attempt to represent you
via a third party; this is not recommended. If an accurate job
description and the direct client's name are provided, you can probably
be assured the person you are dealing with is reputable. The advantage
is this: when you apply directly to a large investment bank, your name,
CV, and contact info are entered into a large, complex database
containing thousands of potential candidates. These resumes are funneled
through channels to an internal recruiter, who gets hundreds of resumes
daily for a variety of different jobs, not just quant jobs, whatever
happens to be a priority that day. He will focus on maybe a few of the
resumes he gets every day and the rest will be filed for future
reference, which never happens. If you do not hear anything and keep
applying, your CV and profile are tagged as a serial applicant and you
will no longer be considered for positions within the bank, out of the
sheer fact you seemed desperate, even though you had never heard
anything so just kept applying. If, on the other hand, I call you, or
get back to you in
regards to one of my posts, say on LinkedIn, I am contacting you for a
specific job, for a specific manager, whom I speak with on nearly a
daily basis. I have the advantage of submitting you directly to the
hiring manager. I literally put your resume right into his hands, with a
brief summary of your skills and why I think you are a match for the
role. I can almost guarantee you an interview in most cases where I
submit your profile, and I provide insights from others who have
interviewed with the same managers in the past, possibly providing
potential questions he may ask and the answers he is looking for. The
rest is up to you. I do believe there is, in fact, an inherent value
here: in a world of online applications, databases, vendor management
offices, etc., it is still nice to know there are agencies out there who
go above and beyond to see you placed. It makes no sense for me to place
someone in a role in which he/she will not be happy, for a few reasons:
A) if the person leaves, a portion of my fee needs to be refunded, and
B) if the candidate is not happy and does not perform, I look bad to my
client. In closing, I don't think it prudent to treat potential quants
like lost children who need to be sheltered and shown the light; chances
are, if they have not seen it already, they will be eaten alive in the
world of high finance.
Dallin T. Swenson
Account Executive
dswenson@softinc.com
www.softinc.com
12. PAY
How much does a quant earn? A quant with no experience will gener-
ally get between 40 and 70k pounds. The lowest I have heard of is 25k and
the highest is 70. If the pay is outside the standard range, you should
ask yourself why. Pay will generally go up fairly rapidly. Bonuses are
generally a large component of total salary, and should be taken into account
when negotiating pay. E.g. you may be able to get a guaranteed bonus if
the base is lower.
Do not get too focussed on what the starting salary is. Instead examine
what the job opportunities will be, and what the learning experience is
likely to be. How much turnover is there in the team? (Some managers
get touchy if asked about turnover, so it may be better to try to
ascertain this indirectly.) And where do the people go?
13. HOURS
How hard does a quant work? This varies a lot. At RBS we got in be-
tween 8.30 and 9 and went home around 6pm. The pressure varied. Some
of the American banks expect much longer hours. Wall St tends to be
more demanding than the City. In London five to six weeks' holiday is
standard. In the US two to three is standard.
14. INTERVIEWING
Here are some dos and don'ts that will reduce your chance of messing
up unnecessarily.
Don't be late.
Don't be early; this annoys the interviewer. Get there early, go to a
cafe, have a lemonade, and turn up dead on time.
Do eat a good meal beforehand; sugar lows destroy thinking power.
Don't argue with the interviewer about why they've asked you something.
They've asked you it because they want to know whether you can do it.
Do appear enthusiastic.
Do wear a suit.
Do be eager to please. They want someone who'll do what they want; you
must give the appearance of being obliging rather than difficult.
Don't be too relaxed; they may well conclude that you aren't hungry
enough for success to work hard.
Don't tell them they shouldn't use C++ because "my niche language is
better".
Do demonstrate an interest in financial news.
Do be able to talk about everything on your CV (resume in American).
Have a prepared 2-minute response on every phrase on it.
Do bring copies of your CV.
Don't expect the interviewer to be familiar with your CV.
Don't say you've read a book unless you can discuss its contents;
particularly if they've written it.
Do be polite.
Do ask for feedback, and don't argue about it. Even if it's wrong, try
to understand what made the interviewer think that.
Don't say you want to work in banking for the money; of course you do,
but it's bad form to say so.
Do say you want to work closely with other people rather than solo.
Don't say that you think that bankers are reasonable people; they
aren't.
Do take a break from interviewing and do more prep if more than a
couple of interviews go badly.
Don't use a mobile for a phone interview.
Do be able to explain your thesis; work out explanations for different
sorts of people in advance.
Don't expect banks in the UK to pay for interview expenses. If they do
agree to pay, make sure they are willing to pay what your ticket will
cost; e.g. don't get an expensive ticket if they say they'll pay for a
cheapo airline.
Do ask about the group you'll be working in: e.g. turnover, where
people go when they leave, how many people there are, when you can meet
the rest of the group (only if an offer appears imminent), how old the
group is, what the team's raison d'être is, and whether it is expanding
or contracting. What would a typical working day be?
Don't get on to the topic of money early in the process.
If you are good at maths and do your preparation, you can be at that
level and get a job.
15. THE CQF
I get more e-mails on this topic than any other. I have little direct
experience of it. However, here are my impressions from others.
First, the CQF stands for the Certificate in Quantitative Finance and
is run by 7City training. This organization was created by quant author
Paul Wilmott of wilmott.com. Wilmott also created the diploma in Math-
ematical Finance at the University of Oxford before parting company with
that organization.
The CQF is a six-month part-time course which is available by distance
learning. Its aim is to teach the attendee how to be a quant.
Here are some comments from a recent satisfied customer who was
already working in banking:
The CQF is an excellent course that is like a condensed, accelerated MSc
in Mathematical Finance. The CQF covers the basics plus a lot of
practical stuff like C++ and Excel VBA, and advanced topics like
uncertain parameters and stochastic volatility. It has definitely opened
a lot of doors for me
that were previously closed, and it is becoming more and more recognised
within the industry. The whole thing takes 6 months, with a module per
month. Each module consists of 4 or 5 sections with homework set at the
end of each one. There is an exam at the end of each module, where you
need to score 60% or above to progress to the next module. If anyone fails
a module, they are given a reading list and encouraged to join the course
at the same point six months later - i.e. with enough will no-one fails.
The final exam is a programming project where you're given a Monte Carlo
and FDM scenario to code up. The content of the course is heavily
mathematical with no holds barred - stochastic calculus, derivation of
Black-Scholes, BS with dividends, BS with discrete hedging, stochastic
vol, jump diffusion, calibration, interest rate models, credit models,
etc. Foundational mathematics is given prior to the start of the course
if required,
and new entrants are required to sit a small exam to test out their ability
to do the course (basic calculus, linear algebra and probability type ques-
tions). All exams are done at home, except for a final one at the very end
of the course, after the module exams, which is optional and determines if
you get a distinction. A distinction is basically an asterisk by your
name in the FT.
Another recent attendee says that it inevitably covers less than an MSc,
since it is part-time over six months, versus one year full-time or two
years part-time for an MSc. He also thought it was well suited to those
already with day jobs, and valuable for career development for those
wanting to move into more quantitative areas.
A general impression seems to be that it is easy to pass the course, but
getting a distinction requires some real work and ability.
A head-hunter suggests that it is more useful for those already working
in banking who want to change areas than for moving into banking.
Some comments made by Urnash on nuclearphynance (where you can find
further discussion):
I was looking for 1) a way to learn those parts of quantitative finance
that every quant should know, but that I haven't learned so far (because
they are not used at my current job) and 2) something that can be done in
less than a year full-time (for personal reasons). Since this ruled out
every program for a Master in Financial Engineering, I chose the CQF.
Of course, I could have simply bought a couple of books and worked
through them myself. However, I learn much better if I know that I have
to read something this weekend, since I will have to answer some
questions about it before Monday, and that I'll have an exam on it in
two weeks' time.
What do you learn? Certainly not everything which is mentioned in the
many books that one receives (all published by Wiley, surprise,
surprise). Not even everything which is written in the 2nd edition of
Paul's book. The span of the course is much wider. So you do not only
work with PDEs; you also get quite a bit about the martingale approach,
quite a bit on credit derivatives, a lecture on portfolio optimisation,
a lecture by Ayache (ITO33) on convertible bonds, Jaeckel on Monte
Carlo, etc. However, note that the program changes continuously, so I do
not know what the current program is.
Before giving a list of what I liked and disliked about the CQF, you
should know the following: I took the distance course, since I do not
live in London. This means that I took it through the internet. You can
see the presenter and what he writes, and you can ask questions either
through IM or a microphone.
Furthermore, I already knew approximately 50-60% of what was taught in
the CQF, so I do not know what happens if you start the CQF without any
knowledge about quantitative finance. My idea is that it would be quite
hard. My advice would be to browse for a short period in either Paul's book
While the explanation of the grading of most of the exams was succinct,
but clear enough, the explanation of the grading of Module 6 was, with
all due respect, laughable. All I got was a single line of comment. And
this as a reply to a bound 15-page booklet with a CD-ROM that I had to
send to the CQF organisers by priority mail!
There is a helpdesk on the CQF website which one can use to ask the
organisers questions. Since e-mails tend nowadays to get lost in a spam
filter, I used it a lot. While some of my queries were answered quickly,
others were not answered at all. After a long mail to the CQF
organisers, they told me that while the helpdesk is there for the
delegates, it is still better to contact the presenters directly. If
this is the case, why does the helpdesk exist?
While the list of things I have not liked is longer than the list of
what I did like, this does not mean that I did not like the course.
However, the problems with the connection and the fact that some
helpdesk questions were not answered at all made it a bit hard for
those, like me, who take the distance learning course to receive the
same amount of tutoring as those who took the course in the classroom.
And tutoring through e-mail and the web is what should make an internet
course different from a set of taped lectures on a DVD.
A general complaint is that it's expensive for what it is.
Paul Wilmott is someone who arouses strong emotions in the quantitative
finance community, and certainly some people are against the
qualification for that reason.
The bottom line seems to be: worth doing if you want to move areas
within banking and your employer is willing to pay, but not the way to
get your first quant job after university.
There are by now a large number of online forums where these sorts of
questions are discussed to death. I keep an up-to-date list on
www.markjoshi.com. I also run a forum on www.markjoshi.com for
discussing books and career issues.
17. EXAMS
There has been a shift towards the use of written exams to sift entry-
level candidates. There is a certain degree of fairness in this approach.
The main issue tends to be that the questions are fitted very much to
the setter's prejudices, but this is true of all interviews in any case.
Along with this is the shift to associates programmes specifically for
entry-level quants instead of hiring them as needed. For example, Barcap
has a quantitative associates program that only has intake at specific
times.
http://www.barcap.com/campusrecruitment
18.2. Applying for a job in Italy. If you're looking for your first job,
headhunters won't help you that much. Italian headhunters tend to pay
attention to candidates who already have some years of professional
experience. Nowadays you're on the right track to get your first job
when you:
When you're looking for your first job, it's really important to be
employable for an internship. I'd like to stress this point because
starting as a stageur is a good way to become an employee in a few
months. Internships can last from four months up to one year. During
this period a project will be assigned to you and a tutor will train
you. Detailed information on the rules governing internships in Italy
can be found here:
http://www.sportellostage.it/aziende/normativa.htm.
Here are some comments from an Australian who did his PhD in pure
mathematics in Japan, and then went looking for a quant job.
I applied to banks in Japan through their standard new grad recruitment
programme (undergrads and postgrads together; note this seems to be
different to how a potential PhD applies in the UK). After many info
seminars and early-stage interviews, I got a much better idea of people
and roles in a bank. In fact I decided to go for trader/structurer roles
instead of quant.
The rates hybrids desk at an international bank said if I really wanted
to start immediately in trading then they'd let me (at this stage I had
the leverage of another firm's structurer offer in my pocket), but
they'd like me to work as a quant for two years first. They said the
best traders know their models inside-out. I liked all the people there
and I trusted what they said, so I accepted their offer.
I have heard the following from a few quants with science PhDs.
I would like to make this guide more dynamic by including the latest
gossip and stories of job applicants. So send me, mark@markjoshi.com,
your experiences, including info such as
23. COURSES
Once you've got that job, the firm will generally be willing to send you
on at least one training course. Please consider attending one of mine.
My next course will be in Sydney in December 2011 and will cover the
LIBOR market model and its kooderive (i.e. CUDA) implementation. I also
keep a list on www.markjoshi.com.
24. ADVERTISING
Various recruitment agencies and courses have asked for plugs. If you
would like to advertise in this guide, e-mail mark@markjoshi.com.
25. REPRODUCTION
Please don't copy this guide onto your website. I am happy for you
to include extracts and a deep-link to it, however, and I will not move
the guide's web location. The reason for this is that I update the guide
regularly and I do not want there to be lots of versions floating around
which I then have to police.
The Essential Algorithmic Trading Reading List
Michael Halls-Moore, QuantStart.com
Thank you for signing up to the QuantStart mailing list and receiving the Algorithmic Trading
Toolbox. As part of the toolbox I wanted to provide a comprehensive reading list to help you get
up to speed with algorithmic trading. Algorithmic trading covers a broad range of topics and as such
it can be extremely confusing for a beginner to know where to start. For this reason I have labelled
each book as "Beginner" or "Advanced". If you have no prior background with algorithmic trading,
then I suggest consulting the beginner texts and working your way through to the advanced books.
Everyone who reads this list will have taken a very different educational path. Some of you may be
experienced discretionary traders who are interested in automating your strategies, but haven't
coded in a programming language or delved into advanced mathematics before. Others of you may
have a PhD in statistics or machine learning but have never applied your skills to the financial
markets. I have tried to create a "one size fits all" list, but obviously it will need to be tailored to
your particular skillset and interests. I hope the list will be of interest to both retail traders who want
to "test the quantitative waters" as well as seasoned hedge fund professionals who are looking for a
new approach to their trading.
The approach I've taken is to introduce you to the necessary mathematics that will help you get up
to speed in creating your algorithmic trading strategies. You can of course skip these books if you
want to "dive in" or if you have an extensive mathematical background. However, if you haven't
taken a first year university level course in Probability, Calculus or Linear Algebra, you may find
the subsequent texts hard going.
I'm well aware that the length of the list can be off-putting to a beginner! Clearly it is unrealistic to
consider reading all of these books from cover-to-cover. There are only 24 hours in the day, after
all! In my own personal reading, I tend to concentrate on specific chapters of individual books. I re-
read those chapters multiple times when necessary. Knowing the basics extremely well is much
more important than having an encyclopaedic knowledge of all statistical machine learning and
time series models.
Necessary Mathematics
This is an optional section and is only suitable for those who have no university mathematics
background. In order to tackle this section you should be familiar with mathematics to a UK A-
Level or European International Baccalaureate (IB) level. I believe this is equivalent to senior high-
school mathematics in the US. In order to tackle these following books you should be familiar with
basic differentiation and integration techniques, trigonometry, and perhaps some exposure to
matrices and ordinary differential equations. If these topics are unfamiliar to you, it
may be necessary to take some more elementary mathematics courses, perhaps from an online
MOOC site such as Coursera or Khan Academy, prior to tackling the books below.
The mathematics of quantitative trading differs significantly from that of derivative pricing, which
is also known as "mathematical finance", "financial engineering" or "quantitative finance".
Unfortunately all of these phrases are vague and only serve to confuse beginners coming into
finance! Derivatives pricing makes extensive use of upper undergraduate mathematics such as
partial differential equations, stochastic calculus, advanced linear algebra and vector analysis. There
is not a great deal of stochastic calculus in general algorithmic trading, unless you are considering
options or volatility trading, in which case you will need to be aware of stochastic calculus, the
Black Scholes model and its extensions.
Schaum's Outline of Probability and Statistics - John Schiller, R. Alu Srinivasan, Murray
Spiegel [BEGINNER]
If you have no probability or statistics background whatsoever, this is a great book with which to
gain familiarity. As I mention below, Schaum's Guides are great if you enjoy learning by working
through a lot of questions. This book begins with very elementary concepts in probability and
slowly leads up to basic intuition for frequentist statistical modelling via null hypothesis testing.
The Elements of Statistical Learning - Trevor Hastie, Robert Tibshirani, Jerome Friedman
[ADVANCED]
If I were forced to recommend only one book from the entire list presented here, this is the book I
would suggest. It is an absolutely exceptional book on how to create modern statistical machine
learning techniques. Note that this is not a beginner book! It requires a solid grounding in linear
algebra, calculus and probability. However, it presents and elucidates upon all of the necessary
issues and trade-offs that arise in creating machine learning models, as well as providing a solid
statistical basis for each model. Understanding this book will give you a "feel" for how to create
new models, as well as the limitations of machine learning. The best part is that the ebook version
can be found completely for free on the authors' website:
http://statweb.stanford.edu/~tibs/ElemStatLearn/printings/ESLII_print10.pdf
Schaum's Outline of Statistics and Econometrics, 2nd Edition - Dominick Salvatore, Derrick
Reagle [BEGINNER]
For those of you who like the Q&A approach to self-study, the Schaum's Guides are fantastic. This
book in particular will take you from no statistical background whatsoever to a place where you can
carry out basic time series modelling. The format, as with all Schaum's Guides, is to learn by doing
a lot of questions, around half of which have model answers inline with the questions, while the rest
can be found at the back. I've read many of these books over the years and have always found them
to be a great way to learn.
General Trading
As I mentioned above, some of you may have little experience with the financial markets or
discretionary (i.e. non-algorithmic) trading. Hence I've listed some of the more useful trading texts
that will help you get a feel for how professional trading is carried out.
Market Wizards: Updated Interviews With Top Traders - Jack Schwager [BEGINNER]
A few of my discretionary trader friends who work in institutional settings said that this was the text
that they were given to read when they first started trading their own book. While the period of
coverage is well in the past now, the mentality of the traders and the pearls of wisdom gleaned
make this a worthwhile addition to the bookshelf. If you enjoy the interview style of this book then
there are also two other books in the series: The New Market Wizards and Hedge Fund Market
Wizards.
Following the Trend: Diversified Managed Futures Trading - Andreas Clenow [BEGINNER]
This is a more casual read from a practising professional futures trader. It describes the nuances of
how futures are traded in practice, with some basic trend following algorithms, along with a healthy
dose of real-world risk management techniques. Perhaps the most interesting aspect of the book is
the diary, which allows one to see how such trend-following models work in practice over certain
periods of time. It also briefly discusses how to run a trading firm from an entrepreneurial point of
view, for those considering a career at a Commodity Trading Advisor (CTA), an asset management firm or a hedge fund.
Algorithmic/Quantitative Trading
Finally we come to the process of creating algorithmic/quantitative trading models and
implementing them against live markets. Having read the prior books on mathematics, statistical
and time series modelling, as well as some basic trading concepts, you will be in a good position to
tie it all together to create live automated trading strategies.
Depending upon your programming expertise and the required level of automation and redundancy,
you will either make use of external vendor software such as MT4 (for forex) or create your own
custom end-to-end backtesting and trading system against a brokerage such as Interactive Brokers,
OANDA or Dukascopy. There aren't many books that really go into the detail of how to implement
an end-to-end trading system, but the following go quite far into discussing what you'll need to
know:
Quantitative Trading: How to Build Your Own Algorithmic Trading Business - Ernest Chan
[BEGINNER]
This is probably the best book to read as a beginner entering quantitative trading. Ernest Chan does
a great job of outlining all of the issues that will affect a retail quantitative trader. The book is not
heavy on particular strategies, but rather discusses the other important issues in quant trading such
as risk management, position sizing, portfolio management and how to run an algorithmic trading
business. All strategies and techniques are coded in MATLAB.
Algorithmic Trading: Winning Strategies and Their Rationale - Ernest Chan [ADVANCED]
This book is a great follow-on from Chan's previous book. It provides many more trading strategies
and definitely shows how Chan's own experience has developed since the previous book. The book
is definitely more technical and you will need to be aware of basic time series analysis methods (or
at least how to understand them in the context of this book!) in order to get the most out of it. The
book is particularly good in discussing strategies for futures and forex, which are areas not often
discussed in algorithmic trading books. Once again, all models and trading code are implemented in
MATLAB.
Inside the Black Box: A Simple Guide to Quantitative and High Frequency Trading - Rishi
Narang [ADVANCED]
This was one of the first books about institutional quantitative trading that I read when I began my first role at a quant fund. It is written for investors who are considering investing in
quantitative strategies and has been designed to provide an insight into all aspects of the "black
box" so that these investors can make informed decisions as to whether to invest. However, it also
provides a fantastic non-technical overview into how an entire quantitative trading strategy is set up
and carried out in practice. The second edition discusses high-frequency trading (HFT) in detail.
Algorithmic Trading and DMA: An Introduction to Direct Access Trading Strategies - Barry
Johnson [ADVANCED]
The phrase 'algorithmic trading', in the financial industry, usually refers to the execution algorithms
used by banks and brokers to execute efficient trades. I am using the term to cover not only those
aspects of trading, but also quantitative or systematic trading. This book is mainly about the former,
being written by Barry Johnson, who is a quantitative software developer at an investment bank.
Does this mean it is of no use to the retail quant? Not at all. Possessing a deeper understanding of how exchanges and "market microstructure" work can immensely aid the profitability of retail strategies. Despite being a heavy tome, it is worth picking up.
Quantitative trader roles within large quant funds are often perceived to be one of the most
prestigious and lucrative positions in the quantitative finance employment landscape. Trading
careers in a "parent" fund are often seen as a springboard towards eventually forming one's own fund, with an initial capital allocation from the parent employer and a list of early investors to bring on board.
Competition for quantitative trading positions is intense and thus a significant investment of time
and effort is necessary to obtain a career in quant trading. In this article I will outline the common
career paths, routes into the field, the required background and a self-study plan to help both retail
traders and would-be professionals gain skills in quantitative trading.
Setting Expectations
Before we delve into the lists of textbooks and other resources, I will attempt to set some
expectations about what the role involves. Quantitative trading research is much more closely
aligned with scientific hypothesis testing and academic rigour than the "usual" perception of
investment bank traders and the associated bravado. There is very little, if any, discretionary input when carrying out quantitative trading, as the processes are almost universally automated.
The scientific method and hypothesis testing are highly-valued processes within the quant finance
community and as such anybody wishing to enter the field will need to have been trained in
scientific methodology. This often, but not exclusively, means training to doctoral research level, usually via a PhD or a graduate-level Masters in a quantitative field. Although one can break into quantitative trading at a professional level by alternative means, it is not common.
The skills required of a sophisticated quantitative trading researcher are diverse. An extensive background in mathematics, probability and statistical testing provides the quantitative base on
which to build. An understanding of the components of quantitative trading is essential, including
forecasting, signal generation, backtesting, data cleansing, portfolio management and execution
methods. More advanced knowledge is required for time series analysis, statistical/machine learning
(including non-linear methods), optimisation and exchange/market microstructure. Coupled with
this is a good knowledge of programming, including how to take academic models and implement
them rapidly.
This is a significant apprenticeship and should not be entered into lightly. It is often said that it
takes 5-10 years to learn sufficient material to be consistently profitable at quantitative trading in a
professional firm. However the rewards are significant. It is a highly intellectual environment
with a very smart peer group. It will provide continuous challenges at a fast pace. It is extremely
well remunerated and provides many career options, including the ability to become an
entrepreneur by starting your own fund after demonstrating a long-term track record.
Necessary Background
It is common to consider a career in quantitative finance (and ultimately quantitative trading
research) while studying on a numerate undergraduate degree or within a specialised technical
doctorate. However, the following advice is applicable to those who wish to transition into a quant trading career from another field, albeit with the caveat that the move will take somewhat longer and will involve extensive networking and a lot of self-study.
At the most basic level, professional quantitative trading research requires a solid understanding
of mathematics and statistical hypothesis testing. The usual suspects of multivariate calculus, linear
algebra and probability theory are all required. A good mark in an undergraduate degree in mathematics or physics from a well-regarded school will usually provide you with the necessary background.
If you do not have a background in mathematics or physics then I would suggest that you should
pursue a degree course from a top school in one of those fields. You will be competing with
individuals who do have such knowledge and thus it will be highly challenging to gain a position at
a fund without some definitive academic credentials.
In addition to having a solid mathematical understanding it is necessary to be adept at
implementation of models, via computer programming. The common choices of modelling
languages these days include R, the open-source statistical language; Python, with its extensive data
analysis libraries; or MATLAB. Gaining extensive familiarity with one of these packages is a
necessary prerequisite to becoming a quantitative trader. If you have an extensive background in
computer programming, you may wish to consider gaining entry into a fund via the Quantitative
Developer route.
The final major skill needed by quantitative trading researchers is that of being able to objectively
interpret new research and then implement it rapidly. This is a skill learned via doctoral training
and one of the reasons why PhD candidates from top schools are often the first to be picked for
quantitative trading positions. Gaining a PhD in one of the following areas (particularly machine
learning or optimisation) is a good way into a sophisticated quant fund.
The main techniques of interest include Multivariate Linear Regression, Logistic Regression,
Resampling Techniques, Tree-Based Methods (including Random Forests), Support Vector
Machines (SVM), Principal Component Analysis (PCA), Clustering (K-Means, Hierarchical),
Kernel Methods and Neural Networks. Each of these topics is a significant learning exercise in
itself, although the above two texts will cover the necessary introductory material, providing further
references for deeper study.
A particularly useful (and free!) set of web courses on Machine Learning/AI are provided by
Coursera:
Machine Learning by Andrew Ng - This course covers the basics of the methods I have
briefly mentioned above. It has received high praise from individuals who have participated.
It is probably best watched as a companion to reading ISL or ESL, which are two books
mentioned in the Essential Algorithmic Trading Reading List PDF.
Neural Networks for Machine Learning by Geoffrey Hinton - This course focuses primarily
on neural networks, which have a long history of association with quantitative finance. If
you wish to specifically concentrate on this area, then this course is worth taking a look at,
in conjunction with a solid textbook.
Statistical learning is extremely important in quant trading research. We can bring to bear the entire
weight of the scientific method and hypothesis testing in order to rigorously assess the quant
trading research process. For quantitative trading we are interested in testable, repeatable results
that are subject to constant scrutiny. This allows easy replacement of trading strategies as and when
performance degrades. Note that this is in stark contrast to the approach taken in "discretionary"
trading where performance and risk are not often assessed in this manner.
Why Should We Use The Scientific Method In Quantitative Trading?
The statistical approach to quant trading is designed to eliminate issues that surround discretionary
methods. A great deal of discretionary technical trading is rife with cognitive biases, including loss
aversion, confirmation bias and the bandwagon effect. Quant trading research uses alternative
mathematical methods to mitigate such behaviours and thus enhance trading performance.
In order to carry out such a methodical process quant trading researchers possess a continuously
skeptical mindset and any strategy ideas or hypotheses about market behaviour are subject to
continual scrutiny. A strategy idea will only be put into a "production" environment after extensive
statistical analysis, testing and refinement. This is necessary because the market has a rather low
signal-to-noise ratio. This creates difficulties in forecasting and thus leads to a challenging trading
environment.
What Modelling Problems Do We Encounter In Quantitative Finance?
The goal of quantitative trading research is to produce algorithms and technology that can satisfy a
certain investment mandate. In practice this translates into creating trading strategies (and related
infrastructure) that produce consistent returns above a certain pre-determined benchmark, net of
costs associated with the trading transactions, while minimising "risk". Hence there are a few levers
that can be pulled to enhance the financial objectives.
A great deal of attention is often given to the signal/alpha generator, i.e. "the strategy". The best
funds and retail quants will spend a significant amount of time modelling/reducing transaction
costs, effectively managing risk and determining the optimal portfolio. This PDF is primarily aimed
at the alpha generator component of the stack, but please be aware that the other components are of
equal importance if successful long-term strategies are to be carried out.
We will now investigate problems encountered in signal generation and how to solve them. The
following is a basic list of such methods (which clearly overlap) that are often encountered in signal
generation problems:
Forecasting/Prediction - The most common technique is direct forecasting of a financial
asset price/direction based on prior prices (or fundamental factors). This usually involves
detection of an underlying signal in the "noise" of the market that can be predicted and thus
traded upon. It might also involve regressing against other factors (including lags in the
original time series) in order to assess the future response against future predictors.
Clustering/Classification - Clustering or classification techniques are methods designed to group data into certain classes. These can be binary in nature, e.g. "up" or "down", or multi-class, e.g. "weak volatility", "medium volatility", "strong volatility".
Sentiment Analysis - More recent innovations in natural language processing and
computational speed have led to sophisticated "sentiment analysis" techniques, which are
essentially a classification method, designed to group data based on some underlying
sentiment factors. These could be directional in nature, e.g. "bullish", "bearish", "neutral" or
emotional such as "happy", "sad", "positive" or "negative". Ultimately this will lead to a
trading signal of some form.
Big Data - Alternative sources of data, such as consumer social media activities, often lead
to terabytes (or greater) of data that requires more novel software/hardware in order to
interpret. New algorithm implementations have been created in order to handle such "big
data".
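The forecasting idea above can be sketched very simply: regress each value of a series on its previous value (an AR(1)-style fit) and use the fitted line for a one-step-ahead prediction. The series below is a noise-free toy example, so the fit recovers the coefficient exactly; everything here is illustrative.

```python
def fit_ar1(series):
    # Least-squares fit of x[t] = a + b * x[t-1]
    xs, ys = series[:-1], series[1:]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return a, b

# Toy series obeying x[t] = 0.5 * x[t-1] exactly, so the fit recovers b = 0.5
series = [1.0, 0.5, 0.25, 0.125, 0.0625]
a, b = fit_ar1(series)
forecast = a + b * series[-1]   # one-step-ahead prediction
```

With real market data the relationship is buried in noise and the fitted coefficients would be far less clean, which is precisely why the statistical machinery discussed below matters.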
Modelling Methodology
I've provided some key Machine Learning textbooks in the accompanying PDF The Essential
Algorithmic Trading Reading List and they will discuss the following topics and models, which
are necessary for a beginning quant trader to know:
Statistical Modelling and Limitations - The books will outline what statistical learning is
and isn't capable of along with the tradeoffs that are necessary when carrying out such
research. The difference between prediction and inference is outlined as well as the
difference between supervised and unsupervised learning. The bias-variance tradeoff is also
explained in detail.
Linear Regression - Linear regression (LR) is one of the simplest supervised learning
techniques. It assumes a model where the predicted values are a linear function of the
predictor variable(s). While this may seem simplistic compared to the remaining methods in
this list, linear regression is still widely utilised in the financial industry. Being aware of LR
is important in order to grasp the later methods, some of which are generalisations of LR.
Supervised Classification: Logistic Regression, LDA, QDA, KNN - Supervised
classification techniques such as Logistic Regression, Linear/Quadratic Discriminant
Analysis and K-Nearest Neighbours are techniques for modelling qualitative classification
situations, such as prediction of whether a stock index will move up or down (i.e. a binary
value) in the next time period.
Resampling Techniques: Bootstrapping, Cross-Validation - Resampling techniques are
necessary in quantitative finance (and statistics in general) because of the dangers of model-
fitting. Such techniques are used to ascertain how a model behaves over different training
sets and how to minimise the problem of "overfitting" models.
Decision Tree Methods: Bagging, Random Forests - Decision trees are a type of graph
that are often employed in classification settings. Bagging and Random Forest techniques
are ensemble methods making use of such trees to reduce overfitting and reduce variance in
individually fitted supervised learning methods.
Neural Networks - Artificial Neural Networks (ANN) are a machine learning technique
often employed in a supervised manner to find non-linear relationships between predictors
and responses. In the financial domain they are often used for time series prediction and
forecasting.
Support Vector Machines - SVMs are also classification or regression tools, which work
by constructing a hyperplane in high- or infinite-dimensional spaces. The kernel trick allows
non-linear classification to occur by a mapping of the original space into an inner-product
space.
Unsupervised Methods: PCA, K-Means, Hierarchical Clustering, NNMF - Unsupervised
learning techniques are designed to find hidden structure in data, without the use of an
objective or reward function to "train" on. Additionally, unsupervised techniques are often
used to pre-process data.
Ensemble Methods - Ensemble methods make use of multiple separate statistical learning
models in order to achieve greater predictive capability than could be achieved from any of
the individual models.
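To make two of the topics above concrete, here is a small sketch combining simple linear regression with k-fold cross-validation, written in plain Python; the data are synthetic and noise-free, so the cross-validated error comes out at essentially zero.

```python
def ols_fit(xs, ys):
    # Simple linear regression y = a + b*x, fitted by least squares
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b

def k_fold_mse(xs, ys, k=3):
    # Cross-validation: fit on k-1 folds, score on the held-out fold
    errors = []
    for fold in range(k):
        train = [(x, y) for i, (x, y) in enumerate(zip(xs, ys)) if i % k != fold]
        held = [(x, y) for i, (x, y) in enumerate(zip(xs, ys)) if i % k == fold]
        a, b = ols_fit([x for x, _ in train], [y for _, y in train])
        errors.extend((y - (a + b * x)) ** 2 for x, y in held)
    return sum(errors) / len(errors)

xs = [1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [2 * x + 1 for x in xs]     # a noise-free linear relationship
mse = k_fold_mse(xs, ys, k=3)    # near zero: the model family matches the data
```

With noisy real-world data the held-out error would be strictly positive, and comparing it across model choices is exactly how cross-validation guards against overfitting.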
Lesson 1: Beginner's Guide to Quantitative Trading
In the first lesson in the quantitative trading email series I want to introduce you to some of the basic concepts
which accompany an end-to-end quantitative trading system.
This email will hopefully serve two audiences. The first will be individuals trying to obtain a job at a fund as a
quantitative trader. The second will be individuals who wish to try and set up their own "retail" algorithmic trading
business.
Quantitative trading is an extremely sophisticated area of quant finance. It can take a significant amount of time to
gain the necessary knowledge to pass an interview or construct your own trading strategies.
Not only that, but it requires programming expertise in a language such as MATLAB, R, Python or C#. As the trading frequency of the strategy increases, the technological aspects become much more relevant, and familiarity with C/C++ becomes of paramount importance. A quantitative trading system consists of four major components:
Strategy Identification - Finding a strategy, exploiting an edge and deciding on trading frequency
Strategy Backtesting - Obtaining data, analysing strategy performance and removing biases
Execution System - Linking to a brokerage, automating the trading and minimising transaction costs
Risk Management - Optimal capital allocation, "bet size"/Kelly criterion and trading psychology
Strategy Identification
This research process encompasses finding a strategy, seeing whether the strategy fits into a portfolio of other
strategies you may be running, obtaining any data necessary to test the strategy and trying to optimise the strategy for
higher returns and/or lower risk.
You will need to factor in your own capital requirements if running the strategy as a "retail" trader and how any
transaction costs will affect the strategy.
Contrary to popular belief it is actually quite straightforward to find profitable strategies through various public sources.
Academics regularly publish theoretical trading results (albeit mostly gross of transaction costs). Quantitative finance
blogs will discuss strategies in detail. Trade journals will outline some of the strategies employed by funds.
You might question why individuals and firms are keen to discuss their profitable strategies, especially when they
know that others "crowding the trade" may stop the strategy from working in the long term.
The reason lies in the fact that they will not often discuss the exact parameters and tuning methods that they have
carried out. These optimisations are the key to turning a relatively mediocre strategy into a highly profitable one.
In fact, one of the best ways to create your own unique strategies is to find similar methods and then carry out your
own optimisation procedure.
One example of such a quantitative finance blog is Quantivity (quantivity.wordpress.com).
Many of the strategies you will look at will fall into the categories of mean-reversion and trend-
following/momentum.
A mean-reverting strategy is one that attempts to exploit the existence of a long-term mean in a "price series", such as the spread between two correlated assets, and the tendency of short-term deviations from this mean to eventually revert.
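A common way to express the mean-reversion idea is as a z-score on the spread; a sketch with invented numbers might look like this:

```python
from statistics import mean, stdev

def zscore(spread):
    # Standardised deviation of the latest spread value from its historical mean
    return (spread[-1] - mean(spread)) / stdev(spread)

# Hypothetical spread between two correlated assets
spread = [0.1, -0.2, 0.0, 0.3, -0.1, 0.2, -0.3, 2.0]
z = zscore(spread)
# A simple mean-reversion rule: short the spread when stretched upwards,
# buy when stretched downwards, otherwise stay flat
signal = "short" if z > 2 else "long" if z < -2 else "flat"
```

Here the final observation sits more than two standard deviations above the historical mean, so the rule generates a "short" signal in anticipation of reversion.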
A momentum strategy attempts to exploit both investor psychology and big fund structure by "hitching a ride" on a
market trend, which can gather momentum in one direction, and follow the trend until it reverses.
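A minimal trend-following rule is the moving-average crossover: go long when a fast average moves above a slow one. A sketch with made-up prices:

```python
def sma(prices, window):
    # Simple moving average over the last `window` prices
    return sum(prices[-window:]) / window

def crossover_signal(prices, fast=3, slow=5):
    # Long when the fast average exceeds the slow average, otherwise flat
    return "long" if sma(prices, fast) > sma(prices, slow) else "flat"

uptrend = [100, 101, 102, 104, 107, 111]
downtrend = [111, 107, 104, 102, 101, 100]
```

Real momentum strategies are far more involved (position sizing, exits, filters), but the crossover captures the "hitching a ride" intuition in its simplest form.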
Another hugely important aspect of quantitative trading is the frequency of the trading strategy. Low frequency
trading (LFT) generally refers to any strategy which holds assets longer than a trading day.
Correspondingly, high frequency trading (HFT) generally refers to a strategy which holds assets intraday.
Ultra-high frequency trading (UHFT) refers to strategies that hold assets on the order of seconds and milliseconds.
As a retail practitioner HFT and UHFT are certainly possible, but only with detailed knowledge of the trading
"technology stack" and order book dynamics.
Once a strategy, or set of strategies, has been identified it now needs to be tested for profitability on historical data.
That is the domain of backtesting.
Strategy Backtesting
The goal of backtesting is to provide evidence that the strategy identified via the above process is profitable when
applied to both historical and out-of-sample data. This sets the expectation of how the strategy will perform in the
"real world".
However, backtesting is NOT a guarantee of success, for various reasons. It is perhaps the most subtle area of
quantitative trading since it entails numerous biases, which must be carefully considered and eliminated as much as
possible.
We will discuss the common types of bias including look-ahead bias, survivorship bias and optimisation bias (also
known as "data-snooping" bias).
Other areas of importance within backtesting include availability and cleanliness of historical data, factoring in realistic
transaction costs and deciding upon a robust backtesting platform. We'll discuss transaction costs further in the
Execution Systems section below.
Once a strategy has been identified, it is necessary to obtain the historical data through which to carry out testing
and, perhaps, refinement.
There are a significant number of data vendors across all asset classes. Their costs generally scale with the quality,
depth and timeliness of the data.
The traditional starting point for beginning quant traders (at least at the retail level) is to use the free data set from
Yahoo Finance. I won't dwell on providers too much here, rather I would like to concentrate on the general issues
when dealing with historical data sets.
The main concerns with historical data include accuracy/cleanliness, survivorship bias and adjustment for corporate
actions such as dividends and stock splits:
Accuracy pertains to the overall quality of the data - whether it contains any errors. Errors can sometimes be easy to
identify, such as with a spike filter, which will pick out incorrect "spikes" in time series data and correct for them. At
other times they can be very difficult to spot. It is often necessary to have two or more providers and then check all of
their data against each other.
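A simple spike filter can be sketched by flagging any point that deviates from the median of its neighbours by more than a multiple of their median absolute deviation; the data and thresholds below are illustrative.

```python
from statistics import median

def flag_spikes(prices, window=5, threshold=3.0):
    # Flag points deviating from the local median by more than
    # `threshold` times the median absolute deviation (MAD) of neighbours
    flags = []
    for i, p in enumerate(prices):
        lo = max(0, i - window)
        hi = min(len(prices), i + window + 1)
        neighbours = prices[lo:i] + prices[i + 1:hi]
        m = median(neighbours)
        mad = median(abs(x - m) for x in neighbours)
        flags.append(mad > 0 and abs(p - m) > threshold * mad)
    return flags

# An obviously erroneous tick at index 4
prices = [100.0, 100.4, 99.8, 100.2, 150.0, 100.1, 99.9, 100.3]
flags = flag_spikes(prices)
```

The median-based statistics are deliberately robust: a mean-and-standard-deviation filter would itself be distorted by the very spike it is trying to detect.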
Survivorship bias is often a "feature" of free or cheap datasets. A dataset with survivorship bias means that it does
not contain assets which are no longer trading. In the case of equities this means delisted/bankrupt stocks. This bias
means that any stock trading strategy tested on such a dataset will likely perform better than in the "real world" as the
historical "winners" have already been preselected.
Corporate actions are "logistical" activities carried out by the company that cause a step-function change in the raw price, which should not be included in the calculation of returns. Adjustments for dividends and stock splits are the common culprits. A process known as back adjustment must be carried out at each one of these actions. One must be very careful not to confuse a stock split with a true returns adjustment. Many a trader has been caught out by a corporate action!
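As a toy illustration of back adjustment, prices prior to a split can be divided by the split ratio so that no spurious return appears on the split date (the figures are invented):

```python
def back_adjust_for_split(prices, split_index, ratio):
    # Divide all prices before the split by the split ratio so that
    # returns computed across the split date are not distorted
    return [p / ratio for p in prices[:split_index]] + list(prices[split_index:])

# A hypothetical 2-for-1 split taking effect at index 3
raw = [100.0, 102.0, 104.0, 52.0, 53.0]
adjusted = back_adjust_for_split(raw, split_index=3, ratio=2.0)
# The adjusted series shows no artificial -50% "return" at the split
```

Dividend adjustment works on the same principle, except the adjustment factor is derived from the dividend amount rather than a fixed share ratio.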
In order to carry out a backtest procedure it is necessary to use a software platform. You have the choice between
dedicated backtest software, such as Tradestation, a numerical platform such as Excel or MATLAB or a full custom
implementation in a programming language such as Python or C++.
I won't dwell too much on Tradestation, Excel or MATLAB, as I believe in creating a full in-house technology stack for
reasons outlined below.
One of the benefits of doing so is that the backtest software and execution system can be tightly integrated, even with
extremely advanced statistical strategies. For HFT strategies in particular it is essential to use a custom
implementation.
When backtesting a system one must be able to quantify how well it is performing. The "industry standard" metrics for
quantitative strategies are the maximum drawdown and the Sharpe Ratio.
The maximum drawdown characterises the largest peak-to-trough drop in the account equity curve over a particular
time period, usually annual. This is most often quoted as a percentage.
LFT strategies will tend to have larger drawdowns than HFT strategies, due to a number of statistical factors. A historical backtest will show the past maximum drawdown, which is a useful guide, though not a guarantee, of the future drawdown performance of the strategy.
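Maximum drawdown is straightforward to compute from an equity curve by tracking the running peak; a short sketch with invented figures:

```python
def max_drawdown(equity):
    # Largest peak-to-trough decline of the equity curve, as a fraction
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst

equity = [100.0, 110.0, 105.0, 120.0, 90.0, 115.0]
mdd = max_drawdown(equity)   # (120 - 90) / 120 = 0.25, i.e. a 25% drawdown
```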
The second measurement is the Sharpe Ratio, which is heuristically defined as the average of the excess returns
divided by the standard deviation of those excess returns. Here, excess returns refers to the return of the strategy
above a pre-determined benchmark, such as the S&P500 or a 3-month Treasury Bill.
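The annualised Sharpe ratio described here can be sketched in a few lines; the excess returns below are invented, and 252 trading days per year is assumed:

```python
from math import sqrt
from statistics import mean, stdev

def annualised_sharpe(excess_returns, periods_per_year=252):
    # Mean excess return over its standard deviation, scaled to annual terms
    return sqrt(periods_per_year) * mean(excess_returns) / stdev(excess_returns)

# Hypothetical daily returns in excess of the chosen benchmark
excess = [0.001, -0.002, 0.003, 0.0005, -0.001, 0.002, 0.0015, -0.0005]
sharpe = annualised_sharpe(excess)
```

A sample this small makes the estimate almost meaningless in practice; with real strategies one would want hundreds of observations before trusting the figure.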
Once a strategy has been backtested and is deemed to be free of biases (in as much as that is possible!), with a good
Sharpe and minimised drawdowns, it is time to build an execution system.
Execution Systems
An execution system is the means by which the list of trades generated by the strategy is sent to and executed by the broker.
Although the trade generation can be semi- or even fully-automated, the execution mechanism can be manual, semi-manual (i.e. "one click") or fully automated.
For LFT strategies, manual and semi-manual techniques are common. For HFT strategies it is necessary to create a
fully automated execution mechanism, which will often be tightly coupled with the trade generator due to the
interdependence of strategy and technology.
The key considerations when creating an execution system are the interface to the brokerage, minimisation of
transaction costs (including commission, slippage and the spread) and divergence of performance of the live
system from backtested performance.
There are many ways to interface to a brokerage. They range from calling up your broker on the telephone right
through to a fully-automated high-performance Application Programming Interface (API).
Ideally you want to automate the execution of your trades as much as possible. This frees you up to concentrate on further research, as well as allowing you to run multiple strategies or even strategies of higher frequency.
The common backtesting software outlined above, such as MATLAB, Excel and Tradestation, is good for lower-frequency, simpler strategies. However, it will be necessary to construct an in-house execution system written in a high-performance language such as C++ in order to do any real HFT.
As an anecdote, in the fund I used to be employed at, we had a 10 minute "trading loop" where we would download
new market data every 10 minutes and then execute trades based on that information in the same time frame. This
was using an optimised Python stack. For anything approaching minute- or second-frequency data, I believe C/C++ would be more suitable.
In a larger fund it is often not the domain of the quant researcher to optimise execution. However in smaller shops or
HFT firms, the traders ARE the executors and so a much wider skillset is often desirable.
Bear that in mind if you wish to be employed by a fund. Your programming skills will be as important, if not more so,
than your statistics and time series talents!
Another major issue which falls under the banner of execution is that of transaction cost minimisation.
There are generally three components to transaction costs: Commissions (or tax), which are the fees charged by the
brokerage, the exchange and the SEC (or similar governmental regulatory body); slippage, which is the difference
between what you intended your order to be filled at versus what it was actually filled at; spread, which is the
difference between the bid/ask price of the security being traded.
Note that the spread is NOT constant and is dependent upon the current liquidity (i.e. availability of buy/sell orders) in
the market.
Transaction costs can make the difference between an extremely profitable strategy with a good Sharpe ratio and an
extremely unprofitable strategy with a terrible Sharpe ratio.
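To make the arithmetic concrete, here is a minimal sketch (purely illustrative numbers and function names, none taken from a real strategy) of how a flat per-period cost erodes a thin gross edge and, with it, the Sharpe ratio:

```python
import numpy as np

def sharpe_ratio(returns, periods_per_year=252):
    """Annualised Sharpe ratio of per-period returns (risk-free rate ignored)."""
    returns = np.asarray(returns, dtype=float)
    return np.sqrt(periods_per_year) * returns.mean() / returns.std(ddof=1)

def apply_costs(returns, cost_per_period):
    """Subtract a flat transaction cost (commission + slippage + spread,
    expressed as a fraction of capital) from each period's return."""
    return np.asarray(returns, dtype=float) - cost_per_period

# Hypothetical strategy: 5 bps/day gross edge with 1% daily volatility
rng = np.random.default_rng(42)
gross = rng.normal(0.0005, 0.01, 252)
# 4 bps/day of total transaction costs leaves only a 1 bp net edge
net = apply_costs(gross, 0.0004)
```

Since the costs shift the mean return down while leaving the volatility untouched, the net Sharpe is strictly lower than the gross one.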
It can be a challenge to correctly predict transaction costs from a backtest. Depending upon the frequency of the
strategy, you will need access to historical exchange data, which will include tick data for bid/ask prices.
Entire teams of quants are dedicated to optimisation of execution in the larger funds for these reasons.
Consider the scenario where a fund needs to offload a substantial quantity of trades, the reasons for which are many and varied! By "dumping" so many shares onto the market, they will rapidly depress the price and may not obtain optimal execution.
Hence algorithms which "drip feed" orders onto the market exist, although then the fund runs the risk of slippage.
Further to that, other strategies "prey" on these necessities and can exploit the inefficiencies. This is the domain of
fund structure arbitrage.
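The "drip feed" approach above can be sketched as a naive TWAP-style schedule; `twap_slices` is a hypothetical helper for illustration, not a production execution algorithm:

```python
def twap_slices(total_qty, n_slices):
    """Split a parent order into near-equal child orders, a naive TWAP-style
    schedule: each slice would be sent at a regular time interval."""
    base, remainder = divmod(total_qty, n_slices)
    return [base + (1 if i < remainder else 0) for i in range(n_slices)]

# 10,000 shares dripped out over 20 intervals instead of hitting the book at once
schedule = twap_slices(10_000, 20)
```

A real implementation would also randomise slice sizes and timings, precisely because predictable schedules are what the "preying" strategies mentioned above exploit.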
The final major issue for execution systems concerns divergence of strategy performance from backtested
performance.
This can happen for a number of reasons. We've already discussed look-ahead bias and optimisation bias in depth
when considering backtests.
However, some strategies do not make it easy to test for these biases prior to deployment. This occurs most prominently in HFT. There may be bugs in the execution system, as well as the trading strategy itself, that do not show up on a backtest but DO show up in live trading.
The market may have been subject to a regime change subsequent to the deployment of your strategy. New
regulatory environments, changing investor sentiment and macroeconomic phenomena can all lead to divergences in
how the market behaves and thus the profitability of your strategy.
Risk Management
The final piece to the quantitative trading puzzle is the process of risk management.
"Risk" includes all of the previous biases we have discussed. It includes technology risk, such as servers co-located
at the exchange suddenly developing a hard disk malfunction. It includes brokerage risk, such as the broker
becoming bankrupt (not as crazy as it sounds, given the recent scare with MF Global!).
In short it covers nearly everything that could possibly interfere with the trading implementation, of which there are many sources. Whole books are devoted to risk management for quantitative strategies, so I won't attempt to elucidate all possible sources of risk here.
Risk management also encompasses what is known as optimal capital allocation, which is a branch of portfolio
theory. This is the means by which capital is allocated to a set of different strategies and to the trades within those
strategies. It is a complex area and relies on some non-trivial mathematics.
The industry standard by which optimal capital allocation and leverage of the strategies are related is called the Kelly
criterion. The Kelly criterion makes some assumptions about the statistical nature of returns, which do not often hold
true in financial markets, so traders are often conservative when it comes to the implementation.
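Under the (often violated) assumption of Gaussian returns, the Kelly leverage has a simple closed form; this sketch, with hypothetical function names, also shows the common "half-Kelly" haircut that reflects the conservatism mentioned above:

```python
def kelly_leverage(mean_excess_return, variance):
    """Full Kelly optimal leverage under idealised Gaussian return
    assumptions: f* = mu / sigma^2."""
    return mean_excess_return / variance

def half_kelly(mean_excess_return, variance):
    """A common conservative choice in practice: halve the Kelly fraction."""
    return 0.5 * kelly_leverage(mean_excess_return, variance)

# e.g. 5% annual excess return with 20% annual volatility (variance 0.04)
full = kelly_leverage(0.05, 0.04)        # suggests 1.25x leverage
conservative = half_kelly(0.05, 0.04)
```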
Another key component of risk management is dealing with one's own psychological profile. There are many cognitive biases that can creep in to trading, although this is admittedly less problematic with algorithmic trading if the strategy is left alone!
A common bias is that of loss aversion, where a losing position will not be closed out due to the pain of having to realise a loss. Similarly, profits can be taken too early, because the fear of losing an already gained profit can be too great.
Another common bias is known as recency bias. This manifests itself when traders put too much emphasis on recent
events and not on the longer term.
Then of course there are the classic pair of emotional biases - fear and greed. These can often lead to under-leveraging, which reduces profits, or over-leveraging, which can cause blow-up, when the account equity heads to zero (or worse!).
Summary
As can be seen, quantitative trading is an extremely complex, albeit very interesting, area of quantitative finance. I have barely scratched the surface of the topic in this email, and it is already getting rather long!
Whole books and papers have been written about issues which I have only given a sentence or two towards. For that
reason, before applying for quantitative fund trading jobs, it is necessary to carry out a significant amount of
groundwork study.
At the very least you will need a good background in statistics and time series analysis, with a lot of experience in
implementation, via a programming language such as MATLAB, Python or R.
For more sophisticated strategies at the higher frequency end, your skill set is likely to include Linux kernel
modification, C/C++, assembly programming and network latency optimisation.
If you are interested in trying to create your own algorithmic trading strategies, my first suggestion would be to get
good at programming.
My preference is to build as much of the data grabber, strategy backtester and execution system by yourself as
possible. If your own capital is on the line, wouldn't you sleep better at night knowing that you have fully tested your
system and are aware of its pitfalls and particular issues?
Outsourcing this to a vendor, while potentially saving time in the short term, could be extremely expensive in the long term.
In the next lesson we are going to look at the topic of How To Identify Algorithmic Trading Strategies.
Our goal will be to understand in detail how to find, evaluate and select such systems.
I'll explain how identifying strategies is as much about personal preference as it is about strategy performance, how
to determine the type and quantity of historical data for testing, how to dispassionately evaluate a trading strategy
and finally how to proceed towards the backtesting phase and strategy implementation.
In order to be a successful trader - either discretionally or algorithmically - it is necessary to ask yourself some
honest questions. Trading provides you with the ability to lose money at an alarming rate, so it is necessary to "know
thyself" as much as it is necessary to understand your chosen strategy.
I would say the most important consideration in trading is being aware of your own personality. Trading, and
algorithmic trading in particular, requires a significant degree of discipline, patience and emotional detachment.
Since you are letting an algorithm perform your trading for you, it is necessary to be resolved not to interfere with the strategy when it is executing. This can be extremely difficult, especially in periods of extended drawdown. However, many strategies that have been shown to be highly profitable in a backtest can be ruined by simple interference. Understand that if you wish to enter the world of algorithmic trading you will be emotionally tested and, in order to be successful, it is necessary to work through these difficulties!
Honest consideration of your time constraints will help determine the frequency of the strategy that you should seek. For those of you in full-time employment, an intraday futures strategy may not be appropriate, at least until it is fully automated!
Your time constraints will also dictate the methodology of the strategy. If your strategy is frequently traded and reliant on expensive news feeds, such as a Bloomberg terminal, you will clearly have to be realistic about your ability to run it successfully while at the office! For those of you with a lot of time, or the skills to automate your strategy, you may wish to look into a more technical, higher-frequency strategy.
My belief is that it is necessary to carry out continual research into your trading strategies to maintain a consistently profitable portfolio.
Hence a significant portion of the time allocated to trading will be in carrying out ongoing research. Ask yourself
whether you are prepared to do this, as it can be the difference between strong profitability or a slow decline towards
losses.
You also need to consider your trading capital. The generally accepted ideal minimum amount for a quantitative strategy is 50,000 USD (approximately 35,000 GBP for those of us in the UK). If I were starting again, I would begin with a larger amount. This is because transaction costs can be extremely expensive for mid- to high-frequency strategies, and it is necessary to have sufficient capital to absorb them in times of drawdown. If you are considering beginning with less than 10,000 USD then you will need to restrict yourself to low-frequency strategies, trading in one or two assets, as transaction costs will rapidly eat into your returns.
Interactive Brokers, one of the friendliest brokers to those with programming skills thanks to its API, has a comparatively low retail account minimum. Proficiency in a programming language such as C++, Java, C#, Python or R will enable you to create the end-to-end data storage, backtest and execution systems yourself.
This has a number of advantages, chief of which is the ability to be completely aware of all aspects of the trading
infrastructure. It also allows you to explore the higher frequency strategies as you will be in full control of your
"technology stack".
While this means that you can test your own software and eliminate bugs, it also means more time spent coding up
infrastructure and less on implementing strategies, at least in the earlier part of your algo trading career.
You may find that you are comfortable trading in Excel or MATLAB and can outsource the development of other
components. I would not recommend this however, particularly for those trading at high frequency.
You also need to ask yourself what you hope to achieve by algorithmic trading.
Are you interested in a regular income, whereby you hope to draw earnings from your trading account? Or are you interested in long-term capital gains, such that you can afford to trade without needing to draw down funds?
Income dependence will dictate the frequency of your strategy. More regular income withdrawals will require a higher
frequency trading strategy with less volatility (i.e. a higher Sharpe ratio). Long-term traders can afford a more sedate
trading frequency.
Finally, do not be deluded by the notion of becoming extremely wealthy in a short space of time! Algo trading is NOT
a get-rich-quick scheme - if anything it can be a become-poor-quick scheme. It takes significant discipline, research,
diligence and patience to be successful at algorithmic trading. It can take months, if not years, to generate consistent
profitability.
Never have trading ideas been more readily available than they are today. Academic finance journals, pre-print servers, trading blogs, trading forums, weekly trading magazines and specialist texts provide thousands of trading strategies on which to base your ideas.
Our goal as quantitative trading researchers is to establish a strategy pipeline that will provide us with a stream of
ongoing trading ideas. Ideally we want to create a methodical approach to sourcing, evaluating and implementing the strategies that we come across.
The aims of the pipeline are to generate a consistent quantity of new ideas and to provide us with a framework for
rejecting the majority of these ideas with the minimum of emotional consideration.
We must be extremely careful not to let cognitive biases influence our decision making methodology. This could be as
simple as having a preference for one asset class over another (gold and other precious metals come to mind).
Our goal should always be to find consistently profitable strategies, with positive expectation. The choice of asset
class should be based on other considerations, such as trading capital constraints, brokerage fees and leverage
capabilities.
If you are completely unfamiliar with the concept of a trading strategy then the first place to look is with established
textbooks.
Classic texts provide a wide range of simpler, more straightforward ideas with which to familiarise yourself. Here is a selection that I recommend for those who are new to quantitative trading; the books become gradually more sophisticated as you work through the list:
Quantitative Trading: How to Build Your Own Algorithmic Trading Business (Wiley Trading) - Ernest Chan
Algorithmic Trading and DMA: An introduction to direct access trading strategies - Barry Johnson
Option Volatility & Pricing: Advanced Trading Strategies and Techniques - Sheldon Natenberg
For a longer list of quantitative trading books, please visit the QuantStart reading list.
The next place to find more sophisticated strategies is with trading forums and trading blogs.
However, a note of caution: Many trading blogs rely on the concept of technical analysis. Technical analysis involves
utilising basic indicators and behavioural psychology to determine trends or reversal patterns in asset prices.
Despite being extremely popular in the overall trading space, technical analysis is considered somewhat ineffective in the quantitative finance community. Some have suggested that it is no better than reading a horoscope or studying tea leaves in terms of its predictive power!
In reality there are successful individuals making use of technical analysis. However, as quants with a more sophisticated mathematical and statistical toolbox at our disposal, we can easily evaluate the effectiveness of such "TA-based" strategies and make data-driven decisions, rather than basing them on emotional considerations or preconceptions.
Here is a selection of well-respected quantitative trading blogs and forums:
Quantivity
Quantopian
Quantpedia
Wealth Lab
Nuclear Phynance
Wilmott Forums
Once you have had some experience at evaluating simpler strategies, it is time to look at the more sophisticated
academic offerings.
Some academic journals will be difficult to access without expensive subscriptions or one-off fees. If you are a member or alumnus of a university, you should be able to obtain access to some of these financial journals.
Otherwise, you can look at pre-print servers, which are internet repositories of late drafts of academic papers that are undergoing peer review. Since we are only interested in strategies that we can successfully replicate, backtest and profit from, the lack of completed peer review matters less to us.
The major downside of academic strategies is that they can often be out of date, require obscure and expensive historical data, trade in illiquid asset classes or fail to factor in fees, slippage or spread.
It can also be unclear whether the trading strategy is to be carried out with market orders, limit orders or whether it
contains stop losses etc. Thus it is absolutely essential to replicate the strategy yourself as best you can, backtest it
and add in realistic transaction costs that include as many aspects of the asset classes that you wish to trade in.
Here is a list of the more popular pre-print servers and financial journals that you can source ideas from:
arXiv
SSRN
Mathematical Finance
What about forming your own quantitative strategies? This generally requires (but is not limited to) expertise in one or more of the following categories:
Market microstructure - For higher frequency strategies in particular, one can make use of market microstructure,
i.e. understanding of the order book dynamics in order to generate profitability. Different markets will have various
technology limitations, regulations, market participants and constraints that are all open to exploitation via specific
strategies. This is a very sophisticated area and retail practitioners will find it hard to be competitive in this space,
particularly as the competition includes large, well-capitalised quantitative hedge funds with strong technological
capabilities.
Fund structure - Pooled investment funds, such as pension funds, private investment partnerships (hedge funds),
commodity trading advisors and mutual funds are constrained both by heavy regulation and their large capital
reserves. Thus certain consistent behaviours can be exploited by those who are more nimble. For instance, large
funds are subject to capacity constraints due to their size. Thus if they need to rapidly offload (sell) a quantity of
securities, they will have to stagger it in order to avoid "moving the market". Sophisticated algorithms can take
advantage of this, and other idiosyncrasies, in a general process known as fund structure arbitrage.
Machine learning/artificial intelligence - Machine learning algorithms have become more prevalent in recent years
in financial markets. Classifiers (such as Naive-Bayes), non-linear function approximators (neural networks) and
optimisation routines (genetic algorithms) have all been used to predict asset paths or optimise trading strategies. If
you have a background in this area you may have some insight into how particular algorithms might be applied to
certain markets.
There are, of course, many other areas for quants to investigate. We'll discuss how to come up with custom strategies in detail in a later lesson.
By continuing to monitor these sources on a weekly, or even daily, basis you are setting yourself up to receive a consistent list of strategies from a diverse range of sources.
The next step is to determine how to reject a large subset of these strategies, in order to minimise wasting your time and backtesting resources on strategies that are likely to be unprofitable.
The first, and arguably most obvious consideration is whether you actually understand the strategy.
Would you be able to explain the strategy concisely or does it require a string of caveats and endless parameter lists?
In addition, does the strategy have a good, solid basis in reality? For instance, could you point to some behavioural
rationale or fund structure constraint that might be causing the pattern(s) you are attempting to exploit?
Would this constraint hold up to a regime change, such as a dramatic regulatory environment disruption?
Does the strategy rely on complex statistical or mathematical rules? Does it apply to any financial time series or is it specific to the asset class on which it is claimed to be profitable?
You should constantly be thinking about these factors when evaluating new trading methods, otherwise you may
waste a significant amount of time attempting to backtest and optimise unprofitable strategies.
Once you have determined that you understand the basic principles of the strategy, you need to decide whether it fits with your personality profile. Strategies differ substantially in their performance characteristics; certain personality types can handle more significant periods of drawdown, or are willing to accept greater risk for larger return.
Despite the fact that we, as quants, try and eliminate as much cognitive bias as possible and should be able to evaluate a strategy dispassionately, biases will always creep in.
Thus we need a consistent, unemotional means through which to assess the performance of strategies. Here is the list of criteria by which I judge a potential new strategy:
Methodology - Is the strategy momentum based, mean-reverting, market-neutral, directional? Does the strategy rely
on sophisticated (or complex!) statistical or machine learning techniques that are hard to understand and require a
PhD in statistics to grasp? Do these techniques introduce a significant quantity of parameters, which might lead to
optimisation bias? Is the strategy likely to withstand a regime change (i.e. potential new regulation of financial
markets)?
Sharpe Ratio - The Sharpe ratio heuristically characterises the reward/risk ratio of the strategy. It quantifies how
much return you can achieve for the level of volatility endured by the equity curve. Naturally, we need to determine the
period and frequency that these returns and volatility are measured over. A higher frequency strategy will require a
greater sampling rate of standard deviation, but a shorter overall time period of measurement, for instance.
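As a sketch of the sampling-rate point, assuming i.i.d. per-period returns, a per-period Sharpe scales with the square root of the number of periods per year. The helper below is hypothetical and simplified by ignoring the risk-free rate:

```python
import math

def annualised_sharpe(mean_period_return, std_period_return, periods_per_year):
    """Annualise a per-period Sharpe ratio by the square root of the
    sampling frequency (risk-free rate omitted for simplicity)."""
    return math.sqrt(periods_per_year) * mean_period_return / std_period_return

# The same per-period ratio annualises very differently at daily vs monthly sampling
daily = annualised_sharpe(0.0005, 0.01, 252)
monthly = annualised_sharpe(0.0005, 0.01, 12)
```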
Leverage - Does the strategy require significant leverage in order to be profitable? Does the strategy necessitate the
use of leveraged derivatives contracts (futures, options, swaps) in order to make a return? These leveraged contracts
can have heavy volatility characteristics and thus can easily lead to margin calls. Do you have the trading capital and
the temperament for such volatility?
Frequency - The frequency of the strategy is intimately linked to your technology stack and thus technological
expertise, the Sharpe ratio and overall level of transaction costs. All other issues considered, higher frequency
strategies require more capital, are more sophisticated and harder to implement. However, assuming your backtesting
engine is sophisticated and bug-free, they will often have far higher Sharpe ratios.
Volatility - Volatility is related strongly to the "risk" of the strategy. The Sharpe ratio characterises this. Higher volatility
of the underlying asset classes, if unhedged, often leads to higher volatility in the equity curve and thus smaller
Sharpe ratios. I am of course assuming that the positive volatility is approximately equal to the negative volatility.
Some strategies may have greater downside volatility. You need to be aware of these attributes.
Win/Loss, Average Profit/Loss - Strategies will differ in their win/loss and average profit/loss characteristics. One
can have a very profitable strategy, even if the number of losing trades exceeds the number of winning trades.
Momentum strategies tend to have this pattern as they rely on a small number of "big hits" in order to be profitable.
Mean-reversion strategies tend to have opposing profiles where more of the trades are "winners", but the losing trades
can be quite severe.
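The win/loss trade-off can be made concrete with a simple expectancy calculation; a strategy with well under 50% winners can still be profitable if the average win is large enough. The numbers below are purely illustrative:

```python
def expectancy(win_rate, avg_win, avg_loss):
    """Expected profit per trade; avg_loss is supplied as a positive number."""
    return win_rate * avg_win - (1.0 - win_rate) * avg_loss

# Momentum-style profile: only 35% winners, but the winners are three times larger
momentum = expectancy(0.35, 300.0, 100.0)   # positive despite losing more often
# Mean-reversion-style profile: mostly winners, with occasional severe losses
mean_rev = expectancy(0.80, 50.0, 180.0)
```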
Maximum Drawdown - The maximum drawdown is the largest overall peak-to-trough percentage drop on the equity
curve of the strategy. Momentum strategies are well known to suffer from periods of extended drawdowns (due to a
string of many incremental losing trades). Many traders will give up in periods of extended drawdown, even if historical
testing has suggested this is "business as usual" for the strategy. You will need to determine what percentage of
drawdown (and over what time period) you can accept before you cease trading your strategy. This is a highly
personal decision and thus must be considered carefully.
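Maximum drawdown is straightforward to compute from an equity curve by tracking the running peak; a minimal sketch:

```python
def max_drawdown(equity_curve):
    """Largest peak-to-trough percentage drop along an equity curve."""
    peak = float("-inf")
    worst = 0.0
    for value in equity_curve:
        peak = max(peak, value)                       # running high-water mark
        worst = max(worst, (peak - value) / peak)     # drop relative to that peak
    return worst
```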
Capacity/Liquidity - At the retail level, unless you are trading in a highly illiquid instrument (like a small-cap stock),
you will not have to concern yourself greatly with strategy capacity. Capacity determines the scalability of the strategy
to further capital. Many of the larger hedge funds suffer from significant capacity problems as their strategies increase
in capital allocation.
Parameters - Certain strategies, especially those found in the machine learning community, require a large quantity of
parameters. Every extra parameter that a strategy requires leaves it more vulnerable to optimisation bias (also known
as "curve-fitting"). You should try to target strategies with as few parameters as possible, or make sure you have sufficient quantities of data with which to test your strategies.
Benchmark - Nearly all strategies, unless characterised as "absolute return", are measured against some
performance benchmark. The benchmark is usually an index that characterises a large sample of the underlying asset
class that the strategy trades in. If the strategy trades large-cap US equities, then the S&P500 would be a natural
benchmark to measure your strategy against. You will hear the terms "alpha" and "beta", applied to strategies of this
type.
Notice that we have not discussed the actual returns of the strategy. Why is this? In isolation, the returns actually provide us with limited information as to the effectiveness of the strategy. They don't give you an insight into leverage, volatility, benchmarks or capital requirements. Thus strategies are rarely judged on their returns alone. Always consider the risk attributes of a strategy before looking at the returns.
At this stage many of the strategies found from your pipeline will be rejected out of hand, since they won't meet your capital requirements, leverage constraints, maximum drawdown tolerance or volatility preferences.
The strategies that do remain can now be considered for backtesting. However, before this is possible, it is necessary to consider one final rejection criterion: that of available historical data on which to test these strategies.
Nowadays, the breadth of the technical requirements across asset classes for historical data storage is substantial. In order to remain competitive, both the buy-side (funds) and sell-side (investment banks) invest heavily in their technical infrastructure. In particular, we are interested in timeliness, accuracy and storage requirements. I will now outline the basics of obtaining and storing historical data, and will be writing a lot more about this in the future, as my prior experience in the financial industry was chiefly concerned with data acquisition, storage and access.
In the previous section we had set up a strategy pipeline that allowed us to reject certain strategies based on our own
personal rejection criteria. In this section we will filter more strategies based on our own preferences for obtaining
historical data.
The chief considerations (especially at the retail practitioner level) are the costs of the data, the storage requirements and your level of technical expertise. We also need to discuss the different types of available data and the particular considerations that each type imposes on us.
Let's begin by discussing the types of data available and the key issues we will need to think about:
Fundamental Data - This includes data about macroeconomic trends, such as interest rates, inflation figures,
corporate actions (dividends, stock-splits), SEC filings, corporate accounts, earnings figures, crop reports,
meteorological data etc. This data is often used to value companies or other assets on a fundamental basis, i.e. via
some means of expected future cash flows. It does not include stock price series. Some fundamental data is freely
available from government websites. Other long-term historical fundamental data can be extremely expensive.
Storage requirements are often not particularly large, unless thousands of companies are being studied at once.
News Data - News data is often qualitative in nature. It consists of articles, blog posts, microblog posts ("tweets") and
editorial. Machine learning techniques such as classifiers are often used to interpret sentiment. This data is also often
freely available or cheap, via subscription to media outlets. The newer "NoSQL" document storage databases are
designed to store this type of unstructured, qualitative data.
Asset Price Data - This is the traditional data domain of the quant. It consists of time series of asset prices. Equities
(stocks), fixed income products (bonds), commodities and foreign exchange prices all sit within this class. Daily
historical data is often straightforward to obtain for the simpler asset classes, such as equities. However, once
accuracy and cleanliness are included and statistical biases removed, the data can become expensive. In addition,
time series data often possesses significant storage requirements especially when intraday data is considered.
Financial Instruments - Equities, bonds, futures and the more exotic derivative options have very different
characteristics and parameters. Thus there is no "one size fits all" database structure that can accommodate them.
Significant care must be given to the design and implementation of database structures for various financial
instruments. We will discuss the situation at length when we come to build a securities master database in future
emails.
Frequency - The higher the frequency of the data, the greater the costs and storage requirements. For low-frequency
strategies, daily data is often sufficient. For high frequency strategies, it might be necessary to obtain tick-level data
and even historical copies of particular trading exchange order book data. Implementing a storage engine for this type
of data is very technologically intensive and only suitable for those with a strong programming/technical background.
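A back-of-the-envelope estimate illustrates why frequency drives storage cost; all inputs below are hypothetical round numbers:

```python
def storage_estimate_gb(ticks_per_day, bytes_per_tick, symbols, trading_days):
    """Back-of-the-envelope raw storage requirement for tick data, in gigabytes."""
    total_bytes = ticks_per_day * bytes_per_tick * symbols * trading_days
    return total_bytes / 1024**3

# Hypothetical: 100k ticks/day, 40 bytes each, 500 symbols, one trading year
yearly_gb = storage_estimate_gb(100_000, 40, 500, 252)  # hundreds of gigabytes
```

By contrast, the same universe at daily bars (one record per symbol per day) would be a few megabytes per year, which is why low-frequency strategies can get by with far simpler storage.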
Benchmarks - The strategies described above will often be compared to a benchmark. This usually manifests itself
as an additional financial time series. For equities, this is often a national stock benchmark, such as the S&P500 index
(US) or FTSE100 (UK). For a fixed income fund, it is useful to compare against a basket of bonds or fixed income
products. The "risk-free rate" (i.e. appropriate interest rate) is also another widely accepted benchmark. All asset class
categories possess a favoured benchmark, so it will be necessary to research this based on your particular strategy, if
you wish to gain interest in your strategy externally.
Technology - The technology stacks behind a financial data storage centre are complex. This email can only scratch
the surface about what is involved in building one. However, it does centre around a database engine: either a Relational Database Management System (RDBMS), such as PostgreSQL, MySQL, SQL Server or Oracle, or a Document Storage Engine (i.e. "NoSQL"). This is accessed via "business logic" application code that queries the
database and provides access to external tools, such as MATLAB, R or Excel. Often this business logic is written in
C++, C#, Java or Python. You will also need to host this data somewhere, either on your own personal computer, or
remotely via internet servers. Products such as Amazon Web Services have made this simpler and cheaper in recent
years, but it will still require significant technical expertise to achieve in a robust manner.
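As a toy illustration of the RDBMS route, SQLite is enough to sketch the idea. This is a deliberately minimal, hypothetical schema, not a full securities master design:

```python
import sqlite3

# In-memory database standing in for a real PostgreSQL/MySQL instance
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE symbol (
    id INTEGER PRIMARY KEY,
    ticker TEXT NOT NULL UNIQUE,
    asset_class TEXT NOT NULL
);
CREATE TABLE daily_bar (
    symbol_id INTEGER NOT NULL REFERENCES symbol(id),
    trade_date TEXT NOT NULL,
    open REAL, high REAL, low REAL, close REAL, volume INTEGER,
    PRIMARY KEY (symbol_id, trade_date)
);
""")
conn.execute("INSERT INTO symbol (ticker, asset_class) VALUES ('SPY', 'equity')")
conn.execute(
    "INSERT INTO daily_bar VALUES (1, '2024-01-02', 470.0, 474.0, 469.5, 472.6, 81000000)"
)
# "Business logic" layer: query prices by ticker for downstream tools
row = conn.execute(
    "SELECT close FROM daily_bar JOIN symbol ON symbol.id = symbol_id "
    "WHERE ticker = 'SPY'"
).fetchone()
```

Keeping symbols and price bars in separate tables makes corporate actions, ticker changes and multiple data vendors far easier to manage than a single flat table.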
As can be seen, once a strategy has been identified via the pipeline, it will be necessary to evaluate the availability, costs, complexity and implementation details of a particular set of historical data.
You may find it is necessary to reject a strategy based solely on historical data considerations. This is a big area, and teams of PhDs work at large funds making sure pricing is accurate and timely. Do not underestimate the difficulties of creating a robust data centre for your backtesting purposes!
I do want to say, however, that many backtesting platforms can provide this data for you automatically, at a cost. Thus it will take much of the implementation pain away from you, and you can concentrate purely on strategy implementation and optimisation. Tools like TradeStation possess this capability. However, my personal view is to implement as much as possible internally, and to avoid outsourcing parts of the stack to software vendors.
I prefer higher-frequency strategies due to their more attractive Sharpe ratios, but they are often tightly coupled to the technology stack, where significant optimisation is required.
In the next email lesson we are going to look more closely at Strategy Backtesting in the first of two emails on
Successful Backtesting of Algorithmic Trading Strategies.
Algorithmic backtesting requires knowledge of many areas, including psychology, mathematics, statistics, software development and market/exchange microstructure.
I couldn't hope to cover all of those topics in one email, so I'm going to split them into two or three smaller pieces.
What will we discuss in this section? I'll begin by defining backtesting and then I will describe the basics of how it is
carried out.
Then I will elucidate upon the biases we touched upon in the first email (Beginner's Guide to Quantitative Trading).
Next I will present a comparison of the various available backtesting software options.
In subsequent emails we will look at the details of strategy implementations that are often barely mentioned or
ignored.
We will also consider how to make the backtesting process more realistic by including the idiosyncrasies of a
trading exchange. Then we will discuss transaction costs and how to correctly model them in a backtest setting.
Let's begin by discussing what backtesting is and why we should carry it out in our algorithmic trading.
What is Backtesting?
Algorithmic trading stands apart from other types of investment classes because we can more reliably provide expectations about future performance from past performance, as a consequence of abundant data availability. The process by which this is carried out is known as backtesting.
In simple terms, backtesting is carried out by exposing your particular strategy algorithm to a stream of historical financial data, which leads to a set of trading signals.
Each trade, which we will mean here to be a 'round-trip' of two signals, will have an associated profit or loss. The
accumulation of this profit/loss over the duration of your strategy backtest will lead to the total profit and loss (also
known as the 'P&L' or 'PnL'). That is the essence of the idea, although of course the "devil is always in the details"!
Modelling - Backtesting allows us to (safely!) test new models of certain market phenomena, such as transaction costs, order routing, latency, liquidity or other market microstructure issues.
Optimisation - Although strategy optimisation is fraught with biases, backtesting allows us to increase the
performance of a strategy by modifying the quantity or values of the parameters associated with that strategy and
recalculating its performance.
Verification - Our strategies are often sourced externally, via our strategy pipeline. Backtesting a strategy ensures
that it has not been incorrectly implemented. Although we will rarely have access to the signals generated by external
strategies, we will often have access to the performance metrics such as the Sharpe Ratio and Drawdown
characteristics. Thus we can compare them with our own implementation.
Backtesting provides a host of advantages for algorithmic trading. However, it is not always possible to
straightforwardly backtest a strategy.
In general, as the frequency of the strategy increases, it becomes harder to correctly model the microstructure effects
of the market and its exchanges.
This leads to less reliable backtests and thus a trickier evaluation of a chosen strategy. This is a particular problem
where the execution system is the key to the strategy performance, as with ultra-high frequency algorithms.
Unfortunately, backtesting is fraught with biases of all types. We have touched upon some of these issues in previous
emails.
There are many biases that can affect the performance of a backtested strategy.
Unfortunately, these biases have a tendency to inflate the performance rather than detract from it. Thus you should
always consider a backtest to be an idealised upper bound on the actual performance of the strategy. It is almost
impossible to eliminate biases from algorithmic trading, so it is our job to minimise them as best we can in order to
make informed decisions about our strategies.
Optimisation Bias
Optimisation bias involves adjusting or introducing additional trading parameters until the strategy performance on the backtest data
set is very attractive. However, once live the performance of the strategy can be markedly different. Another name
for this bias is "curve fitting" or "data-snooping bias".
Optimisation bias is hard to eliminate as algorithmic strategies often involve many parameters.
"Parameters" in this instance might be the entry/exit criteria, look-back periods or averaging periods (i.e. the moving
average lookback window).
Optimisation bias can be minimised by keeping the number of parameters to a minimum and increasing the quantity
of training data points.
In fact, one must also be careful of the latter, as older training points can be subject to a prior regime (such as an
earlier regulatory environment) and thus may not be relevant to your current strategy.
One method to help mitigate this bias is to perform a sensitivity analysis. This means varying the parameters
incrementally and plotting a "surface" of performance.
Sound, fundamental reasoning for parameter choices should, with all other factors considered, lead to a smoother
parameter surface.
If you have a very jumpy performance surface, it often means that a parameter is not reflecting a true phenomenon and is
instead fitting noise in the sample data.
There is a vast literature on multi-dimensional optimisation algorithms and it is a highly active area of research. I won't
dwell on it here, but keep it in the back of your mind when you find a strategy with a fantastic backtest!
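The sensitivity analysis above can be sketched in a few lines. Here the price series is synthetic, the "strategy" is a deliberately naive long-only moving-average filter, and all parameter ranges are illustrative only:

```python
# Hedged sketch of a one-dimensional sensitivity analysis: vary a single
# strategy parameter (a moving-average lookback) and examine how smoothly
# performance changes across neighbouring values.
import random

random.seed(42)

# Synthetic price series: a random walk with mild upward drift.
prices = [100.0]
for _ in range(500):
    prices.append(prices[-1] + random.gauss(0.05, 1.0))

def strategy_return(prices, lookback):
    """Total return of a naive long-only moving-average filter (illustrative)."""
    total = 0.0
    for t in range(lookback, len(prices) - 1):
        ma = sum(prices[t - lookback:t]) / lookback
        if prices[t] > ma:                    # long only while price is above the MA
            total += prices[t + 1] - prices[t]
    return total

# The one-dimensional "performance surface" across lookback values.
surface = {lb: strategy_return(prices, lb) for lb in range(10, 60, 10)}
for lb, perf in surface.items():
    print(lb, round(perf, 2))
# A 'jumpy' surface (large swings between adjacent lookbacks) hints that
# the parameter is fitting noise rather than a real phenomenon.
```

In practice one would vary two or more parameters jointly and plot the resulting surface, but the principle of checking neighbouring parameter values is the same.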
Look-Ahead Bias
Look-ahead bias is introduced into a backtesting system when future data is accidentally included at a point in the
simulation where that data would not have actually been available.
If we are running the backtest chronologically and we reach time point N, then look-ahead bias occurs if data is
included for any point N+k, where k>0. Look-ahead bias errors can be incredibly subtle. Here are three examples of
how look-ahead bias can be introduced:
Technical Bugs - Arrays/vectors in code often have iterators or index variables. Incorrect offsets of these indices can
lead to a look-ahead bias by incorporating data at N+k for non-zero k.
Parameter Calculation - Another common example of look-ahead bias occurs when calculating optimal strategy
parameters, such as with linear regressions between two time series. If the whole data set (including future data) is
used to calculate the regression coefficients, and thus retroactively applied to a trading strategy for optimisation
purposes, then future data is being incorporated and a look-ahead bias exists.
Maxima/Minima - Certain trading strategies make use of extreme values in any time period, such as incorporating the
high or low prices in OHLC data. However, since these maximal/minimal values can only be calculated at the end of a
time period, a look-ahead bias is introduced if these values are used -during- the current period. It is always necessary
to lag high/low values by at least one period in any trading strategy making use of them.
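The maxima/minima case can be made concrete with a toy example. The high values below are hypothetical, and `max_high_known_at` is my own helper name; the point is simply that a decision made during bar t may only use highs up to bar t-1:

```python
# Sketch of look-ahead bias with bar highs: the high of bar t is only
# known once bar t has closed, so a decision made *during* bar t may use
# highs up to bar t-1 at most.

highs = [105.0, 107.0, 106.0, 110.0, 108.0]   # hypothetical bar highs

def max_high_known_at(t, lag=1):
    """Highest high available to a decision at time t, lagged to avoid bias."""
    window = highs[:t - lag + 1]
    return max(window) if window else None

# Decision during bar 3: the biased version 'sees' the bar-3 high of 110...
biased = max(highs[:3 + 1])        # 110.0 -- look-ahead bias!
# ...while the correctly lagged version only knows highs up to bar 2.
unbiased = max_high_known_at(3)    # 107.0
print(biased, unbiased)
```

In a vectorised backtester the equivalent fix is to shift the high/low series forward by one period before computing any signal from it.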
As with optimisation bias, one must be extremely careful to avoid introducing look-ahead bias. It is often the main reason why
trading strategies underperform their backtests significantly when taken live.
Survivorship Bias
Survivorship bias is a particularly dangerous phenomenon and can lead to significantly inflated performance for
certain strategy types.
It occurs when strategies are tested on datasets that do not include the full universe of prior assets that may have
been chosen at a particular point in time, but only consider those that have "survived" to the current time.
As an example, consider testing a strategy on a random selection of equities before and after the 2001 market crash.
Some technology stocks went bankrupt, while others managed to stay afloat and even prospered.
If we had restricted this strategy only to stocks which made it through the market drawdown period, we would be
introducing a survivorship bias because they have already demonstrated their success to us.
In fact, this is just another specific case of look-ahead bias, as future information is being incorporated into past
analysis.
There are two main ways to mitigate survivorship bias in your strategy backtests:
Survivorship Bias Free Datasets - In the case of equity data it is possible to purchase datasets that include delisted
entities, although they are not cheap and only tend to be utilised by institutional firms. In particular, Yahoo Finance
data is NOT survivorship bias free, and this is commonly used by many retail algo traders. One can also trade on
asset classes that are not prone to survivorship bias, such as certain commodities (and their futures contracts).
Use More Recent Data - In the case of equities, utilising a more recent data set mitigates the possibility that the stock
selection chosen is weighted to "survivors", simply as there is less likelihood of overall stock delisting in shorter time
periods. One can also start building a personal survivorship-bias-free dataset by collecting data from the current point
onward. After 3-4 years, you will have a solid survivorship-bias-free set of equities data with which to backtest further
strategies.
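A toy simulation makes the inflation effect visible. Everything here is illustrative: the random-walk "stocks", the delisting threshold and the helper names are mine, not real market data:

```python
# Toy simulation of survivorship bias: simulate random-walk stocks,
# 'delist' those whose price ever falls below a threshold, then compare
# the mean return of the full universe against the survivors only.
import random

random.seed(7)

def simulate_stock(steps=250, start=100.0):
    """A single hypothetical stock path: geometric random walk, no drift."""
    path = [start]
    for _ in range(steps):
        path.append(path[-1] * (1 + random.gauss(0.0, 0.02)))
    return path

universe = [simulate_stock() for _ in range(200)]
survivors = [p for p in universe if min(p) > 60.0]   # illustrative delisting rule

def mean_return(paths):
    return sum(p[-1] / p[0] - 1 for p in paths) / len(paths)

full, surv = mean_return(universe), mean_return(survivors)
print(f"full universe: {full:+.2%}, survivors only: {surv:+.2%}")
# The survivors-only figure is typically inflated: stocks that crashed
# have been quietly excluded from the backtest universe.
```

This is the situation a backtest on a survivorship-biased dataset puts you in: you only ever see the second, rosier number.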
We will now consider a psychological phenomenon that can influence your trading performance.
Psychological Tolerance Bias
This particular phenomenon is not often discussed in the context of quantitative trading. However, it is discussed
extensively in regard to more discretionary trading methods.
It has various names, but I've decided to call it "psychological tolerance bias" because it captures the essence of the
problem.
When creating backtests over a period of 5 years or more, it is easy to look at an upwardly trending equity curve,
calculate the compounded annual return, Sharpe ratio and even drawdown characteristics and be satisfied with the
results.
As an example, the strategy might possess a maximum relative drawdown of 25% and a maximum drawdown
duration of 4 months.
It is straightforward to convince oneself that it is easy to tolerate such periods of losses because the overall picture
is rosy.
If historical drawdowns of 25% or more occur in the backtests, then in all likelihood you will see periods of similar
magnitude in live trading.
These periods of drawdown are psychologically difficult to endure. I have observed first hand what an extended
drawdown can be like, in an institutional setting, and it is not pleasant - even if the backtests suggest such periods
will occur.
The reason I have termed it a "bias" is that often a strategy which would otherwise be successful is stopped from
trading during times of extended drawdown and thus will lead to significant underperformance compared to a backtest.
Thus, even though the strategy is algorithmic in nature, psychological factors can still have a heavy influence on
profitability.
The takeaway is to ensure that if you see drawdowns of a certain percentage and duration in the backtests, then you
should expect them to occur in live trading environments, and will need to persevere in order to reach profitability once
more.
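The two drawdown figures discussed above are straightforward to compute from an equity curve. The equity values below are hypothetical and `drawdown_stats` is my own helper:

```python
# Sketch of computing the maximum relative drawdown and the longest
# drawdown duration from an equity curve.

def drawdown_stats(equity):
    """Return (max relative drawdown, longest drawdown duration in periods)."""
    peak = equity[0]
    max_dd = 0.0
    duration = longest = 0
    for value in equity:
        if value >= peak:
            peak = value
            duration = 0          # a new high-water mark resets the clock
        else:
            duration += 1
            longest = max(longest, duration)
            max_dd = max(max_dd, (peak - value) / peak)
    return max_dd, longest

equity = [100, 110, 105, 95, 90, 100, 115, 110, 120]
dd, periods = drawdown_stats(equity)
print(f"max drawdown: {dd:.1%}, longest drawdown: {periods} periods")
```

Running this on your own backtest equity curve, rather than only looking at the final return, is a cheap way to confront the psychological question honestly before going live.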
Backtesting Software
When it comes to backtesting environments, solutions range from fully-integrated, institutional-grade software through to programming languages
such as C++, Python and R, where nearly everything must be written from scratch (or suitable 'plugins' obtained).
As quant traders, we are interested in the balance of being able to "own" our trading technology stack versus the
speed and reliability of our development methodology. Here are the key considerations for software choice:
Programming Skill - The choice of environment will in a large part come down to your ability to program software. I
would argue that being in control of the total stack will have a greater effect on your long term P&L than outsourcing
as much as possible to vendor software. This is due to the downside risk of having external bugs or idiosyncrasies
that you are unable to fix in vendor software, which would otherwise be easily remedied if you had more control over
your "tech stack". You also want an environment that strikes the right balance between productivity, library availability
and speed of execution. I make my own personal recommendation below.
Execution Capability/Broker Interaction - Certain backtesting software, such as TradeStation, ties in directly with a
brokerage. I am not a fan of this approach as reducing transaction costs is often a big component of achieving a higher
Sharpe ratio. If you're tied into a particular broker (and Tradestation "forces" you to do this), then you will have a
harder time transitioning to new software (or a new broker) if the need arises. Interactive Brokers provide an API
which is robust, albeit with a slightly obtuse interface.
Customisation - An environment like MATLAB or Python gives you a great deal of flexibility when creating algo
strategies as they provide fantastic libraries for nearly any mathematical operation imaginable, but also allow
extensive customisation where necessary.
Strategy Complexity - Certain software just isn't cut out for heavy number crunching or mathematical complexity.
Excel is one such piece of software. While it is good for simpler strategies, it cannot really cope with numerous assets
or more complicated algorithms, at speed.
Bias Minimisation - Does a particular piece of software or data lend itself more to trading biases? You need to make
sure that if you want to create all the functionality yourself, that you don't introduce bugs which can lead to biases.
Speed of Development - One shouldn't have to spend months and months implementing a backtest engine.
Prototyping should only take a few weeks. Make sure that your software is not hindering your progress to any great
extent, just to grab a few extra percentage points of execution speed. C++ is the "elephant in the room" here!
Speed of Execution - If your strategy is completely dependent upon execution timeliness (as in HFT/UHFT) then a
language such as C or C++ will be necessary. However, you will be verging on Linux kernel optimisation and FPGA
usage for these domains, which is outside the scope of this email!
Cost - Many of the software environments that you can program algorithmic trading strategies with are completely free
and open source. In fact, many hedge funds make use of open source software for their entire algo trading stacks. In
addition, Excel and MATLAB are both relatively cheap and there are even free alternatives to each.
Now that we have listed the criteria with which we need to choose our software infrastructure, I want to run through
the most popular packages and how they compare.
Note: I am only going to include software that is available to most retail practitioners and software developers, as this
is the readership of the site and the email list. While other software is available, such as the more institutional grade
tools, I feel these are too expensive to be effectively used in a retail setting and I personally have no extensive
experience with them.
MATLAB
Description: Programming environment originally designed for computational mathematics, physics and
engineering. Very well suited to vectorised operations and those involving numerical linear algebra. Provides a wide
array of plugins for quant trading. In widespread use in quantitative hedge funds.
Execution: No native execution capability, MATLAB requires a separate execution system.
Customisation: Huge array of community plugins for nearly all areas of computational mathematics.
Strategy Complexity: Many advanced statistical methods already available and well-tested.
Bias Minimisation: Harder to detect look-ahead bias, requires extensive testing.
Development Speed: Short scripts can create sophisticated backtests easily.
Execution Speed: Assuming a vectorised/parallelised algorithm, MATLAB is highly optimised. Poor for traditional
iterated loops.
Cost: ~1,000 USD for a license.
Alternatives: Octave, SciLab
Python
Description: High-level language designed for speed of development. Wide array of libraries for nearly any
programmatic task imaginable. Gaining wider acceptance in hedge fund and investment bank community. Not quite
as fast as C/C++ for execution speed.
Execution: Python plugins exist for larger brokers, such as Interactive Brokers. Hence backtest and execution system
can all be part of the same "tech stack".
Customisation: Python has a very healthy development community and is a mature language. NumPy/SciPy provide
fast scientific computing and statistical analysis tools relevant for quant trading.
Strategy Complexity: Many plugins exist for the main algorithms, but not quite as big a quant community exists
for Python as for MATLAB.
Bias Minimisation: Same bias minimisation problems exist as for any high level language. Need to be extremely
careful about testing.
Development Speed: Python's main advantage is development speed, with robust built-in testing capabilities.
Execution Speed: Not quite as fast as C++, but scientific computing components are optimised and Python can talk
to native C code with certain plugins.
Cost: Free/Open Source
Alternatives: Ruby, Erlang, Haskell
R
Description: Environment designed for advanced statistical methods and time series analysis. Wide array of specific
statistical, econometric and native graphing toolsets. Large developer community.
Execution: R possesses plugins to some brokers, in particular Interactive Brokers. Thus an end-to-end system can
be written entirely in R.
Customisation: R can be customised with any package, but its strengths lie in statistical/econometric domains.
Strategy Complexity: Mostly useful if performing econometric, statistical or machine-learning strategies, due to the
available plugins.
Bias Minimisation: Similar level of bias possibility as for any high-level language such as Python or C++. Thus testing
must be carried out.
Development Speed: R is rapid for writing strategies based on statistical methods.
Execution Speed: R is slower than C++, but remains relatively optimised for vectorised operations (as with
MATLAB).
Cost: Free/Open Source
Alternatives: Stata
C++
Description: Mature, high-level language designed for speed of execution. Wide array of quantitative finance and
numerical libraries. Harder to debug and often takes longer to implement than Python or MATLAB. Extremely
prevalent on both the buy- and sell-side.
Execution: Most brokerage APIs are written in C++ and Java. Thus many plugins exist.
Customisation: C/C++ allows direct access to underlying memory, hence ultra-high frequency strategies can be
implemented.
Strategy Complexity: The C++ STL provides a wide array of optimised algorithms. Nearly any specialised mathematical
algorithm possesses a free, open-source C/C++ implementation on the web.
Bias Minimisation: Look-ahead bias can be tricky to eliminate, but is no harder than in other high-level languages. Good
debugging tools exist, but one must be careful when dealing with underlying memory.
Development Speed: C++ is quite verbose compared to Python or MATLAB for the same algorithm. More lines-of-
code (LOC) often leads to greater likelihood of bugs.
Execution Speed: C/C++ has extremely fast execution speed and can be well optimised for specific computational
architectures. This is the main reason to utilise it.
Cost: Various compilers: Linux/GCC is free, MS Visual Studio has differing licenses.
Alternatives: C#, Java, Scala
Different strategies will require different software packages. HFT and UHFT strategies will be written in C/C++
(these days they are often carried out on GPUs and FPGAs), whereas low-frequency directional equity strategies are
easy to implement in TradeStation, due to the "all in one" nature of the software/brokerage.
My personal preference is for Python as it provides the right degree of customisation, speed of development,
library availability and execution speed for my needs.
If I need anything faster, I can "drop in" to C++ directly from my Python programs. One method favoured by many
quant traders is to prototype their strategies in Python and then convert the slower execution sections to C++ in an
iterative manner. Eventually the entire algo is written in C++ and can be "left alone to trade"!
In the next email we will take an extensive look at transaction cost modelling as well as strategy specific
implementation.
In the third lesson of our quantitative finance email course we discussed the various cognitive biases that can
affect trading, as well as different "technology stacks" and software for writing backtesters.
In today's lesson we're going to build on the last email by discussing transaction cost modelling and strategy
implementation issues.
Transaction Costs
One of the most prevalent beginner mistakes when implementing trading models is to neglect (or grossly
underestimate) the effects of transaction costs on a strategy.
Though it is often assumed that transaction costs only reflect broker commissions, there are in fact many other ways
that costs can be accrued on a trading model. The three main types of costs that must be considered include:
Commissions/Fees
The most direct form of transaction costs incurred by an algorithmic trading strategy are commissions and fees.
All strategies require some form of access to an exchange, either directly or through a brokerage intermediary ("the
broker"). These services incur an incremental cost with each trade, known as commission.
Brokers generally provide many services, although quantitative algorithms only really make use of the exchange
infrastructure. Hence brokerage commissions are often small on a per-trade basis.
Brokers also charge fees, which are costs incurred to clear and settle trades. Further to this are taxes imposed by
regional or national governments.
For instance, in the UK there is a stamp duty to pay on equities transactions. Since commissions, fees and taxes are
generally fixed, they are relatively straightforward to implement in a backtest engine (see below).
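Because these costs are fixed (or a fixed rate), modelling them takes only a few lines. The rates below are illustrative, not any broker's actual schedule, and `net_trade_pnl` is a hypothetical helper:

```python
# Since commissions, fees and taxes are broadly fixed per trade, a flat
# per-trade deduction is straightforward to add to backtest P&L.
# All rates here are illustrative, not a real fee schedule.

def net_trade_pnl(gross_pnl, notional, commission=1.0,
                  stamp_duty_rate=0.005, apply_stamp_duty=False):
    """Deduct a flat commission and, optionally, UK-style stamp duty on the purchase."""
    costs = commission
    if apply_stamp_duty:                      # e.g. 0.5% on UK equity purchases
        costs += notional * stamp_duty_rate
    return gross_pnl - costs

# A trade making 50.00 gross on a 10,000 notional UK equity purchase:
print(net_trade_pnl(50.0, 10_000, apply_stamp_duty=True))  # -1.0
```

Note how the fixed costs here turn a seemingly profitable trade into a net loss, which is exactly why neglecting them in a backtest is so dangerous.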
Slippage/Latency
Slippage is the difference in price achieved between the time when a trading system decides to transact and the time
when a transaction is actually carried out at an exchange.
Slippage is a considerable component of transaction costs and can make the difference between a very profitable
strategy and one that performs poorly.
Slippage is a function of the underlying asset volatility, the latency between the trading system and the exchange
and the type of strategy being carried out.
An instrument with higher volatility is more likely to be moving and so prices between signal and execution can differ
substantially.
Latency is defined as the time difference between signal generation and point of execution.
Higher frequency strategies are more sensitive to latency issues and improvements of milliseconds on this latency can
make all the difference towards profitability.
Momentum systems suffer more from slippage on average because they are trying to purchase instruments that are
already moving in the forecast direction. The opposite is true for mean-reverting strategies, as these purchase
instruments that are moving against the trade direction, so slippage can even work in their favour.
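One crude way to see how volatility and latency interact is to scale expected slippage with volatility and the square root of the latency window. This is my own toy parameterisation for illustration, not a standard industry model, and the figures are hypothetical:

```python
# Crude, illustrative parameterisation of expected slippage: the expected
# adverse price move scales with the asset's volatility and the square
# root of the time elapsed between signal generation and execution.
import math

def expected_slippage(annual_vol, latency_seconds, price):
    """Expected adverse price move over the latency window, in currency units."""
    seconds_per_year = 252 * 6.5 * 3600          # approx. trading seconds per year
    horizon_vol = annual_vol * math.sqrt(latency_seconds / seconds_per_year)
    return price * horizon_vol

# A stock at 100.00 with 20% annual vol: 100ms vs 10s of latency.
print(round(expected_slippage(0.20, 0.1, 100.0), 5))
print(round(expected_slippage(0.20, 10.0, 100.0), 5))
```

Even in this toy model, a ten-fold increase in latency roughly triples the expected slippage per trade, which compounds quickly for high-frequency strategies.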
Market Impact/Liquidity
Market impact is the cost incurred to traders due to the supply/demand dynamics of the exchange (and asset)
through which they are trying to trade.
A large order on a relatively illiquid asset is likely to move the market substantially as the trade will need to access a
large component of the current supply.
To counter this, large block trades are broken down into smaller chunks which are transacted periodically, as and
when new liquidity arrives at the exchange.
On the opposite end, for highly liquid instruments such as the S&P500 E-Mini index futures contract, low volume
trades are unlikely to adjust the "current price" in any great amount.
More illiquid assets are characterised by a larger spread, which is the difference between the current bid and ask
prices on the limit order book.
This spread is an additional transaction cost associated with any trade. Spread is a very important component of the
total transaction cost - as evidenced by the myriad of UK spread-betting firms whose advertising campaigns express
the "tightness" of their spreads for heavily traded instruments.
In order to successfully model the above costs in a backtesting system, various degrees of complex transaction cost
models have been introduced.
They range from simple flat modelling through to a non-linear quadratic approximation. Here we will outline the
advantages and disadvantages of each model:
Flat transaction costs are the simplest form of transaction cost modelling. They assume a fixed cost associated with
each trade. Thus they best represent the concept of brokerage commissions and fees.
They are not very accurate for modelling more complex behaviour such as slippage or market impact.
In fact, they do not consider asset volatility or liquidity at all. Their main benefit is that they are computationally
straightforward to implement.
However, they are likely to significantly under- or over-estimate transaction costs depending upon the strategy
being employed. Thus they are rarely used in practice.
More advanced transaction cost models start with linear models, continue with piece-wise linear models and conclude
with quadratic models.
They lie on a spectrum of least to most accurate, albeit with least to greatest implementation effort.
Since slippage and market impact are inherently non-linear phenomena quadratic functions are the most accurate at
modelling these dynamics.
Quadratic transaction cost models are much harder to implement and can take far longer to compute than for simpler
flat or linear models, but they are often used in practice.
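The three model families can be sketched as cost-versus-quantity functions. All coefficients below are illustrative placeholders, chosen only to show the shapes:

```python
# Sketch of the three transaction cost model families discussed above,
# as functions of traded quantity. Coefficients are illustrative only.

def flat_cost(quantity, fee=10.0):
    """Flat model: a fixed cost per trade, irrespective of size or volatility."""
    return fee

def linear_cost(quantity, per_share=0.01):
    """Linear model: cost grows proportionally with trade size."""
    return per_share * quantity

def quadratic_cost(quantity, a=0.005, b=1e-6):
    """Quadratic model: market impact grows faster than linearly with size."""
    return a * quantity + b * quantity ** 2

for q in (100, 10_000):
    print(q, flat_cost(q), linear_cost(q), round(quadratic_cost(q), 2))
```

Note how the flat model charges a 100-share trade and a 10,000-share trade identically, while the quadratic model's cost for the large trade is dominated by the non-linear impact term; that divergence is precisely why the simpler models mis-estimate costs for size-sensitive strategies.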
Algorithmic traders also attempt to make use of actual historical transaction costs for their strategies as inputs to their
current transaction models to make them more accurate.
This is tricky business and often verges on the complicated areas of modelling volatility, slippage and market
impact.
However, if the trading strategy is transacting large volumes over short time periods, then accurate estimates of the
incurred transaction costs can have a significant effect on the strategy bottom-line and so it is worth the effort to invest
in researching these models.
While transaction costs are a very important aspect of successful backtesting implementations, there are many other
issues that can affect strategy performance.
One choice that an algorithmic trader must make is how and when to make use of the different exchange orders
available.
This choice usually falls into the realm of the execution system, but we will consider it here as it can greatly affect
strategy backtest performance. There are two types of order that can be carried out: market orders and limit orders. A
market order executes a trade immediately, irrespective of the prices available on the opposing side of the book.
Thus large trades executed as market orders will often get a mixture of prices as each subsequent limit order on the
opposing side is filled. Market orders are considered aggressive orders since they will almost certainly be filled, albeit
with a potentially unknown cost.
Limit orders provide a mechanism for the strategy to determine the worst price at which the trade will get executed,
with the caveat that the trade may not get filled partially or fully.
Limit orders are considered passive orders since they are often unfilled, but when they are filled a price is guaranteed. An
individual exchange's collection of limit orders is known as the limit order book, which is essentially a queue of buy
and sell orders at certain sizes and prices.
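The fill behaviour of the two order types can be sketched against a toy ask-side book. The (price, size) levels and the helper functions below are entirely hypothetical:

```python
# Toy sketch of filling a market buy against a limit order book queue
# versus resting a limit buy. Book levels are hypothetical (price, size)
# pairs on the ask side, best ask first.

asks = [(100.0, 300), (100.5, 200), (101.0, 500)]

def fill_market_buy(book, quantity):
    """Walk the book: large market orders get a mixture of prices."""
    filled, cost = 0, 0.0
    for price, size in book:
        take = min(size, quantity - filled)
        filled += take
        cost += take * price
        if filled == quantity:
            break
    return cost / filled                  # average fill price

def fill_limit_buy(book, quantity, limit):
    """Never pay above the limit; may be partially filled or not at all."""
    filled = 0
    for price, size in book:
        if price > limit:
            break                         # worst acceptable price reached
        filled += min(size, quantity - filled)
        if filled == quantity:
            break
    return filled                         # may be less than quantity, or zero

print(fill_market_buy(asks, 400))                 # 100.125: a blend of two levels
print(fill_limit_buy(asks, 400, limit=100.0))     # 300: partial fill at the limit
```

A backtester that assumes every order fills in full at the signal price is implicitly ignoring both effects, which is one of the commonest sources of backtest/live divergence.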
When backtesting, it is essential to model the effects of using market or limit orders correctly.
For high-frequency strategies in particular, backtests can significantly outperform live trading if the effects of market
impact and the limit order book are not modelled accurately.
Another implementation issue concerns the quality of daily 'open-high-low-close' (OHLC) bar data. Note that this is
precisely the form of data given out by Yahoo Finance, which is a very common source of data for retail algorithmic
traders!
Cheap or free datasets, while suffering from survivorship bias (which we have already discussed in the previous
email), are also often composite price feeds from multiple exchanges.
This means that the extreme points (i.e. the open, close, high and low) of the data are very susceptible to "outlying"
values due to small orders at regional exchanges.
Further, these values are also sometimes more likely to be tick-errors that have yet to be removed from the dataset.
This means that if your trading strategy makes extensive use of any of the OHLC points specifically, backtest
performance can differ from live performance as orders might be routed to different exchanges depending upon your
broker and your available access to liquidity.
The only way to resolve these problems is to make use of higher frequency data or obtain data directly from an
individual exchange itself, rather than a cheaper composite feed.
In the next email we will look at some of the recommended textbooks for quantitative trading.
In today's lesson we're going to look at some of the core textbooks for quantitative and algorithmic trading.
Algorithmic trading is usually perceived as a complex area for beginners to get to grips with.
It covers a wide range of disciplines, with certain aspects requiring a significant degree of mathematical and
statistical maturity.
Consequently it can be extremely off-putting for the uninitiated. In reality, the overall concepts are straightforward to
grasp, while the details can be learned in an iterative, ongoing manner.
The beauty of algorithmic trading is that there is no need to test out knowledge on real capital, as many brokerages
provide highly realistic market simulators.
While there are certain caveats associated with such systems, they provide an environment to foster a deep level of
understanding, with absolutely no capital risk.
The first task is to gain a solid overview of the subject. I have found it to be far easier to avoid heavy mathematical
discussions until the basics are covered and understood. The best books I have found for this purpose are as follows:
1) Quantitative Trading by Ernest Chan - This is one of my favourite finance books. Dr. Chan provides a great
overview of the process of setting up a "retail" quantitative trading system, using MATLAB or Excel. He makes the
subject highly approachable and gives the impression that "anyone can do it". Although there are plenty of details that
are skipped over (mainly for brevity), the book is a great introduction to how algorithmic trading works. He discusses
alpha generation ("the trading model"), risk management, automated execution systems and certain strategies
(particularly momentum and mean reversion). This book is the place to start.
2) Inside the Black Box by Rishi K. Narang - In this book Dr. Narang explains in detail how a professional
quantitative hedge fund operates. It is pitched at a savvy investor who is considering whether to invest in such a "black
box". Despite the seeming irrelevance to a retail trader, the book actually contains a wealth of information on how a
"proper" quant trading system should be carried out. For instance, the importance of transaction costs and risk
management are outlined, with ideas on where to look for further information. Many retail algo traders could do well to
pick this up and see how the 'professionals' carry out their trading.
3) Algorithmic Trading & DMA by Barry Johnson - The phrase 'algorithmic trading', in the financial industry, usually
refers to the execution algorithms used by banks and brokers to execute efficient trades. I am using the term to cover
not only those aspects of trading, but also quantitative or systematic trading. This book is mainly about the former,
being written by Barry Johnson, who is a quantitative software developer at an investment bank. Does this mean it is
of no use to the retail quant? Not at all. Possessing a deeper understanding of how exchanges work and "market
microstructure" can aid immensely the profitability of retail strategies. Despite it being a heavy tome, it is worth picking
up.
Once the basic concepts are grasped, it is necessary to begin developing a trading strategy. This is usually known
as the 'alpha generation' phase.
Strategies are straightforward to find these days (as I mentioned in previous emails), however the true value comes in
determining your own trading parameters via extensive research and backtesting.
The following books discuss certain types of trading and execution systems and how to go about implementing them:
4) Algorithmic Trading by Ernest Chan - This is the second book by Dr. Chan. In the first book he discussed
momentum, mean reversion and certain high frequency strategies. This book discusses such strategies in depth and
provides significant implementation details, albeit with more mathematical complexity than in the first (e.g. Kalman
Filters, Stationarity/Cointegration, CADF etc). The strategies, once again, make extensive use of MATLAB but the code
can be easily modified to C++, Python/pandas or R for those with programming experience. It also provides updates
on the latest market behaviour, as the first book was written a few years back.
5) Trading and Exchanges by Larry Harris - This book concentrates on market microstructure, which I personally feel
is an essential area to learn about, even at the beginning stages of quant trading. Market microstructure is the
"science" of how market participants interact and the dynamics that occur in the order book. It is closely related to how
exchanges function and what actually happens when a trade is placed. This book is less about trading strategies as
such, but more about things to be aware of when designing execution systems. Many professionals in the quant
finance space regard this as an excellent book and I also highly recommend it.
At this stage, as a retail trader, you will be in a good place to begin researching the other components of a trading
system such as the execution mechanism (and its deep relationship with transaction costs), as well as risk and
portfolio management.
While the above five books are very good, I'd also like to take the opportunity to recommend my own book on the
subject, Successful Algorithmic Trading.
In the book I make extensive use of Python and associated libraries such as Scikit-Learn, Pandas, NumPy, SciPy
and Statsmodels to create an end-to-end algorithmic trading backtest simulator with Interactive Brokers as the
primary brokerage.
I present trading strategies at multiple frequencies and discuss how to optimise parameters of such strategies,
while outlining pitfalls to be aware of.
To find out more about the book, please visit the Successful Algorithmic Trading page.
In the next email we will ask whether quantitative traders can still succeed at the retail level.
Lesson 6: Can quantitative traders still succeed at the retail level?
In the fifth lesson of our quantitative finance email course we discussed some of the core textbooks for
quantitative and algorithmic trading.
In today's lesson we're going to look at some of the advantages enjoyed by retail quants over quantitative hedge
funds.
It is common, as a beginning algorithmic trader practising at the retail level, to question whether it is still possible to
compete with the large institutional quant funds.
In this email lesson I would like to argue that, due to the nature of the institutional regulatory environment, the
organisational structure and a need to maintain investor relations, funds suffer from certain disadvantages that do
not burden the retail trader.
"Big money" moves the markets, and as such one can dream up many strategies to take advantage of such
movements.
We will discuss some of these strategies in future emails. At this stage I would like to highlight the comparative advantages that retail traders enjoy over the larger funds.
Trading Advantages
There are many ways in which a retail algo trader can compete with a fund on the trading process alone, but there are some key areas in which the retail trader holds an edge:
Capacity - A retail trader has greater freedom to play in smaller markets. They can generate significant returns in
these spaces, even while institutional funds can't.
Crowding the trade - Funds suffer from "technology transfer", as staff turnover can be high. Non-Disclosure
Agreements and Non-Compete Agreements mitigate the issue, but it still leads to many quant funds "chasing the
same trade". Whimsical investor sentiment and the "next hot thing" exacerbate the issue. Retail traders are not
constrained to follow the same strategies and so can remain uncorrelated to the larger funds.
Market impact - When playing in highly liquid, non-OTC markets, the low capital base of retail accounts reduces
market impact substantially.
Leverage - A retail trader, depending upon their legal setup, is constrained by margin/leverage regulations. Private
investment funds do not suffer from the same disadvantage, although they are equally constrained from a risk
management perspective.
Liquidity - Having access to a prime brokerage is out of reach of the average retail algo trader. They have to "make
do" with a retail brokerage such as Interactive Brokers. Hence there is reduced access to liquidity in certain
instruments. Trade order-routing is also less clear and is one way in which strategy performance can diverge from
backtests.
Client news flow - Potentially the most important disadvantage for the retail trader is lack of access to client news
flow from their prime brokerage or credit-providing institution. Retail traders have to make use of non-traditional
sources such as meet-up groups, blogs, forums and open-access financial journals.
Risk Management
Retail algo traders often take a different approach to risk management than the larger quant funds, since they are not bound by an institutional risk mandate. This allows the retail trader to deploy custom or preferred risk modelling methodologies, without the need to follow practices imposed by outside investors. However, the alternative argument is that this flexibility can lead retail traders to become "sloppy" with risk management.
Risk concerns may be built-in to the backtest and execution process, without external consideration given to portfolio
risk as a whole.
Although "deep thought" might be applied to the alpha model (strategy), risk management might not achieve a similar
level of consideration.
Investor Relations
Outside investors are the key difference between retail shops and large funds. This drives all manner of incentives for
the larger fund - issues which the retail trader need not concern themselves with:
Compensation structure - In the retail environment the trader is concerned only with absolute return. There are no
high-water marks to be met and no capital deployment rules to follow. Retail traders are also able to suffer more
volatile equity curves since nobody is watching their performance who might be capable of redeeming capital from
their fund.
Regulations and reporting - Beyond taxation there is little in the way of regulatory reporting constraints for the retail
trader. Further, there is no need to provide monthly performance reports or "dress up" a portfolio prior to a client
newsletter being sent. This is a big time-saver.
Benchmark comparison - Funds are not only compared with their peers, but also "industry benchmarks". For a long-
only US equities fund, investors will want to see returns in excess of the S&P500, for example. Retail traders are not
compelled in the same way to compare their strategies to a benchmark.
Performance fees - The downside to running your own portfolio as a retail trader is the lack of management and performance fees enjoyed by the successful quant funds. There is no "2 and 20" to be had at the retail level!
Technology
One area where the retail trader is at a significant advantage is in the choice of technology stack for the trading
system.
Not only can the trader pick the "best tools for the job" as they see fit, but there are no concerns about legacy systems or firm-wide technology lock-in. Newer languages such as Python or R now possess packages to construct an end-to-end backtesting, execution, risk and portfolio management system with far fewer lines of code (LOC) than may be needed in a more verbose language such as C++. However, this freedom comes at a price. One either has to build the stack oneself or outsource all or part of it to vendors, which is expensive in terms of both time and capital. Further, a trader must debug all aspects of the trading system - a long and potentially painstaking process. All desktop research machines and any co-located servers must be paid for directly out of trading profits, as there are no outside investors to cover the bills.
In conclusion, it can be seen that retail traders possess significant comparative advantages over the larger quant
funds. Potentially, there are many ways in which these advantages can be exploited.
In the next email we will consider the best programming languages for algorithmic trading systems.
Lesson 7: Programming Languages for Algorithmic Trading Systems
In the sixth lesson of our quantitative finance email course we discussed some of the issues around whether retail traders can still succeed at algorithmic trading.
In today's lesson we're going to look at some of the programming languages that are useful for building algorithmic
trading systems.
One of the most frequent questions I receive in the QS mailbag is "What is the best programming language for
algorithmic trading?".
Today's lesson will outline the necessary components of an algorithmic trading system architecture and how the choice of programming language affects each component.
Firstly, the major components of an algorithmic trading system will be considered, such as the research tools, portfolio construction, risk management and execution systems.
Subsequently, different trading strategies will be examined and how they affect the design of the system. In particular
the frequency of trading and the likely trading volume will both be discussed.
Once the trading strategy has been selected, it is necessary to architect the entire system. This includes choice of
hardware, the operating system(s) and system resiliency against rare, potentially catastrophic events.
While the architecture is being considered, due regard must be paid to performance - both of the research tools and of the live execution environment.
Before deciding on the "best" language with which to write an automated trading system it is necessary to define the
requirements.
Research is concerned with evaluating strategy performance over historical data. The process of evaluating a trading strategy over prior market data is known as backtesting.
The data size and algorithmic complexity will have a big impact on the computational intensity of the backtester. CPU
speed and concurrency are often the limiting factors in optimising research execution speed.
Signal generation is concerned with generating a set of trading signals from an algorithm and sending such orders to the market, usually via a brokerage. For certain strategies a high level of performance is required, and I/O issues such as network bandwidth and latency are often the limiting factor in optimising execution systems. Thus the choice of languages for each component of your entire system may be quite different.
The type of algorithmic strategy employed will have a substantial impact on the design of the system.
It will be necessary to consider the markets being traded, the connectivity to external data vendors, the frequency and
volume of the strategy, the trade-off between ease of development and performance optimisation, as well as any
custom hardware, including co-located custom servers, GPUs or FPGAs that might be necessary.
The technology choices for a low-frequency US equities strategy will be vastly different from those of a high-frequency statistical arbitrage strategy trading on the futures market. Prior to the choice of language, many data vendor considerations must be evaluated.
It will be necessary to consider connectivity to the vendor, structure of any APIs, timeliness of the data, storage
requirements and resiliency in the face of a vendor going offline. It is also wise to possess rapid access to multiple
vendors!
Various instruments all have their own storage quirks, examples of which include multiple ticker symbols for equities
and expiration dates for futures (not to mention any specific OTC data). This needs to be factored in to the platform
design.
Frequency of strategy is likely to be one of the biggest drivers of how the technology stack will be defined. Strategies
employing data more frequently than minutely or secondly bars require significant consideration with regards to
performance.
A strategy exceeding secondly bars (i.e. tick data) leads to a performance driven design as the primary requirement.
For high frequency strategies a substantial amount of market data will need to be stored and evaluated, and a highly optimised backtester and execution system must be used. C/C++ (possibly with some assembler) is likely to be the strongest language candidate.
Ultra-high frequency strategies will almost certainly require custom hardware such as FPGAs, exchange co-location and kernel/network interface tuning.
Research Systems
Research systems typically involve a mixture of interactive development and automated scripting. The former
often takes place within an IDE such as Visual Studio, MatLab or R Studio. The latter involves extensive numerical
calculations over numerous parameters and data points.
This leads to a language choice that provides a straightforward environment in which to test code, but also sufficient performance to evaluate strategies over multiple parameter dimensions.
The prime consideration at this stage is that of execution speed. A compiled language (such as C++) is often useful if
the backtesting parameter dimensions are large. Remember that it is necessary to be wary of such systems if that is
the case!
Interpreted languages such as Python often make use of high-performance libraries such as NumPy/pandas for the
backtesting step, in order to maintain a reasonable degree of competitiveness with compiled equivalents.
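As a minimal sketch of this vectorised approach (the moving-average windows and crossover signal here are illustrative choices for the example, not from the original text), a backtest can be written in NumPy with no Python-level loop over bars:

```python
import numpy as np

def crossover_backtest(prices, fast=10, slow=30):
    """Vectorised moving-average crossover backtest.

    Returns the strategy equity curve. The signal is lagged by one
    bar so that today's position comes from yesterday's data,
    avoiding look-ahead bias.
    """
    prices = np.asarray(prices, dtype=float)

    def rolling_mean(x, w):
        # Rolling means via cumulative sums - no explicit loop
        c = np.cumsum(np.insert(x, 0, 0.0))
        return (c[w:] - c[:-w]) / w

    fast_ma = rolling_mean(prices, fast)[slow - fast:]
    slow_ma = rolling_mean(prices, slow)
    # Long one unit when the fast average is above the slow average
    signal = np.where(fast_ma > slow_ma, 1.0, 0.0)
    rets = np.diff(prices[slow - 1:]) / prices[slow - 1:-1]
    strat_rets = signal[:-1] * rets   # lagged signal times next-bar return
    return np.cumprod(1.0 + strat_rets)
```

The entire computation is handed to compiled NumPy routines, which is what keeps an interpreted-language backtester competitive with compiled equivalents.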
Ultimately the language chosen for the backtesting will be determined by specific algorithmic needs as well as the
range of libraries available in the language (more on that below).
However, the language used for the backtester and research environments can be completely independent of those
used in the portfolio construction, risk management and execution components, as will be seen.
The portfolio construction and risk management components are often overlooked by retail algorithmic
traders.
This is almost always a mistake. These tools provide the mechanism by which capital will be preserved. They not only attempt to reduce the number of "risky" bets, but also minimise churn of the trades themselves, reducing transaction costs.
Sophisticated versions of these components can have a significant effect on the quality and consistency of profitability.
It is straightforward to create a stable of strategies, as the portfolio construction mechanism and risk manager can easily be modified to handle multiple systems. Thus they should be considered essential components at the outset of the design of any algorithmic trading system.
The job of the portfolio construction system is to take a set of desired trades and produce the set of actual trades that minimise churn, maintain exposures to various factors (such as sectors, asset classes, volatility etc) and optimise the allocation of capital to the various strategies in the portfolio.
Portfolio construction often reduces to a linear algebra problem (such as a matrix factorisation) and hence
performance is highly dependent upon the effectiveness of the numerical linear algebra implementation available.
Common libraries include uBLAS, LAPACK and NAG for C++. MatLab also possesses extensively optimised matrix operations. A frequently rebalanced portfolio will require a compiled (and well optimised!) matrix library to carry this step out, so as not to bottleneck the trading system.
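To make the linear-algebra reduction concrete, here is a small sketch (the minimum-variance objective is one common textbook choice, used purely for illustration) that obtains portfolio weights with NumPy's LAPACK-backed solver:

```python
import numpy as np

def min_variance_weights(cov):
    """Minimum-variance portfolio weights: w proportional to inv(Cov) @ 1,
    normalised to sum to one."""
    ones = np.ones(cov.shape[0])
    # Solve the linear system directly rather than forming an explicit
    # inverse - faster and numerically more stable
    raw = np.linalg.solve(cov, ones)
    return raw / raw.sum()
```

The performance of this step on a frequently rebalanced book is dominated by the underlying BLAS/LAPACK implementation, which is exactly the point made above.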
Risk can come in many forms: increased volatility (although this may be seen as desirable for certain strategies!), increased correlations between asset classes, counterparty default, server outages, "black swan" events and undetected bugs in the trading code, to name a few.
Risk management components try to anticipate the effects of excessive volatility and correlation between asset classes and their subsequent effect on trading capital.
Often this reduces to a set of statistical computations such as Monte Carlo "stress tests".
This is very similar to the computational needs of a derivatives pricing engine and as such will be CPU-bound. These
simulations are highly parallelisable (see below) and, to a certain degree, it is possible to "throw hardware at the
problem".
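As an illustration of such a computation (the multivariate-normal returns model and all parameter values here are assumptions for the sketch, not a recommendation), a Monte Carlo Value-at-Risk estimate takes only a few lines of NumPy:

```python
import numpy as np

def monte_carlo_var(weights, mu, cov, alpha=0.99, n_paths=100_000, seed=42):
    """Monte Carlo Value-at-Risk under a multivariate normal returns model."""
    rng = np.random.default_rng(seed)
    # Draw correlated daily returns for all assets at once
    rets = rng.multivariate_normal(mu, cov, size=n_paths)
    port = rets @ weights
    # VaR is the loss at the (1 - alpha) quantile of the P&L distribution
    return -np.quantile(port, 1.0 - alpha)
```

Each simulated path is independent of the others, which is why this kind of stress test parallelises so well across cores.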
Execution Systems
The job of the execution system is to receive filtered trading signals from the portfolio construction and risk
management components and send them on to a brokerage or other means of market access.
For the majority of retail algorithmic trading strategies this involves an API or FIX connection to a brokerage such as
Interactive Brokers.
The primary considerations when deciding upon a language include quality of the API, the availability of language wrappers for it, its update frequency and any performance limitations.
The "quality" of the API refers to how well documented it is, what sort of performance it provides, whether it needs
standalone software to be accessed or whether a gateway can be established in a headless fashion (i.e. no GUI).
In the case of Interactive Brokers, the Trader WorkStation tool needs to be running in a GUI environment in order to
access their API. Specifically, this means it cannot be run on a Linux console server environment.
Most APIs will provide a C++ and/or Java interface. It is usually up to the community to develop language-specific wrappers for other languages such as Python, R or C#.
Note that with every additional plugin utilised (especially API wrappers) there is scope for bugs to creep into the system. Always test plugins of this sort and ensure they are actively maintained. A worthwhile gauge is to see how many new updates to the codebase have been made in recent months.
Execution frequency is of the utmost importance in the execution algorithm. Note that hundreds of orders may be sent every minute, and as such performance is critical.
Slippage will be incurred through a badly-performing execution system and this will have a dramatic impact on
profitability.
Statically-typed languages (see below) such as C++/Java are generally optimal for execution, but there is a trade-off in development time, testing and ease of maintenance.
Dynamically-typed languages, such as Python and Perl, are now generally "fast enough". Always make sure the
components are designed in a modular fashion (see below) so that they can be "swapped out" out as the system
scales.
The components of a trading system, its frequency and volume requirements have been discussed above, but system
infrastructure has yet to be covered.
Those acting as a retail trader or working in a small fund will likely be "wearing many hats". You will be covering the
alpha model, risk management and execution parameters, and also the final implementation of the system. Before
delving into specific languages the design of an optimal system architecture will be discussed.
Separation of Concerns
One of the most important decisions that must be made at the outset is how to "separate the concerns" of a
trading system.
In software development, this essentially means how to break up the different aspects of the trading system into
separate modular components.
By exposing interfaces at each of the components it is easy to swap out parts of the system for other versions that aid performance, reliability or maintenance, without modifying any external dependencies.
This is the "best practice" for such systems. For strategies at lower frequencies such practices are advised.
For ultra-high frequency trading the rulebook might have to be ignored at the expense of tweaking the system for even more performance.
Creating a component map of an algorithmic trading system is worth an email in itself. However, an optimal approach
is to make sure there are separate components for the historical and real-time market data inputs, data storage, data
access API, backtester, strategy parameters, portfolio construction, risk management and automated execution
systems.
For instance, if the data store being used is currently underperforming, even at significant levels of optimisation, it can be swapped out with minimal rewrites to the data ingestion or data access API. As far as the backtester and subsequent components are concerned, there is no difference.
Another benefit of separated components is that it allows a variety of programming languages to be used in the overall
system.
There is no need to be restricted to a single language if the communication method of the components is language
independent. This will be the case if they are communicating via TCP/IP, ZeroMQ or some other language-
independent protocol.
As a concrete example, consider the case of a backtesting system being written in C++ for "number crunching" performance, while the portfolio manager and execution systems are written in Python using SciPy and IBPy (an open-source Python wrapper for the Interactive Brokers API).
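A minimal sketch of such language-independent communication (the length-prefixed JSON framing is an assumption of this example; ZeroMQ or a FIX gateway would be more typical in production) might look like:

```python
import json
import socket

def send_msg(sock, obj):
    """Length-prefixed JSON framing: a 4-byte big-endian length header
    followed by a UTF-8 JSON payload. Any language can speak this."""
    payload = json.dumps(obj).encode("utf-8")
    sock.sendall(len(payload).to_bytes(4, "big") + payload)

def recv_msg(sock):
    """Read one framed JSON message from the socket."""
    n = int.from_bytes(_recv_exact(sock, 4), "big")
    return json.loads(_recv_exact(sock, n).decode("utf-8"))

def _recv_exact(sock, n):
    # TCP is a byte stream, so loop until exactly n bytes arrive
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("socket closed mid-message")
        buf += chunk
    return buf
```

Because the wire format is just bytes and JSON, the C++ backtester and the Python execution component only need to agree on the framing, not on a language.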
Performance Considerations
Performance is a significant consideration for most trading strategies; for higher frequency strategies it is the most important factor. "Performance" covers a wide range of issues, such as algorithmic execution speed, network latency, bandwidth, data I/O, concurrency/parallelism and scaling.
Each of these areas is individually covered by large textbooks, so this email will only scratch the surface of each topic. Architecture and language choice will now be discussed in terms of their effects on performance.
The prevailing wisdom as stated by Donald Knuth, one of the fathers of computer science, is that "premature optimisation is the root of all evil".
This is almost always the case - except when building a high frequency trading algorithm! For those who are interested in lower frequency strategies, a common approach is to build a system in the simplest way possible and only optimise as bottlenecks begin to appear. Profiling tools can be used to locate those bottlenecks, either in a MS Windows or Linux environment. There are many operating system and language tools available to do so, as well as third party utilities. Language choice will now be discussed in the context of performance.
C++, Java, Python, R and MatLab all contain high-performance libraries (either as part of their standard or externally)
for basic data structure and algorithmic work. C++ ships with the Standard Template Library, while Python contains
NumPy/SciPy. Common mathematical tasks are to be found in these libraries and it is rarely beneficial to write a new
implementation.
One exception is if highly customised hardware architecture is required and an algorithm is making extensive use of proprietary extensions (such as custom caches). However, "reinvention of the wheel" often wastes time that could be better spent developing and optimising other parts of the trading infrastructure. Development time is extremely precious, especially in the context of sole developers.
Latency is often an issue of the execution system, as the research tools are usually situated on the same machine. For the former, latency can occur at multiple points along the execution path. Databases must be consulted (disk/network latency), signals must be generated (operating system, kernel messaging latency), trade signals sent (NIC latency) and orders processed (exchange-internal latency).
For higher frequency operations it is necessary to become intimately familiar with kernel optimisation as well as optimisation of network transmission. This is a deep area and is significantly beyond the scope of this email, but if a UHFT algorithm is desired then be aware of the depth of knowledge required!
Caching is very useful in the toolkit of a quantitative trading developer. Caching refers to the concept of storing frequently accessed data in a manner which allows higher-performance access, at the expense of potential staleness of the data. A common use case occurs in web development, when taking data from a disk-backed relational database and putting it into memory. Any subsequent requests for the data do not have to "hit the database", so performance gains can be significant. In trading, for instance, the current state of a strategy portfolio can be stored in a cache until it is rebalanced, such that the list doesn't need to be regenerated upon each loop of the trading algorithm.
However, caching is not without its own issues. Regeneration of cache data all at once, due to the volatile nature of cache storage, can place significant demand on infrastructure. Another issue is "dog-piling", where multiple generations of a new cache copy are carried out under extremely high load, leading to cascade failure.
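The staleness trade-off can be made explicit with a tiny time-to-live cache (a simplified sketch for illustration; production systems would more likely reach for a dedicated cache such as Redis or memcached):

```python
import time

class TTLCache:
    """Tiny time-based cache: trade potential staleness for speed."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._store = {}          # key -> (value, timestamp)

    def get(self, key, compute):
        now = time.monotonic()
        hit = self._store.get(key)
        if hit is not None and now - hit[1] < self.ttl:
            return hit[0]         # fresh enough: serve from memory
        value = compute()         # stale or missing: regenerate once
        self._store[key] = (value, now)
        return value
```

Here a portfolio snapshot, for example, would only be regenerated once per TTL window rather than on every loop of the trading algorithm.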
Dynamic memory allocation is an expensive operation in software execution. Thus it is imperative for higher-performance trading applications to be well aware of how memory is being allocated and deallocated during program flow. Newer languages such as Java, C# and Python all perform automatic garbage collection, which refers to the deallocation of dynamically allocated memory when objects go out of scope.
Garbage collection is extremely useful during development as it reduces errors and aids readability. However, it is often sub-optimal for certain high frequency trading strategies. Custom garbage collection is often desired for these cases. In Java, for instance, by tuning the garbage collector and heap configuration, it is possible to obtain high performance for HFT strategies.
C++ doesn't provide a native garbage collector and so it is necessary to handle all memory allocation/deallocation as part of an object's implementation. While potentially error prone (leading to dangling pointers), it is extremely useful to have fine-grained control of how objects appear on the heap for certain applications. When choosing a language, make sure to study how the garbage collector works and whether it can be modified to optimise for a particular use case.
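In Python, for example, one crude but effective tactic (a sketch, with the caveat that reference-counted objects are still freed immediately; only the cyclic collector is paused) is to suspend garbage collection around a latency-sensitive section:

```python
import gc

def run_critical_section(fn):
    """Run a latency-sensitive callable with Python's cyclic GC paused."""
    was_enabled = gc.isenabled()
    gc.disable()               # no GC pauses inside the hot path
    try:
        return fn()
    finally:
        if was_enabled:
            gc.enable()        # restore normal behaviour...
            gc.collect()       # ...and let the collector catch up now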
Many operations in algorithmic trading systems are amenable to parallelisation. This refers to the concept of carrying out multiple programmatic operations at the same time, i.e. in "parallel".
So-called "embarrassingly parallel" algorithms include steps that can be computed fully independently of other steps.
Certain statistical operations, such as Monte Carlo simulations, are a good example of embarrassingly parallel algorithms, as each random draw and subsequent path operation can be computed without knowledge of other paths.
Other algorithms are only partially parallelisable. Fluid dynamics simulations are such an example, where the domain of computation can be subdivided, but ultimately these domains must communicate with each other and thus the operations are partially sequential. Parallelisable algorithms are subject to Amdahl's Law, which provides a theoretical upper limit on the performance increase of a parallelised algorithm when split over N separate processes.
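Amdahl's Law itself is a one-line formula: with a fraction p of the work parallelisable over N processes, the speedup is 1 / ((1 - p) + p/N). In code:

```python
def amdahl_speedup(parallel_fraction, n_procs):
    """Theoretical speedup limit under Amdahl's Law."""
    p = parallel_fraction
    # The serial fraction (1 - p) is never accelerated, capping the gain
    return 1.0 / ((1.0 - p) + p / n_procs)
```

Even with 90% of a backtest parallelisable, eight cores give less than a 5x speedup, and no number of cores can ever exceed 10x - which is why the serial portions of a system deserve as much attention as the parallel ones.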
Parallelisation has become increasingly important as a means of optimisation since processor clock-speeds have
stagnated. Newer processors contain many cores with which to perform parallel calculations.
The rise of consumer graphics hardware (predominantly for video games) has led to the development of Graphics Processing Units (GPUs), which contain hundreds of "cores" for highly concurrent operations. Such GPUs are now very affordable. High-level frameworks, such as Nvidia's CUDA, have led to widespread adoption in academia and finance.
Such GPU hardware is generally only suitable for the research aspect of quantitative finance, whereas other more specialised hardware (including Field-Programmable Gate Arrays - FPGAs) is used for (U)HFT. A natural application of parallelism is to optimise a backtester, since the calculations for each parameter combination are generally independent of the others.
Scaling in software engineering and operations refers to the ability of the system to handle consistently increasing
loads in the form of greater requests, higher processor usage and more memory allocation.
In algorithmic trading a strategy is able to scale if it can accept larger quantities of capital and still produce consistent
returns. The trading technology stack scales if it can endure larger trade volumes and increased latency, without
bottlenecking.
While systems must be designed to scale, it is often hard to predict beforehand where a bottleneck will occur.
Rigorous logging, testing, profiling and monitoring will aid greatly in allowing a system to scale. Languages themselves are often described as "unscalable". This is usually the result of misinformation rather than hard fact.
It is the total technology stack that should be ascertained for scalability, not the language. Clearly certain languages
have greater performance than others in particular use cases, but one language is never "better" than another in every
sense.
In order to further introduce the ability to handle "spikes" in the system (i.e. sudden volatility which triggers a raft of
trades), it is useful to create a "message queuing architecture". This simply means placing a message queue
system between components so that orders are "stacked up" if a certain component is unable to process many
requests.
Rather than requests being lost, they are simply kept in a stack until the message is handled. This is particularly useful for sending trades to an execution engine.
If the engine is suffering under heavy latency then it will back up trades. A queue between the trade signal generator and the execution API will alleviate this issue at the expense of potential trade slippage. A well-respected open-source message queue broker for this purpose is RabbitMQ.
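A stripped-down sketch of this pattern using only Python's standard library (the in-process `queue.Queue` stands in for an external broker such as RabbitMQ, and the "executor" merely records orders instead of sending them to a brokerage):

```python
import queue
import threading

def run_pipeline(signals):
    """Decouple signal generation from execution with a bounded queue."""
    q = queue.Queue(maxsize=100)   # bounded: applies back-pressure if full
    fills = []

    def executor():
        while True:
            order = q.get()
            if order is None:      # sentinel value: shut down cleanly
                break
            fills.append(order)    # stand-in for sending to the broker

    worker = threading.Thread(target=executor)
    worker.start()
    for s in signals:
        q.put(s)                   # blocks ("stacks up") if executor lags
    q.put(None)
    worker.join()
    return fills
```

If the executor falls behind during a volatility spike, orders accumulate in the queue rather than being dropped - exactly the behaviour described above.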
The hardware running your strategy can have a significant impact on the profitability of your algorithm.
This is not an issue restricted to high frequency traders either. A poor choice in hardware and operating system can
lead to a machine crash or reboot at the most inopportune moment. Thus it is necessary to consider where your
application will reside. The choice is generally between a personal desktop machine, a remote server, a "cloud"
provider or an exchange co-located server.
Desktop machines are simple to install and administer, especially with newer user friendly operating systems such as
Windows 7/8, Mac OS X and Ubuntu. Desktop systems do possess some significant drawbacks, however.
The foremost is that the versions of operating systems designed for desktop machines are likely to require reboots/patching (and often at the worst of times!). They also use up more computational resources by virtue of requiring a graphical user interface (GUI), and hardware in a home (or local office) environment can suffer internet connectivity and power uptime problems. The main benefit of a desktop system is that significant computational horsepower can be purchased for a fraction of the cost of a remote dedicated server (or cloud-based system) of comparable speed.
A dedicated server or cloud-based machine, while often more expensive than a desktop option, allows for more
significant redundancy infrastructure, such as automated data backups, the ability to more straightforwardly ensure
uptime and remote monitoring. They are harder to administer since they require the ability to use the remote login capabilities of the operating system.
In Windows this is generally via the GUI Remote Desktop Protocol (RDP). In Unix-based systems the command-line Secure SHell (SSH) is used. Unix-based server infrastructure is almost always command-line based, which immediately renders GUI-based programming tools unusable.
A co-located server, as the phrase is used in the capital markets, is simply a dedicated server that resides within an exchange in order to reduce latency of the trading algorithm. This is absolutely necessary for certain high frequency trading strategies.
The final aspect to hardware choice and the choice of programming language is platform-independence. Is there a
need for the code to run across multiple different operating systems? Is the code designed to be run on a
particular type of processor architecture, such as the Intel x86/x64 or will it be possible to execute on RISC
processors such as those manufactured by ARM? These issues will be highly dependent upon the frequency and type of strategy being implemented.
One of the best ways to lose a lot of money on algorithmic trading is to create a system with no resiliency.
This refers to the durability of the system when subject to rare events, such as brokerage bankruptcies, sudden excess
volatility, region-wide downtime for a cloud server provider or the accidental deletion of an entire trading database.
Years of profits can be eliminated within seconds with a poorly-designed architecture. It is absolutely essential to
consider issues such as debugging, testing, logging, backups, high-availability and monitoring as core components of
your system.
It is likely that in any reasonably complicated custom quantitative trading application, at least 50% of development time will be spent on debugging, testing and maintenance.
Nearly all programming languages either ship with an associated debugger or possess well-respected third-party
alternatives. In essence, a debugger allows execution of a program with insertion of arbitrary break points in the code
path, which temporarily halt execution in order to investigate the state of the system. The main benefit of debugging is
that it is possible to investigate the behaviour of code prior to a known crash point.
Debuggers are an essential component in the toolbox for analysing programming errors. However, they are more widely used in compiled languages such as C++ or Java, as interpreted languages such as Python are often easier to debug due to fewer lines of code and less verbose statements.
Despite this tendency, Python does ship with pdb, which is a sophisticated debugging tool. The Microsoft Visual
C++ IDE possesses extensive GUI debugging utilities, while for the command line Linux C++ programmer, the gdb
debugger exists.
Testing in software development refers to the process of applying known parameters and results to specific functions, methods and objects within a codebase, in order to simulate behaviour and evaluate multiple code-paths, helping to ensure that a system behaves as it should.
A more recent paradigm is known as Test Driven Development (TDD), where test code is developed against a
specified interface with no implementation. Prior to the completion of the actual codebase all tests will fail. As code is
written to "fill in the blanks", the tests will eventually all pass, at which point development should cease.
TDD requires extensive upfront specification design as well as a healthy degree of discipline in order to carry out successfully. In C++, Boost provides a unit testing framework. In Java, the JUnit library exists to fulfil the same purpose. Python also has the unittest module as part of the standard library. Many other languages possess unit testing frameworks, and often there are multiple options per language.
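As a brief illustration with Python's unittest module (the `position_size` function is a hypothetical helper written for this sketch, not from the original text):

```python
import unittest

def position_size(equity, risk_frac, stop_distance):
    """Shares to buy so that hitting the stop loses risk_frac of equity."""
    if stop_distance <= 0:
        raise ValueError("stop_distance must be positive")
    return int(equity * risk_frac / stop_distance)

class TestPositionSize(unittest.TestCase):
    def test_basic_sizing(self):
        # Risking 1% of 100,000 with a $2 stop should give 500 shares
        self.assertEqual(position_size(100_000, 0.01, 2.0), 500)

    def test_rejects_bad_stop(self):
        # A zero or negative stop distance must be refused loudly
        with self.assertRaises(ValueError):
            position_size(100_000, 0.01, 0.0)
```

Under TDD these tests would be written first, fail against an empty implementation, and pass once the sizing logic is filled in.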
In a production environment, sophisticated logging is absolutely essential. Logging refers to the process of
outputting messages, with various degrees of severity, regarding execution behaviour of a system to a flat file or
database.
Logs are a "first line of attack" when hunting for unexpected program runtime behaviour. Unfortunately the shortcomings of a logging system tend only to be discovered after the fact! As with the backups discussed below, a logging system should be given due consideration before a system is designed.
Both Microsoft Windows and Linux come with extensive system logging capability and programming languages tend
to ship with standard logging libraries that cover most use cases. It is often wise to centralise logging information in
order to analyse it at a later date, since it can often lead to ideas about improving performance or error reduction,
which will almost certainly have a positive impact on your trading returns.
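A minimal Python sketch of such severity-tagged logging (the logger name, file path and message format are illustrative assumptions):

```python
import logging

def build_trade_logger(path="trading.log"):
    """Logger writing severity-tagged messages to console and a flat file."""
    logger = logging.getLogger("trading")
    logger.setLevel(logging.DEBUG)
    if not logger.handlers:        # avoid duplicate handlers on re-use
        fmt = logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        for handler in (logging.StreamHandler(), logging.FileHandler(path)):
            handler.setFormatter(fmt)
            logger.addHandler(handler)
    return logger
```

Because every record carries a timestamp and severity, the flat file can later be shipped to a central store and analysed for error patterns or performance ideas.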
While logging of a system will provide information about what has transpired in the past, monitoring of an
application will provide insight into what is happening right now. All aspects of the system should be considered for
monitoring. System-level metrics such as disk usage, available memory, network bandwidth and CPU usage provide basic load information.
Trading metrics such as abnormal prices/volume, sudden rapid drawdowns and account exposure for different sectors/markets should also be continuously monitored. Further, a threshold system should be instigated that provides notification when certain metrics are breached, elevating the notification method (email, SMS, automated phone call) depending upon the severity of the metric.
System monitoring is often the domain of the system administrator or operations manager. However, as a sole
trading developer, these metrics must be established as part of the larger design. Many solutions for monitoring exist:
proprietary, hosted and open source, which allow extensive customisation of metrics for a particular use case.
Backups and high availability should be prime concerns of a trading system. Consider the following two questions:
1) If an entire production database of market data and trading history was deleted (without backups) how would the
research and execution algorithm be affected? 2) If the trading system suffers an outage for an extended period (with
open positions) how would account equity and ongoing profitability be affected? The answers to both of these questions are often sobering!
It is imperative to put in place a system for backing up data and also for testing the restoration of such data.
Many individuals do not test a restore strategy. If recovery from a crash has not been tested in a safe environment,
what guarantees exist that restoration will be available at the worst possible moment?
Similarly, high availability needs to be "baked in from the start". Redundant infrastructure (even at additional expense) must always be considered, as the cost of downtime is likely to far outweigh the ongoing maintenance cost of such systems. I won't delve too deeply into this topic as it is a large area, but make sure it is one of the first considerations given to your trading system.
Choosing a Language
Considerable detail has now been provided on the various factors that arise when developing a custom high-
performance algorithmic trading system. The next stage is to discuss how programming languages are generally
categorised.
Type Systems
When choosing a language for a trading stack it is necessary to consider the type system. The languages which are
of interest for algorithmic trading are either statically- or dynamically-typed.
A statically-typed language performs checks of the types (e.g. integers, floats, custom classes etc) during the
compilation process. Such languages include C++ and Java. A dynamically-typed language performs the majority of
its type-checking at runtime. Such languages include Python, Perl and JavaScript.
For a highly numerical system such as an algorithmic trading engine, type-checking at compile time can be extremely
beneficial, as it can eliminate many bugs that would otherwise lead to numerical errors.
However, type-checking doesn't catch everything, and this is where exception handling comes in, due to the necessity of handling errors at run-time.
'Dynamic' languages (i.e. those that are dynamically-typed) can often lead to run-time errors that would otherwise be caught by a compile-time type-check. For this reason, the concepts of TDD (see above) and unit testing arose, which, when carried out correctly, often provide more safety than compile-time checking alone.
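As a toy illustration of the point, a Python unit test can pin down a type mistake that a static compiler would have flagged before the program ever ran (the function and tests here are hypothetical examples, not from any real codebase):

```python
import unittest


def mid_price(bid, ask):
    """Mid price of a quote; assumes bid and ask are numeric."""
    return (bid + ask) / 2.0


class TestMidPrice(unittest.TestCase):
    def test_numeric_inputs(self):
        self.assertAlmostEqual(mid_price(99.5, 100.5), 100.0)

    def test_string_input_rejected(self):
        # In a statically-typed language this would fail to compile;
        # in Python the TypeError only appears at run-time, so a unit
        # test is what catches it before production does.
        with self.assertRaises(TypeError):
            mid_price("99.5", 100.5)
```

Running `python -m unittest` against this module exercises both the happy path and the type error.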
Another benefit of statically-typed languages is that the compiler is able to make many optimisations that are otherwise unavailable to a dynamically-typed language, simply because the type (and thus the memory requirements) are known at compile-time. In fact, part of the inefficiency of dynamic languages stems from the fact that objects must be type-inspected at run-time, and this carries a performance hit. Libraries for dynamic languages, such as NumPy/SciPy, mitigate this issue by enforcing a single type within their arrays.
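The standard-library `array` module shows the same idea in miniature: fixing a single element type up front, as NumPy does with its dtypes, is what lets such containers store values compactly and skip per-element type inspection:

```python
from array import array

# A homogeneous array of C doubles: the element type is fixed at
# creation, unlike a Python list, which may hold objects of any type.
prices = array("d", [100.1, 100.2, 100.3])

prices.append(100.4)        # fine: coerced to a C double
try:
    prices.append("oops")   # rejected: the element type is enforced
except TypeError:
    pass

# Each element occupies exactly itemsize bytes (8 for a C double),
# rather than a full boxed Python object.
assert prices.itemsize == 8
```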
Open Source or Proprietary?
One of the biggest choices available to an algorithmic trading developer is whether to use proprietary (commercial) or open-source technologies.
There are advantages and disadvantages to both approaches. It is necessary to consider how well a language is supported, the activity of the community surrounding the language, ease of installation and maintenance, quality of the documentation and any licensing/maintenance costs.
The Microsoft .NET stack (including Visual C++, Visual C#) and MathWorks' MatLab are two of the larger
proprietary choices for developing custom algorithmic trading software. Both tools have had significant "battle testing"
in the financial space, with the former making up the predominant software stack for investment banking trading
infrastructure and the latter being heavily used for quantitative trading research within investment funds.
Microsoft and MathWorks both provide extensive high quality documentation for their products. Further, the
communities surrounding each tool are very large with active web forums for both. The .NET software allows cohesive
integration with multiple languages such as C++, C# and VB, as well as easy linkage to other Microsoft products such
as the SQL Server database via LINQ. MatLab also has many plugins/libraries (some free, some commercial) for nearly any quantitative research domain.
There are also drawbacks. With either piece of software the costs are not insignificant for a lone trader (although Microsoft does provide an entry-level version of Visual Studio for free). Microsoft tools "play well" with each other, but integrate less well with external code. Visual Studio must also be executed on Microsoft Windows, which is arguably far less performant than an equivalent, optimally tuned Linux server.
MatLab also lacks a few key plugins, such as a good wrapper around the Interactive Brokers API, one of the few brokers amenable to high-performance algorithmic trading. The main issue with proprietary products is the lack of availability of the source code. This means that if the utmost performance is truly required, both of these tools will be far less attractive.
Open-source tools have been industry grade for some time. Much of the alternative asset space makes extensive use of open-source Linux, MySQL/PostgreSQL, Python, R, C++ and Java in high-performance production roles. However, they are far from restricted to this domain. Python and R, in particular, contain a wealth of extensive numerical libraries for performing nearly any type of data analysis imaginable, often at execution speeds comparable to compiled languages, with certain caveats.
The main benefit of using interpreted languages is the speed of development. Python and R require far fewer lines of code (LOC) to achieve similar functionality, principally due to their extensive libraries. Further, they often allow interactive console-based development, greatly shortening the iterative development cycle.
Given that time as a developer is extremely valuable, and execution speed often less so (unless in the HFT space), it is worth giving extensive consideration to an open-source technology stack. Python and R possess significant development communities and are extremely well supported, due to their popularity. Documentation is excellent and bugs, at least in the core libraries, are rare.
Open source tools often suffer from a lack of a dedicated commercial support contract and run optimally on
systems with less-forgiving user interfaces. A typical Linux server (such as Ubuntu) will often be fully command-line
oriented. In addition, Python and R can be slow for certain execution tasks. There are mechanisms for integrating with
C++ in order to improve execution speeds, but it requires some experience in multi-language programming.
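The size of that speed gap, and why hot paths get pushed down into C/C++, can be seen even without writing any C, by timing a pure-Python accumulation loop against the C-implemented built-in `sum` (the data size and repeat count here are arbitrary choices for illustration):

```python
import timeit

data = list(range(100_000))


def py_sum(xs):
    """Pure-Python accumulation loop."""
    total = 0
    for x in xs:
        total += x
    return total


# Time both approaches over the same data.
t_python = timeit.timeit(lambda: py_sum(data), number=20)
t_builtin = timeit.timeit(lambda: sum(data), number=20)

# The built-in, implemented in C, is typically several times faster.
print(f"pure Python: {t_python:.4f}s, built-in sum: {t_builtin:.4f}s")
```

The same principle underlies tools such as Cython and C extension modules: keep the orchestration in Python, move the tight loops into compiled code.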
While proprietary software is not immune from dependency/versioning issues it is far less common to have to deal
with incorrect library versions in such environments. Open source operating systems such as Linux can be trickier to
administer.
I will venture my personal opinion here and state that I build all of my trading tools with open-source technologies. In particular I use: Ubuntu, MySQL, Python, C++ and R. The maturity, community size, ability to "dig deep" if problems occur and lower total cost of ownership (TCO) far outweigh the simplicity of proprietary GUIs and easier installation. Having said that, Microsoft Visual Studio (especially for C++) is a fantastic Integrated Development Environment (IDE).
Batteries Included
The header of this section refers to the "out of the box" capabilities of the language - what libraries does it contain and how good are they?
This is where mature languages have an advantage over newer variants. C++, Java and Python all now possess extensive libraries for network programming, HTTP, operating system interaction, GUIs, regular expressions (regex), iteration and basic algorithms.
C++ is famed for its Standard Template Library (STL) which contains a wealth of high performance data structures
and algorithms "for free". Python is known for being able to communicate with nearly any other type of system/protocol
(especially the web), mostly through its own standard library. R has a wealth of statistical and econometric tools built in, while MatLab is extremely optimised for numerical linear algebra code (which is found throughout portfolio optimisation and derivatives pricing, for instance).
Outside of the standard libraries, C++ makes use of the Boost library, which fills in the "missing parts" of the standard
library. In fact, many parts of Boost made it into the TR1 standard and subsequently into the C++11 specification.
Python has the high performance NumPy/SciPy/Pandas data analysis library combination, which has gained
widespread acceptance for algorithmic trading research. Further, high-performance plugins exist for access to the
main relational databases, such as MySQL++ (MySQL/C++), JDBC (Java/MatLab), MySQLdb (MySQL/Python) and
psycopg2 (PostgreSQL/Python). Python can even communicate with R via the RPy plugin!
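The access pattern is much the same across all of those database plugins. As a self-contained sketch, here it is with the standard-library `sqlite3` module standing in for MySQLdb/psycopg2 (the table name, columns and prices are invented for illustration):

```python
import sqlite3

# In-memory database standing in for a MySQL/PostgreSQL price store.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE daily_bar (symbol TEXT, dt TEXT, close REAL)"
)
conn.executemany(
    "INSERT INTO daily_bar VALUES (?, ?, ?)",
    [("SPY", "2024-01-02", 472.65), ("SPY", "2024-01-03", 468.79)],
)

# Parameterised query, exactly as one would write it with
# MySQLdb or psycopg2 (only the parameter marker style differs).
rows = conn.execute(
    "SELECT dt, close FROM daily_bar WHERE symbol = ? ORDER BY dt",
    ("SPY",),
).fetchall()
```

Swapping the backend for MySQL or PostgreSQL is then largely a matter of changing the connection call, since all of these drivers follow the Python DB-API.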
An often overlooked aspect of a trading system while in the initial research and design stage is the connectivity to a
broker API. Most APIs natively support C++ and Java, but some also support C# and Python, either directly or with
community-provided wrapper code to the C++ APIs. In particular, Interactive Brokers can be connected to via the
IBPy plugin. Where the highest performance is required, brokerages also support the FIX protocol.
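For a flavour of what FIX involves at the wire level: messages are SOH-delimited tag=value pairs terminated by a tag-10 checksum, which is the byte sum of the message modulo 256, zero-padded to three digits. A minimal sketch with made-up field values (a real message needs more fields, such as BodyLength, so this is not broker-ready):

```python
SOH = "\x01"  # FIX field delimiter


def append_checksum(body):
    """Append the FIX tag-10 checksum: byte sum mod 256, zero-padded."""
    csum = sum(body.encode("ascii")) % 256
    return body + f"10={csum:03d}" + SOH


# Illustrative new-order-style field list (values are invented).
fields = ["8=FIX.4.2", "35=D", "55=SPY", "54=1", "38=100"]
message = append_checksum(SOH.join(fields) + SOH)
```

In practice one would use an established FIX engine rather than hand-rolling messages, but the format itself is this simple.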
Conclusion
As is now evident, the choice of programming language(s) for an algorithmic trading system is not straightforward and requires serious thought. The main considerations are performance, ease of development, resiliency and testing, separation of concerns, familiarity, maintenance, source code availability, licensing costs and maturity of libraries.
The benefit of a separated architecture is that it allows languages to be "plugged in" for different aspects of a trading stack, as and when requirements change.
Remember that a trading system is an evolving tool and it is likely that any language choices will evolve along with it.
In the next email lesson we will consider how to choose a platform for backtesting and automated execution.