
FINANCIAL FORECASTING

Comparison Of ARIMA, FFNN and SVR Models

Dissertation

submitted in partial fulfillment of the requirements


for the degree of

Master of Technology

by
Lahane Ashish Gajanan
(Roll no. 05329R01)

under the guidance of


Prof. Bernard Menezes

Department Of Computer Science And Engineering


Indian Institute of Technology, Bombay
2008

INDIAN INSTITUTE OF TECHNOLOGY BOMBAY


CERTIFICATE OF COURSE WORK

This is to certify that Mr. Lahane Ashish Gajanan was admitted to the candidacy
of the M.Tech. Degree and has successfully completed all the courses required for the
M.Tech. Programme. The details of the course work done are given below.

Sr.No. Course No. Course Name Credits


Semester 1 (Jul – Nov 2005)
1. HS699 Communication and Presentation Skills (P/NP) 4
2. IT601 Mobile Computing 6
3. IT619 IT Foundation Lab 8
4. IT623 Foundation Course Of IT - Part II 6
5. IT653 Network Security 6
Semester 2 (Jan – Apr 2006)
6. CS685 Computer Graphics 6
7. IT606 Embedded Systems 6
8. IT680 Systems Lab 6
9. IT694 Seminar 4
Semester 3 (Jul – Nov 2006)
10. CS623 Introduction To Computing With Neural Nets 6
11. CS625 Machine Learning : Theory And Methods (Audit) 6
12. CS687 Fundamentals Of Digital Image Processing 6
Semester 4 (Jan – Apr 2007)
13. CL622 Introduction To Computational Biology (Institute Elective) 6

Semester 5 (Jul – Nov 2007)


14. CS601 Algorithms And Complexity 6

M.Tech. Project
15. IT696 M.Tech. Project Stage - I (Jul 2007) 18
16. IT697 M.Tech. Project Stage - II (Jan 2008) 30
17. IT698 M.Tech. Project Stage - III (Jul 2008) 42

I.I.T. Bombay Dy. Registrar(Academic)


Dated:
Abstract

This study compares statistical models such as ARIMA, and AI models such as Feed
Forward Neural Networks (FFNN) and Support Vector Regression (SVR), for long term
one-day-ahead forecasting of financial indices. A number of forecasting experiments are
conducted on three major indices in the Indian stock market: BSE Sensex, CNX IT and
S&P CNX Nifty. The models are studied for various parameters: ARIMA(p, d, q)
for 0 ≤ p, d, q ≤ 2, FFNNs with one or two hidden layers and different transfer
functions, and SVR with polynomial and RBF kernels. Our experiments show that FFNN
models perform better in forecasting the value of the index, while ARIMA models perform
better in predicting the direction of movement of the index.
Contents

Abstract i

List of figures vii

List of tables ix

1 Introduction 1
1.1 Problem Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Organisation Of The Report . . . . . . . . . . . . . . . . . . . . . . . . . . 3

2 Background 5
2.1 Forecasting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.1.1 Measures of forecasting accuracy . . . . . . . . . . . . . . . . . . . 6
2.2 Stock Market . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.2.1 Fundamental Analysis . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.2.2 Technical Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.3 AutoRegressive Integrated Moving Average(ARIMA) models . . . . . . . . 11
2.4 Artificial Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.4.1 Artificial Neuron . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.4.2 Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.5 Support Vector Machine(SVM) and Support Vector Regression(SVR) . . . 20
2.5.1 Support Vector Machine(SVM) . . . . . . . . . . . . . . . . . . . . 20
2.5.2 Support Vector Regression(SVR) . . . . . . . . . . . . . . . . . . . 24

3 Literature Survey 27


4 Experimental Setup 33
4.1 Data Collection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
4.2 Data Pre-processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
4.3 Model Estimation/Training . . . . . . . . . . . . . . . . . . . . . . . . . . 38
4.3.1 Model Estimation for ARIMA . . . . . . . . . . . . . . . . . . . . . 38
4.3.2 Training of FFNN . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
4.3.3 Training of SVR models . . . . . . . . . . . . . . . . . . . . . . . . 40
4.4 Forecasting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
4.5 Results Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44

5 Results And Analysis 45


5.1 Results For ARIMA Models . . . . . . . . . . . . . . . . . . . . . . . . . . 45
5.1.1 Results For ARIMA Model Identification . . . . . . . . . . . . . . . 45
5.1.2 Forecasting Performance Of Different ARIMA models . . . . . . . . 47
5.2 Results For FFNN Models . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
5.2.1 Performance of FFNN In Forecasting Value Of The Series . . . . . 51
5.2.2 Performance of FFNN In Forecasting Direction Of The Series . . . 54
5.3 Results For SVR Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
5.3.1 Comparison Of Kernels . . . . . . . . . . . . . . . . . . . . . . . . . 56
5.3.2 Forecasting Performance Of SVR-RBF models . . . . . . . . . . . . 57
5.4 Comparison Of Forecasting Performances Of ARIMA, FFNN and SVR
Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
5.4.1 Comparison of ARIMA, FFNN and SVR For Forecasting The Index
Value . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
5.4.2 Comparison of ARIMA, FFNN and SVR For Forecasting The Di-
rection Of Movement Of The Index . . . . . . . . . . . . . . . . . . 60

6 Conclusion And Future Work 69


6.1 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
6.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70

A ARIMA Model Identification 73

Bibliography 77

Acknowledgements 85
List of Figures

2.1 Error measures for forecasting accuracy[1] . . . . . . . . . . . . . . . . . . 7


2.2 Basic neuron model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.3 Step function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.4 Sigmoid function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.5 Tanh function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.6 Feed Forward Neural Network model . . . . . . . . . . . . . . . . . . . . . 16
2.7 FFNN for XOR function . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.8 Separating hyperplanes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.9 Maximal separating hyperplanes . . . . . . . . . . . . . . . . . . . . . . . . 21
2.10 Nonlinear to linear space transformation . . . . . . . . . . . . . . . . . . . 24
2.11 Support Vector Regression: ε-tube . . . . . . . . . . . . . . . . . . . . . . 26

4.1 Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
4.2 BSE Sensex Plot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
4.3 S&P CNX Nifty Plot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
4.4 CNX IT Plot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
4.5 Data Partition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
4.6 Vector sets Preparation for FFNN and SVR . . . . . . . . . . . . . . . . . 38
4.7 Model estimation for ARIMA . . . . . . . . . . . . . . . . . . . . . . . . . 39
4.8 FFNN training with holdout validation . . . . . . . . . . . . . . . . . . . . 41
4.9 SVR training with holdout validation . . . . . . . . . . . . . . . . . . . . . 43

5.1 ACF and PACF plots for BSE Sensex with d = 0. ACF shows need for
differencing. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62


5.2 ACF and PACF plots for BSE Sensex with d = 1. Model identified as
ARIMA(1,1,1). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
5.3 ACF and PACF plots for BSE Sensex with d = 2. Model identified as
ARIMA(0,2,2). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
5.4 minimum MAPE(L) Vs Window size for FFNNs . . . . . . . . . . . . . . . 65
5.5 max DS(L) Vs Window size for FFNNs . . . . . . . . . . . . . . . . . . . . 65
5.6 Forecasting performance of ARIMA, FFNN and SVR for MAPE(L) . . . . 66
5.7 Forecasting performance of ARIMA, FFNN and SVR for DS(L) . . . . . . 66
5.8 Forecasted values by ARIMA, FFNN and SVR for BSE Sensex . . . . . . . 67

A.1 Possible shapes for ACF and PACF when p = 2 . . . . . . . . . . . . . . . 75


A.2 Possible shapes for ACF and PACF when q = 2 . . . . . . . . . . . . . . . 76
A.3 Possible shapes for ACF and PACF when p = 1 and q = 1 . . . . . . . . . 77
List of Tables

4.1 Statistical details of BSE Sensex, S&P CNX Nifty and CNX IT . . . . . . 35
4.2 Vector sets preparation for FFNN and SVR . . . . . . . . . . . . . . . . . 37
4.3 FFNN parameter values tried in the experiments . . . . . . . . . . . . . . . 40
4.4 Optimisation using SVMdark . . . . . . . . . . . . . . . . . . . . . . . . . 42
4.5 Initial ranges of parameters in SVMdark optimisation . . . . . . . . . . . . 42

5.1 Estimates of φ1 and θ1 for ARIMA(1,1,1) . . . . . . . . . . . . . . . . . . . 46


5.2 Estimates of θ1 and θ2 for ARIMA(0,2,2) . . . . . . . . . . . . . . . . . . . 47
5.3 Comparison of different ARIMA models for forecasting on BSE Sensex . . 48
5.4 Comparison of different ARIMA models for forecasting on CNX IT . . . . 49
5.5 Comparison of different ARIMA models for forecasting on S&P CNX Nifty 50
5.6 Minimum MAPE(L) obtained per window size along with the details of the
FFNNs for BSE Sensex. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
5.7 Minimum MAPE(L) obtained per window size along with the details of the
FFNNs for CNX IT. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
5.8 Minimum MAPE(L) obtained per window size along with the details of the
FFNNs for S&P CNX Nifty. . . . . . . . . . . . . . . . . . . . . . . . . . . 52
5.9 Min and Avg MAPE(L)s obtained per window size . . . . . . . . . . . . . 53
5.10 Maximum DS(L)s obtained per window size for BSE Sensex using FFNN
models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
5.11 Maximum DS(L)s obtained per window size for CNX IT using FFNN models 55
5.12 Maximum DS(L)s obtained per window size for S&P CNX Nifty using
FFNN models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
5.13 Maximum and Avg DS(L)s obtained per window size with FFNN models . 56
5.14 SVR-POLY Vs SVR-RBF based on MAPE(L) . . . . . . . . . . . . . . . . . . 57


5.15 Forecasting performance of SVR-POLY for BSE Sensex . . . . . . . . . . . . 57


5.16 Forecasting performance of SVR-POLY for CNX IT . . . . . . . . . . . . . . 57
5.17 Forecasting performance of SVR-POLY for S&P CNX Nifty . . . . . . . . . . 58
5.18 Forecasting performance of ARIMA, FFNN and SVR for MAPE(L) . . . . 59
5.19 Forecasting performance of ARIMA, FFNN and SVR for DS(L) . . . . . . 60
Chapter 1

Introduction

Over the last few years, the Indian stock market has become a noticeable stock market,
not just in Asia but in the world, due to the contributions of the NSE and BSE. The
Bombay Stock Exchange Limited (popularly called The Bombay Stock Exchange, or BSE)
is the oldest stock exchange in Asia. It is also the biggest stock exchange in the world in
terms of listed companies, with 4,800 listed companies as of August 2007[2]. It is the
largest stock exchange in South Asia and the tenth largest in the world[3]. The BSE
SENSEX (SENSitive indEX), also called the "BSE 30", is a widely used market index
in India and Asia. Though many other exchanges exist, BSE and the National Stock
Exchange of India account for most of the trading in shares in India.
The National Stock Exchange of India Limited (NSE)[4] is the largest stock exchange
in India in terms of daily turnover and number of trades, for both equities and derivative
trading[5]. NSE is the third largest stock exchange in the world in terms of the number
of trades in equities[3]. It is the second fastest growing stock exchange in the world, with
a recorded growth of 16.6%[6]. The NSE's key index is the S&P CNX Nifty, known as the
Nifty. Nowadays IT sector companies are attracting a great deal of investment interest
due to the IT boom in India. A number of large, profitable Indian companies today belong
to the IT sector. Hence the index CNX IT, which represents major IT sector companies
in India, has also gained great importance.
Thus BSE Sensex, S&P CNX Nifty and CNX IT are the important indices in the Indian
stock market.
The stock market offers a strong incentive for forecasting securities and indices correctly:
using the forecasted values, buy and sell decisions can be made to realise short or long
term profits. Hence financial forecasting has been studied widely. Financial time series
are generally noisy, non-linear and non-seasonal, and predicting them is therefore a great
challenge.
Statistical models (ARIMA) and AI models (Feed Forward Neural Networks and Support
Vector Regression) find application in the financial domain for forecasting stock prices,
indices, inflation rates, GDP etc. A great deal of work has been and is being done in this
domain.
The literature suggests that FFNNs are better than ARIMA models for forecasting,
but a strong conclusion about SVR models in comparison with FFNN and ARIMA models
is difficult to draw. It was also found that for financial forecasting using SVR models,
the RBF and polynomial kernel functions are applicable. Hence in this study SVR with
polynomial and RBF kernels is used. The literature also lacks long term forecasting. For
daily stock values, long term forecasting means that once the model is trained or estimated,
it is used for forecasting the daily stock prices for several months without retraining or
re-estimating the model.

1.1 Problem Definition


Hence the problem definition for this thesis is:

Problem Definition: To compare three models: ARIMA, FFNN and


SVR (with RBF and polynomial kernels); for long term one-day-ahead
forecasting performance on three important indices in the Indian stock
market: BSE Sensex, S&P CNX Nifty and CNX IT.

A number of forecasting experiments were conducted on the three major indices in the
Indian stock market: BSE Sensex, CNX IT and S&P CNX Nifty. Statistical models (such
as ARIMA) and AI models (such as Feed Forward Neural Networks and Support Vector
Regression) were compared for long term forecasting of the value of the index, using error
measures such as Mean Absolute Percentage Error (MAPE) and Mean Absolute Error
(MAE). They were also compared for forecasting the direction of movement of the index,
using the error measure Directional Symmetry (DS). The models were studied for various
parameters. ARIMA(p, d, q) was studied for all combinations of p, d and q values ranging
from 0 to 2. FFNNs with one or two hidden layers and different transfer functions were
employed. SVR with polynomial and RBF kernels was used. The effect of different window
sizes on the FFNN and SVR models was also studied.

1.2 Organisation Of The Report


The organisation of this report is as follows. Chapter 2 gives the theory necessary for this
study, which is: basics of stock markets(section 2.2) and forecasting(section 2.1), theory of
ARIMA models(section 2.3), Artificial Neural Networks(section 2.4) and Support Vector
Regression(section 2.5). Literature study is given in chapter 3. Experimental setup is
explained in detail in chapter 4. The detailed results of the experiments are presented in
chapter 5 followed by conclusions and future work in chapter 6.
Chapter 2

Background

This chapter summarises the theory essential for understanding this project and the
project report. It covers basics of the stock market, forecasting, artificial Feed Forward
Neural Networks (FFNN), Support Vector Regression (SVR) and the statistical
forecasting models called AutoRegressive Integrated Moving Average (ARIMA) models.

2.1 Forecasting
Forecasting is the process of estimating unknown future values from historical data, for
example forecasting weather, stock index values or commodity prices. The categories of
forecasting methods are as follows:
Time series methods: Time series methods use historical data as the basis for estimating
future outcomes: given the time series $x_t, x_{t-1}, x_{t-2}, \ldots$, predict $x_{t+1}$.
The methods are

• Autoregressive moving average (ARMA)

• Autoregressive integrated moving average (ARIMA)

• Exponential smoothing

• Extrapolation

• Linear prediction

• Trend estimation

• Growth curve


In ARIMA or ARMA, the time series is analysed to find the parameters of the ARIMA
or ARMA model respectively, and using these parameters future values are predicted [7].
Causal/econometric methods: Some forecasting methods use the assumption that it is
possible to identify the underlying factors that might influence the variable that is being
forecast. For example, sales of umbrellas might be associated with weather conditions. If
the causes are understood, projections of the influencing variables can be made and used
in the forecast.
That is, suppose x depends on variables $y^1, y^2, \ldots, y^n$ as $x = f(y^1, y^2, \ldots, y^n)$,
and we have available $y^i_t, y^i_{t-1}, y^i_{t-2}, \ldots$, where $i = 1, 2, \ldots, n$, and we
have to find $x_{t+1}$. Then the $y^i_{t+1}$ are predicted using any other suitable methods,
and $x_{t+1}$ is calculated as $x_{t+1} = f(y^1_{t+1}, y^2_{t+1}, \ldots, y^n_{t+1})$. The
methods are

• Regression analysis using linear regression or non-linear regression

• Econometrics

Another variant of forecasting method is one where $x_{t+1}$ is predicted using $x_t, x_{t-1}, \ldots$
together with $y^i_t, y^i_{t-1}, \ldots$. Neural networks are used very successfully in these kinds
of forecasting.

2.1.1 Measures of forecasting accuracy

The forecast error is the difference between the actual value and the forecast value for
the corresponding period: $E_t = Y_t - F_t$, where $E_t$ is the forecast error at period t,
$Y_t$ is the actual value at period t, and $F_t$ is the forecast for period t.

Measures of aggregate error are as shown in figure 2.1.

Figure 2.1: Error measures for forecasting accuracy[1]

Of these, MAPE and MAE are the most commonly used. They measure how close the
forecasted values are to the target ones: the lower the MAPE and MAE values, the better
the forecaster. But they deal only with the absolute difference between forecasted and
target values and take no account of directional prediction. For the estimate of directional
accuracy, Directional Symmetry (DS) is used. DS is generally used as an error measure
in directional forecasting (e.g. predicting the movement of an index); it gives the
directional prediction efficiency, i.e. how efficient the forecaster is in predicting the
direction of the series. It is good to have a high DS value together with a low MAPE. DS
is calculated as follows:

$$DS = \frac{d_{correct}}{d_{total}} \times 100$$

where $d_{correct}$ is the number of times the forecaster predicted the direction of the
series correctly and $d_{total}$ is the total number of predictions made.
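To make these measures concrete, here is a minimal sketch in Python of how MAPE, MAE and DS can be computed for a pair of actual and forecasted series (the function names and the sign-of-change convention for DS are our own choices, not taken from any package):

    import numpy as np

    def mape(y, f):
        # Mean Absolute Percentage Error
        return np.mean(np.abs((y - f) / y)) * 100

    def mae(y, f):
        # Mean Absolute Error
        return np.mean(np.abs(y - f))

    def ds(y, f):
        # Directional Symmetry: percentage of steps where the forecasted
        # change has the same sign as the actual change of the series
        return np.mean(np.sign(np.diff(y)) == np.sign(np.diff(f))) * 100

    y = np.array([100.0, 102.0, 101.0, 104.0])   # actual index values
    f = np.array([ 99.0, 103.0, 102.0, 103.0])   # one-day-ahead forecasts
    print(mape(y, f), mae(y, f), ds(y, f))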

2.2 Stock Market


Let’s see some basics of the stock market.
What Are Stocks? Stock is a share in the ownership of a company. Stock represents
a claim on the company’s assets and earnings. As you acquire more stock, your ownership
stake in the company becomes greater. Whether you say shares, equity, or stock, it all
means the same thing.
Why does a company issue stock? Why would the founders share the profits with
thousands of people when they could keep profits to themselves? The reason is that at
some point every company needs to raise money. To do this, companies can either borrow
it from somebody or raise it by selling part of the company, which is known as issuing
stock. A company can borrow by taking a loan from a bank or by issuing bonds. Both
methods fit under the umbrella of "debt financing." On the other hand, issuing stock is
called "equity financing." Issuing stock is advantageous for the company because it does
not require the company to pay back the money or make interest payments along the way.
All that the shareholders get in return for their money is the hope that the shares will
some day be worth more in the stock market. Besides that, they may also receive dividends.
What is a Stock Market? A stock market is the market in which shares are issued and
traded, either through exchanges or over-the-counter markets. Also known as the equity market, it is
one of the most vital areas of a market economy as it provides companies with access to
capital and investors with a slice of ownership in the company and the potential of gains
based on the company’s future performance.
The stocks are listed and traded on stock exchanges which are entities (a corporation
or mutual organisation) specialised in the business of bringing buyers and sellers of stocks
and securities together. The stock market in the United States includes the trading of all
securities listed on the NYSE, the NASDAQ, the Amex, as well as on the many regional
exchanges, the OTCBB, and Pink Sheets. European examples of stock exchanges include
the Paris Bourse (now part of Euronext), the London Stock Exchange and the Deutsche
Börse.
An investor in stocks wants maximum returns on his investment, and for that
he needs to know which stocks will do well in the future, so that he can invest in them.
This is the basic incentive for forecasting stock prices. For this, he has to study
different stocks, their price history, and the performance and reputation of the issuing
company, etc. This is a broad area of study, mainly divided into two categories:

• Fundamental Analysis

• Technical Analysis

Financial forecasting mainly comes under technical analysis, but fundamental factors can
also be incorporated for getting better results. Let’s see them in brief.

2.2.1 Fundamental Analysis

Fundamental analysis is a method of evaluating a security by attempting to measure its
intrinsic value through examining related economic, financial and other qualitative and
quantitative factors. Fundamental analysts attempt to study everything that can affect the
security's value, including macroeconomic factors (like the overall economy and industry
conditions) and individually specific factors (like the financial condition and management
of companies). This method of security analysis is considered to be the opposite of
technical analysis.
Financial statement analysis is the biggest part of fundamental analysis. Also known
as quantitative analysis, it involves looking at historical performance data to estimate the
future performance of stocks. Followers of quantitative analysis want as much data as
they can find on revenue, expenses, assets, liabilities and all the other financial aspects of
a company. Fundamental analysts look at this information to gain insight on a company’s
future performance. This doesn’t mean that they ignore the company’s stock price; they
just avoid focusing on it exclusively.
A financial statement consists of:

• Summary of the previous year

• Information about the company in general - its history, products and line of business

• Letter to shareholders from the president or the CEO

• Auditor’s report detailing the accuracy of the results

• An in-depth discussion about the financial results and other factors within the busi-
ness

• The complete set of financial statements (balance sheet, income statement, state-
ment of retained earnings, and cash flow statement)

• Notes to the financial statements

• Other information on the company's management, officers, offices, new locations, etc.

The most important of these for fundamental analysis are the balance sheet, income
statement and cash flow statement, though the other sections should not be ignored
completely.

2.2.2 Technical Analysis

Technical analysis is a method of evaluating securities by analysing the statistics generated


by market activity, such as past prices and volume. Technical analysts do not attempt

to measure a security’s intrinsic value, but instead use charts and other tools to identify
patterns that can suggest future activity. Unlike fundamental analysts, technical analysts
don’t care whether a stock is undervalued - the only thing that matters is a security’s
past trading data and what information this data can provide about where the security
might move in the future.
The field of technical analysis is based on three assumptions:

• The market discounts everything

• Price moves in trends

• History tends to repeat itself

Let’s see them one by one.

• The Market Discounts Everything: A major criticism of technical analysis is


that it only considers price movement, ignoring the fundamental factors of the com-
pany. However, technical analysis assumes that, at any given time, a stock price
reflects everything that has or could affect the company - including fundamental
factors. Technical analysts believe that the company’s fundamentals, along with
broader economic factors and market psychology, are all priced into the stock, re-
moving the need to actually consider these factors separately. This only leaves the
analysis of price movement, which technical theory views as a product of the supply
and demand for a particular stock in the market.

• Price Moves in Trends: In technical analysis, price movements are believed to


follow trends. This means that after a trend has been established, the future price
movement is more likely to be in the same direction as the trend than to be against
it. Most technical trading strategies are based on this assumption.

• History Tends To Repeat Itself: Another important idea in technical analysis


is that history tends to repeat itself, mainly in terms of price movement. The
repetitive nature of price movements is attributed to market psychology; in other
words, market participants tend to provide a consistent reaction to similar market
stimuli over time.

Much of the criticism of technical analysis has its roots in academic theory - specifically
the efficient market hypothesis (EMH).
Efficient market hypothesis(EMH): This theory says that the market price is
always the correct one - any past trading information is already reflected in the price of
the stock and, therefore, any analysis to find undervalued securities is useless.
Technical analysts deal with charts, moving averages, information about traded volume,
and indicators and oscillators (such as the Accumulation/Distribution Line, Moving Average
Convergence Divergence (MACD), Average Directional Index, Aroon Indicator and
Oscillator, etc.) to determine the trends in price movements of a security (security is a
general term for stocks, bonds etc.).
Chart: A price chart is a sequence of prices plotted over a specific time frame. In
statistical terms, charts are referred to as time series plots. On the chart, the y-axis
(vertical axis) represents the price scale and the x-axis (horizontal axis) represents the
time scale. Prices are plotted from left to right across the x-axis with the most recent
plot being the furthest right. For example of a chart, see figure 4.2.
For further details please visit the sites given in [8, 9, 10].

2.3 AutoRegressive Integrated Moving Average (ARIMA) models
In statistics, ARIMA models, sometimes called Box-Jenkins models after the iterative
Box-Jenkins methodology usually used to estimate them, are typically applied to time
series data for forecasting.
Given a time series of data $X_{t-1}, X_{t-2}, \ldots, X_2, X_1$, the ARIMA model is a tool for
understanding and, perhaps, predicting future values in this series. The model consists of
three parts, an autoregressive (AR) part, a moving average (MA) part and the differencing
part. The model is usually then referred to as the ARIMA(p, d, q) model where p is the
order of the autoregressive part, d is the order of differencing and q is the order of the
moving average part.
If d = 0, the model becomes ARMA, which is a linear stationary model. ARIMA (i.e.
d > 0) is a linear non-stationary model. If the underlying time series is non-stationary,
taking the difference of the series with itself d times makes it stationary, and then ARMA
is applied to the differenced series.


The ARIMA(p, d, q) model is given by:

$$\varphi(L)\, \nabla^d X_t = \theta(L)\, \varepsilon_t \tag{2.1}$$

where the AR part is $\varphi(L) = 1 - \sum_{i=1}^{p} \varphi_i L^i$, the MA part is
$\theta(L) = 1 + \sum_{j=1}^{q} \theta_j L^j$, and the I (difference) part is $\nabla = (1 - L)$.

Here $L$ is the lag operator, i.e. $L^i X_t = X_{t-i}$. $\varphi_i$ are the parameters of the
autoregressive part and $\theta_j$ are the parameters of the moving average part; they need
to be found before applying the model for forecasting. $\varepsilon_t$ is a white noise process
with zero mean and variance $\sigma^2$; the error terms $\varepsilon_t$ are generally assumed
to be independent, identically distributed variables sampled from a normal distribution.
Forecasting using the ARIMA model: Forecasting $X_t$ from $X_{t-1}, X_{t-2}, \ldots, X_2, X_1$
using the ARIMA model (see equation 2.1) consists of the following steps:

1. Model Identification: Using the AutoCorrelation Function (ACF) plot and the Partial
AutoCorrelation Function (PACF) plot, the p, d, q values are determined. See Appendix
A for the details. This method was used in these experiments.

2. Model Estimation: Methods such as maximum likelihood or Bayesian estimation are
used to determine the model parameters $\varphi_i$ and $\theta_j$.

3. Forecasting: Substituting the model parameter values into equation 2.1, $X_t$ and,
iteratively, any further future series values are calculated.

For further reading, please refer to [11].
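As a concrete illustration of these three steps, the following is a minimal sketch using the statsmodels Python library (a modern library, not the software used in this study) on a synthetic stand-in series; the order (1, 1, 1) is assumed here purely for illustration, whereas in the experiments it comes from the ACF/PACF-based identification step:

    import numpy as np
    from statsmodels.tsa.arima.model import ARIMA

    # Synthetic random-walk series standing in for daily index closing values
    series = 100 + np.cumsum(np.random.randn(500))

    # Step 1 (identification) is assumed done here: (p, d, q) = (1, 1, 1)
    model = ARIMA(series, order=(1, 1, 1))

    # Step 2: estimate the parameters phi_i and theta_j by maximum likelihood
    fit = model.fit()
    print(fit.params)

    # Step 3: one-step-ahead forecast of the next value X_t
    print(fit.forecast(steps=1))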

2.4 Artificial Neural Networks


The artificial neuron is the basic unit from which Neural Networks (NN) are built, so
let us first understand the basic principles of a neuron.

2.4.1 Artificial Neuron

An artificial neuron, also called a semi-linear unit, Nv neuron, binary neuron or McCulloch-
Pitts neuron, is an abstraction of biological neurons and the basic unit in an artificial
neural network. The artificial neuron receives one or more inputs (representing the one
or more dendrites) and sums them to produce an output (the synapse). Usually the inputs
are weighted, and the weighted sum is passed through a non-linear function known as
an activation or transfer function. It is represented by

$$y = \phi\left(\sum_i w_i x_i\right)$$

Figure 2.2 shows the basic neuron.


The canonical form of transfer function is the sigmoid, but transfer functions may also
take the form of other non-linear functions, piecewise linear functions, or step functions.

Figure 2.2: Basic neuron model

Generally, transfer functions are monotonically increasing. The transfer function of a
neuron is chosen to have a number of properties which either enhance or simplify the
network containing the neuron. Let

$$u = \sum_i w_i x_i$$

Following are some of the transfer functions:


Step function: The output y of this transfer function is binary, depending on whether
the input meets a specified threshold θ. The "signal" is sent, i.e. the output is set to
one, if the activation meets the threshold. It is shown in figure 2.3 and given by

$$y = \begin{cases} 1 & \text{if } u \ge \theta \\ 0 & \text{if } u < \theta \end{cases}$$

A neuron with the step function as its transfer function is also called a perceptron.

Sigmoid:
$$y = \frac{1}{1 + e^{-u}}$$
It is shown in figure 2.4.

Tanh:
$$y = \tanh(u)$$
It is shown in figure 2.5.

Figure 2.3: Step function

Figure 2.4: Sigmoid function

A perceptron is trained using the perceptron training algorithm.
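To make the above concrete, here is a minimal Python sketch of a single neuron evaluated with each of the three transfer functions (the function and variable names are our own):

    import numpy as np

    def neuron(x, w, transfer, theta=0.0):
        u = np.dot(w, x)                       # weighted sum of the inputs
        if transfer == "step":                 # perceptron: y = 1 if u >= theta, else 0
            return 1.0 if u >= theta else 0.0
        if transfer == "sigmoid":              # y = 1 / (1 + e^(-u))
            return 1.0 / (1.0 + np.exp(-u))
        if transfer == "tanh":                 # y = tanh(u)
            return np.tanh(u)
        raise ValueError("unknown transfer function")

    x = np.array([0.5, -1.0, 2.0])
    w = np.array([0.8, 0.2, -0.5])
    for t in ("step", "sigmoid", "tanh"):
        print(t, neuron(x, w, t))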



Figure 2.5: Tanh function

2.4.2 Neural Networks

A Neural Network (NN) is an interconnected group of artificial neurons that uses a
mathematical or computational model for information processing, based on a connectionist
approach to computation. In most cases an ANN is an adaptive system that changes its
structure based on external or internal information that flows through the network. In
more practical terms, neural networks are non-linear statistical data modelling tools. They
can be used to model complex relationships between inputs and outputs or to find patterns
in data. The tasks to which artificial neural networks are applied tend to fall within
the following broad categories:

• Function approximation, or regression analysis, including time series prediction and


modelling

• Classification, including pattern and sequence recognition, novelty detection and


sequential decision making

• Data processing, including filtering, clustering, blind signal separation and compres-
sion

They can be trained using two paradigms:

• Supervised learning: In supervised learning, we are given a set of examples,
where each example consists of the output corresponding to an input. The network is
made to learn those examples using certain algorithms, after which it is used to
produce outputs for new, unseen inputs.

• Unsupervised learning: In unsupervised learning we are given some data x, and a


cost function to be minimised which can be any function of x and the network’s
output, f . The cost function is determined by the task formulation. Most appli-
cations fall within the domain of estimation problems such as statistical modelling,
compression, filtering, blind source separation and clustering.

Application areas of NN include system identification and control (vehicle control,


process control), game-playing and decision making (backgammon, chess, racing), pattern
recognition (radar systems, face identification, object recognition and more), sequence
recognition (gesture, speech, handwritten text recognition), medical diagnosis, financial
applications, data mining (or knowledge discovery in databases, ”KDD”), visualisation
and e-mail spam filtering, forecasting (stock prices).
A special type of neural network is the Feed Forward Neural Network.
Feed Forward Neural Networks(FFNN)

Figure 2.6: Feed Forward Neural Network model

An FFNN is shown in figure 2.6. Feed-forward networks have the following
characteristics:

• Neurons are arranged in layers, with the first layer taking in inputs and the last
layer producing outputs. The middle layers have no connection with the external
world, and hence are called hidden layers.

• Each neuron in one layer is connected to every neuron on the next layer. Hence
information is constantly ”fed forward” from one layer to the next, and this explains
why these networks are called feed-forward networks.

• There is no connection among neurons in the same layer.

For the application of FFNNs to any area, the following are the parameters one needs to
consider and tune through experimentation:
Number of hidden layers: In practice, neural networks with one and occasionally
two hidden layers are widely used and have performed very well. Increasing the number of
hidden layers also increases computation time and the danger of overfitting, which leads
to poor out-of-sample forecasting performance. Overfitting occurs when a forecasting
model has too few degrees of freedom: it has relatively few observations in relation to its
parameters, and is therefore able to memorise individual points rather than learn general
patterns. The number of weights also shouldn't be too large. The greater the number of
weights relative to the size of the training set, the greater the ability of the network to
memorise idiosyncrasies of individual observations. As a result, generalisation to the
validation set is lost and the model is of little use in actual forecasting.
Number of hidden neurons: Despite its importance, there is no "magic" formula
for selecting the optimum number of hidden neurons. Therefore, researchers fall back on
experimentation. However, a few rules of thumb from the literature are as follows:

• For a three-layer network with n input neurons and m output neurons, the hidden
layer should have $\sqrt{nm}$ neurons.

• The number of hidden neurons in a three-layer neural network should be 75% of the
number of input neurons.

• The optimal number of hidden neurons will generally be found between one-half and
three times the number of input neurons.

• Double the number of hidden neurons until the network’s performance on the testing
set deteriorates.

• There should be at least five times as many training facts as weights, which sets an
upper limit on the number of input and hidden neurons.

Regardless of the method used to select the range of hidden neurons to be tested, the
rule is to always select the network that performs best on the testing set with the least
number of hidden neurons.
Number of output neurons: Generally the answer to this would be one, but one more
neuron can also be employed to predict the direction of price movement.
Transfer functions: The majority of current neural network models use the sigmoid
(S-shaped) function, but others such as the hyperbolic tangent, step, ramping, arc tan, and
linear have also been proposed. The purpose of the transfer function is to prevent outputs
from reaching very large values which can paralyse neural networks and thereby inhibit
training. Linear transfer functions are not useful for nonlinear mapping and classification.
It has been found that financial markets are nonlinear and have memory suggesting that
nonlinear transfer functions are more appropriate. Transfer functions such as the sigmoid
are commonly used for time series data because they are nonlinear and continuously
differentiable which are desirable properties for network learning.
Figure 2.7 shows an FFNN for the XOR function which, as said earlier, a single
neuron is unable to learn. How, then, is an FFNN made to learn; in other words, how are
all the weights for a particular function found? This is done by backpropagation
with the gradient descent algorithm.

Figure 2.7: FFNN for XOR function



2.4.2.1 Back Propagation using Gradient Descent Technique:

In the gradient descent technique, the change in weight is calculated in the negative
direction of the gradient of the error surface. Hence the change in weights for the output
layer is proportional to the derivative of the error w.r.t. the weights. The change in
weights for the hidden layers is calculated by back-propagating the respective error factors
through the network. Gradient descent needs a derivative computation, which is not possible
with a perceptron due to the discontinuous step function used. Hence we employ sigmoid
neurons in FFNNs from here onwards; the non-linearity of the sigmoid function also adds
to the computing power.
The training algorithm is as follows:

1. Initialise weights to random values.

2. For each input $X^p = \langle x^p_n, x^p_{n-1}, \ldots, x^p_0 \rangle$, where p iterates over
all input patterns, update the error and the weights as follows:

$$E = \frac{1}{2}(t^p - o^p)^2 + E$$

$$\Delta w^p_{ji} = \eta\, \delta^p_j\, o^p_i$$

where

$$\delta^p_j = (t^p_j - o^p_j)\, o^p_j (1 - o^p_j)$$

for the outermost layer, and

$$\delta^p_j = \Big(\sum_{k \in \text{next layer}} w_{kj}\, \delta^p_k\Big)\, o^p_j (1 - o^p_j)$$

for a hidden layer. Here i, j and k iterate over the neurons in the previous, current and
next layer respectively. $\eta$ is the learning factor, a tunable parameter; $t$ denotes the
target output and $o$ the observed output. $w_{ji}$ denotes the weight of the connection
from the i-th neuron in the previous layer to the j-th neuron in the current layer.

3. If E < d then stop, else go to step 2, where d is the desired error threshold.

For further details please refer to [12].
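As a rough illustration of these update rules, here is a Python sketch of the algorithm applied to the XOR patterns of figure 2.7, using a single hidden layer of sigmoid neurons; the bias terms, hidden-layer size, learning factor and epoch count are our own choices, and training may need a different random seed or more epochs to converge:

    import numpy as np

    def sigmoid(u):
        return 1.0 / (1.0 + np.exp(-u))

    rng = np.random.default_rng(0)
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)  # input patterns
    T = np.array([[0], [1], [1], [0]], dtype=float)              # XOR targets

    W1, b1 = rng.normal(size=(2, 3)), np.zeros(3)    # input -> hidden weights
    W2, b2 = rng.normal(size=(3, 1)), np.zeros(1)    # hidden -> output weights
    eta = 0.5                                        # learning factor

    for epoch in range(10000):
        h = sigmoid(X @ W1 + b1)                     # forward pass: hidden outputs
        o = sigmoid(h @ W2 + b2)                     # forward pass: network outputs
        delta_o = (T - o) * o * (1 - o)              # delta for the output layer
        delta_h = (delta_o @ W2.T) * h * (1 - h)     # back-propagated hidden delta
        W2 += eta * h.T @ delta_o                    # gradient-descent updates
        b2 += eta * delta_o.sum(axis=0)
        W1 += eta * X.T @ delta_h
        b1 += eta * delta_h.sum(axis=0)

    print(np.round(o, 2))   # should approach [0, 1, 1, 0]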



2.5 Support Vector Machine (SVM) and Support Vector Regression (SVR)
SVM and SVR are a set of related supervised learning methods used for classification
and regression respectively. They belong to a family of generalised linear classifiers. A
special property of SVMs is that they simultaneously minimise the empirical classification
error and maximise the geometric margin; hence they are also known as maximum margin
classifiers. FFNNs with BP follow the Empirical Risk Minimisation (ERM) principle, while
SVMs follow the Structural Risk Minimisation (SRM) principle[13].

2.5.1 Support Vector Machine(SVM)

The basic principle of the SVM is that, given a set of points which need to be classified
into two classes, we find a separating hyperplane which maximises the margin between the
two classes. This ensures better classification of unseen points, i.e. better generalisation.
See figure 2.8: it shows three hyperplanes. H3 does not separate the classes at all; H1
separates them, but it is not the maximal separating hyperplane, while H2 is.

Figure 2.8: Separating hyperplanes

Formalisation: We are given some training data, a set of points of the form

$$\mathcal{D} = \{(x_i, c_i) \mid x_i \in \mathbb{R}^p,\ c_i \in \{-1, 1\}\}_{i=1}^{n}$$



Figure 2.9: Maximal separating hyperplanes

where $c_i$ is either 1 or −1, indicating the class to which the point $x_i$ belongs. Each
$x_i$ is a p-dimensional real vector. We want to find the maximum-margin hyperplane which
divides the points having $c_i = 1$ from those having $c_i = -1$. Any hyperplane can be
written as the set of points x satisfying

$$w \cdot x - b = 0.$$

The vector w is a normal vector: it is perpendicular to the hyperplane. See figure 2.9. The
parameter $\frac{b}{\|w\|}$ determines the offset of the hyperplane from the origin along
the normal vector w.
We want to choose the w and b to maximise the margin, or distance between the
parallel hyperplanes that are as far apart as possible while still separating the data.
These hyperplanes can be described by the equations

$w \cdot x - b = 1$ and
$w \cdot x - b = -1$

Note that if the training data are linearly separable, we can select the two hyperplanes
of the margin in such a way that there are no points between them, and then try to maximise
their distance. By using geometry, we find the distance between these two hyperplanes to be
$\frac{2}{\|w\|}$, so we want to minimise $\|w\|$. As we also have to prevent data points
falling into the margin, we add the following constraint: for each i, either

$w \cdot x_i - b \ge 1$ for $x_i$ in the first class, or

$w \cdot x_i - b \le -1$ for $x_i$ in the second.

This can be rewritten as:

$$c_i (w \cdot x_i - b) \ge 1, \quad \forall\, 1 \le i \le n. \tag{2.2}$$

We can put this together to get the optimisation problem:

$$\text{choose } w, b \text{ to minimise } \|w\|$$
$$\text{subject to: } c_i (w \cdot x_i - b) \ge 1, \quad \forall\, 1 \le i \le n. \tag{2.3}$$

Primal Form: The optimisation problem presented in the preceding section is hard to
solve because it depends on the norm $\|w\|$. The reason is that, in mathematical terms, it
is a non-convex optimisation problem, and such problems are known to be much more
difficult to solve. Fortunately it is possible to alter the problem by substituting $\|w\|$
with $\frac{1}{2}\|w\|^2$ without changing the solution (the original and the modified
problem have the same minimising w and b). This is a quadratic programming (QP)
optimisation problem. More clearly,

$$\text{minimise: } \frac{1}{2}\|w\|^2,$$
$$\text{subject to: } c_i (w \cdot x_i - b) \ge 1, \quad \forall\, 1 \le i \le n. \tag{2.4}$$

The factor of $\frac{1}{2}$ is used for mathematical convenience. This problem can now be
solved by standard quadratic programming techniques and programs.
Dual Form: Writing the classification rule in its unconstrained dual form reveals that
the maximum margin hyperplane and therefore the classification task is only a function
of the support vectors, the training data that lie on the margin. The dual of the SVM
can be shown to be:
$$\text{maximise: } \sum_{i=1}^{n} \alpha_i - \frac{1}{2}\sum_{i,j} \alpha_i \alpha_j c_i c_j x_i^T x_j$$
$$\text{subject to: } \alpha_i \ge 0 \quad \text{and} \quad \sum_{i=1}^{n} \alpha_i c_i = 0 \tag{2.5}$$

where the $\alpha$ terms constitute a dual representation for the weight vector in terms of
the training set:

$$w = \sum_i \alpha_i c_i x_i$$

Soft margin: In 1995, Corinna Cortes and Vladimir Vapnik suggested a modified
maximum margin idea that allows for mislabeled examples[14]. If there exists no hyper-
plane that can split the "yes" and "no" examples, the Soft Margin method will choose
a hyperplane that splits the examples as cleanly as possible, while still maximizing the
distance to the nearest cleanly split examples. This work popularized the expression Sup-
port Vector Machine or SVM. The method introduces slack variables, ξi , which measure
the degree of misclassification of the datum $x_i$:

$$c_i (w \cdot x_i - b) \ge 1 - \xi_i, \quad \forall\, 1 \le i \le n \tag{2.6}$$

The objective function is then increased by a function which penalises non-zero $\xi_i$,
and the optimisation becomes a trade-off between a large margin and a small error penalty.
If the penalty function is linear, then we have to

$$\text{minimise: } \frac{1}{2}\|w\|^2 + C \sum_i \xi_i,$$
$$\text{subject to: } c_i (w \cdot x_i - b) \ge 1 - \xi_i, \quad \forall\, 1 \le i \le n. \tag{2.7}$$

This constraint in equation 2.7, along with the objective of minimising $\|w\|$, can be
solved using Lagrange multipliers. The key advantage of a linear penalty function is that
the slack variables vanish from the dual problem, with the constant C appearing only as
an additional constraint on the Lagrange multipliers. Non-linear penalty functions have
been used, particularly to reduce the effect of outliers on the classifier, but unless care is
taken the problem becomes non-convex, and it is then considerably more difficult to find
a global solution.
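In practice, the soft-margin problem of equation 2.7 is what off-the-shelf SVM packages solve. A minimal sketch with scikit-learn on synthetic two-class data (the parameter C below plays the role of the constant C in equation 2.7; the data and parameter values are illustrative only):

    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 1, (20, 2)),      # class -1 points
                   rng.normal(3, 1, (20, 2))])     # class +1 points
    c = np.array([-1] * 20 + [1] * 20)             # labels c_i in {-1, +1}

    clf = SVC(kernel="linear", C=1.0)              # linear soft-margin SVM
    clf.fit(X, c)
    print(len(clf.support_vectors_))               # points on or inside the margin
    print(clf.predict([[1.5, 1.5]]))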
Non-linear classification: The classifier discussed above is a linear classifier. However,
in 1992, Bernhard Boser, Isabelle Guyon and Vapnik suggested a way to create non-linear
classifiers by applying the kernel trick to maximum-margin hyperplanes[15]. The resulting
algorithm is formally similar, except that every dot product is replaced by a non-linear
kernel function. This allows the algorithm to fit the maximum-margin hyperplane in the
transformed feature space. The transformation may be non-linear and the transformed
space high dimensional; thus though the classifier is a hyperplane(linear) in the high-
dimensional feature space it may be non-linear in the original input space. Figure 2.10
shows an example of such a transformation.

Figure 2.10: Nonlinear to linear space transformation

If the kernel used is a Gaussian radial basis function, the corresponding feature space
is a Hilbert space of infinite dimension. Maximum margin classifiers are well regularized,
so the infinite dimension does not spoil the results. Some common kernels include,

• Polynomial (homogeneous): $k(x, x') = (x \cdot x')^d$

• Polynomial (inhomogeneous): $k(x, x') = (\gamma (x \cdot x') + r)^d$, for $\gamma > 0$, $r > 0$

• Radial Basis Function: $k(x, x') = \exp(-\gamma \|x - x'\|^2)$, for $\gamma > 0$

• Sigmoid: $k(x, x') = \tanh(\kappa\, x \cdot x' + c)$, for some (not every) $\kappa > 0$ and $c < 0$
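These kernels are straightforward to write down in code; a minimal Python sketch follows (the parameter defaults are illustrative only):

    import numpy as np

    def poly_homogeneous(x, xp, d=2):
        return np.dot(x, xp) ** d

    def poly_inhomogeneous(x, xp, gamma=1.0, r=1.0, d=2):
        return (gamma * np.dot(x, xp) + r) ** d

    def rbf(x, xp, gamma=0.5):
        return np.exp(-gamma * np.sum((x - xp) ** 2))

    def sigmoid_kernel(x, xp, kappa=0.5, c=-1.0):
        return np.tanh(kappa * np.dot(x, xp) + c)

    x, xp = np.array([1.0, 2.0]), np.array([0.5, -1.0])
    print(rbf(x, xp), poly_inhomogeneous(x, xp))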

2.5.2 Support Vector Regression(SVR)

A version of the SVM for regression was proposed in 1996 by Vladimir Vapnik, Harris
Drucker, Chris Burges, Linda Kaufman and Alex Smola[16]. This method is called support
vector regression (SVR). The model produced by support vector classification (as
described above) only depends on a subset of the training data, because the cost function
for building the model does not care about training points that lie beyond the margin.
Analogously, the model produced by SVR only depends on a subset of the training data,
because the cost function for building the model ignores any training data that are close
(within a threshold ε) to the model prediction.
The basic idea: Suppose we are given training data

$$\mathcal{D} = \{(x_i, y_i) \mid x_i \in \mathbb{R}^p,\ y_i \in \mathbb{R}\}_{i=1}^{n}$$



These might be, for instance, exchange rates for some currency measured at subsequent
days together with corresponding econometric indicators. In SVR, our goal is to find a
function f (x) that has at most ε deviation from the actually obtained targets yi for all
the training data, and at the same time is as flat as possible. In other words, we do not
care about errors as long as they are less than ε, but will not accept any deviation larger
than this. This may be important if you want to be sure not to lose more than ε money
when dealing with exchange rates, for instance.
For pedagogical reasons, we begin by describing the case of linear functions f, taking
the form

$$f(x) = \langle w, x \rangle + b \quad \text{with } w \in X,\ b \in \mathbb{R} \tag{2.8}$$

where $\langle \cdot, \cdot \rangle$ denotes the dot product in X. Flatness in the case of
equation 2.8 means that one seeks a small w. One way to ensure this is to minimise the
norm, i.e. $\|w\|^2 = \langle w, w \rangle$.
We can write this problem as a convex optimisation problem:

$$\text{minimise: } \frac{1}{2}\|w\|^2$$
$$\text{subject to: } \begin{cases} y_i - \langle w, x_i \rangle - b \le \varepsilon \\ \langle w, x_i \rangle + b - y_i \le \varepsilon \end{cases} \tag{2.9}$$

Soft Margin: The tacit assumption in equation 2.9 was that such a function f actually
exists that approximates all pairs (xi , yi ) with ε precision, or in other words, that the
convex optimization problem is feasible. Sometimes, however, this may not be the case,
or we also may want to allow for some errors. Analogously to the ”soft margin” loss
function which was used in SVMs by Cortes and Vapnik (1995)[14], one can introduce
slack variables ξi , ξi∗ to cope with otherwise infeasible constraints of the optimization
problem in equation 2.9. Hence we arrive at the formulation:
$$\text{minimise: } \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{l} (\xi_i + \xi_i^*)$$
$$\text{subject to: } \begin{cases} y_i - \langle w, x_i \rangle - b \le \varepsilon + \xi_i \\ \langle w, x_i \rangle + b - y_i \le \varepsilon + \xi_i^* \\ \xi_i,\ \xi_i^* \ge 0 \end{cases} \tag{2.10}$$

The constant C > 0 determines the trade-off between the flatness of f and the amount
up to which deviations larger than ε are tolerated. This corresponds to dealing with a so-
called ε-insensitive loss function $|\xi|_\varepsilon$ described by

$$|\xi|_\varepsilon := \begin{cases} 0 & \text{if } |\xi| \le \varepsilon \\ |\xi| - \varepsilon & \text{otherwise} \end{cases} \tag{2.11}$$

Figure 2.11: Support Vector Regression: ε-tube

Figure 2.11 depicts the situation graphically. Only the points outside the shaded region
contribute to the cost, insofar as the deviations are penalised in a linear fashion. It turns
out that in most cases the optimisation problem in equation 2.10 can be solved more easily
in its dual formulation. Moreover, the dual formulation provides the key for extending the
SV machine to non-linear functions.
Non-linear regression: For non-linear regression, the kernel functions discussed in
section 2.5.1 are applicable in SVR too. In that case the dot products in equation 2.10 are
replaced by the kernel function.
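As a minimal illustration, the following sketch fits a non-linear SVR with an RBF kernel using scikit-learn (a modern library, not the SVMdark tool used in this study's experiments); epsilon corresponds to the ε of the tube in figure 2.11, C to the trade-off constant of equation 2.10, and the data is synthetic:

    import numpy as np
    from sklearn.svm import SVR

    rng = np.random.default_rng(0)
    X = np.sort(rng.uniform(0, 5, (60, 1)), axis=0)
    y = np.sin(X).ravel() + 0.1 * rng.normal(size=60)   # noisy non-linear target

    # The RBF kernel replaces the dot products of equation 2.10
    svr = SVR(kernel="rbf", C=10.0, epsilon=0.1, gamma=0.5)
    svr.fit(X, y)
    print(svr.predict([[2.5]]))                         # point prediction at x = 2.5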
For more details on SVM and SVR, please refer to [17, 18].
Chapter 3

Literature Survey

ARIMA, FFNN and SVR models all find application in time series forecasting. The
following are just a few examples:

• In [19] Holt-Winters(HW) exponentially weighted moving average models and Box-


Jenkins (ARIMA) modeling with intervention analysis were used for forecasting
incoming calls to telemarketing centers for the purposes of planning and budgeting.
ARIMA models outperformed HW models.

• Kohzadi, Boyd, Kermanshahi and Kaastra[20], in 1996, compared NN with
ARIMA. The data used was monthly live cattle and wheat prices from 1950 through
1990. The neural network models achieved 27% and 56% lower mean squared
error than the ARIMA model. The mean absolute error and mean absolute percent
error were also lower for the neural network models. The neural network models
were able to capture a significant number of turning points for both wheat and cattle,
while the ARIMA model was only able to do so for wheat. They used mean squared
error (MSE), mean absolute error (MAE) and mean absolute percent error (MAPE), as
previously described, along with a measure of the ability to forecast turning points, to
compare both methods. The paper gives a very detailed comparison, but fails to
detail the specifics of the NN and ARIMA models used.

• H. F. Zou, G. P. Xia, F. T. Yang and H. Y. Wang[21] in 2007 used FFNN
and time series models for Chinese food grain price forecasting. The FFNN model was
found to be the best.

• SVR was used for load forecasting by B. Chen, M. Chang and C. Lin in 2001[22].

• Highway traffic forecasting by a support vector regression model was performed by
Wei-Chiang Hong, Ping-Feng Pai, Shun-Lin Yang and R. Theng[23] in 2006.

Financial time series are non-linear and among the noisiest and most difficult signals
to forecast. This has led many economists to adopt the efficient market hypothesis[24].
This hypothesis states that price changes are independent of the past, follow a random
walk, and are therefore unpredictable. Although there is strong evidence in the literature
against the efficient market hypothesis, it remains a formidable task to develop an accurate
and robust financial market forecasting system.
FFNNs, SVR and ARIMA models have been and are being studied extensively in financial
forecasting too. For example:

• The modelling of Indian stock market (price index) data using NN was done by
Mohan Neeraj, Jha Pankaj, Laha Arnab Kumar and Dutta Goutam[25] in 2005. They
studied the efficacy of NN in modelling the Bombay Stock Exchange (BSE) SENSEX
weekly closing values. The paper gives very minute details about the constructive NN
model building performed, such as the inputs used, the transfer functions used, and how
the number of layers and neurons per layer were varied throughout; it also compares the
NN models of the different configurations employed.

• In financial forecasting using FFNN, technical indicators were used along with the
original series, as reported in [26, 27], which improved the performance. [28] also
reports the addition of fundamental information about the companies for better
results.

• The use of Support Vector Machines (SVMs) in financial forecasting was studied by
comparing them with a multi-layer perceptron trained by the Back Propagation (BP)
algorithm, by Lijuan Cao and Francis E. H. Tay in 2001[29]. The S&P 500 Daily Index
on the Chicago Mercantile Exchange was the subject of forecasting. SVMs forecast better
than BP. Since there is no structured way to choose the free parameters of SVMs, the
generalisation error with respect to the free parameters of SVMs was investigated in
this experiment. As per the article, they have little impact on the solution.

• An unsupervised neural based approach to financial forecasting is presented in [30];
it was shown that the unsupervised network outperforms multilayer perceptrons, a radial
basis function network and a standard ARIMA model. The Dow Jones Index was the
subject of forecasting.

• In 2004, forecasting of NIFTY stock index futures returns was carried out using
backpropagation and recurrent neural network models and a linear ARIMA model by
M. Kumar and M. Thenmozhi[31]. A comparison of the different models shows that for
NIFTY index futures returns the backpropagation neural network model outperforms
the recurrent neural network and the traditional ARIMA models. Moreover, recurrent
neural network models outperform the traditional ARIMA models. A 3-2-1 neural
network architecture was the best fit for forecasting NIFTY futures returns.

• In [32], by Wun-Hua Chen, Jen-Ying Shih and Soushan Wu (2006), an AR(1) model,
support vector machines and back propagation neural networks were used for forecasting
the six major Asian stock markets. FFNN with BP and SVM outperformed AR(1) in
MSE and MAE, but AR(1) was better in directional symmetry (for the definitions of these
terms refer to section 2.1.1). The performances of FFNN and SVMs were marginally
different.

• SVR was studied in detail for forecasting four major indices on the Kuala Lumpur
Stock Exchange by Chai Chon Lung in 2006[33]. SVR with polynomial and RBF kernels
were compared, concluding that the polynomial kernel performs better than RBF. FFNN
with BP was also compared with SVR; SVR with the polynomial kernel was found to be
superior to FFNN too.

• In 2000, FFNNs were compared against SVR for forecasting AOL, YAHOO and IBM
stock prices by Theodore B. Trafalis and Huseyin Ince[34]. Different SVR and FFNN
models were applied. For AOL the best MSE values for SVM and FFNN were 2.853779
and 2.3512 respectively. For IBM the best MSE values for SVM and FFNN were 2.772304
and 2.8020 respectively. And for YAHOO the best MSE values for SVM and FFNN were
20.81991 and 10.6003 respectively. Thus FFNNs came out winners for AOL and YAHOO
by a great margin, while SVRs beat FFNNs for IBM by a small margin.

All this literature suggests that FFNNs are better than ARIMA models at financial
forecasting, but a strong conclusion about SVR models in comparison with FFNN and
ARIMA models is difficult to draw. It was also found that for financial forecasting using
SVR models, the RBF and polynomial kernel functions are applicable. Hence in this study,
ARIMA, FFNN and SVR (with RBF and polynomial kernels) models were compared for
forecasting performance on three main Indian stock market indices.
Also in the literature, there is absence of long term forecasting. E.g., For daily stock
values, long term forecasting would mean that once the model is trained or estimated,
then the model is used for forecasting the daily stock prices for several months without
retraining or re-estimating the model.
There are many stock markets in India; the most prominent of them are the BSE and the NSE, which between them are responsible for the vast majority of share transactions[35]. Their brief details are as follows:

• BSE: The Bombay Stock Exchange Limited (popularly called The Bombay Stock Exchange, or BSE) is the oldest stock exchange in Asia. It is also the biggest stock exchange in the world in terms of listed companies, with 4,800 listed companies as of August 2007[2], and it has a significant trading volume. On 31 December 2007, the equity market capitalization of the companies listed on the BSE was US$ 1.79 trillion, making it the largest stock exchange in South Asia and the tenth largest in the world[3]. The BSE SENSEX (SENSitive indEX), also called the "BSE 30", is a widely used market index in India and Asia. Though many other exchanges exist, BSE and the National Stock Exchange of India account for most of the trading in shares in India.

• NSE: The National Stock Exchange of India Limited (NSE)[4] is the largest stock exchange in India in terms of daily turnover and number of trades, for both equities and derivative trading[5]. As of 2006, the NSE's 2799 VSAT terminals cover more than 1500 cities across India[36]. In October 2007, the equity market capitalization of the companies listed on the NSE was US$ 1.46 trillion, making it the second largest stock exchange in South Asia. NSE is the third largest stock exchange in the world in terms of the number of trades in equities[3]. It is the second fastest growing stock exchange in the world, with a recorded growth of 16.6%[6]. The NSE's key index is the S&P CNX Nifty, known as the Nifty, an index of fifty major stocks weighted by market capitalisation. The Information Technology (IT) industry has played a major role in the Indian economy during the last few years; hence the index CNX IT has also gained considerable importance this decade.

From these details, it is clear that the Indian stock market has become a prominent market, not just in Asia but in the world, due to the contributions of the NSE and BSE, and that BSE Sensex, S&P CNX Nifty and CNX IT are the important indices in the Indian stock market. Brief descriptions of these indices follow:

• BSE Sensex[37]: The BSE Sensex, or Bombay Stock Exchange Sensitive Index, is a value-weighted index composed of 30 stocks. It consists of the 30 largest and most actively traded stocks, representative of various sectors, on the Bombay Stock Exchange. These companies account for around one-fifth of the market capitalization of the BSE. The base value of the Sensex is 100 on April 1, 1979, and the base year of the BSE-SENSEX is 1978-79. The index has increased by over ten times from June 1990 to the present. Using information from April 1979 onwards, the long-run rate of return on the BSE Sensex works out to be 18.6% per annum, which translates to roughly 9% per annum after compensating for inflation[38].

• S&P CNX Nifty[39]: The S&P CNX Nifty, nicknamed Nifty 50 or simply Nifty, is the leading index for large companies on the National Stock Exchange of India. The Nifty is a well diversified 50 stock index accounting for 21 sectors of the economy. It is used for a variety of purposes such as benchmarking fund portfolios, index based derivatives and index funds. The traded value for the last six months of all Nifty stocks is approximately 44.89% of the traded value of all stocks on the NSE. Nifty stocks represent about 58.64% of the total market capitalization as on March 31, 2008. The impact cost of the S&P CNX Nifty for a portfolio size of Rs. 2 crore is 0.15%.

• CNX IT[40]: A number of large, profitable Indian companies today belong to the IT sector, and a great deal of investment interest is now focused on it. In order to have a good benchmark of the Indian IT sector, IISL has developed the CNX IT sector index. CNX IT provides investors and market intermediaries with an appropriate benchmark that captures the performance of the IT segment of the market. Companies in this index are those that have more than 50% of their turnover from IT related activities like software development, hardware manufacture, vending, support and maintenance. The average total traded value for the last six months of CNX IT Index stocks is approximately 91% of the traded value of the IT sector. CNX IT Index stocks represent about 96% of the total market capitalization of the IT sector as on March 31, 2005. The average total traded value for the last six months of all CNX IT Index constituents is approximately 14% of the traded value of all stocks on the NSE. CNX IT Index constituents represent about 14% of the total market capitalization as on March 31, 2005.

Forecasting was performed on these three indices.


The details of the experiments conducted, the results and the analysis of the results
follow hereafter.
Chapter 4

Experimental Setup

Figure 4.1: Methodology


This chapter gives details about the experimental setup and the steps followed in the experiments. Figure 4.1 shows a flowchart representation of the overall methodology followed in the experiments. Details of each step and of the experimental setup are discussed below.

4.1 Data Collection


To get the best possible forecasting performance, it is desirable to use as much historical data as possible, so that the training set is large enough to ensure effective training. Historical quotes for BSE Sensex, S&P CNX Nifty and CNX IT were obtained as follows:

Figure 4.2: BSE Sensex Plot

• BSE Sensex: Historical quotes for the BSE Sensex index starting from 1st July 1997 are available at [41]. From this site, historical quotes for the period 1st July 1997 to 27th December 2007 were downloaded. The data consists of open, high, low, close, volume and adjusted close values for each stock market working day, and has 2474 rows, each containing one set of the above mentioned values. The duration is long enough to model the prices accurately and includes a 'boom-bust' cycle. This is important since otherwise the ANNs would not be properly trained to handle a 'boom-bust' cycle if one happens in future.

Figure 4.3: S&P CNX Nifty Plot

• S&P CNX Nifty: From [4], historical quotes for the period 1st July 1990 to 14th March 2008 for the S&P CNX Nifty index were downloaded. The data consists of open, high, low, close, volume and adjusted close values for each stock market working day, and has 4220 rows, each containing one set of the above mentioned values.

• CNX IT: From [4], historical quotes for the period 1st January 1996 to 14th March 2008 for the CNX IT index were downloaded. The data consists of open, high, low, close, volume and adjusted close values for each stock market working day, and has 3037 rows, each containing one set of the above mentioned values.

Index            Data Points   Min      Max      Mean     Std. Dev.
BSE Sensex       2474          2600.1   19978    6130.1   3813.5
S&P CNX Nifty    4220          279      6287.9   1543     1124.3
CNX IT           3037          762.47   95502    11958    13086

Table 4.1: Statistical details of BSE Sensex, S&P CNX Nifty and CNX IT

Figures 4.2, 4.3 and 4.4 show 2-D plots of BSE Sensex, S&P CNX Nifty and CNX IT respectively. Their statistical details are shown in table 4.1.

Figure 4.4: CNX IT Plot

4.2 Data Pre-processing


Closing values of each index were extracted from the downloaded data for forecasting purposes. The pre-processing needed on the data for ARIMA and SVR was normalization into the range [0, 1]. For FFNN, the normalization range was [0, 1] in the case of the sigmoid transfer function and [−1, 1] in the case of the tanh transfer function. Each of these three index time series was then partitioned into two sets (as shown in figure 4.5):

Figure 4.5: Data Partition

• Latest Data (L): This set consisted of the 200 latest index values (nearly 11 months of closing values). This data was used for testing the long run forecasting performance of the models after estimation/training. This testing is different from the testing used while training the models. '(L)' will be used henceforth to indicate values of error measures corresponding to the Latest Data.

• Historical Data (H): This set consisted of the remaining trailing index values. This data was used for training the neural networks and SVRs (training here includes validation and testing; this testing is different from the one mentioned above, being performed during estimation of the weights involved in the SVR and FFNN models), and in the case of ARIMA it was used for model estimation. '(H)' will be used henceforth to indicate values of error measures corresponding to the Historical Data.

Model estimation was performed using the historical data, and the latest data was then used to assess long run forecasting performance. To use the historical or latest data for FFNN and SVR, it needed to be further processed into vector sets: for both FFNN and SVR, the data must be in the form of a set of input-output vectors, i.e. each vector contains the input for the model and the output that the model should produce. This is the basic requirement of supervised learning.

INPUT                                OUTPUT
x1, x2, ..., xw                      xw+1
x2, x3, ..., xw+1                    xw+2
...                                  ...
x(n-w), x(n-w+1), ..., x(n-1)        xn

Table 4.2: Vector sets preparation for FFNN and SVR

Let x1, x2, ..., xn represent the scaled value set H or L, and let w be the window size. The window size determines how many trailing data points are used to forecast the current value. In the case of FFNN, the window size is equal to the number of input layer neurons. After the processing, the vector sets look as shown in table 4.2, where each row represents one vector. As only one-day-ahead forecasting is to be performed, the 'OUTPUT' column of the table contains only one data value for each row (vector). The process is shown in figure 4.6.
MATLAB[42] was used for this step.
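A minimal MATLAB sketch of this pre-processing step is given below for illustration (the file name, variable names and window size are assumptions for the example; the original scripts are not reproduced here):

    % Scale a closing-value series into [0, 1] and build the
    % input-output vector sets of table 4.2 with a sliding window.
    % (For tanh-based FFNNs the series was scaled to [-1, 1] instead.)
    close_vals = csvread('sensex_close.csv');    % hypothetical input file
    x = (close_vals - min(close_vals)) / (max(close_vals) - min(close_vals));

    w = 10;                                      % window size (assumed)
    n = length(x);
    inputs  = zeros(n - w, w);                   % one vector per row
    targets = zeros(n - w, 1);
    for i = 1:(n - w)
        inputs(i, :) = x(i : i + w - 1)';        % w trailing points as input ...
        targets(i)   = x(i + w);                 % ... next value as output
    end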

Figure 4.6: Vector sets Preparation for FFNN and SVR

4.3 Model Estimation/Training


Using the historical data pre-processed as discussed above, model estimation for ARIMA,
and training of FFNN and SVR models was performed as follows:

4.3.1 Model Estimation for ARIMA

Ideally, in ARIMA modelling, model identification is performed first using ACF and PACF plots (refer to section 2.3) to identify the p, d and q values for the series, and model estimation is then performed for the identified model[7]. Generally, p, d and q values range between 0 and 2 for any practical time series[7].
The model identification step was also performed for ARIMA, details of which are discussed in section 5.1.1. However, model estimation was performed for all possible combinations of p, d and q values, each ranging between 0 and 2, and forecasting was done using all these models. It was then cross-checked whether the identified models performed better than the other models, which was indeed the case.

Figure 4.7: Model estimation for ARIMA

The pictorial representation of these steps is shown in figure 4.7. The SAS[43] software was used to perform these steps; it uses a likelihood method to determine the model parameters φi and θj (see equation 2.1).

4.3.2 Training of FFNN

The Neural Network Toolbox in MATLAB was used for this part of the experiment. It provides a wide variety of BP algorithms: Batch Gradient Descent (GD), Batch Gradient Descent with Momentum, GD with Variable Learning Rate, GD with Variable Learning Rate and Momentum, Scaled Conjugate Gradient, BFGS Quasi-Newton, One Step Secant Quasi-Newton and Levenberg-Marquardt. Of these, the Levenberg-Marquardt BP algorithm is the fastest according to the literature[44] (at the cost of higher memory usage), which was actually observed in a few initial experiments. Hence only the Levenberg-Marquardt BP algorithm was used thereafter; all the FFNN experiments documented in this report were carried out with this algorithm.
The different FFNN parameter values used in the experiments are shown in table 4.3. All combinations of these values were used for each of the three series, and

the holdout verification method with 20 runs was applied in turn to each such combination for each series.
In holdout verification, vectors are chosen randomly from the initial sample to form the validation data, and the remaining observations are retained as the training data. Normally, less than a third of the initial sample is used as validation data; in this case the validation set was a 0.2 fraction (20%) of the original set.

The pictorial representation of these steps is shown in figure 4.8. Thus, for each series, 6440 neural networks were trained and kept for forecasting tests in the next step of the experiment.
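A condensed sketch of one such training run is shown below, using the older Neural Network Toolbox interface of that period (the variable names and the particular architecture are illustrative assumptions, and the exact call signatures varied across toolbox versions):

    % 'inputs'/'targets' come from the windowing step of section 4.2;
    % the toolbox expects one sample per column, hence the transposes.
    P = inputs';  T = targets';

    % Random 20% holdout for validation, 80% for training
    n   = size(P, 2);
    idx = randperm(n);
    nv  = round(0.2 * n);
    VV.P = P(:, idx(1:nv));       VV.T = T(:, idx(1:nv));
    Ptr  = P(:, idx(nv+1:end));   Ttr  = T(:, idx(nv+1:end));

    % [7 2 1] architecture: two tanh hidden layers and a linear output,
    % trained with the Levenberg-Marquardt BP algorithm
    net = newff(minmax(Ptr), [7 2 1], {'tansig', 'tansig', 'purelin'}, 'trainlm');
    net = train(net, Ptr, Ttr, [], [], VV);      % VV enables early stopping
    yhat = sim(net, P);                          % one-day-ahead forecasts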

4.3.3 Training of SVR models

The SVMdark[45] tool was used for this part of the experiment. It uses the SVMlight[46] library for the SVM computations; this library implements the optimization algorithms described in [47] and supports linear, polynomial, RBF and sigmoid kernels. All of these kernels were tried in the experiments.
The choice of appropriate values of d is basically by trial and error. However, based on a study by Ali and Smith (2003)[48] using various sample sets with different numbers of attributes and sizes, the search space for d values should range from 2 to 5. In this project, the values of the polynomial degree were in the range 0 to 5.

FFNN Parameter              Values Tried
Hidden layers               1, 2
Trans func in hidden layer  tanh, sigmoid
Trans func in output layer  tanh, sigmoid, linear
Neurons per hidden layer    [2 1], [3 1], [5 1], [7 1], [10 1], [15 1], [20 1],
                            [40 1], [3 2 1], [5 2 1], [5 3 1], [7 2 1],
                            [7 3 1], [7 5 1], [10 2 1], [10 3 1],
                            [10 5 1], [10 7 1], [15 2 1], [15 3 1],
                            [15 5 1], [15 7 1], [15 10 1]
Window size                 5, 7, 10, 20, 40, 60, 80
Neurons in output layer     1

Table 4.3: FFNN parameter values tried in the experiments



Figure 4.8: FFNN training with holdout validation


SVMdark provides an optimization option in which ranges of parameter values are specified; when run, it produces a table like table 4.4. In that example, the best kernel for the prediction is the polynomial kernel with C = 448805.2, ε = 0.026124, d = 3, γ = 3.528855 and r = 3.400067. The initial ranges of values used for these parameters are shown in table 4.5. The ranges are then narrowed and the optimization proceeds iteratively until the best parameters and kernel type have been chosen; the resulting best parameter values are used in the SVR model. SVR was applied to the three time series with window sizes of 5, 7, 10, 20, 40, 60 and 80. For each window size, holdout validation was again used, with 5 repeats, i.e. training and validation sets were prepared 5 times randomly with 0.8 and 0.2 fractions of the original

MSE        time/s   C          epsilon    kernel   param1     param2     param3
0.0001     1.122    448805.2   0.026124   poly     3          3.528855   3.400067
0.000151   0.27     368327.9   0.04001    poly     1          3.350322   1.169774
0.00095    0.22     324625.4   0.06241    poly     5          2.66686    4.093753
0.001855   0.581    859248.6   0.074496   poly     3          0.759148   4.973907
0.031777   0.19     141331.2   0.904508   rbf      2.619251
0.031777   0.19     783196.5   0.819056   poly     3          4.239937   2.862178
0.031777   0.18     73244.42   0.289987   rbf      -0.84246
0.031777   0.191    773461.1   0.267922   rbf      -4.37285
0.031777   0.19     366161.1   0.511246   rbf      -4.55535

Table 4.4: Optimisation using SVMdark

Parameter                          Start Range   End Range
C                                  0             1000000
ε                                  0             1
d (Polynomial kernel parameter)    0             5
γ (Polynomial kernel parameter)    0             5
r (Polynomial kernel parameter)    0             5
γ (Radial Basis kernel parameter)  -5            5
κ (Sigmoid kernel parameter)       -5            5
c (Sigmoid kernel parameter)       -5            5

Table 4.5: Initial ranges of parameters in SVMdark optimisation



vector set. Each time, the best parameter values and kernel were found using the optimization tool; the SVR model was then trained for these values using the "Learn" option in SVMdark, and the trained model was used for forecasting in the next step of the experiment.

Figure 4.9: SVR training with holdout validation

The pictorial representation of these steps is shown in figure 4.9.
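The experiments themselves were run through SVMdark's interface rather than scripts. Purely as an illustrative equivalent, the sketch below fits a comparable ε-SVR in modern MATLAB using fitrsvm from the Statistics and Machine Learning Toolbox (note that SVMlight's polynomial kernel parameters γ and r have no direct counterpart in fitrsvm, whose polynomial kernel has the fixed form (1 + x·z)^d):

    % epsilon-SVR with a degree-2 polynomial kernel; C and epsilon are
    % taken from the first row of table 4.4 purely as an example.
    mdl = fitrsvm(inputs, targets, ...
        'KernelFunction', 'polynomial', ...
        'PolynomialOrder', 2, ...
        'BoxConstraint', 448805.2, ...           % C
        'Epsilon', 0.026124);                    % epsilon
    yhat = predict(mdl, inputs);                 % forecast for each input vector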

4.4 Forecasting
All the models developed in the previous step are used for one-day-ahead forecasting of the latest data (L), which consists of 200 points. In the case of SVR, the SVMdark tool is used. For FFNN

and ARIMA, MATLAB is used. For ARIMA, equation 2.1 is implemented in MATLAB, and the parameter values from the model estimates obtained in SAS are substituted into the equation. Forecasted values are saved for further analysis.
The aim of this forecasting is to compare the forecasting performance of the models over a long run, without making any changes to the models and without retraining them.
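As an illustration of this step, the following is a minimal MATLAB sketch of one-day-ahead ARIMA(1,1,1) forecasting with fixed parameter estimates (a sketch under the convention w_t = φ1·w_{t-1} + ε_t − θ1·ε_{t-1} for the differenced series; the sign convention may differ from that of equation 2.1, and the variable x is assumed to hold the scaled closing values):

    phi1 = -0.2843;  theta1 = -0.3628;   % BSE Sensex estimates (table 5.1)
    w = diff(x);                         % first difference, since d = 1
    n = length(w);
    e    = zeros(n, 1);                  % one-step-ahead residuals
    what = zeros(n, 1);                  % one-step-ahead forecasts of w
    for t = 2:n
        what(t) = phi1 * w(t-1) - theta1 * e(t-1);
        e(t)    = w(t) - what(t);
    end
    xhat = x(1:n) + what;                % undo differencing: xhat(t) forecasts x(t+1)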

4.5 Results Analysis


To compare the forecasting performance of the models, MAPE, MAE and DS (please refer to section 2.1.1 for definitions of these terms) were used. From the forecasted and original values, these error measures were computed using MATLAB for each model, as sketched below. A detailed analysis of the results follows in the next chapter.
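For concreteness, the measures can be computed in MATLAB as follows (assuming the definitions of section 2.1.1, with y the actual and yhat the forecasted values of the latest data L; the DS expression shown is one common formulation):

    mae  = mean(abs(y - yhat));                  % MAE(L)
    mape = 100 * mean(abs((y - yhat) ./ y));     % MAPE(L), in percent
    % DS(L): percentage of days on which the forecast moves in the
    % same direction as the actual series
    ds   = 100 * mean(diff(y) .* diff(yhat) >= 0);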
Chapter 5

Results And Analysis

In this chapter all the results are presented along with analysis and discussion. The results for the individual methods are presented first, starting with ARIMA, then FFNN and then SVR; lastly, the three models are compared with each other. For the analysis of forecasting performance of the different models on the latest data (L) (L and H are discussed in section 4.2), three error measures were used: MAPE, MAE and DS (see section 2.1.1 for definitions). These will be denoted by MAPE(L), MAE(L) and DS(L) respectively hereafter, to indicate that they are calculated on the latest data L. A model with minimum MAPE(L) and MAE(L) is considered better at forecasting the value of the series, while a model with maximum DS(L) is considered better at forecasting the direction of movement of the series.

5.1 Results For ARIMA Models


As discussed earlier in section 4.3.1, for ARIMA, model identification was performed first, followed by model estimation. Let us look at the model identification step first.

5.1.1 Results For ARIMA Model Identification

The model identification was performed using ACF and PACF plots (see appendix A for details). Figure 5.1 shows the ACF and PACF plots for BSE Sensex with d = 0. The two lines parallel to the X-axis show the 95% confidence interval; anything inside these lines can be treated as zero. In the figure, the ACF plot does not decay rapidly, which means that the series is non-stationary and hence needs differencing (according to rule 2 in appendix A).


Figure 5.2 shows the ACF and PACF plots for BSE Sensex with d = 1. From the ACF plot it is clear that there is no need for further differencing (according to rule 1 in appendix A). Here both the ACF and PACF plots are mixtures of exponentials and damped sine waves which start after the first lag; hence, according to rule 5 in appendix A, the model identified is ARIMA(1,1,1). These plots also resemble the pattern from region 4 in figure A.3, which confirms the ARIMA(1,1,1) model. The validity of this model was confirmed in the model estimation (i.e. next) step, when it was observed that the values of φ1 and θ1 for each of the three series, estimated using the SAS software, fall in region 4 only, as shown in table 5.1. Also, out of all the ARIMA models that were tried, the ARIMA(1,1,1) model gave the minimum MAPE(L), minimum MAE(L) and maximum DS(L) (see the performance of models BARIMA1, CARIMA1 and SARIMA1 in tables 5.3, 5.4 and 5.5 respectively).

     BSE Sensex   CNX IT    S&P CNX Nifty
φ1   -0.2843      -0.4500   -0.4160
θ1   -0.3628      -0.4778   -0.5045

Table 5.1: Estimates of φ1 and θ1 for ARIMA(1,1,1)

Though there is no need for further differencing, one more difference was taken, as models with d = 2 were also to be tested for forecasting in the next step of the experiments. Figure 5.3 shows the corresponding ACF and PACF plots. In the figure, the PACF decays almost exponentially, while the ACF is nonzero up to the second lag. Hence, according to rule 4 in appendix A, ARIMA(0,2,2) may also fit this series. These plots resemble the pattern from region 1 in figure A.2, which suggests the ARIMA(0,2,2) model. In the model estimation (i.e. next) step, it was observed that the values of θ1 and θ2 for each of the three series also fall in region 1 of figure A.2 (see table 5.2). But the forecasting performance of this model was very poor for all three series (see the performance of models BARIMA2, CARIMA2 and SARIMA2 in tables 5.3, 5.4 and 5.5 respectively), showing that this second differencing is unnecessary.
For CNX IT and S&P CNX Nifty the same identification procedure was followed; it produced exactly the same results as for BSE Sensex, hence those results are not presented here. As all three series have shown similar statistical characteristics, we expect that further results will also be similar for them.

5.1.2 Forecasting Performance Of Different ARIMA models

Tables 5.3, 5.4 and 5.5 show the forecasting performance of different ARIMA models on BSE Sensex, CNX IT and S&P CNX Nifty respectively. The tables are sorted in ascending order of MAPE(L). Observe that:

• for all the three series ARIMA(1,1,1) model has performed better than all the other
ARIMA models with respect to MAPE(L), MAE(L) and as well as DS(L). This
supports the findings of the model identification step from section 5.1.1.

• all the models with d = 2 show drastic reduction in performance for all the series,
which again supports findings of the section 5.1.1.

• MA models (i.e. ARIMA with p = d = 0) perform very badly for all the series, indicating that they are not suitable for this kind of series.

• out of the three series, MAPE(L) values are the lowest for BSE Sensex, while its MAE(L) values are the highest.

• MAE(L) is least for CNX IT.

• DS(L) is best for CNX IT.

• DS(L) of nearly 80% was achieved for each of the three series.

• for a series, MAE(L) increases and DS(L) decreases with the increase in MAPE(L).

The main conclusion of this section is that the ARIMA(1,1,1) model fits these three series best. This completes the discussion of forecasting using ARIMA models; in the next section, forecasting using FFNN is analysed.

     BSE Sensex   CNX IT    S&P CNX Nifty
θ1   -1.0473      -1.0323   -1.1000
θ2   -0.0797      -0.0892   -0.0688

Table 5.2: Estimates of θ1 and θ2 for ARIMA(0,2,2)



p d q MAPE(L) MAE(L) DS(L) Model ID


1 1 1 1.3513 224.8670 79.4872 BARIMA1
1 0 1 1.3619 227.4520 74.8718 —
1 1 0 1.4056 234.4544 77.9487 —
2 0 1 1.4126 236.5703 74.3590 —
2 0 0 1.4238 236.6412 77.9487 —
2 1 0 1.4248 239.0455 78.4615 —
1 0 0 1.4276 237.9159 76.4103 —
0 1 2 1.4294 239.2292 74.3590 —
1 0 2 1.4304 238.6710 76.4103 —
0 1 0 1.4320 239.6841 75.3846 —
0 1 1 1.4350 240.0288 76.4103 —
1 1 2 1.4419 240.1546 76.9231 —
2 1 1 1.4683 243.1708 74.8718 —
2 0 2 1.4717 243.7975 77.4359 —
2 1 2 1.6614 276.1696 74.3590 —
0 2 0 2.1003 352.0032 64.1026 —
1 2 0 2.7536 463.1474 65.1282 —
2 2 0 3.1878 541.6262 68.2051 —
2 2 2 20.0781 3567.9351 58.4615 —
1 2 2 20.0938 3593.9707 58.4615 —
0 2 2 21.8191 3859.0104 55.3846 BARIMA2
2 2 1 22.2683 3936.4251 52.8205 —
1 2 1 22.7764 4028.2975 55.3846 —
0 2 1 23.2463 4094.8689 52.3077 —
0 0 2 24.5507 3924.3205 50.2564 —
0 0 1 53.8217 8278.2747 46.1538 —
0 0 0 65.8243 10652.6350 41.0256 —

Table 5.3: Comparison of different ARIMA models for forecasting on BSE Sensex

p d q MAPE(L) MAE(L) DS(L) Model ID


1 1 1 1.4788 66.3444 79.7718 CARIMA1
1 0 1 1.4801 66.7078 79.1000 —
2 0 1 1.4811 66.5029 78.3240 —
1 0 0 1.4817 66.3722 77.0000 —
2 0 0 1.4832 66.3907 76.5760 —
2 1 0 1.4997 67.3789 75.8898 —
0 1 2 1.5315 69.8205 74.9823 —
1 1 2 1.5439 69.4085 76.8533 —
1 0 2 1.5602 70.5667 73.0667 —
0 1 0 1.5850 71.3399 75.0000 —
0 1 1 1.5882 76.7787 72.6670 —
1 1 0 1.5982 73.0655 72.8333 —
2 1 1 1.6064 72.9139 72.9167 —
2 0 2 1.6122 73.7071 71.1123 —
2 1 2 1.6228 74.1229 71.9087 —
0 2 0 2.1662 98.8049 71.1238 —
1 2 0 3.0154 138.2871 69.5643 —
2 2 0 3.8608 177.8982 68.7684 —
2 2 2 25.6134 1221.1735 55.6530 —
1 2 2 42.0836 2031.1597 50.6213 —
0 2 2 64.2560 3090.8879 50.6213 CARIMA2
2 2 1 77.9525 3734.7569 50.6213 —
1 2 1 93.4798 4344.1582 50.6213 —
0 2 1 93.4934 4352.5702 50.6213 —
0 0 2 175.9593 8053.0132 50.6213 —
0 0 1 183.5418 8448.4003 50.6213 —
0 0 0 243.5168 11222.1644 50.6213 —

Table 5.4: Comparison of different ARIMA models for forecasting on CNX IT



p d q MAPE(L) MAE(L) DS(L) Model ID


1 1 1 1.7549 95.7615 79.2417 SARIMA1
2 0 2 1.8127 99.0218 77.0833 —
1 1 2 1.8272 100.0726 78.1250 —
1 1 0 1.8332 100.4599 79.1667 —
1 0 0 1.8418 100.7505 78.1250 —
2 0 0 1.8555 101.6722 77.0833 —
2 1 1 1.8563 101.4779 75.0000 —
2 1 2 1.8621 102.2443 77.0833 —
0 1 1 1.8712 102.0500 76.0417 —
0 1 2 1.8727 102.6454 75.0000 —
2 1 0 1.8731 102.5879 76.0417 —
1 0 2 1.8880 103.5521 77.0833 —
1 0 1 1.8899 103.4321 72.9167 —
0 1 0 1.9008 103.9228 77.0833 —
2 0 1 1.9075 104.2528 77.0833 —
0 2 0 2.4770 134.9787 71.8750 —
1 2 0 3.2917 180.1777 67.7083 —
2 2 0 3.6945 202.5095 66.6667 —
0 0 1 42.6876 2401.8847 43.7500 —
1 2 2 43.1817 2365.6685 52.0833 —
2 2 2 44.4045 2432.4150 52.0833 —
0 2 1 45.5733 2501.7547 52.0833 —
1 2 1 47.6569 2616.4580 52.0833 —
2 2 1 53.2513 2926.8683 52.0833 —
0 0 0 72.4773 4084.8649 50.0000 —
0 0 2 78.2341 5535.8429 47.4230 —
0 2 2 99.4325 7745.3254 47.9000 SARIMA2

Table 5.5: Comparison of different ARIMA models for forecasting on S&P CNX Nifty

5.2 Results For FFNN Models


As discussed in section 4.3.2, 6440 FFNN models with different network parameters were created, trained and tested for each series. The detailed results and discussion are presented in this section. For FFNNs it was not the case that the models with minimum MAPE(L) and MAE(L) also had maximum DS(L); hence this section is divided into two subsections. The analysis for the best MAPE(L) and MAE(L) values is done in the first subsection, and that for DS(L) in the second.

5.2.1 Performance of FFNN In Forecasting Value Of The Series

Min MAPE(L)  MAE(L)  DS(L)  Win Size  Neurons Per Layer  Trans Func Per Layer  Model ID
1.1180 182.7829 53.8462 5 [7 2 1 ] [tanh tanh lin] BNN1
1.1277 184.2603 54.2105 10 [2 1 ] [tanh tanh] BNN2
1.1734 192.0728 52.8497 7 [10 3 1 ] [tanh tanh lin] BNN3
1.1894 199.1636 54.4444 20 [3 2 1 ] [tanh tanh lin] BNN4
1.1991 204.5997 61.2500 40 [15 3 1 ] [sigmoid sigmoid lin] BNN5
1.3314 228.0641 57.1429 60 [15 2 1 ] [sigmoid sigmoid lin] BNN6
1.4141 243.8137 57.5000 80 [7 3 1 ] [sigmoid sigmoid lin] BNN7

Table 5.6: Minimum MAPE(L) obtained per window size along with the details of the
FFNNs for BSE Sensex.

Out of the 6440 FFNNs for each series, tables 5.6, 5.7 and 5.8 show the models with the minimum MAPE(L) value obtained per window size for BSE Sensex, CNX IT and S&P CNX Nifty, along with the details of those models. The tables are sorted by MAPE(L) in ascending order. Table 5.9 shows the minimum and average MAPE(L) values per window size for these series, sorted in ascending order of window size.
In terms of MAPE(L) and MAE(L), the best models were BNN1 for BSE Sensex (table 5.6), CNN1 for CNX IT (table 5.7) and SNN1 for S&P CNX Nifty (table 5.8).

Min MAPE(L)  MAE(L)  DS(L)  Win Size  Neurons Per Layer  Trans Func Per Layer  Model ID
1.1863 56.4792 58.1250 40 [10 1 ] [sigmoid lin] CNN1
1.1876 57.8142 50.8333 80 [15 1 ] [sigmoid lin] CNN2
1.2094 58.1401 53.5714 60 [2 1 ] [tanh lin] CNN3
1.3908 62.8498 56.3158 10 [3 1 ] [tanh lin] CNN4
1.4035 62.9083 57.4359 5 [2 1 ] [tanh lin] CNN5
1.4094 64.0949 58.3333 20 [2 1 ] [sigmoid lin] CNN6
1.4169 66.7119 52.8497 7 [5 2 1 ] [tanh tanh lin] CNN7

Table 5.7: Minimum MAPE(L) obtained per window size along with the details of the
FFNNs for CNX IT.

Min MAPE(L)  MAE(L)  DS(L)  Win Size  Neurons Per Layer  Trans Func Per Layer  Model ID
1.0834 62.5298 66.6667 80 [15 2 1 ] [sigmoid sigmoid lin] SNN1
1.2787 74.2788 48.7805 60 [2 1 ] [tanh lin] SNN2
1.3300 78.2595 55.7377 40 [2 1 ] [tanh lin] SNN3
1.6342 90.5011 54.9451 10 [3 2 1 ] [tanh tanh lin] SNN4
1.7520 97.0058 54.1667 5 [5 1 ] [tanh lin] SNN5
1.7908 99.9121 41.9753 20 [2 1 ] [sigmoid lin] SNN6
1.7977 99.7327 58.5106 7 [7 2 1 ] [tanh tanh lin] SNN7

Table 5.8: Minimum MAPE(L) obtained per window size along with the details of the
FFNNs for S&P CNX Nifty.

According to the literature[49, 50], FFNNs with one sigmoid layer and one linear layer are capable of modelling non-linear relationships of great complexity. From these tables it is very clear that the linear transfer function is best suited to the output layer, while in the first layer sigmoid and tanh performed nearly equally. For BSE Sensex, 3-layer FFNNs dominate table 5.6; for CNX IT, two layer FFNNs dominate table 5.7; and for S&P CNX Nifty, table 5.8 shows a mixture of 2 and 3 layer FFNNs. This suggests that, of the three series, BSE Sensex is the most complex to forecast and CNX IT the least.
MAE(L) values are higher for BSE Sensex than for the other two series; a similar finding was obtained with the ARIMA models (see section 5.1). DS(L) is only 50-60% for all the models except SNN1, which is considerably low.

Figure 5.4 shows the plot of minimum MAPE(L) values against window size for the three series. It shows that for CNX IT and S&P CNX Nifty, MAPE(L) increases as the window size decreases; i.e. for these two series each value depends on a larger number of preceding values. For BSE Sensex it is the other way around: each value depends on fewer preceding values, so BSE Sensex appears to be the most dynamic of the three. From the plot, window sizes of 40 to 80 seem to give reasonably low MAPE(L) values for all three series.
Table 5.9 clearly shows the difference between the minimum and average MAPE(L) values per window size. The difference is huge. This is because, for FFNNs, convergence depends strongly on the weight vector initialised before training starts.

              Min MAPE(L)                           Avg MAPE(L)
Window Size   BSE Sensex  CNX IT  S&P CNX Nifty     BSE Sensex  CNX IT  S&P CNX Nifty
5 1.1180 1.4035 1.7520 12.0339 10.4582 7.6253
7 1.1734 1.4169 1.7977 11.4087 8.3031 7.3612
10 1.1277 1.3908 1.6342 9.5960 8.6794 7.9920
20 1.1894 1.4094 1.7908 14.3128 8.4627 10.2533
40 1.1991 1.1863 1.3300 11.8370 10.1601 13.9696
60 1.3314 1.2094 1.2787 14.7842 9.7545 13.9002
80 1.4141 1.1876 1.0834 17.6143 10.1869 14.6352

Table 5.9: Min and Avg MAPE(L)s obtained per window size

If the initial weight vector is near a local minimum, training will get stuck in it; if it is near the global minimum, there is a much higher chance that the best model will be obtained. The large difference between minimum and average MAPE(L) values shows that a local minimum was met more often than the global one.

5.2.2 Performance of FFNN In Forecasting Direction Of The Series

DS(L)  MAPE(L)  MAE(L)  Window Size  Neurons Per Layer  Trans Func Per Layer  Model ID
66.6667 3.3060 606.3522 80 [20 1 ] [tanh tanh] BNN1D
64.3750 2.4693 447.8891 40 [3 2 1 ] [tanh tanh tanh] BNN2D
63.5714 11.0811 2113.6480 60 [10 3 1 ] [tanh tanh tanh] BNN3D
61.1111 20.3531 3249.0836 20 [3 2 1 ] [tanh tanh tanh] BNN4D
59.4737 38.2639 7049.1890 10 [7 1 ] [tanh tanh] BNN5D
58.4615 10.2440 1920.0727 5 [15 5 1 ] [tanh tanh tanh] BNN6D
58.0311 4.2488 690.8693 7 [10 2 1 ] [tanh tanh lin] BNN7D

Table 5.10: Maximum DS(L)s obtained per window size for BSE Sensex using FFNN
models

Tables 5.10, 5.11 and 5.12 show the maximum DS(L) values obtained per window size for different FFNN models for BSE Sensex, CNX IT and S&P CNX Nifty; these tables are sorted in descending order of DS(L). Table 5.13 shows the maximum and average DS(L) values obtained for the three series, sorted in ascending order of window size.
A best DS(L) of approximately 67% was obtained for BSE Sensex and S&P CNX Nifty; for CNX IT it is around 62%. Table 5.10 is dominated by 3-layer FFNNs, and the last layer of the FFNNs for BSE Sensex is dominated by the tanh transfer function, both of which again indicate the complexity of the BSE Sensex series.
The large difference between the maximum and average DS(L) values (see table 5.13) again shows the effect of random weight initialisation on the chances of reaching the global minimum of the error surface.
Maximum DS(L) values are plotted against window size in figure 5.5. From the plot it can be inferred that DS(L) generally increases with window size, the 60-80 range giving the highest DS(L) values for all three series.

DS(L)  MAPE(L)  MAE(L)  Window Size  Neurons Per Layer  Trans Func Per Layer  Model ID
62.1429 1.2795 61.0987 60 [3 1 ] [sigmoid lin] CNN1D
61.6667 1.3660 67.0545 80 [40 1 ] [tanh lin] CNN2D
61.6667 2.2270 110.5133 80 [7 1 ] [sigmoid sigmoid] CNN3D
60.5556 3.0592 134.0820 20 [7 2 1 ] [tanh tanh lin] CNN4D
60.5556 1.4856 67.4158 20 [3 2 1 ] [sigmoid sigmoid lin] CNN5D
59.5855 3.5478 154.2032 7 [7 1 ] [sigmoid sigmoid] CNN6D
59.4872 2.0002 87.8679 5 [7 1 ] [tanh tanh] CNN7D
59.3750 1.3164 62.6483 40 [3 1 ] [tanh lin] CNN8D
59.3750 1.4300 67.8715 40 [20 1 ] [sigmoid lin] CNN9D
59.3750 1.8004 85.4629 40 [15 7 1 ] [tanh tanh lin] CNN10D
58.9474 1.4102 63.6247 10 [10 5 1 ] [tanh tanh lin] CNN11D

Table 5.11: Maximum DS(L)s obtained per window size for CNX IT using FFNN models

DS(L)  MAPE(L)  MAE(L)  Window Size  Neurons Per Layer  Trans Func Per Layer  Model ID
66.9524 1.5371 88.4242 80 [2 1 ] [tanh lin] SNN1D
66.9524 8.0767 468.6258 80 [7 5 1 ] [tanh tanh lin] SNN2D
64.7317 1.5285 88.6016 60 [20 1 ] [tanh tanh] SNN3D
64.7317 2.8265 163.8969 60 [3 1 ] [tanh lin] SNN4D
64.7317 7.7946 455.9175 60 [15 1 ] [tanh lin] SNN5D
63.9344 6.6846 389.0944 40 [3 1 ] [tanh tanh] SNN6D
63.8298 3.3150 189.1471 7 [15 1 ] [tanh lin] SNN7D
61.5385 1.7110 95.1511 10 [15 5 1 ] [tanh tanh tanh] SNN8D
60.4167 5.2043 304.5783 5 [15 10 1 ] [tanh tanh lin] SNN9D
56.7901 4.8770 284.3519 20 [3 1 ] [tanh tanh] SNN10D
56.7901 3.8975 226.5408 20 [5 1 ] [sigmoid sigmoid] SNN11D
56.7901 4.0188 232.1541 20 [7 1 ] [sigmoid sigmoid] SNN12D
56.7901 52.0020 3014.5389 20 [5 3 1 ] [tanh tanh tanh] SNN13D
56.7901 9.0960 531.7788 20 [15 5 1 ] [tanh tanh tanh] SNN14D
56.7901 2.6682 151.2669 20 [10 7 1 ] [sigmoid sigmoid lin] SNN15D

Table 5.12: Maximum DS(L)s obtained per window size for S&P CNX Nifty using FFNN
models

This means that the direction of movement of the series at each point depends on a large number of preceding points, for all three series.

5.3 Results For SVR Models

This section is again divided into two subsections. The first compares the different SVR kernels on forecasting performance; the second documents and analyses the results for polynomial kernels only, since the polynomial kernels outperformed the others by a great margin.

5.3.1 Comparison Of Kernels

SVR was tried with polynomial, RBF, linear and sigmoid kernels. Of these, the linear and sigmoid kernels never gave any solution, and hence are not discussed further. Table 5.14 compares SVRPOLY and SVRRBF based on MAPE(L) values for different window sizes. It is very clear that SVRPOLY outperforms SVRRBF by a great margin in every case shown in the table, except for window size 10 on S&P CNX Nifty. Hence, in the next subsection, the forecasting performance of only the SVRPOLY models is studied in detail.

              Max DS(L)                             Avg DS(L)
Window Size   BSE Sensex  CNX IT  S&P CNX Nifty     BSE Sensex  CNX IT  S&P CNX Nifty
5 58.4615 59.4872 60.4167 45.2980 47.4911 50.7394
7 58.0311 59.5855 63.8298 44.8290 47.4983 49.7803
10 59.4737 58.9474 61.5385 44.3897 47.8307 49.9797
20 61.1111 60.5556 56.7901 43.7665 47.6769 49.8725
40 64.3750 59.3750 63.9344 45.0652 47.4402 51.6589
60 63.5714 62.1429 64.7317 45.8321 48.5559 50.8351
80 66.6667 61.6667 66.9524 45.2218 48.4547 47.7692

Table 5.13: Maximum and Avg DS(L)s obtained per window size with FFNN models

5.3.2 Forecasting Performance Of SVRPOLY Models

Tables 5.15, 5.16 and 5.17 show the forecasting performance of SVRPOLY models on BSE
Sensex, CNX IT and S&P CNX Nifty respectively. For each window size, these tables

              BSE Sensex             CNX IT                 S&P CNX Nifty
Window Size   SVRPOLY    SVRRBF     SVRPOLY    SVRRBF      SVRPOLY    SVRRBF
5 2.9433 4.5452 3.7557 9.1203 2.2224 3.7212
7 1.4089 6.9004 3.8041 11.1300 3.5384 5.2311
10 5.8682 9.9043 1.6254 4.4253 1.7883 1.7726
20 1.7269 3.3324 1.5261 4.9992 2.4657 2.9321
40 2.8427 5.5543 1.4872 5.0002 2.1066 3.8423
60 1.4880 4.4249 1.4695 3.8992 3.7212 4.1112
80 1.7327 3.9843 3.3608 10.0843 1.7230 3.1108

Table 5.14: SVRPOLY Vs SVRRBF based on MAPE(L)

MAPE(L) MAE(L) DS(L) C ε d γ r WinSz MdlID


1.4089 234.1536 58.5492 249748.22 0.0213 2 1.2470 4.5355 7 BSVR1
1.4880 257.8835 57.1429 792599.26 0.0106 2 2.5047 3.9518 60 —
1.7269 294.6667 56.6667 402023.38 0.0184 2 2.2115 4.4671 20 —
1.7327 304.6951 45.0000 761894.59 0.0149 1 0.9338 0.0504 80 —
2.8427 499.2695 51.2500 296975.62 0.0119 3 0.5706 2.4815 40 —
2.9433 489.9205 40.5128 936002.69 0.0099 2 2.8285 4.3638 5 —
5.8682 992.6019 40.5263 575563.83 0.0075 4 3.7899 3.1718 10 —

Table 5.15: Forecasting performance of SVRPOLY for BSE Sensex

MAPE(L) MAE(L) DS(L) C ε d γ r WinSz MdlID


1.4695 70.6894 55.2343 882549.15 0.0099 2 0.4234 1.9312 60 CSVR1
1.4872 67.1975 52.4432 400000.00 0.0134 2 2.1210 3.3123 40 —
1.5261 68.9893 54.9010 498213.28 0.0124 2 0.2328 2.7086 20 —
1.6254 73.0373 48.5432 111124.91 0.0914 1 1.7750 2.0356 10 CSVR2
3.3608 163.7998 51.1233 542375.62 0.0117 1 1.1306 3.3248 80 —
3.7557 184.6057 43.4358 991231.61 0.0091 3 3.9946 2.4384 5 —
3.8041 178.3055 44.1263 435563.83 0.0155 3 2.6472 4.2321 7 —

Table 5.16: Forecasting performance of SVRPOLY for CNX IT



show only the models for which the best performance was obtained; other models are omitted. The tables are sorted in ascending order of MAPE(L). From these tables, the following observations can be made:

• For all three series, the value of C is very high, always in the range 100,000 to 1,000,000; i.e. higher values of C result in better forecasts. A higher value of C indicates higher model complexity.

• Values of ε are very low in all the models, always in the range 0.009 to 0.022, except for the CSVR2 model (window size 10 on CNX IT) where ε = 0.0914; i.e. lower values of ε produce better forecasts. A smaller ε value indicates a smaller ε-tube, meaning the solution surface is tied tightly to the training points.

• For all three series, a polynomial degree d of 2 produced the best forecasts; hence polynomial kernels of degree 2 are best suited to these series. Values of d higher or lower than 2 consistently performed badly: higher values of d provide extra complexity that overfits the model to the training data, while lower values do not provide enough complexity to learn the training points.

• DS(L) decreases and MAE(L) increases with MAPE(L), apart from a few negligible exceptions.

• MAPE(L) values are the lowest for BSE Sensex among the three series, but its MAE(L) values are the highest. This was also observed in the case of the ARIMA models as well
MAPE(L) MAE(L) DS(L) C ε d γ r WinSz MdlID


1.7230 98.6894 61.9048 325315.10 0.0153 2 4.7430 0.9691 80 SSVR1
1.7883 99.6453 57.1429 474668.02 0.0212 2 1.5805 1.4250 10 —
2.1066 124.4414 42.6230 319580.68 0.0163 1 2.9332 3.5337 40 —
2.2224 123.0545 58.3333 162817.77 0.0208 4 3.2448 3.2247 5 —
2.4657 140.2473 45.6790 524259.16 0.0187 4 0.1475 3.9702 20 —
3.5384 202.6957 57.4468 333194.98 0.0201 4 1.4261 1.3642 7 —
3.6755 212.2441 56.0976 606613.45 0.0094 2 2.8840 4.4192 60 —

Table 5.17: Forecasting performance of SVRPOLY for S&P CNX Nifty



as FFNN models.

• DS(L) values of approximately 58%, 55% and 61% were achieved for BSE Sensex, CNX IT and S&P CNX Nifty respectively.

In the next section, comparison of ARIMA, FFNN and SVR models is presented.

5.4 Comparison Of Forecasting Performances Of ARIMA, FFNN and SVR Models
This section compares the three models, ARIMA, FFNN and SVR, on their forecasting performance for BSE Sensex, CNX IT and S&P CNX Nifty. The comparison is done on two measures: MAPE(L) and DS(L). MAE(L) values need not be used, as they were nearly proportional to MAPE(L) in all the experiments shown. The first subsection compares the models on forecasting the series values using MAPE(L); the second compares them on forecasting the direction of movement of the index using DS(L).

5.4.1 Comparison of ARIMA, FFNN and SVR For Forecasting The Index Value

                               ARIMA     FFNN     SVR
BSE Sensex     min MAPE(L)     1.3513    1.1180   1.4089
               Model ID        BARIMA1   BNN1     BSVR1
CNX IT         min MAPE(L)     1.4788    1.1863   1.4695
               Model ID        CARIMA1   CNN1     CSVR1
S&P CNX Nifty  min MAPE(L)     1.7549    1.0834   1.7230
               Model ID        SARIMA1   SNN1     SSVR1

Table 5.18: Forecasting performance of ARIMA, FFNN and SVR for MAPE(L)

Table 5.18 and figure 5.6 show the comparison of the ARIMA, FFNN and SVR models for forecasting the index values of BSE Sensex, CNX IT and S&P CNX Nifty using

MAPE(L). The table also gives the name of the model for which the noted MAPE(L) value was obtained; for details of these models please refer to the previous sections of this chapter. From both the table and the figure, it is very clear that the FFNN models outperformed the ARIMA and SVR models by a considerable margin for all three series. ARIMA performed better than SVR for BSE Sensex, while SVR performed marginally better than ARIMA for the other two series.
FFNN models are mostly reported in the literature to forecast better than ARIMA models, which is confirmed again here. Between FFNN and SVR models, the literature is not clear on which is better; from these results it can be asserted that FFNN models are better than SVR models for long term forecasting of these three series.

5.4.2 Comparison of ARIMA, FFNN and SVR For Forecasting The Direction Of Movement Of The Index

                               ARIMA     FFNN      SVR
BSE Sensex     max DS(L)       79.4872   66.6667   58.5492
               Model ID        BARIMA1   BNN1D     BSVR1
CNX IT         max DS(L)       79.7718   62.1429   55.2343
               Model ID        CARIMA1   CNN1D     CSVR1
S&P CNX Nifty  max DS(L)       79.2417   66.9524   61.9048
               Model ID        SARIMA1   SNN1D     SSVR1

Table 5.19: Forecasting performance of ARIMA, FFNN and SVR for DS(L)

Table 5.19 and figure 5.7 show the comparison of the ARIMA, FFNN and SVR models for forecasting the direction of movement of BSE Sensex, CNX IT and S&P CNX Nifty using DS(L). The table also gives the name of the model for which the noted DS(L) value was obtained; for details of these models please refer to the previous sections of this chapter. From both the table and the figure, it is very clear that the ARIMA models outperformed the FFNN and SVR models by a considerable margin for all three series. The FFNN models also performed consistently better than the SVR models.
A possible explanation for FFNN being better at forecasting the index value while ARIMA is better at forecasting the direction of movement is that FFNNs are trained on a deviation criterion, to minimise MSE; thus, on the deviation performance criterion, i.e. forecasting the index value, they perform better than ARIMA. With regard to the direction criterion, the time-series model has an intrinsic merit because it emphasises the trend of the series; therefore, the two AI models cannot outperform ARIMA on the direction criterion for any of the three series.

Figure 5.8 shows the plots of the values forecasted by ARIMA, FFNN and SVR for BSE Sensex.

Figure 5.1: ACF and PACF plots for BSE Sensex with d = 0. ACF shows need for
differencing.

Figure 5.2: ACF and PACF plots for BSE Sensex with d = 1. Model identified as
ARIMA(1,1,1).

Figure 5.3: ACF and PACF plots for BSE Sensex with d = 2. Model identified as
ARIMA(0,2,2).

Figure 5.4: minimum MAPE(L) Vs Window size for FFNNs

Figure 5.5: max DS(L) Vs Window size for FFNNs



Figure 5.6: Forecasting performance of ARIMA, FFNN and SVR for MAPE(L)

Figure 5.7: Forecasting performance of ARIMA, FFNN and SVR for DS(L)

Figure 5.8: Forecasted values by ARIMA, FFNN and SVR for BSE Sensex
Chapter 6

Conclusion And Future Work

A number of forecasting experiments were conducted on BSE Sensex, CNX IT and S&P CNX Nifty to compare the performance of a statistical model (ARIMA) and AI models (Feed Forward Neural Networks and Support Vector Regression). Their performance was compared for forecasting the value of the index using error measures such as Mean Absolute Percentage Error (MAPE) and Mean Absolute Error (MAE), and for forecasting the direction of movement of the index using Directional Symmetry (DS). The next section draws conclusions from these experiments, and the section after it discusses future work.

6.1 Conclusion
From the experiments performed in this study, the following conclusions can be drawn:

1. Model identification indicated that ARIMA(1,1,1) is the best fit for these three indices. This was supported by the forecasting results of the different ARIMA models, as the forecasting performance of the ARIMA(1,1,1) model was better than that of the other ARIMA models.

2. MA models (i.e. ARIMA with p = d = 0) and ARIMA models with d = 2 are not suitable for these series.

3. According to [50, 49], a linear transfer function is more suitable in the output layer of FFNN models. These experiments also showed that a linear transfer function in the output layer gave the best results for value forecasting of the index.

4. For value forecasting of the BSE Sensex, three layer FFNN networks performed better than two layer FFNNs, while for CNX IT two layer networks showed better results. Hence it can be asserted that BSE Sensex is the most complex to forecast.

5. For value forecasting of these indices using SVR models, polynomial kernels of degree 2 were found to be better than RBF kernels by a considerable margin, while the performances of linear and sigmoid kernels were very poor. These results match the results of [33].

6. For long term forecasting of the value of the index, FFNN models outperformed ARIMA and SVR models by a fair margin, while the performances of SVR and ARIMA were only marginally different from each other. For FFNN vs ARIMA, these results match those of [32, 31, 20]. For FFNN vs SVM, these results partially conform to those of [34, 32], where FFNNs were found to perform better than SVR except in a few cases of their study, and contradict the results of [33, 29]. From this study it can be asserted that FFNN models are better than the other two at one-day-ahead forecasting of the index value for these indices.

7. For long term forecasting of the direction of movement of the index, ARIMA models outperformed FFNN and SVR models by a fair margin. These results match the results of [32]. Hence it can be asserted that ARIMA models are better than the other two at one-day-ahead forecasting of the index movement direction for these indices.

6.2 Future Work


Following are the areas where this study can be extended:

• Technical indicators such as Moving Average (MA), Moving Average Convergence/Divergence (MACD), On Balance Volume (OBV) etc. can be used to assist the forecasting; their use has been shown to improve results[25, 26, 27].

• Atiya, A., Talaat, N. and Shaheen, S. in 1997[28] developed a successful neural network forecasting model using fundamental indicators such as the Price Earnings ratio (PE ratio), the rate of change of company sales, the Price Dividend ratio (PD ratio), the rate of change of profit margin etc. as input. Hence fundamental factors can also be used to assist the forecasting.

• In these experiments only one-day-ahead forecasting was studied. N-days-ahead forecasting can also be performed.

• Long term forecasting with model retraining/re-estimation can be tried.

• These experiments need to be carried out on several other series also.

• Combining methods such as TopK, Best, DTopK, AFTER etc. have proven to be better than individual forecasting models[51, 52]. They can be used to combine the results of these three forecasting models.
Appendix A

ARIMA Model Identification

Given a time series of data $X_1, X_2, \ldots, X_N$, its autocorrelation function (ACF) $\rho_k$ is calculated as follows:
$$\rho_k = \frac{c_k}{c_0}$$
where
$$c_k = \frac{1}{N} \sum_{t=1}^{N-k} (X_t - \bar{X})(X_{t+k} - \bar{X}), \qquad k = 0, 1, 2, \ldots, K$$
Here $\bar{X}$ is the mean of $X$, $c_k$ is the autocovariance, and $k$ is the lag.
By solving the Yule-Walker equations
$$\begin{pmatrix}
1 & \rho_1 & \rho_2 & \cdots & \rho_{k-1} \\
\rho_1 & 1 & \rho_1 & \cdots & \rho_{k-2} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\rho_{k-1} & \rho_{k-2} & \rho_{k-3} & \cdots & 1
\end{pmatrix}
\begin{pmatrix} \phi_{k1} \\ \phi_{k2} \\ \vdots \\ \phi_{kk} \end{pmatrix}
=
\begin{pmatrix} \rho_1 \\ \rho_2 \\ \vdots \\ \rho_k \end{pmatrix}$$
the partial autocorrelation function (PACF) $\phi_{kk}$ is obtained.


By plotting $\rho_k$ and $\phi_{kk}$ against $k$ we get the ACF and PACF plots respectively. By analysing these plots, an experienced analyst can identify the p, d, q values of the ARIMA model. See figures 5.1, 5.2 and 5.3 for examples of ACF and PACF plots. The general rules for identification are as follows:

1. If the ACF dies out rapidly, it indicates that the series is stationary; hence there is no need for differencing, and d = 0.

2. If the ACF does not die out rapidly, the series is non-stationary and needs differencing, i.e. d > 0. The series is differenced until a rapidly dying ACF is obtained; d then equals the number of differences taken. The p and q values are then determined from the ACF and PACF of this differenced series using the further rules below.

3. If ACF decays exponentially and φkk for k = 0, 1, ...K are non-zero in PACF, then
it suggests p = K and q = 0. Different possible shapes of ACF and PACF for p = 2
are shown in fig A.1.

4. If PACF decays exponentially and ρk for k = 0, 1, ...K are non-zero in ACF, then it
suggests p = 0 and q = K. Different possible shapes of ACF and PACF for q = 2
are shown in fig A.2.

5. If the ACF and PACF are both mixtures of exponentials and damped sine waves, and the exponential decay starts after the K1-th and K2-th lag for the ACF and PACF respectively, this suggests p = K1 and q = K2. Different possible shapes of the ACF and PACF for p = 1 and q = 1 are shown in fig A.3.

For detailed reading, please refer to [11].
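As a concrete illustration (an assumed sketch, not part of the original experiments), the ACF and PACF above can be computed in MATLAB as follows:

    % X: the time series as a column vector; K: the maximum lag
    Xbar = mean(X);  N = length(X);  K = 20;
    c = zeros(K + 1, 1);
    for k = 0:K                               % autocovariances c_0 .. c_K
        c(k+1) = sum((X(1:N-k) - Xbar) .* (X(1+k:N) - Xbar)) / N;
    end
    rho = c / c(1);                           % rho_k = c_k / c_0

    % PACF phi_kk via the Yule-Walker equations: solve R * phi = r
    phikk = zeros(K, 1);
    for k = 1:K
        R = toeplitz(rho(1:k));               % autocorrelation matrix
        phi = R \ rho(2:k+1);                 % right-hand side [rho_1 ... rho_k]'
        phikk(k) = phi(k);                    % last coefficient is phi_kk
    end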


Figure A.1: Possible shapes for ACF and PACF when p = 2

Figure A.2: Possible shapes for ACF and PACF when q = 2



Figure A.3: Possible shapes for ACF and PACF when p = 1 and q = 1
Bibliography

[1] Wikipedia. Forecasting - wikipedia, the free encyclopedia. In http://en.wikipedia.org/wiki/Forecasting. [Online; accessed 20-June-2008].

[2] Bombay Stock Exchange Limited. In http://www.bseindia.com/.

[3] World Federation of Exchanges. World federation of exchanges (2007). In http://www.world-exchanges.org/publications/EQU1107.pdf. [Online; accessed 20-June-2008].

[4] National Stock Exchange Limited. In http://www.nseindia.com/.

[5] NASSCOM. National stock exchange. In http://www.nasscom.in/Nasscom/templates/NormalPage.aspx?id=28461. [Online; accessed 20-June-2008].

[6] Indian Express Newspapers (Mumbai) Ltd. Now, NSE 2nd fastest growing stock exchange. In http://www.expressindia.com/news/fullstory.php?newsid=91524. [Online; accessed 20-June-2008].

[7] George Edward Pelham Box and Gwilym M. Jenkins. Time Series Analysis: Fore-
casting and Control. Prentice Hall PTR, Upper Saddle River, NJ, USA, 1994.

[8] Investopedia A Forbes Media Company. In http://www.investopedia.com/university/. [Online; accessed 20-June-2008].

[9] Inc. StockCharts.com. Chartschool. In http://stockcharts.com/school/doku.php?id=chart%5fschool. [Online; accessed 20-June-2008].

[10] Wikipedia. Stock market - wikipedia, the free encyclopedia. In http://en.wikipedia.org/w/index.php?title=Stock%5fMarket&oldid=141913194. [Online; accessed 20-June-2008].

[11] G. E. P. Box, G. M. Jenkins, and G. C. Reinsel. Time Series Analysis, Forecasting, and Control. Prentice-Hall, Englewood Cliffs, New Jersey, third edition, 1994.

[12] J. Zurada. Introduction to artificial neural systems. ISBN 0-314-93391-3. West Publishing Co., St. Paul, MN, USA, 1992.

[13] Vladimir N. Vapnik. The nature of statistical learning theory. Springer-Verlag New
York, Inc., New York, NY, USA, 1995.

[14] Corinna Cortes and Vladimir Vapnik. Support-vector networks. Machine Learning,
20(3):273–297, 1995.

[15] Bernhard E. Boser, Isabelle Guyon, and Vladimir Vapnik. A training algorithm for
optimal margin classifiers. In Computational Learing Theory, pages 144–152, 1992.

[16] Harris Drucker, Chris J. C. Burges, Linda Kaufman, Alex Smola, and Vladimir
Vapnik. Support vector regression machines. In Michael C. Mozer, Michael I. Jordan,
and Thomas Petsche, editors, Advances in Neural Information Processing Systems,
volume 9, page 155. The MIT Press, 1997.

[17] Nello Cristianini and John Shawe-Taylor. An Introduction to Support Vector Ma-
chines and Other Kernel-based Learning Methods. Cambridge University Press,
March 2000.

[18] Alex J. Smola and Bernhard Schölkopf. A tutorial on support vector regression.
Statistics and Computing, 14(3):199–222, 2004.

[19] Lisa Bianchi, Jeffrey Jarrett, and R. Choudary Hanumara. Improving forecasting for
telemarketing centers by arima modeling with intervention. International Journal of
Forecasting, 14(4):497–504, Dec 1998.

[20] Nowrouz Kohzadi, Milton S. Boyd, Bahman Kermanshahi, and Iebeling Kaastra. A
comparison of artificial neural network and time series models for forecasting com-
modity prices. Neurocomputing, 10(2):169–181, 1996.
[21] H. F. Zou, G. P. Xia, F. T. Yang, and H. Y. Wang. An investigation and comparison of artificial neural network and time series models for chinese food grain price forecasting. Neurocomput., 70(16-18):2913–2923, 2007.

[22] B. Chen, M. Chang, and C. Lin. Load forecasting using support vector machines: A
study on eunite competition, 2001.

[23] Wei-Chiang Hong, Ping-Feng Pai, Shun-Lin Yang, and R. Theng. Highway traffic
forecasting by support vector regression model with tabu search algorithms. Neural
Networks, 2006. IJCNN ’06. International Joint Conference on, pages 1617–1621,
0-0 2006.

[24] Investopedia A Forbes Media Company. Efficient market hypothesis (emh). In http://www.investopedia.com/terms/e/efficientmarkethypothesis.asp. [Online; accessed 20-June-2008].

[25] Neeraj Mohan, Pankaj Jha, Arnab Kumar Laha, and Goutam Dutta. Artificial neural network models for forecasting stock price index in Bombay Stock Exchange. IIMA Working Papers 2005-10-01, Indian Institute of Management Ahmedabad, Research and Publication Department, October 2005.

[26] Wei Cheng, Lorry Wagner, and Chien-Hua Lin. Forecasting the 30-year U.S. Treasury bond with a system of neural networks.

[27] Chi-Cheong Chris Wong, Man-Chung Chan, and Chi-Chung Lam. Financial time
series forecasting by neural network using conjugate gradient learning algorithm and
multiple linear regression weight initialization. Computing in Economics and Finance
2000 61, Society for Computational Economics, July 2000.

[28] A. Atiya, N. Talaat, and S. Shaheen. An efficient stock market forecasting model using neural networks. In International Conference on Neural Networks, volume 4, pages 2112–2115, June 1997.

[29] Lijuan Cao and Francis E. H. Tay. Financial forecasting using support vector machines. Neural Computing & Applications, 10(2):184–192, May 2001.

[30] J. Corchado, C. Fyfe, and B. Lees. Unsupervised learning for financial forecasting. In Proceedings of the IEEE/IAFE/INFORMS Conference on Computational Intelligence for Financial Engineering (CIFEr), pages 259–263, March 1998.

[31] M. Kumar and M. Thenmozhi. Forecasting Nifty index futures returns using neural network and ARIMA models. Financial Engineering and Applications, 437, 2004.

[32] Wun-Hua Chen, Jen-Ying Shih, and Soushan Wu. Comparison of support-vector machines and back propagation neural networks in forecasting the six major Asian stock markets. International Journal of Electronic Finance, 1(1):49–67, January 2006.

[33] Chon Lung Chai. Finding kernel function for stock market prediction with support
vector regression. Technical report, Universiti Teknologi Malaysia, 2006.

[34] Theodore B. Trafalis and Huseyin Ince. Support vector machine for regression and applications to financial forecasting. In Proceedings of the International Joint Conference on Neural Networks (IJCNN), volume 6, 2000.

[35] Wikia. Indian stock markets. In http://indianstocks.wikia.com/wiki/Main%5fPage. [Online; accessed 20-June-2008].

[36] National Stock Exchange Limited. NSE - about us - facts & figures. In http://www.nseindia.com/content/us/us%5ffactsfigures.htm. [Online; accessed 20-June-2008].

[37] Wikipedia. BSE Sensex. In http://en.wikipedia.org/wiki/BSE%5fSensex%23cite%5fnote-0. [Online; accessed 20-June-2008].

[38] Reserve Bank of India. Handbook of statistics on Indian economy. In http://rbidocs.rbi.org.in/rdocs/Publications/PDFs/80283.pdf. [Online; accessed 20-June-2008].

[39] National Stock Exchange Limited. S&P CNX Nifty. In http://www.nseindia.com/content/indices/ind%5fnifty.htm. [Online; accessed 20-June-2008].

[40] National Stock Exchange Limited. CNX IT. In http://www.nseindia.com/content/indices/ind%5fcnxit.htm. [Online; accessed 20-June-2008].

[41] Yahoo Finance. In http://finance.yahoo.com/q/hp?s=%5EBSESN. [Online; accessed 20-June-2008].

[42] The MathWorks, Inc. In http://www.mathworks.com/.

[43] SAS Institute Inc. In http://www.sas.com/.

[44] M. T. Hagan and M. B. Menhaj. Training feedforward networks with the Marquardt algorithm. IEEE Transactions on Neural Networks, 5(6):989–993, November 1994.

[45] Martin Sewell. In http://www.cs.ucl.ac.uk/staff/M.Sewell/svmdark/.

[46] Thorsten Joachims. In http://svmlight.joachims.org/.

[47] T. Joachims. Making large-scale support vector machine learning practical. In B. Schölkopf, C. J. C. Burges, and A. J. Smola, editors, Advances in Kernel Methods: Support Vector Learning. MIT Press, 1999.

[48] S. Ali and K. A. Smith. Automatic parameter selection for polynomial kernel. In IEEE International Conference on Information Reuse and Integration (IRI 2003), pages 243–249, October 2003.

[49] Christopher M. Bishop. Neural Networks for Pattern Recognition. Oxford University
Press, November 1995.

[50] Neeraj Mohan, Pankaj Jha, Arnab Kumar Laha, and Goutam Dutta. Artificial neural network models for forecasting stock price index in Bombay Stock Exchange. IIMA Working Papers 2005-10-01, Indian Institute of Management Ahmedabad, Research and Publication Department, October 2005.

[51] Y. Yang and H. Zou. Combining time series models for forecasting, 2002.

[52] Abhishek Seth. On using a multitude of time series forecasting models. M.Tech. thesis, Kanwal Rekhi School of Information Technology, IIT Bombay, 2006.
Acknowledgements

First and foremost I would like to express my gratitude and appreciation to my advisor
Prof. Bernard L. Menezes. I am indebted to him for his guidance, encouragement,
support and trust in me. I strongly feel that I was privileged to interact with one of
the best advisors in the institute. From him, I learnt that no challenge is so big that it
couldn't be met with dedicated hard work.
I would like to thank Ambikeshwar P. Singh and Girish Joshi for being great partners
in sharing knowledge during this project.
Last, but certainly not least, I would like to thank my family. Without their
support and encouragement, this project wouldn’t have been possible.
I would also like to thank KReSIT and IIT Bombay for giving me the best moments
of my life so far.

Lahane Ashish Gajanan


KReSIT,
IIT Bombay
July 9, 2008
