Dissertation
Master of Technology
by
Lahane Ashish Gajanan
(Roll no. 05329R01)
This is to certify that Mr. Lahane Ashish Gajanan was admitted to the candidacy
of the M.Tech. Degree and has successfully completed all the courses required for the
M.Tech. Programme. The details of the course work done are given below.
M.Tech. Project
13. IT696 M.Tech. Project Stage - I (Jul 2007) 18
17. IT697 M.Tech. Project Stage - II (Jan 2008) 30
18. IT698 M.Tech. Project Stage - III (Jul 2008) 42
This study compares statistical models such as ARIMA with AI models such as Feed
Forward Neural Networks (FFNN) and Support Vector Regression (SVR) for long term,
one-day-ahead forecasting of financial indices. A number of forecasting experiments are
conducted on three major indices in the Indian stock market: BSE Sensex, CNX IT and
S&P CNX Nifty. The models are studied over various parameter settings: ARIMA(p, d, q)
for 0 ≤ p, d, q ≤ 2; FFNNs with one or two hidden layers and different transfer
functions; and SVR with polynomial and RBF kernels. Our experiments show that FFNN
models perform better in forecasting the value of the index, while ARIMA models perform
better in predicting the direction of movement of the index.
Contents
Abstract i
List of tables ix
1 Introduction 1
1.1 Problem Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Organisation Of The Report . . . . . . . . . . . . . . . . . . . . . . . . . . 3
2 Background 5
2.1 Forecasting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.1.1 Measures of forecasting accuracy . . . . . . . . . . . . . . . . . . . 6
2.2 Stock Market . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.2.1 Fundamental Analysis . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.2.2 Technical Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.3 AutoRegressive Integrated Moving Average(ARIMA) models . . . . . . . . 11
2.4 Artificial Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.4.1 Artificial Neuron . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.4.2 Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.5 Support Vector Machine(SVM) and Support Vector Regression(SVR) . . . 20
2.5.1 Support Vector Machine(SVM) . . . . . . . . . . . . . . . . . . . . 20
2.5.2 Support Vector Regression(SVR) . . . . . . . . . . . . . . . . . . . 24
3 Literature Survey 27
4 Experimental Setup 33
4.1 Data Collection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
4.2 Data Pre-processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
4.3 Model Estimation/Training . . . . . . . . . . . . . . . . . . . . . . . . . . 38
4.3.1 Model Estimation for ARIMA . . . . . . . . . . . . . . . . . . . . . 38
4.3.2 Training of FFNN . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
4.3.3 Training of SVR models . . . . . . . . . . . . . . . . . . . . . . . . 40
4.4 Forecasting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
4.5 Results Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
Bibliography 77
Acknowledgements 85
List of Figures
4.1 Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
4.2 BSE Sensex Plot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
4.3 S&P CNX Nifty Plot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
4.4 CNX IT Plot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
4.5 Data Partition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
4.6 Vector sets Preparation for FFNN and SVR . . . . . . . . . . . . . . . . . 38
4.7 Model estimation for ARIMA . . . . . . . . . . . . . . . . . . . . . . . . . 39
4.8 FFNN training with holdout validation . . . . . . . . . . . . . . . . . . . . 41
4.9 SVR training with holdout validation . . . . . . . . . . . . . . . . . . . . . 43
5.1 ACF and PACF plots for BSE Sensex with d = 0. ACF shows need for
differencing. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
5.2 ACF and PACF plots for BSE Sensex with d = 1. Model identified as
ARIMA(1,1,1). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
5.3 ACF and PACF plots for BSE Sensex with d = 2. Model identified as
ARIMA(0,2,2). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
5.4 minimum MAPE(L) Vs Window size for FFNNs . . . . . . . . . . . . . . . 65
5.5 max DS(L) Vs Window size for FFNNs . . . . . . . . . . . . . . . . . . . . 65
5.6 Forecasting performance of ARIMA, FFNN and SVR for MAPE(L) . . . . 66
5.7 Forecasting performance of ARIMA, FFNN and SVR for DS(L) . . . . . . 66
5.8 Forecasted values by ARIMA, FFNN and SVR for BSE Sensex . . . . . . . 67
List of Tables
4.1 Statistical details of BSE Sensex, S&P CNX Nifty and CNX IT . . . . . . 35
4.2 Vector sets preparation for FFNN and SVR . . . . . . . . . . . . . . . . . 37
4.3 FFNN parameter values tried in the experiments . . . . . . . . . . . . . . . 40
4.4 Optimisation using SVMdark . . . . . . . . . . . . . . . . . . . . . . . . . 42
4.5 Initial ranges of parameters in SVMdark optimisation . . . . . . . . . . . . 42
Chapter 1
Introduction
In the last few years, the Indian stock market has become a noteworthy market, not just in
Asia but in the world, due to the contributions of the NSE and the BSE. The Bombay Stock
Exchange Limited (popularly called The Bombay Stock Exchange, or BSE) is the oldest
stock exchange in Asia. It is also the biggest stock exchange in the world in terms
of listed companies, with 4,800 listed companies as of August 2007[2]. It is the largest
stock exchange in South Asia and the tenth largest in the world[3]. The BSE SENSEX
(SENSitive indEX), also called the "BSE 30", is a widely used market index in India
and Asia. Though many other exchanges exist, the BSE and the National Stock Exchange of
India account for most of the trading in shares in India.
The National Stock Exchange of India Limited (NSE)[4] is the largest stock exchange
in India in terms of daily turnover and number of trades, for both equities and derivative
trading[5]. The NSE is the third largest stock exchange in the world in terms of the number
of trades in equities[3]. It is the second fastest growing stock exchange in the world, with
a recorded growth of 16.6%[6]. The NSE's key index is the S&P CNX Nifty, known as the
Nifty. Nowadays, IT sector companies are attracting a great deal of investment interest
due to the IT boom in India. A number of large, profitable Indian companies today belong
to the IT sector. Hence the index CNX IT, which represents major IT sector companies
in India, has also gained great importance.
Thus the BSE Sensex, S&P CNX Nifty and CNX IT are the important indices in the
Indian stock market.
The stock market offers a strong incentive for forecasting securities and indices correctly.
Using the forecast values, buy and sell decisions can be made to earn short or long
term profits. Hence financial forecasting has been studied widely. Financial time series
are generally noisy, non-linear and non-seasonal, and predicting them is therefore a great
challenge.
Statistical models (ARIMA) and AI models (Feed Forward Neural Networks and Support
Vector Regression) find their application in the financial domain for forecasting stock
prices, indices, inflation rates, GDP, etc. A great deal of work has been, and is being,
done in this domain.
The literature suggests that FFNNs are better than ARIMA models for forecasting,
but a strong conclusion about SVR models in comparison with FFNN and ARIMA models
is difficult to draw. It was also found that for financial forecasting using SVR models,
RBF and polynomial kernel functions are applicable; hence in this study SVR with
polynomial and RBF kernels is used. Long term forecasting is also absent from the
literature. For daily stock values, long term forecasting means that once the model is
trained or estimated, it is used for forecasting the daily stock prices for several months
without retraining or re-estimating the model. Hence in this study long term, one-day-ahead
forecasts were used, and the effect of different window sizes on FFNN and SVR
models was also studied.
Chapter 2
Background
This chapter summarises the theory that is essential for understanding this project and
the project report. It includes basics of the stock market, forecasting, artificial Feed
Forward Neural Networks (FFNN), Support Vector Regression (SVR) and the statistical
forecasting models called Auto-Regressive Integrated Moving Average (ARIMA) models.
2.1 Forecasting
Forecasting is the process of estimation in unknown situations from historical data;
for example, forecasting weather, stock index values, commodity prices, etc. The categories
of forecasting methods are as follows:
Time series methods: Time series methods use historical data as the basis for estimating
future outcomes: given the time series x_t, x_{t-1}, x_{t-2}, …, predict x_{t+1}. The
methods are
• Exponential smoothing
• Extrapolation
• Linear prediction
• Trend estimation
• Growth curve
In ARIMA or ARMA, the time series is analysed to find the parameters of the ARIMA
or ARMA model respectively, and using these parameters future values are predicted [7].
Causal/econometric methods: Some forecasting methods use the assumption that it is
possible to identify the underlying factors that might influence the variable that is being
forecast. For example, sales of umbrellas might be associated with weather conditions. If
the causes are understood, projections of the influencing variables can be made and used
in the forecast.
That is, suppose x depends on variables y^1, y^2, …, y^n as x = f(y^1, y^2, …, y^n), and we
have available y^i_t, y^i_{t-1}, y^i_{t-2}, …, where i = 1, 2, …, n, and we have to find x_{t+1}.
Then the y^i_{t+1} are predicted using any other suitable methods, and x_{t+1} is calculated as
x_{t+1} = f(y^1_{t+1}, y^2_{t+1}, …, y^n_{t+1}). The methods are
• Econometrics
Another variant of forecasting is one in which x_{t+1} is predicted using x_t, x_{t-1}, …, plus
y^i_t, y^i_{t-1}, …. Neural networks are used very successfully in these kinds of forecasting.
The forecast error is the difference between the actual value and the forecast value for
the corresponding period: E_t = Y_t − F_t, where E_t is the forecast error at period t, Y_t is the
actual value at period t, and F_t is the forecast for period t.
Measures of aggregate error are as shown in figure 2.1.
Of these, MAPE and MAE are the most commonly used. MAPE and MAE measure
how close the forecast values are to the target values: the lower the MAPE and MAE
values, the better the forecaster. But they deal only with the absolute difference between
forecast values and target values; they do not take directional prediction into account.
For the estimate of directional accuracy, Directional Symmetry (DS) is used.
DS is generally used as an error measure in directional forecasting (e.g. predicting the
movement of an index). DS gives the directional prediction efficiency, i.e. how efficient
the forecaster is in predicting the direction of the series. It is good to have a high DS value
with a low MAPE. It is calculated as follows:

DS = (d_correct / d_total) × 100

where d_correct is the number of times the forecaster predicted the direction of the series
right, and d_total is the total number of predictions made.
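These error measures can be sketched in a few lines of Python. This is a hedged illustration: the function names are our own, and the DS variant below compares the forecast's change against the actual change, which is the common textbook form (definitions in the literature differ slightly).

```python
# Hedged sketch of the accuracy measures: MAE, MAPE and Directional
# Symmetry (DS) written from scratch for illustration.

def mae(actual, forecast):
    """Mean Absolute Error: mean of |Y_t - F_t|."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)

def mape(actual, forecast):
    """Mean Absolute Percentage Error, in percent."""
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)

def ds(actual, forecast):
    """Directional Symmetry: DS = (d_correct / d_total) * 100, counting a
    step correct when forecast and actual series move in the same direction."""
    d_correct = sum(
        1 for t in range(1, len(actual))
        if (actual[t] - actual[t - 1]) * (forecast[t] - forecast[t - 1]) > 0
    )
    d_total = len(actual) - 1
    return 100.0 * d_correct / d_total
```

A forecaster can have a low MAPE yet a poor DS (tracking the level while missing turning points), which is why the study reports both.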
When a company raises capital by issuing shares to the public, it is called "equity financing."
Issuing stock is advantageous for the company because it does not require the company to
pay back the money or make interest payments along the way. All that the shareholders
get in return for their money is the hope that the shares will some day be worth more in
the stock market; beyond that, they may also receive dividends.
What is a stock market? It is the market in which shares are issued and traded, either
through exchanges or over-the-counter markets. Also known as the equity market, it is
one of the most vital areas of a market economy as it provides companies with access to
capital and investors with a slice of ownership in the company and the potential of gains
based on the company’s future performance.
The stocks are listed and traded on stock exchanges which are entities (a corporation
or mutual organisation) specialised in the business of bringing buyers and sellers of stocks
and securities together. The stock market in the United States includes the trading of all
securities listed on the NYSE, the NASDAQ, the Amex, as well as on the many regional
exchanges, the OTCBB, and Pink Sheets. European examples of stock exchanges include
the Paris Bourse (now part of Euronext), the London Stock Exchange and the Deutsche
Börse.
An investor in stocks wants maximum returns on his investment, and for that he needs
to know which stocks will do well in the future, so that he can invest in them. This is
the basic incentive for forecasting stock prices. For this, he has to study different stocks:
their price history, and the performance and reputation of the issuing company, etc. This
is therefore a broad area of study, mainly divided into two categories:
• Fundamental Analysis
• Technical Analysis
Financial forecasting mainly comes under technical analysis, but fundamental factors can
also be incorporated to obtain better results. Let us look at both briefly.
Fundamental analysis evaluates securities by analysing macroeconomic factors (like the
overall economy and industry conditions) and individually specific factors (like the financial
condition and management of companies). This method of security analysis is considered
to be the opposite of technical analysis.
Financial statement analysis is the biggest part of fundamental analysis. Also known
as quantitative analysis, it involves looking at historical performance data to estimate the
future performance of stocks. Followers of quantitative analysis want as much data as
they can find on revenue, expenses, assets, liabilities and all the other financial aspects of
a company. Fundamental analysts look at this information to gain insight on a company’s
future performance. This doesn’t mean that they ignore the company’s stock price; they
just avoid focusing on it exclusively.
A financial statement consists of
• Information about the company in general - its history, products and line of business
• An in-depth discussion about the financial results and other factors within the business
• The complete set of financial statements (balance sheet, income statement, statement
of retained earnings, and cash flow statement)
The most important of these for fundamental analysis are the balance sheet, the income
statement and the cash flow statement, but completely ignoring the other parts is not
advisable.
Technical analysts do not attempt to measure a security's intrinsic value, but instead use
charts and other tools to identify patterns that can suggest future activity. Unlike
fundamental analysts, technical analysts do not care whether a stock is undervalued: the
only thing that matters is a security's past trading data and what information this data
can provide about where the security might move in the future.
The field of technical analysis is based on three assumptions:
Much of the criticism of technical analysis has its roots in academic theory - specifically
the efficient market hypothesis (EMH).
Efficient market hypothesis(EMH): This theory says that the market price is
always the correct one - any past trading information is already reflected in the price of
the stock and, therefore, any analysis to find undervalued securities is useless.
Technical analysts deal with charts, moving averages, information about traded volume,
and indicators and oscillators (such as the Accumulation/Distribution Line, Moving Average
Convergence Divergence (MACD), Average Directional Index, Aroon Indicator and
Oscillator, etc.) to determine the trends in price movements of a security (security is a
general term for stocks, bonds, etc.).
Chart: A price chart is a sequence of prices plotted over a specific time frame. In
statistical terms, charts are referred to as time series plots. On the chart, the y-axis
(vertical axis) represents the price scale and the x-axis (horizontal axis) represents the
time scale. Prices are plotted from left to right across the x-axis with the most recent
plot being the furthest right. For an example of a chart, see figure 4.2.
For further details please visit the sites given in [8, 9, 10].
where the AR part is φ(L) = 1 − Σ_{i=1}^{p} φ_i L^i, the MA part is
θ(L) = 1 + Σ_{j=1}^{q} θ_j L^j, and the I (difference) part is ∇ = (1 − L).
Here L is the lag operator, i.e. L^i X_t = X_{t−i}. The φ_i and θ_j are the model parameters,
which need to be found before applying the model for forecasting. ε_t is a white noise
process with zero mean and variance σ².
ϕi are the parameters of the autoregressive part of the model, θj are the parameters
of the moving average part and the εt are error terms. The error terms εt are gener-
ally assumed to be independent, identically distributed variables sampled from a normal
distribution with zero mean.
Forecasting using the ARIMA model: Forecasting X_t from X_{t−1}, X_{t−2}, …, X_2, X_1
using the ARIMA model (see equation 2.1) consists of the following steps:
3. Forecasting: Putting the model parameter values into equation 2.1, X_t and,
iteratively, any further future values of the series are calculated.
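The iterative forecasting step can be sketched for the ARIMA(1,1,1) case. Expanding the model with p = d = q = 1 gives the recursion X_t = (1 + φ)X_{t−1} − φX_{t−2} + ε_t + θε_{t−1}; this is a hedged illustration, and the parameter values used in any example run are made up, not estimates from this study.

```python
# Hedged sketch of step 3 for an ARIMA(1,1,1) model, once phi and theta
# have been estimated. Future shocks eps_t are taken at their expected
# value of zero.

def arima_111_forecast(series, phi, theta, last_eps, steps):
    """Iterate X_t = (1 + phi)*X_{t-1} - phi*X_{t-2} + eps_t + theta*eps_{t-1}
    forward, with all future eps_t set to 0."""
    x = list(series)          # needs at least the last two observed values
    eps = last_eps            # most recent residual from the estimation stage
    out = []
    for _ in range(steps):
        nxt = (1 + phi) * x[-1] - phi * x[-2] + theta * eps
        out.append(nxt)
        x.append(nxt)
        eps = 0.0             # unknown future shocks: expected value 0
    return out
```

With φ = θ = 0 the recursion reduces to a random walk, repeating the last observed value, which is a useful sanity check.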
An artificial neuron, also called semi-linear unit, Nv neuron, binary neuron or McCulloch-
Pitts neuron, is an abstraction of biological neurons and the basic unit in an artificial
neural network. The Artificial Neuron receives one or more inputs (representing the one
or more dendrites) and sums them to produce an output (synapse). Usually the sums of
each node are weighted, and the sum is passed through a non-linear function known as
an activation or transfer function. It is represented by

y = φ( Σ_i w_i x_i )

where the x_i are the inputs and the w_i the corresponding weights.
y = 1 if u ≥ θ and y = 0 if u < θ

A neuron with the step function as its transfer function is also called a perceptron.
Sigmoid:

y = 1 / (1 + e^{−u})

It is shown in figure 2.4.
Tanh:
y = tanh(u)
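The transfer functions above can be sketched as a single artificial neuron in Python. This is a hedged illustration: the function name and interface are our own, and theta is the step-function threshold.

```python
import math

# Hedged sketch of an artificial neuron: the weighted input sum u is passed
# through one of the transfer functions described above.

def neuron(weights, inputs, transfer, theta=0.0):
    u = sum(w * x for w, x in zip(weights, inputs))
    if transfer == "step":          # perceptron: 1 if u >= theta else 0
        return 1.0 if u >= theta else 0.0
    if transfer == "sigmoid":       # y = 1 / (1 + e^{-u})
        return 1.0 / (1.0 + math.exp(-u))
    if transfer == "tanh":          # y = tanh(u)
        return math.tanh(u)
    raise ValueError("unknown transfer function: " + transfer)
```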
• Data processing, including filtering, clustering, blind signal separation and compression
An FFNN looks as shown in figure 2.6. Feed-forward networks have the following
characteristics:
• Neurons are arranged in layers, with the first layer taking in inputs and the last
layer producing outputs. The middle layers have no connection with the external
world, and hence are called hidden layers.
• Each neuron in one layer is connected to every neuron in the next layer. Information
is thus constantly "fed forward" from one layer to the next, which explains why these
networks are called feed-forward networks.
When applying FFNNs to any area, the following are the parameters one needs to
consider and tune through experimentation:
Number of hidden layers: In practice, neural networks with one and occasionally
two hidden layers are widely used and have performed very well. Increasing the number of
hidden layers also increases computation time and the danger of over fitting which leads
to poor out-of-sample forecasting performance. Over fitting occurs when a forecasting
model has too few degrees of freedom. In other words, it has relatively few observations in
relation to its parameters and therefore it is able to memorise individual points rather than
learn the general patterns. The number of weights also should not be too large. The greater
the number of weights relative to the size of the training set, the greater the ability of the
network to memorise idiosyncrasies of individual observations. As a result, generalisation
for the validation set is lost and the model is of little use in actual forecasting.
Number of hidden neurons: Despite its importance, there is no "magic" formula
for selecting the optimum number of hidden neurons; therefore, researchers fall back on
experimentation. A few rules of thumb from the literature are as follows:
• For a three-layer network with n input neurons and m output neurons, the hidden
layer would have sqrt(nm) neurons.
• The number of hidden neurons in a three-layer neural network should be 75% of the
number of input neurons.
• Optimal number of hidden neurons will generally be found between one-half to three
times the number of input neurons.
• Double the number of hidden neurons until the network’s performance on the testing
set deteriorates.
• There should be at least five times as many training facts as weights, which sets an
upper limit on the number of input and hidden neurons.
Regardless of the method used to select the range of hidden neurons to be tested, the
rule is to always select the network that performs best on the testing set with the least
number of hidden neurons.
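As a small illustration, the first rules of thumb above can be computed directly. This is a hedged sketch with our own function name; as the text stresses, these numbers are only starting points for experimentation, not prescriptions.

```python
import math

# Hedged sketch: candidate hidden-layer sizes from the rules of thumb
# above, for a three-layer network with n input and m output neurons.

def hidden_size_candidates(n, m):
    return {
        "sqrt_rule": round(math.sqrt(n * m)),           # sqrt(n*m) neurons
        "three_quarters_rule": round(0.75 * n),         # 75% of input neurons
        "half_to_three_times": (max(1, n // 2), 3 * n)  # search range to try
    }
```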
Number of output neurons: Generally the answer is one, but one more neuron can
also be employed to predict the direction of price movement.
Transfer functions: The majority of current neural network models use the sigmoid
(S-shaped) function, but others such as the hyperbolic tangent, step, ramping, arc tan, and
linear have also been proposed. The purpose of the transfer function is to prevent outputs
from reaching very large values which can paralyse neural networks and thereby inhibit
training. Linear transfer functions are not useful for nonlinear mapping and classification.
It has been found that financial markets are nonlinear and have memory suggesting that
nonlinear transfer functions are more appropriate. Transfer functions such as the sigmoid
are commonly used for time series data because they are nonlinear and continuously
differentiable which are desirable properties for network learning.
Figure 2.7 shows an FFNN of neurons for the XOR function which, as said earlier, a single
neuron is unable to learn. So how is an FFNN made to learn? In other words, how are all
the weights for a particular function found? This is done by backpropagation with the
gradient descent algorithm.
2. For each input X^p = ⟨x^p_n, x^p_{n−1}, …, x^p_0⟩, where p iterates over all input patterns,
modify the error and weights as follows:

E = (1/2)(t^p − o^p)² + E

Δw^p_{ji} = η δ^p_j o^p_i

where, for a hidden layer,

δ^p_j = ( Σ_{k ∈ next layer} w_{kj} δ^p_k ) o^p_j (1 − o^p_j)

Here i, j and k iterate over the neurons in the previous, current and next layer
respectively. η is the learning factor, a tunable parameter. The target output is t and
the observed output is o. w_{ji} denotes the weight of the connection from the i-th neuron
in the previous layer to the j-th neuron in the next layer.
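The update rules above can be sketched as a small backpropagation loop for the XOR example. This is a hedged illustration: the 2-2-1 network shape, learning factor, epoch count and seed are arbitrary choices for demonstration, not values taken from this report.

```python
import math
import random

# Hedged sketch: backpropagation with gradient descent for a 2-2-1 sigmoid
# network on XOR.

def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))

def train_xor(eta=0.5, epochs=5000, seed=1):
    rng = random.Random(seed)
    # w1[j] = weights of hidden neuron j: [w_j0, w_j1, bias]
    w1 = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(2)]
    # w2 = weights of the output neuron: [v_0, v_1, bias]
    w2 = [rng.uniform(-1, 1) for _ in range(3)]
    data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

    def forward(x):
        h = [sigmoid(wj[0] * x[0] + wj[1] * x[1] + wj[2]) for wj in w1]
        o = sigmoid(w2[0] * h[0] + w2[1] * h[1] + w2[2])
        return h, o

    def total_error():
        # E = sum over patterns of (1/2)(t^p - o^p)^2
        return sum(0.5 * (t - forward(x)[1]) ** 2 for x, t in data)

    before = total_error()
    for _ in range(epochs):
        for x, t in data:
            h, o = forward(x)
            d_o = (t - o) * o * (1 - o)                       # output delta
            # hidden deltas: (w_kj * delta_k) * o_j * (1 - o_j)
            d_h = [w2[j] * d_o * h[j] * (1 - h[j]) for j in range(2)]
            # weight updates: delta_w_ji = eta * delta_j * o_i
            for j in range(2):
                w2[j] += eta * d_o * h[j]
            w2[2] += eta * d_o                                # output bias
            for j in range(2):
                w1[j][0] += eta * d_h[j] * x[0]
                w1[j][1] += eta * d_h[j] * x[1]
                w1[j][2] += eta * d_h[j]                      # hidden bias
    return before, total_error()

error_before, error_after = train_xor()
```

Note that the hidden deltas are computed before the output weights are updated, so each pattern's updates use a consistent set of weights.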
The basic principle of the SVM is: given a set of points which need to be classified into two
classes, find a separating hyperplane which maximises the margin between the two classes.
This ensures better classification of unseen points, i.e. better generalisation.
See figure 2.8, which shows three hyperplanes: H3 does not separate the classes at all; H1
separates them, but not with the maximal margin; H2 is the maximal-margin separating
hyperplane.
Formalisation: We are given some training data, a set of points of the form (x_i, c_i),
i = 1, …, n, where c_i is either 1 or −1, indicating the class to which the point x_i belongs.
Each x_i is a p-dimensional real vector. We want to find the maximum-margin hyperplane
which divides the points having c_i = 1 from those having c_i = −1. Any hyperplane can be
written as the set of points x satisfying
w · x − b = 0.
The vector w is a normal vector: it is perpendicular to the hyperplane (see figure 2.9).
The parameter b/||w|| determines the offset of the hyperplane from the origin along the
normal vector w.
We want to choose the w and b to maximise the margin, or distance between the
parallel hyperplanes that are as far apart as possible while still separating the data.
These hyperplanes can be described by the equations
w · x − b = 1 and
w · x − b = −1
Note that if the training data are linearly separable, we can select the two hyperplanes
of the margin so that there are no points between them, and then try to maximise
their distance. By geometry, the distance between these two hyperplanes is 2/||w||, so
we want to minimise ||w||. As we also have to prevent data points from falling into the
margin, we add the following constraint:

c_i(w · x_i − b) ≥ 1, ∀i : 1 ≤ i ≤ n. (2.2)
Primal form: The optimisation problem presented above is hard to solve because it
depends on ||w||, the norm of w, which involves a square root. Fortunately it is possible
to alter the problem by substituting ||w|| with (1/2)||w||² without changing the solution
(the minimum of the original and the modified problem have the same w and b). This is
a quadratic programming (QP) optimisation problem. More clearly,

minimize: (1/2)||w||²,
subject to: c_i(w · x_i − b) ≥ 1, ∀i : 1 ≤ i ≤ n. (2.4)

The factor of 1/2 is used for mathematical convenience. This problem can now be solved
by standard quadratic programming techniques and programs.
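The 2/||w|| margin width from the geometry above can be checked numerically. This is a hedged sketch with our own helper names: one function computes the claimed width, the other measures the actual distance between the two margin hyperplanes along the normal direction.

```python
import math

# Hedged numeric check: the hyperplanes w.x - b = +1 and w.x - b = -1
# lie a distance 2/||w|| apart.

def margin_width(w):
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))

def hyperplane_gap(w, b):
    """Distance between the two margin hyperplanes, measured between the
    points where each hyperplane crosses the normal direction w."""
    norm2 = sum(wi * wi for wi in w)
    x_plus = [(b + 1) * wi / norm2 for wi in w]    # satisfies w.x - b = +1
    x_minus = [(b - 1) * wi / norm2 for wi in w]   # satisfies w.x - b = -1
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(x_plus, x_minus)))
```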
Dual form: Writing the classification rule in its unconstrained dual form reveals that
the maximum margin hyperplane, and therefore the classification task, is only a function
of the support vectors: the training data that lie on the margin. The dual of the SVM
can be shown to be:

maximize: Σ_{i=1}^{n} α_i − (1/2) Σ_{i,j} α_i α_j c_i c_j x_i^T x_j
subject to: α_i ≥ 0 and Σ_{i=1}^{n} α_i c_i = 0 (2.5)

where the α terms constitute a dual representation for the weight vector in terms of the
training set:

w = Σ_i α_i c_i x_i
Soft margin: In 1995, Corinna Cortes and Vladimir Vapnik suggested a modified
maximum margin idea that allows for mislabelled examples[14]. If there exists no
hyperplane that can split the "yes" and "no" examples, the soft margin method will choose
a hyperplane that splits the examples as cleanly as possible, while still maximising the
distance to the nearest cleanly split examples. This work popularised the expression
Support Vector Machine or SVM. The method introduces slack variables ξ_i, which measure
the degree of misclassification of the datum x_i:

c_i(w · x_i − b) ≥ 1 − ξ_i, ∀i : 1 ≤ i ≤ n (2.6)

The objective function is then increased by a function which penalises non-zero ξ_i, and
the optimisation becomes a trade-off between a large margin and a small error penalty.
If the penalty function is linear, the problem becomes:

minimize: (1/2)||w||² + C Σ_i ξ_i,
subject to: c_i(w · x_i − b) ≥ 1 − ξ_i, ∀i : 1 ≤ i ≤ n. (2.7)
This constraint in equation 2.7, along with the objective of minimising ||w||, can be solved
using Lagrange multipliers. The key advantage of a linear penalty function is that the
slack variables vanish from the dual problem, with the constant C appearing only as
an additional constraint on the Lagrange multipliers. Non-linear penalty functions have
been used, particularly to reduce the effect of outliers on the classifier, but unless care is
taken, the problem becomes non-convex, and thus it is considerably more difficult to find
a global solution.
Non-linear classification: The classifier discussed above is a linear classifier. However,
in 1992, Bernhard Boser, Isabelle Guyon and Vapnik suggested a way to create non-linear
classifiers by applying the kernel trick to maximum-margin hyperplanes[15]. The resulting
algorithm is formally similar, except that every dot product is replaced by a non-linear
kernel function. This allows the algorithm to fit the maximum-margin hyperplane in the
transformed feature space. The transformation may be non-linear and the transformed
space high dimensional; thus, though the classifier is a hyperplane (linear) in the
high-dimensional feature space, it may be non-linear in the original input space. Figure 2.10
shows an example of such a transformation.
If the kernel used is a Gaussian radial basis function, the corresponding feature space
is a Hilbert space of infinite dimension. Maximum margin classifiers are well regularized,
so the infinite dimension does not spoil the results. Some common kernels include:
• Sigmoid: k(x, x′) = tanh(κx · x′ + c), for some (not every) κ > 0 and c < 0
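The polynomial and RBF kernels used later in this study, together with the sigmoid kernel quoted above, can be sketched as plain functions. This is a hedged illustration; the parameter defaults are arbitrary choices, not the values used in the experiments.

```python
import math

# Hedged sketch of common kernel functions: each replaces the dot product
# x . x' in the dual formulation.

def poly_kernel(x, y, degree=2, c=1.0):
    """Polynomial kernel: (x . y + c)^degree."""
    return (sum(a * b for a, b in zip(x, y)) + c) ** degree

def rbf_kernel(x, y, gamma=0.5):
    """Gaussian RBF kernel: exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def sigmoid_kernel(x, y, kappa=1.0, c=-1.0):
    """Sigmoid kernel: tanh(kappa * x . y + c)."""
    return math.tanh(kappa * sum(a * b for a, b in zip(x, y)) + c)
```

Note that the RBF kernel of any point with itself is exactly 1, since ||x − x||² = 0.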
A version of an SVM for regression was proposed in 1996 by Vladimir Vapnik, Harris
Drucker, Chris Burges, Linda Kaufman and Alex Smola[16]. This method is called
support vector regression (SVR). The model produced by support vector classification (as
described above) only depends on a subset of the training data, because the cost function
for building the model does not care about training points that lie beyond the margin.
Analogously, the model produced by SVR only depends on a subset of the training data,
because the cost function for building the model ignores any training data that are close
(within a threshold ε) to the model prediction.
The basic idea: Suppose we are given training data {(x_1, y_1), …, (x_l, y_l)} ⊂ X × R.
These might be, for instance, exchange rates for some currency measured at subsequent
days together with corresponding econometric indicators. In SVR, our goal is to find a
function f (x) that has at most ε deviation from the actually obtained targets yi for all
the training data, and at the same time is as flat as possible. In other words, we do not
care about errors as long as they are less than ε, but will not accept any deviation larger
than this. This may be important if you want to be sure not to lose more than ε money
when dealing with exchange rates, for instance.
For pedagogical reasons, we begin by describing the case of linear functions f , taking
the form
f (x) = hw, xi + b with w ∈ X, b ∈ R (2.8)
where h·, ·i denotes the dot product in X. Flatness in the case of equation 2.8 means that
one seeks a small w. One way to ensure this is to minimize the norm, i.e. ||w||2 = hw, wi.
We can write this problem as a convex optimisation problem:

minimize: (1/2)||w||²
subject to: y_i − ⟨w, x_i⟩ − b ≤ ε
            ⟨w, x_i⟩ + b − y_i ≤ ε (2.9)
Soft margin: The tacit assumption in equation 2.9 was that such a function f actually
exists that approximates all pairs (x_i, y_i) with ε precision, or in other words, that the
convex optimisation problem is feasible. Sometimes, however, this may not be the case,
or we may also want to allow for some errors. Analogously to the "soft margin" loss
function which was used in SVMs by Cortes and Vapnik (1995)[14], one can introduce
slack variables ξ_i, ξ_i* to cope with otherwise infeasible constraints of the optimisation
problem in equation 2.9. Hence we arrive at the formulation:

minimize: (1/2)||w||² + C Σ_{i=1}^{l} (ξ_i + ξ_i*)
subject to: y_i − ⟨w, x_i⟩ − b ≤ ε + ξ_i
            ⟨w, x_i⟩ + b − y_i ≤ ε + ξ_i*
            ξ_i, ξ_i* ≥ 0 (2.10)

The constant C > 0 determines the trade-off between the flatness of f and the amount
up to which deviations larger than ε are tolerated. This corresponds to dealing with a
so-called ε-insensitive loss function.
Figure 2.11 depicts the situation graphically. Only the points outside the shaded region
contribute to the cost, insofar as the deviations are penalised in a linear fashion. It turns
out that in most cases the optimisation problem in equation 2.10 can be solved more easily
in its dual formulation. Moreover, the dual formulation provides the key for extending
the SV machine to nonlinear functions.
Non-linear regression: For non-linear regression the kernel functions discussed in
section 2.5.1 are applicable in SVR too. In that case dot products in equation 2.10 would
be replaced by the kernel function.
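The slack terms in equation 2.10 correspond to the ε-insensitive loss, which can be sketched directly. This is a hedged illustration with our own function name: deviations inside the ε tube cost nothing, while outside it the cost grows linearly.

```python
# Hedged sketch of the epsilon-insensitive loss implied by the slack terms
# C * (xi + xi*) in equation 2.10.

def eps_insensitive_loss(y_true, y_pred, eps=0.1):
    deviation = abs(y_true - y_pred)
    return max(0.0, deviation - eps)   # zero inside the eps tube
```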
For more details on SVM and SVR, please refer to [17, 18].
Chapter 3
Literature Survey
ARIMA, FFNN and SVR models all find application in time series forecasting. A few examples follow:
• Kohzadi, Boyd, Kermanshahi and Kaastra[20] compared NNs with ARIMA in 1996. The data used was monthly live cattle and wheat prices from 1950 through 1990. The neural network models achieved 27% and 56% lower mean squared error than the ARIMA model; the mean absolute error and mean absolute percent error were also lower for the neural network models. The neural network models were able to capture a significant number of turning points for both wheat and cattle, while the ARIMA model was only able to do so for wheat. They used mean squared error (MSE), mean absolute error (MAE) and mean absolute percent error (MAPE), as previously described, along with a measure of the ability to forecast turning points, to compare the two methods. The paper gives a very detailed comparison, but does not detail the specifics of the NN and ARIMA models used.
• H. F. Zou, G. P. Xia, F. T. Yang and H. Y. Wang[21] in 2007 used FFNN and time series models for Chinese food grain price forecasting. The FFNN model was found to be the best.
• SVR was used for load forecasting by B. Chen, M. Chang and C. Lin in 2001[22], and by Wei-Chiang Hong, Ping-Feng Pai, Shun-Lin Yang and R. Theng[23] in 2006.
Financial time series are non-linear and among the noisiest and most difficult signals to forecast. This has led many economists to adopt the efficient market hypothesis[24]. This hypothesis states that price changes are independent of the past and follow a random walk, and therefore that price changes are unpredictable. Although there is strong evidence in the literature against the efficient market hypothesis, it still remains a formidable task to develop an accurate and robust financial market forecasting system.
FFNN, SVR and ARIMA models have been, and continue to be, widely studied in financial forecasting as well. For example:
• The modelling of Indian stock market (price index) data using NNs was done by Mohan Neeraj, Jha Pankaj, Laha Arnab Kumar and Dutta Goutam[25] in 2005. They studied the efficacy of NNs in modelling the Bombay Stock Exchange (BSE) SENSEX weekly closing values. The paper gives minute detail about the constructive NN model building performed, such as the inputs used, the transfer functions used, and how the number of layers and neurons per layer were varied; it also compares the NN models of the different configurations employed.
• In financial forecasting using FFNN, technical indicators were used along with the original series, as reported in [26, 27], which improved performance. [28] also reports that adding fundamental information about the companies improved the results.
• Another study compared perceptrons, a radial basis function network and a standard ARIMA model; the Dow Jones Index was the subject of forecasting.
• In 2004, forecasting of NIFTY stock index futures returns was carried out using backpropagation and recurrent neural network models and a linear ARIMA model by M. Kumar and M. Thenmozhi[31]. A comparison of the different models shows that for NIFTY index futures returns the backpropagation neural network model outperforms the recurrent neural network and the traditional ARIMA models. Moreover, recurrent neural network models outperform the traditional ARIMA models. A 3-2-1 neural network architecture was the best fit for forecasting NIFTY futures returns.
• In [32], by Wun-Hua Chen, Jen-Ying Shih and Soushan Wu (2006), an AR(1) model, support vector machines and backpropagation neural networks were used for forecasting six major Asian stock markets. FFNN with BP and SVM outperformed AR(1) in MSE and MAE, but AR(1) was better in directional symmetry (see section 2.1.1 for definitions of these terms). The performances of FFNN and SVM were only marginally different.
• SVR was studied in detail for forecasting four major indices on the Kuala Lumpur Stock Exchange by Chai Chon Lung in 2006[33]. SVR with polynomial and RBF kernels was compared, concluding that the polynomial kernel performs better than RBF. FFNN with BP was also compared with SVR; SVR with the polynomial kernel was found to be superior to FFNN too.
• In 2000, FFNNs were compared against SVR for forecasting AOL, YAHOO and IBM stock prices by Theodore B. Trafalis and Huseyin Ince[34]. Different SVR and FFNN models were applied. For AOL, the best MSE values for SVM and FFNN were 2.853779 and 2.3512 respectively; for IBM they were 2.772304 and 2.8020 respectively; and for YAHOO they were 20.81991 and 10.6003 respectively. Thus FFNNs came out the winners for AOL and YAHOO, by a great margin in the case of YAHOO, while SVRs beat FFNNs for IBM by a small margin.
All this literature suggests that FFNNs are better than ARIMA models at financial forecasting, but a strong conclusion about SVR models in comparison with FFNN and ARIMA models is difficult to draw. It was also found that for financial forecasting using SVR models, RBF and polynomial kernel functions are applicable. Hence in this study, ARIMA, FFNN and SVR (with RBF and polynomial kernels) models were compared for forecasting performance on three main Indian stock market indices.
Long term forecasting is also absent from the literature. For daily stock values, long term forecasting means that once the model is trained or estimated, it is used to forecast the daily stock prices for several months without retraining or re-estimating the model.
There are many stock markets in India; the most prominent of them are the BSE and NSE, which between them are responsible for the vast majority of share transactions[35]. Their brief details are as follows:
• BSE: The Bombay Stock Exchange Limited (popularly called the Bombay Stock Exchange, or BSE) is the oldest stock exchange in Asia. It is also the biggest stock exchange in the world in terms of listed companies, with 4,800 listed companies as of August 2007[2], and it has a significant trading volume. On 31 December 2007, the equity market capitalization of the companies listed on the BSE was US$ 1.79 trillion, making it the largest stock exchange in South Asia and the tenth largest in the world[3]. The BSE SENSEX (SENSitive indEX), also called the "BSE 30", is a widely used market index in India and Asia. Though many other exchanges exist, the BSE and the National Stock Exchange of India account for most of the trading in shares in India.
• NSE: The National Stock Exchange of India Limited (NSE)[4] is the largest stock
exchange in India in terms of daily turnover and number of trades, for both equities
and derivative trading[5]. As of 2006, the NSE VSAT terminals, 2799 in total,
cover more than 1500 cities across India[36]. In October 2007, the equity market
capitalization of the companies listed on the NSE was US$ 1.46 trillion, making
it the second largest stock exchange in South Asia. NSE is the third largest Stock
Exchange in the world in terms of the number of trades in equities[3]. It is the second
fastest growing stock exchange in the world with a recorded growth of 16.6%[6]. The
NSE’s key index is the S&P CNX Nifty, known as the Nifty, an index of fifty major
stocks weighted by market capitalisation. The Information Technology (IT) industry has played a major role in the Indian economy during the last few years; hence the CNX IT index has also gained considerable importance this decade.
From these details it is clear that the Indian stock market has become a notable market, not just in Asia but in the world, owing to the contributions of the NSE and BSE, and that BSE Sensex, S&P CNX Nifty and CNX IT are the important indices in the Indian stock market. Brief descriptions of these indices follow:
• BSE Sensex[37]: The BSE Sensex, or Bombay Stock Exchange Sensitive Index, is a value-weighted index composed of 30 stocks. It consists of the 30 largest and most actively traded stocks on the Bombay Stock Exchange, representative of various sectors. These companies account for around one-fifth of the market capitalization of the BSE. The base value of the Sensex is 100 on April 1, 1979, and the base year of the BSE-SENSEX is 1978-79. The index has increased by over ten times from June 1990 to the present. Using information from April 1979 onwards, the long-run rate of return on the BSE Sensex works out to be 18.6% per annum, which translates to roughly 9% per annum after compensating for inflation[38].
• S&P CNX Nifty[39]: The S&P CNX Nifty, nicknamed the Nifty 50 or simply the Nifty, is the leading index for large companies on the National Stock Exchange of India. The Nifty is a well diversified 50-stock index accounting for 21 sectors of the economy. It is used for a variety of purposes, such as benchmarking fund portfolios, index-based derivatives and index funds. The traded value over the last six months of all Nifty stocks is approximately 44.89% of the traded value of all stocks on the NSE. Nifty stocks represent about 58.64% of the total market capitalization as on March 31, 2008. The impact cost of the S&P CNX Nifty for a portfolio size of Rs. 2 crore is 0.15%.
• CNX IT[40]: A number of large, profitable Indian companies today belong to the
IT sector and a great deal of investment interest is now focused on the IT sector.
In order to have a good benchmark of the Indian IT sector, IISL has developed
the CNX IT sector index. CNX IT provides investors and market intermediaries
with an appropriate benchmark that captures the performance of the IT segment
of the market. Companies in this index are those that have more than 50% of their
turnover from IT-related activities like software development, hardware manufacture, vending, support and maintenance. The average total traded value for the last six months of CNX IT Index stocks is approximately 91% of the traded value
of the IT sector. CNX IT Index stocks represent about 96% of the total market
capitalization of the IT sector as on March 31, 2005. The average total traded value
for the last six months of all CNX IT Index constituents is approximately 14% of
the traded value of all stocks on the NSE. CNX IT Index constituents represent
about 14% of the total market capitalization as on March 31, 2005.
Chapter 4
Experimental Setup
This chapter gives details about the experimental setup and the steps followed in the experiments. Figure 4.1 shows a flowchart representation of the overall methodology followed in the experiments. Details of each step and of the experimental setup are discussed below.
4.1 Data Collection
• BSE Sensex: Historical quotes for the BSE Sensex index starting from 1st July 1997 are available at [41]. From this site, historical quotes for the period 1st July 1997 to 27th December 2007 for the BSE Sensex index were downloaded. The data consists of open, high, low, close, volume and adjusted close values for each stock market working day, and has 2474 rows, each containing one set of the above-mentioned values. The duration is long enough to model the prices accurately, and it contains a 'boom-bust' cycle. This is important, since otherwise the ANNs would not be properly trained to handle a 'boom-bust' cycle if one happens in future.
• S&P CNX Nifty: From [4], historical quotes for the period 1st July 1990 to 14th March 2008 for the S&P CNX Nifty index were downloaded. The data consists of open, high, low, close, volume and adjusted close values for each stock market working day, and has 4220 rows, each containing one set of the above-mentioned values.
• CNX IT: From [4], historical quotes for the period 1st January 1996 to 14th March 2008 for the CNX IT index were downloaded. The data consists of open, high, low, close, volume and adjusted close values for each stock market working day, and has 3037 rows, each containing one set of the above-mentioned values.
Table 4.1: Statistical details of BSE Sensex, S&P CNX Nifty and CNX IT
Figures 4.2, 4.3 and 4.4 show 2-D plots of BSE Sensex, S&P CNX Nifty and CNX IT respectively. Their statistical details are shown in table 4.1.
4.2 Data Pre-processing
• Latest Data (L): This set consisted of the 200 latest index values (nearly 11 months of closing values). This data was used for testing the long-run forecasting performance of the models after estimation/training. This testing is different from the testing used while training the models. 'L' enclosed in brackets will be used henceforth to indicate the values of error measures corresponding to 'Latest Data'.
• Historical Data (H): This set consisted of the remaining trailing index values. This data was used for training the neural networks and SVRs (training here also includes validation and testing; this testing differs from the one mentioned above, being performed during estimation of the weights in the SVR and FFNN models), and in the case of ARIMA it was used for model estimation. 'H' enclosed in brackets will be used henceforth to indicate the values of error measures corresponding to 'Historical Data'.
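The historical/latest split described above can be sketched as follows (an illustrative Python fragment; the experiments themselves used MATLAB and SAS):

```python
def split_latest(series, n_latest=200):
    """Split a series into historical data H (used for model training or
    estimation) and the n_latest most recent points L (kept aside for
    testing long-run forecasting performance)."""
    return series[:-n_latest], series[-n_latest:]

# Example with a dummy series of 1000 values
H, L = split_latest(list(range(1000)))  # H: first 800 points, L: last 200
```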
Model estimation was performed using the historical data, and the latest data was then used to test long-run forecasting performance. To use the historical or latest data with FFNN and SVR, it needed to be further processed into vector sets. For both FFNN and SVR, the data needs to be in the format of a set of input-output vectors, i.e. each vector contains the input for the model and the output that the model should produce. This is the basic requirement of supervised learning.
INPUT                              OUTPUT
x1, x2, ..., xw                    xw+1
x2, x3, ..., xw+1                  xw+2
...                                ...
xn−w, xn−w+1, ..., xn−1            xn
Let x1, x2, ..., xn represent the scaled value set H or L, and let w be the window size. The window size determines how many trailing data points are used to forecast the current value. In the case of FFNN, the window size is equal to the number of input layer neurons. After processing, the vector sets look as shown in table 4.2; each row in the table represents one vector. As only one-day-ahead forecasting is to be performed, the 'OUTPUT' column of the table contains only one data value for each row (vector). The process is shown in figure 4.6.
MATLAB[42] was used for this step.
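The windowing step of table 4.2 can be sketched as follows (a Python illustration; the thesis used MATLAB for this step):

```python
import numpy as np

def make_windows(series, w):
    """Build the input-output vector set of table 4.2: each input is a
    window of w trailing values, and the output is the next value."""
    X = np.array([series[i:i + w] for i in range(len(series) - w)])
    y = np.array(series[w:])
    return X, y

# Example: n = 6 scaled values, window size w = 3
X, y = make_windows([1, 2, 3, 4, 5, 6], 3)
# X rows: [1 2 3], [2 3 4], [3 4 5]; y: [4 5 6]
```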
4.3 Model Estimation/Training
The pictorial representation of these steps is shown in figure 4.7. SAS[43] software was used to perform these steps; it uses a likelihood method to determine the model parameters ϕi and θj (see equation 2.1).
The Neural Network Toolbox in MATLAB was used for this part of the experiment. It provides a wide variety of BP algorithms: Batch Gradient Descent (GD), Batch Gradient Descent with Momentum, GD with Variable Learning Rate, GD with Variable Learning Rate and Momentum, Scaled Conjugate Gradient, BFGS Quasi-Newton, One Step Secant Quasi-Newton and Levenberg-Marquardt. Of these, the Levenberg-Marquardt BP algorithm is the fastest and most memory-efficient according to the literature[44], which was also observed in a few initial experiments. Hence only the Levenberg-Marquardt BP algorithm was used thereafter; all the FFNN experiments documented in this report were carried out with this algorithm.
The different FFNN parameter values used in the experiments are shown in table 4.3. All combinations of these values were used for each of the three series, and in turn the holdout verification method with 20 runs was applied to each combination for each series.
In holdout verification, vectors are chosen randomly from the initial sample to form the validation data, and the remaining observations are retained as the training data. Normally, less than a third of the initial sample is used for validation; in this case the validation set was a 0.2 fraction of the original set.
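The holdout split just described can be sketched as follows (an illustrative Python fragment, not the original MATLAB code):

```python
import random

def holdout_split(vectors, val_fraction=0.2, seed=None):
    """Holdout verification: randomly move val_fraction of the vectors
    into a validation set and keep the rest for training."""
    idx = list(range(len(vectors)))
    random.Random(seed).shuffle(idx)
    n_val = int(len(vectors) * val_fraction)
    val = [vectors[i] for i in idx[:n_val]]
    train = [vectors[i] for i in idx[n_val:]]
    return train, val

# With 100 vectors and val_fraction = 0.2, 20 go to validation
train, val = holdout_split(list(range(100)), val_fraction=0.2, seed=1)
```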
The pictorial representation of these steps is shown in figure 4.8. Thus, for each series, 6440 neural networks were trained and kept for forecasting tests in the next step of the experiment.
The SVMdark[45] tool was used for this part of the experiment. It uses the SVMlight[46] library for the SVM computations; this library implements the optimization algorithms described in [47]. It supports linear, polynomial, RBF and sigmoid kernels, all of which were tried in the experiments.
The selection of appropriate values of d is basically by trial and error. However, based on a study by Ali and Smith (2003)[48] using various sample sets with different numbers of attributes and sizes, the search space for d values should range from 2 to 5. Therefore, in this project, the values of the polynomial degree d were taken from this range for each vector set. Each time, using the optimisation tool, the best parameter values and kernel were found; the SVR model was then trained for these values using the "Learn" option in SVMdark, and the trained model was used for forecasting in the next step of the experiment.
4.4 Forecasting
All the models developed in the previous step are used to produce one-day-ahead forecasts of the latest data (L), which consists of 200 points. In the case of SVR, the SVMdark tool is used; for FFNN and ARIMA, MATLAB is used. For ARIMA, equation 2.1 is implemented in MATLAB, and the parameter values from the model estimates developed in SAS are substituted into the equation. Forecasted values are saved for further analysis.
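As a sketch of how an estimated model can be applied for one-day-ahead forecasting (illustrative Python rather than the thesis's MATLAB implementation; the Box-Jenkins sign convention used here is an assumption):

```python
def arima111_one_step(x, phi1, theta1):
    """One-day-ahead ARIMA(1,1,1) forecasts over a series x, given
    already-estimated parameters. Assumed convention:
    w_t = phi1*w_{t-1} + e_t - theta1*e_{t-1}, with w_t = x_t - x_{t-1}."""
    w = [x[i] - x[i - 1] for i in range(1, len(x))]
    e_prev = 0.0
    preds = []
    for t in range(1, len(w)):
        w_hat = phi1 * w[t - 1] - theta1 * e_prev  # forecast next difference
        preds.append(x[t] + w_hat)                 # undo the differencing
        e_prev = w[t] - w_hat                      # one-step forecast error
    return preds

# Toy series with hypothetical parameter values (not the SAS estimates)
preds = arima111_one_step([1.0, 2.0, 4.0, 8.0], phi1=0.5, theta1=0.0)
# preds == [2.5, 5.0]
```

Note that the model is applied over the whole test span without re-estimation, matching the long-run forecasting setup used here.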
The aim of this forecasting is to compare the forecasting performance of the models over a long run, without making any changes to the models and without retraining them.
In this chapter all the results are presented, along with analysis and discussion. First the results for the individual methods are presented, starting with ARIMA, then FFNN and then SVR. Lastly, the three models are compared with each other. For analysis of the forecasting performance of the different models on the latest data (L) (L and H are discussed in section 4.2), three error measures were used: MAPE, MAE and DS (see section 2.1.1 for definitions of these terms). These will be denoted MAPE(L), MAE(L) and DS(L) respectively hereafter, to indicate that they are calculated on the latest data L. A model with minimum MAPE(L) and MAE(L) is considered better at forecasting values of the series, while a model with maximum DS(L) is considered better at forecasting the direction of movement of the series.
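These three error measures can be computed as follows (an illustrative Python sketch; the DS formulation shown is one common definition and is an assumption, since the exact definition is given in section 2.1.1):

```python
import numpy as np

def mape(actual, forecast):
    """Mean absolute percentage error; lower is better."""
    a = np.asarray(actual, float)
    f = np.asarray(forecast, float)
    return 100.0 * float(np.mean(np.abs((a - f) / a)))

def mae(actual, forecast):
    """Mean absolute error; lower is better."""
    a = np.asarray(actual, float)
    f = np.asarray(forecast, float)
    return float(np.mean(np.abs(a - f)))

def ds(actual, forecast):
    """Directional symmetry: percentage of steps on which the forecast
    moves in the same direction as the series; higher is better."""
    a = np.asarray(actual, float)
    f = np.asarray(forecast, float)
    same = np.sign(np.diff(a)) == np.sign(f[1:] - a[:-1])
    return 100.0 * float(np.mean(same))
```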
5.1 Results For ARIMA Models
The model identification was performed using ACF and PACF plots (see appendix A for details). Figure 5.1 shows the ACF and PACF plots for BSE Sensex with d = 0. The two lines parallel to the X-axis show the 95% confidence interval; anything inside these lines can be treated as zero. In the figure, the ACF plot does not decay rapidly, which means that the series is non-stationary and hence needs differencing (according to rule 2 in appendix A).
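The ACF underlying these identification plots can be computed directly; the following sketch (illustrative Python, not the SAS implementation) shows the slow decay that signals non-stationarity on a trending series:

```python
import numpy as np

def acf(x, nlags):
    """Sample autocorrelation function, the basic tool of the
    identification step; a slowly decaying ACF suggests differencing."""
    x = np.asarray(x, float)
    x = x - x.mean()
    denom = float(np.dot(x, x))
    return np.array([1.0] + [float(np.dot(x[:-k], x[k:])) / denom
                             for k in range(1, nlags + 1)])

# A trending (non-stationary) series: the ACF stays near 1 and decays slowly
r = acf(np.arange(50.0), nlags=3)
```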
Figure 5.2 shows the ACF and PACF plots for BSE Sensex after one differencing. From the ACF plot it is clear that there is no need for further differencing (according to rule 1 in appendix A). Here both the ACF and PACF plots are mixtures of exponentials and damped sine waves which start after the first lag; hence, according to rule 5 in appendix A, the model identified is ARIMA(1,1,1). These plots also resemble the pattern from region 4 in figure A.3, which supports the ARIMA(1,1,1) model. The validity of this model was confirmed in the model estimation (i.e. next) step, when it was observed that the values of φ1 and θ1 for each of the three series, estimated using the SAS software, fall in region 4 only, as shown in table 5.1. Also, of all the ARIMA models tried, ARIMA(1,1,1) gave the minimum MAPE(L), minimum MAE(L) and maximum DS(L) (see the performance of models BARIMA1, CARIMA1 and SARIMA1 in tables 5.3, 5.4 and 5.5 respectively).
Though there was no need for further differencing, one more differencing was performed, as models with d = 2 were also to be tested for forecasting in the next step of the experiments. Figure 5.3 shows the corresponding ACF and PACF plots. In the figure, the PACF decays almost exponentially, while the ACF is non-zero up to the second lag. Hence, according to rule 4 in appendix A, ARIMA(0,2,2) may also fit this series. These plots resemble the pattern from region 1 in figure A.2, which supports the ARIMA(0,2,2) model. In the model estimation (i.e. next) step, it was observed that the values of θ1 and θ2 for each of the three series also fall in region 1 of figure A.2 (see table 5.2). However, the forecasting performance of this model was very poor for all three series (see the performance of models BARIMA2, CARIMA2 and SARIMA2 in tables 5.3, 5.4 and 5.5 respectively), showing that the second differencing is unnecessary.
The same identification procedure was followed for CNX IT and S&P CNX Nifty, and it produced exactly the same results as for BSE Sensex; hence those results are not presented here. As all three series have shown similar statistical characteristics, we expect further results to be similar for the three series as well.
Tables 5.3, 5.4 and 5.5 show the forecasting performance of the different ARIMA models on BSE Sensex, CNX IT and S&P CNX Nifty respectively. The tables are sorted in ascending order of MAPE(L). Observe that:
• for all three series, the ARIMA(1,1,1) model performed better than all the other ARIMA models with respect to MAPE(L), MAE(L) and DS(L), supporting the findings of the model identification step in section 5.1.1;
• all the models with d = 2 show a drastic reduction in performance for all the series, which again supports the findings of section 5.1.1;
• MA models (i.e. ARIMA with p = d = 0) perform very badly for all the series, indicating that they are not suitable for this kind of series;
• of the three series, BSE Sensex has the lowest MAPE(L) values but the highest MAE(L) values;
• a DS(L) of nearly 80% was achieved for each of the three series;
• for a given series, MAE(L) increases and DS(L) decreases as MAPE(L) increases.
The main conclusion of this section is that the ARIMA(1,1,1) model fits these three series best. This concludes forecasting using ARIMA models; in the next section, forecasting using FFNN is analysed.
Table 5.3: Comparison of different ARIMA models for forecasting on BSE Sensex
Table 5.5: Comparison of different ARIMA models for forecasting on S&P CNX Nifty
5.2 Results For FFNN Models
Min MAPE(L)   MAE(L)     DS(L)     Win Size   Neurons Per Layer   Transfer Function Per Layer   Model ID
1.1180        182.7829   53.8462   5          [7 2 1]             [tanh tanh lin]               BNN1
1.1277        184.2603   54.2105   10         [2 1]               [tanh tanh]                   BNN2
1.1734        192.0728   52.8497   7          [10 3 1]            [tanh tanh lin]               BNN3
1.1894        199.1636   54.4444   20         [3 2 1]             [tanh tanh lin]               BNN4
1.1991        204.5997   61.2500   40         [15 3 1]            [sigmoid sigmoid lin]         BNN5
1.3314        228.0641   57.1429   60         [15 2 1]            [sigmoid sigmoid lin]         BNN6
1.4141        243.8137   57.5000   80         [7 3 1]             [sigmoid sigmoid lin]         BNN7
Table 5.6: Minimum MAPE(L) obtained per window size along with the details of the FFNNs for BSE Sensex.
Out of the 6440 FFNNs for each series, tables 5.6, 5.7 and 5.8 show the models with the minimum MAPE(L) values obtained per window size for BSE Sensex, CNX IT and S&P CNX Nifty, along with the details of those models. The tables are sorted by MAPE(L) in ascending order. Table 5.9 shows the minimum and average MAPE(L) values per window size for these series; this table is sorted in ascending order of window size.
For BSE Sensex, the BNN1 model performed best in terms of MAPE(L) and MAE(L) (table 5.6). For CNX IT, the CNN1 model performed best in terms of MAPE(L) and MAE(L) (table 5.7). For S&P CNX Nifty, the SNN1 model performed best in terms of MAPE(L) and MAE(L) (table 5.8).
Min MAPE(L)   MAE(L)    DS(L)     Win Size   Neurons Per Layer   Transfer Function Per Layer   Model ID
1.1863        56.4792   58.1250   40         [10 1]              [sigmoid lin]                 CNN1
1.1876        57.8142   50.8333   80         [15 1]              [sigmoid lin]                 CNN2
1.2094        58.1401   53.5714   60         [2 1]               [tanh lin]                    CNN3
1.3908        62.8498   56.3158   10         [3 1]               [tanh lin]                    CNN4
1.4035        62.9083   57.4359   5          [2 1]               [tanh lin]                    CNN5
1.4094        64.0949   58.3333   20         [2 1]               [sigmoid lin]                 CNN6
1.4169        66.7119   52.8497   7          [5 2 1]             [tanh tanh lin]               CNN7
Table 5.7: Minimum MAPE(L) obtained per window size along with the details of the FFNNs for CNX IT.
Min MAPE(L)   MAE(L)    DS(L)     Win Size   Neurons Per Layer   Transfer Function Per Layer   Model ID
1.0834        62.5298   66.6667   80         [15 2 1]            [sigmoid sigmoid lin]         SNN1
1.2787        74.2788   48.7805   60         [2 1]               [tanh lin]                    SNN2
1.3300        78.2595   55.7377   40         [2 1]               [tanh lin]                    SNN3
1.6342        90.5011   54.9451   10         [3 2 1]             [tanh tanh lin]               SNN4
1.7520        97.0058   54.1667   5          [5 1]               [tanh lin]                    SNN5
1.7908        99.9121   41.9753   20         [2 1]               [sigmoid lin]                 SNN6
1.7977        99.7327   58.5106   7          [7 2 1]             [tanh tanh lin]               SNN7
Table 5.8: Minimum MAPE(L) obtained per window size along with the details of the FFNNs for S&P CNX Nifty.
According to the literature[49, 50], FFNNs with one sigmoid layer and one linear layer are capable of modelling non-linear relationships of great complexity. From these tables it is very clear that the linear transfer function is best suited to the output layer, while for the first layer sigmoid and tanh performed nearly equally. For BSE Sensex, 3-layer FFNNs dominate table 5.6; for CNX IT, 2-layer FFNNs dominate table 5.7; and for S&P CNX Nifty, table 5.8 shows a mixture of 2- and 3-layer FFNNs. This suggests that, of the three series, BSE Sensex is the most complex to forecast and CNX IT the least complex.
MAE(L) values are higher for BSE Sensex than for the other two series; a similar finding was made using ARIMA models (see section 5.1). DS(L) is only 50-60% for all the models except SNN1, which is considerably low.
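For concreteness, a forward pass through one of these architectures (a tanh hidden layer followed by a linear output layer) can be sketched as follows (illustrative Python; the actual training used MATLAB's Neural Network Toolbox):

```python
import numpy as np

def ffnn_forward(x, W1, b1, W2, b2):
    """Forward pass of a two-layer FFNN of the kind listed in the tables:
    a tanh hidden layer followed by a linear output layer."""
    h = np.tanh(W1 @ x + b1)  # hidden layer, tanh transfer function
    return W2 @ h + b2        # output layer, linear transfer function

# With zero weights, tanh(0) = 0, so the output reduces to the output bias
out = ffnn_forward(np.zeros(5), np.zeros((3, 5)), np.zeros(3),
                   np.zeros((1, 3)), np.array([0.5]))
```

Here the input dimension (5) plays the role of the window size, matching the statement that the window size equals the number of input layer neurons.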
Figure 5.4 shows the plot of minimum MAPE(L) values against window size for the three series. For CNX IT and S&P CNX Nifty, MAPE(L) increases as the window size decreases, meaning that each value depends on a larger number of preceding values for these two series. For BSE Sensex it is the other way around: each value depends on fewer preceding values, i.e. BSE Sensex seems the most dynamic in nature of the three. From the plot, window sizes of 40 to 80 give reasonably low MAPE(L) values for all three series.
Table 5.9 clearly shows the difference between the minimum and average MAPE(L) values per window size. The difference is huge. This is because, for FFNNs, convergence depends highly on the weight vector initialised before training starts: if it is near a local minimum, training will get stuck there; if it is near the global minimum, there is a higher chance that the best model will be obtained. This large difference between minimum and average MAPE(L) values shows that a local minimum was met more often than the global one.
Table 5.9: Min and Avg MAPE(L)s obtained per window size
DS(L)     MAPE(L)   MAE(L)      Window Size   Neurons Per Layer   Transfer Function Per Layer   Model ID
66.6667   3.3060    606.3522    80            [20 1]              [tanh tanh]                   BNN1D
64.3750   2.4693    447.8891    40            [3 2 1]             [tanh tanh tanh]              BNN2D
63.5714   11.0811   2113.6480   60            [10 3 1]            [tanh tanh tanh]              BNN3D
61.1111   20.3531   3249.0836   20            [3 2 1]             [tanh tanh tanh]              BNN4D
59.4737   38.2639   7049.1890   10            [7 1]               [tanh tanh]                   BNN5D
58.4615   10.2440   1920.0727   5             [15 5 1]            [tanh tanh tanh]              BNN6D
58.0311   4.2488    690.8693    7             [10 2 1]            [tanh tanh lin]               BNN7D
Table 5.10: Maximum DS(L)s obtained per window size for BSE Sensex using FFNN models
Tables 5.10, 5.11 and 5.12 show the maximum DS(L) values obtained with the different FFNN models for BSE Sensex, CNX IT and S&P CNX Nifty. The tables are sorted in descending order of DS(L). Table 5.13 shows the maximum and average DS(L) values obtained for the three series; it is sorted in ascending order of window size.
A best DS(L) of approximately 67% was obtained for BSE Sensex and S&P CNX Nifty; for CNX IT it is around 62%. Table 5.10 is dominated by 3-layer FFNNs, again indicating the complexity of the BSE Sensex series. The last layer of the FFNNs for BSE Sensex is also dominated by the tanh transfer function, again indicating complexity in BSE Sensex.
The large difference between maximum and average DS(L) values (see table 5.13) again shows the effect of random weight initialisation on the chances of reaching the global minimum of the error surface.
Max DS(L) values are plotted against window size in figure 5.5. From the plot it can be inferred that DS(L) generally increases with window size, with the 60-80 range giving the highest DS(L) values for all three series.
DS(L)     MAPE(L)   MAE(L)     Window Size   Neurons Per Layer   Transfer Function Per Layer   Model ID
62.1429   1.2795    61.0987    60            [3 1]               [sigmoid lin]                 CNN1D
61.6667   1.3660    67.0545    80            [40 1]              [tanh lin]                    CNN2D
61.6667   2.2270    110.5133   80            [7 1]               [sigmoid sigmoid]             CNN3D
60.5556   3.0592    134.0820   20            [7 2 1]             [tanh tanh lin]               CNN4D
60.5556   1.4856    67.4158    20            [3 2 1]             [sigmoid sigmoid lin]         CNN5D
59.5855   3.5478    154.2032   7             [7 1]               [sigmoid sigmoid]             CNN6D
59.4872   2.0002    87.8679    5             [7 1]               [tanh tanh]                   CNN7D
59.3750   1.3164    62.6483    40            [3 1]               [tanh lin]                    CNN8D
59.3750   1.4300    67.8715    40            [20 1]              [sigmoid lin]                 CNN9D
59.3750   1.8004    85.4629    40            [15 7 1]            [tanh tanh lin]               CNN10D
58.9474   1.4102    63.6247    10            [10 5 1]            [tanh tanh lin]               CNN11D
Table 5.11: Maximum DS(L)s obtained per window size for CNX IT using FFNN models
DS(L)     MAPE(L)   MAE(L)      Window Size   Neurons Per Layer   Transfer Function Per Layer   Model ID
66.9524   1.5371    88.4242     80            [2 1]               [tanh lin]                    SNN1D
66.9524   8.0767    468.6258    80            [7 5 1]             [tanh tanh lin]               SNN2D
64.7317   1.5285    88.6016     60            [20 1]              [tanh tanh]                   SNN3D
64.7317   2.8265    163.8969    60            [3 1]               [tanh lin]                    SNN4D
64.7317   7.7946    455.9175    60            [15 1]              [tanh lin]                    SNN5D
63.9344   6.6846    389.0944    40            [3 1]               [tanh tanh]                   SNN6D
63.8298   3.3150    189.1471    7             [15 1]              [tanh lin]                    SNN7D
61.5385   1.7110    95.1511     10            [15 5 1]            [tanh tanh tanh]              SNN8D
60.4167   5.2043    304.5783    5             [15 10 1]           [tanh tanh lin]               SNN9D
56.7901   4.8770    284.3519    20            [3 1]               [tanh tanh]                   SNN10D
56.7901   3.8975    226.5408    20            [5 1]               [sigmoid sigmoid]             SNN11D
56.7901   4.0188    232.1541    20            [7 1]               [sigmoid sigmoid]             SNN12D
56.7901   52.0020   3014.5389   20            [5 3 1]             [tanh tanh tanh]              SNN13D
56.7901   9.0960    531.7788    20            [15 5 1]            [tanh tanh tanh]              SNN14D
56.7901   2.6682    151.2669    20            [10 7 1]            [sigmoid sigmoid lin]         SNN15D
Table 5.12: Maximum DS(L)s obtained per window size for S&P CNX Nifty using FFNN models
This means that the direction of movement of each series at a given point depends on a large number of preceding points, for all three series.
5.3 Results For SVR Models
This section is again divided into two subsections. The first compares the forecasting performance of the different kernels for SVR. The second documents and analyses the results for polynomial kernels only; the other kernels are not discussed there, as the polynomial kernels outperformed them by a great margin.
SVR was tried with polynomial, RBF, linear and sigmoid kernels. Of these, the linear and sigmoid kernels never gave any solution; hence they are not discussed further. Table 5.14 shows the comparison of SVRPOLY and SVRRBF based on MAPE(L) values for different window sizes. It is very clear that SVRPOLY outperforms SVRRBF by a great margin in every case shown in the table, except for window size 10 on S&P CNX Nifty. Hence in the next section, the forecasting performance of only the SVRPOLY models is studied in detail.
Table 5.13: Maximum and Avg DS(L)s obtained per window size with FFNN models
Tables 5.15, 5.16 and 5.17 show the forecasting performance of the SVRPOLY models on BSE Sensex, CNX IT and S&P CNX Nifty respectively. For each window size, these tables show only the models for which the best performance was obtained; other models are not shown. The tables are sorted in ascending order of MAPE(L). From these tables, the following observations follow:
• For all three series, the value of C is very high, always in the range 100000 to
1000000, i.e. higher values of C result in better forecasts. A higher value of C
indicates higher model complexity.
• Values of ε are very low in all the models, always in the range 0.009 to 0.022, except
for the CSVR2 model (window size 10 on CNX IT), where ε equals 0.0914. That is,
lower values of ε produce better forecasts. A smaller ε implies a narrower ε-tube,
and a narrower ε-tube means the solution surface is tied tightly to the training
points.
• For all three series, a polynomial degree d of 2 produced the best forecasts; that is,
polynomial kernels of degree 2 are best suited for these series. Values of d higher
or lower than 2 consistently performed worse: higher values of d provide more
complexity, which overfits the model to the training data, while lower values do not
provide enough complexity to learn the training points.
• PDA(L) decreases and MAE(L) increases with MAPE(L), with a few negligible
exceptions.
• MAPE(L) values are the lowest for BSE Sensex among the three series, but its
MAE(L) values are the highest. The same was observed for the ARIMA models as
well as the FFNN models.
• DS(L) values of approximately 58%, 55% and 61% have been achieved for BSE
Sensex, CNX IT and S&P CNX Nifty respectively.
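The ε-tube observation above can be made concrete with the ε-insensitive loss that SVR minimises. The sketch below is plain Python with illustrative values only (the residual of 0.05 is hypothetical, not taken from the experiments): a residual inside the tube costs nothing, while shrinking ε exposes the same residual to a penalty.

```python
def eps_insensitive_loss(residual, eps):
    """SVR's epsilon-insensitive loss: zero inside the eps-tube,
    linear in the excess deviation outside it."""
    return max(0.0, abs(residual) - eps)

# Hypothetical residual of 0.05, with the two epsilon scales seen above:
wide = eps_insensitive_loss(0.05, eps=0.0914)   # CSVR2-like wide tube
narrow = eps_insensitive_loss(0.05, eps=0.009)  # narrow tube
# wide incurs no penalty; narrow penalises the residual, tying the
# solution surface more tightly to the training points
```

A smaller ε therefore forces the regression surface to track the training points closely, which matches the observation that low ε values produced the better forecasts here.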
In the next section, a comparison of the ARIMA, FFNN and SVR models is presented.
5.4 Comparison Of Forecasting Performances Of ARIMA, FFNN and SVR Models
Table 5.18: Forecasting performance of ARIMA, FFNN and SVR for MAPE(L)
Table 5.18 and figure 5.6 show the comparison of ARIMA, FFNN and SVR models for
forecasting the index values of BSE Sensex, CNX IT and S&P CNX Nifty using MAPE(L).
The table also gives the name of the model for which the noted MAPE(L) value was
obtained; for details of these models, please refer to the previous sections of this chapter.
From both the table and the figure, it is very clear that FFNN models outperformed
ARIMA and SVR models by a considerable margin for all three series. ARIMA performed
better than SVR for BSE Sensex, while SVR performed marginally better than ARIMA
for the other two series.
FFNN models are mostly reported in the literature to forecast better than ARIMA
models; this is confirmed here again. Between FFNN and SVR models, however, the
literature is not clear about which is better. From these results it can be asserted that
FFNN models are better than SVR models at long-term forecasting of these three series.
Table 5.19: Forecasting performance of ARIMA, FFNN and SVR for DS(L)
Table 5.19 and figure 5.7 show the comparison of ARIMA, FFNN and SVR models for
forecasting the direction of movement of the BSE Sensex, CNX IT and S&P CNX Nifty
indices using DS(L). The table also gives the name of the model for which the noted
DS(L) value was obtained; for details of these models, please refer to the previous sections
of this chapter. From both the table and the figure, it is very clear that ARIMA models
outperformed FFNN and SVR models by a considerable margin for all three series. FFNN
models, in turn, performed consistently better than SVR models.
The possible explanation for FFNN being better at forecasting the index value while
ARIMA is better at forecasting the direction of movement is that FFNNs are trained
on a deviation criterion: they minimise MSE. Thus, on the deviation criterion, i.e.
forecasting the index value, they perform better than ARIMA. With regard to the
direction criterion, the time-series model has an intrinsic merit because it emphasises
the trend of the series. Therefore, the two AI models cannot outperform ARIMA on the
direction criterion for any of the three series.
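The two criteria can be stated concretely. The sketch below gives a minimal Python implementation of the deviation measures (MAPE, MAE) and one common definition of directional symmetry (DS); the series values are hypothetical and chosen only to illustrate that a forecast can have small deviation error yet miss a direction change.

```python
def mape(actual, pred):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs(a - p) / a for a, p in zip(actual, pred)) / len(actual)

def mae(actual, pred):
    """Mean absolute error, in the units of the series."""
    return sum(abs(a - p) for a, p in zip(actual, pred)) / len(actual)

def ds(actual, pred):
    """Directional symmetry: percentage of points where the forecast moves
    in the same direction as the series (one common definition)."""
    hits = sum(
        1
        for t in range(1, len(actual))
        if (actual[t] - actual[t - 1]) * (pred[t] - actual[t - 1]) > 0
    )
    return 100.0 * hits / (len(actual) - 1)

# Hypothetical index levels and one-day-ahead forecasts:
y = [100.0, 102.0, 101.0, 103.0]
yhat = [101.0, 103.0, 103.0, 104.0]
# Deviation errors are small, but the dip at t=2 is forecast as a rise,
# so DS counts only 2 of the 3 moves as correct.
```

A model optimised for MSE pushes `mape`/`mae` down without being rewarded for getting the sign of each move right, which is the asymmetry discussed above.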
Figure 5.8 shows the plots of the values forecasted by ARIMA, FFNN and SVR for BSE
Sensex.
Figure 5.1: ACF and PACF plots for BSE Sensex with d = 0. ACF shows need for
differencing.
Figure 5.2: ACF and PACF plots for BSE Sensex with d = 1. Model identified as
ARIMA(1,1,1).
Figure 5.3: ACF and PACF plots for BSE Sensex with d = 2. Model identified as
ARIMA(0,2,2).
Figure 5.6: Forecasting performance of ARIMA, FFNN and SVR for MAPE(L)
Figure 5.7: Forecasting performance of ARIMA, FFNN and SVR for DS(L)
Figure 5.8: Forecasted values by ARIMA, FFNN and SVR for BSE Sensex
Chapter 6
Conclusion And Future Work
A number of forecasting experiments were conducted on BSE Sensex, CNX IT and S&P
CNX Nifty to compare the performance of a statistical model (ARIMA) and AI models
(Feed Forward Neural Networks and Support Vector Regression). Their performance
was compared for forecasting the value of the index using error measures such as Mean
Absolute Percentage Error (MAPE) and Mean Absolute Error (MAE), and for forecasting
the direction of movement of the index using Directional Symmetry (DS). The next section
draws conclusions from these experiments, and the section after it discusses future work.
6.1 Conclusion
From the experiments performed in this study, the following conclusions can be drawn:
1. Model identification indicated that ARIMA(1,1,1) is the best fit for these three
indices. This was supported by the forecasting results of the different ARIMA models,
as the forecasting performance of the ARIMA(1,1,1) model was better than that of
the other ARIMA models.
2. MA models (i.e. ARIMA with p = d = 0), and ARIMA models with d = 0 in
general, are not suitable for these series.
3. According to [50, 49], a linear transfer function is more suitable in the output layer
of FFNN models. These experiments also showed that a linear transfer function in
the output layer gave the best results for value forecasting of the index.
4. For value forecasting of BSE Sensex, three-layer FFNN networks performed better
than two-layer FFNNs, while for CNX IT, two-layer networks showed better results.
Hence it can be asserted that BSE Sensex is the most complex to forecast.
5. For value forecasting of these indices using SVR models, polynomial kernels of
degree 2 were found to be better than RBF kernels by a considerable margin. The
performance of linear and sigmoid kernels was very poor. These results match those
of [33].
6. For long-term forecasting of the value of the index, FFNN models outperformed
ARIMA and SVR models by a fair margin, while the performance of SVR and
ARIMA differed only marginally. For FFNN vs. ARIMA, these results match those
of [32, 31, 20]. For FFNN vs. SVR, these results partially conform to [34, 32],
where FFNNs were found to perform better than SVR except in a few cases of their
study, and contradict the results of [33, 29]. From this study it can be asserted that
FFNN models are better than the other two at one-day-ahead forecasting of the
index value for these indices.
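Conclusion 3's preference for a linear output transfer function has a simple mechanical reading: a tanh output unit is bounded to (-1, 1), while a linear output unit can reach values on the scale of an index level. The sketch below is a minimal one-hidden-layer forward pass in plain Python; the weights are hand-picked purely for illustration and are not taken from any trained model in this study.

```python
import math

def ffnn_forward(x, w_hidden, b_hidden, w_out, b_out, linear_output=True):
    """One tanh hidden layer followed by a single output unit.
    With linear_output=False the output is also squashed through tanh."""
    hidden = [
        math.tanh(sum(wi * xi for wi, xi in zip(w, x)) + b)
        for w, b in zip(w_hidden, b_hidden)
    ]
    z = sum(wo * h for wo, h in zip(w_out, hidden)) + b_out
    return z if linear_output else math.tanh(z)

# Hand-picked weights scaled so the target is an index-level magnitude:
x = [1.0, 1.0]
w_h, b_h = [[1.0, 1.0]], [0.0]
w_o, b_o = [5000.0], 0.0
lin = ffnn_forward(x, w_h, b_h, w_o, b_o, linear_output=True)
sq = ffnn_forward(x, w_h, b_h, w_o, b_o, linear_output=False)
# lin reaches the index scale; sq can never exceed 1 in magnitude
```

Without a linear output unit, the targets would have to be rescaled into (-1, 1) before training, so a linear output layer is the natural choice for value forecasting.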
6.2 Future Work
of change of profit margin etc. as the input. Hence fundamental factors can also be
used to assist the forecasting.
• Combining methods such as TopK, Best, DTopK and AFTER have proven to be
better than individual forecasting models [51, 52]. They can be used to combine the
results of these three forecasting models.
Appendix A
1. If the ACF dies out rapidly, the series is stationary. Hence there is no need for
differencing, so d = 0.
2. If the ACF does not die out rapidly, the series is non-stationary and needs
differencing, i.e. d > 0. The series is then differenced until a rapidly dying ACF is
obtained, and d equals the number of differences taken. The p and q values are then
determined from the ACF and PACF of this differenced series using the further
rules below.
3. If the ACF decays exponentially and φ_kk for k = 1, ..., K are non-zero in the
PACF, it suggests p = K and q = 0. Different possible shapes of the ACF and PACF
for p = 2 are shown in fig A.1.
4. If the PACF decays exponentially and ρ_k for k = 1, ..., K are non-zero in the
ACF, it suggests p = 0 and q = K. Different possible shapes of the ACF and PACF
for q = 2 are shown in fig A.2.
5. If the ACF and PACF are both mixtures of exponentials and damped sine waves,
and the exponential decay starts after the K1-th lag for the ACF and the K2-th
lag for the PACF, it suggests p = K1 and q = K2. Different possible shapes of the
ACF and PACF for p = 1 and q = 1 are shown in fig A.3.
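Rules 1 and 2 above can be checked numerically with the sample autocorrelation. The sketch below uses plain Python and a synthetic series (a trend plus a small alternating component, chosen only for illustration): the raw series has a lag-1 ACF near one, signalling non-stationarity, while the differenced series does not.

```python
def acf1(x):
    """Sample autocorrelation at lag 1."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    return sum(dev[t] * dev[t + 1] for t in range(n - 1)) / denom

# Synthetic non-stationary series: linear trend plus a small alternation
series = [t + 0.5 * (t % 2) for t in range(50)]
diffed = [b - a for a, b in zip(series, series[1:])]

# acf1(series) is close to 1: slowly dying ACF, so differencing is needed (d > 0)
# acf1(diffed) is strongly negative: the ACF dies out after differencing once
```

In practice the full ACF and PACF at many lags would be inspected, as in figures 5.1 to 5.3, but the lag-1 value already separates the two cases here.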
Figure A.3: Possible shapes for ACF and PACF when p = 1 and q = 1
Bibliography
[6] Indian Express Newspapers (Mumbai) Ltd. Now, NSE 2nd fastest growing stock ex-
change. In http://www.expressindia.com/news/fullstory.php?newsid=91524. [Online;
accessed 20-June-2008].
[7] George Edward Pelham Box and Gwilym M. Jenkins. Time Series Analysis: Fore-
casting and Control. Prentice Hall PTR, Upper Saddle River, NJ, USA, 1994.
[13] Vladimir N. Vapnik. The nature of statistical learning theory. Springer-Verlag New
York, Inc., New York, NY, USA, 1995.
[14] Corinna Cortes and Vladimir Vapnik. Support-vector networks. Machine Learning,
20(3):273–297, 1995.
[15] Bernhard E. Boser, Isabelle Guyon, and Vladimir Vapnik. A training algorithm for
optimal margin classifiers. In Computational Learing Theory, pages 144–152, 1992.
[16] Harris Drucker, Chris J. C. Burges, Linda Kaufman, Alex Smola, and Vladimir
Vapnik. Support vector regression machines. In Michael C. Mozer, Michael I. Jordan,
and Thomas Petsche, editors, Advances in Neural Information Processing Systems,
volume 9, page 155. The MIT Press, 1997.
[17] Nello Cristianini and John Shawe-Taylor. An Introduction to Support Vector Ma-
chines and Other Kernel-based Learning Methods. Cambridge University Press,
March 2000.
[18] Alex J. Smola and Bernhard Schölkopf. A tutorial on support vector regression.
Statistics and Computing, 14(3):199–222, 2004.
[19] Lisa Bianchi, Jeffrey Jarrett, and R. Choudary Hanumara. Improving forecasting for
telemarketing centers by arima modeling with intervention. International Journal of
Forecasting, 14(4):497–504, Dec 1998.
[20] Nowrouz Kohzadi, Milton S. Boyd, Bahman Kermanshahi, and Iebeling Kaastra. A
comparison of artificial neural network and time series models for forecasting com-
modity prices. Neurocomputing, 10(2):169–181, 1996.
[22] B. Chen, M. Chang, and C. Lin. Load forecasting using support vector machines: A
study on EUNITE competition, 2001.
[23] Wei-Chiang Hong, Ping-Feng Pai, Shun-Lin Yang, and R. Theng. Highway traffic
forecasting by support vector regression model with tabu search algorithms. Neural
Networks, 2006. IJCNN ’06. International Joint Conference on, pages 1617–1621,
0-0 2006.
[25] Mohan Neeraj, Jha Pankaj, Laha Arnab Kumar, and Dutta Goutam. Artificial neural
network models for forecasting stock price index in bombay stock exchange. IIMA
Working Papers 2005-10-01, Indian Institute of Management Ahmedabad, Research
and Publication Department, October 2005.
[26] Wei Cheng, Lorry Wagner, and Chien-Hua Lin. Forecasting the 30-year u.s. treasury
bond with a system of neural networks.
[27] Chi-Cheong Chris Wong, Man-Chung Chan, and Chi-Chung Lam. Financial time
series forecasting by neural network using conjugate gradient learning algorithm and
multiple linear regression weight initialization. Computing in Economics and Finance
2000 61, Society for Computational Economics, July 2000.
[28] A. Atiya, N. Talaat, and S. Shaheen. An efficient stock market forecasting model
using neural networks. Neural Networks,1997., International Conference on, 4:2112–
2115 vol.4, Jun 1997.
[29] Lijuan Cao and Francis E. H. Tay. Financial forecasting using support vector machines.
Neural Computing & Applications, 10(2):184–192, May 2001.
[30] J. Corchado, C. Fyfe, and B. Lees. Unsupervised learning for financial forecasting.
Computational Intelligence for Financial Engineering (CIFEr), 1998. Proceedings of
the IEEE/IAFE/INFORMS 1998 Conference on, pages 259–263, Mar 1998.
[31] M. Kumar and M. Thenmozhi. Forecasting Nifty index futures returns using neural
network and ARIMA models. Financial Engineering and Applications, 437, 2004.
[32] Wun-Hua Chen, Jen-Ying Shih, and Soushan Wu. Comparison of support-vector
machines and back propagation neural networks in forecasting the six major Asian
stock markets. International Journal of Electronic Finance, 1(1):49–67, January
2006.
[33] Chon Lung Chai. Finding kernel function for stock market prediction with support
vector regression. Technical report, Universiti Teknologi Malaysia, 2006.
[34] Theodore B. Trafalis and Huseyin Ince. Support vector machine for regression and
applications to financial forecasting. ijcnn, 06:6348, 2000.
[36] National Stock Exchange Limited. Nse - about us - facts & figures. In http://www.nse-
india.com/content/us/us%5ffactsfigures.htm. [Online; accessed 20-June-2008].
[44] M. T. Hagan and M. B. Menhaj. Training feedforward networks with the Marquardt
algorithm. Neural Networks, IEEE Transactions on, 5(6):989–993, Nov 1994.
[47] T. Joachims. Making large-scale support vector machine learning practical. In Ad-
vances in Kernel Methods: Support Vector Learning. MIT Press, 1999.
[48] S. Ali and K.A. Smith. Automatic parameter selection for polynomial kernel. Infor-
mation Reuse and Integration, 2003. IRI 2003. IEEE International Conference on,
pages 243–249, Oct. 2003.
[49] Christopher M. Bishop. Neural Networks for Pattern Recognition. Oxford University
Press, November 1995.
[50] Mohan Neeraj, Jha Pankaj, Laha Arnab Kumar, and Dutta Goutam. Artificial neural
network models for forecasting stock price index in bombay stock exchange. IIMA
Working Papers 2005-10-01, Indian Institute of Management Ahmedabad, Research
and Publication Department, October 2005.
[51] Y. Yang and H. Zou. Combining time series models for forecasting, 2002.
[52] Abhishek Seth. On using a multitude of time series forecasting models. Mtech thesis,
Kanwal Rekhi School of Information Technology, IIT Bombay, 2006.
Acknowledgements
First and foremost I would like to express my gratitude and appreciation to my advisor
Prof. Bernard L. Menezes. I am indebted to him for his guidance, encouragement,
support and trust in me. I strongly feel that I was privileged to interact with one of
the best advisors in the institute. From him, I learnt that no challenge is so big that it
couldn't be met with dedicated hard work.
I would like to thank Ambikeshwar P. Singh and Girish Joshi for being great partners
in sharing knowledge during this project.
Last, but certainly not least, I would like to thank my family. Without their
support and encouragement, this project wouldn’t have been possible.
I would also like to thank KReSIT and IIT Bombay for giving me the best moments
of my life so far.