International Journal of Computer Science Engineering
and Information Technology Research (IJCSEITR)

ISSN 2249-6831
Vol. 3, Issue 4, Oct 2013, 173-180
© TJPRC Pvt. Ltd.
MULTIPLE REGRESSION: A DATA MINING APPROACH FOR PREDICTING THE STOCK MARKET TRENDS BASED ON OPEN, CLOSE AND HIGH PRICE OF THE MONTH
SACHIN KAMLEY1, SHAILESH JALOREE2 & R. S. THAKUR3

1,2 Samrat Ashok Technological Institute, Vidisha, Madhya Pradesh, India
3 Maulana Azad National Institute of Technology, Bhopal, Madhya Pradesh, India
ABSTRACT
The stock market is essentially nonlinear in nature, and research on the stock market has been one of the most important
topics in recent years. People invest in the stock market on the basis of some prediction. To predict stock market prices,
people look for methods and tools that will increase their profits while minimizing their risks. Prediction plays a very
important role in the stock market business, which is a very complicated and challenging process. Employing traditional
methods such as fundamental and technical analysis may not ensure the reliability of the prediction. Linear regression
analysis is the method most often used to make predictions from two variables. The drawback of this approach is that it
works only for two variables. In this paper we apply the well-known and efficient multiple regression approach to predict
the stock market price from stock market data based on three variables. In future, the results of the multiple regression
approach could be improved by using a larger number of variables.
KEYWORDS: Stock Market, Prediction, Data Mining, Multiple Regression, Standard Error of the Estimate
INTRODUCTION
The stock market plays a very important role in the fast economic growth of a developing country like India. The growth of
our country and of other developing nations may therefore depend on the performance of the stock market. If the stock
market rises, the country's economic growth would be high; if it falls, economic growth would be down [1][2]. In other
words, a country's growth is tightly bound to the performance of its stock market. In any country, only 10% of the people
engage in stock market investment, because of the dynamic nature of the stock market [2]. There is a misconception that
buying and selling shares is an act of gambling. This misconception can be changed by raising awareness among people.
Prediction techniques can play a crucial role in bringing more people, along with existing investors, into the stock
market, and more promising results from prediction methods can change people's mindset. Data mining tools also help to
predict future trends and behaviors, helping organizations make proactive, knowledge-driven business decisions [3][4].
Intelligent data analysis tools search databases for hidden information that experts may miss because it lies beyond
their expectations. Data mining is the extraction of previously unknown, implicit and potentially useful information from
data in databases. It is commonly known as knowledge discovery in databases (KDD) [5]. Although the terms data mining and
KDD are often used interchangeably, data mining is actually part of the knowledge discovery process [3][4][5]. Data mining
techniques play an important role in the stock market: they can uncover hidden patterns and increase the level of
accuracy where traditional and statistical methods are lacking. The huge amount of data generated by stock markets has
forced researchers to apply data mining to make investment decisions. The following challenges are addressed by data
mining techniques in stock market analysis [2][6].
• To predict future stock prices.
• To develop feasible and efficient methods for finding useful patterns and future trends.
• To optimally utilize the capital resources of investors.
• To stabilize the market.
• To create interest in the stock market.
• To dispel misconceptions about the market and increase its transparency.
• To check corrupt and illegal trade practices.
• To enhance people's knowledge and bring tech-savvy investors into the market.
RELATED WORKS
Prediction of stock prices is a very challenging and complicated process because price movements behave like a
time-varying random walk. In recent years, various researchers have applied intelligent methods and techniques to trading
decisions in the stock market. Here we present a brief review of some of the significant studies. A. Sheta [7] used the
Takagi-Sugeno (TS) technique to develop fuzzy models for two nonlinear processes: software effort estimation for NASA
software projects and prediction of the next week's S&P 500 for the stock market. The TS fuzzy model is developed in two
steps: 1) determination of the membership functions in the rule antecedents using the model input data; 2) estimation of
the consequent parameters. Least-squares estimation was used to estimate these parameters. The results were promising.
M. H. Fazel Zarandi et al. [8] developed a type-2 fuzzy rule-based expert system for stock price analysis. The interval
type-2 fuzzy logic system made it possible to model rule uncertainties, and every membership value of an element was
itself an interval. The proposed type-2 fuzzy model used technical and fundamental indexes as input variables. The model
was tested on stock price prediction for an automotive manufacturer in Asia. Robert K. Lai et al. [9] established a
financial time series forecasting model by evolving and clustering fuzzy decision trees for stocks on the Taiwan Stock
Exchange Corporation (TSEC). The forecasting model integrated a data clustering technique, a fuzzy decision tree (FDT)
and genetic algorithms (GA) to construct a decision-making system based on historical data and technical indexes. The
historical data were divided into k sub-clusters using the K-means algorithm. GA was then applied to evolve the number of
fuzzy terms for each input index in the FDT so that the forecasting accuracy of the model could be further improved.
S. Abdulsalam Sulaiman Olaniyi et al. [11] proposed a linear regression method for analyzing the coupled behavior of
stocks in the market. The method successfully predicts stock prices based on two variables.
DATA AND PREPROCESSING STEP
In the proposed trend prediction system, we first prepare the dataset for further study and analysis. Here 4½ years of
monthly data for Infosys from Yahoo Finance are used to build the prediction model, and a further 1½ years of data are
used to check the accuracy of the model [16]. Today's real-world databases are highly susceptible to noisy, missing and
inconsistent data because of their typically huge size and their likely origin from multiple heterogeneous sources.
Low-quality data lead to low-quality mining results [3][5]; quality results must be based on quality data. Preprocessing
helps to improve the quality of the data and, consequently, of the mining results. The various preprocessing steps are
described below [12].
• Data Cleaning: Real-world data tend to be incomplete, noisy and inconsistent. Data cleaning routines attempt to
fill in missing values, smooth out noise while identifying outliers, and correct inconsistencies in the data.
• Data Integration: Data integration involves combining data residing in different sources and providing users
with a unified view of these data.
• Data Transformation: In data transformation, the data are transformed or consolidated into forms appropriate for
mining. Data transformations can involve smoothing (removing noise from the data), aggregation (applying summary or
aggregation operations to the data) and normalization (scaling attribute data into a small specified range
such as -1.0 to 1.0).
• Data Reduction: Data reduction techniques can be applied to obtain a reduced representation of the data set that
is much smaller in volume, yet closely maintains the integrity of the original data.
• Data Discretization: Data discretization techniques can be used to reduce the number of values for a given
continuous attribute by dividing the range of the attribute into intervals. Interval labels can then be used to replace
actual data values.
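The raw stock price values are numerically large and inconvenient for computation. We therefore applied the data mining
preprocessing steps to the stock data to clean and normalize them. A sample of the stock market data is shown in Table 1.
We used the following formula (1) to scale and normalize the price values into the range [0, 1]:

$$X' = \frac{X - X_{\min}}{X_{\max} - X_{\min}} \qquad (1)$$

where $X$ is an original price value and $X_{\min}$ and $X_{\max}$ are the minimum and maximum values of that price series.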
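The scaling into [0, 1] described above can be sketched in a few lines of Python. This is a minimal illustration, assuming ordinary min-max scaling; the function name `min_max_normalize` is hypothetical, and only the eight open prices from the Table 1 sample are used. Because the sample does not contain the maximum of the full 4½-year series, the values produced here will not match the normalized column of Table 1 exactly.

```python
def min_max_normalize(values):
    """Scale a list of numbers linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: there is no spread to scale, so map everything to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Monthly open prices copied from the Table 1 sample (not the full data set).
open_prices = [2240.5, 2244.45, 1125.0, 1293.0, 1211.3, 3225.0, 2924.7, 2794.0]
normalized_open = min_max_normalize(open_prices)

for price, scaled in zip(open_prices, normalized_open):
    print(f"{price:>8.2f} -> {scaled:.3f}")
```

The smallest price in the list maps to 0 and the largest to 1; every other price falls proportionally in between.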
METHOD USED: MULTIPLE REGRESSION (R) AND CORRELATION COEFFICIENT

Multiple Regression
Multiple regression is the most common data mining technique for predicting the future value of a variable on the basis
of other variables [13][15]. The most basic use of the regression equation is to make predictions. We can predict the
value of $Y$ for any particular combination of values of $X_1, X_2, \ldots, X_k$ by substituting them into the regression
equation and seeing what we get for $Y'$. Of course, we would not expect the actual $Y$ value to be exactly equal to our
prediction, i.e. there is some amount of variability. If the number of independent variables in a regression model is
more than one, the model is called a multiple linear regression model [13][14][15]. The independent variables are
referred to as predictor variables or regressors. The term linear is used because the model

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k + \varepsilon$$

is a linear function of the unknown parameters $\beta_0, \beta_1, \ldots, \beta_k$, where the response variable $Y$ is
related to $k$ regressor variables. The model describes a hyperplane in the $k$-dimensional space of the regressors. We
sometimes call $\beta_1$, $\beta_2$, etc. partial regression coefficients because $\beta_j$ measures the expected change
in $Y$ per unit change in $X_j$ when all other $X$s are held constant. The fitted equations (6) and (7) are estimated from
the following equation:

$$Y' = a + b_1 X_1 + b_2 X_2 \qquad (2)$$

Here equation (2) is the basic multiple regression equation used to obtain the value of $Y'$ (dependent variable)
corresponding to the independent variables $X_1$ and $X_2$. In this study $Y$ denotes the close price and $X_1$, $X_2$
denote the open and high price. The terms of equation (2) are described below:
$Y'$ = Estimated value corresponding to the dependent variable.
$a$ = Intercept
$b_i$ = Slope of the "estimated" regression equation associated with $X_i$.
The least-squares criterion estimates the parameters in such a way as to minimize the total error. This process also
maximizes the correlation between the actual value of $Y$ and the predicted value $Y'$. We define some associated
statistics below and then describe the procedure for multiple regression analysis.
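$$\sum Y = na + b_1 \sum X_1 + b_2 \sum X_2 \qquad (3)$$
$$\sum X_1 Y = a \sum X_1 + b_1 \sum X_1^2 + b_2 \sum X_1 X_2 \qquad (4)$$
$$\sum X_2 Y = a \sum X_2 + b_1 \sum X_1 X_2 + b_2 \sum X_2^2 \qquad (5)$$
where $n$ is the number of data points.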
Equations (3), (4) and (5) are used to calculate the coefficients of the multiple regression equation. Solving them
simultaneously gives the values of the intercept and the two slopes, and substituting these values into equation (2)
gives our final multiple regression equation (6).
We also prepared the equation for four variables: three independent variables (open, high and low price) and one
dependent variable (close price). The equation (7) built from these variables has the same form as equation (2), with a
third slope term for the low price.
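As an illustration of the least-squares procedure described above, the following Python sketch builds the three normal equations for an intercept and two slopes and solves them by Gaussian elimination. It is a sketch under stated assumptions, not the system used in this study: the function names (`solve_3x3`, `fit_two_predictor_regression`, `predict`) are hypothetical, and the data are synthetic values generated from a known linear rule so that the recovered coefficients can be checked. They are not Infosys prices.

```python
def solve_3x3(a, b):
    """Solve the 3x3 linear system a*x = b by Gaussian elimination with partial pivoting."""
    n = 3
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("normal equations are singular (predictors may be collinear)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


def fit_two_predictor_regression(x1, x2, y):
    """Return (a, b1, b2) minimizing the squared error of y' = a + b1*x1 + b2*x2."""
    n = len(y)
    s1, s2, sy = sum(x1), sum(x2), sum(y)
    s11 = sum(u * u for u in x1)
    s22 = sum(v * v for v in x2)
    s12 = sum(u * v for u, v in zip(x1, x2))
    s1y = sum(u * t for u, t in zip(x1, y))
    s2y = sum(v * t for v, t in zip(x2, y))
    # Each row is one normal equation in the unknowns a, b1, b2.
    lhs = [[n, s1, s2], [s1, s11, s12], [s2, s12, s22]]
    rhs = [sy, s1y, s2y]
    return tuple(solve_3x3(lhs, rhs))


def predict(coeffs, x1, x2):
    """Evaluate the fitted equation for one pair of predictor values."""
    a, b1, b2 = coeffs
    return a + b1 * x1 + b2 * x2


# Synthetic, already-normalized predictors (illustrative only).
x1 = [0.0, 0.2, 0.5, 0.7, 1.0]
x2 = [0.1, 0.4, 0.3, 0.9, 0.6]
# Response generated from a known rule: y = 0.1 + 0.3*x1 + 0.5*x2.
y = [0.1 + 0.3 * u + 0.5 * v for u, v in zip(x1, x2)]

a, b1, b2 = fit_two_predictor_regression(x1, x2, y)
print(a, b1, b2)
```

With noise-free synthetic data the fit recovers the generating coefficients up to floating-point error; with real monthly prices the same functions would return the least-squares estimates instead.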
Correlation Coefficient
The correlation coefficient is used to determine whether a relationship exists between two variables, how strong that
relationship is, and whether it is positive or negative [14]. Every correlation value must lie between +1 and -1. A value
of +1 shows that the two variables have a perfect positive relationship with each other, while a value of 0 indicates
that no relationship exists between the variables. Here we correlated the open price, high price and close price with
each other, and calculated the value of R using formula (8).
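$$R = \sqrt{\frac{r_{YX_1}^2 + r_{YX_2}^2 - 2\, r_{YX_1} r_{YX_2} r_{X_1X_2}}{1 - r_{X_1X_2}^2}} \qquad (8)$$
where $r_{UV}$ denotes the simple correlation coefficient between variables $U$ and $V$.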
Equation (8) is used to calculate the combined correlation between $(X_1, X_2, Y)$, where $X_1$ denotes the open price,
$X_2$ denotes the high price and $Y$ denotes the close price. The value of R is found to be 0.98, which shows a very
strong relationship between these variables. In other words, the open and high prices are closely related to the close
price.
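One common way to obtain a combined (multiple) correlation coefficient for two predictors is to combine the three pairwise Pearson correlations. The sketch below assumes that formulation; the helper names `pearson` and `multiple_correlation` are hypothetical. The input values are the eight normalized rows of the Table 1 sample, so the result illustrates the calculation only and should not be expected to reproduce the value reported for the full data set.

```python
import math


def pearson(u, v):
    """Pearson correlation coefficient between two equal-length sequences.

    Assumes neither sequence is constant (a constant sequence has zero variance).
    """
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    return cov / math.sqrt(var_u * var_v)


def multiple_correlation(x1, x2, y):
    """Multiple correlation of y with (x1, x2), built from pairwise correlations."""
    r_y1 = pearson(y, x1)
    r_y2 = pearson(y, x2)
    r_12 = pearson(x1, x2)
    r_squared = (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)
    # Clamp to [0, 1] to guard against floating-point round-off.
    return math.sqrt(min(1.0, max(0.0, r_squared)))


# Normalized open, high and close values copied from the Table 1 sample rows.
norm_open = [0.48, 0.48, 0.0, 0.072, 0.037, 0.9, 0.77, 0.71]
norm_high = [0.46, 0.51, 0.008, 0.011, 0.044, 0.91, 0.75, 0.73]
norm_close = [0.48, 0.41, 0.08, 0.048, 0.088, 0.76, 0.71, 0.76]

r_multiple = multiple_correlation(norm_open, norm_high, norm_close)
print(round(r_multiple, 3))
```

The formula is undefined when the two predictors are perfectly correlated with each other, since the denominator then becomes zero.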
Figure 1: Shows Strong Correlation between $X_1$, $X_2$ and $Y$
ANALYSIS OF RESULTS
In this paper we built the database using the information obtained from the monthly activity summary (equities)
published by Yahoo Finance for Infosys, spanning 4½ years [16]. The data obtained were first analyzed and summarized as
shown in Table 1 [16]. The discovered data are used to generate new knowledge about the data in the database, and the
identified patterns are depicted in Figure 2 [16], which shows the market trends of predicted stock prices. Furthermore,
we extracted the values of the variables from the discovered data to predict 1½ years of future values (July 2011 to
December 2012) [16]. Our system predicts the monthly movement of stock prices using regression analysis. Here the method
of least squares is used, where the value of the dependent variable is calculated by putting the values of $X_1$ and
$X_2$ into equation (6).
Table 1: Sample of Stock Data Set

Month    Open      High      Close     Normalized Open   Normalized High   Normalized Close
Jan-07   2240.5    2324.95   2244.45   0.48              0.46              0.48
Feb-07   2244.45   2439      2078.35   0.48              0.51              0.41
Jan-09   1125      1318.7    1305.5    0                 0.008             0.08
Feb-09   1293      1324.95   1231.3    0.072             0.011             0.048
Mar-09   1211.3    1398      1324.1    0.037             0.044             0.088
Apr-11   3225      3316.85   2905.95   0.9               0.91              0.76
May-11   2924.7    2952.95   2791.85   0.77              0.75              0.71
Jun-11   2794      2915.95   2907.4    0.71              0.73              0.76
Table 1 [16] shows the 4½ years of stock data, and Figure 2 [16] plots the stock prices against the months from January
2007 to June 2011 [16].
Figure 2: Opening and Closing Price of the Stock Market Against the Months of Year 2007-2011
Table 2: Sample of Stock Forecasted Price

Month    Open (X1)   High (X2)   Close (Y)   Y'
Jul-11   1           1           0.88        0.93
Aug-11   0.78        0.43        0.17        0.13
Sep-11   0.2         0.13        0.59        0.65
Oct-11   0.36        0.61        0.72        0.64
Nov-11   0.87        0.5         0.41        0.38
Dec-11   0.62        0.41        0.37        0.34
Jul-12   0.38        0.078       0           0.022
Aug-12   0           0.029       0.067       0.066
Sep-12   0.2         0.23        0.22        0.24
Oct-12   0.45        0.19        0.21        0.13
Nov-12   0.18        0.028       0.014       0.017
Dec-12   0.3         0           0.14        -0.046
Table 2 [16] shows the 1½ years (July 2011 to December 2012) of forecasted values drawn from our prediction system, which
predicts the value of the dependent variable (close price) based on the independent variables $X_1$ and $X_2$ (i.e. open
and high price).
Figure 3: Shows a Line Graph of Predicted Patterns of Stock Prices
STANDARD ERROR OF THE ESTIMATE FOR THE MULTIPLE REGRESSION
In simple regression, the estimation becomes more accurate as the degree of dispersion around the regression line gets
smaller. The same is true of the sample points around the multiple regression plane. To measure this variation, we use
the measure called the standard error of the estimate [13][14][15].
Table 3: Standard Error Estimation

Month    Close Price (Y)   Predicted Close Price (Y')   Residual (Y-Y')   Squared Residual (Y-Y')^2
Jul-11   0.88              0.93                         -0.05             0.0025
Aug-11   0.17              0.13                         0.04              0.0016
Sep-11   0.59              0.65                         -0.06             0.0036
Oct-11   0.72              0.64                         0.08              0.0064
Nov-11   0.41              0.38                         0.03              0.0009
Dec-11   0.37              0.34                         0.03              0.0009
Jan-12   0.41              0.43                         -0.02             0.0004
Feb-12   1                 0.58                         0.42              0.1764
Mar-12   0.41              0.43                         -0.02             0.0004
Apr-12   0.36              0.38                         -0.02             0.0004
May-12   0.075             0.096                        -0.021            0.000441
Jun-12   0.062             0.069                        -0.007            0.000049
Jul-12   0                 0.022                        -0.022            0.000484
Aug-12   0.067             0.066                        0.001             0.000001
Sep-12   0.22              0.24                         -0.02             0.0004
Oct-12   0.21              0.13                         0.08              0.0064
Nov-12   0.014             0.017                        -0.003            0.000009
Dec-12   0.14              -0.046                       0.186             0.034596
∑(Y-Y')^2 = 0.23588
The last column of Table 3 [16] shows that the sum of the squared errors of the prediction is 0.23588. To calculate the
standard error we used formula (9), which is given below:

$$SE = \sqrt{\frac{\sum (Y - Y')^2}{n}} \qquad (9)$$

where
$Y$ = sample values of the dependent variable
$Y'$ = corresponding estimated value from the regression equation
$n$ = number of data points in the sample
So the standard error of the estimate is 0.11 and the prediction accuracy is 89%, which may be acceptable.
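The standard error calculation can be checked directly from the Table 3 columns. The sketch below is illustrative and rests on one assumption: the sum of squared residuals is divided by the number of data points n, which is the choice that reproduces the reported value of about 0.11. A denominator of n - k - 1 (with k = 2 predictors), which is also widely used, would give roughly 0.125 for the same residuals. The function name `standard_error` is hypothetical.

```python
import math

# Close price (Y) and predicted close price (Y') copied from Table 3, Jul-11 to Dec-12.
actual = [0.88, 0.17, 0.59, 0.72, 0.41, 0.37, 0.41, 1.0, 0.41,
          0.36, 0.075, 0.062, 0.0, 0.067, 0.22, 0.21, 0.014, 0.14]
predicted = [0.93, 0.13, 0.65, 0.64, 0.38, 0.34, 0.43, 0.58, 0.43,
             0.38, 0.096, 0.069, 0.022, 0.066, 0.24, 0.13, 0.017, -0.046]


def sum_squared_errors(y, y_hat):
    """Sum of squared residuals (Y - Y')^2 over all data points."""
    return sum((a - b) ** 2 for a, b in zip(y, y_hat))


def standard_error(y, y_hat):
    """Standard error of the estimate, assuming a denominator of n (see lead-in)."""
    return math.sqrt(sum_squared_errors(y, y_hat) / len(y))


sse = sum_squared_errors(actual, predicted)
se = standard_error(actual, predicted)
print(round(sse, 5), round(se, 2))
```

The single largest contribution to the error comes from February 2012, whose squared residual of 0.1764 accounts for about three quarters of the total.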
Figure 4: A Bar Graph for Showing Trends of Actual and Predicted Price
CONCLUSIONS
The aim of our research is to help stock brokers and investors invest money in the stock market. Prediction plays a very
important role in the stock market business, which is a very complicated and challenging process owing to the dynamic
nature of the stock market. As discussed above, our system predicts stock prices using a multiple regression approach
with three variables, and we found the accuracy of the system to be 89%, which is more accurate than the previous linear
regression approach. Stock data are highly volatile and unpredictable, and effective prediction needs human
intelligence. In future, more promising results will require rigorous training on stock market data. For this we would
apply neural network techniques to improve the results of the multiple regression approach.
REFERENCES
1. Eugene F. Fama, “The Behavior of Stock-Market Prices”, The Journal of Business, Vol. 38, No. 1, pp. 34-105, January
1965.
2. Ambika Prasad Das “Security analysis and portfolio Management”, I.K. International Publication, 3rd Edition
2008.
3. Introduction to Data Mining and Knowledge Discovery (1999), Third Edition ISBN: 1-892095-02-5, Two Crows
Corporation, 10500 Falls Road, Potomac, MD 20854 (U.S.A.).
4. Larose, D. T. (2005), “Discovering Knowledge in Data: An Introduction to Data Mining”, ISBN 0-471-66657-2,
John Wiley & Sons, Inc.
5. Dunham, M. H. & Sridhar S. (2006), “Data Mining: Introductory and Advanced Topics”, Pearson Education, New
Delhi, ISBN: 81-7758-785-4, 1st Edition.
6. Stock market challenges from http://www.google.com.
7. A Sheta, “Software Effort Estimation and Stock Market Prediction Using Takagi-Sugeno Fuzzy Models”, In
Proceedings of The IEEE International Conference on Fuzzy Systems, pp.171-178, Vancouver, BC, 2006.
8. M.H. FazelZarandi, B. Rezaee, I.B. Turksen and E. Neshat, “A type-2 fuzzy rule-based expert system model for
stock price analysis”, Expert Systems with Applications, Vol.36, No.1, pp. 139-154, January 2009.
9. Robert K. Lai, Chin-Yuan Fan, Wei-Hsiu Huang and Pei-Chann Chang, “Evolving and clustering fuzzy decision
tree for financial Time series data forecasting”, An International Journal of Expert Systems with Applications,
Vol.36, No.2, pp. 3761-3773, March 2009.
10. Shyi-Ming Chen and Yu-Chuan Chang, “Multi-Variable Fuzzy Forecasting Based On Fuzzy Clustering and
Fuzzy Rule Interpolation Techniques”, Information Sciences, Vol.180, No.24, pp. 4772-4783, 2010.
11. S Abdulsalam Sulaiman Olaniyi, Adewole, Kayode S., Jimoh, R. G., “Stock Trend Prediction Using Regression
Analysis – A Data Mining Approach”, ARPN Journal of Systems and Software, Vol. 1, No. 4, July 2011,
Brisbane, Australia.
12. Jiawei Han, Micheline Kamber, “Data Mining: Concepts and Techniques”, 2nd Edition, 2004.
13. Multiple Regression analysis from http://www.google.com.
14. M. Ray, Harswarup Sharma, “A Reference Book on Mathematical Statistics”, 4th Edition, 2006.
15. Introduction to multiple regression downloaded pdf file from http://google.com.
16. Data sources from http://yahoofinance.com.
