1. INTRODUCTION
1.1 Objective
The basic objective of this study is to examine the theoretical concepts of "multiple
regression analysis" in the light of real-life data. The research is carried out not only as a
course requirement but also to build the mathematical knowledge required for Applied
Statistics.
1.2 Limitations
1.3 Methodology
There are two methods of collecting data: primary and secondary. We have prepared this
research paper using data from secondary sources, provided by our course teacher. From
the secondary data we have used "permanent sampling".
Chapter 2
2. Literature Review
2.1 Regression
Regression analysis is a collective name for techniques for modeling and analyzing
numerical data consisting of values of a dependent variable (also called the response
variable or measurement) and of one or more independent variables (also known as
explanatory variables or predictors). The dependent variable in the regression equation is
modeled as a function of the independent variables, the corresponding parameters
("constants"), and an error term. The error term is treated as a random variable; it
represents the unexplained variation in the dependent variable. The parameters are
estimated so as to give a "best fit" of the data. Most commonly the best fit is evaluated
using the least squares method, but other criteria have also been used.
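To make the least-squares idea concrete, here is a minimal pure-Python sketch of a simple (one-predictor) OLS fit. The data values are made up for illustration and are not the report's data:

```python
import math

# Hypothetical data: one explanatory variable x and one response y.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 4.0, 6.2, 7.9, 10.1]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# OLS estimates for the model y = b0 + b1*x + error:
# b1 is chosen to minimize the sum of squared residuals.
b1 = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / \
     sum((xi - mean_x) ** 2 for xi in x)
b0 = mean_y - b1 * mean_x

# The residuals are the unexplained variation mentioned above.
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
```

The same closed-form slope/intercept formulas extend to the multiple-regression case via matrix algebra.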
In multiple regression analysis, the set of predictor variables is used to explain the
variability of the criterion variable.
2.3 R-Square
Note that R-Square is the square of Multiple R, the coefficient of correlation. R-Square
= 0.9468 means that 94.68% of the variation in the quantities sold is explained by the
independent variables, i.e. the population growth, the price point and the new roads.
The adjusted R-Square corrects for the inflation of R-Square that occurs as more
predictors are added to the model; it penalizes R-Square according to the number of
predictors relative to the number of observations.
If the regression model were perfect, all the residuals would be equal to zero. Since they
are not, and we allowed ourselves a confidence level of 95%, how do we interpret all the
numbers in the residual column?
It would be much easier if a single number could summarize all the residuals. That
number is called the standard error. The standard error is the standard deviation of the
residuals: the square root of the sum of squares of error (SSE) divided by the degrees of
freedom.
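These summary quantities are straightforward to compute from the residuals. A short sketch with illustrative numbers (not the report's data), where n is the number of observations and k the number of predictors:

```python
import math

# Hypothetical residuals and responses from some fitted regression.
residuals = [1.2, -0.8, 0.5, -1.1, 0.3, -0.1]
y = [10.0, 12.0, 9.5, 14.0, 11.0, 13.0]
n, k = len(y), 2

mean_y = sum(y) / n
sse = sum(e ** 2 for e in residuals)           # sum of squares of error
sst = sum((yi - mean_y) ** 2 for yi in y)      # total sum of squares

r_square = 1 - sse / sst
# Adjusted R-Square penalizes for the number of predictors k.
adj_r_square = 1 - (1 - r_square) * (n - 1) / (n - k - 1)
# Standard error = sqrt(SSE / degrees of freedom).
std_error = math.sqrt(sse / (n - k - 1))
```

Note that adjusted R-Square is always at most R-Square, and the gap widens as k grows relative to n.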
2.6 F-Test
The p-values in the following SigmaXL output show the significant factors in the
equation. The significant factors are those that have an effect on the response variable,
i.e. the quantity sold.
Two factors have p-values lower than 0.05 ("Price per gallon" and "Population
growth"), so those two are significant. The other variable, "New roads", is not
significant.
Chapter 3
3. Statistical Analysis
Our analytical data were provided by our course teacher. We have selected two districts
(Munshigonge and Sherpur). From these data we carry out a "Multiple Regression
Analysis": we analyze how the dependent variable is related to two or more independent
variables. From our 80 observations,
Let,
Model Summary

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.793048598
R Square            0.628926078
Adjusted R Square   0.62416872
Standard Error      2184.797546
Observations        80

ANOVA
             df   SS            MS            F             Significance F
Regression    1   631038955.3   631038955.3   132.2007051   1.82394E-18
Residual     78   372320544.7   4773340.316
Total        79   1003359500
[Scatter graph of the dependent variable against the independent variable]
Plotting the dependent variable and the independent variable on the Y-axis and X-axis,
we obtain the scatter graph above. From this graph there is no clear visual sign of
heteroscedasticity. The formal tests for heteroscedasticity are:
1. Park test
2. Glejser test
3. Spearman's rank correlation test
4. Goldfeld-Quandt test
5. Breusch-Pagan-Godfrey test
6. White's general heteroscedasticity test
Of these, we will apply only three.
3.1 (1) Park Test: Park formalizes the graphical method by suggesting that σ² is some
function of the explanatory variable Xi. Starting from the model
Yi = β1 + β2Xi + Ui,
the test regresses the log of the squared OLS residuals on the log of Xi:
ln ûi² = α + β ln Xi + vi.
SUMMARY OUTPUT

Regression Statistics
Multiple R          0.479022947
R Square            0.229462983
Adjusted R Square   0.219584304
Standard Error      2.116207609
Observations        80

ANOVA
             df   SS            MS            F             Significance F
Regression    1   104.0232155   104.0232155   23.22810237   6.95713E-06
Residual     78   349.3101021   4.478334643
Total        79   453.3333177
Here, p-value = 6.95713E-06 and the standard error of the slope is 3.72623E-05. Since
the p-value is well below 0.05, the slope of the Park regression is statistically
significant, which suggests that heteroscedasticity is present in the error variance.
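The Park regression just described can be sketched in a few lines of Python. The residuals and regressor values below are hypothetical, not the report's data:

```python
import math

# Hypothetical regressor values and OLS residuals from the original fit.
x = [2.0, 4.0, 6.0, 8.0, 10.0]
resid = [0.5, -1.0, 1.5, -2.0, 2.5]

# Park test: regress ln(u_i^2) on ln(X_i); a significant slope
# suggests the error variance depends on X (heteroscedasticity).
ln_u2 = [math.log(e ** 2) for e in resid]
ln_x = [math.log(xi) for xi in x]

n = len(x)
mx = sum(ln_x) / n
my = sum(ln_u2) / n
slope = sum((a - mx) * (b - my) for a, b in zip(ln_x, ln_u2)) / \
        sum((a - mx) ** 2 for a in ln_x)
intercept = my - slope * mx
```

In a full analysis one would also compute the slope's standard error and p-value, exactly as in the SUMMARY OUTPUT tables above.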
3.2 (2) Glejser Test: The Glejser test is similar in spirit to the Park test. After obtaining
the residuals ûi from the OLS regression, Glejser suggests regressing the absolute
values of ûi on the X variable that is thought to be closely associated with σ². In his
experiments, Glejser used several functional forms, such as |ûi| = β1 + β2Xi + vi:
SUMMARY OUTPUT

Regression Statistics
Multiple R          0.562994797
R Square            0.316963141
Adjusted R Square   0.308206259
Standard Error      1335.995065
Observations        80

ANOVA
             df   SS            MS            F         Significance F
Regression    1   64605416.88   64605416.88   36.1958   5.44881E-08
Residual     78   139220859.5   1784882.815
Total        79   203826276.4
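The Glejser auxiliary regression can be sketched as follows, again with hypothetical residuals and regressor values rather than the report's data:

```python
# Hypothetical regressor values and OLS residuals from the original fit.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
resid = [0.2, -0.5, 0.9, -1.4, 1.8]

# Glejser test: regress |u_i| on X_i (one of several functional forms);
# a significant slope points to heteroscedasticity.
abs_u = [abs(e) for e in resid]

n = len(x)
mx = sum(x) / n
my = sum(abs_u) / n
slope = sum((a - mx) * (b - my) for a, b in zip(x, abs_u)) / \
        sum((a - mx) ** 2 for a in x)
intercept = my - slope * mx
```

Other functional forms simply replace X with a transformation of it (for example √X or 1/X) before running the same regression.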
3.3 (3) Breusch-Pagan-Godfrey Test: The success of the Goldfeld-Quandt test depends
not only on the value of c (the number of central observations to be omitted) but also on
identifying the correct X variable with which to order the observations. This limitation
can be avoided by using the Breusch-Pagan-Godfrey (BPG) test.
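A rough sketch of the BPG statistic, using hypothetical residuals and a single hypothetical variable z thought to drive the error variance:

```python
# Hypothetical OLS residuals and one candidate variance-driving variable.
resid = [0.3, -0.6, 0.9, -1.2, 1.5, -0.4]
z = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

n = len(resid)
sigma2_hat = sum(e ** 2 for e in resid) / n      # ML estimate of variance
p = [e ** 2 / sigma2_hat for e in resid]         # scaled squared residuals

# Auxiliary OLS of p on z; Theta = ESS / 2 follows chi-square(1)
# asymptotically when z is the only candidate variable.
mz = sum(z) / n
mp = sum(p) / n
slope = sum((a - mz) * (b - mp) for a, b in zip(z, p)) / \
        sum((a - mz) ** 2 for a in z)
intercept = mp - slope * mz
ess = sum((intercept + slope * zi - mp) ** 2 for zi in z)  # explained SS
theta = ess / 2
# Compare theta with the 5% chi-square critical value for 1 df (3.841).
```

By construction the mean of the scaled squared residuals p is exactly 1, which is a useful sanity check when implementing the test.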
SUMMARY OUTPUT

Regression Statistics
Multiple R          0.450953
R Square            0.203358
Adjusted R Square   0.193145
Standard Error      1.962505
Observations        80

ANOVA
             df   SS         MS         F          Significance F
Regression    1   76.68587   76.68587   19.91104   2.69E-05
Residual     78   300.4112   3.851425
Total        79   377.097
Under the assumptions of the BPG test, Θ asymptotically follows the chi-square
distribution with 1 df. From the chi-square table, the 5 percent critical chi-square value
for 1 df is 3.841.
Here, however, we do not need to pursue this test further, because only one explanatory
variable is available for Munshigonge and Sherpur.
Chapter 4
4. Removing the Error:
4.1 Assumption 1
1.  y/x = β0/x + β1(x1/x)
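This transformation amounts to weighted least squares: if the error variance grows with x, dividing the whole model through by x stabilizes it. A sketch with made-up data (roughly generated from y = 1 + 2·x1, not the report's data):

```python
# Hypothetical data for the model y = b0 + b1*x1 with error variance
# assumed proportional to x1^2.
x1 = [1.0, 2.0, 4.0, 5.0, 8.0]
y = [3.0, 5.2, 9.1, 11.0, 17.3]

# Divide through by x1:  y/x1 = b0*(1/x1) + b1 + transformed error.
y_t = [yi / xi for yi, xi in zip(y, x1)]   # transformed response
x_t = [1.0 / xi for xi in x1]              # transformed regressor

# OLS of y_t on x_t: the slope estimates b0, the intercept estimates b1.
n = len(y)
mx = sum(x_t) / n
my = sum(y_t) / n
slope = sum((a - mx) * (b - my) for a, b in zip(x_t, y_t)) / \
        sum((a - mx) ** 2 for a in x_t)
intercept = my - slope * mx
```

Note the role reversal after the transformation: the original intercept b0 becomes the slope on 1/x1, and the original slope b1 becomes the intercept.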
SUMMARY OUTPUT

Regression Statistics
Multiple R          0.406519152
R Square            0.165257821
Adjusted R Square   0.141735485
Standard Error      0.266633738
Observations        80

ANOVA
             df   SS            MS            F             Significance F
Regression    2   1.097828418   0.548914209   7.721012753   0.00087886
Residual     78   5.545296927   0.07109355
Total        80   6.643125345
4.2 Assumption 2
y/√x = β0/√x + β1(x1/√x)
SUMMARY OUTPUT

Regression Statistics
Multiple R          0.36347866
R Square            0.13211674
Adjusted R Square   0.10816952
Standard Error      21.9040752
Observations        80

ANOVA
             df   SS         MS            F             Significance F
Regression    2   5696.931   2848.465639   5.936919254   0.004000334
Residual     78   37423.50   479.7885094
Total        80   43120.43
4.3 Conclusion