
Chapter 1

1. INTRODUCTION

A common goal of a statistical research project is to investigate causality, and in particular to draw a conclusion about the effect of changes in the values of predictors or independent variables on a response or dependent variable. Multiple regression analysis is the study of how a dependent variable y is related to two or more independent variables. This research paper is based on the elements of multiple regression analysis. Here we carry out a regression analysis of a practical, real-life example. Research of this kind is the first of its sort at ASA University Bangladesh.

1.1 Objective

The basic objective of the study is to examine the theoretical concepts of "multiple regression analysis" on the basis of real-life data. The research is not only a course requirement but also a way of building our mathematical knowledge of APPLIED STATISTICS.

1.2 Limitations

• Insufficient time to prepare such a research report
• Limitations of primary information
• Lack of sufficient technological availability (software)
• Results of the analysis may not always be supported
• Ways of analyzing the report (testing)
• By choosing (or rejecting, or modifying) a certain sample, results can be manipulated
• The same data can be interpreted in different ways, which may give rise to contradictions
• Timeliness of the analysis of the data
• Data may become less representative as conditions change over time
• Insufficient knowledge of hypothesis testing

1.3 Methodology

There are two methods of collecting data: primary and secondary. We have taken the data for this research paper from secondary sources; our course teacher provided the data. From the secondary data we have taken a "permanent sample".

Chapter 2

2. Literature Review

2.1 Regression

Regression analysis is a collective name for techniques for the modeling and analysis of numerical data consisting of values of a dependent variable (also called the response variable or measurement) and of one or more independent variables (also known as explanatory variables or predictors). The dependent variable in the regression equation is modeled as a function of the independent variables, corresponding parameters ("constants"), and an error term. The error term is treated as a random variable; it represents unexplained variation in the dependent variable. The parameters are estimated so as to give a "best fit" to the data. Most commonly the best fit is evaluated using the least squares method, but other criteria have also been used. (A small code sketch follows the variable list below.)

The regression equation deals with the following variables:

• The unknown parameters, denoted β; these may be a scalar or a vector of length k.
• The independent variables, X.
• The dependent variable, Y.
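As a minimal illustration of estimating β by least squares (a sketch only; the x and y values below are made up, not this study's data), the fit can be computed in Python:

    # A minimal least-squares fit: estimate beta for Y = b0 + b1*X + error.
    # The x and y values are illustrative, not the study's data.
    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

    # Design matrix with an intercept column; lstsq minimizes ||y - X @ beta||^2.
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)

    residuals = y - X @ beta  # realized values of the error term
    print("intercept and slope:", beta)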

2.2 Multiple Regression Analysis:

Multiple regression analysis is a method for the explanation of phenomena and the prediction of future events. The coefficient of correlation between variables X and Y is a quantitative index of association between these two variables. In its squared form, as the coefficient of determination, it indicates the amount of variance (information) in the criterion variable Y that is accounted for by the variation in the predictor variable X. The multivariate counterpart of the coefficient of determination is the coefficient of multiple determination, R². In multiple regression analysis, the set of predictor variables X1, X2, …, Xk is used to explain the variability of the criterion variable Y.

2.3 R-Square

R-square is the coefficient of determination. It measures the proportion of the variation in the Y variable (in this case, the quantity sold) that is explained by variations in the independent factors.

Note that R-square is the square of Multiple R, which is the coefficient of correlation. R-Square = 0.9468 means that 94.68% of the variation in the quantities sold is explained by the independent variables, i.e. the population growth, the price point and the new roads.
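As a sketch (again with made-up numbers rather than the study's data), R-square can be computed from a fit as 1 − SSE/SST:

    import numpy as np

    # Illustrative data, not the study's figures.
    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta

    # R-square = 1 - SSE/SST: the share of variation in y explained by the model.
    sse = np.sum(residuals ** 2)
    sst = np.sum((y - y.mean()) ** 2)
    print("R-square:", 1.0 - sse / sst)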

2.4 Adjusted R-Square

The adjusted R-Square takes into account the number of predictors in the model, which can inflate the raw R-Square as variables are added; it is computed as Adjusted R² = 1 − (1 − R²)(n − 1)/(n − k − 1), where n is the number of observations and k the number of predictors.

2.5 The Standard Error

If the regression model were perfect, all the residuals would be equal to zero. Since they are not all equal to zero, and we allowed ourselves a confidence level of 95%, how do we interpret all the numbers in the residual column?

It would be a lot easier if we had a single number that gives an account of all the residuals. That number is called the standard error. The standard error is the standard deviation of the residuals.

It is the square root of the sum of squares of error (SSE) divided by the degrees of freedom.
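For example, using the ANOVA figures reported for the first regression in Chapter 3 (SSE = 372320544.7 on 78 degrees of freedom):

    standard error = sqrt(SSE / df) = sqrt(372320544.7 / 78) ≈ 2184.8,

which agrees with the Standard Error of 2184.797546 in the summary output there.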

2.6 F – Test:

Interpreting the p-values

The p-values in the SigmaXL output show the significant factors in the equation. The significant factors are those that have an effect on the response factor, i.e. the quantity sold.

Two factors have p-values lower than 0.05 ("Price per gallon" and "Population growth"), therefore those two are significant. The other variable, "New roads", is not significant.
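A sketch of this check in Python with statsmodels (the data frame and its column names are hypothetical stand-ins for the sales example, not the study's data):

    import pandas as pd
    import statsmodels.api as sm

    # Hypothetical stand-in data for the quantity-sold example.
    df = pd.DataFrame({
        "price_per_gallon":  [3.1, 3.3, 3.0, 3.6, 3.2, 3.4],
        "population_growth": [1.2, 1.5, 1.1, 1.9, 1.4, 1.6],
        "quantity_sold":     [210, 240, 205, 280, 230, 250],
    })
    X = sm.add_constant(df[["price_per_gallon", "population_growth"]])
    results = sm.OLS(df["quantity_sold"], X).fit()

    # Keep only the factors whose p-values fall below 0.05.
    print(results.pvalues[results.pvalues < 0.05])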

Chapter 3
3. Statistical Analysis
Our analytical data were provided by our course teacher. We have selected two districts (Munshigonge and Sherpur). From these data we perform a multiple regression analysis: we analyze how the dependent variable is related to two or more independent variables. From our 80 observations,

Let,

Total household expenditure = dependent variable (Y)
Monthly income of total households = independent variable (X)

First, we perform model selection.

Model Summary

Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .793(a)   .629       .624                2184.798
2       .817(b)   .667       .658                2082.952
3       .833(c)   .693       .681                2012.093
4       .844(d)   .712       .696                1964.158

a. Predictors: (Constant), monthly income total households
b. Predictors: (Constant), monthly income total households, number of school-going children of higher secondary
c. Predictors: (Constant), monthly income total households, number of school-going children of higher secondary, number of school-going children of secondary
d. Predictors: (Constant), monthly income total households, number of school-going children of higher secondary, number of school-going children of secondary, amount spent on smoking

SUMMARY OUTPUT (Table 1)

Regression Statistics
Multiple R          0.793048598
R Square            0.628926078
Adjusted R Square   0.62416872
Standard Error      2184.797546
Observations        80

ANOVA
              df    SS            MS            F             Significance F
Regression    1     631038955.3   631038955.3   132.2007051   1.82394E-18
Residual      78    372320544.7   4773340.316
Total         79    1003359500

              Coefficients   Standard Error   t Stat        P-value       Lower 95%   Upper 95%   Lower 95.0%   Upper 95.0%
Intercept     1503.980676    401.8292809      3.74283495    0.000346155   703.999     2303.961    703.999       2303.961
X Variable 1  0.442323355    0.038470071      11.49785654   1.82394E-18   0.3657      0.518       0.36573       0.5189114
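The summary output above comes from spreadsheet software; a rough equivalent in Python, assuming the 80 observations have been loaded into income and expenditure arrays (the file names are hypothetical placeholders), would be:

    import numpy as np
    import statsmodels.api as sm

    income = np.loadtxt("income.txt")            # hypothetical data file
    expenditure = np.loadtxt("expenditure.txt")  # hypothetical data file

    # Regress total household expenditure (Y) on monthly income (X).
    results = sm.OLS(expenditure, sm.add_constant(income)).fit()
    print(results.summary())  # R-square, ANOVA F, coefficients, p-values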
We can use two methods to analyze these data.

# Informal Method:

[Figure: scatter graph of monthly household income (X-axis, 0 to 40000) against total household expenditure (Y-axis, 0 to 25000).]

Putting the independent variable on the X-axis and the dependent variable on the Y-axis, we get the scatter graph above. From this graph, we can see no obvious sign of heteroscedasticity.
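A sketch of drawing this informal check in Python (the data files are hypothetical placeholders, as before):

    import numpy as np
    import matplotlib.pyplot as plt

    income = np.loadtxt("income.txt")
    expenditure = np.loadtxt("expenditure.txt")

    # Look for a fanning-out pattern, which would suggest heteroscedasticity.
    plt.scatter(income, expenditure)
    plt.xlabel("Monthly income of total households (X)")
    plt.ylabel("Total household expenditure (Y)")
    plt.title("Scatter graph")
    plt.show()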

# Formal Method: In this approach there are six tests. These are:

1. Park test
2. Glejser test
3. Spearman's rank correlation test
4. Goldfeld-Quandt test
5. Breusch-Pagan-Godfrey test
6. White's general heteroscedasticity test

But we will carry out only three of these tests.

3.1 Park Test: Park formalizes the graphical method by suggesting that the error variance σi² is some function of the explanatory variable Xi. The model is

Yi = β1 + β2Xi + Ui

where Y = total household expenditure,
X1 = monthly income of total households,
X2 = types of households,
and i indexes the ith observation. The results of the regression are as follows:

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.479022947
R Square            0.229462983
Adjusted R Square   0.219584304
Standard Error      2.116207609
Observations        80

ANOVA
              df    SS            MS            F             Significance F
Regression    1     104.0232155   104.0232155   23.22810237   6.95713E-06
Residual      78    349.3101021   4.478334643
Total         79    453.3333177

              Coefficients   Standard Error   t Stat        P-value       Lower 95%     Upper 95%   Lower 95.0%   Upper 95.0%
Intercept     11.8396645     0.389214178      30.41940701   5.09126E-45   11.06479842   12.6145     11.0647       12.6145
X Variable 1  0.000179588    3.72623E-05      4.819554167   6.95713E-06   0.000105404   0.00025     0.00010       0.00025

Here, p-value = 6.95713E-06 and standard error = 3.72623E-05.

Since the p-value is well below 0.05, the slope of the Park regression is statistically significant, and from the Park test one may conclude that heteroscedasticity is present in the error variance.
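A sketch of the Park test in Python, assuming the usual form of the test, which regresses ln(u²) on ln(X), where u are the OLS residuals (data files again hypothetical):

    import numpy as np
    import statsmodels.api as sm

    income = np.loadtxt("income.txt")
    expenditure = np.loadtxt("expenditure.txt")

    # Step 1: the original OLS regression, keeping the residuals u.
    ols = sm.OLS(expenditure, sm.add_constant(income)).fit()
    u = ols.resid

    # Step 2: regress ln(u^2) on ln(X); a significant slope suggests
    # the error variance depends on X, i.e. heteroscedasticity.
    park = sm.OLS(np.log(u ** 2), sm.add_constant(np.log(income))).fit()
    print(park.pvalues)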

3.2 Glejser Test: The Glejser test is similar in spirit to the Park test. After obtaining the residuals ui from the OLS regression, Glejser suggests regressing the absolute values of ui on the X variable that is thought to be closely associated with σi². In his experiments, Glejser used several functional forms; the results of the regression are as follows:

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.562994797
R Square            0.316963141
Adjusted R Square   0.308206259
Standard Error      1335.995065
Observations        80

ANOVA
              df    SS            MS            F         Significance F
Regression    1     64605416.88   64605416.88   36.1958   5.44881E-08
Residual      78    139220859.5   1784882.815
Total         79    203826276.4

              Coefficients   Standard Error   t Stat        P-value       Lower 95%   Upper 95%   Lower 95.0%   Upper 95.0%
Intercept     277.4594986    245.7170173      1.129183081   0.26228144    -211.7256   766.644     -211.7256     766.6446
X Variable 1  0.141529265    0.023524296      6.016301781   5.44881E-08   0.0946      0.1883      0.0946        0.1883

Here, p-value = 5.44881E-08 and standard error = 0.023524296. Since the slope is again highly significant, the Glejser test points to the same conclusion as the Park test.
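A sketch of the Glejser regression of |u| on X in Python (data files again hypothetical):

    import numpy as np
    import statsmodels.api as sm

    income = np.loadtxt("income.txt")
    expenditure = np.loadtxt("expenditure.txt")

    ols = sm.OLS(expenditure, sm.add_constant(income)).fit()

    # Regress the absolute residuals |u| on X; a significant slope
    # means the residual spread grows with X (heteroscedasticity).
    glejser = sm.OLS(np.abs(ols.resid), sm.add_constant(income)).fit()
    print(glejser.summary())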

3.3 Breusch-Pagan-Godfrey Test: The success of the Goldfeld-Quandt test depends not only on the value of c (the number of central observations to be omitted) but also on identifying the correct X variable with which to order the observations. This limitation can be avoided if we consider the Breusch-Pagan-Godfrey (BPG) test.

To illustrate this test, consider the m-variable linear regression model.

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.450953
R Square            0.203358
Adjusted R Square   0.193145
Standard Error      1.962505
Observations        80

ANOVA
              df    SS         MS         F          Significance F
Regression    1     76.68587   76.68587   19.91104   2.69E-05
Residual      78    300.4112   3.851425
Total         79    377.097

              Coefficients   Standard Error   t Stat     P-value    Lower 95%   Upper 95%   Lower 95.0%   Upper 95.0%
Intercept     -0.27885       0.360945         -0.77256   0.442117   -0.99744    0.439734    -0.99744      0.439734
X Variable 1  0.000154       3.46E-05         4.462178   2.69E-05   8.54E-05    0.000223    8.54E-05      0.000223
Here, p-value = 2.69E-05, standard error = 3.46E-05, Θ = 38.34293, and the 5 percent chi-square critical value with 1 df = 3.841455.

Under the assumptions of the BPG test, Θ asymptotically follows the chi-square distribution with 1 df. From the chi-square table we find that for 1 df the 5 percent critical chi-square value is 3.841455; since Θ = 38.34293 exceeds this value, the hypothesis of homoscedasticity is rejected.

Here we do not need to analyze this test any further because we need only one variable for Munshigonge and Sherpur.
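For reference, statsmodels ships a Breusch-Pagan test (an LM-statistic variant, so its value may differ slightly from the Θ computed by hand above); a sketch with the same hypothetical data files:

    import numpy as np
    import statsmodels.api as sm
    from statsmodels.stats.diagnostic import het_breuschpagan

    income = np.loadtxt("income.txt")
    expenditure = np.loadtxt("expenditure.txt")

    X = sm.add_constant(income)
    ols = sm.OLS(expenditure, X).fit()

    # Returns the LM statistic, its p-value, and an F-statistic form.
    lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(ols.resid, X)
    print(lm_stat, lm_pvalue)  # compare lm_stat with chi-square(1) = 3.8415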

Chapter 4
4. Removing the Error

4.1 Assumption 1

1. y/x = β0/x + β1(x1/x)

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.406519152
R Square            0.165257821
Adjusted R Square   0.141735485
Standard Error      0.266633738
Observations        80

ANOVA
              df    SS            MS            F             Significance F
Regression    2     1.097828418   0.548914209   7.721012753   0.00087886
Residual      78    5.545296927   0.07109355
Total         80    6.643125345

              Coefficients   Standard Error   t Stat      P-value       Lower 95%   Upper 95%   Lower 95.0%   Upper 95.0%
Intercept     0              #N/A             #N/A        #N/A          #N/A        #N/A        #N/A          #N/A
X Variable 1  920.0505       234.13135        3.9296342   0.0001827     453.9307    1386.1704   453.9307681   1386.17
X Variable 2  0.539003       0.0551488        9.7736059   3.43974E-15   0.42921     0.648796    0.429210352   0.648796
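A sketch of this transformed regression in Python, assuming the deflator is the income variable itself: dividing Y = β0 + β1X through by X gives regressors 1/X and 1, with no separate intercept, which is why the Intercept row above is forced to zero (data files again hypothetical):

    import numpy as np
    import statsmodels.api as sm

    income = np.loadtxt("income.txt")
    expenditure = np.loadtxt("expenditure.txt")

    # Transformed model: y/x = b0*(1/x) + b1; fit without adding a constant.
    y_t = expenditure / income
    X_t = np.column_stack([1.0 / income, np.ones_like(income)])
    wls = sm.OLS(y_t, X_t).fit()
    print(wls.params)  # first entry estimates b0, second estimates b1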

4.2 Assumption 2
2. y/√x = β0/√x + β1(x1/√x)

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.36347866
R Square            0.13211674
Adjusted R Square   0.10816952
Standard Error      21.9040752
Observations        80

ANOVA
              df    SS         MS            F             Significance F
Regression    2     5696.931   2848.465639   5.936919254   0.004000334
Residual      78    37423.50   479.7885094
Total         80    43120.43

              Coefficients   Standard Error   t Stat        P-value       Lower 95%     Upper 95%    Lower 95.0%   Upper 95.0%
Intercept     0              #N/A             #N/A          #N/A          #N/A          #N/A         #N/A          #N/A
X Variable 1  1258.65        278.0099         4.527356393   2.11293E-05   705.1746362   1812.12537   705.1746      1812.125
X Variable 2  0.47190354     0.0429736        10.98122704   1.702E-17     0.386349533   0.55745755   0.386349      0.55745

4.3 Conclusion

An experimental study involves taking measurements of the system under study, manipulating the system, and then taking additional measurements using the same procedure to determine whether the manipulation has modified the values of the measurements. In contrast, an observational study does not involve experimental manipulation. Instead, data are gathered and correlations between predictors and response are investigated.

